date:20190502

Re: [PATCH v2 04/17] kunit: test: add kunit_stream a std::stream like logger

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 6:50 PM shuah  wrote:
>
> On 5/1/19 5:01 PM, Brendan Higgins wrote:

< snip >

> > diff --git a/kunit/kunit-stream.c b/kunit/kunit-stream.c
> > new file mode 100644
> > index 0..93c14eec03844
> > --- /dev/null
> > +++ b/kunit/kunit-stream.c
> > @@ -0,0 +1,149 @@

< snip >

> > +static int kunit_stream_init(struct kunit_resource *res, void *context)
> > +{
> > + struct kunit *test = context;
> > + struct kunit_stream *stream;
> > +
> > + stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> > + if (!stream)
> > + return -ENOMEM;
> > + res->allocation = stream;
> > + stream->test = test;
> > + spin_lock_init(>lock);
> > + stream->internal_stream = new_string_stream();
> > +
> > + if (!stream->internal_stream)
> > + return -ENOMEM;
>
> What happens to stream? Don't you want to free that?

Good catch. Will fix in next revision.

< snip >

Cheers
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 6:45 PM Frank Rowand  wrote:
>
> On 5/2/19 4:45 PM, Brendan Higgins wrote:
> > On Thu, May 2, 2019 at 2:16 PM Frank Rowand  wrote:
> >>
> >> On 5/2/19 11:07 AM, Brendan Higgins wrote:
> >>> On Thu, May 2, 2019 at 4:02 AM Greg KH  wrote:
> 
>  On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> > From: Felix Guo 
> >
> > The ultimate goal is to create minimal isolated test binaries; in the
> > meantime we are using UML to provide the infrastructure to run tests, so
> > define an abstract way to configure and run tests that allow us to
> > change the context in which tests are built without affecting the user.
> > This also makes pretty and dynamic error reporting, and a lot of other
> > nice features easier.
> >
> > kunit_config.py:
> >   - parse .config and Kconfig files.
> >
> > kunit_kernel.py: provides helper functions to:
> >   - configure the kernel using kunitconfig.
> >   - build the kernel with the appropriate configuration.
> >   - provide function to invoke the kernel and stream the output back.
> >
> > Signed-off-by: Felix Guo 
> > Signed-off-by: Brendan Higgins 
> 
>  Ah, here's probably my answer to my previous logging format question,
>  right?  What's the chance that these wrappers output stuff in a standard
>  format that test-framework-tools can already parse?  :)
> >
> > To be clear, the test-framework-tools format we are talking about is
> > TAP13[1], correct?
>
> I'm not sure what the test community prefers for a format.  I'll let them
> jump in and debate that question.
>
>
> >
> > My understanding is that is what kselftest is being converted to use.
> >
> >>>
> >>> It should be pretty easy to do. I had some patches that pack up the
> >>> results into a serialized format for a presubmit service; it should be
> >>> pretty straightforward to take the same logic and just change the
> >>> output format.
> >>
> >> When examining and trying out the previous versions of the patch I found
> >> the wrappers useful to provide information about how to control and use
> >> the tests, but I had no interest in using the scripts as they do not
> >> fit in with my personal environment and workflow.
> >>
> >> In the previous versions of the patch, these helper scripts are optional,
> >> which is good for my use case.  If the helper scripts are required to
> >
> > They are still optional.
> >
> >> get the data into the proper format then the scripts are not quite so
> >> optional, they become the expected environment.  I think the proper
> >> format should exist without the helper scripts.
> >
> > That's a good point. A couple things,
> >
> > First off, supporting TAP13, either in the kernel or the wrapper
> > script is not hard, but I don't think that is the real issue that you
> > raise.
> >
> > If your only concern is that you will always be able to have human
> > readable KUnit results printed to the kernel log, that is a guarantee
> > I feel comfortable making. Beyond that, I think it is going to take a
> > long while before I would feel comfortable guaranteeing anything about
> > how will KUnit work, what kind of data it will want to expose, and how
> > it will be organized. I think the wrapper script provides a nice
> > facade that I can maintain, can mediate between the implementation
> > details and the user, and can mediate between the implementation
> > details and other pieces of software that might want to consume
> > results.
> >
> > [1] https://testanything.org/tap-version-13-specification.html
>
> My concern is based on a focus on my little part of the world
> (which in _previous_ versions of the patch series was the devicetree
> unittest.c tests being converted to use the kunit infrastructure).
> If I step back and think of the entire kernel globally I may end
> up with a different conclusion - but I'm going to remain myopic
> for this email.
>
> I want the test results to be usable by me and my fellow
> developers.  I prefer that the test results be easily accessible
> (current printk() implementation means that kunit messages are
> just as accessible as the current unittest.c printk() output).
> If the printk() output needs to be filtered through a script
> to generate the actual test results then that is sub-optimal
> to me.  It is one more step added to my workflow.  And
> potentially with an embedded target a major pain to get a
> data file (the kernel log file) transferred from a target
> to my development host.

That's fair. If that is indeed your only concern, then I don't think
the wrapper script will ever be an issue for you. You will always be
able to execute a given test the old fashioned/manual way, and the
wrapper script only summarizes results, it does not change the
contents.

>
> I want a reported test failure to be easy to trace back to the
> point in the source where the failure is reported.  With printk()
> the search is a simple

Re: [PATCH v2 07/17] kunit: test: add initial tests

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 6:27 PM shuah  wrote:
>
> On 5/1/19 5:01 PM, Brendan Higgins wrote:
> > Add a test for string stream along with a simpler example.
> >
> > Signed-off-by: Brendan Higgins 
> > ---
> >   kunit/Kconfig  | 12 ++
> >   kunit/Makefile |  4 ++
> >   kunit/example-test.c   | 88 ++
> >   kunit/string-stream-test.c | 61 ++
> >   4 files changed, 165 insertions(+)
> >   create mode 100644 kunit/example-test.c
> >   create mode 100644 kunit/string-stream-test.c
> >
> > diff --git a/kunit/Kconfig b/kunit/Kconfig
> > index 64480092b2c24..5cb500355c873 100644
> > --- a/kunit/Kconfig
> > +++ b/kunit/Kconfig
> > @@ -13,4 +13,16 @@ config KUNIT
> > special hardware. For more information, please see
> > Documentation/kunit/
> >
> > +config KUNIT_TEST
> > + bool "KUnit test for KUnit"
> > + depends on KUNIT
> > + help
> > +   Enables KUnit test to test KUnit.
> > +
>
> Please add a bit more information on what this config option
> does. Why should user care to enable it?
>
> > +config KUNIT_EXAMPLE_TEST
> > + bool "Example test for KUnit"
> > + depends on KUNIT
> > + help
> > +   Enables example KUnit test to demo features of KUnit.
> > +
>
> Same here.

Sounds reasonable. Will fix in the next revision.

< snip >

Thanks!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 03/17] kunit: test: add string_stream a std::stream like string builder

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 6:26 PM shuah  wrote:
>
> On 5/1/19 5:01 PM, Brendan Higgins wrote:
< snip >
> > diff --git a/kunit/Makefile b/kunit/Makefile
> > index 5efdc4dea2c08..275b565a0e81f 100644
> > --- a/kunit/Makefile
> > +++ b/kunit/Makefile
> > @@ -1 +1,2 @@
> > -obj-$(CONFIG_KUNIT) +=   test.o
> > +obj-$(CONFIG_KUNIT) +=   test.o \
> > + string-stream.o
> > diff --git a/kunit/string-stream.c b/kunit/string-stream.c
> > new file mode 100644
> > index 0..7018194ecf2fa
> > --- /dev/null
> > +++ b/kunit/string-stream.c
> > @@ -0,0 +1,144 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * C++ stream style string builder used in KUnit for building messages.
> > + *
> > + * Copyright (C) 2019, Google LLC.
> > + * Author: Brendan Higgins 
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +int string_stream_vadd(struct string_stream *this,
> > +const char *fmt,
> > +va_list args)
> > +{
> > + struct string_stream_fragment *fragment;
>
> Since there is field with the same name, please use a different
> name. Using the same name for the struct which contains a field
> of the same name get very confusing and will hard to maintain
> the code.
>
> > + int len;
> > + va_list args_for_counting;
> > + unsigned long flags;
> > +
> > + /* Make a copy because `vsnprintf` could change it */
> > + va_copy(args_for_counting, args);
> > +
> > + /* Need space for null byte. */
> > + len = vsnprintf(NULL, 0, fmt, args_for_counting) + 1;
> > +
> > + va_end(args_for_counting);
> > +
> > + fragment = kmalloc(sizeof(*fragment), GFP_KERNEL);
> > + if (!fragment)
> > + return -ENOMEM;
> > +
> > + fragment->fragment = kmalloc(len, GFP_KERNEL);
>
> This is confusing. See above comment.

Good point. Will fix in the next revision.

< snip >

Thanks!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread Logan Gunthorpe

On 2019-05-01 5:01 p.m., Brendan Higgins wrote:
> ## TLDR
> 
> I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> 5.2.
> 

As I said on the last posting, I like this and would like to see it move
forward. I still have the same concerns over the downsides of using UML
(ie. not being able to compile large swaths of the tree due to features
that don't exist in that arch) but these are concerns for later.

I'd prefer to see the unnecessary indirection that I pointed out in
patch 8 cleaned up but, besides that, the code looks good to me.

Reviewed-by: Logan Gunthorpe 

Thanks!

Logan
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 08/17] kunit: test: add support for test abort

2019-05-02 Thread Logan Gunthorpe

On 2019-05-01 5:01 p.m., Brendan Higgins wrote:
> +/*
> + * struct kunit_try_catch - provides a generic way to run code which might 
> fail.
> + * @context: used to pass user data to the try and catch functions.
> + *
> + * kunit_try_catch provides a generic, architecture independent way to 
> execute
> + * an arbitrary function of type kunit_try_catch_func_t which may bail out by
> + * calling kunit_try_catch_throw(). If kunit_try_catch_throw() is called, 
> @try
> + * is stopped at the site of invocation and @catch is catch is called.

I found some of the C++ comparisons in this series a bit distasteful but
wasn't going to say anything until I saw the try catch But looking
into the implementation it's just a thread that can exit early which
seems fine to me. Just a poor choice of name I guess...

[snip]

> +static void __noreturn kunit_abort(struct kunit *test)
> +{
> + kunit_set_death_test(test, true);
> +
> + kunit_try_catch_throw(>try_catch);
> +
> + /*
> +  * Throw could not abort from test.
> +  *
> +  * XXX: we should never reach this line! As kunit_try_catch_throw is
> +  * marked __noreturn.
> +  */
> + WARN_ONCE(true, "Throw could not abort from test!\n");
> +}
> +
>  int kunit_init_test(struct kunit *test, const char *name)
>  {
>   spin_lock_init(>lock);
> @@ -77,6 +103,7 @@ int kunit_init_test(struct kunit *test, const char *name)
>   test->name = name;
>   test->vprintk = kunit_vprintk;
>   test->fail = kunit_fail;
> + test->abort = kunit_abort;

There are a number of these function pointers which seem to be pointless
to me as you only ever set them to one function. Just call the function
directly. As it is, it is an unnecessary indirection for someone reading
the code. If and when you have multiple implementations of the function
then add the pointer. Don't assume you're going to need it later on and
add all this maintenance burden if you never use it..

[snip]

> +void kunit_generic_try_catch_init(struct kunit_try_catch *try_catch)
> +{
> + try_catch->run = kunit_generic_run_try_catch;
> + try_catch->throw = kunit_generic_throw;
> +}

Same here. There's only one implementation of try_catch and I can't
really see any sensible justification for another implementation. Even
if there is, add the indirection when the second implementation is
added. This isn't C++ and we don't need to make everything a "method".

Thanks,

Logan
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 04/17] kunit: test: add kunit_stream a std::stream like logger

2019-05-02 Thread shuah


On 5/1/19 5:01 PM, Brendan Higgins wrote:

A lot of the expectation and assertion infrastructure prints out fairly
complicated test failure messages, so add a C++ style log library for
for logging test results.

Signed-off-by: Brendan Higgins 
---
  include/kunit/kunit-stream.h |  85 
  include/kunit/test.h |   2 +
  kunit/Makefile   |   3 +-
  kunit/kunit-stream.c | 149 +++
  kunit/test.c |   8 ++
  5 files changed, 246 insertions(+), 1 deletion(-)
  create mode 100644 include/kunit/kunit-stream.h
  create mode 100644 kunit/kunit-stream.c

diff --git a/include/kunit/kunit-stream.h b/include/kunit/kunit-stream.h
new file mode 100644
index 0..d457a54fe0100
--- /dev/null
+++ b/include/kunit/kunit-stream.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * C++ stream style string formatter and printer used in KUnit for outputting
+ * KUnit messages.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#ifndef _KUNIT_KUNIT_STREAM_H
+#define _KUNIT_KUNIT_STREAM_H
+
+#include 
+#include 
+
+struct kunit;
+
+/**
+ * struct kunit_stream - a std::stream style string builder.
+ *
+ * A std::stream style string builder. Allows messages to be built up and
+ * printed all at once.
+ */
+struct kunit_stream {
+   /* private: internal use only. */
+   struct kunit *test;
+   spinlock_t lock; /* Guards level. */
+   const char *level;
+   struct string_stream *internal_stream;
+};
+
+/**
+ * kunit_new_stream() - constructs a new  kunit_stream.
+ * @test: The test context object.
+ *
+ * Constructs a new test managed  kunit_stream.
+ */
+struct kunit_stream *kunit_new_stream(struct kunit *test);
+
+/**
+ * kunit_stream_set_level(): sets the level that string should be printed at.
+ * @this: the stream being operated on.
+ * @level: the print level the stream is set to output to.
+ *
+ * Sets the print level at which the stream outputs.
+ */
+void kunit_stream_set_level(struct kunit_stream *this, const char *level);
+
+/**
+ * kunit_stream_add(): adds the formatted input to the internal buffer.
+ * @this: the stream being operated on.
+ * @fmt: printf style format string to append to stream.
+ *
+ * Appends the formatted string, @fmt, to the internal buffer.
+ */
+void __printf(2, 3) kunit_stream_add(struct kunit_stream *this,
+const char *fmt, ...);
+
+/**
+ * kunit_stream_append(): appends the contents of @other to @this.
+ * @this: the stream to which @other is appended.
+ * @other: the stream whose contents are appended to @this.
+ *
+ * Appends the contents of @other to @this.
+ */
+void kunit_stream_append(struct kunit_stream *this, struct kunit_stream 
*other);
+
+/**
+ * kunit_stream_commit(): prints out the internal buffer to the user.
+ * @this: the stream being operated on.
+ *
+ * Outputs the contents of the internal buffer as a kunit_printk formatted
+ * output.
+ */
+void kunit_stream_commit(struct kunit_stream *this);
+
+/**
+ * kunit_stream_clear(): clears the internal buffer.
+ * @this: the stream being operated on.
+ *
+ * Clears the contents of the internal buffer.
+ */
+void kunit_stream_clear(struct kunit_stream *this);
+
+#endif /* _KUNIT_KUNIT_STREAM_H */
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 819edd8db4e81..4668e8a635954 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -11,6 +11,7 @@
  
  #include 

  #include 
+#include 
  
  struct kunit_resource;
  
@@ -171,6 +172,7 @@ struct kunit {

void (*vprintk)(const struct kunit *test,
const char *level,
struct va_format *vaf);
+   void (*fail)(struct kunit *test, struct kunit_stream *stream);
  };
  
  int kunit_init_test(struct kunit *test, const char *name);

diff --git a/kunit/Makefile b/kunit/Makefile
index 275b565a0e81f..6ddc622ee6b1c 100644
--- a/kunit/Makefile
+++ b/kunit/Makefile
@@ -1,2 +1,3 @@
  obj-$(CONFIG_KUNIT) +=test.o \
-   string-stream.o
+   string-stream.o \
+   kunit-stream.o
diff --git a/kunit/kunit-stream.c b/kunit/kunit-stream.c
new file mode 100644
index 0..93c14eec03844
--- /dev/null
+++ b/kunit/kunit-stream.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * C++ stream style string formatter and printer used in KUnit for outputting
+ * KUnit messages.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#include 
+#include 
+#include 
+
+const char *kunit_stream_get_level(struct kunit_stream *this)
+{
+   unsigned long flags;
+   const char *level;
+
+   spin_lock_irqsave(>lock, flags);
+   level = this->level;
+   spin_unlock_irqrestore(>lock, flags);
+
+   return level;
+}
+
+void kunit_stream_set_level(struct kunit_stream *this,

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Frank Rowand

On 5/2/19 4:45 PM, Brendan Higgins wrote:
> On Thu, May 2, 2019 at 2:16 PM Frank Rowand  wrote:
>>
>> On 5/2/19 11:07 AM, Brendan Higgins wrote:
>>> On Thu, May 2, 2019 at 4:02 AM Greg KH  wrote:

 On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> From: Felix Guo 
>
> The ultimate goal is to create minimal isolated test binaries; in the
> meantime we are using UML to provide the infrastructure to run tests, so
> define an abstract way to configure and run tests that allow us to
> change the context in which tests are built without affecting the user.
> This also makes pretty and dynamic error reporting, and a lot of other
> nice features easier.
>
> kunit_config.py:
>   - parse .config and Kconfig files.
>
> kunit_kernel.py: provides helper functions to:
>   - configure the kernel using kunitconfig.
>   - build the kernel with the appropriate configuration.
>   - provide function to invoke the kernel and stream the output back.
>
> Signed-off-by: Felix Guo 
> Signed-off-by: Brendan Higgins 

 Ah, here's probably my answer to my previous logging format question,
 right?  What's the chance that these wrappers output stuff in a standard
 format that test-framework-tools can already parse?  :)
> 
> To be clear, the test-framework-tools format we are talking about is
> TAP13[1], correct?

I'm not sure what the test community prefers for a format.  I'll let them
jump in and debate that question.

> 
> My understanding is that is what kselftest is being converted to use.
> 
>>>
>>> It should be pretty easy to do. I had some patches that pack up the
>>> results into a serialized format for a presubmit service; it should be
>>> pretty straightforward to take the same logic and just change the
>>> output format.
>>
>> When examining and trying out the previous versions of the patch I found
>> the wrappers useful to provide information about how to control and use
>> the tests, but I had no interest in using the scripts as they do not
>> fit in with my personal environment and workflow.
>>
>> In the previous versions of the patch, these helper scripts are optional,
>> which is good for my use case.  If the helper scripts are required to
> 
> They are still optional.
> 
>> get the data into the proper format then the scripts are not quite so
>> optional, they become the expected environment.  I think the proper
>> format should exist without the helper scripts.
> 
> That's a good point. A couple things,
> 
> First off, supporting TAP13, either in the kernel or the wrapper
> script is not hard, but I don't think that is the real issue that you
> raise.
> 
> If your only concern is that you will always be able to have human
> readable KUnit results printed to the kernel log, that is a guarantee
> I feel comfortable making. Beyond that, I think it is going to take a
> long while before I would feel comfortable guaranteeing anything about
> how will KUnit work, what kind of data it will want to expose, and how
> it will be organized. I think the wrapper script provides a nice
> facade that I can maintain, can mediate between the implementation
> details and the user, and can mediate between the implementation
> details and other pieces of software that might want to consume
> results.
> 
> [1] https://testanything.org/tap-version-13-specification.html

My concern is based on a focus on my little part of the world
(which in _previous_ versions of the patch series was the devicetree
unittest.c tests being converted to use the kunit infrastructure).
If I step back and think of the entire kernel globally I may end
up with a different conclusion - but I'm going to remain myopic
for this email.

I want the test results to be usable by me and my fellow
developers.  I prefer that the test results be easily accessible
(current printk() implementation means that kunit messages are
just as accessible as the current unittest.c printk() output).
If the printk() output needs to be filtered through a script
to generate the actual test results then that is sub-optimal
to me.  It is one more step added to my workflow.  And
potentially with an embedded target a major pain to get a
data file (the kernel log file) transferred from a target
to my development host.

I want a reported test failure to be easy to trace back to the
point in the source where the failure is reported.  With printk()
the search is a simple grep for the failure message.  If the
failure message has been processed by a script, and then the
failure reported to me in an email, then I may have to look
at the script to reverse engineer how the original failure
message was transformed into the message that was reported
to me in the email.  Then I search for the point in the
source where the failure is reported.  So a basic task has
just become more difficult and time consuming.

-Frank
___
Linux-nvdimm mailing list

Re: [PATCH v2 03/17] kunit: test: add string_stream a std::stream like string builder

2019-05-02 Thread shuah


On 5/1/19 5:01 PM, Brendan Higgins wrote:

A number of test features need to do pretty complicated string printing
where it may not be possible to rely on a single preallocated string
with parameters.

So provide a library for constructing the string as you go similar to
C++'s std::string.

Signed-off-by: Brendan Higgins 
---
  include/kunit/string-stream.h |  51 
  kunit/Makefile|   3 +-
  kunit/string-stream.c | 144 ++
  3 files changed, 197 insertions(+), 1 deletion(-)
  create mode 100644 include/kunit/string-stream.h
  create mode 100644 kunit/string-stream.c

diff --git a/include/kunit/string-stream.h b/include/kunit/string-stream.h
new file mode 100644
index 0..567a4629406da
--- /dev/null
+++ b/include/kunit/string-stream.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * C++ stream style string builder used in KUnit for building messages.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#ifndef _KUNIT_STRING_STREAM_H
+#define _KUNIT_STRING_STREAM_H
+
+#include 
+#include 
+#include 
+#include 
+
+struct string_stream_fragment {
+   struct list_head node;
+   char *fragment;
+};
+
+struct string_stream {
+   size_t length;
+   struct list_head fragments;
+
+   /* length and fragments are protected by this lock */
+   spinlock_t lock;
+   struct kref refcount;
+};
+
+struct string_stream *new_string_stream(void);
+
+void destroy_string_stream(struct string_stream *stream);
+
+void string_stream_get(struct string_stream *stream);
+
+int string_stream_put(struct string_stream *stream);
+
+int string_stream_add(struct string_stream *this, const char *fmt, ...);
+
+int string_stream_vadd(struct string_stream *this,
+  const char *fmt,
+  va_list args);
+
+char *string_stream_get_string(struct string_stream *this);
+
+void string_stream_clear(struct string_stream *this);
+
+bool string_stream_is_empty(struct string_stream *this);
+
+#endif /* _KUNIT_STRING_STREAM_H */
diff --git a/kunit/Makefile b/kunit/Makefile
index 5efdc4dea2c08..275b565a0e81f 100644
--- a/kunit/Makefile
+++ b/kunit/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_KUNIT) += test.o
+obj-$(CONFIG_KUNIT) += test.o \
+   string-stream.o
diff --git a/kunit/string-stream.c b/kunit/string-stream.c
new file mode 100644
index 0..7018194ecf2fa
--- /dev/null
+++ b/kunit/string-stream.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * C++ stream style string builder used in KUnit for building messages.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#include 
+#include 
+#include 
+
+int string_stream_vadd(struct string_stream *this,
+  const char *fmt,
+  va_list args)
+{
+   struct string_stream_fragment *fragment;


Since there is field with the same name, please use a different
name. Using the same name for the struct which contains a field
of the same name get very confusing and will hard to maintain
the code.


+   int len;
+   va_list args_for_counting;
+   unsigned long flags;
+
+   /* Make a copy because `vsnprintf` could change it */
+   va_copy(args_for_counting, args);
+
+   /* Need space for null byte. */
+   len = vsnprintf(NULL, 0, fmt, args_for_counting) + 1;
+
+   va_end(args_for_counting);
+
+   fragment = kmalloc(sizeof(*fragment), GFP_KERNEL);
+   if (!fragment)
+   return -ENOMEM;
+
+   fragment->fragment = kmalloc(len, GFP_KERNEL);


This is confusing. See above comment.



+   if (!fragment->fragment) {
+   kfree(fragment);
+   return -ENOMEM;
+   }
+
+   len = vsnprintf(fragment->fragment, len, fmt, args);
+   spin_lock_irqsave(>lock, flags);
+   this->length += len;
+   list_add_tail(>node, >fragments);
+   spin_unlock_irqrestore(>lock, flags);
+   return 0;
+}
+
+int string_stream_add(struct string_stream *this, const char *fmt, ...)
+{
+   va_list args;
+   int result;
+
+   va_start(args, fmt);
+   result = string_stream_vadd(this, fmt, args);
+   va_end(args);
+   return result;
+}
+
+void string_stream_clear(struct string_stream *this)
+{
+   struct string_stream_fragment *fragment, *fragment_safe;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   list_for_each_entry_safe(fragment,
+fragment_safe,
+>fragments,
+node) {
+   list_del(>node);
+   kfree(fragment->fragment);
+   kfree(fragment);


This is what git me down the road of checking the structure
name to begin with. :)


+   }
+   this->length = 0;
+   spin_unlock_irqrestore(>lock, flags);
+}
+
+char

Re: [PATCH v2 07/17] kunit: test: add initial tests

2019-05-02 Thread shuah


On 5/1/19 5:01 PM, Brendan Higgins wrote:

Add a test for string stream along with a simpler example.

Signed-off-by: Brendan Higgins 
---
  kunit/Kconfig  | 12 ++
  kunit/Makefile |  4 ++
  kunit/example-test.c   | 88 ++
  kunit/string-stream-test.c | 61 ++
  4 files changed, 165 insertions(+)
  create mode 100644 kunit/example-test.c
  create mode 100644 kunit/string-stream-test.c

diff --git a/kunit/Kconfig b/kunit/Kconfig
index 64480092b2c24..5cb500355c873 100644
--- a/kunit/Kconfig
+++ b/kunit/Kconfig
@@ -13,4 +13,16 @@ config KUNIT
  special hardware. For more information, please see
  Documentation/kunit/
  
+config KUNIT_TEST

+   bool "KUnit test for KUnit"
+   depends on KUNIT
+   help
+ Enables KUnit test to test KUnit.
+


Please add a bit more information on what this config option
does. Why should user care to enable it?


+config KUNIT_EXAMPLE_TEST
+   bool "Example test for KUnit"
+   depends on KUNIT
+   help
+ Enables example KUnit test to demo features of KUnit.
+


Same here.


  endmenu
diff --git a/kunit/Makefile b/kunit/Makefile
index 6ddc622ee6b1c..60a9ea6cb4697 100644
--- a/kunit/Makefile
+++ b/kunit/Makefile
@@ -1,3 +1,7 @@
  obj-$(CONFIG_KUNIT) +=test.o \
string-stream.o \
kunit-stream.o
+
+obj-$(CONFIG_KUNIT_TEST) +=string-stream-test.o
+
+obj-$(CONFIG_KUNIT_EXAMPLE_TEST) +=example-test.o
diff --git a/kunit/example-test.c b/kunit/example-test.c
new file mode 100644
index 0..3947dd7c8f922
--- /dev/null
+++ b/kunit/example-test.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Example KUnit test to show how to use KUnit.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#include 
+
+/*
+ * This is the most fundamental element of KUnit, the test case. A test case
+ * makes a set EXPECTATIONs and ASSERTIONs about the behavior of some code; if
+ * any expectations or assertions are not met, the test fails; otherwise, the
+ * test passes.
+ *
+ * In KUnit, a test case is just a function with the signature
+ * `void (*)(struct kunit *)`. `struct kunit` is a context object that stores
+ * information about the current test.
+ */
+static void example_simple_test(struct kunit *test)
+{
+   /*
+* This is an EXPECTATION; it is how KUnit tests things. When you want
+* to test a piece of code, you set some expectations about what the
+* code should do. KUnit then runs the test and verifies that the code's
+* behavior matched what was expected.
+*/
+   KUNIT_EXPECT_EQ(test, 1 + 1, 2);
+}
+
+/*
+ * This is run once before each test case, see the comment on
+ * example_test_module for more information.
+ */
+static int example_test_init(struct kunit *test)
+{
+   kunit_info(test, "initializing\n");
+
+   return 0;
+}
+
+/*
+ * Here we make a list of all the test cases we want to add to the test module
+ * below.
+ */
+static struct kunit_case example_test_cases[] = {
+   /*
+* This is a helper to create a test case object from a test case
+* function; its exact function is not important to understand how to
+* use KUnit, just know that this is how you associate test cases with a
+* test module.
+*/
+   KUNIT_CASE(example_simple_test),
+   {},
+};
+
+/*
+ * This defines a suite or grouping of tests.
+ *
+ * Test cases are defined as belonging to the suite by adding them to
+ * `kunit_cases`.
+ *
+ * Often it is desirable to run some function which will set up things which
+ * will be used by every test; this is accomplished with an `init` function
+ * which runs before each test case is invoked. Similarly, an `exit` function
+ * may be specified which runs after every test case and can be used to for
+ * cleanup. For clarity, running tests in a test module would behave as 
follows:
+ *
+ * module.init(test);
+ * module.test_case[0](test);
+ * module.exit(test);
+ * module.init(test);
+ * module.test_case[1](test);
+ * module.exit(test);
+ * ...;
+ */
+static struct kunit_module example_test_module = {
+   .name = "example",
+   .init = example_test_init,
+   .test_cases = example_test_cases,
+};
+
+/*
+ * This registers the above test module telling KUnit that this is a suite of
+ * tests that need to be run.
+ */
+module_test(example_test_module);
diff --git a/kunit/string-stream-test.c b/kunit/string-stream-test.c
new file mode 100644
index 0..b2a98576797c9
--- /dev/null
+++ b/kunit/string-stream-test.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for struct string_stream.
+ *
+ * Copyright (C) 2019, Google LLC.
+ * Author: Brendan Higgins 
+ */
+
+#include 
+#include 
+#include 
+
+static void

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 7:04 AM shuah  wrote:
>
> On 5/2/19 4:50 AM, Greg KH wrote:
> > On Wed, May 01, 2019 at 04:01:09PM -0700, Brendan Higgins wrote:
> >> ## TLDR
> >>
> >> I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> >> 5.2.
> >
> > That might be rushing it, normally trees are already closed now for
> > 5.2-rc1 if 5.1-final comes out this Sunday.
> >
> >> Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> >> we would merge through your tree when the time came? Am I remembering
> >> correctly?
> >
> > No objection from me.
> >
>
> Yes. I can take these through kselftest tree when the time comes.

Awesome.

> Agree with Greg that 5.2 might be rushing it. 5.3 would be a good
> target.

Whoops. I guess I should have sent this out a bit earlier. Oh well, as
long as we are on our way!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 4:05 AM Greg KH  wrote:
>
> On Thu, May 02, 2019 at 12:50:53PM +0200, Greg KH wrote:
> > On Wed, May 01, 2019 at 04:01:09PM -0700, Brendan Higgins wrote:
> > > ## TLDR
> > >
> > > I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> > > 5.2.
> >
> > That might be rushing it, normally trees are already closed now for
> > 5.2-rc1 if 5.1-final comes out this Sunday.
> >
> > > Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> > > we would merge through your tree when the time came? Am I remembering
> > > correctly?
> >
> > No objection from me.
> >
> > Let me go review the latest round of patches now.
>
> Overall, looks good to me, and provides a framework we can build on.
> I'm a bit annoyed at the reliance on uml at the moment, but we can work
> on that in the future :)

Eh, I mostly fixed that.

I removed the KUnit framework's reliance on UML i.e. the actual tests
now run on any architecture.

The only UML dependent bit is the KUnit wrapper scripts, which could
be made to work to support other architectures pretty trivially. The
only limitation here is that it would be dependent on the actual
workflow you are using.

In anycase, if you are comfortable reading the results in the kernel
logs, then there is no dependence on UML. (I should probably provide
some documentation on that...)

>
> Thanks for sticking with this, now the real work begins...

I don't doubt it.

>
> Reviewed-by: Greg Kroah-Hartman 

Does this cover all the patches in this set?

Thanks!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section

2019-05-02 Thread Dan Williams

On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin  wrote:
>
> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams  wrote:
> >
> > Up-level the local section size and mask from kernel/memremap.c to
> > global definitions.  These will be used by the new sub-section hotplug
> > support.
> >
> > Cc: Michal Hocko 
> > Cc: Vlastimil Babka 
> > Cc: Jérôme Glisse 
> > Cc: Logan Gunthorpe 
> > Signed-off-by: Dan Williams 
>
> Should be dropped from this series as it has been replaced by a very
> similar patch in the mainline:
>
> 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
>  mm/memremap: Rename and consolidate SECTION_SIZE

I saw that patch fly by and acked it, but I have not seen it picked up
anywhere. I grabbed latest -linus and -next, but don't see that
commit.

$ git show 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
fatal: bad object 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 2:16 PM Frank Rowand  wrote:
>
> On 5/2/19 11:07 AM, Brendan Higgins wrote:
> > On Thu, May 2, 2019 at 4:02 AM Greg KH  wrote:
> >>
> >> On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> >>> From: Felix Guo 
> >>>
> >>> The ultimate goal is to create minimal isolated test binaries; in the
> >>> meantime we are using UML to provide the infrastructure to run tests, so
> >>> define an abstract way to configure and run tests that allow us to
> >>> change the context in which tests are built without affecting the user.
> >>> This also makes pretty and dynamic error reporting, and a lot of other
> >>> nice features easier.
> >>>
> >>> kunit_config.py:
> >>>   - parse .config and Kconfig files.
> >>>
> >>> kunit_kernel.py: provides helper functions to:
> >>>   - configure the kernel using kunitconfig.
> >>>   - build the kernel with the appropriate configuration.
> >>>   - provide function to invoke the kernel and stream the output back.
> >>>
> >>> Signed-off-by: Felix Guo 
> >>> Signed-off-by: Brendan Higgins 
> >>
> >> Ah, here's probably my answer to my previous logging format question,
> >> right?  What's the chance that these wrappers output stuff in a standard
> >> format that test-framework-tools can already parse?  :)

To be clear, the test-framework-tools format we are talking about is
TAP13[1], correct?

My understanding is that is what kselftest is being converted to use.

> >
> > It should be pretty easy to do. I had some patches that pack up the
> > results into a serialized format for a presubmit service; it should be
> > pretty straightforward to take the same logic and just change the
> > output format.
>
> When examining and trying out the previous versions of the patch I found
> the wrappers useful to provide information about how to control and use
> the tests, but I had no interest in using the scripts as they do not
> fit in with my personal environment and workflow.
>
> In the previous versions of the patch, these helper scripts are optional,
> which is good for my use case.  If the helper scripts are required to

They are still optional.

> get the data into the proper format then the scripts are not quite so
> optional, they become the expected environment.  I think the proper
> format should exist without the helper scripts.

That's a good point. A couple things,

First off, supporting TAP13, either in the kernel or the wrapper
script is not hard, but I don't think that is the real issue that you
raise.

If your only concern is that you will always be able to have human
readable KUnit results printed to the kernel log, that is a guarantee
I feel comfortable making. Beyond that, I think it is going to take a
long while before I would feel comfortable guaranteeing anything about
how will KUnit work, what kind of data it will want to expose, and how
it will be organized. I think the wrapper script provides a nice
facade that I can maintain, can mediate between the implementation
details and the user, and can mediate between the implementation
details and other pieces of software that might want to consume
results.

[1] https://testanything.org/tap-version-13-specification.html
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support

2019-05-02 Thread Dan Williams

On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin  wrote:
>
> Hi Dan,
>
> How do you test these patches? Do you have any instructions?

Yes, I briefly mentioned this in the cover letter, but here is the
test I am using:

>
> I see for example that check_hotplug_memory_range() still enforces
> memory_block_size_bytes() alignment.
>
> Also, after removing check_hotplug_memory_range(), I tried to online
> 16M aligned DAX memory, and got the following panic:

Right, this functionality is currently strictly limited to the
devm_memremap_pages() case where there are guarantees that the memory
will never be onlined. This is due to the fact that the section size
is entangled with the memblock api. That said I would have expected
you to trigger the warning in subsection_check() before getting this
far into the hotplug process.
>
> # echo online > /sys/devices/system/memory/memory7/state
> [  202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
> memory_block_action+0x110/0x178
> [  202.193391] Modules linked in:
> [  202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
> 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
> [  202.193909] Hardware name: linux,dummy-virt (DT)
> [  202.194122] pstate: 6005 (nZCv daif -PAN -UAO)
> [  202.194243] pc : memory_block_action+0x110/0x178
> [  202.194404] lr : memory_block_action+0x90/0x178
> [  202.194506] sp : 16763ca0
> [  202.194592] x29: 16763ca0 x28: 80016fd29b80
> [  202.194724] x27:  x26: 
> [  202.194838] x25: 15546000 x24: 001c
> [  202.194949] x23:  x22: 0004
> [  202.195058] x21: 001c x20: 0008
> [  202.195168] x19: 0007 x18: 
> [  202.195281] x17:  x16: 
> [  202.195393] x15:  x14: 
> [  202.195505] x13:  x12: 
> [  202.195614] x11:  x10: 
> [  202.195744] x9 :  x8 : 00018000
> [  202.195858] x7 : 0018 x6 : 15541930
> [  202.195966] x5 : 15541930 x4 : 0001
> [  202.196074] x3 : 0001 x2 : 
> [  202.196185] x1 : 0070 x0 : 
> [  202.196366] Call trace:
> [  202.196455]  memory_block_action+0x110/0x178
> [  202.196589]  memory_subsys_online+0x3c/0x80
> [  202.196681]  device_online+0x6c/0x90
> [  202.196761]  state_store+0x84/0x100
> [  202.196841]  dev_attr_store+0x18/0x28
> [  202.196927]  sysfs_kf_write+0x40/0x58
> [  202.197010]  kernfs_fop_write+0xcc/0x1d8
> [  202.197099]  __vfs_write+0x18/0x40
> [  202.197187]  vfs_write+0xa4/0x1b0
> [  202.197295]  ksys_write+0x64/0xd8
> [  202.197430]  __arm64_sys_write+0x18/0x20
> [  202.197521]  el0_svc_common.constprop.0+0x7c/0xe8
> [  202.197621]  el0_svc_handler+0x28/0x78
> [  202.197706]  el0_svc+0x8/0xc
> [  202.197828] ---[ end trace 57719823dda6d21e ]---
>
> Thank you,
> Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support

2019-05-02 Thread Dan Williams

On Thu, May 2, 2019 at 4:20 PM Dan Williams  wrote:
>
> On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin  
> wrote:
> >
> > Hi Dan,
> >
> > How do you test these patches? Do you have any instructions?
>
> Yes, I briefly mentioned this in the cover letter, but here is the
> test I am using:

Sorry, fumble fingered the 'send' button, here is that link:

https://github.com/pmem/ndctl/blob/subsection-pending/test/sub-section.sh
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support

2019-05-02 Thread Pavel Tatashin

Hi Dan,

How do you test these patches? Do you have any instructions?

I see for example that check_hotplug_memory_range() still enforces
memory_block_size_bytes() alignment.

Also, after removing check_hotplug_memory_range(), I tried to online
16M aligned DAX memory, and got the following panic:

# echo online > /sys/devices/system/memory/memory7/state
[  202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
memory_block_action+0x110/0x178
[  202.193391] Modules linked in:
[  202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
[  202.193909] Hardware name: linux,dummy-virt (DT)
[  202.194122] pstate: 6005 (nZCv daif -PAN -UAO)
[  202.194243] pc : memory_block_action+0x110/0x178
[  202.194404] lr : memory_block_action+0x90/0x178
[  202.194506] sp : 16763ca0
[  202.194592] x29: 16763ca0 x28: 80016fd29b80
[  202.194724] x27:  x26: 
[  202.194838] x25: 15546000 x24: 001c
[  202.194949] x23:  x22: 0004
[  202.195058] x21: 001c x20: 0008
[  202.195168] x19: 0007 x18: 
[  202.195281] x17:  x16: 
[  202.195393] x15:  x14: 
[  202.195505] x13:  x12: 
[  202.195614] x11:  x10: 
[  202.195744] x9 :  x8 : 00018000
[  202.195858] x7 : 0018 x6 : 15541930
[  202.195966] x5 : 15541930 x4 : 0001
[  202.196074] x3 : 0001 x2 : 
[  202.196185] x1 : 0070 x0 : 
[  202.196366] Call trace:
[  202.196455]  memory_block_action+0x110/0x178
[  202.196589]  memory_subsys_online+0x3c/0x80
[  202.196681]  device_online+0x6c/0x90
[  202.196761]  state_store+0x84/0x100
[  202.196841]  dev_attr_store+0x18/0x28
[  202.196927]  sysfs_kf_write+0x40/0x58
[  202.197010]  kernfs_fop_write+0xcc/0x1d8
[  202.197099]  __vfs_write+0x18/0x40
[  202.197187]  vfs_write+0xa4/0x1b0
[  202.197295]  ksys_write+0x64/0xd8
[  202.197430]  __arm64_sys_write+0x18/0x20
[  202.197521]  el0_svc_common.constprop.0+0x7c/0xe8
[  202.197621]  el0_svc_handler+0x28/0x78
[  202.197706]  el0_svc+0x8/0xc
[  202.197828] ---[ end trace 57719823dda6d21e ]---

Thank you,
Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v5 0/3] "Hotremove" persistent memory

2019-05-02 Thread Pavel Tatashin

On Thu, May 2, 2019 at 6:29 PM Verma, Vishal L  wrote:
>
> On Thu, 2019-05-02 at 17:44 -0400, Pavel Tatashin wrote:
>
> > > In running with these patches, and testing the offlining part, I ran
> > > into the following lockdep below.
> > >
> > > This is with just these three patches on top of -rc7.
> >
> > Hi Verma,
> >
> > Thank you for testing. I wonder if there is a command sequence that I
> > could run to reproduce it?
> > Also, could you please send your config and qemu arguments.
> >
> Yes, here is the qemu config:
>
> qemu-system-x86_64
> -machine accel=kvm
> -machine 
> pc-i440fx-2.6,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm
> -cpu Haswell-noTSX
> -m 12G,slots=3,maxmem=44G
> -realtime mlock=off
> -smp 8,sockets=2,cores=4,threads=1
> -numa node,nodeid=0,cpus=0-3,mem=6G
> -numa node,nodeid=1,cpus=4-7,mem=6G
> -numa node,nodeid=2
> -numa node,nodeid=3
> -drive 
> file=/virt/fedora-test.qcow2,format=qcow2,if=none,id=drive-virtio-disk1
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> -object 
> memory-backend-file,id=mem1,share,mem-path=/virt/nvdimm1,size=16G,align=128M
> -device nvdimm,memdev=mem1,id=nv1,label-size=2M,node=2
> -object 
> memory-backend-file,id=mem2,share,mem-path=/virt/nvdimm2,size=16G,align=128M
> -device nvdimm,memdev=mem2,id=nv2,label-size=2M,node=3
> -serial stdio
> -display none
>
> For the command list - I'm using WIP patches to ndctl/daxctl to add the
> command I mentioned earlier. Using this command, I can reproduce the
> lockdep issue. I thought I should be able to reproduce the issue by
> onlining/offlining through sysfs directly too - something like:
>
>node="$(cat /sys/bus/dax/devices/dax0.0/target_node)"
>for mem in /sys/devices/system/node/node"$node"/memory*; do
>  echo "offline" > $mem/state
>done
>
> But with that I can't reproduce the problem.
>
> I'll try to dig a bit deeper into what might be happening, the daxctl
> modifications simply amount to doing the same thing as above in C, so
> I'm not immediately sure what might be happening.
>
> If you're interested, I can post the ndctl patches - maybe as an RFC -
> to test with.

I could apply the patches and test with them. Also, could you please
send your kernel config.

Thank you,
Pasha

>
> Thanks,
> -Vishal
>
>
>
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v5 0/3] "Hotremove" persistent memory

2019-05-02 Thread Verma, Vishal L

On Thu, 2019-05-02 at 17:44 -0400, Pavel Tatashin wrote:

> > In running with these patches, and testing the offlining part, I ran
> > into the following lockdep below.
> > 
> > This is with just these three patches on top of -rc7.
> 
> Hi Verma,
> 
> Thank you for testing. I wonder if there is a command sequence that I
> could run to reproduce it?
> Also, could you please send your config and qemu arguments.
> 
Yes, here is the qemu config:

qemu-system-x86_64
-machine accel=kvm
-machine 
pc-i440fx-2.6,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm
-cpu Haswell-noTSX
-m 12G,slots=3,maxmem=44G
-realtime mlock=off
-smp 8,sockets=2,cores=4,threads=1
-numa node,nodeid=0,cpus=0-3,mem=6G
-numa node,nodeid=1,cpus=4-7,mem=6G
-numa node,nodeid=2
-numa node,nodeid=3
-drive 
file=/virt/fedora-test.qcow2,format=qcow2,if=none,id=drive-virtio-disk1
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
-object 
memory-backend-file,id=mem1,share,mem-path=/virt/nvdimm1,size=16G,align=128M
-device nvdimm,memdev=mem1,id=nv1,label-size=2M,node=2
-object 
memory-backend-file,id=mem2,share,mem-path=/virt/nvdimm2,size=16G,align=128M
-device nvdimm,memdev=mem2,id=nv2,label-size=2M,node=3
-serial stdio
-display none

For the command list - I'm using WIP patches to ndctl/daxctl to add the
command I mentioned earlier. Using this command, I can reproduce the
lockdep issue. I thought I should be able to reproduce the issue by
onlining/offlining through sysfs directly too - something like:

   node="$(cat /sys/bus/dax/devices/dax0.0/target_node)"
   for mem in /sys/devices/system/node/node"$node"/memory*; do
 echo "offline" > $mem/state
   done

But with that I can't reproduce the problem.

I'll try to dig a bit deeper into what might be happening, the daxctl
modifications simply amount to doing the same thing as above in C, so
I'm not immediately sure what might be happening.

If you're interested, I can post the ndctl patches - maybe as an RFC -
to test with.

Thanks,
-Vishal



___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v5 0/3] "Hotremove" persistent memory

2019-05-02 Thread Pavel Tatashin

On Thu, May 2, 2019 at 4:50 PM Verma, Vishal L  wrote:
>
> On Thu, 2019-05-02 at 14:43 -0400, Pavel Tatashin wrote:
> > The series of operations look like this:
> >
> > 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
> >and free ramdisk.
> > 2. Convert raw pmem0 to devdax
> >ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
> > 3. Hotadd to System RAM
> >echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> >echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >echo online_movable > /sys/devices/system/memoryXXX/state
> > 4. Before reboot hotremove device-dax memory from System RAM
> >echo offline > /sys/devices/system/memoryXXX/state
> >echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
> Hi Pavel,
>
> I am working on adding this sort of a workflow into a new daxctl command
> (daxctl-reconfigure-device)- this will allow changing the 'mode' of a
> dax device to kmem, online the resulting memory, and with your patches,
> also attempt to offline the memory, and change back to device-dax.
>
> In running with these patches, and testing the offlining part, I ran
> into the following lockdep below.
>
> This is with just these three patches on top of -rc7.

Hi Verma,

Thank you for testing. I wonder if there is a command sequence that I
could run to reproduce it?
Also, could you please send your config and qemu arguments.

Thank you,
Pasha

>
>
> [  +0.004886] ==
> [  +0.001576] WARNING: possible circular locking dependency detected
> [  +0.001506] 5.1.0-rc7+ #13 Tainted: G   O
> [  +0.000929] --
> [  +0.000708] daxctl/22950 is trying to acquire lock:
> [  +0.000548] f4d397f7 (kn->count#424){}, at: 
> kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000922]
>   but task is already holding lock:
> [  +0.000657] 2aa52a9f (mem_sysfs_mutex){+.+.}, at: 
> unregister_memory_section+0x22/0xa0
> [  +0.000960]
>   which lock already depends on the new lock.
>
> [  +0.001001]
>   the existing dependency chain (in reverse order) is:
> [  +0.000837]
>   -> #3 (mem_sysfs_mutex){+.+.}:
> [  +0.000631]__mutex_lock+0x82/0x9a0
> [  +0.000477]unregister_memory_section+0x22/0xa0
> [  +0.000582]__remove_pages+0xe9/0x520
> [  +0.000489]arch_remove_memory+0x81/0xc0
> [  +0.000510]devm_memremap_pages_release+0x180/0x270
> [  +0.000633]release_nodes+0x234/0x280
> [  +0.000483]device_release_driver_internal+0xf4/0x1d0
> [  +0.000701]bus_remove_device+0xfc/0x170
> [  +0.000529]device_del+0x16a/0x380
> [  +0.000459]unregister_dev_dax+0x23/0x50
> [  +0.000526]release_nodes+0x234/0x280
> [  +0.000487]device_release_driver_internal+0xf4/0x1d0
> [  +0.000646]unbind_store+0x9b/0x130
> [  +0.000467]kernfs_fop_write+0xf0/0x1a0
> [  +0.000510]vfs_write+0xba/0x1c0
> [  +0.000438]ksys_write+0x5a/0xe0
> [  +0.000521]do_syscall_64+0x60/0x210
> [  +0.000489]entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000637]
>   -> #2 (mem_hotplug_lock.rw_sem){}:
> [  +0.000717]get_online_mems+0x3e/0x80
> [  +0.000491]kmem_cache_create_usercopy+0x2e/0x270
> [  +0.000609]kmem_cache_create+0x12/0x20
> [  +0.000507]ptlock_cache_init+0x20/0x28
> [  +0.000506]start_kernel+0x240/0x4d0
> [  +0.000480]secondary_startup_64+0xa4/0xb0
> [  +0.000539]
>   -> #1 (cpu_hotplug_lock.rw_sem){}:
> [  +0.000784]cpus_read_lock+0x3e/0x80
> [  +0.000511]online_pages+0x37/0x310
> [  +0.000469]memory_subsys_online+0x34/0x60
> [  +0.000611]device_online+0x60/0x80
> [  +0.000611]state_store+0x66/0xd0
> [  +0.000552]kernfs_fop_write+0xf0/0x1a0
> [  +0.000649]vfs_write+0xba/0x1c0
> [  +0.000487]ksys_write+0x5a/0xe0
> [  +0.000459]do_syscall_64+0x60/0x210
> [  +0.000482]entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000646]
>   -> #0 (kn->count#424){}:
> [  +0.000669]lock_acquire+0x9e/0x180
> [  +0.000471]__kernfs_remove+0x26a/0x310
> [  +0.000518]kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000583]remove_files.isra.1+0x30/0x70
> [  +0.000555]sysfs_remove_group+0x3d/0x80
> [  +0.000524]sysfs_remove_groups+0x29/0x40
> [  +0.000532]device_remove_attrs+0x42/0x80
> [  +0.000522]device_del+0x162/0x380
> [  +0.000464]device_unregister+0x16/0x60
> [  +0.000505]unregister_memory_section+0x6e/0xa0
> [  +0.000591]__remove_pages+0xe9/0x520
> [  +0.000492]arch_remove_memory+0x81/0xc0
> [  +0.000568]try_remove_memory+0xba/0xd0
> [  +0.000510]remove_memory+0x23/0x40
> [  +0.000483]dev_dax_kmem_remove+0x29/0x57 [kmem]
> [  +0.000608]

Re: [PATCH v6 08/12] mm/sparsemem: Prepare for sub-section ranges

2019-05-02 Thread Pavel Tatashin

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams  wrote:
>
> Prepare the memory hot-{add,remove} paths for handling sub-section
> ranges by plumbing the starting page frame and number of pages being
> handled through arch_{add,remove}_memory() to
> sparse_{add,remove}_one_section().
>
> This is simply plumbing, small cleanups, and some identifier renames. No
> intended functional changes.
>
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Logan Gunthorpe 
> Signed-off-by: Dan Williams 

Reviewed-by: Pavel Tatashin 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 04/17] kunit: test: add kunit_stream a std::stream like logger

2019-05-02 Thread Frank Rowand

On 5/2/19 1:25 PM, Brendan Higgins wrote:
> On Thu, May 2, 2019 at 4:00 AM Greg KH  wrote:
>>
>> On Wed, May 01, 2019 at 04:01:13PM -0700, Brendan Higgins wrote:
>>> A lot of the expectation and assertion infrastructure prints out fairly
>>> complicated test failure messages, so add a C++ style log library for
>>> for logging test results.
>>
>> Ideally we would always use a standard logging format, like the
>> kselftest tests all are aiming to do.  That way the output can be easily
>> parsed by tools to see if the tests succeed/fail easily.
>>
>> Any chance of having this logging framework enforcing that format as
>> well?
> 
> I agree with your comment on the later patch that we should handle
> this at the wrapper script layer (KUnit tool).

This discussion is a little confusing, because it is spread across two
patches.

I do not agree that this should be handled in the wrapper script, as
noted in my reply to patch 12, so not repeating it here.

-Frank
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Frank Rowand

On 5/2/19 11:07 AM, Brendan Higgins wrote:
> On Thu, May 2, 2019 at 4:02 AM Greg KH  wrote:
>>
>> On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
>>> From: Felix Guo 
>>>
>>> The ultimate goal is to create minimal isolated test binaries; in the
>>> meantime we are using UML to provide the infrastructure to run tests, so
>>> define an abstract way to configure and run tests that allow us to
>>> change the context in which tests are built without affecting the user.
>>> This also makes pretty and dynamic error reporting, and a lot of other
>>> nice features easier.
>>>
>>> kunit_config.py:
>>>   - parse .config and Kconfig files.
>>>
>>> kunit_kernel.py: provides helper functions to:
>>>   - configure the kernel using kunitconfig.
>>>   - build the kernel with the appropriate configuration.
>>>   - provide function to invoke the kernel and stream the output back.
>>>
>>> Signed-off-by: Felix Guo 
>>> Signed-off-by: Brendan Higgins 
>>
>> Ah, here's probably my answer to my previous logging format question,
>> right?  What's the chance that these wrappers output stuff in a standard
>> format that test-framework-tools can already parse?  :)
> 
> It should be pretty easy to do. I had some patches that pack up the
> results into a serialized format for a presubmit service; it should be
> pretty straightforward to take the same logic and just change the
> output format.

When examining and trying out the previous versions of the patch I found
the wrappers useful to provide information about how to control and use
the tests, but I had no interest in using the scripts as they do not
fit in with my personal environment and workflow.

In the previous versions of the patch, these helper scripts are optional,
which is good for my use case.  If the helper scripts are required to
get the data into the proper format then the scripts are not quite so
optional, they become the expected environment.  I think the proper
format should exist without the helper scripts.

-Frank
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v5 0/3] "Hotremove" persistent memory

2019-05-02 Thread Verma, Vishal L

On Thu, 2019-05-02 at 14:43 -0400, Pavel Tatashin wrote:
> The series of operations look like this:
> 
> 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
>and free ramdisk.
> 2. Convert raw pmem0 to devdax
>ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
> 3. Hotadd to System RAM
>echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
>echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
>echo online_movable > /sys/devices/system/memoryXXX/state
> 4. Before reboot hotremove device-dax memory from System RAM
>echo offline > /sys/devices/system/memoryXXX/state
>echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind

Hi Pavel,

I am working on adding this sort of a workflow into a new daxctl command
(daxctl-reconfigure-device)- this will allow changing the 'mode' of a
dax device to kmem, online the resulting memory, and with your patches,
also attempt to offline the memory, and change back to device-dax.

In running with these patches, and testing the offlining part, I ran
into the following lockdep below.

This is with just these three patches on top of -rc7.


[  +0.004886] ==
[  +0.001576] WARNING: possible circular locking dependency detected
[  +0.001506] 5.1.0-rc7+ #13 Tainted: G   O 
[  +0.000929] --
[  +0.000708] daxctl/22950 is trying to acquire lock:
[  +0.000548] f4d397f7 (kn->count#424){}, at: 
kernfs_remove_by_name_ns+0x40/0x80
[  +0.000922] 
  but task is already holding lock:
[  +0.000657] 2aa52a9f (mem_sysfs_mutex){+.+.}, at: 
unregister_memory_section+0x22/0xa0
[  +0.000960] 
  which lock already depends on the new lock.

[  +0.001001] 
  the existing dependency chain (in reverse order) is:
[  +0.000837] 
  -> #3 (mem_sysfs_mutex){+.+.}:
[  +0.000631]__mutex_lock+0x82/0x9a0
[  +0.000477]unregister_memory_section+0x22/0xa0
[  +0.000582]__remove_pages+0xe9/0x520
[  +0.000489]arch_remove_memory+0x81/0xc0
[  +0.000510]devm_memremap_pages_release+0x180/0x270
[  +0.000633]release_nodes+0x234/0x280
[  +0.000483]device_release_driver_internal+0xf4/0x1d0
[  +0.000701]bus_remove_device+0xfc/0x170
[  +0.000529]device_del+0x16a/0x380
[  +0.000459]unregister_dev_dax+0x23/0x50
[  +0.000526]release_nodes+0x234/0x280
[  +0.000487]device_release_driver_internal+0xf4/0x1d0
[  +0.000646]unbind_store+0x9b/0x130
[  +0.000467]kernfs_fop_write+0xf0/0x1a0
[  +0.000510]vfs_write+0xba/0x1c0
[  +0.000438]ksys_write+0x5a/0xe0
[  +0.000521]do_syscall_64+0x60/0x210
[  +0.000489]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  +0.000637] 
  -> #2 (mem_hotplug_lock.rw_sem){}:
[  +0.000717]get_online_mems+0x3e/0x80
[  +0.000491]kmem_cache_create_usercopy+0x2e/0x270
[  +0.000609]kmem_cache_create+0x12/0x20
[  +0.000507]ptlock_cache_init+0x20/0x28
[  +0.000506]start_kernel+0x240/0x4d0
[  +0.000480]secondary_startup_64+0xa4/0xb0
[  +0.000539] 
  -> #1 (cpu_hotplug_lock.rw_sem){}:
[  +0.000784]cpus_read_lock+0x3e/0x80
[  +0.000511]online_pages+0x37/0x310
[  +0.000469]memory_subsys_online+0x34/0x60
[  +0.000611]device_online+0x60/0x80
[  +0.000611]state_store+0x66/0xd0
[  +0.000552]kernfs_fop_write+0xf0/0x1a0
[  +0.000649]vfs_write+0xba/0x1c0
[  +0.000487]ksys_write+0x5a/0xe0
[  +0.000459]do_syscall_64+0x60/0x210
[  +0.000482]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  +0.000646] 
  -> #0 (kn->count#424){}:
[  +0.000669]lock_acquire+0x9e/0x180
[  +0.000471]__kernfs_remove+0x26a/0x310
[  +0.000518]kernfs_remove_by_name_ns+0x40/0x80
[  +0.000583]remove_files.isra.1+0x30/0x70
[  +0.000555]sysfs_remove_group+0x3d/0x80
[  +0.000524]sysfs_remove_groups+0x29/0x40
[  +0.000532]device_remove_attrs+0x42/0x80
[  +0.000522]device_del+0x162/0x380
[  +0.000464]device_unregister+0x16/0x60
[  +0.000505]unregister_memory_section+0x6e/0xa0
[  +0.000591]__remove_pages+0xe9/0x520
[  +0.000492]arch_remove_memory+0x81/0xc0
[  +0.000568]try_remove_memory+0xba/0xd0
[  +0.000510]remove_memory+0x23/0x40
[  +0.000483]dev_dax_kmem_remove+0x29/0x57 [kmem]
[  +0.000608]device_release_driver_internal+0xe4/0x1d0
[  +0.000637]unbind_store+0x9b/0x130
[  +0.000464]kernfs_fop_write+0xf0/0x1a0
[  +0.000685]vfs_write+0xba/0x1c0
[  +0.000594]ksys_write+0x5a/0xe0
[  +0.000449]do_syscall_64+0x60/0x210
[  +0.000481]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  +0.000619] 
  other info that might help us debug this:

[  +0.000889] Chain exists of:

Re: [PATCH v2 07/17] kunit: test: add initial tests

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 3:58 AM Greg KH  wrote:
>
> On Wed, May 01, 2019 at 04:01:16PM -0700, Brendan Higgins wrote:
> > Add a test for string stream along with a simpler example.
> >
> > Signed-off-by: Brendan Higgins 
> > ---
> >  kunit/Kconfig  | 12 ++
> >  kunit/Makefile |  4 ++
> >  kunit/example-test.c   | 88 ++
> >  kunit/string-stream-test.c | 61 ++
> >  4 files changed, 165 insertions(+)
> >  create mode 100644 kunit/example-test.c
> >  create mode 100644 kunit/string-stream-test.c
> >
> > diff --git a/kunit/Kconfig b/kunit/Kconfig
> > index 64480092b2c24..5cb500355c873 100644
> > --- a/kunit/Kconfig
> > +++ b/kunit/Kconfig
> > @@ -13,4 +13,16 @@ config KUNIT
> > special hardware. For more information, please see
> > Documentation/kunit/
> >
> > +config KUNIT_TEST
> > + bool "KUnit test for KUnit"
> > + depends on KUNIT
> > + help
> > +   Enables KUnit test to test KUnit.
> > +
> > +config KUNIT_EXAMPLE_TEST
> > + bool "Example test for KUnit"
> > + depends on KUNIT
> > + help
> > +   Enables example KUnit test to demo features of KUnit.
>
> Can't these tests be module?

At this time, no. KUnit doesn't support loading tests as kernel
modules; it is something we could add in in the future, but I would
rather not open that can of worms right now. There are some other
things I would like to do that would probably be easier to do before
adding support for tests as loadable modules.

>
> Or am I mis-reading the previous logic?
>
> Anyway, just a question, nothing objecting to this as-is for now.

Cool

Cheers!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 04/17] kunit: test: add kunit_stream a std::stream like logger

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 4:00 AM Greg KH  wrote:
>
> On Wed, May 01, 2019 at 04:01:13PM -0700, Brendan Higgins wrote:
> > A lot of the expectation and assertion infrastructure prints out fairly
> > complicated test failure messages, so add a C++ style log library for
> > for logging test results.
>
> Ideally we would always use a standard logging format, like the
> kselftest tests all are aiming to do.  That way the output can be easily
> parsed by tools to see if the tests succeed/fail easily.
>
> Any chance of having this logging framework enforcing that format as
> well?

I agree with your comment on the later patch that we should handle
this at the wrapper script layer (KUnit tool).
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()

2019-05-02 Thread Pavel Tatashin

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams  wrote:
>
> Allow sub-section sized ranges to be added to the memmap.
> populate_section_memmap() takes an explict pfn range rather than
> assuming a full section, and those parameters are plumbed all the way
> through to vmmemap_populate(). There should be no sub-section usage in
> current deployments. New warnings are added to clarify which memmap
> allocation paths are sub-section capable.
>
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Logan Gunthorpe 
> Signed-off-by: Dan Williams 

 Reviewed-by: Pavel Tatashin 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal

2019-05-02 Thread Pavel Tatashin

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams  wrote:
>
> Sub-section hotplug support reduces the unit of operation of hotplug
> from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
> (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
> PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
> valid_section(), can toggle.
>
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Logan Gunthorpe 
> Signed-off-by: Dan Williams 
> ---
>  include/linux/mmzone.h |2 ++
>  mm/memory_hotplug.c|   16 
>  2 files changed, 10 insertions(+), 8 deletions(-)

given removing all unused "*ms"

Reviewed-by: Pavel Tatashin 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-02 Thread Pavel Tatashin

As of right now remove_memory() interface is inherently broken. It tries
to remove memory but panics if some memory is not offline. The problem
is that it is impossible to ensure that all memory blocks are offline as
this function also takes lock_device_hotplug that is required to
change memory state via sysfs.

So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.

Make this interface to return an error if memory removal fails. This way
it is safe to call this function without panicking machine, and also
makes it symmetric to add_memory() which already returns an error.

Signed-off-by: Pavel Tatashin 
---
 include/linux/memory_hotplug.h |  8 +++--
 mm/memory_hotplug.c| 61 ++
 2 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 8ade08c50d26..5438a2d92560 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -304,7 +304,7 @@ static inline void pgdat_resize_init(struct pglist_data 
*pgdat) {}
 extern bool is_mem_section_removable(unsigned long pfn, unsigned long 
nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
-extern void remove_memory(int nid, u64 start, u64 size);
+extern int remove_memory(int nid, u64 start, u64 size);
 extern void __remove_memory(int nid, u64 start, u64 size);
 
 #else
@@ -321,7 +321,11 @@ static inline int offline_pages(unsigned long start_pfn, 
unsigned long nr_pages)
return -EINVAL;
 }
 
-static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline bool remove_memory(int nid, u64 start, u64 size)
+{
+   return -EBUSY;
+}
+
 static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8c454e82d4f6..a826aededa1a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1778,9 +1778,10 @@ static int check_memblock_offlined_cb(struct 
memory_block *mem, void *arg)
endpa = PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1))-1;
pr_warn("removing memory fails, because memory [%pa-%pa] is 
onlined\n",
, );
-   }
 
-   return ret;
+   return -EBUSY;
+   }
+   return 0;
 }
 
 static int check_cpu_on_node(pg_data_t *pgdat)
@@ -1843,19 +1844,9 @@ void try_offline_node(int nid)
 }
 EXPORT_SYMBOL(try_offline_node);
 
-/**
- * remove_memory
- * @nid: the node ID
- * @start: physical address of the region to remove
- * @size: size of the region to remove
- *
- * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
- * and online/offline operations before this call, as required by
- * try_offline_node().
- */
-void __ref __remove_memory(int nid, u64 start, u64 size)
+static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
-   int ret;
+   int rc = 0;
 
BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -1863,13 +1854,13 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 
/*
 * All memory blocks must be offlined before removing memory.  Check
-* whether all memory blocks in question are offline and trigger a BUG()
+* whether all memory blocks in question are offline and return error
 * if this is not the case.
 */
-   ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
-   check_memblock_offlined_cb);
-   if (ret)
-   BUG();
+   rc = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
+  check_memblock_offlined_cb);
+   if (rc)
+   goto done;
 
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
@@ -1879,14 +1870,42 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 
try_offline_node(nid);
 
+done:
mem_hotplug_done();
+   return rc;
 }
 
-void remove_memory(int nid, u64 start, u64 size)
+/**
+ * remove_memory
+ * @nid: the node ID
+ * @start: physical address of the region to remove
+ * @size: size of the region to remove
+ *
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations before this call, as required by
+ * try_offline_node().
+ */
+void __remove_memory(int nid, u64 start, u64 size)
 {
+
+   /*
+* trigger BUG() is some memory is not offlined prior to calling this
+* function
+*/
+   if (try_remove_memory(nid, start, size))
+   BUG();
+}
+
+/* Remove memory if every memory block is offline, otherwise return false */
+int remove_memory(int nid, u64 start, u64 size)
+{
+   int rc;
+

Re: [PATCH v2 16/17] kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 11:15 AM  wrote:
>
>
>
> > -Original Message-
> > From: Greg KH
> >
> > On Wed, May 01, 2019 at 04:01:25PM -0700, Brendan Higgins wrote:
> > > From: Iurii Zaikin 
> > >
> > > KUnit tests for initialized data behavior of proc_dointvec that is
> > > explicitly checked in the code. Includes basic parsing tests including
> > > int min/max overflow.
> > >
> > > Signed-off-by: Iurii Zaikin 
> > > Signed-off-by: Brendan Higgins 
> > > ---
> > >  kernel/Makefile  |   2 +
> > >  kernel/sysctl-test.c | 292
> > +++
> > >  lib/Kconfig.debug|   6 +
> > >  3 files changed, 300 insertions(+)
> > >  create mode 100644 kernel/sysctl-test.c
> > >
> > > diff --git a/kernel/Makefile b/kernel/Makefile
> > > index 6c57e78817dad..c81a8976b6a4b 100644
> > > --- a/kernel/Makefile
> > > +++ b/kernel/Makefile
> > > @@ -112,6 +112,8 @@ obj-$(CONFIG_HAS_IOMEM) += iomem.o
> > >  obj-$(CONFIG_ZONE_DEVICE) += memremap.o
> > >  obj-$(CONFIG_RSEQ) += rseq.o
> > >
> > > +obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
> >
> > You are going to have to have a "standard" naming scheme for test
> > modules, are you going to recommend "foo-test" over "test-foo"?  If so,
> > that's fine, we should just be consistant and document it somewhere.
> >
> > Personally, I'd prefer "test-foo", but that's just me, naming is hard...
>
> My preference would be "test-foo" as well.  Just my 2 cents.

I definitely agree we should be consistent. My personal bias
(unsurprisingly) is "foo-test," but this is just because that is the
convention I am used to in other projects I have worked on.

On an unbiased note, we are currently almost evenly split between the
two conventions with *slight* preference for "foo-test": I ran the two
following grep commands on v5.1-rc7:

grep -Hrn --exclude-dir="build" -e "config [a-zA-Z_0-9]\+_TEST$" | wc -l
grep -Hrn --exclude-dir="build" -e "config TEST_[a-zA-Z_0-9]\+" | wc -l

"foo-test" has 36 occurrences.
"test-foo" has 33 occurrences.

The things I am more concerned about is how this would affect file
naming. If we have a unit test for foo.c, I think foo_test.c is more
consistent with our namespacing conventions. The other thing, is if we
already have a Kconfig symbol called FOO_TEST (or TEST_FOO) what
should we name the KUnit test in this case? FOO_UNIT_TEST?
FOO_KUNIT_TEST, like I did above?

Cheers
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[v5 3/3] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Pavel Tatashin

It is now allowed to use persistent memory like a regular RAM, but
currently there is no way to remove this memory until machine is
rebooted.

This work expands the functionality to also allows hotremoving
previously hotplugged persistent memory, and recover the device for use
for other purposes.

To hotremove persistent memory, the management software must first
offline all memory blocks of dax region, and than unbind it from
device-dax/kmem driver. So, operations should look like this:

echo offline > /sys/devices/system/memory/memoryN/state
...
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind

Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.

Signed-off-by: Pavel Tatashin 
Reviewed-by: David Hildenbrand 
---
 drivers/dax/dax-private.h |  2 ++
 drivers/dax/kmem.c| 41 +++
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index a45612148ca0..999aaf3a29b3 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -53,6 +53,7 @@ struct dax_region {
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @ref: pgmap reference count (driver owned)
  * @cmp: @ref final put completion (driver owned)
+ * @dax_mem_res: physical address range of hotadded DAX memory
  */
 struct dev_dax {
struct dax_region *region;
@@ -62,6 +63,7 @@ struct dev_dax {
struct dev_pagemap pgmap;
struct percpu_ref ref;
struct completion cmp;
+   struct resource *dax_kmem_res;
 };
 
 static inline struct dev_dax *to_dev_dax(struct device *dev)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 4c0131857133..3d0a7e702c94 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -71,21 +71,54 @@ int dev_dax_kmem_probe(struct device *dev)
kfree(new_res);
return rc;
}
+   dev_dax->dax_kmem_res = new_res;
 
return 0;
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static int dev_dax_kmem_remove(struct device *dev)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+   struct resource *res = dev_dax->dax_kmem_res;
+   resource_size_t kmem_start = res->start;
+   resource_size_t kmem_size = resource_size(res);
+   int rc;
+
+   /*
+* We have one shot for removing memory, if some memory blocks were not
+* offline prior to calling this function remove_memory() will fail, and
+* there is no way to hotremove this memory until reboot because device
+* unbind will succeed even if we return failure.
+*/
+   rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
+   if (rc) {
+   dev_err(dev,
+   "DAX region %pR cannot be hotremoved until the next 
reboot\n",
+   res);
+   return rc;
+   }
+
+   /* Release and free dax resources */
+   release_resource(res);
+   kfree(res);
+   dev_dax->dax_kmem_res = NULL;
+
+   return 0;
+}
+#else
 static int dev_dax_kmem_remove(struct device *dev)
 {
/*
-* Purposely leak the request_mem_region() for the device-dax
-* range and return '0' to ->remove() attempts. The removal of
-* the device from the driver always succeeds, but the region
-* is permanently pinned as reserved by the unreleased
+* Without hotremove purposely leak the request_mem_region() for the
+* device-dax range and return '0' to ->remove() attempts. The removal
+* of the device from the driver always succeeds, but the region is
+* permanently pinned as reserved by the unreleased
 * request_mem_region().
 */
return 0;
 }
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 static struct dax_device_driver device_dax_kmem_driver = {
.drv = {
-- 
2.21.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[v5 1/3] device-dax: fix memory and resource leak if hotplug fails

2019-05-02 Thread Pavel Tatashin

When add_memory() function fails, the resource and the memory should be
freed.

Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like 
normal RAM")

Signed-off-by: Pavel Tatashin 
Reviewed-by: Dave Hansen 
---
 drivers/dax/kmem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a02318c6d28a..4c0131857133 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -66,8 +66,11 @@ int dev_dax_kmem_probe(struct device *dev)
new_res->name = dev_name(dev);
 
rc = add_memory(numa_node, new_res->start, resource_size(new_res));
-   if (rc)
+   if (rc) {
+   release_resource(new_res);
+   kfree(new_res);
return rc;
+   }
 
return 0;
 }
-- 
2.21.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[v5 0/3] "Hotremove" persistent memory

2019-05-02 Thread Pavel Tatashin

Changelog:
v5
- Addressed comments from Dan Williams: made remove_memory() to return
  an error code, and use this function from dax.

v4
- Addressed comments from Dave Hansen

v3
- Addressed comments from David Hildenbrand. Don't release
  lock_device_hotplug after checking memory status, and rename
  memblock_offlined_cb() to check_memblock_offlined_cb()

v2
- Dan Williams mentioned that drv->remove() return is ignored
  by unbind. Unbind always succeeds. Because we cannot guarantee
  that memory can be offlined from the driver, don't even
  attempt to do so. Simply check that every section is offlined
  beforehand and only then proceed with removing dax memory.

---

Recently, adding a persistent memory to be used like a regular RAM was
added to Linux. This work extends this functionality to also allow hot
removing persistent memory.

We (Microsoft) have an important use case for this functionality.

The requirement is for physical machines with small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there is
a userland state that is expensive to recreate (~2G).

The solution is to boot machines with 2G preserved for persistent
memory.

Copy the state, and hotadd the persistent memory so machine still has
all 8G available for runtime. Before reboot, offline and hotremove
device-dax 2G, copy the memory that is needed to be preserved to pmem0
device, and reboot.

The series of operations look like this:

1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
   and free ramdisk.
2. Convert raw pmem0 to devdax
   ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
   echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
   echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
   echo online_movable > /sys/devices/system/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
   echo offline > /sys/devices/system/memoryXXX/state
   echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
   ndctl create-namespace --mode raw  -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do kexec reboot or reboot through firmware if firmware does not
   zero memory in pmem0 region (These machines have only regular
   volatile memory). So to have pmem0 device either memmap kernel
   parameter is used, or devices nodes in dtb are specified.

Pavel Tatashin (3):
  device-dax: fix memory and resource leak if hotplug fails
  mm/hotplug: make remove_memory() interface useable
  device-dax: "Hotremove" persistent memory that is used like normal RAM

 drivers/dax/dax-private.h  |  2 ++
 drivers/dax/kmem.c | 46 ++---
 include/linux/memory_hotplug.h |  8 +++--
 mm/memory_hotplug.c| 61 ++
 4 files changed, 89 insertions(+), 28 deletions(-)

-- 
2.21.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

RE: [PATCH v2 16/17] kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()

2019-05-02 Thread Tim.Bird




> -Original Message-
> From: Greg KH 
> 
> On Wed, May 01, 2019 at 04:01:25PM -0700, Brendan Higgins wrote:
> > From: Iurii Zaikin 
> >
> > KUnit tests for initialized data behavior of proc_dointvec that is
> > explicitly checked in the code. Includes basic parsing tests including
> > int min/max overflow.
> >
> > Signed-off-by: Iurii Zaikin 
> > Signed-off-by: Brendan Higgins 
> > ---
> >  kernel/Makefile  |   2 +
> >  kernel/sysctl-test.c | 292
> +++
> >  lib/Kconfig.debug|   6 +
> >  3 files changed, 300 insertions(+)
> >  create mode 100644 kernel/sysctl-test.c
> >
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 6c57e78817dad..c81a8976b6a4b 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -112,6 +112,8 @@ obj-$(CONFIG_HAS_IOMEM) += iomem.o
> >  obj-$(CONFIG_ZONE_DEVICE) += memremap.o
> >  obj-$(CONFIG_RSEQ) += rseq.o
> >
> > +obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
> 
> You are going to have to have a "standard" naming scheme for test
> modules, are you going to recommend "foo-test" over "test-foo"?  If so,
> that's fine, we should just be consistant and document it somewhere.
> 
> Personally, I'd prefer "test-foo", but that's just me, naming is hard...

My preference would be "test-foo" as well.  Just my 2 cents.
 -- Tim

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Brendan Higgins

On Thu, May 2, 2019 at 4:02 AM Greg KH  wrote:
>
> On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> > From: Felix Guo 
> >
> > The ultimate goal is to create minimal isolated test binaries; in the
> > meantime we are using UML to provide the infrastructure to run tests, so
> > define an abstract way to configure and run tests that allow us to
> > change the context in which tests are built without affecting the user.
> > This also makes pretty and dynamic error reporting, and a lot of other
> > nice features easier.
> >
> > kunit_config.py:
> >   - parse .config and Kconfig files.
> >
> > kunit_kernel.py: provides helper functions to:
> >   - configure the kernel using kunitconfig.
> >   - build the kernel with the appropriate configuration.
> >   - provide function to invoke the kernel and stream the output back.
> >
> > Signed-off-by: Felix Guo 
> > Signed-off-by: Brendan Higgins 
>
> Ah, here's probably my answer to my previous logging format question,
> right?  What's the chance that these wrappers output stuff in a standard
> format that test-framework-tools can already parse?  :)

It should be pretty easy to do. I had some patches that pack up the
results into a serialized format for a presubmit service; it should be
pretty straightforward to take the same logic and just change the
output format.

Cheers
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Pavel Tatashin

> >device-dax/kmem driver. So, operations should look like this:
> >
> >echo offline > echo offline > /sys/devices/system/memory/memoryN/state

>
> This looks wrong :)
>

Indeed, I  will fix patch log in the next version.

Thank you,
Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Sasha Levin


On Wed, May 01, 2019 at 03:18:46PM -0400, Pavel Tatashin wrote:

It is now allowed to use persistent memory like a regular RAM, but
currently there is no way to remove this memory until machine is
rebooted.

This work expands the functionality to also allows hotremoving
previously hotplugged persistent memory, and recover the device for use
for other purposes.

To hotremove persistent memory, the management software must first
offline all memory blocks of dax region, and than unbind it from
device-dax/kmem driver. So, operations should look like this:

echo offline > echo offline > /sys/devices/system/memory/memoryN/state


This looks wrong :)

--
Thanks,
Sasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Pavel Tatashin

> Currently the kmem driver can be built as a module, and I don't see a
> need to drop that flexibility. What about wrapping these core
> routines:
>
> unlock_device_hotplug
> __remove_memory
> walk_memory_range
> lock_device_hotplug
>
> ...into a common exported (gpl) helper like:
>
> int try_remove_memory(int nid, struct resource *res)
>
> Because as far as I can see there's nothing device-dax specific about
> this "try remove iff offline" functionality outside of looking up the
> related 'struct resource'. The check_devdax_mem_offlined_cb callback
> can be made generic if the callback argument is the resource pointer.

Makes sense, I will do both things that you suggested.

Thank you,
Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot

2019-05-02 Thread Pavel Tatashin

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams  wrote:
>
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.

Please mention that 2M on Intel, and 16M on Arm64.

>
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
>
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Logan Gunthorpe 
> Signed-off-by: Dan Williams 
> ---
>  include/linux/mmzone.h |   29 -
>  mm/page_alloc.c|4 +++-
>  mm/sparse.c|   48 
> 
>  3 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6726fc175b51..cffde898e345 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1175,6 +1175,8 @@ struct mem_section_usage {
> unsigned long pageblock_flags[0];
>  };
>
> +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> +
>  struct page;
>  struct page_ext;
>  struct mem_section {
> @@ -1312,12 +1314,36 @@ static inline struct mem_section 
> *__pfn_to_section(unsigned long pfn)
>
>  extern int __highest_present_section_nr;
>
> +static inline int section_active_index(phys_addr_t phys)
> +{
> +   return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;

How about also defining SECTION_ACTIVE_SHIFT like this:

/* BITS_PER_LONG = 2^6 */
#define BITS_PER_LONG_SHIFT 6
#define SECTION_ACTIVE_SHIFT (SECTION_SIZE_BITS - BITS_PER_LONG_SHIFT)
#define SECTION_ACTIVE_SIZE (1 << SECTION_ACTIVE_SHIFT)

The return above would become:
return (phys & ~(PA_SECTION_MASK)) >> SECTION_ACTIVE_SHIFT;

> +}
> +
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long 
> pfn)
> +{
> +   int idx = section_active_index(PFN_PHYS(pfn));
> +
> +   return !!(ms->usage->map_active & (1UL << idx));
> +}
> +#else
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long 
> pfn)
> +{
> +   return 1;
> +}
> +#endif
> +
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>  static inline int pfn_valid(unsigned long pfn)
>  {
> +   struct mem_section *ms;
> +
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> -   return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +   ms = __nr_to_section(pfn_to_section_nr(pfn));
> +   if (!valid_section(ms))
> +   return 0;
> +   return pfn_section_valid(ms, pfn);
>  }
>  #endif
>
> @@ -1349,6 +1375,7 @@ void sparse_init(void);
>  #define sparse_init()  do {} while (0)
>  #define sparse_index_init(_sec, _nid)  do {} while (0)
>  #define pfn_present pfn_valid
> +#define section_active_init(_pfn, _nr_pages) do {} while (0)
>  #endif /* CONFIG_SPARSEMEM */
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f671401a7c0b..c9ad28a78018 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long 
> *max_zone_pfn)
>
> /* Print out the early node map */
> pr_info("Early memory node ranges\n");
> -   for_each_mem_pfn_range(i, MAX_NUMNODES, _pfn, _pfn, )
> +   for_each_mem_pfn_range(i, MAX_NUMNODES, _pfn, _pfn, ) {
> pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
> (u64)start_pfn << PAGE_SHIFT,
> ((u64)end_pfn << PAGE_SHIFT) - 1);
> +   section_active_init(start_pfn, end_pfn - start_pfn);
> +   }
>
> /* Initialise every node */
> mminit_verify_pageflags_layout();
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f87de7ad32c8..5ef2f884c4e1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -210,6 +210,54 @@ static inline unsigned long 
> first_present_section_nr(void)
> return next_present_section_nr(-1);
>  }
>
> +static unsigned long section_active_mask(unsigned long pfn,
> +   unsigned long nr_pages)
> +{
> +   int idx_start, idx_size;
> +   phys_addr_t start, size;
> +
> +   if (!nr_pages)
> +   return 0;
> +
> +   start = PFN_PHYS(pfn);
> +   size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +   - (pfn & ~PAGE_SECTION_MASK)));
> +   size = ALIGN(size, SECTION_ACTIVE_SIZE);
> +
> +   idx_start = section_active_index(start);
> +   idx_size = section_active_index(size);
> +
> +   if (idx_size == 0)
> +   return -1;
> +   return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +   int end_sec = pfn_to_section_nr(pfn +

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Dan Williams

On Wed, May 1, 2019 at 12:19 PM Pavel Tatashin
 wrote:
>
> It is now allowed to use persistent memory like a regular RAM, but
> currently there is no way to remove this memory until machine is
> rebooted.
>
> This work expands the functionality to also allows hotremoving
> previously hotplugged persistent memory, and recover the device for use
> for other purposes.
>
> To hotremove persistent memory, the management software must first
> offline all memory blocks of dax region, and than unbind it from
> device-dax/kmem driver. So, operations should look like this:
>
> echo offline > echo offline > /sys/devices/system/memory/memoryN/state
> ...
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
> Note: if unbind is done without offlining memory beforehand, it won't be
> possible to do dax0.0 hotremove, and dax's memory is going to be part of
> System RAM until reboot.
>
> Signed-off-by: Pavel Tatashin 
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c| 99 +--
>  2 files changed, 97 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_mem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
> struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
> struct dev_pagemap pgmap;
> struct percpu_ref ref;
> struct completion cmp;
> +   struct resource *dax_kmem_res;
>  };
>
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..72b868066026 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,112 @@ int dev_dax_kmem_probe(struct device *dev)
> kfree(new_res);
> return rc;
> }
> +   dev_dax->dax_kmem_res = new_res;
>
> return 0;
>  }
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static int
> +check_devdax_mem_offlined_cb(struct memory_block *mem, void *arg)
> +{
> +   /* Memory block device */
> +   struct device *mem_dev = >dev;
> +   bool is_offline;
> +
> +   device_lock(mem_dev);
> +   is_offline = mem_dev->offline;
> +   device_unlock(mem_dev);
> +
> +   /*
> +* Check that device-dax's memory_blocks are offline. If a 
> memory_block
> +* is not offline a warning is printed and an error is returned.
> +*/
> +   if (!is_offline) {
> +   /* Dax device device */
> +   struct device *dev = (struct device *)arg;
> +   struct dev_dax *dev_dax = to_dev_dax(dev);
> +   struct resource *res = _dax->region->res;
> +   unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> +   unsigned long epfn = section_nr_to_pfn(mem->end_section_nr) +
> +  PAGES_PER_SECTION - 1;
> +   phys_addr_t spa = spfn << PAGE_SHIFT;
> +   phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> +   dev_err(dev,
> +   "DAX region %pR cannot be hotremoved until the next 
> reboot. Memory block [%pa-%pa] is not offline.\n",
> +   res, , );
> +
> +   return -EBUSY;
> +   }
> +
> +   return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> +   struct dev_dax *dev_dax = to_dev_dax(dev);
> +   struct resource *res = dev_dax->dax_kmem_res;
> +   resource_size_t kmem_start;
> +   resource_size_t kmem_size;
> +   unsigned long start_pfn;
> +   unsigned long end_pfn;
> +   int rc;
> +
> +   kmem_start = res->start;
> +   kmem_size = resource_size(res);
> +   start_pfn = kmem_start >> PAGE_SHIFT;
> +   end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> +   /*
> +* Keep hotplug lock while checking memory state, and also required
> +* during __remove_memory() call. Admin can't change memory state via
> +* sysfs while this lock is kept.
> +*/
> +   lock_device_hotplug();
> +
> +   /*
> +* Walk and check that every singe memory_block of dax region is
> +* offline. Hotremove can succeed only when every memory_block is
> +* offlined beforehand.
> +*/
> +   rc = walk_memory_range(start_pfn, end_pfn, dev,
> +  check_devdax_mem_offlined_cb);
> +
> +   /*
> +* If admin has not offlined memory beforehand, we cannot hotremove 
> dax.
> +* Unfortunately, because unbind will still succeed there is no way 
> for
> +* user to hotremove dax after this.
> +

Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section

2019-05-02 Thread Pavel Tatashin

On Wed, Apr 17, 2019 at 2:52 PM Dan Williams  wrote:
>
> Up-level the local section size and mask from kernel/memremap.c to
> global definitions.  These will be used by the new sub-section hotplug
> support.
>
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Jérôme Glisse 
> Cc: Logan Gunthorpe 
> Signed-off-by: Dan Williams 

Should be dropped from this series as it has been replaced by a very
similar patch in the mainline:

7c697d7fb5cb14ef60e2b687333ba3efb74f73da
 mm/memremap: Rename and consolidate SECTION_SIZE
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread Pavel Tatashin

>
> Memory unplug bits
>
> Reviewed-by: David Hildenbrand 
>

Thank you David.

Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage

2019-05-02 Thread Pavel Tatashin

On Thu, May 2, 2019 at 2:07 AM Dan Williams  wrote:
>
> On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin  
> wrote:
> >
> > On 19-04-17 11:39:00, Dan Williams wrote:
> > > Towards enabling memory hotplug to track partial population of a
> > > section, introduce 'struct mem_section_usage'.
> > >
> > > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > > house a new 'map_active' bitmap.  The new bitmap enables the memory
> > > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > > section.
> > >
> > > The primary motivation for this functionality is to support platforms
> > > that mix "System RAM" and "Persistent Memory" within a single section,
> > > or multiple PMEM ranges with different mapping lifetimes within a single
> > > section. The section restriction for hotplug has caused an ongoing saga
> > > of hacks and bugs for devm_memremap_pages() users.
> > >
> > > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > > from a section, and updates to usemap allocation path, there are no
> > > expected behavior changes.
> > >
> > > Cc: Michal Hocko 
> > > Cc: Vlastimil Babka 
> > > Cc: Logan Gunthorpe 
> > > Signed-off-by: Dan Williams 

Reviewed-by: Pavel Tatashin 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-05-02 Thread David Hildenbrand

On 01.05.19 21:18, Pavel Tatashin wrote:
> It is now allowed to use persistent memory like a regular RAM, but
> currently there is no way to remove this memory until machine is
> rebooted.
> 
> This work expands the functionality to also allows hotremoving
> previously hotplugged persistent memory, and recover the device for use
> for other purposes.
> 
> To hotremove persistent memory, the management software must first
> offline all memory blocks of dax region, and than unbind it from
> device-dax/kmem driver. So, operations should look like this:
> 
> echo offline > echo offline > /sys/devices/system/memory/memoryN/state
> ...
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
> 
> Note: if unbind is done without offlining memory beforehand, it won't be
> possible to do dax0.0 hotremove, and dax's memory is going to be part of
> System RAM until reboot.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c| 99 +--
>  2 files changed, 97 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_mem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
>   struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
>   struct dev_pagemap pgmap;
>   struct percpu_ref ref;
>   struct completion cmp;
> + struct resource *dax_kmem_res;
>  };
>  
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..72b868066026 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,112 @@ int dev_dax_kmem_probe(struct device *dev)
>   kfree(new_res);
>   return rc;
>   }
> + dev_dax->dax_kmem_res = new_res;
>  
>   return 0;
>  }
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static int
> +check_devdax_mem_offlined_cb(struct memory_block *mem, void *arg)
> +{
> + /* Memory block device */
> + struct device *mem_dev = >dev;
> + bool is_offline;
> +
> + device_lock(mem_dev);
> + is_offline = mem_dev->offline;
> + device_unlock(mem_dev);
> +
> + /*
> +  * Check that device-dax's memory_blocks are offline. If a memory_block
> +  * is not offline a warning is printed and an error is returned.
> +  */
> + if (!is_offline) {
> + /* Dax device device */
> + struct device *dev = (struct device *)arg;
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct resource *res = _dax->region->res;
> + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr) +
> +PAGES_PER_SECTION - 1;
> + phys_addr_t spa = spfn << PAGE_SHIFT;
> + phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> + dev_err(dev,
> + "DAX region %pR cannot be hotremoved until the next 
> reboot. Memory block [%pa-%pa] is not offline.\n",
> + res, , );
> +
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct resource *res = dev_dax->dax_kmem_res;
> + resource_size_t kmem_start;
> + resource_size_t kmem_size;
> + unsigned long start_pfn;
> + unsigned long end_pfn;
> + int rc;
> +
> + kmem_start = res->start;
> + kmem_size = resource_size(res);
> + start_pfn = kmem_start >> PAGE_SHIFT;
> + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> + /*
> +  * Keep hotplug lock while checking memory state, and also required
> +  * during __remove_memory() call. Admin can't change memory state via
> +  * sysfs while this lock is kept.
> +  */
> + lock_device_hotplug();
> +
> + /*
> +  * Walk and check that every singe memory_block of dax region is
> +  * offline. Hotremove can succeed only when every memory_block is
> +  * offlined beforehand.
> +  */
> + rc = walk_memory_range(start_pfn, end_pfn, dev,
> +check_devdax_mem_offlined_cb);
> +
> + /*
> +  * If admin has not offlined memory beforehand, we cannot hotremove dax.
> +  * Unfortunately, because unbind will still succeed there is no way for
> +  * user to hotremove dax after this.
> +  */
> + if (rc) {
> + unlock_device_hotplug();
> + return rc;
> + }
> +
> + /* Hotremove memory,

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread shuah


On 5/2/19 4:50 AM, Greg KH wrote:

On Wed, May 01, 2019 at 04:01:09PM -0700, Brendan Higgins wrote:

## TLDR

I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
5.2.


That might be rushing it, normally trees are already closed now for
5.2-rc1 if 5.1-final comes out this Sunday.


Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
we would merge through your tree when the time came? Am I remembering
correctly?


No objection from me.



Yes. I can take these through kselftest tree when the time comes.
Agree with Greg that 5.2 might be rushing it. 5.3 would be a good
target.

thanks,
-- Shuah



___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v7 03/12] mm/sparsemem: Add helpers track active portions of a section at boot

2019-05-02 Thread Dan Williams

On Thu, May 2, 2019 at 12:48 AM Oscar Salvador  wrote:
>
> On Wed, May 01, 2019 at 10:55:37PM -0700, Dan Williams wrote:
> > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> > section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> > map_active bitmask length (64)). If it turns out that 2MB is too large
> > of an active tracking granularity it is trivial to increase the size of
> > the map_active bitmap.
> >
> > The implications of a partially populated section is that pfn_valid()
> > needs to go beyond a valid_section() check and read the sub-section
> > active ranges from the bitmask.
> >
> > Cc: Michal Hocko 
> > Cc: Vlastimil Babka 
> > Cc: Logan Gunthorpe 
> > Tested-by: Jane Chu 
> > Signed-off-by: Dan Williams 
>
> Unfortunately I did not hear back about the comments/questions I made for this
> in the previous version.

Apologies, yes, will incorporate.

>
> > ---
> >  include/linux/mmzone.h |   29 -
> >  mm/page_alloc.c|4 +++-
> >  mm/sparse.c|   48 
> > 
> >  3 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 6726fc175b51..cffde898e345 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1175,6 +1175,8 @@ struct mem_section_usage {
> >   unsigned long pageblock_flags[0];
> >  };
> >
> > +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> > +
> >  struct page;
> >  struct page_ext;
> >  struct mem_section {
> > @@ -1312,12 +1314,36 @@ static inline struct mem_section 
> > *__pfn_to_section(unsigned long pfn)
> >
> >  extern int __highest_present_section_nr;
> >
> > +static inline int section_active_index(phys_addr_t phys)
> > +{
> > + return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
> > +}
> > +
> > +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long 
> > pfn)
> > +{
> > + int idx = section_active_index(PFN_PHYS(pfn));
> > +
> > + return !!(ms->usage->map_active & (1UL << idx));
>
> section_active_mask() also converts the value to address/size.
> Why do we need to convert the values and we cannot work with pfn/pages 
> instead?
> It should be perfectly possible unless I am missing something.
>
> The only thing required would be to export earlier your:
>
> +#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
> +#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
>
> and change section_active_index to:
>
> static inline int section_active_index(unsigned long pfn)
> {
> return (pfn & ~(PAGE_SECTION_MASK)) / SUB_SECTION_ACTIVE_PAGES;
> }
>
> In this way we do need to shift the values every time and we can work with 
> them
> directly.
> Maybe you made it work this way because a reason I am missing.
>
> > +static unsigned long section_active_mask(unsigned long pfn,
> > + unsigned long nr_pages)
> > +{
> > + int idx_start, idx_size;
> > + phys_addr_t start, size;
> > +
> > + if (!nr_pages)
> > + return 0;
> > +
> > + start = PFN_PHYS(pfn);
> > + size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> > + - (pfn & ~PAGE_SECTION_MASK)));
>
> It seems to me that we already picked the lowest value back in
> section_active_init, so we should be fine if we drop the min() here?
>
> Another thing is why do we need to convert the values to address/size, and we
> cannot work with pfns/pages.
> Unless I am missing something it should be possible.

Right, I believe the physical address conversion was a holdover from a
previous version and these helpers can be cleaned up to be pfn based,
good catch.

>
> > + size = ALIGN(size, SECTION_ACTIVE_SIZE);
> > +
> > + idx_start = section_active_index(start);
> > + idx_size = section_active_index(size);
> > +
> > + if (idx_size == 0)
> > + return -1;
>
> Maybe we would be better off converting that -1 into something like 
> "FULL_SECTION",
> or at least dropping a comment there that "-1" means that the section is fully
> populated.

Agreed, I'll add a #define.

Thanks Oscar.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v7 06/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()

2019-05-02 Thread David Hildenbrand

On 02.05.19 07:55, Dan Williams wrote:
> The zone type check was a leftover from the cleanup that plumbed altmap
> through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
> vmem_altmap to arch_remove_memory and __remove_pages".
> 
> Cc: Michal Hocko 
> Cc: Logan Gunthorpe 
> Cc: David Hildenbrand 
> Signed-off-by: Dan Williams 
> ---
>  mm/memory_hotplug.c |7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0d379da0f1a8..108380e20d8f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -544,11 +544,8 @@ void __remove_pages(struct zone *zone, unsigned long 
> phys_start_pfn,
>   unsigned long map_offset = 0;
>   int sections_to_remove;
>  
> - /* In the ZONE_DEVICE case device driver owns the memory region */
> - if (is_dev_zone(zone)) {
> - if (altmap)
> - map_offset = vmem_altmap_offset(altmap);
> - }
> + if (altmap)
> + map_offset = vmem_altmap_offset(altmap);
>  
>   clear_zone_contiguous(zone);
>  
> 


That can be picked up independently

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread Greg KH

On Thu, May 02, 2019 at 12:50:53PM +0200, Greg KH wrote:
> On Wed, May 01, 2019 at 04:01:09PM -0700, Brendan Higgins wrote:
> > ## TLDR
> > 
> > I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> > 5.2.
> 
> That might be rushing it, normally trees are already closed now for
> 5.2-rc1 if 5.1-final comes out this Sunday.
> 
> > Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> > we would merge through your tree when the time came? Am I remembering
> > correctly?
> 
> No objection from me.
> 
> Let me go review the latest round of patches now.

Overall, looks good to me, and provides a framework we can build on.
I'm a bit annoyed at the reliance on uml at the moment, but we can work
on that in the future :)

Thanks for sticking with this, now the real work begins...

Reviewed-by: Greg Kroah-Hartman 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 16/17] kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()

2019-05-02 Thread Greg KH

On Wed, May 01, 2019 at 04:01:25PM -0700, Brendan Higgins wrote:
> From: Iurii Zaikin 
> 
> KUnit tests for initialized data behavior of proc_dointvec that is
> explicitly checked in the code. Includes basic parsing tests including
> int min/max overflow.
> 
> Signed-off-by: Iurii Zaikin 
> Signed-off-by: Brendan Higgins 
> ---
>  kernel/Makefile  |   2 +
>  kernel/sysctl-test.c | 292 +++
>  lib/Kconfig.debug|   6 +
>  3 files changed, 300 insertions(+)
>  create mode 100644 kernel/sysctl-test.c
> 
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 6c57e78817dad..c81a8976b6a4b 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -112,6 +112,8 @@ obj-$(CONFIG_HAS_IOMEM) += iomem.o
>  obj-$(CONFIG_ZONE_DEVICE) += memremap.o
>  obj-$(CONFIG_RSEQ) += rseq.o
>  
> +obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o

You are going to have to have a "standard" naming scheme for test
modules, are you going to recommend "foo-test" over "test-foo"?  If so,
that's fine, we should just be consistant and document it somewhere.

Personally, I'd prefer "test-foo", but that's just me, naming is hard...

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-02 Thread Greg KH

On Wed, May 01, 2019 at 04:01:21PM -0700, Brendan Higgins wrote:
> From: Felix Guo 
> 
> The ultimate goal is to create minimal isolated test binaries; in the
> meantime we are using UML to provide the infrastructure to run tests, so
> define an abstract way to configure and run tests that allow us to
> change the context in which tests are built without affecting the user.
> This also makes pretty and dynamic error reporting, and a lot of other
> nice features easier.
> 
> kunit_config.py:
>   - parse .config and Kconfig files.
> 
> kunit_kernel.py: provides helper functions to:
>   - configure the kernel using kunitconfig.
>   - build the kernel with the appropriate configuration.
>   - provide function to invoke the kernel and stream the output back.
> 
> Signed-off-by: Felix Guo 
> Signed-off-by: Brendan Higgins 

Ah, here's probably my answer to my previous logging format question,
right?  What's the chance that these wrappers output stuff in a standard
format that test-framework-tools can already parse?  :)

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 04/17] kunit: test: add kunit_stream a std::stream like logger

2019-05-02 Thread Greg KH

On Wed, May 01, 2019 at 04:01:13PM -0700, Brendan Higgins wrote:
> A lot of the expectation and assertion infrastructure prints out fairly
> complicated test failure messages, so add a C++ style log library for
> for logging test results.

Ideally we would always use a standard logging format, like the
kselftest tests all are aiming to do.  That way the output can be easily
parsed by tools to see if the tests succeed/fail easily.

Any chance of having this logging framework enforcing that format as
well?

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 07/17] kunit: test: add initial tests

2019-05-02 Thread Greg KH

On Wed, May 01, 2019 at 04:01:16PM -0700, Brendan Higgins wrote:
> Add a test for string stream along with a simpler example.
> 
> Signed-off-by: Brendan Higgins 
> ---
>  kunit/Kconfig  | 12 ++
>  kunit/Makefile |  4 ++
>  kunit/example-test.c   | 88 ++
>  kunit/string-stream-test.c | 61 ++
>  4 files changed, 165 insertions(+)
>  create mode 100644 kunit/example-test.c
>  create mode 100644 kunit/string-stream-test.c
> 
> diff --git a/kunit/Kconfig b/kunit/Kconfig
> index 64480092b2c24..5cb500355c873 100644
> --- a/kunit/Kconfig
> +++ b/kunit/Kconfig
> @@ -13,4 +13,16 @@ config KUNIT
> special hardware. For more information, please see
> Documentation/kunit/
>  
> +config KUNIT_TEST
> + bool "KUnit test for KUnit"
> + depends on KUNIT
> + help
> +   Enables KUnit test to test KUnit.
> +
> +config KUNIT_EXAMPLE_TEST
> + bool "Example test for KUnit"
> + depends on KUNIT
> + help
> +   Enables example KUnit test to demo features of KUnit.

Can't these tests be module?

Or am I mis-reading the previous logic?

Anyway, just a question, nothing objecting to this as-is for now.

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-02 Thread Greg KH

On Wed, May 01, 2019 at 04:01:09PM -0700, Brendan Higgins wrote:
> ## TLDR
> 
> I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> 5.2.

That might be rushing it, normally trees are already closed now for
5.2-rc1 if 5.1-final comes out this Sunday.

> Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> we would merge through your tree when the time came? Am I remembering
> correctly?

No objection from me.

Let me go review the latest round of patches now.

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[bug report] acpi/nfit: Add support for Intel DSM 1.8 commands

2019-05-02 Thread Dan Carpenter

Hello Dan Williams,

Commit 11189c1089da ("acpi/nfit: Fix command-supported detection") from
Jan 19, 2019 leads to the following static checker warning:

drivers/acpi/nfit/core.c:503 acpi_nfit_ctl()
error: passing untrusted data 'func' to 'variable_test_bit()'

Related but not critical:

drivers/acpi/nfit/core.c:3510 acpi_nfit_clear_to_send()
error: undefined (user controlled) shift '1 << func'

drivers/nvdimm/bus.c
  1062  buf = vmalloc(buf_len);
  1063  if (!buf)
  1064  return -ENOMEM;
  1065  
  1066  if (copy_from_user(buf, p, buf_len)) {
  1067  rc = -EFAULT;
  1068  goto out;
  1069  }
  1070  
  1071  nvdimm_bus_lock(_bus->dev);
  1072  rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
  1073  if (rc)
  1074  goto out_unlock;
  1075  
  1076  rc = nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, 
_rc);
  1077  if (rc < 0)
  1078  goto out_unlock;

This is __nd_ioctl().  We get "buf" from the user and then pass it to
acpi_nfit_clear_to_send() and then acpi_nfit_ctl().

drivers/acpi/nfit/core.c
   446  int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm 
*nvdimm,
   447  unsigned int cmd, void *buf, unsigned int buf_len, int 
*cmd_rc)
   448  {
   449  struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
   450  struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
   451  union acpi_object in_obj, in_buf, *out_obj;
   452  const struct nd_cmd_desc *desc = NULL;
   453  struct device *dev = acpi_desc->dev;
   454  struct nd_cmd_pkg *call_pkg = NULL;
   455  const char *cmd_name, *dimm_name;
   456  unsigned long cmd_mask, dsm_mask;
   457  u32 offset, fw_status = 0;
   458  acpi_handle handle;
   459  const guid_t *guid;
   460  int func, rc, i;
   461  
   462  if (cmd_rc)
   463  *cmd_rc = -EINVAL;
   464  
   465  if (cmd == ND_CMD_CALL)
   466  call_pkg = buf;
^^

   467  func = cmd_to_func(nfit_mem, cmd, call_pkg);
^^^

We set func = call_pkg->nd_command (0-s32max).


   468  if (func < 0)
   469  return func;
   470  
   471  if (nvdimm) {
   472  struct acpi_device *adev = nfit_mem->adev;
   473  
   474  if (!adev)
   475  return -ENOTTY;
   476  
   477  dimm_name = nvdimm_name(nvdimm);
   478  cmd_name = nvdimm_cmd_name(cmd);
   479  cmd_mask = nvdimm_cmd_mask(nvdimm);
   480  dsm_mask = nfit_mem->dsm_mask;
   481  desc = nd_cmd_dimm_desc(cmd);
   482  guid = to_nfit_uuid(nfit_mem->family);
   483  handle = adev->handle;
   484  } else {
   485  struct acpi_device *adev = to_acpi_dev(acpi_desc);
   486  
   487  cmd_name = nvdimm_bus_cmd_name(cmd);
   488  cmd_mask = nd_desc->cmd_mask;
   489  dsm_mask = nd_desc->bus_dsm_mask;
   490  desc = nd_cmd_bus_desc(cmd);
   491  guid = to_nfit_uuid(NFIT_DEV_BUS);
   492  handle = adev->handle;
   493  dimm_name = "bus";
   494  }
   495  
   496  if (!desc || (cmd && (desc->out_num + desc->in_num == 0)))
   497  return -ENOTTY;
   498  
   499  /*
   500   * Check for a valid command.  For ND_CMD_CALL, we also have to
   501   * make sure that the DSM function is supported.
   502   */
   503  if (cmd == ND_CMD_CALL && !test_bit(func, _mask))


"func" might be beyond the end of the bitmap so it could be an out of
bounds read.  We do bounds check it to <= 31 in nfit_dsm_revid() but I
couldn't see any range checking in acpi_evaluate_dsm().

   504  return -ENOTTY;
   505  else if (!test_bit(cmd, _mask))
   506  return -ENOTTY;
   507  
   508  in_obj.type = ACPI_TYPE_PACKAGE;
   509  in_obj.package.count = 1;
   510  in_obj.package.elements = _buf;
   511  in_buf.type = ACPI_TYPE_BUFFER;
   512  in_buf.buffer.pointer = buf;
   513  in_buf.buffer.length = 0;
   514  
   515  /* libnvdimm has already validated the input envelope */
   516  for (i = 0; i < desc->in_num; i++)
   517  in_buf.buffer.length += nd_cmd_in_size(nvdimm, cmd, 
desc,
   518  i, buf);
   519  
   520  if (call_pkg) {
   521  /* skip over package wrapper */
   522

Re: [PATCH v7 03/12] mm/sparsemem: Add helpers track active portions of a section at boot

2019-05-02 Thread Oscar Salvador

On Wed, May 01, 2019 at 10:55:37PM -0700, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.
> 
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
> 
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Logan Gunthorpe 
> Tested-by: Jane Chu 
> Signed-off-by: Dan Williams 

Unfortunately I did not hear back about the comments/questions I made for this
in the previous version.

> ---
>  include/linux/mmzone.h |   29 -
>  mm/page_alloc.c|4 +++-
>  mm/sparse.c|   48 
> 
>  3 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6726fc175b51..cffde898e345 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1175,6 +1175,8 @@ struct mem_section_usage {
>   unsigned long pageblock_flags[0];
>  };
>  
> +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> +
>  struct page;
>  struct page_ext;
>  struct mem_section {
> @@ -1312,12 +1314,36 @@ static inline struct mem_section 
> *__pfn_to_section(unsigned long pfn)
>  
>  extern int __highest_present_section_nr;
>  
> +static inline int section_active_index(phys_addr_t phys)
> +{
> + return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
> +}
> +
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long 
> pfn)
> +{
> + int idx = section_active_index(PFN_PHYS(pfn));
> +
> + return !!(ms->usage->map_active & (1UL << idx));

section_active_mask() also converts the value to address/size.
Why do we need to convert the values and we cannot work with pfn/pages instead?
It should be perfectly possible unless I am missing something.

The only thing required would be to export earlier your:

+#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
+#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))

and change section_active_index to:

static inline int section_active_index(unsigned long pfn)
{
return (pfn & ~(PAGE_SECTION_MASK)) / SUB_SECTION_ACTIVE_PAGES;
}

In this way we do need to shift the values every time and we can work with them
directly.
Maybe you made it work this way because a reason I am missing.

> +static unsigned long section_active_mask(unsigned long pfn,
> + unsigned long nr_pages)
> +{
> + int idx_start, idx_size;
> + phys_addr_t start, size;
> +
> + if (!nr_pages)
> + return 0;
> +
> + start = PFN_PHYS(pfn);
> + size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> + - (pfn & ~PAGE_SECTION_MASK)));

It seems to me that we already picked the lowest value back in
section_active_init, so we should be fine if we drop the min() here?

Another thing is why do we need to convert the values to address/size, and we
cannot work with pfns/pages.
Unless I am missing something it should be possible.

> + size = ALIGN(size, SECTION_ACTIVE_SIZE);
> +
> + idx_start = section_active_index(start);
> + idx_size = section_active_index(size);
> +
> + if (idx_size == 0)
> + return -1;

Maybe we would be better off converting that -1 into something like 
"FULL_SECTION",
or at least dropping a comment there that "-1" means that the section is fully
populated.

> + return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> + int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> + int i, start_sec = pfn_to_section_nr(pfn);
> +
> + if (!nr_pages)
> + return;
> +
> + for (i = start_sec; i <= end_sec; i++) {
> + struct mem_section *ms;
> + unsigned long mask;
> + unsigned long pfns;
> +
> + pfns = min(nr_pages, PAGES_PER_SECTION
> + - (pfn & ~PAGE_SECTION_MASK));
> + mask = section_active_mask(pfn, pfns);
> +
> + ms = __nr_to_section(i);
> + ms->usage->map_active |= mask;
> + pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, 
> ms->usage->map_active);
> +
> + pfn += pfns;
> + nr_pages -= pfns;
> + }
> +}
> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> 

-- 
Oscar Salvador
SUSE L3
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 07/12] mm: Kill is_dev_zone() helper

2019-05-02 Thread Dan Williams

Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko 
Cc: Logan Gunthorpe 
Acked-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |   12 
 mm/page_alloc.c|2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b13f0cddf75e..3237c5e456df 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return 
node_id; };
  */
 #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-   return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-   return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a68735c79609..be309d6a79de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5864,7 +5864,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
unsigned long start = jiffies;
int nid = pgdat->node_id;
 
-   if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+   if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
return;
 
/*

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment

2019-05-02 Thread Dan Williams

Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.

Reported-by: Jeff Moyer 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pfn.h  |   11 ++-
 drivers/nvdimm/pfn_devs.c |   75 +++--
 include/linux/mmzone.h|4 ++
 3 files changed, 19 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index e901e3a3b04c..ae589cc528f2 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -41,18 +41,13 @@ struct nd_pfn_sb {
__le64 checksum;
 };
 
-#ifdef CONFIG_SPARSEMEM
-#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x)
-#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x)
-#else
 /*
  * In this case ZONE_DEVICE=n and we will disable 'pfn' device support,
  * but we still want pmem to compile.
  */
-#define PFN_SECTION_ALIGN_DOWN(x) (x)
-#define PFN_SECTION_ALIGN_UP(x) (x)
+#ifndef SUB_SECTION_ALIGN_DOWN
+#define SUB_SECTION_ALIGN_DOWN(x) (x)
+#define SUB_SECTION_ALIGN_UP(x) (x)
 #endif
 
-#define PHYS_SECTION_ALIGN_DOWN(x) 
PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x)))
-#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x)))
 #endif /* __NVDIMM_PFN_H */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index a2406253eb70..7bdaaf3dc77e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -595,14 +595,14 @@ static u32 info_block_reserve(void)
 }
 
 /*
- * We hotplug memory at section granularity, pad the reserved area from
- * the previous section base to the namespace base address.
+ * We hotplug memory at sub-section granularity, pad the reserved area
+ * from the previous section base to the namespace base address.
  */
 static unsigned long init_altmap_base(resource_size_t base)
 {
unsigned long base_pfn = PHYS_PFN(base);
 
-   return PFN_SECTION_ALIGN_DOWN(base_pfn);
+   return SUB_SECTION_ALIGN_DOWN(base_pfn);
 }
 
 static unsigned long init_altmap_reserve(resource_size_t base)
@@ -610,7 +610,7 @@ static unsigned long init_altmap_reserve(resource_size_t 
base)
unsigned long reserve = info_block_reserve() >> PAGE_SHIFT;
unsigned long base_pfn = PHYS_PFN(base);
 
-   reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn);
+   reserve += base_pfn - SUB_SECTION_ALIGN_DOWN(base_pfn);
return reserve;
 }
 
@@ -641,8 +641,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct 
dev_pagemap *pgmap)
nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
pgmap->altmap_valid = false;
} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-   nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
-   - offset) / PAGE_SIZE);
+   nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
dev_info(_pfn->dev,
"number of pfns truncated from %lld to 
%ld\n",
@@ -658,50 +657,10 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, 
struct dev_pagemap *pgmap)
return 0;
 }
 
-static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
-{
-   return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
-   ALIGN_DOWN(phys, nd_pfn->align));
-}
-
-/*
- * Check if pmem collides with 'System RAM', or other regions when
- * section aligned.  Trim it accordingly.
- */
-static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 
*end_trunc)
-{
-   struct nd_namespace_common *ndns = nd_pfn->ndns;
-   struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
-   struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent);
-   const resource_size_t start = nsio->res.start;
-   const resource_size_t end = start + resource_size(>res);
-   resource_size_t adjust, size;
-
-   *start_pad = 0;
-   *end_trunc = 0;
-
-   adjust = start - PHYS_SECTION_ALIGN_DOWN(start);
-   size = resource_size(>res) + adjust;
-   if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM,
-   IORES_DESC_NONE) == REGION_MIXED
-   || nd_region_conflict(nd_region, start - adjust, size))
-   *start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
-
-   /* Now check that end of the range does not collide. */
-   adjust = PHYS_SECTION_ALIGN_UP(end) - end;
-   size = resource_size(>res) + adjust;
-   if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
-   IORES_DESC_NONE) == REGION_MIXED
-   || !IS_ALIGNED(end, nd_pfn->align)
-   || nd_region_conflict(nd_region, start, size))
-   *end_trunc = end - phys_pmem_align_down(nd_pfn, end);
-}
-

[PATCH v7 10/12] mm/devm_memremap_pages: Enable sub-section remap

2019-05-02 Thread Dan Williams

Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section (2MB
on x86).

Cc: Michal Hocko 
Cc: Toshi Kani 
Cc: Jérôme Glisse 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |   61 +
 1 file changed, 24 insertions(+), 37 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index f355586ea54a..425904858d97 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -59,7 +59,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
struct vmem_altmap *altmap = >altmap;
unsigned long pfn;
 
-   pfn = res->start >> PAGE_SHIFT;
+   pfn = PHYS_PFN(res->start);
if (pgmap->altmap_valid)
pfn += vmem_altmap_offset(altmap);
return pfn;
@@ -87,7 +87,6 @@ static void devm_memremap_pages_release(void *data)
struct dev_pagemap *pgmap = data;
struct device *dev = pgmap->dev;
struct resource *res = >res;
-   resource_size_t align_start, align_size;
unsigned long pfn;
int nid;
 
@@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data)
put_page(pfn_to_page(pfn));
 
/* pages are dead and unused, undo the arch mapping */
-   align_start = res->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-   - align_start;
-
-   nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+   nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
 
mem_hotplug_begin();
if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-   pfn = align_start >> PAGE_SHIFT;
+   pfn = PHYS_PFN(res->start);
__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-   align_size >> PAGE_SHIFT, NULL);
+   PHYS_PFN(resource_size(res)), NULL);
} else {
-   arch_remove_memory(nid, align_start, align_size,
+   arch_remove_memory(nid, res->start, resource_size(res),
pgmap->altmap_valid ? >altmap : NULL);
-   kasan_remove_zero_shadow(__va(align_start), align_size);
+   kasan_remove_zero_shadow(__va(res->start), resource_size(res));
}
mem_hotplug_done();
 
-   untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+   untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
pgmap_array_delete(res);
dev_WARN_ONCE(dev, pgmap->altmap.alloc,
  "%s: failed to free all reserved pages\n", __func__);
@@ -141,16 +136,13 @@ static void devm_memremap_pages_release(void *data)
  */
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
-   resource_size_t align_start, align_size, align_end;
-   struct vmem_altmap *altmap = pgmap->altmap_valid ?
-   >altmap : NULL;
struct resource *res = >res;
struct dev_pagemap *conflict_pgmap;
struct mhp_restrictions restrictions = {
/*
 * We do not want any optional features only our own memmap
*/
-   .altmap = altmap,
+   .altmap = pgmap->altmap_valid ? >altmap : NULL,
};
pgprot_t pgprot = PAGE_KERNEL;
int error, nid, is_ram;
@@ -158,26 +150,21 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
if (!pgmap->ref || !pgmap->kill)
return ERR_PTR(-EINVAL);
 
-   align_start = res->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-   - align_start;
-   align_end = align_start + align_size - 1;
-
-   conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
+   conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL);
if (conflict_pgmap) {
dev_WARN(dev, "Conflicting mapping in same section\n");
put_dev_pagemap(conflict_pgmap);
return ERR_PTR(-ENOMEM);
}
 
-   conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
+   conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL);
if (conflict_pgmap) {
dev_WARN(dev, "Conflicting mapping in same section\n");
put_dev_pagemap(conflict_pgmap);
return ERR_PTR(-ENOMEM);
}
 
-   is_ram = region_intersects(align_start, align_size,
+   is_ram = region_intersects(res->start, resource_size(res),
IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
if (is_ram != REGION_DISJOINT) {
@@ -198,8 +185,8 @@ void

[PATCH v7 08/12] mm/sparsemem: Prepare for sub-section ranges

2019-05-02 Thread Dan Williams

Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().

This is simply plumbing, small cleanups, and some identifier renames. No
intended functional changes.

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 include/linux/memory_hotplug.h |7 +-
 mm/memory_hotplug.c|  118 +---
 mm/sparse.c|7 +-
 3 files changed, 83 insertions(+), 49 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..835a94650ee3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -354,9 +354,10 @@ extern int add_memory_resource(int nid, struct resource 
*resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern int sparse_add_one_section(int nid, unsigned long start_pfn,
- struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms,
+extern int sparse_add_section(int nid, unsigned long pfn,
+   unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void sparse_remove_section(struct zone *zone, struct mem_section *ms,
+   unsigned long pfn, unsigned long nr_pages,
unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 108380e20d8f..9f73332af910 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,22 +251,44 @@ void __init register_page_bootmem_info_node(struct 
pglist_data *pgdat)
 }
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
-static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-   struct vmem_altmap *altmap, bool want_memblock)
+static int __meminit __add_section(int nid, unsigned long pfn,
+   unsigned long nr_pages, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
int ret;
 
-   if (pfn_valid(phys_start_pfn))
+   if (pfn_valid(pfn))
return -EEXIST;
 
-   ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
+   ret = sparse_add_section(nid, pfn, nr_pages, altmap);
if (ret < 0)
return ret;
 
if (!want_memblock)
return 0;
 
-   return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+   return hotplug_memory_register(nid, __pfn_to_section(pfn));
+}
+
+static int subsection_check(unsigned long pfn, unsigned long nr_pages,
+   unsigned long flags, const char *reason)
+{
+   /*
+* Only allow partial section hotplug for !memblock ranges,
+* since register_new_memory() requires section alignment, and
+* CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
+* populated.
+*/
+   if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+   || (flags & MHP_MEMBLOCK_API))
+   && ((pfn & ~PAGE_SECTION_MASK)
+   || (nr_pages & ~PAGE_SECTION_MASK))) {
+   WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
+   (flags & MHP_MEMBLOCK_API)
+   ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
+   return -EINVAL;
+   }
+   return 0;
 }
 
 /*
@@ -275,34 +297,40 @@ static int __meminit __add_section(int nid, unsigned long 
phys_start_pfn,
  * call this function after deciding the zone to which to
  * add the new pages.
  */
-int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-   unsigned long nr_pages, struct mhp_restrictions *restrictions)
+int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
+   struct mhp_restrictions *restrictions)
 {
unsigned long i;
-   int err = 0;
-   int start_sec, end_sec;
+   int start_sec, end_sec, err;
struct vmem_altmap *altmap = restrictions->altmap;
 
-   /* during initialize mem_map, align hot-added range to section */
-   start_sec = pfn_to_section_nr(phys_start_pfn);
-   end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
-
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
 */
-   if (altmap->base_pfn != phys_start_pfn
+   if (altmap->base_pfn != pfn
|| vmem_altmap_offset(altmap) > nr_pages) {

[PATCH v7 06/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()

2019-05-02 Thread Dan Williams

The zone type check was a leftover from the cleanup that plumbed altmap
through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
vmem_altmap to arch_remove_memory and __remove_pages".

Cc: Michal Hocko 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Signed-off-by: Dan Williams 
---
 mm/memory_hotplug.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d379da0f1a8..108380e20d8f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -544,11 +544,8 @@ void __remove_pages(struct zone *zone, unsigned long 
phys_start_pfn,
unsigned long map_offset = 0;
int sections_to_remove;
 
-   /* In the ZONE_DEVICE case device driver owns the memory region */
-   if (is_dev_zone(zone)) {
-   if (altmap)
-   map_offset = vmem_altmap_offset(altmap);
-   }
+   if (altmap)
+   map_offset = vmem_altmap_offset(altmap);
 
clear_zone_contiguous(zone);
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields

2019-05-02 Thread Dan Williams

At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like, 'flags' and the
'padding' it potentially means that future implementations can not rely
on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate it
is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
corruption is expected to benign since all other critical fields are
explicitly initialized.

Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Cc: 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/dax_devs.c |2 +-
 drivers/nvdimm/pfn.h  |1 +
 drivers/nvdimm/pfn_devs.c |   18 +++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 0453f49dc708..326f02ffca81 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct 
nd_namespace_common *ndns)
nvdimm_bus_unlock(>dev);
if (!dax_dev)
return -ENOMEM;
-   pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..e901e3a3b04c 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+   /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..a2406253eb70 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn 
*nd_pfn)
return 0;
 }
 
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, struct 
nd_namespace_common *ndns)
nvdimm_bus_unlock(>dev);
if (!pfn_dev)
return -ENOMEM;
-   pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
u64 checksum;
int rc;
 
-   pfn_sb = devm_kzalloc(_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+   pfn_sb = devm_kmalloc(_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
 
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
 
/* no info block, do init */;
+   memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(>dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
-   pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 09/12] mm/sparsemem: Support sub-section hotplug

2019-05-02 Thread Dan Williams

The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity. For example the following backtrace
is emitted when attempting arch_add_memory() with physical address
ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
within a given section:

 WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 
devm_memremap_pages+0x3b5/0x4c0
 devm_memremap_pages attempted on mixed region [mem 0x2-0x2fbff 
flags 0x200]
 [..]
 Call Trace:
   dump_stack+0x86/0xc3
   __warn+0xcb/0xf0
   warn_slowpath_fmt+0x5f/0x80
   devm_memremap_pages+0x3b5/0x4c0
   __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
   pmem_attach_disk+0x19a/0x440 [nd_pmem]

Recently it was discovered that the problem goes beyond RAM vs PMEM
collisions as some platform produce PMEM vs PMEM collisions within a
given section. The libnvdimm workaround for that case revealed that the
libnvdimm section-alignment-padding implementation has been broken for a
long while. A fix for that long-standing breakage introduces as many
problems as it solves as it would require a backward-incompatible change
to the namespace metadata interpretation. Instead of that dubious route
[1], address the root problem in the memory-hotplug implementation.

[1]: 
https://lore.kernel.org/r/155000671719.348031.2347363160141119237.st...@dwillia2-desk3.amr.corp.intel.com
Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 mm/sparse.c |  223 ---
 1 file changed, 150 insertions(+), 73 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 198371e5fc87..419a3620af6e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long 
section_nr, int nid)
unsigned long root = SECTION_NR_TO_ROOT(section_nr);
struct mem_section *section;
 
+   /*
+* An existing section is possible in the sub-section hotplug
+* case. First hot-add instantiates, follow-on hot-add reuses
+* the existing section.
+*
+* The mem_hotplug_lock resolves the apparent race below.
+*/
if (mem_section[root])
-   return -EEXIST;
+   return 0;
 
section = sparse_index_alloc(nid);
if (!section)
@@ -338,6 +345,15 @@ static void __meminit sparse_init_one_section(struct 
mem_section *ms,
unsigned long pnum, struct page *mem_map,
struct mem_section_usage *usage)
 {
+   /*
+* Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug,
+* ->section_mem_map can not be guaranteed to point to a full
+*  section's worth of memory.  The field is only valid / used
+*  in the SPARSEMEM_VMEMMAP=n case.
+*/
+   if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   mem_map = NULL;
+
ms->section_mem_map &= ~SECTION_MAP_MASK;
ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
SECTION_HAS_MEM_MAP;
@@ -743,10 +759,130 @@ static void free_map_bootmem(struct page *memmap)
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+#ifndef CONFIG_MEMORY_HOTREMOVE
+static void free_map_bootmem(struct page *memmap)
+{
+}
+#endif
+
+static bool is_early_section(struct mem_section *ms)
+{
+   struct page *usage_page;
+
+   usage_page = virt_to_page(ms->usage);
+   if (PageSlab(usage_page) || PageCompound(usage_page))
+   return false;
+   else
+   return true;
+}
+
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+   int nid, struct vmem_altmap *altmap)
+{
+   unsigned long mask = section_active_mask(pfn, nr_pages);
+   struct mem_section *ms = __pfn_to_section(pfn);
+   bool early_section = is_early_section(ms);
+   struct page *memmap = NULL;
+
+   if (WARN(!ms->usage || (ms->usage->map_active & mask) != mask,
+   "section already deactivated: active: %#lx mask: 
%#lx\n",
+   ms->usage ? ms->usage->map_active : 0, mask))
+   return;
+
+   if (WARN(!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+   && nr_pages < PAGES_PER_SECTION,
+   "partial memory section removal not 
supported\n"))
+   return;
+
+   /*
+* There are 3 cases to handle across two configurations
+* (SPARSEMEM_VMEMMAP={y,n}):
+*
+* 1/ deactivation of a partial hot-added section (only possible
+* in the SPARSEMEM_VMEMMAP=y case).
+*a/ section was present at memory init
+*b/ section was hot-added post memory init
+* 2/ deactivation of a complete hot-added section
+* 3/ deactivation of a complete section from memory

[PATCH v7 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal

2019-05-02 Thread Dan Williams

Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
valid_section(), can toggle.

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Reviewed-by: Oscar Salvador 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |2 ++
 mm/memory_hotplug.c|   29 -
 2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cffde898e345..b13f0cddf75e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1164,6 +1164,8 @@ static inline unsigned long section_nr_to_pfn(unsigned 
long sec)
 
 #define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
 #define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
+#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
+#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
 
 struct mem_section_usage {
/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a76fc6a6e9fe..0d379da0f1a8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -325,12 +325,8 @@ static unsigned long find_smallest_section_pfn(int nid, 
struct zone *zone,
 unsigned long start_pfn,
 unsigned long end_pfn)
 {
-   struct mem_section *ms;
-
-   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(start_pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) {
+   if (unlikely(!pfn_valid(start_pfn)))
continue;
 
if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -350,15 +346,12 @@ static unsigned long find_biggest_section_pfn(int nid, 
struct zone *zone,
unsigned long start_pfn,
unsigned long end_pfn)
 {
-   struct mem_section *ms;
unsigned long pfn;
 
/* pfn is the end pfn of a memory section. */
pfn = end_pfn - 1;
-   for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn >= start_pfn; pfn -= PAGES_PER_SUB_SECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (unlikely(pfn_to_nid(pfn) != nid))
@@ -380,7 +373,6 @@ static void shrink_zone_span(struct zone *zone, unsigned 
long start_pfn,
unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
unsigned long zone_end_pfn = z;
unsigned long pfn;
-   struct mem_section *ms;
int nid = zone_to_nid(zone);
 
zone_span_writelock(zone);
@@ -417,10 +409,8 @@ static void shrink_zone_span(struct zone *zone, unsigned 
long start_pfn,
 * it check the zone has only hole or not.
 */
pfn = zone_start_pfn;
-   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (page_zone(pfn_to_page(pfn)) != zone)
@@ -448,7 +438,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace 
clash */
unsigned long pgdat_end_pfn = p;
unsigned long pfn;
-   struct mem_section *ms;
int nid = pgdat->node_id;
 
if (pgdat_start_pfn == start_pfn) {
@@ -485,10 +474,8 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 * has only hole or not.
 */
pfn = pgdat_start_pfn;
-   for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
-   ms = __pfn_to_section(pfn);
-
-   if (unlikely(!valid_section(ms)))
+   for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
+   if (unlikely(!pfn_valid(pfn)))
continue;
 
if (pfn_to_nid(pfn) != nid)

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 03/12] mm/sparsemem: Add helpers track active portions of a section at boot

2019-05-02 Thread Dan Williams

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
map_active bitmask length (64)). If it turns out that 2MB is too large
of an active tracking granularity it is trivial to increase the size of
the map_active bitmap.

The implications of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and read the sub-section
active ranges from the bitmask.

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Tested-by: Jane Chu 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |   29 -
 mm/page_alloc.c|4 +++-
 mm/sparse.c|   48 
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6726fc175b51..cffde898e345 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1175,6 +1175,8 @@ struct mem_section_usage {
unsigned long pageblock_flags[0];
 };
 
+void section_active_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1312,12 +1314,36 @@ static inline struct mem_section 
*__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
+static inline int section_active_index(phys_addr_t phys)
+{
+   return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+   int idx = section_active_index(PFN_PHYS(pfn));
+
+   return !!(ms->usage->map_active & (1UL << idx));
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+   return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+   struct mem_section *ms;
+
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
-   return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+   ms = __nr_to_section(pfn_to_section_nr(pfn));
+   if (!valid_section(ms))
+   return 0;
+   return pfn_section_valid(ms, pfn);
 }
 #endif
 
@@ -1349,6 +1375,7 @@ void sparse_init(void);
 #define sparse_init()  do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define section_active_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 61c2b54a5b61..a68735c79609 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7291,10 +7291,12 @@ void __init free_area_init_nodes(unsigned long 
*max_zone_pfn)
 
/* Print out the early node map */
pr_info("Early memory node ranges\n");
-   for_each_mem_pfn_range(i, MAX_NUMNODES, _pfn, _pfn, )
+   for_each_mem_pfn_range(i, MAX_NUMNODES, _pfn, _pfn, ) {
pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
(u64)start_pfn << PAGE_SHIFT,
((u64)end_pfn << PAGE_SHIFT) - 1);
+   section_active_init(start_pfn, end_pfn - start_pfn);
+   }
 
/* Initialise every node */
mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index f87de7ad32c8..8d4f28e2c25e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
return next_present_section_nr(-1);
 }
 
+static unsigned long section_active_mask(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   int idx_start, idx_size;
+   phys_addr_t start, size;
+
+   if (!nr_pages)
+   return 0;
+
+   start = PFN_PHYS(pfn);
+   size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
+   - (pfn & ~PAGE_SECTION_MASK)));
+   size = ALIGN(size, SECTION_ACTIVE_SIZE);
+
+   idx_start = section_active_index(start);
+   idx_size = section_active_index(size);
+
+   if (idx_size == 0)
+   return -1;
+   return ((1UL << idx_size) - 1) << idx_start;
+}
+
+void section_active_init(unsigned long pfn, unsigned long nr_pages)
+{
+   int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+   int i, start_sec = pfn_to_section_nr(pfn);
+
+   if (!nr_pages)
+   return;
+
+   for (i = start_sec; i <= end_sec; i++) {
+   struct mem_section *ms;
+   unsigned long mask;
+   unsigned long pfns;
+
+   pfns = min(nr_pages, PAGES_PER_SECTION
+   - (pfn & ~PAGE_SECTION_MASK));
+   mask = section_active_mask(pfn, pfns);
+
+   ms = __nr_to_section(i);
+   ms->usage->map_active |= mask;
+   pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, 
ms->usage->map_active);
+
+   pfn += pfns;
+

[PATCH v7 01/12] mm/sparsemem: Introduce struct mem_section_usage

2019-05-02 Thread Dan Williams

Towards enabling memory hotplug to track partial population of a
section, introduce 'struct mem_section_usage'.

A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
house a new 'map_active' bitmap.  The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.

The primary motivation for this functionality is to support platforms
that mix "System RAM" and "Persistent Memory" within a single section,
or multiple PMEM ranges with different mapping lifetimes within a single
section. The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.

Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to usemap allocation path, there are no
expected behavior changes.

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |   23 --
 mm/memory_hotplug.c|   18 ++-
 mm/page_alloc.c|2 +
 mm/sparse.c|   81 
 4 files changed, 71 insertions(+), 53 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..f0bbd85dc19a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1160,6 +1160,19 @@ static inline unsigned long section_nr_to_pfn(unsigned 
long sec)
 #define SECTION_ALIGN_UP(pfn)  (((pfn) + PAGES_PER_SECTION - 1) & 
PAGE_SECTION_MASK)
 #define SECTION_ALIGN_DOWN(pfn)((pfn) & PAGE_SECTION_MASK)
 
+#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
+#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
+
+struct mem_section_usage {
+   /*
+* SECTION_ACTIVE_SIZE portions of the section that are populated in
+* the memmap
+*/
+   unsigned long map_active;
+   /* See declaration of similar field in struct zone */
+   unsigned long pageblock_flags[0];
+};
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1177,8 +1190,7 @@ struct mem_section {
 */
unsigned long section_mem_map;
 
-   /* See declaration of similar field in struct zone */
-   unsigned long *pageblock_flags;
+   struct mem_section_usage *usage;
 #ifdef CONFIG_PAGE_EXTENSION
/*
 * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
@@ -1209,6 +1221,11 @@ extern struct mem_section **mem_section;
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
+static inline unsigned long *section_to_usemap(struct mem_section *ms)
+{
+   return ms->usage->pageblock_flags;
+}
+
 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -1220,7 +1237,7 @@ static inline struct mem_section 
*__nr_to_section(unsigned long nr)
return _section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 }
 extern int __section_nr(struct mem_section* ms);
-extern unsigned long usemap_size(void);
+extern size_t mem_section_usage_size(void);
 
 /*
  * We use the lower bits of the mem_map pointer to store
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 328878b6799d..a76fc6a6e9fe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -165,9 +165,10 @@ void put_page_bootmem(struct page *page)
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-   unsigned long *usemap, mapsize, section_nr, i;
+   unsigned long mapsize, section_nr, i;
struct mem_section *ms;
struct page *page, *memmap;
+   struct mem_section_usage *usage;
 
section_nr = pfn_to_section_nr(start_pfn);
ms = __nr_to_section(section_nr);
@@ -187,10 +188,10 @@ static void register_page_bootmem_info_section(unsigned 
long start_pfn)
for (i = 0; i < mapsize; i++, page++)
get_page_bootmem(section_nr, page, SECTION_INFO);
 
-   usemap = ms->pageblock_flags;
-   page = virt_to_page(usemap);
+   usage = ms->usage;
+   page = virt_to_page(usage);
 
-   mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+   mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
 
for (i = 0; i < mapsize; i++, page++)
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
@@ -199,9 +200,10 @@ static void register_page_bootmem_info_section(unsigned 
long start_pfn)
 #else /* CONFIG_SPARSEMEM_VMEMMAP */
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-   unsigned long *usemap, mapsize, section_nr, i;
+   unsigned long mapsize, section_nr, i;
struct mem_section *ms;
struct page *page, *memmap;
+   struct mem_section_usage *usage;
 
section_nr = pfn_to_section_nr(start_pfn);

[PATCH v7 00/12] mm: Sub-section memory hotplug support

2019-05-02 Thread Dan Williams

Changes since v6 [1]:

- Rebase on next-20190501, no related conflicts or updates

- Fix boot crash due to inaccurate setup of the initial section
  ->map_active bitmask caused by multiple activations of the same
  section. (Jane, Jeff)

- Fix pmem startup crash when devm_memremap_pages() needs to instantiate
  a new section. (Jeff)

- Drop mhp_restrictions for the __remove_pages() path in favor of
  find_memory_block() to detect cases where section-aligned remove is
  required (David)

- Add "[PATCH v7 06/12] mm/hotplug: Kill is_dev_zone() usage in
  __remove_pages()"

- Cleanup shrink_{zone,pgdat}_span to remove no longer necessary @ms
  section variables. (Oscar)

- Add subsection_check() to the __add_pages() path to prevent
  inadvertent sub-section misuse.

[1]: 
https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/

---
[merge logistics]

Hi Andrew,

I believe this is ready for another spin in -mm now that the boot
regression has been squashed. In a chat with Michal last night at LSF/MM
I submitted to his assertion that the boot regression validates the
general concern that there were/are subtle dependencies on sections
beyond what I found to date by code inspection. Of course I want to
relieve the pain that the section constraint inflicts on libnvdimm and
devm_memremap_pages() as soon as possible (i.e. v5.2), but deferment to
v5.3 to give Michal time to do an in-depth look is also acceptable.

---
[cover letter]

The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem uses
devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
'struct page' memmap for pmem. However, it does not use the 'bottom
half' of memory hotplug, i.e. never marks pmem pages online and never
exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages(). Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
  unplugged / removed.

- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
  valid_section() check

- __add_pages() and helper routines assume all operations occur in
  PAGES_PER_SECTION units.

- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on x86)
  individual 2MB sub-divisions of a 128MB section.

- Partially populated early sections can be extended with additional
  sub-sections, and those sub-sections can be removed with
  arch_remove_memory(). With this in place we no longer lose usable memory
  capacity to padding.

- pfn_valid() is updated to look deeper than valid_section() to also check the
  active-sub-section mask. This indication is in the same cacheline as
  the valid_section() so the performance impact is expected to be
  negligible. So far the lkp robot has not reported any regressions.

- Outside of the core vmemmap population routines which are replaced,
  other helper routines like shrink_{zone,pgdat}_span() are updated to
  handle the smaller granularity. Core memory hotplug routines that deal
  with online memory are not touched.

- The existing memblock sysfs user api guarantees / assumptions are
  not touched since this capability is limited to !online
  !memblock-sysfs-accessible sections.

Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial

[PATCH v7 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section

2019-05-02 Thread Dan Williams

Up-level the local section size and mask from kernel/memremap.c to
global definitions.  These will be used by the new sub-section hotplug
support.

Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: Jérôme Glisse 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 include/linux/mmzone.h |2 ++
 kernel/memremap.c  |   10 --
 mm/hmm.c   |2 --
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f0bbd85dc19a..6726fc175b51 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1134,6 +1134,8 @@ static inline unsigned long early_pfn_to_nid(unsigned 
long pfn)
  * PFN_SECTION_SHIFT   pfn to/from section number
  */
 #define PA_SECTION_SHIFT   (SECTION_SIZE_BITS)
+#define PA_SECTION_SIZE(1UL << PA_SECTION_SHIFT)
+#define PA_SECTION_MASK(~(PA_SECTION_SIZE-1))
 #define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
 
 #define NR_MEM_SECTIONS(1UL << SECTIONS_SHIFT)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 4e59d29245f4..f355586ea54a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -14,8 +14,6 @@
 #include 
 
 static DEFINE_XARRAY(pgmap_array);
-#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
-#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
@@ -98,8 +96,8 @@ static void devm_memremap_pages_release(void *data)
put_page(pfn_to_page(pfn));
 
/* pages are dead and unused, undo the arch mapping */
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
 
nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
@@ -160,8 +158,8 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
if (!pgmap->ref || !pgmap->kill)
return ERR_PTR(-EINVAL);
 
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
align_end = align_start + align_size - 1;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 0db8491090b8..a7e7f8e33c5f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -34,8 +34,6 @@
 #include 
 #include 
 
-#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
-
 #if IS_ENABLED(CONFIG_HMM_MIRROR)
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

[PATCH v7 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()

2019-05-02 Thread Dan Williams

Allow sub-section sized ranges to be added to the memmap.
populate_section_memmap() takes an explict pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmmemap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.

Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 arch/x86/mm/init_64.c |4 ++-
 include/linux/mm.h|4 ++-
 mm/sparse-vmemmap.c   |   21 +++--
 mm/sparse.c   |   61 +++--
 4 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 20d14254b686..bb018d09d2dc 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1457,7 +1457,9 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
 {
int err;
 
-   if (boot_cpu_has(X86_FEATURE_PSE))
+   if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+   err = vmemmap_populate_basepages(start, end, node);
+   else if (boot_cpu_has(X86_FEATURE_PSE))
err = vmemmap_populate_hugepages(start, end, node, altmap);
else if (altmap) {
pr_err_once("%s: no cpu support for altmap allocations\n",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..5360a0e4051d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2748,8 +2748,8 @@ const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
 void *sparse_buffer_alloc(unsigned long size);
-struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap);
+struct page * __populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 7fec05796796..dcb023aa23d1 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long 
start,
return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap)
+struct page * __meminit __populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
unsigned long start;
unsigned long end;
-   struct page *map;
 
-   map = pfn_to_page(pnum * PAGES_PER_SECTION);
-   start = (unsigned long)map;
-   end = (unsigned long)(map + PAGES_PER_SECTION);
+   /*
+* The minimum granularity of memmap extensions is
+* SECTION_ACTIVE_SIZE as allocations are tracked in the
+* 'map_active' bitmap of the section.
+*/
+   end = ALIGN(pfn + nr_pages, PHYS_PFN(SECTION_ACTIVE_SIZE));
+   pfn &= PHYS_PFN(SECTION_ACTIVE_MASK);
+   nr_pages = end - pfn;
+
+   start = (unsigned long) pfn_to_page(pfn);
+   end = start + nr_pages * sizeof(struct page);
 
if (vmemmap_populate(start, end, nid, altmap))
return NULL;
 
-   return map;
+   return pfn_to_page(pfn);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 8d4f28e2c25e..ed26761327bf 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -452,8 +452,8 @@ static unsigned long __init section_map_size(void)
return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }
 
-struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
-   struct vmem_altmap *altmap)
+struct page __init *__populate_section_memmap(unsigned long pfn,
+   unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
unsigned long size = section_map_size();
struct page *map = sparse_buffer_alloc(size);
@@ -534,10 +534,13 @@ static void __init sparse_init_nid(int nid, unsigned long 
pnum_begin,
}
sparse_buffer_init(map_count * section_map_size(), nid);
for_each_present_section_nr(pnum_begin, pnum) {
+   unsigned long pfn = section_nr_to_pfn(pnum);
+
if (pnum >= pnum_end)
break;
 
-   map = sparse_mem_map_populate(pnum, nid, NULL);
+   map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
+   nid, NULL);
if (!map) {
pr_err("%s: node[%d] memory map backing failed. Some 
memory will not be available.",
   __func__, nid);
@@ -637,17 +640,17 @@ void offline_mem_sections(unsigned long start_pfn, 
unsigned long end_pfn)
 #endif
 
 #ifdef

Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage

2019-05-02 Thread Dan Williams

On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin  wrote:
>
> On 19-04-17 11:39:00, Dan Williams wrote:
> > Towards enabling memory hotplug to track partial population of a
> > section, introduce 'struct mem_section_usage'.
> >
> > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > house a new 'map_active' bitmap.  The new bitmap enables the memory
> > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > section.
> >
> > The primary motivation for this functionality is to support platforms
> > that mix "System RAM" and "Persistent Memory" within a single section,
> > or multiple PMEM ranges with different mapping lifetimes within a single
> > section. The section restriction for hotplug has caused an ongoing saga
> > of hacks and bugs for devm_memremap_pages() users.
> >
> > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > from a section, and updates to usemap allocation path, there are no
> > expected behavior changes.
> >
> > Cc: Michal Hocko 
> > Cc: Vlastimil Babka 
> > Cc: Logan Gunthorpe 
> > Signed-off-by: Dan Williams 
> > ---
> >  include/linux/mmzone.h |   23 --
> >  mm/memory_hotplug.c|   18 ++-
> >  mm/page_alloc.c|2 +
> >  mm/sparse.c|   81 
> > 
> >  4 files changed, 71 insertions(+), 53 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 70394cabaf4e..f0bbd85dc19a 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1160,6 +1160,19 @@ static inline unsigned long 
> > section_nr_to_pfn(unsigned long sec)
> >  #define SECTION_ALIGN_UP(pfn)(((pfn) + PAGES_PER_SECTION - 1) & 
> > PAGE_SECTION_MASK)
> >  #define SECTION_ALIGN_DOWN(pfn)  ((pfn) & PAGE_SECTION_MASK)
> >
> > +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
> > +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
> > +
> > +struct mem_section_usage {
> > + /*
> > +  * SECTION_ACTIVE_SIZE portions of the section that are populated in
> > +  * the memmap
> > +  */
> > + unsigned long map_active;
>
> I think this should be proportional to section_size / subsection_size.
> For example, on intel section size = 128M, and subsection is 2M, so
> 64bits work nicely. But, on arm64 section size if 1G, so subsection is
> 16M.
>
> On the other hand 16M is already much better than what we have: with 1G
> section size and 2M pmem alignment we guaranteed to loose 1022M. And
> with 16M subsection it is only 14M.

I'm ok with it being 16M for now unless it causes a problem in
practice, i.e. something like the minimum hardware mapping alignment
for physical memory being less than 16M.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

69 matches

Mail list logo