Re: [PATCH v4] x86: fix kaslr and memmap collision

2017-01-04 Thread Baoquan He
On 01/04/17 at 10:06am, Dave Jiang wrote:
> On 01/03/2017 07:37 PM, Baoquan He wrote:
> >>  #include 
> >>  #include 
> >> @@ -61,9 +62,16 @@ enum mem_avoid_index {
> >>MEM_AVOID_INITRD,
> >>MEM_AVOID_CMDLINE,
> >>MEM_AVOID_BOOTPARAMS,
> >> +  MEM_AVOID_MEMMAP1,
> >> +  MEM_AVOID_MEMMAP2,
> >> +  MEM_AVOID_MEMMAP3,
> >> +  MEM_AVOID_MEMMAP4,
> > 
> > This looks not good. Could it be done like fixed_addresses?
> > Something like:
> > 
> > MEM_AVOID_MEMMAP_BEGIN,
> > MEM_AVOID_MEMMAP_BEGIN + MEM_AVOID_MAX -1,
> > 
> > Please point it out to me if there's some existing code in kernel like
> > your way, I can also accept it.
> 
> I think you mean:
> MEM_AVOID_MEMMAP_BEGIN + MAX_MEMMAP_REGIONS - 1

Ah, yes. Sorry for this.

> 
> I will change
> 
> 
> > 
> >>MEM_AVOID_MAX,
> >>  };
> >>  
> >> +/* only supporting at most 4 memmap regions with kaslr */
> > And here, "Only supporting at most 4 un-usable memmap regions with kaslr"?
> > Surely this is based on if you will ignore the usable memory and do not
> > store it as 0. And also the log need be changed too accordingly.
> >> +#define MAX_MEMMAP_REGIONS4
...
> >> +static int
> >> +parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
> >> +{
> >> +  char *oldp;
> >> +
> >> +  if (!p)
> >> +  return -EINVAL;
> >> +
> >> +  /* we don't care about this option here */
> >> +  if (!strncmp(p, "exactmap", 8))
> >> +  return -EINVAL;
> >> +
> >> +  oldp = p;
> >> +  *size = _memparse(p, &p);
> >> +  if (p == oldp)
> >> +  return -EINVAL;
> >> +
> >> +  switch (*p) {
> >> +  case '@':
> >> +  /* skip this region, usable */
> >> +  *start = 0;
> >> +  *size = 0;
> >> +  return 0;
> > How about direclty return if nn@ss? Seems no need to waste one mem avoid
> > region slot. In fact even amount of usable memory regions are provided to
> > 100, it won't impact that you want to specify a reserve memmap region if
> > you skip it direclty. Personal opinion.
> 
> We are not wasting the slot. If you look mem_avoid_memmap() where I call
> the function, it will skip with a continue if size == 0 without
> incrementing the 'i' counter. That will skip all the nn@ss regions
> without counting against the max avoid mapping.

Yes, indeed. Sorry, I didn't read the code carefully.

Thanks
Baoquan
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH] mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

2017-01-04 Thread Dan Williams
Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

if (pgd_val(*pgd)) {
pud = (pud_t *)pgd_page_vaddr(*pgd);
paddr_last = phys_pud_init(pud, __pa(vaddr),
   __pa(vaddr_end),
   page_size_mask);
continue;
}

pud = alloc_low_page();
paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
   page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization. This leads to
crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

BUG: unable to handle kernel paging request at 888ebfff
IP: [] memcpy_erms+0x6/0x10
PGD 2f8e8fc067 PUD 0 /* < Invalid PUD */
Oops:  [#1] SMP DEBUG_PAGEALLOC
CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
task: 882fac290040 ti: 882f887a4000 task.ti: 882f887a4000
RIP: 0010:[]  [] memcpy_erms+0x6/0x10
[..]
Call Trace:
 [] ? pmem_do_bvec+0x205/0x370 [nd_pmem]
 [] ? blk_queue_enter+0x3a/0x280
 [] pmem_rw_page+0x38/0x80 [nd_pmem]
 [] bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Cc: 
Cc: Christoph Hellwig 
Fixes: 41e94a851304 ("add devm_memremap_pages")
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index b501e390bb34..9ecedc28b928 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -246,7 +246,9 @@ static void devm_memremap_pages_release(struct device *dev, 
void *data)
/* pages are dead and unused, undo the arch mapping */
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(resource_size(res), SECTION_SIZE);
+   mem_hotplug_begin();
arch_remove_memory(align_start, align_size);
+   mem_hotplug_done();
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
pgmap_radix_release(res);
dev_WARN_ONCE(dev, pgmap->altmap && pgmap->altmap->alloc,
@@ -358,7 +360,9 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res,
if (error)
goto err_pfn_remap;
 
+   mem_hotplug_begin();
error = arch_add_memory(nid, align_start, align_size, true);
+   mem_hotplug_done();
if (error)
goto err_add_memory;
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH] ndctl: support machines without numa

2017-01-04 Thread Dan Williams
From: Diego Dompe 

Currently when using ndctl in machines that do not have numa, the ndctl
command fails because it fails to find the numa_node attribute in
systems.

Just assume the default, zero, in these cases rather than fail.

Link: https://github.com/pmem/ndctl/pull/9
[djbw: minor style fixups]
Signed-off-by: Dan Williams 
---

Diego, if this reworked version looks good to you I'll add your
"Signed-off-by: Diego Dompe " to the tags. Let me know.
Thanks for the fix!

 ndctl/lib/libndctl.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index ea3111eb6c93..8240235dff5c 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -2780,9 +2780,8 @@ static void *add_namespace(void *parent, int id, const 
char *ndns_base)
ndns->raw_mode = strtoul(buf, NULL, 0);
 
sprintf(path, "%s/numa_node", ndns_base);
-   if (sysfs_read_attr(ctx, path, buf) < 0)
-   goto err_read;
-   ndns->numa_node = strtol(buf, NULL, 0);
+   if (sysfs_read_attr(ctx, path, buf) == 0)
+   ndns->numa_node = strtol(buf, NULL, 0);
 
switch (ndns->type) {
case ND_DEVICE_NAMESPACE_BLK:

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH] test: fix multi_pmem link error

2017-01-04 Thread Dan Williams
As reported by issue #8:

  "I tried to build ndctl, and I executed "make check",
   but I met link error of libuuid like followings.

/usr/bin/ld: multi-pmem.o: undefined reference to symbol 
'uuid_unparse@@UUID_1.0'
/lib/x86_64-linux-gnu/libuuid.so.1: error adding symbols: DSO missing from 
command line

   Not only libuuid, but also libkmod has similar problem."

Add the missing libraries to the dependency list. This happens to work
on latest Fedora, but fails for the reporter on Ubuntu 14.04.

Link: https://github.com/pmem/ndctl/issues/8
Reported-by: Yasunori Goto 
Signed-off-by: Dan Williams 
---
 test/Makefile.am |2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/Makefile.am b/test/Makefile.am
index 9e3ae1dab411..7ec14118f74d 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -89,4 +89,6 @@ multi_pmem_SOURCES = \
 multi_pmem_LDADD = \
$(LIBNDCTL_LIB) \
$(JSON_LIBS) \
+   $(UUID_LIBS) \
+   $(KMOD_LIBS) \
../libutil.a

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v5] x86: fix kaslr and memmap collision

2017-01-04 Thread Dave Jiang
CONFIG_RANDOMIZE_BASE relocates the kernel to a random base address.
However it does not take into account the memmap= parameter passed in from
the kernel cmdline. This results in the kernel sometimes being put in
the middle of memmap. Teaching kaslr to not insert the kernel in
memmap defined regions. We will support up to 4 memmap regions. Any
additional regions will cause kaslr to disable. The mem_avoid set has
been augmented to add up to 4 unusable regions of memmaps provided by the
user to exclude those regions from the set of valid address range to insert
the uncompressed kernel image. The nn@ss ranges will be skipped by the
mem_avoid set since it indicates memory useable.

Signed-off-by: Dave Jiang 
---

v2:
Addressing comments from Ingo.
- Handle entire list of memmaps
v3:
Fix 32bit build issue
v4:
Addressing comments from Baoquan
- Not exclude nn@ss ranges
v5:
Addressing additional comments from Baoquan
- Update commit header and various coding style changes

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index e5612f3..59c2075 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -332,7 +332,10 @@ int strncmp(const char *cs, const char *ct, size_t count);
 size_t strnlen(const char *s, size_t maxlen);
 unsigned int atou(const char *s);
 unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int 
base);
+unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base);
+long simple_strtol(const char *cp, char **endp, unsigned int base);
 size_t strlen(const char *s);
+char *strchr(const char *s, int c);
 
 /* tty.c */
 void puts(const char *);
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index a66854d..036b514 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -11,6 +11,7 @@
  */
 #include "misc.h"
 #include "error.h"
+#include "../boot.h"
 
 #include 
 #include 
@@ -56,11 +57,16 @@ struct mem_vector {
unsigned long size;
 };
 
+/* only supporting at most 4 unusable memmap regions with kaslr */
+#define MAX_MEMMAP_REGIONS 4
+
 enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
MEM_AVOID_INITRD,
MEM_AVOID_CMDLINE,
MEM_AVOID_BOOTPARAMS,
+   MEM_AVOID_MEMMAP_BEGIN,
+   MEM_AVOID_MEMMAP_END = MEM_AVOID_MEMMAP_BEGIN + MAX_MEMMAP_REGIONS - 1,
MEM_AVOID_MAX,
 };
 
@@ -77,6 +83,121 @@ static bool mem_overlaps(struct mem_vector *one, struct 
mem_vector *two)
return true;
 }
 
+/**
+ * _memparse - parse a string with mem suffixes into a number
+ * @ptr: Where parse begins
+ * @retptr: (output) Optional pointer to next char after parse completes
+ *
+ * Parses a string into a number.  The number stored at @ptr is
+ * potentially suffixed with K, M, G, T, P, E.
+ */
+static unsigned long long _memparse(const char *ptr, char **retptr)
+{
+   char *endptr;   /* local pointer to end of parsed string */
+
+   unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
+
+   switch (*endptr) {
+   case 'E':
+   case 'e':
+   ret <<= 10;
+   case 'P':
+   case 'p':
+   ret <<= 10;
+   case 'T':
+   case 't':
+   ret <<= 10;
+   case 'G':
+   case 'g':
+   ret <<= 10;
+   case 'M':
+   case 'm':
+   ret <<= 10;
+   case 'K':
+   case 'k':
+   ret <<= 10;
+   endptr++;
+   default:
+   break;
+   }
+
+   if (retptr)
+   *retptr = endptr;
+
+   return ret;
+}
+
+static int
+parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
+{
+   char *oldp;
+
+   if (!p)
+   return -EINVAL;
+
+   /* we don't care about this option here */
+   if (!strncmp(p, "exactmap", 8))
+   return -EINVAL;
+
+   oldp = p;
+   *size = _memparse(p, &p);
+   if (p == oldp)
+   return -EINVAL;
+
+   switch (*p) {
+   case '@':
+   /* skip this region, usable */
+   *start = 0;
+   *size = 0;
+   return 0;
+   case '#':
+   case '$':
+   case '!':
+   *start = _memparse(p + 1, &p);
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+static int mem_avoid_memmap(void)
+{
+   char arg[128];
+   int rc = 0;
+
+   /* see if we have any memmap areas */
+   if (cmdline_find_option("memmap", arg, sizeof(arg)) > 0) {
+   int i = 0;
+   char *str = arg;
+
+   while (str && (i < MAX_MEMMAP_REGIONS)) {
+   unsigned long long start, size;
+   char *k = strchr(str, ',');
+
+   if (k)
+   *k++ = 0;
+
+   rc = parse_memmap(str, &start, &size);
+   if (rc < 0)
+   break;
+   str = 

Re: nvdimm read EFAULT vs EIO

2017-01-04 Thread Dan Williams
On Wed, Jan 4, 2017 at 8:45 AM, Stefan Hajnoczi  wrote:
> In drivers/nvdimm/pmem.c:pmem_do_bvec() reads are handled as follows:
>
>   if (unlikely(bad_pmem))
>   rc = -EIO;
>   else {
>   rc = read_pmem(page, off, pmem_addr, len);
>   flush_dcache_page(page);
>   }
>
> read_pmem() returns -EFAULT on error so the error return value of
> pmem_do_bvec() is either -EIO or -EFAULT.  EFAULT strikes me as unusual for a
> function used in the block I/O code path.
>
> Is the distinction between EIO and EFAULT significant for pmem_do_bvec() 
> callers?

It looks like all users except btrfs just check if bi_error is non-zero

> If not, should the function return -EIO in both cases?

I'm leaning toward fixing up btrfs to handle all errors as I/O errors.
Otherwise, it's not clear to me why
fs/btrfs/volumes.c::btrfs_end_bio() would need to parse error codes,
but drivers/md/raid5.c::raid5_end_read_request() does not.

Should probably address it both ways, make btrfs more tolerant in what
it accepts and make pmem more consistent / conservative in what it
sends.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v4] x86: fix kaslr and memmap collision

2017-01-04 Thread Dave Jiang
On 01/03/2017 07:37 PM, Baoquan He wrote:
> Hi Dave,
> 
> I have several concerns, please see the inline comments.
> 
> On 01/03/17 at 01:48pm, Dave Jiang wrote:
>> CONFIG_RANDOMIZE_BASE relocates the kernel to a random base address.
>> However it does not take into account the memmap= parameter passed in from
>> the kernel cmdline. This results in the kernel sometimes being put in
>> the middle of the user memmap. Teaching kaslr to not insert the kernel in
>   ~~~
> It could be better to be replaced with "user-defined physical RAM map"
> or memmap directly because memmap is meaning user-defined physical RAM
> map. Please check the boot info printing about them. I was confused at
> first glance.

Will update

>> memmap defined regions. We will support up to 4 memmap regions. Any
>> additional regions will cause kaslr to disable. The mem_avoid set has
>> been augmented to add up to 4 regions of memmaps provided by the user
>   ~~~
> 4 un-usable memmap region need be cared, usable memory is not
> concerned

Will update

.
>> to exclude those regions from the set of valid address range to insert
>> the uncompressed kernel image. The nn@ss ranges will be skipped by the
>> mem_avoid set since it indicates memory useable.
>>
>> Signed-off-by: Dave Jiang 
>> ---
>>  arch/x86/boot/boot.h |3 +
>>  arch/x86/boot/compressed/kaslr.c |  131 
>> ++
>>  arch/x86/boot/string.c   |   38 +++
>>  3 files changed, 172 insertions(+)
>>
>> diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
>> index e5612f3..59c2075 100644
>> --- a/arch/x86/boot/boot.h
>> +++ b/arch/x86/boot/boot.h
>> @@ -332,7 +332,10 @@ int strncmp(const char *cs, const char *ct, size_t 
>> count);
>>  size_t strnlen(const char *s, size_t maxlen);
>>  unsigned int atou(const char *s);
>>  unsigned long long simple_strtoull(const char *cp, char **endp, unsigned 
>> int base);
>> +unsigned long simple_strtoul(const char *cp, char **endp, unsigned int 
>> base);
>> +long simple_strtol(const char *cp, char **endp, unsigned int base);
>>  size_t strlen(const char *s);
>> +char *strchr(const char *s, int c);
>>  
>>  /* tty.c */
>>  void puts(const char *);
>> diff --git a/arch/x86/boot/compressed/kaslr.c 
>> b/arch/x86/boot/compressed/kaslr.c
>> index a66854d..f5a1c8d 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -11,6 +11,7 @@
>>   */
>>  #include "misc.h"
>>  #include "error.h"
>> +#include "../boot.h"
>>  
>>  #include 
>>  #include 
>> @@ -61,9 +62,16 @@ enum mem_avoid_index {
>>  MEM_AVOID_INITRD,
>>  MEM_AVOID_CMDLINE,
>>  MEM_AVOID_BOOTPARAMS,
>> +MEM_AVOID_MEMMAP1,
>> +MEM_AVOID_MEMMAP2,
>> +MEM_AVOID_MEMMAP3,
>> +MEM_AVOID_MEMMAP4,
> 
> This looks not good. Could it be done like fixed_addresses?
> Something like:
> 
>   MEM_AVOID_MEMMAP_BEGIN,
>   MEM_AVOID_MEMMAP_BEGIN + MEM_AVOID_MAX -1,
> 
> Please point it out to me if there's some existing code in kernel like
> your way, I can also accept it.

I think you mean:
MEM_AVOID_MEMMAP_BEGIN + MAX_MEMMAP_REGIONS - 1

I will change


> 
>>  MEM_AVOID_MAX,
>>  };
>>  
>> +/* only supporting at most 4 memmap regions with kaslr */
> And here, "Only supporting at most 4 un-usable memmap regions with kaslr"?
> Surely this is based on if you will ignore the usable memory and do not
> store it as 0. And also the log need be changed too accordingly.
>> +#define MAX_MEMMAP_REGIONS  4
>> +
>>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>>  
>>  static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>> @@ -77,6 +85,121 @@ static bool mem_overlaps(struct mem_vector *one, struct 
>> mem_vector *two)
>>  return true;
>>  }
>>  
>> +/**
>> + *  _memparse - parse a string with mem suffixes into a number
>> + *  @ptr: Where parse begins
>> + *  @retptr: (output) Optional pointer to next char after parse completes
>> + *
>> + *  Parses a string into a number.  The number stored at @ptr is
>> + *  potentially suffixed with K, M, G, T, P, E.
>> + */
>> +static unsigned long long _memparse(const char *ptr, char **retptr)
>> +{
>> +char *endptr;   /* local pointer to end of parsed string */
>> +
>> +unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
>> +
>> +switch (*endptr) {
>> +case 'E':
>> +case 'e':
>> +ret <<= 10;
>> +case 'P':
>> +case 'p':
>> +ret <<= 10;
>> +case 'T':
>> +case 't':
>> +ret <<= 10;
>> +case 'G':
>> +case 'g':
>> +ret <<= 10;
>> +case 'M':
>> +case 'm':
>> +ret <<= 10;
>> +case 'K':
>> +case 'k':
>> +ret <<= 10;
>> +endptr++;
>> +default:
>> +break;
>> +}
>> +
>> +if (retptr)
>> +*retptr = endptr;
>> +
>> +return ret;
>> +}
>> +
>> +static int
>> +parse_memmap(char *p, unsi

Re: LTP rwtest01 blocks on DAX mountpoint

2017-01-04 Thread Ross Zwisler
On Wed, Jan 04, 2017 at 05:48:34PM +0800, Xiong Zhou wrote:
> On Tue, Jan 03, 2017 at 09:57:10AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 03, 2017 at 02:49:22PM +0800, Xiong Zhou wrote:
> > > On Mon, Jan 02, 2017 at 02:49:41PM -0700, Ross Zwisler wrote:
> > > > On Mon, Jan 02, 2017 at 06:16:17PM +0100, Jan Kara wrote:
> > > > > On Fri 30-12-16 17:33:53, Xiong Zhou wrote:
> > > > > > On Sat, Dec 24, 2016 at 07:07:14PM +0800, Xiong Zhou wrote:
> > > > > > > Hi lists,
> > > snip
> > > > > I was trying to reproduce this but for me rwtest01 completes just 
> > > > > fine on
> > > > > dax mountpoint (I've used your reproducer). So can you sample several
> > > > > kernel stack traces to get a rough idea where the kernel is running?
> > > > > Thanks!
> > > > > 
> > > > >   Honza
> > > > 
> > > > I'm also unable to reproduce this issue.  I've tried with both the 
> > > > blamed
> > > > commit:
> > > > 4b4bb46 (HEAD) dax: clear dirty entry tags on cache flush
> > > > and with v4.9-rc2.  Both pass the test in my setup.
> > > > Perhaps the variable is the size of your PMEM partitions?
> > > > # fdisk -l /dev/pmem0
> > > > Disk /dev/pmem0: 16 GiB, 17179869184 bytes, 33554432 sectors
> > > > Units: sectors of 1 * 512 = 512 bytes
> > > > Sector size (logical/physical): 512 bytes / 4096 bytes
> > > > I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> > > > Disklabel type: dos
> > > > Disk identifier: 0xfe50c900
> > > > Device   BootStart  End  Sectors Size Id Type
> > > > /dev/pmem0p1  4096 25165823 25161728  12G 83 Linux
> > > > /dev/pmem0p2  25165824 33550335  8384512   4G 83 Linux
> > > > 
> > > > What does your setup look like?
> > > > I'm using the current tip of the LTP tree:
> > > > 8cc4165  waitid02: define _XOPEN_SOURCE 500
> > > > Thanks,
> > > > - Ross
> > > 
> > > Thanks all for looking into it.
> > > 
> > > Turns out the rc2 relative updates fix this issue, so does
> > > an old issue i reported a while ago:
> > > multi-threads libvmmalloc fork test hang
> > > https://lists.01.org/pipermail/linux-nvdimm/2016-October/007602.html
> > > 
> > > I'm able to reproduce these issues before rc2, now it
> > > passes on current Linus tree:
> > > c8b4ec8 Merge tag 'fscrypt-for-stable'
> > 
> > Hmm...I'm able to reproduce the other libvmmalloc issue with both v4.10-rc2
> > and with "c8b4ec8 Merge tag 'fscrypt-for-stable'".  I'm debugging that issue
> > today.
> > 
> > It's interesting that both tests started passing for you.  Did you change
> > something in your test setup?
> 
> Hi,
> 
> Quick update:
>   Ross's new patch fixed the vmmaloc_fork issue, not the rc2 update.
>   Regression tests is going on, so far so good.
> 
> I'm able to reproduce the vmmalloc_fork issue on rc2 kernel
>   c8b4ec8 Merge tag 'fscrypt-for-stable'
> with nvml commit to
>   77c2a5a Merge pull request #1554 from krzycz/win-libvmem_rc
> 
> My previous statement about rc2 fixed old vmmalloc_fork issue
> was wrong, my mistake. I have changed my test setup.
> 
> Now after some tests, Ross's patch
>   [PATCH] dax: fix deadlock with DAX 4k holes
> on top of Linus tree c8b4ec8 have fixed this vmmalloc_fork issue.
> My DAX regression tests is going on, looks good so far. Gonna
> update once it have finished.

Cool, thanks for the update.  If you're still able to reproduce this second
issue after my patch we can dig in to the differences between your test setup
and mine so I can reproduce it & debug.

Thanks for the reports!
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] dax: fix deadlock with DAX 4k holes

2017-01-04 Thread Xiong Zhou
On Tue, Jan 03, 2017 at 02:36:05PM -0700, Ross Zwisler wrote:
> Currently in DAX if we have three read faults on the same hole address we
> can end up with the following:
> 
> Thread 0  Thread 1Thread 2
>   
> dax_iomap_fault
>  grab_mapping_entry
>   lock_slot
>
> 
>   dax_iomap_fault
>grab_mapping_entry
> get_unlocked_mapping_entry
>  
> 
>   dax_iomap_fault
>grab_mapping_entry
> get_unlocked_mapping_entry
>  
>   dax_load_hole
>find_or_create_page
>...
> page_cache_tree_insert
>  dax_wake_mapping_entry_waiter
>   
>  __radix_tree_replace
>   
> 
>   
>   get_page
>   lock_page
>   ...
>   put_locked_mapping_entry
>   unlock_page
>   put_page
> 
>   wait queue>
> 
> The crux of the problem is that once we insert a 4k zero page, all locking
> from then on is done in terms of that 4k zero page and any additional
> threads sleeping on the empty DAX entry will never be woken.  Fix this by
> waking all sleepers when we replace the DAX radix tree entry with a 4k zero
> page.  This will allow all sleeping threads to successfully transition from
> locking based on the DAX empty entry to locking on the 4k zero page.
> 
> With the test case reported by Xiong this happens very regularly in my test
> setup, with some runs resulting in 9+ threads in this deadlocked state.
> With this fix I've been able to run that same test dozens of times in a
> loop without issue.
> 
> Signed-off-by: Ross Zwisler 
> Reported-by: Xiong Zhou 
> Fixes: commit ac401cc78242 ("dax: New fault locking")
> Cc: Jan Kara 
> Cc: sta...@vger.kernel.org # 4.7+
> ---

Positive test result of this patch for this issue and the regression
tests.

Great job!

> 
> This issue exists as far back as v4.7, and I was easly able to reproduce it
> with v4.7 using the same test.
> 
> Unfortunately this patch won't apply cleanly to the stable trees, but the
> change is very simple and should be easy to replicate by hand.  Please ping
> me if you'd like patches that apply cleanly to the v4.9 and v4.8.15 trees.
> 
> ---
>  mm/filemap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d0e4d10..b772a33 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -138,7 +138,7 @@ static int page_cache_tree_insert(struct address_space 
> *mapping,
>   dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
>   /* Wakeup waiters for exceptional entry lock */
>   dax_wake_mapping_entry_waiter(mapping, page->index, p,
> -   false);
> +   true);
>   }
>   }
>   __radix_tree_replace(&mapping->page_tree, node, slot, page,
> -- 
> 2.7.4
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: LTP rwtest01 blocks on DAX mountpoint

2017-01-04 Thread Xiong Zhou
On Tue, Jan 03, 2017 at 09:57:10AM -0700, Ross Zwisler wrote:
> On Tue, Jan 03, 2017 at 02:49:22PM +0800, Xiong Zhou wrote:
> > On Mon, Jan 02, 2017 at 02:49:41PM -0700, Ross Zwisler wrote:
> > > On Mon, Jan 02, 2017 at 06:16:17PM +0100, Jan Kara wrote:
> > > > On Fri 30-12-16 17:33:53, Xiong Zhou wrote:
> > > > > On Sat, Dec 24, 2016 at 07:07:14PM +0800, Xiong Zhou wrote:
> > > > > > Hi lists,
> > snip
> > > > I was trying to reproduce this but for me rwtest01 completes just fine 
> > > > on
> > > > dax mountpoint (I've used your reproducer). So can you sample several
> > > > kernel stack traces to get a rough idea where the kernel is running?
> > > > Thanks!
> > > > 
> > > > Honza
> > > 
> > > I'm also unable to reproduce this issue.  I've tried with both the blamed
> > > commit:
> > > 4b4bb46 (HEAD) dax: clear dirty entry tags on cache flush
> > > and with v4.9-rc2.  Both pass the test in my setup.
> > > Perhaps the variable is the size of your PMEM partitions?
> > > # fdisk -l /dev/pmem0
> > > Disk /dev/pmem0: 16 GiB, 17179869184 bytes, 33554432 sectors
> > > Units: sectors of 1 * 512 = 512 bytes
> > > Sector size (logical/physical): 512 bytes / 4096 bytes
> > > I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> > > Disklabel type: dos
> > > Disk identifier: 0xfe50c900
> > > Device   BootStart  End  Sectors Size Id Type
> > > /dev/pmem0p1  4096 25165823 25161728  12G 83 Linux
> > > /dev/pmem0p2  25165824 33550335  8384512   4G 83 Linux
> > > 
> > > What does your setup look like?
> > > I'm using the current tip of the LTP tree:
> > > 8cc4165  waitid02: define _XOPEN_SOURCE 500
> > > Thanks,
> > > - Ross
> > 
> > Thanks all for looking into it.
> > 
> > Turns out the rc2 relative updates fix this issue, so does
> > an old issue i reported a while ago:
> > multi-threads libvmmalloc fork test hang
> > https://lists.01.org/pipermail/linux-nvdimm/2016-October/007602.html
> > 
> > I'm able to reproduce these issues before rc2, now it
> > passes on current Linus tree:
> > c8b4ec8 Merge tag 'fscrypt-for-stable'
> 
> Hmm...I'm able to reproduce the other libvmmalloc issue with both v4.10-rc2
> and with "c8b4ec8 Merge tag 'fscrypt-for-stable'".  I'm debugging that issue
> today.
> 
> It's interesting that both tests started passing for you.  Did you change
> something in your test setup?

Hi,

Quick update:
  Ross's new patch fixed the vmmaloc_fork issue, not the rc2 update.
  Regression tests is going on, so far so good.

I'm able to reproduce the vmmalloc_fork issue on rc2 kernel
c8b4ec8 Merge tag 'fscrypt-for-stable'
with nvml commit to
77c2a5a Merge pull request #1554 from krzycz/win-libvmem_rc

My previous statement about rc2 fixed old vmmalloc_fork issue
was wrong, my mistake. I have changed my test setup.

Now after some tests, Ross's patch
[PATCH] dax: fix deadlock with DAX 4k holes
on top of Linus tree c8b4ec8 have fixed this vmmalloc_fork issue.
My DAX regression tests is going on, looks good so far. Gonna
update once it have finished.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm