Re: ndctl hangs after memory deregistration

2019-06-13 Thread Yue Li
Thanks, Dan, for the reply!

On 6/14/19, 3:06 AM, "Dan Williams"  wrote:

On Wed, Jun 12, 2019 at 9:08 PM Yue Li  wrote:
>
> Hi Dan and Steve,
>

Hi,

I just happened to see this by luck. Please use my Intel address, and
copy the libnvdimm mailing list (linux-nvdimm@lists.01.org) on issues
like this.

OK.

> We recently ran into a strange issue where the ndctl command hangs on a
> devdax namespace after our software uses it.

The last thing that device-dax teardown does is wait for any pinned
pages to be released before allowing the exit to proceed.

OK.

> Inside our application, we first RDMA-register the whole device, then
> deregister it, and exit.

Is this just using simple ibverbs to unregister, or something specific
to this driver?

There was a bug upstream that addressed cases where device teardown
proceeded when it shouldn't have, but the sequence you describe is the
opposite: the page pins should be torn down before the device is
reconfigured.
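
A minimal user-space sketch of the teardown behavior described above, assuming the page pins can be modeled as a single counter. `struct pgmap_model` and the `pgmap_model_*` functions are hypothetical names, not kernel APIs; the kernel's real bookkeeping (reference counting on the device's `dev_pagemap`) differs in detail and waits without a timeout. Teardown first refuses new pins and then waits for existing pins to drain. The timeout is an illustrative addition, so a leaked pin shows up as ETIMEDOUT instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a device's page map: a pin count guarded by a
 * mutex, plus a condition variable that teardown sleeps on. */
struct pgmap_model {
    pthread_mutex_t lock;
    pthread_cond_t  released;
    long            pins;   /* outstanding page pins, e.g. from an RDMA MR */
    bool            dying;  /* teardown has started; no new pins allowed */
};

void pgmap_model_init(struct pgmap_model *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->released, NULL);
    m->pins = 0;
    m->dying = false;
}

/* Take a pin. Fails once teardown has begun. */
bool pgmap_model_pin(struct pgmap_model *m)
{
    bool ok;

    pthread_mutex_lock(&m->lock);
    ok = !m->dying;
    if (ok)
        m->pins++;
    pthread_mutex_unlock(&m->lock);
    return ok;
}

/* Drop a pin and wake any waiting teardown when the last one goes away. */
void pgmap_model_unpin(struct pgmap_model *m)
{
    pthread_mutex_lock(&m->lock);
    assert(m->pins > 0);    /* unpinning an unpinned page is a bug */
    if (--m->pins == 0)
        pthread_cond_broadcast(&m->released);
    pthread_mutex_unlock(&m->lock);
}

/* Teardown: block new pins, then wait for existing pins to drain.
 * Returns 0 once the count reaches zero, or ETIMEDOUT if pins are still
 * held after timeout_ms (the kernel itself would keep waiting). */
int pgmap_model_teardown(struct pgmap_model *m, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&m->lock);
    m->dying = true;
    /* Loop to absorb spurious wakeups; stop on timeout. */
    while (m->pins > 0 && rc == 0)
        rc = pthread_cond_timedwait(&m->released, &m->lock, &deadline);
    if (m->pins == 0)
        rc = 0;
    pthread_mutex_unlock(&m->lock);
    return rc;
}
```

Under this model, registering and then deregistering the memory region brings the count back to zero before ndctl reconfigures the namespace, so teardown should not have to wait at all. A hang in that sequence would therefore point to a pin (or reference) that was never dropped.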

> However, if we remove the registration and deregistration code, ndctl
> works correctly without hanging. The problem occurs both on
> DRAM-emulated dax and on real PMEM-backed dax.
>
> Here is our system information:
>
> CentOS 7.6
> Vanilla kernel 3.10.0-957.el7.x86_64

Are you familiar with rebuilding the kernel? I'd ask you to try to
reproduce with the latest development kernel that includes these
fixes:

4422ee8476f0 mm/devm_memremap_pages: fix final page put race
771f0714d0dc PCI/P2PDMA: track pgmap references per resource, not globally
af37085de906 lib/genalloc: introduce chunk owners
e0047ff8aa77 PCI/P2PDMA: fix the gen_pool_add_virt() failure path
0315d47d6ae9 mm/devm_memremap_pages: introduce devm_memunmap_pages
216475c7eaa8 drivers/base/devres: introduce devm_release_action()

...but it sounds like you may be hitting a different issue.

Thanks for the suggestion. We will download the upstream kernel, try
again, and post the results soon.

Best, 

Yue




___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: ndctl hangs after memory deregistration

2019-06-17 Thread Jacky Wu
Hi Dan,

I wrote a small program to simulate our use case and tested three cases:
no register/unregister, register only (no unregister), and both register
and unregister. The ndctl command hung in the latter two cases. I'm
attaching the source code for your reference.

I will try the latest kernel next.

Thanks,
Jacky

-Original Message-
From: Yue Li  
Sent: Friday, June 14, 2019 7:10 AM
To: Dan Williams 
Cc: Scargall, Steve ; Jacky Wu 
; linux-nvdimm@lists.01.org
Subject: Re: ndctl hangs after memory deregistration


RE: ndctl hangs after memory deregistration

2019-06-17 Thread Jacky Wu
I tried kernel 4.18.20, and the issue does not occur there.

[root@localhost ~]# ./test-ibv_reg 10.8.8.133 /dev/dax0.0 3
Creating RDMA event channel.
Creating RDMA communication identifier.
RDMA bind address to 10.8.8.133
RDMA start listen
Register memory region.
Unregister memory region.
Pool unmapped.
Pool handler closed.
Pool closed.
De-allocated PD.
Destroyed RDMA communication identifier.
Destroyed RDMA event channel.
[root@localhost ~]# ndctl create-namespace -fe namespace0.0 -a 4k
{
  "dev":"namespace0.0",
  "mode":"devdax",
  "map":"dev",
  "size":"7.87 GiB (8.45 GB)",
  "uuid":"743ec485-6c77-4323-90ca-5ad864a00e72",
  "daxregion":{
    "id":0,
    "size":"7.87 GiB (8.45 GB)",
    "align":4096,
    "devices":[
      {
        "chardev":"dax0.0",
        "size":"7.87 GiB (8.45 GB)"
      }
    ]
  },
  "numa_node":0
}

[root@localhost ~]# uname -a
Linux localhost.localdomain 4.18.20 #1 SMP Mon Jun 17 06:43:19 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Jacky



-Original Message-
From: Jacky Wu
Sent: Monday, June 17, 2019 4:58 PM
To: Yue Li ; Dan Williams 
Cc: Scargall, Steve ; linux-nvdimm@lists.01.org
Subject: RE: ndctl hangs after memory deregistration