Re: NFS over RDMA crashing
On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise <sw...@opengridcomputing.com> wrote:

> On 3/8/2014 1:20 PM, Steve Wise wrote:
>> I removed your change and started debugging the original crash that
>> happens on top-o-tree. Seems like rq_next_page is screwed up. It should
>> always be >= rq_respages, yes? I added a BUG_ON() to assert this in
>> rdma_read_xdr() and we hit the BUG_ON(). Look:
>>
>> crash> svc_rqst.rq_next_page 0x8800b84e6000
>>   rq_next_page = 0x8800b84e6228
>> crash> svc_rqst.rq_respages 0x8800b84e6000
>>   rq_respages = 0x8800b84e62a8
>>
>> Any ideas Bruce/Tom?
>>
>> Guys, the patch below seems to fix the problem. Dunno if it is correct
>> though. What do you think?
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 0ce7552..6d62411 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>  		sge_no++;
>>  	}
>>  	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>> +	rqstp->rq_next_page = rqstp->rq_respages;
>>
>>  	/* We should never run out of SGE because the limit is defined to
>>  	 * support the max allowed RPC data length
>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
>>
>>  	/* rq_respages points one past arg pages */
>>  	rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>> +	rqstp->rq_next_page = rqstp->rq_respages;
>>
>>  	/* Create the reply and chunk maps */
>>  	offset = 0;
>
> While this patch avoids the crashing, it apparently isn't correct... I'm
> getting IO errors reading files over the mount. :)

I hit the same oops and tested your patch, and it seems to have fixed that
particular panic, but I still see a bunch of other mem corruption oopses
even with it. I'll look more closely at that when I get some time.

FWIW, I can easily reproduce that by simply doing something like:

    $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1

I'm not sure why you're not seeing any panics with your patch in place.
Perhaps it's due to hw differences between our test rigs.

The EIO problem that you're seeing is likely the same client bug that Chuck
recently fixed in this patch:

    [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA

AIUI, Trond is merging that set for 3.15, so I'd make sure your client has
those patches when testing.

Finally, I also have a forthcoming patch to fix non-page-aligned NFS READs
as well. I'm hesitant to send that out, though, until I can at least run
the connectathon testsuite against this server. The WRITE oopses sort of
prevent that for now...

-- 
Jeff Layton <jlay...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFS over RDMA crashing
On Mar 12, 2014, at 9:33, Jeff Layton <jlay...@redhat.com> wrote:

> [...]
>
> The EIO problem that you're seeing is likely the same client bug that
> Chuck recently fixed in this patch:
>
>     [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>
> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
> has those patches when testing.

Nothing is in my queue yet.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
Re: NFS over RDMA crashing
Hi Trond,

I think this patch is still 'off-by-one'. We'll take a look at this today.

Thanks,
Tom

On 3/12/14 9:05 AM, Trond Myklebust wrote:
> On Mar 12, 2014, at 9:33, Jeff Layton <jlay...@redhat.com> wrote:
>> [...]
>
> Nothing is in my queue yet.
Re: NFS over RDMA crashing
On Wed, 12 Mar 2014 10:05:24 -0400 Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> On Mar 12, 2014, at 9:33, Jeff Layton <jlay...@redhat.com> wrote:
>> [...]
>>
>> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
>> has those patches when testing.
>
> Nothing is in my queue yet.

Doh! Any reason not to merge that set from Chuck? They do fix a couple of
nasty client bugs...

-- 
Jeff Layton <jlay...@redhat.com>
Re: NFS over RDMA crashing
On Mar 12, 2014, at 10:28, Jeffrey Layton <jlay...@redhat.com> wrote:

> [...]
>
> Doh! Any reason not to merge that set from Chuck? They do fix a couple
> of nasty client bugs…

Most of them are one-line debugging dprintks which I do not intend to
apply. One of them confuses a readdir optimisation with a bugfix; at the
very least the patch comments need changing. That leaves 2 that can go in;
however, as they are clearly insufficient to make RDMA safe for general
use, they certainly do not warrant a stable@ label.

The workaround for the Oopses is simple: use TCP.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
Re: NFS over RDMA crashing
On Wed, 12 Mar 2014 11:03:52 -0400 Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> On Mar 12, 2014, at 10:28, Jeffrey Layton <jlay...@redhat.com> wrote:
>> [...]
>
> Most of them are one-line debugging dprintks which I do not intend to
> apply.

Fair enough. Those are certainly not necessary, but some of them clean up
existing printks and probably do need to go in. That said, debugging this
stuff is *really* difficult, so having extra debug printks in place seems
like a good thing (unless you're arguing for moving wholesale to
tracepoints instead).

> One of them confuses a readdir optimisation with a bugfix; at the very
> least the patch comments need changing.

I'll leave that to Chuck to comment on. I had the impression that it was a
bugfix, but maybe there's some better way to handle that bug.

> That leaves 2 that can go in, however as they are clearly insufficient to
> make RDMA safe for general use, they certainly do not warrant a stable@
> label. The workaround for the Oopses is simple: use TCP.

Yeah, it's definitely rickety, but it's in, and we do need to get fixes
merged to this code. I'm ok with dropping the stable labels on those
patches, but if we're going to declare this stuff not stable enough for
general use, then I think we should take an aggressive approach on merging
fixes to it.

FWIW, I also notice that Kconfig doesn't show the option to actually
enable/disable RDMA transports. I'll post a patch to fix that soon. Since
this stuff is not very safe to use, we should make it reasonably simple to
disable.

-- 
Jeff Layton <jlay...@redhat.com>
Re: NFS over RDMA crashing
On 3/7/2014 2:41 PM, Steve Wise wrote:
>> Does this help? They must have added this for some reason, but I'm not
>> seeing how it could have ever done anything...
>>
>> --b.
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 0ce7552..e8f25ec 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -520,13 +520,6 @@ next_sge:
>>  	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
>>  		rqstp->rq_pages[ch_no] = NULL;
>>
>> -	/*
>> -	 * Detach res pages. If svc_release sees any it will attempt to
>> -	 * put them.
>> -	 */
>> -	while (rqstp->rq_next_page != rqstp->rq_respages)
>> -		*(--rqstp->rq_next_page) = NULL;
>> -
>>  	return err;
>>  }
>
> I can reproduce this server crash readily on a recent net-next tree. I
> added the above change, and see a different crash:
>
> [ 192.764773] BUG: unable to handle kernel paging request at 1000
> [ 192.765688] IP: [8113c159] put_page+0x9/0x50
> [...]
> [ 192.765688] RIP [8113c159] put_page+0x9/0x50
> [ 192.765688] RSP 8801faa4be28
> [ 192.765688] CR2: 1000
>
> This new crash is here, calling put_page() on garbage I guess:
>
> static inline void svc_free_res_pages(struct svc_rqst *rqstp)
> {
> 	while (rqstp->rq_next_page != rqstp->rq_respages) {
> 		struct page **pp = --rqstp->rq_next_page;
> 		if (*pp) {
> 			put_page(*pp);
> 			*pp = NULL;
> 		}
> 	}
> }

I removed your change and started debugging the original crash that happens
on top-o-tree. Seems like rq_next_page is screwed up. It should always be
>= rq_respages, yes? I added a BUG_ON() to assert this in rdma_read_xdr()
and we hit the BUG_ON(). Look:
Re: NFS over RDMA crashing
I removed your change and started debugging the original crash that happens
on top-o-tree. Seems like rq_next_page is screwed up. It should always be
>= rq_respages, yes? I added a BUG_ON() to assert this in rdma_read_xdr()
and we hit the BUG_ON(). Look:

crash> svc_rqst.rq_next_page 0x8800b84e6000
  rq_next_page = 0x8800b84e6228
crash> svc_rqst.rq_respages 0x8800b84e6000
  rq_respages = 0x8800b84e62a8

Any ideas Bruce/Tom?

Guys, the patch below seems to fix the problem. Dunno if it is correct
though. What do you think?

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..6d62411 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 		sge_no++;
 	}
 	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
+	rqstp->rq_next_page = rqstp->rq_respages;

 	/* We should never run out of SGE because the limit is defined to
 	 * support the max allowed RPC data length
@@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,

 	/* rq_respages points one past arg pages */
 	rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
+	rqstp->rq_next_page = rqstp->rq_respages;

 	/* Create the reply and chunk maps */
 	offset = 0;
Re: NFS over RDMA crashing
On 3/8/2014 1:20 PM, Steve Wise wrote:
> I removed your change and started debugging the original crash that
> happens on top-o-tree. Seems like rq_next_page is screwed up. It should
> always be >= rq_respages, yes? I added a BUG_ON() to assert this in
> rdma_read_xdr() and we hit the BUG_ON(). Look:
>
> crash> svc_rqst.rq_next_page 0x8800b84e6000
>   rq_next_page = 0x8800b84e6228
> crash> svc_rqst.rq_respages 0x8800b84e6000
>   rq_respages = 0x8800b84e62a8
>
> Any ideas Bruce/Tom?
>
> Guys, the patch below seems to fix the problem. Dunno if it is correct
> though. What do you think?
>
> [...]

While this patch avoids the crashing, it apparently isn't correct... I'm
getting IO errors reading files over the mount. :)
RE: NFS over RDMA crashing
Resurrecting an old issue :) More inline below...

-----Original Message-----
From: linux-nfs-ow...@vger.kernel.org [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of J. Bruce Fields
Sent: Thursday, February 07, 2013 10:42 AM
To: Yan Burman
Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux-r...@vger.kernel.org; Or Gerlitz
Subject: Re: NFS over RDMA crashing

On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
>> When killing mount command that got stuck:
>> -------------------------------------------
>> BUG: unable to handle kernel paging request at 880324dc7ff8
>> IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161
>> Oops: 0003 [#1] PREEMPT SMP
>> Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
>> CPU 6
>> Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH
>> RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> RSP: 0018:880324c3dbf8 EFLAGS: 00010297
>> RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428
>> RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618
>> RBP: 880324c3dd78 R08: 60f9c860 R09: 0001
>> R10: 880324dd8000 R11: 0001 R12: 8806299dcb10
>> R13: 0003 R14: 0001 R15: 0010
>> FS: () GS:88063fc0() knlGS:
>> CS: 0010 DS: ES: CR0: 8005003b
>> CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0
>> DR0: DR1: DR2: DR3:
>> DR6: 0ff0 DR7: 0400
>> Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055)
>> Stack:
>>  880324c3dc78 880324c3dcd8 0282 880631cec000
>>  880324dd8000 88062ed33040 000124c3dc48 880324dd8000
>>  88062ed33058 880630ce2b90 8806299e8000 0003
>> Call Trace:
>>  [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
>>  [81086540] ? try_to_wake_up+0x2f0/0x2f0
>>  [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc]
>>  [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd]
>>  [a0571e5d] nfsd+0xad/0x130 [nfsd]
>>  [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd]
>>  [81071df6] kthread+0xd6/0xe0
>>  [81071d20] ? __init_kthread_worker+0x70/0x70
>>  [814b462c] ret_from_fork+0x7c/0xb0
>>  [81071d20] ? __init_kthread_worker+0x70/0x70
>> Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
>> RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>>  RSP 880324c3dbf8
>> CR2: 880324dc7ff8
>> ---[ end trace 06d0384754e9609a ]---
>>
>> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4:
>> cleanup: replace rq_resused count by rq_next_page pointer" is
>> responsible for the crash (it seems to be crashing in
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527).
>> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
>> CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>>
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was
>> no longer getting the server crashes, so the rest of my tests were done
>> using that point (it is somewhere in the middle of 3.7.0-rc2).
>
> OK, so this part's clearly my fault--I'll work on a patch, but the
> rdma's use of the ->rq_pages array is pretty confusing.

Does this help? They must have added this for some reason, but I'm not
seeing how it could have ever done anything...

--b.

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;

-	/*
-	 * Detach res pages
RE: NFS over RDMA crashing
Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything --b. diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce7552..e8f25ec 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -520,13 +520,6 @@ next_sge: for (ch_no = 0; rqstp-rq_pages[ch_no] rqstp-rq_respages; ch_no++) rqstp-rq_pages[ch_no] = NULL; - /* -* Detach res pages. If svc_release sees any it will attempt to -* put them. -*/ - while (rqstp-rq_next_page != rqstp-rq_respages) - *(--rqstp-rq_next_page) = NULL; - return err; } I can reproduce this server crash readily on a recent net-next tree. I added the above change, and see a different crash: [ 192.764773] BUG: unable to handle kernel paging request at 1000 [ 192.765688] IP: [8113c159] put_page+0x9/0x50 [ 192.765688] PGD 0 [ 192.765688] Oops: [#1] SMP DEBUG_PAGEALLOC [ 192.765688] Modules linked in: nfsd lockd nfs_acl exportfs auth_rpcgss oid_registry svcrdma tg3 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 iw_cxgb3 cxgb3 mdio ib_qib dca mlx4_en ib_mthca vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support dcdbas sg microcode pcspkr mlx4_ib ib_sa serio_raw ib_mad ib_core ib_addr ipv6 ptp pps_core lpc_ich mfd_core i5100_edac edac_core mlx4_core cxgb4 ext4 jbd2 mbcache sd_mod crc_t10dif crct10dif_common sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit [ 192.765688] i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tg3] [ 192.765688] CPU: 1 PID: 6590 Comm: nfsd Not tainted 3.14.0-rc3-pending+ #5 [ 192.765688] Hardware name: Dell Inc. 
PowerEdge R300/0TY179, BIOS 1.3.0 08/15/2008 [ 192.765688] task: 8800b75c62c0 ti: 8801faa4a000 task.ti: 8801faa4a000 [ 192.765688] RIP: 0010:[8113c159] [8113c159] put_page+0x9/0x50 [ 192.765688] RSP: 0018:8801faa4be28 EFLAGS: 00010206 [ 192.765688] RAX: 8801fa9542a8 RBX: 8801fa954000 RCX: 0001 [ 192.765688] RDX: 8801fa953e10 RSI: 0200 RDI: 1000 [ 192.765688] RBP: 8801faa4be28 R08: 9b8d39b9 R09: 0017 [ 192.765688] R10: R11: R12: 8800cb2e7c00 [ 192.765688] R13: 8801fa954210 R14: R15: [ 192.765688] FS: () GS:88022ec8() knlGS: [ 192.765688] CS: 0010 DS: ES: CR0: 8005003b [ 192.765688] CR2: 1000 CR3: b9a5a000 CR4: 07e0 [ 192.765688] Stack: [ 192.765688] 8801faa4be58 a0881f4e 880204dd0e00 8801fa954000 [ 192.765688] 880204dd0e00 8800cb2e7c00 8801faa4be88 a08825f5 [ 192.765688] 8801fa954000 8800b75c62c0 81ae5ac0 a08cf930 [ 192.765688] Call Trace: [ 192.765688] [a0881f4e] svc_xprt_release+0x6e/0xf0 [sunrpc] [ 192.765688] [a08825f5] svc_recv+0x165/0x190 [sunrpc] [ 192.765688] [a08cf930] ? nfsd_pool_stats_release+0x60/0x60 [nfsd] [ 192.765688] [a08cf9e5] nfsd+0xb5/0x160 [nfsd] [ 192.765688] [a08cf930] ? nfsd_pool_stats_release+0x60/0x60 [nfsd] [ 192.765688] [8107471e] kthread+0xce/0xf0 [ 192.765688] [81074650] ? kthread_freezable_should_stop+0x70/0x70 [ 192.765688] [81584e2c] ret_from_fork+0x7c/0xb0 [ 192.765688] [81074650] ? 
kthread_freezable_should_stop+0x70/0x70
[ 192.765688] Code: 8d 7b 10 e8 ea fa ff ff 48 c7 03 00 00 00 00 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 66 f7 07 00 c0 75 32 8b 47 1c 48 8d 57 1c 85 c0 74 1c f0 ff 0a
[ 192.765688] RIP [8113c159] put_page+0x9/0x50
[ 192.765688] RSP 8801faa4be28
[ 192.765688] CR2: 1000

This new crash is here, calling put_page() on garbage I guess:

static inline void svc_free_res_pages(struct svc_rqst *rqstp)
{
	while (rqstp->rq_next_page != rqstp->rq_respages) {
		struct page **pp = --rqstp->rq_next_page;
		if (*pp) {
			put_page(*pp);
			*pp = NULL;
		}
	}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: NFS over RDMA crashing
-Original Message- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Friday, February 15, 2013 17:28 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux- r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Mon, Feb 11, 2013 at 03:19:42PM +, Yan Burman wrote: -Original Message- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Thursday, February 07, 2013 18:42 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux- r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 
880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [81086540] ? try_to_wake_up+0x2f0/0x2f0 [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [a0571e5d] nfsd+0xad/0x130 [nfsd] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP 880324c3dbf8 CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e ("nfsd4: cleanup: replace rq_resused count by rq_next_page pointer") is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527). It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything? --b.
Re: NFS over RDMA crashing
On Mon, Feb 11, 2013 at 03:19:42PM +, Yan Burman wrote: -Original Message- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Thursday, February 07, 2013 18:42 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux- r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, 
task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [81086540] ? try_to_wake_up+0x2f0/0x2f0 [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [a0571e5d] nfsd+0xad/0x130 [nfsd] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP 880324c3dbf8 CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e ("nfsd4: cleanup: replace rq_resused count by rq_next_page pointer") is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527). It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything? --b.
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 	for (ch_no = 0; rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;
 
-	/*
-	 * Detach res pages. If svc_release sees any it will attempt to
-	 * put them.
-	 */
-	while (rqstp->rq_next_page != rqstp->rq_respages)
-		*(--rqstp->rq_next_page) = NULL;
-
 	return err;
 }
RE: NFS over RDMA crashing
-Original Message- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Thursday, February 07, 2013 18:42 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux- r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 
880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [81086540] ? try_to_wake_up+0x2f0/0x2f0 [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [a0571e5d] nfsd+0xad/0x130 [nfsd] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP 880324c3dbf8 CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e nfsd4: cleanup: replace rq_resused count by rq_next_page pointer is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527) It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the reset of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the -rq_pages array is pretty confusing. Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything --b. 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 	for (ch_no = 0; rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;
 
-	/*
-	 * Detach res pages. If svc_release sees any it will attempt to
-	 * put them.
-	 */
-	while (rqstp->rq_next_page != rqstp->rq_respages)
-		*(--rqstp->rq_next_page) = NULL;
-
 	return err;
 }
RE: NFS over RDMA crashing
-Original Message- From: Jeff Becker [mailto:jeffrey.c.bec...@nasa.gov] Sent: Wednesday, February 06, 2013 19:07 To: Steve Wise Cc: Yan Burman; bfie...@fieldses.org; linux-...@vger.kernel.org; linux- r...@vger.kernel.org; Or Gerlitz; Tom Tucker Subject: Re: NFS over RDMA crashing Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I tested that NFS/RDMA in OFED 3.5 works on kernel 3.5, and also the RHEL 6.3 kernel. However, I did not test it with SRIOV. If you test it (OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks. Jeff Becker On 02/06/2013 07:58 AM, Steve Wise wrote: On 2/6/2013 9:48 AM, Yan Burman wrote: When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2) +tom tucker I'd try going back a few kernels, like to 3.5.x, and see if things are more stable. If you find a point that works, then git bisect might help identify the regression. -- Vanilla 3.5.7 seems to work OK out of the box. I will try 3.6 next week + performance comparisons. Yan
Re: NFS over RDMA crashing
On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [81086540] ? try_to_wake_up+0x2f0/0x2f0 [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc] [a0571db0] ? 
nfsd_svc+0x740/0x740 [nfsd] [a0571e5d] nfsd+0xad/0x130 [nfsd] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP 880324c3dbf8 CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e ("nfsd4: cleanup: replace rq_resused count by rq_next_page pointer") is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527). It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything? --b.

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 	for (ch_no = 0; rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;
 
-	/*
-	 * Detach res pages. If svc_release sees any it will attempt to
-	 * put them.
-	 */
-	while (rqstp->rq_next_page != rqstp->rq_respages)
-		*(--rqstp->rq_next_page) = NULL;
-
 	return err;
 }
Re: NFS over RDMA crashing
On 2/6/13 3:28 PM, Steve Wise wrote: On 2/6/2013 4:24 PM, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [81086540] ? 
try_to_wake_up+0x2f0/0x2f0 [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [a0571e5d] nfsd+0xad/0x130 [nfsd] [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd] [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP 880324c3dbf8 CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e ("nfsd4: cleanup: replace rq_resused count by rq_next_page pointer") is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527). It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. Maybe Tom can shed some light? Yes, the RDMA transport has two confusing tweaks on rq_pages. Most transports (UDP/TCP) use the rq_pages allocated by SVC. For RDMA, however, the RQ already contains pre-allocated memory that will contain inbound NFS requests from the client. Instead of copying this data from the pre-registered receive buffer into the buffer in rq_pages, I just replace the page in rq_pages with the one that already contains the data. The second somewhat strange thing is that the NFS request contains an NFSRDMA header. This is just like TCP (i.e.
the 4B length), however, the difference is that (unlike TCP) this header is needed for the response, because it maps out where in the client the response data will be written. Tom
NFS over RDMA crashing
Hi. I have been trying to create a setup with NFS/RDMA, but I am getting crashes. I am using a Mellanox ConnectX-3 HCA with SRIOV enabled, with two KVM VMs running RHEL 6.3 getting one VF each. My test case is using one VM's storage from the other via NFS over RDMA (192.168.20.210 server, 192.168.20.211 client). I started with two physical hosts, but because of the crashes moved to VMs, which are easier to debug. I have a functional IPoIB connection between the two VMs, and rping works between them as well. My /etc/exports has the following entry:

/mnt/tmp *(fsid=1,rw,async,insecure,all_squash)

while /mnt/tmp has tmpfs mounted on it. My mount command is:

mount -t nfs -o rdma,port=2050 192.168.20.210:/mnt/tmp /mnt/tmp

I tried the latest net-next kernel first, but I was getting the following errors:

=============================================
[ INFO: possible recursive locking detected ]
3.8.0-rc5+ #4 Not tainted
---------------------------------------------
kworker/6:0/49 is trying to acquire lock:
 (&id_priv->handler_mutex){+.+.+.}, at: [a05e7813] rdma_destroy_id+0x33/0x250 [rdma_cm]
but task is already holding lock:
 (&id_priv->handler_mutex){+.+.+.}, at: [a05e317b] cma_disable_callback+0x2b/0x60 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/6:0/49:
 #0: (ib_cm){.+.+.+}, at: [81068f50] process_one_work+0x160/0x720
 #1: ((&(&work->work)->work)){+.+.+.}, at: [81068f50] process_one_work+0x160/0x720
 #2: (&id_priv->handler_mutex){+.+.+.}, at: [a05e317b] cma_disable_callback+0x2b/0x60 [rdma_cm]

stack backtrace:
Pid: 49, comm: kworker/6:0 Not tainted 3.8.0-rc5+ #4
Call Trace:
 [8109f99c] validate_chain+0xdcc/0x11f0
 [8109bdcf] ? save_trace+0x3f/0xc0
 [810a0760] __lock_acquire+0x440/0xc30
 [810a0760] ? __lock_acquire+0x440/0xc30
 [810a0fe5] lock_acquire+0x95/0x1e0
 [a05e7813] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
 [a05e7813] ?
rdma_destroy_id+0x33/0x250 [rdma_cm] [814a9aff] mutex_lock_nested+0x5f/0x3b0 [a05e7813] ? rdma_destroy_id+0x33/0x250 [rdma_cm] [8109d68d] ? trace_hardirqs_on_caller+0x10d/0x1a0 [8109d72d] ? trace_hardirqs_on+0xd/0x10 [814aca3d] ? _raw_spin_unlock_irqrestore+0x3d/0x80 [a05e7813] rdma_destroy_id+0x33/0x250 [rdma_cm] [a05e8f99] cma_req_handler+0x719/0x730 [rdma_cm] [814aca04] ? _raw_spin_unlock_irqrestore+0x4/0x80 [a05d5772] cm_process_work+0x22/0x170 [ib_cm] [a05d6acd] cm_req_handler+0x67d/0xa70 [ib_cm] [a05d6fed] cm_work_handler+0x12d/0x1218 [ib_cm] [81068fc2] process_one_work+0x1d2/0x720 [81068f50] ? process_one_work+0x160/0x720 [a05d6ec0] ? cm_req_handler+0xa70/0xa70 [ib_cm] [81069930] worker_thread+0x120/0x460 [814ab4b4] ? preempt_schedule+0x44/0x60 [81069810] ? manage_workers+0x300/0x300 [81071df6] kthread+0xd6/0xe0 [81071d20] ? __init_kthread_worker+0x70/0x70 [814b462c] ret_from_fork+0x7c/0xb0 [81071d20] ? __init_kthread_worker+0x70/0x70 When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH 
RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS:
Re: NFS over RDMA crashing
On 2/6/2013 9:48 AM, Yan Burman wrote:
> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2).

+tom tucker

I'd try going back a few kernels, like to 3.5.x, and see if things are more stable. If you find a point that works, then git bisect might help identify the regression.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
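git bisect, suggested above, is a binary search over the commit range between a known-good and a known-bad build. A plain-shell sketch of the idea, with a simulated history (the commit numbering and the test_commit stand-in are made up for illustration; in practice each step is "check out, build, run the NFS/RDMA test"):

```shell
#!/bin/sh
# Simulated history: commits 0..31, everything from commit 19 on is bad.
FIRST_BAD=19          # unknown to the search; used only inside test_commit

test_commit() {       # stand-in for "build kernel at $1, run the repro"
    [ "$1" -lt "$FIRST_BAD" ]
}

good=0                # known-good point (e.g. a 3.5.x kernel)
bad=31                # known-bad point (e.g. 3.8.0-rc5+)
while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if test_commit "$mid"; then
        good=$mid     # no crash: first bad commit is later
    else
        bad=$mid      # crash: first bad commit is here or earlier
    fi
done
echo "first bad commit: $bad"
```

With real git the same loop is `git bisect start; git bisect bad; git bisect good v3.5`, then `git bisect good`/`git bisect bad` after each test until git names the first bad commit.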
Re: NFS over RDMA crashing
Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I tested that NFS/RDMA in OFED 3.5 works on kernel 3.5 and also on the RHEL 6.3 kernel. However, I did not test it with SR-IOV. If you test it (OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks.

Jeff Becker

On 02/06/2013 07:58 AM, Steve Wise wrote:
> On 2/6/2013 9:48 AM, Yan Burman wrote:
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2).
>
> +tom tucker
>
> I'd try going back a few kernels, like to 3.5.x, and see if things are more stable. If you find a point that works, then git bisect might help identify the regression.
Re: NFS over RDMA crashing
On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> When killing the mount command that got stuck:
>
> BUG: unable to handle kernel paging request at 880324dc7ff8
> IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> Oops: 0003 [#1] PREEMPT SMP
> [...]
> Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH
> RIP: 0010:[a05f3dfb] [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> [...]
> Call Trace:
>  [a05f466e] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
>  [81086540] ? try_to_wake_up+0x2f0/0x2f0
>  [a045963f] svc_recv+0x3ef/0x4b0 [sunrpc]
>  [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd]
>  [a0571e5d] nfsd+0xad/0x130 [nfsd]
>  [a0571db0] ? nfsd_svc+0x740/0x740 [nfsd]
>  [81071df6] kthread+0xd6/0xe0
>  [81071d20] ? __init_kthread_worker+0x70/0x70
>  [814b462c] ret_from_fork+0x7c/0xb0
>  [81071d20] ? __init_kthread_worker+0x70/0x70
> Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 48 c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
> RIP [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> RSP 880324c3dbf8
> CR2: 880324dc7ff8
> ---[ end trace 06d0384754e9609a ]---
>
> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer" is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527).
>
> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>
> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2).

OK, so this part's clearly my fault--I'll work on a patch, but the rdma code's use of the ->rq_pages array is pretty confusing.

--b.
Re: NFS over RDMA crashing
On 2/6/2013 4:24 PM, J. Bruce Fields wrote:
> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
>> When killing the mount command that got stuck:
>>
>> BUG: unable to handle kernel paging request at 880324dc7ff8
>> IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> [...]
>>
>> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer" is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527).
>>
>> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>>
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2).
>
> OK, so this part's clearly my fault--I'll work on a patch, but the rdma code's use of the ->rq_pages array is pretty confusing.

Maybe Tom can shed some light?