Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Jeremy, Raghavendra

Thanks! That patch worked just fine for me as well.

cheers,
artem

On Mon, Dec 13, 2010 at 2:39 AM, Jeremy Stout wrote:
> I recompiled GlusterFS using the unaccepted patch and I haven't
> received any RDMA error messages yet. I'll run some benchmarking tests
> over the next couple of days to test the program's stability.
>
> Thank you.
>
> On Fri, Dec 10, 2010 at 12:22 AM, Raghavendra G wrote:
>> Hi Artem,
>>
>> you can check the maximum limits using the patch I had sent earlier in
>> the same thread. Also, the patch http://patches.gluster.com/patch/5844/
>> (which is not accepted yet) will check whether the number of cqe being
>> passed to ibv_create_cq is greater than the value allowed by the device
>> and, if so, it will try to create the CQ with the maximum limit allowed
>> by the device.
>>
>> regards,
>> - Original Message -
>> From: "Artem Trunov"
>> To: "Raghavendra G"
>> Cc: "Jeremy Stout" , gluster-users@gluster.org
>> Sent: Thursday, December 9, 2010 7:13:40 PM
>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>
>> [...]
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi all

To add some info:

1) I can query the adapter settings with "ibv_devinfo -v" and get these values.
2) I can vary max_cq via the ib_mthca param num_cq, but that doesn't affect max_cqe.

cheers
Artem.

On Fri, Dec 10, 2010 at 1:41 PM, Artem Trunov wrote:
> Hi, Raghavendra, Jeremy
>
> Thanks, I have tried with the patch and also with OFED 1.5.2 and got
> pretty much what Jeremy had:
>
> [2010-12-10 13:32:59.69007] E [rdma.c:2047:rdma_create_cq]
> rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
> 65408, max_cqe = 131071, max_mr = 131056
>
> Aren't these parameters configurable at some driver level? I am a bit
> new to the IB business, so I don't know...
>
> How do you suggest I proceed? Should I try the unaccepted patch?
>
> cheers
> Artem.
>
> On Fri, Dec 10, 2010 at 6:22 AM, Raghavendra G wrote:
>> [...]
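[Editor's note: assuming a standard OFED install with the libibverbs utilities, the two observations above correspond roughly to the commands below. The grep field names and the num_cq value are illustrative, not verified against a specific driver version.]

```shell
# Print the verbose device attributes and pick out the CQ limits
# (field names as typically printed by libibverbs' ibv_devinfo).
ibv_devinfo -v | grep -E 'max_cq|max_cqe'

# For mthca HCAs the number of CQs is a module parameter; reloading
# the driver with a larger value raises max_cq but, as Artem notes,
# not max_cqe. The value here is only an example.
modprobe -r ib_mthca && modprobe ib_mthca num_cq=131072
```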
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Raghavendra, Jeremy

Thanks, I have tried with the patch and also with OFED 1.5.2 and got
pretty much what Jeremy had:

[2010-12-10 13:32:59.69007] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056

Aren't these parameters configurable at some driver level? I am a bit
new to the IB business, so I don't know...

How do you suggest I proceed? Should I try the unaccepted patch?

cheers
Artem.

On Fri, Dec 10, 2010 at 6:22 AM, Raghavendra G wrote:
> [...]
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Raghavendra, Jeremy

This was a very interesting debugging thread to me, since I have the
same symptoms but am unsure of the origin. Please see the log for my
mount command at the end of this message.

I have installed 3.1.1. My OFED is 1.5.1 - does it make a serious
difference compared to the already mentioned 1.5.2?

On hardware limitations - I have a Mellanox InfiniHost III Lx 20Gb/s and
its specs say:

"Supports 16 million QPs, EEs & CQs"

Is this enough? How can I query the actual settings for max_cq, max_cqe?

In general, how should I proceed? What are my other debugging options?
Should I follow Jeremy's path of hacking the gluster code?

cheers
Artem.

Log:

-
[2010-12-09 15:15:53.847595] W [io-stats.c:1644:init] test-volume:
dangling volume. check volfile
[2010-12-09 15:15:53.847643] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.847657] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.858574] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-09 15:15:53.858805] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-09 15:15:53.858821] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-09 15:15:53.858893] E [rdma.c:4789:init]
test-volume-client-1: Failed to initialize IB Device
[2010-12-09 15:15:53.858909] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-09 15:15:53
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6[0x32aca302d0]
/lib64/libc.so.6(strcmp+0x0)[0x32aca79140]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so[0x2c4fef6c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x2f)[0x2c50013f]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x389)[0x3fcca0cac9]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0xfe)[0x3fcca1053e]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa1)[0x2b194f01]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(init+0x129)[0x2b1950d9]
/usr/lib64/libglusterfs.so.0(xlator_init+0x58)[0x3fcc617398]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_init+0x31)[0x3fcc640291]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x3fcc6403c8]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xfa)[0x40373a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc5)[0x406125]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x3fcca0f542]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x3fcca0f73d]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x3fcca0a95c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2ad6ef9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x170)[0x2ad6f130]
/usr/lib64/libglusterfs.so.0[0x3fcc637917]
/usr/sbin/glusterfs(main+0x39b)[0x40470b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32aca1d994]
/usr/sbin/glusterfs[0x402e29]

On Fri, Dec 3, 2010 at 1:53 PM, Raghavendra G wrote:
> From the logs it is evident that the reason for the completion queue
> creation failure is that the number of completion queue elements (in a
> completion queue) we had requested in ibv_create_cq, (1024 * send_count),
> is greater than the maximum supported by the ib hardware (max_cqe = 131071).
>
> - Original Message -
> From: "Jeremy Stout"
> To: "Raghavendra G"
> Cc: gluster-users@gluster.org
> Sent: Friday, December 3, 2010 4:20:04 PM
> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>
> I patched the source code and rebuilt GlusterFS. Here are the full logs:
>
> Server:
> [2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using
> /etc/glusterd as working directory
> [2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq]
> rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
> 65408, max_cqe = 131071, max_mr = 131056
> [2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq]
> rpc-transport/rdma: rdma.management: creation of send_cq failed
> [2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device]
> rpc-transport/rdma: rdma.management: could not create CQ
> [2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init]
> rpc-transport/rdma: could not create rdma device for mthca0
> [2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management:
> Failed to initialize IB Device
> [2010-12-03 07:08:55.953691] E
> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
> initialization failed
> [2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init]
> glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
> Given volfile:
> +--+
> 1: volume management
> 2: type mgmt/glusterd
> 3: option working-d