Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Jeremy, Raghavendra

Thanks! That patch worked just fine for me as well.

cheers, artem

On Mon, Dec 13, 2010 at 2:39 AM, Jeremy Stout stout.jer...@gmail.com wrote:
I recompiled GlusterFS using the unaccepted patch and I haven't received any RDMA error messages yet. I'll run some benchmarking tests over the next couple of days to test the program's stability. Thank you.

On Fri, Dec 10, 2010 at 12:22 AM, Raghavendra G raghaven...@gluster.com wrote:
Hi Artem, you can check the maximum limits using the patch I had sent earlier in the same thread. Also, the patch http://patches.gluster.com/patch/5844/ (which is not accepted yet) will check whether the number of CQEs being passed to ibv_create_cq is greater than the value allowed by the device, and if so, it will try to create the CQ with the maximum limit allowed by the device. regards,
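To make the behaviour Raghavendra describes concrete: the idea is to compare the requested CQE count against the device's advertised max_cqe before calling ibv_create_cq, and fall back to the device maximum if the request is too large. The following is only a rough sketch of that logic, not the actual patch from http://patches.gluster.com/patch/5844/; the function and variable names are illustrative.

/* Sketch only -- illustrates the fallback described above, not the real patch. */
#include <stdio.h>
#include <infiniband/verbs.h>

static struct ibv_cq *
create_cq_with_fallback (struct ibv_context *ctx, int requested_cqe)
{
        struct ibv_device_attr attr;

        /* ask the HCA what it actually supports */
        if (ibv_query_device (ctx, &attr) != 0)
                return NULL;

        if (requested_cqe > attr.max_cqe) {
                fprintf (stderr, "requested %d CQEs, device allows %d; "
                         "retrying with the device maximum\n",
                         requested_cqe, attr.max_cqe);
                requested_cqe = attr.max_cqe;
        }

        /* completion channel and comp_vector omitted for brevity */
        return ibv_create_cq (ctx, requested_cqe, NULL, NULL, 0);
}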
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
I recompiled GlusterFS using the unaccepted patch and I haven't received any RDMA error messages yet. I'll run some benchmarking tests over the next couple of days to test the program's stability. Thank you.
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Raghavendra, Jeremy

Thanks, I have tried with the patch and also with OFED 1.5.2, and got pretty much what Jeremy had:

[2010-12-10 13:32:59.69007] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056

Aren't these parameters configurable at some driver level? I am a bit new to the IB business, so I don't know... How do you suggest I proceed? Should I try the unaccepted patch?

cheers
Artem.
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi all

To add some info:

1) I can query the adapter settings with ibv_devinfo -v and get these values.
2) I can vary max_cq via the ib_mthca module parameter num_cq, but that doesn't affect max_cqe.

cheers
Artem.
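As a side note, the same limits that ibv_devinfo -v prints (max_cq, max_cqe, max_mr, ...) can also be read programmatically through libibverbs. A minimal sketch, assuming the first device on the host (e.g. mthca0) and linking with -libverbs:

#include <stdio.h>
#include <infiniband/verbs.h>

int
main (void)
{
        int                      num  = 0;
        struct ibv_device      **devs = ibv_get_device_list (&num);
        struct ibv_context      *ctx  = NULL;
        struct ibv_device_attr   attr;

        if (!devs || num == 0)
                return 1;

        ctx = ibv_open_device (devs[0]);      /* first HCA on the host */
        if (!ctx || ibv_query_device (ctx, &attr) != 0)
                return 1;

        printf ("max_cq = %d, max_cqe = %d, max_mr = %d\n",
                attr.max_cq, attr.max_cqe, attr.max_mr);

        ibv_close_device (ctx);
        ibv_free_device_list (devs);
        return 0;
}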
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi, Raghavendra, Jeremy

This was a very interesting debugging thread for me, since I have the same symptoms but am unsure of the origin. Please see the log for my mount command at the end of this message.

I have installed 3.1.1. My OFED is 1.5.1 - does it make a serious difference compared to the already mentioned 1.5.2?

On hardware limitations - I have a Mellanox InfiniHost III Lx 20Gb/s and its specs say: "Supports 16 million QPs, EEs CQs". Is this enough? How can I query the actual settings for max_cq, max_cqe?

In general, how should I proceed? What are my other debugging options? Should I try to go the Jeremy path of hacking the gluster code?

cheers
Artem.

Log:
-
[2010-12-09 15:15:53.847595] W [io-stats.c:1644:init] test-volume: dangling volume. check volfile
[2010-12-09 15:15:53.847643] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.847657] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.858574] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-09 15:15:53.858805] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-09 15:15:53.858821] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-09 15:15:53.858893] E [rdma.c:4789:init] test-volume-client-1: Failed to initialize IB Device
[2010-12-09 15:15:53.858909] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-09 15:15:53
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6[0x32aca302d0]
/lib64/libc.so.6(strcmp+0x0)[0x32aca79140]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so[0x2c4fef6c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x2f)[0x2c50013f]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x389)[0x3fcca0cac9]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0xfe)[0x3fcca1053e]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa1)[0x2b194f01]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(init+0x129)[0x2b1950d9]
/usr/lib64/libglusterfs.so.0(xlator_init+0x58)[0x3fcc617398]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_init+0x31)[0x3fcc640291]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x3fcc6403c8]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xfa)[0x40373a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc5)[0x406125]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x3fcca0f542]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x3fcca0f73d]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x3fcca0a95c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2ad6ef9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x170)[0x2ad6f130]
/usr/lib64/libglusterfs.so.0[0x3fcc637917]
/usr/sbin/glusterfs(main+0x39b)[0x40470b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32aca1d994]
/usr/sbin/glusterfs[0x402e29]

On Fri, Dec 3, 2010 at 1:53 PM, Raghavendra G raghaven...@gluster.com wrote:
From the logs it's evident that the reason for the completion queue creation failure is that the number of completion queue elements (in a completion queue) we had requested in ibv_create_cq (1024 * send_count) is greater than the maximum supported by the IB hardware (max_cqe = 131071).
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
-behind
10: subvolumes testdir-client-0
11: end-volume
12:
13: volume testdir-read-ahead
14: type performance/read-ahead
15: subvolumes testdir-write-behind
16: end-volume
17:
18: volume testdir-io-cache
19: type performance/io-cache
20: subvolumes testdir-read-ahead
21: end-volume
22:
23: volume testdir-quick-read
24: type performance/quick-read
25: subvolumes testdir-io-cache
26: end-volume
27:
28: volume testdir-stat-prefetch
29: type performance/stat-prefetch
30: subvolumes testdir-quick-read
31: end-volume
32:
33: volume testdir
34: type debug/io-stats
35: subvolumes testdir-stat-prefetch
36: end-volume
+--+

On Fri, Dec 3, 2010 at 12:38 AM, Raghavendra G raghaven...@gluster.com wrote:
Hi Jeremy, Can you apply the attached patch, rebuild and start glusterfs? Please make sure to send us the logs of glusterfs. regards,
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
From the logs it's evident that the reason for the completion queue creation failure is that the number of completion queue elements (in a completion queue) we had requested in ibv_create_cq (1024 * send_count) is greater than the maximum supported by the IB hardware (max_cqe = 131071).

- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: Raghavendra G raghaven...@gluster.com
Cc: gluster-users@gluster.org
Sent: Friday, December 3, 2010 4:20:04 PM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

I patched the source code and rebuilt GlusterFS. Here are the full logs:

Server:
[2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: rdma.management: creation of send_cq failed
[2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: rdma.management: could not create CQ
[2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management: Failed to initialize IB Device
[2010-12-03 07:08:55.953691] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
[2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init] glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
Given volfile:
+--+
1: volume management
2: type mgmt/glusterd
3: option working-directory /etc/glusterd
4: option transport-type socket,rdma
5: option transport.socket.keepalive-time 10
6: option transport.socket.keepalive-interval 2
7: end-volume
8:
+--+
[2010-12-03 07:09:10.244790] I [glusterd-handler.c:785:glusterd_handle_create_volume] glusterd: Received create volume req
[2010-12-03 07:09:10.247646] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:10.247678] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-12-03 07:09:10.247708] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-12-03 07:09:10.248038] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:10.251970] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:10.252020] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
[2010-12-03 07:09:10.252036] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-12-03 07:09:22.11649] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir
[2010-12-03 07:09:22.11724] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:22.11734] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-12-03 07:09:22.11761] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-12-03 07:09:22.12120] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:22.184403] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick pgh-submit-1:/mnt/gluster
[2010-12-03 07:09:22.229143] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:22.229198] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
[2010-12-03 07:09:22.229218] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-12-03 07:09:22.240157] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009

Client:
[2010-12-03 07:09:00.82784] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
[2010-12-03 07:09:00.82824] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.82836] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.85980] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:09:00.92883] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-03 07:09:00.93156] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-03 07:09:00.93224] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:09:00.93313] E [rdma.c
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi Jeremy, can you also get the output of, #uname -a #ulimit -l regards, - Original Message - From: Raghavendra G raghaven...@gluster.com To: Jeremy Stout stout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:20:04 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, In order to diagnoise why completion queue creation is failing (as indicated by logs), we want to know what was the free memory available in your system when glusterfs was started. regards, - Original Message - From: Raghavendra G raghaven...@gluster.com To: Jeremy Stout stout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:11:18 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, Yes, there might be some performance decrease. But, it should not affect working of rdma. regards, - Original Message - From: Jeremy Stout stout.jer...@gmail.com To: gluster-users@gluster.org Sent: Thursday, December 2, 2010 8:30:20 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors. To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c: options-send_count = 32; options-recv_count = 32; The values were set to 128. I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32? Will there be a decrease in performance? On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote: Here are the results of the test: submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong local address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID :: local address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID :: local address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID :: local address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID :: local address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID :: local address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID :: local address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID :: local address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID :: local address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID :: local address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID :: local address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID :: local address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID :: local address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID :: local address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID :: local address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID :: local address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID :: remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID :: remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID :: remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID :: remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID :: remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID :: remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID :: remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID :: remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID :: remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID :: remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID :: remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID :: remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID :: remote address: LID 0x000b, QPN 
0x000412, PSN 0x0870eb, GID :: remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID :: remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID :: remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID :: 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec 1000 iters in 0.01 seconds = 11.07 usec/iter fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1 local address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID :: local address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID :: local address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID :: local address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID :: local address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID :: local address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID :: local address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID :: local address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID :: local address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID :: local address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID :: local address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID :: local address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID :: local address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID :: local address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID :: local address
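For anyone following along: the ulimit -l value Raghavendra asks for above is the per-process locked-memory limit (in KiB), which matters because the verbs library pins memory for completion queues and registered regions. Where it comes back small (Jeremy later reports 64), it is commonly raised through /etc/security/limits.conf; the entry below is an example only, and the user field should be adjusted to whatever account runs glusterd/glusterfs.

# /etc/security/limits.conf -- example only; takes effect at the next login
# raise the locked-memory limit for processes doing RDMA
*    soft    memlock    unlimited
*    hard    memlock    unlimited

Note that Jeremy reports further down that raising memlock alone did not make CQ creation succeed with send/receive counts of 128, so this is a prerequisite rather than the fix.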
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
An another follow-up, I tested several compilations today with different values for send/receive count. I found the maximum value I could use for both variables was 127. With a value of 127, GlusterFS did not produce any errors. However, when I changed the value back to 128, the RDMA errors appeared again. I also tried setting soft/hard memlock to unlimited in the limits.conf file, but still ran into RDMA errors on the client side when the count variables were set to 128. On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout stout.jer...@gmail.com wrote: Thank you for the response. I've been testing GlusterFS 3.1.1 on two different OpenSUSE 11.3 systems. Since both systems generated the same error messages, I'll include the output for both. System #1: fs-1:~ # cat /proc/meminfo MemTotal: 16468756 kB MemFree: 16126680 kB Buffers: 15680 kB Cached: 155860 kB SwapCached: 0 kB Active: 65228 kB Inactive: 123100 kB Active(anon): 18632 kB Inactive(anon): 48 kB Active(file): 46596 kB Inactive(file): 123052 kB Unevictable: 1988 kB Mlocked: 1988 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 30072 kB Writeback: 4 kB AnonPages: 18780 kB Mapped: 12136 kB Shmem: 220 kB Slab: 39592 kB SReclaimable: 13108 kB SUnreclaim: 26484 kB KernelStack: 2360 kB PageTables: 2036 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 8234376 kB Committed_AS: 107304 kB VmallocTotal: 34359738367 kB VmallocUsed: 314316 kB VmallocChunk: 34349860776 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 9856 kB DirectMap2M: 3135488 kB DirectMap1G: 13631488 kB fs-1:~ # uname -a Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux fs-1:~ # ulimit -l 64 System #2: submit-1:~ # cat /proc/meminfo MemTotal: 16470424 kB MemFree: 16197292 kB Buffers: 11788 kB Cached: 85492 kB SwapCached: 0 kB Active: 39120 kB Inactive: 76548 kB Active(anon): 18532 kB Inactive(anon): 48 kB Active(file): 20588 kB Inactive(file): 76500 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 67100656 kB SwapFree: 67100656 kB Dirty: 24 kB Writeback: 0 kB AnonPages: 18408 kB Mapped: 11644 kB Shmem: 184 kB Slab: 34000 kB SReclaimable: 8512 kB SUnreclaim: 25488 kB KernelStack: 2160 kB PageTables: 1952 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 75335868 kB Committed_AS: 105620 kB VmallocTotal: 34359738367 kB VmallocUsed: 76416 kB VmallocChunk: 34359652640 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 7488 kB DirectMap2M: 16769024 kB submit-1:~ # uname -a Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00 EST 2010 x86_64 x86_64 x86_64 GNU/Linux submit-1:~ # ulimit -l 64 I retrieved the memory information on each machine after starting the glusterd process. On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra G raghaven...@gluster.com wrote: Hi Jeremy, can you also get the output of, #uname -a #ulimit -l regards, - Original Message - From: Raghavendra G raghaven...@gluster.com To: Jeremy Stout stout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:20:04 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, In order to diagnoise why completion queue creation is failing (as indicated by logs), we want to know what was the free memory available in your system when glusterfs was started. 
regards, - Original Message - From: Raghavendra G raghaven...@gluster.com To: Jeremy Stout stout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:11:18 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, Yes, there might be some performance decrease. But, it should not affect working of rdma. regards, - Original Message - From: Jeremy Stout stout.jer...@gmail.com To: gluster-users@gluster.org Sent: Thursday, December 2, 2010 8:30:20 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any
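Jeremy's 127-versus-128 observation lines up with Raghavendra's explanation elsewhere in the thread that the completion queue is sized as 1024 * send_count, and with the device limit reported in the logs (max_cqe = 131071):

1024 * 128 = 131072  >  131071 (max_cqe)  ->  ibv_create_cq fails
1024 * 127 = 130048  <= 131071 (max_cqe)  ->  CQ creation succeeds

This is why dropping the counts to 127 (or 32) avoids the error, and why the later patch that caps the request at the device maximum also works.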
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Jeremy - What version of OFED are you running? Would you mind install version 1.5.2 from source? We have seen this resolve several issues of this type. http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/ Thanks, Craig -- Craig Carl Senior Systems Engineer Gluster On 12/02/2010 10:05 AM, Jeremy Stout wrote: An another follow-up, I tested several compilations today with different values for send/receive count. I found the maximum value I could use for both variables was 127. With a value of 127, GlusterFS did not produce any errors. However, when I changed the value back to 128, the RDMA errors appeared again. I also tried setting soft/hard memlock to unlimited in the limits.conf file, but still ran into RDMA errors on the client side when the count variables were set to 128. On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stoutstout.jer...@gmail.com wrote: Thank you for the response. I've been testing GlusterFS 3.1.1 on two different OpenSUSE 11.3 systems. Since both systems generated the same error messages, I'll include the output for both. System #1: fs-1:~ # cat /proc/meminfo MemTotal: 16468756 kB MemFree:16126680 kB Buffers: 15680 kB Cached: 155860 kB SwapCached:0 kB Active:65228 kB Inactive: 123100 kB Active(anon): 18632 kB Inactive(anon): 48 kB Active(file): 46596 kB Inactive(file): 123052 kB Unevictable:1988 kB Mlocked:1988 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 30072 kB Writeback: 4 kB AnonPages: 18780 kB Mapped:12136 kB Shmem: 220 kB Slab: 39592 kB SReclaimable: 13108 kB SUnreclaim:26484 kB KernelStack:2360 kB PageTables: 2036 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit: 8234376 kB Committed_AS: 107304 kB VmallocTotal: 34359738367 kB VmallocUsed: 314316 kB VmallocChunk: 34349860776 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k:9856 kB DirectMap2M: 3135488 kB DirectMap1G:13631488 kB fs-1:~ # uname -a Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux fs-1:~ # ulimit -l 64 System #2: submit-1:~ # cat /proc/meminfo MemTotal: 16470424 kB MemFree:16197292 kB Buffers: 11788 kB Cached:85492 kB SwapCached:0 kB Active:39120 kB Inactive: 76548 kB Active(anon): 18532 kB Inactive(anon): 48 kB Active(file): 20588 kB Inactive(file):76500 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 67100656 kB SwapFree: 67100656 kB Dirty:24 kB Writeback: 0 kB AnonPages: 18408 kB Mapped:11644 kB Shmem: 184 kB Slab: 34000 kB SReclaimable: 8512 kB SUnreclaim:25488 kB KernelStack:2160 kB PageTables: 1952 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:75335868 kB Committed_AS: 105620 kB VmallocTotal: 34359738367 kB VmallocUsed: 76416 kB VmallocChunk: 34359652640 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k:7488 kB DirectMap2M:16769024 kB submit-1:~ # uname -a Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00 EST 2010 x86_64 x86_64 x86_64 GNU/Linux submit-1:~ # ulimit -l 64 I retrieved the memory information on each machine after starting the glusterd process. 
On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra Graghaven...@gluster.com wrote: Hi Jeremy, can you also get the output of, #uname -a #ulimit -l regards, - Original Message - From: Raghavendra Graghaven...@gluster.com To: Jeremy Stoutstout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:20:04 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, In order to diagnoise why completion queue creation is failing (as indicated by logs), we want to know what was the free memory available in your system when glusterfs was started. regards, - Original Message - From: Raghavendra Graghaven...@gluster.com To: Jeremy Stoutstout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:11:18 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, Yes, there might be some performance decrease. But, it should not affect working of rdma. regards, - Original Message - From: Jeremy Stoutstout.jer...@gmail.com To: gluster-users@gluster.org Sent: Thursday, December 2, 2010 8:30:20 AM Subject: Re
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
: 11788 kB Cached: 85492 kB SwapCached: 0 kB Active: 39120 kB Inactive: 76548 kB Active(anon): 18532 kB Inactive(anon): 48 kB Active(file): 20588 kB Inactive(file): 76500 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 67100656 kB SwapFree: 67100656 kB Dirty: 24 kB Writeback: 0 kB AnonPages: 18408 kB Mapped: 11644 kB Shmem: 184 kB Slab: 34000 kB SReclaimable: 8512 kB SUnreclaim: 25488 kB KernelStack: 2160 kB PageTables: 1952 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 75335868 kB Committed_AS: 105620 kB VmallocTotal: 34359738367 kB VmallocUsed: 76416 kB VmallocChunk: 34359652640 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 7488 kB DirectMap2M: 16769024 kB submit-1:~ # uname -a Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00 EST 2010 x86_64 x86_64 x86_64 GNU/Linux submit-1:~ # ulimit -l 64 I retrieved the memory information on each machine after starting the glusterd process. On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra Graghaven...@gluster.com wrote: Hi Jeremy, can you also get the output of, #uname -a #ulimit -l regards, - Original Message - From: Raghavendra Graghaven...@gluster.com To: Jeremy Stoutstout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:20:04 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, In order to diagnoise why completion queue creation is failing (as indicated by logs), we want to know what was the free memory available in your system when glusterfs was started. regards, - Original Message - From: Raghavendra Graghaven...@gluster.com To: Jeremy Stoutstout.jer...@gmail.com Cc: gluster-users@gluster.org Sent: Thursday, December 2, 2010 10:11:18 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 Hi Jeremy, Yes, there might be some performance decrease. But, it should not affect working of rdma. regards, - Original Message - From: Jeremy Stoutstout.jer...@gmail.com To: gluster-users@gluster.org Sent: Thursday, December 2, 2010 8:30:20 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors. To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c: options-send_count = 32; options-recv_count = 32; The values were set to 128. I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32? Will there be a decrease in performance? 
On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stoutstout.jer...@gmail.com wrote: Here are the results of the test: submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong local address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID :: local address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID :: local address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID :: local address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID :: local address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID :: local address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID :: local address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID :: local address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID :: local address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID :: local address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID :: local address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID :: local address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID :: local address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID :: local address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID :: local address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID :: local address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID :: remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID :: remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID :: remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID :: remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID :: remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID :: remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID :: remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID :: remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID :: remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID :: remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID :: remote
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi Jeremy, Can you apply the attached patch, rebuild and start glusterfs? Please make sure to send us the logs of glusterfs. regards, - Original Message - From: Jeremy Stout stout.jer...@gmail.com To: gluster-users@gluster.org Sent: Friday, December 3, 2010 6:38:00 AM Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1 I'm currently using OFED 1.5.2. For the sake of testing, I just compiled GlusterFS 3.1.1 from source, without any modifications, on two systems that have a 2.6.33.7 kernel and OFED 1.5.2 built from source. Here are the results: Server: [2010-12-02 21:17:55.886563] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir [2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec [2010-12-02 21:17:55.886607] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock [2010-12-02 21:17:55.886628] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers [2010-12-02 21:17:55.887031] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers [2010-12-02 21:17:56.60427] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick submit-1:/mnt/gluster [2010-12-02 21:17:56.104896] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers [2010-12-02 21:17:56.104935] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers [2010-12-02 21:17:56.104953] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock [2010-12-02 21:17:56.114764] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009 Client: [2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir: dangling volume. check volfile [2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil) [2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil) [2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed [2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ [2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0 [2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device [2010-12-02 21:17:25.543830] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed Thank you for the help so far. On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl cr...@gluster.com wrote: Jeremy - What version of OFED are you running? Would you mind install version 1.5.2 from source? We have seen this resolve several issues of this type. http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/ Thanks, Craig -- Craig Carl Senior Systems Engineer Gluster On 12/02/2010 10:05 AM, Jeremy Stout wrote: An another follow-up, I tested several compilations today with different values for send/receive count. I found the maximum value I could use for both variables was 127. With a value of 127, GlusterFS did not produce any errors. However, when I changed the value back to 128, the RDMA errors appeared again. I also tried setting soft/hard memlock to unlimited in the limits.conf file, but still ran into RDMA errors on the client side when the count variables were set to 128. 
On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stoutstout.jer...@gmail.com wrote: Thank you for the response. I've been testing GlusterFS 3.1.1 on two different OpenSUSE 11.3 systems. Since both systems generated the same error messages, I'll include the output for both. System #1: fs-1:~ # cat /proc/meminfo MemTotal: 16468756 kB MemFree: 16126680 kB Buffers: 15680 kB Cached: 155860 kB SwapCached: 0 kB Active: 65228 kB Inactive: 123100 kB Active(anon): 18632 kB Inactive(anon): 48 kB Active(file): 46596 kB Inactive(file): 123052 kB Unevictable: 1988 kB Mlocked: 1988 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 30072 kB Writeback: 4 kB AnonPages: 18780 kB Mapped: 12136 kB Shmem: 220 kB Slab: 39592 kB SReclaimable: 13108 kB SUnreclaim: 26484 kB KernelStack: 2360 kB PageTables: 2036 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 8234376 kB Committed_AS: 107304 kB VmallocTotal: 34359738367 kB VmallocUsed: 314316 kB VmallocChunk: 34349860776 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k
[Gluster-users] RDMA Problems with GlusterFS 3.1.1
Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses RDMA, I'm seeing the following error messages in the log file on the server:

[2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
[2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
[2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed

On the client, I see:

[2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
[2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
[2010-11-30 18:43:49.736841] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed

This results in an unsuccessful mount. I created the mount using the following commands:

/usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir transport rdma submit-1:/exports
/usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir

To mount the directory, I use:

mount -t glusterfs submit-1:/testdir /mnt/glusterfs

I don't think it is an Infiniband problem since GlusterFS 3.0.6 and GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the commands listed above produced no error messages. If anyone can provide help with debugging these error messages, it would be appreciated.
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Can you verify that ibv_srq_pingpong works from the server where this log file is from?

Thanks,
Avati

On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout stout.jer...@gmail.com wrote: Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses RDMA, I'm seeing the following error messages in the log file on the server:
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Here are the results of the test:

submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
local address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
local address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
local address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
local address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
local address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
local address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
local address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
local address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
local address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
local address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
local address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
local address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
local address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
local address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
local address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
local address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
1000 iters in 0.01 seconds = 11.07 usec/iter

fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
local address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
local address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
local address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
local address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
local address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
local address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
local address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
local address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
local address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
local address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
local address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
local address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
local address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
local address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
local address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
local address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
1000 iters in 0.01 seconds = 8.83 usec/iter

Based on the output, I believe it ran correctly.

On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati anand.av...@gmail.com wrote: Can you verify that ibv_srq_pingpong works from the server where this log file is from? Thanks, Avati

On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout stout.jer...@gmail.com wrote: Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses RDMA, I'm seeing the following error messages in the log file on the server:
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors. To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c:

options->send_count = 32;
options->recv_count = 32;

The values were previously set to 128. I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32? Will there be a decrease in performance?

On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote: Here are the results of the test:
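Reducing the hard-coded send/recv counts shrinks whatever CQ depth the transport derives from them, which is presumably why creation now succeeds on this hardware. A more general workaround, sketched below, would be to clamp the requested depth to what the device reports before calling ibv_create_cq. This is a hypothetical helper only, not the actual rdma.c code; wanted_cqe stands in for whatever depth the transport computes.

/* Sketch only: clamp a requested CQ depth to the device-reported maximum
 * before creating the CQ. Hypothetical helper, not GlusterFS's rdma.c. */
static struct ibv_cq *
create_cq_clamped(struct ibv_context *ctx, int wanted_cqe)
{
    struct ibv_device_attr attr;

    if (ibv_query_device(ctx, &attr) == 0 && wanted_cqe > attr.max_cqe)
        wanted_cqe = attr.max_cqe;   /* fall back to the device maximum */

    return ibv_create_cq(ctx, wanted_cqe, NULL, NULL, 0);
}

With something along these lines, the send/recv counts could stay at their defaults and hardware with smaller limits would simply get a shallower CQ.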
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi Jeremy,

Yes, there might be some performance decrease. But it should not affect the working of rdma.

regards,

- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 8:30:20 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors.
Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
Hi Jeremy,

In order to diagnose why completion queue creation is failing (as indicated by the logs), we would like to know how much free memory was available on your system when glusterfs was started.

regards,

- Original Message -
From: Raghavendra G raghaven...@gluster.com
To: Jeremy Stout stout.jer...@gmail.com
Cc: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 10:11:18 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

Hi Jeremy, Yes, there might be some performance decrease. But, it should not affect working of rdma.