Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-13 Thread Artem Trunov
Hi, Jeremy, Raghavendra

Thanks! That patch worked just fine for me as well.

cheers,
artem

On Mon, Dec 13, 2010 at 2:39 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 I recompiled GlusterFS using the unaccepted patch and I haven't
 received any RDMA error messages yet. I'll run some benchmarking tests
 over the next couple of days to test the program's stability.

 Thank you.

 On Fri, Dec 10, 2010 at 12:22 AM, Raghavendra G raghaven...@gluster.com 
 wrote:
 Hi Artem,

 you can check the maximum limits using the patch I had sent earlier in the 
 same thread. Also, the patch
 http://patches.gluster.com/patch/5844/ (which is not accepted yet) will 
 check whether the number of CQEs being passed to ibv_create_cq is 
 greater than the value allowed by the device and, if so, it will try to 
 create the CQ with the maximum limit allowed by the device.
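
 A minimal libibverbs sketch of the clamping behaviour described above (an
 illustration only, not the actual patch at that URL; it assumes an
 already-opened struct ibv_context):

 #include <infiniband/verbs.h>

 /* Clamp the requested CQE count to the device maximum before creating the CQ. */
 static struct ibv_cq *create_cq_clamped(struct ibv_context *ctx, int wanted_cqe)
 {
         struct ibv_device_attr attr;

         if (ibv_query_device(ctx, &attr) != 0)
                 return NULL;

         if (wanted_cqe > attr.max_cqe)  /* e.g. 1024 * 128 exceeds mthca's max_cqe */
                 wanted_cqe = attr.max_cqe;

         return ibv_create_cq(ctx, wanted_cqe, NULL, NULL, 0);
 }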

 regards,

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-12 Thread Jeremy Stout
I recompiled GlusterFS using the unaccepted patch and I haven't
received any RDMA error messages yet. I'll run some benchmarking tests
over the next couple of days to test the program's stability.

Thank you.


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-10 Thread Artem Trunov
Hi, Raghavendra, Jeremy

Thanks, I have tried with the patch and also with OFED 1.5.2, and got
pretty much what Jeremy had:

[2010-12-10 13:32:59.69007] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056

Aren't these parameters configurable at some driver level? I am a bit
new to the IB business, so I don't know...

How do you suggest I proceed? Should I try the unaccepted patch?

cheers
Artem.


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-10 Thread Artem Trunov
Hi all

To add some info:

1) I can query the adapter settings with ibv_devinfo -v and get these values
   (the same limits can also be read programmatically; see the sketch below).

2) I can vary max_cq via the ib_mthca module parameter num_cq, but that doesn't affect max_cqe.
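
A minimal sketch of reading these limits through libibverbs directly (error
handling trimmed; it assumes the first HCA returned by the verbs library,
e.g. mthca0, is the one of interest):

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        struct ibv_context *ctx;
        struct ibv_device_attr attr;

        if (!devs || num == 0)
                return 1;

        ctx = ibv_open_device(devs[0]);  /* first HCA, e.g. mthca0 */
        if (ctx && ibv_query_device(ctx, &attr) == 0)
                printf("max_cq=%d max_cqe=%d max_mr=%d\n",
                       attr.max_cq, attr.max_cqe, attr.max_mr);

        if (ctx)
                ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
}

Build with something like "gcc query_limits.c -libverbs" (the file name is
just an example).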

cheers
Artem.


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-09 Thread Artem Trunov
Hi Raghavendra, Jeremy

This has been a very interesting debugging thread for me, since I have the
same symptoms but am unsure of their origin. Please see the log from my mount
command at the end of this message.

I have installed 3.1.1. My OFED is 1.5.1 - does that make a serious
difference compared to the already mentioned 1.5.2?

On hardware limitations - I have a Mellanox InfiniHost III Lx 20Gb/s and
its specs say:

"Supports 16 million QPs, EEs & CQs"

Is this enough? How can I query the actual settings for max_cq and max_cqe?

In general, how should I proceed? What are my other debugging options?
Should I go down Jeremy's path of hacking the gluster code?

cheers
Artem.

Log:

-
[2010-12-09 15:15:53.847595] W [io-stats.c:1644:init] test-volume:
dangling volume. check volfile
[2010-12-09 15:15:53.847643] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.847657] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.858574] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-09 15:15:53.858805] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-09 15:15:53.858821] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-09 15:15:53.858893] E [rdma.c:4789:init]
test-volume-client-1: Failed to initialize IB Device
[2010-12-09 15:15:53.858909] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-09 15:15:53
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6[0x32aca302d0]
/lib64/libc.so.6(strcmp+0x0)[0x32aca79140]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so[0x2c4fef6c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x2f)[0x2c50013f]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x389)[0x3fcca0cac9]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0xfe)[0x3fcca1053e]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa1)[0x2b194f01]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(init+0x129)[0x2b1950d9]
/usr/lib64/libglusterfs.so.0(xlator_init+0x58)[0x3fcc617398]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_init+0x31)[0x3fcc640291]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x3fcc6403c8]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xfa)[0x40373a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc5)[0x406125]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x3fcca0f542]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x3fcca0f73d]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x3fcca0a95c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2ad6ef9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x170)[0x2ad6f130]
/usr/lib64/libglusterfs.so.0[0x3fcc637917]
/usr/sbin/glusterfs(main+0x39b)[0x40470b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32aca1d994]
/usr/sbin/glusterfs[0x402e29]





Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-03 Thread Jeremy Stout
-behind
 10: subvolumes testdir-client-0
 11: end-volume
 12:
 13: volume testdir-read-ahead
 14: type performance/read-ahead
 15: subvolumes testdir-write-behind
 16: end-volume
 17:
 18: volume testdir-io-cache
 19: type performance/io-cache
 20: subvolumes testdir-read-ahead
 21: end-volume
 22:
 23: volume testdir-quick-read
 24: type performance/quick-read
 25: subvolumes testdir-io-cache
 26: end-volume
 27:
 28: volume testdir-stat-prefetch
 29: type performance/stat-prefetch
 30: subvolumes testdir-quick-read
 31: end-volume
 32:
 33: volume testdir
 34: type debug/io-stats
 35: subvolumes testdir-stat-prefetch
 36: end-volume

+--+


On Fri, Dec 3, 2010 at 12:38 AM, Raghavendra G raghaven...@gluster.com wrote:
 Hi Jeremy,

 Can you apply the attached patch, rebuild and start glusterfs? Please make 
 sure to send us the logs of glusterfs.

 regards,
 - Original Message -
 From: Jeremy Stout stout.jer...@gmail.com
 To: gluster-users@gluster.org
 Sent: Friday, December 3, 2010 6:38:00 AM
 Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

 I'm currently using OFED 1.5.2.

 For the sake of testing, I just compiled GlusterFS 3.1.1 from source,
 without any modifications, on two systems that have a 2.6.33.7 kernel
 and OFED 1.5.2 built from source. Here are the results:

 Server:
 [2010-12-02 21:17:55.886563] I
 [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
 Received start vol reqfor volume testdir
 [2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock]
 glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
 [2010-12-02 21:17:55.886607] I
 [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
 local lock
 [2010-12-02 21:17:55.886628] I
 [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
 req to 0 peers
 [2010-12-02 21:17:55.887031] I
 [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
 to 0 peers
 [2010-12-02 21:17:56.60427] I
 [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
 start glusterfs for brick submit-1:/mnt/gluster
 [2010-12-02 21:17:56.104896] I
 [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
 to 0 peers
 [2010-12-02 21:17:56.104935] I
 [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
 unlock req to 0 peers
 [2010-12-02 21:17:56.104953] I
 [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
 local lock
 [2010-12-02 21:17:56.114764] I
 [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
 on port 24009

 Client:
 [2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir:
 dangling volume. check volfile
 [2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq]
 rpc-transport/rdma: testdir-client-0: creation of send_cq failed
 [2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device]
 rpc-transport/rdma: testdir-client-0: could not create CQ
 [2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init]
 rpc-transport/rdma: could not create rdma device for mthca0
 [2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0:
 Failed to initialize IB Device
 [2010-12-02 21:17:25.543830] E
 [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
 initialization failed

 Thank you for the help so far.


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-03 Thread Raghavendra G
From the logs it's evident that the reason for the completion queue creation 
failure is that the number of completion queue elements (in a completion queue) we had 
requested in ibv_create_cq, (1024 * send_count), is greater than the maximum 
supported by the IB hardware (max_cqe = 131071).
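
Spelled out with the numbers that appear in this thread (the stock
send/receive count of 128, and max_cqe = 131071 from the log above):

1024 * 128 = 131072  >  131071  (ibv_create_cq fails)
1024 * 127 = 130048  <= 131071  (CQ creation succeeds)

which matches Jeremy's observation elsewhere in the thread that 127 is the
largest send/receive count that works without a patch.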

- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: Raghavendra G raghaven...@gluster.com
Cc: gluster-users@gluster.org
Sent: Friday, December 3, 2010 4:20:04 PM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

I patched the source code and rebuilt GlusterFS. Here are the full logs:
Server:
[2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using
/etc/glusterd as working directory
[2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq]
rpc-transport/rdma: rdma.management: creation of send_cq failed
[2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device]
rpc-transport/rdma: rdma.management: could not create CQ
[2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management:
Failed to initialize IB Device
[2010-12-03 07:08:55.953691] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed
[2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init]
glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
Given volfile:
+--+
  1: volume management
  2: type mgmt/glusterd
  3: option working-directory /etc/glusterd
  4: option transport-type socket,rdma
  5: option transport.socket.keepalive-time 10
  6: option transport.socket.keepalive-interval 2
  7: end-volume
  8:

+--+
[2010-12-03 07:09:10.244790] I
[glusterd-handler.c:785:glusterd_handle_create_volume] glusterd:
Received create volume req
[2010-12-03 07:09:10.247646] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:10.247678] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-03 07:09:10.247708] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-03 07:09:10.248038] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:10.251970] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:10.252020] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-03 07:09:10.252036] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-03 07:09:22.11649] I
[glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
Received start vol reqfor volume testdir
[2010-12-03 07:09:22.11724] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:22.11734] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-03 07:09:22.11761] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-03 07:09:22.12120] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:22.184403] I
[glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
start glusterfs for brick pgh-submit-1:/mnt/gluster
[2010-12-03 07:09:22.229143] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:22.229198] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-03 07:09:22.229218] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-03 07:09:22.240157] I
[glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
on port 24009


Client:
[2010-12-03 07:09:00.82784] W [io-stats.c:1644:init] testdir: dangling
volume. check volfile
[2010-12-03 07:09:00.82824] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.82836] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.85980] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:09:00.92883] E [rdma.c:2079:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-03 07:09:00.93156] E [rdma.c:3785:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-03 07:09:00.93224] E [rdma.c:3971:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:09:00.93313] E [rdma.c

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-02 Thread Raghavendra G
Hi Jeremy,

can you also get the output of,

#uname -a

#ulimit -l

regards,
- Original Message -
From: Raghavendra G raghaven...@gluster.com
To: Jeremy Stout stout.jer...@gmail.com
Cc: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 10:20:04 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

Hi Jeremy,

In order to diagnose why completion queue creation is failing (as indicated by 
the logs), we want to know how much free memory was available on your system 
when glusterfs was started.

regards,
- Original Message -
From: Raghavendra G raghaven...@gluster.com
To: Jeremy Stout stout.jer...@gmail.com
Cc: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 10:11:18 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

Hi Jeremy,

Yes, there might be some performance decrease, but it should not affect the 
working of RDMA.

regards,
- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 8:30:20 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

As an update to my situation, I think I have GlusterFS 3.1.1 working
now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I had to make the following changes on lines 3562
and 3563 in rdma.c:
options->send_count = 32;
options->recv_count = 32;

The values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly.
Assuming it does, what would be the expected side-effect of changing
the values from 128 to 32? Will there be a decrease in performance?


On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 Here are the results of the test:
 submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
 1000 iters in 0.01 seconds = 11.07 usec/iter

 fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-02 Thread Jeremy Stout
As another follow-up, I tested several compilations today with
different values for send/receive count. I found the maximum value I
could use for both variables was 127. With a value of 127, GlusterFS
did not produce any errors. However, when I changed the value back to
128, the RDMA errors appeared again.

I also tried setting soft/hard memlock to unlimited in the
limits.conf file, but still ran into RDMA errors on the client side
when the count variables were set to 128.
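
For reference, the memlock override described above usually comes down to two
lines in /etc/security/limits.conf (taking effect at the next login), roughly:

* soft memlock unlimited
* hard memlock unlimited

The "ulimit -l" output shown further down reports the limit actually in
effect for the current shell.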

On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 Thank you for the response. I've been testing GlusterFS 3.1.1 on two
 different OpenSUSE 11.3 systems. Since both systems generated the same
 error messages, I'll include the output for both.

 System #1:
 fs-1:~ # cat /proc/meminfo
 MemTotal:       16468756 kB
 MemFree:        16126680 kB
 Buffers:           15680 kB
 Cached:           155860 kB
 SwapCached:            0 kB
 Active:            65228 kB
 Inactive:         123100 kB
 Active(anon):      18632 kB
 Inactive(anon):       48 kB
 Active(file):      46596 kB
 Inactive(file):   123052 kB
 Unevictable:        1988 kB
 Mlocked:            1988 kB
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:             30072 kB
 Writeback:             4 kB
 AnonPages:         18780 kB
 Mapped:            12136 kB
 Shmem:               220 kB
 Slab:              39592 kB
 SReclaimable:      13108 kB
 SUnreclaim:        26484 kB
 KernelStack:        2360 kB
 PageTables:         2036 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:     8234376 kB
 Committed_AS:     107304 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:      314316 kB
 VmallocChunk:   34349860776 kB
 HardwareCorrupted:     0 kB
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 DirectMap4k:        9856 kB
 DirectMap2M:     3135488 kB
 DirectMap1G:    13631488 kB

 fs-1:~ # uname -a
 Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55
 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

 fs-1:~ # ulimit -l
 64

 System #2:
 submit-1:~ # cat /proc/meminfo
 MemTotal:       16470424 kB
 MemFree:        16197292 kB
 Buffers:           11788 kB
 Cached:            85492 kB
 SwapCached:            0 kB
 Active:            39120 kB
 Inactive:          76548 kB
 Active(anon):      18532 kB
 Inactive(anon):       48 kB
 Active(file):      20588 kB
 Inactive(file):    76500 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:      67100656 kB
 SwapFree:       67100656 kB
 Dirty:                24 kB
 Writeback:             0 kB
 AnonPages:         18408 kB
 Mapped:            11644 kB
 Shmem:               184 kB
 Slab:              34000 kB
 SReclaimable:       8512 kB
 SUnreclaim:        25488 kB
 KernelStack:        2160 kB
 PageTables:         1952 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:    75335868 kB
 Committed_AS:     105620 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:       76416 kB
 VmallocChunk:   34359652640 kB
 HardwareCorrupted:     0 kB
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 DirectMap4k:        7488 kB
 DirectMap2M:    16769024 kB

 submit-1:~ # uname -a
 Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00
 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

 submit-1:~ # ulimit -l
 64

 I retrieved the memory information on each machine after starting the
 glusterd process.


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-02 Thread Craig Carl

Jeremy -
   What version of OFED are you running? Would you mind installing version 
1.5.2 from source? We have seen this resolve several issues of this type.

http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/


Thanks,

Craig

--
Craig Carl
Senior Systems Engineer
Gluster



Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-02 Thread Raghavendra G
Hi Jeremy,

Can you apply the attached patch, rebuild and start glusterfs? Please make sure 
to send us the logs of glusterfs.

regards,
- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: gluster-users@gluster.org
Sent: Friday, December 3, 2010 6:38:00 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

I'm currently using OFED 1.5.2.

For the sake of testing, I just compiled GlusterFS 3.1.1 from source,
without any modifications, on two systems that have a 2.6.33.7 kernel
and OFED 1.5.2 built from source. Here are the results:

Server:
[2010-12-02 21:17:55.886563] I
[glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
Received start vol reqfor volume testdir
[2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
[2010-12-02 21:17:55.886607] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-02 21:17:55.886628] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-02 21:17:55.887031] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-02 21:17:56.60427] I
[glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
start glusterfs for brick submit-1:/mnt/gluster
[2010-12-02 21:17:56.104896] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-02 21:17:56.104935] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-02 21:17:56.104953] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-02 21:17:56.114764] I
[glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
on port 24009

Client:
[2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir:
dangling volume. check volfile
[2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0:
Failed to initialize IB Device
[2010-12-02 21:17:25.543830] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed

Thank you for the help so far.

On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl cr...@gluster.com wrote:
 Jeremy -
   What version of OFED are you running? Would you mind installing version 1.5.2
 from source? We have seen this resolve several issues of this type.
 http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/


 Thanks,

 Craig

 --
 Craig Carl
 Senior Systems Engineer
 Gluster


 On 12/02/2010 10:05 AM, Jeremy Stout wrote:

 As another follow-up, I tested several compilations today with
 different values for send/receive count. I found the maximum value I
 could use for both variables was 127. With a value of 127, GlusterFS
 did not produce any errors. However, when I changed the value back to
 128, the RDMA errors appeared again.

 I also tried setting soft/hard memlock to unlimited in the
 limits.conf file, but still ran into RDMA errors on the client side
 when the count variables were set to 128.
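
A note on the 127/128 boundary above: the ceiling being hit is the HCA's
advertised completion-queue limit, which libibverbs can report directly. A
minimal, hypothetical standalone check (not GlusterFS code; the file and
program names are made up; build with "gcc -o ibv_limits ibv_limits.c
-libverbs") might look like this:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_device_attr attr;
    if (!ctx || ibv_query_device(ctx, &attr) != 0) {
        fprintf(stderr, "could not query device\n");
        return 1;
    }

    /* max_cq/max_cqe are the limits the failing send_cq creation runs into */
    printf("%s: max_cq=%d max_cqe=%d max_qp=%d\n",
           ibv_get_device_name(devs[0]), attr.max_cq, attr.max_cqe, attr.max_qp);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

Comparing the reported max_cqe against the completion-queue size GlusterFS
requests (which appears to grow with the send/recv counts, given that lowering
them avoided the failure) should show why 128 pushes some InfiniHost-class
cards over the limit while 127 does not.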

 On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout stout.jer...@gmail.com wrote:

 Thank you for the response. I've been testing GlusterFS 3.1.1 on two
 different OpenSUSE 11.3 systems. Since both systems generated the same
 error messages, I'll include the output for both.

 System #1:
 fs-1:~ # cat /proc/meminfo
 MemTotal:       16468756 kB
 MemFree:        16126680 kB
 Buffers:           15680 kB
 Cached:           155860 kB
 SwapCached:            0 kB
 Active:            65228 kB
 Inactive:         123100 kB
 Active(anon):      18632 kB
 Inactive(anon):       48 kB
 Active(file):      46596 kB
 Inactive(file):   123052 kB
 Unevictable:        1988 kB
 Mlocked:            1988 kB
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:             30072 kB
 Writeback:             4 kB
 AnonPages:         18780 kB
 Mapped:            12136 kB
 Shmem:               220 kB
 Slab:              39592 kB
 SReclaimable:      13108 kB
 SUnreclaim:        26484 kB
 KernelStack:        2360 kB
 PageTables:         2036 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:     8234376 kB
 Committed_AS:     107304 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:      314316 kB
 VmallocChunk:   34349860776 kB
 HardwareCorrupted:     0 kB
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 DirectMap4k

[Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Jeremy Stout
Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
RDMA, I'm seeing the following error messages in the log file on the
server:
[2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
[2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
Failed to initialize IB Device
[2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load]
rpc-transport: 'rdma' initialization failed

On the client, I see:
[2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
dangling volume. check volfile
[2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
Failed to initialize IB Device
[2010-11-30 18:43:49.736841] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed

This results in an unsuccessful mount.

I created the mount using the following commands:
/usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
transport rdma submit-1:/exports
/usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir

To mount the directory, I use:
mount -t glusterfs submit-1:/testdir /mnt/glusterfs

I don't think it is an InfiniBand problem since GlusterFS 3.0.6 and
GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
commands listed above produced no error messages.

If anyone can provide help with debugging these error messages, it
would be appreciated.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Anand Avati
Can you verify that ibv_srq_pingpong works from the server where this log
file is from?

Thanks,
Avati

On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout stout.jer...@gmail.com wrote:

 Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
 RDMA, I'm seeing the following error messages in the log file on the
 server:
 [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
 [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
 rpc-transport/rdma: testdir-client-0: creation of send_cq failed
 [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
 rpc-transport/rdma: testdir-client-0: could not create CQ
 [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
 rpc-transport/rdma: could not create rdma device for mthca0
 [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
 Failed to initialize IB Device
 [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load]
 rpc-transport: 'rdma' initialization failed

 On the client, I see:
 [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
 dangling volume. check volfile
 [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
 [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
 rpc-transport/rdma: testdir-client-0: creation of send_cq failed
 [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
 rpc-transport/rdma: testdir-client-0: could not create CQ
 [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
 rpc-transport/rdma: could not create rdma device for mthca0
 [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
 Failed to initialize IB Device
 [2010-11-30 18:43:49.736841] E
 [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
 initialization failed

 This results in an unsuccessful mount.

 I created the mount using the following commands:
 /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
 transport rdma submit-1:/exports
 /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir

 To mount the directory, I use:
 mount -t glusterfs submit-1:/testdir /mnt/glusterfs

 I don't think it is an InfiniBand problem since GlusterFS 3.0.6 and
 GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
 commands listed above produced no error messages.

 If anyone can provide help with debugging these error messages, it
 would be appreciated.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Jeremy Stout
Here are the results of the test:
submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
1000 iters in 0.01 seconds = 11.07 usec/iter

fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
1000 iters in 0.01 seconds = 8.83 usec/iter

Based on the output, I believe it ran correctly.

On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati anand.av...@gmail.com wrote:
 Can you verify that ibv_srq_pingpong works from the server where this log
 file is from?

 Thanks,
 Avati

 On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout stout.jer...@gmail.com wrote:

 Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
 RDMA, I'm seeing the following error messages in the log file on the
 server:
 

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Jeremy Stout
As an update to my situation, I think I have GlusterFS 3.1.1 working
now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I had to make the following changes on lines 3562
and 3563 in rdma.c:
options->send_count = 32;
options->recv_count = 32;

The values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly.
Assuming it does, what would be the expected side-effect of changing
the values from 128 to 32? Will there be a decrease in performance?
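
Since hard-coding smaller counts trades away queue depth, one alternative is
to cap the requested completion-queue size at whatever the device itself
reports. A rough, hypothetical sketch of that idea against libibverbs (this is
illustrative only, not the actual rdma.c change; the helper name is made up):

#include <stdio.h>
#include <infiniband/verbs.h>

/* Illustrative helper: clamp the requested CQ size to the device limit
 * before creating the completion queue.  Not GlusterFS code. */
static struct ibv_cq *create_cq_clamped(struct ibv_context *ctx,
                                        int wanted_cqe,
                                        struct ibv_comp_channel *chan)
{
    struct ibv_device_attr attr;

    if (ibv_query_device(ctx, &attr) == 0 && wanted_cqe > attr.max_cqe) {
        fprintf(stderr, "asked for %d CQEs, device allows %d; clamping\n",
                wanted_cqe, attr.max_cqe);
        wanted_cqe = attr.max_cqe;   /* fall back to the device maximum */
    }

    /* cq_context is unused in this sketch; comp_vector 0 */
    return ibv_create_cq(ctx, wanted_cqe, NULL, chan, 0);
}

With a clamp like this, a card that only supports a smaller queue would still
initialize, just with fewer completion entries, which is consistent with the
observation that lowering the send/recv counts avoids the error.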


On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 Here are the results of the test:
 submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
 1000 iters in 0.01 seconds = 11.07 usec/iter

 fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  remote address: LID 

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Raghavendra G
Hi Jeremy,

Yes, there might be some performance decrease, but it should not affect the
working of RDMA.

regards,
- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 8:30:20 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

As an update to my situation, I think I have GlusterFS 3.1.1 working
now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I had to make the following changes on lines 3562
and 3563 in rdma.c:
options->send_count = 32;
options->recv_count = 32;

The values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly.
Assuming it does, what would be the expected side-effect of changing
the values from 128 to 32? Will there be a decrease in performance?


On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 Here are the results of the test:
 submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
 1000 iters in 0.01 seconds = 11.07 usec/iter

 fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  remote address: LID

Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

2010-12-01 Thread Raghavendra G
Hi Jeremy,

In order to diagnose why completion queue creation is failing (as indicated by
the logs), we would like to know how much free memory was available on your
system when glusterfs was started.

regards,
- Original Message -
From: Raghavendra G raghaven...@gluster.com
To: Jeremy Stout stout.jer...@gmail.com
Cc: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 10:11:18 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

Hi Jeremy,

Yes, there might be some performance decrease, but it should not affect the
working of RDMA.

regards,
- Original Message -
From: Jeremy Stout stout.jer...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, December 2, 2010 8:30:20 AM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

As an update to my situation, I think I have GlusterFS 3.1.1 working
now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I had to make the following changes on lines 3562
and 3563 in rdma.c:
options->send_count = 32;
options->recv_count = 32;

The values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly.
Assuming it does, what would be the expected side-effect of changing
the values from 128 to 32? Will there be a decrease in performance?


On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout stout.jer...@gmail.com wrote:
 Here are the results of the test:
 submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
 1000 iters in 0.01 seconds = 11.07 usec/iter

 fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  remote address: LID