Re: [Gluster-users] Rsync
Hi Hiren,

What glusterfs version are you using? Can you send us the volfiles and the log files?

Pavan

On 22/09/09 16:01 +0100, Hiren Joshi wrote:
> I forgot to mention, the mount is mounted with direct-io, would this
> make a difference?
>
> > -----Original Message-----
> > From: gluster-users-boun...@gluster.org
> > [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi
> > Sent: 22 September 2009 11:40
> > To: gluster-users@gluster.org
> > Subject: [Gluster-users] Rsync
> >
> > Hello all,
> >
> > I'm getting what I think is bizarre behaviour. I have about 400G to
> > rsync (rsync -av) onto a gluster share. The data is in a directory
> > structure which has about 1000 directories per parent and about 1000
> > directories in each of them.
> >
> > When I try to rsync an end leaf directory (this has about 4 dirs and
> > 100 files in each) the operation takes about 10 seconds. When I go
> > one level above (1000 dirs with about 4 dirs in each with about 100
> > files in each) the operation takes about 10 minutes.
> >
> > Now, if I then go one level above that (that's 1000 dirs with 1000
> > dirs in each with about 4 dirs in each with about 100 files in each)
> > the operation takes days! Top shows glusterfsd takes 300-600% CPU
> > usage (2x4 core); I have about 48G of memory (usage is 0% as
> > expected).
> >
> > Has anyone seen anything like this? How can I speed it up?
> >
> > Thanks,
> >
> > Josh.

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Recovery from network failure
On 9/23/09, Georgecooldude wrote:
> Anyone have any ideas on the below? Thanks.

Does the logfile of the server whose cable you pulled out show the disconnection from the client?

Avati
Re: [Gluster-users] Recovery from network failure
Anyone have any ideas on the below? Thanks.

On Sun, Sep 20, 2009 at 10:05 PM, Georgecooldude wrote:
> The file in question is a linux ISO. On Server01 I've copied it over to my
> gluster mount and waited for it to be fully copied. Then on server02 I've
> waited until about half of the file is replicated over and pulled the
> network cable. Once the network cable is back in again, no matter what I do
> I cannot get it to sync the file back up. I did notice however that if I
> reboot both of the servers then the corrupt image on Server02 is then
> replicated over to Server01.
>
> Should Gluster 2.0.6 be able to cope with something like this or is this a
> 2.1 feature?
>
> On Sun, Sep 20, 2009 at 6:26 PM, Anand Avati wrote:
>
>> > I'm starting gluster like this:
>> > sudo glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
>> > sudo glusterfsd -f /etc/glusterfs/glusterfs-server.vol
>> >
>> > And do the following to try and trigger it to replicate:
>> > sudo ls -alRh /mnt/glusterfs/
>> > sudo ls -alRh /data/export
>> > sudo ls -alRh /data/export-ns
>> >
>> > Am I missing something?
>>
>> You need to access the file in question from the mountpoint (ls -lR
>> just ends up accessing all the files). Are you accessing it while the
>> file is still open and being written to? Self-heal of open files will
>> be supported only in the 2.1 release.
>>
>> Avati
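Avati's suggestion above can be scripted: forcing an actual read of each file through the client mountpoint is what triggers replicate (AFR) self-heal in 2.0.x for files that are no longer open. A minimal sketch; the mountpoint path in the usage comment is the one from this thread:

```shell
# heal_by_read: read one byte of every regular file under the given
# glusterfs client mountpoint; the read through the mount is what
# triggers replicate (AFR) self-heal for closed files in 2.0.x.
heal_by_read() {
    find "$1" -type f -exec head -c1 '{}' ';' > /dev/null
}

# usage, on the client from this thread:
#   heal_by_read /mnt/glusterfs
```

This only helps once the file is closed on the writer side; per Avati's note, healing of still-open files lands in 2.1.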
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
> I also had to upgrade the firmware on the mellanox cards I have to enable
> srq (send recieve que)

*shared receive queue

Avati
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
Sorry, the mail daemon just batched me the rest of this conversation and I see this is already done. Please ignore.

-Mic

Mickey Mazarick wrote:
I had some difficulty getting OFED 1.3 working on kernel 2.6.27 about 6 months back. It took some patching but I did find that you needed to have the srq enabled for it to work. The ibv_srq_pingpong test app was a good test for whether it would work with gluster or not. I also had to upgrade the firmware on the mellanox cards I have to enable srq (send recieve que)

-Mic

Nathan Stratton wrote:
Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:
Has anyone been able to get Infiniband working with the 2.6.31 kernel and fuse 2.8.0? My config works fine on my Centos 2.6.18 box, so I know that is ok.

[snip: lsmod/ibv_devices output, volfile and debug log, quoted in full in the original message below]
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
I had some difficulty getting OFED 1.3 working on kernel 2.6.27 about 6 months back. It took some patching but I did find that you needed to have the srq enabled for it to work. The ibv_srq_pingpong test app was a good test for whether it would work with gluster or not.

I also had to upgrade the firmware on the mellanox cards I have to enable srq (send recieve que)

-Mic

Nathan Stratton wrote:
Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:
Has anyone been able to get Infiniband working with the 2.6.31 kernel and fuse 2.8.0? My config works fine on my Centos 2.6.18 box, so I know that is ok.

[snip: lsmod/ibv_devices output, volfile and debug log, quoted in full in the original message below]
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
On Tue, 22 Sep 2009, Anand Avati wrote:
> Then I guess even IPoIB is not working for you?

Actually, IPoIB is working just fine.

> I'm not sure if you might have to upgrade to a new OFED, for either
> libibverbs or the mthca uverbs driver may not be compatible with the
> latest kernel IB drivers.

Ya, I think you may be onto something there.

> Any hints from the OFED mailing list?

Will try them next, does not look like a gluster issue.

> Avati
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
> > > [r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
> > >
> > > [snip: local/remote address output, quoted in full below]
> > >
> > > This one ends odd, I think it should tell me more info, but it just
> > > sits there.
> >
> > Just starting ibv_srq_pingpong makes it the "server". You should run
> > "ibv_srq_pingpong <server>" from a second server and then the two will
> > ping-pong each other. Can you please post that output as well?
>
> The above was from the client using the IP address of the server. The
> server showed:

Then I guess even IPoIB is not working for you? I'm not sure if you might have to upgrade to a new OFED, for either libibverbs or the mthca uverbs driver may not be compatible with the latest kernel IB drivers. Any hints from the OFED mailing list?

Avati
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
On Tue, 22 Sep 2009, Anand Avati wrote:
> What does the server log have to say? Can you also check if port-1 is
> the active port in ibv_devinfo? Looks like ib-verbs messaging is not
> happening. Does ibv_srq_pingpong give sane results?
>
> [snip: dmesg, ibv_devinfo, and the client-side ibv_srq_pingpong output,
> quoted in full below]
>
> This one ends odd, I think it should tell me more info, but it just
> sits there.
>
> Just starting ibv_srq_pingpong makes it the "server". You should run
> "ibv_srq_pingpong <server>" from a second server and then the two will
> ping-pong each other. Can you please post that output as well?

The above was from the client using the IP address of the server. The server showed:

[r...@xen0 ~]# ibv_srq_pingpong
local address:  LID 0x0004, QPN 0x460406, PSN 0x198bcb
local address:  LID 0x0004, QPN 0x460407, PSN 0x645159
local address:  LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
local address:  LID 0x0004, QPN 0x460409, PSN 0x8dff52
local address:  LID 0x0004, QPN 0x46040a, PSN 0xe317fd
local address:  LID 0x0004, QPN 0x46040b, PSN 0x12da1b
local address:  LID 0x0004, QPN 0x460418, PSN 0xc8e0de
local address:  LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
local address:  LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
local address:  LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
local address:  LID 0x0004, QPN 0x46041c, PSN 0x107a0d
local address:  LID 0x0004, QPN 0x46041d, PSN 0xe2661c
local address:  LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
local address:  LID 0x0004, QPN 0x46041f, PSN 0xc438a5
local address:  LID 0x0004, QPN 0x460420, PSN 0x0be0ff
local address:  LID 0x0004, QPN 0x460421, PSN 0x91b657
remote address: LID 0x000b, QPN 0x300406, PS
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
> > What does the server log have to say? Can you also check if port-1 is
> > the active port in ibv_devinfo? Looks like ib-verbs messaging is not
> > happening. Does ibv_srq_pingpong give sane results?
>
> [snip: dmesg, ibv_devinfo, and the client-side ibv_srq_pingpong output,
> quoted in full below]
>
> This one ends odd, I think it should tell me more info, but it just sits
> there.

Just starting ibv_srq_pingpong makes it the "server". You should run "ibv_srq_pingpong <server>" from a second server and then the two will ping-pong each other. Can you please post that output as well?

Avati
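To restate Avati's instructions as commands (hostnames and the IPoIB address are the ones from this thread; this needs working InfiniBand hardware, so it is only a sketch):

```shell
# Terminal 1, on the listening node (xen0 in this thread): with no
# argument, ibv_srq_pingpong waits for a peer to connect.
ibv_srq_pingpong

# Terminal 2, on the other node (xen1), giving the listener's IPoIB
# address. On success BOTH sides print the local/remote address lines
# followed by throughput and timing statistics; hanging after the
# address exchange points at a broken RDMA data path.
ibv_srq_pingpong 10.13.0.220
```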
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
On Tue, 22 Sep 2009, Anand Avati wrote:
> Hate to post again, but anyone have any ideas on this?

What does the server log have to say? Can you also check if port-1 is the active port in ibv_devinfo? Looks like ib-verbs messaging is not happening. Does ibv_srq_pingpong give sane results?

[    3.890311] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[    3.890315] ib_mthca: Initializing :08:00.0
[    3.890354] ib_mthca :08:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[    7.899804] ADDRCONF(NETDEV_UP): ib0: link is not ready
[    7.902722] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[   17.935013] ib0: no IPv6 routers present

[r...@xen1 ~]# ibv_devinfo
hca_id: mthca0
        fw_ver:             3.5.0
        node_guid:          0005:ad00:0003:27e8
        sys_image_guid:     0005:ad00:0100:d050
        vendor_id:          0x02c9
        vendor_part_id:     23108
        hw_ver:             0xA1
        board_id:           MT_0270110001
        phys_port_cnt:      2
                port:   1
                        state:          active (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         2
                        port_lid:       11
                        port_lmc:       0x00

                port:   2
                        state:          down (1)
                        max_mtu:        2048 (4)
                        active_mtu:     512 (2)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00

[r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
local address:  LID 0x000b, QPN 0x300406, PSN 0x1ace41
local address:  LID 0x000b, QPN 0x300407, PSN 0x6ba197
local address:  LID 0x000b, QPN 0x300408, PSN 0xa6f895
local address:  LID 0x000b, QPN 0x300409, PSN 0xf054c0
local address:  LID 0x000b, QPN 0x30040a, PSN 0xea4bd3
local address:  LID 0x000b, QPN 0x30040b, PSN 0xfe3039
local address:  LID 0x000b, QPN 0x30040c, PSN 0x037fa4
local address:  LID 0x000b, QPN 0x30040d, PSN 0x1feccd
local address:  LID 0x000b, QPN 0x30040e, PSN 0x22daed
local address:  LID 0x000b, QPN 0x30040f, PSN 0xcaa26b
local address:  LID 0x000b, QPN 0x300410, PSN 0xe87f33
local address:  LID 0x000b, QPN 0x300411, PSN 0x84bb4a
local address:  LID 0x000b, QPN 0x300412, PSN 0x09286e
local address:  LID 0x000b, QPN 0x300413, PSN 0xecf483
local address:  LID 0x000b, QPN 0x300414, PSN 0xd55285
local address:  LID 0x000b, QPN 0x300415, PSN 0xdd7065
remote address: LID 0x0004, QPN 0x460406, PSN 0x198bcb
remote address: LID 0x0004, QPN 0x460407, PSN 0x645159
remote address: LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
remote address: LID 0x0004, QPN 0x460409, PSN 0x8dff52
remote address: LID 0x0004, QPN 0x46040a, PSN 0xe317fd
remote address: LID 0x0004, QPN 0x46040b, PSN 0x12da1b
remote address: LID 0x0004, QPN 0x460418, PSN 0xc8e0de
remote address: LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
remote address: LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
remote address: LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
remote address: LID 0x0004, QPN 0x46041c, PSN 0x107a0d
remote address: LID 0x0004, QPN 0x46041d, PSN 0xe2661c
remote address: LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
remote address: LID 0x0004, QPN 0x46041f, PSN 0xc438a5
remote address: LID 0x0004, QPN 0x460420, PSN 0x0be0ff
remote address: LID 0x0004, QPN 0x460421, PSN 0x91b657

This one ends odd, I think it should tell me more info, but it just sits there.

-Nathan
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
> Hate to post again, but anyone have any ideas on this?

What does the server log have to say? Can you also check if port-1 is the active port in ibv_devinfo? Looks like ib-verbs messaging is not happening. Does ibv_srq_pingpong give sane results?

Avati
Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband
Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:

Has anyone been able to get Infiniband working with the 2.6.31 kernel and fuse 2.8.0? My config works fine on my Centos 2.6.18 box, so I know that is ok.

Infiniband looks good:

[r...@xen1 src]# lsmod | grep ib
ib_ucm                 13752  0
ib_uverbs              32256  2 rdma_ucm,ib_ucm
ib_ipoib               68880  0
ib_mthca              123700  0

[r...@xen1 src]# ibv_devices
    device          node GUID
    ------          ----------------
    mthca0          0005ad0327e8

Gluster looks like it starts OK, but I can't touch the mount and after a while it times out.

Debug logs:

[2009-09-18 19:36:17] D [glusterfsd.c:354:_get_specfp] glusterfs: loading volume file /usr/local/etc/glusterfs/glusterfs.vol

Version      : glusterfs 2.0.6 built on Sep 18 2009 09:54:43
TLA Revision : v2.0.6
Starting Time: 2009-09-18 19:36:17
Command line : glusterfs -L DEBUG -l /var/log/glusterfs.log --disable-direct-io-mode /share
PID          : 8303
System name  : Linux
Nodename     : xen1.hou.blinkmind.com
Kernel Release : 2.6.31
Hardware Identifier: x86_64

Given volfile:
+------------------------------------------------------------------------------+
  1: volume brick0
  2:   type protocol/client
  3:   option transport-type ib-verbs/client
  4:   option remote-host 172.16.0.200
  5:   option remote-port 6997
  6:   option transport.address-family inet/inet6
  7:   option remote-subvolume brick
  8: end-volume
  9:
 10: volume mirror0
 11:   type protocol/client
 12:   option transport-type ib-verbs/client
 13:   option remote-host 172.16.0.201
 14:   option remote-port 6997
 15:   option transport.address-family inet/inet6
 16:   option remote-subvolume brick
 17: end-volume
 18:
 19: volume brick1
 20:   type protocol/client
 21:   option transport-type ib-verbs/client
 22:   option remote-host 172.16.0.202
 23:   option remote-port 6997
 24:   option transport.address-family inet/inet6
 25:   option remote-subvolume brick
 26: end-volume
 27:
 28: volume mirror1
 29:   type protocol/client
 30:   option transport-type ib-verbs/client
 31:   option remote-host 172.16.0.203
 32:   option remote-port 6997
 33:   option transport.address-family inet/inet6
 34:   option remote-subvolume brick
 35: end-volume
 36:
 37: volume brick2
 38:   type protocol/client
 39:   option transport-type ib-verbs/client
 40:   option remote-host 172.16.0.204
 41:   option remote-port 6997
 42:   option transport.address-family inet/inet6
 43:   option remote-subvolume brick
 44: end-volume
 45:
 46: volume mirror2
 47:   type protocol/client
 48:   option transport-type ib-verbs/client
 49:   option remote-host 172.16.0.205
 50:   option remote-port 6997
 51:   option transport.address-family inet/inet6
 52:   option remote-subvolume brick
 53: end-volume
 54:
 55: volume block0
 56:   type cluster/replicate
 57:   subvolumes brick0 mirror0
 58: end-volume
 59:
 60: volume block1
 61:   type cluster/replicate
 62:   subvolumes brick1 mirror1
 63: end-volume
 64:
 65: volume block2
 66:   type cluster/replicate
 67:   subvolumes brick2 mirror2
 68: end-volume
 69:
 70: volume unify
 71:   type cluster/distribute
 72:   subvolumes block0 block1 block2
 73: end-volume
 74:
+------------------------------------------------------------------------------+

[2009-09-18 19:36:17] D [glusterfsd.c:1205:main] glusterfs: running in pid 8303
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick0: defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick0: defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror0: defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror0: defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] mirror0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] mirror0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init]
Re: [Gluster-users] Rsync
I forgot to mention, the mount is mounted with direct-io, would this make a difference?

> -----Original Message-----
> From: gluster-users-boun...@gluster.org
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi
> Sent: 22 September 2009 11:40
> To: gluster-users@gluster.org
> Subject: [Gluster-users] Rsync
>
> [snip: original message quoted in full below]
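For comparison, direct I/O can be disabled on the 2.0.x client at mount time; the flag below appears verbatim in the debug logs elsewhere in this digest, and the volfile path and mountpoint are the ones used in the other thread here, so treat this as a sketch rather than a prescription:

```shell
# Start the glusterfs client with direct I/O disabled, so reads and
# writes on the mount go through the kernel page cache (glusterfs
# 2.0.x flag, as seen in the debug logs in this digest).
glusterfs --disable-direct-io-mode \
    -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
```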
[Gluster-users] Rsync
Hello all,

I'm getting what I think is bizarre behaviour. I have about 400G to rsync (rsync -av) onto a gluster share. The data is in a directory structure which has about 1000 directories per parent and about 1000 directories in each of them.

When I try to rsync an end leaf directory (this has about 4 dirs and 100 files in each) the operation takes about 10 seconds. When I go one level above (1000 dirs with about 4 dirs in each with about 100 files in each) the operation takes about 10 minutes.

Now, if I then go one level above that (that's 1000 dirs with 1000 dirs in each with about 4 dirs in each with about 100 files in each) the operation takes days! Top shows glusterfsd takes 300-600% CPU usage (2x4 core); I have about 48G of memory (usage is 0% as expected).

Has anyone seen anything like this? How can I speed it up?

Thanks,

Josh.
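Not a fix for the underlying slowdown, but a common workaround for huge trees is to split the single recursive rsync into many per-subdirectory runs, so each invocation only has to walk a bounded subtree. A sketch with hypothetical paths:

```shell
# sync_per_dir: rsync each immediate subdirectory of $1 into $2 as a
# separate run, bounding the directory tree each rsync has to walk.
sync_per_dir() {
    src=$1 dst=$2
    for d in "$src"/*/; do
        [ -d "$d" ] || continue          # no subdirectories: skip
        rsync -a "$d" "$dst/$(basename "$d")/"
    done
}

# usage (paths are placeholders, not from this message):
#   sync_per_dir /data/source /mnt/glusterfs/data
```

Each run still pays the per-entry metadata cost on the gluster mount, but failures become restartable per directory and runs can be spread over time or parallelized.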