Re: [Gluster-users] Rsync

2009-09-22 Thread Pavan Vilas Sondur
Hi Hiren,
What glusterfs version are you using? Can you send us the volfiles and the
log files?

Pavan
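
(For reference, the requested details can usually be gathered along these
lines -- a sketch that assumes the volfile and log paths quoted in other
messages in this digest, which may differ on your setup:)

glusterfs --version                        # exact glusterfs version
cat /etc/glusterfs/glusterfs-client.vol    # client volfile (path assumed)
cat /etc/glusterfs/glusterfs-server.vol    # server volfile (path assumed)
tail -n 200 /var/log/glusterfs.log         # recent log entries (path assumed)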

On 22/09/09 16:01 +0100, Hiren Joshi wrote:
> I forgot to mention that the mount is mounted with direct-io; would
> this make a difference?
> 
> > -Original Message-
> > From: gluster-users-boun...@gluster.org 
> > [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi
> > Sent: 22 September 2009 11:40
> > To: gluster-users@gluster.org
> > Subject: [Gluster-users] Rsync
> > 
> > Hello all,
> >  
> > I'm getting what I think is bizarre behaviour. I have about 400G to
> > rsync (rsync -av) onto a gluster share; the data is in a directory
> > structure which has about 1000 directories per parent and about 1000
> > directories in each of them.
> >  
> > When I try to rsync an end leaf directory (this has about 4 
> > dirs and 100
> > files in each) the operation takes about 10 seconds. When I 
> > go one level
> > above (1000 dirs with about 4 dirs in each with about 100 
> > files in each)
> > the operation takes about 10 minutes.
> >  
> > Now, if I then go one level above that (that's 1000 dirs with 
> > 1000 dirs
> > in each with about 4 dirs in each with about 100 files in each) the
> > operation takes days! Top shows glusterfsd taking 300-600% CPU usage
> > (2x4 cores); I have about 48G of memory (usage is 0%, as expected).
> >  
> > Has anyone seen anything like this? How can I speed it up?
> >  
> > Thanks,
> >  
> > Josh.
> > 


Re: [Gluster-users] Recovery from network failure

2009-09-22 Thread Anand Avati
On 9/23/09, Georgecooldude wrote:
> Anyone have any ideas on the below? Thanks.
>

Does the log file of the server whose cable you pulled out show that it
recognized the disconnection from the client?

Avati
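
(As a sketch, the disconnection should show up in that server's log around
the time the cable was pulled; the exact message text varies by version and
the log path here is an assumption:)

grep -inE 'disconnect|connect' /var/log/glusterfs.log | tail -n 20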


Re: [Gluster-users] Recovery from network failure

2009-09-22 Thread Georgecooldude
Anyone have any ideas on the below? Thanks.

On Sun, Sep 20, 2009 at 10:05 PM, Georgecooldude wrote:

> The file in question is a Linux ISO. On Server01 I've copied it over to my
> gluster mount and waited for it to be fully copied. Then on Server02 I've
> waited until about half of the file is replicated over and pulled the
> network cable. Once the network cable is back in, no matter what I do I
> cannot get it to sync the file back up. I did notice, however, that if I
> reboot both of the servers, the corrupt image on Server02 is then
> replicated over to Server01.
>
> Should Gluster 2.0.6 be able to cope with something like this or is this a
> 2.1 feature?
>
>   On Sun, Sep 20, 2009 at 6:26 PM, Anand Avati wrote:
>
>> > I'm starting gluster like this:
>> > sudo glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
>> > sudo glusterfsd -f /etc/glusterfs/glusterfs-server.vol
>> >
>> > And do the following to try and trigger it to replicate
>> > sudo ls -alRh /mnt/glusterfs/
>> > sudo ls -alRh /data/export
>> > sudo ls -alRh /data/export-ns
>> >
>> > Am I missing something?
>>
>> You need to access the file in question from the mountpoint (ls -lR
>> just ends up accessing all the files). Are you accessing it while the
>> file is still open and being written to? Self-heal of open files will
>> be supported only in the 2.1 release.
>>
>> Avati
>>
>
>
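
(A minimal sketch of the advice above: once the link is back and the file is
no longer open for writing, access that specific file through the client
mount to trigger self-heal; the ISO path here is hypothetical.)

ls -l /mnt/glusterfs/isos/linux.iso                              # lookup via the mount (path hypothetical)
dd if=/mnt/glusterfs/isos/linux.iso of=/dev/null bs=1M count=1   # open/read the file through the mount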


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Anand Avati
>  I also had to upgrade the firmware on the mellanox cards I have to enable
> srq (send recieve que)

*shared receive queue

Avati


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Mickey Mazarick
Sorry, the mail daemon just batched me the rest of this conversation and
I see this is already resolved. Please ignore.


-Mic

Mickey Mazarick wrote:
I had some difficulty getting OFED 1.3 working on kernel 2.6.27 about
6 months back. It took some patching, but I did find that you needed
to have srq enabled for it to work. The ibv_srq_pingpong test app
was a good test for whether it would work with gluster or not.


I also had to upgrade the firmware on the mellanox cards I have to 
enable srq (send recieve que)


-Mic

Nathan Stratton wrote:


Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:



Has anyone been able to get InfiniBand working with the 2.6.31 kernel
and fuse 2.8.0? My config works fine on my CentOS 2.6.18 box, so I
know that is OK.


Infiniband looks good:

[r...@xen1 src]# lsmod |grep ib
ib_ucm 13752  0
ib_uverbs  32256  2 rdma_ucm,ib_ucm
ib_ipoib   68880  0
ib_mthca  123700  0

[r...@xen1 src]# ibv_devices
   device node GUID
   --  
   mthca0  0005ad0327e8

Gluster looks like it starts OK, but I can't touch the mount and 
after a while it times out. Debug logs:



[2009-09-18 19:36:17] D [glusterfsd.c:354:_get_specfp] glusterfs: 
loading volume file /usr/local/etc/glusterfs/glusterfs.vol
 


Version  : glusterfs 2.0.6 built on Sep 18 2009 09:54:43
TLA Revision : v2.0.6
Starting Time: 2009-09-18 19:36:17
Command line : glusterfs -L DEBUG -l /var/log/glusterfs.log 
--disable-direct-io-mode /share

PID  : 8303
System name  : Linux
Nodename : xen1.hou.blinkmind.com
Kernel Release : 2.6.31
Hardware Identifier: x86_64

Given volfile:
+--+ 


 1: volume brick0
 2:  type protocol/client
 3:  option transport-type ib-verbs/client
 4:  option remote-host 172.16.0.200
 5:  option remote-port 6997
 6:  option transport.address-family inet/inet6
 7:  option remote-subvolume brick
 8: end-volume
 9:
10: volume mirror0
11:  type protocol/client
12:  option transport-type ib-verbs/client
13:  option remote-host 172.16.0.201
14:  option remote-port 6997
15:  option transport.address-family inet/inet6
16:  option remote-subvolume brick
17: end-volume
18:
19: volume brick1
20:  type protocol/client
21:  option transport-type ib-verbs/client
22:  option remote-host 172.16.0.202
23:  option remote-port 6997
24:  option transport.address-family inet/inet6
25:  option remote-subvolume brick
26: end-volume
27:
28: volume mirror1
29:  type protocol/client
30:  option transport-type ib-verbs/client
31:  option remote-host 172.16.0.203
32:  option remote-port 6997
33:  option transport.address-family inet/inet6
34:  option remote-subvolume brick
35: end-volume
36:
37: volume brick2
38:  type protocol/client
39:  option transport-type ib-verbs/client
40:  option remote-host 172.16.0.204
41:  option remote-port 6997
42:  option transport.address-family inet/inet6
43:  option remote-subvolume brick
44: end-volume
45:
46: volume mirror2
47:  type protocol/client
48:  option transport-type ib-verbs/client
49:  option remote-host 172.16.0.205
50:  option remote-port 6997
51:  option transport.address-family inet/inet6
52:  option remote-subvolume brick
53: end-volume
54:
55: volume block0
56:  type cluster/replicate
57:  subvolumes brick0 mirror0
58: end-volume
59:
60: volume block1
61:  type cluster/replicate
62:  subvolumes brick1 mirror1
63: end-volume
64:
65: volume block2
66:  type cluster/replicate
67:  subvolumes brick2 mirror2
68: end-volume
69:
70: volume unify
71:  type cluster/distribute
72:  subvolumes block0 block1 block2
73: end-volume
74:

+--+ 

[2009-09-18 19:36:17] D [glusterfsd.c:1205:main] glusterfs: running 
in pid 8303
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick0: 
defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick0: 
defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
attempt to load file 
/usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
attempt to load file 
/usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror0: 
defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror0: 
defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:tran

Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Mickey Mazarick
I had some difficulty getting OFED 1.3 working on kernel 2.6.27 about 6
months back. It took some patching, but I did find that you needed to
have srq enabled for it to work. The ibv_srq_pingpong test app was a
good test for whether it would work with gluster or not.


I also had to upgrade the firmware on the mellanox cards I have to 
enable srq (send recieve que)


-Mic
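
(A quick way to check for SRQ support before involving gluster, as a sketch;
ibv_devinfo needs the verbose flag to print the SRQ limits:)

ibv_devinfo -v | grep -i srq    # non-zero max_srq/max_srq_wr values indicate SRQ support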

Nathan Stratton wrote:


Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:



Has anyone been able to get InfiniBand working with the 2.6.31 kernel and
fuse 2.8.0? My config works fine on my CentOS 2.6.18 box, so I know
that is OK.


Infiniband looks good:

[r...@xen1 src]# lsmod |grep ib
ib_ucm 13752  0
ib_uverbs  32256  2 rdma_ucm,ib_ucm
ib_ipoib   68880  0
ib_mthca  123700  0

[r...@xen1 src]# ibv_devices
   device node GUID
   --  
   mthca0  0005ad0327e8

Gluster looks like it starts OK, but I can't touch the mount and 
after a while it times out. Debug logs:



[2009-09-18 19:36:17] D [glusterfsd.c:354:_get_specfp] glusterfs: 
loading volume file /usr/local/etc/glusterfs/glusterfs.vol
 


Version  : glusterfs 2.0.6 built on Sep 18 2009 09:54:43
TLA Revision : v2.0.6
Starting Time: 2009-09-18 19:36:17
Command line : glusterfs -L DEBUG -l /var/log/glusterfs.log 
--disable-direct-io-mode /share

PID  : 8303
System name  : Linux
Nodename : xen1.hou.blinkmind.com
Kernel Release : 2.6.31
Hardware Identifier: x86_64

Given volfile:
+--+ 


 1: volume brick0
 2:  type protocol/client
 3:  option transport-type ib-verbs/client
 4:  option remote-host 172.16.0.200
 5:  option remote-port 6997
 6:  option transport.address-family inet/inet6
 7:  option remote-subvolume brick
 8: end-volume
 9:
10: volume mirror0
11:  type protocol/client
12:  option transport-type ib-verbs/client
13:  option remote-host 172.16.0.201
14:  option remote-port 6997
15:  option transport.address-family inet/inet6
16:  option remote-subvolume brick
17: end-volume
18:
19: volume brick1
20:  type protocol/client
21:  option transport-type ib-verbs/client
22:  option remote-host 172.16.0.202
23:  option remote-port 6997
24:  option transport.address-family inet/inet6
25:  option remote-subvolume brick
26: end-volume
27:
28: volume mirror1
29:  type protocol/client
30:  option transport-type ib-verbs/client
31:  option remote-host 172.16.0.203
32:  option remote-port 6997
33:  option transport.address-family inet/inet6
34:  option remote-subvolume brick
35: end-volume
36:
37: volume brick2
38:  type protocol/client
39:  option transport-type ib-verbs/client
40:  option remote-host 172.16.0.204
41:  option remote-port 6997
42:  option transport.address-family inet/inet6
43:  option remote-subvolume brick
44: end-volume
45:
46: volume mirror2
47:  type protocol/client
48:  option transport-type ib-verbs/client
49:  option remote-host 172.16.0.205
50:  option remote-port 6997
51:  option transport.address-family inet/inet6
52:  option remote-subvolume brick
53: end-volume
54:
55: volume block0
56:  type cluster/replicate
57:  subvolumes brick0 mirror0
58: end-volume
59:
60: volume block1
61:  type cluster/replicate
62:  subvolumes brick1 mirror1
63: end-volume
64:
65: volume block2
66:  type cluster/replicate
67:  subvolumes brick2 mirror2
68: end-volume
69:
70: volume unify
71:  type cluster/distribute
72:  subvolumes block0 block1 block2
73: end-volume
74:

+--+ 

[2009-09-18 19:36:17] D [glusterfsd.c:1205:main] glusterfs: running 
in pid 8303
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick0: 
defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick0: 
defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
attempt to load file 
/usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
attempt to load file 
/usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
brick0: no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror0: 
defaulting frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror0: 
defaulting ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
attempt to load file 
/usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_op

Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Nathan Stratton

On Tue, 22 Sep 2009, Anand Avati wrote:


Then I guess even IPoIB is not working for you? I'm not sure; you


Actually, IPoIB is working just fine.


might have to upgrade to a newer OFED, as either libibverbs or the mthca
uverbs driver may not be compatible with the latest kernel IB drivers.


Ya, I think you may be onto something there.


Any hints from the OFED mailing list?


Will try them next; this does not look like a gluster issue.


Avati




Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Anand Avati
> > >  [r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
> > >  local address:  LID 0x000b, QPN 0x300406, PSN 0x1ace41
> > >  local address:  LID 0x000b, QPN 0x300407, PSN 0x6ba197
> > >  local address:  LID 0x000b, QPN 0x300408, PSN 0xa6f895
> > >  local address:  LID 0x000b, QPN 0x300409, PSN 0xf054c0
> > >  local address:  LID 0x000b, QPN 0x30040a, PSN 0xea4bd3
> > >  local address:  LID 0x000b, QPN 0x30040b, PSN 0xfe3039
> > >  local address:  LID 0x000b, QPN 0x30040c, PSN 0x037fa4
> > >  local address:  LID 0x000b, QPN 0x30040d, PSN 0x1feccd
> > >  local address:  LID 0x000b, QPN 0x30040e, PSN 0x22daed
> > >  local address:  LID 0x000b, QPN 0x30040f, PSN 0xcaa26b
> > >  local address:  LID 0x000b, QPN 0x300410, PSN 0xe87f33
> > >  local address:  LID 0x000b, QPN 0x300411, PSN 0x84bb4a
> > >  local address:  LID 0x000b, QPN 0x300412, PSN 0x09286e
> > >  local address:  LID 0x000b, QPN 0x300413, PSN 0xecf483
> > >  local address:  LID 0x000b, QPN 0x300414, PSN 0xd55285
> > >  local address:  LID 0x000b, QPN 0x300415, PSN 0xdd7065
> > >  remote address: LID 0x0004, QPN 0x460406, PSN 0x198bcb
> > >  remote address: LID 0x0004, QPN 0x460407, PSN 0x645159
> > >  remote address: LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
> > >  remote address: LID 0x0004, QPN 0x460409, PSN 0x8dff52
> > >  remote address: LID 0x0004, QPN 0x46040a, PSN 0xe317fd
> > >  remote address: LID 0x0004, QPN 0x46040b, PSN 0x12da1b
> > >  remote address: LID 0x0004, QPN 0x460418, PSN 0xc8e0de
> > >  remote address: LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
> > >  remote address: LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
> > >  remote address: LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
> > >  remote address: LID 0x0004, QPN 0x46041c, PSN 0x107a0d
> > >  remote address: LID 0x0004, QPN 0x46041d, PSN 0xe2661c
> > >  remote address: LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
> > >  remote address: LID 0x0004, QPN 0x46041f, PSN 0xc438a5
> > >  remote address: LID 0x0004, QPN 0x460420, PSN 0x0be0ff
> > >  remote address: LID 0x0004, QPN 0x460421, PSN 0x91b657
> > >
> > >  This one ends oddly; I think it should tell me more info, but it
> > > just sits there.
> > >
> >
> > Just starting ibv_srq_pingpong makes it the "server". You should run
> > "ibv_srq_pingpong " from a second server and then the two will
> > ping-pong each other. Can you please post that output as well?
> >
>
>  The above was from the client using the IP address of the server. The
> server showed:

Then I guess even IPoIB is not working for you? I'm not sure; you
might have to upgrade to a newer OFED, as either libibverbs or the mthca
uverbs driver may not be compatible with the latest kernel IB drivers.
Any hints from the OFED mailing list?

Avati
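
(A minimal IPoIB sanity check, as a sketch -- assuming ib0 is the IPoIB
interface and 10.13.0.220 is the peer's address, as in the output quoted in
this thread:)

ping -c 3 -I ib0 10.13.0.220    # plain IP ping bound to the IPoIB interface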


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Nathan Stratton

On Tue, 22 Sep 2009, Anand Avati wrote:


What does the server log have to say? Can you also check if port-1 is
the active port in ibv_devinfo? Looks like ib-verbs messaging is not
happening. Does ibv_srq_pingpong give sane results?



 [3.890311] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4,
2008)
 [3.890315] ib_mthca: Initializing :08:00.0
 [3.890354] ib_mthca :08:00.0: PCI INT A -> GSI 28 (level, low) ->
IRQ 28
 [7.899804] ADDRCONF(NETDEV_UP): ib0: link is not ready
 [7.902722] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
 [   17.935013] ib0: no IPv6 routers present


 [r...@xen1 ~]# ibv_devinfo
 hca_id: mthca0
fw_ver: 3.5.0
node_guid:  0005:ad00:0003:27e8
sys_image_guid: 0005:ad00:0100:d050
vendor_id:  0x02c9
vendor_part_id: 23108
hw_ver: 0xA1
board_id:   MT_0270110001
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid:   11
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00



 [r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
  local address:  LID 0x000b, QPN 0x300406, PSN 0x1ace41
  local address:  LID 0x000b, QPN 0x300407, PSN 0x6ba197
  local address:  LID 0x000b, QPN 0x300408, PSN 0xa6f895
  local address:  LID 0x000b, QPN 0x300409, PSN 0xf054c0
  local address:  LID 0x000b, QPN 0x30040a, PSN 0xea4bd3
  local address:  LID 0x000b, QPN 0x30040b, PSN 0xfe3039
  local address:  LID 0x000b, QPN 0x30040c, PSN 0x037fa4
  local address:  LID 0x000b, QPN 0x30040d, PSN 0x1feccd
  local address:  LID 0x000b, QPN 0x30040e, PSN 0x22daed
  local address:  LID 0x000b, QPN 0x30040f, PSN 0xcaa26b
  local address:  LID 0x000b, QPN 0x300410, PSN 0xe87f33
  local address:  LID 0x000b, QPN 0x300411, PSN 0x84bb4a
  local address:  LID 0x000b, QPN 0x300412, PSN 0x09286e
  local address:  LID 0x000b, QPN 0x300413, PSN 0xecf483
  local address:  LID 0x000b, QPN 0x300414, PSN 0xd55285
  local address:  LID 0x000b, QPN 0x300415, PSN 0xdd7065
  remote address: LID 0x0004, QPN 0x460406, PSN 0x198bcb
  remote address: LID 0x0004, QPN 0x460407, PSN 0x645159
  remote address: LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
  remote address: LID 0x0004, QPN 0x460409, PSN 0x8dff52
  remote address: LID 0x0004, QPN 0x46040a, PSN 0xe317fd
  remote address: LID 0x0004, QPN 0x46040b, PSN 0x12da1b
  remote address: LID 0x0004, QPN 0x460418, PSN 0xc8e0de
  remote address: LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
  remote address: LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
  remote address: LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
  remote address: LID 0x0004, QPN 0x46041c, PSN 0x107a0d
  remote address: LID 0x0004, QPN 0x46041d, PSN 0xe2661c
  remote address: LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
  remote address: LID 0x0004, QPN 0x46041f, PSN 0xc438a5
  remote address: LID 0x0004, QPN 0x460420, PSN 0x0be0ff
  remote address: LID 0x0004, QPN 0x460421, PSN 0x91b657

 This one ends oddly; I think it should tell me more info, but it just
sits there.


Just starting ibv_srq_pingpong makes it the "server". You should run
"ibv_srq_pingpong " from a second server and then the two will
ping-pong each other. Can you please post that output as well?


The above was from the client using the IP address of the server. The 
server showed:


[r...@xen0 ~]# ibv_srq_pingpong
  local address:  LID 0x0004, QPN 0x460406, PSN 0x198bcb
  local address:  LID 0x0004, QPN 0x460407, PSN 0x645159
  local address:  LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
  local address:  LID 0x0004, QPN 0x460409, PSN 0x8dff52
  local address:  LID 0x0004, QPN 0x46040a, PSN 0xe317fd
  local address:  LID 0x0004, QPN 0x46040b, PSN 0x12da1b
  local address:  LID 0x0004, QPN 0x460418, PSN 0xc8e0de
  local address:  LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
  local address:  LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
  local address:  LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
  local address:  LID 0x0004, QPN 0x46041c, PSN 0x107a0d
  local address:  LID 0x0004, QPN 0x46041d, PSN 0xe2661c
  local address:  LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
  local address:  LID 0x0004, QPN 0x46041f, PSN 0xc438a5
  local address:  LID 0x0004, QPN 0x460420, PSN 0x0be0ff
  local address:  LID 0x0004, QPN 0x460421, PSN 0x91b657
  remote address: LID 0x000b, QPN 0x300406, PS

Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Anand Avati
> > What does the server log have to say? Can you also check if port-1 is
> > the active port in ibv_devinfo? Looks like ib-verbs messaging is not
> > happening. Does ibv_srq_pingpong give sane results?
> >
>
>  [3.890311] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4,
> 2008)
>  [3.890315] ib_mthca: Initializing :08:00.0
>  [3.890354] ib_mthca :08:00.0: PCI INT A -> GSI 28 (level, low) ->
> IRQ 28
>  [7.899804] ADDRCONF(NETDEV_UP): ib0: link is not ready
>  [7.902722] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>  [   17.935013] ib0: no IPv6 routers present
>
>
>  [r...@xen1 ~]# ibv_devinfo
>  hca_id: mthca0
> fw_ver: 3.5.0
> node_guid:  0005:ad00:0003:27e8
> sys_image_guid: 0005:ad00:0100:d050
> vendor_id:  0x02c9
> vendor_part_id: 23108
> hw_ver: 0xA1
> board_id:   MT_0270110001
> phys_port_cnt:  2
> port:   1
> state:  active (4)
> max_mtu:2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 2
> port_lid:   11
> port_lmc:   0x00
>
> port:   2
> state:  down (1)
> max_mtu:2048 (4)
> active_mtu: 512 (2)
> sm_lid: 0
> port_lid:   0
> port_lmc:   0x00
>
>
>
>  [r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
>   local address:  LID 0x000b, QPN 0x300406, PSN 0x1ace41
>   local address:  LID 0x000b, QPN 0x300407, PSN 0x6ba197
>   local address:  LID 0x000b, QPN 0x300408, PSN 0xa6f895
>   local address:  LID 0x000b, QPN 0x300409, PSN 0xf054c0
>   local address:  LID 0x000b, QPN 0x30040a, PSN 0xea4bd3
>   local address:  LID 0x000b, QPN 0x30040b, PSN 0xfe3039
>   local address:  LID 0x000b, QPN 0x30040c, PSN 0x037fa4
>   local address:  LID 0x000b, QPN 0x30040d, PSN 0x1feccd
>   local address:  LID 0x000b, QPN 0x30040e, PSN 0x22daed
>   local address:  LID 0x000b, QPN 0x30040f, PSN 0xcaa26b
>   local address:  LID 0x000b, QPN 0x300410, PSN 0xe87f33
>   local address:  LID 0x000b, QPN 0x300411, PSN 0x84bb4a
>   local address:  LID 0x000b, QPN 0x300412, PSN 0x09286e
>   local address:  LID 0x000b, QPN 0x300413, PSN 0xecf483
>   local address:  LID 0x000b, QPN 0x300414, PSN 0xd55285
>   local address:  LID 0x000b, QPN 0x300415, PSN 0xdd7065
>   remote address: LID 0x0004, QPN 0x460406, PSN 0x198bcb
>   remote address: LID 0x0004, QPN 0x460407, PSN 0x645159
>   remote address: LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
>   remote address: LID 0x0004, QPN 0x460409, PSN 0x8dff52
>   remote address: LID 0x0004, QPN 0x46040a, PSN 0xe317fd
>   remote address: LID 0x0004, QPN 0x46040b, PSN 0x12da1b
>   remote address: LID 0x0004, QPN 0x460418, PSN 0xc8e0de
>   remote address: LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
>   remote address: LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
>   remote address: LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
>   remote address: LID 0x0004, QPN 0x46041c, PSN 0x107a0d
>   remote address: LID 0x0004, QPN 0x46041d, PSN 0xe2661c
>   remote address: LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
>   remote address: LID 0x0004, QPN 0x46041f, PSN 0xc438a5
>   remote address: LID 0x0004, QPN 0x460420, PSN 0x0be0ff
>   remote address: LID 0x0004, QPN 0x460421, PSN 0x91b657
>
>  This one ends oddly; I think it should tell me more info, but it just
> sits there.

Just starting ibv_srq_pingpong makes it the "server". You should run
"ibv_srq_pingpong " from a second server and then the two will
ping-pong each other. Can you please post that output as well?

Avati
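
(A sketch of the two-sided test described above, using the hosts from this
thread:)

# on the first node (xen0), start the listening side:
ibv_srq_pingpong
# on the second node (xen1), point it at the first node's IP:
ibv_srq_pingpong 10.13.0.220
# a successful run ends with transfer statistics on both sides instead of hanging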


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Nathan Stratton


On Tue, 22 Sep 2009, Anand Avati wrote:


 Hate to post again, but anyone have any ideas on this?


What does the server log have to say? Can you also check if port-1 is
the active port in ibv_devinfo? Looks like ib-verbs messaging is not
happening. Does ibv_srq_pingpong give sane results?


[3.890311] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[3.890315] ib_mthca: Initializing :08:00.0
[3.890354] ib_mthca :08:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[7.899804] ADDRCONF(NETDEV_UP): ib0: link is not ready
[7.902722] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[   17.935013] ib0: no IPv6 routers present


[r...@xen1 ~]# ibv_devinfo
hca_id: mthca0
fw_ver: 3.5.0
node_guid:  0005:ad00:0003:27e8
sys_image_guid: 0005:ad00:0100:d050
vendor_id:  0x02c9
vendor_part_id: 23108
hw_ver: 0xA1
board_id:   MT_0270110001
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid:   11
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00



[r...@xen1 ~]# ibv_srq_pingpong 10.13.0.220
  local address:  LID 0x000b, QPN 0x300406, PSN 0x1ace41
  local address:  LID 0x000b, QPN 0x300407, PSN 0x6ba197
  local address:  LID 0x000b, QPN 0x300408, PSN 0xa6f895
  local address:  LID 0x000b, QPN 0x300409, PSN 0xf054c0
  local address:  LID 0x000b, QPN 0x30040a, PSN 0xea4bd3
  local address:  LID 0x000b, QPN 0x30040b, PSN 0xfe3039
  local address:  LID 0x000b, QPN 0x30040c, PSN 0x037fa4
  local address:  LID 0x000b, QPN 0x30040d, PSN 0x1feccd
  local address:  LID 0x000b, QPN 0x30040e, PSN 0x22daed
  local address:  LID 0x000b, QPN 0x30040f, PSN 0xcaa26b
  local address:  LID 0x000b, QPN 0x300410, PSN 0xe87f33
  local address:  LID 0x000b, QPN 0x300411, PSN 0x84bb4a
  local address:  LID 0x000b, QPN 0x300412, PSN 0x09286e
  local address:  LID 0x000b, QPN 0x300413, PSN 0xecf483
  local address:  LID 0x000b, QPN 0x300414, PSN 0xd55285
  local address:  LID 0x000b, QPN 0x300415, PSN 0xdd7065
  remote address: LID 0x0004, QPN 0x460406, PSN 0x198bcb
  remote address: LID 0x0004, QPN 0x460407, PSN 0x645159
  remote address: LID 0x0004, QPN 0x460408, PSN 0x4a1a2f
  remote address: LID 0x0004, QPN 0x460409, PSN 0x8dff52
  remote address: LID 0x0004, QPN 0x46040a, PSN 0xe317fd
  remote address: LID 0x0004, QPN 0x46040b, PSN 0x12da1b
  remote address: LID 0x0004, QPN 0x460418, PSN 0xc8e0de
  remote address: LID 0x0004, QPN 0x460419, PSN 0xfc6e7f
  remote address: LID 0x0004, QPN 0x46041a, PSN 0xa3ffb7
  remote address: LID 0x0004, QPN 0x46041b, PSN 0x0cc86d
  remote address: LID 0x0004, QPN 0x46041c, PSN 0x107a0d
  remote address: LID 0x0004, QPN 0x46041d, PSN 0xe2661c
  remote address: LID 0x0004, QPN 0x46041e, PSN 0xfb8fd8
  remote address: LID 0x0004, QPN 0x46041f, PSN 0xc438a5
  remote address: LID 0x0004, QPN 0x460420, PSN 0x0be0ff
  remote address: LID 0x0004, QPN 0x460421, PSN 0x91b657

This one ends oddly; I think it should tell me more info, but it just
sits there.


-Nathan


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Anand Avati
>  Hate to post again, but anyone have any ideas on this?

What does the server log have to say? Can you also check if port-1 is
the active port in ibv_devinfo? Looks like ib-verbs messaging is not
happening. Does ibv_srq_pingpong give sane results?

Avati
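
(A quick way to answer the port question, as a sketch:)

ibv_devinfo | grep -E 'port:|state:'    # port 1 should report an active state, as in the output quoted in this thread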


Re: [Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

2009-09-22 Thread Nathan Stratton


Hate to post again, but anyone have any ideas on this?

-Nathan

On Fri, 18 Sep 2009, Nathan Stratton wrote:



Has anyone been able to get InfiniBand working with the 2.6.31 kernel and fuse
2.8.0? My config works fine on my CentOS 2.6.18 box, so I know that is OK.


Infiniband looks good:

[r...@xen1 src]# lsmod |grep ib
ib_ucm 13752  0
ib_uverbs  32256  2 rdma_ucm,ib_ucm
ib_ipoib   68880  0
ib_mthca  123700  0

[r...@xen1 src]# ibv_devices
   device  node GUID
   --   
   mthca0   0005ad0327e8

Gluster looks like it starts OK, but I can't touch the mount and after a 
while it times out. Debug logs:



[2009-09-18 19:36:17] D [glusterfsd.c:354:_get_specfp] glusterfs: loading 
volume file /usr/local/etc/glusterfs/glusterfs.vol


Version  : glusterfs 2.0.6 built on Sep 18 2009 09:54:43
TLA Revision : v2.0.6
Starting Time: 2009-09-18 19:36:17
Command line : glusterfs -L DEBUG -l /var/log/glusterfs.log 
--disable-direct-io-mode /share

PID  : 8303
System name  : Linux
Nodename : xen1.hou.blinkmind.com
Kernel Release : 2.6.31
Hardware Identifier: x86_64

Given volfile:
+--+
 1: volume brick0
 2:  type protocol/client
 3:  option transport-type ib-verbs/client
 4:  option remote-host 172.16.0.200
 5:  option remote-port 6997
 6:  option transport.address-family inet/inet6
 7:  option remote-subvolume brick
 8: end-volume
 9:
10: volume mirror0
11:  type protocol/client
12:  option transport-type ib-verbs/client
13:  option remote-host 172.16.0.201
14:  option remote-port 6997
15:  option transport.address-family inet/inet6
16:  option remote-subvolume brick
17: end-volume
18:
19: volume brick1
20:  type protocol/client
21:  option transport-type ib-verbs/client
22:  option remote-host 172.16.0.202
23:  option remote-port 6997
24:  option transport.address-family inet/inet6
25:  option remote-subvolume brick
26: end-volume
27:
28: volume mirror1
29:  type protocol/client
30:  option transport-type ib-verbs/client
31:  option remote-host 172.16.0.203
32:  option remote-port 6997
33:  option transport.address-family inet/inet6
34:  option remote-subvolume brick
35: end-volume
36:
37: volume brick2
38:  type protocol/client
39:  option transport-type ib-verbs/client
40:  option remote-host 172.16.0.204
41:  option remote-port 6997
42:  option transport.address-family inet/inet6
43:  option remote-subvolume brick
44: end-volume
45:
46: volume mirror2
47:  type protocol/client
48:  option transport-type ib-verbs/client
49:  option remote-host 172.16.0.205
50:  option remote-port 6997
51:  option transport.address-family inet/inet6
52:  option remote-subvolume brick
53: end-volume
54:
55: volume block0
56:  type cluster/replicate
57:  subvolumes brick0 mirror0
58: end-volume
59:
60: volume block1
61:  type cluster/replicate
62:  subvolumes brick1 mirror1
63: end-volume
64:
65: volume block2
66:  type cluster/replicate
67:  subvolumes brick2 mirror2
68: end-volume
69:
70: volume unify
71:  type cluster/distribute
72:  subvolumes block0 block1 block2
73: end-volume
74:

+--+
[2009-09-18 19:36:17] D [glusterfsd.c:1205:main] glusterfs: running in pid 
8303
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick0: defaulting 
frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick0: defaulting 
ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt 
to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] brick0: 
no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt 
to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] brick0: 
no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror0: defaulting 
frame-timeout to 30mins
[2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror0: defaulting 
ping-timeout to 10
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt 
to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] mirror0: 
no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: attempt 
to load file /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
[2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] mirror0: 
no range check required for 'option remote-port 6997'
[2009-09-18 19:36:17] D [client-protocol.c:5952:init]

Re: [Gluster-users] Rsync

2009-09-22 Thread Hiren Joshi
I forgot to mention that the mount is mounted with direct-io; would this
make a difference?
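
(For reference, direct I/O is controlled at mount time; a sketch of mounting
with it disabled, using the flag visible in the debug logs later in this
digest -- the volfile path is an assumption:)

glusterfs --disable-direct-io-mode -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs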

> -Original Message-
> From: gluster-users-boun...@gluster.org 
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi
> Sent: 22 September 2009 11:40
> To: gluster-users@gluster.org
> Subject: [Gluster-users] Rsync
> 
> Hello all,
>  
> I'm getting what I think is bizarre behaviour. I have about 400G to
> rsync (rsync -av) onto a gluster share; the data is in a directory
> structure which has about 1000 directories per parent and about 1000
> directories in each of them.
>  
> When I try to rsync an end leaf directory (this has about 4 
> dirs and 100
> files in each) the operation takes about 10 seconds. When I 
> go one level
> above (1000 dirs with about 4 dirs in each with about 100 
> files in each)
> the operation takes about 10 minutes.
>  
> Now, if I then go one level above that (that's 1000 dirs with 
> 1000 dirs
> in each with about 4 dirs in each with about 100 files in each) the
> operation takes days! Top shows glusterfsd taking 300-600% CPU usage
> (2x4 cores); I have about 48G of memory (usage is 0%, as expected).
>  
> Has anyone seen anything like this? How can I speed it up?
>  
> Thanks,
>  
> Josh.
> 


[Gluster-users] Rsync

2009-09-22 Thread Hiren Joshi
Hello all,
 
I'm getting what I think is bizarre behaviour. I have about 400G to
rsync (rsync -av) onto a gluster share; the data is in a directory
structure which has about 1000 directories per parent and about 1000
directories in each of them.
 
When I try to rsync an end leaf directory (this has about 4 dirs and 100
files in each) the operation takes about 10 seconds. When I go one level
above (1000 dirs with about 4 dirs in each with about 100 files in each)
the operation takes about 10 minutes.
 
Now, if I then go one level above that (that's 1000 dirs with 1000 dirs
in each with about 4 dirs in each with about 100 files in each) the
operation takes days! Top shows glusterfsd taking 300-600% CPU usage
(2x4 cores); I have about 48G of memory (usage is 0%, as expected).
 
Has anyone seen anything like this? How can I speed it up?
 
Thanks,
 
Josh.
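
(Not a fix, but a sketch of standard rsync options that are sometimes used to
cut per-file overhead on network filesystems; paths are placeholders:)

rsync -av --inplace --whole-file /source/dir/ /mnt/glusterfs/dest/
# --inplace skips rsync's write-to-a-temp-file-then-rename step;
# --whole-file disables the delta-transfer algorithm, so the existing
# destination copy is not read back for comparison
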
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users