Re: [Gluster-users] Rsync

2009-09-28 Thread Hiren Joshi
Another update:
It took 1240 minutes (over 20 hours) to complete on the simplified
system (without mirroring). What else can I do to debug?
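
A couple of generic things that might help narrow down where the time
goes (this is only a sketch; the paths and PIDs below are placeholders,
not a known fix):

# run the client with debug logging and watch the log during a slow rsync
glusterfs --log-level=DEBUG -f /etc/glusterfs/client.vol /mnt/gluster

# summarise which syscalls the server spends its time in while rsync runs
strace -c -f -p <glusterfsd-pid>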

> -Original Message-
> From: gluster-users-boun...@gluster.org 
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi
> Sent: 24 September 2009 13:05
> To: Pavan Vilas Sondur
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Rsync
> 
>  
> 
> > -Original Message-
> > From: Pavan Vilas Sondur [mailto:pa...@gluster.com] 
> > Sent: 24 September 2009 12:42
> > To: Hiren Joshi
> > Cc: gluster-users@gluster.org
> > Subject: Re: Rsync
> > 
> > Can you let us know the following:
> > 
> >  * What is the exact directory structure?
> /abc/def/ghi/jkl/[1-4]
> where abc, def, ghi and jkl are each one of about a thousand dirs.
> 
> >  * How many files are there in each individual directory and 
> > of what size?
> Each of the [1-4] dirs has about 100 files in it, all under 1MB.
> 
> >  * It looks like each server process has 6 export 
> > directories. Can you run one server process each for a single 
> > export directory and check if the rsync speeds up?
> I had no idea you could do that. How? Would I need to create 6 config
> files and start gluster:
> 
> /usr/sbin/glusterfsd -f /etc/glusterfs/export1.vol or similar?
> 
> I'll give this a go.
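> 
> For illustration, a minimal per-export volfile might look something like
> this (volume names, ports and paths here are assumptions, not the real
> setup):
> 
>   # /etc/glusterfs/export1.vol (hypothetical)
>   volume posix1
>     type storage/posix
>     option directory /data/export1
>   end-volume
> 
>   volume server1
>     type protocol/server
>     option transport-type tcp
>     option transport.socket.listen-port 7001   # a distinct port per process
>     option auth.addr.posix1.allow *
>     subvolumes posix1
>   end-volume
> 
>   # started as its own process:
>   /usr/sbin/glusterfsd -f /etc/glusterfs/export1.vol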
> 
> >  * Also, do you have any benchmarks with a similar setup on 
> say, NFS?
> NFS will create the dir tree in about 20 minutes and then start copying
> the files over; the whole run takes about 2-3 hours.
> 
> > 
> > Pavan
> > 
> > On 24/09/09 12:13 +0100, Hiren Joshi wrote:
> > > It's been running for over 24 hours now.
> > > Network traffic is nominal, top shows about 200-400% cpu (7 cores so
> > > it's not too bad).
> > > About 14G of memory used (the rest is being used as disk cache).
> > > 
> > > Thoughts?
> > > 
> > > 
> > > 
> > > 
> > > > > > > 
> > > > > > > An update, after running the rsync for a day, I killed it and
> > > > > > > remounted all the disks (the underlying filesystem, not the
> > > > > > > gluster) with noatime, and the rsync completed in about 600
> > > > > > > minutes. I'm now going to try one level up (about
> > > > > > > 1,000,000,000 dirs).
> > > > > > > 
> > > > > > > > -Original Message-
> > > > > > > > From: Pavan Vilas Sondur [mailto:pa...@gluster.com] 
> > > > > > > > Sent: 23 September 2009 07:55
> > > > > > > > To: Hiren Joshi
> > > > > > > > Cc: gluster-users@gluster.org
> > > > > > > > Subject: Re: Rsync
> > > > > > > > 
> > > > > > > > Hi Hiren,
> > > > > > > > What glusterfs version are you using? Can you send us the
> > > > > > > > volfiles and the log files?
> > > > > > > > 
> > > > > > > > Pavan
> > > > > > > > 
> > > > > > > > On 22/09/09 16:01 +0100, Hiren Joshi wrote:
> > > > > > > > > I forgot to mention, the mount is mounted with direct-io;
> > > > > > > > > would this make a difference?
> > > > > > > > > 
> > > > > > > > > > -Original Message-
> > > > > > > > > > From: gluster-users-boun...@gluster.org
> > > > > > > > > > [mailto:gluster-users-boun...@gluster.org] On Behalf Of
> > > > > > > > > > Hiren Joshi
> > > > > > > > > > Sent: 22 September 2009 11:40
> > > > > > > > > > To: gluster-users@gluster.org
> > > > > > > > > > Subject: [Gluster-users] Rsync
> > > > > > > > > > 
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > I'm getting what I think is bizarre behaviour. I have
> > > > > > > > > > about 400G to rsync (rsync -av) onto a gluster share;
> > > > > > > > > > the data is in a directory structure which has about
> > > > > > > > > > 1000 directories per parent and about 1000 directories
> > > > > > > > > > in each of them.
> > > > > > > > > >
> > > > > > > > > > When I try to rsync an end leaf directory (this has
> > > > > > > > > > about 4 dirs and 100 files in each) the operation takes
> > > > > > > > > > about 10 seconds. When I go one level above (1000 dirs
> > > > > > > > > > with about 4 dirs in each with about 100 files in each)
> > > > > > > > > > the operation takes about 10 minutes.
> > > > > > > > > >
> > > > > > > > > > Now, if I then go one level above that (that's 1000 dirs
> > > > > > > > > > with 1000 dirs in each with about 4 dirs in each with
> > > > > > > > > > about 100 files in each) the operation takes days! Top
> > > > > > > > > > shows glusterfsd taking 300-600% cpu usage (2X4core); I
> > > > > > > > > > have about 48G of memory (usage is 0% as expected).
> > > > > > > > > >
> > > > > > > > > > Has anyone seen anything like this? How can I speed it up?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Josh.
> > > > > > > > > >

[Gluster-users] Installing glusterfs ver 2.* on fedora7 i386 ?

2009-09-28 Thread Joon Woo Kim
I succeeded in installing the 2.0.6 src rpm on my fedora7 i386 hosts,
but it does not work at all. Every time I try to write files in the
mounted folder, the operation fails with error messages like
"invalid argument".

I think there's a difference between fc7 and glusterfs ver 2.* in the
file attribute format. Does anyone know how to fix this? The former
version 1.3.* is not comfortable to use, so I would rather not fall
back to it.
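
One generic thing worth checking first (only an assumption about the
cause, since glusterfs 2.x depends on extended attribute support on the
backend filesystem): whether the export directory accepts extended
attributes at all, for example:

# run as root on the server, against the export directory (path is hypothetical)
touch /data/export/xattr-test
setfattr -n trusted.glusterfs.test -v working /data/export/xattr-test
getfattr -n trusted.glusterfs.test /data/export/xattr-test
rm /data/export/xattr-test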


Joon Woo Kim


[Gluster-users] AFR self-heal bug with rmdir (Directory not empty)

2009-09-28 Thread Corentin Chary
Hi,
I'm trying to use glusterfs with afr.
My setup has 2 servers and 2 clients. / is mounted with user_xattr.
It seems that if you shut down a server, remove a directory with one or
more children, then restart the server, the changes won't be replicated,
because the rmdir in afr-self-heal-entry.c is not recursive.

Here is my test case:
$ bin/clients.sh  # launch 2 clients

$ tree mnt/ export/
mnt/
|-- 1
`-- 2
export/
|-- 1
`-- 2

$ mkdir + touch

$ tree mnt/1
mnt/1
|-- dir-empty
|-- dir-with-file
|   `-- file
`-- dir-with-subdir
`-- subdir

$ kill server 2

$ rm mnt/1/dir* -rf

$ start server 2

$ tree mnt/
mnt/
|-- 1
`-- 2

$ tree export/
export/
|-- 1
|-- 2
|-- dir-with-file
|   `-- file
`-- dir-with-subdir
`-- subdir
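
Spelled out, the placeholder steps above would be roughly the following
(the port, volfile name and self-heal trigger are assumptions based on
the volfiles below):

# create the test tree through client 1's mount
mkdir -p mnt/1/dir-empty mnt/1/dir-with-file mnt/1/dir-with-subdir/subdir
touch mnt/1/dir-with-file/file

# "kill server 2": stop the glusterfsd that listens on port 7002
kill <pid-of-server-2>

# remove the directories through client 1 while server 2 is down
rm -rf mnt/1/dir*

# "start server 2" again, then walk the tree so self-heal gets triggered
glusterfsd -f server-2.vol
ls -lR mnt/1 > /dev/null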

Log:
[2009-09-28 15:30:09] D
[afr-self-heal-entry.c:1865:afr_sh_entry_sync_prepare] afr:
self-healing directory / from subvolume brick1 to 1 other
[2009-09-28 15:30:09] D
[afr-self-heal-entry.c:455:afr_sh_entry_expunge_remove_cbk] afr:
removing /dir-with-subdir on brick2 failed (Directory not empty)
[2009-09-28 15:30:09] D
[afr-self-heal-entry.c:455:afr_sh_entry_expunge_remove_cbk] afr:
removing /dir-with-file on brick2 failed (Directory not empty)
[2009-09-28 15:30:09] D
[afr-self-heal-entry.c:449:afr_sh_entry_expunge_remove_cbk] afr:
removed /dir-empty on brick2


# server-x.vol
volume brick
  type storage/posix
  option directory /home/iksaif/tmp/glusterfs/export/1
end-volume

volume brick-lock
  type features/posix-locks
  option mandatory-locks on
  subvolumes brick
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 127.0.0.1
  option transport.socket.listen-port 7001
  subvolumes brick-lock
  option auth.addr.brick-lock.allow *
end-volume

# client.vol
volume brick1
 type protocol/client
 option transport-type tcp
 option remote-host 127.0.0.1
 option remote-port 7001
 option remote-subvolume brick-lock
end-volume

volume brick2
 type protocol/client
 option transport-type tcp
 option remote-host 127.0.0.1
 option remote-port 7002
 option remote-subvolume brick-lock
end-volume

volume afr
 type cluster/afr
 subvolumes brick1 brick2
end-volume

Thanks;
-- 
Corentin Chary
http://xf.iksaif.net


[Gluster-users] is glusterfs DHT really distributed?

2009-09-28 Thread Wei Dong

Hi All,

I noticed a very weird phenomenon when I'm copying data (200KB image
files) to our glusterfs storage.  When I run only one client, it copies
roughly 20 files per second, and as soon as I start a second client on
another machine, the copy rate of the first client immediately degrades
to 5 files per second.  When I stop the second client, the first client
will immediately speed up to the original 20 files per second.  When I
run 15 clients, the aggregate throughput is about 8 files per second,
much worse than running only one client.  Neither CPU nor network is
saturated.  My volume file is attached.  The servers are running on a
66-node cluster and the clients on a 15-node cluster.


We have 33x2 servers and at most 15 separate client machines, with each
server serving < 0.5 clients on average.  I cannot think of a reason
for a distributed system to behave like this.  There must be some kind
of central access point.
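
A rough way to quantify this (a sketch only; the mount path and sample
file are placeholders): time a fixed number of small-file copies from
one client, then repeat while a second client runs the same loop, and
compare the two timings.

# run on client 1 alone, then again while client 2 runs the same loop
time bash -c 'for i in $(seq 1 200); do cp sample.jpg /mnt/gluster/bench-$(hostname)-$i.jpg; done'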


- Wei






volume posix0
type storage/posix
option directory /state/partition1/gluster
end-volume

volume lock0
type features/locks
subvolumes posix0
end-volume

volume brick0
type performance/io-threads
option thread-count 4
subvolumes lock0
end-volume

volume posix1
type storage/posix
option directory /state/partition2/gluster
end-volume

volume lock1
type features/locks
subvolumes posix1
end-volume

volume brick1
type performance/io-threads
option thread-count 4
subvolumes lock1
end-volume

volume posix2
type storage/posix
option directory /state/partition3/gluster
end-volume

volume lock2
type features/locks
subvolumes posix2
end-volume

volume brick2
type performance/io-threads
option thread-count 4
subvolumes lock2
end-volume

volume posix3
type storage/posix
option directory /state/partition4/gluster
end-volume

volume lock3
type features/locks
subvolumes posix3
end-volume

volume brick3
type performance/io-threads
option thread-count 4
subvolumes lock3
end-volume

volume server
type protocol/server
option transport-type tcp
option transport.socket.listen-port 7001
option auth.addr.brick0.allow *.*.*.*
option auth.addr.brick1.allow *.*.*.*
option auth.addr.brick2.allow *.*.*.*
option auth.addr.brick3.allow *.*.*.*
subvolumes brick0 brick1 brick2 brick3
end-volume


volume brick-0-0-0
type protocol/client
option transport-type tcp
option remote-host c8-0-0
option remote-port 7001
option remote-subvolume brick0
end-volume

volume brick-0-0-1
type protocol/client
option transport-type tcp
option remote-host c8-1-0
option remote-port 7001
option remote-subvolume brick0
end-volume

volume rep-0-0
type cluster/replicate
subvolumes brick-0-0-0 brick-0-0-1
end-volume

volume brick-0-1-0
type protocol/client
option transport-type tcp
option remote-host c8-0-0
option remote-port 7001
option remote-subvolume brick1
end-volume

volume brick-0-1-1
type protocol/client
option transport-type tcp
option remote-host c8-1-0
option remote-port 7001
option remote-subvolume brick1
end-volume

volume rep-0-1
type cluster/replicate
subvolumes brick-0-1-0 brick-0-1-1
end-volume

volume brick-0-2-0
type protocol/client
option transport-type tcp
option remote-host c8-0-0
option remote-port 7001
option remote-subvolume brick2
end-volume

volume brick-0-2-1
type protocol/client
option transport-type tcp
option remote-host c8-1-0
option remote-port 7001
option remote-subvolume brick2
end-volume

volume rep-0-2
type cluster/replicate
subvolumes brick-0-2-0 brick-0-2-1
end-volume

volume brick-0-3-0
type protocol/client
option transport-type tcp
option remote-host c8-0-0
option remote-port 7001
option remote-subvolume brick3
end-volume

volume brick-0-3-1
type protocol/client
option transport-type tcp
option remote-host c8-1-0
option remote-port 7001
option remote-subvolume brick3
end-volume

volume rep-0-3
type cluster/replicate
subvolumes brick-0-3-0 brick-0-3-1
end-volume

volume brick-1-0-0
type protocol/client
option transport-type tcp
option remote-host c8-0-1
option remote-port 7001
option remote-subvolume brick0
end-volume

volume brick-1-0-1
type protocol/client
option transport-type tcp
option remote-host c8-1-1
option remote-port 7001
option remote-subvolume brick0
end-volume

volume rep-1-0
type cluster/replicate
subvolumes brick-1-0-0 brick-1-0-1
end-volume

volume brick-1-1-0
type protocol/client
option transport-type tcp
option remote-host c8-0-1
option remote-port 7001
option remote-subvolume brick1
end-volume

volume brick-1-1-1
type protocol/client
option transport-type tcp
option remote-host c8-1-1
option remote-port 7001
option remote-subvolume brick1
end-volume

volume rep-1-1
type cluster/replicate
subvolumes brick-1-1-0 brick-1-1-1
end-volume

volume brick-1-2-0
type protocol/client
option transport-type tcp
option remote-host c8-0-1
option remote-port 7001
option remote-subvolume brick2
end-volume

volume brick-1-2-1
type protocol/client
option transport-type tcp
option remote-host c8-1-1
option remote-port 7001
option remote-subvolume brick2
end-volume

volume rep-1-2
type cluster/replicate
subvolumes brick-1-2-0 brick-1-2-1
end-volume

Re: [Gluster-users] is glusterfs DHT really distributed?

2009-09-28 Thread Mark Mielke

On 09/28/2009 10:35 AM, Wei Dong wrote:

Hi All,

I noticed a very weird phenomenon when I'm copying data (200KB image
files) to our glusterfs storage.  When I run only one client, it copies
roughly 20 files per second, and as soon as I start a second client on
another machine, the copy rate of the first client immediately degrades
to 5 files per second.  When I stop the second client, the first client
will immediately speed up to the original 20 files per second.  When I
run 15 clients, the aggregate throughput is about 8 files per second,
much worse than running only one client.  Neither CPU nor network is
saturated.  My volume file is attached.  The servers are running on a
66-node cluster and the clients on a 15-node cluster.


We have 33x2 servers and at most 15 separate client machines, with each
server serving < 0.5 clients on average.  I cannot think of a reason
for a distributed system to behave like this.  There must be some kind
of central access point.


Although there is probably room for the GlusterFS folk to optimize...

You should consider directory write operations to involve the whole 
cluster. Creating a file is a directory write operation. Think of how it 
might have to do self-heal across the cluster, make sure the name is 
right and not already in use across the cluster, and such things.


Once you get to reads and writes for a particular file, it should be 
distributed.


Cheers,
mark

--
Mark Mielke



Re: [Gluster-users] Installing glusterfs ver 2.* on fedora7 i386

2009-09-28 Thread Robertson, Jason
One change from 1.3 to 2.0 is that the vol files changed a great deal; my
upgrade from 1.3 to 2.0 took a few hours of reconfiguration and testing
(testing took the most time).

cluster/afr has changed to cluster/replicate, among other things, so take
the time to read the documentation.

Outside of the vol configurations, the upgrade was rather seamless from
the filesystem point of view.
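
For illustration, the kind of rename involved (a minimal sketch; the
volume and subvolume names are placeholders, and real volfiles carry
more options):

# 1.3.x style
volume mirror
  type cluster/afr
  subvolumes brick1 brick2
end-volume

# 2.0.x style
volume mirror
  type cluster/replicate
  subvolumes brick1 brick2
end-volume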


--

Message: 2
Date: Mon, 28 Sep 2009 19:37:15 +0900
From: "Joon Woo Kim" 
Subject: [Gluster-users] Installing glusterfs ver 2.* on fedora7 i386
?
To: 
Message-ID: <01a601ca4027$a5121b90$df2a0...@oasys.kt.co.kr>
Content-Type: text/plain; charset="ks_c_5601-1987"

I succeeded in installing the 2.0.6 src rpm on my fedora7 i386 hosts,
but it does not work at all. Every time I try to write files in the
mounted folder, the operation fails with error messages like
"invalid argument".

I think there's a difference between fc7 and glusterfs ver 2.* in the
file attribute format. Does anyone know how to fix this? The former
version 1.3.* is not comfortable to use, so I would rather not fall
back to it.


Joon Woo Kim



Re: [Gluster-users] is glusterfs DHT really distributed?

2009-09-28 Thread Wei Dong
Your reply makes complete sense to me.  I remember that auto-heal happens
at file reading; does that mean opening a file for read is also a global
operation?  Do you mean that there's no other way of copying 30 million
files to our 66-node glusterfs cluster for parallel processing other
than waiting for half a month?  Can I somehow disable self-heal and get
a speedup?


This turns out to be quite bad for me.

- Wei


Mark Mielke wrote:

On 09/28/2009 10:35 AM, Wei Dong wrote:

Hi All,

I noticed a very weird phenomenon when I'm copying data (200KB image
files) to our glusterfs storage.  When I run only one client, it copies
roughly 20 files per second, and as soon as I start a second client on
another machine, the copy rate of the first client immediately degrades
to 5 files per second.  When I stop the second client, the first client
will immediately speed up to the original 20 files per second.  When I
run 15 clients, the aggregate throughput is about 8 files per second,
much worse than running only one client.  Neither CPU nor network is
saturated.  My volume file is attached.  The servers are running on a
66-node cluster and the clients on a 15-node cluster.


We have 33x2 servers and at most 15 separate client machines, with each
server serving < 0.5 clients on average.  I cannot think of a reason
for a distributed system to behave like this.  There must be some kind
of central access point.


Although there is probably room for the GlusterFS folk to optimize...

You should consider directory write operations to involve the whole 
cluster. Creating a file is a directory write operation. Think of how 
it might have to do self-heal across the cluster, make sure the name 
is right and not already in use across the cluster, and such things.


Once you get to reads and writes for a particular file, it should be 
distributed.


Cheers,
mark





Re: [Gluster-users] is glusterfs DHT really distributed?

2009-09-28 Thread Mark Mielke

On 09/28/2009 10:51 AM, Wei Dong wrote:
Your reply makes complete sense to me.  I remember that auto-heal happens
at file reading; does that mean opening a file for read is also a global
operation?  Do you mean that there's no other way of copying 30 million
files to our 66-node glusterfs cluster for parallel processing other
than waiting for half a month?  Can I somehow disable self-heal and get
a speedup?


This turns out to be quite bad for me.


On this page:

http://www.gluster.com/community/documentation/index.php/Translators/cluster/distribute


It seems to suggest that the default for 'lookup-unhashed' is 'on'.

Perhaps try turning it 'off'?

Cheers,
mark






Mark Mielke wrote:

On 09/28/2009 10:35 AM, Wei Dong wrote:

Hi All,

I noticed a very weird phenomenon when I'm copying data (200KB image
files) to our glusterfs storage.  When I run only one client, it copies
roughly 20 files per second, and as soon as I start a second client on
another machine, the copy rate of the first client immediately degrades
to 5 files per second.  When I stop the second client, the first client
will immediately speed up to the original 20 files per second.  When I
run 15 clients, the aggregate throughput is about 8 files per second,
much worse than running only one client.  Neither CPU nor network is
saturated.  My volume file is attached.  The servers are running on a
66-node cluster and the clients on a 15-node cluster.


We have 33x2 servers and at most 15 separate client machines, with each
server serving < 0.5 clients on average.  I cannot think of a reason
for a distributed system to behave like this.  There must be some kind
of central access point.


Although there is probably room for the GlusterFS folk to optimize...

You should consider directory write operations to involve the whole 
cluster. Creating a file is a directory write operation. Think of how 
it might have to do self-heal across the cluster, make sure the name 
is right and not already in use across the cluster, and such things.


Once you get to reads and writes for a particular file, it should be 
distributed.


Cheers,
mark







--
Mark Mielke



Re: [Gluster-users] is glusterfs DHT really distributed?

2009-09-28 Thread Anand Avati
>  http://www.gluster.com/community/documentation/index.php/Translators/cluster/distribute
>
> It seems to suggest that 'lookup-unhashed' says that the default is 'on'.
>
> Perhaps try turning it 'off'?

Wei,
   There are two things we would like you to try. The first is what Mark
has just pointed out: 'option lookup-unhashed off' in distribute. The
second is 'option transport.socket.nodelay on' in each of your
protocol/client _and_ protocol/server volumes. Do let us know what
influence these changes have on your performance.
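
For concreteness, a sketch of where those two options would go (the
volume names here are placeholders in the style of the attached
volfile):

# client side: the distribute volume over the replicate pairs
volume dht
  type cluster/distribute
  option lookup-unhashed off
  subvolumes rep-0-0 rep-0-1 rep-0-2 rep-0-3
end-volume

# and in every protocol/client (and protocol/server) volume:
volume brick-0-0-0
  type protocol/client
  option transport-type tcp
  option transport.socket.nodelay on
  option remote-host c8-0-0
  option remote-port 7001
  option remote-subvolume brick0
end-volume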

Avati