Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O

2010-06-30 Thread Jeff Darcy

On 06/30/2010 03:33 PM, Brian Smith wrote:

Spoke too soon.  Same problem occurs minus all performance translators.
Debug logs on the server show

[2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk]
server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0,
ino=2159011921, gen=5488651098262601749) found conflict
(ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
[2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple]
server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for
path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
[2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk]
server-tcp: 72: CREATE (null) (0) ==>  -1 (File exists)
   
The first line almost looks like a create attempt for a file that 
already exists at the server.  The second and third lines look like *yet 
another* create attempt, failing this time before the request is even 
passed to the next translator.  This might be a good time to drag out 
the debug/trace translator, and sit it on top of brick1 to watch the 
create calls.  That will help nail down the exact sequence of events as 
the server sees them, so we don't go looking in the wrong places.  It 
might even be useful to do the same on the client side, but perhaps not 
yet.  Instructions are here:


http://www.gluster.com/community/documentation/index.php/Translators/debug/trace
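
For illustration only, here is a rough sketch of how the trace translator might
be layered on top of brick1 in the server volfile (volume names are assumptions
based on the volfiles posted in this thread, not your actual config; note that
the auth.addr option and the clients' remote-subvolume option must refer to the
new top-level name so requests actually pass through the trace translator):

volume trace
   type debug/trace
   subvolumes brick1
end-volume

volume server-tcp
   type protocol/server
   option transport-type tcp
   # auth now references the traced volume instead of brick1
   option auth.addr.trace.allow *
   subvolumes trace
end-volume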

In the meantime, to further identify which code paths are most likely 
to be relevant, it would be helpful to know a couple more things.


(1) Is each storage/posix volume using just one local filesystem, or is 
it possible that the underlying directory tree spans more than one?  
This could lead to inode-number duplication, which requires extra 
handling.  (A quick check for this is sketched below.)


(2) Is either of the server-side volumes close to being full?  This 
could result in an extra "linkfile" being created on the subvolume/server 
where we'd normally create the file, pointing to where the file actually 
ended up due to space considerations.  (A way to spot such linkfiles is 
also sketched below.)
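
For both questions, rough shell sketches follow; the /export/brick path is a
placeholder for whatever each server's storage/posix "option directory" points
at, and the file path is taken from the log lines above.

# (1) does the exported tree span more than one local filesystem?
df -h /export/brick
find /export/brick -type d -exec stat -c '%d' {} + | sort -u
# more than one distinct device number means the tree crosses a filesystem
# boundary, so inode numbers can collide

# (2) is a DHT linkfile sitting where the real file should be?
ls -l /export/brick/b/brs/Si/CHGCAR      # linkfiles show up as 0-byte ---------T entries
getfattr -d -m . -e text /export/brick/b/brs/Si/CHGCAR
# a trusted.glusterfs.dht.linkto xattr, if present, names the subvolume
# that actually holds the data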
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O

2010-06-30 Thread Brian Smith
Spoke too soon.  Same problem occurs minus all performance translators.
Debug logs on the server show

[2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk]
server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0,
ino=2159011921, gen=5488651098262601749) found conflict
(ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
[2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple]
server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for
path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
[2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk]
server-tcp: 72: CREATE (null) (0) ==> -1 (File exists)

-Brian

-- 
Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB204
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu


On Wed, 2010-06-30 at 13:06 -0400, Brian Smith wrote:
> I received these in my debug output during a run that failed:
> 
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> unexpected offset (8192 != 1062) resetting
> 
> I disabled the read-ahead translator as well as the three other
> performance translators commented out in my vol file (I'm on GigE; the
> docs say I can still reach link max anyway) and my processes appear to
> be running smoothly.  I'll go ahead and submit the bug report with
> tracing enabled as well.
> 
> -Brian
> 
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Shared files occasionally unreadable from some nodes

2010-06-30 Thread Jonathan nilsson
If I use md5sum I get two different results on the two hosts. On the
host where the file appears to be empty I get the md5sum of an empty file
(d41d8cd98f00b204e9800998ecf8427e). I have done some experiments since my last
post, and it looks like disabling the io-cache translator eliminates these
errors. I've attached the logs from the host where the file appears empty.
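
For illustration, one way to run that kind of comparison (hostnames and paths
here are placeholders, not the actual ones from my setup):

md5sum /mnt/warehouse/some/file                     # through the glusterfs mount
ssh gluster1 md5sum /export/warehouse/some/file     # directly on a backend brick
# d41d8cd98f00b204e9800998ecf8427e is the md5sum of zero bytes,
# i.e. the read through the mount came back empty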

On Wed, Jun 30, 2010 at 1:42 AM, Lakshmipathi wrote:

> Hi Jonathan nilsson,
> Could you please verify the files' integrity using md5sum instead of
> checking their size with ls?
> Please send us the log files too.
>
> --
> 
> Cheers,
> Lakshmipathi.G
> FOSS Programmer.
> - Original Message -
> From: "Jonathan nilsson" 
> To: gluster-users@gluster.org
> Sent: Thursday, June 24, 2010 9:22:29 PM
> Subject: [Gluster-users] Shared files occasionally unreadable from some
> nodes
>
> Hello all,
>
> I am new to gluster and I've been seeing some inconsistent behavior. When I
> write files to the gluster volume, about 1 in 1000 will be unreadable on one
> node. From that node I can see the file with ls, and ls reports the correct
> size. However, running cat on the file produces no output, and vim thinks it
> is full of the ^@ character. If I try to read the file from another node it
> is fine.
>
> After some Googling I've read that an ls -lR can fix similar problems, but
> it hasn't had any effect for me. Running touch on the file does restore its
> contents. I am running GlusterFS 3.0.4 on RHEL 5.4. I generated the config
> files with the volgen tool and didn't make any changes.
>
> Is this a known issue or something that could've happened if I screwed up
> the configuration?
>
> Here is my glusterfs.vol
> ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen -n warehouse --raid 1
> gluster1:/export/warehouse gluster2:/export/warehouse
> gluster3:/export/warehouse gluster4:/export/warehouse
>
> # RAID 1
> # TRANSPORT-TYPE tcp
> volume gluster4-1
>type protocol/client
>option transport-type tcp
>option remote-host gluster4
>option transport.socket.nodelay on
>option transport.remote-port 6996
>option remote-subvolume brick1
> end-volume
>
> volume gluster2-1
>type protocol/client
>option transport-type tcp
>option remote-host gluster2
>option transport.socket.nodelay on
>option transport.remote-port 6996
>option remote-subvolume brick1
> end-volume
>
> volume gluster3-1
>type protocol/client
>option transport-type tcp
>option remote-host gluster3
>option transport.socket.nodelay on
>option transport.remote-port 6996
>option remote-subvolume brick1
> end-volume
>
> volume gluster1-1
>type protocol/client
>option transport-type tcp
>option remote-host gluster1
>option transport.socket.nodelay on
>option transport.remote-port 6996
>option remote-subvolume brick1
> end-volume
>
> volume mirror-0
>type cluster/replicate
>subvolumes gluster1-1 gluster2-1
> end-volume
>
> volume mirror-1
>type cluster/replicate
>subvolumes gluster3-1 gluster4-1
> end-volume
>
> volume distribute
>type cluster/distribute
>subvolumes mirror-0 mirror-1
> end-volume
>
> volume readahead
>type performance/read-ahead
>option page-count 4
>subvolumes distribute
> end-volume
>
> volume iocache
>type performance/io-cache
>option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed
> 's/[^0-9]//g') / 5120 ))`MB
>option cache-timeout 1
>subvolumes readahead
> end-volume
>
> volume quickread
>type performance/quick-read
>option cache-timeout 1
>option max-file-size 64kB
>subvolumes iocache
> end-volume
>
> volume writebehind
>type performance/write-behind
>option cache-size 4MB
>subvolumes quickread
> end-volume
>
> volume statprefetch
>type performance/stat-prefetch
>subvolumes writebehind
> end-volume
>
> ## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen -n warehouse --raid 1
> gluster1:/export/warehouse gluster2:/export/warehouse
> gluster3:/export/warehouse gluster4:/export/warehouse
>
> volume posix1
>  type storage/posix
>  option directory /export/warehouse
> end-volume
>
> volume locks1
>type features/locks
>subvolumes posix1
> end-volume
>
> volume brick1
>type performance/io-threads
>option thread-count 8
>subvolumes locks1
> end-volume
>
> volume server-tcp
>type protocol/server
>option transport-type tcp
>option auth.addr.brick1.allow *
>option transport.socket.listen-port 6996
>option transport.socket.nodelay on
>subvolumes brick1
> end-volume
>
> and here is my glusterfsd.vol
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

Re: [Gluster-users] Web Farm Configuration

2010-06-30 Thread Dennis A. Arkhangelski

I guess Jenn just faces severe I/O lags when the farm's connection rate starts
to increase. Gluster does not handle industrial load out of the box.
Jenn, can you please briefly describe your Gluster configuration and
application specifics? Actually I need:
- your volume topology (how many, distribute/stripe/afr, stripe block
size etc.)
- performance translators that are in use, specific translator settings
- do you use FS locks?
- what's your average file size?
- can you please outline your FS access patterns (e.g. mostly read,
mostly write, access request rate estimates etc.)?

Quick and general hints are:
1. Renice the glusterfsd and glusterfs processes on all server and client
nodes; I typically use -20. This really is a must; I even modify my rc
scripts to make it persistent (a one-liner for this follows the list).
2. Check your I/O scheduler on server nodes and set it to "anticipatory"
(assuming you use Linux). It goes like this:
[r...@tifereth ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
[r...@tifereth ~]# echo "anticipatory" > /sys/block/sda/queue/scheduler
[r...@tifereth ~]# cat /sys/block/sda/queue/scheduler
noop [anticipatory] deadline cfq
[r...@tifereth ~]#
3. Set your io-cache cache-size and quick-read max-file-size values to a
reasonable minimum (the glusterfs process may crash randomly under load when
these are set too high; this seems to be version-independent). A conservative
example also follows the list.
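
Rough sketches for hints 1 and 3 (treat these as starting points, not tuned
recommendations; the volume names follow the volgen-style layout posted
elsewhere on this list):

# hint 1: bump the priority of all gluster processes on a node
renice -20 -p $(pidof glusterfsd) $(pidof glusterfs)

# hint 3: conservative io-cache / quick-read settings in the client volfile
volume iocache
   type performance/io-cache
   option cache-size 64MB       # fixed, modest size instead of a fraction of RAM
   option cache-timeout 1
   subvolumes readahead
end-volume

volume quickread
   type performance/quick-read
   option cache-timeout 1
   option max-file-size 16kB    # smaller than the 64kB seen in volgen-generated configs
   subvolumes iocache
end-volume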

On 30.06.10 19:05, Emmanuel Noobadmin wrote:
> I'll probably be using gluster for a web farm later this year, so would
> you mind sharing some more stats on the load you were handling (and
> what kind of servers) when it crashed and burned?
> 
> 
> On 6/30/10, Jenn Fountain  wrote:
>> I am researching the best solution for file replication (images, htmls, etc)
>> for our web farm app.   Originally, the current production was configured to
>> read from the gluster mount on all 4 servers in the farm.  However, when the
>> load became high, the servers crashed and burned so I had to remove gluster.
>>I realize that our configuration may not have been optimal so I am trying
>> to find the best configuration with gluster.  Does anyone on the list have
>> gluster configured in a webfarm and how do you have it configured?   Thanks
>> for any info!
>>
>> -Jennifer
>>
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> .
> 

-- 
Regards,
Dennis Arkhangelski
Technical Manager
WHB Networks LLC.
http://www.webhostingbuzz.com/

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O

2010-06-30 Thread Brian Smith
I received these in my debug output during a run that failed:

[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
unexpected offset (8192 != 1062) resetting

I disabled the read-ahead translator as well as the three other
performance translators commented out in my vol file (I'm on GigE; the
docs say I can still reach link max anyway) and my processes appear to
be running smoothly.  I'll go ahead and submit the bug report with
tracing enabled as well.

-Brian


-- 
Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB204
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu


On Tue, 2010-06-29 at 21:45 -0700, Harshavardhana wrote:
> On 06/29/2010 04:36 PM, Brian Smith wrote:
> > It's obviously been a while since I brought this issue up, but it has
> > cropped up again for us.  We're now on 3.0.3 and I've included my
> > glusterfs*.vol files below.  We end up with file i/o errors like the
> > ones below:
> >
> > forrtl: File exists
> > forrtl: severe (10): cannot overwrite existing file, unit 18,
> > file /work/b/brs/vdWSi/CHGCAR
> >
> > Even if the file existed, it shouldn't really be a problem.  Other file
> > systems work just fine.  I'll get some more verbose logging going and
> > share my output.  glusterfsd.vol is the same in the referenced e-mails
> > below.
> >
> > Thanks in advance,
> > -Brian
> >
> >
> Hi Brian,
> 
>   We would need debug or trace logs from the client side. This
> seems to be a race, and I assume you are using the "vasp" application, which
> creates the CHGCAR, DOSCAR, etc. files.
> Since we don't have vasp in house, would you mind opening a bug at
> http://bugs.gluster.com/
> and attaching "trace" logs from the client side?
> 
> Regards
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Craig Box
> OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage
> nor EBS can be mounted on more than one instance simultaneously.
> Therefore, you'd need something to provide a shared-disk abstraction
> within an AZ.  DRBD mode can do this, and I think it's even reentrant so
> that the devices created this way can themselves be used as components
> for the inter-AZ-replication devices, but active/active mode isn't
> recommended and I don't think you can connect more than two nodes this
> way.

What I am doing is using DRBD for the shared disk between AZs, which (with
OCFS2) then gives me a standard POSIX file system that I can share
inside the AZ with GlusterFS.  A bit of a duct-tape job perhaps, but it
seems like it will work.  The proof will be in the testing, for which I am
just now building instances.
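
For illustration, the per-node server volfile could look roughly like the
volgen-style files posted elsewhere on this list, just pointing at the
OCFS2-on-DRBD mount (the directory path and volume names below are
placeholders, not my actual config):

volume posix1
   type storage/posix
   option directory /mnt/ocfs2/gluster    # OCFS2 mount backed by DRBD (placeholder)
end-volume

volume locks1
   type features/locks
   subvolumes posix1
end-volume

volume brick1
   type performance/io-threads
   option thread-count 8
   subvolumes locks1
end-volume

volume server-tcp
   type protocol/server
   option transport-type tcp
   option auth.addr.brick1.allow *
   subvolumes brick1
end-volume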
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Web Farm Configuration

2010-06-30 Thread Emmanuel Noobadmin
I'll probably be using gluster for a web farm later this year, so would
you mind sharing some more stats on the load you were handling (and
what kind of servers) when it crashed and burned?


On 6/30/10, Jenn Fountain  wrote:
> I am researching the best solution for file replication (images, htmls, etc)
> for our web farm app.   Originally, the current production was configured to
> read from the gluster mount on all 4 servers in the farm.  However, when the
> load became high, the servers crashed and burned so I had to remove gluster.
>I realize that our configuration may not have been optimal so I am trying
> to find the best configuration with gluster.  Does anyone on the list have
> gluster configured in a webfarm and how do you have it configured?   Thanks
> for any info!
>
> -Jennifer
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Jeff Darcy
On 06/30/2010 10:22 AM, Craig Box wrote:
> OK, so this brings me to Plan B.  (Feel free to suggest a plan C if you can.)
> 
> I want to have six nodes, three in each availability zone, replicate a
> Mercurial repository.  Here's some art:
> 
> [gluster c/s] [gluster c/s] | [gluster c/s] [gluster c/s]
>                             |
>         [gluster s]         |         [gluster s]
>          [OCFS 2]           |          [OCFS 2]
>          [ DRBD ] --------------------- [ DRBD ]
> 
> DRBD doing the cross-AZ replication, and a three node GlusterFS
> cluster inside each AZ.  That way, any one machine going down should
> still mean all the rest of the nodes can access the files.
> 
> Sound believable?

OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage
nor EBS can be mounted on more than one instance simultaneously.
Therefore, you'd need something to provide a shared-disk abstraction
within an AZ.  DRBD mode can do this, and I think it's even reentrant so
that the devices created this way can themselves be used as components
for the inter-AZ-replication devices, but active/active mode isn't
recommended and I don't think you can connect more than two nodes this
way.  What's really needed, and I'm slightly surprised doesn't already
exist, is a DRBD proxy that can be connected as a destination by several
local DRBD sources, and then preserve request order even across devices
as it becomes a DRBD source and ships those requests to another proxy in
another AZ.  Linbit's proxy doesn't seem to be designed for that
particular purpose.  The considerations for dm-replicator are
essentially the same BTW.

An async/long-distance replication translator has certainly been a
frequent topic of discussion between me, the Gluster folks, and others.
 I have plans to shoot for full N-way active/active replication, but
with that ambition comes complexity and we'll probably see simpler forms
(e.g. two-way active/passive) much earlier.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Craig Box
OK, so this brings me to Plan B.  (Feel free to suggest a plan C if you can.)

I want to have six nodes, three in each availability zone, replicate a
Mercurial repository.  Here's some art:

[gluster c/s] [gluster c/s] | [gluster c/s] [gluster c/s]
                            |
        [gluster s]         |         [gluster s]
         [OCFS 2]           |          [OCFS 2]
         [ DRBD ] --------------------- [ DRBD ]

DRBD doing the cross-AZ replication, and a three node GlusterFS
cluster inside each AZ.  That way, any one machine going down should
still mean all the rest of the nodes can access the files.

Sound believable?

Craig

On Tue, Jun 29, 2010 at 5:16 PM, Count Zero  wrote:
> My short (and probably disappointing) answer is that despite all my attempts, 
> weeks spent trying to research and improve the performance, and asking here on 
> the mailing lists, I have failed to make it work over WAN, and the 
> authoritative answer was that "WAN is in the works".
>
> So for now, until WAN is officially supported, keep it working within the 
> same zone, and use some other replication method to synchronize the two zones.
>
>
>
> On Jun 29, 2010, at 7:12 PM, Craig Box wrote:
>
>> Hi all,
>>
>> Spent the day reading the docs, blog posts, this mailing list, and
>> lurking on IRC, but still have a few questions to ask.
>>
>> My goal is to implement a cross-availability-zone file system in
>> Amazon EC2, and ensure that even if one server goes down, or is
>> rebooted, all clients can continue, reading from/writing to a
>> secondary server.
>>
>> The primary purpose is to share some data files for running a web site
>> for an open source project - a Mercurial repository and some shared
>> data, such as wiki images - but the main code/images/CSS etc for the
>> site will be stored on each instance and managed by version control.
>>
>> As we have 150GB ephemeral storage (aka instance store, as opposed to
>> EBS) free on each instance, I thought it might be good if we were to
>> use that as the POSIX backend for Gluster, and have a complete copy of
>> the Mercurial repository on each system, with each client using its
>> local brick as the read subvolume for speed.  That way, you don't need
>> to go to the network for reads, which ought to be far more common than
>> writes.
>>
>> We want to have the files available to seven servers, four in one AZ
>> and three in another.
>>
>> I think it best if we maximise client performance, rather than
>> replication speed; if one of our nodes is a few seconds behind, it's
>> not the end of the world, but if it consistently takes a few seconds
>> on every file write, that would be irritating.
>>
>> Some questions which I hope someone can answer:
>>
>> 1. Somewhat obviously, when we turn on replication and introduce a
>> second server, write speed to the volume drops drastically.  If we use
>> client-side replication, we can have redundancy in servers.  Does this
>> mean that the GlusterFS client blocks, waiting for the write to complete on
>> every server?  If we changed to server-side replication, would this
>> push the replication overhead into the background?
>>
>> 2. If we were to use server-side replication, should we use the
>> write-behind translator in the server stack?
>>
>> 3. I was originally using 3.0.2 packaged with Ubuntu 10.04, and have
>> tried upgrading to 3.0.5rc7 (as suggested on this list) for better
>> performance with the quick-read translator, and other fixes.  However,
>> this actually seemed to make write performance *worse*!  Should this
>> be expected?
>>
>> (Our write test is totally scientific *cough*: we cp -a a directory of
>> files onto the mounted volume.)
>>
>> 4. Should I expect a different performance pattern using the instance
>> storage, rather than an EBS volume?  I found this post helpful -
>> http://www.sirgroane.net/2010/03/tuning-glusterfs-for-apache-on-ec2/ -
>> but it talks more about reading files than writing them, and it writes
>> off some translators as not useful because of the way EBS works.
>>
>> 5. Is cluster/replicate even the right answer?  Could we do something
>> with cluster/distribute - is this, in effect, a RAID 10?  It doesn't
>> seem that replicate could possibly scale up to the number of nodes you
>> hear about other people using GlusterFS with.
>>
>> 6. Could we do something crafty where you read directly from the POSIX
>> volume but you do all your writes through GlusterFS?  I see it's
>> unsupported, but I guess that is just because you might get old data
>> by reading the disk, rather than the client.
>>
>> Any advice that anyone can provide is welcome, and my thanks in advance!
>>
>> Regards
>> Craig
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

[Gluster-users] Web Farm Configuration

2010-06-30 Thread Jenn Fountain
I am researching the best solution for file replication (images, htmls, etc) 
for our web farm app.   Originally, the current production was configured to 
read from the gluster mount on all 4 servers in the farm.  However, when the 
load became high, the servers crashed and burned so I had to remove gluster.
I realize that our configuration may not have been optimal so I am trying to 
find the best configuration with gluster.  Does anyone on the list have gluster 
configured in a webfarm and how do you have it configured?   Thanks for any 
info!

-Jennifer





___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?

2010-06-30 Thread Jeff Darcy
On 06/30/2010 07:53 AM, Emmanuel Noobadmin wrote:
> On Wed, Jun 30, 2010 at 7:25 PM, Jeff Darcy  wrote:
> 
>> Another option, since you do have a fast interconnect, would be
>> to place all of the permanent storage on the data nodes and use storage
>> on the app nodes only for caching (as we had discussed).  Replicate
>> pair-wise or diagonally between data nodes, distribute across the
>> replica sets, and you'd have a pretty good solution to handle future
>> expansion.
> 
> I think I'll probably go with this since you mention the replicate
> over distribute doesn't work that well and I like to keep the app and
> storage separate. But might change my mind if testing indicates the
> performance level is not acceptable.
> 
> As for fast interconnect, does that imply 10GbE/FC kind of speeds or
> would normal GbE work?

Hm, it appears I was confusing this thread with another one where the
person had mentioned using DDR IB.  By "fast interconnect" (having
worked with interconnects up to 48Gb/s/node) I usually mean at least
10GbE and preferably some form of IB.  Accessing all storage over a GbE
network can work, but often requires more careful tuning and selection
of equipment to get adequate performance.  A lot depends on how much you
can benefit from things like read-ahead and io-cache, or how much data
you're willing to leave in write-behind buffers.  It might well be the
case that replicate over nufa/distribute will work better for your
environment after all despite the issues with app-node "crosstalk" or
the "inversion" of replicate vs. distribute.  I think it's time to
experiment with some of the options and see how they do.
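
As a starting point only, a GbE-oriented client-side stack might look something
like this (the my_nufa subvolume name is taken from earlier in this thread; the
cache sizes are guesses to be tuned against your workload, not recommendations):

volume readahead
   type performance/read-ahead
   option page-count 4
   subvolumes my_nufa
end-volume

volume iocache
   type performance/io-cache
   option cache-size 128MB
   option cache-timeout 1
   subvolumes readahead
end-volume

volume writebehind
   type performance/write-behind
   option cache-size 4MB
   subvolumes iocache
end-volume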
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?

2010-06-30 Thread Emmanuel Noobadmin
On Wed, Jun 30, 2010 at 7:25 PM, Jeff Darcy  wrote:

> Another option, since you do have a fast interconnect, would be
> to place all of the permanent storage on the data nodes and use storage
> on the app nodes only for caching (as we had discussed).  Replicate
> pair-wise or diagonally between data nodes, distribute across the
> replica sets, and you'd have a pretty good solution to handle future
> expansion.

I think I'll probably go with this, since you mention that replicate
over distribute doesn't work that well and I like to keep the app and
storage layers separate. But I might change my mind if testing indicates
the performance level is not acceptable.

As for fast interconnect, does that imply 10GbE/FC kind of speeds or
would normal GbE work?



On 6/30/10, Jeff Darcy  wrote:
> On 06/29/2010 11:31 PM, Emmanuel Noobadmin wrote:
>> With the nufa volumes, a file is only written to one of the volumes
>> listed in its definition.
>> If the volume is a replicate volume, then the file is replicated on
>> each of the volumes listed in its definition.
>>
>> e.g in this case
>> volume my_nufa
>>   type cluster/nufa
>>   option local-volume-name rep1
>>   subvolumes rep0 rep1 rep2
>> end-volume
>>
>> A file is only found in one of rep0 rep1 or rep2. If it was on rep2,
>> then it would be inaccessible if rep2 fails such as network failure
>> cutting rep2 off.
>
> Yes, but rep2 as a whole could only fail if all of its component volumes
> - one on an app node and one on a data node - failed simultaneously.
> That's about as good protection as you're going to get without
> increasing your replication level (therefore decreasing both performance
> and effective storage utilization).
>
>> Then when I add a rep3, gluster should automatically start putting new
>> files onto it.
>>
>> At this point though, it seems that if I use nufa, I would have an
>> issue if I add a purely storage only rep3 instead of an app+storage
>> node. None of the servers will use it until their local volume reaches
>> max capacity right? :D
>>
>> So if I preferred to have the load spread out more evenly, I should
>> then be using cluster/distribute?
>
> If you want even distribution across different or variable numbers of
> app/data nodes, then cluster/distribute would be the way to go.  For
> example, you could create a distribute set across the storage nodes and
> a nufa set across the app nodes, and then replicate between the two
> (each app node preferring the local member of the nufa set).  You'd lose
> the ability to suppress app-node-to-app-node communication with
> different read-subvolume assignments, though, and in my experience
> replicate over distribute doesn't work quite as well as the other way
> around.  Another option, since you do have a fast interconnect, would be
> to place all of the permanent storage on the data nodes and use storage
> on the app nodes only for caching (as we had discussed).  Replicate
> pair-wise or diagonally between data nodes, distribute across the
> replica sets, and you'd have a pretty good solution to handle future
> expansion.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?

2010-06-30 Thread Jeff Darcy
On 06/29/2010 11:31 PM, Emmanuel Noobadmin wrote:
> With the nufa volumes, a file is only written to one of the volumes
> listed in its definition.
> If the volume is a replicate volume, then the file is replicated on
> each of the volumes listed in its definition.
> 
> e.g in this case
> volume my_nufa
>   type cluster/nufa
>   option local-volume-name rep1
>   subvolumes rep0 rep1 rep2
> end-volume
> 
> A file is only found in one of rep0 rep1 or rep2. If it was on rep2,
> then it would be inaccessible if rep2 fails such as network failure
> cutting rep2 off.

Yes, but rep2 as a whole could only fail if all of its component volumes
- one on an app node and one on a data node - failed simultaneously.
That's about as good protection as you're going to get without
increasing your replication level (therefore decreasing both performance
and effective storage utilization).

> Then when I add a rep3, gluster should automatically start putting new
> files onto it.
> 
> At this point though, it seems that if I use nufa, I would have an
> issue if I add a purely storage only rep3 instead of an app+storage
> node. None of the servers will use it until their local volume reaches
> max capacity right? :D
> 
> So if I preferred to have the load spread out more evenly, I should
> then be using cluster/distribute?

If you want even distribution across different or variable numbers of
app/data nodes, then cluster/distribute would be the way to go.  For
example, you could create a distribute set across the storage nodes and
a nufa set across the app nodes, and then replicate between the two
(each app node preferring the local member of the nufa set).  You'd lose
the ability to suppress app-node-to-app-node communication with
different read-subvolume assignments, though, and in my experience
replicate over distribute doesn't work quite as well as the other way
around.  Another option, since you do have a fast interconnect, would be
to place all of the permanent storage on the data nodes and use storage
on the app nodes only for caching (as we had discussed).  Replicate
pair-wise or diagonally between data nodes, distribute across the
replica sets, and you'd have a pretty good solution to handle future
expansion.
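
For illustration, the "replicate pair-wise between data nodes, distribute
across the replica sets" layout might be sketched like this (data1 through
data4 are placeholders for protocol/client volumes pointing at the data
nodes' bricks):

volume rep-a
   type cluster/replicate
   subvolumes data1 data2
end-volume

volume rep-b
   type cluster/replicate
   subvolumes data3 data4
end-volume

volume dist
   type cluster/distribute
   subvolumes rep-a rep-b
end-volume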
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] transport.remote-port is changing on volume restart

2010-06-30 Thread Rafael Pappert
Hello List,

I'm evaluating the Gluster platform as a "static file backend" for a webserver
farm. First of all, I have to say thank you to the guys at Gluster; you did an
awesome job.

But there is one really annoying thing: after each restart of a volume in the
volume manager, I have to change the transport.remote-port in the "client.vol"
and remount the volume on all clients.

Is there a better way to do this, or is this a misconfiguration on my side?
My client.vol looks like this:

volume 192.168.1.167-1
type protocol/client
option transport-type tcp
option remote-host 192.168.1.167
option transport.socket.nodelay on
option transport.remote-port 10006
option remote-subvolume brick1
end-volume

volume 192.168.1.168-1
type protocol/client
option transport-type tcp
option remote-host 192.168.1.168
option transport.socket.nodelay on
option transport.remote-port 10006
option remote-subvolume brick1
end-volume

volume mirror-0
type cluster/replicate
subvolumes 192.168.1.168-1 192.168.1.167-1
end-volume

volume readahead
type performance/read-ahead
option page-count 4
subvolumes mirror-0
end-volume

volume iocache
type performance/io-cache
option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 
's/[^0-9]//g') / 5120 ))`MB
option cache-timeout 1
subvolumes readahead
end-volume

volume quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes iocache
end-volume

volume writebehind
type performance/write-behind
option cache-size 4MB
subvolumes quickread
end-volume

volume statprefetch
type performance/stat-prefetch
subvolumes writebehind
end-volume

Thank you in advance,
Rafael.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] debootstrap on glusterfs

2010-06-30 Thread Kalin Bogatzevski
Hi,

I am trying to install a debootstrap lenny on a glusterfs export.
GlusterFS is compiled from source (git 3.1.0).
The volumes are created by volgen.
When I issue the command, an error occurres:

sh2:/# debootstrap lenny /zfs
/usr/share/debootstrap/functions: line 1047: /zfs/test-dev-null: No such device 
or address
E: Cannot install into target '/zfs' mounted with noexec or nodev

Modifying my /etc/fstab makes no difference, as the exec and dev options are
probably not supported:

sh2:/# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/etc/glusterfs/glusterfs.vol on /zfs type fuse.glusterfs 
(rw,allow_other,default_permissions,max_read=131072)


Has anyone done this before? Please give me any hints that would help me decide
whether to continue with these tests.

Thanks!
Kalin.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users