Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O
On 06/30/2010 03:33 PM, Brian Smith wrote:
> Spoke too soon. Same problem occurs minus all performance translators. Debug logs on the server show
>
> [2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk] server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0, ino=2159011921, gen=5488651098262601749) found conflict (ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
> [2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple] server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
> [2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk] server-tcp: 72: CREATE (null) (0) ==> -1 (File exists)

The first line almost looks like a create attempt for a file that already exists at the server. The second and third lines look like *yet another* create attempt, failing this time before the request is even passed to the next translator.

This might be a good time to drag out the debug/trace translator and sit it on top of brick1 to watch the create calls. That will help nail down the exact sequence of events as the server sees them, so we don't go looking in the wrong places. It might even be useful to do the same on the client side, but perhaps not yet. Instructions are here:

http://www.gluster.com/community/documentation/index.php/Translators/debug/trace

In the meantime, to further identify which code paths are most likely to be relevant, it would be helpful to know a couple more things:

(1) Is each storage/posix volume using just one local filesystem, or is it possible that the underlying directory tree spans more than one? This could lead to inode-number duplication, which requires extra handling.

(2) Is either of the server-side volumes close to being full? This could result in creating an extra "linkfile" on the subvolume/server where we'd normally create the file, pointing to where we really created it due to space considerations.
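One way to set that up: layer a debug/trace volume directly above brick1 in the server volfile and export the traced volume instead. A minimal sketch, assuming the volgen-style glusterfsd.vol shown elsewhere in this thread (the names brick1 and server-tcp, and the auth option, would need to match your actual file):

```
volume trace1
   type debug/trace
   subvolumes brick1
end-volume

volume server-tcp
   type protocol/server
   option transport-type tcp
   option auth.addr.trace1.allow *
   subvolumes trace1
end-volume
```

Clients that mount with "option remote-subvolume brick1" would then need "remote-subvolume trace1" (or keep the exported name stable by renaming volumes the other way around). Trace logs every fop to the server log, so expect it to be verbose.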
___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O
Spoke too soon. Same problem occurs minus all performance translators. Debug logs on the server show

[2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk] server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0, ino=2159011921, gen=5488651098262601749) found conflict (ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
[2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple] server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
[2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk] server-tcp: 72: CREATE (null) (0) ==> -1 (File exists)

-Brian

--
Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB204
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu

On Wed, 2010-06-30 at 13:06 -0400, Brian Smith wrote:
> I received these in my debug output during a run that failed:
>
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
> [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
>
> I disabled the read-ahead translator as well as the three other performance translators commented out in my vol file (I'm on GigE; the docs say I can still reach link max anyway) and my processes appear to be running smoothly. I'll go ahead and submit the bug report with tracing enabled as well.
>
> -Brian
Re: [Gluster-users] Shared files occasionally unreadable from some nodes
If I use md5sum I get two different results on the two hosts. On the host where the file appears to be empty, I got the md5sum of an empty file (d41d8cd98f00b204e9800998ecf8427e). I did some experiments since my last post, and it looks like disabling the iocache translator eliminates these errors. I've attached the logs from the host where the file appears empty.

On Wed, Jun 30, 2010 at 1:42 AM, Lakshmipathi wrote:
> Hi Jonathan Nilsson,
> Could you please verify the files' integrity using md5sum instead of checking size with the ls command? Please send us the log files too.
>
> --
> Cheers,
> Lakshmipathi.G
> FOSS Programmer.
>
> ----- Original Message -----
> From: "Jonathan Nilsson"
> To: gluster-users@gluster.org
> Sent: Thursday, June 24, 2010 9:22:29 PM
> Subject: [Gluster-users] Shared files occasionally unreadable from some nodes
>
> Hello all,
>
> I am new to Gluster and I've been seeing some inconsistent behavior. When I write files to the gluster volume, about 1 in 1000 will be unreadable on one node. From that node I can see the file with ls, and ls does report the correct size. However, running cat on the file produces no output, and vim thinks it is full of the ^@ character. If I try to read the file from another node, it is fine.
>
> After some Googling I've read that an ls -lR can fix similar problems, but it hasn't had any effect for me. Running touch on the file does restore its contents. I am running GlusterFS 3.0.4 on RHEL 5.4. I generated the config files with the volgen tool and didn't make any changes.
>
> Is this a known issue, or something that could've happened if I screwed up the configuration?
>
> Here is my glusterfs.vol:
>
> ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen -n warehouse --raid 1 gluster1:/export/warehouse gluster2:/export/warehouse gluster3:/export/warehouse gluster4:/export/warehouse
>
> # RAID 1
> # TRANSPORT-TYPE tcp
> volume gluster4-1
>    type protocol/client
>    option transport-type tcp
>    option remote-host gluster4
>    option transport.socket.nodelay on
>    option transport.remote-port 6996
>    option remote-subvolume brick1
> end-volume
>
> volume gluster2-1
>    type protocol/client
>    option transport-type tcp
>    option remote-host gluster2
>    option transport.socket.nodelay on
>    option transport.remote-port 6996
>    option remote-subvolume brick1
> end-volume
>
> volume gluster3-1
>    type protocol/client
>    option transport-type tcp
>    option remote-host gluster3
>    option transport.socket.nodelay on
>    option transport.remote-port 6996
>    option remote-subvolume brick1
> end-volume
>
> volume gluster1-1
>    type protocol/client
>    option transport-type tcp
>    option remote-host gluster1
>    option transport.socket.nodelay on
>    option transport.remote-port 6996
>    option remote-subvolume brick1
> end-volume
>
> volume mirror-0
>    type cluster/replicate
>    subvolumes gluster1-1 gluster2-1
> end-volume
>
> volume mirror-1
>    type cluster/replicate
>    subvolumes gluster3-1 gluster4-1
> end-volume
>
> volume distribute
>    type cluster/distribute
>    subvolumes mirror-0 mirror-1
> end-volume
>
> volume readahead
>    type performance/read-ahead
>    option page-count 4
>    subvolumes distribute
> end-volume
>
> volume iocache
>    type performance/io-cache
>    option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
>    option cache-timeout 1
>    subvolumes readahead
> end-volume
>
> volume quickread
>    type performance/quick-read
>    option cache-timeout 1
>    option max-file-size 64kB
>    subvolumes iocache
> end-volume
>
> volume writebehind
>    type performance/write-behind
>    option cache-size 4MB
>    subvolumes quickread
> end-volume
>
> volume statprefetch
>    type performance/stat-prefetch
>    subvolumes writebehind
> end-volume
>
> and here is my glusterfsd.vol:
>
> ## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen -n warehouse --raid 1 gluster1:/export/warehouse gluster2:/export/warehouse gluster3:/export/warehouse gluster4:/export/warehouse
>
> volume posix1
>    type storage/posix
>    option directory /export/warehouse
> end-volume
>
> volume locks1
>    type features/locks
>    subvolumes posix1
> end-volume
>
> volume brick1
>    type performance/io-threads
>    option thread-count 8
>    subvolumes locks1
> end-volume
>
> volume server-tcp
>    type protocol/server
>    option transport-type tcp
>    option auth.addr.brick1.allow *
>    option transport.socket.listen-port 6996
>    option transport.socket.nodelay on
>    subvolumes brick1
> end-volume
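Since the poster found that disabling io-cache eliminated the empty-file reads, the corresponding change to the volgen-generated glusterfs.vol would be to take iocache out of the stack by re-pointing quick-read at read-ahead. A sketch using the volume names from that file:

```
# io-cache removed from the stack: quickread now sits
# directly on top of readahead instead of iocache
volume quickread
   type performance/quick-read
   option cache-timeout 1
   option max-file-size 64kB
   subvolumes readahead
end-volume
```

The iocache volume definition itself can then be deleted; a volume that no other volume references should simply never be loaded into the translator graph.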
Re: [Gluster-users] Web Farm Configuration
I guess Jenn just faces severe I/O lags when the farm connection rate starts to increase. Gluster does not handle industrial load out of the box.

Jenn, can you please briefly describe your Gluster configuration and application specifics? Specifically, I need:

- your volume topology (how many volumes, distribute/stripe/afr, stripe block size, etc.)
- performance translators that are in use, and any specific translator settings
- do you use FS locks?
- what's your average file size?
- can you please outline your FS access patterns (e.g. mostly read, mostly write, access request rate estimates, etc.)?

Quick and general hints:

1. Renice the glusterfsd and glusterfs processes on all server and client nodes; I typically use -20. This is a must, indeed; I even modify my rc scripts to reflect it.

2. Check your I/O scheduler on the server nodes and set it to "anticipatory" (assuming you use Linux). It goes like this:

[r...@tifereth ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
[r...@tifereth ~]# echo "anticipatory" > /sys/block/sda/queue/scheduler
[r...@tifereth ~]# cat /sys/block/sda/queue/scheduler
noop [anticipatory] deadline cfq

3. Set your io-cache cache size and quick-read max file size to a reasonable minimum (the glusterfs process may crash randomly under load when these are set too high; this seems to be version-independent).

On 30.06.10 19:05, Emmanuel Noobadmin wrote:
> I'll probably be using gluster for a web farm later this year, so would you mind sharing a bit more stats on the load you were handling (and what kind of servers) when it crashed and burned?
>
> On 6/30/10, Jenn Fountain wrote:
>> I am researching the best solution for file replication (images, htmls, etc) for our web farm app. Originally, the current production was configured to read from the gluster mount on all 4 servers in the farm. However, when the load became high, the servers crashed and burned so I had to remove gluster. I realize that our configuration may not have been optimal so I am trying to find the best configuration with gluster. Does anyone on the list have gluster configured in a webfarm and how do you have it configured? Thanks for any info!
>>
>> -Jennifer

--
Regards,
Dennis Arkhangelski
Technical Manager
WHB Networks LLC.
http://www.webhostingbuzz.com/
Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O
I received these in my debug output during a run that failed:

[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting
[2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead: unexpected offset (8192 != 1062) resetting

I disabled the read-ahead translator as well as the three other performance translators commented out in my vol file (I'm on GigE; the docs say I can still reach link max anyway) and my processes appear to be running smoothly. I'll go ahead and submit the bug report with tracing enabled as well.

-Brian

--
Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB204
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu

On Tue, 2010-06-29 at 21:45 -0700, Harshavardhana wrote:
> On 06/29/2010 04:36 PM, Brian Smith wrote:
>> It's obviously been a while since I brought this issue up, but it has cropped up again for us. We're now on 3.0.3 and I've included my glusterfs*.vol files below. We end up with file I/O errors like the ones below:
>>
>> forrtl: File exists
>> forrtl: severe (10): cannot overwrite existing file, unit 18, file /work/b/brs/vdWSi/CHGCAR
>>
>> Even if the file existed, it shouldn't really be a problem. Other file systems work just fine. I'll get some more verbose logging going and share my output. glusterfsd.vol is the same as in the referenced e-mails below.
>>
>> Thanks in advance,
>> -Brian
>
> Hi Brian,
>
> We would need debug or trace logs from the client side. This seems to be a race, and I assume you are using the "vasp" application, which creates the CHGCAR, DOSCAR, etc. files. Since we don't have vasp in house, would you mind opening a bug at http://bugs.gluster.com/ and attaching "trace" logs from the client side?
>
> Regards
Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment
> OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage nor EBS can be mounted on more than one instance simultaneously. Therefore, you'd need something to provide a shared-disk abstraction within an AZ. DRBD can do this, and I think it's even reentrant so that the devices created this way can themselves be used as components for the inter-AZ-replication devices, but active/active mode isn't recommended and I don't think you can connect more than two nodes this way.

What I am doing is using DRBD for the shared disk between AZs, which (with OCFS2) then gives me a standard POSIX file system, which I can share inside the AZ with GlusterFS. A bit of a duct-tape job perhaps, but it seems like it will work. The proof will be in the testing; I am just building instances for that now.
Re: [Gluster-users] Web Farm Configuration
I'll probably be using gluster for a web farm later this year, so would you mind sharing a bit more stats on the load you were handling (and what kind of servers) when it crashed and burned?

On 6/30/10, Jenn Fountain wrote:
> I am researching the best solution for file replication (images, htmls, etc) for our web farm app. Originally, the current production was configured to read from the gluster mount on all 4 servers in the farm. However, when the load became high, the servers crashed and burned so I had to remove gluster. I realize that our configuration may not have been optimal so I am trying to find the best configuration with gluster. Does anyone on the list have gluster configured in a webfarm and how do you have it configured? Thanks for any info!
>
> -Jennifer
Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment
On 06/30/2010 10:22 AM, Craig Box wrote:
> OK, so this brings me to Plan B. (Feel free to suggest a plan C if you can.)
>
> I want to have six nodes, three in each availability zone, replicating a Mercurial repository. Here's some art:
>
> [gluster c/s] [gluster c/s]  |  [gluster c/s] [gluster c/s]
>                              |
>        [gluster s]           |        [gluster s]
>        [OCFS 2]              |        [OCFS 2]
>        [ DRBD ] --------------------- [ DRBD ]
>
> DRBD doing the cross-AZ replication, and a three-node GlusterFS cluster inside each AZ. That way, any one machine going down should still mean all the rest of the nodes can access the files.
>
> Sound believable?

OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage nor EBS can be mounted on more than one instance simultaneously. Therefore, you'd need something to provide a shared-disk abstraction within an AZ. DRBD can do this, and I think it's even reentrant so that the devices created this way can themselves be used as components for the inter-AZ-replication devices, but active/active mode isn't recommended and I don't think you can connect more than two nodes this way.

What's really needed, and I'm slightly surprised doesn't already exist, is a DRBD proxy that can be connected as a destination by several local DRBD sources, and then preserve request order even across devices as it becomes a DRBD source and ships those requests to another proxy in another AZ. Linbit's proxy doesn't seem to be designed for that particular purpose. The considerations for dm-replicator are essentially the same, BTW.

An async/long-distance replication translator has certainly been a frequent topic of discussion between me, the Gluster folks, and others. I have plans to shoot for full N-way active/active replication, but with that ambition comes complexity, and we'll probably see simpler forms (e.g. two-way active/passive) much earlier.
Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment
OK, so this brings me to Plan B. (Feel free to suggest a plan C if you can.)

I want to have six nodes, three in each availability zone, replicating a Mercurial repository. Here's some art:

[gluster c/s] [gluster c/s]  |  [gluster c/s] [gluster c/s]
                             |
       [gluster s]           |        [gluster s]
       [OCFS 2]              |        [OCFS 2]
       [ DRBD ] --------------------- [ DRBD ]

DRBD doing the cross-AZ replication, and a three-node GlusterFS cluster inside each AZ. That way, any one machine going down should still mean all the rest of the nodes can access the files.

Sound believable?

Craig

On Tue, Jun 29, 2010 at 5:16 PM, Count Zero wrote:
> My short (and probably disappointing) answer is that with all my attempts, and weeks trying to research and improve the performance, and asking here on the mailing lists, I have both failed to make it work over WAN, and the authoritative answers were that "WAN is in the works".
>
> So for now, until WAN is officially supported, keep it working within the same zone, and use some other replication method to synchronize the two zones.
>
> On Jun 29, 2010, at 7:12 PM, Craig Box wrote:
>
>> Hi all,
>>
>> Spent the day reading the docs, blog posts, this mailing list, and lurking on IRC, but still have a few questions to ask.
>>
>> My goal is to implement a cross-availability-zone file system in Amazon EC2, and ensure that even if one server goes down, or is rebooted, all clients can continue, reading from/writing to a secondary server.
>>
>> The primary purpose is to share some data files for running a web site for an open source project - a Mercurial repository and some shared data, such as wiki images - but the main code/images/CSS etc. for the site will be stored on each instance and managed by version control.
>>
>> As we have 150GB ephemeral storage (aka instance store, as opposed to EBS) free on each instance, I thought it might be good if we were to use that as the POSIX backend for Gluster, and have a complete copy of the Mercurial repository on each system, with each client using its local brick as the read subvolume for speed. That way, you don't need to go to the network for reads, which ought to be far more common than writes.
>>
>> We want to have the files available to seven servers, four in one AZ and three in another.
>>
>> I think it best if we maximise client performance rather than replication speed; if one of our nodes is a few seconds behind, it's not the end of the world, but if it consistently takes a few seconds on every file write, that would be irritating.
>>
>> Some questions which I hope someone can answer:
>>
>> 1. Somewhat obviously, when we turn on replication and introduce a second server, write speed to the volume drops drastically. If we use client-side replication, we can have redundancy in servers. Does this mean that the GlusterFS client blocks, waiting for the client to write to every server? If we changed to server-side replication, would this background the replication overhead?
>>
>> 2. If we were to use server-side replication, should we use the write-behind translator in the server stack?
>>
>> 3. I was originally using 3.0.2 packaged with Ubuntu 10.04, and have tried upgrading to 3.0.5rc7 (as suggested on this list) for better performance with the quick-read translator, and other fixes. However, this actually seemed to make write performance *worse*! Should this be expected?
>>
>> (Our write test is totally scientific *cough*: we cp -a a directory of files onto the mounted volume.)
>>
>> 4. Should I expect a different performance pattern using the instance storage rather than an EBS volume? I found this post helpful - http://www.sirgroane.net/2010/03/tuning-glusterfs-for-apache-on-ec2/ - but it talks more about reading files than writing them, and it writes off some translators as not useful because of the way EBS works.
>>
>> 5. Is cluster/replicate even the right answer? Could we do something with cluster/distribute - is this, in effect, a RAID 10? It doesn't seem that replicate could possibly scale up to the number of nodes you hear about other people using GlusterFS with.
>>
>> 6. Could we do something crafty where you read directly from the POSIX volume but do all your writes through GlusterFS? I see it's unsupported, but I guess that is just because you might get old data by reading the disk rather than through the client.
>>
>> Any advice that anyone can provide is welcome, and my thanks in advance!
>>
>> Regards
>> Craig
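For the intra-AZ GlusterFS layer in the plan above, a client-side replicate volume over the three nodes in one AZ might look like the following sketch, in the style of the volgen output quoted elsewhere in this digest. The host names az1-a/b/c and the read-subvolume preference are illustrative assumptions:

```
volume az1-a
   type protocol/client
   option transport-type tcp
   option remote-host az1-a
   option remote-subvolume brick1
end-volume

# az1-b and az1-c are defined the same way, pointing at the other two nodes

volume mirror-az1
   type cluster/replicate
   subvolumes az1-a az1-b az1-c
   # prefer the local brick for reads, so only writes cross the
   # network (set per node)
   option read-subvolume az1-a
end-volume
```

With client-side replication the client does wait for every subvolume to acknowledge a write, which is the write-speed drop described in question 1: three replicas means three acknowledgements per write.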
[Gluster-users] Web Farm Configuration
I am researching the best solution for file replication (images, htmls, etc) for our web farm app. Originally, the current production was configured to read from the gluster mount on all 4 servers in the farm. However, when the load became high, the servers crashed and burned, so I had to remove gluster. I realize that our configuration may not have been optimal, so I am trying to find the best configuration with gluster. Does anyone on the list have gluster configured in a webfarm, and how do you have it configured? Thanks for any info!

-Jennifer
Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?
On 06/30/2010 07:53 AM, Emmanuel Noobadmin wrote:
> On Wed, Jun 30, 2010 at 7:25 PM, Jeff Darcy wrote:
>
>> Another option, since you do have a fast interconnect, would be to place all of the permanent storage on the data nodes and use storage on the app nodes only for caching (as we had discussed). Replicate pair-wise or diagonally between data nodes, distribute across the replica sets, and you'd have a pretty good solution to handle future expansion.
>
> I think I'll probably go with this, since you mention that replicate over distribute doesn't work that well and I like to keep the app and storage separate. But I might change my mind if testing indicates the performance level is not acceptable.
>
> As for fast interconnect, does that imply 10GbE/FC kind of speeds, or would normal GbE work?

Hm, it appears I was confusing this thread with another one where the person had mentioned using DDR IB. By "fast interconnect" (having worked with interconnects up to 48Gb/s/node) I usually mean at least 10GbE and preferably some form of IB. Accessing all storage over a GbE network can work, but often requires more careful tuning and selection of equipment to get adequate performance. A lot depends on how much you can benefit from things like read-ahead and io-cache, or how much data you're willing to leave in write-behind buffers.

It might well be the case that replicate over nufa/distribute will work better for your environment after all, despite the issues with app-node "crosstalk" or the "inversion" of replicate vs. distribute. I think it's time to experiment with some of the options and see how they do.
Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?
On Wed, Jun 30, 2010 at 7:25 PM, Jeff Darcy wrote:
> Another option, since you do have a fast interconnect, would be to place all of the permanent storage on the data nodes and use storage on the app nodes only for caching (as we had discussed). Replicate pair-wise or diagonally between data nodes, distribute across the replica sets, and you'd have a pretty good solution to handle future expansion.

I think I'll probably go with this, since you mention that replicate over distribute doesn't work that well and I like to keep the app and storage separate. But I might change my mind if testing indicates the performance level is not acceptable.

As for fast interconnect, does that imply 10GbE/FC kind of speeds, or would normal GbE work?

On 6/30/10, Jeff Darcy wrote:
> On 06/29/2010 11:31 PM, Emmanuel Noobadmin wrote:
>> With the nufa volumes, a file is only written to one of the volumes listed in its definition. If the volume is a replicate volume, then the file is replicated on each of the volumes listed in its definition.
>>
>> e.g. in this case:
>>
>> volume my_nufa
>>    type cluster/nufa
>>    option local-volume-name rep1
>>    subvolumes rep0 rep1 rep2
>> end-volume
>>
>> A file is only found in one of rep0, rep1 or rep2. If it was on rep2, then it would be inaccessible if rep2 fails, such as a network failure cutting rep2 off.
>
> Yes, but rep2 as a whole could only fail if all of its component volumes - one on an app node and one on a data node - failed simultaneously. That's about as good protection as you're going to get without increasing your replication level (therefore decreasing both performance and effective storage utilization).
>
>> Then when I add a rep3, gluster should automatically start putting new files onto it.
>>
>> At this point though, it seems that if I use nufa, I would have an issue if I add a purely storage-only rep3 instead of an app+storage node. None of the servers will use it until their local volume reaches max capacity, right? :D
>>
>> So if I preferred to have the load spread out more evenly, I should then be using cluster/distribute?
>
> If you want even distribution across different or variable numbers of app/data nodes, then cluster/distribute would be the way to go. For example, you could create a distribute set across the storage nodes and a nufa set across the app nodes, and then replicate between the two (each app node preferring the local member of the nufa set). You'd lose the ability to suppress app-node-to-app-node communication with different read-subvolume assignments, though, and in my experience replicate over distribute doesn't work quite as well as the other way around. Another option, since you do have a fast interconnect, would be to place all of the permanent storage on the data nodes and use storage on the app nodes only for caching (as we had discussed). Replicate pair-wise or diagonally between data nodes, distribute across the replica sets, and you'd have a pretty good solution to handle future expansion.
Re: [Gluster-users] Shared VM disk/image on gluster for redundancy?
On 06/29/2010 11:31 PM, Emmanuel Noobadmin wrote:
> With the nufa volumes, a file is only written to one of the volumes listed in its definition. If the volume is a replicate volume, then the file is replicated on each of the volumes listed in its definition.
>
> e.g. in this case:
>
> volume my_nufa
>    type cluster/nufa
>    option local-volume-name rep1
>    subvolumes rep0 rep1 rep2
> end-volume
>
> A file is only found in one of rep0, rep1 or rep2. If it was on rep2, then it would be inaccessible if rep2 fails, such as a network failure cutting rep2 off.

Yes, but rep2 as a whole could only fail if all of its component volumes - one on an app node and one on a data node - failed simultaneously. That's about as good protection as you're going to get without increasing your replication level (therefore decreasing both performance and effective storage utilization).

> Then when I add a rep3, gluster should automatically start putting new files onto it.
>
> At this point though, it seems that if I use nufa, I would have an issue if I add a purely storage-only rep3 instead of an app+storage node. None of the servers will use it until their local volume reaches max capacity, right? :D
>
> So if I preferred to have the load spread out more evenly, I should then be using cluster/distribute?

If you want even distribution across different or variable numbers of app/data nodes, then cluster/distribute would be the way to go. For example, you could create a distribute set across the storage nodes and a nufa set across the app nodes, and then replicate between the two (each app node preferring the local member of the nufa set). You'd lose the ability to suppress app-node-to-app-node communication with different read-subvolume assignments, though, and in my experience replicate over distribute doesn't work quite as well as the other way around.

Another option, since you do have a fast interconnect, would be to place all of the permanent storage on the data nodes and use storage on the app nodes only for caching (as we had discussed). Replicate pair-wise or diagonally between data nodes, distribute across the replica sets, and you'd have a pretty good solution to handle future expansion.
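The "replicate pair-wise between data nodes, distribute across the replica sets" layout can be sketched as a client-side volfile fragment. A sketch under stated assumptions: data1..data4 are hypothetical protocol/client volumes, one per brick on the data nodes:

```
# Pair-wise replication between data nodes...
volume rep0
   type cluster/replicate
   subvolumes data1 data2
end-volume

volume rep1
   type cluster/replicate
   subvolumes data3 data4
end-volume

# ...then distribute across the replica sets. Future expansion
# means adding another repN pair to this subvolumes list.
volume dist
   type cluster/distribute
   subvolumes rep0 rep1
end-volume
```

Note this is distribute over replicate, the ordering the post says works better, rather than replicate over distribute.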
[Gluster-users] transport.remote-port is changing on volume restart
Hello List,

I'm evaluating the Gluster platform as a "static file backend" for a webserver farm. First of all, I have to say thank you to the guys at Gluster; you did an awesome job. But there is one really annoying thing: after each restart of a volume in the volume manager, I have to change the transport.remote-port in client.vol and remount the volume on all clients. Is there a better way to do this, or is this a misconfiguration? My client.vol looks like this:

volume 192.168.1.167-1
   type protocol/client
   option transport-type tcp
   option remote-host 192.168.1.167
   option transport.socket.nodelay on
   option transport.remote-port 10006
   option remote-subvolume brick1
end-volume

volume 192.168.1.168-1
   type protocol/client
   option transport-type tcp
   option remote-host 192.168.1.168
   option transport.socket.nodelay on
   option transport.remote-port 10006
   option remote-subvolume brick1
end-volume

volume mirror-0
   type cluster/replicate
   subvolumes 192.168.1.168-1 192.168.1.167-1
end-volume

volume readahead
   type performance/read-ahead
   option page-count 4
   subvolumes mirror-0
end-volume

volume iocache
   type performance/io-cache
   option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
   option cache-timeout 1
   subvolumes readahead
end-volume

volume quickread
   type performance/quick-read
   option cache-timeout 1
   option max-file-size 64kB
   subvolumes iocache
end-volume

volume writebehind
   type performance/write-behind
   option cache-size 4MB
   subvolumes quickread
end-volume

volume statprefetch
   type performance/stat-prefetch
   subvolumes writebehind
end-volume

Thank you in advance,
Rafael.
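Incidentally, the backtick expression in the iocache volume above is shell arithmetic evaluated when the volfile is processed: it sizes the cache as MemTotal/5120, i.e. roughly one fifth of RAM expressed in MB, since /proc/meminfo reports MemTotal in kB and 5120 = 1024 * 5. A standalone sketch of that computation:

```shell
# Reproduce the cache-size computation from the volgen-generated
# volfile: MemTotal (kB) / 5120 => roughly 1/5 of RAM in MB.
mem_kb=$(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g')
echo "cache-size $(( mem_kb / 5120 ))MB"
```

On a 4 GB machine this prints something close to "cache-size 819MB"; per the tuning advice elsewhere in this digest, you may want to hard-code a smaller value instead.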
[Gluster-users] debootstrap on glusterfs
Hi,

I am trying to install a debootstrap lenny system on a GlusterFS export. GlusterFS is compiled from source (git 3.1.0). The volumes were created by volgen. When I issue the command, an error occurs:

sh2:/# debootstrap lenny /zfs
/usr/share/debootstrap/functions: line 1047: /zfs/test-dev-null: No such device or address
E: Cannot install into target '/zfs' mounted with noexec or nodev

I cannot see any change if I modify my /etc/fstab, as those exec,dev options are probably not supported:

sh2:/# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/etc/glusterfs/glusterfs.vol on /zfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)

Has anyone done this before? Please give me any hints to help decide whether to continue with these tests. Thanks!

Kalin.
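For context: the check at the line named in the error creates a character device node (test-dev-null) on the target and writes to it, and "No such device or address" (ENXIO) means the device node exists but is not usable on that mount. A minimal probe along the same lines — a sketch, not the exact debootstrap code; it needs root to create the device node, and the target path is a parameter rather than the /zfs from the post:

```shell
# Probe whether a filesystem supports usable device nodes, similar
# in spirit to the debootstrap check that fails above.
target="${1:-/tmp}"
if mknod "$target/test-dev-null" c 1 3 2>/dev/null \
   && echo test > "$target/test-dev-null" 2>/dev/null; then
    echo "device nodes usable on $target"
else
    echo "device nodes NOT usable on $target (nodev-like behaviour)"
fi
rm -f "$target/test-dev-null"
```

Run against the FUSE mount, this would confirm whether the failure is the mount's device-node handling rather than fstab options.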