Re: [Gluster-users] The continuing story ...
> - server was ping'able
> - glusterfsd was disconnected by the client because of missing
>   ping-pong
> - no login possible
> - no fs action (no lights on the hd-stack)
> - no screen (was blank, stayed blank)

This is very similar to what I have seen many times (even back on 1.3), and have also commented on to the list. It seems that we have quite a few ACKs on this or similar problems.

The only thing different in my scenario is that the console doesn't stay blank. When attempting to log in I get the last-login message and nothing more; no prompt ever appears. Also, I can see that other processes are still listening on sockets etc., so it seems like the kernel just can't grab new FDs.

I too found the hang happens more easily if a downed node from a replicate pair re-joins after some time.

Following suggestions that this is all kernel-related, I have just moved up to RHEL 5.4 in the hope that the new kernel will help. This fix stood out as potentially related for me:

https://bugzilla.redhat.com/show_bug.cgi?id=445433

We also have a Broadcom network card, which had reports of hangs under load; the kernel has a patch for that too. If I still run into the hangs, I'll try xfs.

Thanks,
Jeff.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] client coherence problem with locks and truncate
On Sat, 2009-09-05 at 06:45 -0400, Anand Avati wrote:
> Can you try your tests by mounting with --attribute-timeout=0 command
> line parameter?

Still happens.

Cheers,
Rob
Re: [Gluster-users] The continuing story ...
Yep, I experience this exact lock-up state on the 2.x train of GlusterFS with two servers, each with a local client, and have so far given up testing :( I run 1.3 in production, which still has problems when one of the servers goes down, and was hoping to move up to 2.x quickly, but can't at the moment. Every time a new version comes out I update, hoping it will be solved.

Because the machine that hangs does so completely, one can't ssh in and can't get a proper dump from the process, and any DEBUG log enabled has no information in it either, so I haven't been able to provide anything useful to the team to work from :(

On 7 Sep 2009, at 15:46, Stephan von Krawczynski wrote:

> Hello all,
>
> last week we saw our first try to enable something like a real-world
> environment on glusterfs fail. Nevertheless we managed to get a working
> combination of _one_ server and _one_ client (using a replicate setup
> with a missing second server). This setup worked for about 4 days, so
> yesterday we tried to enable the second server. Within minutes the
> first one crashed. Well, really we do not know if it crashed in the
> true meaning of the word; the situation looked like this:
>
> - server was ping'able
> - glusterfsd was disconnected by the client because of missing ping-pong
> - no login possible
> - no fs action (no lights on the hd-stack)
> - no screen (was blank, stayed blank)
>
> This could also be a user-space hang or cpu busy/looping. We don't know.
> The really interesting part is that the server worked for days being
> single, but as soon as dual-server fs action (obviously in combination
> with self-healing) started, it did not survive 10 minutes. Of course the
> second server went on, but we had to stop the whole thing because the
> data was not completely healed, so it made no sense to go on with old
> copies.
>
> This was glusterfs 2.0.6 with a minimal server setup (storage/posix,
> features/locks, performance/io-threads) on a linux kernel 2.6.25.2.
>
> Is there someone out there that has experienced something like this?
> Any ideas?
> --
> Regards,
> Stephan
[Gluster-users] How does replication work?
Like the subject implies: how does replication work, exactly?

If a client is the only one that has the IP addresses defined for the servers, does that mean that only a client writing a file ensures that it goes to both servers? That would tell me that the servers don't directly communicate with each other for replication.

If so, how does healing work? Since the client is the only configuration with the multiple server IP addresses, is it the client's "task" to make sure the server heals itself once it's back online? If not, how do the servers know each other exist, if not for the client config file?
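For reference, this is roughly the kind of client volfile the question describes, where only the client knows both server addresses and the cluster/replicate translator sits on top of two protocol/client subvolumes. All names and IPs below are placeholders, not taken from anyone's actual setup:

```
# client.vol -- sketch only; hostnames, IPs and volume names are made up
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.1      # first server
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.2      # second server
  option remote-subvolume brick
end-volume

volume replicated
  type cluster/replicate
  subvolumes remote1 remote2          # writes fan out to both
end-volume
```

With a layout like this it is the client that fans writes out to both servers, which is exactly what the question suspects.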
[Gluster-users] performance/io-cache translator question
Hello all,

can I tell the io-cache translator to _not_ cache a certain type of file? Maybe with a special priority, or by simply not adding it to the option priority line? Let's say I know it is good to cache "*.h" but useless to cache "*". How would I configure that?

--
Regards,
Stephan
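I'm not aware of a documented way to exclude a pattern from io-cache outright, but the priority option does let you weight patterns relative to each other. A sketch, assuming the pattern:weight syntax from the 2.0 volfile docs (the subvolume name "readahead" and the weights are just examples):

```
volume iocache
  type performance/io-cache
  option cache-size 64MB
  # higher weight = kept in cache preferentially; "*.h" favoured over "*"
  option priority *.h:3,*:1
  subvolumes readahead
end-volume
```

Whether a weight of 0 (or omitting a pattern entirely) skips caching altogether is something I'd verify against the translator source or docs before relying on it.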
[Gluster-users] The continuing story ...
Hello all,

last week we saw our first try to enable something like a real-world environment on glusterfs fail. Nevertheless we managed to get a working combination of _one_ server and _one_ client (using a replicate setup with a missing second server). This setup worked for about 4 days, so yesterday we tried to enable the second server. Within minutes the first one crashed. Well, really we do not know if it crashed in the true meaning of the word; the situation looked like this:

- server was ping'able
- glusterfsd was disconnected by the client because of missing ping-pong
- no login possible
- no fs action (no lights on the hd-stack)
- no screen (was blank, stayed blank)

This could also be a user-space hang or cpu busy/looping. We don't know. The really interesting part is that the server worked for days being single, but as soon as dual-server fs action (obviously in combination with self-healing) started, it did not survive 10 minutes. Of course the second server went on, but we had to stop the whole thing because the data was not completely healed, so it made no sense to go on with old copies.

This was glusterfs 2.0.6 with a minimal server setup (storage/posix, features/locks, performance/io-threads) on a linux kernel 2.6.25.2.

Is there someone out there that has experienced something like this? Any ideas?

--
Regards,
Stephan
[Gluster-users] options not recognized
Hi,

I get the following in the GlusterFS log. The GlusterFS version used is 2.0.6. Has there been any change?

W [xlator.c:555:validate_xlator_volume_options] writebehind: option 'window-size' is deprecated, preferred is 'cache-size', continuing with correction
W [glusterfsd.c:470:_log_if_option_is_invalid] cache: option 'page-size' is not recognized
W [glusterfsd.c:470:_log_if_option_is_invalid] brick1: option 'transport-timeout' is not recognized

Thank you.
-Paras
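The first warning names its own fix: 'window-size' has been renamed to 'cache-size' in the write-behind translator. A corrected stanza might look like this (the volume name "writebehind" and subvolume name "locks" are placeholders for whatever your volfile actually uses):

```
volume writebehind
  type performance/write-behind
  option cache-size 1MB        # formerly 'window-size'
  subvolumes locks
end-volume
```

The 'page-size' and 'transport-timeout' warnings don't name replacements, so those would need to be checked against the 2.0.6 option tables rather than guessed at.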
Re: [Gluster-users] Newbie questions :-)
Philipp Huber wrote:
> Daniel,
>
> Fantastic, thanks very much for your reply. We are very excited about
> GlusterFS and are working on a business case for a Cloud Storage
> product that would complement our Cloud Computing platform.
>
> One quick question re your #4 answer: does that mean you will have to
> take the volume down for a re-sync?
>
> Thanks for your reply,
> Phil

Please direct your replies to the list, mate. :)

As for question #4:

> 4) Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

The volume doesn't need to be taken down, no, but replication won't happen by magic either. Basically, for a node to realise that its copy of the file is no longer current (or that it shouldn't be there, or should be there, or whatever), the file has to be accessed. On a webserver or something like that, the access might easily occur organically (a graphic or html page being served). On file servers where there's less interactivity, running a simple script that will find and, say, stat the files in the exported tree will ensure coherency.

--
Daniel Maher
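A minimal sketch of the kind of stat-walk script described above. The mount point is an assumption; substitute your own:

```shell
#!/bin/sh
# Walk a GlusterFS mount and stat every entry, so the replicate
# translator gets a chance to notice and heal stale copies.
# /mnt/glusterfs is a placeholder -- pass your real mount point as $1.
MOUNT=${1:-/mnt/glusterfs}

find "$MOUNT" -print0 | xargs -0 stat > /dev/null
```

Running this after a downed brick rejoins (e.g. from cron) touches every file once without transferring data through the application layer.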
Re: [Gluster-users] Newbie questions :-)
Hello !

Philipp Huber wrote:
> 1) Can I configure GlusterFS so it can withstand a complete 'brick'
> failure without users losing access to their data?

Yes.

> 2) If yes, can I configure how many redundant copies of the files are
> stored, e.g. 2x, 3x?

Yes.

> 3) Can I control the amount of replication per user?

No.

> 4) Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

Yes (but not in the background...)

> 5) As GlusterFS stores metadata along with the normal data, what is
> the capacity overhead in %?

That's a good question. :)

--
Daniel Maher
Re: [Gluster-users] accessing same server through different interfaces
Hi

> > Having one server being accessible through two different ip addresses,
> > is there any way to have glusterfs fall back to the second address
> > when it cannot connect to the server through the first one?
>
> I think this is the HA module that they are working on...

Are they working on it, or is it already working? According to:

http://gluster.org/docs/index.php/Whats_New_v2.0

it seems that it is already working, but I cannot find any reference to it in the documentation:

http://www.gluster.org/docs/index.php/Translators

--
Best regards ...

David Saez Padros               http://www.ols.es
On-Line Services 2000 S.L.      telf +34 902 50 29 75