Re: [Gluster-users] The continuing story ...

2009-09-07 Thread Jeff Evans
> - server was ping'able
> - glusterfsd was disconnected by the client because of missing ping-pong
> - no login possible
> - no fs action (no lights on the hd-stack)
> - no screen (was blank, stayed blank)

This is very similar to what I have seen many times (even back on
1.3), and I have also commented on it on the list.

It seems that we have quite a few ACKs on this, or similar problems.

The only thing different in my scenario is that the console doesn't
stay blank. When attempting to log in I get the last-login message and
nothing more; no prompt ever appears. Also, I can see that other
processes are still listening on sockets etc., so it seems like the
kernel just can't grab new FDs.
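
One way to check the guess about file descriptors might be to log the kernel's file-handle counters at intervals before a hang, so that the last entries show whether the table was filling up. The sketch below parses the three fields of `/proc/sys/fs/file-nr` on Linux (allocated handles, allocated-but-unused handles, system-wide maximum). The function names and the idea of sampling this file are illustrative assumptions, not something reported in this thread, and the counters are system-wide rather than per-process.

```python
def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    The file holds three integers: allocated file handles, allocated
    but unused file handles, and the system-wide maximum.
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def handle_usage(text):
    """Return the fraction of the system-wide file-handle limit in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read the raw counter line (Linux only; the path is the standard procfs location)."""
    with open(path) as handle:
        return handle.read()
```

Calling `handle_usage(read_file_nr())` from a cron job or a loop and appending the result to a file on local disk would leave a trace that survives the hang.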

I too found the hang happens more easily if a downed node from a
replicate pair re-joins after some time.

Following suggestions that this is all kernel related, I have just
moved up to RHEL 5.4 in the hope that the new kernel will
help.

This fix stood out as potentially related for me:
https://bugzilla.redhat.com/show_bug.cgi?id=445433

We also have a Broadcom network card, which had reports of hangs under
load; the kernel has a patch for that too.

If I still run into the hangs, I'll try XFS.

Thanks, Jeff.




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] client coherence problem with locks and truncate

2009-09-07 Thread Robert L. Millner
On Sat, 2009-09-05 at 06:45 -0400, Anand Avati wrote:
> Can you try your tests by mounting with --attribute-timeout=0 command
> line parameter?

Still happens.

Cheers,
Rob




Re: [Gluster-users] The continuing story ...

2009-09-07 Thread Daniel Jordan Bambach
Yep, I experience this exact lock-up state on the 2.x train of
GlusterFS with two servers, each with a local client, and have so far
given up testing :( - I run 1.3 in production, which still has problems
when one of the servers goes down, and was hoping to move up to 2.x
quickly, but can't at the moment.

Every time a new version comes out I update, hoping it will be solved.

The machine that hangs does so completely: one can't ssh in and can't
get a proper dump from the process, and any DEBUG log enabled has no
information in it either, so I haven't been able to provide anything
useful to the team to work from :(




On 7 Sep 2009, at 15:46, Stephan von Krawczynski wrote:


> Hello all,
>
> last week we saw our first try to enable something like a real-world
> environment on glusterfs fail.
> Nevertheless we managed to get a working combination of _one_ server and _one_
> client (using a replicate setup with a missing second server).
> This setup worked for about 4 days, so yesterday we tried to enable the second
> server. Within minutes the first one crashed. Well, really we do not know if
> it crashed in its true meaning, the situation looked like this:
> - server was ping'able
> - glusterfsd was disconnected by the client because of missing ping-pong
> - no login possible
> - no fs action (no lights on the hd-stack)
> - no screen (was blank, stayed blank)
>
> This could also be a user-space hang or cpu busy/looping. We don't know.
> The really interesting part is that the server worked for days being single,
> but as soon as dual server fs action (obviously in combination with self
> healing) started it did not survive 10 minutes.
> Of course the second server went on, but we had to stop the whole thing
> because the data was not completely healed, so it made no sense to go on with
> old copies.
> This was glusterfs 2.0.6 with a minimal server setup (storage/posix,
> features/locks, performance/io-threads) on a linux kernel 2.6.25.2.
> Is there someone out there that experienced something the like?
> Any ideas?
>
> --
> Regards,
> Stephan



[Gluster-users] How does replication work?

2009-09-07 Thread Alan Ivey
Like the subject implies, how does replication work exactly?

If a client is the only one that has the IP addresses defined for the servers, 
does that mean that only a client writing a file ensures that it goes to both 
servers? That would tell me that the servers don't directly communicate with 
each other for replication.

If so, how does healing work? Since the client is the only configuration with 
the multiple server IP addresses, is it the client's "task" to make sure the 
server heals itself once it's back online?

If not, how do the servers know that each other exist, other than through the
client config file?


[Gluster-users] performance/io-cache translator question

2009-09-07 Thread Stephan von Krawczynski
Hello all,

can I tell the io-cache translator to _not_ cache a certain type of file?
Maybe with a special priority or by simply not adding it to the option
priority line?
Let's say I know it is good to cache "*.h" but useless to cache "*". How would
I configure that?

-- 
Regards,
Stephan


[Gluster-users] The continuing story ...

2009-09-07 Thread Stephan von Krawczynski
Hello all,

last week we saw our first try to enable something like a real-world
environment on glusterfs fail.
Nevertheless we managed to get a working combination of _one_ server and _one_
client (using a replicate setup with a missing second server).
This setup worked for about 4 days, so yesterday we tried to enable the second
server. Within minutes the first one crashed. Well, we do not really know whether
it crashed in the true sense; the situation looked like this:
- server was ping'able
- glusterfsd was disconnected by the client because of missing ping-pong
- no login possible
- no fs action (no lights on the hd-stack)
- no screen (was blank, stayed blank)

This could also be a user-space hang or cpu busy/looping. We don't know.
The really interesting part is that the server worked for days being single,
but as soon as dual server fs action (obviously in combination with self
healing) started it did not survive 10 minutes.
Of course the second server went on, but we had to stop the whole thing
because the data was not completely healed, so it made no sense to go on with
old copies.
This was glusterfs 2.0.6 with a minimal server setup (storage/posix,
features/locks, performance/io-threads) on a Linux kernel 2.6.25.2.
Is there someone out there who has experienced something like this?
Any ideas?

-- 
Regards,
Stephan



[Gluster-users] options not recognized

2009-09-07 Thread Paras Fadte
Hi,

I get the following in the GlusterFS log. The Gluster version used is 2.0.6.
Has there been any change?

W [xlator.c:555:validate_xlator_volume_options] writebehind: option
'window-size' is deprecated, preferred is 'cache-size', continuing
with correction
W [glusterfsd.c:470:_log_if_option_is_invalid] cache: option
'page-size' is not recognized
W [glusterfsd.c:470:_log_if_option_is_invalid] brick1: option
'transport-timeout' is not recognized
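
To collect warnings of this kind from a longer log, a small filter along the following lines may help. This is only a sketch: the regular expressions are modelled on the three warnings quoted above and are an assumption about the log format, not taken from GlusterFS documentation, and real log lines may carry a prefix (such as a timestamp) that is not shown here. The function name is hypothetical.

```python
import re

# Patterns modelled on the three warnings quoted above. They are an
# assumption about the log format, not an official specification.
DEPRECATED = re.compile(
    r"\]\s+(?P<volume>\S+): option '(?P<option>[^']+)' is deprecated, "
    r"preferred is '(?P<preferred>[^']+)'"
)
UNRECOGNIZED = re.compile(
    r"\]\s+(?P<volume>\S+): option '(?P<option>[^']+)' is not recognized"
)


def scan_option_warnings(log_text):
    """Return (deprecated, unrecognized) option warnings found in log_text.

    deprecated   -- list of (volume, option, preferred_option) tuples
    unrecognized -- list of (volume, option) tuples
    """
    # Collapse all whitespace so that messages wrapped across several
    # lines (as in the mail above) still match.
    flat = " ".join(log_text.split())
    deprecated = [
        (m["volume"], m["option"], m["preferred"])
        for m in DEPRECATED.finditer(flat)
    ]
    unrecognized = [
        (m["volume"], m["option"]) for m in UNRECOGNIZED.finditer(flat)
    ]
    return deprecated, unrecognized
```

The result lists which volume in the volfile carries each option, which is the information needed to go back and rename or remove it.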


Thank you.

-Paras


Re: [Gluster-users] Newbie questions :-)

2009-09-07 Thread Daniel Maher

Philipp Huber wrote:

> Daniel,
>
> Fantastic, thanks very much for your reply. We are very excited about GlusterFS
> and are working on a business case for a Cloud Storage product that would
> complement our Cloud Computing platform.
>
> One quick question re your #4 answer, does that mean you will have to take the
> volume down for a re-sync?
>
> Thanks for your reply,
> Phil


Please direct your replies to the list, mate. :)

As for question #4 :

> 4)  Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

The volume doesn't need to be taken down, no, but replication won't 
happen by magic either.  Basically, for a node to realise that its copy 
of the file is no longer current (or that it shouldn't be there, or 
should be there, or whatever), the file has to be accessed.


On a webserver or something like that, the access might easily occur 
organically (a graphic or html page being served).  On file servers 
where there's less interactivity, running a simple script that will find 
and, say, stat the files in the exported tree (for example) will ensure 
coherency.
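
The "simple script" described above could look something like the following minimal sketch. It assumes the volume is mounted through the GlusterFS client, walks the tree from a given mount point, and calls `lstat` on every directory and file so that each entry is accessed once. The function name is hypothetical, and whether a stat alone is enough to bring every file up to date is taken from the description above rather than verified here.

```python
import os
import sys


def stat_tree(root):
    """Walk `root` and lstat every directory and file below it.

    Returns a (visited, failed) pair of counts. Failures are reported
    on stderr and do not stop the walk.
    """
    visited = 0
    failed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so dangling links
                # are still counted as visited rather than as errors.
                os.lstat(path)
                visited += 1
            except OSError as exc:
                failed += 1
                print(f"stat failed: {path}: {exc}", file=sys.stderr)
    return visited, failed
```

It would be run against the client-side mount point (for example `stat_tree("/mnt/glusterfs")`, where the path is purely illustrative), not against the backend export directory on a server.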



--
Daniel Maher 


Re: [Gluster-users] Newbie questions :-)

2009-09-07 Thread Daniel Maher

Hello !

Philipp Huber wrote:


> 1)  Can I configure GlusterFS so it can withstand a complete 'brick'
> failure without users losing access to their data?

Yes.

> 2)  If Yes, can I configure how many redundant copies of the files are
> stored, e.g. 2x, 3x?

Yes.

> 3)  Can I control the amount of replication per user?

No.

> 4)  Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

Yes (but not in the background...)

> 5)  As GlusterFS stores Metadata along with the normal data, what is the
> capacity overhead in %?

That's a good question. :)


--
Daniel Maher 


Re: [Gluster-users] accesing same server through different interfaces

2009-09-07 Thread David Saez Padros

Hi


>> Having one server being accessible through two different IP addresses,
>> is there any way to have glusterfs fall back to the second address
>> when it cannot connect to the server through the first one?
>
> I think this is the HA module that they are working on...

Are they still working on it, or is it already working? According to

http://gluster.org/docs/index.php/Whats_New_v2.0

it seems that it is already working, but I cannot find any reference
to it in the documentation:

http://www.gluster.org/docs/index.php/Translators

--
Best regards ...


   David Saez Padros                http://www.ols.es
   On-Line Services 2000 S.L.       telf    +34 902 50 29 75


