Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-02-10 Thread Tiago Santos
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-28 Thread Tiago Santos
Since I stopped writing to the clients (so I could work cleanly on the
split brain), I have had no more entries in /var/log/gluster.log (this is the
client log, right?)


While using the diff command to fix the split brain, I saw
several entries like these:

diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558482:
Transport endpoint is not connected
diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558483:
Transport endpoint is not connected
diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558484:
Transport endpoint is not connected

They happen a lot, then stop, then happen again, and so on.

While the errors are showing, a ping from the system where I'm working
on the split-brain to the system that is failing to connect (r2) shows this:

64 bytes from r2-server (r2-ip): icmp_seq=662 ttl=64 time=1.21 ms
64 bytes from r2-server (r2-ip): icmp_seq=663 ttl=64 time=0.990 ms
64 bytes from r2-server (r2-ip): icmp_seq=664 ttl=64 time=1.01 ms

I know this is a very trivial network check that may not show me
what I want to see, and I'm working on a more elaborate one. But I'm
completely open to suggestions on how to properly verify whether the
network is the issue as far as Gluster is concerned.
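
One possible way to go beyond ICMP ping is to probe the TCP ports the Gluster client actually uses, since ping can succeed while TCP connections are being dropped. The following is only a minimal sketch, not a Gluster tool: the function names `probe_tcp` and `watch` are made up for illustration, and the ports to probe are an assumption. 24007 is the glusterd management port; the brick ports vary per volume and should be taken from `gluster volume status`.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, time.monotonic() - start, str(exc)


def watch(host, ports, rounds=5, interval=1.0):
    """Probe each port repeatedly and collect only the failed attempts."""
    failures = []
    for _ in range(rounds):
        for port in ports:
            ok, elapsed, err = probe_tcp(host, port)
            if not ok:
                stamp = time.strftime("%H:%M:%S")
                failures.append((stamp, port, round(elapsed, 3), err))
        time.sleep(interval)
    return failures
```

Running something like `watch("r2-server", [24007, 49152], rounds=600)` while the diff is in progress would give timestamps of failed connections that can be lined up against the "Transport endpoint is not connected" errors. Here `r2-server` is the placeholder name from the ping output and 49152 is only an example brick port.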


So far, thank you so much, guys!




Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Joe Julian
Check your client logs. Perhaps the client isn't actually connecting to 
both servers.



Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
That's what I meant. Sorry for the confusion.

I'm writing on Client1 (same server as Brick1). Client2 (mounted Brick2, on
server2) has nothing writing to it (so far).

What I'm wondering is how I ended up with a split-brain if I'm only writing
on one client.






Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Joe Julian
Nothing but GlusterFS should be writing to bricks. Mount a client and 
write there.



Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal  wrote:

>
> Btw, what version of Linux do you use? And, are the files you observe the
> input/output errors on soft-links?
>


This is Ubuntu 14.04.

The files reporting input/output errors are all standard files, no links
involved.


Thanks,

-- 
*Tiago Santos*
MustHaveMenus.com

Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
Right.

I have Brick1 being constantly written to. But I have nothing writing to
Brick2. It just gets "healed" data from Brick1.

This setup is still not in production, and there are no applications using
that data. I have rsyncs constantly updating Brick1 (bringing data from
production servers), and then Gluster updates Brick2.

Which makes me wonder how I could be creating files on multiple replicas
during a split-brain.


Could it be that, during a split-brain event, I am updating
versions of the same file on Brick1 (only), and Gluster sees them as
different versions and things get confused?


Anyways, while we talk I'm gonna run Joe's precious procedure on
split-brain recovery.






Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Joe Julian
Mismatched GFIDs would happen if a file is created on multiple replicas 
during a split-brain event. The GFID is assigned at file creation.
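
As an illustration of what a GFID mismatch looks like in practice, here is a minimal sketch that parses `getfattr -d -e hex` output captured from each brick and reports whether the `trusted.gfid` values disagree. The helper names `parse_getfattr` and `gfid_mismatch` are hypothetical, and the sample values are the two GFIDs reported for `user_1339200.png` in this thread. It only compares text that has already been collected; it does not talk to Gluster.

```python
def parse_getfattr(text):
    """Parse `getfattr -d -e hex` output into {attribute_name: hex_string}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# file: ..." header and anything without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def gfid_mismatch(brick_outputs):
    """Return {brick: gfid} if the bricks disagree on trusted.gfid, else {}."""
    gfids = {
        brick: parse_getfattr(output).get("trusted.gfid")
        for brick, output in brick_outputs.items()
    }
    return gfids if len(set(gfids.values())) > 1 else {}


# Sample input: the trusted.gfid lines posted for the two bricks.
web3 = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
"""
web4 = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
"""

for brick, gfid in gfid_mismatch({"web3": web3, "web4": web4}).items():
    print(brick, gfid)
```

An empty result means every brick reports the same GFID for that path; a non-empty result lists the conflicting values per brick.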




Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread A Ghoshal
Yep, so it is indeed a split-brain caused by a mismatch of the trusted.gfid 
attribute. 

Sadly, I don't know precisely what causes it. Communication loss might be one
of the triggers. I am guessing the files with the problem are dynamic, correct?
In our setup (also replica 2), communication is never a problem but we do see
this when one of the servers reboots. Maybe some obscure and hard-to-understand
race between background self-heal and the self-heal daemon...

In any case, the normal procedure for split-brain recovery should work for you if
you wish to get your files back in working order. It's easy to find on Google. I use
the instructions on Joe Julian's blog page myself.


 -Tiago Santos  wrote: -

 ===
 To: A Ghoshal 
 From: Tiago Santos 
 Date: 01/27/2015 02:11AM 
 Cc: gluster-users 
 Subject: Re: [Gluster-users] Pretty much any operation related to Gluster 
mounted fs hangs for a while
 ===
   Oh, right!

Here are the outputs:


root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex
templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x0004
trusted.afr.site-images-client-1=0x00020009
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527

real 0m0.024s
user 0m0.001s
sys 0m0.001s



root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex
templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x
trusted.afr.site-images-client-1=0x
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3

real 0m0.003s
user 0m0.000s
sys 0m0.006s


Not sure exactly what that means. I'm googling, and would appreciate it if you
guys could shed some light.

Thanks!
--
Tiago




On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal  wrote:

>
> Actually you ran getfattr on the volume - which is why the requisite
> extended attributes never showed up...
>
> Your bricks are mounted elsewhere:
>  /export/images1-1/brick and /export/images2-1/brick
>
> Btw, what version of Linux do you use? And, are the files you observe the
> input/output errors on soft-links?
>
>  -Tiago Santos  wrote: -
>
>  ===
>  To: A Ghoshal 
>  From: Tiago Santos 
>  Date: 01/27/2015 12:20AM
>  Cc: gluster-users 
>  Subject: Re: [Gluster-users] Pretty much any operation related to Gluster
> mounted fs hangs for a while
>  ===
>Thanks for you input, Anirban.
>
> I ran the commands on both servers, with the following results:
>
>
> root@web3:/var/www/site-images# time getfattr -m . -d -e hex
> templates/assets/prod/temporary/13/user_1339200.png
>
> real 0m34.524s
> user 0m0.004s
> sys 0m0.000s
>
>
> root@web4:/var/www/site-images# time getfattr -m . -d -e hex
> templates/assets/prod/temporary/13/user_1339200.png
> getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output
> error
>
> real 0m11.315s
> user 0m0.001s
> sys 0m0.003s
> root@web4:/var/www/site-images# ls
> templates/assets/prod/temporary/13/user_1339200.png
> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png:
> Input/output error
>
>

=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
Oh, right!

Here are the outputs:


root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x0004
trusted.afr.site-images-client-1=0x00020009
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527

real 0m0.024s
user 0m0.001s
sys 0m0.001s



root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x
trusted.afr.site-images-client-1=0x
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3

real 0m0.003s
user 0m0.000s
sys 0m0.006s


Not sure exactly what that means. I'm googling, and would appreciate it if you
guys could shed some light.
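
The two outputs above can be compared mechanically. Below is a minimal sketch that parses the text printed by getfattr -d -e hex and checks whether trusted.gfid agrees between the bricks. The helper names parse_getfattr and gfids_match are hypothetical, and only the trusted.gfid lines from the outputs above are reproduced as sample input.

```python
def parse_getfattr(output: str) -> dict:
    """Turn `getfattr -d -e hex` text into {attribute_name: value}."""
    attrs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the "# file: ..." header and anything without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def gfids_match(output_a: str, output_b: str) -> bool:
    """True only if both outputs carry the same trusted.gfid value."""
    gfid_a = parse_getfattr(output_a).get("trusted.gfid")
    gfid_b = parse_getfattr(output_b).get("trusted.gfid")
    return gfid_a is not None and gfid_a == gfid_b


WEB3 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
"""
WEB4 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
"""

print(gfids_match(WEB3, WEB4))  # prints False: the two bricks disagree
```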

Thanks!
--
Tiago





Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread A Ghoshal
 
Actually, you ran getfattr on the mounted volume, which is why the requisite extended
attributes never showed up...

Your bricks are mounted elsewhere:
 /export/images1-1/brick and /export/images2-1/brick

Btw, what version of Linux do you use? And are the files on which you observe the
input/output errors soft links?
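
One way to tell whether a working directory is a brick or the client mount is to look up its filesystem type in /proc/mounts. The sketch below takes the mounts text as a parameter so it can be tried without a Gluster host. The function names are hypothetical, and the sample text is reconstructed from the df -Th output posted in this thread, with illustrative mount options.

```python
def mount_for(path: str, mounts_text: str):
    """Return (mount_point, fstype) of the longest mount point that
    contains `path`, or None if nothing matches."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best


def is_gluster_client_mount(path: str, mounts_text: str) -> bool:
    """True if `path` lives on a FUSE GlusterFS mount rather than a brick."""
    found = mount_for(path, mounts_text)
    return found is not None and found[1] == "fuse.glusterfs"


# Hypothetical /proc/mounts excerpt; options are illustrative only.
SAMPLE_MOUNTS = """\
/dev/mapper/data_vg-data_lv /export/images1-1 ext4 rw,relatime 0 0
images1.mydomain.com:/site-images /var/www/site-images fuse.glusterfs rw,relatime 0 0
"""

print(is_gluster_client_mount("/var/www/site-images", SAMPLE_MOUNTS))
print(is_gluster_client_mount("/export/images1-1/brick", SAMPLE_MOUNTS))
```

On a real host the text would come from open("/proc/mounts").read(); the trusted.* attributes should be read from the path for which the check returns False.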




Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
Thanks for your input, Anirban.

I ran the commands on both servers, with the following results:


root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png

real 0m34.524s
user 0m0.004s
sys 0m0.000s


root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error

real 0m11.315s
user 0m0.001s
sys 0m0.003s
root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error


Not sure if this elucidates the issue.


Also, I saw a zillion entries like these in /var/log/gluster.log:

[2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9616964 (----)
[2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9594915 (----)
[2015-01-26 17:35:39.973571] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9681971 (----)
[2015-01-26 17:35:39.973686] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/19615 (----)
[2015-01-26 17:35:39.973802] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/130392 (----)


I talked with some guys on #gluster who pointed out that it could be a network
issue. I'm still looking into it, but since the problem also happens locally
(within the same server), would that still be a valid point?
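
To see which translator is producing the warnings, the log can be grouped by level, translator name and message. The sketch below is a rough illustration that assumes each entry sits on a single line in the shape "[timestamp] LEVEL [source] translator: message", as in the entries quoted in this message; the regular expression and the summarize helper are hypothetical, not part of any Gluster tool.

```python
import re
from collections import Counter

# Assumed single-line entry shape: [timestamp] LEVEL [file:line:func] xlator: msg
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count entries per (level, translator, message without the path)."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue
        # Cut at " Path:" so per-file details do not split the buckets.
        message = match.group("msg").split(" Path:")[0]
        counts[(match.group("level"), match.group("xlator"), message)] += 1
    return counts


SAMPLE = [
    "[2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] "
    "0-site-images-client-1: remote operation failed: Transport endpoint is not "
    "connected. Path: /templates/apache/template/prod/facebook/9616964 (----)",
    "[2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] "
    "0-site-images-client-1: remote operation failed: Transport endpoint is not "
    "connected. Path: /templates/apache/template/prod/facebook/9594915 (----)",
]

for key, count in summarize(SAMPLE).most_common():
    print(count, key)
```

If every warning names the same client translator (here 0-site-images-client-1), that suggests the mount has lost its connection to one particular brick. Which brick that name maps to is an assumption to confirm against the volume definition.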


Also, less often, I see entries like these:

[2015-01-26 17:41:25.956418] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png
[2015-01-26 17:41:26.588753] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png


Are those a definitive indication of a split-brain? Or are they just normal
until self-heal takes care of recently updated files?







Regards,
-- 
*Tiago Santos*
MustHaveMenus.com

Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread A Ghoshal
I am plagued with something of this sort, too!

What I mostly see when I look into these things is that

A) it's a split-brain.
B) the split-brain occurs because the GFIDs on the two replicas are at odds.

You can check this as follows:
1. On each server, first 'cd' to where your brick is mounted.
2. getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png

You will see a trusted.gfid extended attribute. If it's not the same on
both servers, there's a problem.

Thanks, 
Anirban



[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while

2015-01-26 Thread Tiago Santos
Hey guys,

I'm seeing weird behavior with pretty much any command (ls, df, find,
etc.) I try to run against a Gluster client filesystem.


To show what I'm talking about, here is a
simple test I just ran:


root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png
Mon Jan 26 07:00:27 PST 2015
-rwx---r-- 1 mhmadmin mhmadmin 61K Jan 22 14:37 /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png

real 0m33.651s
user 0m0.001s
sys 0m0.004s
root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png
Mon Jan 26 07:01:03 PST 2015
ls: cannot access /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png: Input/output error

real 1m40.241s
user 0m0.000s
sys 0m0.003s
root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png
Mon Jan 26 07:02:51 PST 2015
ls: cannot access /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png: Input/output error

real 0m12.834s
user 0m0.000s
sys 0m0.003s
root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png
Mon Jan 26 07:03:10 PST 2015
-rwx---r-- 1 mhmadmin mhmadmin 61K Jan 22 14:37 /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png

real 2m10.150s
user 0m0.000s
sys 0m0.005s


Sometimes the command succeeds but takes a really long time for such a simple
operation (this is a 61K file); sometimes I get the Input/output error. The
important thing to mention is that this happens almost all the time. I can
quickly reproduce it if asked.
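
The manual test above can be repeated in a loop to record how often the call fails and how long it takes. The sketch below times a single lstat() call, which is roughly the metadata lookup that ls -l needs, and reports the errno name on failure. The function name timed_lstat is hypothetical; the demonstration runs against the current directory so that it works anywhere.

```python
import errno
import os
import time


def timed_lstat(path: str):
    """Return (elapsed_seconds, errno_name_or_None) for one lstat call."""
    start = time.monotonic()
    try:
        os.lstat(path)
        error = None
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EIO or ENOENT.
        error = errno.errorcode.get(exc.errno, str(exc.errno))
    return time.monotonic() - start, error


# Harmless demonstration on the current directory.
for _ in range(3):
    elapsed, error = timed_lstat(".")
    print(f"{elapsed:.6f}s error={error}")
```

Pointed at the affected file on the client mount, an Input/output error would be expected to show up as EIO.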


This is a 2-node Gluster setup. Both VMs act as client and server (sorry if
I'm not using the correct Gluster terminology; I've only been working with it
for a few weeks).


More info:

# gluster --version
glusterfs 3.5.3 built on Nov 18 2014 03:53:25
Repository revision: git://git.gluster.com/glusterfs.git

# df -Th
Filesystem                        Type            Size  Used Avail Use% Mounted on
/dev/mapper/data_vg-data_lv       ext4           1007G  506G  451G  53% /export/images1-1
images1.mydomain.com:/site-images fuse.glusterfs 1007G  506G  451G  53% /var/www/site-images

# uname -a
Linux web3 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# gluster volume info

Volume Name: site-images
Type: Replicate
Volume ID: 68bca3c9-210c-45a9-b2bc-6a0e2ee630bb
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: images1.mydomain.com:/export/images1-1/brick
Brick2: images2.mydomain.com:/export/images2-1/brick
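
The brick paths listed above are the directories where the trusted.* extended attributes live. A minimal sketch for pulling them out of the gluster volume info text follows, assuming the "BrickN: host:/path" line format shown in this message; the parse_bricks helper is hypothetical.

```python
import re

# Matches lines such as "Brick1: images1.mydomain.com:/export/images1-1/brick".
BRICK_RE = re.compile(r"^Brick\d+:\s*(?P<host>[^:\s]+):(?P<path>/\S*)$")


def parse_bricks(volume_info: str):
    """Return a list of (host, brick_path) tuples in volume order."""
    bricks = []
    for line in volume_info.splitlines():
        match = BRICK_RE.match(line.strip())
        if match:
            bricks.append((match.group("host"), match.group("path")))
    return bricks


VOLUME_INFO = """\
Volume Name: site-images
Type: Replicate
Number of Bricks: 1 x 2 = 2
Bricks:
Brick1: images1.mydomain.com:/export/images1-1/brick
Brick2: images2.mydomain.com:/export/images2-1/brick
"""

for host, path in parse_bricks(VOLUME_INFO):
    print(host, path)
```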



Would anyone help me identify what is going on here?


Thanks in advance!

-- 
*Tiago Santos*
MustHaveMenus.com