Re: [Gluster-devel] XFS kernel panic bug?

2014-06-12 Thread Justin Clift
On 12/06/2014, at 6:58 AM, Niels de Vos wrote:
snip
 If you capture a vmcore (needs kdump installed and configured), we may 
 be able to see the cause more clearly.


That does help, and Harsha's suggestion will probably help too. :)

I'll look into it properly later on today.

For the moment, I've rebooted the other slaves, which seems to put them into
an OK state for a few runs.

Also just started some rackspace-regression runs on them, using the ones
queued up in the normal regression queue.

The results are being updated live into Gerrit now (+1/-1/MERGE CONFLICT).

So, if you see any regression runs pass on the slaves, it's worth removing
the corresponding job from the main regression queue.  That'll help keep
the queue shorter for today at least. :)

Btw - Happy vacation Niels :)

/me goes to bed

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] XFS kernel panic bug?

2014-06-12 Thread Niels de Vos
On Thu, Jun 12, 2014 at 07:26:25AM +0100, Justin Clift wrote:
 On 12/06/2014, at 6:58 AM, Niels de Vos wrote:
 snip
  If you capture a vmcore (needs kdump installed and configured), we may 
  be able to see the cause more clearly.

Oh, these seem to be Xen hosts. I don't think kdump (mainly kexec) works 
on Xen. You would need to run xen-dump (or something like that) on the 
Dom0. For that, you'll have to call Rackspace support, and I have no 
idea how they handle such requests...
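
For what it's worth, on a non-Xen builder the usual EL6-era kdump setup is
roughly the following (a minimal sketch, assuming a CentOS 6 guest booted via
GRUB; package names, paths and the crashkernel size may differ elsewhere):

    # Install the kexec/kdump tooling
    yum install -y kexec-tools
    # Reserve memory for the crash kernel: append "crashkernel=128M"
    # (or crashkernel=auto) to the kernel line in /boot/grub/grub.conf
    # Enable the kdump service and reboot so the reservation takes effect
    chkconfig kdump on
    reboot
    # After a panic the vmcore lands under /var/crash/<timestamp>/ and can be
    # opened with the crash utility (needs kernel-debuginfo installed)
    crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/*/vmcore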

 That does help, and Harsha's suggestion will probably help too. :)

That is indeed a solution that can mostly prevent such memory deadlocks. 
Those options can be tuned to push the outstanding data out earlier to the 
loop devices, and on to the underlying XFS filesystem that holds the 
backing files for the loop devices.
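
Harsha's exact settings aren't quoted in this thread, but the kind of tuning
described here usually amounts to lowering the kernel's dirty-writeback
thresholds, roughly along these lines (illustrative values only, not a
recommendation):

    # Start background writeback earlier and cap the amount of dirty data
    sysctl -w vm.dirty_background_ratio=2
    sysctl -w vm.dirty_ratio=10
    # Expire and flush dirty pages more aggressively (values are centiseconds)
    sysctl -w vm.dirty_expire_centisecs=500
    sysctl -w vm.dirty_writeback_centisecs=100
    # Add the same keys to /etc/sysctl.conf to persist them across reboots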

Cheers,
Niels

 I'll look into it properly later on today.
 
 For the moment, I've rebooted the other slaves, which seems to put them into
 an OK state for a few runs.
 
 Also just started some rackspace-regression runs on them, using the ones
 queued up in the normal regression queue.
 
 The results are being updated live into Gerrit now (+1/-1/MERGE CONFLICT).
 
 So, if you see any regression runs pass on the slaves, it's worth removing
 the corresponding job from the main regression queue.  That'll help keep
 the queue shorter for today at least. :)
 
 Btw - Happy vacation Niels :)
 
 /me goes to bed
 
 + Justin
 
 --
 GlusterFS - http://www.gluster.org
 
 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.
 
 My personal twitter: twitter.com/realjustinclift
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Please use http://build.gluster.org/job/rackspace-regression/

2014-06-12 Thread Pranith Kumar Karampuri

Hi guys,
 Rackspace slaves are in action now, thanks to Justin. Please use 
the URL in the Subject to run the regressions. I have already shifted 
some jobs to Rackspace.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Request for merging patch

2014-06-12 Thread Vijay Bellur

On 06/12/2014 01:35 PM, Pranith Kumar Karampuri wrote:

Vijay,
Could you merge this patch, please?

http://review.gluster.org/7928



Done, thanks.

-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t

2014-06-12 Thread Sachin Pandit
http://review.gluster.org/#/c/8041/ is merged upstream.

~ Sachin.
- Original Message -
From: Sachin Pandit span...@redhat.com
To: Raghavendra Talur rta...@redhat.com
Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 12:58:44 PM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Patch link http://review.gluster.org/#/c/8041/.

~ Sachin.

- Original Message -
From: Raghavendra Talur rta...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Sachin Pandit span...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 10:46:14 AM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Sachin and I looked at the failure.

Our current guess is that glusterd_2 had not yet completed the handshake with
glusterd_1 and hence did not know about the option that was set.

KP suggested that instead of having a sleep before this command,
we could check the peer count and verify that it is 1 before getting the
volume info. Although even this does not make the test fully deterministic,
it brings us closer to it. Sachin will send out a patch for the same.
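
In test terms, the change would presumably look something like replacing the
sleep with the framework's peer_count check before querying the second
glusterd (a rough sketch; the actual patch at
http://review.gluster.org/#/c/8041/ may differ):

    #Bring back the 2nd glusterd
    TEST $glusterd_2

    #Wait for the handshake with glusterd_1 to complete instead of sleeping
    EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count

    #Verify whether the value has been synced
    EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
    EXPECT '80' get_value 'cluster.server-quorum-ratio' 2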

Raghavendra Talur 

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Sachin Pandit span...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 9:54:03 AM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Check the logs to find the reason.

Pranith.
On 06/12/2014 09:24 AM, Sachin Pandit wrote:
 I am not hitting this even after running the test case in a loop.
 I'll update in this thread once I find out the root cause of the failure.

 ~ Sachin

 - Original Message -
 From: Sachin Pandit span...@redhat.com
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org
 Sent: Thursday, June 12, 2014 8:50:40 AM
 Subject: Re: [Gluster-devel] spurious regression failure in   
 tests/bugs/bug-1104642.t

 I will look into this.

 - Original Message -
 From: Pranith Kumar Karampuri pkara...@redhat.com
 To: Gluster Devel gluster-devel@gluster.org
 Cc: rta...@redhat.com, span...@redhat.com
 Sent: Wednesday, June 11, 2014 9:08:44 PM
 Subject: spurious regression failure in tests/bugs/bug-1104642.t

 Raghavendra/Sachin,
 Could one of you guys take a look at this please.

 pk1@localhost - ~/workspace/gerrit-repo (master)
 21:04:46 :) ⚡ ~/.scripts/regression.py
 http://build.gluster.org/job/regression/4831/consoleFull
 Patch == http://review.gluster.com/#/c/7994/2
 Author == Raghavendra Talur rta...@redhat.com
 Build triggered by == amarts
 Build-url == http://build.gluster.org/job/regression/4831/consoleFull
 Download-log-at ==
 http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz
 Test written by == Author: Sachin Pandit span...@redhat.com

 ./tests/bugs/bug-1104642.t [13]
 0 #!/bin/bash
 1
 2 . $(dirname $0)/../include.rc
 3 . $(dirname $0)/../volume.rc
 4 . $(dirname $0)/../cluster.rc
 5
 6
 7 function get_value()
 8 {
 9 local key=$1
 10 local var=CLI_$2
 11
 12 eval cli_index=\$$var
 13
 14 $cli_index volume info | grep "^$key"\
 15 | sed 's/.*: //'
 16 }
 17
 18 cleanup
 19
 20 TEST launch_cluster 2
 21
 22 TEST $CLI_1 peer probe $H2;
 23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
 24
 25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1
 26 EXPECT $V0 get_value 'Volume Name' 1
 27 EXPECT Created get_value 'Status' 1
 28
 29 TEST $CLI_1 volume start $V0
 30 EXPECT Started get_value 'Status' 1
 31
 32 #Bring down 2nd glusterd
 33 TEST kill_glusterd 2
 34
 35 #set the volume all options from the 1st glusterd
 36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80
 37
 38 #Bring back the 2nd glusterd
 39 TEST $glusterd_2
 40
 41 #Verify whether the value has been synced
 42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
 ***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2
 44
 45 cleanup;

 Pranith
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

-- 
Thanks! 
Raghavendra Talur | Red Hat Storage Developer | Bangalore | +918039245176 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t

2014-06-12 Thread Pranith Kumar Karampuri

Thanks a lot for the quick resolution, Sachin.

Pranith
On 06/12/2014 04:38 PM, Sachin Pandit wrote:

http://review.gluster.org/#/c/8041/ is merged upstream.

~ Sachin.
- Original Message -
From: Sachin Pandit span...@redhat.com
To: Raghavendra Talur rta...@redhat.com
Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 12:58:44 PM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Patch link http://review.gluster.org/#/c/8041/.

~ Sachin.

- Original Message -
From: Raghavendra Talur rta...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Sachin Pandit span...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 10:46:14 AM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Sachin and I looked at the failure.

Our current guess is that glusterd_2 had not yet completed the handshake with
glusterd_1 and hence did not know about the option that was set.

KP suggested that instead of having a sleep before this command,
we could check the peer count and verify that it is 1 before getting the
volume info. Although even this does not make the test fully deterministic,
it brings us closer to it. Sachin will send out a patch for the same.

Raghavendra Talur

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Sachin Pandit span...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 9:54:03 AM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

Check the logs to find the reason.

Pranith.
On 06/12/2014 09:24 AM, Sachin Pandit wrote:

I am not hitting this even after running the test case in a loop.
I'll update in this thread once I find out the root cause of the failure.

~ Sachin

- Original Message -
From: Sachin Pandit span...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, June 12, 2014 8:50:40 AM
Subject: Re: [Gluster-devel] spurious regression failure in 
tests/bugs/bug-1104642.t

I will look into this.

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Cc: rta...@redhat.com, span...@redhat.com
Sent: Wednesday, June 11, 2014 9:08:44 PM
Subject: spurious regression failure in tests/bugs/bug-1104642.t

Raghavendra/Sachin,
Could one of you guys take a look at this please.

pk1@localhost - ~/workspace/gerrit-repo (master)
21:04:46 :) ⚡ ~/.scripts/regression.py
http://build.gluster.org/job/regression/4831/consoleFull
Patch == http://review.gluster.com/#/c/7994/2
Author == Raghavendra Talur rta...@redhat.com
Build triggered by == amarts
Build-url == http://build.gluster.org/job/regression/4831/consoleFull
Download-log-at ==
http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz
Test written by == Author: Sachin Pandit span...@redhat.com

./tests/bugs/bug-1104642.t [13]
0 #!/bin/bash
1
2 . $(dirname $0)/../include.rc
3 . $(dirname $0)/../volume.rc
4 . $(dirname $0)/../cluster.rc
5
6
7 function get_value()
8 {
9 local key=$1
10 local var=CLI_$2
11
12 eval cli_index=\$$var
13
14 $cli_index volume info | grep "^$key"\
15 | sed 's/.*: //'
16 }
17
18 cleanup
19
20 TEST launch_cluster 2
21
22 TEST $CLI_1 peer probe $H2;
23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
24
25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1
26 EXPECT $V0 get_value 'Volume Name' 1
27 EXPECT Created get_value 'Status' 1
28
29 TEST $CLI_1 volume start $V0
30 EXPECT Started get_value 'Status' 1
31
32 #Bring down 2nd glusterd
33 TEST kill_glusterd 2
34
35 #set the volume all options from the 1st glusterd
36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80
37
38 #Bring back the 2nd glusterd
39 TEST $glusterd_2
40
41 #Verify whether the value has been synced
42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2
44
45 cleanup;

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t

2014-06-12 Thread Pranith Kumar Karampuri

Thanks for reporting. Will take a look.

Pranith

On 06/12/2014 05:52 PM, Raghavendra Talur wrote:

Hi Pranith,

This test failed for my patch set today and seems to be a spurious 
failure.

Here is the console output for the run.
http://build.gluster.org/job/rackspace-regression/107/consoleFull

Could you please have a look at it?

--
Thanks!
Raghavendra Talur | Red Hat Storage Developer | Bangalore |+918039245176



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5

2014-06-12 Thread Ravishankar N

Hi Vijay,

Since glusterfs 3.5, posix_lookup() sends ESTALE instead of ENOENT [1] 
when a parent gfid (entry) is not present on the brick. In a 
replicate setup, this causes a problem because AFR gives higher priority 
to ESTALE than to ENOENT, causing I/O to fail [2]. The fix is in progress 
at [3]; it is client-side specific and would make it into 3.5.2.


But we will still hit the problem when a rolling upgrade is performed from 
3.4 to 3.5, unless the clients are also upgraded to 3.5. To elaborate 
with an example:


0) Create a 1x2 volume using 2 nodes and mount it from a client. All 
machines are on glusterfs 3.4.
1) Run: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i; done
2) While this is going on, kill one of the nodes in the replica pair and 
upgrade it to glusterfs 3.5 (simulating a rolling upgrade).

3) After a while, kill all the tar processes.
4) Create a backup directory and move all of the 1..30 dirs inside 'backup'.
5) Start the untar processes from 1) again.
6) Bring up the upgraded node. Tar fails with ESTALE errors.
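
The client-side part of the reproducer consolidates into roughly the
following (a sketch only; /mnt/glusterfs and the tarball path are
placeholders, and killing/upgrading the server node still happens by hand):

    # On the 3.4 client, with the 1x2 volume mounted at /mnt/glusterfs
    cd /mnt/glusterfs
    for i in {1..30}; do mkdir $i; tar xf /tmp/glusterfs-3.5git.tar.gz -C $i; done &
    # ...kill one node of the replica pair and upgrade it to 3.5 while this runs...
    killall tar
    mkdir backup
    mv {1..30} backup/
    for i in {1..30}; do mkdir $i; tar xf /tmp/glusterfs-3.5git.tar.gz -C $i; done &
    # Bring the upgraded node back up: the second tar run fails with ESTALE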

Essentially, the errors occur because [3] is a client-side fix, but 
rolling upgrades target the servers while the older clients still 
need to access them without issues.


A solution is to have a fix in the posix translator wherein the newer 
client passes its version (3.5) to posix_lookup(), which then sends 
ESTALE if the client version is 3.5 or newer but sends ENOENT instead 
for an older client. Does this seem okay?


[1] http://review.gluster.org/6318
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1106408
[3] http://review.gluster.org/#/c/8015/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Please use http://build.gluster.org/job/rackspace-regression/

2014-06-12 Thread Justin Clift
On 12/06/2014, at 10:22 AM, Pranith Kumar Karampuri wrote:
 Hi guys,
 Rackspace slaves are in action now, thanks to Justin. Please use the URL 
 in the Subject to run the regressions. I have already shifted some jobs to Rackspace.


Good thinking, but please hold off on this for now.

The slaves are hugely unreliable (lots of hanging) at the
moment. :(

Rebooting each slave after each run seems to help, but that's
not a real solution.

I'll be adjusting and tweaking their settings throughout the day
in order to improve things.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfs split-brain problem

2014-06-12 Thread Krishnan Parthasarathi
Hi,
Pranith, who is the AFR maintainer, would be the best person to answer this 
question. CC'ing Pranith and gluster-devel.

Krish

- Original Message -
 Hi Krishnan Parthasarathi,
 
 Could you tell me which glusterfs version has significant improvements for the glusterfs
 split-brain problem?
 Can you point me to the relevant links?
 
 Thank you very much!
 
 
 
 
 justgluste...@gmail.com
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfs split-brain problem

2014-06-12 Thread Pranith Kumar Karampuri

Hi,
Could you let us know exactly what problem you are running into?
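
For reference, a quick way to check whether files are actually in
split-brain on a replica volume is the heal-info CLI (VOLNAME below is a
placeholder for the affected volume):

    # List entries that still need healing
    gluster volume heal VOLNAME info
    # List only the entries AFR considers to be in split-brain
    gluster volume heal VOLNAME info split-brain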

Pranith
On 06/13/2014 09:27 AM, Krishnan Parthasarathi wrote:

Hi,
Pranith, who is the AFR maintainer, would be the best person to answer this
question. CC'ing Pranith and gluster-devel.

Krish

- Original Message -

Hi Krishnan Parthasarathi,

Could you tell me which glusterfs version has significant improvements for the glusterfs
split-brain problem?
Can you point me to the relevant links?

Thank you very much!




justgluste...@gmail.com



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel