Re: [Gluster-devel] spurious regression failures again! [bug-1112559.t]

2014-07-22 Thread Anders Blomdell
, snapshots are created and peer probe is NOT 
 done simultaneously.
 
 Will continue on the investigation and will keep you posted.
 
 
 Regards,
 Joe
 
 
 
 
 - Original Message -
 From: Joseph Fernandes josfe...@redhat.com
 To: Avra Sengupta aseng...@redhat.com
 Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel 
 gluster-devel@gluster.org, Varun Shastry vshas...@redhat.com, Justin 
 Clift jus...@gluster.org
 Sent: Thursday, July 17, 2014 10:58:14 AM
 Subject: Re: [Gluster-devel] spurious regression failures again!
 
 Hi Avra,
 
 Just clarifying things here,
 1) When testing with the setup provided by Justin, I found that the only place 
 where bug-1112559.t failed was after the failure of mgmt_v3-locks.t in the 
 previous regression run. The mail attached to my previous mail was just an 
 OBSERVATION and NOT an INFERENCE that the failure of mgmt_v3-locks.t was the 
 root cause of bug-1112559.t. I am NOT jumping the gun and making any 
 statement/conclusion here. It's just an OBSERVATION. And thanks for the 
 clarification on why mgmt_v3-locks.t is failing.
 
 2) I agree with you that the cleanup script needs to kill all gluster* 
 processes. And it's also true that the port range used by gluster for bricks 
 is unique. But bug-1112559.t fails only because a port is unavailable to 
 start the snap brick. This suggests that there is some process (gluster or 
 non-gluster) still using the port.
 
 3) And finally, it is not true that bug-1112559.t always fails on its own: 
 looking into the links you have provided, there are cases where other test 
 cases failed earlier in the run, on the same testing machine (slave26). By 
 this I am not saying that those failures are the root cause of the 
 bug-1112559.t failure; as stated earlier, it is a notable OBSERVATION 
 (keeping in mind point 2 about ports and cleanup).
 
 I have run nearly 30 runs on slave30 and bug-1112559.t failed only once (as 
 stated in point 1). I am continuing to run more. The only problem is that the 
 bug-1112559.t failure is spurious and there is no deterministic way of 
 reproducing it.
 
 Will keep all posted about the results.
 
 Regards,
 Joe
 
 
 
 - Original Message -
 From: Avra Sengupta aseng...@redhat.com
 To: Joseph Fernandes josfe...@redhat.com, Pranith Kumar Karampuri 
 pkara...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
 vshas...@redhat.com, Justin Clift jus...@gluster.org
 Sent: Wednesday, July 16, 2014 1:03:21 PM
 Subject: Re: [Gluster-devel] spurious regression failures again!
 
 Joseph,
 
 I am not sure I understand how this is affecting the spurious failure of 
 bug-1112559.t. As per the mail you have attached, and according to your 
 analysis, bug-1112559.t fails because a cleanup hasn't happened properly 
 after a previous test case failed, and in your case there was a crash as 
 well.
 
 Now, out of all the times bug-1112559.t has failed, most of the time it's 
 the only test case failing and there isn't any crash. Below are the 
 regression runs that Pranith had sent for the same.
 
 http://build.gluster.org/job/rackspace-regression-2GB/541/consoleFull
 
 http://build.gluster.org/job/rackspace-regression-2GB-triggered/173/consoleFull
 
 http://build.gluster.org/job/rackspace-regression-2GB-triggered/172/consoleFull
 
 http://build.gluster.org/job/rackspace-regression-2GB/543/console
 
 In all of the above bug-1112559.t is the only test case that fails and 
 there is no crash.
 
 So what I fail to understand here is: if this particular test case fails on 
 its own as well as alongside other test cases, how can we conclude that some 
 other failing test case is not doing a cleanup properly and that that is the 
 reason for bug-1112559.t failing?
 
 mgmt_v3-locks.t fails because glusterd takes more time to register a node 
 going down, and hence peer status doesn't return what the test case expects 
 it to. It's a race. The test case ends with a cleanup routine, like every 
 other test case, that kills all gluster and glusterfsd processes, which are 
 what might be using any brick ports. So could you please explain how, or 
 which, process still uses the brick ports that the snap bricks are trying to 
 use, leading to the failure of bug-1112559.t.
 
 Regards,
 Avra
 
 On 07/15/2014 09:57 PM, Joseph Fernandes wrote:
 Just pointing out,

 2) tests/basic/mgmt_v3-locks.t - Author: Avra
 http://build.gluster.org/job/rackspace-regression-2GB-triggered/375/consoleFull

 This is a similar kind of error to the one I saw in my testing of the 
 spurious failure of tests/bugs/bug-1112559.t.

 Please refer to the attached mail.

 Regards,
 Joe



 - Original Message -
 From: Pranith Kumar Karampuri pkara...@redhat.com
 To: Joseph Fernandes josfe...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
 vshas...@redhat.com
 Sent: Tuesday, July 15, 2014 9:34:26 PM
 Subject: Re: [Gluster-devel] spurious regression failures again!


 On 07/15/2014 09

Re: [Gluster-devel] spurious regression failures again! [bug-1112559.t]

2014-07-22 Thread Joe Julian

On 07/22/2014 07:19 AM, Anders Blomdell wrote:

Could this be the time to propose that gluster understand port reservation à la 
systemd (LISTEN_FDS), and to make the test harness ensure that random ports do 
not collide with the set of expected ports? That would also be beneficial when 
starting from systemd.

Wouldn't that only work for Fedora and RHEL7?


Re: [Gluster-devel] spurious regression failures again! [bug-1112559.t]

2014-07-22 Thread Anders Blomdell
On 2014-07-22 16:44, Justin Clift wrote:
 On 22/07/2014, at 3:28 PM, Joe Julian wrote:
 On 07/22/2014 07:19 AM, Anders Blomdell wrote:
 Could this be the time to propose that gluster understand port reservation 
 à la systemd (LISTEN_FDS), and to make the test harness ensure that random 
 ports do not collide with the set of expected ports? That would also be 
 beneficial when starting from systemd.
 Wouldn't that only work for Fedora and RHEL7?
 
 Probably depends how it's done.  Maybe make it a conditional
 thing that's compiled in or not, depending on the platform?
Don't think so; the LISTEN_FDS protocol is dead simple: if LISTEN_FDS is set 
in the environment, fd#3 to fd#(3+LISTEN_FDS-1) are sockets opened by the 
calling process, their purpose has to be deduced via getsockname(), and the 
process should not open those sockets itself. If LISTEN_FDS is not set, 
proceed to open sockets just like before.

The good thing about this is that systemd can reserve the ports used very 
early during boot, and no other process can steal them away. For testing 
purposes, this could be used to ensure that all ports are available before 
starting tests (if random port stealing is the true problem here, that is 
still an unverified shot in the dark).
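
As a minimal sketch of the protocol just described (my own illustration, not 
glusterd code; get_listen_fd() and the port argument are hypothetical), a 
daemon could consume sockets passed this way roughly like:

    /* Illustration of consuming systemd-style socket activation (LISTEN_FDS).
     * Not glusterd code; get_listen_fd() and the port argument are hypothetical. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    #define SD_LISTEN_FDS_START 3   /* first passed fd, per the systemd convention */

    /* Return a listening fd for 'port': either one inherited from the caller
     * (systemd or a test harness) or one opened by the process itself. */
    static int get_listen_fd(uint16_t port)
    {
        const char *pid_s = getenv("LISTEN_PID");
        const char *fds_s = getenv("LISTEN_FDS");

        if (pid_s && fds_s && (pid_t)atoi(pid_s) == getpid()) {
            int nfds = atoi(fds_s);
            for (int fd = SD_LISTEN_FDS_START; fd < SD_LISTEN_FDS_START + nfds; fd++) {
                struct sockaddr_in sa;
                socklen_t len = sizeof(sa);
                /* The passed fds carry no labels: deduce which one is ours
                 * via getsockname(). */
                if (getsockname(fd, (struct sockaddr *)&sa, &len) == 0 &&
                    sa.sin_family == AF_INET && ntohs(sa.sin_port) == port)
                    return fd;      /* use the inherited socket; do not re-bind */
            }
            return -1;              /* expected port not among the passed fds */
        }

        /* LISTEN_FDS not set: open the socket ourselves, just like before. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        struct sockaddr_in sa = { .sin_family = AF_INET,
                                  .sin_addr.s_addr = htonl(INADDR_ANY),
                                  .sin_port = htons(port) };
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 || listen(fd, 128) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

The useful property for the regression tests is that the harness itself could 
open and hold the expected brick ports before launching the daemons, so 
nothing else can grab them in between.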

 
 Unless there's a better, cross-platform approach, of course. :)
 
 Regards and best wishes,
 
 Justin Clift
 
/Anders


-- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden



Re: [Gluster-devel] spurious regression failures again!

2014-07-18 Thread Varun Shastry

Hi,

Created a bug for the same. Please use it when submitting, if required.
https://bugzilla.redhat.com/show_bug.cgi?id=1121014

Thanks
Varun Shastry


On Tuesday 15 July 2014 09:34 PM, Pranith Kumar Karampuri wrote:


On 07/15/2014 09:24 PM, Joseph Fernandes wrote:

Hi Pranith,

Could you please share the links to the console output of the failures.

Added them inline. Thanks for reminding :-)

Pranith


Regards,
Joe

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
vshas...@redhat.com

Sent: Tuesday, July 15, 2014 8:52:44 PM
Subject: [Gluster-devel] spurious regression failures again!

hi,
  We have 4 tests failing once in a while causing problems:
1) tests/bugs/bug-1087198.t - Author: Varun
http://build.gluster.org/job/rackspace-regression-2GB-triggered/379/consoleFull 


2) tests/basic/mgmt_v3-locks.t - Author: Avra
http://build.gluster.org/job/rackspace-regression-2GB-triggered/375/consoleFull 


3) tests/basic/fops-sanity.t - Author: Pranith
http://build.gluster.org/job/rackspace-regression-2GB-triggered/383/consoleFull 



Please take a look at them and post updates.

Pranith






Re: [Gluster-devel] spurious regression failures again!

2014-07-16 Thread Joseph Fernandes
Hi Avra,

Just clarifying things here,
1) When testing with the setup provided by Justin, I found that the only place 
where bug-1112559.t failed was after the failure of mgmt_v3-locks.t in the 
previous regression run. The mail attached to my previous mail was just an 
OBSERVATION and NOT an INFERENCE that the failure of mgmt_v3-locks.t was the 
root cause of bug-1112559.t. I am NOT jumping the gun and making any 
statement/conclusion here. It's just an OBSERVATION. And thanks for the 
clarification on why mgmt_v3-locks.t is failing.

2) I agree with you that the cleanup script needs to kill all gluster* 
processes. And it's also true that the port range used by gluster for bricks 
is unique. But bug-1112559.t fails only because a port is unavailable to 
start the snap brick. This suggests that there is some process (gluster or 
non-gluster) still using the port (a quick way to verify this is sketched 
after point 3 below).

3) And finally, it is not true that bug-1112559.t always fails on its own: 
looking into the links you have provided, there are cases where other test 
cases failed earlier in the run, on the same testing machine (slave26). By 
this I am not saying that those failures are the root cause of the 
bug-1112559.t failure; as stated earlier, it is a notable OBSERVATION 
(keeping in mind point 2 about ports and cleanup).
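
To verify the hypothesis in point 2, the harness could probe the suspect port 
just before the snap brick is started. A minimal illustrative sketch (my own, 
not part of the test suite; the port number is only an example):

    /* Illustrative diagnostic only: try to bind the port a snap brick wants,
     * to see whether some other process is still holding it. */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Returns 0 if the port can be bound (free), -1 if it is in use or on error. */
    static int probe_port(uint16_t port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in sa = { .sin_family = AF_INET,
                                  .sin_addr.s_addr = htonl(INADDR_ANY),
                                  .sin_port = htons(port) };
        int rc = bind(fd, (struct sockaddr *)&sa, sizeof(sa));
        if (rc < 0)
            fprintf(stderr, "port %u busy: %s\n", (unsigned)port, strerror(errno));
        close(fd);
        return rc < 0 ? -1 : 0;
    }

    int main(void)
    {
        /* 49152 is only an example; use whatever port the snap brick was assigned. */
        return probe_port(49152) == 0 ? 0 : 1;
    }

Running this (or simply capturing netstat/lsof output) at the moment of 
failure would show whether the port is actually held, and the latter would 
also show by which process.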

I have run nearly 30 runs on slave30 and bug-1112559.t failed only once (as 
stated in point 1). I am continuing to run more. The only problem is that the 
bug-1112559.t failure is spurious and there is no deterministic way of 
reproducing it.

Will keep all posted about the results.

Regards,
Joe



- Original Message -
From: Avra Sengupta aseng...@redhat.com
To: Joseph Fernandes josfe...@redhat.com, Pranith Kumar Karampuri 
pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
vshas...@redhat.com, Justin Clift jus...@gluster.org
Sent: Wednesday, July 16, 2014 1:03:21 PM
Subject: Re: [Gluster-devel] spurious regression failures again!

Joseph,

I am not sure I understand how this is affecting the spurious failure of 
bug-1112559.t. As per the mail you have attached, and according to your 
analysis, bug-1112559.t fails because a cleanup hasn't happened properly 
after a previous test case failed, and in your case there was a crash as 
well.

Now, out of all the times bug-1112559.t has failed, most of the time it's 
the only test case failing and there isn't any crash. Below are the 
regression runs that Pranith had sent for the same.

http://build.gluster.org/job/rackspace-regression-2GB/541/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB-triggered/173/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB-triggered/172/consoleFull

http://build.gluster.org/job/rackspace-regression-2GB/543/console

In all of the above bug-1112559.t is the only test case that fails and 
there is no crash.

So what I fail to understand here is: if this particular test case fails on 
its own as well as alongside other test cases, how can we conclude that some 
other failing test case is not doing a cleanup properly and that that is the 
reason for bug-1112559.t failing?

mgmt_v3-locks.t fails because glusterd takes more time to register a node 
going down, and hence peer status doesn't return what the test case expects 
it to. It's a race. The test case ends with a cleanup routine, like every 
other test case, that kills all gluster and glusterfsd processes, which are 
what might be using any brick ports. So could you please explain how, or 
which, process still uses the brick ports that the snap bricks are trying to 
use, leading to the failure of bug-1112559.t.
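
For the race itself, the usual remedy is to poll for the expected peer status 
with a timeout rather than assert immediately after the node goes down. A 
generic sketch of the idea (not the test framework's actual code; check() is 
a hypothetical callback):

    /* Generic poll-with-timeout helper: retry a condition until it holds or the
     * deadline passes. Illustration only, not the gluster test framework's code. */
    #include <stdbool.h>
    #include <time.h>
    #include <unistd.h>

    /* check() is a hypothetical callback that returns true once, for example,
     * peer status reports what the test expects. */
    static bool wait_for(bool (*check)(void), int timeout_sec)
    {
        time_t deadline = time(NULL) + timeout_sec;
        while (time(NULL) < deadline) {
            if (check())
                return true;   /* expected state reached */
            sleep(1);          /* give glusterd time to register the change */
        }
        return check();        /* one final attempt at the deadline */
    }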

Regards,
Avra

On 07/15/2014 09:57 PM, Joseph Fernandes wrote:
 Just pointing out,

 2) tests/basic/mgmt_v3-locks.t - Author: Avra
 http://build.gluster.org/job/rackspace-regression-2GB-triggered/375/consoleFull

 This is a similar kind of error to the one I saw in my testing of the 
 spurious failure of tests/bugs/bug-1112559.t.

 Please refer to the attached mail.

 Regards,
 Joe



 - Original Message -
 From: Pranith Kumar Karampuri pkara...@redhat.com
 To: Joseph Fernandes josfe...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
 vshas...@redhat.com
 Sent: Tuesday, July 15, 2014 9:34:26 PM
 Subject: Re: [Gluster-devel] spurious regression failures again!


 On 07/15/2014 09:24 PM, Joseph Fernandes wrote:
 Hi Pranith,

 Could you please share the links to the console output of the failures.
 Added them inline. Thanks for reminding :-)

 Pranith
 Regards,
 Joe

 - Original Message -
 From: Pranith Kumar Karampuri pkara...@redhat.com
 To: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
 vshas...@redhat.com
 Sent: Tuesday, July 15, 2014 8:52:44 PM
 Subject: [Gluster-devel] spurious regression failures again!

 hi,
We have 4 tests failing once in a while causing problems:
 1) tests/bugs/bug-1087198.t - Author: Varun
 http://build.gluster.org/job/rackspace-regression

[Gluster-devel] spurious regression failures again!

2014-07-15 Thread Pranith Kumar Karampuri

hi,
We have 4 tests failing once in a while causing problems:
1) tests/bugs/bug-1087198.t - Author: Varun
2) tests/basic/mgmt_v3-locks.t - Author: Avra
3) tests/basic/fops-sanity.t - Author: Pranith

Please take a look at them and post updates.

Pranith


Re: [Gluster-devel] spurious regression failures again!

2014-07-15 Thread Joseph Fernandes
Hi Pranith,

Could you please share the links to the console output of the failures.

Regards,
Joe

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org, Varun Shastry 
vshas...@redhat.com
Sent: Tuesday, July 15, 2014 8:52:44 PM
Subject: [Gluster-devel] spurious regression failures again!

hi,
 We have 4 tests failing once in a while causing problems:
1) tests/bugs/bug-1087198.t - Author: Varun
2) tests/basic/mgmt_v3-locks.t - Author: Avra
3) tests/basic/fops-sanity.t - Author: Pranith

Please take a look at them and post updates.

Pranith