[Gluster-devel] Spurious failures

2015-09-22 Thread Krutika Dhananjay
Hi, 

The following tests seem to be failing consistently on the build machines in 
Linux: 

./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. 
./tests/geo-rep/georep-basic-dr-rsync.t .. 
./tests/geo-rep/georep-basic-dr-tarssh.t .. 

I have added these tests into the tracker etherpad. 

Meanwhile could someone from geo-rep and glusterd team take a look or perhaps 
move them to bad tests list? 


Here is one place where the three tests failed: 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
 

-Krutika 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-22 Thread Atin Mukherjee
Krutika,

./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
already a part of bad_tests () in both mainline and 3.7. Could you
provide me the link where this test has failed explicitly and that has
caused the regression to fail?

~Atin


On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> Hi,
> 
> The following tests seem to be failing consistently on the build
> machines in Linux:
> 
> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> 
> ./tests/geo-rep/georep-basic-dr-rsync.t ..
> 
> ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> 
> I have added these tests into the tracker etherpad.
> 
> Meanwhile could someone from geo-rep and glusterd team take a look or
> perhaps move them to bad tests list?
> 
> 
> Here is one place where the three tests failed:
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> 
> -Krutika
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures

2015-09-22 Thread Krutika Dhananjay
https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
 

Ctrl + f 'not ok'. 
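Or, from a shell, the same search against the plain-text console log (consoleText is the standard Jenkins plain-text view of the page linked above):

# Fetch the plain-text console log and list the failed TAP assertions.
curl -s https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleText \
    | grep -n 'not ok'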

-Krutika 

- Original Message -

> From: "Atin Mukherjee" 
> To: "Krutika Dhananjay" , "Gluster Devel"
> 
> Cc: "Gaurav Garg" , "Aravinda" ,
> "Kotresh Hiremath Ravishankar" 
> Sent: Tuesday, September 22, 2015 8:39:56 PM
> Subject: Re: Spurious failures

> Krutika,

> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
> already a part of bad_tests () in both mainline and 3.7. Could you
> provide me the link where this test has failed explicitly and that has
> caused the regression to fail?

> ~Atin

> On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> > Hi,
> >
> > The following tests seem to be failing consistently on the build
> > machines in Linux:
> >
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> >
> > ./tests/geo-rep/georep-basic-dr-rsync.t ..
> >
> > ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> >
> > I have added these tests into the tracker etherpad.
> >
> > Meanwhile could someone from geo-rep and glusterd team take a look or
> > perhaps move them to bad tests list?
> >
> >
> > Here is one place where the three tests failed:
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> >
> > -Krutika
> >
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-22 Thread Atin Mukherjee
./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat:
0 Tests: 8 Failed: 2)
  Failed tests:  6, 8
Files=1, Tests=8, 48 wallclock secs ( 0.01 usr  0.01 sys +  0.88 cusr
0.56 csys =  1.46 CPU)
Result: FAIL
./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad
status 1
*Ignoring failure from known-bad test
./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t*
[11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok
17587 ms
[11:24:16]
All tests successful
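For context, the harness ignores such failures roughly as sketched below. This is a simplified illustration, not the actual run-tests.sh code; bad_tests here stands in for the real list maintained in the framework.

# Simplified illustration of how a regression harness can ignore known-bad tests.
bad_tests ()
{
    echo "./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t"
}

is_bad_test ()
{
    local t=$1 bt
    for bt in $(bad_tests); do
        [ "$bt" = "$t" ] && return 0
    done
    return 1
}

run_one ()
{
    local t=$1
    if ! prove "$t"; then
        if is_bad_test "$t"; then
            echo "Ignoring failure from known-bad test $t"
            return 0
        fi
        return 1    # only failures outside the bad list fail the regression run
    fi
}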

On 09/22/2015 08:46 PM, Krutika Dhananjay wrote:
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> 
> Ctrl + f 'not ok'.
> 
> -Krutika
> 
> 
> 
> *From: *"Atin Mukherjee" 
> *To: *"Krutika Dhananjay" , "Gluster Devel"
> 
> *Cc: *"Gaurav Garg" , "Aravinda"
> , "Kotresh Hiremath Ravishankar"
> 
> *Sent: *Tuesday, September 22, 2015 8:39:56 PM
> *Subject: *Re: Spurious failures
> 
> Krutika,
> 
> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
> already a part of bad_tests () in both mainline and 3.7. Could you
> provide me the link where this test has failed explicitly and that has
> caused the regression to fail?
> 
> ~Atin
> 
> 
> On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> > Hi,
> >
> > The following tests seem to be failing consistently on the build
> > machines in Linux:
> >
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> >
> > ./tests/geo-rep/georep-basic-dr-rsync.t ..
> >
> > ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> >
> > I have added these tests into the tracker etherpad.
> >
> > Meanwhile could someone from geo-rep and glusterd team take a look or
> > perhaps move them to bad tests list?
> >
> >
> > Here is one place where the three tests failed:
> >
> 
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> >
> > -Krutika
> >
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures

2015-09-22 Thread Krutika Dhananjay
Ah! Sorry. I didn't read that line. :) 

Just figured that even ./tests/geo-rep/georep-basic-dr-rsync.t has been added to the bad
tests list.

So it's just ./tests/geo-rep/georep-basic-dr-tarssh.t for now.

Thanks Atin! 

-Krutika 

- Original Message -

> From: "Atin Mukherjee" 
> To: "Krutika Dhananjay" 
> Cc: "Gluster Devel" , "Gaurav Garg"
> , "Aravinda" , "Kotresh Hiremath
> Ravishankar" 
> Sent: Tuesday, September 22, 2015 8:51:22 PM
> Subject: Re: Spurious failures

> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat:
> 0 Tests: 8 Failed: 2)
> Failed tests: 6, 8
> Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr
> 0.56 csys = 1.46 CPU)
> Result: FAIL
> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad
> status 1
> *Ignoring failure from known-bad test
> ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t*
> [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok
> 17587 ms
> [11:24:16]
> All tests successful

> On 09/22/2015 08:46 PM, Krutika Dhananjay wrote:
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> >
> > Ctrl + f 'not ok'.
> >
> > -Krutika
> >
> > 
> >
> > *From: *"Atin Mukherjee" 
> > *To: *"Krutika Dhananjay" , "Gluster Devel"
> > 
> > *Cc: *"Gaurav Garg" , "Aravinda"
> > , "Kotresh Hiremath Ravishankar"
> > 
> > *Sent: *Tuesday, September 22, 2015 8:39:56 PM
> > *Subject: *Re: Spurious failures
> >
> > Krutika,
> >
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
> > already a part of bad_tests () in both mainline and 3.7. Could you
> > provide me the link where this test has failed explicitly and that has
> > caused the regression to fail?
> >
> > ~Atin
> >
> >
> > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> > > Hi,
> > >
> > > The following tests seem to be failing consistently on the build
> > > machines in Linux:
> > >
> > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> > >
> > > ./tests/geo-rep/georep-basic-dr-rsync.t ..
> > >
> > > ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> > >
> > > I have added these tests into the tracker etherpad.
> > >
> > > Meanwhile could someone from geo-rep and glusterd team take a look or
> > > perhaps move them to bad tests list?
> > >
> > >
> > > Here is one place where the three tests failed:
> > >
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> > >
> > > -Krutika
> > >
> >
> >
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Kotresh Hiremath Ravishankar
Hi Krutika,

It's failing with

++ gluster --mode=script --wignore volume geo-rep master 
slave21.cloud.gluster.org::slave create push-pem
Gluster version mismatch between master and slave.

I will look into it.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Krutika Dhananjay" 
> To: "Atin Mukherjee" 
> Cc: "Gluster Devel" , "Gaurav Garg" 
> , "Aravinda" ,
> "Kotresh Hiremath Ravishankar" 
> Sent: Tuesday, September 22, 2015 9:03:44 PM
> Subject: Re: Spurious failures
> 
> Ah! Sorry. I didn't read that line. :)
> 
> Just figured even ./tests/geo-rep/georep-basic-dr-rsync.t is added to bad
> tests list.
> 
> So it's just /tests/geo-rep/georep-basic-dr-tarssh.t for now.
> 
> Thanks Atin!
> 
> -Krutika
> 
> - Original Message -
> 
> > From: "Atin Mukherjee" 
> > To: "Krutika Dhananjay" 
> > Cc: "Gluster Devel" , "Gaurav Garg"
> > , "Aravinda" , "Kotresh Hiremath
> > Ravishankar" 
> > Sent: Tuesday, September 22, 2015 8:51:22 PM
> > Subject: Re: Spurious failures
> 
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat:
> > 0 Tests: 8 Failed: 2)
> > Failed tests: 6, 8
> > Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr
> > 0.56 csys = 1.46 CPU)
> > Result: FAIL
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad
> > status 1
> > *Ignoring failure from known-bad test
> > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t*
> > [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok
> > 17587 ms
> > [11:24:16]
> > All tests successful
> 
> > On 09/22/2015 08:46 PM, Krutika Dhananjay wrote:
> > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> > >
> > > Ctrl + f 'not ok'.
> > >
> > > -Krutika
> > >
> > > 
> > >
> > > *From: *"Atin Mukherjee" 
> > > *To: *"Krutika Dhananjay" , "Gluster Devel"
> > > 
> > > *Cc: *"Gaurav Garg" , "Aravinda"
> > > , "Kotresh Hiremath Ravishankar"
> > > 
> > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM
> > > *Subject: *Re: Spurious failures
> > >
> > > Krutika,
> > >
> > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
> > > already a part of bad_tests () in both mainline and 3.7. Could you
> > > provide me the link where this test has failed explicitly and that has
> > > caused the regression to fail?
> > >
> > > ~Atin
> > >
> > >
> > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> > > > Hi,
> > > >
> > > > The following tests seem to be failing consistently on the build
> > > > machines in Linux:
> > > >
> > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> > > >
> > > > ./tests/geo-rep/georep-basic-dr-rsync.t ..
> > > >
> > > > ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> > > >
> > > > I have added these tests into the tracker etherpad.
> > > >
> > > > Meanwhile could someone from geo-rep and glusterd team take a look or
> > > > perhaps move them to bad tests list?
> > > >
> > > >
> > > > Here is one place where the three tests failed:
> > > >
> > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> > > >
> > > > -Krutika
> > > >
> > >
> > >
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Kotresh Hiremath Ravishankar
Hi Krutika,

Looks like the prerequisites for geo-replication to work have changed on slave21.

Hi Michael,

Could you please check that the following settings are in place on all Linux regression
machines? Or provide me with the root password so that I can verify.

1. Set up passwordless SSH for the root user:

2. Add the line below to /root/.bashrc. This is required because geo-rep runs
   "gluster --version" via ssh and cannot find the gluster binaries in the PATH otherwise.
 export PATH=$PATH:/build/install/sbin:/build/install/bin
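Put together, the two steps amount to something like this on each machine (a sketch; the key path is a placeholder and may differ on the slaves):

# Step 1: passwordless SSH for root to the machine's own hostname
# (the key path below is a placeholder).
test -f /root/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@$(hostname)

# Step 2: make the locally built gluster binaries visible to non-interactive ssh.
echo 'export PATH=$PATH:/build/install/sbin:/build/install/bin' >> /root/.bashrc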

Once the above settings are done, the following script should output the proper version.

---
#!/bin/bash

function SSHM()
{
ssh -q \
-oPasswordAuthentication=no \
-oStrictHostKeyChecking=no \
-oControlMaster=yes \
"$@";
}

function cmd_slave()
{
local cmd_line;
cmd_line=$(cat <<EOF
function do_verify() {
ver=\$(gluster --version | head -1 | cut -f2 -d " ");
echo \$ver;
};
source /etc/profile && do_verify;
EOF
);
echo $cmd_line;
}

HOST=$1
cmd_line=$(cmd_slave);
ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
echo $ver
-

I could verify for slave32.
[root@slave32 ~]# vi /tmp/gver.sh
[root@slave32 ~]# /tmp/gver.sh slave32
3.8dev

Please help me in verifying the same for all the linux regression machines.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Kotresh Hiremath Ravishankar" 
> To: "Krutika Dhananjay" 
> Cc: "Atin Mukherjee" , "Gluster Devel" 
> , "Gaurav Garg"
> , "Aravinda" 
> Sent: Wednesday, September 23, 2015 12:31:12 PM
> Subject: Re: Spurious failures
> 
> Hi Krutika,
> 
> It's failing with
> 
> ++ gluster --mode=script --wignore volume geo-rep master
> slave21.cloud.gluster.org::slave create push-pem
> Gluster version mismatch between master and slave.
> 
> I will look into it.
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "Krutika Dhananjay" 
> > To: "Atin Mukherjee" 
> > Cc: "Gluster Devel" , "Gaurav Garg"
> > , "Aravinda" ,
> > "Kotresh Hiremath Ravishankar" 
> > Sent: Tuesday, September 22, 2015 9:03:44 PM
> > Subject: Re: Spurious failures
> > 
> > Ah! Sorry. I didn't read that line. :)
> > 
> > Just figured even ./tests/geo-rep/georep-basic-dr-rsync.t is added to bad
> > tests list.
> > 
> > So it's just /tests/geo-rep/georep-basic-dr-tarssh.t for now.
> > 
> > Thanks Atin!
> > 
> > -Krutika
> > 
> > - Original Message -
> > 
> > > From: "Atin Mukherjee" 
> > > To: "Krutika Dhananjay" 
> > > Cc: "Gluster Devel" , "Gaurav Garg"
> > > , "Aravinda" , "Kotresh Hiremath
> > > Ravishankar" 
> > > Sent: Tuesday, September 22, 2015 8:51:22 PM
> > > Subject: Re: Spurious failures
> > 
> > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat:
> > > 0 Tests: 8 Failed: 2)
> > > Failed tests: 6, 8
> > > Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr
> > > 0.56 csys = 1.46 CPU)
> > > Result: FAIL
> > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad
> > > status 1
> > > *Ignoring failure from known-bad test
> > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t*
> > > [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok
> > > 17587 ms
> > > [11:24:16]
> > > All tests successful
> > 
> > > On 09/22/2015 08:46 PM, Krutika Dhananjay wrote:
> > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> > > >
> > > > Ctrl + f 'not ok'.
> > > >
> > > > -Krutika
> > > >
> > > > 
> > > >
> > > > *From: *"Atin Mukherjee" 
> > > > *To: *"Krutika Dhananjay" , "Gluster Devel"
> > > > 
> > > > *Cc: *"Gaurav Garg" , "Aravinda"
> > > > , "Kotresh Hiremath Ravishankar"
> > > > 
> > > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM
> > > > *Subject: *Re: Spurious failures
> > > >
> > > > Krutika,
> > > >
> > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is
> > > > already a part of bad_tests () in both mainline and 3.7. Could you
> > > > provide me the link where this test has failed explicitly and that has
> > > > caused the regression to fail?
> > > >
> > > > ~Atin
> > > >
> > > >
> > > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote:
> > > > > Hi,
> > > > >
> > > > > The following tests seem to be failing consistently on the build
> > > > > machines in Linux:
> > > > >
> > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t ..
> > > > >
> > > > > ./tests/geo-rep/georep-basic-dr-rsync.t ..
> > > > >
> > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t ..
> > > > >
> > > > > I have added these tests into the tracker etherpad.
> > > > >
> > > > > Meanwhile could someone from geo-rep and glusterd team take a look or
> > > > > perhaps move them to bad tests list?
> > > > >
> > > > >
> > > > > Here is one place where the three tests failed:
> > > > >
> > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull
> > > > >
> > > > > -Krutika
> > > > >
> > > >
> > > >
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Michael Scherer
Le mercredi 23 septembre 2015 à 03:25 -0400, Kotresh Hiremath
Ravishankar a écrit :
> Hi Krutika,
> 
> Looks like the prerequisites for geo-replication to work is changed
> in slave21
> 
> Hi Michael,

Hi,

> Could you please check following settings are made in all linux regression 
> machines?

Yeah, I will add to salt.

> Or provide me with root password so that I can verify.

Root login using a password should be disabled, so no. If that's still
working and people use it, that's going to change soon; it causes too many
problems.

> 1. Setup Passwordless SSH for the root user:

Can you be more explicit about where the user should come from, so I can
properly integrate that?

Something is adding lots of lines to /root/.ssh/authorized_keys on
the slave, and this makes me quite uncomfortable. If that is what is doing it,
I would rather have it done cleanly, and for that I need to understand the
test and the requirement.
 
> 2. Add below line in /root/.bashrc. This is required as geo-rep does "gluster 
> --version" via ssh
>and it can't find the gluster PATH via ssh.
>  export PATH=$PATH:/build/install/sbin:/build/install/bin

I will do this one.

Is geo-rep supposed to work on other platforms like FreeBSD? (FreeBSD does not
have bash, so I would have to adapt to the local way of doing things, but if
that's not going to be tested, I would rather not spend too much time reading
the handbook for now.)

> Once above settings are done, the following script should output proper 
> version.
> 
> ---
> #!/bin/bash
> 
> function SSHM()
> {
> ssh -q \
> -oPasswordAuthentication=no \
> -oStrictHostKeyChecking=no \
> -oControlMaster=yes \
> "$@";
> }
> 
> function cmd_slave()
> {
> local cmd_line;
> cmd_line=$(cat <<EOF
> function do_verify() {
> ver=\$(gluster --version | head -1 | cut -f2 -d " ");
> echo \$ver;
> };
> source /etc/profile && do_verify;
> EOF
> );
> echo $cmd_line;
> }
> 
> HOST=$1
> cmd_line=$(cmd_slave);
> ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
> echo $ver
> -
> 
> I could verify for slave32. 
> [root@slave32 ~]# vi /tmp/gver.sh 
> [root@slave32 ~]# /tmp/gver.sh slave32
> 3.8dev
> 
> Please help me in verifying the same for all the linux regression machines.
> 

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Kotresh Hiremath Ravishankar
Hi Michael,

Please find my replies below.

>>> Root login using password should be disabled, so no. If that's still
>>> working and people use it, that's gonna change soon, too much problems
>>> with it.

  Ok

>>>Can you be more explicit on where should the user come from so I can
>>>properly integrate that ?

  It's just passwordless SSH from root to root on the same host.
  1. Generate ssh key:
#ssh-keygen
  2. Add it to /root/.ssh/authorized_keys
#ssh-copy-id -i  root@host

  Requirement by geo-replication:
'ssh root@host' should not ask for password
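A quick way to check that requirement on a machine (BatchMode makes ssh fail instead of prompting):

# Should print OK without ever asking for a password.
ssh -o BatchMode=yes -o StrictHostKeyChecking=no root@$(hostname) true \
    && echo "passwordless ssh OK" || echo "passwordless ssh NOT set up"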


>>>There is something adding lots of line to /root/.ssh/authorized_keys on
>>>the slave, and this make me quite unconfortable, so if that's it, I
>>>rather have it done cleanly, and for that, I need to understand the
>>>test, and the requirement.

  Yes, geo-rep is doing it. It adds only one entry per session, but since the
   tests run continuously for different patches, the entries build up.
   I will submit a patch to clean them up in the geo-rep test suite itself.

>>>I will do this one.
  
Thank you!

>>>Is georep supposed to work on other platform like freebsd ? ( because
>>>freebsd do not have bash, so I have to adapt to local way, but if that's
>>>not gonna be tested, I rather not spend too much time on reading the
>>>handbook for now )

As of now it is supported only on Linux; it has known issues with other platforms
such as NetBSD...

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" 
> , "Gaurav Garg"
> , "Aravinda" , "Gluster Devel" 
> 
> Sent: Wednesday, September 23, 2015 3:30:39 PM
> Subject: Re: Spurious failures
> 
> Le mercredi 23 septembre 2015 à 03:25 -0400, Kotresh Hiremath
> Ravishankar a écrit :
> > Hi Krutika,
> > 
> > Looks like the prerequisites for geo-replication to work is changed
> > in slave21
> > 
> > Hi Michael,
> 
> Hi,
> 
> > Could you please check following settings are made in all linux regression
> > machines?
> 
> Yeah, I will add to salt.
> 
> > Or provide me with root password so that I can verify.
> 
> Root login using password should be disabled, so no. If that's still
> working and people use it, that's gonna change soon, too much problems
> with it.
> 
> > 1. Setup Passwordless SSH for the root user:
> 
> Can you be more explicit on where should the user come from so I can
> properly integrate that ?
> 
> There is something adding lots of line to /root/.ssh/authorized_keys on
> the slave, and this make me quite unconfortable, so if that's it, I
> rather have it done cleanly, and for that, I need to understand the
> test, and the requirement.
>  
> > 2. Add below line in /root/.bashrc. This is required as geo-rep does
> > "gluster --version" via ssh
> >and it can't find the gluster PATH via ssh.
> >  export PATH=$PATH:/build/install/sbin:/build/install/bin
> 
> I will do this one.
> 
> Is georep supposed to work on other platform like freebsd ? ( because
> freebsd do not have bash, so I have to adapt to local way, but if that's
> not gonna be tested, I rather not spend too much time on reading the
> handbook for now )
> 
> > Once above settings are done, the following script should output proper
> > version.
> > 
> > ---
> > #!/bin/bash
> > 
> > function SSHM()
> > {
> > ssh -q \
> > -oPasswordAuthentication=no \
> > -oStrictHostKeyChecking=no \
> > -oControlMaster=yes \
> > "$@";
> > }
> > 
> > function cmd_slave()
> > {
> > local cmd_line;
> > cmd_line=$(cat <<EOF
> > function do_verify() {
> > ver=\$(gluster --version | head -1 | cut -f2 -d " ");
> > echo \$ver;
> > };
> > source /etc/profile && do_verify;
> > EOF
> > );
> > echo $cmd_line;
> > }
> > 
> > HOST=$1
> > cmd_line=$(cmd_slave);
> > ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
> > echo $ver
> > -
> > 
> > I could verify for slave32.
> > [root@slave32 ~]# vi /tmp/gver.sh
> > [root@slave32 ~]# /tmp/gver.sh slave32
> > 3.8dev
> > 
> > Please help me in verifying the same for all the linux regression machines.
> > 
> 
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> 
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Michael Scherer
Le mercredi 23 septembre 2015 à 06:24 -0400, Kotresh Hiremath
Ravishankar a écrit :
> Hi Michael,
> 
> Please find my replies below.
> 
> >>> Root login using password should be disabled, so no. If that's still
> >>> working and people use it, that's gonna change soon, too much problems
> >>> with it.
> 
>   Ok
> 
> >>>Can you be more explicit on where should the user come from so I can
> >>>properly integrate that ?
> 
>   It's just PasswordLess SSH from root to root on to same host.
>   1. Generate ssh key:
> #ssh-keygen
>   2. Add it to /root/.ssh/authorized_keys
> #ssh-copy-id -i  root@host
> 
>   Requirement by geo-replication:
> 'ssh root@host' should not ask for password

So, is it OK if I restrict that to be used only on 127.0.0.1?

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-23 Thread Kotresh Hiremath Ravishankar
Hi,

>>>So, it is ok if I restrict that to be used only on 127.0.0.1 ?
I think not; the test cases use 'H0' to create volumes:
 H0=${H0:=`hostname`};
Geo-rep expects passwordless SSH to 'H0'.
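In other words, something like the check below has to pass on each slave (a sketch based on the H0 convention above; the note on from= is only a suggestion):

# H0 resolves to the machine's own hostname, not 127.0.0.1.
H0=${H0:=$(hostname)}
ssh -o BatchMode=yes root@"$H0" true && echo "geo-rep SSH prerequisite satisfied"
# If the key is restricted with a from="..." pattern in authorized_keys, the
# pattern would need to include the host's own address as well as 127.0.0.1.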
 

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" 
> , "Gaurav Garg"
> , "Aravinda" , "Gluster Devel" 
> 
> Sent: Wednesday, 23 September, 2015 5:05:58 PM
> Subject: Re: Spurious failures
> 
> Le mercredi 23 septembre 2015 à 06:24 -0400, Kotresh Hiremath
> Ravishankar a écrit :
> > Hi Michael,
> > 
> > Please find my replies below.
> > 
> > >>> Root login using password should be disabled, so no. If that's still
> > >>> working and people use it, that's gonna change soon, too much problems
> > >>> with it.
> > 
> >   Ok
> > 
> > >>>Can you be more explicit on where should the user come from so I can
> > >>>properly integrate that ?
> > 
> >   It's just PasswordLess SSH from root to root on to same host.
> >   1. Generate ssh key:
> > #ssh-keygen
> >   2. Add it to /root/.ssh/authorized_keys
> > #ssh-copy-id -i  root@host
> > 
> >   Requirement by geo-replication:
> > 'ssh root@host' should not ask for password
> 
> So, it is ok if I restrict that to be used only on 127.0.0.1 ?
> 
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> 
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-24 Thread Michael Scherer
Le jeudi 24 septembre 2015 à 02:24 -0400, Kotresh Hiremath Ravishankar a
écrit :
> Hi,
> 
> >>>So, it is ok if I restrict that to be used only on 127.0.0.1 ?
> I think no, testcases use 'H0' to create volumes
>  H0=${H0:=`hostname`};
> Geo-rep expects passwordLess SSH to 'H0'  
>  

OK, this definitely requires some tests and thoughts. Does it only use IPv4
too?
(I guess yes, since IPv6 is removed from the Rackspace build slaves.)
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-24 Thread Kotresh Hiremath Ravishankar
>>> Ok, this definitely requires some tests and toughts. It only use ipv4
>>> too ?
>>> (I guess yes, since ipv6 is removed from the rackspace build slaves)
   
Yes!

Could we know when these settings can be done on all Linux slave machines?
If it takes some time, we should consider moving all geo-rep test cases under
bad tests until then.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" 
> , "Gaurav Garg"
> , "Aravinda" , "Gluster Devel" 
> 
> Sent: Thursday, 24 September, 2015 1:18:16 PM
> Subject: Re: Spurious failures
> 
> Le jeudi 24 septembre 2015 à 02:24 -0400, Kotresh Hiremath Ravishankar a
> écrit :
> > Hi,
> > 
> > >>>So, it is ok if I restrict that to be used only on 127.0.0.1 ?
> > I think no, testcases use 'H0' to create volumes
> >  H0=${H0:=`hostname`};
> > Geo-rep expects passwordLess SSH to 'H0'
> >  
> 
> Ok, this definitely requires some tests and toughts. It only use ipv4
> too ?
> (I guess yes, since ipv6 is removed from the rackspace build slaves)
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> 
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-24 Thread Michael Scherer
Le jeudi 24 septembre 2015 à 06:50 -0400, Kotresh Hiremath Ravishankar a
écrit :
> >>> Ok, this definitely requires some tests and toughts. It only use ipv4
> >>> too ?
> >>> (I guess yes, since ipv6 is removed from the rackspace build slaves)
>
> Yes!
> 
> Could we know when can these settings be done on all linux slave machines?
> If it takes sometime, we should consider moving all geo-rep testcases 
> under bad tests
> till then.

I will do that this afternoon; now I have a clear idea of what needs to
be done.
(I already pushed the PATH change.)

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-24 Thread Kotresh Hiremath Ravishankar
Thank you :) Also, please check that the script I gave passes on all the machines.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" 
> , "Gaurav Garg"
> , "Aravinda" , "Gluster Devel" 
> 
> Sent: Thursday, 24 September, 2015 5:00:43 PM
> Subject: Re: Spurious failures
> 
> Le jeudi 24 septembre 2015 à 06:50 -0400, Kotresh Hiremath Ravishankar a
> écrit :
> > >>> Ok, this definitely requires some tests and toughts. It only use ipv4
> > >>> too ?
> > >>> (I guess yes, since ipv6 is removed from the rackspace build slaves)
> >
> > Yes!
> > 
> > Could we know when can these settings be done on all linux slave
> > machines?
> > If it takes sometime, we should consider moving all geo-rep testcases
> > under bad tests
> > till then.
> 
> I will do that this afternoon, now I have a clear idea of what need to
> be done.
> ( I already pushed the path change )
> 
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> 
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-24 Thread Michael Scherer
Le jeudi 24 septembre 2015 à 07:59 -0400, Kotresh Hiremath Ravishankar a
écrit :
> Thank you:) and also please check the script I had given passes in all 
> machines

So it worked everywhere except on slave0 and slave1. Not sure what is
wrong, or whether they are even used; I will check later.


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures

2015-09-27 Thread Kotresh Hiremath Ravishankar
Thanks Michael!

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" 
> , "Gaurav Garg"
> , "Aravinda" , "Gluster Devel" 
> 
> Sent: Thursday, 24 September, 2015 11:09:52 PM
> Subject: Re: Spurious failures
> 
> Le jeudi 24 septembre 2015 à 07:59 -0400, Kotresh Hiremath Ravishankar a
> écrit :
> > Thank you:) and also please check the script I had given passes in all
> > machines
> 
> So it worked everywhere, but on slave0 and slave1. Not sure what is
> wrong, or if they are used, I will check later.
> 
> 
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> 
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Spurious failures? (master)

2015-06-04 Thread Shyam

Just checking,

This review request: http://review.gluster.org/#/c/11073/

Failed in the following tests:

1) Linux
[20:20:16] ./tests/bugs/replicate/bug-880898.t ..
not ok 4
Failed 1/4 subtests
[20:20:16]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/10088/consoleFull

2) NetBSD (Du seems to have faced the same)
[11:56:45] ./tests/basic/afr/sparse-file-self-heal.t ..
not ok 52 Got "" instead of "1"
not ok 53 Got "" instead of "1"
not ok 54
not ok 55 Got "2" instead of "0"
not ok 56 Got "d41d8cd98f00b204e9800998ecf8427e" instead of 
"b6d81b360a5672d80c27430f39153e2c"

not ok 60 Got "0" instead of "1"
Failed 6/64 subtests
[11:56:45]

http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6233/consoleFull

I have not done any analysis, and also the change request should not 
affect the paths that this test is failing on.


Checking the logs for Linux did not throw any more light on the cause, 
although the brick logs are not updated(?) to reflect the volume create 
and start as per the TC in (1).


Anyone know anything (more) about this?

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious failures again

2015-07-08 Thread Kaushal M
I've been hitting spurious failures in Linux regression runs for my change [1].

The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
./tests/bugs/quota/bug-1235182.t [5]
./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-10 Thread Dan Lambright


- Original Message -
> From: "Atin Mukherjee" 
> To: "Vijaikumar Mallikarjuna" 
> Cc: "Gluster Devel" 
> Sent: Wednesday, July 8, 2015 12:46:42 PM
> Subject: Re: [Gluster-devel] Spurious failures again
> 
> 
> 
> I think our linux regression is again unstable. I am seeing at least 10 such
> test cases ( if not more) which have failed. I think we should again start
> maintaining an etherpad page (probably the same earlier one) and keep track
> of them otherwise it will be difficult to track what is fixed and what's not
> if we have to go through mails.
> 
> Thoughts?


+2 to this, we worked very hard to fix a spurious problem in one of our tests 
and have held off merging it until it passes, but we keep hitting other 
spurious errors. 

> 
> -Atin
> Sent from one plus one
> On Jul 8, 2015 8:45 PM, "Vijaikumar M" < vmall...@redhat.com > wrote:
> 
> 
> 
> 
> On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote:
> 
> 
> 
> 
> On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:
> 
> 
> I've been hitting spurious failures in Linux regression runs for my change
> [1].
> 
> The following tests failed,
> ./tests/basic/afr/replace-brick-self-heal.t [2]
> ./tests/bugs/replicate/bug-1238508-self-heal.t [3]
> ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
> I will look into this issue
> Patch submitted: http://review.gluster.org/#/c/11583/
> 
> 
> 
> 
> 
> 
> 
> ./tests/bugs/quota/bug-1235182.t [5]
> I have submitted two patches to fix failures from 'bug-1235182.t'
> http://review.gluster.org/#/c/11561/
> http://review.gluster.org/#/c/11510/
> 
> 
> 
> ./tests/bugs/replicate/bug-977797.t [6]
> 
> Can AFR and quota owners look into this?
> 
> Thanks.
> 
> Kaushal
> 
> [1] https://review.gluster.org/11559
> [2]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
> [3]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
> [4]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
> [5]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
> [6]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures? (master)

2015-06-04 Thread Pranith Kumar Karampuri



On 06/05/2015 02:12 AM, Shyam wrote:

Just checking,

This review request: http://review.gluster.org/#/c/11073/

Failed in the following tests:

1) Linux
[20:20:16] ./tests/bugs/replicate/bug-880898.t ..
not ok 4
This seems to be the same RC as in self-heald.t, where heal info sometimes
does not fail when the brick is down.

Failed 1/4 subtests
[20:20:16]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/10088/consoleFull 



2) NetBSD (Du seems to have faced the same)
[11:56:45] ./tests/basic/afr/sparse-file-self-heal.t ..
not ok 52 Got "" instead of "1"
not ok 53 Got "" instead of "1"
not ok 54
not ok 55 Got "2" instead of "0"
not ok 56 Got "d41d8cd98f00b204e9800998ecf8427e" instead of 
"b6d81b360a5672d80c27430f39153e2c"

not ok 60 Got "0" instead of "1"
Failed 6/64 subtests
[11:56:45]

http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6233/consoleFull 

There is a bug in the statedump code path; if it races with STACK_RESET then
shd seems to crash. I see the following output indicating the process died.


kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill 
-l [sigspec]



I have not done any analysis, and also the change request should not 
affect the paths that this test is failing on.


Checking the logs for Linux did not throw any more light on the cause, 
although the brick logs are not updated(?) to reflect the volume 
create and start as per the TC in (1).


Anyone know anything (more) about this?

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Vijaikumar M



On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:

I've been hitting spurious failures in Linux regression runs for my change [1].

The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]

I will look into this issue

./tests/bugs/quota/bug-1235182.t [5]

I have submitted two patches to fix failures from 'bug-1235182.t'
http://review.gluster.org/#/c/11561/
http://review.gluster.org/#/c/11510/


./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Anuradha Talur


- Original Message -
> From: "Kaushal M" 
> To: "Gluster Devel" 
> Sent: Wednesday, July 8, 2015 3:42:12 PM
> Subject: [Gluster-devel] Spurious failures again
> 
> I've been hitting spurious failures in Linux regression runs for my change
> [1].
> 
> The following tests failed,
> ./tests/basic/afr/replace-brick-self-heal.t [2]
> ./tests/bugs/replicate/bug-1238508-self-heal.t [3]
> ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
> ./tests/bugs/quota/bug-1235182.t [5]
> ./tests/bugs/replicate/bug-977797.t [6]
> 
> Can AFR and quota owners look into this?
> 
> Thanks.
> 
> Kaushal
> 
> [1] https://review.gluster.org/11559
> [2]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull

Will look into this.
> [3]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
For the 3rd one, the patch needs to be rebased. Ravi sent a fix:
http://review.gluster.org/#/c/11556/ .
> [4]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
> [5]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
> [6]
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

-- 
Thanks,
Anuradha.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Ravishankar N



On 07/08/2015 03:57 PM, Anuradha Talur wrote:


- Original Message -

From: "Kaushal M" 
To: "Gluster Devel" 
Sent: Wednesday, July 8, 2015 3:42:12 PM
Subject: [Gluster-devel] Spurious failures again

I've been hitting spurious failures in Linux regression runs for my change
[1].

The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
./tests/bugs/quota/bug-1235182.t [5]
./tests/bugs/replicate/bug-977797.t [6]



Ran ./tests/bugs/replicate/bug-977797.t  multiple times in a loop, no 
failure observed. The logs in [6] seem inaccessible  as well.




Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull

Will look into this.

[3]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull

For 3rd one the patch needs to be rebased. Ravi sent a fix 
http://review.gluster.org/#/c/11556/ .

[4]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Kaushal M
Thanks to everyone who's looking into these. But I got one more failure,
and a different one this time.

./tests/bugs/quota/bug-1178130.t [1]

The test passed but a core file was generated.

[1]: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12101/consoleText

On Wed, Jul 8, 2015 at 5:01 PM, Ravishankar N  wrote:
>
>
> On 07/08/2015 03:57 PM, Anuradha Talur wrote:
>>
>>
>> - Original Message -
>>>
>>> From: "Kaushal M" 
>>> To: "Gluster Devel" 
>>> Sent: Wednesday, July 8, 2015 3:42:12 PM
>>> Subject: [Gluster-devel] Spurious failures again
>>>
>>> I've been hitting spurious failures in Linux regression runs for my
>>> change
>>> [1].
>>>
>>> The following tests failed,
>>> ./tests/basic/afr/replace-brick-self-heal.t [2]
>>> ./tests/bugs/replicate/bug-1238508-self-heal.t [3]
>>> ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
>>> ./tests/bugs/quota/bug-1235182.t [5]
>>> ./tests/bugs/replicate/bug-977797.t [6]
>
>
>
> Ran ./tests/bugs/replicate/bug-977797.t  multiple times in a loop, no
> failure observed. The logs in [6] seem inaccessible  as well.
>
>
>>>
>>> Can AFR and quota owners look into this?
>>>
>>> Thanks.
>>>
>>> Kaushal
>>>
>>> [1] https://review.gluster.org/11559
>>> [2]
>>>
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
>>
>> Will look into this.
>>>
>>> [3]
>>>
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
>>
>> For 3rd one the patch needs to be rebased. Ravi sent a fix
>> http://review.gluster.org/#/c/11556/ .
>>>
>>> [4]
>>>
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
>>> [5]
>>>
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
>>> [6]
>>>
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Vijaikumar M



On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote:



On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:
I've been hitting spurious failures in Linux regression runs for my 
change [1].


The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]

I will look into this issue

Patch submitted: http://review.gluster.org/#/c/11583/




./tests/bugs/quota/bug-1235182.t [5]

I have submitted two patches to fix failures from 'bug-1235182.t'
http://review.gluster.org/#/c/11561/
http://review.gluster.org/#/c/11510/


./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Atin Mukherjee
I think our Linux regression is again unstable. I am seeing at least 10
such test cases (if not more) which have failed. I think we should again
start maintaining an etherpad page (probably the same earlier one) and keep
track of them; otherwise it will be difficult to track what is fixed and
what's not if we have to go through mails.

Thoughts?

-Atin
Sent from one plus one
On Jul 8, 2015 8:45 PM, "Vijaikumar M"  wrote:

>
>
> On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote:
>
>>
>>
>> On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:
>>
>>> I've been hitting spurious failures in Linux regression runs for my
>>> change [1].
>>>
>>> The following tests failed,
>>> ./tests/basic/afr/replace-brick-self-heal.t [2]
>>> ./tests/bugs/replicate/bug-1238508-self-heal.t [3]
>>> ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]
>>>
>> I will look into this issue
>>
> Patch submitted: http://review.gluster.org/#/c/11583/
>
>
>
>  ./tests/bugs/quota/bug-1235182.t [5]
>>>
>> I have submitted two patches to fix failures from 'bug-1235182.t'
>> http://review.gluster.org/#/c/11561/
>> http://review.gluster.org/#/c/11510/
>>
>>  ./tests/bugs/replicate/bug-977797.t [6]
>>>
>>> Can AFR and quota owners look into this?
>>>
>>> Thanks.
>>>
>>> Kaushal
>>>
>>> [1] https://review.gluster.org/11559
>>> [2]
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
>>> [3]
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
>>> [4]
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
>>> [5]
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
>>> [6]
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull
>>>
>>
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Ravishankar N



On 07/08/2015 11:16 PM, Atin Mukherjee wrote:


I think our linux regression is again unstable. I am seeing at least 
10 such test cases ( if not more) which have failed. I think we should 
again start maintaining an etherpad page (probably the same earlier 
one) and keep track of them otherwise it will be difficult to track 
what is fixed and what's not if we have to go through mails.


Thoughts?




Makes sense. The link is here 
https://public.pad.fsfe.org/p/gluster-spurious-failures

Perhaps we should remove the entries and start fresh.

-Ravi


-Atin
Sent from one plus one

On Jul 8, 2015 8:45 PM, "Vijaikumar M" > wrote:




On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote:



On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:

I've been hitting spurious failures in Linux regression
runs for my change [1].

The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]

I will look into this issue

Patch submitted: http://review.gluster.org/#/c/11583/



./tests/bugs/quota/bug-1235182.t [5]

I have submitted two patches to fix failures from 'bug-1235182.t'
http://review.gluster.org/#/c/11561/
http://review.gluster.org/#/c/11510/

./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull



___
Gluster-devel mailing list
Gluster-devel@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-09 Thread Jeff Darcy
Sad but true.  More tests are failing than passing, and the failures are
often *clearly* unrelated to the patches they're supposedly testing.
Let's revive the Etherpad, and use it to track progress as we clean this
up.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious Failures in regression runs

2015-03-30 Thread Vijay Bellur

Hi All,

We are attempting to capture all known spurious regression failures from 
the jenkins instance in build.gluster.org at [1].
The issues listed in the etherpad impede our patch merging workflow and 
need to be sorted out before we branch
release-3.7. If you happen to be the owner of one or more issues in the 
etherpad, can you please look into the failures and

have them addressed soon?

Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in split-brain-healing.t

2015-03-10 Thread Emmanuel Dreyfus
Hello

NetBSD also exhibits spurious failures in split-brain-healing.t.
I would be glad if someone could help here.

I found the following problems:

1) On *BSD, ls ran as root lists dot-files, which includes .glusterfs.
Fix is simple:
-replica_0_files_list=(`ls $B0/${V0}1`)
-replica_1_files_list=(`ls $B0/${V0}3`)
+replica_0_files_list=( $B0/${V0}1/* )
+replica_1_files_list=( $B0/${V0}3/* )

2) The spurious failure: accessing files here succeeds instead of giving EIO.
Help wanted!
### Acessing the files should now give EIO. 
###
TEST ! cat file1  
TEST ! cat file2
(...)
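When it reproduces, it may help to record the actual error instead of only asserting that cat fails (a debugging sketch, run from the mount point):

# Capture the exit status and the errno text, not just pass/fail.
cat file1; echo "cat exited with $?"
cat file1 2>&1 | grep -q 'Input/output error' && echo "got EIO" || echo "no EIO"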

3) later I hit this, I do not know yet if it is a consequence or not:
assertion "list_empty (&priv->table.lru[i])" failed: file "quick-read.c", line 
1052, function "qr_inode_table_destroy"




-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious Failures in regression runs

2015-03-30 Thread Justin Clift
On 30 Mar 2015, at 18:54, Vijay Bellur  wrote:
> Hi All,
> 
> We are attempting to capture all known spurious regression failures from the 
> jenkins instance in build.gluster.org at [1].
> The issues listed in the etherpad impede our patch merging workflow and need 
> to be sorted out before we branch
> release-3.7. If you happen to be the owner of one or more issues in the 
> etherpad, can you please look into the failures and
> have them addressed soon?

To help show up more regression failures, we ran 20 new VMs
in Rackspace, each running a full regression test of the master head
branch:

 * Two hung regression tests on tests/bugs/posix/bug-1113960.t
   * Still hung in case anyone wants to check them out
 * 162.242.167.96
 * 162.242.167.132
 * Both allowing remote root login, and using our jenkins
   slave password as their root pw

* 2 x failures on ./tests/basic/afr/sparse-file-self-heal.t
  Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

  Added to etherpad

* 1 x failure on ./tests/bugs/disperse/bug-1187474.t
  Failed tests:  11-12

  Added to etherpad

* 1 x failure on ./tests/basic/uss.t
  Failed test:  153

  Already on etherpad

Looks like our general failure rate is improving. :)  The hangs
are a bit worrying though. :(

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious Failures in regression runs

2015-03-30 Thread Nithya Balachandran
I'll take a look at the hangs.


Regards,
Nithya

- Original Message -
From: "Justin Clift" 
To: "Vijay Bellur" 
Cc: "Gluster Devel" , "Nithya Balachandran" 

Sent: Tuesday, 31 March, 2015 5:40:29 AM
Subject: Re: [Gluster-devel] Spurious Failures in regression runs

On 30 Mar 2015, at 18:54, Vijay Bellur  wrote:
> Hi All,
> 
> We are attempting to capture all known spurious regression failures from the 
> jenkins instance in build.gluster.org at [1].
> The issues listed in the etherpad impede our patch merging workflow and need 
> to be sorted out before we branch
> release-3.7. If you happen to be the owner of one or more issues in the 
> etherpad, can you please look into the failures and
> have them addressed soon?

To help show up more regression failures, we ran 20x new VM's
in Rackspace with a full regression test each of master head
branch:

 * Two hung regression tests on tests/bugs/posix/bug-1113960.t
   * Still hung in case anyone wants to check them out
 * 162.242.167.96
 * 162.242.167.132
 * Both allowing remote root login, and using our jenkins
   slave password as their root pw

* 2 x failures on ./tests/basic/afr/sparse-file-self-heal.t
  Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

  Added to etherpad

* 1 x failure on ./tests/bugs/disperse/bug-1187474.t
  Failed tests:  11-12

  Added to etherpad

* 1 x failure on ./tests/basic/uss.t
  Failed test:  153

  Already on etherpad

Looks like our general failure rate is improving. :)  The hangs
are a bit worrying though. :(

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t

2015-07-19 Thread Niels de Vos
I have seen several occurences of failures in arbiter.t now. This is one
of the errors:


https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull

[21:20:20] ./tests/basic/afr/arbiter.t .. 
not ok 7 Got "N" instead of "Y"
not ok 15 
not ok 16 Got "" instead of "1"
not ok 23 Got "" instead of "1"
not ok 25 Got "0" when not expecting it
not ok 26 
not ok 34 Got "0" instead of "1"
not ok 35 Got "0" instead of "1"
not ok 41 Got "" instead of "1"
not ok 47 Got "N" instead of "Y"
Failed 10/47 subtests 
[21:20:20]

Test Summary Report
---
./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
  Failed tests:  7, 15-16, 23, 25-26, 34-35, 41, 47
Files=1, Tests=47, 243 wallclock secs ( 0.04 usr  0.00 sys + 15.22 cusr  
3.48 csys = 18.74 CPU)
Result: FAIL


Who could have a look at this?

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-15 Thread Pranith Kumar Karampuri
hi,
The latest build I fired for review.gluster.com/7766 
(http://build.gluster.org/job/regression/4443/console) failed because of a 
spurious failure: the script doesn't wait for the NFS export to become available. I 
fixed that, but interestingly I found quite a few scripts with the same problem. 
Some of the scripts rely on 'sleep 5', which can also lead to spurious 
failures if the export is not available within 5 seconds. We found that waiting for 
20 seconds is better, but 'sleep 20' would unnecessarily delay the build 
execution. So if you guys are going to write any scripts which have to do NFS 
mounts, please do it the following way:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
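
(For illustration, a minimal sketch of what such an export check could look like; the function name and body here are assumptions, not the actual helper from the test framework:)

function is_nfs_export_available {
        local vol=${1:-$V0}
        # prints "1" once showmount reports the volume as exported, "0" otherwise,
        # which is the value EXPECT_WITHIN polls for above
        if showmount -e localhost 2>/dev/null | grep -q "$vol"; then
                echo "1"
        else
                echo "0"
        fi
}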

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot-related script, 
tests/bugs/bug-1090042.t, on the next build fired by Niels.
Joseph (CCed) is debugging it. He agreed to reply with what he finds and share it 
with us so that we won't introduce similar bugs in the future.

I encourage you all to share what you fix to prevent spurious failures in 
the future.

Thanks
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in split-brain-healing.t

2015-03-10 Thread Ravishankar N


On 03/10/2015 06:55 PM, Emmanuel Dreyfus wrote:

3) later I hit this, I do not know yet if it is a consequence or not:
assertion "list_empty (&priv->table.lru[i])" failed: file "quick-read.c", line 1052, 
function "qr_inode_table_destroy"
This happens in debug builds only, it should be fixed with 
http://review.gluster.org/#/c/9819/

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t

2015-07-19 Thread Ravishankar N

I'll take a look.
Regards,
Ravi

On 07/20/2015 03:07 AM, Niels de Vos wrote:

I have seen several occurences of failures in arbiter.t now. This is one
of the errors:

 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull

 [21:20:20] ./tests/basic/afr/arbiter.t ..
 not ok 7 Got "N" instead of "Y"
 not ok 15
 not ok 16 Got "" instead of "1"
 not ok 23 Got "" instead of "1"
 not ok 25 Got "0" when not expecting it
 not ok 26
 not ok 34 Got "0" instead of "1"
 not ok 35 Got "0" instead of "1"
 not ok 41 Got "" instead of "1"
 not ok 47 Got "N" instead of "Y"
 Failed 10/47 subtests
 [21:20:20]
 
 Test Summary Report

 ---
 ./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
   Failed tests:  7, 15-16, 23, 25-26, 34-35, 41, 47
 Files=1, Tests=47, 243 wallclock secs ( 0.04 usr  0.00 sys + 15.22 cusr  
3.48 csys = 18.74 CPU)
 Result: FAIL


Who could have a look at this?

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t

2015-07-20 Thread Niels de Vos
On Mon, Jul 20, 2015 at 09:25:15AM +0530, Ravishankar N wrote:
> I'll take a look.

Thanks. I'm actually not sure if this is an arbiter.t issue; maybe I
blamed it too early? It's the first test that gets executed, and no
others are tried after it fails.

Niels


> Regards,
> Ravi
> 
> On 07/20/2015 03:07 AM, Niels de Vos wrote:
> >I have seen several occurences of failures in arbiter.t now. This is one
> >of the errors:
> >
> > 
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull
> >
> > [21:20:20] ./tests/basic/afr/arbiter.t ..
> > not ok 7 Got "N" instead of "Y"
> > not ok 15
> > not ok 16 Got "" instead of "1"
> > not ok 23 Got "" instead of "1"
> > not ok 25 Got "0" when not expecting it
> > not ok 26
> > not ok 34 Got "0" instead of "1"
> > not ok 35 Got "0" instead of "1"
> > not ok 41 Got "" instead of "1"
> > not ok 47 Got "N" instead of "Y"
> > Failed 10/47 subtests
> > [21:20:20]
> > Test Summary Report
> > ---
> > ./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
> >   Failed tests:  7, 15-16, 23, 25-26, 34-35, 41, 47
> > Files=1, Tests=47, 243 wallclock secs ( 0.04 usr  0.00 sys + 15.22 cusr 
> >  3.48 csys = 18.74 CPU)
> > Result: FAIL
> >
> >
> >Who could have look at this?
> >
> >Thanks,
> >Niels
> >___
> >Gluster-devel mailing list
> >Gluster-devel@gluster.org
> >http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t

2015-07-20 Thread Ravishankar N



On 07/20/2015 12:45 PM, Niels de Vos wrote:

On Mon, Jul 20, 2015 at 09:25:15AM +0530, Ravishankar N wrote:

I'll take a look.

Thanks. I'm actually not sure if this is a arbiter.t issue, maybe I
blamed it too early? Its the first test that gets executed, and no
others are tried after it failed.

Niels



Regards,
Ravi

On 07/20/2015 03:07 AM, Niels de Vos wrote:

I have seen several occurences of failures in arbiter.t now. This is one
of the errors:

 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull

 [21:20:20] ./tests/basic/afr/arbiter.t ..
 not ok 7 Got "N" instead of "Y"
 not ok 15
 not ok 16 Got "" instead of "1"
 not ok 23 Got "" instead of "1"
 not ok 25 Got "0" when not expecting it
 not ok 26
 not ok 34 Got "0" instead of "1"
 not ok 35 Got "0" instead of "1"
 not ok 41 Got "" instead of "1"
 not ok 47 Got "N" instead of "Y"
 Failed 10/47 subtests
 [21:20:20]
 Test Summary Report
 ---
 ./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
   Failed tests:  7, 15-16, 23, 25-26, 34-35, 41, 47








So test #7 that failed is "16 EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" 
force_umount $M0".
Looking at mnt-glusterfs-0.log, I see that the unmount had already 
happened before the actual command was run, at least going by the time stamp 
logged by the G_LOG() function.


[2015-07-19 21:16:21.784293] I [fuse-bridge.c:4946:fuse_thread_proc] 
0-fuse: unmounting /mnt/glusterfs/0
[2015-07-19 21:16:21.784542] W [glusterfsd.c:1214:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x79d1) [0x7fc3f41c49d1] 
-->glusterfs(glusterfs_sigwaiter+0xe4) [0x409734] 
-->glusterfs(cleanup_and_exit+0x87) [0x407ba7] ) 0-: received signum 
(15), shutting down
[2015-07-19 21:16:21.784571] I [fuse-bridge.c:5645:fini] 0-fuse: 
Unmounting '/mnt/glusterfs/0'.
[2015-07-19 21:16:21.785817332]:++ 
G_LOG:./tests/basic/afr/arbiter.t: TEST: 15 ! stat 
/mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/options/arbiter-count 
++
[2015-07-19 21:16:21.796574975]:++ 
G_LOG:./tests/basic/afr/arbiter.t: TEST: 16 Y force_umount 
/mnt/glusterfs/0 ++


I have no clue as to why that could have happened, because appending to 
the gluster log files using G_LOG() is done *before* the test is 
executed. In all my trial runs, the G_LOG message gets logged first, 
followed by the logs relevant to the actual command being run.
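
As an illustration of that ordering, here is a hypothetical sketch (assumed name and log paths; not the actual G_LOG implementation from the test framework) of a wrapper that appends the test line to the glusterfs logs before executing the command:

function G_LOG_sketch {
        local msg="$*"
        local logfile
        # append the marker to every glusterfs log first ...
        for logfile in /var/log/glusterfs/*.log; do
                [ -f "$logfile" ] || continue
                echo "[$(date -u '+%Y-%m-%d %H:%M:%S')]:++ G_LOG: TEST: $msg ++" >> "$logfile"
        done
        # ... and only then run the actual test command
        "$@"
}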



FWIW, http://review.gluster.org/#/c/4/ made the following 
change to arbiter.t, amongst other test cases:


-TEST umount $M0
+EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0

But I'm not sure whether doing a 'umount -f' has any impact for FUSE mounts.
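
For illustration, a retry-friendly unmount check of this kind could be sketched as follows (the name and behaviour here are assumptions, not the actual force_umount helper from include.rc): it attempts a forced unmount and reports Y/N so that EXPECT_WITHIN can keep retrying until the mount point is really gone.

function force_umount_sketch {
        local m
        for m in "$@"; do
                umount -f "$m" 2>/dev/null
        done
        for m in "$@"; do
                # still listed in /proc/mounts: report "N" so EXPECT_WITHIN retries
                if grep -q " $m " /proc/mounts; then
                        echo "N"
                        return
                fi
        done
        echo "Y"
}

# would be used as:
#   EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount_sketch $M0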

Regards,
Ravi


 Files=1, Tests=47, 243 wallclock secs ( 0.04 usr  0.00 sys + 15.22 cusr  
3.48 csys = 18.74 CPU)
 Result: FAIL


Who could have a look at this?

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-15 Thread Anand Avati
On Thu, May 15, 2014 at 5:49 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi,
> In the latest build I fired for review.gluster.com/7766 (
> http://build.gluster.org/job/regression/4443/console) failed because of
> spurious failure. The script doesn't wait for nfs export to be available. I
> fixed that, but interestingly I found quite a few scripts with same
> problem. Some of the scripts are relying on 'sleep 5' which also could lead
> to spurious failures if the export is not available in 5 seconds. We found
> that waiting for 20 seconds is better, but 'sleep 20' would unnecessarily
> delay the build execution. So if you guys are going to write any scripts
> which has to do nfs mounts, please do it the following way:
>
> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
>

Please also always add 'mount -o soft,intr' in the regression scripts when
mounting NFS. It becomes so much easier to clean up any "hung" mess. We
probably need an NFS mounting helper function which can be called like:

TEST mount_nfs $H0:/$V0 $N0;
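
A minimal sketch of such a helper, assuming this name and option handling (it is not an actual implementation from the test framework), might look like:

function mount_nfs {
        local exp=$1   # export, e.g. $H0:/$V0
        local mnt=$2   # mount point, e.g. $N0
        local opts=$3  # optional extra options, e.g. "nolock,noac"
        # always mount soft,intr so hung mounts are easier to clean up
        if [ -n "$opts" ]; then
                mount -t nfs -o soft,intr,vers=3,"$opts" "$exp" "$mnt"
        else
                mount -t nfs -o soft,intr,vers=3 "$exp" "$mnt"
        fi
}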

Thanks

Avati
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-15 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Anand Avati" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Friday, May 16, 2014 6:30:44 AM
> Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> 
> On Thu, May 15, 2014 at 5:49 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> 
> > hi,
> > In the latest build I fired for review.gluster.com/7766 (
> > http://build.gluster.org/job/regression/4443/console) failed because of
> > spurious failure. The script doesn't wait for nfs export to be available. I
> > fixed that, but interestingly I found quite a few scripts with same
> > problem. Some of the scripts are relying on 'sleep 5' which also could lead
> > to spurious failures if the export is not available in 5 seconds. We found
> > that waiting for 20 seconds is better, but 'sleep 20' would unnecessarily
> > delay the build execution. So if you guys are going to write any scripts
> > which has to do nfs mounts, please do it the following way:
> >
> > EXPECT_WITHIN 20 "1" is_nfs_export_available;
> > TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
> >
> 
> Always please also add mount -o soft,intr in the regression scripts for
> mounting nfs. Becomes so much easier to cleanup any "hung" mess. We
> probably need an NFS mounting helper function which can be called like:
> 
> TEST mount_nfs $H0:/$V0 $N0;

Will do. There seem to be some extra options (noac etc.) for some of these, so I 
will add one more argument for any extra options for the NFS mount.
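
For illustration, usage of such a helper with an extra-options argument (the helper name and signature are assumptions, as sketched earlier in this thread) could then look like:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount_nfs $H0:/$V0 $N0 nolock,noac;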

Pranith
> 
> Thanks
> 
> Avati
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-16 Thread Joseph Fernandes

Hi All,

tests/bugs/bug-1090042.t : 

I was able to reproduce the issue, i.e. when this test is run in a loop: 

for i in {1..135}; do ./bugs/bug-1090042.t; done

When I checked the logs, I saw: 
[2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 
0-management: defaulting ping-timeout to 30secs
[2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 
0-management: defaulting ping-timeout to 30secs

The issue is with ping-timeout and is tracked under the bug 

https://bugzilla.redhat.com/show_bug.cgi?id=1096729


The workaround is mentioned in 
https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8


Regards,
Joe

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Cc: "Joseph Fernandes" 
Sent: Friday, May 16, 2014 6:19:54 AM
Subject: Spurious failures because of nfs and snapshots

hi,
In the latest build I fired for review.gluster.com/7766 
(http://build.gluster.org/job/regression/4443/console) failed because of 
spurious failure. The script doesn't wait for nfs export to be available. I 
fixed that, but interestingly I found quite a few scripts with same problem. 
Some of the scripts are relying on 'sleep 5' which also could lead to spurious 
failures if the export is not available in 5 seconds. We found that waiting for 
20 seconds is better, but 'sleep 20' would unnecessarily delay the build 
execution. So if you guys are going to write any scripts which has to do nfs 
mounts, please do it the following way:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot related script 
tests/bugs/bug-1090042.t on the next build fired by Niels.
Joesph (CCed) is debugging it. He agreed to reply what he finds and share it 
with us so that we won't introduce similar bugs in future.

I encourage you guys to share what you fix to prevent spurious failures in 
future.

Thanks
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Pranith Kumar Karampuri
hi Vijai, Joseph,
In 2 of the last 3 build failures, 
http://build.gluster.org/job/regression/4479/console, 
http://build.gluster.org/job/regression/4478/console this 
test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert 
this test until the fix is available? Please send a patch to revert the test 
case if you guys feel so. You can re-submit it along with the fix to the bug 
mentioned by Joseph.

Pranith.

- Original Message -
> From: "Joseph Fernandes" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Friday, 16 May, 2014 5:13:57 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> 
> Hi All,
> 
> tests/bugs/bug-1090042.t :
> 
> I was able to reproduce the issue i.e when this test is done in a loop
> 
> for i in {1..135} ; do  ./bugs/bug-1090042.t
> 
> When checked the logs
> [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
> 0-management: defaulting ping-timeout to 30secs
> [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
> 0-management: defaulting ping-timeout to 30secs
> 
> The issue is with ping-timeout and is tracked under the bug
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> 
> 
> The workaround is mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> 
> 
> Regards,
> Joe
> 
> - Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Gluster Devel" 
> Cc: "Joseph Fernandes" 
> Sent: Friday, May 16, 2014 6:19:54 AM
> Subject: Spurious failures because of nfs and snapshots
> 
> hi,
> In the latest build I fired for review.gluster.com/7766
> (http://build.gluster.org/job/regression/4443/console) failed because of
> spurious failure. The script doesn't wait for nfs export to be
> available. I fixed that, but interestingly I found quite a few scripts
> with same problem. Some of the scripts are relying on 'sleep 5' which
> also could lead to spurious failures if the export is not available in 5
> seconds. We found that waiting for 20 seconds is better, but 'sleep 20'
> would unnecessarily delay the build execution. So if you guys are going
> to write any scripts which has to do nfs mounts, please do it the
> following way:
> 
> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
> 
> Please review http://review.gluster.com/7773 :-)
> 
> I saw one more spurious failure in a snapshot related script
> tests/bugs/bug-1090042.t on the next build fired by Niels.
> Joesph (CCed) is debugging it. He agreed to reply what he finds and share it
> with us so that we won't introduce similar bugs in future.
> 
> I encourage you guys to share what you fix to prevent spurious failures in
> future.
> 
> Thanks
> Pranith
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Justin Clift
On 16/05/2014, at 1:49 AM, Pranith Kumar Karampuri wrote:
> hi,
>In the latest build I fired for review.gluster.com/7766 
> (http://build.gluster.org/job/regression/4443/console) failed because of 
> spurious failure. The script doesn't wait for nfs export to be available. I 
> fixed that, but interestingly I found quite a few scripts with same problem. 
> Some of the scripts are relying on 'sleep 5' which also could lead to 
> spurious failures if the export is not available in 5 seconds.

Cool.  Fixing this NFS problem across all of the tests would be really
welcome.  That specific failed test (bug-1087198.t) is the most common
one I've seen over the last few weeks, causing about half of all
failures in master.

Eliminating this class of regression failure would be really helpful. :)

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Justin Clift" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Monday, 19 May, 2014 10:26:04 AM
> Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> 
> On 16/05/2014, at 1:49 AM, Pranith Kumar Karampuri wrote:
> > hi,
> >In the latest build I fired for review.gluster.com/7766
> >(http://build.gluster.org/job/regression/4443/console) failed because
> >of spurious failure. The script doesn't wait for nfs export to be
> >available. I fixed that, but interestingly I found quite a few scripts
> >with same problem. Some of the scripts are relying on 'sleep 5' which
> >also could lead to spurious failures if the export is not available in
> >5 seconds.
> 
> Cool.  Fixing this NFS problem across all of the tests would be really
> welcome.  That specific failed test (bug-1087198.t) is the most common
> one I've seen over the last few weeks, causing about half of all
> failures in master.
> 
> Eliminating this class of regression failure would be really helpful. :)

This particular class is eliminated :-). Patch was merged on Friday.

Pranith
> 
> + Justin
> 
> --
> Open Source and Standards @ Red Hat
> 
> twitter.com/realjustinclift
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Justin Clift
On 19/05/2014, at 6:00 AM, Pranith Kumar Karampuri wrote:

> This particular class is eliminated :-). Patch was merged on Friday.


Excellent.  I've just kicked off 10 instances in Rackspace to each run
the regression tests on master head.

Hopefully less than half of them fail this time.  It has been about a 30%
pass rate recently. :)

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Justin Clift" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Monday, 19 May, 2014 10:41:03 AM
> Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> 
> On 19/05/2014, at 6:00 AM, Pranith Kumar Karampuri wrote:
> 
> > This particular class is eliminated :-). Patch was merged on Friday.
> 
> 
> Excellent.  I've just kicked off 10 instances in Rackspace to each run
> the regression tests on master head.
> 
> Hopefully less than 1/2 of them fail this time.  Has been about 30%
> pass rate recently. :)

I am working on one more patch about timeouts at the moment. Will be sending it 
shortly. That should help us manage waiting for timeouts easily.
With the work Kaushal and Vijay did to provide logs and core files, we should be 
able to reduce the number of spurious regressions, because now we can debug 
them without stopping the regression runs :-).

Pranith
> 
> + Justin
> 
> --
> Open Source and Standards @ Red Hat
> 
> twitter.com/realjustinclift
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Vijaikumar M

Hi Joseph,

In the log mentioned below, it says the ping-timeout is set to the default value 
of 30sec. I think the issue is different.
Can you please point me to the logs where you were able to re-create 
the problem?


Thanks,
Vijay



On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:

hi Vijai, Joseph,
 In 2 of the last 3 build failures, 
http://build.gluster.org/job/regression/4479/console, 
http://build.gluster.org/job/regression/4478/console this 
test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert 
this test until the fix is available? Please send a patch to revert the test 
case if you guys feel so. You can re-submit it along with the fix to the bug 
mentioned by Joseph.

Pranith.

- Original Message -

From: "Joseph Fernandes" 
To: "Pranith Kumar Karampuri" 
Cc: "Gluster Devel" 
Sent: Friday, 16 May, 2014 5:13:57 PM
Subject: Re: Spurious failures because of nfs and snapshots


Hi All,

tests/bugs/bug-1090042.t :

I was able to reproduce the issue i.e when this test is done in a loop

for i in {1..135} ; do  ./bugs/bug-1090042.t

When checked the logs
[2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs
[2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs

The issue is with ping-timeout and is tracked under the bug

https://bugzilla.redhat.com/show_bug.cgi?id=1096729


The workaround is mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8


Regards,
Joe

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Cc: "Joseph Fernandes" 
Sent: Friday, May 16, 2014 6:19:54 AM
Subject: Spurious failures because of nfs and snapshots

hi,
 In the latest build I fired for review.gluster.com/7766
 (http://build.gluster.org/job/regression/4443/console) failed because of
 spurious failure. The script doesn't wait for nfs export to be
 available. I fixed that, but interestingly I found quite a few scripts
 with same problem. Some of the scripts are relying on 'sleep 5' which
 also could lead to spurious failures if the export is not available in 5
 seconds. We found that waiting for 20 seconds is better, but 'sleep 20'
 would unnecessarily delay the build execution. So if you guys are going
 to write any scripts which has to do nfs mounts, please do it the
 following way:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot related script
tests/bugs/bug-1090042.t on the next build fired by Niels.
Joesph (CCed) is debugging it. He agreed to reply what he finds and share it
with us so that we won't introduce similar bugs in future.

I encourage you guys to share what you fix to prevent spurious failures in
future.

Thanks
Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-18 Thread Pranith Kumar Karampuri
The latest build failure also has the same issue:
Download it from here:
http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz

Pranith

- Original Message -
> From: "Vijaikumar M" 
> To: "Joseph Fernandes" 
> Cc: "Pranith Kumar Karampuri" , "Gluster Devel" 
> 
> Sent: Monday, 19 May, 2014 11:41:28 AM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> Hi Joseph,
> 
> In the log mentioned below, it say ping-time is set to default value
> 30sec.I think issue is different.
> Can you please point me to the logs where you where able to re-create
> the problem.
> 
> Thanks,
> Vijay
> 
> 
> 
> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> > hi Vijai, Joseph,
> >  In 2 of the last 3 build failures,
> >  http://build.gluster.org/job/regression/4479/console,
> >  http://build.gluster.org/job/regression/4478/console this
> >  test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better
> >  to revert this test until the fix is available? Please send a patch
> >  to revert the test case if you guys feel so. You can re-submit it
> >  along with the fix to the bug mentioned by Joseph.
> >
> > Pranith.
> >
> > - Original Message -
> >> From: "Joseph Fernandes" 
> >> To: "Pranith Kumar Karampuri" 
> >> Cc: "Gluster Devel" 
> >> Sent: Friday, 16 May, 2014 5:13:57 PM
> >> Subject: Re: Spurious failures because of nfs and snapshots
> >>
> >>
> >> Hi All,
> >>
> >> tests/bugs/bug-1090042.t :
> >>
> >> I was able to reproduce the issue i.e when this test is done in a loop
> >>
> >> for i in {1..135} ; do  ./bugs/bug-1090042.t
> >>
> >> When checked the logs
> >> [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
> >> 0-management: setting frame-timeout to 600
> >> [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
> >> 0-management: defaulting ping-timeout to 30secs
> >> [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
> >> 0-management: setting frame-timeout to 600
> >> [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
> >> 0-management: defaulting ping-timeout to 30secs
> >>
> >> The issue is with ping-timeout and is tracked under the bug
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> >>
> >>
> >> The workaround is mentioned in
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> >>
> >>
> >> Regards,
> >> Joe
> >>
> >> - Original Message -
> >> From: "Pranith Kumar Karampuri" 
> >> To: "Gluster Devel" 
> >> Cc: "Joseph Fernandes" 
> >> Sent: Friday, May 16, 2014 6:19:54 AM
> >> Subject: Spurious failures because of nfs and snapshots
> >>
> >> hi,
> >>  In the latest build I fired for review.gluster.com/7766
> >>  (http://build.gluster.org/job/regression/4443/console) failed because
> >>  of
> >>  spurious failure. The script doesn't wait for nfs export to be
> >>  available. I fixed that, but interestingly I found quite a few
> >>  scripts
> >>  with same problem. Some of the scripts are relying on 'sleep 5' which
> >>  also could lead to spurious failures if the export is not available
> >>  in 5
> >>  seconds. We found that waiting for 20 seconds is better, but 'sleep
> >>  20'
> >>  would unnecessarily delay the build execution. So if you guys are
> >>  going
> >>  to write any scripts which has to do nfs mounts, please do it the
> >>  following way:
> >>
> >> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> >> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
> >>
> >> Please review http://review.gluster.com/7773 :-)
> >>
> >> I saw one more spurious failure in a snapshot related script
> >> tests/bugs/bug-1090042.t on the next build fired by Niels.
> >> Joesph (CCed) is debugging it. He agreed to reply what he finds and share
> >> it
> >> with us so that we won't introduce similar bugs in future.
> >>
> >> I encourage you guys to share what you fix to prevent spurious failures in
> >> future.
> >>
> >> Thanks
> >> Pranith
> >>
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-19 Thread Vijaikumar M

Brick disconnected due to ping-timeout:

Here is the log message
[2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 
0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s 
build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 
1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 
-p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f 
bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid 
-S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name 
/var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l 
/build/install/var/log/glusterfs/br 
icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log 
--xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
7b4b0 --brick-port 49164 --xlator-option 
3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
  2 [2014-05-19 04:29:38.141118] I 
[rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting 
ping-timeout to 30secs
  3 [2014-05-19 04:30:09.139521] C 
[rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 
10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.




Patch 'http://review.gluster.org/#/c/7753/' will fix the problem: the 
ping timer will be disabled by default for all RPC connections except 
glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).



Thanks,
Vijay


On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:

The latest build failure also has the same issue:
Download it from here:
http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz

Pranith

- Original Message -

From: "Vijaikumar M" 
To: "Joseph Fernandes" 
Cc: "Pranith Kumar Karampuri" , "Gluster Devel" 

Sent: Monday, 19 May, 2014 11:41:28 AM
Subject: Re: Spurious failures because of nfs and snapshots

Hi Joseph,

In the log mentioned below, it say ping-time is set to default value
30sec.I think issue is different.
Can you please point me to the logs where you where able to re-create
the problem.

Thanks,
Vijay



On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:

hi Vijai, Joseph,
  In 2 of the last 3 build failures,
  http://build.gluster.org/job/regression/4479/console,
  http://build.gluster.org/job/regression/4478/console this
  test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better
  to revert this test until the fix is available? Please send a patch
  to revert the test case if you guys feel so. You can re-submit it
  along with the fix to the bug mentioned by Joseph.

Pranith.

- Original Message -

From: "Joseph Fernandes" 
To: "Pranith Kumar Karampuri" 
Cc: "Gluster Devel" 
Sent: Friday, 16 May, 2014 5:13:57 PM
Subject: Re: Spurious failures because of nfs and snapshots


Hi All,

tests/bugs/bug-1090042.t :

I was able to reproduce the issue i.e when this test is done in a loop

for i in {1..135} ; do  ./bugs/bug-1090042.t

When checked the logs
[2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs
[2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs

The issue is with ping-timeout and is tracked under the bug

https://bugzilla.redhat.com/show_bug.cgi?id=1096729


The workaround is mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8


Regards,
Joe

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Cc: "Joseph Fernandes" 
Sent: Friday, May 16, 2014 6:19:54 AM
Subject: Spurious failures because of nfs and snapshots

hi,
  In the latest build I fired for review.gluster.com/7766
  (http://build.gluster.org/job/regression/4443/console) failed because
  of
  spurious failure. The script doesn't wait for nfs export to be
  available. I fixed that, but interestingly I found quite a few
  scripts
  with same problem. Some of the scripts are relying on 'sleep 5' which
  also could lead to spurious failures if the export is not available
  in 5
  seconds. We found that waiting for 20 seconds is better, but 'sleep
  20'
  would unnecessarily delay the build execution. So if you guys are
  going
  to write any scripts which has to do nfs mounts, please do it the
  following way:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot related script
tests/b

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-19 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Justin Clift" 
> Cc: "Gluster Devel" 
> Sent: Monday, May 19, 2014 11:20:21 AM
> Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> 
> 
> 
> - Original Message -
> > From: "Justin Clift" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Gluster Devel" 
> > Sent: Monday, 19 May, 2014 10:41:03 AM
> > Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> > 
> > On 19/05/2014, at 6:00 AM, Pranith Kumar Karampuri wrote:
> > 
> > > This particular class is eliminated :-). Patch was merged on Friday.
> > 
> > 
> > Excellent.  I've just kicked off 10 instances in Rackspace to each run
> > the regression tests on master head.
> > 
> > Hopefully less than 1/2 of them fail this time.  Has been about 30%
> > pass rate recently. :)
> 
> I am working on one more patch about timeouts at the moment. Will be sending
> it shortly. That should help us manage waiting for timeouts easily.
> With the work kaushal, vijay did for providing logs, core files, we should be
> able to reduce the number of spurious regressions. Because now, we can debug
> them without stopping running of regressions :-).

The patch is now ready for review at http://review.gluster.com/7799. The intention 
of the patch is to make sure similar events wait for the same amount of time, i.e. 
the EXPECT_WITHIN-related ones. So the next time some event stops 
completing within the expected time limit and we want to increase the 
timeout, we only have to change it in one place.
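
A sketch of the idea (the variable names here are illustrative assumptions, not necessarily those used in the patch): the timeouts are defined once in the shared test include file, and every test references the shared variable instead of a literal, so a timeout only ever needs to be bumped in one place.

# in the shared include file:
NFS_EXPORT_TIMEOUT=20
UMOUNT_TIMEOUT=5

# in the individual .t tests:
EXPECT_WITHIN $NFS_EXPORT_TIMEOUT "1" is_nfs_export_available
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0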

Pranith

> 
> Pranith
> > 
> > + Justin
> > 
> > --
> > Open Source and Standards @ Red Hat
> > 
> > twitter.com/realjustinclift
> > 
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Pranith Kumar Karampuri
hi,
Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to 
prevent frequent regression failures.

Pranith
- Original Message -
> From: "Vijaikumar M" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Joseph Fernandes" , "Gluster Devel" 
> 
> Sent: Monday, May 19, 2014 2:40:47 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> Brick disconnected with ping-time out:
> 
> Here is the log message
> [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
> n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
> build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
> 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
> bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
> /build/install/var/log/glusterfs/br
> icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
> --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
> 7b4b0 --brick-port 49164 --xlator-option
> 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
>2 [2014-05-19 04:29:38.141118] I
> [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
> ping-timeout to 30secs
>3 [2014-05-19 04:30:09.139521] C
> [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
> 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> 
> 
> 
> Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where
> ping-timer will be disabled by default for all the rpc connection except
> for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
> 
> 
> Thanks,
> Vijay
> 
> 
> On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> > The latest build failure also has the same issue:
> > Download it from here:
> > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> >
> > Pranith
> >
> > - Original Message -
> >> From: "Vijaikumar M" 
> >> To: "Joseph Fernandes" 
> >> Cc: "Pranith Kumar Karampuri" , "Gluster Devel"
> >> 
> >> Sent: Monday, 19 May, 2014 11:41:28 AM
> >> Subject: Re: Spurious failures because of nfs and snapshots
> >>
> >> Hi Joseph,
> >>
> >> In the log mentioned below, it say ping-time is set to default value
> >> 30sec.I think issue is different.
> >> Can you please point me to the logs where you where able to re-create
> >> the problem.
> >>
> >> Thanks,
> >> Vijay
> >>
> >>
> >>
> >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> >>> hi Vijai, Joseph,
> >>>   In 2 of the last 3 build failures,
> >>>   http://build.gluster.org/job/regression/4479/console,
> >>>   http://build.gluster.org/job/regression/4478/console this
> >>>   test(tests/bugs/bug-1090042.t) failed. Do you guys think it is
> >>>   better
> >>>   to revert this test until the fix is available? Please send a patch
> >>>   to revert the test case if you guys feel so. You can re-submit it
> >>>   along with the fix to the bug mentioned by Joseph.
> >>>
> >>> Pranith.
> >>>
> >>> - Original Message -
>  From: "Joseph Fernandes" 
>  To: "Pranith Kumar Karampuri" 
>  Cc: "Gluster Devel" 
>  Sent: Friday, 16 May, 2014 5:13:57 PM
>  Subject: Re: Spurious failures because of nfs and snapshots
> 
> 
>  Hi All,
> 
>  tests/bugs/bug-1090042.t :
> 
>  I was able to reproduce the issue i.e when this test is done in a loop
> 
>  for i in {1..135} ; do  ./bugs/bug-1090042.t
> 
>  When checked the logs
>  [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>  0-management: setting frame-timeout to 600
>  [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>  0-management: defaulting ping-timeout to 30secs
>  [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>  0-management: setting frame-timeout to 600
>  [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>  0-management: defaulting ping-timeout to 30secs
> 
>  The issue is with ping-timeout and is tracked under the bug
> 
>  https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> 
> 
>  The workaround is mentioned in
>  https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> 
> 
>  Regards,
>  Joe
> 
>  - Original Message -
>  From: "Pranith Kumar Karampuri" 
>  To: "Gluster Devel" 
>  Cc: "Joseph Fernandes" 
>  Sent: Friday, May 16, 2014 6:19:54 AM
>  Subject: Spurious failures because of nfs and snapshots
> 
>  hi,
>    In

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Pranith Kumar Karampuri
Hey,
Seems like even after this fix is merged, the regression tests are failing 
for the same script. You can check the logs at 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Gluster Devel" 
> Cc: "Joseph Fernandes" , "Vijaikumar M" 
> 
> Sent: Tuesday, May 20, 2014 3:41:11 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> hi,
> Please resubmit the patches on top of http://review.gluster.com/#/c/7753
> to prevent frequent regression failures.
> 
> Pranith
> - Original Message -
> > From: "Vijaikumar M" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Joseph Fernandes" , "Gluster Devel"
> > 
> > Sent: Monday, May 19, 2014 2:40:47 PM
> > Subject: Re: Spurious failures because of nfs and snapshots
> > 
> > Brick disconnected with ping-time out:
> > 
> > Here is the log message
> > [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> > 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
> > n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
> > build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
> > 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> > -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
> > bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> > -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> > /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
> > /build/install/var/log/glusterfs/br
> > icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
> > --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
> > 7b4b0 --brick-port 49164 --xlator-option
> > 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
> >2 [2014-05-19 04:29:38.141118] I
> > [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
> > ping-timeout to 30secs
> >3 [2014-05-19 04:30:09.139521] C
> > [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
> > 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> > 
> > 
> > 
> > Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where
> > ping-timer will be disabled by default for all the rpc connection except
> > for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
> > 
> > 
> > Thanks,
> > Vijay
> > 
> > 
> > On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> > > The latest build failure also has the same issue:
> > > Download it from here:
> > > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> > >
> > > Pranith
> > >
> > > - Original Message -
> > >> From: "Vijaikumar M" 
> > >> To: "Joseph Fernandes" 
> > >> Cc: "Pranith Kumar Karampuri" , "Gluster Devel"
> > >> 
> > >> Sent: Monday, 19 May, 2014 11:41:28 AM
> > >> Subject: Re: Spurious failures because of nfs and snapshots
> > >>
> > >> Hi Joseph,
> > >>
> > >> In the log mentioned below, it say ping-time is set to default value
> > >> 30sec.I think issue is different.
> > >> Can you please point me to the logs where you where able to re-create
> > >> the problem.
> > >>
> > >> Thanks,
> > >> Vijay
> > >>
> > >>
> > >>
> > >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> > >>> hi Vijai, Joseph,
> > >>>   In 2 of the last 3 build failures,
> > >>>   http://build.gluster.org/job/regression/4479/console,
> > >>>   http://build.gluster.org/job/regression/4478/console this
> > >>>   test(tests/bugs/bug-1090042.t) failed. Do you guys think it is
> > >>>   better
> > >>>   to revert this test until the fix is available? Please send a
> > >>>   patch
> > >>>   to revert the test case if you guys feel so. You can re-submit it
> > >>>   along with the fix to the bug mentioned by Joseph.
> > >>>
> > >>> Pranith.
> > >>>
> > >>> - Original Message -
> >  From: "Joseph Fernandes" 
> >  To: "Pranith Kumar Karampuri" 
> >  Cc: "Gluster Devel" 
> >  Sent: Friday, 16 May, 2014 5:13:57 PM
> >  Subject: Re: Spurious failures because of nfs and snapshots
> > 
> > 
> >  Hi All,
> > 
> >  tests/bugs/bug-1090042.t :
> > 
> >  I was able to reproduce th

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Vijaikumar M
From the log 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a17%3a10%3a51.tgz 
it looks like glusterd was hung:


Glusterd log:
 5305 [2014-05-20 20:08:55.040665] E 
[glusterd-snapshot.c:3805:glusterd_add_brick_to_snap_volume] 
0-management: Unable to fetch snap device (vol1.brick_snapdevice0). 
Leaving empty
 5306 [2014-05-20 20:08:55.649146] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5307 [2014-05-20 20:08:55.663181] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5308 [2014-05-20 20:16:55.541197] W 
[glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), 
shutting down


Glusterd was hung when executing the testcase ./tests/bugs/bug-1090042.t.

Cli log:
 72649 [2014-05-20 20:12:51.960765] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72650 [2014-05-20 20:12:51.960850] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already 
connected
 72651 [2014-05-20 20:12:52.960943] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72652 [2014-05-20 20:12:52.960999] T [socket.c:2697:socket_connect] 
0-glusterfs: connecting 0x1e0fcc0, state=0 gen=0 sock=-1
 72653 [2014-05-20 20:12:52.961038] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0xf1) 
[0x7ff8ad9ec7d0]))) 0-dict: data is NULL
 72654 [2014-05-20 20:12:52.961070] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0x100) 
[0x7ff8ad9ec7df]))) 0-dict: data is NULL
 72655 [2014-05-20 20:12:52.961079] E 
[name.c:140:client_fill_address_family] 0-glusterfs: 
transport.address-family not specified. Could not guess default value 
from (remote-host:(null) or transport.unix.connect-path:(null)) 
optio   ns
 72656 [2014-05-20 20:12:54.961273] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72657 [2014-05-20 20:12:54.961404] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already 
connected
 72658 [2014-05-20 20:12:55.120645] D [cli-cmd.c:384:cli_cmd_submit] 
0-cli: Returning 110
 72659 [2014-05-20 20:12:55.120723] D 
[cli-rpc-ops.c:8716:gf_cli_snapshot] 0-cli: Returning 110



Now we need to find out why glusterd was hung.


Thanks,
Vijay



On Wednesday 21 May 2014 06:46 AM, Pranith Kumar Karampuri wrote:

Hey,
 Seems like even after this fix is merged, the regression tests are failing 
for the same script. You can check the logs at 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Cc: "Joseph Fernandes" , "Vijaikumar M" 

Sent: Tuesday, May 20, 2014 3:41:11 PM
Subject: Re: Spurious failures because of nfs and snapshots

hi,
 Please resubmit the patches on top of http://review.gluster.com/#/c/7753
 to prevent frequent regression failures.

Pranith
- Original Message -

From: "Vijaikumar M" 
To: "Pranith Kumar Karampuri" 
Cc: "Joseph Fernandes" , "Gluster Devel"

Sent: Monday, May 19, 2014 2:40:47 PM
Subject: Re: Spurious failures because of nfs and snapshots

Brick disconnected with ping-time out:

Here is the log message
[2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
n/

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-21 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Atin Mukherjee" 
> To: gluster-devel@gluster.org, "Pranith Kumar Karampuri" 
> Sent: Wednesday, May 21, 2014 3:39:21 PM
> Subject: Re: Fwd: Re: [Gluster-devel] Spurious failures because of nfs and 
> snapshots
> 
> 
> 
> On 05/21/2014 11:42 AM, Atin Mukherjee wrote:
> > 
> > 
> > On 05/21/2014 10:54 AM, SATHEESARAN wrote:
> >> Guys,
> >>
> >> This is the issue pointed out by Pranith with regard to Barrier.
> >> I was reading through it.
> >>
> >> But I wanted to bring it to concern
> >>
> >> -- S
> >>
> >>
> >>  Original Message 
> >> Subject:   Re: [Gluster-devel] Spurious failures because of nfs and
> >> snapshots
> >> Date:  Tue, 20 May 2014 21:16:57 -0400 (EDT)
> >> From:  Pranith Kumar Karampuri 
> >> To:Vijaikumar M , Joseph Fernandes
> >> 
> >> CC:Gluster Devel 
> >>
> >>
> >>
> >> Hey,
> >> Seems like even after this fix is merged, the regression tests are
> >> failing for the same script. You can check the logs at
> >> 
> >> http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz
> > Pranith,
> > 
> > Is this the correct link? I don't see any log having this sequence there.
> > Also looking at the log from this mail, this is expected as per the
> > barrier functionality, an enable request followed by another enable
> > should always fail and the same happens for disable.
> > 
> > Can you please confirm the link and which particular regression test is
> > causing this issue, is it bug-1090042.t?
> > 
> > --Atin
> >>
> >> Relevant logs:
> >> [2014-05-20 20:17:07.026045]  : volume create patchy
> >> build.gluster.org:/d/backends/patchy1
> >> build.gluster.org:/d/backends/patchy2 : SUCCESS
> >> [2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
> >> [2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
> >> [2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED :
> >> Failed to reconfigure barrier.
> >> [2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
> >> [2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED :
> >> Failed to reconfigure barrier.
> >>
> This log is for bug-1092841.t and its expected.

Damn :-(. I think I screwed up the timestamps while checking. Sorry about 
that :-(. But there are failures. Check 
http://build.gluster.org/job/regression/4501/consoleFull

Pranith

> 
> --Atin
> >> Pranith
> >>
> >> - Original Message -
> >>> From: "Pranith Kumar Karampuri" 
> >>> To: "Gluster Devel" 
> >>> Cc: "Joseph Fernandes" , "Vijaikumar M"
> >>> 
> >>> Sent: Tuesday, May 20, 2014 3:41:11 PM
> >>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>
> >>> hi,
> >>> Please resubmit the patches on top of
> >>> http://review.gluster.com/#/c/7753
> >>> to prevent frequent regression failures.
> >>>
> >>> Pranith
> >>> - Original Message -
> >>>> From: "Vijaikumar M" 
> >>>> To: "Pranith Kumar Karampuri" 
> >>>> Cc: "Joseph Fernandes" , "Gluster Devel"
> >>>> 
> >>>> Sent: Monday, May 19, 2014 2:40:47 PM
> >>>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>>
> >>>> Brick disconnected with ping-time out:
> >>>>
> >>>> Here is the log message
> >>>> [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> >>>> 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
> >>>> n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
> >>>> build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
> >>>> 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> >>>> -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
> >>>> bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> >>>> -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> >>>> /var/run/glu

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-21 Thread Vijaikumar M
KP, Atin and I did some debugging and found that there was a 
deadlock in glusterd.

When creating a volume snapshot, the back-end operations 'taking an 
LVM snapshot and starting the brick' for each brick
are executed in parallel using the synctask framework.

brick_start was releasing the big_lock around brick_connect and then taking the 
lock again.
This caused a deadlock in a race condition where the main thread was waiting 
for one of the synctask threads to finish while the
synctask thread was waiting for the big_lock.

We are working on fixing this issue.

Thanks,
Vijay


On Wednesday 21 May 2014 12:23 PM, Vijaikumar M wrote:
From the log 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a17%3a10%3a51.tgz 
it looks like glusterd was hung:


Glusterd log:
 5305 [2014-05-20 20:08:55.040665] E 
[glusterd-snapshot.c:3805:glusterd_add_brick_to_snap_volume] 
0-management: Unable to fetch snap device (vol1.brick_snapdevice0). 
Leaving empty
 5306 [2014-05-20 20:08:55.649146] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5307 [2014-05-20 20:08:55.663181] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5308 [2014-05-20 20:16:55.541197] W 
[glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), 
shutting down


Glusterd was hung when executing the testcase ./tests/bugs/bug-1090042.t.

Cli log:
 72649 [2014-05-20 20:12:51.960765] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72650 [2014-05-20 20:12:51.960850] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport 
already connected
 72651 [2014-05-20 20:12:52.960943] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72652 [2014-05-20 20:12:52.960999] T [socket.c:2697:socket_connect] 
0-glusterfs: connecting 0x1e0fcc0, state=0 gen=0 sock=-1
 72653 [2014-05-20 20:12:52.961038] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0xf1) 
[0x7ff8ad9ec7d0]))) 0-dict: data is NULL
 72654 [2014-05-20 20:12:52.961070] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0x100) 
[0x7ff8ad9ec7df]))) 0-dict: data is NULL
 72655 [2014-05-20 20:12:52.961079] E 
[name.c:140:client_fill_address_family] 0-glusterfs: 
transport.address-family not specified. Could not guess default value 
from (remote-host:(null) or transport.unix.connect-path:(null)) 
optio   ns
 72656 [2014-05-20 20:12:54.961273] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72657 [2014-05-20 20:12:54.961404] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport 
already connected
 72658 [2014-05-20 20:12:55.120645] D [cli-cmd.c:384:cli_cmd_submit] 
0-cli: Returning 110
 72659 [2014-05-20 20:12:55.120723] D 
[cli-rpc-ops.c:8716:gf_cli_snapshot] 0-cli: Returning 110



Now we need to find why glusterd was hung.
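
One way to check where it is stuck (just a suggestion, assuming gdb is 
available on the regression slave) is to dump the thread stacks of the 
running daemon:

# Dump all thread backtraces of the hung glusterd; 'gstack <pid>' gives a
# similar, shorter dump where it is available.
pid=$(pidof glusterd)
gdb -p "$pid" -batch -ex 'thread apply all bt' > /tmp/glusterd-stacks.txt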


Thanks,
Vijay



On Wednesday 21 May 2014 06:46 AM, Pranith Kumar Karampuri wrote:

Hey,
 Seems like even after this fix is merged, the regression tests are failing 
for the same script. You can check the logs 
at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -

From: "Pranith Kumar Karampuri"
To: "Gluster Devel"
Cc: "Joseph Fernandes", "Vijaikumar M"
Sent: Tuesday, May 20, 2014 3:41:11 PM
Subject: Re: Spuriou

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-21 Thread Vijay Bellur

On 05/21/2014 08:50 PM, Vijaikumar M wrote:

KP, Atin and myself did some debugging and found that there was a
deadlock in glusterd.

When creating a volume snapshot, the back-end operation 'taking a
lvm_snapshot and starting brick' for the each brick
are executed in parallel using synctask framework.

brick_start was releasing a big_lock with brick_connect and does a lock
again.
This caused a deadlock in some race condition where main-thread waiting
for one of the synctask thread to finish and
synctask-thread waiting for the big_lock.


We are working on fixing this issue.



If this fix is going to take more time, can we please log a bug to track 
this problem and remove the test cases that need to be addressed from 
the test unit? This way other valid patches will not be blocked by the 
failure of the snapshot test unit.


We can introduce these tests again as part of the fix for the problem.

-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-22 Thread Vijaikumar M

I have posted a patch that fixes this issue:
http://review.gluster.org/#/c/7842/

Thanks,
Vijay


On Thursday 22 May 2014 11:35 AM, Vijay Bellur wrote:

On 05/21/2014 08:50 PM, Vijaikumar M wrote:

KP, Atin and myself did some debugging and found that there was a
deadlock in glusterd.

When creating a volume snapshot, the back-end operation 'taking a
lvm_snapshot and starting brick' for the each brick
are executed in parallel using synctask framework.

brick_start was releasing a big_lock with brick_connect and does a lock
again.
This caused a deadlock in some race condition where main-thread waiting
for one of the synctask thread to finish and
synctask-thread waiting for the big_lock.


We are working on fixing this issue.



If this fix is going to take more time, can we please log a bug to 
track this problem and remove the test cases that need to be addressed 
from the test unit? This way other valid patches will not be blocked 
by the failure of the snapshot test unit.


We can introduce these tests again as part of the fix for the problem.

-Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri

hi Avra/Rajesh,
Any update on this test?

 * tests/basic/volume-snapshot-clone.t

 * http://review.gluster.org/#/c/10053/

 * Came back on April 9

 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1112559.t

2015-05-04 Thread Pranith Kumar Karampuri

Avra,
  Is it reproducible on your setup? If not, do you want to move it 
to the end of the page in 
https://public.pad.fsfe.org/p/gluster-spurious-failures


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Avra Sengupta

Hi,

As already discussed, if you encounter failures in this or any other snapshot 
tests, it would be great to provide the regression run instance so that we can 
have a look at the logs if there are any. Also, I tried running the test 
in a loop as you suggested; after an hour and a half I stopped it so 
that I could use my machines to work on some patches. So please let us 
know when this or any other snapshot test fails for anyone and we will look 
into it asap.
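
For reference, the loop was along these lines (only a sketch; it assumes the 
regression prerequisites are installed, and it stops at the first failing run 
so the logs are preserved):

i=0
while prove -v ./tests/basic/volume-snapshot-clone.t; do
    i=$((i + 1))
    echo "run $i passed"
done
echo "failed after $i successful runs"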


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 10:32 AM, Avra Sengupta wrote:

Hi,

As already discussed, if you encounter this or any other snapshot 
tests, it would be great to provide the regression run instance so 
that we can have a look at the logs if there are any. Also I tried 
running the test in a loop as you suggested. After an hour and a half 
I stopped it so that I can use my machines to work on some patches. So 
please let us know when this or any snapshot tests fails for anyone 
and we will look into it asap.

Please read the mail again to find the link which has the logs.

./tests/basic/volume-snapshot-clone.t   
(Wstat: 0 Tests: 41 Failed: 3)
  Failed tests:  36, 38, 40



Pranith


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Avra Sengupta

On 05/05/2015 10:43 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 10:32 AM, Avra Sengupta wrote:

Hi,

As already discussed, if you encounter this or any other snapshot 
tests, it would be great to provide the regression run instance so 
that we can have a look at the logs if there are any. Also I tried 
running the test in a loop as you suggested. After an hour and a half 
I stopped it so that I can use my machines to work on some patches. 
So please let us know when this or any snapshot tests fails for 
anyone and we will look into it asap.

Please read the mail again to find the link which has the logs.
./tests/basic/volume-snapshot-clone.t   
(Wstat: 0 Tests: 41 Failed: 3)
   Failed tests:  36, 38, 40
As mentioned repeatedly, the older regression runs don't have the logs any 
more. Please find the link and try to fetch the logs. Please tell me if I am 
missing something here.


[root@VM1 lab]# wget 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz .
--2015-05-05 10:47:18-- 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz

Resolving slave33.cloud.gluster.org... 104.130.217.7
Connecting to slave33.cloud.gluster.org|104.130.217.7|:80... failed: 
Connection refused.

--2015-05-05 10:47:19--  http://./
Resolving  failed: No address associated with hostname.
wget: unable to resolve host address “.”
[root@VM1 lab]#

Regards,
Avra



Pranith


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith






___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 10:48 AM, Avra Sengupta wrote:

On 05/05/2015 10:43 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 10:32 AM, Avra Sengupta wrote:

Hi,

As already discussed, if you encounter this or any other snapshot 
tests, it would be great to provide the regression run instance so 
that we can have a look at the logs if there are any. Also I tried 
running the test in a loop as you suggested. After an hour and a 
half I stopped it so that I can use my machines to work on some 
patches. So please let us know when this or any snapshot tests fails 
for anyone and we will look into it asap.

Please read the mail again to find the link which has the logs.
./tests/basic/volume-snapshot-clone.t   
(Wstat: 0 Tests: 41 Failed: 3)
   Failed tests:  36, 38, 40
As repeatedly told, older regression run doesn't have the logs any 
more. Please find the link and try and fetch the logs. Please tell me 
if I am missing something here.


[root@VM1 lab]# wget 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz 
.
--2015-05-05 10:47:18-- 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz

Resolving slave33.cloud.gluster.org... 104.130.217.7
Connecting to slave33.cloud.gluster.org|104.130.217.7|:80... failed: 
Connection refused.

--2015-05-05 10:47:19-- http://./
Resolving  failed: No address associated with hostname.
wget: unable to resolve host address “.”
[root@VM1 lab]#

Ah! my bad, will let you know if it happens again.

Pranith


Regards,
Avra



Pranith


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith








___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t

2015-07-09 Thread Pranith Kumar Karampuri

hi,
  Could you please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull 



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t

2016-08-31 Thread Susant Palai
Hi,
 $subject is failing spuriously for one of my patches.
One of the failing test cases is: EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" 
afr_child_up_status $V0 0
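
For context, EXPECT_WITHIN keeps re-running the given check until its output 
matches the expected value or the timeout expires. A simplified sketch of that 
behaviour (not the exact helper from tests/include.rc) would be:

expect_within_sketch () {
    local timeout=$1 expected=$2
    shift 2
    local end=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -le "$end" ]; do
        # "$@" is the check, e.g.: afr_child_up_status $V0 0
        [ "$("$@")" = "$expected" ] && return 0
        sleep 1
    done
    return 1    # the harness then reports that test line as "not ok"
}
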
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Poornima Gurusiddaiah
Hi, 

I see these test cases failing spuriously, 

./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull
 
./tests/bugs/distribute/bug-860663.t Failed Test: 13 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull
 

Could anyone from Quota and dht look into it? 
Regards, 
Poornima 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-01 Thread Pranith Kumar Karampuri

hi,
 As per the etherpad: 
https://public.pad.fsfe.org/p/gluster-spurious-failures


 * tests/basic/afr/sparse-file-self-heal.t (Wstat: 0 Tests: 64 Failed: 35)

 * Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

 * Happens in master (Mon 30th March - git commit id
   3feaf1648528ff39e23748ac9004a77595460c9d)

 * (hasn't yet been added to BZs)

If glusterd itself fails to come up, of course the test will fail :-). 
Is it still happening?


Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t

2015-07-09 Thread Pranith Kumar Karampuri

Sorry, seems like this is already fixed, I just need to rebase.

Pranith

On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote:

hi,
  Could you please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull 



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t

2015-07-09 Thread Kaushal M
This doesn't seem to have been fixed completely. My change [1] failed
(again !) on this test [2], even after rebasing onto the fix [3].

[1]: https://review.gluster.org/11559
[2]: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12152/consoleFull
[3]: https://review.gluster.org/11579

On Thu, Jul 9, 2015 at 4:20 PM, Pranith Kumar Karampuri
 wrote:
> Sorry, seems like this is already fixed, I just need to rebase.
>
> Pranith
>
>
> On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote:
>>
>> hi,
>>   Could you please look into
>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull
>>
>> Pranith
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t

2016-08-31 Thread Susant Palai
From the glusterd log:
[2016-08-31 07:54:24.817811] E [run.c:191:runner_log] 
(-->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1c30) 
[0x7f1a34ebac30] 
-->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1794) 
[0x7f1a34eba794] -->/build/install/lib/libglusterfs.so.0(runner_log+0x1ae) 
[0x7f1a3fa15cea] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=patchy 
--first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2016-08-31 07:54:24.819166]:++ 
G_LOG:./tests/basic/afr/root-squash-self-heal.t: TEST: 20 1 afr_child_up_status 
patchy 0 ++

The above is spawned from a "volume start force". I checked the brick logs and 
the killed brick had started successfully.

Links to failures:
 https://build.gluster.org/job/centos6-regression/429/console
 https://build.gluster.org/job/netbsd7-regression/358/consoleFull


Thanks,
Susant

- Original Message -
> From: "Susant Palai" 
> To: "gluster-devel" 
> Sent: Thursday, 1 September, 2016 12:13:01 PM
> Subject: [Gluster-devel] spurious failures for
> ./tests/basic/afr/root-squash-self-heal.t
> 
> Hi,
>  $subject is failing spuriously for one of my patch.
> One of the test case is: EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1"
> afr_child_up_status $V0 0
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Vijaikumar Mallikarjuna
Hi Poornima,

The patches below might solve the regression failure for
'./tests/basic/ec/quota.t':

http://review.gluster.org/#/c/13446/
http://review.gluster.org/#/c/13447/

Thanks,
Vijay


On Tue, Mar 1, 2016 at 4:49 PM, Poornima Gurusiddaiah 
wrote:

> Hi,
>
> I see these test cases failing spuriously,
>
> ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull
>
> ./tests/bugs/distribute/bug-860663.t Failed Test: 13
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull
>
> Could any one from Quota and dht look into it?
>
> Regards,
> Poornima
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Xavier Hernandez

Hi Poornima,

On 01/03/16 12:19, Poornima Gurusiddaiah wrote:

Hi,

I see these test cases failing spuriously,

./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull


This is already solved by http://review.gluster.org/13446/. It has been 
merged just a couple hours ago.


Xavi



./tests/bugs/distribute/bug-860663.t Failed Test: 13
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull

Could any one from Quota and dht look into it?

Regards,
Poornima


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Sakshi Bansal
Hi,

Patch http://review.gluster.org/#/c/10906/ (recently merged) fixes 
./tests/bugs/distribute/bug-860663.t.


- Original Message -
From: "Xavier Hernandez" 
To: "Poornima Gurusiddaiah" , "Gluster Devel" 
, "Manikandan Selvaganesan" , 
"Susant Palai" , "Nithya Balachandran" 
Sent: Tuesday, March 1, 2016 4:57:11 PM
Subject: Re: [Gluster-devel] Spurious failures in ec/quota.t and 
distribute/bug-860663.t

Hi Poornima,

On 01/03/16 12:19, Poornima Gurusiddaiah wrote:
> Hi,
>
> I see these test cases failing spuriously,
>
> ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull

This is already solved by http://review.gluster.org/13446/. It has been 
merged just a couple hours ago.

Xavi

>
> ./tests/bugs/distribute/bug-860663.t Failed Test: 13
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull
>
> Could any one from Quota and dht look into it?
>
> Regards,
> Poornima
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Poornima Gurusiddaiah
Thank you, I have rebased the patch.

Regards,
Poornima

- Original Message -
> From: "Xavier Hernandez" 
> To: "Poornima Gurusiddaiah" , "Gluster Devel" 
> , "Manikandan
> Selvaganesan" , "Susant Palai" , 
> "Nithya Balachandran" 
> Sent: Tuesday, March 1, 2016 4:57:11 PM
> Subject: Re: [Gluster-devel] Spurious failures in ec/quota.t and 
> distribute/bug-860663.t
> 
> Hi Poornima,
> 
> On 01/03/16 12:19, Poornima Gurusiddaiah wrote:
> > Hi,
> >
> > I see these test cases failing spuriously,
> >
> > ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull
> 
> This is already solved by http://review.gluster.org/13446/. It has been
> merged just a couple hours ago.
> 
> Xavi
> 
> >
> > ./tests/bugs/distribute/bug-860663.t Failed Test: 13
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull
> >
> > Could any one from Quota and dht look into it?
> >
> > Regards,
> > Poornima
> >
> >
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t

2016-03-01 Thread Raghavendra Gowdappa


- Original Message -
> From: "Poornima Gurusiddaiah" 
> To: "Gluster Devel" , "Manikandan Selvaganesan" 
> , "Susant Palai"
> , "Nithya Balachandran" 
> Sent: Tuesday, March 1, 2016 4:49:51 PM
> Subject: [Gluster-devel] Spurious failures in ec/quota.t and  
> distribute/bug-860663.t
> 
> Hi,
> 
> I see these test cases failing spuriously,
> 
> ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull
> ./tests/bugs/distribute/bug-860663.t Failed Test: 13
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull

The test which failed is just a umount; I am not sure why it failed:


# Unmount and remount to make sure we're doing fresh lookups.
TEST umount $M0

Alternatively, we can have another fresh mount on, say, $M1, and run the 
subsequent tests there (sketched below). Can you check whether patch [1] fixes 
your issue (push your patch as a dependency of [1])?

[1] http://review.gluster.org/13567
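
A minimal sketch of that alternative (using the usual test-framework variables 
$GFS, $V0, $H0 and $M1):

# Take a second, fresh mount instead of umount/remount on $M0; the later
# lookups in the test would then run against $M1.
TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M1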

> 
> Could any one from Quota and dht look into it?
> Regards,
> Poornima
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-01 Thread Vijay Bellur

On 05/02/2015 08:17 AM, Pranith Kumar Karampuri wrote:

hi,
  As per the etherpad:
https://public.pad.fsfe.org/p/gluster-spurious-failures

  * tests/basic/afr/sparse-file-self-heal.t (Wstat: 0 Tests: 64 Failed: 35)

  * Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

  * Happens in master (Mon 30th March - git commit id
3feaf1648528ff39e23748ac9004a77595460c9d)

  * (hasn't yet been added to BZs)

If glusterd itself fails to come up, of course the test will fail :-).
Is it still happening?



We have not been actively curating this list for the last few days, and I 
am not certain if this failure happens anymore.


Investigating why a regression run fails for our patches and fixing the 
failures (even when they are unrelated to our patch) should be the most 
effective way going ahead.


-Vijay


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-01 Thread Krishnan Parthasarathi

> If glusterd itself fails to come up, of course the test will fail :-). Is it
> still happening?
Pranith,

Did you get a chance to see glusterd logs and find why glusterd didn't come up?
Please paste the relevant logs in this thread.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-02 Thread Pranith Kumar Karampuri


On 05/02/2015 10:14 AM, Krishnan Parthasarathi wrote:

If glusterd itself fails to come up, of course the test will fail :-). Is it
still happening?

Pranith,

Did you get a chance to see glusterd logs and find why glusterd didn't come up?
Please paste the relevant logs in this thread.

No :-(. The etherpad doesn't have any links :-(.
Justin any help here?

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Pranith Kumar Karampuri

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull 
has the logs. Could you please look into it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Joseph Fernandes
Yep will have a look 

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Joseph Fernandes" , "Gluster Devel" 

Sent: Wednesday, July 1, 2015 1:44:44 PM
Subject: spurious failures 
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull
 
has the logs. Could you please look into it.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Pranith Kumar Karampuri

Thanks Joseph!

Pranith

On 07/01/2015 01:59 PM, Joseph Fernandes wrote:

Yep will have a look

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Joseph Fernandes" , "Gluster Devel" 

Sent: Wednesday, July 1, 2015 1:44:44 PM
Subject: spurious failures 
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull
has the logs. Could you please look into it.

Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Joseph Fernandes
aph_init] 0-patchy-server: initializing translator 
failed
[2015-07-01 07:33:25.069808] E [MSGID: 101176] 
[graph.c:669:glusterfs_graph_activate] 0-graph: init failed
[2015-07-01 07:33:25.070183] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: 
received signum (0), shutting down


Looks like it is assigned a port which is already in use.

The status of the volume in glusterd is 'not started'; as a result the 
attach-tier command fails, i.e. the tiering rebalancer cannot run.

[2015-07-01 07:33:25.275092] E [MSGID: 106301] 
[glusterd-op-sm.c:4086:glusterd_op_ac_send_stage_op] 0-management: Staging of 
operation 'Volume Rebalance' failed on localhost : Volume patchy needs to be 
started to perform rebalance

but the volume is running in a crippled mode, so the mount works fine,

i.e. TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0; works fine.

Tests 9-12 failed as the attach failed.


Regards,
Joe

- Original Message -
From: "Joseph Fernandes" 
To: "Pranith Kumar Karampuri" 
Cc: "Gluster Devel" 
Sent: Wednesday, July 1, 2015 1:59:41 PM
Subject: Re: [Gluster-devel] spurious failures 
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

Yep will have a look 

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Joseph Fernandes" , "Gluster Devel" 

Sent: Wednesday, July 1, 2015 1:44:44 PM
Subject: spurious failures 
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull
 
has the logs. Could you please look into it.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Raghavendra Talur
1599:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> [2015-07-01 07:33:25.069774] W [MSGID: 115045] [server.c:996:init]
> 0-patchy-server: creation of listener failed
> [2015-07-01 07:33:25.069788] E [MSGID: 101019] [xlator.c:423:xlator_init]
> 0-patchy-server: Initialization of volume 'patchy-server' failed, review
> your volfile again
> [2015-07-01 07:33:25.069798] E [MSGID: 101066]
> [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator
> failed
> [2015-07-01 07:33:25.069808] E [MSGID: 101176]
> [graph.c:669:glusterfs_graph_activate] 0-graph: init failed
> [2015-07-01 07:33:25.070183] W [glusterfsd.c:1214:cleanup_and_exit] (-->
> 0-: received signum (0), shutting down
>
>
> Looks like it is assigned a port which is already in used.
>

Saw the same error in another test failing for another patch set.
Here is the link:
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11740/consoleFull

A port assigned by Glusterd for a brick is found to be in use already by
the brick. Any recent changes in Glusterd which could cause this?

Or is it a test infra problem?



>
> The status of the volume in glusterd is not started, as a result
> attach-tier command fails, i.e tiering rebalancer cannot run.
>
> [2015-07-01 07:33:25.275092] E [MSGID: 106301]
> [glusterd-op-sm.c:4086:glusterd_op_ac_send_stage_op] 0-management: Staging
> of operation 'Volume Rebalance' failed on localhost : Volume patchy needs
> to be started to perform rebalance
>
> but the volume is running in the crippled mode, as a result mount works
> fine.
>
> i.e TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0; works fine
>
> TEST 9-12 failed as attach has failed.
>
>
> Regards,
> Joe
>
> - Original Message -
> From: "Joseph Fernandes" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Wednesday, July 1, 2015 1:59:41 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>
> Yep will have a look
>
> - Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Joseph Fernandes" , "Gluster Devel" <
> gluster-devel@gluster.org>
> Sent: Wednesday, July 1, 2015 1:44:44 PM
> Subject: spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>
> hi,
>
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull
> has the logs. Could you please look into it.
>
> Pranith
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
*Raghavendra Talur *
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Raghavendra Talur
ng on transport failed
>> [2015-07-01 07:33:25.069774] W [MSGID: 115045] [server.c:996:init]
0-patchy-server: creation of listener failed
>> [2015-07-01 07:33:25.069788] E [MSGID: 101019]
[xlator.c:423:xlator_init] 0-patchy-server: Initialization of volume
'patchy-server' failed, review your volfile again
>> [2015-07-01 07:33:25.069798] E [MSGID: 101066]
[graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator
failed
>> [2015-07-01 07:33:25.069808] E [MSGID: 101176]
[graph.c:669:glusterfs_graph_activate] 0-graph: init failed
>> [2015-07-01 07:33:25.070183] W [glusterfsd.c:1214:cleanup_and_exit] (-->
0-: received signum (0), shutting down
>>
>>
>> Looks like it is assigned a port which is already in used.
>
>
> Saw the same error in another test failing for another patch set.
> Here is the link:
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11740/consoleFull
>
> A port assigned by Glusterd for a brick is found to be in use already by
the brick. Any changes in Glusterd recently which can cause this?
>
> Or is it a test infra problem?

Prasanna is looking into this for now.

>
>
>>
>>
>> The status of the volume in glusterd is not started, as a result
attach-tier command fails, i.e tiering rebalancer cannot run.
>>
>> [2015-07-01 07:33:25.275092] E [MSGID: 106301]
[glusterd-op-sm.c:4086:glusterd_op_ac_send_stage_op] 0-management: Staging
of operation 'Volume Rebalance' failed on localhost : Volume patchy needs
to be started to perform rebalance
>>
>> but the volume is running in the crippled mode, as a result mount works
fine.
>>
>> i.e TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0; works fine
>>
>> TEST 9-12 failed as attach has failed.
>>
>>
>> Regards,
>> Joe
>>
>> - Original Message -
>> From: "Joseph Fernandes" 
>> To: "Pranith Kumar Karampuri" 
>> Cc: "Gluster Devel" 
>> Sent: Wednesday, July 1, 2015 1:59:41 PM
>> Subject: Re: [Gluster-devel] spurious failures
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>>
>> Yep will have a look
>>
>> - Original Message -
>> From: "Pranith Kumar Karampuri" 
>> To: "Joseph Fernandes" , "Gluster Devel" <
gluster-devel@gluster.org>
>> Sent: Wednesday, July 1, 2015 1:44:44 PM
>> Subject: spurious failures
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>>
>> hi,
>>
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull
>> has the logs. Could you please look into it.
>>
>> Pranith
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> --
> Raghavendra Talur
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Krishnan Parthasarathi

> > 
> > A port assigned by Glusterd for a brick is found to be in use already by
> > the brick. Any changes in Glusterd recently which can cause this?
> > 
> > Or is it a test infra problem?

This issue is likely to be caused by http://review.gluster.org/11039
This patch changes the port allocation that happens for rpc_clnt based
connections. Previously, the ports allocated were < 1024. With this change,
these connections (typically mount processes, gluster-nfs server processes,
etc.) could end up using ports that bricks are being assigned to.

IIUC, the intention of the patch was to make server processes lenient to
inbound messages from ports > 1024. If we don't need to use ports > 1024
we could leave the port allocation for rpc_clnt connections as before.
Alternately, we could reserve the range of ports starting from 49152 for bricks
by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific
to Linux; I'm not aware of how this could be done in NetBSD, for instance.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Talur
On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi  wrote:

>
> > >
> > > A port assigned by Glusterd for a brick is found to be in use already
> by
> > > the brick. Any changes in Glusterd recently which can cause this?
> > >
> > > Or is it a test infra problem?
>
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated where < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
>
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't require to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
> specific to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
>


It seems this is exactly what's happening.

I have a question. I get the following data from netstat and grep:

tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
 ESTABLISHED 31516/glusterfsd
tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
 ESTABLISHED 31516/glusterfsd
unix  3  [ ] STREAM CONNECTED 988353   31516/glusterfsd
/var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket

Here 31516 is the brick pid.

Looking at the data, line 2 is very clear: it shows the connection between
the brick and a glusterfs client.
The unix socket on line 3 is also clear: it is the unix socket connection that
glusterd and the brick process use for communication.

I am not able to understand line 1; which part of the brick process established
a TCP connection with glusterd using port 1023?
Note: this data is from a build which does not have the above mentioned
patch.
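
(The data above was collected with something like the following; the exact 
flags are from memory:)

netstat -anp | grep 31516    # all sockets (TCP and unix) owned by the brick pid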

-- 
*Raghavendra Talur *
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Talur
On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
raghavendra.ta...@gmail.com> wrote:

>
>
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi <
> kpart...@redhat.com> wrote:
>
>>
>> > >
>> > > A port assigned by Glusterd for a brick is found to be in use already
>> by
>> > > the brick. Any changes in Glusterd recently which can cause this?
>> > >
>> > > Or is it a test infra problem?
>>
>> This issue is likely to be caused by http://review.gluster.org/11039
>> This patch changes the port allocation that happens for rpc_clnt based
>> connections. Previously, ports allocated where < 1024. With this change,
>> these connections, typically mount process, gluster-nfs server processes
>> etc could end up using ports that bricks are being assigned to.
>>
>> IIUC, the intention of the patch was to make server processes lenient to
>> inbound messages from ports > 1024. If we don't require to use ports >
>> 1024
>> we could leave the port allocation for rpc_clnt connections as before.
>> Alternately, we could reserve the range of ports starting from 49152 for
>> bricks
>> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
>> specific to Linux.
>> I'm not aware of how this could be done in NetBSD for instance though.
>>
>
>
> It seems this is exactly whats happening.
>
> I have a question, I get the following data from netstat and grep
>
> tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
>  ESTABLISHED 31516/glusterfsd
> tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
>  ESTABLISHED 31516/glusterfsd
> unix  3  [ ] STREAM CONNECTED 988353
> 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid.
>
> Looking at the data, line 2 is very clear, it shows connection between
> brick and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
>
> I am not able to understand line 1; which part of brick process
> established a tcp connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
>


The patch which exposed this bug is being reverted till the underlying bug
is also fixed.
You can monitor the revert patches here:
master: http://review.gluster.org/11507
3.7 branch: http://review.gluster.org/11508

Please rebase your patches after the above patches are merged to ensure
that your patches pass regression.



>
> --
> *Raghavendra Talur *
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Prasanna Kalever

This is caused because when bind-insecure is turned on (which is the default 
now), it may happen that a brick is not able to bind to the port assigned by 
Glusterd, for example 49192-49195...
It seems to occur because the rpc_clnt connections are binding to ports in the 
same range, so the brick fails to bind to a port which is already used by 
someone else.

This bug already existed before http://review.gluster.org/#/c/11039/ when 
using rdma, i.e. even previously rdma binds to a port >= 1024 if it cannot 
find a free port < 1024, even when bind-insecure was turned off (ref to commit 
'0e3fd04e'). Since we don't have tests related to rdma we did not discover 
this issue earlier.

http://review.gluster.org/#/c/11039/ exposed the bug we encountered; it can 
now be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt 
pick port numbers starting from 65535 in descending order. As a result port 
clashes are minimized, and it fixes the issues in rdma too.

Thanks to Raghavendra Talur for help in discovering the real cause


Regards,
Prasanna Kalever



- Original Message -
From: "Raghavendra Talur" 
To: "Krishnan Parthasarathi" 
Cc: "Gluster Devel" 
Sent: Thursday, July 2, 2015 6:45:17 PM
Subject: Re: [Gluster-devel] spurious failures  
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t



On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < raghavendra.ta...@gmail.com 
> wrote: 





On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com > 
wrote: 



> > 
> > A port assigned by Glusterd for a brick is found to be in use already by 
> > the brick. Any changes in Glusterd recently which can cause this? 
> > 
> > Or is it a test infra problem? 

This issue is likely to be caused by http://review.gluster.org/11039 
This patch changes the port allocation that happens for rpc_clnt based 
connections. Previously, ports allocated where < 1024. With this change, 
these connections, typically mount process, gluster-nfs server processes 
etc could end up using ports that bricks are being assigned to. 

IIUC, the intention of the patch was to make server processes lenient to 
inbound messages from ports > 1024. If we don't require to use ports > 1024 
we could leave the port allocation for rpc_clnt connections as before. 
Alternately, we could reserve the range of ports starting from 49152 for bricks 
by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific 
to Linux. 
I'm not aware of how this could be done in NetBSD for instance though. 


It seems this is exactly whats happening. 

I have a question, I get the following data from netstat and grep 

tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd 
tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd 
unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd 
/var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket 

Here 31516 is the brick pid. 

Looking at the data, line 2 is very clear, it shows connection between brick 
and glusterfs client. 
unix socket on line 3 is also clear, it is the unix socket connection that 
glusterd and brick process use for communication. 

I am not able to understand line 1; which part of brick process established a 
tcp connection with glusterd using port 1023? 
Note: this data is from a build which does not have the above mentioned patch. 


The patch which exposed this bug is being reverted till the underlying bug is 
also fixed. 
You can monitor revert patches here 
master: http://review.gluster.org/11507 
3.7 branch: http://review.gluster.org/11508 

Please rebase your patches after the above patches are merged to ensure that 
you patches pass regression. 





-- 
Raghavendra Talur 




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Atin Mukherjee
Thanks Prasanna for the patches :)

-Atin
Sent from one plus one
On Jul 2, 2015 9:19 PM, "Prasanna Kalever"  wrote:

>
> This is caused because when bind-insecure is turned on (which is the
> default now), it may happen
> that brick is not able to bind to port assigned by Glusterd for example
> 49192-49195...
> It seems to occur because the rpc_clnt connections are binding to ports in
> the same range.
> so brick fails to bind to a port which is already used by someone else.
>
> This bug already exist before http://review.gluster.org/#/c/11039/ when
> use rdma, i.e. even
> previously rdma binds to port >= 1024 if it cannot find a free port < 1024,
> even when bind insecure was turned off (ref to commit '0e3fd04e').
> Since we don't have tests related to rdma we did not discover this issue
> previously.
>
> http://review.gluster.org/#/c/11039/ discovers the bug we encountered,
> however now the bug can be fixed by
> http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port
> numbers from 65535 in a descending
> order, as a result port clash is minimized, also it fixes issues in rdma
> too
>
> Thanks to Raghavendra Talur for help in discovering the real cause
>
>
> Regards,
> Prasanna Kalever
>
>
>
> - Original Message -
> From: "Raghavendra Talur" 
> To: "Krishnan Parthasarathi" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 6:45:17 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>
>
>
> On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
> raghavendra.ta...@gmail.com > wrote:
>
>
>
>
>
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi <
> kpart...@redhat.com > wrote:
>
>
>
> > >
> > > A port assigned by Glusterd for a brick is found to be in use already
> by
> > > the brick. Any changes in Glusterd recently which can cause this?
> > >
> > > Or is it a test infra problem?
>
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated where < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
>
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't require to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
> specific to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
>
>
> It seems this is exactly whats happening.
>
> I have a question, I get the following data from netstat and grep
>
> tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd
> tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd
> unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid.
>
> Looking at the data, line 2 is very clear, it shows connection between
> brick and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
>
> I am not able to understand line 1; which part of brick process
> established a tcp connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
>
>
> The patch which exposed this bug is being reverted till the underlying bug
> is also fixed.
> You can monitor revert patches here
> master: http://review.gluster.org/11507
> 3.7 branch: http://review.gluster.org/11508
>
> Please rebase your patches after the above patches are merged to ensure
> that you patches pass regression.
>
>
>
>
>
> --
> Raghavendra Talur
>
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Gowdappa
I've reverted [1], which made allow-insecure on by default. The patch seems to 
have issues which will be addressed and merged later. The revert can be found 
at [2].

[1] http://review.gluster.org/11274
[2] http://review.gluster.org/11507

Please let me know if the regressions are still failing.

regards,
Raghavendra.


- Original Message -
> From: "Atin Mukherjee" 
> To: "Prasanna Kalever" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 9:41:33 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> 
> 
> 
> Thanks Prasanna for the patches :)
> 
> -Atin
> Sent from one plus one
> On Jul 2, 2015 9:19 PM, "Prasanna Kalever" < pkale...@redhat.com > wrote:
> 
> 
> 
> This is caused because when bind-insecure is turned on (which is the default
> now), it may happen
> that brick is not able to bind to port assigned by Glusterd for example
> 49192-49195...
> It seems to occur because the rpc_clnt connections are binding to ports in
> the same range.
> so brick fails to bind to a port which is already used by someone else.
> 
> This bug already exist before http://review.gluster.org/#/c/11039/ when use
> rdma, i.e. even
> previously rdma binds to port >= 1024 if it cannot find a free port < 1024,
> even when bind insecure was turned off (ref to commit '0e3fd04e').
> Since we don't have tests related to rdma we did not discover this issue
> previously.
> 
> http://review.gluster.org/#/c/11039/ discovers the bug we encountered,
> however now the bug can be fixed by
> http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers
> from 65535 in a descending
> order, as a result port clash is minimized, also it fixes issues in rdma too
> 
> Thanks to Raghavendra Talur for help in discovering the real cause
> 
> 
> Regards,
> Prasanna Kalever
> 
> 
> 
> - Original Message -
> From: "Raghavendra Talur" < raghavendra.ta...@gmail.com >
> To: "Krishnan Parthasarathi" < kpart...@redhat.com >
> Cc: "Gluster Devel" < gluster-devel@gluster.org >
> Sent: Thursday, July 2, 2015 6:45:17 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> 
> 
> 
> On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
> raghavendra.ta...@gmail.com > wrote:
> 
> 
> 
> 
> 
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com
> > wrote:
> 
> 
> 
> > > 
> > > A port assigned by Glusterd for a brick is found to be in use already by
> > > the brick. Any changes in Glusterd recently which can cause this?
> > > 
> > > Or is it a test infra problem?
> 
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated where < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
> 
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't require to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific
> to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
> 
> 
> It seems this is exactly whats happening.
> 
> I have a question, I get the following data from netstat and grep
> 
> tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd
> tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd
> unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
> 
> Here 31516 is the brick pid.
> 
> Looking at the data, line 2 is very clear, it shows connection between brick
> and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
> 
> I am not able to understand line 1; which part of brick process established a
> tcp connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
> 
> 
> The patch which exposed this bug is being reverted till the underlying bug is
> also fixed.
> You 

Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Krishnan Parthasarathi
 
> It seems this is exactly whats happening.
> 
> I have a question, I get the following data from netstat and grep
> 
> tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
>  ESTABLISHED 31516/glusterfsd
> tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
>  ESTABLISHED 31516/glusterfsd
> unix  3  [ ] STREAM CONNECTED 988353   31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
> 
> Here 31516 is the brick pid.
> 
> Looking at the data, line 2 is very clear, it shows connection between
> brick and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
> 
> I am not able to understand line 1; which part of brick process established
> a tcp connection with glusterd using port 1023?

This is the rpc connection from any glusterfs(d) process to glusterd, used to
fetch the volfile on receiving a notification from glusterd.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Krishnan Parthasarathi


- Original Message -
> 
> This is caused because when bind-insecure is turned on (which is the default
> now), it may happen that a brick is not able to bind to the port assigned by
> glusterd, for example 49192-49195...
> It seems to occur because the rpc_clnt connections are binding to ports in
> the same range, so the brick fails to bind to a port that is already in use
> by someone else.
> 
> This bug already existed before http://review.gluster.org/#/c/11039/ when
> using rdma, i.e. even previously rdma would bind to a port >= 1024 if it
> could not find a free port < 1024, even when bind-insecure was turned off
> (ref commit '0e3fd04e').
> Since we don't have tests related to rdma, we did not discover this issue
> previously.
> 
> http://review.gluster.org/#/c/11039/ exposed the bug we encountered; it can
> now be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt
> pick port numbers from 65535 in descending order. As a result, port clashes
> are minimized, and it fixes the rdma issue too.

This approach could still surprise the storage-admin when glusterfs(d) processes
bind to ports in the range where brick ports are being assigned. We should make
this predictable by reserving the brick ports via net.ipv4.ip_local_reserved_ports:
initially reserve 50 ports starting at 49152, then reserve more on demand, say
50 ports at a time, whenever the previously reserved range is exhausted.
net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation
behaviour, i.e. when the socket binds to a port other than zero. With this option
we don't have to manage port assignment at a process level. Thoughts?
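
To make the sysctl idea concrete, here is a minimal C sketch (not GlusterFS
code; the range and the error handling are illustrative assumptions) that
reserves the first 50 brick ports, equivalent to running
"sysctl -w net.ipv4.ip_local_reserved_ports=49152-49201" as root:

/* Minimal sketch, not GlusterFS code: reserve the brick port range so the
 * kernel's automatic port selection skips it. Assumes Linux and root
 * privileges; equivalent to
 *   sysctl -w net.ipv4.ip_local_reserved_ports=49152-49201
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

static int reserve_ports(const char *range)
{
    FILE *fp = fopen("/proc/sys/net/ipv4/ip_local_reserved_ports", "w");
    int ret = -1;

    if (!fp) {
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        return -1;
    }
    /* Writing replaces the whole list, so a real implementation would read
     * the current value first and merge with it. */
    if (fprintf(fp, "%s\n", range) > 0)
        ret = 0;
    fclose(fp);
    return ret;
}

int main(void)
{
    /* Initial reservation of 50 ports starting at 49152, as proposed above. */
    return reserve_ports("49152-49201") ? 1 : 0;
}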



Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Atin Mukherjee


On 07/03/2015 11:58 AM, Krishnan Parthasarathi wrote:
> 
> 
> - Original Message -
>>
>> This is caused because when bind-insecure is turned on (which is the default
>> now), it may happen that a brick is not able to bind to the port assigned by
>> glusterd, for example 49192-49195...
>> It seems to occur because the rpc_clnt connections are binding to ports in
>> the same range, so the brick fails to bind to a port that is already in use
>> by someone else.
>>
>> This bug already existed before http://review.gluster.org/#/c/11039/ when
>> using rdma, i.e. even previously rdma would bind to a port >= 1024 if it
>> could not find a free port < 1024, even when bind-insecure was turned off
>> (ref commit '0e3fd04e').
>> Since we don't have tests related to rdma, we did not discover this issue
>> previously.
>>
>> http://review.gluster.org/#/c/11039/ exposed the bug we encountered; it can
>> now be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt
>> pick port numbers from 65535 in descending order. As a result, port clashes
>> are minimized, and it fixes the rdma issue too.
> 
> This approach could still surprise the storage-admin when glusterfs(d) processes
> bind to ports in the range where brick ports are being assigned. We should make
> this predictable by reserving the brick ports via net.ipv4.ip_local_reserved_ports:
> initially reserve 50 ports starting at 49152, then reserve more on demand, say
> 50 ports at a time, whenever the previously reserved range is exhausted.
> net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation
> behaviour, i.e. when the socket binds to a port other than zero. With this option
> we don't have to manage port assignment at a process level. Thoughts?
If the reallocation can be done on demand, I do think this is a better
approach to tackle this problem.
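
If it helps the discussion, a rough sketch of the on-demand part could look like
the following (again not GlusterFS code; the ranges are illustrative). The
read-then-write step matters because writing this sysctl replaces the previous
list rather than appending to it:

/* Sketch only: extend the reserved range by another block of ports once the
 * current block is exhausted. Writing /proc/sys/net/ipv4/ip_local_reserved_ports
 * replaces the existing list, so read it first and append the new range.
 * Assumes Linux and root privileges. */
#include <stdio.h>
#include <string.h>

#define RESERVED_PATH "/proc/sys/net/ipv4/ip_local_reserved_ports"

static int extend_reservation(int start, int count)
{
    char current[4096] = "";
    FILE *fp = fopen(RESERVED_PATH, "r");

    if (fp) {
        if (fgets(current, sizeof(current), fp))
            current[strcspn(current, "\n")] = '\0';
        fclose(fp);
    }

    fp = fopen(RESERVED_PATH, "w");
    if (!fp)
        return -1;
    if (current[0] != '\0')
        fprintf(fp, "%s,%d-%d\n", current, start, start + count - 1);
    else
        fprintf(fp, "%d-%d\n", start, start + count - 1);
    fclose(fp);
    return 0;
}

int main(void)
{
    /* e.g. reserve 49202-49251 once 49152-49201 is exhausted */
    return extend_reservation(49202, 50) ? 1 : 0;
}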

-- 
~Atin


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-03 Thread Krishnan Parthasarathi

> > This approach could still surprise the storage-admin when glusterfs(d) processes
> > bind to ports in the range where brick ports are being assigned. We should make
> > this predictable by reserving the brick ports via net.ipv4.ip_local_reserved_ports:
> > initially reserve 50 ports starting at 49152, then reserve more on demand, say
> > 50 ports at a time, whenever the previously reserved range is exhausted.
> > net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation
> > behaviour, i.e. when the socket binds to a port other than zero. With this option
> > we don't have to manage port assignment at a process level. Thoughts?
> If the reallocation can be done on demand, I do think this is a better
> approach to tackle this problem.

We could fix the predictability aspect in a different patch. This patch, where
we assign ports starting from 65535 in descending order, can be reviewed
independently.
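
For reference, the descending allocation being discussed amounts to something
like the sketch below. This is an illustration only, not the actual rpc_clnt
change from review 11512; the floor of 1024 and the error handling are
assumptions:

/* Illustrative sketch only, not the actual rpc_clnt code: bind a client
 * socket to the highest free local port, probing downwards from 65535, so
 * that clashes with brick ports assigned near 49152 become less likely. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

static int bind_descending(int sock, int ceiling, int floor)
{
    struct sockaddr_in addr;
    int port;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    for (port = ceiling; port >= floor; port--) {
        addr.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return port;                /* got a free local port */
        if (errno != EADDRINUSE && errno != EACCES)
            break;                      /* unexpected error, give up */
    }
    return -1;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int port;

    if (sock < 0)
        return 1;
    port = bind_descending(sock, 65535, 1024);
    if (port < 0) {
        perror("bind");
        close(sock);
        return 1;
    }
    printf("bound to local port %d\n", port);
    /* connect() to the server, e.g. glusterd on 24007, would follow here */
    close(sock);
    return 0;
}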