Re: [Gluster-devel] spurious regression errors getting worse

2015-11-09 Thread Atin Mukherjee
-Atin
Sent from one plus one
On Nov 9, 2015 7:11 PM, "Dan Lambright"  wrote:
>
>
>
> - Original Message -
> > From: "Niels de Vos" 
> > To: "Atin Mukherjee" , "Rajesh Joseph" <rjos...@redhat.com>
> > Cc: "Dan Lambright" , "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Monday, November 9, 2015 7:39:28 AM
> > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> >
> > On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> > >
> > >
> > > On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > > > It seems to have become more difficult in the last week to pass
> > > > regression tests.
> > > >
> > > > I've started recording the tests that seem to be failing the most:
> > > >
> > > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > > bug-1238706-daemons-stop-on-peer-cleanup.t
> > > You shouldn't be worried about
> > > bug-1238706-daemons-stop-on-peer-cleanup.t as it's marked as bad. Also,
> > > the respective failure links for all these tests would help component
> > > owners root-cause the issues. You could probably add all of them here [1].
> >
> > We really need to file bugs for the problems, it allows us to get
> > notifications about problems. Etherpads can be nice to dedicated work,
> > but many of us do not check them regularly.
>
> There does not seem to be a small core set of bugs that fails regularly.
> Rather, the set of spurious bugs seems to be large.
> So we would be filing a lot of bugs. But, it could be done. And is
> probably more effective than a rarely consulted ether pad.
Filing bugs is definitely the next step, even if the etherpad is
maintained; that's my two cents. I just wanted to ensure that we have at
least one placeholder to record all these failures (the etherpad, since that
has been the process for spurious failures until now).
>
>
> >
> > Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this
> > run:
> >
> >
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull
> >
> > I'm not sure if someone is looking into that already?
> >
> > Thanks,
> > Niels
> >
> > >
> > > [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
> > >
> > > Thanks,
> > > Atin
> > > > ./tests/bugs/quota/bug-1235182.t
> > > > ./tests/bugs/distribute/bug-1066798.t
> > > > ./tests/bugs/snapshot/bug-1166197.t
> > > >
> > > > In some cases regression must be run a half dozen times before
> > > > finally passing.
> > > >
> > > > Could the owners of those tests please look into these?
> > > > ___
> > > > Gluster-devel mailing list
> > > > Gluster-devel@gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >

[Gluster-devel] what rpm sub-package do /usr/{libexec, sbin}/gfind_missing_files belong to?

2015-11-09 Thread Kaleb S. KEITHLEY

the in-tree glusterfs.spec(.in) has them immediately following the
geo-rep sub-package, but outside the %if ... %endif.

Are they part of geo-rep? Or something else?

-- 

Kaleb


Re: [Gluster-devel] spurious regression errors getting worse

2015-11-09 Thread Niels de Vos
On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> 
> 
> On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > It seems to have become more difficult in the last week to pass regression 
> > tests.
> > 
> > I've started recording the tests that seem to be failing the most:
> > 
> > bug-1221481-allow-fops-on-dir-split-brain.t
> > bug-1238706-daemons-stop-on-peer-cleanup.t
> You shouldn't be worried about
> bug-1238706-daemons-stop-on-peer-cleanup.t as its marked as bad. Also
> respective failure links for all these tests would help component owners
> to root cause the issues. Probably you could add all of them here [1]

We really need to file bugs for these problems; that allows us to get
notifications about them. Etherpads can be nice for dedicated work,
but many of us do not check them regularly.

Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this
run:

  
https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull

I'm not sure if someone is looking into that already?

Thanks,
Niels

> 
> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
> 
> Thanks,
> Atin
> > ./tests/bugs/quota/bug-1235182.t
> > ./tests/bugs/distribute/bug-1066798.t
> > ./tests/bugs/snapshot/bug-1166197.t
> > 
> > In some cases regression must be run a half dozen times before finally 
> > passing.
> > 
> > Could the owners of those tests please look into these?

Re: [Gluster-devel] spurious regression errors getting worse

2015-11-09 Thread Dan Lambright


- Original Message -
> From: "Niels de Vos" 
> To: "Atin Mukherjee" , "Rajesh Joseph" 
> 
> Cc: "Dan Lambright" , "Gluster Devel" 
> 
> Sent: Monday, November 9, 2015 7:39:28 AM
> Subject: Re: [Gluster-devel] spurious regression errors getting worse
> 
> On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> > 
> > 
> > On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > > It seems to have become more difficult in the last week to pass
> > > regression tests.
> > > 
> > > I've started recording the tests that seem to be failing the most:
> > > 
> > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > bug-1238706-daemons-stop-on-peer-cleanup.t
> > You shouldn't be worried about
> > bug-1238706-daemons-stop-on-peer-cleanup.t as its marked as bad. Also
> > respective failure links for all these tests would help component owners
> > to root cause the issues. Probably you could add all of them here [1]
> 
> We really need to file bugs for the problems, it allows us to get
> notifications about problems. Etherpads can be nice to dedicated work,
> but many of us do not check them regularly.

There does not seem to be a small core set of bugs that fails regularly;
rather, the set of spurious bugs seems to be large.
So we would be filing a lot of bugs. But it could be done, and it is probably
more effective than a rarely consulted etherpad.


> 
> Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this
> run:
> 
>   
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull
> 
> I'm not sure if someone is looking into that already?
> 
> Thanks,
> Niels
> 
> > 
> > [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
> > 
> > Thanks,
> > Atin
> > > ./tests/bugs/quota/bug-1235182.t
> > > ./tests/bugs/distribute/bug-1066798.t
> > > ./tests/bugs/snapshot/bug-1166197.t
> > > 
> > > In some cases regression must be run a half dozen times before finally
> > > passing.
> > > 
> > > Could the owners of those tests please look into these?


[Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Raghavendra Talur
Hi,

While trying to understand how our gerrit+jenkins setup works, I realized
that it could allow bugs to get in.

Currently, our gerrit is set up to have cherry-pick as the submit type. Now
consider a case where:

Dev1 sends a commit B with parent commit A (A is already merged).
Dev2 sends a commit C with parent commit A (A is already merged).

Both the patches get +2 from Jenkins.

Maintainer merges commit B from Dev1.
Another maintainer merges commit C from Dev2.

If the two commits B and C changed code in ways that caused no merge
conflicts but conflicted in logic, then we have a master which has bugs.
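
The scenario above can be sketched in a few lines of Python. This is only an illustrative model, not how gerrit or jenkins work internally: the tree is a plain dict, a patch is a dict of changed keys, and the names quota_limit, reserved and regression_passes are all hypothetical.

```python
# Toy model (all names are hypothetical): a source tree is a dict,
# a patch is a dict of the keys it changes.

def apply_patch(tree, patch):
    """Return a new tree with the patch applied, like a clean cherry-pick."""
    new_tree = dict(tree)
    new_tree.update(patch)
    return new_tree

def has_textual_conflict(patch_a, patch_b):
    """Two patches conflict textually only if they touch the same key."""
    return bool(set(patch_a) & set(patch_b))

def regression_passes(tree):
    """Stand-in for the regression suite: a fixed usage of 90 plus the
    reserved amount must fit inside the quota limit."""
    return tree["reserved"] + 90 <= tree["quota_limit"]

commit_a = {"quota_limit": 100, "reserved": 0}   # already merged parent
patch_b = {"quota_limit": 95}                    # Dev1, tested on top of A
patch_c = {"reserved": 10}                       # Dev2, tested on top of A

# Each patch passes when tested against parent A on its own.
b_alone_ok = regression_passes(apply_patch(commit_a, patch_b))
c_alone_ok = regression_passes(apply_patch(commit_a, patch_c))

# The patches touch different keys, so cherry-picking both is clean.
textual_conflict = has_textual_conflict(patch_b, patch_c)

# Yet the resulting master no longer passes the same check.
master = apply_patch(apply_patch(commit_a, patch_b), patch_c)
master_ok = regression_passes(master)
```

Both patches are green in isolation and merge without conflict, but master_ok ends up False, which is the situation Dev3 would inherit.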

If Dev3 now sends a commit D with re-based master as parent, we have the
following cases:

1. If the bug introduced above is not racy, tests always fail for Dev3 on
commit D. The failing tests would be from the components that commits B
and C changed. Dev3 has no idea how to fix them and has to enlist help
from Dev1 and Dev2.

2. If the bug introduced above is racy, there is a chance that Dev3
escapes this trouble and someone else bears it later. Even if the racy
code is hit and a test fails, Dev3 will probably re-trigger the tests,
given that they failed for a component unrelated to his/her code, and the
bug stays in the code longer.

The most obvious, but impractical, solution to the above problem is to
change the submit type in gerrit to "fast-forward only". That would ensure
that once commit B is merged, Dev2 has to re-base commit C onto commit B
and re-run the tests before it could be merged. It is impractical because
every patch in review would have to be re-based and re-triggered whenever
a patch is merged.

A small modification to the above solution would be to

   - change the submit type to fast-forward only
   - not run any jenkins job on a patch until it gets +2 from reviewers
   - once a +2 is given, run the jenkins job on the patch and automatically
   submit it if the tests pass
   - automatically rebase all patches in review onto the new master and
   mark a conflict if a merge conflict arises
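
As a rough sketch of the gated flow listed above, assuming the same kind of toy model (a tree is a dict, a patch is a dict of changed keys), the queue below tests each approved patch on top of the current master tip before fast-forwarding. The function gated_submit and the sample data are hypothetical; real gerrit/jenkins integration would look quite different.

```python
# Toy model of "test after +2, then fast-forward" (hypothetical names).

def gated_submit(master, approved_patches, run_tests):
    """Process +2'd patches one by one.

    Each patch is rebased onto the current master tip and tested there;
    master only moves forward when the tests pass on that exact tree.
    """
    merged, rejected = [], []
    for name, patch in approved_patches:
        candidate = dict(master)
        candidate.update(patch)          # rebase onto the current tip
        if run_tests(candidate):
            master = candidate           # fast-forward master
            merged.append(name)
        else:
            rejected.append(name)        # goes back to its author
    return master, merged, rejected

def run_tests(tree):
    """Stand-in regression suite: usage of 90 plus reserve must fit the quota."""
    return tree["reserved"] + 90 <= tree["quota_limit"]

commit_a = {"quota_limit": 100, "reserved": 0}
queue = [("B", {"quota_limit": 95}), ("C", {"reserved": 10})]

final_master, merged, rejected = gated_submit(commit_a, queue, run_tests)
```

In this model commit C is tested with B already underneath it, so the logical conflict is caught before it reaches master, and the failure lands on Dev2 rather than on an unrelated Dev3.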

As a side effect of this, a Dev would now be forced to run a complete
regression on their dev machine before sending a patch for review.

Any thoughts on the above solutions or other suggestions?

Thanks,
Raghavendra Talur

Re: [Gluster-devel] spurious regression errors getting worse

2015-11-09 Thread Dan Lambright


- Original Message -
> From: "Niels de Vos" 
> To: "Dan Lambright" 
> Cc: "Gluster Devel" 
> Sent: Monday, November 9, 2015 4:09:08 PM
> Subject: Re: [Gluster-devel] spurious regression errors getting worse
> 
> On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> > It seems to have become more difficult in the last week to pass regression
> > tests.
> > 
> > I've started recording the tests that seem to be failing the most:
> > 
> > bug-1221481-allow-fops-on-dir-split-brain.t
> > bug-1238706-daemons-stop-on-peer-cleanup.t
> > ./tests/bugs/quota/bug-1235182.t
> > ./tests/bugs/distribute/bug-1066798.t
> > ./tests/bugs/snapshot/bug-1166197.t
> > 
> > In some cases regression must be run a half dozen times before finally
> > passing.
> > 
> > Could the owners of those tests please look into these?
> 
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
> failed on
> 
> [16:18:49] ./tests/basic/tier/fops-during-migration-pause.t ..
> not ok 19
> not ok 20
> Failed 2/20 subtests
> [16:18:49]
> 
> Please have a look. Thanks,

Hm. This one most certainly broke due to one of the fixes we merged over the
weekend; it's spurious and snuck through.
Will fix it right away.
Thank you

> Niels
> 


Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Atin Mukherjee
-Atin
Sent from one plus one
On Nov 10, 2015 11:24 AM, "Kaushal M"  wrote:
>
> On Tue, Nov 10, 2015 at 9:24 AM, Raghavendra Gowdappa
>  wrote:
> >
> >
> > - Original Message -
> >> From: "Raghavendra Talur" 
> >> To: "Gluster Devel" 
> >> Sent: Tuesday, November 10, 2015 3:10:34 AM
> >> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
> >>
> >> Hi,
> >>
> >> While trying to understand how our gerrit+jenkins setup works, I realized of
> >> a possibility of allowing bugs to get in.
> >>
> >> Currently, our gerrit is setup to have cherry-pick as the submit type. Now
> >> consider a case where:
> >>
> >> Dev1 sends a commit B with parent commit A(A is already merged).
> >> Dev2 sends a commit C with parent commit A(A is already merged).
> >>
> >> Both the patches get +2 from Jenkins.
> >>
> >> Maintainer merges commit B from Dev1.
> >> Another maintainer merges commit C from Dev2.
> >>
> >> If the two commits B and C changed code which had no merge conflicts but were
> >> conflicting in logic,
> >> then we have a master which has bugs.
> >>
> >> If Dev3 now sends a commit D with re-based master as parent, we have the
> >> following cases:
> >>
> >> 1. If bug introduced above is not racy, we have tests always failing for Dev3
> >> on commit D. Tests that fail would be from components that commit B and C
> >> changed. Dev3 has no idea on how to fix them and has to enlist help from
> >> Dev1 and Dev2.
> >>
> >> 2. If bug introduced above is racy, then there is a probability that Dev3
> >> escapes from this trouble and someone else will bear it later. Even if the
> >> racy code is hit and test fails, Dev3 will probably re-trigger the tests
> >> given that they failed for a component which is not related to his/her code
> >> and the bug stays in code longer.
> >>
> >> The most obvious but not practical solution to the above problem is to change
> >> the submit type in gerrit to "fast-forward only". It would then ensure that
> >> once commit B is merged, Dev2 has to re-base and re-run the tests on commit
> >> C with commit B as parent, before it could be merged. It is not practical
> >> because it will cause all patches in review to get re-based and re-triggered
> >> whenever a patch is merged.
> >>
> >> A little modification to the above solution would be to
> >>
> >>
> >> * change submit type to fast-forward only
> >> * don't run any jenkins job on patches till they get +2 from reviewers
> >> * once a +2 is given, run jenkins job on patch and automatically submit
> >> it if test passes.
> >> * automatically rebase all patches on review with new master and mark
> >> conflict if merge conflict arises.
> >
> > Seems like a good suggestion. How about a slight variation to the above
> > process? Can we run one initial set of regression immediately after
> > submission, but before any reviews? That way reviewers can prioritize those
> > patches that have passed regression over the ones that have failed? Flip side
> > is that minimum two sets of regressions are needed to merge any patch. I am
> > making this suggestion with the assumption that dev/reviewer time is more
> > precious than machine time. Of course, this will have issues with patches
> > that need to get in urgently (user/customer hot fix etc) where time is a
> > constraint. But that can be worked around on a case-by-case basis.
>
> We would still be running smoke, which would catch any very obvious
> mistakes, isn't this enough?
>
> Regarding the initial regression run, would it include the complete
> regression suite or just a subset. If it is the complete set, then it
> would be no different from what we are doing now. If it is a subset,
> then we will need to come up with a subset of the regression suite
> that catches most of obvious mistakes. We've had discussions several
> times (don't remember if it was on the mailing lists, but I have had
> conversations) about doing this. Every time we ended up on the
> question of how we choose the subset, which is where we stopped.
How about running tests/basic/*.t? I know coverage-wise this isn't good
enough, but it's still better than running everything or nothing.
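
To make the proposed subset concrete, here is a small Python sketch that picks only tests sitting directly in tests/basic. The helper in_basic_subset is hypothetical, and ./tests/basic/example.t is a made-up placeholder; the other paths are ones mentioned in this thread.

```python
from pathlib import PurePosixPath

def in_basic_subset(test_path):
    """True for a .t file located directly in tests/basic (no subdirectories),
    which mirrors what the shell glob tests/basic/*.t would select."""
    path = PurePosixPath(test_path)
    return path.suffix == ".t" and str(path.parent) == "tests/basic"

candidates = [
    "./tests/basic/example.t",                            # hypothetical placeholder
    "./tests/basic/tier/fops-during-migration-pause.t",   # one level deeper
    "./tests/bugs/quota/bug-1235182.t",
    "./tests/bugs/distribute/bug-1066798.t",
]

subset = [t for t in candidates if in_basic_subset(t)]
```

Note that under this reading the glob does not descend into subdirectories such as tests/basic/tier/, so those tests would only run in the full regression.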
>
> >
> >>
> >> As a side effect of this, Dev would now be forced to run a complete
> >> regression on dev machine before sending a patch for review.
> >>
> >> Any thoughts on the above solutions or other suggestions?
> >>
> >> Thanks,
> >> Raghavendra Talur

Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Kaushal M
On Tue, Nov 10, 2015 at 9:24 AM, Raghavendra Gowdappa
 wrote:
>
>
> - Original Message -
>> From: "Raghavendra Talur" 
>> To: "Gluster Devel" 
>> Sent: Tuesday, November 10, 2015 3:10:34 AM
>> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
>>
>> Hi,
>>
>> While trying to understand how our gerrit+jenkins setup works, I realized of
>> a possibility of allowing bugs to get in.
>>
>> Currently, our gerrit is setup to have cherry-pick as the submit type. Now
>> consider a case where:
>>
>> Dev1 sends a commit B with parent commit A(A is already merged).
>> Dev2 sends a commit C with parent commit A(A is already merged).
>>
>> Both the patches get +2 from Jenkins.
>>
>> Maintainer merges commit B from Dev1.
>> Another maintainer merges commit C from Dev2.
>>
>> If the two commits B and C changed code which had no merge conflicts but were
>> conflicting in logic,
>> then we have a master which has bugs.
>>
>> If Dev3 now sends a commit D with re-based master as parent, we have the
>> following cases:
>>
>> 1. If bug introduced above is not racy, we have tests always failing for Dev3
>> on commit D. Tests that fail would be from components that commit B and C
>> changed. Dev3 has no idea on how to fix them and has to enlist help from
>> Dev1 and Dev2.
>>
>> 2. If bug introduced above is racy, then there is a probability that Dev3
>> escapes from this trouble and someone else will bear it later. Even if the
>> racy code is hit and test fails, Dev3 will probably re-trigger the tests
>> given that they failed for a component which is not related to his/her code
>> and the bug stays in code longer.
>>
>> The most obvious but not practical solution to the above problem is to change
>> the submit type in gerrit to "fast-forward only". It would then ensure that
>> once commit B is merged, Dev2 has to re-base and re-run the tests on commit
>> C with commit B as parent, before it could be merged. It is not practical
>> because it will cause all patches in review to get re-based and re-triggered
>> whenever a patch is merged.
>>
>> A little modification to the above solution would be to
>>
>>
>> * change submit type to fast-forward only
>> * don't run any jenkins job on patches till they get +2 from reviewers
>> * once a +2 is given, run jenkins job on patch and automatically submit
>> it if test passes.
>> * automatically rebase all patches on review with new master and mark
>> conflict if merge conflict arises.
>
> Seems like a good suggestion. How about a slight variation to the above 
> process? Can we run one initial set of regression immediately after 
> submission, but before any reviews? That way reviewers can prioritize those 
> patches that have passed regression over the ones that have failed? Flip side 
> is that minimum two sets of regressions are needed to merge any patch. I am 
> making this suggestion with the assumption that dev/reviewer time is more 
> precious than machine time. Of course, this will have issues with patches 
> that need to get in urgently (user/customer hot fix etc) where time is a 
> constraint. But that can be worked around on a case-by-case basis.

We would still be running smoke, which would catch any very obvious
mistakes. Isn't this enough?

Regarding the initial regression run, would it include the complete
regression suite or just a subset? If it is the complete set, then it
would be no different from what we are doing now. If it is a subset,
then we will need to come up with a subset of the regression suite
that catches most of the obvious mistakes. We've had discussions several
times (don't remember if it was on the mailing lists, but I have had
conversations) about doing this. Every time we ended up on the
question of how we choose the subset, which is where we stopped.

>
>>
>> As a side effect of this, Dev would now be forced to run a complete
>> regression on dev machine before sending a patch for review.
>>
>> Any thoughts on the above solutions or other suggestions?
>>
>> Thanks,
>> Raghavendra Talur


Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Atin Mukherjee


On 11/10/2015 03:10 AM, Raghavendra Talur wrote:
> Hi,
> 
> While trying to understand how our gerrit+jenkins setup works, I
> realized of a possibility of allowing bugs to get in.
> 
> Currently, our gerrit is setup to have cherry-pick as the submit type.
> Now consider a case where:
> 
> Dev1 sends a commit B with parent commit A(A is already merged).
> Dev2 sends a commit C with parent commit A(A is already merged).
> 
> Both the patches get +2 from Jenkins.
> 
> Maintainer merges commit B from Dev1.
> Another maintainer merges commit C from Dev2.
> 
> If the two commits B and C changed code which had no merge conflicts but
> were conflicting in logic,
> then we have a master which has bugs.
> 
> If Dev3 now sends a commit D with re-based master as parent, we have the
> following cases:
> 
> 1. If bug introduced above is not racy, we have tests always failing for
> Dev3 on commit D. Tests that fail would be from components that commit B
> and C changed. Dev3 has no idea on how to fix them and has to enlist
> help from Dev1 and Dev2.
> 
> 2. If bug introduced above is racy, then there is a probability that
> Dev3 escapes from this trouble and someone else will bear it later. Even
> if the racy code is hit and test fails, Dev3 will probably re-trigger
> the tests given that they failed for a component which is not related to
> his/her code and the bug stays in code longer.
> 
> The most obvious but not practical solution to the above problem is to
> change the submit type in gerrit to "fast-forward only". It would then
> ensure that once commit B is merged, Dev2 has to re-base and re-run the
> tests on commit C with commit B as parent, before it could be merged. It
> is not practical because it will cause all patches in review to get
> re-based and re-triggered whenever a patch is merged.
> 
> A little modification to the above solution would be to 
> 
>   * change submit type to fast-forward only
>   * don't run any jenkins job on patches till they get +2 from reviewers
>   * once a +2 is given, run jenkins job on patch and automatically
> submit it if test passes.
>   * automatically rebase all patches on review with new master and mark
> conflict if merge conflict arises.
The overall idea looks good to me; however, I'd be a bit hesitant to give a
+2 before seeing the regression votes, unless the patch is pretty
straightforward. For me, +1 sounds like a good level at which to trigger
the regression. Once the regression passes, maintainers can +2 a patch
and then merge it manually. And then the 4th point follows.

Thoughts?

~Atin
> 
> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
> 
> Any thoughts on the above solutions or other suggestions?
> 
> Thanks,
> Raghavendra Talur


Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Kaushal M
On Tue, Nov 10, 2015 at 3:10 AM, Raghavendra Talur  wrote:
> Hi,
>
> While trying to understand how our gerrit+jenkins setup works, I realized of
> a possibility of allowing bugs to get in.
>
> Currently, our gerrit is setup to have cherry-pick as the submit type. Now
> consider a case where:
>
> Dev1 sends a commit B with parent commit A(A is already merged).
> Dev2 sends a commit C with parent commit A(A is already merged).
>
> Both the patches get +2 from Jenkins.
>
> Maintainer merges commit B from Dev1.
> Another maintainer merges commit C from Dev2.
>
> If the two commits B and C changed code which had no merge conflicts but
> were conflicting in logic,
> then we have a master which has bugs.
>
> If Dev3 now sends a commit D with re-based master as parent, we have the
> following cases:
>
> 1. If bug introduced above is not racy, we have tests always failing for
> Dev3 on commit D. Tests that fail would be from components that commit B and
> C changed. Dev3 has no idea on how to fix them and has to enlist help from
> Dev1 and Dev2.
>
> 2. If bug introduced above is racy, then there is a probability that Dev3
> escapes from this trouble and someone else will bear it later. Even if the
> racy code is hit and test fails, Dev3 will probably re-trigger the tests
> given that they failed for a component which is not related to his/her code
> and the bug stays in code longer.
>
> The most obvious but not practical solution to the above problem is to
> change the submit type in gerrit to "fast-forward only". It would then
> ensure that once commit B is merged, Dev2 has to re-base and re-run the
> tests on commit C with commit B as parent, before it could be merged. It is
> not practical because it will cause all patches in review to get re-based
> and re-triggered whenever a patch is merged.
>
> A little modification to the above solution would be to
>
> * change submit type to fast-forward only
> * don't run any jenkins job on patches till they get +2 from reviewers
> * once a +2 is given, run jenkins job on patch and automatically submit it if
>   test passes.
> * automatically rebase all patches on review with new master and mark conflict
>   if merge conflict arises.

Have you checked if this is even possible?

>
> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
>
> Any thoughts on the above solutions or other suggestions?
>
> Thanks,
> Raghavendra Talur


Re: [Gluster-devel] Logging framework in Gluster 4.0

2015-11-09 Thread Kaushal M
On Fri, Nov 6, 2015 at 10:45 PM, Atin Mukherjee
 wrote:
> -Atin
> Sent from one plus one
>
>
> On Nov 6, 2015 7:50 PM, "Shyam"  wrote:
>>
>> On 11/06/2015 06:58 AM, Atin Mukherjee wrote:
>>>
>>>
>>>
>>> On 11/06/2015 01:30 PM, Aravinda wrote:


 regards
 Aravinda
 http://aravindavk.in

 On 11/06/2015 12:28 PM, Avra Sengupta wrote:
>
> Hi,
>
> As almost all the components targeted for Gluster 4.0 have moved from
> design phase to implementation phase on some level or another, I feel
> it's time to get some consensus on the logging framework we are going
> to use. Are we going to stick with the message ID formatted logging
> framework in use today, or shall we move on to a better solution.
>>
>>
>> I would prefer to think that we extend the log messages with what we want.
>> Message IDs are there for a purpose as outlined in earlier mails on the same
>> topic, so we [w|s]hould stick with message IDs and add more is how I think
>> of the problem.
>>
>> If we take inspiration from other frameworks, I would say we improve what
>> we log.
>>
>> Where we log is made pluggable in the current framework, although as we
>> add more logging frameworks (like rsyslog or lumberjack etc.) the
>> abstraction provided for plugging in these could improve. This is a
>> contained change in the logging.c file though.
>>
>>
>
> Some good to have features I would like to have in the logging
> framework are:
> 1. Specific distinction between logs of various transactions (or
> operations), that would make it a lot easier to make sense than
> looking at some serialized log dump

 Externally grouping Message IDs may help in this situation.
>>>
>>> Transaction Id is definitely a *must have* requirement when 4.0's theme
>>> is scalability where we are talking about number of servers been
>>> thousands in the configuration with out which analysis of an issue will
>>> be a nightmare. GlusterD 2.0 is definitely going to have logs with txn
>>> ids.
>>
>>
>> +1
>>
>> I thought this was discussed in another thread on the lists, not able to
>> find it, a link would help.
>>
>> The question is, should this be automated by the logging framework, or
>> something each xlator should do?
>>
>> I prefer automating this, but how? (frame pointers? at least on the IO
>> xlator stack) What is glusterd doing for the same?
> As of now we have started using logrus pkg in golang which logs every
> message with its txn id. How it does is what I need to find out looking at
> the pkg code (if you are looking to borrow the similar idea)
>

The logrus[1] package provides structured logging and the ability to build
incremental log contexts. This makes it possible to do things like
message-ids. But IMO the advantage of using structured logging is that
it would kind of remove the need for message-ids.

In structured logging, it is encouraged to keep the log string itself
constant and keep the variable portions as metadata attached to the
log, generally as key-value pairs. This allows us to implement things
like message-ids, but they are not really required. As the log string is
kept constant, it becomes easier to search for individual logs and
document them. For example (not really the best example), currently we
log socket failures in the following format

"readv on some.socket failed (No data available)"

This is hard to search for or parse, as the log string itself changes
when the socket or error changes. If using structured logging, this
log would be in the format

"readv on socket failed socket=some.socket errno=ENODATA"

This is much easier to search and parse.
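
A minimal sketch of the idea, written in plain Python rather than with logrus: the message string stays constant and the variable parts are appended as key=value pairs. The helper format_log is hypothetical and only mimics the output shape shown above; it is not the logrus API.

```python
def format_log(message, **fields):
    """Render a constant message followed by key=value metadata pairs."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{message} {pairs}" if pairs else message

line = format_log("readv on socket failed",
                  socket="some.socket", errno="ENODATA")

# Because the message part never varies, a plain prefix search finds every
# occurrence regardless of which socket or error was involved.
found = line.startswith("readv on socket failed")
```

With this shape, the fields can also be split back out of the line mechanically, which is what makes the format easy to parse.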

Using log-contexts allows attaching certain metadata to all logs in
a context, for example, attaching transaction-ids to all logs of a
transaction. If we were to use log-contexts in the above example, we could
attach the transaction-id to the log,

"readv on socket failed socket=some.socket errno=ENODATA
txn-id="

allowing us to easily track during which transaction the read failed.
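
A log context can be sketched along the same lines. The LogContext class below is a hypothetical illustration in plain Python, not the logrus API, and the transaction id value txn-0001 is made up for the example.

```python
class LogContext:
    """Carries metadata that is attached to every log line made through it."""

    def __init__(self, fields=None):
        self.fields = dict(fields or {})

    def with_fields(self, extra):
        """Return a new context that adds to (or overrides) the inherited fields."""
        merged = dict(self.fields)
        merged.update(extra)
        return LogContext(merged)

    def format(self, message, fields=None):
        """Render message, then per-call fields, then the context fields."""
        merged = dict(fields or {})
        for key, value in self.fields.items():
            merged.setdefault(key, value)   # per-call values win on a clash
        pairs = " ".join(f"{key}={value}" for key, value in merged.items())
        return f"{message} {pairs}" if pairs else message

root = LogContext()
# Hypothetical transaction id, attached once for the whole transaction.
txn_log = root.with_fields({"txn-id": "txn-0001"})

line = txn_log.format("readv on socket failed",
                      {"socket": "some.socket", "errno": "ENODATA"})
```

Every line logged through txn_log carries the same txn-id without each call site having to pass it, while the root context stays unchanged.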

>>
>>

> 2. We often encounter issues, and then find ourselves wishing the
> process was running in debug mode so that we could have gotten some
> debug logs. If there is any solution that enables me traceability of
> the process's flow, without compromising the space constraints of the
> logs that would be great to have.

 I think after each debugging session, we should revise the log messages
 to get following answers.
 1. Is something missing in log, which could have reduced the time spent
 to root cause.
 2. Logging is available but with incomplete details. (For example, on volume
 set, debug message will have key and value but no Volume name)
 3. Is something redundant or not required. (For example, Geo-replication
 worker logs sync engine as rsync or tar mode. If we have multiple
 workers then all workers prints this log 

Re: [Gluster-devel] Logging framework in Gluter 4.0

2015-11-09 Thread Kaushal M
On Tue, Nov 10, 2015 at 12:05 PM, Kaushal M  wrote:
> On Fri, Nov 6, 2015 at 10:45 PM, Atin Mukherjee
>  wrote:
>> -Atin
>> Sent from one plus one
>>
>>
>> On Nov 6, 2015 7:50 PM, "Shyam"  wrote:
>>>
>>> On 11/06/2015 06:58 AM, Atin Mukherjee wrote:



 On 11/06/2015 01:30 PM, Aravinda wrote:
>
>
> regards
> Aravinda
> http://aravindavk.in
>
> On 11/06/2015 12:28 PM, Avra Sengupta wrote:
>>
>> Hi,
>>
>> As almost all the components targeted for Gluster 4.0 have moved from
>> design phase to implementation phase on some level or another, I feel
>> it's time to get some consensus on the logging framework we are going
>> to use. Are we going to stick with the message ID formatted logging
>> framework in use today, or shall we move on to a better solution.
>>>
>>>
>>> I would prefer to think that we extend the log messages with what we want.
>>> Message IDs are there for a purpose as outlined in earlier mails on the same
>>> topic, so we [w|s]hould stick with message IDs and add more is how I think
>>> of the problem.
>>>
>>> If we take inspiration from other frameworks, I would say we improve what
>>> we log.
>>>
>>> Where we log is made pluggable in the current framework, although as we
>>> add more logging frameworks (like rsyslog or lumberjack etc.) the
>>> abstraction provided for plugging in these could improve. This is a
>>> contained change in the logging.c file though.
>>>
>>>
>>
>> Some good to have features I would like to have in the logging
>> framework are:
>> 1. Specific distinction between logs of various transactions (or
>> operations), that would make it a lot easier to make sense than
>> looking at some serialized log dump
>
> Externally grouping Message IDs may help in this situation.

 Transaction ID is definitely a *must have* requirement when 4.0's theme
 is scalability, where we are talking about thousands of servers in a
 configuration; without it, analysis of an issue will be a nightmare.
 GlusterD 2.0 is definitely going to have logs with txn ids.
>>>
>>>
>>> +1
>>>
>>> I thought this was discussed in another thread on the lists, not able to
>>> find it, a link would help.
>>>
>>> The question is, should this be automated by the logging framework, or
>>> something each xlator should do?
>>>
>>> I prefer automating this, but how? (frame pointers? at least on the IO
>>> xlator stack) What is glusterd doing for the same?
>> As of now we have started using the logrus pkg in golang, which logs every
>> message with its txn id. How it does that is something I still need to find
>> out by looking at the pkg code (if you are looking to borrow a similar idea).
>>
>
> The logrus[1] package provides structured logging and ability to build
> incremental log contexts. This makes it possible to do things like
> message-ids. But IMO the advantage of using structured logging is that
> it would kind of remove the need for message-ids.
>
> In structured logging, it is encouraged to keep the log string itself
> constant and keep the variable portions as metadata attached to the
> log, generally as key-value pairs. This allows us to implement things
> like message-ids, but it is not really required. As the log string is
> kept constant, it becomes easier to search for individual logs and
> document them. For example (not really the best example), currently we
> log socket failures in the following format
>
> "readv on some.socket failed (No data available)"
>
> This is hard to search for or parse, as the log string itself changes
> when the socket or error changes. If using structured logging, this
> log would be in the format
>
> "readv on socket failed socket=some.socket errno=ENODATA"
>
> This is much easier to search and parse.
>
> Using log-contexts allows attaching certain metadata to all logs in
> a context, for example, attaching transaction-ids to all logs of a
> transaction. If we were to use log-contexts in the above example, we
> could attach the transaction-id to the log,
>
> "readv on socket failed socket=some.socket errno=ENODATA
> txn-id="
>
> allowing us to easily track during which transaction the read failed.
>

(Missed the link)
[1]: https://github.com/sirupsen/logrus

>>>
>>>
>
>> 2. We often encounter issues, and then find ourselves wishing the
>> process was running in debug mode so that we could have gotten some
>> debug logs. If there is any solution that gives me traceability of
>> the process's flow, without compromising the space constraints of the
>> logs that would be great to have.
>
> I think after each debugging session, we should revise the log messages
> to answer the following questions.
> 1. Is something missing in the log, which could have reduced the time spent
> to root cause?
> 2. Is logging available but with incomplete details? (For 

[Gluster-devel] iostat not showing data transfer while doing read operation with libgfapi

2015-11-09 Thread satish kondapalli
Hi,

I am running a performance test comparing fuse and libgfapi. I have a single
node; client and server are running on the same node. I have an NVMe SSD
device as storage.

My volume info::

[root@sys04 ~]# gluster vol info
Volume Name: vol1
Type: Distribute
Volume ID: 9f60ceaf-3643-4325-855a-455974e36cc7
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 172.16.71.19:/mnt_nvme/brick1
Options Reconfigured:
performance.cache-size: 0
performance.write-behind: off
performance.read-ahead: off
performance.io-cache: off
performance.strict-o-direct: on


fio Job file::

[global]
direct=1
runtime=20
time_based
ioengine=gfapi
iodepth=1
volume=vol1
brick=172.16.71.19
rw=read
size=128g
bs=32k
group_reporting
numjobs=1
filename=128g.bar

While running the sequential read test, I do not see any data transfer on
the device with the iostat tool. It looks like the gfapi engine is reading
from a cache, because I am reading from the same file with different block
sizes.

But I disabled io-cache for my volume. Can someone help me understand where
fio is reading the data from?


Sateesh
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2015-11-09 Thread Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra Talur" 
> To: "Gluster Devel" 
> Sent: Tuesday, November 10, 2015 3:10:34 AM
> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
> 
> Hi,
> 
> While trying to understand how our gerrit+jenkins setup works, I realized
> there is a possibility of bugs getting in.
> 
> Currently, our gerrit is set up to have cherry-pick as the submit type. Now
> consider a case where:
> 
> Dev1 sends a commit B with parent commit A (A is already merged).
> Dev2 sends a commit C with parent commit A (A is already merged).
> 
> Both the patches get +2 from Jenkins.
> 
> Maintainer merges commit B from Dev1.
> Another maintainer merges commit C from Dev2.
> 
> If the two commits B and C changed code which had no merge conflicts but were
> conflicting in logic,
> then we have a master which has bugs.
> 
> If Dev3 now sends a commit D with re-based master as parent, we have the
> following cases:
> 
> 1. If bug introduced above is not racy, we have tests always failing for Dev3
> on commit D. Tests that fail would be from components that commit B and C
> changed. Dev3 has no idea on how to fix them and has to enlist help from
> Dev1 and Dev2.
> 
> 2. If bug introduced above is racy, then there is a probability that Dev3
> escapes from this trouble and someone else will bear it later. Even if the
> racy code is hit and test fails, Dev3 will probably re-trigger the tests
> given that they failed for a component which is not related to his/her code
> and the bug stays in code longer.
> 
> The most obvious but not practical solution to the above problem is to change
> the submit type in gerrit to "fast-forward only". It would then ensure that
> once commit B is merged, Dev2 has to re-base and re-run the tests on commit
> C with commit B as parent, before it could be merged. It is not practical
> because it will cause all patches in review to get re-based and re-triggered
> whenever a patch is merged.
> 
> A little modification to the above solution would be to
> 
> 
> * change submit type to fast-forward only
> * don't run any jenkins job on patches till they get +2 from reviewers
> * once a +2 is given, run jenkins job on patch and automatically submit
> it if test passes.
> * automatically rebase all patches on review with new master and mark
> conflict if merge conflict arises.

Seems like a good suggestion. How about a slight variation to the above 
process? Can we run one initial set of regression immediately after submission, 
but before any reviews? That way reviewers can prioritize those patches that 
have passed regression over the ones that have failed. The flip side is that a
minimum of two regression runs is needed to merge any patch. I am making this
suggestion with the assumption that dev/reviewer time is more precious than 
machine time. Of course, this will have issues with patches that need to get in 
urgently (user/customer hot fix etc) where time is a constraint. But that can 
be worked around on a case-by-case basis.
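
To make the failure mode from the quoted mail concrete, here is a toy model in
Python (a sketch under assumptions, not our actual gerrit/jenkins setup). A
tree is just a set of helper names plus a map of which helper each file calls,
and the "regression" only checks that every called helper exists. The file
names quota.c and tier.c and the helpers get_size and get_size_bytes are
hypothetical.

```python
import copy


def apply_patch(tree, patch):
    """Cherry-pick: apply only the patch's own changes to a copy of the tree."""
    new = copy.deepcopy(tree)
    new["helpers"] -= patch.get("remove_helpers", set())
    new["helpers"] |= patch.get("add_helpers", set())
    new["callers"].update(patch.get("callers", {}))
    return new


def regression_passes(tree):
    """Toy regression run: every caller must reference an existing helper."""
    return all(helper in tree["helpers"] for helper in tree["callers"].values())


# Commit A, already merged.
A = {"helpers": {"get_size"}, "callers": {"quota.c": "get_size"}}

# Commit B (Dev1): rename the helper and fix the one caller that exists in A.
B = {
    "remove_helpers": {"get_size"},
    "add_helpers": {"get_size_bytes"},
    "callers": {"quota.c": "get_size_bytes"},
}

# Commit C (Dev2): add a new caller of the old helper, in a different file.
C = {"callers": {"tier.c": "get_size"}}

# Each patch is tested against its own parent A, and both pass.
print("B on A passes:", regression_passes(apply_patch(A, B)))
print("C on A passes:", regression_passes(apply_patch(A, C)))

# Both are cherry-picked onto master; no file is touched by both patches,
# so there is no merge conflict, but master is now broken.
master = apply_patch(apply_patch(A, B), C)
print("master passes:", regression_passes(master))
```

With fast-forward only, C would have to be rebased on top of B and re-tested
before submit, which is exactly the last check above, so the breakage would be
caught on Dev2's patch instead of showing up later for Dev3.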

> 
> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
> 
> Any thoughts on the above solutions or other suggestions?
> 
> Thanks,
> Raghavendra Talur
> 


Re: [Gluster-devel] spurious regression errors getting worse

2015-11-09 Thread Niels de Vos
On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> It seems to have become more difficult in the last week to pass regression 
> tests.
> 
> I've started recording the tests that seem to be failing the most:
> 
> bug-1221481-allow-fops-on-dir-split-brain.t
> bug-1238706-daemons-stop-on-peer-cleanup.t
> ./tests/bugs/quota/bug-1235182.t
> ./tests/bugs/distribute/bug-1066798.t
> ./tests/bugs/snapshot/bug-1166197.t
> 
> In some cases regression must be run a half dozen times before finally 
> passing.
> 
> Could the owners of those tests please look into these?

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
failed on 

[16:18:49] ./tests/basic/tier/fops-during-migration-pause.t .. 
not ok 19 
not ok 20 
Failed 2/20 subtests 
[16:18:49]

Please have a look. Thanks,
Niels

