Re: [Gluster-devel] spurious regression errors getting worse
On Nov 9, 2015 7:11 PM, "Dan Lambright" wrote:
>
> ----- Original Message -----
> > From: "Niels de Vos"
> > To: "Atin Mukherjee", "Rajesh Joseph" <rjos...@redhat.com>
> > Cc: "Dan Lambright", "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Monday, November 9, 2015 7:39:28 AM
> > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> >
> > On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> > > On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > > > It seems to have become more difficult in the last week to pass
> > > > regression tests.
> > > >
> > > > I've started recording the tests that seem to be failing the most:
> > > >
> > > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > > bug-1238706-daemons-stop-on-peer-cleanup.t
> > >
> > > You shouldn't be worried about
> > > bug-1238706-daemons-stop-on-peer-cleanup.t as it's marked as bad. Also
> > > respective failure links for all these tests would help component owners
> > > to root cause the issues. Probably you could add all of them here [1]
> >
> > We really need to file bugs for the problems, it allows us to get
> > notifications about problems. Etherpads can be nice to dedicated work,
> > but many of us do not check them regularly.
>
> There does not seem to be a small core set of bugs that fails regularly.
> Rather, the set of spurious bugs seems to be large.
> So we would be filing a lot of bugs. But, it could be done. And is probably
> more effective than a rarely consulted ether pad.

Filing bugs is definitely the next step in the process, even if the etherpad is maintained. So, my two cents: I just wanted to ensure that we have at least one placeholder (the etherpad here, as that has been the process till now for spurious failures) to record all these failures.

> > Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this
> > run:
> >
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull
> >
> > I'm not sure if someone is looking into that already?
> >
> > Thanks,
> > Niels
> >
> > > [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
> > >
> > > Thanks,
> > > Atin
> > >
> > > > ./tests/bugs/quota/bug-1235182.t
> > > > ./tests/bugs/distribute/bug-1066798.t
> > > > ./tests/bugs/snapshot/bug-1166197.t
> > > >
> > > > In some cases regression must be run a half dozen times before finally
> > > > passing.
> > > >
> > > > Could the owners of those tests please look into these?

-Atin

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] what rpm sub-package do /usr/{libexec, sbin}/gfind_missing_files belong to?
The in-tree glusterfs.spec(.in) has them immediately following the geo-rep sub-package, but outside the %if ... %endif. Are they part of geo-rep? Or something else?

--
Kaleb
Re: [Gluster-devel] spurious regression errors getting worse
On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > It seems to have become more difficult in the last week to pass regression
> > tests.
> >
> > I've started recording the tests that seem to be failing the most:
> >
> > bug-1221481-allow-fops-on-dir-split-brain.t
> > bug-1238706-daemons-stop-on-peer-cleanup.t
>
> You shouldn't be worried about
> bug-1238706-daemons-stop-on-peer-cleanup.t as it's marked as bad. Also
> respective failure links for all these tests would help component owners
> to root cause the issues. Probably you could add all of them here [1]

We really need to file bugs for the problems, it allows us to get notifications about problems. Etherpads can be nice to dedicated work, but many of us do not check them regularly.

Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this run:

https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull

I'm not sure if someone is looking into that already?

Thanks,
Niels

> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
>
> Thanks,
> Atin
>
> > ./tests/bugs/quota/bug-1235182.t
> > ./tests/bugs/distribute/bug-1066798.t
> > ./tests/bugs/snapshot/bug-1166197.t
> >
> > In some cases regression must be run a half dozen times before finally
> > passing.
> >
> > Could the owners of those tests please look into these?
Re: [Gluster-devel] spurious regression errors getting worse
----- Original Message -----
> From: "Niels de Vos"
> To: "Atin Mukherjee", "Rajesh Joseph"
> Cc: "Dan Lambright", "Gluster Devel"
> Sent: Monday, November 9, 2015 7:39:28 AM
> Subject: Re: [Gluster-devel] spurious regression errors getting worse
>
> On Fri, Nov 06, 2015 at 09:36:50AM +0530, Atin Mukherjee wrote:
> > On 11/06/2015 07:47 AM, Dan Lambright wrote:
> > > It seems to have become more difficult in the last week to pass
> > > regression tests.
> > >
> > > I've started recording the tests that seem to be failing the most:
> > >
> > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > bug-1238706-daemons-stop-on-peer-cleanup.t
> >
> > You shouldn't be worried about
> > bug-1238706-daemons-stop-on-peer-cleanup.t as it's marked as bad. Also
> > respective failure links for all these tests would help component owners
> > to root cause the issues. Probably you could add all of them here [1]
>
> We really need to file bugs for the problems, it allows us to get
> notifications about problems. Etherpads can be nice to dedicated work,
> but many of us do not check them regularly.

There does not seem to be a small core set of bugs that fails regularly. Rather, the set of spurious bugs seems to be large. So we would be filing a lot of bugs. But it could be done, and it is probably more effective than a rarely consulted etherpad.

> Rajesh, tests/bugs/snapshot/bug-1227646.t resulted in a core for this
> run:
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/15664/consoleFull
>
> I'm not sure if someone is looking into that already?
>
> Thanks,
> Niels
>
> > [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
> >
> > Thanks,
> > Atin
> >
> > > ./tests/bugs/quota/bug-1235182.t
> > > ./tests/bugs/distribute/bug-1066798.t
> > > ./tests/bugs/snapshot/bug-1166197.t
> > >
> > > In some cases regression must be run a half dozen times before finally
> > > passing.
> > >
> > > Could the owners of those tests please look into these?
[Gluster-devel] Gerrit review, submit type and Jenkins testing
Hi,

While trying to understand how our gerrit+jenkins setup works, I realized there is a possibility of bugs getting in.

Currently, our gerrit is set up to have cherry-pick as the submit type. Now consider a case where:

Dev1 sends a commit B with parent commit A (A is already merged).
Dev2 sends a commit C with parent commit A (A is already merged).

Both the patches get +2 from Jenkins.

Maintainer merges commit B from Dev1.
Another maintainer merges commit C from Dev2.

If the two commits B and C changed code which had no merge conflicts but were conflicting in logic, then we have a master which has bugs.

If Dev3 now sends a commit D with re-based master as parent, we have the following cases:

1. If the bug introduced above is not racy, we have tests always failing for Dev3 on commit D. Tests that fail would be from components that commits B and C changed. Dev3 has no idea how to fix them and has to enlist help from Dev1 and Dev2.

2. If the bug introduced above is racy, then there is a probability that Dev3 escapes from this trouble and someone else will bear it later. Even if the racy code is hit and the test fails, Dev3 will probably re-trigger the tests, given that they failed for a component which is not related to his/her code, and the bug stays in the code longer.

The most obvious but not practical solution to the above problem is to change the submit type in gerrit to "fast-forward only". It would then ensure that once commit B is merged, Dev2 has to re-base and re-run the tests on commit C with commit B as parent, before it could be merged. It is not practical because it will cause all patches in review to get re-based and re-triggered whenever a patch is merged.

A little modification to the above solution would be to

- change submit type to fast-forward only
- don't run any jenkins job on patches till they get +2 from reviewers
- once a +2 is given, run jenkins job on patch and automatically submit it if test passes.
- automatically rebase all patches on review with new master and mark conflict if merge conflict arises.

As a side effect of this, Dev would now be forced to run a complete regression on dev machine before sending a patch for review.

Any thoughts on the above solutions or other suggestions?

Thanks,
Raghavendra Talur
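
The scenario described in this message (two patches that each pass on parent A but break master once both are cherry-picked) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the "tree" is a plain Python dict, and the `timeout` and `deadline` names, the values, and both patches are hypothetical and not taken from the GlusterFS code base.

```python
"""Toy model of two patches that pass alone but break master together."""


def run_tests(tree):
    """Return True when every test registered in the tree passes."""
    return all(test(tree) for test in tree["tests"].values())


# Commit A (hypothetical): a timeout in seconds, with one test for it.
commit_a = {
    "timeout": lambda tree: 30,
    "tests": {"timeout": lambda tree: tree["timeout"](tree) == 30},
}


def patch_b(tree):
    """Dev1: switch the timeout to milliseconds and update its test."""
    new = {**tree, "tests": dict(tree["tests"])}
    new["timeout"] = lambda t: 30_000
    new["tests"]["timeout"] = lambda t: t["timeout"](t) == 30_000
    return new


def patch_c(tree):
    """Dev2: add a deadline helper that assumes the timeout is in seconds."""
    new = {**tree, "tests": dict(tree["tests"])}
    new["deadline"] = lambda t, now: now + t["timeout"](t)
    new["tests"]["deadline"] = lambda t: t["deadline"](t, 100) == 130
    return new


# Each patch is tested only against its own parent, commit A.
b_alone = run_tests(patch_b(commit_a))
c_alone = run_tests(patch_c(commit_a))

# Cherry-picking both touches different names, so nothing conflicts
# textually, yet the combined tree was never tested before the merge.
master = patch_c(patch_b(commit_a))
both = run_tests(master)

print(b_alone, c_alone, both)
```

In this model, a fast-forward-only policy would force patch C to be re-tested on top of B, so the failing `deadline` test would show up before the merge rather than on Dev3's unrelated commit D.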
Re: [Gluster-devel] spurious regression errors getting worse
----- Original Message -----
> From: "Niels de Vos"
> To: "Dan Lambright"
> Cc: "Gluster Devel"
> Sent: Monday, November 9, 2015 4:09:08 PM
> Subject: Re: [Gluster-devel] spurious regression errors getting worse
>
> On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> > It seems to have become more difficult in the last week to pass regression
> > tests.
> >
> > I've started recording the tests that seem to be failing the most:
> >
> > bug-1221481-allow-fops-on-dir-split-brain.t
> > bug-1238706-daemons-stop-on-peer-cleanup.t
> > ./tests/bugs/quota/bug-1235182.t
> > ./tests/bugs/distribute/bug-1066798.t
> > ./tests/bugs/snapshot/bug-1166197.t
> >
> > In some cases regression must be run a half dozen times before finally
> > passing.
> >
> > Could the owners of those tests please look into these?
>
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
> failed on
>
> [16:18:49] ./tests/basic/tier/fops-during-migration-pause.t ..
> not ok 19
> not ok 20
> Failed 2/20 subtests
> [16:18:49]
>
> Please have a look. Thanks,

Hm. This one most certainly broke due to one of the fixes we merged over the weekend; it's spurious and snuck through. Will fix it right away. Thank you.

> Niels
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
On Nov 10, 2015 11:24 AM, "Kaushal M" wrote:
> On Tue, Nov 10, 2015 at 9:24 AM, Raghavendra Gowdappa wrote:
> > ----- Original Message -----
> >> From: "Raghavendra Talur"
> >> To: "Gluster Devel"
> >> Sent: Tuesday, November 10, 2015 3:10:34 AM
> >> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
> >>
> >> [...]
> >>
> >> A little modification to the above solution would be to
> >>
> >> * change submit type to fast-forward only
> >> * don't run any jenkins job on patches till they get +2 from reviewers
> >> * once a +2 is given, run jenkins job on patch and automatically submit
> >>   it if test passes.
> >> * automatically rebase all patches on review with new master and mark
> >>   conflict if merge conflict arises.
> >
> > Seems like a good suggestion. How about a slight variation to the above
> > process? Can we run one initial set of regression immediately after
> > submission, but before any reviews? That way reviewers can prioritize those
> > patches that have passed regression over the ones that have failed? Flip
> > side is that minimum two sets of regressions are needed to merge any patch.
> > I am making this suggestion with the assumption that dev/reviewer time is
> > more precious than machine time. Of course, this will have issues with
> > patches that need to get in urgently (user/customer hot fix etc) where time
> > is a constraint. But that can be worked around on a case-by-case basis.
>
> We would still be running smoke, which would catch any very obvious
> mistakes, isn't this enough?
>
> Regarding the initial regression run, would it include the complete
> regression suite or just a subset. If it is the complete set, then it
> would be no different from what we are doing now. If it is a subset,
> then we will need to come up with a subset of the regression suite
> that catches most of obvious mistakes. We've had discussions several
> times (don't remember if it was on the mailing lists, but I have had
> conversations) about doing this. Every time we ended up on the
> question of how we choose the subset, which is where we stopped.

How about running tests/basic/*.t? I know coverage-wise this isn't good enough, but it is still better than running everything or nothing.

-Atin

> >> As a side effect of this, Dev would now be forced to run a complete
> >> regression on dev machine before sending a patch for review.
> >>
> >> Any thoughts on the above solutions or other suggestions?
> >>
> >> Thanks,
> >> Raghavendra Talur
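
The tests/basic/*.t suggestion above could be expressed as a simple path filter. This is a rough sketch, not the project's actual job configuration: it assumes the subset means only .t files directly under tests/basic, as a shell glob would expand it, and `tests/basic/example.t` is a hypothetical file name added for illustration. The other paths are the ones mentioned in this thread.

```python
from pathlib import PurePosixPath


def in_basic_subset(test_path):
    """True for .t files directly under tests/basic (no subdirectories)."""
    path = PurePosixPath(test_path)
    return path.suffix == ".t" and path.parent == PurePosixPath("tests/basic")


candidates = [
    "./tests/bugs/quota/bug-1235182.t",
    "./tests/bugs/distribute/bug-1066798.t",
    "tests/basic/tier/fops-during-migration-pause.t",
    "tests/basic/example.t",  # hypothetical file name
]

# Keep only the tests that fall inside the proposed quick subset.
subset = [path for path in candidates if in_basic_subset(path)]
print(subset)
```

Whether subdirectories such as tests/basic/tier should count as part of the subset is a separate decision; this sketch excludes them.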
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
On Tue, Nov 10, 2015 at 9:24 AM, Raghavendra Gowdappa wrote:
> ----- Original Message -----
>> From: "Raghavendra Talur"
>> To: "Gluster Devel"
>> Sent: Tuesday, November 10, 2015 3:10:34 AM
>> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
>>
>> [...]
>>
>> A little modification to the above solution would be to
>>
>> * change submit type to fast-forward only
>> * don't run any jenkins job on patches till they get +2 from reviewers
>> * once a +2 is given, run jenkins job on patch and automatically submit
>>   it if test passes.
>> * automatically rebase all patches on review with new master and mark
>>   conflict if merge conflict arises.
>
> Seems like a good suggestion. How about a slight variation to the above
> process? Can we run one initial set of regression immediately after
> submission, but before any reviews? That way reviewers can prioritize those
> patches that have passed regression over the ones that have failed? Flip side
> is that minimum two sets of regressions are needed to merge any patch. I am
> making this suggestion with the assumption that dev/reviewer time is more
> precious than machine time. Of course, this will have issues with patches
> that need to get in urgently (user/customer hot fix etc) where time is a
> constraint. But that can be worked around on a case-by-case basis.

We would still be running smoke, which would catch any very obvious mistakes. Isn't this enough?

Regarding the initial regression run, would it include the complete regression suite or just a subset? If it is the complete set, then it would be no different from what we are doing now. If it is a subset, then we will need to come up with a subset of the regression suite that catches most of the obvious mistakes. We've had discussions several times (I don't remember if it was on the mailing lists, but I have had conversations) about doing this. Every time we ended up on the question of how we choose the subset, which is where we stopped.

>> As a side effect of this, Dev would now be forced to run a complete
>> regression on dev machine before sending a patch for review.
>>
>> Any thoughts on the above solutions or other suggestions?
>>
>> Thanks,
>> Raghavendra Talur
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
On 11/10/2015 03:10 AM, Raghavendra Talur wrote:
> [...]
>
> A little modification to the above solution would be to
>
> * change submit type to fast-forward only
> * don't run any jenkins job on patches till they get +2 from reviewers
> * once a +2 is given, run jenkins job on patch and automatically
>   submit it if test passes.
> * automatically rebase all patches on review with new master and mark
>   conflict if merge conflict arises.

The overall idea looks good to me. However, I'd be a bit hesitant to give a +2 before seeing the regression votes unless the patch is pretty straightforward. For me, +1 sounds like a good level at which to trigger the regression. Once the regression passes, maintainers can +2 a patch and then merge it manually. And then the 4th point follows.

Thoughts?

~Atin

> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
>
> Any thoughts on the above solutions or other suggestions?
>
> Thanks,
> Raghavendra Talur
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
On Tue, Nov 10, 2015 at 3:10 AM, Raghavendra Talur wrote:
> [...]
>
> A little modification to the above solution would be to
>
> * change submit type to fast-forward only
> * don't run any jenkins job on patches till they get +2 from reviewers
> * once a +2 is given, run jenkins job on patch and automatically submit it if
>   test passes.
> * automatically rebase all patches on review with new master and mark conflict
>   if merge conflict arises.

Have you checked if this is even possible?

> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
>
> Any thoughts on the above solutions or other suggestions?
>
> Thanks,
> Raghavendra Talur
Re: [Gluster-devel] Logging framework in Gluter 4.0
On Fri, Nov 6, 2015 at 10:45 PM, Atin Mukherjee wrote:
> -Atin
>
> On Nov 6, 2015 7:50 PM, "Shyam" wrote:
>> On 11/06/2015 06:58 AM, Atin Mukherjee wrote:
>>> On 11/06/2015 01:30 PM, Aravinda wrote:
>>>> regards
>>>> Aravinda
>>>> http://aravindavk.in
>>>>
>>>> On 11/06/2015 12:28 PM, Avra Sengupta wrote:
>>>>> Hi,
>>>>>
>>>>> As almost all the components targeted for Gluster 4.0 have moved from
>>>>> design phase to implementation phase on some level or another, I feel
>>>>> it's time to get some consensus on the logging framework we are going
>>>>> to use. Are we going to stick with the message ID formatted logging
>>>>> framework in use today, or shall we move on to a better solution.
>>
>> I would prefer to think that we extend the log messages with what we want.
>> Message IDs are there for a purpose as outlined in earlier mails on the same
>> topic, so we [w|s]hould stick with message IDs and add more is how I think
>> of the problem.
>>
>> If we take inspiration from other frameworks, I would say we improve what
>> we log.
>>
>> Where we log is made pluggable in the current framework, although as we
>> add more logging frameworks (like rsyslog or lumberjack etc.) the
>> abstraction provided for plugging in these could improve. This is a
>> contained change in the logging.c file though.
>>
>>>>> Some good to have features I would like to have in the logging
>>>>> framework are:
>>>>> 1. Specific distinction between logs of various transactions (or
>>>>> operations), that would make it a lot easier to make sense than
>>>>> looking at some serialized log dump
>>>>
>>>> Externally grouping Message IDs may help in this situation.
>>>
>>> Transaction Id is definitely a *must have* requirement when 4.0's theme
>>> is scalability where we are talking about number of servers being
>>> thousands in the configuration without which analysis of an issue will
>>> be a nightmare. GlusterD 2.0 is definitely going to have logs with txn
>>> ids.
>>
>> +1
>>
>> I thought this was discussed in another thread on the lists, not able to
>> find it, a link would help.
>>
>> The question is, should this be automated by the logging framework, or
>> something each xlator should do?
>>
>> I prefer automating this, but how? (frame pointers? at least on the IO
>> xlator stack) What is glusterd doing for the same?
>
> As of now we have started using logrus pkg in golang which logs every
> message with its txn id. How it does is what I need to find out looking at
> the pkg code (if you are looking to borrow the similar idea)

The logrus[1] package provides structured logging and the ability to build incremental log contexts. This makes it possible to do things like message-ids. But IMO the advantage of using structured logging is that it would more or less remove the need for message-ids.

In structured logging, it is encouraged to keep the log string itself constant and to keep the variable portions as metadata attached to the log, generally as key-value pairs. This allows us to implement things like message-ids, but it is not really required. As the log string is kept constant, it becomes easier to search for individual logs and document them. For example (not really the best example), currently we log socket failures in the following format

"readv on some.socket failed (No data available)"

This is hard to search for or parse, as the log string itself changes when the socket or error changes. If using structured logging, this log would be in the format

"readv on socket failed socket=some.socket errno=ENODATA"

This is much easier to search and parse.

Using log-contexts allows attaching certain metadata to all logs in a context, for example, attaching transaction-ids to all logs of a transaction. If we were to use log-contexts in the above example, we could attach the transaction-id to the log,

"readv on socket failed socket=some.socket errno=ENODATA txn-id="

allowing us to easily track during which transaction the read failed.
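
The structured-logging and log-context idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the logrus API: the `LogContext` class and its `with_fields` and `format` methods are hypothetical names, and "example-txn-1" is a made-up transaction id.

```python
class LogContext:
    """Carries key-value fields that are attached to every log line."""

    def __init__(self, fields=None):
        self.fields = dict(fields or {})

    def with_fields(self, extra):
        """Return a new context with additional fields; self is unchanged."""
        return LogContext({**self.fields, **extra})

    def format(self, message, fields=None):
        """Render a constant message followed by key=value metadata.

        Per-call fields come first; context fields follow and win if a
        key appears in both.
        """
        merged = {**(fields or {}), **self.fields}
        pairs = " ".join(f"{key}={value}" for key, value in merged.items())
        return f"{message} {pairs}" if pairs else message


# "example-txn-1" is a made-up transaction id for illustration.
txn_log = LogContext().with_fields({"txn-id": "example-txn-1"})
line = txn_log.format(
    "readv on socket failed",
    {"socket": "some.socket", "errno": "ENODATA"},
)
print(line)
```

Because the message string stays constant, a search for "readv on socket failed" finds every occurrence regardless of socket or error, and a search for the txn-id value collects all lines belonging to one transaction.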
>>>>> 2. We often encounter issues, and then find ourselves wishing the
>>>>> process was running in debug mode so that we could have gotten some
>>>>> debug logs. If there is any solution that enables me traceability of
>>>>> the process's flow, without compromising the space constraints of the
>>>>> logs that would be great to have.
>>>>
>>>> I think after each debugging session, we should revise the log messages
>>>> to get following answers.
>>>> 1. Is something missing in log, which could have reduced the time spent
>>>> to root cause.
>>>> 2. Logging is available but incomplete details. (For example, on volume
>>>> set, debug message will have key and value but no Volume name)
>>>> 3. Is something redundant or not required. (For example, Geo-replication
>>>> worker logs sync engine as rsync or tar mode. If we have multiple
>>>> workers then all workers prints this log
Re: [Gluster-devel] Logging framework in Gluter 4.0
On Tue, Nov 10, 2015 at 12:05 PM, Kaushal M wrote:
> On Fri, Nov 6, 2015 at 10:45 PM, Atin Mukherjee
> wrote:
>> -Atin
>>
>> On Nov 6, 2015 7:50 PM, "Shyam" wrote:
>>>
>>> On 11/06/2015 06:58 AM, Atin Mukherjee wrote:
On 11/06/2015 01:30 PM, Aravinda wrote:
>
> regards
> Aravinda
> http://aravindavk.in
>
> On 11/06/2015 12:28 PM, Avra Sengupta wrote:
>>
>> Hi,
>>
>> As almost all the components targeted for Gluster 4.0 have moved from
>> design phase to implementation phase on some level or another, I feel
>> it's time to get some consensus on the logging framework we are going
>> to use. Are we going to stick with the message ID formatted logging
>> framework in use today, or shall we move on to a better solution?
>>>
>>> I would prefer to think that we extend the log messages with what we want.
>>> Message IDs are there for a purpose as outlined in earlier mails on the same
>>> topic, so we [w|s]hould stick with message IDs and add more is how I think
>>> of the problem.
>>>
>>> If we take inspiration from other frameworks, I would say we improve what
>>> we log.
>>>
>>> Where we log is made pluggable in the current framework, although as we
>>> add more logging frameworks (like rsyslog or lumberjack etc.) the
>>> abstraction provided for plugging in these could improve. This is a
>>> contained change in the logging.c file though.
>>>
>>
>> Some good to have features I would like to have in the logging
>> framework are:
>> 1. Specific distinction between logs of various transactions (or
>> operations), that would make it a lot easier to make sense than
>> looking at some serialized log dump
>
> Externally grouping Message IDs may help in this situation.

Transaction ID is definitely a *must have* requirement when 4.0's theme is
scalability, where we are talking about thousands of servers in a
configuration; without it, analysis of an issue will be a nightmare.
GlusterD 2.0 is definitely going to have logs with txn ids.
>>>
>>> +1
>>>
>>> I thought this was discussed in another thread on the lists, not able to
>>> find it, a link would help.
>>>
>>> The question is, should this be automated by the logging framework, or
>>> something each xlator should do?
>>>
>>> I prefer automating this, but how? (frame pointers? at least on the IO
>>> xlator stack) What is glusterd doing for the same?
>> As of now we have started using logrus pkg in golang which logs every
>> message with its txn id. How it does that is what I need to find out by
>> looking at the pkg code (if you are looking to borrow the similar idea)
>>
>
> The logrus[1] package provides structured logging and the ability to build
> incremental log contexts. This makes it possible to do things like
> message-ids. But IMO the advantage of using structured logging is that
> it would kind of remove the need for message-ids.
>
> In structured logging, it is encouraged to keep the log string itself
> constant and keep the variable portions as metadata attached to the
> log, generally as key-value pairs. This allows us to implement things
> like message-ids, but it is not really required. As the log string is
> kept constant, it becomes easier to search for individual logs and
> document them. For example (not really the best example), currently we
> log socket failures in the following format
>
> "readv on some.socket failed (No data available)"
>
> This is hard to search for or parse, as the log string itself changes
> when the socket or error changes. If using structured logging, this
> log would be in the format
>
> "readv on socket failed socket=some.socket errno=ENODATA"
>
> This is much easier to search and parse.
>
> Using log-contexts allows attaching certain metadata to all logs in
> a context, for example, attaching transaction-ids to all logs of a
> transaction. If we were to use log-contexts in the above example, we could
> attach the transaction-id to the log,
>
> "readv on socket failed socket=some.socket errno=ENODATA
> txn-id="
>
> allowing us to easily track during which transaction the read failed.
>

(Missed the link)
[1]: https://github.com/sirupsen/logrus

>>>
>
>> 2. We often encounter issues, and then find ourselves wishing the
>> process was running in debug mode so that we could have gotten some
>> debug logs. If there is any solution that enables me traceability of
>> the process's flow, without compromising the space constraints of the
>> logs that would be great to have.
>
> I think after each debugging session, we should revise the log messages
> to get the following answers.
> 1. Is something missing in log, which could have reduced the time spent
> to root cause.
> 2. Logging is available but incomplete details. (For
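
As a rough illustration of the structured-logging and log-context idea discussed in this thread, here is a minimal sketch in Python. It is not the logrus API and not what GlusterD 2.0 actually does; the class `StructuredLogger`, its methods, and the transaction id `txn-42` are hypothetical names made up for the example. The point is only that the message string stays constant, the variable parts travel as key=value pairs, and a context attaches a transaction id to every log written through it.

```python
import io


class StructuredLogger:
    """Tiny key=value logger. Illustrative only; NOT the logrus API."""

    def __init__(self, stream, context=None):
        self._stream = stream
        self._context = dict(context or {})

    def with_fields(self, fields):
        # Build an incremental context: the child carries the parent's
        # fields plus the new ones, and the parent is left untouched.
        merged = dict(self._context)
        merged.update(fields)
        return StructuredLogger(self._stream, merged)

    def error(self, message, fields=None):
        # The message is a constant string; per-call fields come first,
        # then any context fields that the call did not set itself.
        merged = dict(fields or {})
        for key, value in self._context.items():
            merged.setdefault(key, value)
        pairs = " ".join(f"{key}={value}" for key, value in merged.items())
        self._stream.write(f"{message} {pairs}".rstrip() + "\n")


def demo():
    out = io.StringIO()
    root = StructuredLogger(out)
    # Hypothetical transaction id, attached once for the whole transaction.
    txn_log = root.with_fields({"txn-id": "txn-42"})
    txn_log.error("readv on socket failed",
                  {"socket": "some.socket", "errno": "ENODATA"})
    return out.getvalue()
```

With this shape, searching the logs for the constant string "readv on socket failed" finds every occurrence regardless of socket or error, and filtering on `txn-id=` groups the lines of one transaction.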
[Gluster-devel] iostat not showing data transfer while doing read operation with libgfapi
Hi,

I am running a performance test comparing FUSE and libgfapi. I have a single node; the client and server are running on the same node. The storage is an NVMe SSD device.

My volume info:

    [root@sys04 ~]# gluster vol info
    Volume Name: vol1
    Type: Distribute
    Volume ID: 9f60ceaf-3643-4325-855a-455974e36cc7
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: 172.16.71.19:/mnt_nvme/brick1
    Options Reconfigured:
    performance.cache-size: 0
    performance.write-behind: off
    performance.read-ahead: off
    performance.io-cache: off
    performance.strict-o-direct: on

fio job file:

    [global]
    direct=1
    runtime=20
    time_based
    ioengine=gfapi
    iodepth=1
    volume=vol1
    brick=172.16.71.19
    rw=read
    size=128g
    bs=32k
    group_reporting
    numjobs=1
    filename=128g.bar

While running the sequential read test, I do not see any data transfer on the device with the iostat tool. It looks like the gfapi engine is reading from a cache, because I am reading from the same file with different block sizes. But I have disabled io-cache for my volume. Can someone help me understand where fio is reading the data from?

Sateesh
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
- Original Message -
> From: "Raghavendra Talur"
> To: "Gluster Devel"
> Sent: Tuesday, November 10, 2015 3:10:34 AM
> Subject: [Gluster-devel] Gerrit review, submit type and Jenkins testing
>
> Hi,
>
> While trying to understand how our gerrit+jenkins setup works, I realized
> there is a possibility of allowing bugs to get in.
>
> Currently, our gerrit is set up to have cherry-pick as the submit type. Now
> consider a case where:
>
> Dev1 sends a commit B with parent commit A (A is already merged).
> Dev2 sends a commit C with parent commit A (A is already merged).
>
> Both the patches get +2 from Jenkins.
>
> Maintainer merges commit B from Dev1.
> Another maintainer merges commit C from Dev2.
>
> If the two commits B and C changed code which had no merge conflicts but were
> conflicting in logic,
> then we have a master which has bugs.
>
> If Dev3 now sends a commit D with re-based master as parent, we have the
> following cases:
>
> 1. If bug introduced above is not racy, we have tests always failing for Dev3
> on commit D. Tests that fail would be from components that commit B and C
> changed. Dev3 has no idea on how to fix them and has to enlist help from
> Dev1 and Dev2.
>
> 2. If bug introduced above is racy, then there is a probability that Dev3
> escapes from this trouble and someone else will bear it later. Even if the
> racy code is hit and test fails, Dev3 will probably re-trigger the tests
> given that they failed for a component which is not related to his/her code
> and the bug stays in code longer.
>
> The most obvious but not practical solution to the above problem is to change
> the submit type in gerrit to "fast-forward only". It would then ensure that
> once commit B is merged, Dev2 has to re-base and re-run the tests on commit
> C with commit B as parent, before it could be merged. It is not practical
> because it will cause all patches in review to get re-based and re-triggered
> whenever a patch is merged.
>
> A little modification to the above solution would be to
>
> * change submit type to fast-forward only
> * don't run any jenkins job on patches till they get +2 from reviewers
> * once a +2 is given, run jenkins job on patch and automatically submit
> it if test passes.
> * automatically rebase all patches on review with new master and mark
> conflict if merge conflict arises.

Seems like a good suggestion. How about a slight variation to the above process? Can we run one initial set of regression tests immediately after submission, but before any reviews? That way reviewers can prioritize the patches that have passed regression over the ones that have failed. The flip side is that a minimum of two regression runs is needed to merge any patch. I am making this suggestion with the assumption that dev/reviewer time is more precious than machine time.

Of course, this will have issues with patches that need to get in urgently (user/customer hot fix etc.) where time is a constraint. But that can be worked around on a case-by-case basis.

>
> As a side effect of this, Dev would now be forced to run a complete
> regression on dev machine before sending a patch for review.
>
> Any thoughts on the above solutions or other suggestions?
>
> Thanks,
> Raghavendra Talur
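
The scenario in this thread (commits B and C each pass on parent A, merge without textual conflict, yet break master together) can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the "tree" is a dict, a "patch" is a dict of changed keys, and `run_tests` is a single made-up invariant standing in for the regression suite. None of it reflects how Gerrit or Jenkins are implemented; the function and variable names are hypothetical.

```python
def run_tests(tree):
    # Stand-in for the regression suite: one invariant over the whole tree.
    return tree["items"] <= tree["limit"]


def apply_patch(tree, patch):
    # A patch only overwrites the keys it touches, so patches touching
    # different keys never conflict textually.
    new_tree = dict(tree)
    new_tree.update(patch)
    return new_tree


def merge_cherry_pick(master, patches):
    """Test each patch on the parent it was written against, then merge all."""
    verified = [p for p in patches if run_tests(apply_patch(master, p))]
    tip = master
    for patch in verified:
        tip = apply_patch(tip, patch)
    return tip, verified


def merge_fast_forward_only(master, patches):
    """Re-test each patch on the current tip before it is allowed to merge."""
    tip = master
    merged, rejected = [], []
    for patch in patches:
        candidate = apply_patch(tip, patch)
        if run_tests(candidate):
            tip = candidate
            merged.append(patch)
        else:
            rejected.append(patch)
    return tip, merged, rejected


COMMIT_A = {"limit": 10, "items": 8}   # already merged parent
COMMIT_B = {"limit": 9}                # Dev1: passes on A (8 <= 9)
COMMIT_C = {"items": 10}               # Dev2: passes on A (10 <= 10)
```

In this model, `merge_cherry_pick(COMMIT_A, [COMMIT_B, COMMIT_C])` accepts both patches and leaves a tip that fails `run_tests`, while `merge_fast_forward_only` merges B, re-tests C on top of B, and rejects it, which is the extra test run the proposal trades for a clean master.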
Re: [Gluster-devel] spurious regression errors getting worse
On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> It seems to have become more difficult in the last week to pass regression
> tests.
>
> I've started recording the tests that seem to be failing the most:
>
> bug-1221481-allow-fops-on-dir-split-brain.t
> bug-1238706-daemons-stop-on-peer-cleanup.t
> ./tests/bugs/quota/bug-1235182.t
> ./tests/bugs/distribute/bug-1066798.t
> ./tests/bugs/snapshot/bug-1166197.t
>
> In some cases regression must be run a half dozen times before finally
> passing.
>
> Could the owners of those tests please look into these?

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
failed on

[16:18:49] ./tests/basic/tier/fops-during-migration-pause.t ..
not ok 19
not ok 20
Failed 2/20 subtests
[16:18:49]

Please have a look.

Thanks,
Niels