On Thu, Aug 2, 2018 at 1:42 PM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
> On Thu, Aug 2, 2018 at 5:05 PM, Atin Mukherjee <atin.mukherje...@gmail.com> wrote:
>
>> On Thu, Aug 2, 2018 at 4:37 PM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>>
>>> On Thu, Aug 2, 2018 at 3:49 PM, Xavi Hernandez <xhernan...@redhat.com> wrote:
>>>
>>>> On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee <amukh...@redhat.com> wrote:
>>>>
>>>>> On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee <amukh...@redhat.com> wrote:
>>>>>
>>>>>> I just went through the nightly regression report of the brick mux runs and here's what I can summarize.
>>>>>>
>>>>>> =========================================
>>>>>> Fails only with brick-mux
>>>>>> =========================================
>>>>>>
>>>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after 400 secs. Refer https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all, specifically the latest report https://build.gluster.org/job/regression-test-burn-in/4051/consoleText. It wasn't timing out as frequently until 12 July, but since 27 July it has timed out twice. Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and 400 secs is no longer sufficient (Mohit?).
>>>>>>
>>>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (ref https://build.gluster.org/job/regression-test-with-multiplex/814/console) - Fails only in brick-mux mode; action item on Atin to investigate and get back.
>>>>>>
>>>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (https://build.gluster.org/job/regression-test-with-multiplex/813/console) - Seems to have failed just twice in the last 30 days as per https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. Need help from the AFR team.
>>>>>>
>>>>>> tests/bugs/quota/bug-1293601.t (https://build.gluster.org/job/regression-test-with-multiplex/812/console) - Hasn't failed after 26 July, though earlier it was failing regularly. Did we fix this test through any patch (Mohit?)
>>>>>>
>>>>>> tests/bitrot/bug-1373520.t (https://build.gluster.org/job/regression-test-with-multiplex/811/console) - Hasn't failed after 27 July, though earlier it was failing regularly. Did we fix this test through any patch (Mohit?)
>>>>>
>>>>> I see this has failed in the day before yesterday's regression run as well (and I could reproduce it locally with brick mux enabled). The test fails in healing a file within a particular time period.
>>>>>
>>>>> 15:55:19 not ok 25 Got "0" instead of "512", LINENUM:55
>>>>> 15:55:19 FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1
>>>>>
>>>>> Need EC devs' help here.
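
[For context, the failing assertion corresponds to a check in the .t of roughly this shape, reconstructed from the failure output above; the exact line and variable names are approximations, not a verbatim copy of the test:]

    # tests/bitrot/bug-1373520.t, around LINENUM 55 (approximate reconstruction):
    # wait up to $HEAL_TIMEOUT for self-heal to recreate the deleted backend
    # file with its original 512 bytes; the failing runs report a size of 0,
    # i.e. the heal never lands on brick 5
    EXPECT_WITHIN $HEAL_TIMEOUT "512" path_size $B0/${V0}5/FILE1
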
>>>> I'm not sure where the problem is exactly. I've seen that when the test fails, self-heal is attempting to heal the file, but when the file is accessed, an Input/Output error is returned, aborting the heal. I've checked that a heal is attempted every time the file is accessed, but it always fails. This error seems to come from the bit-rot stub xlator.
>>>>
>>>> When in this situation, if I stop and start the volume, self-heal immediately heals the files. It looks like a stale state kept by the stub xlator is preventing the file from being healed.
>>>>
>>>> Adding bit-rot maintainers for help on this one.
>>>
>>> Bitrot-stub marks the file as corrupted in its inode_ctx. But when the file and its hardlink are deleted from that brick and a lookup is done on the file, it cleans up the marker on getting ENOENT. This is part of the recovery steps, and only md-cache is disabled during the process. Are there any other perf xlators that need to be disabled for this scenario, so that a lookup/revalidate actually reaches the brick where the backend file was deleted?
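
[A minimal sketch of the recovery flow described above, using the volume and paths from the test output where available; the mount point is hypothetical, the gfid hard-link path depends on the file's trusted.gfid xattr, and the xattr name is per the bitrot design docs:]

    # the stub's on-disk marker can be inspected first; look for
    # trusted.bit-rot.bad-file in the output
    getfattr -d -m . -e hex /d/backends/patchy5/FILE1

    # md-cache is disabled so the client actually sends the lookup to the brick
    gluster volume set patchy performance.stat-prefetch off
    # other client-side perf xlators that could serve the lookup from cache,
    # if more turn out to need disabling: performance.quick-read,
    # performance.io-cache, performance.read-ahead, performance.open-behind

    # delete the corrupted backend file and its .glusterfs gfid hard-link
    rm -f /d/backends/patchy5/FILE1
    rm -f /d/backends/patchy5/.glusterfs/<first-2-gfid-chars>/<next-2>/<full-gfid>

    # a fresh lookup from the mount now gets ENOENT on this brick, which is
    # what lets bitrot-stub drop the bad-file marker from its inode_ctx
    stat /mnt/glusterfs/0/FILE1
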
>> But the same test doesn't fail with brick multiplexing not enabled. Do we know why?
>
> Don't know, something to do with perf xlators I suppose. It doesn't reproduce on my local system with brick-mux enabled either, but it's happening on Xavi's system.
>
> Xavi,
> Could you try with the patch [1] and let me know whether it fixes the issue.

It still happens with the additional performance xlators disabled. The only thing I've observed is that if I add a sleep just before stopping the volume, the test always seems to pass. Maybe there are some background updates going on? (ec does background updates, but I'm not sure how this could be related to the Input/Output error when accessing the brick file.)

Xavi

> [1] https://review.gluster.org/#/c/20619/1
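
[For anyone reproducing the sleep experiment, the change would look roughly like this in the .t; the placement and the 5-second value are guesses based on the description above, and $CLI/$V0 are the test framework's usual variables:]

    # ...existing test steps, up to the point where the volume is restarted...
    sleep 5                      # let ec's deferred/background updates drain
    TEST $CLI volume stop $V0    # with the sleep in place, the later access
    TEST $CLI volume start $V0   # reportedly no longer returns EIO
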
>>>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core; not sure whether it's related to brick mux, so can't say if brick mux is the culprit here. Ref https://build.gluster.org/job/regression-test-with-multiplex/806/console. Seems to be a glustershd crash. Need help from the AFR folks.
>>>>>>
>>>>>> =========================================
>>>>>> Fails for non-brick-mux case too
>>>>>> =========================================
>>>>>>
>>>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup very often, without brick mux as well. Refer https://build.gluster.org/job/regression-test-burn-in/4050/consoleText. There's an email on gluster-devel and BZ 1610240 for the same.
>>>>>>
>>>>>> tests/bugs/bug-1368312.t - Seems to be a recent failure (https://build.gluster.org/job/regression-test-with-multiplex/815/console), however it has also been seen in a non-brick-mux run: https://build.gluster.org/job/regression-test-burn-in/4039/consoleText. Need some eyes from the AFR folks.
>>>>>>
>>>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to brick mux; it has failed in multiple default regression runs. Refer https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>>
>>>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick mux; it has failed in multiple default regression runs. Refer https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>>
>>>>>> tests/bugs/glusterd/validating-server-quorum.t (https://build.gluster.org/job/regression-test-with-multiplex/810/console) - Fails for non-brick-mux cases too: https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. Atin has a patch, https://review.gluster.org/20584, which resolves it, but that patch is failing regression for a different, unrelated test.
>>>>>>
>>>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t (ref https://build.gluster.org/job/regression-test-with-multiplex/809/console) - Fails for the non-brick-mux case too: https://build.gluster.org/job/regression-test-burn-in/4049/consoleText. Need some eyes from the AFR folks.

> --
> Thanks and Regards,
> Kotresh H R

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel