Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)
Hi Shyam/Atin,

I have posted the patch [1] for the geo-rep test case failures:

tests/00-geo-rep/georep-basic-dr-rsync.t
tests/00-geo-rep/georep-basic-dr-tarssh.t
tests/00-geo-rep/00-georep-verify-setup.t

Please include patch [1] while triggering tests. The instrumentation patch [2] which was included can be removed.

[1] https://review.gluster.org/#/c/glusterfs/+/20704/
[2] https://review.gluster.org/#/c/glusterfs/+/20477/

Thanks,
Kotresh HR

On Fri, Aug 10, 2018 at 3:21 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>
>> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:
>>
>>> Today's patch set 7 [1], included fixes provided till last evening IST,
>>> and its runs can be seen here [2] (yay! we can link to comments in
>>> gerrit now).
>>>
>>> New failures: (added to the spreadsheet)
>>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>>> ./tests/bugs/quick-read/bug-846240.t
>>>
>>> Older tests that had not recurred, but failed today: (moved up in the
>>> spreadsheet)
>>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>>
>> The above test is timing out. I had to increase the timeout while adding
>> the .t so that creation of maximum number of links that will max-out in
>> ext4. Will re-check if it is the same issue and get back.
>
> This test is timing out with lcov. I bumped up timeout to 30 minutes @
> https://review.gluster.org/#/c/glusterfs/+/20699, I am not happy that
> this test takes so long, but without this it is difficult to find
> regression on ext4 which has limits on number of hardlinks in a
> directory(It took us almost one year after we introduced regression to find
> this problem when we did introduce regression last time). If there is a way
> of running this .t once per day and before each release. I will be happy to
> make it part of that. Let me know.
>
>>> Other issues;
>>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>>> Few geo-rep failures, Kotresh should have more logs to look at with
>>> these runs
>>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>>
>>> Atin/Amar, we may need to merge some of the patches that have proven to
>>> be holding up and fixing issues today, so that we do not leave
>>> everything to the last. Check and move them along or lmk.
>>>
>>> Shyam
>>>
>>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>>> [2] Runs against patch set 7 and its status (incomplete as some runs
>>> have not completed):
>>> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>>> (also updated in the spreadsheet)
>>>
>>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>>> > Deserves a new beginning, threads on the other mail have gone deep enough.
>>> >
>>> > NOTE: (5) below needs your attention, rest is just process and data on
>>> > how to find failures.
>>> >
>>> > 1) We are running the tests using the patch [2].
>>> >
>>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>>> > Failures" use a search to find a failing test and the corresponding run
>>> > that it failed in.
>>> >
>>> > 3) Patches that are fixing issues can be found here [1], if you think
>>> > you have a patch out there, that is not in this list, shout out.
>>> >
>>> > 4) If you own up a test case failure, update the spreadsheet [3] with
>>> > your name against the test, and also update other details as needed (as
>>> > comments, as edit rights to the sheet are restricted).
>>> >
>>> > 5) Current test failures
>>> > We still have the following tests failing and some without any RCA or
>>> > attention, (If something is incorrect, write back).
>>> >
>>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs attention)
>>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Atin)
>>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>> >
>>> > Here are some newer failures, but mostly one-off failures except cores
>>> > in ec-5-2.t. All of the following need attention as these are new.
>>> >
>>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>>> > ./tests/basic/stats-dump.t
>>> > ./tests/bugs/bug-1110262.t
>>> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-
>>>
Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)
On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri wrote:

> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:
>
>> Today's patch set 7 [1], included fixes provided till last evening IST,
>> and its runs can be seen here [2] (yay! we can link to comments in
>> gerrit now).
>>
>> New failures: (added to the spreadsheet)
>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>> ./tests/bugs/quick-read/bug-846240.t
>>
>> Older tests that had not recurred, but failed today: (moved up in the
>> spreadsheet)
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>
> The above test is timing out. I had to increase the timeout while adding
> the .t so that creation of maximum number of links that will max-out in
> ext4. Will re-check if it is the same issue and get back.

This test is timing out with lcov. I bumped up the timeout to 30 minutes @ https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that this test takes so long, but without it, it is difficult to find regressions on ext4, which has a limit on the number of hardlinks in a directory (the last time we introduced such a regression, it took us almost one year to find the problem). If there is a way of running this .t once per day and before each release, I will be happy to make it part of that. Let me know.

>> Other issues;
>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>> Few geo-rep failures, Kotresh should have more logs to look at with
>> these runs
>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>
>> Atin/Amar, we may need to merge some of the patches that have proven to
>> be holding up and fixing issues today, so that we do not leave
>> everything to the last. Check and move them along or lmk.
>>
>> Shyam
>>
>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>> [2] Runs against patch set 7 and its status (incomplete as some runs
>> have not completed):
>> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>> (also updated in the spreadsheet)
>>
>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> > Deserves a new beginning, threads on the other mail have gone deep enough.
>> >
>> > NOTE: (5) below needs your attention, rest is just process and data on
>> > how to find failures.
>> >
>> > 1) We are running the tests using the patch [2].
>> >
>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>> > Failures" use a search to find a failing test and the corresponding run
>> > that it failed in.
>> >
>> > 3) Patches that are fixing issues can be found here [1], if you think
>> > you have a patch out there, that is not in this list, shout out.
>> >
>> > 4) If you own up a test case failure, update the spreadsheet [3] with
>> > your name against the test, and also update other details as needed (as
>> > comments, as edit rights to the sheet are restricted).
>> >
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs attention)
>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Atin)
>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>> >
>> > Here are some newer failures, but mostly one-off failures except cores
>> > in ec-5-2.t. All of the following need attention as these are new.
>> >
>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> > ./tests/basic/stats-dump.t
>> > ./tests/bugs/bug-1110262.t
>> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> > ./tests/basic/ec/ec-data-heal.t
>> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> > ./tests/basic/ec/ec-5-2.t
>> >
>> > 6) Tests that are addressed or are not occurring anymore are,
>> >
>> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> > ./tests/bitrot/bug-1373520.t
>> > ./tests/bugs/distribute/bug-1117851.t
>> >
Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)
On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:

> Today's patch set 7 [1], included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
>
> New failures: (added to the spreadsheet)
> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
> ./tests/bugs/quick-read/bug-846240.t
>
> Older tests that had not recurred, but failed today: (moved up in the
> spreadsheet)
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t

The above test is timing out. I had to increase the timeout while adding the .t so that it can create the maximum number of links, enough to max out the limit on ext4. Will re-check if it is the same issue and get back.

> Other issues;
> Test ./tests/basic/ec/ec-5-2.t core dumped again
> Few geo-rep failures, Kotresh should have more logs to look at with
> these runs
> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>
> Atin/Amar, we may need to merge some of the patches that have proven to
> be holding up and fixing issues today, so that we do not leave
> everything to the last. Check and move them along or lmk.
>
> Shyam
>
> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
> [2] Runs against patch set 7 and its status (incomplete as some runs
> have not completed):
> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
> (also updated in the spreadsheet)
>
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > Deserves a new beginning, threads on the other mail have gone deep enough.
> >
> > NOTE: (5) below needs your attention, rest is just process and data on
> > how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures" use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1], if you think
> > you have a patch out there, that is not in this list, shout out.
> >
> > 4) If you own up a test case failure, update the spreadsheet [3] with
> > your name against the test, and also update other details as needed (as
> > comments, as edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures, but mostly one-off failures except cores
> > in ec-5-2.t. All of the following need attention as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that are addressed or are not occurring anymore are,
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> > ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bunched in
> >> a patch [2] that Atin and myself have put through several regressions
> >> (mux, normal and line coverage) and these have also not passed.
> >>
> >> Till we rectify the
Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)
On Thu, 9 Aug 2018 at 06:34, Shyam Ranganathan wrote:

> Today's patch set 7 [1], included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
>
> New failures: (added to the spreadsheet)
> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
> ./tests/bugs/quick-read/bug-846240.t
>
> Older tests that had not recurred, but failed today: (moved up in the
> spreadsheet)
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>
> Other issues;
> Test ./tests/basic/ec/ec-5-2.t core dumped again
> Few geo-rep failures, Kotresh should have more logs to look at with
> these runs
> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>
> Atin/Amar, we may need to merge some of the patches that have proven to
> be holding up and fixing issues today, so that we do not leave
> everything to the last. Check and move them along or lmk.

Ack. I’ll be merging those patches.

> Shyam
>
> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
> [2] Runs against patch set 7 and its status (incomplete as some runs
> have not completed):
> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
> (also updated in the spreadsheet)
>
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > Deserves a new beginning, threads on the other mail have gone deep enough.
> >
> > NOTE: (5) below needs your attention, rest is just process and data on
> > how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures" use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1], if you think
> > you have a patch out there, that is not in this list, shout out.
> >
> > 4) If you own up a test case failure, update the spreadsheet [3] with
> > your name against the test, and also update other details as needed (as
> > comments, as edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures, but mostly one-off failures except cores
> > in ec-5-2.t. All of the following need attention as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that are addressed or are not occurring anymore are,
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> > ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bunched in
> >> a patch [2] that Atin and myself have put through several regressions
> >> (mux, normal and line coverage) and these have also not passed.
> >>
> >> Till we rectify the situation we are locking down master branch commit
> >> rights to the following people, Amar, Atin, Shyam, Vijay.
> >>
> >> The intention is to stabilize master and not