Re: [Gluster-devel] tests/bugs/core/multiplex-limit-issue-151.t timed out

2018-08-10 Thread Mohit Agrawal
Filed a bug, https://bugzilla.redhat.com/show_bug.cgi?id=1615003; I am not
able to extract logs specific to this test case from the log dump.


Thanks
Mohit Agrawal

On Sat, Aug 11, 2018 at 9:27 AM, Atin Mukherjee  wrote:

> https://build.gluster.org/job/line-coverage/455/consoleFull
>
> 1 test failed:
> tests/bugs/core/multiplex-limit-issue-151.t (timed out)
>
> The last job https://build.gluster.org/job/line-coverage/454/consoleFull
> took only 21 secs, so we're nowhere near breaching the timeout threshold.
> Possibly a hang?
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-10 Thread Ravishankar N




On 08/11/2018 07:29 AM, Shyam Ranganathan wrote:

./tests/bugs/replicate/bug-1408712.t (one retry)
I'll take a look at this. But it looks like archiving the artifacts (logs)
for this run
(https://build.gluster.org/job/regression-on-demand-full-run/44/consoleFull)
failed.

Thanks,
Ravi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] tests/bugs/core/multiplex-limit-issue-151.t timed out

2018-08-10 Thread Atin Mukherjee
https://build.gluster.org/job/line-coverage/455/consoleFull

1 test failed:
tests/bugs/core/multiplex-limit-issue-151.t (timed out)

The last job https://build.gluster.org/job/line-coverage/454/consoleFull
took only 21 secs, so we're nowhere near breaching the timeout threshold.
Possibly a hang?
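
If it does turn out to be a hang, one way to capture state the next time it
reproduces is to grab stack traces and a statedump from the node before the
job is reaped. This is only a rough sketch, assuming gstack is available on
the builder and using a placeholder volume name:

# dump userspace stacks of all gluster processes on the test node
for p in $(pgrep -f gluster); do gstack "$p" > "/var/log/glusterfs/stack.$p.txt"; done
# a statedump (written under /var/run/gluster by default) is another option
gluster volume statedump <volname>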
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-10 Thread Shyam Ranganathan
Today's patch set is 9 [1].

A total of 7 runs across line-coverage, mux regression, and CentOS 7
regression are running (some are yet to complete).

The test failure summary is as follows:
./tests/bugs/glusterd/validating-server-quorum.t (2 cores)
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
(2 retries)
./tests/bugs/replicate/bug-1408712.t (one retry)
./tests/bugs/core/multiplex-limit-issue-151.t (one retry)
./tests/bugs/quick-read/bug-846240.t (one retry)
./tests/00-geo-rep/georep-basic-dr-rsync.t (one retry)

Test output can be found at [2] and [3]. [2] will be updated as the runs
that are still ongoing complete.

Shyam
[1] Patch set: https://review.gluster.org/c/glusterfs/+/20637/9
[2] Sheet recording failures:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1535799585
[3] Comment on patch set 9 recording runs till now:
https://review.gluster.org/c/glusterfs/+/20637#message-07f3886dda133ed642438eb9e82b82d957668e86
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> 

Re: [Gluster-devel] 3.6.5 -> 4.1 upgrade

2018-08-10 Thread Amar Tumballi
An upgrade from 3.8.x to 4.1 shouldn't be an issue if you are doing a rolling
upgrade (i.e., clients will still be accessing the data).

I am not sure about a rolling upgrade from 3.6.5 to 4.1, mainly because there
were some changes which prevented it.

If you are planning an offline upgrade (i.e., mounts and bricks can go
offline for a few minutes), then it should be fine; we haven't changed
anything in the on-disk format, data layout, etc.
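
As a reference, a minimal offline-upgrade sketch on a Debian-based system
could look like the following. The package, service, and volume names here
are assumptions to be verified against your setup, not tested steps:

# on one node, for each volume
$ gluster volume stop <volname>
# on every node
$ systemctl stop glusterd
$ apt-get update && apt-get install glusterfs-server glusterfs-client
$ systemctl start glusterd
# on one node, for each volume
$ gluster volume start <volname>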

-Amar

On Fri, Aug 10, 2018 at 4:45 PM, Roman  wrote:

> Dear devs,
>
> Am I safe to upgrade to 4.1 from 3.6.5 with a single apt-get install
> command?
>
> --
> Best regards,
> Roman.
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Thu, August 09th)

2018-08-10 Thread Atin Mukherjee
Pranith,

https://review.gluster.org/c/glusterfs/+/20685 seems to have caused
multiple failures in the runs against
https://review.gluster.org/c/glusterfs/+/20637/8, going by yesterday's report.
Did you get a chance to look at it?

On Fri, Aug 10, 2018 at 1:03 PM Pranith Kumar Karampuri 
wrote:

>
>
> On Fri, Aug 10, 2018 at 6:34 AM Shyam Ranganathan 
> wrote:
>
>> Today's test results are updated in the spreadsheet in sheet named "Run
>> patch set 8".
>>
>> I took in patch https://review.gluster.org/c/glusterfs/+/20685 which
>> caused quite a few failures, so not updating new failures as issue yet.
>>
>> Please look at the failures for tests that were retried and passed, as
>> the logs for the initial runs should be preserved from this run onward.
>>
>> Otherwise nothing else to report on the run status; if you are averse to
>> spreadsheets, look at this comment in Gerrit [1].
>>
>> Shyam
>>
>> [1] Patch set 8 run status:
>>
>> https://review.gluster.org/c/glusterfs/+/20637/8#message-54de30fa384fd02b0426d9db6d07fad4eeefcf08
>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> > Deserves a new beginning, threads on the other mail have gone deep
>> enough.
>> >
>> > NOTE: (5) below needs your attention, rest is just process and data on
>> > how to find failures.
>> >
>> > 1) We are running the tests using the patch [2].
>> >
>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>> > Failures" use a search to find a failing test and the corresponding run
>> > that it failed in.
>> >
>> > 3) Patches that are fixing issues can be found here [1], if you think
>> > you have a patch out there, that is not in this list, shout out.
>> >
>> > 4) If you own up a test case failure, update the spreadsheet [3] with
>> > your name against the test, and also update other details as needed (as
>> > comments, as edit rights to the sheet are restricted).
>> >
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> > attention)
>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> > (Atin)
>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>> >
>> > Here are some newer failures, but mostly one-off failures except cores
>> > in ec-5-2.t. All of the following need attention as these are new.
>> >
>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> > ./tests/basic/stats-dump.t
>> > ./tests/bugs/bug-1110262.t
>> >
>> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> > ./tests/basic/ec/ec-data-heal.t
>> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>>
>
> Sent https://review.gluster.org/c/glusterfs/+/20697 for the test above.
>
>
>> >
>> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> > ./tests/basic/ec/ec-5-2.t
>> >
>> > 6) Tests that are addressed or are not occurring anymore are,
>> >
>> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> > ./tests/bitrot/bug-1373520.t
>> > ./tests/bugs/distribute/bug-1117851.t
>> > ./tests/bugs/glusterd/quorum-validation.t
>> > ./tests/bugs/distribute/bug-1042725.t
>> >
>> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> > ./tests/bugs/quota/bug-1293601.t
>> > ./tests/bugs/bug-1368312.t
>> > ./tests/bugs/distribute/bug-1122443.t
>> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> >
>> > Shyam (and Atin)
>> >
>> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> >> Health on master as of the last nightly run [4] is still the same.
>> >>
>> >> Potential patches that rectify the situation (as in [1]) are bunched in
>> >> a patch [2] that Atin and myself have put through several regressions
>> >> (mux, normal and line coverage) and these have also not passed.
>> >>
>> >> Till we rectify the situation we are locking down master branch commit
>> >> rights to the following people, Amar, Atin, Shyam, Vijay.
>> >>
>> >> The intention is to stabilize master and not add more patches that may
>> >> destabilize it.
>> >>
>> >> Test cases that are tracked as failures and need action are present
>> here
>> >> [3].
>> 

[Gluster-devel] Coverity covscan for 2018-08-10-7d484949 (master branch)

2018-08-10 Thread staticanalysis


GlusterFS Coverity covscan results for the master branch are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-08-10-7d484949/

Coverity covscan results for other active branches are also available at
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ASAN Builds!

2018-08-10 Thread Niels de Vos
On Fri, Aug 10, 2018 at 05:50:28PM +0530, Nigel Babu wrote:
> Hello folks,
> 
> Thanks to Niels, we now have ASAN builds compiling and a flag for getting
> it to work locally. The patch[1] is not merged yet, but I can trigger runs
> off the patch for now. The first run is at [2].
> 
> [1]: https://review.gluster.org/c/glusterfs/+/20589/2
> [2]: https://build.gluster.org/job/asan/66/console

There has been a newer version of the patch(es) that make ASAN builds
work on el7 systems too. Nigel started a new run at
https://build.gluster.org/job/asan/68/consoleFull and it has

Enable ASAN  : yes

in the console output.

Other devs who want to test this need to apply a few patches that have
not been merged yet. If you have git-review installed, the following
should fetch them (use git-review with the git remote 'origin' pointing
to review.gluster.org):

1. https://review.gluster.org/c/glusterfs/+/20589
   $ git review -r origin -d 20589

2. https://review.gluster.org/c/glusterfs/+/20688
   $ git review -r origin -d 20688

3. https://review.gluster.org/c/glusterfs/+/20692
   $ git review -r origin -d 20692

With this, you should be able to build with ASAN enabled if you do
either of these:

$ ./autogen.sh && ./configure --enable-asan
$ make dist && rpmbuild --with asan -ta glusterfs*.tar.gz
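
Once built (and installed), one way to exercise it locally is to point the
ASAN runtime at a log file before running a test. This is only a sketch, not
a verified recipe; the ASAN_OPTIONS values and the test path are illustrative:

$ export ASAN_OPTIONS="log_path=/var/log/glusterfs/asan:detect_leaks=0"
$ ./run-tests.sh tests/basic/mount.t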

This could probably be added to some 'how to debug gluster' documents.
Suggestions for the best location are welcome.

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)

2018-08-10 Thread Kotresh Hiremath Ravishankar
Hi Shyam/Atin,

I have posted patch [1] for the geo-rep test case failures:
tests/00-geo-rep/georep-basic-dr-rsync.t
tests/00-geo-rep/georep-basic-dr-tarssh.t
tests/00-geo-rep/00-georep-verify-setup.t

Please include patch [1] while triggering tests.
The instrumentation patch [2] which was included can be removed.

[1]  https://review.gluster.org/#/c/glusterfs/+/20704/
[2]  https://review.gluster.org/#/c/glusterfs/+/20477/

Thanks,
Kotresh HR




On Fri, Aug 10, 2018 at 3:21 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan 
>> wrote:
>>
>>> Today's patch set 7 [1] included fixes provided till last evening IST,
>>> and its runs can be seen here [2] (yay! we can link to comments in
>>> gerrit now).
>>>
>>> New failures: (added to the spreadsheet)
>>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>>> ./tests/bugs/quick-read/bug-846240.t
>>>
>>> Older tests that had not recurred, but failed today: (moved up in the
>>> spreadsheet)
>>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>>>
>>
>> The above test is timing out. I had to increase the timeout while adding
>> the .t, to allow creating the maximum number of links that ext4 permits.
>> Will re-check whether it is the same issue and get back.
>>
>
> This test is timing out with lcov. I bumped up the timeout to 30 minutes in
> https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that this
> test takes so long, but without it, it is difficult to catch a regression on
> ext4, which has a limit on the number of hardlinks in a directory (the last
> time such a regression was introduced, it took us almost a year to find it).
> If there is a way of running this .t once per day and before each release,
> I will be happy to make it part of that. Let me know.
>
>
>>
>>
>>>
>>> Other issues:
>>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>>> Few geo-rep failures, Kotresh should have more logs to look at with
>>> these runs
>>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>>
>>> Atin/Amar, we may need to merge some of the patches that have proven to
>>> be holding up and fixing issues today, so that we do not leave
>>> everything to the last. Check and move them along or lmk.
>>>
>>> Shyam
>>>
>>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>>> [2] Runs against patch set 7 and its status (incomplete as some runs
>>> have not completed):
>>> https://review.gluster.org/c/glusterfs/+/20637/7#message-
>>> 37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>>> (also updated in the spreadsheet)
>>>
>>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>>> > Deserves a new beginning, threads on the other mail have gone deep
>>> enough.
>>> >
>>> > NOTE: (5) below needs your attention, rest is just process and data on
>>> > how to find failures.
>>> >
>>> > 1) We are running the tests using the patch [2].
>>> >
>>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>>> > Failures" use a search to find a failing test and the corresponding run
>>> > that it failed in.
>>> >
>>> > 3) Patches that are fixing issues can be found here [1], if you think
>>> > you have a patch out there, that is not in this list, shout out.
>>> >
>>> > 4) If you own up a test case failure, update the spreadsheet [3] with
>>> > your name against the test, and also update other details as needed (as
>>> > comments, as edit rights to the sheet are restricted).
>>> >
>>> > 5) Current test failures
>>> > We still have the following tests failing and some without any RCA or
>>> > attention, (If something is incorrect, write back).
>>> >
>>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>>> > attention)
>>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-
>>> volume-options.t
>>> > (Atin)
>>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>> >
>>> > Here are some newer failures, but mostly one-off failures except cores
>>> > in ec-5-2.t. All of the following need attention as these are new.
>>> >
>>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>>> > ./tests/basic/stats-dump.t
>>> > ./tests/bugs/bug-1110262.t
>>> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-
>>> 

Re: [Gluster-devel] Python components and test coverage

2018-08-10 Thread Sankarshan Mukhopadhyay
On Fri, Aug 10, 2018 at 5:47 PM, Nigel Babu  wrote:
> Hello folks,
>
> We're currently in a transition to python3. Right now, there's a bug in one
> piece of this transition code. I saw Nithya run into this yesterday. The
> challenge here is that none of our testing for the python2/python3
> transition catches this bug. Neither Pylint nor the ast-based testing that
> Kaleb recommended catches it. The bug is trivial and would take 2 minutes to
> fix; the challenge is that until we exercise almost all of these code paths
> from both Python 3 and Python 2, we're not going to find subtle breakages
> like this.
>

Where is this great reveal? What is the above-mentioned bug?

> As far as I know, the three pieces where we use Python are geo-rep,
> glusterfind, and libgfapi-python. My questions:
> * Are there more places where we run python?
> * What sort of automated test coverage do we have for these components right
> now?
> * What can the CI team do to help identify problems? We have both Centos7
> and Fedora28 builders, so we can definitely help run tests specific to
> python.
>
> --
> nigelb
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel



-- 
sankarshan mukhopadhyay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] ASAN Builds!

2018-08-10 Thread Nigel Babu
Hello folks,

Thanks to Niels, we now have ASAN builds compiling and a flag for getting
it to work locally. The patch [1] is not merged yet, but I can trigger runs
off the patch for now. The first run is at [2].

[1]: https://review.gluster.org/c/glusterfs/+/20589/2
[2]: https://build.gluster.org/job/asan/66/console

-- 
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Python components and test coverage

2018-08-10 Thread Nigel Babu
Hello folks,

We're currently in a transition to python3. Right now, there's a bug in one
piece of this transition code. I saw Nithya run into this yesterday. The
challenge here is that none of our testing for the python2/python3 transition
catches this bug. Neither Pylint nor the ast-based testing that Kaleb
recommended catches it. The bug is trivial and would take 2 minutes to fix;
the challenge is that until we exercise almost all of these code paths from
both Python 3 and Python 2, we're not going to find subtle breakages like
this.

As far as I know, the three pieces where we use Python are geo-rep,
glusterfind, and libgfapi-python. My questions:
* Are there more places where we run python?
* What sort of automated test coverage do we have for these components
right now?
* What can the CI team do to help identify problems? We have both CentOS 7
and Fedora 28 builders, so we can definitely help run tests specific to
Python (a rough sketch follows below).
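
As a starting point, even a crude smoke test that byte-compiles the sources
and invokes each entry point under both interpreters would flag breakage that
the static checks miss. A rough sketch; the install paths below are
assumptions, not verified locations:

for py in python2 python3; do
    "$py" -m compileall /usr/libexec/glusterfs/python/syncdaemon  # geo-rep helpers (path assumed)
    "$py" /usr/bin/glusterfind --help                             # glusterfind CLI (path assumed)
done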

-- 
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] 3.6.5 -> 4.1 upgrade

2018-08-10 Thread Roman
Dear devs,

Am I safe to upgrade to 4.1 from 3.6.5 with a single apt-get install
command?

-- 
Best regards,
Roman.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)

2018-08-10 Thread Pranith Kumar Karampuri
On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri 
wrote:

>
>
> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan 
> wrote:
>
>> Today's patch set 7 [1] included fixes provided till last evening IST,
>> and its runs can be seen here [2] (yay! we can link to comments in
>> gerrit now).
>>
>> New failures: (added to the spreadsheet)
>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>> ./tests/bugs/quick-read/bug-846240.t
>>
>> Older tests that had not recurred, but failed today: (moved up in the
>> spreadsheet)
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>>
>
> The above test is timing out. I had to increase the timeout while adding
> the .t, to allow creating the maximum number of links that ext4 permits.
> Will re-check whether it is the same issue and get back.
>

This test is timing out with lcov. I bumped up the timeout to 30 minutes in
https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that this
test takes so long, but without it, it is difficult to catch a regression on
ext4, which has a limit on the number of hardlinks in a directory (the last
time such a regression was introduced, it took us almost a year to find it).
If there is a way of running this .t once per day and before each release,
I will be happy to make it part of that. Let me know.


>
>
>>
>> Other issues:
>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>> Few geo-rep failures, Kotresh should have more logs to look at with
>> these runs
>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>
>> Atin/Amar, we may need to merge some of the patches that have proven to
>> be holding up and fixing issues today, so that we do not leave
>> everything to the last. Check and move them along or lmk.
>>
>> Shyam
>>
>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>> [2] Runs against patch set 7 and its status (incomplete as some runs
>> have not completed):
>>
>> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>> (also updated in the spreadsheet)
>>
>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> > Deserves a new beginning, threads on the other mail have gone deep
>> enough.
>> >
>> > NOTE: (5) below needs your attention, rest is just process and data on
>> > how to find failures.
>> >
>> > 1) We are running the tests using the patch [2].
>> >
>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>> > Failures" use a search to find a failing test and the corresponding run
>> > that it failed in.
>> >
>> > 3) Patches that are fixing issues can be found here [1], if you think
>> > you have a patch out there, that is not in this list, shout out.
>> >
>> > 4) If you own up a test case failure, update the spreadsheet [3] with
>> > your name against the test, and also update other details as needed (as
>> > comments, as edit rights to the sheet are restricted).
>> >
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> > attention)
>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> > (Atin)
>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>> >
>> > Here are some newer failures, but mostly one-off failures except cores
>> > in ec-5-2.t. All of the following need attention as these are new.
>> >
>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> > ./tests/basic/stats-dump.t
>> > ./tests/bugs/bug-1110262.t
>> >
>> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> > ./tests/basic/ec/ec-data-heal.t
>> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> >
>> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> > ./tests/basic/ec/ec-5-2.t
>> >
>> > 6) Tests that are addressed or are not occurring anymore are,
>> >
>> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> > ./tests/bitrot/bug-1373520.t
>> > ./tests/bugs/distribute/bug-1117851.t
>> > 

[Gluster-devel] Gerrit Upgrade Retrospective

2018-08-10 Thread Nigel Babu
Hello folks,

This is a quick retrospective we (the Infra team) did for the Gerrit
upgrade from 2 days ago.

## Went Well
* We had a full backup to fall back to. We had to fall back on this.
* We had a good 4h window so we had time to make mistakes and recover from
them.
* We had a good number of tests that were part of our upgrade steps. This
helped us catch a problem with the serviceuser plugin. We deleted the
plugin to overcome this.

## Went Badly
* This document did not capture that the serviceuser plugin also needs to
be upgraded.
* We made a mistake where we started the upgrade in the backup rather than
the main folder. We need to change our backup workflow so that this doesn't
happen in the future. This is an incredibly easy mistake to make.
* Git clones did not work. This was not part of our testing.
* cgit shows no repos. This was also not part of our testing.

## Future Recommendations
* [DONE] Setup proper documentation for the Gerrit upgrade workflow.
* We need to ensure that the engineer doing the upgrade does a staging
upgrade at least once or perhaps even twice to ensure the steps are
absolutely accurate.
* Gerrit stage consumes our ansible playbooks, but the sooner we can switch
master to this, the better. It catches problems we've already solved in the
past and automated away.

-- 
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Clang failures update

2018-08-10 Thread Amar Tumballi
On Fri, Aug 10, 2018 at 1:50 PM, Nigel Babu  wrote:

> Hello folks,
>
> Based on Yaniv's feedback, I've removed deadcode.DeadStores checker. We
> are left with 161 failures. I'm going to move this to 140 as a target for
> now. The job will continue to be yellow and we need to fix at least 21
> failures by 31 Aug. That's about 7 issues per week to fix.
>

Can we try to do 130 as the target till Aug 31st? That means 10 issues per
week, ie, 2 per working day. Should be possible to achieve IMO :-)

-Amar


>
> If anyone wants me to change the goal posts for this one, please let me
> know.
>
>
> If you want to run this on your local Fedora 27 machine, it should work
> fine.
>

> If you want to run this on a Fedora 28 machine, you'll need to do a
> little bit of a hack. Search for PYTHONDEV_CPPFLAGS in configure.ac and
> add this line right below the existing line:
>
> PYTHONDEV_CPPFLAGS=`echo ${PYTHONDEV_CPPFLAGS} | sed -e
> 's/-fcf-protection//g'`
>
> Fedora 28 has GCC 8.0 and Clang 7.0; this is the root cause of the
> failure, and in a future version this should work without the need for
> this hack.
>
> --
> nigelb
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Clang failures update

2018-08-10 Thread Nigel Babu
Hello folks,

Based on Yaniv's feedback, I've removed deadcode.DeadStores checker. We are
left with 161 failures. I'm going to move this to 140 as a target for now.
The job will continue to be yellow and we need to fix at least 21 failures
by 31 Aug. That's about 7 issues per week to fix.

If anyone wants me to change the goal posts for this one, please let me
know.


If you want to run this on your local Fedora 27 machine, it should work
fine. If you want to run this on a Fedora 28 machine, you'll need to do a
little bit of a hack. Search for PYTHONDEV_CPPFLAGS in configure.ac and add
this line right below the existing line:

PYTHONDEV_CPPFLAGS=`echo ${PYTHONDEV_CPPFLAGS} | sed -e
's/-fcf-protection//g'`

Fedora 28 has GCC 8.0 and Clang 7.0; this is the root cause of the failure,
and in a future version this should work without the need for this hack.
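
For reference, a minimal local invocation could look like the lines below.
This assumes the clang analyzer's scan-build wrapper is installed and is only
a sketch; it is not necessarily the exact command line the CI job uses:

$ ./autogen.sh
$ scan-build ./configure
$ scan-build -o /tmp/gluster-clang-results make -j4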

-- 
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Thu, August 09th)

2018-08-10 Thread Pranith Kumar Karampuri
On Fri, Aug 10, 2018 at 6:34 AM Shyam Ranganathan 
wrote:

> Today's test results are updated in the spreadsheet in sheet named "Run
> patch set 8".
>
> I took in patch https://review.gluster.org/c/glusterfs/+/20685 which
> caused quite a few failures, so not updating new failures as issue yet.
>
> Please look at the failures for tests that were retried and passed, as
> the logs for the initial runs should be preserved from this run onward.
>
> Otherwise nothing else to report on the run status; if you are averse to
> spreadsheets, look at this comment in Gerrit [1].
>
> Shyam
>
> [1] Patch set 8 run status:
>
> https://review.gluster.org/c/glusterfs/+/20637/8#message-54de30fa384fd02b0426d9db6d07fad4eeefcf08
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > Deserves a new beginning, threads on the other mail have gone deep
> enough.
> >
> > NOTE: (5) below needs your attention, rest is just process and data on
> > how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures" use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1], if you think
> > you have a patch out there, that is not in this list, shout out.
> >
> > 4) If you own up a test case failure, update the spreadsheet [3] with
> > your name against the test, and also update other details as needed (as
> > comments, as edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> > attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> > (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures, but mostly one-off failures except cores
> > in ec-5-2.t. All of the following need attention as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> >
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>

Sent https://review.gluster.org/c/glusterfs/+/20697 for the test above.


> >
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that are addressed or are not occurring anymore are,
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> >
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bunched in
> >> a patch [2] that Atin and myself have put through several regressions
> >> (mux, normal and line coverage) and these have also not passed.
> >>
> >> Till we rectify the situation we are locking down master branch commit
> >> rights to the following people, Amar, Atin, Shyam, Vijay.
> >>
> >> The intention is to stabilize master and not add more patches that may
> >> destabilize it.
> >>
> >> Test cases that are tracked as failures and need action are present here
> >> [3].
> >>
> >> @Nigel, request you to apply the commit rights change as you see this
> >> mail and let the list know regarding the same as well.
> >>
> >> Thanks,
> >> Shyam
> >>
> >> [1] Patches that address regression failures:
> >> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
> >>
> >> [2] Bunched up patch against which regressions were run:
> >> https://review.gluster.org/#/c/20637
> >>
> 

Re: [Gluster-devel] afr_set_transaction_flock lead to bad performance when write with multi-pthread or multi-process

2018-08-10 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
>>> Can you please try and disable eager-lock?
Eager-lock is disabled already, and from the source code below:

An arbiter setup plus a data FOP will trigger an flock on the entire file,
won't it?

if ((priv->arbiter_count || local->transaction.eager_lock_on ||
 priv->full_lock) &&
local->transaction.type == AFR_DATA_TRANSACTION) {
/*Lock entire file to avoid network split brains.*/
int_lock->flock.l_len   = 0;
int_lock->flock.l_start = 0;
} else {
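
For reference, the option state can be double-checked from the CLI; this is
just a sketch, using the replicate option name cluster.eager-lock and the
volume name from the report quoted below:

$ gluster volume get test cluster.eager-lock
$ gluster volume set test cluster.eager-lock off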
Best Regards,
George


From: Yaniv Kaul 
Sent: Friday, August 10, 2018 1:37 AM
To: Lian, George (NSB - CN/Hangzhou) 
Subject: Re: [Gluster-devel] afr_set_transaction_flock lead to bad performance
when write with multi-pthread or multi-process

Can you please try and disable eager-lock?
Y.


On Thu, Aug 9, 2018, 8:01 PM Lian, George (NSB - CN/Hangzhou) wrote:
Hi, Gluster expert,

When we set up a replicate volume with info like the below:

Volume Name: test
Type: Replicate
Volume ID: 9373eba9-eb84-4618-a54c-f2837345daec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rcp:/trunk/brick/test1/sn0
Brick2: rcp:/trunk/brick/test1/sn1
Brick3: rcp:/trunk/brick/test1/sn2 (arbiter)

If we run a performance test that writes to the same file from multiple
threads at the same time (at different offsets), write performance drops a
lot (about 60%-70% lower than on a volume without an arbiter).
When we studied the source code, we found the function
"afr_set_transaction_flock" in afr-transaction.c.
It will flock the entire file when arbiter_count is not zero; I suppose this
is the root cause of the performance drop.
Now my questions are:

1) Why flock the entire file when an arbiter is configured? Could you please
share the details of why it would lead to split-brain only with an arbiter?

2) If this is the root cause, and not locking the entire file really would
lead to split-brain, is there any solution to avoid the performance drop for
this multi-writer case?

The following is the source code of this function, FYI:
--
int afr_set_transaction_flock (xlator_t *this, afr_local_t *local)
{
afr_internal_lock_t *int_lock = NULL;
afr_private_t   *priv = NULL;

int_lock = &local->internal_lock;
priv = this->private;

if ((priv->arbiter_count || local->transaction.eager_lock_on ||
 priv->full_lock) &&
local->transaction.type == AFR_DATA_TRANSACTION) {
/*Lock entire file to avoid network split brains.*/
int_lock->flock.l_len   = 0;
int_lock->flock.l_start = 0;
} else {
int_lock->flock.l_len   = local->transaction.len;
int_lock->flock.l_start = local->transaction.start;
}
int_lock->flock.l_type  = F_WRLCK;

return 0;
}

Thanks & Best Regards,
George
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gluster Outreachy

2018-08-10 Thread Bhumika Goyal
Hi all,

*Gentle reminder!*

The doc[1] for adding project ideas for Outreachy will be open for editing
till August 20th. Please feel free to add your project ideas :).
[1]:
https://docs.google.com/document/d/16yKKDD2Dd6Ag0tssrdoFPojKsF16QI5-j7cUHcR5Pq4/edit?usp=sharing

Thanks,
Bhumika



On Wed, Jul 4, 2018 at 4:51 PM, Bhumika Goyal  wrote:

> Hi all,
>
> Gnome has been working on an initiative known as Outreachy[1] since 2010.
> Outreachy is a three-month remote internship program. It aims to increase
> the participation of women and members from under-represented groups in
> open source. This program is held twice a year. During the internship
> period, interns contribute to a project under the guidance of one or more
> mentors.
>
> For the next round (Dec 2018 - March 2019) we are planning to apply with
> projects from Gluster. We would like you to propose project ideas and/or
> come forward as mentors/volunteers.
> Please feel free to add project ideas in this doc[2]. The doc[2] will be
> open for editing till July end.
>
> [1]: https://www.outreachy.org/
> [2]: https://docs.google.com/document/d/16yKKDD2Dd6Ag0tssrdoFPojK
> sF16QI5-j7cUHcR5Pq4/edit?usp=sharing
>
> Outreachy timeline:
> Pre-Application Period - Late August to early September
> Application Period - Early September to mid-October
> Internship Period -  December to March
>
> Thanks,
> Bhumika
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel