Re: [Gluster-devel] Release 6.2: Expected tagging on May 15th

2019-05-17 Thread Shyam Ranganathan
These patches were dependent on each other, so a merge was not required
to get regression passing; that analysis seems incorrect.

When a patch series is tested, all dependent patches are pulled in
anyway, so please take another look at what the failure could be. (I
assume you would anyway.)

Shyam
On 5/17/19 3:46 AM, Hari Gowtham wrote:
> https://review.gluster.org/#/q/topic:%22ref-1709738%22+(status:open%20OR%20status:merged)
> 
> On Fri, May 17, 2019 at 1:13 PM Amar Tumballi Suryanarayan
>  wrote:
>>
>> Which are the patches? I can merge it for now.
>>
>> -Amar
>>
>> On Fri, May 17, 2019 at 1:10 PM Hari Gowtham  wrote:
>>>
>>> Thanks Sunny.
>>> Have CCed Shyam.
>>>
>>> On Fri, May 17, 2019 at 1:06 PM Sunny Kumar  wrote:
>>>>
>>>> Hi Hari,
>>>>
>>>> For this to pass regression, the other 3 patches need to be merged first. I
>>>> tried to merge them but do not have sufficient permissions to merge on the 6.2
>>>> branch.
>>>> I know a bug is already filed to grant additional permissions for
>>>> us (me, you and Rinku), so until then we are waiting on Shyam to merge it.
>>>>
>>>> -Sunny
>>>>
>>>> On Fri, May 17, 2019 at 12:54 PM Hari Gowtham  wrote:
>>>>>
>>>>> Hi Kotresh and Sunny,
>>>>> The patch has been failing regression a few times.
>>>>> We need to look into why this is happening and decide whether
>>>>> to take it into release 6.2 or drop it.
>>>>>
>>>>> On Wed, May 15, 2019 at 4:27 PM Hari Gowtham  wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The following patch is waiting for centos regression.
>>>>>> https://review.gluster.org/#/c/glusterfs/+/22725/
>>>>>>
>>>>>> Sunny or Kotresh, please do take a look so that we can go ahead with
>>>>>> the tagging.
>>>>>>
>>>>>> On Thu, May 9, 2019 at 4:45 PM Hari Gowtham  wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Expected tagging date for release-6.2 is May 15th, 2019.
>>>>>>>
>>>>>>> Please ensure required patches are backported and also are passing
>>>>>>> regressions and are appropriately reviewed for easy merging and tagging
>>>>>>> on the date.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Hari Gowtham.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Hari Gowtham.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Hari Gowtham.
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Hari Gowtham.
>>> ___
>>>
>>> Community Meeting Calendar:
>>>
>>> APAC Schedule -
>>> Every 2nd and 4th Tuesday at 11:30 AM IST
>>> Bridge: https://bluejeans.com/836554017
>>>
>>> NA/EMEA Schedule -
>>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>>> Bridge: https://bluejeans.com/486278655
>>>
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>
>>
>> --
>> Amar Tumballi (amarts)
> 
> 
> 
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Announcing Gluster release 6.1

2019-04-22 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
6.1 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:

None

Thanks,
Gluster community

[1] Packages for 6.1:
https://download.gluster.org/pub/gluster/glusterfs/6/6.1/

[2] Release notes for 6.1:
https://docs.gluster.org/en/latest/release-notes/6.1/

___
maintainers mailing list
maintain...@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6.1: Tagged!

2019-04-17 Thread Shyam Ranganathan
Release 6.1 is now tagged and being packaged. If anyone gets a chance, please test
the packages from CentOS SIG, as I am unavailable for the next 4 days.

Thanks,
Shyam
On 4/16/19 9:53 AM, Shyam Ranganathan wrote:
> Status: Tagging pending
> 
> Waiting on patches:
> (Kotresh/Atin) - glusterd: fix loading ctime in client graph logic
>   https://review.gluster.org/c/glusterfs/+/22579
> 
> Following patches will not be taken in if CentOS regression does not
> pass by tomorrow morning Eastern TZ,
> (Pranith/KingLongMee) - cluster-syncop: avoid duplicate unlock of
> inodelk/entrylk
>   https://review.gluster.org/c/glusterfs/+/22385
> (Aravinda) - geo-rep: IPv6 support
>   https://review.gluster.org/c/glusterfs/+/22488
> (Aravinda) - geo-rep: fix integer config validation
>   https://review.gluster.org/c/glusterfs/+/22489
> 
> Tracker bug status:
> (Ravi) - Bug 1693155 - Excessive AFR messages from gluster showing in
> RHGSWA.
>   All patches are merged, but none of the patches adds the "Fixes"
> keyword, assume this is an oversight and that the bug is fixed in this
> release.
> 
> (Atin) - Bug 1698131 - multiple glusterfsd processes being launched for
> the same brick, causing transport endpoint not connected
>   No work has occurred since the logs were uploaded to the bug; a restart of bricks and
> possibly glusterd is the existing workaround when the bug is hit. Moving
> this out of the tracker for 6.1.
> 
> (Xavi) - Bug 1699917 - I/O error on writes to a disperse volume when
> replace-brick is executed
>   Very recent bug (15th April), does not seem to have any critical data
> corruption or service availability issues, planning on not waiting for
> the fix in 6.1
> 
> - Shyam
> On 4/6/19 4:38 AM, Atin Mukherjee wrote:
>> Hi Mohit,
>>
>> https://review.gluster.org/22495 should get into 6.1 as it’s a
>> regression. Can you please attach the respective bug to the tracker Ravi
>> pointed out?
>>
>>
>> On Sat, 6 Apr 2019 at 12:00, Ravishankar N <ravishan...@redhat.com> wrote:
>>
>> Tracker bug is https://bugzilla.redhat.com/show_bug.cgi?id=1692394, in
>> case anyone wants to add blocker bugs.
>>
>>
>> On 05/04/19 8:03 PM, Shyam Ranganathan wrote:
>> > Hi,
>> >
>> > Expected tagging date for release-6.1 is April 10th, 2019.
>> >
>> > Please ensure required patches are backported and also are passing
>> > regressions and are appropriately reviewed for easy merging and
>> tagging
>> > on the date.
>> >
>> > Thanks,
>> > Shyam
>> > ___
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
>> > https://lists.gluster.org/mailman/listinfo/gluster-devel
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>> -- 
>> - Atin (atinm)
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6.1: Expected tagging on April 10th

2019-04-16 Thread Shyam Ranganathan
Status: Tagging pending

Waiting on patches:
(Kotresh/Atin) - glusterd: fix loading ctime in client graph logic
  https://review.gluster.org/c/glusterfs/+/22579

Following patches will not be taken in if CentOS regression does not
pass by tomorrow morning Eastern TZ,
(Pranith/KingLongMee) - cluster-syncop: avoid duplicate unlock of
inodelk/entrylk
  https://review.gluster.org/c/glusterfs/+/22385
(Aravinda) - geo-rep: IPv6 support
  https://review.gluster.org/c/glusterfs/+/22488
(Aravinda) - geo-rep: fix integer config validation
  https://review.gluster.org/c/glusterfs/+/22489

Tracker bug status:
(Ravi) - Bug 1693155 - Excessive AFR messages from gluster showing in
RHGSWA.
  All patches are merged, but none of the patches adds the "Fixes"
keyword, assume this is an oversight and that the bug is fixed in this
release.

(Atin) - Bug 1698131 - multiple glusterfsd processes being launched for
the same brick, causing transport endpoint not connected
  No work has occurred since the logs were uploaded to the bug; a restart of bricks and
possibly glusterd is the existing workaround when the bug is hit. Moving
this out of the tracker for 6.1.

(Xavi) - Bug 1699917 - I/O error on writes to a disperse volume when
replace-brick is executed
  Very recent bug (15th April), does not seem to have any critical data
corruption or service availability issues, planning on not waiting for
the fix in 6.1

- Shyam
On 4/6/19 4:38 AM, Atin Mukherjee wrote:
> Hi Mohit,
> 
> https://review.gluster.org/22495 should get into 6.1 as it’s a
> regression. Can you please attach the respective bug to the tracker Ravi
> pointed out?
> 
> 
> On Sat, 6 Apr 2019 at 12:00, Ravishankar N <ravishan...@redhat.com> wrote:
> 
> Tracker bug is https://bugzilla.redhat.com/show_bug.cgi?id=1692394, in
> case anyone wants to add blocker bugs.
> 
> 
> On 05/04/19 8:03 PM, Shyam Ranganathan wrote:
> > Hi,
> >
> > Expected tagging date for release-6.1 is April 10th, 2019.
> >
> > Please ensure required patches are backported and also are passing
> > regressions and are appropriately reviewed for easy merging and
> tagging
> > on the date.
> >
> > Thanks,
> > Shyam
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> -- 
> - Atin (atinm)
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Release 6.1: Expected tagging on April 10th

2019-04-05 Thread Shyam Ranganathan
Hi,

Expected tagging date for release-6.1 is April 10th, 2019.

Please ensure required patches are backported and also are passing
regressions and are appropriately reviewed for easy merging and tagging
on the date.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster release 4.1.8

2019-04-05 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
4.1.8 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:

None

Thanks,
Gluster community

[1] Packages for 4.1.8:
https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.8/

[2] Release notes for 4.1.8:
https://docs.gluster.org/en/latest/release-notes/4.1.8/



___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster Release 6

2019-03-25 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of 6.0, our
latest release.

This is a major release that includes a range of code improvements and
stability fixes along with a few features as noted below.

A selection of the key features and bugs addressed are documented in
this [1] page.

Announcements:

1. Releases that receive maintenance updates post release 6 are 4.1 and
5 [2]

2. Release 6 will receive maintenance updates around the 10th of every
month for the first 3 months post release (i.e. Apr'19, May'19, Jun'19).
Post the initial 3 months, it will receive maintenance updates every 2
months till EOL. [3]

A series of features/xlators have been deprecated in release 6, as
listed below. For upgrade procedures from volumes that use these
features to release 6, refer to the release 6 upgrade guide [4].

Features deprecated:
- Block device (bd) xlator
- Decompounder feature
- Crypt xlator
- Symlink-cache xlator
- Stripe feature
- Tiering support (tier xlator and changetimerecorder)

Highlights of this release are:
- Several stability fixes addressing coverity, clang-scan, address
sanitizer and valgrind reported issues
- Removal of unused and hence deprecated code and features
- Client side inode garbage collection
- This release addresses one of the major concerns regarding FUSE mount
process memory footprint, by introducing client side inode garbage
collection
- Performance Improvements
- "--auto-invalidation" on FUSE mounts to leverage kernel page cache
more effectively

Bugs addressed are provided towards the end, in the release notes [1]

Thank you,
Gluster community

References:
[1] Release notes: https://docs.gluster.org/en/latest/release-notes/6.0/

[2] Release schedule: https://www.gluster.org/release-schedule/

[3] Gluster release cadence and version changes:
https://lists.gluster.org/pipermail/announce/2018-July/000103.html

[4] Upgrade guide to release-6:
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster version EOL date

2019-03-22 Thread Shyam Ranganathan
As per the release schedule page [1], release 5 will be EOL when release 8 is out.
Releases are 4 months apart, hence release 5 would be EOL'd 12 months
after it was released.

Major releases receive minor updates (5.x), which are bug and
stability fixes. These do not extend the 12-month cycle for the release.

Shyam

[1] Release schedule: https://www.gluster.org/release-schedule/

On 3/22/19 6:45 AM, ABHISHEK PALIWAL wrote:
> Hi,
> 
> As per the gluster community, it seems the latest version is 5.5. Could anyone
> tell me what the EOL date for version 5.5 would be? Is it 12 months after
> the release date, or something else?
> 
> -- 
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Tagged and ready for packaging

2019-03-19 Thread Shyam Ranganathan
Hi,

RC1 testing is complete and blockers have been addressed. The release is
now tagged for a final round of packaging and package testing before
release.

Thanks for testing out the RC builds and reporting issues that needed to
be addressed.

As packaging and final package testing are finishing up, we will be
writing the upgrade guide for the release as well, before announcing the
release for general consumption.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-05 Thread Shyam Ranganathan
On 3/4/19 12:33 PM, Shyam Ranganathan wrote:
> On 3/4/19 10:08 AM, Atin Mukherjee wrote:
>>
>>
>> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan
>> <atumb...@redhat.com> wrote:
>>
>> Thanks to those who participated.
>>
>> Update at present:
>>
>> We found 3 blocker bugs in upgrade scenarios, and hence have marked
>> release
>> as pending upon them. We will keep these lists updated about progress.
>>
>>
>> I’d like to clarify that upgrade testing is blocked. So just fixing
>> these test blocker(s) isn’t enough to call release-6 green. We need to
>> continue and finish the rest of the upgrade tests once the respective
>> bugs are fixed.
> 
> Based on fixes expected by tomorrow for the upgrade fixes, we will build
> an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ).
> This RC can be used for further testing.

There have been no backports for the upgrade failures; I request folks
working on them to post a list of bugs that need to be fixed, to
enable tracking them. (Also, ensure they are marked against the
release-6 tracker [1].)

Also, we need to start writing the upgrade guide for release-6; any
volunteers for the same?

Thanks,
Shyam

[1] Release-6 tracker bug:
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-04 Thread Shyam Ranganathan
On 3/4/19 10:08 AM, Atin Mukherjee wrote:
> 
> 
> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan
> <atumb...@redhat.com> wrote:
> 
> Thanks to those who participated.
> 
> Update at present:
> 
> We found 3 blocker bugs in upgrade scenarios, and hence have marked
> release
> as pending upon them. We will keep these lists updated about progress.
> 
> 
> I’d like to clarify that upgrade testing is blocked. So just fixing
> these test blocker(s) isn’t enough to call release-6 green. We need to
> continue and finish the rest of the upgrade tests once the respective
> bugs are fixed.

Based on fixes expected by tomorrow for the upgrade fixes, we will build
an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ).
This RC can be used for further testing.

> 
> 
> 
> -Amar
> 
> On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
> 
> > Hi all,
> >
> > We are calling out our users, and developers to contribute in
> validating
> > ‘glusterfs-6.0rc’ build in their usecase. Specially for the cases of
> > upgrade, stability, and performance.
> >
> > Some of the key highlights of the release are listed in release-notes
> > draft
> >
> 
> .
> > Please note that there are some of the features which are being
> dropped out
> > of this release, and hence making sure your setup is not going to
> have an
> > issue is critical. Also the default lru-limit option in fuse mount for
> > Inodes should help to control the memory usage of client
> processes. All the
> > good reason to give it a shot in your test setup.
> >
> > If you are developer using gfapi interface to integrate with other
> > projects, you also have some signature changes, so please make
> sure your
> > project would work with latest release. Or even if you are using a
> project
> > which depends on gfapi, report the error with new RPMs (if any).
> We will
> > help fix it.
> >
> > As part of test days, we want to focus on testing the latest upcoming
> > release i.e. GlusterFS-6, and one or the other gluster volunteers
> would be
> > there on #gluster channel on freenode to assist the people. Some
> of the key
> > things we are looking as bug reports are:
> >
> >    -
> >
> >    See if upgrade from your current version to 6.0rc is smooth,
> and works
> >    as documented.
> >    - Report bugs in process, or in documentation if you find mismatch.
> >    -
> >
> >    Functionality is all as expected for your usecase.
> >    - No issues with actual application you would run on production
> etc.
> >    -
> >
> >    Performance has not degraded in your usecase.
> >    - While we have added some performance options to the code, not
> all of
> >       them are turned on, as they have to be done based on usecases.
> >       - Make sure the default setup is at least same as your current
> >       version
> >       - Try out few options mentioned in release notes (especially,
> >       --auto-invalidation=no) and see if it helps performance.
> >    -
> >
> >    While doing all the above, check below:
> >    - see if the log files are making sense, and not flooding with some
> >       “for developer only” type of messages.
> >       - get ‘profile info’ output from old and now, and see if
> there is
> >       anything which is out of normal expectation. Check with us
> on the numbers.
> >       - get a ‘statedump’ when there are some issues. Try to make
> sense
> >       of it, and raise a bug if you don’t understand it completely.
> >
> >
> > Process expected on test days.
> >
> >    -
> >
> >    We have a tracker bug [0]
> >    - We will attach all the ‘blocker’ bugs to this bug.
> >    -
> >
> >    Use this link to report bugs, so that we have more metadata around
> >    given bugzilla.
> >    - Click Here [1]
> >    -
> >
> >    The test cases which are to be tested are listed here in this sheet [2],
> >    please add, update, and keep it up-to-date to reduce duplicate
> >    efforts
> 
> -- 
> - Atin (atinm)
> 
> ___
> Gluster-devel mailing list

Re: [Gluster-devel] [Gluster-Maintainers] glusterfs-6.0rc0 released

2019-02-25 Thread Shyam Ranganathan
Hi,

Release-6 RC0 packages are built (see mail below). This is a good time
to start testing the release bits, and reporting any issues on bugzilla.
Do post on the lists any testing done and feedback from the same.

We have about 2 weeks to GA of release-6 barring any major blockers
uncovered during the test phase. Please take this time to help make the
release effective, by testing the same.

Thanks,
Shyam

NOTE: CentOS StorageSIG packages for the same are still pending and
should be available in due course.
On 2/23/19 9:41 AM, Kaleb Keithley wrote:
> 
> GlusterFS 6.0rc0 is built in Fedora 30 and Fedora 31/rawhide.
> 
> Packages for Fedora 29, RHEL 8, RHEL 7, and RHEL 6* and Debian 9/stretch
> and Debian 10/buster are at
> https://download.gluster.org/pub/gluster/glusterfs/qa-releases/6.0rc0/
> 
> Packages are signed. The public key is at
> https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> 
> * RHEL 6 is client-side only. Fedora 29, RHEL 7, and RHEL 6 RPMs are
> Fedora Koji scratch builds. RHEL 7 and RHEL 6 RPMs are provided here for
> convenience only, and are independent of the RPMs in the CentOS Storage SIG.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Branched and next steps

2019-02-20 Thread Shyam Ranganathan
On 2/20/19 7:45 AM, Amar Tumballi Suryanarayan wrote:
> 
> 
> On Tue, Feb 19, 2019 at 1:37 AM Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> In preparation for RC0 I have put up an initial patch for the release
> notes [1]. Request the following actions on the same (either a followup
> patchset, or a dependent one),
> 
> - Please review!
> - Required GD2 section updated to latest GD2 status
> 
> 
> I am inclined to drop the GD2 section for 'standalone' users. As the
> team worked with goals of making GD2 invisible with containers (GCS) in
> mind. So, should we call out any features of GD2 at all?

This is fine; we possibly need to add a note in the release notes on
the GD2 future and where it would land, so that we can inform users
about the continued use of GD1 in non-GCS use cases.

I will add some text around this in the release notes.

> 
> Anyways, as per my previous email on GCS release updates, we are
> planning to have a container available with gd2 and glusterfs, which can
> be used by people who are trying out options with GD2.
>  
> 
> - Require notes on "Reduce the number of threads used in the brick
> process" and the actual status of the same in the notes
> 
> 
> This work is still in progress, and we are treating it as a bug fix for
> the 'brick-multiplex' usecase, which is mainly required for the scaled
> volume-count usecase in the container world. My guess is, we won't have much
> content to add for glusterfs-6.0 at the moment.

Ack!

>  
> 
> RC0 build target would be tomorrow or by Wednesday.
> 
> 
> Thanks, I was testing a few upgrade and different-version cluster
> scenarios. With 4.1.6 and the latest release-6.0 branch, things work fine. I
> haven't done much load testing yet.

Awesome! That helps with writing the upgrade guide as well, as this time
the content there would/should cover how to upgrade if any of the
deprecated xlators are in use by a deployment.

> 
> Requesting people to support in upgrade testing. From different volume
> options, and different usecase scenarios.
> 
> Regards,
> Amar
> 
>  
> 
> Thanks,
> Shyam
> 
> [1] Release notes patch: https://review.gluster.org/c/glusterfs/+/6
> 
> On 2/5/19 8:25 PM, Shyam Ranganathan wrote:
> > Hi,
> >
> > Release 6 is branched, and tracker bug for 6.0 is created [1].
> >
> > Do mark blockers for the release against [1].
> >
> > As of now we are only tracking [2] "core: implement a global
> thread pool
> > " for a backport as a feature into the release.
> >
> > We expect to create RC0 tag and builds for upgrade and other testing
> > close to mid-week next week (around 13th Feb), and the release is
> slated
> > for the first week of March for GA.
> >
> > I will post updates to this thread around release notes and other
> > related activity.
> >
> > Thanks,
> > Shyam
> >
> > [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
> >
> > [2] Patches tracked for a backport:
> >   - https://review.gluster.org/c/glusterfs/+/20636
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> 
> -- 
> Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6: Branched and next steps

2019-02-18 Thread Shyam Ranganathan
In preparation for RC0 I have put up an initial patch for the release
notes [1]. Request the following actions on the same (either a followup
patchset, or a dependent one),

- Please review!
- Required GD2 section updated to latest GD2 status
- Require notes on "Reduce the number of threads used in the brick
process" and the actual status of the same in the notes

RC0 build target would be tomorrow or by Wednesday.

Thanks,
Shyam

[1] Release notes patch: https://review.gluster.org/c/glusterfs/+/6

On 2/5/19 8:25 PM, Shyam Ranganathan wrote:
> Hi,
> 
> Release 6 is branched, and tracker bug for 6.0 is created [1].
> 
> Do mark blockers for the release against [1].
> 
> As of now we are only tracking [2] "core: implement a global thread pool
> " for a backport as a feature into the release.
> 
> We expect to create RC0 tag and builds for upgrade and other testing
> close to mid-week next week (around 13th Feb), and the release is slated
> for the first week of March for GA.
> 
> I will post updates to this thread around release notes and other
> related activity.
> 
> Thanks,
> Shyam
> 
> [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
> 
> [2] Patches tracked for a backport:
>   - https://review.gluster.org/c/glusterfs/+/20636
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Branched and next steps

2019-02-05 Thread Shyam Ranganathan
Hi,

Release 6 is branched, and tracker bug for 6.0 is created [1].

Do mark blockers for the release against [1].

As of now we are only tracking [2] "core: implement a global thread pool
" for a backport as a feature into the release.

We expect to create RC0 tag and builds for upgrade and other testing
close to mid-week next week (around 13th Feb), and the release is slated
for the first week of March for GA.

I will post updates to this thread around release notes and other
related activity.

Thanks,
Shyam

[1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0

[2] Patches tracked for a backport:
  - https://review.gluster.org/c/glusterfs/+/20636
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-24 Thread Shyam Ranganathan
On 1/24/19 3:23 AM, Soumya Koduri wrote:
> Hi Shyam,
> 
> Sorry for the late response. I just realized that two more new
> APIs, glfs_setattr/fsetattr, which use 'struct stat', were made public [1]. As
> mentioned in one of the patchset review comments, since the goal is to
> move to glfs_stat in release-6, do we need to update these APIs as well
> to use the new struct? Or shall we retain them in FUTURE for now and
> address them in the next minor release? Please suggest.

So the goal in 6 is to not return stat but glfs_stat in the modified
pre/post stat return APIs (instead of making this a 2-step for
application consumers).

To reach glfs_stat everywhere, we have a few more things to do. I had
this patch on my radar, but just like pub_glfs_stat returns stat (hence
we made glfs_statx private), I am seeing this as "fine for now". In
the future we only want to return glfs_stat.

So for now, we let this API be. The next round of converting stat to
glfs_stat would take into account clearing up all such instances, so
that all application consumers will need to modify code as required in
one shot.

Does this answer the concern? And thanks for bringing this to notice.
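
For illustration, a minimal libgfapi sketch of the behaviour being kept
for now: glfs_stat() continues to fill a plain struct stat, and the
wholesale move of such APIs to the new glfs_stat structure is deferred
to a later, single-shot change. The volume name, server and path below
are placeholders, not taken from this thread.

#include <stdio.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    /* Placeholders: volume "testvol" served from host "server1". */
    glfs_t *fs = glfs_new("testvol");
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "server1", 24007);
    if (glfs_init(fs) != 0) {
        glfs_fini(fs);
        return 1;
    }

    struct stat st; /* today's public out-arg type, as noted above */
    if (glfs_stat(fs, "/somefile", &st) == 0)
        printf("size: %lld\n", (long long)st.st_size);

    glfs_fini(fs);
    return 0;
}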

> 
> Thanks,
> Soumya
> 
> [1] https://review.gluster.org/#/c/glusterfs/+/21734/
> 
> 
> On 1/23/19 8:43 PM, Shyam Ranganathan wrote:
>> On 1/23/19 6:03 AM, Ashish Pandey wrote:
>>>
>>> Following is the patch I am working and targeting -
>>> https://review.gluster.org/#/c/glusterfs/+/21933/
>>
>> This is a bug fix, and the patch size at the moment is also small in
>> lines changed. Hence, even if it misses branching the fix can be
>> backported.
>>
>> Thanks for the heads up!
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-23 Thread Shyam Ranganathan
On 1/23/19 6:03 AM, Ashish Pandey wrote:
> 
> Following is the patch I am working and targeting - 
> https://review.gluster.org/#/c/glusterfs/+/21933/

This is a bug fix, and the patch size at the moment is also small in
lines changed. Hence, even if it misses branching the fix can be backported.

Thanks for the heads up!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6: Kick off!

2019-01-23 Thread Shyam Ranganathan
On 1/23/19 5:52 AM, RAFI KC wrote:
> There are three patches that I'm working for Gluster-6.
> 
> [1] : https://review.gluster.org/#/c/glusterfs/+/22075/

We discussed mux for shd in the maintainers meeting, and decided that
this would be for the next release, as the patchset is not ready
(branching is today, if I get the time to get it done).

> 
> [2] : https://review.gluster.org/#/c/glusterfs/+/21333/

Ack! in case this is not in by branching we can backport the same

> 
> [3] : https://review.gluster.org/#/c/glusterfs/+/21720/

Bug fix, can be backported post branching as well, so again ack!

Thanks for responding.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-18 Thread Shyam Ranganathan
On 12/6/18 9:34 AM, Shyam Ranganathan wrote:
> On 11/6/18 11:34 AM, Shyam Ranganathan wrote:
>> ## Schedule
> 
> We have decided to postpone release-6 by a month, to accommodate
> late enhancements and the drive towards getting what is required for the
> GCS project [1] done in core glusterfs.
> 
> This puts the (modified) schedule for Release-6 as below,
> 
> Working backwards on the schedule, here's what we have:
> - Announcement: Week of Mar 4th, 2019
> - GA tagging: Mar-01-2019
> - RC1: On demand before GA
> - RC0: Feb-04-2019
> - Late features cut-off: Week of Jan-21st, 2019
> - Branching (feature cutoff date): Jan-14-2019
>   (~45 days prior to branching)

We are slightly past the branching date. I would like to branch early
next week, so please respond with a list of patches that need to be part
of the release and are still pending a merge; this will help focus
review attention on them and also help track them down before branching
the release.

Thanks, Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression health for release-5.next and release-6

2019-01-11 Thread Shyam Ranganathan
We can check health on master after the patch stated by Mohit below is merged.

Release-5 is causing some concerns as we needed to tag the release
yesterday, but we have the following 2 tests failing or dumping core
pretty regularly; these need attention.

ec/bug-1236065.t
glusterd/add-brick-and-validate-replicated-volume-options.t

Shyam
On 1/10/19 6:20 AM, Mohit Agrawal wrote:
> I think we should consider regression builds after merging the patch
> (https://review.gluster.org/#/c/glusterfs/+/21990/),
> as we know this patch introduced some delay.
> 
> Thanks,
> Mohit Agrawal
> 
> On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <amukh...@redhat.com> wrote:
> 
> Mohit, Sanju - request you to investigate the failures related to
> glusterd and brick-mux and report back to the list.
> 
> On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan
> <srang...@redhat.com> wrote:
> 
> Hi,
> 
> As part of branching preparation next week for release-6, please
> find
> test failures and respective test links here [1].
> 
> The top tests that are failing/dumping-core are as below and
> need attention,
> - ec/bug-1236065.t
> - glusterd/add-brick-and-validate-replicated-volume-options.t
> - readdir-ahead/bug-1390050.t
> - glusterd/brick-mux-validation.t
> - bug-1432542-mpx-restart-crash.t
> 
> Others of interest,
> - replicate/bug-1341650.t
> 
> Please file a bug if needed against the test case and report the
> same
> here, in case a problem is already addressed, then do send back the
> patch details that addresses this issue as a response to this mail.
> 
> Thanks,
> Shyam
> 
> [1] Regression failures:
> https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Regression health for release-5.next and release-6

2019-01-09 Thread Shyam Ranganathan
Hi,

As part of branching preparation next week for release-6, please find
test failures and respective test links here [1].

The top tests that are failing/dumping-core are as below and need attention,
- ec/bug-1236065.t
- glusterd/add-brick-and-validate-replicated-volume-options.t
- readdir-ahead/bug-1390050.t
- glusterd/brick-mux-validation.t
- bug-1432542-mpx-restart-crash.t

Others of interest,
- replicate/bug-1341650.t

Please file a bug if needed against the test case and report the same
here, in case a problem is already addressed, then do send back the
patch details that addresses this issue as a response to this mail.

Thanks,
Shyam

[1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/

2019-01-08 Thread Shyam Ranganathan
On 1/8/19 8:33 AM, Nithya Balachandran wrote:
> Shyam, what is your take on this?
> An upstream user has tried it out and reported that it seems to fix the
> issue; however, CPU utilization doubles.

We usually do not backport big fixes unless they are critical. My first
answer would be, can't this wait for rel-6 which is up next?

The change has gone through a good review overall, so from a review
thoroughness perspective it looks good.

The change has a test case to ensure that the limits are honored, so
again a plus.

Also, it is a switch, so in the worst case moving back to unlimited
should be possible with little adverse effects in case the fix has issues.

It hence comes down to how confident we are that the change is not
disruptive to an existing branch. If we can answer this with reasonable
confidence, we can backport it and release it with the next 5.x update
release.

> 
> Regards,
> Nithya
> 
> On Fri, 28 Dec 2018 at 09:17, Amar Tumballi <atumb...@redhat.com> wrote:
> 
> I feel its good to backport considering glusterfs-6.0 is another 2
> months away.
> 
> On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran
> <nbala...@redhat.com> wrote:
> 
> Hi,
> 
> Can we backport this to release-5 ? We have several reports of
> high memory usage in fuse clients from users and this is likely
> to help.
> 
> Regards,
> Nithya
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> -- 
> Amar Tumballi (amarts)
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster release 5.2

2018-12-13 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
5.2 (packages available at [1]).

Release notes can be found at [2].

Major changes, features and limitations addressed in this release:

- Several bugs as listed in the release notes have been addressed

Thanks,
Gluster community

[1] Packages for 5.2:
https://download.gluster.org/pub/gluster/glusterfs/5/5.2/
(CentOS storage SIG packages may arrive on Monday (17th Dec-2018) or
later as per the CentOS schedules)

[2] Release notes for 5.2:
https://docs.gluster.org/en/latest/release-notes/5.2/
OR,
https://github.com/gluster/glusterfs/blob/release-5/doc/release-notes/5.2.md
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2018-12-06 Thread Shyam Ranganathan
On 11/6/18 11:34 AM, Shyam Ranganathan wrote:
> ## Schedule

We have decided to postpone release-6 by a month, to accommodate
late enhancements and the drive towards getting what is required for the
GCS project [1] done in core glusterfs.

This puts the (modified) schedule for Release-6 as below,

Working backwards on the schedule, here's what we have:
- Announcement: Week of Mar 4th, 2019
- GA tagging: Mar-01-2019
- RC1: On demand before GA
- RC0: Feb-04-2019
- Late features cut-off: Week of Jan-21st, 2019
- Branching (feature cutoff date): Jan-14-2019
  (~45 days prior to branching)
- Feature/scope proposal for the release (end date): *Dec-12-2018*

So the first date is the feature/scope proposal end date, which is next
week. Please send in enhancements that you are working on that will meet
the above schedule, so we can track them and better ensure they get in on time.

> 
> ## Volunteers
> This is my usual call for volunteers to run the release with me or
> otherwise; please do consider it. We need more hands this time, and
> possibly some time sharing towards the end of the year owing to the holidays.

Also, taking this opportunity to call for volunteers to run the release
again. Anyone interested please do respond.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Patches in merge conflict

2018-12-05 Thread Shyam Ranganathan
Due to the merge of https://review.gluster.org/c/glusterfs/+/21746, which
changes a whole lot of files to use the new path for libglusterfs header
includes, a lot of patches are in merge conflict.

If you notice that your patch is one of them, please rebase it to the tip of
master, using the Gerrit UI, or manually if that fails.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Geo-rep tests failing on master Cent7-regressions

2018-12-04 Thread Shyam Ranganathan
Hi Kotresh,

Multiple geo-rep tests are failing on master on various patch regressions.

Looks like you have put in
https://review.gluster.org/c/glusterfs/+/21794 for review, to address
the issue at present.

Would that be correct?

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Problem: Include libglusterfs files in gfapi headers

2018-11-21 Thread Shyam Ranganathan
In the commit [1] that introduces the statx structure as an out arg from
glfs APIs, there is a need to provide a compatible header for statx
when the base distribution does not yet support statx (say CentOS 7).

The header is provided as libglusterfs/src/compat-statx.h and is
packaged with the glusterfs-devel RPM (as are other libglusterfs
headers, and the api-devel package depends on this package, so all this
is fine so far).

The issue at hand is that the inclusion of the new header [2] is done
using the user-specified format for header inclusion (i.e.
"compat-statx.h") [3], whereas it should really be a system header file
that comes in with the glusterfs-devel package.

When included as <glusterfs/compat-statx.h> instead of the current
"compat-statx.h" though, the compilation fails, as there is no directory
named glusterfs, within the paths added to the search path during
compilation, that contains this header.
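
For illustration, the two inclusion forms being discussed (a sketch only;
it assumes the in-tree build or glusterfs-devel provides compat-statx.h
at the respective locations):

/* Current form: a user header, searched relative to the including file
 * and the -I paths. */
#include "compat-statx.h"

/* Desired form: a system header, resolved against the compiler's include
 * directories, which requires a glusterfs/ directory on the search path
 * that contains the header. */
#include <glusterfs/compat-statx.h>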

For solutions, I tried adding a symlink within libglusterfs/src/ named
glusterfs that points to src, thus providing the directory under which
compat-statx.h can be found. This works when compiling the code, but not
when building packages, as the symlink does not transfer (or I did not
write enough code to make that happen). In reality, I do not like this
solution enough to really adopt it.

This mail is to solicit inputs on how we can solve the compile-time and
packaging build-time dependency, and retain the inclusion as a system
header rather than the user header it currently is.

My thought is as follows:
- Create the same structure that the packaging lays out for the headers on
a system and move the headers in there, thus having a cleaner build and
package than hacks like the one above.
  - IOW, create glusterfs directory under libglusterfs/src and move
relevant headers that are included in the packaging in there, and
similarly move headers in api/src to a directory like
api/src/glusterfs/api/ such that when appropriate search paths are
provided these can be included in the right manner as system headers and
not user headers.

This work can also help xlator development outside the tree, and also
help with providing a pkgconfig for glusterfs-devel IMO.

Comments and other thoughts?

Shyam

[1] Commit introducing statx: https://review.gluster.org/c/glusterfs/+/19802

[2] Inclusion of the header:
https://review.gluster.org/c/glusterfs/+/19802/9/api/src/glfs.h#54

[3] Include syntax from gcc docs:
https://gcc.gnu.org/onlinedocs/cpp/Include-Syntax.html
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 4.1.7 & 5.2

2018-11-14 Thread Shyam Ranganathan
Hi,

As 4.1.6 and 5.1 are now tagged and off to packaging, announcing the
tracker and dates for the next minor versions of these stable releases.

4.1.7:
- Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-4.1.7
- Deadline for fixes: 2019-01-21

5.2:
- Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.2
- Deadline for fixes: 2018-12-10

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failure: https://build.gluster.org/job/centos7-regression/3678/

2018-11-14 Thread Shyam Ranganathan
On 11/14/2018 10:04 AM, Nithya Balachandran wrote:
> Hi Mohit,
> 
> The regression run in the subject has failed because a brick has crashed in 
> 
> bug-1432542-mpx-restart-crash.t
> 
> 
> *06:03:38* 1 test(s) generated core 
> *06:03:38* ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> *06:03:38*
> 
> 
> The brick process has crashed in posix_fs_health_check as  this->priv
> contains garbage. It looks like it might have been freed already. Can
> you take a look at it?

Sounds like another incarnation of:
https://bugzilla.redhat.com/show_bug.cgi?id=1636570

@mohit, any further clues?

> 
> 
> 
> (gdb) bt
> #0  0x7f4019ea1f19 in vfprintf () from ./lib64/libc.so.6
> #1  0x7f4019eccf49 in vsnprintf () from ./lib64/libc.so.6
> #2  0x7f401b87705a in gf_vasprintf (string_ptr=0x7f3e81ff99f0,
> format=0x7f400df32f40 "op=%s;path=%s;error=%s;brick=%s:%s timeout is
> %d", arg=0x7f3e81ff99f8)
>     at
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/mem-pool.c:234
> #3  0x7f401b8de6e2 in _gf_event
> (event=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=0x7f400df32f40
> "op=%s;path=%s;error=%s;brick=%s:%s timeout is %d")
>     at
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/events.c:89
> #4  0x7f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
> #5  0x7f400def0926 in posix_health_check_thread_proc
> (data=0x7f3fd78b7840)
>     at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:2005
> #6  0x7f401a68ae25 in start_thread () from ./lib64/libpthread.so.0
> #7  0x7f4019f53bad in clone () from ./lib64/libc.so.6
> (gdb) f 4
> #4  0x7f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
> 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
> (gdb) l
> 1955        sys_close(fd);
> 1956    }
> 1957    if (ret && file_path[0]) {
> 1958        gf_msg(this->name, GF_LOG_WARNING, errno,
> P_MSG_HEALTHCHECK_FAILED,
> 1959               "%s() on %s returned", op, file_path);
> 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
> 1961                 "op=%s;path=%s;error=%s;brick=%s:%s timeout is %d", op,
> 1962                 file_path, strerror(op_errno), priv->hostname,
> priv->base_path,
> 1963                 timeout);
> 1964    }
> (gdb) p pri->hostname
> No symbol "pri" in current context.
> *(gdb) p priv->hostname*
> *$14 = 0xa200*
> *(gdb) p priv->base_path*
> *$15 = 0x7f3ddeadc0de00*
> (gdb) 
> 
> 
> 
> Thanks,
> Nithya
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gluster Weekly Report : Static Analyser

2018-11-07 Thread Shyam Ranganathan
On 11/06/2018 02:08 PM, Shyam Ranganathan wrote:
> Hi,
> 
> I was attempting to fix a class of "Insecure data handling" defects in
> coverity around GF_FREE accessing tainted strings. Below is a short
> writeup of the same (pasted into the notes for each issue as well).
> Notifying the list of the same.
> 
> (attempted annotation) Fix: https://review.gluster.org/c/glusterfs/+/21422

Posted a new patch after using another system to check various coverity
runs and annotations. This one works, and once merged should auto-ignore
this pattern of issues. https://review.gluster.org/c/glusterfs/+/21584

> 
> The fix was to annotate the pointer coming into GF_FREE (or really
> __gf_free) as not tainted, based on the reasoning below. This coverity
> annotation is applied incorrectly in the code, as we need to annotate a
> function that on exit marks the string as taint free. IOW, see
> https://community.synopsys.com/s/article/From-Case-Clearing-TAINTED-STRING
> 
> On attempting to write such alternative functions and testing with an in
> house coverity run, the taint was still not cleared. As a result, I am
> marking this/these issues as "False positive"+"Ignore".
> 
> The reason to treat this as a false positive is as follows,
> - The allocation function returns a pointer past the header, where the
> actual usage starts
> - The free function accesses the header information to check if the
> trailer is overwritten to detect memory region overwrites
> - When these pointers are used for IO with external sources the entire
> pointer is tainted
> 
> As we are detecting a similar corruption, using the region before the
> returned pointer (and some after), and not checking regions that were
> passed to the respective external IO sources, the regions need not be
> sanitized before accessing the same. As a result, these instances are
> marked as false positives
> 
> An older thread discussing this from Xavi can be found here:
> https://lists.gluster.org/pipermail/gluster-devel/2014-December/043314.html
> 
> Shyam
> On 11/02/2018 01:07 PM, Sunny Kumar wrote:
>> Hello folks,
>>
>> The current status of static analyser is below:
>>
>> Coverity scan status:
>> Last week we started from 135 and now its 116 (2nd Nov scan)
>> Contributors - Sunny (1 patch containing 7 fixes) and
>> Varsha (1 patch containing 1 fix).
>>
>> As you all are aware we are marking few features as deprecated in gluster 
>> [1].
>> Few coverity defects eliminated due to this activity. (from tier and stripe)
>> [1]. https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
>>
>> Clang-scan status:
>> Last week we started from 90 and today its 84 (build #503).
>> Contributors- Harpreet (2 patches), Shwetha and Amar(1 patch each).
>>
>> If you want to contribute in fixing coverity and clang-scan fixes
>> please follow these instruction:
>> * for coverity scan fixes:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055155.html
>>  * for clang-scan:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055338.html
>>
>>
>> Regards,
>> Sunny kumar
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Weekly Report : Static Analyser

2018-11-06 Thread Shyam Ranganathan
Hi,

I was attempting to fix a class of "Insecure data handling" defects in
coverity around GF_FREE accessing tainted strings. Below is a short
writeup of the same (pasted into the notes for each issue as well).
Notifying the list of the same.

(attempted annotation) Fix: https://review.gluster.org/c/glusterfs/+/21422

The fix was to annotate the pointer coming into GF_FREE (or really
__gf_free) as not tainted, based on the reasoning below. This coverity
annotation is applied incorrectly in the code, as we need to annotate a
function that on exit marks the string as taint free. IOW, see
https://community.synopsys.com/s/article/From-Case-Clearing-TAINTED-STRING

On attempting to write such alternative functions and testing with an in
house coverity run, the taint was still not cleared. As a result, I am
marking this/these issues as "False positive"+"Ignore".

The reason to treat this as a false positive is as follows,
- The allocation function returns a pointer past the header, where the
actual usage starts
- The free function accesses the header information to check if the
trailer is overwritten to detect memory region overwrites
- When these pointers are used for IO with external sources the entire
pointer is tainted

As we are detecting a similar corruption, using the region before the
returned pointer (and some after), and not checking regions that were
passed to the respective external IO sources, the regions need not be
sanitized before accessing the same. As a result, these instances are
marked as false positives
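
For illustration, a minimal sketch of the allocation pattern described
above (not the actual __gf_malloc/__gf_free code; the function names and
trailer magic are invented for the example): the caller-visible pointer
sits just past a bookkeeping header, and free() only inspects that header
and the trailer behind the usable region, never the caller's (possibly
tainted) bytes.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TRAILER_MAGIC 0xBAADF00Du /* invented overrun-detection pattern */

struct alloc_header {
    size_t size; /* usable size requested by the caller */
};

/* Returns a pointer just past the header; a trailer follows the usable
 * region. */
static void *sketch_malloc(size_t size)
{
    unsigned int magic = TRAILER_MAGIC;
    struct alloc_header *hdr = malloc(sizeof(*hdr) + size + sizeof(magic));

    if (!hdr)
        return NULL;
    hdr->size = size;
    memcpy((char *)(hdr + 1) + size, &magic, sizeof(magic));
    return hdr + 1; /* the caller (and any external IO) sees only this region */
}

/* Walks back to the header and validates the trailer to detect overwrites;
 * the caller-visible bytes themselves are never interpreted here. */
static void sketch_free(void *ptr)
{
    struct alloc_header *hdr;
    unsigned int magic;

    if (!ptr)
        return;
    hdr = (struct alloc_header *)ptr - 1;
    memcpy(&magic, (char *)ptr + hdr->size, sizeof(magic));
    assert(magic == TRAILER_MAGIC); /* trailer corruption check */
    free(hdr);
}

int main(void)
{
    char *buf = sketch_malloc(16);

    if (buf) {
        memcpy(buf, "data from an IO", 16); /* possibly tainted content */
        sketch_free(buf);
    }
    return 0;
}

Only the header/trailer bookkeeping is read on free, which is why
sanitizing the tainted caller-visible region before GF_FREE is not
required, and the defects are treated as false positives.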

An older thread discussing this from Xavi can be found here:
https://lists.gluster.org/pipermail/gluster-devel/2014-December/043314.html

Shyam
On 11/02/2018 01:07 PM, Sunny Kumar wrote:
> Hello folks,
> 
> The current status of static analyser is below:
> 
> Coverity scan status:
> Last week we started from 135 and now its 116 (2nd Nov scan)
> Contributors - Sunny (1 patch containing 7 fixes) and
> Varsha (1 patch containing 1 fix).
> 
> As you all are aware we are marking few features as deprecated in gluster [1].
> Few coverity defects eliminated due to this activity. (from tier and stripe)
> [1]. https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
> 
> Clang-scan status:
> Last week we started from 90 and today its 84 (build #503).
> Contributors- Harpreet (2 patches), Shwetha and Amar(1 patch each).
> 
> If you want to contribute in fixing coverity and clang-scan fixes
> please follow these instruction:
> * for coverity scan fixes:
> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055155.html
>  * for clang-scan:
> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055338.html
> 
> 
> Regards,
> Sunny kumar
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Kick off!

2018-11-06 Thread Shyam Ranganathan
Hi,

With release-5 out the door, it is time to start some activities for
release-6.

## Scope
It is time to collect and determine the scope for the release, so as
usual, please send to the devel list the features/enhancements that you
are working towards reaching maturity for this release, and mark/open the
github issue with the required milestone [1].

At a broader scale, in the maintainers meeting we discussed the
enhancement wish list as in [2].

Other than the above, we are continuing with our quality focus and would
want to see a downward trend (or near-zero) in the following areas,
- Coverity
- clang
- ASAN

We would also like to tighten our nightly testing health, and would
ideally not want tests that retry and pass only on the second attempt in
the testing runs. Towards this, we will send reports of retried and
failed tests that need attention and fixes as required.

## Schedule
NOTE: Schedule is going to get heavily impacted due to end of the year
holidays, but we will try to keep it up as much as possible.

Working backwards on the schedule, here's what we have:
- Announcement: Week of Feb 4th, 2019
- GA tagging: Feb-01-2019
- RC1: On demand before GA
- RC0: Jan-02-2019
- Late features cut-off: Week of Dec-24th, 2018
- Branching (feature cutoff date): Dec-17-2018
  (~45 days prior to branching)
- Feature/scope proposal for the release (end date): Nov-21-2018

## Volunteers
This is my usual call for volunteers to run the release with me or
otherwise; please do consider it. We need more hands this time, and
possibly some time sharing towards the end of the year owing to the holidays.

Thanks,
Shyam

[1] Release-6 github milestone:
https://github.com/gluster/glusterfs/milestone/8

[2] Release-6 enhancement wishlist:
https://hackmd.io/sP5GsZ-uQpqnmGZmFKuWIg#
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Consolidating Feature Requests in github

2018-11-05 Thread Shyam Ranganathan
On 11/05/2018 08:29 AM, Vijay Bellur wrote:
> Hi All,
> 
> I am triaging the open RFEs in bugzilla [1]. Since our new(er) workflow
> involves managing RFEs as github issues, I am considering migrating
> relevant open RFEs from bugzilla to github. Once migrated, a RFE in
> bugzilla would be closed with an appropriate comment. I can also update
> the external tracker to point to the respective github issue. Once the
> migration is done, all our feature requests can be further triaged and
> tracked in github.
> 
> Any objections to doing this?

None from me, I see this as needed and the way forward.

The only thing to consider, maybe, is how we treat bugs/questions filed
via github and whether we want those moved out to bugzilla (during
regular triage of github issues) or not. IOW, what happens in the reverse
direction, from github to bugzilla.

> 
> Thanks,
> Vijay
> 
> [1] https://goo.gl/7fsgTs
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA Tagged and release tarball generated

2018-10-18 Thread Shyam Ranganathan
GA tagging done and release tarball is generated.

5.1 release tracker is now open for blockers against the same:
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.1

5.x minor releases are set to go out on the 10th of every month, jFYI
(the release schedule page on the website has been updated with the same).

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA tomorrow!

2018-10-17 Thread Shyam Ranganathan
On 10/15/2018 02:29 PM, Shyam Ranganathan wrote:
> On 10/11/2018 11:25 AM, Shyam Ranganathan wrote:
>> So we are through with a series of checks and tasks on release-5 (like
>> ensuring all backports to other branches are present in 5, upgrade
>> testing, basic performance testing, Package testing, etc.), but still
>> need the following resolved else we stand to delay the release GA
>> tagging, which I hope to get done over the weekend or by Monday 15th
>> morning (EDT).
>>
>> 1) Fix for libgfapi-python related blocker on Gluster:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1630804
>>
>> @ppai, who needs to look into this?
> 
> Du has looked into this, but resolution is still pending, and release
> still awaiting on this being a blocker.

Fix is backported and awaiting regression scores, before we merge and
make a release (tomorrow!).

@Kaushal, if we tag GA tomorrow EDT, would it be possible to tag GD2
today, for the packaging team to pick the same up?

> 
>>
>> 2) Release notes for options added to the code (see:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )
>>
>> @du, @krutika can we get some text for the options referred in the mail
>> above?
> 
> Inputs received and release notes updated:
> https://review.gluster.org/c/glusterfs/+/21421

Last chance to add review comments to the release notes!

> 
>>
>> 3) Python3 testing
>> - Heard back from Kotresh on geo-rep passing and saw that we have
>> handled cliutils issues
>> - Anything more to cover? (@aravinda, @kotresh, @ppai?)
>> - We are attempting to get a regression run on a Python3 platform, but
>> that maybe a little ways away from the release (see:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )
>>
>> Request attention to the above, to ensure we are not breaking things
>> with the release.
>>
>> Thanks,
>> Shyam
>> ___
>> maintainers mailing list
>> maintain...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/maintainers
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Announcing Glusterfs release 3.12.15 (Long Term Maintenance)

2018-10-17 Thread Shyam Ranganathan
On 10/17/2018 07:08 AM, Paolo Margara wrote:
> Hi,
> 
> this release will be the last of the 3.12.x branch prior it reach the EOL?

Yes, that is true. This would be the last minor release, as release-5
comes out.

> 
> 
> Greetings,
> 
>     Paolo
> 
>> On 16/10/18 17:41, Jiffin Tony Thottan wrote:
>>
>> The Gluster community is pleased to announce the release of Gluster
>> 3.12.15 (packages available at [1,2,3]).
>>
>> Release notes for the release can be found at [4].
>>
>> Thanks,
>> Gluster community
>>
>>
>> [1] https://download.gluster.org/pub/gluster/glusterfs/3.12/3.12.15/
>> [2] https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
>> 
>> [3] https://build.opensuse.org/project/subprojects/home:glusterfs
>> [4] Release notes:
>> https://gluster.readthedocs.io/en/latest/release-notes/3.12.15/
>>
>>
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA and what are we waiting on

2018-10-15 Thread Shyam Ranganathan
On 10/11/2018 11:25 AM, Shyam Ranganathan wrote:
> So we are through with a series of checks and tasks on release-5 (like
> ensuring all backports to other branches are present in 5, upgrade
> testing, basic performance testing, Package testing, etc.), but still
> need the following resolved else we stand to delay the release GA
> tagging, which I hope to get done over the weekend or by Monday 15th
> morning (EDT).
> 
> 1) Fix for libgfapi-python related blocker on Gluster:
> https://bugzilla.redhat.com/show_bug.cgi?id=1630804
> 
> @ppai, who needs to look into this?

Du has looked into this, but resolution is still pending, and release
still awaiting on this being a blocker.

> 
> 2) Release notes for options added to the code (see:
> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )
> 
> @du, @krutika can we get some text for the options referred in the mail
> above?

Inputs received and release notes updated:
https://review.gluster.org/c/glusterfs/+/21421

> 
> 3) Python3 testing
> - Heard back from Kotresh on geo-rep passing and saw that we have
> handled cliutils issues
> - Anything more to cover? (@aravinda, @kotresh, @ppai?)
> - We are attempting to get a regression run on a Python3 platform, but
> that maybe a little ways away from the release (see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )
> 
> Request attention to the above, to ensure we are not breaking things
> with the release.
> 
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Maintainer meeting minutes : 15th Oct, 2018

2018-10-15 Thread Shyam Ranganathan
### BJ Link
* Bridge: https://bluejeans.com/217609845
* Watch: 

### Attendance
* Nigel, Nithya, Deepshikha, Akarsha, Kaleb, Shyam, Sunny

### Agenda
* AI from previous meeting:
  - Glusto-Test completion on release-5 branch - On Glusto team
  - Vijay will take this on.
  - He will be focusing on it next week.
  - Glusto for 5 may not be happening before the release, but it looks
like we'll do it right after the release.

- Release 6 Scope
- Will be sending out an email today/tomorrow for scope of release 6.
- Send a biweekly email with focus on glusterfs release focus areas.

- Fold GCS scope into release-6 scope and get issues marked against the same
- For release-6 we want a thinner stack. This means we'd be removing
xlators from the code that Amar has already sent an email about.
- Locking support for gluster-block. Design still WIP. One of the
big ticket items that should make it to release 6. Includes reflink
support and enough locking support to ensure snapshots are consistent.
- GD1 vs GD2. We've been talking about it since release-4.0. We need
to call this out and understand if we will have GD2 as default. This is
a call out for a plan for when we want to make this transition.

- Round Table
- [Nigel] Minimum build and CI health for all projects (including
sub-projects).
- This was primarily driven for GCS
- But, we need this even otherwise to sustain quality of projects
- AI: Call out on lists around release 6 scope, with a possible
list of sub-projects
- [Kaleb] SELinux package status
- Waiting on testing to understand if this is done right
- Can be released when required, as it is a separate package
- Release-5 the SELinux policies are with Fedora packages
- Need to coordinate with Fedora release, as content is in 2
packages
- AI: Nigel to follow up and get updates by the next meeting

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: GA and what are we waiting on

2018-10-11 Thread Shyam Ranganathan
So we are through with a series of checks and tasks on release-5 (like
ensuring all backports to other branches are present in 5, upgrade
testing, basic performance testing, Package testing, etc.), but still
need the following resolved else we stand to delay the release GA
tagging, which I hope to get done over the weekend or by Monday 15th
morning (EDT).

1) Fix for libgfapi-python related blocker on Gluster:
https://bugzilla.redhat.com/show_bug.cgi?id=1630804

@ppai, who needs to look into this?

2) Release notes for options added to the code (see:
https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )

@du, @krutika can we get some text for the options referred in the mail
above?

3) Python3 testing
- Heard back from Kotresh on geo-rep passing and saw that we have
handled cliutils issues
- Anything more to cover? (@aravinda, @kotresh, @ppai?)
- We are attempting to get a regression run on a Python3 platform, but
that may be a little ways away from the release (see:
https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )

Request attention to the above, to ensure we are not breaking things
with the release.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Missing option documentation (need inputs)

2018-10-11 Thread Shyam Ranganathan
On 10/10/2018 11:20 PM, Atin Mukherjee wrote:
> 
> 
> On Wed, 10 Oct 2018 at 20:30, Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> The following options were added post 4.1 and are part of 5.0 as the
> first release for the same. They were added in as part of bugs, and
> hence looking at github issues to track them as enhancements did not
> catch the same.
> 
> We need to document it in the release notes (and also the gluster doc.
> site ideally), and hence I would like a some details on what to write
> for the same (or release notes commits) for them.
> 
> Option: cluster.daemon-log-level
> Attention: @atin
> Review: https://review.gluster.org/c/glusterfs/+/20442
> 
> 
> This option has to be used based on extreme need basis and this is why
> it has been mentioned as GLOBAL_NO_DOC. So ideally this shouldn't be
> documented.
> 
> Do we still want to capture it in the release notes?

This is an interesting catch-22: when we want users to use the option
(say, to provide better logs for troubleshooting), we have nothing to
point them to, and it ends up as instructions repeated over mails over
the course of time.
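
For instance, the repeated instructions typically amount to something
like the sketch below. This is illustrative only; the option name comes
from the review above, but the exact invocation and values should be
verified against "gluster volume set help" for the release in question:

  # Raise the daemon log level cluster wide while troubleshooting,
  # then reset it once the logs are collected.
  gluster volume set all cluster.daemon-log-level DEBUG
  # ... reproduce the issue, collect /var/log/glusterfs/glusterd.log ...
  gluster volume set all cluster.daemon-log-level INFO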

I would look at adding this into an options section in the docs, but the
best I can find in there is
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/

I would say we need to improve the way we deal with options and the
required submissions around the same.

Thoughts?

> 
> 
> Option: ctime-invalidation
> Attention: @Du
> Review: https://review.gluster.org/c/glusterfs/+/20286
> 
> Option: shard-lru-limit
> Attention: @krutika
> Review: https://review.gluster.org/c/glusterfs/+/20544
> 
> Option: shard-deletion-rate
> Attention: @krutika
> Review: https://review.gluster.org/c/glusterfs/+/19970
> 
> Please send in the required text ASAP, as we are almost towards the end
> of the release.
> 
> Thanks,
> Shyam
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-10 Thread Shyam Ranganathan
On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> 3. Upgrade testing
>   - Need *volunteers* to do the upgrade testing as stated in the 4.1
> upgrade guide [3] to note any differences or changes to the same
>   - Explicit call out on *disperse* volumes, as we continue to state
> online upgrade is not possible, is this addressed and can this be tested
> and the documentation improved around the same?

Completed upgrade testing using RC1 packages against a 4.1 cluster.
Things hold up fine (replicate type volumes).

I have not attempted a rolling upgrade of disperse volumes, as we still
lack instructions to do so. @Pranith/@Xavi, is this feasible from this
release onward?

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Nightly build status (week of 01 - 07 Oct, 2018)

2018-10-09 Thread Shyam Ranganathan
We have a set of 4 cores which seem to originate from 2 bugs as filed
and referenced below.

Bug 1: https://bugzilla.redhat.com/show_bug.cgi?id=1636570
Cleanup sequence issues in the posix xlator. Mohit/Xavi/Du/Pranith, are
we handling this as a part of addressing cleanup in brick mux, or should
we, instead of piecemeal fixes?

Bug 2: https://bugzilla.redhat.com/show_bug.cgi?id=1637743
Initial analysis seems to point to glusterd starting the same brick
instance twice (non-mux case). Request GlusterD folks to take a look.

1) Release-5

Link: https://build.gluster.org/job/nightly-release-5/

Failures:
a)
https://build.gluster.org/job/regression-test-with-multiplex/886/consoleText
  - Bug and RCA: https://bugzilla.redhat.com/show_bug.cgi?id=1636570

2) Master

Link: https://build.gluster.org/job/nightly-master/

Failures:
a) Failed job line-coverage:
https://build.gluster.org/job/line-coverage/530/consoleText
  - Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1637743 (initial
analysis)
  - Core generated
  - Test:
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t

b) Failed job regression:
https://build.gluster.org/job/regression-test-burn-in/4127/consoleText
  - Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1637743 (initial
analysis) (same as 2.a)
  - Core generated
  - Test: ./tests/bugs/glusterd/quorum-validation.t

c) Failed job regression-with-mux:
https://build.gluster.org/job/regression-test-with-multiplex/889/consoleText
  - Bug and RCA: https://bugzilla.redhat.com/show_bug.cgi?id=1636570
(same as 1.a)
  - Core generated
  - Test: ./tests/basic/ec/ec-5-2.t

NOTE: All nightlies failed in the distributed-regression tests as well,
but as these are not yet stable, I am not calling them out.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-05 Thread Shyam Ranganathan
On 10/05/2018 10:59 AM, Shyam Ranganathan wrote:
> On 10/04/2018 11:33 AM, Shyam Ranganathan wrote:
>> On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
>>> RC1 would be around 24th of Sep. with final release tagging around 1st
>>> of Oct.
>> RC1 now stands to be tagged tomorrow, and patches that are being
>> targeted for a back port include,
> We still are awaiting release notes (other than the bugs section) to be
> closed.
> 
> There is one new bug that needs attention from the replicate team.
> https://bugzilla.redhat.com/show_bug.cgi?id=1636502
> 
> The above looks important to me to be fixed before the release, @ravi or
> @pranith can you take a look?
> 

RC1 is tagged and release tarball generated.

We still have 2 issues to work on,

1. The above messages from AFR in self heal logs

2. We need to test with Py3, else we risk putting out packages on
Py3-default distros and causing some mayhem if basic things fail.

I am open to suggestions on how to ensure we work with Py3, thoughts?

I am thinking we run a regression on F28 (or a platform that defaults to
Py3) and ensure regressions are passing at the very least. For other
Python code that regressions do not cover,
- We have a list at [1]
- How can we split ownership of these?

@Aravinda, @Kotresh, and @ppai, looking to you folks to help out with
the process and needs here.

Shyam

[1] https://github.com/gluster/glusterfs/issues/411
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-05 Thread Shyam Ranganathan
On 10/04/2018 11:33 AM, Shyam Ranganathan wrote:
> On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
>> RC1 would be around 24th of Sep. with final release tagging around 1st
>> of Oct.
> 
> RC1 now stands to be tagged tomorrow, and patches that are being
> targeted for a back port include,

We still are awaiting release notes (other than the bugs section) to be
closed.

There is one new bug that needs attention from the replicate team.
https://bugzilla.redhat.com/show_bug.cgi?id=1636502

The above looks important to me to be fixed before the release, @ravi or
@pranith can you take a look?

> 
> 1) https://review.gluster.org/c/glusterfs/+/21314 (snapshot volfile in
> mux cases)
> 
> @RaBhat working on this.

Done

> 
> 2) Py3 corrections in master
> 
> @Kotresh are all changes made to master backported to release-5 (may not
> be merged, but looking at if they are backported and ready for merge)?

Done, release notes amend pending

> 
> 3) Release notes review and updates with GD2 content pending
> 
> @Kaushal/GD2 team can we get the updates as required?
> https://review.gluster.org/c/glusterfs/+/21303

Still awaiting this.

> 
> 4) This bug [2] was filed when we released 4.0.
> 
> The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
> missing and hence post-upgrade clients failing the mount). This is
> possibly the last chance to fix it.
> 
> Glusterd and protocol maintainers, can you chime in, if this bug needs
> to be and can be fixed? (thanks to @anoopcs for pointing it out to me)

Release notes to be corrected to call this out.

> 
> The tracker bug [1] does not have any other blockers against it, hence
> assuming we are not tracking/waiting on anything other than the set above.
> 
> Thanks,
> Shyam
> 
> [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
> [2] Potential upgrade bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1540659
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Branched and further dates

2018-10-04 Thread Shyam Ranganathan
On 10/04/2018 12:01 PM, Atin Mukherjee wrote:
> 4) This bug [2] was filed when we released 4.0.
> 
> The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
> missing and hence post-upgrade clients failing the mount). This is
> possibly the last chance to fix it.
> 
> Glusterd and protocol maintainers, can you chime in, if this bug needs
> to be and can be fixed? (thanks to @anoopcs for pointing it out to me)
> 
> 
> This is a bad bug to live with. OTOH, I do not have an immediate
> solution in my mind on how to make sure (a) these options when
> reintroduced are made no-ops, especially they will be disallowed to tune
> (with out dirty option check hacks at volume set staging code) . If
> we're to tag RC1 tomorrow, I wouldn't be able to take a risk to commit
> this change.
> 
> Can we actually have a note in our upgrade guide to document that if
> you're upgrading to 4.1 or higher version make sure to disable these
> options before the upgrade to mitigate this?

Yes, adding this to the "Major Issues" section in the release notes as
well as noting it in the upgrade guide is possible. I will go with this
option for now, as we do not have complaints around this from 4.0/4.1
releases (which have the same issue as well).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Branched and further dates

2018-10-04 Thread Shyam Ranganathan
On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
> RC1 would be around 24th of Sep. with final release tagging around 1st
> of Oct.

RC1 now stands to be tagged tomorrow, and patches that are being
targeted for a back port include,

1) https://review.gluster.org/c/glusterfs/+/21314 (snapshot volfile in
mux cases)

@RaBhat working on this.

2) Py3 corrections in master

@Kotresh are all changes made to master backported to release-5 (may not
be merged, but looking at if they are backported and ready for merge)?

3) Release notes review and updates with GD2 content pending

@Kaushal/GD2 team can we get the updates as required?
https://review.gluster.org/c/glusterfs/+/21303

4) This bug [2] was filed when we released 4.0.

The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
missing and hence post-upgrade clients failing the mount). This is
possibly the last chance to fix it.

Glusterd and protocol maintainers, can you chime in, if this bug needs
to be and can be fixed? (thanks to @anoopcs for pointing it out to me)

The tracker bug [1] does not have any other blockers against it, hence
assuming we are not tracking/waiting on anything other than the set above.

Thanks,
Shyam

[1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
[2] Potential upgrade bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1540659
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Memory overwrites due to processing vol files???

2018-09-28 Thread Shyam Ranganathan
We tested with ASAN and without the fix at [1], and it consistently
crashes at the mdcache xlator when brick mux is enabled.
On 09/28/2018 03:50 PM, FNU Raghavendra Manjunath wrote:
> 
> I was looking into the issue and  this is what I could find while
> working with shyam.
> 
> There are 2 things here.
> 
> 1) The multiplexed brick process for the snapshot(s) getting the client
> volfile (I suspect, it happened
>      when restore operation was performed).
> 2) Memory corruption happening while the multiplexed brick process is
> building the graph (for the client
>      volfile it got above)
> 
> I have been able to reproduce the issue in my local computer once, when
> I ran the testcase tests/bugs/snapshot/bug-1275616.t
> 
> Upon comparison, we found that the backtrace of the core I got and the
> core generated in the regression runs was similar.
> In fact, the victim information shyam mentioned before, is also similar
> in the core that I was able to get.  
> 
> On top of that, when the brick process was run with valgrind, it
> reported following memory corruption
> 
> ==31257== Conditional jump or move depends on uninitialised value(s)
> ==31257==    at 0x1A7D0564: mdc_xattr_list_populate (md-cache.c:3127)
> ==31257==    by 0x1A7D1903: mdc_init (md-cache.c:3486)
> ==31257==    by 0x4E62D41: __xlator_init (xlator.c:684)
> ==31257==    by 0x4E62E67: xlator_init (xlator.c:709)
> ==31257==    by 0x4EB2BEB: glusterfs_graph_init (graph.c:359)
> ==31257==    by 0x4EB37F8: glusterfs_graph_activate (graph.c:722)
> ==31257==    by 0x40AEC3: glusterfs_process_volfp (glusterfsd.c:2528)
> ==31257==    by 0x410868: mgmt_getspec_cbk (glusterfsd-mgmt.c:2076)
> ==31257==    by 0x518408D: rpc_clnt_handle_reply (rpc-clnt.c:755)
> ==31257==    by 0x51845C1: rpc_clnt_notify (rpc-clnt.c:923)
> ==31257==    by 0x518084E: rpc_transport_notify (rpc-transport.c:525)
> ==31257==    by 0x123273DF: socket_event_poll_in (socket.c:2504)
> ==31257==  Uninitialised value was created by a heap allocation
> ==31257==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
> ==31257==    by 0x4E9F58E: __gf_malloc (mem-pool.c:136)
> ==31257==    by 0x1A7D052A: mdc_xattr_list_populate (md-cache.c:3123)
> ==31257==    by 0x1A7D1903: mdc_init (md-cache.c:3486)
> ==31257==    by 0x4E62D41: __xlator_init (xlator.c:684)
> ==31257==    by 0x4E62E67: xlator_init (xlator.c:709)
> ==31257==    by 0x4EB2BEB: glusterfs_graph_init (graph.c:359)
> ==31257==    by 0x4EB37F8: glusterfs_graph_activate (graph.c:722)
> ==31257==    by 0x40AEC3: glusterfs_process_volfp (glusterfsd.c:2528)
> ==31257==    by 0x410868: mgmt_getspec_cbk (glusterfsd-mgmt.c:2076)
> ==31257==    by 0x518408D: rpc_clnt_handle_reply (rpc-clnt.c:755)
> ==31257==    by 0x51845C1: rpc_clnt_notify (rpc-clnt.c:923)
> 
> Based on the above observations, I think the below patch  by Shyam
> should fix the crash.

[1]

> https://review.gluster.org/#/c/glusterfs/+/21299/
> 
> But, I am still trying understand, why a brick process should get a
> client volfile (i.e. the 1st issue mentioned above). 
> 
> Regards,
> Raghavendra
> 
> On Wed, Sep 26, 2018 at 9:00 PM Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> > 2. Testing dashboard to maintain release health (new, thanks Nigel)
> >   - Dashboard at [2]
> >   - We already have 3 failures here as follows, needs attention from
> > appropriate *maintainers*,
> >     (a)
> >
> 
> https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
> >       - Failed with core:
> ./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
> >     (b)
> >
> 
> https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
> >       - Failed with core: ./tests/bugs/snapshot/bug-1275616.t
> >       - Also test ./tests/bugs/glusterd/validating-server-quorum.t
> had to be
> > retried
> 
> I was looking at the cores from the above 2 instances, the one in job
> 873 is been a typical pattern, where malloc fails as there is internal
> header corruption in the free bins.
> 
> When examining the victim that would have been allocated, it is often
> carrying incorrect size and other magic information. If the data in
> victim is investigated it looks like a volfile.
> 
> With the crash in 871, I thought there maybe a point where this is
> detected earlier, but not able to make headway in the same.
> 
> So, what could be corrupting this memory and is it when the graph is
> being processed? Can we run this with ASAN or such (I have not tried,
> but 

Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 09:11 AM, Niels de Vos wrote:
> On Fri, Sep 28, 2018 at 08:57:06AM -0400, Shyam Ranganathan wrote:
>> On 09/28/2018 06:12 AM, Niels de Vos wrote:
>>> On Thu, Sep 27, 2018 at 08:40:54AM -0400, Shyam Ranganathan wrote:
>>>> On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>>>>>> The thought is,
>>>>>> - Add a configure option "--enable-py-version-correction" to configure,
>>>>>> that is disabled by default
>>>>> "correction" implies there's something that's incorrect. How about
>>>>> "conversion" or perhaps just --enable-python2
>>>>>
>>>> I would not like to go with --enable-python2 as that implies it is a
>>>> conscious choice with the understanding that py2 is on the box. Given
>>>> the current ability to detect and hence correct the python shebangs, I
>>>> would think we should retain it as a more detect and modify the shebangs
>>>> option name. (I am looking at this more as an option that does the right
>>>> thing implicitly than someone/tool using this checking explicitly, which
>>>> can mean different things to different people, if that makes sense)
>>>>
>>>> Now "correction" seems like an overkill, maybe "conversion"?
>>> Is it really needed to have this as an option? Instead of an option in
>>> configure.ac, can it not be a post-install task in a Makefile.am? The
>>> number of executable python scripts that get installed are minimal, so I
>>> do not expect that a lot of changes are needed for this.
>>
>> Here is how I summarize this proposal,
>> - Perform the shebang "correction" for py2 in the post install
>>   - Keeps the git clone clean
>> - shebang correction occurs based on a configure time option
>>   - It is not implicit but an explicit choice to correct the shebangs to
>> py2, hence we need an option either way
>> - The configure option would be "--enable-python2"
>>   - Developers that need py2, can configure it as such
>>   - Regression jobs that need py2, either because of the platform they
>> test against, or for py2 compliance in the future, use the said option
>>   - Package builds are agnostic to these changes (currently) as they
>> decide at build time based on the platform what needs to be done.
> 
> I do not think such a ./configure option is needed. configure.ac can
> find out the version that is available, and pick python3 if it has both.
> 
> Tests should just run with "$PYTHON run-the-test.py" instead of
> ./run-the-test.py with a #!/usr/bin/python shebang. The testing
> framework can also find out what version of python is available.

If we back up a bit here, if all shebangs are cleared, then we do not
need anything. That is not the situation at the moment, and neither do I
know if that state can be reached.

We also need to ensure we work against both py2 and py3 for the near
future, which entails being specific, in some regression job at least,
about the python choice; whether that corrects the shebangs really
depends on the above conclusion.

> 
> 
>>> There do seem quite some Python files that have a shebang, but do not
>>> need it (__init__.py, not executable, no __main__-like functions). This
>>> should probably get reviewed as well. When those scripts get their
>>> shebang removed, even fewer files need to be 'fixed-up'.
>>
>> I propose maintainers/component-owner take this cleanup.
> 
> That would be ideal!
> 
> 
>>> Is there a BZ or GitHub Issue that I can use to send some fixes?
>>
>> See: https://github.com/gluster/glusterfs/issues/411
> 
> Thanks,
> Niels
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 12:36 AM, Kotresh Hiremath Ravishankar wrote:
> > - All regression hosts are currently py2 and so if we do not run
> the py
> > shebang correction during configure (as we do not build and test from
> > RPMS), we would be running with incorrect py3 shebangs (although this
> > seems to work, see [2]. @kotresh can we understand why?)
> 
> Is it because we don't test any of the python in the regression tests?
> 
> Geo-replication do have regression tests but not sure about glusterfind,
> events.
> 
> Or because when we do, we invoke python scripts with `python foo.py` or
> `$PYTHON foo.py` everywhere? The shebangs are ignored when scripts are
> invoked this way.
> 
> The reason why geo-rep is passing is for the same reason mentioned. Geo-rep
> python file is invoked from a c program always prefixing it with python
> as follows.
> 
> python = getenv("PYTHON");
>     if (!python)
>     python = PYTHON;
>     nargv[j++] = python;
>     nargv[j++] = GSYNCD_PREFIX "/python/syncdaemon/" GSYNCD_PY;

Thank you, makes sense now.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 06:12 AM, Niels de Vos wrote:
> On Thu, Sep 27, 2018 at 08:40:54AM -0400, Shyam Ranganathan wrote:
>> On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>>>> The thought is,
>>>> - Add a configure option "--enable-py-version-correction" to configure,
>>>> that is disabled by default
>>> "correction" implies there's something that's incorrect. How about
>>> "conversion" or perhaps just --enable-python2
>>>
>> I would not like to go with --enable-python2 as that implies it is a
>> conscious choice with the understanding that py2 is on the box. Given
>> the current ability to detect and hence correct the python shebangs, I
>> would think we should retain it as a more detect and modify the shebangs
>> option name. (I am looking at this more as an option that does the right
>> thing implicitly than someone/tool using this checking explicitly, which
>> can mean different things to different people, if that makes sense)
>>
>> Now "correction" seems like an overkill, maybe "conversion"?
> Is it really needed to have this as an option? Instead of an option in
> configure.ac, can it not be a post-install task in a Makefile.am? The
> number of executable python scripts that get installed are minimal, so I
> do not expect that a lot of changes are needed for this.

Here is how I summarize this proposal,
- Perform the shebang "correction" for py2 in the post install
  - Keeps the git clone clean
- shebang correction occurs based on a configure time option
  - It is not implicit but an explicit choice to correct the shebangs to
py2, hence we need an option either way
- The configure option would be "--enable-python2"
  - Developers that need py2, can configure it as such
  - Regression jobs that need py2, either because of the platform they
test against, or for py2 compliance in the future, use the said option
  - Package builds are agnostic to these changes (currently) as they
decide at build time based on the platform what needs to be done.

> 
> There do seem quite some Python files that have a shebang, but do not
> need it (__init__.py, not executable, no __main__-like functions). This
> should probably get reviewed as well. When those scripts get their
> shebang removed, even fewer files need to be 'fixed-up'.

I propose maintainers/component-owner take this cleanup.

> 
> Is there a BZ or GitHub Issue that I can use to send some fixes?

See: https://github.com/gluster/glusterfs/issues/411

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Python3 build process

2018-09-27 Thread Shyam Ranganathan
On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>> The thought is,
>> - Add a configure option "--enable-py-version-correction" to configure,
>> that is disabled by default
> "correction" implies there's something that's incorrect. How about
> "conversion" or perhaps just --enable-python2
> 

I would not like to go with --enable-python2 as that implies it is a
conscious choice with the understanding that py2 is on the box. Given
the current ability to detect and hence correct the python shebangs, I
would think we should retain it as a more detect and modify the shebangs
option name. (I am looking at this more as an option that does the right
thing implicitly than someone/tool using this checking explicitly, which
can mean different things to different people, if that makes sense)

Now "correction" seems like an overkill, maybe "conversion"?
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Memory overwrites due to processing vol files???

2018-09-26 Thread Shyam Ranganathan
On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> 2. Testing dashboard to maintain release health (new, thanks Nigel)
>   - Dashboard at [2]
>   - We already have 3 failures here as follows, needs attention from
> appropriate *maintainers*,
> (a)
> https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
>   - Failed with core: 
> ./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
> (b)
> https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
>   - Failed with core: ./tests/bugs/snapshot/bug-1275616.t
>   - Also test ./tests/bugs/glusterd/validating-server-quorum.t had to be
> retried

I was looking at the cores from the above 2 instances; the one in job
873 follows a typical pattern, where malloc fails as there is internal
header corruption in the free bins.

When examining the victim that would have been allocated, it is often
carrying an incorrect size and other magic information. If the data in
the victim is investigated, it looks like a volfile.

With the crash in 871, I thought there may be a point where this is
detected earlier, but I have not been able to make headway on the same.

So, what could be corrupting this memory and is it when the graph is
being processed? Can we run this with ASAN or such (I have not tried,
but need pointers if anyone has run tests with ASAN).
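
For reference, a minimal sketch of getting an ASAN-instrumented build,
assuming only the standard GCC/Clang sanitizer flags passed through the
usual autotools variables (the flags are stock compiler options; the
test picked below is just an example, not a vetted recipe for the
regression machines):

  ./autogen.sh
  ./configure CFLAGS='-g -O1 -fsanitize=address -fno-omit-frame-pointer' \
              LDFLAGS='-fsanitize=address'
  make -j4 && make install
  # An invalid read/write or use-after-free is then reported at the
  # faulting access itself, instead of corrupting malloc metadata and
  # blowing up later inside free()/malloc().
  ./run-tests.sh tests/bugs/snapshot/bug-1275616.t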

Here is the (brief) stack analysis of the core in 873:
NOTE: we need to start avoiding flushing the logs when we are dumping
core, as that leads to more memory allocations and causes a sort of
double fault in such cases.

Core was generated by `/build/install/sbin/glusterfsd -s
builder101.cloud.gluster.org --volfile-id /sn'.
Program terminated with signal 6, Aborted.
#0  0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f23cf591968 in __GI_abort () at abort.c:90
#2  0x7f23cf5d2d37 in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7f23cf6e4d58 "*** Error in `%s': %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x7f23cf5db499 in malloc_printerr (ar_ptr=0x7f23bc20,
ptr=, str=0x7f23cf6e4ea8 "free(): corrupted unsorted
chunks", action=3) at malloc.c:5025
#4  _int_free (av=0x7f23bc20, p=, have_lock=0) at
malloc.c:3847
#5  0x7f23d0f7c6e4 in __gf_free (free_ptr=0x7f23bc0a56a0) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/mem-pool.c:356
#6  0x7f23d0f41821 in log_buf_destroy (buf=0x7f23bc0a5568) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:358
#7  0x7f23d0f44e55 in gf_log_flush_list (copy=0x7f23c404a290,
ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:1739
#8  0x7f23d0f45081 in gf_log_flush_extra_msgs (ctx=0x1ff6010, new=0)
at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:1807
#9  0x7f23d0f4162d in gf_log_set_log_buf_size (buf_size=0) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:290
#10 0x7f23d0f41acc in gf_log_disable_suppression_before_exit
(ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:444
#11 0x7f23d0f4c027 in gf_print_trace (signum=6, ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/common-utils.c:922
#12 0x0040a84a in glusterfsd_print_trace (signum=6) at
/home/jenkins/root/workspace/regression-test-with-multiplex/glusterfsd/src/glusterfsd.c:2316
#13 <signal handler called>
#14 0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#15 0x7f23cf591968 in __GI_abort () at abort.c:90
#16 0x7f23cf5d2d37 in __libc_message (do_abort=2,
fmt=fmt@entry=0x7f23cf6e4d58 "*** Error in `%s': %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:196
#17 0x7f23cf5dcc86 in malloc_printerr (ar_ptr=0x7f23bc20,
ptr=0x7f23bc003cd0, str=0x7f23cf6e245b "malloc(): memory corruption",
action=) at malloc.c:5025
#18 _int_malloc (av=av@entry=0x7f23bc20, bytes=bytes@entry=15664) at
malloc.c:3473
#19 0x7f23cf5df84c in __GI___libc_malloc (bytes=15664) at malloc.c:2899
#20 0x7f23d0f3bbbf in __gf_default_malloc (size=15664) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/mem-pool.h:106
#21 0x7f23d0f3f02f in xlator_mem_acct_init (xl=0x7f23bc082b20,
num_types=163) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/xlator.c:800
#22 0x7f23b90a37bf in mem_acct_init (this=0x7f23bc082b20) at
/home/jenkins/root/workspace/regression-test-with-multiplex/xlators/performance/open-behind/src/open-behind.c:

[Gluster-devel] Python3 build process

2018-09-26 Thread Shyam Ranganathan
Hi,

With the introduction of default python 3 shebangs, and the change in
configure.ac to correct these to py2 if the build is being attempted on
a machine that does not have py3, a couple of issues have been
uncovered. Here is the plan to fix the same; suggestions welcome.

Issues:
- A configure job is run when creating the dist tarball, and this runs
on non-py3 platforms, hence changing the dist tarball to basically have
py2 shebangs; as a result, the release-new build job always outputs py
files with the py2 shebang. See the tarball in [1]

- All regression hosts are currently py2 and so if we do not run the py
shebang correction during configure (as we do not build and test from
RPMS), we would be running with incorrect py3 shebangs (although this
seems to work, see [2]. @kotresh can we understand why?)

Plan to address the above is detailed in this bug [3].

The thought is,
- Add a configure option "--enable-py-version-correction" to configure,
that is disabled by default

- All regression jobs will run with the above option, and hence this
will correct the py shebangs in the regression machines. In the future
as we run on both py2 and py3 machines, this will run with the right
python shebangs on these machines.

- The packaging jobs will now run the py version detection and shebang
correction during the actual build and packaging; Kaleb has already put
up a patch for the same [2]. A rough sketch of such a shebang correction
step is below.
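
To make the intent concrete, here is a minimal sketch of what such a
build-time shebang correction could look like. This is an assumption for
illustration only; the variable name, file selection and hook location
are hypothetical and not the actual patch:

  # Hypothetical post-install hook: rewrite installed python3 shebangs
  # to python2 only when the py2 choice was made at configure time.
  if [ "x$USE_PYTHON2" = "xyes" ]; then
      for f in $(grep -rl '^#!/usr/bin/python3' "$DESTDIR$prefix"); do
          sed -i -e '1s|^#!/usr/bin/python3|#!/usr/bin/python2|' "$f"
      done
  fi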

Thoughts?

Shyam

[1] Release tarball: https://build.gluster.org/job/release-new/69/
[2] Patch that defaults to py3 in regression and passes regressions:
https://review.gluster.org/c/glusterfs/+/21266
[3] Infra bug to change regression jobs:
https://bugzilla.redhat.com/show_bug.cgi?id=1633425
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-09-26 Thread Shyam Ranganathan
Hi,

Updates on the release, and a shout out for help, are as follows,

RC0 Release packages for testing are available see the thread at [1]

The following are the activities that we need to complete for calling
the release GA (i.e., with no major regressions):

1. Release notes (Owner: release owner (myself), will send out an
initial version for review and to solicit inputs today)

2. Testing dashboard to maintain release health (new, thanks Nigel)
  - Dashboard at [2]
  - We already have 3 failures here as follows, needs attention from
appropriate *maintainers*,
(a)
https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
- Failed with core: 
./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
(b)
https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
- Failed with core: ./tests/bugs/snapshot/bug-1275616.t
- Also test ./tests/bugs/glusterd/validating-server-quorum.t had to be
retried
(c)
https://build.gluster.org/job/regression-test-burn-in/4109/consoleText
- Failed with core: ./tests/basic/mgmt_v3-locks.t

3. Upgrade testing
  - Need *volunteers* to do the upgrade testing as stated in the 4.1
upgrade guide [3] to note any differences or changes to the same
  - Explicit call out on *disperse* volumes, as we continue to state
online upgrade is not possible, is this addressed and can this be tested
and the documentation improved around the same?

4. Performance testing/benchmarking
  - I would be using smallfile and FIO to baseline 3.12 and 4.1 and test
RC0 for any major regressions
  - If we already know of any please shout out so that we are aware of
the problems and upcoming fixes to the same

5. Major testing areas
  - Py3 support: Need *volunteers* here to test out the Py3 support
around changed python files, if there is not enough coverage in the
regression test suite for the same

Thanks,
Shyam

[1] Packages for RC0:
https://lists.gluster.org/pipermail/maintainers/2018-September/005044.html

[2] Release testing health dashboard:
https://build.gluster.org/job/nightly-release-5/

[3] 4.1 upgrade guide:
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/

On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
> Hi,
> 
> Release 5 has been branched today. To backport fixes to the upcoming 5.0
> release use the tracker bug [1].
> 
> We intend to roll out RC0 build by end of tomorrow for testing, unless
> the set of usual cleanup patches (op-version, some messages, gfapi
> version) land in any form of trouble.
> 
> RC1 would be around 24th of Sep. with final release tagging around 1st
> of Oct.
> 
> I would like to encourage everyone to test out the bits as appropriate
> and post updates to this thread.
> 
> Thanks,
> Shyam
> 
> [1] 5.0 tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: Branched and further dates

2018-09-13 Thread Shyam Ranganathan
Hi,

Release 5 has been branched today. To backport fixes to the upcoming 5.0
release use the tracker bug [1].

We intend to roll out RC0 build by end of tomorrow for testing, unless
the set of usual cleanup patches (op-version, some messages, gfapi
version) land in any form of trouble.

RC1 would be around 24th of Sep. with final release tagging around 1st
of Oct.

I would like to encourage everyone to test out the bits as appropriate
and post updates to this thread.

Thanks,
Shyam

[1] 5.0 tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Release calendar and status updates

2018-09-10 Thread Shyam Ranganathan
On 08/22/2018 02:03 PM, Shyam Ranganathan wrote:
> On 08/14/2018 02:28 PM, Shyam Ranganathan wrote:
>> 2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)
> 
> We are postponing branching to 2nd week of September (10th), as the
> entire effort in this release has been around stability and fixing
> issues across the board.

This is delayed for the following reasons,
- Stability of mux regressions
  There have been a few cores last week and we at least need an analysis
of the same before branching. Mohit, Atin, and I have looked at the
same and will post a broader update later today or tomorrow.

NOTE: Branching is not being withheld for the above, as we would
backport the required fixes, and post branching there is work to do in
terms of cleaning up the branch (gfapi, versions etc.) that takes some time.

- Not having the Gluster 5.0 "found in version" in bugzilla
This issue has been resolved with the bugzilla team today, so it is no
longer a blocker.

(read on as I still need information for some of the asks below)

> 
> Thus, we are expecting no net new features from hereon till branching,
> and features that are already a part of the code base and its details
> are as below.
> 



> 1) Changes to options tables in xlators (#302)
> 
> @Kaushal/GD2 team, can we call this complete? There maybe no real
> release notes for the same, as these are internal in nature, but
> checking nevertheless.

@Kaushal or GD2 contributors, ping!

> 5) Turn on Dentry fop serializer by default in brick stack (#421)
> 
> @du, the release note for this can be short, as other details are
> captured in 4.0 release notes.
> 
> However, in 4.0 release we noted a limitation with this feature as follows,
> 
> "Limitations: This feature is released as a technical preview, as
> performance implications are not known completely." (see section
> https://docs.gluster.org/en/latest/release-notes/4.0.0/#standalone )
> 
> Do we now have better data regarding the same that we can use when
> announcing the release?

@Du ping!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposal to change Gerrit -> Bugzilla updates

2018-09-10 Thread Shyam Ranganathan
On 09/10/2018 08:37 AM, Nigel Babu wrote:
> Hello folks,
> 
> We now have review.gluster.org  as an
> external tracker on Bugzilla. Our current automation when there is a
> bugzilla attached to a patch is as follows:
> 
> 1. When a new patchset has "Fixes: bz#1234" or "Updates: bz#1234", we
> will post a comment to the bug with a link to the patch and change the
> status to POST. 2. When the patchset is merged, if the commit said
> "Fixes", we move the status to MODIFIED.
> 
> I'd like to propose the following improvements:
> 1. Add the Gerrit URL as an external tracker to the bug.

My assumption here is that for each patch that mentions a BZ, an
additional tracker would be added to the tracker list, right?

A further assumption (as I have not used trackers before) is that this
would reduce the noise, in the form of comments, in the bug itself, right?

In the past we have reduced noise by not commenting on the bug (or
github issue) every time the patch changes, so we currently get 2
comments per patch; with the above change we would get just one, and
that too as a terse external reference (see [1], based on my
test/understanding).

What we would lose, as far as I can tell based on the changes below, is
the commit details in the BZ when the patch is merged. These are useful,
and I would like them to be retained in case they are not.

> 2. When a patch is merged, only change state of the bug if needed. If
> there is no state change, do not add an additional message. The external
> tracker state should change reflecting the state of the review.

I added a tracker to this bug [1], but I am not seeing the tracker state
correctly reflected in BZ; is this work that needs to be done?

> 3. Assign the bug to the committer. This has edge cases, but it's best
> to at least handle the easy ones and then figure out edge cases later.
> The experience is going to be better than what it is right now.

Is the above a reference to just the "assigned to" field, or the overall
process? If the latter, can you elaborate a little more on why this would
be better? (I am not saying it is not; I am attempting to understand how
you see it.)

> 
> Please provide feedback/comments by end of day Friday. I plan to add
> this activity to the next Infra team sprint that starts on Monday (Sep 17).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1619423
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Test health report (week ending 26th Aug, 2018)

2018-08-28 Thread Shyam Ranganathan
We need more focus on the retry cases, so that we have fewer failures
overall due to the same.

2 useful changes are in: fstat now has the ability to filter by job, and
tests that time out will save older logs for analysis (assuming, of
course, the run failed; if the run passes on retry, the job is set up
not to save logs, as before).

Line-coverage failures:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=line-coverage

Regression test burn-in:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=regression-test-burn-in

Mux-regressions:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=regression-test-with-multiplex


https://build.gluster.org/job/regression-test-with-multiplex/834/console
18:29:04 2 test(s) needed retry
18:29:04 ./tests/00-geo-rep/georep-basic-dr-rsync.t
18:29:04 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-burn-in/4071/console
18:27:03 1 test(s) needed retry
18:27:03 ./tests/00-geo-rep/georep-basic-dr-tarssh.t

https://build.gluster.org/job/regression-test-with-multiplex/835/console
18:25:06 1 test(s) needed retry
18:25:06 ./tests/bugs/shard/bug-shard-discard.t

https://build.gluster.org/job/regression-test-burn-in/4072/console
18:34:35 1 test(s) needed retry
18:34:35 ./tests/basic/volume-snapshot.t

https://build.gluster.org/job/line-coverage/487/console
18:43:30 1 test(s) generated core
18:43:30 ./tests/bugs/glusterd/validating-server-quorum.t
18:43:30
18:43:30 1 test(s) needed retry
18:43:30 ./tests/00-geo-rep/georep-basic-dr-tarssh.t

https://build.gluster.org/job/regression-test-burn-in/4073/console
18:31:42 1 test(s) generated core
18:31:42 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-with-multiplex/837/console
18:28:56 1 test(s) failed
18:28:56 ./tests/basic/afr/split-brain-favorite-child-policy.t
18:28:56
18:28:56 1 test(s) generated core
18:28:56 ./tests/basic/afr/split-brain-favorite-child-policy.t

https://build.gluster.org/job/line-coverage/489/consoleFull
20:36:49 3 test(s) failed
20:36:49 ./tests/00-geo-rep/georep-basic-dr-tarssh.t
20:36:49 ./tests/basic/tier/fops-during-migration-pause.t
20:36:49 ./tests/bugs/readdir-ahead/bug-1436090.t
20:36:49
20:36:49 2 test(s) generated core
20:36:49 ./tests/00-geo-rep/00-georep-verify-setup.t
20:36:49 ./tests/00-geo-rep/georep-basic-dr-rsync.t
20:36:49
20:36:49 4 test(s) needed retry
20:36:49 ./tests/00-geo-rep/georep-basic-dr-tarssh.t
20:36:49 ./tests/basic/tier/fops-during-migration-pause.t
20:36:49 ./tests/basic/tier/fops-during-migration.t
20:36:49 ./tests/bugs/readdir-ahead/bug-1436090.t

https://build.gluster.org/job/regression-test-with-multiplex/840/console
18:22:44 2 test(s) needed retry
18:22:44 ./tests/basic/volume-snapshot-clone.t
18:22:44 ./tests/bugs/posix/bug-1619720.t

https://build.gluster.org/job/regression-test-burn-in/4075/console
https://build.gluster.org/job/regression-test-with-multiplex/839/consoleText
Multiple failures; they look related to some infra issue at that point,
so I am not recording them in this mail.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Release calendar and status updates

2018-08-22 Thread Shyam Ranganathan
On 08/14/2018 02:28 PM, Shyam Ranganathan wrote:
> 2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)

We are postponing branching to 2nd week of September (10th), as the
entire effort in this release has been around stability and fixing
issues across the board.

Thus, we are expecting no net new features from here on till branching;
the features that are already a part of the code base, and their
details, are as below.

> 
> 3) Late feature back port closure: (Friday) Aug-24-2018 (1 week from
> branching)

As stated above, there is no late feature back port.

The features that are part of master since 4.1 release are as follows,
with some questions for the authors,

1) Changes to options tables in xlators (#302)

@Kaushal/GD2 team, can we call this complete? There maybe no real
release notes for the same, as these are internal in nature, but
checking nevertheless.

2) CloudArchival (#387)

@susant, what is the status of this feature? Is it complete?
I am missing user documentation, and code coverage from the tests is
very low (see:
https://build.gluster.org/job/line-coverage/485/Line_20Coverage_20Report/ )

3) Quota fsck (#390)

@Sanoj, I do have documentation in the github issue, but would prefer
that the user-facing documentation moves to glusterdocs instead.

Further, I see no real test coverage for the tool provided here; any
thoughts around the same?

The script is not part of the tarball, and hence not part of the
distribution RPMs either; what is the thought around distributing the
same?

4) Ensure python3 compatibility across code base (#411)

@Kaleb/others, the last patch to call this issue done (sans real testing
at the moment) is https://review.gluster.org/c/glusterfs/+/20868; request
reviews and votes there, to get this merged before branching.

5) Turn on Dentry fop serializer by default in brick stack (#421)

@du, the release note for this can be short, as other details are
captured in 4.0 release notes.

However, in 4.0 release we noted a limitation with this feature as follows,

"Limitations: This feature is released as a technical preview, as
performance implications are not known completely." (see section
https://docs.gluster.org/en/latest/release-notes/4.0.0/#standalone )

Do we now have better data regarding the same that we can use when
announcing the release?

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Test health report (week ending 19th Aug. 2018)

2018-08-20 Thread Shyam Ranganathan
Although tests have stabilized quite a bit, and from the maintainers
meeting we know that some tests have patches coming in, here is a
readout of other tests that needed a retry. We need to reduce failures
on retries as well, so as not to have spurious or other failures in
test runs.

Tests being worked on (from the maintainers meeting notes):
- bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t

For the other retries and failures, I request component maintainers to
look at the test cases and resulting failures, and post back any
findings to the lists to take things forward,

https://build.gluster.org/job/line-coverage/481/console
20:10:01 1 test(s) needed retry
20:10:01 ./tests/basic/distribute/rebal-all-nodes-migrate.t

https://build.gluster.org/job/line-coverage/483/console
18:42:01 2 test(s) needed retry
18:42:01 ./tests/basic/tier/fops-during-migration-pause.t
18:42:01
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
(fix in progress)

https://build.gluster.org/job/regression-test-burn-in/4067/console
18:27:21 1 test(s) generated core
18:27:21 ./tests/bugs/readdir-ahead/bug-1436090.t

https://build.gluster.org/job/regression-test-with-multiplex/828/console
18:19:39 1 test(s) needed retry
18:19:39 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-with-multiplex/829/console
18:24:14 2 test(s) needed retry
18:24:14 ./tests/00-geo-rep/georep-basic-dr-rsync.t
18:24:14 ./tests/bugs/glusterd/quorum-validation.t

https://build.gluster.org/job/regression-test-with-multiplex/831/console
18:20:49 1 test(s) generated core
18:20:49 ./tests/basic/ec/ec-5-2.t

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: Release calendar and status updates

2018-08-14 Thread Shyam Ranganathan
This mail is to solicit the following,

Features/enhancements planned for Gluster 5 need the following from
contributors:
  - Open/Use relevant issue
  - Mark issue with the "Release 5" milestone [1]
  - Post to the devel lists issue details, requesting addition to track
the same for the release

NOTE: We are ~7 days from branching, and I do not have any issues marked
for the release; please respond, as you read this, with the issues that
are going to be part of this release.

Calendar of activities look as follows:

1) master branch health checks (weekly, till branching)
  - Expect a status update every Monday on the various test runs

2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)

3) Late feature back port closure: (Friday) Aug-24-2018 (1 week from
branching)

4) Initial release notes readiness: (Monday) Aug-27-2018

5) RC0 build: (Monday) Aug-27-2018



6) RC1 build: (Monday) Sep-17-2018



7) GA tagging: (Monday) Oct-01-2018



8) ~week later release announcement

Go/no-go discussions per-phase will be discussed in the maintainers list.


[1] Release milestone: https://github.com/gluster/glusterfs/milestone/7
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down for stabilization (unlocking the same)

2018-08-14 Thread Shyam Ranganathan
On 08/14/2018 12:51 AM, Pranith Kumar Karampuri wrote:
> 
> 
> On Mon, Aug 13, 2018 at 10:55 PM Shyam Ranganathan wrote:
> 
> On 08/13/2018 02:20 AM, Pranith Kumar Karampuri wrote:
> >     - At the end of 2 weeks, reassess master and nightly test
> status, and
> >     see if we need another drive towards stabilizing master by
> locking down
> >     the same and focusing only on test and code stability around
> the same.
> >
> >
> > When will there be a discussion about coming up with guidelines to
> > prevent lock down in future?
> 
> A thread for the same is started in the maintainers list.
> 
> 
> Could you point me to the thread please? I am only finding a thread with
> subject "Lock down period merge process"

That is the one I am talking about, where you already raised the above
point (if I recollect right).

> 
> >
> > I think it is better to lock-down specific components by removing
> commit
> > access for the respective owners for those components when a test in a
> > particular component starts to fail.
> 
>     Also I suggest we move this to the maintainers thread, to keep the noise
> levels across lists in check.
> 
> Thanks,
> Shyam
> 
> 
> 
> -- 
> Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED bug-1110262.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/bug-1110262.tTBD

The above test fails as follows,
Run: https://build.gluster.org/job/line-coverage/427/consoleFull

Log snippet: (retried and passed so no further logs)
18:50:33 useradd: user 'dev' already exists
18:50:33 not ok 13 , LINENUM:42
18:50:33 FAILED COMMAND: useradd dev
18:50:33 groupadd: group 'QA' already exists
18:50:33 not ok 14 , LINENUM:43
18:50:33 FAILED COMMAND: groupadd QA

Basically, the user and group already existed, and hence the test
failed. I tried getting to the build history of the machine that failed
this test, in an effort to understand which previous run could have
leaked them, but Jenkins has not been cooperative.

Also, one other test case, tests/bugs/bug-1584517.t, uses the same user
and group names, but that test runs after this one.

So I do not yet know how the user and group names leaked and caused
this test case to fail.

Bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=1615604
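
Until the source of the leak is understood, a defensive tweak along the
following lines would keep a leaked user/group from failing the test;
this is a hypothetical hardening sketch, not the fix tracked in the bug
above:

# Hypothetical hardening, not the actual fix tracked in the bug: create
# the user and group only if they do not already exist, so a leak from a
# previous run cannot fail these steps.
id -u dev >/dev/null 2>&1 || useradd dev
getent group QA >/dev/null 2>&1 || groupadd QA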

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED ./tests/basic/stats-dump.t)

2018-08-13 Thread Shyam Ranganathan
On 08/13/2018 02:32 PM, Shyam Ranganathan wrote:
> I will be adding a bug and a fix that tries this in a loop to avoid the
> potential race that I see above as the cause.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1615582
Potential fix: https://review.gluster.org/c/glusterfs/+/20726

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED ./tests/basic/stats-dump.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/basic/stats-dump.tTBD

This test fails as follows:

  01:07:31 not ok 20 , LINENUM:42
  01:07:31 FAILED COMMAND: grep .queue_size
/var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump

  18:35:43 not ok 21 , LINENUM:43
  18:35:43 FAILED COMMAND: grep .queue_size
/var/lib/glusterd/stats/glusterfsd__d_backends_patchy2.dump

Basically, when grep'ing for patterns in the stats dump, the second
pattern, "queue_size", is not found for one or the other brick.

The above seems odd: if the test found "aggr.fop.write.count", it stands
to reason that a stats dump was present; further, there is a 2-second
sleep in the test case, while the dump interval is 1 second.

The only likely cause of the failure, then, is a race: the io-stats
dumper thread had just (re)opened the file to overwrite its content (the
fopen uses mode "w+", which truncates the file), and the grep CLI opened
the file at that same moment and hence found no content.

I will be filing a bug and a fix that retries the grep in a loop, to
avoid the potential race described above.
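
As a rough sketch of what such a retry could look like (illustrative
only, not the patch itself; the dump path and pattern are the ones used
by the test):

# Illustrative retry loop, not the actual patch: re-attempt the grep for
# a few dump intervals, so a momentarily truncated stats file does not
# fail the check.
DUMP=/var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump
PATTERN=".queue_size"

ret=1
for attempt in 1 2 3 4 5; do
    if grep -q "$PATTERN" "$DUMP"; then
        ret=0
        break
    fi
    sleep 1    # the io-stats dump interval in the test is 1 second
done
exit $ret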

Other ideas/causes welcome!

Also, this has failed in mux and non-mux environments,
Runs with failure:
https://build.gluster.org/job/regression-on-demand-multiplex/175/consoleFull
(no logs)

https://build.gluster.org/job/regression-on-demand-full-run/59/consoleFull
(has logs)

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/core/bug-1432542-mpx-restart-crash.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam

This test had 2 issues,

1. It needed more time in lcov builds, hence the timeout was bumped to
800 seconds; also, one of the EXPECT_WITHIN checks needed more tolerance
and was bumped up to 120 seconds.

2. The test was OOM-killed at times; to reduce the memory pressure it
creates, each client mount used for a dd test is now unmounted right
after that dd completes. This resulted in no more OOM kills for the test.
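
For illustration, the pattern applied in (2) looks roughly like the
following (the host, volume and mount point names are stand-ins, not the
test's actual values):

# Sketch of the OOM mitigation in (2): unmount each client right after
# its dd run instead of keeping all the mounts alive until the end of
# the test. server1, patchy$i and /mnt/client are illustrative names.
for i in 1 2 3; do
    mount -t glusterfs server1:/patchy$i /mnt/client
    dd if=/dev/zero of=/mnt/client/testfile bs=1M count=100
    umount /mnt/client        # drop the client stack immediately
done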

Shyam (and Nithya)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/distribute/bug-1042725.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/distribute/bug-1042725.t Shyam

The test above failed to even start glusterd (the first line of the
test) properly when it failed. On inspection it was noted that the
previous test, ./tests/bugs/core/multiplex-limit-issue-151.t, had not
completed successfully and also had a different cleanup pattern
(trapping cleanup on exit/TERM, rather than invoking it outright).

The test ./tests/bugs/core/multiplex-limit-issue-151.t was amended to
perform cleanup as appropriate, and no further errors in
./tests/bugs/distribute/bug-1042725.t have been seen since then.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/distribute/bug-1117851.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/distribute/bug-1117851.t Shyam/Nigel

Tests against lcov-instrumented code take more time than normal; this
test was pushing towards 180-190 seconds on successful runs. To remove
any potential issues with tests that run close to the default timeout of
200 seconds, 2 changes were made:

1) https://review.gluster.org/c/glusterfs/+/20648
Added an option to run-tests.sh to enable setting the default timeout to
a different value.

2) https://review.gluster.org/c/build-jobs/+/20655
Changed the line-coverage job to use the above option to set the default
timeout to 300 seconds for the test run.

Since these changes, this test has not failed in lcov runs.
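
To illustrate the effect outside the harness, the 300-second ceiling can
be approximated with coreutils timeout; this is only an approximation,
the real mechanism is the run-tests.sh option from (1) above:

# Standalone approximation of the larger ceiling, using coreutils
# timeout; the real mechanism is the run-tests.sh option referenced in
# (1) above.
timeout 300 ./tests/bugs/distribute/bug-1117851.t
echo "exit status: $?"    # timeout exits with 124 if the limit was hit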

Shyam (and Nigel/Nithya)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down for stabilization (unlocking the same)

2018-08-13 Thread Shyam Ranganathan
On 08/13/2018 02:20 AM, Pranith Kumar Karampuri wrote:
> - At the end of 2 weeks, reassess master and nightly test status, and
> see if we need another drive towards stabilizing master by locking down
> the same and focusing only on test and code stability around the same.
> 
> 
> When will there be a discussion about coming up with guidelines to
> prevent lock down in future?

A thread for the same is started in the maintainers list.

> 
> I think it is better to lock-down specific components by removing commit
> access for the respective owners for those components when a test in a
> particular component starts to fail.

Also I suggest we move this to the maintainers thread, to keep the noise
levels across lists in check.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (testname.t)

2018-08-12 Thread Shyam Ranganathan
As a means of keeping the focus going and squashing the remaining tests
that were failing sporadically, we request each test/component owner to:

- respond to this mail, changing the subject (testname.t) to the test
name that they are responding to (adding more than one in case they have
the same RCA)
- include the current RCA and status of the same

The list of tests and current owners, as per the spreadsheet that we
were tracking, is as follows:

./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
./tests/basic/tier/tier-heald.t TBD
./tests/basic/afr/sparse-file-self-heal.t   TBD
./tests/bugs/shard/bug-1251824.tTBD
./tests/bugs/shard/configure-lru-limit.tTBD
./tests/bugs/replicate/bug-1408712.tRavi
./tests/basic/afr/replace-brick-self-heal.t TBD
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
./tests/basic/stats-dump.t  TBD
./tests/bugs/bug-1110262.t  TBD
./tests/basic/ec/ec-data-heal.t Mohit
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t   Pranith
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
TBD
./tests/basic/ec/ec-5-2.t   Sunil
./tests/bugs/shard/bug-shard-discard.t  TBD
./tests/bugs/glusterd/remove-brick-testcases.t  TBD
./tests/bugs/protocol/bug-808400-repl.t TBD
./tests/bugs/quick-read/bug-846240.tDu
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t   Mohit
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
./tests/bugs/ec/bug-1236065.t   Pranith
./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
./tests/basic/ec/ec-1468261.t   Ashish
./tests/basic/afr/add-brick-self-heal.t Ravi
./tests/basic/afr/granular-esh/replace-brick.t  Pranith
./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
./tests/bugs/glusterd/validating-server-quorum.tAtin
./tests/bugs/replicate/bug-1363721.tRavi
./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t 
Karthik
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
Atin
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
./tests/bitrot/bug-1373520.tKotresh
./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
./tests/bugs/glusterd/quorum-validation.t   Atin
./tests/bugs/distribute/bug-1042725.t   Shyam
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
Karthik
./tests/bugs/quota/bug-1293601.tTBD
./tests/bugs/bug-1368312.t  Du
./tests/bugs/distribute/bug-1122443.t   Du
./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568 Nithya/Shyam

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down status (Aug 12th, 2018) (patchset 12)

2018-08-12 Thread Shyam Ranganathan
Patch set 12 results:

./tests/bugs/glusterd/quorum-validation.t (3 retries, 1 core)
./tests/bugs/glusterd/validating-server-quorum.t (1 core)
(NEW) ./tests/basic/distribute/rebal-all-nodes-migrate.t (1 retry)
./tests/basic/stats-dump.t (1 retry)
./tests/bugs/shard/bug-1251824.t (1 retry)
./tests/basic/ec/ec-5-2.t (1 core)
(NEW) ./tests/basic/tier/tier-heald.t (1 core) (Looks similar to,
./tests/bugs/glusterd/remove-brick-testcases.t (run: lcov#432))

Sheet updated here:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=522127663

Gerrit comment here:
https://review.gluster.org/c/glusterfs/+/20637/12#message-186adbee76d6999385022239cb2daba589f0a81f

Shyam
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched

Re: [Gluster-devel] Master branch lock down status (Patch set 11, Aug 12, 2018)

2018-08-12 Thread Shyam Ranganathan
Patch set 11 report:

line coverage: 4/8 PASS, 7/8 with retries, 1 core
CentOS regression: 5/8 PASS, 8/8 PASS-With-RETRIES
Mux regression: 7/8 PASS, 1 core

No NEW failures, sheet [1] updated with run details, and so is the WIP
patch with the same data [2].

Cores:
- ./tests/bugs/glusterd/validating-server-quorum.t
- ./tests/basic/ec/ec-5-2.t

Other retries/failures:
- ./tests/bugs/shard/bug-shard-discard.t
- ./tests/basic/afr/replace-brick-self-heal.t
- ./tests/bugs/core/multiplex-limit-issue-151.t
- ./tests/00-geo-rep/georep-basic-dr-tarssh.t
- ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
- ./tests/bugs/shard/configure-lru-limit.t
- ./tests/bugs/glusterd/quorum-validation.t


[1] Sheet with failure and run data:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1434742898

[2] Gerrit comment with the same information:
https://review.gluster.org/c/glusterfs/+/20637/12#message-1f8f94aaa88be276229f20eb25a650381bc37543
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you se

Re: [Gluster-devel] Master branch lock down status (Sat. Aug 10th)

2018-08-11 Thread Shyam Ranganathan
Patch set 10: Run status

Line-coverage 4/7 PASS, 7/7 PASS-With-RETRY
Mux-regressions 4/55 PASS, 1 core
CentOS7 Regression 3/7 PASS, 7/7 PASS-With-RETRY

./tests/bugs/replicate/bug-1408712.t (2 fail/retry)
./tests/bugs/glusterd/quorum-validation.t (1 fail/retry)
./tests/bugs/core/multiplex-limit-issue-151.t (1 fail/retry)
./tests/bugs/shard/bug-shard-discard.t (1 fail/retry)
(NEW) ./tests/basic/afr/sparse-file-self-heal.t (1 fail/retry)
(NEW) ./tests/bugs/shard/bug-1251824.t (1 fail/retry)
(NEW) ./tests/bugs/shard/configure-lru-limit.t (1 fail/retry)
./tests/bugs/glusterd/validating-server-quorum.t (2 fail/retry)

Sheet [1] has run details and also comment on patch [2] has run details.

Atin/Shyam

[1] Sheet:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=552922579

[2] Comment on patch:
https://review.gluster.org/c/glusterfs/+/20637#message-2030bb77ed8d98618caded7b823bc4d65238e911
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list k

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-11 Thread Shyam Ranganathan
On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
> 
> 
> On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan wrote:
> 
> On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > Today's patch set 7 [1], included fixes provided till last evening IST,
> > and its runs can be seen here [2] (yay! we can link to comments in
> > gerrit now).
> > 
> > New failures: (added to the spreadsheet)
> > ./tests/bugs/quick-read/bug-846240.t
> 
> The above test fails always if there is a sleep of 10 added at line 36.
> 
> I tried to replicate this in my setup, and was able to do so 3/150 times
> and the failures were the same as the ones reported in the build logs
> (as below).
> 
> Not finding any clear reason for the failure, I delayed the test (i.e
> added a sleep 10) after the open on M0 to see if the race is uncovered,
> and it was.
> 
> Du, request you to take a look at the same, as the test is around
> quick-read but involves open-behind as well.
> 
> 
> Thanks for that information. I'll be working on this today.

Heads up Du, this failed again with the same pattern in run
https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull

> 
> 
> Failure snippet:
> 
> 23:41:24 [23:41:28] Running tests in file
> ./tests/bugs/quick-read/bug-846240.t
> 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> 23:41:28 1..17
> 23:41:28 ok 1, LINENUM:9
> 23:41:28 ok 2, LINENUM:10
> 
> 23:41:28 ok 13, LINENUM:40
> 23:41:28 not ok 14 , LINENUM:50
> 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
> 
> Shyam
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] tests/bugs/core/multiplex-limit-issue-151.t timed out

2018-08-11 Thread Shyam Ranganathan
On 08/11/2018 12:43 AM, Mohit Agrawal wrote:
> File a bug https://bugzilla.redhat.com/show_bug.cgi?id=1615003, I am not
> able to extract logs
> specific to this test case from the log dump.

This is because the test ends up calling cleanup twice through its bash exit traps.

- First, the test itself sets a trap to cleanup at
https://github.com/gluster/glusterfs/blob/master/tests/bugs/core/multiplex-limit-issue-151.t#L30

- There is an additional trap set to cleanup in include.rc,
https://github.com/gluster/glusterfs/blob/master/tests/include.rc#L719

The tarball is generated in the cleanup routine, which also ensures that
the tarball only contains logs generated between 2 invocations. Thus,
calling cleanup twice in succession will result in an empty tarball.

This can be seen by running the test locally as
`./tests/bugs/distribute/bug-1042725.t`.

There are a few things in that test that need clarification:
1. Why trap this:
https://github.com/gluster/glusterfs/blob/master/tests/bugs/core/multiplex-limit-issue-151.t#L29
2. Why trap cleanup, rather than invoke it at the end of the test, as is
the norm?

Also, in the merged patch sets 2/4/6/7/8 I had added a cleanup at the
end (as I traced the failure of ./tests/bugs/distribute/bug-1042725.t to
incorrect cleanup by the previous test (or timeout in cleanup)). I did
not do the same in patch set 9.

So, I will post a patch that removes the traps set by this test (so that
we get logs from it) and adds a manual cleanup at the end of the test.
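
In other words, the direction is roughly the following (a minimal sketch
with a stand-in cleanup function; this is not the actual test or
framework code):

#!/bin/bash
# Minimal sketch: call cleanup explicitly instead of adding another exit
# trap on top of the one include.rc already installs. 'cleanup' here is a
# stand-in for the framework routine that tears down state and archives
# logs generated since its previous invocation.
cleanup () {
    echo "cleaning up"
}

cleanup                 # start from a known state
# ... test body would go here ...
cleanup                 # explicit final cleanup; no extra 'trap cleanup EXIT'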

Finally, I do not see an infra bug in this.

(updated the bug as well)

> 
> 
> Thanks
> Mohit Agrawal
> 
> On Sat, Aug 11, 2018 at 9:27 AM, Atin Mukherjee wrote:
> 
> https://build.gluster.org/job/line-coverage/455/consoleFull
> 
> 
> 1 test failed:
> tests/bugs/core/multiplex-limit-issue-151.t (timed out)
> 
> The last job
> https://build.gluster.org/job/line-coverage/454/consoleFull
>  took
> only 21 secs, so we're not anyway near to breaching the threshold of
> the timeout secs. Possibly a hang?
> 
> 
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-11 Thread Shyam Ranganathan
On 08/10/2018 09:59 PM, Shyam Ranganathan wrote:
> Today's patch set is 9 [1].
> 
> Total of 7 runs for line-coverage, mux regressions, centos7 regressions
> are running (some are yet to complete).
> 
> Test failure summary is as follows,
Updating this section
1. ./tests/bugs/glusterd/validating-server-quorum.t (3 cores, 1 retry)
2. ./tests/bugs/core/multiplex-limit-issue-151.t (1 failure, 1 retry)
3.
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
(2 retries)
4. (NEW) ./tests/basic/afr/replace-brick-self-heal.t (1 retry)
5. ./tests/bugs/glusterd/quorum-validation.t (2 retires, 1 core)
6. (NEW) ./tests/bugs/replicate/bug-1408712.t (1 retry) (Ravi looking at it)
7. replace-brick-self-heal.t (1 retry)
8. ./tests/00-geo-rep/georep-basic-dr-rsync.t (1 retry)

> 
> Test output can be found at, [2] and [3]. [2] will be updated as runs
> that are still ongoing complete.

Above is also updated to find the runs where the tests fail.

> 
> Shyam
> [1] Patch set: https://review.gluster.org/c/glusterfs/+/20637/9
> [2] Sheet recording failures:
> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1535799585
> [3] Comment on patch set 9 recording runs till now:
> https://review.gluster.org/c/glusterfs/+/20637#message-07f3886dda133ed642438eb9e82b82d957668e86
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> Deserves a new beginning, threads on the other mail have gone deep enough.
>>
>> NOTE: (5) below needs your attention, rest is just process and data on
>> how to find failures.
>>
>> 1) We are running the tests using the patch [2].
>>
>> 2) Run details are extracted into a separate sheet in [3] named "Run
>> Failures" use a search to find a failing test and the corresponding run
>> that it failed in.
>>
>> 3) Patches that are fixing issues can be found here [1], if you think
>> you have a patch out there, that is not in this list, shout out.
>>
>> 4) If you own up a test case failure, update the spreadsheet [3] with
>> your name against the test, and also update other details as needed (as
>> comments, as edit rights to the sheet are restricted).
>>
>> 5) Current test failures
>> We still have the following tests failing and some without any RCA or
>> attention, (If something is incorrect, write back).
>>
>> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> attention)
>> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Atin)
>> ./tests/bugs/ec/bug-1236065.t (Ashish)
>> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> ./tests/basic/ec/ec-1468261.t (needs attention)
>> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>
>> Here are some newer failures, but mostly one-off failures except cores
>> in ec-5-2.t. All of the following need attention as these are new.
>>
>> ./tests/00-geo-rep/00-georep-verify-setup.t
>> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> ./tests/basic/stats-dump.t
>> ./tests/bugs/bug-1110262.t
>> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> ./tests/basic/ec/ec-data-heal.t
>> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> ./tests/basic/ec/ec-5-2.t
>>
>> 6) Tests that are addressed or are not occurring anymore are,
>>
>> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bitrot/bug-1373520.t
>> ./tests/bugs/distribute/bug-1117851.t
>> ./tests/bugs/glusterd/quorum-validation.t
>> ./tests/bugs/distribute/bug-1042725.t
>> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> ./tests/bugs/quota/bug-1293601.t
>> ./tests/bugs/bug-1368312.t
>> ./tests/bugs/distribute/bug-1122443.t
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>
>> Shyam (and Atin)
>>
>> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>>> Health on master as of the last nightly run [4] is still the same.
&

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Fri, August 9th)

2018-08-11 Thread Shyam Ranganathan
On 08/11/2018 02:09 AM, Atin Mukherjee wrote:
> I saw the same behaviour for
> https://build.gluster.org/job/regression-on-demand-full-run/47/consoleFull
> as well. In both the cases the common pattern is if a test was retried
> but overall the job succeeded. Is this a bug which got introduced
> recently? At the moment, this is blocking us to debug any tests which
> has been retried but the job overall succeeded.
> 
> *01:54:20* Archiving artifacts
> *01:54:21* ‘glusterfs-logs.tgz’ doesn’t match anything
> *01:54:21* No artifacts found that match the file pattern 
> "glusterfs-logs.tgz". Configuration error?
> *01:54:21* Finished: SUCCESS
> 
> I saw the same behaviour for 
> https://build.gluster.org/job/regression-on-demand-full-run/47/consoleFull as 
> well.

This has always been the behavior: the logs are archived only when
run-tests.sh calls out the run as failed. We do not call out a run as a
failure when tests pass on retry, hence no logs.

I will add this today to the WIP testing patchset.

> 
> 
> On Sat, Aug 11, 2018 at 9:40 AM Ravishankar N wrote:
> 
> 
> 
> On 08/11/2018 07:29 AM, Shyam Ranganathan wrote:
> > ./tests/bugs/replicate/bug-1408712.t (one retry)
> I'll take a look at this. But it looks like archiving the artifacts
> (logs) for this run
> 
> (https://build.gluster.org/job/regression-on-demand-full-run/44/consoleFull)
> 
> was a failure.
> Thanks,
> Ravi
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-10 Thread Shyam Ranganathan
Today's patch set is 9 [1].

A total of 7 runs across line-coverage, mux regression, and centos7
regression are running (some are yet to complete).

Test failure summary is as follows,
./tests/bugs/glusterd/validating-server-quorum.t (2 cores)
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
(2 retries)
./tests/bugs/replicate/bug-1408712.t (one retry)
./tests/bugs/core/multiplex-limit-issue-151.t (one retry)
./tests/bugs/quick-read/bug-846240.t (one retry)
./tests/00-geo-rep/georep-basic-dr-rsync.t (one retry)

Test output can be found at, [2] and [3]. [2] will be updated as runs
that are still ongoing complete.

Shyam
[1] Patch set: https://review.gluster.org/c/glusterfs/+/20637/9
[2] Sheet recording failures:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1535799585
[3] Comment on patch set 9 recording runs till now:
https://review.gluster.org/c/glusterfs/+/20637#message-07f3886dda133ed642438eb9e82b82d957668e86
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the

Re: [Gluster-devel] Master branch lock down status (Thu, August 09th)

2018-08-09 Thread Shyam Ranganathan
Today's test results are updated in the spreadsheet in sheet named "Run
patch set 8".

I took in patch https://review.gluster.org/c/glusterfs/+/20685, which
caused quite a few failures, so I am not recording the new failures as
issues yet.

Please look at the failures for tests that were retried and passed, as
the logs for the initial runs should be preserved from this run onward.

Otherwise there is nothing else to report on the run status; if you are
averse to spreadsheets, look at this comment in Gerrit [1].

Shyam

[1] Patch set 8 run status:
https://review.gluster.org/c/glusterfs/+/20637/8#message-54de30fa384fd02b0426d9db6d07fad4eeefcf08
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> https://revi

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-09 Thread Shyam Ranganathan
On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> Today's patch set 7 [1], included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
> 
> New failures: (added to the spreadsheet)
> ./tests/bugs/quick-read/bug-846240.t

The above test always fails if a sleep of 10 is added at line 36.

I tried to replicate this in my setup and was able to do so 3 out of 150
times; the failures were the same as the ones reported in the build logs
(snippet below).

Not finding any clear reason for the failure, I delayed the test (i.e.,
added a sleep 10) after the open on M0 to see if a race would be
uncovered, and it was.

Du, I request you to take a look at this, as the test is around
quick-read but involves open-behind as well.

Failure snippet:

23:41:24 [23:41:28] Running tests in file
./tests/bugs/quick-read/bug-846240.t
23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
23:41:28 1..17
23:41:28 ok 1, LINENUM:9
23:41:28 ok 2, LINENUM:10

23:41:28 ok 13, LINENUM:40
23:41:28 not ok 14 , LINENUM:50
23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
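
For anyone trying to reproduce this locally, a brute-force loop along
the following lines approximates the 3-out-of-150 reproduction mentioned
above (the iteration count and output redirection are illustrative):

# Illustrative reproduction loop: rerun the test many times and count
# the failures.
failures=0
for i in $(seq 1 150); do
    ./tests/bugs/quick-read/bug-846240.t >/dev/null 2>&1 || failures=$((failures + 1))
done
echo "failed ${failures}/150 runs"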

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-08 Thread Shyam Ranganathan
Today's patch set 7 [1] included fixes provided till last evening IST,
and its runs can be seen here [2] (yay! we can link to comments in
Gerrit now).

New failures: (added to the spreadsheet)
./tests/bugs/protocol/bug-808400-repl.t (core dumped)
./tests/bugs/quick-read/bug-846240.t

Older tests that had not recurred but failed today (moved up in the
spreadsheet):
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t

Other issues:
Test ./tests/basic/ec/ec-5-2.t dumped core again
A few geo-rep failures; Kotresh should have more logs to look at with
these runs
Test ./tests/bugs/glusterd/quorum-validation.t dumped core again

Atin/Amar, we may need to merge some of the patches that have proven to
hold up and fix issues today, so that we do not leave everything to the
last minute. Please check and move them along, or let me know.

Shyam

[1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
[2] Runs against patch set 7 and its status (incomplete as some runs
have not completed):
https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
(also updated in the spreadsheet)

On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>&g

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:43 AM, Shyam Ranganathan wrote:
> On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
>> For geo-rep test retrials, could you take this instrumentation patch [1]
>> and give it a run?
>> I have tried thrice on the patch, with brick mux enabled and without,
>> but couldn't hit the geo-rep failure. Maybe it is some race that does
>> not happen with the instrumentation patch.
>>
>> [1] https://review.gluster.org/20477
> 
> Will do in my refresh today, thanks.
> 

Kotresh, this run may have the additional logs that you are looking for,
as it is a failed run on one of the geo-rep test cases.

https://build.gluster.org/job/line-coverage/434/consoleFull
19:10:55, 1 test(s) failed
19:10:55, ./tests/00-geo-rep/georep-basic-dr-tarssh.t
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 04:56 AM, Nigel Babu wrote:
> Also, Shyam was saying that in case of retries, the old (failure) logs
> get overwritten by the retries which are successful. Can we disable
> re-trying the .ts when they fail just for this lock down period
> alone so
> that we do have the logs?
> 
> 
> Please don't apply a band-aid. Please fix run-tests.sh so that the second
> run has a -retry attached to the file name, or some such.

Posted patch https://review.gluster.org/c/glusterfs/+/20682, which
achieves this.

I do not like the fact that I use the gluster CLI in run-tests.sh;
alternatives are welcome.

If it looks functionally fine, then I will merge it into the big patch
[1] that we are using to run multiple tests (so that at least we start
getting retry logs from there).

Prior to this I had done this within include.rc and in cleanup, but that
gets invoked twice (at least) per test, and so generated far too many
empty tarballs for no reason.

Also, the change above does not prevent half-complete logs if a test
calls cleanup partway through (as that would create an intermediate
tarball that is overwritten by the last invocation of cleanup).
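
To make the discussion concrete, here is a rough sketch of the idea
(names and paths below are placeholders of mine, not the contents of the
posted patch; hard-coding the log directory is one possible alternative
to calling the gluster CLI):

#!/bin/bash
# Sketch only: tag the retry run's log tarball with a suffix so the first
# (failed) run's logs are preserved. Function and variable names here are
# hypothetical and do not mirror the actual run-tests.sh change.

collect_test_logs () {
    local test_name="$1"   # e.g. tests/bugs/ec/bug-1236065.t
    local suffix="$2"      # "" for the first run, "-retry" for the re-run
    local tarball="/archived-builds/$(basename "$test_name" .t)${suffix}.tgz"

    # Archive whatever the test left behind in the glusterfs log directory.
    tar -czf "$tarball" /var/log/glusterfs 2>/dev/null
}

run_test_with_retry () {
    local test_name="$1"
    if ! prove -v "$test_name"; then
        collect_test_logs "$test_name" ""        # keep the failing run's logs
        prove -v "$test_name"                    # one retry for spurious failures
        collect_test_logs "$test_name" "-retry"  # retry logs get a distinct name
    fi
}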

Shyam

[1] big patch: https://review.gluster.org/c/glusterfs/+/20637
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
> For geo-rep test retrials, could you take this instrumentation patch [1]
> and give it a run?
> I have tried thrice on the patch, with brick mux enabled and without,
> but couldn't hit the geo-rep failure. Maybe it is some race that does
> not happen with the instrumentation patch.
> 
> [1] https://review.gluster.org/20477

Will do in my refresh today, thanks.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> 5) Current test failures
> We still have the following tests failing, some without any RCA or
> attention (if something is incorrect, write back).
> 
> ./tests/bugs/ec/bug-1236065.t (Ashish)

Ashish/Atin, the above test failed in run:
https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull

The above run is based on patchset 4 of
https://review.gluster.org/#/c/20637/4

The logs look as below; as Ashish is unable to reproduce this, and all
failures are on line 78 with an outstanding heal count of 105, this run
may provide some leads for narrowing it down.

The problem seems to be glustershd not connecting to one of the bricks
that is restarted, and hence failing to heal that brick. This also looks
similar to what Ravi RCA'd for the test ./tests/bugs/replicate/bug-1363721.t

==
Test times from: cat ./glusterd.log | grep TEST
[2018-08-06 20:56:28.177386]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script
--wignore volume heal patchy full ++
[2018-08-06 20:56:28.767209]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count
patchy ++
[2018-08-06 20:57:48.957136]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o
13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o
++
==
Repeated connection failure to client-3 in glustershd.log:
[2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig]
0-patchy-client-3: changing port to 49152 (from 0)
[2018-08-06 20:56:30.222738] W [MSGID: 114043]
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed
to set the volume [Resource temporarily unavailable]
[2018-08-06 20:56:30.222788] W [MSGID: 114007]
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed
to get 'process-uuid' from reply dict [Invalid argument]
[2018-08-06 20:56:30.222813] E [MSGID: 114044]
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3:
SETVOLUME on remote-host failed: cleanup flag is set for xlator.  Try
again later [Resource tempor
arily unavailable]
[2018-08-06 20:56:30.222845] I [MSGID: 114051]
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3:
sending CHILD_CONNECTING event
[2018-08-06 20:56:30.222919] I [MSGID: 114018]
[client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from
patchy-client-3. Client process will keep trying to connect to glusterd
until brick's port is
 available
==
Repeated connection messages close to above retries in
d-backends-patchy0.log:
[2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict:
key 'trusted.ec.version' is would not be sent on wire in future [Invalid
argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and
 [2018-08-06 20:56:37.933084]
[2018-08-06 20:56:38.530067] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.540555] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.552494] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy4: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.571701] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.

Re: [Gluster-devel] Test: ./tests/bugs/distribute/bug-1042725.t

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/distribute/bug-1042725.t

The above test fails, I think, due to cleanup not completing after the
previous test's failure.

The failed runs are:
https://build.gluster.org/job/line-coverage/405/consoleFull
https://build.gluster.org/job/line-coverage/415/consoleFull

The logs are similar in both runs: test bug-1042725.t fails to start
glusterd after the previous test,
./tests/bugs/core/multiplex-limit-issue-151.t, has timed out.

I am thinking we also need to increase the cleanup time allowed for
timed-out tests from 5 seconds to 10 seconds to prevent these; thoughts?

This timer:
https://github.com/gluster/glusterfs/blob/master/run-tests.sh#L16
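
As a rough illustration only (variable names and structure below are my
assumptions about the shape of run-tests.sh, not a quote of the code at
the line referenced above), the knob in question is the TERM-to-KILL
grace window passed to timeout(1), during which a timed-out test's own
cleanup can still run:

# Sketch only: names are assumptions, not the actual run-tests.sh code.
run_timeout=200      # per-test timeout (the "timed out after 200 seconds" above)
kill_after_time=10   # proposed bump from 5: cleanup grace window after SIGTERM

run_one_test () {
    local t="$1"
    # A timed-out test first gets SIGTERM (so its cleanup/traps can still run)
    # and is only SIGKILLed after kill_after_time more seconds have passed.
    timeout -k "$kill_after_time" "$run_timeout" prove -v "$t"
    if [ $? -eq 124 ]; then
        echo "$t timed out after $run_timeout seconds"
    fi
}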

Logs look as follows:
16:24:48

16:24:48 [16:24:51] Running tests in file
./tests/bugs/core/multiplex-limit-issue-151.t
16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
200 seconds
16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t: bad status 124
16:28:08
16:28:08*
16:28:08*   REGRESSION FAILED   *
16:28:08* Retrying failed tests in case *
16:28:08* we got some spurious failures *
16:28:08*
16:28:08
16:31:28 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
200 seconds
16:31:28 End of test ./tests/bugs/core/multiplex-limit-issue-151.t
16:31:28

16:31:28
16:31:28
16:31:28

16:31:28 [16:31:31] Running tests in file
./tests/bugs/distribute/bug-1042725.t
16:32:35 ./tests/bugs/distribute/bug-1042725.t ..
16:32:35 1..16
16:32:35 Terminated
16:32:35 not ok 1 , LINENUM:9
16:32:35 FAILED COMMAND: glusterd
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Master branch lock down status

2018-08-07 Thread Shyam Ranganathan
Deserves a new beginning, threads on the other mail have gone deep enough.

NOTE: (5) below needs your attention, rest is just process and data on
how to find failures.

1) We are running the tests using the patch [2].

2) Run details are extracted into a separate sheet in [3] named "Run
Failures"; use a search to find a failing test and the corresponding run
that it failed in.

3) Patches that fix issues can be found here [1]; if you think you have
a patch out there that is not in this list, shout out.

4) If you take ownership of a test case failure, update the spreadsheet
[3] with your name against the test, and also update other details as
needed (as comments, since edit rights to the sheet are restricted).

5) Current test failures
We still have the following tests failing, some without any RCA or
attention (if something is incorrect, write back).

./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
attention)
./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
(Atin)
./tests/bugs/ec/bug-1236065.t (Ashish)
./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
./tests/basic/ec/ec-1468261.t (needs attention)
./tests/basic/afr/add-brick-self-heal.t (needs attention)
./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
./tests/bugs/glusterd/validating-server-quorum.t (Atin)
./tests/bugs/replicate/bug-1363721.t (Ravi)

Here are some newer failures, mostly one-off failures except for cores
in ec-5-2.t. All of the following need attention, as these are new.

./tests/00-geo-rep/00-georep-verify-setup.t
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
./tests/basic/stats-dump.t
./tests/bugs/bug-1110262.t
./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
./tests/basic/ec/ec-data-heal.t
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
./tests/basic/ec/ec-5-2.t

6) Tests that are addressed or are not occurring anymore are,

./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bitrot/bug-1373520.t
./tests/bugs/distribute/bug-1117851.t
./tests/bugs/glusterd/quorum-validation.t
./tests/bugs/distribute/bug-1042725.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bugs/quota/bug-1293601.t
./tests/bugs/bug-1368312.t
./tests/bugs/distribute/bug-1122443.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t

Shyam (and Atin)

On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> Health on master as of the last nightly run [4] is still the same.
> 
> Potential patches that rectify the situation (as in [1]) are bunched in
> a patch [2] that Atin and myself have put through several regressions
> (mux, normal and line coverage) and these have also not passed.
> 
> Till we rectify the situation we are locking down master branch commit
> rights to the following people, Amar, Atin, Shyam, Vijay.
> 
> The intention is to stabilize master and not add more patches that may
> destabilize it.
> 
> Test cases that are tracked as failures and need action are present here
> [3].
> 
> @Nigel, request you to apply the commit rights change as you see this
> mail and let the list know regarding the same as well.
> 
> Thanks,
> Shyam
> 
> [1] Patches that address regression failures:
> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
> 
> [2] Bunched up patch against which regressions were run:
> https://review.gluster.org/#/c/20637
> 
> [3] Failing tests list:
> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
> 
> [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 02:58 PM, Yaniv Kaul wrote:
> The intention is to stabilize master and not add more patches that may
> destabilize it.
> 
> 
> https://review.gluster.org/#/c/20603/ has been merged.
> As far as I can see, it has nothing to do with stabilization and should
> be reverted.

Posted this on the gerrit review as well:


4.1 does not have nightly tests; those run on master only.

Stability of master does not (and will not), in the near term, guarantee
stability of release branches, unless patches that impact code already
on release branches get fixed on master and are backported.

Release branches get fixes backported (as is normal); this fix and its
merge should not impact current master stability in any way, nor the
stability of the 4.1 branch.


The current hold is on master, not on release branches. I agree that
merging further code changes on release branches (for example, the
geo-rep backports in [1], as those tests fail regularly on master) may
further destabilize the release branch. This patch is not one of those.

Merging patches on release branches is done by release owners only, and
the usual practice is to keep the backlog low (merging weekly), as per
the dashboard [1].

Given the above two reasons, this patch was found to be,
- not on master
- neither stabilizing nor destabilizing the release branch
and hence was merged.

If maintainers disagree I can revert the same.

Shyam

[1] Release 4.1 dashboard:
https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:4-1-dashboard
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-05 Thread Shyam Ranganathan
On 07/31/2018 07:16 AM, Shyam Ranganathan wrote:
> On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
>> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>>> 1) master branch health checks (weekly, till branching)
>>>   - Expect every Monday a status update on various tests runs
>> See https://build.gluster.org/job/nightly-master/ for a report on
>> various nightly and periodic jobs on master.
> Thinking aloud, we may have to stop merges to master to get these test
> failures addressed at the earliest and to continue maintaining them
> GREEN for the health of the branch.
> 
> I would give the above a week, before we lockdown the branch to fix the
> failures.
> 
> Let's try to get line-coverage and nightly regression tests addressed
> this week (leaving mux-regression open); if they are addressed, we will
> not lock the branch down.
> 

Health on master as of the last nightly run [4] is still the same.

Potential patches that rectify the situation (as in [1]) are bunched in
a patch [2] that Atin and myself have put through several regressions
(mux, normal and line coverage) and these have also not passed.

Till we rectify the situation we are locking down master branch commit
rights to the following people, Amar, Atin, Shyam, Vijay.

The intention is to stabilize master and not add more patches that may
destabilize it.

Test cases that are tracked as failures and need action are present here
[3].

@Nigel, request you to apply the commit rights change as you see this
mail and let the list know regarding the same as well.

Thanks,
Shyam

[1] Patches that address regression failures:
https://review.gluster.org/#/q/starredby:srangana%2540redhat.com

[2] Bunched up patch against which regressions were run:
https://review.gluster.org/#/c/20637

[3] Failing tests list:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing

[4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Nightly test failures tracking

2018-08-02 Thread Shyam Ranganathan
On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> 1) master branch health checks (weekly, till branching)
>   - Expect every Monday a status update on various tests runs

As we have quite a few jobs and tests failing, I have created the sheet
in [1] to enable better tracking.

Atin and I will keep this updated. If anyone is working on a test case
failure, add your name as a comment to the "Owner" cell, and if there is
a bug filed, do the same to the BZ# cell.

Newer failures or additions will be added to the sheet and, in addition,
posted to this thread for contributors to pick up and analyze.

The current list of tests is as follows (some of which you are already
looking at),
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
./tests/00-geo-rep/georep-basic-dr-tarssh.t
./tests/bugs/bug-1368312.t
./tests/bugs/distribute/bug-1122443.t
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bitrot/bug-1373520.t
./tests/bugs/ec/bug-1236065.t
./tests/00-geo-rep/georep-basic-dr-rsync.t
./tests/basic/ec/ec-1468261.t
./tests/bugs/glusterd/quorum-validation.t
./tests/bugs/quota/bug-1293601.t
./tests/basic/afr/add-brick-self-heal.t
./tests/basic/afr/granular-esh/replace-brick.t
./tests/bugs/core/multiplex-limit-issue-151.t
./tests/bugs/distribute/bug-1042725.t
./tests/bugs/distribute/bug-1117851.t
./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t

Thanks.

[1] Test failures tracking:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-1432542-mpx-restart-crash.t failures

2018-08-02 Thread Shyam Ranganathan
On 08/01/2018 11:10 PM, Nigel Babu wrote:
> Hi Shyam,
> 
> Amar and I sat down to debug this failure[1] this morning. There was a
> bit of fun looking at the logs. It looked like the test restarted
> itself. The first log entry is at 16:20:03. This test has a timeout of
> 400 seconds which is around 16:26:43.
> 
> However, if you account for the fact that we log from the second step or
> so, it looks like the test timed out and we restarted it. The first log
> entry is from a few steps in, this makes sense. I think your patch[2] to
> increase the timeout to 800 seconds is the right way forward.
> 
> The last step before the timeout is this
> [2018-07-30 16:26:29.160943]  : volume stop patchy-vol17 : SUCCESS
> [2018-07-30 16:26:40.222688]  : volume delete patchy-vol17 : SUCCESS
> 
> There are 20 volumes, so it really needs at least a 90 second bump. I'm
> estimating 30 seconds per volume to clean up. You probably want to add
> some extra time so it passes on lcov as well. So right now the 800
> second cleanup looks good.

Unfortunately, the timeout bump still does not clear lcov; see:
https://build.gluster.org/job/line-coverage/401/console
https://build.gluster.org/job/line-coverage/400/console
https://build.gluster.org/job/line-coverage/406/console

The first test passes, then as a part of the full run it fails again.

The patch also pushes the EXPECT_WITHIN timeout up to 120 seconds... :(
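
For reference, EXPECT_WITHIN is the test-framework helper (from
tests/include.rc) that keeps re-running a check until its output matches
or the timeout expires. Below is a minimal fragment of the kind of check
whose timeout is being raised, shown only as an illustration (the volume
name and include paths depend on the actual .t file):

# Fragment, not a complete test.
. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

HEAL_TIMEOUT=120   # raised so slower (e.g. lcov-instrumented) runs still pass

# Re-check roughly once a second until get_pending_heal_count prints "0"
# or HEAL_TIMEOUT seconds elapse; only then is the step marked as failed.
EXPECT_WITHIN $HEAL_TIMEOUT "0" get_pending_heal_count $V0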

> 
> [1]: https://build.gluster.org/job/regression-test-burn-in/4051/
> [2]: https://review.gluster.org/#/c/20568/2
> -- 
> nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
Below is a summary of failures over the last 7 days on the nightly
health check jobs. This is one test per line, sorted in descending order
of occurrence (IOW, most frequent failure is on top).

The list includes spurious failures as well (IOW, tests that passed on a
retry). This is because, if we do not weed out the spurious errors,
failures may persist and make it difficult to gauge the health of the
branch.

The numbers at the end of each test line are the Jenkins job numbers
where it failed. The job number ranges are as follows,
- https://build.gluster.org/job/regression-test-burn-in/ ID: 4048 - 4053
- https://build.gluster.org/job/line-coverage/ ID: 392 - 407
- https://build.gluster.org/job/regression-test-with-multiplex/ ID: 811
- 817

So to get to job 4051 (say), use the link
https://build.gluster.org/job/regression-test-burn-in/4051

Atin has called out some folks for attention to specific tests; consider
this a call-out to everyone else as well: if you see a test against your
component, help with root-causing and fixing it is needed.

tests/bugs/core/bug-1432542-mpx-restart-crash.t, 4049, 4051, 4052, 405,
404, 403, 396, 392

tests/00-geo-rep/georep-basic-dr-tarssh.t, 811, 814, 817, 4050, 4053

tests/bugs/bug-1368312.t, 815, 816, 811, 813, 403

tests/bugs/distribute/bug-1122443.t, 4050, 407, 403, 815, 816

tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t,
814, 816, 817, 812, 815

tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t,
4049, 812, 814, 405, 392

tests/bitrot/bug-1373520.t, 811, 816, 817, 813

tests/bugs/ec/bug-1236065.t, 812, 813, 815

tests/00-geo-rep/georep-basic-dr-rsync.t, 813, 4046

tests/basic/ec/ec-1468261.t, 817, 812

tests/bugs/glusterd/quorum-validation.t, 4049, 407

tests/bugs/quota/bug-1293601.t, 811, 812

tests/basic/afr/add-brick-self-heal.t, 407

tests/basic/afr/granular-esh/replace-brick.t, 392

tests/bugs/core/multiplex-limit-issue-151.t, 405

tests/bugs/distribute/bug-1042725.t, 405

tests/bugs/distribute/bug-1117851.t, 405

tests/bugs/glusterd/rebalance-operations-in-single-node.t, 405

tests/bugs/index/bug-1559004-EMLINK-handling.t, 405

tests/bugs/replicate/bug-1386188-sbrain-fav-child.t, 4048

tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t, 813  


Thanks,
Shyam


On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various tests runs
> 
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.
> 
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
> 
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
> 
> 2. Regression with multiplex (cores and test failures)
> 
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
> 
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
> 
> GREEN:
> 1. cpp-check
> 2. RPM builds
> 
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
> 
> Shyam
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
On 07/31/2018 12:41 PM, Atin Mukherjee wrote:
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
> 400 secs. Refer
> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> Wasn't timing out as frequently as it was till 12 July. But since 27
> July, it has timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient (Mohit?)

The above test is the one that is causing line coverage to fail as well
(roughly 50% of the time).

I did put this patch up to increase timeouts and ran a few rounds of
tests, but the results are mixed: the test passes when run first, and
later errors out in other places (although it does not time out).

See: https://review.gluster.org/#/c/20568/2 for the changes and test run
details.

The failure of this test in regression-test-burn-in run #4051 is again
strange; it looks like the test completed within the stipulated time,
but restarted after cleanup_func was invoked.

Digging a little further, the way cleanup_func and traps are used in
this test seems *interesting*, and may need a closer look to arrive at
possible issues here.
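
To make the concern concrete, the pattern being questioned looks roughly
like the sketch below (simplified and hypothetical, not the actual test
code). A trap-registered cleanup that does heavy per-volume teardown runs
on top of the framework's own cleanup, and depending on when and how a
timed-out test gets killed, the resulting log output can be confusing to
read, e.g. appearing as if the test restarted:

# Hypothetical sketch of a trap-based cleanup in a .t test.
cleanup_func () {
    # Stopping and deleting 20 volumes can take minutes, all of it after
    # the last TEST line of the script has already run.
    for i in $(seq 1 20); do
        gluster --mode=script volume stop "patchy-vol$i" force
        gluster --mode=script volume delete "patchy-vol$i"
    done
}
trap cleanup_func EXIT   # registered to run when the script exits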

@Mohit, request you to take a look at the line coverage failures as
well, as you handle the failures in this test.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
On 08/01/2018 12:13 AM, Sankarshan Mukhopadhyay wrote:
>> Thinking aloud, we may have to stop merges to master to get these test
>> failures addressed at the earliest and to continue maintaining them
>> GREEN for the health of the branch.
>>
>> I would give the above a week, before we lockdown the branch to fix the
>> failures.
>>
> Is 1 week a sufficient estimate to address the issues?
> 

Branching is Aug 20th, so I would say an Aug 6th lockdown decision is
already a little late; also, once we get this going, it should be
possible to maintain branch health going forward. So taking a blocking
stance at this juncture is probably for the best.

Having said that, I am also proposing that we get the CentOS 7
regressions and lcov GREEN by this time, giving mux a week more to get
its stability in place. This is due to my belief that mux may take a bit
longer than the other two (IOW, addressing the sufficiency clause in the
concern raised above).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke test may fail for older changes, rebase needed

2018-08-01 Thread Shyam Ranganathan
On 07/31/2018 02:12 AM, Niels de Vos wrote:
> On Mon, Jul 30, 2018 at 02:44:57PM -0400, Shyam Ranganathan wrote:
>> On 07/28/2018 12:45 PM, Niels de Vos wrote:
>>> On Sat, Jul 28, 2018 at 03:37:46PM +0200, Niels de Vos wrote:
>>>> This Friday argp-standalone got installed on the FreeBSD Jenkins
>>>> slave(s). With the library available, we can now drop the bundled and
>>>> unmaintained contrib/argp-standlone/ from our glusterfs sources.
>>>>
>>>> Unfortunately building on FreeBSD fails if the header/library is
>>>> installed. This has been corrected with https://review.gluster.org/20581
>>>> but that means changes posted in Gerrit may need a rebase to include the
>>>> fix for building on FreeBSD.
>>>>
>>>> I think I have rebased all related changes that did not have negative
>>>> comments asking for corrections/improvement. In case I missed a change,
>>>> please rebase your patch so the smoke test runs again.
>>>>
>>>> Sorry for any inconvenience that this caused,
>>>> Niels
>>>
>>> It just occured to me that the argp-standalone installation also affects
>>> the release-4.1 and release-3.12 branches. Jiffin, Shyam, do you want to
>>> cherry-pick https://review.gluster.org/20581 to fix that, or do you
>>> prefer an alternative that always uses the bundled version of the
>>> library?
>>
>> The outcome is to get existing maintained release branches building and
>> working on FreeBSD, would that be correct?
> 
> 'working' in the way that they were earlier. I do not know of any
> (automated or manual) tests that verify the correct functioning. It is
> build tested only. I think.
> 
>> If so I think we can use the cherry-picked version, the changes seem
>> mostly straight forward, and it is possibly easier to maintain.
> 
> It is straight forward, but does add a new requirement on a library that
> should get installed on the system. This is not something that we
> normally allow during a stable release.
> 
>> Although, I have to ask, what is the downside of not taking it in at
>> all? If it is just FreeBSD, then can we live with the same till release-
>> is out?
> 
> Yes, it is 'just' FreeBSD build testing. Users should still be able to
> build the stable releases on FreeBSD as long as they do not install
> argp-standalone. In that case the bundled version will be used as the
> stable releases still have that in their tree.
> 
> If the patch does not get merged, it will cause the smoke tests on
> FreeBSD to fail. As Nigel mentions, it is possible to disable this test
> for the stable branches.
> 
> An alternative would be to fix the build process, and optionally use the
> bundled library in case it is not installed on the system. This is what
> we normally would have done, but it seems to have been broken in the
> case of FreeBSD + argp-standalone.

Based on the above reasoning, I would suggest that we do not backport
this to the release branches, disable the FreeBSD job on them, and, if
possible, enable it for the next release (5).

Objections?

> 
> Niels
> 
> 
>> Finally, thanks for checking as the patch is not a simple bug-fix backport.
>>
>>>
>>> Niels
>>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-07-31 Thread Shyam Ranganathan
On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various tests runs
> 
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.

Thinking aloud, we may have to stop merges to master to get these test
failures addressed at the earliest and to continue maintaining them
GREEN for the health of the branch.

I would give the above a week, before we lockdown the branch to fix the
failures.

Let's try to get line-coverage and nightly regression tests addressed
this week (leaving mux-regression open); if they are addressed, we will
not lock the branch down.

> 
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
> 
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
> 
> 2. Regression with multiplex (cores and test failures)
> 
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
> 
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
> 
> GREEN:
> 1. cpp-check
> 2. RPM builds
> 
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
> 
> Shyam
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-07-30 Thread Shyam Ranganathan
On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> 1) master branch health checks (weekly, till branching)
>   - Expect every Monday a status update on various tests runs

See https://build.gluster.org/job/nightly-master/ for a report on
various nightly and periodic jobs on master.

RED:
1. Nightly regression (3/6 failed)
- Tests that reported failure:
./tests/00-geo-rep/georep-basic-dr-rsync.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bugs/distribute/bug-1122443.t

- Tests that needed a retry:
./tests/00-geo-rep/georep-basic-dr-tarssh.t
./tests/bugs/glusterd/quorum-validation.t

2. Regression with multiplex (cores and test failures)

3. line-coverage (cores and test failures)
- Tests that failed:
./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
https://review.gluster.org/20568 does not fix the timeout entirely, as
can be seen in this run,
https://build.gluster.org/job/line-coverage/401/consoleFull )

Calling out to contributors to take a look at various failures, and post
the same as bugs AND to the lists (so that duplication is avoided) to
get this to a GREEN status.

GREEN:
1. cpp-check
2. RPM builds

IGNORE (for now):
1. clang scan (@nigel, this job requires clang warnings to be fixed to
go green, right?)

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke test may fail for older changes, rebase needed

2018-07-30 Thread Shyam Ranganathan
On 07/28/2018 12:45 PM, Niels de Vos wrote:
> On Sat, Jul 28, 2018 at 03:37:46PM +0200, Niels de Vos wrote:
>> This Friday argp-standalone got installed on the FreeBSD Jenkins
>> slave(s). With the library available, we can now drop the bundled and
>> unmaintained contrib/argp-standlone/ from our glusterfs sources.
>>
>> Unfortunately building on FreeBSD fails if the header/library is
>> installed. This has been corrected with https://review.gluster.org/20581
>> but that means changes posted in Gerrit may need a rebase to include the
>> fix for building on FreeBSD.
>>
>> I think I have rebased all related changes that did not have negative
>> comments asking for corrections/improvement. In case I missed a change,
>> please rebase your patch so the smoke test runs again.
>>
>> Sorry for any inconvenience that this caused,
>> Niels
> 
> It just occured to me that the argp-standalone installation also affects
> the release-4.1 and release-3.12 branches. Jiffin, Shyam, do you want to
> cherry-pick https://review.gluster.org/20581 to fix that, or do you
> prefer an alternative that always uses the bundled version of the
> library?

The outcome is to get existing maintained release branches building and
working on FreeBSD; would that be correct?

If so, I think we can use the cherry-picked version; the changes seem
mostly straightforward, and it is possibly easier to maintain.

Although, I have to ask, what is the downside of not taking it in at
all? If it is just FreeBSD, then can we live with the same till release-
is out?

Finally, thanks for checking as the patch is not a simple bug-fix backport.

> 
> Niels
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 23rd July)

2018-07-27 Thread Shyam Ranganathan
On 07/26/2018 12:53 AM, Nigel Babu wrote:
> 3) bug-1432542-mpx-restart-crash.t times out consistently:
> https://bugzilla.redhat.com/show_bug.cgi?id=1608568
> 
> @nigel is there a way to on-demand request lcov tests through gerrit? I
> am thinking of pushing a patch that increases the timeout and check if
> it solves the problem for this test as detailed in the bug.
> 
> 
> You should have access to trigger the job from Jenkins. Does that work
> for now?

Thanks Nigel.

After fixing up the Jenkins job to run against a pending commit in
Gerrit and tweaking one more timeout value, this test has passed in the
lcov run (see [1]; the run is still in progress, but the previously
failing test has already passed).

@Mohit/@Sanju, this is a mux test, and increasing timeouts seems to do
the trick, but I am not quite happy with the situation. Can you take a
look and see where the (extra) time is being spent, and why?

The other test has also passed in the nightly regressions, after the fix
in sdfs. So with this we should get back to GREEN on the line-coverage
nightly runs.

[1] line-coverage test run: https://build.gluster.org/job/line-coverage/401
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 23rd July)

2018-07-25 Thread Shyam Ranganathan
On 07/25/2018 04:18 PM, Shyam Ranganathan wrote:
> 2) glusterd crash in test sdfs-sanity.t:
> https://bugzilla.redhat.com/show_bug.cgi?id=1608566
> 
> glusterd folks, request you to take a look to correct this.

I persisted with this a little longer, and the fix is posted at
https://review.gluster.org/#/c/20565/ (reviews welcome).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

