Re: [Gluster-devel] Release 6.2: Expected tagging on May 15th

2019-05-17 Thread Shyam Ranganathan
These patches were dependent on each other, so a merge was not required
to get regression passing; that analysis seems incorrect.

When a patch series is tested, all dependent patches are pulled in
anyway, so please take another look at what the failure could be. (I
assume you would anyway.)

Shyam
On 5/17/19 3:46 AM, Hari Gowtham wrote:
> https://review.gluster.org/#/q/topic:%22ref-1709738%22+(status:open%20OR%20status:merged)
> 
> On Fri, May 17, 2019 at 1:13 PM Amar Tumballi Suryanarayan
>  wrote:
>>
>> Which are the patches? I can merge it for now.
>>
>> -Amar
>>
>> On Fri, May 17, 2019 at 1:10 PM Hari Gowtham  wrote:
>>>
>>> Thanks Sunny.
>>> Have CCed Shyam.
>>>
>>> On Fri, May 17, 2019 at 1:06 PM Sunny Kumar  wrote:
>>>>
>>>> Hi Hari,
>>>>
>>>> For this to pass regression, the other 3 patches need to be merged first. I
>>>> tried to merge them but do not have sufficient permissions to merge on the 6.2
>>>> branch.
>>>> I know a bug is already filed to grant additional permissions for
>>>> us (me, you and Rinku), so until then we are waiting on Shyam to merge it.
>>>>
>>>> -Sunny
>>>>
>>>> On Fri, May 17, 2019 at 12:54 PM Hari Gowtham  wrote:
>>>>>
>>>>> Hi Kotresh and Sunny,
>>>>> The patch has been failing regression a few times.
>>>>> We need to look into why this is happening and decide whether
>>>>> to take it into release 6.2 or drop it.
>>>>>
>>>>> On Wed, May 15, 2019 at 4:27 PM Hari Gowtham  wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The following patch is waiting for centos regression.
>>>>>> https://review.gluster.org/#/c/glusterfs/+/22725/
>>>>>>
>>>>>> Sunny or Kotresh, please do take a look so that we can go ahead with
>>>>>> the tagging.
>>>>>>
>>>>>> On Thu, May 9, 2019 at 4:45 PM Hari Gowtham  wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Expected tagging date for release-6.2 is May 15th, 2019.
>>>>>>>
>>>>>>> Please ensure required patches are backported and also are passing
>>>>>>> regressions and are appropriately reviewed for easy merging and tagging
>>>>>>> on the date.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Hari Gowtham.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Hari Gowtham.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Hari Gowtham.
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Hari Gowtham.
>>> ___
>>>
>>> Community Meeting Calendar:
>>>
>>> APAC Schedule -
>>> Every 2nd and 4th Tuesday at 11:30 AM IST
>>> Bridge: https://bluejeans.com/836554017
>>>
>>> NA/EMEA Schedule -
>>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>>> Bridge: https://bluejeans.com/486278655
>>>
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>
>>
>> --
>> Amar Tumballi (amarts)
> 
> 
> 
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Announcing Gluster release 6.1

2019-04-22 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
6.1 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:

None

Thanks,
Gluster community

[1] Packages for 6.1:
https://download.gluster.org/pub/gluster/glusterfs/6/6.1/

[2] Release notes for 6.1:
https://docs.gluster.org/en/latest/release-notes/6.1/

___
maintainers mailing list
maintain...@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6.1: Tagged!

2019-04-17 Thread Shyam Ranganathan
Release 6.1 is now tagged and being packaged. If anyone gets a chance, please test
the packages from CentOS SIG, as I am unavailable for the next 4 days.

Thanks,
Shyam
On 4/16/19 9:53 AM, Shyam Ranganathan wrote:
> Status: Tagging pending
> 
> Waiting on patches:
> (Kotresh/Atin) - glusterd: fix loading ctime in client graph logic
>   https://review.gluster.org/c/glusterfs/+/22579
> 
> Following patches will not be taken in if CentOS regression does not
> pass by tomorrow morning Eastern TZ,
> (Pranith/KingLongMee) - cluster-syncop: avoid duplicate unlock of
> inodelk/entrylk
>   https://review.gluster.org/c/glusterfs/+/22385
> (Aravinda) - geo-rep: IPv6 support
>   https://review.gluster.org/c/glusterfs/+/22488
> (Aravinda) - geo-rep: fix integer config validation
>   https://review.gluster.org/c/glusterfs/+/22489
> 
> Tracker bug status:
> (Ravi) - Bug 1693155 - Excessive AFR messages from gluster showing in
> RHGSWA.
>   All patches are merged, but none of the patches adds the "Fixes"
> keyword, assume this is an oversight and that the bug is fixed in this
> release.
> 
> (Atin) - Bug 1698131 - multiple glusterfsd processes being launched for
> the same brick, causing transport endpoint not connected
>   No work has occurred since the logs were uploaded to the bug; a restart of bricks and
> possibly glusterd is the existing workaround when the bug is hit. Moving
> this out of the tracker for 6.1.
> 
> (Xavi) - Bug 1699917 - I/O error on writes to a disperse volume when
> replace-brick is executed
>   Very recent bug (15th April), does not seem to have any critical data
> corruption or service availability issues, planning on not waiting for
> the fix in 6.1
> 
> - Shyam
> On 4/6/19 4:38 AM, Atin Mukherjee wrote:
>> Hi Mohit,
>>
>> https://review.gluster.org/22495 should get into 6.1 as it’s a
>> regression. Can you please attach the respective bug to the tracker Ravi
>> pointed out?
>>
>>
>> On Sat, 6 Apr 2019 at 12:00, Ravishankar N <ravishan...@redhat.com> wrote:
>>
>> Tracker bug is https://bugzilla.redhat.com/show_bug.cgi?id=1692394, in
>> case anyone wants to add blocker bugs.
>>
>>
>> On 05/04/19 8:03 PM, Shyam Ranganathan wrote:
>> > Hi,
>> >
>> > Expected tagging date for release-6.1 is April 10th, 2019.
>> >
>> > Please ensure required patches are backported and also are passing
>> > regressions and are appropriately reviewed for easy merging and
>> tagging
>> > on the date.
>> >
>> > Thanks,
>> > Shyam
>> > ___
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
>> > https://lists.gluster.org/mailman/listinfo/gluster-devel
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>> -- 
>> - Atin (atinm)
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6.1: Expected tagging on April 10th

2019-04-16 Thread Shyam Ranganathan
Status: Tagging pending

Waiting on patches:
(Kotresh/Atin) - glusterd: fix loading ctime in client graph logic
  https://review.gluster.org/c/glusterfs/+/22579

Following patches will not be taken in if CentOS regression does not
pass by tomorrow morning Eastern TZ,
(Pranith/KingLongMee) - cluster-syncop: avoid duplicate unlock of
inodelk/entrylk
  https://review.gluster.org/c/glusterfs/+/22385
(Aravinda) - geo-rep: IPv6 support
  https://review.gluster.org/c/glusterfs/+/22488
(Aravinda) - geo-rep: fix integer config validation
  https://review.gluster.org/c/glusterfs/+/22489

Tracker bug status:
(Ravi) - Bug 1693155 - Excessive AFR messages from gluster showing in
RHGSWA.
  All patches are merged, but none of the patches adds the "Fixes"
keyword, assume this is an oversight and that the bug is fixed in this
release.

(Atin) - Bug 1698131 - multiple glusterfsd processes being launched for
the same brick, causing transport endpoint not connected
  No work has occurred since the logs were uploaded to the bug; a restart of bricks and
possibly glusterd is the existing workaround when the bug is hit. Moving
this out of the tracker for 6.1.

(Xavi) - Bug 1699917 - I/O error on writes to a disperse volume when
replace-brick is executed
  Very recent bug (15th April), does not seem to have any critical data
corruption or service availability issues, planning on not waiting for
the fix in 6.1

- Shyam
On 4/6/19 4:38 AM, Atin Mukherjee wrote:
> Hi Mohit,
> 
> https://review.gluster.org/22495 should get into 6.1 as it’s a
> regression. Can you please attach the respective bug to the tracker Ravi
> pointed out?
> 
> 
> On Sat, 6 Apr 2019 at 12:00, Ravishankar N <ravishan...@redhat.com> wrote:
> 
> Tracker bug is https://bugzilla.redhat.com/show_bug.cgi?id=1692394, in
> case anyone wants to add blocker bugs.
> 
> 
> On 05/04/19 8:03 PM, Shyam Ranganathan wrote:
> > Hi,
> >
> > Expected tagging date for release-6.1 is April 10th, 2019.
> >
> > Please ensure required patches are backported and also are passing
> > regressions and are appropriately reviewed for easy merging and
> tagging
> > on the date.
> >
> > Thanks,
> > Shyam
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> -- 
> - Atin (atinm)
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Release 6.1: Expected tagging on April 10th

2019-04-05 Thread Shyam Ranganathan
Hi,

Expected tagging date for release-6.1 is April 10th, 2019.

Please ensure required patches are backported and also are passing
regressions and are appropriately reviewed for easy merging and tagging
on the date.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster release 4.1.8

2019-04-05 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
4.1.8 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:

None

Thanks,
Gluster community

[1] Packages for 4.1.8:
https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.8/

[2] Release notes for 4.1.8:
https://docs.gluster.org/en/latest/release-notes/4.1.8/



___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster Release 6

2019-03-25 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of 6.0, our
latest release.

This is a major release that includes a range of code improvements and
stability fixes along with a few features as noted below.

A selection of the key features and bugs addressed are documented in
this [1] page.

Announcements:

1. Releases that receive maintenance updates post release 6 are 4.1 and
5 [2]

2. Release 6 will receive maintenance updates around the 10th of every
month for the first 3 months post release (i.e. Apr'19, May'19, Jun'19).
Post the initial 3 months, it will receive maintenance updates every 2
months till EOL. [3]

A series of features/xlators have been deprecated in release 6, as
listed below. For upgrade procedures from volumes that use these
features to release 6, refer to the release 6 upgrade guide [4].

Features deprecated:
- Block device (bd) xlator
- Decompounder feature
- Crypt xlator
- Symlink-cache xlator
- Stripe feature
- Tiering support (tier xlator and changetimerecorder)

Highlights of this release are:
- Several stability fixes addressing coverity, clang-scan, address
sanitizer and valgrind reported issues
- Removal of unused and hence deprecated code and features
- Client side inode garbage collection
- This release addresses one of the major concerns regarding FUSE mount
process memory footprint, by introducing client side inode garbage
collection
- Performance Improvements
- "--auto-invalidation" on FUSE mounts to leverage kernel page cache
more effectively

Bugs addressed are provided towards the end, in the release notes [1]

Thank you,
Gluster community

References:
[1] Release notes: https://docs.gluster.org/en/latest/release-notes/6.0/

[2] Release schedule: https://www.gluster.org/release-schedule/

[3] Gluster release cadence and version changes:
https://lists.gluster.org/pipermail/announce/2018-July/000103.html

[4] Upgrade guide to release-6:
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster version EOL date

2019-03-22 Thread Shyam Ranganathan
As per the release schedule page [1], release 5 will be EOL when release 8 is out.
Releases are 4 months apart, hence release 5 would be EOL'd 12 months
after it was released.

Major releases receive minor updates (5.x), which are bug and
stability fixes. These do not extend the 12-month cycle for the release.

Shyam

[1] Release schedule: https://www.gluster.org/release-schedule/

On 3/22/19 6:45 AM, ABHISHEK PALIWAL wrote:
> Hi,
> 
> As per the gluster community, it seems the latest version is 5.5. Could anyone
> tell me what the EOL date for version 5.5 would be? Is it 12 months after
> the release date, or something else?
> 
> -- 
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Tagged and ready for packaging

2019-03-19 Thread Shyam Ranganathan
Hi,

RC1 testing is complete and blockers have been addressed. The release is
now tagged for a final round of packaging and package testing before
release.

Thanks for testing out the RC builds and reporting issues that needed to
be addressed.

As packaging and final package testing are finishing up, we will be
writing the upgrade guide for the release as well, before announcing the
release for general consumption.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-05 Thread Shyam Ranganathan
On 3/4/19 12:33 PM, Shyam Ranganathan wrote:
> On 3/4/19 10:08 AM, Atin Mukherjee wrote:
>>
>>
>> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan
>> <atumb...@redhat.com> wrote:
>>
>> Thanks to those who participated.
>>
>> Update at present:
>>
>> We found 3 blocker bugs in upgrade scenarios, and hence have marked
>> release
>> as pending upon them. We will keep these lists updated about progress.
>>
>>
>> I’d like to clarify that upgrade testing is blocked. So just fixing
>> these test blocker(s) isn’t enough to call release-6 green. We need to
>> continue and finish the rest of the upgrade tests once the respective
>> bugs are fixed.
> 
> Based on fixes expected by tomorrow for the upgrade fixes, we will build
> an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ).
> This RC can be used for further testing.

There have been no backports for the upgrade failures; I request folks
working on them to post a list of bugs that need to be fixed, to
enable tracking them. (Also, ensure they are marked against the
release-6 tracker [1].)

Also, we need to start writing the upgrade guide for release-6; any
volunteers for the same?

Thanks,
Shyam

[1] Release-6 tracker bug:
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-04 Thread Shyam Ranganathan
On 3/4/19 10:08 AM, Atin Mukherjee wrote:
> 
> 
> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan
> <atumb...@redhat.com> wrote:
> 
> Thanks to those who participated.
> 
> Update at present:
> 
> We found 3 blocker bugs in upgrade scenarios, and hence have marked
> release
> as pending upon them. We will keep these lists updated about progress.
> 
> 
> I’d like to clarify that upgrade testing is blocked. So just fixing
> these test blocker(s) isn’t enough to call release-6 green. We need to
> continue and finish the rest of the upgrade tests once the respective
> bugs are fixed.

Based on fixes expected by tomorrow for the upgrade fixes, we will build
an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ).
This RC can be used for further testing.

> 
> 
> 
> -Amar
> 
> On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
> 
> > Hi all,
> >
> > We are calling out our users, and developers to contribute in
> validating
> > ‘glusterfs-6.0rc’ build in their usecase. Specially for the cases of
> > upgrade, stability, and performance.
> >
> > Some of the key highlights of the release are listed in release-notes
> > draft
> >
> 
> .
> > Please note that there are some of the features which are being
> dropped out
> > of this release, and hence making sure your setup is not going to
> have an
> > issue is critical. Also the default lru-limit option in fuse mount for
> > Inodes should help to control the memory usage of client
> processes. All the
> > good reason to give it a shot in your test setup.
> >
> > If you are developer using gfapi interface to integrate with other
> > projects, you also have some signature changes, so please make
> sure your
> > project would work with latest release. Or even if you are using a
> project
> > which depends on gfapi, report the error with new RPMs (if any).
> We will
> > help fix it.
> >
> > As part of test days, we want to focus on testing the latest upcoming
> > release i.e. GlusterFS-6, and one or the other gluster volunteers
> would be
> > there on #gluster channel on freenode to assist the people. Some
> of the key
> > things we are looking as bug reports are:
> >
> >    -
> >
> >    See if upgrade from your current version to 6.0rc is smooth,
> and works
> >    as documented.
> >    - Report bugs in process, or in documentation if you find mismatch.
> >    -
> >
> >    Functionality is all as expected for your usecase.
> >    - No issues with actual application you would run on production
> etc.
> >    -
> >
> >    Performance has not degraded in your usecase.
> >    - While we have added some performance options to the code, not
> all of
> >       them are turned on, as they have to be done based on usecases.
> >       - Make sure the default setup is at least same as your current
> >       version
> >       - Try out few options mentioned in release notes (especially,
> >       --auto-invalidation=no) and see if it helps performance.
> >    -
> >
> >    While doing all the above, check below:
> >    - see if the log files are making sense, and not flooding with some
> >       “for developer only” type of messages.
> >       - get ‘profile info’ output from old and now, and see if
> there is
> >       anything which is out of normal expectation. Check with us
> on the numbers.
> >       - get a ‘statedump’ when there are some issues. Try to make
> sense
> >       of it, and raise a bug if you don’t understand it completely.
> >
> >
> > Process expected on test days.
> >
> >    -
> >
> >    We have a tracker bug [0]
> >    - We will attach all the ‘blocker’ bugs to this bug.
> >    -
> >
> >    Use this link to report bugs, so that we have more metadata around
> >    given bugzilla.
> >    - Click Here [1]
> >    -
> >
> >    The test cases which are to be tested are listed here in this sheet [2],
> >    please add, update, and keep it up-to-date to reduce duplicate
> >    efforts
> 
> -- 
> - Atin (atinm)
> 
> ___
> Gluster-devel mailing list

Re: [Gluster-devel] [Gluster-Maintainers] glusterfs-6.0rc0 released

2019-02-25 Thread Shyam Ranganathan
Hi,

Release-6 RC0 packages are built (see mail below). This is a good time
to start testing the release bits, and reporting any issues on bugzilla.
Do post on the lists any testing done and feedback from the same.

We have about 2 weeks to GA of release-6 barring any major blockers
uncovered during the test phase. Please take this time to help make the
release effective, by testing the same.

Thanks,
Shyam

NOTE: CentOS StorageSIG packages for the same are still pending and
should be available in due course.
On 2/23/19 9:41 AM, Kaleb Keithley wrote:
> 
> GlusterFS 6.0rc0 is built in Fedora 30 and Fedora 31/rawhide.
> 
> Packages for Fedora 29, RHEL 8, RHEL 7, and RHEL 6* and Debian 9/stretch
> and Debian 10/buster are at
> https://download.gluster.org/pub/gluster/glusterfs/qa-releases/6.0rc0/
> 
> Packages are signed. The public key is at
> https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> 
> * RHEL 6 is client-side only. Fedora 29, RHEL 7, and RHEL 6 RPMs are
> Fedora Koji scratch builds. RHEL 7 and RHEL 6 RPMs are provided here for
> convenience only, and are independent of the RPMs in the CentOS Storage SIG.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Branched and next steps

2019-02-20 Thread Shyam Ranganathan
On 2/20/19 7:45 AM, Amar Tumballi Suryanarayan wrote:
> 
> 
> On Tue, Feb 19, 2019 at 1:37 AM Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> In preparation for RC0 I have put up an initial patch for the release
> notes [1]. Request the following actions on the same (either a followup
> patchset, or a dependent one),
> 
> - Please review!
> - Required GD2 section updated to latest GD2 status
> 
> 
> I am inclined to drop the GD2 section for 'standalone' users. As the
> team worked with goals of making GD2 invisible with containers (GCS) in
> mind. So, should we call out any features of GD2 at all?

This is fine; we possibly need to add a note in the release notes on
the GD2 future and where it would land, so that we can inform users
about the continued use of GD1 in non-GCS use cases.

I will add some text around this in the release notes.

> 
> Anyways, as per my previous email on GCS release updates, we are
> planning to have a container available with gd2 and glusterfs, which can
> be used by people who are trying out options with GD2.
>  
> 
> - Require notes on "Reduce the number of threads used in the brick
> process" and the actual status of the same in the notes
> 
> 
> This work is still in progress, and we are treating it as a bug fix for
> the 'brick-multiplex' usecase, which is mainly required for the scaled
> volume-count usecase in the container world. My guess is, we won't have much
> content to add for glusterfs-6.0 at the moment.

Ack!

>  
> 
> RC0 build target would be tomorrow or by Wednesday.
> 
> 
> Thanks, I was testing a few upgrade and different-version cluster
> scenarios. With 4.1.6 and the latest release-6.0 branch, things work fine. I
> haven't done much load testing yet.

Awesome! That helps with writing the upgrade guide as well, as this time
the content there would/should cover how to upgrade if any of the
deprecated xlators are in use by a deployment.

> 
> Requesting people to support in upgrade testing. From different volume
> options, and different usecase scenarios.
> 
> Regards,
> Amar
> 
>  
> 
> Thanks,
> Shyam
> 
> [1] Release notes patch: https://review.gluster.org/c/glusterfs/+/6
> 
> On 2/5/19 8:25 PM, Shyam Ranganathan wrote:
> > Hi,
> >
> > Release 6 is branched, and tracker bug for 6.0 is created [1].
> >
> > Do mark blockers for the release against [1].
> >
> > As of now we are only tracking [2] "core: implement a global
> thread pool
> > " for a backport as a feature into the release.
> >
> > We expect to create RC0 tag and builds for upgrade and other testing
> > close to mid-week next week (around 13th Feb), and the release is
> slated
> > for the first week of March for GA.
> >
> > I will post updates to this thread around release notes and other
> > related activity.
> >
> > Thanks,
> > Shyam
> >
> > [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
> >
> > [2] Patches tracked for a backport:
> >   - https://review.gluster.org/c/glusterfs/+/20636
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> 
> -- 
> Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6: Branched and next steps

2019-02-18 Thread Shyam Ranganathan
In preparation for RC0 I have put up an initial patch for the release
notes [1]. Request the following actions on the same (either a followup
patchset, or a dependent one),

- Please review!
- Required GD2 section updated to latest GD2 status
- Require notes on "Reduce the number of threads used in the brick
process" and the actual status of the same in the notes

RC0 build target would be tomorrow or by Wednesday.

Thanks,
Shyam

[1] Release notes patch: https://review.gluster.org/c/glusterfs/+/6

On 2/5/19 8:25 PM, Shyam Ranganathan wrote:
> Hi,
> 
> Release 6 is branched, and tracker bug for 6.0 is created [1].
> 
> Do mark blockers for the release against [1].
> 
> As of now we are only tracking [2] "core: implement a global thread pool
> " for a backport as a feature into the release.
> 
> We expect to create RC0 tag and builds for upgrade and other testing
> close to mid-week next week (around 13th Feb), and the release is slated
> for the first week of March for GA.
> 
> I will post updates to this thread around release notes and other
> related activity.
> 
> Thanks,
> Shyam
> 
> [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0
> 
> [2] Patches tracked for a backport:
>   - https://review.gluster.org/c/glusterfs/+/20636
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Branched and next steps

2019-02-05 Thread Shyam Ranganathan
Hi,

Release 6 is branched, and tracker bug for 6.0 is created [1].

Do mark blockers for the release against [1].

As of now we are only tracking [2] "core: implement a global thread pool
" for a backport as a feature into the release.

We expect to create RC0 tag and builds for upgrade and other testing
close to mid-week next week (around 13th Feb), and the release is slated
for the first week of March for GA.

I will post updates to this thread around release notes and other
related activity.

Thanks,
Shyam

[1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0

[2] Patches tracked for a backport:
  - https://review.gluster.org/c/glusterfs/+/20636
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-24 Thread Shyam Ranganathan
On 1/24/19 3:23 AM, Soumya Koduri wrote:
> Hi Shyam,
> 
> Sorry for the late response. I just realized that two more new
> APIs, glfs_setattr/fsetattr, which use 'struct stat', were made public [1]. As
> mentioned in one of the patchset review comments, since the goal is to
> move to glfs_stat in release-6, do we need to update these APIs as well
> to use the new struct? Or shall we retain them in FUTURE for now and
> address them in the next minor release? Please suggest.

So the goal in 6 is to not return stat but glfs_stat in the modified
pre/post stat return APIs (instead of making this a 2-step for
application consumers).

To reach glfs_stat everywhere, we have a few more things to do. I had
this patch on my radar, but just like pub_glfs_stat returns stat (hence
we made glfs_statx private), I am seeing this as "fine for now". In
the future we only want to return glfs_stat.

So for now, we let this API be. The next round of converting stat to
glfs_stat would take into account clearing up all such instances, so
that all application consumers will need to modify code as required in
one shot.

Does this answer the concern? And thanks for bringing this to notice.
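
For illustration, a minimal libgfapi sketch of the behaviour being kept
for now: glfs_stat() continues to fill a plain struct stat, and the
wholesale move of such APIs to the new glfs_stat structure is deferred
to a later, single-shot change. The volume name, server and path below
are placeholders, not taken from this thread.

#include <stdio.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    /* Placeholders: volume "testvol" served from host "server1". */
    glfs_t *fs = glfs_new("testvol");
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "server1", 24007);
    if (glfs_init(fs) != 0) {
        glfs_fini(fs);
        return 1;
    }

    struct stat st; /* today's public out-arg type, as noted above */
    if (glfs_stat(fs, "/somefile", &st) == 0)
        printf("size: %lld\n", (long long)st.st_size);

    glfs_fini(fs);
    return 0;
}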

> 
> Thanks,
> Soumya
> 
> [1] https://review.gluster.org/#/c/glusterfs/+/21734/
> 
> 
> On 1/23/19 8:43 PM, Shyam Ranganathan wrote:
>> On 1/23/19 6:03 AM, Ashish Pandey wrote:
>>>
>>> Following is the patch I am working and targeting -
>>> https://review.gluster.org/#/c/glusterfs/+/21933/
>>
>> This is a bug fix, and the patch size at the moment is also small in
>> lines changed. Hence, even if it misses branching the fix can be
>> backported.
>>
>> Thanks for the heads up!
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-23 Thread Shyam Ranganathan
On 1/23/19 6:03 AM, Ashish Pandey wrote:
> 
> Following is the patch I am working and targeting - 
> https://review.gluster.org/#/c/glusterfs/+/21933/

This is a bug fix, and the patch size at the moment is also small in
lines changed. Hence, even if it misses branching the fix can be backported.

Thanks for the heads up!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 6: Kick off!

2019-01-23 Thread Shyam Ranganathan
On 1/23/19 5:52 AM, RAFI KC wrote:
> There are three patches that I'm working for Gluster-6.
> 
> [1] : https://review.gluster.org/#/c/glusterfs/+/22075/

We discussed mux for shd in the maintainers meeting, and decided that
this would be for the next release, as the patchset is not ready
(branching is today, if I get the time to get it done).

> 
> [2] : https://review.gluster.org/#/c/glusterfs/+/21333/

Ack! in case this is not in by branching we can backport the same

> 
> [3] : https://review.gluster.org/#/c/glusterfs/+/21720/

Bug fix, can be backported post branching as well, so again ack!

Thanks for responding.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2019-01-18 Thread Shyam Ranganathan
On 12/6/18 9:34 AM, Shyam Ranganathan wrote:
> On 11/6/18 11:34 AM, Shyam Ranganathan wrote:
>> ## Schedule
> 
> We have decided to postpone release-6 by a month, to accommodate
> late enhancements and the drive towards getting what is required for the
> GCS project [1] done in core glusterfs.
> 
> This puts the (modified) schedule for Release-6 as below,
> 
> Working backwards on the schedule, here's what we have:
> - Announcement: Week of Mar 4th, 2019
> - GA tagging: Mar-01-2019
> - RC1: On demand before GA
> - RC0: Feb-04-2019
> - Late features cut-off: Week of Jan-21st, 2019
> - Branching (feature cutoff date): Jan-14-2019
>   (~45 days prior to branching)

We are slightly past the branching date. I would like to branch early
next week, so please respond with a list of patches that need to be part
of the release and are still pending a merge; this will help focus
review attention on them and also help track them down before branching
the release.

Thanks, Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression health for release-5.next and release-6

2019-01-11 Thread Shyam Ranganathan
We can check health on master after the patch stated by Mohit below is merged.

Release-5 is causing some concerns as we needed to tag the release
yesterday, but we have the following 2 tests failing or dumping core
pretty regularly; these need attention.

ec/bug-1236065.t
glusterd/add-brick-and-validate-replicated-volume-options.t

Shyam
On 1/10/19 6:20 AM, Mohit Agrawal wrote:
> I think we should consider regression builds after merging the patch
> (https://review.gluster.org/#/c/glusterfs/+/21990/),
> as we know this patch introduced some delay.
> 
> Thanks,
> Mohit Agrawal
> 
> On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <amukh...@redhat.com> wrote:
> 
> Mohit, Sanju - request you to investigate the failures related to
> glusterd and brick-mux and report back to the list.
> 
> On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan
> <srang...@redhat.com> wrote:
> 
> Hi,
> 
> As part of branching preparation next week for release-6, please
> find
> test failures and respective test links here [1].
> 
> The top tests that are failing/dumping-core are as below and
> need attention,
> - ec/bug-1236065.t
> - glusterd/add-brick-and-validate-replicated-volume-options.t
> - readdir-ahead/bug-1390050.t
> - glusterd/brick-mux-validation.t
> - bug-1432542-mpx-restart-crash.t
> 
> Others of interest,
> - replicate/bug-1341650.t
> 
> Please file a bug if needed against the test case and report the
> same
> here, in case a problem is already addressed, then do send back the
> patch details that addresses this issue as a response to this mail.
> 
> Thanks,
> Shyam
> 
> [1] Regression failures:
> https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Regression health for release-5.next and release-6

2019-01-09 Thread Shyam Ranganathan
Hi,

As part of branching preparation next week for release-6, please find
test failures and respective test links here [1].

The top tests that are failing/dumping-core are as below and need attention,
- ec/bug-1236065.t
- glusterd/add-brick-and-validate-replicated-volume-options.t
- readdir-ahead/bug-1390050.t
- glusterd/brick-mux-validation.t
- bug-1432542-mpx-restart-crash.t

Others of interest,
- replicate/bug-1341650.t

Please file a bug if needed against the test case and report the same
here, in case a problem is already addressed, then do send back the
patch details that addresses this issue as a response to this mail.

Thanks,
Shyam

[1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/

2019-01-08 Thread Shyam Ranganathan
On 1/8/19 8:33 AM, Nithya Balachandran wrote:
> Shyam, what is your take on this?
> An upstream user has tried it out and reported that it seems to fix the
> issue; however, CPU utilization doubles.

We usually do not backport big fixes unless they are critical. My first
answer would be, can't this wait for rel-6 which is up next?

The change has gone through a good review overall, so from a review
thoroughness perspective it looks good.

The change has a test case to ensure that the limits are honored, so
again a plus.

Also, it is a switch, so in the worst case moving back to unlimited
should be possible with little adverse effects in case the fix has issues.

It hence comes down to how confident we are that the change is not
disruptive to an existing branch. If we can answer this with reasonable
confidence, we can backport it and release it with the next 5.x update
release.

> 
> Regards,
> Nithya
> 
> On Fri, 28 Dec 2018 at 09:17, Amar Tumballi <atumb...@redhat.com> wrote:
> 
> I feel its good to backport considering glusterfs-6.0 is another 2
> months away.
> 
> On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran
> <nbala...@redhat.com> wrote:
> 
> Hi,
> 
> Can we backport this to release-5 ? We have several reports of
> high memory usage in fuse clients from users and this is likely
> to help.
> 
> Regards,
> Nithya
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> -- 
> Amar Tumballi (amarts)
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Announcing Gluster release 5.2

2018-12-13 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
5.2 (packages available at [1]).

Release notes can be found at [2].

Major changes, features and limitations addressed in this release:

- Several bugs as listed in the release notes have been addressed

Thanks,
Gluster community

[1] Packages for 5.2:
https://download.gluster.org/pub/gluster/glusterfs/5/5.2/
(CentOS storage SIG packages may arrive on Monday (17th Dec-2018) or
later as per the CentOS schedules)

[2] Release notes for 5.2:
https://docs.gluster.org/en/latest/release-notes/5.2/
OR,
https://github.com/gluster/glusterfs/blob/release-5/doc/release-notes/5.2.md
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 6: Kick off!

2018-12-06 Thread Shyam Ranganathan
On 11/6/18 11:34 AM, Shyam Ranganathan wrote:
> ## Schedule

We have decided to postpone release-6 by a month, to accommodate
late enhancements and the drive towards getting what is required for the
GCS project [1] done in core glusterfs.

This puts the (modified) schedule for Release-6 as below,

Working backwards on the schedule, here's what we have:
- Announcement: Week of Mar 4th, 2019
- GA tagging: Mar-01-2019
- RC1: On demand before GA
- RC0: Feb-04-2019
- Late features cut-off: Week of Jan-21st, 2019
- Branching (feature cutoff date): Jan-14-2019
  (~45 days prior to branching)
- Feature/scope proposal for the release (end date): *Dec-12-2018*

So the first date is the feature/scope proposal end date, which is next
week. Please send in enhancements that you are working on that will meet
the above schedule, so we can track them and better ensure they get in on time.

> 
> ## Volunteers
> This is my usual call for volunteers to run the release with me or
> otherwise; please do consider it. We need more hands this time, and
> possibly some time sharing towards the end of the year owing to the holidays.

Also, taking this opportunity to call for volunteers to run the release
again. Anyone interested please do respond.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Patches in merge conflict

2018-12-05 Thread Shyam Ranganathan
Due to the merge of https://review.gluster.org/c/glusterfs/+/21746, which
changes a whole lot of files to use the new path for libglusterfs header
includes, a lot of patches are in merge conflict.

If you notice that your patch is one of them, please rebase it to the tip of
master, using the Gerrit UI, or manually if that fails.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Geo-rep tests failing on master Cent7-regressions

2018-12-04 Thread Shyam Ranganathan
Hi Kotresh,

Multiple geo-rep tests are failing on master on various patch regressions.

Looks like you have put in
https://review.gluster.org/c/glusterfs/+/21794 for review, to address
the issue at present.

Would that be correct?

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Problem: Include libglusterfs files in gfapi headers

2018-11-21 Thread Shyam Ranganathan
In the commit [1] that introduces the statx structure as an out arg from
glfs APIs, there is a need to provide a compatible header for statx
when the base distribution does not yet support statx (say CentOS 7).

The header is provided as libglusterfs/src/compat-statx.h and is
packaged with the glusterfs-devel RPM (as are other libglusterfs
headers, and the api-devel package depends on this package, so all this
is fine so far).

The issue at hand is that the inclusion of the new header [2] is done
using the user-specified format for header inclusion (i.e.
"compat-statx.h") [3], whereas it should really be a system header file
that comes in with the glusterfs-devel package.

When included as <glusterfs/compat-statx.h> instead of the current
"compat-statx.h" though, the compilation fails, as there is no directory
named glusterfs, within the paths added to the search path during
compilation, that contains this header.
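
For illustration, the two inclusion forms being discussed (a sketch only;
it assumes the in-tree build or glusterfs-devel provides compat-statx.h
at the respective locations):

/* Current form: a user header, searched relative to the including file
 * and the -I paths. */
#include "compat-statx.h"

/* Desired form: a system header, resolved against the compiler's include
 * directories, which requires a glusterfs/ directory on the search path
 * that contains the header. */
#include <glusterfs/compat-statx.h>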

For solutions, I tried adding a symlink within libglusterfs/src/ named
glusterfs that points to src, thus providing the directory under which
compat-statx.h can be found. This works when compiling the code, but not
when building packages, as the symlink does not transfer (or I did not
write enough code to make that happen). In reality, I do not like this
solution enough to really adopt it.

This mail is to solicit inputs on how we can solve the compile-time and
packaging build-time dependency, and retain the inclusion as a system
header rather than the user header it currently is.

My thought is as follows:
- Create the same structure that the packaging lays out for the headers on
a system and move the headers in there, thus having a cleaner build and
package than hacks like the one above.
  - IOW, create glusterfs directory under libglusterfs/src and move
relevant headers that are included in the packaging in there, and
similarly move headers in api/src to a directory like
api/src/glusterfs/api/ such that when appropriate search paths are
provided these can be included in the right manner as system headers and
not user headers.

This work can also help xlator development outside the tree, and also
help with providing a pkgconfig for glusterfs-devel IMO.

Comments and other thoughts?

Shyam

[1] Commit introducing statx: https://review.gluster.org/c/glusterfs/+/19802

[2] Inclusion of the header:
https://review.gluster.org/c/glusterfs/+/19802/9/api/src/glfs.h#54

[3] Include syntax from gcc docs:
https://gcc.gnu.org/onlinedocs/cpp/Include-Syntax.html
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 4.1.7 & 5.2

2018-11-14 Thread Shyam Ranganathan
Hi,

As 4.1.6 and 5.1 are now tagged and off to packaging, announcing the
tracker and dates for the next minor versions of these stable releases.

4.1.7:
- Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-4.1.7
- Deadline for fixes: 2019-01-21

5.2:
- Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.2
- Deadline for fixes: 2018-12-10

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failure: https://build.gluster.org/job/centos7-regression/3678/

2018-11-14 Thread Shyam Ranganathan
On 11/14/2018 10:04 AM, Nithya Balachandran wrote:
> Hi Mohit,
> 
> The regression run in the subject has failed because a brick has crashed in 
> 
> bug-1432542-mpx-restart-crash.t
> 
> 
> *06:03:38* 1 test(s) generated core 
> *06:03:38* ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> *06:03:38*
> 
> 
> The brick process has crashed in posix_fs_health_check as  this->priv
> contains garbage. It looks like it might have been freed already. Can
> you take a look at it?

Sounds like another incarnation of:
https://bugzilla.redhat.com/show_bug.cgi?id=1636570

@mohit, any further clues?

> 
> 
> 
> (gdb) bt
> #0  0x7f4019ea1f19 in vfprintf () from ./lib64/libc.so.6
> #1  0x7f4019eccf49 in vsnprintf () from ./lib64/libc.so.6
> #2  0x7f401b87705a in gf_vasprintf (string_ptr=0x7f3e81ff99f0,
> format=0x7f400df32f40 "op=%s;path=%s;error=%s;brick=%s:%s timeout is
> %d", arg=0x7f3e81ff99f8)
>     at
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/mem-pool.c:234
> #3  0x7f401b8de6e2 in _gf_event
> (event=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=0x7f400df32f40
> "op=%s;path=%s;error=%s;brick=%s:%s timeout is %d")
>     at
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/events.c:89
> #4  0x7f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
> #5  0x7f400def0926 in posix_health_check_thread_proc
> (data=0x7f3fd78b7840)
>     at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:2005
> #6  0x7f401a68ae25 in start_thread () from ./lib64/libpthread.so.0
> #7  0x7f4019f53bad in clone () from ./lib64/libc.so.6
> (gdb) f 4
> #4  0x7f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
> 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
> (gdb) l
> 1955        sys_close(fd);
> 1956    }
> 1957    if (ret && file_path[0]) {
> 1958        gf_msg(this->name, GF_LOG_WARNING, errno,
> P_MSG_HEALTHCHECK_FAILED,
> 1959               "%s() on %s returned", op, file_path);
> 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
> 1961                 "op=%s;path=%s;error=%s;brick=%s:%s timeout is %d", op,
> 1962                 file_path, strerror(op_errno), priv->hostname,
> priv->base_path,
> 1963                 timeout);
> 1964    }
> (gdb) p pri->hostname
> No symbol "pri" in current context.
> *(gdb) p priv->hostname*
> *$14 = 0xa200*
> *(gdb) p priv->base_path*
> *$15 = 0x7f3ddeadc0de00*
> (gdb) 
> 
> 
> 
> Thanks,
> Nithya
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gluster Weekly Report : Static Analyser

2018-11-07 Thread Shyam Ranganathan
On 11/06/2018 02:08 PM, Shyam Ranganathan wrote:
> Hi,
> 
> I was attempting to fix a class of "Insecure data handling" defects in
> coverity around GF_FREE accessing tainted strings. Below is a short
> writeup of the same (pasted into the notes for each issue as well).
> Notifying the list of the same.
> 
> (attempted annotation) Fix: https://review.gluster.org/c/glusterfs/+/21422

Posted a new patch after using another system to check various coverity
runs and annotations. This one works, and once merged should auto-ignore
this pattern of issues. https://review.gluster.org/c/glusterfs/+/21584

> 
> The fix was to annotate the pointer coming into GF_FREE (or really
> __gf_free) as not tainted, based on the reasoning below. This coverity
> annotation is applied incorrectly in the code, as we need to annotate a
> function that on exit marks the string as taint free. IOW, see
> https://community.synopsys.com/s/article/From-Case-Clearing-TAINTED-STRING
> 
> On attempting to write such alternative functions and testing with an in
> house coverity run, the taint was still not cleared. As a result, I am
> marking this/these issues as "False positive"+"Ignore".
> 
> The reason to treat this as a false positive is as follows,
> - The allocation function returns a pointer past the header, where the
> actual usage starts
> - The free function accesses the header information to check if the
> trailer is overwritten to detect memory region overwrites
> - When these pointers are used for IO with external sources the entire
> pointer is tainted
> 
> As we are detecting a similar corruption, using the region before the
> returned pointer (and some after), and not checking regions that were
> passed to the respective external IO sources, the regions need not be
> sanitized before accessing the same. As a result, these instances are
> marked as false positives
> 
> An older thread discussing this from Xavi can be found here:
> https://lists.gluster.org/pipermail/gluster-devel/2014-December/043314.html
> 
> Shyam
> On 11/02/2018 01:07 PM, Sunny Kumar wrote:
>> Hello folks,
>>
>> The current status of static analyser is below:
>>
>> Coverity scan status:
>> Last week we started from 135 and now its 116 (2nd Nov scan)
>> Contributors - Sunny (1 patch containing 7 fixes) and
>> Varsha (1 patch containing 1 fix).
>>
>> As you all are aware we are marking few features as deprecated in gluster 
>> [1].
>> Few coverity defects eliminated due to this activity. (from tier and stripe)
>> [1]. https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
>>
>> Clang-scan status:
>> Last week we started from 90 and today its 84 (build #503).
>> Contributors- Harpreet (2 patches), Shwetha and Amar(1 patch each).
>>
>> If you want to contribute in fixing coverity and clang-scan fixes
>> please follow these instruction:
>> * for coverity scan fixes:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055155.html
>>  * for clang-scan:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055338.html
>>
>>
>> Regards,
>> Sunny kumar
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Weekly Report : Static Analyser

2018-11-06 Thread Shyam Ranganathan
Hi,

I was attempting to fix a class of "Insecure data handling" defects in
coverity around GF_FREE accessing tainted strings. Below is a short
writeup of the same (pasted into the notes for each issue as well).
Notifying the list of the same.

(attempted annotation) Fix: https://review.gluster.org/c/glusterfs/+/21422

The fix was to annotate the pointer coming into GF_FREE (or really
__gf_free) as not tainted, based on the reasoning below. This coverity
annotation is applied incorrectly in the code, as we need to annotate a
function that on exit marks the string as taint free. IOW, see
https://community.synopsys.com/s/article/From-Case-Clearing-TAINTED-STRING

On attempting to write such alternative functions and testing with an in
house coverity run, the taint was still not cleared. As a result, I am
marking this/these issues as "False positive"+"Ignore".

The reason to treat this as a false positive is as follows,
- The allocation function returns a pointer past the header, where the
actual usage starts
- The free function accesses the header information to check if the
trailer is overwritten to detect memory region overwrites
- When these pointers are used for IO with external sources the entire
pointer is tainted

As we are detecting a similar corruption, using the region before the
returned pointer (and some after), and not checking regions that were
passed to the respective external IO sources, the regions need not be
sanitized before accessing the same. As a result, these instances are
marked as false positives
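
For illustration, a minimal sketch of the allocation pattern described
above (not the actual __gf_malloc/__gf_free code; the function names and
trailer magic are invented for the example): the caller-visible pointer
sits just past a bookkeeping header, and free() only inspects that header
and the trailer behind the usable region, never the caller's (possibly
tainted) bytes.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TRAILER_MAGIC 0xBAADF00Du /* invented overrun-detection pattern */

struct alloc_header {
    size_t size; /* usable size requested by the caller */
};

/* Returns a pointer just past the header; a trailer follows the usable
 * region. */
static void *sketch_malloc(size_t size)
{
    unsigned int magic = TRAILER_MAGIC;
    struct alloc_header *hdr = malloc(sizeof(*hdr) + size + sizeof(magic));

    if (!hdr)
        return NULL;
    hdr->size = size;
    memcpy((char *)(hdr + 1) + size, &magic, sizeof(magic));
    return hdr + 1; /* the caller (and any external IO) sees only this region */
}

/* Walks back to the header and validates the trailer to detect overwrites;
 * the caller-visible bytes themselves are never interpreted here. */
static void sketch_free(void *ptr)
{
    struct alloc_header *hdr;
    unsigned int magic;

    if (!ptr)
        return;
    hdr = (struct alloc_header *)ptr - 1;
    memcpy(&magic, (char *)ptr + hdr->size, sizeof(magic));
    assert(magic == TRAILER_MAGIC); /* trailer corruption check */
    free(hdr);
}

int main(void)
{
    char *buf = sketch_malloc(16);

    if (buf) {
        memcpy(buf, "data from an IO", 16); /* possibly tainted content */
        sketch_free(buf);
    }
    return 0;
}

Only the header/trailer bookkeeping is read on free, which is why
sanitizing the tainted caller-visible region before GF_FREE is not
required, and the defects are treated as false positives.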

An older thread discussing this from Xavi can be found here:
https://lists.gluster.org/pipermail/gluster-devel/2014-December/043314.html

Shyam
On 11/02/2018 01:07 PM, Sunny Kumar wrote:
> Hello folks,
> 
> The current status of static analyser is below:
> 
> Coverity scan status:
> Last week we started from 135 and now its 116 (2nd Nov scan)
> Contributors - Sunny (1 patch containing 7 fixes) and
> Varsha (1 patch containing 1 fix).
> 
> As you all are aware we are marking few features as deprecated in gluster [1].
> Few coverity defects eliminated due to this activity. (from tier and stripe)
> [1]. https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
> 
> Clang-scan status:
> Last week we started from 90 and today its 84 (build #503).
> Contributors- Harpreet (2 patches), Shwetha and Amar(1 patch each).
> 
> If you want to contribute in fixing coverity and clang-scan fixes
> please follow these instruction:
> * for coverity scan fixes:
> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055155.html
>  * for clang-scan:
> https://lists.gluster.org/pipermail/gluster-devel/2018-August/055338.html
> 
> 
> Regards,
> Sunny kumar
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 6: Kick off!

2018-11-06 Thread Shyam Ranganathan
Hi,

With release-5 out the door, it is time to start some activities for
release-6.

## Scope
It is time to collect and determine the scope for the release, so as
usual, please send to the devel list the features/enhancements that you
are working towards reaching maturity for this release, and mark/open the
github issue with the required milestone [1].

At a broader scale, in the maintainers meeting we discussed the
enhancement wish list as in [2].

Other than the above, we are continuing with our quality focus and would
want to see a downward trend (or near-zero) in the following areas,
- Coverity
- clang
- ASAN

We would also like to tighten our nightly testing health, and would
ideally not want tests that retry and pass only on the second attempt in
the testing runs. Towards this, we will send reports of retried and
failed tests that need attention and fixes as required.

## Schedule
NOTE: Schedule is going to get heavily impacted due to end of the year
holidays, but we will try to keep it up as much as possible.

Working backwards on the schedule, here's what we have:
- Announcement: Week of Feb 4th, 2019
- GA tagging: Feb-01-2019
- RC1: On demand before GA
- RC0: Jan-02-2019
- Late features cut-off: Week of Dec-24th, 2018
- Branching (feature cutoff date): Dec-17-2018
  (~45 days prior to branching)
- Feature/scope proposal for the release (end date): Nov-21-2018

## Volunteers
This is my usual call for volunteers to run the release with me or
otherwise; please do consider it. We need more hands this time, and
possibly some time sharing towards the end of the year owing to the holidays.

Thanks,
Shyam

[1] Release-6 github milestone:
https://github.com/gluster/glusterfs/milestone/8

[2] Release-6 enhancement wishlist:
https://hackmd.io/sP5GsZ-uQpqnmGZmFKuWIg#
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Consolidating Feature Requests in github

2018-11-05 Thread Shyam Ranganathan
On 11/05/2018 08:29 AM, Vijay Bellur wrote:
> Hi All,
> 
> I am triaging the open RFEs in bugzilla [1]. Since our new(er) workflow
> involves managing RFEs as github issues, I am considering migrating
> relevant open RFEs from bugzilla to github. Once migrated, a RFE in
> bugzilla would be closed with an appropriate comment. I can also update
> the external tracker to point to the respective github issue. Once the
> migration is done, all our feature requests can be further triaged and
> tracked in github.
> 
> Any objections to doing this?

None from me, I see this as needed and the way forward.

The only thing to consider, maybe, is how we treat bugs/questions filed
via github and whether we want those moved out to bugzilla (during
regular triage of github issues) or not. IOW, what happens in the reverse
direction, from github to bugzilla.

> 
> Thanks,
> Vijay
> 
> [1] https://goo.gl/7fsgTs
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA Tagged and release tarball generated

2018-10-18 Thread Shyam Ranganathan
GA tagging done and release tarball is generated.

5.1 release tracker is now open for blockers against the same:
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.1

5.x minor releases are set to go out on the 10th of every month, jFYI
(the release schedule page on the website has been updated with the same).

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA tomorrow!

2018-10-17 Thread Shyam Ranganathan
On 10/15/2018 02:29 PM, Shyam Ranganathan wrote:
> On 10/11/2018 11:25 AM, Shyam Ranganathan wrote:
>> So we are through with a series of checks and tasks on release-5 (like
>> ensuring all backports to other branches are present in 5, upgrade
>> testing, basic performance testing, Package testing, etc.), but still
>> need the following resolved else we stand to delay the release GA
>> tagging, which I hope to get done over the weekend or by Monday 15th
>> morning (EDT).
>>
>> 1) Fix for libgfapi-python related blocker on Gluster:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1630804
>>
>> @ppai, who needs to look into this?
> 
> Du has looked into this, but resolution is still pending, and release
> still awaiting on this being a blocker.

Fix is backported and awaiting regression scores, before we merge and
make a release (tomorrow!).

@Kaushal, if we tag GA tomorrow EDT, would it be possible to tag GD2
today, for the packaging team to pick the same up?

> 
>>
>> 2) Release notes for options added to the code (see:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )
>>
>> @du, @krutika can we get some text for the options referred in the mail
>> above?
> 
> Inputs received and release notes updated:
> https://review.gluster.org/c/glusterfs/+/21421

Last chance to add review comments to the release notes!

> 
>>
>> 3) Python3 testing
>> - Heard back from Kotresh on geo-rep passing and saw that we have
>> handled cliutils issues
>> - Anything more to cover? (@aravinda, @kotresh, @ppai?)
>> - We are attempting to get a regression run on a Python3 platform, but
>> that maybe a little ways away from the release (see:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )
>>
>> Request attention to the above, to ensure we are not breaking things
>> with the release.
>>
>> Thanks,
>> Shyam
>> ___
>> maintainers mailing list
>> maintain...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/maintainers
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Announcing Glusterfs release 3.12.15 (Long Term Maintenance)

2018-10-17 Thread Shyam Ranganathan
On 10/17/2018 07:08 AM, Paolo Margara wrote:
> Hi,
> 
> this release will be the last of the 3.12.x branch prior it reach the EOL?

Yes, that is true. This would be the last minor release, as release-5
comes out.

> 
> 
> Greetings,
> 
>     Paolo
> 
>> On 16/10/18 17:41, Jiffin Tony Thottan wrote:
>>
>> The Gluster community is pleased to announce the release of Gluster
>> 3.12.15 (packages available at [1,2,3]).
>>
>> Release notes for the release can be found at [4].
>>
>> Thanks,
>> Gluster community
>>
>>
>> [1] https://download.gluster.org/pub/gluster/glusterfs/3.12/3.12.15/
>> [2] https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
>> 
>> [3] https://build.opensuse.org/project/subprojects/home:glusterfs
>> [4] Release notes:
>> https://gluster.readthedocs.io/en/latest/release-notes/3.12.15/
>>
>>
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: GA and what are we waiting on

2018-10-15 Thread Shyam Ranganathan
On 10/11/2018 11:25 AM, Shyam Ranganathan wrote:
> So we are through with a series of checks and tasks on release-5 (like
> ensuring all backports to other branches are present in 5, upgrade
> testing, basic performance testing, Package testing, etc.), but still
> need the following resolved else we stand to delay the release GA
> tagging, which I hope to get done over the weekend or by Monday 15th
> morning (EDT).
> 
> 1) Fix for libgfapi-python related blocker on Gluster:
> https://bugzilla.redhat.com/show_bug.cgi?id=1630804
> 
> @ppai, who needs to look into this?

Du has looked into this, but resolution is still pending, and release
still awaiting on this being a blocker.

> 
> 2) Release notes for options added to the code (see:
> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )
> 
> @du, @krutika can we get some text for the options referred in the mail
> above?

Inputs received and release notes updated:
https://review.gluster.org/c/glusterfs/+/21421

> 
> 3) Python3 testing
> - Heard back from Kotresh on geo-rep passing and saw that we have
> handled cliutils issues
> - Anything more to cover? (@aravinda, @kotresh, @ppai?)
> - We are attempting to get a regression run on a Python3 platform, but
> that maybe a little ways away from the release (see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )
> 
> Request attention to the above, to ensure we are not breaking things
> with the release.
> 
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Maintainer meeting minutes : 15th Oct, 2018

2018-10-15 Thread Shyam Ranganathan
### BJ Link
* Bridge: https://bluejeans.com/217609845
* Watch: 

### Attendance
* Nigel, Nithya, Deepshikha, Akarsha, Kaleb, Shyam, Sunny

### Agenda
* AI from previous meeting:
  - Glusto-Test completion on release-5 branch - On Glusto team
  - Vijay will take this on.
  - He will be focusing on it next week.
  - Glusto for 5 may not be happening before the release, but it looks
like we'll do it right after the release.

- Release 6 Scope
- Will be sending out an email today/tomorrow for scope of release 6.
- Send a biweekly email with focus on glusterfs release focus areas.

- Fold GCS scope into release-6 scope and get issues marked against the same
- For release-6 we want a thinner stack. This means we'd be removing
xlators from the code that Amar has already sent an email about.
- Locking support for gluster-block. Design still WIP. One of the
big ticket items that should make it to release 6. Includes reflink
support and enough locking support to ensure snapshots are consistent.
- GD1 vs GD2. We've been talking about it since release-4.0. We need
to call this out and understand if we will have GD2 as default. This is
a call out for a plan for when we want to make this transition.

- Round Table
- [Nigel] Minimum build and CI health for all projects (including
sub-projects).
- This was primarily driven for GCS
- But, we need this even otherwise to sustain quality of projects
- AI: Call out on lists around release 6 scope, with a possible
list of sub-projects
- [Kaleb] SELinux package status
- Waiting on testing to understand if this is done right
- Can be released when required, as it is a separate package
- Release-5 the SELinux policies are with Fedora packages
- Need to coordinate with Fedora release, as content is in 2
packages
- AI: Nigel to follow up and get updates by the next meeting

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: GA and what are we waiting on

2018-10-11 Thread Shyam Ranganathan
So we are through with a series of checks and tasks on release-5 (like
ensuring all backports to other branches are present in 5, upgrade
testing, basic performance testing, Package testing, etc.), but still
need the following resolved else we stand to delay the release GA
tagging, which I hope to get done over the weekend or by Monday 15th
morning (EDT).

1) Fix for libgfapi-python related blocker on Gluster:
https://bugzilla.redhat.com/show_bug.cgi?id=1630804

@ppai, who needs to look into this?

2) Release notes for options added to the code (see:
https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html )

@du, @krutika can we get some text for the options referred in the mail
above?

3) Python3 testing
- Heard back from Kotresh on geo-rep passing and saw that we have
handled cliutils issues
- Anything more to cover? (@aravinda, @kotresh, @ppai?)
- We are attempting to get a regression run on a Python3 platform, but
that may be a little ways away from the release (see:
https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )

Request attention to the above, to ensure we are not breaking things
with the release.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Missing option documentation (need inputs)

2018-10-11 Thread Shyam Ranganathan
On 10/10/2018 11:20 PM, Atin Mukherjee wrote:
> 
> 
> On Wed, 10 Oct 2018 at 20:30, Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> The following options were added post 4.1 and are part of 5.0 as the
> first release for the same. They were added in as part of bugs, and
> hence looking at github issues to track them as enhancements did not
> catch the same.
> 
> We need to document it in the release notes (and also the gluster doc.
> site ideally), and hence I would like a some details on what to write
> for the same (or release notes commits) for them.
> 
> Option: cluster.daemon-log-level
> Attention: @atin
> Review: https://review.gluster.org/c/glusterfs/+/20442
> 
> 
> This option has to be used based on extreme need basis and this is why
> it has been mentioned as GLOBAL_NO_DOC. So ideally this shouldn't be
> documented.
> 
> Do we still want to capture it in the release notes?

This is an interesting catch-22: when we want users to use the option
(say, to provide better logs for troubleshooting), we have nothing to
point them to, and it ends up as instructions repeated over mails over
the course of time.
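
For instance, the repeated instructions typically amount to something
like the sketch below. This is illustrative only; the option name comes
from the review above, but the exact invocation and values should be
verified against "gluster volume set help" for the release in question:

  # Raise the daemon log level cluster wide while troubleshooting,
  # then reset it once the logs are collected.
  gluster volume set all cluster.daemon-log-level DEBUG
  # ... reproduce the issue, collect /var/log/glusterfs/glusterd.log ...
  gluster volume set all cluster.daemon-log-level INFO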

I would look at adding this into an options section in the docs, but the
best I can find in there is
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/

I would say we need to improve the way we deal with options and the
required submissions around the same.

Thoughts?

> 
> 
> Option: ctime-invalidation
> Attention: @Du
> Review: https://review.gluster.org/c/glusterfs/+/20286
> 
> Option: shard-lru-limit
> Attention: @krutika
> Review: https://review.gluster.org/c/glusterfs/+/20544
> 
> Option: shard-deletion-rate
> Attention: @krutika
> Review: https://review.gluster.org/c/glusterfs/+/19970
> 
> Please send in the required text ASAP, as we are almost towards the end
> of the release.
> 
> Thanks,
> Shyam
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-10 Thread Shyam Ranganathan
On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> 3. Upgrade testing
>   - Need *volunteers* to do the upgrade testing as stated in the 4.1
> upgrade guide [3] to note any differences or changes to the same
>   - Explicit call out on *disperse* volumes, as we continue to state
> online upgrade is not possible, is this addressed and can this be tested
> and the documentation improved around the same?

Completed upgrade testing using RC1 packages against a 4.1 cluster.
Things hold up fine (replicate type volumes).

I have not attempted a rolling upgrade of disperse volumes, as we still
lack instructions to do so. @Pranith/@Xavi, is this feasible from this
release onward?

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Nightly build status (week of 01 - 07 Oct, 2018)

2018-10-09 Thread Shyam Ranganathan
We have a set of 4 cores which seem to originate from 2 bugs as filed
and referenced below.

Bug 1: https://bugzilla.redhat.com/show_bug.cgi?id=1636570
Cleanup sequence issues in the posix xlator. Mohit/Xavi/Du/Pranith, are
we handling this as a part of addressing cleanup in brick mux, or should
we, instead of piecemeal fixes?

Bug 2: https://bugzilla.redhat.com/show_bug.cgi?id=1637743
Initial analysis seems to point to glusterd starting the same brick
instance twice (non-mux case). Request GlusterD folks to take a look.

1) Release-5

Link: https://build.gluster.org/job/nightly-release-5/

Failures:
a)
https://build.gluster.org/job/regression-test-with-multiplex/886/consoleText
  - Bug and RCA: https://bugzilla.redhat.com/show_bug.cgi?id=1636570

2) Master

Link: https://build.gluster.org/job/nightly-master/

Failures:
a) Failed job line-coverage:
https://build.gluster.org/job/line-coverage/530/consoleText
  - Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1637743 (initial
analysis)
  - Core generated
  - Test:
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t

b) Failed job regression:
https://build.gluster.org/job/regression-test-burn-in/4127/consoleText
  - Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1637743 (initial
analysis) (same as 2.a)
  - Core generated
  - Test: ./tests/bugs/glusterd/quorum-validation.t

c) Failed job regression-with-mux:
https://build.gluster.org/job/regression-test-with-multiplex/889/consoleText
  - Bug and RCA: https://bugzilla.redhat.com/show_bug.cgi?id=1636570
(same as 1.a)
  - Core generated
  - Test: ./tests/basic/ec/ec-5-2.t

NOTE: All nightlies failed in the distributed-regression tests as well,
but as these are not yet stable, I am not calling them out.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-05 Thread Shyam Ranganathan
On 10/05/2018 10:59 AM, Shyam Ranganathan wrote:
> On 10/04/2018 11:33 AM, Shyam Ranganathan wrote:
>> On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
>>> RC1 would be around 24th of Sep. with final release tagging around 1st
>>> of Oct.
>> RC1 now stands to be tagged tomorrow, and patches that are being
>> targeted for a back port include,
> We still are awaiting release notes (other than the bugs section) to be
> closed.
> 
> There is one new bug that needs attention from the replicate team.
> https://bugzilla.redhat.com/show_bug.cgi?id=1636502
> 
> The above looks important to me to be fixed before the release, @ravi or
> @pranith can you take a look?
> 

RC1 is tagged and release tarball generated.

We still have 2 issues to work on,

1. The above messages from AFR in self heal logs

2. We need to test with Py3, else we risk putting out packages on
Py3-default distros and causing some mayhem if basic things fail.

I am open to suggestions on how to ensure we work with Py3, thoughts?

I am thinking we run a regression on F28 (or a platform that defaults to
Py3) and ensure regressions are passing at the very least. For other
Python code that regressions do not cover,
- We have a list at [1]
- How can we split ownership of these?

@Aravinda, @Kotresh, and @ppai, looking to you folks to help out with
the process and needs here.

Shyam

[1] https://github.com/gluster/glusterfs/issues/411
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-10-05 Thread Shyam Ranganathan
On 10/04/2018 11:33 AM, Shyam Ranganathan wrote:
> On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
>> RC1 would be around 24th of Sep. with final release tagging around 1st
>> of Oct.
> 
> RC1 now stands to be tagged tomorrow, and patches that are being
> targeted for a back port include,

We still are awaiting release notes (other than the bugs section) to be
closed.

There is one new bug that needs attention from the replicate team.
https://bugzilla.redhat.com/show_bug.cgi?id=1636502

The above looks important to me to be fixed before the release, @ravi or
@pranith can you take a look?

> 
> 1) https://review.gluster.org/c/glusterfs/+/21314 (snapshot volfile in
> mux cases)
> 
> @RaBhat working on this.

Done

> 
> 2) Py3 corrections in master
> 
> @Kotresh are all changes made to master backported to release-5 (may not
> be merged, but looking at if they are backported and ready for merge)?

Done, release notes amend pending

> 
> 3) Release notes review and updates with GD2 content pending
> 
> @Kaushal/GD2 team can we get the updates as required?
> https://review.gluster.org/c/glusterfs/+/21303

Still awaiting this.

> 
> 4) This bug [2] was filed when we released 4.0.
> 
> The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
> missing and hence post-upgrade clients failing the mount). This is
> possibly the last chance to fix it.
> 
> Glusterd and protocol maintainers, can you chime in, if this bug needs
> to be and can be fixed? (thanks to @anoopcs for pointing it out to me)

Release notes to be corrected to call this out.

> 
> The tracker bug [1] does not have any other blockers against it, hence
> assuming we are not tracking/waiting on anything other than the set above.
> 
> Thanks,
> Shyam
> 
> [1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
> [2] Potential upgrade bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1540659
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Branched and further dates

2018-10-04 Thread Shyam Ranganathan
On 10/04/2018 12:01 PM, Atin Mukherjee wrote:
> 4) This bug [2] was filed when we released 4.0.
> 
> The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
> missing and hence post-upgrade clients failing the mount). This is
> possibly the last chance to fix it.
> 
> Glusterd and protocol maintainers, can you chime in, if this bug needs
> to be and can be fixed? (thanks to @anoopcs for pointing it out to me)
> 
> 
> This is a bad bug to live with. OTOH, I do not have an immediate
> solution in my mind on how to make sure (a) these options when
> reintroduced are made no-ops, especially they will be disallowed to tune
> (with out dirty option check hacks at volume set staging code) . If
> we're to tag RC1 tomorrow, I wouldn't be able to take a risk to commit
> this change.
> 
> Can we actually have a note in our upgrade guide to document that if
> you're upgrading to 4.1 or higher version make sure to disable these
> options before the upgrade to mitigate this?

Yes, adding this to the "Major Issues" section in the release notes as
well as noting it in the upgrade guide is possible. I will go with this
option for now, as we do not have complaints around this from 4.0/4.1
releases (which have the same issue as well).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Branched and further dates

2018-10-04 Thread Shyam Ranganathan
On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
> RC1 would be around 24th of Sep. with final release tagging around 1st
> of Oct.

RC1 now stands to be tagged tomorrow, and patches that are being
targeted for a back port include,

1) https://review.gluster.org/c/glusterfs/+/21314 (snapshot volfile in
mux cases)

@RaBhat working on this.

2) Py3 corrections in master

@Kotresh are all changes made to master backported to release-5 (may not
be merged, but looking at if they are backported and ready for merge)?

3) Release notes review and updates with GD2 content pending

@Kaushal/GD2 team can we get the updates as required?
https://review.gluster.org/c/glusterfs/+/21303

4) This bug [2] was filed when we released 4.0.

The issue has not bitten us in 4.0 or in 4.1 (yet!) (i.e the options
missing and hence post-upgrade clients failing the mount). This is
possibly the last chance to fix it.

Glusterd and protocol maintainers, can you chime in, if this bug needs
to be and can be fixed? (thanks to @anoopcs for pointing it out to me)

The tracker bug [1] does not have any other blockers against it, hence
assuming we are not tracking/waiting on anything other than the set above.

Thanks,
Shyam

[1] Tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
[2] Potential upgrade bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1540659
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Memory overwrites due to processing vol files???

2018-09-28 Thread Shyam Ranganathan
We tested with ASAN and without the fix at [1], and it consistently
crashes at the mdcache xlator when brick mux is enabled.
On 09/28/2018 03:50 PM, FNU Raghavendra Manjunath wrote:
> 
> I was looking into the issue and  this is what I could find while
> working with shyam.
> 
> There are 2 things here.
> 
> 1) The multiplexed brick process for the snapshot(s) getting the client
> volfile (I suspect, it happened
>      when restore operation was performed).
> 2) Memory corruption happening while the multiplexed brick process is
> building the graph (for the client
>      volfile it got above)
> 
> I have been able to reproduce the issue in my local computer once, when
> I ran the testcase tests/bugs/snapshot/bug-1275616.t
> 
> Upon comparison, we found that the backtrace of the core I got and the
> core generated in the regression runs was similar.
> In fact, the victim information shyam mentioned before, is also similar
> in the core that I was able to get.  
> 
> On top of that, when the brick process was run with valgrind, it
> reported following memory corruption
> 
> ==31257== Conditional jump or move depends on uninitialised value(s)
> ==31257==    at 0x1A7D0564: mdc_xattr_list_populate (md-cache.c:3127)
> ==31257==    by 0x1A7D1903: mdc_init (md-cache.c:3486)
> ==31257==    by 0x4E62D41: __xlator_init (xlator.c:684)
> ==31257==    by 0x4E62E67: xlator_init (xlator.c:709)
> ==31257==    by 0x4EB2BEB: glusterfs_graph_init (graph.c:359)
> ==31257==    by 0x4EB37F8: glusterfs_graph_activate (graph.c:722)
> ==31257==    by 0x40AEC3: glusterfs_process_volfp (glusterfsd.c:2528)
> ==31257==    by 0x410868: mgmt_getspec_cbk (glusterfsd-mgmt.c:2076)
> ==31257==    by 0x518408D: rpc_clnt_handle_reply (rpc-clnt.c:755)
> ==31257==    by 0x51845C1: rpc_clnt_notify (rpc-clnt.c:923)
> ==31257==    by 0x518084E: rpc_transport_notify (rpc-transport.c:525)
> ==31257==    by 0x123273DF: socket_event_poll_in (socket.c:2504)
> ==31257==  Uninitialised value was created by a heap allocation
> ==31257==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
> ==31257==    by 0x4E9F58E: __gf_malloc (mem-pool.c:136)
> ==31257==    by 0x1A7D052A: mdc_xattr_list_populate (md-cache.c:3123)
> ==31257==    by 0x1A7D1903: mdc_init (md-cache.c:3486)
> ==31257==    by 0x4E62D41: __xlator_init (xlator.c:684)
> ==31257==    by 0x4E62E67: xlator_init (xlator.c:709)
> ==31257==    by 0x4EB2BEB: glusterfs_graph_init (graph.c:359)
> ==31257==    by 0x4EB37F8: glusterfs_graph_activate (graph.c:722)
> ==31257==    by 0x40AEC3: glusterfs_process_volfp (glusterfsd.c:2528)
> ==31257==    by 0x410868: mgmt_getspec_cbk (glusterfsd-mgmt.c:2076)
> ==31257==    by 0x518408D: rpc_clnt_handle_reply (rpc-clnt.c:755)
> ==31257==    by 0x51845C1: rpc_clnt_notify (rpc-clnt.c:923)
> 
> Based on the above observations, I think the below patch  by Shyam
> should fix the crash.

[1]

> https://review.gluster.org/#/c/glusterfs/+/21299/
> 
> But, I am still trying understand, why a brick process should get a
> client volfile (i.e. the 1st issue mentioned above). 
> 
> Regards,
> Raghavendra
> 
> On Wed, Sep 26, 2018 at 9:00 PM Shyam Ranganathan <srang...@redhat.com> wrote:
> 
> On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> > 2. Testing dashboard to maintain release health (new, thanks Nigel)
> >   - Dashboard at [2]
> >   - We already have 3 failures here as follows, needs attention from
> > appropriate *maintainers*,
> >     (a)
> >
> 
> https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
> >       - Failed with core:
> ./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
> >     (b)
> >
> 
> https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
> >       - Failed with core: ./tests/bugs/snapshot/bug-1275616.t
> >       - Also test ./tests/bugs/glusterd/validating-server-quorum.t
> had to be
> > retried
> 
> I was looking at the cores from the above 2 instances, the one in job
> 873 is been a typical pattern, where malloc fails as there is internal
> header corruption in the free bins.
> 
> When examining the victim that would have been allocated, it is often
> carrying incorrect size and other magic information. If the data in
> victim is investigated it looks like a volfile.
> 
> With the crash in 871, I thought there maybe a point where this is
> detected earlier, but not able to make headway in the same.
> 
> So, what could be corrupting this memory and is it when the graph is
> being processed? Can we run this with ASAN or such (I have not tried,
> but 

Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 09:11 AM, Niels de Vos wrote:
> On Fri, Sep 28, 2018 at 08:57:06AM -0400, Shyam Ranganathan wrote:
>> On 09/28/2018 06:12 AM, Niels de Vos wrote:
>>> On Thu, Sep 27, 2018 at 08:40:54AM -0400, Shyam Ranganathan wrote:
>>>> On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>>>>>> The thought is,
>>>>>> - Add a configure option "--enable-py-version-correction" to configure,
>>>>>> that is disabled by default
>>>>> "correction" implies there's something that's incorrect. How about
>>>>> "conversion" or perhaps just --enable-python2
>>>>>
>>>> I would not like to go with --enable-python2 as that implies it is a
>>>> conscious choice with the understanding that py2 is on the box. Given
>>>> the current ability to detect and hence correct the python shebangs, I
>>>> would think we should retain it as a more detect and modify the shebangs
>>>> option name. (I am looking at this more as an option that does the right
>>>> thing implicitly than someone/tool using this checking explicitly, which
>>>> can mean different things to different people, if that makes sense)
>>>>
>>>> Now "correction" seems like an overkill, maybe "conversion"?
>>> Is it really needed to have this as an option? Instead of an option in
>>> configure.ac, can it not be a post-install task in a Makefile.am? The
>>> number of executable python scripts that get installed are minimal, so I
>>> do not expect that a lot of changes are needed for this.
>>
>> Here is how I summarize this proposal,
>> - Perform the shebang "correction" for py2 in the post install
>>   - Keeps the git clone clean
>> - shebang correction occurs based on a configure time option
>>   - It is not implicit but an explicit choice to correct the shebangs to
>> py2, hence we need an option either way
>> - The configure option would be "--enable-python2"
>>   - Developers that need py2, can configure it as such
>>   - Regression jobs that need py2, either because of the platform they
>> test against, or for py2 compliance in the future, use the said option
>>   - Package builds are agnostic to these changes (currently) as they
>> decide at build time based on the platform what needs to be done.
> 
> I do not think such a ./configure option is needed. configure.ac can
> find out the version that is available, and pick python3 if it has both.
> 
> Tests should just run with "$PYTHON run-the-test.py" instead of
> ./run-the-test.py with a #!/usr/bin/python shebang. The testing
> framework can also find out what version of python is available.

If we back up a bit here, if all shebangs are cleared, then we do not
need anything. That is not the situation at the moment, and neither do I
know if that state can be reached.

We also need to ensure we work against both py2 and py3 for the near
future, which entails being specific, in some regression job at least,
about the python choice; whether that corrects the shebangs really
depends on the above conclusion.

> 
> 
>>> There do seem quite some Python files that have a shebang, but do not
>>> need it (__init__.py, not executable, no __main__-like functions). This
>>> should probably get reviewed as well. When those scripts get their
>>> shebang removed, even fewer files need to be 'fixed-up'.
>>
>> I propose maintainers/component-owner take this cleanup.
> 
> That would be ideal!
> 
> 
>>> Is there a BZ or GitHub Issue that I can use to send some fixes?
>>
>> See: https://github.com/gluster/glusterfs/issues/411
> 
> Thanks,
> Niels
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 12:36 AM, Kotresh Hiremath Ravishankar wrote:
> > - All regression hosts are currently py2 and so if we do not run
> the py
> > shebang correction during configure (as we do not build and test from
> > RPMS), we would be running with incorrect py3 shebangs (although this
> > seems to work, see [2]. @kotresh can we understand why?)
> 
> Is it because we don't test any of the python in the regression tests?
> 
> Geo-replication do have regression tests but not sure about glusterfind,
> events.
> 
> Or because when we do, we invoke python scripts with `python foo.py` or
> `$PYTHON foo.py` everywhere? The shebangs are ignored when scripts are
> invoked this way.
> 
> The reason why geo-rep is passing is for the same reason mentioned. Geo-rep
> python file is invoked from a c program always prefixing it with python
> as follows.
> 
> python = getenv("PYTHON");
>     if (!python)
>     python = PYTHON;
>     nargv[j++] = python;
>     nargv[j++] = GSYNCD_PREFIX "/python/syncdaemon/" GSYNCD_PY;

Thank you, makes sense now.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Python3 build process

2018-09-28 Thread Shyam Ranganathan
On 09/28/2018 06:12 AM, Niels de Vos wrote:
> On Thu, Sep 27, 2018 at 08:40:54AM -0400, Shyam Ranganathan wrote:
>> On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>>>> The thought is,
>>>> - Add a configure option "--enable-py-version-correction" to configure,
>>>> that is disabled by default
>>> "correction" implies there's something that's incorrect. How about
>>> "conversion" or perhaps just --enable-python2
>>>
>> I would not like to go with --enable-python2 as that implies it is a
>> conscious choice with the understanding that py2 is on the box. Given
>> the current ability to detect and hence correct the python shebangs, I
>> would think we should retain it as a more detect and modify the shebangs
>> option name. (I am looking at this more as an option that does the right
>> thing implicitly than someone/tool using this checking explicitly, which
>> can mean different things to different people, if that makes sense)
>>
>> Now "correction" seems like an overkill, maybe "conversion"?
> Is it really needed to have this as an option? Instead of an option in
> configure.ac, can it not be a post-install task in a Makefile.am? The
> number of executable python scripts that get installed are minimal, so I
> do not expect that a lot of changes are needed for this.

Here is how I summarize this proposal,
- Perform the shebang "correction" for py2 in the post install
  - Keeps the git clone clean
- shebang correction occurs based on a configure time option
  - It is not implicit but an explicit choice to correct the shebangs to
py2, hence we need an option either way
- The configure option would be "--enable-python2"
  - Developers that need py2, can configure it as such
  - Regression jobs that need py2, either because of the platform they
test against, or for py2 compliance in the future, use the said option
  - Package builds are agnostic to these changes (currently) as they
decide at build time based on the platform what needs to be done.

> 
> There do seem quite some Python files that have a shebang, but do not
> need it (__init__.py, not executable, no __main__-like functions). This
> should probably get reviewed as well. When those scripts get their
> shebang removed, even fewer files need to be 'fixed-up'.

I propose maintainers/component-owner take this cleanup.

> 
> Is there a BZ or GitHub Issue that I can use to send some fixes?

See: https://github.com/gluster/glusterfs/issues/411

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Python3 build process

2018-09-27 Thread Shyam Ranganathan
On 09/27/2018 08:07 AM, Kaleb S. KEITHLEY wrote:
>> The thought is,
>> - Add a configure option "--enable-py-version-correction" to configure,
>> that is disabled by default
> "correction" implies there's something that's incorrect. How about
> "conversion" or perhaps just --enable-python2
> 

I would not like to go with --enable-python2 as that implies it is a
conscious choice with the understanding that py2 is on the box. Given
the current ability to detect and hence correct the python shebangs, I
would think we should retain it as a more detect and modify the shebangs
option name. (I am looking at this more as an option that does the right
thing implicitly than someone/tool using this checking explicitly, which
can mean different things to different people, if that makes sense)

Now "correction" seems like an overkill, maybe "conversion"?
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Memory overwrites due to processing vol files???

2018-09-26 Thread Shyam Ranganathan
On 09/26/2018 10:21 AM, Shyam Ranganathan wrote:
> 2. Testing dashboard to maintain release health (new, thanks Nigel)
>   - Dashboard at [2]
>   - We already have 3 failures here as follows, needs attention from
> appropriate *maintainers*,
> (a)
> https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
>   - Failed with core: 
> ./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
> (b)
> https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
>   - Failed with core: ./tests/bugs/snapshot/bug-1275616.t
>   - Also test ./tests/bugs/glusterd/validating-server-quorum.t had to be
> retried

I was looking at the cores from the above 2 instances; the one in job
873 follows a typical pattern, where malloc fails as there is internal
header corruption in the free bins.

When examining the victim that would have been allocated, it is often
carrying an incorrect size and other magic information. If the data in
the victim is investigated, it looks like a volfile.

With the crash in 871, I thought there may be a point where this is
detected earlier, but I have not been able to make headway on the same.

So, what could be corrupting this memory and is it when the graph is
being processed? Can we run this with ASAN or such (I have not tried,
but need pointers if anyone has run tests with ASAN).
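
For reference, a minimal sketch of getting an ASAN-instrumented build,
assuming only the standard GCC/Clang sanitizer flags passed through the
usual autotools variables (the flags are stock compiler options; the
test picked below is just an example, not a vetted recipe for the
regression machines):

  ./autogen.sh
  ./configure CFLAGS='-g -O1 -fsanitize=address -fno-omit-frame-pointer' \
              LDFLAGS='-fsanitize=address'
  make -j4 && make install
  # An invalid read/write or use-after-free is then reported at the
  # faulting access itself, instead of corrupting malloc metadata and
  # blowing up later inside free()/malloc().
  ./run-tests.sh tests/bugs/snapshot/bug-1275616.t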

Here is the (brief) stack analysis of the core in 873:
NOTE: we need to start avoiding flushing the logs when we are dumping
core, as that leads to more memory allocations and causes a sort of
double fault in such cases.

Core was generated by `/build/install/sbin/glusterfsd -s
builder101.cloud.gluster.org --volfile-id /sn'.
Program terminated with signal 6, Aborted.
#0  0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f23cf591968 in __GI_abort () at abort.c:90
#2  0x7f23cf5d2d37 in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7f23cf6e4d58 "*** Error in `%s': %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x7f23cf5db499 in malloc_printerr (ar_ptr=0x7f23bc20,
ptr=, str=0x7f23cf6e4ea8 "free(): corrupted unsorted
chunks", action=3) at malloc.c:5025
#4  _int_free (av=0x7f23bc20, p=, have_lock=0) at
malloc.c:3847
#5  0x7f23d0f7c6e4 in __gf_free (free_ptr=0x7f23bc0a56a0) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/mem-pool.c:356
#6  0x7f23d0f41821 in log_buf_destroy (buf=0x7f23bc0a5568) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:358
#7  0x7f23d0f44e55 in gf_log_flush_list (copy=0x7f23c404a290,
ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:1739
#8  0x7f23d0f45081 in gf_log_flush_extra_msgs (ctx=0x1ff6010, new=0)
at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:1807
#9  0x7f23d0f4162d in gf_log_set_log_buf_size (buf_size=0) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:290
#10 0x7f23d0f41acc in gf_log_disable_suppression_before_exit
(ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/logging.c:444
#11 0x7f23d0f4c027 in gf_print_trace (signum=6, ctx=0x1ff6010) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/common-utils.c:922
#12 0x0040a84a in glusterfsd_print_trace (signum=6) at
/home/jenkins/root/workspace/regression-test-with-multiplex/glusterfsd/src/glusterfsd.c:2316
#13 <signal handler called>
#14 0x7f23cf590277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#15 0x7f23cf591968 in __GI_abort () at abort.c:90
#16 0x7f23cf5d2d37 in __libc_message (do_abort=2,
fmt=fmt@entry=0x7f23cf6e4d58 "*** Error in `%s': %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:196
#17 0x7f23cf5dcc86 in malloc_printerr (ar_ptr=0x7f23bc20,
ptr=0x7f23bc003cd0, str=0x7f23cf6e245b "malloc(): memory corruption",
action=) at malloc.c:5025
#18 _int_malloc (av=av@entry=0x7f23bc20, bytes=bytes@entry=15664) at
malloc.c:3473
#19 0x7f23cf5df84c in __GI___libc_malloc (bytes=15664) at malloc.c:2899
#20 0x7f23d0f3bbbf in __gf_default_malloc (size=15664) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/mem-pool.h:106
#21 0x7f23d0f3f02f in xlator_mem_acct_init (xl=0x7f23bc082b20,
num_types=163) at
/home/jenkins/root/workspace/regression-test-with-multiplex/libglusterfs/src/xlator.c:800
#22 0x7f23b90a37bf in mem_acct_init (this=0x7f23bc082b20) at
/home/jenkins/root/workspace/regression-test-with-multiplex/xlators/performance/open-behind/src/open-behind.c:

[Gluster-devel] Python3 build process

2018-09-26 Thread Shyam Ranganathan
Hi,

With the introduction of default python 3 shebangs, and the change in
configure.ac to correct these to py2 if the build is being attempted on
a machine that does not have py3, a couple of issues have been
uncovered. Here is the plan to fix the same; suggestions welcome.

Issues:
- A configure job is run when creating the dist tarball, and this runs
on non-py3 platforms, hence changing the dist tarball to basically have
py2 shebangs; as a result, the release-new build job always outputs py
files with the py2 shebang. See the tarball in [1]

- All regression hosts are currently py2 and so if we do not run the py
shebang correction during configure (as we do not build and test from
RPMS), we would be running with incorrect py3 shebangs (although this
seems to work, see [2]. @kotresh can we understand why?)

Plan to address the above is detailed in this bug [3].

The thought is,
- Add a configure option "--enable-py-version-correction" to configure,
that is disabled by default

- All regression jobs will run with the above option, and hence this
will correct the py shebangs in the regression machines. In the future
as we run on both py2 and py3 machines, this will run with the right
python shebangs on these machines.

- The packaging jobs will now run the py version detection and shebang
correction during the actual build and packaging; Kaleb has already put
up a patch for the same [2]. A rough sketch of such a shebang correction
step is below.
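
To make the intent concrete, here is a minimal sketch of what such a
build-time shebang correction could look like. This is an assumption for
illustration only; the variable name, file selection and hook location
are hypothetical and not the actual patch:

  # Hypothetical post-install hook: rewrite installed python3 shebangs
  # to python2 only when the py2 choice was made at configure time.
  if [ "x$USE_PYTHON2" = "xyes" ]; then
      for f in $(grep -rl '^#!/usr/bin/python3' "$DESTDIR$prefix"); do
          sed -i -e '1s|^#!/usr/bin/python3|#!/usr/bin/python2|' "$f"
      done
  fi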

Thoughts?

Shyam

[1] Release tarball: https://build.gluster.org/job/release-new/69/
[2] Patch that defaults to py3 in regression and passes regressions:
https://review.gluster.org/c/glusterfs/+/21266
[3] Infra bug to change regression jobs:
https://bugzilla.redhat.com/show_bug.cgi?id=1633425
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Branched and further dates

2018-09-26 Thread Shyam Ranganathan
Hi,

Updates on the release, and a shout out for help, are as follows,

RC0 Release packages for testing are available see the thread at [1]

The following are the activities that we need to complete for calling
the release GA (i.e., with no major regressions):

1. Release notes (Owner: release owner (myself), will send out an
initial version for review and to solicit inputs today)

2. Testing dashboard to maintain release health (new, thanks Nigel)
  - Dashboard at [2]
  - We already have 3 failures here as follows, needs attention from
appropriate *maintainers*,
(a)
https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
- Failed with core: 
./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
(b)
https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
- Failed with core: ./tests/bugs/snapshot/bug-1275616.t
- Also test ./tests/bugs/glusterd/validating-server-quorum.t had to be
retried
(c)
https://build.gluster.org/job/regression-test-burn-in/4109/consoleText
- Failed with core: ./tests/basic/mgmt_v3-locks.t

3. Upgrade testing
  - Need *volunteers* to do the upgrade testing as stated in the 4.1
upgrade guide [3] to note any differences or changes to the same
  - Explicit call out on *disperse* volumes, as we continue to state
online upgrade is not possible, is this addressed and can this be tested
and the documentation improved around the same?

4. Performance testing/benchmarking
  - I would be using smallfile and FIO to baseline 3.12 and 4.1 and test
RC0 for any major regressions
  - If we already know of any please shout out so that we are aware of
the problems and upcoming fixes to the same

5. Major testing areas
  - Py3 support: Need *volunteers* here to test out the Py3 support
around changed python files, if there is not enough coverage in the
regression test suite for the same

Thanks,
Shyam

[1] Packages for RC0:
https://lists.gluster.org/pipermail/maintainers/2018-September/005044.html

[2] Release testing health dashboard:
https://build.gluster.org/job/nightly-release-5/

[3] 4.1 upgrade guide:
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/

On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
> Hi,
> 
> Release 5 has been branched today. To backport fixes to the upcoming 5.0
> release use the tracker bug [1].
> 
> We intend to roll out RC0 build by end of tomorrow for testing, unless
> the set of usual cleanup patches (op-version, some messages, gfapi
> version) land in any form of trouble.
> 
> RC1 would be around 24th of Sep. with final release tagging around 1st
> of Oct.
> 
> I would like to encourage everyone to test out the bits as appropriate
> and post updates to this thread.
> 
> Thanks,
> Shyam
> 
> [1] 5.0 tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: Branched and further dates

2018-09-13 Thread Shyam Ranganathan
Hi,

Release 5 has been branched today. To backport fixes to the upcoming 5.0
release use the tracker bug [1].

We intend to roll out RC0 build by end of tomorrow for testing, unless
the set of usual cleanup patches (op-version, some messages, gfapi
version) land in any form of trouble.

RC1 would be around 24th of Sep. with final release tagging around 1st
of Oct.

I would like to encourage everyone to test out the bits as appropriate
and post updates to this thread.

Thanks,
Shyam

[1] 5.0 tracker: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Release calendar and status updates

2018-09-10 Thread Shyam Ranganathan
On 08/22/2018 02:03 PM, Shyam Ranganathan wrote:
> On 08/14/2018 02:28 PM, Shyam Ranganathan wrote:
>> 2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)
> 
> We are postponing branching to 2nd week of September (10th), as the
> entire effort in this release has been around stability and fixing
> issues across the board.

This is delayed for the following reasons,
- Stability of mux regressions
  There have been a few cores last week and we at least need an analysis
of the same before branching. Mohit, Atin, and I have looked at the
same and will post a broader update later today or tomorrow.

NOTE: Branching is not being withheld for the above, as we would
backport the required fixes, and post branching there is work to do in
terms of cleaning up the branch (gfapi, versions etc.) that takes some time.

- Not having the Gluster 5.0 "found in version" in bugzilla
This issue has been resolved with the bugzilla team today, so it is no
longer a blocker.

(read on as I still need information for some of the asks below)

> 
> Thus, we are expecting no net new features from hereon till branching,
> and features that are already a part of the code base and its details
> are as below.
> 



> 1) Changes to options tables in xlators (#302)
> 
> @Kaushal/GD2 team, can we call this complete? There maybe no real
> release notes for the same, as these are internal in nature, but
> checking nevertheless.

@Kaushal or GD2 contributors, ping!

> 5) Turn on Dentry fop serializer by default in brick stack (#421)
> 
> @du, the release note for this can be short, as other details are
> captured in 4.0 release notes.
> 
> However, in 4.0 release we noted a limitation with this feature as follows,
> 
> "Limitations: This feature is released as a technical preview, as
> performance implications are not known completely." (see section
> https://docs.gluster.org/en/latest/release-notes/4.0.0/#standalone )
> 
> Do we now have better data regarding the same that we can use when
> announcing the release?

@Du ping!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposal to change Gerrit -> Bugzilla updates

2018-09-10 Thread Shyam Ranganathan
On 09/10/2018 08:37 AM, Nigel Babu wrote:
> Hello folks,
> 
> We now have review.gluster.org  as an
> external tracker on Bugzilla. Our current automation when there is a
> bugzilla attached to a patch is as follows:
> 
> 1. When a new patchset has "Fixes: bz#1234" or "Updates: bz#1234", we
> will post a comment to the bug with a link to the patch and change the
> status to POST. 2. When the patchset is merged, if the commit said
> "Fixes", we move the status to MODIFIED.
> 
> I'd like to propose the following improvements:
> 1. Add the Gerrit URL as an external tracker to the bug.

My assumption here is that for each patch that mentions a BZ, an
additional tracker would be added to the tracker list, right?

A further assumption (as I have not used trackers before) is that this
would reduce the noise, in the form of comments, in the bug itself, right?

In the past we have reduced noise by not commenting on the bug (or
github issue) every time the patch changes, so we currently get 2
comments per patch; with the above change we would get just one, and
that too as a terse external reference (see [1], based on my
test/understanding).

What we would lose, as far as I can tell based on the changes below, is
the commit details in the BZ when the patch is merged. These are useful,
and I would like them to be retained in case they are not.

> 2. When a patch is merged, only change state of the bug if needed. If
> there is no state change, do not add an additional message. The external
> tracker state should change reflecting the state of the review.

I added a tracker to this bug [1], but I am not seeing the tracker state
correctly reflected in BZ; is this work that needs to be done?

> 3. Assign the bug to the committer. This has edge cases, but it's best
> to at least handle the easy ones and then figure out edge cases later.
> The experience is going to be better than what it is right now.

Is the above a reference to just the "assigned to" field, or the overall
process? If the latter, can you elaborate a little more on why this would
be better? (I am not saying it is not; I am attempting to understand how
you see it.)

> 
> Please provide feedback/comments by end of day Friday. I plan to add
> this activity to the next Infra team sprint that starts on Monday (Sep 17).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1619423
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Test health report (week ending 26th Aug, 2018)

2018-08-28 Thread Shyam Ranganathan
We need more focus on the retry cases, so that we have fewer failures
overall due to the same.

2 useful changes are in: fstat now has the ability to filter by job, and
tests that time out will save older logs for analysis (assuming, of
course, the run failed; if the run passes on retry, the job is set up
not to save logs, as before).

Line-coverage failures:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=line-coverage

Regression test burn-in:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=regression-test-burn-in

Mux-regressions:
https://fstat.gluster.org/summary?start_date=2018-08-20_date=2018-08-26=master=regression-test-with-multiplex


https://build.gluster.org/job/regression-test-with-multiplex/834/console
18:29:04 2 test(s) needed retry
18:29:04 ./tests/00-geo-rep/georep-basic-dr-rsync.t
18:29:04 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-burn-in/4071/console
18:27:03 1 test(s) needed retry
18:27:03 ./tests/00-geo-rep/georep-basic-dr-tarssh.t

https://build.gluster.org/job/regression-test-with-multiplex/835/console
18:25:06 1 test(s) needed retry
18:25:06 ./tests/bugs/shard/bug-shard-discard.t

https://build.gluster.org/job/regression-test-burn-in/4072/console
18:34:35 1 test(s) needed retry
18:34:35 ./tests/basic/volume-snapshot.t

https://build.gluster.org/job/line-coverage/487/console
18:43:30 1 test(s) generated core
18:43:30 ./tests/bugs/glusterd/validating-server-quorum.t
18:43:30
18:43:30 1 test(s) needed retry
18:43:30 ./tests/00-geo-rep/georep-basic-dr-tarssh.t

https://build.gluster.org/job/regression-test-burn-in/4073/console
18:31:42 1 test(s) generated core
18:31:42 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-with-multiplex/837/console
18:28:56 1 test(s) failed
18:28:56 ./tests/basic/afr/split-brain-favorite-child-policy.t
18:28:56
18:28:56 1 test(s) generated core
18:28:56 ./tests/basic/afr/split-brain-favorite-child-policy.t

https://build.gluster.org/job/line-coverage/489/consoleFull
20:36:49 3 test(s) failed
20:36:49 ./tests/00-geo-rep/georep-basic-dr-tarssh.t
20:36:49 ./tests/basic/tier/fops-during-migration-pause.t
20:36:49 ./tests/bugs/readdir-ahead/bug-1436090.t
20:36:49
20:36:49 2 test(s) generated core
20:36:49 ./tests/00-geo-rep/00-georep-verify-setup.t
20:36:49 ./tests/00-geo-rep/georep-basic-dr-rsync.t
20:36:49
20:36:49 4 test(s) needed retry
20:36:49 ./tests/00-geo-rep/georep-basic-dr-tarssh.t
20:36:49 ./tests/basic/tier/fops-during-migration-pause.t
20:36:49 ./tests/basic/tier/fops-during-migration.t
20:36:49 ./tests/bugs/readdir-ahead/bug-1436090.t

https://build.gluster.org/job/regression-test-with-multiplex/840/console
18:22:44 2 test(s) needed retry
18:22:44 ./tests/basic/volume-snapshot-clone.t
18:22:44 ./tests/bugs/posix/bug-1619720.t

https://build.gluster.org/job/regression-test-burn-in/4075/console
https://build.gluster.org/job/regression-test-with-multiplex/839/consoleText
Multiple failures; they look related to some infra issue at that point,
so I am not recording them in this mail.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Release calendar and status updates

2018-08-22 Thread Shyam Ranganathan
On 08/14/2018 02:28 PM, Shyam Ranganathan wrote:
> 2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)

We are postponing branching to 2nd week of September (10th), as the
entire effort in this release has been around stability and fixing
issues across the board.

Thus, we are expecting no net new features from here on till branching;
the features that are already a part of the code base, and their
details, are as below.

> 
> 3) Late feature back port closure: (Friday) Aug-24-2018 (1 week from
> branching)

As stated above, there is no late feature back port.

The features that are part of master since 4.1 release are as follows,
with some questions for the authors,

1) Changes to options tables in xlators (#302)

@Kaushal/GD2 team, can we call this complete? There maybe no real
release notes for the same, as these are internal in nature, but
checking nevertheless.

2) CloudArchival (#387)

@susant, what is the status of this feature? Is it complete?
I am missing user documentation, and code coverage from the tests is
very low (see:
https://build.gluster.org/job/line-coverage/485/Line_20Coverage_20Report/ )

3) Quota fsck (#390)

@Sanoj, I do have documentation in the github issue, but would prefer
that the user-facing documentation moves to glusterdocs instead.

Further, I see no real test coverage for the tool provided here; any
thoughts around the same?

The script is not part of the tarball, and hence not part of the
distribution RPMs either; what is the thought around distributing the
same?

4) Ensure python3 compatibility across code base (#411)

@Kaleb/others, the last patch to call this issue done (sans real testing
at the moment) is https://review.gluster.org/c/glusterfs/+/20868; request
reviews and votes there, to get this merged before branching.

5) Turn on Dentry fop serializer by default in brick stack (#421)

@du, the release note for this can be short, as other details are
captured in 4.0 release notes.

However, in 4.0 release we noted a limitation with this feature as follows,

"Limitations: This feature is released as a technical preview, as
performance implications are not known completely." (see section
https://docs.gluster.org/en/latest/release-notes/4.0.0/#standalone )

Do we now have better data regarding the same that we can use when
announcing the release?

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Test health report (week ending 19th Aug. 2018)

2018-08-20 Thread Shyam Ranganathan
Although tests have stabilized quite a bit, and from the maintainers
meeting we know that some tests have patches coming in, here is a
readout of other tests that needed a retry. We need to reduce failures
on retries as well, so as not to have spurious or other failures in
test runs.

Tests being worked on (from the maintainers meeting notes):
- bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t

For the other retries and failures, I request component maintainers to
look at the test cases and resulting failures, and post back any
findings to the lists to take things forward,

https://build.gluster.org/job/line-coverage/481/console
20:10:01 1 test(s) needed retry
20:10:01 ./tests/basic/distribute/rebal-all-nodes-migrate.t

https://build.gluster.org/job/line-coverage/483/console
18:42:01 2 test(s) needed retry
18:42:01 ./tests/basic/tier/fops-during-migration-pause.t
18:42:01
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
(fix in progress)

https://build.gluster.org/job/regression-test-burn-in/4067/console
18:27:21 1 test(s) generated core
18:27:21 ./tests/bugs/readdir-ahead/bug-1436090.t

https://build.gluster.org/job/regression-test-with-multiplex/828/console
18:19:39 1 test(s) needed retry
18:19:39 ./tests/bugs/glusterd/validating-server-quorum.t

https://build.gluster.org/job/regression-test-with-multiplex/829/console
18:24:14 2 test(s) needed retry
18:24:14 ./tests/00-geo-rep/georep-basic-dr-rsync.t
18:24:14 ./tests/bugs/glusterd/quorum-validation.t

https://build.gluster.org/job/regression-test-with-multiplex/831/console
18:20:49 1 test(s) generated core
18:20:49 ./tests/basic/ec/ec-5-2.t

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Release 5: Release calendar and status updates

2018-08-14 Thread Shyam Ranganathan
This mail is to solicit the following,

Features/enhancements planned for Gluster 5 need the following from
contributors:
  - Open/Use relevant issue
  - Mark issue with the "Release 5" milestone [1]
  - Post to the devel lists issue details, requesting addition to track
the same for the release

NOTE: We are ~7 days from branching, and I do not have any issues marked
for the release; please respond, as you read this, with the issues that
are going to be part of this release.

Calendar of activities look as follows:

1) master branch health checks (weekly, till branching)
  - Expect a status update every Monday on the various test runs

2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)

3) Late feature back port closure: (Friday) Aug-24-2018 (1 week from
branching)

4) Initial release notes readiness: (Monday) Aug-27-2018

5) RC0 build: (Monday) Aug-27-2018



6) RC1 build: (Monday) Sep-17-2018



7) GA tagging: (Monday) Oct-01-2018



8) ~week later release announcement

Go/no-go discussions per-phase will be discussed in the maintainers list.


[1] Release milestone: https://github.com/gluster/glusterfs/milestone/7
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down for stabilization (unlocking the same)

2018-08-14 Thread Shyam Ranganathan
On 08/14/2018 12:51 AM, Pranith Kumar Karampuri wrote:
> 
> 
> On Mon, Aug 13, 2018 at 10:55 PM Shyam Ranganathan wrote:
> 
> On 08/13/2018 02:20 AM, Pranith Kumar Karampuri wrote:
> >     - At the end of 2 weeks, reassess master and nightly test
> status, and
> >     see if we need another drive towards stabilizing master by
> locking down
> >     the same and focusing only on test and code stability around
> the same.
> >
> >
> > When will there be a discussion about coming up with guidelines to
> > prevent lock down in future?
> 
> A thread for the same is started in the maintainers list.
> 
> 
> Could you point me to the thread please? I am only finding a thread with
> subject "Lock down period merge process"

That is the one I am talking about, where you already raised the above
point (if I recollect right).

> 
> >
> > I think it is better to lock-down specific components by removing
> commit
> > access for the respective owners for those components when a test in a
> > particular component starts to fail.
> 
>     Also I suggest we move this to the maintainers thread, to keep the noise
> levels across lists in check.
> 
> Thanks,
> Shyam
> 
> 
> 
> -- 
> Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED bug-1110262.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/bug-1110262.tTBD

The above test fails as follows,
Run: https://build.gluster.org/job/line-coverage/427/consoleFull

Log snippet: (retried and passed so no further logs)
18:50:33 useradd: user 'dev' already exists
18:50:33 not ok 13 , LINENUM:42
18:50:33 FAILED COMMAND: useradd dev
18:50:33 groupadd: group 'QA' already exists
18:50:33 not ok 14 , LINENUM:43
18:50:33 FAILED COMMAND: groupadd QA

Basically, the user and group already existed, and hence the test
failed. I tried getting to the build history of the machine that failed
this test, in an effort to understand which previous run could have
leaked them, but Jenkins has not been cooperative.

Also, one other test case, tests/bugs/bug-1584517.t, uses the same user
and group names, but that test runs after this one.

So I do not yet know how the user and group names leaked and caused
this test case to fail.

Bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=1615604
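
Until the source of the leak is understood, a defensive tweak along the
following lines would keep a leaked user/group from failing the test;
this is a hypothetical hardening sketch, not the fix tracked in the bug
above:

# Hypothetical hardening, not the actual fix tracked in the bug: create
# the user and group only if they do not already exist, so a leak from a
# previous run cannot fail these steps.
id -u dev >/dev/null 2>&1 || useradd dev
getent group QA >/dev/null 2>&1 || groupadd QA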

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED ./tests/basic/stats-dump.t)

2018-08-13 Thread Shyam Ranganathan
On 08/13/2018 02:32 PM, Shyam Ranganathan wrote:
> I will be adding a bug and a fix that tries this in a loop to avoid the
> potential race that I see above as the cause.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1615582
Potential fix: https://review.gluster.org/c/glusterfs/+/20726

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (UNSOLVED ./tests/basic/stats-dump.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/basic/stats-dump.tTBD

This test fails as follows:

  01:07:31 not ok 20 , LINENUM:42
  01:07:31 FAILED COMMAND: grep .queue_size
/var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump

  18:35:43 not ok 21 , LINENUM:43
  18:35:43 FAILED COMMAND: grep .queue_size
/var/lib/glusterd/stats/glusterfsd__d_backends_patchy2.dump

Basically, when grep'ing for patterns in the stats dump, the second
pattern, "queue_size", is not found for one or the other brick.

The above seems odd: if the test found "aggr.fop.write.count", it stands
to reason that a stats dump was present; further, there is a 2-second
sleep in the test case, while the dump interval is 1 second.

The only likely cause of the failure, then, is a race: the io-stats
dumper thread had just (re)opened the file to overwrite its content (the
fopen uses mode "w+", which truncates the file), and the grep CLI opened
the file at that same moment and hence found no content.

I will be filing a bug and a fix that retries the grep in a loop, to
avoid the potential race described above.
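
As a rough sketch of what such a retry could look like (illustrative
only, not the patch itself; the dump path and pattern are the ones used
by the test):

# Illustrative retry loop, not the actual patch: re-attempt the grep for
# a few dump intervals, so a momentarily truncated stats file does not
# fail the check.
DUMP=/var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump
PATTERN=".queue_size"

ret=1
for attempt in 1 2 3 4 5; do
    if grep -q "$PATTERN" "$DUMP"; then
        ret=0
        break
    fi
    sleep 1    # the io-stats dump interval in the test is 1 second
done
exit $ret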

Other ideas/causes welcome!

Also, this has failed in mux and non-mux environments,
Runs with failure:
https://build.gluster.org/job/regression-on-demand-multiplex/175/consoleFull
(no logs)

https://build.gluster.org/job/regression-on-demand-full-run/59/consoleFull
(has logs)

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/core/bug-1432542-mpx-restart-crash.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam

This test had 2 issues,

1. It needed more time in lcov builds, hence the timeout was bumped to
800 seconds; also, one of the EXPECT_WITHIN checks needed more tolerance
and was bumped up to 120 seconds.

2. The test was OOM-killed at times; to reduce the memory pressure it
creates, each client mount used for a dd test is now unmounted right
after that dd completes. This resulted in no more OOM kills for the test.
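
For illustration, the pattern applied in (2) looks roughly like the
following (the host, volume and mount point names are stand-ins, not the
test's actual values):

# Sketch of the OOM mitigation in (2): unmount each client right after
# its dd run instead of keeping all the mounts alive until the end of
# the test. server1, patchy$i and /mnt/client are illustrative names.
for i in 1 2 3; do
    mount -t glusterfs server1:/patchy$i /mnt/client
    dd if=/dev/zero of=/mnt/client/testfile bs=1M count=100
    umount /mnt/client        # drop the client stack immediately
done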

Shyam (and Nithya)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/distribute/bug-1042725.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/distribute/bug-1042725.t Shyam

The test above failed to even start glusterd (the first line of the
test) properly when it failed. On inspection it was noted that the
previous test, ./tests/bugs/core/multiplex-limit-issue-151.t, had not
completed successfully and also had a different cleanup pattern
(trapping cleanup on exit/TERM, rather than invoking it outright).

The test ./tests/bugs/core/multiplex-limit-issue-151.t was amended to
perform cleanup as appropriate, and no further errors in
./tests/bugs/distribute/bug-1042725.t have been seen since then.

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (./tests/bugs/distribute/bug-1117851.t)

2018-08-13 Thread Shyam Ranganathan
On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
> 
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
> 
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
> 
> ./tests/bugs/distribute/bug-1117851.t Shyam/Nigel

Tests against lcov-instrumented code take more time than normal; this
test was pushing towards 180-190 seconds on successful runs. To remove
any potential issues with tests that run close to the default timeout of
200 seconds, 2 changes were made:

1) https://review.gluster.org/c/glusterfs/+/20648
Added an option to run-tests.sh to enable setting the default timeout to
a different value.

2) https://review.gluster.org/c/build-jobs/+/20655
Changed the line-coverage job to use the above option to set the default
timeout to 300 seconds for the test run.

Since these changes, this test has not failed in lcov runs.
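
To illustrate the effect outside the harness, the 300-second ceiling can
be approximated with coreutils timeout; this is only an approximation,
the real mechanism is the run-tests.sh option from (1) above:

# Standalone approximation of the larger ceiling, using coreutils
# timeout; the real mechanism is the run-tests.sh option referenced in
# (1) above.
timeout 300 ./tests/bugs/distribute/bug-1117851.t
echo "exit status: $?"    # timeout exits with 124 if the limit was hit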

Shyam (and Nigel/Nithya)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down for stabilization (unlocking the same)

2018-08-13 Thread Shyam Ranganathan
On 08/13/2018 02:20 AM, Pranith Kumar Karampuri wrote:
> - At the end of 2 weeks, reassess master and nightly test status, and
> see if we need another drive towards stabilizing master by locking down
> the same and focusing only on test and code stability around the same.
> 
> 
> When will there be a discussion about coming up with guidelines to
> prevent lock down in future?

A thread for the same is started in the maintainers list.

> 
> I think it is better to lock-down specific components by removing commit
> access for the respective owners for those components when a test in a
> particular component starts to fail.

Also I suggest we move this to the maintainers thread, to keep the noise
levels across lists in check.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down: RCA for tests (testname.t)

2018-08-12 Thread Shyam Ranganathan
As a means of keeping the focus going and squashing the remaining tests
that were failing sporadically, we request each test/component owner to:

- respond to this mail, changing the subject (testname.t) to the test
name that they are responding to (adding more than one in case they have
the same RCA)
- include the current RCA and status of the same

The list of tests and current owners, as per the spreadsheet that we
were tracking, is as follows:

./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
./tests/basic/tier/tier-heald.t TBD
./tests/basic/afr/sparse-file-self-heal.t   TBD
./tests/bugs/shard/bug-1251824.tTBD
./tests/bugs/shard/configure-lru-limit.tTBD
./tests/bugs/replicate/bug-1408712.tRavi
./tests/basic/afr/replace-brick-self-heal.t TBD
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
./tests/basic/stats-dump.t  TBD
./tests/bugs/bug-1110262.t  TBD
./tests/basic/ec/ec-data-heal.t Mohit
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t   Pranith
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
TBD
./tests/basic/ec/ec-5-2.t   Sunil
./tests/bugs/shard/bug-shard-discard.t  TBD
./tests/bugs/glusterd/remove-brick-testcases.t  TBD
./tests/bugs/protocol/bug-808400-repl.t TBD
./tests/bugs/quick-read/bug-846240.tDu
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t   Mohit
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
./tests/bugs/ec/bug-1236065.t   Pranith
./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
./tests/basic/ec/ec-1468261.t   Ashish
./tests/basic/afr/add-brick-self-heal.t Ravi
./tests/basic/afr/granular-esh/replace-brick.t  Pranith
./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
./tests/bugs/glusterd/validating-server-quorum.tAtin
./tests/bugs/replicate/bug-1363721.tRavi
./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t 
Karthik
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
Atin
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
./tests/bitrot/bug-1373520.tKotresh
./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
./tests/bugs/glusterd/quorum-validation.t   Atin
./tests/bugs/distribute/bug-1042725.t   Shyam
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
Karthik
./tests/bugs/quota/bug-1293601.tTBD
./tests/bugs/bug-1368312.t  Du
./tests/bugs/distribute/bug-1122443.t   Du
./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568 Nithya/Shyam

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down status (Aug 12th, 2018) (patchset 12)

2018-08-12 Thread Shyam Ranganathan
Patch set 12 results:

./tests/bugs/glusterd/quorum-validation.t (3 retries, 1 core)
./tests/bugs/glusterd/validating-server-quorum.t (1 core)
(NEW) ./tests/basic/distribute/rebal-all-nodes-migrate.t (1 retry)
./tests/basic/stats-dump.t (1 retry)
./tests/bugs/shard/bug-1251824.t (1 retry)
./tests/basic/ec/ec-5-2.t (1 core)
(NEW) ./tests/basic/tier/tier-heald.t (1 core) (Looks similar to,
./tests/bugs/glusterd/remove-brick-testcases.t (run: lcov#432))

Sheet updated here:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=522127663

Gerrit comment here:
https://review.gluster.org/c/glusterfs/+/20637/12#message-186adbee76d6999385022239cb2daba589f0a81f

Shyam
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched

Re: [Gluster-devel] Master branch lock down status (Patch set 11, Aug 12, 2018)

2018-08-12 Thread Shyam Ranganathan
Patch set 11 report:

line coverage: 4/8 PASS, 7/8 with retries, 1 core
CentOS regression: 5/8 PASS, 8/8 PASS-With-RETRIES
Mux regression: 7/8 PASS, 1 core

No NEW failures, sheet [1] updated with run details, and so is the WIP
patch with the same data [2].

Cores:
- ./tests/bugs/glusterd/validating-server-quorum.t
- ./tests/basic/ec/ec-5-2.t

Other retries/failures:
- ./tests/bugs/shard/bug-shard-discard.t
- ./tests/basic/afr/replace-brick-self-heal.t
- ./tests/bugs/core/multiplex-limit-issue-151.t
- ./tests/00-geo-rep/georep-basic-dr-tarssh.t
- ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
- ./tests/bugs/shard/configure-lru-limit.t
- ./tests/bugs/glusterd/quorum-validation.t


[1] Sheet with failure and run data:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1434742898

[2] Gerrit comment with the same information:
https://review.gluster.org/c/glusterfs/+/20637/12#message-1f8f94aaa88be276229f20eb25a650381bc37543
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you se

Re: [Gluster-devel] Master branch lock down status (Sat. Aug 10th)

2018-08-11 Thread Shyam Ranganathan
Patch set 10: Run status

Line-coverage 4/7 PASS, 7/7 PASS-With-RETRY
Mux-regressions 4/55 PASS, 1 core
CentOS7 Regression 3/7 PASS, 7/7 PASS-With-RETRY

./tests/bugs/replicate/bug-1408712.t (2 fail/retry)
./tests/bugs/glusterd/quorum-validation.t (1 fail/retry)
./tests/bugs/core/multiplex-limit-issue-151.t (1 fail/retry)
./tests/bugs/shard/bug-shard-discard.t (1 fail/retry)
(NEW) ./tests/basic/afr/sparse-file-self-heal.t (1 fail/retry)
(NEW) ./tests/bugs/shard/bug-1251824.t (1 fail/retry)
(NEW) ./tests/bugs/shard/configure-lru-limit.t (1 fail/retry)
./tests/bugs/glusterd/validating-server-quorum.t (2 fail/retry)

Sheet [1] has run details and also comment on patch [2] has run details.

Atin/Shyam

[1] Sheet:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=552922579

[2] Comment on patch:
https://review.gluster.org/c/glusterfs/+/20637#message-2030bb77ed8d98618caded7b823bc4d65238e911
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list k

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-11 Thread Shyam Ranganathan
On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
> 
> 
> On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan wrote:
> 
> On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > Today's patch set 7 [1], included fixes provided till last evening IST,
> > and its runs can be seen here [2] (yay! we can link to comments in
> > gerrit now).
> > 
> > New failures: (added to the spreadsheet)
> > ./tests/bugs/quick-read/bug-846240.t
> 
> The above test fails always if there is a sleep of 10 added at line 36.
> 
> I tried to replicate this in my setup, and was able to do so 3/150 times
> and the failures were the same as the ones reported in the build logs
> (as below).
> 
> Not finding any clear reason for the failure, I delayed the test (i.e
> added a sleep 10) after the open on M0 to see if the race is uncovered,
> and it was.
> 
> Du, request you to take a look at the same, as the test is around
> quick-read but involves open-behind as well.
> 
> 
> Thanks for that information. I'll be working on this today.

Heads up Du, this failed again with the same pattern in run
https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull

> 
> 
> Failure snippet:
> 
> 23:41:24 [23:41:28] Running tests in file
> ./tests/bugs/quick-read/bug-846240.t
> 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> 23:41:28 1..17
> 23:41:28 ok 1, LINENUM:9
> 23:41:28 ok 2, LINENUM:10
> 
> 23:41:28 ok 13, LINENUM:40
> 23:41:28 not ok 14 , LINENUM:50
> 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
> 
> Shyam
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] tests/bugs/core/multiplex-limit-issue-151.t timed out

2018-08-11 Thread Shyam Ranganathan
On 08/11/2018 12:43 AM, Mohit Agrawal wrote:
> File a bug https://bugzilla.redhat.com/show_bug.cgi?id=1615003, I am not
> able to extract logs
> specific to this test case from the log dump.

This is because the test ends up calling cleanup twice through its bash exit traps.

- First, the test itself sets a trap to cleanup at
https://github.com/gluster/glusterfs/blob/master/tests/bugs/core/multiplex-limit-issue-151.t#L30

- There is an additional trap set to cleanup in include.rc,
https://github.com/gluster/glusterfs/blob/master/tests/include.rc#L719

The tarball is generated in the cleanup routine, which also ensures that
the tarball only contains logs generated between 2 invocations. Thus,
calling cleanup twice in succession will result in an empty tarball.

This can be seen by running the test locally as
`./tests/bugs/distribute/bug-1042725.t`.

There are a few things in that test that need clarification:
1. Why trap this:
https://github.com/gluster/glusterfs/blob/master/tests/bugs/core/multiplex-limit-issue-151.t#L29
2. Why trap cleanup, rather than invoke it at the end of the test, as is
the norm?

Also, in the merged patch sets 2/4/6/7/8 I had added a cleanup at the
end (as I traced the failure of ./tests/bugs/distribute/bug-1042725.t to
incorrect cleanup by the previous test (or timeout in cleanup)). I did
not do the same in patch set 9.

So, I will post a patch that removes the traps set by this test (so that
we get logs from it) and adds a manual cleanup at the end of the test.
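
In other words, the direction is roughly the following (a minimal sketch
with a stand-in cleanup function; this is not the actual test or
framework code):

#!/bin/bash
# Minimal sketch: call cleanup explicitly instead of adding another exit
# trap on top of the one include.rc already installs. 'cleanup' here is a
# stand-in for the framework routine that tears down state and archives
# logs generated since its previous invocation.
cleanup () {
    echo "cleaning up"
}

cleanup                 # start from a known state
# ... test body would go here ...
cleanup                 # explicit final cleanup; no extra 'trap cleanup EXIT'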

Finally, I do not see an infra bug in this.

(updated the bug as well)

> 
> 
> Thanks
> Mohit Agrawal
> 
> On Sat, Aug 11, 2018 at 9:27 AM, Atin Mukherjee wrote:
> 
> https://build.gluster.org/job/line-coverage/455/consoleFull
> 
> 
> 1 test failed:
> tests/bugs/core/multiplex-limit-issue-151.t (timed out)
> 
> The last job
> https://build.gluster.org/job/line-coverage/454/consoleFull
>  took
> only 21 secs, so we're not anyway near to breaching the threshold of
> the timeout secs. Possibly a hang?
> 
> 
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-11 Thread Shyam Ranganathan
On 08/10/2018 09:59 PM, Shyam Ranganathan wrote:
> Today's patch set is 9 [1].
> 
> Total of 7 runs for line-coverage, mux regressions, centos7 regressions
> are running (some are yet to complete).
> 
> Test failure summary is as follows,
Updating this section
1. ./tests/bugs/glusterd/validating-server-quorum.t (3 cores, 1 retry)
2. ./tests/bugs/core/multiplex-limit-issue-151.t (1 failure, 1 retry)
3.
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
(2 retries)
4. (NEW) ./tests/basic/afr/replace-brick-self-heal.t (1 retry)
5. ./tests/bugs/glusterd/quorum-validation.t (2 retires, 1 core)
6. (NEW) ./tests/bugs/replicate/bug-1408712.t (1 retry) (Ravi looking at it)
7. replace-brick-self-heal.t (1 retry)
8. ./tests/00-geo-rep/georep-basic-dr-rsync.t (1 retry)

> 
> Test output can be found at, [2] and [3]. [2] will be updated as runs
> that are still ongoing complete.

Above is also updated to find the runs where the tests fail.

> 
> Shyam
> [1] Patch set: https://review.gluster.org/c/glusterfs/+/20637/9
> [2] Sheet recording failures:
> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1535799585
> [3] Comment on patch set 9 recording runs till now:
> https://review.gluster.org/c/glusterfs/+/20637#message-07f3886dda133ed642438eb9e82b82d957668e86
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> Deserves a new beginning, threads on the other mail have gone deep enough.
>>
>> NOTE: (5) below needs your attention, rest is just process and data on
>> how to find failures.
>>
>> 1) We are running the tests using the patch [2].
>>
>> 2) Run details are extracted into a separate sheet in [3] named "Run
>> Failures" use a search to find a failing test and the corresponding run
>> that it failed in.
>>
>> 3) Patches that are fixing issues can be found here [1], if you think
>> you have a patch out there, that is not in this list, shout out.
>>
>> 4) If you own up a test case failure, update the spreadsheet [3] with
>> your name against the test, and also update other details as needed (as
>> comments, as edit rights to the sheet are restricted).
>>
>> 5) Current test failures
>> We still have the following tests failing and some without any RCA or
>> attention, (If something is incorrect, write back).
>>
>> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> attention)
>> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Atin)
>> ./tests/bugs/ec/bug-1236065.t (Ashish)
>> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> ./tests/basic/ec/ec-1468261.t (needs attention)
>> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>
>> Here are some newer failures, but mostly one-off failures except cores
>> in ec-5-2.t. All of the following need attention as these are new.
>>
>> ./tests/00-geo-rep/00-georep-verify-setup.t
>> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> ./tests/basic/stats-dump.t
>> ./tests/bugs/bug-1110262.t
>> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> ./tests/basic/ec/ec-data-heal.t
>> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> ./tests/basic/ec/ec-5-2.t
>>
>> 6) Tests that are addressed or are not occurring anymore are,
>>
>> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bitrot/bug-1373520.t
>> ./tests/bugs/distribute/bug-1117851.t
>> ./tests/bugs/glusterd/quorum-validation.t
>> ./tests/bugs/distribute/bug-1042725.t
>> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> ./tests/bugs/quota/bug-1293601.t
>> ./tests/bugs/bug-1368312.t
>> ./tests/bugs/distribute/bug-1122443.t
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>
>> Shyam (and Atin)
>>
>> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>>> Health on master as of the last nightly run [4] is still the same.
&

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Fri, August 9th)

2018-08-11 Thread Shyam Ranganathan
On 08/11/2018 02:09 AM, Atin Mukherjee wrote:
> I saw the same behaviour for
> https://build.gluster.org/job/regression-on-demand-full-run/47/consoleFull
> as well. In both the cases the common pattern is if a test was retried
> but overall the job succeeded. Is this a bug which got introduced
> recently? At the moment, this is blocking us to debug any tests which
> has been retried but the job overall succeeded.
> 
> *01:54:20* Archiving artifacts
> *01:54:21* ‘glusterfs-logs.tgz’ doesn’t match anything
> *01:54:21* No artifacts found that match the file pattern 
> "glusterfs-logs.tgz". Configuration error?
> *01:54:21* Finished: SUCCESS
> 
> I saw the same behaviour for 
> https://build.gluster.org/job/regression-on-demand-full-run/47/consoleFull as 
> well.

This has always been the behavior: the logs are archived only when
run-tests.sh calls out the run as failed. We do not call out a run as a
failure when tests pass on retry, hence no logs.

I will add this today to the WIP testing patchset.

> 
> 
> On Sat, Aug 11, 2018 at 9:40 AM Ravishankar N wrote:
> 
> 
> 
> On 08/11/2018 07:29 AM, Shyam Ranganathan wrote:
> > ./tests/bugs/replicate/bug-1408712.t (one retry)
> I'll take a look at this. But it looks like archiving the artifacts
> (logs) for this run
> 
> (https://build.gluster.org/job/regression-on-demand-full-run/44/consoleFull)
> 
> was a failure.
> Thanks,
> Ravi
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Fri, August 9th)

2018-08-10 Thread Shyam Ranganathan
Today's patch set is 9 [1].

A total of 7 runs across line-coverage, mux regression, and centos7
regression are running (some are yet to complete).

Test failure summary is as follows,
./tests/bugs/glusterd/validating-server-quorum.t (2 cores)
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
(2 retries)
./tests/bugs/replicate/bug-1408712.t (one retry)
./tests/bugs/core/multiplex-limit-issue-151.t (one retry)
./tests/bugs/quick-read/bug-846240.t (one retry)
./tests/00-geo-rep/georep-basic-dr-rsync.t (one retry)

Test output can be found at, [2] and [3]. [2] will be updated as runs
that are still ongoing complete.

Shyam
[1] Patch set: https://review.gluster.org/c/glusterfs/+/20637/9
[2] Sheet recording failures:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1535799585
[3] Comment on patch set 9 recording runs till now:
https://review.gluster.org/c/glusterfs/+/20637#message-07f3886dda133ed642438eb9e82b82d957668e86
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the

Re: [Gluster-devel] Master branch lock down status (Thu, August 09th)

2018-08-09 Thread Shyam Ranganathan
Today's test results are updated in the spreadsheet in sheet named "Run
patch set 8".

I took in patch https://review.gluster.org/c/glusterfs/+/20685, which
caused quite a few failures, so I am not recording the new failures as
issues yet.

Please look at the failures for tests that were retried and passed, as
the logs for the initial runs should be preserved from this run onward.

Otherwise there is nothing else to report on the run status; if you are
averse to spreadsheets, look at this comment in Gerrit [1].

Shyam

[1] Patch set 8 run status:
https://review.gluster.org/c/glusterfs/+/20637/8#message-54de30fa384fd02b0426d9db6d07fad4eeefcf08
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> https://revi

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-09 Thread Shyam Ranganathan
On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> Today's patch set 7 [1], included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
> 
> New failures: (added to the spreadsheet)
> ./tests/bugs/quick-read/bug-846240.t

The above test always fails if a sleep of 10 is added at line 36.

I tried to replicate this in my setup and was able to do so 3 out of 150
times; the failures were the same as the ones reported in the build logs
(snippet below).

Not finding any clear reason for the failure, I delayed the test (i.e.,
added a sleep 10) after the open on M0 to see if a race would be
uncovered, and it was.

Du, I request you to take a look at this, as the test is around
quick-read but involves open-behind as well.

Failure snippet:

23:41:24 [23:41:28] Running tests in file
./tests/bugs/quick-read/bug-846240.t
23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
23:41:28 1..17
23:41:28 ok 1, LINENUM:9
23:41:28 ok 2, LINENUM:10

23:41:28 ok 13, LINENUM:40
23:41:28 not ok 14 , LINENUM:50
23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
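
For anyone trying to reproduce this locally, a brute-force loop along
the following lines approximates the 3-out-of-150 reproduction mentioned
above (the iteration count and output redirection are illustrative):

# Illustrative reproduction loop: rerun the test many times and count
# the failures.
failures=0
for i in $(seq 1 150); do
    ./tests/bugs/quick-read/bug-846240.t >/dev/null 2>&1 || failures=$((failures + 1))
done
echo "failed ${failures}/150 runs"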

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-08 Thread Shyam Ranganathan
Today's patch set 7 [1] included fixes provided till last evening IST,
and its runs can be seen here [2] (yay! we can link to comments in
Gerrit now).

New failures: (added to the spreadsheet)
./tests/bugs/protocol/bug-808400-repl.t (core dumped)
./tests/bugs/quick-read/bug-846240.t

Older tests that had not recurred but failed today (moved up in the
spreadsheet):
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t

Other issues:
Test ./tests/basic/ec/ec-5-2.t dumped core again
A few geo-rep failures; Kotresh should have more logs to look at with
these runs
Test ./tests/bugs/glusterd/quorum-validation.t dumped core again

Atin/Amar, we may need to merge some of the patches that have proven to
hold up and fix issues today, so that we do not leave everything to the
last minute. Please check and move them along, or let me know.

Shyam

[1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
[2] Runs against patch set 7 and its status (incomplete as some runs
have not completed):
https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
(also updated in the spreadsheet)

On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that my
>&g

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:43 AM, Shyam Ranganathan wrote:
> On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
>> For geo-rep test retrials, could you take this instrumentation patch [1]
>> and give it a run?
>> I have tried thrice on the patch, with brick mux enabled and without,
>> but couldn't hit the geo-rep failure. Maybe it is some race that does
>> not happen with the instrumentation patch.
>>
>> [1] https://review.gluster.org/20477
> 
> Will do in my refresh today, thanks.
> 

Kotresh, this run may have the additional logs that you are looking for,
as it is a failed run on one of the geo-rep test cases.

https://build.gluster.org/job/line-coverage/434/consoleFull
19:10:55, 1 test(s) failed
19:10:55, ./tests/00-geo-rep/georep-basic-dr-tarssh.t
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 04:56 AM, Nigel Babu wrote:
> Also, Shyam was saying that in case of retries, the old (failure) logs
> get overwritten by the retries which are successful. Can we disable
> re-trying the .ts when they fail just for this lock down period
> alone so
> that we do have the logs?
> 
> 
> Please don't apply a band-aid. Please fix run-tests.sh so that the second
> run has a -retry attached to the file name, or some such.

Posted patch https://review.gluster.org/c/glusterfs/+/20682, which
achieves this.

I do not like the fact that I use the gluster CLI in run-tests.sh;
alternatives are welcome.

If it looks functionally fine, then I will merge it into the big patch
[1] that we are using to run multiple tests (so that at least we start
getting retry logs from there).

Prior to this I had done this within include.rc and in cleanup, but that
gets invoked twice (at least) per test, and so generated far too many
empty tarballs for no reason.

Also, the change above does not prevent half-complete logs if a test
calls cleanup partway through (as that would create an intermediate
tarball that is overwritten by the last invocation of cleanup).
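
To make the discussion concrete, here is a rough sketch of the idea
(names and paths below are placeholders of mine, not the contents of the
posted patch; hard-coding the log directory is one possible alternative
to calling the gluster CLI):

#!/bin/bash
# Sketch only: tag the retry run's log tarball with a suffix so the first
# (failed) run's logs are preserved. Function and variable names here are
# hypothetical and do not mirror the actual run-tests.sh change.

collect_test_logs () {
    local test_name="$1"   # e.g. tests/bugs/ec/bug-1236065.t
    local suffix="$2"      # "" for the first run, "-retry" for the re-run
    local tarball="/archived-builds/$(basename "$test_name" .t)${suffix}.tgz"

    # Archive whatever the test left behind in the glusterfs log directory.
    tar -czf "$tarball" /var/log/glusterfs 2>/dev/null
}

run_test_with_retry () {
    local test_name="$1"
    if ! prove -v "$test_name"; then
        collect_test_logs "$test_name" ""        # keep the failing run's logs
        prove -v "$test_name"                    # one retry for spurious failures
        collect_test_logs "$test_name" "-retry"  # retry logs get a distinct name
    fi
}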

Shyam

[1] big patch: https://review.gluster.org/c/glusterfs/+/20637
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
> For geo-rep test retrials, could you take this instrumentation patch [1]
> and give it a run?
> I have tried thrice on the patch, with brick mux enabled and without,
> but couldn't hit the geo-rep failure. Maybe it is some race that does
> not happen with the instrumentation patch.
> 
> [1] https://review.gluster.org/20477

Will do in my refresh today, thanks.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> 5) Current test failures
> We still have the following tests failing, some without any RCA or
> attention (if something is incorrect, write back).
> 
> ./tests/bugs/ec/bug-1236065.t (Ashish)

Ashish/Atin, the above test failed in run:
https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull

The above run is based on patchset 4 of
https://review.gluster.org/#/c/20637/4

The logs look as below; as Ashish is unable to reproduce this, and all
failures are on line 78 with an outstanding heal count of 105, this run
may provide some leads for narrowing it down.

The problem seems to be glustershd not connecting to one of the bricks
that is restarted, and hence failing to heal that brick. This also looks
similar to what Ravi RCA'd for the test ./tests/bugs/replicate/bug-1363721.t

==
Test times from: cat ./glusterd.log | grep TEST
[2018-08-06 20:56:28.177386]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script
--wignore volume heal patchy full ++
[2018-08-06 20:56:28.767209]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count
patchy ++
[2018-08-06 20:57:48.957136]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o
13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o
++
==
Repeated connection failure to client-3 in glustershd.log:
[2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig]
0-patchy-client-3: changing port to 49152 (from 0)
[2018-08-06 20:56:30.222738] W [MSGID: 114043]
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed
to set the volume [Resource temporarily unavailable]
[2018-08-06 20:56:30.222788] W [MSGID: 114007]
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed
to get 'process-uuid' from reply dict [Invalid argument]
[2018-08-06 20:56:30.222813] E [MSGID: 114044]
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3:
SETVOLUME on remote-host failed: cleanup flag is set for xlator.  Try
again later [Resource tempor
arily unavailable]
[2018-08-06 20:56:30.222845] I [MSGID: 114051]
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3:
sending CHILD_CONNECTING event
[2018-08-06 20:56:30.222919] I [MSGID: 114018]
[client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from
patchy-client-3. Client process will keep trying to connect to glusterd
until brick's port is
 available
==
Repeated connection messages close to above retries in
d-backends-patchy0.log:
[2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict:
key 'trusted.ec.version' is would not be sent on wire in future [Invalid
argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and
 [2018-08-06 20:56:37.933084]
[2018-08-06 20:56:38.530067] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.540555] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.552494] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy4: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.571701] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.

Re: [Gluster-devel] Test: ./tests/bugs/distribute/bug-1042725.t

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/distribute/bug-1042725.t

The above test fails, I think, due to cleanup not completing after the
previous test's failure.

The failed runs are:
https://build.gluster.org/job/line-coverage/405/consoleFull
https://build.gluster.org/job/line-coverage/415/consoleFull

The logs are similar in both runs: test bug-1042725.t fails to start
glusterd after the previous test,
./tests/bugs/core/multiplex-limit-issue-151.t, has timed out.

I am thinking we also need to increase the cleanup time allowed for
timed-out tests from 5 seconds to 10 seconds to prevent these; thoughts?

This timer:
https://github.com/gluster/glusterfs/blob/master/run-tests.sh#L16
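
As a rough illustration only (variable names and structure below are my
assumptions about the shape of run-tests.sh, not a quote of the code at
the line referenced above), the knob in question is the TERM-to-KILL
grace window passed to timeout(1), during which a timed-out test's own
cleanup can still run:

# Sketch only: names are assumptions, not the actual run-tests.sh code.
run_timeout=200      # per-test timeout (the "timed out after 200 seconds" above)
kill_after_time=10   # proposed bump from 5: cleanup grace window after SIGTERM

run_one_test () {
    local t="$1"
    # A timed-out test first gets SIGTERM (so its cleanup/traps can still run)
    # and is only SIGKILLed after kill_after_time more seconds have passed.
    timeout -k "$kill_after_time" "$run_timeout" prove -v "$t"
    if [ $? -eq 124 ]; then
        echo "$t timed out after $run_timeout seconds"
    fi
}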

Logs look as follows:
16:24:48

16:24:48 [16:24:51] Running tests in file
./tests/bugs/core/multiplex-limit-issue-151.t
16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
200 seconds
16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t: bad status 124
16:28:08
16:28:08*
16:28:08*   REGRESSION FAILED   *
16:28:08* Retrying failed tests in case *
16:28:08* we got some spurious failures *
16:28:08*
16:28:08
16:31:28 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
200 seconds
16:31:28 End of test ./tests/bugs/core/multiplex-limit-issue-151.t
16:31:28

16:31:28
16:31:28
16:31:28

16:31:28 [16:31:31] Running tests in file
./tests/bugs/distribute/bug-1042725.t
16:32:35 ./tests/bugs/distribute/bug-1042725.t ..
16:32:35 1..16
16:32:35 Terminated
16:32:35 not ok 1 , LINENUM:9
16:32:35 FAILED COMMAND: glusterd
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Master branch lock down status

2018-08-07 Thread Shyam Ranganathan
Deserves a new beginning, threads on the other mail have gone deep enough.

NOTE: (5) below needs your attention, rest is just process and data on
how to find failures.

1) We are running the tests using the patch [2].

2) Run details are extracted into a separate sheet in [3] named "Run
Failures"; use a search to find a failing test and the corresponding run
that it failed in.

3) Patches that fix issues can be found here [1]; if you think you have
a patch out there that is not in this list, shout out.

4) If you take ownership of a test case failure, update the spreadsheet
[3] with your name against the test, and also update other details as
needed (as comments, since edit rights to the sheet are restricted).

5) Current test failures
We still have the following tests failing, some without any RCA or
attention (if something is incorrect, write back).

./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
attention)
./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
(Atin)
./tests/bugs/ec/bug-1236065.t (Ashish)
./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
./tests/basic/ec/ec-1468261.t (needs attention)
./tests/basic/afr/add-brick-self-heal.t (needs attention)
./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
./tests/bugs/glusterd/validating-server-quorum.t (Atin)
./tests/bugs/replicate/bug-1363721.t (Ravi)

Here are some newer failures, mostly one-off failures except for cores
in ec-5-2.t. All of the following need attention, as these are new.

./tests/00-geo-rep/00-georep-verify-setup.t
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
./tests/basic/stats-dump.t
./tests/bugs/bug-1110262.t
./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
./tests/basic/ec/ec-data-heal.t
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
./tests/basic/ec/ec-5-2.t

6) Tests that are addressed or are not occurring anymore are,

./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bitrot/bug-1373520.t
./tests/bugs/distribute/bug-1117851.t
./tests/bugs/glusterd/quorum-validation.t
./tests/bugs/distribute/bug-1042725.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bugs/quota/bug-1293601.t
./tests/bugs/bug-1368312.t
./tests/bugs/distribute/bug-1122443.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t

Shyam (and Atin)

On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> Health on master as of the last nightly run [4] is still the same.
> 
> Potential patches that rectify the situation (as in [1]) are bunched in
> a patch [2] that Atin and myself have put through several regressions
> (mux, normal and line coverage) and these have also not passed.
> 
> Till we rectify the situation we are locking down master branch commit
> rights to the following people, Amar, Atin, Shyam, Vijay.
> 
> The intention is to stabilize master and not add more patches that may
> destabilize it.
> 
> Test cases that are tracked as failures and need action are present here
> [3].
> 
> @Nigel, request you to apply the commit rights change as you see this
> mail and let the list know regarding the same as well.
> 
> Thanks,
> Shyam
> 
> [1] Patches that address regression failures:
> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
> 
> [2] Bunched up patch against which regressions were run:
> https://review.gluster.org/#/c/20637
> 
> [3] Failing tests list:
> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
> 
> [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 02:58 PM, Yaniv Kaul wrote:
> The intention is to stabilize master and not add more patches that may
> destabilize it.
> 
> 
> https://review.gluster.org/#/c/20603/ has been merged.
> As far as I can see, it has nothing to do with stabilization and should
> be reverted.

Posted this on the gerrit review as well:


4.1 does not have nightly tests; those run on master only.

Stability of master does not (and will not), in the near term, guarantee
stability of release branches, unless patches that impact code already
on release branches get fixed on master and are backported.

Release branches get fixes backported (as is normal); this fix and its
merge should not impact current master stability in any way, nor the
stability of the 4.1 branch.


The current hold is on master, not on release branches. I agree that
merging further code changes on release branches (for example, the
geo-rep backports in [1], as those tests fail regularly on master) may
further destabilize the release branch. This patch is not one of those.

Merging patches on release branches is done by release owners only, and
the usual practice is to keep the backlog low (merging weekly), as per
the dashboard [1].

Given the above two reasons, this patch was found to be,
- not on master
- neither stabilizing nor destabilizing the release branch
and hence was merged.

If maintainers disagree I can revert the same.

Shyam

[1] Release 4.1 dashboard:
https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:4-1-dashboard
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-05 Thread Shyam Ranganathan
On 07/31/2018 07:16 AM, Shyam Ranganathan wrote:
> On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
>> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>>> 1) master branch health checks (weekly, till branching)
>>>   - Expect every Monday a status update on various tests runs
>> See https://build.gluster.org/job/nightly-master/ for a report on
>> various nightly and periodic jobs on master.
> Thinking aloud, we may have to stop merges to master to get these test
> failures addressed at the earliest and to continue maintaining them
> GREEN for the health of the branch.
> 
> I would give the above a week, before we lockdown the branch to fix the
> failures.
> 
> Let's try to get line-coverage and nightly regression tests addressed
> this week (leaving mux-regression open); if they are addressed, we will
> not lock the branch down.
> 

Health on master as of the last nightly run [4] is still the same.

Potential patches that rectify the situation (as in [1]) are bunched in
a patch [2] that Atin and myself have put through several regressions
(mux, normal and line coverage) and these have also not passed.

Till we rectify the situation we are locking down master branch commit
rights to the following people, Amar, Atin, Shyam, Vijay.

The intention is to stabilize master and not add more patches that may
destabilize it.

Test cases that are tracked as failures and need action are present here
[3].

@Nigel, request you to apply the commit rights change as you see this
mail and let the list know regarding the same as well.

Thanks,
Shyam

[1] Patches that address regression failures:
https://review.gluster.org/#/q/starredby:srangana%2540redhat.com

[2] Bunched up patch against which regressions were run:
https://review.gluster.org/#/c/20637

[3] Failing tests list:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing

[4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Nightly test failures tracking

2018-08-02 Thread Shyam Ranganathan
On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> 1) master branch health checks (weekly, till branching)
>   - Expect every Monday a status update on various tests runs

As we have quite a few jobs and tests failing, I have created the sheet
in [1] to enable better tracking.

Atin and I will keep this updated. If anyone is working on a test case
failure, add your name as a comment to the "Owner" cell, and if there is
a bug filed, do the same to the BZ# cell.

Newer failures or additions will be added to the sheet and, in addition,
posted to this thread for contributors to pick up and analyze.

The current list of tests is as follows (some of which you are already
looking at),
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
./tests/00-geo-rep/georep-basic-dr-tarssh.t
./tests/bugs/bug-1368312.t
./tests/bugs/distribute/bug-1122443.t
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bitrot/bug-1373520.t
./tests/bugs/ec/bug-1236065.t
./tests/00-geo-rep/georep-basic-dr-rsync.t
./tests/basic/ec/ec-1468261.t
./tests/bugs/glusterd/quorum-validation.t
./tests/bugs/quota/bug-1293601.t
./tests/basic/afr/add-brick-self-heal.t
./tests/basic/afr/granular-esh/replace-brick.t
./tests/bugs/core/multiplex-limit-issue-151.t
./tests/bugs/distribute/bug-1042725.t
./tests/bugs/distribute/bug-1117851.t
./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t

Thanks.

[1] Test failures tracking:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-1432542-mpx-restart-crash.t failures

2018-08-02 Thread Shyam Ranganathan
On 08/01/2018 11:10 PM, Nigel Babu wrote:
> Hi Shyam,
> 
> Amar and I sat down to debug this failure[1] this morning. There was a
> bit of fun looking at the logs. It looked like the test restarted
> itself. The first log entry is at 16:20:03. This test has a timeout of
> 400 seconds which is around 16:26:43.
> 
> However, if you account for the fact that we log from the second step or
> so, it looks like the test timed out and we restarted it. The first log
> entry is from a few steps in, this makes sense. I think your patch[2] to
> increase the timeout to 800 seconds is the right way forward.
> 
> The last step before the timeout is this
> [2018-07-30 16:26:29.160943]  : volume stop patchy-vol17 : SUCCESS
> [2018-07-30 16:26:40.222688]  : volume delete patchy-vol17 : SUCCESS
> 
> There are 20 volumes, so it really needs at least a 90 second bump. I'm
> estimating 30 seconds per volume to clean up. You probably want to add
> some extra time so it passes on lcov as well. So right now the 800
> second cleanup looks good.

Unfortunately, the timeout bump still does not clear lcov; see:
https://build.gluster.org/job/line-coverage/401/console
https://build.gluster.org/job/line-coverage/400/console
https://build.gluster.org/job/line-coverage/406/console

The first test passes, then as a part of the full run it fails again.

The patch also pushes the EXPECT_WITHIN timeout up to 120 seconds... :(
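
For reference, EXPECT_WITHIN is the test-framework helper (from
tests/include.rc) that keeps re-running a check until its output matches
or the timeout expires. Below is a minimal fragment of the kind of check
whose timeout is being raised, shown only as an illustration (the volume
name and include paths depend on the actual .t file):

# Fragment, not a complete test.
. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

HEAL_TIMEOUT=120   # raised so slower (e.g. lcov-instrumented) runs still pass

# Re-check roughly once a second until get_pending_heal_count prints "0"
# or HEAL_TIMEOUT seconds elapse; only then is the step marked as failed.
EXPECT_WITHIN $HEAL_TIMEOUT "0" get_pending_heal_count $V0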

> 
> [1]: https://build.gluster.org/job/regression-test-burn-in/4051/
> [2]: https://review.gluster.org/#/c/20568/2
> -- 
> nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
Below is a summary of failures over the last 7 days on the nightly
health check jobs. This is one test per line, sorted in descending order
of occurrence (IOW, most frequent failure is on top).

The list includes spurious failures as well (IOW, tests that passed on a
retry). This is because, if we do not weed out the spurious errors,
failures may persist and make it difficult to gauge the health of the
branch.

The numbers at the end of each test line are the Jenkins job numbers
where it failed. The job number ranges are as follows,
- https://build.gluster.org/job/regression-test-burn-in/ ID: 4048 - 4053
- https://build.gluster.org/job/line-coverage/ ID: 392 - 407
- https://build.gluster.org/job/regression-test-with-multiplex/ ID: 811
- 817

So to get to job 4051 (say), use the link
https://build.gluster.org/job/regression-test-burn-in/4051

Atin has called out some folks for attention to specific tests; consider
this a call-out to everyone else as well: if you see a test against your
component, help with root-causing and fixing it is needed.

tests/bugs/core/bug-1432542-mpx-restart-crash.t, 4049, 4051, 4052, 405,
404, 403, 396, 392

tests/00-geo-rep/georep-basic-dr-tarssh.t, 811, 814, 817, 4050, 4053

tests/bugs/bug-1368312.t, 815, 816, 811, 813, 403

tests/bugs/distribute/bug-1122443.t, 4050, 407, 403, 815, 816

tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t,
814, 816, 817, 812, 815

tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t,
4049, 812, 814, 405, 392

tests/bitrot/bug-1373520.t, 811, 816, 817, 813

tests/bugs/ec/bug-1236065.t, 812, 813, 815

tests/00-geo-rep/georep-basic-dr-rsync.t, 813, 4046

tests/basic/ec/ec-1468261.t, 817, 812

tests/bugs/glusterd/quorum-validation.t, 4049, 407

tests/bugs/quota/bug-1293601.t, 811, 812

tests/basic/afr/add-brick-self-heal.t, 407

tests/basic/afr/granular-esh/replace-brick.t, 392

tests/bugs/core/multiplex-limit-issue-151.t, 405

tests/bugs/distribute/bug-1042725.t, 405

tests/bugs/distribute/bug-1117851.t, 405

tests/bugs/glusterd/rebalance-operations-in-single-node.t, 405

tests/bugs/index/bug-1559004-EMLINK-handling.t, 405

tests/bugs/replicate/bug-1386188-sbrain-fav-child.t, 4048

tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t, 813  


Thanks,
Shyam


On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various tests runs
> 
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.
> 
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
> 
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
> 
> 2. Regression with multiplex (cores and test failures)
> 
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
> 
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
> 
> GREEN:
> 1. cpp-check
> 2. RPM builds
> 
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
> 
> Shyam
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
On 07/31/2018 12:41 PM, Atin Mukherjee wrote:
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
> 400 secs. Refer
> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> Wasn't timing out as frequently as it was till 12 July. But since 27
> July, it has timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient (Mohit?)

The above test is the one that is causing line coverage to fail as well
(roughly 50% of the time).

I did put this patch up to increase timeouts and ran a few rounds of
tests, but the results are mixed: the test passes when run first, and
later errors out in other places (although it does not time out).

See: https://review.gluster.org/#/c/20568/2 for the changes and test run
details.

The failure of this test in regression-test-burn-in run #4051 is again
strange; it looks like the test completed within the stipulated time,
but restarted after cleanup_func was invoked.

Digging a little further, the way cleanup_func and traps are used in
this test seems *interesting*, and may need a closer look to arrive at
possible issues here.
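
To make the concern concrete, the pattern being questioned looks roughly
like the sketch below (simplified and hypothetical, not the actual test
code). A trap-registered cleanup that does heavy per-volume teardown runs
on top of the framework's own cleanup, and depending on when and how a
timed-out test gets killed, the resulting log output can be confusing to
read, e.g. appearing as if the test restarted:

# Hypothetical sketch of a trap-based cleanup in a .t test.
cleanup_func () {
    # Stopping and deleting 20 volumes can take minutes, all of it after
    # the last TEST line of the script has already run.
    for i in $(seq 1 20); do
        gluster --mode=script volume stop "patchy-vol$i" force
        gluster --mode=script volume delete "patchy-vol$i"
    done
}
trap cleanup_func EXIT   # registered to run when the script exits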

@Mohit, request you to take a look at the line coverage failures as
well, as you handle the failures in this test.

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-01 Thread Shyam Ranganathan
On 08/01/2018 12:13 AM, Sankarshan Mukhopadhyay wrote:
>> Thinking aloud, we may have to stop merges to master to get these test
>> failures addressed at the earliest and to continue maintaining them
>> GREEN for the health of the branch.
>>
>> I would give the above a week, before we lockdown the branch to fix the
>> failures.
>>
> Is 1 week a sufficient estimate to address the issues?
> 

Branching is Aug 20th, so I would say an Aug 6th lockdown decision is
already a little late; also, once we get this going, it should be
possible to maintain branch health going forward. So taking a blocking
stance at this juncture is probably for the best.

Having said that, I am also proposing that we get the CentOS 7
regressions and lcov GREEN by this time, giving mux a week more to get
its stability in place. This is due to my belief that mux may take a bit
longer than the other two (IOW, addressing the sufficiency clause in the
concern raised above).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke test may fail for older changes, rebase needed

2018-08-01 Thread Shyam Ranganathan
On 07/31/2018 02:12 AM, Niels de Vos wrote:
> On Mon, Jul 30, 2018 at 02:44:57PM -0400, Shyam Ranganathan wrote:
>> On 07/28/2018 12:45 PM, Niels de Vos wrote:
>>> On Sat, Jul 28, 2018 at 03:37:46PM +0200, Niels de Vos wrote:
>>>> This Friday argp-standalone got installed on the FreeBSD Jenkins
>>>> slave(s). With the library available, we can now drop the bundled and
>>>> unmaintained contrib/argp-standlone/ from our glusterfs sources.
>>>>
>>>> Unfortunately building on FreeBSD fails if the header/library is
>>>> installed. This has been corrected with https://review.gluster.org/20581
>>>> but that means changes posted in Gerrit may need a rebase to include the
>>>> fix for building on FreeBSD.
>>>>
>>>> I think I have rebased all related changes that did not have negative
>>>> comments asking for corrections/improvement. In case I missed a change,
>>>> please rebase your patch so the smoke test runs again.
>>>>
>>>> Sorry for any inconvenience that this caused,
>>>> Niels
>>>
>>> It just occured to me that the argp-standalone installation also affects
>>> the release-4.1 and release-3.12 branches. Jiffin, Shyam, do you want to
>>> cherry-pick https://review.gluster.org/20581 to fix that, or do you
>>> prefer an alternative that always uses the bundled version of the
>>> library?
>>
>> The outcome is to get existing maintained release branches building and
>> working on FreeBSD, would that be correct?
> 
> 'working' in the way that they were earlier. I do not know of any
> (automated or manual) tests that verify the correct functioning. It is
> build tested only. I think.
> 
>> If so I think we can use the cherry-picked version, the changes seem
>> mostly straight forward, and it is possibly easier to maintain.
> 
> It is straight forward, but does add a new requirement on a library that
> should get installed on the system. This is not something that we
> normally allow during a stable release.
> 
>> Although, I have to ask, what is the downside of not taking it in at
>> all? If it is just FreeBSD, then can we live with the same till release-
>> is out?
> 
> Yes, it is 'just' FreeBSD build testing. Users should still be able to
> build the stable releases on FreeBSD as long as they do not install
> argp-standalone. In that case the bundled version will be used as the
> stable releases still have that in their tree.
> 
> If the patch does not get merged, it will cause the smoke tests on
> FreeBSD to fail. As Nigel mentions, it is possible to disable this test
> for the stable branches.
> 
> An alternative would be to fix the build process, and optionally use the
> bundled library in case it is not installed on the system. This is what
> we normally would have done, but it seems to have been broken in the
> case of FreeBSD + argp-standalone.

Based on the above reasoning, I would suggest that we do not backport
this to the release branches, disable the FreeBSD job on them, and, if
possible, enable it for the next release (5).

Objections?

> 
> Niels
> 
> 
>> Finally, thanks for checking as the patch is not a simple bug-fix backport.
>>
>>>
>>> Niels
>>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-07-31 Thread Shyam Ranganathan
On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various tests runs
> 
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.

Thinking aloud, we may have to stop merges to master to get these test
failures addressed at the earliest and to continue maintaining them
GREEN for the health of the branch.

I would give the above a week, before we lockdown the branch to fix the
failures.

Let's try to get line-coverage and nightly regression tests addressed
this week (leaving mux-regression open); if they are addressed, we will
not lock the branch down.

> 
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
> 
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
> 
> 2. Regression with multiplex (cores and test failures)
> 
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
> 
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
> 
> GREEN:
> 1. cpp-check
> 2. RPM builds
> 
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
> 
> Shyam
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

2018-07-30 Thread Shyam Ranganathan
On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> 1) master branch health checks (weekly, till branching)
>   - Expect every Monday a status update on various tests runs

See https://build.gluster.org/job/nightly-master/ for a report on
various nightly and periodic jobs on master.

RED:
1. Nightly regression (3/6 failed)
- Tests that reported failure:
./tests/00-geo-rep/georep-basic-dr-rsync.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bugs/distribute/bug-1122443.t

- Tests that needed a retry:
./tests/00-geo-rep/georep-basic-dr-tarssh.t
./tests/bugs/glusterd/quorum-validation.t

2. Regression with multiplex (cores and test failures)

3. line-coverage (cores and test failures)
- Tests that failed:
./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
https://review.gluster.org/20568 does not fix the timeout entirely, as
can be seen in this run,
https://build.gluster.org/job/line-coverage/401/consoleFull )

Calling out to contributors to take a look at various failures, and post
the same as bugs AND to the lists (so that duplication is avoided) to
get this to a GREEN status.

GREEN:
1. cpp-check
2. RPM builds

IGNORE (for now):
1. clang scan (@nigel, this job requires clang warnings to be fixed to
go green, right?)

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke test may fail for older changes, rebase needed

2018-07-30 Thread Shyam Ranganathan
On 07/28/2018 12:45 PM, Niels de Vos wrote:
> On Sat, Jul 28, 2018 at 03:37:46PM +0200, Niels de Vos wrote:
>> This Friday argp-standalone got installed on the FreeBSD Jenkins
>> slave(s). With the library available, we can now drop the bundled and
>> unmaintained contrib/argp-standlone/ from our glusterfs sources.
>>
>> Unfortunately building on FreeBSD fails if the header/library is
>> installed. This has been corrected with https://review.gluster.org/20581
>> but that means changes posted in Gerrit may need a rebase to include the
>> fix for building on FreeBSD.
>>
>> I think I have rebased all related changes that did not have negative
>> comments asking for corrections/improvement. In case I missed a change,
>> please rebase your patch so the smoke test runs again.
>>
>> Sorry for any inconvenience that this caused,
>> Niels
> 
> It just occured to me that the argp-standalone installation also affects
> the release-4.1 and release-3.12 branches. Jiffin, Shyam, do you want to
> cherry-pick https://review.gluster.org/20581 to fix that, or do you
> prefer an alternative that always uses the bundled version of the
> library?

The outcome is to get existing maintained release branches building and
working on FreeBSD; would that be correct?

If so, I think we can use the cherry-picked version; the changes seem
mostly straightforward, and it is possibly easier to maintain.

Although, I have to ask, what is the downside of not taking it in at
all? If it is just FreeBSD, then can we live with the same till release-
is out?

Finally, thanks for checking as the patch is not a simple bug-fix backport.

> 
> Niels
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 23rd July)

2018-07-27 Thread Shyam Ranganathan
On 07/26/2018 12:53 AM, Nigel Babu wrote:
> 3) bug-1432542-mpx-restart-crash.t times out consistently:
> https://bugzilla.redhat.com/show_bug.cgi?id=1608568
> 
> @nigel is there a way to on-demand request lcov tests through gerrit? I
> am thinking of pushing a patch that increases the timeout and check if
> it solves the problem for this test as detailed in the bug.
> 
> 
> You should have access to trigger the job from Jenkins. Does that work
> for now?

Thanks Nigel.

After fixing up the Jenkins job to run against a pending commit in
Gerrit and tweaking one more timeout value, this test has passed in the
lcov run (see [1]; the run is still in progress, but the previously
failing test has already passed).

@Mohit/@Sanju, this is a mux test, and increasing timeouts seems to do
the trick, but I am not quite happy with the situation. Can you take a
look and see where the (extra) time is being spent, and why?

The other test has also passed in the nightly regressions, after the fix
in sdfs. So with this we should get back to GREEN on the line-coverage
nightly runs.

[1] line-coverage test run: https://build.gluster.org/job/line-coverage/401
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 5: Master branch health report (Week of 23rd July)

2018-07-25 Thread Shyam Ranganathan
On 07/25/2018 04:18 PM, Shyam Ranganathan wrote:
> 2) glusterd crash in test sdfs-sanity.t:
> https://bugzilla.redhat.com/show_bug.cgi?id=1608566
> 
> glusterd folks, request you to take a look to correct this.

I persisted with this a little longer, and the fix is posted at
https://review.gluster.org/#/c/20565/ (reviews welcome).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

