Re: [Gluster-Maintainers] Release 5: New option noatime

2018-09-27 Thread Amar Tumballi
On Thu, Sep 27, 2018 at 6:40 PM Shyam Ranganathan 
wrote:

> On 09/27/2018 09:08 AM, Shyam Ranganathan wrote:
> > Writing this to solicit opinions on merging this [1] change that
> > introduces an option late in the release cycle.
> >
> > I went through the code, and most changes are standard option handling
> > and basic xlator scaffolding, other than the change in posix xlator code
> > that handles the flag to not set atime and the code in utime xlator that
> > conditionally sets the flag. (of which IMO the latter is more important
> > than the former, as posix is just acting on the flag).
> >
> > If enabled, the option would not update atime for the following FOPs:
> > opendir, open, and read. It would continue updating atime for fallocate
> > and zerofill (which also update mtime, so the AFR self-heal on time
> > change would kick in anyway).
> >
> > As the option's name suggests, with it enabled atime is almost
> > meaningless, and hence it almost does not matter where we update it and
> > where we do not. Considering just the problem where atime changes cause
> > AFR to trigger a heal, and that the FOPs above which strictly change
> > only atime are handled by this option, I see this as functionally
> > workable.
> >
>

Thanks for all these details, Shyam! They help many of us understand what
the feature is.


> > So IMO we can accept this even though it is late, but would like to hear
> > from others if this needs to be deferred till release-6.
> >
>

I am all for accepting this for glusterfs-5.0, for two reasons. First, in
one of the quick setups we tried, it helped Elasticsearch run smoothly on
glusterfs mounts. Second, we heard from Anuradha/Ram in another email
thread (Cloudsync with AFR) that it helped them solve their issue.

This particular patch greatly reduces the overhead of the ctime feature!

-Amar


> > Shyam
>
> [1] Patch under review: https://review.gluster.org/c/glusterfs/+/21281


-- 
Amar Tumballi (amarts)
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Amar Tumballi
Top posting as I am not trying to answer any individual points!

It is my wish that we don't get into a lock-down state! But there may be
times when it is needed. My take is that we go with an approach which works
for the majority of cases, and after we have been through it once or twice,
do another retrospective of the events that happened during the lock-down
and improve further. Planning too much for the future won't get us much
value at this time. We have bi-weekly maintainer meetings where we can
propose changes and get to solutions. None of this is written in stone, so
let's move on :-)

-Amar


On Thu, Sep 27, 2018 at 8:18 PM Shyam Ranganathan 
wrote:

> On 09/27/2018 10:05 AM, Atin Mukherjee wrote:
> > Now does this mean we block commit rights for component Y till
> > we have the root cause?
> >
> >
> > It was a way of making it someone's priority. If you have another
> > way to make it someone's priority that is better than this, please
> > suggest and we can have a discussion around it and agree on it :-).
> >
> >
> > This is what I can think of:
> >
> > 1. Component peers/maintainers take a first triage of the test failure:
> > do the initial debugging and (a) point to the component which needs
> > further debugging, or (b) seek help on the gluster-devel ML for
> > additional insight into identifying the problem and narrowing it down
> > to a component.
> > 2. If it’s (1a), then we already know the component and the owner. If
> > it’s (1b), at this juncture it is all maintainers’ responsibility to
> > ensure the email is well understood and, based on the available
> > details, the ownership is picked up by the respective maintainers. It
> > may also be that multiple maintainers have to be involved, which is why
> > I see this as a group effort rather than an individual one.
>
> In my thinking, acting as a group here is better than making it a
> sub-group's or individual's responsibility, which Atin has (IMO) put
> forth well. Thus, taking merge rights away from everyone (of course some
> still need to have it) and getting the situation addressed is the better
> approach.


-- 
Amar Tumballi (amarts)
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


[Gluster-Maintainers] Build failed in Jenkins: regression-test-with-multiplex #876

2018-09-27 Thread jenkins
See 


Changes:

[Amar Tumballi] monitoring: create dump dir if it doesn't exist

[Amar Tumballi] rpc: failed requests immediately if rpc connection is down

[Amar Tumballi] python3: assume python3 unless building _packages_ on sys 
without py3

[Amar Tumballi] libglusterfs : fix coverity issue in store.c

--
[...truncated 1.04 MB...]
#4  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 13 (Thread 0x7f9f9e7fc700 (LWP 1289)):
#0  0x7f9fed17fd42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
No symbol table info available.
#1  0x7f9fe09dfbb2 in janitor_get_next_fd (this=0x7f9fa946db30) at 
:1353
priv = 0x7f9faa317c20
pfd = 0x0
timeout = {tv_sec = 1538076785, tv_nsec = 0}
#2  0x7f9fe09dfd66 in posix_janitor_thread_proc (data=0x7f9fa946db30) at 
:1401
this = 0x7f9fa946db30
priv = 0x7f9faa317c20
pfd = 0x0
now = 1538076775
__FUNCTION__ = "posix_janitor_thread_proc"
#3  0x7f9fed17be25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 12 (Thread 0x7f9f9effd700 (LWP 1288)):
#0  0x7f9feca0b56d in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9feca0b404 in sleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f9fe09e13e9 in posix_health_check_thread_proc (data=0x7f9fa946db30) 
at 
:1922
this = 0x7f9fa946db30
priv = 0x7f9faa317c20
interval = 30
ret = -1
top = 0x0
victim = 0x0
trav_p = 0x0
count = 0
victim_found = false
ctx = 0xd73010
__FUNCTION__ = "posix_health_check_thread_proc"
#3  0x7f9fed17be25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 11 (Thread 0x7f9fbd1f8700 (LWP 1287)):
#0  0x7f9feca0b56d in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9feca0b404 in sleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f9fe09e1da4 in posix_disk_space_check_thread_proc 
(data=0x7f9fa946db30) at 
:2107
this = 0x7f9fa946db30
priv = 0x7f9faa317c20
interval = 5
ret = 0
__FUNCTION__ = "posix_disk_space_check_thread_proc"
#3  0x7f9fed17be25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 10 (Thread 0x7f9f9f7fe700 (LWP 1285)):
#0  0x7f9feca3bc73 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9fdb705a85 in changelog_ev_dispatch (data=0x7f9fa9dd1858) at 
:352
ret = 3
opaque = 0x0
this = 0x7f9fa9472870
c_clnt = 0x7f9fa9dd1858
tv = {tv_sec = 0, tv_usec = 87051}
__FUNCTION__ = "changelog_ev_dispatch"
#2  0x7f9fed17be25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 9 (Thread 0x7f9f9700 (LWP 1284)):
#0  0x7f9feca3bc73 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9fdb705a85 in changelog_ev_dispatch (data=0x7f9fa9dd1858) at 
:352
ret = 3
opaque = 0x0
this = 0x7f9fa9472870
c_clnt = 0x7f9fa9dd1858
tv = {tv_sec = 0, tv_usec = 87048}
__FUNCTION__ = "changelog_ev_dispatch"
#2  0x7f9fed17be25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x7f9feca44bad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7f9fbdafa700 (LWP 1283)):
#0  0x7f9fed17f995 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
No symbol table info available.
#1  0x7f9fdb705473 in changelog_ev_connector (data=0x7f9fa9dd1858) at 

Re: [Gluster-Maintainers] [for discussion] suggestions around improvements in bug triage workflow

2018-09-27 Thread Atin Mukherjee
On Thu, 27 Sep 2018 at 20:37, Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> The origin of this conversation is a bit of a hall-way discussion with
> Shyam. The actual matter should be familiar to maintainers. For what
> it is worth, it was also mentioned at the recent Community meeting.
>
> As the current workflows go, once a release is made generally
> available, a large swathe of bugs against an EOLd release are
> automatically closed citing that "the release is EOLd and if the bug
> is still reproducible on later releases, please reopen against those".
> However, there is perhaps a better way to handle this:


I will play the devil’s advocate role here, but there are additional
questions we need to ask ourselves:

- Why are we getting into a state where so many bugs, primarily the ones
which haven’t got development’s attention, get auto-closed due to EOL?
- Doesn’t this indicate we’re actually piling up our backlog with
(probably) genuine defects and not taking enough action?

Bugzilla triage needs to become a habit for individuals so that new bugs
get attention; once that happens, this will technically no longer be a
problem.

However, for now I think this workflow sounds like the right measure, at
least to ensure we don’t close a genuine defect.


>
> [0] clone the bug into master so that it continues to be part of a
> valid bug backlog
>
> [1] validate per release that the circumstances described by the bug
> are actually resolved and hence CLOSED CURRENTRELEASE them
>
> I am posting here for discussion around this as well as being able to
> identify whether tooling/automation can be used to handle some of
> this.
>
>
>
> --
> sankarshan mukhopadhyay
> 
-- 
- Atin (atinm)
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


[Gluster-Maintainers] [for discussion] suggestions around improvements in bug triage workflow

2018-09-27 Thread Sankarshan Mukhopadhyay
The origin of this conversation is a bit of a hall-way discussion with
Shyam. The actual matter should be familiar to maintainers. For what
it is worth, it was also mentioned at the recent Community meeting.

As the current workflows go, once a release is made generally
available, a large swathe of bugs against an EOLd release are
automatically closed citing that "the release is EOLd and if the bug
is still reproducible on later releases, please reopen against those".
However, there is perhaps a better way to handle this:

[0] clone the bug into master so that it continues to be part of a
valid bug backlog

[1] validate per release that the circumstances described by the bug
are actually resolved and hence CLOSED CURRENTRELEASE them

I am posting here for discussion around this as well as being able to
identify whether tooling/automation can be used to handle some of
this.
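
As a starting point on the tooling side, here is a rough sketch of step
[0] against the Bugzilla REST API; the Bugzilla URL, product and version
strings, and API key handling below are assumptions to be adjusted, not a
finished tool:

#!/usr/bin/env python3
# Rough sketch of workflow step [0]: clone still-open bugs filed against an
# EOLd release over to master instead of auto-closing them. The Bugzilla
# URL, product name, and version strings are assumptions for illustration.
import requests

BZ = "https://bugzilla.redhat.com/rest"
API_KEY = "REPLACE_WITH_API_KEY"
PRODUCT = "GlusterFS"
EOL_VERSION = "4.0"                # the release being EOLd (assumed)

# Find bugs still open against the EOLd version.
resp = requests.get(f"{BZ}/bug", params={
    "api_key": API_KEY,
    "product": PRODUCT,
    "version": EOL_VERSION,
    "status": ["NEW", "ASSIGNED"],
    "include_fields": ["id", "summary", "component"],
})
resp.raise_for_status()

for bug in resp.json().get("bugs", []):
    # Create a clone against master/mainline so the report stays in the
    # backlog; required create fields may differ per Bugzilla instance.
    clone = {
        "product": PRODUCT,
        "component": bug["component"],
        "version": "mainline",     # assumed name of the master version
        "summary": bug["summary"],
        "description": ("Cloned from bug %d (filed against %s, which is "
                        "being EOLd).") % (bug["id"], EOL_VERSION),
    }
    created = requests.post(f"{BZ}/bug", params={"api_key": API_KEY},
                            json=clone)
    created.raise_for_status()
    print("bug %d -> clone %s" % (bug["id"], created.json().get("id")))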



-- 
sankarshan mukhopadhyay

___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Shyam Ranganathan
On 09/27/2018 10:05 AM, Atin Mukherjee wrote:
> Now does this mean we block commit rights for component Y till
> we have the root cause? 
> 
> 
> It was a way of making it someone's priority. If you have another
> way to make it someone's priority that is better than this, please
> suggest and we can have a discussion around it and agree on it :-).
> 
> 
> This is what I can think of:
> 
> 1. Component peers/maintainers take a first triage of the test failure:
> do the initial debugging and (a) point to the component which needs
> further debugging, or (b) seek help on the gluster-devel ML for
> additional insight into identifying the problem and narrowing it down
> to a component.
> 2. If it’s (1a), then we already know the component and the owner. If
> it’s (1b), at this juncture it is all maintainers’ responsibility to
> ensure the email is well understood and, based on the available
> details, the ownership is picked up by the respective maintainers. It
> may also be that multiple maintainers have to be involved, which is why
> I see this as a group effort rather than an individual one.

In my thinking, acting as a group here is better than making it a
sub-group's or individual's responsibility, which Atin has (IMO) put forth
well. Thus, taking merge rights away from everyone (of course some still
need to have it) and getting the situation addressed is the better
approach.
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Atin Mukherjee
On Thu, 27 Sep 2018 at 18:27, Pranith Kumar Karampuri 
wrote:

>
>
> On Thu, Sep 27, 2018 at 5:27 PM Atin Mukherjee 
> wrote:
>
>> tests/bugs//xxx.t failing can’t always mean there’s a bug in
>> component Y.
>>
>
> I agree.
>
>
>> It could be anywhere till we root cause the problem.
>>
>
> Someone needs to step in to find out what the root cause is. I agree that
> for a component like glusterd, bugs in other components can easily lead
> to failures. How do we make sure that someone takes a look at it?
>
>
>> Now does this mean we block commit rights for component Y till we have
>> the root cause?
>>
>
> It was a way of making it someone's priority. If you have another way to
> make it someone's priority that is better than this, please suggest and we
> can have a discussion around it and agree on it :-).
>

This is what I can think of:

1. Component peers/maintainers take a first triage of the test failure. Do
the initial debugging and (a) point to the component which needs further
debugging or (b) seek for help at gluster-devel ML for additional insight
for identifying the problem and narrowing down to a component.
2. If it’s (1 a) then we already know the component and the owner. If it’s
(2 b) at this juncture, it’s all maintainers responsibility to ensure the
email is well understood and based on the available details the ownership
is picked up by respective maintainers. It might be also needed that
multiple maintainers might have to be involved and this is why I focus on
this as a group effort than individual one.


>
>
>> That doesn’t make much sense, right? This is one of the reasons that in
>> such a case we need to work as a group, figure out the problem, and fix
>> it; till then, locking down the entire repo for further commits looks a
>> better option (IMHO).
>>
>
> Let us dig deeper into what happens when we work as a group: in general,
> it will be one person who takes the lead and gets help. Is there a way to
> find that person without locking down the whole of master? If there is,
> we may never have to get to a place where we lock down master completely.
> We may not even have to lock down components. Suggestions are welcome.
>
>
>> On Thu, 27 Sep 2018 at 14:04, Nigel Babu  wrote:
>>
>>>> We know the maintainers of the components which are leading to
>>>> repeated failures, and we just need to do the same thing we did
>>>> before: remove commit access for the maintainer of that component
>>>> instead of for all of the people. So in that sense it is not good
>>>> faith and can be enforced.

>>>
>>> Pranith, I believe the difference of opinion is because you're looking
>>> at this problem in terms of "who" rather than "what". We do not care
>>> about *who* broke master. Removing commit access from a component owner
>>> doesn't stop someone else from landing a patch that will create a
>>> failure in the same component or even a different component. We cannot
>>> stop patches from landing because they touch a specific component. And
>>> even if we could, our components are not entirely independent of each
>>> other. There could still be failures. This is a common scenario, and it
>>> happened the last time we had to close master. Let me further
>>> re-emphasize our goals:
>>>
>>> * When master is broken, every team member's energy needs to be focused
>>> on getting master to green. Who broke the build isn't a concern as much as
>>> *the build is broken*. This is not a situation to punish specific people.
>>> * If we allow other commits to land, we run the risk of someone else
>>> breaking master with a different patch. Now we have two failures to debug
>>> and fix.
>> --
>> - Atin (atinm)
>>
>
>
> --
> Pranith
>
-- 
- Atin (atinm)
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Release 5: New option noatime

2018-09-27 Thread Shyam Ranganathan
On 09/27/2018 09:08 AM, Shyam Ranganathan wrote:
> Writing this to solicit opinions on merging this [1] change that
> introduces an option late in the release cycle.
> 
> I went through the code, and most changes are standard option handling
> and basic xlator scaffolding, other than the change in posix xlator code
> that handles the flag to not set atime and the code in utime xlator that
> conditionally sets the flag. (of which IMO the latter is more important
> than the former, as posix is just acting on the flag).
> 
> If enabled, the option would not update atime for the following FOPs:
> opendir, open, and read. It would continue updating atime for fallocate
> and zerofill (which also update mtime, so the AFR self-heal on time
> change would kick in anyway).
> 
> As the option's name suggests, with it enabled atime is almost
> meaningless, and hence it almost does not matter where we update it and
> where we do not. Considering just the problem where atime changes cause
> AFR to trigger a heal, and that the FOPs above which strictly change
> only atime are handled by this option, I see this as functionally
> workable.
> 
> So IMO we can accept this even though it is late, but would like to hear
> from others if this needs to be deferred till release-6.
> 
> Shyam

[1] Patch under review: https://review.gluster.org/c/glusterfs/+/21281
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


[Gluster-Maintainers] Release 5: New option noatime

2018-09-27 Thread Shyam Ranganathan
Writing this to solicit opinions on merging this [1] change that
introduces an option late in the release cycle.

I went through the code, and most changes are standard option handling
and basic xlator scaffolding, other than the change in posix xlator code
that handles the flag to not set atime and the code in utime xlator that
conditionally sets the flag. (of which IMO the latter is more important
than the former, as posix is just acting on the flag).

If enabled, the option would not update atime for the following FOPs:
opendir, open, and read. It would continue updating atime for fallocate
and zerofill (which also update mtime, so the AFR self-heal on time change
would kick in anyway).

As the option's name suggests, with it enabled atime is almost
meaningless, and hence it almost does not matter where we update it and
where we do not. Considering just the problem where atime changes cause
AFR to trigger a heal, and that the FOPs above which strictly change only
atime are handled by this option, I see this as functionally workable.
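
For anyone who wants to sanity-check that behavior on a test mount once
the option is enabled, here is a minimal sketch; the mount point and file
name are placeholders, and the exact option name is whatever the patch
under review [1] settles on:

#!/usr/bin/env python3
# Minimal sketch: verify that open+read no longer bumps atime on a mounted
# volume with the option enabled. MOUNT is a placeholder for your FUSE
# mount point; this is an illustration, not part of the patch itself.
import os
import time

MOUNT = "/mnt/glustervol"          # placeholder mount point
path = os.path.join(MOUNT, "atime-check.txt")

with open(path, "w") as f:
    f.write("atime test\n")

before = os.stat(path).st_atime
time.sleep(2)                      # make a change in atime observable

with open(path) as f:              # open + read: should NOT update atime
    f.read()

after = os.stat(path).st_atime
print("atime before read:", before)
print("atime after  read:", after)
print("atime unchanged:", before == after)

With the option disabled, the read would normally be expected to bump
atime (subject to the usual relatime-style behavior of the backend).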

So IMO we can accept this even though it is late, but would like to hear
from others if this needs to be deferred till release-6.

Shyam
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Pranith Kumar Karampuri
On Thu, Sep 27, 2018 at 5:27 PM Atin Mukherjee  wrote:

> tests/bugs//xxx.t failing can’t always mean there’s a bug in
> component Y.
>

I agree.


> It could be anywhere till we root cause the problem.
>

Someone needs to step in to find out what the root cause is. I agree that
for a component like glusterd, bugs in other components can easily lead to
failures. How do we make sure that someone takes a look at it?


> Now does this mean we block commit rights for component Y till we have the
> root cause?
>

It was a way of making it someone's priority. If you have another way to
make it someone's priority that is better than this, please suggest and we
can have a discussion around it and agree on it :-).


> That doesn’t make much sense, right? This is one of the reasons that in
> such a case we need to work as a group, figure out the problem, and fix
> it; till then, locking down the entire repo for further commits looks a
> better option (IMHO).
>

Let us dig deeper into what happens when we work as a group: in general, it
will be one person who takes the lead and gets help. Is there a way to find
that person without locking down the whole of master? If there is, we may
never have to get to a place where we lock down master completely. We may
not even have to lock down components. Suggestions are welcome.


> On Thu, 27 Sep 2018 at 14:04, Nigel Babu  wrote:
>
>>> We know the maintainers of the components which are leading to repeated
>>> failures, and we just need to do the same thing we did before: remove
>>> commit access for the maintainer of that component instead of for all
>>> of the people. So in that sense it is not good faith and can be
>>> enforced.
>>>
>>
>> Pranith, I believe the difference of opinion is because you're looking at
>> this problem in terms of "who" rather than "what". We do not care about
>> *who* broke master. Removing commit access from a component owner doesn't
>> stop someone else from landing a patch that will create a failure in the
>> same component or even a different component. We cannot stop patches from
>> landing because they touch a specific component. And even if we could,
>> our components are not entirely independent of each other. There could
>> still be failures. This is a common scenario, and it happened the last
>> time we had to close master. Let me further re-emphasize our goals:
>>
>> * When master is broken, every team member's energy needs to be focused
>> on getting master to green. Who broke the build isn't a concern as much as
>> *the build is broken*. This is not a situation to punish specific people.
>> * If we allow other commits to land, we run the risk of someone else
>> breaking master with a different patch. Now we have two failures to debug
>> and fix.
> --
> - Atin (atinm)
>


-- 
Pranith
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Nigel Babu
>
> We know the maintainers of the components which are leading to repeated
> failures, and we just need to do the same thing we did before: remove
> commit access for the maintainer of that component instead of for all of
> the people. So in that sense it is not good faith and can be enforced.
>

Pranith, I believe the difference of opinion is because you're looking at
this problem in terms of "who" rather than "what". We do not care about
*who* broke master. Removing commit access from a component owner doesn't
stop someone else from landing a patch that will create a failure in the
same component or even a different component. We cannot stop patches from
landing because they touch a specific component. And even if we could, our
components are not entirely independent of each other. There could still be
failures. This is a common scenario, and it happened the last time we had
to close master. Let me further re-emphasize our goals:

* When master is broken, every team member's energy needs to be focused on
getting master to green. Who broke the build isn't a concern as much as
*the build is broken*. This is not a situation to punish specific people.
* If we allow other commits to land, we run the risk of someone else
breaking master with a different patch. Now we have two failures to debug
and fix.
___
maintainers mailing list
maintainers@gluster.org
https://lists.gluster.org/mailman/listinfo/maintainers


Re: [Gluster-Maintainers] Lock down period merge process

2018-09-27 Thread Pranith Kumar Karampuri
On Wed, Sep 26, 2018 at 8:14 PM Shyam Ranganathan 
wrote:

> This was discussed in the maintainers meeting (see notes [1]), and the
> conclusion is as follows,
>

I had to leave early that day due to a conflicting meeting. Comments below.


>
> - Merge lock down would be across the code base, and not component
> specific, as a component-level decision falls more into the 'good faith'
> category and requires more tooling to avoid the same.
>

We know the maintainers of the components which are leading to repeated
failures, and we just need to do the same thing we did before: remove
commit access for the maintainer of that component instead of for all of
the people. So in that sense it is not good faith and can be enforced.


>
> - Merge lock down would kick in closer to when repeated failures are
> noticed, rather than as it stands now (looking for failures across the
> board), as we strengthen the code base.
>
> In all, testing health maintained at always-GREEN is where we want to
> reach over time, taking a step back to correct any anomalies when we
> detect them so as to retain that health.
>
> Shyam
>
> [1] Maintainer meeting notes:
> https://lists.gluster.org/pipermail/maintainers/2018-September/005054.html
> (see Round table section)
> On 09/03/2018 01:47 AM, Pranith Kumar Karampuri wrote:
> >
> >
> > On Wed, Aug 22, 2018 at 5:54 PM Shyam Ranganathan wrote:
> >
> > On 08/18/2018 12:45 AM, Pranith Kumar Karampuri wrote:
> > >
> > >
> > > On Tue, Aug 14, 2018 at 5:29 PM Shyam Ranganathan
> > > <srang...@redhat.com> wrote:
> > >
> > > On 08/09/2018 01:24 AM, Pranith Kumar Karampuri wrote:
> > > >
> > > >
> > > > On Thu, Aug 9, 2018 at 1:25 AM Shyam Ranganathan
> > > > <srang...@redhat.com> wrote:
> > > > Maintainers,
> > > >
> > > > The following thread talks about a merge during a merge
> > > > lockdown, with differing view points (this mail is not to
> > > > discuss the view points).
> > > >
> > > > The root of the problem is that we leave the current process
> > > > to good faith. If we have a simple rule that we will not merge
> > > > anything during a lock down period, this confusion and any
> > > > future repetitions of the same would not occur.
> > > >
> > > > I propose that we follow the simpler rule, and would like to
> > > > hear thoughts around this.
> > > >
> > > > This also means that in the future, we may not need to remove
> > > > commit access for other maintainers, as we do *not* follow a
> > > > good faith policy, and instead depend on being able to revert
> > > > and announce on the threads why we do so.
> > > >
> > > >
> > > > I think it is a good opportunity to establish guidelines and
> > > > process so that we don't end up in this state in future where
> > > > one needs to lock down the branch to make it stable. From that
> > > > p.o.v. discussion on this thread about establishing a process
> > > > for lock down probably doesn't add much value. My personal
> > > > opinion for this instance at least is that it is good that it
> > > > was locked down. I tend to forget things and not having the
> > > > access to commit helped follow the process automatically :-).
> > >
> > > The intention is that master and release branches are always
> > > maintained in good working order. This involves tests and
> > > related checks passing *always*.
> > >
> > > When this situation is breached, correcting it immediately is
> > > better than letting it build up, as that would entail longer
> > > times and more people to fix things up.
> > >
> > > In an ideal world, if nightly runs fail, the next thing done
> > > would be to examine patches that were added between the 2 runs,
> > > and see if they are the cause for failure, and back them out.
> > >
> > > Hence, calling to back out patches is something that would
> > > happen more regularly in the future if things are breaking.
> > >
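
As a minimal sketch of that "examine patches that were added between the
2 runs" step, assuming the commits of the last green nightly and of the
failing one are known (the refs below are placeholders, not actual run
data):

#!/usr/bin/env python3
# Minimal sketch: list the patches that landed between two nightly runs so
# they can be examined and, if needed, backed out. GOOD/BAD are placeholder
# refs for the last green and the first failing nightly.
import subprocess

GOOD = "nightly-good-sha"          # placeholder: commit of last green run
BAD = "nightly-bad-sha"            # placeholder: commit of failing run

log = subprocess.run(
    ["git", "log", "--oneline", "%s..%s" % (GOOD, BAD)],
    capture_output=True, text=True, check=True,
)
print("Patches merged between the two runs:")
print(log.stdout)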
> > >
> > > I'm with you till here.
> > >
> > >
> > >
> > > Lock down may happen if 2 consecutive nightly