New consensus sought - when to reset try repository?

2014-04-30 Thread Hal Wine
On 2014-04-30 16:52 , Daniel Holbert wrote:
> On 03/07/2014 02:41 PM, Hal Wine wrote:
>> On 2014-02-28 17:24 , Hal Wine wrote:
>>> tl;dr: what is the balance point between pushes to try taking too long
>>> and losing repository history of recent try pushes?
>> Based on the responses to this specific question, we'll go back to
>> waiting for developers to notify IT when there is enough performance
>> impact to warrant a reset of the try repository

Thanks for reopening this thread.

>
> As documented on
>  https://bugzilla.mozilla.org/show_bug.cgi?id=994028
> we've now had multiple instances in the past few weeks where Try has
> been horked (refusing all pushes) for hours at a time, with no clear
> reason why.
>
> I'm not sure if this is caused by Try having too many heads & needing a
> reset, but it seems like it could be. (It also could be *indirectly*
> caused by the too-many-heads issue, too; e.g. perhaps someone
> interrupted a push because it was taking too long (due to too many
> heads), and their client inadvertently left something on the server
> locked, which then locks everyone else out for hours.)
>
> Whatever the cause, it's feeling more and more like periodic, automatic
> Try resets would be helpful to keep things running smoothly.

Yes, or a better working definition of "too much performance impact". In
this case, we had a 4h10m gap in the pushlog, and now things are back to
"normal". A reset would take about that long to perform.
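One rough way to put numbers on "too much performance impact" is to scan pushlog timestamps for unusually long silences between consecutive pushes. The sketch below is only an illustration: the function name, the 2-hour threshold, and the sample timestamps are assumptions, not the real pushlog schema or data.

```python
from datetime import datetime, timedelta


def pushlog_gaps(push_times, threshold=timedelta(hours=2)):
    """Return (start, end, length) for each gap of at least `threshold`.

    `push_times` is any iterable of datetimes; the threshold is an
    arbitrary assumption, not an agreed definition of "too long".
    """
    ordered = sorted(push_times)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        length = later - earlier
        if length >= threshold:
            gaps.append((earlier, later, length))
    return gaps


# Made-up timestamps containing one 4h10m silence.
sample = [
    datetime(2014, 4, 30, 9, 0),
    datetime(2014, 4, 30, 9, 20),
    datetime(2014, 4, 30, 13, 30),
    datetime(2014, 4, 30, 13, 45),
]
for start, end, length in pushlog_gaps(sample):
    print(f"no pushes from {start} to {end} ({length})")
```

A quiet night or weekend would also show up as a gap, so a check like this would need to be combined with some signal that people were actually trying to push.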

>
> Would it be possible to set up a system along the lines of dbaron's
> suggestion earlier in this post? (Frequent resets, with a post-reset
> step to pull in the most recent ~2 weeks worth of heads from the old
> repo, so that people's try pushes don't mysteriously disappear if they
> happen to push right before a reset.)

It is something that could be tried - we'll do a few dry runs to see
how much this adds to the try reset duration (given that we have to pull
those changes from the "slow repo").

I also have some fresh thoughts on https://bugzil.la/691459 - there may
be some log correlation possible to get us hard data on overall success
rates and push times.

Looking forward to getting a newer (and hopefully better) approach to
this recurring issue.

--Hal

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Decision reached on: Consensus sought - when to reset try repository?

2014-04-30 Thread Daniel Holbert
On 03/07/2014 02:41 PM, Hal Wine wrote:
> On 2014-02-28 17:24 , Hal Wine wrote:
>> tl;dr: what is the balance point between pushes to try taking too long
>> and losing repository history of recent try pushes?
> Based on the responses to this specific question, we'll go back to
> waiting for developers to notify IT when there is enough performance
> impact to warrant a reset of the try repository

As documented on
 https://bugzilla.mozilla.org/show_bug.cgi?id=994028
we've now had multiple instances in the past few weeks where Try has
been horked (refusing all pushes) for hours at a time, with no clear
reason why.

I'm not sure if this is caused by Try having too many heads & needing a
reset, but it seems like it could be. (It also could be *indirectly*
caused by the too-many-heads issue, too; e.g. perhaps someone
interrupted a push because it was taking too long (due to too many
heads), and their client inadvertently left something on the server
locked, which then locks everyone else out for hours.)

Whatever the cause, it's feeling more and more like periodic, automatic
Try resets would be helpful to keep things running smoothly.

Would it be possible to set up a system along the lines of dbaron's
suggestion earlier in this post? (Frequent resets, with a post-reset
step to pull in the most recent ~2 weeks worth of heads from the old
repo, so that people's try pushes don't mysteriously disappear if they
happen to push right before a reset.)

Thanks,
~Daniel


Decision reached on: Consensus sought - when to reset try repository?

2014-03-07 Thread Hal Wine
On 2014-02-28 17:24 , Hal Wine wrote:
> tl;dr: what is the balance point between pushes to try taking too long
> and losing repository history of recent try pushes?
Based on the responses to this specific question, we'll go back to
waiting for developers to notify IT when there is enough performance
impact to warrant a reset of the try repository. I've added the
reporting instructions to the wiki page about try:
https://wiki.mozilla.org/ReleaseEngineering/TryServer



Thanks to everyone else for showing interest in the underlying problems.
Suggestions for that are best added to the bugs cited below.

Thanks!
--Hal
>
> Summary:
> 
>
> As most developers have experienced, pushing to try can sometimes take
> a long time. Once it takes "too long" (as measured by screams of pain
> in #releng), a "try [repository] reset" is scheduled. This hurts
> productivity and increases frustration for everyone involved (devs, IT,
> RelEng). We don't want to do this anymore.
>
> A reset of the try repository deletes the existing contents and
> replaces them with a fresh clone from mozilla-central. While the tbpl
> information will remain valid for any completed build, any attempt to
> view the diffs for a try build will fail (unless you already had them
> in your local repository).
>
> Progress on resolution of the root cause:
> -----------------------------------------
>
> IT has made tremendous progress in reducing the occurrence of "long
> push times", but they still are not predictable. Various attempts at
> monitoring[1] and auto correction[2] have not been successful in
> improving the situation. Work continues on additional changes that
> should improve the situation[3].
>
> The most recent mitigation strategy is to trade the "unknown timing"
> disruption of the push times increasing to a pain threshold for the
> "known timing" of resetting the try repository every TCW (tree closing
> window - every 6 wks currently). However, we heard from some folks
> that this is too often.
>
> The most recent try-reset-triggered-by-pain was a duration of 6
> months[4]. There was at least one report just 3 months after reset of
> problems[5].
>
> So, the question is - what say developers -- what's the balance point
> between:
>  - too often, making collaborating on try pushes hard
>  - too infrequent, introducing increasing push times
>
> --Hal
>
> Prior Work:
> -----------
> [1] bug https://bugzil.la/691459
> [2] bugs https://bugzil.la/554656 https://bugzil.la/734225#c24
>     https://bugzil.la/633161 https://bugzil.la/529179
> [3] bugs https://bugzil.la/770811 https://bugzil.la/937732 others
> [4] bugs https://bugzil.la/894429 & https://bugzil.la/962275
> [5] bug https://bugzil.la/925354
>



Re: Consensus sought - when to reset try repository?

2014-03-05 Thread Ed Morley

On 05 March 2014 06:07:28, Gregory Szorc wrote:
> I wouldn't have such a big issue with Try resets if we didn't lose
> information in the process. I believe every time there's been a Try
> reset, I've lost data from a recent (<1 week) Try push and I needed to
> re-run that job


Whilst it doesn't help with being able to refer to the diff, fwiw bug 
721152 means that TBPL now supports accessing the results of a Try run 
even after repo-reset, so long as you use the single-revision URL form 
that appears in the "thank you for your try push" email.


Note that TBPL data is purged for all trees after 30 days (in line with 
when the logs are purged from ftp.m.o).


Ed


Re: Consensus sought - when to reset try repository?

2014-03-04 Thread Gregory Szorc

On 2/28/14, 5:24 PM, Hal Wine wrote:
> tl;dr: what is the balance point between pushes to try taking too long
> and losing repository history of recent try pushes?
>
> Summary:
>
> As most developers have experienced, pushing to try can sometimes take a
> long time. Once it takes "too long" (as measured by screams of pain in
> #releng), a "try [repository] reset" is scheduled. This hurts productivity
> and increases frustration for everyone involved (devs, IT, RelEng). We
> don't want to do this anymore.
>
> A reset of the try repository deletes the existing contents and
> replaces them with a fresh clone from mozilla-central. While the tbpl
> information will remain valid for any completed build, any attempt to
> view the diffs for a try build will fail (unless you already had them in
> your local repository).
>
> Progress on resolution of the root cause:
> -----------------------------------------
>
> IT has made tremendous progress in reducing the occurrence of "long push
> times", but they still are not predictable. Various attempts at
> monitoring[1] and auto correction[2] have not been successful in
> improving the situation. Work continues on additional changes that
> should improve the situation[3].
>
> The most recent mitigation strategy is to trade the "unknown timing"
> disruption of the push times increasing to a pain threshold for the
> "known timing" of resetting the try repository every TCW (tree closing
> window - every 6 wks currently). However, we heard from some folks that
> this is too often.
>
> The most recent try-reset-triggered-by-pain was a duration of 6
> months[4]. There was at least one report just 3 months after reset of
> problems[5].
>
> So, the question is - what say developers -- what's the balance point
> between:
>   - too often, making collaborating on try pushes hard
>   - too infrequent, introducing increasing push times


I wouldn't have such a big issue with Try resets if we didn't lose 
information in the process. I believe every time there's been a Try 
reset, I've lost data from a recent (<1 week) Try push and I needed to 
re-run that job - incurring extra cost to Mozilla and wasting my time. I 
also periodically find myself wanting to answer questions like "what 
percentage of tree closures are due to pushes that didn't go to Try 
first." Data loss stinks.


I'd say the goal should be "no data loss." I have an idea that will 
enable us to achieve this.


Let's expose every newly-reset instance of the Try repo as a separate 
URL. We would still push to ssh://hg.mozilla.org/try, but the URLs 
printed and the URLs used by automation would be URLs to repos that 
would never go away. e.g. 
https://hg.mozilla.org/tries/try1/rev/840f122d1286 ("try1" being the 
important bit in there). When we reset Try, you'd hand out URLs to 
"try2." You could reset the writable Try repo as frequently as you 
desired and aside from a slightly different repo URL being given out, 
nobody should notice.


The main drawbacks of this approach that I can think of are all in 
automation: parts of automation are very repo/URL centric and having 
effectively dynamic URLs might break assumptions. But making automation 
work against arbitrary URLs is a good thing, as it allows automation to 
be more flexible and this allows people to experiment with alternate 
repo hosting, landing tools, landing-integrated code review tools, etc 
without requiring special involvement from RelEng. "Everything is a web 
service and is self-service," etc.
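To make the numbering idea concrete, here is a minimal sketch of how a permanent URL could be derived at push time. It is only an illustration: the `TryGenerations` class, its method names, and the second revision hash are made up, and nothing like this exists on hg.mozilla.org today.

```python
BASE_URL = "https://hg.mozilla.org/tries"  # prefix taken from the example URL above


class TryGenerations:
    """Tracks which numbered read-only repo each pushed revision landed in."""

    def __init__(self):
        self.current = 1   # generation of the writable try repo
        self.seen = {}     # revision -> generation it was pushed to

    def record_push(self, rev):
        """Remember the generation for `rev` and return its permanent URL."""
        self.seen[rev] = self.current
        return self.url_for(rev)

    def reset(self):
        """Simulate a try reset: later pushes land in the next generation."""
        self.current += 1

    def url_for(self, rev):
        """Permanent URL for a previously recorded revision."""
        return f"{BASE_URL}/try{self.seen[rev]}/rev/{rev}"


gens = TryGenerations()
old = gens.record_push("840f122d1286")
gens.reset()
new = gens.record_push("0123456789ab")  # made-up revision hash
print(old)
print(new)
print(gens.url_for("840f122d1286") == old)
```

The point of the sketch is that a reset only bumps a counter; URLs handed out before the reset keep pointing at the older generation.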



Re: Consensus sought - when to reset try repository?

2014-03-02 Thread Ted Mielczarek
On 2/28/2014 8:44 PM, John Schoenick wrote:
> On 02/28/2014 05:40 PM, Daniel Holbert wrote:
>> On 02/28/2014 05:32 PM, L. David Baron wrote:
>>> Why not change the try repo reset procedure so that instead of just
>>> cloning mozilla-central, you also pull from the old try repo into
>>> the new one all of the heads of try pushes made within the last one
>>> or two weeks.  (Presumably there's a list of them somewhere, or it
>>> could be maintained?)  Then the try reset won't break things for
>>> those recent pushes, but only the older ones.
>> This seems like a good solution.
>>
>> One (possibly obvious) clarification: we'd need to rely on the pushlog
>> DB (rather than the changeset datestamps) when creating the list of
>> recent heads, since changeset datestamps are customizable and hence
>> unreliable.
>
> Or taking this a step further, having a rolling cronjob |hg strip|
> revisions not on m-c older than a certain date would remove the need
> to perform resets entirely, and give a predictable date after which
> your try push would disappear. You could even add a "keep me for N
> days" parameter to try syntax for pushes that we'd like to stick around.
>
Note, we already investigated this some time ago[1], and "hg strip"
doesn't interact well with the current pushlog hook. It's possible we
could make this work if we changed the pushlog hook to accommodate it.

-Ted

1. https://bugzilla.mozilla.org/show_bug.cgi?id=633161



Re: Consensus sought - when to reset try repository?

2014-03-01 Thread Ehsan Akhgari

On 2014-02-28, 9:02 PM, Hal Wine wrote:
> On 2014-02-28 17:32 , L. David Baron wrote:
>> On Friday 2014-02-28 17:24 -0800, Hal Wine wrote:
>>> So, the question is - what say developers -- what's the balance point
>>> between:
>>>   - too often, making collaborating on try pushes hard
>>>   - too infrequent, introducing increasing push times
>>
>> Why not change the try repo reset procedure so that instead of just
>> cloning mozilla-central, you also pull from the old try repo into
>> the new one all of the heads of try pushes made within the last one
>> or two weeks.  (Presumably there's a list of them somewhere, or it
>> could be maintained?)  Then the try reset won't break things for
>> those recent pushes, but only the older ones.
>
> David -- that's one idea that has not yet been tried. I suspect other
> folks will also come up with new ideas.
>
> However - in the meantime, what try reset schedule would devs prefer?
> Are you suggesting we stay with the only-reset-when-devs-scream-in-pain
> approach until a real solution is found?


I would recommend doing that while we try to find a real solution.

Cheers,
Ehsan



Re: Consensus sought - when to reset try repository?

2014-02-28 Thread Hal Wine
On 2014-02-28 17:32 , L. David Baron wrote:
> On Friday 2014-02-28 17:24 -0800, Hal Wine wrote:
>> So, the question is - what say developers -- what's the balance point
>> between:
>>  - too often, making collaborating on try pushes hard
>>  - too infrequent, introducing increasing push times
> 
> Why not change the try repo reset procedure so that instead of just
> cloning mozilla-central, you also pull from the old try repo into
> the new one all of the heads of try pushes made within the last one
> or two weeks.  (Presumably there's a list of them somewhere, or it
> could be maintained?)  Then the try reset won't break things for
> those recent pushes, but only the older ones.

David -- that's one idea that has not yet been tried. I suspect other
folks will also come up with new ideas.

However - in the meantime, what try reset schedule would devs prefer?
Are you suggesting we stay with the only-reset-when-devs-scream-in-pain
approach until a real solution is found?

--Hal

P.S. There is data deep in the bugs which casts doubt on the effectiveness
of such an approach. (The issue is not strictly the number of heads, but
also an unknown function of the "depth" of the heads.)




Re: Consensus sought - when to reset try repository?

2014-02-28 Thread L. David Baron
On Friday 2014-02-28 17:44 -0800, John Schoenick wrote:
> Or taking this a step further, having a rolling cronjob |hg strip|
> revisions not on m-c older than a certain date would remove the need
> to perform resets entirely, and give a predictable date after which
> your try push would disappear. You could even add a "keep me for N
> days" parameter to try syntax for pushes that we'd like to stick
> around.

I'm not sure how well "hg strip" would interact with a repository
that people are pushing to at the same time, though.

-David

-- 
𝄞   L. David Baron http://dbaron.org/   𝄂
𝄢   Mozilla  https://www.mozilla.org/   𝄂
 Before I built a wall I'd ask to know
 What I was walling in or walling out,
 And to whom I was like to give offense.
   - Robert Frost, Mending Wall (1914)




Re: Consensus sought - when to reset try repository?

2014-02-28 Thread Ryan VanderMeulen

On 2/28/2014 8:44 PM, John Schoenick wrote:
> Or taking this a step further, having a rolling cronjob |hg strip|
> revisions not on m-c older than a certain date would remove the need to
> perform resets entirely, and give a predictable date after which your
> try push would disappear. You could even add a "keep me for N days"
> parameter to try syntax for pushes that we'd like to stick around.


30 days is how long the logs are kept for, so maybe that would be a good 
amount of time.
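As a rough illustration of the retention rule being discussed, the sketch below picks out which try heads would be past their keep date, using 30 days as the default and an optional per-push override. The tuple layout, the function name, and the sample data are assumptions; the code only decides what has expired and does not run hg strip or touch a repository.

```python
from datetime import datetime, timedelta

DEFAULT_KEEP_DAYS = 30  # matches the log retention period mentioned above


def expired_heads(pushes, now, default_keep_days=DEFAULT_KEEP_DAYS):
    """Return heads whose retention period has ended.

    `pushes` is a list of (head, pushed_at, keep_days) tuples, where
    keep_days is None unless the push asked for a custom retention.
    """
    result = []
    for head, pushed_at, keep_days in pushes:
        days = default_keep_days if keep_days is None else keep_days
        if pushed_at + timedelta(days=days) <= now:
            result.append(head)
    return result


now = datetime(2014, 3, 31)
pushes = [
    ("aaa111", datetime(2014, 2, 20), None),  # 39 days old, default retention
    ("bbb222", datetime(2014, 2, 20), 60),    # same age, asked to be kept 60 days
    ("ccc333", datetime(2014, 3, 25), None),  # 6 days old
]
print(expired_heads(pushes, now))
```

With the sample data only the first head is reported, since the second asked for a longer retention and the third is still recent.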




Re: Consensus sought - when to reset try repository?

2014-02-28 Thread John Schoenick

On 02/28/2014 05:40 PM, Daniel Holbert wrote:
> On 02/28/2014 05:32 PM, L. David Baron wrote:
>> Why not change the try repo reset procedure so that instead of just
>> cloning mozilla-central, you also pull from the old try repo into
>> the new one all of the heads of try pushes made within the last one
>> or two weeks.  (Presumably there's a list of them somewhere, or it
>> could be maintained?)  Then the try reset won't break things for
>> those recent pushes, but only the older ones.
> This seems like a good solution.
>
> One (possibly obvious) clarification: we'd need to rely on the pushlog
> DB (rather than the changeset datestamps) when creating the list of
> recent heads, since changeset datestamps are customizable and hence
> unreliable.


Or taking this a step further, having a rolling cronjob |hg strip| 
revisions not on m-c older than a certain date would remove the need to 
perform resets entirely, and give a predictable date after which your 
try push would disappear. You could even add a "keep me for N days" 
parameter to try syntax for pushes that we'd like to stick around.





> ~Daniel


Re: Consensus sought - when to reset try repository?

2014-02-28 Thread Daniel Holbert
On 02/28/2014 05:32 PM, L. David Baron wrote:
> Why not change the try repo reset procedure so that instead of just
> cloning mozilla-central, you also pull from the old try repo into
> the new one all of the heads of try pushes made within the last one
> or two weeks.  (Presumably there's a list of them somewhere, or it
> could be maintained?)  Then the try reset won't break things for
> those recent pushes, but only the older ones.

This seems like a good solution.

One (possibly obvious) clarification: we'd need to rely on the pushlog
DB (rather than the changeset datestamps) when creating the list of
recent heads, since changeset datestamps are customizable and hence
unreliable.
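A small sketch of why the distinction matters, assuming a made-up record shape (the `Push` tuple and its field names are hypothetical, not the real pushlog schema): heads are selected by the server-recorded push time, and the author-controlled changeset date is ignored.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Push(NamedTuple):
    """One pushlog entry (hypothetical shape, not the real pushlog schema)."""
    head: str               # changeset hash of the pushed head
    pushed_at: datetime     # server-side time recorded by the pushlog
    committed_at: datetime  # date stored in the changeset; author-controlled


def recent_heads(pushes, now, window=timedelta(days=14)):
    """Return heads whose *push* time falls inside the window, oldest first."""
    cutoff = now - window
    kept = [p for p in pushes if p.pushed_at >= cutoff]
    return [p.head for p in sorted(kept, key=lambda p: p.pushed_at)]


now = datetime(2014, 3, 1, tzinfo=timezone.utc)
pushes = [
    # Pushed 3 days ago, but the changeset claims to be over a year old.
    Push("aaa111", now - timedelta(days=3), now - timedelta(days=400)),
    # Pushed a month ago, but the changeset date looks like yesterday.
    Push("bbb222", now - timedelta(days=30), now - timedelta(days=1)),
    Push("ccc333", now - timedelta(days=10), now - timedelta(days=10)),
]
print(recent_heads(pushes, now))
```

Filtering on `committed_at` instead would drop the first head and keep the second, which is the opposite of what a "last two weeks of pushes" rule intends.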

~Daniel


Re: Consensus sought - when to reset try repository?

2014-02-28 Thread L. David Baron
On Friday 2014-02-28 17:24 -0800, Hal Wine wrote:
> So, the question is - what say developers -- what's the balance point
> between:
>  - too often, making collaborating on try pushes hard
>  - too infrequent, introducing increasing push times

Why not change the try repo reset procedure so that instead of just
cloning mozilla-central, you also pull from the old try repo into
the new one all of the heads of try pushes made within the last one
or two weeks.  (Presumably there's a list of them somewhere, or it
could be maintained?)  Then the try reset won't break things for
those recent pushes, but only the older ones.

-David

-- 
𝄞   L. David Baron http://dbaron.org/   𝄂
𝄢   Mozilla  https://www.mozilla.org/   𝄂
 Before I built a wall I'd ask to know
 What I was walling in or walling out,
 And to whom I was like to give offense.
   - Robert Frost, Mending Wall (1914)




Consensus sought - when to reset try repository?

2014-02-28 Thread Hal Wine
tl;dr: what is the balance point between pushes to try taking too long
and losing repository history of recent try pushes?

Summary:


As most developers have experienced, pushing to try can sometimes take a
long time. Once it takes "too long" (as measured by screams of pain in
#releng), a "try [repository] reset" is scheduled. This hurts productivity
and increases frustration for everyone involved (devs, IT, RelEng). We
don't want to do this anymore.

A reset of the try repository deletes the existing contents and
replaces them with a fresh clone from mozilla-central. While the tbpl
information will remain valid for any completed build, any attempt to
view the diffs for a try build will fail (unless you already had them in
your local repository).

Progress on resolution of the root cause:
-----------------------------------------

IT has made tremendous progress in reducing the occurrence of "long push
times", but they still are not predictable. Various attempts at
monitoring[1] and auto correction[2] have not been successful in
improving the situation. Work continues on additional changes that
should improve the situation[3].

The most recent mitigation strategy is to trade the "unknown timing"
disruption of the push times increasing to a pain threshold for the
"known timing" of resetting the try repository every TCW (tree closing
window - every 6 wks currently). However, we heard from some folks that
this is too often.

The most recent try-reset-triggered-by-pain was a duration of 6
months[4]. There was at least one report just 3 months after reset of
problems[5].

So, the question is - what say developers -- what's the balance point
between:
 - too often, making collaborating on try pushes hard
 - too infrequent, introducing increasing push times

--Hal

Prior Work:
-----------
[1] bug https://bugzil.la/691459
[2] bugs https://bugzil.la/554656 https://bugzil.la/734225#c24
    https://bugzil.la/633161 https://bugzil.la/529179
[3] bugs https://bugzil.la/770811 https://bugzil.la/937732 others
[4] bugs https://bugzil.la/894429 & https://bugzil.la/962275
[5] bug https://bugzil.la/925354
