Re: Forcing GC to always fail

2018-11-28 Thread Bryan Turner
On Wed, Nov 28, 2018 at 5:19 PM Junio C Hamano wrote:
>
> > Another issue with the canned steps for "git gc" is that it means it
> > can't be used to do specific types of cleanup on a different schedule
> > from others. For example, we use "git pack-refs" directly to
> > frequently pack the refs in our repositories, separate from "git
> > repack" + "git prune" for repacking objects. That allows us to keep
> > our refs packed better without incurring the full overhead of
> > constantly building new packs.
>
> I am not sure if the above is an example of things that are good.
> We keep individual "pack-refs" and "rev-list | pack-objects"
> available exactly to give finer grained control to repository
> owners, and "gc" is meant to be one-size-fits-all easy to run
> by end users.  Adding options to "git gc --no-reflog --pack-refs"
> to complicate it sounds somewhat backwards.

I think we're in agreement there. I was citing the fact that GC isn't
good for targeted maintenance as a reason why we use "pack-refs"
directly, which sounds like what you're saying as well. I don't think
that inflating GC with options to skip specific steps is a good idea,
but that does mean that, for those targeted operations, we need to use
the lower-level commands directly, rather than GC.
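
A rough sketch of what that split could look like when scripted. The function names and the specific options are assumptions made for illustration; the thread only says the lower-level commands are run "with various options", so this is not Bitbucket Server's actual configuration.

```shell
#!/bin/sh
# Hypothetical split of maintenance into two independently scheduled tasks.
# Function names and options are illustrative assumptions.

# Cheap and frequent: collapse loose refs into packed-refs.
pack_refs_only() {
    git -C "$1" pack-refs --all
}

# Expensive and infrequent: repack objects, then prune old unreachable ones.
repack_and_prune() {
    git -C "$1" repack -d -q &&
    git -C "$1" prune --expire=2.weeks.ago
}
```

Each function takes a repository path, so a scheduler could call pack_refs_only often and repack_and_prune rarely, without going through "git gc" at all.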

Bryan


Re: Forcing GC to always fail

2018-11-28 Thread Junio C Hamano
Bryan Turner writes:

> For us, the biggest issue was "git gc"'s insistence on trying to run
> "git reflog expire". That triggers locking behaviors that resulted in
> very frequent GC failures--and the only reflogs Bitbucket Server (by
> default) creates are all configured to never expire or be pruned, so
> the effort is all wasted anyway.

How about detecting that the expiry threshold is set to "never"
before spending cycles and seeks sifting between old and new
entries, and not spawning the expire command at all in that case?

This seems like an obvious low-hanging fruit to me.
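
Seen from a wrapper script rather than from inside "git gc", the check described above might look roughly like this. It is a sketch only, not Git's implementation: the function names are hypothetical, it looks only at the two global thresholds (gc.reflogExpire and gc.reflogExpireUnreachable), and it ignores any per-pattern gc.<pattern>.reflogExpire settings.

```shell
#!/bin/sh
# Hypothetical wrapper-level check, not Git's own implementation: succeed only
# when both global reflog expiry thresholds are set to "never".
# Assumption: per-pattern gc.<pattern>.reflogExpire settings are not consulted.
reflog_expiry_disabled() {
    repo=$1
    [ "$(git -C "$repo" config --get gc.reflogExpire)" = never ] &&
    [ "$(git -C "$repo" config --get gc.reflogExpireUnreachable)" = never ]
}

# Skip the expire step entirely when it could not remove anything.
maybe_expire_reflogs() {
    repo=$1
    if reflog_expiry_disabled "$repo"; then
        echo "reflog expiry disabled; skipping"
    else
        git -C "$repo" reflog expire --all
    fi
}
```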

> Another issue with the canned steps for "git gc" is that it means it
> can't be used to do specific types of cleanup on a different schedule
> from others. For example, we use "git pack-refs" directly to
> frequently pack the refs in our repositories, separate from "git
> repack" + "git prune" for repacking objects. That allows us to keep
> our refs packed better without incurring the full overhead of
> constantly building new packs.

I am not sure if the above is an example of things that are good.
We keep individual "pack-refs" and "rev-list | pack-objects"
available exactly to give finer grained control to repository
owners, and "gc" is meant to be one-size-fits-all easy to run
by end users.  Adding options to "git gc --no-reflog --pack-refs"
to complicate it sounds somewhat backwards.


Re: Forcing GC to always fail

2018-11-27 Thread Bryan Turner
On Tue, Nov 27, 2018 at 5:55 PM Elijah Newren wrote:
>
> On Tue, Nov 27, 2018 at 4:16 PM Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Nov 28 2018, Bryan Turner wrote:
> >
> > > On Tue, Nov 27, 2018 at 3:47 PM Ævar Arnfjörð Bjarmason wrote:
> > >>
> > >> On Tue, Nov 27 2018, Bryan Turner wrote:
> > >>
> > >> >
> > >> > Is there anything I can set, perhaps some invalid configuration
> > >> > option/value, that will make "git gc" (most important) and "git
> > >> > reflog" (ideal, but less important) fail when they're run in our
> > >> > repositories? Hopefully at that point customers will reach out to us
> > >> > for help _before_ they corrupt their repositories.
> > >>
> > > >> $ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' &&
> > > >>   git -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
> > > >> error: Invalid gc.pruneexpire: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
> > > >> fatal: unable to parse 'gc.pruneexpire' from command-line config
> > > >> error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 'gc.reflogexpire' is not a valid timestamp
> > > >> fatal: unable to parse 'gc.reflogexpire' from command-line config
> > >
> > > Thanks for that! It looks like that does block both "git gc" and "git
> > > reflog expire" without blocking "git pack-refs", "git repack" or "git
> > > prune". Fantastic! The fact that it shows the invalid value means it
> > > might also be possible to at least provide a useful hint that manual
> > > GC is not safe.
> > >
> > > I appreciate your help, Ævar.
> >
> > No problem. I was going to add that you can set
> > e.g. pack.windowMemory='some.message' to make this go for git-repack
> > too, but it sounds like you don't want that.
> >
> > Is there a reason for why BitBucket isn't using 'git-gc'? Some other
> > hosting providers use it, and if you don't run it with "expire now" or
> > similarly aggressive non-default values on an active repository it won't
> > corrupt anything.

We did use "git gc" (sans options; its configuration was applied via
"config") for several years, but there were some rough edges.

The biggest one was that "git gc" has a canned set of steps it runs
that can't be disabled, even when they add no value (or are actively
detrimental).

For us, the biggest issue was "git gc"'s insistence on trying to run
"git reflog expire". That triggers locking behaviors that resulted in
very frequent GC failures--and the only reflogs Bitbucket Server (by
default) creates are all configured to never expire or be pruned, so
the effort is all wasted anyway.

Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 270 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: error: cannot lock ref 'stash-refs/pull-requests/13996/merge':
ref 'stash-refs/pull-requests/13996/merge' is at
2ec6a74b453d76f7a5247baa9f396361027ffdf but expected
1678f303202010c6d7e6201226df08dc8fc49ae3
remote: error: cannot lock ref 'stash-refs/pull-requests/13996/to':
ref 'stash-refs/pull-requests/13996/to' is at
bd32d53e4fe63b15be029085f1b6d795d526adbc but expected
f55ec06e89a11b8bdbcd97680a900361307d28c4
remote: error: cannot lock ref 'stash-refs/pull-requests/14006/merge':
ref 'stash-refs/pull-requests/14006/merge' is at
1cc907e2a7082033d70a164f222d3cce17a453a9 but expected
ae057003d7ed7d096b5b952191d784113f25b982
remote: error: cannot lock ref 'stash-refs/pull-requests/14006/to':
ref 'stash-refs/pull-requests/14006/to' is at
bd32d53e4fe63b15be029085f1b6d795d526adbc but expected
f55ec06e89a11b8bdbcd97680a900361307d28c4
remote: error: cannot lock ref 'stash-refs/pull-requests/14043/merge':
ref 'stash-refs/pull-requests/14043/merge' is at
a2e510b1b2b583b273f6d6d28e13151619e8d143 but expected
7735a5bde21815d307c68244e8fd2d67a09d5a39
remote: error: cannot lock ref 'stash-refs/pull-requests/14043/to':
ref 'stash-refs/pull-requests/14043/to' is at
bd32d53e4fe63b15be029085f1b6d795d526adbc but expected
f55ec06e89a11b8bdbcd97680a900361307d28c4
remote: error: cannot lock ref 'stash-refs/pull-requests/14047/merge':
ref 'stash-refs/pull-requests/14047/merge' is at
bd4f0e9bcbed34fd9befa65763f5aee6c9ebd8ce but expected
649dea948e8e6b54506615e5d61c6779c242d5af
remote: error: cannot lock ref 'stash-refs/pull-requests/14047/to':
ref 'stash-refs/pull-requests/14047/to' is at
bd32d53e4fe63b15be029085f1b6d795d526adbc but expected
f55ec06e89a11b8bdbcd97680a900361307d28c4
remote: error: failed to run reflog
To https://...
   f55ec06..bd32d53  master -> master

(Note: That example was from when "git gc --auto" was running attached
to a push, because auto-detach doesn't always detach, but our explicit
"git gc" processing would fail with the same "cannot lock ref"
messages.)

The worktree and rerere cleanup "git gc" does is also unnecessary
overhead, but was less of a concern because it wouldn't make GC
_fail_.


Re: Forcing GC to always fail

2018-11-27 Thread Elijah Newren
On Tue, Nov 27, 2018 at 4:16 PM Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Nov 28 2018, Bryan Turner wrote:
>
> > On Tue, Nov 27, 2018 at 3:47 PM Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Tue, Nov 27 2018, Bryan Turner wrote:
> >>
> >> >
> >> > Is there anything I can set, perhaps some invalid configuration
> >> > option/value, that will make "git gc" (most important) and "git
> >> > reflog" (ideal, but less important) fail when they're run in our
> >> > repositories? Hopefully at that point customers will reach out to us
> >> > for help _before_ they corrupt their repositories.
> >>
> > >> $ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' &&
> > >>   git -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
> > >> error: Invalid gc.pruneexpire: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
> > >> fatal: unable to parse 'gc.pruneexpire' from command-line config
> > >> error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 'gc.reflogexpire' is not a valid timestamp
> > >> fatal: unable to parse 'gc.reflogexpire' from command-line config
> >
> > Thanks for that! It looks like that does block both "git gc" and "git
> > reflog expire" without blocking "git pack-refs", "git repack" or "git
> > prune". Fantastic! The fact that it shows the invalid value means it
> > might also be possible to at least provide a useful hint that manual
> > GC is not safe.
> >
> > I appreciate your help, Ævar.
>
> No problem. I was going to add that you can set
> e.g. pack.windowMemory='some.message' to make this go for git-repack
> too, but it sounds like you don't want that.
>
> Is there a reason for why BitBucket isn't using 'git-gc'? Some other
> hosting providers use it, and if you don't run it with "expire now" or
> similarly aggressive non-default values on an active repository it won't
> corrupt anything.

...assuming no other repo has this one as an alternate, which I
suspect is the issue at play.  (I wrote an alternate-aware gc script
years ago when using Atlassian Stash to try to work around this issue,
but think I only used it for a couple repos and never got around to
deploying it in prod for continuous use, probably worried I had missed
a corner case.  Had meant to, but at some point the powers that be
decided to push us toward a different repository manager tool, and
I've long since forgotten most details.)


Re: Forcing GC to always fail

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Bryan Turner wrote:

> On Tue, Nov 27, 2018 at 3:47 PM Ævar Arnfjörð Bjarmason wrote:
>>
>>
>> On Tue, Nov 27 2018, Bryan Turner wrote:
>>
>> >
>> > Is there anything I can set, perhaps some invalid configuration
>> > option/value, that will make "git gc" (most important) and "git
>> > reflog" (ideal, but less important) fail when they're run in our
>> > repositories? Hopefully at that point customers will reach out to us
>> > for help _before_ they corrupt their repositories.
>>
>> You could fix this and so many other issues by just hanging an "Is
>> This Good For The Company?" banner up in Atlassian offices.
>
> Not sure I understand what this means, or what your goal was in saying
> it. No one inside Atlassian is running these commands. I'm trying to
> help save administrators from themselves, which reduces real world
> end-user pain that comes from decisions made without fully
> understanding the consequences. It feels like this comment is mocking
> my earnest desire to help, and my genuine question looking for any
> insight people more familiar with the code might be able to offer.
> Perhaps I'm just missing the joke, but if it's an Office Space
> reference it feels like it's in pretty poor taste.

I (mis)read 'administrators' as being other people at Atlassian. Yeah,
it's a reference to Office Space. I meant to poke some fun at the
situation of having to defensively configure tools lest co-workers run
them the wrong way, which I'm sure we've all had to do at some point. I
didn't mean any offense by it.

>>
>> But more seriously:
>>
>> $ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' &&
>>   git -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
>> error: Invalid gc.pruneexpire: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
>> fatal: unable to parse 'gc.pruneexpire' from command-line config
>> error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 'gc.reflogexpire' is not a valid timestamp
>> fatal: unable to parse 'gc.reflogexpire' from command-line config
>
> Thanks for that! It looks like that does block both "git gc" and "git
> reflog expire" without blocking "git pack-refs", "git repack" or "git
> prune". Fantastic! The fact that it shows the invalid value means it
> might also be possible to at least provide a useful hint that manual
> GC is not safe.
>
> I appreciate your help, Ævar.

No problem. I was going to add that you can set
e.g. pack.windowMemory='some.message' to make this go for git-repack
too, but it sounds like you don't want that.

Is there a reason why Bitbucket isn't using 'git-gc'? Some other
hosting providers use it, and if you don't run it with "expire now" or
similarly aggressive non-default values on an active repository it won't
corrupt anything.


Re: Forcing GC to always fail

2018-11-27 Thread Bryan Turner
On Tue, Nov 27, 2018 at 3:47 PM Ævar Arnfjörð Bjarmason wrote:
>
>
> On Tue, Nov 27 2018, Bryan Turner wrote:
>
> >
> > Is there anything I can set, perhaps some invalid configuration
> > option/value, that will make "git gc" (most important) and "git
> > reflog" (ideal, but less important) fail when they're run in our
> > repositories? Hopefully at that point customers will reach out to us
> > for help _before_ they corrupt their repositories.
>
> You could fix this and so many other issues by just hanging an "Is
> This Good For The Company?" banner up in Atlassian offices.

Not sure I understand what this means, or what your goal was in saying
it. No one inside Atlassian is running these commands. I'm trying to
help save administrators from themselves, which reduces real world
end-user pain that comes from decisions made without fully
understanding the consequences. It feels like this comment is mocking
my earnest desire to help, and my genuine question looking for any
insight people more familiar with the code might be able to offer.
Perhaps I'm just missing the joke, but if it's an Office Space
reference it feels like it's in pretty poor taste.

>
> But more seriously:
>
> $ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' &&
>   git -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
> error: Invalid gc.pruneexpire: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
> fatal: unable to parse 'gc.pruneexpire' from command-line config
> error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 'gc.reflogexpire' is not a valid timestamp
> fatal: unable to parse 'gc.reflogexpire' from command-line config

Thanks for that! It looks like that does block both "git gc" and "git
reflog expire" without blocking "git pack-refs", "git repack" or "git
prune". Fantastic! The fact that it shows the invalid value means it
might also be possible to at least provide a useful hint that manual
GC is not safe.

I appreciate your help, Ævar.

Bryan


Re: Forcing GC to always fail

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 27 2018, Bryan Turner wrote:

> Something of an odd question, but is there something I can do in the
> configuration for a repository that forces any "git gc" run in that
> repository to always fail without doing anything? (Ideally I'd like to
> make "git reflog expire" _also_ fail.)
>
> Background: For Bitbucket Server, we have a fairly recurrent issue
> where administrators decide they know how to manage garbage collection
> for our repositories better than we do, so they jump on the server and
> start running things like this:
>
> git reflog expire --expire=now --all
> git gc --prune=now
> git repack -adf --window=200 --depth=200
>
> They then come running to us with their corrupted repository expecting
> and/or hoping that we can fix it (often without proper backups).
>
> Bitbucket Server itself never runs "git gc" (or "git reflog expire").
> We've configured how reflog expiry should be handled, but of course
> that's overridden by explicit command line options like
> "--expire=now". We _do_ run "git pack-refs", "git repack" and "git
> prune" (with various options), so those commands need to continue to
> work.
>
> Is there anything I can set, perhaps some invalid configuration
> option/value, that will make "git gc" (most important) and "git
> reflog" (ideal, but less important) fail when they're run in our
> repositories? Hopefully at that point customers will reach out to us
> for help _before_ they corrupt their repositories.

You could fix this and so many other issues by just hanging an "Is
This Good For The Company?" banner up in Atlassian offices.

But more seriously:

$ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' &&
  git -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
error: Invalid gc.pruneexpire: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
fatal: unable to parse 'gc.pruneexpire' from command-line config
error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 'gc.reflogexpire' is not a valid timestamp
fatal: unable to parse 'gc.reflogexpire' from command-line config
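
The -c form above only affects a single invocation. To get the same effect for anyone who runs the commands in a given repository, the values would presumably have to live in that repository's own config. A minimal sketch, assuming the behavior is the same when the value comes from the config file; the function name and the message text are placeholders, and the message must be something Git cannot parse as a date:

```shell
#!/bin/sh
# Hypothetical helper: store an unparseable expiry value in the repository's
# own config so that "git gc" and "git reflog expire" abort while parsing it.
# The message text is a placeholder; it must not parse as a date.
block_manual_gc() {
    repo=$1
    msg='Manual.GC.is.unsupported.here.please.contact.support'
    git -C "$repo" config gc.pruneExpire "$msg" &&
    git -C "$repo" config gc.reflogExpire "$msg"
}

# Example (path is a placeholder): block_manual_gc /path/to/repository.git
```

Since the error output echoes the configured value, the message itself can double as a hint to whoever ran the command. Whether a given Git version rejects such values the same way is worth checking before relying on it.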


Forcing GC to always fail

2018-11-27 Thread Bryan Turner
Something of an odd question, but is there something I can do in the
configuration for a repository that forces any "git gc" run in that
repository to always fail without doing anything? (Ideally I'd like to
make "git reflog expire" _also_ fail.)

Background: For Bitbucket Server, we have a fairly recurrent issue
where administrators decide they know how to manage garbage collection
for our repositories better than we do, so they jump on the server and
start running things like this:

git reflog expire --expire=now --all
git gc --prune=now
git repack -adf --window=200 --depth=200

They then come running to us with their corrupted repository expecting
and/or hoping that we can fix it (often without proper backups).

Bitbucket Server itself never runs "git gc" (or "git reflog expire").
We've configured how reflog expiry should be handled, but of course
that's overridden by explicit command line options like
"--expire=now". We _do_ run "git pack-refs", "git repack" and "git
prune" (with various options), so those commands need to continue to
work.

Is there anything I can set, perhaps some invalid configuration
option/value, that will make "git gc" (most important) and "git
reflog" (ideal, but less important) fail when they're run in our
repositories? Hopefully at that point customers will reach out to us
for help _before_ they corrupt their repositories.

Bryan