Re: Question regarding quarantine environments

2018-08-03 Thread Jeff King
On Fri, Aug 03, 2018 at 03:25:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'd be a bit careful with that, though, as the definition of "new" is
> > vague there.
> >
> > For example, completing a thin pack may mean that the receiver creates a
> > copy of a base object found in the main repo. That object isn't new as
> > part of the push, nor was it even sent on the wire, but it will appear
> > in the quarantine directory. But only sometimes, depending on whether we
> > kept the sender's pack or exploded it to loose objects.
> 
> Right, I mean:
> 
> is_new = !in_quarantine() && in_main()
> 
> Or:
> 
> is_new = !in_main()
> 
> Should work, in the latter case if the object really is missing from the
> quarnatine too, other fsck bits will stop the push.

Ah, OK. Yes, I agree that should work to cover new objects (including
ones that the other side but aren't actually needed to update the refs,
though hopefully that is rare).

There may also be other object stores, if the main repository used
alternates (or if somebody set GIT_ALTERNATE_OBJECT_DIRECTORIES). You
can probably disregard that, though, as:

  1. If you ignore the main repo, presumably you ignore its
 recursive info/alternates, too.

  2. The easy mechanism for ignoring the main repo is to ignore
 GIT_ALTERNATE_OBJECT_DIRECTORIES, so you'd already be handling
 that.

> But as you point out:
> 
> is_new = in_quarantine()
> 
> Cannot be relied upon, although it'll be true most of the time.
> 
> Perhaps I'm missing some edge case above, but I wanted to reword it to
> make sure I understood it correctly (and perhaps you have a correction).

Nope, I just didn't think through what you were saying carefully enough. ;)

-Peff


Re: Question regarding quarantine environments

2018-08-03 Thread Ævar Arnfjörð Bjarmason


On Fri, Aug 03 2018, Jeff King wrote:

> On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Any Git commands you run should therefore find objects from either
>> > location, but any writes would go to the quarantine (most notably, Git's
>> > own index-pack/unpack-objects processes, which is the point of the
>> > quarantine in the first place).
>>
>> To add to this, one interesting thing that you can do with hooks because
>> of this quarantine is to answer certain questions about the push that
>> were prohibitively expensive before it existed, but there's no explicit
>> documentation for this.
>>
>> E.g. for a hook that wants to ban big blobs in the repo, but wants to
>> allow all existing blobs (you don't want to block e.g. a revert of a
>> commit that removed it from the checkout), you can juggle these two env
>> variables and hide the "main" object dir from the hook for some
>> operations, so e.g. if a blob lookup succeeds in the alternate
>> quarantine dir, but not the main object dir, you know it's new.
>
> I'd be a bit careful with that, though, as the definition of "new" is
> vague there.
>
> For example, completing a thin pack may mean that the receiver creates a
> copy of a base object found in the main repo. That object isn't new as
> part of the push, nor was it even sent on the wire, but it will appear
> in the quarantine directory. But only sometimes, depending on whether we
> kept the sender's pack or exploded it to loose objects.

Right, I mean:

is_new = !in_quarantine() && in_main()

Or:

is_new = !in_main()

Should work, in the latter case if the object really is missing from the
quarnatine too, other fsck bits will stop the push.

But as you point out:

is_new = in_quarantine()

Cannot be relied upon, although it'll be true most of the time.

Perhaps I'm missing some edge case above, but I wanted to reword it to
make sure I understood it correctly (and perhaps you have a correction).


Re: Question regarding quarantine environments

2018-08-03 Thread Jeff King
On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Any Git commands you run should therefore find objects from either
> > location, but any writes would go to the quarantine (most notably, Git's
> > own index-pack/unpack-objects processes, which is the point of the
> > quarantine in the first place).
> 
> To add to this, one interesting thing that you can do with hooks because
> of this quarantine is to answer certain questions about the push that
> were prohibitively expensive before it existed, but there's no explicit
> documentation for this.
> 
> E.g. for a hook that wants to ban big blobs in the repo, but wants to
> allow all existing blobs (you don't want to block e.g. a revert of a
> commit that removed it from the checkout), you can juggle these two env
> variables and hide the "main" object dir from the hook for some
> operations, so e.g. if a blob lookup succeeds in the alternate
> quarantine dir, but not the main object dir, you know it's new.

I'd be a bit careful with that, though, as the definition of "new" is
vague there.

For example, completing a thin pack may mean that the receiver creates a
copy of a base object found in the main repo. That object isn't new as
part of the push, nor was it even sent on the wire, but it will appear
in the quarantine directory. But only sometimes, depending on whether we
kept the sender's pack or exploded it to loose objects.

-Peff


Re: Question regarding quarantine environments

2018-08-03 Thread Ævar Arnfjörð Bjarmason


On Thu, Aug 02 2018, Jeff King wrote:

> On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:
>
>> I've been working on a git hook in golang recently. However, the library I
>> was using did not support a possible quarantine directory, which I would
>> use for my hook.
>>
>> I have been trying to find out how git finds this incoming directory in the
>> objects folder, as their code simply assumed it resided in
>> .git/objects/<1st byte>/
>
> When you're running a hook inside the quarantine environment, then
> $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine
> directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main
> repository object directory (possibly alongside other alternates, if
> there were any already set).
>
> Any Git commands you run should therefore find objects from either
> location, but any writes would go to the quarantine (most notably, Git's
> own index-pack/unpack-objects processes, which is the point of the
> quarantine in the first place).

To add to this, one interesting thing that you can do with hooks because
of this quarantine is to answer certain questions about the push that
were prohibitively expensive before it existed, but there's no explicit
documentation for this.

E.g. for a hook that wants to ban big blobs in the repo, but wants to
allow all existing blobs (you don't want to block e.g. a revert of a
commit that removed it from the checkout), you can juggle these two env
variables and hide the "main" object dir from the hook for some
operations, so e.g. if a blob lookup succeeds in the alternate
quarantine dir, but not the main object dir, you know it's new.

>> The solution that I implemented was to check the objects directory for the
>> object, and if it was not there, to look for a quarantine directory and try
>> there. However, that feels fairly inefficient.
>
> That's more or less what Git will do under the hood (though in the
> opposite order).
>
>> For the curious, the library and solution I attempted are both here [5]
>
> Just skimming, but it sounds like go-git does not support the
> GIT_OBJECT_DIRECTORY environment variable.
>
> -Peff


Re: Question regarding quarantine environments

2018-08-02 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:

>> The solution that I implemented was to check the objects directory for the
>> object, and if it was not there, to look for a quarantine directory and try
>> there. However, that feels fairly inefficient.
>
> That's more or less what Git will do under the hood (though in the
> opposite order).
>
>> For the curious, the library and solution I attempted are both here [5]
>
> Just skimming, but it sounds like go-git does not support the
> GIT_OBJECT_DIRECTORY environment variable.

To be clear: we don't guarantee that the quarantine directory in the
future will be where it is today.

So as Peff hinted, supporting GIT_OBJECT_DIRECTORY in go-git is likely
to be the best way forward for your tool.

Thanks and hope that helps,
Jonathan


Re: Question regarding quarantine environments

2018-08-02 Thread Jeff King
On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote:

> I've been working on a git hook in golang recently. However, the library I
> was using did not support a possible quarantine directory, which I would
> use for my hook.
> 
> I have been trying to find out how git finds this incoming directory in the
> objects folder, as their code simply assumed it resided in
> .git/objects/<1st byte>/

When you're running a hook inside the quarantine environment, then
$GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine
directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main
repository object directory (possibly alongside other alternates, if
there were any already set).

Any Git commands you run should therefore find objects from either
location, but any writes would go to the quarantine (most notably, Git's
own index-pack/unpack-objects processes, which is the point of the
quarantine in the first place).

> The solution that I implemented was to check the objects directory for the
> object, and if it was not there, to look for a quarantine directory and try
> there. However, that feels fairly inefficient.

That's more or less what Git will do under the hood (though in the
opposite order).

> For the curious, the library and solution I attempted are both here [5]

Just skimming, but it sounds like go-git does not support the
GIT_OBJECT_DIRECTORY environment variable.

-Peff