Re: Question regarding quarantine environments
On Fri, Aug 03, 2018 at 03:25:08PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I'd be a bit careful with that, though, as the definition of "new" is > > vague there. > > > > For example, completing a thin pack may mean that the receiver creates a > > copy of a base object found in the main repo. That object isn't new as > > part of the push, nor was it even sent on the wire, but it will appear > > in the quarantine directory. But only sometimes, depending on whether we > > kept the sender's pack or exploded it to loose objects. > > Right, I mean: > > is_new = !in_quarantine() && in_main() > > Or: > > is_new = !in_main() > > Should work, in the latter case if the object really is missing from the > quarnatine too, other fsck bits will stop the push. Ah, OK. Yes, I agree that should work to cover new objects (including ones that the other side but aren't actually needed to update the refs, though hopefully that is rare). There may also be other object stores, if the main repository used alternates (or if somebody set GIT_ALTERNATE_OBJECT_DIRECTORIES). You can probably disregard that, though, as: 1. If you ignore the main repo, presumably you ignore its recursive info/alternates, too. 2. The easy mechanism for ignoring the main repo is to ignore GIT_ALTERNATE_OBJECT_DIRECTORIES, so you'd already be handling that. > But as you point out: > > is_new = in_quarantine() > > Cannot be relied upon, although it'll be true most of the time. > > Perhaps I'm missing some edge case above, but I wanted to reword it to > make sure I understood it correctly (and perhaps you have a correction). Nope, I just didn't think through what you were saying carefully enough. ;) -Peff
Re: Question regarding quarantine environments
On Fri, Aug 03 2018, Jeff King wrote: > On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > Any Git commands you run should therefore find objects from either >> > location, but any writes would go to the quarantine (most notably, Git's >> > own index-pack/unpack-objects processes, which is the point of the >> > quarantine in the first place). >> >> To add to this, one interesting thing that you can do with hooks because >> of this quarantine is to answer certain questions about the push that >> were prohibitively expensive before it existed, but there's no explicit >> documentation for this. >> >> E.g. for a hook that wants to ban big blobs in the repo, but wants to >> allow all existing blobs (you don't want to block e.g. a revert of a >> commit that removed it from the checkout), you can juggle these two env >> variables and hide the "main" object dir from the hook for some >> operations, so e.g. if a blob lookup succeeds in the alternate >> quarantine dir, but not the main object dir, you know it's new. > > I'd be a bit careful with that, though, as the definition of "new" is > vague there. > > For example, completing a thin pack may mean that the receiver creates a > copy of a base object found in the main repo. That object isn't new as > part of the push, nor was it even sent on the wire, but it will appear > in the quarantine directory. But only sometimes, depending on whether we > kept the sender's pack or exploded it to loose objects. Right, I mean: is_new = !in_quarantine() && in_main() Or: is_new = !in_main() Should work, in the latter case if the object really is missing from the quarnatine too, other fsck bits will stop the push. But as you point out: is_new = in_quarantine() Cannot be relied upon, although it'll be true most of the time. Perhaps I'm missing some edge case above, but I wanted to reword it to make sure I understood it correctly (and perhaps you have a correction).
Re: Question regarding quarantine environments
On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Any Git commands you run should therefore find objects from either > > location, but any writes would go to the quarantine (most notably, Git's > > own index-pack/unpack-objects processes, which is the point of the > > quarantine in the first place). > > To add to this, one interesting thing that you can do with hooks because > of this quarantine is to answer certain questions about the push that > were prohibitively expensive before it existed, but there's no explicit > documentation for this. > > E.g. for a hook that wants to ban big blobs in the repo, but wants to > allow all existing blobs (you don't want to block e.g. a revert of a > commit that removed it from the checkout), you can juggle these two env > variables and hide the "main" object dir from the hook for some > operations, so e.g. if a blob lookup succeeds in the alternate > quarantine dir, but not the main object dir, you know it's new. I'd be a bit careful with that, though, as the definition of "new" is vague there. For example, completing a thin pack may mean that the receiver creates a copy of a base object found in the main repo. That object isn't new as part of the push, nor was it even sent on the wire, but it will appear in the quarantine directory. But only sometimes, depending on whether we kept the sender's pack or exploded it to loose objects. -Peff
Re: Question regarding quarantine environments
On Thu, Aug 02 2018, Jeff King wrote: > On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: > >> I've been working on a git hook in golang recently. However, the library I >> was using did not support a possible quarantine directory, which I would >> use for my hook. >> >> I have been trying to find out how git finds this incoming directory in the >> objects folder, as their code simply assumed it resided in >> .git/objects/<1st byte>/ > > When you're running a hook inside the quarantine environment, then > $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine > directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main > repository object directory (possibly alongside other alternates, if > there were any already set). > > Any Git commands you run should therefore find objects from either > location, but any writes would go to the quarantine (most notably, Git's > own index-pack/unpack-objects processes, which is the point of the > quarantine in the first place). To add to this, one interesting thing that you can do with hooks because of this quarantine is to answer certain questions about the push that were prohibitively expensive before it existed, but there's no explicit documentation for this. E.g. for a hook that wants to ban big blobs in the repo, but wants to allow all existing blobs (you don't want to block e.g. a revert of a commit that removed it from the checkout), you can juggle these two env variables and hide the "main" object dir from the hook for some operations, so e.g. if a blob lookup succeeds in the alternate quarantine dir, but not the main object dir, you know it's new. >> The solution that I implemented was to check the objects directory for the >> object, and if it was not there, to look for a quarantine directory and try >> there. However, that feels fairly inefficient. > > That's more or less what Git will do under the hood (though in the > opposite order). > >> For the curious, the library and solution I attempted are both here [5] > > Just skimming, but it sounds like go-git does not support the > GIT_OBJECT_DIRECTORY environment variable. > > -Peff
Re: Question regarding quarantine environments
Hi, Jeff King wrote: > On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: >> The solution that I implemented was to check the objects directory for the >> object, and if it was not there, to look for a quarantine directory and try >> there. However, that feels fairly inefficient. > > That's more or less what Git will do under the hood (though in the > opposite order). > >> For the curious, the library and solution I attempted are both here [5] > > Just skimming, but it sounds like go-git does not support the > GIT_OBJECT_DIRECTORY environment variable. To be clear: we don't guarantee that the quarantine directory in the future will be where it is today. So as Peff hinted, supporting GIT_OBJECT_DIRECTORY in go-git is likely to be the best way forward for your tool. Thanks and hope that helps, Jonathan
Re: Question regarding quarantine environments
On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: > I've been working on a git hook in golang recently. However, the library I > was using did not support a possible quarantine directory, which I would > use for my hook. > > I have been trying to find out how git finds this incoming directory in the > objects folder, as their code simply assumed it resided in > .git/objects/<1st byte>/ When you're running a hook inside the quarantine environment, then $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main repository object directory (possibly alongside other alternates, if there were any already set). Any Git commands you run should therefore find objects from either location, but any writes would go to the quarantine (most notably, Git's own index-pack/unpack-objects processes, which is the point of the quarantine in the first place). > The solution that I implemented was to check the objects directory for the > object, and if it was not there, to look for a quarantine directory and try > there. However, that feels fairly inefficient. That's more or less what Git will do under the hood (though in the opposite order). > For the curious, the library and solution I attempted are both here [5] Just skimming, but it sounds like go-git does not support the GIT_OBJECT_DIRECTORY environment variable. -Peff