Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-11-01 Thread Jeff King
On Wed, Oct 31, 2018 at 07:28:09AM +0100, Christian Couder wrote:

> > For (2), I would like to see us improve the remote helper
> > infrastructure instead of introducing a new ODB helper.  Remote
> > helpers are already permitted to fetch some objects without listing
> > refs --- perhaps we will want to
> >
> >  i. split listing refs to a separate capability, so that a remote
> > helper can advertise that it doesn't support that.  (Alternatively
> > the remote could advertise that it has no refs.)
> >
> >  ii. Use the "long-running process" mechanism to improve how Git
> >  communicates with a remote helper.
> 
> Yeah, I agree that improving the remote helper infrastructure is
> probably better than what I have been trying to add. And I agree with
> the above 2 steps you propose.

One thing you might want to port over is the ability to ask the remote
helper "tell me the type and size of these objects". The reason I built
that into the original external-odb interface proposal was so that diff
could easily skip large objects without faulting them in (because
they're considered binary, and we'd just say "binary files differ"
anyway). That makes things like "git log -p" work a lot better (try it
with a blob-less partial clone now; it's pretty painful).
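
A minimal Python sketch of that idea follows; it is not Git's code. The
`ObjectInfo` record, the `diff_action` function and the default cutoff are
hypothetical names and values chosen for illustration. The only point is
that type and size metadata alone is enough to answer "binary files
differ" without faulting the blob in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectInfo:
    """Metadata a helper could report without sending contents (hypothetical)."""
    oid: str
    type: str  # "blob", "tree", "commit" or "tag"
    size: int  # size in bytes


def diff_action(info: ObjectInfo, present_locally: bool,
                big_file_threshold: int = 512 * 1024 * 1024) -> str:
    """Decide how a diff could treat one blob, using metadata only."""
    # Assumption: blobs above the cutoff are treated as binary, so their
    # contents are never needed to produce the diff output.
    if info.type == "blob" and info.size > big_file_threshold:
        return "binary-files-differ"
    if present_locally:
        return "text-diff"
    # Small missing blobs still have to be fetched before diffing.
    return "fetch-then-diff"
```

With this shape, a "git log -p" style walk would only fetch the small
blobs it actually has to show.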

I know that's kind of the _opposite_ of how partial clones work now,
where we try really hard not to have to even tell the client the full
list of objects. That's good if the reason you want a partial clone is
because there are gigantic numbers of objects (e.g., the Windows repo).
But I think many people are interested in having a more moderate number
of large objects (e.g., things like game development that are using
git-lfs now). It would be great if we could support both use cases
easily.

-Peff


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-31 Thread Christian Couder
Hi Jonathan,

On Tue, Oct 16, 2018 at 7:43 PM Jonathan Nieder  wrote:
>
> Hi Christian,
>
> On Tue, Sep 25, 2018, Christian Couder wrote:
>
> > In the cover letter there is a "Discussion" section which is about
> > this, but I agree that it might not be very clear.
> >
> > The main issue that this patch series tries to solve is that
> > extensions.partialclone config option limits the partial clone and
> > promisor features to only one remote. One related issue is that it
> > also prevents having other kinds of promisor/partial clone/odb
> > remotes. By other kinds I mean remotes that would not necessarily be
> > git repos, but that could store objects (that's where ODB, for Object
> > DataBase, comes from) and could provide those objects to Git through a
> > helper (or driver) script or program.
>
> Thanks for this explanation.  I took the opportunity to learn more
> while you were in the Bay Area for the Google Summer of Code mentor
> summit, which was very helpful to me.

Thanks for inviting me to the Google offices in Sunnyvale and San
Francisco to discuss this and other issues.

> The broader picture is that this is meant to make Git natively handle
> large blobs in a nicer way.  The design in this series has a few
> components:
>
>  1. Teaching partial clone to attempt to fetch missing objects from
> multiple remotes instead of only one.  This is useful because you
> can have a server that is nearby and cheaper to serve from (some
> kind of local cache server) that you make requests to first before
> falling back to the canonical source of objects.
>
>  2. Simplifying the protocol for fetching missing objects so that it
> can be satisfied by a lighter weight object storage system than
> a full Git server.  The ODB helpers introduced in this series are
> meant to speak such a simpler protocol since they are only used
> for one-off requests of a collection of missing objects instead of
> needing to understand refs, Git's negotiation, etc.
>
>  3. (possibly, though not in this series) Making the criteria for what
> objects can be missing more aggressive, so that I can "git add"
> a large file and work with it using Git without even having a
> second copy of that object in my local object store.

Yeah, I think this is a good summary of the issues I have been trying
to address.

> For (2), I would like to see us improve the remote helper
> infrastructure instead of introducing a new ODB helper.  Remote
> helpers are already permitted to fetch some objects without listing
> refs --- perhaps we will want to
>
>  i. split listing refs to a separate capability, so that a remote
> helper can advertise that it doesn't support that.  (Alternatively
> the remote could advertise that it has no refs.)
>
>  ii. Use the "long-running process" mechanism to improve how Git
>  communicates with a remote helper.

Yeah, I agree that improving the remote helper infrastructure is
probably better than what I have been trying to add. And I agree with
the above 2 steps you propose.

> For (1), things get more tricky.  In an object store from a partial
> clone today, we relax the ordinary "closure under reachability"
> invariant but in a minor way.  We'll need to work out how this works
> with multiple promisor remotes.
>
> The idea today is that there are two kinds of packs: promisor packs
> (from the promisor remote) and non-promisor packs.  Promisor packs are
> allowed to have reachability edges (for example a tree->blob edge)
> that point to a missing object, since the promisor remote has promised
> that we will be able to access that object on demand.  Non-promisor
> packs are also allowed to have reachability edges that point to a
> missing object, as long as there is a reachability edge from an object
> in a promisor pack to the same object (because of the same promise).
> See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
> for more details.
>
> To prevent older versions of Git from being confused by partial clone
> repositories, they use the repositoryFormatVersion mechanism:
>
> [core]
> repositoryFormatVersion = 1
> [extensions]
> partialClone = ...
>
> If we change the invariant, we will need to use a new extensions.* key
> to ensure that versions of Git that are not aware of the new invariant
> do not operate on the repository.

Maybe the versions of Git that are not aware of the new invariant
could still work using only the specified remote while the new
versions would know that they can use other remotes by looking at
other config variables.

> A promisor pack is indicated by there being a .promisor file next to
> the usual .pack file.  Currently the .promisor file is empty.  The
> previous idea was that once we want more metadata (e.g. for the sake of
> multiple promisor remotes), we could write it in that file.  For
> example, remotes could be 

Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-18 Thread Junio C Hamano
Jonathan Nieder  writes:

> Junio C Hamano wrote:
> ...
>> It is a good idea to implicitly include the promisor-remote to the
>> set of secondary places to consult to help existing versions of Git,
>> but once the repository starts fetching incomplete subgraphs and
>> adding new object.missingobjectremote [*1*], these versions of Git
>> will stop working correctly, so I am not sure if it is all that
>> useful an approach for compatibility in practice.
>
> Can you spell this out for me more?  Do you mean that a remote from
> this list might make a promise that the original partialClone remote
> can't keep?

It was my failed attempt to demonstrate that I understood what was
being discussed by rephrasing JTan's

Or allow extensions.partialClone=<R> wherein <R> is not in the
missingObjectRemote, in which case <R> is tried first, so that
we don't have to reject some configurations.



Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-18 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Tan  writes:
>> Jonathan Nieder wrote:

>>> [object]
>>> missingObjectRemote = local-cache-remote
>>> missingObjectRemote = origin
>>
>> In the presence of missingObjectRemote, old versions of Git, when lazily
>> fetching, would only know to try the extensions.partialClone remote. But
>> this is safe because existing data wouldn't be clobbered (since we're
>> not using ideas like adding meaning to the contents of the .promisor
>> file). Also, other things like fsck and gc still work.
>
> It is a good idea to implicitly include the promisor-remote to the
> set of secondary places to consult to help existing versions of Git,
> but once the repository starts fetching incomplete subgraphs and
> adding new object.missingobjectremote [*1*], these versions of Git
> will stop working correctly, so I am not sure if it is all that
> useful an approach for compatibility in practice.

Can you spell this out for me more?  Do you mean that a remote from
this list might make a promise that the original partialClone remote
can't keep?

If we're careful to only page in objects that were promised by the
original partialClone remote, then a promise "I promise to supply you
on demand with any object directly or indirectly reachable from any
object you have fetched from me" from the partialClone remote should
be enough.

> [Footnote]
>
> *1* That name with two "object" in it sounds horrible.

Sorry about that.  Another name used while discussing this was
objectAccess.promisorRemote.  As long as the idea is clear (that this
means "remotes to use when attempting to obtain an object that is not
already available locally"), I am not attached to any particular name.

Thanks,
Jonathan


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-18 Thread Junio C Hamano
Jonathan Tan  writes:

>>  [object]
>>  missingObjectRemote = local-cache-remote
>>  missingObjectRemote = origin
>> 
> In the presence of missingObjectRemote, old versions of Git, when lazily
> fetching, would only know to try the extensions.partialClone remote. But
> this is safe because existing data wouldn't be clobbered (since we're
> not using ideas like adding meaning to the contents of the .promisor
> file). Also, other things like fsck and gc still work.

It is a good idea to implicitly include the promisor-remote to the
set of secondary places to consult to help existing versions of Git,
but once the repository starts fetching incomplete subgraphs and
adding new object.missingobjectremote [*1*], these versions of Git
will stop working correctly, so I am not sure if it is all that
useful an approach for compatibility in practice.


[Footnote]

*1* That name with two "object" in it sounds horrible.  I think the
same keyname in 'core' section may sit better (this feature sounds
more 'core' than other cruft that crept into 'core' section over
time).  

Or "odb.remoteAlternate" (as opposed to objects/info/alternates, which
lists local alternates), perhaps.


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-16 Thread Jonathan Tan
>  1. Teaching partial clone to attempt to fetch missing objects from
> multiple remotes instead of only one.  This is useful because you
> can have a server that is nearby and cheaper to serve from (some
> kind of local cache server) that you make requests to first before
> falling back to the canonical source of objects.

Quoting the above definition of (1) for reference. I think Jonathan
Nieder has covered the relevant points well - I'll just expand on (1).

> So much for the current setup.  For (1), I believe you are proposing to
> still have only one effective <promisor-id>, so it doesn't necessarily
> require modifying the extensions.* configuration.  Instead, the idea is
> that when trying to access an object, we would follow one of a list of
> steps:
> 
>  1. First, check the local object store. If it's there, we're done.
>  2. Second, try alternates --- maybe the object is in one of those!
>  3. Now, try promisor remotes, one at a time, in user-configured order.
> 
> In other words, I think that for (1) all we would need is a new
> configuration
> 
>   [object]
>   missingObjectRemote = local-cache-remote
>   missingObjectRemote = origin
> 
> The semantics would be that when trying to access a promised object,
> we attempt to fetch from these remotes one at a time, in the order
> specified.  We could require that the remote named in
> extensions.partialClone be one of the listed remotes, without having
> to care where it shows up in the list.

Or allow extensions.partialClone=<R> wherein <R> is not in the
missingObjectRemote, in which case <R> is tried first, so that we don't
have to reject some configurations.

> That way, we get the benefit (1) without having to change the
> semantics of extensions.partialClone and without having to care about
> the order of sections in the config.  What do you think?

Let's define the promisor remotes of a repository as those in
missingObjectRemote or extensions.partialClone (currently, we talk about
"the promisor remote" (singular), defined in extensions.partialClone).

Overall, this seems like a reasonable idea to me, if we keep the
restriction that we can only fetch with filter from a promisor remote.
This allows us to extend the definition of a promisor object in a
manner consistent with the current definition - to say "a promisor object
is one promised by at least one promisor remote" (currently, "a promisor
object is promised by the promisor remote").

In the presence of missingObjectRemote, old versions of Git, when lazily
fetching, would only know to try the extensions.partialClone remote. But
this is safe because existing data wouldn't be clobbered (since we're
not using ideas like adding meaning to the contents of the .promisor
file). Also, other things like fsck and gc still work.
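
The lookup order discussed above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not Git's
implementation: `promisor_remotes`, `find_object` and the `fetch`
callback are hypothetical names, and object stores are modelled as plain
sets of object IDs.

```python
from typing import Callable, Iterable, List, Optional, Set


def promisor_remotes(missing_object_remotes: List[str],
                     partial_clone_remote: Optional[str]) -> List[str]:
    """Return the remotes in the order they would be tried."""
    order = list(missing_object_remotes)
    # If the extensions.partialClone remote is not listed, try it first
    # instead of rejecting the configuration.
    if partial_clone_remote and partial_clone_remote not in order:
        order.insert(0, partial_clone_remote)
    return order


def find_object(oid: str,
                local: Set[str],
                alternates: Iterable[Set[str]],
                remotes: List[str],
                fetch: Callable[[str, str], bool]) -> Optional[str]:
    """Return where the object was found, or None if nobody has it."""
    if oid in local:                      # 1. local object store
        return "local"
    for alternate in alternates:          # 2. alternates
        if oid in alternate:
            return "alternate"
    for remote in remotes:                # 3. promisor remotes, in order
        if fetch(remote, oid):
            local.add(oid)                # a fetched object is now local
            return remote
    return None
```

An old Git that only knows extensions.partialClone corresponds to
calling `find_object` with a one-element `remotes` list.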


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-10-16 Thread Jonathan Nieder
Hi Christian,

On Tue, Sep 25, 2018, Christian Couder wrote:

> In the cover letter there is a "Discussion" section which is about
> this, but I agree that it might not be very clear.
>
> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents having other kinds of promisor/partial clone/odb
> remotes. By other kinds I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

Thanks for this explanation.  I took the opportunity to learn more
while you were in the Bay Area for the Google Summer of Code mentor
summit, which was very helpful to me.

The broader picture is that this is meant to make Git natively handle
large blobs in a nicer way.  The design in this series has a few
components:

 1. Teaching partial clone to attempt to fetch missing objects from
multiple remotes instead of only one.  This is useful because you
can have a server that is nearby and cheaper to serve from (some
kind of local cache server) that you make requests to first before
falling back to the canonical source of objects.

 2. Simplifying the protocol for fetching missing objects so that it
can be satisfied by a lighter weight object storage system than
a full Git server.  The ODB helpers introduced in this series are
meant to speak such a simpler protocol since they are only used
for one-off requests of a collection of missing objects instead of
needing to understand refs, Git's negotiation, etc.

 3. (possibly, though not in this series) Making the criteria for what
objects can be missing more aggressive, so that I can "git add"
a large file and work with it using Git without even having a
second copy of that object in my local object store.

For (2), I would like to see us improve the remote helper
infrastructure instead of introducing a new ODB helper.  Remote
helpers are already permitted to fetch some objects without listing
refs --- perhaps we will want to

 i. split listing refs to a separate capability, so that a remote
helper can advertise that it doesn't support that.  (Alternatively
the remote could advertise that it has no refs.)

 ii. Use the "long-running process" mechanism to improve how Git
 communicates with a remote helper.

For (1), things get more tricky.  In an object store from a partial
clone today, we relax the ordinary "closure under reachability"
invariant but in a minor way.  We'll need to work out how this works
with multiple promisor remotes.

The idea today is that there are two kinds of packs: promisor packs
(from the promisor remote) and non-promisor packs.  Promisor packs are
allowed to have reachability edges (for example a tree->blob edge)
that point to a missing object, since the promisor remote has promised
that we will be able to access that object on demand.  Non-promisor
packs are also allowed to have reachability edges that point to a
missing object, as long as there is a reachability edge from an object
in a promisor pack to the same object (because of the same promise).
See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
for more details.
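
As a rough illustration of that relaxed invariant, here is a simplified
Python model. It is an assumption-laden sketch, not what "git fsck"
does: `unpromised_missing` is a hypothetical name, `promisor_edges` maps
each object stored in a promisor pack to the objects it references, and
`other_edges` does the same for non-promisor packs.

```python
from typing import Dict, List, Set


def unpromised_missing(promisor_edges: Dict[str, List[str]],
                       other_edges: Dict[str, List[str]]) -> Set[str]:
    """Return missing objects that no promisor-pack object points to."""
    # Every key is an object we actually have in some pack.
    present = set(promisor_edges) | set(other_edges)
    # Anything referenced from a promisor pack is covered by the promise.
    promised = {target
                for targets in promisor_edges.values()
                for target in targets}
    # Missing objects referenced from non-promisor packs are only
    # acceptable when the same object is also promised.
    missing_from_others = {target
                           for targets in other_edges.values()
                           for target in targets
                           if target not in present}
    return missing_from_others - promised
```

An empty result means the simplified invariant holds; any object ID in
the result is a real hole in the object store.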

To prevent older versions of Git from being confused by partial clone
repositories, they use the repositoryFormatVersion mechanism:

[core]
repositoryFormatVersion = 1
[extensions]
partialClone = ...

If we change the invariant, we will need to use a new extensions.* key
to ensure that versions of Git that are not aware of the new invariant
do not operate on the repository.

A promisor pack is indicated by there being a .promisor file next to
the usual .pack file.  Currently the .promisor file is empty.  The
previous idea was that once we want more metadata (e.g. for the sake of
multiple promisor remotes), we could write it in that file.  For
example, remotes could be associated to a <promisor-id> and the
.promisor file could indicate which <promisor-id> has promised to serve
requests for objects reachable from objects in this pack.

That will complicate the object access code as well, since currently
we only find who has promised an object during "git fsck" and similar
operations.  During everyday access we do not care which promisor
pack caused the object to be promised, since there is only one promisor
remote to fetch from anyway.

So much for the current setup.  For (1), I believe you are proposing to
still have only one effective <promisor-id>, so it doesn't necessarily
require modifying the extensions.* configuration.  Instead, the idea is
that when trying to access an object, we would follow one of a list of
steps:

 1. First, check the local object store. If it's there, we're done.
 2. Second, try 

Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-09-26 Thread Junio C Hamano
Jeff King  writes:

>> I do not think "sources that are not git repositories" is all that
>> interesting, unless they can also serve as the source for ext::
>> remote helper.  And if they can serve "git fetch ext::...", I think
>> they can be treated just like a normal Git repository by the
>> backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> It would be nice to be able to say:
>
>   1. Clone from the real repo, but do not transfer any blobs larger than
>  10MB.
>
>   2. When you need a blob, check the external odb that points to S3. Git
>  cannot know about this automatically, but presumably you would set
>  a few config variables to point to an external-odb helper script.
>
>   3. If for some reason S3 doesn't work, you can always request it from
>  the original repo. That part _doesn't_ need extra config, since we
>  can assume that the source of the promisor pack can feed us the
>  extra objects[1].
>
> But you don't need to ever be able to "git fetch" from the S3 repo.
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.

Yup.  The lazy backfill JTan has, if I understand correctly, only
wants "Please give me this and that object" and use of "upload-pack"
is an implementation detail.  Over the existing Git protocols, you
may implement it as sending these object names as "want" and perhaps
restrict the traversal (if there is a "want" object that is commit)
by giving some commits as "have", i.e. "upload-pack" may not be the
best model for the other side, but that is what we have readily
available.  I was hoping that the way we take to move forward is to
enhance that interface so that we can use different "object store"
backends as needed, to satisfy needs from both parties.


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 12:12:22AM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:
>
> > Christian Couder  writes:
> >
> > > The main issue that this patch series tries to solve is that
> > > extensions.partialclone config option limits the partial clone and
> > > promisor features to only one remote. One related issue is that it
> > > also prevents having other kinds of promisor/partial clone/odb
> > > remotes. By other kinds I mean remotes that would not necessarily be
> > > git repos, but that could store objects (that's where ODB, for Object
> > > DataBase, comes from) and could provide those objects to Git through a
> > > helper (or driver) script or program.
> >
> > I do not think "sources that are not git repositories" is all that
> > interesting, unless they can also serve as the source for ext::
> > remote helper.  And if they can serve "git fetch ext::...", I think
> > they can be treated just like a normal Git repository by the
> > backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> [ ... ]
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.
>
> However, that may be an overly constricting interface for the helper.
> E.g., we might want to be able to issue several requests and have them
> transfer in parallel. But I suppose we could teach that trick to
> upload-pack in the long run, as it may be applicable even to fetching
> from "real" git repos.
>
> Hmm. Actually, I kind of like that direction the more I think about it.

Yes, this is an important design decision for Git LFS, which I believe
is important to this series. Git LFS allows the caller to issue `n`
parallel object transfers (uploads or downloads) at a time, which is
useful when, say, checking out a repository that has many large objects.

We do this trick with 'filter.lfs.process', where we accumulate many Git
LFS objects that we wish to tell Git about so that it can check them out
into the working copy, and then promise that we will provide the
contents later (e.g., by sending status=delayed).

We then "batch" up all of those requests, issue them all at once (after
which the LFS API will tell us the URLs of where to upload/download each
item), and then we open "N" threads to do that work.

After all of that, we respond back with all of the objects that we had
to download, and close the process filter.
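
The delay-then-batch pattern described above can be sketched in Python
as below. This is not Git LFS code: `DelayedBatch` and the `download`
callback are hypothetical, and the step where the LFS API hands back
per-object URLs is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class DelayedBatch:
    """Queue object requests, then transfer them all in parallel."""

    def __init__(self, download: Callable[[str], bytes], workers: int = 4):
        self._download = download
        self._workers = workers
        self._pending: List[str] = []

    def request(self, oid: str) -> str:
        """Queue one object and promise its contents for later."""
        self._pending.append(oid)
        return "delayed"

    def flush(self) -> Dict[str, bytes]:
        """Download every queued object, at most `workers` at a time."""
        oids, self._pending = self._pending, []
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            # map() yields results in the same order as `oids`.
            contents = list(pool.map(self._download, oids))
        return dict(zip(oids, contents))
```

A caller would invoke `request()` once per object during checkout and
`flush()` once at the end, which keeps the number of round trips
independent of the number of objects.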

Thanks,
Taylor


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-09-25 Thread Jeff King
On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:

> Christian Couder  writes:
> 
> > The main issue that this patch series tries to solve is that
> > extensions.partialclone config option limits the partial clone and
> > promisor features to only one remote. One related issue is that it
> > also prevents having other kinds of promisor/partial clone/odb
> > remotes. By other kinds I mean remotes that would not necessarily be
> > git repos, but that could store objects (that's where ODB, for Object
> > DataBase, comes from) and could provide those objects to Git through a
> > helper (or driver) script or program.
> 
> I do not think "sources that are not git repositories" is all that
> interesting, unless they can also serve as the source for ext::
> remote helper.  And if they can serve "git fetch ext::...", I think
> they can be treated just like a normal Git repository by the
> backfill code that needs to lazily populate the partial clone.

I don't know about that. Imagine I had a regular Git repo with a bunch
of large blobs, and then I also stored those large blobs in something
like S3 that provides caching, geographic locality, and resumable
transfers.

It would be nice to be able to say:

  1. Clone from the real repo, but do not transfer any blobs larger than
 10MB.

  2. When you need a blob, check the external odb that points to S3. Git
 cannot know about this automatically, but presumably you would set
 a few config variables to point to an external-odb helper script.

  3. If for some reason S3 doesn't work, you can always request it from
 the original repo. That part _doesn't_ need extra config, since we
 can assume that the source of the promisor pack can feed us the
 extra objects[1].

But you don't need to ever be able to "git fetch" from the S3 repo.

Now if you are arguing that the interface to the external-odb helper
script should be that it _looks_ like upload-pack, but simply advertises
no refs and will let you fetch any object, that makes more sense to me.
It's not something you could "git clone", but you can "git fetch" from
it.

However, that may be an overly constricting interface for the helper.
E.g., we might want to be able to issue several requests and have them
transfer in parallel. But I suppose we could teach that trick to
upload-pack in the long run, as it may be applicable even to fetching
from "real" git repos.

Hmm. Actually, I kind of like that direction the more I think about it.

-Peff


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-09-25 Thread Junio C Hamano
Christian Couder  writes:

> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents having other kinds of promisor/partial clone/odb
> remotes. By other kinds I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

I do not think "sources that are not git repositories" is all that
interesting, unless they can also serve as the source for ext::
remote helper.  And if they can serve "git fetch ext::...", I think
they can be treated just like a normal Git repository by the
backfill code that needs to lazily populate the partial clone.

And being able to say "I took these commits from
that remote and that remote should be able to backfill the trees and
the blobs necessary to complete these commits" for more than one
remote would obviously be a good thing.  The way we mark the
promisor packs currently is by the mere presence of a file, but
nothing prevents us from extending it to write the nickname of the
configured remote the pack was taken from to help us answer "who can
feed us the remaining objects?", for example, so I do not think it
is an insurmountable problem.

I guess JTan is the primary person who is interested/working on the
partial clone with backfill?  Have you two been collaborating well?

Do you two need help from us to make that happen, and if so what do
you need?  Stop the world and declare this and that source files are
off limits for two weeks, or something like that?




Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-09-25 Thread Christian Couder
On Fri, Aug 3, 2018 at 12:55 AM, Stefan Beller  wrote:
> On Wed, Aug 1, 2018 at 11:16 PM Christian Couder
>  wrote:
>>
>> From: Christian Couder 
>>
>> Signed-off-by: Junio C Hamano 
>> ---
>>  Documentation/config.txt | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 43b2de7b5f..2d048d47f2 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
>>  environment variable, which must be a colon separated list of refs or
>>  globs.
>>
>> +odb.<name>.promisorRemote::
>> +   The name of a promisor remote. For now promisor remotes are
>> +   the only kind of remote object database (odb) that is
>> +   supported.
>> +
>
> Can you explain the end goal for this? (I did not find it in the cover letter,
> nor do I make sense of this documentation)

First thank you for (re)opening this discussion, as I think it may
help resolve the issues related to my work.

In the cover letter there is a "Discussion" section which is about
this, but I agree that it might not be very clear.

The main issue that this patch series tries to solve is that
extensions.partialclone config option limits the partial clone and
promisor features to only one remote. One related issue is that it
also prevents having other kinds of promisor/partial clone/odb
remotes. By other kinds I mean remotes that would not necessarily be
git repos, but that could store objects (that's where ODB, for Object
DataBase, comes from) and could provide those objects to Git through a
helper (or driver) script or program.

For reference I tried to raise these issues (especially the first one)
at least twice before extensions.partialclone was merged:

https://public-inbox.org/git/cap8ufd3jt+0lq9yx_7x3sjd+jg+a25bagdg7zp+dzv43+1-...@mail.gmail.com/
https://public-inbox.org/git/cap8ufd0p7kvo2np4wq7oasv4h1+sqhapuzw5aqef+enns0s...@mail.gmail.com/

but it was still merged as is.

(So of course now it's not surprising that my work on this patch
series keeps conflicting with work that is still going on for promisors
and partial clone, and unfortunately the result is that my work keeps
being ejected from pu whenever it manages to reach it.)

> So from what I understand, this series relates to partialClone, which
> has the remote name of the "promisor" in extensions.partialclone.
> That is the remote to contact for any needs w.r.t. partial clone and
> fetching on demand.

Yes.

> This key "odb.<name1>.promisorRemote = <name2>" introduces
> 2 new names, where do each of these two names hook in?
> name2 is a remote, such as "origin" from what I can tell, but
> which naming scheme does name1 follow here?

There is just one new name. Instead of:

  extensions.partialclone = <remote>

there is:

  odb.<name>.promisorRemote = <remote>

So it is now like:

  remote.<name>.url = <url>

which we use for remote repositories.

And it enables us to:

  - have more than one promisor remote
  - specify different parameters for each promisor remote
  - make it possible later to have other kinds of promisor/odb remotes

It also restores the distributed nature of Git which was kind of
broken for promisor remotes.

> What makes the odb key different, in that the partial clone
> feature only handles objects as well?

I am not sure I understand this question. Anyway if we want more than
one promisor remote, we need to be able to specify different
parameters for each promisor remote. For example now
core.partialclonefilter is used to specify some filters for the
promisor remote, but how can we nicely specify different partial clone
filters if we have more than one promisor remote?

With the changes in this patch series core.partialclonefilter is
replaced with odb.<name>.partialclonefilter, so that
parameters for a remote odb are properly grouped together in the
section where the remote odb is defined.
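
To illustrate what "grouped together" buys us, here is a small Python
sketch that collects odb.<name>.* settings per odb. It is only a model:
`group_odb_settings` is a hypothetical helper, the odb names "cache" and
"main" in the usage note are made up, and real code would go through
Git's own config machinery rather than raw key/value pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_odb_settings(
        entries: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, str]]:
    """Group (key, value) config entries of the form odb.<name>.<variable>."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for key, value in entries:
        # The section is everything before the first dot, the variable is
        # everything after the last dot, and the name sits in between.
        section, _, rest = key.partition(".")
        name, _, variable = rest.rpartition(".")
        if section.lower() != "odb" or not name:
            continue  # not an odb.<name>.* entry
        # Variable names are case-insensitive, so normalise them.
        grouped[name][variable.lower()] = value
    return dict(grouped)
```

Feeding it ("odb.cache.promisorRemote", "local-cache-remote") and
("odb.main.promisorRemote", "origin") yields one dictionary per odb, so
each promisor remote can carry its own filter next to its name.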

One alternative scheme could be for example to have:

  remote.<name>.promisor = (true|false)

or maybe:

  remote.<name>.partialclone = (true|false)

instead of:

  extensions.partialclone = <remote>

And then we could also have:

  remote.<name>.partialclonefilter = <filter>

The issue with this scheme is that it kind of overloads the
"remote.<name>.*" namespace for something that can be seen as
different, especially if, as I want to do later, we are going to
have other kinds of promisor/odb remotes.

I plan to send a V5 of this patch series really soon now, where I will
try to better explain the end goal.


Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-08-02 Thread Stefan Beller
On Wed, Aug 1, 2018 at 11:16 PM Christian Couder
 wrote:
>
> From: Christian Couder 
>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/config.txt | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 43b2de7b5f..2d048d47f2 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
>  environment variable, which must be a colon separated list of refs or
>  globs.
>
> +odb.<name>.promisorRemote::
> +   The name of a promisor remote. For now promisor remotes are
> +   the only kind of remote object database (odb) that is
> +   supported.
> +

Can you explain the end goal for this? (I did not find it in the cover letter,
nor do I make sense of this documentation)

So from what I understand, this series relates to partialClone, which
has the remote name of the "promisor" in extensions.partialclone.
That is the remote to contact for any needs w.r.t. partial clone and
fetching on demand.

This key "odb.<name1>.promisorRemote = <name2>" introduces
2 new names, where do each of these two names hook in?
name2 is a remote, such as "origin" from what I can tell, but
which naming scheme does name1 follow here?

What makes the odb key different, in that the partial clone
feature only handles objects as well?

>  pack.window::
> The size of the window used by linkgit:git-pack-objects[1] when no
> window size is given on the command line. Defaults to 10.
> --
> 2.18.0.330.g17eb9fed90
>


[PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

2018-08-02 Thread Christian Couder
From: Christian Couder 

Signed-off-by: Junio C Hamano 
---
 Documentation/config.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 43b2de7b5f..2d048d47f2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
 environment variable, which must be a colon separated list of refs or
 globs.
 
+odb.<name>.promisorRemote::
+   The name of a promisor remote. For now promisor remotes are
+   the only kind of remote object database (odb) that is
+   supported.
+
 pack.window::
    The size of the window used by linkgit:git-pack-objects[1] when no
    window size is given on the command line. Defaults to 10.
-- 
2.18.0.330.g17eb9fed90