Re: [PATCH 0/9] Add missing includes and forward declares

2018-08-14 Thread Elijah Newren
On Tue, Aug 14, 2018 at 10:45 PM Junio C Hamano  wrote:
> Elijah Newren  writes:
>
> > On Mon, Aug 13, 2018 at 11:24 AM Junio C Hamano  wrote:
> >> Jeff King  writes:
> >
> >> As things are slowly moving out of the so-far kitchen-sink "cache.h"
> >> into more specific subsystem headers (like object-store.h), we may
> >> actually want to tighten the "header that includes it first" part a
> >> bit in the future, so that 'git grep cache.h' would give us a more
> >> explicit and a better picture of what really depends on knowing what
> >> the lowest level plumbing API are built around.
> >>
> >> > So I think the better test is a two-line .c file with:
> >> >
> >> >   #include "git-compat-util.h"
> >> >   #include $header_to_check
> >>
> >> But until that tightening happens, I do not actually mind the
> >> two-line .c file started with inclusion of cache.h instead of
> >> git-compat-util.h.  That would limit the scope of this series
> >> further.
> >
> > Yes, this removes about 2/3 of patch #1.
>
> Sorry for making a misleading comment.  I should have phrased "I
> would not have minded if the series were looser by assuming
> cache.h", implying that "but now the actual patch went extra mile to
> be more complete, what we have is even better ;-)".

Ah, gotcha.  Thanks for the clarification.


Re: [PATCHv3 1/6] Add missing includes and forward declares

2018-08-14 Thread Elijah Newren
On Tue, Aug 14, 2018 at 10:10 PM Jonathan Nieder  wrote:
>
> Elijah Newren wrote:
>
> > Subject: Add missing includes and forward declares
>
> nit: s/declares/declarations/

Thanks.

> This is a huge patch.  Was it autogenerated or generated manually?
> Can the commit message say something about methodology?

Mostly manually.  I had a simple program that would create a dummy.c
file that included git-compat-util.h then exactly one header, compile
it, and spit any compile errors at me.  I repeated that through the
top-level headers.

I didn't want to repeat that description in all 6 patches, since all
six came from that, so I put it in the cover letter.  Since patch #1
has most of the changes, though, I guess it makes sense to include it
in at least that one?

> Is there an easy way to review it?  (Keep in mind that I'm super lazy.
> ;-))

I guess I could send you my hacky python script that loops through the
top-level header files and creates the dummy two-line c file, and you
could inspect it and run it.  But that only verifies that it compiles,
not that the changes I chose are "correct".
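For the curious, the loop can be sketched in a few lines of shell. Everything below is illustrative, not the actual script: the two toy headers stand in for git's top-level headers, and a real run would put `#include "git-compat-util.h"` on the dummy file's first line.

```shell
# Sketch of the "compile each header in isolation" check from the thread.
mkdir -p /tmp/hdrcheck && cd /tmp/hdrcheck

# good.h is self-contained; bad.h uses size_t without declaring it.
cat >good.h <<'EOF'
#include <stddef.h>
size_t good_len(const char *s);
EOF

cat >bad.h <<'EOF'
size_t bad_len(const char *s);  /* size_t undeclared: not self-contained */
EOF

for h in good.h bad.h; do
	# The two-line dummy file, minus git-compat-util.h in this
	# standalone sketch.
	printf '#include "%s"\n' "$h" >dummy.c
	if cc -c -I. dummy.c -o /dev/null 2>/dev/null; then
		echo "$h: self-contained"
	else
		echo "$h: NOT self-contained"
	fi
done >results.txt
cat results.txt
```

Running it prints one verdict per header; pointing the loop at a real source tree (and restoring the git-compat-util.h line) reproduces the check described above.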

>
> > Signed-off-by: Elijah Newren 
> > ---
> [...]
> > --- a/alloc.h
> > +++ b/alloc.h
> > @@ -1,9 +1,11 @@
> >  #ifndef ALLOC_H
> >  #define ALLOC_H
> >
> > +struct alloc_state;
> >  struct tree;
> >  struct commit;
> >  struct tag;
> > +struct repository;
> >
> >  void *alloc_blob_node(struct repository *r);
>
> That's reasonable.  Going forward, is there a way to tell if some of
> these forward declarations are no longer needed at some point in the
> future (e.g. can clang be convinced to warn us about it)?

I'm not aware of anything currently; while I could have easily missed
things, projects like
https://github.com/include-what-you-use/include-what-you-use (which
look active and have a July 2018 date on them) make me suspect there
isn't a good answer currently.

> [...]
> > --- a/apply.h
> > +++ b/apply.h
> > @@ -1,6 +1,9 @@
> >  #ifndef APPLY_H
> >  #define APPLY_H
> >
> > +#include "lockfile.h"
> > +#include "string-list.h"
> > +
> >  enum apply_ws_error_action {
>
> Here, to avoid strange behavior, we have to be careful to make sure
> the headers don't #include apply.h back.  It's a pretty high-level
> header so there's no risk of that *phew*.

:-)

> [...]
> > --- a/archive.h
> > +++ b/archive.h
> > @@ -3,6 +3,9 @@
> >
> >  #include "pathspec.h"
> >
> > +struct object_id;
> > +enum object_type;
>
> enums are of unknown size, so forward declarations don't work for
> them.  See bb/pedantic for some examples.

structs are also of unknown size; the size is irrelevant when the
function signature merely uses a pointer to the struct or enum.  The
enum forward declaration fixes a compilation bug.

> enum object_type is defined in cache.h, so should this #include that?

We could, but we don't need the definition; a forward declaration is sufficient.

> [...]
> > --- a/commit-graph.h
> > +++ b/commit-graph.h
> > @@ -4,6 +4,7 @@
> >  #include "git-compat-util.h"
> >  #include "repository.h"
> >  #include "string-list.h"
> > +#include "cache.h"
>
> We can skip the #include of git-compat-util.h since all .c files
> include it.

Good point.  Should I go through and remove all the inclusions of
git-compat-util.h in header files?

> [...]
> > --- a/fsmonitor.h
> > +++ b/fsmonitor.h
> > @@ -1,6 +1,13 @@
> >  #ifndef FSMONITOR_H
> >  #define FSMONITOR_H
> >
> > +#include "cache.h"
> > +#include "dir.h"
> > +
> > +struct cache_entry;
> > +struct index_state;
> > +struct strbuf;
>
> cache_entry et al are defined in cache.h, right?  Are these forward
> decls needed?

Good catch; they can be removed.  I'm pretty sure I added the forward
declarations first, then noticed it wasn't enough, added the cache.h
include, and forgot to clean up.

> [...]
> > --- a/merge-recursive.h
> > +++ b/merge-recursive.h
> > @@ -1,8 +1,10 @@
> >  #ifndef MERGE_RECURSIVE_H
> >  #define MERGE_RECURSIVE_H
> >
> > -#include "unpack-trees.h"
> >  #include "string-list.h"
> > +#include "unpack-trees.h"
>
> just curious, no need to change: where does this reordering come from?

Well, since I was manually editing anyway, I saw these and decided to
alphabetize it since it's a file I deal with a lot.  *shrug*

> [...]
> > --- a/pathspec.h
> > +++ b/pathspec.h
> > @@ -1,6 +1,11 @@
> >  #ifndef PATHSPEC_H
> >  #define PATHSPEC_H
> >
> > +#include "string.h"
> > +#include "strings.h"
>
> What are these headers?

The original patch[1] had explanations of why I added them:

+#include "string.h"   /* For str[n]cmp */
+#include "strings.h"  /* For str[n]casecmp */

But Peff requested that I remove the comments.

[1] https://public-inbox.org/git/20180811043218.31456-2-new...@gmail.com/

> The rest looks good.
>
> Thanks and hope that helps,
> Jonathan

Thanks for taking a look!


Re: Syncing HEAD

2018-08-14 Thread Christian Couder
On Tue, Aug 14, 2018 at 11:47 PM, Jeff King  wrote:
> On Tue, Aug 14, 2018 at 05:06:16PM -0400, Jeff King wrote:
>
>> On Tue, Aug 14, 2018 at 10:09:37PM +0200, Christian Couder wrote:
>>
>> > When cloning with --mirror, the clone gets its HEAD initialized with
>> > the value HEAD has in its origin remote. After that if HEAD changes in
>> > origin there is no simple way to sync HEAD at the same time as the
>> > refs are synced.
>> >
>> > It looks like the simplest way to sync HEAD is:
>> >
>> > 1) git remote show origin
>> > 2) parse "HEAD branch: XXX" from the output of the above command
>> > 3) git symbolic-ref HEAD refs/heads/XXX
>>
>> How about:
>>
>>   git remote set-head origin -a
>>
>> ?
>
> Reading your message again, I see you actually care less about the
> refs/remote placeholder and more about the actual HEAD in a bare repo.

Yeah, I am interested in updating the actual HEAD in a bare repo.

> In which case "git remote" isn't going to help, though its underlying
> code has the algorithm you would want.

Ok, I will take a look at the algorithm.
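For completeness, the three manual steps quoted above can be exercised end-to-end against a throwaway local repository (paths and the throwaway identity below are illustrative):

```shell
set -e
d=/tmp/headsync && rm -rf "$d" && mkdir -p "$d" && cd "$d"

git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m init
git -C upstream branch other
git clone -q --mirror upstream mirror.git

# Upstream switches its default branch; the mirror's HEAD goes stale,
# because fetch updates refs but not HEAD.
git -C upstream symbolic-ref HEAD refs/heads/other
git -C mirror.git fetch -q origin

# Steps 1-3: parse "HEAD branch: XXX" and point HEAD at refs/heads/XXX.
b=$(git -C mirror.git remote show origin | sed -n 's/ *HEAD branch: //p')
git -C mirror.git symbolic-ref HEAD "refs/heads/$b"
git -C mirror.git symbolic-ref HEAD
```

The final command reports refs/heads/other, showing the bare repo's HEAD is back in sync with its origin.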

>> One tricky thing is that the name "refs/remotes//HEAD" is only
>> special by convention, and that convention is known on the writing side
>> only by git-clone and git-remote. So obviously:
>
> And so here the convention is simpler, because we're talking about the
> main HEAD. But we still have to know if you want to do that, and not update
> some refs/remotes/ symref in a bare repo.

We could maybe look at the "remote.XXX.mirror" config option. If it is
set to "true", we could interpret that as meaning we are interested in
updating the main HEAD and not some refs/remotes/ symref.

> So all of this really implies to me that you want to be able to say
> "take this symref on the other side and update this one on the local
> side". I.e., some way to tell a refspec "don't update the value, update
> the symref destination". So imagine we made "~" the magic character for
> "just the symrefs" (I picked that because it's not allowed in a
> refname).
>
> Then you could do what you want with:
>
>   git config --add remote.origin.fetch ~HEAD:HEAD
>
> and these two would be the same:
>
>   git remote set-head origin -a
>   git fetch origin ~HEAD:refs/remotes/origin/HEAD
>
> And it would allow more exotic things, too, like:
>
>   # always update the remote notion of HEAD on every fetch
>   git config --add remote.origin.fetch ~HEAD:refs/remotes/origin/HEAD
>
>   # update a non-HEAD symref we track for our own purposes
>   git fetch origin ~refs/tags/LATEST:refs/tags/LATEST
>
>   # or the same thing but using the usual refspec "dst defaults to src"
>   # rule and dwim lookup magic
>   git fetch origin ~LATEST

And `git fetch origin ~HEAD` would sync the main HEAD?

Yeah, that looks like an interesting solution to the problem.

I wonder though if we should restrict the way `git fetch origin ~XXX`
searches the .git/ directory itself.

> In protocol v0 we don't get symref reports from the other side over the
> git protocol (except for HEAD), but we could use the same logic we use
> for determining HEAD for older versions of Git: find a ref that points
> to the same tip. Though I would say that unlike the existing code in
> guess_remote_head(), we'd probably want to treat an ambiguity as an
> error, and not just default to refs/heads/master.

I wonder what `git fetch origin ~refs/heads/*:refs/heads/*` should do.
Could it know which refs are symrefs using protocol v0? Should it
guess that refs with uppercase names are symrefs? Should we allow '*'
at all in those kinds of refspecs?

It looks like making "~" the magic character for "just the symrefs"
might be a good solution in the end, though we might want to restrict
it to protocol v2.
So perhaps something like `git fetch --update-head` that you suggest
in another email would be a good solution for now and for protocol v0.


Re: [PATCH 0/9] Add missing includes and forward declares

2018-08-14 Thread Junio C Hamano
Elijah Newren  writes:

> On Mon, Aug 13, 2018 at 11:24 AM Junio C Hamano  wrote:
>> Jeff King  writes:
>
>> As things are slowly moving out of the so-far kitchen-sink "cache.h"
>> into more specific subsystem headers (like object-store.h), we may
>> actually want to tighten the "header that includes it first" part a
>> bit in the future, so that 'git grep cache.h' would give us a more
>> explicit and a better picture of what really depends on knowing what
>> the lowest level plumbing API are built around.
>>
>> > So I think the better test is a two-line .c file with:
>> >
>> >   #include "git-compat-util.h"
>> >   #include $header_to_check
>>
>> But until that tightening happens, I do not actually mind the
>> two-line .c file started with inclusion of cache.h instead of
>> git-compat-util.h.  That would limit the scope of this series
>> further.
>
> Yes, this removes about 2/3 of patch #1.

Sorry for making a misleading comment.  I should have phrased "I
would not have minded if the series were looser by assuming
cache.h", implying that "but now the actual patch went extra mile to
be more complete, what we have is even better ;-)".



Re: [PATCHv3 6/6] Add missing includes and forward declares

2018-08-14 Thread Jonathan Nieder
Hi,

Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  bisect.h   | 2 ++
>  pack-objects.h | 1 +
>  2 files changed, 3 insertions(+)

Do you have more context about why these are in a separate commit?

For pack-objects.h, I think you might be making a fix to the
cc/delta-islands topic.  But for bisect.h, I don't see any recent
relevant changes.

Curious,
Jonathan


Re: [PATCHv3 5/6] compat/precompose_utf8.h: use more common include guard style

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  compat/precompose_utf8.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

The more unusual style is less likely to be recognized by compilers, so
they may waste some I/O re-reading the header at compile time.

> --- a/compat/precompose_utf8.h
> +++ b/compat/precompose_utf8.h
> @@ -1,4 +1,6 @@
>  #ifndef PRECOMPOSE_UNICODE_H
> +#define PRECOMPOSE_UNICODE_H
> +
>  #include 
>  #include 
>  #include 

Not about this patch: these headers are #include-d in git-compat-util.h
which we assume to be already included first before anything else.  Can
we remove these redundant #includes?


Re: [PATCHv3 4/6] urlmatch.h: fix include guard

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  urlmatch.h | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Jonathan Nieder 

This has two benefits:

- prevents the definitions in this header from being evaluated twice

- many compilers recognize the #include guard idiom and are able to
  avoid re-reading the text of the header, too

The header guard had been broken since day one.  Is there a tool that
can detect this class of problem?


Re: [PATCHv3 3/6] Move definition of enum branch_track from cache.h to branch.h

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> 'branch_track' feels more closely related to branching, and it is
> needed later in branch.h; rather than #include'ing cache.h in branch.h
> for this small enum, just move the enum and the external declaration
> for git_branch_track to branch.h.
>
> Signed-off-by: Elijah Newren 
> ---
>  branch.h  | 11 +++
>  cache.h   | 10 --
>  config.c  |  1 +
>  environment.c |  1 +
>  4 files changed, 13 insertions(+), 10 deletions(-)

\o/

Reviewed-by: Jonathan Nieder 


Re: [PATCHv3 2/6] alloc: make allocate_alloc_state and clear_alloc_state more consistent

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> Since both functions are using the same data type, they should either both
> refer to it as void *, or both use the real type (struct alloc_state *).
> Opt for the latter.
> 
> Signed-off-by: Elijah Newren 
> ---
>  alloc.c | 2 +-
>  alloc.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

This API comes from 14ba97f8 (alloc: allow arbitrary repositories for
alloc functions, 2018-05-15).  The pointer returned by
allocate_alloc_state points to an initialized, usable object so
returning a struct instead of a void pointer is the right thing to do.
Thanks for noticing and fixing it.

That commit in turn was probably inspired by v1.5.2-rc0~16^2 (Clean up
object creation to use more common code, 2007-04-16), which was
following the same convention (void * for raw memory, struct blob *
for initialized object).

Reviewed-by: Jonathan Nieder 


Re: [PATCHv3 1/6] Add missing includes and forward declares

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> Subject: Add missing includes and forward declares

nit: s/declares/declarations/

This is a huge patch.  Was it autogenerated or generated manually?
Can the commit message say something about methodology?

Is there an easy way to review it?  (Keep in mind that I'm super lazy.
;-))

> Signed-off-by: Elijah Newren 
> ---
[...]
> --- a/alloc.h
> +++ b/alloc.h
> @@ -1,9 +1,11 @@
>  #ifndef ALLOC_H
>  #define ALLOC_H
>  
> +struct alloc_state;
>  struct tree;
>  struct commit;
>  struct tag;
> +struct repository;
>  
>  void *alloc_blob_node(struct repository *r);

That's reasonable.  Going forward, is there a way to tell if some of
these forward declarations are no longer needed at some point in the
future (e.g. can clang be convinced to warn us about it)?

[...]
> --- a/apply.h
> +++ b/apply.h
> @@ -1,6 +1,9 @@
>  #ifndef APPLY_H
>  #define APPLY_H
>  
> +#include "lockfile.h"
> +#include "string-list.h"
> +
>  enum apply_ws_error_action {

Here, to avoid strange behavior, we have to be careful to make sure
the headers don't #include apply.h back.  It's a pretty high-level
header so there's no risk of that *phew*.

[...]
> --- a/archive.h
> +++ b/archive.h
> @@ -3,6 +3,9 @@
>  
>  #include "pathspec.h"
>  
> +struct object_id;
> +enum object_type;

enums are of unknown size, so forward declarations don't work for
them.  See bb/pedantic for some examples.

enum object_type is defined in cache.h, so should this #include that?

[...]
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -4,6 +4,7 @@
>  #include "git-compat-util.h"
>  #include "repository.h"
>  #include "string-list.h"
> +#include "cache.h"

We can skip the #include of git-compat-util.h since all .c files
include it.

[...]
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -1,6 +1,13 @@
>  #ifndef FSMONITOR_H
>  #define FSMONITOR_H
>  
> +#include "cache.h"
> +#include "dir.h"
> +
> +struct cache_entry;
> +struct index_state;
> +struct strbuf;

cache_entry et al are defined in cache.h, right?  Are these forward
decls needed?

[...]
> --- a/merge-recursive.h
> +++ b/merge-recursive.h
> @@ -1,8 +1,10 @@
>  #ifndef MERGE_RECURSIVE_H
>  #define MERGE_RECURSIVE_H
>  
> -#include "unpack-trees.h"
>  #include "string-list.h"
> +#include "unpack-trees.h"

just curious, no need to change: where does this reordering come from?

[...]
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -1,6 +1,11 @@
>  #ifndef PATHSPEC_H
>  #define PATHSPEC_H
>  
> +#include "string.h"
> +#include "strings.h"

What are these headers?

The rest looks good.

Thanks and hope that helps,
Jonathan


Re: [PATCH 0/9] Add missing includes and forward declares

2018-08-14 Thread Jonathan Nieder
Jeff King wrote:

> Subject: [PATCH] test-tool.h: include git-compat-util.h
>
> The test-tool programs include "test-tool.h" as their first
> include, which breaks our CodingGuideline of "the first
> include must be git-compat-util.h or an equivalent". This
> isn't actually a problem, as test-tool.h doesn't define
> anything tricky, but we should probably follow our own rule.
>
> Rather than change them all, let's instead make test-tool.h
> one of those equivalents, just like we do for builtin.h
> (which many of the actual git builtins include first).

I wonder if it would not be simpler to change them all.  It would be
one less special case.

That said,

> Signed-off-by: Jeff King 
> ---
>  t/helper/test-tool.h | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH 0/9] Add missing includes and forward declares

2018-08-14 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Aug 13, 2018 at 11:24:37AM -0700, Junio C Hamano wrote:
>> Jeff King wrote:

>>> So I think the better test is a two-line .c file with:
>>>
>>>   #include "git-compat-util.h"
>>>   #include $header_to_check
>>
>> But until that tightening happens, I do not actually mind the
>> two-line .c file started with inclusion of cache.h instead of
>> git-compat-util.h.  That would limit the scope of this series
>> further.
>
> I can go either way on this. Using cache.h makes Elijah's current series
> a bit more focused. But I think we'd eventually want to go there anyway.
> I don't have a strong opinion on doing it now or later.

For what it's worth, I'd been assuming that any header that is not
self-contained after a #include of "git-compat-util.h" is a bug.

It's true that many .c files include git-compat-util.h via cache.h,
making such a bug harder to notice.  But the moment you write a source
file that doesn't include cache.h you'd run into that problem, so it's
worth fixing ahead-of-time anyway.

Thanks,
Jonathan


Re: [PATCH 0/9] Add missing includes and forward declares

2018-08-14 Thread Jeff King
On Mon, Aug 13, 2018 at 11:24:37AM -0700, Junio C Hamano wrote:

> As things are slowly moving out of the so-far kitchen-sink "cache.h"
> into more specific subsystem headers (like object-store.h), we may
> actually want to tighten the "header that includes it first" part a
> bit in the future, so that 'git grep cache.h' would give us a more
> explicit and a better picture of what really depends on knowing what
> the lowest level plumbing API are built around.

Yeah, I agree that's a good long-term goal. I think "builtin.h" is
reasonable to remain as a magic header for builtins.

> > So I think the better test is a two-line .c file with:
> >
> >   #include "git-compat-util.h"
> >   #include $header_to_check
> 
> But until that tightening happens, I do not actually mind the
> two-line .c file started with inclusion of cache.h instead of
> git-compat-util.h.  That would limit the scope of this series
> further.

I can go either way on this. Using cache.h makes Elijah's current series
a bit more focused. But I think we'd eventually want to go there anyway.
I don't have a strong opinion on doing it now or later.

> > I wonder if there's an easy way to check. I guess adding '-Dint=foo' to
> > the command line, and then putting '#undef int' at the top of
> > git-compat-util would catch just about any code the compiler sees that
> > doesn't have git-compat-util included. :)
> 
> ;-).

So, much to my amazement, my off-the-cuff suggestion actually worked
pretty well. The only failures were xdiff (expected, since it does its
own system-header stuff, though IMHO we might think about changing that)
and the programs in t/helper. Here's a patch to address the latter (and
you can see now why I said the above thing about "builtin.h"):

-- >8 --
Subject: [PATCH] test-tool.h: include git-compat-util.h

The test-tool programs include "test-tool.h" as their first
include, which breaks our CodingGuideline of "the first
include must be git-compat-util.h or an equivalent". This
isn't actually a problem, as test-tool.h doesn't define
anything tricky, but we should probably follow our own rule.

Rather than change them all, let's instead make test-tool.h
one of those equivalents, just like we do for builtin.h
(which many of the actual git builtins include first).

Signed-off-by: Jeff King 
---
 t/helper/test-tool.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 80cbcf0857..75d7c782f0 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -1,6 +1,8 @@
 #ifndef __TEST_TOOL_H__
 #define __TEST_TOOL_H__
 
+#include "git-compat-util.h"
+
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
-- 
2.18.0.1070.g4763fa3c01



Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 04:55:34PM -0700, Matthew DeVore wrote:

> >   - it's expensive to compute, because you have to actually walk all of
> > the possible commits and trees that could refer to it. This
> > prohibits a lot of other optimizations like reachability bitmaps
> > (though with some complexity you could cache the depths, too).
> I think what the user likely wants is to use the minimum depth based
> on the commits in the traversal, not every commit in the repo - is
> this what you mean?

Right, I'd agree they probably want the minimum for that traversal. And
for `rev-list --filter`, that's probably OK. But keep in mind the main
goal for --filter is using it for fetches, and many servers do not
perform the traversal at all. Instead they use reachability bitmaps to
come up with the set of objects to send. The bitmaps have enough
information to say "remove all trees from the set", but not enough to do
any kind of depth-based calculation (not even "is this a root tree").

> Makes sense. I changed it like this -
> 
> diff --git a/Documentation/rev-list-options.txt
> b/Documentation/rev-list-options.txt
> index 0b5f77ad3..5f1672913 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -732,8 +732,10 @@ the requested refs.
>  The form '--filter=sparse:path=' similarly uses a sparse-checkout
>  specification contained in .
>  +
> -The form '--filter=tree:' omits all blobs and trees deeper than
> - from the root tree. Currently, only =0 is supported.
> +The form '--filter=tree:' omits all blobs and trees whose depth
> +from the root tree is >=  (minimum depth if an object is located
> +at multiple depths in the commits traversed). Currently, only =0
> +is supported, which omits all blobs and trees.

Yes, I think that makes sense. Thanks!

-Peff
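The agreed-upon semantics are easy to see in a scratch repository, assuming a git new enough to know the tree:0 filter (the paths and throwaway identity below are illustrative):

```shell
set -e
d=/tmp/treefilter && rm -rf "$d" && git init -q "$d" && cd "$d"

git -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m one
echo hi >file && git add file
git -c user.name=t -c user.email=t@example.com commit -qm two

# tree:0 omits every tree and blob: only the two commits remain.
git rev-list --objects --filter=tree:0 HEAD | wc -l

# The unfiltered traversal also lists the root trees and the blob.
git rev-list --objects HEAD | wc -l
```

The filtered count is exactly the number of commits, while the unfiltered one additionally includes each distinct tree and blob reachable from them.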


Re: [PATCH 1/2] store submodule in common dir

2018-08-14 Thread Jonathan Nieder
Junio C Hamano wrote:

> Theoretically we should be able to make modules/kernel%2fv2.[24]
> additional "worktree"s of modules/kernel%2fv2.6, but given that
> these are all "bare" repositories without an attached working tree,
> I am not sure how that is supposed to work.  Thinking about
> having multiple worktrees on a single bare repository makes me head
> spin and ache X-<;-)

Agreed about spinning head.  This is why I suggested at [1] that
anyone interested in this start with a description of their proposed
design, which would have three benefits:

- after implementation, it would document the intent behind whatever
  we choose to do, hopefully saving people debugging or improving this
  code some head spinning

- it would allow subject matter experts on-list to suggest refinements
  and simplifications

- it would avoid the interested contributor going too far down a blind
  alley in case their initial proposal has a fatal flaw

I also agree with the "theoretically we should be able to make it
work".  As described in [1], I think most of this is work we're going
to have to do eventually, as part of properly supporting multiple
worktrees for a superproject.  But I don't want to set wrong
expectations: this will be hard.

Thanks,
Jonathan

[1] 
https://public-inbox.org/git/20180814231049.gh142...@aiede.svl.corp.google.com/


[PATCH v5 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but that is inaccurate because
it suggests that annotated tags are omitted, when actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
 Documentation/rev-list-options.txt |  5 +++
 list-objects-filter-options.c  |  4 +++
 list-objects-filter-options.h  |  1 +
 list-objects-filter.c  | 50 ++
 t/t5317-pack-objects-filter-objects.sh | 28 +++
 t/t5616-partial-clone.sh   | 38 
 t/t6112-rev-list-filters-objects.sh| 12 +++
 7 files changed, 138 insertions(+)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=' similarly uses a sparse-checkout
 specification contained in .
++
+The form '--filter=tree:' omits all blobs and trees whose depth
+from the root tree is >=  (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only =0
+is supported, which omits all blobs and trees.
 
 --no-filter::
Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
+   } else if (!strcmp(arg, "tree:0")) {
+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   die("unknown filter_situation");
+   return LOFR_ZERO;
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}
+
+static void* filter_trees_none__init(
+   struct oidset *omitted,
+   struct list_objects_filter_options *filter_options,
+   filter_object_fn *filter_fn,
+   filter_free_fn *filter_free_fn)
+{
+   struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+   d->omits = omitted;
+
+   *filter_fn = filter_trees_none;
+   *filter_free_fn = free;
+   return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
NULL,
filter_blobs_none__init,
filter_blobs_limit__init,
+   filter_trees_none__init,
filter_sparse_oid__init,

[PATCH v5 3/6] list-objects: always parse trees gently

2018-08-14 Thread Matthew DeVore
If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   if (parse_tree_gently(tree, 1) < 0) {
if (revs->ignore_missing_links)
return;
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



[PATCH v5 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Matthew DeVore
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 11 ---
 list-objects.c | 11 +--
 revision.h | 15 +
 t/t0410-partial-clone.sh   | 45 ++
 t/t5317-pack-objects-filter-objects.sh | 13 
 t/t6112-rev-list-filters-objects.sh| 17 ++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(&obj->oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
if (is_promisor_object(&obj->oid))
return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(&obj->oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+   if (!has_object_file(&obj->oid)) {
finish_object__ma(obj);
return 1;
}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
init_revisions(&revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+   int failed_parse;
 
if (!revs->tree_objects)
return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, 1) < 0) {
+
+   failed_parse = parse_tree_gently(tree, 1);
+   if (failed_parse) {
if (revs->ignore_missing_links)
return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
is_promisor_object(&obj->oid))
return;
 
-   die("bad tree object %s", oid_to_hex(&obj->oid));
+   if (!revs->do_not_die_on_missing_tree)
+   die("bad tree object %s", oid_to_hex(&obj->oid));
}
 
strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   if (!failed_parse)
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index c599c34da..51189 100644
--- a/revision.h
+++ b/revision.h
@@ -125,6 +125,21 @@ struct rev_info {
line_level_traverse:1,
tree_blobs_in_commit_order:1,
 
+   /*
+* Blobs are shown without regard for their existence.
+* But not so for trees: unless exclude_promisor_objects
+* is set and the tree in question is a promisor object;
+* OR ignore_missing_links is set, the revision walker
+* dies with a "bad tree object HASH" message when
+* encountering a missing tree. For callers that can
+* handle missing trees 

[PATCH v5 0/6] filter: support for excluding all trees and blobs

2018-08-14 Thread Matthew DeVore
Please take a look. I believe I have applied or responded to all suggestions
since the last iteration.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt |   5 +
 builtin/rev-list.c |  11 +-
 list-objects-filter-options.c  |   4 +
 list-objects-filter-options.h  |   1 +
 list-objects-filter.c  |  50 ++
 list-objects.c | 232 +
 revision.c |   1 -
 revision.h |  25 ++-
 t/t0410-partial-clone.sh   |  45 +
 t/t5317-pack-objects-filter-objects.sh |  41 +
 t/t5616-partial-clone.sh   |  38 
 t/t6112-rev-list-filters-objects.sh|  29 
 12 files changed, 364 insertions(+), 118 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog



[PATCH v5 1/6] list-objects: store common func args in struct

2018-08-14 Thread Matthew DeVore
This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 158 +++--
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+   struct rev_info *revs;
+   show_object_fn show_object;
+   show_commit_fn show_commit;
+   void *show_data;
+   filter_object_fn filter_fn;
+   void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 struct blob *blob,
-show_object_fn show,
 struct strbuf *path,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &blob->object;
size_t pathlen;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-   if (!revs->blob_objects)
+   if (!ctx->revs->blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 * may cause the actual filter to report an incomplete list
 * of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
+   if (ctx->revs->exclude_promisor_objects &&
!has_object_file(&obj->oid) &&
is_promisor_object(&obj->oid))
return;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BLOB, obj,
- path->buf, &path->buf[pathlen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BLOB, obj,
+  path->buf, &path->buf[pathlen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, path->buf, cb_data);
+   ctx->show_object(obj, path->buf, ctx->show_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
const unsigned char *sha1,
-   show_object_fn show,
struct strbuf *path,
-   const char *name,
-   void *cb_data)
+   const char *name)
 {
/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
-show_object_fn show,
 struct strbuf *base,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &tree->object;
+   struct rev_info *revs = ctx->revs;
struct tree_desc desc;
struct name_entry entry;
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BEGIN_TREE, obj,
- base->buf, &base->buf[baselen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+  base->buf, &base->buf[baselen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, base->buf, cb_data);
+   ctx->show_object(obj, base->buf, ctx->show_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
}
 
if (S_ISDIR(entry.mode))
-   process_tree(revs,
+   process_tree(ctx,
 lookup_tree(the_repository, entry.oid),
-   

[PATCH v5 2/6] list-objects: refactor to process_tree_contents

2018-08-14 Thread Matthew DeVore
This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
while (...) {
/* this is very indented */
}
}

we will have:

if (foo)
process_tree_contents(...);

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 68 ++
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+struct tree *tree,
+struct strbuf *base,
+const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+ struct tree *tree,
+ struct strbuf *base)
+{
+   struct tree_desc desc;
+   struct name_entry entry;
+   enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+   all_entries_interesting : entry_not_interesting;
+
+   init_tree_desc(&desc, tree->buffer, tree->size);
+
+   while (tree_entry(&desc, &entry)) {
+   if (match != all_entries_interesting) {
+   match = tree_entry_interesting(&entry, base, 0,
+  &ctx->revs->diffopt.pathspec);
+   if (match == all_entries_not_interesting)
+   break;
+   if (match == entry_not_interesting)
+   continue;
+   }
+
+   if (S_ISDIR(entry.mode))
+   process_tree(ctx,
+lookup_tree(the_repository, entry.oid),
+base, entry.path);
+   else if (S_ISGITLINK(entry.mode))
+   process_gitlink(ctx, entry.oid->hash,
+   base, entry.path);
+   else
+   process_blob(ctx,
+lookup_blob(the_repository, entry.oid),
+base, entry.path);
+   }
+}
+
 static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
struct object *obj = &tree->object;
struct rev_info *revs = ctx->revs;
-   struct tree_desc desc;
-   struct name_entry entry;
-   enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-   all_entries_interesting: entry_not_interesting;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   init_tree_desc(&desc, tree->buffer, tree->size);
-
-   while (tree_entry(&desc, &entry)) {
-   if (match != all_entries_interesting) {
-   match = tree_entry_interesting(&entry, base, 0,
-  &revs->diffopt.pathspec);
-   if (match == all_entries_not_interesting)
-   break;
-   if (match == entry_not_interesting)
-   continue;
-   }
-
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
-   else if (S_ISGITLINK(entry.mode))
-   process_gitlink(ctx, entry.oid->hash, base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
-   }
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.865.gffc8e1a3cd6-goog



[PATCH v5 5/6] revision: mark non-user-given objects instead

2018-08-14 Thread Matthew DeVore
Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 31 ++-
 revision.c |  1 -
 revision.h | 10 +++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BLOB, obj,
   path->buf, &path->buf[pathlen],
   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
continue;
}
 
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
+   if (S_ISDIR(entry.mode)) {
+   struct tree *t = lookup_tree(the_repository, entry.oid);
+   t->object.flags |= NOT_USER_GIVEN;
+   process_tree(ctx, t, base, entry.path);
+   }
else if (S_ISGITLINK(entry.mode))
process_gitlink(ctx, entry.oid->hash,
base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
+   else {
+   struct blob *b = lookup_blob(the_repository, entry.oid);
+   b->object.flags |= NOT_USER_GIVEN;
+   process_blob(ctx, b, base, entry.path);
+   }
}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
if (!failed_parse)
process_tree_contents(ctx, tree, base);
 
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 * an uninteresting boundary commit may not have its tree
 * parsed yet, but we are not going to show them anyway
 */
-   if (get_commit_tree(commit))
-   add_pending_tree(ctx->revs, get_commit_tree(commit));
+   if (get_commit_tree(commit)) {
+   struct tree *tree = get_commit_tree(commit);
+   tree->object.flags |= NOT_USER_GIVEN;
+   add_pending_tree(ctx->revs, tree);
+   }
ctx->show_commit(commit, ctx->show_data);
 
if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
strbuf_release(&buf);
return; /* do not add the commit itself */
}
-   obj->flags |= USER_GIVEN;
add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 51189..2d381e636 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only 

Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 1:01 PM Jeff King  wrote:
>
> On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote:
>
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
>
> I'm OK with this as a name, since we're explicitly not supporting deeper
> depths. But I'd note that "depth" is actually a tricky characteristic,
> as it's not a property of the object itself, but rather who refers to
> it. So:
>
>   - it's expensive to compute, because you have to actually walk all of
> the possible commits and trees that could refer to it. This
> prohibits a lot of other optimizations like reachability bitmaps
> (though with some complexity you could cache the depths, too).
I think what the user likely wants is to use the minimum depth based
on the commits in the traversal, not every commit in the repo - is
this what you mean?
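To make the "minimum depth" notion concrete, here is a small local demonstration (a sketch; the `depth-demo` directory name and identity settings are invented for illustration). The same blob can be reachable at several depths at once, so a depth filter has to pick one depth per object, naturally the minimum:

```shell
# one blob, two depths: f (depth 1) and dir/f (depth 2) have identical content
rm -rf depth-demo
git init -q depth-demo
mkdir depth-demo/dir
echo same >depth-demo/f
echo same >depth-demo/dir/f
git -C depth-demo add .
git -C depth-demo -c user.name=demo -c user.email=demo@example.com commit -qm init
# both paths resolve to the same object id
git -C depth-demo rev-parse HEAD:f HEAD:dir/f
```

Any future `tree:<depth>` semantics beyond 0 would have to define which of those depths counts; the wording being discussed settles on the minimum.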

>
>   - you have to define it as something like "the minimum depth at which
> this object is found", since there may be multiple depths
>
> I think you can read that second definition between the lines of:
>
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
>
> But I wonder if we should be more precise. It doesn't matter now, but it
> may help set expectations if the feature does come later.
>
Makes sense. I changed it like this -

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 0b5f77ad3..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -732,8 +732,10 @@ the requested refs.
The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
specification contained in <path>.
 +
-The form '--filter=tree:<depth>' omits all blobs and trees deeper than
-<depth> from the root tree. Currently, only <depth>=0 is supported.
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.

 --no-filter::
  Turn off any previous `--filter=` argument.
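As a quick sanity check of what the documented behavior means in practice, a local sketch (assumes a git new enough to support `tree:0`; the repo name is invented):

```shell
rm -rf tree0-demo
git init -q tree0-demo
echo hello >tree0-demo/file.txt
git -C tree0-demo add file.txt
git -C tree0-demo -c user.name=demo -c user.email=demo@example.com commit -qm init
# unfiltered listing: the commit, its root tree, and the blob
git -C tree0-demo rev-list --objects HEAD
# with tree:0, trees and blobs are omitted but commits remain
git -C tree0-demo rev-list --objects --filter=tree:0 HEAD
```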


> -Peff


Re: [PATCH 1/2] store submodule in common dir

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 4:20 PM Junio C Hamano  wrote:
>
> Junio C Hamano  writes:
>
> > My understanding of what Joakim wants to do is to have a top-level
> > project that has three subdirectories, e.g. kernel/v2.2, kernel/v2.4
> > and kernel/v2.6, each of which is a submodule that houses these
> > versions of Linux kernel source, but only clone Linus's repository
> > (as the up-to-date tree has all the necessary history to check out
> > these past development tracks).  And that should be doable with
> > just the main checkout, without any additional worktree (it's just
> > the matter of having .git/modules/kernel%2fv2.6/ directory pointed
> > by two symlinks from .git/modules/kernel%2fv2.[24], or something
> > like that).
>
> Actually I take the last part of that back.  When thought naively
> about, it may appear that it should be doable, but because each of
> the modules/* directory in the top-level project has to serve as the
> $GIT_DIR for each submodule checkout, and the desire is to have
> these three directories to have checkout of three different
> branches, a single directory under modules/. that is shared among
> three submodules would *not* work---they must have separate index,
> HEAD, etc.
>
> Theoretically we should be able to make modules/kernel%2fv2.[24]
> additional "worktree"s of modules/kernel%2fv2.6, but given that
> these are all "bare" repositories without an attached working tree,
> I am not sure how that would supposed to work.  Thinking about
> having multiple worktrees on a single bare repository makes me head
> spin and ache X-<;-)

Well the worktree feature would do all this in a safe manner, but the existing
feature of just cloning the submodules with a reference pointer to another
repository at least dedupes some of the object store, although warnings
need to be read carefully.

Regarding the worktree, I would think we'd want to have

  git worktree --recurse-submodules [list, add, etc]

that would do a sensible thing for each action on the worktrees,
but you seem to propose making one of the three submodules
the main worktree, with the other two just worktrees of
the first.

  I guess you could just

* init/update one of the submodules
* add their worktrees manually pointed to where
  the 2nd and 3rd submodule would go.
* make a symbolic link from
  ln -s .git/modules/<1st>/worktrees/<2nd> .git/modules/<2nd>
  ln -s .git/modules/<1st>/worktrees/<3rd> .git/modules/<3rd>

as then submodule operations should "just work" and by having the
worktrees in-place where the submodules are, we'd also have
all the protection against overzealous garbage collection, too.

Stefan


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 1:55 PM Junio C Hamano  wrote:
>
> Jonathan Tan  writes:
>
> >> - grep -E "tree|blob" objs >trees_and_blobs &&
> >> - test_line_count = 1 trees_and_blobs
> >> + grep -E "tree|blob" objs \
> >> + | awk -f print_1.awk >trees_and_blobs &&
> >> + git -C r1 rev-parse HEAD: >expected &&
> >> + test_cmp trees_and_blobs expected
> >
> > Indent "| awk" (and similar lines in this patch) - although I guess it
> > is likely that you actually have it indented, and your e-mail client
> > modified the whitespace so that it looks like there is no indent.
>
> No, wrap lines like this
>
> command1 arg1 arg2 |
> command2 arg1 arg2 &&
>
> That way, you do not need backslash to continue line.
>
> Also think twice when you are seeing yourself piping output from
> "grep" to more powerful tools like "perl", "awk" or "sed".  Often
> you can lose the upstream "grep".

Thank you. I changed it to this:
  awk -e "/tree|blob/{print \$1}" objs >trees_and_blobs

About the line wrapping strategy, the files I've edited all did it
with the \ and the | at the start of subsequent lines - so I made my
code match that style. Otherwise I would have liked to use the style
you suggest...
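For what it's worth, the two styles discussed above can be compared on a toy input (the file names and contents are invented for illustration); both produce the same output, with the second dropping the upstream grep as Junio suggests:

```shell
# toy stand-in for the test's "objs" file
printf '%s\n' '0001 tree x' '0002 blob y' '0003 commit' >objs
# grep piped into awk, wrapped Junio-style with the pipe at end of line:
grep -E "tree|blob" objs |
awk '{print $1}' >out_pipe
# letting awk do the matching itself, no grep needed:
awk '/tree|blob/ {print $1}' objs >out_awk
diff out_pipe out_awk
```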


Re: [PATCH 1/2] store submodule in common dir

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 4:04 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > Signed-off-by: Stefan Beller 
> > ---
> >  path.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > On Tue, Aug 14, 2018 at 3:27 PM Joakim Tjernlund 
> >  wrote:
> >>
> >> I am trying to create 3 submodules from the same git repo, each pointing 
> >> to a different branch.
> >> Since the repo is somewhat large, I don't want the 3 submodules to clone 
> >> the same repo 3
> >> times, I want one clone and then have the 3 submodules to point to 
> >> different commits.
> >>
> >> Is this possible? If not, could it be added?
> >
> > yup.
> >
> > According to recent discussions, it would be just this patch.
> > (plus some unspecified amount of work, TBD).
> >
> > I thought about proposing something proper later, but here is the WIP patch.
> >
> > Thanks,
> > Stefan
>
> My understanding of what Joakim wants to do is to have a top-level
> project that has three subdirectories, e.g. kernel/v2.2, kernel/v2.4
> and kernel/v2.6, each of which is a submodule that houses these
> versions of Linux kernel source, but only clone Linus's repository
> (as the up-to-date tree has all the necessary history to check out
> these past development tracks).  And that should be doable with
> just the main checkout, without any additional worktree (it's just
> the matter of having .git/modules/kernel%2fv2.6/ directory pointed
> by two symlinks from .git/modules/kernel%2fv2.[24], or something
> like that).

Ah! I misunderstood due to fast reading.

For that I think you are interested in the feature added in d92a39590d1
(Add --reference option to git submodule., 2009-05-04), i.e.
both the update and add command take the --reference flag
that can be pointed at another repository such as an outside
clone of these three submodules, so some deduplication will
be performed.
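A minimal sketch of the `--reference` mechanism mentioned above, using `git clone --reference`, which is the same machinery that `git submodule add/update --reference` drives (all paths invented). The second clone borrows objects from the mirror via an alternates entry instead of copying them:

```shell
rm -rf ref-src ref-mirror ref-clone
git init -q ref-src
echo x >ref-src/f
git -C ref-src add f
git -C ref-src -c user.name=demo -c user.email=demo@example.com commit -qm init
# a local mirror to borrow objects from
git clone -q ref-src ref-mirror
# clone again, referencing the mirror's object store
git clone -q --reference "$PWD/ref-mirror" ref-src ref-clone
# the borrowed object store is recorded here:
cat ref-clone/.git/objects/info/alternates
```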

> Isn't "common_dir" stuff used to decide if each of separate
> "worktree" instance (of the superproject) shares ".git/$stuff"
> with each other?
>
> Unless I am grossly misinterpreting the original question, I fail to
> see how changing .git/modules to be shared across worktrees possibly
> affects anything.  I am puzzled...

I did misunderstand grossly.

Stefan


Re: [PATCH 1/2] store submodule in common dir

2018-08-14 Thread Junio C Hamano
Junio C Hamano  writes:

> My understanding of what Joakim wants to do is to have a top-level
> project that has three subdirectories, e.g. kernel/v2.2, kernel/v2.4
> and kernel/v2.6, each of which is a submodule that houses these
> versions of Linux kernel source, but only clone Linus's repository
> (as the up-to-late tree has all the necessary history to check out
> these past development tracks).  And that should be doable with
> just the main checkout, without any additional worktree (it's just
> the matter of having .git/modules/kernel%2fv2.6/ directory pointed
> by two symlinks from .git/modules/kernel%2fv2.[24], or something
> like that).

Actually I take the last part of that back.  When thought naively
about, it may appear that it should be doable, but because each of
the modules/* directory in the top-level project has to serve as the
$GIT_DIR for each submodule checkout, and the desire is to have
these three directories to have checkout of three different
branches, a single directory under modules/. that is shared among
three submodules would *not* work---they must have separate index,
HEAD, etc.

Theoretically we should be able to make modules/kernel%2fv2.[24]
additional "worktree"s of modules/kernel%2fv2.6, but given that
these are all "bare" repositories without an attached working tree,
I am not sure how that is supposed to work.  Thinking about
having multiple worktrees on a single bare repository makes my head
spin and ache X-<;-)


Re: [PATCH] partial-clone: render design doc using asciidoc

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 4:12 PM Jonathan Nieder  wrote:
>
> Hi,
>
> Stefan Beller wrote:
> > On Tue, Aug 14, 2018 at 3:28 PM Jonathan Nieder  wrote:
>
> >> repack in GC has been updated to not touch promisor
> >> packfiles at all, and to only repack other objects.
> >
> > We'd need to adapt this documentation in Jonathans series?
>
> Yes, or in a separate patch.
>
> >> -[0] https://bugs.chromium.org/p/git/issues/detail?id=2
> >> -Chromium work item for: Partial Clone
> >> +[0] https://crbug.com/git/2
> >> +Bug#2: Partial Clone
> >
> > This is more than a formatting fix (I did not quite check the rest,
> > but this stood out), but some change to the actual content?
>
> "git show --word-diff" tells me it's the only place that touched
> wording.  If you'd like, I can send it as a separate, preparatory
> patch.

--word-diff is even nicer than -w which is fine for all except the links.

If you are inclined to resend, please do separate, but I do not
insist on it.

Thanks,
Stefan


Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> > So we don't want to die in list-objects.c. If we
> > fail to fetch, then we will die on line 213 in rev-list.c.
> 
> Why don't we want to die in list-objects.c? When --missing=error is
> passed, fetch_if_missing retains its default value of 1, so
> parse_tree_gently() will attempt to fetch it - and if it fails, I think
> it's appropriate to die in list-objects.c (and this should be the
> current behavior). On other values, e.g. --missing=allow-any, there is
> no autofetch (since fetch_if_missing is 0), so it is correct not to die
> in list-objects.c.

After some in-office discussion, I should have checked line 213 in
builtin/rev-list.c more thoroughly. Indeed it is OK not to die in
list-objects.c here, since builtin/rev-list.c already knows how to
handle missing objects in the --missing=error circumstance.


Re: [PATCH] partial-clone: render design doc using asciidoc

2018-08-14 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 3:28 PM Jonathan Nieder  wrote:

>> repack in GC has been updated to not touch promisor
>> packfiles at all, and to only repack other objects.
>
> We'd need to adapt this documentation in Jonathans series?

Yes, or in a separate patch.

>> -[0] https://bugs.chromium.org/p/git/issues/detail?id=2
>> -Chromium work item for: Partial Clone
>> +[0] https://crbug.com/git/2
>> +Bug#2: Partial Clone
>
> This is more than a formatting fix (I did not quite check the rest,
> but this stood out), but some change to the actual content?

"git show --word-diff" tells me it's the only place that touched
wording.  If you'd like, I can send it as a separate, preparatory
patch.

Thanks,
Jonathan


Re: git submodule: 3 modules same git repo?

2018-08-14 Thread Jonathan Nieder
Hi,

Joakim Tjernlund wrote:

> I am trying to create 3 submodules from the same git repo, each pointing to a 
> different branch.
> Since the repo is somewhat large, I don't want the 3 submodules to clone the 
> same repo 3
> times, I want one clone and then have the 3 submodules to point to different 
> commits.
>
> Is this possible? If not, could it be added?

It's an interesting case.  You're not the only one that has wanted
something like this: for example, "repo" (a similar tool with some
differences) included the change [1] (repo: Support multiple branches
for the same project, 2014-01-30) for this kind of application.

In practice, it's a bit messy.  To allow switching to a branch where
the submodule is not present, we store the submodule .git directory
in the superproject's .git/modules/ directory.  This is
an ordinary .git directory, with files like

HEAD
config
packed-refs
[etc]

The "git worktree" tool allows having multiple worktrees for a single
Git repository.  Each worktree has its own HEAD.  So the layout would
look something like

.git/modules/<name>/
config
packed-refs
[etc]
worktrees/
<worktree1>/
HEAD
<worktree2>/
HEAD

By piggy-backing on the "git worktree" feature, this should work
great, but it will take some work to teach Git to set it up.
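A small sketch of the per-worktree HEAD that makes this layout work (paths invented). Each worktree added with `git worktree` gets its own HEAD under the repository's `.git/worktrees/<id>/` directory:

```shell
rm -rf wt-main wt-side
git init -q wt-main
echo x >wt-main/f
git -C wt-main add f
git -C wt-main -c user.name=demo -c user.email=demo@example.com commit -qm init
# add a second worktree on its own branch; it gets a private HEAD
git -C wt-main worktree add ../wt-side -b side
# the per-worktree administrative files, including HEAD:
ls wt-main/.git/worktrees/wt-side
```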

A side note: as Stefan (cc-ed) mentioned, there is another, related
interaction between submodules and "git worktree".  Namely: I might want
to have multiple worktrees for a single superproject Git repository.
In that case, the layout might look something like

.git/
HEAD
config
packed-refs
[etc]
modules/
<name>/
HEAD
config
packed-refs
...
worktrees/
<worktree1>/
HEAD
<worktree2>/
HEAD

The patch that Stefan sent heads in this direction, but it has a
problem: if the submodule is checked out in both worktrees, then they
cannot have the same HEAD.  So to support that goal well would also
require supporting the goal you've described as a side effect: each
submodule would need to have multiple worktrees, at least one per
worktree of the superproject.

Sorry, that got a bit contorted after a while.  If you'd be interested
in pursuing this, I'd be happy to review any thoughts you have (for
example if you write a brief design doc).

My experience from interacting with the feature [1] is that it is easy
to make mistakes when trying to support this kind of case.  (repo had
some bugs due to, for example, multiple checkouts of a repository
trying to run "git prune" at the same time.)  The "git worktree"
feature should be a good foundation to build on that avoids some of
the problems encountered there.

Thanks,
Jonathan

[1] https://gerrit-review.googlesource.com/c/git-repo/+/50715/


Re: [PATCH 1/2] store submodule in common dir

2018-08-14 Thread Junio C Hamano
Stefan Beller  writes:

> Signed-off-by: Stefan Beller 
> ---
>  path.c | 1 +
>  1 file changed, 1 insertion(+)
>  
> On Tue, Aug 14, 2018 at 3:27 PM Joakim Tjernlund 
>  wrote:
>>
>> I am trying to create 3 submodules from the same git repo, each pointing to 
>> a different branch.
>> Since the repo is somewhat large, I don't want the 3 submodules to clone the 
>> same repo 3
>> times, I want one clone and then have the 3 submodules to point to different 
>> commits.
>>
>> Is this possible? If not, could it be added?
>
> yup.
>
> According to recent discussions, it would be just this patch.
> (plus some unspecified amount of work, TBD).
>
> I thought about proposing something proper later, but here is the WIP patch.
>
> Thanks,
> Stefan 

My understanding of what Joakim wants to do is to have a top-level
project that has three subdirectories, e.g. kernel/v2.2, kernel/v2.4
and kernel/v2.6, each of which is a submodule that houses these
versions of Linux kernel source, but only clone Linus's repository
(as the up-to-date tree has all the necessary history to check out
these past development tracks).  And that should be doable with
just the main checkout, without any additional worktree (it's just
the matter of having .git/modules/kernel%2fv2.6/ directory pointed
by two symlinks from .git/modules/kernel%2fv2.[24], or something
like that).
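The symlink arrangement Junio describes could be sketched like this (a
plain directory stands in for a real submodule gitdir here; the paths
and the symlink approach itself are illustrative assumptions, not a
supported workflow):

```shell
# One real gitdir for the v2.6 submodule; the other two versions point
# at it via symlinks. A plain directory stands in for an actual clone.
mkdir -p gitdir/modules/kernel%2fv2.6
ln -s kernel%2fv2.6 gitdir/modules/kernel%2fv2.2
ln -s kernel%2fv2.6 gitdir/modules/kernel%2fv2.4
ls -l gitdir/modules
```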

Isn't "common_dir" stuff used to decide if each of separate
"worktree" instance (of the superproject) shares ".git/$stuff"
with each other?

Unless I am grossly misinterpreting the original question, I fail to
see how changing .git/modules to be shared across worktrees possibly
affects anything.  I am puzzled...

>
> diff --git a/path.c b/path.c
> index 34f0f98349a..64c9821b834 100644
> --- a/path.c
> +++ b/path.c
> @@ -115,6 +115,7 @@ static struct common_dir common_list[] = {
>   { 1, 1, 1, "logs/HEAD" },
>   { 0, 1, 1, "logs/refs/bisect" },
>   { 0, 1, 0, "lost-found" },
> + { 0, 1, 0, "modules" },
>   { 0, 1, 0, "objects" },
>   { 0, 1, 0, "refs" },
>   { 0, 1, 1, "refs/bisect" },


Re: [PATCH] partial-clone: render design doc using asciidoc

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 3:28 PM Jonathan Nieder  wrote:
>
> Rendered documentation can be easier to read than raw text because
> headings and emphasized phrases stand out.  Add the missing markup and
> Makefile rule required to render this design document using asciidoc.
>
> Tested by running
>
>   make -C Documentation technical/partial-clone.html
>
> and viewing the output in a browser.
>
> Signed-off-by: Jonathan Nieder 
> ---
> Today I noticed that this document wasn't available at
> https://www.kernel.org/pub/software/scm/git/docs/technical/partial-clone.html,
> so I wrote this patch.
>
> Thoughts of all kinds welcome, as always.

yay! Thanks for writing it!

...

> repack in GC has been updated to not touch promisor
> packfiles at all, and to only repack other objects.

We'd need to adapt this documentation in Jonathan's series?



> -[0] https://bugs.chromium.org/p/git/issues/detail?id=2
> -Chromium work item for: Partial Clone
> +[0] https://crbug.com/git/2
> +Bug#2: Partial Clone

This is more than a formatting fix (I did not quite check the rest,
but this stood out), but some change to the actual content?


Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> > > @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const 
> > > char *prefix)
> > >   init_revisions(, prefix);
> > >   revs.abbrev = DEFAULT_ABBREV;
> > >   revs.commit_format = CMIT_FMT_UNSPECIFIED;
> > > + revs.do_not_die_on_missing_tree = 1;
> >
> > Is this correct? I would have expected this to be set only if --missing
> > was set.
> If --missing is not set, then we want to fetch missing objects
> automatically, and then die if we fail to do that, which is what
> happens for blobs.

This is true, and should already be handled. Pay attention to when
fetch_if_missing is set in builtin/rev-list.c.

do_not_die_on_missing_tree should probably be set to 1 whenever
fetch_if_missing is set to 0, I think.

(I acknowledge that the usage of this global variable is confusing, but
I couldn't think of a better way to implement this when I did. Perhaps
when the object store refactoring is done, this can be a store-specific
setting instead of a global variable.)

> So we don't want to die in list-objects.c. If we
> fail to fetch, then we will die on line 213 in rev-list.c.

Why don't we want to die in list-objects.c? When --missing=error is
passed, fetch_if_missing retains its default value of 1, so
parse_tree_gently() will attempt to fetch it - and if it fails, I think
it's appropriate to die in list-objects.c (and this should be the
current behavior). On other values, e.g. --missing=allow-any, there is
no autofetch (since fetch_if_missing is 0), so it is correct not to die
in list-objects.c.

> > As for --missing=allow-promisor, I don't see them being tested anywhere
> > :-( so feel free to make a suggestion. I would put them in t6112 for
> > easy comparison with the other --missing tests.
> Kept my allow-promisor test in t0410 since it requires partial clone
> to be turned on in the config, and because it is pretty similar to
> --exclude-promisor-objects.

OK, sounds good to me.


Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 11:06 AM Jonathan Tan  wrote:
>
> > Previously, we assumed only blob objects could be missing. This patch
> > makes rev-list handle missing trees like missing blobs. A missing tree
> > will cause an error if --missing indicates an error should be caused,
> > and the hash is printed even if the tree is missing.
>
> The last sentence is difficult to understand - probably better to say
> that all --missing= arguments and --exclude-promisor-objects work for
> missing trees like they currently do for blobs (and do not fixate on
> just --missing=error). And also demonstrate this in tests, like in
> t6112.
Fixed the commit message. And for the tests, in t0410 I changed the
--missing=allow-any to --missing-allow-promisor, and in t6112 I added
--missing=allow-any and --missing=print test cases.

>
> > In list-objects.c we no longer print a message to stderr if a tree
> > object is missing (quiet_on_missing is always true). I couldn't find
> > any place where this would matter, or where the caller of
> > traverse_commit_list would need to be fixed to show the error. However,
> > in the future it would be trivial to make the caller show the message if
> > we needed to.
> >
> > This is not tested very thoroughly, since we cannot create promisor
> > objects in tests without using an actual partial clone. t0410 has a
> > promise_and_delete utility function, but the is_promisor_object function
> > does not return 1 for objects deleted in this way. More tests will
> > come in a patch that implements a filter that can be used with git
> > clone.
>
> These two paragraphs are no longer applicable, I think.
Sorry about that. Removed.

>
> > @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const 
> > char *prefix)
> >   init_revisions(, prefix);
> >   revs.abbrev = DEFAULT_ABBREV;
> >   revs.commit_format = CMIT_FMT_UNSPECIFIED;
> > + revs.do_not_die_on_missing_tree = 1;
>
> Is this correct? I would have expected this to be set only if --missing
> was set.
If --missing is not set, then we want to fetch missing objects
automatically, and then die if we fail to do that, which is what
happens for blobs. So we don't want to die in list-objects.c. If we
fail to fetch, then we will die on line 213 in rev-list.c.

>
> > - process_tree_contents(ctx, tree, base);
> > + /*
> > +  * NEEDSWORK: we should not have to process this tree's contents if 
> > the
> > +  * filter wants to exclude all its contents AND the filter doesn't 
> > need
> > +  * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit 
> > which
> > +  * allows skipping all children.
> > +  */
> > + if (parsed)
> > + process_tree_contents(ctx, tree, base);
>
> I agree with Jeff Hostetler in [1] that a LOFR_SKIP_TREE bit is
> desirable, but I don't think that this patch is the right place to
> introduce this NEEDSWORK. For me, this patch is about skipping iterating
> over the contents of a tree because the tree does not exist; this
> NEEDSWORK is about skipping iterating over the contents of a tree
> because we don't want its contents, and it is quite confusing to
> conflate the two.
I've removed this.

>
> [1] 
> https://public-inbox.org/git/d751d56b-84bb-a03d-5f2a-7dbaf8d94...@jeffhostetler.com/
>
> > @@ -124,6 +124,7 @@ struct rev_info {
> >   first_parent_only:1,
> >   line_level_traverse:1,
> >   tree_blobs_in_commit_order:1,
> > + do_not_die_on_missing_tree:1,
>
> I know that the other flags don't have documentation, but I think it's
> worth documenting this one because it is rather complicated. I have
> provided a sample one in my earlier review - feel free to use that or
> come up with your own.
Added your wording to revision.h without major change.

>
> > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> > index 4984ca583..74e3c5767 100755
> > --- a/t/t0410-partial-clone.sh
> > +++ b/t/t0410-partial-clone.sh
> > @@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at 
> > missing and promised commit' '
> >   ! grep $FOO out
> >  '
> >
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > + rm -rf repo &&
> > + test_create_repo repo &&
> > + test_commit -C repo foo &&
> > + test_commit -C repo bar &&
> > + test_commit -C repo baz &&
> > +
> > + TREE=$(git -C repo rev-parse bar^{tree}) &&
> > +
> > + promise_and_delete $TREE &&
> > +
> > + git -C repo config core.repositoryformatversion 1 &&
> > + git -C repo config extensions.partialclone "arbitrary string" &&
> > + git -C repo rev-list --quiet --missing=print --objects HEAD 
> > >missing_objs 2>rev_list_err &&
> > + echo "?$TREE" >expected &&
> > + test_cmp expected missing_objs &&
> > +
> > + # do not complain when a missing tree cannot be parsed
> > + ! grep -q "Could not read " rev_list_err
> > +'
>
> I 

[PATCH 1/2] store submodule in common dir

2018-08-14 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 path.c | 1 +
 1 file changed, 1 insertion(+)
 
On Tue, Aug 14, 2018 at 3:27 PM Joakim Tjernlund 
 wrote:
>
> I am trying to create 3 submodules from the same git repo, each pointing to a 
> different branch.
> Since the repo is somewhat large, I don't want the 3 submodules to clone the 
> same repo 3
> times, I want one clone and then have the 3 submodules to point to different 
> commits.
>
> Is this possible? If not, could it be added?

yup.

According to recent discussions, it would be just this patch.
(plus some unspecified amount of work, TBD).

I thought about proposing something proper later, but here is the WIP patch.

Thanks,
Stefan 

diff --git a/path.c b/path.c
index 34f0f98349a..64c9821b834 100644
--- a/path.c
+++ b/path.c
@@ -115,6 +115,7 @@ static struct common_dir common_list[] = {
{ 1, 1, 1, "logs/HEAD" },
{ 0, 1, 1, "logs/refs/bisect" },
{ 0, 1, 0, "lost-found" },
+   { 0, 1, 0, "modules" },
{ 0, 1, 0, "objects" },
{ 0, 1, 0, "refs" },
{ 0, 1, 1, "refs/bisect" },
-- 
2.18.0.865.gffc8e1a3cd6-goog



Re: t5570-git-daemon fails with SIGPIPE on OSX

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 06:32:47PM -0400, Jeff King wrote:

> I suspect the (largely untested) patch below would make your test
> problems go away. Or instead, we could simply add sigpipe=ok to the
> test_must_fail invocation, but I agree with you that the current
> behavior on OS X is not ideal (the user sees no error message).

Whoops, that patch of course misses adding "sigchain_push(SIGPIPE,
SIG_IGN)" during the fetch-pack operation. I'll leave that as an
exercise to the reader. ;)

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 2:12 PM Jonathan Nieder  wrote:
>
> Hi,
>
> Stefan Beller wrote:
> > On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder  wrote:
>
> >> Second, what if we store the pathname in config?  We already store the
> >> URL there:
> >>
> >> [submodule "plugins/hooks"]
> >> url = https://gerrit.googlesource.com/plugins/hooks
> >>
> >> So we could (as a followup patch) do something like
> >>
> >> [submodule "plugins/hooks"]
> >> url = https://gerrit.googlesource.com/plugins/hooks
> >> gitdirname = plugins%2fhooks
> >>
> >> and use that for lookups instead of regenerating the directory name.
> >> What do you think?
> >
> > As I just looked at worktree code, this sounds intriguing for the wrong
> > reason (again), as a user may want to point the gitdirname to a repository
> > that they have already on disk outside the actual superproject. They
> > would be reinventing worktrees in the submodule space. ;-)
> >
> > This would open up the security hole that we just had, again.
> > So we'd have to make sure that the gitdirname (instead of the
> > now meaningless subsection name) is proof to ../ attacks.
> >
> > I feel uneasy about this as then the user might come in
> > and move submodules and repoint the gitdirname...
> > to a not url encoded path. Exposing this knob just
> > asks for trouble, no?
>
> What if we forbid directory separator characters in the gitdirname?

Fine with me, but ideally we'd want to allow sharding the
submodules. When you have 1000 submodules
we'd want them not all inside the toplevel "modules/" ?
Up to now we could just wave hands and claim the user
(who is clearly experienced with submodules as they
use so many of them) would shard it properly.

With this scheme we lose the ability to shard.

> [...]
> > What would happen if gitdirname is changed as part of
> > history? (The same problem we have now with changing
> > the subsection name)
>
> In this proposal, it would only be read from config, not from
> .gitmodules.

Ah good point. That makes sense.

Stepping back a bit regarding the config:
When I clone gerrit (or any repo using submodules)

$ git clone --recurse-submodules \
  https://gerrit.googlesource.com/gerrit g2
[...]
$ cat g2/.git/config
[submodule]
active = .
[submodule "plugins/codemirror-editor"]
url = https://gerrit.googlesource.com/plugins/codemirror-editor
[... more urls to follow...]

Originally we have had the url in the config, (a) that we can change
the URLs after the "git submodule init" and "git submodule update"
step that actually clones the submodule if not present and much more
importantly (b) to know which submodule "was initialized/active".

Now that we have the submodule.active or even
submodule..active flags, we do not need (b) any more.
So the URL turns into a useless piece of cruft that just is unneeded
and might confuse the user.

So maybe I'd want to propose a patch that removes
submodule..url from the config once it is cloned.
(I just read up on "submodule sync" again, but that might not
even need special care for this new world)

And with all that said, I think if we can avoid having the submodules
gitdir in the config, the config would look much cleaner, too.

But maybe that is the wrong thing to optimize for. ;-)
It just demonstrates that we'd have a submodule specific
thing again in the config.

So my preference would be to do a similar thing as
url-encoding as that solves the issue of slashes and
potentially of case sensitivity (e.g. encode upper case A
as lower case with underscore _a)
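The name mangling discussed above could look like this in shell (the
exact encoding scheme is an assumption, not what git implements; only
the slash case is shown, and the case-folding idea is omitted):

```shell
# Percent-encode '/' in a submodule name, e.g. "plugins/hooks" becomes
# "plugins%2fhooks". Illustrative only; git's actual scheme may differ.
encode_name () {
	printf '%s' "$1" | sed -e 's|/|%2f|g'
}
encode_name "plugins/hooks"   # prints: plugins%2fhooks
```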

However the transition worries me, as it transitions
within the same namespace. Back then when we
transferred from the .git dir inside the submodules
working tree to the embedded version in the superprojects
.git dir, there was no overlap, and any potential directory
in .git/modules/ that was already there, was highly
unusual, so asking the user for help is the reasonable
thing to do.
But now we might run into issues that has overlap between
old(name as is) and new (urlencoded) world.

So maybe we also want to transition from

modules/

to

submodules/)>

Thanks,
Stefan


Re: t5570-git-daemon fails with SIGPIPE on OSX

2018-08-14 Thread Jeff King
On Mon, Aug 06, 2018 at 05:11:13PM +0200, SZEDER Gábor wrote:

>   - 'git upload-pack' receives the request, parses the want line,
> notices the corrupt pack, responds with an 'ERR upload-pack: not
> our ref' pkt-line, and die()s right away.
> 
>   - 'git fetch' finally approaches the end of the function, where it
> attempts to send a done pkt-line via another send_request() call
> through the now closing TCP socket.
> 
>   - What happens now seems to depend on the platform:
> 
> - On Linux, both on my machine and on Travis CI, it shows textbook
>   example behaviour: write() returns with error and sets errno to
>   ECONNRESET.  Since it happens in write_or_die(), 'git fetch'
>   die()s with 'fatal: write error: Connection reset by peer', and
>   doesn't show the error send by 'git upload-pack'; how could it,
>   it doesn't even get as far to receive upload-pack's ERR
>   pkt-line.
> 
>   The test only checks that 'git fetch' fails, but it doesn't
>   check whether it failed with the right error message, so the
>   test still succeeds.  Had it checked the error message as well,
>   we most likely had noticed this issue already, it doesn't happen
>   all that rarely.

Hmm. Traditionally we did not send ERR as part of upload-pack at all. It
was the message you got from git-daemon if it couldn't start the
requested sub-process. It was only later in bdb31eada7 (upload-pack:
report "not our ref" to client, 2017-02-23) that we started sending
them. So I think that is why it does not check the error message: it is
not expecting that case at all (and it is not actually interesting here,
as the real problem is that the remote side is corrupt, but it sadly
does not say anything so useful).

I think that's somewhat tangential, though. The root of the issue is
this:

> - On the new OSX images with XCode 9.4 on Travis CI the write()
>   triggers SIGPIPE right away, and 'test_must_fail' notices it and
>   fails the test.  I couldn't see any sign of an ECONNRESET or any
>   other error that we could act upon to avoid the SIGPIPE.

Right, as soon as we get SIGPIPE we can't offer any useful message,
because we're dead. I would argue that fetch should simply turn off
SIGPIPE entirely, and rely on getting EPIPE from write(). But since
we're in write_or_die(), it actually turns EPIPE back into a SIGPIPE
death!

So we'd probably also want to teach it to use a real write_in_full(),
and then output a more useful message in this case. write_or_die()
really does produce bad messages regardless, because it doesn't know
what it's writing to.

That would give us a baby step in the right direction, because at least
we'd always be doing a controlled die() then. And then the next step
would be to show the remote error message (even though it's not actually
useful in this case, in theory upload-pack could generate something
better). And that would mean turning the die() on write into an attempt
to drain any ERR messages before either dying or returning an error up
the stack.

I suspect the (largely untested) patch below would make your test
problems go away. Or instead, we could simply add sigpipe=ok to the
test_must_fail invocation, but I agree with you that the current
behavior on OS X is not ideal (the user sees no error message).

-Peff

diff --git a/fetch-pack.c b/fetch-pack.c
index 5714bcbddd..3e80604562 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -188,8 +188,10 @@ static void send_request(struct fetch_pack_args *args,
if (args->stateless_rpc) {
send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
packet_flush(fd);
-   } else
-   write_or_die(fd, buf->buf, buf->len);
+   } else {
+   if (write_in_full(fd, buf->buf, buf->len) < 0)
+   die_errno("unable to write to remote");
+   }
 }
 
 static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
@@ -1167,7 +1169,8 @@ static int send_fetch_request(struct fetch_negotiator 
*negotiator, int fd_out,
 
/* Send request */
packet_buf_flush(_buf);
-   write_or_die(fd_out, req_buf.buf, req_buf.len);
+   if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
+   die_errno("unable to write request to remote");
 
strbuf_release(_buf);
return ret;
diff --git a/pkt-line.c b/pkt-line.c
index a593c08aad..450d0801b1 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -88,13 +88,15 @@ static void packet_trace(const char *buf, unsigned int len, 
int write)
 void packet_flush(int fd)
 {
packet_trace("0000", 4, 1);
-   write_or_die(fd, "0000", 4);
+   if (write_in_full(fd, "0000", 4) < 0)
+   die_errno("unable to write flush packet");
 }
 
 void packet_delim(int fd)
 {
packet_trace("0001", 4, 1);
-   write_or_die(fd, "0001", 4);
+   if (write_in_full(fd, "0001", 4) < 0)
+   die_errno("unable to write 

[PATCH] partial-clone: render design doc using asciidoc

2018-08-14 Thread Jonathan Nieder
Rendered documentation can be easier to read than raw text because
headings and emphasized phrases stand out.  Add the missing markup and
Makefile rule required to render this design document using asciidoc.

Tested by running

  make -C Documentation technical/partial-clone.html

and viewing the output in a browser.

Signed-off-by: Jonathan Nieder 
---
Today I noticed that this document wasn't available at
https://www.kernel.org/pub/software/scm/git/docs/technical/partial-clone.html,
so I wrote this patch.

Thoughts of all kinds welcome, as always.

 Documentation/Makefile|   1 +
 Documentation/technical/partial-clone.txt | 208 +++---
 2 files changed, 105 insertions(+), 104 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d079d7c73ac..a42dcfc7459 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -76,6 +76,7 @@ TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
diff --git a/Documentation/technical/partial-clone.txt 
b/Documentation/technical/partial-clone.txt
index 0bed2472c81..1ef66bd788a 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -69,24 +69,24 @@ Design Details
 
 - A new pack-protocol capability "filter" is added to the fetch-pack and
   upload-pack negotiation.
-
-  This uses the existing capability discovery mechanism.
-  See "filter" in Documentation/technical/pack-protocol.txt.
++
+This uses the existing capability discovery mechanism.
+See "filter" in Documentation/technical/pack-protocol.txt.
 
 - Clients pass a "filter-spec" to clone and fetch which is passed to the
   server to request filtering during packfile construction.
-
-  There are various filters available to accommodate different situations.
-  See "--filter=" in Documentation/rev-list-options.txt.
++
+There are various filters available to accommodate different situations.
+See "--filter=" in Documentation/rev-list-options.txt.
 
 - On the server pack-objects applies the requested filter-spec as it
   creates "filtered" packfiles for the client.
-
-  These filtered packfiles are *incomplete* in the traditional sense because
-  they may contain objects that reference objects not contained in the
-  packfile and that the client doesn't already have.  For example, the
-  filtered packfile may contain trees or tags that reference missing blobs
-  or commits that reference missing trees.
++
+These filtered packfiles are *incomplete* in the traditional sense because
+they may contain objects that reference objects not contained in the
+packfile and that the client doesn't already have.  For example, the
+filtered packfile may contain trees or tags that reference missing blobs
+or commits that reference missing trees.
 
 - On the client these incomplete packfiles are marked as "promisor packfiles"
   and treated differently by various commands.
@@ -104,47 +104,47 @@ Handling Missing Objects
   to repository corruption.  To differentiate these cases, the local
   repository specially indicates such filtered packfiles obtained from the
   promisor remote as "promisor packfiles".
-
-  These promisor packfiles consist of a ".promisor" file with
-  arbitrary contents (like the ".keep" files), in addition to
-  their ".pack" and ".idx" files.
++
+These promisor packfiles consist of a ".promisor" file with
+arbitrary contents (like the ".keep" files), in addition to
+their ".pack" and ".idx" files.
 
 - The local repository considers a "promisor object" to be an object that
   it knows (to the best of its ability) that the promisor remote has promised
   that it has, either because the local repository has that object in one of
   its promisor packfiles, or because another promisor object refers to it.
-
-  When Git encounters a missing object, Git can see if it a promisor object
-  and handle it appropriately.  If not, Git can report a corruption.
-
-  This means that there is no need for the client to explicitly maintain an
-  expensive-to-modify list of missing objects.[a]
++
+When Git encounters a missing object, Git can see if it a promisor object
+and handle it appropriately.  If not, Git can report a corruption.
++
+This means that there is no need for the client to explicitly maintain an
+expensive-to-modify list of missing objects.[a]
 
 - Since almost all Git code currently expects any referenced object to be
   present locally and because we do not want to force every command to do
   a dry-run first, a fallback mechanism is added to allow Git to attempt
   to dynamically fetch missing objects from the promisor remote.
-
-  When the normal object lookup fails to find an object, Git invokes
-  fetch-object to try to get 

git submodule: 3 modules same git repo?

2018-08-14 Thread Joakim Tjernlund
I am trying to create 3 submodules from the same git repo, each pointing to a 
different branch.
Since the repo is somewhat large, I don't want the 3 submodules to clone the 
same repo 3
times, I want one clone and then have the 3 submodules to point to different 
commits.

Is this possible? If not, could it be added?

  Jocke


Re: Syncing HEAD

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Christian Couder wrote:

> Hi,
>
> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
>
> It looks like the simplest way to sync HEAD is:
>
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX
>
> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:
>
> 19:55:39.304976 pkt-line.c:80   packet:  git< 
> HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
> deepen-since deepen-not deepen-relative no-progress include-tag
> multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
> agent=git/2.18.0
>
> which in this example shows that HEAD is a symref to refs/heads/test-1
> in origin.
>
> Is there a reason why no such option already exists? Would it makes
> sense to add one? Is there any reason why it's not a good idea? Or am
> I missing something?
>
> I am asking because GitLab uses HEAD in the bare repos it manages to
> store the default branch against which the Merge Requests (same thing
> as Pull Requests on GitHub) are created.
>
> So when people want to keep 2 GitLab hosted repos in sync, GitLab
> needs to sync HEADs too, not just the refs.
>
> I think this could be useful to other setups than GitLab though.
>
> Thanks,
> Christian.

Isn't this some other facet of (or the same) bug I reported in
https://public-inbox.org/git/87bmcyfh67@evledraar.gmail.com/ ?
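The three manual steps Christian lists could be scripted roughly as
follows, assuming a remote named "origin". Parsing `git remote show`
output like this is fragile and only illustrative, so sample output is
hardcoded here to show the parsing step in isolation:

```shell
# Step 2: extract the "HEAD branch: XXX" line from `git remote show
# origin` output (sample output hardcoded for illustration).
remote_show_output='* remote origin
  Fetch URL: https://example.com/repo.git
  HEAD branch: test-1'
branch=$(printf '%s\n' "$remote_show_output" | sed -n 's/^ *HEAD branch: //p')
echo "$branch"   # prints: test-1
# Step 3, run inside the mirror clone:
#   git symbolic-ref HEAD "refs/heads/$branch"
```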


Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 01:47:21PM +0200, SZEDER Gábor wrote:

> The test 'pack-objects to file can use bitmap' added in 645c432d61
> (pack-objects: use reachability bitmap index when generating
> non-stdout pack, 2016-09-10) is silently buggy and doesn't check what
> it's supposed to.
> 
> In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
> does what its name implies by running:
> 
>   git show-index <"$1" | cut -d' ' -f2
> 
> The test in question invokes this function like this:
> 
>   list_packed_objects packa.objects &&
>   list_packed_objects packb.objects &&
>   test_cmp packa.objects packb.objects
> 
> Note how these two callsites don't specify the name of the pack index
> file as the function's parameter, but redirect the function's standard
> input from it.  This triggers an error message from the shell, as it
> has no filename to redirect from in the function, but this error is
> ignored, because it happens upstream of a pipe.  Consequently, both
> invocations produce empty 'pack{a,b}.objects' files, and the
> subsequent 'test_cmp' happily finds those two empty files identical.
> 
> Fix these two 'list_packed_objects' invocations by specifying the pack
> index files as parameters.  Furthermore, eliminate the pipe in that
> function by replacing it with an &&-chained pair of commands using an
> intermediate file, so a failure of 'git show-index' or the shell
> redirection will fail the test.
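The failure mode the commit message describes can be demonstrated
generically (this is not the actual t5310 code): a POSIX pipeline's
exit status is that of its last command, so an error upstream of the
pipe is silently ignored, while an &&-chain through an intermediate
file propagates it.

```shell
# The upstream command fails, but cut succeeds on empty input, so the
# pipeline as a whole reports success.
sh -c 'nonexistent_cmd_xyz | cut -d" " -f2' 2>/dev/null
echo "pipeline exit status: $?"   # prints: pipeline exit status: 0

# The &&-chained form with an intermediate file makes the failure visible.
sh -c 'nonexistent_cmd_xyz >tmp.out && cut -d" " -f2 tmp.out' 2>/dev/null
echo "chained exit status: $?"    # non-zero
```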

Good catch, and nicely explained. The patch itself looks obviously
correct.

-Peff


Re: Syncing HEAD

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 05:06:16PM -0400, Jeff King wrote:

> On Tue, Aug 14, 2018 at 10:09:37PM +0200, Christian Couder wrote:
> 
> > When cloning with --mirror, the clone gets its HEAD initialized with
> > the value HEAD has in its origin remote. After that if HEAD changes in
> > origin there is no simple way to sync HEAD at the same time as the
> > refs are synced.
> > 
> > It looks like the simplest way to sync HEAD is:
> > 
> > 1) git remote show origin
> > 2) parse "HEAD branch: XXX" from the output of the above command
> > 3) git symbolic-ref HEAD refs/heads/XXX
> 
> How about:
> 
>   git remote set-head origin -a
> 
> ?

Reading your message again, I see you actually care less about the
refs/remote placeholder and more about the actual HEAD in a bare repo.

In which case "git remote" isn't going to help, though its underlying
code has the algorithm you would want.

> One tricky thing is that the name "refs/remotes//HEAD" is only
> special by convention, and that convention is known on the writing side
> only by git-clone and git-remote. So obviously:

And so here the convention is simpler, because we're talking about the
main HEAD. But we still have know if you want to do that, and not update
some refs/remotes/ symref in a bare repo.

So all of this really implies to me that you want to be able to say
"take this symref on the other side and update this one on the local
side". I.e., some way to tell a refspec "don't update the value, update
the symref destination". So imagine we made "~" the magic character for
"just the symrefs" (I picked that because it's not allowed in a
refname).

Then you could do what you want with:

  git config --add remote.origin.fetch ~HEAD:HEAD

and these two would be the same:

  git remote set-head origin -a
  git fetch origin ~HEAD:refs/remotes/origin/HEAD

And it would allow more exotic things, too, like:

  # always update the remote notion of HEAD on every fetch
  git config --add remote.origin.fetch ~HEAD:refs/remotes/origin/HEAD

  # update a non-HEAD symref we track for our own purposes
  git fetch origin ~refs/tags/LATEST:refs/tags/LATEST

  # or the same thing but using the usual refspec "dst defaults to src"
  # rule and dwim lookup magic
  git fetch origin ~LATEST

In protocol v0 we don't get symref reports from the other side over the
git protocol (except for HEAD), but we could use the same logic we use
for determining HEAD for older versions of Git: find a ref that points
to the same tip. Though I would say that unlike the existing code in
guess_remote_head(), we'd probably want to treat an ambiguity as an
error, and not just default to refs/heads/master.

-Peff


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 2:16 PM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Tue, Aug 14 2018, Stefan Beller wrote:
>
> > On Tue, Aug 14, 2018 at 2:05 PM Ævar Arnfjörð Bjarmason
> >  wrote:
> >
> >> > So ideally we'd also error out as soon as the host name is touched?
> >>
> >> Do we have some utility function that'll take whatever we have in
> >> remote.<name>.url and spew out the username / host / path? We must,
> >> since the clone protocol needs it, but I haven't found it.
> >
> > Nope. Welcome to the wonderful world of submodules.
> > As submodules are in the transition state towards "in C",
> > we could do some refactorings that would allow for easy access to
> > these properties, but the transition is done via faithful conversion from
> > shell to C, which did not have these functions either, but could just
> > do some string hackery. And that is how we ended up with this
> > function in the first place.
>
> The remote/clone machinery is in C and so is the resolve-relative-url
> helper. What's the missing bridge here?
>
> Maybe we don't have that function yet, but we can presumably expose it,
> and this seems unrelated to some other bits of submodules being in
> shellscript.
>
> No?

Gah, I misread your original question. Sorry about that.


Re: [PATCH 0/7] Resend of sb/submodule-update-in-c

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 2:01 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > Thanks Brandon for pointing out to use repo_git_path instead of
> > manually constructing the path.
> >
> > That is the only change in this resend.
>
> Rcpt.  Hopefully this is now ready for 'next'?

I don't know any reasons opposing its progression.

So, Yes, I think it is.

Stefan


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 2:05 PM Ævar Arnfjörð Bjarmason
>  wrote:
>
>> > So ideally we'd also error out as soon as the host name is touched?
>>
>> Do we have some utility function that'll take whatever we have in
> > remote.<name>.url and spew out the username / host / path? We must,
>> since the clone protocol needs it, but I haven't found it.
>
> Nope. Welcome to the wonderful world of submodules.
> As submodules are in the transition state towards "in C",
> we could do some refactorings that would allow for easy access to
> these properties, but the transition is done via faithful conversion from
> shell to C, which did not have these functions either, but could just
> do some string hackery. And that is how we ended up with this
> function in the first place.

The remote/clone machinery is in C and so is the resolve-relative-url
helper. What's the missing bridge here?

Maybe we don't have that function yet, but we can presumably expose it,
and this seems unrelated to some other bits of submodules being in
shellscript.

No?


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder  wrote:

>> Second, what if we store the pathname in config?  We already store the
>> URL there:
>>
>> [submodule "plugins/hooks"]
>> url = https://gerrit.googlesource.com/plugins/hooks
>>
>> So we could (as a followup patch) do something like
>>
>> [submodule "plugins/hooks"]
>> url = https://gerrit.googlesource.com/plugins/hooks
>> gitdirname = plugins%2fhooks
>>
>> and use that for lookups instead of regenerating the directory name.
>> What do you think?
>
> As I just looked at worktree code, this sounds intriguing for the wrong
> reason (again), as a user may want to point the gitdirname to a repository
> that they have already on disk outside the actual superproject. They
> would be reinventing worktrees in the submodule space. ;-)
>
> This would open up the security hole that we just had, again.
> So we'd have to make sure that the gitdirname (instead of the
> now meaningless subsection name) is proof against "../" attacks.
>
> I feel uneasy about this as then the user might come in
> and move submodules and repoint the gitdirname...
> to a path that is not URL-encoded. Exposing this knob just
> asks for trouble, no?

What if we forbid directory separator characters in the gitdirname?

[...]
> What would happen if gitdirname is changed as part of
> history? (The same problem we have now with changing
> the subsection name)

In this proposal, it would only be read from config, not from
.gitmodules.

Thanks,
Jonathan


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 2:05 PM Ævar Arnfjörð Bjarmason
 wrote:

> > So ideally we'd also error out as soon as the host name is touched?
>
> Do we have some utility function that'll take whatever we have in
> remote.<name>.url and spew out the username / host / path? We must,
> since the clone protocol needs it, but I haven't found it.

Nope. Welcome to the wonderful world of submodules.
As submodules are in the transition state towards "in C",
we could do some refactorings that would allow for easy access to
these properties, but the transition is done via faithful conversion from
shell to C, which did not have these functions either, but could just
do some string hackery. And that is how we ended up with this
function in the first place.


Re: Syncing HEAD

2018-08-14 Thread Brandon Williams
On 08/14, Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 1:09 PM Christian Couder
>  wrote:
> >
> > Hi,
> >
> > When cloning with --mirror, the clone gets its HEAD initialized with
> > the value HEAD has in its origin remote. After that if HEAD changes in
> > origin there is no simple way to sync HEAD at the same time as the
> > refs are synced.
> >
> > It looks like the simplest way to sync HEAD is:
> >
> > 1) git remote show origin
> > 2) parse "HEAD branch: XXX" from the output of the above command
> > 3) git symbolic-ref HEAD refs/heads/XXX
> >
> > It looks like it would be quite easy to add an option to `fetch` to
> > sync HEAD at the same time as regular refs are synced because every
> > fetch from an origin that uses a recent Git contains something like:
> >
> > 19:55:39.304976 pkt-line.c:80   packet:  git< 
> > HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
> > deepen-since deepen-not deepen-relative no-progress include-tag
> > multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
> > agent=git/2.18.0
> >
> > which in this example shows that HEAD is a symref to refs/heads/test-1
> > in origin.
> >
> > Is there a reason why no such option already exists? Would it make
> > sense to add one? Is there any reason why it's not a good idea? Or am
> > I missing something?
> 
> I think it is a great idea to add that. IIRC there was some talk when
> designing protocol v2 on how fetching of symrefs could be added later
> on in the protocol, which is why I cc'd Brandon who did the work there.

Actually the functionality for fetching symrefs already exists (when
using protocol v2, of course).  Despite this functionality existing, it's
not being used right now.

When performing a v2 fetch, the first thing that a client does is request
the list of refs (by doing an ls-refs request).  The output from ls-refs
(if asked) will include information about each ref, including whether
they are a symref and what ref they resolve to.  So really we just need
to plumb that information through fetch to actually update HEAD, or even
update other symrefs which exist on the server.

> 
> > I am asking because GitLab uses HEAD in the bare repos it manages to
> > store the default branch against which the Merge Requests (same thing
> > as Pull Requests on GitHub) are created.
> >
> > So when people want to keep 2 GitLab hosted repos in sync, GitLab
> > needs to sync HEADs too, not just the refs.
> >
> > I think this could be useful to other setups than GitLab though.
> 
> As said, I can see how that is useful; I recently came across some
> HEAD bug related to submodules, and there we'd also have the application.
> 
> git clone --recurse-submodules file://...
> 
> might clone the submodules that are in detached HEAD, which is totally
> not a long term viable good HEAD, so a subsequent fetch might want
> to change the detached HEAD in submodules or re-affix it to branches.
> 
> Unrelated/extended: I think it would be cool to mirror a repository even
> more, e.g. it would be cool to be able to fetch (if configured as allowed)
> the remote reflog, (not to be confused with your local reflog of the remote).
> I think that work would be enabled once reftables are available, which you
> have an eye on?

-- 
Brandon Williams


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder  wrote:
>
> Hi,
>
> Brandon Williams wrote:
> > On 08/09, Jeff King wrote:
>
> >> One interesting thing about url-encoding is that it's not one-to-one.
> >> This case could also be %2F, which is a different file (on a
> >> case-sensitive filesystem). I think "%20" and "+" are similarly
> >> interchangeable.
> >>
> >> If we were decoding the filenames, that's fine. The round-trip is
> >> lossless.
> >>
> >> But that's not quite how the new code behaves. We encode the input and
> >> then check to see if it matches an encoding we previously performed. So
> >> if our urlencode routines ever change, this will subtly break.
> >>
> >> I don't know how much it's worth caring about. We're not that likely to
> >> change the routines ourself (though certainly a third-party
> >> implementation would need to know our exact url-encoding decisions).
> >
> > This is exactly the reason why I wanted to get some opinions on what the
> > best thing to do here would be.  I _think_ the best thing would probably
> > be to write a specific routine to do the conversion, and it wouldn't
> > even have to be all that complex.  Basically I'm just interested in
> > converting '/' characters so that things no longer behave like
> > nested directories.
>
> First of all, I think the behavior with this patch is already much
> better than the previous status quo.  I'm using the patch now and am
> very happy with it.
>
> Second, what if we store the pathname in config?  We already store the
> URL there:
>
> [submodule "plugins/hooks"]
> url = https://gerrit.googlesource.com/plugins/hooks
>
> So we could (as a followup patch) do something like
>
> [submodule "plugins/hooks"]
> url = https://gerrit.googlesource.com/plugins/hooks
> gitdirname = plugins%2fhooks
>
> and use that for lookups instead of regenerating the directory name.
> What do you think?

As I just looked at worktree code, this sounds intriguing for the wrong
reason (again), as a user may want to point the gitdirname to a repository
that they have already on disk outside the actual superproject. They
would be reinventing worktrees in the submodule space. ;-)

This would open up the security hole that we just had, again.
So we'd have to make sure that the gitdirname (instead of the
now meaningless subsection name) is proof against "../" attacks.

I feel uneasy about this, as then the user might come in
and move submodules and repoint the gitdirname...
to a path that is not URL-encoded. Exposing this knob just
asks for trouble, no?

On the other hand, the only requirement for the "name" is
now uniqueness, and that is implied with subsections,
so I guess it looks elegant.

What would happen if gitdirname is changed as part of
history? (The same problem we have now with changing
the subsection name)

The more I think about it the less appealing this is, but it looks
elegant.

Stefan
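The one-way nature of the munging discussed earlier in this thread is easy to illustrate. Below is a minimal sketch, assuming a hypothetical helper that encodes only '/' (this is not Git's actual routine): since "%2f" and "%2F" decode identically, matching previously munged names by re-encoding rather than decoding is what ties the scheme to the exact encoder used.

```shell
# Hypothetical sketch: encode only '/' so a submodule name stops
# behaving like a nested directory path.
munge_submodule_name() {
	printf '%s' "$1" | sed 's!/!%2f!g'
}

munge_submodule_name "plugins/hooks"; echo
# prints "plugins%2fhooks"
```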


Re: Syncing HEAD

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 10:09:37PM +0200, Christian Couder wrote:

> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
> 
> It looks like the simplest way to sync HEAD is:
> 
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX

How about:

  git remote set-head origin -a

?

> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:

I think the "remote set-head" option is not very discoverable, since
people are used to working with "fetch", making it the natural place to
look. Just like we ported "remote update" over to "fetch --all", I think
it would be sensible to have "fetch --update-head" or similar.

One tricky thing is that the name "refs/remotes/<name>/HEAD" is only
special by convention, and that convention is known on the writing side
only by git-clone and git-remote. So obviously:

  git fetch --update-head https://example.com/

is nonsense. We don't even have a ref. What should:

  git config remote.origin.fetch refs/heads/*:refs/remotes/foo/*
  git fetch --update-head origin

do? Should it update based on the remote name, or based on the refspec?
What happens if there are several refspecs? Etc.

99% of the time those questions won't come up. But we should design so
that we do the obvious thing in those 99%, and something sane in the
other 1%.

-Peff
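Christian's manual three-step workflow hinges on the parsing in step 2. A sketch of that parse, run here against canned `git remote show origin` output so it stands alone (in real use you would capture the command's output):

```shell
# Canned stand-in for step 1's `git remote show origin` output.
show_output='* remote origin
  Fetch URL: https://example.com/repo.git
  HEAD branch: test-1'

# Step 2: extract the branch name after "HEAD branch: ".
head_branch=$(printf '%s\n' "$show_output" | sed -n 's/^ *HEAD branch: //p')
echo "$head_branch"
# prints "test-1"

# Step 3 (not run here): git symbolic-ref HEAD "refs/heads/$head_branch"
```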


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason
>  wrote:
>>
>> The tests added in 63e95beb08 ("submodule: port resolve_relative_url
>> from shell to C", 2016-04-15) didn't do a good job of testing various
>> up-path invocations where the up-path would bring us beyond even the
>> URL in question without emitting an error.
>>
>> These results look nonsensical, but it's worth exhaustively testing
>> them before fixing any of this code, so we can see which of these
>> cases were changed.
>
> Yeah. Please look at the comment in builtin/submodule--helper.c
> in that commit, where I described my expectations.
>
> I should have put them into tests instead with the expectations
> spelled out there.

I'll check that out, thanks. I saw that comment, but have been skimming
most of this code...

> Thanks for this patch!
> Stefan
>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>
>
>> So I think these tests are worthwhile in themselves,
>
> The reason I put it in the comment instead of tests was the
> ease of spelling out both the status quo and expectations.
>
>> but would like
>> some advice on how to proceed with that from someone more familiar
>> with submodules.
>
> So ideally we'd also error out as soon as the host name is touched?

Do we have some utility function that'll take whatever we have in
remote.<name>.url and spew out the username / host / path? We must,
since the clone protocol needs it, but I haven't found it.


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Eric Wong wrote:

> Ævar Arnfjörð Bjarmason  wrote:
>> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
>> configuration variable. When set to e.g. 2, this causes send-email to
>> sleep 2 seconds before sending the next E-Mail. We'll only sleep
>> between sends, not before the first send, or after the last.
>>
>> This option has two uses. Firstly, to be able to Ctrl+C a long send
>> with "all" if you have a change of heart. Secondly, as a hack in some
>> mail setups to, with a sufficiently high delay, force the receiving
>> client to sort the E-Mails correctly.
>>
>> Some popular E-Mail clients completely ignore the "Date" header, which
>> format-patch is careful to set such that the patches will be displayed
>> in order, and instead sort by the time the E-mail was received.
>>
>> Google's GMail is a good example of such a client. It ostensibly sorts
>> by some approximation of received time (although not by any "Received"
>> header). It's more usual than not to see patches showing out of order
>> in GMail. To take a few examples of orders seen on patches on the Git
>> mailing list:
>>
>> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
>> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
>> 3 -> 2 -> 1 (fast-import by Jameson Miller)
>> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
>>
>> The reason to add the new "X-Mailer-Send-Delay" header is to make it
>> easy to tell what the imposed delay was, if any. This allows for
>> gathering some data on how the transfer of E-Mails with & without this
>> option behaves. This may not be workable without really long delays,
>> see [1] and [2].
>
> Aside from the new header, I think this is better implemented
> using the existing $relogin_delay and $batch_size=1.
>
> Disconnecting during the delay might be more sympathetic to
> existing mail servers (which aren't C10K-optimized).

Yeah that's a good point, maybe we're being wasteful on remote resources
here.

> If the client sleeps, the server may disconnect the client anyways to
> save resources.

Seems like something we'd need to deal with anyway, don't we?

>> @@ -1741,6 +1747,10 @@ sub process_file {
>>  $message, $xfer_encoding, $target_xfer_encoding);
>>  push @xh, "Content-Transfer-Encoding: $xfer_encoding";
>>  unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
>> +if ($send_delay && $i > 0) {
>> +push @xh, "X-Mailer-Send-Delay: $send_delay";
>> +sleep $send_delay;
>> +}
>
> We can add this header for relogin_delay + batch_size
>
> But maybe --send-delay can be a shortcut for
> --relogin-delay and --batch-size=1

I need to enter a password when sending a batch with my SMTP server now,
once. With relogin I'd need to enter this N times unless I use whatever
auth save facility there is in git-send-email (which I don't use now).

I don't think it makes sense to conflate these two modes.


Re: [PATCH 0/7] Resend of sb/submodule-update-in-c

2018-08-14 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks Brandon for pointing out to use repo_git_path instead of
> manually constructing the path.
>
> That is the only change in this resend.

Rcpt.  Hopefully this is now ready for 'next'?


Re: Contributor Summit planning

2018-08-14 Thread Junio C Hamano
Jeff King  writes:

> One problem there is that the prefixes are ambiguous (e.g., Jacob Keller
> shares with me, and I think at least one other over the years). You
> could look at the author of the tip commit, but that's not always right
> (and in fact, counting just merged topics misses bug-fixes that get
> applied directly on top of other people's topics).

Yes, a fix by somebody else to a bug that was recently introduced is
safest to apply to the original topic and merge down; that way makes
it more difficult to merge the original topic to older maintenance
track(s) without the fix by mistake.  So a "topic" with commits from
multiple people is not all that unusual.

Thanks.


Re: Syncing HEAD

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 1:09 PM Christian Couder
 wrote:
>
> Hi,
>
> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
>
> It looks like the simplest way to sync HEAD is:
>
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX
>
> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:
>
> 19:55:39.304976 pkt-line.c:80   packet:  git< 
> HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
> deepen-since deepen-not deepen-relative no-progress include-tag
> multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
> agent=git/2.18.0
>
> which in this example shows that HEAD is a symref to refs/heads/test-1
> in origin.
>
> Is there a reason why no such option already exists? Would it make
> sense to add one? Is there any reason why it's not a good idea? Or am
> I missing something?

I think it is a great idea to add that. IIRC there was some talk when
designing protocol v2 on how fetching of symrefs could be added later
on in the protocol, which is why I cc'd Brandon who did the work there.

> I am asking because GitLab uses HEAD in the bare repos it manages to
> store the default branch against which the Merge Requests (same thing
> as Pull Requests on GitHub) are created.
>
> So when people want to keep 2 GitLab hosted repos in sync, GitLab
> needs to sync HEADs too, not just the refs.
>
> I think this could be useful to other setups than GitLab though.

As said, I can see how that is useful; I recently came across some
HEAD bug related to submodules, and there we'd also have the application.

git clone --recurse-submodules file://...

might clone the submodules that are in detached HEAD, which is totally
not a viable long-term HEAD, so a subsequent fetch might want
to change the detached HEAD in submodules or re-affix it to branches.

Unrelated/extended: I think it would be cool to mirror a repository even
more, e.g. to be able to fetch (if configured as allowed)
the remote reflog (not to be confused with your local reflog of the remote).
I think that work would be enabled once reftables are available, which you
have an eye on?


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Junio C Hamano
Jonathan Tan  writes:

>> - grep -E "tree|blob" objs >trees_and_blobs &&
>> - test_line_count = 1 trees_and_blobs
>> + grep -E "tree|blob" objs \
>> + | awk -f print_1.awk >trees_and_blobs &&
>> + git -C r1 rev-parse HEAD: >expected &&
>> + test_cmp trees_and_blobs expected
>
> Indent "| awk" (and similar lines in this patch) - although I guess it
> is likely that you actually have it indented, and your e-mail client
> modified the whitespace so that it looks like there is no indent.

No, wrap lines like this

command1 arg1 arg2 |
command2 arg1 arg2 &&

That way, you do not need backslash to continue line.

Also think twice when you are seeing yourself piping output from
"grep" to more powerful tools like "perl", "awk" or "sed".  Often
you can lose the upstream "grep".
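Junio's two points combined, wrapping at the pipe and folding the grep into awk, might look like this (illustrative only: canned input stands in for the test's objs file, and the field layout is assumed, not taken from the actual test):

```shell
# Canned stand-in for the "objs" file; assumed "<type> <oid>" layout.
printf '%s\n' 'tree 4b825dc' 'blob e69de29' 'commit deadbee' |
awk '/^(tree|blob)/ { print $2 }'
# prints the two object ids: 4b825dc and e69de29
```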


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Junio C Hamano
Duy Nguyen  writes:

> These trace messages are made for human consumption. Granted,
> occasionally we need some processing, but I find one-liners mostly
> suffice. Now we turn these into something made for machines, turning
> people into second-class citizens. I've read these messages reformatted
> for humans; it's usually too verbose even when reformatted.

I actually actively hate the aspect of the slog thing that exposes
the fact that it wants to take and show JSON too much in its API,
but if you look at these "jw_object_*()" thing as _only_ filling
parameters to be emitted, there is no reason to think we cannot
enhance/extend slog_emit_*() thing to take a format string (perhaps
inside the jw structure) so that the formatter does not have to
generate JSON at all.  Envisioning that kind of future, json_writer
is a misnomer that too narrowly defines what it is---it is merely a
generic data container that the codepath being traced can use to
communicate what needs to be logged to the outside world.
slog_emit* can (and when enhanced, should) be capable of paying
attention to an external input (e.g. environment variable) to switch
the output format, and JSON could be just one of the choices.

> It's not just sampling points. There are things like the index id being
> shown in the message, for example. I prefer to keep a free-style format
> to help me read. There are also things like indentation I do here to
> help me read.

Yup, I do not think that contradicts with the approach to have a
single unified "data collection" API; you should also be able to
specify how that collection of data is to be presented in the trace
messages meant for humans, which would be discarded when emitting
JSON but would be used when showing human-readable trace, no?


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
>
>> On 8/13/2018 5:54 PM, Jeff King wrote:
>> > So I try not to think too hard on metrics, and just use them to get a
>> > rough view on who is active.
>> 
>> I've been very interested in measuring community involvement, with the
>> knowledge that any metric is flawed and we should not ever say "this metric
>> is how we measure the quality of a contributor". It can be helpful, though,
>> to track some metrics and their change over time.
>> 
>> Here are a few measurements we can make:
>
> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

I do not think it covered e-mail at all, but there was git stats
project several years ago (perhaps part of GSoC IIRC).

> I think I mentioned "surviving lines" elsewhere, which I do like this
> (and almost certainly stole from Junio a long time ago):

Yeah, I recall that one as part of counting how many of 1244 lines
Linus originally wrote still were in our codebase at around v1.6.0
timeframe (the answer was ~220 IIRC) ;-)



Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-14 Thread Junio C Hamano
Antonio Ospite  writes:

>   /* Equivalent to ACTION_SET in builtin/config.c */
> - if (argc == 3)
> + if (argc == 3) {
> + struct object_id oid;
> +
> + /*
> +  * If the .gitmodules file is not in the working tree but it
> +  * is in the current branch, stop, as writing new values (and
> +  * staging them) would blindly overwrite ALL the old content.

Hmph, "the file is missing" certainly is a condition we would want
to notice, but wouldn't we in general want to prevent us from
overwriting any local modification, where "missing" is merely a very
special case of local modification?  I am wondering if we would want
to stop if .gitmodules file exists both in the working tree and in
the index, and the contents of them differ, or something like that.

> diff --git a/git-submodule.sh b/git-submodule.sh
> index ff258e2e8c..b1cb187227 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -159,6 +159,13 @@ cmd_add()
>   shift
>   done
>  
> + # For more details about this check, see
> + # builtin/submodule--helper.c::module_config()
> + if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > 
> /dev/null 2>&1

No SP between redirection '>' and its target '/dev/null'.

More importantly, I think it is better to add a submodule--helper
subcommand that exposes the check in question, as the code is
already written ;-) That approach will guarantee that the logic and
the message stay the same between here and in the C code.  Then you
do not even need these two line comment.

> + then
> +  die "$(eval_gettext "please make sure that the .gitmodules 
> file in the current branch is checked out")"
> + fi
> +

> diff --git a/submodule-config.c b/submodule-config.c
> index b7ef055c63..088dabb56f 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "dir.h"
>  #include "repository.h"
>  #include "config.h"
>  #include "submodule-config.h"
> @@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct repository 
> *repo)
>  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, 
> void *data)
>  {
>   if (repo->worktree) {
> - char *file = repo_worktree_path(repo, GITMODULES_FILE);
> - git_config_from_file(fn, file, data);
> + struct git_config_source config_source = { 0 };
> + const struct config_options opts = { 0 };
> + struct object_id oid;
> + char *file;
> +
> + file = repo_worktree_path(repo, GITMODULES_FILE);
> + if (file_exists(file))
> + config_source.file = file;
> + else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> + config_source.blob = GITMODULES_HEAD;

What is the reason why we fall back directly to HEAD when working
tree file does not exist?  I thought that our usual fallback was to
the version in the index for other things like .gitignore/attribute
and this codepath look like an oddball.  Are you trying to handle
the case where we are in a bare repository without any file checked
out (and there is not even the index)?

> diff --git a/t/t7416-submodule-sparse-gitmodules.sh 
> b/t/t7416-submodule-sparse-gitmodules.sh
> new file mode 100755
> index 00..5341e9b012
> --- /dev/null
> +++ b/t/t7416-submodule-sparse-gitmodules.sh
> @@ -0,0 +1,90 @@
> +#!/bin/sh
> +#
> +# Copyright (C) 2018  Antonio Ospite 
> +#
> +
> +test_description='Test reading/writing .gitmodules when not in the working 
> tree
> +
> +This test verifies that, when .gitmodules is in the current branch but is not
> +in the working tree reading from it still works but writing to it does not.
> +
> +The test setup uses a sparse checkout, however the same scenario can be set 
> up
> +also by committing .gitmodules and then just removing it from the filesystem.
> +'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'sparse checkout setup which hides .gitmodules' '
> + echo file > file &&

No SP between redirection '>' and its target 'file'.

> + git add file &&
> + test_tick &&
> + git commit -m upstream &&
> + git clone . super &&
> + git clone super submodule &&
> + git clone super new_submodule &&
> + (cd super &&
> + git submodule add ../submodule &&
> + test_tick &&
> + git commit -m submodule &&
> + cat >.git/info/sparse-checkout <<\EOF &&
> +/*
> +!/.gitmodules
> +EOF

You can use <<-\EOF and indent the body of the here-doc, which makes
the result easier to read, i.e.

cat >target <<-\EOF &&
line 1
line 2
EOF

> + git config core.sparsecheckout true &&
> + git read-tree -m -u HEAD &&

That's old fashioned---I am curious if this has to be one-way merge
or can just be a usual "git checkout" (I am merely curious; 

Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> - grep -E "tree|blob" objs >trees_and_blobs &&
> - test_line_count = 1 trees_and_blobs
> + grep -E "tree|blob" objs \
> + | awk -f print_1.awk >trees_and_blobs &&
> + git -C r1 rev-parse HEAD: >expected &&
> + test_cmp trees_and_blobs expected

Indent "| awk" (and similar lines in this patch) - although I guess it
is likely that you actually have it indented, and your e-mail client
modified the whitespace so that it looks like there is no indent.

Other than that, this interdiff looks good to me. Thanks.


Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up

2018-08-14 Thread Junio C Hamano
Antonio Ospite  writes:

>  test_expect_success 'error message contains blob reference' '
> + # Remove the error introduced in the previous test.
> + # It is not needed in the following tests.
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   sha1=$(git rev-parse HEAD) &&
>   test-tool submodule-config \

Antonio Ospite  writes:

> Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
> invalid lines in .gitmodules but then only the second commit is removed.
>
> This may affect future subsequent tests if they assume that the
> .gitmodules file has no errors.
>
> Remove both the commits as soon as they are not needed anymore.
>
> The error introduced in test 5 is also required by test 6, so the two
> commits from above are removed respectively in tests 6 and 8.
>
> Signed-off-by: Antonio Ospite 
> ---
>  t/t7411-submodule-config.sh | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
> index 0bde5850ac..c6b6cf6fae 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets 
> continue' '
>  '
>  
>  test_expect_success 'error message contains blob reference' '
> + # Remove the error introduced in the previous test.
> + # It is not needed in the following tests.
> + test_when_finished "git -C super reset --hard HEAD^" &&

Hmm, that is ugly.  Depending on where in the subshell the previous
test failed, you'd still be taking us to an unexpected place.
Imagine if "git commit -m 'add error'" failed, for example, in the
test before this one.

I am wondering if the proper fix is to merge the previous one and
this one into a single test.  The combined test would

- remember where the HEAD in super is and arrange to come back
  to it when test is done
- break .gitmodules and commit it
- run test-tool and check its output
- also check its error output

in a single test_expect_success.

> @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
>  '
>  
>  test_expect_success 'error in history in fetchrecursesubmodule lets 
> continue' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   git config -f .gitmodules \
>   submodule.submodule.fetchrecursesubmodules blabla &&
> @@ -134,8 +138,7 @@ test_expect_success 'error in history in 
> fetchrecursesubmodule lets continue' '
>   HEAD b \
>   HEAD submodule \
>   >actual &&
> - test_cmp expect_error actual  &&
> - git reset --hard HEAD^
> + test_cmp expect_error actual
>   )
>  '

If we want to be more robust, you'd probably need to find a better
anchoring point than HEAD, which can be pointing different commit
depending on where in the subshell the process is hit with ^C,
i.e.

ORIG=$(git -C super rev-parse HEAD) &&
test_when_finished "git -C super reset --hard $ORIG" &&
(
cd super &&
...

The patch is still an improvement compared to the current code,
where a broken test-tool that does not produce expected output in
the file 'actual' is guaranteed to leave us at a commit that we do
not expect to be at, but not entirely satisfactory.


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/14/2018 2:44 PM, Stefan Beller wrote:

On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:


On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:

I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.


Complement maybe. Replace, please no. I'd rather not stare at json messages.


From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than json, and we figured that we might end up desiring
ProtocolBuffers[1] instead of JSon, so if it would be easy to change
the output of
the structured logging easily that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/



Last time I checked, protocol-buffers has a C++ binding but not
a C binding.

I've not had a chance to use pbuffers, so I have to ask what advantages
would they have over JSON or some other similar self-describing format?
And/or would it be possible for you to tail the json log file and
convert it to whatever format you preferred?

It seems like the important thing is to capture structured data
(whatever the format) to disk first.
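Tailing the json log and converting it, as suggested above, could be a few lines of script; the field names here ("category", "msg", "elapsed_us") are invented for illustration, not an actual SLOG schema:

```python
import json

def format_trace_line(line):
    # Render one JSON-lines trace record as a human-readable message.
    # The keys used here are hypothetical; a real consumer would match
    # whatever keys the SLOG writer actually emits.
    ev = json.loads(line)
    return "%s: %s (%d us)" % (ev["category"], ev["msg"], ev["elapsed_us"])

# Tailing the log would then be roughly:
#   for line in sys.stdin:
#       print(format_trace_line(line))
```

So a reader who prefers free-style text over json blobs could post-process the structured log back into whatever format they like.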

Jeff


Syncing HEAD

2018-08-14 Thread Christian Couder
Hi,

When cloning with --mirror, the clone gets its HEAD initialized with
the value HEAD has in its origin remote. After that if HEAD changes in
origin there is no simple way to sync HEAD at the same time as the
refs are synced.

It looks like the simplest way to sync HEAD is:

1) git remote show origin
2) parse "HEAD branch: XXX" from the output of the above command
3) git symbolic-ref HEAD refs/heads/XXX
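Taken together, the three steps might be scripted like this; the sample `git remote show` output and the sed pattern are assumptions based on the "HEAD branch: XXX" line in step 2:

```shell
# Step 1's output, captured here as a sample for illustration:
remote_show_output='* remote origin
  Fetch URL: https://example.com/repo.git
  HEAD branch: test-1'

# Step 2: parse the default branch name out of it.
branch=$(printf '%s\n' "$remote_show_output" | sed -n 's/^ *HEAD branch: *//p')

# Step 3 (shown rather than run, since it needs a repository):
echo "git symbolic-ref HEAD refs/heads/$branch"
```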

It looks like it would be quite easy to add an option to `fetch` to
sync HEAD at the same time as regular refs are synced because every
fetch from an origin that uses a recent Git contains something like:

19:55:39.304976 pkt-line.c:80   packet:  git< 
HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
deepen-since deepen-not deepen-relative no-progress include-tag
multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
agent=git/2.18.0

which in this example shows that HEAD is a symref to refs/heads/test-1
in origin.

Is there a reason why no such option already exists? Would it make
sense to add one? Is there any reason why it's not a good idea? Or am
I missing something?

I am asking because GitLab uses HEAD in the bare repos it manages to
store the default branch against which the Merge Requests (same thing
as Pull Requests on GitHub) are created.

So when people want to keep 2 GitLab hosted repos in sync, GitLab
needs to sync HEADs too, not just the refs.

I think this could be useful to other setups than GitLab though.

Thanks,
Christian.


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:
> 
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
> 
> With the advent of public inbox, this is easy to obtain?

For our project, yes. But I was thinking of a tool that could be used
for other projects, too.

> > At one point I sent a patch series that would let shortlog group by
> > trailers. Nobody seemed all that interested and I didn't end up using it
> > for its original purpose, so I didn't polish it further.  But I'd be
> > happy to re-submit it if you think it would be useful.
> 
> I would think it is useful. Didn't Linus also ask for a related thing?
> https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/

He wanted grouping by committer, which we ended up adding as a separate
feature. I think there's some discussion of the trailer thing in that
thread.

-Peff


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote:

> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.

I'm OK with this as a name, since we're explicitly not supporting deeper
depths. But I'd note that "depth" is actually a tricky characteristic,
as it's not a property of the object itself, but rather who refers to
it. So:

  - it's expensive to compute, because you have to actually walk all of
the possible commits and trees that could refer to it. This
prohibits a lot of other optimizations like reachability bitmaps
(though with some complexity you could cache the depths, too).

  - you have to define it as something like "the minimum depth at which
this object is found", since there may be multiple depths

I think you can read that second definition between the lines of:

> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.

But I wonder if we should be more precise. It doesn't matter now, but it
may help set expectations if the feature does come later.
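That "minimum depth" reading could be made concrete with a small sketch (a toy object graph, not git's actual data structures): breadth-first search from the root tree assigns each object the smallest depth at which any tree refers to it.

```python
from collections import deque

def min_depths(children, root):
    # `children` maps a tree to the trees/blobs it references.
    # BFS reaches every object first along a shortest path, so the
    # recorded depth is the minimum over all referring trees.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for child in children.get(obj, ()):
            if child not in depth:
                depth[child] = depth[obj] + 1
                queue.append(child)
    return depth
```

A blob referenced both by the root tree and by a subtree would count as depth 1 here, which is the "found at multiple depths" caveat in action.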

-Peff


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 11:18 AM Jonathan Tan  wrote:
>
> > @@ -743,6 +743,9 @@ specification contained in <path>.
> >   A debug option to help with future "partial clone" development.
> >   This option specifies how missing objects are handled.
> >  +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >  The form '--missing=error' requests that rev-list stop with an error if
> >  a missing object is encountered.  This is the default action.
> >  +
>
> The "--filter" documentation should go with the other "--filter"
> information, not right after --missing.
Fixed. My problem was that I didn't know what the + meant - I guess it
means that the paragraph before and after are in the same section?
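(For reference: in AsciiDoc, which git's documentation is written in, a line containing just `+` is a list continuation; it attaches the following paragraph to the preceding labeled-list item, e.g.:)

```asciidoc
--missing=error::
	Stop with an error if a missing object is encountered.
+
This paragraph is joined to the `--missing=error` item above
by the `+` continuation line.
```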

>
> > +test_expect_success 'setup for tests of tree:0' '
> > + mkdir r1/subtree &&
> > + echo "This is a file in a subtree" > r1/subtree/file &&
> > + git -C r1 add subtree/file &&
> > + git -C r1 commit -m subtree
> > +'
>
> Style: no space after >
Fixed.

>
> > +test_expect_success 'grab tree directly when using tree:0' '
> > + # We should get the tree specified directly but not its blobs or 
> > subtrees.
> > + git -C r1 pack-objects --rev --stdout --filter=tree:0 
> > >commitsonly.pack <<-EOF &&
> > + HEAD:
> > + EOF
> > + git -C r1 index-pack ../commitsonly.pack &&
> > + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > + grep -E "tree|blob" objs >trees_and_blobs &&
> > + test_line_count = 1 trees_and_blobs
> > +'
>
> Can we also verify that the SHA-1 in trees_and_blobs is what we
> expected?
Done - Now I'm comparing to the output of `git rev-parse HEAD:` and I
don't need the separate line count check either.
>
> > +test_expect_success 'use fsck before and after manually fetching a missing 
> > subtree' '
> > + # push new commit so server has a subtree
> > + mkdir src/dir &&
> > + echo "in dir" > src/dir/file.txt &&
>
> No space after >
Fixed.

>
> > + git -C src add dir/file.txt &&
> > + git -C src commit -m "file in dir" &&
> > + git -C src push -u srv master &&
> > + SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> > +
> > + rm -rf dst &&
> > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst 
> > &&
> > + git -C dst fsck &&
> > + git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> > + git -C dst fsck
> > +'
>
> If you don't need to redirect to err, don't do so.
>
> Before the cat-file, also verify that the tree is missing, most likely
> through a "git rev-list" with "--missing=print".
That won't work though - the subtree's hash is not known because its
parent tree is not there. I've merged the three tests in this file,
and as a result am now using the check which makes sure the object
types are only "commit".

>
> And I would grep on the tree_contents to ensure that the filename
> ("file.txt") is there, so that we know that we got the correct tree.
Done.

>
> > +test_expect_success 'can use tree:0 to filter partial clone' '
> > + rm -rf dst &&
> > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst 
> > &&
> > + git -C dst rev-list master --missing=allow-any --objects 
> > >fetched_objects &&
> > + cat fetched_objects \
> > + | awk -f print_1.awk \
> > + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > + sort fetched_types -u >unique_types.observed &&
> > + echo commit > unique_types.expected &&
> > + test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'auto-fetching of trees with --missing=error' '
> > + git -C dst rev-list master --missing=error --objects >fetched_objects 
> > &&
> > + cat fetched_objects \
> > + | awk -f print_1.awk \
> > + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > + sort fetched_types -u >unique_types.observed &&
> > + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> > + test_cmp unique_types.observed unique_types.expected
> > +'
>
> These two tests seem redundant with the 'use fsck before and after
> manually fetching a missing subtree' test (after the latter is
> appropriately renamed). I think we only need to test this sequence once,
> which can be placed in one or spread over multiple tests:
>
>  1. partial clone with --filter=tree:0
>  2. fsck works
>  3. verify that trees are indeed missing
>  4. autofetch a tree
>  5. fsck still works
Done - that's much nicer. Thanks!

Here is an interdiff from v4 of the patch:

diff --git a/Documentation/rev-list-options.txt
b/Documentation/rev-list-options.txt
index 9e351ec2a..0b5f77ad3 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,9 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all 

Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 08:51:41PM +0200, Duy Nguyen wrote:

> > But AFAICT these series are all about putting the sampling points into the
> > code base, so formatting would be orthogonal to it?
> 
> It's not just sampling points. There's things like index id being
> shown in the message for example. I prefer to keep free style format
> to help me read. There's also things like indentation I do here to
> help me read. Granted you could do all that with scripts and stuff,
> but will we pass around in mail  dumps of json messages to be decoded
> locally?

I think you could have both forms using the same entry points sprinkled
through the code.

At GitHub we have a similar telemetry-ish thing, where we collect some
data points and then the resulting JSON is stored for every operation
(for a few weeks for read ops, and indefinitely attached to every ref
write).

And I've found that the storage and the trace-style "just show a
human-readable message to stderr" interface complement each other in
both directions:

 - you can output a human readable message that is sent immediately to
   the trace mechanism but _also_ becomes part of the telemetry. E.g.,
   imagine that one item in the json blob is "this is the last message
   from GIT_TRACE_FOO". Now you can push tracing messages into whatever
   plan you're using to store SLOG. We do this less with TRACE, and much
   more with error() and die() messages.

 - when a structured telemetry item is updated, we can still output a
   human-readable trace message with just that item. E.g., with:

 trace_performance(n, "foo");

   we could either store a json key (perf.foo=n) or output a nicely
   formatted string like we do now, depending on what the user has
   configured (or even both, of course).

It helps if the sampling points give enough information to cover both
cases (as in the trace_performance example), but you can generally
shoe-horn unstructured data into the structured log, and pretty-print
structured data.

-Peff


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks for this patch!
> Stefan

Thanks, I'd take it as your Acked-by: (please holler if it isn't
before the patch hits 'next').


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Junio C Hamano
Eric Wong  writes:

>> Some popular E-Mail clients completely ignore the "Date" header, which
>> format-patch is careful to set such that the patches will be displayed
>> in order, and instead sort by the time the E-mail was received.

It is send-email that carefully shows monotonically increasing
timestamps so that the sender's datestamp can be used for sorting by
the recipient, not format-patch, which records author-date,
primarily meant for local replaying, in the generated messages but
discarded by send-email.

> Disconnecting during the delay might be more sympathetic to
> existing mail servers (which aren't C10K-optimized).  If the
> client sleeps, the server may disconnect the client anyways
> to save resources.
>
> But maybe --send-delay can be a shortcut for
> --relogin-delay and --batch-size=1

Both good points to point at a simpler and better solution, I guess.




Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:

> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

With the advent of public inbox, this is easy to obtain?

>
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> > etc.).
> >
> > Using git repo:
> >
> > $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
> At one point I sent a patch series that would let shortlog group by
> trailers. Nobody seemed all that interested and I didn't end up using it
> for its original purpose, so I didn't polish it further.  But I'd be
> happy to re-submit it if you think it would be useful.

I would think it is useful. Didn't Linus also ask for a related thing?
https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/


Re: [PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 09:30:57PM +0200, René Scharfe wrote:

> > -static void batch_object_write(const char *obj_name, struct batch_options 
> > *opt,
> > +static void batch_object_write(const char *obj_name,
> > +  struct strbuf *scratch,
> > +  struct batch_options *opt,
> >struct expand_data *data)
> >  {
> > -   struct strbuf buf = STRBUF_INIT;
> 
> We could also avoid passing that buffer around by making it static.  I
> shy away from adding static variables because the resulting code won't
> be thread-safe, but that fear might be irrational, especially with
> cat-file.

True, I didn't even think of that after your original got me in the
mindset of passing the buffer down. It's not too bad to do it this way,
and I agree with you that we are better avoiding static variables if we
can. Five years ago I might have said the opposite, but we've cleaned up
a lot of confusing hidden-static bits in that time. Let's not go in the
opposite direction. :)

-Peff


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:

> On 8/13/2018 5:54 PM, Jeff King wrote:
> > So I try not to think too hard on metrics, and just use them to get a
> > rough view on who is active.
> 
> I've been very interested in measuring community involvement, with the
> knowledge that any metric is flawed and we should not ever say "this metric
> is how we measure the quality of a contributor". It can be helpful, though,
> to track some metrics and their change over time.
> 
> Here are a few measurements we can make:

Thanks, it was nice to see a more comprehensive list in one spot.

It would be neat to have a tool that presents all of these
automatically, but I think the email ones are pretty tricky (most people
don't have the whole list archive sitting around).

> 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> etc.).
> 
>     Using git repo:
> 
>     $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20

At one point I sent a patch series that would let shortlog group by
trailers. Nobody seemed all that interested and I didn't end up using it
for its original purpose, so I didn't polish it further.  But I'd be
happy to re-submit it if you think it would be useful.

The shell hackery here isn't too bad, but doing it internally is a
little faster, a little more robust (less parsing), and lets you show
more details about the commits themselves (e.g., who reviews whom).
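For comparison, the "shell hackery" variant is roughly the following; the sample input stands in for the output of a hypothetical `git log --format='%(trailers:key=Reviewed-by,valueonly)'` run:

```shell
# Sample trailer values, standing in for real `git log` output
# (one value per commit; blank when a commit had no such trailer).
sample='Jeff King
Junio C Hamano
Jeff King

Jeff King'

# Count and rank them; the awk step squeezes the padding uniq -c adds.
counts=$(printf '%s\n' "$sample" | grep . | sort | uniq -c | sort -rn |
	awk '{$1=$1; print}')
printf '%s\n' "$counts"
```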

> 3. Number of threads started by user.

You have "started" and "participated in". I guess one more would be
"closed", as in "solved a bug", but that is quite hard to tell without
looking at the content. Taking just the last person in a thread as the
closer means that an OP saying "thanks!" wrecks it. And somebody who
rants long enough that everybody else loses interest gets marked as a
closer. ;)

> If you have other ideas for fun measurements, then please let me know.

I think I mentioned "surviving lines" elsewhere, which I do like this
(and almost certainly stole from Junio a long time ago):

  # Obviously you can tweak this as you like, but the mass-imported bits
  # in compat and xdiff tend to skew the counts. It's possibly worth
  # counting language lines separately.
  git ls-files '*.c' '*.h' :^compat :^contrib :^xdiff |
  while read fn; do
# eye candy
echo >&2 "Blaming $fn..."

# You can use more/fewer -C to dig more or less for code moves.
# Possibly "-w" would help, though I doubt it shifts things more
# than a few percent anyway.
git blame -C --line-porcelain $fn
  done |
  perl -lne '/^author (.*)/ and print $1' |
  sort | uniq -c | sort -rn | head

The output right now is:

  35156 Junio C Hamano
  22207 Jeff King
  17466 Nguyễn Thái Ngọc Duy
  12005 Johannes Schindelin
  10259 Michael Haggerty
   9389 Linus Torvalds
   8318 Brandon Williams
   7776 Stefan Beller
   5947 Christian Couder
   4935 René Scharfe

which seems reasonable.

-Peff


Re: [PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread René Scharfe
Am 14.08.2018 um 20:20 schrieb Jeff King:
> When we're in batch mode, we end up in batch_object_write()
> for each object, which allocates its own strbuf for each
> call. Instead, we can provide a single "scratch" buffer that
> gets reused for each output. When running:
> 
>   git cat-file --batch-all-objects --batch-check='%(objectname)'
> 
> on git.git, my best-of-five time drops from:
> 
>   real0m0.171s
>   user0m0.159s
>   sys 0m0.012s
> 
> to:
> 
>   real0m0.133s
>   user0m0.121s
>   sys 0m0.012s
> 
> Note that we could do this just by putting the "scratch"
> pointer into "struct expand_data", but I chose instead to
> add an extra parameter to the callstack. That's more
> verbose, but it makes it a bit more obvious what is going
> on, which in turn makes it easy to see where we need to be
> releasing the string in the caller (right after the loop
> which uses it in each case).
> 
> Based-on-a-patch-by: René Scharfe 
> Signed-off-by: Jeff King 
> ---
> It also made it easy to see that without the prior patch,
> we'd have been using "buf" for two parameters. :)

Good catch.

> 
>  builtin/cat-file.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 3ed1d0be80..08dced2614 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options 
> *opt, struct expand_data *d
>   }
>  }
>  
> -static void batch_object_write(const char *obj_name, struct batch_options 
> *opt,
> +static void batch_object_write(const char *obj_name,
> +struct strbuf *scratch,
> +struct batch_options *opt,
>  struct expand_data *data)
>  {
> - struct strbuf buf = STRBUF_INIT;

We could also avoid passing that buffer around by making it static.  I
shy away from adding static variables because the resulting code won't
be thread-safe, but that fear might be irrational, especially with
cat-file.

> -
>   if (!data->skip_object_info &&
>   oid_object_info_extended(the_repository, &data->oid, &data->info,
>OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> @@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, 
> struct batch_options *opt,
>   return;
>   }
>  
> - strbuf_expand(&buf, opt->format, expand_format, data);
> - strbuf_addch(&buf, '\n');
> - batch_write(opt, buf.buf, buf.len);
> - strbuf_release(&buf);
> + strbuf_reset(scratch);
> + strbuf_expand(scratch, opt->format, expand_format, data);
> + strbuf_addch(scratch, '\n');
> + batch_write(opt, scratch->buf, scratch->len);
>  
>   if (opt->print_contents) {
>   print_object_or_die(opt, data);
> @@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, 
> struct batch_options *opt,
>   }
>  }
>  
> -static void batch_one_object(const char *obj_name, struct batch_options *opt,
> +static void batch_one_object(const char *obj_name,
> +  struct strbuf *scratch,
> +  struct batch_options *opt,
>struct expand_data *data)
>  {
>   struct object_context ctx;
> @@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, 
> struct batch_options *opt,
>   return;
>   }
>  
> - batch_object_write(obj_name, opt, data);
> + batch_object_write(obj_name, scratch, opt, data);
>  }
>  
>  struct object_cb_data {
>   struct batch_options *opt;
>   struct expand_data *expand;
>   struct oidset *seen;
> + struct strbuf *scratch;
>  };
>  
>  static int batch_object_cb(const struct object_id *oid, void *vdata)
>  {
>   struct object_cb_data *data = vdata;
>   oidcpy(&data->expand->oid, oid);
> - batch_object_write(NULL, data->opt, data->expand);
> + batch_object_write(NULL, data->scratch, data->opt, data->expand);
>   return 0;
>  }
>  
> @@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt)
>  
>   cb.opt = opt;
>   cb.expand = &data;
> + cb.scratch = &scratch;
>  
>   if (opt->unordered) {
>   struct oidset seen = OIDSET_INIT;
> @@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt)
>   oid_array_clear(&sa);
>   }
>  
> + strbuf_release(&scratch);
>   return 0;
>   }
>  
> @@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt)
>   data.rest = p;
>   }
>  
> - batch_one_object(input.buf, opt, &data);
> + batch_one_object(input.buf, &scratch, opt, &data);
>   }
>  
>   strbuf_release(&input);
> + strbuf_release(&scratch);
>   warn_on_object_refname_ambiguity = save_warning;
>   return retval;
>  }
> 


Re: [PATCH v4 0/5] Speed up unpack_trees()

2018-08-14 Thread Ben Peart




On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:

v4 has a bunch of changes

- 1/5 is a new one to show indented tracing. This way it's less
   misleading to read nested time measurements
- 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
   check instead in his final note, but I think this is safer (yeah I'm
   scared too)
- the old 4/4 is dropped because
   - it assumes n-way logic
   - the visible time saving is not worth the tradeoff
   - Elijah gave me an idea to avoid add_index_entry() that I think
 does not have n-way logic assumptions and gives better saving.
 But it requires some more changes so I'm going to do it later
- 5/5 is also new and should help reduce cache_tree_update() cost.
   I wrote somewhere I was not going to work on this part, but it turns
   out just a couple lines, might as well do it now.

Interdiff



I've now had a chance to run the git tests, as well as our own unit and 
functional tests with this patch series and all passed.


I reviewed the tests in t0090-cache-tree.sh and verified that there are 
tests that validate the cache tree is correct after doing a checkout and 
merge (both of which exercise the new cache tree optimization in patch 5).


I've also run our perf test suite and the results are outstanding:

Checkout saves 51% on average
Merge saves 44%
Pull saves 30%
Rebase saves 26%

For perspective, that means these commands are going from ~20 seconds to 
~10 seconds.


I don't feel that any of my comments to the individual patches deserve a 
re-roll.  Given the ongoing discussion about the additional tracing - 
I'm happy to leave out the first 2 patches so that the rest can go in 
sooner rather than later.


Looks good!


diff --git a/cache-tree.c b/cache-tree.c
index 0dbe10fc85..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it,
  
  int cache_tree_update(struct index_state *istate, int flags)
  {
-   uint64_t start = getnanotime();
struct cache_tree *it = istate->cache_tree;
struct cache_entry **cache = istate->cache;
int entries = istate->cache_nr;
@@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int 
flags)
  
  	if (i)
  		return i;
+   trace_performance_enter();
i = update_one(it, cache, entries, "", 0, &skip, flags);
+   trace_performance_leave("cache_tree_update");
if (i < 0)
return i;
istate->cache_changed |= CACHE_TREE_CHANGED;
-   trace_performance_since(start, "repair cache-tree");
return 0;
  }
  
diff --git a/cache.h b/cache.h
index e6f7ee4b64..8b447652a7 100644
--- a/cache.h
+++ b/cache.h
@@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const 
char *name, int name
  #define ADD_CACHE_JUST_APPEND 8   /* Append only; 
tree.c::read_tree() */
  #define ADD_CACHE_NEW_ONLY 16 /* Do not replace existing ones */
  #define ADD_CACHE_KEEP_CACHE_TREE 32  /* Do not invalidate cache-tree */
-#define ADD_CACHE_SKIP_VERIFY_PATH 64  /* Do not verify path */
  extern int add_index_entry(struct index_state *, struct cache_entry *ce, int 
option);
  extern void rename_index_entry_at(struct index_state *, int pos, const char 
*new_name);
  
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
  int run_diff_index(struct rev_info *revs, int cached)
  {
struct object_array_entry *ent;
-   uint64_t start = getnanotime();
  
+	trace_performance_enter();

ent = revs->pending.objects;
if (diff_cache(revs, &ent->item->oid, ent->name, cached))
exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
diffcore_fix_diff_index(>diffopt);
diffcore_std(>diffopt);
diff_flush(>diffopt);
-   trace_performance_since(start, "diff-index");
+   trace_performance_leave("diff-index");
return 0;
  }
  
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
   const char *path, int len, const struct pathspec *pathspec)
  {
struct untracked_cache_dir *untracked;
-   uint64_t start = getnanotime();
  
  	if (has_symlink_leading_path(path, len))
  		return dir->nr;
  
+	trace_performance_enter();

untracked = validate_untracked_cache(dir, len, pathspec);
if (!untracked)
/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
dir->nr = i;
}
  
-	trace_performance_since(start, "read directory %.*s", len, path);

+   trace_performance_leave("read directory %.*s", len, path);
if (dir->untracked) {
static int 

Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 08:29:04PM +0200, Johannes Sixt wrote:

> Am 14.08.2018 um 00:37 schrieb Jeff King:
> > And then you can do something like:
> > 
> >for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
> >  >out ;# clean up from last run
> >  echo "Trying $size..."
> >  timeout 5 ./write $size out
>  if ! ./check $size <out; then
>echo "$size failed"
> >break
> >  fi
> >done
> > 
> > On my Linux system, each of those seems to write several gigabytes
> > without overlapping. I did manage to hit some failing cases, but they
> > were never sheared writes, but rather cases where there was an
> > incomplete write at the end-of-file.
> 
> I used your programs with necessary adjustments (as fork() is not
> available), and did similar tests with concurrent processes. With packet
> sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did not
> observe any overlapping or short writes.
> 
> I'm now very confident that we are on the safe side for our purposes.

Great, thanks for testing!

Re-reading what I wrote about end-of-file above and thinking about the
conversation with Ævar elsewhere in the thread, I suspect it _is_ easy
to get overlapping writes if the processes are receiving signals (since
clearly the TERM signal caused a partial write).

My experiment doesn't simulate that at all. I suppose the parent process
could send SIGUSR1 to the child in each loop, and the child would catch
it but keep going.

Hmm, that was easy enough to do (programs below for reference), but
surprisingly it didn't fail for me (except for the normal end-of-file
truncation). It's like the OS is willing to truncate the write of a
dying program but not one for a signal that is getting handled. Which is
great for us, since it's exactly what we want, but makes me even more
suspicious that a non-Linux kernel might behave completely differently.

I still think we're fine in practice, as I'd expect any kernel to be
atomic under the page size. So this was mostly just for my own
edification.

-Peff

-- >8 --
/* check.c, with separate short-read reporting */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, const char **argv)
{
int size = atoi(argv[1]);
int block = 0;
char *buf;

buf = malloc(size);
while (1) {
int i;
/* assume atomic reads */
int r = read(0, buf, size);
if (!r)
break;
if (r < size) {
fprintf(stderr, "short read\n");
return 1;
}
for (i = 1; i < size; i++) {
if (buf[i] != buf[0]) {
fprintf(stderr, "overlap in block %d\n", block);
return 1;
}
}
block++;
}
}
-- >8 --

-- >8 --
/* write.c with signals; you can also confirm via strace
   that each write is atomic */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>

void handle_signal(int sig)
{
/* do nothing */
}

static void doit(int size, const char *fn, char c, pid_t pid)
{
int fd;
char *buf;

fd = open(fn, O_WRONLY|O_APPEND|O_CREAT, 0666);
if (fd < 0) {
perror("open");
return;
}

buf = malloc(size);
memset(buf, c, size);

while (1) {
if (pid)
kill(pid, SIGUSR1);
write(fd, buf, size);
}
}

int main(int argc, const char **argv)
{
int size = atoi(argv[1]);
pid_t pid;

signal(SIGUSR1, handle_signal);

pid = fork();
if (pid)
doit(size, argv[2], '1', pid);
else
doit(size, argv[2], '2', pid);
return 0;
}
-- >8 --


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> The tests added in 63e95beb08 ("submodule: port resolve_relative_url
> from shell to C", 2016-04-15) didn't do a good job of testing various
> up-path invocations where the up-path would bring us beyond even the
> URL in question without emitting an error.
>
> These results look nonsensical, but it's worth exhaustively testing
> them before fixing any of this code, so we can see which of these
> cases were changed.

Yeah. Please look at the comment in builtin/submodule--helper.c
in that commit, where I described my expectations.

I should have put them into tests instead with the expectations
spelled out there.

Thanks for this patch!
Stefan

>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---


> So I think these tests are worthwhile in themselves,

The reason I put it in the comment instead of tests was the
ease of spelling out both the status quo and expectations.

> but would like
> some advice on how to proceed with that from someone more familiar
> with submodules.

So ideally we'd also error out as soon as the host name is touched?


Re: [PATCH 4/4] range-diff: indent special lines as context

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:54 AM Johannes Schindelin
 wrote:
>
> Hi Stefan,
>
> On Mon, 13 Aug 2018, Stefan Beller wrote:
>
> > > > The later lines that indicate a change to the Makefile will be treated 
> > > > as
> > > > context both in the outer and inner diff, such that those lines stay
> > > > regular color.
> > >
> > > While I am a fan of having those lines colored correctly, I have to admit
> > > that I am not exactly enthusiastic about that extra indentation...
> > >
> > > Otherwise, this looks good to me.
> >
> > Can you explain what makes you less enthused about the indentation?
> >
> > Advantage:
> > * allows easy coloring (easy implementation)
> > Disadvantage:
> > * formats change,
>
> This is it. It breaks my visual flow.
>
> > but the range diff is still in its early design phase, so we're not
> > breaking things, yet?
>
> Indeed. We're not breaking things. If you feel strongly about it, we can
> have that indentation, I *can* get used to it.

I only feel strongly about it now as that is the *easiest* way to make
the colors
look like I want them to look. And I really value colors in the range-diff.
Earlier you said that color-less range-diff is nearly useless for you and I
thought it was hyperbole, but by now I realize how much truth you spoke.
So getting the colors fixed to not mark up files (the +++/--- lines of the
inner diff) is a high priority for me. So high that I would compromise on
the indentation/flow of these corner-case areas.

> >   (Do we ever plan on sending range-diff patches that can be applied to
> >   rewrite history? I am very uncertain on such a feature request.  It
> >   sounds cool, though)
>
> I remember that I heard you discussing this internally. I am not too big a
> fan of this idea, I have to admit. The range diff seems more designed to
> explain how a patch series evolved, rather than providing machine-readable
> data that allows to recreate said evolution. For example, the committer
> information as well as the date are missing, which would preclude a
> faithful reconstruction.
>

Ah! good point. Though we could just work around that and use the email
date for the new author dates. ;-)

> And that is not all: if you wanted to "apply" a range diff, you would need
> to know more about the base(s) of the two commit ranges. You would need to
> know that they are at least very similar to the base onto which you want
> to apply this.

You would say so in the cover letter "This is a resend of sb/range-diff-colors"
and by the knowledge of that tip only and the range-diff you would
know what the new series would look like, even if it was rebased.

> And quite seriously, this would be the wrong way to go in my mind. We have
> a very efficient data format to transport all of that information: the Git
> bundle.

The bundle format is very efficient for machine transport, but I thought the
whole point of the mailing list was easy human readable parts, i.e. you can
point out things in a diff, which you could also do in a range-diff to some
extend. We would loose some of the "fresh eyes" as you'd only see the
changed part of the series. :-/ So yeah even for the workflow this seems
a net-negative. I just thought it would be cool.

> Let's not overload the range diff format with multiple, partially
> contradicting purposes. Think "separation of concerns". It's the same
> issue, really, as trying to send highly structured data such as bug
> reports or code contributions via a medium meant to send unstructured
> plain or formatted text back and forth between human beings.

ok.

Thanks,
Stefan


[PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Ævar Arnfjörð Bjarmason
The tests added in 63e95beb08 ("submodule: port resolve_relative_url
from shell to C", 2016-04-15) didn't do a good job of testing various
up-path invocations where the up-path would bring us beyond even the
URL in question without emitting an error.

These results look nonsensical, but it's worth exhaustively testing
them before fixing any of this code, so we can see which of these
cases were changed.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

The reason I started to poke at this was because I wanted to add
support for relative URLs like //group/project.

This solves the problem of there being e.g. a myhost.com/group/project
which includes a ../submodule submodule pointing to
myhost.com/group/submodule , and me wanting to fork it to
myhost.com/myuser/project.

But of course with ../submodule the relative URL will point to
myhost.com/myuser/submodule, which won't exist.

So I was hoping to find some point in the relative URL code where we
were actually aware of what the host boundary was, so I could just cut
off everything after that if //group/project was provided, strip off
one slash to make it /group/project, and call it a day.

But as the tests below show the code isn't aware of that at all, and
will happily trample over the host(name) to produce nonsensical
results.

So I think these tests are worthwhile in themselves, but would like
some advice on how to proceed with that from someone more familiar
with submodules.

 t/t0060-path-utils.sh | 13 +
 1 file changed, 13 insertions(+)

diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 21a8b53132..cd74c0a471 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -330,6 +330,9 @@ test_submodule_relative_url "(null)" "../foo" "../submodule" "../submodule"
 test_submodule_relative_url "(null)" "./foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "./foo" "../submodule" "submodule"
 test_submodule_relative_url "(null)" "//somewhere else/repo" "../subrepo" "//somewhere else/subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../subrepo" "//subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../subrepo" "/subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../../subrepo" "subrepo"
 test_submodule_relative_url "(null)" "$(pwd)/subsuper_update_r" "../subsubsuper_update_r" "$(pwd)/subsubsuper_update_r"
 test_submodule_relative_url "(null)" "$(pwd)/super_update_r2" "../subsuper_update_r" "$(pwd)/subsuper_update_r"
 test_submodule_relative_url "(null)" "$(pwd)/." "../." "$(pwd)/."
@@ -344,10 +347,20 @@ test_submodule_relative_url "(null)" "file:///tmp/repo" "../subrepo" "file:///tm
 test_submodule_relative_url "(null)" "foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "foo" "../submodule" "submodule"
 test_submodule_relative_url "(null)" "helper:://hostname/repo" "../subrepo" "helper:://hostname/subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../subrepo" "helper:://subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../subrepo" "helper::/subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../subrepo" "helper::subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../subrepo" "helper:subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../../subrepo" ".:subrepo"
 test_submodule_relative_url "(null)" "ssh://hostname/repo" "../subrepo" "ssh://hostname/subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../subrepo" "ssh://subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../subrepo" "ssh:/subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../subrepo" "ssh:subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../../subrepo" ".:subrepo"
 test_submodule_relative_url "(null)" "ssh://hostname:22/repo" "../subrepo" "ssh://hostname:22/subrepo"
 test_submodule_relative_url "(null)" "user@host:path/to/repo" "../subrepo" "user@host:path/to/subrepo"
 test_submodule_relative_url "(null)" "user@host:repo" "../subrepo" "user@host:subrepo"
+test_submodule_relative_url "(null)" "user@host:repo" "../../subrepo" ".:subrepo"
 
 test_expect_success 'match .gitmodules' '
test-tool path-utils is_dotgitmodules \
-- 
2.18.0.865.gffc8e1a3cd6



Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 11:04:06AM -0700, Brandon Williams wrote:

> > I think this backwards-compatibility is necessary to avoid pain. But
> > until it goes away, I don't think this is helping the vulnerability from
> > 0383bbb901. Because there the issue was that the submodule name pointed
> > back into the working tree, so this access() would find the untrusted
> > working tree code and say "ah, an old-fashioned name!".
> [...]
> 
> Oh I know that this doesn't help with that vulnerability.  As you've
> said we fix it and now disallow ".." at the submodule-config level so
> really this path is simply about using what we get out of
> submodule-config in a saner manner.

OK, I'm alright with that as long as we are all on the same page. I
think I mistook "this addresses the vulnerability" from your commit
message the wrong way. I took it as "this patch", but reading it again,
you simply mean "the '..' handling we already did".

I do think eventually dropping this back-compatibility could save us
from another directory-escape problem, but it's hard to justify the
real-world pain for a hypothetical benefit. Maybe in a few years we
could get rid of it in a major version bump.

> > One interesting thing about url-encoding is that it's not one-to-one.
> > This case could also be %2F, which is a different file (on a
> > case-sensitive filesystem). I think "%20" and "+" are similarly
> > interchangeable.
> > 
> > If we were decoding the filenames, that's fine. The round-trip is
> > lossless.
> > 
> > But that's not quite how the new code behaves. We encode the input and
> > then check to see if it matches an encoding we previously performed. So
> > if our urlencode routines ever change, this will subtly break.
> > 
> > I don't know how much it's worth caring about. We're not that likely to
> > change the routines ourself (though certainly a third-party
> > implementation would need to know our exact url-encoding decisions).
> 
> This is exactly the reason why I wanted to get some opinions on what the
> best thing to do here would be.  I _think_ the best thing would probably
> be to write a specific routine to do the conversion, and it wouldn't
> even have to be all that complex.  Basically I'm just interested in
> converting '/' characters so that things no longer behave like
> nested directories.

I think we benefit from catching names that would trigger filesystem
case-folding, too. If I have submodules with names "foo" and "FOO", we
would not want to confuse them (or at least we should confuse them
equally on all platforms). I doubt you can do anything malicious, but it
might simply be annoying.

That implies to me using a custom function (even if its encoded form
ends up being understandable as url-encoding).

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:
> On 08/09, Jeff King wrote:

>> One interesting thing about url-encoding is that it's not one-to-one.
>> This case could also be %2F, which is a different file (on a
>> case-sensitive filesystem). I think "%20" and "+" are similarly
>> interchangeable.
>>
>> If we were decoding the filenames, that's fine. The round-trip is
>> lossless.
>>
>> But that's not quite how the new code behaves. We encode the input and
>> then check to see if it matches an encoding we previously performed. So
>> if our urlencode routines ever change, this will subtly break.
>>
>> I don't know how much it's worth caring about. We're not that likely to
>> change the routines ourself (though certainly a third-party
>> implementation would need to know our exact url-encoding decisions).
>
> This is exactly the reason why I wanted to get some opinions on what the
> best thing to do here would be.  I _think_ the best thing would probably
> be to write a specific routine to do the conversion, and it wouldn't
> even have to be all that complex.  Basically I'm just interested in
> converting '/' characters so that things no longer behave like
> nested directories.

First of all, I think the behavior with this patch is already much
better than the previous status quo.  I'm using the patch now and am
very happy with it.

Second, what if we store the pathname in config?  We already store the
URL there:

[submodule "plugins/hooks"]
url = https://gerrit.googlesource.com/plugins/hooks

So we could (as a followup patch) do something like

[submodule "plugins/hooks"]
url = https://gerrit.googlesource.com/plugins/hooks
gitdirname = plugins%2fhooks

and use that for lookups instead of regenerating the directory name.
What do you think?

Thanks,
Jonathan


Re: [PATCH 4/4] range-diff: indent special lines as context

2018-08-14 Thread Johannes Schindelin
Hi Stefan,

On Mon, 13 Aug 2018, Stefan Beller wrote:

> > > The later lines that indicate a change to the Makefile will be treated as
> > > context both in the outer and inner diff, such that those lines stay
> > > regular color.
> >
> > While I am a fan of having those lines colored correctly, I have to admit
> > that I am not exactly enthusiastic about that extra indentation...
> >
> > Otherwise, this looks good to me.
> 
> Can you explain what makes you less enthused about the indentation?
> 
> Advantage:
> * allows easy coloring (easy implementation)
> Disadvantage:
> * formats change,

This is it. It breaks my visual flow.

> but the range diff is still in its early design phase, so we're not
> breaking things, yet?

Indeed. We're not breaking things. If you feel strongly about it, we can
have that indentation, I *can* get used to it.

>   (Do we ever plan on sending range-diff patches that can be applied to
>   rewrite history? I am very uncertain on such a feature request.  It
>   sounds cool, though)

I remember that I heard you discussing this internally. I am not too big a
fan of this idea, I have to admit. The range diff seems more designed to
explain how a patch series evolved, rather than providing machine-readable
data that allows to recreate said evolution. For example, the committer
information as well as the date are missing, which would preclude a
faithful reconstruction.

And that is not all: if you wanted to "apply" a range diff, you would need
to know more about the base(s) of the two commit ranges. You would need to
know that they are at least very similar to the base onto which you want
to apply this.

And quite seriously, this would be the wrong way to go in my mind. We have
a very efficient data format to transport all of that information: the Git
bundle.

Let's not overload the range diff format with multiple, partially
contradicting purposes. Think "separation of concerns". It's the same
issue, really, as trying to send highly structured data such as bug
reports or code contributions via a medium meant to send unstructured
plain or formatted text back and forth between human beings.

Ciao,
Dscho


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Duy Nguyen
On Tue, Aug 14, 2018 at 8:44 PM Stefan Beller  wrote:
>
> On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:
> >
> > On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  
> > wrote:
> > > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > > patch series to eventually replace the existing git_trace facility.
> >
> > Complement maybe. Replace, please no. I'd rather not stare at json messages.
>
> From the sidelines: We'd only need one logging infrastructure in place, as the
> formatting would be done as a later step? For local operations we'd certainly
> find better formatting than json, and we figured that we might end up desiring
> ProtocolBuffers[1] instead of JSon, so if it would be easy to change
> the output of
> the structured logging easily that would be great.

These trace messages are made for human consumption. Granted,
occasionally we need some processing, but I find one-liners mostly
suffice. Now we turn these into something made for machines, turning
people into second-class citizens. I've read these messages reformatted
for humans; it's usually too verbose even after reformatting.

> But AFAICT these series are all about putting the sampling points into the
> code base, so formatting would be orthogonal to it?

It's not just sampling points. There are things like the index id being
shown in the message, for example. I prefer to keep a free-style format
to help me read. There are also things like indentation I do here to
help me read. Granted, you could do all that with scripts and stuff,
but will we pass around dumps of JSON messages in mail, to be decoded
locally?

> Stefan
>
> [1] https://developers.google.com/protocol-buffers/



-- 
Duy


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Eric Wong
Ævar Arnfjörð Bjarmason  wrote:
> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
> configuration variable. When set to e.g. 2, this causes send-email to
> sleep 2 seconds before sending the next E-Mail. We'll only sleep
> between sends, not before the first send, or after the last.
> 
> This option has two uses. Firstly, to be able to Ctrl+C a long send
> with "all" if you have a change of heart. Secondly, as a hack in some
> mail setups to, with a sufficiently high delay, force the receiving
> client to sort the E-Mails correctly.
> 
> Some popular E-Mail clients completely ignore the "Date" header, which
> format-patch is careful to set such that the patches will be displayed
> in order, and instead sort by the time the E-mail was received.
> 
> Google's GMail is a good example of such a client. It ostensibly sorts
> by some approximation of received time (although not by any "Received"
> header). It's more usual than not to see patches showing out of order
> in GMail. To take a few examples of orders seen on patches on the Git
> mailing list:
> 
> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
> 3 -> 2 -> 1 (fast-import by Jameson Miller)
> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
> 
> The reason to add the new "X-Mailer-Send-Delay" header is to make it
> easy to tell what the imposed delay was, if any. This allows for
> gathering some data on how the transfer of E-Mails with & without this
> option behaves. This may not be workable without really long delays,
> see [1] and [2].

Aside from the new header, I think this is better implemented
using the existing $relogin_delay and $batch_size=1.

Disconnecting during the delay might be more sympathetic to
existing mail servers (which aren't C10K-optimized).  If the
client sleeps, the server may disconnect the client anyways
to save resources.

> @@ -1741,6 +1747,10 @@ sub process_file {
>   $message, $xfer_encoding, $target_xfer_encoding);
>   push @xh, "Content-Transfer-Encoding: $xfer_encoding";
>   unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
> + if ($send_delay && $i > 0) {
> + push @xh, "X-Mailer-Send-Delay: $send_delay";
> + sleep $send_delay;
> + }

We can add this header for relogin_delay + batch_size

But maybe --send-delay can be a shortcut for
--relogin-delay and --batch-size=1


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:
>
> On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:
> > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > patch series to eventually replace the existing git_trace facility.
>
> Complement maybe. Replace, please no. I'd rather not stare at json messages.

From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than json, and we figured that we might end up desiring
ProtocolBuffers[1] instead of JSon, so if it would be easy to change
the output of
the structured logging easily that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:15 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
> configuration variable. When set to e.g. 2, this causes send-email to
> sleep 2 seconds before sending the next E-Mail. We'll only sleep
> between sends, not before the first send, or after the last.
>
> This option has two uses. Firstly, to be able to Ctrl+C a long send
> with "all" if you have a change of heart. Secondly, as a hack in some
> mail setups to, with a sufficiently high delay, force the receiving
> client to sort the E-Mails correctly.
>
> Some popular E-Mail clients completely ignore the "Date" header, which
> format-patch is careful to set such that the patches will be displayed
> in order, and instead sort by the time the E-mail was received.
>
> Google's GMail is a good example of such a client. It ostensibly sorts
> by some approximation of received time (although not by any "Received"
> header). It's more usual than not to see patches showing out of order
> in GMail. To take a few examples of orders seen on patches on the Git
> mailing list:
>
> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
> 3 -> 2 -> 1 (fast-import by Jameson Miller)
> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
>
> The reason to add the new "X-Mailer-Send-Delay" header is to make it
> easy to tell what the imposed delay was, if any. This allows for
> gathering some data on how the transfer of E-Mails with & without this
> option behaves. This may not be workable without really long delays,
> see [1] and [2].
>
> The reason for why the getopt format is "send-delay=s" instead of
> "send-delay=d" is because we're doing manual validation of the value
> we get passed, which getopt would corrupt in cases of e.g. float
> values before we could show a sensible error message.
>
> 1. 
> https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/
> 2. https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>
> I submitted this back in March hoping it would solve mail ordering
> problems, but the other motive I had for this is that I'm paranoid
> that I'm sending out bad E-Mails, and tend to "y" to each one because
> "a" is too fast.'

Heh. GMail seems to have added an Undo button in their UI, which
would be the same feature as this one. (Hit Ctrl+C in time to "undo"
the sending command)

I have been bitten quite a few times by using "a" as I had old
series still laying around, such that it would send a new series and parts
of the old series (or when you changed subjects and resend another
iteration of a series, you may end up with two "patch 1"s).
So I learned to be careful before pressing "a" on sending.

Maybe the underlying issue is that you really only want to send a series
and not "all" as send-email asks for.
So maybe that dialog could learn a [s]eries switch, which would
check either filenames to count up, or if the base that is recorded
(base-commit for first and prerequisite-patch-id for followups)
is consistent.

Send this email? ([y]es|[n]o|[e]dit|[q]uit|[a]ll|[s]eries):

Another note:
I personally never use no/quit, but Ctrl+C for both cases.

This is also different than the feature of 5453b83bdf9 (send-email
--batch-size to work around some SMTP server limit, 2017-05-21)
which introduced sendemail.smtpReloginDelay, which would offer the
same functionality when the batch-size is set to 1. (Although this would
keep you connected to the server as well as add the X-Mailer-Send-Delay
header, which is nothing from the official email RFC, but your own invention?)

Having sorted mails in GMail would be nice!

Thanks,
Stefan


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Duy Nguyen
On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:
> I'm looking at adding code to my SLOG (better name suggestions welcome)
> patch series to eventually replace the existing git_trace facility.

Complement maybe. Replace, please no. I'd rather not stare at json messages.
-- 
Duy


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Johannes Sixt

Am 14.08.2018 um 00:37 schrieb Jeff King:

And then you can do something like:

   for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
 >out ;# clean up from last run
 echo "Trying $size..."
 timeout 5 ./write $size out
 if ! ./check $size <out; then
   echo "$size failed"
   break
 fi
   done

I used your programs with necessary adjustments (as fork() is not 
available), and did similar tests with concurrent processes. With packet 
sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did 
not observe any overlapping or short writes.


I'm now very confident that we are on the safe side for our purposes.

-- Hannes


Re: [PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet

2018-08-14 Thread Jonathan Nieder
Stefan Beller wrote:

>  git-submodule.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b5ad59bdee..f7fd80345cd 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -438,6 +438,9 @@ cmd_update()
>   -q|--quiet)
>   GIT_QUIET=1
>   ;;
> + -v)
> + GIT_QUIET=0
> + ;;

Reviewed-by: Jonathan Nieder 

Thanks.


[PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet

2018-08-14 Thread Stefan Beller
In a56771a668d (builtin/pull: respect verbosity settings in submodules,
2018-01-25), we made sure to pass on both quiet and verbose flag from
builtin/pull.c to the submodule shell script. However git-submodule doesn't
understand a verbose flag, which results in a bug when invoking

  git pull --recurse-submodules -v [...]

There are a few different approaches to fix this bug:

1) rewrite 'argv_push_verbosity' or its caller in builtin/pull.c to
   cap opt_verbosity at 0. Then 'argv_push_verbosity' would only add
   '-q' if any.

2) Have a flag in 'argv_push_verbosity' that specifies if we allow adding
  -q or -v (or both).

3) Add -v to git-submodule.sh and make it a no-op

(1) seems like a maintenance burden: What if we add code after
the submodule operations or move submodule operations higher up,
then we have altered the opt_verbosity setting further down the line
in builtin/pull.c.

(2) seems like it could work reasonably well without more regressions

(3) seems easiest to implement as well as actually is a feature with the
last-one-wins rule of passing flags to Git commands.

Signed-off-by: Stefan Beller 
---

On Tue, Aug 14, 2018 at 10:54 AM Jochen Kühner  wrote:
>
> If I set 
> git config --global submodule.recurse true
> and run git via:
> git pull --progress -v --no-rebase "origin"
> The command will fail with following output (Errorlevel is 1)
> Fetching submodule submodules/tstemplates
> From http://10.0.102.194:7990/scm/mcc/tstemplates
> = [up to date]  feature/robin -> origin/feature/robin
> = [up to date]  master-> origin/master
> Already up to date.
> usage: git submodule [--quiet] add [-b <branch>] [-f|--force] [--name <name>] [--reference <repository>] [--] <repository> [<path>]
>or: git submodule [--quiet] status [--cached] [--recursive] [--] [<path>...]
>or: git submodule [--quiet] init [--] [<path>...]
>or: git submodule [--quiet] deinit [-f|--force] (--all| [--] <path>...)
>or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
>or: git submodule [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
>or: git submodule [--quiet] foreach [--recursive] <command>
>or: git submodule [--quiet] sync [--recursive] [--] [<path>...]
>or: git submodule [--quiet] absorbgitdirs [--] [<path>...]
>
> Seems that the “verbose” parameter “-v” is also sent to “git submodule”, which
> does not support it.
>
> If I remove “-v” it will work.
>
> Problem is, I use TortoiseGit, which will automatically create this command!

 git-submodule.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git-submodule.sh b/git-submodule.sh
index 8b5ad59bdee..f7fd80345cd 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -438,6 +438,9 @@ cmd_update()
-q|--quiet)
GIT_QUIET=1
;;
+   -v)
+   GIT_QUIET=0
+   ;;
--progress)
progress=1
;;
-- 
2.18.0.265.g16de1b435c9.dirty



[PATCH 4/4] for_each_*_object: move declarations to object-store.h

2018-08-14 Thread Jeff King
The for_each_loose_object() and for_each_packed_object()
functions are meant to be part of a unified interface: they
use the same set of for_each_object_flags, and it's not
inconceivable that we might one day add a single
for_each_object() wrapper around them.

Let's put them together in a single file, so we can avoid
awkwardness like saying "the flags for this function are
over in cache.h". Moving the loose functions to packfile.h
is silly. Moving the packed functions to cache.h works, but
makes the "cache.h is a kitchen sink" problem worse. The
best place is the recently-created object-store.h, since
these are quite obviously related to object storage.

The for_each_*_in_objdir() functions do not use the same
flags, but they are logically part of the same interface as
for_each_loose_object(), and share callback signatures. So
we'll move those, as well, as they also make sense in
object-store.h.

Signed-off-by: Jeff King 
---
This patch also happens to be a nice showcase for --color-moved.

 builtin/prune-packed.c |  1 +
 cache.h| 75 ---
 object-store.h | 90 ++
 packfile.h | 20 --
 4 files changed, 91 insertions(+), 95 deletions(-)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index 4ff525e50f..a9e7b552b9 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -3,6 +3,7 @@
 #include "progress.h"
 #include "parse-options.h"
 #include "packfile.h"
+#include "object-store.h"
 
 static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
diff --git a/cache.h b/cache.h
index 6d14702df2..aee36afa54 100644
--- a/cache.h
+++ b/cache.h
@@ -1575,81 +1575,6 @@ extern int odb_mkstemp(struct strbuf *temp_filename, const char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * Iterate over the files in the loose-object parts of the object
- * directory "path", triggering the following callbacks:
- *
- *  - loose_object is called for each loose object we find.
- *
- *  - loose_cruft is called for any files that do not appear to be
- *loose objects. Note that we only look in the loose object
- *directories "objects/[0-9a-f]{2}/", so we will not report
- *"objects/foobar" as cruft.
- *
- *  - loose_subdir is called for each top-level hashed subdirectory
- *of the object directory (e.g., "$OBJDIR/f0"). It is called
- *after the objects in the directory are processed.
- *
- * Any callback that is NULL will be ignored. Callbacks returning non-zero
- * will end the iteration.
- *
- * In the "buf" variant, "path" is a strbuf which will also be used as a
- * scratch buffer, but restored to its original contents before
- * the function returns.
- */
-typedef int each_loose_object_fn(const struct object_id *oid,
-const char *path,
-void *data);
-typedef int each_loose_cruft_fn(const char *basename,
-   const char *path,
-   void *data);
-typedef int each_loose_subdir_fn(unsigned int nr,
-const char *path,
-void *data);
-int for_each_file_in_obj_subdir(unsigned int subdir_nr,
-   struct strbuf *path,
-   each_loose_object_fn obj_cb,
-   each_loose_cruft_fn cruft_cb,
-   each_loose_subdir_fn subdir_cb,
-   void *data);
-int for_each_loose_file_in_objdir(const char *path,
- each_loose_object_fn obj_cb,
- each_loose_cruft_fn cruft_cb,
- each_loose_subdir_fn subdir_cb,
- void *data);
-int for_each_loose_file_in_objdir_buf(struct strbuf *path,
- each_loose_object_fn obj_cb,
- each_loose_cruft_fn cruft_cb,
- each_loose_subdir_fn subdir_cb,
- void *data);
-
-/*
- * Flags for for_each_*_object(), including for_each_loose below and
- * for_each_packed in packfile.h.
- */
-enum for_each_object_flags {
-   /* Iterate only over local objects, not alternates. */
-   FOR_EACH_OBJECT_LOCAL_ONLY = (1<<0),
-
-   /* Only iterate over packs obtained from the promisor remote. */
-   FOR_EACH_OBJECT_PROMISOR_ONLY = (1<<1),
-
-   /*
-* Visit objects within a pack in packfile order rather than .idx order
-*/
-   FOR_EACH_OBJECT_PACK_ORDER = (1<<2),
-};
-
-/*
- * Iterate over all accessible loose objects without respect to
- * reachability. By default, this includes both local and alternate objects.
- * The order in which objects are visited is unspecified.
- *
- * Any flags specific to packs 

[PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread Jeff King
When we're in batch mode, we end up in batch_object_write()
for each object, which allocates its own strbuf for each
call. Instead, we can provide a single "scratch" buffer that
gets reused for each output. When running:

  git cat-file --batch-all-objects --batch-check='%(objectname)'

on git.git, my best-of-five time drops from:

  real  0m0.171s
  user  0m0.159s
  sys   0m0.012s

to:

  real  0m0.133s
  user  0m0.121s
  sys   0m0.012s

Note that we could do this just by putting the "scratch"
pointer into "struct expand_data", but I chose instead to
add an extra parameter to the callstack. That's more
verbose, but it makes it a bit more obvious what is going
on, which in turn makes it easy to see where we need to be
releasing the string in the caller (right after the loop
which uses it in each case).

Based-on-a-patch-by: René Scharfe 
Signed-off-by: Jeff King 
---
It also made it easy to see that without the prior patch,
we'd have been using "buf" for two parameters. :)
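The scratch-buffer reuse described above can be sketched in a
self-contained way. This is a stand-in growable buffer type, not git's
actual strbuf API; the point is only that the caller owns the buffer,
so one allocation is recycled across calls instead of being redone for
every object:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a strbuf: a growable string reset between uses. */
struct buf {
	char *ptr;
	size_t len, alloc;
};

static void buf_reset(struct buf *b)
{
	b->len = 0; /* keep the allocation, just rewind the length */
}

static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);
	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->ptr = realloc(b->ptr, b->alloc);
	}
	memcpy(b->ptr + b->len, s, n + 1); /* copies the NUL too */
	b->len += n;
}

/* One "object write": the caller passes the scratch buffer down, so
 * the memory survives across calls rather than being re-malloc'd. */
static const char *format_object(struct buf *scratch, const char *name)
{
	buf_reset(scratch);
	buf_add(scratch, name);
	buf_add(scratch, "\n");
	return scratch->ptr;
}
```

The caller releases the buffer once, after the loop, mirroring where
strbuf_release() lands in the patch.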

 builtin/cat-file.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 3ed1d0be80..08dced2614 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
}
 }
 
-static void batch_object_write(const char *obj_name, struct batch_options *opt,
+static void batch_object_write(const char *obj_name,
+  struct strbuf *scratch,
+  struct batch_options *opt,
   struct expand_data *data)
 {
-   struct strbuf buf = STRBUF_INIT;
-
if (!data->skip_object_info &&
oid_object_info_extended(the_repository, >oid, >info,
 OBJECT_INFO_LOOKUP_REPLACE) < 0) {
@@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt,
return;
}
 
-   strbuf_expand(, opt->format, expand_format, data);
-   strbuf_addch(, '\n');
-   batch_write(opt, buf.buf, buf.len);
-   strbuf_release();
+   strbuf_reset(scratch);
+   strbuf_expand(scratch, opt->format, expand_format, data);
+   strbuf_addch(scratch, '\n');
+   batch_write(opt, scratch->buf, scratch->len);
 
if (opt->print_contents) {
print_object_or_die(opt, data);
@@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt,
}
 }
 
-static void batch_one_object(const char *obj_name, struct batch_options *opt,
+static void batch_one_object(const char *obj_name,
+struct strbuf *scratch,
+struct batch_options *opt,
 struct expand_data *data)
 {
struct object_context ctx;
@@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt,
return;
}
 
-   batch_object_write(obj_name, opt, data);
+   batch_object_write(obj_name, scratch, opt, data);
 }
 
 struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
struct oidset *seen;
+   struct strbuf *scratch;
 };
 
 static int batch_object_cb(const struct object_id *oid, void *vdata)
 {
struct object_cb_data *data = vdata;
oidcpy(>expand->oid, oid);
-   batch_object_write(NULL, data->opt, data->expand);
+   batch_object_write(NULL, data->scratch, data->opt, data->expand);
return 0;
 }
 
@@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt)
 
cb.opt = opt;
cb.expand = 
+   cb.scratch = 
 
if (opt->unordered) {
struct oidset seen = OIDSET_INIT;
@@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt)
oid_array_clear();
}
 
+   strbuf_release();
return 0;
}
 
@@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   batch_one_object(input.buf, opt, );
+   batch_one_object(input.buf, , opt, );
}
 
strbuf_release();
+   strbuf_release();
warn_on_object_refname_ambiguity = save_warning;
return retval;
 }
-- 
2.18.0.1066.g0d97f3a098



Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 6:41 PM, Junio C Hamano wrote:

Jeff King  writes:


I can buy the argument that it's nice to have some form of profiling
that works everywhere, even if it's lowest-common-denominator. I just
wonder if we could be investing effort into tooling around existing
solutions that will end up more powerful and flexible in the long run.


Another thing I noticed is that the codepaths we would find
interesting to annotate with trace_performance_* stuff often
overlaps with the "slog" thing.  If the latter aims to eventually
replace GIT_TRACE (and if not, I suspect there is not much point
adding it in the first place), perhaps we can extend it to also
cover the need of these trace_performance_* calls, so that we do not
have to carry three different tracing mechanisms.



I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.
And I would like a set of nested messages, like the one Duy has
proposed, to be part of that.

In an independent effort I've found the nested messages to be very
helpful in certain contexts.  They are not a replacement for the
various platform tools, like PerfView and friends as discussed earlier
on this thread, but then again I can ask a customer to turn a knob and
run it again and send me the output and hopefully get a rough idea of
the problem -- without having them install a bunch of perf tools.

Jeff



Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> @@ -743,6 +743,9 @@ specification contained in .
>   A debug option to help with future "partial clone" development.
>   This option specifies how missing objects are handled.
>  +
> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.
> ++
>  The form '--missing=error' requests that rev-list stop with an error if
>  a missing object is encountered.  This is the default action.
>  +

The "--filter" documentation should go with the other "--filter"
information, not right after --missing.

> +test_expect_success 'setup for tests of tree:0' '
> + mkdir r1/subtree &&
> + echo "This is a file in a subtree" > r1/subtree/file &&
> + git -C r1 add subtree/file &&
> + git -C r1 commit -m subtree
> +'

Style: no space after >

> +test_expect_success 'grab tree directly when using tree:0' '
> + # We should get the tree specified directly but not its blobs or subtrees.
> + git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> + HEAD:
> + EOF
> + git -C r1 index-pack ../commitsonly.pack &&
> + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> + grep -E "tree|blob" objs >trees_and_blobs &&
> + test_line_count = 1 trees_and_blobs
> +'

Can we also verify that the SHA-1 in trees_and_blobs is what we
expected?

> +test_expect_success 'use fsck before and after manually fetching a missing subtree' '
> + # push new commit so server has a subtree
> + mkdir src/dir &&
> + echo "in dir" > src/dir/file.txt &&

No space after >

> + git -C src add dir/file.txt &&
> + git -C src commit -m "file in dir" &&
> + git -C src push -u srv master &&
> + SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> +
> + rm -rf dst &&
> + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> + git -C dst fsck &&
> + git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> + git -C dst fsck
> +'

If you don't need to redirect to err, don't do so.

Before the cat-file, also verify that the tree is missing, most likely
through a "git rev-list" with "--missing=print".

And I would grep on the tree_contents to ensure that the filename
("file.txt") is there, so that we know that we got the correct tree.

> +test_expect_success 'can use tree:0 to filter partial clone' '
> + rm -rf dst &&
> + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> + git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> + cat fetched_objects \
> + | awk -f print_1.awk \
> + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> + sort fetched_types -u >unique_types.observed &&
> + echo commit > unique_types.expected &&
> + test_cmp unique_types.observed unique_types.expected
> +'
> +
> +test_expect_success 'auto-fetching of trees with --missing=error' '
> + git -C dst rev-list master --missing=error --objects >fetched_objects &&
> + cat fetched_objects \
> + | awk -f print_1.awk \
> + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> + sort fetched_types -u >unique_types.observed &&
> + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> + test_cmp unique_types.observed unique_types.expected
> +'

These two tests seem redundant with the 'use fsck before and after
manually fetching a missing subtree' test (after the latter is
appropriately renamed). I think we only need to test this sequence once,
which can be placed in one or spread over multiple tests:

 1. partial clone with --filter=tree:0
 2. fsck works
 3. verify that trees are indeed missing
 4. autofetch a tree
 5. fsck still works


[PATCH 2/4] cat-file: split batch "buf" into two variables

2018-08-14 Thread Jeff King
We use the "buf" strbuf for two things: to read incoming
lines, and as a scratch space for test-expanding the
user-provided format. Let's split this into two variables
with descriptive names, which makes their purpose and
lifetime more clear.

It will also help in a future patch when we start using the
"output" buffer for more expansions.

Signed-off-by: Jeff King 
---
René, in the patch you sent earlier, I noticed that for the
non-batch-all-objects case we use the same strbuf for input and output.
That'd probably be OK most of the time (the first thing we do is resolve
the input to an oid), but I suspect it could be pretty bad with %(rest).
We'd write over or even realloc the string it points into as part of the
output.

This patch just clarifies the names; your reuse idea is in the next one.
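To illustrate the hazard: %(rest) keeps a pointer into the line we
read, so expanding the output into the very same buffer could
overwrite, or via realloc invalidate, the memory that pointer refers
to. A minimal sketch of the safe two-buffer shape (stand-in code, not
the actual cat-file implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split "oid rest" at the first space, NUL-terminating the oid part
 * and returning a pointer to the remainder *inside* 'input' itself --
 * the same aliasing that makes %(rest) fragile. */
static char *split_rest(char *input)
{
	char *p = strchr(input, ' ');
	if (!p)
		return NULL;
	*p++ = '\0';
	return p;
}

/* Format into a *separate* output buffer: 'rest' still points into
 * 'input', which stays untouched, so the pointer remains valid. */
static void expand(char *output, size_t outsz,
		   const char *oid, const char *rest)
{
	snprintf(output, outsz, "%s:%s", oid, rest ? rest : "");
}
```

Had expand() written into the same storage as 'input', the 'rest'
pointer could be clobbered mid-expansion, which is why input and
output get distinct buffers.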

 builtin/cat-file.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 04b5cda191..3ed1d0be80 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -466,7 +466,8 @@ static int batch_unordered_packed(const struct object_id *oid,
 
 static int batch_objects(struct batch_options *opt)
 {
-   struct strbuf buf = STRBUF_INIT;
+   struct strbuf input = STRBUF_INIT;
+   struct strbuf output = STRBUF_INIT;
struct expand_data data;
int save_warning;
int retval = 0;
@@ -481,8 +482,9 @@ static int batch_objects(struct batch_options *opt)
 */
memset(, 0, sizeof(data));
data.mark_query = 1;
-   strbuf_expand(, opt->format, expand_format, );
+   strbuf_expand(, opt->format, expand_format, );
data.mark_query = 0;
+   strbuf_release();
if (opt->cmdmode)
data.split_on_whitespace = 1;
 
@@ -542,14 +544,14 @@ static int batch_objects(struct batch_options *opt)
save_warning = warn_on_object_refname_ambiguity;
warn_on_object_refname_ambiguity = 0;
 
-   while (strbuf_getline(, stdin) != EOF) {
+   while (strbuf_getline(, stdin) != EOF) {
if (data.split_on_whitespace) {
/*
 * Split at first whitespace, tying off the beginning
 * of the string and saving the remainder (or NULL) in
 * data.rest.
 */
-   char *p = strpbrk(buf.buf, " \t");
+   char *p = strpbrk(input.buf, " \t");
if (p) {
while (*p && strchr(" \t", *p))
*p++ = '\0';
@@ -557,10 +559,10 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   batch_one_object(buf.buf, opt, );
+   batch_one_object(input.buf, opt, );
}
 
-   strbuf_release();
+   strbuf_release();
warn_on_object_refname_ambiguity = save_warning;
return retval;
 }
-- 
2.18.0.1066.g0d97f3a098



[PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Ævar Arnfjörð Bjarmason
Add a --send-delay option with a corresponding sendemail.smtpSendDelay
configuration variable. When set to e.g. 2, this causes send-email to
sleep 2 seconds before sending the next E-Mail. We'll only sleep
between sends, not before the first send, or after the last.

This option has two uses. Firstly, to be able to Ctrl+C a long send
with "all" if you have a change of heart. Secondly, as a hack in some
mail setups to, with a sufficiently high delay, force the receiving
client to sort the E-Mails correctly.

Some popular E-Mail clients completely ignore the "Date" header, which
format-patch is careful to set such that the patches will be displayed
in order, and instead sort by the time the E-mail was received.

Google's GMail is a good example of such a client. It ostensibly sorts
by some approximation of received time (although not by any "Received"
header). It's more usual than not to see patches showing out of order
in GMail. To take a few examples of orders seen on patches on the Git
mailing list:

1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
3 -> 2 -> 1 (fast-import by Jameson Miller)
2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)

The reason to add the new "X-Mailer-Send-Delay" header is to make it
easy to tell what the imposed delay was, if any. This allows for
gathering some data on how the transfer of E-Mails with & without this
option behaves. This may not be workable without really long delays,
see [1] and [2].

The reason the getopt format is "send-delay=s" instead of
"send-delay=d" is that we're doing manual validation of the value
we get passed, which getopt would corrupt in cases of e.g. float
values before we could show a sensible error message.
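In other words, the validation amounts to checking that the string is
a plain non-negative integer before using it. A hypothetical helper
showing the idea (the real check lives in Perl, not C):

```c
#include <assert.h>
#include <ctype.h>

/* Return 1 if 's' is a non-empty string of decimal digits, i.e. a
 * non-negative integer with no sign, dot, or exponent -- so inputs
 * like "2.5" or "-1" are rejected with the string still intact,
 * which lets the caller print it in an error message. */
static int is_nonneg_integer(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}
```

Taking the value as a string and validating it by hand is what lets
the error message echo exactly what the user typed.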

1. https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/
2. https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I submitted this back in March hoping it would solve mail ordering
problems, but the other motive I had for this is that I'm paranoid
that I'm sending out bad E-Mails, and tend to "y" to each one because
"a" is too fast.

So I'm re-sending this with an updated commit message & rationale, and
not sending 2/2 to toggle this on by default. I'd still like to have
this feature.

 Documentation/config.txt |  6 
 Documentation/git-send-email.txt |  4 +++
 git-send-email.perl  | 18 ---
 t/t9001-send-email.sh| 55 
 4 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 63365dcf3d..5eb81b64a7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3216,6 +3216,12 @@ sendemail.smtpReloginDelay::
Seconds wait before reconnecting to smtp server.
See also the `--relogin-delay` option of linkgit:git-send-email[1].
 
+sendemail.smtpSendDelay::
+   Seconds wait in between message sending before sending another
+   message. Set it to 0 to impose no extra delay, defaults to 0.
++
+See also the `--send-delay` option of linkgit:git-send-email[1].
+
 showbranch.default::
The default set of branches for linkgit:git-show-branch[1].
See linkgit:git-show-branch[1].
diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 465a4ecbed..98fdd9b803 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -270,6 +270,10 @@ must be used for each option.
with --batch-size option.  Defaults to the `sendemail.smtpReloginDelay`
configuration variable.
 
+--send-delay=<int>::
+   Wait $<int> between sending emails. Defaults to the
+   `sendemail.smtpSendDelay` configuration variable.
+
 Automating
 ~~
 
diff --git a/git-send-email.perl b/git-send-email.perl
index 2be5dac337..65b53ee9f1 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -89,6 +89,7 @@ sub usage {
 --batch-size  <int>   * send max <int> message per connection.
 --relogin-delay   <int>   * delay <int> seconds between two successive login.
  This option can only be used with --batch-size
+--send-delay  <int>   * ensure that <int> seconds pass between two successive sends.
 
   Automating:
 --identity* Use the sendemail. options.
@@ -225,7 +226,7 @@ sub do_edit {
 my ($to_cmd, $cc_cmd);
 my ($smtp_server, $smtp_server_port, @smtp_server_options);
 my ($smtp_authuser, $smtp_encryption, $smtp_ssl_cert_path);
-my ($batch_size, $relogin_delay);
+my ($batch_size, $relogin_delay, $send_delay);
 my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth);
 my ($validate, $confirm);
 my (@suppress_cc);
@@ -259,6 +260,7 @@ sub do_edit {
 "smtpauth" => \$smtp_auth,
 "smtpbatchsize" => \$batch_size,
 "smtprelogindelay" => \$relogin_delay,
+

[PATCH 1/4] cat-file: use oidset check-and-insert

2018-08-14 Thread Jeff King
We don't need to check if the oidset has our object before
we insert it; that's done as part of the insertion. We can
just rely on the return value from oidset_insert(), which
saves one hash lookup per object.

This measurable speedup is tiny and within the run-to-run
noise, but the result is simpler to read, too.
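The pattern is generic: an insert that reports whether the key was
already present folds the lookup and the insertion into a single
probe. A toy stand-in with the same return contract as
oidset_insert() (a linear scan here, not git's real hashing):

```c
#include <assert.h>

#define MAX_IDS 64

struct idset {
	unsigned ids[MAX_IDS];
	int nr;
};

/* Returns 1 if 'id' was already in the set, 0 if it was just added --
 * the same contract as oidset_insert(), so callers need no separate
 * contains() probe before inserting. */
static int idset_insert(struct idset *s, unsigned id)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (s->ids[i] == id)
			return 1;
	s->ids[s->nr++] = id;
	return 0;
}
```

A caller deduplicating a stream then reduces to
`if (idset_insert(&seen, id)) return 0;` -- exactly the shape the
patch gives batch_unordered_object().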

Signed-off-by: Jeff King 
---
 builtin/cat-file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 45992c9be9..04b5cda191 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -443,9 +443,8 @@ static int batch_unordered_object(const struct object_id *oid, void *vdata)
 {
struct object_cb_data *data = vdata;
 
-   if (oidset_contains(data->seen, oid))
+   if (oidset_insert(data->seen, oid))
return 0;
-   oidset_insert(data->seen, oid);
 
return batch_object_cb(oid, data);
 }
-- 
2.18.0.1066.g0d97f3a098



[PATCH 0/4] finishing touches on jk/for-each-object-iteration

2018-08-14 Thread Jeff King
On Mon, Aug 13, 2018 at 11:45:06AM -0700, Jonathan Tan wrote:

> >   [1/7]: for_each_*_object: store flag definitions in a single location
> >   [2/7]: for_each_*_object: take flag arguments as enum
> >   [3/7]: for_each_*_object: give more comprehensive docstrings
> >   [4/7]: for_each_packed_object: support iterating in pack-order
> >   [5/7]: t1006: test cat-file --batch-all-objects with duplicates
> >   [6/7]: cat-file: rename batch_{loose,packed}_object callbacks
> >   [7/7]: cat-file: support "unordered" output for --batch-all-objects
> 
> Thanks for laying all the patches out so cleanly! All of them are:
> Reviewed-by: Jonathan Tan 
> 
> Normally I would re-explain the patches to demonstrate that I understand
> them, but in this case, I think they are simple enough - patches 1, 2,
> 3, and 6 are refactorings that I agree with, patch 5 just makes a test
> more comprehensive, and patches 4 and 7 do what their commit messages
> say.
> 
> Stefan brought up the concern that cache.h is increasing in size, but I
> agree with the patch as written that it's probably best that we
> centralize all the flags somewhere, and we can deal with the location in
> a future patch.

Thanks for the review. Here are a few patches on top to deal with the
cache.h thing, as well as some optimizations that came out of discussing
oidset in another thread (I left out for now the "big" optimization of
moving oidset to a different data structure; that's complicated enough
to be dealt with on its own, I think).

The first patch here could arguably be squashed into the final patch of
the original series, but I'm OK with it either way.

  [1/4]: cat-file: use oidset check-and-insert
  [2/4]: cat-file: split batch "buf" into two variables
  [3/4]: cat-file: use a single strbuf for all output
  [4/4]: for_each_*_object: move declarations to object-store.h

 builtin/cat-file.c | 43 +++-
 builtin/prune-packed.c |  1 +
 cache.h| 75 ---
 object-store.h | 90 ++
 packfile.h | 20 --
 5 files changed, 116 insertions(+), 113 deletions(-)

-Peff


Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler  wrote:
>
>
>
> On 8/13/2018 2:14 PM, Matthew DeVore wrote:
> > Teach list-objects the "tree:0" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:0 - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also consider only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
> >
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
> >
> > Signed-off-by: Matthew DeVore 
> > ---
> >   Documentation/rev-list-options.txt |  3 ++
> >   list-objects-filter-options.c  |  4 +++
> >   list-objects-filter-options.h  |  1 +
> >   list-objects-filter.c  | 50 ++
> >   t/t5317-pack-objects-filter-objects.sh | 27 ++
> >   t/t5616-partial-clone.sh   | 27 ++
> >   t/t6112-rev-list-filters-objects.sh| 13 +++
> >   7 files changed, 125 insertions(+)
> >
> > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> > index 7b273635d..9e351ec2a 100644
> > --- a/Documentation/rev-list-options.txt
> > +++ b/Documentation/rev-list-options.txt
> > @@ -743,6 +743,9 @@ specification contained in .
> >   A debug option to help with future "partial clone" development.
> >   This option specifies how missing objects are handled.
> >   +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >   The form '--missing=error' requests that rev-list stop with an error if
> >   a missing object is encountered.  This is the default action.
> >   +
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index c0e2bd6a0..a28382940 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> >   return 0;
> >   }
> >
> > + } else if (!strcmp(arg, "tree:0")) {
> > + filter_options->choice = LOFC_TREE_NONE;
> > + return 0;
> > +
> >   } else if (skip_prefix(arg, "sparse:oid=", )) {
> >   struct object_context oc;
> >   struct object_id sparse_oid;
> > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> > index a61f8..af64e5c66 100644
> > --- a/list-objects-filter-options.h
> > +++ b/list-objects-filter-options.h
> > @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
> >   LOFC_DISABLED = 0,
> >   LOFC_BLOB_NONE,
> >   LOFC_BLOB_LIMIT,
> > + LOFC_TREE_NONE,
> >   LOFC_SPARSE_OID,
> >   LOFC_SPARSE_PATH,
> >   LOFC__COUNT /* must be last */
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index a0ba78b20..8e3caf5bf 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
> >   return d;
> >   }
> >
> > +/*
> > + * A filter for list-objects to omit ALL trees and blobs from the traversal.
> > + * Can OPTIONALLY collect a list of the omitted OIDs.
> > + */
> > +struct filter_trees_none_data {
> > + struct oidset *omits;
> > +};
> > +
> > +static enum list_objects_filter_result filter_trees_none(
> > + enum list_objects_filter_situation filter_situation,
> > + struct object *obj,
> > + const char *pathname,
> > + const char *filename,
> > + void *filter_data_)
> > +{
> > + struct filter_trees_none_data *filter_data = filter_data_;
> > +
> > + switch (filter_situation) {
> > + default:
> > + die("unknown filter_situation");
> > + return LOFR_ZERO;
> > +
> > + case LOFS_BEGIN_TREE:
> > + case LOFS_BLOB:
> > + if (filter_data->omits)
> > + oidset_insert(filter_data->omits, >oid);
> > + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
> > +
> > + case LOFS_END_TREE:
> > + assert(obj->type == OBJ_TREE);
> > + return LOFR_ZERO;
> > +
> > + }
> > +}
>
> There are a couple of options here:
> [] If really want to omit all trees and blobs (and we DO NOT want
> the oidset of everything omitted), then we might be able to
> shortcut the traversal and speed things up.
>
> {} add a LOFR_SKIP_TREE bit to 
