Re: [PATCH v1 03/25] structured-logging: add structured logging framework

2018-08-20 Thread Jonathan Nieder
Jeff Hostetler wrote:

[...]
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -144,8 +144,15 @@ static inline int fcntl(int fd, int cmd, ...)
>   errno = EINVAL;
>   return -1;
>  }
> +
>  /* bash cannot reliably detect negative return codes as failure */
> +#if defined(STRUCTURED_LOGGING)

Git usually spells this as #ifdef.
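
i.e. the same guard the way the rest of the tree spells it:

	#ifdef STRUCTURED_LOGGING

rather than

	#if defined(STRUCTURED_LOGGING)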

> +#include "structured-logging.h"
> +#define exit(code) exit(strlog_exit_code((code) & 0xff))
> +#else
>  #define exit(code) exit((code) & 0xff)
> +#endif

This is hard to follow, since it only makes sense in combination with
the corresponding code in git-compat-util.h.  Can they be defined
together?  If not, can they have comments to make it easier to know to
edit one too when editing the other?
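
For example, even a pair of comments would help (a sketch of the kind of
cross-reference I mean, not wording from the patch):

	/* compat/mingw.h */
	/*
	 * NOTE: keep this in sync with the exit() wrapper in
	 * git-compat-util.h; both must agree on how the exit code is
	 * handed to structured logging.
	 */
	#define exit(code) exit(strlog_exit_code((code) & 0xff))

	/* git-compat-util.h */
	/* NOTE: compat/mingw.h carries its own copy of this wrapper. */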

[...]
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1239,4 +1239,13 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>  
> +#include "structured-logging.h"

Is this #include needed?  Usually git-compat-util.h only defines C
standard library functions or utilities that are used everywhere.

[...]
> --- a/git.c
> +++ b/git.c
[...]
> @@ -700,7 +701,7 @@ static int run_argv(int *argcp, const char ***argv)
>   return done_alias;
>  }
>  
> -int cmd_main(int argc, const char **argv)
> +static int real_cmd_main(int argc, const char **argv)
>  {
>   const char *cmd;
>   int done_help = 0;
> @@ -779,3 +780,8 @@ int cmd_main(int argc, const char **argv)
>  
>   return 1;
>  }
> +
> +int cmd_main(int argc, const char **argv)
> +{
> + return slog_wrap_main(real_cmd_main, argc, argv);
> +}

Can real_cmd_main get a different name, describing what it does?

[...]
> --- a/structured-logging.c
> +++ b/structured-logging.c
> @@ -1,3 +1,10 @@
[...]
> +static uint64_t my__start_time;
> +static uint64_t my__exit_time;
> +static int my__is_config_loaded;
> +static int my__is_enabled;
> +static int my__is_pretty;
> +static int my__signal;
> +static int my__exit_code;
> +static int my__pid;
> +static int my__wrote_start_event;
> +static int my__log_fd = -1;

Please don't use this my__ notation.  The inconsistency with the rest
of Git makes the code feel out of place and provides an impediment to
smooth reading.
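
For example, since these are file-scope statics in structured-logging.c
anyway, plain names (or an slog_ prefix) would read more naturally.  A
sketch:

	static uint64_t start_time;
	static uint64_t exit_time;
	static int is_enabled;
	static int log_fd = -1;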

[...]
> +static void emit_start_event(void)
> +{
> + struct json_writer jw = JSON_WRITER_INIT;
> +
> + /* build "cmd_start" event message */
> + jw_object_begin(&jw, my__is_pretty);
> + {
> + jw_object_string(&jw, "event", "cmd_start");
> + jw_object_intmax(&jw, "clock_us", (intmax_t)my__start_time);
> + jw_object_intmax(&jw, "pid", (intmax_t)my__pid);

The use of blocks here is unexpected and makes me wonder what kind of
macro wizardry is going on.  Perhaps this is idiomatic for the
json-writer API; if so, can you add an example to json-writer.h to
help the next surprised reader?

That said, I think

json_object_begin(&jw, ...);

json_object_string(...
json_object_int(...
...

json_object_begin_inline_array(&jw, "argv");
for (k = 0; k < argv.argc; k++)
json_object_string(...
json_object_end();

json_object_end();

is still readable and less unexpected.

[...]
> +static void emit_exit_event(void)
> +{
> + struct json_writer jw = JSON_WRITER_INIT;
> + uint64_t atexit_time = getnanotime() / 1000;
> +
> + /* close unterminated forms */

What are unterminated forms?

> + if (my__errors.json.len)
> + jw_end(&my__errors);
[...]
> +int slog_default_config(const char *key, const char *value)
> +{
> + const char *sub;
> +
> + /*
> +  * git_default_config() calls slog_default_config() with "slog.*"
> +  * k/v pairs.  git_default_config() MAY or MAY NOT be called when
> +  * cmd_<command>() calls git_config().

No need to shout.

[...]
> +/*
> + * If cmd_<command>() did not cause slog_default_config() to be called
> + * during git_config(), we try to lookup our config settings the first
> + * time we actually need them.
> + *
> + * (We do this rather than using read_early_config() at initialization
> + * because we want any "-c key=value" arguments to be included.)
> + */

Which function is initialization referring to here?

Lazy loading in order to guarantee loading after a different subsystem
sounds a bit fragile, so I wonder if we can make the sequencing more
explicit.
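
For instance (a sketch only; slog_load_config() is a name I made up): if
the wrapper installed in git.c loaded the settings explicitly, the
ordering would live in one place rather than being implied by lazy
lookups:

	int cmd_main(int argc, const char **argv)
	{
		slog_load_config();	/* hypothetical explicit step */
		return slog_wrap_main(real_cmd_main, argc, argv);
	}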

Stopping here.  I still like where this is going, but some aspects of
the coding style are making it hard to see the forest for the trees.
Perhaps some more details about API in the design doc would help.

Thanks and hope that helps,
Jonathan


Re: [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile

2018-08-20 Thread Jonathan Nieder
g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
>
> Teach the Makefile to take STRUCTURED_LOGGING=1 variable to
> compile in/out structured logging feature.
>
> Signed-off-by: Jeff Hostetler 
> ---
>  Makefile |  8 
>  structured-logging.c |  9 +
>  structured-logging.h | 13 +
>  3 files changed, 30 insertions(+)
>  create mode 100644 structured-logging.c
>  create mode 100644 structured-logging.h

This should probably be squashed with a later patch (e.g., patch 3).
When taken alone, it produces

[...]
> --- /dev/null
> +++ b/structured-logging.c
> @@ -0,0 +1,9 @@
> +#if !defined(STRUCTURED_LOGGING)
> +/*
> + * Structured logging is not available.
> + * Stub out all API routines.
> + */
> +
> +#else
> +
> +#endif

which is not idiomatic (for example, it's missing a #include of
git-compat-util.h, etc).

Thanks,
Jonathan


Re: [PATCH v1 01/25] structured-logging: design document

2018-08-20 Thread Jonathan Nieder
Hi,

Jeff Hostetler wrote:

> Signed-off-by: Jeff Hostetler 
> ---
>  Documentation/technical/structured-logging.txt | 816 
> +
>  1 file changed, 816 insertions(+)
>  create mode 100644 Documentation/technical/structured-logging.txt

Can you add this to Documentation/Makefile as well, so that an HTML
version will appear once this is merged?  See
https://public-inbox.org/git/20180814222846.gg142...@aiede.svl.corp.google.com/
for an example.

[...]
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> +Structured Logging
> +==================
> +
> +Structured Logging (SLOG) is an optional feature to allow Git to
> +generate structured log data for executed commands.  This includes
> +command line arguments, command run times, error codes and messages,
> +child process information, time spent in various critical functions,
> +and repository data-shape information.  Data is written to a target
> +log file in JSON[1,2,3] format.

I like the idea of more structured logs for tracing, monitoring, and
diagnosis (see also
https://research.google.com/archive/papers/dapper-2010-1.pdf on the
subject of tracing).  My main focus in looking over this initial
contribution is

 1. Is this something that could eventually merge with our other
tracing APIs (e.g. trace_printf), or will they remain separate?

 2. Is the API introduced here one that will be easy to work with
and adapt?  What will calling code that makes use of it look like?

[...]
> +Background (Git Merge 2018 Barcelona)
> +=====================================
> +
> +Performance and/or error logging was discussed during the contributor's
> +summit in Barcelona.  Here are the relevant notes from the meeting
> +minutes[6].
> +
> +> Performance misc (Ævar)
> +> -----------------------
> +> [...]
> +>  - central error reporting for git
> +>- `git status` logging
> +>- git config that collects data, pushes to known endpoint with `git 
> push`
> +>- pre_command and post_command hooks, for logs
> +>- `gvfs diagnose` that looks at packfiles, etc
[etc]

I'm not sure what to make of this section.  Is it a historical
artifact, or should I consider it part of the design?

[...]
> +A Quick Example
> +===============
> +
> +Note: JSON pretty-printing is enabled in all of the examples shown in
> +this document.  When pretty-printing is turned off, each event is
> +written on a single line.  Pretty-printing is intended for debugging.
> +It should be turned off in production to make post-processing easier.
> +
> +$ git config slog.pretty 
> +
> +Here is a quick example showing SLOG data for "git status".  This
> +example has all optional features turned off.  It contains 2 events.

Thanks for the example!  I'd also be interested in what the tracing
code looks like, since the API is the 'deepest' change involved (the
tracing format can easily change later based on experience).

My initial reaction to the example trace is that it feels like an odd
compromise: on one hand it uses text (JSON) to be human-readable, and
on the other hand it is structured to be machine-readable.  The JSON
writer isn't using a standard JSON serialization library so parsing
difficulties seem likely, and the output is noisy enough that reading
it by hand is hard, too.  That leads me to wish for a different
format, like protobuf (where the binary form is very concise and
parseable, and the textual form is IMHO more readable than JSON).

All that said, this is just the first tracing output format and it's
easy to make new ones later.  It seems fine for that purpose.

[...]
> +Target Log File
> +===============
> +
> +SLOG writes events to a log file.  File logging works much like
> +GIT_TRACE where events are appended to a file on disk.

Does it write with O_APPEND?  Does it do anything to guard against
interleaved events --- e.g. are messages buffered and then written
with a single write() call?
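
To make the question concrete, the pattern I would expect (a sketch only,
assuming the json_writer/strbuf types from this series) is one write()
per fully formatted event to an O_APPEND descriptor:

	static void emit_event(int fd, const struct json_writer *jw)
	{
		struct strbuf buf = STRBUF_INIT;

		strbuf_addbuf(&buf, &jw->json);
		strbuf_addch(&buf, '\n');
		/* single write; fd was opened with O_WRONLY|O_APPEND */
		if (write_in_full(fd, buf.buf, buf.len) < 0)
			warning_errno("unable to write slog event");
		strbuf_release(&buf);
	}

Otherwise two racing git processes can interleave partial events in the
log file.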

[...]
> +Comparison with GIT_TRACE
> +=========================
> +
> +SLOG is very similar to the existing GIT_TRACE[4] API because both
> +write event messages at various points during a command.  However,
> +there are some fundamental differences that warrant it being
> +considered a separate feature rather than just another
> +GIT_TRACE_<key>:

The natural question would be in the opposite direction: can and
should trace_printf be reimplemented on top of this feature?

[...]
> +
> +

nit: trailing whitespace (you can find these with "git show --check").

[...]
> +A session id (SID) is a cheap, unique-enough string to associate all
> +of the events generated by a single process.  A child git process inherits
> +the SID of its parent git process and incorporates it into its own SID.

I wonder if we can structure events in a more hierarchical way to
avoid having to special-case the toplevel command in this way.

[...]
> +# The "cmd_exit" event includes the command result and elapsed
> +# time and the various configuration 

Re: Git Project Leadership Committee

2018-08-20 Thread Jeff King
On Thu, Aug 16, 2018 at 06:41:38PM -0400, Jeff King wrote:

>  - we should avoid anyone who is affiliated with a company that already
>has a member on the committee. So nobody from Google and nobody from
>GitHub. I would extend that to Microsoft, too, given a certain
>impending acquisition. I'd expect anybody who is affiliated with a
>company to recuse themselves from decisions that directly affect that
>company (which is what we've done so far).
> 
>  - I think ideally the candidate would be somebody who represents the
>long tail of volunteer community members who don't work on Git as
>part of their day-job. The current members have a fairly skewed view
>in that respect. At the same time, we can't really represent the
>_really_ long tail of infrequent contributors, by the "stick around"
>criterion above.

Thanks both Christian and Ævar for giving more details on your
situations elsewhere in the thread. I do think neither of you is quite
in the "I just do this in my spare time" situation. But I also think
that situation is going to be inversely correlated with being active in
the project and wanting to spend time on governance stuff. So IMHO some
compromise there is necessary. And I feel like both of you can represent
those interests, even if they're not exactly the situation you're in.

So what next?

There was a little bit of off-list discussion (mostly nominations to
avoid putting the candidate on the spot), but no new public candidates.
I'm happy to entertain more discussion here, but it seems like everybody
is reasonably happy with these two names.

So either Junio and I can pick one, or possibly we could have both (that
gives us a 4-person committee, but again, tied votes haven't been an
issue so far).

Any final thoughts are welcome.

Also, on a more meta-level, I'm happy to hear any thoughts about this
process that we might want to enshrine for later iterations. This is
obviously not nearly as formal as something like Debian elections. But I
don't think we're a big enough community to need that. So my attempt is
to just keep things informal, but try to give as many opportunities as
possible for people to speak up.

-Peff


[L10N] Kickoff for Git 2.19.0 round 1

2018-08-20 Thread Jiang Xin
Hi,

Git v2.19.0-rc0 has been released, and it's time to start a new round of git l10n.
This time there are 382 updated messages that need to be translated since the last
update:

l10n: git.pot: v2.19.0 round 1 (382 new, 30 removed)

Generate po/git.pot from v2.19.0-rc0 for git v2.19.0 l10n round 1.

Signed-off-by: Jiang Xin

You can get it from the usual place:

https://github.com/git-l10n/git-po/

For how to update your XX.po file and help translate Git, please see
"Updating a XX.po file" and other sections in the "po/README" file.

--
Jiang Xin


Re: Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-20 Thread Jeff King
On Mon, Aug 20, 2018 at 10:57:13AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > That seems to apply. BTW: Is there a way to get some repository statistics
> > like a histogram of object sizes (or whatever that might be useful to help
> > making decisions)?
> 
> The git-sizer program is really helpful in this regard:
> https://github.com/github/git-sizer

Yeah, I'd very much agree that's the best tool for a general overview of
the repository stats.

Ulrich, if you want more analysis (like a histogram), probably
something like:

  git cat-file --batch-all-objects --batch-check='%(objectsize)'

might be a good starting place to dump information (see the "BATCH
OUTPUT" section of "git help cat-file" for more format items).

> > If it's sorting, maybe add some code like (wild guess):
> >
> > if (objects_to_sort > magic_number)
> >message("Sorting something...");
> 
> I think a good solution to these cases is to just introduce something to
> the progress.c mode where it learns how to display a counter where we
> don't know what the end-state will be. Something like your proposed
> magic_number can just be covered under the more general case where we
> don't show the progress bar unless it's been 1 second (which I believe
> is the default).

Yeah.  We already have open-ended counters (e.g., "counting objects"),
and delayed meters (we actually normalized the default to 2s recently).

I _think_ they should work together OK without further modification.
Once upon a time the caller had to say "don't show if we're past N%
after M seconds", but I think with the current code we'd just show it if
we're not completely finished after 2 seconds.

So it really should just be a simple:

  progress = start_delayed_progress("Hashing packfile", 0);

That said, counting bytes would probably be ugly (just because the
numbers get really big). We'd probably prefer to show a throughput or
something. And as you noted, there's some refactoring to be done with
pack-check for it to show multiple progress meters.
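
Roughly what I have in mind (illustration only, not a patch):

	static void hash_packfile_with_meter(int fd)
	{
		struct progress *progress;
		off_t seen = 0;
		ssize_t n;
		char buf[8192];

		progress = start_delayed_progress("Hashing packfile", 0);
		while ((n = xread(fd, buf, sizeof(buf))) > 0) {
			/* ... feed buf into the pack checksum here ... */
			seen += n;
			display_throughput(progress, seen);
		}
		stop_progress(&progress);
	}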

(I still think in the long run we would want to scrap that code, but
that's a much bigger job; I'm fine if somebody wants to do incremental
improvements in the meantime).

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 5:27 PM Jonathan Nieder  wrote:
>
> Jonathan Nieder wrote:
> > Stefan Beller wrote:
> >> Junio C Hamano wrote:
>
> >>>  * "git submodule" did not correctly adjust core.worktree setting that
> >>>indicates whether/where a submodule repository has its associated
> >>>working tree across various state transitions, which has been
> >>>corrected.
> >>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
> >>
> >> Personally I do not view this as a bug fix but a feature
> >> (but then again my thinking might be tainted of too much
> >> submodule work) hence I would not merge it down.
> >
> > Can you elaborate?
>
> ... ah, I figured it out.  You are saying "would not merge it down to
> maint".  In that case, I agree, since this this is not a recent bug
> (it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Yeah; the behavior was the gold standard for submodules ever since,
so I am wary of changing it under the guise of fixing a bug.
The core.worktree setting doesn't harm the user by default; you
need to craft a very specific situation to benefit from this feature.

Stefan


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Stefan Beller
> At the risk of going on a tangent, I assumed this was because enums
> are really ints, and the "default" is there in case the enum somehow
> got assigned to an int without a corresponding value. Either because
> of a cast from an int that was out-of-range, or new values that were
> obtained from arithmetic or bitwise operations on the declared enum
> values, which created undeclared values.

See
374166cb381 (grep: catch a missing enum in switch statement, 2017-05-25)
or a bit dated, but nevertheless an interesting read:
b8527d5fa61 (wt-status: fix possible use of uninitialized variable, 2013-03-21)

...compilers these days are just too smart to reason about them :-)


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Stefan Beller
> > heh. Thanks for switching the style; I should have emphasized that
> > (after reflection) I found them equally good, I am used to one
> > over the other more.
> It seems marginally better to me. I also noticed a clean-up patch
> going by that aggressively switched to test_must_be_empty wherever
> possible: 
> https://public-inbox.org/git/20180819215725.29001-1-szeder@gmail.com/
>
> OTOH, if it were up to me I would have just gotten rid of
> test_must_be_empty and used an existing function with the right
> argument, like `test_cmp /dev/null` - but using some form consistently
> is the most important, whatever it is.

/dev/null, eh? It shows you don't use Windows on a day to day basis. ;-)
But yeah consistency is really good to have. :)


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Stefan Beller wrote:
>> Junio C Hamano wrote:

>>>  * "git submodule" did not correctly adjust core.worktree setting that
>>>indicates whether/where a submodule repository has its associated
>>>working tree across various state transitions, which has been
>>>corrected.
>>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
>>
>> Personally I do not view this as a bug fix but a feature
>> (but then again my thinking might be tainted of too much
>> submodule work) hence I would not merge it down.
>
> Can you elaborate?

... ah, I figured it out.  You are saying "would not merge it down to
maint".  In that case, I agree, since this this is not a recent bug
(it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Thanks,
Jonathan


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Jonathan Nieder
(-cc: other lists)
Stefan Beller wrote:
> Junio C Hamano wrote:

>>  * "git submodule" did not correctly adjust core.worktree setting that
>>indicates whether/where a submodule repository has its associated
>>working tree across various state transitions, which has been
>>corrected.
>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
>
> Personally I do not view this as a bug fix but a feature
> (but then again my thinking might be tainted of too much
> submodule work) hence I would not merge it down.

Can you elaborate?

The symptom that this series fixes was pretty bad, so I'm pretty glad
you wrote it.

Thanks,
Jonathan


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Matthew DeVore
On Fri, Aug 17, 2018 at 3:28 PM Stefan Beller  wrote:
>
> On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore  wrote:
> >
> > On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller  wrote:
> > >
> > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore  wrote:
> > > >
> > > > Teach list-objects the "tree:0" filter which allows for filtering
> > > > out all tree and blob objects (unless other objects are explicitly
> > > > specified by the user). The purpose of this patch is to allow smaller
> > > > partial clones.
> > > >
> > > > The name of this filter - tree:0 - does not explicitly specify that
> > > > it also filters out all blobs, but this should not cause much confusion
> > > > because blobs are not at all useful without the trees that refer to
> > > > them.
> > > >
> > > > I also consider only:commits as a name, but this is inaccurate because
> > > > it suggests that annotated tags are omitted, but actually they are
> > > > included.
> > >
> > > Speaking of tag objects, it is possible to tag anything, including blobs.
> > > Would a blob that is tagged (hence reachable without a tree) be not
> > > filtered by tree:0 (or in the future any deeper depth) ?
> > I think so. If I try to fetch a tagged tree or blob, it should fetch
> > that object itself, since I'm referring to it explicitly in the git
> > pack-objects arguments (I mention fetch since git rev-list apparently
> > doesn't support specifying non-commits on the command line). This is
> > similar to how I can fetch a commit that would otherwise be filtered
> > *if* I specify it explicitly (rather than a child commit).
> >
> > If you're fetching a tagged tree, then for depth=0, it will fetch the
> > given tree only, and not fetch any referents of an explicitly-given
> > tree. For depth=1, it will fetch all direct referents.
> >
> > If you're fetching a commit, then for depth=0, you will not get any
> > tree objects, and for depth=1, you'll get only the root tree object
> > and none of its referrents. So the commit itself is like a "layer" in
> > the depth count.
>
> That seems smart. Thanks!
>
> >
> > >
> > > I found this series a good read, despite my unfamiliarity of the
> > > partial cloning.
> > >
> > > One situation where I scratched my head for a second were previous patches
> > > that  use "test_line_count = 0 rev_list_err" whereas using 
> > > test_must_be_empty
> > > would be an equally good choice (I am more used to the latter than the 
> > > former)
> >
> > Done. Here is an interdiff (sorry, the tab characters are not
> > maintained by my mail client):
>
> heh. Thanks for switching the style; I should have emphasized that
> (after reflection) I found them equally good, I am used to one
> over the other more.
It seems marginally better to me. I also noticed a clean-up patch
going by that aggressively switched to test_must_be_empty wherever
possible: 
https://public-inbox.org/git/20180819215725.29001-1-szeder@gmail.com/

OTOH, if it were up to me I would have just gotten rid of
test_must_be_empty and used an existing function with the right
argument, like `test_cmp /dev/null` - but using some form consistently
is the most important, whatever it is.

>
> So if that is the only issue brought up, I would not even ask for a resend.
>
> Thanks,
> Stefan


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Matthew DeVore
On Mon, Aug 20, 2018 at 11:38 AM Stefan Beller  wrote:
>
> On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore  wrote:
> >
> > There were many instances in this file where it seemed like BUG would be
> > better, so I created a new commit before this one to switch them over. The
> > interdiff is below.
> >
> > BTW, why are there so many instances of "die" without "_"? I expect all
> > errors that may be caused by a user to be localized.
>
> Well, there is the porcelain layer to be consumed by a human user
> and the plumbing that is good for scripts. And in scripts you might want
> to grep for certain errors and react to that, so a non-localized error
> message makes the script possible to run in any localisation.
>
> The BUG is strictly for things that are due to Gits internals,
> not for problematic user input. Problematic user input
> definitely wants a die(...), and depending on the plumbing/porcelain
> layer it may need to be _(translatable).
Ah I see. Plumbing commands are not translated. Makes perfect sense now.
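
So, spelled out (the messages below are invented; only the three forms
matter):

	/* internal invariant violated: never user-facing, never translated */
	BUG("unknown filter_situation: %d", filter_situation);

	/* bad input reported by a porcelain command: translated */
	die(_("pathspec '%s' did not match any files"), pathspec);

	/* bad input reported by plumbing: kept stable for scripts to match */
	die("invalid object name '%s'", name);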

>
> I think BUG() would never go with translated strings.
>
> > I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t
> >
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index 8e3caf5bf..09b2b05d5 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
> >
> > switch (filter_situation) {
> > default:
> > -   die("unknown filter_situation");
> > -   return LOFR_ZERO;
> > +   BUG("unknown filter_situation: %d", filter_situation);
> >
> > case LOFS_BEGIN_TREE:
> > assert(obj->type == OBJ_TREE);
> > @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(
> >
> > switch (filter_situation) {
> > default:
> > -   die("unknown filter_situation");
> > -   return LOFR_ZERO;
> > +   BUG("unknown filter_situation: %d", filter_situation);
> >
> > case LOFS_BEGIN_TREE:
> > case LOFS_BLOB:
> > @@ -151,8 +149,7 @@ static enum list_objects_filter_result
> > filter_blobs_limit(
> >
> > switch (filter_situation) {
> > default:
> > -   die("unknown filter_situation");
> > -   return LOFR_ZERO;
> > +   BUG("unknown filter_situation: %d", filter_situation);
> >
> > case LOFS_BEGIN_TREE:
> > assert(obj->type == OBJ_TREE);
> > @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(
> >
> > switch (filter_situation) {
> > default:
> > -   die("unknown filter_situation");
> > -   return LOFR_ZERO;
> > +   BUG("unknown filter_situation: %d", filter_situation);
>
> Up until here we have just replaced the die by BUG in the default
> case of the state machine switch. (We need the default due to strict
> compile flags, but as filter_situation is an enum I thought we would not need it,
> as compilers are smart enough to see we got all values of the enum
> covered).
At the risk of going on a tangent, I assumed this was because enums
are really ints, and the "default" is there in case the enum somehow
got assigned to an int without a corresponding value. Either because
of a cast from an int that was out-of-range, or new values that were
obtained from arithmetic or bitwise operations on the declared enum
values, which created undeclared values.
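
A toy example of what I mean (not code from the series):

	enum filter_kind { FILTER_BLOB_NONE, FILTER_TREE_DEPTH };

	static const char *filter_name(enum filter_kind kind)
	{
		switch (kind) {
		case FILTER_BLOB_NONE:
			return "blob:none";
		case FILTER_TREE_DEPTH:
			return "tree:<depth>";
		default:
			/*
			 * Unreachable for well-behaved callers, but an enum
			 * is just an int underneath; a stray cast or bit
			 * operation can produce a value with no enumerator,
			 * and this arm turns that into a loud BUG().
			 */
			BUG("unknown filter_kind: %d", (int)kind);
		}
	}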

>
> I agree that keeping the defaults and having a BUG() is reasonable.
>
>
> >
> > case LOFS_BEGIN_TREE:
> > assert(obj->type == OBJ_TREE);
> > @@ -439,7 +435,7 @@ void *list_objects_filter__init(
> > assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
> >
> > if (filter_options->choice >= LOFC__COUNT)
> > -   die("invalid list-objects filter choice: %d",
> > +   BUG("invalid list-objects filter choice: %d",
> > filter_options->choice);
>
> This also makes sense, combined with the assert before, this looks like
> really defensive code.
>
> I think this patch is a good idea!
Thank you for your feedback :)

>
> Thanks,
> Stefan


What's cooking in git.git (Aug 2018, #05; Mon, 20)

2018-08-20 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

An early preview of the upcoming 2.19 release of Git has been tagged
as v2.19.0-rc0; before -rc1, I plan to merge three more topics to
'master' from 'next'.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/checkout-default-remote (2018-08-18) 1 commit
  (merged to 'next' on 2018-08-20 at 21fff26f74)
 + t2024: mark test using "checkout -p" with PERL prerequisite

 Test fix.


* ab/fetch-tags-noclobber (2018-08-13) 7 commits
  (merged to 'next' on 2018-08-15 at eca0ac8afa)
 + pull doc: fix a long-standing grammar error
 + fetch tests: correct a comment "remove it" -> "remove them"
 + push tests: assert re-pushing annotated tags
 + push tests: add more testing for forced tag pushing
 + push tests: fix logic error in "push" test assertion
 + push tests: remove redundant 'git push' invocation
 + fetch tests: change "Tag" test tag to "testTag"

 Test and doc clean-ups.


* ab/newhash-is-sha256 (2018-08-07) 2 commits
  (merged to 'next' on 2018-08-15 at 2e808d75d3)
 + doc hash-function-transition: pick SHA-256 as NewHash
 + doc hash-function-transition: note the lack of a changelog

 Documentation update.


* ab/submodule-relative-url-tests (2018-08-14) 1 commit
  (merged to 'next' on 2018-08-17 at 17b28d8262)
 + submodule: add more exhaustive up-path testing

 Test updates.


* ab/test-must-be-empty-for-master (2018-07-30) 1 commit
  (merged to 'next' on 2018-08-15 at 17652a77fb)
 + tests: make use of the test_must_be_empty function

 Test updates.


* ds/commit-graph-fsck (2018-08-13) 1 commit
  (merged to 'next' on 2018-08-15 at a2f82d3cbd)
 + t5318: use 'test_cmp_bin' to compare commit-graph files

 Test fix.


* en/incl-forward-decl (2018-08-15) 6 commits
  (merged to 'next' on 2018-08-17 at 04fc9c11bb)
 + Remove forward declaration of an enum
 + compat/precompose_utf8.h: use more common include guard style
 + urlmatch.h: fix include guard
 + Move definition of enum branch_track from cache.h to branch.h
 + alloc: make allocate_alloc_state and clear_alloc_state more consistent
 + Add missing includes and forward declarations

 Code hygiene improvement for the header files.


* en/t7406-fixes (2018-08-08) 5 commits
  (merged to 'next' on 2018-08-15 at c6a740d828)
 + t7406: avoid using test_must_fail for commands other than git
 + t7406: prefer test_* helper functions to test -[feds]
 + t7406: avoid having git commands upstream of a pipe
 + t7406: simplify by using diff --name-only instead of diff --raw
 + t7406: fix call that was failing for the wrong reason

 Test fixes.


* en/update-index-doc (2018-08-08) 1 commit
  (merged to 'next' on 2018-08-15 at 3ee0ae14dc)
 + git-update-index.txt: reword possibly confusing example

 Doc update.


* es/chain-lint-more (2018-08-13) 6 commits
  (merged to 'next' on 2018-08-15 at bb5150ee96)
 + chainlint: add test of pathological case which triggered false positive
 + chainlint: recognize multi-line quoted strings more robustly
 + chainlint: let here-doc and multi-line string commence on same line
 + chainlint: recognize multi-line $(...) when command cuddled with "$("
 + chainlint: match 'quoted' here-doc tags
 + chainlint: match arbitrary here-docs tags rather than hard-coded names

 Improve built-in facility to catch broken &&-chain in the tests.


* hn/highlight-sideband-keywords (2018-08-18) 2 commits
  (merged to 'next' on 2018-08-20 at ec6f953f8c)
 + sideband: do not read beyond the end of input
  (merged to 'next' on 2018-08-15 at f8945f3be5)
 + sideband: highlight keywords in remote sideband output

 The sideband code learned to optionally paint selected keywords at
 the beginning of incoming lines on the receiving end.


* jc/gpg-status (2018-08-09) 1 commit
  (merged to 'next' on 2018-08-15 at 824781761a)
 + gpg-interface: propagate exit status from gpg back to the callers

 "git verify-tag" and "git verify-commit" have been taught to use
 the exit status of underlying "gpg --verify" to signal bad or
 untrusted signature they found.


* jc/update-index-doc (2018-08-08) 1 commit
  (merged to 'next' on 2018-08-15 at 055994ccca)
 + update-index: there no longer is `apply --index-info`

 Doc update.


* jh/partial-clone-doc (2018-08-15) 1 commit
  (merged to 'next' on 2018-08-15 at cf09e8be6a)
 + partial-clone: render design doc using asciidoc

 Doc updates.


* jk/for-each-object-iteration (2018-08-14) 11 commits
  (merged to 'next' on 2018-08-15 at e2558810ff)
 + for_each_*_object: move declarations to object-store.h
 + cat-file: use a single strbuf for all output
 + cat-file: split batch "buf" into two 

Re: [PATCH] submodule: add config for where gitdirs are located

2018-08-20 Thread Junio C Hamano
Brandon Williams  writes:

> Introduce the config "submodule.<name>.gitdirpath" which is used to
> indicate where a submodule's gitdir is located inside of a repository's
> "modules" directory.
>
> Signed-off-by: Brandon Williams 
> ---
>
> Maybe something like this on top?  Do you think we should disallow
> "../" in this config, even though it is a repository local
> configuration and not shipped in .gitmodules?

Sounds sensible to start strict and loosen later if/when necessary.

If we disallow "../", shouldn't we also reject an absolute path
(meaning, you can only specify a subdirectory of $GIT_DIR/modules/
in the super-project)?
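
Something along these lines, perhaps (a sketch of the check being
discussed; the function name is made up):

	static int validate_submodule_gitdir_path(const char *path)
	{
		if (is_absolute_path(path))
			return error("gitdirpath must be relative: %s", path);
		/* sketch: a real check would normalize the path first */
		if (!strcmp(path, "..") || strstr(path, "../"))
			return error("gitdirpath must stay inside "
				     "$GIT_DIR/modules/: %s", path);
		return 0;
	}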


Re: [PATCH 8/9] midx: test a few commands that use get_all_packs

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:
>
> The new get_all_packs() method exposed the packfiles coverede by

s/coverede/covered/


Re: [PATCH 7/9] treewide: use get_all_packs

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:
>
> There are many places in the codebase that want to iterate over
> all packfiles known to Git. The purposes are wide-ranging, and
> those that can take advantage of the multi-pack-index already
> do. So, use get_all_packs() instead of get_packed_git() to be
> sure we are iterating over all packfiles.

So get_packed_git should not be used any further?
Do we want to annotate it as deprecated, to be deleted
in the future? Or even remove it as part of this series
(that might be an issue with other series in flight).


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-20 Thread Jeff King
On Mon, Aug 20, 2018 at 02:03:53PM -0700, Junio C Hamano wrote:

> > So taking all of those options into account, what I ended up
> > with is a separate list of "external bases" that are not
> > part of the main packing list. Each delta entry that points
> > to an external base has a single-bit flag to do so; we have a
> > little breathing room in the bitfield section of
> > object_entry.
> >
> > This lets us limit the change primarily to the oe_delta()
> > and oe_set_delta_ext() functions. And as a bonus, most of
> > the rest of the code does not consider these dummy entries
> > at all, saving both runtime CPU and code complexity.
> 
> Tricky ;-)
> 
> I wonder if we can move the preferred base objects that we are not
> going to send also off of the "main packing list" to this new
> mechanism?

That gets trickier. Obviously we don't need to actually send them, so
they're ignored during the write phase. But we do put them into the
delta-search window along with the other objects, and we care about
things like their size and type (which our dummy objects do not even
have; they really are just placeholders for the oids).

So I'd guess that we'd end up with three arrays of entries:

 - objects we want to send, with full details including delta bases

 - preferred base objects with _some_ details

 - dummy thin objects for reused deltas

The trick is that I don't know which details fall under the "some" in
the second class. That could be extracted from a careful reading of the
code, but it seems like it has a high chance of regression (and I'm not
sure there's a huge upside, given that the code is already written
as-is).

I also worried about the same thing with these new reused dummy objects,
but I think by now I'd have shaken out any problems in production use.

> > +static struct bitmap_index *bitmap_git;
> > ...
> > +static int thin = 0;
> 
> Please trust what BSS will do to your static vars.

Heh. I copied the non-static declaration, but didn't think about
shortening it. Looks like there were one or two minor comments, so I'll
do a re-roll that addresses them.

-Peff


Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-20 Thread Antonio Ospite
On Tue, 14 Aug 2018 13:36:17 -0700
Junio C Hamano  wrote:

> Antonio Ospite  writes:
> 
> > /* Equivalent to ACTION_SET in builtin/config.c */
> > -   if (argc == 3)
> > +   if (argc == 3) {
> > +   struct object_id oid;
> > +
> > +   /*
> > +* If the .gitmodules file is not in the working tree but it
> > +* is in the current branch, stop, as writing new values (and
> > +* staging them) would blindly overwrite ALL the old content.
> 
> Hmph, "the file is missing" certainly is a condition we would want
> to notice, but wouldn't we in general want to prevent us from
> overwriting any local modification, where "missing" is merely a very
> special case of local modification?  I am wondering if we would want
> to stop if .gitmodules file exists both in the working tree and in
> the index, and the contents of them differ, or something like that.
>

TTBOMK checking the index status (with something like
is_staging_gitmodules_ok()) when the .gitmodules file *exists* in the
working tree would break calling "git submodule add" multiple times
before committing the changes.

> > diff --git a/git-submodule.sh b/git-submodule.sh
> > index ff258e2e8c..b1cb187227 100755
> > --- a/git-submodule.sh
> > +++ b/git-submodule.sh
> > @@ -159,6 +159,13 @@ cmd_add()
> > shift
> > done
> >  
> > +   # For more details about this check, see
> > +   # builtin/submodule--helper.c::module_config()
> > +   if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1
> 
> No SP between redirection '>' and its target '/dev/null'.
>
> More importantly, I think it is better to add a submodule--helper
> subcommand that exposes the check in question, as the code is
> already written ;-) That approach will guarantee that the logic and
> the message stay the same between here and in the C code.  Then you
> do not even need these two line comment.
>

Yeah I anticipated this concern in the patch annotation, but I was
hoping that it would be OK to have this as a followup change.

I guess I can do it for v4 instead.

Does the interface suggested in the patch annotation sound acceptable?

To recap:

  - add an is_gitmodules_safely_writeable() helper;
  - expose a "submodule--helper config --is-safely-writeable"
subcommand for git-submodule.sh to use.
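
Concretely, the helper end of that could be as small as (sketch only,
using the names from the recap; the body is just an illustration):

	static int is_gitmodules_safely_writeable(void)
	{
		struct object_id oid;

		/* present in the working tree: fine to rewrite and stage */
		if (file_exists(GITMODULES_FILE))
			return 1;
		/*
		 * Not in the working tree: refuse if the current branch
		 * carries a .gitmodules we would silently clobber.
		 */
		return get_oid("HEAD:.gitmodules", &oid) < 0;
	}

with "submodule--helper config --is-safely-writeable" reporting the
result through its exit code for git-submodule.sh to test.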

[...]
> > @@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct 
> > repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository 
> > *repo, void *data)
> >  {
> > if (repo->worktree) {
> > -   char *file = repo_worktree_path(repo, GITMODULES_FILE);
> > -   git_config_from_file(fn, file, data);
> > +   struct git_config_source config_source = { 0 };
> > +   const struct config_options opts = { 0 };
> > +   struct object_id oid;
> > +   char *file;
> > +
> > +   file = repo_worktree_path(repo, GITMODULES_FILE);
> > +   if (file_exists(file))
> > +   config_source.file = file;
> > +   else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> > +   config_source.blob = GITMODULES_HEAD;
> 
> What is the reason why we fall back directly to HEAD when working
> tree file does not exist?  I thought that our usual fallback was to
> the version in the index for other things like .gitignore/attribute
> and this codepath look like an oddball.  Are you trying to handle
> the case where we are in a bare repository without any file checked
> out (and there is not even the index)?
>

My use case is about *reading* .gitmodules when it's ignored in a sparse
checkout, in this scenario there are usually no staged changes
to .gitmodules, so I basically just didn't care about the index.

Would using ":.gitmodules" instead of "HEAD:.gitmodules" be enough?

By reading man  gitrevisions(7) and after a quick test with "git
cat-file blob :.gitmodules" it looks like this would be more in line
with your suggestion, still covering my use case.
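
In other words (a sketch of the fallback under discussion, mirroring the
hunk quoted above):

	file = repo_worktree_path(repo, GITMODULES_FILE);
	if (file_exists(file))
		config_source.file = file;
	else if (get_oid(":.gitmodules", &oid) >= 0)
		/* fall back to the index rather than HEAD */
		config_source.blob = ":.gitmodules";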

If so, what name should I use instead of GITMODULES_HEAD?
GITMODULES_BLOB is already taken for something different, maybe
GITMODULES_REF or GITMODULES_OBJECT?

> > diff --git a/t/t7416-submodule-sparse-gitmodules.sh 
> > b/t/t7416-submodule-sparse-gitmodules.sh
> > new file mode 100755
> > index 00..5341e9b012
> > --- /dev/null
> > +++ b/t/t7416-submodule-sparse-gitmodules.sh
> > @@ -0,0 +1,90 @@
> > +#!/bin/sh
> > +#
> > +# Copyright (C) 2018  Antonio Ospite 
> > +#
> > +
> > +test_description='Test reading/writing .gitmodules when not in the working 
> > tree
> > +
> > +This test verifies that, when .gitmodules is in the current branch but is 
> > not
> > +in the working tree reading from it still works but writing to it does not.
> > +
> > +The test setup uses a sparse checkout, however the same scenario can be 
> > set up
> > +also by committing .gitmodules and then just removing it from the 
> > filesystem.
> > +'
> > +
> > +. ./test-lib.sh
> > +

Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Junio C Hamano
Stefan Beller  writes:

>>  I find the design a bit iffy in that our usual "missing in the
>>  working tree?  let's use the latest blob" fallback is to take it
>>  from the index, not from the HEAD.
>
> I am not sure; why does it feel iffy?

Why doesn't it?  There is no justification why it has to be
different from others and read from HEAD bypassing index.




Re: [PATCH 3/9] midx: mark bad packed objects

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:
>
> When an object fails to decompress from a pack-file, we mark the object
> as 'bad' so we can retry with a different copy of the object (if such a
> copy exists).
>
> Before now, the multi-pack-index did not update the bad objects list for
> the pack-files it contains, and we did not check the bad objects list
> when reading an object. Now, do both.

This sounds like a bug fix unlike patches 1 & 2 that sound like
feature work(2) or making code more readable(1).
(After studying the code, this doesn't sound like a bug fix any more,
but a safety thing)

Is it worth having this on a separate track coming in
faster than the rest of this series?

>
> Signed-off-by: Derrick Stolee 
> ---
>  midx.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index 6824acf5f8..7fa75a37a3 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -280,6 +280,16 @@ static int nth_midxed_pack_entry(struct multi_pack_index 
> *m, struct pack_entry *
> if (!is_pack_valid(p))
> return 0;
>
> +   if (p->num_bad_objects) {
> +   uint32_t i;

Is there a reason that i needs to be 32 bits?
Would size_t (for iterating) or 'int' (as a default
like in many for loops) be ok, too?


Re: [PATCH 2/9] multi-pack-index: store local property

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:
>
> A pack-file is 'local' if it is stored within the usual object
> directory. If it is stored in an alternate, it is non-local.
>
> Pack-files are stored using a 'pack_local' member in the packed_git
> struct. Add a similar 'local' member to the multi_pack_index struct
> and 'local' parameters to the methods that load and prepare multi-
> pack-indexes.

Going by that example, maybe we'd want to have it be the first bit
of a bitfield, c.f.

unsigned pack_local:1,
pack_keep:1,
pack_keep_in_core:1,
freshened:1,
do_not_close:1,
pack_promisor:1;

in the definition of packed_git.

Is there value in documenting both packfiles as well as midx variables?
(When going with the bitfield example, it may be a bit more worthwhile,
as the commit message is harder to find, as it will need repeated
blaming in the future, if there is a commit on top that adds another
bit to the field; I am unsure, but I would lean towards documentation
as it becomes a bit unclear: Is the local flag for the midx file that
is read, or rather for the underlying packs? What exactly is local? )
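
For instance (a sketch, not the actual struct in midx.h):

	struct multi_pack_index {
		/* ... existing members ... */

		/*
		 * Set when this midx lives in the repository's own object
		 * directory; unset when it was found in an alternate.  It
		 * describes the midx (and thus the packs it covers), not
		 * any individual pack's pack_local flag.
		 */
		unsigned local : 1;
	};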

AFAICT this patch alone doesn't have any effect, yet, as it only pipes
the flag thru, is that worth mentioning or is that obvious?

Thanks,
Stefan


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-20 Thread Junio C Hamano
Jeff King  writes:

> And that's exactly what this patch does: when we're
> considering whether to reuse an on-disk delta, if bitmaps
> tell us the other side has the object (and we're making a
> thin-pack), then we reuse it.

That's really the natural extension and logical consequence of the
"reuse existing deltas" mechanism from Feb 2006 ;-)

> So taking all of those options into account, what I ended up
> with is a separate list of "external bases" that are not
> part of the main packing list. Each delta entry that points
> to an external base has a single-bit flag to do so; we have a
> little breathing room in the bitfield section of
> object_entry.
>
> This lets us limit the change primarily to the oe_delta()
> and oe_set_delta_ext() functions. And as a bonus, most of
> the rest of the code does not consider these dummy entries
> at all, saving both runtime CPU and code complexity.

Tricky ;-)

I wonder if we can move the preferred base objects that we are not
going to send also off of the "main packing list" to this new
mechanism?

> +static struct bitmap_index *bitmap_git;
> ...
> +static int thin = 0;

Please trust what BSS will do to your static vars.


[RFC] Git enumerations

2018-08-20 Thread Julian Ganz
Hi list,

I'm a co-author of the git-dit tool [1] discussed in the thread

git-bug: Distributed bug tracker embedded in git

on this mailing list. Partly motivated by Jonathan Nieder's call for
more collaboration, I'd like to request comments on a functionality
which I originally planned on introducing as a git-dit internal
feature.


# Problem statement:

Especially with tools like git-dit or git-bug, we often list OIDs,
usually of commits, and additional data. For example, `git dit list -a`
may print something like the following (`-a` is for abbreviated ids):

8640a6b (Mon Dec 19 17:11:14 2016) Implement "subscribe" functionality
ae058fd (Fri Dec 16 10:35:40 2016) Add features to dit-{list,show} to show 
statuses

For viewing the details of the first issue, you would invoke the "show"
subcommand with the respective (possibly abbreviated) id. You generally
either have to re-type or copy this id. The first option tends to
get on one's nerves, since the IDs have to be unambiguous for the
repository. The second may disturb your workflow (since you have to
grab your mouse) or may not be possible (e.g. plain TTY).

So it would be nice if you could just identify them via a plain number.
For the output of `git log --oneline --decorate` and similar commands,
we already have something similar: each commit shown can generally be
addressed via some-ref~some-number. So you can, more or less, inspect
those without copying IDs. For listing issues with git-dit, it's not
that easy, since the issues are generally not connected.


# Solution using symbolic refs

When printing a list of disconnected objects addressable through OIDs,
we would generate an enumeration on the fly. For each entry, we then
store a symbolic ref reflecting the enumeration in some namespace
(e.g. "refs/enumeration/1", "refs/enumeration/2", ...). Obviously, each
symbolic ref would point to the respective entry's OID. Essentially, we
have persisted the list printed by the first command.

Other subcommands of the tool would contain some minimal logic for
detecting and choosing such an enumeration (e.g. "2") in the place of
an OID.

This may have additional benefits for some tools, since we can also
review a previously printed list. This is especially interesting if
generating the list is costly.


# Additional tweaks

Instead of plain numbers, the symbolic refs could also contain an
optional prefix. `git dit list`, for example could generate
enumerations like "i1", "i2", ... while `git dit show` (which prints
"messages") would generate "m1", "m2", "m3", ... This would support
workflows like:

 * view the list of open issues
 * inspect an issue
 * show that (exact) list of issues again
 * inspect the next, ...

Alternatively, one could start using namespaces within
"refs/enumeration", but imo this would hurt the usability in most cases.


# Integration

As stated initially, I originally planned on introducing this as an
internal feature to git-dit. However, I realized that other tools may
also benefit from such enumerations. Hence, I'm thinking about
providing it via a third-party library as well as a minimal tool/subcommand
for managing (e.g. clearing) enumerations.


Regards,
Julian

[1] https://github.com/neithernut/git-dit


Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-20 Thread Derrick Stolee

On 8/20/2018 3:37 PM, Stefan Beller wrote:

On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee  wrote:

One unresolved issue with the commit-graph feature is that it can cause
issues when combined with replace objects, commit grafts, or shallow
clones. These are not 100% incompatible, as one could be reasonably
successful writing a commit-graph after replacing some objects and not
have issues. The problems happen when commits that are already in the
commit-graph file are replaced, or when git is run with the
`--no-replace-objects` option; this can cause incorrect parents or
incorrect generation numbers. Similar things occur with commit grafts
and shallow clones, especially when running `git fetch --unshallow` in a
shallow repo.

Instead of trying (and probably failing) to make these features work
together, default to making the commit-graph feature unavailable in these
situations. Create a new method 'commit_graph_compatible(r)' that checks
if the repository 'r' has any of these features enabled.

CHANGES IN V2:

* The first two commits regarding the ref iterators are unchanged, despite
   a lot of discussion on the subject [1].

* I included Peff's changes in jk/core-use-replace-refs, changing the base
   commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge
   that brought that topic into 'master').

* I fixed the tests for the interactions with the graft feature.

Because of the change of base, it is hard to provide a side-by-side diff
from v1.

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/
 Stefan's response recommending we keep the first two commits.


After reviewing my own patches, I flipped again (Sorry!) and would
rather not see my patches be merged, but the very original solution
by you, that you proposed at [1]. That said, I will not insist on it, and
if this is merged as is, we can fix it up later.

With that said, I just read through the remaining patches, I think
they are a valuable addition to Git and could be merged as-is.

[1] 
https://github.com/gitgitgadget/git/pull/11/commits/300db80140dacc927db0d46c804ca0ef4dcc1be1

Thanks,
Stefan


Just to keep things as on-list as possible, here is the patch for the
commit linked above. It would replace patches 1 & 2 from this series.


Junio: I can send a v3 that includes this commit if you need it, or if 
there are other edits to be made in this series.


commit 300db80140dacc927db0d46c804ca0ef4dcc1be1
Author: Derrick Stolee 
Date:   Fri Jul 20 15:39:06 2018 -0400

    replace-objects: use arbitrary repositories

    This is the smallest possible change that makes prepare_replace_objects
    work properly with arbitrary repositories. By supplying the repository
    as the cb_data, we do not need to modify any code in the ref iterator
    logic. We will likely want to do a full replacement of the ref iterator
    logic to provide a repository struct as a concrete parameter.

    Signed-off-by: Derrick Stolee 

diff --git a/replace-object.c b/replace-object.c
index 801b5c1678..e99fcd1ff6 100644
--- a/replace-object.c
+++ b/replace-object.c
@@ -14,6 +14,7 @@ static int register_replace_ref(const char *refname,
    const char *slash = strrchr(refname, '/');
    const char *hash = slash ? slash + 1 : refname;
    struct replace_object *repl_obj = xmalloc(sizeof(*repl_obj));
+   struct repository *r = (struct repository *)cb_data;

    if (get_oid_hex(hash, &repl_obj->original.oid)) {
    free(repl_obj);
@@ -25,7 +26,7 @@ static int register_replace_ref(const char *refname,
    oidcpy(&repl_obj->replacement, oid);

    /* Register new object */
-   if (oidmap_put(the_repository->objects->replace_map, repl_obj))
+   if (oidmap_put(r->objects->replace_map, repl_obj))
    die("duplicate replace ref: %s", refname);

    return 0;
@@ -40,7 +41,7 @@ static void prepare_replace_object(struct repository *r)
    xmalloc(sizeof(*r->objects->replace_map));
    oidmap_init(r->objects->replace_map, 0);

-   for_each_replace_ref(r, register_replace_ref, NULL);
+   for_each_replace_ref(r, register_replace_ref, r);
 }

 /* We allow "recursive" replacement. Only within reason, though */



Re: [PATCH 3/7] submodule: is_submodule_active to differentiate between new and old mode

2018-08-20 Thread Stefan Beller
On Thu, Aug 16, 2018 at 10:37 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > The change a086f921a72 (submodule: decouple url and submodule interest,
> > 2017-03-17) enables us to do more than originally thought.
> > As the url setting was used both to actually set the url where to
> > obtain the submodule from, as well as used as a boolean flag later
> > to see if it was active, we would need to keep the url around.
> >
> > Now that submodules can be activated using the submodule.[<name>.]active
> > setting, we could remove the url if the submodule is activated via that
> > setting.
>
> ... as the cloned submodule repository has $GIT_DIR/config which
> knows its own origin, we do not need to record URL in superproject's
> $GIT_DIR/config.  Back before we started using a directory under
> $GIT_DIR/modules/ as a more permanent location to store submodule,
> and point at it by a gitdir file in $path/.git to allow removal of a
> submodule from the working tree of the superproject without having
> to reclone when resurrecting the same submodule, it may have been
> useful to keep custom URL somewhere in the superproject, but that no
> longer is the case.
>
> > In preparation to do so, pave the way by providing an easy way to see
> > if a submodule is considered active via the new .active setting or via
> > the old .url setting.
>
> Makes sense.
>
> >
> > Signed-off-by: Stefan Beller 
> > ---
> >  submodule.c | 5 +
> >  submodule.h | 6 ++
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/submodule.c b/submodule.c
> > index 6e14547e9e0..d56350ed094 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -221,9 +221,6 @@ int 
> > option_parse_recurse_submodules_worktree_updater(const struct option *opt,
> >   return 0;
> >  }
> >
> > -/*
> > - * Determine if a submodule has been initialized at a given 'path'
> > - */
> >  int is_submodule_active(struct repository *repo, const char *path)
> >  {
> >   int ret = 0;
> > @@ -267,7 +264,7 @@ int is_submodule_active(struct repository *repo, const 
> > char *path)
> >
> >   /* fallback to checking if the URL is set */
> >   key = xstrfmt("submodule.%s.url", module->name);
> > - ret = !repo_config_get_string(repo, key, &value);
> > + ret = !repo_config_get_string(repo, key, &value) ? 2 : 0;
> >
> >   free(value);
> >   free(key);
> > diff --git a/submodule.h b/submodule.h
> > index 4644683e6cb..bfc070e4629 100644
> > --- a/submodule.h
> > +++ b/submodule.h
> > @@ -45,6 +45,12 @@ extern int git_default_submodule_config(const char *var, 
> > const char *value, void
> >  struct option;
> >  int option_parse_recurse_submodules_worktree_updater(const struct option 
> > *opt,
> >const char *arg, int 
> > unset);
> > +/*
> > + * Determine if a submodule has been initialized at a given 'path'.
> > + * Returns 1 if it is considered active via the submodule.[<name>.]active
> > + * setting, or return 2 if it is active via the older submodule.url 
> > setting.
> > + */
> > +#define SUBMODULE_ACTIVE_VIA_URL 2
> >  extern int is_submodule_active(struct repository *repo, const char *path);
> >  /*
> >   * Determine if a submodule has been populated at a given 'path' by 
> > checking if
>
> This change assumes that all the existing return sites in the
> is_submodule_active() function signals true with 1 (or at least some
> value that is different from 2).

Yes.

> So I think this patch is insufficient, and needs to at least change
> the "submodule.active" codepath to return !!ret; otherwise, a caller
> that now expects 0 (not active), 1 (active but can lose URL) and 2
> (active and the presence of URL makes it so) will be confused when
> one of the MATCHED_* constants from dir.h comes back.

Yes.

I'll resend when appropriate.

Thanks,
Stefan


Re: [PATCH 7/7] submodule--helper: introduce new update-module-mode helper

2018-08-20 Thread Stefan Beller
On Sat, Aug 18, 2018 at 9:11 AM Duy Nguyen  wrote:
>
> On Tue, Aug 14, 2018 at 12:45 AM Stefan Beller  wrote:
> > +static int module_update_module_mode(int argc, const char **argv, const 
> > char *prefix)
> > +{
> > +   const char *path, *update = NULL;
> > +   int just_cloned;
> > +   struct submodule_update_strategy update_strategy = { .type = 
> > SM_UPDATE_CHECKOUT };
> > +
> > +   if (argc < 3 || argc > 4)
> > +   die("submodule--helper update-module-clone expects 
> >   []");
>
> Maybe _() ?

I would rather not, as the submodule--helper is "internal only" and these die()
calls could be clarified via

#define BUG_IN_CALLING_SH(x) die(x)

After the conversion to C is done, all these submodule helpers would go away,
so I'd not burden the translators too much?
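
i.e. something along these lines (a sketch only; the helper name and the
message text are made up for illustration):

#include "git-compat-util.h"

/*
 * Sketch: these die() calls can only trigger when git-submodule.sh
 * passes bogus arguments, so mark them clearly and leave them
 * untranslated.
 */
#define BUG_IN_CALLING_SH(x) die(x)

static void check_update_module_mode_args(int argc)
{
        if (argc < 3 || argc > 4)
                BUG_IN_CALLING_SH("update-module-mode expects <just-cloned> <path> [<update>]");
}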

Thanks,
Stefan


Re: [PATCH v7 09/16] fetch: support filters

2018-08-20 Thread Jeff Hostetler




On 8/19/2018 7:24 AM, Duy Nguyen wrote:

On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler  wrote:


From: Jeff Hostetler 

Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan 
---
  builtin/fetch.c   | 23 +--
  connected.c   |  2 ++
  remote-curl.c |  6 ++
  t/t5500-fetch-pack.sh | 36 
  4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..14aab71 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
  #include "argv-array.h"
  #include "utf8.h"
  #include "packfile.h"
+#include "list-objects-filter-options.h"

  static const char * const builtin_fetch_usage[] = {
 N_("git fetch [] [ [...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = 
RECURSE_SUBMODULES_ON_DEMAND;
  static int shown_url = 0;
  static int refmap_alloc, refmap_nr;
  static const char **refmap_array;
+static struct list_objects_filter_options filter_options;

  static int git_fetch_config(const char *k, const char *v, void *cb)
  {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
 TRANSPORT_FAMILY_IPV4),
+   OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 TRANSPORT_FAMILY_IPV6),
+   OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),


Documentation is missing. Please add something to git-fetch.txt (or
fetch-options.txt) about this option. I would make a patch but I don't
know enough about this to write and I'm in the middle of something
else.



Documentation for --filter= (and --no-filter) was added to
rev-list-options.txt.  The opening paragraph talks about --objects
which would need to be omitted for fetch and clone, but the rest
of that text is relevant.

I'll push up a patch shortly.

Thanks
Jeff


Re: [PATCH 1/3] diff.c: add --output-indicator-{new, old, context}

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 12:31 PM Johannes Schindelin
 wrote:
>
> Hi Stefan,
>
> On Fri, 17 Aug 2018, Stefan Beller wrote:
>
> > This will prove useful in range-diff in a later patch as we will be able
> > to differentiate between adding a new file (that line is starting with
> > +++ and then the file name) and regular new lines.
> >
> > It could also be useful for experimentation in new patch formats, i.e.
> > we could teach git to emit moved lines with lines other than +/-.
>
> Thanks.
>
> > diff --git a/diff.c b/diff.c
> > index c5c7739ce34..03486c35b75 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct 
> > diff_options *o,
> >   else if (c == '-')
> >   set = diff_get_color_opt(o, DIFF_FILE_OLD);
> >   }
> > - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len,
> ^
> Here we already pass `o`... so...
>
> > + emit_line_ws_markup(o, set_sign, set, reset,
> > + 
> > o->output_indicators[OUTPUT_INDICATOR_CONTEXT],
> 
> ^^^
> ... here, we could simply pass `OUTPUT_INDICATOR_CONTEXT` and let the
> callee look it up in `o->output_indicators[]`...
>
> I read all three patches and did not see a reason why we could not
> simplify the code that way.
>
> Other than that: great!

Thanks!

I considered it, but was put off by the (small) effort of yet another
diff refactoring.
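
For reference, the calling convention Dscho suggests boils down to the
toy sketch below (stand-in types and names, not actual diff.c code):

#include <stdio.h>

#define OUTPUT_INDICATOR_NEW 0
#define OUTPUT_INDICATOR_OLD 1
#define OUTPUT_INDICATOR_CONTEXT 2

struct toy_options {
        char output_indicators[3];
};

/* The callee receives the index and does the table lookup itself. */
static void emit_line(struct toy_options *o, int indicator, const char *line)
{
        printf("%c%s\n", o->output_indicators[indicator], line);
}

int main(void)
{
        struct toy_options o = { { '+', '-', ' ' } };

        emit_line(&o, OUTPUT_INDICATOR_CONTEXT, "unchanged line");
        emit_line(&o, OUTPUT_INDICATOR_NEW, "added line");
        emit_line(&o, OUTPUT_INDICATOR_OLD, "removed line");
        return 0;
}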

I'll include it in a resend if a resend is needed; otherwise
I would suggest making it a patch on top?

Thanks,
Stefan


Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee  wrote:
>
> One unresolved issue with the commit-graph feature is that it can cause
> issues when combined with replace objects, commit grafts, or shallow
> clones. These are not 100% incompatible, as one could be reasonably
> successful writing a commit-graph after replacing some objects and not
> have issues. The problems happen when commits that are already in the
> commit-graph file are replaced, or when git is run with the
> `--no-replace-objects` option; this can cause incorrect parents or
> incorrect generation numbers. Similar things occur with commit grafts
> and shallow clones, especially when running `git fetch --unshallow` in a
> shallow repo.
>
> Instead of trying (and probably failing) to make these features work
> together, default to making the commit-graph feature unavailable in these
> situations. Create a new method 'commit_graph_compatible(r)' that checks
> if the repository 'r' has any of these features enabled.
>
> CHANGES IN V2:
>
> * The first two commits regarding the ref iterators are unchanged, despite
>   a lot of discussion on the subject [1].
>
> * I included Peff's changes in jk/core-use-replace-refs, changing the base
>   commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge
>   that brought that topic into 'master').
>
> * I fixed the tests for the interactions with the graft feature.
>
> Because of the change of base, it is hard to provide a side-by-side diff
> from v1.
>
> Thanks,
> -Stolee
>
> [1] 
> https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/
> Stefan's response recommending we keep the first two commits.
>

After reviewing my own patches, I flipped again (sorry!) and would
rather not see my patches merged, but instead the very original solution
by you, the one you proposed at [1]. That said, I will not insist on it, and
if this is merged as-is, we can fix it up later.

With that said, I just read through the remaining patches, I think
they are a valuable addition to Git and could be merged as-is.

[1] 
https://github.com/gitgitgadget/git/pull/11/commits/300db80140dacc927db0d46c804ca0ef4dcc1be1

Thanks,
Stefan


pw/rebase-i-author-script-fix, was Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Johannes Schindelin
Team,

On Mon, 20 Aug 2018, Phillip Wood wrote:

> On 17/08/2018 23:44, Junio C Hamano wrote:
> > Here are the topics that have been cooking.  Commits prefixed with
> > '-' are only in 'pu' (proposed updates) while commits prefixed with
> > '+' are in 'next'.  The ones marked with '.' do not appear in any of
> > the integration branches, but I am still holding onto them.
> 
> > * pw/rebase-i-author-script-fix (2018-08-07) 2 commits
> >  - sequencer: fix quoting in write_author_script
> >  - sequencer: handle errors from read_author_ident()
> > 
> >  Recent "git rebase -i" update started to write bogusly formatted
> >  author-script, with a matching broken reading code.  These are
> >  being fixed.
> > 
> >  Undecided.
> >  Is it the list consensus to favor this "with extra code, read the
> >  script written by bad writer" approach?
> 
> I think there was agreement between myself and Eric on the last version,
> I'm not sure anyone else has expressed an opinion. The problem with
> fixing the quoting without any backwards compatibility is that if git is
> upgraded while a rebase is stopped read_author_script() will happily use
> the broken quoting to create a corrupted author name in the new commit
> if the name contains "'".
> 
> The compatibility code in the latest version relies on the missing "'"
> at the end of the GIT_AUTHOR_DATE line which is fixed by
> es/rebase-i-author-script-fix which is now in master. If there is a
> release with es/rebase-i-author-script-fix but not
> pw/rebase-i-author-script-fix we'll have to rethink as the detection
> won't be reliable. I have a branch that fixes read_author_script() to use
> sq_dequote() at
> https://github.com/phillipwood/git/commits/wip/fix-author-script. At the
> moment it has compatibility with broken quoting, but I could strip that
> out and then sq_dequote() will return an error with the broken quoting
> and the user would have to restart the rebase. So one option is to drop
> this series and wait for me to finish the improved solution next month.

Having thought about it, I am now convinced that it would be overkill to
cater to "upgrade in the middle of a rebase". I think we should drop that
part, as it uglifies the code rather a lot, and the impact is not worth
the effort IMHO.

Ciao,
Johannes


Re: [PATCH 1/3] diff.c: add --output-indicator-{new, old, context}

2018-08-20 Thread Johannes Schindelin
Hi Stefan,

On Fri, 17 Aug 2018, Stefan Beller wrote:

> This will prove useful in range-diff in a later patch as we will be able
> to differentiate between adding a new file (that line is starting with
> +++ and then the file name) and regular new lines.
> 
> It could also be useful for experimentation in new patch formats, i.e.
> we could teach git to emit moved lines with lines other than +/-.

Thanks.

> diff --git a/diff.c b/diff.c
> index c5c7739ce34..03486c35b75 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   else if (c == '-')
>   set = diff_get_color_opt(o, DIFF_FILE_OLD);
>   }
> - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len,
^
Here we already pass `o`... so...

> + emit_line_ws_markup(o, set_sign, set, reset,
> + 
> o->output_indicators[OUTPUT_INDICATOR_CONTEXT],

^^^
... here, we could simply pass `OUTPUT_INDICATOR_CONTEXT` and let the
callee look it up in `o->output_indicators[]`...

I read all three patches and did not see a reason why we could not
simplify the code that way.

Other than that: great!

Thank you,
Dscho

> + line, len,
>   flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
>   break;
>   case DIFF_SYMBOL_PLUS:
> @@ -1324,7 +1326,9 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   set = diff_get_color_opt(o, DIFF_CONTEXT_BOLD);
>   flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK;
>   }
> - emit_line_ws_markup(o, set_sign, set, reset, '+', line, len,
> + emit_line_ws_markup(o, set_sign, set, reset,
> + o->output_indicators[OUTPUT_INDICATOR_NEW],
> + line, len,
>   flags & DIFF_SYMBOL_CONTENT_WS_MASK,
>   flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
>   break;
> @@ -1367,7 +1371,9 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   else
>   set = diff_get_color_opt(o, DIFF_CONTEXT_DIM);
>   }
> - emit_line_ws_markup(o, set_sign, set, reset, '-', line, len,
> + emit_line_ws_markup(o, set_sign, set, reset,
> + o->output_indicators[OUTPUT_INDICATOR_OLD],
> + line, len,
>   flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
>   break;
>   case DIFF_SYMBOL_WORDS_PORCELAIN:
> @@ -4382,6 +4388,9 @@ void diff_setup(struct diff_options *options)
>  
>   options->file = stdout;
>  
> + options->output_indicators[OUTPUT_INDICATOR_NEW] = '+';
> + options->output_indicators[OUTPUT_INDICATOR_OLD] = '-';
> + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = ' ';
>   options->abbrev = DEFAULT_ABBREV;
>   options->line_termination = '\n';
>   options->break_opt = -1;
> @@ -4869,6 +4878,12 @@ int diff_opt_parse(struct diff_options *options,
>options->output_format |= DIFF_FORMAT_DIFFSTAT;
>   } else if (!strcmp(arg, "--no-compact-summary"))
>options->flags.stat_with_summary = 0;
> + else if (skip_prefix(arg, "--output-indicator-new=", ))
> + options->output_indicators[OUTPUT_INDICATOR_NEW] = arg[0];
> + else if (skip_prefix(arg, "--output-indicator-old=", ))
> + options->output_indicators[OUTPUT_INDICATOR_OLD] = arg[0];
> + else if (skip_prefix(arg, "--output-indicator-context=", ))
> + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = arg[0];
>  
>   /* renames options */
>   else if (starts_with(arg, "-B") ||
> diff --git a/diff.h b/diff.h
> index e1e54256c18..d7edc64454a 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -194,6 +194,11 @@ struct diff_options {
>   FILE *file;
>   int close_file;
>  
> +#define OUTPUT_INDICATOR_NEW 0
> +#define OUTPUT_INDICATOR_OLD 1
> +#define OUTPUT_INDICATOR_CONTEXT 2
> + char output_indicators[3];
> +
>   struct pathspec pathspec;
>   pathchange_fn_t pathchange;
>   change_fn_t change;
> -- 
> 2.18.0.265.g16de1b435c9.dirty
> 
> 


Re: [PATCH/RFC] commit: new option to abort -a if something is already staged

2018-08-20 Thread Jonathan Nieder
Hi,

Nguyễn Thái Ngọc Duy wrote:

> So many times I have carefully cherry-picked changes to the index with
> `git add -p` then accidentally did `git commit -am ` (usually by
> retrieving a command from history and pressing Enter too quickly)
> which destroyed the beautiful index.
>
> One way to deal with this is some form of `git undo` that allows me to
> retrieve the old index.

Yes, I would love such an undo feature!

How would you imagine that this information would get stored?  We can
start with adding that and a low-level (plumbing) interface to it, to
avoid being blocked on getting the user-facing (porcelain) "git undo"
interface right.  (Or we can go straight for the porcelain.  It's fine
for it to start minimal and gain switches later.)

[...]
> Instead, let's handle just this problem for now. This new option
> commit.preciousDirtyIndex, if set to false, will not allow `commit -a`
> to continue if the final index is different from the existing one. I
> don't think this can be achieved with hooks because the hooks don't
> know about "-a" or different commit modes.

I frequently use "git commit -a" this way intentionally, so I would be
unlikely to turn this config on.  That leads me to suspect it's not a
good candidate for configuration:

- it's not configuration for the sake of a transition period, since some
  people would keep it on forever

- it's not configuration based on different project needs, either

So configuration doesn't feel like a good fit.

It might be that I can switch my muscle memory to "git add -u && git
commit", in which case this could work as a new default without
configuration (or with configuration for a transition period).  A
separate commandline option "git commit -a --no-clobber-index" might
make sense as well, since then I could pass --clobber-index to keep my
current workflow.  That would also follow the usual progression:
introduce commandline option first, then config, then change default.

That said, I lean toward your initial thought, that this is papering
over a missing undo feature.  Can you say more about how you'd imagine
undo working?

Thanks,
Jonathan


Re: Do not raise conflict when a code in a patch was already added

2018-08-20 Thread Johannes Sixt

On 20.08.2018 at 19:40, Phillip Wood wrote:

On 20/08/2018 11:22, Konstantin Kharlamov wrote:

It's spectacular that the content of one of the inserted conflict markers is
empty, so all you have to do is remove the markers, use `git add`
on the file, and then `git rebase --continue`.

That is a lot of unnecessary actions; git could just figure out that the code it
sees in the patch is already there, as part of another commit.


If there are conflict markers where one side is empty it means that some
lines from the merge base (which for a rebase is the parent of the
commit being picked) have been deleted on one side and modified on the
other. Git cannot know if you want to use the deleted version or the
modified version.


There's another possibility (and I think it is what happens actually in 
Konstantin's case): When one side added lines 1 2 and the other side 
added 1 2 3, then the actual conflict is << 1 2 == 1 2 3 >>, but our 
merge code is able to move the identical part out of the conflicted 
section: 1 2 << == 3 >>. But this is just a courtesy for the user; the 
real conflict is the original one. Without this optimization, the work 
to resolve the conflict would be slightly more arduous.


-- Hannes


Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee  wrote:
> Because of the change of base, it is hard to provide a side-by-side diff
> from v1.

I thought that was the whole point of range-diff. ;-)

I'll take a look.

Thanks,
Stefan


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore  wrote:
>
> There were many instances in this file where it seemed like BUG would be
> better, so I created a new commit before this one to switch them over. The
> interdiff is below.
>
> BTW, why are there so many instances of "die" without "_"? I expect all
> errors that may be caused by a user to be localized.

Well, there is the porcelain layer to be consumed by a human user
and the plumbing layer that is good for scripts. In scripts you might want
to grep for certain errors and react to them, so a non-localized error
message makes it possible to run the script under any localization.

BUG() is strictly for things that are due to Git's internals,
not for problematic user input. Problematic user input
definitely wants a die(...), and depending on the plumbing/porcelain
layer it may need to be _(translatable).

I think BUG() would never go with translated strings.
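
Roughly, the rule of thumb looks like this (a made-up example, not code
from the series):

#include "git-compat-util.h"
#include "gettext.h"

static void check_depth(int depth)
{
        /* Bad user input: report it with die(), translated for porcelain use. */
        if (depth < 0)
                die(_("negative depth '%d' makes no sense"), depth);

        /* Broken internal invariant: BUG(), never translated. */
        if (depth > 1000000)
                BUG("depth %d exceeds what the walker can ever produce", depth);
}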

> I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t
>
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index 8e3caf5bf..09b2b05d5 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
>
> switch (filter_situation) {
> default:
> -   die("unknown filter_situation");
> -   return LOFR_ZERO;
> +   BUG("unknown filter_situation: %d", filter_situation);
>
> case LOFS_BEGIN_TREE:
> assert(obj->type == OBJ_TREE);
> @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(
>
> switch (filter_situation) {
> default:
> -   die("unknown filter_situation");
> -   return LOFR_ZERO;
> +   BUG("unknown filter_situation: %d", filter_situation);
>
> case LOFS_BEGIN_TREE:
> case LOFS_BLOB:
> @@ -151,8 +149,7 @@ static enum list_objects_filter_result
> filter_blobs_limit(
>
> switch (filter_situation) {
> default:
> -   die("unknown filter_situation");
> -   return LOFR_ZERO;
> +   BUG("unknown filter_situation: %d", filter_situation);
>
> case LOFS_BEGIN_TREE:
> assert(obj->type == OBJ_TREE);
> @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(
>
> switch (filter_situation) {
> default:
> -   die("unknown filter_situation");
> -   return LOFR_ZERO;
> +   BUG("unknown filter_situation: %d", filter_situation);

Up to here we have just replaced the die() with BUG() in the default
case of the state machine switch. (We need the default due to strict
compile flags, though since filter_situation is an enum I thought we
would not need it, as compilers are smart enough to see we have all
values of the enum covered.)

I agree that keeping the defaults and having a BUG() is reasonable.


>
> case LOFS_BEGIN_TREE:
> assert(obj->type == OBJ_TREE);
> @@ -439,7 +435,7 @@ void *list_objects_filter__init(
> assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
>
> if (filter_options->choice >= LOFC__COUNT)
> -   die("invalid list-objects filter choice: %d",
> +   BUG("invalid list-objects filter choice: %d",
> filter_options->choice);

This also makes sense; combined with the assert before, this looks like
really defensive code.

I think this patch is a good idea!

Thanks,
Stefan


Re: [PATCH v3] checkout: optimize "git checkout -b "

2018-08-20 Thread Junio C Hamano
Elijah Newren  writes:

> == The patch ==
> - Ben's patch only affects the "checkout -b $NEWBRANCH" case.  He
> checks for it by looking for any other flag that would be a different
> case, and using the old codepath if he finds any.
> - This means there is a "giant list of checks" for this optimization,
> and an ongoing maintenance burden because if anyone ever adds any
> extra options, this optimization might suddenly break things if that
> giant list of checks isn't updated.

Correct.  

Having said that, I do not mind leaving it in 'pu' for the rest of
the cycle, possibly merging it to 'next' after that and keeping it
there, to see how horrible a maintenance burden it would become in
real life, though.



[PATCH v2 1/8] refs.c: migrate internal ref iteration to pass thru repository argument

2018-08-20 Thread Derrick Stolee
From: Stefan Beller 

In 60ce76d3581 (refs: add repository argument to for_each_replace_ref,
2018-04-11) and 0d296c57aec (refs: allow for_each_replace_ref to handle
arbitrary repositories, 2018-04-11), for_each_replace_ref learned how
to iterate over refs by a given arbitrary repository.
New attempts in the object store conversion have shown that it is useful
to have the repository handle available that the refs iteration is
currently iterating over.

To achieve this goal we will need to add a repository argument to
each_ref_fn in refs.h. However as many callers rely on the signature
such a patch would be too large.

So convert the internals of the ref subsystem first to pass through a
repository argument without exposing the change to the user. Assume
the_repository for the passed through repository, although it is not
used anywhere yet.

Signed-off-by: Stefan Beller 
Signed-off-by: Derrick Stolee 
---
 refs.c   | 39 +--
 refs.h   | 10 ++
 refs/iterator.c  |  6 +++---
 refs/refs-internal.h |  5 +++--
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 457fb78057..7cd76f72d2 100644
--- a/refs.c
+++ b/refs.c
@@ -1386,17 +1386,50 @@ struct ref_iterator *refs_ref_iterator_begin(
  * non-zero value, stop the iteration and return that value;
  * otherwise, return 0.
  */
+static int do_for_each_repo_ref(struct repository *r, const char *prefix,
+   each_repo_ref_fn fn, int trim, int flags,
+   void *cb_data)
+{
+   struct ref_iterator *iter;
+   struct ref_store *refs = get_main_ref_store(r);
+
+   if (!refs)
+   return 0;
+
+   iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+
+   return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
+}
+
+struct do_for_each_ref_help {
+   each_ref_fn *fn;
+   void *cb_data;
+};
+
+static int do_for_each_ref_helper(struct repository *r,
+ const char *refname,
+ const struct object_id *oid,
+ int flags,
+ void *cb_data)
+{
+   struct do_for_each_ref_help *hp = cb_data;
+
+   return hp->fn(refname, oid, flags, hp->cb_data);
+}
+
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
   each_ref_fn fn, int trim, int flags, void *cb_data)
 {
struct ref_iterator *iter;
+   struct do_for_each_ref_help hp = { fn, cb_data };
 
if (!refs)
return 0;
 
iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
 
-   return do_for_each_ref_iterator(iter, fn, cb_data);
+   return do_for_each_repo_ref_iterator(the_repository, iter,
+   do_for_each_ref_helper, &hp);
 }
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
@@ -2025,10 +2058,12 @@ int refs_verify_refname_available(struct ref_store 
*refs,
 int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
struct ref_iterator *iter;
+   struct do_for_each_ref_help hp = { fn, cb_data };
 
iter = refs->be->reflog_iterator_begin(refs);
 
-   return do_for_each_ref_iterator(iter, fn, cb_data);
+   return do_for_each_repo_ref_iterator(the_repository, iter,
+do_for_each_ref_helper, &hp);
 }
 
 int for_each_reflog(each_ref_fn fn, void *cb_data)
diff --git a/refs.h b/refs.h
index cc2fb4c68c..80eec8bbc6 100644
--- a/refs.h
+++ b/refs.h
@@ -274,6 +274,16 @@ struct ref_transaction;
 typedef int each_ref_fn(const char *refname,
const struct object_id *oid, int flags, void *cb_data);
 
+/*
+ * The same as each_ref_fn, but also with a repository argument that
+ * contains the repository associated with the callback.
+ */
+typedef int each_repo_ref_fn(struct repository *r,
+const char *refname,
+const struct object_id *oid,
+int flags,
+void *cb_data);
+
 /*
  * The following functions invoke the specified callback function for
  * each reference indicated.  If the function ever returns a nonzero
diff --git a/refs/iterator.c b/refs/iterator.c
index 2ac91ac340..629e00a122 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -407,15 +407,15 @@ struct ref_iterator *prefix_ref_iterator_begin(struct 
ref_iterator *iter0,
 
 struct ref_iterator *current_ref_iter = NULL;
 
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-each_ref_fn fn, void *cb_data)
+int do_for_each_repo_ref_iterator(struct repository *r, struct ref_iterator 
*iter,
+ each_repo_ref_fn fn, void *cb_data)
 {
int retval = 0, ok;
struct ref_iterator *old_ref_iter = current_ref_iter;
 

[PATCH v2 8/8] commit-graph: close_commit_graph before shallow walk

2018-08-20 Thread Derrick Stolee
Call close_commit_graph() when about to start a rev-list walk that
includes shallow commits. This is necessary in code paths that "fake"
shallow commits for the sake of fetch. Specifically, test 351 in
t5500-fetch-pack.sh runs

git fetch --shallow-exclude one origin

with a file-based transfer. When the "remote" has a commit-graph, we do
not prevent the commit-graph from being loaded, but then the commits are
intended to be dynamically transferred into shallow commits during
get_shallow_commits_by_rev_list(). By closing the commit-graph before
this call, we prevent this interaction.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 8 
 commit-graph.h | 1 +
 upload-pack.c  | 2 ++
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index cee2caab5c..4bd1a4abbf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -260,10 +260,10 @@ static int prepare_commit_graph(struct repository *r)
return !!r->objects->commit_graph;
 }
 
-static void close_commit_graph(void)
+void close_commit_graph(struct repository *r)
 {
-   free_commit_graph(the_repository->objects->commit_graph);
-   the_repository->objects->commit_graph = NULL;
+   free_commit_graph(r->objects->commit_graph);
+   r->objects->commit_graph = NULL;
 }
 
 static int bsearch_graph(struct commit_graph *g, struct object_id *oid, 
uint32_t *pos)
@@ -875,7 +875,7 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
-   close_commit_graph();
+   close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
 
diff --git a/commit-graph.h b/commit-graph.h
index 76e098934a..13d736cdde 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -59,6 +59,7 @@ void write_commit_graph(const char *obj_dir,
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
+void close_commit_graph(struct repository *);
 void free_commit_graph(struct commit_graph *);
 
 #endif
diff --git a/upload-pack.c b/upload-pack.c
index 82b393ec31..2ae9d9bb47 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -24,6 +24,7 @@
 #include "quote.h"
 #include "upload-pack.h"
 #include "serve.h"
+#include "commit-graph.h"
 
 /* Remember to update object flag allocation in object.h */
 #define THEY_HAVE  (1u << 11)
@@ -740,6 +741,7 @@ static void deepen_by_rev_list(int ac, const char **av,
 {
struct commit_list *result;
 
+   close_commit_graph(the_repository);
result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
send_shallow(result);
free_commit_list(result);
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 2/8] refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback

2018-08-20 Thread Derrick Stolee
From: Stefan Beller 

Signed-off-by: Stefan Beller 
Signed-off-by: Derrick Stolee 
---
 builtin/replace.c | 8 
 refs.c| 9 -
 refs.h| 2 +-
 replace-object.c  | 5 +++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index e997a562f0..b5861a0ee9 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -39,7 +39,8 @@ struct show_data {
enum replace_format format;
 };
 
-static int show_reference(const char *refname, const struct object_id *oid,
+static int show_reference(struct repository *r, const char *refname,
+ const struct object_id *oid,
  int flag, void *cb_data)
 {
struct show_data *data = cb_data;
@@ -56,9 +57,8 @@ static int show_reference(const char *refname, const struct 
object_id *oid,
if (get_oid(refname, &object))
return error("Failed to resolve '%s' as a valid 
ref.", refname);
 
-   obj_type = oid_object_info(the_repository, &object,
-  NULL);
-   repl_type = oid_object_info(the_repository, oid, NULL);
+   obj_type = oid_object_info(r, &object, NULL);
+   repl_type = oid_object_info(r, oid, NULL);
 
printf("%s (%s) -> %s (%s)\n", refname, 
type_name(obj_type),
   oid_to_hex(oid), type_name(repl_type));
diff --git a/refs.c b/refs.c
index 7cd76f72d2..c5a5f727e8 100644
--- a/refs.c
+++ b/refs.c
@@ -1474,12 +1474,11 @@ int refs_for_each_fullref_in(struct ref_store *refs, 
const char *prefix,
return do_for_each_ref(refs, prefix, fn, 0, flag, cb_data);
 }
 
-int for_each_replace_ref(struct repository *r, each_ref_fn fn, void *cb_data)
+int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void 
*cb_data)
 {
-   return do_for_each_ref(get_main_ref_store(r),
-  git_replace_ref_base, fn,
-  strlen(git_replace_ref_base),
-  DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+   return do_for_each_repo_ref(r, git_replace_ref_base, fn,
+   strlen(git_replace_ref_base),
+   DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
 int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
diff --git a/refs.h b/refs.h
index 80eec8bbc6..a0a18223a1 100644
--- a/refs.h
+++ b/refs.h
@@ -317,7 +317,7 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, 
void *cb_data,
 int for_each_tag_ref(each_ref_fn fn, void *cb_data);
 int for_each_branch_ref(each_ref_fn fn, void *cb_data);
 int for_each_remote_ref(each_ref_fn fn, void *cb_data);
-int for_each_replace_ref(struct repository *r, each_ref_fn fn, void *cb_data);
+int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void 
*cb_data);
 int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data);
 int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 const char *prefix, void *cb_data);
diff --git a/replace-object.c b/replace-object.c
index 4162df6324..3c17864eb7 100644
--- a/replace-object.c
+++ b/replace-object.c
@@ -6,7 +6,8 @@
 #include "repository.h"
 #include "commit.h"
 
-static int register_replace_ref(const char *refname,
+static int register_replace_ref(struct repository *r,
+   const char *refname,
const struct object_id *oid,
int flag, void *cb_data)
 {
@@ -25,7 +26,7 @@ static int register_replace_ref(const char *refname,
oidcpy(&repl_obj->replacement, oid);
 
/* Register new object */
-   if (oidmap_put(the_repository->objects->replace_map, repl_obj))
+   if (oidmap_put(r->objects->replace_map, repl_obj))
die("duplicate replace ref: %s", refname);
 
return 0;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 5/8] commit-graph: not compatible with replace objects

2018-08-20 Thread Derrick Stolee
Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test that this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph. However, we do ensure that
we write a new commit-graph by setting read_replace_refs to 0, thereby
ignoring the replace refs.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c  |  4 
 commit-graph.c  | 21 +
 replace-object.c|  2 +-
 replace-object.h|  2 ++
 t/t5318-commit-graph.sh | 22 ++
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..da737df321 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -120,6 +120,8 @@ static int graph_read(int argc, const char **argv)
return 0;
 }
 
+extern int read_replace_refs;
+
 static int graph_write(int argc, const char **argv)
 {
struct string_list *pack_indexes = NULL;
@@ -150,6 +152,8 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   read_replace_refs = 0;
+
if (opts.reachable) {
write_commit_graph_reachable(opts.obj_dir, opts.append);
return 0;
diff --git a/commit-graph.c b/commit-graph.c
index b0a55ad128..2c01fa433f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,8 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "hashmap.h"
+#include "replace-object.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -56,6 +58,19 @@ static struct commit_graph *alloc_commit_graph(void)
return g;
 }
 
+extern int read_replace_refs;
+
+static int commit_graph_compatible(struct repository *r)
+{
+   if (read_replace_refs) {
+   prepare_replace_object(r);
+   if (hashmap_get_size(&r->objects->replace_map->map))
+   return 0;
+   }
+
+   return 1;
+}
+
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
void *graph_map;
@@ -223,6 +238,9 @@ static int prepare_commit_graph(struct repository *r)
 */
return 0;
 
+   if (!commit_graph_compatible(r))
+   return 0;
+
obj_dir = r->objects->objectdir;
prepare_commit_graph_one(r, obj_dir);
prepare_alt_odb(r);
@@ -693,6 +711,9 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
 
+   if (!commit_graph_compatible(the_repository))
+   return;
+
oids.nr = 0;
oids.alloc = approximate_object_count() / 4;
 
diff --git a/replace-object.c b/replace-object.c
index 3c17864eb7..9821f1477e 100644
--- a/replace-object.c
+++ b/replace-object.c
@@ -32,7 +32,7 @@ static int register_replace_ref(struct repository *r,
return 0;
 }
 
-static void prepare_replace_object(struct repository *r)
+void prepare_replace_object(struct repository *r)
 {
if (r->objects->replace_map)
return;
diff --git a/replace-object.h b/replace-object.h
index 9345e105dd..16528df942 100644
--- a/replace-object.h
+++ b/replace-object.h
@@ -10,6 +10,8 @@ struct replace_object {
struct object_id replacement;
 };
 
+void prepare_replace_object(struct repository *r);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4f17d7701e..e0c3c60f66 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -259,6 +259,28 @@ test_expect_success 'check that gc computes commit-graph' '
test_cmp commit-graph-after-gc $objdir/info/commit-graph
 '
 
+test_expect_success 'replace-objects invalidates commit-graph' '
+   cd "$TRASH_DIRECTORY" &&
+   test_when_finished rm -rf replace &&
+   git clone full replace &&
+   (
+   cd replace &&
+   git commit-graph write --reachable &&
+   test_path_is_file .git/objects/info/commit-graph &&
+   git replace HEAD~1 HEAD~2 &&
+   git -c core.commitGraph=false log >expect &&
+   git -c core.commitGraph=true log >actual &&
+   test_cmp expect actual &&
+   git commit-graph write --reachable &&
+   git -c core.commitGraph=false --no-replace-objects log >expect 
&&
+   git -c core.commitGraph=true --no-replace-objects log >actual &&
+   test_cmp expect actual &&
+   rm -rf .git/objects/info/commit-graph &&
+   git commit-graph write --reachable &&
+   test_path_is_file .git/objects/info/commit-graph
+   )

[PATCH v2 3/8] commit-graph: update design document

2018-08-20 Thread Derrick Stolee
As it exists right now, the commit-graph feature may provide
inconsistent results when combined with commit grafts, replace objects,
and shallow clones. Update the design document to discuss why these
interactions are difficult to reconcile and how we will avoid errors by
preventing updates to and reads from the commit-graph file when these
other features exist.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index c664acbd76..001395e950 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -112,12 +112,24 @@ Design Details
 - The file format includes parameters for the object ID hash function,
   so a future change of hash algorithm does not require a change in format.
 
+- Commit grafts and replace objects can change the shape of the commit
+  history. The latter can also be enabled/disabled on the fly using
+  `--no-replace-objects`. This leads to difficultly storing both possible
+  interpretations of a commit id, especially when computing generation
+  numbers. The commit-graph will not be read or written when
+  replace-objects or grafts are present.
+
+- Shallow clones create grafts of commits by dropping their parents. This
+  leads the commit-graph to think those commits have generation number 1.
+  If and when those commits are made unshallow, those generation numbers
+  become invalid. Since shallow clones are intended to restrict the commit
+  history to a very small set of commits, the commit-graph feature is less
+  helpful for these clones, anyway. The commit-graph will not be read or
+  written when shallow commits are present.
+
 Future Work
 ---
 
-- The commit graph feature currently does not honor commit grafts. This can
-  be remedied by duplicating or refactoring the current graft logic.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 4/8] test-repository: properly init repo

2018-08-20 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 t/helper/test-repository.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index 2762ca6562..6a84a53efb 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -15,7 +15,10 @@ static void test_parse_commit_in_graph(const char *gitdir, 
const char *worktree,
struct commit *c;
struct commit_list *parent;
 
-   repo_init(&r, gitdir, worktree);
+   setup_git_env(gitdir);
+
+   if (repo_init(&r, gitdir, worktree))
+   die("Couldn't init repo");
 
c = lookup_commit(&r, commit_oid);
 
@@ -38,7 +41,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
struct commit *c;
struct tree *tree;
 
-   repo_init(&r, gitdir, worktree);
+   setup_git_env(gitdir);
+
+   if (repo_init(&r, gitdir, worktree))
+   die("Couldn't init repo");
 
c = lookup_commit(&r, commit_oid);
 
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 7/8] commit-graph: not compatible with uninitialized repo

2018-08-20 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index c4eaedf4e5..cee2caab5c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -62,6 +62,9 @@ extern int read_replace_refs;
 
 static int commit_graph_compatible(struct repository *r)
 {
+   if (!r->gitdir)
+   return 0;
+
if (read_replace_refs) {
prepare_replace_object(r);
if (hashmap_get_size(&r->objects->replace_map->map))
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 6/8] commit-graph: not compatible with grafts

2018-08-20 Thread Derrick Stolee
Augment commit_graph_compatible(r) to return false when the given
repository r has commit grafts or is a shallow clone. Test that in these
situations we ignore existing commit-graph files and we do not write new
commit-graph files.

Helped-by: Jakub Narebski 
Signed-off-by: Derrick Stolee 
---
 commit-graph.c  |  6 ++
 commit.c|  2 +-
 commit.h|  1 +
 t/t5318-commit-graph.sh | 38 ++
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2c01fa433f..c4eaedf4e5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -68,6 +68,12 @@ static int commit_graph_compatible(struct repository *r)
return 0;
}
 
+   prepare_commit_graft(r);
+   if (r->parsed_objects && r->parsed_objects->grafts_nr)
+   return 0;
+   if (is_repository_shallow(r))
+   return 0;
+
return 1;
 }
 
diff --git a/commit.c b/commit.c
index 30d1af2b20..ef9a2cbb23 100644
--- a/commit.c
+++ b/commit.c
@@ -209,7 +209,7 @@ static int read_graft_file(struct repository *r, const char 
*graft_file)
return 0;
 }
 
-static void prepare_commit_graft(struct repository *r)
+void prepare_commit_graft(struct repository *r)
 {
char *graft_file;
 
diff --git a/commit.h b/commit.h
index da0db36eba..5459e279fe 100644
--- a/commit.h
+++ b/commit.h
@@ -202,6 +202,7 @@ typedef int (*each_commit_graft_fn)(const struct 
commit_graft *, void *);
 
 struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct repository *r, struct commit_graft *, int);
+void prepare_commit_graft(struct repository *r);
 struct commit_graft *lookup_commit_graft(struct repository *r, const struct 
object_id *oid);
 
 extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit 
*rev2);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e0c3c60f66..6aee861f78 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -281,6 +281,44 @@ test_expect_success 'replace-objects invalidates 
commit-graph' '
)
 '
 
+test_expect_success 'commit grafts invalidate commit-graph' '
+   cd "$TRASH_DIRECTORY" &&
+   test_when_finished rm -rf graft &&
+   git clone full graft &&
+   (
+   cd graft &&
+   git commit-graph write --reachable &&
+   test_path_is_file .git/objects/info/commit-graph &&
+   H1=$(git rev-parse --verify HEAD~1) &&
+   H3=$(git rev-parse --verify HEAD~3) &&
+   echo "$H1 $H3" >.git/info/grafts &&
+   git -c core.commitGraph=false log >expect &&
+   git -c core.commitGraph=true log >actual &&
+   test_cmp expect actual &&
+   git commit-graph write --reachable &&
+   git -c core.commitGraph=false --no-replace-objects log >expect 
&&
+   git -c core.commitGraph=true --no-replace-objects log >actual &&
+   test_cmp expect actual &&
+   rm -rf .git/objects/info/commit-graph &&
+   git commit-graph write --reachable &&
+   test_path_is_missing .git/objects/info/commit-graph
+   )
+'
+
+test_expect_success 'replace-objects invalidates commit-graph' '
+   cd "$TRASH_DIRECTORY" &&
+   test_when_finished rm -rf shallow &&
+   git clone --depth 2 "file://$TRASH_DIRECTORY/full" shallow &&
+   (
+   cd shallow &&
+   git commit-graph write --reachable &&
+   test_path_is_missing .git/objects/info/commit-graph &&
+   git fetch origin --unshallow &&
+   git commit-graph write --reachable &&
+   test_path_is_file .git/objects/info/commit-graph
+   )
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-20 Thread Derrick Stolee
One unresolved issue with the commit-graph feature is that it can cause
issues when combined with replace objects, commit grafts, or shallow
clones. These are not 100% incompatible, as one could be reasonably
successful writing a commit-graph after replacing some objects and not
have issues. The problems happen when commits that are already in the
commit-graph file are replaced, or when git is run with the
`--no-replace-objects` option; this can cause incorrect parents or
incorrect generation numbers. Similar things occur with commit grafts
and shallow clones, especially when running `git fetch --unshallow` in a
shallow repo.

Instead of trying (and probably failing) to make these features work
together, default to making the commit-graph feature unavailable in these
situations. Create a new method 'commit_graph_compatible(r)' that checks
if the repository 'r' has any of these features enabled.

CHANGES IN V2:

* The first two commits regarding the ref iterators are unchanged, despite
  a lot of discussion on the subject [1].

* I included Peff's changes in jk/core-use-replace-refs, changing the base
  commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge
  that brought that topic into 'master').

* I fixed the tests for the interactions with the graft feature.

Because of the change of base, it is hard to provide a side-by-side diff
from v1.

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/
Stefan's response recommending we keep the first two commits.

Derrick Stolee (6):
  commit-graph: update design document
  test-repository: properly init repo
  commit-graph: not compatible with replace objects
  commit-graph: not compatible with grafts
  commit-graph: not compatible with uninitialized repo
  commit-graph: close_commit_graph before shallow walk

Stefan Beller (2):
  refs.c: migrate internal ref iteration to pass thru repository
argument
  refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback

 Documentation/technical/commit-graph.txt | 18 +--
 builtin/commit-graph.c   |  4 ++
 builtin/replace.c|  8 ++--
 commit-graph.c   | 38 +--
 commit-graph.h   |  1 +
 commit.c |  2 +-
 commit.h |  1 +
 refs.c   | 48 ---
 refs.h   | 12 -
 refs/iterator.c  |  6 +--
 refs/refs-internal.h |  5 +-
 replace-object.c |  7 +--
 replace-object.h |  2 +
 t/helper/test-repository.c   | 10 +++-
 t/t5318-commit-graph.sh  | 60 
 upload-pack.c|  2 +
 16 files changed, 194 insertions(+), 30 deletions(-)


base-commit: 1689c22c1c328e9135ed51458e9f9a5d224c5057
-- 
2.18.0.118.gd4f65b8d14



Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Jonathan Nieder
Junio C Hamano wrote:

> * jh/structured-logging (2018-07-25) 25 commits
>  - structured-logging: add config data facility
>  - structured-logging: t0420 tests for interacitve child_summary
>  - structured-logging: t0420 tests for child process detail events
>  - structured-logging: add child process classification
>  - structured-logging: add detail-events for child processes
>  - structured-logging: add structured logging to remote-curl
>  - structured-logging: t0420 tests for aux-data
>  - structured-logging: add aux-data for size of sparse-checkout file
>  - structured-logging: add aux-data for index size
>  - structured-logging: add aux-data facility
>  - structured-logging: t0420 tests for timers
>  - structured-logging: add timer around preload_index
>  - structured-logging: add timer around wt-status functions
>  - structured-logging: add timer around do_write_index
>  - structured-logging: add timer around do_read_index
>  - structured-logging: add timer facility
>  - structured-logging: add detail-event for lazy_init_name_hash
>  - structured-logging: add detail-event facility
>  - structured-logging: t0420 basic tests
>  - structured-logging: set sub_command field for checkout command
>  - structured-logging: set sub_command field for branch command
>  - structured-logging: add session-id to log events
>  - structured-logging: add structured logging framework
>  - structured-logging: add STRUCTURED_LOGGING=1 to Makefile
>  - structured-logging: design document
>
>  Will merge to 'next'.

I believe this is under-reviewed (which I certainly share some blame
for).  Please hold off and I can look at it today.


Re: [PATCH v3] checkout: optimize "git checkout -b "

2018-08-20 Thread Elijah Newren
On Mon, Aug 20, 2018 at 6:40 AM Ben Peart  wrote:
> On 8/18/2018 9:44 PM, Elijah Newren wrote:
...
> > == My opinions ==
> > - The performance wins are big enough that I can see why Ben is pushing 
> > this.
> > - I totally see Duy's point that more general optimizations would be
> > really nice.
> > - I really dislike the "giant list of checks" and the ongoing
> > maintenance burden it implies.
> >
> > Overall, I have to side with Duy and say I don't think we should take
> > the patch as-is.  Since that'll be frustrating for Ben to hear, I've
> > tried to come up with some additional alternatives:
>
> Thank you for the thorough review and summary.  From my perspective, it
> is both fair and accurate.
>
...
> > - Rewrite this patch so it instead does a very small set of checks at
> > the beginning of cmd_checkout(); e.g. check if argc == 3 and argv[1]
> > == "-b" and if so then perform the minimum operations needed to create
> > and checkout the new branch (maybe even calling in to cmd_branch() and
> > cmd_symbolic_ref() and calling some reflog update function).  Preface
> > it with a comment that it's a performance hack that might eventually
> > be able to go away.
>
> I'm happy to do this if it would make the patch better/more acceptable
> to the community.

For whatever it's worth, it'd change my vote.  While I would like
general optimizations more, I do think that's overall a longer term
prospect, and the wins here are big enough to justify a performance
hack -- if the amount of new code and added maintenance overhead is
minimized.  I think that checking for the args matching checkout -b
rather than checking all flags that might imply something other than
args being checkout -b would be a good way to help minimize both.
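
To be concrete, the shape I am imagining is a single early test on the
argument vector; here is a toy sketch of the dispatch (the helpers are
stand-ins for the real branch-creation and checkout code, not git
functions):

#include <stdio.h>
#include <string.h>

static int fast_create_branch(const char *name)
{
        printf("fast path: create '%s', update HEAD and the reflog, skip the work tree\n",
               name);
        return 0;
}

static int full_checkout(int argc, const char **argv)
{
        printf("general path for '%s': %d args, full parse-options and unpack_trees() machinery\n",
               argv[0], argc);
        return 0;
}

static int cmd_checkout_sketch(int argc, const char **argv)
{
        /* One shape test replaces the giant list of per-flag checks. */
        if (argc == 3 && !strcmp(argv[1], "-b"))
                return fast_create_branch(argv[2]);
        return full_checkout(argc, argv);
}

int main(void)
{
        const char *args[] = { "checkout", "-b", "topic" };
        return cmd_checkout_sketch(3, args);
}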

> One other change I could do would be to address Duy's concern about the
> config option would be to remove that entirely.  That means those using
> sparse-checkout don't have the option of getting the performance win but
> the benefit is that there isn't any documentation or behavior changes
> that need to be made - things just get a lot faster if you happen to use
> the (rather common) "checkout -b" option.

Playing with sparse-checkout, it feels to me like a half-baked
feature.  It seems like it required too much manual work, and it was
sometimes hard to tell if I was misunderstanding configuration rules
or I was just running into bugs in the code.  I think I hit both but I
didn't really want to get side-tracked further, yet.  (I do want to
eventually come back to it.)  The only reason someone would go through
that pain is if it provided massive performance benefits.

Viewing things when-in-doubt through the lens of strict backwards
compatibility may inadvertently mean enforcing buggy and/or slow
behavior with sparse-checkouts.  So, while git usually focuses
strongly on backwards compatibility (which is a good thing), that
might actually be wrong in the specific case of the sparse-checkout
feature.  In particular, since the whole point of the feature is
essentially a performance hack, if the initial implementation has ways
in which it really hurts performance, it seems like it'd be more in
keeping with the point of the feature to take performance fixes and
swallow subtle behavioral changes than to require special flags to get
decent performance.

That might sound like a long winded way of saying I'm totally in favor
of dropping the config option in this case, but I'm also trying to
build a case that I think there will be other situations where we want
to subtly and maybe even not-so-subtly change behavior of
sparse-checkouts to make them both more performant and far nicer for
end users.  I'm not sure others will agree with me on this argument,
but sparse checkouts felt painful to me and I think we need something
better.

> If making these two changes would resolve the remaining concerns, I'm
> happy to make them and send another patch series.  If it will just
> result in another round of "I'm still not comfortable with this" then
> I'd rather spend my time elsewhere. :)

That's fair.  The first change would be enough to resolve the concern
for me if others strongly push back on the second
(config-option-removal) change, but I'd definitely prefer both changes
-- fewer options means less code, less ongoing maintenance work, and
less manual setup needed by users.

> > - Possibly crazy idea for massive global performance win: modify how
> > the cache works so that it can have tree entries instead of just file
> > entries; make use of that with sparse checkouts (and partial clones?)
> > so that the cache only stores a tree when that entire tree is
> > "unwanted".  Suddenly, git status, git add, git checkout, etc., etc.
> > are all much faster.  merge needs some special work to operate,
> > though.
>
> Not crazy at all!  It's something we've investigated and discussed
> several times over the past couple of years.  This is the largest
> potential optimization that I am aware of as it 

Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Derrick Stolee

On 8/20/2018 1:26 PM, Junio C Hamano wrote:

Jonathan Nieder  writes:


Usually, I refrain from merging larger topics in 'next' down to
'master' when we get close to -rc0, but I am wondering if it is
better to merge all of them to 'master', even the ones on the larger
and possibly undercooked side, expecting that we collectively spend
effort on hunting and fixing bugs in them during the pre-release
freeze period.  If we were to go that route, I'd want everybody's
buy-in and I'll promise to ignore any shiny new toys that appear on
list that are not regression fixes to topics merged to 'master'
since the end of the previous cycle to make sure people are not
distracted.

Based on what I see in 'next' (midx, range-diff, etc), I quite like
this idea.

The comment above was about the ones that was marked as "Will merge
to 'master'" that are in 'next', not the ones marked as "Will cook
in 'next'", like the midx ones.

I am not worried about range-diff, as I do not think it came close
enough to the core-ish code to break other things; the potential
damage is fairly isolated and the worst would be that we'd realize
that we shipped with a new command that was buggy after the release,
which is not the end of the world.  'midx' and 'reachable' are quite
different stories, as bugs in them would in the worst case lead to
an immediate and unrecoverable repository corruption.

So I am still on the fence, even though I am leaning toward merging
them down to 'master'.


I'm happy to wait until 2.19 is cut for ds/multi-pack-index to merge 
down. I sent a new series today [1] that improves the series and 
corrects some bugs I found while making the full Git test suite work 
when writing a multi-pack-index on every repack. Nothing was 
"unrecoverable repository corruption," but I understand the concern 
there. Making the safest decision is probably the best decision.


I'm happy to contribute to a "pre-2.19 bug bash" after you merge things 
into master.


Thanks,

-Stolee

[1] 
https://public-inbox.org/git/20180820165124.152146-1-dsto...@microsoft.com/T/#u


    [PATCH 0/9] multi-pack-index cleanups



Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Stefan Beller
On Fri, Aug 17, 2018 at 3:44 PM Junio C Hamano  wrote:
>
> Usually, I refrain from merging larger topics in 'next' down to
> 'master' when we get close to -rc0, but I am wondering if it is
> better to merge all of them to 'master', even the ones on the larger
> and possibly undercooked side, expecting that we collectively spend
> effort on hunting and fixing bugs in them during the pre-release
> freeze period.  If we were to go that route, I'd want everybody's
> buy-in and I'll promise to ignore any shiny new toys that appear on
> list that are not regression fixes to topics merged to 'master'
> since the end of the previous cycle to make sure people are not
> distracted.

Speaking of releases, Linux for example has some releases that are
more stable than others, as most distros pick the same release for
their 'stable' release, whereas the regular and new releases are just
based off the latest release of Linux.

Would a similar model be an interesting thought to entertain?

I guess I could buy into a few weeks of bug fixing.

> * nd/config-blame-sort (2018-08-06) 1 commit
[...]
> * nd/no-extern (2018-08-03) 12 commits
[...]

Thanks Duy for these two series!

I would have expected the no-extern series to
have a bit of merge conflicts similar to  sb/object-store-lookup
as it touches a lot of headers, but so far this looks like smooth sailing?

> * sb/submodule-cleanup (2018-08-16) 2 commits
>   (merged to 'next' on 2018-08-17 at ca9d8aaef4)
>  + builtin/submodule--helper: remove stray new line
>  + t7410: update to new style
>
>  A few preliminary minor clean-ups in the area around submodules.
>
>  Will merge to 'master'.

Oh, that is the prologue of
https://public-inbox.org/git/20180816023100.161626-1-sbel...@google.com/
which I will resend to build on top. Given your call to hunt more bugs,
I'd delay its resend until after the release.

>
> * ao/submodule-wo-gitmodules-checked-out (2018-08-14) 7 commits
>  - submodule: support reading .gitmodules even when it's not checked out
>  - t7506: clean up .gitmodules properly before setting up new scenario
>  - submodule: use the 'submodule--helper config' command
>  - submodule--helper: add a new 'config' subcommand
>  - t7411: be nicer to future tests and really clean things up
>  - submodule: factor out a config_set_in_gitmodules_file_gently function
>  - submodule: add a print_config_from_gitmodules() helper
>
>  The submodule support has been updated to read from the blob at
>  HEAD:.gitmodules when the .gitmodules file is missing from the
>  working tree.
>
>  I find the design a bit iffy in that our usual "missing in the
>  working tree?  let's use the latest blob" fallback is to take it
>  from the index, not from the HEAD.

I am not sure; why does it feel iffy?

> * bw/submodule-name-to-dir (2018-08-10) 2 commits
>  In modern repository layout, the real body of a cloned submodule
>  repository is held in .git/modules/ of the superproject, indexed by
>  the submodule name.  URLencode the submodule name before computing
>  the name of the directory to make sure they form a flat namespace.
>
>  Will merge to 'next'.

Cool! Is the discussion on top of it still ongoing, about whether to use a
new config for special cases or how we distinguish between a/b/
and a%2fb as submodule names?

> * md/filter-trees (2018-08-16) 6 commits
>  - list-objects-filter: implement filter tree:0
>  - revision: mark non-user-given objects instead
>  - rev-list: handle missing tree objects properly
>  - list-objects: always parse trees gently
>  - list-objects: refactor to process_tree_contents
>  - list-objects: store common func args in struct
>
>  The "rev-list --filter" feature learned to exclude all trees via
>  "tree:0" filter.

I gave this a read and think it is good to go.
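
(For readers following along, a minimal example of the new filter as
described above would be

	git rev-list --objects --filter=tree:0 HEAD

i.e. list reachable objects while excluding all trees.)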

> * sb/config-write-fix (2018-08-08) 3 commits
>   (merged to 'next' on 2018-08-17 at 7d9c7ce81f)
>  + git-config: document accidental multi-line setting in deprecated syntax
>  + config: fix case sensitive subsection names on writing
>  + t1300: document current behavior of setting options
>
>  Recent update to "git config" broke updating variable in a
>  subsection, which has been corrected.
>
>  Will merge to 'master'.

Thanks!

>
>
> * sb/range-diff-colors (2018-08-14) 8 commits
>  - diff.c: rewrite emit_line_0 more understandably
>  - diff.c: omit check for line prefix in emit_line_0
>  - diff: use emit_line_0 once per line
>  - diff.c: add set_sign to emit_line_0
>  - diff.c: reorder arguments for emit_line_ws_markup
>  - diff.c: simplify caller of emit_line_0
>  - t3206: add color test for range-diff --dual-color
>  - test_decode_color: understand FAINT and ITALIC
>  (this branch uses js/range-diff; is tangled with es/format-patch-rangediff.)
>
>  The color output support for recently introduced "range-diff"
>  command got tweaked a bit.

No, this series doesn't tweak the colored range-diff.
This might be:

  Add more test coverage to colored range-diff and
  refactor the diff machinery to be more 

Re: [PATCH/RFC] commit: new option to abort -a something is already staged

2018-08-20 Thread Eric Sunshine
On Mon, Aug 20, 2018 at 11:41 AM Nguyễn Thái Ngọc Duy  wrote:
> One way to deal with this is some form of `git undo` that allows me to
> retrieve the old index. That's not a lot of work by itself. The problem
> is designing that `git undo` interface because there are more undo
> options that this.

s/that/than/


Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Eric Sunshine
On Mon, Aug 20, 2018 at 6:23 AM Phillip Wood  wrote:
> On 17/08/2018 23:44, Junio C Hamano wrote:
> > * pw/rebase-i-author-script-fix (2018-08-07) 2 commits
> >  - sequencer: fix quoting in write_author_script
> >  - sequencer: handle errors from read_author_ident()
> >
> >  Undecided.
> >  Is it the list consensus to favor this "with extra code, read the
> >  script written by bad writer" approach?
>
> I think there was agreement between myself and Eric on the last version,
> I'm not sure anyone else has expressed an opinion.

Phillip's v4 [1] does take backward-compatibility into account, but
with minimal extra code, which seems a reasonable compromise between
no backward-compatibility and the heavyweight approach of v3. I left a
few comments on v4 [1], but none were blockers, and changes based upon
those comments can be done later, so I think the series can move
forward.

[1]: 
https://public-inbox.org/git/20180807093452.22524-1-phillip.w...@talktalk.net/


Re: Do not raise conflict when a code in a patch was already added

2018-08-20 Thread Phillip Wood
On 20/08/2018 11:22, Konstantin Kharlamov wrote:
> So, the steps-to-reproduce below are rather full of trivia like setting up a
> repo, but the TL;DR is:
> 
> Upon using `git rebase -i HEAD~1` and then `git add -p` to add part of a
> "hunk" as one commit, and then using `git rebase --continue` so the
> other part of hunk would be left in top commit; git raises a conflict.

I think this is a misleading error message as in your example below
there are no conflicts, just unstaged changes. git-rebase.sh has the
following code


git update-index --ignore-submodules --refresh &&
git diff-files --quiet --ignore-submodules || {
	echo "$(gettext "You must edit all merge conflicts and then
mark them as resolved using git add")"
	exit 1
}

I think this pre-dates interactive rebases when the only unstaged
changes could be conflicts.

> 
> It's spectacular that the content of one of the inserted conflict markers is
> empty, so all you have to do is to remove the markers, and use `git add`
> on the file, and then `git rebase --continue`
> 
> It's a lot of unnecessary actions; git could just figure out that the code it
> sees in the patch is already there, being a part of another commit.

If there are conflict markers where one side is empty it means that some
lines from the merge base (which for a rebase is the parent of the
commit being picked) have been deleted on one side and modified on the
other. Git cannot know if you want to use the deleted version or the
modified version. You can use 'git diff --cc' to see the combined diff
which should show the lines being deleted on both sides and an addition
on the side with the modified lines. You can also set the
merge.conflictStyle config variable to diff3 to see the original text as
well as the text from the merge heads.
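
For example, just spelling out the two commands mentioned above:

	git config merge.conflictStyle diff3   # show the merge-base text inside the markers
	git diff --cc                          # combined diff for the conflicted paths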

Best Wishes

Phillip

> 
> Maybe git could issue a warning, or ask the user interactively
> (y/n); but raising a conflict IMO is unnecessary.
> 
> # Steps to reproduce
> 
> In empty dir execute:
> 
> $ git init
> $ touch test
> Initialized empty Git repository in /tmp/test/.git/
> $ git add test
> $ git commit
> [master (root-commit) a7ce543] 1st commit
>  1 file changed, 2 insertions(+)
>  create mode 100644 test
> $ echo -e "foo\nbar" > test # content you'll want to break
> $ git add -u && git commit
> [detached HEAD 9e28331] 2-nd commit
>  1 file changed, 2 insertions(+)
> $ git rebase -i --root
> Stopped at a7ce543...  1st commit
> You can amend the commit now, with
> 
>   git commit --amend
> 
> Once you are satisfied with your changes, run
> 
>   git rebase --continue
> 
> Put "edit" for the 2-nd commit
> 
> $ git reset HEAD^
> Unstaged changes after reset:
> M   test
> $ git add -p
> diff --git a/test b/test
> index e69de29..3bd1f0e 100644
> --- a/test
> +++ b/test
> @@ -0,0 +1,2 @@
> +foo
> +bar
> Stage this hunk [y,n,q,a,d,e,?]? e
> 
> ╭─constantine@constantine-N61Ja  /tmp/test ‹node-›  ‹› (e721fa3*)
> ╰─$ git commit
> [detached HEAD 27b2f63] add foo
>  1 file changed, 1 insertion(+)
> ╭─constantine@constantine-N61Ja  /tmp/test ‹node-›  ‹› (27b2f63*)
> ╰─$ git rebase --continue
> test: needs update
> You must edit all merge conflicts and then
> mark them as resolved using git add
> 
> What happened is that it's obvious the hunk was broken into multiple
> commits; git should figure that out and not raise a conflict.
> 
> Side note: for some reason in the test git didn't insert conflict
> markers. It did in real-world usecase though, and there was simply no
> content inside one of them.
> 



Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Junio C Hamano
Jonathan Nieder  writes:

>> Usually, I refrain from merging larger topics in 'next' down to
>> 'master' when we get close to -rc0, but I am wondering if it is
>> better to merge all of them to 'master', even the ones on the larger
>> and possibly undercooked side, expecting that we collectively spend
>> effort on hunting and fixing bugs in them during the pre-release
>> freeze period.  If we were to go that route, I'd want everybody's
>> buy-in and I'll promise to ignore any shiny new toys that appear on
>> list that are not regression fixes to topics merged to 'master'
>> since the end of the previous cycle to make sure people are not
>> distracted.
>
> Based on what I see in 'next' (midx, range-diff, etc), I quite like
> this idea.

The comment above was about the ones that were marked as "Will merge
to 'master'" that are in 'next', not the ones marked as "Will cook
in 'next'", like the midx ones.  

I am not worried about range-diff, as I do not think it came close
enough to the core-ish code to break other things; the potential
damage is fairly isolated and the worst would be that we'd realize
that we shipped with a new command that was buggy after the release,
which is not the end of the world.  'midx' and 'reachable' are quite
different stories, as bugs in them would in the worst case lead to
an immediate and unrecoverable repository corruption.

So I am still on the fence, even though I am leaning toward merging
them down to 'master'.


[PATCH 8/9] midx: test a few commands that use get_all_packs

2018-08-20 Thread Derrick Stolee
The new get_all_packs() method exposed the packfiles covered by
a multi-pack-index. Before, the 'git cat-file --batch' and
'git count-objects' commands would skip objects in an environment
with a multi-pack-index.

Further, a reachability bitmap would be ignored if its pack-file
was covered by a multi-pack-index.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 4b6e2825a6..424d0c640f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -86,8 +86,14 @@ test_expect_success 'write midx with one v1 pack' '
 '
 
 midx_git_two_modes () {
-   git -c core.multiPackIndex=false $1 >expect &&
-   git -c core.multiPackIndex=true $1 >actual &&
+   if [ "$2" = "sorted" ]
+   then
+   git -c core.multiPackIndex=false $1 | sort >expect &&
+   git -c core.multiPackIndex=true $1 | sort >actual
+   else
+   git -c core.multiPackIndex=false $1 >expect &&
+   git -c core.multiPackIndex=true $1 >actual
+   fi &&
test_cmp expect actual
 }
 
@@ -95,7 +101,10 @@ compare_results_with_midx () {
MSG=$1
test_expect_success "check normal git operations: $MSG" '
midx_git_two_modes "rev-list --objects --all" &&
-   midx_git_two_modes "log --raw"
+   midx_git_two_modes "log --raw" &&
+   midx_git_two_modes "count-objects --verbose" &&
+   midx_git_two_modes "cat-file --batch-all-objects --buffer 
--batch-check" &&
+   midx_git_two_modes "cat-file --batch-all-objects --buffer 
--batch-check --unsorted" sorted
'
 }
 
@@ -149,6 +158,12 @@ test_expect_success 'repack removes multi-pack-index' '
 
 compare_results_with_midx "after repack"
 
+test_expect_success 'multi-pack-index and pack-bitmap' '
+   git -c repack.writeBitmaps=true repack -ad &&
+   git multi-pack-index write &&
+   git rev-list --test-bitmap HEAD
+'
+
 test_expect_success 'multi-pack-index and alternates' '
git init --bare alt.git &&
echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-- 
2.18.0.118.gd4f65b8d14



[PATCH 9/9] pack-objects: consider packs in multi-pack-index

2018-08-20 Thread Derrick Stolee
When running 'git pack-objects --local', we want to avoid packing
objects that are in an alternate. Currently, we check for these
objects using the packed_git_mru list, which excludes the pack-files
covered by a multi-pack-index.

Add a new iteration over the multi-pack-indexes to find these
copies and mark them as unwanted.

Signed-off-by: Derrick Stolee 
---
 builtin/pack-objects.c  | 28 
 t/t5319-multi-pack-index.sh |  8 +++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a896d7810..4a9a42d29a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -31,6 +31,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "dir.h"
+#include "midx.h"
 
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
 #define SIZE(obj) oe_size(_pack, obj)
@@ -1034,6 +1035,7 @@ static int want_object_in_pack(const struct object_id 
*oid,
 {
int want;
struct list_head *pos;
+   struct multi_pack_index *m;
 
if (!exclude && local && has_loose_object_nonlocal(oid))
return 0;
@@ -1048,6 +1050,32 @@ static int want_object_in_pack(const struct object_id 
*oid,
if (want != -1)
return want;
}
+
+   for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+   struct pack_entry e;
+   if (fill_midx_entry(oid, , m)) {
+   struct packed_git *p = e.p;
+   off_t offset;
+   
+   if (p == *found_pack)
+   offset = *found_offset;
+   else
+   offset = find_pack_entry_one(oid->hash, p);
+
+   if (offset) {
+   if (!*found_pack) {
+   if (!is_pack_valid(p))
+   continue;
+   *found_offset = offset;
+   *found_pack = p;
+   }
+   want = want_found_object(exclude, p);
+   if (want != -1)
+   return want;
+   }
+   }
+   }
+
list_for_each(pos, get_packed_git_mru(the_repository)) {
struct packed_git *p = list_entry(pos, struct packed_git, mru);
off_t offset;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 424d0c640f..6f56b38674 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -176,7 +176,13 @@ test_expect_success 'multi-pack-index and alternates' '
 compare_results_with_midx "with alternate (local midx)"
 
 test_expect_success 'multi-pack-index in an alternate' '
-   mv .git/objects/pack/* alt.git/objects/pack
+   mv .git/objects/pack/* alt.git/objects/pack &&
+   test_commit add_local_objects &&
+   git repack --local &&
+   git multi-pack-index write &&
+   midx_read_expect 1 3 4 $objdir &&
+   git reset --hard HEAD~1 &&
+   rm -f .git/objects/pack/*
 '
 
 compare_results_with_midx "with alternate (remote midx)"
-- 
2.18.0.118.gd4f65b8d14



[PATCH 2/9] multi-pack-index: store local property

2018-08-20 Thread Derrick Stolee
A pack-file is 'local' if it is stored within the usual object
directory. If it is stored in an alternate, it is non-local.

For pack-files, this property is stored in the 'pack_local' member of the
packed_git struct. Add a similar 'local' member to the multi_pack_index
struct and 'local' parameters to the methods that load and prepare
multi-pack-indexes.

Signed-off-by: Derrick Stolee 
---
 midx.c| 11 ++-
 midx.h|  6 --
 packfile.c|  4 ++--
 t/helper/test-read-midx.c |  2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/midx.c b/midx.c
index 19b7df338e..6824acf5f8 100644
--- a/midx.c
+++ b/midx.c
@@ -37,7 +37,7 @@ static char *get_midx_filename(const char *object_dir)
return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir)
+struct multi_pack_index *load_multi_pack_index(const char *object_dir, int 
local)
 {
struct multi_pack_index *m = NULL;
int fd;
@@ -73,6 +73,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->fd = fd;
m->data = midx_map;
m->data_len = midx_size;
+   m->local = local;
 
m->signature = get_be32(m->data);
if (m->signature != MIDX_SIGNATURE) {
@@ -209,7 +210,7 @@ static int prepare_midx_pack(struct multi_pack_index *m, 
uint32_t pack_int_id)
strbuf_addf(_name, "%s/pack/%s", m->object_dir,
m->pack_names[pack_int_id]);
 
-   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1);
+   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 
m->local);
strbuf_release(_name);
return !m->packs[pack_int_id];
 }
@@ -318,7 +319,7 @@ int midx_contains_pack(struct multi_pack_index *m, const 
char *idx_name)
return 0;
 }
 
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, 
int local)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
struct multi_pack_index *m_search;
@@ -332,7 +333,7 @@ int prepare_multi_pack_index_one(struct repository *r, 
const char *object_dir)
if (!strcmp(object_dir, m_search->object_dir))
return 1;
 
-   r->objects->multi_pack_index = load_multi_pack_index(object_dir);
+   r->objects->multi_pack_index = load_multi_pack_index(object_dir, local);
 
if (r->objects->multi_pack_index) {
r->objects->multi_pack_index->next = m;
@@ -746,7 +747,7 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
-   packs.m = load_multi_pack_index(object_dir);
+   packs.m = load_multi_pack_index(object_dir, 1);
 
packs.nr = 0;
packs.alloc_list = packs.m ? packs.m->num_packs : 16;
diff --git a/midx.h b/midx.h
index e3b07f1586..8aa79f4b62 100644
--- a/midx.h
+++ b/midx.h
@@ -18,6 +18,8 @@ struct multi_pack_index {
uint32_t num_packs;
uint32_t num_objects;
 
+   int local;
+
const unsigned char *chunk_pack_names;
const uint32_t *chunk_oid_fanout;
const unsigned char *chunk_oid_lookup;
@@ -29,14 +31,14 @@ struct multi_pack_index {
char object_dir[FLEX_ARRAY];
 };
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+struct multi_pack_index *load_multi_pack_index(const char *object_dir, int 
local);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
struct multi_pack_index *m,
uint32_t n);
 int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_name);
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, 
int local);
 
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
diff --git a/packfile.c b/packfile.c
index 12db1a9d7d..896da460ac 100644
--- a/packfile.c
+++ b/packfile.c
@@ -963,11 +963,11 @@ static void prepare_packed_git(struct repository *r)
 
if (r->objects->packed_git_initialized)
return;
-   prepare_multi_pack_index_one(r, r->objects->objectdir);
+   prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
prepare_packed_git_one(r, r->objects->objectdir, 1);
prepare_alt_odb(r);
for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
-   prepare_multi_pack_index_one(r, alt->path);
+   prepare_multi_pack_index_one(r, alt->path, 0);
prepare_packed_git_one(r, alt->path, 0);

[PATCH 4/9] midx: stop reporting garbage

2018-08-20 Thread Derrick Stolee
When prepare_packed_git is called with the report_garbage method
initialized, we report unexpected files in the objects directory
as garbage. Stop reporting the multi-pack-index and the pack-files
it covers as garbage.

Signed-off-by: Derrick Stolee 
---
 packfile.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/packfile.c b/packfile.c
index 896da460ac..fe713a0242 100644
--- a/packfile.c
+++ b/packfile.c
@@ -820,9 +820,8 @@ static void prepare_pack(const char *full_name, size_t 
full_name_len,
struct packed_git *p;
size_t base_len = full_name_len;
 
-   if (strip_suffix_mem(full_name, _len, ".idx")) {
-   if (data->m && midx_contains_pack(data->m, file_name))
-   return;
+   if (strip_suffix_mem(full_name, _len, ".idx") &&
+   !(data->m && midx_contains_pack(data->m, file_name))) {
/* Don't reopen a pack we already have. */
for (p = data->r->objects->packed_git; p; p = p->next) {
size_t len;
@@ -842,6 +841,8 @@ static void prepare_pack(const char *full_name, size_t 
full_name_len,
if (!report_garbage)
return;
 
+   if (!strcmp(file_name, "multi-pack-index"))
+   return;
if (ends_with(file_name, ".idx") ||
ends_with(file_name, ".pack") ||
ends_with(file_name, ".bitmap") ||
-- 
2.18.0.118.gd4f65b8d14



[PATCH 3/9] midx: mark bad packed objects

2018-08-20 Thread Derrick Stolee
When an object fails to decompress from a pack-file, we mark the object
as 'bad' so we can retry with a different copy of the object (if such a
copy exists).

Before now, the multi-pack-index did not update the bad objects list for
the pack-files it contains, and we did not check the bad objects list
when reading an object. Now, do both.

Signed-off-by: Derrick Stolee 
---
 midx.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 6824acf5f8..7fa75a37a3 100644
--- a/midx.c
+++ b/midx.c
@@ -280,6 +280,16 @@ static int nth_midxed_pack_entry(struct multi_pack_index 
*m, struct pack_entry *
if (!is_pack_valid(p))
return 0;
 
+   if (p->num_bad_objects) {
+   uint32_t i;
+   struct object_id oid;
+   nth_midxed_object_oid(, m, pos);
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(oid.hash,
+p->bad_object_sha1 + the_hash_algo->rawsz 
* i))
+   return 0;
+   }
+
e->offset = nth_midxed_offset(m, pos);
e->p = p;
 
-- 
2.18.0.118.gd4f65b8d14



[PATCH 7/9] treewide: use get_all_packs

2018-08-20 Thread Derrick Stolee
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.

Signed-off-by: Derrick Stolee 
---
 builtin/count-objects.c  |  2 +-
 builtin/fsck.c   |  4 ++--
 builtin/gc.c |  4 ++--
 builtin/pack-objects.c   | 14 +++---
 builtin/pack-redundant.c |  4 ++--
 fast-import.c|  4 ++--
 http-backend.c   |  4 ++--
 pack-bitmap.c|  2 +-
 pack-objects.c   |  2 +-
 packfile.c   |  2 +-
 server-info.c|  4 ++--
 11 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index d51e2ce1ec..a7cad052c6 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -123,7 +123,7 @@ int cmd_count_objects(int argc, const char **argv, const 
char *prefix)
struct strbuf pack_buf = STRBUF_INIT;
struct strbuf garbage_buf = STRBUF_INIT;
 
-   for (p = get_packed_git(the_repository); p; p = p->next) {
+   for (p = get_all_packs(the_repository); p; p = p->next) {
if (!p->pack_local)
continue;
if (open_pack_index(p))
diff --git a/builtin/fsck.c b/builtin/fsck.c
index c96f3f4fcc..184d8e7f4e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -740,7 +740,7 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
struct progress *progress = NULL;
 
if (show_progress) {
-   for (p = get_packed_git(the_repository); p;
+   for (p = get_all_packs(the_repository); p;
 p = p->next) {
if (open_pack_index(p))
continue;
@@ -749,7 +749,7 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
 
progress = start_progress(_("Checking 
objects"), total);
}
-   for (p = get_packed_git(the_repository); p;
+   for (p = get_all_packs(the_repository); p;
 p = p->next) {
/* verify gives error messages itself */
if (verify_pack(p, fsck_obj_buffer,
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..2b592260e9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -183,7 +183,7 @@ static struct packed_git *find_base_packs(struct 
string_list *packs,
 {
struct packed_git *p, *base = NULL;
 
-   for (p = get_packed_git(the_repository); p; p = p->next) {
+   for (p = get_all_packs(the_repository); p; p = p->next) {
if (!p->pack_local)
continue;
if (limit) {
@@ -208,7 +208,7 @@ static int too_many_packs(void)
if (gc_auto_pack_limit <= 0)
return 0;
 
-   for (cnt = 0, p = get_packed_git(the_repository); p; p = p->next) {
+   for (cnt = 0, p = get_all_packs(the_repository); p; p = p->next) {
if (!p->pack_local)
continue;
if (p->pack_keep)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4391504a91..1a896d7810 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2784,7 +2784,7 @@ static void add_objects_in_unpacked_packs(struct rev_info 
*revs)
 
memset(_pack, 0, sizeof(in_pack));
 
-   for (p = get_packed_git(the_repository); p; p = p->next) {
+   for (p = get_all_packs(the_repository); p; p = p->next) {
struct object_id oid;
struct object *o;
 
@@ -2848,7 +2848,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct 
object_id *oid)
struct packed_git *p;
 
p = (last_found != (void *)1) ? last_found :
-   get_packed_git(the_repository);
+   get_all_packs(the_repository);
 
while (p) {
if ((!p->pack_local || p->pack_keep ||
@@ -2858,7 +2858,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct 
object_id *oid)
return 1;
}
if (p == last_found)
-   p = get_packed_git(the_repository);
+   p = get_all_packs(the_repository);
else
p = p->next;
if (p == last_found)
@@ -2894,7 +2894,7 @@ static void loosen_unused_packed_objects(struct rev_info 
*revs)
uint32_t i;
struct object_id oid;
 
-   for (p = get_packed_git(the_repository); p; p = p->next) {
+   for (p = get_all_packs(the_repository); p; p = 

[PATCH 1/9] multi-pack-index: provide more helpful usage info

2018-08-20 Thread Derrick Stolee
The multi-pack-index builtin has a very simple command-line
interface. Instead of simply reporting usage, give the user a
hint as to why the arguments failed.

Reported-by: Eric Sunshine 
Signed-off-by: Derrick Stolee 
---
 builtin/multi-pack-index.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 6a7aa00cf2..2633efd95d 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -32,16 +32,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
opts.object_dir = get_object_directory();
 
if (argc == 0)
-   goto usage;
+   usage_with_options(builtin_multi_pack_index_usage,
+  builtin_multi_pack_index_options);
 
-   if (!strcmp(argv[0], "write")) {
-   if (argc > 1)
-   goto usage;
+   if (argc > 1) {
+   die(_("too many arguments"));
+   return 1;
+   }
 
+   if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
-   }
 
-usage:
-   usage_with_options(builtin_multi_pack_index_usage,
-  builtin_multi_pack_index_options);
+   die(_("unrecognized verb: %s"), argv[0]);
 }
-- 
2.18.0.118.gd4f65b8d14



[PATCH 5/9] midx: fix bug that skips midx with alternates

2018-08-20 Thread Derrick Stolee
The logic for constructing the linked list of multi-pack-indexes
in the object store is incorrect. If the local object store has
a multi-pack-index, but an alternate does not, then the list is
dropped.

Add tests that would have revealed this bug.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 11 ++-
 t/t5319-multi-pack-index.sh | 17 +
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index 7fa75a37a3..0710c4c175 100644
--- a/midx.c
+++ b/midx.c
@@ -331,7 +331,7 @@ int midx_contains_pack(struct multi_pack_index *m, const 
char *idx_name)
 
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, 
int local)
 {
-   struct multi_pack_index *m = r->objects->multi_pack_index;
+   struct multi_pack_index *m;
struct multi_pack_index *m_search;
int config_value;
 
@@ -339,14 +339,15 @@ int prepare_multi_pack_index_one(struct repository *r, 
const char *object_dir, i
!config_value)
return 0;
 
-   for (m_search = m; m_search; m_search = m_search->next)
+   for (m_search = r->objects->multi_pack_index; m_search; m_search = 
m_search->next)
if (!strcmp(object_dir, m_search->object_dir))
return 1;
 
-   r->objects->multi_pack_index = load_multi_pack_index(object_dir, local);
+   m = load_multi_pack_index(object_dir, local);
 
-   if (r->objects->multi_pack_index) {
-   r->objects->multi_pack_index->next = m;
+   if (m) {
+   m->next = r->objects->multi_pack_index;
+   r->objects->multi_pack_index = m;
return 1;
}
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ae1d5d4592..4b6e2825a6 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -149,6 +149,23 @@ test_expect_success 'repack removes multi-pack-index' '
 
 compare_results_with_midx "after repack"
 
+test_expect_success 'multi-pack-index and alternates' '
+   git init --bare alt.git &&
+   echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
+   echo content1 >file1 &&
+   altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
+   git cat-file blob $altblob &&
+   git rev-list --all
+'
+
+compare_results_with_midx "with alternate (local midx)"
+
+test_expect_success 'multi-pack-index in an alternate' '
+   mv .git/objects/pack/* alt.git/objects/pack
+'
+
+compare_results_with_midx "with alternate (remote midx)"
+
 
 # usage: corrupt_data   []
 corrupt_data () {
-- 
2.18.0.118.gd4f65b8d14



[PATCH 6/9] packfile: add all_packs list

2018-08-20 Thread Derrick Stolee
If a repo contains a multi-pack-index, then the packed_git list
does not contain the packfiles that are covered by the multi-pack-index.
This is important for doing object lookups, abbreviations, and
approximating object count. However, there are many operations that
really want to iterate over all packfiles.

Create a new 'all_packs' linked list that contains all packfiles, starting
with the packfiles in the multi-pack-index and then continuing along
the packed_git linked list.

Signed-off-by: Derrick Stolee 
---
 midx.c |  2 +-
 midx.h |  1 +
 object-store.h |  6 ++
 packfile.c | 27 +++
 packfile.h |  1 +
 5 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 0710c4c175..f3e8dbc108 100644
--- a/midx.c
+++ b/midx.c
@@ -197,7 +197,7 @@ static void close_midx(struct multi_pack_index *m)
FREE_AND_NULL(m->pack_names);
 }
 
-static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
+int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 {
struct strbuf pack_name = STRBUF_INIT;
 
diff --git a/midx.h b/midx.h
index 8aa79f4b62..a210f1af2a 100644
--- a/midx.h
+++ b/midx.h
@@ -32,6 +32,7 @@ struct multi_pack_index {
 };
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int 
local);
+int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
struct multi_pack_index *m,
diff --git a/object-store.h b/object-store.h
index f9c57e2c26..c69d546392 100644
--- a/object-store.h
+++ b/object-store.h
@@ -128,6 +128,12 @@ struct raw_object_store {
/* A most-recently-used ordered version of the packed_git list. */
struct list_head packed_git_mru;
 
+   /*
+* A linked list containing all packfiles, starting with those
+* contained in the multi_pack_index.
+*/
+   struct packed_git *all_packs;
+
/*
 * A fast, rough count of the number of objects in the repository.
 * These two fields are not meant for direct access. Use
diff --git a/packfile.c b/packfile.c
index fe713a0242..adcf2e12a0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -972,6 +972,9 @@ static void prepare_packed_git(struct repository *r)
prepare_packed_git_one(r, alt->path, 0);
}
rearrange_packed_git(r);
+
+   r->objects->all_packs = NULL;
+
prepare_packed_git_mru(r);
r->objects->packed_git_initialized = 1;
 }
@@ -995,6 +998,30 @@ struct multi_pack_index *get_multi_pack_index(struct 
repository *r)
return r->objects->multi_pack_index;
 }
 
+struct packed_git *get_all_packs(struct repository *r)
+{
+   prepare_packed_git(r);
+
+   if (!r->objects->all_packs) {
+   struct packed_git *p = r->objects->packed_git;
+   struct multi_pack_index *m;
+
+   for (m = r->objects->multi_pack_index; m; m = m->next) {
+   uint32_t i;
+   for (i = 0; i < m->num_packs; i++) {
+   if (!prepare_midx_pack(m, i)) {
+   m->packs[i]->next = p;
+   p = m->packs[i];
+   }
+   }
+   }
+
+   r->objects->all_packs = p;
+   }
+
+   return r->objects->all_packs;
+}
+
 struct list_head *get_packed_git_mru(struct repository *r)
 {
prepare_packed_git(r);
diff --git a/packfile.h b/packfile.h
index 786686..3b90e2864c 100644
--- a/packfile.h
+++ b/packfile.h
@@ -51,6 +51,7 @@ extern void install_packed_git(struct repository *r, struct 
packed_git *pack);
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
 struct multi_pack_index *get_multi_pack_index(struct repository *r);
+struct packed_git *get_all_packs(struct repository *r);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
-- 
2.18.0.118.gd4f65b8d14



[PATCH 0/9] multi-pack-index cleanups

2018-08-20 Thread Derrick Stolee
This series is based on ds/multi-pack-index and
jk/for-each-object-iteration.

The multi-pack-index indexes objects across multiple pack-files. To
speed up object lookups and abbreviations, we do not place the pack-
files covered by the multi-pack-index into the packed_git linked list
or the packed_git_mru list. Existing test coverage focused on typical
uses and the main consumers of the multi-pack-index.

To better understand the implications of the multi-pack-index with
other scenarios, I ran the test suite after adding a step to 'git repack'
to write a multi-pack-index, and to default core.multiPackIndex to 'true'.
This commit is available as [1].
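
(For reference, the manual equivalent of that setup is roughly the
following, using the config name and commands from this series:

	git config core.multiPackIndex true &&
	git repack -ad &&
	git multi-pack-index write

i.e. repack, then cover the resulting pack(s) with a multi-pack-index.)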

The following issues were discovered, and are fixed by this series:

1. The multi-pack-index did not distinguish between local and non-local
   pack-files.

2. Object lookups in the multi-pack-index did not consult the list of bad
   packed objects, so they would loop infinitely trying to load the same object.

3. 'git count-objects --verbose' would not see the objects in the multi-
   pack-index and would report the multi-pack-index as garbage.

4. If the local object directory had a multi-pack-index but an alternate
   did not, then the multi-pack-index would be dropped.

5. If the multi-pack-index covered a pack-file that was paired with a
   reachability bitmap, then that bitmap would not be loaded.

Several issues were resolved simply by making a new 'all_packs' list in
the object store and replacing get_packed_git() calls with get_all_packs()
calls. The all_packs list is a linked list that starts with the pack-files
in multi-pack-indexes and then continues along the packed_git linked list.

Also: I simplified the usage reports in 'git multi-pack-index' to help
users who are entering parameters incorrectly.

[1] 
https://github.com/derrickstolee/git/commit/098dd1d515b592fb165a276241d7d68d1cde0036
DO-NOT-MERGE: compute multi-pack-index on repack.
I will send this commit as a separate patch so we can see the change
I made and the one test I needed to fix (because it moves a pack-file,
thereby making the multi-pack-index invalid).

[2] https://github.com/derrickstolee/git/pull/9
A GitHub pull request containing this series.

Derrick Stolee (9):
  multi-pack-index: provide more helpful usage info
  multi-pack-index: store local property
  midx: mark bad packed objects
  midx: stop reporting garbage
  midx: fix bug that skips midx with alternates
  packfile: add all_packs list
  treewide: use get_all_packs
  midx: test a few commands that use get_all_packs
  pack-objects: consider packs in multi-pack-index

 builtin/count-objects.c |  2 +-
 builtin/fsck.c  |  4 ++--
 builtin/gc.c|  4 ++--
 builtin/multi-pack-index.c  | 16 +++---
 builtin/pack-objects.c  | 42 +--
 builtin/pack-redundant.c|  4 ++--
 fast-import.c   |  4 ++--
 http-backend.c  |  4 ++--
 midx.c  | 32 ++-
 midx.h  |  7 --
 object-store.h  |  6 +
 pack-bitmap.c   |  2 +-
 pack-objects.c  |  2 +-
 packfile.c  | 40 -
 packfile.h  |  1 +
 server-info.c   |  4 ++--
 t/helper/test-read-midx.c   |  2 +-
 t/t5319-multi-pack-index.sh | 44 ++---
 18 files changed, 168 insertions(+), 52 deletions(-)

-- 
2.18.0.118.gd4f65b8d14



Re: [PATCH v3 4/7] submodule--helper: add a new 'config' subcommand

2018-08-20 Thread Antonio Ospite
On Tue, 14 Aug 2018 10:10:58 -0700
Brandon Williams  wrote:

> On 08/14, Antonio Ospite wrote:
> > Add a new 'config' subcommand to 'submodule--helper', this extra level
> > of indirection makes it possible to add some flexibility to how the
> > submodules configuration is handled.
> > 
> > Signed-off-by: Antonio Ospite 
> > ---
> >  builtin/submodule--helper.c | 14 ++
> 
> >  new |  0
> 
> Looks like you may have accidentally left in an empty file "new" it should
> probably be removed from this commit before it gets merged.
> 

Yeah, I had added it to test "git cat-file -e new" for a later patch and
then I must have messed up some rebase. Thanks for pointing it out.

Ciao,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up

2018-08-20 Thread Antonio Ospite
On Tue, 14 Aug 2018 13:16:38 -0700
Junio C Hamano  wrote:

> Antonio Ospite  writes:
> 
[...]
> >  test_expect_success 'error message contains blob reference' '
> > +   # Remove the error introduced in the previous test.
> > +   # It is not needed in the following tests.
> > +   test_when_finished "git -C super reset --hard HEAD^" &&
> 
> Hmm, that is ugly.  Depending on where in the subshell the previous
> test failed, you'd still be taking us to an unexpected place.
> Imagine if "git commit -m 'add error'" failed, for example, in the
> test before this one.
> 
> I am wondering if the proper fix is to merge the previous one and
> this one into a single test.  The combined test would
> 
> - remember where the HEAD in super is and arrange to come back
>   to it when test is done
> - break .gitmodules and commit it
> - run test-tool and check its output
> - also check its error output
> 
> in a single test_expect_success.
>

I will try that.

> > @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
> >  '
> >  
> >  test_expect_success 'error in history in fetchrecursesubmodule lets 
> > continue' '
> > +   test_when_finished "git -C super reset --hard HEAD^" &&
> > (cd super &&
> > git config -f .gitmodules \
> > submodule.submodule.fetchrecursesubmodules blabla &&
> > @@ -134,8 +138,7 @@ test_expect_success 'error in history in 
> > fetchrecursesubmodule lets continue' '
> > HEAD b \
> > HEAD submodule \
> > >actual &&
> > -   test_cmp expect_error actual  &&
> > -   git reset --hard HEAD^
> > +   test_cmp expect_error actual
> > )
> >  '
> 
> If we want to be more robust, you'd probably need to find a better
> anchoring point than HEAD, which can be pointing different commit
> depending on where in the subshell the process is hit with ^C,
> i.e.
> 
>   ORIG=$(git -C super rev-parse HEAD) &&
>   test_when_finished "git -C super reset --hard $ORIG" &&
>   (
>   cd super &&
>   ...
>

I see, ORIG is set and evaluated immediately but the value will be
used only at a later time.

I remember that you raised concerns also in the previous review round
but I didn't quite get what you meant; now I think I do.

> The patch is still an improvement compared to the current code,
> where a broken test-tool that does not produce expected output in
> the file 'actual' is guaranteed to leave us at a commit that we do
> not expect to be at, but not entirely satisfactory.

I can do a v4 with these fixes since there are also some comments about
other patches.

Thanks,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


Re: [PATCH/RFC] commit: new option to abort -a something is already staged

2018-08-20 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> +commit.preciousDirtyIndex::
> + If some changes are partially staged in the index (i.e.
> + "git commit -a" and "git commit" commit different changes), reject
> + "git commit -a".

Or perhaps better yet, go into yes/no interaction to confirm if you
have access to interactive terminal at fd #0/#1, perhaps?
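
For illustration, a minimal sketch of such a confirmation (not part of the
patch; the helper name and prompt wording are made up):

	#include <stdio.h>
	#include <ctype.h>
	#include <unistd.h>

	/*
	 * Ask only when fd 0 and 1 are terminals; otherwise keep the
	 * hard refusal.  Returns 1 if the user wants to continue.
	 */
	static int confirm_clobber_staged_index(void)
	{
		char answer[64];

		if (!isatty(0) || !isatty(1))
			return 0;
		fprintf(stderr, "'commit -a' would commit different changes "
			"than what is staged; continue? [y/N] ");
		if (!fgets(answer, sizeof(answer), stdin))
			return 0;
		return tolower((unsigned char)answer[0]) == 'y';
	}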

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 213fca2d8e..489e4b9f50 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -98,6 +98,7 @@ static const char *author_message, *author_message_buffer;
>  static char *edit_message, *use_message;
>  static char *fixup_message, *squash_message;
>  static int all, also, interactive, patch_interactive, only, amend, signoff;
> +static int allow_dirty_index = 1;
>  static int edit_flag = -1; /* unspecified */
>  static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship;
>  static int config_commit_verbose = -1; /* unspecified */
> @@ -385,10 +386,24 @@ static const char *prepare_index(int argc, const char 
> **argv, const char *prefix
>* (B) on failure, rollback the real index.
>*/
>   if (all || (also && pathspec.nr)) {
> + int compare_oid = all && !allow_dirty_index;
> + struct object_id previous_oid;
> +
> + if (compare_oid) {
> + if (update_main_cache_tree(0) || !the_index.cache_tree)
> + die(_("error building trees"));

Hmph, when we conclude a conflicted merge with "commit -a", wouldn't
we fail to compute the "before" picture, with higher-stage entries
still in the index?

> + if (the_index.cache_tree->entry_count >= 0)
> + oidcpy(_oid, 
> _index.cache_tree->oid);
> + else
> + oidclr(_oid);

The cache-tree covers no entry, meaning the index has no cache
entries?  Shouldn't we be setting EMPTY_TREE_SHA1_BIN or something
instead?

> + }
>   hold_locked_index(_lock, LOCK_DIE_ON_ERROR);
>   add_files_to_cache(also ? prefix : NULL, , 0);
>   refresh_cache_or_die(refresh_flags);
>   update_main_cache_tree(WRITE_TREE_SILENT);
> + if (compare_oid && the_index.cache_tree &&
> + oidcmp(_oid, _index.cache_tree->oid))
> + die(_("staged content is different, aborting"));

I was hoping that it would be an easy change to teach add_files_to_cache()
to report how many paths it actually has "added" (modifications and
removals are of course also counted), which would allow us not to waste
effort computing a throw-away tree object once more.

There only are three existing callers to the function, and only one
of them rely on the current "non-zero is error, zero is good"
semantics, so updating that caller (and perhaps vetting the other
callers to see if they also _should_ pay attention to the return
value, at least to detect errors, if not the number of paths updated)
shouldn't be that much more effort, and would be a good update to
the API regardless of this new feature, I would think.
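
A rough sketch of what the caller could then do (this assumes a modified
add_files_to_cache() that returns the number of paths it updated, or a
negative value on error; that is not the current API):

	int updated = add_files_to_cache(also ? prefix : NULL, &pathspec, 0);

	if (updated < 0)
		die(_("updating files failed"));
	/* "-a" changed the index, so "commit" and "commit -a" would differ */
	if (all && !allow_dirty_index && updated)
		die(_("staged content is different, aborting"));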

>   if (write_locked_index(_index, _lock, 0))
>   die(_("unable to write new_index file"));
>   commit_style = COMMIT_NORMAL;
> @@ -1413,6 +1428,10 @@ static int git_commit_config(const char *k, const char 
> *v, void *cb)
>   config_commit_verbose = git_config_bool_or_int(k, v, _bool);
>   return 0;
>   }
> + if (!strcmp(k, "commit.preciousdirtyindex")) {
> + allow_dirty_index = !git_config_bool(k, v);
> + return 0;
> + }
>  
>   status = git_gpg_config(k, v, NULL);
>   if (status)


Re: "less -F" is broken

2018-08-20 Thread Mark Nudelman

On 8/16/2018 10:10 AM, Linus Torvalds wrote:

One option that I didn't try to go for - because I just don't know the
less code base well enough - is to basically make the behavior of '-F'
be something like this:

  - as long as all the lines are short and well-behaved, and we haven't
seen enough lines to fill the screen, act like 'cat' and just feed
them through

  - when you fill the screen (or when you hit some other condition that
makes you go "now I won't exit" - that could be a long line, but maybe
it could also be the user giving keyboard input for a less command?)
you send the init sequence and just redraw the whole screen.


I'm not sure that this would be a very nice user experience.  On a 
terminal where the init sequence opens an alternate screen, some lines 
from the start of the file would be printed in the main screen, and then 
the whole file would be viewed in the alternate screen.  After exiting 
less, the user would see his main screen with some lines from the first 
page of the file displayed and then cut off at a seemingly arbitrary 
point.  Seems like that could be confusing and annoying.




But let's say that that was all the user was interested in, and the
user presses 'q' to quit less. That doesn't work at all - it will wait
for that full ten seconds.

That actually happens even without -F too.

Wouldn't it be good to react to things like searches to highlight
something (and to 'quit' for the 'never mind, already got it' case)
even if there isn't enough data to fill the whole screen yet?

that said, ^C works, and this is not new behavior, so I'm just
throwing this out as a "maybe a different approach would fix _both_
the -F behavior _and_ the above traditional issue"?


This issue is, as you say, not related to the -F issue, but arises 
because less doesn't have a way to be reading a file and simultaneously 
react to terminal key presses.  When I first wrote less, there was no 
easy way to do this in Unix.  Less also runs on other OSes which don't 
provide this functionality.  The best I was able to do was to allow 
ctrl-C to interrupt the read.  Of course in a modern OS that has 
select() or similar functionality this could be implemented, but I think 
it would require some largish changes to the architecture.  (Or maybe 
not; I haven't really investigated this in detail.)
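
For what it's worth, the core of a select()-based approach could be as
small as something like this (a sketch only, not code from less, and
wait_readable() is a made-up name):

	#include <sys/select.h>

	/*
	 * Wait until either more input data or a keypress on the tty is
	 * available; returns the fd that is readable, or -1 on error.
	 */
	static int wait_readable(int data_fd, int tty_fd)
	{
		fd_set rfds;
		int nfds = (data_fd > tty_fd ? data_fd : tty_fd) + 1;

		FD_ZERO(&rfds);
		FD_SET(data_fd, &rfds);
		FD_SET(tty_fd, &rfds);
		if (select(nfds, &rfds, NULL, NULL, NULL) < 0)
			return -1;
		return FD_ISSET(tty_fd, &rfds) ? tty_fd : data_fd;
	}

though, as noted, fitting that into the existing read loop is where the
larger architectural changes would be.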


BTW, your first message seems to indicate that you didn't find the less 
project on github.  It's at https://github.com/gwsw/less (mentioned in 
the README).  The latest version (v535) has the -F change implemented, 
but I haven't yet released this for beta testing.


--Mark



[PATCH/RFC] commit: new option to abort -a something is already staged

2018-08-20 Thread Nguyễn Thái Ngọc Duy
So many times I have carefully cherry picked changes to the index with
`git add -p` then accidentally did `git commit -am ` (usually by
retrieving a command from history and pressing Enter too quickly)
which destroyed the beautiful index.

One way to deal with this is some form of `git undo` that allows me to
retrieve the old index. That's not a lot of work by itself. The problem
is designing that `git undo` interface because there are more undo
options that this.

Instead, let's handle just this problem for now. This new option
commit.preciousDirtyIndex, if set to false, will not allow `commit -a`
to continue if the final index is different from the existing one. I
don't think this can be achieved with hooks because the hooks don't
know about "-a" or different commit modes.

Or is there a better way to handle this?

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt |  5 +
 builtin/commit.c | 19 +++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 95ad715a44..3937681ee9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1417,6 +1417,11 @@ commit.gpgSign::
convenient to use an agent to avoid typing your GPG passphrase
several times.
 
+commit.preciousDirtyIndex::
+   If some changes are partially staged in the index (i.e.
+   "git commit -a" and "git commit" commit different changes), reject
+   "git commit -a".
+
 commit.status::
A boolean to enable/disable inclusion of status information in the
commit message template when using an editor to prepare the commit
diff --git a/builtin/commit.c b/builtin/commit.c
index 213fca2d8e..489e4b9f50 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -98,6 +98,7 @@ static const char *author_message, *author_message_buffer;
 static char *edit_message, *use_message;
 static char *fixup_message, *squash_message;
 static int all, also, interactive, patch_interactive, only, amend, signoff;
+static int allow_dirty_index = 1;
 static int edit_flag = -1; /* unspecified */
 static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship;
 static int config_commit_verbose = -1; /* unspecified */
@@ -385,10 +386,24 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
 * (B) on failure, rollback the real index.
 */
if (all || (also && pathspec.nr)) {
+   int compare_oid = all && !allow_dirty_index;
+   struct object_id previous_oid;
+
+   if (compare_oid) {
+   if (update_main_cache_tree(0) || !the_index.cache_tree)
+   die(_("error building trees"));
+   if (the_index.cache_tree->entry_count >= 0)
+   oidcpy(_oid, 
_index.cache_tree->oid);
+   else
+   oidclr(_oid);
+   }
hold_locked_index(_lock, LOCK_DIE_ON_ERROR);
add_files_to_cache(also ? prefix : NULL, , 0);
refresh_cache_or_die(refresh_flags);
update_main_cache_tree(WRITE_TREE_SILENT);
+   if (compare_oid && the_index.cache_tree &&
+   oidcmp(_oid, _index.cache_tree->oid))
+   die(_("staged content is different, aborting"));
if (write_locked_index(_index, _lock, 0))
die(_("unable to write new_index file"));
commit_style = COMMIT_NORMAL;
@@ -1413,6 +1428,10 @@ static int git_commit_config(const char *k, const char 
*v, void *cb)
config_commit_verbose = git_config_bool_or_int(k, v, _bool);
return 0;
}
+   if (!strcmp(k, "commit.preciousdirtyindex")) {
+   allow_dirty_index = !git_config_bool(k, v);
+   return 0;
+   }
 
status = git_gpg_config(k, v, NULL);
if (status)
-- 
2.18.0.1003.g5e2e2c8169



Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output

2018-08-20 Thread Junio C Hamano
Han-Wen Nienhuys  writes:

>> @@ -84,6 +86,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
>> const char *src, int n)
>> for (i = 0; i < ARRAY_SIZE(keywords); i++) {
>> struct keyword_entry *p = keywords + i;
>> int len = strlen(p->keyword);
>> +
>> +   if (n <= len)
>> +   continue;
>
> I would suggest
>
>  if (n < len) continue;
> ..
> if (!strncasecmp(p->keyword, src, len) && (n == len || !isalnum(src[len]))) {
>
> so we colorize a single line that looks like "warning" as well

That's the kind of thing I would have mentioned in the initial
review of the feature, and I do not think it is a bad idea.

I do think it is a bad idea to roll it into this patch, though.




Re: [PATCH v2 05/11] t0027: make hash size independent'

2018-08-20 Thread Torsten Bögershausen
On 20.08.18 00:10, Eric Sunshine wrote:
> On Sun, Aug 19, 2018 at 5:57 PM brian m. carlson
>  wrote:
>> On Sun, Aug 19, 2018 at 04:10:21PM -0400, Eric Sunshine wrote:
>>> On Sun, Aug 19, 2018 at 1:54 PM brian m. carlson
 -   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
 +   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp+" &&
>>>
>>> My immediate thought upon reading this was whether "+" is valid in
>>> Windows filenames. Apparently, it is, but perhaps (if you re-roll) it
>>> would make sense to use a character less likely to cause brain
>>> hiccups; for instance, "exp0'.
>>
>> The reason I picked that is that we use it quite a bit in the Makefile,
>> so it seemed like an obvious choice for a temporary file name.  If you
>> feel strongly I can pick something else, but I thought it would be
>> reasonably intuitive for other developers.  Maybe I was wrong, though.
> 
> I don't feel strongly about it. My brain tripped over it probably
> because it's not an idiom in Git tests. In fact, I see just one place
> where "+" has been used like this, in t6026.
> 

Probably "tmp" is a better name than "exp+" 
(Why the '+'? Is it better than the non-'+'?)

Anyway,
If we re-order a little bit, can we use a simple '|' instead ?

tr '\015\000abcdef0123456789' QN0 <"$2" |
sed -e "s/+/$ZERO_OID/"   >"$exp" &&
tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&


Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output

2018-08-20 Thread Junio C Hamano
Jeff King  writes:

>> In the longer term we may want to accept size_t as parameter for
>> clarity (even though we know that a sideband message we are painting
>> typically would fit on a line on a terminal and int is sufficient).
>> Write it down as a NEEDSWORK comment.
>
> This "typically" made me nervous about what happens in the non-typical
> case, but I think we can say something even stronger: the length comes
> from a pktline, so the maximum is less than 16 bits. I wondered if we
> might ever call this on the accumulated string from multiple sidebands,
> but it doesn't look like it.

I think a sideband packet may be split on one but not the other
codepath to result in multiple calls, but I do not think we coalesce
them together, so I agree that int is sufficient.


Re: [PATCH v3] checkout: optimize "git checkout -b "

2018-08-20 Thread Ben Peart




On 8/18/2018 9:44 PM, Elijah Newren wrote:

On Fri, Aug 17, 2018 at 5:41 AM Ben Peart  wrote:

On 8/16/2018 2:37 PM, Duy Nguyen wrote:

On Thu, Aug 16, 2018 at 8:27 PM Ben Peart  wrote:


From: Ben Peart 

Skip merging the commit, updating the index and working directory if and
only if we are creating a new branch via "git checkout -b ."
Any other checkout options will still go through the former code path.

If sparse_checkout is on, require the user to manually opt in to this
optimized behavior by setting the config setting checkout.optimizeNewBranch
to true as we will no longer update the skip-worktree bit in the index, nor
add/remove files in the working directory to reflect the current sparse
checkout settings.

For comparison, running "git checkout -b " on a large repo takes:

14.6 seconds - without this patch
0.3 seconds - with this patch


I still don't think we should do this. If you want lightning fast
branch creation, just use 'git branch'. From the timing breakdown you
showed in the other thread, it looks like sparse checkout still takes
seconds, which could be optimized (or even excluded, I mentioned this
too). And split index (or something similar if you can't use it) would
give you saving across the board. There is still one idea Elijah gave
me that should further lower traverse_trees()  cost.



We have investigated some of these already - split index ended up
slowing things down more than it sped them up do to the higher compute
costs.  Sparse checkout we've already optimized significantly - limiting
the patterns we accept so that we can do the lookup via a hashmap
instead of the robust pattern matching.  We will continue to look for
other optimizations and appreciate any and all ideas!

In the end, this optimization makes a huge performance improvement by
avoiding doing a lot of work that isn't necessary.  Taking a command
from 14+ seconds to sub-second is just too much of a win for us to ignore.


But anyway, it's not my call. I'll stop here.


It's even less of my call, but since things seem to be stuck in
what-should-we-do state (as per Junio's comments on this patch in the
last two "What's cooking" emails), and since Ben and Duy obviously
have opposite opinions on Ben's patch, let's see if I might be able to
help at all.  Here's my summary and my findings:

== The pain ==
- For repositories with a really large number of entries (500K as Ben
says), some operations are much slower than it feels like they should
be.
- This does not seem to be GFVS-specific in any way, I can duplicate
slowness with a simple git-bomb[1]-like repo that has a sparse
checkout pattern ignoring the "bomb" side.  (It has 1M+1 entries in
the index, and .git/info/sparse-checkout ignores the 1M so the working
copy only has 1 entry).  The timings on my repo for "git checkout -b
$NEWBRANCH" are almost exactly double what Ben reports he gets on
their repo.

[1] https://kate.io/blog/making-your-own-exploding-git-repos/

== Short term solutions ==
- Alternative git commands exist today to do a fast checkout of a new
branch in a huge repo.  I also get sub-second timings in my
even-bigger repo with this:
git branch $NEWBRANCH && git symbolic-ref HEAD $NEWBRANCH
But I do understand that wrapping this into a script or executable
(git-fast-new-branch?) and asking users to use it is a usability
problem and an uphill battle.  (Sidenote: this isn't quite the same
operation; it's missing a reflog update.  The -m option to
symbolic-ref doesn't seem to help; I guess the fact that HEAD's
sha1sum is not changing is viewed as not an update?  However, that
could also be scripted around.)
- Ben's patch successfully drops the time for "git checkout -b
$NEWBRANCH" from 26+ seconds (in my cooked-up testcase) to sub-second
(in fact, under .1 seconds for me).  That's a _huge_ win.

== unpack_trees optimization notes ==
- Ben's patch is extremely focused.  It only affects "git checkout -b
$NEWBRANCH".  If someone runs "git branch $NEWBRANCH && git checkout
$NEWBRANCH", they get the old 26+ second timing.  They also suddenly
get the really long timing if they add any other flags or checkout a
commit that differs in only a single entry in the entire tree.  It
would be nice if we did general optimization for all issues rather
than just special casing such narrow cases.
- However, optimizing unpack_trees is hard.  It's really easy to get
lost trying to look at the code.  Time has been spent trying to
optimize it.  Ben really likes the speedup factors of 2-3 that Duy
has produced.  But he's pessimistic we'll find enough to bridge the
gap for this case.  And he's worried about breaking unrelated stuff
due to the complexity of unpack_trees.
- Duy is pretty sure we can optimize unpack_trees in at least one more
way.  I've tried looking through the code and think there are others,
but then again I'm known to get lost and confused in unpack_trees.

== The patch ==
- Ben's patch only affects the "checkout -b $NEWBRANCH" case.  He
checks for it 

Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-20 Thread Matthew DeVore

There were many instances in this file where it seemed like BUG would be
better, so I created a new commit before this one to switch them over. The
interdiff is below.

BTW, why are there so many instances of "die" without "_"? I expect all
errors that may be caused by a user to be localized.

I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t
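
(For context, the usual split - shown here with a made-up snippet, not code
from this patch - is that errors a user can trigger go through die(_("..."))
so they get translated, while violated internal invariants use BUG(), which is
deliberately left untranslated:)

	if (read_cache() < 0)
		die(_("index file corrupt"));	/* user-facing: localized */

	switch (filter_situation) {
	default:
		BUG("unknown filter_situation: %d", filter_situation);	/* internal invariant: never localized */
	}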

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 8e3caf5bf..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(

switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);

case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(

switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);

case LOFS_BEGIN_TREE:
case LOFS_BLOB:
@@ -151,8 +149,7 @@ static enum list_objects_filter_result filter_blobs_limit(

switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);

case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(

switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);

case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -439,7 +435,7 @@ void *list_objects_filter__init(
assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);

if (filter_options->choice >= LOFC__COUNT)
-   die("invalid list-objects filter choice: %d",
+   BUG("invalid list-objects filter choice: %d",
filter_options->choice);

init_fn = s_filters[filter_options->choice];


Re: Git Project Leadership Committee

2018-08-20 Thread Christian Couder
On Fri, Aug 17, 2018 at 12:41 AM, Jeff King  wrote:

> So here are the nominations I came up with. If you'd like to nominate
> somebody else (or yourself!), please do. If you have opinions, let me
> know (public or private, as you prefer).
>
>  - Christian Couder
>  - Ævar Arnfjörð Bjarmason

Thanks for nominating both!

> Both are active, have been around a long time, and have taken part in
> non-code activities and governance discussions. My understanding is that
> Christian freelances on Git, which doesn't quite fit my "volunteer
> representative" request, but I think contracting on Git is its own
> interesting perspective to represent (and certainly he spent many years
> on the volunteer side).

Yeah, I have been freelancing since October 2015, mostly for GitLab,
Booking.com and Protocol Labs, as can be seen on my LinkedIn profile:

https://www.linkedin.com/in/christian-couder-569a731/

I feel lucky to be treated mostly like a regular employee, especially
by GitLab and Protocol Labs. Both of these companies employ a high
ratio of remote developers from around the world, who often have some
kind of freelance legal status, so they give them, as much as possible,
the same kind of perks and incentives (like stock options) as regular
employees.

GitLab is a very open and transparent company. The way it works is
described in detail in its Handbook
(https://about.gitlab.com/handbook/). Its informal policy regarding
Git has been to use regular released versions of Git in GitLab. If
possible, GitLab should use a recent version of Git to benefit from the
latest improvements, though it should be compatible with old versions
of Git, as this can be useful, for example, to people who want to build
GitLab from source on top of a regular Linux distro that comes with an
old Git.

So for GitLab my work on Git has to be integrated upstream. I have
been working on remote-odb-related things, which I haven't managed to
get merged yet, and on a few other small things like delta islands,
which have been going better so far.

I also do some Git support at GitLab (for Git users, GitLab
developers, customers, sales people, ...). I am sponsored by them to
participate in or give presentations at conferences (like FOSSASIA
2017, GSoC Mentor Summit, Bloomberg Hackathon, Git Merge, GitLab
Summit, ...). And sometimes I do other marketing, security, developer
relations or sales (like meeting a few French customers) related
things.

Ævar already talked in detail about Booking.com and my work for them.

I have been working much less for Protocol Labs than for GitLab or
Booking.com since I started working for GitLab around 2 years ago.

As with Git, I had started working in my free time on IPFS
(https://ipfs.io/) before I became a freelancer working on it. So for
Protocol Labs I have been using Sharness
(https://github.com/chriscool/sharness/, which was created in 2011 by
extracting t/test-lib.sh from Git) to add and maintain end-to-end
tests for go-IPFS and other IPFS-related projects.

Around one year ago Protocol Labs made a successful ICO (Initial Coin
Offering) for Filecoin (https://filecoin.io/) and since then things
have become a bit more like a regular company (which is not
necessarily bad).

I have also had a few short contracts, a few days each, with various
French companies for consulting on or teaching Git/GitLab.


Re: [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree

2018-08-20 Thread Ben Peart




On 8/18/2018 10:41 AM, Nguyễn Thái Ngọc Duy wrote:

In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions is now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

 baseline  new
   
 0.056651714   0.080394752 s:  read cache .git/index
 0.183101080   0.216010838 s:  preload index
 0.008584433   0.008534301 s:  refresh index
 0.633767589   0.251992198 s:   traverse_trees
 0.340265448   0.377031383 s:   check_updates
 0.381884638   0.372768105 s:   cache_tree_update
 1.401562947   1.045887251 s:  unpack_trees
 0.338687914   0.314983512 s:  write index, changed mask = 2e
 0.411927922   0.062572653 s:traverse_trees
 0.23335   0.22544 s:check_updates
 0.423697246   0.073795585 s:   unpack_trees
 0.423708360   0.073807557 s:  diff-index
 2.559524127   1.938191592 s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

 baselinenew
   --
 0.535510167 0.556558733 s: read cache .git/index
 0.3057373   0.3147105   s: initialize name hash
 0.0184082   0.023558433 s: preload index
 0.086910967 0.089085967 s: refresh index
 7.889590767 2.191554433 s: unpack trees
 0.120760833 0.131941267 s: update worktree after a merge
 2.2583504   2.572663167 s: repair cache-tree
 0.8916137   0.959495233 s: write index, changed mask = 28
 3.405199233 0.2710663   s: unpack trees
 0.000999667 0.0021554   s: update worktree after a merge
 3.4063306   0.273318333 s: diff-index
 16.9524923  9.462943133 s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
  unpack-trees.c | 127 +
  1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 6d9f692ea6..8376663b59 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -635,6 +635,102 @@ static inline int are_same_oid(struct name_entry *name_j, 
struct name_entry *nam
return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
  }
  
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+   struct name_entry *names,
+   struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int i;
+
+   if (!o->merge || dirmask != ((1 << n) - 1))
+   return 0;
+
+   for (i = 1; i < n; i++)
+   if (!are_same_oid(names, names + i))
+   return 0;
+
+   return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+ struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int len = traverse_path_len(info, names);
+   char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL 

Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output

2018-08-20 Thread Han-Wen Nienhuys
On Sat, Aug 18, 2018 at 6:16 PM Junio C Hamano  wrote:
> > Actually, let's just lose the conditional.  strbuf_add() would catch
> > and issue an error message when it notices that we fed negative
> > count anyway ;-).
>
> So I'll have this applied on top of the original topic to prevent a
> buggy version from escaping the lab.
>
> -- >8 --
> Subject: [PATCH] sideband: do not read beyond the end of input
>
> The caller of maybe_colorize_sideband() gives a counted buffer
> , but the callee checked src[] as if it were a NUL terminated
> buffer.  If src[] had all isspace() bytes in it, we would have made
> n negative, and then
>
>  (1) made number of strncasecmp() calls to see if the remaining
>  bytes in src[] matched keywords, reading beyond the end of the
>  array (this actually happens even if n does not go negative),
>  and/or
>
>  (2) called strbuf_add() with negative count, most likely triggering
>  the "you want to use way too much memory" error due to unsigned
>  integer overflow.
>
> Fix both issues by making sure we do not go beyond [n].
>
> In the longer term we may want to accept size_t as parameter for
> clarity (even though we know that a sideband message we are painting
> typically would fit on a line on a terminal and int is sufficient).
> Write it down as a NEEDSWORK comment.
>
> Helped-by: Jonathan Nieder 
> Signed-off-by: Junio C Hamano 
> ---
>  sideband.c  |  8 ++--
>  t/t5409-colorize-remote-messages.sh | 14 ++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/sideband.c b/sideband.c
> index 1c6bb0e25b..368647acf8 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -65,6 +65,8 @@ void list_config_color_sideband_slots(struct string_list 
> *list, const char *pref
>   * Optionally highlight one keyword in remote output if it appears at the 
> start
>   * of the line. This should be called for a single line only, which is
>   * passed as the first N characters of the SRC array.
> + *
> + * NEEDSWORK: use "size_t n" instead for clarity.
>   */
>  static void maybe_colorize_sideband(struct strbuf *dest, const char *src, 
> int n)
>  {
> @@ -75,7 +77,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
> const char *src, int n)
> return;
> }
>
> -   while (isspace(*src)) {
> +   while (0 < n && isspace(*src)) {
> strbuf_addch(dest, *src);
> src++;
> n--;
> @@ -84,6 +86,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
> const char *src, int n)
> for (i = 0; i < ARRAY_SIZE(keywords); i++) {
> struct keyword_entry *p = keywords + i;
> int len = strlen(p->keyword);
> +
> +   if (n <= len)
> +   continue;

I would suggest

 if (n < len) continue;
..
if (!strncasecmp(p->keyword, src, len) && (n == len || !isalnum(src[len]))) {

so we colorize a single line that looks like "warning" as well
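
Spelled out as a tiny helper (just a sketch of the combined condition;
keyword_matches() is not part of the patch):

	static int keyword_matches(const char *keyword, const char *src, int n)
	{
		int len = strlen(keyword);

		if (n < len)
			return 0;	/* keyword cannot fit in the remaining input */
		return !strncasecmp(keyword, src, len) &&
		       (n == len || !isalnum((unsigned char)src[len]));
	}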


Other than that, LGTM.

--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado


Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output

2018-08-20 Thread Han-Wen Nienhuys
and, thanks for cleaning up after me :)

--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado


[no subject]

2018-08-20 Thread Charity Bam
Greetings to you, my name is Mrs. Charity Edward, a French citizen. I
am a widow, currently hospitalized because of cancer. Meanwhile, I have
decided to give money to you, as a trustworthy person who will use this
money wisely, 3,800,000 million euros, to help the poor and the less
privileged.

So if you are willing to accept this offer, and do exactly as I
instruct, then get back to me for further details.

Mrs. Charity Edward Bam.


Do not raise conflict when a code in a patch was already added

2018-08-20 Thread Konstantin Kharlamov

So, the steps to reproduce below are rather full of trivia like setting
up a repo, but the TL;DR is:

Upon using `git rebase -i HEAD~1` and then `git add -p` to add part of a
"hunk" as one commit, and then using `git rebase --continue` so the
other part of the hunk would be left in the top commit, git raises a
conflict.

Remarkably, the content between one pair of inserted conflict markers is
empty, so all you have to do is remove the markers, use `git add` on the
file, and then `git rebase --continue`

It's a lot of unnecessary actions; git could just figure out that the
code it sees in the patch is already there, as part of another commit.

Maybe git could issue a warning, or ask the user interactively
(y/n); but raising a conflict IMO is unnecessary.


# Steps to reproduce

In empty dir execute:

$ git init
$ touch test
Initialized empty Git repository in /tmp/test/.git/
$ git add test
$ git commit
[master (root-commit) a7ce543] 1st commit
 1 file changed, 2 insertions(+)
 create mode 100644 test
$ echo -e "foo\nbar" > test # content you'll want to break
$ git add -u && git commit
[detached HEAD 9e28331] 2-nd commit
 1 file changed, 2 insertions(+)
$ git rebase -i --root
Stopped at a7ce543...  1st commit
You can amend the commit now, with

  git commit --amend

Once you are satisfied with your changes, run

  git rebase --continue

Put "edit" for the 2-nd commit

$ git reset HEAD^
Unstaged changes after reset:
M   test
$ git add -p
diff --git a/test b/test
index e69de29..3bd1f0e 100644
--- a/test
+++ b/test
@@ -0,0 +1,2 @@
+foo
+bar
Stage this hunk [y,n,q,a,d,e,?]? e

╭─constantine@constantine-N61Ja  /tmp/test ‹node-›  ‹› (e721fa3*)
╰─$ git commit
[detached HEAD 27b2f63] add foo
 1 file changed, 1 insertion(+)
╭─constantine@constantine-N61Ja  /tmp/test ‹node-›  ‹› (27b2f63*)
╰─$ git rebase --continue
test: needs update
You must edit all merge conflicts and then
mark them as resolved using git add

What happened is that the hunk was obviously split across multiple
commits, and git should figure that out and not raise a conflict.

Side note: for some reason in this test git didn't insert conflict
markers. It did in the real-world use case though, and there was simply
no content inside one of them.


Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)

2018-08-20 Thread Phillip Wood
On 17/08/2018 23:44, Junio C Hamano wrote:
> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

> * pw/rebase-i-author-script-fix (2018-08-07) 2 commits
>  - sequencer: fix quoting in write_author_script
>  - sequencer: handle errors from read_author_ident()
> 
>  Recent "git rebase -i" update started to write bogusly formatted
>  author-script, with a matching broken reading code.  These are
>  being fixed.
> 
>  Undecided.
>  Is it the list consensus to favor this "with extra code, read the
>  script written by bad writer" approach?

I think there was agreement between myself and Eric on the last version,
I'm not sure anyone else has expressed an opinion. The problem with
fixing the quoting without any backwards compatibility is that if git is
upgraded while a rebase is stopped, read_author_script() will happily use
the broken quoting to create a corrupted author name in the new commit
if the name contains "'".

The compatibility code in the latest version relies on the missing "'"
at the end of the GIT_AUTHOR_DATE line which is fixed by
es/rebase-i-author-script-fix which is now in master. If there is a
release with es/rebase-i-author-script-fix but not
pw/rebase-i-author-script-fix, we'll have to rethink, as the detection
won't be reliable. I have a branch that fixes read_author_script() to use
sq_dequote() at
https://github.com/phillipwood/git/commits/wip/fix-author-script. At the
moment it has compatibility with broken quoting, but I could strip that
out and then sq_dequote() will return an error with the broken quoting
and the user would have to restart the rebase. So one option is to drop
this series and wait for me to finish the improved solution next month.
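
Concretely, the detection described above boils down to something like this
sketch (not the code on that branch; author_script_looks_broken() is a made-up
name):

	/*
	 * The buggy writer dropped the closing quote on the GIT_AUTHOR_DATE
	 * line, which is the last line of the script, so a script whose last
	 * non-blank character is not "'" must come from the old code and
	 * needs the bug-compatible reader before sq_dequote() can be trusted.
	 */
	static int author_script_looks_broken(const char *script)
	{
		size_t len = strlen(script);

		while (len && isspace((unsigned char)script[len - 1]))
			len--;
		return !len || script[len - 1] != '\'';
	}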

> 
> * pw/add-p-select (2018-07-26) 4 commits
>  - add -p: optimize line selection for short hunks
>  - add -p: allow line selection to be inverted
>  - add -p: select modified lines correctly
>  - add -p: select individual hunk lines
> 
>  "git add -p" interactive interface learned to let users choose
>  individual added/removed lines to be used in the operation, instead
>  of accepting or rejecting a whole hunk.
> 
>  Will hold.
>  cf. 
>  I found the feature to be hard to explain, and may result in more
>  end-user complaints, but let's see.

Thanks, I'll send some follow-up patches to improve the help and
documentation next month.

Best Wishes

Phillip


Re: Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-20 Thread Ulrich Windl
Hi!

Here are some stats from the repository. First the fast-import ones (which
had good performance, though probably everything was cached):

% git fast-import <../git-stream
/usr/lib/git/git-fast-import statistics:
-
Alloc'd objects:        55000
Total objects:          51959 (56 duplicates)
  blobs  :              47991 ( 0 duplicates   0 deltas of    0 attempts)
  trees  :               3946 (56 duplicates 994 deltas of 3929 attempts)
  commits:                 22 ( 0 duplicates   0 deltas of    0 attempts)
  tags   :                  0 ( 0 duplicates   0 deltas of    0 attempts)
Total branches:            15 (15 loads)
  marks:              1048576 (48013 unique)
  atoms:                43335
Memory total:            9611 KiB
  pools:                 7033 KiB
  objects:               2578 KiB
-
pack_report: getpagesize()=   4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit  = 8589934592
pack_report: pack_used_ctr=   1780
pack_report: pack_mmap_calls  = 23
pack_report: pack_open_windows=  1 /  1
pack_report: pack_mapped  =2905751 /2905751
-

Then the output from git-sizer:

Processing blobs: 47991
Processing trees: 3946
Processing commits: 22
Matching commits to trees: 22
Processing annotated tags: 0
Processing references: 15
| Name | Value | Level of concern   |
|  | - | -- |
| Overall repository size  |   ||
| * Blobs  |   ||
|   * Total size   |  13.7 GiB | *  |
|  |   ||
| Biggest objects  |   ||
| * Trees  |   ||
|   * Maximum entries  [1] |  13.4 k   | *  |
| * Blobs  |   ||
|   * Maximum size [2] |   279 MiB | *  |
|  |   ||
| Biggest checkouts|   ||
| * Maximum path depth [3] |10 | *  |
| * Maximum path length[3] |   130 B   | *  |
| * Total size of files[3] |  8.63 GiB | *  |

[1]  b701345cbd4317276568b9d9890fd77f38933bdc
(refs/heads/master:Resources/default/CIFP)
[2]  19f54c4a7595667329c1be23200234f0fe50ac56
(refs/heads/master:Resources/default/apt.dat)
[3]  b0e3d3a2b7f2504117408f13486c905a8eb8fb1e (refs/heads/master^{tree})

Some notes:
[1] is a directory with many short (typically < 1kB) text files
[2] is a very large text file with many changes
[3] Yes, some paths are long

Regards,
Ulrich


>>> Ævar Arnfjörð Bjarmason  wrote on 20.08.2018 at 10:57 in message
<87woslpg9i@evledraar.gmail.com>:

> On Mon, Aug 20 2018, Ulrich Windl wrote:
> 
> Jeff King  wrote on 16.08.2018 at 22:55 in message
>> <20180816205556.ga8...@sigill.intra.peff.net>:
>>> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
 This is all interesting, but I think unrelated to what Ulrich is talking
 about. Quote:

 Between the two phases of "git fsck" (checking directories and
 checking objects) there was a break of several seconds where no
 progress was indicated

 I.e. it's not about the pause you get with your testcase (which is
 certainly another issue) but the break between the two progress bars.
>>>
>>> I think he's talking about both. What I said responds to this:
>>
>> Hi guys!
>>
>> Yes, I was wondering what git does between the two visible phases, and 
> between
>> the lines I was suggesting another progress message between those phases.
At
>> least the maximum unspecific three-dot-message "Thinking..." could be 
> displayed
>> ;-) Of course anything more appropriate would be welcome.
>> Also that message should only be displayed if it's foreseeable that the
>> operation will take significant time. In my case (I just repeated it a few
>> minutes ago) the delay is significant (at least 10 seconds). As noted 
> earlier I
>> was hoping to capture the timing in a screencast, but it seems 

Re: Draft of Git Rev News edition 42

2018-08-20 Thread Christian Couder
On Mon, Aug 20, 2018 at 10:51 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Aug 20 2018, Christian Couder wrote:

>> Jakub, Markus, Gabriel and me plan to publish this edition on
>> Wednesday August 22nd.
>
> Let's mention that we've picked SHA-256 as NewHash, as a follow-up to
> last month's edition which covered the discussion around the NewHash
> selection. I've submitted a PR for that here:
> https://github.com/git/git.github.io/pull/307/files

Great, merged! Thanks!


Re: Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-20 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 20 2018, Ulrich Windl wrote:

 Jeff King  wrote on 16.08.2018 at 22:55 in message
> <20180816205556.ga8...@sigill.intra.peff.net>:
>> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> This is all interesting, but I think unrelated to what Ulrich is talking
>>> about. Quote:
>>>
>>> Between the two phases of "git fsck" (checking directories and
>>> checking objects) there was a break of several seconds where no
>>> progress was indicated
>>>
>>> I.e. it's not about the pause you get with your testcase (which is
>>> certainly another issue) but the break between the two progress bars.
>>
>> I think he's talking about both. What I said responds to this:
>
> Hi guys!
>
> Yes, I was wondering what git does between the two visible phases, and between
> the lines I was suggesting another progress message between those phases. At
> least the maximum unspecific three-dot-message "Thinking..." could be 
> displayed
> ;-) Of course anything more appropriate would be welcome.
> Also that message should only be displayed if it's foreseeable that the
> operation will take significant time. In my case (I just repeated it a few
> minutes ago) the delay is significant (at least 10 seconds). As noted earlier 
> I
> was hoping to capture the timing in a screencast, but it seems all the delays
> were just optimized away in the recording.
>
>>
>>> >> During "git gc" the writing objects phase did not update for some
>>> >> seconds, but then the percentage counter jumped like from 15% to 42%.
>>
>> But yeah, I missed that the fsck thing was specifically about a break
>> between two meters. That's a separate problem, but also worth
>> discussing (and hopefully much easier to address).
>>
>>> If you fsck this repository it'll take around (on my spinning rust
>>> server) 30 seconds between 100% of "Checking object directories" before
>>> you get any output from "Checking objects".
>>>
>>> The breakdown of that is (this is from approximate eyeballing):
>>>
>>>  * We spend 1-3 seconds just on this:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
>>
>> OK, so that's checking the sha1 over the .idx file. We could put a meter
>> on that. I wouldn't expect it to generally be all that slow outside of
>> pathological cases, since it scales with the number of objects (and 1s
>> is our minimum update anyway, so that might be OK as-is). Your case has
>> 13M objects, which is quite large.
>
> Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my
> CPU in question is an AMD Phenom II quad-core with 3.2GHz nominal, and there is 
> a
> classic spinning disk with 5400RPM built in...
>
>>
>>>  * We spend the majority of the ~30s on this:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
>>
>> This is hashing the actual packfile. This is potentially quite long,
>> especially if you have a ton of big objects.
>
> That seems to apply. BTW: Is there a way to get some repository statistics
> like a histogram of object sizes (or whatever else might be useful to help
> make decisions)?

The git-sizer program is really helpful in this regard:
https://github.com/github/git-sizer

>>
>> I wonder if we need to do this as a separate step anyway, though. Our
>> verification is based on index-pack these days, which means it's going
>> to walk over the whole content as part of the "Indexing objects" step to
>> expand base objects and mark deltas for later. Could we feed this hash
>> as part of that walk over the data? It's not going to save us 30s, but
>> it's likely to be more efficient. And it would fold the effort naturally
>> into the existing progress meter.
>>
>>>  * We spend another 3-5 seconds on this QSORT:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105
>>
>> That's a tough one. I'm not sure how we'd count it (how many compares we
>> do?). And each item is doing so little work that hitting the progress
>> code may make things noticeably slower.
>
> If it's sorting, maybe add some code like (wild guess):
>
> if (objects_to_sort > magic_number)
>message("Sorting something...");

I think a good solution to these cases is to introduce a mode in
progress.c where it learns how to display a counter when we don't know
what the end state will be. Something like your proposed magic_number
can then just be covered by the more general rule that we don't show
the progress bar unless it has been running for 1 second (which I
believe is the default).
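
For illustration, the existing progress API already gets most of the way there
when it is given a zero total: it prints a plain running count instead of a
percentage, and the delayed variant stays invisible unless the step turns out
to be slow. A sketch (the surrounding function and the xread() loop are just
scaffolding for the example, not a proposed patch):

	#include "cache.h"
	#include "progress.h"

	static void hash_pack_with_counter(int fd)
	{
		struct progress *progress =
			start_delayed_progress("Hashing packfile", 0);
		char buf[8192];
		uint64_t seen = 0;
		ssize_t n;

		while ((n = xread(fd, buf, sizeof(buf))) > 0) {
			/* feed buf[0..n) to the hash here */
			seen += n;
			display_progress(progress, seen);
		}
		stop_progress(&progress);
	}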

>>
>> Again, your case is pretty big. Just based on the number of objects,
>> linux.git should be 1.5-2.5 seconds on your machine for the same
>> operation. Which I think may be small enough to ignore (or even just
>> print a generic before/after). It's really the 30s packfile hash that's
>> making the whole thing so terrible.
>>
>> -Peff


Re: Draft of Git Rev News edition 42

2018-08-20 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 20 2018, Christian Couder wrote:

> Hi,
>
> A draft of a new Git Rev News edition is available here:
>
>   
> https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-42.md
>
> Everyone is welcome to contribute in any section either by editing the
> above page on GitHub and sending a pull request, or by commenting on
> this GitHub issue:
>
>   https://github.com/git/git.github.io/issues/305
>
> You can also reply to this email.
>
> In general all kinds of contribution, for example proofreading,
> suggestions for articles or links, help on the issues in GitHub, and
> so on, are very much appreciated.
>
> I tried to cc everyone who appears in this edition, but maybe I missed
> some people, sorry about that.
>
> Jakub, Markus, Gabriel and me plan to publish this edition on
> Wednesday August 22nd.

Let's mention that we've picked SHA-256 as NewHash, as a follow-up to
last month's edition which covered the discussion around the NewHash
selection. I've submitted a PR for that here:
https://github.com/git/git.github.io/pull/307/files


Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-20 Thread Ulrich Windl
>>> Jeff King  wrote on 16.08.2018 at 22:55 in message
<20180816205556.ga8...@sigill.intra.peff.net>:
> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>> This is all interesting, but I think unrelated to what Ulrich is talking
>> about. Quote:
>> 
>> Between the two phases of "git fsck" (checking directories and
>> checking objects) there was a break of several seconds where no
>> progress was indicated
>> 
>> I.e. it's not about the pause you get with your testcase (which is
>> certainly another issue) but the break between the two progress bars.
> 
> I think he's talking about both. What I said responds to this:

Hi guys!

Yes, I was wondering what git does between the two visible phases, and between
the lines I was suggesting another progress message between those phases. At
least the maximum unspecific three-dot-message "Thinking..." could be displayed
;-) Of course anything more appropriate would be welcome.
Also that message should only be displayed if it's foreseeable that the
operation will take significant time. In my case (I just repeated it a few
minutes ago) the delay is significant (at least 10 seconds). As noted earlier I
was hoping to capture the timing in a screencast, but it seems all the delays
were just optimized away in the recording.

> 
>> >> During "git gc" the writing objects phase did not update for some
>> >> seconds, but then the percentage counter jumped like from 15% to 42%.
> 
> But yeah, I missed that the fsck thing was specifically about a break
> between two meters. That's a separate problem, but also worth
> discussing (and hopefully much easier to address).
> 
>> If you fsck this repository it'll take around (on my spinning rust
>> server) 30 seconds between 100% of "Checking object directories" before
>> you get any output from "Checking objects".
>> 
>> The breakdown of that is (this is from approximate eyeballing):
>> 
>>  * We spend 1-3 seconds just on this:
>>
>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
> 
> OK, so that's checking the sha1 over the .idx file. We could put a meter
> on that. I wouldn't expect it to generally be all that slow outside of
> pathological cases, since it scales with the number of objects (and 1s
> is our minimum update anyway, so that might be OK as-is). Your case has
> 13M objects, which is quite large.

Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my
CPU in question is an AMD Phenom II quad-core with 3.2GHz nominal, and there is a
classic spinning disk with 5400RPM built in...

> 
>>  * We spend the majority of the ~30s on this:
>>
>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> 
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.

That seems to apply. BTW: Is there a way to get some repository statistics
like a histogram of object sizes (or whatever else might be useful to help
make decisions)?

> 
> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.
> 
>>  * We spend another 3-5 seconds on this QSORT:
>>
>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105
> 
> That's a tough one. I'm not sure how we'd count it (how many compares we
> do?). And each item is doing so little work that hitting the progress
> code may make things noticeably slower.

If it's sorting, maybe add some code like (wild guess):

if (objects_to_sort > magic_number)
   message("Sorting something...");

> 
> Again, your case is pretty big. Just based on the number of objects,
> linux.git should be 1.5-2.5 seconds on your machine for the same
> operation. Which I think may be small enough to ignore (or even just
> print a generic before/after). It's really the 30s packfile hash that's
> making the whole thing so terrible.
> 
> -Peff





Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-20 Thread Ulrich Windl
>>> Duy Nguyen  wrote on 16.08.2018 at 17:18 in message
:
> On Thu, Aug 16, 2018 at 1:10 PM Ulrich Windl
>  wrote:
>>
>> Hi!
>>
>> I'd like to point out some minor issue observed while processing some 
> 5-object repository with many binary objects, but most are rather small:
>>
>> Between the two phases of "git fsck" (checking directories and checking 
> objects) there was a break of several seconds where no progress was 
> indicated.
>>
>> During "git gc" the writing objects phase did not update for some seconds, 
> but then the percentage counter jumped like from 15% to 42%.
>>
>> I understand that updating the progress output too often can be a 
> performance bottleneck, while updating it too rarely might only bore the 
> user... ;-)
>>
>> But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3).
> 
> Is it possible to make this repository public? You can also use "git
> fast-export --anonymize" to make a repo with same structure but no
> real content (but read the man page about that option first)

Hi!

Actually I tried that locally, but with the resulting repository both fsck and 
gc are very fast. So I guess it won't be very useful. Also the original .git 
directory uses 5.3G, while the anonymous .git just used 4.3M...

I tried to capture the behavior as a screencast, but it seems the screencast
optimized the little changes away, and as a result git almost had no delay on
any operation 8-(

Regards,
Ulrich

> 
>> Regards,
>> Ulrich
>>
>>
> 
> 
> -- 
> Duy





Draft of Git Rev News edition 42

2018-08-20 Thread Christian Couder
Hi,

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-42.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/305

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and me plan to publish this edition on
Wednesday August 22nd.

Thanks,
Christian.