Re: Do not raise conflict when a code in a patch was already added

2018-08-21 Thread Konstantin Kharlamov




On 20.08.2018 22:22, Johannes Sixt wrote:

Am 20.08.2018 um 19:40 schrieb Phillip Wood:

On 20/08/2018 11:22, Konstantin Kharlamov wrote:

It's spectacular, that content of one of inserted conflict markers is
empty, so all you have to do is to remove the markers, and use `git add`
on the file, and then `git rebase --continue`

Its a lot of unncessary actions, git could just figure that the code it
sees in the patch is already there, being a part of another commit.


If there are conflict markers where one side is empty it means that some
lines from the merge base (which for a rebase is the parent of the
commit being picked) have been deleted on one side and modified on the
other. Git cannot know if you want to use the deleted version or the
modified version.


There's another possibility (and I think it is what happens actually in 
Konstantin's case): When one side added lines 1 2 and the other side 
added 1 2 3, then the actual conflict is << 1 2 == 1 2 3 >>, but our 
merge code is able to move the identical part out of the conflicted 
section: 1 2 << == 3 >>. But this is just a courtesy for the user; the 
real conflict is the original one. Without this optimization, the work 
to resolve the conflict would be slightly more arduous.


Yeah, thanks, that's what happens. And I'm wondering, is it really 
needed to raise a conflict there? Would it be worth to just apply the 
line "3", possibly with a warning or an interactive question to user 
(apply/raise) that identical parts were ignored?


Add LIFESTYLE EXPO TOKYO to your 2019 trade show planning!

2018-08-21 Thread Eiichi Hasegawa LIFESTYLE EXPO TOKYO Show Management
Dear International Sales & Marketing Director,
Zhejiang Wuchuan Industrial Co., Ltd
(Please kindly forward this message to the person responsible for export.)

This is Eiichi Hasegawa from LIFESTYLE EXPO TOKYO Show Management.
LIFESTYLE EXPO TOKYO is Japan's leading trade show for gifts, homeware and 
lifestyle products held twice a year.

Today, we would like you to consider adding LIFESTYLE EXPO TOKYO 2019 
[January]/[July] to your 2019 trade show planning for the following reasons.

1) Best gateway to enter Japan/Asia-Pacific market!
-> You can find importers/distributors exclusive to Japan/Asia market.

2) Successful results can be expected.
-> Many exhibitors conduct in-depth "B to B" meetings with visitors on-sight.

3) Chosen by international exhibitors. 
-> LIFESTYLE EXPO TOKYO 2018 attracted countries and regions such as below.
 Australia, Bangladesh, China, Denmark, France, Germany, Hong Kong, India, 
Israel, Italy, Korea, USA, etc.

To receive further information on exhibiting, please simply reply to this email 
now.
-
mailto:lifestyle-...@reedexpo.co.jp
Company Name:
Contact Person:
Email:
TEL:
-

We look forward to hearing from you.

Best regards,

Eiichi Hasegawa (Mr.), Chisato Miyawaki (Ms.), Mikako Shimada (Ms.)
Qu Jun (Mr.), Choi Ilyong (Mr.)
LIFESTYLE EXPO TOKYO Show Management
Reed Exhibitions Japan Ltd.
TEL: +81-3-3349-8505
Mailto:lifestyle-...@reedexpo.co.jp

--
LIFESTYLE EXPO TOKYO 2019 [January] 
Date: January 30 (Wed.) - February 1 (Fri.), 2019
Venue: Makuhari Messe, Japan
WEB: https://www.lifestyle-expo.jp/en/spring/
--
LIFESTYLE EXPO TOKYO 2019 [June] 
Date: June 26 (Wed.) - June 28 (Fri.), 2019
Venue: Tokyo Big Sight, Japan
WEB: https://www.lifestyle-expo.jp/en/summer/
--



ID: E36-G1402-0075

















This message is delivered to you to provide details of exhibitions and 
conferences organised, co-organised, or managed by Reed Exhibitions Japan Ltd.
If you would like to change your contact information, or prefer not to receive 
further information on this exhibition/conference, please follow the directions 
below.


Please click the URL below and follow the directions on the website to update 
your e-mail and other information.
https://contact.reedexpo.co.jp/expo/REED/?lg=en&tp=ch&ec=CHANGE


Please reply to this mail changing the subject to "Remove from list".
You will not receive any further information on this exhibition/conference.


Re: Do not raise conflict when a code in a patch was already added

2018-08-21 Thread Igor Djordjevic
Hi Konstantin,

On 21/08/2018 11:37, Konstantin Kharlamov wrote:
> 
> > There's another possibility (and I think it is what happens 
> > actually in Konstantin's case): When one side added lines 1 2 and the 
> > other side added 1 2 3, then the actual conflict is 
> > << 1 2 == 1 2 3 >>, but our merge code is able to move the identical 
> > part out of the conflicted section: 1 2 << == 3 >>. But this is just 
> > a courtesy for the user; the real conflict is the original one. 
> > Without this optimization, the work to resolve the conflict would be 
> > slightly more arduous.
> 
> Yeah, thanks, that's what happens. And I'm wondering, is it really 
> needed to raise a conflict there? Would it be worth to just apply the 
> line "3", possibly with a warning or an interactive question to user 
> (apply/raise) that identical parts were ignored?

I see how this might make sense in the given example of "A added 1 
and 2, B added 1 and 2 and 3", but I'm afraid that might be a too 
narrow view.

What we actually don't know is if A deliberately chose not to include 
3, or even worse, if A started from having "1 and 2 and 3" in there, 
and then decided to remove 3.

In both these situation just applying 3 would be wrong, and raising a 
conflict seems as the most (and only?) sensible solution.

Applying _and_ asking for confirmation might be interesting, but I'm 
afraid it would favor specific use case only, being an annoyance in 
all the others (where it should really be a conflict, and you now 
have additional prompt to deal with).

That said, it would indeed be nice to have a way to communicate to 
`git rebase` that we are just splitting later commit into smaller 
parts preceding it, so situations like this could be resolved 
automatically and without conflicts, as you'd expected - but only 
within that narrow, user-provided/communicated context, not in 
general case.

Regards, Buga


[PATCH 1/1] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Ensure that the commit-graph.txt and commit-graph-format.txt files
are compiled to HTML using ASCIIDOC.

Signed-off-by: Derrick Stolee 
---
 Documentation/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d079d7c73..841e4f705 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -69,6 +69,8 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/commit-graph
+TECH_DOCS += technical/commit-graph-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
-- 
gitgitgadget


[PATCH 0/1] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
Similar to [1], add the commit-graph and commit-graph-format technical docs
to Documentation/Makefile so they are automatically converted to HTML when
needed.

I compiled the docs and inspected the HTML manually in the browser. Nothing
looked strange, so I don't think the docs themselves need any editing for
format.

[1] 
https://public-inbox.org/git/20180814222846.gg142...@aiede.svl.corp.google.com/
[PATCH] partial-clone: render design doc using asciidoc

Derrick Stolee (1):
  Docs: Add commit-graph tech docs to Makefile

 Documentation/Makefile | 2 ++
 1 file changed, 2 insertions(+)


base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-22%2Fderrickstolee%2Fmake-docs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-22/derrickstolee/make-docs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/22
-- 
gitgitgadget


Feature request: detecting mistyped command and isdentical possible matches

2018-08-21 Thread Erik Huizinga
Hi!

I have an alias for 'git status': 'git st'. I also have configured
that an unrecognised (mistyped) Git command runs the closest match. In
this case I did the following:

'''
$ git stt
git: 'stt' is not a git command. See 'git --help'.

The most similar commands are
  st
  status
'''

In this particular case it would have been nice if Git also understood
that 'git st' and 'git status' are equivalent in my configuration, so
that it shouldn't have to warn me about them as different matches, but
instead just run any of them.

Thank you for Git. Maybe I'll see this feature in the future!

Best,
Erik


Re: [PATCH 3/9] midx: mark bad packed objects

2018-08-21 Thread Derrick Stolee

On 8/20/2018 5:23 PM, Stefan Beller wrote:

On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:

When an object fails to decompress from a pack-file, we mark the object
as 'bad' so we can retry with a different copy of the object (if such a
copy exists).

Before now, the multi-pack-index did not update the bad objects list for
the pack-files it contains, and we did not check the bad objects list
when reading an object. Now, do both.

This sounds like a bug fix unlike patches 1 & 2 that sound like
feature work(2) or making code more readable(1).
(After studying the code, this doesn't sound like a bug fix any more,
but a safety thing)


It is a safety thing. One codepath that needs this includes this comment:

            /*
             * We're probably in deep shit, but let's try to fetch
             * the required base anyway from another pack or loose.
             * This is costly but should happen only in the presence
             * of a corrupted pack, and is better than failing outright.
             */

This goes back to 8eca0b47: implement some resilience against pack 
corruptions. The logic is tested in t5303-pack-corruption-resilience.sh, 
with the test title '... and a redundant pack allows for full recovery too'.



Is it worth having this on a separate track coming in
faster than the rest of this series?


ds/multi-pack-index is cooking in 'next' until after 2.19.0, so I'm not 
sure it's worth splitting things up at this point.



Signed-off-by: Derrick Stolee 
---
  midx.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 6824acf5f8..7fa75a37a3 100644
--- a/midx.c
+++ b/midx.c
@@ -280,6 +280,16 @@ static int nth_midxed_pack_entry(struct multi_pack_index 
*m, struct pack_entry *
 if (!is_pack_valid(p))
 return 0;

+   if (p->num_bad_objects) {
+   uint32_t i;

Is there a reason that i needs to be if 32 bits?
Would size_t (for iterating) or 'int' (as a default
like in many for loops) be ok, too?


i is bounded by p->num_bad_objects, which is also uint32_t.

Thanks,

-Stolee



Re: [PATCH 7/9] treewide: use get_all_packs

2018-08-21 Thread Derrick Stolee

On 8/20/2018 6:01 PM, Stefan Beller wrote:

On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee  wrote:

There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.

So get_packed_git shouold not be used any further?
Do we want to annotate it as deprecated, to be deleted
in the future? Or even remove it as part of this series
(that might be an issue with other series in flight).


Perhaps the right thing to do would be to put a big warning over the 
definition to say "only use this if you really don't want 
get_all_packs()" or something. The reason I didn't remove it from 
packfile.h is that it is used in sha1-name.c alongside 
get_multi_pack_index() (so making it 'static' would be incorrect).




Re: [PATCH 0/9] multi-pack-index cleanups

2018-08-21 Thread Duy Nguyen
On Mon, Aug 20, 2018 at 6:53 PM Derrick Stolee  wrote:
> To better understand the implications of the multi-pack-index with
> other scenarios, I ran the test suite after adding a step to 'git repack'
> to write a multi-pack-index, and to default core.multiPackIndex to 'true'.
> This commit is available as [1].
>
> ...
>
> [1] 
> https://github.com/derrickstolee/git/commit/098dd1d515b592fb165a276241d7d68d1cde0036
> DO-NOT-MERGE: compute multi-pack-index on repack.

It should be able to _merge_ this patch, but only activate it when
some test environment variable is set. On Travis x86 we run the test
suite twice, once normal, and once with special features activated
(see "if test "$jobname" = "linux-gcc"" in ci/run-build-and-tests.sh).
I don't think midx is incompatible with any of those features, so we
could activate and run the whole test suite with it too.
-- 
Duy


Re: [PATCH 0/9] multi-pack-index cleanups

2018-08-21 Thread Derrick Stolee

On 8/21/2018 10:34 AM, Duy Nguyen wrote:

On Mon, Aug 20, 2018 at 6:53 PM Derrick Stolee  wrote:

To better understand the implications of the multi-pack-index with
other scenarios, I ran the test suite after adding a step to 'git repack'
to write a multi-pack-index, and to default core.multiPackIndex to 'true'.
This commit is available as [1].

...

[1] 
https://github.com/derrickstolee/git/commit/098dd1d515b592fb165a276241d7d68d1cde0036
 DO-NOT-MERGE: compute multi-pack-index on repack.

It should be able to _merge_ this patch, but only activate it when
some test environment variable is set. On Travis x86 we run the test
suite twice, once normal, and once with special features activated
(see "if test "$jobname" = "linux-gcc"" in ci/run-build-and-tests.sh).
I don't think midx is incompatible with any of those features, so we
could activate and run the whole test suite with it too.


Interesting. I'll look into this. A similar approach may work for the 
commit-graph.


Thanks,

-Stolee



Re: [PATCH/RFC] commit: new option to abort -a something is already staged

2018-08-21 Thread Duy Nguyen
On Mon, Aug 20, 2018 at 9:30 PM Jonathan Nieder  wrote:
>
> Hi,
>
> Nguyễn Thái Ngọc Duy wrote:
>
> > So many times I have carefully cherry picked changes to the index with
> > `git add -p` then accidentally did `git commit -am ` (usually by
> > retrieving a command from history and pressing Enter too quickly)
> > which destroyed beautiful index.
> >
> > One way to deal with this is some form of `git undo` that allows me to
> > retrieve the old index.
>
> Yes, I would love such an undo feature!
>
> How would you imagine that this information would get stored?  We can
> start with adding that and a low-level (plumbing) interface to it, to
> avoid being blocked on getting the user-facing (porcelain) "git undo"
> interface right.  (Or we can go straight for the porcelain.  It's fine
> for it to start minimal and gain switches later.)

Yeah I'd love to see "git undo" too, but unfortunately I don't have a
clear model how it should operate. Which is why I still hesitate to
implement one. :P

All I have is something close to how undo is done in an editor, where
we could save a list of actions and corresponding undo ones, but
editors are wysiwyg and we can't just let the user undo, see the
result elsewhere and redo if they're not happy.

> > Instead, let's handle just this problem for now. This new option
> > commit.preciousDirtyIndex, if set to false, will not allow `commit -a`
> > to continue if the final index is different from the existing one. I
> > don't think this can be achieved with hooks because the hooks don't
> > know about "-a" or different commit modes.
>
> I frequently use "git commit -a" this way intentionally, so I would be
> unlikely to turn this config on.  That leads me to suspect it's not a
> good candidate for configuration:
>
> - it's not configuration for the sake of a transition period, since some
>   people would keep it on forever
>
> - it's not configuration based on different project needs, either
>
> So configuration doesn't feel like a good fit.

I think it falls under personal preference (yes some people like me
will keep it on forever in fear of losing staged changes).

> That said, I lean toward your initial thought, that this is papering
> over a missing undo feature.  Can you say more about how you'd imagine
> undo working?

There is not much to say. But at least to be able to undo changes in
the index, I made two attempts in the past [1] [2] (and forgot about
them until I got bitten by the lack of undo again this time). I guess
I could push for one of those approaches again because it will help me
and also lay the foundation for any git-undo in the future.

Once we have can restore index, undoing "git reset --hard" is also
possible (by hashing everything and putting them in a temporary index
first).

[1] 
https://public-inbox.org/git/1375597720-13236-1-git-send-email-pclo...@gmail.com/
[2] 
https://public-inbox.org/git/1375966270-10968-1-git-send-email-pclo...@gmail.com/
-- 
Duy


Re: [PATCH v3] checkout: optimize "git checkout -b "

2018-08-21 Thread Duy Nguyen
On Mon, Aug 20, 2018 at 8:16 PM Elijah Newren  wrote:
> Playing with sparse-checkout, it feels to me like a half-baked
> feature.  It seems like it required too much manual work, and it was
> sometimes hard to tell if I was misunderstanding configuration rules
> or I was just running into bugs in the code.  I think I hit both but I
> didn't really want to get side-tracked further, yet.  (I do want to
> eventually come back to it.)  The only reason someone would go through
> that pain is if it provided massive performance benefits.

In my defense it was one of my first contribution when I was naiver
and basically an evolution of "git update-index --assume-unchanged". I
have something in the queue to improve/complement sparse-checkout but
my last update on that branch was 2.5 years ago, so it's not coming
soon.

I'd love to year how sparse checkout could be improved, or even
replaced. I think we still have to have some configuration rules, and
yes the flexibility of sparse checkout (or gitignore to be precise)
rules is a double-edged sword.
-- 
Duy


Re: Antw: Re: non-smooth progress indication for git fsck and git gc

2018-08-21 Thread Duy Nguyen
On Tue, Aug 21, 2018 at 3:13 AM Jeff King  wrote:
> I _think_ they should work together OK without further modification.
> Once upon a time the caller had to say "don't show if we're past N%
> after M seconds", but I think with the current code we'd just show it if
> we're not completely finished after 2 seconds.
>
> So it really should just be a simple:
>
>   progress = start_delayed_progress("Hashing packfile", 0);
>
> That said, counting bytes would probably be ugly (just because the
> numbers get really big). We'd probably prefer to show a throughput or
> something.

Or just an ascii throbber. I think the important thing is communicate
"I'm still doing something, not hanging up". "Hashing packfile"
satisfies the "something" part, the throbber the "hanging".
-- 
Duy


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-21 Thread Duy Nguyen
On Mon, Aug 20, 2018 at 8:38 PM Stefan Beller  wrote:
>
> On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore  wrote:
> >
> > There were many instances in this file where it seemed like BUG would be
> > better, so I created a new commit before this one to switch them over. The
> > interdiff is below.
> >
> > BTW, why are there so many instances of "die" without "_"? I expect all
> > errors that may be caused by a user to be localized.
>
> Well, there is the porcelain layer to be consumed by a human user
> and the plumbing that is good for scripts. And in scripts you might want
> to grep for certain errors and react to that, so a non-localized error
> message makes the script possible to run in any localisation.

I probably have a different view about this, but strings (as English
sentences) are for human only and should be translated. For machines
there should be well defined format (that  just might look like
English), not totally free text. In some case, this format can be as
simple as the "error/warning/fatal" prefix, which is left
untranslated, but the rest should be. There is no guarantee that these
die() messages will not change in the future, even left untranslated.
-- 
Duy


Re: [PATCH 1/3] diff.c: add --output-indicator-{new, old, context}

2018-08-21 Thread Johannes Schindelin
Hi Stefan,

On Mon, 20 Aug 2018, Stefan Beller wrote:

> On Mon, Aug 20, 2018 at 12:31 PM Johannes Schindelin
>  wrote:
> >
> > On Fri, 17 Aug 2018, Stefan Beller wrote:
> >
> > > This will prove useful in range-diff in a later patch as we will be able
> > > to differentiate between adding a new file (that line is starting with
> > > +++ and then the file name) and regular new lines.
> > >
> > > It could also be useful for experimentation in new patch formats, i.e.
> > > we could teach git to emit moved lines with lines other than +/-.
> >
> > Thanks.
> >
> > > diff --git a/diff.c b/diff.c
> > > index c5c7739ce34..03486c35b75 100644
> > > --- a/diff.c
> > > +++ b/diff.c
> > > @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct 
> > > diff_options *o,
> > >   else if (c == '-')
> > >   set = diff_get_color_opt(o, DIFF_FILE_OLD);
> > >   }
> > > - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len,
> > ^
> > Here we already pass `o`... so...
> >
> > > + emit_line_ws_markup(o, set_sign, set, reset,
> > > + 
> > > o->output_indicators[OUTPUT_INDICATOR_CONTEXT],
> > 
> > ^^^
> > ... here, we could simply pass `OUTPUT_INDICATOR_CONTEXT` and let the
> > callee look it up in`o->output_indicators[]`...
> >
> > I read all three patches and did not see a reason why we could not
> > simplify the code that way.
> >
> > Other than that: great!
> 
> Thanks!
> 
> I considered it, but was put off by the (small) effort of yet another
> diff refactoring.
> 
> I'll include it in a resend if a resend is needed, otherwise
> I would suggest to make it a patch on top?

Sounds good!
Dscho


Re: [PATCH v2 5/8] commit-graph: not compatible with replace objects

2018-08-21 Thread Junio C Hamano
Derrick Stolee  writes:

> Create new method commit_graph_compatible(r) to check if a given
> repository r is compatible with the commit-graph feature. Fill the
> method with a check to see if replace-objects exist. Test this
> interaction succeeds, including ignoring an existing commit-graph and
> failing to write a new commit-graph. However, we do ensure that
> we write a new commit-graph by setting read_replace_refs to 0, thereby
> ignoring the replace refs.
>
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/commit-graph.c  |  4 
>  commit-graph.c  | 21 +
>  replace-object.c|  2 +-
>  replace-object.h|  2 ++
>  t/t5318-commit-graph.sh | 22 ++
>  5 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 0bf0c48657..da737df321 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -120,6 +120,8 @@ static int graph_read(int argc, const char **argv)
>   return 0;
>  }
>  
> +extern int read_replace_refs;
> +

Why do you need this (and also in commit-graph.c)?  I thought
cache.h includes it, which you can just make use of it.

> +static int commit_graph_compatible(struct repository *r)
> +{
> + if (read_replace_refs) {
> + prepare_replace_object(r);
> + if (hashmap_get_size(&r->objects->replace_map->map))
> + return 0;
> + }
> +
> + return 1;
> +}

> diff --git a/replace-object.c b/replace-object.c
> index 3c17864eb7..9821f1477e 100644
> --- a/replace-object.c
> +++ b/replace-object.c
> @@ -32,7 +32,7 @@ static int register_replace_ref(struct repository *r,
>   return 0;
>  }
>  
> -static void prepare_replace_object(struct repository *r)
> +void prepare_replace_object(struct repository *r)
>  {
>   if (r->objects->replace_map)
>   return;

The way the new caller is written, I am wondering if this function
should return "we are (or, are not) using the replace object
feature", making it unnecessary for callers on the reading side to
know about "read-replace-refs" external variable, for example.

/*
 * To be called on-demand from codepaths that want to make
 * sure that replacement objects can be found as necessary.
 * 
 * Return number of replacement defined for the repository, or
 * -1 when !read_replace_refs tells us not to use replacement
 * mechanism at all.
 */
int prepare_replace_object(struct repository *r)
{
if (!read_replace_refs)
return -1;

if (!r->objects->replace_map) {
r->objects->replace_map =
xmalloc(...);
oidmap_init(r->objects->replace_map, 0);
for_each_refplace_ref(r, register_...);
}
return hashmap_get_size(&r->objects->replace_map->map);
}

Then, the caller side can simply become something like:

#define cgraph_compat(r) (prepare_replace_object(r) <= 0)

There are various writers to read_replace_refs variable, but I think
they should first be replaced with calls to something like:

void use_replace_refs(struct repository *r, int enable);

which allows us to hide the global variable and later make it per
repository if we wanted to.



Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-21 Thread Junio C Hamano
Stefan Beller  writes:

> On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee  wrote:
>> Because of the change of base, it is hard to provide a side-by-side diff
>> from v1.
>
> I thought that was the whole point of range-diff. ;-)

I thought so, too.  Was there any change that confused range-diff
machinery?


Re: [PATCH 1/1] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> From: Derrick Stolee 
>
> Ensure that the commit-graph.txt and commit-graph-format.txt files
> are compiled to HTML using ASCIIDOC.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index d079d7c73..841e4f705 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -69,6 +69,8 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
> technical/api-index-skel.txt technica
>  SP_ARTICLES += $(API_DOCS)
>  
>  TECH_DOCS += SubmittingPatches
> +TECH_DOCS += technical/commit-graph
> +TECH_DOCS += technical/commit-graph-format
>  TECH_DOCS += technical/hash-function-transition
>  TECH_DOCS += technical/http-protocol
>  TECH_DOCS += technical/index-format

Looking at what 5641eb94 ("partial-clone: render design doc using
asciidoc", 2018-08-14) had to do to make the straight text usable as
AsciiDoc input, I do not think a patch to D/Makefile alone is
sufficient; just quick skim of the source makes a reader notice a
block of text like this:

Equivalently, the generation number of a commit A is one more than the
length of a longest path from A to a root commit. The recursive definition
is easier to use for computation and observing the following property:

If A and B are commits with generation numbers N and M, respectively,
and N <= M, then A cannot reach B. That is, we know without searching
that B is not an ancestor of A because it is further from a root commit
than A.

Conversely, when checking if A is an ancestor of B, then we only need
to walk commits until all commits on the walk boundary have generation
number at most N. If we walk commits using a priority queue seeded by
generation numbers, then we always expand the boundary commit with 
highest
generation number and can easily detect the stopping condition.

which would give you two paragraphs that are typeset in monospace,
without giving enough visual cue that these two are the "following
properties" the previous paragraph talks about, like the original
plain-text does.

I haven't checked the "-format" document, but I suspect both of
these are easier to read in text than in HTML formatted without
tweaking the mark-up a bit.  It's not like we are not shipping the
text versions; let's not give readers HTML that is uglier than text.


Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-08-21 Thread Derrick Stolee

On 8/21/2018 1:51 PM, Junio C Hamano wrote:

Stefan Beller  writes:


On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee  wrote:

Because of the change of base, it is hard to provide a side-by-side diff
from v1.

I thought that was the whole point of range-diff. ;-)

I thought so, too.  Was there any change that confused range-diff
machinery?


I've just been out of the loop and didn't realize what range-diff did. 
Here is the range-diff:


1:  43ddcc9ef9 = 1:  c8edae4179 refs.c: migrate internal ref iteration 
to pass thru repository argument
2:  22dc9ce836 = 2:  f89451e884 refs.c: upgrade for_each_replace_ref to 
be a each_repo_ref_fn callback

3:  5e90d36f84 ! 3:  d95aa3472c commit-graph: update design document
    @@ -19,7 +19,7 @@
    so a future change of hash algorithm does not require a change 
in format.


 +- Commit grafts and replace objects can change the shape of the 
commit

    -+  history. These can also be enabled/disabled on the fly using
    ++  history. The latter can also be enabled/disabled on the fly using
 +  `--no-replace-objects`. This leads to difficultly storing both 
possible
 +  interpretations of a commit id, especially when computing 
generation

 +  numbers. The commit-graph will not be read or written when
4:  88711a3cf4 = 4:  ec89c88e14 test-repository: properly init repo
5:  7f596c1718 ! 5:  0321a3cf10 commit-graph: not compatible with 
replace objects

    @@ -6,10 +6,34 @@
 repository r is compatible with the commit-graph feature. Fill the
 method with a check to see if replace-objects exist. Test this
 interaction succeeds, including ignoring an existing 
commit-graph and

    -    failing to write a new commit-graph.
    +    failing to write a new commit-graph. However, we do ensure that
    +    we write a new commit-graph by setting read_replace_refs to 0, 
thereby

    +    ignoring the replace refs.

 Signed-off-by: Derrick Stolee 

    +diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
    +--- a/builtin/commit-graph.c
     b/builtin/commit-graph.c
    +@@
    +   return 0;
    + }
    +
    ++extern int read_replace_refs;
    ++
    + static int graph_write(int argc, const char **argv)
    + {
    +   struct string_list *pack_indexes = NULL;
    +@@
    +   if (!opts.obj_dir)
    +   opts.obj_dir = get_object_directory();
    +
    ++  read_replace_refs = 0;
    ++
    +   if (opts.reachable) {
    +   write_commit_graph_reachable(opts.obj_dir, opts.append);
    +   return 0;
    +
 diff --git a/commit-graph.c b/commit-graph.c
 --- a/commit-graph.c
 +++ b/commit-graph.c
    @@ -26,11 +50,15 @@
    return g;
  }

    ++extern int read_replace_refs;
    ++
 +static int commit_graph_compatible(struct repository *r)
 +{
    -+  prepare_replace_object(r);
    -+  if (hashmap_get_size(&r->objects->replace_map->map))
    -+  return 0;
    ++  if (read_replace_refs) {
    ++  prepare_replace_object(r);
    ++  if (hashmap_get_size(&r->objects->replace_map->map))
    ++  return 0;
    ++  }
 +
 +  return 1;
 +}
    @@ -110,7 +138,7 @@
 +  test_cmp expect actual &&
 +  rm -rf .git/objects/info/commit-graph &&
 +  git commit-graph write --reachable &&
    -+  test_path_is_missing .git/objects/info/commit-graph
    ++  test_path_is_file .git/objects/info/commit-graph
 +  )
 +'
 +
6:  94dd91ac35 ! 6:  d18ecc0124 commit-graph: not compatible with grafts
    @@ -7,24 +7,25 @@
 situations we ignore existing commit-graph files and we do not 
write new

 commit-graph files.

    +    Helped-by: Jakub Narebski 
 Signed-off-by: Derrick Stolee 

 diff --git a/commit-graph.c b/commit-graph.c
 --- a/commit-graph.c
 +++ b/commit-graph.c
 @@
    +   return 0;
    +   }

    - static int commit_graph_compatible(struct repository *r)
    - {
 +  prepare_commit_graft(r);
 +  if (r->parsed_objects && r->parsed_objects->grafts_nr)
 +  return 0;
 +  if (is_repository_shallow(r))
 +  return 0;
 +
    -   prepare_replace_object(r);
    -   if (hashmap_get_size(&r->objects->replace_map->map))
    -   return 0;
    +   return 1;
    + }
    +

 diff --git a/commit.c b/commit.c
 --- a/commit.c
    @@ -66,7 +67,9 @@
 +  cd graft &&
 +  git commit-graph write --reachable &&
 +  test_path_is_file .git/objects/info/commit-graph &&
    -+  git replace --graft HEAD~1 HEAD~3 &&
    ++  H1=$(git rev-parse --verify HEAD~1) &&
    ++  H3=$(git rev-parse --verify HEAD~3) &&
    ++  echo "$H1 $H3" >.git/info/grafts &&
 +  git -c core.commitGraph=false log >expect &&
 +  git -c core.commitGraph=true log >actual &&
 +  test_cmp expect actual &&
7:  5314a5a93d ! 7:  87d

Re: [PATCH v2 5/8] commit-graph: not compatible with replace objects

2018-08-21 Thread Derrick Stolee

On 8/21/2018 1:45 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph. However, we do ensure that
we write a new commit-graph by setting read_replace_refs to 0, thereby
ignoring the replace refs.

Signed-off-by: Derrick Stolee 
---
  builtin/commit-graph.c  |  4 
  commit-graph.c  | 21 +
  replace-object.c|  2 +-
  replace-object.h|  2 ++
  t/t5318-commit-graph.sh | 22 ++
  5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..da737df321 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -120,6 +120,8 @@ static int graph_read(int argc, const char **argv)
return 0;
  }
  
+extern int read_replace_refs;

+

Why do you need this (and also in commit-graph.c)?  I thought
cache.h includes it, which you can just make use of it.



You're right. I don't know how I missed that.



+static int commit_graph_compatible(struct repository *r)
+{
+   if (read_replace_refs) {
+   prepare_replace_object(r);
+   if (hashmap_get_size(&r->objects->replace_map->map))
+   return 0;
+   }
+
+   return 1;
+}
diff --git a/replace-object.c b/replace-object.c
index 3c17864eb7..9821f1477e 100644
--- a/replace-object.c
+++ b/replace-object.c
@@ -32,7 +32,7 @@ static int register_replace_ref(struct repository *r,
return 0;
  }
  
-static void prepare_replace_object(struct repository *r)

+void prepare_replace_object(struct repository *r)
  {
if (r->objects->replace_map)
return;

The way the new caller is written, I am wondering if this function
should return "we are (or, are not) using the replace object
feature", making it unnecessary for callers on the reading side to
know about "read-replace-refs" external variable, for example.

/*
 * To be called on-demand from codepaths that want to make
 * sure that replacement objects can be found as necessary.
 *
 * Return number of replacement defined for the repository, or
 * -1 when !read_replace_refs tells us not to use replacement
 * mechanism at all.
 */
int prepare_replace_object(struct repository *r)
{
if (!read_replace_refs)
return -1;

if (!r->objects->replace_map) {
r->objects->replace_map =
xmalloc(...);
oidmap_init(r->objects->replace_map, 0);
for_each_refplace_ref(r, register_...);
}
return hashmap_get_size(&r->objects->replace_map->map);
}


This is a good idea. I can incorporate it into v3.


Then, the caller side can simply become something like:

#define cgraph_compat(r) (prepare_replace_object(r) <= 0)


With the next two patches that add more conditions to 
commit_graph_compatible(), I'd prefer keeping it a method instead of a 
macro.



There are various writers to read_replace_refs variable, but I think
they should first be replaced with calls to something like:

void use_replace_refs(struct repository *r, int enable);

which allows us to hide the global variable and later make it per
repository if we wanted to.


I will incorporate this into v3 as well.

Thanks,

-Stolee



[PATCH] test-tool.h: include git-compat-util.h

2018-08-21 Thread Jeff King
The test-tool programs include "test-tool.h" as their first
include, which breaks our CodingGuideline of "the first
include must be git-compat-util.h or an equivalent".

Rather than change them all, let's instead make test-tool.h
one of those equivalents, just like we do for builtin.h
(which many of the actual git builtins include first).

Signed-off-by: Jeff King 
---
Repost, as the original was in a larger thread about includes and didn't
get any comment.

 t/helper/test-tool.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e926c416ea..e954e8c522 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -1,6 +1,8 @@
 #ifndef __TEST_TOOL_H__
 #define __TEST_TOOL_H__
 
+#include "git-compat-util.h"
+
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
-- 
2.19.0.rc0.398.g138a08f6f6


git@vger.kernel.org

2018-08-21 Thread Antoine Beaupré
On 2018-08-08 18:10:22, Matthieu Moy wrote:
> "jrnieder"  wrote:
>
>> (+cc: some folks interested in git-remote-mediawiki)
>
> Thanks.
>
> In case it still matters, an obvious Acked-by: Matthieu Moy 
> 

Hi,

I seem to have lost context of the original email, and can't find a copy
on public-inbox.org... Is there a patch we should merge back into
git-mediawiki already?

Thanks!

A.

-- 
Your injured body has become the burden of your digital soul.
- Yin Aiwen, 2013, The Massage is the Medium


Re: [PATCH] test-tool.h: include git-compat-util.h

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

> The test-tool programs include "test-tool.h" as their first
> include, which breaks our CodingGuideline of "the first
> include must be git-compat-util.h or an equivalent".
>
> Rather than change them all, let's instead make test-tool.h
> one of those equivalents, just like we do for builtin.h
> (which many of the actual git builtins include first).
>
> Signed-off-by: Jeff King 
> ---
> Repost, as the original was in a larger thread about includes and didn't
> get any comment.

Thanks.  Will queue.

>
>  t/helper/test-tool.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index e926c416ea..e954e8c522 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -1,6 +1,8 @@
>  #ifndef __TEST_TOOL_H__
>  #define __TEST_TOOL_H__
>  
> +#include "git-compat-util.h"
> +
>  int cmd__chmtime(int argc, const char **argv);
>  int cmd__config(int argc, const char **argv);
>  int cmd__ctype(int argc, const char **argv);


[PATCH 1/6] t/perf: factor boilerplate out of test_perf

2018-08-21 Thread Jeff King
About half of test_perf() is boilerplate preparing to run
_any_ test, and the other half is specifically running a
timing test. Let's split it into two functions, so that we
can reuse the boilerplate in future commits.

Signed-off-by: Jeff King 
---
 t/perf/perf-lib.sh | 61 ++
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e4c343a6b7..a54be09516 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -179,8 +179,8 @@ exit $ret' >&3 2>&4
return "$eval_ret"
 }
 
-
-test_perf () {
+test_wrapper_ () {
+   test_wrapper_func_=$1; shift
test_start_
test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq=
test "$#" = 2 ||
@@ -191,35 +191,44 @@ test_perf () {
base=$(basename "$0" .sh)
echo "$test_count" >>"$perf_results_dir"/$base.subtests
echo "$1" >"$perf_results_dir"/$base.$test_count.descr
-   if test -z "$verbose"; then
-   printf "%s" "perf $test_count - $1:"
-   else
-   echo "perf $test_count - $1:"
-   fi
-   for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do
-   say >&3 "running: $2"
-   if test_run_perf_ "$2"
-   then
-   if test -z "$verbose"; then
-   printf " %s" "$i"
-   else
-   echo "* timing run 
$i/$GIT_PERF_REPEAT_COUNT:"
-   fi
+   base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" 
.sh)"."$test_count"
+   "$test_wrapper_func_" "$@"
+   fi
+
+   test_finish_
+}
+
+test_perf_ () {
+   if test -z "$verbose"; then
+   printf "%s" "perf $test_count - $1:"
+   else
+   echo "perf $test_count - $1:"
+   fi
+   for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do
+   say >&3 "running: $2"
+   if test_run_perf_ "$2"
+   then
+   if test -z "$verbose"; then
+   printf " %s" "$i"
else
-   test -z "$verbose" && echo
-   test_failure_ "$@"
-   break
+   echo "* timing run $i/$GIT_PERF_REPEAT_COUNT:"
fi
-   done
-   if test -z "$verbose"; then
-   echo " ok"
else
-   test_ok_ "$1"
+   test -z "$verbose" && echo
+   test_failure_ "$@"
+   break
fi
-   base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" 
.sh)"."$test_count"
-   "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times
+   done
+   if test -z "$verbose"; then
+   echo " ok"
+   else
+   test_ok_ "$1"
fi
-   test_finish_
+   "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times
+}
+
+test_perf () {
+   test_wrapper_ test_perf_ "$@"
 }
 
 # We extend test_done to print timings at the end (./run disables this
-- 
2.19.0.rc0.398.g138a08f6f6



[PATCH v2 0/6] reuse on-disk deltas for fetches with bitmaps

2018-08-21 Thread Jeff King
On Fri, Aug 17, 2018 at 04:54:27PM -0400, Jeff King wrote:

> This series more aggressively reuses on-disk deltas to serve fetches
> when reachability bitmaps tell us a more complete picture of what the
> client has. That saves server CPU and results in smaller packs. See the
> final patch for numbers and more discussion.

Here's a v2, with just a few cosmetic fixes to address the comments on
v1 (range-diff below).

  [1/6]: t/perf: factor boilerplate out of test_perf
  [2/6]: t/perf: factor out percent calculations
  [3/6]: t/perf: add infrastructure for measuring sizes
  [4/6]: t/perf: add perf tests for fetches from a bitmapped server
  [5/6]: pack-bitmap: save "have" bitmap from walk
  [6/6]: pack-objects: reuse on-disk deltas for thin "have" objects

 builtin/pack-objects.c | 28 +++
 pack-bitmap.c  | 25 +-
 pack-bitmap.h  |  7 +++
 pack-objects.c | 19 
 pack-objects.h | 20 +++-
 t/perf/README  | 25 ++
 t/perf/aggregate.perl  | 69 ++--
 t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++
 t/perf/perf-lib.sh | 74 +++---
 9 files changed, 260 insertions(+), 52 deletions(-)
 create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh

1:  89fa0ec8d8 ! 1:  3e1b94d7d6 pack-bitmap: save "have" bitmap from walk
@@ -69,6 +69,8 @@
 +
 +  if (!bitmap_git)
 +  return 0; /* no bitmap loaded */
++  if (!bitmap_git->result)
++  BUG("failed to perform bitmap walk before querying");
 +  if (!bitmap_git->haves)
 +  return 0; /* walk had no "haves" */
 +
2:  f7ca0d59e3 ! 2:  b8b2416aac pack-objects: reuse on-disk deltas for thin 
"have" objects
@@ -12,7 +12,7 @@
 However, this misses some opportunities. Modulo some special
 cases like shallow or partial clones, we know that every
 object reachable from the "haves" could be a preferred base.
-We don't use them all for two reasons:
+We don't use all of them for two reasons:
 
   1. It's expensive to traverse the whole history and
  enumerate all of the objects the other side has.
@@ -100,15 +100,16 @@
 
 The second is that the rest of the code assumes that any
 reused delta will point to another "struct object_entry" as
-its base. But by definition, we don't have such an entry!
+its base. But of course the case we are interested in here
+is the one where don't have such an entry!
 
 I looked at a number of options that didn't quite work:
 
- - we could use a different flag for reused deltas. But it's
-   not a single bit for "I'm being reused". We have to
-   actually store the oid of the base, which is normally
-   done by pointing to the existing object_entry. And we'd
-   have to modify all the code which looks at deltas.
+ - we could use a flag to signal a reused delta, but it's
+   not a single bit. We have to actually store the oid of
+   the base, which is normally done by pointing to the
+   existing object_entry. And we'd have to modify all the
+   code which looks at deltas.
 
  - we could add the reused bases to the end of the existing
object_entry array. While this does create some extra
@@ -173,7 +174,7 @@
  static int depth = 50;
  static int delta_search_threads;
  static int pack_to_stdout;
-+static int thin = 0;
++static int thin;
  static int num_preferred_base;
  static struct progress *progress_state;
  


[PATCH 4/6] t/perf: add perf tests for fetches from a bitmapped server

2018-08-21 Thread Jeff King
A server with bitmapped packs can serve a clone very
quickly. However, fetches are not necessarily made any
faster, because we spend a lot less time in object traversal
(which is what bitmaps help with) and more time finding
deltas (because we may have to throw out on-disk deltas if
the client does not have the base).

As a first step to making this faster, this patch introduces
a new perf script to measure fetches into a repo of various
ages from a fully-bitmapped server.

We separately measure the work done by the server (in
pack-objects) and that done by the client (in index-pack).
Furthermore, we measure the size of the resulting pack.

Breaking it down like this (instead of just doing a regular
"git fetch") lets us see how much each side benefits from
any changes. And since we know the pack size, if we estimate
the network speed, then one could calculate a complete
wall-clock time for the operation (though the script does
not do this automatically).

Signed-off-by: Jeff King 
---
 t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++
 1 file changed, 45 insertions(+)
 create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh

diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh 
b/t/perf/p5311-pack-bitmaps-fetch.sh
new file mode 100755
index 00..b04575951f
--- /dev/null
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='performance of fetches from bitmapped packs'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'create bitmapped server repo' '
+   git config pack.writebitmaps true &&
+   git config pack.writebitmaphashcache true &&
+   git repack -ad
+'
+
+# simulate a fetch from a repository that last fetched N days ago, for
+# various values of N. We do so by following the first-parent chain,
+# and assume the first entry in the chain that is N days older than the current
+# HEAD is where the HEAD would have been then.
+for days in 1 2 4 8 16 32 64 128; do
+   title=$(printf '%10s' "($days days)")
+   test_expect_success "setup revs from $days days ago" '
+   now=$(git log -1 --format=%ct HEAD) &&
+   then=$(($now - ($days * 86400))) &&
+   tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
+   {
+   echo HEAD &&
+   echo ^$tip
+   } >revs
+   '
+
+   test_perf "server $title" '
+   git pack-objects --stdout --revs \
+--thin --delta-base-offset \
+tmp.pack
+   '
+
+   test_size "size   $title" '
+   wc -c 

[PATCH 3/6] t/perf: add infrastructure for measuring sizes

2018-08-21 Thread Jeff King
The main objective of scripts in the perf framework is to
run "test_perf", which measures the time it takes to run
some operation. However, it can also be interesting to see
the change in the output size of certain operations.

This patch introduces test_size, which records a single
numeric output from the test and shows it in the aggregated
output (with pretty printing and relative size comparison).

Signed-off-by: Jeff King 
---
 t/perf/README | 25 ++
 t/perf/aggregate.perl | 48 ++-
 t/perf/perf-lib.sh| 13 
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/t/perf/README b/t/perf/README
index 21321a0f36..be12090c38 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -168,3 +168,28 @@ that
   While we have tried to make sure that it can cope with embedded
   whitespace and other special characters, it will not work with
   multi-line data.
+
+Rather than tracking the performance by run-time as `test_perf` does, you
+may also track output size by using `test_size`. The stdout of the
+function should be a single numeric value, which will be captured and
+shown in the aggregated output. For example:
+
+   test_perf 'time foo' '
+   ./foo >foo.out
+   '
+
+   test_size 'output size'
+   wc -c ;
return undef if not defined $line;
close $fh or die "cannot close $name: $!";
-   $line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) 
(\d+(?:\.\d+)?)$/
-   or die "bad input line: $line";
-   my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
-   return ($rt, $4, $5);
+   # times
+   if ($line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) 
(\d+(?:\.\d+)?)$/) {
+   my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
+   return ($rt, $4, $5);
+   # size
+   } elsif ($line =~ /^\d+$/) {
+   return $&;
+   } else {
+   die "bad input line: $line";
+   }
 }
 
 sub relative_change {
@@ -32,9 +38,15 @@ sub relative_change {
 
 sub format_times {
my ($r, $u, $s, $firstr) = @_;
+   # no value means we did not finish the test
if (!defined $r) {
return "";
}
+   # a single value means we have a size, not times
+   if (!defined $u) {
+   return format_size($r, $firstr);
+   }
+   # otherwise, we have real/user/system times
my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
$out .= ' ' . relative_change($r, $firstr) if defined $firstr;
return $out;
@@ -54,6 +66,25 @@ sub usage {
exit(1);
 }
 
+sub human_size {
+   my $n = shift;
+   my @units = ('', qw(K M G));
+   while ($n > 900 && @units > 1) {
+   $n /= 1000;
+   shift @units;
+   }
+   return $n unless length $units[0];
+   return sprintf '%.1f%s', $n, $units[0];
+}
+
+sub format_size {
+   my ($size, $first) = @_;
+   # match the width of a time: 0.00(0.00+0.00)
+   my $out = sprintf '%15s', human_size($size);
+   $out .= ' ' . relative_change($size, $first) if defined $first;
+   return $out;
+}
+
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests,
 $codespeed, $sortby, $subsection, $reponame);
 
@@ -184,7 +215,14 @@ sub print_default_results {
my $firstr;
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   $times{$prefixes{$d}.$t} = 
[get_times("$resultsdir/$prefixes{$d}$t.times")];
+   my $base = "$resultsdir/$prefixes{$d}$t";
+   $times{$prefixes{$d}.$t} = [];
+   foreach my $type (qw(times size)) {
+   if (-e "$base.$type") {
+   $times{$prefixes{$d}.$t} = 
[get_times("$base.$type")];
+   last;
+   }
+   }
my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
my $w = length format_times($r,$u,$s,$firstr);
$colwidth[$i] = $w if $w > $colwidth[$i];
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index a54be09516..11d1922cf5 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -231,6 +231,19 @@ test_perf () {
test_wrapper_ test_perf_ "$@"
 }
 
+test_size_ () {
+   say >&3 "running: $2"
+   if test_eval_ "$2" 3>"$base".size; then
+   test_ok_ "$1"
+   else
+   test_failure_ "$@"
+   fi
+}
+
+test_size () {
+   test_wrapper_ test_size_ "$@"
+}
+
 # We extend test_done to print timings at the end (./run disables this
 # and does it after running everything)
 test_at_end_hook_ () {
-- 
2.19.0.rc0.398.g138a08f6f6



[PATCH 2/6] t/perf: factor out percent calculations

2018-08-21 Thread Jeff King
This will let us reuse the code when we add new values to
aggregate besides times.

Signed-off-by: Jeff King 
---
 t/perf/aggregate.perl | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index bc865160e7..3181b087ab 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -19,21 +19,24 @@ sub get_times {
return ($rt, $4, $5);
 }
 
+sub relative_change {
+   my ($r, $firstr) = @_;
+   if ($firstr > 0) {
+   return sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
+   } elsif ($r == 0) {
+   return "=";
+   } else {
+   return "+inf";
+   }
+}
+
 sub format_times {
my ($r, $u, $s, $firstr) = @_;
if (!defined $r) {
return "";
}
my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
-   if (defined $firstr) {
-   if ($firstr > 0) {
-   $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
-   } elsif ($r == 0) {
-   $out .= " =";
-   } else {
-   $out .= " +inf";
-   }
-   }
+   $out .= ' ' . relative_change($r, $firstr) if defined $firstr;
return $out;
 }
 
-- 
2.19.0.rc0.398.g138a08f6f6



[PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Jeff King
When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.

A few notes on the accessor interface:

 - the bitmap code calls these "haves" because it grew out
   of the want/have negotiation for fetches. But really,
   these are simply the objects that would be flagged
   UNINTERESTING in a regular traversal. Let's use that
   more universal nomenclature for the external module
   interface. We may want to change the internal naming
   inside the bitmap code, but that's outside the scope of
   this patch.

 - it still uses a bare "sha1" rather than "oid". That's
   true of all of the bitmap code. And in this particular
   instance, our caller in pack-objects is dealing with the
   bare sha1 that comes from a packed REF_DELTA (we're
   pointing directly to the mmap'd pack on disk). That's
   something we'll have to deal with as we transition to a
   new hash, but we can wait and see how the caller ends up
   being fixed and adjust this interface accordingly.

Signed-off-by: Jeff King 
---
 pack-bitmap.c | 25 -
 pack-bitmap.h |  7 +++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f0a1937a1c..c3231ef9ef 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -86,6 +86,9 @@ struct bitmap_index {
/* Bitmap result of the last performed walk */
struct bitmap *result;
 
+   /* "have" bitmap from the last performed walk */
+   struct bitmap *haves;
+
/* Version of the bitmap index */
unsigned int version;
 
@@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info 
*revs)
bitmap_and_not(wants_bitmap, haves_bitmap);
 
bitmap_git->result = wants_bitmap;
+   bitmap_git->haves = haves_bitmap;
 
-   bitmap_free(haves_bitmap);
return bitmap_git;
 
 cleanup:
@@ -1114,5 +1117,25 @@ void free_bitmap_index(struct bitmap_index *b)
free(b->ext_index.objects);
free(b->ext_index.hashes);
bitmap_free(b->result);
+   bitmap_free(b->haves);
free(b);
 }
+
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
+const unsigned char *sha1)
+{
+   int pos;
+
+   if (!bitmap_git)
+   return 0; /* no bitmap loaded */
+   if (!bitmap_git->result)
+   BUG("failed to perform bitmap walk before querying");
+   if (!bitmap_git->haves)
+   return 0; /* walk had no "haves" */
+
+   pos = bitmap_position_packfile(bitmap_git, sha1);
+   if (pos < 0)
+   return 0;
+
+   return bitmap_get(bitmap_git->haves, pos);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8a04741e12..c633bf5238 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -53,6 +53,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct 
packing_data *mapping
 khash_sha1 *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
 
+/*
+ * After a traversal has been performed on the bitmap_index, this can be
+ * queried to see if a particular object was reachable from any of the
+ * objects flagged as UNINTERESTING.
+ */
+int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned 
char *sha1);
+
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
-- 
2.19.0.rc0.398.g138a08f6f6



[PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
When we serve a fetch, we pass the "wants" and "haves" from
the fetch negotiation to pack-objects. That tells us not
only which objects we need to send, but we also use the
boundary commits as "preferred bases": their trees and blobs
are candidates for delta bases, both for reusing on-disk
deltas and for finding new ones.

However, this misses some opportunities. Modulo some special
cases like shallow or partial clones, we know that every
object reachable from the "haves" could be a preferred base.
We don't use all of them for two reasons:

  1. It's expensive to traverse the whole history and
 enumerate all of the objects the other side has.

  2. The delta search is expensive, so we want to keep the
 number of candidate bases sane. The boundary commits
 are the most likely to work.

When we have reachability bitmaps, though, reason 1 no
longer applies. We can efficiently compute the set of
reachable objects on the other side (and in fact already did
so as part of the bitmap set-difference to get the list of
interesting objects). And using this set conveniently
covers the shallow and partial cases, since we have to
disable the use of bitmaps for those anyway.

The second reason argues against using these bases in the
search for new deltas. But there's one case where we can use
this information for free: when we have an existing on-disk
delta that we're considering reusing, we can do so if we
know the other side has the base object. This in fact saves
time during the delta search, because it's one less delta we
have to compute.

And that's exactly what this patch does: when we're
considering whether to reuse an on-disk delta, if bitmaps
tell us the other side has the object (and we're making a
thin-pack), then we reuse it.

Here are the results on p5311 using linux.git, which
simulates a client fetching after `N` days since their last
fetch:

Test origin  HEAD
--
5311.3: server   (1 days)0.27(0.27+0.04) 0.12(0.09+0.03) -55.6%
5311.4: size (1 days)   0.9M  237.0K -73.7%
5311.5: client   (1 days)0.04(0.05+0.00) 0.10(0.10+0.00) +150.0%
5311.7: server   (2 days)0.34(0.42+0.04) 0.13(0.10+0.03) -61.8%
5311.8: size (2 days)   1.5M  347.7K -76.5%
5311.9: client   (2 days)0.07(0.08+0.00) 0.16(0.15+0.01) +128.6%
5311.11: server   (4 days)   0.56(0.77+0.08) 0.13(0.10+0.02) -76.8%
5311.12: size (4 days)  2.8M  566.6K -79.8%
5311.13: client   (4 days)   0.13(0.15+0.00) 0.34(0.31+0.02) +161.5%
5311.15: server   (8 days)   0.97(1.39+0.11) 0.30(0.25+0.05) -69.1%
5311.16: size (8 days)  4.3M1.0M -76.0%
5311.17: client   (8 days)   0.20(0.22+0.01) 0.53(0.52+0.01) +165.0%
5311.19: server  (16 days)   1.52(2.51+0.12) 0.30(0.26+0.03) -80.3%
5311.20: size(16 days)  8.0M2.0M -74.5%
5311.21: client  (16 days)   0.40(0.47+0.03) 1.01(0.98+0.04) +152.5%
5311.23: server  (32 days)   2.40(4.44+0.20) 0.31(0.26+0.04) -87.1%
5311.24: size(32 days) 14.1M4.1M -70.9%
5311.25: client  (32 days)   0.70(0.90+0.03) 1.81(1.75+0.06) +158.6%
5311.27: server  (64 days)   11.76(26.57+0.29)   0.55(0.50+0.08) -95.3%
5311.28: size(64 days) 89.4M   47.4M -47.0%
5311.29: client  (64 days)   5.71(9.31+0.27) 15.20(15.20+0.32) +166.2%
5311.31: server (128 days)   16.15(36.87+0.40)   0.91(0.82+0.14) -94.4%
5311.32: size   (128 days)134.8M  100.4M -25.5%
5311.33: client (128 days)   9.42(16.86+0.49)25.34(25.80+0.46) +169.0%

In all cases we save CPU time on the server (sometimes
significant) and the resulting pack is smaller. We do spend
more CPU time on the client side, because it has to
reconstruct more deltas. But that's the right tradeoff to
make, since clients tend to outnumber servers. It just means
the thin pack mechanism is doing its job.

>From the user's perspective, the end-to-end time of the
operation will generally be faster. E.g., in the 128-day
case, we saved 15s on the server at a cost of 16s on the
client. Since the resulting pack is 34MB smaller, this is a
net win if the network speed is less than 270Mbit/s. And
that's actually the worst case. The 64-day case saves just
over 11s at a cost of just under 11s. So it's a slight win
at any network speed, and the 40MB saved is pure bonus. That
trend continues for the smaller fetches.

The implementation itself is mostly straightforward, with
the new logic going into check_object(). But there are two
tricky bits.

The first is that check_object() needs access to the
relevant information (the thin flag and bitmap result). We
can do this by pushing these into program-lifetime globals.

The second is that the rest of the code assumes that any
reused delta will point to another "struct 

Re: [PATCH v2 0/6] reuse on-disk deltas for fetches with bitmaps

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 03:06:22PM -0400, Jeff King wrote:

> On Fri, Aug 17, 2018 at 04:54:27PM -0400, Jeff King wrote:
> 
> > This series more aggressively reuses on-disk deltas to serve fetches
> > when reachability bitmaps tell us a more complete picture of what the
> > client has. That saves server CPU and results in smaller packs. See the
> > final patch for numbers and more discussion.
> 
> Here's a v2, with just a few cosmetic fixes to address the comments on
> v1 (range-diff below).

Whoops. I accidentally just sent these as replies to the wrong thread:

  https://public-inbox.org/git/20180821184140.ga24...@sigill.intra.peff.net/

I'll avoid spamming the list by re-sending, but let me know if it's
easier to simply repost.

-Peff


git@vger.kernel.org

2018-08-21 Thread Eric Sunshine
On Tue, Aug 21, 2018 at 2:55 PM Antoine Beaupré  wrote:
> On 2018-08-08 18:10:22, Matthieu Moy wrote:
> > "jrnieder"  wrote:
> >> (+cc: some folks interested in git-remote-mediawiki)
> >
> > In case it still matters, an obvious Acked-by: Matthieu Moy 
> > 
>
> I seem to have lost context of the original email, and can't find a copy
> on public-inbox.org... Is there a patch we should merge back into
> git-mediawiki already?

The patch is here[1].

[1]: 
https://public-inbox.org/git/20180730204646.32312-1-sunsh...@sunshineco.com/


[PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
We already advise people to make sure their documentation
formats correctly. Let's point them at the doc-diff script,
which can help with that.

Let's also put a brief note in the script about its purpose,
since that otherwise can only be found in the original
commit message. Along with the existing -h/usage text,
that's hopefully enough for developers to make use of it.

Signed-off-by: Jeff King 
---
Just a finishing touch on the jk/diff-rendered-docs topic.

 Documentation/SubmittingPatches | 4 +++-
 Documentation/doc-diff  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b44fd51f27..ec8b205145 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -80,7 +80,9 @@ GitHub-Travis CI hints section for details.
 
 Do not forget to update the documentation to describe the updated
 behavior and make sure that the resulting documentation set formats
-well. It is currently a liberal mixture of US and UK English norms for
+well (try the Documentation/doc-diff script).
+
+We currently have a liberal mixture of US and UK English norms for
 spelling and grammar, which is somewhat unfortunate.  A huge patch that
 touches the files all over the place only to correct the inconsistency
 is not welcome, though.  Potential clashes with other changes that can
diff --git a/Documentation/doc-diff b/Documentation/doc-diff
index f483fe427c..6e285e648c 100755
--- a/Documentation/doc-diff
+++ b/Documentation/doc-diff
@@ -1,4 +1,12 @@
 #!/bin/sh
+#
+# Build two documentation trees and diff the resulting formatted output.
+# Compared to a source diff, this can reveal mistakes in the formatting.
+# For example:
+#
+#   ./doc-diff origin/master HEAD
+#
+# would show the differences introduced by a branch based on master.
 
 OPTIONS_SPEC="\
 doc-diff [options]   [-- ]
-- 
2.19.0.rc0.398.g138a08f6f6


[PATCH v2 0/2] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
Similar to [1], add the commit-graph and commit-graph-format technical docs
to Documentation/Makefile so they are automatically converted to HTML when
needed.

I compiled the docs and inspected the HTML manually in the browser. As
suggested, I modified the documents to format a bit better. See the commit
messages for details. Since the files had been modified since 'maint', this
version is based on 'master'.

[1] 
https://public-inbox.org/git/20180814222846.gg142...@aiede.svl.corp.google.com/
[PATCH] partial-clone: render design doc using asciidoc

Derrick Stolee (2):
  Docs: Add commit-graph tech docs to Makefile
  commit-graph.txt: improve formatting for asciidoc

 Documentation/Makefile   |  2 ++
 Documentation/technical/commit-graph.txt | 43 +++-
 2 files changed, 22 insertions(+), 23 deletions(-)


base-commit: fa03cdc39b951d1cfbfd690fe6f3ac6c57ab6a44
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-22%2Fderrickstolee%2Fmake-docs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-22/derrickstolee/make-docs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/22

Range-diff vs v1:

 1:  ef5af2ccc = 1:  4c66af626 Docs: Add commit-graph tech docs to Makefile
 -:  - > 2:  6cf253c2a commit-graph.txt: improve formatting for asciidoc

-- 
gitgitgadget


git@vger.kernel.org

2018-08-21 Thread Antoine Beaupré
On 2018-08-21 15:22:43, Eric Sunshine wrote:
> On Tue, Aug 21, 2018 at 2:55 PM Antoine Beaupré  wrote:
>> On 2018-08-08 18:10:22, Matthieu Moy wrote:
>> > "jrnieder"  wrote:
>> >> (+cc: some folks interested in git-remote-mediawiki)
>> >
>> > In case it still matters, an obvious Acked-by: Matthieu Moy 
>> > 
>>
>> I seem to have lost context of the original email, and can't find a copy
>> on public-inbox.org... Is there a patch we should merge back into
>> git-mediawiki already?
>
> The patch is here[1].
>
> [1]: 
> https://public-inbox.org/git/20180730204646.32312-1-sunsh...@sunshineco.com/

Thanks, so

Acked-by: Antoine Beaupré 

FWIW. :)

A.

-- 
The history of any one part of the earth, like the life of a soldier,
consists of long periods of boredom and short periods of terror.
   - British geologist Derek V. Ager


[PATCH v2 1/2] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Ensure that the commit-graph.txt and commit-graph-format.txt files
are compiled to HTML using ASCIIDOC.

Signed-off-by: Derrick Stolee 
---
 Documentation/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d079d7c73..841e4f705 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -69,6 +69,8 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/commit-graph
+TECH_DOCS += technical/commit-graph-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
-- 
gitgitgadget



[PATCH v2 2/2] commit-graph.txt: improve formatting for asciidoc

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When viewing commit-graph.txt as a plain-text document, it makes
sense to keep paragraphs left-padded between bullet points.
However, asciidoc converts these left-padded paragraphs as monospace
fonts, creating an unpleasant document. Remove the padding.

The "Future Work" section includes a bulleted list of items, and one
item has sub-items. These do not render properly in asciidoc, so
remove the sub-list and incorporate them into the paragraph.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 43 +++-
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index c664acbd7..bdc3eab9e 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -49,23 +49,23 @@ Equivalently, the generation number of a commit A is one 
more than the
 length of a longest path from A to a root commit. The recursive definition
 is easier to use for computation and observing the following property:
 
-If A and B are commits with generation numbers N and M, respectively,
-and N <= M, then A cannot reach B. That is, we know without searching
-that B is not an ancestor of A because it is further from a root commit
-than A.
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
 
-Conversely, when checking if A is an ancestor of B, then we only need
-to walk commits until all commits on the walk boundary have generation
-number at most N. If we walk commits using a priority queue seeded by
-generation numbers, then we always expand the boundary commit with highest
-generation number and can easily detect the stopping condition.
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
 
 This property can be used to significantly reduce the time it takes to
 walk commits and determine topological relationships. Without generation
 numbers, the general heuristic is the following:
 
-If A and B are commits with commit time X and Y, respectively, and
-X < Y, then A _probably_ cannot reach B.
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
 
 This heuristic is currently used whenever the computation is allowed to
 violate topological relationships due to clock skew (such as "git log"
@@ -121,11 +121,8 @@ Future Work
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-- 'log --topo-order'
-- 'tag --merged'
+  priority queue with one ordered by generation number. Commands that could
+  improve include 'git log --topo-order' and 'git tag --merged'.
 
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
@@ -148,13 +145,13 @@ Related Links
 More discussion about generation numbers and not storing them inside
 commit objects. A valuable quote:
 
-"I think we should be moving more in the direction of keeping
- repo-local caches for optimizations. Reachability bitmaps have been
- a big performance win. I think we should be doing the same with our
- properties of commits. Not just generation numbers, but making it
- cheap to access the graph structure without zlib-inflating whole
- commit objects (i.e., packv4 or something like the "metapacks" I
- proposed a few years ago)."
+"I think we should be moving more in the direction of keeping
+ repo-local caches for optimizations. Reachability bitmaps have been
+ a big performance win. I think we should be doing the same with our
+ properties of commits. Not just generation numbers, but making it
+ cheap to access the graph structure without zlib-inflating whole
+ commit objects (i.e., packv4 or something like the "metapacks" I
+ proposed a few years ago)."
 
 [4] 
https://public-inbox.org/git/20180108154822.54829-1-...@jeffhostetler.com/T/#u
 A patch to remove the ahead-behind calculation from 'status'.
-- 
gitgitgadget


Re: [PATCH v2 0/6] reuse on-disk deltas for fetches with bitmaps

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Aug 17, 2018 at 04:54:27PM -0400, Jeff King wrote:
>
>> This series more aggressively reuses on-disk deltas to serve fetches
>> when reachability bitmaps tell us a more complete picture of what the
>> client has. That saves server CPU and results in smaller packs. See the
>> final patch for numbers and more discussion.
>
> Here's a v2, with just a few cosmetic fixes to address the comments on
> v1 (range-diff below).
>
>   [1/6]: t/perf: factor boilerplate out of test_perf
>   [2/6]: t/perf: factor out percent calculations
>   [3/6]: t/perf: add infrastructure for measuring sizes
>   [4/6]: t/perf: add perf tests for fetches from a bitmapped server
>   [5/6]: pack-bitmap: save "have" bitmap from walk
>   [6/6]: pack-objects: reuse on-disk deltas for thin "have" objects

Thanks.

> 1:  89fa0ec8d8 ! 1:  3e1b94d7d6 pack-bitmap: save "have" bitmap from walk
> @@ -69,6 +69,8 @@
>  +
>  +if (!bitmap_git)
>  +return 0; /* no bitmap loaded */
> ++if (!bitmap_git->result)
> ++BUG("failed to perform bitmap walk before querying");
>  +if (!bitmap_git->haves)
>  +return 0; /* walk had no "haves" */
>  +

The first four are unchanged, so this actually compares 5/6 of the
previous and the current one.  Omitting the four identical ones
makes sense, but I wonder if it makes it easier to see if we keep
the number-label of the surviving patches.

> 2:  f7ca0d59e3 ! 2:  b8b2416aac pack-objects: reuse on-disk deltas for thin 
> "have" objects



Re: [PATCH v2 2/2] commit-graph.txt: improve formatting for asciidoc

2018-08-21 Thread Eric Sunshine
On Tue, Aug 21, 2018 at 3:29 PM Derrick Stolee via GitGitGadget
 wrote:
> When viewing commit-graph.txt as a plain-text document, it makes
> sense to keep paragraphs left-padded between bullet points.
> However, asciidoc converts these left-padded paragraphs as monospace
> fonts, creating an unpleasant document. Remove the padding.
>
> The "Future Work" section includes a bulleted list of items, and one
> item has sub-items. These do not render properly in asciidoc, so
> remove the sub-list and incorporate them into the paragraph.

See: http://asciidoc.org/userguide.html#_bulleted_lists

And...

> diff --git a/Documentation/technical/commit-graph.txt 
> b/Documentation/technical/commit-graph.txt
> @@ -148,13 +145,13 @@ Related Links
> -"I think we should be moving more in the direction of keeping
> - repo-local caches for optimizations. Reachability bitmaps have been
> - a big performance win. I think we should be doing the same with our
> - properties of commits. Not just generation numbers, but making it
> - cheap to access the graph structure without zlib-inflating whole
> - commit objects (i.e., packv4 or something like the "metapacks" I
> - proposed a few years ago)."
> +"I think we should be moving more in the direction of keeping
> + repo-local caches for optimizations. Reachability bitmaps have been
> + a big performance win. I think we should be doing the same with our
> + properties of commits. Not just generation numbers, but making it
> + cheap to access the graph structure without zlib-inflating whole
> + commit objects (i.e., packv4 or something like the "metapacks" I
> + proposed a few years ago)."

Perhaps this should be using a quote block:
http://asciidoc.org/userguide.html#_quote_blocks


worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 03:23:21PM -0400, Jeff King wrote:

> We already advise people to make sure their documentation
> formats correctly. Let's point them at the doc-diff script,
> which can help with that.
> 
> Let's also put a brief note in the script about its purpose,
> since that otherwise can only be found in the original
> commit message. Along with the existing -h/usage text,
> that's hopefully enough for developers to make use of it.
> 
> Signed-off-by: Jeff King 
> ---
> Just a finishing touch on the jk/diff-rendered-docs topic.

I noticed one other oddity with this script, but I actually think the
fix lies elsewhere.

The script does basically this to set up the temporary tree:

  test -d $tmp || git worktree add $tmp ...

The script never cleans up the worktree (since its results can often be
reused between runs), but you may do so with "rm" or "git clean". That
creates an interesting situation if the script is run again before
"worktree prune" runs. We identify the directory as a "new" worktree,
and add it to the list. So you may end up with several copies:

  $ git worktree list
  /home/peff/compile/git  eee785d2e0 
[jk/doc-diff]
  /home/peff/compile/git/.git/tmp-ci  290f16acda 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  cc6237c051 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
(detached HEAD)

If I then run "git worktree prune", those duplicates don't go away
(because the directory is still there; it just corresponds to only the
final entry). If I delete the tmp-doc-diff directory and then run "git
worktree prune", they do all go away.

So I'm not sure:

  1. Should the script be doing something else to indicate that the
 worktree may be reused? I tried "git worktree remove", but it's
 unhappy that the directory doesn't exist. Should it quietly handle
 ignore that and remove any leftover cruft in $GIT_DIR/worktrees?

  2. Should "git worktree add" be more clever about realizing that an
 existing entry in $GIT_DIR/worktrees points to this directory? That
 would be fine for my use, but I wonder if there's some potential
 for loss (e.g., you blew away the work tree but until you do a
 "worktree prune", the refs are still there, objects reachable,
 etc).

  3. Should "git worktree prune" be more clever about dropping
 duplicates? I think it should be easy to identify them: they are
 entries in $GIT_DIR/worktrees for which:

   - the directory in $entry/gitdir does exist, but

   - $(cat $entry/gitdir)/.git does not point back to $entry

I could see any of them being plausible fixes, but people who have given
worktrees a lot of thought may have stronger opinions.

-Peff


Re: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Derrick Stolee

On 8/21/2018 3:23 PM, Jeff King wrote:

We already advise people to make sure their documentation
formats correctly. Let's point them at the doc-diff script,
which can help with that.

Let's also put a brief note in the script about its purpose,
since that otherwise can only be found in the original
commit message. Along with the existing -h/usage text,
that's hopefully enough for developers to make use of it.


This is helpful, thanks!


Signed-off-by: Jeff King 
---
Just a finishing touch on the jk/diff-rendered-docs topic.

  Documentation/SubmittingPatches | 4 +++-
  Documentation/doc-diff  | 8 
  2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b44fd51f27..ec8b205145 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -80,7 +80,9 @@ GitHub-Travis CI hints section for details.
  
  Do not forget to update the documentation to describe the updated

  behavior and make sure that the resulting documentation set formats
-well. It is currently a liberal mixture of US and UK English norms for
+well (try the Documentation/doc-diff script).
+
+We currently have a liberal mixture of US and UK English norms for
  spelling and grammar, which is somewhat unfortunate.  A huge patch that
  touches the files all over the place only to correct the inconsistency
  is not welcome, though.  Potential clashes with other changes that can
diff --git a/Documentation/doc-diff b/Documentation/doc-diff
index f483fe427c..6e285e648c 100755
--- a/Documentation/doc-diff
+++ b/Documentation/doc-diff
@@ -1,4 +1,12 @@
  #!/bin/sh
+#
+# Build two documentation trees and diff the resulting formatted output.
+# Compared to a source diff, this can reveal mistakes in the formatting.
+# For example:
+#
+#   ./doc-diff origin/master HEAD
+#
+# would show the differences introduced by a branch based on master.
  
  OPTIONS_SPEC="\

  doc-diff [options]   [-- ]


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

> When we serve a fetch, we pass the "wants" and "haves" from
> ...
> This lets us limit the change primarily to the oe_delta()
> and oe_set_delta_ext() functions. And as a bonus, most of
> the rest of the code does not consider these dummy entries
> at all, saving both runtime CPU and code complexity.
>
> Signed-off-by: Jeff King 
>
> Signed-off-by: Jeff King 
> ---

Sorry for commenting on something completely off-topic, but when
applied with "git am -s", I get a resulting commit with 3 S-o-b (the
above two, plus the one added by "-s"), with a blank line in between
them.  I can understand the first blank line (the one between your
two S-o-b), as the first S-o-b does not even appear to be part of
the trailer block, but cannot explain why I get an extra one before
the one added by "-s".  Puzzled...

> @@ -79,6 +81,7 @@ static unsigned long pack_size_limit;
>  static int depth = 50;
>  static int delta_search_threads;
>  static int pack_to_stdout;
> +static int thin;

It appears that this line is the only change since the previous
round.  The remainder of the patch looks cleanly done and readable.

Thanks.


Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Derrick Stolee

On 8/21/2018 3:07 PM, Jeff King wrote:

When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.


This makes a lot of sense. Based on the amount of time the "Counting 
Objects" blog post [1] spent talking about delta bases, I would have 
assumed this "haves" bitmap was already part of it. But, I can also see 
how it could be dropped if you are focusing on performance for 'git clone'.




A few notes on the accessor interface:

  - the bitmap code calls these "haves" because it grew out
of the want/have negotiation for fetches. But really,
these are simply the objects that would be flagged
UNINTERESTING in a regular traversal. Let's use that
more universal nomenclature for the external module
interface. We may want to change the internal naming
inside the bitmap code, but that's outside the scope of
this patch.


I saw the uninteresting-vs-haves name confusion in the patch below, but 
I agree with your logic here.


Sorry that I'm late to the party, but I was interested in the topic.

Thanks,

-Stolee

[1] https://githubengineering.com/counting-objects/

    "Counting Objects" by Vincent Martí



Re: [PATCH v2 0/6] reuse on-disk deltas for fetches with bitmaps

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 12:34:18PM -0700, Junio C Hamano wrote:

> > 1:  89fa0ec8d8 ! 1:  3e1b94d7d6 pack-bitmap: save "have" bitmap from walk
> > @@ -69,6 +69,8 @@
> >  +
> >  +  if (!bitmap_git)
> >  +  return 0; /* no bitmap loaded */
> > ++  if (!bitmap_git->result)
> > ++  BUG("failed to perform bitmap walk before querying");
> >  +  if (!bitmap_git->haves)
> >  +  return 0; /* walk had no "haves" */
> >  +
> 
> The first four are unchanged, so this actually compares 5/6 of the
> previous and the current one.  Omitting the four identical ones
> makes sense, but I wonder if it makes it easier to see if we keep
> the number-label of the surviving patches.

Agreed, but I think this is user error, and not the tool.

I ran:

  git range-diff @{push}...HEAD

since I knew that I had not pushed since beginning my revisions today.
But of course "rebase -i" is clever enough not to change the commit id
on the earlier commits I did not touch, and thus the merge base is
actually patch 4.

I should instead be more explicit about the base, like:

  git range-diff origin @{push} HEAD

That shows much more sensible output (see below).

For my triangular setup, I could even do:

  git range-diff @{upstream} @{push} HEAD

but I'm not sure if that is generally applicable advice (I'm not sure
how many people have really bought into @{push} and using triangular
config -- traditionally I think many people treat @{upstream} as the
place they push to). It also needs adjusting if your revisions might
span several sessions; you'd really need @{push}@{yesterday} or similar.
The best thing to compare against is probably what got queued, so
something like:

  git range-diff origin..origin/jk/$branch_name origin..HEAD

though that also introduces sign-off noise.

-Peff

-- >8 --
1:  9665189d70 = 1:  9665189d70 t/perf: factor boilerplate out of test_perf
2:  fa1ad80e4e = 2:  fa1ad80e4e t/perf: factor out percent calculations
3:  abf0ddbb9f = 3:  abf0ddbb9f t/perf: add infrastructure for measuring sizes
4:  49981526ad = 4:  49981526ad t/perf: add perf tests for fetches from a 
bitmapped server
5:  89fa0ec8d8 ! 5:  3e1b94d7d6 pack-bitmap: save "have" bitmap from walk
@@ -69,6 +69,8 @@
 +
 +  if (!bitmap_git)
 +  return 0; /* no bitmap loaded */
++  if (!bitmap_git->result)
++  BUG("failed to perform bitmap walk before querying");
 +  if (!bitmap_git->haves)
 +  return 0; /* walk had no "haves" */
 +
6:  f7ca0d59e3 ! 6:  b8b2416aac pack-objects: reuse on-disk deltas for thin 
"have" objects
@@ -12,7 +12,7 @@
 However, this misses some opportunities. Modulo some special
 cases like shallow or partial clones, we know that every
 object reachable from the "haves" could be a preferred base.
-We don't use them all for two reasons:
+We don't use all of them for two reasons:
 
   1. It's expensive to traverse the whole history and
  enumerate all of the objects the other side has.
@@ -100,15 +100,16 @@
 
 The second is that the rest of the code assumes that any
 reused delta will point to another "struct object_entry" as
-its base. But by definition, we don't have such an entry!
+its base. But of course the case we are interested in here
+is the one where don't have such an entry!
 
 I looked at a number of options that didn't quite work:
 
- - we could use a different flag for reused deltas. But it's
-   not a single bit for "I'm being reused". We have to
-   actually store the oid of the base, which is normally
-   done by pointing to the existing object_entry. And we'd
-   have to modify all the code which looks at deltas.
+ - we could use a flag to signal a reused delta, but it's
+   not a single bit. We have to actually store the oid of
+   the base, which is normally done by pointing to the
+   existing object_entry. And we'd have to modify all the
+   code which looks at deltas.
 
  - we could add the reused bases to the end of the existing
object_entry array. While this does create some extra
@@ -173,7 +174,7 @@
  static int depth = 50;
  static int delta_search_threads;
  static int pack_to_stdout;
-+static int thin = 0;
++static int thin;
  static int num_preferred_base;
  static struct progress *progress_state;
  




Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 03:47:36PM -0400, Derrick Stolee wrote:

> On 8/21/2018 3:07 PM, Jeff King wrote:
> > When we do a bitmap walk, we save the result, which
> > represents (WANTs & ~HAVEs); i.e., every object we care
> > about visiting in our walk. However, we throw away the
> > haves bitmap, which can sometimes be useful, too. Save it
> > and provide an access function so code which has performed a
> > walk can query it.
> 
> This makes a lot of sense. Based on the amount of time the "Counting
> Objects" blog post [1] spent talking about delta bases, I would have assumed
> this "haves" bitmap was already part of it. But, I can also see how it could
> be dropped if you are focusing on performance for 'git clone'.

I think that blog post was written after we were already using this
patch. ;)

> > A few notes on the accessor interface:
> > 
> >   - the bitmap code calls these "haves" because it grew out
> > of the want/have negotiation for fetches. But really,
> > these are simply the objects that would be flagged
> > UNINTERESTING in a regular traversal. Let's use that
> > more universal nomenclature for the external module
> > interface. We may want to change the internal naming
> > inside the bitmap code, but that's outside the scope of
> > this patch.
> 
> I saw the uninteresting-vs-haves name confusion in the patch below, but I
> agree with your logic here.
> 
> Sorry that I'm late to the party, but I was interested in the topic.

This has already missed the freeze for v2.19, so I think there is plenty
of time. More review would be welcome.

-Peff


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> When we serve a fetch, we pass the "wants" and "haves" from
>> ...
>> This lets us limit the change primarily to the oe_delta()
>> and oe_set_delta_ext() functions. And as a bonus, most of
>> the rest of the code does not consider these dummy entries
>> at all, saving both runtime CPU and code complexity.
>>
>> Signed-off-by: Jeff King 
>>
>> Signed-off-by: Jeff King 
>> ---
>
> Sorry for commenting on something completely off-topic, but when
> applied with "git am -s", I get a resulting commit with 3 S-o-b (the
> above two, plus the one added by "-s"), with a blank line in between
> them.  I can understand the first blank line (the one between your
> two S-o-b), as the first S-o-b does not even appear to be part of
> the trailer block, but cannot explain why I get an extra one before
> the one added by "-s".  Puzzled...

I think your original "two s-o-b with a blank line in between" was
caused by the same problem, and "git commit --amend -s" perhaps
added an extra one at the end, and added a blank line before the
last "paragraph" while at it?

My suspicion is the long horizontal line at the beginning of the
table, triggers it.  I haven't followed the code closely yet,
though.


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 12:43:49PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > When we serve a fetch, we pass the "wants" and "haves" from
> > ...
> > This lets us limit the change primarily to the oe_delta()
> > and oe_set_delta_ext() functions. And as a bonus, most of
> > the rest of the code does not consider these dummy entries
> > at all, saving both runtime CPU and code complexity.
> >
> > Signed-off-by: Jeff King 
> >
> > Signed-off-by: Jeff King 
> > ---
> 
> Sorry for commenting on something completely off-topic, but when
> applied with "git am -s", I get a resulting commit with 3 S-o-b (the
> above two, plus the one added by "-s"), with a blank line in between
> them.  I can understand the first blank line (the one between your
> two S-o-b), as the first S-o-b does not even appear to be part of
> the trailer block, but cannot explain why I get an extra one before
> the one added by "-s".  Puzzled...

Hmm. This case is most curious.

I don't normally have a S-o-b in my commits themselves, but in this case
I had done quite a bit of editing in my MUA while sending out the
patches, so I picked up the result for my local branch with a "git
reset --hard origin && git am ~/what-i-sent".

Then for sending v2, I did my usual "format-patch -s". For the first 5
patches, it worked fine, but for this one, it created the funny
duplicate, which should have been suppressed. So something about this
commit message seems to confuse our trailer-parsing code. And indeed,
running:

  git show -s --format='%(trailers)'

shows nothing, which explains the other behavior (we don't think there's
a trailer block, so we make a new one). I wonder if it gets confused by
the big block of perf results (which all start with "^%[0-9.]:").

-Peff


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 12:50:09PM -0700, Junio C Hamano wrote:

> > Sorry for commenting on something completely off-topic, but when
> > applied with "git am -s", I get a resulting commit with 3 S-o-b (the
> > above two, plus the one added by "-s"), with a blank line in between
> > them.  I can understand the first blank line (the one between your
> > two S-o-b), as the first S-o-b does not even appear to be part of
> > the trailer block, but cannot explain why I get an extra one before
> > the one added by "-s".  Puzzled...
> 
> I think your original "two s-o-b with a blank line in between" was
> caused by the same problem, and "git commit --amend -s" perhaps
> added an extra one at the end, and added a blank line before the
> last "paragraph" while at it?
> 
> My suspicion is the long horizontal line at the beginning of the
> table, triggers it.  I haven't followed the code closely yet,
> though.

Ah, yeah, I think you're right. We call find_patch_start(), which thinks
the "---" line is the end of the commit message. That makes sense when
parsing trailers out of "format-patch" output, but not when we know we
have just the commit message.

So one obvious fix is a new option for the trailer code to tell it we
have _just_ a commit message. That would still leave this obvious false
positive for the format-patch case, but I'm not sure it can be helped.

-Peff


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 04:07:47PM -0400, Jeff King wrote:

> On Tue, Aug 21, 2018 at 12:50:09PM -0700, Junio C Hamano wrote:
> 
> > > Sorry for commenting on something completely off-topic, but when
> > > applied with "git am -s", I get a resulting commit with 3 S-o-b (the
> > > above two, plus the one added by "-s"), with a blank line in between
> > > them.  I can understand the first blank line (the one between your
> > > two S-o-b), as the first S-o-b does not even appear to be part of
> > > the trailer block, but cannot explain why I get an extra one before
> > > the one added by "-s".  Puzzled...
> > 
> > I think your original "two s-o-b with a blank line in between" was
> > caused by the same problem, and "git commit --amend -s" perhaps
> > added an extra one at the end, and added a blank line before the
> > last "paragraph" while at it?
> > 
> > My suspicion is the long horizontal line at the beginning of the
> > table, triggers it.  I haven't followed the code closely yet,
> > though.
> 
> Ah, yeah, I think you're right. We call find_patch_start(), which thinks
> the "---" line is the end of the commit message. That makes sense when
> parsing trailers out of "format-patch" output, but not when we know we
> have just the commit message.
> 
> So one obvious fix is a new option for the trailer code to tell it we
> have _just_ a commit message. That would still leave this obvious false
> positive for the format-patch case, but I'm not sure it can be helped.

Another is to tighten the check. Something like this seems more
sensible:

diff --git a/trailer.c b/trailer.c
index 4e309460d1..92ec5cae82 100644
--- a/trailer.c
+++ b/trailer.c
@@ -793,7 +793,8 @@ static int find_patch_start(const char *str)
const char *s;
 
for (s = str; *s; s = next_line(s)) {
-   if (starts_with(s, "---"))
+   const char *v;
+   if (skip_prefix(s, "---", &v) && isspace(*v))
return s - str;
}
 

as it would catch "--- /some/file", "--- ", "---\n", and "---\r\n", but
not longer dashed lines. I wondered what "git am" does, though, and I
think it is mailinfo.c:patchbreak(), which has a few other cases to
handle things that git itself would not have generated. I don't know if
that's worth supporting or not.

I think there really are two bugs here, though. The find_patch_start()
check is overly lax, but we also should not have to use it at all when
we know there is no patch.

-Peff


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Eric Sunshine
On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> The script does basically this to set up the temporary tree:
>
>   test -d $tmp || git worktree add $tmp ...
>
> The script never cleans up the worktree (since its results can often be
> reused between runs), but you may do so with "rm" or "git clean". That
> creates an interesting situation if the script is run again before
> "worktree prune" runs.

Aside from the problems you enumerate below, leaving worktrees sitting
around which the user did not create explicitly does seem a bit
unfriendly, which leads me to think that worktrees may not be the best
tool for this task. How about using "git clone --shared" instead?

More below...

> We identify the directory as a "new" worktree,
> and add it to the list. So you may end up with several copies:
>
>   $ git worktree list
>   [...]
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  cc6237c051 
> (detached HEAD)
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
> (detached HEAD)
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
> (detached HEAD)
>
> If I then run "git worktree prune", those duplicates don't go away
> (because the directory is still there; it just corresponds to only the
> final entry). If I delete the tmp-doc-diff directory and then run "git
> worktree prune", they do all go away.
>
>   1. Should the script be doing something else to indicate that the
>  worktree may be reused? I tried "git worktree remove", but it's
>  unhappy that the directory doesn't exist. Should it quietly handle
>  ignore that and remove any leftover cruft in $GIT_DIR/worktrees?

That's a weird case. There are multiple entries in
.git/worktrees/*/gitdir pointing at the same worktree directory, which
I don't think was considered when the machinery was being designed.
"git worktree remove" refusing to delete the worktree in this case
seems a good safety measure since something is obviously askew in the
bookkeeping and it doesn't want to lose potential work.

The solution to this problem might be to upgrade "prune" as you
describe in #3 and then ensure that that sort of aggressive pruning
happens automatically at "git worktree add" time.

>   2. Should "git worktree add" be more clever about realizing that an
>  existing entry in $GIT_DIR/worktrees points to this directory? That
>  would be fine for my use, but I wonder if there's some potential
>  for loss (e.g., you blew away the work tree but until you do a
>  "worktree prune", the refs are still there, objects reachable,
>  etc).

In the case that you've already blown away the directory, then having
"git worktree add" prune away the old worktree bookkeeping would make
sense and wouldn't lose anything (you've already thrown it away
manually). However, it could be lossy for the case when the directory
is only temporarily missing (because it's on removable media or a
network share).

In this case, it might make sense for "git worktree add" to refuse to
operate if an existing worktree entry still points at the directory
that you're trying to add. That should prevent those duplicate
worktree entries you saw.

>   3. Should "git worktree prune" be more clever about dropping
>  duplicates? I think it should be easy to identify them: they are
>  entries in $GIT_DIR/worktrees for which:
>
>- the directory in $entry/gitdir does exist, but
>- $(cat $entry/gitdir)/.git does not point back to $entry

Seems a sensible improvement to the pruning logic.

However, upon further consideration, any of the proposed "fixes" could
potentially be lossy. Consider a case like this:

% git worktree add foo
... make some changes in 'foo' ...
% mv foo bar # (fogetting to do "git worktree move foo bar")
% git worktree add foo

As currently implemented, one can "correct" the situation by manually
fixing the bookkeeping file in .git/worktrees for the worktree created
first. If it gets pruned automatically, then the state of those
changes in "bar" (nee "foo") could be lost.

So, I'm not sure what, if any, fix is appropriate.

Such uncertainty also further argues in favor of "git clone --shared"
for your particular use-case, I think.


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Derrick Stolee

On 8/20/2018 6:13 PM, Junio C Hamano wrote:

An early preview release Git v2.19.0-rc0 is now available for
testing at the usual places.


As part of testing the release candidate, I ran the performance suite 
against a fresh clone of the Linux repository using v2.18.0 and 
v2.19.0-rc0 (also: GIT_PERF_REPEAT_COUNT=10). I found a few nice 
improvements, but I also found a possible regression in tree walking. I 
say "tree walking" because it was revealed using p0001-rev-list.sh, but 
only with the "--objects" flag. I also saw some similar numbers on 'git 
log --raw'.


Test v2.18.0 v2.19.0-rc0

0001.1: rev-list --all 6.69(6.33+0.35) 6.52(6.20+0.31) -2.5%
0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%

To me, 9.6% seems out of the range of just noise for this length of a 
command, but I could be wrong. Could anyone else try to repro these results?


(This may also not just be tree-walking, but general pack-file loading 
and decompression, since I computed and stored a commit-graph file. 
Hence, commits are not being parsed from the pack-file by either command.)


Aside: the perf results were not all bad. Here was an interesting 
improvement:


Test v2.18.0 v2.19.0-rc0

0002.1: read_cache/discard_cache 1000 times 5.63(5.30+0.32)   
3.34(3.03+0.30) -40.7%


Thanks,

-Stolee



Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 04:22:08PM -0400, Eric Sunshine wrote:

> On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> > The script does basically this to set up the temporary tree:
> >
> >   test -d $tmp || git worktree add $tmp ...
> >
> > The script never cleans up the worktree (since its results can often be
> > reused between runs), but you may do so with "rm" or "git clean". That
> > creates an interesting situation if the script is run again before
> > "worktree prune" runs.
> 
> Aside from the problems you enumerate below, leaving worktrees sitting
> around which the user did not create explicitly does seem a bit
> unfriendly, which leads me to think that worktrees may not be the best
> tool for this task. How about using "git clone --shared" instead?

That seems even more dangerous to me, since the created clone can become
corrupt when the parent prunes. Probably not huge for a single
operation, but you may be surprised when you run the script a few days
later and it barfs horribly due to a missing object.

We could do a local filesystem clone which would make a hardlink. But
eventually those hardlinks would be broken, and we'd be wasting a lot of
extra space (and there's no provision for repacking or maintenance of
the temp repo; again, OK for a day or two, but if maybe not if you leave
this thing lying around for months).

I also considered just using "git archive | tar xf -" to create the
checkouts.  But we really would like to move between trees without
updating files unnecessarily (to avoid triggering rebuilds via make).

We could do that without a full-on worktree by just keeping our own
temporary index. But I think that potentially ends up with similar
problems to the "clone --shared" one: at some point objects in the
parent go away, and it has no idea about our custom index.

I think a worktree with a detached HEAD is roughly the same concept, but
with an officially-approved mechanism by which the parent knows about
our worktree.

> >   1. Should the script be doing something else to indicate that the
> >  worktree may be reused? I tried "git worktree remove", but it's
> >  unhappy that the directory doesn't exist. Should it quietly handle
> >  ignore that and remove any leftover cruft in $GIT_DIR/worktrees?
> 
> That's a weird case. There are multiple entries in
> .git/worktrees/*/gitdir pointing at the same worktree directory, which
> I don't think was considered when the machinery was being designed.
> "git worktree remove" refusing to delete the worktree in this case
> seems a good safety measure since something is obviously askew in the
> bookkeeping and it doesn't want to lose potential work.
> 
> The solution to this problem might be to upgrade "prune" as you
> describe in #3 and then ensure that that sort of aggressive pruning
> happens automatically at "git worktree add" time.

I would feel funny about running "git worktree prune" in my script,
since it may also delete _other_ worktrees. But if "worktree prune"
handled this case, I'd be OK just ignoring occasional duplicates, and
periodic "git gc" would clean up the cruft.

> >   2. Should "git worktree add" be more clever about realizing that an
> >  existing entry in $GIT_DIR/worktrees points to this directory? That
> >  would be fine for my use, but I wonder if there's some potential
> >  for loss (e.g., you blew away the work tree but until you do a
> >  "worktree prune", the refs are still there, objects reachable,
> >  etc).
> 
> In the case that you've already blown away the directory, then having
> "git worktree add" prune away the old worktree bookkeeping would make
> sense and wouldn't lose anything (you've already thrown it away
> manually). However, it could be lossy for the case when the directory
> is only temporarily missing (because it's on removable media or a
> network share).

I think the removable ones already suffer from that problem (an auto-gc
can prune them). And they should already be marked with "git worktree
lock". That said, people don't always do what they should, and I'd
rather not make the problem worse. :)

> In this case, it might make sense for "git worktree add" to refuse to
> operate if an existing worktree entry still points at the directory
> that you're trying to add. That should prevent those duplicate
> worktree entries you saw.

Yes, but then what's the next step for my script? I can't "remove" since
the worktree isn't there. I can't blow away any directory that I know
about, since there isn't one. I need to somehow know that an existing
"$GIT_DIR/worktrees/foo" is the problem. But "foo" is not even
deterministic. Looking at the duplicates, it seems to be the basename of
the working tree, but then mutated to avoid collisions with other
worktrees.

What about refusing by default, but forcing an overwrite with "-f"?

That should keep the existing case as safe as now, but give an outlet
for this kind of case where the caller is in charge of the worktree and
know

Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

> Another is to tighten the check. Something like this seems more
> sensible:

Great minds think alike; is it a space was exactly what I was toying
with before I went to lunch.  FWIW I saw the full test suite passes
when I came back and then saw you had an identical patch ;-)

> I think there really are two bugs here, though. The find_patch_start()
> check is overly lax, but we also should not have to use it at all when
> we know there is no patch.

Yes, I was grepping for callchains, and it appeared none of them
actually expected us to feed "log plus --- plus patch" format.  The
obvious candidate to take it is "am" but we ask mailinfo to do the
splitting before the log message part even hits the rest of the
system.  So my inclination right now is to see if that is truly the
case and get rid of that bogus "patch start" thing, and if not, add
a flag to let the caller say "I know we only have message" to the
callchain.


[PATCH] commit: use timestamp_t for author_date_slab

2018-08-21 Thread Derrick Stolee
The author_date_slab is used to store the author date of a commit
when walking with the --author-date flag in rev-list or log. This
was added as an 'unsigned long' in

81c6b38b "log: --author-date-order"

Since 'unsigned long' is ambiguous in its bit-ness across platforms
(64-bit in Linux, 32-bit in Windows, for example), most references
to the author dates in commit.c were converted to timestamp_t in

dddbad72 "timestamp_t: a new data type for timestamps"

However, the slab definition was missed, leading to a mismatch in
the data types in Windows. This would not reveal itself as a bug
unless someone authors a commit after February 2106, but commits
can store anything as their author date.

Signed-off-by: Derrick Stolee 
---

I found this while reading up on the revision-walk machinery. This code
hasn't been touched in years, so could apply to 'maint'.

 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 0030e79940..2ce5052b3e 100644
--- a/commit.c
+++ b/commit.c
@@ -609,7 +609,7 @@ struct commit *pop_commit(struct commit_list **stack)
 define_commit_slab(indegree_slab, int);
 
 /* record author-date for each commit object */
-define_commit_slab(author_date_slab, unsigned long);
+define_commit_slab(author_date_slab, timestamp_t);
 
 static void record_author_date(struct author_date_slab *author_date,
   struct commit *commit)
-- 
2.19.0.rc0



Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

> Ah, yeah, I think you're right. We call find_patch_start(), which thinks
> the "---" line is the end of the commit message. That makes sense when
> parsing trailers out of "format-patch" output, but not when we know we
> have just the commit message.

Yes, but that does not explain what we are seeing.  If the code
mistakenly thinks that the log message ends before that table, then
it should have inserted the S-o-b: _before_ that table, but that is
not happening.

So there are three issues; (1) find-patch-start uses too weak a
logic to find the beginning of a patch section (2) even if it found
the right place, its caller does not tell "commit --amend -s" where
the log message ends correctly and (3) some callchains that get
there know they only have a log message but there is no way to take
advantage of that information and skip the call to find-patch-start.


Re: [PATCH] commit: use timestamp_t for author_date_slab

2018-08-21 Thread Junio C Hamano
Derrick Stolee  writes:

> The author_date_slab is used to store the author date of a commit
> when walking with the --author-date flag in rev-list or log. This
> was added as an 'unsigned long' in
>
>   81c6b38b "log: --author-date-order"
>
> Since 'unsigned long' is ambiguous in its bit-ness across platforms
> (64-bit in Linux, 32-bit in Windows, for example), most references
> to the author dates in commit.c were converted to timestamp_t in
>
>   dddbad72 "timestamp_t: a new data type for timestamps"
>
> However, the slab definition was missed,

Makes sense, and obviously correct.

> I found this while reading up on the revision-walk machinery. This code
> hasn't been touched in years, so could apply to 'maint'.

Thanks.


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 04:41:02PM -0400, Derrick Stolee wrote:

> On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> > An early preview release Git v2.19.0-rc0 is now available for
> > testing at the usual places.
> 
> As part of testing the release candidate, I ran the performance suite
> against a fresh clone of the Linux repository using v2.18.0 and v2.19.0-rc0
> (also: GIT_PERF_REPEAT_COUNT=10).

Wow, you're a glutton for punishment. :)

> I found a few nice improvements, but I
> also found a possible regression in tree walking. I say "tree walking"
> because it was revealed using p0001-rev-list.sh, but only with the
> "--objects" flag. I also saw some similar numbers on 'git log --raw'.
> 
> Test v2.18.0 v2.19.0-rc0
> 
> 0001.1: rev-list --all 6.69(6.33+0.35) 6.52(6.20+0.31) -2.5%
> 0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%
> 
> To me, 9.6% seems out of the range of just noise for this length of a
> command, but I could be wrong. Could anyone else try to repro these results?

I got:

0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%

Less change, but my overall times were smaller, too, so clearly our
hardware or exact repos are a little bit different. Those numbers seem
pretty consistent in further runs.

It bisects to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
spends a huge amount of time checking each tree entry to see if we've
processed that object yet, which ends up as hashcmp() in the hash table.
I expect that a fixed 20-byte memcmp() can be optimized a lot more than
one with an arbitrary value.

Even if _we_ know the value can only take on one of a few values, I
don't know that we have an easy way to tell the compiler that. Possibly
we could improve things by jumping directly to an optimized code path.
Sort of a poor-man's JIT. ;)

Doing this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   else
+   return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
be imaging it, as there's a bit of noise). A function pointer in
the_hash_algo might make even more sense.

-Peff


Re: [PATCH v2 2/2] commit-graph.txt: improve formatting for asciidoc

2018-08-21 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> From: Derrick Stolee 
>
> When viewing commit-graph.txt as a plain-text document, it makes
> sense to keep paragraphs left-padded between bullet points.
> However, asciidoc converts these left-padded paragraphs as monospace
> fonts, creating an unpleasant document. Remove the padding.

That's completely backwards.

These indented two paragraphs that follow "... the following
property:" do not fall into the same classes of paragraphs as
others.  The way the text version makes them stand out by indenting
clearly show these two are what "... the following ..." refers to.

Perhaps these two would want to become a bulleted list or
something.  

>
> The "Future Work" section includes a bulleted list of items, and one
> item has sub-items. These do not render properly in asciidoc, so
> remove the sub-list and incorporate them into the paragraph.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/commit-graph.txt | 43 +++-

I said I didn't check if commit-graph-format.txt doc format
correctly, but that does not mean you do not have to ;-).  I suspect
that most of the contents would become monospaced wall of text,
which is no better than the original text and because only the
headings are typeset in different font, the result actually would be
harder to read than the original.

We are not in a hurry to do this during the pre-release period, are
we?  My understanding is that the original text documentation files
will be shipped and installed, and they are adequately readable
(correct me if it is not the case).

Unless we are making the result a lot more readable as the original
text, let's not distract ourselves by rerolling this in too quick
cycles without giving us sufficiently big improvements.

Thanks.


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 01:52:33PM -0700, Junio C Hamano wrote:

> > I think there really are two bugs here, though. The find_patch_start()
> > check is overly lax, but we also should not have to use it at all when
> > we know there is no patch.
> 
> Yes, I was grepping for callchains, and it appeared none of them
> actually expected us to feed "log plus --- plus patch" format.  The
> obvious candidate to take it is "am" but we ask mailinfo to do the
> splitting before the log message part even hits the rest of the
> system.  So my inclination right now is to see if that is truly the
> case and get rid of that bogus "patch start" thing, and if not, add
> a flag to let the caller say "I know we only have message" to the
> callchain.

I suspect this same logic is used by git-interpret-trailers, which is
taking an arbitrary message on stdin. That is probably the lone caller
that needs to retain this magic.

-Peff


Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 01:57:07PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Ah, yeah, I think you're right. We call find_patch_start(), which thinks
> > the "---" line is the end of the commit message. That makes sense when
> > parsing trailers out of "format-patch" output, but not when we know we
> > have just the commit message.
> 
> Yes, but that does not explain what we are seeing.  If the code
> mistakenly thinks that the log message ends before that table, then
> it should have inserted the S-o-b: _before_ that table, but that is
> not happening.
> 
> So there are three issues; (1) find-patch-start uses too weak a
> logic to find the beginning of a patch section (2) even if it found
> the right place, its caller does not tell "commit --amend -s" where
> the log message ends correctly and (3) some callchains that get
> there know they only have a log message but there is no way to take
> advantage of that information and skip the call to find-patch-start.

Yes, I'd agree with all of that.

I'm going offline, so if you are part-way through a patch, please do not
worry that we might duplicate effort. :)

-Peff


Re: [PATCH 3/7] submodule: is_submodule_active to differentiate between new and old mode

2018-08-21 Thread Junio C Hamano
Stefan Beller  writes:

>> So I think this patch is insufficient, and needs to at least change
>> the "submodule.active" codepath to return !!ret; otherwise, a caller
>> that now expects 0 (not active), 1 (active but can lose URL) and 2
>> (active and the presence of URL makes it so) will be confused when
>> one of the MATCHED_* constants from dir.h comes back.
>
> Yes.
>
> I'll resend when appropriately.

Thanks.


Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0

2018-08-21 Thread Junio C Hamano
Stefan Beller  writes:

>> ...
>> OTOH, if it were up to me I would have just gotten rid of
>> test_must_be_empty and used an existing function with the right
>> argument, like `test_cmp /dev/null` - but using some form consistently
>> is the most important, whatever it is.
>
> /dev/null, eh? It shows you don't use Windows on a day to day basis. ;-)
> But yeah consistency is really good to have. :)

Just to make sure we don't give wrong impression to bystanders, do
you mean that we should discourage using /dev/null in our tests or
scripts due to portability concerns?

I thought they had good enough emulation that writing /dev/null on
the command line in scripts do what we expect the shell to do; the
same thing can be said for calling open(2) on "/dev/null".

Back to the topic from the tangent, but there was a discussion on
choosing between "test_must_be_empty actual" vs "test_cmp empty
actual", and was even a proposal to trigger an error when an empty
file is given to test_cmp.  You two might want to join the party
there, perhaps?



Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-21 Thread Junio C Hamano
Jeff King  writes:

>> > If we assume that "expect" is first (which is our convention but not
>> > necessarily guaranteed), then I think the best logic is something like:
>> > 
>> >   if $1 is empty; then
>> > bug in the test script
>> >   elif test_cmp_allow_empty "$@"
>> > test failure
>> > 
>> > We do not need to check $2 at all. An empty one is either irrelevant (if
>> > the expectation is empty), or a test failure (because it would not match
>> > the non-empty $1).
>> 
>> ... this is indeed a better solution. I written out the cases for
>> updated test_cmp to straighten out my thinking:
>
> I'd be OK pursuing either this line, or what you showed originally.

As I do find [1] to be a real concern, I'd prefer not to flag empty
input to test_cmp as special.  But if we _were_ to do something, I
agree that "$2 can be anything---that is the output from the
potentially buggy program we are testing" is the right attitude to
take.

[Footnote]

*1*
https://public-inbox.org/git/cam0vkjkt7fbjrie_3f4b13bht9hp9mxrhux5r1sogh2x7kq...@mail.gmail.com/


[GitHub] An email address was added to your account.

2018-08-21 Thread GitHub
Hey amc2399!

An email address (aam...@icloud.com) was added to your account. Visit 
https://github.com/settings/emails to review email addresses currently 
associated with your account.

To see this and other security events for your account, visit 
https://github.com/settings/security

If you run into problems, please contact support by visiting 
https://github.com/contact

Thanks,
Your friends at GitHub


[PATCH] t7501-commit: drop silly command substitution

2018-08-21 Thread SZEDER Gábor
The test '--dry-run with conflicts fixed from a merge' in
't7501-commit.sh', added in 8dc874b2ee (wt-status.c: set commitable
bit if there is a meaningful merge., 2016-02-15), runs the following
unnecessary and downright bogus command substitution:

  ! $(git merge --no-commit commit-1) &&

I.e. after 'git merge ...' is executed and expectedly fails, the test
attempts to execute its output:

  Merging:
  80f2ea2 commit 2
  virtual commit-1
  found 1 common ancestor:
  e60d113 Initial commit
  Auto-merging test-file
  CONFLICT (content): Merge conflict in test-file
  Automatic merge failed; fix conflicts and then commit the result.

as a command, which most likely fails, because there is no such
command as "Merging:".  Then '!' negates the failed exit status, the
test continues, and eventually succeeds.

Remove this command substitution and use 'test_must_fail' to ensure
that 'git merge' fails.

Signed-off-by: SZEDER Gábor 
---
 t/t7501-commit.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 51646d8019..d766bf34c4 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -677,7 +677,7 @@ test_expect_success '--dry-run with conflicts fixed from a 
merge' '
git checkout -b branch-2 HEAD^1 &&
echo "commit-2-state" >test-file &&
git commit -m "commit 2" -i test-file &&
-   ! $(git merge --no-commit commit-1) &&
+   test_must_fail git merge --no-commit commit-1 &&
echo "commit-2-state" >test-file &&
git add test-file &&
git commit --dry-run &&
-- 
2.19.0.rc0.136.gd2dd172e64



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread brian m. carlson
On Tue, Aug 21, 2018 at 05:29:24PM -0400, Jeff King wrote:
> 0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%
> 
> Less change, but my overall times were smaller, too, so clearly our
> hardware or exact repos are a little bit different. Those numbers seem
> pretty consistent in further runs.
> 
> It bisects to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
> spends a huge amount of time checking each tree entry to see if we've
> processed that object yet, which ends up as hashcmp() in the hash table.
> I expect that a fixed 20-byte memcmp() can be optimized a lot more than
> one with an arbitrary value.
> 
> Even if _we_ know the value can only take on one of a few values, I
> don't know that we have an easy way to tell the compiler that. Possibly
> we could improve things by jumping directly to an optimized code path.
> Sort of a poor-man's JIT. ;)
> 
> Doing this:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..9c004a26c9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + if (the_hash_algo->rawsz == 20)
> + return memcmp(sha1, sha2, 20);
> + else
> + return memcmp(sha1, sha1, the_hash_algo->rawsz);
>  }
>  
>  static inline int oidcmp(const struct object_id *oid1, const struct 
> object_id *oid2)
> on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> be imaging it, as there's a bit of noise). A function pointer in
> the_hash_algo might make even more sense.

It's possible that might be a better solution.  I looked into a GCC
assertion that the value was either 20 or 32, and that in itself didn't
seem to help, at least in the generated code.  Your solution is likely
better in that regard.

We could wire it up to be either 20 or 32 and let people experimenting
with other sizes of algorithms just add another branch.  I haven't
tested how that performs, though.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 2/2] commit-graph.txt: improve formatting for asciidoc

2018-08-21 Thread Derrick Stolee

On 8/21/2018 5:29 PM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:


From: Derrick Stolee 

When viewing commit-graph.txt as a plain-text document, it makes
sense to keep paragraphs left-padded between bullet points.
However, asciidoc converts these left-padded paragraphs as monospace
fonts, creating an unpleasant document. Remove the padding.

That's completely backwards.

These indented two paragraphs that follow "... the following
property:" do not fall into the same classes of paragraphs as
others.  The way the text version makes them stand out by indenting
clearly show these two are what "... the following ..." refers to.

Perhaps these two would want to become a bulleted list or
something.


The "Future Work" section includes a bulleted list of items, and one
item has sub-items. These do not render properly in asciidoc, so
remove the sub-list and incorporate them into the paragraph.

Signed-off-by: Derrick Stolee 
---
  Documentation/technical/commit-graph.txt | 43 +++-

I said I didn't check if commit-graph-format.txt doc format
correctly, but that does not mean you do not have to ;-).  I suspect
that most of the contents would become monospaced wall of text,
which is no better than the original text and because only the
headings are typeset in different font, the result actually would be
harder to read than the original.


I compared the results of commit-graph-format.txt to pack-format.txt. 
Since I had based the format of commit-graph-format.txt on that file, 
the output was remarkably similar (and hence I couldn't find anything to 
change).



We are not in a hurry to do this during the pre-release period, are
we?  My understanding is that the original text documentation files
will be shipped and installed, and they are adequately readable
(correct me if it is not the case).

Unless we are making the result a lot more readable as the original
text, let's not distract ourselves by rerolling this in too quick
cycles without giving us sufficiently big improvements.
No rush! I'll wait until the release calms down for a v3, and read the 
helpful docs on asciidoc format that Eric shared earlier.


Thanks,
-Stolee


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Wed, Aug 22, 2018 at 12:48:16AM +, brian m. carlson wrote:

> > diff --git a/cache.h b/cache.h
> > index b1fd3d58ab..9c004a26c9 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
> >  
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +   if (the_hash_algo->rawsz == 20)
> > +   return memcmp(sha1, sha2, 20);
> > +   else
> > +   return memcmp(sha1, sha1, the_hash_algo->rawsz);
> >  }
> >  
> >  static inline int oidcmp(const struct object_id *oid1, const struct 
> > object_id *oid2)
> > on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> > be imaging it, as there's a bit of noise). A function pointer in
> > the_hash_algo might make even more sense.
> 
> It's possible that might be a better solution.  I looked into a GCC
> assertion that the value was either 20 or 32, and that in itself didn't
> seem to help, at least in the generated code.  Your solution is likely
> better in that regard.
> 
> We could wire it up to be either 20 or 32 and let people experimenting
> with other sizes of algorithms just add another branch.  I haven't
> tested how that performs, though.

Here's a _really_ dirty one:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..a6750524ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+   assert(the_hash_algo->rawsz == 20);
return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 

We probably don't want to do that, because it makes experimenting with
new hash algos a bit painful, but it gives the same 3-4% speedup pretty
consistently. But I think it demonstrates pretty clearly that giving the
compiler the extra limit information is sufficient. Presumably the
fixed-size memcmp turns into a few multi-word compares.

And indeed, if I look at the generated asm for the call in lookup_object
(which is likely the one we're hitting a lot in this case), I see:

  # cache.h:1027: return memcmp(sha1, sha2, the_hash_algo->rawsz);
  .loc 4 1027 9 is_stmt 0 view .LVU86
  movq(%rsi), %rcx# MEM[(void *)sha1_25(D)], MEM[(void 
*)sha1_25(D)]
  movq8(%rsi), %rdi   # MEM[(void *)sha1_25(D)], tmp125
  xorq4(%rax), %rcx   # MEM[(void *)_6], tmp116
  xorq8(%r8), %rdi# MEM[(void *)_6], tmp115
  orq %rcx, %rdi  # tmp116, tmp115
  jne .L27#,
  movl16(%r8), %ecx   # MEM[(void *)_6], tmp129
  cmpl%ecx, 16(%rsi)  # tmp129, MEM[(void *)sha1_25(D)]
  jne .L27#,

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

Having a dedicate hashcmp function for each hash_algo seems like the
sanest approach. We pay for one indirect function call, but the function
itself will have the constants available. But it does introduce one
extra complication. We're benefiting here from knowing that the size is
always 20, but also the inline hashcmp knows that we only care about
equality, not comparison.

So if I do this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..da56da7be2 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   return the_hash_algo->cmp_fn(sha1, sha2);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
diff --git a/hash.h b/hash.h
index 7c8238bc2e..ac22ba63b6 100644
--- a/hash.h
+++ b/hash.h
@@ -64,6 +64,7 @@ typedef union git_hash_ctx git_hash_ctx;
 typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t 
len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
+typedef int (*git_hash_cmp_fn)(const void *a, const void *b);
 
 struct git_hash_algo {
/*
@@ -90,6 +91,8 @@ struct git_hash_algo {
/* The hash finalization function. */
git_hash_final_fn final_fn;
 
+   git_hash_cmp_fn cmp_fn;
+
/* The OID of the empty tree. */
const struct object_id *empty_tree;
 
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..7072e360d7 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -69,6 +69,11 @@ static void git_hash_sha1_final(unsigned char *hash, 
git_hash_ctx *ctx)
git_SHA1_Final(hash, &ctx->sha1);
 }
 
+static int git_hash_sha1_cmp(const void *a, const void *b)
+{
+   return memcmp(a, b, 20);
+}
+
 static void git_hash

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> with the obvious "oideq()" implementation added, that seems to get me to
> 2-3%. Not _quite_ as good as the original branching version I showed.
> And we had to touch all the callsites (although arguably that kind of
> "eq" function is a better interface anyway, since it obviously allows
> for more optimization.
> 
> So maybe the branching thing is actually not so insane. It makes new
> hash_algo's Just Work; they just won't be optimized. And the change is
> very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it!

It turns out what I showed (and measured) before has a bug. Can you see
it?

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   else
+   return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)


The problem is the fallback code compares "sha1" to "sha1". The compiler
realizes that's a noop and is able to treat it like a constant. Thus
essentially leaving only the first branch, which it then expands into a
few instructions.

If we fix that bug, then we really do memcmp on either side of the
conditional. And the compiler is smart enough to realize that hey,
that's the same as just calling memcmp with the_hash_algo->rawsz on
either side. And we end up with roughly the same code that we started
with.

So the assert() version really is the fastest. I didn't test, but I
suspect we could "trick" the compiler by having the fallback call an
opaque wrapper around memcmp(). That would prevent it from combining the
two paths, and presumably it would still optimize the constant-20 side.
Or maybe it would eventually decide our inline function is getting too
big and scrap it. Which probably crosses a line of craziness (if I
didn't already cross it two emails ago).

-Peff


Kindly Confirm Receive

2018-08-21 Thread Richard Jeffery Esq




Good day,

My name is Richard Jeffery I have a client, who died as a result of
heart-related condition on the 10th of December, 2015. I have contacted
you to assist in distributing the estate left behind by my client, who
shares the same last name as yours. Would discuss more when I hear from
you. Contact me on this on this private email
jefferyrich...@mail2litigator.com to indicate your interest.

Thnaks and regards,
Richard Jeffery
Email: jefferyrich...@mail2litigator.com


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread brian m. carlson
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> So I wonder if there's some other way to tell the compiler that we'll
> only have a few values. An enum comes to mind, though I don't think the
> enum rules are strict enough to make this guarantee (after all, it's OK
> to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently.  I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or now, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   case 32:
+   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

I won't have time to sit down and test this out until tomorrow afternoon
at the earliest.  If you want to send in something in the mean time,
even if that limits things to just 20 for now, that's fine.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] wt-status.c: set commitable bit if there is a meaningful merge.

2018-08-21 Thread Stephen Smith
On Tuesday, February 16, 2016 8:33:54 PM MST Junio C Hamano wrote:
> In fact, "commit --dry-run" is already broken without this "a merge
> ends up in a no-op" corner case.  The management of s->commitable
> flag and dry_run_commit() that uses it are unfortunately more broken
> than I originally thought.
> 

> This function is only called from wt_status_print(), which in turn
> is only called from run_status() in commit.c when the status format
> is unspecified or set to STATUS_FORMAT_LONG.
> 
> So if you do this:
> 
> $ git reset --hard HEAD
> $ >a-new-file && git add a-new-file
> $ git commit --dry-run --short; echo $?
> 
> you'd get "No, there is nothing interesting to commit", which is
> clearly bogus.

I was about to start working on working on this and ran the test you suggested 
back in 2016.   I don't get the error message from that time period.  I 
believe that this was fixed.   





Re: Contribution for an Open Source Git

2018-08-21 Thread Gaux Nation
https://drive.google.com/open?id=0B_6x2GR6_FTMc2RERHlJZVpQc0RIMFMxNi13bE95UU1fZmp3
On Tue, Aug 21, 2018 at 10:54 PM Gaux Nation  wrote:
>
> Hey there,
>
> The name is Sachin Sharma(19 yrs old, student, India) and passionate
> for innovative technology and digital content.
>
>
>
> Why Open Source?
>
> Open Source connects like-minded genes in the whole world. This lets
> contributors to get engaged with the groups of different communities.
>
>
>
> Why Git?
>
> I just love the platform your organization had built,
>
> 1.Branching and merging.
>
> 2.Small and fast.
>
> 3. Distributed.
>
> 4.Free and Open Source.*
>
>
>
> Why I?
>
> I’m a new bee, therefore I might not have that long list of projects
> but I have that level of commitment, attitude and ability to follow
> directions which are expected from a student.
>
>
>
> What do I need from Git?
>
> Guidance, neither too deep nor too short(optimum).
>
> How can I get started for a contribution?
>
> What are the necessary technical skills needed for a contribution to Git?
>
>
>
>
>
>
>
> GitHub:  satingaux@github
>
> Contact: 8572020330,9138258580.
>
>
>
> Quote: I might not get selected, but an interaction with you will make my day.
>
>I assure you, sir I will work hard to give my 100%.
>
>I feel blessed if I got chance for contributing to an open
> source community.
>
>
>
>
>
> Technical Skills: JAVA, JavaScript, SQL, C, C++.
>
>
>
> Thank you


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:

> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible values).
> 
> I was thinking about this:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + switch (the_hash_algo->rawsz) {
> + case 20:
> + return memcmp(sha1, sha2, 20);
> + case 32:
> + return memcmp(sha1, sha2, 32);
> + default:
> + assert(0);
> + }
>  }

Unfortunately this version doesn't seem to be any faster than the status
quo. And looking at the generated asm, it still looks to be calling
memcpy(). Removing the "case 32" branch switches it back to fast
assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
guessing what the optimizer is going to do, and there's a good chance
that other versions are going to optimize it differently.

We might be better off just writing it out manually. Unfortunately, it's
a bit hard because the neg/0/pos return is more expensive to compute
than pure equality. And only the compiler knows at each inlined site
whether we actually want equality. So now we're back to switching every
caller to use hasheq() if that's what they want.

But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
portability, then this seems as fast as the original (memcmp+constant)
code on my machine:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..c406105f3c 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
+   *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 
*)(sha2+4))
+   return 0;
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

Which is really no surprise, because the generated asm looks about the
same. There are obviously alignment questions there. It's possible it
could even be written portably as a simple loop. Or maybe not. We used
to do that, but modern compilers were able to optimize the memcmp
better. Maybe that's changed. Or maybe they were simply unwilling to
unroll a 20-length loop to find out that it could be turned into a few
quad-word compares.

> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently.  I'd need to
> send in more patches before it would compile.

Yeah, I see that warning all over the place (everywhere that calls
is_null_oid(), which is passing in a 20-byte buffer).

> I don't know if something like this is an improvement or now, but this
> seems to at least compile:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + switch (the_hash_algo->rawsz) {
> + case 20:
> + case 32:
> + return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + default:
> + assert(0);
> + }

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.

> I won't have time to sit down and test this out until tomorrow afternoon
> at the earliest.  If you want to send in something in the mean time,
> even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hash