date:20170915

On Fri, Sep 15, 2017 at 5:53 PM, Jonathan Nieder  wrote:
> Jason Merrill wrote:
>> On Fri, Sep 15, 2017 at 1:52 PM, Jonathan Nieder  wrote:
>> > Jason Merrill wrote:
>
 Subject: Fix merge parent checking with svn.pushmergeinfo.

 Without this fix, svn dcommit of a merge with svn.pushmergeinfo set would
 get error messages like "merge parent  for  is on branch
 svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
 svn+ssh://ja...@gcc.gnu.org/svn/gcc!"

 * git-svn.perl: Remove username from rooturl before comparing to branchurl.

 Signed-off-by: Jason Merrill 
>>>
>>> Interesting.  Thanks for writing it.
>>
>> Thanks for the review.
>>
>>> Could there be a test for this to make sure this doesn't regress in
>>> the future?  See t/t9151-svn-mergeinfo.sh for some examples.
>>
>> Hmm, I'm afraid figuring out how to write such a test would take
>> longer than I can really spare for this issue.  There don't seem to be
>> any svn+ssh tests currently.
>
> Well, could you give manual commands to allow me to reproduce the
> problem?
>
> Then I'll translate them into a test. :)

Something like this:

git svn clone -s svn+ssh://user@host/repo
git config svn.pushmergeinfo yes
git checkout -b branch origin/branch
git merge origin/trunk
git svn dcommit

Thanks!

> FWIW remove_username seems to be able to cope fine with an http://
> URL.  t/lib-httpd.sh starts an http server with Subversion enabled,
> as long as the envvar GIT_SVN_TEST_HTTPD is set to true.  Its address
> is $svnrepo, which is an http URL (but I don't see a username in the
> URL).  Does that help?

I think the http transport handles the username separately, not in the URL.

I would expect that a dummy ssh wrapper like some of the tests use
would be sufficient, no need for an actual network connection.

> Alternatively, does using rewrite-root as in t9151-svn-mergeinfo.sh
> help?

Hmm, I'm not sure how rewriteRoot would interact with this issue,
whether it would be useful as a workaround or another problematic
case.

Jason

RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

> -Original Message-
> +dirty_repo () {
> + : >untracked &&
> + : >dir1/untracked &&
> + : >dir2/untracked &&
> + echo 1 >modified &&
> + echo 2 >dir1/modified &&
> + echo 3 >dir2/modified &&
> + echo 4 >new &&
> + echo 5 >dir1/new &&
> + echo 6 >dir2/new

If I add an untracked file named dir3/untracked to dirty_repo
 (and write_integration_script), then "status doesn't detect 
unreported modifications", below, fails.  Did I do something 
wrong, or does this turn up a bug?

> + test_expect_success "setup preloadIndex to $preload_val" '
> + git config core.preloadIndex $preload_val &&
> + if [ $preload_val -eq true ]

"-eq" is for numeric equality in POSIX shell.  So this works if your 
/bin/sh is bash but not if it's e.g. dash.  This happens twice more 
below.  Use "=" instead.

RE: [PATCH v6 12/12] fsmonitor: add a performance test


> -Original Message-
> + # Choose integration script based on existance of Watchman.

Spelling: existence

Re: [PATCH v2] git-svn: Fix svn.pushmergeinfo handling of svn+ssh usernames.

Jason Merrill wrote:

> Previously, svn dcommit of a merge with svn.pushmergeinfo set would
> get error messages like "merge parent  for  is on branch
> svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
> svn+ssh://ja...@gcc.gnu.org/svn/gcc!"
>
> So, let's call remove_username (as we do for svn info) before comparing
> rooturl to branchurl.
>
> Signed-off-by: Jason Merrill 
> Reviewed-by: Jonathan Nieder 
> ---
>  git-svn.perl | 1 +
>  1 file changed, 1 insertion(+)

This is indeed
Reviewed-by: Jonathan Nieder 

though it does need a test --- I have no confidence that this fix will
be preserved without one.  Anyway, that can happen in a separate
patch.

Thanks for your work,
Jonathan

Re: [PATCH] Fix merge parent checking with svn.pushmergeinfo.

Jason Merrill wrote:
> On Fri, Sep 15, 2017 at 1:52 PM, Jonathan Nieder  wrote:
> > Jason Merrill wrote:

>>> Subject: Fix merge parent checking with svn.pushmergeinfo.
>>>
>>> Without this fix, svn dcommit of a merge with svn.pushmergeinfo set would
>>> get error messages like "merge parent  for  is on branch
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
>>> svn+ssh://ja...@gcc.gnu.org/svn/gcc!"
>>>
>>> * git-svn.perl: Remove username from rooturl before comparing to branchurl.
>>>
>>> Signed-off-by: Jason Merrill 
>>
>> Interesting.  Thanks for writing it.
>
> Thanks for the review.
>
>> Could there be a test for this to make sure this doesn't regress in
>> the future?  See t/t9151-svn-mergeinfo.sh for some examples.
>
> Hmm, I'm afraid figuring out how to write such a test would take
> longer than I can really spare for this issue.  There don't seem to be
> any svn+ssh tests currently.

Well, could you give manual commands to allow me to reproduce the
problem?

Then I'll translate them into a test. :)

FWIW remove_username seems to be able to cope fine with an http://
URL.  t/lib-httpd.sh starts an http server with Subversion enabled,
as long as the envvar GIT_SVN_TEST_HTTPD is set to true.  Its address
is $svnrepo, which is an http URL (but I don't see a username in the
URL).  Does that help?

Alternatively, does using rewrite-root as in t9151-svn-mergeinfo.sh
help?

[...]
> How about this?
>
> git-svn: Fix svn.pushmergeinfo handling of svn+ssh usernames.
>
> Previously, svn dcommit of a merge with svn.pushmergeinfo set would
> get error messages like "merge parent  for  is on branch
> svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
> svn+ssh://ja...@gcc.gnu.org/svn/gcc!"
>
> So, let's call remove_username (as we do for svn info) before comparing
> rooturl to branchurl.

Looks good.

Thanks.

Jonathan

[PATCH v2] git-svn: Fix svn.pushmergeinfo handling of svn+ssh usernames.

Previously, svn dcommit of a merge with svn.pushmergeinfo set would
get error messages like "merge parent  for  is on branch
svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
svn+ssh://ja...@gcc.gnu.org/svn/gcc!"

So, let's call remove_username (as we do for svn info) before comparing
rooturl to branchurl.

Signed-off-by: Jason Merrill 
Reviewed-by: Jonathan Nieder 
---
 git-svn.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/git-svn.perl b/git-svn.perl
index fa42364785..3b95d67bde 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -931,6 +931,7 @@ sub cmd_dcommit {
# information from different SVN repos, and paths
# which are not underneath this repository root.
my $rooturl = $gs->repos_root;
+   Git::SVN::remove_username($rooturl);
foreach my $d (@$linear_refs) {
my %parentshash;
read_commit_parents(\%parentshash, $d);
-- 
2.13.5

RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Friday, September 15, 2017 3:21 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
> system
> monitor to speed up detecting new or changed files.
 
> +int git_config_get_fsmonitor(void)
> +{
> + if (git_config_get_pathname("core.fsmonitor", _fsmonitor))
> + core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> +
> + if (core_fsmonitor && !*core_fsmonitor)
> + core_fsmonitor = NULL;
> +
> + if (core_fsmonitor)
> + return 1;
> +
> + return 0;
> +}

This functions return values are backwards relative to the rest of the 
git_config_* functions.

[snip]

+>  /*
+>   * With fsmonitor, we can trust the untracked cache's valid field.
+>   */

[snip]

> +int read_fsmonitor_extension(struct index_state *istate, const void *data,
> + unsigned long sz)
> +{

If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.

[snip]

> + /* a fsmonitor process can return '*' to indicate all entries are 
> invalid */

That's not documented in your documentation.  Also, I'm not sure I like it: 
what 
if I have a file whose name starts with '*'?  Yeah, that would be silly, but 
this indicates the need 
for the protocol to have some sort of signaling mechanism that's out-of-band  
Maybe 
have some key\0value\0 pairs and then \0\0 and then the list of files?  Or, if 
you want to keep
it really simple, allow an entry of '/' (which is an invalid filename) to mean 
'all'.

> +void add_fsmonitor(struct index_state *istate) {
> + int i;
> +
> + if (!istate->fsmonitor_last_update) {
[snip]
> + /* reset the untracked cache */

Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already? 

> +/*
> + * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate

Nit: "s/entries/entry's/".

Re: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.

2017-09-15 Thread Jacob Keller

On Fri, Sep 15, 2017 at 10:21 AM, Kevin Willford  wrote:
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Thursday, September 14, 2017 11:00 PM
>>
>> Kevin Willford  writes:
>>
>> > 1. Does this statement, "I only care about the files in this
>> > sparse checkout, and do not concern me with anything else", mean
>> > that git should not change files outside the sparse-checkout whether
>> > that be in the working directory or in the index?  Or does that only
>> > apply to the working directory and the index version of files can
>> > change to whatever git the git command would do without using
>> > sparse?  For example if I am doing a 'git reset HEAD~1'  should the
>> > version in the index of files outside the sparse not be changed or
>> > change to the HEAD~1 version with the skip-worktree bit on?
>>
>> My understanding of the purpose of "sparse checkout" thing is that
>> the user still wants to create correct whole-tree commit even the
>> user does not have the whole-tree checkout.  The object names for
>> blobs recorded in the index that are meant to be included in the
>> next commit MUST be the same as those that would be in the index
>> when the "sparse" feature is not in use.  "reset HEAD~1" should
>> match the index entries to the tree entries in HEAD~1.  So, the
>> latter, I would think, among your two alternatives.
>>
>> IOW, after "git reset HEAD~", if you drop the skip-worktree bit from
>> all index entries, "git diff --cached HEAD" must say "there is no
>> changes".
>>
>> The only difference between the "sparse" and normal would be that,
>> because the "sparse" user does not intend to change anything outside
>> the "sparse" area, these paths outside her "sparse" area would not
>> materialize on the filesystem.  For the next "write-tree" out of the
>> index to still write the correct tree out, the entries outside her
>> "sparse" area in the index MUST match the tree of the commit she
>> started working from.
>>
>
> Makes sense.  And even though the reset might only change entries
> outside the sparse area and the next status will report "nothing to commit,
> working tree clean",  that's okay because the user hasn't changed
> anything in their sparse area and intended to roll back the index to
> whatever they specified in their reset command.
>
>> > 2. How will this work with other git commands like merge, rebase,
>> > cherry-pick, etc.?
>> > 3. What about when there is a merge conflict with a file that is outside
>> > the sparse checkout?
>>
>> I would say, rather than forbidding such a merge, it should let her
>> see and deal with the conflict by dropping the "this is outside the
>> sparse area, so do not bother materializing it to the filesystem"
>> bit, but tell her loudly what it did ("I checked out a half-merged
>> file outside your sparse-checkout area because you'd need it while
>> resolving the conflict").  By doing things that way, the user can
>> decide if she wants to go ahead and complete the merge, even if the
>> conflict is outside the area she is currently interested in, or
>> postpone the merge and continue working on what she has been working
>> on inside the narrowed-down area first.
>>
>> I do not have a strong opinion whether the sparse-checkout
>> configuration file should be adjusted to match when the command must
>> tentatively bust the sparse checkout area; I'd imagine it can be
>> argued either way.
>>
>> Note that "sparse" is not my itch, and I would not be surprised if
>> those who designed it may want it to work differently from my
>> knee-jerk reaction in the previous two paragraphs, and I may even
>> find such an alternative solution preferable.
>>
>> But it is highly unlikely for any sensible solution would violate
>> the basic premise, i.e. "the indexed contents will stay the same as
>> the case without any 'sparse', so the next write-tree will do the
>> right thing".
>
> There was one other case that I thought about while implementing
> this approach and it is when the user creates a file that is outside their
> sparse definition.  From your explanation above I will attempt to
> explain how I think it should work and please correct me if you see
> it working differently.
>
> The user creates a file that is outside the sparse area and it will show
> up as untracked.  No problem here since the untracked are outside the
> scope of using sparse.  Next the user adds the untracked file to the index.
> The skip-worktree bit should be off and stay off since the user could make
> additional changes and want to add them.  Once the user commits the
> newly created file, I could see turning on the skip-worktree bit and
> removing the file from the working tree after the commit but since
> this was a user initiated action and it is sparse "checkout" it makes
> sense to wait until the next "checkout" command which will use the
> sparse definition to clean this file from the working directory and
> turn on the skip-worktree

Re: [PATCH] Fix merge parent checking with svn.pushmergeinfo.

On Fri, Sep 15, 2017 at 1:52 PM, Jonathan Nieder  wrote:
> Hi,
>
> Jason Merrill wrote:
>
>> Subject: Fix merge parent checking with svn.pushmergeinfo.
>>
>> Without this fix, svn dcommit of a merge with svn.pushmergeinfo set would
>> get error messages like "merge parent  for  is on branch
>> svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
>> svn+ssh://ja...@gcc.gnu.org/svn/gcc!"
>>
>> * git-svn.perl: Remove username from rooturl before comparing to branchurl.
>>
>> Signed-off-by: Jason Merrill 
>
> Interesting.  Thanks for writing it.

Thanks for the review.

> Could there be a test for this to make sure this doesn't regress in
> the future?  See t/t9151-svn-mergeinfo.sh for some examples.

Hmm, I'm afraid figuring out how to write such a test would take
longer than I can really spare for this issue.  There don't seem to be
any svn+ssh tests currently.

> Nit: git doesn't use GNU-style changelogs, preferring to let the code
> speak for itself.  Maybe it would work better as the subject line?
> E.g. something like
>
> git-svn: remove username from root before comparing to branch URL
>
> Without this fix, ...
>
> Signed-off-by: ...

How about this?

git-svn: Fix svn.pushmergeinfo handling of svn+ssh usernames.

Previously, svn dcommit of a merge with svn.pushmergeinfo set would
get error messages like "merge parent  for  is on branch
svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
svn+ssh://ja...@gcc.gnu.org/svn/gcc!"

So, let's call remove_username (as we do for svn info) before comparing
rooturl to branchurl.

>> ---
>>  git-svn.perl | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/git-svn.perl b/git-svn.perl
>> index fa42364785..1663612b1c 100755
>> --- a/git-svn.perl
>> +++ b/git-svn.perl
>> @@ -931,6 +931,7 @@ sub cmd_dcommit {
>>   # information from different SVN repos, and paths
>>   # which are not underneath this repository root.
>>   my $rooturl = $gs->repos_root;
>> + Git::SVN::remove_username ($rooturl);
>
> style nit: Git doesn't include a space between function names and
> their argument list.

Fixed.

> I wonder if it would make sense to rename the $rooturl variable
> since now it is not the unmodified root. E.g. how about
>
> my $expect_url = $gs->repos_root;
> Git::SVN::remove_username($expect_url);
> ...
>
>>   foreach my $d (@$linear_refs) {
>>   my %parentshash;
>>   read_commit_parents(\%parentshash, $d);

It isn't the unmodified root, but it is the effective root that is
printed by svn info and used in branch URLs in git-svn-id, so it seems
to me that the name $rooturl is still appropriate.

> The rest looks good.
>
> Thanks and hope that helps,
> Jonathan

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

Johannes Schindelin  writes:

> On Sat, 16 Sep 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > If you want *contributors* to ping the thread themselves, how about
>> > *posting your updates there, too*?
>> 
>> I do not understand this comment at all.  That is what I and others
>> already and always do by responding to the patches, and when trying
>> to see if a topic is still alive, with thread-specific responses and
>> pings.
>> 
>> If you are demanding that "What's cooking" report to be sent as a
>> response to all the topics, that will *NOT* going to happen.  It is
>> meant to give a summary of the current state to help contributors
>> and reviewers be aware of what is happening across the entire tree
>> and not limited to a specific topic.
>
> If that is the case, you will have to be okay with others responding to
> your "What's cooking" mails instead of the original threads.

Yes and no.  When I say "This topic waits for a reroll" and somebody
(not necessarily the author of the topic) wants to remind me that a
reroll has already been posted (or worse--I queued the updated
version but forgot to update the message), I do appreciate that the
reply is made to the "What's cooking" report.  When there is "This
topic waits for a response to a review comment" and the responder
wants to respond to the review comment, the reply should be made in
response to that review comment.  Otherwise, the discussion will
lose the continuity.

And as you alluded to, we may need to see if we can help making it
easier to do the latter when needed.

> That's what
> you are buying into by having these "What's cooking" mails that are in no
> usable way connected to the original threads.

For the above reason, I do not think this is a particularly useful
stance to take.  Do you have a concrete suggestion to make these
individual entries more helpful for people who may want go back to
the original thread in doing so?  In-reply-to: or References: fields
of the "What's cooking" report would not help.  I often have the
message IDs that made/helped me make these individual comments in
the description; they alone would not react to mouse clicks, though.

Having said that, I'd expect that individual contributors have past
messages pertaining to the smaller numbers of their own topics in
flight in their mailbox than the project wide "What's cooking"
report covers, so perhaps an affort to devise such a mechanism may
result in an over-engineering waste nobody finds useful.  I dunno.

Re: [PATCH v5 25/40] external-odb: add 'get_direct' support

2017-09-15 Thread Jonathan Tan

On Fri, 15 Sep 2017 13:24:50 +0200
Christian Couder  wrote:

> > There are still some nuances. For example, if an external ODB provides
> > both a tree and a blob that the tree references, do we fetch the tree in
> > order to call "have" on all its blobs, or do we trust the ODB that if it
> > has the tree, it has all the other objects? In my design, I do the
> > latter, but in the general case where we have multiple ODBs, we might
> > have to do the former. (And if we do the former, it seems to me that the
> > connectivity check must be performed "online" - that is, with the ODBs
> > being able to provide "get".)
> 
> Yeah, I agree that the problem is more complex if there can be trees
> or all kind of objects in the external odb.
> But as I explain in the following email to Junio, I don't think
> storing other kind of objects is one of the most interesting use case:
> 
> https://public-inbox.org/git/CAP8UFD3=nutrf24clsok4hsgm3nxgh8sbzvpmcg7cnchj2z...@mail.gmail.com/

If we start with only blobs in the ODB, that makes sense (the ODB will
need to supply a fast enough "list" or "have", but, as you wrote before,
a mechanism like fetching an additional ref that contains all the
necessary information whenever we fetch refs would be enough). I agree
that it would work with existing use cases (including yours).

> > (Also, my work extends all the way to fetch/clone [1], but admittedly I
> > have been taking it a step at a time and recently have only been
> > discussing how the local repo should handle the missing object
> > situation.)
> >
> > [1] 
> > https://public-inbox.org/git/cover.1499800530.git.jonathanta...@google.com/
> 
> Yeah, I think your work is interesting and could perhaps be useful for
> external odbs as there could be situations that would be handled
> better using your work or something similar.

Thanks.

Re: commit-msg hook does not run on merge with --no-ff option

Joseph Dunne  writes:

> Valid point.  The way my project is set up I always get a conflict on
> merge operations, so technically all my merges (except fast forward
> merges) end with a git-commit, which of course runs the commit-msg
> hook.  It seems everything is working as designed.  Shame there isn't
> a merge-msg hook.
>
> It seems I have no choice but to work around this issue.  Thanks for your 
> help.

I think Stefan wanted to say in his message upthread that with an
update still in flight you may not need a workaround.

Re: Commit dropped when swapping commits with rebase -i -p

Sebastian Schuberth  writes:

> On 2017-09-02 02:04, Jonathan Nieder wrote:
>
>>> Anyway, this should really more explicitly say *what* you need to know
>>> about, that is, reordering commits does not work.
>> 
>> It tries to explain that, even with an example.  If you have ideas for
>> improving the wording, that would be welcome.
>
> As a first step, I indeed believe the wording must the stronger / clearer. 
> How about this:
>
> From f69854ce7b9359603581317d152421ff6d89f345 Mon Sep 17 00:00:00 2001
> From: Sebastian Schuberth 
> Date: Mon, 11 Sep 2017 10:41:27 +0200
> Subject: [PATCH] docs: use a stronger wording when describing bugs with 
> rebase -i -p
>
> Signed-off-by: Sebastian Schuberth 
> ---
>  Documentation/git-rebase.txt | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 6805a74aec..ccd0a04d54 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -782,10 +782,11 @@ case" recovery too!
>  
>  BUGS
>  
> -The todo list presented by `--preserve-merges --interactive` does not
> -represent the topology of the revision graph.  Editing commits and
> -rewording their commit messages should work fine, but attempts to
> -reorder commits tend to produce counterintuitive results.
> +Be careful when combining the `-i` / `--interactive` and `-p` /
> +`--preserve-merges` options.  Reordering commits will drop commits from the
> +main line. This is because the todo list does not represent the topology of 
> the
> +revision graph in this case.  However, editing commits and rewording their
> +commit messages 'should' work fine.
>  
>  For example, an attempt to rearrange
>  


Anybody?  I personally feel that the updated text is not all that
stronger but it is clearer by clarifying what "counterintuitive
results" actually mean, but I am not the target audience this
paragraph is trying to help, nor I am the one who is making excuse
for a known bug, so...

Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

Johannes Schindelin  writes:

> Hi Junio,
>
> On Fri, 15 Sep 2017, Junio C Hamano wrote:
>
>> --
>> [Cooking]
>> 
>> [...]
>> 
>> * mk/diff-delta-uint-may-be-shorter-than-ulong (2017-08-10) 1 commit
>>  ...
>>  Dropped, as it was rerolled for review as part of a larger series.
>>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
>> 
>> [...]
>> 
>> * mk/use-size-t-in-zlib (2017-08-10) 1 commit
>>  ...
>>  Dropped, as it was rerolled for review as part of a larger series.
>>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
>> 
>> 
>> * mk/diff-delta-avoid-large-offset (2017-08-11) 1 commit
>>  ...
>>  Dropped, as it was rerolled for review as part of a larger series.
>>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
>> 
>> 
>> --
>> [Discarded]
>> 
>> [...]
>
> These three topics are still in the wrong category. Please fix.

Hmph, but did the larger series these refer to actually land?  If I
recall correctly they were too invasive to get merged cleanly to
'next' and 'pu' without disrupting topics in flight and that is why
it is not even listed there.

The only two reasons a topic with a good goal and approach gets
thrown into the Discarded bin are either because it is tentatively
retracted to avoid disrupting other topics in flight (with the
expectation to be redone later) or what it wants to solve is
addressed by another topic and becomes unnecessary.

These three attempt to do good things and I have been hoping that
others can help moving them forward by e.g. reporting that they
still cleanly merge and stand on their own as improvements at a
smaller scale than the larger one that was attempted but was not
queued, or if nobody volunteers, I was guessing that I might end up
doing that myself.  Before that happens, I'd prefer to keep them
listed and topics kept in my tree, even if they are not part of
'pu'.

On the other hand, because many topics have graduated to master
recently, the larger one (possibly in the updated form---I do not
recall what conflicts it had with what other topics) may be able to
cook peacefully together with topics we have that are still in
flight.  If that is the case, then that larger topic will be queued
and these three will truly become material for the Discarded bin.

RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-15 Thread Jonathan Tan

For those interested in partial clones and/or missing objects in repos,
I've updated my original partialclone patches to not require an explicit
list of promises. Fetch/clone still only permits exclusion of blobs, but
the infrastructure is there for a local repo to support missing trees
and commits as well.

They can be found here:

https://github.com/jonathantanmy/git/tree/partialclone2

To make the number of patches more manageable, I have omitted support
for a custom fetching hook (but it can be readded in fetch-object.c),
and only support promisor packfiles for now (but I don't have any
objection to supporting promisor loose objects in the future).

Let me know what you think of the overall approach. In particular, I'm
still wondering if there is a better approach than to toggle
"fetch_if_missing" whenever we need lazy fetching (or need to suppress
it).

Also, if there any patches that you think might be useful to others, let
me know and I'll send them to this mailing list for review.

A demo and an overview of the design (also available from that
repository's README):

Demo


Obtain a repository.

$ make prefix=$HOME/local install
$ cd $HOME/tmp
$ git clone https://github.com/git/git

Make it advertise the new feature and allow requests for arbitrary blobs.

$ git -C git config uploadpack.advertiseblobmaxbytes 1
$ git -C git config uploadpack.allowanysha1inwant 1

Perform the partial clone and check that it is indeed smaller. Specify
"file://" in order to test the partial clone mechanism. (If not, Git will
perform a local clone, which unselectively copies every object.)

$ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2
$ git clone "file://$(pwd)/git" git3
$ du -sh git2 git3
116Mgit2
129Mgit3

Observe that the new repo is automatically configured to fetch missing objects
from the original repo. Subsequent fetches will also be partial.

$ cat git2/.git/config
[core]
repositoryformatversion = 1
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = [snip]
fetch = +refs/heads/*:refs/remotes/origin/*
blobmaxbytes = 10
[extensions]
partialclone = origin
[branch "master"]
remote = origin
merge = refs/heads/master

Unlike in an older version of this code (see the `partialclone` branch), this
also works with the HTTP/HTTPS protocols.

Design
==

Local repository layout
---

A repository declares its dependence on a *promisor remote* (a remote that
declares that it can serve certain objects when requested) by a repository
extension "partialclone". `extensions.partialclone` must be set to the name of
the remote ("origin" in the demo above).

A packfile can be annotated as originating from the promisor remote by the
existence of a "(packfile name).promisor" file with arbitrary contents (similar
to the ".keep" file). Whenever a promisor remote sends an object, it declares
that it can serve every object directly or indirectly referenced by the sent
object.

A promisor packfile is a packfile annotated with the ".promisor" file. A
promisor object is an object in a promisor packfile. A promised object is an
object directly referenced by a promisor object.

(In the future, we might need to add ".promisor" support to loose objects.)

Connectivity check and gc
-

The object walk done by the connectivity check (as used by fsck and fetch) stops
at all promisor objects and promised objects.

The object walk done by gc also stops at all promisor objects and promised
objects. Only non-promisor packfiles are deleted (if pack deletion is
requested); promisor packfiles are left alone. This maintains the distinction
between promisor packfiles and non-promisor packfiles. (In the future, we might
need to do something more sophisticated with promisor packfiles.)

Fetching of promised objects


When `sha1_object_info_extended()` (or similar) is invoked, it will
automatically attempt to fetch a missing object from the promisor remote if that
object is not in the local repository. For efficiency, no check is made as to
whether that object is a promised object or not.

This automatic fetching can be toggled on and off by the `fetch_if_missing`
global variable, and it is on by default.

The actual fetch is done through the fetch-pack/upload-pack protocol. Right now,
this uses the fact that upload-pack allows blob and tree "want"s, and this
incurs the overhead of the unnecessary ref advertisement. I hope that protocol
v2 will allow us to declare that blob and tree "want"s are allowed, and allow
the client to declare that it does not want the ref advertisement. All packfiles
downloaded in this way are annotated with ".promisor".

Fetching with `git fetch`
-

The fetch-pack/upload-pack protocol has also been extended to support omission
of blobs

Re: RFC v3: Another proposed hash function transition plan

2017-09-15 Thread Philip Oakley

Hi Jonathan,

"Jonathan Nieder" wrote;

Johannes Schindelin wrote:

On Wed, 13 Sep 2017, Jonathan Nieder wrote:

As a side note, I am probably misreading, but I found this set of
paragraphs a bit condescending. It sounds to me like you are saying
"You are making the wrong choice of hash function and everything else
you are describing is irrelevant when compared to that monumental
mistake. Please stop working on things I don't consider important".
With that reading it is quite demotivating to read.

I am sorry you read it that way. I did not feel condescending when I
wrote

that mail, I felt annoyed by the side track, and anxious. In my mind, the
transition is too important for side tracking, and I worry that we are
not

fast enough (imagine what would happen if a better attack was discovered
that is not as easily detected as the one we know about?).

Thanks for clarifying. That makes sense.

[...]

As to *my* opinion: after reading https://goo.gl/gh2Mzc (is it really
correct that its last update has been on March 6th?), my only concern is
really that it still talks about SHA3-256 when I think that the
performance benefits of SHA-256 (think: "Git at scale", and also hardware
support) really make the latter a better choice.

In order to be "ironed out", I think we need to talk about the
implementation detail "Translation table". This is important. It needs to
be *fast*.

Speaking of *fast*, I could imagine that it would make sense to store the
SHA-1 objects on disk, still, instead of converting them on the fly. I am
not sure whether this is something we need to define in the document,
though, as it may very well be premature optimization; Maybe mention that
we could do this if necessary?

Apart from that, I would *love* to see this document as The Official Plan
that I can Show To The Manager so that I can ask to Allocate Time.

Sounds promising!

Thanks much for this feedback. This is very helpful for knowing what
v4 of the doc needs.

The discussion of the translation table in [1] didn't make it to the
doc. You're right that it needs to.

Caching SHA-1 objects (and the pros and cons involved) makes sense to
mention in an "ideas for future work" section.

An implementation plan with well-defined pieces for people to take on
and estimates of how much work each involves may be useful for Showing
To The Manager. So I'll include a sketch of that for reviewers to
poke holes in, too.

Another thing the doc doesn't currently describe is how Git protocol
would work. That's worth sketching in a "future work" section as
well.

Sorry it has been taking so long to get this out. I think we should
have something ready to send on Monday.

I had a look at the current doc https://goo.gl/gh2Mzc and thought that the
selection of the "NewHash" should be separated out into a section of it's
own as a 'separation of concerns', so that the general transition plan only
refers to the "NewHash", so as not to accidentally pre-judge that selection.

I did look up the arguments regarding sha2 (sha256) versus sha3-256 and
found these two Q items

https://security.stackexchange.com/questions/152360/should-we-be-using-sha3-2017

https://security.stackexchange.com/questions/86283/how-does-sha3-keccak-shake-compare-to-sha2-should-i-use-non-shake-parameter

with an onward link to this:
https://www.imperialviolet.org/2012/10/21/nist.html

"NIST may not have you in mind (21 Oct 2012)"

"A couple of weeks back, NIST announced that Keccak would be SHA-3. Keccak
has somewhat disappointing software performance but is a gift to hardware
implementations."

which does appear to cover some of the concerns that dscho had noted, and
speed does appear to be a core Git selling point.

It would be worth at least covering these trade offs in the "select a
NewHash" section of the document, as at the end of the day it will be a
political judgement about what the future might hold regarding the
contenders.

What may also be worth noting is the fall back plan should the chosen
NewHash be the first to fail, perhaps spectacularly, as having a ready plan
could support the choice at risk.

Thanks,
Jonathan

[1]
https://public-inbox.org/git/CAJo=hJtoX9=aylhhpujs7fuev9ciz_mnpnephuz8whui6g9...@mail.gmail.com/

--
Philip

RE: [PATCH v6 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Friday, September 15, 2017 3:21 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v6 06/12] ls-files: Add support in ls-files to display the 
> fsmonitor
> valid bit
> 
> Add a new command line option (-f) to ls-files to have it use lowercase 
> letters
> for 'fsmonitor valid' files

Document in man page, please.

Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

Johannes Schindelin  writes:

> Please stop stating that you expect a reroll for rebase-i-extra when you
> explicitly stated months ago that you would not take my v6. It gets a bit
> annoying.

I already explained to you why I skipped v6, which turned to be
identical to v5 when the unnecessary rebase was undone.  It does not
have anything to do with expecting a v7 or later to fix other things
already pointed out in the review.

Having said that, I am getting sick of your constant hostile
whining, and am inclined to merge it as-is to 'next' and advance it
from there, if only to reduce these noises from you.  Even if you
are unwilling to, you do not have the sole possession and veto power
over the area of the code (neither do I) to prevent further
improvements, so I can expect others with good design taste
(hopefully not me) to fix up the misuse of revision traversal
implementation detail with follow-up patches.

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > If you want *contributors* to ping the thread themselves, how about
> > *posting your updates there, too*?
> 
> I do not understand this comment at all.  That is what I and others
> already and always do by responding to the patches, and when trying
> to see if a topic is still alive, with thread-specific responses and
> pings.
> 
> If you are demanding that "What's cooking" report to be sent as a
> response to all the topics, that will *NOT* going to happen.  It is
> meant to give a summary of the current state to help contributors
> and reviewers be aware of what is happening across the entire tree
> and not limited to a specific topic.

If that is the case, you will have to be okay with others responding to
your "What's cooking" mails instead of the original threads. That's what
you are buying into by having these "What's cooking" mails that are in no
usable way connected to the original threads.

Ciao,
Dscho

RE: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.



> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Friday, September 15, 2017 3:21 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor
> extension.
> 
> This includes the core.fsmonitor setting, the query-fsmonitor hook, and the
> fsmonitor index extension.
> 
> Signed-off-by: Ben Peart 
> ---
>  Documentation/config.txt |  6 ++
>  Documentation/githooks.txt   | 23 +++
>  Documentation/technical/index-format.txt | 19 +++
>  3 files changed, 48 insertions(+)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt index
> dc4e3f58a2..c196007a27 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -413,6 +413,12 @@ core.protectNTFS::
>   8.3 "short" names.
>   Defaults to `true` on Windows, and `false` elsewhere.
> 
> +core.fsmonitor::
> + If set, the value of this variable is used as a command which
> + will identify all files that may have changed since the
> + requested date/time. This information is used to speed up git by
> + avoiding unnecessary processing of files that have not changed.

I'm confused here.  You have a file called "fsmonitor-watchman", which seems to 
discuss the protocol for core.fsmonitor scripts in general, and you have this 
documentation, which does not link to that file.  Can you clarify this? 



> +The hook should output to stdout the list of all files in the working
> +directory that may have changed since the requested time.  The logic
> +should be inclusive so that it does not miss any potential changes.

+"It is OK to include files which have not actually changed.  Newly-created and 
deleted files should also be included.  When files are renamed, both the old 
and the new name should be included."

Also, please discuss case sensitivity issues (e.g. on OS X).  

> +The paths should be relative to the root of the working directory and
> +be separated by a single NUL.



> +  - 32-bit version number: the current supported version is 1.
> +
> +  - 64-bit time: the extension data reflects all changes through the given
> + time which is stored as the nanoseconds elapsed since midnight,
> + January 1, 1970.

Nit: Please specify signed or unsigned for these.  (I expect to be getting out 
of 
cryosleep around 2262, and I want to know if my old git repos will keep 
working...)

> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
> +
> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
> +is not CE_FSMONITOR_VALID.

Re: [PATCH 1/2] test-lib: group system specific FIFO tests by system

Hi Michael,

On Fri, 15 Sep 2017, Michael J Gruber wrote:

> Johannes Schindelin venit, vidit, dixit 15.09.2017 00:21:
> > 
> > On Thu, 14 Sep 2017, Michael J Gruber wrote:
> > 
> >> test-lib determines whether a file-system supports FIFOs and needs to do
> >> special casing for CYGWIN and MINGW. This separates those system
> >> specific settings from those at more central place.
> >>
> >> Set mkfifo()  to false in the central system specific place so that the
> >> same test works everywhere.
> > 
> > The mkfifo() emulation of Cygwin seems to work, no? I think it works even
> > in MSYS2, but not in MINGW.
> > 
> > So maybe this patch should affect only the MINGW arm?
> 
> I only reorganised the code, so in that sense the patch does not affect
> any system ;)
> 
> If indeed mkfifo works on CYGWIN than a separate patch should remove the
> exclusion of CYGWIN; alas, I can't confirm (I wish MS still had the old
> academic alliance programme).

You don't need an academic license, as there are free VMs available (they
expire after a while, though):

https://developer.microsoft.com/en-us/windows/downloads/virtual-machines

It seems that the current free VM will expire in 4 days. After that,
probably a new one will be available that expires some time later (not
sure about the exact time frame of those VMs).

Alternatively, you can download VMs meant for web testing at
https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/ which
expire after 90 days.

Ciao,
Dscho

Pertenecer a la Comunidad GIT

2017-09-15 Thread EduardoLeon

Buen día,

Me gustaría pertenecer a su comunidad y conocer mas sobre los temas de git

Muchas Gracias,
 
Saludos Cordiales. 
 
Administrador de la configuracion del software
Eduardo Alfonso León Benítez  Lebx  
Ext- 2366

Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

Hi Junio,

On Fri, 15 Sep 2017, Junio C Hamano wrote:

> --
> [Cooking]
> 
> [...]
> 
> * mk/diff-delta-uint-may-be-shorter-than-ulong (2017-08-10) 1 commit
>  . diff-delta: fix encoding size that would not fit in "unsigned int"
> 
>  The machinery to create xdelta used in pack files received the
>  sizes of the data in size_t, but lost the higher bits of them by
>  storing them in "unsigned int" during the computation, which is
>  fixed.
> 
>  Dropped, as it was rerolled for review as part of a larger series.
>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> 
> [...]
> 
> * mk/use-size-t-in-zlib (2017-08-10) 1 commit
>  . zlib.c: use size_t for size
> 
>  The wrapper to call into zlib followed our long tradition to use
>  "unsigned long" for sizes of regions in memory, which have been
>  updated to use "size_t".
> 
>  Dropped, as it was rerolled for review as part of a larger series.
>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> 
> 
> * mk/diff-delta-avoid-large-offset (2017-08-11) 1 commit
>  . diff-delta: do not allow delta offset truncation
> 
>  The delta format used in the packfile cannot reference data at
>  offset larger than what can be expressed in 4-byte, but the
>  generator for the data failed to make sure the offset does not
>  overflow.  This has been corrected.
> 
>  Dropped, as it was rerolled for review as part of a larger series.
>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> 
> 
> --
> [Discarded]
> 
> [...]

These three topics are still in the wrong category. Please fix.

Ciao,
Dscho

[PATCH v6 03/12] update-index: add a new --force-write-index option

At times, it makes sense to avoid the cost of writing out the index
when the only changes can easily be recomputed on demand. This causes
problems when trying to write test cases to verify that state as they
can't guarantee the state has been persisted to disk.

Add a new option (--force-write-index) to update-index that will
ensure the index is written out even if the cache_changed flag is not
set.

Signed-off-by: Ben Peart 
---
 builtin/update-index.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index d562f2ec69..e1ca0759d5 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -915,6 +915,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
struct refresh_params refresh_args = {0, _errors};
int lock_error = 0;
int split_index = -1;
+   int force_write = 0;
struct lock_file *lock_file;
struct parse_opt_ctx_t ctx;
strbuf_getline_fn getline_fn;
@@ -1006,6 +1007,8 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
N_("test if the filesystem supports untracked 
cache"), UC_TEST),
OPT_SET_INT(0, "force-untracked-cache", _cache,
N_("enable untracked cache without testing the 
filesystem"), UC_FORCE),
+   OPT_SET_INT(0, "force-write-index", _write,
+   N_("write out the index even if is not flagged as 
changed"), 1),
OPT_END()
};
 
@@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
die("BUG: bad untracked_cache value: %d", untracked_cache);
}
 
-   if (active_cache_changed) {
+   if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
exit(128);
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 02/12] preload-index: add override to enable testing preload-index

Preload index doesn't run unless it has a minimum number of 1000 files.
To enable running tests with fewer files, add an environment variable
(GIT_FORCE_PRELOAD_TEST) which will override that minimum and set it to 2.

Signed-off-by: Ben Peart 
---
 preload-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/preload-index.c b/preload-index.c
index 70a4c80878..75564c497a 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -79,6 +79,8 @@ static void preload_index(struct index_state *index,
return;
 
threads = index->cache_nr / THREAD_COST;
+   if ((index->cache_nr > 1) && (threads < 2) && 
getenv("GIT_FORCE_PRELOAD_TEST"))
+   threads = 2;
if (threads < 2)
return;
if (threads > MAX_PARALLEL)
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test

The split index test t1700-split-index.sh has hard coded SHA values for
the index.  Currently it supports index V4 and V3 but assumes there are
no index extensions loaded.

When manually forcing the fsmonitor extension to be turned on when
running the test suite, the SHA values no longer match which causes the
test to fail.

The potential matrix of index extensions and index versions can is quite
large so instead disable the extension before attempting to run the test.

Signed-off-by: Ben Peart 
---
 t/t1700-split-index.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 22f69a410b..af9b847761 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -6,6 +6,7 @@ test_description='split index mode tests'
 
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
+sane_unset GIT_FSMONITOR_TEST
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

Add a test utility (test-dump-fsmonitor) that will dump the fsmonitor
index extension.

Signed-off-by: Ben Peart 
---
 Makefile   |  1 +
 t/helper/test-dump-fsmonitor.c | 21 +
 2 files changed, 22 insertions(+)
 create mode 100644 t/helper/test-dump-fsmonitor.c

diff --git a/Makefile b/Makefile
index 9d6ec9c1e9..d970cd00e9 100644
--- a/Makefile
+++ b/Makefile
@@ -639,6 +639,7 @@ TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
 TEST_PROGRAMS_NEED_X += test-fake-ssh
diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
new file mode 100644
index 00..482d749bb9
--- /dev/null
+++ b/t/helper/test-dump-fsmonitor.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+   struct index_state *istate = _index;
+   int i;
+
+   setup_git_directory();
+   if (do_read_index(istate, get_index_file(), 0) < 0)
+   die("unable to read index file");
+   if (!istate->fsmonitor_last_update) {
+   printf("no fsmonitor\n");
+   return 0;
+   }
+   printf("fsmonitor last update %"PRIuMAX"\n", 
istate->fsmonitor_last_update);
+
+   for (i = 0; i < istate->cache_nr; i++)
+   printf((istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) ? "+" 
: "-");
+
+   return 0;
+}
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 12/12] fsmonitor: add a performance test

Add a test utility (test-drop-caches) that flushes all changes to disk
then drops file system cache on Windows, Linux, and OSX.

Add a perf test (p7519-fsmonitor.sh) for fsmonitor.

By default, the performance test will utilize the Watchman file system
monitor if it is installed.  If Watchman is not installed, it will use a
dummy integration script that does not report any new or modified files.
The dummy script has very little overhead which provides optimistic results.

The performance test will also use the untracked cache feature if it is
available as fsmonitor uses it to speed up scanning for untracked files.

There are 3 environment variables that can be used to alter the default
behavior of the performance test:

GIT_PERF_7519_UNTRACKED_CACHE: used to configure core.untrackedCache
GIT_PERF_7519_SPLIT_INDEX: used to configure core.splitIndex
GIT_PERF_7519_FSMONITOR: used to configure core.fsMonitor

The big win for using fsmonitor is the elimination of the need to scan the
working directory looking for changed and untracked files. If the file
information is all cached in RAM, the benefits are reduced.

GIT_PERF_7519_DROP_CACHE: if set, the OS caches are dropped between tests

Signed-off-by: Ben Peart 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile|   1 +
 t/helper/.gitignore |   1 +
 t/helper/test-drop-caches.c | 161 ++
 t/perf/p7519-fsmonitor.sh   | 184 
 4 files changed, 347 insertions(+)
 create mode 100644 t/helper/test-drop-caches.c
 create mode 100755 t/perf/p7519-fsmonitor.sh

diff --git a/Makefile b/Makefile
index d970cd00e9..b2653ee64f 100644
--- a/Makefile
+++ b/Makefile
@@ -638,6 +638,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-drop-caches
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
 TEST_PROGRAMS_NEED_X += test-dump-split-index
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 721650256e..f9328eebdd 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-drop-caches
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
new file mode 100644
index 00..717079865c
--- /dev/null
+++ b/t/helper/test-drop-caches.c
@@ -0,0 +1,161 @@
+#include "git-compat-util.h"
+
+#if defined(GIT_WINDOWS_NATIVE)
+
+int cmd_sync(void)
+{
+   char Buffer[MAX_PATH];
+   DWORD dwRet;
+   char szVolumeAccessPath[] = ".\\X:";
+   HANDLE hVolWrite;
+   int success = 0;
+
+   dwRet = GetCurrentDirectory(MAX_PATH, Buffer);
+   if ((0 == dwRet) || (dwRet > MAX_PATH))
+   return error("Error getting current directory");
+
+   if ((Buffer[0] < 'A') || (Buffer[0] > 'Z'))
+   return error("Invalid drive letter '%c'", Buffer[0]);
+
+   szVolumeAccessPath[4] = Buffer[0];
+   hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, 
NULL);
+   if (INVALID_HANDLE_VALUE == hVolWrite)
+   return error("Unable to open volume for writing, need admin 
access");
+
+   success = FlushFileBuffers(hVolWrite);
+   if (!success)
+   error("Unable to flush volume");
+
+   CloseHandle(hVolWrite);
+
+   return !success;
+}
+
+#define STATUS_SUCCESS (0xL)
+#define STATUS_PRIVILEGE_NOT_HELD  (0xC061L)
+
+typedef enum _SYSTEM_INFORMATION_CLASS {
+   SystemMemoryListInformation = 80,
+} SYSTEM_INFORMATION_CLASS;
+
+typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
+   MemoryCaptureAccessedBits,
+   MemoryCaptureAndResetAccessedBits,
+   MemoryEmptyWorkingSets,
+   MemoryFlushModifiedList,
+   MemoryPurgeStandbyList,
+   MemoryPurgeLowPriorityStandbyList,
+   MemoryCommandMax
+} SYSTEM_MEMORY_LIST_COMMAND;
+
+BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
+{
+   BOOL bResult;
+   DWORD dwBufferLength;
+   LUID luid;
+   TOKEN_PRIVILEGES tpPreviousState;
+   TOKEN_PRIVILEGES tpNewState;
+
+   dwBufferLength = 16;
+   bResult = LookupPrivilegeValueA(0, lpName, );
+   if (bResult) {
+   tpNewState.PrivilegeCount = 1;
+   tpNewState.Privileges[0].Luid = luid;
+   tpNewState.Privileges[0].Attributes = 0;
+   bResult = AdjustTokenPrivileges(TokenHandle, 0, ,
+   (DWORD)((LPBYTE)&(tpNewState.Privileges[1]) - 
(LPBYTE)),
+   , );
+   if (bResult) {
+   tpPreviousState.PrivilegeCount = 1;
+

[PATCH v6 11/12] fsmonitor: add a sample integration script for Watchman

This script integrates the new fsmonitor capabilities of git with the
cross platform Watchman file watching service. To use the script:

Download and install Watchman from https://facebook.github.io/watchman/.
Rename the sample integration hook from fsmonitor-watchman.sample to
fsmonitor-watchman. Configure git to use the extension:

git config core.fsmonitor .git/hooks/fsmonitor-watchman

Optionally turn on the untracked cache for optimal performance.

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
Signed-off-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Christian Couder 
---
 templates/hooks--fsmonitor-watchman.sample | 119 +
 1 file changed, 119 insertions(+)
 create mode 100755 templates/hooks--fsmonitor-watchman.sample

diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
new file mode 100755
index 00..2779d7edf3
--- /dev/null
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -0,0 +1,119 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use IPC::Open2;
+
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to speed up detecting
+# new and modified files.
+#
+# The hook is passed a version (currently 1) and a time in nanoseconds
+# formatted as a string and outputs to stdout all files that have been
+# modified since the given time. Paths must be relative to the root of
+# the working tree and separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-watchman" and set
+# 'git config core.fsmonitor .git/hooks/query-watchman'
+#
+my ($version, $time) = @ARGV;
+
+# Check the hook interface version
+
+if ($version == 1) {
+   # convert nanoseconds to seconds
+   $time = int $time / 10;
+} else {
+   die "Unsupported query-fsmonitor hook version '$version'.\n" .
+   "Falling back to scanning...\n";
+}
+
+# Convert unix style paths to escaped Windows style paths when running
+# in Windows command prompt
+
+my $system = `uname -s`;
+$system =~ s/[\r\n]+//g;
+my $git_work_tree;
+
+if ($system =~ m/^MSYS_NT/) {
+   $git_work_tree = `cygpath -aw "\$PWD"`;
+   $git_work_tree =~ s/[\r\n]+//g;
+   $git_work_tree =~ s,\\,/,g;
+} else {
+   $git_work_tree = $ENV{'PWD'};
+}
+
+my $retry = 1;
+
+launch_watchman();
+
+sub launch_watchman {
+
+   # Set input record separator
+   local $/ = 0666;
+
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   or die "open2() failed: $!\n" .
+   "Falling back to scanning...\n";
+
+   # In the query expression below we're asking for names of files that
+   # changed since $time but were not transient (ie created after
+   # $time but no longer exist).
+   #
+   # To accomplish this, we're using the "since" generator to use the
+   # recency index to select candidate nodes and "fields" to limit the
+   # output to file names only. Then we're using the "expression" term to
+   # further constrain the results.
+   #
+   # The category of transient files that we want to ignore will have a
+   # creation clock (cclock) newer than $time_t value and will also not
+   # currently exist.
+
+   print CHLD_IN "[\"query\", \"$git_work_tree\", { \
+   \"since\": $time, \
+   \"fields\": [\"name\"], \
+   \"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], 
[\"not\", \"exists\"]]] \
+   }]";
+
+   my $response = ;
+
+   die "Watchman: command returned no output.\n" .
+   "Falling back to scanning...\n" if $response eq "";
+   die "Watchman: command returned invalid output: $response\n" .
+   "Falling back to scanning...\n" unless $response =~ /^\{/;
+
+   my $json_pkg;
+   eval {
+   require JSON::XS;
+   $json_pkg = "JSON::XS";
+   1;
+   } or do {
+   require JSON::PP;
+   $json_pkg = "JSON::PP";
+   };
+
+   my $o = $json_pkg->new->utf8->decode($response);
+
+   if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve 
root .* directory (.*) is not watched/) {
+   print STDERR "Adding '$git_work_tree' to watchman's watch 
list.\n";
+   $retry--;
+   qx/watchman watch "$git_work_tree"/;
+   die "Failed to make watchman watch '$git_work_tree'.\n" .
+   "Falling back to scanning...\n" if $? != 0;
+
+   # Watchman will always return all files on the first query so
+   # return the fast "everything is dirty" flag to git and do the
+   # Watchman query just to get it over with now so we won't pay
+   # the cost in git to look up each individual file.
+   print "*\0";
+   eval { launch_watchman() };
+

[PATCH v6 01/12] bswap: add 64 bit endianness helper get_be64

Add a new get_be64 macro to enable 64 bit endian conversions on memory
that may or may not be aligned.

Signed-off-by: Ben Peart 
---
 compat/bswap.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/compat/bswap.h b/compat/bswap.h
index 7d063e9e40..6b22c46214 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -158,7 +158,9 @@ static inline uint64_t git_bswap64(uint64_t x)
 
 #define get_be16(p)ntohs(*(unsigned short *)(p))
 #define get_be32(p)ntohl(*(unsigned int *)(p))
+#define get_be64(p)ntohll(*(uint64_t *)(p))
 #define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
+#define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0)
 
 #else
 
@@ -178,6 +180,13 @@ static inline uint32_t get_be32(const void *ptr)
(uint32_t)p[3] <<  0;
 }
 
+static inline uint64_t get_be64(const void *ptr)
+{
+   const unsigned char *p = ptr;
+   return  (uint64_t)get_be32(p[0]) << 32 |
+   (uint64_t)get_be32(p[4]) <<  0;
+}
+
 static inline void put_be32(void *ptr, uint32_t value)
 {
unsigned char *p = ptr;
@@ -187,4 +196,17 @@ static inline void put_be32(void *ptr, uint32_t value)
p[3] = value >>  0;
 }
 
+static inline void put_be64(void *ptr, uint64_t value)
+{
+   unsigned char *p = ptr;
+   p[0] = value >> 56;
+   p[1] = value >> 48;
+   p[2] = value >> 40;
+   p[3] = value >> 32;
+   p[4] = value >> 24;
+   p[5] = value >> 16;
+   p[6] = value >>  8;
+   p[7] = value >>  0;
+}
+
 #endif
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

Test the ability to add/remove the fsmonitor index extension via
update-index.

Test that dirty files returned from the integration script are properly
represented in the index extension and verify that ls-files correctly
reports their state.

Test that ensure status results are correct when using the new fsmonitor
extension.  Test untracked, modified, and new files by ensuring the
results are identical to when not using the extension.

Test that if the fsmonitor extension doesn't tell git about a change, it
doesn't discover it on its own.  This ensures git is honoring the
extension and that we get the performance benefits desired.

Three test integration scripts are provided:

fsmonitor-all - marks all files as dirty
fsmonitor-none - marks no files as dirty
fsmonitor-watchman - integrates with Watchman with debug logging

To run tests in the test suite while utilizing fsmonitor:

First copy t/t7519/fsmonitor-all to a location in your path and then set
GIT_FORCE_PRELOAD_TEST=true and GIT_FSMONITOR_TEST=fsmonitor-all and run
your tests.

Note: currently when using the test script fsmonitor-watchman on
Windows, many tests fail due to a reported but not yet fixed bug in
Watchman where it holds on to handles for directories and files which
prevents the test directory from being cleaned up properly.

Signed-off-by: Ben Peart 
---
 t/t7519-status-fsmonitor.sh | 259 
 t/t7519/fsmonitor-all   |  23 
 t/t7519/fsmonitor-none  |  21 
 t/t7519/fsmonitor-watchman  | 128 ++
 4 files changed, 431 insertions(+)
 create mode 100755 t/t7519-status-fsmonitor.sh
 create mode 100755 t/t7519/fsmonitor-all
 create mode 100755 t/t7519/fsmonitor-none
 create mode 100755 t/t7519/fsmonitor-watchman

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
new file mode 100755
index 00..6aa1e4e924
--- /dev/null
+++ b/t/t7519-status-fsmonitor.sh
@@ -0,0 +1,259 @@
+#!/bin/sh
+
+test_description='git status with file system watcher'
+
+. ./test-lib.sh
+
+#
+# To run the entire git test suite using fsmonitor:
+#
+# copy t/t7519/fsmonitor-all to a location in your path and then set
+# GIT_FSMONITOR_TEST=fsmonitor-all and run your tests.
+#
+
+# Note, after "git reset --hard HEAD" no extensions exist other than 'TREE'
+# "git update-index --fsmonitor" can be used to get the extension written
+# before testing the results.
+
+clean_repo () {
+   git reset --hard HEAD &&
+   git clean -fd
+}
+
+dirty_repo () {
+   : >untracked &&
+   : >dir1/untracked &&
+   : >dir2/untracked &&
+   echo 1 >modified &&
+   echo 2 >dir1/modified &&
+   echo 3 >dir2/modified &&
+   echo 4 >new &&
+   echo 5 >dir1/new &&
+   echo 6 >dir2/new
+}
+
+write_integration_script() {
+   write_script .git/hooks/fsmonitor-test<<-\EOF
+   if [ "$#" -ne 2 ]; then
+   echo "$0: exactly 2 arguments expected"
+   exit 2
+   fi
+   if [ "$1" != 1 ]; then
+   echo -e "Unsupported core.fsmonitor hook version.\n" >&2
+   exit 1
+   fi
+   printf "untracked\0"
+   printf "dir1/untracked\0"
+   printf "dir2/untracked\0"
+   printf "modified\0"
+   printf "dir1/modified\0"
+   printf "dir2/modified\0"
+   printf "new\0"
+   printf "dir1/new\0"
+   printf "dir2/new\0"
+   EOF
+}
+
+test_lazy_prereq UNTRACKED_CACHE '
+   { git update-index --test-untracked-cache; ret=$?; } &&
+   test $ret -ne 1
+'
+
+test_expect_success 'setup' '
+   mkdir -p .git/hooks &&
+   : >tracked &&
+   : >modified &&
+   mkdir dir1 &&
+   : >dir1/tracked &&
+   : >dir1/modified &&
+   mkdir dir2 &&
+   : >dir2/tracked &&
+   : >dir2/modified &&
+   git -c core.fsmonitor= add . &&
+   git -c core.fsmonitor= commit -m initial &&
+   git config core.fsmonitor .git/hooks/fsmonitor-test &&
+   cat >.gitignore <<-\EOF
+   .gitignore
+   expect*
+   actual*
+   marker*
+   EOF
+'
+
+# test that the fsmonitor extension is off by default
+test_expect_success 'fsmonitor extension is off by default' '
+   test-dump-fsmonitor >actual &&
+   grep "^no fsmonitor" actual
+'
+
+# test that "update-index --fsmonitor" adds the fsmonitor extension
+test_expect_success 'update-index --fsmonitor" adds the fsmonitor extension' '
+   git update-index --fsmonitor &&
+   test-dump-fsmonitor >actual &&
+   grep "^fsmonitor last update" actual
+'
+
+# test that "update-index --no-fsmonitor" removes the fsmonitor extension
+test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor 
extension' '
+   git update-index --no-fsmonitor &&
+   test-dump-fsmonitor >actual &&
+   grep "^no fsmonitor" actual
+'
+
+cat >expect expect expect

[PATCH v6 07/12] update-index: add fsmonitor support to update-index

Add support in update-index to manually add/remove the fsmonitor
extension via --fsmonitor/--no-fsmonitor flags

Signed-off-by: Ben Peart 
---
 builtin/update-index.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6f39ee9274..b03afc1f3a 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -918,6 +918,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
int lock_error = 0;
int split_index = -1;
int force_write = 0;
+   int fsmonitor = -1;
struct lock_file *lock_file;
struct parse_opt_ctx_t ctx;
strbuf_getline_fn getline_fn;
@@ -1011,6 +1012,8 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
N_("enable untracked cache without testing the 
filesystem"), UC_FORCE),
OPT_SET_INT(0, "force-write-index", _write,
N_("write out the index even if is not flagged as 
changed"), 1),
+   OPT_BOOL(0, "fsmonitor", ,
+   N_("enable or disable file system monitor")),
OPT_END()
};
 
@@ -1152,6 +1155,22 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
die("BUG: bad untracked_cache value: %d", untracked_cache);
}
 
+   if (fsmonitor > 0) {
+   if (git_config_get_fsmonitor() == 0)
+   warning(_("core.fsmonitor is unset; "
+   "set it if you really want to "
+   "enable fsmonitor"));
+   add_fsmonitor(_index);
+   report(_("fsmonitor enabled"));
+   } else if (!fsmonitor) {
+   if (git_config_get_fsmonitor() == 1)
+   warning(_("core.fsmonitor is set; "
+   "remove it if you really want to "
+   "disable fsmonitor"));
+   remove_fsmonitor(_index);
+   report(_("fsmonitor disabled"));
+   }
+
if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.

This includes the core.fsmonitor setting, the query-fsmonitor hook,
and the fsmonitor index extension.

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  6 ++
 Documentation/githooks.txt   | 23 +++
 Documentation/technical/index-format.txt | 19 +++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index dc4e3f58a2..c196007a27 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -413,6 +413,12 @@ core.protectNTFS::
8.3 "short" names.
Defaults to `true` on Windows, and `false` elsewhere.
 
+core.fsmonitor::
+   If set, the value of this variable is used as a command which
+   will identify all files that may have changed since the
+   requested date/time. This information is used to speed up git by
+   avoiding unnecessary processing of files that have not changed.
+
 core.trustctime::
If false, the ctime differences between the index and the
working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 1bb4f92d4d..da82d64b0b 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -456,6 +456,29 @@ non-zero status causes 'git send-email' to abort before 
sending any
 e-mails.
 
 
+[[fsmonitor-watchman]]
+fsmonitor-watchman
+~~~
+
+This hook is invoked when the configuration option core.fsmonitor is
+set to .git/hooks/fsmonitor-watchman.  It takes two arguments, a version
+(currently 1) and the time in elapsed nanoseconds since midnight,
+January 1, 1970.
+
+The hook should output to stdout the list of all files in the working
+directory that may have changed since the requested time.  The logic
+should be inclusive so that it does not miss any potential changes.
+The paths should be relative to the root of the working directory
+and be separated by a single NUL.
+
+Git will limit what files it checks for changes as well as which
+directories are checked for untracked files based on the path names
+given.
+
+The exit status determines whether git will use the data from the
+hook to limit its search.  On error, it will fall back to verifying
+all files and folders.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index ade0b0c445..db3572626b 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,22 @@ The remaining data of each directory block is grouped by 
type:
 in the previous ewah bitmap.
 
   - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the core.fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+   time which is stored as the nanoseconds elapsed since midnight,
+   January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+is not CE_FSMONITOR_VALID.
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

Add a new command line option (-f) to ls-files to have it use lowercase
letters for 'fsmonitor valid' files

Signed-off-by: Ben Peart 
---
 builtin/ls-files.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1339e6d17..313962a0c1 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -31,6 +31,7 @@ static int show_resolve_undo;
 static int show_modified;
 static int show_killed;
 static int show_valid_bit;
+static int show_fsmonitor_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
@@ -86,7 +87,8 @@ static const char *get_tag(const struct cache_entry *ce, 
const char *tag)
 {
static char alttag[4];
 
-   if (tag && *tag && show_valid_bit && (ce->ce_flags & CE_VALID)) {
+   if (tag && *tag && ((show_valid_bit && (ce->ce_flags & CE_VALID)) ||
+   (show_fsmonitor_bit && (ce->ce_flags & CE_FSMONITOR_VALID {
memcpy(alttag, tag, 3);
 
if (isalpha(tag[0])) {
@@ -515,6 +517,8 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
N_("identify the file status with tags")),
OPT_BOOL('v', NULL, _valid_bit,
N_("use lowercase letters for 'assume unchanged' 
files")),
+   OPT_BOOL('f', NULL, _fsmonitor_bit,
+   N_("use lowercase letters for 'fsmonitor clean' 
files")),
OPT_BOOL('c', "cached", _cached,
N_("show cached files in the output (default)")),
OPT_BOOL('d', "deleted", _deleted,
@@ -584,7 +588,7 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
for (i = 0; i < exclude_list.nr; i++) {
add_exclude(exclude_list.items[i].string, "", 0, el, 
--exclude_args);
}
-   if (show_tag || show_valid_bit) {
+   if (show_tag || show_valid_bit || show_fsmonitor_bit) {
tag_cached = "H ";
tag_unmerged = "M ";
tag_removed = "R ";
-- 
2.14.1.548.ge54b1befee.dirty

[PATCH v6 00/12] Fast git status via a file system watcher

This is a fairly significant rewrite since V5. The big changes include:

Multiple functions including preload-index(), ie_match_stat(), and
refresh_cache_ent() have been updated to honor the CE_FSMONITOR_VALID bit
following the same pattern as skip_worktree and CE_VALID.  As a result,
performance improvements apply to all git commands that would otherwise
have had to scan the entire working directory.

core.fsmonitor is now a registered command (instead of a hook) to
provide additional flexibility.  It is called when needed to ensure the
state of the index is up-to-date.

The Watchman integration script is now entirely written in perl to
minimize spawning additional helper commands.  This along with the other
changes have helped reduce the overhead and made the extension applicable
to more (ie smaller) repos.

There are additional opportunities for performance improvements but I
wanted to get this version out there and then build on it as the
foundation.  Some potential examples of future patches include:

 - call the integration script on a background thread so that it can
   execute in parallel.

 - optimize traverse trees by pruning out entire branches that do not
   contain any changes.

Other optimizations likely exist where knowledge that files have not
changed can be used to short circuit some of the normal workflow.

Performance
===

With the various enhancements, performance has been improved especially
for smaller repos.  The included perf test compares status times without
fsmonitor to those with fsmonitor using the provided Watchman integration
script.

Due to the overhead of calling out to Watchman, on very small repos
(<10K files) the overhead exceeds the savings.  Once repos hit 10K files
the savings kick in and for repos beyond that, the savings are dramatic.

Test with 10,000 files   this tree

7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 0.35(0.03+0.04)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)0.37(0.00+0.09)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   0.43(0.03+0.06)
7519.6: status (fsmonitor=)  0.45(0.00+0.07)
7519.7: status -uno (fsmonitor=) 0.40(0.03+0.07)
7519.8: status -uall (fsmonitor=)0.44(0.04+0.04)

Test with 100,000 files  this tree

7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 0.33(0.01+0.03)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)0.36(0.00+0.06)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   0.93(0.00+0.07)
7519.6: status (fsmonitor=)  2.66(0.04+0.03)
7519.7: status -uno (fsmonitor=) 2.44(0.01+0.06)
7519.8: status -uall (fsmonitor=)2.94(0.03+0.07)

Test with 1,000,000 filesthis tree
-
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 1.45(0.00+0.06)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)0.88(0.01+0.04)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   6.14(0.03+0.04)
7519.6: status (fsmonitor=)  
25.91(0.04+0.06)
7519.7: status -uno (fsmonitor=) 
23.96(0.04+0.03)
7519.8: status -uall (fsmonitor=)
28.81(0.00+0.07)

Note: all numbers above are with a warm disk cache on a fast SSD, real
world performance numbers are often dramatically better as fsmonitor can
eliminate all the file IO to lstat every file and then traverse the
working directory looking for untracked files.  For example, a cold
status without fsmonitor on a HDD with 1M files takes 1m22.774s

$ time git -c core.fsmonitor= status
On branch p0006-ballast

It took 2.09 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

real1m22.774s
user0m0.000s
sys 0m0.000s


Ben Peart (12):
  bswap: add 64 bit endianness helper get_be64
  preload-index: add override to enable testing preload-index
  update-index: add a new --force-write-index option
  fsmonitor: teach git to optionally utilize a file system monitor to
speed up detecting new or changed files.
  fsmonitor: add documentation for the fsmonitor extension.
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  update-index: add fsmonitor support to update-index
  fsmonitor: add a test tool to dump the index extension
  split-index: disable the fsmonitor extension when

[PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart 
---
 Makefile   |   1 +
 apply.c|   2 +-
 builtin/update-index.c |   2 +
 cache.h|  10 +-
 config.c   |  14 +++
 config.h   |   1 +
 diff-lib.c |   2 +
 dir.c  |  27 --
 dir.h  |   2 +
 entry.c|   4 +-
 environment.c  |   1 +
 fsmonitor.c| 253 +
 fsmonitor.h|  61 
 preload-index.c|   6 +-
 read-cache.c   |  49 --
 submodule.c|   2 +-
 unpack-trees.c |   8 +-
 17 files changed, 419 insertions(+), 26 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index f2bb7f2f63..9d6ec9c1e9 100644
--- a/Makefile
+++ b/Makefile
@@ -786,6 +786,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/apply.c b/apply.c
index 71cbbd141c..9061cc5f15 100644
--- a/apply.c
+++ b/apply.c
@@ -3399,7 +3399,7 @@ static int verify_index_match(const struct cache_entry 
*ce, struct stat *st)
return -1;
return 0;
}
-   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR);
 }
 
 #define SUBMODULE_PATCH_WITHOUT_INDEX 1
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e1ca0759d5..6f39ee9274 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -16,6 +16,7 @@
 #include "pathspec.h"
 #include "dir.h"
 #include "split-index.h"
+#include "fsmonitor.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -233,6 +234,7 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+   mark_fsmonitor_invalid(_index, active_cache[pos]);
cache_tree_invalidate_path(_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
diff --git a/cache.h b/cache.h
index a916bc79e3..eccab968bd 100644
--- a/cache.h
+++ b/cache.h
@@ -203,6 +203,7 @@ struct cache_entry {
 #define CE_ADDED (1 << 19)
 
 #define CE_HASHED(1 << 20)
+#define CE_FSMONITOR_VALID   (1 << 21)
 #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED(1 << 23)
 
@@ -326,6 +327,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +346,7 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   uint64_t fsmonitor_last_update;
 };
 
 extern struct index_state the_index;
@@ -679,8 +682,10 @@ extern void *read_blob_data_from_index(const struct 
index_state *, const char *,
 #define CE_MATCH_IGNORE_MISSING0x08
 /* enable stat refresh */
 #define CE_MATCH_REFRESH   0x10
-extern int ie_match_stat(const struct index_state *, const struct cache_entry 
*, struct stat *, unsigned int);
-extern int ie_modified(const struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
+/* do stat comparison even if CE_FSMONITOR_VALID is true */
+#define

Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

Hi Junio,

On Fri, 15 Sep 2017, Junio C Hamano wrote:

> * js/rebase-i-final (2017-07-27) 10 commits
>  - rebase -i: rearrange fixup/squash lines using the rebase--helper
>  - t3415: test fixup with wrapped oneline
>  - rebase -i: skip unnecessary picks using the rebase--helper
>  - rebase -i: check for missing commits in the rebase--helper
>  - t3404: relax rebase.missingCommitsCheck tests
>  - rebase -i: also expand/collapse the SHA-1s via the rebase--helper
>  - rebase -i: do not invent onelines when expanding/collapsing SHA-1s
>  - rebase -i: remove useless indentation
>  - rebase -i: generate the script via rebase--helper
>  - t3415: verify that an empty instructionFormat is handled as before
> 
>  The final batch to "git rebase -i" updates to move more code from
>  the shell script to C.
> 
>  Expecting a reroll.

Please stop stating that you expect a reroll for rebase-i-extra when you
explicitly stated months ago that you would not take my v6. It gets a bit
annoying.

Thanks,
Dscho

Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

Hi,

Michael Haggerty wrote:

> If you pass a newly-initialized or newly-cleared `string_list` to
> `for_each_string_list_item()`, then the latter does
>
> for (
> item = (list)->items; /* note, this is NULL */
> item < (list)->items + (list)->nr; /* note: NULL + 0 */
> ++item)
>
> Even though this probably works almost everywhere, it is undefined
> behavior, and it could plausibly cause highly-optimizing compilers to
> misbehave.

Wait, NULL + 0 is undefined behavior?

*checks the standard*  C99 section 6.5.6.8 says

"If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined."

C99 section 7.17.3 says

"NULL

which expands to an implementation-defined null pointer constant"

6.3.2.3.3 says

"An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer
constant.  If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is
guaranteed to compare unequal to a pointer to any object or
function."

NULL doesn't point to anything so it looks like adding 0 to a null
pointer is indeed undefined.  (As a piece of trivia, strictly speaking
NULL + 0 would be undefined on some implementations and defined on
others, since an implementation is permitted to #define NULL to 0.)

So Coverity is not just warning because it is not able to guarantee
that list->nr is 0.  Huh.

> It would be a pain to have to change the signature of this macro, and
> we'd prefer not to add overhead to each iteration of the loop. So
> instead, whenever `list->items` is NULL, initialize `item` to point at
> a dummy `string_list_item` created for the purpose.

What signature change do you mean?  I don't understand what this
paragraph is alluding to.

> This problem was noticed by Coverity.
>
> Signed-off-by: Michael Haggerty 
> ---
[...]
>  string-list.c | 2 ++
>  string-list.h | 7 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)

Does the following alternate fix work?  I think I prefer it because
it doesn't require introducing a new global.

Thanks,
Jonathan

diff --git i/string-list.h w/string-list.h
index 29bfb7ae45..dae33fbb89 100644
--- i/string-list.h
+++ w/string-list.h
@@ -33,7 +33,9 @@ typedef int (*string_list_each_func_t)(struct 
string_list_item *, void *);
 int for_each_string_list(struct string_list *list,
 string_list_each_func_t, void *cb_data);
 #define for_each_string_list_item(item,list) \
-   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+   for (item = (list)->items; \
+(list)->items && item < (list)->items + (list)->nr; \
+++item)

 /*
  * Apply want to each item in list, retaining only the ones for which

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

Johannes Schindelin  writes:

> If you want *contributors* to ping the thread themselves, how about
> *posting your updates there, too*?

I do not understand this comment at all.  That is what I and others
already and always do by responding to the patches, and when trying
to see if a topic is still alive, with thread-specific responses and
pings.

If you are demanding that "What's cooking" report to be sent as a
response to all the topics, that will *NOT* going to happen.  It is
meant to give a summary of the current state to help contributors
and reviewers be aware of what is happening across the entire tree
and not limited to a specific topic.

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

Michael J Gruber  writes:

> I did not look up the discussion preceeding 4f21454b55 ("merge-base:
> handle --fork-point without reflog", 2016-10-12), but if "merge-base
> --fork-point" were about a "strict reflog" notion then there was nothing
> to fix back then - no reflog, no merge-base candidates. Period.
>
> I don't mind having two modes, say "--reflog" (strict reflog notion) and
> "--fork-point" (reflog plus DAG), but the current implementation is
> neither, and the current documentation clearly is the latter, which is
> what I'm trying to bring the implementaion in line with. Strict mode
> would need a revert of 4f21454b55 (which adds the tip of ref if the
> reflog is empty) for that mode.

Thanks for pointing out a flaw in the logic in my response.

When the user runs "merge-base --fork-point Branch Derived", the
question she is asking is: "what is the merge-base between the
Derived and a virtual commit across all the commits that we know
were at the tip of the Branch at some point and is the merge-base
commit among one of these historical tips of Branch?"

You are correct to point out that loosening the definition of the
"--fork-point" to lose the "and is the merge-base among the one of
these historical tips?" half of the question may be useful, and we
need to introduce "strict" vs "non-strict" modes in order to allow
such a distinction.  I somehow thought that giving and not giving
the "--fork-point" option would be a sufficient clue to express that
distinction, but that is not the same thing and my reasoning was
flawed.  Even when the exact answer of the more strict version of
the "--fork-point" (i.e. what the current implementation computes)
is not available because of missing reflog entries, the merge-base
computation that takes available reflog entries into account but
that does not insist that the resulting base must be among the known
historical tips (i.e. what your patch 2/3 wants to compute) could be
closer to the fork-point than "merge-base Branch Derived" without
"--fork-point" option would compute, and could be more useful as a
"best --onto commit that is guessed automatically" for the purpose
of "rebase".  I agree that there is a value in what your patch 2/3
wants to do when the current one that is more strict would say
"there is no known fork-point"---we would gain a way to say "... but
this is the best guess based on available data that may be better
than getting no answer." which we lack.

Having said all that, I do not agree with your last sentence in the
paragraph I quoted above.  It is a mere implementation detail to
consult the reflog to find out the set of "historical tips of the
Branch"; the current tip by definition is among the commits in that
set, even when the reflog of Branch is missing.  What 4f21454b55 did
was a reasonable "fix" that is still in line with the definition of
"--fork-point" from that point of view.

Whether we add a "looser" version of "--fork-point" to the system or
not, the more strict version should still use the current tip as one
of the historical tips (i.e. those that we would take from the
reflog if the reflog were not empty) in the more "strict" mode.  The
looser version may also want to do so as well.

Thanks.

RE: Is finalize_object_file in sha1_file.c handling errno from "rename" correctly?

2017-09-15 Thread Wesley Smith

Junio,

Thanks for your response.  I'm glad to see that you've been able to understand 
the problem.  I'm working with the Windows git team to properly return EACCESS 
when "rename" fails due to access permissions, but it also sounds like there 
will need to be a fix to finalize_object_file to better handle the case of an 
existing file when renaming.

Wesley Smith
T: 503.597.0556 | wsm...@greatergiving.com

-Original Message-
From: Junio C Hamano [mailto:gits...@pobox.com] 
Sent: Thursday, September 14, 2017 11:51 PM
To: Wesley Smith
Cc: git@vger.kernel.org
Subject: Re: Is finalize_object_file in sha1_file.c handling errno from 
"rename" correctly?

Wesley Smith  writes:

> 1) This bug is triggered because "git fetch" is causing a pack file to 
> be written when that same pack file already exists.  It seems like 
> this is harmless and shouldn't cause a problem.  Is that correct?

The final name of the packfile is derived from the entire contents of the 
packfile; it should be harmless when we attempt to rename a new file, which has 
exactly the same contents as an existing file, to the existing file and see a 
failure out of that attempt.

> 2) It seems that finalize_object_file is not accounting for the fact 
> that "link" will return EEXIST if the destination file already exists 
> but is not writeable, whereas "rename" will return EACCESS in this 
> case.  Is that correct?  If so, should finalize_object_file be fixed 
> to account for this? Perhaps it should check if the newfile exists 
> before calling rename.  Or, should the Windows mingw_rename function 
> be modified to return EEXIST in this case, even though that's not the 
> standard errno for that situation?

The codepath that is triggered by OBJECT_CREATION_USES_RENAMES ought to behave 
correctly even on non-Windows platforms, so bending the error code of rename() 
only on Windows to fit the existing error handling would not be a smart thing 
to do.  Rather, the rename() emulation should leave a correct errno and the 
caller should be updated to be aware of that error that is not EEXIST, which it 
currently knows about.

Thanks for spotting a problem and digging to its cause.

This is a #leftoverbits tangent, and should be done separately from your 
"OBJECT_CREATION_USES_RENAMES is broken" topic, but I think it is a bug to use 
finalize_object_file() directly to "finalize"
anything but an individual loose object file in the first place.

We should create a new shared helper that does what the function currently 
does, make finalize_object_file() call that new shared helper, and make sure 
finalize_object_file() is called only on a newly created loose object file.  
The codepath that creates a new packfile and other things and moves them to the 
final name should not call finalize_object_file() but the new shared helper 
instead.

That way, we could later implement the "collision? check" alluded by the 
in-code comment in finailize_object_file(), and we won't have to worry about 
affecting callers other than the one that creates a loose object file with such 
an enhancement.

Re: [PATCH] Fix merge parent checking with svn.pushmergeinfo.

Hi,

Jason Merrill wrote:

> Subject: Fix merge parent checking with svn.pushmergeinfo.
>
> Without this fix, svn dcommit of a merge with svn.pushmergeinfo set would
> get error messages like "merge parent  for  is on branch
> svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
> svn+ssh://ja...@gcc.gnu.org/svn/gcc!"
>
> * git-svn.perl: Remove username from rooturl before comparing to branchurl.
>
> Signed-off-by: Jason Merrill 

Interesting.  Thanks for writing it.

Could there be a test for this to make sure this doesn't regress in
the future?  See t/t9151-svn-mergeinfo.sh for some examples.

Nit: git doesn't use GNU-style changelogs, preferring to let the code
speak for itself.  Maybe it would work better as the subject line?
E.g. something like

git-svn: remove username from root before comparing to branch URL

Without this fix, ...

Signed-off-by: ...

> ---
>  git-svn.perl | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index fa42364785..1663612b1c 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -931,6 +931,7 @@ sub cmd_dcommit {
>   # information from different SVN repos, and paths
>   # which are not underneath this repository root.
>   my $rooturl = $gs->repos_root;
> + Git::SVN::remove_username ($rooturl);

style nit: Git doesn't include a space between function names and
their argument list.

I wonder if it would make sense to rename the $rooturl variable
since now it is not the unmodified root. E.g. how about

my $expect_url = $gs->repos_root;
Git::SVN::remove_username($expect_url);
...

>   foreach my $d (@$linear_refs) {
>   my %parentshash;
>   read_commit_parents(\%parentshash, $d);

The rest looks good.

Thanks and hope that helps,
Jonathan

RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.

2017-09-15 Thread Kevin Willford

From: Junio C Hamano [mailto:gits...@pobox.com]
Sent: Thursday, September 14, 2017 11:00 PM
> 
> Kevin Willford  writes:
> 
> > 1. Does this statement, "I only care about the files in this
> > sparse checkout, and do not concern me with anything else", mean
> > that git should not change files outside the sparse-checkout whether
> > that be in the working directory or in the index?  Or does that only
> > apply to the working directory and the index version of files can
> > change to whatever git the git command would do without using
> > sparse?  For example if I am doing a 'git reset HEAD~1'  should the
> > version in the index of files outside the sparse not be changed or
> > change to the HEAD~1 version with the skip-worktree bit on?
> 
> My understanding of the purpose of "sparse checkout" thing is that
> the user still wants to create correct whole-tree commit even the
> user does not have the whole-tree checkout.  The object names for
> blobs recorded in the index that are meant to be included in the
> next commit MUST be the same as those that would be in the index
> when the "sparse" feature is not in use.  "reset HEAD~1" should
> match the index entries to the tree entries in HEAD~1.  So, the
> latter, I would think, among your two alternatives.
> 
> IOW, after "git reset HEAD~", if you drop the skip-worktree bit from
> all index entries, "git diff --cached HEAD" must say "there is no
> changes".
> 
> The only difference between the "sparse" and normal would be that,
> because the "sparse" user does not intend to change anything outside
> the "sparse" area, these paths outside her "sparse" area would not
> materialize on the filesystem.  For the next "write-tree" out of the
> index to still write the correct tree out, the entries outside her
> "sparse" area in the index MUST match the tree of the commit she
> started working from.
> 

Makes sense.  And even though the reset might only change entries
outside the sparse area and the next status will report "nothing to commit,
working tree clean",  that's okay because the user hasn't changed
anything in their sparse area and intended to roll back the index to
whatever they specified in their reset command.

> > 2. How will this work with other git commands like merge, rebase,
> > cherry-pick, etc.?
> > 3. What about when there is a merge conflict with a file that is outside
> > the sparse checkout?
> 
> I would say, rather than forbidding such a merge, it should let her
> see and deal with the conflict by dropping the "this is outside the
> sparse area, so do not bother materializing it to the filesystem"
> bit, but tell her loudly what it did ("I checked out a half-merged
> file outside your sparse-checkout area because you'd need it while
> resolving the conflict").  By doing things that way, the user can
> decide if she wants to go ahead and complete the merge, even if the
> conflict is outside the area she is currently interested in, or
> postpone the merge and continue working on what she has been working
> on inside the narrowed-down area first.
> 
> I do not have a strong opinion whether the sparse-checkout
> configuration file should be adjusted to match when the command must
> tentatively bust the sparse checkout area; I'd imagine it can be
> argued either way.
> 
> Note that "sparse" is not my itch, and I would not be surprised if
> those who designed it may want it to work differently from my
> knee-jerk reaction in the previous two paragraphs, and I may even
> find such an alternative solution preferable.
> 
> But it is highly unlikely for any sensible solution would violate
> the basic premise, i.e. "the indexed contents will stay the same as
> the case without any 'sparse', so the next write-tree will do the
> right thing".

There was one other case that I thought about while implementing
this approach and it is when the user creates a file that is outside their
sparse definition.  From your explanation above I will attempt to
explain how I think it should work and please correct me if you see
it working differently.

The user creates a file that is outside the sparse area and it will show
up as untracked.  No problem here since the untracked are outside the
scope of using sparse.  Next the user adds the untracked file to the index.
The skip-worktree bit should be off and stay off since the user could make
additional changes and want to add them.  Once the user commits the
newly created file, I could see turning on the skip-worktree bit and
removing the file from the working tree after the commit but since
this was a user initiated action and it is sparse "checkout" it makes
sense to wait until the next "checkout" command which will use the
sparse definition to clean this file from the working directory and
turn on the skip-worktree flag.  If the user wants to continue to use
the new file, they would need to update their sparse-checkout file
to include the newly created file so that the next checkout will not

Re: [PATCH 0/3] Improve abbreviation disambiguation

Hi,

Derrick Stolee wrote:

> This is my first patch submission, and I look forward to your feedback.

Thanks for writing this.  Looks exciting.

[...]
> When displaying object ids, we frequently want to see an abbreviation
[etc]
> Note that performance improves in all cases, but the performance gain
> is larger when there are multiple, large pack-files. This gain comes
> from the lack of in-memory caching of index files that have already been
> inspected.

Can this (especially the performance information) go in the commit
message for one of the patches?

Otherwise it is harder for people to track down when looking at these
changes later with "git log".  In the worst case, if the mailing list
archive is down, then people would not have access to this information
at all.  For that reason, I try (though I often fail!) to restrict
cover letters to giving hints at e.g. what changed since the patch
series last visited the list, and to make the commit messages in the
patches themselves as self-contained as possible.

That aside, looking forward to reading the patches themselves in more
detail.  Thanks for working on this.

Sincerely,
Jonathan

[PATCH] Fix merge parent checking with svn.pushmergeinfo.

Without this fix, svn dcommit of a merge with svn.pushmergeinfo set would
get error messages like "merge parent  for  is on branch
svn+ssh://gcc.gnu.org/svn/gcc/trunk, which is not under the git-svn root
svn+ssh://ja...@gcc.gnu.org/svn/gcc!"

* git-svn.perl: Remove username from rooturl before comparing to branchurl.

Signed-off-by: Jason Merrill 
---
 git-svn.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/git-svn.perl b/git-svn.perl
index fa42364785..1663612b1c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -931,6 +931,7 @@ sub cmd_dcommit {
# information from different SVN repos, and paths
# which are not underneath this repository root.
my $rooturl = $gs->repos_root;
+   Git::SVN::remove_username ($rooturl);
foreach my $d (@$linear_refs) {
my %parentshash;
read_commit_parents(\%parentshash, $d);
-- 
2.13.5

[PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()

Create helper program test-abbrev to compute the minimum length of a
disambiguating short-sha for 100,000 object ids. The ids are created
by iterating an unsigned int hash_base by a constant hash_delta and
copying hash_base five times across the sha1. Iterating by hash_delta
does not create a duplicate value for over 10,000,000 iterations.

test-abberv demonstrates the performance improvements that will be
shown by later improvements to the find_unique_abberv(). The value of
100,000 is large enough to show the significance of the later
improvements while only taking a few seconds on large repos.

Signed-off-by: Derrick Stolee 
---
 Makefile   |  1 +
 t/helper/.gitignore|  1 +
 t/helper/test-abbrev.c | 23 +++
 t/perf/p0008-abbrev.sh | 12 
 4 files changed, 37 insertions(+)
 create mode 100644 t/helper/test-abbrev.c
 create mode 100755 t/perf/p0008-abbrev.sh

diff --git a/Makefile b/Makefile
index f2bb7f2f6..081ca05e8 100644
--- a/Makefile
+++ b/Makefile
@@ -633,6 +633,7 @@ X =
 
 PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
+TEST_PROGRAMS_NEED_X += test-abbrev
 TEST_PROGRAMS_NEED_X += test-chmtime
 TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 721650256..80ce7d836 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -1,3 +1,4 @@
+/test-abbrev
 /test-chmtime
 /test-ctype
 /test-config
diff --git a/t/helper/test-abbrev.c b/t/helper/test-abbrev.c
new file mode 100644
index 0..cb3551df9
--- /dev/null
+++ b/t/helper/test-abbrev.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+   setup_git_directory();
+
+   unsigned int hash_delt = 0x13579BDF;
+   unsigned int hash_base = 0x01020304;
+   struct object_id oid;
+
+   int i, count = 0;
+   int n = sizeof(struct object_id) / sizeof(int);
+   while (count++ < 10) {
+   for (i = 0; i < n; i++)
+   ((unsigned int*)oid.hash)[i] = hash_base;
+
+   find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+
+   hash_base += hash_delt;
+   }
+
+   exit(0);
+}
diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
new file mode 100755
index 0..7c3fad807
--- /dev/null
+++ b/t/perf/p0008-abbrev.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+test_description='Test object disambiguation through abbreviations'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_perf 'find_unique_abbrev()' '
+   test-abbrev
+'
+
+test_done
-- 
2.14.1.538.g56ec8fc98.dirty

[PATCH 3/3] sha1_name: Parse less while finding common prefix

Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.

This change decreases the time to run test-abbrev by up to 40% on
large repos.

Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f2a1ebe49..bb47b6702 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,13 +480,22 @@ struct min_abbrev_data {
char *hex;
 };
 
+static inline char get_hex_char_from_oid(const struct object_id *oid, int i)
+{
+   static const char hex[] = "0123456789abcdef";
+
+   if ((i & 1) == 0)
+   return hex[oid->hash[i >> 1] >> 4];
+   else
+   return hex[oid->hash[i >> 1] & 0xf];
+}
+
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
struct min_abbrev_data *mad = cb_data;
 
-   char *hex = oid_to_hex(oid);
unsigned int i = mad->init_len;
-   while (mad->hex[i] && mad->hex[i] == hex[i])
+   while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
i++;
 
if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
-- 
2.14.1.538.g56ec8fc98.dirty

[PATCH 0/3] Improve abbreviation disambiguation

Hello,

My name is Derrick Stolee and I just switched teams at Microsoft from
the VSTS Git Server to work on performance improvements in core Git.

This is my first patch submission, and I look forward to your feedback.

Thanks,
 Stolee


When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

A performance helper `test-abbrev` and performance test `p0008-abbrev.sh`
are added to demonstrate this performance improvement. Here are some
performance results for the three included commits, using
GIT_PERF_REPEAT_COUNT=10 since the first test is frequently an outlier
due to the file cache being cold.

Running git on a Linux VM, we see the following gains.

| Repo| Pack-Files | Loose Objs | Baseline | Patch 2 | Patch 3 |
|-|||--|-|-|
| Git.git | 1  | 0  | 0.46 s   | -87.0%  | -87.0%  |
| Git.git | 5  | 0  | 1.04 s   | -84.6%  | -85.6%  |
| Git.git | 4  | 75852  | 0.88 s   | -86.4%  | -86.4%  |
| Linux   | 1  | 0  | 0.63 s   | -38.1%  | -69.8%  |
| Linux   | 24 | 0  | 5.41 s   | -69.3%  | -71.5%  |
| Linux   | 23 | 323441 | 5.41 s   | -70.6%  | -73.4%  |

Running a similar patch on Git for Windows, we see the following gains.

| Repo  | Pack-Files | Loose | Baseline | Patch 2 | Patch 3 |
|---||---|--|-|-|
| GitForWindows | 6  | 319   | 7.19 s   | -91.1%  | -91.5%  |
| VSTS  | 3  | 38| 7.83 s   | -88.9%  | -90.9%  |
| Linux | 3  | 0 | 7.92 s   | -87.9%  | -90.2%  |
| Windows   | 50 | 219   | 17.8 s   | -98.6%  | -98.6%  |

Note that performance improves in all cases, but the performance gain
is larger when there are multiple, large pack-files. This gain comes
from the lack of in-memory caching of index files that have already been
inspected.


Derrick Stolee (3):
  sha1_name: Create perf test for find_unique_abbrev()
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix

 Makefile   |  1 +
 sha1_name.c| 66 ++
 t/helper/.gitignore|  1 +
 t/helper/test-abbrev.c | 22 +
 t/perf/p0008-abbrev.sh | 12 +
 5 files changed, 87 insertions(+), 15 deletions(-)
 create mode 100644 t/helper/test-abbrev.c
 create mode 100755 t/perf/p0008-abbrev.sh

-- 
2.14.1.538.g56ec8fc98.dirty

[PATCH 2/3] sha1_name: Unroll len loop in find_unique_abbrev_r

Unroll the while loop inside find_unique_abbrev_r to avoid iterating
through all loose objects and packfiles multiple times when the short
name is longer than the predicted length.

Instead, inspect each object that collides with the estimated
abbreviation to find the longest common prefix.

Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 57 ++---
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1ebe49 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -474,10 +474,32 @@ static unsigned msb(unsigned long val)
return r;
 }
 
-int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+struct min_abbrev_data {
+   unsigned int init_len;
+   unsigned int cur_len;
+   char *hex;
+};
+
+static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
-   int status, exists;
+   struct min_abbrev_data *mad = cb_data;
+
+   char *hex = oid_to_hex(oid);
+   unsigned int i = mad->init_len;
+   while (mad->hex[i] && mad->hex[i] == hex[i])
+   i++;
+
+   if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+   mad->cur_len = i + 1;
+
+   return 0;
+}
 
+int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+{
+   struct disambiguate_state ds;
+   struct min_abbrev_data mad;
+   struct object_id oid_ret;
if (len < 0) {
unsigned long count = approximate_object_count();
/*
@@ -503,19 +525,24 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
*sha1, int len)
sha1_to_hex_r(hex, sha1);
if (len == GIT_SHA1_HEXSZ || !len)
return GIT_SHA1_HEXSZ;
-   exists = has_sha1_file(sha1);
-   while (len < GIT_SHA1_HEXSZ) {
-   struct object_id oid_ret;
-   status = get_short_oid(hex, len, _ret, GET_OID_QUIETLY);
-   if (exists
-   ? !status
-   : status == SHORT_NAME_NOT_FOUND) {
-   hex[len] = 0;
-   return len;
-   }
-   len++;
-   }
-   return len;
+
+   if (init_object_disambiguation(hex, len, ) < 0)
+   return -1;
+
+   mad.init_len = len;
+   mad.cur_len = len;
+   mad.hex = hex;
+
+   ds.fn = extend_abbrev_len;
+   ds.always_call_fn = 1;
+   ds.cb_data = (void*)
+
+   find_short_object_filename();
+   find_short_packed_object();
+   (void)finish_object_disambiguation(, _ret);
+
+   hex[mad.cur_len] = 0;
+   return mad.cur_len;
 }
 
 const char *find_unique_abbrev(const unsigned char *sha1, int len)
-- 
2.14.1.538.g56ec8fc98.dirty

Re: [PATCH 1/2] test-lib: group system specific FIFO tests by system

2017-09-15 Thread Ramsay Jones

On 15/09/17 11:31, Michael J Gruber wrote:
> Johannes Schindelin venit, vidit, dixit 15.09.2017 00:21:
>> Hi Michael,
>>
>> On Thu, 14 Sep 2017, Michael J Gruber wrote:
>>
>>> test-lib determines whether a file-system supports FIFOs and needs to do
>>> special casing for CYGWIN and MINGW. This separates those system
>>> specific settings from those at more central place.
>>>
>>> Set mkfifo()  to false in the central system specific place so that the
>>> same test works everywhere.
>>
>> The mkfifo() emulation of Cygwin seems to work, no? I think it works even
>> in MSYS2, but not in MINGW.
>>
>> So maybe this patch should affect only the MINGW arm?
> 
> I only reorganised the code, so in that sense the patch does not affect
> any system ;)
> 
> If indeed mkfifo works on CYGWIN than a separate patch should remove the
> exclusion of CYGWIN; alas, I can't confirm (I wish MS still had the old
> academic alliance programme).

I haven't tried recently, but back in 2013 Mark Levedahl wrote
in his commit message that mkfifo was known not to work on the cygwin
mailing list. (see commit 9443605b5d, "test-lib.sh - cygwin does
not have usable FIFOs", 04-07-2013) In fact, you can see that this
was disabled on cygwin before MinGW!

It is possible, of course, that fifos now work on cygwin. (I will
put it on my TODO list).

In any event, this is indeed a separate issue.

ATB,
Ramsay Jones

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

Hi Junio,

On Thu, 24 Aug 2017, Junio C Hamano wrote:

> Lars Schneider  writes:
> 
> > On 22 Aug 2017, at 21:56, Junio C Hamano  wrote:
> >
> >> Here are the topics that have been cooking.  Commits prefixed with
> >> '-' are only in 'pu' (proposed updates) while commits prefixed with
> >> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> >> the integration branches, but I am still holding onto them.
> >> 
> >> The second batch of topics are in.  This cycle is going a bit slower
> >> than the previous one (as of mid-week #3 of this cycle, we have
> >> about 200 patches on 'master' since v2.14, compared to about 300
> >> patches in the cycle towards v2.14 at a similar point in the cycle),
> >> but hopefully we can catch up eventually.  
> >> 
> >> I am planning to be offline most of the next week, by the way.
> >> 
> >> You can find the changes described here in the integration branches
> >> of the repositories listed at
> >> 
> >>http://git-blame.blogspot.com/p/git-public-repositories.html
> >> 
> >> --
> >> [Graduated to "master"]
> >> 
> >
> > Hi Junio,
> >
> > just in case this got lost: I posted a patch with an improvement to 
> > 2841e8f ("convert: add "status=delayed" to filter process protocol", 
> > 2017-06-30) which was merged to master in the beginning of 2.15.
> >
> > https://public-inbox.org/git/20170820154720.32259-1-larsxschnei...@gmail.com/
> 
> Thanks for pinging, but next time ping the thread itself if it is
> about something that is not in What's cooking report you are
> responding to.

If you want *contributors* to ping the thread themselves, how about
*posting your updates there, too*?

It does make things inconvenient for contributors if they have to monitor
those Whats' cooking emails in addition to the mail threads, you know?

Ciao,
Dscho

[PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-15 Thread Michael Haggerty

If you pass a newly-initialized or newly-cleared `string_list` to
`for_each_string_list_item()`, then the latter does

for (
item = (list)->items; /* note, this is NULL */
item < (list)->items + (list)->nr; /* note: NULL + 0 */
++item)

Even though this probably works almost everywhere, it is undefined
behavior, and it could plausibly cause highly-optimizing compilers to
misbehave.

It would be a pain to have to change the signature of this macro, and
we'd prefer not to add overhead to each iteration of the loop. So
instead, whenever `list->items` is NULL, initialize `item` to point at
a dummy `string_list_item` created for the purpose.

This problem was noticed by Coverity.

Signed-off-by: Michael Haggerty 
---
Just a little thing I noticed in a Coverity report. This macro has
been broken since it was first introduced, in 2010.

This patch applies against maint. It is also available from my Git
fork [1] as branch `iter-empty-string-list`.

Michael

[1] https://github.com/mhagger/git

 string-list.c | 2 ++
 string-list.h | 7 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/string-list.c b/string-list.c
index 806b4c8723..7eacf6037f 100644
--- a/string-list.c
+++ b/string-list.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "string-list.h"
 
+struct string_list_item dummy_string_list_item;
+
 void string_list_init(struct string_list *list, int strdup_strings)
 {
memset(list, 0, sizeof(*list));
diff --git a/string-list.h b/string-list.h
index 29bfb7ae45..79bb78d80a 100644
--- a/string-list.h
+++ b/string-list.h
@@ -32,8 +32,11 @@ void string_list_clear_func(struct string_list *list, 
string_list_clear_func_t c
 typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 int for_each_string_list(struct string_list *list,
 string_list_each_func_t, void *cb_data);
-#define for_each_string_list_item(item,list) \
-   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+extern struct string_list_item dummy_string_list_item;
+#define for_each_string_list_item(item,list) \
+   for (item = (list)->items ? (list)->items : _string_list_item; \
+item < (list)->items + (list)->nr;  \
+++item)
 
 /*
  * Apply want to each item in list, retaining only the ones for which
-- 
2.14.1

Re: [PATCH v2] config: avoid "write_in_full(fd, buf, len) < len" pattern

2017-09-15 Thread Ramsay Jones

On 15/09/17 01:37, Jeff King wrote:
> On Thu, Sep 14, 2017 at 12:31:38AM +0100, Ramsay Jones wrote:
> 
>> I just tried it again tonight; the current master branch has 3532
>> warnings when compiled with -Wextra, 1409 of which are -Wsign-compare
>> warnings. After applying the patch below, those numbers are reduced
>> by 344 to 3188/1065.
> 
> I'd love it if we were clean on -Wextra. My big question is many
> contortions we'd have to go through in the code. I don't mind at all if
> we're actually cleaning as we go (e.g., using types of the correct
> signedness, or preventing possible funny interactions). I'm just worried
> it will turn into a bunch of noisy casts.

Hmm, my memory is not what it was, ... ;-) However, I seem to recall
that most of the changes were "improvements", with only a (relatively)
few scattering of casts (behind existing macros - namely the OPTION
macros). (Oh, wait, I think 'unused' parameters etc., was another
problem area).

> The patch you showed seems like an improvement to the code, but I don't
> know if that would be the case for all of them. :)

Yes, I've had that patch (and others like it) hanging around for
years! (perhaps it's time to dig through all those old branches)

ATB,
Ramsay Jones

[RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch

2017-09-15 Thread Heiko Voigt

To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 38b9905e43..fa44fc59f2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, )) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process *cp,
 
submodule = submodule_from_path(_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, )) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.14.1.145.gb3622a4

Re: commit-msg hook does not run on merge with --no-ff option

2017-09-15 Thread Joseph Dunne

Valid point.  The way my project is set up I always get a conflict on
merge operations, so technically all my merges (except fast forward
merges) end with a git-commit, which of course runs the commit-msg
hook.  It seems everything is working as designed.  Shame there isn't
a merge-msg hook.

It seems I have no choice but to work around this issue.  Thanks for your help.

On Thu, Sep 14, 2017 at 8:50 PM, Kaartic Sivaraam
 wrote:
> On Tue, 2017-09-12 at 13:24 -0500, Joseph Dunne wrote:
>> Sorry I don't understand your question.  The commit-msg hook runs
>> properly in all cases except when I perform a merge with the --no-ff
>> option enabled.
>>
>
> It's working just as the documentation says it does (emphasis mine),
>
>
> This hook is invoked by **git commit**, and can be bypassed with the 
> --no-verify option.
> It takes a single parameter, the name of the file that holds the proposed 
> commit log
> message. Exiting with a non-zero status causes the git commit to abort.
>
>
> It says that 'commit-msg' hook is invoked only for a "commit" (it's not
> a MERGE-msg hook you see, it doesn't exist anyway). In case you see the
> hook getting invoked for a merge then that's an issue, I guess. For
> what kind of merges do you see the 'commit-msg' hook getting invoked?
>
> --
> Kaartic

[RFC PATCH v2 1/2] implement fetching of moved submodules

2017-09-15 Thread Heiko Voigt

We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

NEEDSWORK: This breaks t5531 and t5545 because they do not use a
.gitmodules file. I will add a fallback to paths to help such users.

Signed-off-by: Heiko Voigt 
---
This an update of the previous series[1] to the current master. The
fallback is still missing but now it should not conflict with any topics
in flight anymore (hopefully).

Cheers Heiko

[1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/

 submodule.c | 91 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3cea8221e0..38b9905e43 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   oid_array_append(commits, >two->oid);
}
 }
 
@@ -735,11 +730,14 @@ static void collect_changed_submodules(struct string_list 
*changed,
 
while ((commit = get_revision())) {
struct rev_info diff_rev;
+   struct collect_changed_submodules_cb_data

Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support

2017-09-15 Thread Christian Couder

(It looks like I did not reply to this other email yet, sorry about
this late reply.)

On Wed, Jul 12, 2017 at 9:06 PM, Jonathan Tan  wrote:
> On Tue, 20 Jun 2017 09:54:34 +0200
> Christian Couder  wrote:
>
>> Git can store its objects only in the form of loose objects in
>> separate files or packed objects in a pack file.
>>
>> To be able to better handle some kind of objects, for example big
>> blobs, it would be nice if Git could store its objects in other object
>> databases (ODB).
>
> Thanks for this, and sorry for the late reply. It's good to know that
> others are thinking about "missing" objects in repos too.
>
>>   - "have": the helper should respond with the sha1, size and type of
>> all the objects the external ODB contains, one object per line.
>
> This should work well if we are not caching this "have" information
> locally (that is, if the object store can be accessed with low latency),
> but I am not sure if this will work otherwise.

Yeah, there could be problems related to caching or not caching the
"have" information.
As a repo should not send the blobs that are in an external odb, I
think it could be useful to cache the "have" information.
I plan to take a look and add related tests soon.

> I see that you have
> proposed a local cache-using method later in the e-mail - my comments on
> that are below.
>
>>   - "get ": the helper should then read from the external ODB
>> the content of the object corresponding to  and pass it to
>> Git.
>
> This makes sense - I have some patches [1] that implement this with the
> "fault_in" mechanism described in your e-mail.
>
> [1] 
> https://public-inbox.org/git/cover.1499800530.git.jonathanta...@google.com/
>
>> * Transfering information
>>
>> To tranfer information about the blobs stored in external ODB, some
>> special refs, called "odb ref", similar as replace refs, are used in
>> the tests of this series, but in general nothing forces the helper to
>> use that mechanism.
>>
>> The external odb helper is responsible for using and creating the refs
>> in refs/odbs//, if it wants to do that. It is free for
>> example to just create one ref, as it is also free to create many
>> refs. Git would just transmit the refs that have been created by this
>> helper, if Git is asked to do so.
>>
>> For now in the tests there is one odb ref per blob, as it is simple
>> and as it is similar to what git-lfs does. Each ref name is
>> refs/odbs// where  is the sha1 of the blob stored
>> in the external odb named .
>>
>> These odb refs point to a blob that is stored in the Git
>> repository and contain information about the blob stored in the
>> external odb. This information can be specific to the external odb.
>> The repos can then share this information using commands like:
>>
>> `git fetch origin "refs/odbs//*:refs/odbs//*"`
>>
>> At the end of the current patch series, "git clone" is teached a
>> "--initial-refspec" option, that asks it to first fetch some specified
>> refs. This is used in the tests to fetch the odb refs first.
>>
>> This way only one "git clone" command can setup a repo using the
>> external ODB mechanism as long as the right helper is installed on the
>> machine and as long as the following options are used:
>>
>>   - "--initial-refspec " to fetch the odb refspec
>>   - "-c odb..command=" to configure the helper
>
> A method like this means that information about every object is
> downloaded, regardless of which branches were actually cloned, and
> regardless of what parameters (e.g. max blob size) were used to control
> the objects that were actually cloned.
>
> We could make, say, one "odb ref" per size and branch - for example,
> "refs/odbs/master/0", "refs/odbs/master/1k", "refs/odbs/master/1m", etc.
> - and have the client know which one to download. But this wouldn't
> scale if we introduce different object filters in the clone and fetch
> commands.

Yeah, there are multiple ways to do that.

> I think that it is best to have upload-pack send this information
> together with the packfile, since it knows exactly what objects were
> omitted, and therefore what information the client needs. As discussed
> in a sibling e-mail, clone/fetch already needs to be modified to omit
> objects anyway.

I try to avoid sending this information as I don't think it is
necessary and it simplify things a lot to not have to change the
communication protocol.

Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support

2017-09-15 Thread Christian Couder

(It looks like I did not reply to this email yet, sorry about this late reply.)

On Thu, Jul 6, 2017 at 7:36 PM, Ben Peart  wrote:
>
> On 7/1/2017 3:41 PM, Christian Couder wrote:
>>
>> On Fri, Jun 23, 2017 at 8:24 PM, Ben Peart  wrote:
>>>
>>> Great to see this making progress!
>>>
>>> My thoughts and questions are mostly about the overall design tradeoffs.
>>>
>>> Is your intention to enable the ODB to completely replace the regular
>>> object
>>> store or just to supplement it?
>>
>> It is to supplement it, as I think the regular object store works very
>> well most of the time.
>
> I certainly understand the desire to restrict the scope of the patch series.
> I know full replacement is a much larger problem as it would touch much more
> of the codebase.
>
> I'd still like to see an object store that was thread safe, more robust (ie
> transactional) and hopefully faster so I am hoping we can design the ODB
> interface to eventually enable that.

I doubt that the way Git and the external odb helpers communicate in
process mode is good enough for multi-threading, so I think this would
require another communication mechanism altogether.

> For example: it seems the ODB helpers need to be able to be called before
> the regular object store in the "put" case (so they can intercept large
> objects for example) and after in in the "get" case to enable "fault-in."
> Something like this:
>
> have/get
> 
> git object store
> large object ODB helper
>
> put
> ===
> large object ODB helper
> git object store
>
> It would be nice if that order wasn't hard coded but that the order or level
> of the "git object store" could be specified using the same mechanism as
> used for the ODB helpers so that some day you could do something like this:
>
> have/get
> 
> "LMDB" ODB helper
> git object store
>
> put
> ===
> "LMDB" ODB helper
> git object store
>
> (and even further out, drop the current git object store completely :)).

Yeah, I understand that it could help.

>>> I think it would be good to ensure the
>>> interface is robust and performant enough to actually replace the current
>>> object store interface (even if we don't actually do that just yet).
>>
>>
>> I agree that it should be robust and performant, but I don't think it
>> needs to be as performant in all cases as the current object store
>> right now.
>>
>>> Another way of asking this is: do the 3 verbs (have, get, put) and the 3
>>> types of "get" enable you to wrap the current loose object and pack file
>>> code as ODBs and run completely via the external ODB interface?  If not,
>>> what is missing and can it be added?
>
> One example of what I think is missing is a way to stream objects (ie
> get_stream, put_stream).  This isn't used often in git but it did exist last
> I checked.  I'm not saying this needs to be supported in the first version -
> more if we want to support total replacement.

I agree and it seems to me that others have already pointed that the
streaming API could be used.

> I also wonder if we'd need an "optimize" verb (for "git gc") or a "validate"
> verb (for "git fsck").  Again, only if/when we are looking at total
> replacement.

Yeah, I agree that something might be useful for these commands.

>> Right now the "put" verb only send plain blobs, so the most logical
>> way to run completely via the external ODB interface would be to use
>> it to send and receive plain blobs. There are tests scripts (t0420,
>> t0470 and t0480) that use an http server as the external ODB and all
>> the blobs are stored in it.
>>
>> And yeah for now it works only for blobs. There is a temporary patch
>> in the series that limits it to blobs. For the non RFC patch series, I
>> think it should either use the attribute system to tell which objects
>> should be run via the external ODB interface, or perhaps there should
>> be a way to ask each external ODB helper which kind of objects and
>> blobs it can handle. I should add that in the future work part.
>
> Sounds good.  For GVFS we handle all object types (including commits and
> trees) so would need this to be enabled so that we can switch to using it.

Ok.

>>> _Eventually_ it would be great to see the current object store(s) moved
>>> behind the new ODB interface.
>>
>> This is not one of my goals and I think it could be a problem if we
>> want to keep the "fault in" mode.
>
>> In this mode the helper writes or reads directly to or from the
>> current object store, so it needs the current object store to be
>> available.
>
> I think implementing "fault in" should be an option that the ODB handler can
> implement but should not be required by the design/interface.  As you state
> above, this could be as simple as having the ODB handler write the object to
> the git object store on "get."

This is 'get_direct' since v5 and yeah it is optional.

>> Also I think compatibility with other git implementations is important
>> and it is a good thing that

Re: [PATCH v3] commit-template: change a message to be more intuitive

2017-09-15 Thread Kaartic Sivaraam

On Fri, 2017-09-15 at 12:00 +0200, Michael J Gruber wrote:
> Kaartic Sivaraam venit, vidit, dixit 15.09.2017 06:50:
> > 
> >  I didn't expect the least that this would go upto v3. In case anyboy finds

That's should have been 'anybody'.

> >  something wrong with this change too, it's a lot better to drop this 
> > altogether
> >  than going for a v4.
> 
> That happens more often than not :)
> 

:)

> Your original intent was to avoid any misunderstandings, and all the
> comments and iterations made sure that we don't trade one possible
> source of misunderstanding for another but avoid them all.
> 

Of course they did. 

> I consider v4 to be a strict improvement over the status quo and (as fas
> as I can see) to serve your original intent as good as possible.

I thought I shouldn't go for a v4 as I feared it might make things
worse than better because the original sentence wasn't that confusing
in the first place ;-)

-- 
Kaartic

[PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-15 Thread Heiko Voigt

When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for recursive submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Lets add a test to document.

Signed-off-by: Heiko Voigt 
---
On Fri, Aug 18, 2017 at 12:04:03PM -0700, Stefan Beller wrote:
> > I just copied the shortcut that they were adding themselfes as submodule
> > in 'setup submodule'. The whole setup of submodules in this test is like
> > this. This way we already had a nested submodule structure which I could
> > just add.
> >
> > I agree that this is unrealistic so I can change that in the test I am
> > adding. But from what I have seen, this shortcut is taken in quite some
> > places when dealing with submodules.
> 
> Please do not make it worse.
> Once upon a time (late '16 IIRC) I had a series floating on the
> list removing all occurrences, but there were issues with the
> series and it did not land.

Took a little while but here is a more clean patch creating individual
submodules for the nesting.

Cheers Heiko

 t/t7001-mv.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1ff77..cbc5fb37fe 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   mkdir sub_nested_nested &&
+   (cd sub_nested_nested &&
+   touch nested_level2 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 2"
+   ) &&
+   mkdir sub_nested &&
+   (cd sub_nested &&
+   touch nested_level1 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 1"
+   git submodule add ../sub_nested_nested &&
+   git commit -m "add nested level 2"
+   ) &&
+   git submodule add ./sub_nested nested_move &&
+   git commit -m "add nested_move" &&
+   git submodule update --init --recursive &&
+   git mv nested_move sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.14.1.145.gb3622a4

Re: how to remove from history just one version of a file/dir?

2017-09-15 Thread Jeff King

On Fri, Sep 15, 2017 at 07:06:43AM -0400, Robert P. J. Day wrote:

> > I think you want to stick with a --tree-filter (or an
> > --index-filter), but just selectively decide when to do the
> > deletion. For example, if you can tell the difference between the
> > two states based on the presence of some file, then perhaps:
> >
> >   git filter-branch --prune-empty --index-filter '
> > if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
> > then
> >   git rm --cached -rf dir
> > fi
> >   ' HEAD
> >
> > The "--prune-empty" is optional, but will drop commits that become
> > empty because they _only_ touched that directory.
> >
> > We use ":dir/sentinel" to see if the entry is in the index, because
> > the index filter won't have the tree checked out. Likewise, we need
> > to use "rm --cached" to just touch the index.
> 
>   got it. one last query -- i note that there is no "else" clause in
> that code for "--index-filter". am i assuming correctly that if i was
> using "--tree-filter" instead, i really would need if/then/else along
> the lines of:
> 
>   if blah ; then
> skip_commit "$@"
>   else
> git commit-tree "$@"
>   fi
> 
> thank you kindly.

No, I think a tree-filter would just be:

  if test -e dir/sentinel
  then
rm -rf dir
git add -u
  fi

(I can't remember if the "add -u" is necessary or not; I rarely use tree
filters).

In other words, for each commit you are just saying "if the bad version
of the directory is present, then get rid of it". You shouldn't need to
deal with commit-tree at all. The filter-branch script will take care of
committing whatever tree state your filter leaves in place.

Do note that I didn't test either of the versions I sent to you, so it's
possible I'm missing some subtle thing. But I'm pretty sure the general
direction is correct.

-Peff

Re: Git 2.14.1: t6500: error during test on musl libc

2017-09-15 Thread Jeff King

On Fri, Sep 15, 2017 at 08:37:40AM +0200, Kevin Daudt wrote:

> On Thu, Sep 14, 2017 at 09:43:12PM -0500, A. Wilcox wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> > 
> > Hi there,
> > 
> > While bumping Git's version for our Linux distribution to 2.14.1, I've
> > run in to a new test failure in t6500-gc.sh.  This is the output of
> > the failing test with debug=t verbose=t:
> 
> This is a new test introduced by c45af94db 
> (gc: run pre-detach operations under lock, 2017-07-11) which was
> included in v2.14.0.
> 
> So it might be that this was already a problem for a longer time, only
> just recently uncovered.

The code change there is not all that big. Mostly we're just checking
that the lock is actually respected. The lock code doesn't exercise libc
all that much. It does use fscanf, which I guess is a little exotic for
us. It's also possible that hostname() doesn't behave quite as we
expect.

If you instrument gc like the patch below, what does it report when you
run:

  GIT_TRACE=1 ./t6500-gc.sh --verbose-only=8

I get:

  [...]
  trace: built-in: git 'gc' '--auto'
  Auto packing the repository in background for optimum performance.
  See "git help gc" for manual housekeeping.
  debug: gc lock already held by $my_hostname
  [...]

If you get "acquired gc lock", then the problem is in
lock_repo_for_gc(), and I'd suspect some problem with fscanf or
hostname.

-Peff

---
diff --git a/builtin/gc.c b/builtin/gc.c
index c22787ac72..a7450a0058 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -242,9 +242,11 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
int fd;
char *pidfile_path;
 
-   if (is_tempfile_active(pidfile))
+   if (is_tempfile_active(pidfile)) {
/* already locked */
+   trace_printf("debug: we already hold the gc lock");
return NULL;
+   }
 
if (xgethostname(my_host, sizeof(my_host)))
xsnprintf(my_host, sizeof(my_host), "unknown");
@@ -284,6 +286,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
rollback_lock_file();
*ret_pid = pid;
free(pidfile_path);
+   trace_printf("debug: gc lock already held by %s", 
locking_host);
return locking_host;
}
}
@@ -295,6 +298,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
commit_lock_file();
pidfile = register_tempfile(pidfile_path);
free(pidfile_path);
+   trace_printf("debug: we have acquired the gc lock");
return NULL;
 }

Re: [PATCH v5 25/40] external-odb: add 'get_direct' support

2017-09-15 Thread Christian Couder

On Thu, Sep 14, 2017 at 8:19 PM, Jonathan Tan  wrote:
> On Thu, 14 Sep 2017 10:39:35 +0200
> Christian Couder  wrote:
>
>> From the following email:
>>
>> https://public-inbox.org/git/20170804145113.5ceaf...@twelve2.svl.corp.google.com/
>>
>> it looks like his work is fundamentally about changing the rules of
>> connectivity checks. Objects are split between "homegrown" objects and
>> "imported" objects which are in separate pack files. Then references
>> to imported objects are not checked during connectivity check.
>>
>> I think changing connectivity rules is not necessary to make something
>> like external odb work. For example when fetching a pack that refers
>> to objects that are in an external odb, if access this external odb
>> has been configured, then the connectivity check will pass as the
>> missing objects in the pack will be seen as already part of the repo.
>
> There are still some nuances. For example, if an external ODB provides
> both a tree and a blob that the tree references, do we fetch the tree in
> order to call "have" on all its blobs, or do we trust the ODB that if it
> has the tree, it has all the other objects? In my design, I do the
> latter, but in the general case where we have multiple ODBs, we might
> have to do the former. (And if we do the former, it seems to me that the
> connectivity check must be performed "online" - that is, with the ODBs
> being able to provide "get".)

Yeah, I agree that the problem is more complex if there can be trees
or all kind of objects in the external odb.
But as I explain in the following email to Junio, I don't think
storing other kind of objects is one of the most interesting use case:

https://public-inbox.org/git/CAP8UFD3=nutrf24clsok4hsgm3nxgh8sbzvpmcg7cnchj2z...@mail.gmail.com/

> (Also, my work extends all the way to fetch/clone [1], but admittedly I
> have been taking it a step at a time and recently have only been
> discussing how the local repo should handle the missing object
> situation.)
>
> [1] 
> https://public-inbox.org/git/cover.1499800530.git.jonathanta...@google.com/

Yeah, I think your work is interesting and could perhaps be useful for
external odbs as there could be situations that would be handled
better using your work or something similar.

Re: [PATCH 1/3] t6010: test actual test output

2017-09-15 Thread Jeff King

On Fri, Sep 15, 2017 at 12:01:27PM +0200, Michael J Gruber wrote:

> Jeff King venit, vidit, dixit 14.09.2017 16:34:
> > On Thu, Sep 14, 2017 at 03:15:18PM +0200, Michael J Gruber wrote:
> > 
> >> 4f21454b55 ("merge-base: handle --fork-point without reflog",
> >> 2016-10-12) introduced a fix for merge-base --fork-point without reflog
> >> and a test. While that test is fine, it did not update expected nor actual
> >> output and tested that of the previous test instead.
> > 
> > Oops. The use of the existing "expect3" was intentional (the idea being
> > that we are repeating the identical step from the previous test with the
> > slight tweak of not having a reflog). But obviously failing to write to
> > "actual" at all was a mistake.
> > 
> > That said, I'm OK with reiterating the expected output.
> 
> expect3 had a different content, too ;)

Oh, then it was just horribly broken, then. Oops. :)

Thanks for fixing.

-Peff

Re: how to remove from history just one version of a file/dir?

2017-09-15 Thread Robert P. J. Day

On Thu, 14 Sep 2017, Jeff King wrote:

> On Thu, Sep 14, 2017 at 07:32:11AM -0400, Robert P. J. Day wrote:
>
> >   [is this the right place to ask questions about git usage? or is
> > there a different forum where one can submit possibly
> > embarrassingly silly questions?]
>
> No, this is the right place for embarrassing questions. :)
>
> >   say, early on, one commits a sizable directory of content, call
> > it /mydir. that directory sits there for a while until it becomes
> > obvious it's out of date and worthless and should never have been
> > committed. the obvious solution would seem to be:
> >
> >   $ git filter-branch --tree-filter 'rm -rf /mydir' HEAD
> >
> > correct?
>
> That would work, though note that using an --index-filter would be
> more efficient (since it avoids checking out each tree as it walks
> the history).

  i'm just digging into --index-filter as we speak, i realize it's
noticeably faster.

> >   however, say one version of that directory was committed early
> > on, then later tossed for being useless with "git rm", and
> > subsequently replaced by newer content under exactly the same
> > name. now i'd like to go back and delete the history related to
> > that early version of /mydir, but not the second.
>
> Makes sense as a goal.
>
> >   obviously, i can't use the above command as it would delete both
> > versions. so it appears the solution would be a trivial
> > application of the "--commit-filter" option:
> >
> >git filter-branch --commit-filter '
> >  if [ "$GIT_COMMIT" = "" ] ; then
> >skip_commit "$@";
> >  else
> >git commit-tree "$@";
> >  fi' HEAD
> >
> > where  is the commit that introduced the first verrsion of
> > /mydir. do i have that right? is there a simpler way to do this?
>
> No, this won't work. Filter-branch is not walking the history and
> applying the changes to each commit, like rebase does.  It's
> literally operating on each commit object, and recall that each
> commit object points to a tree that is a snapshot of the repository
> contents.
>
> So if you skip a commit, that commit itself goes away. But the
> commit after it (which didn't touch the unwanted contents) will
> still mention those contents in its tree.

  ah, of course, duh.

> I think you want to stick with a --tree-filter (or an
> --index-filter), but just selectively decide when to do the
> deletion. For example, if you can tell the difference between the
> two states based on the presence of some file, then perhaps:
>
>   git filter-branch --prune-empty --index-filter '
>   if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
>   then
> git rm --cached -rf dir
>   fi
>   ' HEAD
>
> The "--prune-empty" is optional, but will drop commits that become
> empty because they _only_ touched that directory.
>
> We use ":dir/sentinel" to see if the entry is in the index, because
> the index filter won't have the tree checked out. Likewise, we need
> to use "rm --cached" to just touch the index.

  got it. one last query -- i note that there is no "else" clause in
that code for "--index-filter". am i assuming correctly that if i was
using "--tree-filter" instead, i really would need if/then/else along
the lines of:

  if blah ; then
skip_commit "$@"
  else
git commit-tree "$@"
  fi

thank you kindly.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday

Re: [PATCH 1/2] test-lib: group system specific FIFO tests by system

Johannes Schindelin venit, vidit, dixit 15.09.2017 00:21:
> Hi Michael,
> 
> On Thu, 14 Sep 2017, Michael J Gruber wrote:
> 
>> test-lib determines whether a file-system supports FIFOs and needs to do
>> special casing for CYGWIN and MINGW. This separates those system
>> specific settings from those at more central place.
>>
>> Set mkfifo()  to false in the central system specific place so that the
>> same test works everywhere.
> 
> The mkfifo() emulation of Cygwin seems to work, no? I think it works even
> in MSYS2, but not in MINGW.
> 
> So maybe this patch should affect only the MINGW arm?

I only reorganised the code, so in that sense the patch does not affect
any system ;)

If indeed mkfifo works on CYGWIN than a separate patch should remove the
exclusion of CYGWIN; alas, I can't confirm (I wish MS still had the old
academic alliance programme).

Michael

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

Junio C Hamano venit, vidit, dixit 15.09.2017 04:48:
> Michael J Gruber  writes:
> 
>> In fact, per documentation "--fork-point" looks at the reflog in
>> addition to doing the usual walk from the tip. The original design
>> description in d96855ff51 ("merge-base: teach "--fork-point" mode",
>> 2013-10-23) describes this as computing from a virtual merge-base of all
>> the historical tips of refname. They may or may not all be present in
>> the reflog (think pruning, non-ff fetching, fast forwarding etc.),
>> so filtering by the current contents of the reflog is potentially
>> harmful, and it does not seem to fulfill any purpose in the original
>> design.
> 
> Let me think aloud, using the picture from the log message from that
> commit.
> 
>  o---B1
> /
> ---o---o---B2--o---o---o---Base
> \
>  B3
>   \
>Derived
> 
> where the current tip of the "base" branch is at Base, but earlier
> fetch observed that its tip used to be B3 and then B2 and then B1
> before getting to the current commit, and the branch being rebased
> on top of the latest "base" is based on commit B3.
> 
> So the logic tries to find a merge base between "Derived" and a
> virtual merge commit across Base, B1, B2 and B3.  And it finds B3.
> 
> If for some reason we didn't have B3 in the reflog, then wouldn't
> the merge base computation between Derived and a virtual merge
> commit across Base, B2 and B2 (but not B3 because we no longer know
> about it due to its lack in the reflog) find 'o' that is the parent
> of B2 and B3? 

Yes.

> Wouldn't that lead to both B3 and Derived replayed
> when the user of the fork-point potion rebases the history of
> Derived?

Replayed, yes. What that means would depend on how B3 ended up being
"off base" (reset or rebase, e.g.): the replay could lead to a reapply
without conflict, or with conflict, or an empty (discarded) commit,
depending on "how much of B3" is still "on base".

> Perhaps that is the best we could do with a pruned reflog that lacks
> B3, but if that is the case, I wonder if it may be better to fail
> the request saying that we cannot find the fork-point (because,
> well, your reflog no longer has sufficient information), than
> silently give a wrong result, and in this case, we can tell between
> a correct result (i.e. the merge base is one of the commits we still
> know was at the tip) and a wrong one (i.e. the merge base is not any
> of the commits in the reflog).
> 
> If we declare --fork-point is the best effort option and may give an
> answer that is not better than without the option, then I think this
> patch is OK, but that diminishes the value of the option as a
> building block, I am afraid.
> 
> Callers that are more careful could ask merge-base with --fork-point
> (and happily use it knowing that the result is indeed a commit that
> used to be at the tip), or fall back to the result merge-base
> without --fork-point gives (because you could do no better) and deal
> with duplicates that may happen due to the imprecise determination.
> With this change, these callers will get result from a call to
> "merge-base --fork-point" that may or may not be definite, and they
> cannot tell.  For lazy users, making the option itself to fall back
> may be simpler to use, and certainly is a valid stance to take when
> implementing a convenience option to a Porcelain command, but I do
> not know if it is a good idea to throw "merge-base --fork-point"
> into that category.

Simply put, "git merge-base ref commit" looks at the (graph) history of
ref and considers merge-base candidates that are also in the graph
history of commit. This is the "graph notion" of merge-base, and the
result is immanently determined by the DAG.

There is also the "reflog notion" where you look at the "reflog history"
of ref. The result depends on the reflog, which itself is "volatile"
(think prune), and such is the result.

Now, the original documentation of merge-base says that "merge-base
--fork-point" looks at the reflog of ref "also" (*in addition to*) the
DAG. That is, the merge-base candidates for "merge-base --fork-point"
are a super-set of those for the plain mode, enhanced by the reflog.
(graph notion plus reflog notion)

The original implementation makes sure that "merge-base --fork-point"
looks *only* at candidates from the reflog. (strict reflog notion)

That is a discrepancy that we should resolve in any case.

Note that with a "complete reflog", the set of reflog merge-base
candidates is a superset of the one from the DAG.

I did not look up the discussion preceeding 4f21454b55 ("merge-base:
handle --fork-point without reflog", 2016-10-12), but if "merge-base
--fork-point" were about a "strict reflog" notion then there was nothing
to fix back then - no reflog, no merge-base candidates. Period.

I don't mind having two modes, say "--reflog"

Re: [PATCH 1/3] t6010: test actual test output

Jeff King venit, vidit, dixit 14.09.2017 16:34:
> On Thu, Sep 14, 2017 at 03:15:18PM +0200, Michael J Gruber wrote:
> 
>> 4f21454b55 ("merge-base: handle --fork-point without reflog",
>> 2016-10-12) introduced a fix for merge-base --fork-point without reflog
>> and a test. While that test is fine, it did not update expected nor actual
>> output and tested that of the previous test instead.
> 
> Oops. The use of the existing "expect3" was intentional (the idea being
> that we are repeating the identical step from the previous test with the
> slight tweak of not having a reflog). But obviously failing to write to
> "actual" at all was a mistake.
> 
> That said, I'm OK with reiterating the expected output.

expect3 had a different content, too ;)

Michael

Re: [PATCH v3] commit-template: change a message to be more intuitive

Kaartic Sivaraam venit, vidit, dixit 15.09.2017 06:50:
> It's not good to use the phrase 'do not touch' to convey the information
> that the cut-line should not be modified or removed as it could possibly
> be mis-interpreted by a person who doesn't know that the word 'touch' has
> the meaning of 'tamper with'. Further, it could make translations a little
> difficult as it might not have the intended meaning in a few languages (for
> which translations don't exist yet) when translated as such.
> 
> So, use intuitive terms in the sentence. Replacing the word 'touch' with
> other terms has introduced the possibility of the following sentence to be
> mis-interpreted, so change the terms in that too.
> 
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Changes in v3:
>  
> - changed the wordings of the second sentence as there seemed to be a
>   magical 'or else' connecting the two lines.
>  
>  I didn't expect the least that this would go upto v3. In case anyboy finds
>  something wrong with this change too, it's a lot better to drop this 
> altogether
>  than going for a v4.

That happens more often than not :)

Your original intent was to avoid any misunderstandings, and all the
comments and iterations made sure that we don't trade one possible
source of misunderstanding for another but avoid them all.

I consider v4 to be a strict improvement over the status quo and (as fas
as I can see) to serve your original intent as good as possible.

>  wt-status.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/wt-status.c b/wt-status.c
> index 77c27c511..23e87e74d 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -934,7 +934,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
>  
>  void wt_status_add_cut_line(FILE *fp)
>  {
> - const char *explanation = _("Do not touch the line above.\nEverything 
> below will be removed.");
> + const char *explanation = _("Do not modify or remove the line 
> above.\nEverything below it will be ignored.");
>   struct strbuf buf = STRBUF_INIT;
>  
>   fprintf(fp, "%c %s", comment_line_char, cut_line);
>

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-09-15 Thread Phillip Wood

On 15/09/17 03:48, Junio C Hamano wrote:
> 
> Michael J Gruber  writes:
> 
>> In fact, per documentation "--fork-point" looks at the reflog in
>> addition to doing the usual walk from the tip. The original design
>> description in d96855ff51 ("merge-base: teach "--fork-point" mode",
>> 2013-10-23) describes this as computing from a virtual merge-base of all
>> the historical tips of refname. They may or may not all be present in
>> the reflog (think pruning, non-ff fetching, fast forwarding etc.),
>> so filtering by the current contents of the reflog is potentially
>> harmful, and it does not seem to fulfill any purpose in the original
>> design.
> 
> Let me think aloud, using the picture from the log message from that
> commit.
> 
>  o---B1
> /
> ---o---o---B2--o---o---o---Base
> \
>  B3
>   \
>Derived
> 
> where the current tip of the "base" branch is at Base, but earlier
> fetch observed that its tip used to be B3 and then B2 and then B1
> before getting to the current commit, and the branch being rebased
> on top of the latest "base" is based on commit B3.
> 
> So the logic tries to find a merge base between "Derived" and a
> virtual merge commit across Base, B1, B2 and B3.  And it finds B3.


Thanks for this explanation, I've never been sure exactly what
--fork-point did, after reading this I think I understand it.

> If for some reason we didn't have B3 in the reflog, then wouldn't
> the merge base computation between Derived and a virtual merge
> commit across Base, B2 and B2 (but not B3 because we no longer know
> about it due to its lack in the reflog) find 'o' that is the parent
> of B2 and B3?  Wouldn't that lead to both B3 and Derived replayed
> when the user of the fork-point potion rebases the history of
> Derived?
> 
> Perhaps that is the best we could do with a pruned reflog that lacks
> B3, but if that is the case, I wonder if it may be better to fail
> the request saying that we cannot find the fork-point (because,
> well, your reflog no longer has sufficient information), than
> silently give a wrong result, and in this case, we can tell between
> a correct result (i.e. the merge base is one of the commits we still
> know was at the tip) and a wrong one (i.e. the merge base is not any
> of the commits in the reflog).
> 
> If we declare --fork-point is the best effort option and may give an
> answer that is not better than without the option, then I think this
> patch is OK, but that diminishes the value of the option as a
> building block, I am afraid.

I'd tend to agree with you that it would be better to fail rather than
give a best effort. I've got a script that I use to sync the branch I'm
working on between my desktop and laptop. When it pulls it checks the
local head and the compares it to the remote head before pulling. If
they match then it does 'git reset --hard $new_remote', if the local
head is descended from the old remote head then it does 'git rebase
--onto $new_remote $old_remote', otherwise it refuses to update the
local head. If I've understood --fork-point correctly then I could use
'git rebase --fork-point $remote_branch' if it failed when it couldn't
find a merge base in the reflog of the remote branch.

> 
> Callers that are more careful could ask merge-base with --fork-point
> (and happily use it knowing that the result is indeed a commit that
> used to be at the tip), or fall back to the result merge-base
> without --fork-point gives (because you could do no better) and deal
> with duplicates that may happen due to the imprecise determination.
> With this change, these callers will get result from a call to
> "merge-base --fork-point" that may or may not be definite, and they
> cannot tell.  For lazy users, making the option itself to fall back
> may be simpler to use, and certainly is a valid stance to take when
> implementing a convenience option to a Porcelain command, but I do
> not know if it is a good idea to throw "merge-base --fork-point"
> into that category.
> 

Best Wishes

Phillip

Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-15 Thread Lars Schneider


On 15 Sep 2017, at 07:58, Junio C Hamano  wrote:

> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

> ...

> * ls/travis-scriptify (2017-09-11) 3 commits
>  (merged to 'next' on 2017-09-14 at 8fa685d3b7)
> + travis: dedent a few scripts that are indented overly deeply
> + travis-ci: skip a branch build if equal tag is present
> + travis-ci: move Travis CI code into dedicated scripts
> 
> The scripts to drive TravisCI has been reorganized and then an
> optimization to avoid spending cycles on a branch whose tip is
> tagged has been implemented.
> 
> Will merge to 'master'.

SZEDER noticing a bug in this series that I was about to fix:
https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/

I assume at this point a new patch is better than a re-roll, right?

Thanks,
Lars

Re: Git 2.14.1: t6500: error during test on musl libc

2017-09-15 Thread Kevin Daudt

On Fri, Sep 15, 2017 at 08:37:40AM +0200, Kevin Daudt wrote:
> On Thu, Sep 14, 2017 at 09:43:12PM -0500, A. Wilcox wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> > 
> > Hi there,
> > 
> > While bumping Git's version for our Linux distribution to 2.14.1, I've
> > run in to a new test failure in t6500-gc.sh.  This is the output of
> > the failing test with debug=t verbose=t:
> 
> This is a new test introduced by c45af94db 
> (gc: run pre-detach operations under lock, 2017-07-11) which was
> included in v2.14.0.
> 
> So it might be that this was already a problem for a longer time, only
> just recently uncovered.
> 

Adding Jeff King to CC

Re: Git 2.14.1: t6500: error during test on musl libc

2017-09-15 Thread Kevin Daudt

On Thu, Sep 14, 2017 at 09:43:12PM -0500, A. Wilcox wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Hi there,
> 
> While bumping Git's version for our Linux distribution to 2.14.1, I've
> run in to a new test failure in t6500-gc.sh.  This is the output of
> the failing test with debug=t verbose=t:

This is a new test introduced by c45af94db 
(gc: run pre-detach operations under lock, 2017-07-11) which was
included in v2.14.0.

So it might be that this was already a problem for a longer time, only
just recently uncovered.

> 
> 
> expecting success:
> # make sure we run a background auto-gc
> test_commit make-pack &&
> git repack &&
> test_config gc.autopacklimit 1 &&
> test_config gc.autodetach true &&
> 
> # create a ref whose loose presence we can use to detect a
> pack-refs run
> git update-ref refs/heads/should-be-loose HEAD &&
> test_path_is_file .git/refs/heads/should-be-loose &&
> 
> # now fake a concurrent gc that holds the lock; we can use our
> # shell pid so that it looks valid.
> hostname=$(hostname || echo unknown) &&
> printf "$$ %s" "$hostname" >.git/gc.pid &&
> 
> # our gc should exit zero without doing anything
> run_and_wait_for_auto_gc &&
> test_path_is_file .git/refs/heads/should-be-loose
> 
> [master 28ecdda] make-pack
>  Author: A U Thor 
>  1 file changed, 1 insertion(+)
>  create mode 100644 make-pack.t
> Counting objects: 3, done.
> Delta compression using up to 8 threads.
> Compressing objects: 100% (2/2), done.
> Writing objects: 100% (3/3), done.
> Total 3 (delta 0), reused 0 (delta 0)
> Auto packing the repository in background for optimum performance.
> See "git help gc" for manual housekeeping.
> File .git/refs/heads/should-be-loose doesn't exist.
> not ok 8 - background auto gc respects lock for all operations
> #
> #   # make sure we run a background auto-gc
> #   test_commit make-pack &&
> #   git repack &&
> #   test_config gc.autopacklimit 1 &&
> #   test_config gc.autodetach true &&
> #
> #   # create a ref whose loose presence we can use to
> detect a pack-refs run
> #   git update-ref refs/heads/should-be-loose HEAD &&
> #   test_path_is_file .git/refs/heads/should-be-loose &&
> #
> #   # now fake a concurrent gc that holds the lock; we can
> use our
> #   # shell pid so that it looks valid.
> #   hostname=$(hostname || echo unknown) &&
> #   printf "$$ %s" "$hostname" >.git/gc.pid &&
> #
> #   # our gc should exit zero without doing anything
> #   run_and_wait_for_auto_gc &&
> #   test_path_is_file .git/refs/heads/should-be-loose
> #
> 
> # failed 1 among 8 test(s)
> 1..8
> 
> 
> I admit I am mostly blind with the Git gc system.  Should I use strace
> on the git-gc process at the end?  How would I accomplish that?  Is
> there a better way of debugging this error further?
> 
> Core system stats:
> 
> Intel x86_64 E3-1280 v3 @ 3.60 GHz
> musl libc 1.1.16+20
> git 2.14.1, vanilla except for a patch to an earlier test due to
> musl's inability to cope with EUC-JP
> bash 4.3.48(1)-release
> 
> Thank you very much.
> 
> All the best,
> - --arw
> 
> - -- 
> A. Wilcox (awilfox)
> Project Lead, Adélie Linux
> http://adelielinux.org
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2
> 
> iQIcBAEBCAAGBQJZuz4yAAoJEMspy1GSK50UORwP/0Jxfp3xzexh27tSJlXYWS/g
> g9QK8Xmid+3A0R696Vb2GguKg2roCcTmM2anR7iD1B2f2W31sgf+8M5mnJRHyJ1p
> geEeqwrTdpCk6jQ/1Pj03L0NOftb1ftR6hcoVujBFAOph4jRlRdZDPA87fe6snrh
> q99C3LoDXQcyK6WWJwzX+t2wOplKgpGJP8wTAaZ0AHoUwVS5CLPl8tP2XaY4kLfD
> ZPPcvtp9wisVzzZ2ssE/CLGd38EbenNNZ6OJCBFJIHmlwey4G2isZ9kk6fVIHXi2
> unBJ8yVqI7hQKmQFSVQMMSFSd9azhHnDjTBO5mzWeRK9HNVMda3LZsXTtVeswnRs
> lN/ASMdt5KdfpNy/plFB7yDWLlQSQY7j1mxBMR8lL3AdVVQUbJppDM795tt+rn6a
> NCE2ESZMWd/QEULmT92AbkNJTj5ibBEoubnVTka05KMjaBLwIauhpqU5XxLFq2UH
> y3JYQU9hm0E7dQE0CLXxIm5/574T6bBUgp1cXH3CjxkeUYKR1USVKtDfBV6t/Qmt
> xlDZKPEfjKbTvL3KUF33G+eAp55wTwrJTaWlOp8A/JqooXavYghcsuFhYtCPJ8qo
> fFUa8kBZP70E/O7JkycUu8wi7p42+j1a8gR6/AnPG2u2wyoiosLCxHX+nll4gKmN
> b6BuiRn0Z9ie5xw4xcMR
> =Vf8Z
> -END PGP SIGNATURE-

Re: Is finalize_object_file in sha1_file.c handling errno from "rename" correctly?