Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Johannes Sixt

Am 26.09.2017 um 00:50 schrieb Stefan Beller:

submodule..update can be assigned an arbitrary command via setting
it to "!command". When this command is found in the regular config, Git
ought to just run that command instead of other update mechanisms.

However if that command is just found in the .gitmodules file, it is
potentially untrusted, which is why we do not run it.  Add a test
confirming the behavior.

Suggested-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---

  updated to use the super robust script.
  Thanks Jonathan,
  
  Stefan


  t/t7406-submodule-update.sh | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 034914a14f..d718cb00e7 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in 
.git/config' '
)
  '
  
+test_expect_success 'submodule update - command in .gitmodules is ignored' '

+   test_when_finished "git -C super reset --hard HEAD^" &&
+
+   write_script must_not_run.sh <<-EOF &&
+   >$TEST_DIRECTORY/bad
+   EOF


I am pretty confident that this does not test what you intend to test. 
Notice that $TEST_DIRECTORY is expanded when the script is written. But 
that path contains a blank, and we have something like this in the test 
script:


#!/bin/sh
>/the/build/directory/t/trash directory.t7406/bad

If you inject the bug against which this test protects into 
git-submodule, you should find a file "trash" in your t directory, and 
the file "bad" still absent. Not to mention that the script fails 
because it cannot run "directory.t7406/bad".


To fix that, you should use and exported variable and access that from 
the test script, for example:


write_script must_not_run.sh <<-\EOF &&
>"$TEST_DIRECTORY"/bad
EOF
...
(
export TEST_DIRECTORY &&
git -C super submodule update submodule
) &&
test_path_is_missing bad


+
+   git -C super config -f .gitmodules submodule.submodule.update 
"!$TEST_DIRECTORY/must_not_run.sh" &&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule &&
+   test_path_is_missing bad
+'
+
  cat << EOF >expect
  Execution of 'false $submodulesha1' failed in submodule path 'submodule'
  EOF



-- Hannes


Re: [PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-25 Thread Junio C Hamano
Junio C Hamano  writes:

>
> Thanks.  I am not sure if you can safely reorder the contents of the
> header files in general, but I trust that you made sure that this
> does not introduce problems (e.g. referrals before definition).

Alas, this time it seems my trust was grossly misplaced.  Discarding
this patch and redoing the integration cycle for the day.


Re: '--shallow-since' option is not available for git-pull in Git version 2.14.1

2017-09-25 Thread Kevin Daudt
On Mon, Sep 25, 2017 at 04:31:10PM +0900, Sanggyu Nam wrote:
> I’ve found that some subcommands such as git-clone, git-fetch, and
> git-pull support an option named ‘--shallow-since’, as of Git version
> 2.11. This option is documented in the man page of each subcommand. In
> Git 2.14.1, I’ve checked that the option is available for git-clone
> and git-fetch so that the history of a shallow repository is updated.
> However, running git-pull with this option shows an error as follows:
> 
> error: unknown option `shallow-since=2017-09-10T00:00:00+00:00'
> 
> usage: git pull [] [ [...]]
> ...
> 
> I found that this option is not available in Git 2.14.1 on macOS and
> Ubuntu 16.04.1. It seems the option handling of git-pull does not
> match with what’s described in the man page. Since ‘git pull’ is a
> shorthand for ‘git fetch’ followed by ‘git merge FETCH_HEAD’ in its
> default mode, we can run these two commands in this order as a
> workaround.
> 
> 

This does not only count for --shallow-since, but also --deepen
and --shallow-exclude.


Re: [PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-25 Thread Junio C Hamano
Han-Wen Nienhuys  writes:

> This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did
> the same for strbuf.h:
>
> * API documentation uses /** */ to set it apart from other comments.
>
> * Function names were stripped from the comments.
>
> * Ordering of the header was adjusted to follow the one from the text
>   file.
>
> * Edited some existing comments to follow the new standard.
>
> Signed-off-by: Han-Wen Nienhuys 
> ---

Thanks.  I am not sure if you can safely reorder the contents of the
header files in general, but I trust that you made sure that this
does not introduce problems (e.g. referrals before definition).

Also, I am not sure what "the new standard" refers to.  Have we
established a new standard somewhere?  Did we have discussion on it?
Puzzled.

Thanks.


>  Documentation/technical/api-string-list.txt | 209 
> 
>  string-list.h   | 187 +
>  2 files changed, 159 insertions(+), 237 deletions(-)
>  delete mode 100644 Documentation/technical/api-string-list.txt
>
> diff --git a/Documentation/technical/api-string-list.txt 
> b/Documentation/technical/api-string-list.txt
> deleted file mode 100644
> index c08402b12..0
> --- a/Documentation/technical/api-string-list.txt
> +++ /dev/null
> @@ -1,209 +0,0 @@
> -string-list API
> -===
> -
> -The string_list API offers a data structure and functions to handle
> -sorted and unsorted string lists.  A "sorted" list is one whose
> -entries are sorted by string value in `strcmp()` order.
> -
> -The 'string_list' struct used to be called 'path_list', but was renamed
> -because it is not specific to paths.
> -
> -The caller:
> -
> -. Allocates and clears a `struct string_list` variable.
> -
> -. Initializes the members. You might want to set the flag `strdup_strings`
> -  if the strings should be strdup()ed. For example, this is necessary
> -  when you add something like git_path("..."), since that function returns
> -  a static buffer that will change with the next call to git_path().
> -+
> -If you need something advanced, you can manually malloc() the `items`
> -member (you need this if you add things later) and you should set the
> -`nr` and `alloc` members in that case, too.
> -
> -. Adds new items to the list, using `string_list_append`,
> -  `string_list_append_nodup`, `string_list_insert`,
> -  `string_list_split`, and/or `string_list_split_in_place`.
> -
> -. Can check if a string is in the list using `string_list_has_string` or
> -  `unsorted_string_list_has_string` and get it from the list using
> -  `string_list_lookup` for sorted lists.
> -
> -. Can sort an unsorted list using `string_list_sort`.
> -
> -. Can remove duplicate items from a sorted list using
> -  `string_list_remove_duplicates`.
> -
> -. Can remove individual items of an unsorted list using
> -  `unsorted_string_list_delete_item`.
> -
> -. Can remove items not matching a criterion from a sorted or unsorted
> -  list using `filter_string_list`, or remove empty strings using
> -  `string_list_remove_empty_items`.
> -
> -. Finally it should free the list using `string_list_clear`.
> -
> -Example:
> -
> -
> -struct string_list list = STRING_LIST_INIT_NODUP;
> -int i;
> -
> -string_list_append(, "foo");
> -string_list_append(, "bar");
> -for (i = 0; i < list.nr; i++)
> - printf("%s\n", list.items[i].string)
> -
> -
> -NOTE: It is more efficient to build an unsorted list and sort it
> -afterwards, instead of building a sorted list (`O(n log n)` instead of
> -`O(n^2)`).
> -+
> -However, if you use the list to check if a certain string was added
> -already, you should not do that (using unsorted_string_list_has_string()),
> -because the complexity would be quadratic again (but with a worse factor).
> -
> -Functions
> --
> -
> -* General ones (works with sorted and unsorted lists as well)
> -
> -`string_list_init`::
> -
> - Initialize the members of the string_list, set `strdup_strings`
> - member according to the value of the second parameter.
> -
> -`filter_string_list`::
> -
> - Apply a function to each item in a list, retaining only the
> - items for which the function returns true.  If free_util is
> - true, call free() on the util members of any items that have
> - to be deleted.  Preserve the order of the items that are
> - retained.
> -
> -`string_list_remove_empty_items`::
> -
> - Remove any empty strings from the list.  If free_util is true,
> - call free() on the util members of any items that have to be
> - deleted.  Preserve the order of the items that are retained.
> -
> -`print_string_list`::
> -
> - Dump a string_list to stdout, useful mainly for debugging purposes. It
> - can take an optional header argument and it writes out the
> - string-pointer pairs of the string_list, each one in its own line.
> -
> -`string_list_clear`::
> -
> - Free a 

Re: [PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently

2017-09-25 Thread Junio C Hamano
Han-Wen Nienhuys  writes:

>>Subject: Re: [PATCH 2/4] Clarify return value ownership of real_path and 
>>read_gitfile_gently

We try to make "shortlog --no-merges" output consistently say what
area the change is about, followed by a colon, followed by a short
description.

> Signed-off-by: Han-Wen Nienhuys 
> ---
>  abspath.c | 3 +++
>  setup.c   | 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/abspath.c b/abspath.c
> index 708aff8d4..792a2d533 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char 
> *path,
>   return retval;
>  }
>  
> +/* Resolve `path` into an absolute, cleaned-up path. The return value
> + * comes from a global shared buffer and should not be freed.
> + */
>  const char *real_path(const char *path)
>  {
>   static struct strbuf realpath = STRBUF_INIT;

The part about "what it does" is a good thing to have here, but I do
not think the second sentence adds much if it stays here (if the
comment were in a header file far from implementation, then it is a
different matter).  Besides, "should not be freed" is not the only
important thing---it is already implied by the function returning
"const char *" (i.e. the caller/receiver does not own it).  It will
stay valid only until the next call, so the caller needs to xstrdup()
it if it wants to keep the value.  That is equally important.

But quite honestly, that pattern appears very often in our codebase
(all users of get_pathname() share the same characteristics) that
these details (i.e. do not free, expect the buffer to be recycled)
probaly is not worth it.  Mentioning that the returned value is in a
shared buffer for short-term use would be sufficient.

> diff --git a/setup.c b/setup.c
> index 6d8380acd..31853724c 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char 
> *path, const char *dir)
>  
>  /*
>   * Try to read the location of the git directory from the .git file,
> - * return path to git directory if found.
> + * return path to git directory if found. The return value comes from
> + * a shared pool and should not be freed.

OK, the returned value comes from the return value of real_path(),
so the early half of the new warning is worth having.  "should not
be freed" is both extraneous (for those who are already aware of the
common pattern in our codebase) and insufficient (for those who are
not yet).

Thanks.

>   *
>   * On failure, if return_error_code is not NULL, return_error_code
>   * will be set to an error code and NULL will be returned. If


Re: [PATCH] fast-import: checkpoint: dump branches/tags/marks even if object_count==0

2017-09-25 Thread Junio C Hamano
"Eric Rannaud"  writes:

> The checkpoint command cycles packfiles if object_count != 0, a sensible
> test or there would be no pack files to write. Since 820b931012, the
> command also dumps branches, tags and marks, but still conditionally.
> However, it is possible for a command stream to modify refs or create
> marks without creating any new objects.

That reasoning sounds sensible.  Especially given the discussion of
"checkpoint" and "progress" we can see in "git fast-import --help"
documentation. E.g.

Placing a progress command immediately after a checkpoint will
inform the reader when the checkpoint has been completed and it
can safely access the refs that fast-import updated.

would not be true without this change, I suspect.

Can we also add a new test or two that protect this from future
breakages?

Thanks.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Junio C Hamano
Jeff King  writes:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.

Hmph, I may be slow (or may be skimming the exchanges too fast), but
what exactly is wrong with "0"?  As long as we do not have to tell
two or more "not exactly an error from syscall in errno" cases, I
would think "0" is the best value to use.

If the syserror message _is_ the issue, then we'd need to either
pick an existing errno that is available everywhere (with possibly
suboptimal message), or pick something and prepare a fallback for
platforms that lack the errno, so picking "0" as the value and use
whatever logic we would have used for the "fallback" would not sound
too bad.  I.e.

if (read_in_full(..., size) != size)
if (errno)
die_errno("oops");
else
die("short read");

If the callsite were too numerous,

#define die_errno_or(msg1, msg2) if (errno) die_errno(msg1); else die(msg2)

perhaps?



Re: -s theirs use-case(s) Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Junio C Hamano
Yaroslav Halchenko  writes:

> 1. As a workaround for absence of -m theirs I using mtheirs git alias:
> (I believe provided to me awhile back here on the list):
>
> mtheirs = !sh -c 'git merge -s ours --no-commit $1 && git read-tree -m -u 
> $1' -
>
> and it worked fine for my usecases
>
> 2. I think that if there is a reason for -s ours to exist, so there for -s 
> theirs
> since it is just the directionality of merges which changes between the two

Just on this point.  They are not exactly symmetric.

Imagine there are some undesirable changes you want to vanquish from
the world, but they have already built on useful changes on top of
the undesirable changes.  A hypothetical history might look like
this:

 B---C
/
   X---X---A
  /
  ---o---o your mainline

where 'X' denotes those unwanted changes.

With a "-s ours" merge, you can declare that changes on the other
branch will never be merged to your branch, i.e.

 B---C
/
   X---X---A
  / \
  ---o---o---M your mainline

and then you can safely merge A and C into your branch, without
having to worry about them bringing the unwanted changes to your
tree state.

 B---C
/ \
   X---X---A   \
  / \   \   \
  ---o---o---M---N---O  your mainline

That is the primary reason why "-s ours" exists, i.e. you do not
control the branch where mistakes X were made because that is
somebody else's history.

The symmetiric case where _you_ have wrong changes do not need "-s
theirs".  These mistakes X are yours, so are the changes depend on
them:

 B---C
/
   X---X---A
  /
  ---o---o their mainline

and you can just rebase A, B and C on top of their mainline while
getting rid of Xs yourself before publishing.

   B'--C'
  /  
  ---o---o---A'

The reason why ours and theirs are not symmetric is because you are
you and not them---the control and ownership of our history and
their history is not symmetric.

There may be valid workflows that benefit from "-s theirs", and I
would not be surprised at all if we found more of them in the past 9
years since we had the "why -s theirs does not exist" discussion in
2008.  But "because -s ours can be used in reverse to emulate" is
not a valid excuse to add "-s theirs".  It can be used a rationale
against adding it (e.g. "-s theirs generally is discouraged because
it forsters a bad workflow, but in a very rare case where it might
be useful, you can always check out their branch and merge yours
using '-s ours' to emulate it, so we do not lose any functionality
even if we did not add it"), though.


[PATCH] fast-import: checkpoint: dump branches/tags/marks even if object_count==0

2017-09-25 Thread Eric Rannaud
The checkpoint command cycles packfiles if object_count != 0, a sensible
test or there would be no pack files to write. Since 820b931012, the
command also dumps branches, tags and marks, but still conditionally.
However, it is possible for a command stream to modify refs or create
marks without creating any new objects.

For example, reset a branch:

$ git fast-import
reset refs/heads/master
from refs/heads/master^

checkpoint

but refs/heads/master remains unchanged.

Other example: a commit command that re-creates an object that already
exists in the object database.

This fix unconditionally calls dump_{branches,tags,marks}() for all
checkpoint commands. dump_branches() and dump_tags() are cheap to call
in the case of a no-op.

Signed-off-by: Eric Rannaud 
---
 fast-import.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 35bf671f12c4..d5e4cf0bad41 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3189,10 +3189,10 @@ static void checkpoint(void)
checkpoint_requested = 0;
if (object_count) {
cycle_packfile();
-   dump_branches();
-   dump_tags();
-   dump_marks();
}
+   dump_branches();
+   dump_tags();
+   dump_marks();
 }
 
 static void parse_checkpoint(void)
-- 
2.14.1



Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> I do not recall people talking about symbolic links but the case of
>> binary files has been on the wishlist for a long time, and I do not
>> know of anybody who is working on (or is planning to work on) it.
>
> Ah, I misremembered.
>
> We've addressed the "binary files" case back in 2012 with a944af1d
> ("merge: teach -Xours/-Xtheirs to binary ll-merge driver",
> 2012-09-08).  I do not know offhand if it is just as easy to plumb
> the MERGE_FAVOR_{OURS,THEIRS} bits thru the symbolic link codepath,
> like that patch did to the binary file codepath.

Perhaps the attached (totally untested) patch might be a good
starting point.  I do not know if you are interested in hacking on
Git, and I do not feel offended if you are not, but perhaps somebody
else might get interested in seeing if this #leftoverbits is a good
direction to go in, and finishing it with docs and tests if it is
;-)


 merge-recursive.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1d3f8f0d22..3605275ca3 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1026,10 +1026,19 @@ static int merge_file_1(struct merge_options *o,
   >oid,
   !o->call_depth);
} else if (S_ISLNK(a->mode)) {
-   oidcpy(>oid, >oid);
-
-   if (!oid_eq(>oid, >oid))
-   result->clean = 0;
+   switch (o->recursive_variant) {
+   case MERGE_RECURSIVE_NORMAL:
+   oidcpy(>oid, >oid);
+   if (!oid_eq(>oid, >oid))
+   result->clean = 0;
+   break;
+   case MERGE_RECURSIVE_OURS:
+   oidcpy(>oid, >oid);
+   break;
+   case MERGE_RECURSIVE_THEIRS:
+   oidcpy(>oid, >oid);
+   break;
+   }
} else
die("BUG: unsupported object type in the tree");
}


Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Junio C Hamano
Junio C Hamano  writes:

> I do not recall people talking about symbolic links but the case of
> binary files has been on the wishlist for a long time, and I do not
> know of anybody who is working on (or is planning to work on) it.

Ah, I misremembered.

We've addressed the "binary files" case back in 2012 with a944af1d
("merge: teach -Xours/-Xtheirs to binary ll-merge driver",
2012-09-08).  I do not know offhand if it is just as easy to plumb
the MERGE_FAVOR_{OURS,THEIRS} bits thru the symbolic link codepath,
like that patch did to the binary file codepath.



Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Junio C Hamano
Yaroslav Halchenko  writes:

> yes it does. Thanks.  And that is where I realized that I should have used -X
> theirs (not -s theirs), as the instruction on the option for the
> (recursive) merge.  And now problem is more specific:
>
> - conflict within file content editing was resolved as instructed
>   (taking "theirs" version)
>
> - BUT symlink was not taken from "theirs" and left as unresolved conflict:

I wouldn't call it working-as-intended, but this unfortunately is
expected.  You'd encounter exactly the same behaviour when changes
to a binary file conflicts.

It is because -X _ONLY_ kicks in (i.e. that is how it
is defined) when we would otherwise throw the half-merged result:

<<<
our version looks like this
===
their version looks like this
>>>

and ask you to edit that to a correct resolution.

Because you would not normally be given something like the above
when merging conflicted changes to symbolic links or to binary
files, -X has no chance of affecting the outcome.

I do not recall people talking about symbolic links but the case of
binary files has been on the wishlist for a long time, and I do not
know of anybody who is working on (or is planning to work on) it.


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Junio C Hamano
Brandon Williams  writes:

>> +The method by which a submodule is updated by 'git submodule update',
>> +which is the only affected command, others such as
>> +'git checkout --recurse-submodules' are unaffected. It exists for
>> +historical reasons, when 'git submodule' was the only command to
>> +interact with submodules; settings like `submodule.active`
>> +and `pull.rebase` are more specific. It is copied to the config
>> +by `git submodule init` from the .gitmodules file.
>> +Allowed values here are 'checkout', 'rebase', 'merge' or 'none'.
>> +See description of 'update' command in linkgit:git-submodule[1]
>> +for their meaning. Note that the '!command' form is intentionally
>> +ignored here for security reasons.
>
> This probably needs to be tweaked a bit to say that the '!command' form
> is ignored by submodule init, in that it isn't copied over from the
> .gitmodules file, but if it is configured in your config it will be
> respected by 'submodule update'.

I do not think gitmodules.txt is the place to say anything more than
what Stefan's patch says above.  Perhaps config.txt should mention
that in addition to what the variable with the same in .gitmodules
can take, it is allowed to use the !command form.

IOW, in config.txt

submodule..update::
Specifies how 'git submodule update' should work on
the named submodule.  In addition to the values that
can be specified in (and copied from) `.gitmodules`
(see linkgit:gitmodules[5]), `!command` form can
also be used.

or something, perhaps?


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> What do you think of patch 7 in light of this? If the short-read case
> gives us a sane errno, should we even bother trying to consistently
> handle its error separately?

I like the read_exactly_or_die variant because it makes callers more
concise, but on the other hand a caller handling the error can write a
more meaningful error message with the right amount of context.

So I think you're right that it's better to drop patch 7.

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 05:20:20PM -0700, Jonathan Nieder wrote:

> > What do you think of ENODATA? The glibc text for it is pretty
> > appropriate. If it's not available everywhere, we'd have to fallback to
> > something else (EIO? 0?). I don't know how esoteric it is. The likely
> > candidate to be lacking it is Windows.
> 
> ENODATA with a fallback to ESPIPE sounds fine to me.
> 
> read() would never produce ESPIPE because it doesn't seek.
> 
> So that would be:
> 
> /* errno value to use for early EOF */
> #ifndef ENODATA
> #define ENODATA ESPIPE
> #endif

Thanks, I'll re-roll with that (and thank you Stefan for mentioning
ENODATA again; I came across it early in my search but discounted it as
not quite right semantically. I should have been looking at the
strerror() text, which is what really matters).

What do you think of patch 7 in light of this? If the short-read case
gives us a sane errno, should we even bother trying to consistently
handle its error separately?

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.
> I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
> "#ifndef EUNDERFLOW" had me thinking that this was something that a real
> platform might have, but AFAICT you just made it up.

Agreed.  Two of the risks of replying too quickly. :)

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

> Jonathan Nieder wrote:
> > On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> 
> >> ENODATA is not too bad. On my glibc system it yields "No data available"
> >> from strerror(), which is at least comprehensible.
> >>
> >> We're still left with the question of whether it is defined everywhere
> >> (and what to fallback to when it isn't).
> >
> > So,
> >
> > #ifndef EUNDERFLOW
> > #ifdef ENODATA
> > #define ENODATA EUNDERFLOW
> > #else
> > #define ENODATA ESPIPE
> > #endif
> > #endif
> >
> > ?
> 
> Uh, pretend I said
> 
> #ifndef EUNDERFLOW
> # ifdef ENODATA
> #  define EUNDERFLOW ENODATA
> # else
> #  define EUNDERFLOW ESPIPE
> # endif
> #endif

Right, I think our mails just crossed but I'm leaning in this direction.
I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
"#ifndef EUNDERFLOW" had me thinking that this was something that a real
platform might have, but AFAICT you just made it up.

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> I definitely would prefer to avoid EIO, or anything that an actual
> read() might return.
>
> What do you think of ENODATA? The glibc text for it is pretty
> appropriate. If it's not available everywhere, we'd have to fallback to
> something else (EIO? 0?). I don't know how esoteric it is. The likely
> candidate to be lacking it is Windows.

ENODATA with a fallback to ESPIPE sounds fine to me.

read() would never produce ESPIPE because it doesn't seek.

So that would be:

/* errno value to use for early EOF */
#ifndef ENODATA
#define ENODATA ESPIPE
#endif

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jonathan Nieder wrote:
> On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:

>> ENODATA is not too bad. On my glibc system it yields "No data available"
>> from strerror(), which is at least comprehensible.
>>
>> We're still left with the question of whether it is defined everywhere
>> (and what to fallback to when it isn't).
>
> So,
>
> #ifndef EUNDERFLOW
> #ifdef ENODATA
> #define ENODATA EUNDERFLOW
> #else
> #define ENODATA ESPIPE
> #endif
> #endif
>
> ?

Uh, pretend I said

#ifndef EUNDERFLOW
# ifdef ENODATA
#  define EUNDERFLOW ENODATA
# else
#  define EUNDERFLOW ESPIPE
# endif
#endif

Sorry for the nonsense.
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 05:13:54PM -0700, Jonathan Nieder wrote:

> > I actually prefer "0" of the all of the options discussed. At least it
> > is immediately clear that it is not a syscall error.
> 
> I have a basic aversion to ": Success" in error messages.  Whenever I
> see such an error message, I stop trusting the program that produced it
> not to be seriously buggy.  Maybe I'm the only one?

That's a fair criticism, I think.

> If no actual errno works, we could make a custom strerror-like function
> with our own custom errors (making them negative to avoid clashing with
> standard errno values), but this feels like overkill.

Yes, I also thought about that. It feels pretty invasive (I guess we'd
wrap strerror() with our own custom function so that callers don't have
to remember which one to use).

> In the same spirit of misleadingly confusing too-long and too-short,
> there is also ERANGE ("Result too large"), which doesn't work here.
> There's also EPROTO "Protocol error", but that's about protocols, not
> file formats.  More vague and therefor maybe more applicable is EBADMSG
> "Bad message".  There's also ENOMSG "No message of the desired type".
> 
> If the goal is to support debugging, an error like EPIPE "Broken pipe"
> or ESPIPE "Invalid seek" would be likely to lead me in the right
> direction (wrong file size), even though it is misleading about how
> the error surfaced.
> 
> We could also avoid trying to be cute and use something generic like
> EIO "Input/output error".

I definitely would prefer to avoid EIO, or anything that an actual
read() might return.

What do you think of ENODATA? The glibc text for it is pretty
appropriate. If it's not available everywhere, we'd have to fallback to
something else (EIO? 0?). I don't know how esoteric it is. The likely
candidate to be lacking it is Windows.

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

>> After reading this discussion from the sideline, maybe
>>
>>   ENODATA No message is available on the STREAM head read
>> queue (POSIX.1)
>>
>> is not too bad after all. Or
>>
>>   ESPIPE  Invalid seek (POSIX.1)
>>
>> Technically we do not seek, but one could imagine we'd seek there
>> to read?
>
> ENODATA is not too bad. On my glibc system it yields "No data available"
> from strerror(), which is at least comprehensible.
>
> We're still left with the question of whether it is defined everywhere
> (and what to fallback to when it isn't).

So,

#ifndef EUNDERFLOW
#ifdef ENODATA
#define ENODATA EUNDERFLOW
#else
#define ENODATA ESPIPE
#endif
#endif

?  Windows has ESPIPE, and I'm not sure what other non-POSIX platform we
need to worry about.

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

>> All that really matters is the strerror() that the user would see.
>
> Agreed, though I didn't think those were necessarily standardized.

The standard allows them to vary for the sake of internationalization,
but they are de facto constants (probably because of application authors
like us abusing them ;-)).

[...]
>> Of course, it's even
>> better if we fix the callers and don't try to wedge this into errno.
>
> Yes. This patch is just a stop-gap. Perhaps we should abandon it
> entirely. But I couldn't find a way to fix all the callers. If you have
> a function that just returns "-1" when it sees a read_in_full() error,
> how does _its_ caller tell the difference?
>
> I guess the answer is that the interface of the sub-function calling
> read_in_full() needs to change.

Yes. :/

>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>>  Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.

I have a basic aversion to ": Success" in error messages.  Whenever I
see such an error message, I stop trusting the program that produced it
not to be seriously buggy.  Maybe I'm the only one?

If no actual errno works, we could make a custom strerror-like function
with our own custom errors (making them negative to avoid clashing with
standard errno values), but this feels like overkill.

In the same spirit of misleadingly confusing too-long and too-short,
there is also ERANGE ("Result too large"), which doesn't work here.
There's also EPROTO "Protocol error", but that's about protocols, not
file formats.  More vague and therefor maybe more applicable is EBADMSG
"Bad message".  There's also ENOMSG "No message of the desired type".

If the goal is to support debugging, an error like EPIPE "Broken pipe"
or ESPIPE "Invalid seek" would be likely to lead me in the right
direction (wrong file size), even though it is misleading about how
the error surfaced.

We could also avoid trying to be cute and use something generic like
EIO "Input/output error".

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

> >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
> >> there is EMSGSIZE:
> >>
> >>   Message too long
> >
> > I don't really like that one either.
> >
> > I actually prefer "0" of the all of the options discussed. At least it
> > is immediately clear that it is not a syscall error.
> >
> > -Peff
> 
> After reading this discussion from the sideline, maybe
> 
>   ENODATA No message is available on the STREAM head read
> queue (POSIX.1)
> 
> is not too bad after all. Or
> 
>   ESPIPE  Invalid seek (POSIX.1)
> 
> Technically we do not seek, but one could imagine we'd seek there
> to read?

ENODATA is not too bad. On my glibc system it yields "No data available"
from strerror(), which is at least comprehensible.

We're still left with the question of whether it is defined everywhere
(and what to fallback to when it isn't).

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Stefan Beller
On Mon, Sep 25, 2017 at 5:01 PM, Jeff King  wrote:

>
>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>>   Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.
>
> -Peff

After reading this discussion from the sideline, maybe

  ENODATA No message is available on the STREAM head read
queue (POSIX.1)

is not too bad after all. Or

  ESPIPE  Invalid seek (POSIX.1)

Technically we do not seek, but one could imagine we'd seek there
to read?


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
>
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  t/t7406-submodule-update.sh | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..d718cb00e7 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> +
> + write_script must_not_run.sh <<-EOF &&
> + >$TEST_DIRECTORY/bad
> + EOF
> +
> + git -C super config -f .gitmodules submodule.submodule.update 
> "!$TEST_DIRECTORY/must_not_run.sh" &&

Long line, but I don't think I care.  I wish there were a tool like
"make style" to format shell scripts.

> + git -C super commit -a -m "add command to .gitmodules file" &&
> + git -C super/submodule reset --hard $submodulesha1^ &&
> + git -C super submodule update submodule &&
> + test_path_is_missing bad
> +'

Per offline discussion, you tested that this fails when you use
.git/config instead of .gitmodules, so there aren't any subtle typos
here. :)

Reviewed-by: Jonathan Nieder 

Thanks for writing it.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> >   [EOVERFLOW]
> > The file is a regular file, nbyte is greater than 0, the starting
> > position is before the end-of-file, and the starting position is
> > greater than or equal to the offset maximum established in the open
> > file description associated with fildes.
> >
> > That's not _exactly_ what's going on here, but it's pretty close. And is
> > what you would get if you implemented read_exactly() in terms of
> > something like pread().
> 
> All that really matters is the strerror() that the user would see.

Agreed, though I didn't think those were necessarily standardized.

> For EOVERFLOW, that is
> 
>   Value too large to be stored in data type

Interestingly that does not seem to be a good description for the case
POSIX describes above.

> For EILSEQ, it is
> 
>   Illegal byte sequence
> 
> Neither is perfect, but the latter seems like a better fit for the kind
> of file format errors we're describing here.

I find that one actively misleading. It's not the bytes we saw, it's the
lack of them.

> Of course, it's even
> better if we fix the callers and don't try to wedge this into errno.

Yes. This patch is just a stop-gap. Perhaps we should abandon it
entirely. But I couldn't find a way to fix all the callers. If you have
a function that just returns "-1" when it sees a read_in_full() error,
how does _its_ caller tell the difference?

I guess the answer is that the interface of the sub-function calling
read_in_full() needs to change.

> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
> there is EMSGSIZE:
> 
>   Message too long

I don't really like that one either.

I actually prefer "0" of the all of the options discussed. At least it
is immediately clear that it is not a syscall error.

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>   [EOVERFLOW]
> The file is a regular file, nbyte is greater than 0, the starting
> position is before the end-of-file, and the starting position is
> greater than or equal to the offset maximum established in the open
> file description associated with fildes.
>
> That's not _exactly_ what's going on here, but it's pretty close. And is
> what you would get if you implemented read_exactly() in terms of
> something like pread().

All that really matters is the strerror() that the user would see.

For EOVERFLOW, that is

Value too large to be stored in data type

For EILSEQ, it is

Illegal byte sequence

Neither is perfect, but the latter seems like a better fit for the kind
of file format errors we're describing here.  Of course, it's even
better if we fix the callers and don't try to wedge this into errno.

If you are okay with the too-long/too-short confusion in EOVERFLOW, then
there is EMSGSIZE:

Message too long

Hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:
>> Jef King wrote:

>>>   errno = 0;
>>>   read_in_full(fd, buf, sizeof(buf));
>>>   if (errno)
>>> die_errno("oops");
[...]
>>>  in general we frown on it for calls like
>>> read().
>>
>> Correct.  Actually more than "frown on": except for with the few calls
>> like strtoul that are advertised to work this way, POSIX does not make
>> the guarantee the above code would rely on, at all.
>>
>> So it's not just frowned upon: it's so unportable that the standard
>> calls it out as something that won't work.
>
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
>
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Yes, it is allowed to set it to another random value.  It's a common
thing for implementations to do when they hit a recoverable error,
which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an
implementation calling strtoul and getting EINVAL or ERANGE).

glibc only recently started trying to avoid this kind of behavior,
because even though the standard allows it, users hate it.

POSIX.1-2008 XSI 2.3 "Error Numbers" says

Some functions provide the error number in a variable accessed
through the symbol errno, defined by including the 
header.  The value of errno should only be examined when it is
indicated to be valid by a function's return value.

See http://austingroupbugs.net/view.php?id=374 for an example where
someone wanted to add a new guarantee of that kind to a function.

Hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 07:37:32PM -0400, Jeff King wrote:

> > Correct.  Actually more than "frown on": except for with the few calls
> > like strtoul that are advertised to work this way, POSIX does not make
> > the guarantee the above code would rely on, at all.
> > 
> > So it's not just frowned upon: it's so unportable that the standard
> > calls it out as something that won't work.
> 
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
> 
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Answering my own question. POSIX says:

  No function in this volume of IEEE Std 1003.1-2001 shall set errno to
  0. The setting of errno after a successful call to a function is
  unspecified unless the description of that function specifies that
  errno shall not be modified.

So that does seem to outlaw errno-only checks for most functions.

It makes me wonder if the recent getdelim() fix is technically violating
this. It should instead be explicitly checking for feof().

> IMHO as long as it _is_ deterministic and recognize as not an error from
> read(), that's the best we can do. Which is why I went with "0" in the
> first place. Seeing "read error: success" is a common-ish idiom (to me
> anyway) for "read didn't fail, but some user-space logic did", if only
> because it often happens accidentally.

Another option I ran across from POSIX:

  [EOVERFLOW]
The file is a regular file, nbyte is greater than 0, the starting
position is before the end-of-file, and the starting position is
greater than or equal to the offset maximum established in the open
file description associated with fildes.

That's not _exactly_ what's going on here, but it's pretty close. And is
what you would get if you implemented read_exactly() in terms of
something like pread().

-Peff


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> By as-is I refer to origin/pu.

Ah, okay.  I'm not in that habit because pu frequently loses topics
--- it's mostly useful as a tool to (1) distribute topic branches and
(2) check whether they are compatible with each other.

[...]
> On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Nieder  wrote:

>> Maybe we should work on first wordsmithing the doc and then figuring
>> out where it goes?  The current state of the doc with (?) three
>> different texts,
>
> such that each different text highlights each locations purpose.
>
>> all wrong,
>
> Care to spell out this bold claim?

In the state of "master", "man git-config" tells me that

[submodule ""]
update = ...

sets the "default update procedure for a submodule".  It suggests that
I read about "git submodule update" in git-submodule(1) for more
details.  This is problematic because

 1. It doesn't tell me when I should and should not set it.
 2. I would expect "git checkout --recurse-submodules" to respect the
default update procedure.
 3. It doesn't tell me what commands like "git fetch
--recurse-submodules" will do.
 4. It doesn't point me to better alternatives, when this
configuration doesn't fit my use case.

"man git-submodule" doesn't have a section on submodule..update.
It has references scattered throughout the manpage.  It only tells me
how the setting affects the "git submodule update" command and doesn't
address the problems (1)-(4).

"man gitmodules" also refers to this setting as the "default update
procedure for the named submodule".  It doesn't answer the questions
(1)-(4) either.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2 4/5] sha1_name: Parse less while finding common prefix

2017-09-25 Thread Stefan Beller
On Mon, Sep 25, 2017 at 2:54 AM, Derrick Stolee  wrote:
> Create get_hex_char_from_oid() to parse oids one hex character at a
> time. This prevents unnecessary copying of hex characters in
> extend_abbrev_len() when finding the length of a common prefix.
>
> p0008.1: find_unique_abbrev() for existing objects
> --
>
> For 10 repeated tests, each checking 100,000 known objects, we find the
> following results when running in a Linux VM:
>
> |   | Pack  | Packed  | Loose  | Base   | New||
> | Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
> |---|---|-|||||
> | Git   | 1 |  230078 |  0 | 0.08 s | 0.08 s |   0.0% |
> | Git   | 5 |  230162 |  0 | 0.17 s | 0.16 s | - 5.9% |
> | Git   | 4 |  154310 |  75852 | 0.14 s | 0.12 s | -14.3% |
> | Linux | 1 | 5606645 |  0 | 0.50 s | 0.25 s | -50.0% |
> | Linux |24 | 5606645 |  0 | 2.41 s | 2.08 s | -13.7% |
> | Linux |23 | 5283204 | 323441 | 1.99 s | 1.69 s | -15.1% |
> | VSTS  | 1 | 4355923 |  0 | 0.40 s | 0.22 s | -45.0% |
> | VSTS  |32 | 4355923 |  0 | 2.09 s | 1.99 s | - 4.8% |
> | VSTS  |31 | 4276829 |  79094 | 3.60 s | 3.20 s | -11.1% |
>
> For the Windows repo running in Windows Subsystem for Linux:
>
> Pack Files: 50
> Packed Objects: 22,385,898
>  Loose Objects: 492
>  Base Time: 4.61 s
>   New Time: 4.61 s
>  Rel %: 0.0%
>
> p0008.2: find_unique_abbrev() for missing objects
> -
>
> For 10 repeated tests, each checking 100,000 missing objects, we find
> the following results when running in a Linux VM:
>
> |   | Pack  | Packed  | Loose  | Base   | New||
> | Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
> |---|---|-|||||
> | Git   | 1 |  230078 |  0 | 0.06 s | 0.05 s | -16.7% |
> | Git   | 5 |  230162 |  0 | 0.14 s | 0.15 s | + 7.1% |
> | Git   | 4 |  154310 |  75852 | 0.12 s | 0.12 s |   0.0% |
> | Linux | 1 | 5606645 |  0 | 0.40 s | 0.17 s | -57.5% |
> | Linux |24 | 5606645 |  0 | 1.59 s | 1.30 s | -18.2% |
> | Linux |23 | 5283204 | 323441 | 1.23 s | 1.10 s | -10.6% |
> | VSTS  | 1 | 4355923 |  0 | 0.25 s | 0.12 s | -52.0% |
> | VSTS  |32 | 4355923 |  0 | 1.45 s | 1.34 s | - 7.6% |
> | VSTS  |31 | 4276829 |  79094 | 1.59 s | 1.34 s | -15.7% |
>
> For the Windows repo running in Windows Subsystem for Linux:
>
> Pack Files: 50
> Packed Objects: 22,385,898
>  Loose Objects: 492
>  Base Time: 3.91 s
>   New Time: 3.08 s
>  Rel %: -21.1%
>

These number look pretty cool!

> Signed-off-by: Derrick Stolee 
>
> Signed-off-by: Derrick Stolee 

double signoff?

> ---
>  sha1_name.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index f2a1ebe49..bb47b6702 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -480,13 +480,22 @@ struct min_abbrev_data {
> char *hex;
>  };
>
> +static inline char get_hex_char_from_oid(const struct object_id *oid, int i)

'i' is not very descriptive, maybe add a comment?
(I realize it is just walking through the char*s one by one)

Maybe this function (together with the change in the while below)
could go into hex.c as "int progressively_cmp_oids" that returns
the position at which the given oids differ?

> +{
> +   static const char hex[] = "0123456789abcdef";
> +
> +   if ((i & 1) == 0)
> +   return hex[oid->hash[i >> 1] >> 4];
> +   else
> +   return hex[oid->hash[i >> 1] & 0xf];
> +}

sha1_to_hex_r has very similar code, though it iterates less
and covers both cases in the loop body.

That is the actual reason I propose moving this function
(or a variant thereof) to hex.c as there we can share code.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:

> > If I do this:
> >
> >   errno = 0;
> >   read_in_full(fd, buf, sizeof(buf));
> >   if (errno)
> > die_errno("oops");
> >
> > then we'll claim an error, even though there was none (remember that
> > it's only an error for _some_ callers to not read the whole length).
> >
> > This may be sufficiently odd that we don't need to care about it. There
> > are some calls (like strtoul) which require this kind of clear-and-check
> > strategy with errno, but in general we frown on it for calls like
> > read().
> 
> Correct.  Actually more than "frown on": except for with the few calls
> like strtoul that are advertised to work this way, POSIX does not make
> the guarantee the above code would rely on, at all.
> 
> So it's not just frowned upon: it's so unportable that the standard
> calls it out as something that won't work.

Is it unportable? Certainly read() is free reset errno to zero on
success. But is it allowed to set it to another random value?

I think we're getting pretty academic here, but I'm curious if you have
a good reference.

> > We could also introduce a better helper like this:
> >
> >   int read_exactly(int fd, void *buf, size_t count)
> >   {
> > ssize_t ret = read_in_full(fd, buf, count);
> > if (ret < 0)
> > return -1;
> > if (ret != count) {
> > errno = EILSEQ;
> > return -1;
> > }
> > return 0;
> >   }
> >
> > Then we know that touching errno always coincides with an error return.
> > And it's shorter to check "< 0" compared to "!= count" in the caller.
> > But of course a caller which wants to distinguish the two cases for its
> > error messages then has to look at errno:
> 
> That sounds nice, but it doesn't solve the original problem of callers
> using read_in_full that way.

Right. None of the options discussed in this patch can do so. They can
only take a caller which doesn't distinguish between the cases, and give
it a deterministic value in errno for the short-read case.

IMHO as long as it _is_ deterministic and recognize as not an error from
read(), that's the best we can do. Which is why I went with "0" in the
first place. Seeing "read error: success" is a common-ish idiom (to me
anyway) for "read didn't fail, but some user-space logic did", if only
because it often happens accidentally.

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>>  ssize_t loaded = xread(fd, p, count);
>>  if (loaded < 0)
>>  return -1;
>> -if (loaded == 0)
>> +if (loaded == 0) {
>> +errno = EILSEQ;
>>  return total;
>> +}
>
> If I do this:
>
>   errno = 0;
>   read_in_full(fd, buf, sizeof(buf));
>   if (errno)
>   die_errno("oops");
>
> then we'll claim an error, even though there was none (remember that
> it's only an error for _some_ callers to not read the whole length).
>
> This may be sufficiently odd that we don't need to care about it. There
> are some calls (like strtoul) which require this kind of clear-and-check
> strategy with errno, but in general we frown on it for calls like
> read().

Correct.  Actually more than "frown on": except for with the few calls
like strtoul that are advertised to work this way, POSIX does not make
the guarantee the above code would rely on, at all.

So it's not just frowned upon: it's so unportable that the standard
calls it out as something that won't work.

> We could also introduce a better helper like this:
>
>   int read_exactly(int fd, void *buf, size_t count)
>   {
>   ssize_t ret = read_in_full(fd, buf, count);
>   if (ret < 0)
>   return -1;
>   if (ret != count) {
>   errno = EILSEQ;
>   return -1;
>   }
>   return 0;
>   }
>
> Then we know that touching errno always coincides with an error return.
> And it's shorter to check "< 0" compared to "!= count" in the caller.
> But of course a caller which wants to distinguish the two cases for its
> error messages then has to look at errno:

That sounds nice, but it doesn't solve the original problem of callers
using read_in_full that way.

Thanks,
Jonathan


Re: [PATCH 7/7] add xread_in_full() helper

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 03:16:08PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > Many callers of read_in_full() expect to see exactly "len"
> > bytes, and die if that isn't the case.
> 
> micronit: Can this be named read_in_full_or_die?
> 
> Otherwise it's too easy to mistake for a function like xread, which
> has different semantics.
> 
> I realize that xmalloc, xmemdupz, etc use a different convention.
> That's yet another reason to be explicit, IMHO.

Yeah, we've definitely misused the "x" prefix for different things. I
agree that being explicit probably can't hurt.

I wonder if it's worth calling it "read_exactly_or_die()" to emphasize
that not reading enough bytes is a die-able offense.

-Peff


Re: [PATCH 6/7] worktree: check the result of read_in_full()

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 03:14:26PM -0700, Jonathan Nieder wrote:

> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > index 2f4a4ef9cd..87b3d70b0b 100644
> > --- a/builtin/worktree.c
> > +++ b/builtin/worktree.c
> > @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf 
> > *reason)
> > }
> > len = xsize_t(st.st_size);
> > path = xmallocz(len);
> > -   read_in_full(fd, path, len);
> > +   if (read_in_full(fd, path, len) != len) {
> > +   strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did 
> > not match stat (%s)"),
> > +   id, strerror(errno));
> 
> I'm a little confused.  The 'if' condition checks for a read error but
> the message says something about 'stat'.
> 
> If we're trying to double-check the 'stat' result, shouldn't we read
> all the way to EOF in case the file got longer?

If you wanted to really double-check the stat result, yes you'd want to
make sure there aren't extra bytes either. But really we're just making
sure we were able to read _enough_ bytes.

To be honest, I'm tempted to rip out the whole thing and replace it with
strbuf_read_file(), which seems more straightforward.

The function does seem to rely on the stat() to get the mtime, so we'd
have to leave that part in.

-Peff


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 03:09:14PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > In an ideal world, callers would always distinguish between
> > these cases and give a useful message for each. But as an
> > easy way to make our imperfect world better, let's reset
> > errno to a known value. The best we can do is "0", which
> > will yield something like:
> >
> >   unable to read: Success
> >
> > That's not great, but at least it's deterministic and makes
> > it clear that we didn't see an error from read().
> 
> Yuck.  Can we set errno to something more specific instead?

Yes, but what? You've suggested EILSEQ here, but that feels a
bit...funny. I guess at least it's unlikely that read() would ever set
errno to that itself, so we can be reasonably sure that seeing it is a
sign of a short read. I thought ERANGE might be a possible candidate,
but it doesn't quite fit. Ditto for ENODATA.

I _would_ like to have something meaningful, as it would mean the whole
xread_in_full() nonsense from the final patch would be unnecessary. It
would simply be reasonable to show the errno after read_in_full.

Another question is whether EILSEQ (or the others) is actually defined
everywhere we compile. If it isn't, we'll have to define it ourselves
(but to what? strerror() won't return anything useful for it).

> read(2) also doesn't promise not to clobber errno on success.

Yes, though it's only a problem if you're using something other than 0.

Speaking of funny errno clobbering, with your patch:

> diff --git a/wrapper.c b/wrapper.c
> index 61aba0b5c1..1842a99b87 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>   ssize_t loaded = xread(fd, p, count);
>   if (loaded < 0)
>   return -1;
> - if (loaded == 0)
> + if (loaded == 0) {
> + errno = EILSEQ;
>   return total;
> + }

If I do this:

  errno = 0;
  read_in_full(fd, buf, sizeof(buf));
  if (errno)
die_errno("oops");

then we'll claim an error, even though there was none (remember that
it's only an error for _some_ callers to not read the whole length).

This may be sufficiently odd that we don't need to care about it. There
are some calls (like strtoul) which require this kind of clear-and-check
strategy with errno, but in general we frown on it for calls like
read().

We could also introduce a better helper like this:

  int read_exactly(int fd, void *buf, size_t count)
  {
ssize_t ret = read_in_full(fd, buf, count);
if (ret < 0)
return -1;
if (ret != count) {
errno = EILSEQ;
return -1;
}
return 0;
  }

Then we know that touching errno always coincides with an error return.
And it's shorter to check "< 0" compared to "!= count" in the caller.
But of course a caller which wants to distinguish the two cases for its
error messages then has to look at errno:

  if (read_exactly(fd, buf, len) < 0) {
if (errno == EILSEQ) /* eek, now this abstraction is leaky */
die("short read");
else
die_errno("read error");
  }

I dunno. All options seem pretty gross to me.

-Peff


Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 02:59:27PM -0700, Jonathan Nieder wrote:

> > But during the conflict resolution in c50424a6f0 (Merge
> > branch 'jk/write-in-full-fix', 2017-09-25), this morphed
> > into
> [...]
> Good eyes.  Thanks.

Sort of. :) I usually continually rebase my topics until they end up
empty. So I notice bits like this either during conflict resolution, or
when the topic is left unexpectedly non-empty.

It's not foolproof, though. Sometimes rebasing a topic on itself is so
painful that I "rebase --skip" liberally in the early parts, and would
miss any merge funniness.

> By the way, the description from that merge commit
> 
> Many codepaths did not diagnose write failures correctly when disks
> go full, due to their misuse of write_in_full() helper function,
> which have been corrected.
> 
> does not look accurate to me.  At least the "Many codepaths" part.
> All of those changes except for three should be no-ops.  The scariest
> one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c,
> which is about overflowing a "long" on Windows instead of error
> handling.

The ones in config.c are definitely wrong, too. Though I'd agree that
"many" might be overstating it.

-Peff


[PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Stefan Beller
submodule..update can be assigned an arbitrary command via setting
it to "!command". When this command is found in the regular config, Git
ought to just run that command instead of other update mechanisms.

However if that command is just found in the .gitmodules file, it is
potentially untrusted, which is why we do not run it.  Add a test
confirming the behavior.

Suggested-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---

 updated to use the super robust script.
 Thanks Jonathan,
 
 Stefan

 t/t7406-submodule-update.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 034914a14f..d718cb00e7 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in 
.git/config' '
)
 '
 
+test_expect_success 'submodule update - command in .gitmodules is ignored' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
+
+   write_script must_not_run.sh <<-EOF &&
+   >$TEST_DIRECTORY/bad
+   EOF
+
+   git -C super config -f .gitmodules submodule.submodule.update 
"!$TEST_DIRECTORY/must_not_run.sh" &&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule &&
+   test_path_is_missing bad
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path 'submodule'
 EOF
-- 
2.14.0.rc0.3.g6c2e499285



Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Stefan Beller
On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Nieder  wrote:
> Stefan Beller wrote:
>> On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams  wrote:
>>> On 09/25, Stefan Beller wrote:
>
 Have one place to explain the effects of setting submodule..update
 instead of two.

 Signed-off-by: Stefan Beller 
 ---
>> I disagree.  Actually, I think the git-config(1) blurb could just
>> point here, and that the text here ought to be clear about what
>> commands it affects and how an end user would use it.
>
> I tend to agree with the consolidation.

 Something like this?
>>>
>>> I like the consolidation, its easier to keep up to date when its only in
>>> one place.
>>
>> After thinking about it further, I'd like to withdraw this patch
>> for now.
>>
>> The reason is that this would also forward point to
>> git-submodule.txt, such that we'd still have 2 places
>> in which it is explained. The current situation with explaining
>> it in 3 places is not too bad as each place hints at their specific
>> circumstance:
>> git-submodule.txt explains the values to set itself.
>> gitmodules.txt explains what the possible fields in that file are,
>>   and regarding this field it points out it is ignored in the init call.
>> config.txt: actually describe the effects of the setting.
>>
>> I think we can keep it as-is for now.
>
> Sorry, I got lost.  Which state is as-is?

By as-is I refer to origin/pu.

> As a user, how do I find out what submodule.*.update is going to do
> and which commands will respect it?

The user would discover it via 'man git-config' I would assume, which
covers any config variable? It also directs to git-submodule which is
more detailed, but the text there is suitable for the casual reader.
(pu has origin/sb/doc-config-submodule-update)

> Maybe we should work on first wordsmithing the doc and then figuring
> out where it goes?  The current state of the doc with (?) three
> different texts,

such that each different text highlights each locations purpose.

> all wrong,

Care to spell out this bold claim?

Thanks,
Stefan


Re: git ls-tree -d doesn't work if the specified path is the repository root?

2017-09-25 Thread Jonathan Nieder
Hi,

Roy Wellington wrote:

> When I run `git ls-tree -d HEAD -- subdir` from the root of my
> repository, where `subdir` is a subdirectory in that root, I get the
> treehash of that subdirectory. This is what I expect.
>
> However, if I merely replace `subdir` with `.` (the root of the
> repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns
> the treehashes of the /children/ of the root, instead of the root
> itself, contrary to the documented behavior of -d.

Can you be more specific?  What documentation led you to this
expectation?

I don't have a strong opinion about this behavior, but in any case if
the documentation and command disagree about the correct behavior then
one of them needs to be fixed. :)

> Is there some reason for this? This behavior seems like a bug to me:
> it means that prior to calling ls-tree I need to check if the
> referenced path happens to be the root, and if so, find some other
> means (rev-parse?) of converting it to a treehash.

Does

git rev-parse HEAD:subdir

work better for what you're trying to accomplish?  To get the root of
the repository, you can use

git rev-parse HEAD:

Thanks and hope that helps,
Jonathan


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:
> On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams  wrote:
>> On 09/25, Stefan Beller wrote:

>>> Have one place to explain the effects of setting submodule..update
>>> instead of two.
>>>
>>> Signed-off-by: Stefan Beller 
>>> ---
> I disagree.  Actually, I think the git-config(1) blurb could just
> point here, and that the text here ought to be clear about what
> commands it affects and how an end user would use it.

 I tend to agree with the consolidation.
>>>
>>> Something like this?
>>
>> I like the consolidation, its easier to keep up to date when its only in
>> one place.
>
> After thinking about it further, I'd like to withdraw this patch
> for now.
>
> The reason is that this would also forward point to
> git-submodule.txt, such that we'd still have 2 places
> in which it is explained. The current situation with explaining
> it in 3 places is not too bad as each place hints at their specific
> circumstance:
> git-submodule.txt explains the values to set itself.
> gitmodules.txt explains what the possible fields in that file are,
>   and regarding this field it points out it is ignored in the init call.
> config.txt: actually describe the effects of the setting.
>
> I think we can keep it as-is for now.

Sorry, I got lost.  Which state is as-is?

As a user, how do I find out what submodule.*.update is going to do
and which commands will respect it?

Maybe we should work on first wordsmithing the doc and then figuring
out where it goes?  The current state of the doc with (?) three
different texts, all wrong, doesn't seem like a good state to
preserve.

Thanks,
Jonathan


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Stefan Beller
On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams  wrote:
> On 09/25, Stefan Beller wrote:
>> Have one place to explain the effects of setting submodule..update
>> instead of two.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>> >> I disagree.  Actually, I think the git-config(1) blurb could just
>> >> point here, and that the text here ought to be clear about what
>> >> commands it affects and how an end user would use it.
>> >
>> > I tend to agree with the consolidation.
>>
>> Something like this?
>
> I like the consolidation, its easier to keep up to date when its only in
> one place.

After thinking about it further, I'd like to withdraw this patch
for now.

The reason is that this would also forward point to
git-submodule.txt, such that we'd still have 2 places
in which it is explained. The current situation with explaining
it in 3 places is not too bad as each place hints at their specific
circumstance:
git-submodule.txt explains the values to set itself.
gitmodules.txt explains what the possible fields in that file are,
  and regarding this field it points out it is ignored in the init call.
config.txt: actually describe the effects of the setting.

I think we can keep it as-is for now.

Thanks,
Stefan


Re: [PATCH 7/7] add xread_in_full() helper

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> Many callers of read_in_full() expect to see exactly "len"
> bytes, and die if that isn't the case.

micronit: Can this be named read_in_full_or_die?

Otherwise it's too easy to mistake for a function like xread, which
has different semantics.

I realize that xmalloc, xmemdupz, etc use a different convention.
That's yet another reason to be explicit, IMHO.

Thanks,
Jonathan


Re: [PATCH 6/7] worktree: check the result of read_in_full()

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> We try to read "len" bytes into a buffer and just assume
> that it happened correctly. In practice this should usually
> be the case, since we just stat'd the file to get the
> length.  But we could be fooled by transient errors or by
> other processes racily truncating the file.
>
> Let's be more careful. There's a slim chance this could
> catch a real error, but it also prevents people and tools
> from getting worried while reading the code.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/worktree.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 2f4a4ef9cd..87b3d70b0b 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf 
> *reason)
>   }
>   len = xsize_t(st.st_size);
>   path = xmallocz(len);
> - read_in_full(fd, path, len);
> + if (read_in_full(fd, path, len) != len) {
> + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did 
> not match stat (%s)"),
> + id, strerror(errno));

I'm a little confused.  The 'if' condition checks for a read error but
the message says something about 'stat'.

If we're trying to double-check the 'stat' result, shouldn't we read
all the way to EOF in case the file got longer?

Puzzled,
Jonathan


Re: [PATCH 5/7] worktree: use xsize_t to access file size

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> To read the "gitdir" file into memory, we stat the file and
> allocate a buffer. But we store the size in an "int", which
> may be truncated. We should use a size_t and xsize_t(),
> which will detect truncation.
>
> An overflow is unlikely for a "gitdir" file, but it's a good
> practice to model.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/worktree.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> Suggested-by: Jonathan Nieder 
> Signed-off-by: Jeff King 
> ---
>  builtin/get-tar-commit-id.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> In an ideal world, callers would always distinguish between
> these cases and give a useful message for each. But as an
> easy way to make our imperfect world better, let's reset
> errno to a known value. The best we can do is "0", which
> will yield something like:
>
>   unable to read: Success
>
> That's not great, but at least it's deterministic and makes
> it clear that we didn't see an error from read().

Yuck.  Can we set errno to something more specific instead?

read(2) also doesn't promise not to clobber errno on success.

How about something like the following?

-- >8 --
Subject: read_in_full: set errno for partial reads

Many callers of read_in_full() complain when we do not read their full
byte-count. But a check like:

  if (read_in_full(fd, buf, len) != len)
return error_errno("unable to read");

conflates two problem conditions:

  1. A real error from read().
  2. There were fewer than "len" bytes available.

In the first case, showing the user strerror(errno) is useful. In the
second, we may see a random errno that was set by some previous system
call.

In an ideal world, callers would always distinguish between these
cases and give a useful message for each. But as an easy way to make
our imperfect world better, let's set errno in the second case to a
known value. There is no standard "Unexpected end of file" errno
value, so instead use EILSEQ, which yields a message like

  unable to read: Illegal byte sequence

That's not great, but at least it's deterministic and makes it
possible to reverse-engineer what went wrong.

Change-Id: I48138052f71b7c6cf35c2d00a10dc8febaca4f10
Signed-off-by: Jonathan Nieder 
---
 wrapper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index 61aba0b5c1..1842a99b87 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
ssize_t loaded = xread(fd, p, count);
if (loaded < 0)
return -1;
-   if (loaded == 0)
+   if (loaded == 0) {
+   errno = EILSEQ;
return total;
+   }
count -= loaded;
p += loaded;
total += loaded;
-- 
2.14.1.821.g8fa685d3b7



Re: [PATCH 2/7] notes-merge: drop dead zero-write code

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> We call write_in_full() with a size that we know is greater
> than zero. The return value can never be zero, then, since
> write_in_full() converts such a failed write() into ENOSPC
> and returns -1.  We can just drop this branch of the error
> handling entirely.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Jeff King 
> ---
>  notes-merge.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for tying up this loose end.


Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
> len" pattern, 2017-09-13) converted this callsite from:
>
>   write_in_full(...) != 1
>
> to
>
>   write_in_full(...) < 0
>
> But during the conflict resolution in c50424a6f0 (Merge
> branch 'jk/write-in-full-fix', 2017-09-25), this morphed
> into
>
>   write_in_full(...) < 1
>
> This behaves as we want, but we prefer to avoid modeling the
> "less than length" error-check which can be subtly buggy, as
> shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
> len) < len" pattern, 2017-09-13).
>
> Signed-off-by: Jeff King 
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good eyes.  Thanks.
Reviewed-by: Jonathan Nieder 

By the way, the description from that merge commit

Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

does not look accurate to me.  At least the "Many codepaths" part.
All of those changes except for three should be no-ops.  The scariest
one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c,
which is about overflowing a "long" on Windows instead of error
handling.

Thanks,
Jonathan


macOS Git installer on SourceForge

2017-09-25 Thread Lars Schneider
Hi,

one of my colleagues made me aware today that the macOS Git installer is 
hosted on SourceForge and advertised on the official git-scm page:
https://git-scm.com/download/mac

SourceForge had some bad press as they apparently bundled junkware in
their downloads:
https://www.howtogeek.com/218764/warning-don%E2%80%99t-download-software-from-sourceforge-if-you-can-help-it/

Is this a reason for concern?

Thanks,
Lars

[PATCH 7/7] add xread_in_full() helper

2017-09-25 Thread Jeff King
Many callers of read_in_full() expect to see exactly "len"
bytes, and die if that isn't the case. Since there are two
interesting error cases:

  1. We saw a read() error.

  2. We saw EOF with fewer bytes than expected.

we would ideally distinguish between them when reporting to
the user. Let's add a helper to make this easier. We can
convert cases that currently lump the two conditions
together (improving their error messages) and ones that
already distinguish (shortening their code).

There are a number of read_in_full() calls left that we
can't (or shouldn't) convert, for one or more of these
reasons:

 - Some are not trying to read an exact number in the first
   place. So they only care about case 1, and are fine.

 - Some call error() instead of die(). We could possibly
   accommodate these, at the expense of making our helper a
   bit more clunky.

 - Some print nothing and return a failing code up the
   stack, in which case somebody eventually looks at errno.
   Handling these with a helper would be hard. Ideally there
   would be some errno value that we could set for short
   read, but there's no standard one (ENODATA is kind-of
   accurate, but it's not really "no data"; it's "not enough
   data", and we wouldn't want to get confused with a real
   ENODATA error).

Signed-off-by: Jeff King 
---
 builtin/get-tar-commit-id.c |  3 +--
 bulk-checkin.c  |  4 +---
 cache.h |  7 +++
 combine-diff.c  |  8 +---
 csum-file.c |  6 +-
 pack-write.c|  3 +--
 wrapper.c   | 11 +++
 7 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 259ad40339..0b97196afc 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -24,8 +24,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const 
char *prefix)
if (argc != 1)
usage(builtin_get_tar_commit_id_usage);
 
-   if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE)
-   die_errno("git get-tar-commit-id: read error");
+   xread_in_full(0, buffer, HEADERSIZE, "stdin");
if (header->typeflag[0] != 'g')
return 1;
if (!skip_prefix(content, "52 comment=", ))
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9a1f6c49ab..a9743785fb 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -115,9 +115,7 @@ static int stream_to_pack(struct bulk_checkin_state *state,
 
if (size && !s.avail_in) {
ssize_t rsize = size < sizeof(ibuf) ? size : 
sizeof(ibuf);
-   if (read_in_full(fd, ibuf, rsize) != rsize)
-   die("failed to read %d bytes from '%s'",
-   (int)rsize, path);
+   xread_in_full(fd, ibuf, rsize, path);
offset += rsize;
if (*already_hashed_to < offset) {
size_t hsize = offset - *already_hashed_to;
diff --git a/cache.h b/cache.h
index 49b083ee0a..deec532228 100644
--- a/cache.h
+++ b/cache.h
@@ -1757,6 +1757,13 @@ extern ssize_t read_in_full(int fd, void *buf, size_t 
count);
 extern ssize_t write_in_full(int fd, const void *buf, size_t count);
 extern ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
 
+/*
+ * Like read_in_full(), but die on errors or a failure to read exactly "count"
+ * bytes. The "desc" string will be inserted into the error message as "reading
+ * %s".
+ */
+extern void xread_in_full(int fd, void *buf, size_t count, const char *desc);
+
 static inline ssize_t write_str_in_full(int fd, const char *str)
 {
return write_in_full(fd, str, strlen(str));
diff --git a/combine-diff.c b/combine-diff.c
index 9e163d5ada..4bdf797a10 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1027,7 +1027,6 @@ static void show_patch_diff(struct combine_diff_path 
*elem, int num_parent,
free_filespec(df);
} else if (0 <= (fd = open(elem->path, O_RDONLY))) {
size_t len = xsize_t(st.st_size);
-   ssize_t done;
int is_file, i;
 
elem->mode = canon_mode(st.st_mode);
@@ -1042,12 +1041,7 @@ static void show_patch_diff(struct combine_diff_path 
*elem, int num_parent,
 
result_size = len;
result = xmallocz(len);
-
-   done = read_in_full(fd, result, len);
-   if (done < 0)
-   die_errno("read error '%s'", elem->path);
-   else if (done < len)
-   die("early EOF '%s'", elem->path);
+   xread_in_full(fd, result, len, elem->path);
 
/* If not a fake symlink, apply filters, e.g. autocrlf 
*/
   

[PATCH 6/7] worktree: check the result of read_in_full()

2017-09-25 Thread Jeff King
We try to read "len" bytes into a buffer and just assume
that it happened correctly. In practice this should usually
be the case, since we just stat'd the file to get the
length.  But we could be fooled by transient errors or by
other processes racily truncating the file.

Let's be more careful. There's a slim chance this could
catch a real error, but it also prevents people and tools
from getting worried while reading the code.

Signed-off-by: Jeff King 
---
 builtin/worktree.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 2f4a4ef9cd..87b3d70b0b 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf 
*reason)
}
len = xsize_t(st.st_size);
path = xmallocz(len);
-   read_in_full(fd, path, len);
+   if (read_in_full(fd, path, len) != len) {
+   strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did 
not match stat (%s)"),
+   id, strerror(errno));
+   return 1;
+   }
close(fd);
while (len && (path[len - 1] == '\n' || path[len - 1] == '\r'))
len--;
-- 
2.14.1.1148.ga2561536a1



[PATCH 5/7] worktree: use xsize_t to access file size

2017-09-25 Thread Jeff King
To read the "gitdir" file into memory, we stat the file and
allocate a buffer. But we store the size in an "int", which
may be truncated. We should use a size_t and xsize_t(),
which will detect truncation.

An overflow is unlikely for a "gitdir" file, but it's a good
practice to model.

Signed-off-by: Jeff King 
---
 builtin/worktree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index de26849f55..2f4a4ef9cd 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -38,7 +38,8 @@ static int prune_worktree(const char *id, struct strbuf 
*reason)
 {
struct stat st;
char *path;
-   int fd, len;
+   int fd;
+   size_t len;
 
if (!is_directory(git_path("worktrees/%s", id))) {
strbuf_addf(reason, _("Removing worktrees/%s: not a valid 
directory"), id);
@@ -56,7 +57,7 @@ static int prune_worktree(const char *id, struct strbuf 
*reason)
id, strerror(errno));
return 1;
}
-   len = st.st_size;
+   len = xsize_t(st.st_size);
path = xmallocz(len);
read_in_full(fd, path, len);
close(fd);
-- 
2.14.1.1148.ga2561536a1



[PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check

2017-09-25 Thread Jeff King
Comparing the result of read_in_full() using less-than is
potentially dangerous, as discussed in 561598cfcf
(read_pack_header: handle signed/unsigned comparison in read
result, 2017-09-13).

The instance in get-tar-commit-id is OK, because the
HEADERSIZE macro expands to a signed integer. But if it were
switched to an unsigned type like:

  size_t HEADERSIZE = ...;

this would be a bug. Let's use the more robust "!="
construct.

We can also drop the useless "n" variable while we're at it.

Suggested-by: Jonathan Nieder 
Signed-off-by: Jeff King 
---
 builtin/get-tar-commit-id.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 6d9a79f9b3..259ad40339 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -20,14 +20,12 @@ int cmd_get_tar_commit_id(int argc, const char **argv, 
const char *prefix)
struct ustar_header *header = (struct ustar_header *)buffer;
char *content = buffer + RECORDSIZE;
const char *comment;
-   ssize_t n;
 
if (argc != 1)
usage(builtin_get_tar_commit_id_usage);
 
-   n = read_in_full(0, buffer, HEADERSIZE);
-   if (n < HEADERSIZE)
-   die("git get-tar-commit-id: read error");
+   if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE)
+   die_errno("git get-tar-commit-id: read error");
if (header->typeflag[0] != 'g')
return 1;
if (!skip_prefix(content, "52 comment=", ))
-- 
2.14.1.1148.ga2561536a1



[PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jeff King
Many callers of read_in_full() complain when we do not read
their full byte-count. But a check like:

  if (read_in_full(fd, buf, len) != len)
  return error_errno("unable to read");

conflates two problem conditions:

  1. A real error from read().

  2. There were fewer than "len" bytes available.

In the first case, showing the user strerror(errno) is
useful. But in the second, we may see a random errno that
was set by some previous system call.

In an ideal world, callers would always distinguish between
these cases and give a useful message for each. But as an
easy way to make our imperfect world better, let's reset
errno to a known value. The best we can do is "0", which
will yield something like:

  unable to read: Success

That's not great, but at least it's deterministic and makes
it clear that we didn't see an error from read().

Signed-off-by: Jeff King 
---
 wrapper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/wrapper.c b/wrapper.c
index 61aba0b5c1..f55debc92d 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -314,6 +314,7 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
char *p = buf;
ssize_t total = 0;
 
+   errno = 0;
while (count > 0) {
ssize_t loaded = xread(fd, p, count);
if (loaded < 0)
-- 
2.14.1.1148.ga2561536a1



[PATCH 2/7] notes-merge: drop dead zero-write code

2017-09-25 Thread Jeff King
We call write_in_full() with a size that we know is greater
than zero. The return value can never be zero, then, since
write_in_full() converts such a failed write() into ENOSPC
and returns -1.  We can just drop this branch of the error
handling entirely.

Suggested-by: Jonathan Nieder 
Signed-off-by: Jeff King 
---
 notes-merge.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/notes-merge.c b/notes-merge.c
index 597d43f65c..4352c34a6e 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -308,8 +308,6 @@ static void write_buf_to_worktree(const struct object_id 
*obj,
if (errno == EPIPE)
break;
die_errno("notes-merge");
-   } else if (!ret) {
-   die("notes-merge: disk full?");
}
size -= ret;
buf += ret;
-- 
2.14.1.1148.ga2561536a1



[PATCH 1/7] files-backend: prefer "0" for write_in_full() error check

2017-09-25 Thread Jeff King
Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
len" pattern, 2017-09-13) converted this callsite from:

  write_in_full(...) != 1

to

  write_in_full(...) < 0

But during the conflict resolution in c50424a6f0 (Merge
branch 'jk/write-in-full-fix', 2017-09-25), this morphed
into

  write_in_full(...) < 1

This behaves as we want, but we prefer to avoid modeling the
"less than length" error-check which can be subtly buggy, as
shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
len) < len" pattern, 2017-09-13).

Signed-off-by: Jeff King 
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index dac33628b3..e35c64c001 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3007,7 +3007,7 @@ static int files_reflog_expire(struct ref_store 
*ref_store,
} else if (update &&
   (write_in_full(get_lock_file_fd(>lk),
oid_to_hex(_kept_oid), GIT_SHA1_HEXSZ) 
< 0 ||
-   write_str_in_full(get_lock_file_fd(>lk), 
"\n") < 1 ||
+   write_str_in_full(get_lock_file_fd(>lk), 
"\n") < 0 ||
close_ref_gently(lock) < 0)) {
status |= error("couldn't write %s",
get_lock_file_path(>lk));
-- 
2.14.1.1148.ga2561536a1



[PATCH 0/7] read/write_in_full leftovers

2017-09-25 Thread Jeff King
This series addresses the bits leftover from the discussion two weeks
ago in:

  
https://public-inbox.org/git/20170913170807.cyx7rrpoyhaau...@sigill.intra.peff.net/

and its subthread. I don't think any of these is a real problem in
practice, so this can be considered as a cleanup. I'm on the fence over
whether the final one is a good idea. See the commit message for
details.

  [1/7]: files-backend: prefer "0" for write_in_full() error check
  [2/7]: notes-merge: drop dead zero-write code
  [3/7]: read_in_full: reset errno before reading
  [4/7]: get-tar-commit-id: prefer "!=" for read_in_full() error check
  [5/7]: worktree: use xsize_t to access file size
  [6/7]: worktree: check the result of read_in_full()
  [7/7]: add xread_in_full() helper

 builtin/get-tar-commit-id.c |  5 +
 builtin/worktree.c  | 11 ---
 bulk-checkin.c  |  4 +---
 cache.h |  7 +++
 combine-diff.c  |  8 +---
 csum-file.c |  6 +-
 notes-merge.c   |  2 --
 pack-write.c|  3 +--
 refs/files-backend.c|  2 +-
 wrapper.c   | 12 
 10 files changed, 33 insertions(+), 27 deletions(-)

-Peff


Re: [PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-25 Thread Stefan Beller
On Mon, Sep 25, 2017 at 10:40 AM, Brandon Williams  wrote:
> On 09/25, Han-Wen Nienhuys wrote:
>> This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did
>
> Not really important but most times we reference another commit from a
> commit msg we include the one line summary like:
> 'bdfdaa497 (strbuf.h: integrate api-strbuf.txt documentation,
> 2015-01-16)'

I have

alias.gcs=show --date=short -s --pretty='format:%h (%s, %ad)'

in the config file for that.

Thanks for converting documentation into header files!

Stefan


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
> 
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
> 
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  t/t7406-submodule-update.sh | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..780af4e6f5 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> +
> + git -C super config -f .gitmodules submodule.submodule.update "!false 
> || echo >bad" &&

What does the '!false || echo >bad' do?

Ideally we want this test to be super robust: e.g. if it runs the
command but from a different directory, we still want the test to fail,
and if it runs the command but using exec instead of a shell, we still
want the test to fail.

Maybe write_script would help with this.  E.g. would something like

test_when_finished ... &&
write_script must_not_run.sh <<-EOF &&
>$TEST_DIRECTORY/bad
EOF

git -C super config -f .gitmodules submodule.submodule.update \
"!$TEST_DIRECTORY/must_not_run.sh" &&
...

work?

Thanks,
Jonathan


[PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Stefan Beller
submodule..update can be assigned an arbitrary command via setting
it to "!command". When this command is found in the regular config, Git
ought to just run that command instead of other update mechanisms.

However if that command is just found in the .gitmodules file, it is
potentially untrusted, which is why we do not run it.  Add a test
confirming the behavior.

Suggested-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 t/t7406-submodule-update.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 034914a14f..780af4e6f5 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in 
.git/config' '
)
 '
 
+test_expect_success 'submodule update - command in .gitmodules is ignored' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
+
+   git -C super config -f .gitmodules submodule.submodule.update "!false 
|| echo >bad" &&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule 2>../actual &&
+   test_path_is_missing super/bad
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path 'submodule'
 EOF
-- 
2.14.0.rc0.3.g6c2e499285



Re: git ls-tree -d doesn't work if the specified path is the repository root?

2017-09-25 Thread Santiago Torres
 
> Is there some reason for this? This behavior seems like a bug to me:
> it means that prior to calling ls-tree I need to check if the
> referenced path happens to be the root, and if so, find some other
> means (rev-parse?) of converting it to a treehash.

Hmm, interesting.

I see these four behaviors:

[santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- src
04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src

[santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- src/
04 tree 767beaaf0927f89e630c52830b6fbac138ec039asrc/bg_daemon

[santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- .
04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src
04 tree de12675d1023b1e743f7e11c2276fc417b3650a6tests

[santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- ./
04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src
04 tree de12675d1023b1e743f7e11c2276fc417b3650a6tests

IMHO, I think it'd be more consistent if . returned the root treehash and ./
returned the treehash of the directories in root.

I don't personally know if there is a reason for this...

Thanks,
-Santiago.

> 
> Thanks,
> —Roy


signature.asc
Description: PGP signature


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Brandon Williams
On 09/25, Stefan Beller wrote:
> Have one place to explain the effects of setting submodule..update
> instead of two.
> 
> Signed-off-by: Stefan Beller 
> ---
> >> I disagree.  Actually, I think the git-config(1) blurb could just
> >> point here, and that the text here ought to be clear about what
> >> commands it affects and how an end user would use it.
> >
> > I tend to agree with the consolidation.
> 
> Something like this?

I like the consolidation, its easier to keep up to date when its only in
one place.

> 
>  Documentation/config.txt |  9 +
>  Documentation/gitmodules.txt | 20 +++-
>  2 files changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1ac0ae6adb..0d5a296b6c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -3085,14 +3085,7 @@ submodule..url::
>   See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details.
>  
>  submodule..update::
> - The method by which a submodule is updated by 'git submodule update',
> - which is the only affected command, others such as
> - 'git checkout --recurse-submodules' are unaffected. It exists for
> - historical reasons, when 'git submodule' was the only command to
> - interact with submodules; settings like `submodule.active`
> - and `pull.rebase` are more specific. It is populated by
> - `git submodule init` from the linkgit:gitmodules[5] file.
> - See description of 'update' command in linkgit:git-submodule[1].
> + See `submodule..update` in linkgit:gitmodules[5].
>  
>  submodule..branch::
>   The remote branch name for a submodule, used by `git submodule
> diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
> index db5d47eb19..d156dee387 100644
> --- a/Documentation/gitmodules.txt
> +++ b/Documentation/gitmodules.txt
> @@ -38,15 +38,17 @@ submodule..url::
>  In addition, there are a number of optional keys:
>  
>  submodule..update::
> - Defines the default update procedure for the named submodule,
> - i.e. how the submodule is updated by "git submodule update"
> - command in the superproject. This is only used by `git
> - submodule init` to initialize the configuration variable of
> - the same name. Allowed values here are 'checkout', 'rebase',
> - 'merge' or 'none'. See description of 'update' command in
> - linkgit:git-submodule[1] for their meaning. Note that the
> - '!command' form is intentionally ignored here for security
> - reasons.
> + The method by which a submodule is updated by 'git submodule update',
> + which is the only affected command, others such as
> + 'git checkout --recurse-submodules' are unaffected. It exists for
> + historical reasons, when 'git submodule' was the only command to
> + interact with submodules; settings like `submodule.active`
> + and `pull.rebase` are more specific. It is copied to the config
> + by `git submodule init` from the .gitmodules file.
> + Allowed values here are 'checkout', 'rebase', 'merge' or 'none'.
> + See description of 'update' command in linkgit:git-submodule[1]
> + for their meaning. Note that the '!command' form is intentionally
> + ignored here for security reasons.

This probably needs to be tweaked a bit to say that the '!command' form
is ignored by submodule init, in that it isn't copied over from the
.gitmodules file, but if it is configured in your config it will be
respected by 'submodule update'.

>  
>  submodule..branch::
>   A remote branch name for tracking updates in the upstream submodule.
> -- 
> 2.14.0.rc0.3.g6c2e499285
> 

-- 
Brandon Williams


git ls-tree -d doesn't work if the specified path is the repository root?

2017-09-25 Thread Roy Wellington
Hi,

When I run `git ls-tree -d HEAD -- subdir` from the root of my
repository, where `subdir` is a subdirectory in that root, I get the
treehash of that subdirectory. This is what I expect.

However, if I merely replace `subdir` with `.` (the root of the
repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns
the treehashes of the /children/ of the root, instead of the root
itself, contrary to the documented behavior of -d.

Is there some reason for this? This behavior seems like a bug to me:
it means that prior to calling ls-tree I need to check if the
referenced path happens to be the root, and if so, find some other
means (rev-parse?) of converting it to a treehash.

Thanks,
—Roy


[PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Stefan Beller
Have one place to explain the effects of setting submodule..update
instead of two.

Signed-off-by: Stefan Beller 
---
>> I disagree.  Actually, I think the git-config(1) blurb could just
>> point here, and that the text here ought to be clear about what
>> commands it affects and how an end user would use it.
>
> I tend to agree with the consolidation.

Something like this?

 Documentation/config.txt |  9 +
 Documentation/gitmodules.txt | 20 +++-
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1ac0ae6adb..0d5a296b6c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3085,14 +3085,7 @@ submodule..url::
See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details.
 
 submodule..update::
-   The method by which a submodule is updated by 'git submodule update',
-   which is the only affected command, others such as
-   'git checkout --recurse-submodules' are unaffected. It exists for
-   historical reasons, when 'git submodule' was the only command to
-   interact with submodules; settings like `submodule.active`
-   and `pull.rebase` are more specific. It is populated by
-   `git submodule init` from the linkgit:gitmodules[5] file.
-   See description of 'update' command in linkgit:git-submodule[1].
+   See `submodule..update` in linkgit:gitmodules[5].
 
 submodule..branch::
The remote branch name for a submodule, used by `git submodule
diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
index db5d47eb19..d156dee387 100644
--- a/Documentation/gitmodules.txt
+++ b/Documentation/gitmodules.txt
@@ -38,15 +38,17 @@ submodule..url::
 In addition, there are a number of optional keys:
 
 submodule..update::
-   Defines the default update procedure for the named submodule,
-   i.e. how the submodule is updated by "git submodule update"
-   command in the superproject. This is only used by `git
-   submodule init` to initialize the configuration variable of
-   the same name. Allowed values here are 'checkout', 'rebase',
-   'merge' or 'none'. See description of 'update' command in
-   linkgit:git-submodule[1] for their meaning. Note that the
-   '!command' form is intentionally ignored here for security
-   reasons.
+   The method by which a submodule is updated by 'git submodule update',
+   which is the only affected command, others such as
+   'git checkout --recurse-submodules' are unaffected. It exists for
+   historical reasons, when 'git submodule' was the only command to
+   interact with submodules; settings like `submodule.active`
+   and `pull.rebase` are more specific. It is copied to the config
+   by `git submodule init` from the .gitmodules file.
+   Allowed values here are 'checkout', 'rebase', 'merge' or 'none'.
+   See description of 'update' command in linkgit:git-submodule[1]
+   for their meaning. Note that the '!command' form is intentionally
+   ignored here for security reasons.
 
 submodule..branch::
A remote branch name for tracking updates in the upstream submodule.
-- 
2.14.0.rc0.3.g6c2e499285



Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Stefan Beller
>> There might be another option to cope with the situation:
>>
>>  4. Teach all commands to spinlock / busywait shortly for important
>>  locks instead of giving up. In that case git-status rewriting
>>  the index ought to be fine?
>
> We do have all the infrastructure in place to do a reasonable busywait
> with backoff. I think the patch is roughly just:
>
> diff --git a/read-cache.c b/read-cache.c
> index 65f4fe8375..fc1ba122a3 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state 
> *istate,
>
>  int hold_locked_index(struct lock_file *lk, int lock_flags)
>  {
> -   return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
> +   return hold_lock_file_for_update_timeout(lk, get_index_file(),
> +lock_flags, 500);
>  }
>
>  int read_index(struct index_state *istate)
>
> though I think there are a few sites which manually call
> hold_lock_file_for_update() on the index that would need similar
> treatment.

uh, too bad. The patch above looks really promising, though. :)

>
> I suspect it would work OK in practice, unless your index is so big that
> 500ms isn't enough. The user may also see minor stalls when there's lock
> contention. I'm not sure how annoying that would be.

There is only one way to find out. Though we don't want to volunteer
all users into this experiment, I'd presume.

Regarding larger indexes, I wonder if we can adapt the 500ms
to the repo size. At first I thought the abbreviation length could be
a good proxy to determine the maximum waiting time, but now I am
not so sure any more.

Stefan


Re: What's cooking in git.git (Sep 2017, #04; Mon, 25)

2017-09-25 Thread Stefan Beller
>
> * sb/parse-options-blank-line-before-option-list (2017-08-25) 1 commit
>  - usage_with_options: omit double new line on empty option list
>
>  "git worktree" with no option and no subcommand showed too many
>  blank lines in its help text, which has been reduced.
>
>  Superseded by bc/rev-parse-parseopt-fix?

yes it is, please retract this one.


> * sb/submodule-recursive-checkout-detach-head (2017-07-28) 2 commits
>  - Documentation/checkout: clarify submodule HEADs to be detached
>  - recursive submodules: detach HEAD from new state
>
>  "git checkout --recursive" may overwrite and rewind the history of
>  the branch that happens to be checked out in submodule
>  repositories, which might not be desirable.  Detach the HEAD but
>  still allow the recursive checkout to succeed in such a case.
>
>  Undecided.
>  This needs justification in a larger picture; it is unclear why
>  this is better than rejecting recursive checkout, for example.

I'll revisit this with fresh eyes, the original plan was to get
jn/per-repo-obj-store done to have more fundamental operations
available to do the right thing, whatever it is, though.

>
> * jn/per-repo-obj-store (2017-09-07) 39 commits

We intend to get this one rolling again.


Re: [PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-25 Thread Brandon Williams
On 09/25, Han-Wen Nienhuys wrote:
> This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did

Not really important but most times we reference another commit from a
commit msg we include the one line summary like:
'bdfdaa497 (strbuf.h: integrate api-strbuf.txt documentation,
2015-01-16)'

> the same for strbuf.h:
> 
> * API documentation uses /** */ to set it apart from other comments.
> 
> * Function names were stripped from the comments.
> 
> * Ordering of the header was adjusted to follow the one from the text
>   file.
> 
> * Edited some existing comments to follow the new standard.
> 
> Signed-off-by: Han-Wen Nienhuys 
> ---
>  Documentation/technical/api-string-list.txt | 209 
> 
>  string-list.h   | 187 +
>  2 files changed, 159 insertions(+), 237 deletions(-)
>  delete mode 100644 Documentation/technical/api-string-list.txt
> 
> diff --git a/Documentation/technical/api-string-list.txt 
> b/Documentation/technical/api-string-list.txt
> deleted file mode 100644
> index c08402b12..0
> --- a/Documentation/technical/api-string-list.txt
> +++ /dev/null
> @@ -1,209 +0,0 @@
> -string-list API
> -===
> -
> -The string_list API offers a data structure and functions to handle
> -sorted and unsorted string lists.  A "sorted" list is one whose
> -entries are sorted by string value in `strcmp()` order.
> -
> -The 'string_list' struct used to be called 'path_list', but was renamed
> -because it is not specific to paths.
> -
> -The caller:
> -
> -. Allocates and clears a `struct string_list` variable.
> -
> -. Initializes the members. You might want to set the flag `strdup_strings`
> -  if the strings should be strdup()ed. For example, this is necessary
> -  when you add something like git_path("..."), since that function returns
> -  a static buffer that will change with the next call to git_path().
> -+
> -If you need something advanced, you can manually malloc() the `items`
> -member (you need this if you add things later) and you should set the
> -`nr` and `alloc` members in that case, too.
> -
> -. Adds new items to the list, using `string_list_append`,
> -  `string_list_append_nodup`, `string_list_insert`,
> -  `string_list_split`, and/or `string_list_split_in_place`.
> -
> -. Can check if a string is in the list using `string_list_has_string` or
> -  `unsorted_string_list_has_string` and get it from the list using
> -  `string_list_lookup` for sorted lists.
> -
> -. Can sort an unsorted list using `string_list_sort`.
> -
> -. Can remove duplicate items from a sorted list using
> -  `string_list_remove_duplicates`.
> -
> -. Can remove individual items of an unsorted list using
> -  `unsorted_string_list_delete_item`.
> -
> -. Can remove items not matching a criterion from a sorted or unsorted
> -  list using `filter_string_list`, or remove empty strings using
> -  `string_list_remove_empty_items`.
> -
> -. Finally it should free the list using `string_list_clear`.
> -
> -Example:
> -
> -
> -struct string_list list = STRING_LIST_INIT_NODUP;
> -int i;
> -
> -string_list_append(, "foo");
> -string_list_append(, "bar");
> -for (i = 0; i < list.nr; i++)
> - printf("%s\n", list.items[i].string)
> -
> -
> -NOTE: It is more efficient to build an unsorted list and sort it
> -afterwards, instead of building a sorted list (`O(n log n)` instead of
> -`O(n^2)`).
> -+
> -However, if you use the list to check if a certain string was added
> -already, you should not do that (using unsorted_string_list_has_string()),
> -because the complexity would be quadratic again (but with a worse factor).
> -
> -Functions
> --
> -
> -* General ones (works with sorted and unsorted lists as well)
> -
> -`string_list_init`::
> -
> - Initialize the members of the string_list, set `strdup_strings`
> - member according to the value of the second parameter.
> -
> -`filter_string_list`::
> -
> - Apply a function to each item in a list, retaining only the
> - items for which the function returns true.  If free_util is
> - true, call free() on the util members of any items that have
> - to be deleted.  Preserve the order of the items that are
> - retained.
> -
> -`string_list_remove_empty_items`::
> -
> - Remove any empty strings from the list.  If free_util is true,
> - call free() on the util members of any items that have to be
> - deleted.  Preserve the order of the items that are retained.
> -
> -`print_string_list`::
> -
> - Dump a string_list to stdout, useful mainly for debugging purposes. It
> - can take an optional header argument and it writes out the
> - string-pointer pairs of the string_list, each one in its own line.
> -
> -`string_list_clear`::
> -
> - Free a string_list. The `string` pointer of the items will be freed in
> - case the `strdup_strings` member of the string_list is set. The second
> - parameter 

Re: [PATCH 3/4] Document submodule_to_gitdir

2017-09-25 Thread Brandon Williams
On 09/25, Han-Wen Nienhuys wrote:
> Signed-off-by: Han-Wen Nienhuys 
> ---
>  submodule.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/submodule.c b/submodule.c
> index b12600fc7..b66c23f5d 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1997,6 +1997,9 @@ const char *get_superproject_working_tree(void)
>   return ret;
>  }
>  
> +/* Put the gitdir for a submodule (given relative to the main repository 
> worktree)
> + * into `buf`, or return -1 on error.
> + */

Same nit as in patch [2/4]

>  int submodule_to_gitdir(struct strbuf *buf, const char *submodule)
>  {
>   const struct submodule *sub;
> -- 
> 2.14.1.821.g8fa685d3b7-goog
> 

-- 
Brandon Williams


Re: [PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently

2017-09-25 Thread Brandon Williams
On 09/25, Han-Wen Nienhuys wrote:
> Signed-off-by: Han-Wen Nienhuys 
> ---
>  abspath.c | 3 +++
>  setup.c   | 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/abspath.c b/abspath.c
> index 708aff8d4..792a2d533 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char 
> *path,
>   return retval;
>  }
>  
> +/* Resolve `path` into an absolute, cleaned-up path. The return value
> + * comes from a global shared buffer and should not be freed.
> + */

nit: Our comment style requires the opening '/*' which starts a comment
block to be on its own line.

>  const char *real_path(const char *path)
>  {
>   static struct strbuf realpath = STRBUF_INIT;
> diff --git a/setup.c b/setup.c
> index 6d8380acd..31853724c 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char 
> *path, const char *dir)
>  
>  /*
>   * Try to read the location of the git directory from the .git file,
> - * return path to git directory if found.
> + * return path to git directory if found. The return value comes from
> + * a shared pool and should not be freed.
>   *
>   * On failure, if return_error_code is not NULL, return_error_code
>   * will be set to an error code and NULL will be returned. If
> -- 
> 2.14.1.821.g8fa685d3b7-goog
> 

-- 
Brandon Williams


Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 06:10:12PM +0200, Johannes Schindelin wrote:

> On Thu, 21 Sep 2017, Jeff King wrote:
> 
> > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> > to lock the index and update it, 2016-08-12). Folks working on GitHub
> > Desktop complained to me that it's only available on Windows. :)
> 
> Okay, so this is trying to help me by upstreaming a patch from Git for
> Windows?
> 
> If so: thanks!

Yes, in a round-about way. :)

> The changes, in particular the different semantics, will guarantee that I
> have to work on the consumer's side here, though, leaving me even less
> time for the Git mailing list, so I will need a lot more help with
> upstreaming patches.

I think you can get away with doing nothing for the time being. The two
patches can co-exist in the GfW codebase[1], and people using the
existing option can switch over at their leisure. Eventually you may
decide to revert 67e5ce7f63.

-Peff

[1] Modulo some trivial conflict resolution when the two are merged.


Re: [PATCH v2 1/1] Win32: simplify loading of DLL functions

2017-09-25 Thread Jonathan Nieder
Johannes Schindelin wrote:

> Dynamic loading of DLL functions is duplicated in several places in Git
> for Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(, , ,
> ..) macro to be used at the beginning of a
> code block, and the INIT_PROC_ADDR() macro to call before
> using the declared function. The return value of the INIT_PROC_ADDR()
> call has to be checked; If it is NULL, the function was not found in the
> specified DLL.
>
> Example:
>
> DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
>   LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> if (!INIT_PROC_ADDR(CreateHardLinkW))
> return error("Could not find CreateHardLinkW() function";
>
>   if (!CreateHardLinkW(source, target, NULL))
>   return error("could not create hardlink from %S to %S",
>source, target);
>   return 0;
>
> Signed-off-by: Karsten Blees 
> Signed-off-by: Johannes Schindelin 
> Reviewed-by: Jonathan Nieder 

Yep, this is indeed

Reviewed-by: Jonathan Nieder 

> ---
>  compat/win32/lazyload.h | 57 
> +
>  1 file changed, 57 insertions(+)
>  create mode 100644 compat/win32/lazyload.h

Thanks,
Jonathan


VKTUR'un SİZE ÖZEL KAMPANYALARI DEVAM EDİYOR!!

2017-09-25 Thread vkt...@mynet.com



Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Johannes Schindelin
Hi Kaartic,

On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:

> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> > Some tools like IDEs or fancy editors may periodically run commands
> > like "git status" in the background to keep track of the state of the
> > repository.
> 
> I might be missing something, shouldn't the IDEs be encouraged to use
> libgit2 instead? I thought it was meant for these use cases.

There are pros and cons. Visual Studio moved away from libgit2 e.g. to
support SSH (back then, libgit2 did not support spawning processes, I have
no idea whether that changed in the meantime).

Ciao,
Johannes


Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Johannes Schindelin
Hi Peff,

On Thu, 21 Sep 2017, Jeff King wrote:

> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)

Okay, so this is trying to help me by upstreaming a patch from Git for
Windows?

If so: thanks!

The changes, in particular the different semantics, will guarantee that I
have to work on the consumer's side here, though, leaving me even less
time for the Git mailing list, so I will need a lot more help with
upstreaming patches.

Ciao,
Dscho


Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes

2017-09-25 Thread Jeff King
On Sun, Sep 24, 2017 at 09:59:28PM +0200, Martin Ågren wrote:

> > Anyway, doing:
> >
> >   
> > ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=0:log_path=/tmp/lsan/output
> >  \
> >   make SANITIZE=address,leak test
> >
> > should pass the whole suite and give you a host of files to analyze.
> 
> Thanks. My reading of the documentation was off. Turns out exitcode=0
> does not set the exit code to 0, but rather turns it off. Duh.

Actually, the docs are quite confusing. The LSan "exitcode" option
defaults to 23, and that is the exit code you get. So I think it
probably is interpreted as the code to exit with, or 0 for "use the
original exit code".

> > I'm not sure of the best way to count things.
> 
> Right. It's a tricky problem. And in the end, all we find out is where
> we allocate and how we got there. Exactly where we lose/leak that piece
> of allocated memory is another question...

For hunting a particular trace, I think you have to walk up the list of
called functions and see where the pointers go out of scope. I'm not
sure how to make that easier (in theory a compiler-instrumentation like
LSan could do it by performing a leak-check when pointers go out of
scope. But it would also need to know about copies you've made of
pointers, so I imagine it would be extremely slow to run).

But at least on the topic of "how many unique leaks are there", I wrote
the script below to try to give some basic answers. It just finds the
first non-boring entry in each stack trace and reports that. Where
"boring" is really "this function is not expected to free, but hands off
memory ownership to somebody else".

You can use it to do:

  perl leaks.pl /tmp/lsan/output.* | sort | uniq -c | sort -rn | head

to see places that leak a lot. These are either boring calls that need
to be annotated, or are high-value targets for de-leaking.

I notice ref-filter.c has quite a few high entries on the list. I'm not
sure yet which case it falls into. :)

The other interesting thing is seeing how many "unique" leaks there are:

  perl leaks.pl /tmp/lsan/output.* | sort -u | wc -l

I get a bit over 800 with a run of the test suite. Which is a lot, but
fewer than I expected. And I'm sure quite a few of them are really
"duplicates" that can be eliminated in chunks.

So I don't know how useful any of that will be, but it at least should
give _some_ metric that should be diminishing as we fix leaks.

-- >8 --
#!/usr/bin/perl

my $boring = join('|',
  # These are allocation functions that get called from a lot of places.
  qw(
__interceptor_strdup
__interceptor_calloc
realloc
malloc
xstrdup
xcalloc
strbuf_
xmemdupz
xstrvfmt
xstrfmt
xstrndup
  ),
  # These are really just the revision machinery not getting cleaned up;
  # for many we'd probably want to just UNLEAK() at the apex caller
  qw(
add_rev_cmdline
add_object_array_with_path
add_pending_object_with_path
add_pending_object_with_mode
add_pending_object
handle_revision_arg
setup_revisions
  ),
  # More allocators that drop memory ownership
  qw(
alloc_ref_with_prefix
alloc_ref
copy_ref
commit_list_insert
copy_pathspec
  ),
);
my $boring_re = qr/^$boring/;
my $skipping;

while (<>) {
  if (/^\s*#[0-9]+ 0x[0-9a-f]+ in (.*)/) {
next if $skipping; # we already reported this trace
next if $1 =~ $boring_re;

print $1, "\n";
$skipping = 1;
  } else {
$skipping = 0;
  }
}


[PATCH v2 0/1] Simplify loading of DLL functions on Windows

2017-09-25 Thread Johannes Schindelin
Changes since v1:

- repeated the example from the commit message in the .h file

- mentioned that the function is not thread-safe.

- added Jonathan's Reviewed-by: footer


Johannes Schindelin (1):
  Win32: simplify loading of DLL functions

 compat/win32/lazyload.h | 57 +
 1 file changed, 57 insertions(+)
 create mode 100644 compat/win32/lazyload.h


base-commit: 28996cec80690d2322359d3650a57e8de6e01eb6
Published-As: https://github.com/dscho/git/releases/tag/lazyload-v2
Fetch-It-Via: git fetch https://github.com/dscho/git lazyload-v2

Interdiff vs v1:
 diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
 index 91c10dad2fb..9e631c8593f 100644
 --- a/compat/win32/lazyload.h
 +++ b/compat/win32/lazyload.h
 @@ -1,7 +1,19 @@
  #ifndef LAZYLOAD_H
  #define LAZYLOAD_H
  
 -/* simplify loading of DLL functions */
 +/*
 + * A pair of macros to simplify loading of DLL functions. Example:
 + *
 + *   DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
 + * LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
 + *
 + *   if (!INIT_PROC_ADDR(CreateHardLinkW))
 + *   return error("Could not find CreateHardLinkW() function";
 + *
 + *   if (!CreateHardLinkW(source, target, NULL))
 + *   return error("could not create hardlink from %S to %S",
 + *source, target);
 + */
  
  struct proc_addr {
const char *const dll;
 @@ -20,6 +32,7 @@ struct proc_addr {
   * Loads a function from a DLL (once-only).
   * Returns non-NULL function pointer on success.
   * Returns NULL + errno == ENOSYS on failure.
 + * This function is not thread-safe.
   */
  #define INIT_PROC_ADDR(function) \
(function = get_proc_addr(_addr_##function))
-- 
2.14.1.windows.1.1024.gf2dea585d74.dirty



[PATCH v2 1/1] Win32: simplify loading of DLL functions

2017-09-25 Thread Johannes Schindelin
Dynamic loading of DLL functions is duplicated in several places in Git
for Windows' source code.

This patch adds a pair of macros to simplify the process: the
DECLARE_PROC_ADDR(, , ,
..) macro to be used at the beginning of a
code block, and the INIT_PROC_ADDR() macro to call before
using the declared function. The return value of the INIT_PROC_ADDR()
call has to be checked; If it is NULL, the function was not found in the
specified DLL.

Example:

DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
  LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);

if (!INIT_PROC_ADDR(CreateHardLinkW))
return error("Could not find CreateHardLinkW() function";

if (!CreateHardLinkW(source, target, NULL))
return error("could not create hardlink from %S to %S",
 source, target);
return 0;

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
Reviewed-by: Jonathan Nieder 
Signed-off-by: Johannes Schindelin 
---
 compat/win32/lazyload.h | 57 +
 1 file changed, 57 insertions(+)
 create mode 100644 compat/win32/lazyload.h

diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
new file mode 100644
index 000..9e631c8593f
--- /dev/null
+++ b/compat/win32/lazyload.h
@@ -0,0 +1,57 @@
+#ifndef LAZYLOAD_H
+#define LAZYLOAD_H
+
+/*
+ * A pair of macros to simplify loading of DLL functions. Example:
+ *
+ *   DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
+ * LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
+ *
+ *   if (!INIT_PROC_ADDR(CreateHardLinkW))
+ *   return error("Could not find CreateHardLinkW() function";
+ *
+ *   if (!CreateHardLinkW(source, target, NULL))
+ *   return error("could not create hardlink from %S to %S",
+ *source, target);
+ */
+
+struct proc_addr {
+   const char *const dll;
+   const char *const function;
+   FARPROC pfunction;
+   unsigned initialized : 1;
+};
+
+/* Declares a function to be loaded dynamically from a DLL. */
+#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
+   static struct proc_addr proc_addr_##function = \
+   { #dll, #function, NULL, 0 }; \
+   static rettype (WINAPI *function)(__VA_ARGS__)
+
+/*
+ * Loads a function from a DLL (once-only).
+ * Returns non-NULL function pointer on success.
+ * Returns NULL + errno == ENOSYS on failure.
+ * This function is not thread-safe.
+ */
+#define INIT_PROC_ADDR(function) \
+   (function = get_proc_addr(_addr_##function))
+
+static inline void *get_proc_addr(struct proc_addr *proc)
+{
+   /* only do this once */
+   if (!proc->initialized) {
+   HANDLE hnd;
+   proc->initialized = 1;
+   hnd = LoadLibraryExA(proc->dll, NULL,
+LOAD_LIBRARY_SEARCH_SYSTEM32);
+   if (hnd)
+   proc->pfunction = GetProcAddress(hnd, proc->function);
+   }
+   /* set ENOSYS if DLL or function was not found */
+   if (!proc->pfunction)
+   errno = ENOSYS;
+   return proc->pfunction;
+}
+
+#endif
-- 
2.14.1.windows.1.1024.gf2dea585d74.dirty


[PATCH 1/4] Fix typo in submodule.h

2017-09-25 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 submodule.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/submodule.h b/submodule.h
index 6b52133c8..f0da0277a 100644
--- a/submodule.h
+++ b/submodule.h
@@ -120,7 +120,7 @@ extern int submodule_move_head(const char *path,
 
 /*
  * Prepare the "env_array" parameter of a "struct child_process" for executing
- * a submodule by clearing any repo-specific envirionment variables, but
+ * a submodule by clearing any repo-specific environment variables, but
  * retaining any config in the environment.
  */
 extern void prepare_submodule_repo_env(struct argv_array *out);
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 3/4] Document submodule_to_gitdir

2017-09-25 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 submodule.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/submodule.c b/submodule.c
index b12600fc7..b66c23f5d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1997,6 +1997,9 @@ const char *get_superproject_working_tree(void)
return ret;
 }
 
+/* Put the gitdir for a submodule (given relative to the main repository 
worktree)
+ * into `buf`, or return -1 on error.
+ */
 int submodule_to_gitdir(struct strbuf *buf, const char *submodule)
 {
const struct submodule *sub;
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently

2017-09-25 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 abspath.c | 3 +++
 setup.c   | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/abspath.c b/abspath.c
index 708aff8d4..792a2d533 100644
--- a/abspath.c
+++ b/abspath.c
@@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char 
*path,
return retval;
 }
 
+/* Resolve `path` into an absolute, cleaned-up path. The return value
+ * comes from a global shared buffer and should not be freed.
+ */
 const char *real_path(const char *path)
 {
static struct strbuf realpath = STRBUF_INIT;
diff --git a/setup.c b/setup.c
index 6d8380acd..31853724c 100644
--- a/setup.c
+++ b/setup.c
@@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char 
*path, const char *dir)
 
 /*
  * Try to read the location of the git directory from the .git file,
- * return path to git directory if found.
+ * return path to git directory if found. The return value comes from
+ * a shared pool and should not be freed.
  *
  * On failure, if return_error_code is not NULL, return_error_code
  * will be set to an error code and NULL will be returned. If
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-25 Thread Han-Wen Nienhuys
This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did
the same for strbuf.h:

* API documentation uses /** */ to set it apart from other comments.

* Function names were stripped from the comments.

* Ordering of the header was adjusted to follow the one from the text
  file.

* Edited some existing comments to follow the new standard.

Signed-off-by: Han-Wen Nienhuys 
---
 Documentation/technical/api-string-list.txt | 209 
 string-list.h   | 187 +
 2 files changed, 159 insertions(+), 237 deletions(-)
 delete mode 100644 Documentation/technical/api-string-list.txt

diff --git a/Documentation/technical/api-string-list.txt 
b/Documentation/technical/api-string-list.txt
deleted file mode 100644
index c08402b12..0
--- a/Documentation/technical/api-string-list.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-string-list API
-===
-
-The string_list API offers a data structure and functions to handle
-sorted and unsorted string lists.  A "sorted" list is one whose
-entries are sorted by string value in `strcmp()` order.
-
-The 'string_list' struct used to be called 'path_list', but was renamed
-because it is not specific to paths.
-
-The caller:
-
-. Allocates and clears a `struct string_list` variable.
-
-. Initializes the members. You might want to set the flag `strdup_strings`
-  if the strings should be strdup()ed. For example, this is necessary
-  when you add something like git_path("..."), since that function returns
-  a static buffer that will change with the next call to git_path().
-+
-If you need something advanced, you can manually malloc() the `items`
-member (you need this if you add things later) and you should set the
-`nr` and `alloc` members in that case, too.
-
-. Adds new items to the list, using `string_list_append`,
-  `string_list_append_nodup`, `string_list_insert`,
-  `string_list_split`, and/or `string_list_split_in_place`.
-
-. Can check if a string is in the list using `string_list_has_string` or
-  `unsorted_string_list_has_string` and get it from the list using
-  `string_list_lookup` for sorted lists.
-
-. Can sort an unsorted list using `string_list_sort`.
-
-. Can remove duplicate items from a sorted list using
-  `string_list_remove_duplicates`.
-
-. Can remove individual items of an unsorted list using
-  `unsorted_string_list_delete_item`.
-
-. Can remove items not matching a criterion from a sorted or unsorted
-  list using `filter_string_list`, or remove empty strings using
-  `string_list_remove_empty_items`.
-
-. Finally it should free the list using `string_list_clear`.
-
-Example:
-
-
-struct string_list list = STRING_LIST_INIT_NODUP;
-int i;
-
-string_list_append(, "foo");
-string_list_append(, "bar");
-for (i = 0; i < list.nr; i++)
-   printf("%s\n", list.items[i].string)
-
-
-NOTE: It is more efficient to build an unsorted list and sort it
-afterwards, instead of building a sorted list (`O(n log n)` instead of
-`O(n^2)`).
-+
-However, if you use the list to check if a certain string was added
-already, you should not do that (using unsorted_string_list_has_string()),
-because the complexity would be quadratic again (but with a worse factor).
-
-Functions
--
-
-* General ones (works with sorted and unsorted lists as well)
-
-`string_list_init`::
-
-   Initialize the members of the string_list, set `strdup_strings`
-   member according to the value of the second parameter.
-
-`filter_string_list`::
-
-   Apply a function to each item in a list, retaining only the
-   items for which the function returns true.  If free_util is
-   true, call free() on the util members of any items that have
-   to be deleted.  Preserve the order of the items that are
-   retained.
-
-`string_list_remove_empty_items`::
-
-   Remove any empty strings from the list.  If free_util is true,
-   call free() on the util members of any items that have to be
-   deleted.  Preserve the order of the items that are retained.
-
-`print_string_list`::
-
-   Dump a string_list to stdout, useful mainly for debugging purposes. It
-   can take an optional header argument and it writes out the
-   string-pointer pairs of the string_list, each one in its own line.
-
-`string_list_clear`::
-
-   Free a string_list. The `string` pointer of the items will be freed in
-   case the `strdup_strings` member of the string_list is set. The second
-   parameter controls if the `util` pointer of the items should be freed
-   or not.
-
-* Functions for sorted lists only
-
-`string_list_has_string`::
-
-   Determine if the string_list has a given string or not.
-
-`string_list_insert`::
-
-   Insert a new element to the string_list. The returned pointer can be
-   handy if you want to write something to the `util` pointer of the
-   string_list_item containing the just added string. If the given

[PATCH 0/4] Assorted comment fixes

2017-09-25 Thread Han-Wen Nienhuys
I followed Peff's advice for string-list.h comments.

Han-Wen Nienhuys (4):
  Fix typo in submodule.h
  Clarify return value ownership of real_path and read_gitfile_gently
  Document submodule_to_gitdir
  Move documentation of string_list into string-list.h

 Documentation/technical/api-string-list.txt | 209 
 abspath.c   |   3 +
 setup.c |   3 +-
 string-list.h   | 187 +
 submodule.c |   3 +
 submodule.h |   2 +-
 6 files changed, 168 insertions(+), 239 deletions(-)
 delete mode 100644 Documentation/technical/api-string-list.txt

--
2.14.1.821.g8fa685d3b7-goog


Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-25 Thread Johannes Schindelin
Hi Peff,


On Wed, 20 Sep 2017, Jeff King wrote:

> On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote:
> 
> > This branch is also available from my fork on GitHub as branch
> > `mmap-packed-refs`.
> > 
> > My main uncertainties are:
> > 
> > 1. Does this code actually work on Windows?
> > 
> 
> I can't really answer (1) due to lack of knowledge.

Sorry, I have only a couple of minutes per day to look through the flood
of emails from the Git mailing list and to answer them, so I forgot to say
that I had a VM build and test Michael's patches, and they worked on
Windows.

Ciao,
Dscho


Re: [PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read

2017-09-25 Thread Johannes Schindelin
Hi Michael,

On Thu, 21 Sep 2017, Michael Haggerty wrote:

> On 09/20/2017 08:50 PM, Jeff King wrote:
> > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote:
> >> If `packed-refs` was already sorted, then (if the system allows it)
> >> we can use the mmapped file contents directly. But if the system
> >> doesn't allow a file that is currently mmapped to be replaced using
> >> `rename()`, then it would be bad for us to keep the file mmapped for
> >> any longer than necessary. So, on such systems, always make a copy of
> >> the file contents, either as part of the sorting process, or
> >> afterwards.
> > 
> > So this sort-of answers my question from the previous commit (why we
> > care about the distinction between NONE and TEMPORARY), since we now
> > start treating them differently.
> > 
> > But I'm a bit confused why there's any advantage in the TEMPORARY case
> > to doing the mmap-and-copy versus just treating it like NONE and reading
> > it directly.
> > 
> > Hmm, I guess it comes down to the double-allocation thing again? Let me
> > see if I have this right:
> > 
> >   1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If
> >  not, then we temporarily hold it in memory twice while we copy it
> >  into the new sorted buffer.
> > 
> >   2. For TEMPORARY, we mmap once. If it's sorted, then we make a single
> >  copy. If it's not sorted, then we do the copy+sort as a single
> >  step.
> > 
> >   3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the
> >  single copy.
> > 
> > So this is really there to help the TEMPORARY case reading an old
> > unsorted file avoid needing to use double-the-ram during the copy?
> > 
> > That seems like a good reason (and it does not seem to add too much
> > complexity).
> 
> Yes, that's correct. A deeper question is whether it's worth this extra
> effort to optimize the conversion from "unsorted" to "known-sorted",
> which it seems should only happen once per repository. But
> 
> * Many more Git commands read the `packed-refs` file than write it.
>   So an "unsorted" file might have to be read multiple times before
>   the first time it is rewritten with the "sorted" trait.
> 
> * Users might have multiple Git clients writing their `packed-refs`
>   file (e.g., the git CLI plus JGit in Eclipse), and they might not
>   upgrade both clients at the same time to versions that write the
>   "sorted" trait. So a single repository might go back and forth
>   between "unsorted" and "known-sorted" multiple times in its
>   life.
> 
> * Theoretically, somebody might have a `packed-refs` file that is so
>   big that it takes up more than half of memory. I think there are
>   scenarios where such a file could be converted to "sorted" form
>   in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode.
> 
> On the downside, in my benchmarking on Linux, there were hints that the
> `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK`
> branch when operating on a known-sorted `packed-refs` file. But the
> speed difference was barely detectable (and might be illusory). And
> anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK`
> mode will usually be used. It *would* be interesting to know if there is
> a significant speed difference on Windows. Dscho?

Sadly, I lack the time to test this thoroughly, but my experience is that
those things are dominated by I/O on Windows, i.e. that the malloc() and
copy won't matter all that much.

In any case, the same argument you make for Linux holds, in reverse, for
Windows: we cannot use MMAP_OK on Windows, so the point is moot ;-)

Ciao,
Dscho


-s theirs use-case(s) Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Yaroslav Halchenko

On Mon, 25 Sep 2017, Junio C Hamano wrote:
>It is a different matter to resurrect the age old discussion that
>happend in the summer of 2008 if '-s theirs' should or should not
>exist.  In short, the previous discussion can be summarised to
>"we don't want '-s theirs' as it encourages the wrong workflow".

>
> https://public-inbox.org/git/alpine.DEB.1.00.0807290123300.2725@eeepc-johanness/
>https://public-inbox.org/git/7vtzen7bul@gitster.siamese.dyndns.org/
>https://public-inbox.org/git/20080720192130.6...@nanako3.lavabit.com/

>It is OK for people to come with new perspective and bring new
>ideas to the table.  We learned from experience while using Git
>for longer and are wiser than what we were back then, and might
>be able to make a better decision ;-)

FWIW

1. As a workaround for absence of -m theirs I using mtheirs git alias:
(I believe provided to me awhile back here on the list):

mtheirs = !sh -c 'git merge -s ours --no-commit $1 && git read-tree -m -u 
$1' -

and it worked fine for my usecases

2. I think that if there is a reason for -s ours to exist, so there for -s 
theirs
since it is just the directionality of merges which changes between the two

3. My most frequently used use-case for -m theirs strategy is repositories such 
as 

http://datasets.datalad.org/openfmri/ds01/.git

where we construct "datalad dataset" by crawling the web resource(s), and
workflow consists of 3 branches:

incoming -- content from the web "as is"
incoming-processed   -- content from the web "processed" (fully automatically),
e.g. tarballs extracted etc
master   -- the "final" result, delivered to public

incoming-processed   is formed  by  -s theirs --no-commit  incoming, then all
content needed to be extracted/processed (since last such merge point) is
processed and commit is done.  Such "merge" allows us to establish a point of
previous "processing state" so we could react appropriately whenever anything
in "incoming" branch changes (so that there is a new commit).

  And then incoming-processed is merged (regular recursive) into the
master branch, which might have further "manual" tune ups.

PS thanks for CCing replies

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


-X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect

2017-09-25 Thread Yaroslav Halchenko

On Mon, 25 Sep 2017, Junio C Hamano wrote:

> Yaroslav Halchenko  writes:

> > d'oh, indeed there is no git-merge-theirs  neither in debian pkg or a 
> > freshly
> > built git  and I found a rogue script in the PATH (which did nothing
> > apparently, sorry!). BUT I was originally mislead by the --help/manpage:

> Ahh, you're right.  The text does make readers expect "-s theirs" to
> exist.
> ...
>  * I hope this should help things a bit.

yes it does. Thanks.  And that is where I realized that I should have used -X
theirs (not -s theirs), as the instruction on the option for the
(recursive) merge.  And now problem is more specific:

- conflict within file content editing was resolved as instructed
  (taking "theirs" version)

- BUT symlink was not taken from "theirs" and left as unresolved conflict:

$> rm -rf /tmp/repo1; mkdir /tmp/repo1; cd /tmp/repo1; git init .; ln -s sym1 
link; echo 1 > file; git add file link; git commit -m 'common'; git co -b b1 ; 
ln -sf b1link link; echo "b1 file" >| file; git commit -m 'b2 changes' -a; git 
co master; ln -sf masterlink link; echo "master file" >| file; git commit -m 
'also modified in master' -a; git merge -X theirs --no-edit b1; ls -l link; cat 
file
warning: templates not found /home/yoh/share/git-core/templates
Initialized empty Git repository in /tmp/repo1/.git/
[master (root-commit) f0b75bc] common
 2 files changed, 2 insertions(+)
 create mode 100644 file
 create mode 12 link
Switched to a new branch 'b1'
[b1 45c93ca] b2 changes
 2 files changed, 2 insertions(+), 2 deletions(-)
Switched to branch 'master'
[master 0ee6db2] also modified in master
 2 files changed, 2 insertions(+), 2 deletions(-)
Auto-merging link
CONFLICT (content): Merge conflict in link
Auto-merging file
Automatic merge failed; fix conflicts and then commit the result.
lrwxrwxrwx 1 yoh yoh 10 Sep 25 10:21 link -> masterlink
b1 file
changes on filesystem:  

 link | Unmerged
cached/staged changes:
 file | 2 +-
 link | Unmerged


PS I will followup on -s theirs in a split thread
PSS Thanks for CCing me your replies
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH] mailinfo: don't decode invalid =XY quoted-printable sequences

2017-09-25 Thread Jeff King
On Sat, Sep 23, 2017 at 08:04:40PM +0200, René Scharfe wrote:

> Decode =XY in quoted-printable segments only if X and Y are hexadecimal
> digits, otherwise just copy them.  That's at least better than
> interpreting negative results from hexval() as a character.

Thanks, this looks good to me overall.

I wondered if we should die() here, but walking over cruft may be more
friendly. The base64 case does the same, though it actually ignores the
bytes rather than copying them. Since this is never _supposed_ to
happen, it's hard to say what behavior would be preferable without
seeing a real-world broken case.

-Peff


Re: [PATCH v3 00/21] Read `packed-refs` using mmap()

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 09:59:57AM +0200, Michael Haggerty wrote:

> This is v3 of a patch series that changes the reading and caching of
> the `packed-refs` file to use `mmap()`. Thanks to Stefan, Peff, Dscho,
> and Junio for their comments about v2. I think I have addressed all of
> the feedback from v1 [1] and v2 [2].
> 
> This version has only minor changes relative to v2:
> 
> * Fixed a trivial error in the commit message for patch 08.
> 
> * In patch 13:
> 
>   * In the commit message, explain the appearance of `MMAP_TEMPORARY`
> even though it is not yet treated differently than `MMAP_NONE`.
> 
>   * In `Makefile`, don't make `USE_WIN32_MMAP` imply
> `MMAP_PREVENTS_DELETE`.
> 
>   * Correct the type of a local variable from `size_t` to `ssize_t`.

Thanks, this version addresses all my nits.

-Peff


Re: [PATCH 3/4] merge: --no-verify to bypass pre-merge hook

2017-09-25 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 24.09.2017 01:48:
> Michael J Gruber  writes:
> 
>> From: Michael J Gruber 
>>
>> Analogous to commit, introduce a '--no-verify' option which bypasses the
>> pre-merge hook. The shorthand '-n' is taken by the (non-existing)
>> '--no-stat' already.
>>
>> Signed-off-by: Michael J Gruber 
>> ---
> 
> It appears that some of the pieces in this patch, namely,
> D/git-merge.txt and D/merge-options.txt, belong to 2/4, where we are
> fixing up an earlier addition of --[no-]verify option to the
> command, to be updated to only add mention of pre-merge hook in this
> step?

Oh, sorry, rebaser error. I should also rewrite all commits to my
current e-mail.

> The end result looks good regardless, I would think, though.

Pending restructuring and attending to the other comments...


[RFC PATCH 8/8] sequencer: try to commit without forking 'git commit'

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

If the commit message does not need to be edited then create the
commit without forking 'git commit'. Taking the best time of ten runs
with a warm cache this reduces the time taken to cherry-pick 10
commits by 26% (from 319ms to 235ms), and the time taken by 'git
rebase --continue' to pick 10 commits by 44% (from 376ms to 211ms) on
my computer running linux. The greater saving for rebase is because it
longer wastes time creating the commit summary just to throw it away.

Even when not forking 'git commit' the commit message is written to a
file and CHERRY_PICK_HEAD/REVERT_HEAD are created unnecessarily. This
could be eliminated in future. I hacked up a version that does not
write these files and just passed an strbuf (with the wrong message
for fixup and squash commands) to do_commit() but I couldn't measure
any significant time difference when running cherry-pick or rebase. I
think eliminating the writes properly for rebase would require a bit
of effort as the code would need to be restructured.

Signed-off-by: Phillip Wood 
---

I wonder if this should reparse the author it gets from the existing
commit rather than just reusing it.

 sequencer.c | 149 +++-
 1 file changed, 147 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 
230bdb8535a422b1263429e5894e3f8b1733e844..ebba53583727e947ed767d6181b14254e23fc146
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -591,6 +591,18 @@ static int read_env_script(struct argv_array *env)
return 0;
 }
 
+static char *get_author(const char* message)
+{
+   size_t len;
+   const char *a;
+
+   a = find_commit_header(message, "author", );
+   if (a)
+   return xmemdupz(a, len);
+
+   return NULL;
+}
+
 static const char staged_changes_advice[] =
 N_("you have staged changes in your working tree\n"
 "If these changes are meant to be squashed into the previous commit, run:\n"
@@ -973,6 +985,130 @@ void print_commit_summary(const char *prefix, const 
struct object_id *oid,
strbuf_release();
 }
 
+static int try_to_commit(struct strbuf *msgbuf, const char *author,
+struct replay_opts *opts, unsigned int flags)
+{
+   struct object_id tree;
+   struct object_id oid;
+   struct commit *current_head;
+   struct commit_list *parents = NULL;
+   struct commit_extra_header *extra = NULL;
+   struct strbuf err = STRBUF_INIT;
+   struct strbuf amend_msg = STRBUF_INIT;
+   struct strbuf *msg;
+   char *amend_author = NULL;
+   const char *gpg_sign;
+   enum cleanup_mode cleanup;
+   int res = 0;
+
+   msg = msgbuf;
+   if (flags & EDIT_MSG || flags & VERIFY_MSG)
+   return 1;
+
+   if (get_oid("HEAD", )) {
+   current_head = NULL;
+   } else {
+   current_head = lookup_commit_or_die(, "HEAD");
+   if (parse_commit(current_head))
+   return error(_("could not parse HEAD commit"));
+   }
+
+   if (flags & AMEND_MSG) {
+   const char *body;
+   const char *exclude_gpgsig[2] = { "gpgsig", NULL };
+   const char *out_enc = get_commit_output_encoding();
+   const char *message = logmsg_reencode(current_head, NULL,
+ out_enc);
+
+   msg = _msg;
+   if (find_commit_subject(message, ))
+   strbuf_addstr(_msg, body);
+   author = amend_author = get_author (message);
+   unuse_commit_buffer(current_head, message);
+   if (!author) {
+   res = error(_("unable to parse commit author"));
+   goto out;
+   }
+   parents = copy_commit_list(current_head->parents);
+   extra = read_commit_extra_headers(current_head, exclude_gpgsig);
+   } else if (current_head) {
+   commit_list_insert(current_head, );
+   }
+
+   cleanup = (flags & CLEANUP_MSG) ? CLEANUP_ALL : default_msg_cleanup;
+   if (cleanup != CLEANUP_NONE)
+   strbuf_stripspace(msg, cleanup == CLEANUP_ALL);
+   if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
+   res = 1;
+   goto out;
+   }
+
+   gpg_sign = (opts->gpg_sign) ? opts->gpg_sign : default_gpg_sign;
+
+   if (write_cache_as_tree(tree.hash, 0, NULL)) {
+   res = error(_("git write-tree failed to write a tree"));
+   goto out;
+   }
+
+   if (!(flags & ALLOW_EMPTY) && !oidcmp(current_head ?
+ _head->tree->object.oid :
+ _tree_oid, )) {
+   res = 1;
+   goto out;
+   }
+
+   res = commit_tree_extended(msg->buf, msg->len, 

[RFC PATCH 2/8] commit: move code to update HEAD to libgit

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Signed-off-by: Phillip Wood 
---
 builtin/commit.c | 31 +++
 sequencer.c  | 39 ++-
 sequencer.h  |  2 ++
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 
0b8c1ef6f57cfed328d12255e6834adb4bda4137..497778ba2c02afdd4a337969a27ca781e8389040
 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1578,13 +1578,11 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
struct strbuf sb = STRBUF_INIT;
struct strbuf author_ident = STRBUF_INIT;
const char *index_file, *reflog_msg;
-   char *nl;
struct object_id oid;
struct commit_list *parents = NULL;
struct stat statbuf;
struct commit *current_head = NULL;
struct commit_extra_header *extra = NULL;
-   struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
 
if (argc == 2 && !strcmp(argv[1], "-h"))
@@ -1625,10 +1623,10 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
reflog_msg = getenv("GIT_REFLOG_ACTION");
if (!current_head) {
if (!reflog_msg)
-   reflog_msg = "commit (initial)";
+   setenv ("GIT_REFLOG_ACTION", "commit (initial)", 1);
} else if (amend) {
if (!reflog_msg)
-   reflog_msg = "commit (amend)";
+   setenv("GIT_REFLOG_ACTION", "commit (amend)", 1);
parents = copy_commit_list(current_head->parents);
} else if (whence == FROM_MERGE) {
struct strbuf m = STRBUF_INIT;
@@ -1637,7 +1635,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
struct commit_list **pptr = 
 
if (!reflog_msg)
-   reflog_msg = "commit (merge)";
+   setenv("GIT_REFLOG_ACTION", "commit (merge)", 1);
pptr = commit_list_append(current_head, pptr);
fp = xfopen(git_path_merge_head(), "r");
while (strbuf_getline_lf(, fp) != EOF) {
@@ -1660,9 +1658,9 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
parents = reduce_heads(parents);
} else {
if (!reflog_msg)
-   reflog_msg = (whence == FROM_CHERRY_PICK)
-   ? "commit (cherry-pick)"
-   : "commit";
+   setenv("GIT_REFLOG_ACTION", (whence == FROM_CHERRY_PICK)
+   ? "commit (cherry-pick)"
+   : "commit", 1);
commit_list_insert(current_head, );
}
 
@@ -1707,25 +1705,10 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
strbuf_release(_ident);
free_commit_extra_headers(extra);
 
-   nl = strchr(sb.buf, '\n');
-   if (nl)
-   strbuf_setlen(, nl + 1 - sb.buf);
-   else
-   strbuf_addch(, '\n');
-   strbuf_insert(, 0, reflog_msg, strlen(reflog_msg));
-   strbuf_insert(, strlen(reflog_msg), ": ", 2);
-
-   transaction = ref_transaction_begin();
-   if (!transaction ||
-   ref_transaction_update(transaction, "HEAD", oid.hash,
-  current_head
-  ? current_head->object.oid.hash : null_sha1,
-  0, sb.buf, ) ||
-   ref_transaction_commit(transaction, )) {
+   if (update_head (current_head, , , )) {
rollback_index_files();
die("%s", err.buf);
}
-   ref_transaction_free(transaction);
 
unlink(git_path_cherry_pick_head());
unlink(git_path_revert_head());
diff --git a/sequencer.c b/sequencer.c
index 
319208afb3de36c97b6c62d4ecf6e641245e7a54..917ad4a16216b30adb2c2c9650217926d8db8ba7
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1,10 +1,10 @@
 #include "cache.h"
 #include "config.h"
 #include "lockfile.h"
-#include "sequencer.h"
 #include "dir.h"
 #include "object.h"
 #include "commit.h"
+#include "sequencer.h"
 #include "tag.h"
 #include "run-command.h"
 #include "exec_cmd.h"
@@ -750,6 +750,43 @@ int template_untouched(const struct strbuf *sb, const char 
*template_file,
return rest_is_empty(sb, start - sb->buf);
 }
 
+int update_head(const struct commit *old_head, const struct object_id 
*new_head,
+   const struct strbuf *msg, struct strbuf *err)
+{
+   struct ref_transaction *transaction;
+   struct strbuf sb = STRBUF_INIT;
+   const char *nl, *reflog_msg;
+   int ret = 0;
+
+   reflog_msg = getenv("GIT_REFLOG_ACTION");
+   if (!reflog_msg)
+   reflog_msg="";
+
+   nl = strchr(msg->buf, '\n');
+   if (nl) {

[RFC PATCH 0/8] sequencer: dont't fork git commit

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

These patches teach the sequencer to create commits without forking
git commit when the commit message does not need to be edited. This
speeds up cherry picking 10 commits by 26% and picking 10 commits with
rebase --continue by 44%. The first few patches move bits of
builtin/commit.c to sequencer.c. The last two patches actually
implement creating commits in sequencer.c.

Phillip Wood (8):
  commit: move empty message checks to libgit
  commit: move code to update HEAD to libgit
  sequencer: refactor update_head()
  commit: move post-rewrite code to libgit
  commit: move print_commit_summary() to libgit
  sequencer: simplify adding Signed-off-by: trailer
  sequencer: load commit related config
  sequencer: try to commit without forking 'git commit'

 builtin/commit.c | 269 ++--
 builtin/rebase--helper.c |  13 +-
 builtin/revert.c |  15 +-
 sequencer.c  | 447 ++-
 sequencer.h  |  20 +++
 5 files changed, 503 insertions(+), 261 deletions(-)

-- 
2.14.1



[RFC PATCH 1/8] commit: move empty message checks to libgit

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Signed-off-by: Phillip Wood 
---
 builtin/commit.c | 70 +++-
 sequencer.c  | 60 
 sequencer.h  | 10 
 3 files changed, 73 insertions(+), 67 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 
b3b04f5dd3a94d1661e877c5019cc56ac46854ef..0b8c1ef6f57cfed328d12255e6834adb4bda4137
 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -128,12 +128,7 @@ static char *sign_commit;
  * if editor is used, and only the whitespaces if the message
  * is specified explicitly.
  */
-static enum {
-   CLEANUP_SPACE,
-   CLEANUP_NONE,
-   CLEANUP_SCISSORS,
-   CLEANUP_ALL
-} cleanup_mode;
+static enum cleanup_mode cleanup_mode;
 static const char *cleanup_arg;
 
 static enum commit_whence whence;
@@ -978,65 +973,6 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
return 1;
 }
 
-static int rest_is_empty(struct strbuf *sb, int start)
-{
-   int i, eol;
-   const char *nl;
-
-   /* Check if the rest is just whitespace and Signed-off-by's. */
-   for (i = start; i < sb->len; i++) {
-   nl = memchr(sb->buf + i, '\n', sb->len - i);
-   if (nl)
-   eol = nl - sb->buf;
-   else
-   eol = sb->len;
-
-   if (strlen(sign_off_header) <= eol - i &&
-   starts_with(sb->buf + i, sign_off_header)) {
-   i = eol;
-   continue;
-   }
-   while (i < eol)
-   if (!isspace(sb->buf[i++]))
-   return 0;
-   }
-
-   return 1;
-}
-
-/*
- * Find out if the message in the strbuf contains only whitespace and
- * Signed-off-by lines.
- */
-static int message_is_empty(struct strbuf *sb)
-{
-   if (cleanup_mode == CLEANUP_NONE && sb->len)
-   return 0;
-   return rest_is_empty(sb, 0);
-}
-
-/*
- * See if the user edited the message in the editor or left what
- * was in the template intact
- */
-static int template_untouched(struct strbuf *sb)
-{
-   struct strbuf tmpl = STRBUF_INIT;
-   const char *start;
-
-   if (cleanup_mode == CLEANUP_NONE && sb->len)
-   return 0;
-
-   if (!template_file || strbuf_read_file(, template_file, 0) <= 0)
-   return 0;
-
-   strbuf_stripspace(, cleanup_mode == CLEANUP_ALL);
-   if (!skip_prefix(sb->buf, tmpl.buf, ))
-   start = sb->buf;
-   strbuf_release();
-   return rest_is_empty(sb, start - sb->buf);
-}
-
 static const char *find_author_by_nickname(const char *name)
 {
struct rev_info revs;
@@ -1744,12 +1680,12 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
if (cleanup_mode != CLEANUP_NONE)
strbuf_stripspace(, cleanup_mode == CLEANUP_ALL);
 
-   if (message_is_empty() && !allow_empty_message) {
+   if (message_is_empty(, cleanup_mode) && !allow_empty_message) {
rollback_index_files();
fprintf(stderr, _("Aborting commit due to empty commit 
message.\n"));
exit(1);
}
-   if (template_untouched() && !allow_empty_message) {
+   if (template_untouched(, template_file, cleanup_mode) && 
!allow_empty_message) {
rollback_index_files();
fprintf(stderr, _("Aborting commit; you did not edit the 
message.\n"));
exit(1);
diff --git a/sequencer.c b/sequencer.c
index 
fcceabb80f4261006cdd65bc0ec95ac54ea42e7c..319208afb3de36c97b6c62d4ecf6e641245e7a54
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -690,6 +690,66 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
return run_command();
 }
 
+static int rest_is_empty(const struct strbuf *sb, int start)
+{
+   int i, eol;
+   const char *nl;
+
+   /* Check if the rest is just whitespace and Signed-off-by's. */
+   for (i = start; i < sb->len; i++) {
+   nl = memchr(sb->buf + i, '\n', sb->len - i);
+   if (nl)
+   eol = nl - sb->buf;
+   else
+   eol = sb->len;
+
+   if (strlen(sign_off_header) <= eol - i &&
+   starts_with(sb->buf + i, sign_off_header)) {
+   i = eol;
+   continue;
+   }
+   while (i < eol)
+   if (!isspace(sb->buf[i++]))
+   return 0;
+   }
+
+   return 1;
+}
+
+/*
+ * Find out if the message in the strbuf contains only whitespace and
+ * Signed-off-by lines.
+ */
+int message_is_empty(const struct strbuf *sb, enum cleanup_mode cleanup_mode)
+{
+   if (cleanup_mode == CLEANUP_NONE && sb->len)
+   return 0;
+   return 

[RFC PATCH 4/8] commit: move post-rewrite code to libgit

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Signed-off-by: Phillip Wood 
---
 builtin/commit.c | 42 +-
 sequencer.c  | 46 ++
 sequencer.h  |  2 ++
 3 files changed, 49 insertions(+), 41 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 
497778ba2c02afdd4a337969a27ca781e8389040..9d621098823d196643180226491e43c806154c13
 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -31,9 +31,7 @@
 #include "gpg-interface.h"
 #include "column.h"
 #include "sequencer.h"
-#include "notes-utils.h"
 #include "mailmap.h"
-#include "sigchain.h"
 
 static const char * const builtin_commit_usage[] = {
N_("git commit [] [--] ..."),
@@ -1465,37 +1463,6 @@ static int git_commit_config(const char *k, const char 
*v, void *cb)
return git_status_config(k, v, s);
 }
 
-static int run_rewrite_hook(const struct object_id *oldoid,
-   const struct object_id *newoid)
-{
-   struct child_process proc = CHILD_PROCESS_INIT;
-   const char *argv[3];
-   int code;
-   struct strbuf sb = STRBUF_INIT;
-
-   argv[0] = find_hook("post-rewrite");
-   if (!argv[0])
-   return 0;
-
-   argv[1] = "amend";
-   argv[2] = NULL;
-
-   proc.argv = argv;
-   proc.in = -1;
-   proc.stdout_to_stderr = 1;
-
-   code = start_command();
-   if (code)
-   return code;
-   strbuf_addf(, "%s %s\n", oid_to_hex(oldoid), oid_to_hex(newoid));
-   sigchain_push(SIGPIPE, SIG_IGN);
-   write_in_full(proc.in, sb.buf, sb.len);
-   close(proc.in);
-   strbuf_release();
-   sigchain_pop(SIGPIPE);
-   return finish_command();
-}
-
 int run_commit_hook(int editor_is_used, const char *index_file, const char 
*name, ...)
 {
struct argv_array hook_env = ARGV_ARRAY_INIT;
@@ -1725,14 +1692,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
rerere(0);
run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
if (amend && !no_post_rewrite) {
-   struct notes_rewrite_cfg *cfg;
-   cfg = init_copy_notes_for_rewrite("amend");
-   if (cfg) {
-   /* we are amending, so current_head is not NULL */
-   copy_note_for_rewrite(cfg, _head->object.oid, 
);
-   finish_copy_notes_for_rewrite(cfg, "Notes added by 'git 
commit --amend'");
-   }
-   run_rewrite_hook(_head->object.oid, );
+   commit_post_rewrite(current_head, );
}
if (!quiet)
print_summary(prefix, , !current_head);
diff --git a/sequencer.c b/sequencer.c
index 
1795a4df2a0021b2419d941c6083e49cd6647314..81bd0810df6bcf2e078abc220bb8984345bf467f
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -20,6 +20,8 @@
 #include "trailer.h"
 #include "log-tree.h"
 #include "wt-status.h"
+#include "notes-utils.h"
+#include "sigchain.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -786,6 +788,50 @@ int update_head(const struct commit *old_head, const 
struct object_id *new_head,
return ret;
 }
 
+static int run_rewrite_hook(const struct object_id *oldoid,
+   const struct object_id *newoid)
+{
+   struct child_process proc = CHILD_PROCESS_INIT;
+   const char *argv[3];
+   int code;
+   struct strbuf sb = STRBUF_INIT;
+
+   argv[0] = find_hook("post-rewrite");
+   if (!argv[0])
+   return 0;
+
+   argv[1] = "amend";
+   argv[2] = NULL;
+
+   proc.argv = argv;
+   proc.in = -1;
+   proc.stdout_to_stderr = 1;
+
+   code = start_command();
+   if (code)
+   return code;
+   strbuf_addf(, "%s %s\n", oid_to_hex(oldoid), oid_to_hex(newoid));
+   sigchain_push(SIGPIPE, SIG_IGN);
+   write_in_full(proc.in, sb.buf, sb.len);
+   close(proc.in);
+   strbuf_release();
+   sigchain_pop(SIGPIPE);
+   return finish_command();
+}
+
+void commit_post_rewrite(const struct commit *old_head,
+const struct object_id *new_head)
+{
+   struct notes_rewrite_cfg *cfg;
+   cfg = init_copy_notes_for_rewrite("amend");
+   if (cfg) {
+   /* we are amending, so current_head is not NULL */
+   copy_note_for_rewrite(cfg, _head->object.oid, new_head);
+   finish_copy_notes_for_rewrite(cfg, "Notes added by 'git commit 
--amend'");
+   }
+   run_rewrite_hook(_head->object.oid, new_head);
+}
+
 static int is_original_commit_empty(struct commit *commit)
 {
const struct object_id *ptree_oid;
diff --git a/sequencer.h b/sequencer.h
index 
87edf40e5274d59f48d5af57678100ea220d2c8a..45def684ad751d0b8dc62b6cfdfb819ddf183c89
 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -62,4 +62,6 @@ int template_untouched(const struct strbuf *sb, const char 
*template_file,
 

[RFC PATCH 5/8] commit: move print_commit_summary() to libgit

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Signed-off-by: Phillip Wood 
---
 builtin/commit.c | 126 ---
 sequencer.c  | 116 ++
 sequencer.h  |   5 +++
 3 files changed, 129 insertions(+), 118 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 
9d621098823d196643180226491e43c806154c13..fac392216c4dc4d4a2b9fbee1d9ee14da0a14209
 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -43,31 +43,6 @@ static const char * const builtin_status_usage[] = {
NULL
 };
 
-static const char implicit_ident_advice_noconfig[] =
-N_("Your name and email address were configured automatically based\n"
-"on your username and hostname. Please check that they are accurate.\n"
-"You can suppress this message by setting them explicitly. Run the\n"
-"following command and follow the instructions in your editor to edit\n"
-"your configuration file:\n"
-"\n"
-"git config --global --edit\n"
-"\n"
-"After doing this, you may fix the identity used for this commit with:\n"
-"\n"
-"git commit --amend --reset-author\n");
-
-static const char implicit_ident_advice_config[] =
-N_("Your name and email address were configured automatically based\n"
-"on your username and hostname. Please check that they are accurate.\n"
-"You can suppress this message by setting them explicitly:\n"
-"\n"
-"git config --global user.name \"Your Name\"\n"
-"git config --global user.email y...@example.com\n"
-"\n"
-"After doing this, you may fix the identity used for this commit with:\n"
-"\n"
-"git commit --amend --reset-author\n");
-
 static const char empty_amend_advice[] =
 N_("You asked to amend the most recent commit, but doing so would make\n"
 "it empty. You can repeat your command with --allow-empty, or you can\n"
@@ -1343,97 +1318,6 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
return 0;
 }
 
-static const char *implicit_ident_advice(void)
-{
-   char *user_config = expand_user_path("~/.gitconfig", 0);
-   char *xdg_config = xdg_config_home("config");
-   int config_exists = file_exists(user_config) || file_exists(xdg_config);
-
-   free(user_config);
-   free(xdg_config);
-
-   if (config_exists)
-   return _(implicit_ident_advice_config);
-   else
-   return _(implicit_ident_advice_noconfig);
-
-}
-
-static void print_summary(const char *prefix, const struct object_id *oid,
- int initial_commit)
-{
-   struct rev_info rev;
-   struct commit *commit;
-   struct strbuf format = STRBUF_INIT;
-   struct object_id junk_oid;
-   const char *head;
-   struct pretty_print_context pctx = {0};
-   struct strbuf author_ident = STRBUF_INIT;
-   struct strbuf committer_ident = STRBUF_INIT;
-
-   commit = lookup_commit(oid);
-   if (!commit)
-   die(_("couldn't look up newly created commit"));
-   if (parse_commit(commit))
-   die(_("could not parse newly created commit"));
-
-   strbuf_addstr(, "format:%h] %s");
-
-   format_commit_message(commit, "%an <%ae>", _ident, );
-   format_commit_message(commit, "%cn <%ce>", _ident, );
-   if (strbuf_cmp(_ident, _ident)) {
-   strbuf_addstr(, "\n Author: ");
-   strbuf_addbuf_percentquote(, _ident);
-   }
-   if (author_date_is_interesting()) {
-   struct strbuf date = STRBUF_INIT;
-   format_commit_message(commit, "%ad", , );
-   strbuf_addstr(, "\n Date: ");
-   strbuf_addbuf_percentquote(, );
-   strbuf_release();
-   }
-   if (!committer_ident_sufficiently_given()) {
-   strbuf_addstr(, "\n Committer: ");
-   strbuf_addbuf_percentquote(, _ident);
-   if (advice_implicit_identity) {
-   strbuf_addch(, '\n');
-   strbuf_addstr(, implicit_ident_advice());
-   }
-   }
-   strbuf_release(_ident);
-   strbuf_release(_ident);
-
-   init_revisions(, prefix);
-   setup_revisions(0, NULL, , NULL);
-
-   rev.diff = 1;
-   rev.diffopt.output_format =
-   DIFF_FORMAT_SHORTSTAT | DIFF_FORMAT_SUMMARY;
-
-   rev.verbose_header = 1;
-   rev.show_root_diff = 1;
-   get_commit_format(format.buf, );
-   rev.always_show_header = 0;
-   rev.diffopt.detect_rename = 1;
-   rev.diffopt.break_opt = 0;
-   diff_setup_done();
-
-   head = resolve_ref_unsafe("HEAD", 0, junk_oid.hash, NULL);
-   if (!strcmp(head, "HEAD"))
-   head = _("detached HEAD");
-   else
-   skip_prefix(head, "refs/heads/", );
-   printf("[%s%s ", head, initial_commit ? _(" (root-commit)") : "");
-
-   if (!log_tree_commit(, commit)) {
-   rev.always_show_header = 1;
-   

[RFC PATCH 6/8] sequencer: simplify adding Signed-off-by: trailer

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Add the Signed-off-by: trailer in one place rather than adding it to
the message when doing a recursive merge and then again when
committing. This means that if there are conflicts when merging with a
strategy other than 'recursive' the Signed-off-by: trailer will be
added if the user commits the resolution themselves without passing
'--signoff' to 'git commit'. It also simplifies the in-process commit
that is about to be added to the sequencer.

Signed-off-by: Phillip Wood 
---
 sequencer.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 
65031353f01d75624951222c2de280c427e26ac5..e9060e3ca50777687c578ff09c62cd901efcfb0e
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -476,9 +476,6 @@ static int do_recursive_merge(struct commit *base, struct 
commit *next,
_(action_name(opts)));
rollback_lock_file(_lock);
 
-   if (opts->signoff)
-   append_signoff(msgbuf, 0, 0);
-
if (!clean)
append_conflicts_hint(msgbuf);
 
@@ -656,8 +653,6 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
argv_array_push(, "--amend");
if (opts->gpg_sign)
argv_array_pushf(, "-S%s", opts->gpg_sign);
-   if (opts->signoff)
-   argv_array_push(, "-s");
if (defmsg)
argv_array_pushl(, "-F", defmsg, NULL);
if ((flags & CLEANUP_MSG))
@@ -1339,6 +1334,9 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
}
}
 
+   if (opts->signoff)
+   append_signoff(, 0, 0);
+
if (is_rebase_i(opts) && write_author_script(msg.message) < 0)
res = -1;
else if (!opts->strategy || !strcmp(opts->strategy, "recursive") || 
command == TODO_REVERT) {
-- 
2.14.1



[RFC PATCH 7/8] sequencer: load commit related config

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

Load default values for message cleanup and gpg signing of commits in
preparation for committing without forking 'git commit'.

Signed-off-by: Phillip Wood 
---
 builtin/rebase--helper.c | 13 -
 builtin/revert.c | 15 +--
 sequencer.c  | 30 ++
 sequencer.h  |  1 +
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index 
c82b4dce6838fa039e9c2cfc769c7cd17a8ef562..3a99f9f8f1396ee5a040c7c4e9367688fe04c9a4
 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -9,6 +9,17 @@ static const char * const builtin_rebase_helper_usage[] = {
NULL
 };
 
+static int git_rebase_helper_config (const char *k, const char *v, void *cb)
+{
+   int status;
+
+   status = git_sequencer_config (k, v, NULL);
+   if (status)
+   return status;
+
+   return git_default_config(k, v, NULL);
+}
+
 int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
@@ -24,7 +35,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
OPT_END()
};
 
-   git_config(git_default_config, NULL);
+   git_config(git_rebase_helper_config, NULL);
 
opts.action = REPLAY_INTERACTIVE_REBASE;
opts.allow_ff = 1;
diff --git a/builtin/revert.c b/builtin/revert.c
index 
b9d927eb09c9ed87c84681df1396f4e6d9b13c97..2883f9eb71be1c2b205fd0f1d962748438f4c4b8
 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -31,6 +31,17 @@ static const char * const cherry_pick_usage[] = {
NULL
 };
 
+static int git_revert_config (const char *k, const char *v, void *cb)
+{
+   int status;
+
+   status = git_sequencer_config (k, v, NULL);
+   if (status)
+   return status;
+
+   return git_default_config(k, v, NULL);
+}
+
 static const char *action_name(const struct replay_opts *opts)
 {
return opts->action == REPLAY_REVERT ? "revert" : "cherry-pick";
@@ -208,7 +219,7 @@ int cmd_revert(int argc, const char **argv, const char 
*prefix)
if (isatty(0))
opts.edit = 1;
opts.action = REPLAY_REVERT;
-   git_config(git_default_config, NULL);
+   git_config(git_revert_config, NULL);
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("revert failed"));
@@ -221,7 +232,7 @@ int cmd_cherry_pick(int argc, const char **argv, const char 
*prefix)
int res;
 
opts.action = REPLAY_PICK;
-   git_config(git_default_config, NULL);
+   git_config(git_revert_config, NULL);
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("cherry-pick failed"));
diff --git a/sequencer.c b/sequencer.c
index 
e9060e3ca50777687c578ff09c62cd901efcfb0e..230bdb8535a422b1263429e5894e3f8b1733e844
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -687,6 +687,36 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
return run_command();
 }
 
+static enum cleanup_mode default_msg_cleanup = CLEANUP_NONE;
+static char *default_gpg_sign;
+
+int git_sequencer_config(const char *k, const char *v, void *cb)
+{
+   if (!strcmp(k, "commit.cleanup")) {
+   int status;
+   const char *s;
+
+   status = git_config_string(, k, v);
+   if (!strcmp(s, "verbatim"))
+   default_msg_cleanup = CLEANUP_NONE;
+   else if (!strcmp(s, "whitespace"))
+   default_msg_cleanup = CLEANUP_SPACE;
+   else if (!strcmp(s, "strip"))
+   default_msg_cleanup = CLEANUP_ALL;
+   else if (!strcmp(s, "scissors"))
+   default_msg_cleanup = CLEANUP_NONE;
+
+   return status;
+   }
+
+   if (!strcmp(k, "commit.gpgsign")) {
+   default_gpg_sign = git_config_bool(k, v) ? "" : NULL;
+   return 0;
+   }
+
+   return git_gpg_config(k, v, NULL);
+}
+
 static int rest_is_empty(const struct strbuf *sb, int start)
 {
int i, eol;
diff --git a/sequencer.h b/sequencer.h
index 
d491f5cae2e6152859d114a08bbd7c935d1e80a6..2759ca2f7d1aba1268c88dc8bd94eb3e593897fe
 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -49,6 +49,7 @@ extern const char sign_off_header[];
 
 void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag);
 void append_conflicts_hint(struct strbuf *msgbuf);
+int git_sequencer_config(const char *k, const char *v, void *cb);
 
 enum cleanup_mode {
CLEANUP_SPACE,
-- 
2.14.1



[RFC PATCH 3/8] sequencer: refactor update_head()

2017-09-25 Thread Phillip Wood
From: Phillip Wood 

The previous commit was a mechanical translation of the code from
builtin/commit.c. Now that it uses its own strbuf for the reflog
message it can be simplified slightly.

Signed-off-by: Phillip Wood 
---
 sequencer.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 
917ad4a16216b30adb2c2c9650217926d8db8ba7..1795a4df2a0021b2419d941c6083e49cd6647314
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -759,8 +759,9 @@ int update_head(const struct commit *old_head, const struct 
object_id *new_head,
int ret = 0;
 
reflog_msg = getenv("GIT_REFLOG_ACTION");
-   if (!reflog_msg)
-   reflog_msg="";
+   if (reflog_msg)
+   strbuf_addstr(, reflog_msg);
+   strbuf_addstr(, ": ");
 
nl = strchr(msg->buf, '\n');
if (nl) {
@@ -769,8 +770,6 @@ int update_head(const struct commit *old_head, const struct 
object_id *new_head,
strbuf_addbuf(, msg);
strbuf_addch(, '\n');
}
-   strbuf_insert(, 0, reflog_msg, strlen(reflog_msg));
-   strbuf_insert(, strlen(reflog_msg), ": ", 2);
 
transaction = ref_transaction_begin(err);
if (!transaction ||
-- 
2.14.1



[PATCH v2 0/5] Improve abbreviation disambiguation

2017-09-25 Thread Derrick Stolee
Thanks for the feedback on my v1 patch. Thanks also to Jeff Hostetler
for helping me with this v2 patch, which includes an extra performance
improvement in commit 5.

Changes since last version:

* Added helper program test-list-objects to construct a list of
  existing object ids.

* test-abbrev now disambiguates a list of OIDs from stdin.

* p0008-abbrev.sh now has two tests:
* 0008.1 tests 100,000 known OIDs
* 0008.2 tests 100,000 missing OIDs

* Added a third performance improvement that uses binary search within
  packfiles to inspect at most two object ids per packfile.

Thanks,
 Stolee

---

When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

A helper program `test-list-objects` outputs a sampling of object ids,
which we reorder using `sort -R` before using them as input to a
performance test. 

A performance helper `test-abbrev` and performance test `p0008-abbrev.sh`
are added to demonstrate performance improvements in this area.

I include performance test numbers in the commit messages for each
change, but I also include the difference between the baseline and the
final change here:


p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.12 s | 0.05 s | -58.3% |
| Git   | 5 |  230162 |  0 | 0.25 s | 0.15 s | -40.0% |
| Git   | 4 |  154310 |  75852 | 0.18 s | 0.11 s | -38.9% |
| Linux | 1 | 5606645 |  0 | 0.32 s | 0.10 s | -68.8% |
| Linux |24 | 5606645 |  0 | 2.77 s | 2.00 s | -27.8% |
| Linux |23 | 5283204 | 323441 | 2.86 s | 1.62 s | -43.4% |
| VSTS  | 1 | 4355923 |  0 | 0.27 s | 0.09 s | -66.7% |
| VSTS  |32 | 4355923 |  0 | 2.41 s | 2.01 s | -16.6% |
| VSTS  |31 | 4276829 |  79094 | 4.22 s | 3.02 s | -28.4% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 5.69 s
  New Time: 4.09 s
 Rel %: -28.1%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.61 s | 0.04 s | -93.4% |
| Git   | 5 |  230162 |  0 | 1.30 s | 0.15 s | -88.5% |
| Git   | 4 |  154310 |  75852 | 1.07 s | 0.12 s | -88.8% |
| Linux | 1 | 5606645 |  0 | 0.65 s | 0.05 s | -92.3% |
| Linux |24 | 5606645 |  0 | 7.12 s | 1.28 s | -82.0% |
| Linux |23 | 5283204 | 323441 | 6.33 s | 0.96 s | -84.8% |
| VSTS  | 1 | 4355923 |  0 | 0.64 s | 0.05 s | -92.2% |
| VSTS  |32 | 4355923 |  0 | 7.77 s | 1.36 s | -82.5% |
| VSTS  |31 | 4276829 |  79094 | 7.76 s | 1.36 s | -82.5% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 38.9 s
  New Time:  2.7 s
 Rel %: -93.1%

Derrick Stolee (5):
  test-list-objects: List a subset of object ids
  p0008-abbrev.sh: Test find_unique_abbrev() perf
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix
  sha1_name: Minimize OID comparisons during disambiguation

 Makefile |   2 +
 sha1_name.c  | 128 ++-
 t/helper/.gitignore  |   2 +
 t/helper/test-abbrev.c   |  19 +++
 t/helper/test-list-objects.c |  85 
 t/perf/p0008-abbrev.sh   |  22 
 6 files changed, 243 insertions(+), 15 deletions(-)
 create mode 100644 t/helper/test-abbrev.c
 create mode 

  1   2   >