Re: [GSoC][PATCH v6 2/2] submodule: port subcommand foreach from shell to C

2017-06-02 Thread Stefan Beller
On Fri, Jun 2, 2017 at 4:24 AM, Prathamesh Chavan  wrote:
> This aims to make git-submodule foreach a builtin. This is the very
> first step taken in this direction. Hence, 'foreach' is ported to
> submodule--helper, and submodule--helper is called from git-submodule.sh.
> The code is split up to have one function to obtain all the list of
> submodules. This function acts as the front-end of git-submodule foreach
> subcommand. It calls the function for_each_submodule_list, which basically
> loops through the list and calls function fn, which in this case is
> runcommand_in_submodule. This third function is a calling function that
> takes care of running the command in that submodule, and recursively
> perform the same when --recursive is flagged.
>
> The first function module_foreach first parses the options present in
> argv, and then with the help of module_list_compute, generates the list of
> submodules present in the current working tree.
>
> The second function for_each_submodule_list traverses through the
> list, and calls function fn (which in case of submodule subcommand
> foreach is runcommand_in_submodule) is called for each entry.
>
> The third function runcommand_in_submodule, generates a submodule struct sub
> for $name, value and then later prepends name=sub->name; and other
> value assignment to the env argv_array structure of a child_process.
> Also the  of submodule-foreach is push to args argv_array
> structure and finally, using run_command the commands are executed
> using a shell.
>
> The third function also takes care of the recursive flag, by creating
> a separate child_process structure and prepending "--super-prefix 
> displaypath",
> to the args argv_array structure. Other required arguments and the
> input  of submodule-foreach is also appended to this argv_array.
>

Is the commit message still accurate?
You describe the changes between the versions below the --- line,
that is not recorded in the permanent commit history.

In the commit message is less important to write "what" is happening,
because that can easily be read from the patch/commit itself, but rather
"why" things happen, such as design choices, maybe:

This aims to make git-submodule foreach a builtin. This is the very
first step taken in this direction. Hence, 'foreach' is ported to
submodule--helper, and submodule--helper is called from git-submodule.sh.

We'll introduce 3 functions, one that is exposed to the command line
and handles command line arguments, one to iterate over a set of
submodules, and finally one to execute an arbitrary shell command
in the submodule.

Attention must be paid to the 'path' variable, see 64394e3ae9
(git-submodule.sh: Don't use $path variable in eval_gettext string,
2012-04-17) details. The path varialbe is not exposed into the environment
of the invoked shell, but we just give "path=%s;" as the first argument.

We do not need to condition on the number of arguments as in 1c4fb136db
(submodule foreach: skip eval for more than one argument, 2013-09-27)
as we will run exactly one shell in the submodules directory.

Sign-off-...

>
> Other than that, additionally the case of no. of arugments in 
> being equal to 1 is also considered separetly.
> THe reason of having this change in the shell script was given in the
> commit 1c4fb136db.
> According to my understanding, eval "$1" executes $1 in same shell,
> whereas "$@" gets executed in a separate shell, which doesn't allow
> "$@" to access the env variables $name, $path, etc.
> Hence, to keep the ported function similar, this condition is also
> added.

This paragraph would be a good candidate for the commit message, too.
However as we rewrite it in C, we will spawn exactly one shell no matter
how many arguments we have (well for 0 we have no shell, but for 1 or more
arguments we'll spawn exactly one shell?)

> +   } else if (prefix) {
> +   struct strbuf sb = STRBUF_INIT;
> +   char *displaypath = xstrdup(relative_path(path, prefix, ));
> +   strbuf_release();
> +   return displaypath;

Note to self (or any other that is interested in long term clean code):
I have seen this pattern a couple of times, a strbuf just to appease
the argument list of relative_path.
(init_submodule, prepare_to_clone_next_submodule,
resolve_relative_path in submodule--helper
cmd_rev_parse in builtin/rev-parse
connect_work_tree_and_git_dir in dir.c
write_name_quoted_relative in quote.c
get_superproject_working_tree in submodule.c
cmd_main in test-path-utils;
actually all uses of this function :( )
We should really make a relative_path function that can work
without the need of a strbuf, maybe just wrap the 3 lines into a new
function, or remove the strbuf from the argument list.
(The potential memleak is horrible to fix though. But as seen here
we could just always return 

Re: public inbox links, was: Re: preserve untracked cache, was Re: What's cooking in git.git (Jun 2017, #01; Thu, 1)

2017-06-02 Thread Eric Wong
Stefan Beller  wrote:
> Today I learned again how public-inbox is awesome! Thanks Eric!

You're welcome :)

> * You can just copy the message ID INCLUDING the surrounding < >
>   and public inbox still just shows you the correct message. I had assumed
>   you would need to strip off the < > and I did so since.

Yeah, it's a fallback since it's probably a common mistake.
AFAIK, git-send-email also avoids redundantly adding '<>' and
only adds them if necessary.

> On Fri, Jun 2, 2017 at 5:26 PM, Junio C Hamano  wrote:

> > Issue #01 of June reports it in 'master':
> > https://public-inbox.org/git/
>
> * However with the < > unstripped, the awesomeness is limited:
>   Some tools (including my mail reader as well as public inbox itself[1])
>   do not recognize the link when there are < > in there.

Yeah, that's actually bad form on Junio's part.  public-inbox
can only support it up to an extent...

I seem to recall seeing some standard or style recommendation
that URLs (of any type) be surrounded by angle brackets in text:



So public-inbox (and other parsers) should stop looking for URLs
outside of the '<>'

But I think the newer style manuals state having spaces around the
URL is enough.

> While the second point is not the end of the world, it's still
> slightly annoying,
> which is why I thought I'll point it out here.
> 
> [1] https://public-inbox.org/git/xmqqvaodx6g4@gitster.mtv.corp.google.com/


Re: Wrong gitattributes documentation?

2017-06-02 Thread Rene Pasing
On 06/03/2017 02:01 AM, Junio C Hamano wrote:
> Your "/images/*" is the "Otherwise" case, isn't it?

Ok, sorry, didn't read your answer thoroughly enough the first time.

The problem is, when I have an entry in .gitignore like this:
/images

Then git will ignore that whole directory (and all of its subfolders
etc.). Git will just ignore everything in the tree starting with that
pathname.

But when I use the same pattern ('/images') in .gitattributes I would
expect (due to the documentation) that git-lfs will be called for all
files+subfolders+subfolderfiles and even the folder itself (this is what
I understand from 'The rules how the pattern matches paths are the same
as in .gitignore files').

This does not seem to be the case!

Sorry again for the confusion.



Re: Wrong gitattributes documentation?

2017-06-02 Thread Rene Pasing
On 06/03/2017 02:01 AM, Junio C Hamano wrote:
> Your "/images/*" is the "Otherwise" case, isn't it?

Yes, true. I forgot to mention in my first mail that I also tried
'/images/', which had the same (=none) effect as '/images/*', sorry
about the confusion.




public inbox links, was: Re: preserve untracked cache, was Re: What's cooking in git.git (Jun 2017, #01; Thu, 1)

2017-06-02 Thread Stefan Beller
On Fri, Jun 2, 2017 at 5:26 PM, Junio C Hamano  wrote:
>
> Issue #06 of May marked it to be merged to 'next':
> https://public-inbox.org/git/
>
> Issue #07 of May marked it for 'master':
> https://public-inbox.org/git/
>
> Issue #08 of May kept it (i.e. no issues discovered in the
> meantime):
> https://public-inbox.org/git/
>
> Issue #01 of June reports it in 'master':
> https://public-inbox.org/git/
>

Today I learned again how public-inbox is awesome! Thanks Eric!

* You can just copy the message ID INCLUDING the surrounding < >
  and public inbox still just shows you the correct message. I had assumed
  you would need to strip off the < > and I did so since.

* However with the < > unstripped, the awesomeness is limited:
  Some tools (including my mail reader as well as public inbox itself[1])
  do not recognize the link when there are < > in there.

While the second point is not the end of the world, it's still
slightly annoying,
which is why I thought I'll point it out here.

[1] https://public-inbox.org/git/xmqqvaodx6g4@gitster.mtv.corp.google.com/

Thanks,
Stefan


[PATCH] submodule foreach: correct $sm_path in nested submodules from a dir

2017-06-02 Thread Stefan Beller
When running 'git submodule foreach' from a subdirectory of your
repository, nested submodules get a bogus value for $sm_path:
For a submodule 'sub' that contains a nested submodule 'nested',
running 'git -C dir submodule foreach echo $path' would report
path='../nested' for the nested submodule. The first part '../' is
derived from the logic computing the relative path from $pwd to the
root of the superproject. The second part is the submodule path inside
the submodule. This value is of little use and is hard to document.

There are two different possible solutions that have more value:
(a) The path value is documented as the path from the toplevel of the
superproject to the mount point of the submodule.
In this case we would want to have path='sub/nested'.

(b) As Ramsay noticed the documented value is wrong. For the non-nested
case the path is equal to the relative path from $pwd to the
submodules working directory. When following this model,
the expected value would be path='../sub/nested'.

The behavior for (b) was introduced in 091a6eb0fe (submodule: drop the
top-level requirement, 2013-06-16) the intent for $path seemed to be
relative to $cwd to the submodule worktree, but that did not work for
nested submodules, as the intermittent submodules were not included in
the path.

If we were to fix the meaning of the $path using (a) such that "path"
is "the path from the toplevel of the superproject to the mount point
of the submodule", we would break any existing submodule user that runs
foreach from non-root of the superproject as the non-nested submodule
'../sub' would change its path to 'sub'.

If we would fix the meaning of the $path using (b), such that "path"
is "the relative path from $pwd to the submodule", then we would break
any user that uses nested submodules (even from the root directory) as
the 'nested' would become 'sub/nested'.

Both groups can be found in the wild.  The author has no data if one group
outweighs the other by large margin, and offending each one seems equally
bad at first.  However in the authors imagination it is better to go with
(a) as running from a sub directory sounds like it is carried out
by a human rather than by some automation task.  With a human on
the keyboard the feedback loop is short and the changed behavior can be
adapted to quickly unlike some automation that can break silently.

To ameliorate the situation, perform these changes
* Document 'sm_path' instead of 'path'.
  As using a variable '$path' may be harmful to users due to
  capitalization issues, see 64394e3ae9 (git-submodule.sh: Don't
  use $path variable in eval_gettext string, 2012-04-17). Adjust
  the documentation to advocate for using $sm_path,  which contains
  the same value. We still make the 'path' variable available,
  though not documented.

* Clarify the 'toplevel' variable documentation.
  It does not contain the topmost superproject as the author assumed,
  but the direct superproject, such that $toplevel/$sm_path is the
  actual absolute path of the submodule.

* The variable '$displaypath' was accessible but undocumented.
  Rename it '$displaypath' to '$dpath'. Document what it contains.
  Users that are broken by the behavior change of 'sm_path' introduced
  in this commit, can switch from '$path' to '$dpath'.

Discussed-with: Ramsay Jones 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Stefan Beller 
---

Ramsay,
I would think this would be least offensive to all parties involved.
With my current understanding of the situation
I think this is the best fix for now.

Prathamesh, sorry for the missleading suggestions earlier
on how to approach this bug. Let's see how the discussion turns out on this
one before rebasing the rewrite in C on this one.

Thanks,
Stefan

 Documentation/git-submodule.txt | 32 ++--
 git-submodule.sh|  7 +++
 t/t7407-submodule-foreach.sh| 39 ---
 3 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d5..52e3ef1325 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -218,20 +218,24 @@ information too.
 
 foreach [--recursive] ::
Evaluates an arbitrary shell command in each checked out submodule.
-   The command has access to the variables $name, $path, $sha1 and
-   $toplevel:
-   $name is the name of the relevant submodule section in .gitmodules,
-   $path is the name of the submodule directory relative to the
-   superproject, $sha1 is the commit as recorded in the superproject,
-   and $toplevel is the absolute path to the top-level of the superproject.
-   Any submodules defined in the superproject but not checked out are
-   ignored by this command. Unless given `--quiet`, foreach prints the name
-   of each 

Re: pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output

2017-06-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> 2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
>t00* make assumptions about the exact values of object ids.  That's
>bad for maintainability for other reasons beyond the hash function
>transition, too.
>
>It should be possible to suss them out by patching git's sha1
>routine to use the ones-complement of sha1 (~sha1) instead and
>seeing which tests fail.

One particularly nasty one is t1512-rev-parse-disambiguation that
ensures that the abbreviation and disambiguation works correctly.
It uses a set of objects (tags, commits, trees and blobs) whose
object names all begin with number of "0"; which will of course
become useless once we change the hash function.

No matter what new hash function is chosen, we'd need a similar
test to ensure that disambiguation works correctly, so one of the
tasks for hash migration is to port (not drop) this test.


Re: preserve untracked cache, was Re: What's cooking in git.git (Jun 2017, #01; Thu, 1)

2017-06-02 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Fri, 2 Jun 2017, Junio C Hamano wrote:
>
>> Samuel Lijin  writes:
>> 
>> >> What is holding this topic up? Anything Ben or I can do to move this
>> >> closer to `next` or even `master`?
>> >
>> > It's in `next` right now (3196d093d6).
>> 
>> Thanks for pinging and checking ;-)  
>> 
>> I think the topic was merged to 'next' on the 23rd of last month and
>> graduated to 'master' in the past few days, together with other
>> topics.
>
> Okay. I never saw any "Will merge to" message, so I got worried.

Well, I cannot quite help if you are not reading them ;-)

Issue #06 of May marked it to be merged to 'next':
https://public-inbox.org/git/

Issue #07 of May marked it for 'master':
https://public-inbox.org/git/

Issue #08 of May kept it (i.e. no issues discovered in the
meantime):
https://public-inbox.org/git/

Issue #01 of June reports it in 'master':
https://public-inbox.org/git/




Re: Unaligned accesses in sha1dc

2017-06-02 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Fri, Jun 2, 2017 at 4:46 PM, Liam R. Howlett  
> wrote:
>
>> 2.13.0 is very much broken for me on SPARC.
>> {maint//git} $ make -j120
>> [...]
>> {maint//git} $ ./git log
>> [1]1004506 bus error (core dumped)  ./git log
>>
>> This is with b06d36431 (maint).
>>
>> The same thing happens on v2.13.0-384-g826c06412 (master).
>>
>> v2.13.0-539-g4b9c06c7d (next) works for me, as did following the
>> instructions on upgrading the sha1dc code myself.
>
> Thanks a lot. So that works as I suspect on SPARC, hopefully it'll be
> in master (and 2.13.1) soon.

Thanks.  Hopefully your ab/sha1dc-maint a0103914 ("sha1dc: update
from upstream", 2017-05-20) should be sufficient for helping the
users in the real-world, then.




Re: Unaligned accesses in sha1dc

2017-06-02 Thread Junio C Hamano
Linus Torvalds  writes:

> Dereferencing an unaligned pointer may be "undefined" in some
> technical meaning, but it sure as hell isn't undefined in reality, and
> compilers that willfully do stupid things should not be catered to
> overly. Reality is a lot more important.

Thanks for succinctly putting it this way.  I think you said the
above number of times, and I was looking for one of them to include
a reference to in the response I was preparing, but this message
made it unnecessary ;-)

> And I think gcc can be tweaked to use "movbe" on x86 with the right
> magic incantation (ie not just __builtin_bswap32(), but also the
> switch to enable the new instructions).  So having code to detect gcc
> and using __builtin_bswap32() would indeed be a good thing.


Re: Wrong gitattributes documentation?

2017-06-02 Thread Junio C Hamano
Rene Pasing  writes:

> The problem is, the documentation[1] says: "The rules how the pattern
> matches paths are the same as in .gitignore files; see gitignore[5].",
> so when I have a pattern like '/images/', it should match on all
> files+folders under /images, even the directory itself, right?
>
> Or when
> I'd use '/images/*', it should match on all files+folders under
> /images, right?

My reading of "PATTERN FORMAT" section in "git help ignore" says it
shouldn't.

 - If the pattern does not contain a slash '/', Git treats it as
   a shell glob pattern and checks for a match against the
   pathname relative to the location of the `.gitignore` file
   (relative to the toplevel of the work tree if not from a
   `.gitignore` file).

 - Otherwise, Git treats the pattern as a shell glob suitable
   for consumption by fnmatch(3) with the FNM_PATHNAME flag:
   wildcards in the pattern will not match a / in the pathname.
   For example, "Documentation/{asterisk}.html" matches
   "Documentation/git.html" but not "Documentation/ppc/ppc.html"
   or "tools/perf/Documentation/perf.html".

Your "/images/*" is the "Otherwise" case, isn't it?


Re: [PATCH] perf: work around the tested repo having an index.lock

2017-06-02 Thread Junio C Hamano
Jeff King  writes:

> But I think a more compelling case is that there may be an ongoing
> operation in the original repo (e.g., say you are in the middle of
> writing a commit message) when we do a blind copy of the filesystem
> contents. You might racily pick up a lockfile.
>
> Should we find and delete all *.lock files in the copied directory? That
> would get ref locks, etc. Half-formed object files are OK. Technically
> if you want to get an uncorrupted repository you'd also want to copy
> refs before objects (in case somebody makes a new object and updates a
> ref while you're copying).
>
> I don't know how careful it's worth being. I don't really _object_ to
> this patch exactly, but it does seem like it's picking up one random
> case (that presumably you hit) and ignoring all of the related cases.

My feeling exactly.  Diagnosing and failing upfront saying "well you
made a copy but it is not suitable for testing" sounds more sensible
at lesat to me.


Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Fri, Jun 2, 2017 at 9:15 AM, Junio C Hamano  wrote:
>> Jeffrey Walton  writes:
>>
>>> Is there no switch? Its the most efficient way to accomplish the task.
>>
>> This is a safety to help normal human users from hurting themselves,
>> and it does not make any sense to have "I have no name, so record
>> garbage, please" option, switch or setting that is different from
>> "Here is the name I want you to use when recording things".
>>
>> The switch _is_ to set the names with whatever standard way.
>
> Presumably OP doesn't want to mess with the env for whatever reason,
> in that case:
>
> git -c user.name=Nobody -c user.email=nob...@example.com
> cherry-pick 

Exactly.  That is what I meant as one of the "whatever standard
way".


Re: [PATCH 25/33] notes-merge: convert verify_notes_filepair to struct object_id

2017-06-02 Thread Junio C Hamano
Brandon Williams  writes:

> On 06/02, Junio C Hamano wrote:
>> 
>> > -static int path_to_sha1(const char *path, unsigned char *sha1)
>> > +static int path_to_oid(const char *path, struct object_id *oid)
>> >  {
>> > -  char hex_sha1[40];
>> > +  char hex_oid[GIT_SHA1_HEXSZ];
>> >int i = 0;
>> > -  while (*path && i < 40) {
>> > +  while (*path && i < GIT_SHA1_HEXSZ) {
>> >if (*path != '/')
>> > -  hex_sha1[i++] = *path;
>> > +  hex_oid[i++] = *path;
>> >path++;
>> >}
>> 
>> It's no brainer to do s/GIT_SHA1_HEXSZ/GIT_MAX_HEXSZ/ for all of the
>> above, but ...
>
> I'll fix this.
>
>> 
>> > -  if (*path || i != 40)
>> > +  if (*path || i != GIT_SHA1_HEXSZ)
>> >return -1;
>> 
>> ... this one is tricky.  
>> 
>> What's in our envisioned future?  Are we expecing to see object
>> names, named with two or more hash functions, in a same repository?
>> If so, and one is 20 bytes and another one is 32 bytes, then this
>> should check 'i' against 40 and 64 and pass if 'i' is one of these
>> expected lengths?
>
> That's a good question, and at this point in time do we have an
> envisioned future?  From some of our conversations I seem to remember
> that we don't want a single repository to have objects based on two
> different hash functions, but rather some translation layer to convert
> between two hashing functions (for compatibility with other
> non-converted repos).  Though nothing has been settled upon yet so I
> don't know what the future would look like (and the best way to prepare
> for it).

I do not think we know precisely what we want yet.  The "MAX" in the
allocation size is an indication that we are allocating for the
largest variant possible in the version of Git being compiled, which
is a hint that we anticipate that we may have multiple variants with
different sizes.  And that is where my "against 40 and 64 ... one of
these expected lengths" come from.  

But there is no such set of macros that define acceptable/expected
lengths and that tells me that nobody among who defined, reviewed
and accepted the MAX macro has thought this part through yet.

I see Jonathan started talking about experiments with "different
hash" elsewhere; I am reading his message as assuming one hash at a
time, and the repository tells us what the hash is.  And I think
that is one plausible design.

If we were to go that route, then

- MAX will be the maximum among hashes we can choose at the
  beginning of a process (perhaps by consulting the repository
  extension).

- In addition to GIT_SHA1_HEXSZ, GIT_SHA2_HEXSZ, and friends, we
  need a variable that records the length of the hash chosen for
  the process at the startup, GIT_OIDHASH_HEXSZ.  That would
  become one of the fields in your "struct repository",

#define GIT_OIDHASH_HEXSZ (the_repository.oidhash_hexsz)

  among other things like what the oid for an empty blob is.

I'd say in the meantime we can do something like the attached and
use it here, perhaps?

 cache.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/cache.h b/cache.h
index 53999418d3..cafb13db9f 100644
--- a/cache.h
+++ b/cache.h
@@ -70,6 +70,10 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
 
+/* The length for the object name hash */
+#define GIT_OIDHASH_RAWSZ GIT_SHA1_RAWSZ
+#define GIT_OIDHASH_HEXSZ GIT_SHA1_HEXSZ
+
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
 };


Re: [PATCH 00/33] object id conversion (grep and diff)

2017-06-02 Thread Junio C Hamano
Brandon Williams  writes:

> On 06/02, Junio C Hamano wrote:
>> 
>> I lied.  This also conflicts somewhat with Peff's diff-blob topic.
>> I think I resolved them correctly (there needs evil merges applied
>> to two files when merging this topic), and hopefully can push out
>> the result by the end of the day.
>> 
>> Thanks.
>
> If it ends up being too much of a headache for you to deal with, let me
> know and I can rebase on top of those series.  That way you don't have to
> deal with the conflict resolutions.  Just let me know what you'd like me
> to do.

Sorry, I forgot to push the result out, even though I double checked
the conflict resolution I did last night.  Now it is in the public
repository.  I have one squash queued at the right place to update
SHA1s to OIDs in the comment Brian pointed out.

If you ever need to rebase this on top of future 'master' that
already has js/blame-lib topic, fetching from me and checking
the evil merge I did may turn out to be helpful:

 $ git fetch git://github.com/gitster/git refs/merge-fix/bw/object-id
 $ git show FETCH_HEAD

but I can take patches based on the same old 'master'; as long as
the evil merge above is correct, that would probably be preferrable,
as it makes it easier to compare the two iterations of your series.

Repeating some backstory of "merge-fix" might be beneficial, if a
reader is interested.  Otherwise the remainder of this message can
safely be skipped.

After a topic (i.e. js/blame-lib) moves a lot of code around (i.e. a
bulk of what used to be in builtin/blame.c is now in blame.c and
some in diff.c), merging a topic that touches places in the code
that was moved in-place (i.e. bw/object-id, which updates the code
in builtin/blame.c) will leave a conflict that looks like:

<<< HEAD
... very little that is left after moving
... bunch of code out of this file
||| common
... a lot of
... stuff before
... your change from SHA1 to OID
... is here
===
... a lot of
... stuff after
... your change from SHA1 to OID
... is here
>>> theirs

As far as builtin/blame.c is concerned, the resolution for this
set of conflicts is just to take the HEAD version, ignoring all of
your SHA1-to-OID changes.  Once it is resolved, "rerere" can help
us remember this resolution to builtin/blame.c

But the ignored changes need to go somewhere else.

What the user who is doing a merge does at this point is to compare
what is between ||| and === (i.e. common ancestor's version)
with what is between === and >>> (i.e. their version) to
figure out what the branch being merged did.  And the user needs to
know where the common code went in the version in HEAD.

 $ git log [--no-merges] -p MERGE_HEAD.. -- builtin/blame.c

is helpful to locate the commit that lost the common lines from the
file.  And "git show" on it will tell us that they mostly went to
blame.c while some migrating to diff.c; we found out what you did by
comparing "common" and "theirs" in the previous step and we apply
these changes to these "new" places.

And that is the diff you see in refs/merge-fix/bw/object-id.  The
script I use to re-build 'pu' every day (probably I use it three
times a day on average) knows about that ref.  The script starts
from the tip of 'master', and for each topic, (1) run "git merge"
into HEAD, (2) take resolution recorded by "rerere" and (3) if
merge/fix/$topic exists, cherry-pick it on top to squash into the
merge made in (2).

Once I have taught my rerere database and refs/merge-fix/ about this
merge, it is not too big a deal to redo the merge to adjust to an
updated 'master' or a new interation of your series because of the
above mechanism.  And that is why I say it is OK for an updated series
to be based on the same old 'master'.

Thanks.




Re: [WIP v2 2/2] pack-objects: support --blob-max-bytes

2017-06-02 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Jun 02, 2017 at 12:38:45PM -0700, Jonathan Tan wrote:

>> +--blob-max-bytes=::
>> +This option can only be used with --stdout. If specified, a blob
>> +larger than this will not be packed unless a to-be-packed tree
>> +has that blob with a filename beginning with ".git".  The size
>> +can be suffixed with "k", "m", or "g", and may be "0".
>> ++
>> +If specified, after printing the packfile, pack-objects will print the
>> +count of excluded blobs (8 bytes) . Subsequently, for each excluded blob
>> +in hash order, pack-objects will print its hash (20 bytes) and its size
>> +(8 bytes). All numbers are in network byte order.
>> ++
>> +If pack-objects cannot efficiently determine, for an arbitrary blob,
>> +that no to-be-packed tree has that blob with a filename beginning with
>> +".git" (for example, if bitmaps are used to calculate the objects to be
>> +packed), it will pack all blobs regardless of size.
>
> Hmm. So this feature can't be used with bitmaps at all?  That seems like
> a big downside, as we still have to walk the whole graph to come up with
> the list of blobs (even if we're sending them). That's 30-40 seconds of
> CPU per clone on torvalds/linux. I imagine it's much worse on
> repositories big enough to need a full GVFS-style "don't send any blobs"
> approach.
>
> We have a name-hash cache extension in the bitmap file, but it doesn't
> carry enough information to deduce the .git-ness of a file. I don't
> think it would be too hard to add a "flags" extension, and give a single
> bit to "this is a .git file".

A nicer approach IMHO is to include an extra bitmap, like the existing
object-type bitmaps (see the dump_bitmap calls in
bitmap_writer_finish).  This would would represent the set of all
.git* blobs in the pack.

[...]
>  If you're not just avoiding large blobs but trying
> to get a narrow clone, you don't want the .git files from the
> uninteresting parts of the tree.

This is something I've wondered about, too.  Part of the story is that
we haven't started omitting trees, so there is already O(number of
trees) objects being sent and some additional small blobs for .git*
specials doesn't make it much worse.  Sending all .git* blobs keeps
things simple since the server doesn't have to infer which .git* blobs
are relevant to this client.

Longer term, we will likely want to allow clients to request omission
of some trees, too.  Omitting the corresponding .git* files becomes
more straightforward at that point.

Does that make sense?

Jonathan


Re: git-gui ignores core.hooksPath

2017-06-02 Thread Philipp Gortan
Thanks Philip,

I've created a pull request there -
https://github.com/patthoyts/git-gui/pull/12



signature.asc
Description: OpenPGP digital signature


Please i need your urgent and sincere reply

2017-06-02 Thread Mrs Gloria
Dear Good day,


I sent this mail praying for it to reach you in good health, since I
myself are in a very critical health condition in which I sleep every
night without knowing if I may be alive to see the next day. I am a
widow suffering from long time illness. I have some funds I inherited
from my late husband, my Doctor told me recently that I would not last
due to the illness. Having known my condition, I decided to donate
this fund to a good person that will utilize it the way i am going to
instruct herein. I need a very honest  person who can claim this money
and use it for Charity works, for orphanages, widows and also build
schools for less privilege .

I accept this decision because I do not have any child who will
inherit this money after I die.Please I want your sincerely and urgent
answer to know if you will be able to execute this project, and I will
give you more information on how the fund will be transferred to your
bank account.

I am waiting for your reply and  my private email address is
caronglori...@yahoo.com
Thank you,

Mrs Gloria


Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Sat, Jun 3, 2017 at 12:05 AM, Ben Peart  wrote:
>
>
> On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> Add a performance test for the new core.fsmonitor facility using the
>> sample query-fsmonitor hook.
>>
>> This is WIP code for the reasons explained in the setup comments,
>> unfortunately the perf code doesn't easily allow you to run different
>> setup code for different versions you're testing. This test will stop
>> working if the fsmonitor is merged into the master branch.
>>
>> Output against linxu.git:
>>
>>  $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
>> GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor
>> ./p7519-fsmonitor.sh
>>  [...]
>>  Test  origin/master avar/fsmonitor
>>
>> ---
>>  7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10)
>> +50.0%
>>  7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11)
>> +50.0%
>>  7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06)
>> +200.0%
>>  7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10)
>> +50.0%
>>
>
> With regular status times this low, the overhead of calling the hook +
> watchman + perl results in slower overall times as I noted in my initial
> cover letter.  If status calls are already this fast, fsmonitor + watchman
> isn't needed and obviously doesn't help.
>
> This does highlight an optimization I could add.  Even though -uno is
> passed, the fsmonitor code currently still goes through the work of marking
> the untracked entries as dirty.  I'll look at optimizing that out to speed
> status up when using that option.
>
>> And against a larger in-house monorepo I have here, with the same
>> options (except the repo path):
>>
>>  Test  origin/master avar/fsmonitor
>>
>> ---
>>  7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21)
>> +35.0%
>>  7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21)
>> +35.0%
>>  7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12)
>> +450.0%
>>  7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19)
>> +35.0%
>>
>> Against linux.git with a hack to flush the FS cache (on Linux) before
>> running the first 'git status', only running one test so the result
>> isn't discarded as the slowest of N:
>>
>
> I don't know know about on Linux but with Windows, when you flush the file
> system cache via unmount/mount, it causes Watchman to do a full scan with
> the next query.  This has a significant negative performance impact on the
> next status call as it returns the set of _all_ files which git then uses to
> find and mark all index and untracked cache entries as fsmonitor dirty then
> does a complete scan.  This combination makes the first status after these
> events slower than without Watchman.

I'm not sure, but I don't think this is a thing at all on Linux. How
inotify works is unrelated to whether some part of the filesystem
happens to be cached, and the echo+proc command I have is just
flushing all those pages. That I'm flushing all the pages at once
should be as much of a non-event as one page being claimed from the
fscache for use by the RAM allocation of some application.

I suspect Windows somehow conflates its fs watching implementation
with what on Linux is the inotify queue the application has to
consume, if that queue isn't consumed watchman will need to re-scan.

Or maybe my mental model of this whole thing is wrong and this also
happens on Linux...

> I'm currently working with the Watchman team to return a code indicating we
> should just scan everything ourselves to avoid the extra overhead, but that
> solution is TBD.  Given it only happens the first time a query is done on a
> new clone or the first time after watchman is started, I didn't intend to
> hold up the patch series for it but will submit an enhanced version once a
> solution is available.  The enhanced version will then be the same
> performance (for the first status call) as when not running with fsmonitor -
> not faster - as the state is unknown so must be gathered from the working
> directory.

I don't have time to update the perf test now or dig into it, but most
of what you're describing in this mail doesn't at all match with the
ad-hoc tests I ran in
https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vxd8nur...@mail.gmail.com/

There (at the very end of the E-Mail) I'm running watchman in a tight
loop while I flush the entire fs cache, its runtime is never longer
than 600ms, with 3ms being the norm.

I.e. flushing the cache doesn't slow things down much at all compared
to how long a "git status" takes from cold cache. Something else must
be going on, and the smoking gun is the gprof output I posted in the
follow-up E-Mail:

Re: git-gui ignores core.hooksPath

2017-06-02 Thread Philip Oakley

From: "Ævar Arnfjörð Bjarmason" 

On Fri, Jun 2, 2017 at 3:41 PM, Philipp Gortan  wrote:

Hi git devs,

First off, thanks for your awesome work!

I've been unhappy for quite a while that I had to configure the hooks
manually for each of my repos - until I found out recently that there is
the core.hooksPath config variable that (when set globally) allows me to
specify a hooks directory to be used for all my repositories.

Now I was happy - for a few minutes, until I tested this feature in
git-gui, and realized that it doesn't work there.

This seems to be caused by "proc githook_read", which says "set pchook
[gitdir hooks $hook_name]" instead of querying "git config
core.hooksPath" first - cf
https://github.com/git/git/blob/2cc2e70264e0fcba04f9ef791d144bbc8b501206/git-gui/git-gui.sh#L627

Would be great if this could get fixed...


Hi. I added core.hooksPath, glad to see it's useful to other people.

This indeed is something that should be fixed, but git-gui development
is managed outside of git.git, it's just occasionally pulled in. I'm
not what the best place to contact is, but I've CC'd
Philip Oakley who's been making recent commits to git-gui.git at
http://repo.or.cz/git-gui.git/


The proper maintainer for the git-gui is Pat Thoyts (cc'd) who now has a 
repo at https://github.com/patthoyts/git-gui where Pull requests can be 
made, with patches posted here initially for wider review.


I'm just another contributor, but I have managed (with a bit of help from 
google and formative years using FORTH;-) to get a few git-gui patches 
tested and incorporated. tcl/tk isn't that hard, its just interpreted 
code.., the main part, for me, was working out how to ru the code in the 
Git-for-Windows SDK environment (with help).

--
Philip 



Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Jeff King
On Sat, Jun 03, 2017 at 12:47:59AM +0200, Ulrich Mueller wrote:

> > On Fri, 2 Jun 2017, Jeff King wrote:
> 
> > The remaining question is whether we want to care about preserving the
> > system %Z for the local-timezone case.
> 
> No strong preference here. Maybe go for consistency, and have %Z
> always return the same format (either empty, or same as %z). That
> would at least prevent surprises when users switch from format-local
> to format.

It also a lot easier to implement, which is nice.

I agree on the least surprise thing, but the flipside of this is that
Git's use of strftime will behave differently than other programs on the
system (e.g., "date +%Z").

-Peff


Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Ulrich Mueller
> On Fri, 2 Jun 2017, Jeff King wrote:

> The remaining question is whether we want to care about preserving the
> system %Z for the local-timezone case.

No strong preference here. Maybe go for consistency, and have %Z
always return the same format (either empty, or same as %z). That
would at least prevent surprises when users switch from format-local
to format.

Ulrich


Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Jeff King
On Sat, Jun 03, 2017 at 12:04:32AM +0200, Ulrich Mueller wrote:

> Actually, the POSIX definition for %Z continues: "or by no bytes if no
> timezone information exists." So also returning an empty string would
> be compliant (but maybe not very helpful).
> [...]
> I agree that GMT+0200 could be misleading. But what about resolving %Z
> the same as %z in the case of the author's time zone, as was suggested
> earlier? It is supposed to be human-readable output, or do we expect
> that someone would use the %Z output and e.g. plug it back into their
> TZ?

Yeah, I think these are the only real contenders: an empty string, or
"+0200" (which _isn't_ confusing, because it doesn't have the
abbreviation in front of it, so it pretty clearly is an offset from
GMT).

I don't have a preference between the other two.

The remaining question is whether we want to care about preserving the
system %Z for the local-timezone case.

-Peff


Re: [WIP v2 2/2] pack-objects: support --blob-max-bytes

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 12:38:45PM -0700, Jonathan Tan wrote:

> +--blob-max-bytes=::
> + This option can only be used with --stdout. If specified, a blob
> + larger than this will not be packed unless a to-be-packed tree
> + has that blob with a filename beginning with ".git".  The size
> + can be suffixed with "k", "m", or "g", and may be "0".
> ++
> +If specified, after printing the packfile, pack-objects will print the
> +count of excluded blobs (8 bytes) . Subsequently, for each excluded blob
> +in hash order, pack-objects will print its hash (20 bytes) and its size
> +(8 bytes). All numbers are in network byte order.
> ++
> +If pack-objects cannot efficiently determine, for an arbitrary blob,
> +that no to-be-packed tree has that blob with a filename beginning with
> +".git" (for example, if bitmaps are used to calculate the objects to be
> +packed), it will pack all blobs regardless of size.

Hmm. So this feature can't be used with bitmaps at all?  That seems like
a big downside, as we still have to walk the whole graph to come up with
the list of blobs (even if we're sending them). That's 30-40 seconds of
CPU per clone on torvalds/linux. I imagine it's much worse on
repositories big enough to need a full GVFS-style "don't send any blobs"
approach.

We have a name-hash cache extension in the bitmap file, but it doesn't
carry enough information to deduce the .git-ness of a file. I don't
think it would be too hard to add a "flags" extension, and give a single
bit to "this is a .git file".

I do also wonder if the two features would need to be separated for a
GVFS-style solution. If you're not just avoiding large blobs but trying
to get a narrow clone, you don't want the .git files from the
uninteresting parts of the tree. You want to get no blobs at all, and
then fault them in as they become relevant due to user action.

-Peff


Re: [PATCH v4 0/6] Fast git status via a file system watcher

2017-06-02 Thread Jeff King
On Thu, Jun 01, 2017 at 02:13:10PM -0700, Stefan Beller wrote:

> >
> >>  $ git am /tmp/original_msg.txt
> >>  Applying: fsmonitor: add documentation for the fsmonitor extension.
> >>  error: patch failed: Documentation/githooks.txt:448
> >>  error: Documentation/githooks.txt: patch does not apply
> >>  Patch failed at 0001 fsmonitor: add documentation for the
> >> fsmonitor extension.
> >>  The copy of the patch that failed is found in:
> >> .git/rebase-apply/patch
> >>  When you have resolved this problem, run "git am --continue".
> >>  If you prefer to skip this patch, run "git am --skip" instead.
> >>  To restore the original branch and stop patching, run "git am
> >> --abort".
> 
> Try again with -3. (We should make that the default for am, maybe?)
> It helped me for most of the conflicts that I saw.

Yeah, I was going to suggest the same thing. "git apply" is much pickier
than "patch" about fuzzing the hunks (intentionally so, because the
results can sometimes end up not what the sender intended). But in most
cases "-3" can apply the patch without losing safety (unless the sender
built off a branch that you don't even have).

There's some more discussion of --3way as the default in

  
http://public-inbox.org/git/CAKwvOdn9j=_Ob=xq4ucn6ar1g537zniu9ox4if6o1qo7kpy...@mail.gmail.com/T/#u

-Peff


Re: [WIP v2 0/2] Modifying pack objects to support --blob-max-bytes

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 12:38:43PM -0700, Jonathan Tan wrote:

> > Do we need to future-proof the output format so that we can later
> > use 32-byte hash?  The input to pack-objects (i.e. rev-list --objects)
> > is hexadecimal text, and it may not be so bad to make this also
> > text, e.g. " SP  LF".  That way, you do not have to
> > worry about byte order, hash size, or length limited to uint64.
> 
> The reason for using binary is for the convenience of the client to
> avoid another conversion before storing it to disk (and also network
> savings). In a large repo, I think this list will be quite large. I
> realized that I didn't mention anything about this in the commit
> message, so I have added an explanation.
> 
> I think this is sufficiently future-proof in that the format of this
> hash matches the format of the hashes used in the objects in the
> packfile. As for object size being limited to uint64, I think the
> benefits of the space savings (in using binary) outweigh the small risk
> that our files will get larger than that before we upgrade our protocol
> :-P

The rest of the pack code uses a varint encoding which is generally
much smaller than a uint64 for most files, but can handle arbitrary
sizes.

The one thing it loses is that you wouldn't have a fixed-size record, so
if you were planning to dump this directly to disk and binary-search it,
that won't work. OTOH, you could make pseudo-pack-entries and just
index them along with the rest of the objects in the pack .idx.

The one subtle thing there is that the pseudo-entry would have to say
"this is my sha1". And then we'd end up repeating that sha1 in the .idx
file. So it's optimal on the network but wastes 20 bytes on disk (unless
index-pack throws away the in-pack sha1s as it indexes, which is
certainly an option).

> > Can this multiplication overflow (hint: st_mult)?
> 
> Thanks - used st_mult.

Actually, I think this is a good candidate for ALLOC_ARRAY().

> > This sorting is a part of external contract (i.e. "output in hash
> > order" is documented), but is it necessary?  Just wondering.
> 
> It is convenient for the client because the client can then store it
> directly and binary search when accessing it. (Added a note about
> convenience to the commit message.)

In general the Git client doesn't trust the pack data coming from a
remote, and you can't corrupt a client by sending it bogus data. Either
the client computes it from scratch (e.g., the sha1s of each object) or
the client will reject nonsense (missing bases, refs pointing to objects
that aren't sent, etc).

I know this feature implies a certain amount of trust (after all, the
server could claim that it omitted any sha1 it likes), but we should
probably still be as strict as possible that what the other side is
sending makes sense. In this case, we should probably hashcmp() each
entry with the last and make sure they're strictly increasing (no
out-of-order and no duplicates).

-Peff


Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread Ben Peart



On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote:

Add a performance test for the new core.fsmonitor facility using the
sample query-fsmonitor hook.

This is WIP code for the reasons explained in the setup comments,
unfortunately the perf code doesn't easily allow you to run different
setup code for different versions you're testing. This test will stop
working if the fsmonitor is merged into the master branch.

Output against linxu.git:

 $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
 7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
 7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%



With regular status times this low, the overhead of calling the hook + 
watchman + perl results in slower overall times as I noted in my initial 
cover letter.  If status calls are already this fast, fsmonitor + 
watchman isn't needed and obviously doesn't help.


This does highlight an optimization I could add.  Even though -uno is 
passed, the fsmonitor code currently still goes through the work of 
marking the untracked entries as dirty.  I'll look at optimizing that 
out to speed status up when using that option.



And against a larger in-house monorepo I have here, with the same
options (except the repo path):

 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
 7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%

Against linux.git with a hack to flush the FS cache (on Linux) before
running the first 'git status', only running one test so the result
isn't discarded as the slowest of N:



I don't know know about on Linux but with Windows, when you flush the 
file system cache via unmount/mount, it causes Watchman to do a full 
scan with the next query.  This has a significant negative performance 
impact on the next status call as it returns the set of _all_ files 
which git then uses to find and mark all index and untracked cache 
entries as fsmonitor dirty then does a complete scan.  This combination 
makes the first status after these events slower than without Watchman.


I'm currently working with the Watchman team to return a code indicating 
we should just scan everything ourselves to avoid the extra overhead, 
but that solution is TBD.  Given it only happens the first time a query 
is done on a new clone or the first time after watchman is started, I 
didn't intend to hold up the patch series for it but will submit an 
enhanced version once a solution is available.  The enhanced version 
will then be the same performance (for the first status call) as when 
not running with fsmonitor - not faster - as the state is unknown so 
must be gathered from the working directory.



 $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='sudo sync 
&& echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run 
origin/master avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test  origin/master avar/fsmonitor
 
 7519.2: status (first)0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
 7519.4: status -uno   0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
 7519.5: status -uall  0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%

Now obviously due to 1 run that has a lot of noise, but I would expect
that first invocation to be blindingly fast since watchman has info on
what files were modified since the cache was flushed.



Every (first) run of the performance test will be very expensive for the 
reasons outlined above.  This is clearest when you only have 1 run as it 
doesn't get masked by the faster 2nd+ runs.  That first run will never 
be "blindingly fast" as git has to gather the initial state and save out 
the cache - it's all the subsequent calls that will be faster.



The same on the large monorepo noted above:

 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
 7519.3: status (subsequent)   0.20(0.10+0.19)   

Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Ulrich Mueller
> On Fri, 2 Jun 2017, Jeff King wrote:

> On Fri, Jun 02, 2017 at 07:25:43PM +0200, René Scharfe wrote:
>> On Linux "%z %Z" is expanded to "+0200 CEST" for me, while on Windows I
>> get "Mitteleurop▒ische Sommerzeit Mitteleurop▒ische Sommerzeit".  (That
>> "▒" is probably supposed to be an "ä".)  POSIX requires  +hhmm or -hhmm
>> format for %z, and for %Z is to be "Replaced by the timezone name or
>> abbreviation".

Actually, the POSIX definition for %Z continues: "or by no bytes if no
timezone information exists." So also returning an empty string would
be compliant (but maybe not very helpful).

>> I'd say "GMT+0200" etc. is a nice enough timezone name, i.e. having %Z
>> resolve to the same as %z plus a literal prefix of "GMT" should at least
>> not be wrong.

> I thought that, too, but I think it is wrong based on my understanding
> of how $TZ is parsed. There something like "EDT-4" means "call this EDT,
> and by the way it is 4 hours behind GMT".

> So what you're proposing isn't wrong per se, but your notation means
> something totally different than what similar-looking notation looks
> like on the $TZ end, which is bound to create confusion.

I agree that GMT+0200 could be misleading. But what about resolving %Z
the same as %z in the case of the author's time zone, as was suggested
earlier? It is supposed to be human-readable output, or do we expect
that someone would use the %Z output and e.g. plug it back into their
TZ?

>> Alternatively we could have a lookup table mapping a few typical offsets
>> to timezone names, but e.g. handling daylight saving times would
>> probably be too hard (when did that part of the world switch in the
>> given year?  north or south of the equator?)..

IMHO maintaining such a local table of timezones won't fly.

> Right, I don't think the mapping of zone to offset is reversible,
> because many zones map to the same offset. If I tell you I'm in -0500,
> even just in the US that could mean Eastern Standard Time (winter, no
> DST) or Central Daylight Time (summer, DST). Not to mention that other
> political entities in the same longitude have their own zones which do
> DST at different times (or were even established as zones at different
> times; historical dates need to use the zones as they were at that
> time).

Same here, my +0200 offset could be anything of CAT, CEST, EET, IST,
SAST, or WAST, according to IANA timezone data. It's a one-directional
mapping, and there's no way to get the author's /etc/localtime info
(or whatever its equivalent is on other systems) back from the offset
stored in the commit. A timezone name may not even exist at all for a
given [+-]hhmm offset.

Ulrich


Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 11:23:30AM +0900, Junio C Hamano wrote:

> René Scharfe  writes:
> 
> > Am 27.05.2017 um 23:46 schrieb Jeff King:
> >> On Sat, May 27, 2017 at 06:57:08PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> 
> >>> There's another test which breaks if we just s/gmtime/localtime/g. As
> >>> far as I can tell to make the non-local case work we'd need to do a
> >>> whole dance where we set the TZ variable to e.g. UTC$offset, then call
> >>> strftime(), then call it again. Maybe there's some way to just specify
> >>> the tz offset, but I didn't find any in a quick skimming of time.h.
> >> 
> >> There isn't.
> > Right.  We could handle %z internally, though.  %Z would be harder (left
> > as an exercise for readers..).
> >
> > First we'd have to undo 0a0416a3 (strbuf_expand: convert "%%" to "%"),
> > though, in order to give full control back to strbuf_expand callbacks.
> >
> > 2-pack patch:
> 
> I think the list concensus is that handling %z ourselves like this
> one does is the best we can do portably.

I've actually been turning it over in my head. This feels hacky, but I'm
not sure if we can do better.

In theory the solution is:

  1. Start using localtime() instead of gmtime() with an adjustment when
 we are converting to the local timezone (i.e., format-local). We
 should be able to do this portably.

 This is easy to do, and it's better than handling %z ourselves,
 because it makes %Z work, too.

  2. When showing the author's timezone, do some trickery to set the
 program's timezone, then use localtime(), then restore the program
 timezone.

 I couldn't get this to work reliably. And anyway, we'd still have
 nothing to put in %Z since we don't have a timezone name at all in
 the git objects. We just have "+0400" or whatever.

So I don't see a portable way to make (2) work. But it seems a shame
that %Z does not work for case (1) with René's patch.

I guess we could do (1) for the local cases and then handle "%z"
ourselves otherwise. That sounds even _more_ confusing, but it at least
gets the most cases right.

If we do handle "%z" ourselves (either always or for just the one case),
what should the matching %Z say? Right now (and I think with René's
patch) it says GMT, which is actively misleading. We should probably
replace it with the same text as "%z". That's not quite what the user
wanted, but at least it's accurate.

> > --- >8 ---
> >  builtin/cat-file.c |  5 +
> >  convert.c  |  1 +
> >  daemon.c   |  3 +++
> >  date.c |  2 +-
> >  ll-merge.c |  5 +++--
> >  pretty.c   |  3 +++
> >  strbuf.c   | 39 ++-
> >  strbuf.h   | 11 +--
> >  t/t6006-rev-list-format.sh | 12 
> >  9 files changed, 63 insertions(+), 18 deletions(-)

As far as the patch itself goes, I'm disappointed to lose the automatic
"%" handling for all of the other callers. But I suspect the boilerplate
involved in any solution that lets callers choose whether or not to use
it would end up being longer than just handling it in each caller.

-Peff


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Linus Torvalds
On Fri, Jun 2, 2017 at 1:17 PM, demerphq  wrote:
> Most hash function implementations have code like the following
> (extracted and reduced from hv_macro.h in perl.git [which only
> supports little-endian hash functions]):

Yes.

Please do *not* try to make things overly portable by adding random
memcpy() functions.

Yes, many compilers will do ok with it, and generate the right code
(single load from an unaligned address). But many won't. Gcc
completely screws it up in older versions on ARM, for example.

Dereferencing an unaligned pointer may be "undefined" in some
technical meaning, but it sure as hell isn't undefined in reality, and
compilers that willfully do stupid things should not be catered to
overly. Reality is a lot more important.

And I think gcc can be tweaked to use "movbe" on x86 with the right
magic incantation (ie not just __builtin_bswap32(), but also the
switch to enable the new instructions).  So having code to detect gcc
and using __builtin_bswap32() would indeed be a good thing.

 Linus


RE: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread David Turner
BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is available 
over bittorrent at:
http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

I tried Ævar's perf test with that (on a beefy laptop with SSD), and got 
significantly slower results with bp/fsmonitor:
Test  origin/master bp/fsmonitor   
---
7519.2: status (first)0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%  
7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8% 
7519.4: status -uno   0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
7519.5: status -uall  0.49(0.22+0.56)   0.62(0.36+0.55) +26.5% 
7519.2: status (first)0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%  
7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8% 
7519.4: status -uno   0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
7519.5: status -uall  0.49(0.22+0.56)   0.62(0.36+0.55) +26.5% 

I have not yet looked into why this is.

> -Original Message-
> From: Ævar Arnfjörð Bjarmason [mailto:ava...@gmail.com]
> Sent: Friday, June 2, 2017 6:29 AM
> To: git@vger.kernel.org
> Cc: Junio C Hamano ; Ben Peart
> ; Nguyễn Thái Ngọc Duy ;
> Johannes Schindelin ; David Turner
> ; Jeff King ; Christian
> Couder ; Ævar Arnfjörð Bjarmason
> 
> Subject: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
> 
> Add a performance test for the new core.fsmonitor facility using the sample
> query-fsmonitor hook.
> 
> This is WIP code for the reasons explained in the setup comments,
> unfortunately the perf code doesn't easily allow you to run different setup
> code for different versions you're testing. This test will stop working if the
> fsmonitor is merged into the master branch.
> 
> Output against linxu.git:
> 
> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-
> fsmonitor.sh
> [...]
> Test  origin/master avar/fsmonitor
> ---
> 7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
> 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
> 7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
> 7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%
> 
> And against a larger in-house monorepo I have here, with the same options
> (except the repo path):
> 
> Test  origin/master avar/fsmonitor
> ---
> 7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
> 7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%
> 
> Against linux.git with a hack to flush the FS cache (on Linux) before running
> the first 'git status', only running one test so the result isn't discarded 
> as the
> slowest of N:
> 
> $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee
> /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run origin/master
> avar/fsmonitor ./p7519-fsmonitor.sh
> [...]
> Test  origin/master avar/fsmonitor
> 
> 7519.2: status (first)0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
> 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
> 7519.4: status -uno   0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
> 7519.5: status -uall  0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%
> 
> Now obviously due to 1 run that has a lot of noise, but I would expect that
> first invocation to be blindingly fast since watchman has info on what files
> were modified since the cache was flushed.
> 
> The same on the large monorepo noted above:
> 
> Test  origin/master avar/fsmonitor
> ---
> 7519.2: status (first)0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
> 7519.3: status (subsequent)   0.20(0.10+0.19)   0.28(0.16+0.20) +40.0%
> 7519.4: status -uno   0.04(0.04+0.09)   0.11(0.08+0.12) +175.0%
> 7519.5: status -uall  0.29(0.11+0.18)   0.40(0.16+0.19) +37.9%
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> 
> On Fri, Jun 2, 2017 at 2:40 AM, Ben Peart  wrote:
> > Any chance you can provide me with a bash 

Re: [PATCH v2] docs: fix formatting and grammar

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 10:45:38AM +0900, Junio C Hamano wrote:

> > ...I should have read to the end of the sentence. It should also be "in
> > the `$GIT_DIR/remotes/` file". Or just drop "file".
> 
> There is another one nearby.  Here is what I understand as your
> suggestion (the "just drop" variant), which I'll queue as SQUASH???
> on top of Adam's patch.

Yes, that's exactly what I meant (and the other looks good, too).
Thanks.

-Peff


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 10:17 PM, demerphq  wrote:
> On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
>> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
>>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
 On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> I looked into this some more. It turns out it is possible to trigger
>> undefined behavior on "next". Here's what I did:
>> ...
>>
>> This "fixes" the problem:
>> ...
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 3dff80a..d6f4c44 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -66,9 +66,9 @@
>> ...
>> With this diff, various tests which seem relevant for SHA-1 pass,
>> including t0013, and the UBSan-error is gone. The second diff is just
>> a monkey-patch. I have no reason to believe I will be able to come up
>> with a proper and complete patch for sha1dc. And I guess such a thing
>> would not really be Git's patch to carry, either. But at least Git
>> could consider whether to keep relying on undefined behavior or not.
>>
>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>> web interface... Sorry about that.
>
> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
> that the final "fix" would come from his sha1collisiondetection
> repository via Ævar.
>
> In the meantime, I am wondering if it makes sense to merge the
> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
> would at least unblock those on platforms v2.13.0 did not work
> correctly at all.
>
> Ævar, thoughts?

 I think we're mixing up several things here, which need to be untangled:

 1) The sha1dc works just fine on most platforms even with undefined
 behavior, as evidenced by 2.13.0 working.
>>>
>>> Right, with "platform" meaning "combination of hardware-architecture
>>> and compiler". Nothing can be said about how the current code behaves
>>> on "x86". Such statements can only be made with regard to "x86 and
>>> this or that compiler". Even then, short of studying the compiler
>>> implementation/documentation in detail, one cannot be certain that
>>> seemingly unrelated changes in Git don't make the code do something
>>> else entirely.
>>
>> I think you're veering into a theoretical discussion here that has
>> little to no bearing on the practicalities involved here.
>>
>> Yes if something is undefined behavior in C the compiler &
>> architecture is free to do anything they want with it. In practice
>> lots of undefined behavior is de-facto standardized across various
>> platforms.
>>
>> As far as I can tell unaligned access is one of those things. I don't
>> think there's ever been an x86 chip / compiler that would run this
>> code with any semantic differences when it comes to unaligned access,
>> and such a chip / compiler is unlikely to ever exist.
>>
>> I'm not advocating that we rely on undefined behavior willy-nilly,
>> just that we should consider the real situation is (i.e. what actual
>> architectures / compilers are doing or are likely to do) as opposed to
>> the purely theoretical (if you gave a bunch of aliens who'd never
>> heard of our technology the ANSI C standard to implement from
>> scratch).
>>
>> Here's a performance test of your patch above against p3400-rebase.sh.
>> I don't know how much these error bars from t/perf can be trusted.
>> This is over 30 runs with -O3:
>>
>> - 3400.2: rebase on top of a lot of unrelated changes
>>   v2.12.0 : 1.25(1.10+0.06)
>>   v2.13.0 : 1.21(1.06+0.06) -3.2%
>>   origin/next : 1.22(1.04+0.07) -2.4%
>>   martin: 1.23(1.06+0.07) -1.6%
>> - 3400.4: rebase a lot of unrelated changes without split-index
>>   v2.12.0 : 6.49(3.60+0.52)
>>   v2.13.0 : 8.21(4.18+0.55) +26.5%
>>   origin/next : 8.27(4.34+0.64) +27.4%
>>   martin: 8.80(4.36+0.62) +35.6%
>> - 3400.6: rebase a lot of unrelated changes with split-index
>>   v2.12.0 : 6.77(3.56+0.51)
>>   v2.13.0 : 4.09(2.67+0.38) -39.6%
>>   origin/next : 4.13(2.70+0.36) -39.0%
>>   martin: 4.30(2.80+0.32) -36.5%
>>
>> And just your patch v.s. next:
>>
>> - 3400.2: rebase on top of a lot of unrelated changes
>>   origin/next : 1.22(1.06+0.06)
>>   martin  : 1.22(1.06+0.05) +0.0%
>> - 3400.4: rebase a lot of unrelated changes without split-index
>>   origin/next : 7.54(4.13+0.60)
>>   martin  : 7.75(4.34+0.67) +2.8%
>> - 3400.6: rebase a lot of unrelated changes with split-index
>>   origin/next : 4.19(2.92+0.31)
>>   martin  : 4.14(2.84+0.39) -1.2%
>>
>> It seems to be a bit slower, is that speedup worth the use of
>> unaligned access? I genuinely don't know. I'm just 

Re: [PATCH] respect core.hooksPath, falling back to .git/hooks

2017-06-02 Thread Philipp Gortan
Dear Philip,

the previous mail contains a patch against the master of
http://repo.or.cz/git-gui.git

Could you please review it? I am not a TCL developer, so please take
extra care!

Thanks,
Philipp



signature.asc
Description: OpenPGP digital signature


Re: pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 7:54 PM, Jonathan Nieder  wrote:
> Hi Dscho,
>
> Johannes Schindelin wrote:
>> On Thu, 1 Jun 2017, Stefan Beller wrote:
>
>>> We had a discussion off list how much of the test suite is in bad shape,
>>> and "$ git grep ^index" points out a lot of places as well.
>>
>> Maybe we should call out a specific month (or even a longer period) during
>> which we try to push toward that new hash function, and focus more on
>> those tasks (and on critical bug fixes, if any) than anything else.
>
> Thanks for offering. ;-)
>
> Here's a rough list of some useful tasks, in no particular order:
>
> 1. bc/object-id: This patch series continues, eliminating assumptions
>about the size of object ids by encapsulating them in a struct.
>One straightforward way to find code that still needs to be
>converted is to grep for "sha" --- often the conversion patches
>change function and variable names to refer to oid_ where they used
>to use sha1_, making the stragglers easier to spot.
>
> 2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
>t00* make assumptions about the exact values of object ids.  That's
>bad for maintainability for other reasons beyond the hash function
>transition, too.
>
>It should be possible to suss them out by patching git's sha1
>routine to use the ones-complement of sha1 (~sha1) instead and
>seeing which tests fail.

I just hacked this up locally. While a lot of stuff fails, it's not
exactly an out of control garbage fire and could be churned through by
someone interested. A lot of it's just lazy sha1 hardcoding for no
good reason where we could use a tag, or e.g. test_cmp on ls-tree
output without the test really caring about the hash. Things that
really need to test the sha1 could be guarded by some new prereq.

Both of my attempts to fuzz SHA1DCInit though resulted in some
segfaults, I tried both changing "0x" to "~0x" in the ihv assignment,
and just calling SHA1DCUpdate(ctx, "x", 1) at the end of the function,
not sure why that's happening.

If someone knows offhand what I'm doing wrong here I might be
interested in going through this if I don't have to sift through the
segfaults. I.e. what's an easy hack to make all of git pretend that
"foo" hashes to "xfoo", "" to "x" etc.

> 4. When choosing a hash function, people may argue about performance.
>It would be useful for run some benchmarks for git (running
>the test suite, t/perf tests, etc) using a variety of hash
>functions as input to such a discussion.

To the extent that such benchmarks matter, it seems prudent to heavily
weigh them in favor of whatever seems to be likely to be the more
common hash function going forward, since those are likely to get
faster through future hardware acceleration.

E.g. Intel announced Goldmont last year which according to one SHA-1
implementation improved from 9.5 cycles per byte to 2.7 cpb[1]. They
only have acceleration for SHA-1 and SHA-256[2]

1. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385

2. https://en.wikipedia.org/wiki/Goldmont


Re: Unaligned accesses in sha1dc

2017-06-02 Thread demerphq
On 2 June 2017 at 22:14, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 10:11 PM, Martin Ågren  wrote:
>> On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
>>> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  
>>> wrote:
 On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
>> Martin Ågren  writes:
>>
>>> I looked into this some more. It turns out it is possible to trigger
>>> undefined behavior on "next". Here's what I did:
>>> ...
>>>
>>> This "fixes" the problem:
>>> ...
>>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>>> index 3dff80a..d6f4c44 100644
>>> --- a/sha1dc/sha1.c
>>> +++ b/sha1dc/sha1.c
>>> @@ -66,9 +66,9 @@
>>> ...
>>> With this diff, various tests which seem relevant for SHA-1 pass,
>>> including t0013, and the UBSan-error is gone. The second diff is just
>>> a monkey-patch. I have no reason to believe I will be able to come up
>>> with a proper and complete patch for sha1dc. And I guess such a thing
>>> would not really be Git's patch to carry, either. But at least Git
>>> could consider whether to keep relying on undefined behavior or not.
>>>
>>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>>> web interface... Sorry about that.
>>
>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>> that the final "fix" would come from his sha1collisiondetection
>> repository via Ævar.
>>
>> In the meantime, I am wondering if it makes sense to merge the
>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>> would at least unblock those on platforms v2.13.0 did not work
>> correctly at all.
>>
>> Ævar, thoughts?
>
> I think we're mixing up several things here, which need to be untangled:
>
> 1) The sha1dc works just fine on most platforms even with undefined
> behavior, as evidenced by 2.13.0 working.

 Right, with "platform" meaning "combination of hardware-architecture
 and compiler". Nothing can be said about how the current code behaves
 on "x86". Such statements can only be made with regard to "x86 and
 this or that compiler". Even then, short of studying the compiler
 implementation/documentation in detail, one cannot be certain that
 seemingly unrelated changes in Git don't make the code do something
 else entirely.
>>>
>>> I think you're veering into a theoretical discussion here that has
>>> little to no bearing on the practicalities involved here.
>>>
>>> Yes if something is undefined behavior in C the compiler &
>>> architecture is free to do anything they want with it. In practice
>>> lots of undefined behavior is de-facto standardized across various
>>> platforms.
>>>
>>> As far as I can tell unaligned access is one of those things. I don't
>>> think there's ever been an x86 chip / compiler that would run this
>>> code with any semantic differences when it comes to unaligned access,
>>> and such a chip / compiler is unlikely to ever exist.
>>>
>>> I'm not advocating that we rely on undefined behavior willy-nilly,
>>> just that we should consider the real situation is (i.e. what actual
>>> architectures / compilers are doing or are likely to do) as opposed to
>>> the purely theoretical (if you gave a bunch of aliens who'd never
>>> heard of our technology the ANSI C standard to implement from
>>> scratch).
>>
>> Yeah, that's an argument. I just thought I'd provide whatever input I
>> could, albeit in text form. The only thing that matters in the end is
>> that you (the Git project) feel that you make the correct decision,
>> possibly going beyond "theoretical" reasoning into engineering-land.
>
> I forgot to note, I think it would be very useful if you could submit
> that patch of yours in cleaned up form to the upstream sha1dc project:
> https://github.com/cr-marcstevens/sha1collisiondetection
>
> They might be interested in taking it, even if it's guarded by some
> macro "don't do unaligned access even on archs that seem OK with it".
>
> My comments are just focusing on this in the context of whether we
> should be hotfixing our copy due to an issue in the wild, like e.g.
> the SPARC issue.

A good way to get the sha1dc project properly tests on all platforms
would be to wrap it in a cpan distribution and let cpants (cpan
testers) test it for you on all the platforms under the sun.

In the Sereal project we found and fixed many portability issues with
the csnappy code simply because there are people testing modules in
the cpan world on every platform you can think of, and a few you might
be surprised to find out people still use.

Yves

Yves


-- 
perl 

[PATCH] respect core.hooksPath, falling back to .git/hooks

2017-06-02 Thread Philipp Gortan
Signed-off-by: Philipp Gortan 
---

The following patch tries to fix git-gui to respect the core.hooksPath config
variable, falling back to the old behavior.

git-gui.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/git-gui.sh b/git-gui.sh
index 5bc21b8..a5335b1 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -624,7 +624,10 @@ proc git_write {args} {
 }
 
 proc githook_read {hook_name args} {
-   set pchook [gitdir hooks $hook_name]
+   if {[catch {set hooksdir [git config core.hooksPath]}]} {
+   set hooksdir [gitdir hooks]
+   }
+   set pchook [file join $hooksdir $hook_name]
lappend args 2>@1
 
# On Windows [file executable] might lie so we need to ask
-- 
2.13.0



Re: Unaligned accesses in sha1dc

2017-06-02 Thread demerphq
On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
>>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
 Martin Ågren  writes:

> I looked into this some more. It turns out it is possible to trigger
> undefined behavior on "next". Here's what I did:
> ...
>
> This "fixes" the problem:
> ...
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 3dff80a..d6f4c44 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -66,9 +66,9 @@
> ...
> With this diff, various tests which seem relevant for SHA-1 pass,
> including t0013, and the UBSan-error is gone. The second diff is just
> a monkey-patch. I have no reason to believe I will be able to come up
> with a proper and complete patch for sha1dc. And I guess such a thing
> would not really be Git's patch to carry, either. But at least Git
> could consider whether to keep relying on undefined behavior or not.
>
> There's a fair chance I've mangled the whitespace. I'm using gmail's
> web interface... Sorry about that.

 Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
 that the final "fix" would come from his sha1collisiondetection
 repository via Ævar.

 In the meantime, I am wondering if it makes sense to merge the
 earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
 SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
 would at least unblock those on platforms v2.13.0 did not work
 correctly at all.

 Ævar, thoughts?
>>>
>>> I think we're mixing up several things here, which need to be untangled:
>>>
>>> 1) The sha1dc works just fine on most platforms even with undefined
>>> behavior, as evidenced by 2.13.0 working.
>>
>> Right, with "platform" meaning "combination of hardware-architecture
>> and compiler". Nothing can be said about how the current code behaves
>> on "x86". Such statements can only be made with regard to "x86 and
>> this or that compiler". Even then, short of studying the compiler
>> implementation/documentation in detail, one cannot be certain that
>> seemingly unrelated changes in Git don't make the code do something
>> else entirely.
>
> I think you're veering into a theoretical discussion here that has
> little to no bearing on the practicalities involved here.
>
> Yes if something is undefined behavior in C the compiler &
> architecture is free to do anything they want with it. In practice
> lots of undefined behavior is de-facto standardized across various
> platforms.
>
> As far as I can tell unaligned access is one of those things. I don't
> think there's ever been an x86 chip / compiler that would run this
> code with any semantic differences when it comes to unaligned access,
> and such a chip / compiler is unlikely to ever exist.
>
> I'm not advocating that we rely on undefined behavior willy-nilly,
> just that we should consider the real situation is (i.e. what actual
> architectures / compilers are doing or are likely to do) as opposed to
> the purely theoretical (if you gave a bunch of aliens who'd never
> heard of our technology the ANSI C standard to implement from
> scratch).
>
> Here's a performance test of your patch above against p3400-rebase.sh.
> I don't know how much these error bars from t/perf can be trusted.
> This is over 30 runs with -O3:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   v2.12.0 : 1.25(1.10+0.06)
>   v2.13.0 : 1.21(1.06+0.06) -3.2%
>   origin/next : 1.22(1.04+0.07) -2.4%
>   martin: 1.23(1.06+0.07) -1.6%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   v2.12.0 : 6.49(3.60+0.52)
>   v2.13.0 : 8.21(4.18+0.55) +26.5%
>   origin/next : 8.27(4.34+0.64) +27.4%
>   martin: 8.80(4.36+0.62) +35.6%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   v2.12.0 : 6.77(3.56+0.51)
>   v2.13.0 : 4.09(2.67+0.38) -39.6%
>   origin/next : 4.13(2.70+0.36) -39.0%
>   martin: 4.30(2.80+0.32) -36.5%
>
> And just your patch v.s. next:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   origin/next : 1.22(1.06+0.06)
>   martin  : 1.22(1.06+0.05) +0.0%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   origin/next : 7.54(4.13+0.60)
>   martin  : 7.75(4.34+0.67) +2.8%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   origin/next : 4.19(2.92+0.31)
>   martin  : 4.14(2.84+0.39) -1.2%
>
> It seems to be a bit slower, is that speedup worth the use of
> unaligned access? I genuinely don't know. I'm just interested to find
> what if anything we need to urgently fix in a release version of git.
>
> One data point there is that the fallback blk-sha1 implementation
> we've shipped 

Re: [PATCH 0/3] Use skip_prefix() in handle_revision_{,pseudo_}opt()

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 09:10:07PM +0200, SZEDER Gábor wrote:

> While at it, the first one fixes a minor bug, which allowed e.g. 'git
> log --no-min-parents-foobarbaz' to succeed.
> 
> The other two are fairly straightforward starts_with() ->
> skip_prefix() conversions.

These all look fine to me. It would be nice to address the extra cases I
mentioned on top, but even without that this is a strict improvement.

-Peff


Re: [PATCH 2/3] revision.c: use skip_prefix() in handle_revision_opt()

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 04:11:43PM -0400, Jeff King wrote:

>   if (match_opt(arg, "--early-output"), )) {
>   int count = optarg ? atoi(optarg) : 100;
>   ...
>   }
> 
> which is a little nicer and could maybe help other options (I didn't see
> any, though).

I take it back. This would help --show-linear-break that your patch also
touches. And maybe --pretty, too.

-Peff


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 10:11 PM, Martin Ågren  wrote:
> On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
>> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
>>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
 On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> I looked into this some more. It turns out it is possible to trigger
>> undefined behavior on "next". Here's what I did:
>> ...
>>
>> This "fixes" the problem:
>> ...
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 3dff80a..d6f4c44 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -66,9 +66,9 @@
>> ...
>> With this diff, various tests which seem relevant for SHA-1 pass,
>> including t0013, and the UBSan-error is gone. The second diff is just
>> a monkey-patch. I have no reason to believe I will be able to come up
>> with a proper and complete patch for sha1dc. And I guess such a thing
>> would not really be Git's patch to carry, either. But at least Git
>> could consider whether to keep relying on undefined behavior or not.
>>
>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>> web interface... Sorry about that.
>
> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
> that the final "fix" would come from his sha1collisiondetection
> repository via Ævar.
>
> In the meantime, I am wondering if it makes sense to merge the
> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
> would at least unblock those on platforms v2.13.0 did not work
> correctly at all.
>
> Ævar, thoughts?

 I think we're mixing up several things here, which need to be untangled:

 1) The sha1dc works just fine on most platforms even with undefined
 behavior, as evidenced by 2.13.0 working.
>>>
>>> Right, with "platform" meaning "combination of hardware-architecture
>>> and compiler". Nothing can be said about how the current code behaves
>>> on "x86". Such statements can only be made with regard to "x86 and
>>> this or that compiler". Even then, short of studying the compiler
>>> implementation/documentation in detail, one cannot be certain that
>>> seemingly unrelated changes in Git don't make the code do something
>>> else entirely.
>>
>> I think you're veering into a theoretical discussion here that has
>> little to no bearing on the practicalities involved here.
>>
>> Yes if something is undefined behavior in C the compiler &
>> architecture is free to do anything they want with it. In practice
>> lots of undefined behavior is de-facto standardized across various
>> platforms.
>>
>> As far as I can tell unaligned access is one of those things. I don't
>> think there's ever been an x86 chip / compiler that would run this
>> code with any semantic differences when it comes to unaligned access,
>> and such a chip / compiler is unlikely to ever exist.
>>
>> I'm not advocating that we rely on undefined behavior willy-nilly,
>> just that we should consider the real situation is (i.e. what actual
>> architectures / compilers are doing or are likely to do) as opposed to
>> the purely theoretical (if you gave a bunch of aliens who'd never
>> heard of our technology the ANSI C standard to implement from
>> scratch).
>
> Yeah, that's an argument. I just thought I'd provide whatever input I
> could, albeit in text form. The only thing that matters in the end is
> that you (the Git project) feel that you make the correct decision,
> possibly going beyond "theoretical" reasoning into engineering-land.

I forgot to note, I think it would be very useful if you could submit
that patch of yours in cleaned up form to the upstream sha1dc project:
https://github.com/cr-marcstevens/sha1collisiondetection

They might be interested in taking it, even if it's guarded by some
macro "don't do unaligned access even on archs that seem OK with it".

My comments are just focusing on this in the context of whether we
should be hotfixing our copy due to an issue in the wild, like e.g.
the SPARC issue.


Re: [PATCH] perf: work around the tested repo having an index.lock

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 8:45 PM, Jeff King  wrote:
> On Fri, Jun 02, 2017 at 10:33:30AM +, Ævar Arnfjörð Bjarmason wrote:
>
>> When the tested repo has an index.lock file it should be removed. This
>> file may be present if e.g. git-status previously crashed in that
>> repo, and it will make a lot of git commands fail. Let's try harder
>> and remove the lock.
>
> If your git-status is crashing, you probably have bigger problems (and
> need to clean up the original, too).
>
> But I think a more compelling case is that there may be an ongoing
> operation in the original repo (e.g., say you are in the middle of
> writing a commit message) when we do a blind copy of the filesystem
> contents. You might racily pick up a lockfile.
>
> Should we find and delete all *.lock files in the copied directory? That
> would get ref locks, etc. Half-formed object files are OK. Technically
> if you want to get an uncorrupted repository you'd also want to copy
> refs before objects (in case somebody makes a new object and updates a
> ref while you're copying).
>
> I don't know how careful it's worth being. I don't really _object_ to
> this patch exactly, but it does seem like it's picking up one random
> case (that presumably you hit) and ignoring all of the related cases.

It's not a perfect solution, but it seemed not to cause any harm, and
would allow us to just do what you mean. I can't think of a case where
we'd care to preserve the index.lock in our perf copy, of course the
test may fail due to various other reasons, but at least it won't be
due to this reason.

This is just something I happened to run into locally because I had a
stale index.lock, but FWIW at work I have a git updater running on
tens of thousands of machines that has to handle various edge cases
(e.g. the machine ran out of disk space in the middle of an update, or
something was kill -9'd).

The leftover index.lock is quite common, the second most common "index
is hosed" error is "fatal: index file smaller than expected", but
solving that is more invasive, you need to rm the index and reset
--hard.


Re: [PATCH 2/3] revision.c: use skip_prefix() in handle_revision_opt()

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 09:10:09PM +0200, SZEDER Gábor wrote:

> @@ -1785,15 +1785,15 @@ static int handle_revision_opt(struct rev_info *revs, 
> int argc, const char **arg
>   } else if (!strcmp(arg, "--author-date-order")) {
>   revs->sort_order = REV_SORT_BY_AUTHOR_DATE;
>   revs->topo_order = 1;
> - } else if (starts_with(arg, "--early-output")) {
> + } else if (skip_prefix(arg, "--early-output", )) {
>   int count = 100;
> - switch (arg[14]) {
> + switch (*optarg) {
>   case '=':
> - count = atoi(arg+15);
> + count = atoi(optarg + 1);
>   /* Fallthrough */
>   case 0:
>   revs->topo_order = 1;
> -revs->early_output = count;
> + revs->early_output = count;
>   }

What happens if I say "--early-output-foobar"? There should probably be
a "default" here that rejects it. Though we'd probably to goto to get to
the unknown block, yuck.

Perhaps we could do:

  if (skip_prefix(arg, "--early-output", ) &&
  (*optarg == '=' || !*optarg)) {
  int count = *optarg ? atoi(optarg + 1) : 100;
  revs->topo_order = 1;
  revs->early_output = count;
  }

Alternatively, a helper like:

  int match_opt(const char *have, const char *want, const char **argout)
  {
const char *arg;
if (!skip_prefix(have, want, ))
return 0;
if (!*arg)
*argout = NULL;
else if (*arg == '=')
*argout = arg + 1;
else
return 0;
return 1;
  }

would let us do:

  if (match_opt(arg, "--early-output"), )) {
int count = optarg ? atoi(optarg) : 100;
...
  }

which is a little nicer and could maybe help other options (I didn't see
any, though). If we're going to go that route, though, I suspect there
may be some helpers we already have. Looks like parse_long_opt() is
almost there, but doesn't handle options. I wonder if we could reuse
bits of parse-options here (or even better, just parse-optify many of
these).

Anyway, none of that is caused by your patch, but at least doing the
minimal fix (my first hunk) seems like it fits into your series.

-Peff


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Martin Ågren
On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
>>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
 Martin Ågren  writes:

> I looked into this some more. It turns out it is possible to trigger
> undefined behavior on "next". Here's what I did:
> ...
>
> This "fixes" the problem:
> ...
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 3dff80a..d6f4c44 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -66,9 +66,9 @@
> ...
> With this diff, various tests which seem relevant for SHA-1 pass,
> including t0013, and the UBSan-error is gone. The second diff is just
> a monkey-patch. I have no reason to believe I will be able to come up
> with a proper and complete patch for sha1dc. And I guess such a thing
> would not really be Git's patch to carry, either. But at least Git
> could consider whether to keep relying on undefined behavior or not.
>
> There's a fair chance I've mangled the whitespace. I'm using gmail's
> web interface... Sorry about that.

 Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
 that the final "fix" would come from his sha1collisiondetection
 repository via Ævar.

 In the meantime, I am wondering if it makes sense to merge the
 earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
 SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
 would at least unblock those on platforms v2.13.0 did not work
 correctly at all.

 Ævar, thoughts?
>>>
>>> I think we're mixing up several things here, which need to be untangled:
>>>
>>> 1) The sha1dc works just fine on most platforms even with undefined
>>> behavior, as evidenced by 2.13.0 working.
>>
>> Right, with "platform" meaning "combination of hardware-architecture
>> and compiler". Nothing can be said about how the current code behaves
>> on "x86". Such statements can only be made with regard to "x86 and
>> this or that compiler". Even then, short of studying the compiler
>> implementation/documentation in detail, one cannot be certain that
>> seemingly unrelated changes in Git don't make the code do something
>> else entirely.
>
> I think you're veering into a theoretical discussion here that has
> little to no bearing on the practicalities involved here.
>
> Yes if something is undefined behavior in C the compiler &
> architecture is free to do anything they want with it. In practice
> lots of undefined behavior is de-facto standardized across various
> platforms.
>
> As far as I can tell unaligned access is one of those things. I don't
> think there's ever been an x86 chip / compiler that would run this
> code with any semantic differences when it comes to unaligned access,
> and such a chip / compiler is unlikely to ever exist.
>
> I'm not advocating that we rely on undefined behavior willy-nilly,
> just that we should consider the real situation is (i.e. what actual
> architectures / compilers are doing or are likely to do) as opposed to
> the purely theoretical (if you gave a bunch of aliens who'd never
> heard of our technology the ANSI C standard to implement from
> scratch).

Yeah, that's an argument. I just thought I'd provide whatever input I
could, albeit in text form. The only thing that matters in the end is
that you (the Git project) feel that you make the correct decision,
possibly going beyond "theoretical" reasoning into engineering-land.

Take care,
Martin


Re: [PATCH] perf: work around the tested repo having an index.lock

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 10:33:30AM +, Ævar Arnfjörð Bjarmason wrote:

> When the tested repo has an index.lock file it should be removed. This
> file may be present if e.g. git-status previously crashed in that
> repo, and it will make a lot of git commands fail. Let's try harder
> and remove the lock.

If your git-status is crashing, you probably have bigger problems (and
need to clean up the original, too).

But I think a more compelling case is that there may be an ongoing
operation in the original repo (e.g., say you are in the middle of
writing a commit message) when we do a blind copy of the filesystem
contents. You might racily pick up a lockfile.

Should we find and delete all *.lock files in the copied directory? That
would get ref locks, etc. Half-formed object files are OK. Technically
if you want to get an uncorrupted repository you'd also want to copy
refs before objects (in case somebody makes a new object and updates a
ref while you're copying).

I don't know how careful it's worth being. I don't really _object_ to
this patch exactly, but it does seem like it's picking up one random
case (that presumably you hit) and ignoring all of the related cases.

-Peff


[WIP v2 2/2] pack-objects: support --blob-max-bytes

2017-06-02 Thread Jonathan Tan
As part of an effort to improve Git support for very large repositories
in which clients typically have only a subset of all version-controlled
blobs, teach pack-objects to support --blob-max-bytes, packing only
blobs not exceeding that size unless the blob corresponds to a file
whose name starts with ".git". upload-pack will eventually be taught to
use this new parameter if needed to exclude certain blobs during a fetch
or clone, potentially drastically reducing network consumption when
serving these very large repositories.

Since any excluded blob should not act as a delta base, I did this by
avoiding such oversized blobs from ever going into the "to_pack" data
structure, which contains both preferred bases (which do not go into the
final packfile but can act as delta bases) and the objects to be packed
(which go into the final packfile and also can act as delta bases). To
that end, add_object_entry() has been modified to exclude the
appropriate non-preferred-base objects. (Preferred bases, regardless of
size, still enter "to_pack" - they are something that the client
indicates that it has, not something that the server needs to serve, so
no exclusion occurs.)

In addition, any such exclusions are recorded and sent before the
packfile. These are recorded in binary format and in hash order, in a
format convenient for a client to use if stored directly to disk.

The other method in which objects are added to "to_pack",
add_object_entry_from_bitmap(), has not been modified - thus packing in
the presence of bitmaps still packs all blobs regardless of size. See
the documentation update in this commit for the rationale.

Signed-off-by: Jonathan Tan 
---
 Documentation/git-pack-objects.txt |  19 ++-
 builtin/pack-objects.c | 105 -
 t/t5300-pack-object.sh |  71 +
 3 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index 8973510a4..23ff8b5be 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -12,7 +12,8 @@ SYNOPSIS
 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=] [--depth=]
-   [--revs [--unpacked | --all]] [--stdout | base-name]
+   [--revs [--unpacked | --all]]
+   [--stdout [--blob-max-bytes=] | base-name]
[--shallow] [--keep-true-parents] < object-list
 
 
@@ -231,6 +232,22 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it 
creates a bundle.
With this option, parents that are hidden by grafts are packed
nevertheless.
 
+--blob-max-bytes=::
+   This option can only be used with --stdout. If specified, a blob
+   larger than this will not be packed unless a to-be-packed tree
+   has that blob with a filename beginning with ".git".  The size
+   can be suffixed with "k", "m", or "g", and may be "0".
++
+If specified, after printing the packfile, pack-objects will print the
+count of excluded blobs (8 bytes) . Subsequently, for each excluded blob
+in hash order, pack-objects will print its hash (20 bytes) and its size
+(8 bytes). All numbers are in network byte order.
++
+If pack-objects cannot efficiently determine, for an arbitrary blob,
+that no to-be-packed tree has that blob with a filename beginning with
+".git" (for example, if bitmaps are used to calculate the objects to be
+packed), it will pack all blobs regardless of size.
+
 SEE ALSO
 
 linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 730fa3d85..46578c05b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -77,6 +77,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static long blob_max_bytes = -1;
+
 /*
  * stats
  */
@@ -1069,17 +1071,73 @@ static const char no_closure_warning[] = N_(
 "disabling bitmap writing, as some objects are not being packed"
 );
 
+struct oversized_blob {
+   struct hashmap_entry entry;
+   struct object_id oid;
+   unsigned long size;
+};
+struct hashmap oversized_blobs;
+
+static int oversized_blob_cmp_fn(const void *b1, const void *b2,
+const void *keydata)
+{
+   const struct object_id *oid2 = keydata ? keydata :
+   &((const struct oversized_blob *) b2)->oid;
+   return oidcmp(&((const struct oversized_blob *) b1)->oid, oid2);
+}
+
+/*
+ * Returns 1 and records this blob in oversized_blobs if the given blob does
+ * not meet any defined blob size limits.  Blobs that appear as a tree entry
+ * whose basename begins with ".git" are never considered oversized.
+ *
+ * If the tree entry corresponding to the given blob is unknown, pass NULL as
+ * name. In this case, this function will always return 

Re: [PATCH 04/31] setup: don't perform lazy initialization of repository state

2017-06-02 Thread Jeff King
On Thu, Jun 01, 2017 at 12:23:25PM -0700, Stefan Beller wrote:

> On Wed, May 31, 2017 at 2:43 PM, Brandon Williams  wrote:
> > Under some circumstances (bogus GIT_DIR value or the discovered gitdir
> > is '.git') 'setup_git_directory()' won't initialize key repository
> > state.  This leads to inconsistent state after running the setup code.
> > To account for this inconsistent state, lazy initialization is done once
> > a caller asks for the repository's gitdir or some other piece of
> > repository state.  This is confusing and can be error prone.
> >
> > Instead let's tighten the expected outcome of 'setup_git_directory()'
> > and ensure that it initializes repository state in all cases that would
> > have been handled by lazy initialization.
> 
> Lazy init is usually there for a reason. (As in: it took too long to perform
> it at all times, so it has been optimized to only perform the init when 
> needed).

In the case of setup_git_env(), I think it was less about doing work and
more that we didn't want to have to do explicit setup in each program.
But over the years we've moved away from that, and in fact if you hit
the lazy initialization these days you'll generally BUG() anyway.

_But_ I suspect there are still some cases you'd need to handle. For
instance, it's still OK to skip calling setup_git_directory() if you've
got $GIT_DIR in the environment (which is why we have have_git_dir()
instead of checking startup_info->have_repository).

I think it would be nice to do away with that, too, but we're not quite
there yet (and if I am reading this patch correctly, we'd probably hit
these BUGs in such cases).

-Peff


[WIP v2 1/2] pack-objects: rename want_.* to ignore_.*

2017-06-02 Thread Jonathan Tan
Currently, in pack_objects, add_object_entry() distinguishes between 2
types of non-preferred-base objects:

 (1) objects that should not be in "to_pack" because an option like
 --local or --honor-pack-keep is set
 (2) objects that should be in "to_pack"

A subsequent commit will teach pack-objects to exclude certain objects
(specifically, blobs that are larger than a user-given threshold and are
not potentially a Git special file [1]), but unlike in (1) above, these
exclusions need to be reported. So now we have 3 types of
non-preferred-base objects:

 (a) objects that should not be in "to_pack" and should not be reported
 because an option like --local or --honor-pack-keep is set
 (b) objects that should not be in "to_pack" and should be reported
 because they are blobs that are oversized and non-Git-special
 (c) objects that should be in "to_pack"

add_object_entry() will be taught to distinguish between these 3 types
by the following:

 - renaming want_found_object() and want_object_in_pack() to ignore_.*
   to make it clear that they only check for (a) (this commit)
 - adding a new function to check for (b) and using it within
   add_object_entry() (a subsequent commit)

The alternative would be to retain the names want_found_object() and
want_object_in_pack() and have them handle both the cases (a) and (b),
but that would result in more complicated code.

We also take the opportunity to use the terminology "preferred_base"
instead of "excluded" in these methods. It is true that preferred bases
are not included in the final packfile generation, but at this point in
the code, there is no exclusion taking place - on the contrary, if
something is "excluded", it is in fact guaranteed to be in to_pack.

[1] For the purposes of pack_objects, a blob is a Git special file if it
appears in a to-be-packed tree with a filename beginning with
".git".

Signed-off-by: Jonathan Tan 
---
 builtin/pack-objects.c | 56 +-
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80439047a..730fa3d85 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -946,12 +946,12 @@ static int have_duplicate_entry(const unsigned char *sha1,
return 1;
 }
 
-static int want_found_object(int exclude, struct packed_git *p)
+static int ignore_found_object(int preferred_base, struct packed_git *p)
 {
-   if (exclude)
-   return 1;
-   if (incremental)
+   if (preferred_base)
return 0;
+   if (incremental)
+   return 1;
 
/*
 * When asked to do --local (do not include an object that appears in a
@@ -969,19 +969,19 @@ static int want_found_object(int exclude, struct 
packed_git *p)
 */
if (!ignore_packed_keep &&
(!local || !have_non_local_packs))
-   return 1;
+   return 0;
 
if (local && !p->pack_local)
-   return 0;
+   return 1;
if (ignore_packed_keep && p->pack_local && p->pack_keep)
-   return 0;
+   return 1;
 
/* we don't know yet; keep looking for more packs */
return -1;
 }
 
 /*
- * Check whether we want the object in the pack (e.g., we do not want
+ * Check whether we should ignore this object (e.g., we do not want
  * objects found in non-local stores if the "--local" option was used).
  *
  * If the caller already knows an existing pack it wants to take the object
@@ -989,16 +989,16 @@ static int want_found_object(int exclude, struct 
packed_git *p)
  * function finds if there is any pack that has the object and returns the pack
  * and its offset in these variables.
  */
-static int want_object_in_pack(const unsigned char *sha1,
-  int exclude,
-  struct packed_git **found_pack,
-  off_t *found_offset)
+static int ignore_object(const unsigned char *sha1,
+int preferred_base,
+struct packed_git **found_pack,
+off_t *found_offset)
 {
struct mru_entry *entry;
-   int want;
+   int ignore;
 
-   if (!exclude && local && has_loose_object_nonlocal(sha1))
-   return 0;
+   if (!preferred_base && local && has_loose_object_nonlocal(sha1))
+   return 1;
 
/*
 * If we already know the pack object lives in, start checks from that
@@ -1006,9 +1006,9 @@ static int want_object_in_pack(const unsigned char *sha1,
 * are present we will determine the answer right now.
 */
if (*found_pack) {
-   want = want_found_object(exclude, *found_pack);
-   if (want != -1)
-   return want;
+   ignore = ignore_found_object(preferred_base, *found_pack);
+   if (ignore != -1)
+ 

[WIP v2 0/2] Modifying pack objects to support --blob-max-bytes

2017-06-02 Thread Jonathan Tan
Here's a new version addressing Junio's comments.

> Hmph, that statement is a hard to read and agree to.  I thought an
> ignored object that is not going to be packed is one that won't hit
> to_pack?

That is true currently, but will not be the full truth once the 2nd
patch is applied.  I have expanded the commit message to explain the
plan in more detail.

> It would be nice if the name of the parameter hints that this is
> counted in bytes, e.g. "--blob-max-bytes"; 

Done. Originally I wanted to avoid terms like "max" because this
disallowed the user from requesting that even 0-byte blobs not be sent,
but realized that I probably shouldn't bother about that. :-)

> Do we need to future-proof the output format so that we can later
> use 32-byte hash?  The input to pack-objects (i.e. rev-list --objects)
> is hexadecimal text, and it may not be so bad to make this also
> text, e.g. " SP  LF".  That way, you do not have to
> worry about byte order, hash size, or length limited to uint64.

The reason for using binary is for the convenience of the client to
avoid another conversion before storing it to disk (and also network
savings). In a large repo, I think this list will be quite large. I
realized that I didn't mention anything about this in the commit
message, so I have added an explanation.

I think this is sufficiently future-proof in that the format of this
hash matches the format of the hashes used in the objects in the
packfile. As for object size being limited to uint64, I think the
benefits of the space savings (in using binary) outweigh the small risk
that our files will get larger than that before we upgrade our protocol
:-P

>> ++
>> +If pack-objects cannot efficiently determine, for an arbitrary blob,
>> +that no to-be-packed tree has that blob with a filename beginning with
>> +".git" (for example, if bitmaps are used to calculate the objects to be
>> +packed), it will pack all blobs regardless of size.
>> +
> This is a bit hard to understand, at least to me.  Let me step-wise
> deconstruct what I think you are saying.
> 
>  - A blob can appear in multiple trees, and each tree has name
>(i.e. pathname component) for the blob.
> 
>  - Sometimes we may know _all_ trees that contain a particular
>blob and hence _all_ names these trees call this blob.  In such a
>case, we can _prove_ that the blob never is used as ".git"
>in any tree.
> 
>  - We exclude a blob that is larger than the specified limit, but
>only when we can prove the above.  If it is possible that the
>blob might appear as ".git" in some tree, the blob is
>not omitted no matter its size.
> 
> And this is a very conservative way to avoid omitting something that
> could be one of those control files like ".gitignore".
> 
> Am I following your thought correctly?

We only need to prove for *to-be-packed* trees (the trees that
pack-objects are about to pack in the packfile(s) that it is printing),
but otherwise, you are correct. I was trying to be thorough in saying
that "if we don't have bitmaps, we know if a blob's name starts with
'.git', so we can do filtering by size, but if we have bitmaps, we don't
have any relevant name information" - any rewording suggestions are
appreciated.

> Can this multiplication overflow (hint: st_mult)?

Thanks - used st_mult.

> This sorting is a part of external contract (i.e. "output in hash
> order" is documented), but is it necessary?  Just wondering.

It is convenient for the client because the client can then store it
directly and binary search when accessing it. (Added a note about
convenience to the commit message.)

> You do want to exclude the trailing thing out of the checksum, but
> CSUM_CLOSE would close the fd and that is why you pass 0 here.  OK.
>
> Even though we are in "pack_to_stdout" codepath, I'd prefer to
> consistently use f->fd, not a hardcoded 1 here.  Of course,
> sha1close() would have freed 'f' at this point, so we need to grab
> the return value from the function to learn which fd is connected to
> our original "stdout" at this point.
> 
> Likewise for write_oversized_info(), which probably should just take
> "int fd" as parameter.

Done for write_oversized_info(), although now that we are moving the
invocation of write_oversized_info() out of write_pack_file(), this
might not be so important. (For the rest, I have reverted the code
because we are now printing the excluded hashes before the packfile
instead of after.)

> I do not necessarily think it is a bad idea to show this as trailing
> data after the pack, but when we come to this function, do we have
> enough information to make the "oversize" information appear before
> the pack data if we wanted to?  I have a suspicion that it is easier
> to handle at the receiving end, after the capability exchange
> decides to use this "omit oversized ones" extension, to receive this
> before the packdata begins.  If it is at the end, the receiver needs
> to spawn either unpack-objects or index-objects 

Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread Jeff King
On Fri, Jun 02, 2017 at 07:25:43PM +0200, René Scharfe wrote:

> Am 02.06.2017 um 05:08 schrieb Jeff King:
> > In theory the solution is:
> > 
> >1. Start using localtime() instead of gmtime() with an adjustment when
> >   we are converting to the local timezone (i.e., format-local). We
> >   should be able to do this portably.
> > 
> >   This is easy to do, and it's better than handling %z ourselves,
> >   because it makes %Z work, too.
> > 
> >2. When showing the author's timezone, do some trickery to set the
> >   program's timezone, then use localtime(), then restore the program
> >   timezone.
> > 
> >   I couldn't get this to work reliably. And anyway, we'd still have
> >   nothing to put in %Z since we don't have a timezone name at all in
> >   the git objects. We just have "+0400" or whatever.
> > 
> > So I don't see a portable way to make (2) work.
> 
> We could create a strftime wrapper that also takes a time zone offset,
> with platform-specific implementations.  Is it worth the effort?
> 
> What reliability issues did you run into?

My patch is below for reference.  The issue is that we have to stuff a
name into $TZ that the system libc will parse into something sensible.
Just setting it to "%+05d" doesn't work at all. glibc at least seems to
accept names like FOO+4, but:

  - I have no idea if that's portable

  - it only allows single-hour offsets, so +0330 is out. There might be
some way to represent that, but I'm not sure if it's portable
(FOO+0300 doesn't seem to work even on glibc).

  - that sets %Z to "FOO", which is obviously nonsense

> > If we do handle "%z" ourselves (either always or for just the one case),
> > what should the matching %Z say? Right now (and I think with René's
> > patch) it says GMT, which is actively misleading. We should probably
> > replace it with the same text as "%z". That's not quite what the user
> > wanted, but at least it's accurate.
> 
> On Linux "%z %Z" is expanded to "+0200 CEST" for me, while on Windows I
> get "Mitteleurop▒ische Sommerzeit Mitteleurop▒ische Sommerzeit".  (That
> "▒" is probably supposed to be an "ä".)  POSIX requires  +hhmm or -hhmm
> format for %z, and for %Z is to be "Replaced by the timezone name or
> abbreviation".
> 
> I'd say "GMT+0200" etc. is a nice enough timezone name, i.e. having %Z
> resolve to the same as %z plus a literal prefix of "GMT" should at least
> not be wrong.

I thought that, too, but I think it is wrong based on my understanding
of how $TZ is parsed. There something like "EDT-4" means "call this EDT,
and by the way it is 4 hours behind GMT".

So what you're proposing isn't wrong per se, but your notation means
something totally different than what similar-looking notation looks
like on the $TZ end, which is bound to create confusion.

> Alternatively we could have a lookup table mapping a few typical offsets
> to timezone names, but e.g. handling daylight saving times would
> probably be too hard (when did that part of the world switch in the
> given year?  north or south of the equator?)..

Right, I don't think the mapping of zone to offset is reversible,
because many zones map to the same offset. If I tell you I'm in -0500,
even just in the US that could mean Eastern Standard Time (winter, no
DST) or Central Daylight Time (summer, DST). Not to mention that other
political entities in the same longitude have their own zones which do
DST at different times (or were even established as zones at different
times; historical dates need to use the zones as they were at that
time).

> > As far as the patch itself goes, I'm disappointed to lose the automatic
> > "%" handling for all of the other callers. But I suspect the boilerplate
> > involved in any solution that lets callers choose whether or not to use
> > it would end up being longer than just handling it in each caller.
> 
> Actually I felt uneasy when you added that forced %% handling because it
> put a policy into an otherwise neutral interpreter function.  I just had
> no practical argument against it -- until now.
> 
> I'd rather see strbuf_expand also lose the hard-coded percent sign, but
> again I don't have an actual user for such a flexibility (yet).
> 
> Perhaps we should add a fully neutral strbuf_expand_core (or whatever),
> make strbuf_expand a wrapper with hard-coded % and %% handling and use
> the core function in the strftime wrapper.  Except that the function is
> not easily stackable.  Hmm..

Right, that's the boilerplate trickiness I was referring to. It's
probably not worth the effort.

Anyway, here's my patch. I've been testing it with:

  ./git log --format='%ai%n%ad%n' --date=format:'%Y-%m-%d %H:%M:%S %z (%Z)'

which lets you compare a variety of commits with the existing formatting
routine.

---
diff --git a/date.c b/date.c
index 63fa99685..a6214172e 100644
--- a/date.c
+++ b/date.c
@@ -196,6 +196,34 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
  

Re: Unaligned accesses in sha1dc

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
>>> Martin Ågren  writes:
>>>
 I looked into this some more. It turns out it is possible to trigger
 undefined behavior on "next". Here's what I did:
 ...

 This "fixes" the problem:
 ...
 diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
 index 3dff80a..d6f4c44 100644
 --- a/sha1dc/sha1.c
 +++ b/sha1dc/sha1.c
 @@ -66,9 +66,9 @@
 ...
 With this diff, various tests which seem relevant for SHA-1 pass,
 including t0013, and the UBSan-error is gone. The second diff is just
 a monkey-patch. I have no reason to believe I will be able to come up
 with a proper and complete patch for sha1dc. And I guess such a thing
 would not really be Git's patch to carry, either. But at least Git
 could consider whether to keep relying on undefined behavior or not.

 There's a fair chance I've mangled the whitespace. I'm using gmail's
 web interface... Sorry about that.
>>>
>>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>>> that the final "fix" would come from his sha1collisiondetection
>>> repository via Ævar.
>>>
>>> In the meantime, I am wondering if it makes sense to merge the
>>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>>> would at least unblock those on platforms v2.13.0 did not work
>>> correctly at all.
>>>
>>> Ævar, thoughts?
>>
>> I think we're mixing up several things here, which need to be untangled:
>>
>> 1) The sha1dc works just fine on most platforms even with undefined
>> behavior, as evidenced by 2.13.0 working.
>
> Right, with "platform" meaning "combination of hardware-architecture
> and compiler". Nothing can be said about how the current code behaves
> on "x86". Such statements can only be made with regard to "x86 and
> this or that compiler". Even then, short of studying the compiler
> implementation/documentation in detail, one cannot be certain that
> seemingly unrelated changes in Git don't make the code do something
> else entirely.

I think you're veering into a theoretical discussion here that has
little to no bearing on the practicalities involved here.

Yes if something is undefined behavior in C the compiler &
architecture is free to do anything they want with it. In practice
lots of undefined behavior is de-facto standardized across various
platforms.

As far as I can tell unaligned access is one of those things. I don't
think there's ever been an x86 chip / compiler that would run this
code with any semantic differences when it comes to unaligned access,
and such a chip / compiler is unlikely to ever exist.

I'm not advocating that we rely on undefined behavior willy-nilly,
just that we should consider the real situation is (i.e. what actual
architectures / compilers are doing or are likely to do) as opposed to
the purely theoretical (if you gave a bunch of aliens who'd never
heard of our technology the ANSI C standard to implement from
scratch).

Here's a performance test of your patch above against p3400-rebase.sh.
I don't know how much these error bars from t/perf can be trusted.
This is over 30 runs with -O3:

- 3400.2: rebase on top of a lot of unrelated changes
  v2.12.0 : 1.25(1.10+0.06)
  v2.13.0 : 1.21(1.06+0.06) -3.2%
  origin/next : 1.22(1.04+0.07) -2.4%
  martin: 1.23(1.06+0.07) -1.6%
- 3400.4: rebase a lot of unrelated changes without split-index
  v2.12.0 : 6.49(3.60+0.52)
  v2.13.0 : 8.21(4.18+0.55) +26.5%
  origin/next : 8.27(4.34+0.64) +27.4%
  martin: 8.80(4.36+0.62) +35.6%
- 3400.6: rebase a lot of unrelated changes with split-index
  v2.12.0 : 6.77(3.56+0.51)
  v2.13.0 : 4.09(2.67+0.38) -39.6%
  origin/next : 4.13(2.70+0.36) -39.0%
  martin: 4.30(2.80+0.32) -36.5%

And just your patch v.s. next:

- 3400.2: rebase on top of a lot of unrelated changes
  origin/next : 1.22(1.06+0.06)
  martin  : 1.22(1.06+0.05) +0.0%
- 3400.4: rebase a lot of unrelated changes without split-index
  origin/next : 7.54(4.13+0.60)
  martin  : 7.75(4.34+0.67) +2.8%
- 3400.6: rebase a lot of unrelated changes with split-index
  origin/next : 4.19(2.92+0.31)
  martin  : 4.14(2.84+0.39) -1.2%

It seems to be a bit slower, is that speedup worth the use of
unaligned access? I genuinely don't know. I'm just interested to find
what if anything we need to urgently fix in a release version of git.

One data point there is that the fallback blk-sha1 implementation
we've shipped since 2009 has the same errors about unaligned access as
before your patch with -fsanitize[..], and looking at the commit
message this was something Linus knew about at the time, see
d7c208a92e ("Add new optimized C 'block-sha1' 

[PATCH] Documentation/git-rm: correct submodule description

2017-06-02 Thread Stefan Beller
Since 3ccd681c2a (Merge branch 'sb/submodule-rm-absorb', 2017-01-18)
git-rm tries to absorb any submodules git dir before deleting the
submodule. Correct the documentation to say so.

Signed-off-by: Stefan Beller 
---
 Documentation/git-rm.txt | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..8c87e8cdd7 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -140,10 +140,11 @@ Only submodules using a gitfile (which means they were 
cloned
 with a Git version 1.7.8 or newer) will be removed from the work
 tree, as their repository lives inside the .git directory of the
 superproject. If a submodule (or one of those nested inside it)
-still uses a .git directory, `git rm` will fail - no matter if forced
-or not - to protect the submodule's history. If it exists the
-submodule. section in the linkgit:gitmodules[5] file will also
-be removed and that file will be staged (unless --cached or -n are used).
+still uses a .git directory, `git rm` will move the submodules
+git directory into the superprojects git directory to protect
+the submodule's history. If it exists the submodule. section
+in the linkgit:gitmodules[5] file will also be removed and that file
+will be staged (unless --cached or -n are used).
 
 A submodule is considered up-to-date when the HEAD is the same as
 recorded in the index, no tracked files are modified and no untracked
-- 
2.13.0.17.gab62347cd9



Re: [PATCH 00/33] object id conversion (grep and diff)

2017-06-02 Thread Brandon Williams
On 05/31, brian m. carlson wrote:
> On Tue, May 30, 2017 at 10:30:36AM -0700, Brandon Williams wrote:
> > A month or so ago I thought I would lend a hand to Brian and do a round of
> > conversions from sha1 -> struct object_id.  Now that Brian's latest series 
> > has
> > hit master I can finally send these patches out.
> > 
> > The first couple patches are from Brian which convert some of the notes 
> > logic
> > to using 'struct object_id'.  The remaining patches are to convert the grep 
> > and
> > diff machinery to using 'struct object_id'.
> > 
> > Brandon Williams (26):
> >   grep: convert to struct object_id
> >   diff: convert get_stat_data to struct object_id
> >   diff: convert diff_index_show_file to struct object_id
> >   diff: convert diff_addremove to struct object_id
> >   diff: convert run_diff_files to struct object_id
> >   diff: convert diff_change to struct object_id
> >   diff: convert fill_filespec to struct object_id
> >   diff: convert reuse_worktree_file to struct object_id
> >   diff: finish conversion for prepare_temp_file to struct object_id
> >   patch-ids: convert to struct object_id
> >   diff: convert diff_flush_patch_id to struct object_id
> >   combine-diff: convert diff_tree_combined to struct object_id
> >   combine-diff: convert find_paths_* to struct object_id
> >   tree-diff: convert diff_root_tree_sha1 to struct object_id
> >   notes-merge: convert notes_merge* to struct object_id
> >   notes-merge: convert merge_from_diffs to struct object_id
> >   notes-merge: convert find_notes_merge_pair_ps to struct object_id
> >   notes-merge: convert verify_notes_filepair to struct object_id
> >   notes-merge: convert write_note_to_worktree to struct object_id
> >   diff-tree: convert diff_tree_sha1 to struct object_id
> 
> I've wanted to convert this function for some time.  I'm glad you got
> around to it.

Of course, glad to help!  After working on this stuff I realized how serial 
this sort
of conversion can be since its easy to have a function span many many
parts of the code base.

> 
> Other than the issue I pointed out, the fact that I'm obviously not
> qualified to review my own patches, and the issue that Stefan pointed
> out with GIT_MAX_HEXSZ, this looks good to me.
> -- 
> brian m. carlson / brian with sandals: Houston, Texas, US
> https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204



-- 
Brandon Williams


[PATCH 0/3] Use skip_prefix() in handle_revision_{,pseudo_}opt()

2017-06-02 Thread SZEDER Gábor
While at it, the first one fixes a minor bug, which allowed e.g. 'git
log --no-min-parents-foobarbaz' to succeed.

The other two are fairly straightforward starts_with() ->
skip_prefix() conversions.

SZEDER Gábor (3):
  revision.c: stricter parsing of '--no-{min,max}-parents'
  revision.c: use skip_prefix() in handle_revision_opt()
  revision.c: use skip_prefix() in handle_revision_pseudo_opt()

 revision.c | 76 --
 1 file changed, 39 insertions(+), 37 deletions(-)

-- 
2.13.0.420.g54001f015



[PATCH 1/3] revision.c: stricter parsing of '--no-{min,max}-parents'

2017-06-02 Thread SZEDER Gábor
These two options are parsed using starts_with(), allowing things like
'git log --no-min-parents-foobarbaz' to succeed.

Use strcmp() instead.

Signed-off-by: SZEDER Gábor 
---
 revision.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index f88c14bab..2f37e1e3a 100644
--- a/revision.c
+++ b/revision.c
@@ -1812,11 +1812,11 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->max_parents = 1;
} else if (starts_with(arg, "--min-parents=")) {
revs->min_parents = atoi(arg+14);
-   } else if (starts_with(arg, "--no-min-parents")) {
+   } else if (!strcmp(arg, "--no-min-parents")) {
revs->min_parents = 0;
} else if (starts_with(arg, "--max-parents=")) {
revs->max_parents = atoi(arg+14);
-   } else if (starts_with(arg, "--no-max-parents")) {
+   } else if (!strcmp(arg, "--no-max-parents")) {
revs->max_parents = -1;
} else if (!strcmp(arg, "--boundary")) {
revs->boundary = 1;
-- 
2.13.0.420.g54001f015



[PATCH 3/3] revision.c: use skip_prefix() in handle_revision_pseudo_opt()

2017-06-02 Thread SZEDER Gábor
Instead of starts_with() and a bunch of magic numbers.

Signed-off-by: SZEDER Gábor 
---
 revision.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/revision.c b/revision.c
index 2b64b7e0e..ab0279572 100644
--- a/revision.c
+++ b/revision.c
@@ -2142,20 +2142,20 @@ static int handle_revision_pseudo_opt(const char 
*submodule,
} else if ((argcount = parse_long_opt("exclude", argv, ))) {
add_ref_exclusion(>ref_excludes, optarg);
return argcount;
-   } else if (starts_with(arg, "--branches=")) {
+   } else if (skip_prefix(arg, "--branches=", )) {
struct all_refs_cb cb;
init_all_refs_cb(, revs, *flags);
-   for_each_glob_ref_in(handle_one_ref, arg + 11, "refs/heads/", 
);
+   for_each_glob_ref_in(handle_one_ref, optarg, "refs/heads/", 
);
clear_ref_exclusion(>ref_excludes);
-   } else if (starts_with(arg, "--tags=")) {
+   } else if (skip_prefix(arg, "--tags=", )) {
struct all_refs_cb cb;
init_all_refs_cb(, revs, *flags);
-   for_each_glob_ref_in(handle_one_ref, arg + 7, "refs/tags/", 
);
+   for_each_glob_ref_in(handle_one_ref, optarg, "refs/tags/", );
clear_ref_exclusion(>ref_excludes);
-   } else if (starts_with(arg, "--remotes=")) {
+   } else if (skip_prefix(arg, "--remotes=", )) {
struct all_refs_cb cb;
init_all_refs_cb(, revs, *flags);
-   for_each_glob_ref_in(handle_one_ref, arg + 10, "refs/remotes/", 
);
+   for_each_glob_ref_in(handle_one_ref, optarg, "refs/remotes/", 
);
clear_ref_exclusion(>ref_excludes);
} else if (!strcmp(arg, "--reflog")) {
add_reflogs_to_pending(revs, *flags);
@@ -2165,14 +2165,14 @@ static int handle_revision_pseudo_opt(const char 
*submodule,
*flags ^= UNINTERESTING | BOTTOM;
} else if (!strcmp(arg, "--no-walk")) {
revs->no_walk = REVISION_WALK_NO_WALK_SORTED;
-   } else if (starts_with(arg, "--no-walk=")) {
+   } else if (skip_prefix(arg, "--no-walk=", )) {
/*
 * Detached form ("--no-walk X" as opposed to "--no-walk=X")
 * not allowed, since the argument is optional.
 */
-   if (!strcmp(arg + 10, "sorted"))
+   if (!strcmp(optarg, "sorted"))
revs->no_walk = REVISION_WALK_NO_WALK_SORTED;
-   else if (!strcmp(arg + 10, "unsorted"))
+   else if (!strcmp(optarg, "unsorted"))
revs->no_walk = REVISION_WALK_NO_WALK_UNSORTED;
else
return error("invalid argument to --no-walk");
-- 
2.13.0.420.g54001f015



[PATCH 2/3] revision.c: use skip_prefix() in handle_revision_opt()

2017-06-02 Thread SZEDER Gábor
Instead of starts_with() and a bunch of magic numbers.

While at it, there is an indentation fix where processing
'--early-output', and a coding style fix where processing
'--show-notes'.

Signed-off-by: SZEDER Gábor 
---
 revision.c | 54 --
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/revision.c b/revision.c
index 2f37e1e3a..2b64b7e0e 100644
--- a/revision.c
+++ b/revision.c
@@ -1725,8 +1725,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->max_count = atoi(argv[1]);
revs->no_walk = 0;
return 2;
-   } else if (starts_with(arg, "-n")) {
-   revs->max_count = atoi(arg + 2);
+   } else if (skip_prefix(arg, "-n", )) {
+   revs->max_count = atoi(optarg);
revs->no_walk = 0;
} else if ((argcount = parse_long_opt("max-age", argv, ))) {
revs->max_age = atoi(optarg);
@@ -1785,15 +1785,15 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
} else if (!strcmp(arg, "--author-date-order")) {
revs->sort_order = REV_SORT_BY_AUTHOR_DATE;
revs->topo_order = 1;
-   } else if (starts_with(arg, "--early-output")) {
+   } else if (skip_prefix(arg, "--early-output", )) {
int count = 100;
-   switch (arg[14]) {
+   switch (*optarg) {
case '=':
-   count = atoi(arg+15);
+   count = atoi(optarg + 1);
/* Fallthrough */
case 0:
revs->topo_order = 1;
-  revs->early_output = count;
+   revs->early_output = count;
}
} else if (!strcmp(arg, "--parents")) {
revs->rewrite_parents = 1;
@@ -1810,12 +1810,12 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->min_parents = 2;
} else if (!strcmp(arg, "--no-merges")) {
revs->max_parents = 1;
-   } else if (starts_with(arg, "--min-parents=")) {
-   revs->min_parents = atoi(arg+14);
+   } else if (skip_prefix(arg, "--min-parents=", )) {
+   revs->min_parents = atoi(optarg);
} else if (!strcmp(arg, "--no-min-parents")) {
revs->min_parents = 0;
-   } else if (starts_with(arg, "--max-parents=")) {
-   revs->max_parents = atoi(arg+14);
+   } else if (skip_prefix(arg, "--max-parents=", )) {
+   revs->max_parents = atoi(optarg);
} else if (!strcmp(arg, "--no-max-parents")) {
revs->max_parents = -1;
} else if (!strcmp(arg, "--boundary")) {
@@ -1897,14 +1897,15 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->verbose_header = 1;
revs->pretty_given = 1;
get_commit_format(NULL, revs);
-   } else if (starts_with(arg, "--pretty=") || starts_with(arg, 
"--format=")) {
+   } else if (skip_prefix(arg, "--pretty=", ) ||
+  skip_prefix(arg, "--format=", )) {
/*
 * Detached form ("--pretty X" as opposed to "--pretty=X")
 * not allowed, since the argument is optional.
 */
revs->verbose_header = 1;
revs->pretty_given = 1;
-   get_commit_format(arg+9, revs);
+   get_commit_format(optarg, revs);
} else if (!strcmp(arg, "--expand-tabs")) {
revs->expand_tabs_in_log = 8;
} else if (!strcmp(arg, "--no-expand-tabs")) {
@@ -1922,26 +1923,27 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->show_signature = 1;
} else if (!strcmp(arg, "--no-show-signature")) {
revs->show_signature = 0;
-   } else if (!strcmp(arg, "--show-linear-break") ||
-  starts_with(arg, "--show-linear-break=")) {
-   if (starts_with(arg, "--show-linear-break="))
-   revs->break_bar = xstrdup(arg + 20);
-   else
+   } else if (skip_prefix(arg, "--show-linear-break", )) {
+   switch (*optarg) {
+   case '=':
+   revs->break_bar = xstrdup(optarg + 1);
+   break;
+   case 0:
revs->break_bar = "..";
+   }
revs->track_linear = 1;
revs->track_first_time = 1;
-   } else if (starts_with(arg, "--show-notes=") ||
-  starts_with(arg, "--notes=")) {
+   } else if (skip_prefix(arg, "--show-notes=", ) ||
+  skip_prefix(arg, "--notes=", )) {
struct strbuf buf = STRBUF_INIT;
  

Re: [PATCH 25/33] notes-merge: convert verify_notes_filepair to struct object_id

2017-06-02 Thread Brandon Williams
On 06/02, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Signed-off-by: Brandon Williams 
> > ---
> >  notes-merge.c | 20 ++--
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/notes-merge.c b/notes-merge.c
> > index 55dbb3659..962e9b1bc 100644
> > --- a/notes-merge.c
> > +++ b/notes-merge.c
> > @@ -22,21 +22,21 @@ void init_notes_merge_options(struct 
> > notes_merge_options *o)
> > o->verbosity = NOTES_MERGE_VERBOSITY_DEFAULT;
> >  }
> >  
> > -static int path_to_sha1(const char *path, unsigned char *sha1)
> > +static int path_to_oid(const char *path, struct object_id *oid)
> >  {
> > -   char hex_sha1[40];
> > +   char hex_oid[GIT_SHA1_HEXSZ];
> > int i = 0;
> > -   while (*path && i < 40) {
> > +   while (*path && i < GIT_SHA1_HEXSZ) {
> > if (*path != '/')
> > -   hex_sha1[i++] = *path;
> > +   hex_oid[i++] = *path;
> > path++;
> > }
> 
> It's no brainer to do s/GIT_SHA1_HEXSZ/GIT_MAX_HEXSZ/ for all of the
> above, but ...

I'll fix this.

> 
> > -   if (*path || i != 40)
> > +   if (*path || i != GIT_SHA1_HEXSZ)
> > return -1;
> 
> ... this one is tricky.  
> 
> What's in our envisioned future?  Are we expecing to see object
> names, named with two or more hash functions, in a same repository?
> If so, and one is 20 bytes and another one is 32 bytes, then this
> should check 'i' against 40 and 64 and pass if 'i' is one of these
> expected lengths?

That's a good question, and at this point in time do we have an
envisioned future?  From some of our conversations I seem to remember
that we don't want a single repository to have objects based on two
different hash functions, but rather some translation layer to convert
between two hashing functions (for compatibility with other
non-converted repos).  Though nothing has been settled upon yet so I
don't know what the future would look like (and the best way to prepare
for it).

> 
> > -   return get_sha1_hex(hex_sha1, sha1);
> > +   return get_oid_hex(hex_oid, oid);
> >  }

-- 
Brandon Williams


Re: [PATCH 22/33] notes-merge: convert notes_merge* to struct object_id

2017-06-02 Thread Brandon Williams
On 05/31, brian m. carlson wrote:
> On Tue, May 30, 2017 at 10:30:58AM -0700, Brandon Williams wrote:
> > @@ -596,47 +596,47 @@ int notes_merge(struct notes_merge_options *o,
> > /* Find merge bases */
> > bases = get_merge_bases(local, remote);
> > if (!bases) {
> > -   base_sha1 = null_sha1;
> > -   base_tree_sha1 = EMPTY_TREE_SHA1_BIN;
> > +   base_oid = _oid;
> > +   base_tree_oid = _tree_oid;
> > if (o->verbosity >= 4)
> > printf("No merge base found; doing history-less 
> > merge\n");
> > } else if (!bases->next) {
> > -   base_sha1 = bases->item->object.oid.hash;
> > -   base_tree_sha1 = bases->item->tree->object.oid.hash;
> > +   base_oid = >item->object.oid;
> > +   base_tree_oid = >item->tree->object.oid;
> > if (o->verbosity >= 4)
> > printf("One merge base found (%.7s)\n",
> > -   sha1_to_hex(base_sha1));
> > +  oid_to_hex(base_oid));
> 
> I noticed you fixed up the indentation.  Thanks.
> 
> > diff --git a/notes-merge.h b/notes-merge.h
> > index 0d890563b..eaa8e3b86 100644
> > --- a/notes-merge.h
> > +++ b/notes-merge.h
> > @@ -33,15 +33,15 @@ void init_notes_merge_options(struct 
> > notes_merge_options *o);
> >   *
> >   * 1. The merge trivially results in an existing commit (e.g. fast-forward 
> > or
> >   *already-up-to-date). 'local_tree' is untouched, the SHA1 of the 
> > result
> > - *is written into 'result_sha1' and 0 is returned.
> > + *is written into 'result_oid' and 0 is returned.
> >   * 2. The merge successfully completes, producing a merge commit. 
> > local_tree
> >   *contains the updated notes tree, the SHA1 of the resulting commit is
> > - *written into 'result_sha1', and 1 is returned.
> > + *written into 'result_oid', and 1 is returned.
> >   * 3. The merge results in conflicts. This is similar to #2 in that the
> >   *partial merge result (i.e. merge result minus the unmerged entries)
> >   *are stored in 'local_tree', and the SHA1 or the resulting commit
> >   *(to be amended when the conflicts have been resolved) is written into
> > - *'result_sha1'. The unmerged entries are written into the
> > + *'result_oid'. The unmerged entries are written into the
> >   *.git/NOTES_MERGE_WORKTREE directory with conflict markers.
> >   *-1 is returned.
> >   *
> 
> Did you want to change the comment to say "object ID" or "OID" instead
> of "SHA1" like you did in an earlier patch?

Yep I can do that here and in the comment below this one.

> -- 
> brian m. carlson / brian with sandals: Houston, Texas, US
> https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204



-- 
Brandon Williams


Re: [PATCH 00/33] object id conversion (grep and diff)

2017-06-02 Thread Brandon Williams
On 06/02, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Junio C Hamano  writes:
> >
> >> Brandon Williams  writes:
> >>
> >>> A month or so ago I thought I would lend a hand to Brian and do a round of
> >>> conversions from sha1 -> struct object_id.  Now that Brian's latest 
> >>> series has
> >>> hit master I can finally send these patches out.
> >>>
> >>> The first couple patches are from Brian which convert some of the notes 
> >>> logic
> >>> to using 'struct object_id'.  The remaining patches are to convert the 
> >>> grep and
> >>> diff machinery to using 'struct object_id'.
> >>
> >> Nicely done for all of them.  Thanks.  Will queue (with tweaks
> >> mentioned in the comments).
> >
> > Oops.  I won't be able to queue this for now as it heavily conflicts
> > with blame-lib topic.  The resolution should be trivial, mechanical
> > and boring, but takes time that I do not have today.
> 
> I lied.  This also conflicts somewhat with Peff's diff-blob topic.
> I think I resolved them correctly (there needs evil merges applied
> to two files when merging this topic), and hopefully can push out
> the result by the end of the day.
> 
> Thanks.

If it ends up being too much of a headache for you to deal with, let me
know and I can rebase on top of those series.  That way you don't have to
deal with the conflict resolutions.  Just let me know what you'd like me
to do.

-- 
Brandon Williams


Re: pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output

2017-06-02 Thread Jonathan Nieder
Jonathan Nieder wrote:

> Here's a rough list of some useful tasks, in no particular order:
>
> 1. bc/object-id: This patch series continues, eliminating assumptions
>about the size of object ids by encapsulating them in a struct.
>One straightforward way to find code that still needs to be
>converted is to grep for "sha" --- often the conversion patches
>change function and variable names to refer to oid_ where they used
>to use sha1_, making the stragglers easier to spot.

I forgot to say: please coordinate with Brian M. Carlson if working on
this one.  That makes it possible to divide up the work in a way that
doesn't lead to people patching the same files and stepping on each
other's toes.

This is one of the larger tasks and contributions to it are very
useful.  Just do coordinate.

For comparison, the other suggestions in the list should be possible
for a rogue agent to experiment with alone, with advice from the list
where they need it. :)

Jonathan


Re: pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output

2017-06-02 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:
> On Thu, 1 Jun 2017, Stefan Beller wrote:

>> We had a discussion off list how much of the test suite is in bad shape,
>> and "$ git grep ^index" points out a lot of places as well.
>
> Maybe we should call out a specific month (or even a longer period) during
> which we try to push toward that new hash function, and focus more on
> those tasks (and on critical bug fixes, if any) than anything else.

Thanks for offering. ;-)

Here's a rough list of some useful tasks, in no particular order:

1. bc/object-id: This patch series continues, eliminating assumptions
   about the size of object ids by encapsulating them in a struct.
   One straightforward way to find code that still needs to be
   converted is to grep for "sha" --- often the conversion patches
   change function and variable names to refer to oid_ where they used
   to use sha1_, making the stragglers easier to spot.

2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
   t00* make assumptions about the exact values of object ids.  That's
   bad for maintainability for other reasons beyond the hash function
   transition, too.

   It should be possible to suss them out by patching git's sha1
   routine to use the ones-complement of sha1 (~sha1) instead and
   seeing which tests fail.

3. Repository format extension to use a different hash function: we
   want git to be able to work with two hash functions: sha1 and
   something else.  For interoperability and simplity, it is useful
   for a single git binary to support both hash functions.

   That means a repository needs to be able to specify what hash
   function is used for the objects in that repository.  This can be
   configured by setting '[core] repositoryformatversion=1' (to avoid
   confusing old versions of git) and
   '[extensions] experimentalNewHashFunction = true'.
   Documentation/technical/repository-version.txt has more details.

   We can start experimenting with this using e.g. the ~sha1 function
   described at (2), or the 160-bit hash of the patch author's choice
   (e.g. truncated blake2bp-256).

4. When choosing a hash function, people may argue about performance.
   It would be useful for run some benchmarks for git (running
   the test suite, t/perf tests, etc) using a variety of hash
   functions as input to such a discussion.

5. Longer hash: Even once all object id references in git use struct
   object_id (see (1)), we need to tackle other assumptions about
   object id size in git and its tests.

   It should be possible to suss them out by replacing git's sha1
   routine with a 40-byte hash: sha1 with each byte repeated (sha1+sha1)
   and seeing what fails.

6. Repository format extension for longer hash: As in (3), we could
   add a repository format extension to experiment with using the
   sha1+sha1 function.

7. Avoiding wasted memory from unused hash functions: struct object_id
   has definition 'unsigned char hash[GIT_MAX_RAWSZ]', where
   GIT_MAX_RAWSZ is the size of the largest supported hash function.
   When operating on a repository that only uses sha1, this wastes
   memory.

   Avoid that by making object identifiers variable-sized.  That is,
   something like

 struct object_id {
union {
  unsigned char hash20[20];
  unsigned char hash32[32];
} *hash;
 }

   or

 struct object_id {
   unsigned char *hash;
 }

   The hard part is that allocation and destruction have to be
   explicit instead of happening automatically when an object_id is an
   automatic variable.

8. Implement
   http://public-inbox.org/git/20170307001709.gc26...@aiede.mtv.corp.google.com/
   :)  I'd like to send a breakdown of that too, but that probably
   should happen in a separate message.

9. We can use help from security experts in all of this.  Fuzzing,
   analysis of how we use cryptography, security review of other parts
   of the design, and information to help choose a hash function are
   all appreciated.

> I also wonder how we can attract (back) cryptographic talent to help us
> avoid repeating mistakes when picking a new hash algorithm.
>
> So far, the only undisputable expert opinion I read was from the Keccak
> team, and I did not have the impression that their opinion had any impact
> on the discussion. Needless to say: I think it should. Cryptography is
> hard. We proved it ;-)

Do you have some ideas in mind here?

How did you get the impression that their opinion had no impact?  We
have been getting feedback about the choice of hash function both on
and off list from a variety of people, some indisputably security
experts.  Sometimes the best one can do is to just listen.

For what it's worth my personal opinion is currently leaning toward
blake2bp-256 as choice of hash function, not SHA2 or SHA3.  But we
still have time to learn more and make a better decision.

Thanks and hope that helps,
Jonathan


Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread René Scharfe

Am 02.06.2017 um 05:08 schrieb Jeff King:

In theory the solution is:

   1. Start using localtime() instead of gmtime() with an adjustment when
  we are converting to the local timezone (i.e., format-local). We
  should be able to do this portably.

  This is easy to do, and it's better than handling %z ourselves,
  because it makes %Z work, too.

   2. When showing the author's timezone, do some trickery to set the
  program's timezone, then use localtime(), then restore the program
  timezone.

  I couldn't get this to work reliably. And anyway, we'd still have
  nothing to put in %Z since we don't have a timezone name at all in
  the git objects. We just have "+0400" or whatever.

So I don't see a portable way to make (2) work.


We could create a strftime wrapper that also takes a time zone offset,
with platform-specific implementations.  Is it worth the effort?

What reliability issues did you run into?


But it seems a shame
that %Z does not work for case (1) with René's patch.

I guess we could do (1) for the local cases and then handle "%z"
ourselves otherwise. That sounds even _more_ confusing, but it at least
gets the most cases right.

If we do handle "%z" ourselves (either always or for just the one case),
what should the matching %Z say? Right now (and I think with René's
patch) it says GMT, which is actively misleading. We should probably
replace it with the same text as "%z". That's not quite what the user
wanted, but at least it's accurate.


On Linux "%z %Z" is expanded to "+0200 CEST" for me, while on Windows I
get "Mitteleurop▒ische Sommerzeit Mitteleurop▒ische Sommerzeit".  (That
"▒" is probably supposed to be an "ä".)  POSIX requires  +hhmm or -hhmm
format for %z, and for %Z is to be "Replaced by the timezone name or
abbreviation".

I'd say "GMT+0200" etc. is a nice enough timezone name, i.e. having %Z
resolve to the same as %z plus a literal prefix of "GMT" should at least
not be wrong.

Alternatively we could have a lookup table mapping a few typical offsets
to timezone names, but e.g. handling daylight saving times would
probably be too hard (when did that part of the world switch in the
given year?  north or south of the equator?)..


As far as the patch itself goes, I'm disappointed to lose the automatic
"%" handling for all of the other callers. But I suspect the boilerplate
involved in any solution that lets callers choose whether or not to use
it would end up being longer than just handling it in each caller.


Actually I felt uneasy when you added that forced %% handling because it
put a policy into an otherwise neutral interpreter function.  I just had
no practical argument against it -- until now.

I'd rather see strbuf_expand also lose the hard-coded percent sign, but
again I don't have an actual user for such a flexibility (yet).

Perhaps we should add a fully neutral strbuf_expand_core (or whatever),
make strbuf_expand a wrapper with hard-coded % and %% handling and use
the core function in the strftime wrapper.  Except that the function is
not easily stackable.  Hmm..

René


Wrong gitattributes documentation?

2017-06-02 Thread Rene Pasing

Hi all,

I have noticed a strange behaviour when using git-lfs.

If I understood correctly, git-lfs adds patterns to .gitattributes, for 
which git then calls lfs for any matches with this pattern.


The problem is, the documentation[1] says: "The rules how the pattern 
matches paths are the same as in .gitignore files; see gitignore[5].", 
so when I have a pattern like '/images/', it should match on all 
files+folders under /images, even the directory itself, right? Or when 
I'd use '/images/*', it should match on all files+folders under /images, 
right?


This does not seem to be the case. Nothing is done by lfs when using 
'/images/', and when using '/images/*', it only takes into account files 
which are directly in /images (so it doesnt take into account 
subdirectories).


The only solution to this (until now) is to use '/images/**/*', which 
seems to do exactly what I want (lfs gets executed for all files+folders 
under /images).


Is this expected behaviour? If yes, then the documentation of 
gitattributes seems incorrect to me. If not, can this be considered as a 
bug?


I found several complaints about this to git-lfs (or other projects 
relying on gitattributes), e.g. [2], but I didn't find anything about 
this regarding gitattributes itself.


Thanks for answers
Regards
Rene

PS: $ git --version
git version 2.1.4

[1] = https://git-scm.com/docs/gitattributes
[2] = https://github.com/git-lfs/git-lfs/issues/2068


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 4:46 PM, Liam R. Howlett  wrote:
> * ?var Arnfj?r? Bjarmason  [170602 04:53]:
>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
>> > Martin Ågren  writes:
>> >
>> >> I looked into this some more. It turns out it is possible to trigger
>> >> undefined behavior on "next". Here's what I did:
>> >> ...
>> >>
>> >> This "fixes" the problem:
>> >> ...
>> >> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> >> index 3dff80a..d6f4c44 100644
>> >> --- a/sha1dc/sha1.c
>> >> +++ b/sha1dc/sha1.c
>> >> @@ -66,9 +66,9 @@
>> >> ...
>> >> With this diff, various tests which seem relevant for SHA-1 pass,
>> >> including t0013, and the UBSan-error is gone. The second diff is just
>> >> a monkey-patch. I have no reason to believe I will be able to come up
>> >> with a proper and complete patch for sha1dc. And I guess such a thing
>> >> would not really be Git's patch to carry, either. But at least Git
>> >> could consider whether to keep relying on undefined behavior or not.
>> >>
>> >> There's a fair chance I've mangled the whitespace. I'm using gmail's
>> >> web interface... Sorry about that.
>> >
>> > Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>> > that the final "fix" would come from his sha1collisiondetection
>> > repository via Ævar.
>> >
>> > In the meantime, I am wondering if it makes sense to merge the
>> > earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>> > SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>> > would at least unblock those on platforms v2.13.0 did not work
>> > correctly at all.
>> >
>> > Ævar, thoughts?
>>
>> I think we're mixing up several things here, which need to be untangled:
>>
>> 1) The sha1dc works just fine on most platforms even with undefined
>> behavior, as evidenced by 2.13.0 working.
>>
>> 2) There was a bug in practice with unaligned access on SPARC. It's
>> not clear to me whether anyone (Andreas, Liam?) still has any issues
>> in practice on any platform without specifying compile flags like what
>> Martin Ågren suggested above.
>>
>> Andreas: Is your initial report of unaligned access here fixed in the
>> next branch with my "sha1dc: update from upstream" commit? You didn't
>> say what platform you were on.
>>
>> Liam: How about your issue on SPARC?
>
> 2.13.0 is very much broken for me on SPARC.
> {maint//git} $ make -j120
> [...]
> {maint//git} $ ./git log
> [1]1004506 bus error (core dumped)  ./git log
>
> This is with b06d36431 (maint).
>
> The same thing happens on v2.13.0-384-g826c06412 (master).
>
> v2.13.0-539-g4b9c06c7d (next) works for me, as did following the
> instructions on upgrading the sha1dc code myself.

Thanks a lot. So that works as I suspect on SPARC, hopefully it'll be
in master (and 2.13.1) soon.

>>
>> 3) Now we have another issue reported by Martin Ågren here, which is
>> that while the code works in practice on most platforms it's using
>> undefined behavior. On my GCC 7.1.1 it's sufficient to:
>
> My platforms gcc is older than 7.1.1.

Right, shouldn't matter. Just thought I'd note the version for context
to note what version was producing that warning.

>>
>> make -j8 CFLAGS="-fsanitize=undefined
>> -fsanitize-recover=undefined" LDFLAGS="-fsanitize=undefined
>> -fsanitize-recover=undefined" all
>>
>> And then run e.g.:
>>
>> ./t0020-crlf.sh -v
>
> These tests pass With my older gcc - which those flags are not
> recognized.
>
> # passed all 35 test(s)
>
>
>>
>> To get spiel like:
>>
>> sha1dc/sha1.c:346:2: runtime error: load of misaligned address
>> 0x5610bf16d005 for type 'const uint32_t', which requires 4 byte
>> alignment
>> 0x5610bf16d005: note: pointer points here
>>  65 6e 74 20 66 30 34  66 61 39 37 36 36 64 62  62 38 65 34 63 37
>> 33 38  34 37 30 61 31 36 63 61  62
>>
>> I think that this is definitely something worth looking into /
>> coordinating with upstream, but I haven't seen anything to suggest
>> that we need to be rushing to get a patch in to fix this given 1) and
>> nobody saying yet that 2) doesn't solve their issue as long as they're
>> not supplying some -fsanitize=* flags.
>>
>> Now, stepping a bit back from this whole thing: I didn't read the
>> entire discussion back in February when sha1dc was integrated, but I
>> really don't see given all this churn / bug reporting we're getting
>> now why another acceptable solution wouldn't be to just revert
>> e6b07da278 ("Makefile: make DC_SHA1 the default", 2017-03-17) &
>> release 2.13.1 with that.
>>
>> Clearly there are outstanding issues with it, and needing to do a
>> memcpy() as my `next` patch does will harm performance on some
>> platforms, and something like Martin's patch on top will slow it down
>> even more.
>>
>> It seems to me that we should give it more time to cook, and better
>> understand the various trade-offs involved. The shattered attack is
>> very unlikely to impact anything 

Re: [PATCH v4 0/8] PCRE v2, PCRE v1 JIT, log -P & fixes

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 6:10 PM, Johannes Schindelin
 wrote:
> Hi,
>
> On Fri, 2 Jun 2017, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason   writes:
>>
>> > See <20170525200528.22037-1-ava...@gmail.com> for v3
>> > (https://public-inbox.org/git/20170525200528.22037-1-ava...@gmail.com/).
>> >
>> > This is on top of "[PATCH v4 00/31] Easy to review grep & pre-PCRE
>> > changes".
>> >
>> > Changes:
>> >
>> > Ævar Arnfjörð Bjarmason (8):
>> >   grep: don't redundantly compile throwaway patterns under threading
>> >   grep: skip pthreads overhead when using one thread
>> >   log: add -P as a synonym for --perl-regexp
>> >   grep: add support for the PCRE v1 JIT API
>> >   grep: un-break building with PCRE < 8.32
>> >   grep: un-break building with PCRE < 8.20
>> >
>> > No changes.
>> >
>> >   grep: un-break building with PCRE >= 8.32 without --enable-jit
>> >
>> > NEW: It turns out that a PCRE version that supports JIT, but is built
>> > without JIT support will fail at link time since there's no
>> > pcre_jit_exec symbol.
>> >
>> > It also turns out (contrary to what I claimed on list before, my
>> > mistake) that there's no way to detect this through some macro. All
>> > the pcre include files are the same with/without --enable-jit, only
>> > the object file differs.
>> >
>> > So there's now a NO_LIBPCRE1_JIT flag to the Makefile, which is off by
>> > default, but turned on on MinGW. I have not tested that
>> > config.mak.uname change, but everything else I could test on Linux.
>> >
>> > The reason for why it's NO_LIBPCRE1_JIT not USE_LIBPCRE1_JIT is that
>> > in practice pretty much everyone who builds pcre builds it with JIT
>> > (I've looked through various Linux/BSD distro build files), it's MinGW
>> > that's the exception here. Given the performance gain it makes sense
>> > to make it the default.
>> >
>> >   grep: add support for PCRE v2
>> >
>> > Almost no changes, just:
>> >
>> >  * A trivial change to stop redundantly assigning to pcre2_jit_on,
>> >mistakenly left over from an earlier version.
>> >
>> >  * Updated commit message / perf numbers for the extra patches in the
>> >series both here and in v3.
>>
>> Nicely summarised and matches what I received; thanks, will replace.
>
> For the record: I spent the entire development time I had today on trying
> to get PCRE2 to build and to figure out which PCRE2 tests fail and why.
>
> I hoped to get to the bottom why the JIT is disabled in PCRE1, but ran out
> of time.
>
> I seem to have gotten PCRE2 to build and figured out why the tests failed
> (spoiler: all of the failures were bogus and no indication of an
> incorrectly-built PCRE2).
>
> I barely had time to build `pu` (forcing PCRE2) and to run the test
> scripts whose file names contain the substring "grep". Seems to work so
> far, but this is by no means comprehensive testing; it is more like hushed
> and rushed testing on a Friday night when I should have stopped working 10
> minutes ago.
>
> Will continue with testing Git for Windows using PCRE2 next week and keep
> you posted,

Thanks a lot for testing it. Great to hear that it (definitely almost) works!

If the grep tests it's very likely that all of them will pass, the
only tests I run when developing this series (outside of the full run
for list submission) are t[0-9]*grep*.sh t[0-9]*log*.sh tests, since
those are the only ones impacted by it.


Re: git-gui ignores core.hooksPath

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 3:41 PM, Philipp Gortan  wrote:
> Hi git devs,
>
> First off, thanks for your awesome work!
>
> I've been unhappy for quite a while that I had to configure the hooks
> manually for each of my repos - until I found out recently that there is
> the core.hooksPath config variable that (when set globally) allows me to
> specify a hooks directory to be used for all my repositories.
>
> Now I was happy - for a few minutes, until I tested this feature in
> git-gui, and realized that it doesn't work there.
>
> This seems to be caused by "proc githook_read", which says "set pchook
> [gitdir hooks $hook_name]" instead of querying "git config
> core.hooksPath" first - cf
> https://github.com/git/git/blob/2cc2e70264e0fcba04f9ef791d144bbc8b501206/git-gui/git-gui.sh#L627
>
> Would be great if this could get fixed...

Hi. I added core.hooksPath, glad to see it's useful to other people.

This indeed is something that should be fixed, but git-gui development
is managed outside of git.git, it's just occasionally pulled in. I'm
not what the best place to contact is, but I've CC'd
Philip Oakley who's been making recent commits to git-gui.git at
http://repo.or.cz/git-gui.git/


Re: [PATCH v4 0/8] PCRE v2, PCRE v1 JIT, log -P & fixes

2017-06-02 Thread Johannes Schindelin
Hi,

On Fri, 2 Jun 2017, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
> 
> > See <20170525200528.22037-1-ava...@gmail.com> for v3
> > (https://public-inbox.org/git/20170525200528.22037-1-ava...@gmail.com/).
> >
> > This is on top of "[PATCH v4 00/31] Easy to review grep & pre-PCRE
> > changes".
> >
> > Changes:
> >
> > Ævar Arnfjörð Bjarmason (8):
> >   grep: don't redundantly compile throwaway patterns under threading
> >   grep: skip pthreads overhead when using one thread
> >   log: add -P as a synonym for --perl-regexp
> >   grep: add support for the PCRE v1 JIT API
> >   grep: un-break building with PCRE < 8.32
> >   grep: un-break building with PCRE < 8.20
> >
> > No changes.
> >
> >   grep: un-break building with PCRE >= 8.32 without --enable-jit
> >
> > NEW: It turns out that a PCRE version that supports JIT, but is built
> > without JIT support will fail at link time since there's no
> > pcre_jit_exec symbol.
> >
> > It also turns out (contrary to what I claimed on list before, my
> > mistake) that there's no way to detect this through some macro. All
> > the pcre include files are the same with/without --enable-jit, only
> > the object file differs.
> >
> > So there's now a NO_LIBPCRE1_JIT flag to the Makefile, which is off by
> > default, but turned on on MinGW. I have not tested that
> > config.mak.uname change, but everything else I could test on Linux.
> >
> > The reason for why it's NO_LIBPCRE1_JIT not USE_LIBPCRE1_JIT is that
> > in practice pretty much everyone who builds pcre builds it with JIT
> > (I've looked through various Linux/BSD distro build files), it's MinGW
> > that's the exception here. Given the performance gain it makes sense
> > to make it the default.
> >
> >   grep: add support for PCRE v2
> >
> > Almost no changes, just:
> >
> >  * A trivial change to stop redundantly assigning to pcre2_jit_on,
> >mistakenly left over from an earlier version.
> >
> >  * Updated commit message / perf numbers for the extra patches in the
> >series both here and in v3.
> 
> Nicely summarised and matches what I received; thanks, will replace.

For the record: I spent the entire development time I had today on trying
to get PCRE2 to build and to figure out which PCRE2 tests fail and why.

I hoped to get to the bottom why the JIT is disabled in PCRE1, but ran out
of time.

I seem to have gotten PCRE2 to build and figured out why the tests failed
(spoiler: all of the failures were bogus and no indication of an
incorrectly-built PCRE2).

I barely had time to build `pu` (forcing PCRE2) and to run the test
scripts whose file names contain the substring "grep". Seems to work so
far, but this is by no means comprehensive testing; it is more like hushed
and rushed testing on a Friday night when I should have stopped working 10
minutes ago.

Will continue with testing Git for Windows using PCRE2 next week and keep
you posted,
Dscho

Re: Unaligned accesses in sha1dc

2017-06-02 Thread Liam R. Howlett
* ?var Arnfj?r? Bjarmason  [170602 04:53]:
> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
> > Martin Ågren  writes:
> >
> >> I looked into this some more. It turns out it is possible to trigger
> >> undefined behavior on "next". Here's what I did:
> >> ...
> >>
> >> This "fixes" the problem:
> >> ...
> >> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> >> index 3dff80a..d6f4c44 100644
> >> --- a/sha1dc/sha1.c
> >> +++ b/sha1dc/sha1.c
> >> @@ -66,9 +66,9 @@
> >> ...
> >> With this diff, various tests which seem relevant for SHA-1 pass,
> >> including t0013, and the UBSan-error is gone. The second diff is just
> >> a monkey-patch. I have no reason to believe I will be able to come up
> >> with a proper and complete patch for sha1dc. And I guess such a thing
> >> would not really be Git's patch to carry, either. But at least Git
> >> could consider whether to keep relying on undefined behavior or not.
> >>
> >> There's a fair chance I've mangled the whitespace. I'm using gmail's
> >> web interface... Sorry about that.
> >
> > Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
> > that the final "fix" would come from his sha1collisiondetection
> > repository via Ævar.
> >
> > In the meantime, I am wondering if it makes sense to merge the
> > earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
> > SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
> > would at least unblock those on platforms v2.13.0 did not work
> > correctly at all.
> >
> > Ævar, thoughts?
> 
> I think we're mixing up several things here, which need to be untangled:
> 
> 1) The sha1dc works just fine on most platforms even with undefined
> behavior, as evidenced by 2.13.0 working.
> 
> 2) There was a bug in practice with unaligned access on SPARC. It's
> not clear to me whether anyone (Andreas, Liam?) still has any issues
> in practice on any platform without specifying compile flags like what
> Martin Ågren suggested above.
> 
> Andreas: Is your initial report of unaligned access here fixed in the
> next branch with my "sha1dc: update from upstream" commit? You didn't
> say what platform you were on.
> 
> Liam: How about your issue on SPARC?

2.13.0 is very much broken for me on SPARC.
{maint//git} $ make -j120
[...]
{maint//git} $ ./git log
[1]1004506 bus error (core dumped)  ./git log

This is with b06d36431 (maint).

The same thing happens on v2.13.0-384-g826c06412 (master).

v2.13.0-539-g4b9c06c7d (next) works for me, as did following the
instructions on upgrading the sha1dc code myself.

> 
> 3) Now we have another issue reported by Martin Ågren here, which is
> that while the code works in practice on most platforms it's using
> undefined behavior. On my GCC 7.1.1 it's sufficient to:

My platforms gcc is older than 7.1.1.

> 
> make -j8 CFLAGS="-fsanitize=undefined
> -fsanitize-recover=undefined" LDFLAGS="-fsanitize=undefined
> -fsanitize-recover=undefined" all
> 
> And then run e.g.:
> 
> ./t0020-crlf.sh -v

These tests pass With my older gcc - which those flags are not
recognized.

# passed all 35 test(s)


> 
> To get spiel like:
> 
> sha1dc/sha1.c:346:2: runtime error: load of misaligned address
> 0x5610bf16d005 for type 'const uint32_t', which requires 4 byte
> alignment
> 0x5610bf16d005: note: pointer points here
>  65 6e 74 20 66 30 34  66 61 39 37 36 36 64 62  62 38 65 34 63 37
> 33 38  34 37 30 61 31 36 63 61  62
> 
> I think that this is definitely something worth looking into /
> coordinating with upstream, but I haven't seen anything to suggest
> that we need to be rushing to get a patch in to fix this given 1) and
> nobody saying yet that 2) doesn't solve their issue as long as they're
> not supplying some -fsanitize=* flags.
> 
> Now, stepping a bit back from this whole thing: I didn't read the
> entire discussion back in February when sha1dc was integrated, but I
> really don't see given all this churn / bug reporting we're getting
> now why another acceptable solution wouldn't be to just revert
> e6b07da278 ("Makefile: make DC_SHA1 the default", 2017-03-17) &
> release 2.13.1 with that.
> 
> Clearly there are outstanding issues with it, and needing to do a
> memcpy() as my `next` patch does will harm performance on some
> platforms, and something like Martin's patch on top will slow it down
> even more.
> 
> It seems to me that we should give it more time to cook, and better
> understand the various trade-offs involved. The shattered attack is
> very unlikely to impact anything in practice, and users who are
> paranoid about it can opt-in to this extra protection.

I have not seen issues with DC_SAH1 with the newer code base on SPARC.

Thanks,
Liam


Re: git-gui ignores core.hooksPath

2017-06-02 Thread Samuel Lijin
On Fri, Jun 2, 2017 at 9:41 AM, Philipp Gortan  wrote:
> Hi git devs,
>
> First off, thanks for your awesome work!
>
> I've been unhappy for quite a while that I had to configure the hooks
> manually for each of my repos - until I found out recently that there is
> the core.hooksPath config variable that (when set globally) allows me to
> specify a hooks directory to be used for all my repositories.

OT but you may also want to look into using Git templates.

> Now I was happy - for a few minutes, until I tested this feature in
> git-gui, and realized that it doesn't work there.
>
> This seems to be caused by "proc githook_read", which says "set pchook
> [gitdir hooks $hook_name]" instead of querying "git config
> core.hooksPath" first - cf
> https://github.com/git/git/blob/2cc2e70264e0fcba04f9ef791d144bbc8b501206/git-gui/git-gui.sh#L627
>
> Would be great if this could get fixed...
>
> Thanks, Philipp
>


git-gui ignores core.hooksPath

2017-06-02 Thread Philipp Gortan
Hi git devs,

First off, thanks for your awesome work!

I've been unhappy for quite a while that I had to configure the hooks
manually for each of my repos - until I found out recently that there is
the core.hooksPath config variable that (when set globally) allows me to
specify a hooks directory to be used for all my repositories.

Now I was happy - for a few minutes, until I tested this feature in
git-gui, and realized that it doesn't work there.

This seems to be caused by "proc githook_read", which says "set pchook
[gitdir hooks $hook_name]" instead of querying "git config
core.hooksPath" first - cf
https://github.com/git/git/blob/2cc2e70264e0fcba04f9ef791d144bbc8b501206/git-gui/git-gui.sh#L627

Would be great if this could get fixed...

Thanks, Philipp



signature.asc
Description: OpenPGP digital signature


pushing for a new hash, was Re: [PATCH 2/3] rebase: Add tests for console output

2017-06-02 Thread Johannes Schindelin
Hi,

On Thu, 1 Jun 2017, Stefan Beller wrote:

> On Thu, Jun 1, 2017 at 4:40 PM, Junio C Hamano  wrote:
> > Johannes Schindelin  writes:
> >
> >> Also, about the commit IDs. As long as the tests are consistent (i.e. they
> >> use test_commit rather than plain `git commit`, or at least call
> >> `test_tick` beforehand), the commit IDs should actually be identical
> >> between runs and not depend on the time of day or the date.
> >>
> >> The only exception to that rule is when some previous test cases call
> >> `test_commit` but are guarded behind some prereq and may not be executed
> >> at all. In that case, the precise commit IDs depend on the particular set
> >> of active prereqs.
> >
> > Good observation.  The tests written such a way may make later
> > introduction of new hash function troublesome, though (we already
> > have tons of them, and it is already a hassle just imagining that we
> > will have to migrate them X-<).
> >
> > And what you gave below is an excellent suggestion to even solve
> > that future headaches.
> 
> We had a discussion off list how much of the test suite is in bad shape,
> and "$ git grep ^index" points out a lot of places as well.

Maybe we should call out a specific month (or even a longer period) during
which we try to push toward that new hash function, and focus more on
those tasks (and on critical bug fixes, if any) than anything else.

I also wonder how we can attract (back) cryptographic talent to help us
avoid repeating mistakes when picking a new hash algorithm.

So far, the only undisputable expert opinion I read was from the Keccak
team, and I did not have the impression that their opinion had any impact
on the discussion. Needless to say: I think it should. Cryptography is
hard. We proved it ;-)

Ciao,
Dscho


[GSoC][PATCH v6 1/2] submodule: fix buggy $path and $sm_path variable's value

2017-06-02 Thread Prathamesh Chavan
According to the documentation about git-submodule foreach subcommand's
$path variable:
$path is the name of the submodule directory relative to the superproject

But it was observed when the value of the $path value deviates from this
for the nested submodules when the  is run from a subdirectory.
This patch aims for its correction.

Additional test cases added to the submodule-foreach test suite in t7407,
to check the submodule foreach --recursive behavior from a subdirectory
as this was missing from the test suite.

Helped-by: Brandon Williams 
Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 git-submodule.sh |  2 +-
 t/t7407-submodule-foreach.sh | 33 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index c0d0e9a4c..ea6f56337 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -344,9 +344,9 @@ cmd_foreach()
prefix="$prefix$sm_path/"
sanitize_submodule_env
cd "$sm_path" &&
-   sm_path=$(git submodule--helper relative-path 
"$sm_path" "$wt_prefix") &&
# we make $path available to scripts ...
path=$sm_path &&
+   sm_path=$displaypath &&
if test $# -eq 1
then
eval "$1"
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 6ba5daf42..f4e07eee3 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -197,6 +197,39 @@ test_expect_success 'test messages from "foreach 
--recursive" from subdirectory'
test_i18ncmp expect actual
 '
 
+sub1sha1=$(cd clone2/sub1 && git rev-parse HEAD)
+sub2sha1=$(cd clone2/sub2 && git rev-parse HEAD)
+sub3sha1=$(cd clone2/sub3 && git rev-parse HEAD)
+nested1sha1=$(cd clone2/nested1 && git rev-parse HEAD)
+nested2sha1=$(cd clone2/nested1/nested2 && git rev-parse HEAD)
+nested3sha1=$(cd clone2/nested1/nested2/nested3 && git rev-parse HEAD)
+submodulesha1=$(cd clone2/nested1/nested2/nested3/submodule && git rev-parse 
HEAD)
+
+cat >expect <../../actual
+   ) &&
+   test_i18ncmp expect actual
+'
+
 cat > expect <

[GSoC][PATCH v6 2/2] submodule: port subcommand foreach from shell to C

2017-06-02 Thread Prathamesh Chavan
This aims to make git-submodule foreach a builtin. This is the very
first step taken in this direction. Hence, 'foreach' is ported to
submodule--helper, and submodule--helper is called from git-submodule.sh.
The code is split up to have one function to obtain all the list of
submodules. This function acts as the front-end of git-submodule foreach
subcommand. It calls the function for_each_submodule_list, which basically
loops through the list and calls function fn, which in this case is
runcommand_in_submodule. This third function is a calling function that
takes care of running the command in that submodule, and recursively
perform the same when --recursive is flagged.

The first function module_foreach first parses the options present in
argv, and then with the help of module_list_compute, generates the list of
submodules present in the current working tree.

The second function for_each_submodule_list traverses through the
list, and calls function fn (which in case of submodule subcommand
foreach is runcommand_in_submodule) is called for each entry.

The third function runcommand_in_submodule, generates a submodule struct sub
for $name, value and then later prepends name=sub->name; and other
value assignment to the env argv_array structure of a child_process.
Also the  of submodule-foreach is push to args argv_array
structure and finally, using run_command the commands are executed
using a shell.

The third function also takes care of the recursive flag, by creating
a separate child_process structure and prepending "--super-prefix displaypath",
to the args argv_array structure. Other required arguments and the
input  of submodule-foreach is also appended to this argv_array.

Helped-by: Brandon Williams 
Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version of, we avoid exporting the $path variable to
the environment, but instead prefix the  with it.
In this way, we avoid the issue it creates with the env variable
$PATH in windows.

Other than that, additionally the case of no. of arugments in 
being equal to 1 is also considered separetly.
THe reason of having this change in the shell script was given in the
commit 1c4fb136db.
According to my understanding, eval "$1" executes $1 in same shell,
whereas "$@" gets executed in a separate shell, which doesn't allow
"$@" to access the env variables $name, $path, etc.
Hence, to keep the ported function similar, this condition is also
added. 

Apart from this, other suggested changes are also implemented.

Complete build report is available at:
https://travis-ci.org/pratham-pc/git/builds
Branch: submodule-foreach
Build #87
 
 builtin/submodule--helper.c | 153 
 git-submodule.sh|  39 +--
 2 files changed, 154 insertions(+), 38 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 566a5b6a6..d08a02ad9 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -13,6 +13,9 @@
 #include "refs.h"
 #include "connect.h"
 
+typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
+ void *cb_data);
+
 static char *get_default_remote(void)
 {
char *dest = NULL, *ret;
@@ -219,6 +222,25 @@ static int resolve_relative_url_test(int argc, const char 
**argv, const char *pr
return 0;
 }
 
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+   const char *super_prefix = get_super_prefix();
+
+   if (prefix && super_prefix) {
+   BUG("cannot have prefix '%s' and superprefix '%s'",
+   prefix, super_prefix);
+   } else if (prefix) {
+   struct strbuf sb = STRBUF_INIT;
+   char *displaypath = xstrdup(relative_path(path, prefix, ));
+   strbuf_release();
+   return displaypath;
+   } else if (super_prefix) {
+   return xstrfmt("%s/%s", super_prefix, path);
+   } else {
+   return xstrdup(path);
+   }
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -331,6 +353,14 @@ static int module_list(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+static void for_each_submodule_list(const struct module_list list,
+   submodule_list_func_t fn, void *cb_data)
+{
+   int i;
+   for (i = 0; i < list.nr; i++)
+   fn(list.entries[i], cb_data);
+}
+
 static void init_submodule(const char *path, const char *prefix, int quiet)
 {
const struct submodule *sub;
@@ -487,6 +517,128 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct cb_foreach {
+   int argc;
+   const char **argv;
+   const char *prefix;
+   unsigned int quiet: 1;
+   

[PATCH] perf: work around the tested repo having an index.lock

2017-06-02 Thread Ævar Arnfjörð Bjarmason
When the tested repo has an index.lock file it should be removed. This
file may be present if e.g. git-status previously crashed in that
repo, and it will make a lot of git commands fail. Let's try harder
and remove the lock.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/perf-lib.sh | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index b6fc880395..b50211b259 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -108,7 +108,14 @@ test_perf_create_repo_from () {
cd "$repo" &&
"$MODERN_GIT" init -q &&
test_perf_do_repo_symlink_config_ &&
-   mv .git/hooks .git/hooks-disabled 2>/dev/null
+   mv .git/hooks .git/hooks-disabled 2>/dev/null &&
+   if test -f .git/index.lock
+   then
+   # We may be copying a repo that can't run "git
+   # status" due to a locked index. Since we have
+   # a copy it's fine to remove the lock.
+   rm .git/index.lock
+   fi
) || error "failed to copy repository '$source' to '$repo'"
 }
 
-- 
2.13.0.506.g27d5fe0cd



[WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread Ævar Arnfjörð Bjarmason
Add a performance test for the new core.fsmonitor facility using the
sample query-fsmonitor hook.

This is WIP code for the reasons explained in the setup comments,
unfortunately the perf code doesn't easily allow you to run different
setup code for different versions you're testing. This test will stop
working if the fsmonitor is merged into the master branch.

Output against linxu.git:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
[...]
Test  origin/master avar/fsmonitor
---
7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%

And against a larger in-house monorepo I have here, with the same
options (except the repo path):

Test  origin/master avar/fsmonitor
---
7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%

Against linux.git with a hack to flush the FS cache (on Linux) before
running the first 'git status', only running one test so the result
isn't discarded as the slowest of N:

$ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches 
>/dev/null && make -j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
[...]
Test  origin/master avar/fsmonitor

7519.2: status (first)0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
7519.4: status -uno   0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
7519.5: status -uall  0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%

Now obviously due to 1 run that has a lot of noise, but I would expect
that first invocation to be blindingly fast since watchman has info on
what files were modified since the cache was flushed.

The same on the large monorepo noted above:

Test  origin/master avar/fsmonitor
---
7519.2: status (first)0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
7519.3: status (subsequent)   0.20(0.10+0.19)   0.28(0.16+0.20) +40.0%
7519.4: status -uno   0.04(0.04+0.09)   0.11(0.08+0.12) +175.0%
7519.5: status -uall  0.29(0.11+0.18)   0.40(0.16+0.19) +37.9%

Signed-off-by: Ævar Arnfjörð Bjarmason 
---


On Fri, Jun 2, 2017 at 2:40 AM, Ben Peart  wrote:
> Any chance you can provide me with a bash script that contains the exact
> sequence of commands you are running to get this result?  I've been trying
> to replicate it using your notes but have not been able to.  I'd like to see
> if it is a repo difference, a platform difference, a command sequence
> difference (or something else entirely :)).

I can do better than that, here's a new perf test on top of this
series which demonstates the issue. I've only tested this on Linux
4.9.0 with watchman 4.9.0 compiled from git (yes, they're
coincidentally the same version).

A good addition to this would be `printf  | watchman -j` as noted in my earlier mail, but I ran out of
time.

You can also set any combination of GIT_PERF_7519_UNTRACKED_CACHE &
GIT_PERF_7519_SPLIT_INDEX to play with turning that on. I haven't
tested all combinations of that, but e.g. testing with untrackedCache
didn't give results that looked different from the performance
regressions noted above.

Aside from performance, I think a very good addition to stress-test
this series would be a patch to t/test-lib*sh guarded by some env flag
to do a similar watchman watch-del/watch/watch-list dance as the one
I'm doing here in the setup, and setting up the hook / config.

That would allow testing the entire git test suite with this feature,
to find any subtle bugs this might have introduced in git-status.

 t/perf/p7519-fsmonitor.sh | 58 +++
 1 file changed, 58 insertions(+)
 create mode 100755 t/perf/p7519-fsmonitor.sh

diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
new file mode 100755
index 00..b838a0ff14
--- /dev/null
+++ b/t/perf/p7519-fsmonitor.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description="Test core.fsmonitor"
+
+. 

Re: preserve untracked cache, was Re: What's cooking in git.git (Jun 2017, #01; Thu, 1)

2017-06-02 Thread Johannes Schindelin
Hi Junio,

On Fri, 2 Jun 2017, Junio C Hamano wrote:

> Samuel Lijin  writes:
> 
> >> What is holding this topic up? Anything Ben or I can do to move this
> >> closer to `next` or even `master`?
> >
> > It's in `next` right now (3196d093d6).
> 
> Thanks for pinging and checking ;-)  
> 
> I think the topic was merged to 'next' on the 23rd of last month and
> graduated to 'master' in the past few days, together with other
> topics.

Okay. I never saw any "Will merge to" message, so I got worried.

Thanks,
Dscho


Re: Unaligned accesses in sha1dc

2017-06-02 Thread Martin Ågren
On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
>> Martin Ågren  writes:
>>
>>> I looked into this some more. It turns out it is possible to trigger
>>> undefined behavior on "next". Here's what I did:
>>> ...
>>>
>>> This "fixes" the problem:
>>> ...
>>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>>> index 3dff80a..d6f4c44 100644
>>> --- a/sha1dc/sha1.c
>>> +++ b/sha1dc/sha1.c
>>> @@ -66,9 +66,9 @@
>>> ...
>>> With this diff, various tests which seem relevant for SHA-1 pass,
>>> including t0013, and the UBSan-error is gone. The second diff is just
>>> a monkey-patch. I have no reason to believe I will be able to come up
>>> with a proper and complete patch for sha1dc. And I guess such a thing
>>> would not really be Git's patch to carry, either. But at least Git
>>> could consider whether to keep relying on undefined behavior or not.
>>>
>>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>>> web interface... Sorry about that.
>>
>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>> that the final "fix" would come from his sha1collisiondetection
>> repository via Ævar.
>>
>> In the meantime, I am wondering if it makes sense to merge the
>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>> would at least unblock those on platforms v2.13.0 did not work
>> correctly at all.
>>
>> Ævar, thoughts?
>
> I think we're mixing up several things here, which need to be untangled:
>
> 1) The sha1dc works just fine on most platforms even with undefined
> behavior, as evidenced by 2.13.0 working.

Right, with "platform" meaning "combination of hardware-architecture
and compiler". Nothing can be said about how the current code behaves
on "x86". Such statements can only be made with regard to "x86 and
this or that compiler". Even then, short of studying the compiler
implementation/documentation in detail, one cannot be certain that
seemingly unrelated changes in Git don't make the code do something
else entirely.

> 2) There was a bug in practice with unaligned access on SPARC. It's
> not clear to me whether anyone (Andreas, Liam?) still has any issues
> in practice on any platform without specifying compile flags like what
> Martin Ågren suggested above.

True.

> 3) Now we have another issue reported by Martin Ågren here, which is
> that while the code works in practice on most platforms it's using
> undefined behavior.
...
> I think that this is definitely something worth looking into /
> coordinating with upstream, but I haven't seen anything to suggest
> that we need to be rushing to get a patch in to fix this given 1) and
> nobody saying yet that 2) doesn't solve their issue as long as they're
> not supplying some -fsanitize=* flags.
>
> Now, stepping a bit back from this whole thing: I didn't read the
> entire discussion back in February when sha1dc was integrated, but I
> really don't see given all this churn / bug reporting we're getting
> now why another acceptable solution wouldn't be to just revert
> e6b07da278 ("Makefile: make DC_SHA1 the default", 2017-03-17) &
> release 2.13.1 with that.
>
> Clearly there are outstanding issues with it, and needing to do a
> memcpy() as my `next` patch does will harm performance on some
> platforms, and something like Martin's patch on top will slow it down
> even more.

The only thing in my second "patch" which could possibly affect
performance as I see it would be the call to memcpy(.. ,.. ,4),
including pointer-calculation. Focusing on x86, I would not say that
it "will" slow it down until I'd measured performance. I wouldn't even
rule out that the compiled assembler could be identical. I would just
say the patch "would most likely slow it down even more on some
architectures, with some compilers and/or with some
compiler-settings".

If undefined behavior is avoided with memcpy(.., .., 4) then there
should be no formal need for your "big" memcpy where things are copied
into a known-to-be-aligned buffer. The behavior will be defined on all
architectures, anyway. Then your memcpy would simply be part of an
optimization to prefer one big memcpy and many loads instead of many
small memcpy-calls. On some architectures, that might be a very good
optimization. But on others, if the small memcpy is compiled to a
simple load, then I believe such an optimization would most likely be
a slow-down (modulo crazy-clever compiler optimizations).

> It seems to me that we should give it more time to cook, and better
> understand the various trade-offs involved. The shattered attack is
> very unlikely to impact anything in practice, and users who are
> paranoid about it can opt-in to this extra protection.

Regarding reverting and cooking, I don't feel like I'm in a position
to express an opinion. Thanks for thinking about this undefined

Re: Unaligned accesses in sha1dc

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> I looked into this some more. It turns out it is possible to trigger
>> undefined behavior on "next". Here's what I did:
>> ...
>>
>> This "fixes" the problem:
>> ...
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 3dff80a..d6f4c44 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -66,9 +66,9 @@
>> ...
>> With this diff, various tests which seem relevant for SHA-1 pass,
>> including t0013, and the UBSan-error is gone. The second diff is just
>> a monkey-patch. I have no reason to believe I will be able to come up
>> with a proper and complete patch for sha1dc. And I guess such a thing
>> would not really be Git's patch to carry, either. But at least Git
>> could consider whether to keep relying on undefined behavior or not.
>>
>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>> web interface... Sorry about that.
>
> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
> that the final "fix" would come from his sha1collisiondetection
> repository via Ævar.
>
> In the meantime, I am wondering if it makes sense to merge the
> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
> would at least unblock those on platforms v2.13.0 did not work
> correctly at all.
>
> Ævar, thoughts?

I think we're mixing up several things here, which need to be untangled:

1) The sha1dc works just fine on most platforms even with undefined
behavior, as evidenced by 2.13.0 working.

2) There was a bug in practice with unaligned access on SPARC. It's
not clear to me whether anyone (Andreas, Liam?) still has any issues
in practice on any platform without specifying compile flags like what
Martin Ågren suggested above.

Andreas: Is your initial report of unaligned access here fixed in the
next branch with my "sha1dc: update from upstream" commit? You didn't
say what platform you were on.

Liam: How about your issue on SPARC?

3) Now we have another issue reported by Martin Ågren here, which is
that while the code works in practice on most platforms it's using
undefined behavior. On my GCC 7.1.1 it's sufficient to:

make -j8 CFLAGS="-fsanitize=undefined
-fsanitize-recover=undefined" LDFLAGS="-fsanitize=undefined
-fsanitize-recover=undefined" all

And then run e.g.:

./t0020-crlf.sh -v

To get spiel like:

sha1dc/sha1.c:346:2: runtime error: load of misaligned address
0x5610bf16d005 for type 'const uint32_t', which requires 4 byte
alignment
0x5610bf16d005: note: pointer points here
 65 6e 74 20 66 30 34  66 61 39 37 36 36 64 62  62 38 65 34 63 37
33 38  34 37 30 61 31 36 63 61  62

I think that this is definitely something worth looking into /
coordinating with upstream, but I haven't seen anything to suggest
that we need to be rushing to get a patch in to fix this given 1) and
nobody saying yet that 2) doesn't solve their issue as long as they're
not supplying some -fsanitize=* flags.

Now, stepping a bit back from this whole thing: I didn't read the
entire discussion back in February when sha1dc was integrated, but I
really don't see given all this churn / bug reporting we're getting
now why another acceptable solution wouldn't be to just revert
e6b07da278 ("Makefile: make DC_SHA1 the default", 2017-03-17) &
release 2.13.1 with that.

Clearly there are outstanding issues with it, and needing to do a
memcpy() as my `next` patch does will harm performance on some
platforms, and something like Martin's patch on top will slow it down
even more.

It seems to me that we should give it more time to cook, and better
understand the various trade-offs involved. The shattered attack is
very unlikely to impact anything in practice, and users who are
paranoid about it can opt-in to this extra protection.


Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 9:15 AM, Junio C Hamano  wrote:
> Jeffrey Walton  writes:
>
>> Is there no switch? Its the most efficient way to accomplish the task.
>
> This is a safety to help normal human users from hurting themselves,
> and it does not make any sense to have "I have no name, so record
> garbage, please" option, switch or setting that is different from
> "Here is the name I want you to use when recording things".
>
> The switch _is_ to set the names with whatever standard way.

Presumably OP doesn't want to mess with the env for whatever reason,
in that case:

git -c user.name=Nobody -c user.email=nob...@example.com
cherry-pick 


Re: [PATCH 00/33] object id conversion (grep and diff)

2017-06-02 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> Brandon Williams  writes:
>>
>>> A month or so ago I thought I would lend a hand to Brian and do a round of
>>> conversions from sha1 -> struct object_id.  Now that Brian's latest series 
>>> has
>>> hit master I can finally send these patches out.
>>>
>>> The first couple patches are from Brian which convert some of the notes 
>>> logic
>>> to using 'struct object_id'.  The remaining patches are to convert the grep 
>>> and
>>> diff machinery to using 'struct object_id'.
>>
>> Nicely done for all of them.  Thanks.  Will queue (with tweaks
>> mentioned in the comments).
>
> Oops.  I won't be able to queue this for now as it heavily conflicts
> with blame-lib topic.  The resolution should be trivial, mechanical
> and boring, but takes time that I do not have today.

I lied.  This also conflicts somewhat with Peff's diff-blob topic.
I think I resolved them correctly (there needs evil merges applied
to two files when merging this topic), and hopefully can push out
the result by the end of the day.

Thanks.


Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Junio C Hamano
Jeffrey Walton  writes:

> Is there no switch? Its the most efficient way to accomplish the task.

This is a safety to help normal human users from hurting themselves,
and it does not make any sense to have "I have no name, so record
garbage, please" option, switch or setting that is different from
"Here is the name I want you to use when recording things".

The switch _is_ to set the names with whatever standard way.  


Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Jeffrey Walton
On Fri, Jun 2, 2017 at 3:00 AM, Konstantin Khomoutov
 wrote:
> On Fri, Jun 02, 2017 at 02:02:22AM -0400, Jeffrey Walton wrote:
>
>> I'm working on a test machine. It mostly needs to be a clone of
>> upstream. On occasion it needs to test a particular commit.
>>
>> When I attempt to test a commit it produces:
>>
>> $ git cherry-pick eb3b27a6a543
>>
>> *** Please tell me who you are.
> [...]
>> This is a nameless test account, so there is no information to provide.
>>
>> How do I tell Git to ignore these checks?
> [...]
>> Well, they don't exist so there's nothing to set.
>>
>> The machine below its a CubieBoard used for testing. I remote into it
>> with test@. As a matter of policy, no check-ins occur on it. Other
>> than the password database and authroized_keys file, there is no
>> information on it to be lost or stolen.
>
> `git cherry-pick` wants to record a commit.  A commit in Git always
> possess the information on "the committer" -- whoever recorded the
> commit (it might be distinct from the commit author, as is the case with
> cherry-picking).  There's no way to not set the committer.
>
> I envision two ways to get around this situation:
>
> 1) Patch the ~/.whatevershellrc of your test account to set this
>information by setting and exporting the GIT_AUTHOR_NAME and
>GIT_AUTHOR_EMAIL env. variables (and may be others -- see the
>"git" manual page; run `git help git`).
>
>May be even add it in /etc/skel to make all accounts create inherit
>it.
>
> 2) Set these parameters in the repository you're working with.
>
>While Git suggests you to pass "--global" to the `git config`
>invocations, it's perfectly OK to use "--local" with them (which is
>IIRC the default, if not supplied) to make these settings be recorded
>in the repository's configuration rather than in ~/.gitconfig.
>
> 3) Pass these options explicitly to Git invocations or make a shell
>alias which would do so, like with
>
>function git() {
>  command git \
> -c user.name='Joe Tester' \
> -c user.email=tes...@acme.corp \
> "$@"
>}
>
> I'd personally go with (2).

Thanks.

Is there no switch? Its the most efficient way to accomplish the task.

Jeff


Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Konstantin Khomoutov
On Fri, Jun 02, 2017 at 02:02:22AM -0400, Jeffrey Walton wrote:

> I'm working on a test machine. It mostly needs to be a clone of
> upstream. On occasion it needs to test a particular commit.
> 
> When I attempt to test a commit it produces:
> 
> $ git cherry-pick eb3b27a6a543
> 
> *** Please tell me who you are.
[...]
> This is a nameless test account, so there is no information to provide.
> 
> How do I tell Git to ignore these checks?
[...]
> Well, they don't exist so there's nothing to set.
> 
> The machine below its a CubieBoard used for testing. I remote into it
> with test@. As a matter of policy, no check-ins occur on it. Other
> than the password database and authroized_keys file, there is no
> information on it to be lost or stolen.

`git cherry-pick` wants to record a commit.  A commit in Git always
possess the information on "the committer" -- whoever recorded the
commit (it might be distinct from the commit author, as is the case with
cherry-picking).  There's no way to not set the committer.

I envision two ways to get around this situation:

1) Patch the ~/.whatevershellrc of your test account to set this
   information by setting and exporting the GIT_AUTHOR_NAME and
   GIT_AUTHOR_EMAIL env. variables (and may be others -- see the
   "git" manual page; run `git help git`).
   
   May be even add it in /etc/skel to make all accounts create inherit
   it.

2) Set these parameters in the repository you're working with.

   While Git suggests you to pass "--global" to the `git config`
   invocations, it's perfectly OK to use "--local" with them (which is
   IIRC the default, if not supplied) to make these settings be recorded
   in the repository's configuration rather than in ~/.gitconfig.

3) Pass these options explicitly to Git invocations or make a shell
   alias which would do so, like with

   function git() {
 command git \
-c user.name='Joe Tester' \
-c user.email=tes...@acme.corp \
"$@"
   }

I'd personally go with (2).



What's cooking in git.git (Jun 2017, #02; Fri, 2)

2017-06-02 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

As much I hate to send a new issue of "What's cooking" report in
quick successions, quite a lot has happened on the 'master' front to
prepare topics that wants to eventually go to 'maint'.  I am aiming
to tag v2.13.1 very early next week, as I expect to be offline in
the latter half of the week.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/grep-preparatory-cleanup (2017-05-26) 31 commits
  (merged to 'next' on 2017-05-29 at f2cfa89d3e)
 + grep: assert that threading is enabled when calling grep_{lock,unlock}
 + grep: given --threads with NO_PTHREADS=YesPlease, warn
 + pack-objects: fix buggy warning about threads
 + pack-objects & index-pack: add test for --threads warning
 + test-lib: add a PTHREADS prerequisite
 + grep: move is_fixed() earlier to avoid forward declaration
 + grep: change internal *pcre* variable & function names to be *pcre1*
 + grep: change the internal PCRE macro names to be PCRE1
 + grep: factor test for \0 in grep patterns into a function
 + grep: remove redundant regflags assignments
 + grep: catch a missing enum in switch statement
 + perf: add a comparison test of log --grep regex engines with -F
 + perf: add a comparison test of log --grep regex engines
 + perf: add a comparison test of grep regex engines with -F
 + perf: add a comparison test of grep regex engines
 + perf: emit progress output when unpacking & building
 + perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
 + grep: add tests to fix blind spots with \0 patterns
 + grep: prepare for testing binary regexes containing rx metacharacters
 + grep: add a test helper function for less verbose -f \0 tests
 + grep: add tests for grep pattern types being passed to submodules
 + grep: amend submodule recursion test for regex engine testing
 + grep: add tests for --threads=N and grep.threads
 + grep: change non-ASCII -i test to stop using --debug
 + grep: add a test for backreferences in PCRE patterns
 + grep: add a test asserting that --perl-regexp dies when !PCRE
 + log: make --regexp-ignore-case work with --perl-regexp
 + log: add exhaustive tests for pattern style options & config
 + test-lib: rename the LIBPCRE prerequisite to PCRE
 + grep & rev-list doc: stop promising libpcre for --perl-regexp
 + Makefile & configure: reword inaccurate comment about PCRE
 (this branch is used by ab/pcre-v2 and sb/submodule-blanket-recursive.)

 The internal implementation of "git grep" has seen some clean-up.


* ab/ref-filter-no-contains (2017-05-23) 1 commit
  (merged to 'next' on 2017-05-29 at 5d39fd2961)
 + tag: duplicate mention of --contains should mention --no-contains

 Doc update to a recent topic.


* ah/doc-interpret-trailers-ifexists (2017-05-23) 1 commit
  (merged to 'next' on 2017-05-29 at cb353c1d21)
 + Documentation: fix reference to ifExists for interpret-trailers

 Documentation fix.


* ah/doc-pretty-format-fix (2017-05-23) 1 commit
  (merged to 'next' on 2017-05-29 at 6e3e8fd80d)
 + Documentation: fix formatting typo in pretty-formats.txt

 Documentation fix.


* dk/send-email-avoid-net-smtp-ssl-when-able (2017-06-01) 1 commit
  (merged to 'next' on 2017-06-01 at 3ff3ddfac7)
 + send-email: Net::SMTP::starttls was introduced in v2.34

 A hotfix to a topic in 'master'.


* jk/diff-blob (2017-05-24) 15 commits
  (merged to 'next' on 2017-05-29 at 5ecc979cc7)
 + diff: use blob path for blob/file diffs
 + diff: use pending "path" if it is available
 + diff: use the word "path" instead of "name" for blobs
 + diff: pass whole pending entry in blobinfo
 + handle_revision_arg: record paths for pending objects
 + handle_revision_arg: record modes for "a..b" endpoints
 + t4063: add tests of direct blob diffs
 + get_sha1_with_context: dynamically allocate oc->path
 + get_sha1_with_context: always initialize oc->symlink_path
 + sha1_name: consistently refer to object_context as "oc"
 + handle_revision_arg: add handle_dotdot() helper
 + handle_revision_arg: hoist ".." check out of range parsing
 + handle_revision_arg: stop using "dotdot" as a generic pointer
 + handle_revision_arg: simplify commit reference lookups
 + handle_revision_arg: reset "dotdot" consistently

 The result from "git diff" that compares two blobs, e.g. "git diff
 $commit1:$path $commit2:$path", used to be shown with the full
 object name as given on the command line, but it is more natural to
 use the $path in the output and use it to look up .gitattributes.


* js/bs-is-a-dir-sep-on-windows (2017-05-26) 2 commits
  (merged to 'next' on 2017-05-26 at 450b39f726)
 + Windows: do not treat 

Re: How to avoid "Please tell me who you are..."?

2017-06-02 Thread Jeffrey Walton
On Fri, Jun 2, 2017 at 2:30 AM, Davide Fiorentino
 wrote:
> Is there a reason why you don't want or can't set those details?

Well, they don't exist so there's nothing to set.

The machine below its a CubieBoard used for testing. I remote into it
with test@. As a matter of policy, no check-ins occur on it. Other
than the password database and authroized_keys file, there is no
information on it to be lost or stolen.

Jeff

> On June 2, 2017 7:02:22 AM GMT+01:00, Jeffrey Walton 
> wrote:
>>
>> I'm working on a test machine. It mostly needs to be a clone of
>> upstream. On occasion it needs to test a particular commit.
>>
>> When I attempt to test a commit it produces:
>>
>> $ git cherry-pick eb3b27a6a543
>>
>> *** Please tell me who you are.
>>
>> Run
>>
>>   git config --global user.email "y...@example.com"
>>   git config --global user.name "Your Name"
>>
>> to set your account's default identity.
>> Omit --global to set the identity only in this repository.
>>
>> This is a nameless test account, so there is no information to provide.
>>
>> How do I tell Git to ignore these checks?
>>
>> Thanks in advance.


How to avoid "Please tell me who you are..."?

2017-06-02 Thread Jeffrey Walton
I'm working on a test machine. It mostly needs to be a clone of
upstream. On occasion it needs to test a particular commit.

When I attempt to test a commit it produces:

$ git cherry-pick eb3b27a6a543

*** Please tell me who you are.

Run

  git config --global user.email "y...@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

This is a nameless test account, so there is no information to provide.

How do I tell Git to ignore these checks?

Thanks in advance.