Re: [GSoC] Proposal Discussion

2017-03-27 Thread Christian Couder
Hi,

On Tue, Mar 28, 2017 at 12:17 AM, Devin Lehmacher  wrote:
> Hello everyone,
>
> I am a student studying Computer Science at Cornell University. I
> already completed a microproject, Move ~/.git-credential-cache/socket to
> $XDG_CACHE_HOME/credential/socket a week and a half ago or so.

Nice. It would be better though if you could provide a link to the
thread where your microproject was discussed. If it has been merged to
master, you could also provide the merge commit. Otherwise please tell
us its branch name and current status in the last "What's cooking
in git.git" email from Junio.

> I am interested in 2 different projects and would like some advice on
> them, to help me decide which one to submit a proposal for.
>
> 1. `git rebase -i` conversion.
>I was initially the most interested in this project but realize that
>after having a very busy week last week that Ivan Tham started
>[discussion][1] about this project. Would it be appropriate to submit
>a proposal for a project that someone else also wants to work on?

Yes, that is ok. Obviously only one student/proposal can be selected
for a given project, but since the main constraint for us is usually
the number of available mentors, there is little chance that this
would prevent us from selecting one more student than we could
otherwise select.

You could also submit 2 proposals if you have time to work on more than one.

> 2. formatting tool improvements.
>There are four different git commands mentioned [here][2] as possible
>tools to improve as can be seen in the email. Of those I think it
>would make the most sense to extend `git name-rev`. It seems best
>suited to the desired behavior. It would need to be extended to
>understand rev's that refer to objects rather than just a commit-ish
>and also add formatting support similar to the information that log
>and for-each-ref can output. Since this doesn't seem like much work,
>would it be feasible to generalize and somewhat standardize all of
>the formatting commands?

Yeah, I think it would be good. It might involve a lot of discussion,
though, and that could slow your project down.
So if you really want to do it, my advice is to try to start the
discussion as soon as possible, that is now.

To do that you could for example Cc people involved in the email
discussions, and try to come up with concrete proposals about how to
generalize and standardize the formatting commands.

Thanks,
Christian.


Re: [PATCH/RFC] parse-options: add facility to make options configurable

2017-03-27 Thread Jeff King
On Sat, Mar 25, 2017 at 11:32:02PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > So hopefully it's clear that the two are functionally equivalent, and
> > differ only in syntax (in this case we manually decided which options
> > are safe to pull from the config, but we'd have to parse the options.log
> > string, too, and we could make the same decision there).
> 
> I like the simplicity of this approach a lot. I.e. (to paraphrase it
> just to make sure we're on the same page): Skip all the complexity of
> reaching into the getopt guts, and just munge argc/argv given a config
> we can stick ahead of the getopt (struct options) spec, inject some
> options at the beginning if they're in the config, and off we go
> without any further changes to the getopt guts.

Yep, I think that's an accurate description.

> There's two practical issues with this that are easy to solve with my
> current approach, but I can't find an easy solution to using this
> method.
> 
> The first is that we're replacing the semantics of:
> 
> "If you're specifying it on the command-line, we take it from there,
> otherwise we use your config, if set, regardless of how the option
> works"
> 
> with:
> 
> "We read your config, inject options implicitly at the start of the
> command line, and then append whatever command-line you give us"
> 
> These two are not the same. Consider e.g. the commit.verbose config.
> With my current patch if have commit.verbose=1 in your config and do
> "commit --verbose" you just end up with a result equivalent to not
> having it in your config, but since the --verbose option can be
> supplied multiple times to increase verbosity with the injection
> method you'd end up with the equivalent of commit.verbose=2.

Right, for anything where multiple options are meaningful, they'd have
to give "--no-verbose" to reset the option. In a sense that's less
friendly, because it's more manual. But it's also less magical, because
the semantics are clear: the config option behaves exactly as if you
gave the option on the command line. So for an OPT_STRING_LIST(), you
could append to the list, or reset it to empty, etc, as you see fit.

But I do agree that it's more manual, and probably would cause some
confusion.

> I can't think of a good way around that with your proposed approach
> that doesn't essentially get us back to something very similar to my
> patch, i.e. we'd need to parse the command-line using the options spec
> before applying our implicit config.

Yes, the semantics you gave require parsing the options first. I think
it would be sufficient to just give each "struct option" a "seen" flag
(rather than having it understand the config mechanism), having
parse_options() set the flag, and then feeding the result to a separate
config/cmdline mapping mechanism. That keeps the complexity out of the
options code.

It does tie us back in to requiring parse-options, which not all the
options use.

In a lot of cases that "seen" flag is effectively a sentinel value in
whatever variable the option value is stored in. But some of the options
don't have reasonable sentinel values (as you noticed with the "revert
-m" handling recently).

> The second issue is related, i.e. I was going to add some flag an
> option could supply to say "if I'm provided none of these other
> maybe-from-config options get to read their config". This is needed
> for hybrid plumbing/porcelain like "git status --porcelain".

Yeah, I agree you can't make that decision until you've seen the
command-line options. I think we currently do some hairy stuff where we
speculatively read config into a variable, and then apply the
config-based defaults only when we know we're in non-porcelain mode (see
status_deferred_config in builtin/commit.c).

That came about because we didn't want to parse the config a second
time. These days the deferred config should probably just be read from
the cached configset, after we've read the other options.

But I think this can be done after the full option-parsing is finished
by applying the mapping then.  I.e., something like:

parse_options(argc, argv, options, ...);
if (status_format != STATUS_FORMAT_PORCELAIN)
apply_config_mapping(status_mapping, options);

-Peff


[PATCH] [GSOC] get_non_kept_pack_filenames(): reimplement using iterators

2017-03-27 Thread Robert Stanca
Replaces recursive traversing of opendir with dir_iterator.

Signed-off-by: Robert Stanca 
---
 builtin/repack.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 677bc7c..27a5597 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -7,6 +7,8 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -86,26 +88,21 @@ static void remove_pack_on_signal(int signo)
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
-   DIR *dir;
-   struct dirent *e;
+   struct dir_iterator *diter = dir_iterator_begin(packdir);
char *fname;
 
-   if (!(dir = opendir(packdir)))
-   return;
-
-   while ((e = readdir(dir)) != NULL) {
+   while (dir_iterator_advance(diter) == ITER_OK) {
size_t len;
-   if (!strip_suffix(e->d_name, ".pack", &len))
+   if (!strip_suffix(diter->relative_path, ".pack", &len))
continue;
 
-   fname = xmemdupz(e->d_name, len);
+   fname = xmemdupz(diter->relative_path, len);
 
if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
string_list_append_nodup(fname_list, fname);
else
free(fname);
}
-   closedir(dir);
 }
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
-- 
2.7.4




Hi, this is my first patch submission for Git GSoC. I ran the full test
suite and local tests with
prove --timer --jobs 15 ./t*pack*.sh .

Have a great day,
 Robert.


Re: [PATCH 7/7] submodule.c: correctly handle nested submodules in is_submodule_modified

2017-03-27 Thread Jonathan Nieder
Junio C Hamano wrote:

> Shouldn't this done as part of 4/7 where is_submodule_modified()
> starts reading from the porcelain v2 output?  4/7 does adjust for
> the change from double question mark (porcelain v1) to a single one
> for untracked, but at the same time it needs to prepare for these
> 'u' (unmerged), '1' (normal modification) and '2' (mods with rename)
> to appear in the output, no?
>
> IOW, with 4/7 and 7/7 done as separate steps, isn't the system
> broken between these steps?

No.  Both before and after patch 4, this code has to determine two
details from a submodule:

 1. Does it have untracked files?
 2. Does it have any modifications to tracked files (including
submodules)?

Using porcelain v1 format, (1) is represented by a "??" line and (2)
is represented by any other line. Using porcelain v2 format, (1) is
represented by a "?" line and (2) is represented by any other line.

So patch 4 does not intend to change behavior.

This patch 7 is trying to do something more subtle.  Suppose I have a
superproject 'parent', with a submodule 'parent/sub', which itself
contains a submodule 'parent/sub/subsub'.  Now suppose I run, from
within 'parent':

echo hi >sub/subsub/stray-file

Both before and after patch 4, if I run "git status" from 'parent'
then I will learn that "sub" was modified.  "git status" within 'sub'
would tell me that "subsub" has an untracked file.

But from the end user's point of view, even when running in "parent",
what I want to know is that there is an untracked file.  Treating it
as a modification instead of untracked file is confusing and does
not answer the user's actual question.  That is what patch 7 tries to
fix.

In other words, patch 7 is about changing that list of two questions
from before.  Something like

 1. Does it or any submodule contained within it have untracked files,
that I could add with "git add -N --recurse-submodules"?

 2. Does it or any submodule contained within it have modified files,
that I could add with "git add -u --recurse-submodules"?

 3. Does it or any submodule contained within it have a changed HEAD,
that I could also add with "git add -u --recurse-submodules"?

Question (3) didn't come up before because when there are no nested
submodules, the diff machinery answers it (saving us from getting the
answer from the status --porcelain we recurse to).

Thanks and hope that helps,
Jonathan


Re: [PATCH] push: allow atomic flag via configuration

2017-03-27 Thread Jonathan Nieder
Hi,

Romuald Brunet wrote:
> On ven., 2017-03-24 at 12:29 -0700, Junio C Hamano wrote:
>> Jeff King  writes:

>>> My one question would be whether people would want this to actually be
>>> specific to a particular remote, and not just on for a given repository
>>> (your "site-specific" in the description made me think of that). In that
>>> case it would be better as part of the remote.* config.
>>
>> Yeah, I had the same reaction.
>>
>> Conceptually, this sits next to remote.*.push that defines which set
>> of refs are sent by default, and remote.<name>.pushAtomic does make
>> sense.  If (and only if) it turns out to be cumbersome for somebody
>> to set the configuration for each and every remote, it is OK to also
>> add push.atomic to serve as a fallback for remote.*.pushAtomic, I
>> would think, but adding only push.atomic feels somewhat backwards.
>
> Thanks for your feedback
>
> I'm mostly using single remotes that's why I didn't even think of making
> it configurable per remote. But you're right that makes more sense.
>
> I'll try to make that modification to the patch.
>
> As for my use case: I'd like to use default atomic pushes when pushing a
> new tag among our stable branch, but inevitably forgetting to rebase
> beforehand. Therefore pushing a "dangling" commit/tag

Making it per-remote would also address my concern about scripts.
Existing scripts may be using

git push $url $refs

and relying on the push to partially succeed even when some refs
cannot be fast-forwarded.  (For example, I had such a script that did

git push my-mirror refs/remotes/origin/*:refs/heads/*
git push my-mirror +origin/pu:pu

because the first command would fail to update pu, and then the second
command would clean up after it.)  A configuration that globally
changes the effect of "git push" to mean "git push --atomic" would
break such scripts.  A per-remote configuration is tightly enough
scoped to not be likely to cause unrelated breakage for users that run
scripts written by others.

Thanks and hope that helps,
Jonathan


RE: [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module

2017-03-27 Thread Ben Peart
> -Original Message-
> From: Jonathan Tan [mailto:jonathanta...@google.com]
> Sent: Monday, March 27, 2017 3:00 PM
> To: Ben Peart ; git@vger.kernel.org
> Cc: Ben Peart 
> Subject: Re: [PATCH v2 2/2] sub-process: refactor the filter process code into
> a reusable module
> 
> On 03/24/2017 08:27 AM, Ben Peart wrote:
> > Refactor the filter.<driver>.process code into a separate sub-process
> > module that can be used to reduce the cost of starting up a
> > sub-process for multiple commands.  It does this by keeping the
> > external process running and processing all commands by communicating
> > over standard input and standard output using the packet format (pkt-line)
> based protocol.
> > Full documentation is in Documentation/technical/api-sub-process.txt.
> 
> Thanks - this looks like something useful to have.

Thanks for the review and feedback.

> 
> When you create a "struct subprocess_entry" to be entered into the system,
> it is not a true "struct subprocess_entry" - it is a "struct subprocess_entry"
> plus some extra variables at the end. Since the sub-process hashmap is
> keyed solely on the command, what happens if another component uses the
> same trick (but with different extra
> variables) when using a sub-process with the same command?

Having the command be the unique key is sufficient because it gets executed as 
a process by run_command and there can't be multiple different processes by the 
same name. 

> 
> I can think of at least two ways to solve this: (i) each component can have 
> its
> own sub-process hashmap, or (ii) add a component key to the hashmap. (i)
> seems more elegant to me, but I'm not sure what the code will look like.
> 
> Also, I saw some minor code improvements possible (e.g. using "starts_with"
> when you're checking for the "status=" line) but I'll comment on those
> and look into the code more thoroughly once the questions in this e-mail are
> resolved.
> 
> > diff --git a/sub-process.h b/sub-process.h new file mode 100644 index
> > 00..d1492f476d
> > --- /dev/null
> > +++ b/sub-process.h
> > @@ -0,0 +1,46 @@
> > +#ifndef SUBPROCESS_H
> > +#define SUBPROCESS_H
> > +
> > +#include "git-compat-util.h"
> > +#include "hashmap.h"
> > +#include "run-command.h"
> > +
> > +/*
> > + * Generic implementation of background process infrastructure.
> > + * See Documentation/technical/api-background-process.txt.
> > + */
> > +
> > + /* data structures */
> > +
> > +struct subprocess_entry {
> > +   struct hashmap_entry ent; /* must be the first member! */
> > +   struct child_process process;
> > +   const char *cmd;
> > +};
> 
> I notice from the documentation (and from "subprocess_get_child_process"
> below) that this is meant to be opaque, but I think this can be non-opaque
> (like "run-command").
> 
> Also, I would prefer adding a "util" pointer here instead of using it as an
> embedded struct. There is no clue here that it is embeddable or meant to be
> embedded.
> 

The structure is intentionally opaque to provide the benefits of encapsulation.
Obviously, the C language doesn't provide any enforcement of that design
principle, but we do what we can.

The embedded struct is following the same design pattern as elsewhere in git 
(hashmap for example) simply for consistency.

> > +
> > +/* subprocess functions */
> > +
> > +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> > +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> > +   subprocess_start_fn startfn);
> 
> I'm not sure if it is useful to take a callback here - I think the caller of 
> this
> function can just run whatever it wants after a successful subprocess_start.

The purpose of doing the subprocess specific initialization via a callback is 
so that if it encounters an error (for example, it can't negotiate a common 
interface version) the subprocess_start function can detect that and ensure the 
hashmap does not contain the invalid/unusable subprocess. 

> 
> Alternatively, if you add the "util" pointer (as I described above), then it
> makes sense to add a subprocess_get_or_start() function (and now it makes
> sense to take the callback). This way, the data structure will own, create, 
> and
> destroy all the "struct subprocess_entry" that it needs, creating a nice
> separation of concerns.
> 
> > +
> > +void subprocess_stop(struct subprocess_entry *entry);
> 
> (continued from above) And it would be clear that this would free
> "entry", for example.
> 
> > +
> > +struct subprocess_entry *subprocess_find_entry(const char *cmd);
> > +
> > +/* subprocess helper functions */
> > +
> > +static inline struct child_process *subprocess_get_child_process(
> > +   struct subprocess_entry *entry)
> > +{
> > +   return &entry->process;
> > +}
> > +
> > +/*
> > + * Helper function that will read packets looking for "status="
> > + * key/value pairs and return the value from the last "status" packet
> > + */
> > +
> 

Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 04:32:10PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > Hrm, there shouldn't be any dependency of the config on the index (and
> > there are a handful of options which impact the index already). Did you
> > try it and run into problems?
> >
> > In general, I'd much rather see us either:
> >
> >   1. Rip the code out entirely if it is not meant to be configurable,
> >  and cannot be triggered by the actual git binary.
> >
> > or
> >
> >   2. Make it configurable, even if most people wouldn't use it. And then
> >  have a test to exercise it using a git command (unlike the one-off
> >  test helper, which isn't run at all).
> 
> Agreed with this, leaning toward (1).
> 
> If "git fsck" verifies the .git/index file then I don't see any need
> for other commands to.

Yeah, code that _can_ be run but almost nobody does is possibly worse
than code that can't be run. :)

I agree that it would make sense for fsck to check it, though (just like
it checks the actual pack trailer checksums, even though normal
operations do not).

-Peff


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jonathan Nieder
Jeff King wrote:

> Hrm, there shouldn't be any dependency of the config on the index (and
> there are a handful of options which impact the index already). Did you
> try it and run into problems?
>
> In general, I'd much rather see us either:
>
>   1. Rip the code out entirely if it is not meant to be configurable,
>  and cannot be triggered by the actual git binary.
>
> or
>
>   2. Make it configurable, even if most people wouldn't use it. And then
>  have a test to exercise it using a git command (unlike the one-off
>  test helper, which isn't run at all).

Agreed with this, leaning toward (1).

If "git fsck" verifies the .git/index file then I don't see any need
for other commands to.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
> 
> Teach git to skip verification of the index SHA in read_index().
> 
> This is a performance optimization.  The index file SHA verification
> can be considered an ancient relic from the early days of git and only
> useful for detecting disk corruption.  For small repositories, this
> SHA calculation is not that significant, but for gigantic repositories
> this calculation adds significant time to every command.
> 
> I added a global "skip_verify_index" variable to control this and
> allow it to be tested.
> 
> I did not create a config setting for this because of chicken-n-egg
> problems with the loading the config and the index.

Hrm, there shouldn't be any dependency of the config on the index (and
there are a handful of options which impact the index already). Did you
try it and run into problems?

In general, I'd much rather see us either:

  1. Rip the code out entirely if it is not meant to be configurable,
 and cannot be triggered by the actual git binary.

or

  2. Make it configurable, even if most people wouldn't use it. And then
 have a test to exercise it using a git command (unlike the one-off
 test helper, which isn't run at all).

-Peff


Re: [PATCH v2 0/2] read-cache: call verify_hdr() in a background thread

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 09:09:37PM +, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
> 
> Version 2 of this patch series simplifies this to just
> turn off the hash verification.  Independent comments
> from Linus and Peff suggested that we could just turn
> this off and not worry about it.  So I've updated this
> patch to do that.  I added a global variable to allow
> the original code path to be used.  I also added a
> t/helper command to demonstrate the differences.
> 
> On the Linux repo, the effect is rather trivial:
> 
> $ ~/work/gfw/t/helper/test-skip-verify-index -c 3
> 0.029884 0 [cache_nr 57994]
> 0.031035 0 [cache_nr 57994]
> 0.024308 0 [cache_nr 57994]
> 0.028409 0 avg
> 0.018359 1 [cache_nr 57994]
> 0.017025 1 [cache_nr 57994]
> 0.011087 1 [cache_nr 57994]
> 0.015490 1 avg
> 
> On my Windows source tree (450MB index), I'm seeing a
> savings of 0.6 seconds -- read_index() went from 1.2 to 0.6
> seconds.

Very satisfying. I assume that was with OpenSSL as the SHA-1
implementation (sha1dc would have been much slower on 450MB, I think).

-Peff


What's cooking in git.git (Mar 2017, #11; Mon, 27)

2017-03-27 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* jc/lint-runaway-here-doc (2017-03-23) 1 commit
  (merged to 'next' on 2017-03-24 at c1f1ca1bd7)
 + tests: lint for run-away here-doc

 The test framework learned to detect unterminated here documents.


* jk/prefix-filename (2017-03-21) 6 commits
  (merged to 'next' on 2017-03-22 at 6d180ed430)
 + bundle: use prefix_filename with bundle path
 + prefix_filename: simplify windows #ifdef
 + prefix_filename: return newly allocated string
 + prefix_filename: drop length parameter
 + prefix_filename: move docstring to header file
 + hash-object: fix buffer reuse with --path in a subdirectory

 Code clean-up with minor bugfixes.


* jk/quote-env-path-list-component (2017-03-22) 1 commit
  (merged to 'next' on 2017-03-24 at 78843c4f9d)
 + t5615: fix a here-doc syntax error

 A test fix.


* km/config-grammofix (2017-03-23) 1 commit
  (merged to 'next' on 2017-03-24 at 75ddbbc6f9)
 + doc/config: grammar fixes for core.{editor,commentChar}

 Doc update.


* km/t1400-modernization (2017-03-21) 5 commits
  (merged to 'next' on 2017-03-22 at e9c3154ca4)
 + t1400: use test_when_finished for cleanup
 + t1400: remove a set of unused output files
 + t1400: use test_path_is_* helpers
 + t1400: set core.logAllRefUpdates in "logged by touch" tests
 + t1400: rename test descriptions to be unique

 Code clean-up.


* sb/describe-broken (2017-03-22) 1 commit
  (merged to 'next' on 2017-03-24 at 2262cbf415)
 + builtin/describe: introduce --broken flag

 "git describe --dirty" dies when it cannot be determined if the
 state in the working tree matches that of HEAD (e.g. broken
 repository or broken submodule).  The command learned a new option
 "git describe --broken" to give "$name-broken" (where $name is the
 description of HEAD) in such a case.


* sb/push-options-via-transport (2017-03-22) 2 commits
  (merged to 'next' on 2017-03-24 at c5535bec3b)
 + remote-curl: allow push options
 + send-pack: send push options correctly in stateless-rpc case

 Recently we started passing the "--push-options" through the
 external remote helper interface; now the "smart HTTP" remote
 helper understands what to do with the passed information.


* sb/submodule-update-initial-runs-custom-script (2017-03-22) 1 commit
  (merged to 'next' on 2017-03-24 at 628200c3b1)
 + t7406: correct test case for submodule-update initial population

 A test fix.


* sb/t3600-rephrase (2017-03-22) 1 commit
  (merged to 'next' on 2017-03-24 at 5ec1eee490)
 + t3600: rename test to describe its functionality

 A test retitling.


* st/verify-tag (2017-03-24) 1 commit
  (merged to 'next' on 2017-03-24 at d26313d4ab)
 + t7004, t7030: fix here-doc syntax errors

 A few unterminated here documents in tests were fixed, which in
 turn revealed incorrect expectations the tests make. These tests
 have been updated.

--
[New Topics]

* ab/case-insensitive-upstream-and-push-marker (2017-03-27) 1 commit
 - rev-parse: match @{upstream}, @{u} and @{push} case-insensitively

 On many keyboards, typing "@{" involves holding down SHIFT key and
 one can easily end up with "@{Up..." when typing "@{upstream}".  As
 the upstream/push keywords do not appear anywhere else in the syntax,
 we can safely accept them case-insensitively to solve this, without
 introducing ambiguity or confusion.

 Will merge to 'next'.


* bc/object-id (2017-03-26) 21 commits
 - Documentation: update and rename api-sha1-array.txt
 - Rename sha1_array to oid_array
 - Convert sha1_array_for_each_unique and for_each_abbrev to object_id
 - Convert sha1_array_lookup to take struct object_id
 - Convert remaining callers of sha1_array_lookup to object_id
 - Make sha1_array_append take a struct object_id *
 - sha1-array: convert internal storage for struct sha1_array to object_id
 - builtin/pull: convert to struct object_id
 - submodule: convert check_for_new_submodule_commits to object_id
 - sha1_name: convert disambiguate_hint_fn to take object_id
 - sha1_name: convert struct disambiguate_state to object_id
 - test-sha1-array: convert most code to struct object_id
 - parse-options-cb: convert sha1_array_append caller to struct object_id
 - fsck: convert init_skiplist to struct object_id
 - builtin/receive-pack: convert portions to struct object_id
 - builtin/receive-pack: fix incorrect pointer arithmetic
 - builtin/pull: convert portions to struct object_id
 - builtin/diff: convert to struct object_id
 - Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
 - 

RE: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()

2017-03-27 Thread Ben Peart
> -Original Message-
> From: Torsten Bögershausen [mailto:tbo...@web.de]
> Sent: Saturday, March 25, 2017 1:47 AM
> To: Ben Peart ; git@vger.kernel.org
> Cc: Ben Peart ; christian.cou...@gmail.com;
> larsxschnei...@gmail.com
> Subject: Re: [PATCH v2 1/2] pkt-line: add packet_writel() and
> packet_read_line_gently()
> 
> On 2017-03-24 16:27, Ben Peart wrote:
> > Add packet_writel() which writes multiple lines in a single call and
> > then calls packet_flush_gently(). Add packet_read_line_gently() to
> > enable reading a line without dying on EOF.
> >
> > Signed-off-by: Ben Peart 
> > ---
> >  pkt-line.c | 31 +++
> >  pkt-line.h | 11 +++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index d4b6bfe076..2788aa1af6 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char
> *fmt, ...)
> > return status;
> >  }
> >
> > +int packet_writel(int fd, const char *line, ...)
> The name packet_writel is hard to distinguish from packet_write.
> Would packet_write_lines make more sense ?
> 

Just goes to prove that there are two hard things in computer science: cache 
invalidation, naming things, and off-by-one errors. :)  The feedback on V1 was:

"I am hesitant to take a function that does not take any "list"
type argument and yet calls itself "write_list".  IOW, I'd expect something 
like these

write_list(int fd, const char **lines);
write_list(int fd, struct string_list *lines);

given that name, and not "varargs, each of which is a line".  I am tempted to 
suggest

packet_writel(int fd, const char *line, ...);"

> > +{
> > +   va_list args;
> > +   int err;
> > +   va_start(args, line);
> > +   for (;;) {
> > +   if (!line)
> > +   break;
> > +   if (strlen(line) > LARGE_PACKET_DATA_MAX)
> > +   return -1;
> > +   err = packet_write_fmt_gently(fd, "%s\n", line);
> > +   if (err)
> > +   return err;
> > +   line = va_arg(args, const char*);
> > +   }
> > +   va_end(args);
> > +   return packet_flush_gently(fd);
> > +}
> > +
> I don't think that this va_start() is needed, even if it works.
> 
> int packet_write_line(int fd, const char *lines[])
> |
>   const char *line = *lines;
>   int err;
>   while (line) {
>   if (strlen(line) > LARGE_PACKET_DATA_MAX)
>   return -1;
>   err = packet_write_fmt_gently(fd, "%s\n", line);
>   if (err)
>   return err;
>   lines++;
>   line = *lines;
>   }
>   return packet_flush_gently(fd);
> ]
> 

This is a helper function to simplify the common pattern of writing out a 
variable number of lines followed by a flush.  The usage of the function as 
currently implemented is:

packet_writel(fd, "line one", "line two", "line n", NULL);

which requires the use of variable number of arguments.  With your proposal 
that convenience is lost as you have to create an array of strings and pass 
that instead.  The usage just isn't as simple as the current model.



RE: [PATCH v1 2/3] sub-process: refactor the filter process code into a reusable module

2017-03-27 Thread Ben Peart
> Junio C Hamano  writes:
> 
> 
> To avoid confusion (although readers may not require), even though I
> explained "boring mechanical part" first and "refactoring", that was purely
> for explanation.
> 
> In practice, I would expect that it would be easier to both do and review if
> refactoring is done with the original name.
> 
> A function that will keep its name in the final result (e.g.
> start_multi_file_filter()) will call a new and more generic helper function.  
> The
> new helper may start using the new name from the get-go (e.g.
> subprocess_start()), but the data types it shares with the original part of 
> the
> code (e.g. 'struct cmd2process') may still be using the original name.
> 
> And after completing "2 or more" refactoring would be a good place to do
> the remaining "boring mechanical rename".  IOW, the count above could be
> 
>  2 or more (refactoring) +
>  1 (boring mechanical part) +
>  1 (final movement)
> 
> and I didn't mean to say that you need to rename first.  What we want is "if
> you need to have a single large patch that cannot be split, see if you can
> make it purely mechanical.", as a single large patch that is _not_ mechanical
> conversion is the worst kind of patch for reviewers.

Thanks, I think I better understand what you are looking for in a patch series. 
In short, any non-trivial refactoring should take place within the same file, 
using one or more patches to keep each patch as simple as possible.  Any large 
or cross-file refactoring should be made as boring/mechanical as possible.  
This makes it easier to see any complex changes within a single section of a 
formatted patch, and avoids having to compare two files in the patch to ensure 
the refactoring didn't unintentionally change behavior.

I'll throw out my current refactoring and do it again attempting to follow 
these guidelines as soon as I can find the time ($DAYJOB tends to take priority 
over my open source work).



[GSoC] Proposal Discussion

2017-03-27 Thread Devin Lehmacher
Hello everyone,

I am a student studying Computer Science at Cornell University. I
already completed a microproject, Move ~/.git-credential-cache/socket to
$XDG_CACHE_HOME/credential/socket a week and a half ago or so.

I am interested in 2 different projects and would like some advice on
them, to help me decide which one to submit a proposal for.

1. `git rebase -i` conversion. 
   I was initially most interested in this project, but after a very
   busy week last week I realized that Ivan Tham had started
   [discussion][1] about this project. Would it be appropriate to submit
   a proposal for a project that someone else also wants to work on?

2. formatting tool improvements. 
   There are four different git commands mentioned [here][2] as possible
   tools to improve. Of those I think it would make the most sense to
   extend `git name-rev`, since it seems best suited to the desired
   behavior. It would need to be extended to understand revs that refer
   to objects other than just a commit-ish, and to gain formatting
   support similar to what log and for-each-ref can output. Since this
   doesn't seem like much work, would it be feasible to generalize and
   somewhat standardize all of the formatting commands?

[1]: 
https://public-inbox.org/git/20170320164154.xbcu6rg0c%25pickf...@riseup.net/
[2]: 
https://public-inbox.org/git/CA+P7+xr4ZNCCJkS0=yR-FNu+MrL60YX-+Wsz9L_5LCNhnY_d=a...@mail.gmail.com/

Thanks for any feedback you may have,
Devin


Re: [PATCH 3/3] blame: output porcelain "previous" header for each file

2017-03-27 Thread Jeff King
On Mon, Mar 27, 2017 at 11:08:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jan 6, 2017 at 5:20 AM, Jeff King  wrote:
> 
> > +for output in porcelain line-porcelain
> > +do
> > +   test_expect_success "generate --$output output" '
> > +   git blame --root -C --$output combined >output
> > +   '
> > +
> 
> The --root option isn't needed here, the tests pass if it's removed.
> 
> Discovered while looking at the sorry state of our blame test suite
> vis-a-vis tests for config, no tests for blame.showroot &
> blame.blankboundary.
> 
> I'll submit that eventually, but meanwhile did you mean to omit --root
> here, or is the test broken in some other way and it /should/ need
> --root but doesn't?

I don't think it's strictly needed, though I think it's worth keeping.

The test is making sure that some lines blame to HEAD and some to HEAD^.
But the latter is a root commit, and so it just becomes a boundary
commit without --root.

You can see the difference if you interrupt the test here and run:

  git blame -C [--root] combined

And that's what I was doing when I developed the test case. If you were
to test the non-porcelain output, you'd need it (to match the real
sha-1 X, and not the boundary "^X").

When the porcelain outputs are used, though, the boundary commits are
shown as full sha1s, and just get annotated with a "boundary" line. As
the tests just look for subject, filename, and previous lines, they
don't notice whether the boundary marker is there or not. And so --root
could technically go away.

We care mostly about detecting the values for the second commit, so I
don't think it actually matters. But if we wanted to be thorough, we
could confirm that the "boundary" lines are as we expect (or
alternatively, we could add an uninteresting base commit at the bottom
to make the second one non-root).

-Peff


Re: [PATCH 7/7] submodule.c: correctly handle nested submodules in is_submodule_modified

2017-03-27 Thread Junio C Hamano
Stefan Beller  writes:

> When a nested submodule has untracked files, it would be reported as
> "modified submodule" in the superproject, because submodules are not
> parsed correctly in is_submodule_modified as they are bucketed into
> the modified pile as "they are not an untracked file".

I cannot quite parse the above.

> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 23 +--
>  t/t3600-rm.sh   |  2 +-
>  t/t7506-status-submodule.sh |  2 +-
>  3 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index fa21c7bb72..730cc9513a 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1078,8 +1078,27 @@ unsigned is_submodule_modified(const char *path, int 
> ignore_untracked)
>   /* regular untracked files */
>   if (buf.buf[0] == '?')
>   dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> - else
> - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
> +
> + if (buf.buf[0] == 'u' ||
> + buf.buf[0] == '1' ||
> + buf.buf[0] == '2') {
> + /*
> +  * T XY <sub> <path>:
> +  * T = line type, XY = status, <sub> = submodule state
> +  */
> + if (buf.len < 1 + 1 + 2 + 1 + 4)
> + die("BUG: invalid status --porcelain=2 line %s",
> + buf.buf);
> +
> + /* regular unmerged and renamed files */
> + if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
> + /* nested untracked file */
> + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;

OK, we have untracked one.

> + if (memcmp(buf.buf + 5, "S..U", 4))
> + /* other change */
> + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;

And for other cases like C at buf.buf[6] or M at buf.buf[7],
i.e. where the submodule not just has untracked files but has real
changes, we say it is truly MODIFIED here.

Changes to paths that are not submodules but tracked files inside the
submodule in question would have N at buf.buf[5], and are also caught
with the same "not S..U so that's MODIFIED" logic.

OK.

Shouldn't this be done as part of 4/7 where is_submodule_modified()
starts reading from the porcelain v2 output?  4/7 does adjust for
the change from double question mark (porcelain v1) to a single one
for untracked, but at the same time it needs to prepare for these
'u' (unmerged), '1' (normal modification) and '2' (mods with rename)
to appear in the output, no?

IOW, with 4/7 and 7/7 done as separate steps, isn't the system
broken between these steps?
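For reference, the classification being reviewed above can be isolated into a small stand-alone function. This is an illustration rather than the patch itself (names and the driver around it are invented), but the `status --porcelain=2` offsets are exactly the ones discussed: a line looks like `1 XY <sub> ...`, so buf[5..8] is the 4-character submodule state field, and `"S..U"` means "only untracked files changed":

```c
#include <string.h>

#define DIRTY_SUBMODULE_UNTRACKED (1 << 0)
#define DIRTY_SUBMODULE_MODIFIED  (1 << 1)

/* classify one `git status --porcelain=2` line from a submodule */
static unsigned classify_status_line(const char *buf, size_t len)
{
	unsigned dirty = 0;

	if (buf[0] == '?')		/* regular untracked file */
		dirty |= DIRTY_SUBMODULE_UNTRACKED;

	if (buf[0] == 'u' || buf[0] == '1' || buf[0] == '2') {
		if (len < 1 + 1 + 2 + 1 + 4)
			return 0;	/* malformed; the real code dies here */
		if (buf[5] == 'S' && buf[8] == 'U')
			/* nested untracked file */
			dirty |= DIRTY_SUBMODULE_UNTRACKED;
		if (memcmp(buf + 5, "S..U", 4))
			/* any other change */
			dirty |= DIRTY_SUBMODULE_MODIFIED;
	}
	return dirty;
}
```

With this layout the answer to Junio's offset questions is easy to check: `"S..U"` yields only UNTRACKED, while anything else in the submodule field (C, M, or a tracked-file `N...`) falls through to MODIFIED.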



[PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-27 Thread git
From: Jeff Hostetler 

Teach git to skip verification of the index SHA in read_index().

This is a performance optimization.  The index file SHA verification
can be considered an ancient relic from the early days of git and only
useful for detecting disk corruption.  For small repositories, this
SHA calculation is not that significant, but for gigantic repositories
this calculation adds significant time to every command.

I added a global "skip_verify_index" variable to control this and
allow it to be tested.

I did not create a config setting for this because of chicken-and-egg
problems with loading the config and the index.

Signed-off-by: Jeff Hostetler 
---
 cache.h  | 5 +
 read-cache.c | 6 ++
 2 files changed, 11 insertions(+)

diff --git a/cache.h b/cache.h
index 80b6372..4e9182f 100644
--- a/cache.h
+++ b/cache.h
@@ -740,6 +740,11 @@ extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
 
 /*
+ * When set, skip verification of the index SHA in read_index().
+ */
+extern int skip_verify_index;
+
+/*
  * Include broken refs in all ref iterations, which will
  * generally choke dangerous operations rather than letting
  * them silently proceed without taking the broken ref into
diff --git a/read-cache.c b/read-cache.c
index 9054369..1ca69c3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -46,6 +46,8 @@
 struct index_state the_index;
 static const char *alternate_index_output;
 
+int skip_verify_index = 1;
+
 static void set_index_entry(struct index_state *istate, int nr, struct 
cache_entry *ce)
 {
istate->cache[nr] = ce;
@@ -1382,6 +1384,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
+
+   if (skip_verify_index)
+   return 0;
+
git_SHA1_Init(&c);
git_SHA1_Update(&c, hdr, size - 20);
git_SHA1_Final(sha1, &c);
-- 
2.7.4
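The structure of the check being skipped can be illustrated with a toy trailer verifier. Here a 32-bit FNV-1a hash stands in for the index file's 20-byte SHA-1 trailer, and the `skip` flag plays the role of `skip_verify_index`; nothing below is git code, it only shows the shape of the short-circuit:

```c
#include <stdint.h>
#include <string.h>

/* toy stand-in for SHA-1: 32-bit FNV-1a over a byte buffer */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	while (len--)
		h = (h ^ *buf++) * 16777619u;
	return h;
}

/*
 * Verify that the file ends with a checksum of everything before it,
 * unless `skip` is set -- mirroring the skip_verify_index early return.
 */
static int verify_trailer(const unsigned char *data, size_t size, int skip)
{
	uint32_t stored;

	if (size < sizeof(stored))
		return -1;	/* too short to carry a trailer */
	if (skip)
		return 0;	/* trust the data, as the patch does */
	memcpy(&stored, data + size - sizeof(stored), sizeof(stored));
	return stored == fnv1a(data, size - sizeof(stored)) ? 0 : -1;
}
```

The trade-off in the commit message falls out directly: skipping saves hashing the whole buffer on every read, at the cost of no longer noticing the kind of on-disk corruption the trailer exists to catch.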



[PATCH v2 2/2] skip_verify_index: helper test

2017-03-27 Thread git
From: Jeff Hostetler 

Created test to measure read_index() with and without
skip_verify_index set and report performance differences.

Signed-off-by: Jeff Hostetler 
---
 Makefile  |  1 +
 t/helper/.gitignore   |  1 +
 t/helper/test-skip-verify-index.c | 73 +++
 3 files changed, 75 insertions(+)
 create mode 100644 t/helper/test-skip-verify-index.c

diff --git a/Makefile b/Makefile
index 9ec6065..e4932b6 100644
--- a/Makefile
+++ b/Makefile
@@ -631,6 +631,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sha1-array
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-skip-verify-index
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-submodule-config
 TEST_PROGRAMS_NEED_X += test-subprocess
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index d6e8b36..4d2ed3c 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -25,6 +25,7 @@
 /test-sha1
 /test-sha1-array
 /test-sigchain
+/test-skip-verify-index
 /test-string-list
 /test-submodule-config
 /test-subprocess
diff --git a/t/helper/test-skip-verify-index.c 
b/t/helper/test-skip-verify-index.c
new file mode 100644
index 000..ec2db27
--- /dev/null
+++ b/t/helper/test-skip-verify-index.c
@@ -0,0 +1,73 @@
+#include "cache.h"
+#include "parse-options.h"
+
+uint64_t time_runs(int count)
+{
+   uint64_t t0, t1;
+   uint64_t sum = 0;
+   uint64_t avg;
+   int i;
+
+   for (i = 0; i < count; i++) {
+   t0 = getnanotime();
+   read_cache();
+   t1 = getnanotime();
+
+   sum += (t1 - t0);
+
+   printf("%f %d [cache_nr %d]\n",
+  ((double)(t1 - t0))/1000000000.0,
+  skip_verify_index,
+  the_index.cache_nr);
+   fflush(stdout);
+
+   discard_cache();
+   }
+
+   avg = sum / count;
+   if (count > 1) {
+   printf("%f %d avg\n",
+  (double)avg/1000000000.0,
+  skip_verify_index);
+   fflush(stdout);
+   }
+
+   return avg;
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   int count = 1;
+   const char *usage[] = {
+   "test-skip-verify-index",
+   NULL
+   };
+   struct option options[] = {
+   OPT_INTEGER('c', "count", &count, "number of passes"),
+   OPT_END(),
+   };
+   const char *prefix;
+   uint64_t avg_no_skip, avg_skip;
+
+   prefix = setup_git_directory();
+   argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+   if (count < 1)
+   die("count must be greater than zero");
+
+   /* throw-away call to get the index into the disk cache */
+   read_cache();
+   discard_cache();
+
+   /* perform index SHA verification */
+   skip_verify_index = 0;
+   avg_no_skip = time_runs(count);
+
+   /* skip index SHA verification */
+   skip_verify_index = 1;
+   avg_skip = time_runs(count);
+
+   if (avg_skip > avg_no_skip)
+   die("skipping index SHA verification did not help");
+   return 0;
+}
-- 
2.7.4



[PATCH v2 0/2] read-cache: call verify_hdr() in a background thread

2017-03-27 Thread git
From: Jeff Hostetler 

Version 2 of this patch series simplifies this to just
turn off the hash verification.  Independent comments
from Linus and Peff suggested that we could just turn
this off and not worry about it.  So I've updated this
patch to do that.  I added a global variable to allow
the original code path to be used.  I also added a
t/helper command to demonstrate the differences.

On the Linux repo, the effect is rather trivial:

$ ~/work/gfw/t/helper/test-skip-verify-index -c 3
0.029884 0 [cache_nr 57994]
0.031035 0 [cache_nr 57994]
0.024308 0 [cache_nr 57994]
0.028409 0 avg
0.018359 1 [cache_nr 57994]
0.017025 1 [cache_nr 57994]
0.011087 1 [cache_nr 57994]
0.015490 1 avg

On my Windows source tree (450MB index), I'm seeing a
savings of 0.6 seconds -- read_index() went from 1.2 to 0.6
seconds.

=

This patch contains a performance optimization to run
verify_hdr() in a background thread while the foreground
thread parses the index.  This allows do_read_index() to
complete faster.

This idea was recently discussed on the mailing list in:
https://public-inbox.org/git/85221b97-759f-b7a9-1256-21515d163...@jeffhostetler.com/
during a discussion on sha1dc.

Our testing on Windows showed that verifying the SHA1 hash
on the index file takes approximately the same amount of time
as parsing the file and building the array of cache_entries.
(It could also be that having 2 threads better amortizes the
page faults of reading from the mmap'd file.)

An earlier version of this change has been in use in GfW since December:
https://github.com/git-for-windows/git/pull/978

Jeff Hostetler (2):
  read-cache: skip index SHA verification
  skip_verify_index: helper test

 Makefile  |  1 +
 cache.h   |  5 +++
 read-cache.c  |  6 
 t/helper/.gitignore   |  1 +
 t/helper/test-skip-verify-index.c | 73 +++
 5 files changed, 86 insertions(+)
 create mode 100644 t/helper/test-skip-verify-index.c

-- 
2.7.4



Re: [PATCH 3/3] blame: output porcelain "previous" header for each file

2017-03-27 Thread Ævar Arnfjörð Bjarmason
On Fri, Jan 6, 2017 at 5:20 AM, Jeff King  wrote:

> +for output in porcelain line-porcelain
> +do
> +   test_expect_success "generate --$output output" '
> +   git blame --root -C --$output combined >output
> +   '
> +

The --root option isn't needed here, the tests pass if it's removed.

Discovered while looking at the sorry state of our blame test suite
vis-a-vis tests for config, no tests for blame.showroot &
blame.blankboundary.

I'll submit that eventually, but meanwhile did you mean to omit --root
here, or is the test broken in some other way and it /should/ need
--root but doesn't?


Re: [PATCH v2 2/3] sequencer: make commit options more extensible

2017-03-27 Thread Junio C Hamano
Junio C Hamano  writes:

> As this thing is about fixing a regression visible to end users, I
> may get around to fixing things up myself, but I have other topics
> to attend to, so...

So I ended up with this version before merging it to 'next'.  

I moved 'allow' back on the line it is declared, but left it
uninitialized because it is unconditionally assigned to before its
value is looked at.

Also the updated log message stresses more about the reason why
piling new parameters on top is a bad practice, and it is a good
idea to do this clean-up now.  Compared to that, the reason why this
clean-up was not done before and left as maintenance burden is much
less important to the readers of the log.

-- >8 --
From: Johannes Schindelin 
Date: Thu, 23 Mar 2017 17:07:11 +0100
Subject: [PATCH] sequencer: make commit options more extensible

So far every time we need to tweak the behaviour of run_git_commit()
we have been adding a "int" parameter to it.  As the function gains
parameters and different callsites having different needs, this is
becoming a maintenance burden.  When a new knob needs to be added to
address a specific need for a single callsite, all the other callsites
need to add a "no, I do not want anything special with respect to the
new knob" argument.

Consolidate the existing four parameters into a flag word to make it
more maintainable, as we will be adding a new one to the mix soon.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Junio C Hamano 
---
 sequencer.c | 48 ++--
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 8183a83c1f..677e6ef764 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -602,6 +602,11 @@ N_("you have staged changes in your working tree\n"
 "\n"
 "  git rebase --continue\n");
 
+#define ALLOW_EMPTY (1<<0)
+#define EDIT_MSG    (1<<1)
+#define AMEND_MSG   (1<<2)
+#define CLEANUP_MSG (1<<3)
+
 /*
  * If we are cherry-pick, and if the merge did not result in
  * hand-editing, we will hit this commit and inherit the original
@@ -615,8 +620,7 @@ N_("you have staged changes in your working tree\n"
  * author metadata.
  */
 static int run_git_commit(const char *defmsg, struct replay_opts *opts,
- int allow_empty, int edit, int amend,
- int cleanup_commit_message)
+ unsigned int flags)
 {
struct child_process cmd = CHILD_PROCESS_INIT;
const char *value;
@@ -624,7 +628,7 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
cmd.git_cmd = 1;
 
if (is_rebase_i(opts)) {
-   if (!edit) {
+   if (!(flags & EDIT_MSG)) {
cmd.stdout_to_stderr = 1;
cmd.err = -1;
}
@@ -640,7 +644,7 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
argv_array_push(&cmd.args, "commit");
argv_array_push(&cmd.args, "-n");
 
-   if (amend)
+   if ((flags & AMEND_MSG))
argv_array_push(&cmd.args, "--amend");
if (opts->gpg_sign)
argv_array_pushf(&cmd.args, "-S%s", opts->gpg_sign);
@@ -648,16 +652,16 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
argv_array_push(&cmd.args, "-s");
if (defmsg)
argv_array_pushl(&cmd.args, "-F", defmsg, NULL);
-   if (cleanup_commit_message)
+   if ((flags & CLEANUP_MSG))
argv_array_push(&cmd.args, "--cleanup=strip");
-   if (edit)
+   if ((flags & EDIT_MSG))
argv_array_push(&cmd.args, "-e");
-   else if (!cleanup_commit_message &&
+   else if (!(flags & CLEANUP_MSG) &&
 !opts->signoff && !opts->record_origin &&
 git_config_get_value("commit.cleanup", &value))
argv_array_push(&cmd.args, "--cleanup=verbatim");
 
-   if (allow_empty)
+   if ((flags & ALLOW_EMPTY))
argv_array_push(&cmd.args, "--allow-empty");
 
if (opts->allow_empty_message)
@@ -926,14 +930,14 @@ static void record_in_rewritten(struct object_id *oid,
 static int do_pick_commit(enum todo_command command, struct commit *commit,
struct replay_opts *opts, int final_fixup)
 {
-   int edit = opts->edit, cleanup_commit_message = 0;
-   const char *msg_file = edit ? NULL : git_path_merge_msg();
+   unsigned int flags = opts->edit ? EDIT_MSG : 0;
+   const char *msg_file = opts->edit ? NULL : git_path_merge_msg();
unsigned char head[20];
struct commit *base, *next, *parent;
const char *base_label, *next_label;
struct commit_message msg = { NULL, NULL, NULL, NULL };
struct strbuf msgbuf = STRBUF_INIT;
-   int res, unborn = 0, amend = 0, allow = 0;
+   int res, unborn = 0, allow;
 
if (opts->no_commit) {
/*
@@ -991,7 +995,7 @@ static int 
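The shape of the consolidation is easy to see in isolation: call sites change from something like `run_git_commit(defmsg, opts, 1, 1, 0, 0)` to `run_git_commit(defmsg, opts, ALLOW_EMPTY | EDIT_MSG)`, and adding a new knob no longer touches every caller. A minimal stand-alone sketch follows; the flag names match the patch, but the function body is an illustration (it renders the options a flag word selects as a string) rather than sequencer.c:

```c
#include <string.h>

#define ALLOW_EMPTY (1 << 0)
#define EDIT_MSG    (1 << 1)
#define AMEND_MSG   (1 << 2)
#define CLEANUP_MSG (1 << 3)

/* stand-in for run_git_commit(): show which options a flag word selects */
static void commit_args(unsigned int flags, char *out, size_t outsz)
{
	out[0] = '\0';
	if (flags & AMEND_MSG)
		strncat(out, " --amend", outsz - strlen(out) - 1);
	if (flags & CLEANUP_MSG)
		strncat(out, " --cleanup=strip", outsz - strlen(out) - 1);
	if (flags & EDIT_MSG)
		strncat(out, " -e", outsz - strlen(out) - 1);
	if (flags & ALLOW_EMPTY)
		strncat(out, " --allow-empty", outsz - strlen(out) - 1);
}
```

A caller that wants nothing special simply passes `0` instead of a row of "no, nothing special" positional arguments — the maintainability point the log message makes.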

Re: [PATCH v2 5/7] name-hash: perf improvement for lazy_init_name_hash

2017-03-27 Thread Jeff Hostetler



On 3/27/2017 4:24 PM, Junio C Hamano wrote:

g...@jeffhostetler.com writes:


+/*
+ * We use n mutexes to guard n partitions of the "istate->dir_hash"
+ * hashtable.  Since "find" and "insert" operations will hash to a
+ * particular bucket and modify/search a single chain, we can say
+ * that "all chains mod n" are guarded by the same mutex -- rather
+ * than having a single mutex to guard the entire table.  (This does
+ * require that we disable "rehashing" on the hashtable.)
+ *
+ * So, a larger value here decreases the probability of a collision
+ * and the time that each thread must wait for the mutex.
+ */
+#define LAZY_MAX_MUTEX   (32)


Good thinking is very well explained in the in-code comment.


+static int handle_range_dir(
+   struct index_state *istate,
+   int k_start,
+   int k_end,
+   struct dir_entry *parent,
+   struct strbuf *prefix,
+   struct lazy_entry *lazy_entries,
+   struct dir_entry **dir_new_out)
+{
+   int rc, k;
+   int input_prefix_len = prefix->len;
+   struct dir_entry *dir_new;
+
+   dir_new = hash_dir_entry_with_parent_and_prefix(istate, parent, prefix);
+
+   strbuf_addch(prefix, '/');
+
+   /*
+* Scan forward in the index array for index entries having the same
+* path prefix (that are also in this directory).
+*/
+   if (strncmp(istate->cache[k_start + 1]->name, prefix->buf, prefix->len) 
> 0)
+   k = k_start + 1;
+   else if (strncmp(istate->cache[k_end - 1]->name, prefix->buf, 
prefix->len) == 0)
+   k = k_end;
+   else {
+   int begin = k_start;
+   int end = k_end;
+   while (begin < end) {
+   int mid = (begin + end) >> 1;
+   int cmp = strncmp(istate->cache[mid]->name, prefix->buf, 
prefix->len);
+   if (cmp == 0) /* mid has same prefix; look in second 
part */
+   begin = mid + 1;
+   else if (cmp > 0) /* mid is past group; look in first 
part */
+   end = mid;
+   else
+   die("cache entry out of order");


Dying at this low level in the callchain in a worker thread made me
feel a little bit nervous, but even if we arrange to return to the
caller without doing extra computation, we would want to detect and
abort when the cache entries are not sorted anyway, so this should 
be OK.


yeah, i wasn't sure if there was any need to complicate things
returning this error.  the index is sorted before we get here,
so this should never happen.




+   }
+   k = begin;
+   }
+
+   /*
+* Recurse and process what we can of this subset [k_start, k).
+*/
+   rc = handle_range_1(istate, k_start, k, dir_new, prefix, lazy_entries);
+
+   strbuf_setlen(prefix, input_prefix_len);
+
+   *dir_new_out = dir_new;
+   return rc;
+}


The variable "rc" (return code?) is a lot more than a return code; it
tells how many entries we processed and signals the caller that it
still needs to sweep the remainder if we didn't reach k_end.  The
caller of this function names the variable that receives this
"processed", so I didn't find it too confusing while reading the
code top-down, though.


The "rc" variable came from clear_ce_flags_dir().  I stole the
diving logic from it and clear_ce_flags_1(), so I tried to keep
the same name here.

(And that reminds me, the linear search in clear_ce_flags_dir()
could be sped up with a variation of the binary search I put in
here.  But that's for another day.)




+static int handle_range_1(
+   struct index_state *istate,
+   int k_start,
+   int k_end,
+   struct dir_entry *parent,
+   struct strbuf *prefix,
+   struct lazy_entry *lazy_entries)
+{
+   int input_prefix_len = prefix->len;
+   int k = k_start;
+
+   while (k < k_end) {
+   struct cache_entry *ce_k = istate->cache[k];
+   const char *name, *slash;
+
+   if (prefix->len && strncmp(ce_k->name, prefix->buf, 
prefix->len))
+   break;


At first I was worried by this early return (i.e. we chop the entire
active_nr entries into lazy_nr_dir_threads and hand each of them a
range [k_start, k_end)---stopping before a thread reaches the end of
the range it is responsible for will leave gaps), but then realized
that this early return "we are done with the entries in the same
directory" kicks in only for recursive invocation of this function,
which makes it correct and perfectly fine.

I also briefly wondered if it is worth wiggling the boundary of
ranges for adjacent workers to align with directory boundaries, as
the last round of hashes done by a worker and the first round of
hashes done by another worker adjacent to it will be hashing for the
same parent directory, but I think that would be counter-productive
and think your 

Re: [PATCH v2 0/2] describe: localize debug output

2017-03-27 Thread Junio C Hamano
Michael J Gruber  writes:

> v2 computes the width for the localized output dynamically.
> In fact, it might overcalculate a bit depending on the encoding,
> but this does not do any harm.

Thanks.

As you said, if we wanted to actually _align_, then it needs much
more work.  But that is not what we are aiming for in this round,
it should be alright.



Re: [PATCH] git-p4: Add failing test case for name-rev vs symbolic-ref

2017-03-27 Thread Junio C Hamano
Luke Diamand  writes:

> Yes. The test passes with your change.
>>
>> IOW, can we have a follow up to this patch that fixes a known
>> breakage the patch documents ;-)?
>
> Yes.

Thanks.


Re: [PATCH v3 2/2] l10n: Add git-add.txt to localized man pages

2017-03-27 Thread Junio C Hamano
Jean-Noël AVILA  writes:

>> ...  Doesn't that workflow apply equally well for the
>> documentation l10n?
>
> Theoretically, this workflow should apply to the documentation, so that a 
> version of the documentation can be cut at each release of git. I still have 
> to convince po4a not to update the *.pot and *.po files each time it is run, 
> while at the same time allow translators to produce the output file for 
> proofreading.

Ahh, OK, that does sound like a meaningful difference that may make
the workflow we use for in-code strings not applicable for the
documentation l10n project.  As I said already, I am not married to
the "gitman-l10n at Documentation/po" approach at all, and if the
layout you brought up to turn the containment relationship the other
way around works better, that is perfectly fine by me.

Thanks.





Re: [PATCH v2 5/7] name-hash: perf improvement for lazy_init_name_hash

2017-03-27 Thread Junio C Hamano
g...@jeffhostetler.com writes:

> +/*
> + * We use n mutexes to guard n partitions of the "istate->dir_hash"
> + * hashtable.  Since "find" and "insert" operations will hash to a
> + * particular bucket and modify/search a single chain, we can say
> + * that "all chains mod n" are guarded by the same mutex -- rather
> + * than having a single mutex to guard the entire table.  (This does
> + * require that we disable "rehashing" on the hashtable.)
> + *
> + * So, a larger value here decreases the probability of a collision
> + * and the time that each thread must wait for the mutex.
> + */
> +#define LAZY_MAX_MUTEX   (32)

Good thinking is very well explained in the in-code comment.
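The partitioning scheme the comment describes can be sketched in a few lines. This is an illustration rather than the patch's code, and it assumes, as the comment says, that rehashing is disabled so a key's partition never changes:

```c
#include <pthread.h>

#define LAZY_MAX_MUTEX (32)

static pthread_mutex_t lazy_mutex[LAZY_MAX_MUTEX];

static void lazy_mutex_init(void)
{
	int i;
	for (i = 0; i < LAZY_MAX_MUTEX; i++)
		pthread_mutex_init(&lazy_mutex[i], NULL);
}

/*
 * Every chain "hash % n" is guarded by one fixed mutex, so two threads
 * only contend when their keys land in the same partition.  A find or
 * insert takes lock_for_hash(hash) around its single-chain work.
 */
static pthread_mutex_t *lock_for_hash(unsigned int hash)
{
	return &lazy_mutex[hash % LAZY_MAX_MUTEX];
}
```

Raising LAZY_MAX_MUTEX shrinks each partition and hence the collision probability, which is the trade-off the in-code comment spells out.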

> +static int handle_range_dir(
> + struct index_state *istate,
> + int k_start,
> + int k_end,
> + struct dir_entry *parent,
> + struct strbuf *prefix,
> + struct lazy_entry *lazy_entries,
> + struct dir_entry **dir_new_out)
> +{
> + int rc, k;
> + int input_prefix_len = prefix->len;
> + struct dir_entry *dir_new;
> +
> + dir_new = hash_dir_entry_with_parent_and_prefix(istate, parent, prefix);
> +
> + strbuf_addch(prefix, '/');
> +
> + /*
> +  * Scan forward in the index array for index entries having the same
> +  * path prefix (that are also in this directory).
> +  */
> + if (strncmp(istate->cache[k_start + 1]->name, prefix->buf, prefix->len) 
> > 0)
> + k = k_start + 1;
> + else if (strncmp(istate->cache[k_end - 1]->name, prefix->buf, 
> prefix->len) == 0)
> + k = k_end;
> + else {
> + int begin = k_start;
> + int end = k_end;
> + while (begin < end) {
> + int mid = (begin + end) >> 1;
> + int cmp = strncmp(istate->cache[mid]->name, 
> prefix->buf, prefix->len);
> + if (cmp == 0) /* mid has same prefix; look in second 
> part */
> + begin = mid + 1;
> + else if (cmp > 0) /* mid is past group; look in first 
> part */
> + end = mid;
> + else
> + die("cache entry out of order");

Dying at this low level in the callchain in a worker thread made me
feel a little bit nervous, but even if we arrange to return to the
caller without doing extra computation, we would want to detect and
abort when the cache entries are not sorted anyway, so this should
be OK.
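The search under review can be restated as a stand-alone function over a sorted array of paths (names invented for illustration, not name-hash.c itself). Entries sharing a prefix form one contiguous run in a sorted index, so the end of the run can be found without scanning it linearly:

```c
#include <string.h>

/*
 * Return the first index in [begin, end) whose name does not share
 * `prefix`, given that the names are sorted and every prefix-sharing
 * entry sorts before every non-sharing one.  The real code additionally
 * die()s when it sees cmp < 0, i.e. an out-of-order cache entry.
 */
static int prefix_group_end(const char **names, int begin, int end,
			    const char *prefix)
{
	size_t plen = strlen(prefix);

	while (begin < end) {
		int mid = begin + (end - begin) / 2;
		int cmp = strncmp(names[mid], prefix, plen);

		if (cmp == 0)		/* mid shares prefix; run continues right */
			begin = mid + 1;
		else			/* mid is past the run; look left */
			end = mid;
	}
	return begin;		/* first index past the prefix run */
}
```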

> + }
> + k = begin;
> + }
> +
> + /*
> +  * Recurse and process what we can of this subset [k_start, k).
> +  */
> + rc = handle_range_1(istate, k_start, k, dir_new, prefix, lazy_entries);
> +
> + strbuf_setlen(prefix, input_prefix_len);
> +
> + *dir_new_out = dir_new;
> + return rc;
> +}

The variable "rc" (return code?) is a lot more than a return code; it
tells how many entries we processed and signals the caller that it
still needs to sweep the remainder if we didn't reach k_end.  The
caller of this function names the variable that receives this
"processed", so I didn't find it too confusing while reading the
code top-down, though.

> +static int handle_range_1(
> + struct index_state *istate,
> + int k_start,
> + int k_end,
> + struct dir_entry *parent,
> + struct strbuf *prefix,
> + struct lazy_entry *lazy_entries)
> +{
> + int input_prefix_len = prefix->len;
> + int k = k_start;
> +
> + while (k < k_end) {
> + struct cache_entry *ce_k = istate->cache[k];
> + const char *name, *slash;
> +
> + if (prefix->len && strncmp(ce_k->name, prefix->buf, 
> prefix->len))
> + break;

At first I was worried by this early return (i.e. we chop the entire
active_nr entries into lazy_nr_dir_threads and hand each of them a
range [k_start, k_end)---stopping before a thread reaches the end of
the range it is responsible for will leave gaps), but then realized
that this early return "we are done with the entries in the same
directory" kicks in only for recursive invocation of this function,
which makes it correct and perfectly fine.

I also briefly wondered if it is worth wiggling the boundary of
ranges for adjacent workers to align with directory boundaries, as
the last round of hashes done by a worker and the first round of
hashes done by another worker adjacent to it will be hashing for the
same parent directory, but I think that would be counter-productive
and think your "almost even" distribution would be a lot more
sensible.  After all, when distribution of paths is skewed, a single
directory may have disproportionally more (leaf) entries than the
remainder of the index and in such a case, we would want multiple
workers to share the load of hashing them, even if that means they
may have to hash the same leading path independently.

Nicely done.  Let's have this in 'next' and then in 'master'
soonish.

Thanks.


Re: [PATCH v3 2/2] l10n: Add git-add.txt to localized man pages

2017-03-27 Thread Jean-Noël AVILA
On Sunday, March 26, 2017 at 15:56:55 CEST, Junio C Hamano wrote:
> Jean-Noël AVILA  writes:
> > ... So I would
> > think the other way around: for those interested in translating the
> > documentation, some script would allow checking out the git project inside
> > the gitman-l10n project (like a kind of library).
> > 
> > This would be mainly transparent for the git developers.
> 
> As long as the resulting layout would help all groups (1) developers
> who do not worry about documentation l10n (2) documentation i18n
> coordinator and transltors (3) those who build and ship binary
> packages, I personally am OK either way.
> 
> Having said that, I am not sure if I understand your "translators do
> not have a fixed version of git.git to work with and po4a cannot
> work well" as a real concern.  Wouldn't the l10n of documentation
> use a similar workflow as used for the translation of in-code
> strings we do in po/?  Namely, *.pot files are *NOT* updated by
> individual translators by picking up a random version of git.git and
> running xgettext.  Instead, i18n coordinator is the only person who
> runs xgettext to update *.pot for the upcoming release of Git being
> prepared, and then translators work off of that *.pot file.  Which
> means they do not have to worry about in-code strings that gets
> updated in the meantime; instead they work on a stable known
> snapshot of *.pot and wait for the next sync with i18n coordinator
> whose running of xgettext would update *.pot files with updated
> in-code strings.  Doesn't that workflow apply equally well for the
> documentation l10n?

Theoretically, this workflow should apply to the documentation, so that a 
version of the documentation can be cut at each release of git. I still have 
to convince po4a not to update the *.pot and *.po files each time it is run, 
while at the same time allow translators to produce the output file for 
proofreading.




[ANNOUNCE] Git for Windows 2.12.2

2017-03-27 Thread Johannes Schindelin
Dear Git users,

It is my pleasure to announce that Git for Windows 2.12.2 is available from:

https://git-for-windows.github.io/

Changes since Git for Windows v2.12.1 (March 21st 2017)

New Features

  * Comes with Git v2.12.2.
  * An earlier iteration of the changes speeding up the
case-insensitive cache of file names was replaced by a new
iteration.

Filename | SHA-256
-------- | -------
Git-2.12.2-64-bit.exe | 
99492acd85bad097b5952ccfd5cb37658bf3301f5d8256f345dd10937ab07899
Git-2.12.2-32-bit.exe | 
f99a9c398ee352982477be39e723df3357c71f13f0697ec580cfee55419e5880
PortableGit-2.12.2-64-bit.7z.exe | 
6a366a5b5702d24b401aba6b022d502b5f6597e00654075e491319878ba0a535
PortableGit-2.12.2-32-bit.7z.exe | 
52c236fead982c31733e43fb7361a4982b2d1c0a54a011f68b074ec7f64436c3
MinGit-2.12.2-64-bit.zip | 
fcebf3ef4f7fe2bc852879eb77d2bd63af49bd274aa4c4d61c7b4a1fa76b830f
MinGit-2.12.2-32-bit.zip | 
494e4fb629f8b05bc067e27aea86c45af2322a34730d5ff16609bed199e5954c
Git-2.12.2-64-bit.tar.bz2 | 
d91d2d6a6da99ceafc9b0749e619fa1db3387fe20dc0c9ad8e8c51e4a5cd9f37
Git-2.12.2-32-bit.tar.bz2 | 
e048e0082f07dbb7fed1107f78d0515c4d58916154f9c8f9591b482f52c25301

Ciao,
Johannes


Re: [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module

2017-03-27 Thread Jonathan Tan

On 03/24/2017 08:27 AM, Ben Peart wrote:

Refactor the filter.<driver>.process code into a separate sub-process
module that can be used to reduce the cost of starting up a sub-process
for multiple commands.  It does this by keeping the external process
running and processing all commands by communicating over standard input
and standard output using the packet format (pkt-line) based protocol.
Full documentation is in Documentation/technical/api-sub-process.txt.


Thanks - this looks like something useful to have.

When you create a "struct subprocess_entry" to be entered into the 
system, it is not a true "struct subprocess_entry" - it is a "struct 
subprocess_entry" plus some extra variables at the end. Since the 
sub-process hashmap is keyed solely on the command, what happens if 
another component uses the same trick (but with different extra 
variables) when using a sub-process with the same command?


I can think of at least two ways to solve this: (i) each component can 
have its own sub-process hashmap, or (ii) add a component key to the 
hashmap. (i) seems more elegant to me, but I'm not sure what the code 
will look like.


Also, I saw some minor code improvements possible (e.g. using 
"starts_with" when you're checking for the "status=" line) but I'll 
comment on those and look into the code more thoroughly once the 
questions in this e-mail are resolved.



diff --git a/sub-process.h b/sub-process.h
new file mode 100644
index 00..d1492f476d
--- /dev/null
+++ b/sub-process.h
@@ -0,0 +1,46 @@
+#ifndef SUBPROCESS_H
+#define SUBPROCESS_H
+
+#include "git-compat-util.h"
+#include "hashmap.h"
+#include "run-command.h"
+
+/*
+ * Generic implementation of background process infrastructure.
+ * See Documentation/technical/api-background-process.txt.
+ */
+
+ /* data structures */
+
+struct subprocess_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   struct child_process process;
+   const char *cmd;
+};


I notice from the documentation (and from "subprocess_get_child_process" 
below) that this is meant to be opaque, but I think this can be 
non-opaque (like "run-command").


Also, I would prefer adding a "util" pointer here instead of using it as 
an embedded struct. There is no clue here that it is embeddable or meant 
to be embedded.



+
+/* subprocess functions */
+
+typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+   subprocess_start_fn startfn);


I'm not sure if it is useful to take a callback here - I think the 
caller of this function can just run whatever it wants after a 
successful subprocess_start.


Alternatively, if you add the "util" pointer (as I described above), 
then it makes sense to add a subprocess_get_or_start() function (and now 
it makes sense to take the callback). This way, the data structure will 
own, create, and destroy all the "struct subprocess_entry" that it 
needs, creating a nice separation of concerns.



+
+void subprocess_stop(struct subprocess_entry *entry);


(continued from above) And it would be clear that this would free 
"entry", for example.



+
+struct subprocess_entry *subprocess_find_entry(const char *cmd);
+
+/* subprocess helper functions */
+
+static inline struct child_process *subprocess_get_child_process(
+   struct subprocess_entry *entry)
+{
+   return &entry->process;
+}
+
+/*
+ * Helper function that will read packets looking for "status="
+ * key/value pairs and return the value from the last "status" packet
+ */
+
+int subprocess_read_status(int fd, struct strbuf *status);
+
+#endif



Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-27 Thread Junio C Hamano
Zenobiusz Kunegunda  writes:

> From: "Junio C Hamano" 
> To: "René Scharfe" ;
> Sent: 2:40 Monday 2017-03-27
> Subject: Re: [PATCH] strbuf: support long paths w/o read rights in 
> strbuf_getcwd() on FreeBSD
>
>> 
>> Nicely analysed and fixed (or is the right word "worked around"?)
>> 
>> Thanks, will queue.
>
> Is this patch going to be included in next git version ( or sooner ) by any 
> chance?

Absolutely.  Thanks for your initial report and sticking with us
during the session to identify the root cause that led to this
solution.

Again, René, thanks for your superb analysis and solution.


Re: [PATCH v3] rev-parse: match @{upstream}, @{u} and @{push} case-insensitively

2017-03-27 Thread Ævar Arnfjörð Bjarmason
On Mon, Mar 27, 2017 at 7:45 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Before this change:
>>
>> |----------------+-----+------+-----|
>> | What?          | CI? | CIP? | AG? |
>> |----------------+-----+------+-----|
>> | sha1           | Y   | -    | N   |
>> | describeOutput | N   | N    | N   |
>> | refname        | N   | N    | N   |
>> | @{<date>}      | Y   | Y    | Y   |
>> | @{<n>}         | N/A | N/A  | N   |
>> | @{-<n>}        | N/A | N/A  | N   |
>> | @{upstream}    | N   | Y    | N   |
>> | @{push}        | N   | Y    | N   |
>> | ^{<type>}      | N   | Y    | N   |
>> | ^{/regex}      | N   | N    | N   |
>> |----------------+-----+------+-----|
>>
>> After it:
>>
>> |----------------+-----+------+-----|
>> | What?          | CI? | CIP? | AG? |
>> |----------------+-----+------+-----|
>> | sha1           | Y   | -    | N   |
>> | describeOutput | N   | N    | N   |
>> | refname        | N   | N    | N   |
>> | @{<date>}      | Y   | Y    | Y   |
>> | @{<n>}         | N/A | N/A  | N   |
>> | @{-<n>}        | N/A | N/A  | N   |
>> | @{upstream}    | Y   | -    | N   |
>> | @{push}        | Y   | -    | N   |
>> | ^{<type>}      | N   | Y    | N   |
>> | ^{/regex}      | N   | N    | N   |
>> |----------------+-----+------+-----|
>
> As we are not touching ^{} or ^{/regex}, and it is obvious
> numbers do not have cases, I'll trim this down to focus only on
> things that are relevant while queuing:
>
> Before this change:
>
> |----------------+-----+------+-----|
> | What?          | CI? | CIP? | AG? |
> |----------------+-----+------+-----|
> | @{<date>}      | Y   | Y    | Y   |
> | @{upstream}    | N   | Y    | N   |
> | @{push}        | N   | Y    | N   |
> |----------------+-----+------+-----|
>
> After it:
>
> |----------------+-----+------+-----|
> | What?          | CI? | CIP? | AG? |
> |----------------+-----+------+-----|
> | @{<date>}      | Y   | Y    | Y   |
> | @{upstream}    | Y   | Y    | N   |
> | @{push}        | Y   | Y    | N   |
> |----------------+-----+------+-----|
>
> should be sufficient to highlight that it was possible to safely
> make these two things case insensitive, and we made it so.
>
> For that matter, I do not know the value of AG? field---it only
> serves to show that @{<date>} is an odd-man out and cannot be
> used as a good example to follow, but I am too lazy to remove it ;-)
>
>> Makes sense, replaced that note with that summary. Here's hopefully a
>> final v3 with that change. I've omitted the other two patches as noted
>> in the discussion about those two, I don't think it makes sense to
>> include them.
>
> Thanks.
>
>> @@ -122,6 +123,9 @@ refs/remotes/myfork/mybranch
>>  Note in the example that we set up a triangular workflow, where we pull
>>  from one location and push to another. In a non-triangular workflow,
>>  '@\{push}' is the same as '@\{upstream}', and there is no need for it.
>> ++
>> +This suffix is accepted when spelled in uppercase, and means the same
>> +thing no matter the case.
>
> As the above text (including the original) does not explicitly say
> that lowercase spelling is canonical, the new text is prone to be
> misinterpreted that only the uppercase version is accepted.  I'll
> do s/is accepted/is also accepted/ while queuing, but please holler
> if there are better ways to phrase this.

All of the above sounds good, thanks for fixing it up.


Re: [PATCH v3] rev-parse: match @{upstream}, @{u} and @{push} case-insensitively

2017-03-27 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Before this change:
>
> |----------------+-----+------+-----|
> | What?          | CI? | CIP? | AG? |
> |----------------+-----+------+-----|
> | sha1           | Y   | -    | N   |
> | describeOutput | N   | N    | N   |
> | refname        | N   | N    | N   |
> | @{<date>}      | Y   | Y    | Y   |
> | @{<n>}         | N/A | N/A  | N   |
> | @{-<n>}        | N/A | N/A  | N   |
> | @{upstream}    | N   | Y    | N   |
> | @{push}        | N   | Y    | N   |
> | ^{<type>}      | N   | Y    | N   |
> | ^{/regex}      | N   | N    | N   |
> |----------------+-----+------+-----|
>
> After it:
>
> |----------------+-----+------+-----|
> | What?          | CI? | CIP? | AG? |
> |----------------+-----+------+-----|
> | sha1           | Y   | -    | N   |
> | describeOutput | N   | N    | N   |
> | refname        | N   | N    | N   |
> | @{<date>}      | Y   | Y    | Y   |
> | @{<n>}         | N/A | N/A  | N   |
> | @{-<n>}        | N/A | N/A  | N   |
> | @{upstream}    | Y   | -    | N   |
> | @{push}        | Y   | -    | N   |
> | ^{<type>}      | N   | Y    | N   |
> | ^{/regex}      | N   | N    | N   |
> |----------------+-----+------+-----|

As we are not touching ^{} or ^{/regex}, and it is obvious
numbers do not have cases, I'll trim this down to focus only on
things that are relevant while queuing:

Before this change:

|----------------+-----+------+-----|
| What?          | CI? | CIP? | AG? |
|----------------+-----+------+-----|
| @{<date>}      | Y   | Y    | Y   |
| @{upstream}    | N   | Y    | N   |
| @{push}        | N   | Y    | N   |
|----------------+-----+------+-----|

After it:

|----------------+-----+------+-----|
| What?          | CI? | CIP? | AG? |
|----------------+-----+------+-----|
| @{<date>}      | Y   | Y    | Y   |
| @{upstream}    | Y   | Y    | N   |
| @{push}        | Y   | Y    | N   |
|----------------+-----+------+-----|

should be sufficient to highlight that it was possible to safely
make these two things case insensitive, and we made it so.

For that matter, I do not know the value of AG? field---it only
serves to show that @{<date>} is an odd-man out and cannot be
used as a good example to follow, but I am too lazy to remove it ;-)

> Makes sense, replaced that note with that summary. Here's hopefully a
> final v3 with that change. I've omitted the other two patches as noted
> in the discussion about those two, I don't think it makes sense to
> include them.

Thanks.

> @@ -122,6 +123,9 @@ refs/remotes/myfork/mybranch
>  Note in the example that we set up a triangular workflow, where we pull
>  from one location and push to another. In a non-triangular workflow,
>  '@\{push}' is the same as '@\{upstream}', and there is no need for it.
> ++
> +This suffix is accepted when spelled in uppercase, and means the same
> +thing no matter the case.

As the above text (including the original) does not explicitly say
that lowercase spelling is canonical, the new text is prone to be
misinterpreted that only the uppercase version is accepted.  I'll
do s/is accepted/is also accepted/ while queuing, but please holler
if there are better ways to phrase this.

Thanks.


Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag

2017-03-27 Thread Junio C Hamano
Jeff King  writes:

> Yeah, I think we can assume it will be possible with SHAttered levels of
> effort. An attacker can use it to create a persistent corruption by
> having somebody fetch from them twice. So not really that interesting an
> attack, but it is something. I still think that ditching SHA-1 for the
> naming is probably a better fix than worrying about SHA-1 collisions.

Yes, I agree with that part.  

Our trailer checksum happens to be SHA-1 mostly because the code was
available, not because it needs to be a crypto-strong hash.  It can
safely be changed to something other than SHA-1 that is much faster,
if that is desired, when it is used only for bit-flip detection of
local files like the index file.

I also agree that changing the naming scheme (e.g. use the "hash" as
a hash to choose hash-bucket but accept the fact that hashes can
collide) is a better solution, if this "packname can collide" were
to become real problem.

Thanks.



Re: [PATCH] sha1dc: safeguard against outside definitions of BIGENDIAN

2017-03-27 Thread Junio C Hamano
Johannes Schindelin  writes:

>> So here is what I came up with as a replacement (this time as a
>> proper patch not a comment on a patch).
>> 
>> Dscho, could you see if this fixes your build?
>
> The Continuous Testing is back to normal, thanks.

Thanks for a quick response.

I wish we could reuse the endianness detection we already have in
"compat/bswap.h", which is included by git-compat-util.h hence also
here.  It would have been very nice if we could say

#if GIT_BYTE_ORDER == GIT_BIG_ENDIAN
... do big endian thing ...
#else
... do little endian thing ...
#endif

when necessary, without having to reinvent the checks on __BYTE_ORDER
and friends.  It however needs a bit of rework in "compat/bswap.h",
which is way beyond the scope of making a quick fix-up to unblock us.







Re: [PATCH] push: allow atomic flag via configuration

2017-03-27 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Fri, Mar 24, 2017 at 6:17 PM, Romuald Brunet  wrote:
>> +test_expect_success 'atomic option possible via git-config' '
>> +   # prepare the repo
>> +   mk_repo_pair &&
>> +   (
>> +   cd workbench &&
>> +   test_commit one &&
>> +   git checkout -b second master &&
>> +   test_commit two &&
>> +   git push --mirror up
>> +   ) &&
>> +   # a third party modifies the server side:
>> +   (
>> +   cd upstream &&
>> +   git checkout second &&
>> +   git tag test_tag second
>> +   ) &&
>> +   # see if we can now push both branches.
>> +   (
>> +   cd workbench &&
>> +   git config push.atomic true &&
>> +   git checkout master &&
>> +   test_commit three &&
>> +   git checkout second &&
>> +   test_commit four &&
>> +   git tag test_tag &&
>> +   test_must_fail git push --tags up master second
>> +   ) &&
>> +   test_refs master HEAD@{3} &&
>> +   test_refs second HEAD@{1}
>> +'
>> +
>
> Sent my earlier E-Mail too soon, so I missed this, there's no test
> here for what "git config push.atomic false" && "git push --atomic"
> does, is that atomic or not? I.e. what does "git -c push.atomic=false
> push --atomic" do? Does the CLI option override the config as it
> should?

Good points.  Thanks for reading and reviewing the tests carefully.



[PATCH v2 0/2] describe: localize debug output

2017-03-27 Thread Michael J Gruber
v2 computes the width for the localized output dynamically.
In fact, it might overcalculate a bit depending on the encoding,
but this does not do any harm.

Michael J Gruber (2):
  describe: localize debug output fully
  l10n: de: translate describe debug terms

 builtin/describe.c | 15 ---
 po/de.po   | 14 +-
 2 files changed, 25 insertions(+), 4 deletions(-)

-- 
2.12.2.584.g7becbf139a



[PATCH v2 2/2] l10n: de: translate describe debug terms

2017-03-27 Thread Michael J Gruber
Signed-off-by: Michael J Gruber 
---
 po/de.po | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/po/de.po b/po/de.po
index e9c86f5488..913db393dc 100644
--- a/po/de.po
+++ b/po/de.po
@@ -7530,7 +7530,19 @@ msgstr "git describe [<Optionen>] [<Commit-ish>...]"
 msgid "git describe [<options>] --dirty"
 msgstr "git describe [<Optionen>] --dirty"
 
-#: builtin/describe.c:217
+#: builtin/describe.c:52
+msgid "head"
+msgstr "Branch"
+
+#: builtin/describe.c:52
+msgid "lightweight"
+msgstr "nicht-annotiert"
+
+#: builtin/describe.c:52
+msgid "annotated"
+msgstr "annotiert"
+
+#: builtin/describe.c:249
 #, c-format
 msgid "annotated tag %s not available"
 msgstr "annotiertes Tag %s ist nicht verfügbar"
-- 
2.12.2.584.g7becbf139a



[PATCH v2 1/2] describe: localize debug output fully

2017-03-27 Thread Michael J Gruber
git describe --debug localizes all debug messages but not the terms
head, lightweight, annotated that it outputs for the candidates.
Localize them, too.

Signed-off-by: Michael J Gruber 
---
 builtin/describe.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 45adbf67d5..99e963dfe7 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -50,7 +50,7 @@ struct commit_name {
 };
 
 static const char *prio_names[] = {
-   "head", "lightweight", "annotated",
+   N_("head"), N_("lightweight"), N_("annotated"),
 };
 
 static int commit_name_cmp(const struct commit_name *cn1,
@@ -395,10 +395,19 @@ static void describe(const char *arg, int last_one)
free_commit_list(list);
 
if (debug) {
+   static int label_width = -1;
+   if (label_width < 0) {
+   int i, w;
+   for (i = 0; i < ARRAY_SIZE(prio_names); i++) {
+   w = strlen(_(prio_names[i]));
+   if (label_width < w)
+   label_width = w;
+   }
+   }   
for (cur_match = 0; cur_match < match_cnt; cur_match++) {
struct possible_tag *t = &all_matches[cur_match];
-   fprintf(stderr, " %-11s %8d %s\n",
-   prio_names[t->name->prio],
+   fprintf(stderr, " %-*s %8d %s\n",
+   label_width, _(prio_names[t->name->prio]),
t->depth, t->name->path);
}
fprintf(stderr, _("traversed %lu commits\n"), seen_commits);
-- 
2.12.2.584.g7becbf139a



Re: Re: Re: Re: GSoC Project | Convert interactive rebase to C

2017-03-27 Thread Pickfire
Johannes Schindelin  wrote:

> On Sat, 25 Mar 2017, Ivan Tham wrote:
>
> > Johannes Schindelin  wrote:
> > > On Tue, 21 Mar 2017, Ivan Tham wrote:
> > > > Stefan Beller  wrote:
> > > > > On Mon, Mar 20, 2017 at 9:41 AM, Ivan Tham  
> > > > > wrote:
> > > > >
> > > > > > I am interested to work on "Convert interactive rebase to C"
> > > > >
> > > > > +cc Johannes, who recently worked on rebase and the sequencer.
> > >
> > > Glad you are interested! Please note that large parts of the
> > > interactive rebase are already in C now, but there is enough work left
> > > in that corner.
> >
> > Glad to hear that, I would really like to see interactive rebase in C.
>
> Please note that a notable part already made it into C in v2.12.1. There
> are still a few loose ends to tie, of course; it still makes for a great
> head start on your project, methinks.

Ah, that's great.

And while I was working on the microproject (shell patterns in userdiff),
I could not reproduce the output of t/t4034-diff-words.sh manually with:

mkdir cpp/ && cd cpp/ && git init

cat > pre <<\EOF
a<<b a>>b
a<b a<=b a>b a>=b
a==b a!=b
a&b
a^b
a|b
a&&b
a||b
a?b:z
a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
a,y
a::b
EOF

cat > post <<\EOF
x<<y x>>y
x<y x<=y x>y x>=y
x==y x!=y
x&y
x^y
x|y
x&&y
x||y
x?y:z
x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
x,y
x::y
EOF

echo '* diff="cpp"' > .gitmodules
git diff --no-index --color-words pre post > output

Surprisingly, it shows (which is very different from the expected output):

diff --git a/pre b/post
index 23d5c8a..7e8c026 100644
--- a/pre
+++ b/post
@@ -1,19 +1,19 @@
Foo():x(0&&1){}Foo() : x(0&42) { bar(x); }
cout<<"Hello World!\n"<b a.b
!a ~a a++ a-- a*b a
a*b a/b a%b
a+b a-b
a<>b
ab a>=b
a==b a!=b
a
a^b
a|b
a&
a||b
a?b:z
a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
a,y
a::bWorld?\n"<y x.y
!x ~x x++ x-- x*y x
x*y x/y x%y
x+y x-y
x<>y
xy x>=y
x==y x!=y
x
x^y
x|y
x&
x||y
x?y:z
x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
x,y
x::y

Instead of:

diff --git a/pre b/post
index 23d5c8a..7e8c026 100644
--- a/pre
+++ b/post
@@ -1,19 +1,19 @@
Foo() : x(0&&1&42) { bar(x); }
cout<<"Hello World!?\n"<b 
ay x.by
!ax ~a ax x++ 
ax-- ax*b 
ay x&b
ay
x*b ay x/b ay 
x%b
ay
x+b ay x-b
ay
x<>b
ay
xb ay x>=b
ay
x==b ay x!=b
ay
x&b
ay
x^b
ay
x|b
ay
x&&b
ay
x||b
ay
x?by:z
ax=b ay x+=b 
ay x-=b ay x*=b 
ay x/=b ay x%=b 
ay x<<=b ay x>>=b 
ay x&=b ay x^=b 
ay x|=b
ay
x,y
ax::by

That does not just happen to the cpp built-in driver; it happens to bibtex
as well. Have I missed some configuration? I have tested this on a few
machines.

> > > > > > aiming to port most builtins stuff to C in which we can reduce
> > > > > > the size of git. Additionally, I would also like to convert
> > > > > > scripts to builtins as an additional milestone.
> > >
> > > Careful. It is a ton of work to get the rebase -i conversion done, and
> > > then a ton of work to get it integrated. That will fill 3 months, very
> > > easily.
> >
> > My main aim is to reduce the extra dependency of perl, but planning to
> > start with rebase, can I make that an optional task where I can help out
> > after I had completed my main task during gsoc?
>
> Sure, you can make it an optional task, and I would be very happy if you
> followed up on it even after GSoC!

Yes, I can do that as well. I will be happy to have git be smaller. ^^

> As far as the Perl dependency is concerned, I actually think there is only
> one serious one left: git add -i.

Yes, that as well. But basically the core parts first.

> Granted, there is send-email, but it really does not matter all that much
> these days *except* if you want to use Git to contribute to projects that
> still use a mailing list-based patch submission process (the ones that
> come to mind are: Git, Linux and Cygwin). Most Git users actually do not
> submit any patches to mailing lists, therefore I tend to ignore this one.

I won't ignore it, but I will do the others first, since some package
managers ship it together with git when it could instead be a subpackage.

> The rest of the Perl scripts interacts with foreign SCMs (archimport,
> cvsexportcommit, cvsimport, cvsserver, and svn). I *guess* that it would
> be nice to 

Re: [PATCH] sha1dc: safeguard against outside definitions of BIGENDIAN

Hi Junio,

On Sat, 25 Mar 2017, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Which leads me to wonder if a more robust solution that is in line
> > with the original design of sha1dc/sha1.c code may be to do an
> > unconditional "#undef BIGENDIAN" before the above block, so that no
> > matter what the calling environment sets BIGENDIAN to (including
> > "0"), it gets ignored and we always use the auto-selection.
> 
> So here is what I came up with as a replacement (this time as a
> proper patch not a comment on a patch).
> 
> Dscho, could you see if this fixes your build?

The Continuous Testing is back to normal, thanks.

Ciao,
Johannes


Re: Re: Re: GSoC Project | Convert interactive rebase to C

Hi Ivan,

On Sat, 25 Mar 2017, Ivan Tham wrote:

> Johannes Schindelin  wrote:
> > On Tue, 21 Mar 2017, Ivan Tham wrote:
> > > Stefan Beller  wrote:
> > > > On Mon, Mar 20, 2017 at 9:41 AM, Ivan Tham  wrote:
> > > >
> > > > > I am interested to work on "Convert interactive rebase to C"
> > > >
> > > > +cc Johannes, who recently worked on rebase and the sequencer.
> > 
> > Glad you are interested! Please note that large parts of the
> > interactive rebase are already in C now, but there is enough work left
> > in that corner.
> 
> Glad to hear that, I would really like to see interactive rebase in C.

Please note that a notable part already made it into C in v2.12.1. There
are still a few loose ends to tie, of course; it still makes for a great
head start on your project, methinks.

> > > > > aiming to port most builtins stuff to C in which we can reduce
> > > > > the size of git. Additionally, I would also like to convert
> > > > > scripts to builtins as an additional milestone.
> > 
> > Careful. It is a ton of work to get the rebase -i conversion done, and
> > then a ton of work to get it integrated. That will fill 3 months, very
> > easily.
> 
> My main aim is to reduce the extra dependency of perl, but planning to
> start with rebase, can I make that an optional task where I can help out
> after I had completed my main task during gsoc?

Sure, you can make it an optional task, and I would be very happy if you
followed up on it even after GSoC!

As far as the Perl dependency is concerned, I actually think there is only
one serious one left: git add -i.

Granted, there is send-email, but it really does not matter all that much
these days *except* if you want to use Git to contribute to projects that
still use a mailing list-based patch submission process (the ones that
come to mind are: Git, Linux and Cygwin). Most Git users actually do not
submit any patches to mailing lists, therefore I tend to ignore this one.

The rest of the Perl scripts interacts with foreign SCMs (archimport,
cvsexportcommit, cvsimport, cvsserver, and svn). I *guess* that it would
be nice to follow up on the remote-svn work (which has not really gone
anywhere so far, AFAICT the main driving contributor pursues different
projects these days), but IMHO none of these are really needed to run Git.

Ciao,
Johannes


[PATCH v3] rev-parse: match @{upstream}, @{u} and @{push} case-insensitively

Change the revision parsing logic to match @{upstream}, @{u} & @{push}
case-insensitively.

Before this change supplying anything except the lower-case forms
emits an "unknown revision or path not in the working tree"
error. This change makes upper-case & mixed-case versions equivalent
to the lower-case versions.

The use-case for this is being able to hold the shift key down while
typing @{u} on certain keyboard layouts, which makes the sequence
easier to type, and reduces cases where git throws an error at the
user where it could do what he means instead.

These suffixes now join various other suffixes & special syntax
documented in gitrevisions(7) that matches case-insensitively. A table
showing the status of the various forms documented there before &
after this patch is shown below. The key for the table is:

 - CI  = Case Insensitive
 - CIP = Case Insensitive Possible (without ambiguities)
 - AG  = Accepts Garbage (e.g. @{./.4.minutes./.})

Before this change:

|----------------+-----+------+-----|
| What?          | CI? | CIP? | AG? |
|----------------+-----+------+-----|
| sha1           | Y   | -    | N   |
| describeOutput | N   | N    | N   |
| refname        | N   | N    | N   |
| @{<date>}      | Y   | Y    | Y   |
| @{<n>}         | N/A | N/A  | N   |
| @{-<n>}        | N/A | N/A  | N   |
| @{upstream}    | N   | Y    | N   |
| @{push}        | N   | Y    | N   |
| ^{<type>}      | N   | Y    | N   |
| ^{/regex}      | N   | N    | N   |
|----------------+-----+------+-----|

After it:

|----------------+-----+------+-----|
| What?          | CI? | CIP? | AG? |
|----------------+-----+------+-----|
| sha1           | Y   | -    | N   |
| describeOutput | N   | N    | N   |
| refname        | N   | N    | N   |
| @{<date>}      | Y   | Y    | Y   |
| @{<n>}         | N/A | N/A  | N   |
| @{-<n>}        | N/A | N/A  | N   |
| @{upstream}    | Y   | -    | N   |
| @{push}        | Y   | -    | N   |
| ^{<type>}      | N   | Y    | N   |
| ^{/regex}      | N   | N    | N   |
|----------------+-----+------+-----|

The ^{<type>} suffix is not made case-insensitive, because other
places that take <type> like "cat-file -t <type>" do want them case
sensitively (after all we never declared that type names are case
insensitive). Allowing case-insensitive typename only with this syntax
will make the resulting Git as a whole inconsistent.

This change was independently authored to scratch a longtime itch, but
when I was about to submit it I discovered that a similar patch had
been submitted unsuccessfully before by Conrad Irwin in August 2011 as
"rev-parse: Allow @{U} as a synonym for
@{u}" (<1313287071-7851-1-git-send-email-conrad.ir...@gmail.com>).

The tests for this patch are more exhaustive than in the 2011
submission. The starting point for them was to first change the code
to only support upper-case versions of the existing words, seeing what
broke, and amending the breaking tests to check upper case & mixed
case as appropriate, and where not redundant to other similar
tests. The implementation itself is equivalent.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

On Mon, Mar 27, 2017 at 2:27 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> The ^{<type>} suffix could be changed to be case-insensitive as well
>> without introducing any ambiguities. That was included in an earlier
>> version of this patch, but Junio objected to its inclusion in [2].
>
> We try not to be personal in our log message.  It's not like my
> objection weighs heavier than objections from others.  The same
> number of bytes in the log message can better be spent to allow
> readers of "git log" to independently judge the rationale without
> referring to external material.

FWIW I just cited it because you went into a lot more detail in your
message, and thought I'd link to it in lieu of trying to paraphrase it
at even greater length.

> Perhaps replace this entire paragraph, including the URL in [2],
> with something like
>
> The ^{type} suffix is not made case-insensitive, because other
> places that take  like "cat-file -t " do want them
> case sensitively (after all we never declared that type names
> are case insensitive). Allowing case-insensitive typename only
> with this syntax will make the resulting Git as a whole
> inconsistent.
>
> Other than that, the change to the code and the new tests all makes
> sense to me.

Makes sense, replaced that note with that summary. Here's hopefully a
final v3 with that change. I've omitted the other two patches as noted
in the discussion about those two, I don't think it makes sense to
include them.

 Documentation/revisions.txt   |  6 +-
 sha1_name.c   |  2 +-
 t/t1507-rev-parse-upstream.sh | 15 +++
 t/t1514-rev-parse-push.sh |  8 ++--
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git 

Re: [PATCH] push: allow atomic flag via configuration

On Fri, Mar 24, 2017 at 6:17 PM, Romuald Brunet  wrote:
> +test_expect_success 'atomic option possible via git-config' '
> +   # prepare the repo
> +   mk_repo_pair &&
> +   (
> +   cd workbench &&
> +   test_commit one &&
> +   git checkout -b second master &&
> +   test_commit two &&
> +   git push --mirror up
> +   ) &&
> +   # a third party modifies the server side:
> +   (
> +   cd upstream &&
> +   git checkout second &&
> +   git tag test_tag second
> +   ) &&
> +   # see if we can now push both branches.
> +   (
> +   cd workbench &&
> +   git config push.atomic true &&
> +   git checkout master &&
> +   test_commit three &&
> +   git checkout second &&
> +   test_commit four &&
> +   git tag test_tag &&
> +   test_must_fail git push --tags up master second
> +   ) &&
> +   test_refs master HEAD@{3} &&
> +   test_refs second HEAD@{1}
> +'
> +

Sent my earlier E-Mail too soon, so I missed this, there's no test
here for what "git config push.atomic false" && "git push --atomic"
does, is that atomic or not? I.e. what does "git -c push.atomic=false
push --atomic" do? Does the CLI option override the config as it
should?


Re: [PATCH] push: allow atomic flag via configuration

On Fri, Mar 24, 2017 at 6:17 PM, Romuald Brunet  wrote:
> +push.atomic::
> +   If set to true enable `--atomic` option by default.  You
> +   may override this configuration at time of push by specifying
> +   `--no-atomic`.
> +

This should also be mentioned in the --atomic documentation in
git-push.txt itself. See e.g. the documentation for pull --rebase for
an example of this.


Re: [PATCH] push: allow atomic flag via configuration

On ven., 2017-03-24 at 12:29 -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > My one question would be whether people would want this to actually be
> > specific to a particular remote, and not just on for a given repository
> > (your "site-specific" in the description made me think of that). In that
> > case it would be better as part of the remote.* config.
> 
> Yeah, I had the same reaction.  
> 
> Conceptually, this sits next to remote.*.push that defines which set
> of refs are sent by default, and remote..pushAtomic does make
> sense.  If (and only if) it turns out to be cumbersome for somebody
> to set the configuration for each and every remote, it is OK to also
> add push.atomic to serve as a fallback for remote.*.pushAtomic, I
> would think, but adding only push.atomic feels somewhat backwards.

Thanks for your feedback

I'm mostly using single remotes that's why I didn't even think of making
it configurable per remote. But you're right that makes more sense.

I'll try to make that modification to the patch.

As for my use case: I'd like atomic pushes by default when pushing a
new tag on our stable branch, guarding against inevitably forgetting
to rebase beforehand and therefore pushing a "dangling" commit/tag.


-- 
Romuald Brunet



Re: [PATCH v2 2/3] rev-parse: add @{p} as a synonym for @{push}

On Mon, Mar 27, 2017 at 4:53 AM, Jeff King  wrote:
> On Sun, Mar 26, 2017 at 12:16:53PM +, Ævar Arnfjörð Bjarmason wrote:
>
>> Add @{p} as a shorthand for @{push} for consistency with the @{u}
>> shorthand for @{upstream}.
>>
>> This wasn't added when @{push} was introduced in commit
>> adfe5d0434 ("sha1_name: implement @{push} shorthand", 2015-05-21), but
>> it can be added without any ambiguity and saves the user some typing.
>
> It _can_ be added, but it was intentionally avoided at the time because
> there was discussion of adding other p-words, including:
>
>   - @{pull} as a synonym for @{upstream} (and to better match @{push})
>
>   - @{publish}, which was some similar-ish system that was based on
> per-branch config, but the patches were never merged.
>
> It's been a few years with neither of those things happening, so in a
> sense it may be safe to add it now. OTOH, if users are not clamoring for
> @{p} and it is just being added "because we can", maybe that is not a
> good reason.

Yeah let's just drop this.

>> -'@\{push\}', e.g. 'master@\{push\}', '@\{push\}'::
>> -  The suffix '@\{push}' reports the branch "where we would push to" if
>> +'@\{push\}', e.g. 'master@\{push\}', '@\{p\}'::
>> +  The suffix '@\{push}' (short form '@\{push}') reports the branch "where 
>> we would push to" if
>
> Did you mean to say "short form '@\{p}'"?

Yup, my mistake.
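For what it's worth, the dropped shorthand would amount to a one-word alias at parse time. A hypothetical sketch (git's real parser is C code in sha1_name.c; this mapping is illustrative only):

```python
# Hypothetical sketch of @{p}/@{u} suffix expansion; not git's code.
SUFFIX_ALIASES = {"p": "push", "u": "upstream"}

def expand_suffix(rev: str) -> str:
    """Expand rev suffixes, e.g. 'master@{u}' -> 'master@{upstream}'."""
    if rev.endswith("}") and "@{" in rev:
        base, _, suffix = rev.rpartition("@{")
        word = suffix[:-1]                     # drop the closing brace
        word = SUFFIX_ALIASES.get(word, word)  # unknown words pass through
        return f"{base}@{{{word}}}"
    return rev

print(expand_suffix("master@{p}"))  # master@{push}
```

This is also why the aliasing is unambiguous today: no other suffix word starts with "p" or "u".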


Re: [PATCH v2 3/3] rev-parse: match ^{} case-insensitively

On Mon, Mar 27, 2017 at 4:58 AM, Jeff King  wrote:
> On Sun, Mar 26, 2017 at 05:36:21PM -0700, Junio C Hamano wrote:
>
>> It's not "potential confusion".  This closes the door for us to
>> introduce "TREE" as a separate object type in the future.
>>
>> If we agree to make a declaration that all typenames are officially
>> spelled in lowercase [*1*] and at the UI level we accept typenames
>> spelled in any case, then we can adopt this change (and then we need
>> to update "cat-file -t" etc. to match it).
>>
>> I do not at all mind seeing whether the list consensus is to make such a
>> declaration and permanently close the door for typenames that are
>> not lowercase, and after seeing such a consensus I'd gladly
>> appreciate this patch, but I do not want to see a change like this
>> that decides the future of the system, pretending to be an innocuous
>> change, sneaked in without making sure that everybody is aware of
>> its implications.
>
> FWIW, I cannot see us ever adding TREE (or Tree) as a separate type.
> It's too confusing for no gain. We'd call it "tree2" or something more
> obvious.
>
> So I don't mind closing that door, but I'm not sure if a partial
> conversion (where some commands are case-insensitive but others aren't
> yet) might not leave us in a more confusing place.
>
> I dunno. I guess I have never wanted to type "^{Tree}" in the first
> place, so I do not personally see the _benefit_. Which makes it easy to
> see even small negatives as a net loss.

I don't either, I think this patch should just be dropped.

As noted in the cover letter[1] I carved this off from the rest of the
series just in case anyone wanted this, either now or in the future.

I originally implemented this for consistency with @{U} because it's
another thing that can be made case-insensitive unambiguously as far
as the rev-parse syntax is concerned, but as you/Junio note here the
tree/blob/object etc. names exist in many more places, so making just
this particular thing case-insensitive wouldn't make sense, and has
little benefit.

1. <20170326121654.22035-1-ava...@gmail.com>


Re: [PATCH v2 3/3] rev-parse: match ^{} case-insensitively

On Sun, Mar 26, 2017 at 10:39:18PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > FWIW, I cannot see us ever adding TREE (or Tree) as a separate type.
> > It's too confusing for no gain. We'd call it "tree2" or something more
> > obvious.
> 
> In case it was not clear, I didn't mean to say I _want_ to leave
> that door open.  Well, I cannot imagine it was unclear, as I said I
> do not at all mind declaring that all object names will be lowercase
> to allow us to freely downcase what we get at the UI level.

No, I understood that. You just mentioned "list consensus" so I was
trying to give my two cents. ;)

> > I dunno. I guess I have never wanted to type "^{Tree}" in the first
> > place, so I do not personally see the _benefit_. Which makes it easy to
> > see even small negatives as a net loss.
> 
> As to the potential _benefit_, I do not see much either myself, but
> we already are seeing somebody cared enough to throw us a patch, so
> to some people there is clearly a perceived benefit.  I do not think
> closing the door for typenames that are not lowercase is a negative
> change at all.

By negative, I just meant potential confusion when we are half-way there
(e.g., "foo^{TREE}" works but "git cat-file TREE foo" does not).

> I just wanted the patch to make it clear that it is making such a
> system-wide design decision and casting it in stone.  Which includes
> that "cat-file " and "hash-object -t " get the same
> case-insensitivity update and probably writing that design decision
> down somewhere in the documentation, perhaps in the glossary where
> we talk about the "object type".

Yes, I agree that that is the right path forward.

-Peff
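The design Junio describes — all canonical type names lowercase, any case accepted at the UI level — would route every type-name parse through one shared case-folding step. A hypothetical sketch in Python (git's implementation is C; these helper names are invented):

```python
# Hypothetical sketch: fold user-supplied object type names to
# lowercase so that "^{TREE}", "cat-file TREE", and
# "hash-object -t TREE" would all agree. Not git's actual code.
OBJECT_TYPES = {"blob", "tree", "commit", "tag"}

def parse_object_type(user_input: str) -> str:
    name = user_input.lower()
    if name not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {user_input!r}")
    return name

print(parse_object_type("TREE"))  # tree
```

Doing the folding in one shared helper is what would avoid the half-way state Peff warns about, where some commands fold case and others do not.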


Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag

On Sun, Mar 26, 2017 at 11:07:02PM -0700, Junio C Hamano wrote:

> > No, I don't think so. We don't trust the trailer hash for anything to do
> > with corruption; we actually inflate the objects and see which ones we
> > got. So the victim will notice immediately that what the attacker sent
> > it is insufficient to complete the fetch (or push), and will refuse to
> update the refs. The fetch transfer fails, but nobody gets corrupted.
> 
> In the scenario I was presenting, both the original fetch that gives
> one packdata and the later fetch that gives another packdata (which
> happens to share the csum-file trailing checksum) satisfy the "does
> the new pack give us enough objects to really complete the tips of
> refs?" check.

Right, my point was that we do that check _after_ throwing away the
duplicate-named pack. So you cannot fool that check, update the ref, and
then throw away the pack to get a corrupt receiver. The receiver throws
away the pack first, then says "hey, I don't have all the objects" and
aborts.

That said...

> The second fetch transfers, we validate the packdata using index-pack
> (we may pass --check-self-contained-and-connected and we would pass
> --strict if transfer-fsck is set), we perhaps even store it in
> quarantine area while adding it to the list of in-core packs, make
> sure everything is now connected from the refs using pre-existing
> packs and this new pack.  The index-pack may see everything is good
> and then would report the resulting pack name back to
> index_pack_lockfile() called by fetch-pack.c::get_pack().

These are interesting corner cases. We only use
--check-self-contained-and-connected with clones, but you may still have
packs from an alternate during a clone (although I think the two packs
would be allowed to co-exist indefinitely, then).

The quarantine case is more interesting. The two packs _do_ co-exist
while we do the connectivity check there, and then afterwards we can
have only one. So that reversal of operations introduces a problem, and
you could end up with a lasting corruption as a result.

> But even though both of these packs _are_ otherwise valid (in the
> sense that they satisfactorily transfer objects necessary to make
> the refs that were fetched complete), because we name the packs
> after the trailer hash and we cannot have two files with the same
> name, we end up throwing away the later one.

I kind of wonder if we should simply allow potential duplicates to
co-exist. The pack names really aren't used for duplicate suppression in
any meaningful sense. We effectively use them as UUIDs so that each new
pack gets a unique name without having to do any locking or other
coordination. It would not be unreasonable to say "oops, 1234abcd
already exists; I'll just increment and call this new one 1234abce". The
two presumably-the-same packs would then co-exist until the new "repack
-a" removes duplicates (not just at the pack level, but at the object
level).

The biggest problem there is that "claiming" a pack name is not
currently atomic. We just do it blindly. So switching to some other
presumed-unique UUID might actually be easier (whether SHA-256 of the
pack contents or some other method).
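Peff's "increment and claim" idea could be made atomic with an O_CREAT|O_EXCL open, which fails if the name is already taken. A hypothetical Python sketch of the scheme (git does not do this; the function name is invented):

```python
import os
import tempfile

def claim_pack_name(directory: str, trailer_hash: str) -> str:
    """Atomically claim a unique pack name, incrementing the hex
    name when a pack with that name already exists
    ("1234abcd is taken; call this one 1234abce")."""
    candidate = int(trailer_hash, 16)
    while True:
        name = f"pack-{candidate:040x}.pack"
        try:
            # O_EXCL makes creation fail if the file exists, so the
            # "claim" needs no separate lock.
            fd = os.open(os.path.join(directory, name),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return name
        except FileExistsError:
            candidate += 1

with tempfile.TemporaryDirectory() as d:
    h = "0" * 39 + "d"
    first = claim_pack_name(d, h)   # claims the name ending in ...d
    second = claim_pack_name(d, h)  # same hash collides; gets ...e
```

The two presumably-identical packs would then co-exist until a later `repack -a` deduplicates at the object level.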

> As I said, it is a totally different matter if this attack scenario
> is a practical threat.  For one thing, it is probably harder than
> just applying the straight "shattered" attack to create a single
> object collision--you have to make two packs share the same trailing
> hash _and_ make sure that both of them record data for valid
> objects.  But I am not convinced that it would be much harder
> (e.g. I understand that zlib deflate can be told not to attempt
> compression at all, and the crafted garbage used in the middle part
> of the "shattered" attack can be such a blob object expressed as a
> base object--once the attacker has two such packfiles that hash the
> same, two object names for these garbage blobs can be used to
> present two versions of the values for a ref to be fetched by these
> two fetch requests).

Yeah, I think we can assume it will be possible with SHAttered levels of
effort. An attacker can use it to create a persistent corruption by
having somebody fetch from them twice. So not really that interesting an
attack, but it is something. I still think that ditching SHA-1 for the
naming is probably a better fix than worrying about SHA-1 collisions.

-Peff


Re: [PATCH] git-p4: Add failing test case for name-rev vs symbolic-ref

On 27 March 2017 at 00:18, Junio C Hamano  wrote:
> Luke Diamand  writes:
>
>> As per the discussion about use of "git name-rev" vs "git symbolic-ref" in
>> git-p4 here:
>>
>> http://marc.info/?l=git&m=148979063421355
>>
>> git-p4 could get confused about the branch it is on, which affects
>> the git-p4.allowSubmit option. This adds a failing test case for the
>> problem.
>>
>> Luke Diamand (1):
>>   git-p4: add failing test for name-rev rather than symbolic-ref
>>
>>  t/t9807-git-p4-submit.sh | 16 
>>  1 file changed, 16 insertions(+)
>
> Ahh, thanks for tying loose ends.  I completely forgot about that
> topic.
>
> If we queue this and then update the function in git-p4.py that
> (mis)uses name-rev to learn the current branch to use symbolic-ref
> instead, can we flip the "expect_failure" to "expect_success"?

Yes. The test passes with your change.

>
> IOW, can we have a follow up to this patch that fixes a known
> breakage the patch documents ;-)?

Yes.

Regards!
Luke


Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag

Jeff King  writes:

>> If a malicious site can craft two packfiles that hash to the same,
>> then it can first feed one against a fetch request, then feed the
>> other one when a later fetch request comes, and then the later pack
>> is discarded by the "existing data wins" rule.  Even though this
>> later pack may have all the objects necessary to complete the refs
>> this later fetch request asked for, and an earlier pack that
>> happened to have the same pack trailer hash doesn't have these
>> necessary objects, the receiver ends up discarding this necessary
>> pack.  The end result is that the receiver's repository is now
>> corrupt, lacking some objects.
>
> No, I don't think so. We don't trust the trailer hash for anything to do
> with corruption; we actually inflate the objects and see which ones we
> got. So the victim will notice immediately that what the attacker sent
> it is insufficient to complete the fetch (or push), and will refuse to
> update the refs. The fetch transfer fails, but nobody gets corrupted.

In the scenario I was presenting, both the original fetch that gives
one packdata and the later fetch that gives another packdata (which
happens to share the csum-file trailing checksum) satisfy the "does
the new pack give us enough objects to really complete the tips of
refs?" check.  The second fetch transfers, we validate the packdata
using index-pack (we may pass --check-self-contained-and-connected
and we would pass --strict if transfer-fsck is set), we perhaps even
store it in quarantine area while adding it to the list of in-core
packs, make sure everything is now connected from the refs using
pre-existing packs and this new pack.  The index-pack may see
everything is good and then would report the resulting pack name
back to index_pack_lockfile() called by fetch-pack.c::get_pack().

That was the scenario I had in mind.  Not "bogus sender throws a
corrupt pack at you" case, where we check the integrity and
connectivity of the objects ourselves.

And the trailer hash the sender added at the end of the pack data
stream does not have to come into the picture.  When we compute that
ourselves for the received pack data, because the sender is trying a
successful collision attack (they gave us one pack that hashes to
certain value earlier; now they are giving us the other one), we
would end up computing the same.

But even though both of these packs _are_ otherwise valid (in the
sense that they satisfactorily transfer objects necessary to make
the refs that were fetched complete), because we name the packs
after the trailer hash and we cannot have two files with the same
name, we end up throwing away the later one.

As I said, it is a totally different matter if this attack scenario
is a practical threat.  For one thing, it is probably harder than
just applying the straight "shattered" attack to create a single
object collision--you have to make two packs share the same trailing
hash _and_ make sure that both of them record data for valid
objects.  But I am not convinced that it would be much harder
(e.g. I understand that zlib deflate can be told not to attempt
compression at all, and the crafted garbage used in the middle part
of the "shattered" attack can be such a blob object expressed as a
base object--once the attacker has two such packfiles that hash the
same, two object names for these garbage blobs can be used to
present two versions of the values for a ref to be fetched by these
two fetch requests).
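For concreteness: the trailer checksum is a SHA-1 computed over the entire preceding pack stream, and the receiver names the stored pack after that same hash, which is why two colliding packs contend for one filename. A small sketch (the helper names are mine; the fake pack body is a stand-in):

```python
import hashlib

def pack_trailer(pack_body: bytes) -> bytes:
    # The trailer appended to a pack is the SHA-1 of every byte
    # that precedes it.
    return hashlib.sha1(pack_body).digest()

def pack_name(pack_body: bytes) -> str:
    # The stored pack is named after that same hash, so two
    # colliding pack bodies map to one filename.
    return f"pack-{pack_trailer(pack_body).hex()}.pack"

# Minimal fake pack body: "PACK" magic, version 2, zero objects.
body = b"PACK" + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
print(pack_name(body))
```

Under a SHAttered-style attack, two different bodies with equal SHA-1 would yield the same name here, and the "existing file wins" rule discards the second.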


Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD


From: "Junio C Hamano"
To: "René Scharfe"
Sent: Monday 2017-03-27, 2:40
Subject: Re: [PATCH] strbuf: support long paths w/o read rights in
strbuf_getcwd() on FreeBSD

> 
> Nicely analysed and fixed (or is the right word "worked around"?)
> 
> Thanks, will queue.
> 


Is this patch going to be included in the next Git version (or sooner), by
any chance?

Thank you, everyone, for your attention to the problem.