Re: [PATCH] fetch: use "quick" has_sha1_file for tag following

2016-10-18 Thread Jeff King
On Mon, Oct 17, 2016 at 10:30:28AM -0700, Junio C Hamano wrote:

> > It looks like I _did_ look into optimizing this into a single stat()
> > call in the thread at [1]. I completely forgot about that. I did find
> > there that naively using stat_validity() on a directory is racy, though
> > I wonder if we could do something clever with gettimeofday() instead.
> 
> It feels funny to hear an idea to compare fs timestamp with gettimeofday
> immedately after hearing the word NFS, though ;-).

Yeah, I had a funny feeling in my stomach as I wrote that.

What you really want to know is the current filesystem time. You'd
probably have to do something gross like creating a new file and then
comparing its timestamp. In theory you'd only have to do that _once_,
and then as long as the pack directory wasn't changing, you could say "I
don't know what time it is now, but I know it is at least time X, and I
know that X is greater than Y, the pack directory timestamp, therefore
the pack directory hasn't changed since I last looked".

That assumes monotonic clocks, but we basically already do so for the
racy-git checks, I think.

I dunno. It feels...complicated. And bad to require writing to the
repository for what would otherwise be a read-only operation. But I
don't see any fundamental reason it could not work.

-Peff


Re: [PATCH 0/4] diff.wsErrorHighlight configuration variable

2016-10-18 Thread Jeff King
On Tue, Oct 04, 2016 at 03:54:45PM -0700, Junio C Hamano wrote:

> "git diff" and its family of commands have "--ws-error-highlight"
> option to allow whitespace breakages on old and context lines
> painted in color.diff.whitespace color, instead of the usual "we
> paint breakages only on new lines", but there wasn't a configuration
> variable that corresponds to it.
> 
> This would be a lot closer to a series that could be acceptable,
> compared to the previous "it should look like this" patch.
> 
> Junio C Hamano (4):
>   t4015: split out the "setup" part of ws-error-highlight test
>   diff.c: refactor parse_ws_error_highlight()
>   diff.c: move ws-error-highlight parsing helpers up
>   diff: introduce diff.wsErrorHighlight option

This topic got stuck in my dreaded "to review" pile and I forgot about
it. I see you've already marked it for merging to master, but FWIW, I
read it over and did not see any problems.

-Peff


Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Johannes Schindelin
Hi Stefan,

On Mon, 17 Oct 2016, Stefan Beller wrote:

> On Mon, Oct 17, 2016 at 3:49 PM, Junio C Hamano  wrote:
> > Stefan Beller  writes:
> >
> >> +static void strip_url_ending(char *url, size_t *_len)
> >> +{
> >> + int check_url_stripping = 1;
> >> + size_t len = _len ? *_len : strlen(url);
> >> +
> >> + while (check_url_stripping) {
> >> + check_url_stripping = 0;
> >> + if (is_dir_sep(url[len-2]) && url[len-1] == '.') {
> >
> > This is "strip /. at the end" it seems.
> >
> > Does anything in the loop control guarantees 2 <= len at this point?
> 
> Oh, thanks for pointing that out. I thought about that and missed to add it.
> I'll reroll with the length check once we hear back from Windows folks,
> that this is a viable strategy for them, too.

It is a viable strategy for me, too.

Thanks,
Dscho


Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> I'll mark it as "wait for follow-up fix" in whats-cooking.txt (on
> >> 'todo' branch) to remind myself not to merge it yet.
> >
> > May I request your guidance as to your preference how to proceed?  ...
> 
> I guess I didn't see this before I sent my response to the review
> thread, which was in my pile of "these need more thought than others
> before responding" topics.  
> 
> > Here are the options I see:
> >
> > A) remove the tests in question
> >
> > B) mark them as !MINGW instead
> >
> > C) change just those two tests from using `$PWD` (pseudo-Unix path) to
> >   `$(pwd)` (native path)
> >
> > I would like to hear your feedback about your preference, but not without
> > priming you a little bit by detailing my current opinion on the matter:
> >
> > While I think B) would be the easiest to read, C) would document the
> > expected behavior better. A) would feel to me like shrugging, i.e. the
> > lazy, wrong thing to do.
> >
> > What do you think?
> 
> As to my preference on tests, I guess what I suggested was a cross
> between your B and C below, and I can go with either one as an
> abbreviated version of my preference ;-) 
> 
> I am still wondering if the test is expecting the right behaviour,
> though.  If some codepaths rely on a question "please resolve '../.'
> relative to 'path/to/dir/.'" being answered as "that's path/to/dir
> itself", it smells to me that the downstream of the dataflow that
> expects such an answer, as well as the machinery that produces such
> an answer, are acting as two wrongs that happen to cancel each
> other.  Am I grossly misunderstanding what that test is doing?

I think your "let's take a step back" was spot on: when being passed a
path ending in "/." and being told to normalize the relative path "../."
on top, the very special meaning of "." should be taken into
consideration.

Thanks,
Dscho


Re: [PATCH v3 14/25] sequencer: introduce a helper to read files written by scripts

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > +/*
> > + * Reads a file that was presumably written by a shell script, i.e.
> > + * with an end-of-line marker that needs to be stripped.
> > + *
> > + * Returns 1 if the file was read, 0 if it could not be read or does not 
> > exist.
> > + */
> > +static int read_oneliner(struct strbuf *buf,
> > +   const char *path, int skip_if_empty)
> > +...
> > +   if (strbuf_read_file(buf, path, 0) < 0) {
> > +   warning_errno(_("could not read '%s'"), path);
> > +   return 0;
> > +   }
> > +   if (buf->len > orig_len && buf->buf[buf->len - 1] == '\n') {
> > +   if (--buf->len > orig_len && buf->buf[buf->len - 1] == '\r')
> > +   --buf->len;
> > +   buf->buf[buf->len] = '\0';
> > +   }
> 
> The name says "oneliner" but this reads the whole thing and trims
> only the last line of the input.  Which is correct?

The latter. Basically, `read_oneliner()` is short-hand for "that thing
that shell does when you use `cat file` with backticks.

I do not like `read_stripping_last_eol()`, `read_what_the_shell_wrote()`
nor `read_skipping_last_lf()`. So if you come up with any brilliant idea,
I am all ears.

In the meantime, I'd be happy to just add a comment that this function is
intended for oneliners, but that it will also read multi-line files and
only strip off the EOL marker from the last line.

Would that work for you?
Dscho


Re: [PATCH v3 16/25] sequencer: support amending commits

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > This teaches the run_git_commit() function to take an argument that will
> > allow us to implement "todo" commands that need to amend the commit
> > messages ("fixup", "squash" and "reword").
> 
> Likewise to 15/25, i.e. Good, though the growth by these two steps
> starts to make me wonder if these three options should be crammed
> into an unsigned "flags" bitword.

After looking at the diff with the added complications of ORing and ANDing
the flags, I'd much rather prefer to stay with the three flags being kept
separately. It's not like we need to save bits, but we need to preserve
readability as much as possible, I'd wager.

> I see you have v4, so I'll ignore the remainder of this stale round
> and start reading that updated one instead.

Thanks,
Dscho


Re: [PATCH v4 05/25] sequencer: eventually release memory allocated for the option values

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > The sequencer is our attempt to lib-ify cherry-pick. Yet it behaves
> > like a one-shot command when it reads its configuration: memory is
> > allocated and released only when the command exits.
> >
> > This is kind of okay for git-cherry-pick, which *is* a one-shot
> > command. All the work to make the sequencer its work horse was
> > done to allow using the functionality as a library function, though,
> > including proper clean-up after use.
> >
> > To remedy that, we now take custody of the option values in question,
> > requiring those values to be malloc()ed or strdup()ed
> 
> That is the approach this patch takes, so "eventually release" in
> the title is no longer accurate, I would think.

To the contrary, we now free() things in remove_state(), so we still
"eventually release" the memory.

> > Sadly, the current approach makes the code uglier, as we now have to
> > take care to strdup() the values passed via the command-line.
> 
> I obviously disagree with that statement and the _entrust was too
> ugly to live, but it is obviously subjective, and it boils down to
> who has a better taste.  Let's not go there.

Changed.

Thanks,
Dscho


Re: [PATCH v4 08/25] sequencer: completely revamp the "todo" script parsing

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > -   for (i = 1; *p; i++) {
> > +   for (i = 1; *p; i++, p = next_p) {
> > char *eol = strchrnul(p, '\n');
> > -   commit = parse_insn_line(p, eol, opts);
> > -   if (!commit)
> > -   return error(_("Could not parse line %d."), i);
> > -   next = commit_list_append(commit, next);
> > -   p = *eol ? eol + 1 : eol;
> > +
> > +   next_p = *eol ? eol + 1 /* strip LF */ : eol;
> 
> This one was explained as "skip LF" in the previous round, and that
> is more correct than "strip", I think.  The +1 here is not done to
> "strip" the LF out of the end result, but to "skip" one to move to
> the beginning of the next line.

Changed,
Dscho


Re: Merge conflicts in .gitattributes can cause trouble

2016-10-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 17 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > I would vote for:
> >
> > 4. We keep letting Git read in the *current* version of .gitattributes
> >*before* the merge, and apply those attributes while performing the
> >merge.
> 
> Even though this needs a major surgery to the way the attr subsystem
> reads from these files, I think it is conceptually the cleanest.

To the contrary. As far as I can see, when calling `git merge`, Git
currently *does* read .gitattributes from the file, and if that fails,
falls back to reading that file from the index.

In other words, option 4. is the current behavior no change required.

Ciao,
Dscho


Git log exclude/remotes/branches options not working as expected

2016-10-18 Thread Robert Dailey
I have 3 remotes registered in my clone:

origin, fork, drive

When I do:

$ git log --oneline --decorate --graph

I only want to see branches under:

refs/heads/*
refs/remotes/origin/*

I tried the following:

$ git log --oneline --decorate --graph --simplify-by-decoration
--remote=origin topic1..master

However, I still see refs present in the graph for 'drive' and 'fork'
remote tracking branches. I can't tell if these are shown simply
because other refs not excluded by my options happen to also be at
that SHA1, or if the log command is still generating the graph based
on other branches.

What I'm expecting is that I literally see NONE of those excluded refs
on the graph, even if other included refs also happen to be positioned
at those commits. It's the visualization of the refs I'm concerned
about: I have a lot of remote tracking branches in those remotes that
clutter the view; I'd rather not see them at all (in addition to the
graph not considering them when it is being built/generated). Am I
misunderstanding the purpose here? How can I achieve my goals?
Documentation hasn't really helped me out here.


Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

2016-10-18 Thread Santiago Torres
> * st/verify-tag (2016-10-10) 7 commits
>  - t/t7004-tag: Add --format specifier tests
>  - t/t7030-verify-tag: Add --format specifier tests
>  - builtin/tag: add --format argument for tag -v
>  - builtin/verify-tag: add --format to verify-tag
>  - tag: add format specifier to gpg_verify_tag
>  - ref-filter: add function to print single ref_array_item
>  - gpg-interface, tag: add GPG_VERIFY_QUIET flag
> 
>  "git tag" and "git verify-tag" learned to put GPG verification
>  status in their "--format=" output format.
> 
>  Is this ready for 'next'?

Hi, I saw this on the previous "what's cooking." Is there anything I
need to do on my side to make sure this is ready for next?

Thanks!
-Santiago.


signature.asc
Description: PGP signature


[PATCH] daemon, path.c: fix a bug with ~ in repo paths

2016-10-18 Thread Luke Shumaker
The superficial aspect of this change is that git-daemon now allows paths
that start with a "~".  Previously, if git-daemon was run with
"--base-path=/srv/git", it was impossible to get it to serve
"/srv/git/~foo/bar.git".  An odd edge-case that was broken.

But from a source-code standpoint, the change is in path.c:enter_repo().  I
have adjusted it to take separate "strict_prefix" and "strict_suffix"
arguments, rather than a single "strict" argument.

I also make it clearer what the purpose of each path buffer is for, by
renaming them to chdir_path and ret_path; chdir_path is the path that we
pass to chdir(); return_path is the path we return to the user.  Using this
nomenclature, we can more easily explain the behavior of the function.
There are 3 DWIM measures that enter_repo() provides: tilde expansion,
suffix guessing, and gitfile expansion; it also trims trailing slashes.
Here is how they are applied to each path:

+--+++
| Before this commit   | chdir_path | ret_path   |
+--+++
| trim trailing slashes| !strict| !strict|
| tilde expansion  | !strict| false  |
| suffix guessing  | !strict| !strict|
| gitfile expansion (< 2.6.3)  | !strict| false  |
| gitfile expansion (>= 2.6.3) | true   | strict |
+--+++
| With this commit | chdir_path | ret_path   |
+--+++
| trim trailing slashes| true   | true   |
| tilde expansion  | !strict_prefix | false  |
| suffix guessing  | !strict_suffix | !strict_suffix |
| gitfile expansion| true   | false  |
+--+++

The separation of "strict" into "strict_prefix" and "strict_suffix" is
necessary for git-daemon because it has separate --strict-paths (affects
prefix and suffix) and --user-path (just prefix) flags that can be toggled
separately.

In the other programs where enter_repo() is called, I continued the
existing behavior of tying the prefix and suffix strictness together
together; though I am not entirely sure that they should all be enabling
tilde expansion.  But for now, their behavior hasn't changed.

Signed-off-by: Luke Shumaker 
---
 builtin/receive-pack.c   |   2 +-
 builtin/upload-archive.c |   2 +-
 cache.h  |   2 +-
 daemon.c |  42 +++
 http-backend.c   |   2 +-
 path.c   | 135 +--
 upload-pack.c|   2 +-
 7 files changed, 96 insertions(+), 91 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 011db00..f430e96 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1860,7 +1860,7 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
 
setup_path();
 
-   if (!enter_repo(service_dir, 0))
+   if (!enter_repo(service_dir, 0, 0))
die("'%s' does not appear to be a git repository", service_dir);
 
git_config(receive_pack_config, NULL);
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..00d4ced 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -25,7 +25,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, 
const char *prefix)
if (argc != 2)
usage(upload_archive_usage);
 
-   if (!enter_repo(argv[1], 0))
+   if (!enter_repo(argv[1], 0, 0))
die("'%s' does not appear to be a git repository", argv[1]);
 
/* put received options in sent_argv[] */
diff --git a/cache.h b/cache.h
index 4cba08e..6380be0 100644
--- a/cache.h
+++ b/cache.h
@@ -1024,7 +1024,7 @@ enum scld_error 
safe_create_leading_directories_const(const char *path);
 
 int mkdir_in_gitdir(const char *path);
 extern char *expand_user_path(const char *path);
-const char *enter_repo(const char *path, int strict);
+const char *enter_repo(const char *path, int strict_prefix, int strict_suffix);
 static inline int is_absolute_path(const char *path)
 {
return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
diff --git a/daemon.c b/daemon.c
index 425aad0..118d337 100644
--- a/daemon.c
+++ b/daemon.c
@@ -170,27 +170,23 @@ static const char *path_ok(const char *directory, struct 
hostinfo *hi)
return NULL;
}
 
-   if (*dir == '~') {
-   if (!user_path) {
-   logerror("'%s': User-path not allowed", dir);
-   return NULL;
-   }
-   if (*user_path) {
-   /* Got eit

git checkout crashes after server being updated to Debian X86_64

2016-10-18 Thread Raffael Reichelt
Hello! 

I have a serious problem with git, After my provider had updated to a X86_64 
architecture git crashes with various memory-related errors. This is happening 
remote when pushing to the repository from my local machine as well as trying 
it on a shell on the server itself.

This are the error-messages:

fatal: Out of memory, realloc failed
fatal: recursion detected in die handler
fatal: recursion detected in die handler

or
fatal: unable to create threaded lstat
fatal: recursion detected in die handler
or
fatal: unable to create threaded lstat
*** Error in `git': double free or corruption (fasttop): 0x00a8ade0 ***
fatal: recursion detected in die handler
Aborted

It’s obviously not a problem of the repository - happens with all of them. I 
think it is also not a question of size - happens with a 80M Repository as well 
as with a 500M one.

Any way: did a 

git fsck
Prüfe Objekt-Verzeichnisse: 100% (256/256), Fertig.
Prüfe Objekte: 100% (56305/56305), Fertig.

git gc --auto --prune=today —aggressive
git repack

Additionally I played around some config parameters  so my config now looks 
like:
[http]
postbuffer = 524288000
[pack]
threads = 1
deltaCacheSize = 128m
packSizeLimit = 128m
windowMemory = 128m
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
repositoryformatversion = 0
filemode = true
bare = true

I am running 
git version 2.1.4
 
on
Linux infongp-de65 3.14.0-ui16196-uiabi1-infong-amd64 #1 SMP Debian 
3.14.73-2~ui80+4 (2016-07-13) x86_64 GNU/Linux

Anyone out there to help me getting out of this trouble?

Regards,
Raffael



Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> Currently a URL for the superproject ending in
>
> (A).../path/to/dir
> (B).../path/to/dir/
> (C).../path/to/dir/.
> (D).../path/to/dir/./.
> (E).../path/to/dir/.///.//.
>
> is treated the same in (A) and (B), but (C, D, E) are different.

You may know what you meant to say with "treated", but the readers
do not know "treated" in what situation you are talking about.  We
need to tell the readers that the bug being fixed is about resolving
a relative URL "../" off of the URL of the superproject
to compute the remote URL for a submodule repository.

> We never produce the URLs in (C,D,E) ourselves, they come to use, because
> the user used it as the URL for cloning a superproject.
> Normalize these paths.

As you know the externally-visible impact of this change (which I
asked you, but didn't see an on-list answer to, by the way), please
describe what this means to the end user in the log message.  It is
normally done in an earlier part of the log message to describe the
problem being solved and its background.

If I understand the issue correctly, it may go like this:

The remote URL for the submodule can be specified relative
to the URL of the superproject in .gitmodules.  A top-level
git://site.xz/toplevel.git can specify in its .gitmodules

[submodule "sub"]
url = ../submodule.git
path = sub

to say that git://site.xz/submodule.git is where the
submodule bound at its "sub/" is found.

However, when the toplevel is cloned like this:

git clone git://site.xz/toplevel.git/. top

i.e. the toplevel specifies its URL with trailing "/.", the
code set the URL to git://site.xz/toplevel.git/submodule.git
for the submodule, which is nonsense.  This was because the
logic failed to treat trailing "/." any differently from
trailing "/" when resolving a
relative URL "../" off of it.  Stripping "/." at
the end does *not* take you one level up, even though
stripping "/" does!

And then describe the solution/bugfix, listing A-C (or A-E) and
telling that these will be treated the same way.



Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

2016-10-18 Thread Junio C Hamano
Santiago Torres  writes:

>> * st/verify-tag (2016-10-10) 7 commits
>>  - t/t7004-tag: Add --format specifier tests
>>  - t/t7030-verify-tag: Add --format specifier tests
>>  - builtin/tag: add --format argument for tag -v
>>  - builtin/verify-tag: add --format to verify-tag
>>  - tag: add format specifier to gpg_verify_tag
>>  - ref-filter: add function to print single ref_array_item
>>  - gpg-interface, tag: add GPG_VERIFY_QUIET flag
>> 
>>  "git tag" and "git verify-tag" learned to put GPG verification
>>  status in their "--format=" output format.
>> 
>>  Is this ready for 'next'?
>
> Hi, I saw this on the previous "what's cooking." Is there anything I
> need to do on my side to make sure this is ready for next?

Posting this exact message to the list would be an excellent way
;-).

Hopefully some competent reviewer comes and points me at a thread
where s/he says the series was already reviewed and in good shape
soonish, and your message may be a good trigger to make it happen.




Re: [PATCH v3 16/25] sequencer: support amending commits

2016-10-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Mon, 17 Oct 2016, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > This teaches the run_git_commit() function to take an argument that will
>> > allow us to implement "todo" commands that need to amend the commit
>> > messages ("fixup", "squash" and "reword").
>> 
>> Likewise to 15/25, i.e. Good, though the growth by these two steps
>> starts to make me wonder if these three options should be crammed
>> into an unsigned "flags" bitword.
>
> After looking at the diff with the added complications of ORing and ANDing
> the flags, I'd much rather prefer to stay with the three flags being kept
> separately. It's not like we need to save bits, but we need to preserve
> readability as much as possible, I'd wager.

That's OK.  I just wanted to make sure pros-and-cons have been
already considered.

The primary merit of using flags bitword is not to save bits; it is
done to limit the damage to the codebase when we need to add yet
another knob, by the way.


Re: [PATCH v3 14/25] sequencer: introduce a helper to read files written by scripts

2016-10-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> In the meantime, I'd be happy to just add a comment that this function is
> intended for oneliners, but that it will also read multi-line files and
> only strip off the EOL marker from the last line.
>
> Would that work for you?

That would be ideal, I would think.


Re: [PATCH v3 4/6] trailer: make args have their own struct

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

>> @@ -29,6 +29,12 @@ struct trailer_item {
>> struct list_head list;
>> char *token;
>> char *value;
>> +};
>> +
>> +struct arg_item {
>> +   struct list_head list;
>> +   char *token;
>> +   char *value;
>> struct conf_info conf;
>>  };
>
> (Unrelated side note:) When first seeing this diff, I assumed the diff
> heuristic is going wild, because it doesn't add a full struct.
> But on a second closer look, I realize this is the only correct diff,
> because we do not account for moved lines from one struct to the
> other.

It probably is not "the only" correct diff, as you actually could
shift it the other way by one to three lines.  I am not sure which
one among four possible diff is the easiest to grok, though.  Both
the above (picking the highest possible position) and the below (the
other extreme) are probably easier to read than anything in between.

 struct trailer_item {
+   struct list_head list;
+   char *token;
+   char *value;
+};
+
+struct arg_item {
struct list_head list;
char *token;
char *value;
struct conf_info conf;
 };
 


Re: [PATCH v3 5/6] trailer: allow non-trailers in trailer block

2016-10-18 Thread Junio C Hamano
Jonathan Tan  writes:

>> * rs/c-auto-resets-attributes:
>>   pretty: avoid adding reset for %C(auto) if output is empty
>>
>> And neither of the two colon containing line remotely resembles how
>> a typical RFC-822 header is formatted.  So that may serve as a hint
>> to how we can tighten it without introducing false negative.
>
> The only "offending" character is the space (according to RFC 822),
> but that sounds like a good rule to have.

I suspect that we should be willing to deviate from the letter of
RFC to reject misidentification.  I saw things like

Thanks to: Jonathan Tan 
Signed-off-by: A U Thor 

in the wild (notice the SP between Thanks and to), for example.
Rejecting leading whitespace as a line that does *not* start the
header (hence its colon does not count) may be a good compromise.

> I think that "Signed-off-by:" is not guaranteed to be
> present.

But do we really care about that case where there is no S-o-b:?  I
personally do not think so.

> Defining a trailer line as "a line starting with a token,
> then optional whitespace, then separator", maybe the following rule:
> - at least one trailer line generated by Git ("(cherry picked by" or
> "Signed-off-by") or configured in the "trailer" section in gitconfig
> OR
> - at least 3/4 logical trailer lines (I'm wondering if this should be
> 100% trailer lines)

I'd strongly suggest turning that OR to AND.  We will not safely be
able to write a commit log message that describes how S-o-b lines
are handled in its last paragraph otherwise.

I do not care too deeply about 3/4, but I meant to allow 75% cruft
but no more than that, and the fact that the threashold is set at
way more than 50% is important.  IOW, if you have

Ordinary log message here...

S-o-b: A U Thor 
[a short description that is typically a single liner
in the real world use pattern we saw in the world, but
could overflow to become multi line cruft]
S-o-b: R E Layer 

"last paragraph" is 5 lines long, among which 60% are cruft that is
below the 75% threshold, and "am -s" can still add the S-o-b of the
committer at the end of that existing last paragraph.  Making it too
strict would raise the false negative ratio.


Re: git checkout crashes after server being updated to Debian X86_64

2016-10-18 Thread René Scharfe

Am 18.10.2016 um 17:17 schrieb Raffael Reichelt:

Hello!

I have a serious problem with git, After my provider had updated to a
X86_64 architecture git crashes with various memory-related errors.
This is happening remote when pushing to the repository from my local
machine as well as trying it on a shell on the server itself.

This are the error-messages:

fatal: Out of memory, realloc failed
fatal: recursion detected in die handler
fatal: recursion detected in die handler

or
fatal: unable to create threaded lstat
fatal: recursion detected in die handler
or
fatal: unable to create threaded lstat
*** Error in `git': double free or corruption (fasttop): 0x00a8ade0 ***
fatal: recursion detected in die handler
Aborted

It’s obviously not a problem of the repository - happens with all of
them. I think it is also not a question of size - happens with a 80M
Repository as well as with a 500M one.

Any way: did a

git fsck
Prüfe Objekt-Verzeichnisse: 100% (256/256), Fertig.
Prüfe Objekte: 100% (56305/56305), Fertig.

git gc --auto --prune=today —aggressive
git repack

Additionally I played around some config parameters  so my config now looks 
like:
[http]
postbuffer = 524288000
[pack]
threads = 1
deltaCacheSize = 128m
packSizeLimit = 128m
windowMemory = 128m
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
repositoryformatversion = 0
filemode = true
bare = true

I am running
git version 2.1.4

on
Linux infongp-de65 3.14.0-ui16196-uiabi1-infong-amd64 #1 SMP Debian 
3.14.73-2~ui80+4 (2016-07-13) x86_64 GNU/Linux

Anyone out there to help me getting out of this trouble?


Git 2.1.4 is the version that comes with Debian stable according to 
https://packages.debian.org/jessie/git, so I guess using a more recent 
version is not a reasonable option.


What do "file $(which git)" and "ulimit -a" return?  Do you have an 
x86-64 binary and no unnecessarily low limits set?


René



Re: git checkout crashes after server being updated to Debian X86_64

2016-10-18 Thread Raffael Reichelt
Hello Renè!
file is returning 

/usr/bin/git: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically 
linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,  
BuildID[sha1]=ee62e538d6fe6673d3ba49f0e66bfec784cc32bc, stripped

and ulimit is:
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 1
file size   (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 512
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) 1800
max user processes  (-u) 42
virtual memory  (kbytes, -v) 786432
file locks  (-x) unlimited

Support told me git is limited to 600M

Regrads,
Rafael

> Am 18.10.2016 um 18:42 schrieb René Scharfe :
> 
> Am 18.10.2016 um 17:17 schrieb Raffael Reichelt:
>> Hello!
>> 
>> I have a serious problem with git, After my provider had updated to a
>> X86_64 architecture git crashes with various memory-related errors.
>> This is happening remote when pushing to the repository from my local
>> machine as well as trying it on a shell on the server itself.
>> 
>> This are the error-messages:
>> 
>> fatal: Out of memory, realloc failed
>> fatal: recursion detected in die handler
>> fatal: recursion detected in die handler
>> 
>> or
>> fatal: unable to create threaded lstat
>> fatal: recursion detected in die handler
>> or
>> fatal: unable to create threaded lstat
>> *** Error in `git': double free or corruption (fasttop): 0x00a8ade0 
>> ***
>> fatal: recursion detected in die handler
>> Aborted
>> 
>> It’s obviously not a problem of the repository - happens with all of
>> them. I think it is also not a question of size - happens with a 80M
>> Repository as well as with a 500M one.
>> 
>> Any way: did a
>> 
>> git fsck
>> Prüfe Objekt-Verzeichnisse: 100% (256/256), Fertig.
>> Prüfe Objekte: 100% (56305/56305), Fertig.
>> 
>> git gc --auto --prune=today —aggressive
>> git repack
>> 
>> Additionally I played around some config parameters  so my config now looks 
>> like:
>> [http]
>>postbuffer = 524288000
>> [pack]
>>threads = 1
>>deltaCacheSize = 128m
>>packSizeLimit = 128m
>>windowMemory = 128m
>> [core]
>>packedGitLimit = 128m
>>packedGitWindowSize = 128m
>>repositoryformatversion = 0
>>filemode = true
>>bare = true
>> 
>> I am running
>> git version 2.1.4
>> 
>> on
>> Linux infongp-de65 3.14.0-ui16196-uiabi1-infong-amd64 #1 SMP Debian 
>> 3.14.73-2~ui80+4 (2016-07-13) x86_64 GNU/Linux
>> 
>> Anyone out there to help me getting out of this trouble?
> 
> Git 2.1.4 is the version that comes with Debian stable according to 
> https://packages.debian.org/jessie/git, so I guess using a more recent 
> version is not a reasonable option.
> 
> What do "file $(which git)" and "ulimit -a" return?  Do you have an x86-64 
> binary and no unnecessarily low limits set?
> 
> René



Re: Git log exclude/remotes/branches options not working as expected

2016-10-18 Thread Junio C Hamano
Robert Dailey  writes:

> I have 3 remotes registered in my clone:
>
> origin, fork, drive
>
> When I do:
>
> $ git log --oneline --decorate --graph
>
> I only want to see branches under:
>
> refs/heads/*
> refs/remotes/origin/*
>
> I tried the following:
>
> $ git log --oneline --decorate --graph --simplify-by-decoration
> --remote=origin topic1..master

I am guessing that the above is not what you actually typed and
s/remote/remotes/ is what you did.

According to "git log --help":

--remotes[=]
Pretend as if all the refs in refs/remotes are listed on the
command line as . If  is given, limit
remote-tracking branches to ones matching given shell glob. If
pattern lacks ?, *, or [, /* at the end is implied.

So your command line is equivalent to

$ git log --oneline --decorate --graph --simplify-by-decoration \
  refs/remotes/origin/master refs/remotes/origin/topic ... \
  topic1..master

(replace the second line with all the remote-tracking branches you
have for "origin").

There is nothing that tells "--decorate" which refs to base its
decoration on, so it is reasonable to expect that a commit that
happens to match a tip of remote-tracking branches from other
remotes are decorated as such, and I think that is what you are
seeing.

> What I'm expecting is that I literally see NONE of those excluded refs
> ... How can I achieve my goals?

The decorations are added in log-tree.c::load_ref_decorations() and
it does not get anything that says "I want to see decorations based
on only these refs", so a short answer is that there is no canned
way to do this in today's codebase.

Having said that, I would say it is a reasonable new feature to
have.  Another related wishlist item might say "decorate only based
on tags, not branches or remote-tracking refs".

You can achieve your goals by teaching that codepath to take such
new pieces of information, come up with a new command line option
[*1*] and add a code to parse your command line option to either
builtin/log.c or revisions.c and pass it down the callchain that
leads to load_ref_decorations().


[Footnote]

*1* Unfortunately "--decorate=" is already taken to specify
the decoration levels, so you would need a different one.


Re: [PATCH] daemon, path.c: fix a bug with ~ in repo paths

2016-10-18 Thread Junio C Hamano
Luke Shumaker  writes:

> The superficial aspect of this change is that git-daemon now allows paths
> that start with a "~".  Previously, if git-daemon was run with
> "--base-path=/srv/git", it was impossible to get it to serve
> "/srv/git/~foo/bar.git".

I am not sure I understand what you are saying here.  Do you mean

I have a path on my server /srv/git/~foo/bar.git; the tilde does
not mean anything special--it is just a byte in a valid pathname.

I want to allow my users to say

git fetch git://my.server/~foo/bar.git

and fetch from that repository, but "git daemon" lacks the way
to configure to allow it.

If that is the case, what happens instead?  Due to the leading
"~foo/" getting noticed as an attempt to use the user-path expansion
it is not treated as just a literal character?

I am not sure if it is even a bug.  As you can easily lose that
tilde that appears in front of subdirectory of /srv/git/ or replace
it with something else (e.g. "u/"), this smells like "Don't do it if
it hurts" thing to me.



Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> +static void strip_url_ending(char *url, size_t *_len)
>> +{
>> +int check_url_stripping = 1;
>> +size_t len = _len ? *_len : strlen(url);
>> +
>> +while (check_url_stripping) {
>> +check_url_stripping = 0;
>> +if (is_dir_sep(url[len-2]) && url[len-1] == '.') {
>
> This is "strip /. at the end" it seems.
>
> Does anything in the loop control guarantees 2 <= len at this point?
>
>> +url[len-2] = '\0';
>> +len -= 2;
>> +check_url_stripping = 1;
>> +}
>> +
>> +if (is_dir_sep(url[len-1])) {
>
> This is "strip / at the end" it seems.
>
> Does anything in the loop control guarantees 1 <= len at this point?
>
>> +url[len-1] = '\0';
>> +len--;
>> +check_url_stripping = 1;
>> +}
>> +}

I also somehow find the "check-url-stripping" variable ugly.

while (URL still has something that could be stripped) {
if (ends with "/.") {
strip "/.";
continue;
}
if (ends with "/") {
strip "/";
continue;
}
break;
}

perhaps?


Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Stefan Beller
On Tue, Oct 18, 2016 at 10:15 AM, Junio C Hamano  wrote:
>
> I also somehow find the "check-url-stripping" variable ugly.
>
> while (URL still has something that could be stripped) {

for(;;) {

here ? (this code would not need a variable, and
for wins over while:
$ git grep "while (1)" |wc -l
107
$ git grep "for (;;)" |wc -l
128
)

> if (ends with "/.") {
> strip "/.";
> continue;
> }
> if (ends with "/") {
> strip "/";
> continue;
> }
> break;
> }
>
> perhaps?


Re: Merge conflicts in .gitattributes can cause trouble

2016-10-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> To the contrary. As far as I can see, when calling `git merge`, Git
> currently *does* read .gitattributes from the file, and if that fails,
> falls back to reading that file from the index.

Hmph.

Assuming that the merge always goes in the index order, I think you
are right.  When we need to merge path/to/dir/.gitattributes, we
would need to know all the .gitattributes files that may affect that
path, so .gitattributes, path/.gitattributes, path/to/.gitattributes
and the file being merged are all read into core before anything
happens and these are kept in attr_stack while merging anything
underneath path/to/dir/ hierarchy without being re-read from the
filesystem.  The original contents of path/to/dir/.gitattributes
cached in the attr_stack will be discarded when we start to merge
things outside path/to/dir (e.g. merging path/to/another/file), but
at that point the contents of path/to/dir/.gitattributes no longer
matters to the path being merged, so unless the merge somehow jumps
around it is OK.

It may be a fragile assumption in the longer term that the merge
always goes in the index order, but I think the assumption holds in
the current codebase, and the update planned immediately in the
future.



Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> for(;;) {
>
> here ? (this code would not need a variable, and
> for wins over while:
> $ git grep "while (1)" |wc -l
> 107
> $ git grep "for (;;)" |wc -l
> 128
> )

I dunno; the numbers tells me there is no strong preference by wide
margin either way.

I am not sure if the end shape does not really need an exit
condition.  If there is a need for one, "for (; condition;)" would
look strange.  If there isn't, "for (;;)" is actually my personal
preference over "while (1)".



[PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Stefan Beller
The remote URL for the submodule can be specified relative
to the URL of the superproject in .gitmodules.  A top-level
git://site.xz/toplevel.git can specify in its .gitmodules

[submodule "sub"]
url = ../submodule.git
path = sub

to say that git://site.xz/submodule.git is where the
submodule bound at its "sub/" is found.

However, when the toplevel is cloned like this:

git clone git://site.xz/toplevel.git/. top

i.e. the toplevel specifies its URL with trailing "/.", the
code set the URL to git://site.xz/toplevel.git/submodule.git
for the submodule, which is nonsense.  This was because the
logic failed to treat trailing "/." any differently from
trailing "/" when resolving a
relative URL "../" off of it.  Stripping "/." at
the end does *not* take you one level up, even though
stripping "/" does!

Some users may rely on this by always cloning with '/.' and having
an additional '../' in the relative path for the submodule, and this
patch breaks them. So why introduce this patch?

The fix in c12922024 (submodule: ignore trailing slash on superproject
URL, 2016-10-10) and prior discussion revealed, that Git and Git
for Windows treat URLs differently, as currently Git for Windows
strips off a trailing dot from paths when calling a Git binary
unlike when running a shell. Which means Git for Windows is already
doing the right thing for the case mentioned above, but it would fail
our current tests, that test for the broken behavior and it would
confuse users working across platforms. So we'd rather fix it
in Git to ignore any of these trailing no ops in the path properly.

We never produce the URLs with a trailing '/.' in Git ourselves,
they come to us, because the user used it as the URL for cloning
a superproject. Normalize these paths.

Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---

 * reworded the commit message, taken from Junio, but added more explanation
   why we want to introduce this patch. 
 * added the length check
 * use an infinite loop with break instead of a variable
   to determine the ending condition.

 builtin/submodule--helper.c | 48 +
 t/t0060-path-utils.sh   | 11 +++
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 260f46f..ac03cb3 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -76,6 +76,29 @@ static int chop_last_dir(char **remoteurl, int is_relative)
return 0;
 }
 
+static void strip_url_ending(char *url, size_t *_len)
+{
+   size_t len = _len ? *_len : strlen(url);
+
+   for (;;) {
+   if (len > 1 && is_dir_sep(url[len-2]) && url[len-1] == '.') {
+   url[len-2] = '\0';
+   len -= 2;
+   continue;
+   }
+   if (len > 0 && is_dir_sep(url[len-1])) {
+   url[len-1] = '\0';
+   len --;
+   continue;
+   }
+
+   break;
+   }
+
+   if (_len)
+   *_len = len;
+}
+
 /*
  * The `url` argument is the URL that navigates to the submodule origin
  * repo. When relative, this URL is relative to the superproject origin
@@ -93,14 +116,16 @@ static int chop_last_dir(char **remoteurl, int is_relative)
  * the superproject working tree otherwise.
  *
  * NEEDSWORK: This works incorrectly on the domain and protocol part.
- * remote_url  url  outcome  expectation
- * http://a.com/b  ../c http://a.com/c   as is
- * http://a.com/b/ ../c http://a.com/c   same as previous line, but
- *   ignore trailing slash in 
url
- * http://a.com/b  ../../c  http://c error out
- * http://a.com/b  ../../../c   http:/c  error out
- * http://a.com/b  ../../../../chttp:c   error out
- * http://a.com/b  ../../../../../c.:c   error out
+ * remote_url   url  outcome  expectation
+ * http://a.com/b   ../c http://a.com/c   as is
+ * http://a.com/b/  ../c http://a.com/c   same as previous line, 
but
+ *ignore trailing '/' in 
url
+ * http://a.com/b/. ../c http://a.com/c   same as previous line, 
but
+ *ignore trailing '/.' in 
url
+ * http://a.com/b   ../../c  http://c error out
+ * http://a.com/b   ../../../c   http:/c  error out
+ * http://a.com/b   ../../../../chttp:c   error out
+ * http://a.com/b   ../../../../../c.:c   error out
  * NEEDSWORK: Given how chop_last_dir() works, this function is broken
  * when a local part has a colon in its path component, too.
  */
@@ -115,8 +140,7 @@ static char *relative_url(const char *remote_url,

Re: [PATCH] daemon, path.c: fix a bug with ~ in repo paths

2016-10-18 Thread Luke Shumaker
On Tue, 18 Oct 2016 13:08:45 -0400,
Junio C Hamano wrote:
> 
> Luke Shumaker  writes:
> 
> > The superficial aspect of this change is that git-daemon now allows paths
> > that start with a "~".  Previously, if git-daemon was run with
> > "--base-path=/srv/git", it was impossible to get it to serve
> > "/srv/git/~foo/bar.git".
> 
> I am not sure I understand what you are saying here.  Do you mean
> 
> I have a path on my server /srv/git/~foo/bar.git; the tilde does
> not mean anything special--it is just a byte in a valid pathname.
> 
> I want to allow my users to say
> 
>   git fetch git://my.server/~foo/bar.git
> 
> and fetch from that repository, but "git daemon" lacks the way
> to configure to allow it.

Yes, that is what I am saying.

> If that is the case, what happens instead?  Due to the leading
> "~foo/" getting noticed as an attempt to use the user-path expansion
> it is not treated as just a literal character?

What happens instead is

if (*dir == '~') {
if (!user_path) {
logerror("'%s': User-path not allowed", dir);
return NULL;
}

which to the user looks like

git clone git://my.server/~foo/bar.git
Cloning into 'bar'...
fatal: remote error: access denied or repository not exported: 
~foo/bar.git

> I am not sure if it is even a bug.  As you can easily lose that
> tilde that appears in front of subdirectory of /srv/git/ or replace
> it with something else (e.g. "u/"), this smells like "Don't do it if
> it hurts" thing to me.

I buy into "Don't do it if it hurts", but that doesn't mean it's not a
bug on an uncommon edge-case.  Note that it doesn't hurt with
git-shell or cgit (I haven't checked with gitweb).

Many programs (especially shell scripts) fail to deal with filenames
containing a space.  "Don't put spaces in filenames if it hurts".
It's still a bug in the program.

Similarly, `git gui` used to not be able to add a file in a directory
starting with '~' (when one clicked the file named "~foo/bar", it
said something along the lines of "/home/~foo/bar is outside
repository"), and one had to use `git add '~foo/bar` directly.
"Don't do it if it hurts"; it was still a bug.

  Aside: one (somewhat silly) non-user reason that I've seen for a
  directory to start with '~' is that it sorts after all other ASCII
  characters; it moves the directory to the end of any lists.

-- 
Happy hacking,
~ Luke Shumaker


Integrating submodules with no side effects

2016-10-18 Thread Robert Dailey
Hello git experts,

I have in the past attempted to integrate submodules into my primary
repository using the same directory name. However, this has always
caused headache when going to and from branches that take you between
when this integration occurred and when it didn't. It's a bit hard to
explain. Basically, if I have a submodule "foo", and I delete that
submodule and physically add its files under the same directory "foo",
when I do a pull to get this change from another clone, it fails
saying:

error: The following untracked working tree files would be overwritten
by checkout:
foo/somefile.txt
Please move or remove them before you switch branches.
Aborting
could not detach HEAD


Obviously, git can't delete the submodule because the files have also
been added directly. I don't think it is built to handle this
scenario. Here is the series of commands I ran to "integrate" the
submodule (replace the submodule with a directory containing the exact
contents of the submodule itself):

#!/usr/bin/env bash
mv "$1" "${1}_"
git submodule deinit "$1"
git rm "$1"
mv "${1}_" "$1"
git add "$1/**"

The above script is named git-integrate-submodule, I run it like so:

$ git integrate-submodule foo

Then I do:

$ git commit -m 'Integrated foo submodule'

Is there any way to make this work nicely? The only solution I've
found is to obviously rename the directory before adding the physical
files, for example name it foo1. Because they're different, they never
"clash".


[PATCH] sha1_name: remove ONELINE_SEEN bit

2016-10-18 Thread Junio C Hamano
28a4d94044 ("object name: introduce ':/' notation",
2007-02-24) started using its own bit from object->flags to mark
commits used while parsing the ":/token" extended SHA-1 syntax to
name a commit temporarily, and this was kept even when f7bff00314
("sha1_name.c: fix parsing of ":/token" syntax", 2010-08-02) found
and fixed a bug in its implementation.

The use of that flag bit, however, is limited to a single function,
get_sha1_oneline(), which first sets it for the commits sitting at
the tips of refs, uses the bit to avoid duplicate traversal while
walking the history, and then cleans the bit from all commits it
walked.

Which is exactly what the general-purpose TMP_MARK bit meant to be
used for isolated case was invented for.  Replace ONELINE_SEEN with
TMP_MARK and retire the former.

Signed-off-by: Junio C Hamano 
---
 object.h|  1 -
 sha1_name.c | 10 --
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/object.h b/object.h
index f8b644263f..f8e218eccd 100644
--- a/object.h
+++ b/object.h
@@ -37,7 +37,6 @@ struct object_array {
  * bundle.c:   16
  * http-push.c:16-19
  * commit.c:   16-19
- * sha1_name.c: 20
  */
 #define FLAG_BITS  27
 
diff --git a/sha1_name.c b/sha1_name.c
index ca7ddd6f2c..fa0e6701a3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "remote.h"
 #include "dir.h"
+#include "revision.h"
 
 static int get_sha1_oneline(const char *, unsigned char *, struct commit_list 
*);
 
@@ -855,9 +856,6 @@ static int get_sha1_1(const char *name, int len, unsigned 
char *sha1, unsigned l
  * For future extension, all other sequences beginning with ':/!' are reserved.
  */
 
-/* Remember to update object flag allocation in object.h */
-#define ONELINE_SEEN (1u<<20)
-
 static int handle_one_ref(const char *path, const struct object_id *oid,
  int flag, void *cb_data)
 {
@@ -899,7 +897,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
return -1;
 
for (l = list; l; l = l->next) {
-   l->item->object.flags |= ONELINE_SEEN;
+   l->item->object.flags |= TMP_MARK;
commit_list_insert(l->item, &backup);
}
while (list) {
@@ -907,7 +905,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
struct commit *commit;
int matches;
 
-   commit = pop_most_recent_commit(&list, ONELINE_SEEN);
+   commit = pop_most_recent_commit(&list, TMP_MARK);
if (!parse_object(commit->object.oid.hash))
continue;
buf = get_commit_buffer(commit, NULL);
@@ -924,7 +922,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
regfree(®ex);
free_commit_list(list);
for (l = backup; l; l = l->next)
-   clear_commit_marks(l->item, ONELINE_SEEN);
+   clear_commit_marks(l->item, TMP_MARK);
free_commit_list(backup);
return found ? 0 : -1;
 }


Re: [PATCH] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> Some users may rely on this by always cloning with '/.' and having
> an additional '../' in the relative path for the submodule, and this
> patch breaks them. So why introduce this patch?
>
> The fix in c12922024 (submodule: ignore trailing slash on superproject
> URL, 2016-10-10) and prior discussion revealed, that Git and Git
> for Windows treat URLs differently, as currently Git for Windows
> strips off a trailing dot from paths when calling a Git binary
> unlike when running a shell. Which means Git for Windows is already
> doing the right thing for the case mentioned above, but it would fail
> our current tests, that test for the broken behavior and it would
> confuse users working across platforms. So we'd rather fix it
> in Git to ignore any of these trailing no ops in the path properly.
>
> We never produce the URLs with a trailing '/.' in Git ourselves,
> they come to us, because the user used it as the URL for cloning
> a superproject. Normalize these paths.
>
> Helped-by: Junio C Hamano 
> Signed-off-by: Stefan Beller 
> ---
>
>  * reworded the commit message, taken from Junio, but added more explanation
>why we want to introduce this patch. 

The additional explanation is very good.

> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 260f46f..ac03cb3 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -76,6 +76,29 @@ static int chop_last_dir(char **remoteurl, int is_relative)
>   return 0;
>  }
>  
> +static void strip_url_ending(char *url, size_t *_len)
> +{
> + size_t len = _len ? *_len : strlen(url);

Stare at our codebase and you'd notice that we avoid using names
with leading underscore deliberately and use trailing one instead
when we name a throw-away name like this.  Let's do the same here.
I.e.

static void strip_url_ending(char *url, size_t *len_)
{
size_t len = len_ ? *len_ : strlen(url);

> + for (;;) {
> + if (len > 1 && is_dir_sep(url[len-2]) && url[len-1] == '.') {
> + url[len-2] = '\0';

"len-1" and "len-2" are usually spelled with SP on both sides of
binary operators.

> + len -= 2;
> + continue;
> + }
> + if (len > 0 && is_dir_sep(url[len-1])) {
> + url[len-1] = '\0';
> + len --;

And post-decrement sticks to whatever it is decrementing without SP.

> + continue;
> + }


Re: [PATCH 1/2] submodule: ignore trailing slash on superproject URL

2016-10-18 Thread Johannes Sixt
Am 17.10.2016 um 21:32 schrieb Johannes Sixt:
> I think that we could reduce the confusion by converting all $PWD to
> $(pwd) in these test cases. I don't remember why I suggested to use $PWD
> for one of the arguments of the test cases (the second must be $(pwd)),
> but the most likely reason is only that we save a process.

Something like this works for me on Windows. I can update the patch
after the "normalize funny urls" topic settles.

 8< 
t0060: sidestep surprising path mangling results on Windows

When an MSYS program (such as the bash that drives the test suite)
invokes git on Windows, absolute Unix style paths are transformed into
Windows native absolute paths (drive letter form). However, this
transformation also includes some simplifications that are not just
straight-forward textual substitutions:

- When the path ends in "/.", then the dot is stripped, but not the
  directory separator.

- When the path contains "..", then it is optimized away if possible,
  e.g., "/c/dir/foo/../bar" becomes "c:/dir/bar".

These additional transformations violate the assumptions of some
submodule path tests. We can avoid them when the input is already a
Windows native path, because then MSYS leaves the path unmolested.

Convert the uses of $PWD to $(pwd); the latter returns a native Windows
path.

Signed-off-by: Johannes Sixt 
---
 t/t0060-path-utils.sh | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 25b48e5..444b5a4 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -305,8 +305,9 @@ test_git_path GIT_COMMON_DIR=bar config   
bar/config
 test_git_path GIT_COMMON_DIR=bar packed-refs  bar/packed-refs
 test_git_path GIT_COMMON_DIR=bar shallow  bar/shallow
 
-# In the tests below, the distinction between $PWD and $(pwd) is important:
-# on Windows, $PWD is POSIX style (/c/foo), $(pwd) has drive letter (c:/foo).
+# In the tests below, $(pwd) must be used because it is a native path on
+# Windows and avoids MSYS's path mangling (which simplifies "foo/../bar" and
+# strips the dot from trailing "/.").
 
 test_submodule_relative_url "../" "../foo" "../submodule" "../../submodule"
 test_submodule_relative_url "../" "../foo/bar" "../submodule" 
"../../foo/submodule"
@@ -314,7 +315,7 @@ test_submodule_relative_url "../" "../foo/submodule" 
"../submodule" "../../foo/s
 test_submodule_relative_url "../" "./foo" "../submodule" "../submodule"
 test_submodule_relative_url "../" "./foo/bar" "../submodule" "../foo/submodule"
 test_submodule_relative_url "../../../" "../foo/bar" "../sub/a/b/c" 
"../../../../foo/sub/a/b/c"
-test_submodule_relative_url "../" "$PWD/addtest" "../repo" "$(pwd)/repo"
+test_submodule_relative_url "../" "$(pwd)/addtest" "../repo" "$(pwd)/repo"
 test_submodule_relative_url "../" "foo/bar" "../submodule" "../foo/submodule"
 test_submodule_relative_url "../" "foo" "../submodule" "../submodule"
 
@@ -327,16 +328,16 @@ test_submodule_relative_url "(null)" "../foo" 
"../submodule" "../submodule"
 test_submodule_relative_url "(null)" "./foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "./foo" "../submodule" "submodule"
 test_submodule_relative_url "(null)" "//somewhere else/repo" "../subrepo" 
"//somewhere else/subrepo"
-test_submodule_relative_url "(null)" "$PWD/subsuper_update_r" 
"../subsubsuper_update_r" "$(pwd)/subsubsuper_update_r"
-test_submodule_relative_url "(null)" "$PWD/super_update_r2" 
"../subsuper_update_r" "$(pwd)/subsuper_update_r"
-test_submodule_relative_url "(null)" "$PWD/." "../." "$(pwd)/."
-test_submodule_relative_url "(null)" "$PWD" "./." "$(pwd)/."
-test_submodule_relative_url "(null)" "$PWD/addtest" "../repo" "$(pwd)/repo"
-test_submodule_relative_url "(null)" "$PWD" "./å äö" "$(pwd)/å äö"
-test_submodule_relative_url "(null)" "$PWD/." "../submodule" "$(pwd)/submodule"
-test_submodule_relative_url "(null)" "$PWD/submodule" "../submodule" 
"$(pwd)/submodule"
-test_submodule_relative_url "(null)" "$PWD/home2/../remote" "../bundle1" 
"$(pwd)/home2/../bundle1"
-test_submodule_relative_url "(null)" "$PWD/submodule_update_repo" "./." 
"$(pwd)/submodule_update_repo/."
+test_submodule_relative_url "(null)" "$(pwd)/subsuper_update_r" 
"../subsubsuper_update_r" "$(pwd)/subsubsuper_update_r"
+test_submodule_relative_url "(null)" "$(pwd)/super_update_r2" 
"../subsuper_update_r" "$(pwd)/subsuper_update_r"
+test_submodule_relative_url "(null)" "$(pwd)/." "../." "$(pwd)/."
+test_submodule_relative_url "(null)" "$(pwd)" "./." "$(pwd)/."
+test_submodule_relative_url "(null)" "$(pwd)/addtest" "../repo" "$(pwd)/repo"
+test_submodule_relative_url "(null)" "$(pwd)" "./å äö" "$(pwd)/å äö"
+test_submodule_relative_url "(null)" "$(pwd)/." "../submodule" 
"$(pwd)/submodule"
+test_submodule_relative_url "(null)" "$(pwd)/submodule" "../submodule" 
"$(pwd)/submodule"
+test_submodule_relative_url "(null)" "$(pwd)/home2

Re: What's cooking in git.git (Oct 2016, #04; Mon, 17)

2016-10-18 Thread Lars Schneider

> On 17 Oct 2016, at 15:28, Junio C Hamano  wrote:
> ...
> 
> * ls/filter-process (2016-10-17) 14 commits
> - contrib/long-running-filter: add long running filter example
> - convert: add filter..process option
> - convert: prepare filter..process option
> - convert: make apply_filter() adhere to standard Git error handling
> - pkt-line: add functions to read/write flush terminated packet streams
> - pkt-line: add packet_write_gently()
> - pkt-line: add packet_flush_gently()
> - pkt-line: add packet_write_fmt_gently()
> - pkt-line: extract set_packet_header()
> - pkt-line: rename packet_write() to packet_write_fmt()
> - run-command: add clean_on_exit_handler
> - run-command: move check_pipe() from write_or_die to run_command
> - convert: modernize tests
> - convert: quote filter names in error messages
> 
> The smudge/clean filter API expect an external process is spawned
> to filter the contents for each path that has a filter defined.  A
> new type of "process" filter API has been added to allow the first
> request to run the filter for a path to spawn a single process, and
> all filtering need is served by this single process for multiple
> paths, reducing the process creation overhead.

Hi Junio,

what do you think about v11? Do you feel the series is becoming mature
enough for `next`?

Thanks,
Lars


Re: What's cooking in git.git (Oct 2016, #04; Mon, 17)

2016-10-18 Thread Junio C Hamano
Lars Schneider  writes:

>> On 17 Oct 2016, at 15:28, Junio C Hamano  wrote:
>> ...
>> 
>> * ls/filter-process (2016-10-17) 14 commits
>> - contrib/long-running-filter: add long running filter example
>> - convert: add filter..process option
>> - convert: prepare filter..process option
>> - convert: make apply_filter() adhere to standard Git error handling
>> - pkt-line: add functions to read/write flush terminated packet streams
>> - pkt-line: add packet_write_gently()
>> - pkt-line: add packet_flush_gently()
>> - pkt-line: add packet_write_fmt_gently()
>> - pkt-line: extract set_packet_header()
>> - pkt-line: rename packet_write() to packet_write_fmt()
>> - run-command: add clean_on_exit_handler
>> - run-command: move check_pipe() from write_or_die to run_command
>> - convert: modernize tests
>> - convert: quote filter names in error messages
>> 
>> The smudge/clean filter API expect an external process is spawned
>> to filter the contents for each path that has a filter defined.  A
>> new type of "process" filter API has been added to allow the first
>> request to run the filter for a path to spawn a single process, and
>> all filtering need is served by this single process for multiple
>> paths, reducing the process creation overhead.
>
> Hi Junio,
>
> what do you think about v11? Do you feel the series is becoming mature
> enough for `next`?

I've already had that feeling a few rounds ago, but I haven't had a
chance to read the most recent one carefully myself to answer that
question honestly.



[PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Stefan Beller
The remote URL for the submodule can be specified relative
to the URL of the superproject in .gitmodules.  A top-level
git://site.xz/toplevel.git can specify in its .gitmodules

[submodule "sub"]
url = ../submodule.git
path = sub

to say that git://site.xz/submodule.git is where the
submodule bound at its "sub/" is found.

However, when the toplevel is cloned like this:

git clone git://site.xz/toplevel.git/. top

i.e. the toplevel specifies its URL with trailing "/.", the
code set the URL to git://site.xz/toplevel.git/submodule.git
for the submodule, which is nonsense.  This was because the
logic failed to treat trailing "/." any differently from
trailing "/" when resolving a
relative URL "../" off of it.  Stripping "/." at
the end does *not* take you one level up, even though
stripping "/" does!

Some users may rely on this by always cloning with '/.' and having
an additional '../' in the relative path for the submodule, and this
patch breaks them. So why introduce this patch?

The fix in c12922024 (submodule: ignore trailing slash on superproject
URL, 2016-10-10) and prior discussion revealed, that Git and Git
for Windows treat URLs differently, as currently Git for Windows
strips off a trailing dot from paths when calling a Git binary
unlike when running a shell. Which means Git for Windows is already
doing the right thing for the case mentioned above, but it would fail
our current tests, that test for the broken behavior and it would
confuse users working across platforms. So we'd rather fix it
in Git to ignore any of these trailing no ops in the path properly.

We never produce the URLs with a trailing '/.' in Git ourselves,
they come to us, because the user used it as the URL for cloning
a superproject. Normalize these paths.

Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---
v3:
 * fixed the coding style.

v2:
 * reworded the commit message, taken from Junio, but added more explanation
   why we want to introduce this patch. 
 * added the length check
 * use an infinite loop with break instead of a variable
   to determine the ending condition.
   
 builtin/submodule--helper.c | 48 +
 t/t0060-path-utils.sh   | 11 +++
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 260f46f..4f11082 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -76,6 +76,29 @@ static int chop_last_dir(char **remoteurl, int is_relative)
return 0;
 }
 
+static void strip_url_ending(char *url, size_t *len_)
+{
+   size_t len = len_ ? *len_ : strlen(url);
+
+   for (;;) {
+   if (len > 1 && is_dir_sep(url[len - 2]) && url[len - 1] == '.') 
{
+   url[len - 2] = '\0';
+   len -= 2;
+   continue;
+   }
+   if (len > 0 && is_dir_sep(url[len - 1])) {
+   url[len - 1] = '\0';
+   len--;
+   continue;
+   }
+
+   break;
+   }
+
+   if (len_)
+   *len_ = len;
+}
+
 /*
  * The `url` argument is the URL that navigates to the submodule origin
  * repo. When relative, this URL is relative to the superproject origin
@@ -93,14 +116,16 @@ static int chop_last_dir(char **remoteurl, int is_relative)
  * the superproject working tree otherwise.
  *
  * NEEDSWORK: This works incorrectly on the domain and protocol part.
- * remote_url  url  outcome  expectation
- * http://a.com/b  ../c http://a.com/c   as is
- * http://a.com/b/ ../c http://a.com/c   same as previous line, but
- *   ignore trailing slash in 
url
- * http://a.com/b  ../../c  http://c error out
- * http://a.com/b  ../../../c   http:/c  error out
- * http://a.com/b  ../../../../chttp:c   error out
- * http://a.com/b  ../../../../../c.:c   error out
+ * remote_url   url  outcome  expectation
+ * http://a.com/b   ../c http://a.com/c   as is
+ * http://a.com/b/  ../c http://a.com/c   same as previous line, 
but
+ *ignore trailing '/' in 
url
+ * http://a.com/b/. ../c http://a.com/c   same as previous line, 
but
+ *ignore trailing '/.' in 
url
+ * http://a.com/b   ../../c  http://c error out
+ * http://a.com/b   ../../../c   http:/c  error out
+ * http://a.com/b   ../../../../chttp:c   error out
+ * http://a.com/b   ../../../../../c.:c   error out
  * NEEDSWORK: Given how chop_last_dir() works, this function is broken
  * when a local part has a colon in its path component, too.
  */
@@ -115,8 +140,7 @@ s

Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> The remote URL for the submodule can be specified relative
> ...
> v3:
>  * fixed the coding style.

Ah, thanks.  I had a squash queued on top but will replace with this
one.


Re: Integrating submodules with no side effects

2016-10-18 Thread Stefan Beller
On Tue, Oct 18, 2016 at 12:35 PM, Robert Dailey
 wrote:
> Hello git experts,
>
> I have in the past attempted to integrate submodules into my primary
> repository using the same directory name. However, this has always
> caused headache when going to and from branches that take you between
> when this integration occurred and when it didn't. It's a bit hard to
> explain. Basically, if I have a submodule "foo", and I delete that
> submodule and physically add its files under the same directory "foo",
> when I do a pull to get this change from another clone, it fails
> saying:
>
> error: The following untracked working tree files would be overwritten
> by checkout:
> foo/somefile.txt
> Please move or remove them before you switch branches.
> Aborting
> could not detach HEAD
>
>
> Obviously, git can't delete the submodule because the files have also
> been added directly. I don't think it is built to handle this
> scenario. Here is the series of commands I ran to "integrate" the
> submodule (replace the submodule with a directory containing the exact
> contents of the submodule itself):
>
> #!/usr/bin/env bash
> mv "$1" "${1}_"
> git submodule deinit "$1"

This removes the submodule entries from .git/config
(and it would remove the contents of that submodule, but they are moved)

> git rm "$1"

Removing the git link here.

So we still have the entries in the .gitmodules file there.
Maybe add:

name=$(git submodule-helper name $1)
git config -f .gitmodules --unset submodule.$name.*
git add .gitmodules

? (Could be optional)

> mv "${1}_" "$1"
> git add "$1/**"

Moving back into place and adding all files in there.

>
> The above script is named git-integrate-submodule, I run it like so:
>
> $ git integrate-submodule foo
>
> Then I do:
>
> $ git commit -m 'Integrated foo submodule'
>
> Is there any way to make this work nicely?

I think you can just remove the gitlink from the index and not from the working
tree ("git rm --cached $1")

> The only solution I've
> found is to obviously rename the directory before adding the physical
> files, for example name it foo1. Because they're different, they never
> "clash".

Also look at the difference between plumbing and porcelain commands[1],
as plumbing is more stable than the porcelain, so it will be easier to maintain
this script.

I think this would be an actually reasonable feature, which Git itself
could support via "git submodule [de]integrate", but then we'd also want
to see the reverse, i.e. take a sub directory and make it a submodule.

[1] e.g. https://www.kernel.org/pub/software/scm/git/docs/

Thanks,
Stefan


Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> The remote URL for the submodule can be specified relative
>> ...
>> v3:
>>  * fixed the coding style.
>
> Ah, thanks.  I had a squash queued on top but will replace with this
> one.

Heh, I guess I shouldn't have responded before seeing what this
breaks.  Applied on top of sb/submodule-ignore-trailing-slash, these
seem to break.

t/trash directory.t3600-rm
t/trash directory.t7403-submodule-sync
t/trash directory.t7406-submodule-update
t/trash directory.t7407-submodule-foreach
t/trash directory.t7506-status-submodule

Some may be showing broken assumptions of the downstream, two wrongs
compensating each other and correcting one exposing breakage of the
other.  I didn't look at them deeply.


Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Stefan Beller
On Tue, Oct 18, 2016 at 2:19 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Stefan Beller  writes:
>>
>>> The remote URL for the submodule can be specified relative
>>> ...
>>> v3:
>>>  * fixed the coding style.
>>
>> Ah, thanks.  I had a squash queued on top but will replace with this
>> one.
>
> Heh, I guess I shouldn't have responded before seeing what this
> breaks.  Applied on top of sb/submodule-ignore-trailing-slash, these
> seem to break.

Ugh. (I should have tested more than just t0060).

The underlying issue is two fold:

* in t3600 we'd need
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index d046d98..545d32f 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -616,7 +616,7 @@ test_expect_success 'setup subsubmodule' '
git submodule update &&
(cd submod &&
git update-index --add --cacheinfo 16 $(git
rev-parse HEAD) subsubmod &&
-   git config -f .gitmodules submodule.sub.url ../. &&
+   git config -f .gitmodules submodule.sub.url ./. &&
git config -f .gitmodules submodule.sub.path subsubmod &&
git submodule init &&
git add .gitmodules &&

because the sub-submodule URL is actually the same as the submodule
(because we'd test lazily)

This looks ok from a bug fixers perspective.

However in t7403, we have a construct like:

git clone . super

which then results in

git -C super remote -v
./git/t/trash directory.t7403-submodule-sync/. (fetch)

And the commit message of this patch claimed we'd never use
the /. syntax ourselves. (We could argue the stupid users in the test
suite are doing it wrong, because in practice nobody would use clone
to create a nested repository? Not sure I agree.)

However instead of fixing the levels of nesting, the fix is as easy as:
diff --git a/t/t7403-submodule-sync.sh b/t/t7403-submodule-sync.sh
index 0726799..525d32b 100755
--- a/t/t7403-submodule-sync.sh
+++ b/t/t7403-submodule-sync.sh
@@ -15,7 +15,9 @@ test_expect_success setup '
git add file &&
test_tick &&
git commit -m upstream &&
-   git clone . super &&
+   # avoid cloning a repository with a url ending in /.
+   git clone . root &&
+   git clone root super &&
git clone super submodule &&
(
cd submodule &&

Same goes for t740{6,7} as well as t7506.

I think this change to the test suite is not warranted, because
we want to have the current behavior as-is as it seems like a nice
hack:

* Maybe we'd want to think about checking for the URL in git clone
  normalize the URL before configuring remote.origin.URL

* an often observed work flow for submodule tests seems:

mkdir sub1 &&
git -C sub1 init  &&
...

git clone . super &&
git -C super submodule add ../sub1
... # the ../sub1 looks intuitively correct
# because from the current directory which is
# super the relative path is ../sub1
#
# However in reality this ought to be a relative URL,
# and as super sits in the same directory as sub1
# ./sub1 would be "correct" according to the documentation
# However as the super remote URL ends with /.
# we had a bug that we needed to add one layer of unnesting
# and that is how ../sub1 worked.


Not sure about this patch any more.

Stefan


Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-18 Thread Stefan Beller
On Fri, Oct 14, 2016 at 8:37 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> *1* Would we need a wrapping struct around the array of results?
>
> By the way, I do see a merit on the "check" side (tl;dr: but I do
> not think "result" needs it, hence I do not see the need for the
> "ugly" variants).

So we'd rather go with const char **result instead of our own new struct there.
Ok, got it.

>
> Take "archive" for example.  For each path, it wants to see the
> attribute "export-ignore" to decide if it is to be omitted.  In
> addition, the usual set of attributes used to smudge blobs into the
> working tree representation are inspected by the convert.c API as
> part of its implementation of convert_to_working_tree().  This
> program has at least two sets of <"check", "result"> that are used
> by two git_check_attr() callsites that are unaware of each other.
>
> One of the optimizations we discussed is to trim down the attr-stack
> (which caches the attributes read from .gitattributes files that are
> in effect for the "last" directory that has the path for which
> attrbiutes are queried for) by reading/keeping only the entries that
> affect the attributes the caller is interested in.  But when there
> are multiple callsites that are interested in different sets of
> attributes, we obviously cannot do such an optimization without
> taking too much cache-invalidation hit.  Because these callsites are
> not unaware of each other, I do not think we can say "keep the
> entries that affects the union of all active callsites" very easily,
> even if it were possible.
>
> But we could tie this cache to "check", which keeps a constant
> subset of attributes that the caller is interested in (i.e. each
> callsite would keep its own cache that is useful for its query).
> While we are single-threaded, "struct git_attr_check" being a
> wrapping struct around the array of "what attributes are of
> interest?" is a good place to add that per-check attr-stack cache.
> When we go multi-threaded, the attr-stack cache must become
> per-thread, and needs to be moved to per-thread storage, and such a
> per-thread storage would have multiple attr-stack, one per "check"
> instance (i.e. looking up the attr-stack may have to say "who/what
> thread am I?" to first go to the thread-local storage for the
> current thread, where a table of pointers to attr-stacks is kept and
> from there, index into that table to find the attr-stack that
> corresponds to the particular "check").  We could use the address of
> "check" as the key into this table, but "struct git_attr_check" that
> wraps the array gives us another option to allocate a small
> consecutive integer every time initl() creates a new "check" and use
> it as the index into that attr-stack table, as that integer index
> can be in the struct that wraps the array of wanted attributes.
>
> Note. none of the above is a suggestion to do the attr
> caching the way exactly described.  The above is primarily
> to illustrate how a wrapping struct may give us future
> flexibility without affecting a single line of code in the
> user of API.
>
> It may turn out that we do not need to have anything other than the
> array of wanted attributes in the "check" struct, but unlike
> "result", "check" is shared across threads, and do not have to live
> directly on the stack, so we can prepare for flexibility.
>
> I do not foresee a similar need for wrapping struct for "result",
> and given that we do want to keep the option of having them directly
> on the stack, I am inclined to say we shouldn't introduce one.
>
> If we were still to do the wrapping for result, I would say that
> basing it around the FLEX_ARRAY idiom, i.e.
>
>> struct git_attr_result {
>> int num_slots;
>> const char *value[FLEX_ARRAY];
>> };
>
> is a horrible idea.  It would be less horrible if it were
>
> struct git_attr_result {
> int num_slots;
> const char **value;
> };

So const char** but with an additional number of slots, all we do
would be to compare this number of slots to the checks number of slots and
die("BUG:..."),  which is just a burden and no help.

>
> then make the API user write via a convenience macro something like
> this
>
> const char *result_values[NUM_ATTRS_OF_INTEREST];
> struct git_attr_result result = {
> ARRAY_SIZE(result_values), &result_values
> };
>
> instead.  That way, at least the side that implements git_check_attr()
> would not have to be type-unsafe like the example of ugliness in the
> message I am following-up on.

Ok I will reroll with the const char** instead of the macro stuff that
I came up with,
(that would be type safe though uglier than the pure variant).


Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-18 Thread Junio C Hamano
On Tue, Oct 18, 2016 at 4:52 PM, Stefan Beller  wrote:
>
> >
> > By the way, I do see a merit on the "check" side (tl;dr: but I do
> > not think "result" needs it, hence I do not see the need for the
> > "ugly" variants).
>
> So we'd rather go with const char **result instead of our own new struct 
> there.
> Ok, got it.

I do not think you got it. I am talking about wrapping struct around
an array of element,
not each element in the array. IOW

> > If we were still to do the wrapping for result, I would say that
> > basing it around the FLEX_ARRAY idiom, i.e.
> >
> >> struct git_attr_result {
> >> int num_slots;
> >> const char *value[FLEX_ARRAY];
> >> };

the structure around the array of elements (value) that allows us to have
something other than value[] in it. That is what I said "I do not see
the need for".

It is perfectly fine future-proofing to have

struct git_attr_result_value {
  const char *value;
};

and have the users of API declare

struct git_attr_result value[5];

or whatever. That way we could fatten the structure later if we wanted
to without having to update the users of API, and there is no downside.

Having wrapping strut around the array does have a huge downside,
and that is what I said I see no need for.


Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-18 Thread Stefan Beller
On Tue, Oct 18, 2016 at 5:06 PM, Junio C Hamano  wrote:
> On Tue, Oct 18, 2016 at 4:52 PM, Stefan Beller  wrote:
>>
>> >
>> > By the way, I do see a merit on the "check" side (tl;dr: but I do
>> > not think "result" needs it, hence I do not see the need for the
>> > "ugly" variants).
>>
>> So we'd rather go with const char **result instead of our own new struct 
>> there.
>> Ok, got it.
>
> I do not think you got it. I am talking about wrapping struct around
> an array of element,
> not each element in the array. IOW
>
>> > If we were still to do the wrapping for result, I would say that
>> > basing it around the FLEX_ARRAY idiom, i.e.
>> >
>> >> struct git_attr_result {
>> >> int num_slots;
>> >> const char *value[FLEX_ARRAY];
>> >> };
>
> the structure around the array of elements (value) that allows us to have
> something other than value[] in it. That is what I said "I do not see
> the need for".
>
> It is perfectly fine future-proofing to have
>
> struct git_attr_result_value {
>   const char *value;
> };
>
> and have the users of API declare
>
> struct git_attr_result value[5];
>
> or whatever. That way we could fatten the structure later if we wanted
> to without having to update the users of API, and there is no downside.
>
> Having wrapping strut around the array does have a huge downside,
> and that is what I said I see no need for.

I am not sure if I see the upside on wrapping a single value except for
its future proofness, i.e. what if we want to transport information that
is valid for all values, e.g. an error code or that the result check was done
lazily (for non lazy you would need to do X) or ...

IOW I would expect there to be more use cases for information regarding
all the values and not each value enhanced by a thing.

We could just repeat the thing in each of the 5 values in
> struct git_attr_result value[5];
though.

I'll go with that.


Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> I am not sure if I see the upside on wrapping a single value except for
> its future proofness,

I do not see anything other than future-proofing, either.  If we
need to touch all the code that uses the attributes to update the
API, I'd prefer to avoid having to do that again in the future.


Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

> The underlying issue is two fold:
>
> * in t3600 we'd need
> diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
> index d046d98..545d32f 100755
> --- a/t/t3600-rm.sh
> +++ b/t/t3600-rm.sh
> @@ -616,7 +616,7 @@ test_expect_success 'setup subsubmodule' '
> git submodule update &&
> (cd submod &&
> git update-index --add --cacheinfo 16 $(git
> rev-parse HEAD) subsubmod &&
> -   git config -f .gitmodules submodule.sub.url ../. &&
> +   git config -f .gitmodules submodule.sub.url ./. &&
> git config -f .gitmodules submodule.sub.path subsubmod &&
> git submodule init &&
> git add .gitmodules &&
>
> because the sub-submodule URL is actually the same as the submodule
> (because we'd test lazily)

This fix sounds entirely sane.  The "../." you touched was depending
on the buggy behaviour; it is exactly the case of "there were two
wrongs that were covering each other; after one of them gets fixed,
the other one's brokenness is exposed", right?

> However in t7403, we have a construct like:
>
> git clone . super
>
> which then results in
>
> git -C super remote -v
> ./git/t/trash directory.t7403-submodule-sync/. (fetch)

That sounds expected.  We do not have to add trailing "/.", but the
system ought to work with or without it correctly and the same way.

> However instead of fixing the levels of nesting, the fix is as easy as:
> diff --git a/t/t7403-submodule-sync.sh b/t/t7403-submodule-sync.sh
> index 0726799..525d32b 100755
> --- a/t/t7403-submodule-sync.sh
> +++ b/t/t7403-submodule-sync.sh
> @@ -15,7 +15,9 @@ test_expect_success setup '
> git add file &&
> test_tick &&
> git commit -m upstream &&
> -   git clone . super &&
> +   # avoid cloning a repository with a url ending in /.
> +   git clone . root &&
> +   git clone root super &&
> git clone super submodule &&
> (
> cd submodule &&
>
> Same goes for t740{6,7} as well as t7506.

Isn't the issue the same as that "3600-rm" one?  I know you said
twofold upfront, but I am not sure I agree.

The "super" repository refers to its submodules with "../submodule"
in the test code we have, even though the submodule referred to
lives inside $TRASH, and by fixing the "trailing /. and trailing
/root are treated the same way" bug, its reference created in the
test ends up referring to one level above, perhaps in t/submodule,
instead of the intended place t/trash/submodule.  By using "root",
you are making their wrong references point at the right place.

Admittedly, the updated test above tests something different from
what it originally wanted to test, which feels somewhat distasteful
but I do not think it is wrong.

> I think this change to the test suite is not warranted, because
> we want to have the current behavior as-is ...

I am not sure.  Certainly we would want to make sure that the normal
case (i.e. no funny trailing junk) to work correctly, but we do want
to protect the fix from future breakage as well, no?

Perhaps we can do a preliminary step to update tests to primarily
check the cases that do not involve URL with trailing "/." by either
doing that double clone, or more preferrably, clone from "$(pwd)"
and adjust the incorrect submodule reference that have been relying
on the buggy behaviour.  With that "root" trick, the test would pass
with or without the fix under discussion, right?

Then do the fix under discussion and introduce a test that clones
from "." refers to submodules with relative path and makes sure that
trailing "/." is interpreted correctly.


Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Stefan Beller
On Tue, Oct 18, 2016 at 5:56 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> The underlying issue is two fold:
>>
>> * in t3600 we'd need
>> diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
>> index d046d98..545d32f 100755
>> --- a/t/t3600-rm.sh
>> +++ b/t/t3600-rm.sh
>> @@ -616,7 +616,7 @@ test_expect_success 'setup subsubmodule' '
>> git submodule update &&
>> (cd submod &&
>> git update-index --add --cacheinfo 16 $(git
>> rev-parse HEAD) subsubmod &&
>> -   git config -f .gitmodules submodule.sub.url ../. &&
>> +   git config -f .gitmodules submodule.sub.url ./. &&
>> git config -f .gitmodules submodule.sub.path subsubmod &&
>> git submodule init &&
>> git add .gitmodules &&
>>
>> because the sub-submodule URL is actually the same as the submodule
>> (because we'd test lazily)
>
> This fix sounds entirely sane.  The "../." you touched was depending
> on the buggy behaviour; it is exactly the case of "there were two
> wrongs that were covering each other; after one of them gets fixed,
> the other one's brokenness is exposed", right?
>
>> However in t7403, we have a construct like:
>>
>> git clone . super
>>
>> which then results in
>>
>> git -C super remote -v
>> ./git/t/trash directory.t7403-submodule-sync/. (fetch)
>
> That sounds expected.  We do not have to add trailing "/.", but the
> system ought to work with or without it correctly and the same way.
>
>> However instead of fixing the levels of nesting, the fix is as easy as:
>> diff --git a/t/t7403-submodule-sync.sh b/t/t7403-submodule-sync.sh
>> index 0726799..525d32b 100755
>> --- a/t/t7403-submodule-sync.sh
>> +++ b/t/t7403-submodule-sync.sh
>> @@ -15,7 +15,9 @@ test_expect_success setup '
>> git add file &&
>> test_tick &&
>> git commit -m upstream &&
>> -   git clone . super &&
>> +   # avoid cloning a repository with a url ending in /.
>> +   git clone . root &&
>> +   git clone root super &&
>> git clone super submodule &&
>> (
>> cd submodule &&
>>
>> Same goes for t740{6,7} as well as t7506.
>
> Isn't the issue the same as that "3600-rm" one?  I know you said
> twofold upfront, but I am not sure I agree.

I took a couple of hours trying to get the same fix applied to the t7* tests,
but that doesn't seem to be as easy. I'll try again.

>
> The "super" repository refers to its submodules with "../submodule"
> in the test code we have, even though the submodule referred to
> lives inside $TRASH, and by fixing the "trailing /. and trailing
> /root are treated the same way" bug, its reference created in the
> test ends up referring to one level above, perhaps in t/submodule,
> instead of the intended place t/trash/submodule.  By using "root",
> you are making their wrong references point at the right place.

Correct.

>
> Admittedly, the updated test above tests something different from
> what it originally wanted to test, which feels somewhat distasteful
> but I do not think it is wrong.

I think it is. I was just showing how to quick fix the issue, and how wide the
impact was. More like assessing the situation of what is broken with
that patch, rather than proposing a way to go for fixing.

>
>> I think this change to the test suite is not warranted, because
>> we want to have the current behavior as-is ...
>
> I am not sure.  Certainly we would want to make sure that the normal
> case (i.e. no funny trailing junk) to work correctly, but we do want
> to protect the fix from future breakage as well, no?

Exactly. So not intermediate "root" that we clone from, but adapting the
relative URLs. Maybe half the broken tests can switch to 'root' and the others
go with the current behavior of cloning . to super.

>
> Perhaps we can do a preliminary step to update tests to primarily
> check the cases that do not involve URL with trailing "/." by either
> doing that double clone, or more preferrably, clone from "$(pwd)"
> and adjust the incorrect submodule reference that have been relying
> on the buggy behaviour.  With that "root" trick, the test would pass
> with or without the fix under discussion, right?

I assume so, (not tested).

>
> Then do the fix under discussion and introduce a test that clones
> from "." refers to submodules with relative path and makes sure that
> trailing "/." is interpreted correctly.


Re: [PATCH v4 05/25] sequencer: eventually release memory allocated for the option values

2016-10-18 Thread Junio C Hamano
Johannes Schindelin  writes:

>> > To remedy that, we now take custody of the option values in question,
>> > requiring those values to be malloc()ed or strdup()ed
>> 
>> That is the approach this patch takes, so "eventually release" in
>> the title is no longer accurate, I would think.
>
> To the contrary, we now free() things in remove_state(), so we still
> "eventually release" the memory.

OK.  We call a change to teach remove_state() to free the resource
does more commonly as "plug leaks"; the word "eventually" gave me an
impression that we are emphasizing the fact that we do not free(3)
immediately but lazily do so at the end, hence my response.


Re: [PATCHv3] submodule--helper: normalize funny urls

2016-10-18 Thread Junio C Hamano
Stefan Beller  writes:

>> I am not sure.  Certainly we would want to make sure that the normal
>> case (i.e. no funny trailing junk) to work correctly, but we do want
>> to protect the fix from future breakage as well, no?
>
> Exactly. So not intermediate "root" that we clone from, but adapting the
> relative URLs. Maybe half the broken tests can switch to 'root' and the others
> go with the current behavior of cloning . to super.
>>
>> Perhaps we can do a preliminary step to update tests to primarily
>> check the cases that do not involve URL with trailing "/." by either
>> doing that double clone, or more preferrably, clone from "$(pwd)"
>> and adjust the incorrect submodule reference that have been relying
>> on the buggy behaviour.  With that "root" trick, the test would pass
>> with or without the fix under discussion, right?
>
> I assume so, (not tested).

OK.  Thanks for sticking with it.


[PATCH 0/7] Rejecting useless merge bases

2016-10-18 Thread Junio C Hamano
This is a continuation of

http://public-inbox.org/git/xmqqmvi2sj8f@gitster.mtv.corp.google.com

In a workflow where topic branches are first merged to the 'next'
integration branch to be tested before getting merged down to the
'master' integration branch to be consumed by the end users, merging
the 'master' branch back to the 'next' happens after topics graduate
to 'master' and release notes entries are written for them.

Git finds many merge bases between 'master' and 'next' while
creating this merge.  In addition to the tip of 'master' back when
we made such a merge back from 'master' to 'next' was made the last
time, which is the most reasonable merge base to explain the
histories of both branches, all the tips of topic branches that
graduated recently are merge bases.  Because these tips of topic
branches were already in 'next', the tip of 'next' reaches them, and
because they just graduated to 'master', the tip of 'master' reaches
them, too.  And these topics are independent most of the time (that
is the point of employing the topic-branch workflow), so they cannot
be reduced.

The merge-recursive machinery is very inefficient to compute this
merge, because it needs to create pointless virtual merge-base
commits across these many merge bases.  Conceptually, the point
where the histories of 'master' and 'next' diverged was the tip of
'master' back when we created such a merge back from 'master' to
'next' the last time, and in practice that is the only merge base
that matters.

The series allows us to ignore these tips of topics, which are
uninteresting merge bases, when running "git merge".  The example
merge with 12 merge bases:

git checkout 4868def05e && git merge 659889482a

in our history takes about 1.22-1.33 seconds without the series,
while running the latter "git merge" with the "--fp-base-only"
option takes about 0.54-0.59 seconds.

Junio C Hamano (7):
  commit: simplify fastpath of merge-base
  sha1_name: remove ONELINE_SEEN bit
  merge-base: stop moving commits around in remove_redundant()
  merge-base: expose get_merge_bases_many_0() a bit more
  merge-base: mark bases that are on first-parent chain
  merge-base: limit the output to bases that are on first-parent chain
  merge: allow to use only the fp-only merge bases

 Documentation/git-merge-base.txt |  8 +++-
 Documentation/merge-options.txt  |  9 +
 builtin/merge-base.c | 10 +++--
 builtin/merge.c  | 15 ++--
 commit.c | 81 
 commit.h |  7 +++-
 object.h |  3 +-
 sha1_name.c  | 10 ++---
 8 files changed, 94 insertions(+), 49 deletions(-)

-- 
2.10.1-631-gb2c64dcf30



[PATCH 1/7] commit: simplify fastpath of merge-base computation

2016-10-18 Thread Junio C Hamano
The get_merge_bases_many*() family of functions first call
merge_bases_many() to find all possible merge bases between a single
commit "one" and a set of other commits "twos[]".  Because this
typically involves traversing the commit graph, which uses the
object flags on the commits involved, the function needs to clear
the flags before it returns.

Except for one special case.  If "one" is exactly one of the "twos",
"one" is the merge base and merge_bases_many() returns a list of
merge bases with a single element, "one", on it, without smudging
the object flags of the commits at all.

We used to use a loop over "twos[]" array to see if this fast-path
would have kicked in in merge_bases_many(), in which case we can
return without cleaning the bits at all.  We can do the same by
inspecting the result in a more direct way, which is conceptually
cleaner.

Signed-off-by: Junio C Hamano 
---
 commit.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index aada266f9a..6266c0380c 100644
--- a/commit.c
+++ b/commit.c
@@ -955,10 +955,17 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
int cnt, i;
 
result = merge_bases_many(one, n, twos);
-   for (i = 0; i < n; i++) {
-   if (one == twos[i])
-   return result;
-   }
+
+   /*
+* The fast-path of 'one' being the merge-base; there is no
+* need to clean the object flags in this case.
+*/
+   if (result && !result->next &&
+   result->item == one &&
+   !(one->object.flags & RESULT))
+   return result;
+
+   /* If we didn't get any, or there is only one, we are done */
if (!result || !result->next) {
if (cleanup) {
clear_commit_marks(one, all_flags);
-- 
2.10.1-631-gb2c64dcf30



[PATCH 4/7] merge-base: expose get_merge_bases_many_0() a bit more

2016-10-18 Thread Junio C Hamano
"git merge-base" names its internal workhorse helper function
"get_merge_bases_many_0()", which takes one "can we get away without
clearing the object->flags bits because we know we are the last
caller?" parameter.  Make the parameter into a flags word to make it
extensible and rename it to get_merge_bases_opt().  Use it to turn
get_merge_bases_many_dirty() wrapper into a C-preprocessor macro.

Signed-off-by: Junio C Hamano 
---
 commit.c | 19 ++-
 commit.h |  6 --
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 59bd18e67c..92d23b1082 100644
--- a/commit.c
+++ b/commit.c
@@ -942,15 +942,15 @@ static void mark_redundant(struct commit **array, int cnt)
free(filled_index);
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
- int n,
- struct commit **twos,
- int cleanup)
+struct commit_list *get_merge_bases_opt(struct commit *one,
+   int n, struct commit **twos,
+   unsigned flags)
 {
struct commit_list *list;
struct commit **rslt;
struct commit_list *result;
int cnt, i;
+   int cleanup = !!(flags & MB_POSTCLEAN);
 
result = merge_bases_many(one, n, twos);
 
@@ -997,19 +997,12 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
 int n,
 struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 1);
-}
-
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos)
-{
-   return get_merge_bases_many_0(one, n, twos, 0);
+   return get_merge_bases_opt(one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-   return get_merge_bases_many_0(one, 1, &two, 1);
+   return get_merge_bases_opt(one, 1, &two, MB_POSTCLEAN);
 }
 
 /*
diff --git a/commit.h b/commit.h
index 32e1a113e5..557f2814b7 100644
--- a/commit.h
+++ b/commit.h
@@ -253,8 +253,10 @@ extern struct commit_list *get_merge_bases(struct commit 
*rev1, struct commit *r
 extern struct commit_list *get_merge_bases_many(struct commit *one, int n, 
struct commit **twos);
 extern struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
-/* To be used only when object flags after this call no longer matter */
-extern struct commit_list *get_merge_bases_many_dirty(struct commit *one, int 
n, struct commit **twos);
+#define MB_POSTCLEAN 01
+extern struct commit_list *get_merge_bases_opt(struct commit *one, int n, 
struct commit **twos, unsigned flags);
+
+#define get_merge_bases_many_dirty(one, n, twos) 
get_merge_bases_opt((one),(n),(twos),MB_POSTCLEAN)
 
 /* largest positive number a signed 32-bit integer can contain */
 #define INFINITE_DEPTH 0x7fff
-- 
2.10.1-631-gb2c64dcf30



[PATCH 3/7] merge-base: stop moving commits around in remove_redundant()

2016-10-18 Thread Junio C Hamano
The merge-base computation is performed in two steps.  First,
paint_down_to_common() traverses the history to find all possible
merge bases (and more), and then remove_redundant() checks the
result and culls the ones that can be reached from another commit
in the result.

The latter received an array of commits, and then returned the same
array after reordering the elements in it, moving the surviving ones
to the front and returning the number of surviving ones.

This arrangement works, but it makes it cumbersome for the callers
when they want to see the array's contents intact (e.g. the caller
may want to keep an additional per-commit data in an independent
array that parallels the array of commits).

Stop moving commits around in the array, and instead mark the ones
that are not merge bases with the STALE bit in their object->flags
bitword.

Signed-off-by: Junio C Hamano 
---
 commit.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/commit.c b/commit.c
index 6266c0380c..59bd18e67c 100644
--- a/commit.c
+++ b/commit.c
@@ -888,11 +888,11 @@ struct commit_list *get_octopus_merge_bases(struct 
commit_list *in)
return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static void mark_redundant(struct commit **array, int cnt)
 {
/*
 * Some commit in the array may be an ancestor of
-* another commit.  Move such commit to the end of
+* another commit.  Mark such commit as STALE in
 * the array, and return the number of commits that
 * are independent from each other.
 */
@@ -930,18 +930,16 @@ static int remove_redundant(struct commit **array, int 
cnt)
free_commit_list(common);
}
 
-   /* Now collect the result */
-   COPY_ARRAY(work, array, cnt);
-   for (i = filled = 0; i < cnt; i++)
-   if (!redundant[i])
-   array[filled++] = work[i];
-   for (j = filled, i = 0; i < cnt; i++)
+   /* Mark the result */
+   for (i = 0; i < cnt; i++)
if (redundant[i])
-   array[j++] = work[i];
+   array[i]->object.flags |= STALE;
+   else
+   array[i]->object.flags &= ~STALE;
+
free(work);
free(redundant);
free(filled_index);
-   return filled;
 }
 
 static struct commit_list *get_merge_bases_many_0(struct commit *one,
@@ -984,10 +982,13 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(rslt, cnt);
+   mark_redundant(rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
-   commit_list_insert_by_date(rslt[i], &result);
+   if (!(rslt[i]->object.flags & STALE))
+   commit_list_insert_by_date(rslt[i], &result);
+   else
+   rslt[i]->object.flags &= ~STALE;
free(rslt);
return result;
 }
@@ -1086,9 +1087,12 @@ struct commit_list *reduce_heads(struct commit_list 
*heads)
p->item->object.flags &= ~STALE;
}
}
-   num_head = remove_redundant(array, num_head);
+   mark_redundant(array, num_head);
for (i = 0; i < num_head; i++)
-   tail = &commit_list_insert(array[i], tail)->next;
+   if (!(array[i]->object.flags & STALE))
+   tail = &commit_list_insert(array[i], tail)->next;
+   else
+   array[i]->object.flags &= ~STALE;
return result;
 }
 
-- 
2.10.1-631-gb2c64dcf30



[PATCH 2/7] sha1_name: remove ONELINE_SEEN bit

2016-10-18 Thread Junio C Hamano
28a4d94044 ("object name: introduce ':/' notation",
2007-02-24) started using its own bit from object->flags to mark
commits used while parsing the ":/token" extended SHA-1 syntax to
name a commit temporarily, and this was kept even when f7bff00314
("sha1_name.c: fix parsing of ":/token" syntax", 2010-08-02) found
and fixed a bug in its implementation.

The use of that flag bit, however, is limited to a single function,
get_sha1_oneline(), which first sets it for the commits sitting at
the tips of refs, uses the bit to avoid duplicate traversal while
walking the history, and then cleans the bit from all commits it
walked.

Which is exactly what the general-purpose TMP_MARK bit meant to be
used for isolated case was invented for.  Replace ONELINE_SEEN with
TMP_MARK and retire the former.

Signed-off-by: Junio C Hamano 
---
 object.h|  1 -
 sha1_name.c | 10 --
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/object.h b/object.h
index f8b644263f..f8e218eccd 100644
--- a/object.h
+++ b/object.h
@@ -37,7 +37,6 @@ struct object_array {
  * bundle.c:   16
  * http-push.c:16-19
  * commit.c:   16-19
- * sha1_name.c: 20
  */
 #define FLAG_BITS  27
 
diff --git a/sha1_name.c b/sha1_name.c
index ca7ddd6f2c..fa0e6701a3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "remote.h"
 #include "dir.h"
+#include "revision.h"
 
 static int get_sha1_oneline(const char *, unsigned char *, struct commit_list 
*);
 
@@ -855,9 +856,6 @@ static int get_sha1_1(const char *name, int len, unsigned 
char *sha1, unsigned l
  * For future extension, all other sequences beginning with ':/!' are reserved.
  */
 
-/* Remember to update object flag allocation in object.h */
-#define ONELINE_SEEN (1u<<20)
-
 static int handle_one_ref(const char *path, const struct object_id *oid,
  int flag, void *cb_data)
 {
@@ -899,7 +897,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
return -1;
 
for (l = list; l; l = l->next) {
-   l->item->object.flags |= ONELINE_SEEN;
+   l->item->object.flags |= TMP_MARK;
commit_list_insert(l->item, &backup);
}
while (list) {
@@ -907,7 +905,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
struct commit *commit;
int matches;
 
-   commit = pop_most_recent_commit(&list, ONELINE_SEEN);
+   commit = pop_most_recent_commit(&list, TMP_MARK);
if (!parse_object(commit->object.oid.hash))
continue;
buf = get_commit_buffer(commit, NULL);
@@ -924,7 +922,7 @@ static int get_sha1_oneline(const char *prefix, unsigned 
char *sha1,
regfree(®ex);
free_commit_list(list);
for (l = backup; l; l = l->next)
-   clear_commit_marks(l->item, ONELINE_SEEN);
+   clear_commit_marks(l->item, TMP_MARK);
free_commit_list(backup);
return found ? 0 : -1;
 }
-- 
2.10.1-631-gb2c64dcf30



[PATCH 6/7] merge-base: limit the output to bases that are on first-parent chain

2016-10-18 Thread Junio C Hamano
A new option "--fp-only" limits the output to the merge bases that
are on the first-parent chain from the branches being merged.

Signed-off-by: Junio C Hamano 
---
 Documentation/git-merge-base.txt |  8 +++-
 builtin/merge-base.c | 10 +++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-merge-base.txt b/Documentation/git-merge-base.txt
index 808426faac..f99653f138 100644
--- a/Documentation/git-merge-base.txt
+++ b/Documentation/git-merge-base.txt
@@ -9,7 +9,7 @@ git-merge-base - Find as good common ancestors as possible for 
a merge
 SYNOPSIS
 
 [verse]
-'git merge-base' [-a|--all]  ...
+'git merge-base' [-a|--all] [--fp-only]  ...
 'git merge-base' [-a|--all] --octopus ...
 'git merge-base' --is-ancestor  
 'git merge-base' --independent ...
@@ -72,6 +72,12 @@ OPTIONS
 --all::
Output all merge bases for the commits, instead of just one.
 
+--fp-only::
+   Limit the output to merge bases that are on the first-parent
+   chain from none of the commits given on the command line.
+
+
+
 DISCUSSION
 --
 
diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index c0d1822eb3..42febb20ff 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -6,11 +6,12 @@
 #include "revision.h"
 #include "parse-options.h"
 
-static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
+static int show_merge_base(struct commit **rev, int rev_nr, int show_all, int 
fp_only)
 {
struct commit_list *result;
+   unsigned flags = fp_only ? MB_FPCHAIN : 0;
 
-   result = get_merge_bases_many_dirty(rev[0], rev_nr - 1, rev + 1);
+   result = get_merge_bases_opt(rev[0], rev_nr - 1, rev + 1, flags);
 
if (!result)
return 1;
@@ -208,10 +209,13 @@ int cmd_merge_base(int argc, const char **argv, const 
char *prefix)
struct commit **rev;
int rev_nr = 0;
int show_all = 0;
+   int fp_only = 0;
int cmdmode = 0;
 
struct option options[] = {
OPT_BOOL('a', "all", &show_all, N_("output all common 
ancestors")),
+   OPT_BOOL(0, "fp-only", &fp_only,
+N_("limit to bases that are on first-parent chain")),
OPT_CMDMODE(0, "octopus", &cmdmode,
N_("find ancestors for a single n-way merge"), 'o'),
OPT_CMDMODE(0, "independent", &cmdmode,
@@ -255,5 +259,5 @@ int cmd_merge_base(int argc, const char **argv, const char 
*prefix)
ALLOC_ARRAY(rev, argc);
while (argc-- > 0)
rev[rev_nr++] = get_commit_reference(*argv++);
-   return show_merge_base(rev, rev_nr, show_all);
+   return show_merge_base(rev, rev_nr, show_all, fp_only);
 }
-- 
2.10.1-631-gb2c64dcf30



[PATCH 7/7] merge: allow to use only the fp-only merge bases

2016-10-18 Thread Junio C Hamano
Teach "git merge" a new option "--fp-base-only" that tells it to
consider only merge bases that are on the first-parent chain.

This speeds up back-merges needed in the topic branch workflow.  The
merge of 'master' back into 'next' used as an example in the RFD
article , i.e.

git checkout 4868def05e && git merge 659889482a

in our history takes about 1.22-1.33 seconds without the series,
while running the latter "git merge" with the "--fp-base-only"
option takes about 0.54-0.59 seconds.

Signed-off-by: Junio C Hamano 
---
 Documentation/merge-options.txt |  9 +
 builtin/merge.c | 15 ---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 5b4a62e936..7927f069e4 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -116,6 +116,15 @@ ifndef::git-pull[]
Note that not all merge strategies may support progress
reporting.
 
+--fp-base-only::
+   Instead of using all merge bases when computing the
+   three-way merge result, use only the merge bases on the
+   first-parent chain of the commits being merged.  This
+   experimental feature is meant to be used when merging an
+   older integration branch back to a newer integration branch
+   in the topic-branch workflow.
+
+
 endif::git-pull[]
 
 --allow-unrelated-histories::
diff --git a/builtin/merge.c b/builtin/merge.c
index 0ae099f746..a38b878e61 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -88,6 +88,8 @@ enum ff_type {
 
 static enum ff_type fast_forward = FF_ALLOW;
 
+static int fp_base_only;
+
 static int option_parse_message(const struct option *opt,
const char *arg, int unset)
 {
@@ -210,6 +212,8 @@ static struct option builtin_merge_options[] = {
{ OPTION_SET_INT, 0, "ff-only", &fast_forward, NULL,
N_("abort if fast-forward is not possible"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, FF_ONLY },
+   OPT_BOOL(0, "fp-base-only", &fp_base_only,
+N_("use only merge bases on first-parent chain")),
OPT_RERERE_AUTOUPDATE(&allow_rerere_auto),
OPT_BOOL(0, "verify-signatures", &verify_signatures,
N_("verify that the named commit has a valid GPG signature")),
@@ -1340,9 +1344,14 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
 
if (!remoteheads)
; /* already up-to-date */
-   else if (!remoteheads->next)
-   common = get_merge_bases(head_commit, remoteheads->item);
-   else {
+   else if (!remoteheads->next) {
+   unsigned flags = MB_POSTCLEAN;
+   if (fp_base_only)
+   flags |= MB_FPCHAIN;
+   common = get_merge_bases_opt(head_commit,
+1, &remoteheads->item,
+flags);
+   } else {
struct commit_list *list = remoteheads;
commit_list_insert(head_commit, &list);
common = get_octopus_merge_bases(list);
-- 
2.10.1-631-gb2c64dcf30



[PATCH 5/7] merge-base: mark bases that are on first-parent chain

2016-10-18 Thread Junio C Hamano
In a workflow where topic branches are first merged to the 'next'
integration branch to be tested before getting merged down to the
'master' integration branch to be consumed by the end users, merging
the 'master' branch back to the 'next' happens after topics graduate
to 'master' and release notes entries are written for them.

Git finds many merge bases between 'master' and 'next' while
creating this merge.  In addition to the tip of 'master' back when
we made such a merge back from 'master' to 'next' was made the last
time, which is the most reasonable merge base to explain the
histories of both branches, all the tips of topic branches that
graduated recently are merge bases.  Because these tips of topic
branches were already in 'next', the tip of 'next' reaches them, and
because they just graduated to 'master', the tip of 'master' reaches
them, too.  And these topics are independent most of the time (that
is the point of employing the topic-branch workflow), so they cannot
be reduced.

The merge-recursive machinery is very inefficient to compute this
merge, because it needs to create pointless virtual merge-base
commits across these many merge bases.  Conceptually, the point
where the histories of 'master' and 'next' diverged was the tip of
'master' back when we created such a merge back from 'master' to
'next' the last time, and in practice that is the only merge base
that matters.

Update the logic in paint_down_to_common() so that it can mark if a
merge-base commit was reached by following the first-parent chain of
either side of the merge to find such commit, and give an option to
get_merge_bases_opt() to return only merge-base commits that are on
the first-parent chain, so that a later step of this series can
introduce a command line option to enable it and avoid feeding
useless merge bases to the merge machinery.

Signed-off-by: Junio C Hamano 
---
 commit.c | 17 +++--
 commit.h |  1 +
 object.h |  2 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 92d23b1082..8e90531b05 100644
--- a/commit.c
+++ b/commit.c
@@ -765,8 +765,9 @@ void sort_in_topological_order(struct commit_list **list, 
enum rev_sort_order so
 #define PARENT2(1u<<17)
 #define STALE  (1u<<18)
 #define RESULT (1u<<19)
+#define FPCHAIN(1u<<20)
 
-static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | 
FPCHAIN);
 
 static int queue_has_nonstale(struct prio_queue *queue)
 {
@@ -802,6 +803,7 @@ static struct commit_list *paint_down_to_common(struct 
commit *one, int n, struc
struct commit *commit = prio_queue_get(&queue);
struct commit_list *parents;
int flags;
+   int nth_parent = 0;
 
flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
if (flags == (PARENT1 | PARENT2)) {
@@ -816,11 +818,14 @@ static struct commit_list *paint_down_to_common(struct 
commit *one, int n, struc
while (parents) {
struct commit *p = parents->item;
parents = parents->next;
+   nth_parent++;
if ((p->object.flags & flags) == flags)
continue;
if (parse_commit(p))
return NULL;
p->object.flags |= flags;
+   if (nth_parent == 1)
+   p->object.flags |= FPCHAIN;
prio_queue_put(&queue, p);
}
}
@@ -951,6 +956,8 @@ struct commit_list *get_merge_bases_opt(struct commit *one,
struct commit_list *result;
int cnt, i;
int cleanup = !!(flags & MB_POSTCLEAN);
+   int fpchain = !!(flags & MB_FPCHAIN);
+   char *on_fpchain;
 
result = merge_bases_many(one, n, twos);
 
@@ -975,21 +982,27 @@ struct commit_list *get_merge_bases_opt(struct commit 
*one,
/* There are more than one */
cnt = commit_list_count(result);
rslt = xcalloc(cnt, sizeof(*rslt));
+   on_fpchain = xcalloc(cnt, sizeof(*on_fpchain));
for (list = result, i = 0; list; list = list->next)
rslt[i++] = list->item;
free_commit_list(result);
 
+   for (i = 0; i < cnt; i++)
+   on_fpchain[i] = !!(rslt[i]->object.flags & FPCHAIN);
+
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
mark_redundant(rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
-   if (!(rslt[i]->object.flags & STALE))
+   if (!(rslt[i]->object.flags & STALE) &&
+   (!fpchain || on_fpchain[i]))
commit_list_insert_by_date(rslt[i], &result);
else
rslt[i]->object.flags &=