[PATCH] clone: pass --progress decision to recursive submodules

2016-09-21 Thread Jeff King
When cloning with "--recursive", we'd generally expect
submodules to show progress reports if the main clone did,
too.

In older versions of git, this mostly worked out of the
box. Since we show progress by default when stderr is a tty,
and since the child clones inherit the parent stderr, then
both processes would come to the same decision by default.
If the parent clone was asked for "--quiet", we passed down
"--quiet" to the child. However, if stderr was not a tty and
the user specified "--progress", we did not propagate this
to the child.

That's a minor bug, but things got much worse when we
switched recently to submodule--helper's update_clone
command. With that change, the stderr of the child clones
are always connected to a pipe, and we never output
progress at all.

This patch teaches git-submodule and git-submodule--helper
how to pass down an explicit "--progress" flag when cloning.
The clone command then decides to propagate that flag based
on the cloning decision made earlier (which takes into
account isatty(2) of the parent process, existing --progress
or --quiet flags, etc). Since the child processes always run
without a tty on stderr, we don't have to worry about
passing an explicit "--no-progress"; it's the default for
them.

This fixes the recent loss of progress during recursive
clones. And as a bonus, it makes:

  git clone --recursive --progress ... 2>&1 | cat

work by triggering progress explicitly in the children.

Signed-off-by: Jeff King 
---
I don't usually use submodules, but I happened to be testing something,
and my goto "this has a lot of submodules" repository happens to have
some pretty large submodules. So I had plenty of time to contemplate the
lack of progress reporting. :)

I checked with --jobs, too, and this should do the right thing; it
queues up the progress reports for submodules that aren't the
output_owner, and then dumps them all at once (so the progress meter
will appear to whiz by for those jobs until it catches up to the current
state, which is the only reasonable thing we can do, I think).

I imagine there are other code paths that want similar treatment, but I
didn't look into them. I'd assume "fetch" is one. I'm not sure if we do
parallel checkouts, but that's another potential.

 builtin/clone.c | 16 ++--
 builtin/submodule--helper.c | 18 +++---
 git-submodule.sh|  5 +
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 404c5e8..28ce938 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -670,7 +670,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
}
 }
 
-static int checkout(void)
+static int checkout(int submodule_progress)
 {
unsigned char sha1[20];
char *head;
@@ -734,6 +734,9 @@ static int checkout(void)
if (max_jobs != -1)
argv_array_pushf(, "--jobs=%d", max_jobs);
 
+   if (submodule_progress)
+   argv_array_push(, "--progress");
+
err = run_command_v_opt(args.argv, RUN_GIT_CMD);
argv_array_clear();
}
@@ -841,6 +844,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
const char *src_ref_prefix = "refs/heads/";
struct remote *remote;
int err = 0, complete_refs_before_fetch = 1;
+   int submodule_progress;
 
struct refspec *refspec;
const char *fetch_pattern;
@@ -1099,6 +1103,14 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
+   /*
+* We want to show progress for recursive submodule clones iff
+* we did so for the main clone. But only the transport knows
+* the final decision for this flag, so we need to rescue the value
+* before we free the transport.
+*/
+   submodule_progress = transport->progress;
+
transport_unlock_pack(transport);
transport_disconnect(transport);
 
@@ -1108,7 +1120,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
junk_mode = JUNK_LEAVE_REPO;
-   err = checkout();
+   err = checkout(submodule_progress);
 
strbuf_release(_msg);
strbuf_release(_top);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7b8ddfe..d2f9d7d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -443,7 +443,8 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
 }
 
 static int clone_submodule(const char *path, const char *gitdir, const char 
*url,
-  const char *depth, struct string_list *reference, 
int quiet)
+  const char *depth, struct string_list *reference,
+  int quiet, int progress)
 {
struct child_process cp = CHILD_PROCESS_INIT;
 
@@ 

Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Johannes Sixt

Am 21.09.2016 um 23:12 schrieb Junio C Hamano:

Johannes Sixt  writes:


But I came to a different conclusion as I said in a message that
crossed yours. I hope Thomas can pick up the baton again.


Yeah, our mails crossed, apparently, and I do agree with your
reasoning.  How about this, then?

-- >8 --
From: Johannes Sixt 
Date: Tue, 20 Sep 2016 08:18:25 +0200
Subject: [PATCH] t3700-add: do not check working tree file mode without 
POSIXPERM

A recently introduced test checks the result of 'git status' after
setting the executable bit on a file. This check does not yield the
expected result when the filesystem does not support the executable
bit.

What we care about is that a file added with "--chmod=+x" has
executable bit in the index and that "--chmod=+x" (or any other
options for that matter) does not muck with working tree files.
The former is tested by other existing tests, so let's check the
latter more explicitly and only under POSIXPERM prerequisite.

Signed-off-by: Johannes Sixt 
Signed-off-by: Junio C Hamano 
---
 t/t3700-add.sh | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 16ab2da..924a266 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -361,13 +361,11 @@ test_expect_success 'git add --chmod=[+-]x changes index 
with already added file
test_mode_in_index 100644 xfoo3
 '

-test_expect_success 'file status is changed after git add --chmod=+x' '
-   echo "AM foo4" >expected &&
+test_expect_success POSIXPERM 'git add --chmod=[+-]x does not change the 
working tree' '
echo foo >foo4 &&
git add foo4 &&
git add --chmod=+x foo4 &&
-   git status -s foo4 >actual &&
-   test_cmp expected actual
+   ! test -x foo4
 '

 test_expect_success 'no file status change if no pathspec is given' '



That makes a lot of sense. Thank you so much!

-- Hannes



Re: [PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 04:13:22PM -0700, Junio C Hamano wrote:

> Brandon Williams  writes:
> 
> > yes you mentioned this and I meant to change that before sending it out.
> > Looks like it slipped through have slipped through.
> 
> I already fixed it up locally when I sent the reply, but thanks for
> resending (which assures me that your local copy is up-to-date and I
> do not have to worry about having to repeat me in the future, if
> this ever needs further rerolling ;-).

While we are on the subject, the commit message also uses some past
tense:

  Allow ls-files to recognize submodules in order to retrieve a list of
  files from a repository's submodules.  This is done by forking off a
  process to recursively call ls-files on all submodules. Also added a
  submodule-prefix command in order to prepend paths to child processes.

The final sentence should be "Also add...".

Since this final bit of logic was sufficiently non-obvious that it only
came about in v2, maybe it is worth describing a little more fully:

  Also add a submodule-prefix option, which instructs the child
  processes to prepend the prefix to each path they output. This makes
  the output paths match what is on the filesystem (i.e., as if the
  submodule boundaries were not there at all).

Should this option just be "--prefix", or maybe "--output-prefix"?
Submodules are the obvious use case here, but I could see somebody
adapting this for other uses (alternatively, if we _do_ want to keep it
just as an implementation detail for submodules, we should probably
discourage people in the documentation from using it themselves).

-Peff


Re: [PATCH] verify_packfile: check pack validity before accessing data

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 11:49:05PM -0400, Jeff King wrote:

> The verify_packfile() does not explicitly open the packfile;
> instead, it starts with a sha1 checksum over the whole pack,
> and relies on use_pack() to open the packfile as a side
> effect.
> 
> If the pack cannot be opened for whatever reason (either
> because its header information is corrupted, or perhaps
> because a simultaneous repack deleted it), then use_pack()
> will die(), as it has no way to return an error. This is not
> ideal, as verify_packfile() otherwise tries to gently return
> an error (this lets programs like git-fsck go on to check
> other packs).
> 
> Instead, let's check is_pack_valid() up front, and return an
> error if it fails. This will open the pack as a side effect,
> and then use_pack() will later rely on our cached
> descriptor, and avoid calling die().

I actually had an ulterior motive, but it didn't pan out. I
think this change is an improvement on its own, which is why I posted
it. But here's my ulterior motive, for reference.

The die() in question happens when use_pack() is asked to lazily open a
pack, but it fails. If this happens then the code in question is racy
with respect to somebody else running a repack, because the pack we are
looking for might go away, but we could find the object in another pack.
Since we cannot handle the retry at this level, callers of use_pack()
should generally use is_pack_valid() early to cache the descriptor (and
at that early stage, they can still bail to another pack if necessary).
See the comment in fill_pack_entry(), for example, or the discussion in 
4c08018 (pack-objects: protect against disappearing packs, 2011-10-14).

So I wanted to know whether there were any code paths that failed to do
so, and just blindly rely on the lazy-open. Finding the races is
inherently hard, because you only catch them when somebody else is doing
a repack. But if we just _remove_ the lazy-load, then it becomes easy to
catch anybody relying on it. Like:

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..f3d7615 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1122,8 +1122,8 @@ unsigned char *use_pack(struct packed_git *p,
 * hash, and the in_window function above wouldn't match
 * don't allow an offset too close to the end of the file.
 */
-   if (!p->pack_size && p->pack_fd == -1 && open_packed_git(p))
-   die("packfile %s cannot be accessed", p->pack_name);
+   if (!p->pack_size && p->pack_fd == -1)
+   die("BUG: use_pack() called on unopened '%s'", p->pack_name);
if (offset > (p->pack_size - 20))
die("offset beyond end of packfile (truncated pack?)");
if (offset < 0)
@@ -1140,8 +1140,8 @@ unsigned char *use_pack(struct packed_git *p,
size_t window_align = packed_git_window_size / 2;
off_t len;
 
-   if (p->pack_fd == -1 && open_packed_git(p))
-   die("packfile %s cannot be accessed", 
p->pack_name);
+   if (p->pack_fd == -1)
+   die("BUG: use_pack() called on unopened '%s'", 
p->pack_name);
 
win = xcalloc(1, sizeof(*win));
win->offset = (offset / window_align) * window_align;


Running the test suite with the patch above revealed the issue in
verify_packfile (and with my patch, the test suite now passes, even with
this).

So I was hoping that we could convert these into assertions as above.
But the test suite passing does not quite tell the whole story. We might
still close a pack in the middle of an operation if we need to open
another one and are running against the system file descriptor limits.
That would only trigger in a repository with a large number of packs (or
a very low descriptor limit).

In such a case, we are relying on the lazy-load (and we _are_ racy!).
But the patch above would punish people on low-descriptor systems. It's
better to have an unlikely race and complete the request than to fail
consistently. :-/

For people who are running high-traffic servers, they just need to make
sure their file descriptor limit is reasonably high to avoid the race.

-Peff


[PATCH] verify_packfile: check pack validity before accessing data

2016-09-21 Thread Jeff King
The verify_packfile() does not explicitly open the packfile;
instead, it starts with a sha1 checksum over the whole pack,
and relies on use_pack() to open the packfile as a side
effect.

If the pack cannot be opened for whatever reason (either
because its header information is corrupted, or perhaps
because a simultaneous repack deleted it), then use_pack()
will die(), as it has no way to return an error. This is not
ideal, as verify_packfile() otherwise tries to gently return
an error (this lets programs like git-fsck go on to check
other packs).

Instead, let's check is_pack_valid() up front, and return an
error if it fails. This will open the pack as a side effect,
and then use_pack() will later rely on our cached
descriptor, and avoid calling die().

Signed-off-by: Jeff King 
---
 pack-check.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/pack-check.c b/pack-check.c
index d123846..c5c7763 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -57,11 +57,8 @@ static int verify_packfile(struct packed_git *p,
int err = 0;
struct idx_entry *entries;
 
-   /* Note that the pack header checks are actually performed by
-* use_pack when it first opens the pack file.  If anything
-* goes wrong during those checks then the call will die out
-* immediately.
-*/
+   if (!is_pack_valid(p))
+   return error("packfile %s cannot be accessed", p->pack_name);
 
git_SHA1_Init();
do {
-- 
2.10.0.482.gae5a597


What's cooking in git.git (Sep 2016, #06; Wed, 21)

2016-09-21 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Quite a lot of topics have graduated to 'master'.  The tip of 'next'
has been rewound and rebuilt.  Accumulated fixes since v2.10.0 are
now almost ready to spawn the first maintenance update v2.10.1 but
not quite yet.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* bw/pathspec-remove-unused-extern-decl (2016-09-13) 1 commit
  (merged to 'next' on 2016-09-15 at c5b281b)
 + pathspec: remove unnecessary function prototypes

 Code cleanup.


* et/add-chmod-x (2016-09-12) 1 commit
  (merged to 'next' on 2016-09-15 at c81abae)
 + add: document the chmod option
 (this branch is used by tg/add-chmod+x-fix.)

 "git add --chmod=+x" added recently lacked documentation, which has
 been corrected.


* ew/http-do-not-forget-to-call-curl-multi-remove-handle (2016-09-13) 3 commits
  (merged to 'next' on 2016-09-15 at 696acb7)
 + http: always remove curl easy from curlm session on release
 + http: consolidate #ifdefs for curl_multi_remove_handle
 + http: warn on curl_multi_add_handle failures

 The http transport (with curl-multi option, which is the default
 these days) failed to remove curl-easy handle from a curlm session,
 which led to unnecessary API failures.


* jk/delta-base-cache (2016-09-12) 1 commit
  (merged to 'next' on 2016-09-15 at 1e35f8d)
 + add_delta_base_cache: use list_for_each_safe

 Recently we updated the code to manage the in-core cache that holds
 objects that have recently been used to reconstitute other objects
 that are stored as deltas against them, but the update used an
 incorrect API function to manage the list of these objects.  This
 has been fixed.
 This is a last-minute fix to a topic that graduated to 'master'
 post 2.10 release.


* jk/patch-ids-no-merges (2016-09-12) 2 commits
  (merged to 'next' on 2016-09-15 at 14bb3a0)
 + patch-ids: refuse to compute patch-id for merge commit
 + patch-ids: turn off rename detection

 "git log --cherry-pick" used to include merge commits as candidates
 to be matched up with other commits, resulting a lot of wasted time.
 The patch-id generation logic has been updated to ignore merges to
 avoid the wastage.


* jk/rebase-i-drop-ident-check (2016-07-29) 1 commit
  (merged to 'next' on 2016-08-14 at 6891bcd)
 + rebase-interactive: drop early check for valid ident

 Even when "git pull --rebase=preserve" (and the underlying "git
 rebase --preserve") can complete without creating any new commit
 (i.e. fast-forwards), it still insisted on having a usable ident
 information (read: user.email is set correctly), which was less
 than nice.  As the underlying commands used inside "git rebase"
 would fail with a more meaningful error message and advice text
 when the bogus ident matters, this extra check was removed.


* jk/reduce-gc-aggressive-depth (2016-08-11) 1 commit
  (merged to 'next' on 2016-08-11 at 6810c6f)
 + gc: default aggressive depth to 50

 "git gc --aggressive" used to limit the delta-chain length to 250,
 which is way too deep for gaining additional space savings and is
 detrimental for runtime performance.  The limit has been reduced to
 50.


* jk/setup-sequence-update (2016-09-13) 16 commits
  (merged to 'next' on 2016-09-15 at 4df8399)
 + t1007: factor out repeated setup
 + init: reset cached config when entering new repo
 + init: expand comments explaining config trickery
 + config: only read .git/config from configured repos
 + test-config: setup git directory
 + t1302: use "git -C"
 + pager: handle early config
 + pager: use callbacks instead of configset
 + pager: make pager_program a file-local static
 + pager: stop loading git_default_config()
 + pager: remove obsolete comment
 + diff: always try to set up the repository
 + diff: handle --no-index prefixes consistently
 + diff: skip implicit no-index check when given --no-index
 + patch-id: use RUN_SETUP_GENTLY
 + hash-object: always try to set up the git repository
 (this branch is used by nd/init-core-worktree-in-multi-worktree-world.)

 There were numerous corner cases in which the configuration files
 are read and used or not read at all depending on the directory a
 Git command was run, leading to inconsistent behaviour.  The code
 to set-up repository access at the beginning of a Git process has
 been updated to fix them.


* js/cat-file-filters (2016-09-11) 4 commits
  (merged to 'next' on 2016-09-15 at a231380)
 + cat-file: support --textconv/--filters in batch mode
 + cat-file --textconv/--filters: allow specifying the path separately
 + cat-file: introduce the --filters option
 + cat-file: fix a grammo in the man page

 

[PATCH 2/2 v2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Brandon Williams
Pathspecs can be a bit tricky when trying to apply them to submodules.
The main challenge is that the pathspecs will be with respect to the
superproject and not with respect to paths in the submodule.  The
approach this patch takes is to pass in the identical pathspec from the
superproject to the submodule in addition to the submodule-prefix, which
is the path from the root of the superproject to the submodule, and then
we can compare an entry in the submodule prepended with the
submodule-prefix to the pathspec in order to determine if there is a
match.

This patch also permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.
Due to limitations in the wildmatch logic, a prefix match is only done
literally.  If any wildcard character is encountered we'll simply punt
and produce a false positive match.  More accurate matching will be done
once inside the submodule.  This is due to the superproject not knowing
what files could exist in the submodule.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c | 132 -
 dir.c  |  46 +++-
 dir.h  |   4 +
 t/t3007-ls-files-recurse-submodules.sh | 114 ++--
 4 files changed, 234 insertions(+), 62 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index ffd9ea6..fa4029e 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -177,12 +177,34 @@ static void show_gitlink(const struct cache_entry *ce)
 {
struct child_process cp = CHILD_PROCESS_INIT;
int status;
+   int i;
 
argv_array_push(, "ls-files");
argv_array_push(, "--recurse-submodules");
argv_array_pushf(, "--submodule-prefix=%s%s/",
 submodule_prefix ? submodule_prefix : "",
 ce->name);
+   /* add options */
+   if (show_eol)
+   argv_array_push(, "--eol");
+   if (show_valid_bit)
+   argv_array_push(, "-v");
+   if (show_stage)
+   argv_array_push(, "--stage");
+   if (show_cached)
+   argv_array_push(, "--cached");
+   if (debug_mode)
+   argv_array_push(, "--debug");
+
+   /*
+* Pass in the original pathspec args.  The submodule will be
+* responsible for prepending the 'submodule_prefix' prior to comparing
+* against the pathspec for matches.
+*/
+   argv_array_push(, "--");
+   for (i = 0; i < pathspec.nr; i++)
+   argv_array_push(, pathspec.items[i].original);
+
cp.git_cmd = 1;
cp.dir = ce->name;
status = run_command();
@@ -192,57 +214,62 @@ static void show_gitlink(const struct cache_entry *ce)
 
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
+   struct strbuf name = STRBUF_INIT;
int len = max_prefix_len;
+   if (submodule_prefix)
+   strbuf_addstr(, submodule_prefix);
+   strbuf_addstr(, ce->name);
 
if (len >= ce_namelen(ce))
die("git ls-files: internal error - cache entry not superset of 
prefix");
 
-   if (!match_pathspec(, ce->name, ce_namelen(ce),
-   len, ps_matched,
-   S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
-   return;
-   if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+   if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+   submodule_path_match(, name.buf, ps_matched)) {
show_gitlink(ce);
-   return;
-   }
+   } else if (match_pathspec(, name.buf, name.len,
+ len, ps_matched,
+ S_ISDIR(ce->ce_mode) ||
+ S_ISGITLINK(ce->ce_mode))) {
+   if (tag && *tag && show_valid_bit &&
+   (ce->ce_flags & CE_VALID)) {
+   static char alttag[4];
+   memcpy(alttag, tag, 3);
+   if (isalpha(tag[0]))
+   alttag[0] = tolower(tag[0]);
+   else if (tag[0] == '?')
+   alttag[0] = '!';
+   else {
+   alttag[0] = 'v';
+   alttag[1] = tag[0];
+   alttag[2] = ' ';
+   alttag[3] = 0;
+   }
+   tag = alttag;
+   }
 
-   if (tag && *tag && show_valid_bit &&
-   (ce->ce_flags & CE_VALID)) {
-   static char alttag[4];
-   memcpy(alttag, tag, 3);
-   if (isalpha(tag[0]))
-   alttag[0] = tolower(tag[0]);
-   else if (tag[0] == '?')
-   alttag[0] = '!';
-

Re: [PATCH 2/2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Brandon Williams
On Wed, Sep 21, 2016 at 3:53 PM, Junio C Hamano  wrote:
>
> Sounds sensible.  Just a minor nit in terminology, but I think we
> fairly consistently say "a superproject contains submodules" (run
> "git grep -E 'super *(module|project)'").
>
> I'd suggest s/super module/superproject/ for consistency.

Will do.

> An example of this test would be to match pathspec "sub/file" with
> submodule path "sub"?

Yep,  I believe there's a test for that case

> item->match[namelen] is accessed without checking if item->match[]
> is long enough here; shouldn't item->len be checked before doing
> that?

Oh right!  Good catch.

>
> Hmph, isn't this the one that is allowed produce false positive but
> cannot afford to give any false negative?  It feels a bit strange
> that the code checks two cases where we can positively say that it
> is worth descending into, and falling through would give "no this
> will never match".  That sounds like invitation for false negatives.
>
> IOW, I would have expected
>
> if (flags & DO_MATCH_SUBMODULE) {
> if (may match in this case)
> return MATCHED_RECURSIVE;
> if (may match in this other case)
> return MATCHED_RECURSIVE;
> ...
> if (obviously cannot match in this case)
> return 0;
> if (obviously cannot match in this other case)
> return 0;
> /* otherwise we cannot say */
> return MATCHED_RECURSIVELY;
> }
>
> as the general code structure.
>
> Fully spelled out,
>
> if (flags & DO_MATCH_SUBMODULE) {
> /* Check if the name is a literal prefix of the pathspec */
> if (namelen < item->len &&
> item->match[namelen] == '/' &&
> !ps_strncmp(item, match, name, namelen))
> return MATCHED_RECURSIVE;
>
> /* Does the literal leading part have chance of matching? */
> if (item->nowildcard_len < item->len &&
> namelen <= item->nowildcard_len &&
> ps_strncmp(item, match, name, namelen))
> return 0; /* no way "su?/file" can match "sib" */
>
> /* Otherwise we cannot say */
> return MATCHED_RECURSIVELY;
> }
>
> or something like that.  There may be other "obviously cannot match"
> cases we may want to add further.
>
> Thanks.

You're right it should be structured the other way.


Re: [PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Junio C Hamano
Brandon Williams  writes:

> yes you mentioned this and I meant to change that before sending it out.
> Looks like it slipped through have slipped through.

I already fixed it up locally when I sent the reply, but thanks for
resending (which assures me that your local copy is up-to-date and I
do not have to worry about having to repeat me in the future, if
this ever needs further rerolling ;-).


Re: [PATCH 2/2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Junio C Hamano
Brandon Williams  writes:

> Pathspecs can be a bit tricky when trying to apply them to submodules.
> The main challenge is that the pathspecs will be with respect to the
> super module and not with respect to paths in the submodule.  The
> approach this patch takes is to pass in the identical pathspec from the
> super module to the submodule in addition to the submodule-prefix, which
> is the path from the root of the super module to the submodule, and then
> we can compare an entry in the submodule prepended with the
> submodule-prefix to the pathspec in order to determine if there is a
> match.
>
> This patch also permits the pathspec logic to perform a prefix match against
> submodules since a pathspec could refer to a file inside of a submodule.
> Due to limitations in the wildmatch logic, a prefix match is only done
> literally.  If any wildcard character is encountered we'll simply punt
> and produce a false positive match.  More accurate matching will be done
> once inside the submodule.  This is due to the super module not knowing
> what files could exist in the submodule.

Sounds sensible.  Just a minor nit in terminology, but I think we
fairly consistently say "a superproject contains submodules" (run
"git grep -E 'super *(module|project)'").

I'd suggest s/super module/superproject/ for consistency.

> diff --git a/dir.c b/dir.c
> index 0ea235f..9df6d36 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -207,8 +207,9 @@ int within_depth(const char *name, int namelen,
>   return 1;
>  }
>  
> -#define DO_MATCH_EXCLUDE   1
> -#define DO_MATCH_DIRECTORY 2
> +#define DO_MATCH_EXCLUDE   (1<<0)
> +#define DO_MATCH_DIRECTORY (1<<1)
> +#define DO_MATCH_SUBMODULE (1<<2)
>  
>  /*
>   * Does 'match' match the given name?
> @@ -283,6 +284,29 @@ static int match_pathspec_item(const struct 
> pathspec_item *item, int prefix,
>item->nowildcard_len - prefix))
>   return MATCHED_FNMATCH;
>  
> + /* Perform checks to see if "name" is a super set of the pathspec */
> + if (flags & DO_MATCH_SUBMODULE) {
> + /* Check if the name is a literal prefix of the pathspec */
> + if ((item->match[namelen] == '/') &&
> + !ps_strncmp(item, match, name, namelen))
> + return MATCHED_RECURSIVELY;

An example of this test would be to match pathspec "sub/file" with
submodule path "sub"?

item->match[namelen] is accessed without checking if item->match[]
is long enough here; shouldn't item->len be checked before doing
that?

> + /*
> +  * Here is where we would perform a wildmatch to check if
> +  * "name" can be matched as a directory (or a prefix) against
> +  * the pathspec.  Since wildmatch doesn't have this capability
> +  * at the present we have to punt and say that it is a match,
> +  * esentially returning a false positive (as long as "name"
> +  * matches upto the first wild character).
> +  * The submodules themselves will be able to perform more
> +  * accurate matching to determine if the pathspec matches.
> +  */
> + if (item->nowildcard_len < item->len &&
> + !ps_strncmp(item, match, name,
> + item->nowildcard_len - prefix))
> + return MATCHED_RECURSIVELY;

An example of this test would be to match pathspec "su?/file" with
submodule path "sub", where the substring up to nowildcard_len is
the leading literal string "su" that must match with the path (in
other words, a path "sib" will not match "su?/file").

> + }
> +

Hmph, isn't this the one that is allowed produce false positive but
cannot afford to give any false negative?  It feels a bit strange
that the code checks two cases where we can positively say that it
is worth descending into, and falling through would give "no this
will never match".  That sounds like invitation for false negatives.

IOW, I would have expected

if (flags & DO_MATCH_SUBMODULE) {
if (may match in this case)
return MATCHED_RECURSIVE;
if (may match in this other case)
return MATCHED_RECURSIVE;
...
if (obviously cannot match in this case)
return 0;
if (obviously cannot match in this other case)
return 0;
/* otherwise we cannot say */
return MATCHED_RECURSIVELY;
}

as the general code structure.

Fully spelled out,

if (flags & DO_MATCH_SUBMODULE) {
/* Check if the name is a literal prefix of the pathspec */
if (namelen < item->len &&
item->match[namelen] == '/' &&
!ps_strncmp(item, match, name, namelen))
return MATCHED_RECURSIVE;

/* Does the 

Re: v2.10.0: ls-tree exit status is always 0, this differs from ls(1)

2016-09-21 Thread Steffen Nurpmeso
Junio C Hamano  wrote:
 |Steffen Nurpmeso  writes:
 ...
 |Sorry, but I did not notice that there was an attached patch when I
 |was reading your response for the first time.  Risk of using an
 |attachment to e-mail ;-)
 |
 |I think this issue does not need a separate bullet point.  The
 |existing text says:
 ..
 |and what caused your surprise is already covered by the first bullet
 |point, if the reader knows what "patterns to match" means in Git's
 |command line tools; it just needs to be extended to be more
 |meaningful to those who don't, I think.
 |
 |How about rewriting the first bullet point like so instead:
 |
 |  - the behaviour is different from that of "/bin/ls" in that the
 |'' are actually patterns to match, e.g. so specifying
 |directory name (without `-r`) will behave differently, the order
 |of the arguments does not matter, and a '' argument that
 |does not match any path is not an error (i.e. if there is no
 |path that matches any pattern, nothing is shown in the output).

Not an error would have been an enlightenment to me.

But now i'm even getting nervous to read about patterns.
We have patterns for tags/remotes/branches, author/committer/grep
patterns, (most of those, maybe all today, with fixed string,
extended or basic regex), the git-grep patterns ("leading paths
match and glob(7) patterns are supported").  Is that all?
I would assume glob-style for ls-tree:

  ?0[steffen@wales ]$ git ls-tree HEAD `ls mime*`
  100644 blob ee47419c209da789b606ab6d979c22f4ae632712mime.c
  100644 blob 0cfe3766bd5f035eac06b728a4f63224455e13camime.types
  100644 blob 7d890df7553522691ed09f266ea7f9effb6a2f4emime_enc.c
  100644 blob 430e300d9a8887c5cd48d1cc63034168e47e9721mime_param.c
  100644 blob 0338a46d3247ea00b5bcedb2d82ff30fe5d18d48mime_parse.c
  100644 blob d62fa8defae27240a5ce81ad2239dd7f94b6c5c5mime_types.c
  ?0[steffen@wales ]$ git ls-tree HEAD 'mime*'
  ?0[steffen@wales ]$ git ls-tree HEAD 'mime.*'

No, ls-tree is not part of what i use every day, "Git's command
line tools" is (too) wide afield, in that sense.

Thank you (also in general, for git), and ciao from a country with
a pretty real autumn,

--steffen


[PATCH 1/2] ls-files: optionally recurse into submodules

2016-09-21 Thread Brandon Williams
Allow ls-files to recognize submodules in order to retrieve a list of
files from a repository's submodules.  This is done by forking off a
process to recursively call ls-files on all submodules. Also added a
submodule-prefix command in order to prepend paths to child processes.

Signed-off-by: Brandon Williams 
---
 Documentation/git-ls-files.txt | 11 +++-
 builtin/ls-files.c | 61 +
 t/t3007-ls-files-recurse-submodules.sh | 99 ++
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100755 t/t3007-ls-files-recurse-submodules.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 0d933ac..09e4449 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -18,7 +18,9 @@ SYNOPSIS
[--exclude-per-directory=]
[--exclude-standard]
[--error-unmatch] [--with-tree=]
-   [--full-name] [--abbrev] [--] [...]
+   [--full-name] [--recurse-submodules]
+   [--submodule-prefix=]
+   [--abbrev] [--] [...]
 
 DESCRIPTION
 ---
@@ -137,6 +139,13 @@ a space) at the start of each line:
option forces paths to be output relative to the project
top directory.
 
+--recurse-submodules::
+   Recursively calls ls-files on each submodule in the repository.
+   Currently there is only support for the --cached mode.
+
+--submodule-prefix=::
+   Prepend the provided path to the output of each file
+
 --abbrev[=]::
Instead of showing the full 40-byte hexadecimal object
lines, show only a partial prefix.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 00ea91a..ffd9ea6 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "string-list.h"
 #include "pathspec.h"
+#include "run-command.h"
 
 static int abbrev;
 static int show_deleted;
@@ -28,6 +29,8 @@ static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
+static const char *submodule_prefix;
+static int recurse_submodules;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -68,6 +71,21 @@ static void write_eolinfo(const struct cache_entry *ce, 
const char *path)
 static void write_name(const char *name)
 {
/*
+* NEEDSWORK: To make this thread-safe, full_name would have to be owned
+* by the caller.
+*
+* full_name get reused across output lines to minimize the allocation
+* churn.
+*/
+   static struct strbuf full_name = STRBUF_INIT;
+   if (submodule_prefix && *submodule_prefix) {
+   strbuf_reset(_name);
+   strbuf_addstr(_name, submodule_prefix);
+   strbuf_addstr(_name, name);
+   name = full_name.buf;
+   }
+
+   /*
 * With "--full-name", prefix_len=0; this caller needs to pass
 * an empty string in that case (a NULL is good for "").
 */
@@ -152,6 +170,26 @@ static void show_killed_files(struct dir_struct *dir)
}
 }
 
+/**
+ * Recursively call ls-files on a submodule
+ */
+static void show_gitlink(const struct cache_entry *ce)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   int status;
+
+   argv_array_push(, "ls-files");
+   argv_array_push(, "--recurse-submodules");
+   argv_array_pushf(, "--submodule-prefix=%s%s/",
+submodule_prefix ? submodule_prefix : "",
+ce->name);
+   cp.git_cmd = 1;
+   cp.dir = ce->name;
+   status = run_command();
+   if (status)
+   exit(status);
+}
+
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
int len = max_prefix_len;
@@ -163,6 +201,10 @@ static void show_ce_entry(const char *tag, const struct 
cache_entry *ce)
len, ps_matched,
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
return;
+   if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+   show_gitlink(ce);
+   return;
+   }
 
if (tag && *tag && show_valid_bit &&
(ce->ce_flags & CE_VALID)) {
@@ -468,6 +510,10 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
{ OPTION_SET_INT, 0, "full-name", _len, NULL,
N_("make the output relative to the project top 
directory"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL },
+   OPT_STRING(0, "submodule-prefix", _prefix,
+   N_("path"), N_("prepend  to each file")),
+   OPT_BOOL(0, "recurse-submodules", _submodules,
+   N_("recurse through submodules")),
OPT_BOOL(0, "error-unmatch", _unmatch,
N_("if any  is not in the index, 

Re: [PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Brandon Williams
yes you mentioned this and I meant to change that before sending it out.
Looks like it slipped through have slipped through.


Re: [PATCH 4/3] http: check curl_multi_remove_handle error code

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 10:29:59PM +, Eric Wong wrote:

> Jeff King  wrote:
> > On Wed, Sep 21, 2016 at 09:46:23PM +, Eric Wong wrote:
> > 
> > > ---8<---
> > > Subject: [PATCH] http: check curl_multi_remove_handle error code
> > > 
> > > This should help detect bugs in future changes.  While we're at
> > > it, fix a (probably innocuous) bug in our http_cleanup function
> > > for users of older curl.
> > > 
> > > curl_multi_remove_handle was not idempotent until curl 7.33.0
> > > with commit 84f3b3dd448399f9548468676e1bd1475dba8de5
> > > ("curl_multi_remove_handle: allow multiple removes"),
> > > so we track the "curlm" membership of the curl easy handle
> > > ourselves with a new "in_multi" flag.
> > 
> > Does curl provide a meaningful error here? I'm just wondering if we
> > could simply let curl handle this, and just ignore the error that comes
> > from older versions. We're basically just replicating curl's own state
> > data here.
> 
> curl before 7.33.0 returned CURLM_BAD_EASY_HANDLE.  This error
> code also happens if we pass a bad/corrupt easy handle;
> so it could be hiding an error on our end.

Hmm, yeah. So we would want to make it conditional on the version
anyway, so that we detected such bugs on more modern systems. That still
doesn't seem unreasonable to me, but your patch is also not very big, so
it's probably fine (the major risk with yours is simply that our state
and curl's state get out of sync).

-Peff


Re: [PATCH 4/3] http: check curl_multi_remove_handle error code

2016-09-21 Thread Eric Wong
Jeff King  wrote:
> On Wed, Sep 21, 2016 at 09:46:23PM +, Eric Wong wrote:
> 
> > ---8<---
> > Subject: [PATCH] http: check curl_multi_remove_handle error code
> > 
> > This should help detect bugs in future changes.  While we're at
> > it, fix a (probably innocuous) bug in our http_cleanup function
> > for users of older curl.
> > 
> > curl_multi_remove_handle was not idempotent until curl 7.33.0
> > with commit 84f3b3dd448399f9548468676e1bd1475dba8de5
> > ("curl_multi_remove_handle: allow multiple removes"),
> > so we track the "curlm" membership of the curl easy handle
> > ourselves with a new "in_multi" flag.
> 
> Does curl provide a meaningful error here? I'm just wondering if we
> could simply let curl handle this, and just ignore the error that comes
> from older versions. We're basically just replicating curl's own state
> data here.

curl before 7.33.0 returned CURLM_BAD_EASY_HANDLE.  This error
code also happens if we pass a bad/corrupt easy handle;
so it could be hiding an error on our end.


Re: [PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Junio C Hamano
Brandon Williams  writes:

> This is another version of the first ls-files patch i sent out in
> order.  In this version I fixed the option
>  name to be '--submodule-prefix'.

I understand that many internal changes in your work environment are
titled like "DOing X", but our convention around here is to label
them "DO X", as if you are giving an order to somebody else, either
telling the codebase "to be like so", or telling the patch-monkey
maintainer "to make it so".  So I'd suggest retitling it to

ls-files: optionally recurse into submodules

or something like that.  It is an added advantage of being a lot
more descriptive than "adding support", which does not say what kind
of support it is adding.


Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
fyi: I mistakenly did not include v2 in the subject of the last message.


Re: [PATCH 4/3] http: check curl_multi_remove_handle error code

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 09:46:23PM +, Eric Wong wrote:

> ---8<---
> Subject: [PATCH] http: check curl_multi_remove_handle error code
> 
> This should help detect bugs in future changes.  While we're at
> it, fix a (probably innocuous) bug in our http_cleanup function
> for users of older curl.
> 
> curl_multi_remove_handle was not idempotent until curl 7.33.0
> with commit 84f3b3dd448399f9548468676e1bd1475dba8de5
> ("curl_multi_remove_handle: allow multiple removes"),
> so we track the "curlm" membership of the curl easy handle
> ourselves with a new "in_multi" flag.

Does curl provide a meaningful error here? I'm just wondering if we
could simply let curl handle this, and just ignore the error that comes
from older versions. We're basically just replicating curl's own state
data here.

-Peff


[PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used,
and remove an unused parameter from gitweb_check_feature().

Signed-off-by: Ian Kelling 
---
 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 14 +++---
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', 
'-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax 
highlighter");
return $fd;
 }
@@ -7062,9 +7063,8 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   my $syntax = guess_file_syntax($highlight, $file_name);
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
$line = untabify($line);
printf qq!%4i %s\n!,
   $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-  $syntax ? sanitize($line) : esc_html($line, 
-nbsp=>1);
+ 

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote:
> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file.
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
>
> [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php

Search for the word shebang, it's mentioned twice.

>
> If this feature is available only for some version, or for some
> highlighters, gitweb would have to provide an option to configure
> it.  It might be an additional configuration variable, it might
> be a special value in the %highlight_basename or %highlight_ext.

Good question. It was added upstream in 2007, and I tested that it's
functioning in the earliest distros I have easy access to: ubuntu 14.04
and debian wheezy.

>
> >  To use highlight's shebang
> > detection, add highlight to the pipeline whenever highlight is enabled.
>
> This describes what this patch does, but the sentence feels
> a bit convoluted, as it is stated.
>

Agreed. I've changed it in v2 of the patch, and perhaps this will make
the rest of the patch clearer too. The new paragraph is:

The highlight binary can detect language by shebang when we can't
tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.



> >
> > Document the shebang detection and add a test which exercises it in
> > t/t9500-gitweb-standalone-no-errors.sh.
>
> Nice!
>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >
> > Notes:
> > I wondered if adding highlight to the pipeline would make viewing a blob
> > with no highlighting take longer but it did not on my computer. I found
> > no noticeable impact on small files and strangely, on a 159k file, it
> > took 7% less time averaged over several requests.
>
> Strange.  I would guess that invoking separate binary and perl would
> always
> add to the time (especially on operation systems where forking / running
> command is expensive... though those are not often used with web servers,
> isn't it).

I dug into this a little more, and I think it's because when we call
highlight, we later call sanitize() instead of esc_html(). sanitize() is
faster and makes up for the extra time highlight takes. I ran a test on
my machine calling sanitize and esc_html on each line of gitweb.perl 100
times: 7.4s for sanitize, 12.4s for esc_html.

>
> >
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 10 +-
> >  t/t9500-gitweb-standalone-no-errors.sh | 18 +-
> >  3 files changed, 32 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax `
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
>
> All right; in addition to expanding the docs, it also improves them.

Noted in v2 commit 

Re: [PATCH v4 3/3] regex: use regexec_buf()

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 08:24:14PM +0200, Johannes Schindelin wrote:

> The new regexec_buf() function operates on buffers with an explicitly
> specified length, rather than NUL-terminated strings.
> 
> We need to use this function whenever the buffer we want to pass to
> regexec() may have been mmap()ed (and is hence not NUL-terminated).
> 
> Note: the original motivation for this patch was to fix a bug where
> `git diff -G ` would crash. This patch converts more callers,
> though, some of which explicitly allocated and constructed
> NUL-terminated strings (or worse: modified read-only buffers to insert
> NULs).

Nice. I probably would have split these into their own patch, but I
think it is OK here.

> @@ -228,18 +227,16 @@ static long ff_regexp(const char *line, long len,
>   len--;
>   }
>  
> - line_buffer = xstrndup(line, len); /* make NUL terminated */
> -

Nice to see this one going away in particular, since it's called quite a
lot. According to perf, "git log -p" on git.git drops about 1.5 million
malloc calls (about 9% of the total). And here are best-of-five results
for that same command:

  [before]
  real0m14.676s
  user0m13.988s
  sys 0m0.676s

  [after]
  real0m14.394s
  user0m13.624s
  sys 0m0.760s

Not a _huge_ improvement, but more significant than the run-to-run
noise.

-Peff


Re: [PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Brandon Williams
This is another version of the first ls-files patch i sent out in
order.  In this version I fixed the option
 name to be '--submodule-prefix'.


[PATCH 1/2] ls-files: adding support for submodules

2016-09-21 Thread Brandon Williams
Allow ls-files to recognize submodules in order to retrieve a list of
files from a repository's submodules.  This is done by forking off a
process to recursively call ls-files on all submodules. Also added a
submodule-prefix command in order to prepend paths to child processes.

Signed-off-by: Brandon Williams 
---
 Documentation/git-ls-files.txt | 11 +++-
 builtin/ls-files.c | 61 +
 t/t3007-ls-files-recurse-submodules.sh | 99 ++
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100755 t/t3007-ls-files-recurse-submodules.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 0d933ac..09e4449 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -18,7 +18,9 @@ SYNOPSIS
[--exclude-per-directory=]
[--exclude-standard]
[--error-unmatch] [--with-tree=]
-   [--full-name] [--abbrev] [--] [...]
+   [--full-name] [--recurse-submodules]
+   [--submodule-prefix=]
+   [--abbrev] [--] [...]
 
 DESCRIPTION
 ---
@@ -137,6 +139,13 @@ a space) at the start of each line:
option forces paths to be output relative to the project
top directory.
 
+--recurse-submodules::
+   Recursively calls ls-files on each submodule in the repository.
+   Currently there is only support for the --cached mode.
+
+--submodule-prefix=::
+   Prepend the provided path to the output of each file
+
 --abbrev[=]::
Instead of showing the full 40-byte hexadecimal object
lines, show only a partial prefix.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 00ea91a..ffd9ea6 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "string-list.h"
 #include "pathspec.h"
+#include "run-command.h"
 
 static int abbrev;
 static int show_deleted;
@@ -28,6 +29,8 @@ static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
+static const char *submodule_prefix;
+static int recurse_submodules;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -68,6 +71,21 @@ static void write_eolinfo(const struct cache_entry *ce, 
const char *path)
 static void write_name(const char *name)
 {
/*
+* NEEDSWORK: To make this thread-safe, full_name would have to be owned
+* by the caller.
+*
+* full_name get reused across output lines to minimize the allocation
+* churn.
+*/
+   static struct strbuf full_name = STRBUF_INIT;
+   if (submodule_prefix && *submodule_prefix) {
+   strbuf_reset(_name);
+   strbuf_addstr(_name, submodule_prefix);
+   strbuf_addstr(_name, name);
+   name = full_name.buf;
+   }
+
+   /*
 * With "--full-name", prefix_len=0; this caller needs to pass
 * an empty string in that case (a NULL is good for "").
 */
@@ -152,6 +170,26 @@ static void show_killed_files(struct dir_struct *dir)
}
 }
 
+/**
+ * Recursively call ls-files on a submodule
+ */
+static void show_gitlink(const struct cache_entry *ce)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   int status;
+
+   argv_array_push(, "ls-files");
+   argv_array_push(, "--recurse-submodules");
+   argv_array_pushf(, "--submodule-prefix=%s%s/",
+submodule_prefix ? submodule_prefix : "",
+ce->name);
+   cp.git_cmd = 1;
+   cp.dir = ce->name;
+   status = run_command();
+   if (status)
+   exit(status);
+}
+
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
int len = max_prefix_len;
@@ -163,6 +201,10 @@ static void show_ce_entry(const char *tag, const struct 
cache_entry *ce)
len, ps_matched,
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
return;
+   if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+   show_gitlink(ce);
+   return;
+   }
 
if (tag && *tag && show_valid_bit &&
(ce->ce_flags & CE_VALID)) {
@@ -468,6 +510,10 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
{ OPTION_SET_INT, 0, "full-name", _len, NULL,
N_("make the output relative to the project top 
directory"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL },
+   OPT_STRING(0, "submodule-prefix", _prefix,
+   N_("path"), N_("prepend  to each file")),
+   OPT_BOOL(0, "recurse-submodules", _submodules,
+   N_("recurse through submodules")),
OPT_BOOL(0, "error-unmatch", _unmatch,
N_("if any  is not in the index, 

[PATCH 2/2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Brandon Williams
Pathspecs can be a bit tricky when trying to apply them to submodules.
The main challenge is that the pathspecs will be with respect to the
super module and not with respect to paths in the submodule.  The
approach this patch takes is to pass in the identical pathspec from the
super module to the submodule in addition to the submodule-prefix, which
is the path from the root of the super module to the submodule, and then
we can compare an entry in the submodule prepended with the
submodule-prefix to the pathspec in order to determine if there is a
match.

This patch also permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.
Due to limitations in the wildmatch logic, a prefix match is only done
literally.  If any wildcard character is encountered we'll simply punt
and produce a false positive match.  More accurate matching will be done
once inside the submodule.  This is due to the super module not knowing
what files could exist in the submodule.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c | 132 -
 dir.c  |  43 ++-
 dir.h  |   4 +
 t/t3007-ls-files-recurse-submodules.sh | 114 ++--
 4 files changed, 231 insertions(+), 62 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index ffd9ea6..fa4029e 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -177,12 +177,34 @@ static void show_gitlink(const struct cache_entry *ce)
 {
struct child_process cp = CHILD_PROCESS_INIT;
int status;
+   int i;
 
argv_array_push(, "ls-files");
argv_array_push(, "--recurse-submodules");
argv_array_pushf(, "--submodule-prefix=%s%s/",
 submodule_prefix ? submodule_prefix : "",
 ce->name);
+   /* add options */
+   if (show_eol)
+   argv_array_push(, "--eol");
+   if (show_valid_bit)
+   argv_array_push(, "-v");
+   if (show_stage)
+   argv_array_push(, "--stage");
+   if (show_cached)
+   argv_array_push(, "--cached");
+   if (debug_mode)
+   argv_array_push(, "--debug");
+
+   /*
+* Pass in the original pathspec args.  The submodule will be
+* responsible for prepending the 'submodule_prefix' prior to comparing
+* against the pathspec for matches.
+*/
+   argv_array_push(, "--");
+   for (i = 0; i < pathspec.nr; i++)
+   argv_array_push(, pathspec.items[i].original);
+
cp.git_cmd = 1;
cp.dir = ce->name;
status = run_command();
@@ -192,57 +214,62 @@ static void show_gitlink(const struct cache_entry *ce)
 
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
+   struct strbuf name = STRBUF_INIT;
int len = max_prefix_len;
+   if (submodule_prefix)
+   strbuf_addstr(, submodule_prefix);
+   strbuf_addstr(, ce->name);
 
if (len >= ce_namelen(ce))
die("git ls-files: internal error - cache entry not superset of 
prefix");
 
-   if (!match_pathspec(, ce->name, ce_namelen(ce),
-   len, ps_matched,
-   S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
-   return;
-   if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+   if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+   submodule_path_match(, name.buf, ps_matched)) {
show_gitlink(ce);
-   return;
-   }
+   } else if (match_pathspec(, name.buf, name.len,
+ len, ps_matched,
+ S_ISDIR(ce->ce_mode) ||
+ S_ISGITLINK(ce->ce_mode))) {
+   if (tag && *tag && show_valid_bit &&
+   (ce->ce_flags & CE_VALID)) {
+   static char alttag[4];
+   memcpy(alttag, tag, 3);
+   if (isalpha(tag[0]))
+   alttag[0] = tolower(tag[0]);
+   else if (tag[0] == '?')
+   alttag[0] = '!';
+   else {
+   alttag[0] = 'v';
+   alttag[1] = tag[0];
+   alttag[2] = ' ';
+   alttag[3] = 0;
+   }
+   tag = alttag;
+   }
 
-   if (tag && *tag && show_valid_bit &&
-   (ce->ce_flags & CE_VALID)) {
-   static char alttag[4];
-   memcpy(alttag, tag, 3);
-   if (isalpha(tag[0]))
-   alttag[0] = tolower(tag[0]);
-   else if (tag[0] == '?')
-   alttag[0] = '!';
- 

Re: [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 08:23:11PM +0200, Johannes Schindelin wrote:

> We solve this by introducing a helper, regexec_buf(), that takes a
> pointer and a length instead of a NUL-terminated string.
> 
> This helper then uses REG_STARTEND where available, and falls back to
> allocating and constructing a NUL-terminated string. Given the
> wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
> for Windows has it because it uses compat/regex/ that has it), I think
> this is a fair trade-off.

I did a double-take on this, but then read:

> Changes since v3:
> [...]
> - removed fallback when REG_STARTEND is not supported, in favor of
>   requiring NO_REGEX.

So I think we are all in agreement. :)

With the exception of a few commit message fixups that Junio already
pointed out, this looks good to me. Thanks.

-Peff


[PATCH 4/3] http: check curl_multi_remove_handle error code

2016-09-21 Thread Eric Wong
Junio C Hamano  wrote:
> Eric Wong  writes:
> > The key patch here is 3/3 which seems like an obvious fix to
> > adding the problem of adding a curl easy handle to a curl multi
> > handle repeatedly.
> 
> Yeah, sounds like the right thing to do and 2/3 makes it really easy
> to read the resulting code.
> 
> > I will investigate those failures in a week or two when I regain
> > regular computer access.
> 
> Thanks. Will tentatively queue on 'pu' and wait for updates.

I'm comfortable with the original 3 patch series in 'next'
and being merged to 'master' and 'maint', soon.

I don't think the following 4/3 is strictly necessary now, so
I'd be more comfortable with it being tested in 'pu' or 'next'
for a longer period.

(online today, but not much tomorrow or another few days after)

---8<---
Subject: [PATCH] http: check curl_multi_remove_handle error code

This should help detect bugs in future changes.  While we're at
it, fix a (probably innocuous) bug in our http_cleanup function
for users of older curl.

curl_multi_remove_handle was not idempotent until curl 7.33.0
with commit 84f3b3dd448399f9548468676e1bd1475dba8de5
("curl_multi_remove_handle: allow multiple removes"),
so we track the "curlm" membership of the curl easy handle
ourselves with a new "in_multi" flag.

Tested with curl 7.26.0 and 7.38.0 on Debian 7.x (wheezy) and
Debian 8.x (jessie) respectively.

Signed-off-by: Eric Wong 
---
 http.c | 12 ++--
 http.h |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index 82ed542..9f97749 100644
--- a/http.c
+++ b/http.c
@@ -204,7 +204,12 @@ static void finish_active_slot(struct active_request_slot 
*slot)
 static void xmulti_remove_handle(struct active_request_slot *slot)
 {
 #ifdef USE_CURL_MULTI
-   curl_multi_remove_handle(curlm, slot->curl);
+   CURLMcode code = curl_multi_remove_handle(curlm, slot->curl);
+
+   if (code != CURLM_OK)
+   die("curl_multi_remove_handle failed (%p): %s",
+   slot->curl, curl_multi_strerror(code));
+   slot->in_multi = 0;
 #endif
 }
 
@@ -888,7 +893,8 @@ void http_cleanup(void)
while (slot != NULL) {
struct active_request_slot *next = slot->next;
if (slot->curl != NULL) {
-   xmulti_remove_handle(slot);
+   if (slot->in_multi)
+   xmulti_remove_handle(slot);
curl_easy_cleanup(slot->curl);
}
free(slot);
@@ -965,6 +971,7 @@ struct active_request_slot *get_active_slot(void)
newslot = xmalloc(sizeof(*newslot));
newslot->curl = NULL;
newslot->in_use = 0;
+   newslot->in_multi = 0;
newslot->next = NULL;
 
slot = active_queue_head;
@@ -1033,6 +1040,7 @@ int start_active_slot(struct active_request_slot *slot)
slot->in_use = 0;
return 0;
}
+   slot->in_multi = 1;
 
/*
 * We know there must be something to do, since we just added
diff --git a/http.h b/http.h
index 5ab9d9c..3339d70 100644
--- a/http.h
+++ b/http.h
@@ -60,6 +60,7 @@ struct slot_results {
 struct active_request_slot {
CURL *curl;
int in_use;
+   int in_multi;
CURLcode curl_result;
long http_code;
int *finished;
-- 
EW



Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Junio C Hamano
Johannes Sixt  writes:

> But I came to a different conclusion as I said in a message that
> crossed yours. I hope Thomas can pick up the baton again.

Yeah, our mails crossed, apparently, and I do agree with your
reasoning.  How about this, then?

-- >8 -- 
From: Johannes Sixt 
Date: Tue, 20 Sep 2016 08:18:25 +0200
Subject: [PATCH] t3700-add: do not check working tree file mode without 
POSIXPERM

A recently introduced test checks the result of 'git status' after
setting the executable bit on a file. This check does not yield the
expected result when the filesystem does not support the executable
bit.

What we care about is that a file added with "--chmod=+x" has
executable bit in the index and that "--chmod=+x" (or any other
options for that matter) does not muck with working tree files.
The former is tested by other existing tests, so let's check the
latter more explicitly and only under POSIXPERM prerequisite.

Signed-off-by: Johannes Sixt 
Signed-off-by: Junio C Hamano 
---
 t/t3700-add.sh | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 16ab2da..924a266 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -361,13 +361,11 @@ test_expect_success 'git add --chmod=[+-]x changes index 
with already added file
test_mode_in_index 100644 xfoo3
 '
 
-test_expect_success 'file status is changed after git add --chmod=+x' '
-   echo "AM foo4" >expected &&
+test_expect_success POSIXPERM 'git add --chmod=[+-]x does not change the 
working tree' '
echo foo >foo4 &&
git add foo4 &&
git add --chmod=+x foo4 &&
-   git status -s foo4 >actual &&
-   test_cmp expected actual
+   ! test -x foo4
 '
 
 test_expect_success 'no file status change if no pathspec is given' '
-- 
2.10.0-515-g9036219



RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-21 Thread Ben Peart
I understand the reluctance to change the existing behavior of the "git 
checkout -b " command.

I see this as a tradeoff between taking advantage of the muscle memory for the 
existing command and coming up with a new shortcut command and training people 
to use it instead.

The fact that all the use cases we've observed and all the git test cases 
actually produce the same results but significantly faster with that change in 
behavior made me hope we could redefine the command to take advantage of the 
muscle memory.

That said, you're much more on the frontline of receiving negative feedback 
about doing that than I am. :)  How would you like to proceed?

Ben

-Original Message-
From: Junio C Hamano [mailto:gits...@pobox.com] 
Sent: Monday, September 19, 2016 1:04 PM
To: Ben Peart 
Cc: pclo...@gmail.com; Ben Peart ; git@vger.kernel.org
Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial 
checkout

Junio C Hamano  writes:

>> "git checkout -b foo" (without -f -m or ) is defined in 
>> the manual as being a shortcut for/equivalent to:
>>  
>> (1a) "git branch foo"
>> (1b) "git checkout foo"
>>  
>> However, it has been our experience in our observed use cases and all 
>> the existing git tests, that it can be treated as equivalent to:
>>  
>> (2a) "git branch foo"
>> (2b) "git symbolic-ref HEAD refs/heads/foo"
>> ...
>
> I am still not sure if I like the change of what "checkout -b" is this 
> late in the game, though.

Having said all that.

I do see the merit of having a shorthand way to invoke your 2 above.
It is just that I am not convinced that it is the best way to achieve that goal 
to redefine what "git checkout -b " (no other parameters) does.


Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Johannes Sixt

Am 21.09.2016 um 22:47 schrieb Junio C Hamano:

-test_expect_success 'file status is changed after git add --chmod=+x' '
-   echo "AM foo4" >expected &&
+test_expect_success 'git add --chmod=[+-]x changes index with newly added 
file' '
echo foo >foo4 &&
git add foo4 &&
git add --chmod=+x foo4 &&
-   git status -s foo4 >actual &&
-   test_cmp expected actual
+   test_mode_in_index 100755 foo4
 '


No, that's redundant. There are plenty of other test cases that check 
for this. You could just remove the case.


But I came to a different conclusion as I said in a message that crossed 
yours. I hope Thomas can pick up the baton again.


-- Hannes



Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Johannes Sixt

Am 21.09.2016 um 20:12 schrieb Junio C Hamano:

Thomas Gummerer  writes:


 I am surprised that add --chmod=+x changes only the index, but not
 the file on disk!?!


I *think* --chmod is mainly thought of as a convenience for git users
on a filesystem that doesn't have an executable flag.  So it was
introduced this way as the permissions on the file system don't matter
in that case.


OK.


 A change of that behaviour may make sense for this
though.


Hm, git-add is for moving content from the worktree to the index. I 
don't think it should change the worktree. I should not have been surprised.


It should probably not even introduce a change in the index that it does 
not see in the worktree, but, hey, git-add is a reasonable porcelain 
substitute for the --chmod task that otherwise git-update-index would 
have to do.



Perhaps we shouldn't even test this, then?


If I am right that git-add should not change the worktree, it should be 
tested. But 'git status -s' is probably the wrong tool for the reasons 
you gave (it could accidentally detect a change due to content 
difference instead of a file mode difference). At any rate, such a test 
must be protected with POSIXPERM.


-- Hannes



Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> ...  Comparing the index with the working tree using "status"
> is probably not how you would want to do so.  A future breakage may
> cause the indexed blob name to change by mistake, and status would
> happily report difference but you would not notice its output is
> saying "Hey, they are different between the index and the working
> tree", while you are expecting ONLY the change in the executable bit.

In other words, how about doing it this way?

-- >8 --
From: Johannes Sixt 
Date: Tue, 20 Sep 2016 08:18:25 +0200
Subject: [PATCH] t3700-add: do not rely on executable bit of the working tree 
file

A recently introduced test checks the result of 'git status' after
setting the executable bit on a file. This check does not yield the
expected result when the filesystem does not support the executable
bit.

The primary thing we care about is that a file added with --chmod=+x
has executable bit in the index, not that it is registered in the
index somehow differently from what is in the working tree, which
is what the "status" output tries to catch.  Let's check what we
care about, i.e. the path is marked as an executable in the index.

Signed-off-by: Johannes Sixt 
Signed-off-by: Junio C Hamano 
---
 t/t3700-add.sh | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 16ab2da..1a733bb 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -361,13 +361,11 @@ test_expect_success 'git add --chmod=[+-]x changes index 
with already added file
test_mode_in_index 100644 xfoo3
 '
 
-test_expect_success 'file status is changed after git add --chmod=+x' '
-   echo "AM foo4" >expected &&
+test_expect_success 'git add --chmod=[+-]x changes index with newly added 
file' '
echo foo >foo4 &&
git add foo4 &&
git add --chmod=+x foo4 &&
-   git status -s foo4 >actual &&
-   test_cmp expected actual
+   test_mode_in_index 100755 foo4
 '
 
 test_expect_success 'no file status change if no pathspec is given' '
-- 
2.10.0-515-g9036219



Re: v2.10.0: ls-tree exit status is always 0, this differs from ls(1)

2016-09-21 Thread Junio C Hamano
Steffen Nurpmeso  writes:

> diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
> index dbc91f9..8ebeced 100644
> --- a/Documentation/git-ls-tree.txt
> +++ b/Documentation/git-ls-tree.txt
> @@ -33,6 +33,10 @@ in the current working directory.  Note that:
> However, the current working directory can be ignored by passing
> --full-tree option.
>  
> + - the behaviour is different to that of "/bin/ls" in sofar as non-existing
> +   '' arguments are silently ignored and not reflected in the exit
> +   status code.
> +
>  OPTIONS
>  ---
>  ::

Sorry, but I did not notice that there was an attached patch when I
was reading your response for the first time.  Risk of using an
attachment to e-mail ;-)

I think this issue does not need a separate bullet point.  The
existing text says:

Note that:

 - the behaviour is slightly different from that of "/bin/ls" in that the
   '' denotes just a list of patterns to match, e.g. so specifying
   directory name (without `-r`) will behave differently, and order of the
   arguments does not matter.

 - the behaviour is similar to that of "/bin/ls" in that the '' is
   taken as relative to the current working directory.  E.g. when you are
   in a directory 'sub' that has a directory 'dir', you can run 'git
   ls-tree -r HEAD dir' to list the contents of the tree (that is
   'sub/dir' in `HEAD`).  You don't want to give a tree that is not at the
   root level (e.g. `git ls-tree -r HEAD:sub dir`) in this case, as that
   would result in asking for 'sub/sub/dir' in the `HEAD` commit.
   However, the current working directory can be ignored by passing
   --full-tree option.

and what caused your surprise is already covered by the first bullet
point, if the reader knows what "patterns to match" means in Git's
command line tools; it just needs to be extended to be more
meaningful to those who don't, I think.

How about rewriting the first bullet point like so instead:

  - the behaviour is different from that of "/bin/ls" in that the
'' are actually patterns to match, e.g. so specifying
directory name (without `-r`) will behave differently, the order
of the arguments does not matter, and a '' argument that
does not match any path is not an error (i.e. if there is no
path that matches any pattern, nothing is shown in the output).


Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 8:28 PM, Jakub Narębski  wrote:
> W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze:
>> It would make some code like git_print_log() a bit more complex /
>> fragile, since it would have to work on multi-line strings, but
>> anything that needed to do a regex match / replacement would be much
>> faster.
>
> Would it?  Did you perform any synthetic micro-benchmark?

No, just experience. With the caveat that this may not matter at all
in this context, C-like code in Perl is slow, if you can offload
things to one big regex operation it's usually faster.

>>
>> But OTOH I think perhaps we're worrying about nothing when it comes to
>> the performance. I haven't been able to make gitweb display more than
>> a 100 or so commits at a time (haven't found where exactly in the code
>> these limits are), any munging we do on the log messages would have to
>> be pretty damn slow to matter.
>
> sub git_log_generic {
>
> # [...]
>
> my @commitlist =
> parse_commits($commit_hash, 101, (100 * $page),
>   defined $file_name ? ($file_name, 
> "--full-history") : ());
>
> Here you have it (it probably should be a constant; this number can be
> found in a few other places).

Thanks!


Re: [PATCH v4 3/3] regex: use regexec_buf()

2016-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> The new regexec_buf() function operates on buffers with an explicitly
>> specified length, rather than NUL-terminated strings.
>>
>> We need to use this function whenever the buffer we want to pass to
>> regexec() may have been mmap()ed (and is hence not NUL-terminated).
>>
>> Note: the original motivation for this patch was to fix a bug where
>> `git diff -G ` would crash. This patch converts more callers,
>> though, some of which explicitly allocated and constructed
>> NUL-terminated strings (or worse: modified read-only buffers to insert
>> NULs).

Also, I think there is nobody that modified read-only buffer.
diffgrep_consume() does say "Yuck -- line ought to be const",
but its "line" parameter is actually a non-const exactly for
this yuckiness (iow, it knew what it was doing).

Perhaps like so?

regex: use regexec_buf()

The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G ` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf() they have become
much cleaner.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Junio C Hamano 

The patch text looked good to me.

Thanks.


Re: v2.10.0: ls-tree exit status is always 0, this differs from ls(1)

2016-09-21 Thread Steffen Nurpmeso
Hello.

Junio C Hamano  wrote:
 |Steffen Nurpmeso  writes:
 |> I think this behaviour contradicts the manual which strongly links
 |> ls-tree to ls(1):
 |
 |Patches to the documentation is very much welcomed.

The below could serve this purpose.

 |Somewhere the similarity must end, and actually it ends a lot
 |earlier, as "/bin/ls" takes exact paths while "ls-tree" (or any
 |other Git command for that matter) takes a pathspec pattern,
 |and not having a path that matches the pathspec pattern is not
 |an error condition.

I was just surprised to see nothing and get no feedback at all.
Ciao!

--steffen
diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index dbc91f9..8ebeced 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -33,6 +33,10 @@ in the current working directory.  Note that:
However, the current working directory can be ignored by passing
--full-tree option.
 
+ - the behaviour is different to that of "/bin/ls" in sofar as non-existing
+   '' arguments are silently ignored and not reflected in the exit
+   status code.
+
 OPTIONS
 ---
 ::


Re: [PATCH v4 3/3] regex: use regexec_buf()

2016-09-21 Thread Junio C Hamano
Johannes Schindelin  writes:

> The new regexec_buf() function operates on buffers with an explicitly
> specified length, rather than NUL-terminated strings.
>
> We need to use this function whenever the buffer we want to pass to
> regexec() may have been mmap()ed (and is hence not NUL-terminated).
>
> Note: the original motivation for this patch was to fix a bug where
> `git diff -G ` would crash. This patch converts more callers,
> though, some of which explicitly allocated and constructed
> NUL-terminated strings (or worse: modified read-only buffers to insert
> NULs).
>
> Some of the buffers actually may be NUL-terminated. As regexec_buf()
> uses REG_STARTEND where available, but has to fall back to allocating
> and constructing NUL-terminated strings where REG_STARTEND is not
> available, this makes the code less efficient in the latter case.
>
> However, given the widespread support for REG_STARTEND, combined with
> the improved ease of code maintenance, we strike the balance in favor
> of REG_STARTEND.

The last paragraph can go (2/3 was already justified separately),
and the paragraph before that needs rewording, as you no longer do
the "duplicate, run regexec, and free" dance.

Will comment on the patch text itself later.

Thanks for following it through.  This topic actually fell under my
radar until now.


Re: [PATCH v4 2/3] regex: add regexec_buf() that can work on a non NUL-terminated string

2016-09-21 Thread Junio C Hamano
Johannes Schindelin  writes:

> ...
> Happily, there is an extension to regexec() introduced by the NetBSD
> project and present in all major regex implementation including
> Linux', MacOSX' and the one Git includes in compat/regex/: by using
> the (non-POSIX) REG_STARTEND flag, it is possible to tell the
> regexec() function that it should only look at the offsets between
> pmatch[0].rm_so and pmatch[0].rm_eo.
>
> That is exactly what we need.

Wonderful.

> Since support for REG_STARTEND is so widespread by now, let's just
> introduce a helper function that uses it, and fall back to allocating
> and constructing a NUL-terminated when REG_STARTEND is not available.

I'd somehow reword the last paragraph here, though ;-)

Since support for REG_STARTEND is so widespread by now, let's just
introduce a helper function that always uses it, and tell people
on a platform whose regex library does not support it to use the
one from our compat/regex/ directory.

The patch itself looks very sane.  Thanks.


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 19:58, Ævar Arnfjörð Bjarmason pisze:
> On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski  wrote:
>> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:

>>> +(?>> +[A-Za-z0-9.-]+
>>> +(?!\.) # refs can't end with ".", see check_refname_format()
>>
>> If we can assume that tag name is at least two characters (instead of
>> at least one character), we could get rid of those extended regexp
>> lookaround assertions:
>>
>>   (?>   (?!pattern)  - zero-width negative lookahead  assertion
>>
>> That is:
>>
>>   +[A-Za-z0-9.]   # see strbuf_check_tag_ref(). Tags can't start 
>> with -
>>   +[A-Za-z0-9.-]*
>>   +[A-Za-z0-9-]   # refs can't end with ".", see 
>> check_refname_format()
> 
> Why get rid of them? I'm all for improving the regex, there's bound to
> be lots of bugs in it, but since it's perl we can freely use its
> extended features.

Ah, all right. I was wondering how zero width assertions / patterns
interact with each other, but zero-width negative lookaround assertions
are really quite simple.

> 
>> Also, the canonical documentation for what is allowed in refnames
>> is git-check-ref-format(1)... though it does not look like it includes
>> "tags cannot start with '-'".
> 
> Yeah, looks like that manpage needs to be patched.

Right.

> 
>> Anyway, perhaps 'is it valid refname' could be passed to a subroutine,
>> or a named regexp (which might be more involved, like disallowing two
>> consecutive dots, e.g. "(?!.*\.{2})" at beginning).

I wonder if rules for valid tag name can be described in extended
regexp, and if it is, how readable would it be.

Regards,
-- 
Jakub Narębski



Re: [PATCH v2 2/3] init: do not set core.worktree more often than necessary

2016-09-21 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> When "git init" is called with GIT_WORK_TREE environment set, we want to
> keep this worktree's location in core.worktree so the user does not have
> to set the environment again and again. See ef6f0af (git-init: set
> core.worktree if GIT_WORK_TREE is specified - 2007-07-04)
>
> We detect that by this logic (in needs_work_tree_config): normally
> worktree's top dir would contains ".git" directory, if this is not true,

s/contains/contain/;

> worktree is probably set to elsewhere by the user.
>
> Unfortunately when it calls get_git_dir() it does not take ".git" files
> into account. When we find a .git file, we immediately follow the file
> until we find the real ".git" directory. The location of this first
> ".git" file is lost.
>
> The .git file would satisfy the logic above and not create
> core.worktree (correct). But because the final .git's location is used,
> needs_work_tree_config() is misled and creates core.worktree anyway.

The above explanation makes it sound like the correct fix belongs to
needs_work_tree_config(), though.

I am starting to wonder if what ef6f0af did was misguided and we are
better off without setting core.worktree ourselves, but that is a
different issue.

> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index d5d7558..0d5cc76 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -23,6 +23,7 @@ static int init_is_bare_repository = 0;
>  static int init_shared_repository = -1;
>  static const char *init_db_template_dir;
>  static const char *git_link;
> +static const char *original_git_dir;
>  
>  static void copy_templates_1(struct strbuf *path, struct strbuf *template,
>DIR *dir)
> @@ -263,7 +264,7 @@ static int create_default_files(const char *template_path)
>   /* allow template config file to override the default */
>   if (log_all_ref_updates == -1)
>   git_config_set("core.logallrefupdates", "true");
> - if (needs_work_tree_config(get_git_dir(), work_tree))
> + if (needs_work_tree_config(original_git_dir, work_tree))
>   git_config_set("core.worktree", work_tree);
>   }
>  
> @@ -314,6 +315,8 @@ static void create_object_directory(void)
>  int set_git_dir_init(const char *git_dir, const char *real_git_dir,
>int exist_ok)
>  {
> + original_git_dir = xstrdup(real_path(git_dir));
> +
>   if (real_git_dir) {
>   struct stat st;

The function being extern bothers me.  The create_default_files()
function, which is the only thing consumes this variable, is called
only from init_db(), and I'd prefer to see some way to guarantee
that everybody who calls init_db() calls set_git_dir_init()
beforehand.  Right now, cmd_init_db() and cmd_clone() are the only
ones that call init_db() and they both call set_dir_git_init(); if a
new caller starts calling init_db() and forgets to call the other
one, that caller will be buggy.

Perhaps a comment before init_db() to tell callers to always call
the other one is the least thing necessary?

> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index 488564b..b8fc588 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -400,9 +400,11 @@ test_expect_success 're-init from a linked worktree' '
>   test_commit first &&
>   git worktree add ../linked-worktree &&
>   mv .git/info/exclude expected-exclude &&
> + cp .git/config expected-config &&
>   find .git/worktrees -print | sort >expected &&
>   git -C ../linked-worktree init &&
>   test_cmp expected-exclude .git/info/exclude &&
> + test_cmp expected-config .git/config &&
>   find .git/worktrees -print | sort >actual &&
>   test_cmp expected actual
>   )


[PATCH v4 1.5/3] fixup! regex: -G feeds a non NUL-terminated string to regexec() and fails

2016-09-21 Thread Johannes Schindelin
Sorry, I should have squashed that in, but I *just* noticed the conflict
with 433860f (diff: improve positioning of add/delete blocks in diffs,
2016-09-05).

---
 t/{t4061-diff-pickaxe.sh => t4062-diff-pickaxe.sh} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename t/{t4061-diff-pickaxe.sh => t4062-diff-pickaxe.sh} (100%)

diff --git a/t/t4061-diff-pickaxe.sh b/t/t4062-diff-pickaxe.sh
similarity index 100%
rename from t/t4061-diff-pickaxe.sh
rename to t/t4062-diff-pickaxe.sh
-- 
2.10.0.windows.1.10.g803177d




Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze:
> On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski  wrote:
> 
>> P.S. I have reworking of commit message parsing and enhancement in my
>> long, long and dated gitweb TODO list :-(
> 
> Anything specific you could share?

Some of TODO I would have to bring from backups, as the computer on
which I did majority of gitweb development has since died (from old
age).

The list includes:
- implement caching of gitweb output
- revamp handling of encoding (UTF-8 with fallback encoding)
- split gitweb into modules, while maintaining ease of install
- refactor handling of diffs
- better handling of config files
- document URI structure, perhaps revamp URI parsing and generation
- make commit message transformation generic
  (see below)

> 
> One thing that would be a lot faster in Perl is if we didn't have to
> pass the log around as split-up lines and could just operate on it as
> one big string.

Well, there are a few transformations that commit message undergoes
in gitweb, including linking SHA1, optional linking of bug numbers
to bug tracker, and syntax highlighting of signoff lines (trailer
lines).  

I would like to have this cleaned up, and refactored.  With all
those transformations we would need to keep account which parts
are HTML, and which not and need escaping (note: URI escape !=
HTML escape).

> 
> It would make some code like git_print_log() a bit more complex /
> fragile, since it would have to work on multi-line strings, but
> anything that needed to do a regex match / replacement would be much
> faster.

Would it?  Did you perform any synthetic micro-benchmark?

> 
> But OTOH I think perhaps we're worrying about nothing when it comes to
> the performance. I haven't been able to make gitweb display more than
> a 100 or so commits at a time (haven't found where exactly in the code
> these limits are), any munging we do on the log messages would have to
> be pretty damn slow to matter.

sub git_log_generic {

# [...]

my @commitlist =
parse_commits($commit_hash, 101, (100 * $page),
  defined $file_name ? ($file_name, 
"--full-history") : ());

Here you have it (it probably should be a constant; this number can be
found in a few other places).

Best,
-- 
Jakub Narębski



[PATCH v4 2/3] regex: add regexec_buf() that can work on a non NUL-terminated string

2016-09-21 Thread Johannes Schindelin
We just introduced a test that demonstrates that our sloppy use of
regexec() on a mmap()ed area can result in incorrect results or even
hard crashes.

So what we need to fix this is a function that calls regexec() on a
length-delimited, rather than a NUL-terminated, string.

Happily, there is an extension to regexec() introduced by the NetBSD
project and present in all major regex implementation including
Linux', MacOSX' and the one Git includes in compat/regex/: by using
the (non-POSIX) REG_STARTEND flag, it is possible to tell the
regexec() function that it should only look at the offsets between
pmatch[0].rm_so and pmatch[0].rm_eo.

That is exactly what we need.

Since support for REG_STARTEND is so widespread by now, let's just
introduce a helper function that uses it, and fall back to allocating
and constructing a NUL-terminated when REG_STARTEND is not available.

Signed-off-by: Johannes Schindelin 
---
 Makefile  |  3 ++-
 git-compat-util.h | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index df4f86b..c6f7f66 100644
--- a/Makefile
+++ b/Makefile
@@ -301,7 +301,8 @@ all::
 # crashes due to allocation and free working on different 'heaps'.
 # It's defined automatically if USE_NED_ALLOCATOR is set.
 #
-# Define NO_REGEX if you have no or inferior regex support in your C library.
+# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
+# feature.
 #
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
diff --git a/git-compat-util.h b/git-compat-util.h
index 37cce07..8aab0c3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -977,6 +977,19 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #define qsort git_qsort
 #endif
 
+#ifndef REG_STARTEND
+#error "Git requires REG_STARTEND support. Compile with NO_REGEX=NeedsStartEnd"
+#endif
+
+static inline int regexec_buf(const regex_t *preg, const char *buf, size_t 
size,
+ size_t nmatch, regmatch_t pmatch[], int eflags)
+{
+   assert(nmatch > 0 && pmatch);
+   pmatch[0].rm_so = 0;
+   pmatch[0].rm_eo = size;
+   return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
+}
+
 #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 # define FORCE_DIR_SET_GID S_ISGID
 #else
-- 
2.10.0.windows.1.10.g803177d




[PATCH v4 3/3] regex: use regexec_buf()

2016-09-21 Thread Johannes Schindelin
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec() may have been mmap()ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G ` would crash. This patch converts more callers,
though, some of which explicitly allocated and constructed
NUL-terminated strings (or worse: modified read-only buffers to insert
NULs).

Some of the buffers actually may be NUL-terminated. As regexec_buf()
uses REG_STARTEND where available, but has to fall back to allocating
and constructing NUL-terminated strings where REG_STARTEND is not
available, this makes the code less efficient in the latter case.

However, given the widespread support for REG_STARTEND, combined with
the improved ease of code maintenance, we strike the balance in favor
of REG_STARTEND.

Signed-off-by: Johannes Schindelin 
---
 diff.c  |  3 ++-
 diffcore-pickaxe.c  | 18 --
 grep.c  | 14 ++
 t/t4061-diff-pickaxe.sh |  2 +-
 xdiff-interface.c   | 13 -
 5 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/diff.c b/diff.c
index c6da383..fb99235 100644
--- a/diff.c
+++ b/diff.c
@@ -952,7 +952,8 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t 
*word_regex,
 {
if (word_regex && *begin < buffer->size) {
regmatch_t match[1];
-   if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+   if (!regexec_buf(word_regex, buffer->ptr + *begin,
+buffer->size - *begin, 1, match, 0)) {
char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 55067ca..9795ca1 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -23,7 +23,6 @@ static void diffgrep_consume(void *priv, char *line, unsigned 
long len)
 {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
-   int hold;
 
if (line[0] != '+' && line[0] != '-')
return;
@@ -33,11 +32,8 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 * caller early.
 */
return;
-   /* Yuck -- line ought to be "const char *"! */
-   hold = line[len];
-   line[len] = '\0';
-   data->hit = !regexec(data->regexp, line + 1, 1, , 0);
-   line[len] = hold;
+   data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
+, 0);
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
@@ -50,9 +46,11 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xdemitconf_t xecfg;
 
if (!one)
-   return !regexec(regexp, two->ptr, 1, , 0);
+   return !regexec_buf(regexp, two->ptr, two->size,
+   1, , 0);
if (!two)
-   return !regexec(regexp, one->ptr, 1, , 0);
+   return !regexec_buf(regexp, one->ptr, one->size,
+   1, , 0);
 
/*
 * We have both sides; need to run textual diff and see if
@@ -83,8 +81,8 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, 
kwset_t kws)
regmatch_t regmatch;
int flags = 0;
 
-   assert(data[sz] == '\0');
-   while (*data && !regexec(regexp, data, 1, , flags)) {
+   while (*data &&
+  !regexec_buf(regexp, data, sz, 1, , flags)) {
flags |= REG_NOTBOL;
data += regmatch.rm_eo;
if (*data && regmatch.rm_so == regmatch.rm_eo)
diff --git a/grep.c b/grep.c
index d7d00b8..1194d35 100644
--- a/grep.c
+++ b/grep.c
@@ -898,17 +898,6 @@ static int fixmatch(struct grep_pat *p, char *line, char 
*eol,
}
 }
 
-static int regmatch(const regex_t *preg, char *line, char *eol,
-   regmatch_t *match, int eflags)
-{
-#ifdef REG_STARTEND
-   match->rm_so = 0;
-   match->rm_eo = eol - line;
-   eflags |= REG_STARTEND;
-#endif
-   return regexec(preg, line, 1, match, eflags);
-}
-
 static int patmatch(struct grep_pat *p, char *line, char *eol,
regmatch_t *match, int eflags)
 {
@@ -919,7 +908,8 @@ static int patmatch(struct grep_pat *p, char *line, char 
*eol,
else if (p->pcre_regexp)
hit = !pcrematch(p, line, eol, match, eflags);
else
-   hit = !regmatch(>regexp, line, eol, match, eflags);
+   hit = !regexec_buf(>regexp, line, eol - line, 1, match,
+  eflags);

[PATCH v4 1/3] regex: -G feeds a non NUL-terminated string to regexec() and fails

2016-09-21 Thread Johannes Schindelin
When our pickaxe code feeds file contents to regexec(), it implicitly
assumes that the file contents are read into implicitly NUL-terminated
buffers (i.e. that we overallocate by 1, appending a single '\0').

This is not so.

In particular when the file contents are simply mmap()ed, we can be
virtually certain that the buffer is preceding uninitialized bytes, or
invalid pages.

Note that the test we add here is known to be flakey: we simply cannot
know whether the byte following the mmap()ed ones is a NUL or not.

Typically, on Linux the test passes. On Windows, it fails virtually
every time due to an access violation (that's a segmentation fault for
you Unix-y people out there). And Windows would be correct: the
regexec() call wants to operate on a regular, NUL-terminated string,
there is no NUL in the mmap()ed memory range, and it is undefined
whether the next byte is even legal to access.

When run with --valgrind it demonstrates quite clearly the breakage, of
course.

Being marked with `test_expect_failure`, this test will sometimes be
declare "TODO fixed", even if it only passes by mistake.

This test case represents a Minimal, Complete and Verifiable Example of
a breakage reported by Chris Sidi.

Signed-off-by: Johannes Schindelin 
---
 t/t4061-diff-pickaxe.sh | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 t/t4061-diff-pickaxe.sh

diff --git a/t/t4061-diff-pickaxe.sh b/t/t4061-diff-pickaxe.sh
new file mode 100755
index 000..5929f2e
--- /dev/null
+++ b/t/t4061-diff-pickaxe.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+#
+# Copyright (c) 2016 Johannes Schindelin
+#
+
+test_description='Pickaxe options'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   test_commit initial &&
+   printf "%04096d" 0 >4096-zeroes.txt &&
+   git add 4096-zeroes.txt &&
+   test_tick &&
+   git commit -m "A 4k file"
+'
+test_expect_failure '-G matches' '
+   git diff --name-only -G "^0{4096}$" HEAD^ >out &&
+   test 4096-zeroes.txt = "$(cat out)"
+'
+
+test_done
-- 
2.10.0.windows.1.10.g803177d




[PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-21 Thread Johannes Schindelin
[Cc:ing Benjamin Kramer & René Scharfe because they both worked on
the REG_STARTEND code in grep.c that I replace in this iteration of the
patch series]

This patch series addresses a problem where `git diff` is called using
`-G` or `-S --pickaxe-regex` on new-born files that are configured
without user diff drivers, and that hence get mmap()ed into memory.

The problem with that: mmap()ed memory is *not* NUL-terminated, yet the
pickaxe code calls regexec() on it just the same.

This problem has been reported by my colleague Chris Sidi.

We solve this by introducing a helper, regexec_buf(), that takes a
pointer and a length instead of a NUL-terminated string.

This helper then uses REG_STARTEND where available, and falls back to
allocating and constructing a NUL-terminated string. Given the
wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
for Windows has it because it uses compat/regex/ that has it), I think
this is a fair trade-off.

Changes since v3:

- reworded the onelines as per Junio's suggestions.

- removed fallback when REG_STARTEND is not supported, in favor of
  requiring NO_REGEX.

- removed the regmatch() function from grep.c, in favor of using
  regexec_buf().


Johannes Schindelin (3):
  regex: -G feeds a non NUL-terminated string to regexec() and
fails
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: use regexec_buf()

 Makefile|  3 ++-
 diff.c  |  3 ++-
 diffcore-pickaxe.c  | 18 --
 git-compat-util.h   | 13 +
 grep.c  | 14 ++
 t/t4061-diff-pickaxe.sh | 22 ++
 xdiff-interface.c   | 13 -
 7 files changed, 53 insertions(+), 33 deletions(-)
 create mode 100755 t/t4061-diff-pickaxe.sh

Published-As: https://github.com/dscho/git/releases/tag/mmap-regexec-v4
Fetch-It-Via: git fetch https://github.com/dscho/git mmap-regexec-v4

Interdiff vs v3:

 diff --git a/Makefile b/Makefile
 index df4f86b..c6f7f66 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -301,7 +301,8 @@ all::
  # crashes due to allocation and free working on different 'heaps'.
  # It's defined automatically if USE_NED_ALLOCATOR is set.
  #
 -# Define NO_REGEX if you have no or inferior regex support in your C library.
 +# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 +# feature.
  #
  # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
  # user.
 diff --git a/git-compat-util.h b/git-compat-util.h
 index 627ec5f..8aab0c3 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -977,25 +977,17 @@ void git_qsort(void *base, size_t nmemb, size_t size,
  #define qsort git_qsort
  #endif
  
 +#ifndef REG_STARTEND
 +#error "Git requires REG_STARTEND support. Compile with 
NO_REGEX=NeedsStartEnd"
 +#endif
 +
  static inline int regexec_buf(const regex_t *preg, const char *buf, size_t 
size,
  size_t nmatch, regmatch_t pmatch[], int eflags)
  {
 -#ifdef REG_STARTEND
assert(nmatch > 0 && pmatch);
pmatch[0].rm_so = 0;
pmatch[0].rm_eo = size;
return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
 -#else
 -  char *buf2 = xmalloc(size + 1);
 -  int ret;
 -
 -  memcpy(buf2, buf, size);
 -  buf2[size] = '\0';
 -  ret = regexec(preg, buf2, nmatch, pmatch, eflags);
 -  free(buf2);
 -
 -  return ret;
 -#endif
  }
  
  #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 diff --git a/grep.c b/grep.c
 index d7d00b8..1194d35 100644
 --- a/grep.c
 +++ b/grep.c
 @@ -898,17 +898,6 @@ static int fixmatch(struct grep_pat *p, char *line, char 
*eol,
}
  }
  
 -static int regmatch(const regex_t *preg, char *line, char *eol,
 -  regmatch_t *match, int eflags)
 -{
 -#ifdef REG_STARTEND
 -  match->rm_so = 0;
 -  match->rm_eo = eol - line;
 -  eflags |= REG_STARTEND;
 -#endif
 -  return regexec(preg, line, 1, match, eflags);
 -}
 -
  static int patmatch(struct grep_pat *p, char *line, char *eol,
regmatch_t *match, int eflags)
  {
 @@ -919,7 +908,8 @@ static int patmatch(struct grep_pat *p, char *line, char 
*eol,
else if (p->pcre_regexp)
hit = !pcrematch(p, line, eol, match, eflags);
else
 -  hit = !regmatch(>regexp, line, eol, match, eflags);
 +  hit = !regexec_buf(>regexp, line, eol - line, 1, match,
 + eflags);
  
return hit;
  }

-- 
2.10.0.windows.1.10.g803177d

base-commit: f6727b0509ec3417a5183ba6e658143275a734f5

Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Johannes Schindelin
Hi Junio,

On Wed, 21 Sep 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> * jk/rebase-i-drop-ident-check (2016-07-29) 1 commit
> >>   (merged to 'next' on 2016-08-14 at 6891bcd)
> >>  + rebase-interactive: drop early check for valid ident
> >> 
> >>  Even when "git pull --rebase=preserve" (and the underlying "git
> >>  rebase --preserve") can complete without creating any new commit
> >>  (i.e. fast-forwards), it still insisted on having a usable ident
> >>  information (read: user.email is set correctly), which was less
> >>  than nice.  As the underlying commands used inside "git rebase"
> >>  would fail with a more meaningful error message and advice text
> >>  when the bogus ident matters, this extra check was removed.
> >> 
> >>  Will hold to see if people scream.
> >>  cf. <20160729224944.ga23...@sigill.intra.peff.net>
> >
> > Let's do this.
> 
> We have already been doing it (i.e. "hold to see if people scream")
> for some time.

I meant: let's merge this to `master`.

> Does it conflict with your effort to reimplement "rebase -i" in C

I do not think so.

> to keep this in 'next'?  Do you want it to move to 'master'?  I was
> under the impression that it would not make a difference to have or not
> have this patch once your reimplementation gets merged (meaning: the
> removal of the three lines will be done by wholesale removal of
> git-rebase--interactive.sh done the endgame of your series), so...

Oh, I failed to make clear that my patch series do *not* remove
git-rebase--interactive.sh. I just barely started to work to that end.
While the speed improvements are quite noticable, the rebase--helper
command still only implements the performance-critical code paths in C.

There is quite a bit of work left to do before git-rebase--interactive.sh
can be retired:

- --root is not handled via the sequencer yet,

- --preserve-merges is not handled either [*1*],

- the shell script still sets up the state directory,

- option parsing is still all-shell,

- probably more tasks I forgot.

The good news is that these parts can be converted independently from each
other, and even by independent developers (hint, hint ;-)).

Ciao,
Dscho

Footnote *1*: I am not sure that I want to port -p to C: in my view, this
is a failed experiment, to be replaced with a design based on my Git
garden shears. I tend to think that that part should be moved to a new
shell script ("git-rebase--preserve-merges.sh"?) unless some developer
other than me feels strongly enough to put their money where their mouth
is and teach the sequencer about it.


Re: [PATCH v2 0/3] Fix git-init in linked worktrees

2016-09-21 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> v2 requires jk/setup-sequence-update so I could kill my workaround
> patch and avoid conflicts in t0001. And:
>
>  - 1/3 has a few missing words back in its commit message
>  - 2/3, which was 3/3 in v1, no longer has the ugly hacky
>get_first_git_dir()
>  - 3/3 is a new tiny code improvement after the new 2/3
>
> Nguyễn Thái Ngọc Duy (3):
>   init: correct re-initialization from a linked worktree
>   init: do not set core.worktree more often than necessary
>   init: reuse original_git_dir in set_git_dir_init()

Thanks. Will take a look.


Re: git-gui, was Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-09-21 Thread Junio C Hamano
Vasco Almeida  writes:

> I have sent some git-gui patches on May this year and I think it will
> add value to accepted them at some point:

Yeah, they may be of value, but the thing is, I am not really in the
position to review or apply them (I don't do git-gui).

If Pat is not going to return, we would need to find volunteers to
be maintainers of "git-gui" first.

Thanks.  I may get to these patches when/if I find time, but it is
not likely to happen very soon.



Re: [PATCH] git-check-ref-format.txt: fixup documentation

2016-09-21 Thread Junio C Hamano
Elia Pinto  writes:

> die is not a standard shell function. Use
> a different shell code for the example.

Thanks.


Re: [PATCH tg/add-chmod+x-fix 2/2] t3700-add: protect one --chmod=+x test with POSIXPERM

2016-09-21 Thread Junio C Hamano
Thomas Gummerer  writes:

>>  I am surprised that add --chmod=+x changes only the index, but not
>>  the file on disk!?!
>
> I *think* --chmod is mainly thought of as a convenience for git users
> on a filesystem that doesn't have an executable flag.  So it was
> introduced this way as the permissions on the file system don't matter
> in that case.  A change of that behaviour may make sense for this
> though.

Perhaps we shouldn't even test this, then?  I can see future people
wanting to change this behaviour, while I can see argument for not
touching the working tree file, too.  It is essential for the main
purpose of the command (i.e. "I want to flip the executable bit for
the path in the index") to make sure that the bit in the index is
changed.  Comparing the index with the working tree using "status"
is probably not how you would want to do so.  A future breakage may
cause the indexed blob name to change by mistake, and status would
happily report difference but you would not notice its output is
saying "Hey, they are different between the index and the working
tree", while you are expecting ONLY the change in the executable bit.

How about doing

git add foo4 &&
git add --chmod=+x foo4 &&
test_mode_in_index 100755 foo4

instead?

>
>>  t/t3700-add.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
>> index 16ab2da..13e0dd2 100755
>> --- a/t/t3700-add.sh
>> +++ b/t/t3700-add.sh
>> @@ -361,7 +361,7 @@ test_expect_success 'git add --chmod=[+-]x changes index 
>> with already added file
>>  test_mode_in_index 100644 xfoo3
>>  '
>>  
>> -test_expect_success 'file status is changed after git add --chmod=+x' '
>> +test_expect_success POSIXPERM 'file status is changed after git add 
>> --chmod=+x' '
>>  echo "AM foo4" >expected &&
>>  echo foo >foo4 &&
>>  git add foo4 &&
>> -- 
>> 2.10.0.85.gea34e30
>> 


Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski  wrote:

> P.S. I have reworking of commit message parsing and enhancement in my
> long, long and dated gitweb TODO list :-(

Anything specific you could share?

One thing that would be a lot faster in Perl is if we didn't have to
pass the log around as split-up lines and could just operate on it as
one big string.

It would make some code like git_print_log() a bit more complex /
fragile, since it would have to work on multi-line strings, but
anything that needed to do a regex match / replacement would be much
faster.

But OTOH I think perhaps we're worrying about nothing when it comes to
the performance. I haven't been able to make gitweb display more than
a 100 or so commits at a time (haven't found where exactly in the code
these limits are), any munging we do on the log messages would have to
be pretty damn slow to matter.

> P.P.S. Kay Sievers no longer works on gitweb, and I think no longer
> works at SuSE but at RedHat.

Yup, been getting bounces from his address.


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Junio C Hamano
Jakub Narębski  writes:

>> When I saw 2/3 I wondered about one thing and 3/3 shares the same,
>> which is that we only use regex match and do not validate for a
>> false match.  Would it be too expensive...
>
> It's a matter of balance between false positives (and unresolving
> links) and performance...

Yes, and that is why I asked a simple yes-or-no question.  Would it
be too expensive?  Your answer seems to be yes.

Have we measured?  Is that really a bottleneck?  Would it help to
update parse_commits to call a new command "gitweb--helper" that
produces the result of what git_print_log would have done to its
$log argument, for example?



Re: v2.9.3 and v2.10.0: `name-ref' HEAD gives wrong branch name

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 18:37, Junio C Hamano pisze:
> Jakub Narębski  writes:
>>> Have you tried "git symbolic-ref HEAD"?
>>>
>>> $ git symbolic-ref HEAD
>>> refs/heads/master
>>>
>>> If you don't want the fully-qualified ref, you can add --short:
>>>
>>> $ git symbolic-ref --short HEAD
>>> master
>>
>> This does not work for detached HEAD, but perhaps you don't need
>> to worry about this.
> 
> I am not sure what you mean by "does not work".  Asking what ref
> HEAD points at to symbolic-ref will tell you it does not point at
> anything by exiting with non-zero status and that can be relied
> upon.
> 
> Asking "symbolic-ref HEAD" has been the way how "git branch" and
> other commands find out what branch is currently checked out for
> almost eternity ("git symbolic-ref" appeared in Git v0.99.8).

I'm sorry, I was wrong saying "does not work".  The problem is not
that it does not work, but that you need to take care of exit 
condition to correctly support detached HEAD state, and not end
with empty name for a branch, for example...

But if handled correctly, git-symbolic-ref is as good as git-rev-parse
as a plumbing to resolve current branch name.

Best,
-- 
Jakub Narębski



Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski <jna...@gmail.com> wrote:
> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:
>
>> Change the log formatting function to know about "git describe" output
>> like v2.8.0-4-g867ad08 in addition to just plain 867ad08.
>
> All right, that is a good plan.
>
>>
>> This also fixes a micro-regression in my change of the minimum SHA1
>> length from 8 to 7, which is that dated tags like
>> hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921"
>> part was a commit.
>
> Actually 20160921 is 8 characters, so assuming that '-' is treated
> as word boundary by Perl, it is not a regression; this false positive
> was there.  The new feature would help, instead of linking false match
> it links whole git-describe output.
>
> So this paragraph needs to be changed wrt. the above.

Yeah I just miscounted there, will change that.

> Note that there are quite a bit of shortened SHA-1 that are composed
> entirely from digits, without a-f characters.

*nod*

>>
>> There are still many valid refnames that we don't link to
>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>> similarly it's trivially possible to create some refnames like
>> "æ/var-gf6727b0" or whatever which won't be picked up by this regex.
>
> Hopefully hierarchical tags are rare.  We need to reduce false
> positives.
>
>>
>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
>
> Note that we do not ask Git at the time of displaying commit message
> if the link is valid for performance reasons; we link it, and the link
> may be invalid if it was a false positive.
>
> Note that recommended way to refer to other commit in commit mesages
> is (see Documentation/SubmittingPatches):
>
>   If you want to reference a previous commit in the history of a stable
>   branch, use the format "abbreviated sha1 (subject, date)",
>   with the subject enclosed in a pair of double-quotes, like this:
>
>  Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
>  noticed that ...
>
> Hmmm... this makes previous commit even more important.
>
>> ---
>>  gitweb/gitweb.perl | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 101dbc0..3a52bc7 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>   my $line = shift;
>>
>>   $line = esc_html($line, -nbsp=>1);
>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> + $line =~ s{
>> +\b
>> +(
>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>> +# or hadoop-20160921-113441-20-g094fb7d
>
> All right, for more complex regular expressions using in-line comments
> (extended regexp in Perl) is a good idea.
>
>> +(?> +[A-Za-z0-9.-]+
>> +(?!\.) # refs can't end with ".", see check_refname_format()
>
> If we can assume that tag name is at least two characters (instead of
> at least one character), we could get rid of those extended regexp
> lookaround assertions:
>
>   (?   (?!pattern)  - zero-width negative lookahead  assertion
>
> That is:
>
>   +[A-Za-z0-9.]   # see strbuf_check_tag_ref(). Tags can't start 
> with -
>   +[A-Za-z0-9.-]*
>   +[A-Za-z0-9-]   # refs can't end with ".", see 
> check_refname_format()

Why get rid of them? I'm all for improving the regex, there's bound to
be lots of bugs in it, but since it's perl we can freely use its
extended features.

> Also, the canonical documentation for what is allowed in refnames
> is git-check-ref-format(1)... though it does not look like it includes
> "tags cannot start with '-'".

Yeah, looks like that manpage needs to be patched.

> Anyway, perhaps 'is it valid refname' could be passed to a subroutine,
> or a named regexp (which might be more involved, like disallowing two
> consecutive dots, e.g. "(?!.*\.{2})" at beginning).
>
>> +-g[0-9a-fA-F]{7,40}
>

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 18:38, Junio C Hamano pisze:
> Jakub Narębski  writes:
>> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>>
>>> The highlight binary can detect language by shebang when we can't tell
>>> the syntax type by the name of the file. 
>>
>> Was it something always present among highlight[1] binary capabilities,
>> or is it something present only in new enough highlight app?  Or only
>> in some specific fork / specific binary?  I couldn't find language
>> detection in highlight[1] documentation...
>> ...
>> Thank you for your work on this patch,
> 
> Thanks for reviewing.  It seems that there will be further exchange
> needed before I can pick it up?

Yes, I think so.

-- 
Jakub Narębski



Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Brandon Williams  writes:
>
>> On a similar but slightly different note.  In general do we want
>> the pathspec '??b' to match against the sib/ directory and
>> subsequently have ls-files print all entries inside of the sib/
>> directory?  (this is in the non-recursive case)
>
> I'd need to find time to dig a bit of history before I can give a
> firm opinion on this, but here is a knee-jerk version of my reaction.

In the context of what you are doing, i.e. "ls-files that recurses
into submodules", my opinion is that "ls-files --recurse-submodules"
should behave wrt pathspecs AS IF all the submodule contents are
flattened into a single index of the superproject.

In the sample scenario under discussion, i.e.

In the superproject we have these
$ git ls-files -s
100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0   .gitmodules
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sib/file
16 1f5a0695289c500f25e7fa55e3ad27e394d1206b 0   sub

In 'sub' submodule we have this
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   file

such a flattend index would look like this:

100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0   .gitmodules
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sib/file
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sub/file

i.e. removing 'sub' submodule entry from the index of the
superproject and overlay everything in the submodule with sub/
prefixed to its path.

And with such an index, if and only if a path matches a pathspec,
"git ls-files --recurse-submodules" run at the toplevel with the
same pathspec should show the path.  That means

$ git ls-files --recurse-submodules '??b'

would show nothing (not even 'sub'), while

$ git ls-files --recurse-submodules '??b*'

should show sib/file and sub/file.  That is because that is how the
command without "--recurse-submodules" working on that flat index
would produce.

The "we have historically two kinds of pathspecs and they differ how
they work with wildcard" is a separate issue, I would think, even
though the result would affect what should happen in the above
example (i.e. if we said "either a pattern match or a literal match
to a leading directory path should make everything underneath
match", '??b' would make sib/ and sub/ to be
shown).


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 18:50, Junio C Hamano pisze:
> Ævar Arnfjörð Bjarmason   writes:
> 
>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
> 
> When I saw 2/3 I wondered about one thing and 3/3 shares the same,
> which is that we only use regex match and do not validate for a
> false match.  Would it be too expensive to pick up what _looks_ like
> a rev (e.g. hex or g(refname regexp)-hex) then validate it with
> "rev-parse --verify --quiet" to make sure it is a rev, before
> actually making it a link?  Even if are we trying to account for
> people referring to commits that do not exist in this repository
> (e.g. some other project, in a submodule repository, or just an
> earlier incarnation of rebasing that has since been lost), it seems
> to me that it does not help to mark them with a link that won't
> resolve.

I think it could be a good *option*, but revision verification
could be costly, for example in the 'log' view with multiple commits
and multiple revision-like looking candidates, even if we were able
to do it with one command.

Also, "git rev-parse --verify [--quiet]" can verify only one
revision at once, isn't it? Maybe something like 'git cat-file
--batch-check' would be better (one fork)?

It's a matter of balance between false positives (and unresolving
links) and performance...

Best,
-- 
Jakub Narębski



Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Kevin Daudt
On Wed, Sep 21, 2016 at 10:36:57AM -0700, Junio C Hamano wrote:
> Kevin Daudt  writes:
> 
> > On Mon, Sep 19, 2016 at 04:30:34PM -0700, Junio C Hamano wrote:
> >> 
> >> * kd/mailinfo-quoted-string (2016-09-19) 2 commits
> >>  - mailinfo: unescape quoted-pair in header fields
> >>  - t5100-mailinfo: replace common path prefix with variable
> >
> > Is this good enough, or do you want me to look into the feedback from
> > jeff?
> 
> If you are talking about the simplified loop that deliberately sets
> a rule that is looser than RFC, yes, I'd like to see you at least
> consider the pros and cons of his approach, which looked nicer to my
> brief reading of it.
> 
> It is perfectly OK by me (it may not be so if you ask Peff) if you
> decide that your version is better after doing so, though.
> 
> Thanks.

Alright, I'll look into it.


Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Junio C Hamano
Kevin Daudt  writes:

> On Mon, Sep 19, 2016 at 04:30:34PM -0700, Junio C Hamano wrote:
>> 
>> * kd/mailinfo-quoted-string (2016-09-19) 2 commits
>>  - mailinfo: unescape quoted-pair in header fields
>>  - t5100-mailinfo: replace common path prefix with variable
>
> Is this good enough, or do you want me to look into the feedback from
> jeff?

If you are talking about the simplified loop that deliberately sets
a rule that is looser than RFC, yes, I'd like to see you at least
consider the pros and cons of his approach, which looked nicer to my
brief reading of it.

It is perfectly OK by me (it may not be so if you ask Peff) if you
decide that your version is better after doing so, though.

Thanks.


Re: [PATCH v3 0/3] handle multiline in-body headers

2016-09-21 Thread Junio C Hamano
Jonathan Tan  writes:

> With the above change, it is actually no longer necessary to make
> is_scissors_line take plain char * (the second patch) - I think that
> that patch still improves the code, but let me know if you want me to
> remove it from this patch set.

I agree with you that it is an independently good change.  Let's
keep it.

Overall looked very good.  Thanks, will queue.


Re: [PATCH 1/6] i18n: commit: mark message for translation

2016-09-21 Thread Junio C Hamano
Jean-Noël AVILA  writes:

> Signed-off-by: Vasco Almeida 
> Signed-off-by: Jean-Noel Avila 
> ---
>
> Instead of distillating change requests, I'd better do it by
> myself. Here is the reworked version of the patch.
>

It would have helped if you had an in-body header to retitle this
patch for 3/6, instead of leaving it as 1/6 for "commit" X-<.

Will tweak and fit it in.  Thanks.

>  diff.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c6da383..494f723 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -55,6 +55,11 @@ static char diff_colors[][COLOR_MAXLEN] = {
>   GIT_COLOR_NORMAL,   /* FUNCINFO */
>  };
>  
> +static NORETURN void die_want_option(const char *option_name)
> +{
> + die(_("option '%s' requires a value"), option_name);
> +}
> +
>  static int parse_diff_color_slot(const char *var)
>  {
>   if (!strcasecmp(var, "context") || !strcasecmp(var, "plain"))
> @@ -3325,7 +3330,7 @@ void diff_setup_done(struct diff_options *options)
>   if (options->output_format & DIFF_FORMAT_NO_OUTPUT)
>   count++;
>   if (count > 1)
> - die("--name-only, --name-status, --check and -s are mutually 
> exclusive");
> + die(_("--name-only, --name-status, --check and -s are mutually 
> exclusive"));
>  
>   /*
>* Most of the time we can say "there are changes"
> @@ -3521,7 +3526,7 @@ static int stat_opt(struct diff_options *options, const 
> char **av)
>   if (*arg == '=')
>   width = strtoul(arg + 1, , 10);
>   else if (!*arg && !av[1])
> - die("Option '--stat-width' requires a value");
> + die_want_option("--stat-width");
>   else if (!*arg) {
>   width = strtoul(av[1], , 10);
>   argcount = 2;
> @@ -3530,7 +3535,7 @@ static int stat_opt(struct diff_options *options, const 
> char **av)
>   if (*arg == '=')
>   name_width = strtoul(arg + 1, , 10);
>   else if (!*arg && !av[1])
> - die("Option '--stat-name-width' requires a 
> value");
> + die_want_option("--stat-name-width");
>   else if (!*arg) {
>   name_width = strtoul(av[1], , 10);
>   argcount = 2;
> @@ -3539,7 +3544,7 @@ static int stat_opt(struct diff_options *options, const 
> char **av)
>   if (*arg == '=')
>   graph_width = strtoul(arg + 1, , 10);
>   else if (!*arg && !av[1])
> - die("Option '--stat-graph-width' requires a 
> value");
> + die_want_option("--stat-graph-width");
>   else if (!*arg) {
>   graph_width = strtoul(av[1], , 10);
>   argcount = 2;
> @@ -3548,7 +3553,7 @@ static int stat_opt(struct diff_options *options, const 
> char **av)
>   if (*arg == '=')
>   count = strtoul(arg + 1, , 10);
>   else if (!*arg && !av[1])
> - die("Option '--stat-count' requires a value");
> + die_want_option("--stat-count");
>   else if (!*arg) {
>   count = strtoul(av[1], , 10);
>   argcount = 2;


Re: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 7:14 PM, Jakub Narębski  wrote:
> W dniu 21.09.2016 o 16:17, Ævar Arnfjörð Bjarmason napisał:
>> On Wed, Sep 21, 2016 at 3:33 PM, Jakub Narębski  wrote:
>>> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:
> [...]
>
 -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
 +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>>>
>>> Nb. I wonder how common is use of XHTML nowadays, with HTML5 as standard...
>>
>> It's sent to modern browsers, I noticed it because when doing the rest
>> of the patches in the series the slightest mistake in the HTML syntax
>> would cause the page not to render in Chrome, because
>> application/xml+xhtml activates its anal parsing mode.
>
> What I wanted to say is if we should support XHTML mimetype at all;
> the future is HTML5 and perhaps gitweb should always use 'text/html'.

Regardless of what MIME type we'd normally use, as long as browsers
support application/xml+xhtml developing with it is very handy,
because you get the instant equivalent of compile errors for your
HTML, as opposed to the usual behavior of "oh this doesn't parse, but
let's try to make sense of it anyway", which often leads to fruitless
debugging sessions just because you forgot to close some tag or
quotation.


Re: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 16:17, Ævar Arnfjörð Bjarmason napisał:
> On Wed, Sep 21, 2016 at 3:33 PM, Jakub Narębski  wrote:
>> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:
[...]

>>> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
>>> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>>
>> Nb. I wonder how common is use of XHTML nowadays, with HTML5 as standard...
> 
> It's sent to modern browsers, I noticed it because when doing the rest
> of the patches in the series the slightest mistake in the HTML syntax
> would cause the page not to render in Chrome, because
> application/xml+xhtml activates its anal parsing mode.

What I wanted to say is if we should support XHTML mimetype at all;
the future is HTML5 and perhaps gitweb should always use 'text/html'.

But this is unrelated to this change...
-- 
Jakub Narębski



Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Junio C Hamano
Brandon Williams  writes:

> On a similar but slightly different note.  In general do we want
> the pathspec '??b' to match against the sib/ directory and
> subsequently have ls-files print all entries inside of the sib/
> directory?  (this is in the non-recursive case)

I'd need to find time to dig a bit of history before I can give a
firm opinion on this, but here is a knee-jerk version of my reaction.

 * A pathspec element that matches literally to a directory causes
   itself and everything underneath the directory match that
   element, e.g. "sib" would be considered a match.

 * Otherwise, a pathspec that matches with the whole path as a
   pattern matches the path, e.g. "??b" would match "sib" itself,
   but not "sib/file".  Note that "??b*" would match "sib" and
   "sib/file" because the pattern match is without FNM_PATNAME
   unless ':(glob)' magic is in effect.

Historically, some commands treated a pathspec as purely a prefix
match (i.e. the former) and did not use _any_ pattern matching,
while other commands did both of the above two (e.g. compare ls-tree
and ls-files).  I thought we were slowly moving towards unifying
them, but apparently 'git log -- "D?cumentation"' does not show
anything close to what 'git log -- Documentation' gives us even in
today's Git.

Probably we want to change it at some point so that a pattern that
matches one leading directory would cause everything underneath to
match, e.g. "??b" would include "sib/file" just because "sib" would.




Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:

> Change the log formatting function to know about "git describe" output
> like v2.8.0-4-g867ad08 in addition to just plain 867ad08.

All right, that is a good plan.

> 
> This also fixes a micro-regression in my change of the minimum SHA1
> length from 8 to 7, which is that dated tags like
> hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921"
> part was a commit.

Actually 20160921 is 8 characters, so assuming that '-' is treated
as word boundary by Perl, it is not a regression; this false positive
was there.  The new feature would help, instead of linking false match
it links whole git-describe output.

So this paragraph needs to be changed wrt. the above.

Note that there are quite a bit of shortened SHA-1 that are composed
entirely from digits, without a-f characters.

> 
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or whatever which won't be picked up by this regex.

Hopefully hierarchical tags are rare.  We need to reduce false
positives.

> 
> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.

Note that we do not ask Git at the time of displaying commit message
if the link is valid for performance reasons; we link it, and the link
may be invalid if it was a false positive.

Note that recommended way to refer to other commit in commit mesages
is (see Documentation/SubmittingPatches):

  If you want to reference a previous commit in the history of a stable
  branch, use the format "abbreviated sha1 (subject, date)",
  with the subject enclosed in a pair of double-quotes, like this:

 Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
 noticed that ...

Hmmm... this makes previous commit even more important.

> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 101dbc0..3a52bc7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d

All right, for more complex regular expressions using in-line comments
(extended regexp in Perl) is a good idea.

> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()

If we can assume that tag name is at least two characters (instead of
at least one character), we could get rid of those extended regexp
lookaround assertions:

  (? +-g[0-9a-fA-F]{7,40}

If we are limiting to git-describe output, we can get rid of A-F here.

> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b
> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }
> 

Good work. 

I assume that you are using git-describe output in commit messages
a lot, isn't it?
-- 
Jakub Narębski



Re: v2.10.0: ls-tree exit status is always 0, this differs from ls(1)

2016-09-21 Thread Junio C Hamano
Steffen Nurpmeso  writes:

> I think this behaviour contradicts the manual which strongly links
> ls-tree to ls(1):

Patches to the documentation is very much welcomed.

Somewhere the similarity must end, and actually it ends a lot
earlier, as "/bin/ls" takes exact paths while "ls-tree" (or any
other Git command for that matter) takes a pathspec pattern,
and not having a path that matches the pathspec pattern is not
an error condition.

Thanks.


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  <ava...@gmail.com> writes:

> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.

When I saw 2/3 I wondered about one thing and 3/3 shares the same,
which is that we only use regex match and do not validate for a
false match.  Would it be too expensive to pick up what _looks_ like
a rev (e.g. hex or g(refname regexp)-hex) then validate it with
"rev-parse --verify --quiet" to make sure it is a rev, before
actually making it a link?  Even if are we trying to account for
people referring to commits that do not exist in this repository
(e.g. some other project, in a submodule repository, or just an
earlier incarnation of rebasing that has since been lost), it seems
to me that it does not help to mark them with a link that won't
resolve.

> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 101dbc0..3a52bc7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +    # The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b
> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }


Re: [PATCH v1] travis-ci: ask homebrew for the its path instead of hardcoding it

2016-09-21 Thread Junio C Hamano
Lars Schneider  writes:

>> On 21 Sep 2016, at 11:31, stefan.na...@atlas-elektronik.com wrote:
>> 
>> In the Subject: s/the //
>> 
>> Am 21.09.2016 um 10:45 schrieb larsxschnei...@gmail.com:
>>> From: Lars Schneider 
>>> 
>>> The TravisCI macOS build is broken because homebrew (a macOS depedency
>> 
>> s/depedency/dependency/
>
> Thanks for spotting both errors!
>
> @Junio: Should I make a v2?

No.  osx before_install stuff was in there since the very beginning,
i.e. 522354d7 ("Add Travis CI support", 2015-11-27), so I guess this
needs to go to maint-2.7 and upwards, but I guess we should discourage
people to stay on an older maintenance track forever, so let's do
this only for 'maint' and upwards.


Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Junio C Hamano
Jakub Narębski  writes:

> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
>> The highlight binary can detect language by shebang when we can't tell
>> the syntax type by the name of the file. 
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
> ...
> Thank you for your work on this patch,

Thanks for reviewing.  It seems that there will be further exchange
needed before I can pick it up?


Re: v2.9.3 and v2.10.0: `name-ref' HEAD gives wrong branch name

2016-09-21 Thread Junio C Hamano
Jakub Narębski  writes:

>> Have you tried "git symbolic-ref HEAD"?
>> 
>> $ git symbolic-ref HEAD
>> refs/heads/master
>> 
>> If you don't want the fully-qualified ref, you can add --short:
>> 
>> $ git symbolic-ref --short HEAD
>> master
>
> This does not work for detached HEAD, but perhaps you don't need
> to worry about this.

I am not sure what you mean by "does not work".  Asking what ref
HEAD points at to symbolic-ref will tell you it does not point at
anything by exiting with non-zero status and that can be relied
upon.

Asking "symbolic-ref HEAD" has been the way how "git branch" and
other commands find out what branch is currently checked out for
almost eternity ("git symbolic-ref" appeared in Git v0.99.8).



Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:

> Subject: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

This is modification of a feature, not a new feature it sounds like.
I think the following title / subject would be better:

  Subject: [PATCH 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

>
> Change the minimum length of a commit we'll link to from 8 to 7.

I think it would read better as:

  Change the minimum length of an abbreviated object identifier in the
  commit message gitweb tries to turn into link from 8 hexchars to 7.

> 
> This arbitrary minimum length of 8 was introduced in
> v1.4.4.2-151-gbfe2191, but as seen in e.g. v1.7.4-1-gdce9648 the
> default abbreviation length is 7.

Right. I wonder why it was 8 in gitweb...

> 
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.

There is an additional problem: the shorter SHA1 abbrev we try to
match, the more possibility of false positives, words that only look
like (shortened SHA-1).

For 7 characters there is at last one word that can be mistaken
for SHA1 abbrev, namely 'deedeed' (hopefully rare in commit messages).
For 6 characters we have 'accede', 'beaded', 'decade' (!), 'deface',
'facade' (!!), and possibly more (and of course all 7 character
hexdigit words).

Also, the number of digits provided as an optional parameter to
--abbrev or --abbrev-commit options is only a minimal number of 
hexdigits: Git would use as many as needed for the abbreviated SHA-1
to be unambiguous, at current time.


I think allowing 7-character shortened SHA-1, which is what Git
produces for smaller repositories by default is (might be?) a good
idea.  Thanks for the patch.

> 
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a. Git will accept all-caps SHA1s, but
> didn't ever produce them as far as I can tell.

All right, thanks for reminder.

Signoff?

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9473daf..101dbc0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
>   }eg;
> 

Nice and simple.

P.S. I have reworking of commit message parsing and enhancement in my
long, long and dated gitweb TODO list :-(

P.P.S. Kay Sievers no longer works on gitweb, and I think no longer
works at SuSE but at RedHat.

Best,
-- 
Jakub Narębski


Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Kevin Daudt
On Mon, Sep 19, 2016 at 04:30:34PM -0700, Junio C Hamano wrote:
> 
> * kd/mailinfo-quoted-string (2016-09-19) 2 commits
>  - mailinfo: unescape quoted-pair in header fields
>  - t5100-mailinfo: replace common path prefix with variable

Is this good enough, or do you want me to look into the feedback from
jeff?


Re: 2.10.0: git log --oneline prints gpg signatures in 4 lines

2016-09-21 Thread Junio C Hamano
Jeff King  writes:

> I don't think anything has changed here in 2.10. Running "git log
> --oneline --show-signature" has _always_ been horribly ugly. However,
> 2.10 did introduce the "log.showsignature" config, which makes "git log
> --oneline" pretty unusable when it is enabled. Ditto for one-liner uses
> of "--format".
>
> I think we should probably ignore the config entirely when using any of
> the one-liner formats (and I'd include --format, too, even though it can
> sometimes be multi-line; it already has %GG to include that information
> as appropriate).
>
> Another option would be to somehow represent the signature information
> in the --oneline output, but I think I'd rather leave that for people to
> experiment with using "--format".

My thinking is that "--oneline --show-signature" and "--oneline"
with log.showsignature set to true without "--no-show-signature" on
the command line should produce identical result.

The current definition of "--oneline" seems to me "the commit object
name and the log message is shown on a single line" (I consider that
the decoration given by "--decorate" is part of "commit object
name"), but there may be other things shown that may not fit on a
single line.

I do not terribly mind changing the definition of "--oneline" to
"what is output is ONLY the commit object name and the log message,
nothing else is shown", though.  After all, the output from "log" is
for human consumption, and it is a bug in the script if it is
depending on parsing the "log" output, so it is OK to change its
output to suit human needs, if necessary.






Re: 2.10.0: git log --oneline prints gpg signatures in 4 lines

2016-09-21 Thread Junio C Hamano
Leandro Lucarella  writes:

> Hi, starting from 2.10.0 I noticed that when using git log --oneline,
> if commits are signed with GPG, now the signatures are printed too, and
> it takes 3 lines for the signature information + 1 line for the title
> of the commit, so suddenly --oneline became --fourline :)
>
> Is this really intended?

I think so.  The documentation for --oneline may say "one line per
commit" but in reality, some things cannot be expressed on one line.
It should probably be described as "commit object name and its log
message is formatted to be on a single line" without limiting other
things that the user may have asked to be shown.

And show-signature is an example of what the user additionally can
ask that cannot be fit on a single line.  There probably are others.






Re: clarification of `rev-list --no-walk ^`?

2016-09-21 Thread Junio C Hamano
Michael J Gruber  writes:

> I think you answered to e-mail (in-reply-to) and to Philip's actual text
> (quotes), but just in case:

Yes, my mistake.  Sorry for the noise.


Re: Bug: pager. doesn't work well with editors

2016-09-21 Thread Junio C Hamano
Jeff King  writes:

> And this isn't really limited to the editor. It's more _annoying_ with
> the editor, but really "pager.tag" does not make any sense to set right
> now, because it is handled outside of the "tag" command entirely, and
> doesn't know what mode the tag command will be running in.

Stepping back even further, perhaps the whole pager. was a bad
interim move.  For those who set "less" without "-F", being able to
set pager. to false may still be necessary, but I am wondering
about setting it to true or a command string here.

It did mean well and may have helped when "git " that produces
reams of output had not yet learned to auto-paginate as a stop-gap
measure by allowing users to set pager., but I wonder if the
ideal course of action was to identify (or "wait until people show
their desire") individual operating modes of various commands and
teach them to auto-paginate.  For example, "tag -l" may be one of
them that we would want to teach to.


Re: [PATCH] mailinfo: unescape quoted-pair in header fields

2016-09-21 Thread Junio C Hamano
Jeff King  writes:

> So if that's the case, do we actually need to care if we see any
> parenthesized comments? I think we should just leave comments in place
> either way, so syntactically they are only interesting insofar as we
> replace quoted pairs or not.
>
> IOW, I wonder if:
>
>   while ((c = *in++)) {
>   switch (c) {
>   case '\\':
>   if (!*in)
>   return 0; /* ignore trailing backslash */
>   /* quoted pair */
>   strbuf_addch(out, *in++);
>   break;
>   case '"':
>   /*
>* This may be starting or ending a quoted section,
>* but we do not care whether we are in such a section.
>* We _do_ need to remove the quotes, though, as they
>* are syntactic.
>*/
>   break;
>   default:
>   /*
>* Anything else is a normal character we keep. These
>* _might_ be violating the RFC if they are magic
>* characters outside of a quoted section, but we'd
>* rather be liberal and pass them through.
>*/
>   strbuf_addch(out, c);
>   break;
>   }
>   }
>
> would work. I certainly do not mind following the RFC more closely, but
> AFAICT the very simple code above gives a pretty forgiving outcome.

The simplicity of the code does look attractive to me.  I do not
offhand see an obvious case/flaw that this simplified rule would
mangle a valid human-readable part.



Re: [PATCH v3] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-21 Thread Junio C Hamano
Josh Triplett  writes:

> This provides a shorter and more convenient alias for
> --subject-prefix='RFC PATCH'.

Shorter and more convenient is quite subjective but more important
as a justification is that we believe [RFC PATCH] is used fairly
commonly (at least in certain circles).

> Includes documentation in the format-patch manpage, and a new test
> covering --rfc.

We can see that from diffstat ;-)

I'd retitle this like so:

format-patch: add "--rfc" for the common case of [RFC PATCH]

Add an alias for --subject-prefix='RFC PATCH', which is used
commonly in some development communities to deserve such a
short-hand.

Signed-off-by: Josh Triplett 
Reviewed-by: Jeff King 
Signed-off-by: Junio C Hamano 

> diff --git a/Documentation/git-format-patch.txt 
> b/Documentation/git-format-patch.txt
> index 9624c84..9b200b3 100644
> --- a/Documentation/git-format-patch.txt
> +++ b/Documentation/git-format-patch.txt
> @@ -19,7 +19,8 @@ SYNOPSIS
>  [--start-number ] [--numbered-files]
>  [--in-reply-to=Message-Id] [--suffix=.]
>  [--ignore-if-in-upstream]
> -[--subject-prefix=Subject-Prefix] [(--reroll-count|-v) ]
> +[--rfc] [--subject-prefix=Subject-Prefix]
> +[(--reroll-count|-v) ]
>  [--to=] [--cc=]
>  [--[no-]cover-letter] [--quiet] [--notes[=]]
>  []
> @@ -172,6 +173,11 @@ will want to ensure that threading is disabled for `git 
> send-email`.
>   allows for useful naming of a patch series, and can be
>   combined with the `--numbered` option.
>  
> +--rfc::
> + Alias for `--subject-prefix="RFC PATCH"`. RFC means "Request For
> + Comments"; use this when sending an experimental patch for
> + discussion rather than application.
> +

I do not think we want to be in the business of encouragign or
discouraging the use of "[RFC PATCH]".  

--rfc:: A short-hand for `--subject-prefix="RFC PATCH"`.
RFC stands for "request for comments" and such a
prefix is used in some development communities when
sending a patch primarily to illustrate an idea to
help discussion, rather than to be applied.

perhaps?

The code and test both look good to me.

Thanks.


Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Junio C Hamano
Duy Nguyen  writes:

> On Tue, Sep 20, 2016 at 6:30 AM, Junio C Hamano  wrote:
>> * nd/checkout-disambiguation (2016-09-09) 4 commits
>>  - fixup! checkout.txt: document a common case that ignores ambiguation rules
>>  - checkout: fix ambiguity check in subdir
>>  - checkout.txt: document a common case that ignores ambiguation rules
>>  - checkout: add some spaces between code and comment
>>
>>  "git checkout " does not follow the usual disambiguation
>>  rules when the  can be both a rev and a path, to allow
>>  checking out a branch 'foo' in a project that happens to have a
>>  file 'foo' in the working tree without having to disambiguate.
>>  This was poorly documented and the check was incorrect when the
>>  command was run from a subdirectory.
>>
>>  Waiting for an Ack for fixup!
>
> Oops, I didn't know (I have about 300 unread git mails in my inbox), Ack.

Thanks.


Re: What's cooking in git.git (Sep 2016, #05; Mon, 19)

2016-09-21 Thread Junio C Hamano
Johannes Schindelin  writes:

>> * jk/rebase-i-drop-ident-check (2016-07-29) 1 commit
>>   (merged to 'next' on 2016-08-14 at 6891bcd)
>>  + rebase-interactive: drop early check for valid ident
>> 
>>  Even when "git pull --rebase=preserve" (and the underlying "git
>>  rebase --preserve") can complete without creating any new commit
>>  (i.e. fast-forwards), it still insisted on having a usable ident
>>  information (read: user.email is set correctly), which was less
>>  than nice.  As the underlying commands used inside "git rebase"
>>  would fail with a more meaningful error message and advice text
>>  when the bogus ident matters, this extra check was removed.
>> 
>>  Will hold to see if people scream.
>>  cf. <20160729224944.ga23...@sigill.intra.peff.net>
>
> Let's do this.

We have already been doing it (i.e. "hold to see if people scream")
for some time.

Does it conflict with your effort to reimplement "rebase -i" in C to
keep this in 'next'?  Do you want it to move to 'master'?  I was
under the impression that it would not make a difference to have or
not have this patch once your reimplementation gets merged (meaning:
the removal of the three lines will be done by wholesale removal of
git-rebase--interactive.sh done the endgame of your series), so...



Re: 2.10.0: git log --oneline prints gpg signatures in 4 lines

2016-09-21 Thread Michael J Gruber
Leandro Lucarella venit, vidit, dixit 21.09.2016 15:53:
> On Tue, 20 Sep 2016 19:15:33 -0400
> Jeff King  wrote:
> 
>> On Tue, Sep 20, 2016 at 05:09:54PM +0200, Leandro Lucarella wrote:
>>
>>> Hi, starting from 2.10.0 I noticed that when using git log
>>> --oneline, if commits are signed with GPG, now the signatures are
>>> printed too, and it takes 3 lines for the signature information + 1
>>> line for the title of the commit, so suddenly --oneline became
>>> --fourline :)
>>>
>>> Is this really intended?
>>
>> I don't think anything has changed here in 2.10. Running "git log
>> --oneline --show-signature" has _always_ been horribly ugly. However,
>> 2.10 did introduce the "log.showsignature" config, which makes "git
>> log --oneline" pretty unusable when it is enabled. Ditto for
>> one-liner uses of "--format".
>>
>> I think we should probably ignore the config entirely when using any
>> of the one-liner formats (and I'd include --format, too, even though
>> it can sometimes be multi-line; it already has %GG to include that
>> information as appropriate).
> 
> Woops! Definitely it shouldn't be added when --format is used, this is
> also breaking some scripts I have using git log --format to get some
> information about commits, and GPG information is being output even
> when there is anything about GPG requested in the chosen format.
> 
> I guess I will disable log.showsignatures for now... :(
> 

I guess that's one of the reasons why I didn't like that config option
to begin with. There's a flood of these config "convenience" options
lately where we have to special case for scripting... Aliases must have
become old school or something.

Note that "git log --show-notes", "git log --oneline --raw", "git log
--oneline -p" and similar are not "one line" either. So
"--show-signature" behaves just like all others.

git log --format="%h %G? %s"

or a colored version thereof may be what you want to alias as "log1" or
such.

Cheers,
Michael


Re: clarification of `rev-list --no-walk ^`?

2016-09-21 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 19.09.2016 18:12:
> Michael J Gruber  writes:
> 
>>> It can be read that
>>>
>>> $ git cherry-pick maint next
>>>
>>> would pick two single commits, while
>>>
>>> $ git cherry-pick maint next ^master
>>>
>>> could implicitly be read as
>>>
>>> $ git cherry-pick maint next --do-walk ^master
> 
> You can read it as "master..next maint" that does force walking.
> 
>>> Clearly that's not what is intended, which is
>>>
>>> $ git cherry-pick --do-walk maint next ^master
> 
> I do not see the distinction betwee the above two you seem to be
> trying to make.  Care to explain?

I think you answered to e-mail (in-reply-to) and to Philip's actual text
(quotes), but just in case:

[git]✓ git rev-list --no-walk ^HEAD~3 HEAD
47d74601f5c6bbef215a887be2ca877e34391c9f
574dece7b651fbae385add51d7aaea1cc414007a
3fbbf6e9e40b151215cce6c6e25cd4db0232d870
[git]✓ git rev-list ^HEAD~3 --no-walk HEAD
47d74601f5c6bbef215a887be2ca877e34391c9f

The order of revision arguments and options does play role (but where I
put my HEAD does not, uhm), i.e. walk-options vs. negative refs.

The reason is that negative revs come with an implicit --do-walk (we
need to walk to mark uninteresting revs), and the last
--do-walk/--no-walk wins. That's what I meant with my comment.

But there is only one walk (or none), and one setting effective for all
revision arguments.

Michael


Re: clarification of `rev-list --no-walk ^`?

2016-09-21 Thread Michael J Gruber
[So many typos, sorry]

Michael J Gruber venit, vidit, dixit 21.09.2016 16:46:
> Junio C Hamano venit, vidit, dixit 19.09.2016 18:12:
>> Michael J Gruber  writes:
>>
 It can be read that

 $ git cherry-pick maint next

 would pick two single commits, while

 $ git cherry-pick maint next ^master

 could implicitly be read as

 $ git cherry-pick maint next --do-walk ^master
>>
>> You can read it as "master..next maint" that does force walking.
>>
 Clearly that's not what is intended, which is

 $ git cherry-pick --do-walk maint next ^master
>>
>> I do not see the distinction betwee the above two you seem to be
>> trying to make.  Care to explain?
> 
> I think you answered to e-mail (in-reply-to) and to Philip's actual text
> (quotes), but just in case:

"my e-mail"

> 
> [git]✓ git rev-list --no-walk ^HEAD~3 HEAD
> 47d74601f5c6bbef215a887be2ca877e34391c9f
> 574dece7b651fbae385add51d7aaea1cc414007a
> 3fbbf6e9e40b151215cce6c6e25cd4db0232d870
> [git]✓ git rev-list ^HEAD~3 --no-walk HEAD
> 47d74601f5c6bbef215a887be2ca877e34391c9f
> 
> The order of revision arguments and options does play role (but where I
> put my HEAD does not, uhm), i.e. walk-options vs. negative refs.

"play a role"
"negative revs"

> 
> The reason is that negative revs come with an implicit --do-walk (we
> need to walk to mark uninteresting revs), and the last

"in order to mark"

> --do-walk/--no-walk wins. That's what I meant with my comment.
> 
> But there is only one walk (or none), and one setting effective for all
> revision arguments.
> 
> Michael
> 



Re: 2.10.0: git log --oneline prints gpg signatures in 4 lines

2016-09-21 Thread Leandro Lucarella
On Tue, 20 Sep 2016 19:15:33 -0400
Jeff King  wrote:

> On Tue, Sep 20, 2016 at 05:09:54PM +0200, Leandro Lucarella wrote:
> 
> > Hi, starting from 2.10.0 I noticed that when using git log
> > --oneline, if commits are signed with GPG, now the signatures are
> > printed too, and it takes 3 lines for the signature information + 1
> > line for the title of the commit, so suddenly --oneline became
> > --fourline :)
> > 
> > Is this really intended?
> 
> I don't think anything has changed here in 2.10. Running "git log
> --oneline --show-signature" has _always_ been horribly ugly. However,
> 2.10 did introduce the "log.showsignature" config, which makes "git
> log --oneline" pretty unusable when it is enabled. Ditto for
> one-liner uses of "--format".
> 
> I think we should probably ignore the config entirely when using any
> of the one-liner formats (and I'd include --format, too, even though
> it can sometimes be multi-line; it already has %GG to include that
> information as appropriate).

Woops! Definitely it shouldn't be added when --format is used, this is
also breaking some scripts I have using git log --format to get some
information about commits, and GPG information is being output even
when there is anything about GPG requested in the chosen format.

I guess I will disable log.showsignatures for now... :(

-- 
Leandro Lucarella
Technical Development Lead
Sociomantic Labs GmbH 


Re: v2.9.3 and v2.10.0: `name-ref' HEAD gives wrong branch name

2016-09-21 Thread Steffen Nurpmeso
Hello.

Jakub Narębski  wrote:
 |W dniu 20.09.2016 o 20:54, Bryan Turner pisze:
 |> On Tue, Sep 20, 2016 at 9:23 AM, Steffen Nurpmeso  \
 |> wrote:
 |>> Hello again,
 |>>
 |>> yah, sorry, i'm back again..
 |>> I try to find a way to find the name of the current branch in an
 |>> automated way, because i need to ensure that a commit happens on
 |>> it and no other branch.  Now the problem arises that the commit
 |>> ref at the time of that commit maybe shared in between several
 |>> different branches, but no more thereafter, of course:
 |>>
 |>>   ?0[steffen@wales ]$ git branch|grep '^*'
 |>>   * stable/v14.9
 |
 |Not good, 'git branch' is a porcelain (user facing) command, so it
 |output may change; e.g. '*' could be replaced with '•'. For example
 |output for detached HEAD had changed!

Ok.  I went the road Bryan suggested, i had only forgotten this.
Yes, it caused mysterious bugs once rev-parse reversed the output,
but i didn't understand the order at first, anyway.  With todays'
ever-rotating distributions i don't even try to keep up; currently
unthinkable to use the same release of an OS for five years, like
FreeBSD 5.3.  Well.

  ...
 |>> Is there another way except looking into .git/HEAD or using sed(1)
 |>> on the output of `branch' to find the right name?
 |> 
 |> Have you tried "git symbolic-ref HEAD"?
  ...
 |This does not work for detached HEAD, but perhaps you don't need
 |to worry about this.

No, not for me: it will only switch in between two different
stable/ which exist.  But thanks, just give it to me!

 |  $ git rev-parse --symbolic-full-name HEAD
 |  refs/heads/master

This is a really good suggestion, which i will remember.  I didn't
know this at all yet:

  $ git rev-parse --symbolic-full-name --abbrev-ref=strict HEAD

Seems to do exactly what i want, non-fragile, then.

 ...
 |But
 ...
 |  You are in 'detached HEAD' state. [...]
 | 
 |  $ git rev-parse --symbolic-full-name HEAD
 |  HEAD
 |
 |  $ git symbolic-ref HEAD
 |  fatal: ref HEAD is not a symbolic ref
 |
 |  $ git branch
 |  * (HEAD detached at 3e2ebf9)
 |master

And name-rev gives "HEAD master~2" in a test of mine, or only
"master~2", or "undefined" if i use --tags, for completeness sake.

Thanks, Jakub.  I'm using the plumbing.

--steffen


Re: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 3:33 PM, Jakub Narębski  wrote:
> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:
>
>> Subject: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786
>
> It is more "ancient typo from v1.7.7-rc1-1-g0866786", but perhaps more
> important is "ancient typo in a comment"

Yeah, will rephrase.

>>
>> The Content-Type is application/xhtml+xml, not application/xhtm+xml.
>
> Right.  Thanks for the patch.
>
> Signoff?

Blast! I forgot that for these 3x patches. I'll re-submit pending
further comments on the rest of the code changes in the series.

>> ---
>>  gitweb/gitweb.perl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 33d701d..9473daf 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -1616,7 +1616,7 @@ sub esc_path {
>>   return $str;
>>  }
>>
>> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
>> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>
> Nb. I wonder how common is use of XHTML nowadays, with HTML5 as standard...

It's sent to modern browsers, I noticed it because when doing the rest
of the patches in the series the slightest mistake in the HTML syntax
would cause the page not to render in Chrome, because
application/xml+xhtm activates its anal parsing mode.

>>  sub sanitize {
>>   my $str = shift;
>>
>>
>


FROM MRS LINAH MOHOHLO

2016-09-21 Thread FROM MRS LINAH


 Hollow dear,
I have initially sent you this message before, kindly get back to me

FROM Mrs Rosemary Linah Mohohlo55.docx
Description: MS-Word 2007 document


Re: v2.9.3 and v2.10.0: `name-ref' HEAD gives wrong branch name

2016-09-21 Thread Jakub Narębski
W dniu 20.09.2016 o 20:54, Bryan Turner pisze:
> On Tue, Sep 20, 2016 at 9:23 AM, Steffen Nurpmeso  wrote:
>> Hello again,
>>
>> yah, sorry, i'm back again..
>> I try to find a way to find the name of the current branch in an
>> automated way, because i need to ensure that a commit happens on
>> it and no other branch.  Now the problem arises that the commit
>> ref at the time of that commit maybe shared in between several
>> different branches, but no more thereafter, of course:
>>
>>   ?0[steffen@wales ]$ git branch|grep '^*'
>>   * stable/v14.9

Not good, 'git branch' is a porcelain (user facing) command, so it
output may change; e.g. '*' could be replaced with '•'. For example
output for detached HEAD had changed!

>>   ?0[steffen@wales ]$ git name-rev --name-only HEAD
>>   stable/v14.8
>>
>> Is there another way except looking into .git/HEAD or using sed(1)
>> on the output of `branch' to find the right name?
> 
> Have you tried "git symbolic-ref HEAD"?
> 
> $ git symbolic-ref HEAD
> refs/heads/master
> 
> If you don't want the fully-qualified ref, you can add --short:
> 
> $ git symbolic-ref --short HEAD
> master

This does not work for detached HEAD, but perhaps you don't need
to worry about this.

  $ git rev-parse --symbolic-full-name HEAD
  refs/heads/master

But

  $ git checkout HEAD^0
  Note: checking out 'HEAD^0'.

  You are in 'detached HEAD' state. [...]
 
  $ git rev-parse --symbolic-full-name HEAD
  HEAD

  $ git symbolic-ref HEAD
  fatal: ref HEAD is not a symbolic ref

  $ git branch
  * (HEAD detached at 3e2ebf9)
master

-- 
Jakub Narębski



Re: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:

> Subject: [PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

It is more "ancient typo from v1.7.7-rc1-1-g0866786", but perhaps more
important is "ancient typo in a comment"

>
> The Content-Type is application/xhtml+xml, not application/xhtm+xml.

Right.  Thanks for the patch.

Signoff?

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..9473daf 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1616,7 +1616,7 @@ sub esc_path {
>   return $str;
>  }
>  
> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)

Nb. I wonder how common is use of XHTML nowadays, with HTML5 as standard...

>  sub sanitize {
>   my $str = shift;
>  
> 



Re: [PATCH v1] travis-ci: ask homebrew for the its path instead of hardcoding it

2016-09-21 Thread Lars Schneider

> On 21 Sep 2016, at 11:31, stefan.na...@atlas-elektronik.com wrote:
> 
> In the Subject: s/the //
> 
> Am 21.09.2016 um 10:45 schrieb larsxschnei...@gmail.com:
>> From: Lars Schneider 
>> 
>> The TravisCI macOS build is broken because homebrew (a macOS depedency
> 
> s/depedency/dependency/

Thanks for spotting both errors!

@Junio: Should I make a v2?

Thanks,
Lars


v2.10.0: ls-tree exit status is always 0, this differs from ls(1)

2016-09-21 Thread Steffen Nurpmeso
I think this behaviour contradicts the manual which strongly links
ls-tree to ls(1):

  ?0[steffen@wales ]$ ls NEWSS
  ls: cannot access 'NEWSS': No such file or directory
  ?2[steffen@wales ]$ git ls-tree --name-only master NEWSS  
 
  ?0[steffen@wales ]$ ls NEWS   
 
  NEWS
  ?0[steffen@wales ]$ git ls-tree --name-only master NEWS   
 
  NEWS
  ?0[steffen@wales ]$ 

Ciao.

--steffen


Re: Git garden shears, was Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-21 Thread Jakub Narębski
Hello Dscho,

W dniu 11.09.2016 o 10:33, Johannes Schindelin napisał: 
> On Fri, 9 Sep 2016, Jakub Narębski wrote:
[...]

>> When preserving merges, there are (as far as I understand it), two
>> problems:
>>  - what it means to preserve changes (which change to pick,
>>that is what is the mainline changes rebase is re-applying)
>>  - what are parents of the merge commit (at least one parent
>>would be usually rewritten)
>>
>> Maybe the internal (and perhaps also user-visible) representation
>> of merge in instruction sheet could use the notation of filter-branch,
>> that is 'map()'... it could also imply the mainline.
>>
>> That is the instruction in the internal instruction sheet could
>> look like this:
>>
>>   merge -m 1 map(2fd4e1c6...) da39a3ee... \t Merge 'foo' into master  
>>
>>
>> Note that it has nothing to do with this series!
> 
> Right. But I did solve that already. In the Git garden shears [*1*]
> (essentially my New And Improved attempt at recreating branch structures
> while rebasing), I generate and process scripts like this:
> 
>   mark onto
> 
>   # Branch: super-cool-feature
>   rewind onto
>   pick 1 feature
>   pick 2 documentation
>   mark super-cool-feature
> 
>   # Branch: typo-fix
>   rewind onto
>   pick a fix a tyop

There probably should be there

mark typo-fix

> 
>   rewind onto
>   merge -C cafebabe super-cool-feature
>   merge -C babecafe typo-fix
> 
>   cleanup super-cool-feature typo-fix
> 
> Of course this will change a little, still, once I get around to implement
> this on top of the rebase--helper.

Do I understand it correctly that it is user-visible instruction sheet, and
not the internal instruction sheet for sequencer?  This looks very nice
and is well readable.

I guess that it needs to be pre-populated by Git based on topology of the
branch being rebased.

As I see, there are three basic topologies of non-linear branch to be
rebased; all else is combination of thereof, or derivative:

1. Merge commit without branching point, that is we need to go
   from the following situation

   *---*---*---#---o---o---o<-- old base
   \\
\\=a===b===M===c<-- branch being rebased
  /
 ...---x---x---x-/  <-- side branch

  to the following:

   *---*---*---#---o---o---o
\
 \-a'--b'--M'--c' 
  /
 ...---x---x---x-/  

I think this case is the only one supported by `--preserve-merges`,
but I may be mistaken - I never had the need to use this feature IRL.

2. Branching point without accompanying merge commit, or in other words
   rebasing many branches tied together; a shrub if you will.  That is,
   we want to go from the following situation:

   *---*---*---#---o---o---o   <-- old base
   \
\--a---b---c   <-- branch being rebased
\
 \-1   <-- dependent branch

   to the following one:

   *---*---*---#---o---o---o
\
 \--a'--b'--c'
 \
  \-1'

I don't think Git supports something like that out of the box, but it
is not hard to create something like that "by hand". It is not much
of a problem... unless you forget to rebase the second dependent branch.

3. Branching point with merge point, that is subbranch created and
   merged - an "eye" (it is not a loop in DAG):

   *---*---*---#---o---o---o <-- old base
   \
\--a---b---c---M---d <-- branch being rebased
\ /
 \-1---2-/ [ <-- possibly a branch ]

   All edges are directed edges, with arrows pointing from right to
   left; that is  *---*  is really  *<---*

   The expected result is:

   *---*---*---#---o---o---o
\
 \--a'--b'--c'--M'--d'
 \ /
  \-1'--2'/

I guess that is the main purpose of your git-garden-shears script,
isn't it?

> 
> For example, I am not so hot about the "merge -C ..." syntax. I'll
> probably split that into a "remerge  " and a new "merge
> " command (the latter asking interactively for the merge commit
> message).

There is also an additional complication in that merge commit message
may be *partially* automatically generated. First there is the subject
generated by 'git merge' ("Merge branch 'foo'") or 'git pull '.
It might have been translated, or extended.  Second there is a place
for branch cover letter. Third, subject to merge.log / merge.summary
there is a shortlog.

>From those shortlog should be surely updated to correspond to the
post-rebase state.  The first line could 

Hello!!!!!

2016-09-21 Thread MUSSA ALI


DEAR FRIEND,

How are you together with your family and  business hope all is well. I know 
that this message will come to you as a surprise. I Hope that you will not 
expose or betray this trust and confident that I am about to repose on you for 
our mutual benefit of our both families. I need your urgent assistance in 
transferring the sum of ($18) million dollars to your  account ,this money has 
been dormant for years in our Bank without claim. I want the bank to release 
the money to you as a next of kin to our deceased customer who died along with 
his entire family,

I will give you full details on how the business will be executed and also note 
that you will have 50% of the above mentioned sum while 50% will be for me,your 
full data's will be required,if you are interested or you agree to handle this 
business with me as top secret. I am expecting your urgent respond.

  FILL THIS FORM BELOW PLEASE AND RESEND IT TO  ME

Your Full Name  : …….
Your Age : …….
Your Country : ………
Your Occupation : …
Your Sex : …
Your mobile Phone Number:…

Thanks
Mr Mussa  Ali


Re: 2.10.0: git log --oneline prints gpg signatures in 4 lines

2016-09-21 Thread Leandro Lucarella
On Tue, 20 Sep 2016 19:15:33 -0400
Jeff King  wrote:

> On Tue, Sep 20, 2016 at 05:09:54PM +0200, Leandro Lucarella wrote:
> 
> > Hi, starting from 2.10.0 I noticed that when using git log
> > --oneline, if commits are signed with GPG, now the signatures are
> > printed too, and it takes 3 lines for the signature information + 1
> > line for the title of the commit, so suddenly --oneline became
> > --fourline :)
> > 
> > Is this really intended?
> 
> I don't think anything has changed here in 2.10. Running "git log
> --oneline --show-signature" has _always_ been horribly ugly. However,
> 2.10 did introduce the "log.showsignature" config, which makes "git
> log --oneline" pretty unusable when it is enabled. Ditto for
> one-liner uses of "--format".

Right! Now I remember, I changed my configuration when I read the
release notes, before I upgraded. Now that I did upgrade I'm seeing the
results.

Anyway, yeah, using the new configuration makes --oneline pretty
unusable, so ignoring that option for --oneline seems like a good idea.

> I think we should probably ignore the config entirely when using any
> of the one-liner formats (and I'd include --format, too, even though
> it can sometimes be multi-line; it already has %GG to include that
> information as appropriate).
> 
> Another option would be to somehow represent the signature information
> in the --oneline output, but I think I'd rather leave that for people
> to experiment with using "--format".

I think it might be nice to show the information in one line in a ver
succinct way, like just showing a green unicode check mark (✓) or a red
cross mark (❌) if it failed (or just colour the commit subject in
green/red if a signature is present and is passing/failing).

Thanks!

-- 
Leandro Lucarella
Technical Development Lead
Sociomantic Labs GmbH 


[PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
Change the log formatting function to know about "git describe" output
like v2.8.0-4-g867ad08 in addition to just plain 867ad08.

This also fixes a micro-regression in my change of the minimum SHA1
length from 8 to 7, which is that dated tags like
hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921"
part was a commit.

There are still many valid refnames that we don't link to
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
similarly it's trivially possible to create some refnames like
"æ/var-gf6727b0" or whatever which won't be picked up by this regex.

There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames, that's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.
---
 gitweb/gitweb.perl | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 101dbc0..3a52bc7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,10 +2036,24 @@ sub format_log_line_html {
my $line = shift;
 
$line = esc_html($line, -nbsp=>1);
-   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+   $line =~ s{
+\b
+(
+# The output of "git describe", e.g. v2.10.0-297-gf6727b0
+# or hadoop-20160921-113441-20-g094fb7d
+(?a({-href => href(action=>"object", hash=>$1),
-class => "text"}, $1);
-   }eg;
+   }egx;
 
return $line;
 }
-- 
2.1.3



[PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
Change the minimum length of a commit we'll link to from 8 to 7.

This arbitrary minimum length of 8 was introduced in
v1.4.4.2-151-gbfe2191, but as seen in e.g. v1.7.4-1-gdce9648 the
default abbreviation length is 7.

It's still possible to reference SHA1s down to 4 characters in length,
see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
git actually produce that, so I doubt anyone is putting that into log
messages in practice, but people definitely do put 7 character SHA1s
into log messages.

I think it's fairly dubious to link to things matching [0-9a-fA-F]
here as opposed to just [0-9a-f], that dates back to the initial
version of gitweb from 161332a. Git will accept all-caps SHA1s, but
didn't ever produce them as far as I can tell.
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9473daf..101dbc0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,7 +2036,7 @@ sub format_log_line_html {
my $line = shift;
 
$line = esc_html($line, -nbsp=>1);
-   $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
+   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
$cgi->a({-href => href(action=>"object", hash=>$1),
-class => "text"}, $1);
}eg;
-- 
2.1.3



[PATCH 1/3] gitweb: Fix an ancient typo in v1.7.7-rc1-1-g0866786

2016-09-21 Thread Ævar Arnfjörð Bjarmason
The Content-Type is application/xhtml+xml, not application/xhtm+xml.
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..9473daf 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1616,7 +1616,7 @@ sub esc_path {
return $str;
 }
 
-# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
+# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
 sub sanitize {
my $str = shift;
 
-- 
2.1.3



[PATCH v2 1/3] init: correct re-initialization from a linked worktree

2016-09-21 Thread Nguyễn Thái Ngọc Duy
When 'git init' is called from a linked worktree, we treat '.git'
dir (which is $GIT_COMMON_DIR/worktrees/something) as the main
'.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton
in there. It does not harm anything (*) but it is still wrong.

Since 'git init' calls set_git_dir() at preparation time, which
indirectly calls get_common_dir() and correctly detects multiple
worktree setup, all git_path_buf() calls in create_default_files() will
return correct paths in both single and multiple worktree setups. The
only thing left is copy_templates(), which targets $GIT_DIR, not
$GIT_COMMON_DIR.

Fix that with get_git_common_dir(). This function will return $GIT_DIR
in single-worktree setup, so we don't have to make a special case for
multiple-worktree here.

(*) It does in fact, thanks to another bug. More on that later.

Noticed-by: Max Nordlund 
Helped-by: Michael J Gruber 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/init-db.c |  2 +-
 t/t0001-init.sh   | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index cc09fca..d5d7558 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -138,7 +138,7 @@ static void copy_templates(const char *template_dir)
goto close_free_return;
}
 
-   strbuf_addstr(, get_git_dir());
+   strbuf_addstr(, get_git_common_dir());
strbuf_complete(, '/');
copy_templates_1(, _path, dir);
 close_free_return:
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 8ffbbea..488564b 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -393,4 +393,19 @@ test_expect_success 'remote init from does not use config 
from cwd' '
test_cmp expect actual
 '
 
+test_expect_success 're-init from a linked worktree' '
+   git init main-worktree &&
+   (
+   cd main-worktree &&
+   test_commit first &&
+   git worktree add ../linked-worktree &&
+   mv .git/info/exclude expected-exclude &&
+   find .git/worktrees -print | sort >expected &&
+   git -C ../linked-worktree init &&
+   test_cmp expected-exclude .git/info/exclude &&
+   find .git/worktrees -print | sort >actual &&
+   test_cmp expected actual
+   )
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78



[PATCH v2 2/3] init: do not set core.worktree more often than necessary

2016-09-21 Thread Nguyễn Thái Ngọc Duy
When "git init" is called with GIT_WORK_TREE environment set, we want to
keep this worktree's location in core.worktree so the user does not have
to set the environment again and again. See ef6f0af (git-init: set
core.worktree if GIT_WORK_TREE is specified - 2007-07-04)

We detect that by this logic (in needs_work_tree_config): normally
worktree's top dir would contains ".git" directory, if this is not true,
worktree is probably set to elsewhere by the user.

Unfortunately when it calls get_git_dir() it does not take ".git" files
into account. When we find a .git file, we immediately follow the file
until we find the real ".git" directory. The location of this first
".git" file is lost.

The .git file would satisfy the logic above and not create
core.worktree (correct). But because the final .git's location is used,
needs_work_tree_config() is misled and creates core.worktree anyway.

This would not be a huge deal normally. But if this happens in a
multiple worktree setup it becomes a real problem because up until now,
core.worktree will be applied to the main worktree only. If you
accidentally do "git init" from a linked worktree, you set
core.worktree (for the main repo) pointing to the _linked_ worktree.
After that point, may you live in interesting times.

Record the .git file location and use it here.

PS. real_path() resolves symlinks So original_git_dir is not truly
original if '.git' is a symlink. Hopefully it's not longer used in favor
of .git files.

Noticed-by: Max Nordlund 
Helped-by: Michael J Gruber 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/init-db.c | 5 -
 t/t0001-init.sh   | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index d5d7558..0d5cc76 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -23,6 +23,7 @@ static int init_is_bare_repository = 0;
 static int init_shared_repository = -1;
 static const char *init_db_template_dir;
 static const char *git_link;
+static const char *original_git_dir;
 
 static void copy_templates_1(struct strbuf *path, struct strbuf *template,
 DIR *dir)
@@ -263,7 +264,7 @@ static int create_default_files(const char *template_path)
/* allow template config file to override the default */
if (log_all_ref_updates == -1)
git_config_set("core.logallrefupdates", "true");
-   if (needs_work_tree_config(get_git_dir(), work_tree))
+   if (needs_work_tree_config(original_git_dir, work_tree))
git_config_set("core.worktree", work_tree);
}
 
@@ -314,6 +315,8 @@ static void create_object_directory(void)
 int set_git_dir_init(const char *git_dir, const char *real_git_dir,
 int exist_ok)
 {
+   original_git_dir = xstrdup(real_path(git_dir));
+
if (real_git_dir) {
struct stat st;
 
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 488564b..b8fc588 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -400,9 +400,11 @@ test_expect_success 're-init from a linked worktree' '
test_commit first &&
git worktree add ../linked-worktree &&
mv .git/info/exclude expected-exclude &&
+   cp .git/config expected-config &&
find .git/worktrees -print | sort >expected &&
git -C ../linked-worktree init &&
test_cmp expected-exclude .git/info/exclude &&
+   test_cmp expected-config .git/config &&
find .git/worktrees -print | sort >actual &&
test_cmp expected actual
)
-- 
2.8.2.524.g6ff3d78



[PATCH v2 3/3] init: reuse original_git_dir in set_git_dir_init()

2016-09-21 Thread Nguyễn Thái Ngọc Duy
Since original_git_dir is a copy of real_path(git_dir), let's reuse it
and avoid calling real_path() more than necessary.

The xstrdup() is removed too because original_git_dir is already a copy,
and we're not going to free git_link in this code probably forever.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/init-db.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0d5cc76..d70fc45 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -330,11 +330,11 @@ int set_git_dir_init(const char *git_dir, const char 
*real_git_dir,
 * make sure symlinks are resolved because we'll be
 * moving the target repo later on in separate_git_dir()
 */
-   git_link = xstrdup(real_path(git_dir));
+   git_link = original_git_dir;
set_git_dir(real_path(real_git_dir));
}
else {
-   set_git_dir(real_path(git_dir));
+   set_git_dir(original_git_dir);
git_link = NULL;
}
startup_info->have_repository = 1;
-- 
2.8.2.524.g6ff3d78



  1   2   >