Re: git work trees

2017-04-12 Thread Duy Nguyen
On Tue, Apr 11, 2017 at 10:14 PM, taylor, david  wrote:
> We are using Git in a distributed environment.
>
> In the United States, we have the master repository in one state and a build 
> cluster in a different state.
> In addition to people in the US doing builds, we have people in other 
> countries (Ireland, India, Israel,
> Russia, possibly others) doing builds -- using the build cluster.
>
> The local mirror of the repository is NFS accessible.  The plan is to make 
> builds faster through the use
> of work trees.  The build cluster nodes involved in the build will have a 
> worktree in RAM -- checked out
> for the duration of the build.  Since the worktree is in RAM, it will not be 
> NFS accessible.
>
> [Cloning takes 20+ minutes when the network is unloaded.  Building, with 
> sources NFS mounted, takes
> 5-10 minutes.]

Using worktrees in this scenario kinda defeats the distributed nature
of git. Cloning takes long, yes. But what about just "git pull" (or
"git fetch && git checkout -f" if you want to avoid merging)?

> This presents a few problems.
>
> When we are done with a work tree, we want to clean up -- think: prune.  But, 
> you cannot prune just
> one worktree; you have to prune the set.  And no machine has access to all 
> the worktrees.  So, no
> machine knows which ones are prunable.

By "prune one worktree", did you mean delete one? Or delete a branch
the worktree uses and prune the object database?

> There is no 'lock' option to 'add'.  If someone does a 'prune' after you do 
> an 'add' and before you do a
> 'lock', then your 'add' is undone.
>
> Are there any plans to add a '[--lock]' option to 'add' to create the 
> worktree in the locked state?  And/or
> plans to add a [...] option to prune to say 'prune only this path / 
> these paths'?

So this is "git worktree prune". Adding "worktree add --locked" sounds
reasonable (and quite simple too, because "worktree add" does lock the
worktree at creation time; we just need to stop it from releasing the
lock). I might be able to do it quickly (it does not mean "available
in the next release" though).

If you need to just prune "this path", I think it's the equivalent of
"git worktree remove" (i.e. delete a specific worktree). Work has been
going on for a while to add that command. Maybe it'll be available
later this year.

> If there are no plans, is the above an acceptable interface?  And if we 
> implemented it, would it be looked
> upon favorably?

Speaking of this use case (and this is my own opinion) I think this is
stretching "git worktree" too much. When I created it, I imagined this
functionality to be used by a single person.
-- 
Duy


Re: [PATCH v4 0/5] Kill manual ref parsing code in worktree.c

2017-04-14 Thread Duy Nguyen
On Thu, Apr 13, 2017 at 07:02:22PM -0700, Junio C Hamano wrote:
> Oops, I shouldn't have done that.  When applied on top of the
> files-backend thing (or have you updated that one and is my tree
> lacking it???), this breaks quite a few tests.
> 
> t0001#41 dumps core from "git worktree add ../linked-worktree" for
> example.

This is embarassing. I worked on and off on this series over a long
period of time. I guess at the end I thought everything was ok "since
the last time" and just sent it away without realizing I hadn't
actually run the test suite, because it does fail here, now.

This fixup patch should make the test suite pass. I'm sorry for
wasting your time on this.

-- 8< --
>From 09c6d432490b8e9de99eca09cdbdaebf53bbad2c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
 
Date: Fri, 14 Apr 2017 19:39:53 +0700
Subject: [PATCH] fixup! refs: introduce get_worktree_ref_store()

---
 refs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index a4083caf6a..05045a64e1 100644
--- a/refs.c
+++ b/refs.c
@@ -1600,11 +1600,13 @@ struct ref_store *get_worktree_ref_store(const struct 
worktree *wt)
unsigned int refs_all_capabilities =
REF_STORE_READ | REF_STORE_WRITE |
REF_STORE_ODB | REF_STORE_MAIN;
+   const char *id;
 
if (wt->is_current)
return get_main_ref_store();
 
-   refs = lookup_ref_store_map(&worktree_ref_stores, wt->id);
+   id = wt->id ? wt->id : "/";
+   refs = lookup_ref_store_map(&worktree_ref_stores, id);
if (refs)
return refs;
 
@@ -1617,7 +1619,7 @@ struct ref_store *get_worktree_ref_store(const struct 
worktree *wt)
 
if (refs)
register_ref_store_map(&worktree_ref_stores, "worktree",
-  refs, wt->id);
+  refs, id);
return refs;
 }
 
-- 
2.11.0.157.gd943d85

-- 8< --


Re: [PATCH] worktree add: add --lock option

2017-04-14 Thread Duy Nguyen
On Fri, Apr 14, 2017 at 5:50 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> As explained in the document. This option has an advantage over the
>> command sequence "git worktree add && git worktree lock": there will be
>> no gap that somebody can accidentally "prune" the new worktree (or soon,
>> explicitly "worktree remove" it).
>>
>> "worktree add" does keep a lock on while it's preparing the worktree.
>> If --lock is specified, this lock remains after the worktree is created.
>>
>> Suggested-by: David Taylor 
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  A patch that adds --lock may look like this.
>
> This looks more like "I do believe the idea by David is a useful
> addition and here is how I did it to the best of my ability---let's
> make sure we polish it for eventual inclusion" than a mere "it may
> look like so---do whatever you want with it" patch.

It is a good addition, which is why I added tests and documents, so it
may have a chance for inclusion. I would not strongly defend it though
if there's objection.

> To me "git worktree add --lock" somehow sounds less correct than
> "git worktree add --locked", but I'd appreciate if natives can
> correct me.

That was my first choice too. Then I saw --detach (instead of
--detached). I didn't care much and went with a verb like the rest.
-- 
Duy


Re: [PATCH v7 19/28] files-backend: replace submodule_allowed check in files_downcast()

2017-04-14 Thread Duy Nguyen
On Fri, Apr 14, 2017 at 03:44:46AM -0700, Junio C Hamano wrote:
> Duy Nguyen  writes:
> 
> > I'll mark this mail and do a follow-up patch once this topic graduates
> > to master. It's less review burden and mail traffic.
> 
> I actually do not mind a single replacement patch.

Thanks. Here's the fixup patch

-- 8< --
>From c572f11f9c653f6f18d51a56b31cc5d966f9b242 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
 
Date: Fri, 14 Apr 2017 19:54:31 +0700
Subject: [PATCH] fixup! files-backend: replace submodule_allowed check in
 files_downcast()

---
 refs.c   |  2 +-
 refs/files-backend.c | 10 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 241b4227b2..c7bffac06d 100644
--- a/refs.c
+++ b/refs.c
@@ -1486,7 +1486,7 @@ struct ref_store *get_ref_store(const char *submodule)
return NULL;
}
 
-   /* pretend that add_submodule_odb() has been called */
+   /* assume that add_submodule_odb() has been called */
refs = ref_store_init(submodule_sb.buf,
  REF_STORE_READ | REF_STORE_ODB);
register_submodule_ref_store(refs, submodule);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d97a924860..3faf5a0635 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1006,15 +1006,13 @@ static void files_assert_main_repository(struct 
files_ref_store *refs,
if (refs->store_flags & REF_STORE_MAIN)
return;
 
-   die("BUG: unallowed operation (%s), only works "
-   "on main ref store\n", caller);
+   die("BUG: operation %s is only allowed for main ref store", caller);
 }
 
 /*
  * Downcast ref_store to files_ref_store. Die if ref_store is not a
- * files_ref_store. If submodule_allowed is not true, then also die if
- * files_ref_store is for a submodule (i.e., not for the main
- * repository). caller is used in any necessary error messages.
+ * files_ref_store. Die if the operation is not allowed on this particular
+ * store. "caller" is used in any necessary error messages.
  */
 static struct files_ref_store *files_downcast(struct ref_store *ref_store,
  unsigned int required_flags,
@@ -1029,7 +1027,7 @@ static struct files_ref_store *files_downcast(struct 
ref_store *ref_store,
refs = (struct files_ref_store *)ref_store;
 
if ((refs->store_flags & required_flags) != required_flags)
-   die("BUG: unallowed operation (%s), requires %x, has %x\n",
+   die("BUG: operation %s requires abilities 0x%x but have 0x%x",
caller, required_flags, refs->store_flags);
 
return refs;
-- 
2.11.0.157.gd943d85

-- 8< --


Re: [PATCH] worktree add: add --lock option

2017-04-15 Thread Duy Nguyen
On Sat, Apr 15, 2017 at 3:07 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Nguyễn Thái Ngọc Duy   writes:
>>
>>> -unlink_or_warn(sb.buf);
>>> +if (!ret && opts->keep_locked) {
>>> +/*
>>> + * Don't keep the confusing "initializing" message
>>> + * after it's already over.
>>> + */
>>> +truncate(sb.buf, 0);
>>> +} else {
>>> +unlink_or_warn(sb.buf);
>>> +}
>>
>> builtin/worktree.c: In function 'add_worktree':
>> builtin/worktree.c:314:11: error: ignoring return value of 'truncate', 
>> declared with attribute warn_unused_result [-Werror=unused-result]
>>truncate(sb.buf, 0);
>>^

Yes it's supposed to be safe to ignore the error in this case. I
thought of adding (void) last minute but didn't do it.


>> cc1: all warnings being treated as errors
>> make: *** [builtin/worktree.o] Error 1
>
> I wonder why we need to have "initializing" and then remove the
> string.  Wouldn't it be simpler to do something like this instead?
> Does an empty lockfile have some special meaning?

It's mostly for troubleshooting. If "git worktree add" fails in a
later phase with a valid/locked worktree remains, this gives us a
clue.

>  builtin/worktree.c  | 16 +++-
>  t/t2025-worktree-add.sh |  3 +--
>  2 files changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 3dab07c829..5ebdcce793 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -243,7 +243,10 @@ static int add_worktree(const char *path, const char 
> *refname,
>  * after the preparation is over.
>  */
> strbuf_addf(&sb, "%s/locked", sb_repo.buf);
> -   write_file(sb.buf, "initializing");
> +   if (!opts->keep_locked)
> +   write_file(sb.buf, "initializing");
> +   else
> +   write_file(sb.buf, "added with --lock");
>
> strbuf_addf(&sb_git, "%s/.git", path);
> if (safe_create_leading_directories_const(sb_git.buf))
> @@ -306,15 +309,10 @@ static int add_worktree(const char *path, const char 
> *refname,
>  done:
> strbuf_reset(&sb);
> strbuf_addf(&sb, "%s/locked", sb_repo.buf);
> -   if (!ret && opts->keep_locked) {
> -   /*
> -* Don't keep the confusing "initializing" message
> -* after it's already over.
> -*/
> -   truncate(sb.buf, 0);
> -   } else {
> +   if (!ret && opts->keep_locked)
> +   ;
> +   else
> unlink_or_warn(sb.buf);
> -   }
> argv_array_clear(&child_env);
> strbuf_release(&sb);
> strbuf_release(&symref);
> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> index 6dce920c03..304f25fcd1 100755
> --- a/t/t2025-worktree-add.sh
> +++ b/t/t2025-worktree-add.sh
> @@ -65,8 +65,7 @@ test_expect_success '"add" worktree' '
>
>  test_expect_success '"add" worktree with lock' '
> git rev-parse HEAD >expect &&
> -   git worktree add --detach --lock here-with-lock master &&
> -   test_must_be_empty .git/worktrees/here-with-lock/locked
> +   git worktree add --detach --lock here-with-lock master

I think you still need to check for the presence of "locked" file.

>  '
>
>  test_expect_success '"add" worktree from a subdir' '
-- 
Duy


Re: includeIf breaks calling dashed externals

2017-04-15 Thread Duy Nguyen
On Fri, Apr 14, 2017 at 01:43:37PM -0400, Jeff King wrote:
> On Fri, Apr 14, 2017 at 07:04:23PM +0200, Bert Wesarg wrote:
> 
> > Dear Duy,
> > 
> > heaving an includeIf in a git config file breaks calling external git
> > commands, most prominently git-gui.
> > 
> > $ git --version
> > git version 2.12.2.599.gcf11a6797
> > $ git rev-parse --is-inside-work-tree
> > true
> > $ git echo
> > git: 'echo' is not a git command. See 'git --help'.
> > 
> > Did you mean this?
> > fetch
> > $ echo '[includeIf "gitdir:does-not-exists"]path = does-not-exists'
> > >>.git/config
> > $ git rev-parse --is-inside-work-tree
> > true
> > $ git echo
> > fatal: BUG: setup_git_env called without repository
> 
> Probably this fixes it:
> 
> diff --git a/config.c b/config.c
> index b6e4a57b9..8d66bdf56 100644
> --- a/config.c
> +++ b/config.c
> @@ -213,6 +213,9 @@ static int include_by_gitdir(const char *cond, size_t 
> cond_len, int icase)
>   struct strbuf pattern = STRBUF_INIT;
>   int ret = 0, prefix;
>  
> + if (!have_git_dir())
> + return 0;
> +
>   strbuf_add_absolute_path(&text, get_git_dir());
>   strbuf_add(&pattern, cond, cond_len);
>   prefix = prepare_include_condition_pattern(&pattern);
> 
> But it does raise a question of reading config before/after repository
> setup, since those will give different answers. I guess they do anyway
> because of $GIT_DIR/config.

This happens in execv_dased_external() -> check_pager_config() ->
read_early_config(). We probably could use the same discover_git_directory
trick to get .git dir (because we should find it). Maybe something
like this instead?

diff --git a/config.c b/config.c
index 1a4d85537b..4f540ae578 100644
--- a/config.c
+++ b/config.c
@@ -212,8 +212,14 @@ static int include_by_gitdir(const char *cond, size_t 
cond_len, int icase)
struct strbuf text = STRBUF_INIT;
struct strbuf pattern = STRBUF_INIT;
int ret = 0, prefix;
+   struct strbuf gitdir = STRBUF_INIT;
 
-   strbuf_add_absolute_path(&text, get_git_dir());
+   if (have_git_dir())
+   strbuf_addstr(&gitdir, get_git_dir());
+   else if (!discover_git_directory(&gitdir))
+   goto done;
+
+   strbuf_add_absolute_path(&text, gitdir.buf);
strbuf_add(&pattern, cond, cond_len);
prefix = prepare_include_condition_pattern(&pattern);
 
@@ -237,6 +243,7 @@ static int include_by_gitdir(const char *cond, size_t 
cond_len, int icase)
 icase ? WM_CASEFOLD : 0, NULL);
 
 done:
+   strbuf_release(&gitdir);
strbuf_release(&pattern);
strbuf_release(&text);
return ret;

--
Duy


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-15 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 10:37 PM, Kevin Willford  wrote:
>
>> -Original Message-
>> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
>> Behalf Of Duy Nguyen
>> Sent: Wednesday, April 12, 2017 7:21 AM
>> To: Kevin Willford 
>> Cc: Kevin Willford ; git@vger.kernel.org;
>> gits...@pobox.com; p...@peff.net
>> Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid
>> data loss.
>>
>> On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford 
>> wrote:
>> > The loss of the skip-worktree bits is part of the problem if you are
>> > talking about modified files.  The other issue that I was having is
>> > when running a reset and there were files added in the commit that is
>> > being reset, there will not be an entry in the index and not a file on
>> > disk so the data for that file is completely lost at that point.
>> > "status" also doesn't include anything about this loss of data.  On
>> > modified files status will at least have the file as deleted since
>> > there is still an index entry but again the previous version of the file 
>> > and it's
>> data is lost.
>>
>> Well, we could have "deleted" index entries, those marked with
>> CE_REMOVE. They are never written down to file though, so 'status'
>> won't benefit from that. Hopefully we can restore the file before the index
>> file is written down and we really lose skip-worktree bits?
>
> Because this is a reset --mixed it will never run through unpack_trees and
> The entries are never marked with CE_REMOVE.

I know. But in my view, it should. All updates from a tree object to
the index should happen through unpack_trees().

>> > To me this is totally unexpected behavior, for example if I am on a
>> > commit where there are only files that where added and run a reset
>> > HEAD~1 and then a status, it will show a clean working directory.
>> > Regardless of skip-worktree bits the user needs to have the data in
>> > the working directory after the reset or data is lost which is always bad.
>>
>> I agree we no longer have a place to save the skip-worktree bit, we have to
>> restore the state back as if skip-worktree bit does not exist.
>> It would be good if we could keep the logic in unpack_trees() though.
>> For example, if the file is present on disk even if skip-worktree bit is on,
>> unpack_trees() would abort instead of silently overwriting it.
>> This is a difference between skip-worktree and assume-unchanged bits.
>> If you do explicit checkout_entry() you might have to add more checks to
>> keep behavior consistent.
>> --
>> Duy
>
> Because this is a reset --mixed it will follow the code path calling 
> read_from_tree
> and ends up calling update_index_from_diff in the format_callback of the diff,
> so unpack_trees() is never called in the --mixed case.  This code change also 
> only applies
> when the file does not exist and the skip-worktree bit is on and the updated
> index entry either will be missing (covers the added scenario) or was not 
> missing
> before (covers the modified scenario).  If there is a better way to get the 
> previous
> index entry to disk than what I am doing, I am happy to implement it 
> correctly.

I think it's ok to just look at the diff (from update_index_from_diff)
and restore the on-disk version for now. I'd like to make --mixed use
unpack_trees() too but I haven't studied  this code long enough to see
why it went with "diff" instead of "read-tree" (which translates
directly to unpack_trees). Maybe there is some subtle reason for that.
Though it looks like it was more convenient to do "diff" in the
git-reset.sh version, and that got translated literally to C when the
command was rewritten.
-- 
Duy


Re: What's cooking in git.git (Apr 2017, #02; draft as of Sat, 15)

2017-04-16 Thread Duy Nguyen
On Sat, Apr 15, 2017 at 5:14 PM, Junio C Hamano  wrote:
> * nd/conditional-config-include (2017-04-14) 2 commits
>  - config: resolve symlinks in conditional include's patterns
>  - path.c: and an option to call real_path() in expand_user_path()
>

$GIT_DIR may in some cases be normalized with all symlinks resolved
while "gitdir" path expansion in the pattern does not receive the same
treatment. This may lead to incorrect mismatch.

> * nd/worktree-add-lock (2017-04-15) 2 commits
>  - SQUASH???
>  - worktree add: add --lock option
>

Allow to lock a workktree immediately after it's created. This helps
prevent a race between "git worktree add; git worktree lock" and "git
worktree prune".
-- 
Duy


Re: [BUG] ls-files '**' globstar matches one or more directories instead of zero or more directories

2017-04-16 Thread Duy Nguyen
On Sat, Apr 15, 2017 at 2:17 AM, Alistair Buxton  wrote:
> To reproduce, go to any git repository and run:
>
> diff <(git ls-files '**/*' | sort) <(git ls-files | sort)
>
> Expected result: No output since both commands should produce identical 
> output.
>
> Actual result: '**/*' only matches files at least one directory deep.
>
> The same happens with eg '**/Makefile' - only Makefiles in
> subdirectories are listed.
>
> I have personally tested with 2.7.8 and 2.12.0. Others on IRC report
> that this happens in the next branch.

Another data point. t3070-wildmatch.sh has a test for this case

match 1 0 'foo' '**/foo'

so the pattern matching machinery is _not_ broken. There may be some
changes in ls-files (pathspec-related?) that leads to this. But I
didn't try to bisect it.
-- 
Duy


Re: [BUG] ls-files '**' globstar matches one or more directories instead of zero or more directories

2017-04-16 Thread Duy Nguyen
On Sat, Apr 15, 2017 at 2:17 AM, Alistair Buxton  wrote:
> To reproduce, go to any git repository and run:
>
> diff <(git ls-files '**/*' | sort) <(git ls-files | sort)

Actually the '**/' magic only kicks in if you write

git ls-files ':(glob)**/*'

Without that '**' is a normal '*' and matching just subdirs is expected.
-- 
Duy


Re: [PATCH 1/2] config: prepare to pass more info in git_config_with_options()

2017-04-16 Thread Duy Nguyen
On Sun, Apr 16, 2017 at 11:31:28AM -0400, Jeff King wrote:
> On Sun, Apr 16, 2017 at 05:41:24PM +0700, Nguyễn Thái Ngọc Duy wrote:
> 
> > So far we can only pass one flag, respect_includes, to thie function. We
> > need to pass some more (non-flag even), so let's make it accept a struct
> > instead of an integer.
> 
> Yeah, this makes sense. But...
> 
> > diff --git a/builtin/config.c b/builtin/config.c
> > index 4f49a0edb9..5de4a36146 100644
> > --- a/builtin/config.c
> > +++ b/builtin/config.c
> > @@ -26,7 +26,8 @@ static int use_global_config, use_system_config, 
> > use_local_config;
> >  static struct git_config_source given_config_source;
> >  static int actions, types;
> >  static int end_null;
> > -static int respect_includes = -1;
> > +static int respect_includes_opt;
> > +static struct config_options config_options;
> 
> It fails all the git-config "include" tests because respect_includes_opt
> is missing the initialization to "-1".

Serves me right for doing last-minute "harmless" refactoring without
rerunning the test suite :(

> 
> -Peff


Re: [PATCH 2/2] config: handle conditional include when $GIT_DIR is not set up

2017-04-16 Thread Duy Nguyen
On Sun, Apr 16, 2017 at 11:51:32AM -0400, Jeff King wrote:
> > diff --git a/cache.h b/cache.h
> > index e29a093839..27b7286f99 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1884,6 +1884,8 @@ enum config_origin_type {
> >  
> >  struct config_options {
> > unsigned int respect_includes : 1;
> > +   unsigned int early_config : 1;
> > +   const char *git_dir; /* only valid when early_config is true */
> >  };
> 
> Why do we need both the flag and the string? Later, you do:
> 
> > -static int include_by_gitdir(const char *cond, size_t cond_len, int icase)
> > +static int include_by_gitdir(const struct config_options *opts,
> > +const char *cond, size_t cond_len, int icase)
> >  {
> > struct strbuf text = STRBUF_INIT;
> > struct strbuf pattern = STRBUF_INIT;
> > int ret = 0, prefix;
> >  
> > -   strbuf_add_absolute_path(&text, get_git_dir());
> > +   if (!opts->early_config)
> > +   strbuf_add_absolute_path(&text, get_git_dir());
> > +   else if (opts->git_dir)
> > +   strbuf_add_absolute_path(&text, opts->git_dir);
> > +   else
> > +   goto done;
> 
> So we call get_git_dir() always when we're not in early config. Even if
> we don't have a git dir! Doesn't this mean that programs operating
> outside of a repo will still hit the BUG? I.e.:
> 
>   git config --global includeif.gitdir:/whatever.path foo
>   cd /not/a/git/dir
>   git diff --no-index foo bar
> 
> ?
> 
> I think instead the logic should be:
> 
>   if (opts->git_dir)
>   strbuf_add_absolute_path(&text, opts->git_dir);
>   else if (have_git_dir())
>   strbuf_add_absolute_path(&text, get_git_dir());
>   else
>   goto done;

Do we care about the case when the override instruction is "we don't
have $GIT_DIR, act as if it does not exist, even though have_git_dir()
returns true"?

I'm guessing no, we won't run into that situation (and am inclined to
restructure the code as you suggested). Just throwing it out there if
I'm mistaken.
--
Duy


Re: [PATCH 2/2] config: handle conditional include when $GIT_DIR is not set up

2017-04-17 Thread Duy Nguyen
(To Junio, this series conflicts slightly with
nd/conditional-config-include, let me know if you want me to rebase
this on top of that)

On Sun, Apr 16, 2017 at 10:51 PM, Jeff King  wrote:
>> + if (opts.git_dir) {
>>   struct git_config_source repo_config;
>>
>>   memset(&repo_config, 0, sizeof(repo_config));
>> - strbuf_addstr(&buf, "/config");
>> + strbuf_addf(&buf, "%s/config", opts.git_dir);
>>   repo_config.file = buf.buf;
>>   git_config_with_options(cb, data, &repo_config, &opts);
>>   }
>
> ...we have to re-add the git_dir.
>
> Might it be simpler to just xstrdup() to opts.git_dir, and then leave
> this later code alone?

Sure thing. But we need to restore the "if" expression too. Otherwise,
if have_git_dir() is true, we may come here with an empty "buf" before
that strbuf_addstr(&buf, "/config") is called. It does not matter much
anyway because this block will be removed.

> You can see the second problem with:
>
>   # random external
>   cat >git-foo <<-\EOF
>   #!/bin/sh
>   echo foo
>   EOF
>   chmod +x git-foo
>   PATH=$PWD:$PATH
>
>   git init
>   git config pager.foo 'sed s/^/repo:/'
>   git -c pager.foo='sed s/^/cmdline:/' foo
>
> That command should prefer the cmdline config, but it doesn't.

I actually had problems seeing the problem, for some reason it didn't
work for me. I guess I made a mistake somewhere.

> The fix is something like what's below, which is easy on top of your new
> options struct. I can wrap it up with a config message and test on top
> of your series.

... anyway I read this last sentence too late and spent too much time
wrapping your changes in a patch (well most of the time was spent on
writing a new test actually), so I'm  sending it out too. My test uses
test-config though (I have given up on dealing with pager and tty).

Off topic. Back to the pager.foo thing (because I had to fight it a
bit before giving up). I find it a bit unintuitive that "--paginate"
forces the pager back to less (or $PAGER) when I say "pager.foo =
my-favorite-pager". Back when pager.foo is a boolean thing, it makes
sense for --paginate to override the "to page or not to page"
decision. But then you added a new meaning too pager.foo (what command
to run). "--paginate" should respect the command pager.foo specifies
when its value is a command, I think.
-- 
Duy


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-17 Thread Duy Nguyen
On Fri, Apr 14, 2017 at 4:29 AM, Jeff King  wrote:
> On Wed, Apr 12, 2017 at 08:11:22PM +0700, Duy Nguyen wrote:
>
>> On Wed, Apr 12, 2017 at 8:01 PM, Jeff King  wrote:
>> > I dunno. Maybe I am missing some subtle case, but it's not clear to me
>> > what the user would be trying to do by having git stay in the original
>> > directory.
>>
>> Not sure if it really happens. But if we jump from worktree is inside
>> original cwd and we have to jump to worktree, we set empty prefix, not
>> "../../../" one. So we can't refer back to files relative to original
>> cwd with prefix. I was kinda hoping "super prefix" would solve it, but
>> that one seems designed specifically for submodules.
>
> Ah, right. I think the issue is that "prefix" really serves two uses.
> For things like command-line arguments, we use to find the original path
> after we've moved. But we also use it for "where in the working tree
> are we".
>
> So with an out-of-tree cwd, we'd want to set the first one to the real
> path ("../../whatever" or even just an absolute path), but the second
> one would probably be empty (i.e., just pretend that we started from the
> top-level).
>
> But that would require first refactoring all of the cmd_* functions to
> handle those prefixes separately.

Yeah. I probably shouldn't start another series until all others of
mine settle. But if anyone is changing cmd_*, I suggest we take this
opportunity to pass "struct struct startup_info *" in the as the only
option. argv/argv could be stored there as well. And you can add
"cwd_prefix" to present that "../../whatever" (I would avoid the name
"super prefix" until things settle), or just keep the original "cwd"
in there and let people do whatever they want with it.
-- 
Duy


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-17 Thread Duy Nguyen
On Sun, Apr 16, 2017 at 11:25 AM, Duy Nguyen  wrote:
>> Because this is a reset --mixed it will never run through unpack_trees and
>> The entries are never marked with CE_REMOVE.
>
> I know. But in my view, it should. All updates from a tree object to
> the index should happen through unpack_trees().

Just fyi. My view is wrong. We need to handle a diff here, not through
unpack_trees() because "git reset --mixed" support partial reset, see
2ce633b928 (git-reset [--mixed]  [--] ... - 2006-12-14).
We might be able to make unpack_trees() leave certain paths(pec)
untouched, but I don't think it is worth it. In other words, your
original patch is the way to go.

PS. I briefly wondered if "git checkout  -- " had the
same problem. I think not, because while --mixed does not touch
worktree, checkout does, so it should restore on-disk versions if
needed. The read_tree_some() call in checkout_paths() should respect
sparse patterns and add skip-worktree bits back if needed though, but
I don't think it does that.
-- 
Duy


Re: How to keep log history when renaming and changing simultaneously

2017-04-17 Thread Duy Nguyen
On Mon, Apr 17, 2017 at 6:36 PM, Urs Thuermann  wrote:
> Sometimes I need to rename and change a file in one commit.  One
> example would be a file foo.h that begins with
>
> #ifndef FOO_H
> #define FOO_H
>
> which should be renamed bar.h and have the FOO_H changed to BAR_H.
> In subversion I would
>
> svn mv foo.h bar.h; vi bar.h; svn ci
>
> and then a
>
> svn log bar.h
>
> also shows the history of that file when it was named foo.h.
>
> This does not work in git.
> ...
> Is there a better way to do this in git?

We could if we allowed rename annotation, similar to that "svn mv"
command. I worked on that a while back [1] then I stopped because of a
design conflict. But I'm almost convinced Jeff and Junio were right
(and I wrong) now so I will likely revive that series before 2019
hits.

[1] 
http://public-inbox.org/git/1453287968-26000-1-git-send-email-pclo...@gmail.com/
-- 
Duy


Re: Feature request: --format=json

2017-04-17 Thread Duy Nguyen
On Mon, Apr 17, 2017 at 7:44 PM, Fred .Flintstone  wrote:
> So I did "git rev-list --all --pretty" and it looks like "git log".
> Which outputs a human-readable format.
>
> However, if I want something more suitable for machine parsing, is
> there any way to get that output?
>
> Example maybe I want another date format like ISO dates, or maybe a
> serializable format like JSON or CSV or something. Maybe I want more
> data than commit, auhor, date, subject and body?

"git cat-file commit " should give you a machine-readable
format of everything (it's the same thing that git-log parses and
shows you; not counting options like --diff, --stat...). 
is from rev-list output (without --pretty, that's not for machine
processing). You probably can use "git cat-file --batch" too, just
pipe the whole rev-list output through it. You don't get to choose a
convenient format this way though.
-- 
Duy


Re: What's cooking in git.git (Apr 2017, #03; Tue, 18)

2017-04-19 Thread Duy Nguyen
On Tue, Apr 18, 2017 at 10:45:22PM -0700, Junio C Hamano wrote:
> * nd/conditional-config-in-early-config (2017-04-17) 3 commits
>  - config: correct file reading order in read_early_config()
>  - config: handle conditional include when $GIT_DIR is not set up
>  - config: prepare to pass more info in git_config_with_options()
> 
>  The recently introduced conditional inclusion of configuration did
>  not work well when early-config mechanism was involved.
> 
>  Will merge to 'next'.

You may want to squash this in before merging to 'next'. I think this
is the last comment from Jeff.

-- 8< --
Subject: [PATCH] fixup! config: correct file reading order in 
read_early_config()

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/helper/test-config.c  | 4 
 t/t1309-early-config.sh | 3 ++-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 696d0a52fd..8e3ed6a76c 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -84,10 +84,6 @@ int cmd_main(int argc, const char **argv)
struct config_set cs;
 
if (argc == 3 && !strcmp(argv[1], "read_early_config")) {
-   const char *cmdline_config = getenv("CMDL_CFG");
-
-   if (cmdline_config)
-   git_config_push_parameter(cmdline_config);
read_early_config(early_config_cb, (void *)argv[2]);
return 0;
}
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index c15f18dd96..1af8c454cf 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -47,13 +47,14 @@ test_expect_success 'ceiling #2' '
test xdg = "$(cat output)"
 '
 
+cmdline_config="'test.source=cmdline'"
 test_expect_success 'read config file in right order' '
echo "[test]source = home" >>.gitconfig &&
git init foo &&
(
cd foo &&
echo "[test]source = repo" >>.git/config &&
-   CMDL_CFG=test.source=cmdline test-config \
+   GIT_CONFIG_PARAMETERS=$cmdline_config test-config \
read_early_config test.source >actual &&
cat >expected <<-\EOF &&
home
-- 
2.11.0.157.gd943d85
-- 8< --


Re: [PATCH v2 12/12] rev-list: expose and document --single-worktree

2017-04-19 Thread Duy Nguyen
On Sun, Mar 19, 2017 at 1:00 AM, Junio C Hamano  wrote:
>> diff --git a/Documentation/rev-list-options.txt 
>> b/Documentation/rev-list-options.txt
>> index a02f7324c0..c71e94b2d0 100644
>> --- a/Documentation/rev-list-options.txt
>> +++ b/Documentation/rev-list-options.txt
>> @@ -179,6 +179,14 @@ explicitly.
>>   Pretend as if all objects mentioned by reflogs are listed on the
>>   command line as ``.
>>
>> +--single-worktree::
>> + By default, all working trees will be examined by the
>
> s/working tree/worktree/?

Nope. It's the "working tree" that we consistently use throughout
git-worktree.txt
-- 
Duy


Re: [PATCH v3 05/12] refs: move submodule slash stripping code to get_submodule_ref_store

2017-04-19 Thread Duy Nguyen
On Thu, Apr 20, 2017 at 12:02:08AM +0200, Johannes Sixt wrote:
> Am 19.04.2017 um 13:01 schrieb Nguyễn Thái Ngọc Duy:
> > @@ -1558,7 +1543,17 @@ struct ref_store *get_submodule_ref_store(const char 
> > *submodule)
> >  {
> > struct strbuf submodule_sb = STRBUF_INIT;
> > struct ref_store *refs;
> > +   char *to_free = NULL;
> > int ret;
> > +   size_t len;
> > +
> > +   if (submodule) {
> > +   len = strlen(submodule);
> > +   while (len && submodule[len - 1] == '/')
> 
> What is the source of the value of 'submodule'? Is it an index entry? Or 
> did it pass through parse_pathspec? In these cases it is correct to 
> compare against literal '/'. Otherwise, is_dir_sep() is preferred.

This is a code move from resolve_gitlink_ref(), which goes back to
0ebde32c87 (Add 'resolve_gitlink_ref()' helper function - 2007-04-09)
and it looks like a dir separator back then.

Can I convert that in a separate topic? I think Michael even wanted to
kill all these path manipulation in refs code, which makes sense, but
I would need to audit the callers carefully before making that move.
--
Duy


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-20 Thread Duy Nguyen
On Wed, Apr 19, 2017 at 10:37:21PM -0700, Junio C Hamano wrote:
> * nd/worktree-add-lock (2017-04-16) 2 commits
>  - SQUASH???
>  - worktree add: add --lock option
> 
>  Allow to lock a worktree immediately after it's created. This helps
>  prevent a race between "git worktree add; git worktree lock" and
>  "git worktree prune".
> 
>  Waiting for a response to SQUASH???

Looking good. I would add some comment, lest ';' feel lonely. But it's
really personal taste.

-- 8< --
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 5ebdcce793..bc75676bf3 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -310,7 +310,7 @@ static int add_worktree(const char *path, const char 
*refname,
strbuf_reset(&sb);
strbuf_addf(&sb, "%s/locked", sb_repo.buf);
if (!ret && opts->keep_locked)
-   ;
+   ; /* --lock wants to keep "locked" file */
else
unlink_or_warn(sb.buf);
argv_array_clear(&child_env);
-- 8< --


Re: [PATCH v3 05/12] refs: move submodule slash stripping code to get_submodule_ref_store

2017-04-20 Thread Duy Nguyen
On Thu, Apr 20, 2017 at 5:02 AM, Johannes Sixt  wrote:
> Am 19.04.2017 um 13:01 schrieb Nguyễn Thái Ngọc Duy:
>>
>> @@ -1558,7 +1543,17 @@ struct ref_store *get_submodule_ref_store(const
>> char *submodule)
>>  {
>> struct strbuf submodule_sb = STRBUF_INIT;
>> struct ref_store *refs;
>> +   char *to_free = NULL;
>> int ret;
>> +   size_t len;
>> +
>> +   if (submodule) {
>> +   len = strlen(submodule);
>> +   while (len && submodule[len - 1] == '/')
>
>
> What is the source of the value of 'submodule'? Is it an index entry? Or did
> it pass through parse_pathspec? In these cases it is correct to compare
> against literal '/'. Otherwise, is_dir_sep() is preferred.

I've looked at the callers. Yes it is a path and is_dir_sep() should
be used. Since this has been like this in the current code, unless
there's some more changes in this series or you insist, I would hold
this change back, wait for this series to settle and submit it later
(I'll have to do that anyway to kill the get_main_store() call in this
function).
-- 
Duy


Re: [PATCH] refs.h: rename submodule arguments to submodule_path

2017-04-21 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 1:42 PM, Michael Haggerty  wrote:
> On 04/21/2017 08:32 AM, Michael Haggerty wrote:
>> [...]
>> I've CCed Duy because I don't know whether he has more plans regarding
>> submodule references [...] get rid of the
>> `for_each_ref_submodule()` family of functions entirely.
>>
>> So perhaps the code that this patch touches won't be around long anyway.
>
> Oh yeah, he has done exactly that in his nd/prune-in-worktree patch
> series. (I knew I'd seen that somewhere...)
>
> So it seems that the argument renaming has mostly been overtaken by
> events, though even after Duy's patch series there are a few `const char
> *submodule` arguments that could be renamed.

Yeah. After that series, the only place that takes a submodule (path)
is get_submodule_ref_store() (other functions are just helpers).
Renaming it to submodule_path makes perfect sense. Johannes Sixt when
reviewing that series also noticed the "path" nature of this
"submodule" argument and suggested converting submodule[x] == '/' to
is_dir_sep(submodule[i]) for that reason.

At this point, I think Stefan even has the opportunity to
reference/look up a submodule ref store by something other than a path
 (like "struct submodule *", or by name)  if he wants to. But if you
do that, maybe rename the current function to
get_submodule_ref_store_by_path() before you add a new
get_submodule_ref_store_by_whatever().

About ".. because I don't know whether he (Duy) has more plans
regarding submodule references", I plan to convert "submodule[x] ==
'/'" to is_dir_sep(submodule[x]), but I think that's about it. I'm not
involved much in submodule to see the direction it's heading. As far
as refs code is concerned, a "struct ref_store *" is all it needs,
regardless of how you obtain it.
-- 
Duy


Re: [BUG] test suite broken with GIT_TEST_SPLIT_INDEX

2017-04-21 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 2:10 PM, Christian Couder
 wrote:
> On Thu, Apr 20, 2017 at 11:24 PM, Thomas Gummerer  
> wrote:
>> On 04/20, Christian Couder wrote:
>>>
>>> Could you try with the following patch:
>>>
>>> http://public-inbox.org/git/20170330210354.20018-1-chrisc...@tuxfamily.org/
>>
>> Yeah, I tried with and without that patch with the same result.
>> Unless I'm screwing something up when testing I don't think this fixes
>> the issue unfortunately.
>
> Ok, I will take a look soon.
>
> By the way I think that GIT_TEST_SPLIT_INDEX has become redundant now
> that there is core.splitIndex.
> So perhaps in the long run it will be best to deprecate
> GIT_TEST_SPLIT_INDEX and eventually remove it.

I think you can't, at least the way I understand this variable. It's a
_test_ variable to force exercise split index code path a whole lot
more, by running the entire test suite with split index always
enabled, instead of just a couple in  t-split-index.sh. We can't
achieve the same with core.splitIndex because that's more about user
control and you can't just set core.splitIndex for the entire test
suite (can we?).

I would understand if the failed tests are about core.splitIndex (they
both try to control how split index is created), but failed tests look
totally unrelated. We may have discovered some bug (or that git_path
one Jeff mentioned).
-- 
Duy


Re: [PATCH 00/15] Handle fopen() errors

2017-04-21 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 1:29 PM, Jeff King  wrote:
> On Thu, Apr 20, 2017 at 08:41:32PM -0700, Junio C Hamano wrote:
>
>> Junio C Hamano  writes:
>>
>> > I wonder if it is OK to only special case ENOENT for !fp cases,
>> > where existing code silently returns.  Perhaps it is trying to read
>> > an optional file, and it returns silently because lack of it is
>> > perfectly OK for the purpose of the code.  Are there cases where
>> > this optional file is inside an optional directory, leading to
>> > ENOTDIR, instead of ENOENT, observed and reported by your check?
>>
>> "git grep -B1 warn_on_inaccessible" is enlightening.  I wonder if we
>> want to wrap the two lines into a hard to misuse helper function,
>> something along this line.  Would having this patch as a preparatory
>> step shrink your series?  The patch count would be the same, but you
>> wouldn't be writing "if (errno != ENOENT)" lines yourself.
>
> I had a similar thought while reading through it. I think it would be
> shorter still with:
>
>   FILE *fopen_or_warn(const char *path, const char *mode)
>   {
> FILE *fh = fopen(path, mode);
> if (!fh)
> warn_failure_to_read_open_optional_path(path);
> return fh;
>   }
>
> And then quite a few of the patches could just be
> s/fopen/fopen_or_warn/.

Jeff.. oh Jeff.. you have made it _way_ too convenient that after a
quick grep at fopen( again, I found a couple more places that I would
have just ignored last time (too much work), but now all I need to do
is Alt-f to the end of fopen and Alt-/ a few times. Too tempting.. :)
-- 
Duy


Re: [BUG] test suite broken with GIT_TEST_SPLIT_INDEX

2017-04-21 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 6:57 PM, Christian Couder
 wrote:
> On Fri, Apr 21, 2017 at 1:46 PM, Christian Couder
>  wrote:
>> On Fri, Apr 21, 2017 at 11:53 AM, Duy Nguyen  wrote:
>>> On Fri, Apr 21, 2017 at 2:10 PM, Christian Couder
>>>  wrote:
>>>> On Thu, Apr 20, 2017 at 11:24 PM, Thomas Gummerer  
>>>> wrote:
>>>>> On 04/20, Christian Couder wrote:
>>>>>>
>>>>>> Could you try with the following patch:
>>>>>>
>>>>>> http://public-inbox.org/git/20170330210354.20018-1-chrisc...@tuxfamily.org/
>>>>>
>>>>> Yeah, I tried with and without that patch with the same result.
>>>>> Unless I'm screwing something up when testing I don't think this fixes
>>>>> the issue unfortunately.
>>>>
>>>> Ok, I will take a look soon.
>>>>
>>>> By the way I think that GIT_TEST_SPLIT_INDEX has become redundant now
>>>> that there is core.splitIndex.
>>>> So perhaps in the long run it will be best to deprecate
>>>> GIT_TEST_SPLIT_INDEX and eventually remove it.
>>>
>>> I think you can't, at least the way I understand this variable. It's a
>>> _test_ variable to force exercise split index code path a whole lot
>>> more, by running the entire test suite with split index always
>>> enabled, instead of just a couple in  t-split-index.sh. We can't
>>> achieve the same with core.splitIndex because that's more about user
>>> control and you can't just set core.splitIndex for the entire test
>>> suite (can we?).
>>
>> Yeah, you are right.
>> It looks like we have GIT_TEST_OPTS to pass options like --debug,
>> --valgrind, --verbose, but we don't have an environment variable to
>> set config options.
>
> Or maybe GIT_CONFIG_PARAMETERS works for this purpose?

It has to be set inside test-lib.sh, not from outside because
environment variables from outside are filtered if I remember
correctly and only a few specials plus those GIT_TEST_ can survive.
Some tests override GIT_CONFIG_PARAMETERS themselves to pass config
vars to certain command (I know because I just did a couple days ago
;).which loses core.splitIndex.
-- 
Duy


Re: [PATCH 00/15] Handle fopen() errors

2017-04-21 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 6:52 PM, Junio C Hamano  wrote:
> Yes, but (1) we'd need to be careful about --quiet

Yeah. It's a real pain point for making changes like this. At some
point we should just have a global (maybe multi-level) quiet flag.
-- 
Duy


Re: [PATCH v3 06/12] refs: add refs_head_ref()

2017-04-22 Thread Duy Nguyen
On Sat, Apr 22, 2017 at 1:37 PM, Michael Haggerty  wrote:
> On 04/19/2017 01:01 PM, Nguyễn Thái Ngọc Duy wrote:
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  refs.c | 19 +--
>>  refs.h |  2 ++
>>  2 files changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/refs.c b/refs.c
>> index 26474cb62a..a252ae43ee 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1208,27 +1208,26 @@ int refs_rename_ref_available(struct ref_store *refs,
>>   return ok;
>>  }
>>
>> -int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data)
>> +int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>>  {
>>   struct object_id oid;
>>   int flag;
>>
>> - if (submodule) {
>> - if (resolve_gitlink_ref(submodule, "HEAD", oid.hash) == 0)
>> - return fn("HEAD", &oid, 0, cb_data);
>> -
>> - return 0;
>> - }
>> -
>> - if (!read_ref_full("HEAD", RESOLVE_REF_READING, oid.hash, &flag))
>> + if (!refs_read_ref_full(refs, "HEAD", RESOLVE_REF_READING,
>> + oid.hash, &flag))
>>   return fn("HEAD", &oid, flag, cb_data);
>>
>>   return 0;
>>  }
>>
>> +int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data)
>> +{
>> + return refs_head_ref(get_submodule_ref_store(submodule), fn, cb_data);
>> +}
>> +
>>  int head_ref(each_ref_fn fn, void *cb_data)
>>  {
>> - return head_ref_submodule(NULL, fn, cb_data);
>> + return refs_head_ref(get_main_ref_store(), fn, cb_data);
>>  }
>>
>>  /*
>> diff --git a/refs.h b/refs.h
>> index 447381d378..0572473ef7 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -233,6 +233,8 @@ typedef int each_ref_fn(const char *refname,
>>   * modifies the reference also returns a nonzero value to immediately
>>   * stop the iteration. Returned references are sorted.
>>   */
>> +int refs_head_ref(struct ref_store *refs,
>> +   each_ref_fn fn, void *cb_data);
>>  int refs_for_each_ref(struct ref_store *refs,
>> each_ref_fn fn, void *cb_data);
>>  int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
>>
>
> I'm seeing segfaults in t3600 after this patch, apparently because
> `refs==NULL` gets passed from `head_ref_submodule()` to `refs_head_ref()`.

Eventually it's the caller's responsibility not to pass NULL ref store
to refs_* functions. That happens in "revision.c: use refs_for_each*()
instead of for_each_*_submodule()" patch, which is too late and would
break bisect. Will fix.
-- 
Duy


Re: [PATCH v3 10/12] files-backend: make reflog iterator go through per-worktree reflog

2017-04-22 Thread Duy Nguyen
On Sat, Apr 22, 2017 at 10:05:02AM +0200, Michael Haggerty wrote:
> I find this implementation confusing:
> 
> * `if (iter->worktree_dir_iterator)` sounds like it should mean
>   that we are iterating over worktree references but it really means
>   that we are iterating over the common references in a repository
>   that is a linked worktree.
> * `files_reflog_iterator_advance()` is called recursively, but only
>   for the first worktree reference.
> * `iter->worktree_dir_iterator` is moved over to `iter->dir_iterator`
>   when the common refs are exhausted.
> 
> Do you find it more readable as follows?:

It's a bit better, but while we're at it, why not take full advantage
of iterator abstraction?

This replacement patch (with some unrelated bits removed to reduce
distraction) adds a new meta ref-iterator that combine a per-repo and
a per-worktree iterators together. The new iterator walks through both
sub-iterators and drop the per-worktree results from the per-repo
iterator, which will be replaced with results from per-worktree
iterator.

You probably see where I'm going with this. When the new "linked
worktree ref store" comes, it will combine two per-worktree and
per-repo ref stores together and this iterator will come handy.

At that point, files-backend can go back to being oblivious about
$GIT_DIR vs $GIT_COMMON_DIR and files_reflog_iterator_begin() will be
reverted back to the version before this patch.

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4149943a6e..817b7b5d5e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3432,23 +3423,37 @@ static struct ref_iterator_vtable 
files_reflog_iterator_vtable = {
files_reflog_iterator_abort
 };
 
-static struct ref_iterator *files_reflog_iterator_begin(struct ref_store 
*ref_store)
+static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
+ const char *gitdir)
 {
-   struct files_ref_store *refs =
-   files_downcast(ref_store, REF_STORE_READ,
-  "reflog_iterator_begin");
struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
struct ref_iterator *ref_iterator = &iter->base;
struct strbuf sb = STRBUF_INIT;
 
base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-   files_reflog_path(refs, &sb, NULL);
+   strbuf_addf(&sb, "%s/logs", gitdir);
iter->dir_iterator = dir_iterator_begin(sb.buf);
iter->ref_store = ref_store;
strbuf_release(&sb);
+
return ref_iterator;
 }
 
+static struct ref_iterator *files_reflog_iterator_begin(struct ref_store 
*ref_store)
+{
+   struct files_ref_store *refs =
+   files_downcast(ref_store, REF_STORE_READ,
+  "reflog_iterator_begin");
+
+   if (!strcmp(refs->gitdir, refs->gitcommondir)) {
+   return reflog_iterator_begin(ref_store, refs->gitcommondir);
+   } else {
+   return worktree_ref_iterator_begin(
+   reflog_iterator_begin(ref_store, refs->gitcommondir),
+   reflog_iterator_begin(ref_store, refs->gitdir));
+   }
+}
+
 static int ref_update_reject_duplicates(struct string_list *refnames,
struct strbuf *err)
 {
diff --git a/refs/iterator.c b/refs/iterator.c
index bce1f192f7..93243a00c4 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -40,6 +40,14 @@ void base_ref_iterator_free(struct ref_iterator *iter)
free(iter);
 }
 
+static void ref_iterator_copy_result(struct ref_iterator *dst,
+const struct ref_iterator *src)
+{
+   dst->refname = src->refname;
+   dst->oid = src->oid;
+   dst->flags = src->flags;
+}
+
 struct empty_ref_iterator {
struct ref_iterator base;
 };
@@ -382,3 +390,100 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
return -1;
return retval;
 }
+
+struct worktree_ref_iterator {
+   struct ref_iterator base;
+   struct ref_iterator *per_repo_iterator;
+   struct ref_iterator *per_worktree_iterator;
+};
+
+static int worktree_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+   struct worktree_ref_iterator *iter =
+   (struct worktree_ref_iterator *)ref_iterator;
+   int ok;
+
+   while (1) {
+   struct ref_iterator **subiter;
+   int normal_only;
+
+   if (iter->per_repo_iterator) {
+   subiter = &iter->per_repo_iterator;
+   /*
+* If we are in a worktree, then we only
+* include "normal" common references:
+*/
+   normal_only = !!iter->per_worktree_iterator;
+   } else if (iter->per_worktree_iterator) {
+   subiter = &iter->per_worktree_iterator;
+  

Re: Feature request: --format=json

2017-04-24 Thread Duy Nguyen
On Mon, Apr 24, 2017 at 3:33 PM, shawn wilson  wrote:
> Late to the party, but I too would also like json format output (mainly so I
> could pipe stuff to jq instead of looking at the man page for which %thing
> I'm looking for). That said, it's not at the PR level of want for me.
>
> OTOH, format=xml would be even more handy IMHO... Which I see has hit both
> SO and this ml in the past. Either way /some/ machine output would be a good
> thing :)

Personally I'd rather avoid linking to another library just for
json/xml formatting. libgit2 would be a great place to have
functionality like this and it looks like you don't even have to touch
C [1] to do that.

[1] https://gist.github.com/m1el/42472327b4be382b02eb
-- 
Duy


Re: [PATCH] cache-tree: reject entries with null sha1

2017-04-24 Thread Duy Nguyen
On Sat, Apr 22, 2017 at 1:46 AM, Jeff King  wrote:
> We generally disallow null sha1s from entering the index,
> due to 4337b5856 (do not write null sha1s to on-disk index,
> 2012-07-28). However, we loosened that in 83bd7437c
> (write_index: optionally allow broken null sha1s,
> 2013-08-27) so that tools like filter-branch could be used
> to repair broken history.

Uh oh.. cache-tree.

> However, we should make sure that these broken entries do
> not get propagated into new trees. For most entries, we'd
> catch them with the missing-object check (since presumably
> the null sha1 does not exist in our object database). But
> gitlink entries do not need reachability, so we may blindly
> copy the entry into a bogus tree.

Phew.. not another bug of mine :D

> When merged to pu, this fixes the existing test breakage in t7009 when
> GIT_TEST_SPLIT_INDEX is used (because the split index didn't rewrite the
> whole index, "git rm --cached" didn't always barf).

Latest 'pu' has your patch, but t7009 still fails on me (with "invalid
object" error), more on this later..

> But I think it's worth doing on its own merits, as demonstrated by the new 
> tests.

Agreed. The patch looks correct.

Just checking, since cache-tree helps speed up many operations,
dropping cache-tree can have some performance implication. But this
must be an error case (null sha1) and we will not run into it often to
worry about unnecessary dropping, correct?

> The one thing I haven't figured out it is why the new test in t7009
> fails with the split-index. And even more curiously, the new tests in
> t1601 _don't_ fail with it, even if I instrument the fake index to have
> more entries (making it more likely to split).

back to t7009 failure. I'll see if I can look more into this this
weekend. If split-index somehow produces these null sha1, then I
probably have a problem.

Thanks for looking at it anyway. One bug down. Thousands to go...

BTW, I ran t7009 with valgrind and it reported this. Is it something
we should be worried about? I vaguely recall you're doing something
with prio-queue...

==4246== Source and destination overlap in memcpy(0x5952990, 0x5952990, 16)
==4246==at 0x4C2EACD: memcpy@@GLIBC_2.14 (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4246==by 0x545D05: swap (prio-queue.c:15)
==4246==by 0x545D72: prio_queue_reverse (prio-queue.c:25)
==4246==by 0x4CBC0C: sort_in_topological_order (commit.c:723)
==4246==by 0x574C97: prepare_revision_walk (revision.c:2858)
==4246==by 0x48A2BA: cmd_rev_list (rev-list.c:385)
==4246==by 0x405A6F: run_builtin (git.c:371)
==4246==by 0x405CDC: handle_builtin (git.c:572)
==4246==by 0x405E51: run_argv (git.c:624)
==4246==by 0x405FF3: cmd_main (git.c:701)
==4246==by 0x4A48CE: main (common-main.c:43)
-- 
Duy


Re: [PATCH 1/6] worktree.c: add validate_worktree()

2017-04-24 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 9:16 AM, Junio C Hamano  wrote:
>> +int validate_worktree(const struct worktree *wt, int quiet)
>> +{
>> + struct strbuf sb = STRBUF_INIT;
>> + char *path;
>> + int err, ret;
>> +
>> + if (is_main_worktree(wt)) {
>> + /*
>> +  * Main worktree using .git file to point to the
>> +  * repository would make it impossible to know where
>> +  * the actual worktree is if this function is executed
>> +  * from another worktree. No .git file support for now.
>> +  */
>> + strbuf_addf(&sb, "%s/.git", wt->path);
>> + if (!is_directory(sb.buf)) {
>> + strbuf_release(&sb);
>> + return report(quiet, _("'%s/.git' at main worktree is 
>> not the repository directory"),
>> +   wt->path);
>
> These messages come without telling what they are.  Should they say
> that these are errors?  Or does the severity depend on the caller,
> i.e. why they want to know if a particular worktree is valid, and
> sometimes these are errors and other times they are mere warnings?

I'll save the error in a strbuf and let the caller decide how to
present them, which gives better context (e.g. "unable to move
worktree because ...")

>> + }
>> + return 0;
>> + }
>> +
>> + /*
>> +  * Make sure "gitdir" file points to a real .git file and that
>> +  * file points back here.
>> +  */
>> + if (!is_absolute_path(wt->path))
>> + return report(quiet, _("'%s' file does not contain absolute 
>> path to the worktree location"),
>> +   git_common_path("worktrees/%s/gitdir", wt->id));
>
> It makes me wonder if this kind of error reporting belongs to the
> place where these are read (and a new wt is written out to the
> filesystem, perhaps).  The programmer who wrote this code may have
> known that wt->path is prepared by reading "worktrees/%s/gitdir" and
> without doing real_path() or absolute_path() on the result when this
> code was written, but nothing guarantees that to stay true over time
> as the code evolves.

This is almost like fsck for worktrees and for now only be checked
before we do destructive things to worktrees (moving, removing..).

Yeah we probably should do this at read time too (after checking if a
worktree is locked, and skip the next check because wt->path may not
exist). But we probably want to either make this function cheaper, or
we cache the worktree list. Probably the latter. It's on my todo list.
-- 
Duy


Re: [PATCH 1/5] add SWAP macro

2017-04-24 Thread Duy Nguyen
On Mon, Apr 24, 2017 at 6:49 PM, Jeff King  wrote:
> diff --git a/prio-queue.c b/prio-queue.c
> index 17252d231..fc3860fdc 100644
> --- a/prio-queue.c
> +++ b/prio-queue.c
> @@ -21,7 +21,7 @@ void prio_queue_reverse(struct prio_queue *queue)
>
> if (queue->compare != NULL)
> die("BUG: prio_queue_reverse() on non-LIFO queue");
> -   for (i = 0; i <= (j = (queue->nr - 1) - i); i++)
> +   for (i = 0; i < (j = (queue->nr - 1) - i); i++)
> swap(queue, i, j);
>  }

FWIW This shuts valgrind-3.10.1 up.
-- 
Duy


Re: [BUG] test suite broken with GETTEXT_POISON=YesPlease

2017-04-24 Thread Duy Nguyen
On Fri, Apr 21, 2017 at 9:54 PM, Lars Schneider
 wrote:
>
>> Am 20.04.2017 um 23:58 schrieb Ævar Arnfjörð Bjarmason :
>>
>> As a refresh of everyone's memory (because mine needed it). This is a
>> feature I added back in 2011 when the i18n support was initially
>> added.
>>
>> There was concern at the time that we would inadvertently mark
>> plumbing messages for translation, particularly something in a shared
>> code path, and this was a way to hopefully smoke out those issues with
>> the test suite.
>>
>> However compiling with it breaks a couple of dozen tests, I stopped
>> digging when I saw some broke back in 2014.
>>
>> What should be done about this? I think if we're going to keep them
>> they need to be run regularly by something like Travis (Lars CC'd),
>> however empirical evidence suggests that not running them is just fine
>> too, so should we just remove support for this test mode?
>
> Right now we are building and testing Git in the following configurations:
>
> 1. Linux, gcc, stable Perforce an GitLFS version (used by git-p4 tests)
> 2. Linux, gcc, stable Perforce an GitLFS version (used by git-p4 tests) *
> 3. OSX, clang, latest Perforce an GitLFS version (used by git-p4 tests)
> 4. OSX, clang, latest Perforce an GitLFS version (used by git-p4 tests) *
> 5. Linux32, gcc, no git-p4 tests
> 6. Windows, gcc, no git-p4 tests
>
> 1-4 run the same tests right now. This was especially useful in the beginning 
> to identify flaky tests (t0025 is still flaky!).
>
> We could easily run the tests in 1-4 with different configurations. E.g. 
> enable GETTEXT_POISON in 2.

If CI stays idle some time, I suggest adding a test run with valgrind
on 'pu' or 'next'. If there's still spare capacity I would add another
test run with GIT_TEST_SPLIT_INDEX=1 (but not right now, it's broken)

Off topic, is it possible to receive mail notifications from Travis
when a fault is found in either 'pu', 'next' or 'master'? I know how
to do it in Jenkins, but I'm not familiar with Travis and there's no
obvious button from the web page..
-- 
Duy


Re: BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations

2017-04-26 Thread Duy Nguyen
On Wed, Apr 26, 2017 at 2:13 AM, Ævar Arnfjörð Bjarmason
 wrote:
> Thought I'd just start another thread for this rather than tack it
> onto the pathalogical case thread.
>
> In commit 4c251e5cb5 ("wildmatch: make /**/ match zero or more
> directories", 2012-10-15) Duy added support for ** in globs.
>
> One test-case for this is:
>
> match 1 0 'foo/baz/bar' 'foo/**/**/bar'
>
> I.e. foo/**/**/bar matches foo/baz/bar. However due to some
> pre-pruning we do in pathspec/ls-tree we can't ever match it, because
> the first thing we do is peel the first part of the path/pattern off,
> i.e. foo/, and then match baz/bar against **/**/bar.

Yeah. I think the prefix compare trick predated wildmatch. When I
introduced positional wildcards "**/" I failed to spot this. Good
catch.

Ideally this sort of optimization should be contained within wildmatch
(or whatever matching engine we'll be using). It also opens up more
opportunity (like precompile pattern mentioned elsewhere in this
thread).

You need to be careful though, when we do case-insensitive matching,
sometimes we want to match the prefix case _sensitively_ instead. So
we need to pass the "prefix" info in some cases to the matching
engine.

I guess time is now ripe (i.e. somebody volunteers to work on this ;-)
to improve wildmatch. "improve" can also be "rewriting to pcre" if we
really want that route, which I have no opinion because I don't know
pcre availability on other (some obscure) platforms.
-- 
Duy


Re: [PATCH] test: remove unused parameter from the wildmatch test

2017-04-26 Thread Duy Nguyen
On Tue, Apr 25, 2017 at 4:51 PM, Ævar Arnfjörð Bjarmason
 wrote:
> We wouldn't be using fnmatch(), but I think it's a probably a good
> idea for the tests to support a mode where we have to declare
> explicitly whether something should also match under fnmatch or not,
> so we document the differences.

It was that way for a while (when wildmatch code was still "young")
then we removed fnmatch tests because system fnmatch was inconsistent
in some corner cases and because wildmatch code did not change much
anymore.

If we start turning it upside down, yes some sort of checks like that
may be a good idea (or at least until new code stablizes again).
-- 
Duy


Re: [PATCH 07/15] remote.c: report error on failure to fopen()

2017-04-27 Thread Duy Nguyen
On Thu, Apr 27, 2017 at 12:07 PM, Johannes Sixt  wrote:
> Am 27.04.2017 um 02:57 schrieb Junio C Hamano:
>>
>> Johannes Sixt  writes:
>>
>>> +++ git ls-remote 'refs*master'
>>> +warning: unable to access '.git/branches/refs*master': Invalid argument
>>>  fatal: 'refs*master' does not appear to be a git repository
>>>  fatal: Could not read from remote repository.
>>>
>>>  Please make sure you have the correct access rights
>>>  and the repository exists.
>>> +++ exit_code=128
>>>
>>> On Windows, it is not allowed to pass a file name with an asterisk to
>>> the OS. The test case contains this comment:
>>>
>>> # We could just as easily have used "master"; the "*" emphasizes its
>>> # role as a pattern.
>>>
>>> So, can we replace the name with a simple "master", our would this
>>> miss the goal of the test case? Should we make it conditional on the
>>> MINGW prerequisite?
>>
>>
>> I would actually be more worried about the real-life impact of this
>> change.  Those who did "git ls-remote 'refs*master'" merely got "it
>> does not appear to be a git repository" and that was entirely sensible
>> response from the command.  Can Windows folks live with an extra
>> warning before it, or do they object to see that new warning?
>
>
> I was also worried that the new warning may be irritating. However, I expect
> that it is seen in practice only after a typo. My gut feeling is that this
> is bearable, because the reason for the warning should be obvious.
>
> Unless a use-case turns up where the pattern occurs routinely. We'll have to
> keep the eyes open. Until then it is better to keep the change, IMO.

OK I'll just add MINGW to the test then. Windows folks can improve
warn_on_inaccessible() to suppress certain class of error messages if
needed.
-- 
Duy


Re: [PATCH] cache-tree: reject entries with null sha1

2017-05-03 Thread Duy Nguyen
On Tue, May 2, 2017 at 2:22 AM, Jeff King  wrote:
> On Mon, May 01, 2017 at 01:23:28PM +0200, René Scharfe wrote:
>
>> Am 24.04.2017 um 12:39 schrieb Duy Nguyen:
>> > BTW, I ran t7009 with valgrind and it reported this. Is it something
>> > we should be worried about? I vaguely recall you're doing something
>> > with prio-queue...
>> >
>> > ==4246== Source and destination overlap in memcpy(0x5952990, 0x5952990, 16)
>> > ==4246==at 0x4C2EACD: memcpy@@GLIBC_2.14 (in
>> > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> > ==4246==by 0x545D05: swap (prio-queue.c:15)
>> > ==4246==by 0x545D72: prio_queue_reverse (prio-queue.c:25)
>> > ==4246==by 0x4CBC0C: sort_in_topological_order (commit.c:723)
>> > ==4246==by 0x574C97: prepare_revision_walk (revision.c:2858)
>> > ==4246==by 0x48A2BA: cmd_rev_list (rev-list.c:385)
>> > ==4246==by 0x405A6F: run_builtin (git.c:371)
>> > ==4246==by 0x405CDC: handle_builtin (git.c:572)
>> > ==4246==by 0x405E51: run_argv (git.c:624)
>> > ==4246==by 0x405FF3: cmd_main (git.c:701)
>> > ==4246==by 0x4A48CE: main (common-main.c:43)
>>
>> I can only get gcc and clang to call memcpy instead of inlining it by
>> specifying -fno-builtin.  Do you use that option?  If yes, why?  (Just
>> curious.)
>
> I do my normal edit-compile cycles with -O0 because it's fast, and
> because it makes debugging much easier.

Same here. My CFLAGS (without lots of -Wstuff)

CFLAGS =  -g -O0 -fstack-protector

Maybe it's -fstack-protector then? This is gcc 4.9.3. I think Gentoo
does not add any distro-specific patches on this particular version.
-- 
Duy


Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

2017-05-03 Thread Duy Nguyen
On Tue, May 2, 2017 at 8:36 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This applies to origin/master.
>>
>> For better readability and understandability for newcomers it is a good idea
>> to not offer 2 APIs doing the same thing with on being the #define of the 
>> other.
>>
>> In the long run we may want to drop the macros guarded by
>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>
> Why?

Well.. why did you add NO_THE_INDEX_COMP... in the first place? ;-) j/k

> Why should we keep typing &the_index, when most of the time we are given 
> _the_ index and working on it?

I attempted the same thing once or twice in the past, had the same
impression and dropped it. But I think it's good to get rid of cache_*
macros, at least outside builtin/ directory. It makes it clearer about
the index dependency. Maybe one day we could libify sha1_file.c and
finally introduce "struct repository", where the_index, object db and
ref-store belong (hmm.. the index may belong to "struct worktree", not
the repo one...).

So yeah it may look a bit more verbose (and probably causes a lot more
conflicts on 'pu') but in my opinion it's a good direction. I wish
Stefan good luck. Brave soul :D
-- 
Duy


Re: RFA: untracked cache vs git reset --hard

2017-05-03 Thread Duy Nguyen
On Wed, May 3, 2017 at 5:27 PM, Johannes Schindelin
 wrote:
> Hi all,
>
> I have a problem and would like to solicit advice how to fix it.
>
> The untracked cache has made a real difference on rather large
> repositories with tons of directories, and it is really, really useful.
>
> But. One innocuous `git reset --hard` will just blow it away.
>
> How? reset_index() calls unpack_trees() which in turn tries to populate a
> new index and then discards the old one:
>
> https://github.com/git/git/blob/v2.12.2/unpack-trees.c#L1293
>
> That discard_index() unfortunately also blows away each and every index
> extension that had been read carefully before.

This is a real problem when we introduce non-optional extensions (i.e.
extension name in lower case). Dropping them is not an option because
they may contain vital/original information. We haven't any so far,
but I've been wanting to add one for years (narrow clone). So I'm all
for tackling the problem now :)

> All users of `git reset --hard` (including `git stash`) suffer this.
>
> In fact, it looks as if *any* caller of unpack_trees() would suffer the
> same problem: git-am, git-checkout, git-commit, git-merge, etc
>
> Now, I could imagine that maybe we could just "move"
> o->dst_index.untracked to o->result.untracked, and that the machinery then
> would do the right thing.

These extensions may have dependencies in the o->result.cache[] (do we
allow an extension to depend on another?). If invalidation is not
handled correctly then it's not safe to simply copy the extension
over.

For untracked cache, I think we do invalidation right and just moving
it over dst_index (and resetting NULL in o->result so it does not get
accidentally deleted) is fine.

I'd rather we have a common way of dealing with this for any extension
though. Split index needs special treatment too [1]. Maybe we can add

int migrate_index_extensions(struct index_state *dst, struct index_state *src);

in read-cache.c where it calls migrate_XXX() for each extension. In
some cases (cache-tree) we could even do more, like repair cache-tree
there to avoid hitting performance regressions.

[1] https://github.com/git/git/blob/v2.12.2/unpack-trees.c#L1165-L1167
-- 
Duy


Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)

2017-05-03 Thread Duy Nguyen
On Wed, May 3, 2017 at 4:45 PM, Johannes Schindelin
 wrote:
>> So it makes sense to give our downstream distributors a nudge to
>> switch over to it.

Some contributor (i.e. me) was not happy with this nudging though. The
other day I switched to some branch (probably 'pu') and the build
failed because, guess what, I didn't have pcre2 installed. If I set
USE_LIBPCRE1 then I lose pcre support when switching to other
branches. And no, I don't want to install libpcre2, not when I'm
forced to this way.

188 packages on Gentoo optionally depend on libpcre, 6 packages on
libpcre2. Chances that a Gentoo user has libpcre2 already are rather
low. I'll revisit my installation when the level of libpcre2 support
grows a bit more than that. You can nudge distributors directly,
probably more efficient too.

> ...
>
> I hate to be that someone, but it has to be said: this is a disruptive
> change, and it would be a lot better to make it an opt-in at first, and
> when the dust settled about this option and many distributions have opted
> in already because of the benefits and tested this thoroughly in practice,

Agreed.
-- 
Duy


Re: [PATCH v2 13/21] remote.c: report error on failure to fopen()

2017-05-09 Thread Duy Nguyen
Sorry for super late reply. I'm slowly catching up.

On Wed, May 3, 2017 at 10:22 PM, Johannes Schindelin
 wrote:
> Hi Duy,
>
>
> On Wed, 3 May 2017, Nguyễn Thái Ngọc Duy wrote:
>
>> There's plenty of error() in this code to safely assume --quiet is not a
>> concern.
>>
>> t5512 needs update because if we check the path 'refs*master' (i.e. the
>> asterisk is part of the path) then we'll get an EINVAL error.
>
> So the first change in this patch unmasks a bug that is fixed by the
> second patch?

The change in read_branches_file() in this patch causes the failure.
See the original report [1],

[1] http://public-inbox.org/git/3a2686c2-6766-1235-001f-1b5283b5f...@kdbg.org/
-- 
Duy


Re: [PATCH v2 00/21]

2017-05-09 Thread Duy Nguyen
On Sun, May 7, 2017 at 11:20 AM, Junio C Hamano  wrote:
> On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano  wrote:
>>
>> Nguyễn Thái Ngọc Duy   writes:
>>
>> > Changes since v1:
>> >
>> >  - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
>> >latter's name is probably not great...
>> >  - A new patch (first one) to convert a bunch to using xfopen()
>> >  - Fix test failure on Windows, found by Johannes Sixt
>> >  - Fix the memory leak in log.c, found by Jeff
>> >
>> > There are still a couple of fopen() remained, but I'm getting slow
>> > again so let's get these in first and worry about the rest when
>> > somebody has time.
>
> Hmm, is this topic responsible for recent breakage Travis claims on MacOS X?
>
> https://travis-ci.org/git/git/jobs/229585098#L3091
>
> seems to expect an error when test-config is fed a-directory but we are
> not getting the expected warning and error?

Sounds about right. Let me see if defining FREAD_READS_DIRECTORIES on
MacOS X makes travis happy.
-- 
Duy


Re: [PATCH] run-command: use strbuf_addstr() for adding a string to a strbuf

2018-03-25 Thread Duy Nguyen
On Sun, Mar 25, 2018 at 12:57 PM, René Scharfe  wrote:
> Patch generated with Coccinelle and contrib/coccinelle/strbuf.cocci.
>
> Signed-off-by: Rene Scharfe 
> ---
> That line was added by e73dd78699 (run-command.c: introduce
> trace_run_command()).
>
>  run-command.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index a483d5904a..84899e423f 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -621,7 +621,7 @@ static void trace_run_command(const struct child_process 
> *cp)
> if (!trace_want(&trace_default_key))
> return;
>
> -   strbuf_addf(&buf, "trace: run_command:");
> +   strbuf_addstr(&buf, "trace: run_command:");

Obviously correct. Ack.

> if (cp->dir) {
> strbuf_addstr(&buf, " cd ");
> sq_quote_buf_pretty(&buf, cp->dir);
> --
> 2.16.3



-- 
Duy


Re: [PATCH v7 00/13] nd/pack-objects-pack-struct updates

2018-03-26 Thread Duy Nguyen
On Mon, Mar 26, 2018 at 5:13 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:40AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> +unsigned long oe_get_size_slow(struct packing_data *pack,
>> +const struct object_entry *e)
>> +{
>> + struct packed_git *p;
>> + struct pack_window *w_curs;
>> + unsigned char *buf;
>> + enum object_type type;
>> + unsigned long used, avail, size;
>> +
>> + if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
>> + read_lock();
>> + if (sha1_object_info(e->idx.oid.hash, &size) < 0)
>> + die(_("unable to get size of %s"),
>> + oid_to_hex(&e->idx.oid));
>> + read_unlock();
>> + return size;
>> + }
>> +
>> + p = oe_in_pack(pack, e);
>> + if (!p)
>> + die("BUG: when e->type is a delta, it must belong to a pack");
>> +
>> + read_lock();
>> + w_curs = NULL;
>> + buf = use_pack(p, &w_curs, e->in_pack_offset, &avail);
>> + used = unpack_object_header_buffer(buf, avail, &type, &size);
>> + if (used == 0)
>> + die(_("unable to parse object header of %s"),
>> + oid_to_hex(&e->idx.oid));
>> +
>> + unuse_pack(&w_curs);
>> + read_unlock();
>> + return size;
>> +}
>
> It took me a while to figure out why this treated deltas and non-deltas
> differently. At first I thought it was an optimization (since we can
> find non-delta sizes quickly by looking at the headers).  But I think
> it's just that you want to know the size of the actual _delta_, not the
> reconstructed object. And there's no way to ask sha1_object_info() for
> that.
>
> Perhaps the _extended version of that function should learn an
> OBJECT_INFO_NO_DEREF flag or something to tell it return the true delta
> type and size. Then this whole function could just become a single call.
>
> But short of that, it's probably worth a comment explaining what's going
> on.

I thought the elaboration on "size" in the big comment block in front
of struct object_entry was enough. I was wrong. Will add something
here.

>> +Running tests with special setups
>> +-
>> +
>> +The whole test suite could be run to test some special features
>> +that cannot be easily covered by a few specific test cases. These
>> +could be enabled by running the test suite with correct GIT_TEST_
>> +environment set.
>> +
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>> +GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>> +path where there are more than 1024 packs even if the actual number of
>> +packs in repository is below this limit.
>> +
>> +GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
>> +code path where we do not cache objecct size in memory and read it
>> +from existing packs on demand. This normally only happens when the
>> +object size is over 2GB. This variable forces the code path on any
>> +object larger than 2^ bytes.
>
> It's nice to have these available to test the uncommon cases. But I have
> a feeling nobody will ever run them, since it requires extra effort (and
> takes a full test run).

I know :) I also know that this does not interfere with
GIT_TEST_SPLIT_INDEX, which is being run in Travis. So the plan (after
this series is merged) is to make Travis second run to do something
like

make test GIT_TEST_SPLIT...=1 GIT_TEST_FULL..=1 GIT_TEST_OE..=4

we don't waste more cpu cycles and we can make sure these code paths
are always run (at least on one platform)

> I see there's a one-off test for GIT_TEST_FULL_IN_PACK_ARRAY, which I
> think is a good idea, since it makes sure the code is exercised in a
> normal test suite run. Should we do the same for GIT_TEST_OE_SIZE_BITS?

I think the problem with OE_SIZE_BITS is it has many different code
paths (like reused deltas) which is hard to make sure it runs. But yes
I think I could construct a pack that executes both code paths in
oe_get_size_slow(). Will do in a reroll.

> I haven't done an in-depth read of each patch yet; this was just what
> jumped out at me from reading the interdiff.

I would really appreciate it if you could find some time to do it. The
bugs I found in this round proved that I had no idea what's really
going on in pack-objects. Sure I know the big picture but that's far
from enough to do changes like this.
-- 
Duy


Re: [PATCH v2 01/36] t/helper: add an empty test-tool program

2018-03-26 Thread Duy Nguyen
On Mon, Mar 26, 2018 at 5:27 PM, Johannes Schindelin
 wrote:
> Hi Duy,
>
> On Sat, 24 Mar 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
>> new file mode 100644
>> index 00..c730f718ca
>> --- /dev/null
>> +++ b/t/helper/test-tool.c
>> @@ -0,0 +1,27 @@
>> +#include "git-compat-util.h"
>> +#include "test-tool.h"
>> +
>> +struct test_cmd {
>> + const char *name;
>> + int (*main)(int argc, const char **argv);
>
> This makes the build fail on Windows, as we override `main` in
> compat/mingw.h:

Sigh.. not complaining, but I wish somebody tries to compile git with
wine (and automate it in travis). This way we could at least cover the
compilation part for all major platforms. Probably too small for a
GSoC (and making the test suite pass with wine may be too large for
GSoC)
-- 
Duy


Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 8:31 AM, Jeff King  wrote:
>...
>
> But that really feels like we're papering over the problem.

It is papering over the problem in my opinion. setup_work_tree() is
involved here and when it moves cwd around, it re-init git-dir (and
all other paths). Before my patch we call git_path() only when we need
it and git-dir is likely already updated by setup_work_tree(). After
this patch, the path is set in stone before setup_work_tree() kicks
in. Once it moves cwd, relative paths all become invalid.

The way setup_work_tree() does it is just bad to me. When it calls
set_git_dir() again, the assumption is the new path is exactly the
same as the old one. The only difference is relative vs absolute. But
this is super hard to see from set_git_dir implementation. The new
struct repository design also inherits this (i.e. allowing to call
set_git_dir multiple times, which really does not make sense), but
this won't fly long term. When cwd moves, all cached relative paths
must be adjusted, we need a better mechanism to tell everybody that,
not just do it for $GIT_DIR and related paths.

I am planning to fix this. This is part of the "setup cleanup" effort
to keep repository.c design less messy and hopefully unify how the
main repo and submodule repo are created/set up. But frankly that may
take a long while before I could submit anything substantial (I also
have the "report multiple worktree's HEADs correctly and make fsck
count all HEADs" task, which is not small and to me have higher
priority).

So I would not mind papering over it right now (with an understanding
that absolute path pays some more overhead in path walking, which was
th reason we tried to avoid it in setup code). A slightly better patch
is trigger this path absolutization from setup_work_tree(), near
set_git_dir(). But then you face some ugliness there: how to find out
all ref stores to update, or just update the main ref store alone.

> It's not
> clear to me exactly what f57f37e2 is trying to accomplish, and whether
> it would work for it to look call get_git_dir() whenever it needed the
> path.
>
> -Peff
-- 
Duy


Re: [PATCH v2] Makefile: detect compiler and enable more warnings in DEVELOPER=1

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 12:02 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> The set of extra warnings we enable when DEVELOPER has to be
>> conservative because we can't assume any compiler version the
>> developer may use. Detect the compiler version so we know when it's
>> safe to enable -Wextra and maybe more.
>
> This is a good idea in general, but we are not quite ready without
> some fixups.
>
> Here is a quick summary (not exhaustive) from my trial merge to 'pu'
> (which will be reverted before today's integration is pushed out).
>
>  - json-writer.c triggers -Werror=old-style-decl
>
>  - t/helper/test-json-writer.c triggers Werror=missing-prototypes
>quite a few times.

I expected these (and it was a good reason to push this patch
forward). I think people also reported style problems with this
series.

>
>  - connect.c -Werror=implicit-fallthough around die_initial_contact().
>

This I did not expect. But I just looked again and I had this option
explicitly turned off in config.mak! gcc-6.4 and gcc-4.9 do not
complain about this. gcc-7.3 does. What's your compiler/version?


-- 
Duy


Re: [PATCH v6 07/35] connect: convert get_remote_heads to use struct packet_reader

2018-03-27 Thread Duy Nguyen
On Thu, Mar 15, 2018 at 10:31:14AM -0700, Brandon Williams wrote:
> In order to allow for better control flow when protocol_v2 is introduced
> +static enum protocol_version discover_version(struct packet_reader *reader)
> +{
> + enum protocol_version version = protocol_unknown_version;
> +
> + /*
> +  * Peek the first line of the server's response to
> +  * determine the protocol version the server is speaking.
> +  */
> + switch (packet_reader_peek(reader)) {
> + case PACKET_READ_EOF:
> + die_initial_contact(0);
> + case PACKET_READ_FLUSH:

gcc is dumb. When -Werror and -Wimplicit-fallthrough are enabled (on
at least gcc 7.x), it fails to realize that this die_initial_contact()
will not fall through (even though we do tell it about die() not
returning, but I guess that involves more flow analysis to realize
die_initial_contact is in the same boat).

Since -Wimplicit-fallthrough may be useful to spot bugs elsewhere and
there are only two places in this series that tick it off, is it
possible to squash in something like this? On the off chance that we
fail horribly to die, "break;" would stop the wrong code from
executing even more.

This covers another patch too, but you get the idea.

diff --git a/connect.c b/connect.c
index 54971166ac..847bf2f7d6 100644
--- a/connect.c
+++ b/connect.c
@@ -124,6 +124,7 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
switch (packet_reader_peek(reader)) {
case PACKET_READ_EOF:
die_initial_contact(0);
+   break;
case PACKET_READ_FLUSH:
case PACKET_READ_DELIM:
version = protocol_v0;
@@ -303,6 +304,7 @@ struct ref **get_remote_heads(struct packet_reader *reader,
switch (packet_reader_read(reader)) {
case PACKET_READ_EOF:
die_initial_contact(1);
+   break;
case PACKET_READ_NORMAL:
len = reader->pktlen;
if (len > 4 && skip_prefix(reader->line, "ERR ", &arg))

--
Duy


Re: [PATCH v6 07/35] connect: convert get_remote_heads to use struct packet_reader

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 6:11 PM, Jeff King  wrote:
> On Tue, Mar 27, 2018 at 05:27:14PM +0200, Duy Nguyen wrote:
>
>> On Thu, Mar 15, 2018 at 10:31:14AM -0700, Brandon Williams wrote:
>> > In order to allow for better control flow when protocol_v2 is introduced
>> > +static enum protocol_version discover_version(struct packet_reader 
>> > *reader)
>> > +{
>> > +   enum protocol_version version = protocol_unknown_version;
>> > +
>> > +   /*
>> > +* Peek the first line of the server's response to
>> > +* determine the protocol version the server is speaking.
>> > +*/
>> > +   switch (packet_reader_peek(reader)) {
>> > +   case PACKET_READ_EOF:
>> > +   die_initial_contact(0);
>> > +   case PACKET_READ_FLUSH:
>>
>> gcc is dumb. When -Werror and -Wimplicit-fallthrough are enabled (on
>> at least gcc 7.x), it fails to realize that this die_initial_contact()
>> will not fall through (even though we do tell it about die() not
>> returning, but I guess that involves more flow analysis to realize
>> die_initial_contact is in the same boat).
>> [...]
>> @@ -124,6 +124,7 @@ enum protocol_version discover_version(struct 
>> packet_reader *reader)
>>   switch (packet_reader_peek(reader)) {
>>   case PACKET_READ_EOF:
>>   die_initial_contact(0);
>> + break;
>
> Would it make sense just to annotate that function to help the flow
> analysis?

Yes that works wonderfully with my gcc-7.3.0

> Like:
>
> diff --git a/connect.c b/connect.c
> index c3a014c5ba..49eca46462 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -46,7 +46,7 @@ int check_ref_type(const struct ref *ref, int flags)
> return check_ref(ref->name, flags);
>  }
>
> -static void die_initial_contact(int unexpected)
> +static NORETURN void die_initial_contact(int unexpected)
>  {
> if (unexpected)
> die(_("The remote end hung up upon initial contact"));
>
> That should let the callers know what's going on, and inside the
> function itself, the compiler should confirm that all code paths hit
> another NORETURN function.
>
> -Peff



-- 
Duy


Re: [PATCH v6 07/35] connect: convert get_remote_heads to use struct packet_reader

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 6:25 PM, Duy Nguyen  wrote:
> On Tue, Mar 27, 2018 at 6:11 PM, Jeff King  wrote:
>> On Tue, Mar 27, 2018 at 05:27:14PM +0200, Duy Nguyen wrote:
>>
>>> On Thu, Mar 15, 2018 at 10:31:14AM -0700, Brandon Williams wrote:
>>> > In order to allow for better control flow when protocol_v2 is introduced
>>> > +static enum protocol_version discover_version(struct packet_reader 
>>> > *reader)
>>> > +{
>>> > +   enum protocol_version version = protocol_unknown_version;
>>> > +
>>> > +   /*
>>> > +* Peek the first line of the server's response to
>>> > +* determine the protocol version the server is speaking.
>>> > +*/
>>> > +   switch (packet_reader_peek(reader)) {
>>> > +   case PACKET_READ_EOF:
>>> > +   die_initial_contact(0);
>>> > +   case PACKET_READ_FLUSH:
>>>
>>> gcc is dumb. When -Werror and -Wimplicit-fallthrough are enabled (on
>>> at least gcc 7.x), it fails to realize that this die_initial_contact()
>>> will not fall through (even though we do tell it about die() not
>>> returning, but I guess that involves more flow analysis to realize
>>> die_initial_contact is in the same boat).
>>> [...]
>>> @@ -124,6 +124,7 @@ enum protocol_version discover_version(struct 
>>> packet_reader *reader)
>>>   switch (packet_reader_peek(reader)) {
>>>   case PACKET_READ_EOF:
>>>   die_initial_contact(0);
>>> + break;
>>
>> Would it make sense just to annotate that function to help the flow
>> analysis?
>
> Yes that works wonderfully with my gcc-7.3.0

And this changes things. Since this series is 35 patches and there's
no sign of reroll needed, I'm going to make this change separately.
Don't reroll just because of this
-- 
Duy


Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 6:47 PM, Jeff King  wrote:
>> So I would not mind papering over it right now (with an understanding
>> that absolute path pays some more overhead in path walking, which was
>> th reason we tried to avoid it in setup code). A slightly better patch
>> is trigger this path absolutization from setup_work_tree(), near
>> set_git_dir(). But then you face some ugliness there: how to find out
>> all ref stores to update, or just update the main ref store alone.
>
> I don't quite get why f57f37e2 doesn't want to call git_path(). Is it to
> avoid the way the path is munged? Or is it to avoid some lazy-setup that
> is triggered by calling get_git_dir() at all (which doesn't make much
> sense to me, because we'd already have called get_git_dir() much
> earlier). Or is it just because we may sometimes fill in refs->git_dir
> with something besides get_git_dir() (for a submodule or worktree or
> something)?

None of those, I think. git_path() does some magic to translate paths
so that refs/... ends up with $GIT_COMMON_DIR/refs/... while "index"
ends up with $GIT_DIR/index. Michael wanted to avoid that magic and
keep the control within refs code (i.e. this code knows refs/ and
packed-refs are shared, and pseudo refs are not, what git_path()
decides does not matter).

> I.e., can we do one of (depending on which of those answers is "yes"):
>
>   1. Stop caching the value of get_git_dir(), and instead call it
>  on-demand instead of looking at refs->git_dir? (If we just want to
>  avoid git_path()).

This probably works, but I count it as papering over the problem too.

>
>   2. If we need to avoid even calling get_git_dir(), can we add a
>  "light" version of it that avoids whatever side effects we're
>  trying to skip?
>
>   3. If the problem is just that sometimes we need get_git_dir() and
>  sometimes not, could we perhaps store NULL as a sentinel to mean
>  "look up get_git_dir() when you need it"?
>
>  That would let submodules and worktrees fill in their paths as
>  necessary (assuming they never change after init), but handle the
>  case of get_git_dir() changing.
>
> Hmm. Typing that out, it seems like (3) is probably the right path.
> Something like the patch below seems to fix it and passes the tests.

Honestly I think this is just another way to work around the problem
(with even more changes than your abspath approach). The problem is
with setup_work_tree(). We create a ref store at a specific location
and it should stay working without lazily calling get_git_dir(), which
has nothing to do (anymore) with the path we have given a ref store.
If somebody changes a global setting like $CWD, it should be well
communicated to everybody involved.

I would rather have something like ref_store_reinit() in the same
spirit as the second call of set_git_dir() in setup_work_tree. It is
hacky, but it works and keeps changes to minimal (so that it could be
easily replaced later).
-- 
Duy


Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-27 Thread Duy Nguyen
On Tue, Mar 27, 2018 at 07:09:36PM +0200, Duy Nguyen wrote:
> I would rather have something like ref_store_reinit() in the same
> spirit as the second call of set_git_dir() in setup_work_tree. It is
> hacky, but it works and keeps changes to minimal (so that it could be
> easily replaced later).

So in the name of hacky and dirty things, it would look something like
this. This passed your test case. The test suite is still running
(slow laptop) but I don't expect breakages there.

-- 8< --
diff --git a/refs.c b/refs.c
index 20ba82b434..c6116c4f7a 100644
--- a/refs.c
+++ b/refs.c
@@ -1660,6 +1660,16 @@ struct ref_store *get_main_ref_store(void)
return main_ref_store;
 }
 
+void make_main_ref_store_use_absolute_paths(void)
+{
+   files_force_absolute_paths(get_main_ref_store());
+}
+
+void make_main_ref_store_use_relative_paths(const char *cwd)
+{
+   files_make_relative_paths(get_main_ref_store(), cwd);
+}
+
 /*
  * Associate a ref store with a name. It is a fatal error to call this
  * function twice for the same name.
diff --git a/refs.h b/refs.h
index 01be5ae32f..532a4ad09d 100644
--- a/refs.h
+++ b/refs.h
@@ -759,6 +759,9 @@ int reflog_expire(const char *refname, const struct 
object_id *oid,
 int ref_storage_backend_exists(const char *name);
 
 struct ref_store *get_main_ref_store(void);
+void make_main_ref_store_use_absolute_paths(void);
+void make_main_ref_store_use_relative_paths(const char *cwd);
+
 /*
  * Return the ref_store instance for the specified submodule. For the
  * main repository, use submodule==NULL; such a call cannot fail. For
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bec8e30e9e..629198826f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3092,6 +3092,32 @@ static int files_reflog_expire(struct ref_store 
*ref_store,
return -1;
 }
 
+void files_force_absolute_paths(struct ref_store *ref_store)
+{
+   struct files_ref_store *refs =
+   files_downcast(ref_store, REF_STORE_WRITE, "don't ask");
+
+   char *path = refs->gitdir;
+   refs->gitdir = absolute_pathdup(path);
+   free(path);
+
+   path = refs->gitcommondir;
+   refs->gitcommondir = absolute_pathdup(path);
+   free(path);
+}
+
+void files_make_relative_paths(struct ref_store *ref_store, const char *cwd)
+{
+   struct files_ref_store *refs =
+   files_downcast(ref_store, REF_STORE_WRITE, "don't ask");
+
+   const char *path = remove_leading_path(refs->gitdir, cwd);
+   refs->gitdir = absolute_pathdup(path);
+
+   path = remove_leading_path(refs->gitcommondir, cwd);
+   refs->gitcommondir = absolute_pathdup(path);
+}
+
 static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 {
struct files_ref_store *refs =
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dd834314bd..827e97bcca 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -669,4 +669,7 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 const struct ref_storage_be *be);
 
+void files_force_absolute_paths(struct ref_store *refs);
+void files_make_relative_paths(struct ref_store *refs, const char *cwd);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/setup.c b/setup.c
index 7287779642..a5f4396b4e 100644
--- a/setup.c
+++ b/setup.c
@@ -3,6 +3,7 @@
 #include "config.h"
 #include "dir.h"
 #include "string-list.h"
+#include "refs.h"
 
 static int inside_git_dir = -1;
 static int inside_work_tree = -1;
@@ -389,8 +390,10 @@ void setup_work_tree(void)
 
work_tree = get_git_work_tree();
git_dir = get_git_dir();
-   if (!is_absolute_path(git_dir))
+   if (!is_absolute_path(git_dir)) {
git_dir = real_path(get_git_dir());
+   make_main_ref_store_use_absolute_paths();
+   }
if (!work_tree || chdir(work_tree))
die(_("this operation must be run in a work tree"));
 
@@ -402,6 +405,7 @@ void setup_work_tree(void)
setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
 
set_git_dir(remove_leading_path(git_dir, work_tree));
+   make_main_ref_store_use_relative_paths(work_tree);
initialized = 1;
 }
 
-- 8< --


Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-28 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 11:52 AM, Jeff King  wrote:
> On Tue, Mar 27, 2018 at 07:30:24PM +0200, Duy Nguyen wrote:
>
>> On Tue, Mar 27, 2018 at 07:09:36PM +0200, Duy Nguyen wrote:
>> > I would rather have something like ref_store_reinit() in the same
>> > spirit as the second call of set_git_dir() in setup_work_tree. It is
>> > hacky, but it works and keeps changes to minimal (so that it could be
>> > easily replaced later).
>>
>> So in the name of hacky and dirty things, it would look something like
>> this. This passed your test case. The test suite is still running
>> (slow laptop) but I don't expect breakages there.
>
> I think this is the right direction. I mentioned in my last reply that
> it would be nice for this to be a bit more generic, in case we need to
> use it again (and also just to keep the module boundaries sane).

Yes, that's why I called it hacky and dirty :) I keep thinking about
this, so I will probably fix it in a nicer way.

> This part confused me at first:
>
>> +void make_main_ref_store_use_absolute_paths(void)
>> +{
>> + files_force_absolute_paths(get_main_ref_store());
>> +}
>> +
>> +void make_main_ref_store_use_relative_paths(const char *cwd)
>> +{
>> + files_make_relative_paths(get_main_ref_store(), cwd);
>> +}
>
> since I thought you were actually turning things into absolute paths.
> But your procedure is basically "turn absolute, then after chdir, turn
> them back relative".
>
> I think it might be clearer if a single call is given both the old and
> new paths. That requires the caller of chdir() storing getcwd() before
> it moves, but I don't think that should be a big deal.

The problem is switching relative paths relies on the old $CWD if I'm
not mistaken and we need  getcwd() for this. I'd love to have one
callback that says "$CWD has been switched from this path to that
path, do whatever you need to" that can be called any time, before or
after chdir(). I'll look more into it.
-- 
Duy


Re: [PATCH 2/8] strbuf.c: reintroduce get_pwd_cwd() (with strbuf_ prefix)

2018-03-28 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 8:02 PM, Stefan Beller  wrote:
> On Wed, Mar 28, 2018 at 10:55 AM, Nguyễn Thái Ngọc Duy
>  wrote:
>> +/**
>> + * Return the current directory, fall back to $PWD if the
>> + * current directory does not exist.
>> + */
>> +extern void strbuf_get_pwd_cwd(struct strbuf *sb);
>
> Please see 89a9f2c862 (CodingGuidelines: mention "static" and "extern",
> 2018-02-08) and drop the extern if you need to reroll.

I'm aware (and in favor) of that actually. But this whole strbuf.h
uses extern. Either I had to delete all extern first, or go with old
style and maybe do the whole deletion later. I chose the former. Maybe
be I'll delete all "extern" as a prep patch.
-- 
Duy


Re: [PATCH 6/8] environment.c: adjust env containing relpath when $CWD is moved

2018-03-28 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 8:30 PM, Jeff King  wrote:
> On Wed, Mar 28, 2018 at 07:55:35PM +0200, Nguyễn Thái Ngọc Duy wrote:
>
>> From: Duy Nguyen 
>>
>> As noted in the previous patch, when $CWD is moved, we recognize the
>> problem with relative paths and update $GIT_WORK_TREE and $GIT_DIR
>> with new ones.
>>
>> We have plenty more environment variables that can contain paths
>> though. If they are read and cached before setup_work_tree() is
>> called, nobody will update them and they become bad paths.
>
> Hmm, yeah, I missed these. It would be interesting to know if there are
> easy-to-run test cases that show off these bugs, or if they're
> hypothetical. (Even if they are hypothetical, I'm not opposed to fixing
> them in the name of maintainability).

It's kinda hard to show off these. But the GIT_ALTERNATE_OBJ..
variable could be one example in favor of this fix. Before, we lazily
read the env var which is most likely after setup_work_tree() has run.
After a bunch of code reorganization and stuff, GIT_ALTERNATE_OBJ is
now read very early, which should be safe (why not?) but it actually
breaks. Alternate db is only queried as the last resort if I'm not
mistaken, so 90% of time you just hit an object in odb and never find
out that your GIT_ALTERNATE_OBJ... points to nowhere.

>> diff --git a/environment.c b/environment.c
>> index 39b3d906c8..f9dcc1b99e 100644
>> --- a/environment.c
>> +++ b/environment.c
>> @@ -128,6 +128,20 @@ const char * const local_repo_env[] = {
>>   NULL
>>  };
>>
>> +/* A subset of local_repo_env[] that contains path */
>> +const char * const local_repo_path_env[] = {
>> + ALTERNATE_DB_ENVIRONMENT,
>> + CONFIG_ENVIRONMENT,
>> + DB_ENVIRONMENT,
>> + GIT_COMMON_DIR_ENVIRONMENT,
>> + GIT_DIR_ENVIRONMENT,
>> + GIT_SHALLOW_FILE_ENVIRONMENT,
>> + GIT_WORK_TREE_ENVIRONMENT,
>> + GRAFT_ENVIRONMENT,
>> + INDEX_ENVIRONMENT,
>> + NULL
>> +};
>
> It might be nice to fold this list into local_repo_env automatically. I
> think you'd have to do it with a macro.

Aha! I did not like the split either and wanted to turn
local_repo_env[] to an array of struct so we can add attributes to
each variable, but the way local_repo_env[] is being used, that's
impossible without more surgery.

> OTOH, it's possible that there could be a path-related environment
> variable that _isn't_ actually part of local_repo_env. E.g., I think
> GIT_CONFIG might classify there (though I don't know if it's worth
> worrying about).

I'd rather fix it now and forget about it than trying to troubleshoot
a problem related to bad relative $GIT_CONFIG (even if the chance of
that happening is probably 1%)

>> +static void update_path_envs(const char *old_cwd, const char *new_cwd,
>> +  void *cb)
>> +{
>> + int i;
>> +
>> + /*
>> +  * FIXME: special treatment needed for
>> +  * GIT_ALTERNATE_OBJECT_DIRECTORIES because it can contain
>> +  * multiple paths.
>> +  */
>
> Yuck. It just keeps getting more complicated. :(
>
> I do wonder if relative paths in variables like this are worth worrying
> about. AFAIK, it's always been a "well, it kind of works" situation, but

Yeah. 99% of time $GIT_DIR is $GIT_WORK_TREE/.git and no path
adjustment is needed.

> not something we've tried to actively support. I think with the current
> code you'd potentially get inconsistent results between a command which
> sets up the work tree and one which doesn't. So this would be fixing
> that, but at the same time, I'm not sure how much we want to promise
> here.

I would be just as happy to die() when we find out we have relative
paths that takes too much work to reparent. It keeps the amount of
work down and will not bite us later (and will let us know if any user
needs it because they would have to report back after hitting the said
die()).
-- 
Duy


Re: [PATCH 2/4] add chdir-notify API

2018-03-29 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 7:40 PM, Jeff King  wrote:
> +static void reparent_cb(const char *old_cwd,
> +   const char *new_cwd,
> +   void *data)
> +{
> +   char **path = data;

Maybe check data == NULL and return early. This is just for
convenience, e.g. instead of doing

path = getenv("foo");
if (path)
chdir_notify_reparent(&path);

I could do

path = getenv("foo");
chdir_notify_reparent(&path);

> +   char *tmp = *path;
> +   *path = reparent_relative_path(old_cwd, new_cwd, tmp);
> +   free(tmp);
> +}
> +
> +void chdir_notify_reparent(char **path)
> +{
> +   if (!is_absolute_path(*path))

I think this check is a bit early. There could be a big gap between
chdir_notify_reparent() and reparent_cb(). During that time, *path may
be updated and become a relative path. We can check for absolute path
inside reparent_cb() instead.

> +   chdir_notify_register(reparent_cb, path);
> +}

Overall, I like this API very much! I will add another one similar to
chdir_notify_reparent() but it takes an env name instead and the
callback will setenv() appropriately. The result looks super good.
-- 
Duy


Re: [PATCH 0/8] Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-29 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 02:19:32PM -0400, Jeff King wrote:
> 
> > I will probably rework on top of your chdir-notify instead (and let
> > yours to be merged earlier)
> 
> Thanks. I like some of the related changes you made, like including this
> in the tracing output. That should be easy to do on top of mine, I
> think.

Yeah. But is it possible to sneak something like this in your series
(I assume you will reroll anyway)? I could do it separately, but it
looks nicer if it's split out and merged in individual patches that
add new chdir-notify call site.

diff --git a/chdir-notify.c b/chdir-notify.c
index 7034e98d71..a2a33443f8 100644
--- a/chdir-notify.c
+++ b/chdir-notify.c
@@ -4,21 +4,26 @@
 #include "strbuf.h"
 
 struct chdir_notify_entry {
+   const char *name;
chdir_notify_callback cb;
void *data;
struct list_head list;
 };
 static LIST_HEAD(chdir_notify_entries);
 
-void chdir_notify_register(chdir_notify_callback cb, void *data)
+void chdir_notify_register(const char *name,
+  chdir_notify_callback cb,
+  void *data)
 {
struct chdir_notify_entry *e = xmalloc(sizeof(*e));
e->cb = cb;
e->data = data;
+   e->name = name;
list_add_tail(&e->list, &chdir_notify_entries);
 }
 
-static void reparent_cb(const char *old_cwd,
+static void reparent_cb(const char *name,
+   const char *old_cwd,
const char *new_cwd,
void *data)
 {
@@ -28,12 +33,16 @@ static void reparent_cb(const char *old_cwd,
if (!tmp || !is_absolute_path(tmp))
return;
*path = reparent_relative_path(old_cwd, new_cwd, tmp);
+   if (name)
+   trace_printf_key(&trace_setup_key,
+"setup: reparent %s to '%s'",
+name, *path);
free(tmp);
 }
 
-void chdir_notify_reparent(char **path)
+void chdir_notify_reparent(const char *name, char **path)
 {
-   chdir_notify_register(reparent_cb, path);
+   chdir_notify_register(name, reparent_cb, path);
 }
 
 int chdir_notify(const char *new_cwd)
@@ -45,11 +54,12 @@ int chdir_notify(const char *new_cwd)
return -1;
if (chdir(new_cwd) < 0)
return -1;
+   trace_printf_key(&trace_setup_key, "setup: chdir to '%s'", new_cwd);
 
list_for_each(pos, &chdir_notify_entries) {
struct chdir_notify_entry *e =
list_entry(pos, struct chdir_notify_entry, list);
-   e->cb(old_cwd.buf, new_cwd, e->data);
+   e->cb(e->name, old_cwd.buf, new_cwd, e->data);
}
 
strbuf_release(&old_cwd);
diff --git a/chdir-notify.h b/chdir-notify.h
index c3bd818a85..b9be1b54ac 100644
--- a/chdir-notify.h
+++ b/chdir-notify.h
@@ -16,23 +16,29 @@
  * warning("switched from %s to %s!", old_path, new_path);
  *   }
  *   ...
- *   chdir_notify_register(foo, data);
+ *   chdir_notify_register("description", foo, data);
  *
  * In practice most callers will want to move a relative path to the new root;
  * they can use the reparent_relative_path() helper for that. If that's all
  * you're doing, you can also use the convenience function:
  *
- *   chdir_notify_reparent(&my_path);
+ *   chdir_notify_reparent("description", &my_path);
  *
  * Registered functions are called in the order in which they were added. Note
  * that there's currently no way to remove a function, so make sure that the
  * data parameter remains valid for the rest of the program.
+ *
+ * The first argument is used for tracing purpose only. when my_path is updated
+ * by chdir_notify_reparent() it will print a message if $GIT_TRACE_SETUP is 
set.
+ * This argument could be NULL.
  */
-typedef void (*chdir_notify_callback)(const char *old_cwd,
+typedef void (*chdir_notify_callback)(const char *name,
+ const char *old_cwd,
  const char *new_cwd,
  void *data);
-void chdir_notify_register(chdir_notify_callback cb, void *data);
-void chdir_notify_reparent(char **path);
+void chdir_notify_register(const char *name, chdir_notify_callback cb,
+  void *data);
+void chdir_notify_reparent(const char *name, char **path);
 
 /*
  *
diff --git a/environment.c b/environment.c
index 794a75a717..92701cbc0c 100644
--- a/environment.c
+++ b/environment.c
@@ -330,11 +330,15 @@ static void set_git_dir_1(const char *path)
setup_git_env(path);
 }
 
-static void update_relative_gitdir(const char *old_cwd,
+static void update_relative_gitdir(const char *name,
+  const char *old_cwd,
   const char *new_cwd,
   void *data)
 {
char *path = reparent_relative_path(old_cwd, new_cwd, get_git_dir());
+   trace_printf_key(&trace_setup_key,
+ 

Re: [PATCH 3/4] set_work_tree: use chdir_notify

2018-03-29 Thread Duy Nguyen
On Wed, Mar 28, 2018 at 7:42 PM, Jeff King  wrote:
> When we change to the top of the working tree, we manually
> re-adjust $GIT_DIR and call set_git_dir() again, in order to
> update any relative git-dir we'd compute earlier.

Another way to approach this problem is not delaying chdir() at all.
We have to delay calling setup_work_tree() and not do it in
setup_git_directory() because it can die() when chdir() fails. But in
many cases, the command does not actually need the worktree and does
not deserve to die. But what if we make setup_work_tree be gentle?

If it successfully chdir() at the end of setup_git_directory() (and
perhaps before the first set_git_dir call), great! The problem we're
dealing here vanishes. If it fails, don't die, just set a flag. Later
on when a command requests a worktree, we can check this flag and now
can die().

It's less code this way, but it uses up more of your (or my) time
because even though the first set_git_dir() call actually happens at 8
places. Is it worth trying ?
-- 
Duy


Re: [PATCH 3/4] set_work_tree: use chdir_notify

2018-03-29 Thread Duy Nguyen
On Thu, Mar 29, 2018 at 7:02 PM, Duy Nguyen  wrote:
> ...
>
> It's less code this way, but it uses up more of your (or my) time
> because even though the first set_git_dir() call actually happens at 8
> places. Is it worth trying ?

Never mind. I took a stab anyway. The setup code should have much less
side effect before we can do stuff like this.
-- 
Duy


Re: [PATCH 3/4] set_work_tree: use chdir_notify

2018-03-29 Thread Duy Nguyen
On Thu, Mar 29, 2018 at 7:50 PM, Jeff King  wrote:
> On Thu, Mar 29, 2018 at 07:02:21PM +0200, Duy Nguyen wrote:
>
>> On Wed, Mar 28, 2018 at 7:42 PM, Jeff King  wrote:
>> > When we change to the top of the working tree, we manually
>> > re-adjust $GIT_DIR and call set_git_dir() again, in order to
>> > update any relative git-dir we'd compute earlier.
>>
>> Another way to approach this problem is not delaying chdir() at all.
>> We have to delay calling setup_work_tree() and not do it in
>> setup_git_directory() because it can die() when chdir() fails. But in
>> many cases, the command does not actually need the worktree and does
>> not deserve to die. But what if we make setup_work_tree be gentle?
>>
>> If it successfully chdir() at the end of setup_git_directory() (and
>> perhaps before the first set_git_dir call), great! The problem we're
>> dealing here vanishes. If it fails, don't die, just set a flag. Later
>> on when a command requests a worktree, we can check this flag and now
>> can die().
>>
>> It's less code this way, but it uses up more of your (or my) time
>> because even though the first set_git_dir() call actually happens at 8
>> places. Is it worth trying ?
>
> I do kind of like that. I'm reasonably happy with the chdir_notify()
> interface, but it would be nicer still if we could get rid of it in the
> first place. It's true that we _could_ chdir from other places, but

There's another place we do, that I should mention and keep
forgetting. Our run-command.c code allows to switch cwd, and if
$GIT_DIR and stuff is relative then we should reparent them too just
like we do with setup_work_tree(). Your chdir-notify makes it super
easy to support that, we just need to move the prep_childenv() down
below chdir(). But since nobody has complaint, I suppose that feature
is not really popular (or at least not used to launch another git
process anyway)

> realistically speaking, we do our one big chdir as part of the setup
> process.
-- 
Duy


Re: [PATCH 2/4] add chdir-notify API

2018-03-29 Thread Duy Nguyen
On Thu, Mar 29, 2018 at 7:48 PM, Jeff King  wrote:
> On Thu, Mar 29, 2018 at 04:53:42PM +0200, Duy Nguyen wrote:
>
>> On Wed, Mar 28, 2018 at 7:40 PM, Jeff King  wrote:
>> > +static void reparent_cb(const char *old_cwd,
>> > +   const char *new_cwd,
>> > +   void *data)
>> > +{
>> > +   char **path = data;
>>
>> Maybe check data == NULL and return early. This is just for
>> convenience, e.g. instead of doing
>>
>> path = getenv("foo");
>> if (path)
>> chdir_notify_reparent(&path);
>>
>> I could do
>>
>> path = getenv("foo");
>> chdir_notify_reparent(&path);
>
> You'd need to xstrdup(path) there anyway. I guess you could use
> xstrdup_or_null(), but it seems somewhat unlikely (unless your work on
> top really does add such a callsite?).

It actually exists in 'next', repository.c where we have this line

repo->alternate_db = xstrdup_or_null(o->alternate_db);

> I guess it's not that much code to be careful, though.

>> > +void chdir_notify_reparent(char **path)
>> > +{
>> > +   if (!is_absolute_path(*path))
>>
>> I think this check is a bit early. There could be a big gap between
>> chdir_notify_reparent() and reparent_cb(). During that time, *path may
>> be updated and become a relative path. We can check for absolute path
>> inside reparent_cb() instead.
>
> My thinking was that we'd be effectively zero-cost for an absolute path,
> though really adding an element to the notification list is probably not
> a big deal.

I think we could leave such optimization to the caller. I'm ok with
keeping this "if" too, but you may need to clarify it in the big
comment block in chdir-notify.h because this behavior to me is
surprising.

> I also assumed that whoever changed it to a relative path would then
> install the notification handler. One thing that my series kind of
> glosses over is whether somebody might call any of these functions
> twice, which would involve setting up the handler twice. So e.g. if you
> did:
>
>   set_git_dir("foo");
>   set_git_dir("bar");
>
> we'd have two handlers, which would do the wrong thing when the second
> one triggered. I hoped we could eventually add a BUG() if that happens,
> but maybe we should simply do a:
>
>   static int registered_chdir;
>
>   if (!registered_chdir) {
> chdir_notify_reparent(&path);
> registered_chdir = 1;
>   }
>
> at each call-site. Another alternative would be to make it a noop to
> feed the same path twice. That requires a linear walk through the list,
> but it's a pretty small list.

Well, given the number of call sites is rather small at the moment, I
think chdir-notify can just stay dumb and let the caller deal with
duplicate registration. One thing I like about your linked list design
though, is it makes it quite easy to remove (or even update) a
callback. You can simply return the (opaque) pointer to the entire
chdir_notify_entry as a handle and we can free/unlink the entry based
on it. It's kinda hard to me to do it with array-based design.
-- 
Duy


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 2:32 PM, Johannes Schindelin
 wrote:
> Hi,
>
> On Thu, 29 Mar 2018, Jeff King wrote:
>
>> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
>>
>> > > When calling `git config --unset abc.a` on this file, it leaves this
>> > > (invalid) config behind:
>> > >
>> > > [
>> > > [xyz]
>> > > key = value
>> > >
>> > > The reason is that we try to search for the beginning of the line (or
>> > > for the end of the preceding section header on the same line) that
>> > > defines abc.a, but as an optimization, we subtract 2 from the offset
>> > > pointing just after the definition before we call
>> > > find_beginning_of_line(). That function, however, *also* performs that
>> > > optimization and promptly fails to find the section header correctly.
>> >
>> > This commit message would be more convincing if we had it in test form.
>>
>> I agree a test might be nice. But I don't find the commit message
>> unconvincing at all. It explains pretty clearly why the bug occurs, and
>> you can verify it by looking at find_beginning_of_line.
>>
>> > [abc]a
>> >
>> > is not written by Git, but would be written from an outside tool or person
>> > and we barely cope with it?
>>
>> Yes, I don't think git would ever write onto the same line. But clearly
>> we should handle anything that's syntactically valid.
>
> I was tempted to add the test case, because it is easy to test it.
>
> But I then decided *not* to add it. Why? Testing is a balance between "can
> do" and "need to do".
>
> Can you imagine that I did *not* run the entire test suite before
> submitting this patch series, because it takes an incredible *90 minutes*
> to run *on a fast Windows machine*?

What's wrong with firing up a new worktree, run the test suite there
and go back to do something else so you won't waste time just waiting
for test results and submit? Sure there is a mental overhead for
switching tasks, but at 90 minutes, I think it's worth doing.
-- 
Duy


Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 7:10 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> Dangling pointers are usually bad news. Reset it back to NULL.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>
> Before abade65b ("setup: expose enumerated repo info", 2017-11-12),
> candidate was an on-stack variable local to this function, so there
> was no need to do anything before returning.  After that commit, we
> pass &repo_fmt down the codepath so theoretically the caller could
> peek into repo_fmt.work_tree after this codepath, which may be bad.
> But if candidate->work_tree were not NULL at this point, that would
> mean that such a caller that peeks is getting a WRONG information,
> no?  It thinks there were no core.worktree set but in reality there
> was the configuration set in the repository, no?
>
> Which fields in candidate are safe to peek by the caller?  How can a
> caller tell?

To me, all fields should be valid after
check_repository_format_gently(). Right now the caller does not need
to read any info from repo_fmt because check_repo... passes the
information in another way, via global variables like
is_bare_repository_cfg and git_work_tree_cfg.

> It seems that setup_git_directory_gently() uses repo_fmt when
> calling various variants of setup_*_git_dir() and then uses the
> repo_fmt.hash_algo field later.

Yes. Though if we're going to reduce global state further more, then
the "if (!has_common)" should be done by the caller, then we need
access to all fields in repo_fmt

> If we want to keep fields of repo_fmt alive and valid after
> check_repository_format_gently() and callers of it like
> setup_*_git_dir() returns, then perhaps the right fix is not to free
> candidate->work_tree here, and instead give an interface to clean up
> repository_format instance, so that the ultimate caller like
> setup_git_directory_gently() can safely peek into any fields in it
> and then clean it up after it is done?

We still need to free and set NULL here though in addition to a
cleanup interface. The reason is, when checking repo config from a
worktree, we deliberately ignore core.worktree (which belongs to the
main repo only). The implicit line near this
free(candidate->work_tree) is "leave git_work_tree_cfg alone, we don't
recognize core.worktree". Once we move setting git_work_tree_cfg out
of this function, this becomes clear.

I didn't think all of this when I wrote this patch though. It was
"hey, it's theoretical bug so let's fix it". Only later on when I
refactored more that I realized what it meant.
-- 
Duy


Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 8:32 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Fri, Mar 30, 2018 at 7:10 PM, Junio C Hamano  wrote:
>>
>>> Which fields in candidate are safe to peek by the caller?  How can a
>>> caller tell?
>>
>> To me, all fields should be valid after
>> check_repository_format_gently().
>
> If so, free() is wrong in the first place, and FREE_AND_NULL() is
> making it even worse, no?  We learned there is work_tree set to
> somewhere, the original code by Peff before abade65b ("setup: expose
> enumerated repo info", 2017-11-12) freed it because the code no
> longer needed that piece of information.  If we are passing all we
> learned back to the caller, we should not free the field in the
> function at all.  But it seems (below) the codepath is messier than
> that.

Actually no, NULL is the right value. I was trying to say that this
mysterious code was about _deliberately_ ignore core.worktree. By
keeping repo_fmt->worktree as NULL we tell the caller "core.worktree
is not set". The current code also does that but in a different way:
it sets git_work_tree_cfg based on candidate->worktree, but only for
the "!has_common" block.

>> We still need to free and set NULL here though in addition to a
>> cleanup interface. The reason is, when checking repo config from a
>> worktree, we deliberately ignore core.worktree (which belongs to the
>> main repo only). The implicit line near this
>> free(candidate->work_tree) is "leave git_work_tree_cfg alone, we don't
>> recognize core.worktree". Once we move setting git_work_tree_cfg out
>> of this function, this becomes clear.
>
> So in other words, there is a code that looks at the field and it
> _wants_ to see NULL there---otherwise that brittle code misbehaves
> and FREE_AND_NULL() is a bad-aid to work it around?
>
> Then proposed log message "leaving it dangling is unsanitary" is
> *not* what is going on here, and the real reason why the code should
> be like so deserve to be described both in the log message and in a
> large in-code comment, no?

Let's drop this for now. I'm a bit further along in refactoring this
code that I thought I could. It'll be clearer when the caller is also
updated to show what's wrong.
-- 
Duy


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 8:53 PM, Johannes Schindelin
 wrote:
> I think the best course of action would be to incrementally do away with
> the shell scripted test framework, in the way you outlined earlier this
> year. This would *also* buy us a wealth of other benefits, such as better
> control over the parallelization, resource usage, etc.

If you have not noticed, I'm a bit busy with all sorts of stuff and
probably won't continue that work. And since it affects you the most,
you probably have the best motive to tackle it ;-) I don't think
complaining about slow test suite helps. And avoiding adding more
tests because of that definitely does not help.

> It would also finally make it easier to introduce something like "smart
> testing" where code coverage could be computed (this works only for C
> code, of course, not for the many scripted parts of core Git), and a diff
> could be inspected to discover which tests *really* need to be run,
> skipping the tests that would only touch unchanged code.
-- 
Duy


Re: [PATCH v2 0/5] re-parenting relative directories after chdir

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 02:34:25PM -0400, Jeff King wrote:
> On Wed, Mar 28, 2018 at 01:36:56PM -0400, Jeff King wrote:
> 
> > For those just joining us, this fixes a regression that was in v2.13 (so
> > nothing we need to worry about as part of the 2.17-rc track).
> > 
> >   [1/4]: set_git_dir: die when setenv() fails
> >   [2/4]: add chdir-notify API
> >   [3/4]: set_work_tree: use chdir_notify
> >   [4/4]: refs: use chdir_notify to update cached relative paths
> 
> Here's a re-roll based on the feedback I got, including:
> 
>  - fixes the memory leak and vague comment pointed out by Eric
> 
>  - adds in tracing code from Duy
> 
>  - I took Duy's suggestions regarding "least" surprise in some of the
>functions (reparenting NULL is a noop, and registering a reparent
>handler does so even for an absolute path).
> 
> I punted on the "registering the same path twice" thing. That is a
> potential way to misuse this API, but I don't think there's a good
> solution. The "reparent" helper could figure this out for you, but in
> the general case we actually install an arbitrary callback. So the
> caller really has to handle it there.

The series looks good to me.

> 
> I think in the long run we'd want to outlaw calling set_git_dir() twice
> anyway.

Oh yeah. With my latest WIP changes, the bottom of
setup_git_directory_gently() looks like this. Nowhere else in setup
code calls these functions anymore (except the current
setup_work_tree)

-- 8< --
if (result.worktree)
set_git_work_tree(result.worktree);
if (result.gitdir)
set_git_dir(result.gitdir);
if (startup_info->have_repository)
repo_set_hash_algo(the_repository, result.repo_fmt.hash_algo);
...
return result.prefix;
-- 8< --

>From here on, it's not hard to see how to turn set_git_work_tree()
into setup_work_tree_gently() (without doing any set_git_dir) and the
last two calls into "repo_init_gitdir(gitdir, hash_algo)", which
should be where we allocate a new repository object and initialize
related object store, ref store...

But I still have a couple setup corner cases to deal with first :(
--
Duy


Re: [PATCH v7 13/13] pack-objects: reorder members to shrink struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:26 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:53AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> Previous patches leave lots of holes and padding in this struct. This
>> patch reorders the members and shrinks the struct down to 80 bytes
>> (from 136 bytes, before any field shrinking is done) with 16 bits to
>> spare (and a couple more in in_pack_header_size when we really run out
>> of bits).
>
> Out of curiosity, did you count this yourself, or did you double-check
> with a few compilers to make sure they all produce the same result?

I used pahole though only with .o files created by gcc 64-bit. I'll
try the 32-bit version and clang as well.
-- 
Duy


Re: [PATCH v7 12/13] pack-objects: shrink delta_size field in struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:24 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:52AM +0100, Nguyễn Thái Ngọc Duy wrote:
>> @@ -2004,10 +2006,12 @@ static int try_delta(struct unpacked *trg, struct 
>> unpacked *src,
>>   delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, 
>> max_size);
>>   if (!delta_buf)
>>   return 0;
>> + if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
>> + return 0;
>
> This is the other spot that needs to be "1U".
>
> How come this doesn't get a pdata->oe_delta_size_limit like we have
> pdata->oe_size_limit? Would we want a matching
> $GIT_TEST_OE_DELTA_SIZE_BITS to test it, too?

Probably. This change does not look as risky as the others (no
complicated fallback). But without $GIT_TEST_OE_DELTA_SIZE_BITS it's
hard to know how the new code reacts when we get over the limit. I
will add it.
-- 
Duy


Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:04 PM, Jeff King  wrote:
> The subject says "clarify" so I was a little surprised to see code
> changes. It looks like we're just avoiding reassigning on top of the
> value repeatedly, which is part of that clarification. It looks like a
> noop to me.

Oh well... I was counting on the new name (in_pack_size, which follows
in_pack_type naming convention) to emphasize it (and the new "delta
size" comment to point out where in_pack_size contains a delta size.
-- 
Duy


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:59 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:48AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> We only cache deltas when it's smaller than a certain limit. This limit
>> defaults to 1000 but save its compressed length in a 64-bit field.
>> Shrink that field down to 16 bits, so you can only cache 65kb deltas.
>> Larger deltas must be recomputed at when the pack is written down.
>
> Unlike the depth, I don't think there's any _inherent_ reason you
> couldn't throw, say, 1MB deltas into the cache (if you sized it large
> enough). But I doubt such deltas are really all that common. Here are
> the top 10 in linux.git:
>
>   $ git cat-file --batch-all-objects --batch-check='%(deltabase) 
> %(objectsize:disk)' |
> grep -v ^0 | sort -k 2nr | head
>   a02b6794337286bc12c907c33d5d75537c240bd0 769103
>   b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
>   1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
>   a02b6794337286bc12c907c33d5d75537c240bd0 240647
>   c550d99286c01867dfb26e432417f3106acf8611 177896
>   5977795854f852c2b95dd023fd03cace023ee41c 119737
>   4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
>   b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
>   2645d6239b74bebd661436762e819b831095b084 103980
>   b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014
>
> It's possible some weird workload would want to tweak this. Say you were
> storing a ton of delta-capable files that were big and always differed
> by a megabyte. And it was somehow really important to you to tradeoff
> memory for CPU during the write phase of a pack.

We're not short on spare bits so I will try to raise this limit to 1MB
(not because you mentioned 1MB, but because the largest size in your
output is close to 1MB).
-- 
Duy


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:48 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:46AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index e1244918a5..b41610569e 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -29,6 +29,8 @@
>>  #include "list.h"
>>  #include "packfile.h"
>>
>> +#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
>
> How come this one gets a macro, but the earlier conversions don't?
>
> I guess the problem is that oe_in_pack() is defined in the generic
> pack-objects.h, but &to_pack is only in builtin/pack-objects.c?
>
> I wonder if it would be that bad to just say oe_in_pack(&to_pack, obj)
> everywhere. It's longer, but it makes the code slightly less magical to
> read.

Longer was exactly why I added these macros (with the hope that the
macro upper case names already ring a "it's magical" bell). Should I
drop all these macros? Some code becomes a lot more verbose though.

>> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
>> +{
>> + struct packed_git **mapping, *p;
>> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
>> +
>> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
>> + /*
>> +  * leave in_pack_by_idx NULL to force in_pack[] to be
>> +  * used instead
>> +  */
>> + return;
>> + }
>> +
>> + ALLOC_ARRAY(mapping, nr);
>> + mapping[cnt++] = NULL; /* zero index must be mapped to NULL */
>
> Why? I guess because index==0 is a sentinel for "we're using the small
> index numbers?"

No because by default all values in object_entry is zero (or NULL). If
I remember correctly, some code will skip setting in_pack pointer to
leave it NULL. When we convert it to an index, it should also point to
NULL.

>> + prepare_packed_git();
>> + for (p = packed_git; p; p = p->next, cnt++) {
>> + if (cnt == nr) {
>> + free(mapping);
>> + return;
>> + }
>> + p->index = cnt;
>> + mapping[cnt] = p;
>> + }
>> + pdata->in_pack_by_idx = mapping;
>> +}
>
> What happens if we later have to reprepare_packed_git() and end up with
> more packs? We only call this for the first pack.
>
> It may well be handled, but I'm having trouble following the code to see
> if it is. And I doubt that case is covered by our test suite (since it
> inherently involves a race).

I don't think I covered this case. But since "index" field in
packed_git should be zero for the new packs, we could check and either
add it to in_pack_by_idx[].

>>  /*
>> + * The size of struct nearly determines pack-objects's memory
>> + * consumption. This struct is packed tight for that reason. When you
>> + * add or reorder something in this struct, think a bit about this.
>> + *
>
> It's funny to see this warning come in the middle. Should it be part of
> the final struct reordering at the end?

It was at the end in some version, the I shuffled the patches and
forgot about this one :)
-- 
Duy


Re: [PATCH v7 12/13] pack-objects: shrink delta_size field in struct object_entry

2018-03-31 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:24 PM, Jeff King  wrote:
> How come this doesn't get a pdata->oe_delta_size_limit like we have
> pdata->oe_size_limit? Would we want a matching
> $GIT_TEST_OE_DELTA_SIZE_BITS to test it, too?

Nope. This changes how the delta chain is formed (e.g. produces
shorter chains) and apparently some tests rely on that, like t5303.
-- 
Duy


Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-31 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:38 PM, Junio C Hamano  wrote:
> * nd/repack-keep-pack (2018-03-26) 7 commits
>  - pack-objects: show some progress when counting kept objects
>  - gc --auto: exclude base pack if not enough mem to "repack -ad"
>  - gc: handle a corner case in gc.bigPackThreshold
>  - gc: add gc.bigPackThreshold config
>  - gc: add --keep-largest-pack option
>  - repack: add --keep-pack option
>  - t7700: have closing quote of a test at the beginning of line
>
>  "git gc" in a large repository takes a lot of time as it considers
>  to repack all objects into one pack by default.  The command has
>  been taught to pretend as if the largest existing packfile is
>  marked with ".keep" so that it is left untouched while objects in
>  other packs and loose ones are repacked.
>
>  Reroll exists, but it seems to be still slushy.
>  cf. <20180316192745.19557-1-pclo...@gmail.com>

The next one v4 [1] should be less slushy.

[1] https://public-inbox.org/git/20180324072507.21059-1-pclo...@gmail.com/#t
-- 
Duy


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 06:20:07AM -0400, Jeff King wrote:
> On Sat, Mar 31, 2018 at 06:51:10AM +0200, Duy Nguyen wrote:
> 
> > >> +#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
> > >
> > > How come this one gets a macro, but the earlier conversions don't?
> > >
> > > I guess the problem is that oe_in_pack() is defined in the generic
> > > pack-objects.h, but &to_pack is only in builtin/pack-objects.c?
> > >
> > > I wonder if it would be that bad to just say oe_in_pack(&to_pack, obj)
> > > everywhere. It's longer, but it makes the code slightly less magical to
> > > read.
> > 
> > Longer was exactly why I added these macros (with the hope that the
> > macro upper case names already ring a "it's magical" bell). Should I
> > drop all these macros? Some code becomes a lot more verbose though.
> 
> I'm on the fence. I agree that the macro screams "magical". I just
> sometimes see a macro and think something really weird and
> unfunction-like is going on. But really we're just replacing a default
> parameter.
> 
> So I dunno. If you get rid of the macros and I look at it, I give even
> odds that I'll say "yech, put them back!". :)

It would look like this (on top of v8). I think the "&to_pack" part is
most distracting when it's used as part of an expression (or a
function argument). I probably went overboard with SET_ macros though.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b5bba2c228..dec849b755 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,18 +29,6 @@
 #include "list.h"
 #include "packfile.h"
 
-#define IN_PACK(obj) oe_in_pack(&to_pack, obj)
-#define SIZE(obj) oe_size(&to_pack, obj)
-#define SET_SIZE(obj,size) oe_set_size(&to_pack, obj, size)
-#define DELTA_SIZE(obj) oe_delta_size(&to_pack, obj)
-#define DELTA(obj) oe_delta(&to_pack, obj)
-#define DELTA_CHILD(obj) oe_delta_child(&to_pack, obj)
-#define DELTA_SIBLING(obj) oe_delta_sibling(&to_pack, obj)
-#define SET_DELTA(obj, val) oe_set_delta(&to_pack, obj, val)
-#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(&to_pack, obj, val)
-#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(&to_pack, obj, val)
-#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(&to_pack, obj, val)
-
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
N_("git pack-objects [...]  [<  | < 
]"),
@@ -137,14 +125,14 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, &type, &size);
if (!buf)
die("unable to read %s", oid_to_hex(&entry->idx.oid));
-   base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, &type,
+   base_buf = read_sha1_file(oe_delta(&to_pack, entry)->idx.oid.hash, 
&type,
  &base_size);
if (!base_buf)
die("unable to read %s",
-   oid_to_hex(&DELTA(entry)->idx.oid));
+   oid_to_hex(&oe_delta(&to_pack, entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, &delta_size, 0);
-   if (!delta_buf || delta_size != DELTA_SIZE(entry))
+   if (!delta_buf || delta_size != oe_delta_size(&to_pack, entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -295,15 +283,15 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
-   size = DELTA_SIZE(entry);
+   size = oe_delta_size(&to_pack, entry);
buf = entry->delta_data;
entry->delta_data = NULL;
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   type = (allow_ofs_delta && oe_delta(&to_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
-   size = DELTA_SIZE(entry);
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   size = oe_delta_size(&to_pack, entry);
+   type = (allow_ofs_delta && oe_delta(&to_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
 
@@ -327,7 +315,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 * encoding of th

Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 1:36 PM, Ævar Arnfjörð Bjarmason
 wrote:
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>>  GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>>  path where there are more than 1024 packs even if the actual number of
>>  packs in repository is below this limit.
>>
>> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
>> -code path where we do not cache objecct size in memory and read it
>> -from existing packs on demand. This normally only happens when the
>> -object size is over 2GB. This variable forces the code path on any
>> -object larger than 2^ bytes.
>
> The docs here say set these env variables, but actually
> GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to
> set a bool value.
>
> I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just
> copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error
> out since it's expecting bool, not the env variable to be set.
>
> I really don't care which we use, but let's use either if(getenv()) or
> if(git_env_bool()) consistently, and then have the docs either say "if
> set" or "if set to a boolean value (see git-config(1))".

I'll change GIT_TEST_SPLIT_INDEX to boolean too since I document it
here anyway. Will wait for a while though to see if anything else
should be part of v9.
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
> which (approximately) enables -Wextra so that any combination of them
> and -Werror can be set.
>
> I've long wanted to use DEVELOPER=1 in my production builds, but on
> some old systems I still get warnings, and thus the build would
> fail. However if the build/tests fail for some other reason, it would
> still be useful to scroll up and see what the relevant code is warning
> about.
>
> This change allows for that. Now setting DEVELOPER will set -Werror as
> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
> but without -Werror.
>
> I've renamed the newly added EAGER_DEVELOPER flag to
> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
> inventing some new name of our own (VERY_EAGER_DEVELOPER?).

Before we go with zillions of *DEVELOPER* maybe we can have something
like DEVOPTS where you can give multiple keywords to a single variable
to influence config.mak.dev. This is similar to COMPILER_FEATURES we
already have in there, but now it's driven by the dev instead of the
compiler. So you can have keywords like "gentle" (no -Werror) "extra"
(-Wextra with no suppression) and something else.
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-03 Thread Duy Nguyen
On Tue, Apr 03, 2018 at 11:19:46AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Mar 31 2018, Duy Nguyen wrote:
> 
> > On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason
> >  wrote:
> >> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
> >> which (approximately) enables -Wextra so that any combination of them
> >> and -Werror can be set.
> >>
> >> I've long wanted to use DEVELOPER=1 in my production builds, but on
> >> some old systems I still get warnings, and thus the build would
> >> fail. However if the build/tests fail for some other reason, it would
> >> still be useful to scroll up and see what the relevant code is warning
> >> about.
> >>
> >> This change allows for that. Now setting DEVELOPER will set -Werror as
> >> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
> >> but without -Werror.
> >>
> >> I've renamed the newly added EAGER_DEVELOPER flag to
> >> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
> >> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
> >> inventing some new name of our own (VERY_EAGER_DEVELOPER?).
> >
> > Before we go with zillions of *DEVELOPER* maybe we can have something
> > like DEVOPTS where you can give multiple keywords to a single variable
> > to influence config.mak.dev. This is similar to COMPILER_FEATURES we
> > already have in there, but now it's driven by the dev instead of the
> > compiler. So you can have keywords like "gentle" (no -Werror) "extra"
> > (-Wextra with no suppression) and something else.
> 
> We could do that, but I don't think it's that bad. This patch is one
> extra option on top of yours,

And that eager this was marked experimental because I was not sure if
it was the right way :)

> and it's not going to result in some combinatorial explosion of
> options, i.e. if we add DEVELOPER_PEDANTIC we'll just add one extra
> flag.
> 
> But sure, we could make this some string we'd need to parse out similar
> to COMPILER_FEATURES, it just seems more complex to me for this task.

It's not that complex. With the EAGER_DEVELOPER patch removed, we can
have something like this where eager devs just need to put

DEVOPTS = gentle no-suppression

and you put

DEVOPTS = gentle

(bad naming, I didn't spend time thinking about names)

-- 8< --
diff --git a/Makefile b/Makefile
index e6680a8977..7b4e038e1d 100644
--- a/Makefile
+++ b/Makefile
@@ -435,6 +435,11 @@ all::
 # Define DEVELOPER to enable more compiler warnings. Compiler version
 # and faimily are auto detected, but could be overridden by defining
 # COMPILER_FEATURES (see config.mak.dev)
+#
+# When DEVELOPER is set, DEVOPTS can be used to control compiler options.
+# This variable contains keywords separated by whitespace. Two keywords
+# are recognized: "gentle" removes -Werror and "no-suppression"
+# removes all "-Wno-" options.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
diff --git a/config.mak.dev b/config.mak.dev
index 716a14ecc7..1d7aba6a6a 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -1,4 +1,6 @@
+ifeq ($(filter gentle,$(DEVOPTS)),)
 CFLAGS += -Werror
+endif
 CFLAGS += -Wdeclaration-after-statement
 CFLAGS += -Wno-format-zero-length
 CFLAGS += -Wold-style-definition
@@ -21,6 +23,7 @@ CFLAGS += -Wextra
 # if a function is public, there should be a prototype and the right
 # header file should be included. If not, it should be static.
 CFLAGS += -Wmissing-prototypes
+ifeq ($(filter no-suppression,$(DEVOPTS)),)
 # These are disabled because we have these all over the place.
 CFLAGS += -Wno-empty-body
 CFLAGS += -Wno-missing-field-initializers
@@ -28,6 +31,7 @@ CFLAGS += -Wno-sign-compare
 CFLAGS += -Wno-unused-function
 CFLAGS += -Wno-unused-parameter
 endif
+endif
 
 # uninitialized warnings on gcc 4.9.2 in xdiff/xdiffi.c and config.c
 # not worth fixing since newer compilers correctly stop complaining
-- 8< --


Re: [PATCH] t2028: tighten grep expression to make "move worktree" test more robust

2018-04-03 Thread Duy Nguyen
On Tue, Apr 3, 2018 at 11:25 AM, Eric Sunshine  wrote:
> Following a rename of worktree "source" to "destination", the "move
> worktree" test uses grep to verify that the output of "git worktree list
> --porcelain" does not contain "source" (and does contain "destination").
> Unfortunately, the grep expression is too loose and can match
> unexpectedly. For example, if component of the test trash directory path
> matches "source" (e.g. "/home/me/sources/git/t/trash*"), then the test
> will be fooled into thinking that "source" still exists. Tighten the
> expression to avoid such accidental matches.
>
> While at it, drop an unused variable ("toplevel") from the test and
> tighten a similarly too-loose expression in a related test.
>
> Reported-by: Jens Krüger 
> Signed-off-by: Eric Sunshine 
> ---
>
> t2028 in 2.17.0 can be fooled into failing depending upon the path of
> the test's trash directory. The problem is with the test being too
> loose, not with Git itself. Problem report and diagnosis here[1].
>
> [1]: 
> https://public-inbox.org/git/26a00c2b-c588-68d5-7085-22310c20e...@frm2.tum.de/T/#m994cdb29f141656b0ab48dd0d152432c7e86fc20

Thanks both. It was great to scroll to the latest mails and saw that I
didn't have to do anything else :)
-- 
Duy


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-04-03 Thread Duy Nguyen
On Tue, Apr 3, 2018 at 11:31 AM, Johannes Schindelin
 wrote:
> It is very frustrating to spend that much time with only little gains here
> and there (and BusyBox-w32 is simply not robust enough yet, apart from
> also not showing a significant improvement in performance).

You still use busybox-w32? It's amazing that people still use it after
the linux subsystem comes. busybox has a lot of commands built in
(i.e. no new processes) and unless rmyorston did something more, the
"fork" in ash shell should be as cheap as it could be: it simply
serializes data and sends to the new process.

If performance does not improve, I guess the process creation cost
dominates. There's not much we could do except moving away from the
zillion processes test framework: either something C-based or another
scripting language (ok I don't want to bring this up again)
-- 
Duy


Re: [RFC PATCH 00/19] object-store refactoring 3 (replace objects, main ref store)

2018-04-07 Thread Duy Nguyen
On Sat, Apr 7, 2018 at 1:21 AM, Stefan Beller  wrote: *
> diff --git a/repository.h b/repository.h
> index 09df94a472..2922d3a28b 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -26,6 +26,11 @@ struct repository {
>  */
> struct raw_object_store *objects;
>
> +   /*
> +* The store in which the refs are hold.
> +*/
> +   struct ref_store *main_ref_store;
> +

We probably should drop the main_ prefix here because this could also
be a submodule ref store (when the repository is about a submodule).
worktree ref store is a different story and I don't know how to best
present it here yet (I'm still thinking a separate struct repository).
But we can worry about that when struct repository supports multiple
worktree.
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-07 Thread Duy Nguyen
On Fri, Apr 6, 2018 at 11:42 PM, Jeff King  wrote:
> On Tue, Apr 03, 2018 at 05:17:00PM +0200, Duy Nguyen wrote:
>
>> It's not that complex. With the EAGER_DEVELOPER patch removed, we can
>> have something like this where eager devs just need to put
>>
>> DEVOPTS = gentle no-suppression
>>
>> and you put
>>
>> DEVOPTS = gentle
>>
>> (bad naming, I didn't spend time thinking about names)
>
> It seems to me like we're losing the point of DEVELOPER here. I thought
> the idea was to have a turn-key flag you could set to get extra linting
> on your commits. But now we're tweaking all kinds of individual options.
> At some point are we better off just letting you put "-Wno-foo" in your
> CFLAGS yourself?

It's because what AEvar wanted is no longer a dev thing :)

> I don't mind the version-based checks because they're automatic, so the
> feature remains turn-key. But this kind of DEVOPTS seems like a step in
> the wrong direction.
>
> -Peff



-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-07 Thread Duy Nguyen
On Sat, Apr 7, 2018 at 2:36 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Anyway, I see you've pushed a new version with DEVOPTS. I'll submit mine
> on top of that once your new version lands (unless you want to try to
> integrate it yourself).

Actually I think I'll just drop both EAGER_DEVELOPER and DEVOTPS.
-- 
Duy


Re: [PATCH/RFC 5/5] help: add "-a --verbose" to list all commands with synopsis

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 11:55 AM, Eric Sunshine  wrote:
> On Mon, Apr 9, 2018 at 5:47 AM, Junio C Hamano  wrote:
>> Eric Sunshine  writes:
>>> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy
>>>  wrote:
 +   switch (category) {
 +   case CAT_ancillaryinterrogators: return _("Ancillary 
 interrogators");
 +   case CAT_ancillarymanipulators: return _("Ancillary manipulators");
 +   case CAT_foreignscminterface: return _("Foreign SCM interface");
 +   case CAT_mainporcelain: return _("Main porcelain");
 +   case CAT_plumbinginterrogators: return _("Plumbing interrogators");
 +   case CAT_plumbingmanipulators: return _("Plumbing interrogators");
>>>
>>> s/interrogators"/manipulators"/
>>>
 +   case CAT_purehelpers: return _("Pure helpers");
 +   case CAT_synchelpers: return _("Sync helpers");
 +   case CAT_synchingrepositories: return _("Synching repositories");
>>
>> Somehow this screams "an array of strings" at me.  Aren't this
>> CAT_things small and dense enum?
>
> Duy's modified generate-cmdlist.sh does actually output an array of
> strings for these, but the (generated) array is commented out in this
> RFC. I suppose the reason it's not presently used is because the array
> looks like this:
>
> static const char *cmd_categories[] = {
> "ancillaryinterrogators",
> "ancillarymanipulators",
> "foreignscminterface",
> "mainporcelain",
> "plumbinginterrogators",
> "plumbingmanipulators",
> "purehelpers",
> "synchelpers",
> "synchingrepositories",
>  NULL
> };
>
> which doesn't give quite the human-friendly output he'd like. The
> series is RFC, after all.

Yep.

> A possible approach to fix it would be to add a new "### categories"
> section to command-list.txt which associates those category tags
> ("ancillaryinterrogators") with human-readable counterparts
> ("Ancillary interrogators").

Or extract the headlines from git.txt but that's not easy since it's
not consistent. We could manually recreate the same grouping as in
git.txt too, it's probably nicer than just printing groups sorted by
category id.
-- 
Duy


Re: What's cooking in git.git (Apr 2018, #01; Mon, 9)

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 12:21 PM, Junio C Hamano  wrote:
> * sb/packfiles-in-repository (2018-03-26) 12 commits
>   (merged to 'next' on 2018-03-30 at caa68db14d)
>  + packfile: keep prepare_packed_git() private
>  + packfile: allow find_pack_entry to handle arbitrary repositories
>  + packfile: add repository argument to find_pack_entry
>  + packfile: allow reprepare_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git_one to handle arbitrary repositories
>  + packfile: add repository argument to reprepare_packed_git
>  + packfile: add repository argument to prepare_packed_git
>  + packfile: add repository argument to prepare_packed_git_one
>  + packfile: allow install_packed_git to handle arbitrary repositories
>  + packfile: allow rearrange_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git_mru to handle arbitrary repositories
>  (this branch uses nd/remove-ignore-env-field and sb/object-store; is tangled 
> with sb/submodule-move-nested.)
>
>  Refactoring of the internal global data structure continues.
>
>  Is this ready for 'master' by now?

I think so. Things start to look much nicer.
-- 
Duy


Re: [PATCH 1/2] git-worktree.txt: recommend 'git worktree remove' over manual deletion

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 9:33 AM, Eric Sunshine  wrote:
> When cc73385cf6 (worktree remove: new command, 2018-02-12) implemented
> and documented 'git worktree remove', it forgot to update existing
> instructions suggesting manual deletion. Fix this oversight by
> recommending 'git worktree remove' instead.

Too bad we can't show off "git worktree move" :) The patches look fine.
-- 
Duy


Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshine  wrote:
> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy
>  wrote:
>> common-cmds.h is used to extract the list of common commands (by
>> group) and a one-line summary of each command. Some information is
>> dropped, for example command category or summary of other commands.
>> Update generate-cmdlist.sh to keep all the information. The extra info
>> will be used shortly.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
>> @@ -2,9 +2,10 @@
>>  struct cmdname_help {
>> -   char name[16];
>> +   char name[32];
>> char help[80];
>> -   unsigned char group;
>> +   unsigned int category;
>> +   unsigned int group;
>>  };
>> @@ -23,27 +24,50 @@ sed -n '
>> +echo "#define GROUP_NONE 0xff /* no common group */"
>> +echo "#define GROUP_ 0xff /* no common group */"
>
> Meh, this "GROUP_" alias of "GROUP_NONE" isn't so nice.

Yeah. I don't want to mess too much with shell script. I wonder if we
should instead kill this script and extend Documentation/cmd-list.perl
to handle this task too. It would be much nicer to write and maintain
the script. The downside is NO_PERL builds will have no commands in
"git help".
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-14 Thread Duy Nguyen
On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley  wrote:
> I'm only just catching up, but does/can this series also capture the
> non-command guides that are available in git so that the 'git help -g' can
> begin to list them all?

It currently does not. But I don't see why it should not. This should
allow git.txt to list all the guides too, for people who skip "git
help" and go hard core mode with "man git". Thanks for bringing this
up.
-- 
Duy


Re: [PATCH v1 0/2] fsexcludes: Add programmatic way to exclude files

2018-04-14 Thread Duy Nguyen
On Tue, Apr 10, 2018 at 11:04 PM, Ben Peart  wrote:
> In git repos with large working directories an external file system monitor
> (like fsmonitor or gvfs) can track what files in the working directory have 
> been
> modified.  This information can be used to speed up git operations that scale
> based on the size of the working directory so that they become O(# of modified
> files) vs O(# of files in the working directory).
>
> The fsmonitor patch series added logic to limit what files git had to stat() 
> to
> the set of modified files provided by the fsmonitor hook proc.  It also used 
> the
> untracked cache (if enabled) to limit the files/folders git had to scan 
> looking
> for new/untracked files.  GVFS is another external file system model that also
> speeds up git working directory based operations that has been using a 
> different
> mechanism (programmatically generating an excludes file) to enable git to be
> O(# of modified files).
>
> This patch series will introduce a new way to limit git�s traversal of the
> working directory that does not require the untracked cache (fsmonitor) or 
> using
> the excludes feature (GVFS).  It does this by enhancing the existing excludes
> logic in dir.c to support a new �File System Excludes� or fsexcludes API that 
> is
> better tuned to these programmatic applications.

I have not had a chance to really look at the patches yet but I think
these three paragraphs should somehow be included in the commit
description of 1/2 (or spread out between 1/2 and 2/2). 1/2
description for example briefly talks about how to use the new thing,
but not really tell what it's for, why you need to add it.
-- 
Duy


Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-15 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshine  wrote:
>> +awk '{print $2;}' |
>
> At one time, Junio expressed concerns[2] about having an 'awk'
> dependency in the build system (in fact, with regards to this same
> generation process). Whether he still has such concerns is unknown,
> but it should be easy enough to avoid it here (and below).
>
> [2]: https://public-inbox.org/git/20150519004356.GA12854@flurp.local/

I'll stick with awk to avoid too much headache with regular
expressions (replacements are welcome though). We do use awk in our
test suite so it should be ok (who builds git and runs it without
testing?)
-- 
Duy


Re: .gitattributes lookup doesn't respect GIT_WORK_TREE

2018-04-16 Thread Duy Nguyen
On Mon, Apr 16, 2018 at 12:40 AM, Andreas Schwab  wrote:
> On Apr 16 2018, Junio C Hamano  wrote:
>
>> I may be mistaken (I do not have the code in front of me right now)
>> but IIRC after the setup.c code runs (which happens quite early in
>> the sequence that starts from git.c::cmd_main()), the Git process
>> moves to the top level of the working tree,
>
> git log/show don't appear to do that.

Yeah we lazily set up worktree in some cases. Elsewhere in the
chdir-notify thread, I suggested that we set up worktree
unconditionally (but do not die setup fails; only die when a command
specifically requests for worktree). That work would help make this
work in most cases. But it's not materialized yet.
-- 
Duy


Re: [PATCH v2 3/6] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-16 Thread Duy Nguyen
On Mon, Apr 16, 2018 at 8:28 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> @@ -23,28 +36,44 @@ sed -n '
>>   ' "$1"
>>  printf '};\n\n'
>>
>> +echo "#define GROUP_NONE 0xff /* no common group */"
>
> Some later code forgets about this value, and causes "git" to
> segfault at the end of this entire series.
>
> Namely, here:
>
>> - for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
>> + for (i = 0; i < nr; i++) {
>>   if (common_cmds[i].group != current_grp) {
>>   printf("\n%s\n", 
>> _(common_cmd_groups[common_cmds[i].group]));
>>   current_grp = common_cmds[i].group;
>
> where common_cmd_groups[] gets overrun.

Argh!! I thought I tested everything. Sorry for the sloppy quality.

>
> Here is a squash I'll queue on top to keep the tip of 'pu' at least
> buildable.

Thanks. It's actually interesting that we have main porcelain commands
that belong to no group. I'll try to classify them so that they show
up as well.


-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley  wrote:
> From: "Duy Nguyen"  : Saturday, April 14, 2018 4:44 PM
>
>> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley 
>> wrote:
>>>
>>> I'm only just catching up, but does/can this series also capture the
>>> non-command guides that are available in git so that the 'git help -g'
>>> can
>>> begin to list them all?
>>
>>
>> It currently does not. But I don't see why it should not. This should
>> allow git.txt to list all the guides too, for people who skip "git
>> help" and go hard core mode with "man git". Thanks for bringing this
>> up.
>> --
>> Duy
>>
> Is that something I should add to my todo to add a 'guide' category etc.?

I added it too [1]. Not sure if you want anything more on top though.

[1] https://public-inbox.org/git/20180415164238.9107-7-pclo...@gmail.com/T/#u
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote:
> On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley  wrote:
> > From: "Duy Nguyen"  : Saturday, April 14, 2018 4:44 PM
> >
> >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley 
> >> wrote:
> >>>
> >>> I'm only just catching up, but does/can this series also capture the
> >>> non-command guides that are available in git so that the 'git help -g'
> >>> can
> >>> begin to list them all?
> >>
> >>
> >> It currently does not. But I don't see why it should not. This should
> >> allow git.txt to list all the guides too, for people who skip "git
> >> help" and go hard core mode with "man git". Thanks for bringing this
> >> up.
> >> --
> >> Duy
> >>
> > Is that something I should add to my todo to add a 'guide' category etc.?
> 
> I added it too [1]. Not sure if you want anything more on top though.

The "anything more" that at least I had in mind was something like
this. Though I'm not sure if it's a good thing to replace a hand
crafted section with an automatedly generated one. This patch on top
combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of
documents/guides are extracted from command-list.txt

-- 8< --
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6232143cb9..3e0ecd2e11 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) 
build-docdep.perl
 
 cmds_txt = cmds-ancillaryinterrogators.txt \
cmds-ancillarymanipulators.txt \
+   cmds-guide.txt \
cmds-mainporcelain.txt \
cmds-plumbinginterrogators.txt \
cmds-plumbingmanipulators.txt \
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 5aa73cfe45..e158bd9b96 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -54,6 +54,7 @@ for (sort <>) {
 
 for my $cat (qw(ancillaryinterrogators
ancillarymanipulators
+   guide
mainporcelain
plumbinginterrogators
plumbingmanipulators
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..d60d2ae0c7 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -808,29 +808,6 @@ The index is also capable of storing multiple entries 
(called "stages")
 for a given pathname.  These stages are used to hold the various
 unmerged version of a file when a merge is in progress.
 
-FURTHER DOCUMENTATION
--
-
-See the references in the "description" section to get started
-using Git.  The following is probably more detail than necessary
-for a first-time user.
-
-The link:user-manual.html#git-concepts[Git concepts chapter of the
-user-manual] and linkgit:gitcore-tutorial[7] both provide
-introductions to the underlying Git architecture.
-
-See linkgit:gitworkflows[7] for an overview of recommended workflows.
-
-See also the link:howto-index.html[howto] documents for some useful
-examples.
-
-The internals are documented in the
-link:technical/api-index.html[Git API documentation].
-
-Users migrating from CVS may also want to
-read linkgit:gitcvs-migration[7].
-
-
 Authors
 ---
 Git was started by Linus Torvalds, and is currently maintained by Junio
@@ -854,11 +831,16 @@ the Git Security mailing list 
.
 
 SEE ALSO
 
-linkgit:gittutorial[7], linkgit:gittutorial-2[7],
-linkgit:giteveryday[7], linkgit:gitcvs-migration[7],
-linkgit:gitglossary[7], linkgit:gitcore-tutorial[7],
-linkgit:gitcli[7], link:user-manual.html[The Git User's Manual],
-linkgit:gitworkflows[7]
+
+See the references in the "description" section to get started
+using Git.  The following is probably more detail than necessary
+for a first-time user.
+
+include::cmds-guide.txt[]
+
+See also the link:howto-index.html[howto] documents for some useful
+examples. The internals are documented in the
+link:technical/api-index.html[Git API documentation].
 
 GIT
 ---
diff --git a/command-list.txt b/command-list.txt
index 1835f1a928..f26b8acd52 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -150,10 +150,14 @@ git-whatchanged 
ancillaryinterrogators
 git-worktreemainporcelain
 git-write-tree  plumbingmanipulators
 gitattributes   guide
+gitcvs-migrationguide
+gitcli  guide
+gitcore-tutorialguide
 giteveryday guide
 gitglossary guide
 gitignore   guide
 gitmodules  guide
 gitrevisionsguide
 gittutorial guide
+gittutorial-2   guide
 gitworkflowsguide
-- 8< --


Re: [PATCH/RFC] completion: complete all possible -no-

2018-04-18 Thread Duy Nguyen
On Wed, Apr 18, 2018 at 5:43 AM, Junio C Hamano  wrote:
> So, the earlier mention of "clone --no-checkout" sounded about not
> losing this historical practice, but (desirabilty of magic number 4
> aside) this "show first handful of --no-foo" feature is not about
> historical practice but is forward looking, in the sense that you do
> not mark "important" negated options in the source, which would be a
> way to handle the histrical "clone --no-checkout", but let the
> machinery mechanically choose among --no-foo (with the stupid choice
> criterion "first four are shown").

Well you kinda mark important in the source too. --no-checkout for
exampled is declared as OPT_BOOL(0, "no-checkout"... and parse-options
code has to add the double-negative form --checkout back [1].

The "first four" is chosen after carefully examining all commands and
observing that none of them have more than 4 "important" --no-. But
yes it is questionable and I should be able to do better to separate
the favorable --no- from the other extra and most-of-the-time-useless
--no- options.

> That allows other commands to
> have many --no-foo form without overwhelming the choices, but I am
> not sure if it is much better than a possible alternative of only
> showing --no-foo for more "important" foo's when show_gitcomp() is
> asked to list all of things. It would certainly be a more involved
> solution, that might require an update to the way how the choices
> are precomputed (you'd end up having to keep a separate "use this
> list when completing '--no-'" in addition to the normal list).

I did think about this alternative and was still undecided. Suppose
that you have less than 4 "important" --no- options, showing some
extra ones to me does not really hurt anything and if we could show
more options (within the same screen space) we should. But on the
other hand maintaining this magic number could be a maintenance
nightmare... Yeah I think I'm shifting towards no magic number now.

[1] These double negative options will _always_ show up,  there is no
easy way to hide them because they don't start with --no-. But we
don't have a lot of options starting with "no-" so it's probably fine.
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-18 Thread Duy Nguyen
On Wed, Apr 18, 2018 at 12:47 AM, Philip Oakley  wrote:
>>> > Is that something I should add to my todo to add a 'guide' category >
>>> > etc.?
>>>
>>> I added it too [1]. Not sure if you want anything more on top though.
>
>
> What I've seen is looking good - I've not had as much time as I'd like..
>
> I'm not sure of the status of the git/generate-cmdlist.sh though. Should
> that also be updated, or did I miss that?

Yes it's updated by other patches in the same thread.
-- 
Duy


Re: [PATCH v2 1/2] completion: stop showing 'save' for stash by default

2018-04-19 Thread Duy Nguyen
On Fri, Apr 20, 2018 at 1:25 AM, Thomas Gummerer  wrote:
> The 'save' subcommand in git stash has been deprecated in
> fd2ebf14db ("stash: mark "git stash save" deprecated in the man page",
> 2017-10-22).
>
> Stop showing it when the users enters 'git stash ' or 'git stash
> s'.  Keep showing it however when the user enters 'git stash sa'
> or any more characters of the 'save' subcommand.

I don't think this is worth it. You only save two keystrokes for 've'
and already waste one on .
-- 
Duy


Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary

2018-04-21 Thread Duy Nguyen
On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote:
> Changes:
> 
> - remove the deprecated column in command-list.txt. My change break it
>   anyway if anyone uses it.
> - fix up failed tests that I marked in the RFC and kinda forgot about it.
> - fix bashisms in generate-cmdlist.sh
> - fix segfaul in "git help"

Sorry I forgot the interdiff

diff --git a/command-list.txt b/command-list.txt
index 0809a19184..1835f1a928 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -9,7 +9,7 @@ history  grow, mark and tweak your common history
 remote   collaborate (see also: git help workflows)
 
 ### command list (do not change this line)
-# command name  category [deprecated] [common]
+# command name  category[common]
 git-add mainporcelain   worktree
 git-am  mainporcelain
 git-annotateancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 9f17703aa7..7d17ca23f6 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -835,19 +835,23 @@ __git_complete_strategy ()
 }
 
 __git_commands () {
-   if test -n "${GIT_TESTING_COMMAND_COMPLETION:-}"
+   if test -n "$GIT_TESTING_COMPLETION"
then
-   printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
+   case "$1" in
+   porcelain)
+   printf "%s" "$GIT_TESTING_PORCELAIN_COMMAND_LIST";;
+   all)
+   printf "%s" "$GIT_TESTING_ALL_COMMAND_LIST";;
+   esac
else
git --list-cmds=$1
fi
 }
 
-__git_list_all_commands ()
+__git_list_commands ()
 {
local i IFS=" "$'\n'
-   local category=${1-all}
-   for i in $(__git_commands $category)
+   for i in $(__git_commands $1)
do
case $i in
*--*) : helper pattern;;
@@ -860,14 +864,14 @@ __git_all_commands=
 __git_compute_all_commands ()
 {
test -n "$__git_all_commands" ||
-   __git_all_commands=$(__git_list_all_commands)
+   __git_all_commands=$(__git_list_commands all)
 }
 
 __git_porcelain_commands=
 __git_compute_porcelain_commands ()
 {
test -n "$__git_porcelain_commands" ||
-   __git_porcelain_commands=$(__git_list_all_commands porcelain)
+   __git_porcelain_commands=$(__git_list_commands porcelain)
 }
 
 # Lists all set config variables starting with the given section prefix,
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e35f3e357b..86d59419b3 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -36,7 +36,7 @@ sed -n '
' "$1"
 printf '};\n\n'
 
-echo "#define GROUP_NONE 0xff /* no common group */"
+echo "#define GROUP_NONE 0xff"
 n=0
 while read grp
 do
@@ -45,15 +45,6 @@ do
 done <"$grps"
 echo
 
-echo '/*'
-printf 'static const char *cmd_categories[] = {\n'
-category_list "$1" |
-while read category; do
-   printf '\t\"'$category'\",\n'
-done
-printf '\tNULL\n};\n\n'
-echo '*/'
-
 n=0
 category_list "$1" |
 while read category; do
@@ -68,10 +59,11 @@ sort |
 while read cmd category tags
 do
if [ "$category" = guide ]; then
-   name=${cmd/git}
+   prefix=git
else
-   name=${cmd/git-}
+   prefix=git-
fi
+   name=$(echo $cmd | sed "s/^${prefix}//")
sed -n '
/^NAME/,/'"$cmd"'/H
${
diff --git a/help.c b/help.c
index a44f4a113e..88127fdd6f 100644
--- a/help.c
+++ b/help.c
@@ -201,7 +201,8 @@ static void extract_common_cmds(struct cmdname_help 
**p_common_cmds,
for (i = 0; i < ARRAY_SIZE(command_list); i++) {
const struct cmdname_help *cmd = command_list + i;
 
-   if (cmd->category != CAT_mainporcelain)
+   if (cmd->category != CAT_mainporcelain ||
+   cmd->group == GROUP_NONE)
continue;
 
common_cmds[nr++] = *cmd;
diff --git a/t/t0012-help.sh b/t/t0012-help.sh
index fd2a7f27dc..53208ab20e 100755
--- a/t/t0012-help.sh
+++ b/t/t0012-help.sh
@@ -25,6 +25,15 @@ test_expect_success "setup" '
EOF
 '
 
+# make sure to exercise these code paths, the output is a bit tricky
+# to verify
+test_expect_success 'basic help commands' '
+   git help >/dev/null &&
+   git help -a >/dev/null &&
+   git help -g >/dev/null &&
+   git help -av >/dev/null
+'
+
 test_expect_success "works for commands and guides by default" '
configure_help &&
git help status &&
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 4bfd26ddf9..5a23a46fcf 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -13,7 +13,7 @@ complete ()
return 0
 }
 
-# Be careful when updating this list:
+# Be careful when updating

Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst

2018-04-22 Thread Duy Nguyen
On Sat, Apr 21, 2018 at 9:37 PM, Elijah Newren  wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> Since we create a temporary index in o->result, then discard o->dst_index
> and overwrite it with o->result, when o->src_index == o->dst_index it is
> safe to just reuse o->src_index's split_index for o->result.  However,
> o->src_index and o->dst_index are specified separately in order to allow
> callers to have these be different.  In such a case, reusing
> o->src_index's split_index for o->result will cause the split_index to be
> shared.  If either index then has entries replaced or removed, it will
> result in the other index referring to free()'d memory.
>
> Signed-off-by: Elijah Newren 
> ---
>
> I still haven't wrapped my head around the split_index stuff entirely, so
> it's possible that
>
>   - the performance optimization isn't even valid when src == dst.  Could
> the original index be different enough from the result that we don't
> want its split_index?

This really depends on the use case of course. But when git checkout
is used for switching branches, unpack-trees will be used and unless
you switch between to vastly different branches, the updated entries
may be small compared to the entire index that sharing is still good.
If the result index is so different that it results in a huge index
file anyway, I believe we have code to recreate a new shared index to
keep its size down next time.

>   - there's a better, more performant fix or there is some way to actually
> share a split_index between two independent index_state objects.

A cleaner way of doing this would be something to the line [1]

move_index_extensions(&o->result, o->dst_index);

near the end of this function. This could be where we compare the
result index with the source index's shared file and see if it's worth
keeping the shared index or not. Shared index is designed to work with
huge index files though, any operations that go through all index
entries will usually not be cheap. But at least it's safer.

> However, with this fix, all the tests pass both normally and under
> GIT_TEST_SPLIT_INDEX=DareISayYes.  Without this patch, when
> GIT_TEST_SPLIT_INDEX is set, my directory rename detection series will fail
> several tests, as reported by SZEDER.

Yes, the change looks good.

[1] To me the second parameter should be src_index, not dst_index.
We're copying entries from _source_ index to "result" and we should
also copy extensions from the source index. That line happens to work
only when dst_index is the same as src_index, which is the common use
case so far.

>  unpack-trees.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 79fd97074e..b670415d4c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> o->result.timestamp.sec = o->src_index->timestamp.sec;
> o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> o->result.version = o->src_index->version;
> -   o->result.split_index = o->src_index->split_index;
> -   if (o->result.split_index)
> +   if (!o->src_index->split_index) {
> +   o->result.split_index = NULL;
> +   } else if (o->src_index == o->dst_index) {
> +   /*
> +* o->dst_index (and thus o->src_index) will be discarded
> +* and overwritten with o->result at the end of this function,
> +* so just use src_index's split_index to avoid having to
> +* create a new one.
> +*/
> +   o->result.split_index = o->src_index->split_index;
> o->result.split_index->refcount++;
> +   } else {
> +   o->result.split_index = init_split_index(&o->result);
> +   }
> hashcpy(o->result.sha1, o->src_index->sha1);
> o->merge_size = len;
> mark_all_ce_unused(o->src_index);
> --
> 2.17.0.296.gaac25b4b81
>



-- 
Duy


Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary

2018-04-22 Thread Duy Nguyen
On Sun, Apr 22, 2018 at 4:45 PM, Ramsay Jones
 wrote:
>
>
> On 21/04/18 17:56, Duy Nguyen wrote:
>> On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote:
>>> Changes:
>>>
>>> - remove the deprecated column in command-list.txt. My change break it
>>>   anyway if anyone uses it.
>>> - fix up failed tests that I marked in the RFC and kinda forgot about it.
>>> - fix bashisms in generate-cmdlist.sh
>>> - fix segfaul in "git help"
>>
>> Sorry I forgot the interdiff
>>
> [snip]
>
>> diff --git a/t/t0012-help.sh b/t/t0012-help.sh
>> index fd2a7f27dc..53208ab20e 100755
>> --- a/t/t0012-help.sh
>> +++ b/t/t0012-help.sh
>> @@ -25,6 +25,15 @@ test_expect_success "setup" '
>>   EOF
>>  '
>>
>> +# make sure to exercise these code paths, the output is a bit tricky
>> +# to verify
>> +test_expect_success 'basic help commands' '
>> + git help >/dev/null &&
>> + git help -a >/dev/null &&
>> + git help -g >/dev/null &&
>> + git help -av >/dev/null
>> +'
>> +
> I think you need to try a little harder than this! ;-)

Yeah. I did think about grepping the output but decided not to because
of gettext poison stuff and column output in "git help". If we do want
to test this, how about I extend --list-cmds= option to take a few
more parameters? --list-cmds=common would output all common commands,
--list-cmds= does the same for other command category. This
way we can verify without worrying about text formatting, paging or
translation.
-- 
Duy


<    1   2   3   4   5   6   7   8   9   10   >