Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 8:21 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Tue, May 08 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> The current document mentions OBJ_* constants without their actual
>> values. A git developer would know these are from cache.h but that's
>> not very friendly to a person who wants to read this file to implement
>> a pack file parser.
>>
>> Similarly, the deltified representation is not documented at all (the
>> "document" is basically patch-delta.c). Translate that C code in
>> English.
>
> Thanks, will drop my version from v4, although a comment for the enum in
> cache.h pointing the reader to these docs would be very useful.

True. I will add some together with the pack-format.txt update.
-- 
Duy


Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 7:23 PM, Stefan Beller  wrote:
>>  While at there, I also add some text about this obscure delta format.
>>  We occasionally have questions about this on the mailing list if I
>>  remember correctly.
>
> Let me see if I can understand it, as I am not well versed in the
> delta format, so ideally I would understand it from the patch here?

Well yes. I don't expect my first version to be that easy to
understand. This is where you come in to help ;-)

>> +Valid object types are:
>> +
>> +- OBJ_COMMIT (1)
>> +- OBJ_TREE (2)
>> +- OBJ_BLOB (3)
>> +- OBJ_TAG (4)
>> +- OBJ_OFS_DELTA (6)
>> +- OBJ_REF_DELTA (7)
>> +
>> +Type 5 is reserved for future expansion.
>
> and type 0 as well, but that is not spelled out?

type 0 is invalid. I think in some encoding it's not even possible to
encode zero. Anyway yes it should be spelled out.

>
>> +Deltified representation
>
> Does this refer to OFS delta as well as REF deltas?

Yes. Both OFS and REF deltas have the same "body" which is what this
part is about. The differences between OFS and REF deltas are not
described (in fact I don't think we describe what OFS and REF deltas
are at all).

>> is a sequence of one byte command optionally
>> +followed by more data for the command. The following commands are
>> +recognized:
>
> So a Deltified representation of an object is a 6 or 7 in the 3 bit type
> and then the length. Then a command is shown how to construct
> the object based on other objects. Can there be more commands?
>
>> +- If bit 7 is set, the remaining bits in the command byte specifies
>> +  how to extract copy offset and size to copy. The following must be
>> +  evaluated in this exact order:
>
> So there are 2 modes, and the high bit indicates which mode is used.
> You start describing the more complicated mode first,
> maybe give names to both of them? "direct copy" (below) and
> "compressed copy with offset" ?

I started to update this more because even this text is hard to get
even to me. So let's get the background first.

We have a source object somewhere (the object name comes from ofs/ref
delta's header), basically we have the whole content. This delta
thingy tells us how to use that source object to create a new (target)
object.

The delta is actually a sequence of instructions (of variable length).
One is for copying from the source object. The other copies from the
delta itself (e.g. this is new data in the target which is not
available anywhere in the source object to copy from). The last bit of
the first byte determines what instruction type it is.


>> +  - If bit 0 is set, the following byte contains bits 0-7 of the copy
>> +offset (this also resets all other bits in the copy offset to
>> +zero).
>> +  - If bit 1 is set, the following byte contains bits 8-15 of the copy
>> +offset.
>> +  - If bit 2 is set, the following byte contains bits 16-23 of the
>> +copy offset.
>> +  - If bit 3 is set, the following byte contains bits 24-31 of the
>> +copy offset.
>
> I assume these bits are exclusive, i.e. if bit 3 is set, bits 0-2 are not
> allowed to be set. What happens if they are set, do we care?
>
> If bit 3 is set, then the following byte contains 24-31 of the copy offset,
> where is the rest? Do I wait for another command byte with
> bits 2,1,0 to learn about the body offsets, or do they follow the
> following byte? Something like:
>
>   "If bit 3 is set, then the next 4 bytes are the copy offset,
>   in network byte order"

My first attempt at "translating to English" is like a constructing C
from assembly: it's horrible.

The instruction looks like this

bit  0123   4  5  6
  +--+++++--+--+--+
  | 1xxx | offset | offset | offset | offset | size | size | size |
  +--+++++--+--+--+

Here you can see it in its full form, each box represents a byte. The
first byte has bit 7 set as mentioned. We can see here that offsets
(where to copy from in the source object) takes 4 bytes and size (how
many bytes to copy) takes 3. Offset size size is in LSB order.

The "xxx" part lets us shrink this down. If the offset can fit in
16 bits, there's no reason to waste the last two bytes describing
zero. Each 'x' marks whether the corresponding byte is present. The
bit number is in the first row. So if you have offset 255 and size 1,
the instruction is three bytes 10010001b, 255, 1. The octets on "bit
column" 1, 2, 3, 5 and 6 are missing because the corresponding bit in
the first bit is not set.

>> +  - If bit 4 is set, the following byte contains bits 0-7 of the copy
>> +size (this also resets all other bits in the copy size to zero_.
>> +  - If bit 5 is set, the following byte contains bits 8-15 of the copy
>> +size.
>> +  - If bit 6 is set, the following byte contains bits 16-23 of the
>> +copy size.
>
> bits 4-7 seem to be another 

Re: [PATCH v3 04/12] cache.h: add comment explaining the order in object_type

2018-05-08 Thread Duy Nguyen
On Tue, May 1, 2018 at 8:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> The order in the enum might seem arbitrary, and isn't explained by
> 72518e9c26 ("more lightweight revalidation while reusing deflated
> stream in packing", 2006-09-03) which added it.
>
> Derrick Stolee suggested that it's ordered topologically in
> 5f8b1ec1-258d-1acc-133e-a7c248b40...@gmail.com. Makes sense to me, add
> that as a comment.
>
> Helped-by: Derrick Stolee 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  cache.h | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 77b7acebb6..354903c3ea 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -376,6 +376,14 @@ extern void free_name_hash(struct index_state *istate);
>  enum object_type {
> OBJ_BAD = -1,
> OBJ_NONE = 0,
> +   /*
> +* Why have our our "real" object types in this order? They're
> +* ordered topologically:
> +*
> +* tag(4)-> commit(1), tree(2), blob(3)
> +* commit(1) -> tree(2)
> +* tree(2)   -> blob(3)
> +*/

I think it's more important that these constants are part of the pack
file format. Even if it follows some order now, when a new object type
comes, you cannot just reorder to keep things look nice because then
you break pack file access.

I'm afraid this comment suggests that these numbers are just about
order, which is very wrong.

> OBJ_COMMIT = 1,
> OBJ_TREE = 2,
> OBJ_BLOB = 3,
-- 
Duy


Re: [PATCH/RFC] completion: complete all possible -no-

2018-05-08 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 7:36 AM, Eric Sunshine  wrote:
> I haven't looked at the implementation, so this may be an entirely
> stupid suggestion, but would it be possible to instead render the
> completions as?
>
> % git checkout --
> --[no-]conflict=   --[no-]patch
> --[no-]detach  --[no-]progress
> --[no-]ignore-other-worktrees  --[no-]quiet
> --[no-]ignore-skip-worktree-bits   --[no-]recurse-submodules
> --[no-]merge   --theirs
> --[no-]orphan= --[no-]track
> --ours
>
> This would address the problem of the --no-* options taking double the
> screen space.

It took me so long to reply partly because I remember seeing some guy
doing clever trick with tab completion that also shows a short help
text in addition to the complete words. I could not find that again
and from my reading (also internet searching) it's probably not
possible to do this without trickery.

> It's also more intuitive than that lone and somewhat weird-looking
> "--no-" suggestion.

It's not that weird if you think about file path completion, where you
complete one path component at a time not full path, bash just does
not show you full paths to everything.

I'm arguing about this because I want to see your reaction, because
I'm thinking of doing the very same thing for config completion. Right
now "git config " gives you two pages of all available config
variables. I'm thinking that we "git config " just shows the
groups, e.g.

> ~/w/git $ git config
add.  interactive.
advice.   log.
alias.mailmap.
am.   man.

Only when you do "git config log." that it shows you log.*
-- 
Duy


Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 12:59 AM, Stefan Beller  wrote:
> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>  {
> /*
> -* TOOD free objects in o->obj_hash.
> -*
>  * As objects are allocated in slabs (see alloc.c), we do
>  * not need to free each object, but each slab instead.
> +*
> +* Before doing so, we need to free any additional memory
> +* the objects may hold.
>  */
> +   unsigned i;
> +
> +   for (i = 0; i < o->obj_hash_size; i++) {
> +   struct object *obj = o->obj_hash[i];
> +
> +   if (!obj)
> +   continue;
> +
> +   if (obj->type == OBJ_TREE) {
> +   free(((struct tree*)obj)->buffer);

It would be nicer to keep this in separate functions, e.g.
release_tree_node() and release_commit_node() to go with
alloc_xxx_node().

> +   } else if (obj->type == OBJ_COMMIT) {
> +   free_commit_list(((struct commit*)obj)->parents);
> +   free(&((struct commit*)obj)->util);
> +   }
> +   }

I still don't see who frees obj_hash[] (or at least clears it if not
freed). If I'm going to use this to free memory in pack-objects then
I'd really prefer obj_hash[] freed because it's a big _big_ array.

Just to be clear, what I mean is

FREE_AND_NULL(o->obj_hash);
o->obj_hash_size = 0;

> +
> +   clear_alloc_state(o->blob_state);
> +   clear_alloc_state(o->tree_state);
> +   clear_alloc_state(o->commit_state);
> +   clear_alloc_state(o->tag_state);
> +   clear_alloc_state(o->object_state);
>  }
-- 
Duy


Re: [PATCH v2 02/18] Add a new builtin: branch-diff

2018-05-07 Thread Duy Nguyen
On Mon, May 7, 2018 at 9:50 AM, Jeff King  wrote:
> On Sat, May 05, 2018 at 11:57:26PM +0200, Johannes Schindelin wrote:
>
>> > It feels really petty complaining about the name, but I just want to
>> > raise the point, since it will never be easier to change than right now.
>>
>> I do hear you. Especially since I hate `git cherry` every single time that
>> I try to tab-complete `git cherry-pick`.
>
> Me too. :)

Just so you know I'm also not happy with that "git cherry". Since I'm
updating git-completion.bash in this area and we got 3 "me too" votes
(four if we count Szeder in another thread), I'm going to implementing
something to at least let you exclude "cherry" from the completion
list if you want.
-- 
Duy


Re: [PATCH 4/5] lock_file: make function-local locks non-static

2018-05-07 Thread Duy Nguyen
On Sun, May 6, 2018 at 9:32 PM, Martin Ågren <martin.ag...@gmail.com> wrote:
> On 6 May 2018 at 19:42, Duy Nguyen <pclo...@gmail.com> wrote:
>> On Sun, May 6, 2018 at 7:26 PM, Duy Nguyen <pclo...@gmail.com> wrote:
>>> On Sun, May 6, 2018 at 4:10 PM, Martin Ågren <martin.ag...@gmail.com> wrote:
>>>> These `struct lock_file`s are local to their respective functions and we
>>>> can drop their staticness.
>
>>>> -   static struct lock_file lock;
>>>> +   struct lock_file lock = LOCK_INIT;
>>>
>>> Is it really safe to do this? I vaguely remember something about
>>> (global) linked list and signal handling which could trigger any time
>>> and probably at atexit() time too (i.e. die()). You don't want to
>>> depend on stack-based variables in that case.
>>
>> So I dug in a bit more about this. The original implementation does
>> not allow stack-based lock files at all in 415e96c8b7 ([PATCH]
>> Implement git-checkout-cache -u to update stat information in the
>> cache. - 2005-05-15). The situation has changed since 422a21c6a0
>> (tempfile: remove deactivated list entries - 2017-09-05). At the end
>> of that second commit, Jeff mentioned "We can clean them up
>> individually" which I guess is what these patches do. Though I do not
>> know if we need to make sure to call "release" function or something/
>> Either way you need more explanation and assurance than just "we can
>> drop their staticness" in the commit mesage.
>
> Thank you Duy for your comments. How about I write the commit message
> like so:

+Jeff. Since he made it possible to remove lock file from the global
linked list, he probably knows well what to check when switching from
a static lock file to a stack-local one.

>
>   After 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05),
>   we can have lockfiles on the stack. These `struct lock_file`s are local
>   to their respective functions and we can drop their staticness.
>
>   Each of these users either commits or rolls back the lock in every
>   codepath, with these possible exceptions:
>
> * We bail using a call to `die()` or `exit()`. The lock will be
>   cleaned up automatically.
>
> * We return early from a function `cmd_foo()` in builtin/, i.e., we
>   are just about to exit. The lock will be cleaned up automatically.

There are also signals which can be caught and run on its own stack (I
think) so whatever variable on the current stack should be safe, I
guess.

>   If I have missed some codepath where we do not exit, yet leave a locked
>   lock around, that was so also before this patch. If we would later
>   re-enter the same function, then before this patch, we would be retaking
>   a lock for the very same `struct lock_file`, which feels awkward, but to
>   the best of my reading has well-defined behavior. Whereas after this
>   patch, we would attempt to take the lock with a completely fresh `struct
>   lock_file`. In both cases, the result would simply be that the lock can
>   not be taken, which is a situation we already handle.

There is a difference here, if the lock is not released properly,
previously the lockfile is still untouched. If it's on stack, it may
be overwritten which can corrupt the linked list to get to the next
lock file.  (and this is about calling the function in question just
_once_ not the second time).
-- 
Duy


Re: [PATCH 4/5] lock_file: make function-local locks non-static

2018-05-06 Thread Duy Nguyen
On Sun, May 6, 2018 at 7:26 PM, Duy Nguyen <pclo...@gmail.com> wrote:
> On Sun, May 6, 2018 at 4:10 PM, Martin Ågren <martin.ag...@gmail.com> wrote:
>> These `struct lock_file`s are local to their respective functions and we
>> can drop their staticness.
>>
>> Signed-off-by: Martin Ågren <martin.ag...@gmail.com>
>> ---
>>  apply.c| 2 +-
>>  builtin/describe.c | 2 +-
>>  builtin/difftool.c | 2 +-
>>  builtin/gc.c   | 2 +-
>>  builtin/merge.c| 4 ++--
>>  builtin/receive-pack.c | 2 +-
>>  bundle.c   | 2 +-
>>  fast-import.c  | 2 +-
>>  refs/files-backend.c   | 2 +-
>>  shallow.c  | 2 +-
>>  10 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/apply.c b/apply.c
>> index 7e5792c996..07b14d1127 100644
>> --- a/apply.c
>> +++ b/apply.c
>> @@ -4058,7 +4058,7 @@ static int build_fake_ancestor(struct apply_state 
>> *state, struct patch *list)
>>  {
>> struct patch *patch;
>> struct index_state result = { NULL };
>> -   static struct lock_file lock;
>> +   struct lock_file lock = LOCK_INIT;
>
> Is it really safe to do this? I vaguely remember something about
> (global) linked list and signal handling which could trigger any time
> and probably at atexit() time too (i.e. die()). You don't want to
> depend on stack-based variables in that case.

So I dug in a bit more about this. The original implementation does
not allow stack-based lock files at all in 415e96c8b7 ([PATCH]
Implement git-checkout-cache -u to update stat information in the
cache. - 2005-05-15). The situation has changed since 422a21c6a0
(tempfile: remove deactivated list entries - 2017-09-05). At the end
of that second commit, Jeff mentioned "We can clean them up
individually" which I guess is what these patches do. Though I do not
know if we need to make sure to call "release" function or something/
Either way you need more explanation and assurance than just "we can
drop their staticness" in the commit mesage.
-- 
Duy


Re: [PATCH 4/5] lock_file: make function-local locks non-static

2018-05-06 Thread Duy Nguyen
On Sun, May 6, 2018 at 4:10 PM, Martin Ågren  wrote:
> These `struct lock_file`s are local to their respective functions and we
> can drop their staticness.
>
> Signed-off-by: Martin Ågren 
> ---
>  apply.c| 2 +-
>  builtin/describe.c | 2 +-
>  builtin/difftool.c | 2 +-
>  builtin/gc.c   | 2 +-
>  builtin/merge.c| 4 ++--
>  builtin/receive-pack.c | 2 +-
>  bundle.c   | 2 +-
>  fast-import.c  | 2 +-
>  refs/files-backend.c   | 2 +-
>  shallow.c  | 2 +-
>  10 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index 7e5792c996..07b14d1127 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4058,7 +4058,7 @@ static int build_fake_ancestor(struct apply_state 
> *state, struct patch *list)
>  {
> struct patch *patch;
> struct index_state result = { NULL };
> -   static struct lock_file lock;
> +   struct lock_file lock = LOCK_INIT;

Is it really safe to do this? I vaguely remember something about
(global) linked list and signal handling which could trigger any time
and probably at atexit() time too (i.e. die()). You don't want to
depend on stack-based variables in that case.
-- 
Duy


Re: [PATCH v2 02/18] Add a new builtin: branch-diff

2018-05-06 Thread Duy Nguyen
On Sun, May 6, 2018 at 6:53 AM, Jacob Keller  wrote:
> On Sat, May 5, 2018 at 6:05 PM, Igor Djordjevic
>  wrote:
>> Hi Dscho,
>>
>> On 05/05/2018 23:57, Johannes Schindelin wrote:
>>>
>>> > > This builtin does not do a whole lot so far, apart from showing a
>>> > > usage that is oddly similar to that of `git tbdiff`. And for a
>>> > > good reason: the next commits will turn `branch-diff` into a
>>> > > full-blown replacement for `tbdiff`.
>>> >
>>> > One minor point about the name: will it become annoying as a tab
>>> > completion conflict with git-branch?
>>>
>>> I did mention this in the commit message of 18/18:
>>>
>>> Without this patch, we would only complete the `branch-diff` part but
>>> not the options and other arguments.
>>>
>>> This of itself may already be slightly disruptive for well-trained
>>> fingers that assume that `git braorimas` would expand to
>>> `git branch origin/master`, as we now no longer automatically append a
>>> space after completing `git branch`: this is now ambiguous.
>>>
>>> > It feels really petty complaining about the name, but I just want
>>> > to raise the point, since it will never be easier to change than
>>> > right now.
>>>
>>> I do hear you. Especially since I hate `git cherry` every single
>>> time that I try to tab-complete `git cherry-pick`.
>>>
>>> > (And no, I don't really have another name in mind; I'm just
>>> > wondering if "subset" names like this might be a mild annoyance in
>>> > the long run).
>>>
>>> They totally are, and if you can come up with a better name, I am
>>> really interested in changing it before this hits `next`, even.
>>
>> I gave this just a quick glance so might be I`m missing something
>> obvious or otherwise well-known here, bur why not `diff-branch` instead?
>>
>> From user interface perspective, I would (personally) rather expect a
>> command that does "diff of branches" to belong to "diff family" of
>> commands (just operating on branches, instead of "branch" command
>> knowing to "diff itself"), and I see we already have `diff-files`,
>> `diff-index` and `diff-tree`, for what that`s worth.
>>
>> Heck, I might even expect something like `git diff --branch ...` to work,
>> but I guess that is yet a different matter :)
>>
>> Thanks, Buga
>
> I like diff-branch, though I suppose that also conflicts with diff too.

How about interdiff?

-- 
Duy


Re: [PATCH v2 18/18] completion: support branch-diff

2018-05-06 Thread Duy Nguyen
On Fri, May 04, 2018 at 05:35:11PM +0200, Johannes Schindelin wrote:
> Tab completion of `branch-diff` is very convenient, especially given
> that the revision arguments that need to be passed to `git branch-diff`
> are typically more complex than, say, your grandfather's `git log`
> arguments.
> 
> Without this patch, we would only complete the `branch-diff` part but
> not the options and other arguments.
> 
> This of itself may already be slightly disruptive for well-trained
> fingers that assume that `git braorimas` would expand to
> `git branch origin/master`, as we now no longer automatically append a
> space after completing `git branch`: this is now ambiguous.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  contrib/completion/git-completion.bash | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index 01dd9ff07a2..45addd525ac 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1496,6 +1496,24 @@ _git_format_patch ()
>   __git_complete_revlist
>  }
>  
> +__git_branch_diff_options="
> + --no-patches --creation-weight= --dual-color
> +"
> +
> +_git_branch_diff ()
> +{
> + case "$cur" in
> + --*)
> + __gitcomp "

You should use __gitcomp_builtin so you don't have to maintain
$__git_branch_diff_options here. Something like this

-- 8< --
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 45addd525a..4745631daf 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1496,18 +1496,11 @@ _git_format_patch ()
__git_complete_revlist
 }
 
-__git_branch_diff_options="
-   --no-patches --creation-weight= --dual-color
-"
-
 _git_branch_diff ()
 {
case "$cur" in
--*)
-   __gitcomp "
-   $__git_branch_diff_options
-   $__git_diff_common_options
-   "
+   __gitcomp_builtin branch-diff "$__git_diff_common_options"
return
;;
esac
-- 8< --


> + $__git_branch_diff_options
> + $__git_diff_common_options
> + "
> + return
> + ;;
> + esac
> + __git_complete_revlist
> +}
> +
>  _git_fsck ()
>  {
>   case "$cur" in
> -- 
> 2.17.0.409.g71698f11835


Re: [PATCH v4 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-05 Thread Duy Nguyen
On Sat, May 5, 2018 at 4:43 AM, Taylor Blau  wrote:
> Teach 'git-grep(1)' a new option, '--column', to show the column
> number of the first match on a non-context line.

Why? Or put it another way, what is this option used for? Only
git-jump? (which should also be mentioned here if true)

>
> For example:
>
>   $ git grep -n --column foo | head -n3
>   .clang-format:51:14:# myFunction(foo, bar, baz);
>   .clang-format:64:7:# int foo();
>   .clang-format:75:8:# void foo()
>
-- 
Duy


Re: [PATCH] alloc.c: replace alloc by mempool

2018-05-04 Thread Duy Nguyen
On Fri, May 4, 2018 at 12:18 AM, Stefan Beller  wrote:
> I just measured on git.git and linux.git (both of which are not *huge* by
> any standard, but should give a good indication. linux has  6M objects,
> and when allocating 1024 at a time, we run into the new block allocation
> ~6000 times).
>
> I could not measure any meaningful difference.
>
> linux.git $ git count-objects -v
> count: 0
> size: 0
> in-pack: 6036543
> packs: 2
> size-pack: 2492985
> prune-packable: 0
> garbage: 0
> size-garbage: 0
>
> (with this patch)
>  Performance counter stats for '/u/git/git count-objects -v' (30 runs):
>
>   2.123683  task-clock:u (msec)   #0.831 CPUs utilized
> ( +-  2.32% )
>  0  context-switches:u#0.000 K/sec
>  0  cpu-migrations:u  #0.000 K/sec
>126  page-faults:u #0.059 M/sec
> ( +-  0.22% )
>895,900  cycles:u  #0.422 GHz  
> ( +-  1.40% )
>976,596  instructions:u#1.09  insn per cycle   
> ( +-  0.01% )
>218,256  branches:u#  102.772 M/sec
> ( +-  0.01% )
>  8,331  branch-misses:u   #3.82% of all branches  
> ( +-  0.61% )
>
>0.002556203 seconds time elapsed   
>( +-  2.20% )
>
>   Performance counter stats for 'git count-objects -v' (30 runs):
>
>   2.410352  task-clock:u (msec)   #0.801 CPUs utilized
> ( +-  2.79% )
>  0  context-switches:u#0.000 K/sec
>  0  cpu-migrations:u  #0.000 K/sec
>131  page-faults:u #0.054 M/sec
> ( +-  0.16% )
>993,301  cycles:u  #0.412 GHz  
> ( +-  1.99% )
>  1,087,428  instructions:u#1.09  insn per cycle   
> ( +-  0.02% )
>244,292  branches:u#  101.351 M/sec
> ( +-  0.02% )
>  9,264  branch-misses:u   #3.79% of all branches  
> ( +-  0.57% )
>
>0.003010854 seconds time elapsed   
>( +-  2.54% )
>
> So I think we could just replace it for now and optimize again later, if it
> turns out to be a problem. I think the easiest optimisation is to increase
> the allocation size of having a lot more objects per mp_block.

Yeah. I also tested this from a different angle: memory overhead. For
2M objects with one mp_block containing 1024 objects (same setting as
alloc.c), the overhead (not counting malloc() internal overhead) is
46KB and we don't have any extra overhead due to padding between
objects. This is true for all struct blob, commit, tree and tag. This
is really good. alloc.c has zero overhead when measured this way but
46KB is practically zero to me.
-- 
Duy


Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-04 Thread Duy Nguyen
On Fri, May 4, 2018 at 5:23 PM, Johannes Schindelin
 wrote:
>> > So that's what `info` does: it influences whether/where
>> > the command is listed in `git help`'s output... Interesting. I thought the
>> > lines here were trying to automate parts of the tab completion or
>> > something.
>>
>> Oh it does many things. The completion part is coming (so yeah you
>> don't need to update git-completion.bash at all, as long as you have a
>> line here and use parse_options() ;-), but I think it's mainly for
>> "git help" and command listing in "git help git" (this command for
>> example should show up under the "Main porcelain commands" in that man
>> page when you put a line here)
>
> I have a hard time believing that anything automated can infer from the
> source code that branch-diff can accept the non-options in three different
> formats, and what those formats look like...

Yeah. For now it can only do options which is machine readable. But I
have a big plan, so who knows :D
-- 
Duy


Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-04 Thread Duy Nguyen
On Fri, May 04, 2018 at 04:44:49PM +0200, Duy Nguyen wrote:
> On Fri, May 4, 2018 at 9:23 AM, Johannes Schindelin
> <johannes.schinde...@gmx.de> wrote:
> > Oh, okay. It was not at all clear to me what the exact format and role of
> > these lines are...
> 
> Noted. I'm making more updates in this file in another topic and will
> add some explanation so the next guy will be less confused.

This is what I will include (but in a separate topic). I will not CC
you there to keep your inbox slightly less full. I hope this helps
understand what command-list.txt is for.

diff --git a/command-list.txt b/command-list.txt
index 40776b9587..929d8f0ea0 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -1,3 +1,47 @@
+# Command classification list
+# ---
+# All supported commands, builtin or external, must be described in
+# here. This info is used to list commands in various places. Each
+# command is on one line followed by one or more attributes.
+#
+# The first attribute group is mandatory and indicates the command
+# type. This group includes:
+#
+#   mainporcelain
+#   ancillarymanipulators
+#   ancillaryinterrogators
+#   foreignscminterface
+#   plumbingmanipulators
+#   plumbinginterrogators
+#   synchingrepositories
+#   synchelpers
+#   purehelpers
+#
+# the type names are self explanatory. But if you want to see what
+# command belongs to what group to get a better idea, have a look at
+# "git" man page, "GIT COMMANDS" section.
+#
+# Commands of type mainporcelain can also optionally have one of these
+# attributes:
+#
+#   init
+#   worktree
+#   info
+#   history
+#   remote
+#
+# These commands are considered "common" and will show up in "git
+# help" output in groups. Uncommon porcelain commands must not
+# specify any of these attributes.
+#
+# "complete" attribute is used to mark that the command should be
+# completable by git-completion.bash. Note that by default,
+# mainporcelain commands are completable and you don't need this
+# attribute.
+#
+# While not true commands, guides are also specified here, which can
+# only have "guide" attribute and nothing else.
+#
 ### command list (do not change this line, also do not change alignment)
 # command name  category [category] [category]
 git-add mainporcelain   worktree


Re: [PATCH v2] checkout & worktree: introduce checkout.implicitRemote

2018-05-04 Thread Duy Nguyen
On Fri, May 4, 2018 at 9:54 AM, Ævar Arnfjörð Bjarmason
 wrote:
> Realistically the way we do hooks now would make the UI of this suck,
> i.e. you couldn't configure it globally or system-wide easily. Something
> like what I noted in
> https://public-inbox.org/git/871sf3el01@evledraar.gmail.com/ would
> make it suck less, but that's a much bigger task.

I thought you would bring this up :) I've given some more thoughts on
this topic and am willing to implement something like below, in a week
or two. Would that help change your mind?

I proposed hooks.* config space in that same thread. Here is the
extension to make it cover both of your points.

hooks.* can have multiple values. So you can specify
hooks.post-checkout multiple times and all those scripts will run (in
the same order you specified). Since we already have a search path for
config (/etc/gitconfig -> $HOME/.gitconfig -> $REPO/config) this helps
hooks management as well. Execution order is still the same: if you
specify hooks.post-checkout in both /etc/gitconfig and .git/config,
then the one in /etc/gitconfig is executed first, the one in .git
second.

And here's something extra to make it possible to override the search
order: if you specify hooks.post-checkout = reset (reset is a random
keyword) then _all_ post-checkout hooks before this point are ignored.
So you can put this in .git/config

[hooks]
post-checkout = reset
post-checkout = ~/some-hook

and can be sure that post-checkout specified in $HOME and /etc will
not affect you, only ~/some-hook will run. If you drop the second line
then you have no post-checkout hooks. This is a workaround for a
bigger problem (not being able to unset a config entry) but I think
it's sufficient for this use case.
-- 
Duy


Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-04 Thread Duy Nguyen
On Fri, May 4, 2018 at 9:23 AM, Johannes Schindelin
 wrote:
> Oh, okay. It was not at all clear to me what the exact format and role of
> these lines are...

Noted. I'm making more updates in this file in another topic and will
add some explanation so the next guy will be less confused.

> So that's what `info` does: it influences whether/where
> the command is listed in `git help`'s output... Interesting. I thought the
> lines here were trying to automate parts of the tab completion or
> something.

Oh it does many things. The completion part is coming (so yeah you
don't need to update git-completion.bash at all, as long as you have a
line here and use parse_options() ;-), but I think it's mainly for
"git help" and command listing in "git help git" (this command for
example should show up under the "Main porcelain commands" in that man
page when you put a line here)
-- 
Duy


Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Duy Nguyen
On Thu, May 3, 2018 at 10:32 PM, Johannes Schindelin
<johannes.schinde...@gmx.de> wrote:
> Hi Duy,
>
> On Thu, 3 May 2018, Johannes Schindelin wrote:
>
>> On Thu, 3 May 2018, Duy Nguyen wrote:
>>
>> > On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
>> > <johannes.schinde...@gmx.de> wrote:
>> > > diff --git a/command-list.txt b/command-list.txt
>> > > index a1fad28fd82..c89ac8f417f 100644
>> > > --- a/command-list.txt
>> > > +++ b/command-list.txt
>> > > @@ -19,6 +19,7 @@ git-archive mainporcelain
>> > >  git-bisect  mainporcelain   info
>> > >  git-blame   ancillaryinterrogators
>> > >  git-branch  mainporcelain   history
>> > > +git-branch-diff mainporcelain   info
>> >
>> > Making it part of "git help" with the info keywords at this stage may
>> > be premature. "git help" is about _common_ commands and we don't know
>> > (yet) how popular this will be.
>>
>> Makes sense. I removed the `mainporcelain` keyword locally.
>
> On second thought, I *think* you meant to imply that I should remove that
> line altogether. Will do that now.

Actually I only suggested to remove the last word "info". That was
what made this command "common". Classifying all commands in this file
is definitely a good thing, and I think mainporcelain is the right
choice.
-- 
Duy


Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Duy Nguyen
On Thu, May 3, 2018 at 7:21 PM, Stefan Beller <sbel...@google.com> wrote:
> On Thu, May 3, 2018 at 9:35 AM, Duy Nguyen <pclo...@gmail.com> wrote:
>> On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller <jam...@microsoft.com> wrote:
>>> This patch series improves the performance of loading indexes by
>>> reducing the number of malloc() calls. Loading the index from disk is
>>> partly dominated by the time in malloc(), which is called for each
>>> index entry. This patch series reduces the number of times malloc() is
>>> called as part of loading the index, and instead allocates a block of
>>> memory upfront that is large enough to hold all of the cache entries,
>>> and chunks this memory itself. This change builds on [1].
>>
>> I have only looked at the mem-pool related patches to see if
>> mem-pool.c is good enough to replace alloc.c. To me, it's a "yes"
>> after we optimize mem_pool_alloc() a bit (not that performance really
>> matters in alloc.c case, but that may be because it's already
>> blazingly fast that we never noticed about it).
>
> alloc.c was not just about speed, but mostly about dense packing?
> 855419f764a (Add specialized object allocator, 2006-06-19)

I know. I vaguely remembered Linus made that change but did not really
look it up :) That reference should be included when/if you switch
from alloc.c to mem-pool.c.

> To me it is also a clear yes when it comes to combining these
> two memory pools.

I also did not notice that jm/mem-pool already landed in master. Have
you tried measure (both memory usage and allocation speed) of it and
alloc.c? Just take some big repo as an example and do count-objects -v
to see how many blobs/trees/commits it has, then allocate the same
amount with both alloc.c and mem-pool.c and measure both speed/mem.
I'm pretty sure you're right that mem-pool.c is a clear yes. I was
just being more conservative because we do (slightly) change
allocator's behavior when we make the switch. But it's also very
likely that any performance difference will be insignificant.

I'm asking this because if mem-pool.c is a clear winner, you can start
to update you series to use it now and kill alloc.c in the process.

PS. Is Jeff back yet? I'm sure Junio is listening and all but I'm
afraid he's too busy being a maintainer so Jeff's opinion in this area
is really valuable. He has all the fun and weird use cases to play
with at github.
-- 
Duy


Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-03 Thread Duy Nguyen
On Thu, May 3, 2018 at 7:24 PM, Stefan Beller  wrote:
>>> +void clear_alloc_state(struct alloc_state *s)
>>> +{
>>> +   while (s->slab_nr > 0) {
>>> +   s->slab_nr--;
>>> +   free(s->slabs[s->slab_nr]);
>>
>> I think you're leaking memory here. Commit and tree objects may have
>> more allocations in them (especially trees, but I think we have
>> commit_list in struct commit too). Those need to be freed as well.
>
> I would think that tree and commit memory will be free'd in the
> parsed_objects release function? (TODO: I need to add it over there)

What release function? I know tree->buffer is often released since you
don't want to keep much in memory. But the user should not be required
to release all objects before calling object_parser_clear(). For
struct commit, I'm pretty we make the commit_list (for commit parents)
and never free. Now we need to do it, somewhere.

Oh you mean object_parser_clear() as release function? Yes. I've been
wondering if it makes more sense to go through obj_hash[] and release
objects before free(obj_hash) than doing it here. It's probably better
to do it there since obj_hash[] would contain the very last pointers
to these objects and look like the right place to release resources.

> Thanks,
> Stefan
-- 
Duy


Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Duy Nguyen
On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
 wrote:
> diff --git a/command-list.txt b/command-list.txt
> index a1fad28fd82..c89ac8f417f 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -19,6 +19,7 @@ git-archive mainporcelain
>  git-bisect  mainporcelain   info
>  git-blame   ancillaryinterrogators
>  git-branch  mainporcelain   history
> +git-branch-diff mainporcelain   info

Making it part of "git help" with the info keywords at this stage may
be premature. "git help" is about _common_ commands and we don't know
(yet) how popular this will be.

(I'm not complaining though, I almost wrote "what witchcraft is this
and where can I get it" when I see branch-diff output mentioned in
AEvar's mail)
-- 
Duy


Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> This patch series improves the performance of loading indexes by
> reducing the number of malloc() calls. Loading the index from disk is
> partly dominated by the time in malloc(), which is called for each
> index entry. This patch series reduces the number of times malloc() is
> called as part of loading the index, and instead allocates a block of
> memory upfront that is large enough to hold all of the cache entries,
> and chunks this memory itself. This change builds on [1].

I have only looked at the mem-pool related patches to see if
mem-pool.c is good enough to replace alloc.c. To me, it's a "yes"
after we optimize mem_pool_alloc() a bit (not that performance really
matters in alloc.c case, but that may be because it's already
blazingly fast that we never noticed about it).

I probably should look at read-cache.c changes too. Maybe later.
Although after the change to use xmalloc() per entry a few years(?)
ago, it should be straight forward to use a different memory
allocator.
-- 
Duy


Re: [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle

2018-05-03 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> Add an option (controlled by an environment variable) perform extra
> validations on mem_pool allocated cache entries. When set:
>
>   1) Invalidate cache_entry memory when discarding cache_entry.
>
>   2) When discarding index_state struct, verify that all cache_entries
>  were allocated from expected mem_pool.
>
>   3) When discarding mem_pools, invalidate mem_pool memory.

On linux step 3 could be better achieved by allocating blocks with
mmap() and freeing them with munmap(). Access to already munmap()'d
blocks will result in segmentation fault regardless of values.  I
guess Windows also has something similar (I vaguely remember something
about "locking memory" and stuff, but my win32 knowledge is decade old
at this point)

(Actually with glibc on linux, i'm pretty sure mmap is already used
for large allocation so step 3 is achieved without doing anything; not
sure about other libc implementations)

> This should provide extra checks that mem_pools and their allocated
> cache_entries are being used as expected.
>
> Signed-off-by: Jameson Miller 
> ---
>  cache.h  |  7 +++
>  git.c|  3 +++
>  mem-pool.c   | 24 +++-
>  mem-pool.h   |  2 ++
>  read-cache.c | 47 +++
>  5 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 7ed68f28e0..8f10f0649b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -369,6 +369,13 @@ void discard_index_cache_entry(struct cache_entry *ce);
>   */
>  void discard_transient_cache_entry(struct cache_entry *ce);
>
> +/*
> + * Validate the cache entries in the index.  This is an internal
> + * consistency check that the cache_entry structs are allocated from
> + * the expected memory pool.
> + */
> +void validate_cache_entries(const struct index_state *istate);
> +
>  #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
>  #define active_cache (the_index.cache)
>  #define active_nr (the_index.cache_nr)
> diff --git a/git.c b/git.c
> index f598fae7b7..28ec7a6c4f 100644
> --- a/git.c
> +++ b/git.c
> @@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, 
> const char **argv)
>
> trace_argv_printf(argv, "trace: built-in: git");
>
> +   validate_cache_entries(_index);
> status = p->fn(argc, argv, prefix);
> +   validate_cache_entries(_index);
> +
> if (status)
> return status;
>
> diff --git a/mem-pool.c b/mem-pool.c
> index a495885c4b..77adb5d5b9 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -60,21 +60,43 @@ void mem_pool_discard(struct mem_pool *mem_pool)
>  {
> int i;
> struct mp_block *block, *block_to_free;
> +   int invalidate_memory = should_validate_cache_entries();

Heh.. cache-entries logic should not enter mem-pool.c.

> +
> for (block = mem_pool->mp_block; block;)
> {
> block_to_free = block;
> block = block->next_block;
> +
> +   if (invalidate_memory)
> +   memset(block_to_free->space, 0xDD, ((char 
> *)block_to_free->end) - ((char *)block_to_free->space));
> +
> free(block_to_free);
> }
>
> -   for (i = 0; i < mem_pool->nr; i++)
> +   for (i = 0; i < mem_pool->nr; i++) {
> +   memset(mem_pool->custom[i], 0xDD, ((char 
> *)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i]));

"if (invalidate_memory)" missing

> free(mem_pool->custom[i]);
> +   }
>
> free(mem_pool->custom);
> free(mem_pool->custom_end);
> free(mem_pool);
>  }
>
> +int should_validate_cache_entries(void)
> +{
> +   static int validate_index_cache_entries = -1;
> +
> +   if (validate_index_cache_entries < 0) {
> +   if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES"))

There's a safer version that you can use, get_env_bool()

> +   validate_index_cache_entries = 1;
> +   else
> +   validate_index_cache_entries = 0;
> +   }
> +
> +   return validate_index_cache_entries;
> +}
> +
>  void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
>  {
> struct mp_block *p;
> diff --git a/mem-pool.h b/mem-pool.h
> index 34df4fa709..b1f9a920ba 100644
> --- a/mem-pool.h
> +++ b/mem-pool.h
> @@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool 
> *src);
>   */
>  int mem_pool_contains(struct mem_pool *mem_pool, void *mem);
>
> +int should_validate_cache_entries(void);
> +
>  #endif
> diff --git a/read-cache.c b/read-cache.c
> index 01cd7fea41..e1dc9f7f33 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1270,6 +1270,8 @@ int add_index_entry(struct index_state *istate, struct 
> cache_entry *ce, int opti
>  {
> int pos;
>
> +   validate_cache_entries(istate);

Validating _all_ entries every time an entry is added sounds way too 

Re: [PATCH v2 3/5] mem-pool: fill out functionality

2018-05-03 Thread Duy Nguyen
Another I noticed in the jm/mem-pool series is this loop in mem_pool_alloc()

for (p = mem_pool->mp_block; p; p = p->next_block)
if (p->end - p->next_free >= len)
break;

You always go from the start (mp_block) but at some point those first
blocks are filled up and we don't really need to walk from the start
anymore. If we allow the mem-pool user to set a "minimum alloc" limit,
then we can determine if the remaining space in a block is not useful
for any future allocation, we can just skip it and start looking for
an available from a new pointer, avail_block or something.

I'm writing this with alloc.c in mind because we have a lot more
blocks to allocate there. Unlike read-cache, you can't really estimate
how many mp_blocks you're going to need. This linked list could become
long. And because alloc.c does fixed size allocation, we use up one
block after another and will never find free space in previous blocks.

On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> diff --git a/mem-pool.c b/mem-pool.c
> index 389d7af447..a495885c4b 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -5,6 +5,8 @@
>  #include "cache.h"
>  #include "mem-pool.h"
>
> +#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);

#define BLOCK_GROWTH_SIZE (1024*1024 - sizeof(struct mp_block))

(wrapped in brackets and no trailing semicolon)

> +
>  static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, 
> size_t block_alloc)
>  {
> struct mp_block *p;
> @@ -19,6 +21,60 @@ static struct mp_block *mem_pool_alloc_block(struct 
> mem_pool *mem_pool, size_t b
> return p;
>  }
>
> +static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t 
> block_alloc)
> +{
> +   char *p;

An empty line between variable declaration and function body would be nice.

> +   ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc);
> +   ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, 
> mem_pool->alloc_end);
> +

If you put both custom and custom_end in a struct, then you can grow
just one array (of the new struct) and have fewer support variables
like nr_end and alloc_end

The only downside that I can see is the potential padding between
struct increasing memory consumption here. but since you don't care
about reallocation cost (i.e. large memory allocations should be
rare), it probably  does not matter either.

But wait, can we just reuse struct mp_block for this? You allocate a
new mp_block (for just one allocation) as usual, then you can can
maintain a linked list of custom alloc in "struct mp_block
*mp_custom_block" or something. This way we can walk both bulk and
custom allocation the same way.

> +   p = xmalloc(block_alloc);
> +   mem_pool->custom[mem_pool->nr++] = p;
> +   mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc;
> +
> +   mem_pool->pool_alloc += block_alloc;
> +
> +   return mem_pool->custom[mem_pool->nr];
> +}
> +
> +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
> +{
> +   if (!(*mem_pool))

I think (!*mem_pool) should be enough, although you could avoid the
operator precedence headache by doing

if (*mem_pool)
return;

> +   {
> +   *mem_pool = xmalloc(sizeof(struct mem_pool));

I think we tend to do *mem_pool = xmalloc(sizeof(**mem_pool));

> +   (*mem_pool)->pool_alloc = 0;

Eghh.. perhaps just declare

struct mem_pool *pool;

then allocate a new memory to this, initialize everything and only do

*mem_pool = pool;

at the end? It's much less of this (*mem_pool)->

> +   (*mem_pool)->mp_block = NULL;

Just memset() it once (or use xcallo) and only initialize
non-zero/null fields afterwards.

> +   (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE;
> +   (*mem_pool)->custom = NULL;
> +   (*mem_pool)->nr = 0;
> +   (*mem_pool)->alloc = 0;
> +   (*mem_pool)->custom_end = NULL;
> +   (*mem_pool)->nr_end = 0;
> +   (*mem_pool)->alloc_end = 0;
> +
> +   if (initial_size > 0)
> +   mem_pool_alloc_block(*mem_pool, initial_size);
> +   }
> +}
> +
> +void mem_pool_discard(struct mem_pool *mem_pool)
> +{
> +   int i;
> +   struct mp_block *block, *block_to_free;
> +   for (block = mem_pool->mp_block; block;)
> +   {
> +   block_to_free = block;
> +   block = block->next_block;
> +   free(block_to_free);
> +   }
> +
> +   for (i = 0; i < mem_pool->nr; i++)
> +   free(mem_pool->custom[i]);
> +
> +   free(mem_pool->custom);
> +   free(mem_pool->custom_end);
> +   free(mem_pool);
> +}
> +
>  void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
>  {
> struct mp_block *p;
> @@ -33,10 +89,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
> break;
>
> if (!p) {
> -

Re: [PATCH v2] checkout & worktree: introduce checkout.implicitRemote

2018-05-03 Thread Duy Nguyen
On Thu, May 3, 2018 at 3:18 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Introduce a checkout.implicitRemote setting which can be used to
> designate a remote to prefer (via checkout.implicitRemote=origin) when
> running e.g. "git checkout master" to mean origin/master, even though
> there's other remotes that have the "master" branch.
>
> I want this because it's very handy to use this workflow to checkout a
> repository and create a topic branch, then get back to a "master" as
> retrieved from upstream:
>
> (
> rm -rf /tmp/tbdiff &&
> git clone g...@github.com:trast/tbdiff.git &&
> cd tbdiff &&
> git branch -m topic &&
> git checkout master
> )
>
> That will output:
>
> Branch 'master' set up to track remote branch 'master' from 'origin'.
> Switched to a new branch 'master'
>
> But as soon as a new remote is added (e.g. just to inspect something
> from someone else) the DWIMery goes away:
>
> (
> rm -rf /tmp/tbdiff &&
> git clone g...@github.com:trast/tbdiff.git &&
> cd tbdiff &&
> git branch -m topic &&
> git remote add avar g...@github.com:avar/tbdiff.git &&
> git fetch avar &&
> git checkout master
> )
>
> Will output:
>
> error: pathspec 'master' did not match any file(s) known to git.
>
> The new checkout.implicitRemote config allows me to say that whenever
> that ambiguity comes up I'd like to prefer "origin", and it'll still
> work as though the only remote I had was "origin".
>
> I considered splitting this into checkout.implicitRemote and
> worktree.implicitRemote, but it's probably less confusing to break our
> own rules that anything shared between config should live in core.*
> than have two config settings, and I couldn't come up with a short
> name under core.* that made sense (core.implicitRemoteForCheckout?).

I wonder if it's difficult to add a dwim hook that allows us to
replace the entire dwim logic with a hook? Doing this "preferring
origin" in a shell script should be easy and it lets us play more with
tweaking dwim logic. (Yeah sorry I'm getting off topic again)
-- 
Duy


Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-03 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:34 PM, Stefan Beller  wrote:
> @@ -501,9 +516,12 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void object_parser_clear(struct object_parser *o)
>  {
> /*
> -* TOOD free objects in o->obj_hash.
> -*

You need to free(o->obj_hash) too. If you just want to reuse existing
obj_hash[] then at least clear it, leave no dangling pointers behind.

>  * As objects are allocated in slabs (see alloc.c), we do
>  * not need to free each object, but each slab instead.
>  */
> +   clear_alloc_state(o->blob_state);
> +   clear_alloc_state(o->tree_state);
> +   clear_alloc_state(o->commit_state);
> +   clear_alloc_state(o->tag_state);
> +   clear_alloc_state(o->object_state);
>  }
-- 
Duy


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com> wrote:
>
>
>> -Original Message-
>> From: Duy Nguyen <pclo...@gmail.com>
>> Sent: Wednesday, May 2, 2018 1:02 PM
>> To: Stefan Beller <sbel...@google.com>
>> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
>> <jam...@microsoft.com>
>> Subject: Re: [PATCH 00/13] object store: alloc
>>
>> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
>> > I also debated if it is worth converting alloc.c via this patch series
>> > or if it might make more sense to use the new mem-pool by Jameson[1].
>> >
>> > I vaguely wonder about the performance impact, as the object
>> > allocation code seemed to be relevant in the past.
>>
>> If I remember correctly, alloc.c was added because malloc() has too high
>> overhead per allocation (and we create like millions of them). As long as you
>> keep allocation overhead low, it should be ok. Note that we allocate a lot 
>> more
>> than the mem-pool's main target (cache entries if I remember correctly). We
>> may have a couple thousands cache entries.  We already deal with a couple
>> million of struct object.
>
> The work to move cache entry allocation onto a memory pool was motivated by
> the fact that we are looking at indexes with millions of entries. If there is 
> scaling
> concern with the current version of mem-pool, we would like to address it 
> there
> as well. Or if there is improvements that can be shared, that would be nice 
> as well.

I think the two have quite different characteristics. alloc.c code is
driven by overhead. struct blob is only 24 bytes each and about 1/3
the repo is blobs, and each malloc has 16 bytes overhead or so if I
remember correctly. struct cache_entry at minimum in 88 bytes so
relative overhead is not that a big deal (but sure reducing it is
still very nice).

mem-pool is about allocation speed, but I think that's not a concern
for alloc.c because when we do full rev walk, I think I/O is always
the bottleneck (maybe object lookup as well). I don't see a good way
to have the one memory allocator that satisfyies both to be honest. If
you could allocate fixed-size cache entries most of the time (e.g.
larger entries will be allocated using malloc or something, and should
be a small number), then perhaps we can unify. Or if mem-pool can have
an option to allocated fixed size pieces with no overhead, that would
be great (sorry I still have not read that mem-pool series yet)
-- 
Duy


Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 8:00 PM, Eric Sunshine  wrote:
> 2. Building on #1: How well is the term "DWIM" understood by non-power
> users? A term, such as "default" is more well known.

I'm going off topic but I kinda dislike this term. First time I
encountered it in the code I didn't even know what it meant. Since it
has not been leaked to user documents (I only did a grep on
Documentation/) perhaps a better term should be used, preferably not
another acronym.
--
Duy


Re: [PATCH 01/13] repository: introduce object parser field

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 7:26 PM, Stefan Beller  wrote:
>> Another suggestion is object_pool, if we keep 'struct object' instead
>> of 'struct parsed_object' and also want to keep current allocation
>> behavior: no individual deallocation. If you free, you free the whole
>> pool (e.g. you could run rev-list --all --objects in a separate pool,
>> once you're done, you delete the whole pool).
>
> That is what the following patches will be about, you can
> only free the whole set of parsed objects.
>
> So if you want to do a separate rev walk, you may need to
> instantiate a new repository for it (ideally you'd only need a
> separate parsed object store).

I'm not sure if it's a good idea to create a separate struct
repository just because you want to free this parsed object store.
What if updates are made in both repositories? All the cache (well,
mostly the delta base cache in sha1_file.c) will double memory usage
as well. Yeah not ideal. But I guess making rev-list related code use
a separate parsed object store is no small task (and kinda risky as
well since we migrate from global lookup_* functions to local ones and
need to choose the correct parsed object store to look up from)

For your information, there is already a good use case for this
wholesale memory free: if we can free the rev-list related memory
early in pack-objects (e.g. part of repack operation) then it could
lower memory pressure significantly when running on large repos. This
has been discussed a bit lately.
-- 
Duy


Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-02 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:34 PM, Stefan Beller  wrote:
>  #include "cache.h"
>  #include "object.h"
> @@ -30,8 +31,25 @@ struct alloc_state {
> int count; /* total number of nodes allocated */
> int nr;/* number of nodes left in current allocation */
> void *p;   /* first free node in current allocation */
> +
> +   /* bookkeeping of allocations */
> +   void **slabs;

Another way to manage this is linked list: you could reserve one
"object" in each slab to store the "next" (or "prev") pointer to
another slab, then you can just walk through all slabs and free. It's
a bit cheaper than reallocating slabs[], but I guess we reallocate so
few times that readability matters more (whichever way is chosen).

> +   int slab_nr, slab_alloc;
>  };
>
> +void *allocate_alloc_state(void)
> +{
> +   return xcalloc(1, sizeof(struct alloc_state));
> +}
> +
> +void clear_alloc_state(struct alloc_state *s)
> +{
> +   while (s->slab_nr > 0) {
> +   s->slab_nr--;
> +   free(s->slabs[s->slab_nr]);

I think you're leaking memory here. Commit and tree objects may have
more allocations in them (especially trees, but I think we have
commit_list in struct commit too). Those need to be freed as well.

> +   }
> +}
> +
>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  {
> void *ret;
-- 
Duy


Re: [PATCH 01/13] repository: introduce object parser field

2018-05-02 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller  wrote:
> /*
> -* Holds any information related to accessing the raw object content.
> +* Holds any information needed to retrieve the raw content
> +* of objects. The object_parser uses this to get object
> +* content which it then parses.
>  */
> struct raw_object_store *objects;
>
> +   /*
> +* State for the object parser. This owns all parsed objects
> +* (struct object) so callers do not have to manage their
> +* lifetime.
> +*/
> +   struct object_parser *parsed_objects;

I like this name 'parsed_objects'. Should we rename the struct after
it (e.g. parsed_object_store as opposed to raw_object_store above)?

Another suggestion is object_pool, if we keep 'struct object' instead
of 'struct parsed_object' and also want to keep current allocation
behavior: no individual deallocation. If you free, you free the whole
pool (e.g. you could run rev-list --all --objects in a separate pool,
once you're done, you delete the whole pool).
-- 
Duy


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller  wrote:
> I also debated if it is worth converting alloc.c via this patch series
> or if it might make more sense to use the new mem-pool by Jameson[1].
>
> I vaguely wonder about the performance impact, as the object allocation
> code seemed to be relevant in the past.

If I remember correctly, alloc.c was added because malloc() has too
high overhead per allocation (and we create like millions of them). As
long as you keep allocation overhead low, it should be ok. Note that
we allocate a lot more than the mem-pool's main target (cache entries
if I remember correctly). We may have a couple thousands cache
entries.  We already deal with a couple million of struct object.
-- 
Duy


Re: [PATCH v2 00/42] object_id part 13

2018-05-02 Thread Duy Nguyen
On Wed, May 02, 2018 at 12:25:28AM +, brian m. carlson wrote:
> Changes from v1:
> * Add missing sign-off.
> * Removed unneeded braces from init_pack_info.
> * Express 51 in terms of the_hash_algo->hexsz.
> * Fix comments referring to SHA-1.
> * Update commit messages as suggested.
> * Add and use empty_tree_oid_hex and empty_blob_oid_hex.

Interdiff for people who don't have time to read 42 patches yet

diff --git a/builtin/merge.c b/builtin/merge.c
index 8d75ebe64b..7084bcfdea 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -290,7 +290,7 @@ static void read_empty(const struct object_id *oid, int 
verbose)
args[i++] = "-v";
args[i++] = "-m";
args[i++] = "-u";
-   args[i++] = oid_to_hex(the_hash_algo->empty_tree);
+   args[i++] = empty_tree_oid_hex();
args[i++] = oid_to_hex(oid);
args[i] = NULL;
 
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index c31ceb30c2..dca523f50f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -968,7 +968,7 @@ static const char *push_to_deploy(unsigned char *sha1,
return "Working directory has unstaged changes";
 
/* diff-index with either HEAD or an empty tree */
-   diff_index[4] = head_has_history() ? "HEAD" : 
oid_to_hex(the_hash_algo->empty_tree);
+   diff_index[4] = head_has_history() ? "HEAD" : empty_tree_oid_hex();
 
child_process_init();
child.argv = diff_index;
diff --git a/cache.h b/cache.h
index c5b041019b..71b3c1b15b 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,6 +1033,9 @@ static inline int is_empty_tree_oid(const struct 
object_id *oid)
return !oidcmp(oid, the_hash_algo->empty_tree);
 }
 
+const char *empty_tree_oid_hex(void);
+const char *empty_blob_oid_hex(void);
+
 /* set default permissions by passing mode arguments to open(2) */
 int git_mkstemps_mode(char *pattern, int suffix_len, int mode);
 int git_mkstemp_mode(char *pattern, int mode);
diff --git a/http.c b/http.c
index ec70676748..312a5e1833 100644
--- a/http.c
+++ b/http.c
@@ -2070,7 +2070,7 @@ int http_get_info_packs(const char *base_url, struct 
packed_git **packs_head)
get_sha1_hex(data + i + 6, hash);
fetch_and_setup_pack_index(packs_head, hash,
  base_url);
-   i += 51;
+   i += hexsz + 11;
break;
}
default:
diff --git a/sequencer.c b/sequencer.c
index 12c1e1cdbb..94b6513402 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1480,8 +1480,7 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
unborn = get_oid("HEAD", );
if (unborn)
oidcpy(, the_hash_algo->empty_tree);
-   if (index_differs_from(unborn ?
-  oid_to_hex(the_hash_algo->empty_tree) : 
"HEAD",
+   if (index_differs_from(unborn ? empty_tree_oid_hex() : "HEAD",
   NULL, 0))
return error_dirty_index(opts);
}
diff --git a/server-info.c b/server-info.c
index 828ec5e538..7ce6dcd67b 100644
--- a/server-info.c
+++ b/server-info.c
@@ -223,11 +223,9 @@ static void init_pack_info(const char *infofile, int force)
else
stale = 1;
 
-   for (i = 0; i < num_pack; i++) {
-   if (stale) {
+   for (i = 0; i < num_pack; i++)
+   if (stale)
info[i]->old_num = -1;
-   }
-   }
 
/* renumber them */
QSORT(info, num_pack, compare_info);
diff --git a/sha1_file.c b/sha1_file.c
index 794753bd54..bf6c8da3ff 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -109,6 +109,18 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
},
 };
 
+const char *empty_tree_oid_hex(void)
+{
+   static char buf[GIT_MAX_HEXSZ + 1];
+   return oid_to_hex_r(buf, the_hash_algo->empty_tree);
+}
+
+const char *empty_blob_oid_hex(void)
+{
+   static char buf[GIT_MAX_HEXSZ + 1];
+   return oid_to_hex_r(buf, the_hash_algo->empty_blob);
+}
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want
diff --git a/submodule.c b/submodule.c
index 22a96b7af0..ee7eea4877 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1567,7 +1567,7 @@ static void submodule_reset_index(const char *path)
   get_super_prefix_or_empty(), path);
argv_array_pushl(, "read-tree", "-u", "--reset", NULL);
 
-   argv_array_push(, oid_to_hex(the_hash_algo->empty_tree));
+   argv_array_push(, empty_tree_oid_hex());
 
if (run_command())
die("could not reset submodule index");
@@ -1659,9 +1659,9 @@ int submodule_move_head(const char *path,
  

Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 2:11 AM, brian m. carlson
<sand...@crustytoothpaste.net> wrote:
> On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote:
>> On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote:
>> > There are several instances of the constant 20 and 20-based values in
>> > the packfile code.  Abstract away dependence on SHA-1 by using the
>> > values from the_hash_algo instead.
>>
>> While we're abstracting away 20. There's the only 20 left in
>> sha1_file.c that should also be gone. But I guess you could do that
>> later since you need to rename fill_sha1_path to
>> fill_loose_object_path or something.
>
> I'm already working on knocking those out.

Yeah I checked out your part14 branch after writing this note :P You
still need to rename the function though. I can remind that again when
part14 is sent out.

>> > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
>> >  " while index indicates %"PRIu32" objects",
>> >  p->pack_name, ntohl(hdr.hdr_entries),
>> >  p->num_objects);
>> > -   if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
>> > +   if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
>> > return error("end of packfile %s is unavailable", 
>> > p->pack_name);
>> > -   read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
>> > +   read_result = read_in_full(p->pack_fd, hash, hashsz);
>> > if (read_result < 0)
>> > return error_errno("error reading from %s", p->pack_name);
>> > -   if (read_result != sizeof(sha1))
>> > +   if (read_result != hashsz)
>> > return error("packfile %s signature is unavailable", 
>> > p->pack_name);
>> > -   idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
>> > -   if (hashcmp(sha1, idx_sha1))
>> > +   idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
>> > 2;
>> > +   if (hashcmp(hash, idx_hash))
>>
>> Since the hash size is abstracted away, shouldn't this hashcmp become
>> oidcmp? (which still does not do the right thing, but at least it's
>> one less place to worry about)
>
> Unfortunately, I can't, because it's not an object ID.  I think the
> decision was made to not transform non-object ID hashes into struct
> object_id, which makes sense.  I suppose we could have an equivalent
> struct hash or something for those other uses.

I probably miss something, is hashcmp() supposed to stay after the
conversion? And will it compare any hash (as configured in the_algo)
or will it for SHA-1 only?

If hashcmp() will eventually compare the_algo->rawsz then yes this makes sense.

>> > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, 
>> > size_t path_len, int local)
>> > p->pack_size = st.st_size;
>> > p->pack_local = local;
>> > p->mtime = st.st_mtime;
>> > -   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
>> > +   if (path_len < the_hash_algo->hexsz ||
>> > +   get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))
>>
>> get_sha1_hex looks out of place when we start going with
>> the_hash_algo. Maybe change to get_oid_hex() too.
>
> I believe this is the pack hash, which isn't an object ID.  I will
> transform it to be called something other than "sha1" and allocate more
> memory for it in a future series, though.

Ah ok, it's the "sha1" in the name that bugged me. I'm all good then.
-- 
Duy


Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 12:54 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Introduce a core.DWIMRemote setting which can be used to designate a
> remote to prefer (via core.DWIMRemote=origin) when running e.g. "git
> checkout master" to mean origin/master, even though there's other
> remotes that have the "master" branch.

Do we anticipate more dwimy customizations to justify a new dwim group
in config (i.e. dwim.remote instead of core.dwimRemote)?
-- 
Duy


Re: [PATCH v2 1/4] test-tool: help verifying BUG() code paths

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 11:38 AM, Johannes Schindelin
 wrote:
> When we call BUG(), we signal via SIGABRT that something bad happened,
> dumping cores if so configured. In some setups these coredumps are
> redirected to some central place such as /proc/sys/kernel/core_pattern,
> which is a good thing.
>
> However, when we try to verify in our test suite that bugs are caught in
> certain code paths, we do *not* want to clutter such a central place
> with unnecessary coredumps.
>
> So let's special-case the test helpers (which we use to verify such code
> paths) so that the BUG() calls will *not* call abort() but exit with a
> special-purpose exit code instead.
>
> Helped-by: Nguyễn Thái Ngọc Duy 
> Signed-off-by: Johannes Schindelin 
> ---
>  t/helper/test-tool.c | 2 ++
>  usage.c  | 5 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index 87066ced62a..5176f9f20ae 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -47,7 +47,9 @@ static struct test_cmd cmds[] = {
>  int cmd_main(int argc, const char **argv)
>  {
> int i;
> +   extern int BUG_exit_code;
>
> +   BUG_exit_code = 99;

It may be even better to let individual tests in t1406 control this,
pretty much like your original patch, except that we tell usage.c to
not send SIGABRT and just return a special fault code. That way we
don't accidentally suppress BUG()'s sigabrt behavior  in tests that do
not anticipate it (even in t1406).

But this patch is ok for me too if you don't want another reroll.

> if (argc < 2)
> die("I need a test name!");
>
> diff --git a/usage.c b/usage.c
> index cdd534c9dfc..9c84dccfa97 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -210,6 +210,9 @@ void warning(const char *warn, ...)
> va_end(params);
>  }
>
> +/* Only set this, ever, from t/helper/, when verifying that bugs are caught. 
> */
> +int BUG_exit_code;
> +
>  static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, 
> va_list params)
>  {
> char prefix[256];
> @@ -221,6 +224,8 @@ static NORETURN void BUG_vfl(const char *file, int line, 
> const char *fmt, va_lis
> snprintf(prefix, sizeof(prefix), "BUG: ");
>
> vreportf(prefix, fmt, params);
> +   if (BUG_exit_code)
> +   exit(BUG_exit_code);
> abort();
>  }
>
> --
> 2.17.0.windows.1.36.gdf4ca5fb72a
>
>



-- 
Duy


Re: completion troubles with bash

2018-05-01 Thread Duy Nguyen
On Tue, May 1, 2018 at 2:17 PM, Pascal Bourdier
 wrote:
> Hi,
>
>
> If "GREP_OPTIONS" is set to '--color=always' , then the git completion
> send some escape characters like this :
>
> ^[[1;35;40m^[[K  d^[[m^[[Kiff-files   mergetool
> a^[[m^[[Kdd  d^[[m^[[Kiff-index   mv
> a^[[m^[[Kddremoved^[[m^[[Kiff-treename-rev
> a^[[m^[[Km   d^[[m^[[Kifftool notes
> ...
>
>
> So you could find a small patch to disable color with "grep" in attachment.

Yep. The topic 'nd/command-list' on 'pu' accidentally fixes this too
since it removes the use of egrep.

>
>
> Regards,
>
>
> Pascal Bourdier



-- 
Duy


Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmason
 wrote:
> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion that patch-less ML posts.

After narrow/partial clone becomes real, it should be "easy" to
implement some sort of narrow checkout that would achieve the same
thing. But it took me forever with all other stuff to get back to
this.

If we remove it from contrib and there are people willing to
update/maintain it, should it be a separate repository then? The
willing people will have much more freedom to update it. And I don't
have to answer the questions about who will maintain this thing in
git.git. I don't like the idea of adding new (official) shell-based
commands either to be honest.
-- 
Duy


Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()

2018-05-01 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 12:17 AM, Johannes Schindelin
 wrote:
> t1406 specifically verifies that certain code paths fail with a BUG: ...
> message.
>
> In the upcoming commit, we will convert that message to be generated via
> BUG() instead of die("BUG: ..."), which implies SIGABRT instead of a
> regular exit code.

On the other hand, SIGABRT on linux creates core dumps. And on some
setup (like mine) core dumps may be redirected to some central place
via /proc/sys/kernel/core_pattern. I think systemd does it too but I
didn't check.

This moving to SIGABRT when we know it _will_ happen when running the
test suite will accumulate core dumps over time and not cleaned up by
the test suite. Maybe keeping die("BUG: here is a good compromise.
-- 
Duy


Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()

2018-05-01 Thread Duy Nguyen
On Tue, May 1, 2018 at 1:04 PM, Johannes Schindelin
 wrote:
>> If SIGABRT occurs as a result of BUG(), and we know that this happens for
>> certain cases, it means we have an unfixed bug.
>
> Not in this case: The code in question is in
> https://github.com/git/git/blob/v2.17.0/t/helper/test-ref-store.c#L190-L201
> and it is called in a way that fails to have the required flags for the
> operation.

To elaborate, in this particular case, developers are not supposed to
call calling create_reflog() on a submodule ref store because the
store's implementation does not support it (yet). So it's definitely
BUG() to catch devs from doing so. But due to the multiple layer
abstractions, we also need to verify that ref code will bug out in
this case :P

> This would normally indicate a bug, but in this case, that is
> exactly what the regression test tries to trigger: we *want* such a bug to
> cause a failure.
-- 
Duy


Re: [PATCH 00/41] object_id part 13

2018-05-01 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 11:59:43PM +, brian m. carlson wrote:
> > I guess I'll be helping review this series instead :D

Overall I think this looks good.

> 
> Yeah, I have code to fix that, but it's ugly.
> 
> You can see the work on part2 and part3 of the test fixes, plus the
> fixes for all of that stuff on my object-id-part14 branch.

Since I gave it some thought, another way of dealing with this may be
hiding this behind git_hash_algo abstraction, so we have one function
for sha1, one for newhash... and they are probably just macros that
take different struct definition.

--
Duy



Re: [PATCH 39/41] Update shell scripts to compute empty tree object ID

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:49PM +, brian m. carlson wrote:
> Several of our shell scripts hard-code the object ID of the empty tree.
> To avoid any problems when changing hashes, compute this value on
> startup of the script.  For performance, store the value in a variable
> and reuse it throughout the life of the script.
> 
> Signed-off-by: brian m. carlson 
> ---
>  git-filter-branch.sh   | 4 +++-
>  git-rebase--interactive.sh | 4 +++-
>  templates/hooks--pre-commit.sample | 2 +-
>  3 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 64f21547c1..ccceaf19a7 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -11,6 +11,8 @@
>  # The following functions will also be available in the commit filter:
>  
>  functions=$(cat << \EOF
> +EMPTY_TREE=$(git hash-object -t tree /dev/null)

All scripts (except those example hooks) must source
git-sh-setup. Should we define this in there instead?
--
Duy


Re: [PATCH 09/41] pack-objects: abstract away hash algorithm

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:19PM +, brian m. carlson wrote:
> @@ -1850,7 +1852,7 @@ static int try_delta(struct unpacked *trg, struct 
> unpacked *src,
>   /* Now some size filtering heuristics. */
>   trg_size = trg_entry->size;
>   if (!trg_entry->delta) {
> - max_size = trg_size/2 - 20;
> + max_size = trg_size/2 - the_hash_algo->rawsz;

This may be questionable. Note the "heuristics" comment above. I'm not
even sure if this is hash size or some magical-yet-randomly-good
value. Just wanted to bring the attention for other people with better
understand of this code to see

>   ref_depth = 1;
>   } else {
>   max_size = trg_entry->delta_size;


Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote:
> There are several instances of the constant 20 and 20-based values in
> the packfile code.  Abstract away dependence on SHA-1 by using the
> values from the_hash_algo instead.

While we're abstracting away 20. There's the only 20 left in
sha1_file.c that should also be gone. But I guess you could do that
later since you need to rename fill_sha1_path to
fill_loose_object_path or something.


> @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
>" while index indicates %"PRIu32" objects",
>p->pack_name, ntohl(hdr.hdr_entries),
>p->num_objects);
> - if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
> + if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
>   return error("end of packfile %s is unavailable", p->pack_name);
> - read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
> + read_result = read_in_full(p->pack_fd, hash, hashsz);
>   if (read_result < 0)
>   return error_errno("error reading from %s", p->pack_name);
> - if (read_result != sizeof(sha1))
> + if (read_result != hashsz)
>   return error("packfile %s signature is unavailable", 
> p->pack_name);
> - idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
> - if (hashcmp(sha1, idx_sha1))
> + idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
> 2;
> + if (hashcmp(hash, idx_hash))

Since the hash size is abstracted away, shouldn't this hashcmp become
oidcmp? (which still does not do the right thing, but at least it's
one less place to worry about)

Same comment for other hashcmp in this patch.

> @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, 
> size_t path_len, int local)
>   p->pack_size = st.st_size;
>   p->pack_local = local;
>   p->mtime = st.st_mtime;
> - if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
> + if (path_len < the_hash_algo->hexsz ||
> + get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))

get_sha1_hex looks out of place when we start going with
the_hash_algo. Maybe change to get_oid_hex() too.

> @@ -1678,10 +1683,10 @@ int bsearch_pack(const struct object_id *oid, const 
> struct packed_git *p, uint32
>  
>   index_lookup = index_fanout + 4 * 256;
>   if (p->index_version == 1) {
> - index_lookup_width = 24;
> + index_lookup_width = hashsz + 4;

You did good research to spot this 24 constant ;-)

--
Duy


Re: [PATCH 04/41] packfile: remove unused member from struct pack_entry

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:14PM +, brian m. carlson wrote:
> The sha1 member in struct pack_entry is unused except for one instance
> in which we store a value in it.  Since nobody ever reads this value,
> don't bother to compute it and remove the member from struct pack_entry.

Never used since its introduction in 1f688557c0 ([PATCH] Teach
read_sha1_file() and friends about packed git object store. -
2005-06-27). Good riddance.
--
Duy


Re: [PATCH 03/41] Remove unused member in struct object_context

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:13PM +, brian m. carlson wrote:
> The tree member of struct object_context is unused except in one place
> where we write to it.  Since there are no users of this member, remove
> it.

Yep. It's never used since its introduction in 573285e552 (sha1_name:
add get_sha1_with_context() - 2010-06-09) in 'cp/textconv-cat-file'
topic. I guess the idea at that time was to keep all the information
from get_tree_entry() in object context for future use.

Since it's been eight years and nobody needs it still, it should be
good to go.
--
Duy


Re: [PATCH 02/41] server-info: remove unused members from struct pack_info

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:12PM +, brian m. carlson wrote:
> The head member of struct pack_info is completely unused and the
> nr_heads member is used only in one place, which is an assignment.
> Since these structure members are not useful, remove them.

If you reroll, you could add that their last use was in 3e15c67c90
(server-info: throw away T computation as well. - 2005-12-04)
--
Duy



Re: [PATCH 01/41] cache: add a function to read an object ID from a buffer

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:11PM +, brian m. carlson wrote:
> diff --git a/cache.h b/cache.h
> index bbaf5c349a..4bca177cf3 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1008,6 +1008,11 @@ static inline void oidclr(struct object_id *oid)
>   memset(oid->hash, 0, GIT_MAX_RAWSZ);
>  }
>  
> +static inline void oidread(struct object_id *oid, const unsigned char *hash)
> +{
> + memcpy(oid->hash, hash, the_hash_algo->rawsz);

If performance is a concern, should we go with GIT_MAX_RAWSZ instead
of the_hash_algo->rawsz which gives the compiler some more to bypass
actual memcpy function and generate copy code directly?

If it is not a performance problem, should we avoid inline and move
the implementation somewhere?

> +}
> +
>  


Re: [PATCH 00/41] object_id part 13

2018-04-30 Thread Duy Nguyen
On Tue, Apr 24, 2018 at 1:39 AM, brian m. carlson
 wrote:
> [0] I can synthesize blobs, trees, and commits, but things are currently
> totally broken, which is, I suppose, to be expected.

Yup. I was tired and bored so I went playing with the new hash.
Writing and reading blobs (with hash-object/cat-file) were relatively
easy after fixing up fill_sha1_path and get_oid_basic). Then I worked
my way up to update-index/ls-files so that I could make trees with
write-tree. And I hit the first road block: struct ondisk_cache_entry
hard codes hash size so I would need to re-organize the code for more
flexibility (or even redesign the file format if I want to keep byte
alignment). Eck...

I guess I'll be helping review this series instead :D
-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 6:19 PM, Elijah Newren <new...@gmail.com> wrote:
> Hi Duy,
>
> On Mon, Apr 30, 2018 at 7:45 AM, Duy Nguyen <pclo...@gmail.com> wrote:
>> On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen <pclo...@gmail.com> wrote:
>>> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
>>> <johannes.schinde...@gmx.de> wrote:
>>>>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
>>>>> > *t, struct unpack_trees_options
>>>>> >   WRITE_TREE_SILENT |
>>>>> >   WRITE_TREE_REPAIR);
>>>>> > }
>>>>> > -   move_index_extensions(>result, o->dst_index);
>>>>> > +   move_index_extensions(>result, o->src_index);
>>>>>
>>>>> While this looks like the right thing to do on paper, I believe it's
>>>>> actually broken for a specific case of untracked cache. In short,
>>>>> please do not touch this line. I will send a patch to revert
>>>>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
>>>>> which essentially deletes this line, with proper explanation and
>>>>> perhaps a test if I could come up with one.
>>>>>
>>>>> When we update the index, we depend on the fact that all updates must
>>>>> invalidate the right untracked cache correctly. In this unpack
>>>>> operations, we start copying entries over from src to result. Since
>>>>> 'result' (at least from the beginning) does not have an untracked
>>>>> cache, it has nothing to invalidate when we copy entries over. By the
>>>>> time we have done preparing 'result', what's recorded in src's (or
>>>>> dst's for that matter) untracked cache may or may not apply to
>>>>> 'result'  index anymore. This copying only leads to more problems when
>>>>> untracked cache is used.
>>>>
>>>> Is there really no way to invalidate just individual entries?
>>>
>>> Grr the short answer is the current code (i.e. without Elijah's
>>> changes) works but in a twisted way. So you get to keep untracked
>>> cache in the end.
>>
>> GAAAHH.. it works _with_ Elijah's changes (since he made the change
>> from dst to src) not without (and no performance regression).
>
> So...is that an Acked-by for the patch

Yes, Acked-by: me.

> or does the "two wrong make a
> right, I guess" comment suggest that we should still drop the
> move_index_extensions change (essentially reverting to v1 of the PATCH
> as found at 20180421193736.12722-1-new...@gmail.com), and you'll fix
> things up further in a separate series?

I think I'll stay away from this file for a while. When I gather
enough courage, I'll need to read it through since it sounds like a
mine field.
-- 
Duy


Re: [PATCH v5 00/10] Keep all info in command-list.txt in git binary

2018-04-30 Thread Duy Nguyen
This is probably scope creep for this series, but do you guys think we
should do the same for config variables completion? We currently
maintain a giant list at the end of _git_config(). Extracting the list
from Documentation/config.txt to keep it in a C array does not look
super hard. There will be some special handling for advice.* or
color but overall I still think it's a net gain.

Listing all recognizable config variables from "git config" (or "git
help") would be lovely, but I don't think it helps unless we could
print a short summary line each variable, but this info is not
available anywhere.
--
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 1:56 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Duy Nguyen <pclo...@gmail.com> writes:
>
>> Target revision should be available in the index. But this gives me an
>> idea to another thing that bugs me: sending the list to the hook means
>> I have to deal with separator (\n or NUL?) or escaping. This mentions
>> of index makes me take a different direction. I could produce a small
>> index that contains just what is modified, then you can retrieve
>> whatever info you want with `git ls-files` or even `git show` after
>> pointing $GIT_INDEX_FILE to it.
>
> That's somewhere in between a tail wagging the dog and a hammer
> looking like a solution even when you have a screw.  By passing a
> temporary index, you may be able to claim that you are feeding the
> necessary information without corruption and in a readable and
> native format of Git, and make it up to the reader to grab the paths
> out of it, but
>
>  (1) the contents, and probably the cached stat information, in that
>  temporary index is duplicated and wasted; you know from the
>  time you design this thing that all you want to convey is a
>  list of paths.

Yep, I was not happy about this. Which I why I moved to update the
hook calling convention to pass pathspec to the hook instead.

>  (2) it is totally unclear who is responsible for cleaning the
>  temporary file up.

The one that creates it deletes it, which is git.

>  (3) the recipient must walk and carefully grab the path, certainly
>  has to "deal with separator (\n or NUL?) or escaping" anyway,
>  especially if the reason you use a temporary index is because
>  "they can use ls-files on it".  They need to read from ls-files
>  anyway, so that is no better than feeding ls-files compatible
>  input from the standard input of the hook script.

Err "ls-files compatible input" or "ls-files compatible _output_"? If
by input you mean the pathspec we give to ls-files, I agree. If it's
standard ls-files --stage output , letting the hook call ls-files lets
it select the output format it wants (including the potential json
output in the future), which is more flexible.

>
> no?



-- 
Duy


Re: [PATCH v1] test-drop-caches: simplify delay loading of NtSetSystemInformation

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 4:26 PM, Ben Peart  wrote:
> Take advantage of the recent addition of support for lazy loading functions
> on Windows to simplfy the loading of NtSetSystemInformation.
>
> Signed-off-by: Ben Peart 
> ---
>
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/6e6ce4a788
> Checkout: git fetch https://github.com/benpeart/git test-drop-caches-v1 
> && git checkout 6e6ce4a788
>
>  t/helper/test-drop-caches.c | 16 
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
> index 838760898b..dd41da1a2c 100644
> --- a/t/helper/test-drop-caches.c
> +++ b/t/helper/test-drop-caches.c
> @@ -1,5 +1,6 @@
>  #include "test-tool.h"
>  #include "git-compat-util.h"
> +#include "lazyload.h"

This is in compat/win32, should it be inside the "if defined
(GIT_WINDOWS_NATIVE)" block instead of here?

>
>  #if defined(GIT_WINDOWS_NATIVE)
>
> @@ -82,8 +83,6 @@ static int cmd_dropcaches(void)
>  {
> HANDLE hProcess = GetCurrentProcess();
> HANDLE hToken;
> -   HMODULE ntdll;
> -   DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG);
> SYSTEM_MEMORY_LIST_COMMAND command;
> int status;
>
> @@ -95,14 +94,9 @@ static int cmd_dropcaches(void)
>
> CloseHandle(hToken);
>
> -   ntdll = LoadLibrary("ntdll.dll");
> -   if (!ntdll)
> -   return error("Can't load ntdll.dll, wrong Windows version?");
> -
> -   NtSetSystemInformation =
> -   (DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, 
> "NtSetSystemInformation");
> -   if (!NtSetSystemInformation)
> -   return error("Can't get function addresses, wrong Windows 
> version?");
> +   DECLARE_PROC_ADDR(ntdll.dll, DWORD, NtSetSystemInformation, INT, 
> PVOID, ULONG);
> +   if (!INIT_PROC_ADDR(NtSetSystemInformation))
> +   return error("Could not find NtSetSystemInformation() 
> function");
>
> command = MemoryPurgeStandbyList;
> status = NtSetSystemInformation(
> @@ -115,8 +109,6 @@ static int cmd_dropcaches(void)
> else if (status != STATUS_SUCCESS)
> error("Unable to execute the memory list command %d", status);
>
> -   FreeLibrary(ntdll);
> -
> return status;
>  }
>
>
> base-commit: 1f1cddd558b54bb0ce19c8ace353fd07b758510d
> --
> 2.17.0.windows.1
>



-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen <pclo...@gmail.com> wrote:
> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
> <johannes.schinde...@gmx.de> wrote:
>>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
>>> > *t, struct unpack_trees_options
>>> >   WRITE_TREE_SILENT |
>>> >   WRITE_TREE_REPAIR);
>>> > }
>>> > -   move_index_extensions(>result, o->dst_index);
>>> > +   move_index_extensions(>result, o->src_index);
>>>
>>> While this looks like the right thing to do on paper, I believe it's
>>> actually broken for a specific case of untracked cache. In short,
>>> please do not touch this line. I will send a patch to revert
>>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
>>> which essentially deletes this line, with proper explanation and
>>> perhaps a test if I could come up with one.
>>>
>>> When we update the index, we depend on the fact that all updates must
>>> invalidate the right untracked cache correctly. In this unpack
>>> operations, we start copying entries over from src to result. Since
>>> 'result' (at least from the beginning) does not have an untracked
>>> cache, it has nothing to invalidate when we copy entries over. By the
>>> time we have done preparing 'result', what's recorded in src's (or
>>> dst's for that matter) untracked cache may or may not apply to
>>> 'result'  index anymore. This copying only leads to more problems when
>>> untracked cache is used.
>>
>> Is there really no way to invalidate just individual entries?
>
> Grr the short answer is the current code (i.e. without Elijah's
> changes) works but in a twisted way. So you get to keep untracked
> cache in the end.

GAAAHH.. it works _with_ Elijah's changes (since he made the change
from dst to src) not without (and no performance regression). This
file really messes my brain up.
-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
 wrote:
>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
>> > *t, struct unpack_trees_options
>> >   WRITE_TREE_SILENT |
>> >   WRITE_TREE_REPAIR);
>> > }
>> > -   move_index_extensions(>result, o->dst_index);
>> > +   move_index_extensions(>result, o->src_index);
>>
>> While this looks like the right thing to do on paper, I believe it's
>> actually broken for a specific case of untracked cache. In short,
>> please do not touch this line. I will send a patch to revert
>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
>> which essentially deletes this line, with proper explanation and
>> perhaps a test if I could come up with one.
>>
>> When we update the index, we depend on the fact that all updates must
>> invalidate the right untracked cache correctly. In this unpack
>> operations, we start copying entries over from src to result. Since
>> 'result' (at least from the beginning) does not have an untracked
>> cache, it has nothing to invalidate when we copy entries over. By the
>> time we have done preparing 'result', what's recorded in src's (or
>> dst's for that matter) untracked cache may or may not apply to
>> 'result'  index anymore. This copying only leads to more problems when
>> untracked cache is used.
>
> Is there really no way to invalidate just individual entries?

Grr the short answer is the current code (i.e. without Elijah's
changes) works but in a twisted way. So you get to keep untracked
cache in the end.

I was right about the invalidation stuff. I knew about
invalidate_ce_path() in this file. What I didn't remember was this
function actually invalidates entries from the _source_ index, not the
result one. What kind of logic is that? You copy/move entries from
source to result than you go invalidate the source. Since the original
move_index_extensions() call moves extensions from the source, these
are already properly invalidated (both untracked cache and cache
tree), it it looks like it does the right thing. Two wrongs make a
right, I guess.

Sorry for venting. I was not happy with what I found. And sorry for
wasting your time making this move_index.. change then remove it.

> I have a couple of worktrees which are *huge*. And edf3b90553 really
> helped relieve the pain a bit when running `git status`. Now you say that
> even a `git checkout -b new-branch` would blow the untracked cache away
> again?
>
> It would be *really* nice if we could prevent that performance regression
> somehow.
>
> Ciao,
> Dscho



-- 
Duy


Re: [PATCH v5 09/10] help: use command-list.txt for the source of guides

2018-04-29 Thread Duy Nguyen
Phillip (and others) the changes in this patch make "git help -g" now
lists a lot more guides than just the "common" one as advertised (see
below for the exact list). The man page for "git help -g" also
mentions that it would list "useful" guides, not all guides. But we
have no way to list all guides as far as I can tell.

I guess we have two options forward:

- keep "help -g" to common guide (we can tag common guides in
command-list.txt) and add a new option to list all guides ("help
-ag"?)
- reword the man page to make "help -g" list all guides

I'm ok with either direction. What's your preference?

For comparison, this is the new output

The common Git guides are:
   attributes  Defining attributes per path
   cli Git command-line interface and conventions
   core-tutorial   A Git core tutorial for developers
   cvs-migration   Git for CVS users
   diffcoreTweaking diff output
   everydayA useful minimum set of commands for Everyday
Git
   glossaryA Git Glossary
   hooks   Hooks used by Git
   ignore  Specifies intentionally untracked files to
ignore
   modules Defining submodule properties
   namespaces  Git namespaces
   repository-layout   Git Repository Layout
   revisions   Specifying revisions and ranges for Git
   tutorialA tutorial introduction to Git
   tutorial-2  A tutorial introduction to Git: part two
   workflows   An overview of recommended workflows with Git

compared to the old version

The common Git guides are:

   attributes   Defining attributes per path
   everyday Everyday Git With 20 Commands Or So
   glossary A Git glossary
   ignore   Specifies intentionally untracked files to ignore
   modules  Defining submodule properties
   revisionsSpecifying revisions and ranges for Git
   tutorial A tutorial introduction to Git (for version 1.5.1 or
newer)
   workflowsAn overview of recommended workflows with Git




On Sun, Apr 29, 2018 at 8:18 PM, Nguyễn Thái Ngọc Duy  wrote:
> The help command currently hard codes the list of guides and their
> summary in C. Let's move this list to command-list.txt. This lets us
> extract summary lines from Documentation/git*.txt. This also
> potentially lets us list guides in git.txt, but I'll leave that for
> now.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/gitattributes.txt|  2 +-
>  Documentation/gitmodules.txt   |  2 +-
>  Documentation/gitrevisions.txt |  2 +-
>  Makefile   |  2 +-
>  builtin/help.c | 32 --
>  command-list.txt   | 16 +
>  contrib/completion/git-completion.bash | 15 
>  help.c | 18 ---
>  help.h |  1 +
>  t/t0012-help.sh|  6 +
>  10 files changed, 52 insertions(+), 44 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 1094fe2b5b..083c2f380d 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -3,7 +3,7 @@ gitattributes(5)
>
>  NAME
>  
> -gitattributes - defining attributes per path
> +gitattributes - Defining attributes per path
>
>  SYNOPSIS
>  
> diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
> index db5d47eb19..4d63def206 100644
> --- a/Documentation/gitmodules.txt
> +++ b/Documentation/gitmodules.txt
> @@ -3,7 +3,7 @@ gitmodules(5)
>
>  NAME
>  
> -gitmodules - defining submodule properties
> +gitmodules - Defining submodule properties
>
>  SYNOPSIS
>  
> diff --git a/Documentation/gitrevisions.txt b/Documentation/gitrevisions.txt
> index 27dec5b91d..1f6cceaefb 100644
> --- a/Documentation/gitrevisions.txt
> +++ b/Documentation/gitrevisions.txt
> @@ -3,7 +3,7 @@ gitrevisions(7)
>
>  NAME
>  
> -gitrevisions - specifying revisions and ranges for Git
> +gitrevisions - Specifying revisions and ranges for Git
>
>  SYNOPSIS
>  
> diff --git a/Makefile b/Makefile
> index 71b5b594cd..18696e35b0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1939,7 +1939,7 @@ $(BUILT_INS): git$X
>
>  command-list.h: generate-cmdlist.sh command-list.txt
>
> -command-list.h: $(wildcard Documentation/git-*.txt)
> +command-list.h: $(wildcard Documentation/git*.txt)
> $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh command-list.txt >$@+ 
> && mv $@+ $@
>
>  SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
> diff --git a/builtin/help.c b/builtin/help.c
> index 83a7d73afe..b58e8d5f6a 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -402,38 +402,6 @@ static void show_html_page(const char *git_cmd)
> open_html(page_path.buf);
>  }
>
> -static struct {
> -   const char *name;
> -   const 

Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-29 Thread Duy Nguyen
On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren  wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> The code in unpack_trees() does not correctly handle them being different.
> There are two separate issues:
>
> First, there is the possibility of memory corruption.  Since
> unpack_trees() creates a temporary index in o->result and then discards
> o->dst_index and overwrites it with o->result, in the special case that
> o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> split_index for o->result.  However, when src and dst are different,
> reusing o->src_index's split_index for o->result will cause the
> split_index to be shared.  If either index then has entries replaced or
> removed, it will result in the other index referring to free()'d memory.
>
> Second, we can drop the index extensions.  Previously, we were moving
> index extensions from o->dst_index to o->result.  Since o->src_index is
> the one that will have the necessary extensions (o->dst_index is likely to
> be a new index temporary index created to store the results), we should be
> moving the index extensions from there.
>
> Signed-off-by: Elijah Newren 
> ---
>
> Differences from v2:
>   - Don't NULLify src_index until we're done using it
>   - Actually built and tested[1]
>
> But it now passes the testsuite on both linux and mac[2], and I even re-merged
> all 53288 merge commits in linux.git (with a merge of this patch together with
> the directory rename detection series) for good measure.  [Only 7 commits
> showed a difference, all due to directory rename detection kicking in.]
>
> [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> parallelization are great until you realize that your new setup omitted a
> critical step, leaving you running a slightly stale version of git instead...
> :-(
>
> [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> with unicode normalization tests, but those two tests fail before my changes
> too.  All the other tests pass.
>
>  unpack-trees.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index e73745051e..49526d70aa 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> o->result.timestamp.sec = o->src_index->timestamp.sec;
> o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> o->result.version = o->src_index->version;
> -   o->result.split_index = o->src_index->split_index;
> -   if (o->result.split_index)
> +   if (!o->src_index->split_index) {
> +   o->result.split_index = NULL;
> +   } else if (o->src_index == o->dst_index) {
> +   /*
> +* o->dst_index (and thus o->src_index) will be discarded
> +* and overwritten with o->result at the end of this function,
> +* so just use src_index's split_index to avoid having to
> +* create a new one.
> +*/
> +   o->result.split_index = o->src_index->split_index;
> o->result.split_index->refcount++;
> +   } else {
> +   o->result.split_index = init_split_index(>result);
> +   }
> hashcpy(o->result.sha1, o->src_index->sha1);
> o->merge_size = len;
> mark_all_ce_unused(o->src_index);
> @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> }
> }
>
> -   o->src_index = NULL;
> ret = check_updates(o) ? (-2) : 0;
> if (o->dst_index) {
> if (!ret) {
> @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
>   WRITE_TREE_SILENT |
>   WRITE_TREE_REPAIR);
> }
> -   move_index_extensions(>result, o->dst_index);
> +   move_index_extensions(>result, o->src_index);

While this looks like the right thing to do on paper, I believe it's
actually broken for a specific case of untracked cache. In short,
please do not touch this line. I will send a patch to revert
edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
which essentially deletes this line, with proper explanation and
perhaps a test if I could come up with one.

When we update the index, we depend on the fact that all updates must
invalidate the right untracked cache correctly. In this unpack
operations, we start copying entries over from src to result. Since
'result' (at least from the beginning) does not have an untracked
cache, it has nothing to invalidate when we copy entries over. By the
time we have done preparing 'result', what's recorded in src's (or
dst's for 

Re: [PATCH v4/wip 02/12] generate-cmds.sh: export all commands to command-list.h

2018-04-29 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 8:07 PM, Eric Sunshine  wrote:
> On Wed, Apr 25, 2018 at 12:30 PM, Nguyễn Thái Ngọc Duy
>  wrote:
>> The current generate-cmds.sh generates just enough to print "git help"
>> output. That is, it only extracts help text for common commands.
>>
>> The script is now updated to extract help text for all commands and
>> keep command classification a new file, command-list.h. This will be
>> useful later:
>> [...]
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
>> @@ -12,14 +34,51 @@ get_synopsis () {
>> +define_categories() {
>> +   echo
>> +   echo "/* Command categories */"
>> +   bit=0
>> +   category_list "$1" |
>> +   while read cat
>> +   do
>> +   echo "#define CAT_$cat (1UL << $bit)"
>> +   bit=$(($bit+1))
>> +   done
>> +   test "$bit" -gt 32 && die "Urgh.. too many categories?"
>
> Should this be '-ge' rather than '-gt'?

After we print "1UL << 31" we increment it to 32 and exit the loop,
then it's still within limits, -ge would incorrectly complain
-- 
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-28 Thread Duy Nguyen
On Thu, Apr 26, 2018 at 4:46 PM, Michał Górny  wrote:
> For the record, we're using this with ebuilds and respective cache files
> (which are expensive to generate).  We are using separate repository
> which combines sources and cache files to keep the development
> repository clean.  I have researched different solutions for this but
> git turned out the best option for incremental updates for us.
>
> Tarballs are out of question, unless you expect users to fetch >100 MiB
> every time, and they are also expensive to update.  Deltas of tarballs
> are just slow and require storing a lot of extra data.  Rsync is not
> very efficient at frequent updates, and has significant overhead
> on every run.  With all its disadvantages, git is still something that
> lets our users fetch updates frequently with minimal network overhead.

I assume you're talking about the metadata directory in gentoo-x86
repo. This specific case could be solved by renaming metadata to
_metadata or something to put it on the top. "git checkout" always
updates files in strcmp(path) order. This guarantees time(_metadata)
<= time(ebuild) for all ebuilds without any extra touching (either in
git or in a post-checkout hook)

The behavior has been this way since forever and as far as I can tell
very unlikely to change at least for branch switching (major changes
involved around the index). It's a bit easier to accidentally change
how "git checkout -- path" works though. I don't know if we could just
make this checkout order a promise and guarantee not to break it
though. For it it does not sound like it adds extra maintenance
burden.
-- 
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-28 Thread Duy Nguyen
On Fri, Apr 27, 2018 at 11:08 PM, Marc Branchaud  wrote:
>> On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud 
>> This is a limitation of the current post-checkout hook. $3==0 from the
>> hook lets us know this is not a branch switch, but it does not really
>> tell you the affected paths. If it somehow passes all the given
>> pathspec to you, then you should be able to do "git ls-files --
>> $pathspec" which gives you the exact same set of paths that
>> git-checkout updates. We could do this by setting $4 to "--" and put
>> all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in
>> the above example.
>>
>> There is  third case here, if you do "git checkout  --
>> path/to/file" then it cannot be covered by the current design. I guess
>> we could set $3 to '2' (retrieve from a tree) to indicate this in
>> addition to 0 (from index) and 1 (from switching branch) and then $1
>> could be the tree in question (pathspecs are passed the same way
>> above)
>>
>> [1] I wonder if we could have a more generic approach to pass
>> pathspecs via environment, which could work for more than just this
>> one hook. Not sure if it's a good idea though.
>
>
> I think there needs to be something other than listing all the paths in the
> command is viable, because it's too easy to hit some command-line-length
> limit.

I send pathspecs, not paths. If you type "git checkout -- foo/" then I
send exactly "foo/" not every paths in it. You can always figure that
out with git-ls-files.

Sure this can still hit command length limit when you do "git checkout
-- foo/*" and have lots of files in foo just one more param from
hitting the limit, then the hook may hit the limit because we need
more command line arguments. But this is the corner case I don't think
we should really need to care about.

> It would also be good if hook authors didn't have to re-invent the
> wheel of determining the changed paths for every corner-case.

Flexibility vs convenience I guess. A sample hook as template should
help the reinvention.

> My first instinct is to write them one-per-line on the hook's stdin. That's
> probably not generic enough for most hooks, but it seems like a good
> approach for this proposal.
>
> Throwing them into a temporary file with a known name is also good ---
> better, I think, than stuffing them into an environment variable.

This goes back to my post-checkout-modified proposal. If you're
writing to file, might as well reuse the index format. Then you can
read it with ls-files (which lets you decide path separator or even
quoting, I'm not sure) and it also provides some more info like file
hashes, access time...
-- 
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-28 Thread Duy Nguyen
On Fri, Apr 27, 2018 at 11:08 PM, Elijah Newren <new...@gmail.com> wrote:
> On Fri, Apr 27, 2018 at 10:03 AM, Duy Nguyen <pclo...@gmail.com> wrote:
>> On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud <marcn...@xiplink.com> wrote:
>>>
>>> * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are
>>> identical so the above loop does nothing.  Offhand I'm not even sure how a
>>> hook might get the right files in this case.
>>
>> This is a limitation of the current post-checkout hook. $3==0 from the
>> hook lets us know this is not a branch switch, but it does not really
>> tell you the affected paths. If it somehow passes all the given
>> pathspec to you, then you should be able to do "git ls-files --
>> $pathspec" which gives you the exact same set of paths that
>> git-checkout updates. We could do this by setting $4 to "--" and put
>> all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in
>> the above example.
>>
>> There is  third case here, if you do "git checkout  --
>> path/to/file" then it cannot be covered by the current design. I guess
>> we could set $3 to '2' (retrieve from a tree) to indicate this in
>> addition to 0 (from index) and 1 (from switching branch) and then $1
>> could be the tree in question (pathspecs are passed the same way
>> above)
>>
>> [1] I wonder if we could have a more generic approach to pass
>> pathspecs via environment, which could work for more than just this
>> one hook. Not sure if it's a good idea though.
>
> Here's a crazy idea -- maybe instead of a list of pathspecs you just
> provide the timestamp of when git checkout started.  Then the hook
> could walk the tree, find all files with modification times at least
> that late, and modify them all back to the the timestamp of when the
> git checkout started.
>
> Would that be enough?  Is that too crazy?

For this use case? Probably (assuming that timestamp precision does
not cause problems).

I'm more concerned about what info post-checkout hook should provide
but does not. Giving hook writer a way to get the list of updated
files lets them do more fancy stuff while providing checkout start
time probably only helps just this one case.

Providing start time in general for all hooks sounds like a good thing
though (and simple enough to implement).
-- 
Duy


Re: [PATCH 18/41] index-pack: abstract away hash function constant

2018-04-27 Thread Duy Nguyen
On Fri, Apr 27, 2018 at 11:08 PM, brian m. carlson
<sand...@crustytoothpaste.net> wrote:
> On Thu, Apr 26, 2018 at 05:46:28PM +0200, Duy Nguyen wrote:
>> On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågren <martin.ag...@gmail.com> wrote:
>> > Once that is accomplished, I sort of suspect that this code will want to
>> > be updated to not always blindly use the_hash_algo, but to always work
>> > with SHA-1 sizes. Or rather, this would turn into more generic code to
>> > handle both "v2 with SHA-1" and "v3 with some hash function(s)". This
>> > commit might be a good first step in that direction.
>>
>> I also have an uneasy feeling when things this close to on-disk file
>> format get hash-agnostic treatment. I think we would need to start
>> adding new file formats soon, from bottom up with simple things like
>> loose object files (cat-file and hash-object should be enough to test
>> blobs...), then moving up to pack files and more. This is when we can
>> really decide where to use the new hash and whether we should keep
>> some hashes as sha-1.
>
> I agree that this is work which needs to be done soon.  There are
> basically a couple of pieces we need to handle NewHash:
>
> * Remove the dependencies on SHA-1 as much as possible.
> * Get the tests to pass with a different hash (almost done for 160-bit
>   hash; in progress for 256-bit hashes).

This step sounds good on paper but realistically could be a nightmare for you.

I tried to implement a simple cat-file/hash-object combination with my
imaginary newhash, which sounded straightforward to me since you have
done a lot of heavylifting. To my surprise I hit a lot more problems.
My point is, when I concentrate on just a few simple cases like this,
I have a smaller scope to work with and could quickly identify
problems. When you work on the scale of the test suite, it's really
hard to know where the problem is (and you don't even know what areas
are newhash-safe).

Anyway my cat-file/hash-object modification could be found here. I
probably will polish and send out a few good patches from it.

https://github.com/pclouds/git/commits/new-hash
-- 
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-27 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud  wrote:
>> The best approach to do so is to have those people do the "touch"
>> thing in their own post-checkout hook.  People who use Git as the
>> source control system won't have to pay runtime cost of doing the
>> touch thing, and we do not have to maintain such a hook script.
>> Only those who use the "feature" would.
>
>
> The post-checkout hook approach is not exactly straightforward.

I am revisiting this because I'm not even happy with my
post-checkout-modified hook suggestion, so..

>
> Naively, it's simply
>
> for F in `git diff --name-only $1 $2`; do touch "$F"; done
>
> But consider:
>
> * Symlinks can cause the wrong file to be touched.  (Granted, Michał's
> proposed patch also doesn't deal with symlinks.)  Let's assume that a hook
> can be crafted will all possible sophistication.  There are still some
> fundamental problems:

OK so this one could be tricky to get right, but it's possible.

>
> * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are
> identical so the above loop does nothing.  Offhand I'm not even sure how a
> hook might get the right files in this case.

This is a limitation of the current post-checkout hook. $3==0 from the
hook lets us know this is not a branch switch, but it does not really
tell you the affected paths. If it somehow passes all the given
pathspec to you, then you should be able to do "git ls-files --
$pathspec" which gives you the exact same set of paths that
git-checkout updates. We could do this by setting $4 to "--" and put
all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in
the above example.

There is  third case here, if you do "git checkout  --
path/to/file" then it cannot be covered by the current design. I guess
we could set $3 to '2' (retrieve from a tree) to indicate this in
addition to 0 (from index) and 1 (from switching branch) and then $1
could be the tree in question (pathspecs are passed the same way
above)

[1] I wonder if we could have a more generic approach to pass
pathspecs via environment, which could work for more than just this
one hook. Not sure if it's a good idea though.

> * The hook has to be set up in every repo and submodule (at least until
> something like Ævar's experiments come to fruition).

Either --template or core.hooksPath would solve this, or I'll try to
get my "hooks.* config" patch in. I think that one is a good thing to
do anyway because it allows more flexible hook management (and it
could allow multiple hooks of the same name too). With this, we could
avoid adding more command-specific config like core.fsmonitor or
uploadpack.packObjectsHook which to me are hooks.

Another option is extend core.hooksPath for searching multiple places
instead of just one like AEvar suggested.

> * A fresh clone can't run the hook.  This is especially important when
> dealing with submodules.  (In one case where we were bit by this, make
> though that half of a fresh submodule clone's files were stale, and decided
> to re-autoconf the entire thing.)

Both --template and config-based hooks should let you handle this case.

So, I think if we improve the hook system to give more information
(which is something we definitely should do), we could do this without
adding special cases in git. I'm not saying that we should never add
special cases, but at least this lets us play with different kinds of
post-checkout activities before we decide which one would be best
implemented in git.
-- 
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-26 Thread Duy Nguyen
On Thu, Apr 26, 2018 at 07:53:58PM +0200, SZEDER Gábor wrote:
> BTW, wouldn't running
> 
>   git clone --template=/path/to/template/dir/with/hooks/
> 
> invoke the post-checkout hook in there?
> 

Yes but it's cumbersome, preparing a template just for one extra
hook. I never like this template thing to be honest.
--
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-26 Thread Duy Nguyen
On Thu, Apr 26, 2018 at 05:48:35PM +, Robin H. Johnson wrote:
> On Thu, Apr 26, 2018 at 06:43:56PM +0200, Duy Nguyen wrote:
> > On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud <marcn...@xiplink.com> 
> > wrote:
> > > Are we all that sure that the performance hit is that drastic?  After all,
> > > we've just done write_entry().  Calling utime() at that point should just
> > > hit the filesystem cache.
> > I have a feeling this has "this is linux" assumption. Anybody knows
> > how freebsd, mac os x and windows behave?
> I don't know sorry. futimens might be better here if it can be used
> before the fd is closed.
> 
> > > * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are
> > > identical so the above loop does nothing.  Offhand I'm not even sure how a
> > > hook might get the right files in this case.
> > Would a hook that gives you the list of updated files (in the exact
> > same order that git updates) help?
> Yes, that, along with the target revision I think would allow most or
> all of the desired behaviors mentioned in this thread *.

Target revision should be available in the index. But this gives me an
idea to another thing that bugs me: sending the list to the hook means
I have to deal with separator (\n or NUL?) or escaping. This mentions
of index makes me take a different direction. I could produce a small
index that contains just what is modified, then you can retrieve
whatever info you want with `git ls-files` or even `git show` after
pointing $GIT_INDEX_FILE to it.

So it's basically what the following (hacky) patch does. It adds
support for a new hook named post-checkout-modified. This hook will
prepares $GIT_DIR/index.modified which contains just the files
git-checkout has touched and deletes it after the hook finishes.

My test hook is pretty simple just to dump out what in there

#!/bin/sh
GIT_INDEX_FILE=`git rev-parse --git-path index.modified` git ls-files 
--stage

and it seems to work.

Of course, this does not give you the checkout order. But checkout
order has always been sorted order by path if I remember correctly and
it's unlikely to change (and I don't think you really need that exact
order anyway)

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index b49b582071..92b30cd05f 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -52,6 +52,8 @@ struct checkout_opts {
const char *prefix;
struct pathspec pathspec;
struct tree *source_tree;
+
+   struct index_state istate_modified;
 };
 
 static int post_checkout_hook(struct commit *old_commit, struct commit 
*new_commit,
@@ -470,7 +472,7 @@ static void setup_branch_path(struct branch_info *branch)
branch->path = strbuf_detach(, NULL);
 }
 
-static int merge_working_tree(const struct checkout_opts *opts,
+static int merge_working_tree(struct checkout_opts *opts,
  struct branch_info *old_branch_info,
  struct branch_info *new_branch_info,
  int *writeout_error)
@@ -595,6 +597,27 @@ static int merge_working_tree(const struct checkout_opts 
*opts,
if (!cache_tree_fully_valid(active_cache_tree))
cache_tree_update(_index, WRITE_TREE_SILENT | 
WRITE_TREE_REPAIR);
 
+   if (find_hook("post-checkout-modified")) {
+   int i;
+
+   for (i = 0; i < the_index.cache_nr; i++) {
+   struct cache_entry *ce = the_index.cache[i];
+   struct cache_entry *new_ce;
+
+   /*
+* Hack: this is an abuse of this flag, hidden
+* dependency with write_locked_index()
+*/
+   if (!(ce->ce_flags & CE_UPDATE_IN_BASE))
+   continue;
+
+   new_ce = xcalloc(1, cache_entry_size(ce_namelen(ce)));
+   memcpy(new_ce, ce, cache_entry_size(ce_namelen(ce)));
+   add_index_entry(>istate_modified, new_ce,
+   ADD_CACHE_JUST_APPEND);
+   }
+   }
+
if (write_locked_index(_index, _file, COMMIT_LOCK))
die(_("unable to write new index file"));
 
@@ -811,7 +834,7 @@ static void orphaned_commit_warning(struct commit 
*old_commit, struct commit *ne
clear_commit_marks_all(ALL_REV_FLAGS);
 }
 
-static int switch_branches(const struct checkout_opts *opts,
+static int switch_branches(struct checkout_opts *opts,
   struct branch_info *new_branch_info)
 {
int ret = 0;
@@ -848,6 +871,16 @@ static int switch_branches(const struct checkout_opts 
*opts,
 
update_refs_for_switch(opts, _branch_info, new_branch_info);
 

Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-26 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 10:41:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>  2. Add some config so we can have hook search paths, and ship with a
> default search path for hooks shipped with git.

I would go for something like this instead of search paths. This
allows you to specify a path to any specific hook in hooks.* config
group. This is basically "$GIT_DIR/hooks directory in config file" but
with lower priority than those in $GIT_DIR/hooks.

Now we can do something like


git -c hooks.post-checkout=/path/to/some/script clone ...

but of course I would need to fix the FIXME or this hook config is
only effective just this one time. (And of course you could put it in
~/.gitconfig)

-- 8< --
diff --git a/builtin/clone.c b/builtin/clone.c
index 7df5932b85..143413ed2d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1063,6 +1063,11 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
strbuf_addf(_top, "refs/remotes/%s/", option_origin);
}
 
+   /*
+* FIXME: we should keep all custom config settings via
+* "git  -c ..." in $GIT_DIR/config.
+*/
+
strbuf_addf(, "+%s*:%s*", src_ref_prefix, branch_top.buf);
strbuf_addf(, "remote.%s.url", option_origin);
git_config_set(key.buf, repo);
diff --git a/run-command.c b/run-command.c
index 84899e423f..ee5b6ea329 100644
--- a/run-command.c
+++ b/run-command.c
@@ -7,6 +7,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "quote.h"
+#include "config.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -1256,6 +1257,8 @@ const char *find_hook(const char *name)
strbuf_reset();
strbuf_git_path(, "hooks/%s", name);
if (access(path.buf, X_OK) < 0) {
+   const char *cfg_hook;
+   struct strbuf cfg_key = STRBUF_INIT;
int err = errno;
 
 #ifdef STRIP_EXTENSION
@@ -1278,9 +1281,14 @@ const char *find_hook(const char *name)
   path.buf);
}
}
-   return NULL;
+
+   strbuf_reset();
+   strbuf_addf(_key, "hooks.%s", name);
+   if (!git_config_get_pathname(cfg_key.buf, _hook))
+   strbuf_addstr(, cfg_hook);
+   strbuf_release(_key);
}
-   return path.buf;
+   return path.len ? path.buf : NULL;
 }
 
 int run_hook_ve(const char *const *env, const char *name, va_list args)
-- 8< --
--
Duy


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-26 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud  wrote:
> Are we all that sure that the performance hit is that drastic?  After all,
> we've just done write_entry().  Calling utime() at that point should just
> hit the filesystem cache.

I have a feeling this has "this is linux" assumption. Anybody knows
how freebsd, mac os x and windows behave?

> The post-checkout hook approach is not exactly straightforward.
>
> Naively, it's simply
>
> for F in `git diff --name-only $1 $2`; do touch "$F"; done
>
> But consider:
>
> * Symlinks can cause the wrong file to be touched.  (Granted, Michał's
> proposed patch also doesn't deal with symlinks.)  Let's assume that a hook
> can be crafted will all possible sophistication.  There are still some
> fundamental problems:
>
> * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are
> identical so the above loop does nothing.  Offhand I'm not even sure how a
> hook might get the right files in this case.

Would a hook that gives you the list of updated files (in the exact
same order that git updates) help?

> * The hook has to be set up in every repo and submodule (at least until
> something like Ævar's experiments come to fruition).
>
> * A fresh clone can't run the hook.  This is especially important when
> dealing with submodules.  (In one case where we were bit by this, make
> though that half of a fresh submodule clone's files were stale, and decided
> to re-autoconf the entire thing.)

This to me sounds like something we should be able to improve in
general. The alternative is "git clone --no-checkout" then checkout
manually (which I see jenkins plugin does) is really not optimal even
if it works.
-- 
Duy


Re: [PATCH 18/41] index-pack: abstract away hash function constant

2018-04-26 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågren  wrote:
>> I agree that pack v2 is not going to have anything but SHA-1.  However,
>> writing all the code such that it's algorithm agnostic means that we can
>> do testing of new algorithms by wholesale replacing the algorithm with a
>> new one, which simplifies things considerably.
>
> Ok. I do sort of wonder if a "successful" test run after globally
> substituting Hash-Foo for SHA-1 (regardless of whether the size changes
> or not) hints at a problem. That is, nowhere do we test that this code
> uses 20-byte SHA-1s, regardless of what other hash functions are
> available and configured. Of course, until soon, that did not really
> have to be tested since there was only one hash function available to
> choose from. As for identifying all the places that matter ... no idea.
>
> Of course I can see how this helps get things to a point where Git does
> not crash and burn because the hash has a different size, and where the
> test suite doesn't spew failures because the initial chaining value of
> "SHA-1" is changed.
>
> Once that is accomplished, I sort of suspect that this code will want to
> be updated to not always blindly use the_hash_algo, but to always work
> with SHA-1 sizes. Or rather, this would turn into more generic code to
> handle both "v2 with SHA-1" and "v3 with some hash function(s)". This
> commit might be a good first step in that direction.

I also have an uneasy feeling when things this close to on-disk file
format get hash-agnostic treatment. I think we would need to start
adding new file formats soon, from bottom up with simple things like
loose object files (cat-file and hash-object should be enough to test
blobs...), then moving up to pack files and more. This is when we can
really decide where to use the new hash and whether we should keep
some hashes as sha-1.

For trailing hashes for example, there's no need to move to a new hash
which only costs us more cycles. We just use it as a fancy checksum to
avoid bit flips. But then my assumption about cost may be completely
wrong without experimenting.

> Long rambling short, yeah, I see your point.

So yeah. It may be ok to move everything to "new hash" now. But we
need a closer look soon.
-- 
Duy


Re: What's cooking in git.git (Apr 2018, #03; Wed, 25)

2018-04-26 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 10:37 AM, Junio C Hamano  wrote:
> * nd/pack-objects-pack-struct (2018-04-16) 15 commits
>  ...
>
>  "git pack-objects" needs to allocate tons of "struct object_entry"
>  while doing its work, and shrinking its size helps the performance
>  quite a bit.
>
>  What's the doneness of this thing?  The interdiff since previous
>  rounds looked reasonable, but I didn't see this round otherwise
>  scrutinized by reviewers.  The numbers given in the commit near the
>  tip do look impressive, though ;-)

I think it's ok to move it to next, though I'd prefer to move it to
master just right after a release so it gets tested for a whole
release cycle. This also gives Jeff a chance to check it after he's
back (if he wants to).

> * nd/repack-keep-pack (2018-04-16) 7 commits
>  ...
>
>  "git gc" in a large repository takes a lot of time as it considers
>  to repack all objects into one pack by default.  The command has
>  been taught to pretend as if the largest existing packfile is
>  marked with ".keep" so that it is left untouched while objects in
>  other packs and loose ones are repacked.
>
>  What's the doneness of this thing?  The interdiff since the earlier
>  one looked reasonable, but I didn't see this round otherwise
>  scrutinized by reviewers.

This one should be safer than the previous one. I think it's ok to move to next.

Anyway I'll re-read these two series this weekend to see if I could
spot anything new.
-- 
Duy


Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-25 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 3:46 PM, SZEDER Gábor <szeder@gmail.com> wrote:
> On Tue, Apr 24, 2018 at 6:17 PM, Duy Nguyen <pclo...@gmail.com> wrote:
>> On Tue, Apr 24, 2018 at 6:12 PM, Duy Nguyen <pclo...@gmail.com> wrote:
>>> git-completion.bash will be updated to ask git "give me the commands
>>> in the mainporcelain, completable or external category". This also
>>> addresses another thing that bugs me: I wanted an option to let me
>>> complete all commands instead of just porcelain. This approach kinda
>>> generalizes that and it would be easy to let the user choose what
>>> category they want to complete.
>>
>> To complete this: there could also be yet another special category
>> "custom", where --list-cmds=custom simply returns a list of commands
>> specified in config file. With this the user can pick the "custom"
>> category to have total control of what command shows up at "git "
>> if they are not satisfied with the predefined categories.
>
> Note that config settings can be repository-specific, which might
> cause surprises if the user sets a custom command list in one
> repository's config file, but not (or a different one!) in others.
> Then the list of completed commands will depend on where the user
> happened to be when first hitting 'git '.

I think that is expected when the word "config file" is mentioned.
It's no different than aliases, which could also be repo specific and
affects completion.

> Unless you forgo
> caching the list of commands and run 'git --list-cmds=...' every time
> the user hits 'git ', but that will add the overhead of fork()ing
> a subshell and a git command.

This is a good point. I'd rather forgo caching iff the "custom"
strategy is used (technically we could still cache if we know the
source if $HOME/.gitconfig or higher but it's not worth the effort).
Just make it clear to the user that if they go with "custom" strategy
then they may hit some performance hit.

> Once upon a time I toyed with the idea of introducing environment
> variables like $GIT_COMPLETION_EXCLUDE_COMMANDS and
> $GIT_COMPLETION_INCLUDE_COMMANDS, to exclude and include commands from
> 'git '.  I wanted to exclude 'git cherry', because I have never
> ever used it but it's always in the way when I want to cherry-pick.
> And I wanted to include 'git for-each-ref' back when I was running it
> frequently while working on speeding up refs completion, but it
> wouldn't have brought that much benefits, because I have a 'git
> for-each-commit' script, too.
> Never really pursued it, though, just patched the long list in
> __git_list_porcelain_commands() in my git clone :)

I'm tempted to support "delta custom list" (e.g. "+cherry-pick" in the
config variable means "add that command on the existing list", or
"-blame" means exclude it) but that's probably too much work.
-- 
Duy


Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-25 Thread Duy Nguyen
On Wed, Apr 25, 2018 at 3:06 PM, SZEDER Gábor  wrote:
>> -__git_list_all_commands ()
>
>> -__git_list_porcelain_commands ()
>
> Users can have their own completion scriptlets for their own git
> commands, and these should be able to rely on helper functions in
> git-completion.bash to do things like refs completion and what not.
> Therefore, we tend not to remove or alter those helper functions in a
> backwards incompatible way, because we don't want to break those
> completion scriptlets.

What kind of "API" guarantee do we provide here? I don't mind adding
the wrappers, but for me it's hard for new contributors (like me) to
see which one should be stable and which one is internal.
-- 
Duy


Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-24 Thread Duy Nguyen
On Tue, Apr 24, 2018 at 6:12 PM, Duy Nguyen <pclo...@gmail.com> wrote:
> git-completion.bash will be updated to ask git "give me the commands
> in the mainporcelain, completable or external category". This also
> addresses another thing that bugs me: I wanted an option to let me
> complete all commands instead of just porcelain. This approach kinda
> generalizes that and it would be easy to let the user choose what
> category they want to complete.

To complete this: there could also be yet another special category
"custom", where --list-cmds=custom simply returns a list of commands
specified in config file. With this the user can pick the "custom"
category to have total control of what command shows up at "git "
if they are not satisfied with the predefined categories.
-- 
Duy


Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-24 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 3:32 PM, SZEDER Gábor  wrote:
> But then I noticed that it's not an accurate description of the
> current situation, because there is a wide grey area between
> porcelains and plumbing, and the completion script doesn't "filter out
> plumbing commands", but rather filters out commands that can be
> considered too low-level to be useful for "general" usage.
> Consequently, after 'git ' we also list:
>
>   - some 'ancillaryinterrogators': blame, annotate, difftool, fsck,
> help
>   - some 'ancillarymanipulators': config, mergetool, remote
>   - some 'foreignscminterface': p4, request-pull, svn, send-email
>   - even some plumbing: apply, name-rev (though 'name-rev' could be
> omitted; we have 'git describe')
>   - and also all "unknown" 'git-foo' commands that can be found in
> $PATH, which can be the user's own git scripts or other
> git-related tools ('git-annex', Git LFS, etc.).
>
> With this change we wouldn't list any of the above commands, but only
> those that are explicitly categorized as 'mainporcelain'.  I'd much
> prefer the current behaviour.

Yeah I noticed this (kinda) with filter-branch but I did not look
further to see all this. It's good that you review this series then :)

For the first group (known commands), how about we add a new category
"completion" in command-list.txt? Each command may belong to multiple
categories (and my updated script has to deal with that [1]). For the
second group, we could also have a special "external" category that is
produced at run time, not specified in command-list.txt. --list-cmds
option either has to accept multiple values, or we accept multiple
--list-cmds= options.

git-completion.bash will be updated to ask git "give me the commands
in the mainporcelain, completable or external category". This also
addresses another thing that bugs me: I wanted an option to let me
complete all commands instead of just porcelain. This approach kinda
generalizes that and it would be easy to let the user choose what
category they want to complete.

[1] which also means I could bring "deprecated" category back.
-- 
Duy


Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst

2018-04-23 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 7:09 PM, Elijah Newren <new...@gmail.com> wrote:
> Hi,
>
> On Sun, Apr 22, 2018 at 5:38 AM, Duy Nguyen <pclo...@gmail.com> wrote:
>>>   - there's a better, more performant fix or there is some way to actually
>>> share a split_index between two independent index_state objects.
>>
>> A cleaner way of doing this would be something to the line [1]
>>
>> move_index_extensions(>result, o->dst_index);
>>
>> near the end of this function. This could be where we compare the
>> result index with the source index's shared file and see if it's worth
>> keeping the shared index or not. Shared index is designed to work with
>> huge index files though, any operations that go through all index
>> entries will usually not be cheap. But at least it's safer.
>
> Yeah, it looks like move_index_extensions() currently has no logic for
> the split_index.  Adding it sounds to me like a patch series of its
> own, and I'm keen to limit additional changes since my patch series
> already broke things pretty badly once already.

Oh I'm not suggesting that you do it. I was simply pointing out
something I saw while I looked at this patch and surrounding area. And
it's definitely should be done separately (by whoever) since merge
logic is quite twisted as I understand it (then top it off with rename
logic)

>> [1] To me the second parameter should be src_index, not dst_index.
>> We're copying entries from _source_ index to "result" and we should
>> also copy extensions from the source index. That line happens to work
>> only when dst_index is the same as src_index, which is the common use
>> case so far.
>
> That makes sense; this sounds like another fix that should be
> submitted.  Did you want to submit a patch making that change?  Do you
> want me to?

I did not look careful enough to make sure it was right and submit a
patch. But it sounds like it could be another regression if dst_index
is now not the same as src_index (sorry I didn't look at your whole
stories and don't if dst_index != src_index is a new thing or not). If
dst_index is new, moving extensions from that to result index is
basically no-op, in other words we fail to copy necessary extensions
over.
-- 
Duy


Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary

2018-04-22 Thread Duy Nguyen
On Sun, Apr 22, 2018 at 5:58 PM, Ramsay Jones
 wrote:
>>> I think you need to try a little harder than this! ;-)
>>
>> Yeah. I did think about grepping the output but decided not to because
>> of gettext poison stuff and column output in "git help". If we do want
>> to test this, how about I extend --list-cmds= option to take a few
>> more parameters? --list-cmds=common would output all common commands,
>> --list-cmds= does the same for other command category. This
>> way we can verify without worrying about text formatting, paging or
>> translation.
>
> Hmm, my immediate reaction would be to prefer my simple tests.
> Yes, they are not exactly rigorous and they would be affected
> by changing the help formatting, but they are effective. ;-)
>
> [I don't think the formatting would change that often, or at
> all - whoever submits that patch would get to update the tests!]

Hmm.. for non-column output that's true. "git help" with column output
should probably fine as well because even though we add more and more
commands every month, these are not marked common (and unlikely so).
So yeah I agree.

> What did you think about adding the BUG() checks?

I was thinking if there was a way to fail the build after running
./generate-cmds.sh and generating empty output but it does not sound
easy to do. So yeah, BUG() checks sound good.

-- 
Duy


Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary

2018-04-22 Thread Duy Nguyen
On Sun, Apr 22, 2018 at 4:45 PM, Ramsay Jones
<ram...@ramsayjones.plus.com> wrote:
>
>
> On 21/04/18 17:56, Duy Nguyen wrote:
>> On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote:
>>> Changes:
>>>
>>> - remove the deprecated column in command-list.txt. My change break it
>>>   anyway if anyone uses it.
>>> - fix up failed tests that I marked in the RFC and kinda forgot about it.
>>> - fix bashisms in generate-cmdlist.sh
>>> - fix segfaul in "git help"
>>
>> Sorry I forgot the interdiff
>>
> [snip]
>
>> diff --git a/t/t0012-help.sh b/t/t0012-help.sh
>> index fd2a7f27dc..53208ab20e 100755
>> --- a/t/t0012-help.sh
>> +++ b/t/t0012-help.sh
>> @@ -25,6 +25,15 @@ test_expect_success "setup" '
>>   EOF
>>  '
>>
>> +# make sure to exercise these code paths, the output is a bit tricky
>> +# to verify
>> +test_expect_success 'basic help commands' '
>> + git help >/dev/null &&
>> + git help -a >/dev/null &&
>> + git help -g >/dev/null &&
>> + git help -av >/dev/null
>> +'
>> +
> I think you need to try a little harder than this! ;-)

Yeah. I did think about grepping the output but decided not to because
of gettext poison stuff and column output in "git help". If we do want
to test this, how about I extend --list-cmds= option to take a few
more parameters? --list-cmds=common would output all common commands,
--list-cmds= does the same for other command category. This
way we can verify without worrying about text formatting, paging or
translation.
-- 
Duy


Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst

2018-04-22 Thread Duy Nguyen
On Sat, Apr 21, 2018 at 9:37 PM, Elijah Newren  wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> Since we create a temporary index in o->result, then discard o->dst_index
> and overwrite it with o->result, when o->src_index == o->dst_index it is
> safe to just reuse o->src_index's split_index for o->result.  However,
> o->src_index and o->dst_index are specified separately in order to allow
> callers to have these be different.  In such a case, reusing
> o->src_index's split_index for o->result will cause the split_index to be
> shared.  If either index then has entries replaced or removed, it will
> result in the other index referring to free()'d memory.
>
> Signed-off-by: Elijah Newren 
> ---
>
> I still haven't wrapped my head around the split_index stuff entirely, so
> it's possible that
>
>   - the performance optimization isn't even valid when src == dst.  Could
> the original index be different enough from the result that we don't
> want its split_index?

This really depends on the use case of course. But when git checkout
is used for switching branches, unpack-trees will be used and unless
you switch between to vastly different branches, the updated entries
may be small compared to the entire index that sharing is still good.
If the result index is so different that it results in a huge index
file anyway, I believe we have code to recreate a new shared index to
keep its size down next time.

>   - there's a better, more performant fix or there is some way to actually
> share a split_index between two independent index_state objects.

A cleaner way of doing this would be something to the line [1]

move_index_extensions(>result, o->dst_index);

near the end of this function. This could be where we compare the
result index with the source index's shared file and see if it's worth
keeping the shared index or not. Shared index is designed to work with
huge index files though, any operations that go through all index
entries will usually not be cheap. But at least it's safer.

> However, with this fix, all the tests pass both normally and under
> GIT_TEST_SPLIT_INDEX=DareISayYes.  Without this patch, when
> GIT_TEST_SPLIT_INDEX is set, my directory rename detection series will fail
> several tests, as reported by SZEDER.

Yes, the change looks good.

[1] To me the second parameter should be src_index, not dst_index.
We're copying entries from _source_ index to "result" and we should
also copy extensions from the source index. That line happens to work
only when dst_index is the same as src_index, which is the common use
case so far.

>  unpack-trees.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 79fd97074e..b670415d4c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> o->result.timestamp.sec = o->src_index->timestamp.sec;
> o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> o->result.version = o->src_index->version;
> -   o->result.split_index = o->src_index->split_index;
> -   if (o->result.split_index)
> +   if (!o->src_index->split_index) {
> +   o->result.split_index = NULL;
> +   } else if (o->src_index == o->dst_index) {
> +   /*
> +* o->dst_index (and thus o->src_index) will be discarded
> +* and overwritten with o->result at the end of this function,
> +* so just use src_index's split_index to avoid having to
> +* create a new one.
> +*/
> +   o->result.split_index = o->src_index->split_index;
> o->result.split_index->refcount++;
> +   } else {
> +   o->result.split_index = init_split_index(>result);
> +   }
> hashcpy(o->result.sha1, o->src_index->sha1);
> o->merge_size = len;
> mark_all_ce_unused(o->src_index);
> --
> 2.17.0.296.gaac25b4b81
>



-- 
Duy


Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary

2018-04-21 Thread Duy Nguyen
On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote:
> Changes:
> 
> - remove the deprecated column in command-list.txt. My change break it
>   anyway if anyone uses it.
> - fix up failed tests that I marked in the RFC and kinda forgot about it.
> - fix bashisms in generate-cmdlist.sh
> - fix segfaul in "git help"

Sorry I forgot the interdiff

diff --git a/command-list.txt b/command-list.txt
index 0809a19184..1835f1a928 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -9,7 +9,7 @@ history  grow, mark and tweak your common history
 remote   collaborate (see also: git help workflows)
 
 ### command list (do not change this line)
-# command name  category [deprecated] [common]
+# command name  category[common]
 git-add mainporcelain   worktree
 git-am  mainporcelain
 git-annotateancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 9f17703aa7..7d17ca23f6 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -835,19 +835,23 @@ __git_complete_strategy ()
 }
 
 __git_commands () {
-   if test -n "${GIT_TESTING_COMMAND_COMPLETION:-}"
+   if test -n "$GIT_TESTING_COMPLETION"
then
-   printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
+   case "$1" in
+   porcelain)
+   printf "%s" "$GIT_TESTING_PORCELAIN_COMMAND_LIST";;
+   all)
+   printf "%s" "$GIT_TESTING_ALL_COMMAND_LIST";;
+   esac
else
git --list-cmds=$1
fi
 }
 
-__git_list_all_commands ()
+__git_list_commands ()
 {
local i IFS=" "$'\n'
-   local category=${1-all}
-   for i in $(__git_commands $category)
+   for i in $(__git_commands $1)
do
case $i in
*--*) : helper pattern;;
@@ -860,14 +864,14 @@ __git_all_commands=
 __git_compute_all_commands ()
 {
test -n "$__git_all_commands" ||
-   __git_all_commands=$(__git_list_all_commands)
+   __git_all_commands=$(__git_list_commands all)
 }
 
 __git_porcelain_commands=
 __git_compute_porcelain_commands ()
 {
test -n "$__git_porcelain_commands" ||
-   __git_porcelain_commands=$(__git_list_all_commands porcelain)
+   __git_porcelain_commands=$(__git_list_commands porcelain)
 }
 
 # Lists all set config variables starting with the given section prefix,
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e35f3e357b..86d59419b3 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -36,7 +36,7 @@ sed -n '
' "$1"
 printf '};\n\n'
 
-echo "#define GROUP_NONE 0xff /* no common group */"
+echo "#define GROUP_NONE 0xff"
 n=0
 while read grp
 do
@@ -45,15 +45,6 @@ do
 done <"$grps"
 echo
 
-echo '/*'
-printf 'static const char *cmd_categories[] = {\n'
-category_list "$1" |
-while read category; do
-   printf '\t\"'$category'\",\n'
-done
-printf '\tNULL\n};\n\n'
-echo '*/'
-
 n=0
 category_list "$1" |
 while read category; do
@@ -68,10 +59,11 @@ sort |
 while read cmd category tags
 do
if [ "$category" = guide ]; then
-   name=${cmd/git}
+   prefix=git
else
-   name=${cmd/git-}
+   prefix=git-
fi
+   name=$(echo $cmd | sed "s/^${prefix}//")
sed -n '
/^NAME/,/'"$cmd"'/H
${
diff --git a/help.c b/help.c
index a44f4a113e..88127fdd6f 100644
--- a/help.c
+++ b/help.c
@@ -201,7 +201,8 @@ static void extract_common_cmds(struct cmdname_help 
**p_common_cmds,
for (i = 0; i < ARRAY_SIZE(command_list); i++) {
const struct cmdname_help *cmd = command_list + i;
 
-   if (cmd->category != CAT_mainporcelain)
+   if (cmd->category != CAT_mainporcelain ||
+   cmd->group == GROUP_NONE)
continue;
 
common_cmds[nr++] = *cmd;
diff --git a/t/t0012-help.sh b/t/t0012-help.sh
index fd2a7f27dc..53208ab20e 100755
--- a/t/t0012-help.sh
+++ b/t/t0012-help.sh
@@ -25,6 +25,15 @@ test_expect_success "setup" '
EOF
 '
 
+# make sure to exercise these code paths, the output is a bit tricky
+# to verify
+test_expect_success 'basic help commands' '
+   git help >/dev/null &&
+   git help -a >/dev/null &&
+   git help -g >/dev/null &&
+   git help -av >/dev/null
+'
+
 test_expect_success "works for commands and guides by default" '
configure_help &&
git help status &&
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 4bfd26ddf9..5a23a46fcf 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -13,7 +13,7 @@ complete ()
return 0
 }
 
-# Be careful when updating this list:
+# Be careful when 

Re: [PATCH v2 1/2] completion: stop showing 'save' for stash by default

2018-04-19 Thread Duy Nguyen
On Fri, Apr 20, 2018 at 1:25 AM, Thomas Gummerer  wrote:
> The 'save' subcommand in git stash has been deprecated in
> fd2ebf14db ("stash: mark "git stash save" deprecated in the man page",
> 2017-10-22).
>
> Stop showing it when the users enters 'git stash ' or 'git stash
> s'.  Keep showing it however when the user enters 'git stash sa'
> or any more characters of the 'save' subcommand.

I don't think this is worth it. You only save two keystrokes for 've'
and already waste one on .
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-18 Thread Duy Nguyen
On Wed, Apr 18, 2018 at 12:47 AM, Philip Oakley  wrote:
>>> > Is that something I should add to my todo to add a 'guide' category >
>>> > etc.?
>>>
>>> I added it too [1]. Not sure if you want anything more on top though.
>
>
> What I've seen is looking good - I've not had as much time as I'd like..
>
> I'm not sure of the status of the git/generate-cmdlist.sh though. Should
> that also be updated, or did I miss that?

Yes it's updated by other patches in the same thread.
-- 
Duy


Re: [PATCH/RFC] completion: complete all possible -no-

2018-04-18 Thread Duy Nguyen
On Wed, Apr 18, 2018 at 5:43 AM, Junio C Hamano  wrote:
> So, the earlier mention of "clone --no-checkout" sounded about not
> losing this historical practice, but (desirabilty of magic number 4
> aside) this "show first handful of --no-foo" feature is not about
> historical practice but is forward looking, in the sense that you do
> not mark "important" negated options in the source, which would be a
> way to handle the histrical "clone --no-checkout", but let the
> machinery mechanically choose among --no-foo (with the stupid choice
> criterion "first four are shown").

Well you kinda mark important in the source too. --no-checkout for
exampled is declared as OPT_BOOL(0, "no-checkout"... and parse-options
code has to add the double-negative form --checkout back [1].

The "first four" is chosen after carefully examining all commands and
observing that none of them have more than 4 "important" --no-. But
yes it is questionable and I should be able to do better to separate
the favorable --no- from the other extra and most-of-the-time-useless
--no- options.

> That allows other commands to
> have many --no-foo form without overwhelming the choices, but I am
> not sure if it is much better than a possible alternative of only
> showing --no-foo for more "important" foo's when show_gitcomp() is
> asked to list all of things. It would certainly be a more involved
> solution, that might require an update to the way how the choices
> are precomputed (you'd end up having to keep a separate "use this
> list when completing '--no-'" in addition to the normal list).

I did think about this alternative and was still undecided. Suppose
that you have less than 4 "important" --no- options, showing some
extra ones to me does not really hurt anything and if we could show
more options (within the same screen space) we should. But on the
other hand maintaining this magic number could be a maintenance
nightmare... Yeah I think I'm shifting towards no magic number now.

[1] These double negative options will _always_ show up,  there is no
easy way to hide them because they don't start with --no-. But we
don't have a lot of options starting with "no-" so it's probably fine.
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote:
> On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote:
> > From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 PM
> >
> >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org>
> >> wrote:
> >>>
> >>> I'm only just catching up, but does/can this series also capture the
> >>> non-command guides that are available in git so that the 'git help -g'
> >>> can
> >>> begin to list them all?
> >>
> >>
> >> It currently does not. But I don't see why it should not. This should
> >> allow git.txt to list all the guides too, for people who skip "git
> >> help" and go hard core mode with "man git". Thanks for bringing this
> >> up.
> >> --
> >> Duy
> >>
> > Is that something I should add to my todo to add a 'guide' category etc.?
> 
> I added it too [1]. Not sure if you want anything more on top though.

The "anything more" that at least I had in mind was something like
this. Though I'm not sure if it's a good thing to replace a hand
crafted section with an automatedly generated one. This patch on top
combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of
documents/guides are extracted from command-list.txt

-- 8< --
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6232143cb9..3e0ecd2e11 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) 
build-docdep.perl
 
 cmds_txt = cmds-ancillaryinterrogators.txt \
cmds-ancillarymanipulators.txt \
+   cmds-guide.txt \
cmds-mainporcelain.txt \
cmds-plumbinginterrogators.txt \
cmds-plumbingmanipulators.txt \
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 5aa73cfe45..e158bd9b96 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -54,6 +54,7 @@ for (sort <>) {
 
 for my $cat (qw(ancillaryinterrogators
ancillarymanipulators
+   guide
mainporcelain
plumbinginterrogators
plumbingmanipulators
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..d60d2ae0c7 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -808,29 +808,6 @@ The index is also capable of storing multiple entries 
(called "stages")
 for a given pathname.  These stages are used to hold the various
 unmerged version of a file when a merge is in progress.
 
-FURTHER DOCUMENTATION
--
-
-See the references in the "description" section to get started
-using Git.  The following is probably more detail than necessary
-for a first-time user.
-
-The link:user-manual.html#git-concepts[Git concepts chapter of the
-user-manual] and linkgit:gitcore-tutorial[7] both provide
-introductions to the underlying Git architecture.
-
-See linkgit:gitworkflows[7] for an overview of recommended workflows.
-
-See also the link:howto-index.html[howto] documents for some useful
-examples.
-
-The internals are documented in the
-link:technical/api-index.html[Git API documentation].
-
-Users migrating from CVS may also want to
-read linkgit:gitcvs-migration[7].
-
-
 Authors
 ---
 Git was started by Linus Torvalds, and is currently maintained by Junio
@@ -854,11 +831,16 @@ the Git Security mailing list 
<git-secur...@googlegroups.com>.
 
 SEE ALSO
 
-linkgit:gittutorial[7], linkgit:gittutorial-2[7],
-linkgit:giteveryday[7], linkgit:gitcvs-migration[7],
-linkgit:gitglossary[7], linkgit:gitcore-tutorial[7],
-linkgit:gitcli[7], link:user-manual.html[The Git User's Manual],
-linkgit:gitworkflows[7]
+
+See the references in the "description" section to get started
+using Git.  The following is probably more detail than necessary
+for a first-time user.
+
+include::cmds-guide.txt[]
+
+See also the link:howto-index.html[howto] documents for some useful
+examples. The internals are documented in the
+link:technical/api-index.html[Git API documentation].
 
 GIT
 ---
diff --git a/command-list.txt b/command-list.txt
index 1835f1a928..f26b8acd52 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -150,10 +150,14 @@ git-whatchanged 
ancillaryinterrogators
 git-worktreemainporcelain
 git-write-tree  plumbingmanipulators
 gitattributes   guide
+gitcvs-migrationguide
+gitcli  guide
+gitcore-tutorialguide
 giteveryday guide
 gitglossary guide
 gitignore   guide
 gitmodules  guide
 gitrevisionsguide
 gittutorial guide
+gittutorial-2   guide
 gitworkflowsguide
-- 8< --


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote:
> From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 PM
>
>> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org>
>> wrote:
>>>
>>> I'm only just catching up, but does/can this series also capture the
>>> non-command guides that are available in git so that the 'git help -g'
>>> can
>>> begin to list them all?
>>
>>
>> It currently does not. But I don't see why it should not. This should
>> allow git.txt to list all the guides too, for people who skip "git
>> help" and go hard core mode with "man git". Thanks for bringing this
>> up.
>> --
>> Duy
>>
> Is that something I should add to my todo to add a 'guide' category etc.?

I added it too [1]. Not sure if you want anything more on top though.

[1] https://public-inbox.org/git/20180415164238.9107-7-pclo...@gmail.com/T/#u
-- 
Duy


Re: [PATCH v2 3/6] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-16 Thread Duy Nguyen
On Mon, Apr 16, 2018 at 8:28 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> @@ -23,28 +36,44 @@ sed -n '
>>   ' "$1"
>>  printf '};\n\n'
>>
>> +echo "#define GROUP_NONE 0xff /* no common group */"
>
> Some later code forgets about this value, and causes "git" to
> segfault at the end of this entire series.
>
> Namely, here:
>
>> - for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
>> + for (i = 0; i < nr; i++) {
>>   if (common_cmds[i].group != current_grp) {
>>   printf("\n%s\n", 
>> _(common_cmd_groups[common_cmds[i].group]));
>>   current_grp = common_cmds[i].group;
>
> where common_cmd_groups[] gets overrun.

Argh!! I thought I tested everything. Sorry for the sloppy quality.

>
> Here is a squash I'll queue on top to keep the tip of 'pu' at least
> buildable.

Thanks. It's actually interesting that we have main porcelain commands
that belong to no group. I'll try to classify them so that they show
up as well.


-- 
Duy


Re: .gitattributes lookup doesn't respect GIT_WORK_TREE

2018-04-16 Thread Duy Nguyen
On Mon, Apr 16, 2018 at 12:40 AM, Andreas Schwab  wrote:
> On Apr 16 2018, Junio C Hamano  wrote:
>
>> I may be mistaken (I do not have the code in front of me right now)
>> but IIRC after the setup.c code runs (which happens quite early in
>> the sequence that starts from git.c::cmd_main()), the Git process
>> moves to the top level of the working tree,
>
> git log/show don't appear to do that.

Yeah we lazily set up worktree in some cases. Elsewhere in the
chdir-notify thread, I suggested that we set up worktree
unconditionally (but do not die setup fails; only die when a command
specifically requests for worktree). That work would help make this
work in most cases. But it's not materialized yet.
-- 
Duy


Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-15 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshine  wrote:
>> +awk '{print $2;}' |
>
> At one time, Junio expressed concerns[2] about having an 'awk'
> dependency in the build system (in fact, with regards to this same
> generation process). Whether he still has such concerns is unknown,
> but it should be easy enough to avoid it here (and below).
>
> [2]: https://public-inbox.org/git/20150519004356.GA12854@flurp.local/

I'll stick with awk to avoid too much headache with regular
expressions (replacements are welcome though). We do use awk in our
test suite so it should be ok (who builds git and runs it without
testing?)
-- 
Duy


Re: [PATCH v1 0/2] fsexcludes: Add programmatic way to exclude files

2018-04-14 Thread Duy Nguyen
On Tue, Apr 10, 2018 at 11:04 PM, Ben Peart  wrote:
> In git repos with large working directories an external file system monitor
> (like fsmonitor or gvfs) can track what files in the working directory have 
> been
> modified.  This information can be used to speed up git operations that scale
> based on the size of the working directory so that they become O(# of modified
> files) vs O(# of files in the working directory).
>
> The fsmonitor patch series added logic to limit what files git had to stat() 
> to
> the set of modified files provided by the fsmonitor hook proc.  It also used 
> the
> untracked cache (if enabled) to limit the files/folders git had to scan 
> looking
> for new/untracked files.  GVFS is another external file system model that also
> speeds up git working directory based operations that has been using a 
> different
> mechanism (programmatically generating an excludes file) to enable git to be
> O(# of modified files).
>
> This patch series will introduce a new way to limit git�s traversal of the
> working directory that does not require the untracked cache (fsmonitor) or 
> using
> the excludes feature (GVFS).  It does this by enhancing the existing excludes
> logic in dir.c to support a new �File System Excludes� or fsexcludes API that 
> is
> better tuned to these programmatic applications.

I have not had a chance to really look at the patches yet but I think
these three paragraphs should somehow be included in the commit
description of 1/2 (or spread out between 1/2 and 2/2). 1/2
description for example briefly talks about how to use the new thing,
but not really tell what it's for, why you need to add it.
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-14 Thread Duy Nguyen
On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley  wrote:
> I'm only just catching up, but does/can this series also capture the
> non-command guides that are available in git so that the 'git help -g' can
> begin to list them all?

It currently does not. But I don't see why it should not. This should
allow git.txt to list all the guides too, for people who skip "git
help" and go hard core mode with "man git". Thanks for bringing this
up.
-- 
Duy


Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshine  wrote:
> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy
>  wrote:
>> common-cmds.h is used to extract the list of common commands (by
>> group) and a one-line summary of each command. Some information is
>> dropped, for example command category or summary of other commands.
>> Update generate-cmdlist.sh to keep all the information. The extra info
>> will be used shortly.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
>> @@ -2,9 +2,10 @@
>>  struct cmdname_help {
>> -   char name[16];
>> +   char name[32];
>> char help[80];
>> -   unsigned char group;
>> +   unsigned int category;
>> +   unsigned int group;
>>  };
>> @@ -23,27 +24,50 @@ sed -n '
>> +echo "#define GROUP_NONE 0xff /* no common group */"
>> +echo "#define GROUP_ 0xff /* no common group */"
>
> Meh, this "GROUP_" alias of "GROUP_NONE" isn't so nice.

Yeah. I don't want to mess too much with shell script. I wonder if we
should instead kill this script and extend Documentation/cmd-list.perl
to handle this task too. It would be much nicer to write and maintain
the script. The downside is NO_PERL builds will have no commands in
"git help".
-- 
Duy


Re: [PATCH 1/2] git-worktree.txt: recommend 'git worktree remove' over manual deletion

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 9:33 AM, Eric Sunshine  wrote:
> When cc73385cf6 (worktree remove: new command, 2018-02-12) implemented
> and documented 'git worktree remove', it forgot to update existing
> instructions suggesting manual deletion. Fix this oversight by
> recommending 'git worktree remove' instead.

Too bad we can't show off "git worktree move" :) The patches look fine.
-- 
Duy


Re: What's cooking in git.git (Apr 2018, #01; Mon, 9)

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 12:21 PM, Junio C Hamano  wrote:
> * sb/packfiles-in-repository (2018-03-26) 12 commits
>   (merged to 'next' on 2018-03-30 at caa68db14d)
>  + packfile: keep prepare_packed_git() private
>  + packfile: allow find_pack_entry to handle arbitrary repositories
>  + packfile: add repository argument to find_pack_entry
>  + packfile: allow reprepare_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git_one to handle arbitrary repositories
>  + packfile: add repository argument to reprepare_packed_git
>  + packfile: add repository argument to prepare_packed_git
>  + packfile: add repository argument to prepare_packed_git_one
>  + packfile: allow install_packed_git to handle arbitrary repositories
>  + packfile: allow rearrange_packed_git to handle arbitrary repositories
>  + packfile: allow prepare_packed_git_mru to handle arbitrary repositories
>  (this branch uses nd/remove-ignore-env-field and sb/object-store; is tangled 
> with sb/submodule-move-nested.)
>
>  Refactoring of the internal global data structure continues.
>
>  Is this ready for 'master' by now?

I think so. Things start to look much nicer.
-- 
Duy


Re: [PATCH/RFC 5/5] help: add "-a --verbose" to list all commands with synopsis

2018-04-09 Thread Duy Nguyen
On Mon, Apr 9, 2018 at 11:55 AM, Eric Sunshine  wrote:
> On Mon, Apr 9, 2018 at 5:47 AM, Junio C Hamano  wrote:
>> Eric Sunshine  writes:
>>> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy
>>>  wrote:
 +   switch (category) {
 +   case CAT_ancillaryinterrogators: return _("Ancillary 
 interrogators");
 +   case CAT_ancillarymanipulators: return _("Ancillary manipulators");
 +   case CAT_foreignscminterface: return _("Foreign SCM interface");
 +   case CAT_mainporcelain: return _("Main porcelain");
 +   case CAT_plumbinginterrogators: return _("Plumbing interrogators");
 +   case CAT_plumbingmanipulators: return _("Plumbing interrogators");
>>>
>>> s/interrogators"/manipulators"/
>>>
 +   case CAT_purehelpers: return _("Pure helpers");
 +   case CAT_synchelpers: return _("Sync helpers");
 +   case CAT_synchingrepositories: return _("Synching repositories");
>>
>> Somehow this screams "an array of strings" at me.  Aren't this
>> CAT_things small and dense enum?
>
> Duy's modified generate-cmdlist.sh does actually output an array of
> strings for these, but the (generated) array is commented out in this
> RFC. I suppose the reason it's not presently used is because the array
> looks like this:
>
> static const char *cmd_categories[] = {
> "ancillaryinterrogators",
> "ancillarymanipulators",
> "foreignscminterface",
> "mainporcelain",
> "plumbinginterrogators",
> "plumbingmanipulators",
> "purehelpers",
> "synchelpers",
> "synchingrepositories",
>  NULL
> };
>
> which doesn't give quite the human-friendly output he'd like. The
> series is RFC, after all.

Yep.

> A possible approach to fix it would be to add a new "### categories"
> section to command-list.txt which associates those category tags
> ("ancillaryinterrogators") with human-readable counterparts
> ("Ancillary interrogators").

Or extract the headlines from git.txt but that's not easy since it's
not consistent. We could manually recreate the same grouping as in
git.txt too, it's probably nicer than just printing groups sorted by
category id.
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-07 Thread Duy Nguyen
On Sat, Apr 7, 2018 at 2:36 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Anyway, I see you've pushed a new version with DEVOPTS. I'll submit mine
> on top of that once your new version lands (unless you want to try to
> integrate it yourself).

Actually I think I'll just drop both EAGER_DEVELOPER and DEVOTPS.
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-07 Thread Duy Nguyen
On Fri, Apr 6, 2018 at 11:42 PM, Jeff King <p...@peff.net> wrote:
> On Tue, Apr 03, 2018 at 05:17:00PM +0200, Duy Nguyen wrote:
>
>> It's not that complex. With the EAGER_DEVELOPER patch removed, we can
>> have something like this where eager devs just need to put
>>
>> DEVOPTS = gentle no-suppression
>>
>> and you put
>>
>> DEVOPTS = gentle
>>
>> (bad naming, I didn't spend time thinking about names)
>
> It seems to me like we're losing the point of DEVELOPER here. I thought
> the idea was to have a turn-key flag you could set to get extra linting
> on your commits. But now we're tweaking all kinds of individual options.
> At some point are we better off just letting you put "-Wno-foo" in your
> CFLAGS yourself?

It's because what AEvar wanted is no longer a dev thing :)

> I don't mind the version-based checks because they're automatic, so the
> feature remains turn-key. But this kind of DEVOPTS seems like a step in
> the wrong direction.
>
> -Peff



-- 
Duy


Re: [RFC PATCH 00/19] object-store refactoring 3 (replace objects, main ref store)

2018-04-07 Thread Duy Nguyen
On Sat, Apr 7, 2018 at 1:21 AM, Stefan Beller  wrote: *
> diff --git a/repository.h b/repository.h
> index 09df94a472..2922d3a28b 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -26,6 +26,11 @@ struct repository {
>  */
> struct raw_object_store *objects;
>
> +   /*
> +* The store in which the refs are hold.
> +*/
> +   struct ref_store *main_ref_store;
> +

We probably should drop the main_ prefix here because this could also
be a submodule ref store (when the repository is about a submodule).
worktree ref store is a different story and I don't know how to best
present it here yet (I'm still thinking a separate struct repository).
But we can worry about that when struct repository supports multiple
worktree.
-- 
Duy


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-04-03 Thread Duy Nguyen
On Tue, Apr 3, 2018 at 11:31 AM, Johannes Schindelin
 wrote:
> It is very frustrating to spend that much time with only little gains here
> and there (and BusyBox-w32 is simply not robust enough yet, apart from
> also not showing a significant improvement in performance).

You still use busybox-w32? It's amazing that people still use it after
the linux subsystem comes. busybox has a lot of commands built in
(i.e. no new processes) and unless rmyorston did something more, the
"fork" in ash shell should be as cheap as it could be: it simply
serializes data and sends to the new process.

If performance does not improve, I guess the process creation cost
dominates. There's not much we could do except moving away from the
zillion processes test framework: either something C-based or another
scripting language (ok I don't want to bring this up again)
-- 
Duy


Re: [PATCH] t2028: tighten grep expression to make "move worktree" test more robust

2018-04-03 Thread Duy Nguyen
On Tue, Apr 3, 2018 at 11:25 AM, Eric Sunshine  wrote:
> Following a rename of worktree "source" to "destination", the "move
> worktree" test uses grep to verify that the output of "git worktree list
> --porcelain" does not contain "source" (and does contain "destination").
> Unfortunately, the grep expression is too loose and can match
> unexpectedly. For example, if component of the test trash directory path
> matches "source" (e.g. "/home/me/sources/git/t/trash*"), then the test
> will be fooled into thinking that "source" still exists. Tighten the
> expression to avoid such accidental matches.
>
> While at it, drop an unused variable ("toplevel") from the test and
> tighten a similarly too-loose expression in a related test.
>
> Reported-by: Jens Krüger 
> Signed-off-by: Eric Sunshine 
> ---
>
> t2028 in 2.17.0 can be fooled into failing depending upon the path of
> the test's trash directory. The problem is with the test being too
> loose, not with Git itself. Problem report and diagnosis here[1].
>
> [1]: 
> https://public-inbox.org/git/26a00c2b-c588-68d5-7085-22310c20e...@frm2.tum.de/T/#m994cdb29f141656b0ab48dd0d152432c7e86fc20

Thanks both. It was great to scroll to the latest mails and saw that I
didn't have to do anything else :)
-- 
Duy


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-04-03 Thread Duy Nguyen
On Tue, Apr 03, 2018 at 11:19:46AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Mar 31 2018, Duy Nguyen wrote:
> 
> > On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason
> > <ava...@gmail.com> wrote:
> >> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
> >> which (approximately) enables -Wextra so that any combination of them
> >> and -Werror can be set.
> >>
> >> I've long wanted to use DEVELOPER=1 in my production builds, but on
> >> some old systems I still get warnings, and thus the build would
> >> fail. However if the build/tests fail for some other reason, it would
> >> still be useful to scroll up and see what the relevant code is warning
> >> about.
> >>
> >> This change allows for that. Now setting DEVELOPER will set -Werror as
> >> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
> >> but without -Werror.
> >>
> >> I've renamed the newly added EAGER_DEVELOPER flag to
> >> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
> >> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
> >> inventing some new name of our own (VERY_EAGER_DEVELOPER?).
> >
> > Before we go with zillions of *DEVELOPER* maybe we can have something
> > like DEVOPTS where you can give multiple keywords to a single variable
> > to influence config.mak.dev. This is similar to COMPILER_FEATURES we
> > already have in there, but now it's driven by the dev instead of the
> > compiler. So you can have keywords like "gentle" (no -Werror) "extra"
> > (-Wextra with no suppression) and something else.
> 
> We could do that, but I don't think it's that bad. This patch is one
> extra option on top of yours,

And that eager this was marked experimental because I was not sure if
it was the right way :)

> and it's not going to result in some combinatorial explosion of
> options, i.e. if we add DEVELOPER_PEDANTIC we'll just add one extra
> flag.
> 
> But sure, we could make this some string we'd need to parse out similar
> to COMPILER_FEATURES, it just seems more complex to me for this task.

It's not that complex. With the EAGER_DEVELOPER patch removed, we can
have something like this where eager devs just need to put

DEVOPTS = gentle no-suppression

and you put

DEVOPTS = gentle

(bad naming, I didn't spend time thinking about names)

-- 8< --
diff --git a/Makefile b/Makefile
index e6680a8977..7b4e038e1d 100644
--- a/Makefile
+++ b/Makefile
@@ -435,6 +435,11 @@ all::
 # Define DEVELOPER to enable more compiler warnings. Compiler version
 # and faimily are auto detected, but could be overridden by defining
 # COMPILER_FEATURES (see config.mak.dev)
+#
+# When DEVELOPER is set, DEVOPTS can be used to control compiler options.
+# This variable contains keywords separated by whitespace. Two keywords
+# are recognized: "gentle" removes -Werror and "no-suppression"
+# removes all "-Wno-" options.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
diff --git a/config.mak.dev b/config.mak.dev
index 716a14ecc7..1d7aba6a6a 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -1,4 +1,6 @@
+ifeq ($(filter gentle,$(DEVOPTS)),)
 CFLAGS += -Werror
+endif
 CFLAGS += -Wdeclaration-after-statement
 CFLAGS += -Wno-format-zero-length
 CFLAGS += -Wold-style-definition
@@ -21,6 +23,7 @@ CFLAGS += -Wextra
 # if a function is public, there should be a prototype and the right
 # header file should be included. If not, it should be static.
 CFLAGS += -Wmissing-prototypes
+ifeq ($(filter no-suppression,$(DEVOPTS)),)
 # These are disabled because we have these all over the place.
 CFLAGS += -Wno-empty-body
 CFLAGS += -Wno-missing-field-initializers
@@ -28,6 +31,7 @@ CFLAGS += -Wno-sign-compare
 CFLAGS += -Wno-unused-function
 CFLAGS += -Wno-unused-parameter
 endif
+endif
 
 # uninitialized warnings on gcc 4.9.2 in xdiff/xdiffi.c and config.c
 # not worth fixing since newer compilers correctly stop complaining
-- 8< --


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
> which (approximately) enables -Wextra so that any combination of them
> and -Werror can be set.
>
> I've long wanted to use DEVELOPER=1 in my production builds, but on
> some old systems I still get warnings, and thus the build would
> fail. However if the build/tests fail for some other reason, it would
> still be useful to scroll up and see what the relevant code is warning
> about.
>
> This change allows for that. Now setting DEVELOPER will set -Werror as
> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
> but without -Werror.
>
> I've renamed the newly added EAGER_DEVELOPER flag to
> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
> inventing some new name of our own (VERY_EAGER_DEVELOPER?).

Before we go with zillions of *DEVELOPER* maybe we can have something
like DEVOPTS where you can give multiple keywords to a single variable
to influence config.mak.dev. This is similar to COMPILER_FEATURES we
already have in there, but now it's driven by the dev instead of the
compiler. So you can have keywords like "gentle" (no -Werror) "extra"
(-Wextra with no suppression) and something else.
-- 
Duy


Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 1:36 PM, Ævar Arnfjörð Bjarmason
 wrote:
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>>  GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>>  path where there are more than 1024 packs even if the actual number of
>>  packs in repository is below this limit.
>>
>> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
>> -code path where we do not cache objecct size in memory and read it
>> -from existing packs on demand. This normally only happens when the
>> -object size is over 2GB. This variable forces the code path on any
>> -object larger than 2^ bytes.
>
> The docs here say set these env variables, but actually
> GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to
> set a bool value.
>
> I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just
> copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error
> out since it's expecting bool, not the env variable to be set.
>
> I really don't care which we use, but let's use either if(getenv()) or
> if(git_env_bool()) consistently, and then have the docs either say "if
> set" or "if set to a boolean value (see git-config(1))".

I'll change GIT_TEST_SPLIT_INDEX to boolean too since I document it
here anyway. Will wait for a while though to see if anything else
should be part of v9.
-- 
Duy


<    2   3   4   5   6   7   8   9   10   11   >