Re: [PATCH] pack-format.txt: more details on pack file format
On Tue, May 8, 2018 at 8:21 PM, Ævar Arnfjörð Bjarmasonwrote: > > On Tue, May 08 2018, Nguyễn Thái Ngọc Duy wrote: > >> The current document mentions OBJ_* constants without their actual >> values. A git developer would know these are from cache.h but that's >> not very friendly to a person who wants to read this file to implement >> a pack file parser. >> >> Similarly, the deltified representation is not documented at all (the >> "document" is basically patch-delta.c). Translate that C code in >> English. > > Thanks, will drop my version from v4, although a comment for the enum in > cache.h pointing the reader to these docs would be very useful. True. I will add some together with the pack-format.txt update. -- Duy
Re: [PATCH] pack-format.txt: more details on pack file format
On Tue, May 8, 2018 at 7:23 PM, Stefan Bellerwrote: >> While at there, I also add some text about this obscure delta format. >> We occasionally have questions about this on the mailing list if I >> remember correctly. > > Let me see if I can understand it, as I am not well versed in the > delta format, so ideally I would understand it from the patch here? Well yes. I don't expect my first version to be that easy to understand. This is where you come in to help ;-) >> +Valid object types are: >> + >> +- OBJ_COMMIT (1) >> +- OBJ_TREE (2) >> +- OBJ_BLOB (3) >> +- OBJ_TAG (4) >> +- OBJ_OFS_DELTA (6) >> +- OBJ_REF_DELTA (7) >> + >> +Type 5 is reserved for future expansion. > > and type 0 as well, but that is not spelled out? type 0 is invalid. I think in some encoding it's not even possible to encode zero. Anyway yes it should be spelled out. > >> +Deltified representation > > Does this refer to OFS delta as well as REF deltas? Yes. Both OFS and REF deltas have the same "body" which is what this part is about. The differences between OFS and REF deltas are not described (in fact I don't think we describe what OFS and REF deltas are at all). >> is a sequence of one byte command optionally >> +followed by more data for the command. The following commands are >> +recognized: > > So a Deltified representation of an object is a 6 or 7 in the 3 bit type > and then the length. Then a command is shown how to construct > the object based on other objects. Can there be more commands? > >> +- If bit 7 is set, the remaining bits in the command byte specifies >> + how to extract copy offset and size to copy. The following must be >> + evaluated in this exact order: > > So there are 2 modes, and the high bit indicates which mode is used. > You start describing the more complicated mode first, > maybe give names to both of them? "direct copy" (below) and > "compressed copy with offset" ? I started to update this more because even this text is hard to get even to me. So let's get the background first. We have a source object somewhere (the object name comes from ofs/ref delta's header), basically we have the whole content. This delta thingy tells us how to use that source object to create a new (target) object. The delta is actually a sequence of instructions (of variable length). One is for copying from the source object. The other copies from the delta itself (e.g. this is new data in the target which is not available anywhere in the source object to copy from). The last bit of the first byte determines what instruction type it is. >> + - If bit 0 is set, the following byte contains bits 0-7 of the copy >> +offset (this also resets all other bits in the copy offset to >> +zero). >> + - If bit 1 is set, the following byte contains bits 8-15 of the copy >> +offset. >> + - If bit 2 is set, the following byte contains bits 16-23 of the >> +copy offset. >> + - If bit 3 is set, the following byte contains bits 24-31 of the >> +copy offset. > > I assume these bits are exclusive, i.e. if bit 3 is set, bits 0-2 are not > allowed to be set. What happens if they are set, do we care? > > If bit 3 is set, then the following byte contains 24-31 of the copy offset, > where is the rest? Do I wait for another command byte with > bits 2,1,0 to learn about the body offsets, or do they follow the > following byte? Something like: > > "If bit 3 is set, then the next 4 bytes are the copy offset, > in network byte order" My first attempt at "translating to English" is like a constructing C from assembly: it's horrible. The instruction looks like this bit 0123 4 5 6 +--+++++--+--+--+ | 1xxx | offset | offset | offset | offset | size | size | size | +--+++++--+--+--+ Here you can see it in its full form, each box represents a byte. The first byte has bit 7 set as mentioned. We can see here that offsets (where to copy from in the source object) takes 4 bytes and size (how many bytes to copy) takes 3. Offset size size is in LSB order. The "xxx" part lets us shrink this down. If the offset can fit in 16 bits, there's no reason to waste the last two bytes describing zero. Each 'x' marks whether the corresponding byte is present. The bit number is in the first row. So if you have offset 255 and size 1, the instruction is three bytes 10010001b, 255, 1. The octets on "bit column" 1, 2, 3, 5 and 6 are missing because the corresponding bit in the first bit is not set. >> + - If bit 4 is set, the following byte contains bits 0-7 of the copy >> +size (this also resets all other bits in the copy size to zero_. >> + - If bit 5 is set, the following byte contains bits 8-15 of the copy >> +size. >> + - If bit 6 is set, the following byte contains bits 16-23 of the >> +copy size. > > bits 4-7 seem to be another
Re: [PATCH v3 04/12] cache.h: add comment explaining the order in object_type
On Tue, May 1, 2018 at 8:40 PM, Ævar Arnfjörð Bjarmasonwrote: > The order in the enum might seem arbitrary, and isn't explained by > 72518e9c26 ("more lightweight revalidation while reusing deflated > stream in packing", 2006-09-03) which added it. > > Derrick Stolee suggested that it's ordered topologically in > 5f8b1ec1-258d-1acc-133e-a7c248b40...@gmail.com. Makes sense to me, add > that as a comment. > > Helped-by: Derrick Stolee > Signed-off-by: Ævar Arnfjörð Bjarmason > --- > cache.h | 8 > 1 file changed, 8 insertions(+) > > diff --git a/cache.h b/cache.h > index 77b7acebb6..354903c3ea 100644 > --- a/cache.h > +++ b/cache.h > @@ -376,6 +376,14 @@ extern void free_name_hash(struct index_state *istate); > enum object_type { > OBJ_BAD = -1, > OBJ_NONE = 0, > + /* > +* Why have our our "real" object types in this order? They're > +* ordered topologically: > +* > +* tag(4)-> commit(1), tree(2), blob(3) > +* commit(1) -> tree(2) > +* tree(2) -> blob(3) > +*/ I think it's more important that these constants are part of the pack file format. Even if it follows some order now, when a new object type comes, you cannot just reorder to keep things look nice because then you break pack file access. I'm afraid this comment suggests that these numbers are just about order, which is very wrong. > OBJ_COMMIT = 1, > OBJ_TREE = 2, > OBJ_BLOB = 3, -- Duy
Re: [PATCH/RFC] completion: complete all possible -no-
On Mon, Apr 23, 2018 at 7:36 AM, Eric Sunshinewrote: > I haven't looked at the implementation, so this may be an entirely > stupid suggestion, but would it be possible to instead render the > completions as? > > % git checkout -- > --[no-]conflict= --[no-]patch > --[no-]detach --[no-]progress > --[no-]ignore-other-worktrees --[no-]quiet > --[no-]ignore-skip-worktree-bits --[no-]recurse-submodules > --[no-]merge --theirs > --[no-]orphan= --[no-]track > --ours > > This would address the problem of the --no-* options taking double the > screen space. It took me so long to reply partly because I remember seeing some guy doing clever trick with tab completion that also shows a short help text in addition to the complete words. I could not find that again and from my reading (also internet searching) it's probably not possible to do this without trickery. > It's also more intuitive than that lone and somewhat weird-looking > "--no-" suggestion. It's not that weird if you think about file path completion, where you complete one path component at a time not full path, bash just does not show you full paths to everything. I'm arguing about this because I want to see your reaction, because I'm thinking of doing the very same thing for config completion. Right now "git config " gives you two pages of all available config variables. I'm thinking that we "git config " just shows the groups, e.g. > ~/w/git $ git config add. interactive. advice. log. alias.mailmap. am. man. Only when you do "git config log." that it shows you log.* -- Duy
Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
On Tue, May 8, 2018 at 12:59 AM, Stefan Bellerwrote: > @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o) > void parsed_object_pool_clear(struct parsed_object_pool *o) > { > /* > -* TOOD free objects in o->obj_hash. > -* > * As objects are allocated in slabs (see alloc.c), we do > * not need to free each object, but each slab instead. > +* > +* Before doing so, we need to free any additional memory > +* the objects may hold. > */ > + unsigned i; > + > + for (i = 0; i < o->obj_hash_size; i++) { > + struct object *obj = o->obj_hash[i]; > + > + if (!obj) > + continue; > + > + if (obj->type == OBJ_TREE) { > + free(((struct tree*)obj)->buffer); It would be nicer to keep this in separate functions, e.g. release_tree_node() and release_commit_node() to go with alloc_xxx_node(). > + } else if (obj->type == OBJ_COMMIT) { > + free_commit_list(((struct commit*)obj)->parents); > + free(&((struct commit*)obj)->util); > + } > + } I still don't see who frees obj_hash[] (or at least clears it if not freed). If I'm going to use this to free memory in pack-objects then I'd really prefer obj_hash[] freed because it's a big _big_ array. Just to be clear, what I mean is FREE_AND_NULL(o->obj_hash); o->obj_hash_size = 0; > + > + clear_alloc_state(o->blob_state); > + clear_alloc_state(o->tree_state); > + clear_alloc_state(o->commit_state); > + clear_alloc_state(o->tag_state); > + clear_alloc_state(o->object_state); > } -- Duy
Re: [PATCH v2 02/18] Add a new builtin: branch-diff
On Mon, May 7, 2018 at 9:50 AM, Jeff Kingwrote: > On Sat, May 05, 2018 at 11:57:26PM +0200, Johannes Schindelin wrote: > >> > It feels really petty complaining about the name, but I just want to >> > raise the point, since it will never be easier to change than right now. >> >> I do hear you. Especially since I hate `git cherry` every single time that >> I try to tab-complete `git cherry-pick`. > > Me too. :) Just so you know I'm also not happy with that "git cherry". Since I'm updating git-completion.bash in this area and we got 3 "me too" votes (four if we count Szeder in another thread), I'm going to implementing something to at least let you exclude "cherry" from the completion list if you want. -- Duy
Re: [PATCH 4/5] lock_file: make function-local locks non-static
On Sun, May 6, 2018 at 9:32 PM, Martin Ågren <martin.ag...@gmail.com> wrote: > On 6 May 2018 at 19:42, Duy Nguyen <pclo...@gmail.com> wrote: >> On Sun, May 6, 2018 at 7:26 PM, Duy Nguyen <pclo...@gmail.com> wrote: >>> On Sun, May 6, 2018 at 4:10 PM, Martin Ågren <martin.ag...@gmail.com> wrote: >>>> These `struct lock_file`s are local to their respective functions and we >>>> can drop their staticness. > >>>> - static struct lock_file lock; >>>> + struct lock_file lock = LOCK_INIT; >>> >>> Is it really safe to do this? I vaguely remember something about >>> (global) linked list and signal handling which could trigger any time >>> and probably at atexit() time too (i.e. die()). You don't want to >>> depend on stack-based variables in that case. >> >> So I dug in a bit more about this. The original implementation does >> not allow stack-based lock files at all in 415e96c8b7 ([PATCH] >> Implement git-checkout-cache -u to update stat information in the >> cache. - 2005-05-15). The situation has changed since 422a21c6a0 >> (tempfile: remove deactivated list entries - 2017-09-05). At the end >> of that second commit, Jeff mentioned "We can clean them up >> individually" which I guess is what these patches do. Though I do not >> know if we need to make sure to call "release" function or something/ >> Either way you need more explanation and assurance than just "we can >> drop their staticness" in the commit mesage. > > Thank you Duy for your comments. How about I write the commit message > like so: +Jeff. Since he made it possible to remove lock file from the global linked list, he probably knows well what to check when switching from a static lock file to a stack-local one. > > After 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05), > we can have lockfiles on the stack. These `struct lock_file`s are local > to their respective functions and we can drop their staticness. > > Each of these users either commits or rolls back the lock in every > codepath, with these possible exceptions: > > * We bail using a call to `die()` or `exit()`. The lock will be > cleaned up automatically. > > * We return early from a function `cmd_foo()` in builtin/, i.e., we > are just about to exit. The lock will be cleaned up automatically. There are also signals which can be caught and run on its own stack (I think) so whatever variable on the current stack should be safe, I guess. > If I have missed some codepath where we do not exit, yet leave a locked > lock around, that was so also before this patch. If we would later > re-enter the same function, then before this patch, we would be retaking > a lock for the very same `struct lock_file`, which feels awkward, but to > the best of my reading has well-defined behavior. Whereas after this > patch, we would attempt to take the lock with a completely fresh `struct > lock_file`. In both cases, the result would simply be that the lock can > not be taken, which is a situation we already handle. There is a difference here, if the lock is not released properly, previously the lockfile is still untouched. If it's on stack, it may be overwritten which can corrupt the linked list to get to the next lock file. (and this is about calling the function in question just _once_ not the second time). -- Duy
Re: [PATCH 4/5] lock_file: make function-local locks non-static
On Sun, May 6, 2018 at 7:26 PM, Duy Nguyen <pclo...@gmail.com> wrote: > On Sun, May 6, 2018 at 4:10 PM, Martin Ågren <martin.ag...@gmail.com> wrote: >> These `struct lock_file`s are local to their respective functions and we >> can drop their staticness. >> >> Signed-off-by: Martin Ågren <martin.ag...@gmail.com> >> --- >> apply.c| 2 +- >> builtin/describe.c | 2 +- >> builtin/difftool.c | 2 +- >> builtin/gc.c | 2 +- >> builtin/merge.c| 4 ++-- >> builtin/receive-pack.c | 2 +- >> bundle.c | 2 +- >> fast-import.c | 2 +- >> refs/files-backend.c | 2 +- >> shallow.c | 2 +- >> 10 files changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/apply.c b/apply.c >> index 7e5792c996..07b14d1127 100644 >> --- a/apply.c >> +++ b/apply.c >> @@ -4058,7 +4058,7 @@ static int build_fake_ancestor(struct apply_state >> *state, struct patch *list) >> { >> struct patch *patch; >> struct index_state result = { NULL }; >> - static struct lock_file lock; >> + struct lock_file lock = LOCK_INIT; > > Is it really safe to do this? I vaguely remember something about > (global) linked list and signal handling which could trigger any time > and probably at atexit() time too (i.e. die()). You don't want to > depend on stack-based variables in that case. So I dug in a bit more about this. The original implementation does not allow stack-based lock files at all in 415e96c8b7 ([PATCH] Implement git-checkout-cache -u to update stat information in the cache. - 2005-05-15). The situation has changed since 422a21c6a0 (tempfile: remove deactivated list entries - 2017-09-05). At the end of that second commit, Jeff mentioned "We can clean them up individually" which I guess is what these patches do. Though I do not know if we need to make sure to call "release" function or something/ Either way you need more explanation and assurance than just "we can drop their staticness" in the commit mesage. -- Duy
Re: [PATCH 4/5] lock_file: make function-local locks non-static
On Sun, May 6, 2018 at 4:10 PM, Martin Ågrenwrote: > These `struct lock_file`s are local to their respective functions and we > can drop their staticness. > > Signed-off-by: Martin Ågren > --- > apply.c| 2 +- > builtin/describe.c | 2 +- > builtin/difftool.c | 2 +- > builtin/gc.c | 2 +- > builtin/merge.c| 4 ++-- > builtin/receive-pack.c | 2 +- > bundle.c | 2 +- > fast-import.c | 2 +- > refs/files-backend.c | 2 +- > shallow.c | 2 +- > 10 files changed, 11 insertions(+), 11 deletions(-) > > diff --git a/apply.c b/apply.c > index 7e5792c996..07b14d1127 100644 > --- a/apply.c > +++ b/apply.c > @@ -4058,7 +4058,7 @@ static int build_fake_ancestor(struct apply_state > *state, struct patch *list) > { > struct patch *patch; > struct index_state result = { NULL }; > - static struct lock_file lock; > + struct lock_file lock = LOCK_INIT; Is it really safe to do this? I vaguely remember something about (global) linked list and signal handling which could trigger any time and probably at atexit() time too (i.e. die()). You don't want to depend on stack-based variables in that case. -- Duy
Re: [PATCH v2 02/18] Add a new builtin: branch-diff
On Sun, May 6, 2018 at 6:53 AM, Jacob Kellerwrote: > On Sat, May 5, 2018 at 6:05 PM, Igor Djordjevic > wrote: >> Hi Dscho, >> >> On 05/05/2018 23:57, Johannes Schindelin wrote: >>> >>> > > This builtin does not do a whole lot so far, apart from showing a >>> > > usage that is oddly similar to that of `git tbdiff`. And for a >>> > > good reason: the next commits will turn `branch-diff` into a >>> > > full-blown replacement for `tbdiff`. >>> > >>> > One minor point about the name: will it become annoying as a tab >>> > completion conflict with git-branch? >>> >>> I did mention this in the commit message of 18/18: >>> >>> Without this patch, we would only complete the `branch-diff` part but >>> not the options and other arguments. >>> >>> This of itself may already be slightly disruptive for well-trained >>> fingers that assume that `git braorimas` would expand to >>> `git branch origin/master`, as we now no longer automatically append a >>> space after completing `git branch`: this is now ambiguous. >>> >>> > It feels really petty complaining about the name, but I just want >>> > to raise the point, since it will never be easier to change than >>> > right now. >>> >>> I do hear you. Especially since I hate `git cherry` every single >>> time that I try to tab-complete `git cherry-pick`. >>> >>> > (And no, I don't really have another name in mind; I'm just >>> > wondering if "subset" names like this might be a mild annoyance in >>> > the long run). >>> >>> They totally are, and if you can come up with a better name, I am >>> really interested in changing it before this hits `next`, even. >> >> I gave this just a quick glance so might be I`m missing something >> obvious or otherwise well-known here, bur why not `diff-branch` instead? >> >> From user interface perspective, I would (personally) rather expect a >> command that does "diff of branches" to belong to "diff family" of >> commands (just operating on branches, instead of "branch" command >> knowing to "diff itself"), and I see we already have `diff-files`, >> `diff-index` and `diff-tree`, for what that`s worth. >> >> Heck, I might even expect something like `git diff --branch ...` to work, >> but I guess that is yet a different matter :) >> >> Thanks, Buga > > I like diff-branch, though I suppose that also conflicts with diff too. How about interdiff? -- Duy
Re: [PATCH v2 18/18] completion: support branch-diff
On Fri, May 04, 2018 at 05:35:11PM +0200, Johannes Schindelin wrote: > Tab completion of `branch-diff` is very convenient, especially given > that the revision arguments that need to be passed to `git branch-diff` > are typically more complex than, say, your grandfather's `git log` > arguments. > > Without this patch, we would only complete the `branch-diff` part but > not the options and other arguments. > > This of itself may already be slightly disruptive for well-trained > fingers that assume that `git braorimas` would expand to > `git branch origin/master`, as we now no longer automatically append a > space after completing `git branch`: this is now ambiguous. > > Signed-off-by: Johannes Schindelin> --- > contrib/completion/git-completion.bash | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index 01dd9ff07a2..45addd525ac 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -1496,6 +1496,24 @@ _git_format_patch () > __git_complete_revlist > } > > +__git_branch_diff_options=" > + --no-patches --creation-weight= --dual-color > +" > + > +_git_branch_diff () > +{ > + case "$cur" in > + --*) > + __gitcomp " You should use __gitcomp_builtin so you don't have to maintain $__git_branch_diff_options here. Something like this -- 8< -- diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 45addd525a..4745631daf 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1496,18 +1496,11 @@ _git_format_patch () __git_complete_revlist } -__git_branch_diff_options=" - --no-patches --creation-weight= --dual-color -" - _git_branch_diff () { case "$cur" in --*) - __gitcomp " - $__git_branch_diff_options - $__git_diff_common_options - " + __gitcomp_builtin branch-diff "$__git_diff_common_options" return ;; esac -- 8< -- > + $__git_branch_diff_options > + $__git_diff_common_options > + " > + return > + ;; > + esac > + __git_complete_revlist > +} > + > _git_fsck () > { > case "$cur" in > -- > 2.17.0.409.g71698f11835
Re: [PATCH v4 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Sat, May 5, 2018 at 4:43 AM, Taylor Blauwrote: > Teach 'git-grep(1)' a new option, '--column', to show the column > number of the first match on a non-context line. Why? Or put it another way, what is this option used for? Only git-jump? (which should also be mentioned here if true) > > For example: > > $ git grep -n --column foo | head -n3 > .clang-format:51:14:# myFunction(foo, bar, baz); > .clang-format:64:7:# int foo(); > .clang-format:75:8:# void foo() > -- Duy
Re: [PATCH] alloc.c: replace alloc by mempool
On Fri, May 4, 2018 at 12:18 AM, Stefan Bellerwrote: > I just measured on git.git and linux.git (both of which are not *huge* by > any standard, but should give a good indication. linux has 6M objects, > and when allocating 1024 at a time, we run into the new block allocation > ~6000 times). > > I could not measure any meaningful difference. > > linux.git $ git count-objects -v > count: 0 > size: 0 > in-pack: 6036543 > packs: 2 > size-pack: 2492985 > prune-packable: 0 > garbage: 0 > size-garbage: 0 > > (with this patch) > Performance counter stats for '/u/git/git count-objects -v' (30 runs): > > 2.123683 task-clock:u (msec) #0.831 CPUs utilized > ( +- 2.32% ) > 0 context-switches:u#0.000 K/sec > 0 cpu-migrations:u #0.000 K/sec >126 page-faults:u #0.059 M/sec > ( +- 0.22% ) >895,900 cycles:u #0.422 GHz > ( +- 1.40% ) >976,596 instructions:u#1.09 insn per cycle > ( +- 0.01% ) >218,256 branches:u# 102.772 M/sec > ( +- 0.01% ) > 8,331 branch-misses:u #3.82% of all branches > ( +- 0.61% ) > >0.002556203 seconds time elapsed >( +- 2.20% ) > > Performance counter stats for 'git count-objects -v' (30 runs): > > 2.410352 task-clock:u (msec) #0.801 CPUs utilized > ( +- 2.79% ) > 0 context-switches:u#0.000 K/sec > 0 cpu-migrations:u #0.000 K/sec >131 page-faults:u #0.054 M/sec > ( +- 0.16% ) >993,301 cycles:u #0.412 GHz > ( +- 1.99% ) > 1,087,428 instructions:u#1.09 insn per cycle > ( +- 0.02% ) >244,292 branches:u# 101.351 M/sec > ( +- 0.02% ) > 9,264 branch-misses:u #3.79% of all branches > ( +- 0.57% ) > >0.003010854 seconds time elapsed >( +- 2.54% ) > > So I think we could just replace it for now and optimize again later, if it > turns out to be a problem. I think the easiest optimisation is to increase > the allocation size of having a lot more objects per mp_block. Yeah. I also tested this from a different angle: memory overhead. For 2M objects with one mp_block containing 1024 objects (same setting as alloc.c), the overhead (not counting malloc() internal overhead) is 46KB and we don't have any extra overhead due to padding between objects. This is true for all struct blob, commit, tree and tag. This is really good. alloc.c has zero overhead when measured this way but 46KB is practically zero to me. -- Duy
Re: [PATCH 02/18] Add a new builtin: branch-diff
On Fri, May 4, 2018 at 5:23 PM, Johannes Schindelinwrote: >> > So that's what `info` does: it influences whether/where >> > the command is listed in `git help`'s output... Interesting. I thought the >> > lines here were trying to automate parts of the tab completion or >> > something. >> >> Oh it does many things. The completion part is coming (so yeah you >> don't need to update git-completion.bash at all, as long as you have a >> line here and use parse_options() ;-), but I think it's mainly for >> "git help" and command listing in "git help git" (this command for >> example should show up under the "Main porcelain commands" in that man >> page when you put a line here) > > I have a hard time believing that anything automated can infer from the > source code that branch-diff can accept the non-options in three different > formats, and what those formats look like... Yeah. For now it can only do options which is machine readable. But I have a big plan, so who knows :D -- Duy
Re: [PATCH 02/18] Add a new builtin: branch-diff
On Fri, May 04, 2018 at 04:44:49PM +0200, Duy Nguyen wrote: > On Fri, May 4, 2018 at 9:23 AM, Johannes Schindelin > <johannes.schinde...@gmx.de> wrote: > > Oh, okay. It was not at all clear to me what the exact format and role of > > these lines are... > > Noted. I'm making more updates in this file in another topic and will > add some explanation so the next guy will be less confused. This is what I will include (but in a separate topic). I will not CC you there to keep your inbox slightly less full. I hope this helps understand what command-list.txt is for. diff --git a/command-list.txt b/command-list.txt index 40776b9587..929d8f0ea0 100644 --- a/command-list.txt +++ b/command-list.txt @@ -1,3 +1,47 @@ +# Command classification list +# --- +# All supported commands, builtin or external, must be described in +# here. This info is used to list commands in various places. Each +# command is on one line followed by one or more attributes. +# +# The first attribute group is mandatory and indicates the command +# type. This group includes: +# +# mainporcelain +# ancillarymanipulators +# ancillaryinterrogators +# foreignscminterface +# plumbingmanipulators +# plumbinginterrogators +# synchingrepositories +# synchelpers +# purehelpers +# +# the type names are self explanatory. But if you want to see what +# command belongs to what group to get a better idea, have a look at +# "git" man page, "GIT COMMANDS" section. +# +# Commands of type mainporcelain can also optionally have one of these +# attributes: +# +# init +# worktree +# info +# history +# remote +# +# These commands are considered "common" and will show up in "git +# help" output in groups. Uncommon porcelain commands must not +# specify any of these attributes. +# +# "complete" attribute is used to mark that the command should be +# completable by git-completion.bash. Note that by default, +# mainporcelain commands are completable and you don't need this +# attribute. +# +# While not true commands, guides are also specified here, which can +# only have "guide" attribute and nothing else. +# ### command list (do not change this line, also do not change alignment) # command name category [category] [category] git-add mainporcelain worktree
Re: [PATCH v2] checkout & worktree: introduce checkout.implicitRemote
On Fri, May 4, 2018 at 9:54 AM, Ævar Arnfjörð Bjarmasonwrote: > Realistically the way we do hooks now would make the UI of this suck, > i.e. you couldn't configure it globally or system-wide easily. Something > like what I noted in > https://public-inbox.org/git/871sf3el01@evledraar.gmail.com/ would > make it suck less, but that's a much bigger task. I thought you would bring this up :) I've given some more thoughts on this topic and am willing to implement something like below, in a week or two. Would that help change your mind? I proposed hooks.* config space in that same thread. Here is the extension to make it cover both of your points. hooks.* can have multiple values. So you can specify hooks.post-checkout multiple times and all those scripts will run (in the same order you specified). Since we already have a search path for config (/etc/gitconfig -> $HOME/.gitconfig -> $REPO/config) this helps hooks management as well. Execution order is still the same: if you specify hooks.post-checkout in both /etc/gitconfig and .git/config, then the one in /etc/gitconfig is executed first, the one in .git second. And here's something extra to make it possible to override the search order: if you specify hooks.post-checkout = reset (reset is a random keyword) then _all_ post-checkout hooks before this point are ignored. So you can put this in .git/config [hooks] post-checkout = reset post-checkout = ~/some-hook and can be sure that post-checkout specified in $HOME and /etc will not affect you, only ~/some-hook will run. If you drop the second line then you have no post-checkout hooks. This is a workaround for a bigger problem (not being able to unset a config entry) but I think it's sufficient for this use case. -- Duy
Re: [PATCH 02/18] Add a new builtin: branch-diff
On Fri, May 4, 2018 at 9:23 AM, Johannes Schindelinwrote: > Oh, okay. It was not at all clear to me what the exact format and role of > these lines are... Noted. I'm making more updates in this file in another topic and will add some explanation so the next guy will be less confused. > So that's what `info` does: it influences whether/where > the command is listed in `git help`'s output... Interesting. I thought the > lines here were trying to automate parts of the tab completion or > something. Oh it does many things. The completion part is coming (so yeah you don't need to update git-completion.bash at all, as long as you have a line here and use parse_options() ;-), but I think it's mainly for "git help" and command listing in "git help git" (this command for example should show up under the "Main porcelain commands" in that man page when you put a line here) -- Duy
Re: [PATCH 02/18] Add a new builtin: branch-diff
On Thu, May 3, 2018 at 10:32 PM, Johannes Schindelin <johannes.schinde...@gmx.de> wrote: > Hi Duy, > > On Thu, 3 May 2018, Johannes Schindelin wrote: > >> On Thu, 3 May 2018, Duy Nguyen wrote: >> >> > On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin >> > <johannes.schinde...@gmx.de> wrote: >> > > diff --git a/command-list.txt b/command-list.txt >> > > index a1fad28fd82..c89ac8f417f 100644 >> > > --- a/command-list.txt >> > > +++ b/command-list.txt >> > > @@ -19,6 +19,7 @@ git-archive mainporcelain >> > > git-bisect mainporcelain info >> > > git-blame ancillaryinterrogators >> > > git-branch mainporcelain history >> > > +git-branch-diff mainporcelain info >> > >> > Making it part of "git help" with the info keywords at this stage may >> > be premature. "git help" is about _common_ commands and we don't know >> > (yet) how popular this will be. >> >> Makes sense. I removed the `mainporcelain` keyword locally. > > On second thought, I *think* you meant to imply that I should remove that > line altogether. Will do that now. Actually I only suggested to remove the last word "info". That was what made this command "common". Classifying all commands in this file is definitely a good thing, and I think mainporcelain is the right choice. -- Duy
Re: [PATCH v2 0/5] Allocate cache entries from memory pool
On Thu, May 3, 2018 at 7:21 PM, Stefan Beller <sbel...@google.com> wrote: > On Thu, May 3, 2018 at 9:35 AM, Duy Nguyen <pclo...@gmail.com> wrote: >> On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller <jam...@microsoft.com> wrote: >>> This patch series improves the performance of loading indexes by >>> reducing the number of malloc() calls. Loading the index from disk is >>> partly dominated by the time in malloc(), which is called for each >>> index entry. This patch series reduces the number of times malloc() is >>> called as part of loading the index, and instead allocates a block of >>> memory upfront that is large enough to hold all of the cache entries, >>> and chunks this memory itself. This change builds on [1]. >> >> I have only looked at the mem-pool related patches to see if >> mem-pool.c is good enough to replace alloc.c. To me, it's a "yes" >> after we optimize mem_pool_alloc() a bit (not that performance really >> matters in alloc.c case, but that may be because it's already >> blazingly fast that we never noticed about it). > > alloc.c was not just about speed, but mostly about dense packing? > 855419f764a (Add specialized object allocator, 2006-06-19) I know. I vaguely remembered Linus made that change but did not really look it up :) That reference should be included when/if you switch from alloc.c to mem-pool.c. > To me it is also a clear yes when it comes to combining these > two memory pools. I also did not notice that jm/mem-pool already landed in master. Have you tried measure (both memory usage and allocation speed) of it and alloc.c? Just take some big repo as an example and do count-objects -v to see how many blobs/trees/commits it has, then allocate the same amount with both alloc.c and mem-pool.c and measure both speed/mem. I'm pretty sure you're right that mem-pool.c is a clear yes. I was just being more conservative because we do (slightly) change allocator's behavior when we make the switch. But it's also very likely that any performance difference will be insignificant. I'm asking this because if mem-pool.c is a clear winner, you can start to update you series to use it now and kill alloc.c in the process. PS. Is Jeff back yet? I'm sure Junio is listening and all but I'm afraid he's too busy being a maintainer so Jeff's opinion in this area is really valuable. He has all the fun and weird use cases to play with at github. -- Duy
Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
On Thu, May 3, 2018 at 7:24 PM, Stefan Bellerwrote: >>> +void clear_alloc_state(struct alloc_state *s) >>> +{ >>> + while (s->slab_nr > 0) { >>> + s->slab_nr--; >>> + free(s->slabs[s->slab_nr]); >> >> I think you're leaking memory here. Commit and tree objects may have >> more allocations in them (especially trees, but I think we have >> commit_list in struct commit too). Those need to be freed as well. > > I would think that tree and commit memory will be free'd in the > parsed_objects release function? (TODO: I need to add it over there) What release function? I know tree->buffer is often released since you don't want to keep much in memory. But the user should not be required to release all objects before calling object_parser_clear(). For struct commit, I'm pretty we make the commit_list (for commit parents) and never free. Now we need to do it, somewhere. Oh you mean object_parser_clear() as release function? Yes. I've been wondering if it makes more sense to go through obj_hash[] and release objects before free(obj_hash) than doing it here. It's probably better to do it there since obj_hash[] would contain the very last pointers to these objects and look like the right place to release resources. > Thanks, > Stefan -- Duy
Re: [PATCH 02/18] Add a new builtin: branch-diff
On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelinwrote: > diff --git a/command-list.txt b/command-list.txt > index a1fad28fd82..c89ac8f417f 100644 > --- a/command-list.txt > +++ b/command-list.txt > @@ -19,6 +19,7 @@ git-archive mainporcelain > git-bisect mainporcelain info > git-blame ancillaryinterrogators > git-branch mainporcelain history > +git-branch-diff mainporcelain info Making it part of "git help" with the info keywords at this stage may be premature. "git help" is about _common_ commands and we don't know (yet) how popular this will be. (I'm not complaining though, I almost wrote "what witchcraft is this and where can I get it" when I see branch-diff output mentioned in AEvar's mail) -- Duy
Re: [PATCH v2 0/5] Allocate cache entries from memory pool
On Mon, Apr 30, 2018 at 5:31 PM, Jameson Millerwrote: > This patch series improves the performance of loading indexes by > reducing the number of malloc() calls. Loading the index from disk is > partly dominated by the time in malloc(), which is called for each > index entry. This patch series reduces the number of times malloc() is > called as part of loading the index, and instead allocates a block of > memory upfront that is large enough to hold all of the cache entries, > and chunks this memory itself. This change builds on [1]. I have only looked at the mem-pool related patches to see if mem-pool.c is good enough to replace alloc.c. To me, it's a "yes" after we optimize mem_pool_alloc() a bit (not that performance really matters in alloc.c case, but that may be because it's already blazingly fast that we never noticed about it). I probably should look at read-cache.c changes too. Maybe later. Although after the change to use xmalloc() per entry a few years(?) ago, it should be straight forward to use a different memory allocator. -- Duy
Re: [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle
On Mon, Apr 30, 2018 at 5:31 PM, Jameson Millerwrote: > Add an option (controlled by an environment variable) perform extra > validations on mem_pool allocated cache entries. When set: > > 1) Invalidate cache_entry memory when discarding cache_entry. > > 2) When discarding index_state struct, verify that all cache_entries > were allocated from expected mem_pool. > > 3) When discarding mem_pools, invalidate mem_pool memory. On linux step 3 could be better achieved by allocating blocks with mmap() and freeing them with munmap(). Access to already munmap()'d blocks will result in segmentation fault regardless of values. I guess Windows also has something similar (I vaguely remember something about "locking memory" and stuff, but my win32 knowledge is decade old at this point) (Actually with glibc on linux, i'm pretty sure mmap is already used for large allocation so step 3 is achieved without doing anything; not sure about other libc implementations) > This should provide extra checks that mem_pools and their allocated > cache_entries are being used as expected. > > Signed-off-by: Jameson Miller > --- > cache.h | 7 +++ > git.c| 3 +++ > mem-pool.c | 24 +++- > mem-pool.h | 2 ++ > read-cache.c | 47 +++ > 5 files changed, 82 insertions(+), 1 deletion(-) > > diff --git a/cache.h b/cache.h > index 7ed68f28e0..8f10f0649b 100644 > --- a/cache.h > +++ b/cache.h > @@ -369,6 +369,13 @@ void discard_index_cache_entry(struct cache_entry *ce); > */ > void discard_transient_cache_entry(struct cache_entry *ce); > > +/* > + * Validate the cache entries in the index. This is an internal > + * consistency check that the cache_entry structs are allocated from > + * the expected memory pool. > + */ > +void validate_cache_entries(const struct index_state *istate); > + > #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS > #define active_cache (the_index.cache) > #define active_nr (the_index.cache_nr) > diff --git a/git.c b/git.c > index f598fae7b7..28ec7a6c4f 100644 > --- a/git.c > +++ b/git.c > @@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, > const char **argv) > > trace_argv_printf(argv, "trace: built-in: git"); > > + validate_cache_entries(_index); > status = p->fn(argc, argv, prefix); > + validate_cache_entries(_index); > + > if (status) > return status; > > diff --git a/mem-pool.c b/mem-pool.c > index a495885c4b..77adb5d5b9 100644 > --- a/mem-pool.c > +++ b/mem-pool.c > @@ -60,21 +60,43 @@ void mem_pool_discard(struct mem_pool *mem_pool) > { > int i; > struct mp_block *block, *block_to_free; > + int invalidate_memory = should_validate_cache_entries(); Heh.. cache-entries logic should not enter mem-pool.c. > + > for (block = mem_pool->mp_block; block;) > { > block_to_free = block; > block = block->next_block; > + > + if (invalidate_memory) > + memset(block_to_free->space, 0xDD, ((char > *)block_to_free->end) - ((char *)block_to_free->space)); > + > free(block_to_free); > } > > - for (i = 0; i < mem_pool->nr; i++) > + for (i = 0; i < mem_pool->nr; i++) { > + memset(mem_pool->custom[i], 0xDD, ((char > *)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i])); "if (invalidate_memory)" missing > free(mem_pool->custom[i]); > + } > > free(mem_pool->custom); > free(mem_pool->custom_end); > free(mem_pool); > } > > +int should_validate_cache_entries(void) > +{ > + static int validate_index_cache_entries = -1; > + > + if (validate_index_cache_entries < 0) { > + if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES")) There's a safer version that you can use, get_env_bool() > + validate_index_cache_entries = 1; > + else > + validate_index_cache_entries = 0; > + } > + > + return validate_index_cache_entries; > +} > + > void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) > { > struct mp_block *p; > diff --git a/mem-pool.h b/mem-pool.h > index 34df4fa709..b1f9a920ba 100644 > --- a/mem-pool.h > +++ b/mem-pool.h > @@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool > *src); > */ > int mem_pool_contains(struct mem_pool *mem_pool, void *mem); > > +int should_validate_cache_entries(void); > + > #endif > diff --git a/read-cache.c b/read-cache.c > index 01cd7fea41..e1dc9f7f33 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1270,6 +1270,8 @@ int add_index_entry(struct index_state *istate, struct > cache_entry *ce, int opti > { > int pos; > > + validate_cache_entries(istate); Validating _all_ entries every time an entry is added sounds way too
Re: [PATCH v2 3/5] mem-pool: fill out functionality
Another I noticed in the jm/mem-pool series is this loop in mem_pool_alloc() for (p = mem_pool->mp_block; p; p = p->next_block) if (p->end - p->next_free >= len) break; You always go from the start (mp_block) but at some point those first blocks are filled up and we don't really need to walk from the start anymore. If we allow the mem-pool user to set a "minimum alloc" limit, then we can determine if the remaining space in a block is not useful for any future allocation, we can just skip it and start looking for an available from a new pointer, avail_block or something. I'm writing this with alloc.c in mind because we have a lot more blocks to allocate there. Unlike read-cache, you can't really estimate how many mp_blocks you're going to need. This linked list could become long. And because alloc.c does fixed size allocation, we use up one block after another and will never find free space in previous blocks. On Mon, Apr 30, 2018 at 5:31 PM, Jameson Millerwrote: > diff --git a/mem-pool.c b/mem-pool.c > index 389d7af447..a495885c4b 100644 > --- a/mem-pool.c > +++ b/mem-pool.c > @@ -5,6 +5,8 @@ > #include "cache.h" > #include "mem-pool.h" > > +#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block); #define BLOCK_GROWTH_SIZE (1024*1024 - sizeof(struct mp_block)) (wrapped in brackets and no trailing semicolon) > + > static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, > size_t block_alloc) > { > struct mp_block *p; > @@ -19,6 +21,60 @@ static struct mp_block *mem_pool_alloc_block(struct > mem_pool *mem_pool, size_t b > return p; > } > > +static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t > block_alloc) > +{ > + char *p; An empty line between variable declaration and function body would be nice. > + ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc); > + ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, > mem_pool->alloc_end); > + If you put both custom and custom_end in a struct, then you can grow just one array (of the new struct) and have fewer support variables like nr_end and alloc_end The only downside that I can see is the potential padding between struct increasing memory consumption here. but since you don't care about reallocation cost (i.e. large memory allocations should be rare), it probably does not matter either. But wait, can we just reuse struct mp_block for this? You allocate a new mp_block (for just one allocation) as usual, then you can can maintain a linked list of custom alloc in "struct mp_block *mp_custom_block" or something. This way we can walk both bulk and custom allocation the same way. > + p = xmalloc(block_alloc); > + mem_pool->custom[mem_pool->nr++] = p; > + mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc; > + > + mem_pool->pool_alloc += block_alloc; > + > + return mem_pool->custom[mem_pool->nr]; > +} > + > +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) > +{ > + if (!(*mem_pool)) I think (!*mem_pool) should be enough, although you could avoid the operator precedence headache by doing if (*mem_pool) return; > + { > + *mem_pool = xmalloc(sizeof(struct mem_pool)); I think we tend to do *mem_pool = xmalloc(sizeof(**mem_pool)); > + (*mem_pool)->pool_alloc = 0; Eghh.. perhaps just declare struct mem_pool *pool; then allocate a new memory to this, initialize everything and only do *mem_pool = pool; at the end? It's much less of this (*mem_pool)-> > + (*mem_pool)->mp_block = NULL; Just memset() it once (or use xcallo) and only initialize non-zero/null fields afterwards. > + (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE; > + (*mem_pool)->custom = NULL; > + (*mem_pool)->nr = 0; > + (*mem_pool)->alloc = 0; > + (*mem_pool)->custom_end = NULL; > + (*mem_pool)->nr_end = 0; > + (*mem_pool)->alloc_end = 0; > + > + if (initial_size > 0) > + mem_pool_alloc_block(*mem_pool, initial_size); > + } > +} > + > +void mem_pool_discard(struct mem_pool *mem_pool) > +{ > + int i; > + struct mp_block *block, *block_to_free; > + for (block = mem_pool->mp_block; block;) > + { > + block_to_free = block; > + block = block->next_block; > + free(block_to_free); > + } > + > + for (i = 0; i < mem_pool->nr; i++) > + free(mem_pool->custom[i]); > + > + free(mem_pool->custom); > + free(mem_pool->custom_end); > + free(mem_pool); > +} > + > void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) > { > struct mp_block *p; > @@ -33,10 +89,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) > break; > > if (!p) { > -
Re: [PATCH v2] checkout & worktree: introduce checkout.implicitRemote
On Thu, May 3, 2018 at 3:18 PM, Ævar Arnfjörð Bjarmasonwrote: > Introduce a checkout.implicitRemote setting which can be used to > designate a remote to prefer (via checkout.implicitRemote=origin) when > running e.g. "git checkout master" to mean origin/master, even though > there's other remotes that have the "master" branch. > > I want this because it's very handy to use this workflow to checkout a > repository and create a topic branch, then get back to a "master" as > retrieved from upstream: > > ( > rm -rf /tmp/tbdiff && > git clone g...@github.com:trast/tbdiff.git && > cd tbdiff && > git branch -m topic && > git checkout master > ) > > That will output: > > Branch 'master' set up to track remote branch 'master' from 'origin'. > Switched to a new branch 'master' > > But as soon as a new remote is added (e.g. just to inspect something > from someone else) the DWIMery goes away: > > ( > rm -rf /tmp/tbdiff && > git clone g...@github.com:trast/tbdiff.git && > cd tbdiff && > git branch -m topic && > git remote add avar g...@github.com:avar/tbdiff.git && > git fetch avar && > git checkout master > ) > > Will output: > > error: pathspec 'master' did not match any file(s) known to git. > > The new checkout.implicitRemote config allows me to say that whenever > that ambiguity comes up I'd like to prefer "origin", and it'll still > work as though the only remote I had was "origin". > > I considered splitting this into checkout.implicitRemote and > worktree.implicitRemote, but it's probably less confusing to break our > own rules that anything shared between config should live in core.* > than have two config settings, and I couldn't come up with a short > name under core.* that made sense (core.implicitRemoteForCheckout?). I wonder if it's difficult to add a dwim hook that allows us to replace the entire dwim logic with a hook? Doing this "preferring origin" in a shell script should be easy and it lets us play more with tweaking dwim logic. (Yeah sorry I'm getting off topic again) -- Duy
Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
On Tue, May 1, 2018 at 11:34 PM, Stefan Bellerwrote: > @@ -501,9 +516,12 @@ void raw_object_store_clear(struct raw_object_store *o) > void object_parser_clear(struct object_parser *o) > { > /* > -* TOOD free objects in o->obj_hash. > -* You need to free(o->obj_hash) too. If you just want to reuse existing obj_hash[] then at least clear it, leave no dangling pointers behind. > * As objects are allocated in slabs (see alloc.c), we do > * not need to free each object, but each slab instead. > */ > + clear_alloc_state(o->blob_state); > + clear_alloc_state(o->tree_state); > + clear_alloc_state(o->commit_state); > + clear_alloc_state(o->tag_state); > + clear_alloc_state(o->object_state); > } -- Duy
Re: [PATCH 00/13] object store: alloc
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com> wrote: > > >> -Original Message- >> From: Duy Nguyen <pclo...@gmail.com> >> Sent: Wednesday, May 2, 2018 1:02 PM >> To: Stefan Beller <sbel...@google.com> >> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller >> <jam...@microsoft.com> >> Subject: Re: [PATCH 00/13] object store: alloc >> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote: >> > I also debated if it is worth converting alloc.c via this patch series >> > or if it might make more sense to use the new mem-pool by Jameson[1]. >> > >> > I vaguely wonder about the performance impact, as the object >> > allocation code seemed to be relevant in the past. >> >> If I remember correctly, alloc.c was added because malloc() has too high >> overhead per allocation (and we create like millions of them). As long as you >> keep allocation overhead low, it should be ok. Note that we allocate a lot >> more >> than the mem-pool's main target (cache entries if I remember correctly). We >> may have a couple thousands cache entries. We already deal with a couple >> million of struct object. > > The work to move cache entry allocation onto a memory pool was motivated by > the fact that we are looking at indexes with millions of entries. If there is > scaling > concern with the current version of mem-pool, we would like to address it > there > as well. Or if there is improvements that can be shared, that would be nice > as well. I think the two have quite different characteristics. alloc.c code is driven by overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, and each malloc has 16 bytes overhead or so if I remember correctly. struct cache_entry at minimum in 88 bytes so relative overhead is not that a big deal (but sure reducing it is still very nice). mem-pool is about allocation speed, but I think that's not a concern for alloc.c because when we do full rev walk, I think I/O is always the bottleneck (maybe object lookup as well). I don't see a good way to have the one memory allocator that satisfyies both to be honest. If you could allocate fixed-size cache entries most of the time (e.g. larger entries will be allocated using malloc or something, and should be a small number), then perhaps we can unify. Or if mem-pool can have an option to allocated fixed size pieces with no overhead, that would be great (sorry I still have not read that mem-pool series yet) -- Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 2, 2018 at 8:00 PM, Eric Sunshinewrote: > 2. Building on #1: How well is the term "DWIM" understood by non-power > users? A term, such as "default" is more well known. I'm going off topic but I kinda dislike this term. First time I encountered it in the code I didn't even know what it meant. Since it has not been leaked to user documents (I only did a grep on Documentation/) perhaps a better term should be used, preferably not another acronym. -- Duy
Re: [PATCH 01/13] repository: introduce object parser field
On Wed, May 2, 2018 at 7:26 PM, Stefan Bellerwrote: >> Another suggestion is object_pool, if we keep 'struct object' instead >> of 'struct parsed_object' and also want to keep current allocation >> behavior: no individual deallocation. If you free, you free the whole >> pool (e.g. you could run rev-list --all --objects in a separate pool, >> once you're done, you delete the whole pool). > > That is what the following patches will be about, you can > only free the whole set of parsed objects. > > So if you want to do a separate rev walk, you may need to > instantiate a new repository for it (ideally you'd only need a > separate parsed object store). I'm not sure if it's a good idea to create a separate struct repository just because you want to free this parsed object store. What if updates are made in both repositories? All the cache (well, mostly the delta base cache in sha1_file.c) will double memory usage as well. Yeah not ideal. But I guess making rev-list related code use a separate parsed object store is no small task (and kinda risky as well since we migrate from global lookup_* functions to local ones and need to choose the correct parsed object store to look up from) For your information, there is already a good use case for this wholesale memory free: if we can free the rev-list related memory early in pack-objects (e.g. part of repack operation) then it could lower memory pressure significantly when running on large repos. This has been discussed a bit lately. -- Duy
Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
On Tue, May 1, 2018 at 11:34 PM, Stefan Bellerwrote: > #include "cache.h" > #include "object.h" > @@ -30,8 +31,25 @@ struct alloc_state { > int count; /* total number of nodes allocated */ > int nr;/* number of nodes left in current allocation */ > void *p; /* first free node in current allocation */ > + > + /* bookkeeping of allocations */ > + void **slabs; Another way to manage this is linked list: you could reserve one "object" in each slab to store the "next" (or "prev") pointer to another slab, then you can just walk through all slabs and free. It's a bit cheaper than reallocating slabs[], but I guess we reallocate so few times that readability matters more (whichever way is chosen). > + int slab_nr, slab_alloc; > }; > > +void *allocate_alloc_state(void) > +{ > + return xcalloc(1, sizeof(struct alloc_state)); > +} > + > +void clear_alloc_state(struct alloc_state *s) > +{ > + while (s->slab_nr > 0) { > + s->slab_nr--; > + free(s->slabs[s->slab_nr]); I think you're leaking memory here. Commit and tree objects may have more allocations in them (especially trees, but I think we have commit_list in struct commit too). Those need to be freed as well. > + } > +} > + > static inline void *alloc_node(struct alloc_state *s, size_t node_size) > { > void *ret; -- Duy
Re: [PATCH 01/13] repository: introduce object parser field
On Tue, May 1, 2018 at 11:33 PM, Stefan Bellerwrote: > /* > -* Holds any information related to accessing the raw object content. > +* Holds any information needed to retrieve the raw content > +* of objects. The object_parser uses this to get object > +* content which it then parses. > */ > struct raw_object_store *objects; > > + /* > +* State for the object parser. This owns all parsed objects > +* (struct object) so callers do not have to manage their > +* lifetime. > +*/ > + struct object_parser *parsed_objects; I like this name 'parsed_objects'. Should we rename the struct after it (e.g. parsed_object_store as opposed to raw_object_store above)? Another suggestion is object_pool, if we keep 'struct object' instead of 'struct parsed_object' and also want to keep current allocation behavior: no individual deallocation. If you free, you free the whole pool (e.g. you could run rev-list --all --objects in a separate pool, once you're done, you delete the whole pool). -- Duy
Re: [PATCH 00/13] object store: alloc
On Tue, May 1, 2018 at 11:33 PM, Stefan Bellerwrote: > I also debated if it is worth converting alloc.c via this patch series > or if it might make more sense to use the new mem-pool by Jameson[1]. > > I vaguely wonder about the performance impact, as the object allocation > code seemed to be relevant in the past. If I remember correctly, alloc.c was added because malloc() has too high overhead per allocation (and we create like millions of them). As long as you keep allocation overhead low, it should be ok. Note that we allocate a lot more than the mem-pool's main target (cache entries if I remember correctly). We may have a couple thousands cache entries. We already deal with a couple million of struct object. -- Duy
Re: [PATCH v2 00/42] object_id part 13
On Wed, May 02, 2018 at 12:25:28AM +, brian m. carlson wrote: > Changes from v1: > * Add missing sign-off. > * Removed unneeded braces from init_pack_info. > * Express 51 in terms of the_hash_algo->hexsz. > * Fix comments referring to SHA-1. > * Update commit messages as suggested. > * Add and use empty_tree_oid_hex and empty_blob_oid_hex. Interdiff for people who don't have time to read 42 patches yet diff --git a/builtin/merge.c b/builtin/merge.c index 8d75ebe64b..7084bcfdea 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -290,7 +290,7 @@ static void read_empty(const struct object_id *oid, int verbose) args[i++] = "-v"; args[i++] = "-m"; args[i++] = "-u"; - args[i++] = oid_to_hex(the_hash_algo->empty_tree); + args[i++] = empty_tree_oid_hex(); args[i++] = oid_to_hex(oid); args[i] = NULL; diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index c31ceb30c2..dca523f50f 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -968,7 +968,7 @@ static const char *push_to_deploy(unsigned char *sha1, return "Working directory has unstaged changes"; /* diff-index with either HEAD or an empty tree */ - diff_index[4] = head_has_history() ? "HEAD" : oid_to_hex(the_hash_algo->empty_tree); + diff_index[4] = head_has_history() ? "HEAD" : empty_tree_oid_hex(); child_process_init(); child.argv = diff_index; diff --git a/cache.h b/cache.h index c5b041019b..71b3c1b15b 100644 --- a/cache.h +++ b/cache.h @@ -1033,6 +1033,9 @@ static inline int is_empty_tree_oid(const struct object_id *oid) return !oidcmp(oid, the_hash_algo->empty_tree); } +const char *empty_tree_oid_hex(void); +const char *empty_blob_oid_hex(void); + /* set default permissions by passing mode arguments to open(2) */ int git_mkstemps_mode(char *pattern, int suffix_len, int mode); int git_mkstemp_mode(char *pattern, int mode); diff --git a/http.c b/http.c index ec70676748..312a5e1833 100644 --- a/http.c +++ b/http.c @@ -2070,7 +2070,7 @@ int http_get_info_packs(const char *base_url, struct packed_git **packs_head) get_sha1_hex(data + i + 6, hash); fetch_and_setup_pack_index(packs_head, hash, base_url); - i += 51; + i += hexsz + 11; break; } default: diff --git a/sequencer.c b/sequencer.c index 12c1e1cdbb..94b6513402 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1480,8 +1480,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit, unborn = get_oid("HEAD", ); if (unborn) oidcpy(, the_hash_algo->empty_tree); - if (index_differs_from(unborn ? - oid_to_hex(the_hash_algo->empty_tree) : "HEAD", + if (index_differs_from(unborn ? empty_tree_oid_hex() : "HEAD", NULL, 0)) return error_dirty_index(opts); } diff --git a/server-info.c b/server-info.c index 828ec5e538..7ce6dcd67b 100644 --- a/server-info.c +++ b/server-info.c @@ -223,11 +223,9 @@ static void init_pack_info(const char *infofile, int force) else stale = 1; - for (i = 0; i < num_pack; i++) { - if (stale) { + for (i = 0; i < num_pack; i++) + if (stale) info[i]->old_num = -1; - } - } /* renumber them */ QSORT(info, num_pack, compare_info); diff --git a/sha1_file.c b/sha1_file.c index 794753bd54..bf6c8da3ff 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -109,6 +109,18 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { }, }; +const char *empty_tree_oid_hex(void) +{ + static char buf[GIT_MAX_HEXSZ + 1]; + return oid_to_hex_r(buf, the_hash_algo->empty_tree); +} + +const char *empty_blob_oid_hex(void) +{ + static char buf[GIT_MAX_HEXSZ + 1]; + return oid_to_hex_r(buf, the_hash_algo->empty_blob); +} + /* * This is meant to hold a *small* number of objects that you would * want read_sha1_file() to be able to return, but yet you do not want diff --git a/submodule.c b/submodule.c index 22a96b7af0..ee7eea4877 100644 --- a/submodule.c +++ b/submodule.c @@ -1567,7 +1567,7 @@ static void submodule_reset_index(const char *path) get_super_prefix_or_empty(), path); argv_array_pushl(, "read-tree", "-u", "--reset", NULL); - argv_array_push(, oid_to_hex(the_hash_algo->empty_tree)); + argv_array_push(, empty_tree_oid_hex()); if (run_command()) die("could not reset submodule index"); @@ -1659,9 +1659,9 @@ int submodule_move_head(const char *path,
Re: [PATCH 08/41] packfile: abstract away hash constant values
On Wed, May 2, 2018 at 2:11 AM, brian m. carlson <sand...@crustytoothpaste.net> wrote: > On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote: >> On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote: >> > There are several instances of the constant 20 and 20-based values in >> > the packfile code. Abstract away dependence on SHA-1 by using the >> > values from the_hash_algo instead. >> >> While we're abstracting away 20. There's the only 20 left in >> sha1_file.c that should also be gone. But I guess you could do that >> later since you need to rename fill_sha1_path to >> fill_loose_object_path or something. > > I'm already working on knocking those out. Yeah I checked out your part14 branch after writing this note :P You still need to rename the function though. I can remind that again when part14 is sent out. >> > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p) >> > " while index indicates %"PRIu32" objects", >> > p->pack_name, ntohl(hdr.hdr_entries), >> > p->num_objects); >> > - if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1) >> > + if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1) >> > return error("end of packfile %s is unavailable", >> > p->pack_name); >> > - read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1)); >> > + read_result = read_in_full(p->pack_fd, hash, hashsz); >> > if (read_result < 0) >> > return error_errno("error reading from %s", p->pack_name); >> > - if (read_result != sizeof(sha1)) >> > + if (read_result != hashsz) >> > return error("packfile %s signature is unavailable", >> > p->pack_name); >> > - idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40; >> > - if (hashcmp(sha1, idx_sha1)) >> > + idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * >> > 2; >> > + if (hashcmp(hash, idx_hash)) >> >> Since the hash size is abstracted away, shouldn't this hashcmp become >> oidcmp? (which still does not do the right thing, but at least it's >> one less place to worry about) > > Unfortunately, I can't, because it's not an object ID. I think the > decision was made to not transform non-object ID hashes into struct > object_id, which makes sense. I suppose we could have an equivalent > struct hash or something for those other uses. I probably miss something, is hashcmp() supposed to stay after the conversion? And will it compare any hash (as configured in the_algo) or will it for SHA-1 only? If hashcmp() will eventually compare the_algo->rawsz then yes this makes sense. >> > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, >> > size_t path_len, int local) >> > p->pack_size = st.st_size; >> > p->pack_local = local; >> > p->mtime = st.st_mtime; >> > - if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1)) >> > + if (path_len < the_hash_algo->hexsz || >> > + get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1)) >> >> get_sha1_hex looks out of place when we start going with >> the_hash_algo. Maybe change to get_oid_hex() too. > > I believe this is the pack hash, which isn't an object ID. I will > transform it to be called something other than "sha1" and allocate more > memory for it in a future series, though. Ah ok, it's the "sha1" in the name that bugged me. I'm all good then. -- Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 2, 2018 at 12:54 PM, Ævar Arnfjörð Bjarmasonwrote: > Introduce a core.DWIMRemote setting which can be used to designate a > remote to prefer (via core.DWIMRemote=origin) when running e.g. "git > checkout master" to mean origin/master, even though there's other > remotes that have the "master" branch. Do we anticipate more dwimy customizations to justify a new dwim group in config (i.e. dwim.remote instead of core.dwimRemote)? -- Duy
Re: [PATCH v2 1/4] test-tool: help verifying BUG() code paths
On Wed, May 2, 2018 at 11:38 AM, Johannes Schindelinwrote: > When we call BUG(), we signal via SIGABRT that something bad happened, > dumping cores if so configured. In some setups these coredumps are > redirected to some central place such as /proc/sys/kernel/core_pattern, > which is a good thing. > > However, when we try to verify in our test suite that bugs are caught in > certain code paths, we do *not* want to clutter such a central place > with unnecessary coredumps. > > So let's special-case the test helpers (which we use to verify such code > paths) so that the BUG() calls will *not* call abort() but exit with a > special-purpose exit code instead. > > Helped-by: Nguyễn Thái Ngọc Duy > Signed-off-by: Johannes Schindelin > --- > t/helper/test-tool.c | 2 ++ > usage.c | 5 + > 2 files changed, 7 insertions(+) > > diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c > index 87066ced62a..5176f9f20ae 100644 > --- a/t/helper/test-tool.c > +++ b/t/helper/test-tool.c > @@ -47,7 +47,9 @@ static struct test_cmd cmds[] = { > int cmd_main(int argc, const char **argv) > { > int i; > + extern int BUG_exit_code; > > + BUG_exit_code = 99; It may be even better to let individual tests in t1406 control this, pretty much like your original patch, except that we tell usage.c to not send SIGABRT and just return a special fault code. That way we don't accidentally suppress BUG()'s sigabrt behavior in tests that do not anticipate it (even in t1406). But this patch is ok for me too if you don't want another reroll. > if (argc < 2) > die("I need a test name!"); > > diff --git a/usage.c b/usage.c > index cdd534c9dfc..9c84dccfa97 100644 > --- a/usage.c > +++ b/usage.c > @@ -210,6 +210,9 @@ void warning(const char *warn, ...) > va_end(params); > } > > +/* Only set this, ever, from t/helper/, when verifying that bugs are caught. > */ > +int BUG_exit_code; > + > static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, > va_list params) > { > char prefix[256]; > @@ -221,6 +224,8 @@ static NORETURN void BUG_vfl(const char *file, int line, > const char *fmt, va_lis > snprintf(prefix, sizeof(prefix), "BUG: "); > > vreportf(prefix, fmt, params); > + if (BUG_exit_code) > + exit(BUG_exit_code); > abort(); > } > > -- > 2.17.0.windows.1.36.gdf4ca5fb72a > > -- Duy
Re: completion troubles with bash
On Tue, May 1, 2018 at 2:17 PM, Pascal Bourdierwrote: > Hi, > > > If "GREP_OPTIONS" is set to '--color=always' , then the git completion > send some escape characters like this : > > ^[[1;35;40m^[[K d^[[m^[[Kiff-files mergetool > a^[[m^[[Kdd d^[[m^[[Kiff-index mv > a^[[m^[[Kddremoved^[[m^[[Kiff-treename-rev > a^[[m^[[Km d^[[m^[[Kifftool notes > ... > > > So you could find a small patch to disable color with "grep" in attachment. Yep. The topic 'nd/command-list' on 'pu' accidentally fixes this too since it removes the use of egrep. > > > Regards, > > > Pascal Bourdier -- Duy
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmasonwrote: > I think at this point git-subtree is widely used enough to move out of > contrib/, maybe others disagree, but patches are always better for > discussion that patch-less ML posts. After narrow/partial clone becomes real, it should be "easy" to implement some sort of narrow checkout that would achieve the same thing. But it took me forever with all other stuff to get back to this. If we remove it from contrib and there are people willing to update/maintain it, should it be a separate repository then? The willing people will have much more freedom to update it. And I don't have to answer the questions about who will maintain this thing in git.git. I don't like the idea of adding new (official) shell-based commands either to be honest. -- Duy
Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()
On Mon, Apr 30, 2018 at 12:17 AM, Johannes Schindelinwrote: > t1406 specifically verifies that certain code paths fail with a BUG: ... > message. > > In the upcoming commit, we will convert that message to be generated via > BUG() instead of die("BUG: ..."), which implies SIGABRT instead of a > regular exit code. On the other hand, SIGABRT on linux creates core dumps. And on some setup (like mine) core dumps may be redirected to some central place via /proc/sys/kernel/core_pattern. I think systemd does it too but I didn't check. This moving to SIGABRT when we know it _will_ happen when running the test suite will accumulate core dumps over time and not cleaned up by the test suite. Maybe keeping die("BUG: here is a good compromise. -- Duy
Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()
On Tue, May 1, 2018 at 1:04 PM, Johannes Schindelinwrote: >> If SIGABRT occurs as a result of BUG(), and we know that this happens for >> certain cases, it means we have an unfixed bug. > > Not in this case: The code in question is in > https://github.com/git/git/blob/v2.17.0/t/helper/test-ref-store.c#L190-L201 > and it is called in a way that fails to have the required flags for the > operation. To elaborate, in this particular case, developers are not supposed to call calling create_reflog() on a submodule ref store because the store's implementation does not support it (yet). So it's definitely BUG() to catch devs from doing so. But due to the multiple layer abstractions, we also need to verify that ref code will bug out in this case :P > This would normally indicate a bug, but in this case, that is > exactly what the regression test tries to trigger: we *want* such a bug to > cause a failure. -- Duy
Re: [PATCH 00/41] object_id part 13
On Mon, Apr 30, 2018 at 11:59:43PM +, brian m. carlson wrote: > > I guess I'll be helping review this series instead :D Overall I think this looks good. > > Yeah, I have code to fix that, but it's ugly. > > You can see the work on part2 and part3 of the test fixes, plus the > fixes for all of that stuff on my object-id-part14 branch. Since I gave it some thought, another way of dealing with this may be hiding this behind git_hash_algo abstraction, so we have one function for sha1, one for newhash... and they are probably just macros that take different struct definition. -- Duy
Re: [PATCH 39/41] Update shell scripts to compute empty tree object ID
On Mon, Apr 23, 2018 at 11:39:49PM +, brian m. carlson wrote: > Several of our shell scripts hard-code the object ID of the empty tree. > To avoid any problems when changing hashes, compute this value on > startup of the script. For performance, store the value in a variable > and reuse it throughout the life of the script. > > Signed-off-by: brian m. carlson> --- > git-filter-branch.sh | 4 +++- > git-rebase--interactive.sh | 4 +++- > templates/hooks--pre-commit.sample | 2 +- > 3 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/git-filter-branch.sh b/git-filter-branch.sh > index 64f21547c1..ccceaf19a7 100755 > --- a/git-filter-branch.sh > +++ b/git-filter-branch.sh > @@ -11,6 +11,8 @@ > # The following functions will also be available in the commit filter: > > functions=$(cat << \EOF > +EMPTY_TREE=$(git hash-object -t tree /dev/null) All scripts (except those example hooks) must source git-sh-setup. Should we define this in there instead? -- Duy
Re: [PATCH 09/41] pack-objects: abstract away hash algorithm
On Mon, Apr 23, 2018 at 11:39:19PM +, brian m. carlson wrote: > @@ -1850,7 +1852,7 @@ static int try_delta(struct unpacked *trg, struct > unpacked *src, > /* Now some size filtering heuristics. */ > trg_size = trg_entry->size; > if (!trg_entry->delta) { > - max_size = trg_size/2 - 20; > + max_size = trg_size/2 - the_hash_algo->rawsz; This may be questionable. Note the "heuristics" comment above. I'm not even sure if this is hash size or some magical-yet-randomly-good value. Just wanted to bring the attention for other people with better understand of this code to see > ref_depth = 1; > } else { > max_size = trg_entry->delta_size;
Re: [PATCH 08/41] packfile: abstract away hash constant values
On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote: > There are several instances of the constant 20 and 20-based values in > the packfile code. Abstract away dependence on SHA-1 by using the > values from the_hash_algo instead. While we're abstracting away 20. There's the only 20 left in sha1_file.c that should also be gone. But I guess you could do that later since you need to rename fill_sha1_path to fill_loose_object_path or something. > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p) >" while index indicates %"PRIu32" objects", >p->pack_name, ntohl(hdr.hdr_entries), >p->num_objects); > - if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1) > + if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1) > return error("end of packfile %s is unavailable", p->pack_name); > - read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1)); > + read_result = read_in_full(p->pack_fd, hash, hashsz); > if (read_result < 0) > return error_errno("error reading from %s", p->pack_name); > - if (read_result != sizeof(sha1)) > + if (read_result != hashsz) > return error("packfile %s signature is unavailable", > p->pack_name); > - idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40; > - if (hashcmp(sha1, idx_sha1)) > + idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * > 2; > + if (hashcmp(hash, idx_hash)) Since the hash size is abstracted away, shouldn't this hashcmp become oidcmp? (which still does not do the right thing, but at least it's one less place to worry about) Same comment for other hashcmp in this patch. > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, > size_t path_len, int local) > p->pack_size = st.st_size; > p->pack_local = local; > p->mtime = st.st_mtime; > - if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1)) > + if (path_len < the_hash_algo->hexsz || > + get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1)) get_sha1_hex looks out of place when we start going with the_hash_algo. Maybe change to get_oid_hex() too. > @@ -1678,10 +1683,10 @@ int bsearch_pack(const struct object_id *oid, const > struct packed_git *p, uint32 > > index_lookup = index_fanout + 4 * 256; > if (p->index_version == 1) { > - index_lookup_width = 24; > + index_lookup_width = hashsz + 4; You did good research to spot this 24 constant ;-) -- Duy
Re: [PATCH 04/41] packfile: remove unused member from struct pack_entry
On Mon, Apr 23, 2018 at 11:39:14PM +, brian m. carlson wrote: > The sha1 member in struct pack_entry is unused except for one instance > in which we store a value in it. Since nobody ever reads this value, > don't bother to compute it and remove the member from struct pack_entry. Never used since its introduction in 1f688557c0 ([PATCH] Teach read_sha1_file() and friends about packed git object store. - 2005-06-27). Good riddance. -- Duy
Re: [PATCH 03/41] Remove unused member in struct object_context
On Mon, Apr 23, 2018 at 11:39:13PM +, brian m. carlson wrote: > The tree member of struct object_context is unused except in one place > where we write to it. Since there are no users of this member, remove > it. Yep. It's never used since its introduction in 573285e552 (sha1_name: add get_sha1_with_context() - 2010-06-09) in 'cp/textconv-cat-file' topic. I guess the idea at that time was to keep all the information from get_tree_entry() in object context for future use. Since it's been eight years and nobody needs it still, it should be good to go. -- Duy
Re: [PATCH 02/41] server-info: remove unused members from struct pack_info
On Mon, Apr 23, 2018 at 11:39:12PM +, brian m. carlson wrote: > The head member of struct pack_info is completely unused and the > nr_heads member is used only in one place, which is an assignment. > Since these structure members are not useful, remove them. If you reroll, you could add that their last use was in 3e15c67c90 (server-info: throw away T computation as well. - 2005-12-04) -- Duy
Re: [PATCH 01/41] cache: add a function to read an object ID from a buffer
On Mon, Apr 23, 2018 at 11:39:11PM +, brian m. carlson wrote: > diff --git a/cache.h b/cache.h > index bbaf5c349a..4bca177cf3 100644 > --- a/cache.h > +++ b/cache.h > @@ -1008,6 +1008,11 @@ static inline void oidclr(struct object_id *oid) > memset(oid->hash, 0, GIT_MAX_RAWSZ); > } > > +static inline void oidread(struct object_id *oid, const unsigned char *hash) > +{ > + memcpy(oid->hash, hash, the_hash_algo->rawsz); If performance is a concern, should we go with GIT_MAX_RAWSZ instead of the_hash_algo->rawsz which gives the compiler some more to bypass actual memcpy function and generate copy code directly? If it is not a performance problem, should we avoid inline and move the implementation somewhere? > +} > + >
Re: [PATCH 00/41] object_id part 13
On Tue, Apr 24, 2018 at 1:39 AM, brian m. carlsonwrote: > [0] I can synthesize blobs, trees, and commits, but things are currently > totally broken, which is, I suppose, to be expected. Yup. I was tired and bored so I went playing with the new hash. Writing and reading blobs (with hash-object/cat-file) were relatively easy after fixing up fill_sha1_path and get_oid_basic). Then I worked my way up to update-index/ls-files so that I could make trees with write-tree. And I hit the first road block: struct ondisk_cache_entry hard codes hash size so I would need to re-organize the code for more flexibility (or even redesign the file format if I want to keep byte alignment). Eck... I guess I'll be helping review this series instead :D -- Duy
Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
On Mon, Apr 30, 2018 at 6:19 PM, Elijah Newren <new...@gmail.com> wrote: > Hi Duy, > > On Mon, Apr 30, 2018 at 7:45 AM, Duy Nguyen <pclo...@gmail.com> wrote: >> On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen <pclo...@gmail.com> wrote: >>> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin >>> <johannes.schinde...@gmx.de> wrote: >>>>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc >>>>> > *t, struct unpack_trees_options >>>>> > WRITE_TREE_SILENT | >>>>> > WRITE_TREE_REPAIR); >>>>> > } >>>>> > - move_index_extensions(>result, o->dst_index); >>>>> > + move_index_extensions(>result, o->src_index); >>>>> >>>>> While this looks like the right thing to do on paper, I believe it's >>>>> actually broken for a specific case of untracked cache. In short, >>>>> please do not touch this line. I will send a patch to revert >>>>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08), >>>>> which essentially deletes this line, with proper explanation and >>>>> perhaps a test if I could come up with one. >>>>> >>>>> When we update the index, we depend on the fact that all updates must >>>>> invalidate the right untracked cache correctly. In this unpack >>>>> operations, we start copying entries over from src to result. Since >>>>> 'result' (at least from the beginning) does not have an untracked >>>>> cache, it has nothing to invalidate when we copy entries over. By the >>>>> time we have done preparing 'result', what's recorded in src's (or >>>>> dst's for that matter) untracked cache may or may not apply to >>>>> 'result' index anymore. This copying only leads to more problems when >>>>> untracked cache is used. >>>> >>>> Is there really no way to invalidate just individual entries? >>> >>> Grr the short answer is the current code (i.e. without Elijah's >>> changes) works but in a twisted way. So you get to keep untracked >>> cache in the end. >> >> GAAAHH.. it works _with_ Elijah's changes (since he made the change >> from dst to src) not without (and no performance regression). > > So...is that an Acked-by for the patch Yes, Acked-by: me. > or does the "two wrong make a > right, I guess" comment suggest that we should still drop the > move_index_extensions change (essentially reverting to v1 of the PATCH > as found at 20180421193736.12722-1-new...@gmail.com), and you'll fix > things up further in a separate series? I think I'll stay away from this file for a while. When I gather enough courage, I'll need to read it through since it sounds like a mine field. -- Duy
Re: [PATCH v5 00/10] Keep all info in command-list.txt in git binary
This is probably scope creep for this series, but do you guys think we should do the same for config variables completion? We currently maintain a giant list at the end of _git_config(). Extracting the list from Documentation/config.txt to keep it in a C array does not look super hard. There will be some special handling for advice.* or color but overall I still think it's a net gain. Listing all recognizable config variables from "git config" (or "git help") would be lovely, but I don't think it helps unless we could print a short summary line each variable, but this info is not available anywhere. -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Mon, Apr 30, 2018 at 1:56 AM, Junio C Hamano <gits...@pobox.com> wrote: > Duy Nguyen <pclo...@gmail.com> writes: > >> Target revision should be available in the index. But this gives me an >> idea to another thing that bugs me: sending the list to the hook means >> I have to deal with separator (\n or NUL?) or escaping. This mentions >> of index makes me take a different direction. I could produce a small >> index that contains just what is modified, then you can retrieve >> whatever info you want with `git ls-files` or even `git show` after >> pointing $GIT_INDEX_FILE to it. > > That's somewhere in between a tail wagging the dog and a hammer > looking like a solution even when you have a screw. By passing a > temporary index, you may be able to claim that you are feeding the > necessary information without corruption and in a readable and > native format of Git, and make it up to the reader to grab the paths > out of it, but > > (1) the contents, and probably the cached stat information, in that > temporary index is duplicated and wasted; you know from the > time you design this thing that all you want to convey is a > list of paths. Yep, I was not happy about this. Which I why I moved to update the hook calling convention to pass pathspec to the hook instead. > (2) it is totally unclear who is responsible for cleaning the > temporary file up. The one that creates it deletes it, which is git. > (3) the recipient must walk and carefully grab the path, certainly > has to "deal with separator (\n or NUL?) or escaping" anyway, > especially if the reason you use a temporary index is because > "they can use ls-files on it". They need to read from ls-files > anyway, so that is no better than feeding ls-files compatible > input from the standard input of the hook script. Err "ls-files compatible input" or "ls-files compatible _output_"? If by input you mean the pathspec we give to ls-files, I agree. If it's standard ls-files --stage output , letting the hook call ls-files lets it select the output format it wants (including the potential json output in the future), which is more flexible. > > no? -- Duy
Re: [PATCH v1] test-drop-caches: simplify delay loading of NtSetSystemInformation
On Mon, Apr 30, 2018 at 4:26 PM, Ben Peartwrote: > Take advantage of the recent addition of support for lazy loading functions > on Windows to simplfy the loading of NtSetSystemInformation. > > Signed-off-by: Ben Peart > --- > > Notes: > Base Ref: master > Web-Diff: https://github.com/benpeart/git/commit/6e6ce4a788 > Checkout: git fetch https://github.com/benpeart/git test-drop-caches-v1 > && git checkout 6e6ce4a788 > > t/helper/test-drop-caches.c | 16 > 1 file changed, 4 insertions(+), 12 deletions(-) > > diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c > index 838760898b..dd41da1a2c 100644 > --- a/t/helper/test-drop-caches.c > +++ b/t/helper/test-drop-caches.c > @@ -1,5 +1,6 @@ > #include "test-tool.h" > #include "git-compat-util.h" > +#include "lazyload.h" This is in compat/win32, should it be inside the "if defined (GIT_WINDOWS_NATIVE)" block instead of here? > > #if defined(GIT_WINDOWS_NATIVE) > > @@ -82,8 +83,6 @@ static int cmd_dropcaches(void) > { > HANDLE hProcess = GetCurrentProcess(); > HANDLE hToken; > - HMODULE ntdll; > - DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG); > SYSTEM_MEMORY_LIST_COMMAND command; > int status; > > @@ -95,14 +94,9 @@ static int cmd_dropcaches(void) > > CloseHandle(hToken); > > - ntdll = LoadLibrary("ntdll.dll"); > - if (!ntdll) > - return error("Can't load ntdll.dll, wrong Windows version?"); > - > - NtSetSystemInformation = > - (DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, > "NtSetSystemInformation"); > - if (!NtSetSystemInformation) > - return error("Can't get function addresses, wrong Windows > version?"); > + DECLARE_PROC_ADDR(ntdll.dll, DWORD, NtSetSystemInformation, INT, > PVOID, ULONG); > + if (!INIT_PROC_ADDR(NtSetSystemInformation)) > + return error("Could not find NtSetSystemInformation() > function"); > > command = MemoryPurgeStandbyList; > status = NtSetSystemInformation( > @@ -115,8 +109,6 @@ static int cmd_dropcaches(void) > else if (status != STATUS_SUCCESS) > error("Unable to execute the memory list command %d", status); > > - FreeLibrary(ntdll); > - > return status; > } > > > base-commit: 1f1cddd558b54bb0ce19c8ace353fd07b758510d > -- > 2.17.0.windows.1 > -- Duy
Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen <pclo...@gmail.com> wrote: > On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin > <johannes.schinde...@gmx.de> wrote: >>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc >>> > *t, struct unpack_trees_options >>> > WRITE_TREE_SILENT | >>> > WRITE_TREE_REPAIR); >>> > } >>> > - move_index_extensions(>result, o->dst_index); >>> > + move_index_extensions(>result, o->src_index); >>> >>> While this looks like the right thing to do on paper, I believe it's >>> actually broken for a specific case of untracked cache. In short, >>> please do not touch this line. I will send a patch to revert >>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08), >>> which essentially deletes this line, with proper explanation and >>> perhaps a test if I could come up with one. >>> >>> When we update the index, we depend on the fact that all updates must >>> invalidate the right untracked cache correctly. In this unpack >>> operations, we start copying entries over from src to result. Since >>> 'result' (at least from the beginning) does not have an untracked >>> cache, it has nothing to invalidate when we copy entries over. By the >>> time we have done preparing 'result', what's recorded in src's (or >>> dst's for that matter) untracked cache may or may not apply to >>> 'result' index anymore. This copying only leads to more problems when >>> untracked cache is used. >> >> Is there really no way to invalidate just individual entries? > > Grr the short answer is the current code (i.e. without Elijah's > changes) works but in a twisted way. So you get to keep untracked > cache in the end. GAAAHH.. it works _with_ Elijah's changes (since he made the change from dst to src) not without (and no performance regression). This file really messes my brain up. -- Duy
Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelinwrote: >> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc >> > *t, struct unpack_trees_options >> > WRITE_TREE_SILENT | >> > WRITE_TREE_REPAIR); >> > } >> > - move_index_extensions(>result, o->dst_index); >> > + move_index_extensions(>result, o->src_index); >> >> While this looks like the right thing to do on paper, I believe it's >> actually broken for a specific case of untracked cache. In short, >> please do not touch this line. I will send a patch to revert >> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08), >> which essentially deletes this line, with proper explanation and >> perhaps a test if I could come up with one. >> >> When we update the index, we depend on the fact that all updates must >> invalidate the right untracked cache correctly. In this unpack >> operations, we start copying entries over from src to result. Since >> 'result' (at least from the beginning) does not have an untracked >> cache, it has nothing to invalidate when we copy entries over. By the >> time we have done preparing 'result', what's recorded in src's (or >> dst's for that matter) untracked cache may or may not apply to >> 'result' index anymore. This copying only leads to more problems when >> untracked cache is used. > > Is there really no way to invalidate just individual entries? Grr the short answer is the current code (i.e. without Elijah's changes) works but in a twisted way. So you get to keep untracked cache in the end. I was right about the invalidation stuff. I knew about invalidate_ce_path() in this file. What I didn't remember was this function actually invalidates entries from the _source_ index, not the result one. What kind of logic is that? You copy/move entries from source to result than you go invalidate the source. Since the original move_index_extensions() call moves extensions from the source, these are already properly invalidated (both untracked cache and cache tree), it it looks like it does the right thing. Two wrongs make a right, I guess. Sorry for venting. I was not happy with what I found. And sorry for wasting your time making this move_index.. change then remove it. > I have a couple of worktrees which are *huge*. And edf3b90553 really > helped relieve the pain a bit when running `git status`. Now you say that > even a `git checkout -b new-branch` would blow the untracked cache away > again? > > It would be *really* nice if we could prevent that performance regression > somehow. > > Ciao, > Dscho -- Duy
Re: [PATCH v5 09/10] help: use command-list.txt for the source of guides
Phillip (and others) the changes in this patch make "git help -g" now lists a lot more guides than just the "common" one as advertised (see below for the exact list). The man page for "git help -g" also mentions that it would list "useful" guides, not all guides. But we have no way to list all guides as far as I can tell. I guess we have two options forward: - keep "help -g" to common guide (we can tag common guides in command-list.txt) and add a new option to list all guides ("help -ag"?) - reword the man page to make "help -g" list all guides I'm ok with either direction. What's your preference? For comparison, this is the new output The common Git guides are: attributes Defining attributes per path cli Git command-line interface and conventions core-tutorial A Git core tutorial for developers cvs-migration Git for CVS users diffcoreTweaking diff output everydayA useful minimum set of commands for Everyday Git glossaryA Git Glossary hooks Hooks used by Git ignore Specifies intentionally untracked files to ignore modules Defining submodule properties namespaces Git namespaces repository-layout Git Repository Layout revisions Specifying revisions and ranges for Git tutorialA tutorial introduction to Git tutorial-2 A tutorial introduction to Git: part two workflows An overview of recommended workflows with Git compared to the old version The common Git guides are: attributes Defining attributes per path everyday Everyday Git With 20 Commands Or So glossary A Git glossary ignore Specifies intentionally untracked files to ignore modules Defining submodule properties revisionsSpecifying revisions and ranges for Git tutorial A tutorial introduction to Git (for version 1.5.1 or newer) workflowsAn overview of recommended workflows with Git On Sun, Apr 29, 2018 at 8:18 PM, Nguyễn Thái Ngọc Duywrote: > The help command currently hard codes the list of guides and their > summary in C. Let's move this list to command-list.txt. This lets us > extract summary lines from Documentation/git*.txt. This also > potentially lets us list guides in git.txt, but I'll leave that for > now. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > Documentation/gitattributes.txt| 2 +- > Documentation/gitmodules.txt | 2 +- > Documentation/gitrevisions.txt | 2 +- > Makefile | 2 +- > builtin/help.c | 32 -- > command-list.txt | 16 + > contrib/completion/git-completion.bash | 15 > help.c | 18 --- > help.h | 1 + > t/t0012-help.sh| 6 + > 10 files changed, 52 insertions(+), 44 deletions(-) > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 1094fe2b5b..083c2f380d 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -3,7 +3,7 @@ gitattributes(5) > > NAME > > -gitattributes - defining attributes per path > +gitattributes - Defining attributes per path > > SYNOPSIS > > diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt > index db5d47eb19..4d63def206 100644 > --- a/Documentation/gitmodules.txt > +++ b/Documentation/gitmodules.txt > @@ -3,7 +3,7 @@ gitmodules(5) > > NAME > > -gitmodules - defining submodule properties > +gitmodules - Defining submodule properties > > SYNOPSIS > > diff --git a/Documentation/gitrevisions.txt b/Documentation/gitrevisions.txt > index 27dec5b91d..1f6cceaefb 100644 > --- a/Documentation/gitrevisions.txt > +++ b/Documentation/gitrevisions.txt > @@ -3,7 +3,7 @@ gitrevisions(7) > > NAME > > -gitrevisions - specifying revisions and ranges for Git > +gitrevisions - Specifying revisions and ranges for Git > > SYNOPSIS > > diff --git a/Makefile b/Makefile > index 71b5b594cd..18696e35b0 100644 > --- a/Makefile > +++ b/Makefile > @@ -1939,7 +1939,7 @@ $(BUILT_INS): git$X > > command-list.h: generate-cmdlist.sh command-list.txt > > -command-list.h: $(wildcard Documentation/git-*.txt) > +command-list.h: $(wildcard Documentation/git*.txt) > $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh command-list.txt >$@+ > && mv $@+ $@ > > SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ > diff --git a/builtin/help.c b/builtin/help.c > index 83a7d73afe..b58e8d5f6a 100644 > --- a/builtin/help.c > +++ b/builtin/help.c > @@ -402,38 +402,6 @@ static void show_html_page(const char *git_cmd) > open_html(page_path.buf); > } > > -static struct { > - const char *name; > - const
Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newrenwrote: > Currently, all callers of unpack_trees() set o->src_index == o->dst_index. > The code in unpack_trees() does not correctly handle them being different. > There are two separate issues: > > First, there is the possibility of memory corruption. Since > unpack_trees() creates a temporary index in o->result and then discards > o->dst_index and overwrites it with o->result, in the special case that > o->src_index == o->dst_index, it is safe to just reuse o->src_index's > split_index for o->result. However, when src and dst are different, > reusing o->src_index's split_index for o->result will cause the > split_index to be shared. If either index then has entries replaced or > removed, it will result in the other index referring to free()'d memory. > > Second, we can drop the index extensions. Previously, we were moving > index extensions from o->dst_index to o->result. Since o->src_index is > the one that will have the necessary extensions (o->dst_index is likely to > be a new index temporary index created to store the results), we should be > moving the index extensions from there. > > Signed-off-by: Elijah Newren > --- > > Differences from v2: > - Don't NULLify src_index until we're done using it > - Actually built and tested[1] > > But it now passes the testsuite on both linux and mac[2], and I even re-merged > all 53288 merge commits in linux.git (with a merge of this patch together with > the directory rename detection series) for good measure. [Only 7 commits > showed a difference, all due to directory rename detection kicking in.] > > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of > parallelization are great until you realize that your new setup omitted a > critical step, leaving you running a slightly stale version of git instead... > :-( > > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both > with unicode normalization tests, but those two tests fail before my changes > too. All the other tests pass. > > unpack-trees.c | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/unpack-trees.c b/unpack-trees.c > index e73745051e..49526d70aa 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, > struct unpack_trees_options > o->result.timestamp.sec = o->src_index->timestamp.sec; > o->result.timestamp.nsec = o->src_index->timestamp.nsec; > o->result.version = o->src_index->version; > - o->result.split_index = o->src_index->split_index; > - if (o->result.split_index) > + if (!o->src_index->split_index) { > + o->result.split_index = NULL; > + } else if (o->src_index == o->dst_index) { > + /* > +* o->dst_index (and thus o->src_index) will be discarded > +* and overwritten with o->result at the end of this function, > +* so just use src_index's split_index to avoid having to > +* create a new one. > +*/ > + o->result.split_index = o->src_index->split_index; > o->result.split_index->refcount++; > + } else { > + o->result.split_index = init_split_index(>result); > + } > hashcpy(o->result.sha1, o->src_index->sha1); > o->merge_size = len; > mark_all_ce_unused(o->src_index); > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, > struct unpack_trees_options > } > } > > - o->src_index = NULL; > ret = check_updates(o) ? (-2) : 0; > if (o->dst_index) { > if (!ret) { > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, > struct unpack_trees_options > WRITE_TREE_SILENT | > WRITE_TREE_REPAIR); > } > - move_index_extensions(>result, o->dst_index); > + move_index_extensions(>result, o->src_index); While this looks like the right thing to do on paper, I believe it's actually broken for a specific case of untracked cache. In short, please do not touch this line. I will send a patch to revert edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08), which essentially deletes this line, with proper explanation and perhaps a test if I could come up with one. When we update the index, we depend on the fact that all updates must invalidate the right untracked cache correctly. In this unpack operations, we start copying entries over from src to result. Since 'result' (at least from the beginning) does not have an untracked cache, it has nothing to invalidate when we copy entries over. By the time we have done preparing 'result', what's recorded in src's (or dst's for
Re: [PATCH v4/wip 02/12] generate-cmds.sh: export all commands to command-list.h
On Wed, Apr 25, 2018 at 8:07 PM, Eric Sunshinewrote: > On Wed, Apr 25, 2018 at 12:30 PM, Nguyễn Thái Ngọc Duy > wrote: >> The current generate-cmds.sh generates just enough to print "git help" >> output. That is, it only extracts help text for common commands. >> >> The script is now updated to extract help text for all commands and >> keep command classification a new file, command-list.h. This will be >> useful later: >> [...] >> Signed-off-by: Nguyễn Thái Ngọc Duy >> --- >> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh >> @@ -12,14 +34,51 @@ get_synopsis () { >> +define_categories() { >> + echo >> + echo "/* Command categories */" >> + bit=0 >> + category_list "$1" | >> + while read cat >> + do >> + echo "#define CAT_$cat (1UL << $bit)" >> + bit=$(($bit+1)) >> + done >> + test "$bit" -gt 32 && die "Urgh.. too many categories?" > > Should this be '-ge' rather than '-gt'? After we print "1UL << 31" we increment it to 32 and exit the loop, then it's still within limits, -ge would incorrectly complain -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Thu, Apr 26, 2018 at 4:46 PM, Michał Górnywrote: > For the record, we're using this with ebuilds and respective cache files > (which are expensive to generate). We are using separate repository > which combines sources and cache files to keep the development > repository clean. I have researched different solutions for this but > git turned out the best option for incremental updates for us. > > Tarballs are out of question, unless you expect users to fetch >100 MiB > every time, and they are also expensive to update. Deltas of tarballs > are just slow and require storing a lot of extra data. Rsync is not > very efficient at frequent updates, and has significant overhead > on every run. With all its disadvantages, git is still something that > lets our users fetch updates frequently with minimal network overhead. I assume you're talking about the metadata directory in gentoo-x86 repo. This specific case could be solved by renaming metadata to _metadata or something to put it on the top. "git checkout" always updates files in strcmp(path) order. This guarantees time(_metadata) <= time(ebuild) for all ebuilds without any extra touching (either in git or in a post-checkout hook) The behavior has been this way since forever and as far as I can tell very unlikely to change at least for branch switching (major changes involved around the index). It's a bit easier to accidentally change how "git checkout -- path" works though. I don't know if we could just make this checkout order a promise and guarantee not to break it though. For it it does not sound like it adds extra maintenance burden. -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Fri, Apr 27, 2018 at 11:08 PM, Marc Branchaudwrote: >> On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud >> This is a limitation of the current post-checkout hook. $3==0 from the >> hook lets us know this is not a branch switch, but it does not really >> tell you the affected paths. If it somehow passes all the given >> pathspec to you, then you should be able to do "git ls-files -- >> $pathspec" which gives you the exact same set of paths that >> git-checkout updates. We could do this by setting $4 to "--" and put >> all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in >> the above example. >> >> There is third case here, if you do "git checkout -- >> path/to/file" then it cannot be covered by the current design. I guess >> we could set $3 to '2' (retrieve from a tree) to indicate this in >> addition to 0 (from index) and 1 (from switching branch) and then $1 >> could be the tree in question (pathspecs are passed the same way >> above) >> >> [1] I wonder if we could have a more generic approach to pass >> pathspecs via environment, which could work for more than just this >> one hook. Not sure if it's a good idea though. > > > I think there needs to be something other than listing all the paths in the > command is viable, because it's too easy to hit some command-line-length > limit. I send pathspecs, not paths. If you type "git checkout -- foo/" then I send exactly "foo/" not every paths in it. You can always figure that out with git-ls-files. Sure this can still hit command length limit when you do "git checkout -- foo/*" and have lots of files in foo just one more param from hitting the limit, then the hook may hit the limit because we need more command line arguments. But this is the corner case I don't think we should really need to care about. > It would also be good if hook authors didn't have to re-invent the > wheel of determining the changed paths for every corner-case. Flexibility vs convenience I guess. A sample hook as template should help the reinvention. > My first instinct is to write them one-per-line on the hook's stdin. That's > probably not generic enough for most hooks, but it seems like a good > approach for this proposal. > > Throwing them into a temporary file with a known name is also good --- > better, I think, than stuffing them into an environment variable. This goes back to my post-checkout-modified proposal. If you're writing to file, might as well reuse the index format. Then you can read it with ls-files (which lets you decide path separator or even quoting, I'm not sure) and it also provides some more info like file hashes, access time... -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Fri, Apr 27, 2018 at 11:08 PM, Elijah Newren <new...@gmail.com> wrote: > On Fri, Apr 27, 2018 at 10:03 AM, Duy Nguyen <pclo...@gmail.com> wrote: >> On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud <marcn...@xiplink.com> wrote: >>> >>> * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are >>> identical so the above loop does nothing. Offhand I'm not even sure how a >>> hook might get the right files in this case. >> >> This is a limitation of the current post-checkout hook. $3==0 from the >> hook lets us know this is not a branch switch, but it does not really >> tell you the affected paths. If it somehow passes all the given >> pathspec to you, then you should be able to do "git ls-files -- >> $pathspec" which gives you the exact same set of paths that >> git-checkout updates. We could do this by setting $4 to "--" and put >> all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in >> the above example. >> >> There is third case here, if you do "git checkout -- >> path/to/file" then it cannot be covered by the current design. I guess >> we could set $3 to '2' (retrieve from a tree) to indicate this in >> addition to 0 (from index) and 1 (from switching branch) and then $1 >> could be the tree in question (pathspecs are passed the same way >> above) >> >> [1] I wonder if we could have a more generic approach to pass >> pathspecs via environment, which could work for more than just this >> one hook. Not sure if it's a good idea though. > > Here's a crazy idea -- maybe instead of a list of pathspecs you just > provide the timestamp of when git checkout started. Then the hook > could walk the tree, find all files with modification times at least > that late, and modify them all back to the the timestamp of when the > git checkout started. > > Would that be enough? Is that too crazy? For this use case? Probably (assuming that timestamp precision does not cause problems). I'm more concerned about what info post-checkout hook should provide but does not. Giving hook writer a way to get the list of updated files lets them do more fancy stuff while providing checkout start time probably only helps just this one case. Providing start time in general for all hooks sounds like a good thing though (and simple enough to implement). -- Duy
Re: [PATCH 18/41] index-pack: abstract away hash function constant
On Fri, Apr 27, 2018 at 11:08 PM, brian m. carlson <sand...@crustytoothpaste.net> wrote: > On Thu, Apr 26, 2018 at 05:46:28PM +0200, Duy Nguyen wrote: >> On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågren <martin.ag...@gmail.com> wrote: >> > Once that is accomplished, I sort of suspect that this code will want to >> > be updated to not always blindly use the_hash_algo, but to always work >> > with SHA-1 sizes. Or rather, this would turn into more generic code to >> > handle both "v2 with SHA-1" and "v3 with some hash function(s)". This >> > commit might be a good first step in that direction. >> >> I also have an uneasy feeling when things this close to on-disk file >> format get hash-agnostic treatment. I think we would need to start >> adding new file formats soon, from bottom up with simple things like >> loose object files (cat-file and hash-object should be enough to test >> blobs...), then moving up to pack files and more. This is when we can >> really decide where to use the new hash and whether we should keep >> some hashes as sha-1. > > I agree that this is work which needs to be done soon. There are > basically a couple of pieces we need to handle NewHash: > > * Remove the dependencies on SHA-1 as much as possible. > * Get the tests to pass with a different hash (almost done for 160-bit > hash; in progress for 256-bit hashes). This step sounds good on paper but realistically could be a nightmare for you. I tried to implement a simple cat-file/hash-object combination with my imaginary newhash, which sounded straightforward to me since you have done a lot of heavylifting. To my surprise I hit a lot more problems. My point is, when I concentrate on just a few simple cases like this, I have a smaller scope to work with and could quickly identify problems. When you work on the scale of the test suite, it's really hard to know where the problem is (and you don't even know what areas are newhash-safe). Anyway my cat-file/hash-object modification could be found here. I probably will polish and send out a few good patches from it. https://github.com/pclouds/git/commits/new-hash -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaudwrote: >> The best approach to do so is to have those people do the "touch" >> thing in their own post-checkout hook. People who use Git as the >> source control system won't have to pay runtime cost of doing the >> touch thing, and we do not have to maintain such a hook script. >> Only those who use the "feature" would. > > > The post-checkout hook approach is not exactly straightforward. I am revisiting this because I'm not even happy with my post-checkout-modified hook suggestion, so.. > > Naively, it's simply > > for F in `git diff --name-only $1 $2`; do touch "$F"; done > > But consider: > > * Symlinks can cause the wrong file to be touched. (Granted, Michał's > proposed patch also doesn't deal with symlinks.) Let's assume that a hook > can be crafted will all possible sophistication. There are still some > fundamental problems: OK so this one could be tricky to get right, but it's possible. > > * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are > identical so the above loop does nothing. Offhand I'm not even sure how a > hook might get the right files in this case. This is a limitation of the current post-checkout hook. $3==0 from the hook lets us know this is not a branch switch, but it does not really tell you the affected paths. If it somehow passes all the given pathspec to you, then you should be able to do "git ls-files -- $pathspec" which gives you the exact same set of paths that git-checkout updates. We could do this by setting $4 to "--" and put all the pathspecs in $5+ [1] e.g. "HEAD@{1} HEAD 0 -- path/to/file" in the above example. There is third case here, if you do "git checkout -- path/to/file" then it cannot be covered by the current design. I guess we could set $3 to '2' (retrieve from a tree) to indicate this in addition to 0 (from index) and 1 (from switching branch) and then $1 could be the tree in question (pathspecs are passed the same way above) [1] I wonder if we could have a more generic approach to pass pathspecs via environment, which could work for more than just this one hook. Not sure if it's a good idea though. > * The hook has to be set up in every repo and submodule (at least until > something like Ævar's experiments come to fruition). Either --template or core.hooksPath would solve this, or I'll try to get my "hooks.* config" patch in. I think that one is a good thing to do anyway because it allows more flexible hook management (and it could allow multiple hooks of the same name too). With this, we could avoid adding more command-specific config like core.fsmonitor or uploadpack.packObjectsHook which to me are hooks. Another option is extend core.hooksPath for searching multiple places instead of just one like AEvar suggested. > * A fresh clone can't run the hook. This is especially important when > dealing with submodules. (In one case where we were bit by this, make > though that half of a fresh submodule clone's files were stale, and decided > to re-autoconf the entire thing.) Both --template and config-based hooks should let you handle this case. So, I think if we improve the hook system to give more information (which is something we definitely should do), we could do this without adding special cases in git. I'm not saying that we should never add special cases, but at least this lets us play with different kinds of post-checkout activities before we decide which one would be best implemented in git. -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Thu, Apr 26, 2018 at 07:53:58PM +0200, SZEDER Gábor wrote: > BTW, wouldn't running > > git clone --template=/path/to/template/dir/with/hooks/ > > invoke the post-checkout hook in there? > Yes but it's cumbersome, preparing a template just for one extra hook. I never like this template thing to be honest. -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Thu, Apr 26, 2018 at 05:48:35PM +, Robin H. Johnson wrote: > On Thu, Apr 26, 2018 at 06:43:56PM +0200, Duy Nguyen wrote: > > On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaud <marcn...@xiplink.com> > > wrote: > > > Are we all that sure that the performance hit is that drastic? After all, > > > we've just done write_entry(). Calling utime() at that point should just > > > hit the filesystem cache. > > I have a feeling this has "this is linux" assumption. Anybody knows > > how freebsd, mac os x and windows behave? > I don't know sorry. futimens might be better here if it can be used > before the fd is closed. > > > > * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are > > > identical so the above loop does nothing. Offhand I'm not even sure how a > > > hook might get the right files in this case. > > Would a hook that gives you the list of updated files (in the exact > > same order that git updates) help? > Yes, that, along with the target revision I think would allow most or > all of the desired behaviors mentioned in this thread *. Target revision should be available in the index. But this gives me an idea to another thing that bugs me: sending the list to the hook means I have to deal with separator (\n or NUL?) or escaping. This mentions of index makes me take a different direction. I could produce a small index that contains just what is modified, then you can retrieve whatever info you want with `git ls-files` or even `git show` after pointing $GIT_INDEX_FILE to it. So it's basically what the following (hacky) patch does. It adds support for a new hook named post-checkout-modified. This hook will prepares $GIT_DIR/index.modified which contains just the files git-checkout has touched and deletes it after the hook finishes. My test hook is pretty simple just to dump out what in there #!/bin/sh GIT_INDEX_FILE=`git rev-parse --git-path index.modified` git ls-files --stage and it seems to work. Of course, this does not give you the checkout order. But checkout order has always been sorted order by path if I remember correctly and it's unlikely to change (and I don't think you really need that exact order anyway) -- 8< -- diff --git a/builtin/checkout.c b/builtin/checkout.c index b49b582071..92b30cd05f 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -52,6 +52,8 @@ struct checkout_opts { const char *prefix; struct pathspec pathspec; struct tree *source_tree; + + struct index_state istate_modified; }; static int post_checkout_hook(struct commit *old_commit, struct commit *new_commit, @@ -470,7 +472,7 @@ static void setup_branch_path(struct branch_info *branch) branch->path = strbuf_detach(, NULL); } -static int merge_working_tree(const struct checkout_opts *opts, +static int merge_working_tree(struct checkout_opts *opts, struct branch_info *old_branch_info, struct branch_info *new_branch_info, int *writeout_error) @@ -595,6 +597,27 @@ static int merge_working_tree(const struct checkout_opts *opts, if (!cache_tree_fully_valid(active_cache_tree)) cache_tree_update(_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR); + if (find_hook("post-checkout-modified")) { + int i; + + for (i = 0; i < the_index.cache_nr; i++) { + struct cache_entry *ce = the_index.cache[i]; + struct cache_entry *new_ce; + + /* +* Hack: this is an abuse of this flag, hidden +* dependency with write_locked_index() +*/ + if (!(ce->ce_flags & CE_UPDATE_IN_BASE)) + continue; + + new_ce = xcalloc(1, cache_entry_size(ce_namelen(ce))); + memcpy(new_ce, ce, cache_entry_size(ce_namelen(ce))); + add_index_entry(>istate_modified, new_ce, + ADD_CACHE_JUST_APPEND); + } + } + if (write_locked_index(_index, _file, COMMIT_LOCK)) die(_("unable to write new index file")); @@ -811,7 +834,7 @@ static void orphaned_commit_warning(struct commit *old_commit, struct commit *ne clear_commit_marks_all(ALL_REV_FLAGS); } -static int switch_branches(const struct checkout_opts *opts, +static int switch_branches(struct checkout_opts *opts, struct branch_info *new_branch_info) { int ret = 0; @@ -848,6 +871,16 @@ static int switch_branches(const struct checkout_opts *opts, update_refs_for_switch(opts, _branch_info, new_branch_info);
Re: [RFC PATCH] checkout: Force matching mtime between files
On Wed, Apr 25, 2018 at 10:41:18AM +0200, Ævar Arnfjörð Bjarmason wrote: > 2. Add some config so we can have hook search paths, and ship with a > default search path for hooks shipped with git. I would go for something like this instead of search paths. This allows you to specify a path to any specific hook in hooks.* config group. This is basically "$GIT_DIR/hooks directory in config file" but with lower priority than those in $GIT_DIR/hooks. Now we can do something like git -c hooks.post-checkout=/path/to/some/script clone ... but of course I would need to fix the FIXME or this hook config is only effective just this one time. (And of course you could put it in ~/.gitconfig) -- 8< -- diff --git a/builtin/clone.c b/builtin/clone.c index 7df5932b85..143413ed2d 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1063,6 +1063,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix) strbuf_addf(_top, "refs/remotes/%s/", option_origin); } + /* +* FIXME: we should keep all custom config settings via +* "git -c ..." in $GIT_DIR/config. +*/ + strbuf_addf(, "+%s*:%s*", src_ref_prefix, branch_top.buf); strbuf_addf(, "remote.%s.url", option_origin); git_config_set(key.buf, repo); diff --git a/run-command.c b/run-command.c index 84899e423f..ee5b6ea329 100644 --- a/run-command.c +++ b/run-command.c @@ -7,6 +7,7 @@ #include "strbuf.h" #include "string-list.h" #include "quote.h" +#include "config.h" void child_process_init(struct child_process *child) { @@ -1256,6 +1257,8 @@ const char *find_hook(const char *name) strbuf_reset(); strbuf_git_path(, "hooks/%s", name); if (access(path.buf, X_OK) < 0) { + const char *cfg_hook; + struct strbuf cfg_key = STRBUF_INIT; int err = errno; #ifdef STRIP_EXTENSION @@ -1278,9 +1281,14 @@ const char *find_hook(const char *name) path.buf); } } - return NULL; + + strbuf_reset(); + strbuf_addf(_key, "hooks.%s", name); + if (!git_config_get_pathname(cfg_key.buf, _hook)) + strbuf_addstr(, cfg_hook); + strbuf_release(_key); } - return path.buf; + return path.len ? path.buf : NULL; } int run_hook_ve(const char *const *env, const char *name, va_list args) -- 8< -- -- Duy
Re: [RFC PATCH] checkout: Force matching mtime between files
On Wed, Apr 25, 2018 at 5:18 PM, Marc Branchaudwrote: > Are we all that sure that the performance hit is that drastic? After all, > we've just done write_entry(). Calling utime() at that point should just > hit the filesystem cache. I have a feeling this has "this is linux" assumption. Anybody knows how freebsd, mac os x and windows behave? > The post-checkout hook approach is not exactly straightforward. > > Naively, it's simply > > for F in `git diff --name-only $1 $2`; do touch "$F"; done > > But consider: > > * Symlinks can cause the wrong file to be touched. (Granted, Michał's > proposed patch also doesn't deal with symlinks.) Let's assume that a hook > can be crafted will all possible sophistication. There are still some > fundamental problems: > > * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are > identical so the above loop does nothing. Offhand I'm not even sure how a > hook might get the right files in this case. Would a hook that gives you the list of updated files (in the exact same order that git updates) help? > * The hook has to be set up in every repo and submodule (at least until > something like Ævar's experiments come to fruition). > > * A fresh clone can't run the hook. This is especially important when > dealing with submodules. (In one case where we were bit by this, make > though that half of a fresh submodule clone's files were stale, and decided > to re-autoconf the entire thing.) This to me sounds like something we should be able to improve in general. The alternative is "git clone --no-checkout" then checkout manually (which I see jenkins plugin does) is really not optimal even if it works. -- Duy
Re: [PATCH 18/41] index-pack: abstract away hash function constant
On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågrenwrote: >> I agree that pack v2 is not going to have anything but SHA-1. However, >> writing all the code such that it's algorithm agnostic means that we can >> do testing of new algorithms by wholesale replacing the algorithm with a >> new one, which simplifies things considerably. > > Ok. I do sort of wonder if a "successful" test run after globally > substituting Hash-Foo for SHA-1 (regardless of whether the size changes > or not) hints at a problem. That is, nowhere do we test that this code > uses 20-byte SHA-1s, regardless of what other hash functions are > available and configured. Of course, until soon, that did not really > have to be tested since there was only one hash function available to > choose from. As for identifying all the places that matter ... no idea. > > Of course I can see how this helps get things to a point where Git does > not crash and burn because the hash has a different size, and where the > test suite doesn't spew failures because the initial chaining value of > "SHA-1" is changed. > > Once that is accomplished, I sort of suspect that this code will want to > be updated to not always blindly use the_hash_algo, but to always work > with SHA-1 sizes. Or rather, this would turn into more generic code to > handle both "v2 with SHA-1" and "v3 with some hash function(s)". This > commit might be a good first step in that direction. I also have an uneasy feeling when things this close to on-disk file format get hash-agnostic treatment. I think we would need to start adding new file formats soon, from bottom up with simple things like loose object files (cat-file and hash-object should be enough to test blobs...), then moving up to pack files and more. This is when we can really decide where to use the new hash and whether we should keep some hashes as sha-1. For trailing hashes for example, there's no need to move to a new hash which only costs us more cycles. We just use it as a fancy checksum to avoid bit flips. But then my assumption about cost may be completely wrong without experimenting. > Long rambling short, yeah, I see your point. So yeah. It may be ok to move everything to "new hash" now. But we need a closer look soon. -- Duy
Re: What's cooking in git.git (Apr 2018, #03; Wed, 25)
On Wed, Apr 25, 2018 at 10:37 AM, Junio C Hamanowrote: > * nd/pack-objects-pack-struct (2018-04-16) 15 commits > ... > > "git pack-objects" needs to allocate tons of "struct object_entry" > while doing its work, and shrinking its size helps the performance > quite a bit. > > What's the doneness of this thing? The interdiff since previous > rounds looked reasonable, but I didn't see this round otherwise > scrutinized by reviewers. The numbers given in the commit near the > tip do look impressive, though ;-) I think it's ok to move it to next, though I'd prefer to move it to master just right after a release so it gets tested for a whole release cycle. This also gives Jeff a chance to check it after he's back (if he wants to). > * nd/repack-keep-pack (2018-04-16) 7 commits > ... > > "git gc" in a large repository takes a lot of time as it considers > to repack all objects into one pack by default. The command has > been taught to pretend as if the largest existing packfile is > marked with ".keep" so that it is left untouched while objects in > other packs and loose ones are repacked. > > What's the doneness of this thing? The interdiff since the earlier > one looked reasonable, but I didn't see this round otherwise > scrutinized by reviewers. This one should be safer than the previous one. I think it's ok to move to next. Anyway I'll re-read these two series this weekend to see if I could spot anything new. -- Duy
Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain
On Wed, Apr 25, 2018 at 3:46 PM, SZEDER Gábor <szeder@gmail.com> wrote: > On Tue, Apr 24, 2018 at 6:17 PM, Duy Nguyen <pclo...@gmail.com> wrote: >> On Tue, Apr 24, 2018 at 6:12 PM, Duy Nguyen <pclo...@gmail.com> wrote: >>> git-completion.bash will be updated to ask git "give me the commands >>> in the mainporcelain, completable or external category". This also >>> addresses another thing that bugs me: I wanted an option to let me >>> complete all commands instead of just porcelain. This approach kinda >>> generalizes that and it would be easy to let the user choose what >>> category they want to complete. >> >> To complete this: there could also be yet another special category >> "custom", where --list-cmds=custom simply returns a list of commands >> specified in config file. With this the user can pick the "custom" >> category to have total control of what command shows up at "git " >> if they are not satisfied with the predefined categories. > > Note that config settings can be repository-specific, which might > cause surprises if the user sets a custom command list in one > repository's config file, but not (or a different one!) in others. > Then the list of completed commands will depend on where the user > happened to be when first hitting 'git '. I think that is expected when the word "config file" is mentioned. It's no different than aliases, which could also be repo specific and affects completion. > Unless you forgo > caching the list of commands and run 'git --list-cmds=...' every time > the user hits 'git ', but that will add the overhead of fork()ing > a subshell and a git command. This is a good point. I'd rather forgo caching iff the "custom" strategy is used (technically we could still cache if we know the source if $HOME/.gitconfig or higher but it's not worth the effort). Just make it clear to the user that if they go with "custom" strategy then they may hit some performance hit. > Once upon a time I toyed with the idea of introducing environment > variables like $GIT_COMPLETION_EXCLUDE_COMMANDS and > $GIT_COMPLETION_INCLUDE_COMMANDS, to exclude and include commands from > 'git '. I wanted to exclude 'git cherry', because I have never > ever used it but it's always in the way when I want to cherry-pick. > And I wanted to include 'git for-each-ref' back when I was running it > frequently while working on speeding up refs completion, but it > wouldn't have brought that much benefits, because I have a 'git > for-each-commit' script, too. > Never really pursued it, though, just patched the long list in > __git_list_porcelain_commands() in my git clone :) I'm tempted to support "delta custom list" (e.g. "+cherry-pick" in the config variable means "add that command on the existing list", or "-blame" means exclude it) but that's probably too much work. -- Duy
Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain
On Wed, Apr 25, 2018 at 3:06 PM, SZEDER Gáborwrote: >> -__git_list_all_commands () > >> -__git_list_porcelain_commands () > > Users can have their own completion scriptlets for their own git > commands, and these should be able to rely on helper functions in > git-completion.bash to do things like refs completion and what not. > Therefore, we tend not to remove or alter those helper functions in a > backwards incompatible way, because we don't want to break those > completion scriptlets. What kind of "API" guarantee do we provide here? I don't mind adding the wrappers, but for me it's hard for new contributors (like me) to see which one should be stable and which one is internal. -- Duy
Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain
On Tue, Apr 24, 2018 at 6:12 PM, Duy Nguyen <pclo...@gmail.com> wrote: > git-completion.bash will be updated to ask git "give me the commands > in the mainporcelain, completable or external category". This also > addresses another thing that bugs me: I wanted an option to let me > complete all commands instead of just porcelain. This approach kinda > generalizes that and it would be easy to let the user choose what > category they want to complete. To complete this: there could also be yet another special category "custom", where --list-cmds=custom simply returns a list of commands specified in config file. With this the user can pick the "custom" category to have total control of what command shows up at "git " if they are not satisfied with the predefined categories. -- Duy
Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain
On Mon, Apr 23, 2018 at 3:32 PM, SZEDER Gáborwrote: > But then I noticed that it's not an accurate description of the > current situation, because there is a wide grey area between > porcelains and plumbing, and the completion script doesn't "filter out > plumbing commands", but rather filters out commands that can be > considered too low-level to be useful for "general" usage. > Consequently, after 'git ' we also list: > > - some 'ancillaryinterrogators': blame, annotate, difftool, fsck, > help > - some 'ancillarymanipulators': config, mergetool, remote > - some 'foreignscminterface': p4, request-pull, svn, send-email > - even some plumbing: apply, name-rev (though 'name-rev' could be > omitted; we have 'git describe') > - and also all "unknown" 'git-foo' commands that can be found in > $PATH, which can be the user's own git scripts or other > git-related tools ('git-annex', Git LFS, etc.). > > With this change we wouldn't list any of the above commands, but only > those that are explicitly categorized as 'mainporcelain'. I'd much > prefer the current behaviour. Yeah I noticed this (kinda) with filter-branch but I did not look further to see all this. It's good that you review this series then :) For the first group (known commands), how about we add a new category "completion" in command-list.txt? Each command may belong to multiple categories (and my updated script has to deal with that [1]). For the second group, we could also have a special "external" category that is produced at run time, not specified in command-list.txt. --list-cmds option either has to accept multiple values, or we accept multiple --list-cmds= options. git-completion.bash will be updated to ask git "give me the commands in the mainporcelain, completable or external category". This also addresses another thing that bugs me: I wanted an option to let me complete all commands instead of just porcelain. This approach kinda generalizes that and it would be easy to let the user choose what category they want to complete. [1] which also means I could bring "deprecated" category back. -- Duy
Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst
On Mon, Apr 23, 2018 at 7:09 PM, Elijah Newren <new...@gmail.com> wrote: > Hi, > > On Sun, Apr 22, 2018 at 5:38 AM, Duy Nguyen <pclo...@gmail.com> wrote: >>> - there's a better, more performant fix or there is some way to actually >>> share a split_index between two independent index_state objects. >> >> A cleaner way of doing this would be something to the line [1] >> >> move_index_extensions(>result, o->dst_index); >> >> near the end of this function. This could be where we compare the >> result index with the source index's shared file and see if it's worth >> keeping the shared index or not. Shared index is designed to work with >> huge index files though, any operations that go through all index >> entries will usually not be cheap. But at least it's safer. > > Yeah, it looks like move_index_extensions() currently has no logic for > the split_index. Adding it sounds to me like a patch series of its > own, and I'm keen to limit additional changes since my patch series > already broke things pretty badly once already. Oh I'm not suggesting that you do it. I was simply pointing out something I saw while I looked at this patch and surrounding area. And it's definitely should be done separately (by whoever) since merge logic is quite twisted as I understand it (then top it off with rename logic) >> [1] To me the second parameter should be src_index, not dst_index. >> We're copying entries from _source_ index to "result" and we should >> also copy extensions from the source index. That line happens to work >> only when dst_index is the same as src_index, which is the common use >> case so far. > > That makes sense; this sounds like another fix that should be > submitted. Did you want to submit a patch making that change? Do you > want me to? I did not look careful enough to make sure it was right and submit a patch. But it sounds like it could be another regression if dst_index is now not the same as src_index (sorry I didn't look at your whole stories and don't if dst_index != src_index is a new thing or not). If dst_index is new, moving extensions from that to result index is basically no-op, in other words we fail to copy necessary extensions over. -- Duy
Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary
On Sun, Apr 22, 2018 at 5:58 PM, Ramsay Joneswrote: >>> I think you need to try a little harder than this! ;-) >> >> Yeah. I did think about grepping the output but decided not to because >> of gettext poison stuff and column output in "git help". If we do want >> to test this, how about I extend --list-cmds= option to take a few >> more parameters? --list-cmds=common would output all common commands, >> --list-cmds= does the same for other command category. This >> way we can verify without worrying about text formatting, paging or >> translation. > > Hmm, my immediate reaction would be to prefer my simple tests. > Yes, they are not exactly rigorous and they would be affected > by changing the help formatting, but they are effective. ;-) > > [I don't think the formatting would change that often, or at > all - whoever submits that patch would get to update the tests!] Hmm.. for non-column output that's true. "git help" with column output should probably fine as well because even though we add more and more commands every month, these are not marked common (and unlikely so). So yeah I agree. > What did you think about adding the BUG() checks? I was thinking if there was a way to fail the build after running ./generate-cmds.sh and generating empty output but it does not sound easy to do. So yeah, BUG() checks sound good. -- Duy
Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary
On Sun, Apr 22, 2018 at 4:45 PM, Ramsay Jones <ram...@ramsayjones.plus.com> wrote: > > > On 21/04/18 17:56, Duy Nguyen wrote: >> On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote: >>> Changes: >>> >>> - remove the deprecated column in command-list.txt. My change break it >>> anyway if anyone uses it. >>> - fix up failed tests that I marked in the RFC and kinda forgot about it. >>> - fix bashisms in generate-cmdlist.sh >>> - fix segfaul in "git help" >> >> Sorry I forgot the interdiff >> > [snip] > >> diff --git a/t/t0012-help.sh b/t/t0012-help.sh >> index fd2a7f27dc..53208ab20e 100755 >> --- a/t/t0012-help.sh >> +++ b/t/t0012-help.sh >> @@ -25,6 +25,15 @@ test_expect_success "setup" ' >> EOF >> ' >> >> +# make sure to exercise these code paths, the output is a bit tricky >> +# to verify >> +test_expect_success 'basic help commands' ' >> + git help >/dev/null && >> + git help -a >/dev/null && >> + git help -g >/dev/null && >> + git help -av >/dev/null >> +' >> + > I think you need to try a little harder than this! ;-) Yeah. I did think about grepping the output but decided not to because of gettext poison stuff and column output in "git help". If we do want to test this, how about I extend --list-cmds= option to take a few more parameters? --list-cmds=common would output all common commands, --list-cmds= does the same for other command category. This way we can verify without worrying about text formatting, paging or translation. -- Duy
Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst
On Sat, Apr 21, 2018 at 9:37 PM, Elijah Newrenwrote: > Currently, all callers of unpack_trees() set o->src_index == o->dst_index. > Since we create a temporary index in o->result, then discard o->dst_index > and overwrite it with o->result, when o->src_index == o->dst_index it is > safe to just reuse o->src_index's split_index for o->result. However, > o->src_index and o->dst_index are specified separately in order to allow > callers to have these be different. In such a case, reusing > o->src_index's split_index for o->result will cause the split_index to be > shared. If either index then has entries replaced or removed, it will > result in the other index referring to free()'d memory. > > Signed-off-by: Elijah Newren > --- > > I still haven't wrapped my head around the split_index stuff entirely, so > it's possible that > > - the performance optimization isn't even valid when src == dst. Could > the original index be different enough from the result that we don't > want its split_index? This really depends on the use case of course. But when git checkout is used for switching branches, unpack-trees will be used and unless you switch between to vastly different branches, the updated entries may be small compared to the entire index that sharing is still good. If the result index is so different that it results in a huge index file anyway, I believe we have code to recreate a new shared index to keep its size down next time. > - there's a better, more performant fix or there is some way to actually > share a split_index between two independent index_state objects. A cleaner way of doing this would be something to the line [1] move_index_extensions(>result, o->dst_index); near the end of this function. This could be where we compare the result index with the source index's shared file and see if it's worth keeping the shared index or not. Shared index is designed to work with huge index files though, any operations that go through all index entries will usually not be cheap. But at least it's safer. > However, with this fix, all the tests pass both normally and under > GIT_TEST_SPLIT_INDEX=DareISayYes. Without this patch, when > GIT_TEST_SPLIT_INDEX is set, my directory rename detection series will fail > several tests, as reported by SZEDER. Yes, the change looks good. [1] To me the second parameter should be src_index, not dst_index. We're copying entries from _source_ index to "result" and we should also copy extensions from the source index. That line happens to work only when dst_index is the same as src_index, which is the common use case so far. > unpack-trees.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/unpack-trees.c b/unpack-trees.c > index 79fd97074e..b670415d4c 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, > struct unpack_trees_options > o->result.timestamp.sec = o->src_index->timestamp.sec; > o->result.timestamp.nsec = o->src_index->timestamp.nsec; > o->result.version = o->src_index->version; > - o->result.split_index = o->src_index->split_index; > - if (o->result.split_index) > + if (!o->src_index->split_index) { > + o->result.split_index = NULL; > + } else if (o->src_index == o->dst_index) { > + /* > +* o->dst_index (and thus o->src_index) will be discarded > +* and overwritten with o->result at the end of this function, > +* so just use src_index's split_index to avoid having to > +* create a new one. > +*/ > + o->result.split_index = o->src_index->split_index; > o->result.split_index->refcount++; > + } else { > + o->result.split_index = init_split_index(>result); > + } > hashcpy(o->result.sha1, o->src_index->sha1); > o->merge_size = len; > mark_all_ce_unused(o->src_index); > -- > 2.17.0.296.gaac25b4b81 > -- Duy
Re: [PATCH v3 0/6] Keep all info in command-list.txt in git binary
On Sat, Apr 21, 2018 at 06:54:08PM +0200, Nguyễn Thái Ngọc Duy wrote: > Changes: > > - remove the deprecated column in command-list.txt. My change break it > anyway if anyone uses it. > - fix up failed tests that I marked in the RFC and kinda forgot about it. > - fix bashisms in generate-cmdlist.sh > - fix segfaul in "git help" Sorry I forgot the interdiff diff --git a/command-list.txt b/command-list.txt index 0809a19184..1835f1a928 100644 --- a/command-list.txt +++ b/command-list.txt @@ -9,7 +9,7 @@ history grow, mark and tweak your common history remote collaborate (see also: git help workflows) ### command list (do not change this line) -# command name category [deprecated] [common] +# command name category[common] git-add mainporcelain worktree git-am mainporcelain git-annotateancillaryinterrogators diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 9f17703aa7..7d17ca23f6 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -835,19 +835,23 @@ __git_complete_strategy () } __git_commands () { - if test -n "${GIT_TESTING_COMMAND_COMPLETION:-}" + if test -n "$GIT_TESTING_COMPLETION" then - printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}" + case "$1" in + porcelain) + printf "%s" "$GIT_TESTING_PORCELAIN_COMMAND_LIST";; + all) + printf "%s" "$GIT_TESTING_ALL_COMMAND_LIST";; + esac else git --list-cmds=$1 fi } -__git_list_all_commands () +__git_list_commands () { local i IFS=" "$'\n' - local category=${1-all} - for i in $(__git_commands $category) + for i in $(__git_commands $1) do case $i in *--*) : helper pattern;; @@ -860,14 +864,14 @@ __git_all_commands= __git_compute_all_commands () { test -n "$__git_all_commands" || - __git_all_commands=$(__git_list_all_commands) + __git_all_commands=$(__git_list_commands all) } __git_porcelain_commands= __git_compute_porcelain_commands () { test -n "$__git_porcelain_commands" || - __git_porcelain_commands=$(__git_list_all_commands porcelain) + __git_porcelain_commands=$(__git_list_commands porcelain) } # Lists all set config variables starting with the given section prefix, diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index e35f3e357b..86d59419b3 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -36,7 +36,7 @@ sed -n ' ' "$1" printf '};\n\n' -echo "#define GROUP_NONE 0xff /* no common group */" +echo "#define GROUP_NONE 0xff" n=0 while read grp do @@ -45,15 +45,6 @@ do done <"$grps" echo -echo '/*' -printf 'static const char *cmd_categories[] = {\n' -category_list "$1" | -while read category; do - printf '\t\"'$category'\",\n' -done -printf '\tNULL\n};\n\n' -echo '*/' - n=0 category_list "$1" | while read category; do @@ -68,10 +59,11 @@ sort | while read cmd category tags do if [ "$category" = guide ]; then - name=${cmd/git} + prefix=git else - name=${cmd/git-} + prefix=git- fi + name=$(echo $cmd | sed "s/^${prefix}//") sed -n ' /^NAME/,/'"$cmd"'/H ${ diff --git a/help.c b/help.c index a44f4a113e..88127fdd6f 100644 --- a/help.c +++ b/help.c @@ -201,7 +201,8 @@ static void extract_common_cmds(struct cmdname_help **p_common_cmds, for (i = 0; i < ARRAY_SIZE(command_list); i++) { const struct cmdname_help *cmd = command_list + i; - if (cmd->category != CAT_mainporcelain) + if (cmd->category != CAT_mainporcelain || + cmd->group == GROUP_NONE) continue; common_cmds[nr++] = *cmd; diff --git a/t/t0012-help.sh b/t/t0012-help.sh index fd2a7f27dc..53208ab20e 100755 --- a/t/t0012-help.sh +++ b/t/t0012-help.sh @@ -25,6 +25,15 @@ test_expect_success "setup" ' EOF ' +# make sure to exercise these code paths, the output is a bit tricky +# to verify +test_expect_success 'basic help commands' ' + git help >/dev/null && + git help -a >/dev/null && + git help -g >/dev/null && + git help -av >/dev/null +' + test_expect_success "works for commands and guides by default" ' configure_help && git help status && diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 4bfd26ddf9..5a23a46fcf 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -13,7 +13,7 @@ complete () return 0 } -# Be careful when updating this list: +# Be careful when
Re: [PATCH v2 1/2] completion: stop showing 'save' for stash by default
On Fri, Apr 20, 2018 at 1:25 AM, Thomas Gummererwrote: > The 'save' subcommand in git stash has been deprecated in > fd2ebf14db ("stash: mark "git stash save" deprecated in the man page", > 2017-10-22). > > Stop showing it when the users enters 'git stash ' or 'git stash > s'. Keep showing it however when the user enters 'git stash sa' > or any more characters of the 'save' subcommand. I don't think this is worth it. You only save two keystrokes for 've' and already waste one on . -- Duy
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
On Wed, Apr 18, 2018 at 12:47 AM, Philip Oakleywrote: >>> > Is that something I should add to my todo to add a 'guide' category > >>> > etc.? >>> >>> I added it too [1]. Not sure if you want anything more on top though. > > > What I've seen is looking good - I've not had as much time as I'd like.. > > I'm not sure of the status of the git/generate-cmdlist.sh though. Should > that also be updated, or did I miss that? Yes it's updated by other patches in the same thread. -- Duy
Re: [PATCH/RFC] completion: complete all possible -no-
On Wed, Apr 18, 2018 at 5:43 AM, Junio C Hamanowrote: > So, the earlier mention of "clone --no-checkout" sounded about not > losing this historical practice, but (desirabilty of magic number 4 > aside) this "show first handful of --no-foo" feature is not about > historical practice but is forward looking, in the sense that you do > not mark "important" negated options in the source, which would be a > way to handle the histrical "clone --no-checkout", but let the > machinery mechanically choose among --no-foo (with the stupid choice > criterion "first four are shown"). Well you kinda mark important in the source too. --no-checkout for exampled is declared as OPT_BOOL(0, "no-checkout"... and parse-options code has to add the double-negative form --checkout back [1]. The "first four" is chosen after carefully examining all commands and observing that none of them have more than 4 "important" --no-. But yes it is questionable and I should be able to do better to separate the favorable --no- from the other extra and most-of-the-time-useless --no- options. > That allows other commands to > have many --no-foo form without overwhelming the choices, but I am > not sure if it is much better than a possible alternative of only > showing --no-foo for more "important" foo's when show_gitcomp() is > asked to list all of things. It would certainly be a more involved > solution, that might require an update to the way how the choices > are precomputed (you'd end up having to keep a separate "use this > list when completing '--no-'" in addition to the normal list). I did think about this alternative and was still undecided. Suppose that you have less than 4 "important" --no- options, showing some extra ones to me does not really hurt anything and if we could show more options (within the same screen space) we should. But on the other hand maintaining this magic number could be a maintenance nightmare... Yeah I think I'm shifting towards no magic number now. [1] These double negative options will _always_ show up, there is no easy way to hide them because they don't start with --no-. But we don't have a lot of options starting with "no-" so it's probably fine. -- Duy
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote: > On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote: > > From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 PM > > > >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org> > >> wrote: > >>> > >>> I'm only just catching up, but does/can this series also capture the > >>> non-command guides that are available in git so that the 'git help -g' > >>> can > >>> begin to list them all? > >> > >> > >> It currently does not. But I don't see why it should not. This should > >> allow git.txt to list all the guides too, for people who skip "git > >> help" and go hard core mode with "man git". Thanks for bringing this > >> up. > >> -- > >> Duy > >> > > Is that something I should add to my todo to add a 'guide' category etc.? > > I added it too [1]. Not sure if you want anything more on top though. The "anything more" that at least I had in mind was something like this. Though I'm not sure if it's a good thing to replace a hand crafted section with an automatedly generated one. This patch on top combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of documents/guides are extracted from command-list.txt -- 8< -- diff --git a/Documentation/Makefile b/Documentation/Makefile index 6232143cb9..3e0ecd2e11 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) build-docdep.perl cmds_txt = cmds-ancillaryinterrogators.txt \ cmds-ancillarymanipulators.txt \ + cmds-guide.txt \ cmds-mainporcelain.txt \ cmds-plumbinginterrogators.txt \ cmds-plumbingmanipulators.txt \ diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 5aa73cfe45..e158bd9b96 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -54,6 +54,7 @@ for (sort <>) { for my $cat (qw(ancillaryinterrogators ancillarymanipulators + guide mainporcelain plumbinginterrogators plumbingmanipulators diff --git a/Documentation/git.txt b/Documentation/git.txt index 4767860e72..d60d2ae0c7 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -808,29 +808,6 @@ The index is also capable of storing multiple entries (called "stages") for a given pathname. These stages are used to hold the various unmerged version of a file when a merge is in progress. -FURTHER DOCUMENTATION -- - -See the references in the "description" section to get started -using Git. The following is probably more detail than necessary -for a first-time user. - -The link:user-manual.html#git-concepts[Git concepts chapter of the -user-manual] and linkgit:gitcore-tutorial[7] both provide -introductions to the underlying Git architecture. - -See linkgit:gitworkflows[7] for an overview of recommended workflows. - -See also the link:howto-index.html[howto] documents for some useful -examples. - -The internals are documented in the -link:technical/api-index.html[Git API documentation]. - -Users migrating from CVS may also want to -read linkgit:gitcvs-migration[7]. - - Authors --- Git was started by Linus Torvalds, and is currently maintained by Junio @@ -854,11 +831,16 @@ the Git Security mailing list <git-secur...@googlegroups.com>. SEE ALSO -linkgit:gittutorial[7], linkgit:gittutorial-2[7], -linkgit:giteveryday[7], linkgit:gitcvs-migration[7], -linkgit:gitglossary[7], linkgit:gitcore-tutorial[7], -linkgit:gitcli[7], link:user-manual.html[The Git User's Manual], -linkgit:gitworkflows[7] + +See the references in the "description" section to get started +using Git. The following is probably more detail than necessary +for a first-time user. + +include::cmds-guide.txt[] + +See also the link:howto-index.html[howto] documents for some useful +examples. The internals are documented in the +link:technical/api-index.html[Git API documentation]. GIT --- diff --git a/command-list.txt b/command-list.txt index 1835f1a928..f26b8acd52 100644 --- a/command-list.txt +++ b/command-list.txt @@ -150,10 +150,14 @@ git-whatchanged ancillaryinterrogators git-worktreemainporcelain git-write-tree plumbingmanipulators gitattributes guide +gitcvs-migrationguide +gitcli guide +gitcore-tutorialguide giteveryday guide gitglossary guide gitignore guide gitmodules guide gitrevisionsguide gittutorial guide +gittutorial-2 guide gitworkflowsguide -- 8< --
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley <philipoak...@iee.org> wrote: > From: "Duy Nguyen" <pclo...@gmail.com> : Saturday, April 14, 2018 4:44 PM > >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley <philipoak...@iee.org> >> wrote: >>> >>> I'm only just catching up, but does/can this series also capture the >>> non-command guides that are available in git so that the 'git help -g' >>> can >>> begin to list them all? >> >> >> It currently does not. But I don't see why it should not. This should >> allow git.txt to list all the guides too, for people who skip "git >> help" and go hard core mode with "man git". Thanks for bringing this >> up. >> -- >> Duy >> > Is that something I should add to my todo to add a 'guide' category etc.? I added it too [1]. Not sure if you want anything more on top though. [1] https://public-inbox.org/git/20180415164238.9107-7-pclo...@gmail.com/T/#u -- Duy
Re: [PATCH v2 3/6] generate-cmdlist.sh: keep all information in common-cmds.h
On Mon, Apr 16, 2018 at 8:28 AM, Junio C Hamanowrote: > Nguyễn Thái Ngọc Duy writes: > >> @@ -23,28 +36,44 @@ sed -n ' >> ' "$1" >> printf '};\n\n' >> >> +echo "#define GROUP_NONE 0xff /* no common group */" > > Some later code forgets about this value, and causes "git" to > segfault at the end of this entire series. > > Namely, here: > >> - for (i = 0; i < ARRAY_SIZE(common_cmds); i++) { >> + for (i = 0; i < nr; i++) { >> if (common_cmds[i].group != current_grp) { >> printf("\n%s\n", >> _(common_cmd_groups[common_cmds[i].group])); >> current_grp = common_cmds[i].group; > > where common_cmd_groups[] gets overrun. Argh!! I thought I tested everything. Sorry for the sloppy quality. > > Here is a squash I'll queue on top to keep the tip of 'pu' at least > buildable. Thanks. It's actually interesting that we have main porcelain commands that belong to no group. I'll try to classify them so that they show up as well. -- Duy
Re: .gitattributes lookup doesn't respect GIT_WORK_TREE
On Mon, Apr 16, 2018 at 12:40 AM, Andreas Schwabwrote: > On Apr 16 2018, Junio C Hamano wrote: > >> I may be mistaken (I do not have the code in front of me right now) >> but IIRC after the setup.c code runs (which happens quite early in >> the sequence that starts from git.c::cmd_main()), the Git process >> moves to the top level of the working tree, > > git log/show don't appear to do that. Yeah we lazily set up worktree in some cases. Elsewhere in the chdir-notify thread, I suggested that we set up worktree unconditionally (but do not die setup fails; only die when a command specifically requests for worktree). That work would help make this work in most cases. But it's not materialized yet. -- Duy
Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshinewrote: >> +awk '{print $2;}' | > > At one time, Junio expressed concerns[2] about having an 'awk' > dependency in the build system (in fact, with regards to this same > generation process). Whether he still has such concerns is unknown, > but it should be easy enough to avoid it here (and below). > > [2]: https://public-inbox.org/git/20150519004356.GA12854@flurp.local/ I'll stick with awk to avoid too much headache with regular expressions (replacements are welcome though). We do use awk in our test suite so it should be ok (who builds git and runs it without testing?) -- Duy
Re: [PATCH v1 0/2] fsexcludes: Add programmatic way to exclude files
On Tue, Apr 10, 2018 at 11:04 PM, Ben Peartwrote: > In git repos with large working directories an external file system monitor > (like fsmonitor or gvfs) can track what files in the working directory have > been > modified. This information can be used to speed up git operations that scale > based on the size of the working directory so that they become O(# of modified > files) vs O(# of files in the working directory). > > The fsmonitor patch series added logic to limit what files git had to stat() > to > the set of modified files provided by the fsmonitor hook proc. It also used > the > untracked cache (if enabled) to limit the files/folders git had to scan > looking > for new/untracked files. GVFS is another external file system model that also > speeds up git working directory based operations that has been using a > different > mechanism (programmatically generating an excludes file) to enable git to be > O(# of modified files). > > This patch series will introduce a new way to limit git�s traversal of the > working directory that does not require the untracked cache (fsmonitor) or > using > the excludes feature (GVFS). It does this by enhancing the existing excludes > logic in dir.c to support a new �File System Excludes� or fsexcludes API that > is > better tuned to these programmatic applications. I have not had a chance to really look at the patches yet but I think these three paragraphs should somehow be included in the commit description of 1/2 (or spread out between 1/2 and 2/2). 1/2 description for example briefly talks about how to use the new thing, but not really tell what it's for, why you need to add it. -- Duy
Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary
On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakleywrote: > I'm only just catching up, but does/can this series also capture the > non-command guides that are available in git so that the 'git help -g' can > begin to list them all? It currently does not. But I don't see why it should not. This should allow git.txt to list all the guides too, for people who skip "git help" and go hard core mode with "man git". Thanks for bringing this up. -- Duy
Re: [PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h
On Mon, Apr 9, 2018 at 6:59 AM, Eric Sunshinewrote: > On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy > wrote: >> common-cmds.h is used to extract the list of common commands (by >> group) and a one-line summary of each command. Some information is >> dropped, for example command category or summary of other commands. >> Update generate-cmdlist.sh to keep all the information. The extra info >> will be used shortly. >> >> Signed-off-by: Nguyễn Thái Ngọc Duy >> --- >> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh >> @@ -2,9 +2,10 @@ >> struct cmdname_help { >> - char name[16]; >> + char name[32]; >> char help[80]; >> - unsigned char group; >> + unsigned int category; >> + unsigned int group; >> }; >> @@ -23,27 +24,50 @@ sed -n ' >> +echo "#define GROUP_NONE 0xff /* no common group */" >> +echo "#define GROUP_ 0xff /* no common group */" > > Meh, this "GROUP_" alias of "GROUP_NONE" isn't so nice. Yeah. I don't want to mess too much with shell script. I wonder if we should instead kill this script and extend Documentation/cmd-list.perl to handle this task too. It would be much nicer to write and maintain the script. The downside is NO_PERL builds will have no commands in "git help". -- Duy
Re: [PATCH 1/2] git-worktree.txt: recommend 'git worktree remove' over manual deletion
On Mon, Apr 9, 2018 at 9:33 AM, Eric Sunshinewrote: > When cc73385cf6 (worktree remove: new command, 2018-02-12) implemented > and documented 'git worktree remove', it forgot to update existing > instructions suggesting manual deletion. Fix this oversight by > recommending 'git worktree remove' instead. Too bad we can't show off "git worktree move" :) The patches look fine. -- Duy
Re: What's cooking in git.git (Apr 2018, #01; Mon, 9)
On Mon, Apr 9, 2018 at 12:21 PM, Junio C Hamanowrote: > * sb/packfiles-in-repository (2018-03-26) 12 commits > (merged to 'next' on 2018-03-30 at caa68db14d) > + packfile: keep prepare_packed_git() private > + packfile: allow find_pack_entry to handle arbitrary repositories > + packfile: add repository argument to find_pack_entry > + packfile: allow reprepare_packed_git to handle arbitrary repositories > + packfile: allow prepare_packed_git to handle arbitrary repositories > + packfile: allow prepare_packed_git_one to handle arbitrary repositories > + packfile: add repository argument to reprepare_packed_git > + packfile: add repository argument to prepare_packed_git > + packfile: add repository argument to prepare_packed_git_one > + packfile: allow install_packed_git to handle arbitrary repositories > + packfile: allow rearrange_packed_git to handle arbitrary repositories > + packfile: allow prepare_packed_git_mru to handle arbitrary repositories > (this branch uses nd/remove-ignore-env-field and sb/object-store; is tangled > with sb/submodule-move-nested.) > > Refactoring of the internal global data structure continues. > > Is this ready for 'master' by now? I think so. Things start to look much nicer. -- Duy
Re: [PATCH/RFC 5/5] help: add "-a --verbose" to list all commands with synopsis
On Mon, Apr 9, 2018 at 11:55 AM, Eric Sunshinewrote: > On Mon, Apr 9, 2018 at 5:47 AM, Junio C Hamano wrote: >> Eric Sunshine writes: >>> On Mon, Mar 26, 2018 at 12:55 PM, Nguyễn Thái Ngọc Duy >>> wrote: + switch (category) { + case CAT_ancillaryinterrogators: return _("Ancillary interrogators"); + case CAT_ancillarymanipulators: return _("Ancillary manipulators"); + case CAT_foreignscminterface: return _("Foreign SCM interface"); + case CAT_mainporcelain: return _("Main porcelain"); + case CAT_plumbinginterrogators: return _("Plumbing interrogators"); + case CAT_plumbingmanipulators: return _("Plumbing interrogators"); >>> >>> s/interrogators"/manipulators"/ >>> + case CAT_purehelpers: return _("Pure helpers"); + case CAT_synchelpers: return _("Sync helpers"); + case CAT_synchingrepositories: return _("Synching repositories"); >> >> Somehow this screams "an array of strings" at me. Aren't this >> CAT_things small and dense enum? > > Duy's modified generate-cmdlist.sh does actually output an array of > strings for these, but the (generated) array is commented out in this > RFC. I suppose the reason it's not presently used is because the array > looks like this: > > static const char *cmd_categories[] = { > "ancillaryinterrogators", > "ancillarymanipulators", > "foreignscminterface", > "mainporcelain", > "plumbinginterrogators", > "plumbingmanipulators", > "purehelpers", > "synchelpers", > "synchingrepositories", > NULL > }; > > which doesn't give quite the human-friendly output he'd like. The > series is RFC, after all. Yep. > A possible approach to fix it would be to add a new "### categories" > section to command-list.txt which associates those category tags > ("ancillaryinterrogators") with human-readable counterparts > ("Ancillary interrogators"). Or extract the headlines from git.txt but that's not easy since it's not consistent. We could manually recreate the same grouping as in git.txt too, it's probably nicer than just printing groups sorted by category id. -- Duy
Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror
On Sat, Apr 7, 2018 at 2:36 PM, Ævar Arnfjörð Bjarmasonwrote: > Anyway, I see you've pushed a new version with DEVOPTS. I'll submit mine > on top of that once your new version lands (unless you want to try to > integrate it yourself). Actually I think I'll just drop both EAGER_DEVELOPER and DEVOTPS. -- Duy
Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror
On Fri, Apr 6, 2018 at 11:42 PM, Jeff King <p...@peff.net> wrote: > On Tue, Apr 03, 2018 at 05:17:00PM +0200, Duy Nguyen wrote: > >> It's not that complex. With the EAGER_DEVELOPER patch removed, we can >> have something like this where eager devs just need to put >> >> DEVOPTS = gentle no-suppression >> >> and you put >> >> DEVOPTS = gentle >> >> (bad naming, I didn't spend time thinking about names) > > It seems to me like we're losing the point of DEVELOPER here. I thought > the idea was to have a turn-key flag you could set to get extra linting > on your commits. But now we're tweaking all kinds of individual options. > At some point are we better off just letting you put "-Wno-foo" in your > CFLAGS yourself? It's because what AEvar wanted is no longer a dev thing :) > I don't mind the version-based checks because they're automatic, so the > feature remains turn-key. But this kind of DEVOPTS seems like a step in > the wrong direction. > > -Peff -- Duy
Re: [RFC PATCH 00/19] object-store refactoring 3 (replace objects, main ref store)
On Sat, Apr 7, 2018 at 1:21 AM, Stefan Bellerwrote: * > diff --git a/repository.h b/repository.h > index 09df94a472..2922d3a28b 100644 > --- a/repository.h > +++ b/repository.h > @@ -26,6 +26,11 @@ struct repository { > */ > struct raw_object_store *objects; > > + /* > +* The store in which the refs are hold. > +*/ > + struct ref_store *main_ref_store; > + We probably should drop the main_ prefix here because this could also be a submodule ref store (when the repository is about a submodule). worktree ref store is a different story and I don't know how to best present it here yet (I'm still thinking a separate struct repository). But we can worry about that when struct repository supports multiple worktree. -- Duy
Re: [PATCH 1/9] git_config_set: fix off-by-two
On Tue, Apr 3, 2018 at 11:31 AM, Johannes Schindelinwrote: > It is very frustrating to spend that much time with only little gains here > and there (and BusyBox-w32 is simply not robust enough yet, apart from > also not showing a significant improvement in performance). You still use busybox-w32? It's amazing that people still use it after the linux subsystem comes. busybox has a lot of commands built in (i.e. no new processes) and unless rmyorston did something more, the "fork" in ash shell should be as cheap as it could be: it simply serializes data and sends to the new process. If performance does not improve, I guess the process creation cost dominates. There's not much we could do except moving away from the zillion processes test framework: either something C-based or another scripting language (ok I don't want to bring this up again) -- Duy
Re: [PATCH] t2028: tighten grep expression to make "move worktree" test more robust
On Tue, Apr 3, 2018 at 11:25 AM, Eric Sunshinewrote: > Following a rename of worktree "source" to "destination", the "move > worktree" test uses grep to verify that the output of "git worktree list > --porcelain" does not contain "source" (and does contain "destination"). > Unfortunately, the grep expression is too loose and can match > unexpectedly. For example, if component of the test trash directory path > matches "source" (e.g. "/home/me/sources/git/t/trash*"), then the test > will be fooled into thinking that "source" still exists. Tighten the > expression to avoid such accidental matches. > > While at it, drop an unused variable ("toplevel") from the test and > tighten a similarly too-loose expression in a related test. > > Reported-by: Jens Krüger > Signed-off-by: Eric Sunshine > --- > > t2028 in 2.17.0 can be fooled into failing depending upon the path of > the test's trash directory. The problem is with the test being too > loose, not with Git itself. Problem report and diagnosis here[1]. > > [1]: > https://public-inbox.org/git/26a00c2b-c588-68d5-7085-22310c20e...@frm2.tum.de/T/#m994cdb29f141656b0ab48dd0d152432c7e86fc20 Thanks both. It was great to scroll to the latest mails and saw that I didn't have to do anything else :) -- Duy
Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror
On Tue, Apr 03, 2018 at 11:19:46AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Sat, Mar 31 2018, Duy Nguyen wrote: > > > On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason > > <ava...@gmail.com> wrote: > >> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag > >> which (approximately) enables -Wextra so that any combination of them > >> and -Werror can be set. > >> > >> I've long wanted to use DEVELOPER=1 in my production builds, but on > >> some old systems I still get warnings, and thus the build would > >> fail. However if the build/tests fail for some other reason, it would > >> still be useful to scroll up and see what the relevant code is warning > >> about. > >> > >> This change allows for that. Now setting DEVELOPER will set -Werror as > >> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings, > >> but without -Werror. > >> > >> I've renamed the newly added EAGER_DEVELOPER flag to > >> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra, > >> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than > >> inventing some new name of our own (VERY_EAGER_DEVELOPER?). > > > > Before we go with zillions of *DEVELOPER* maybe we can have something > > like DEVOPTS where you can give multiple keywords to a single variable > > to influence config.mak.dev. This is similar to COMPILER_FEATURES we > > already have in there, but now it's driven by the dev instead of the > > compiler. So you can have keywords like "gentle" (no -Werror) "extra" > > (-Wextra with no suppression) and something else. > > We could do that, but I don't think it's that bad. This patch is one > extra option on top of yours, And that eager this was marked experimental because I was not sure if it was the right way :) > and it's not going to result in some combinatorial explosion of > options, i.e. if we add DEVELOPER_PEDANTIC we'll just add one extra > flag. > > But sure, we could make this some string we'd need to parse out similar > to COMPILER_FEATURES, it just seems more complex to me for this task. It's not that complex. With the EAGER_DEVELOPER patch removed, we can have something like this where eager devs just need to put DEVOPTS = gentle no-suppression and you put DEVOPTS = gentle (bad naming, I didn't spend time thinking about names) -- 8< -- diff --git a/Makefile b/Makefile index e6680a8977..7b4e038e1d 100644 --- a/Makefile +++ b/Makefile @@ -435,6 +435,11 @@ all:: # Define DEVELOPER to enable more compiler warnings. Compiler version # and faimily are auto detected, but could be overridden by defining # COMPILER_FEATURES (see config.mak.dev) +# +# When DEVELOPER is set, DEVOPTS can be used to control compiler options. +# This variable contains keywords separated by whitespace. Two keywords +# are recognized: "gentle" removes -Werror and "no-suppression" +# removes all "-Wno-" options. GIT-VERSION-FILE: FORCE @$(SHELL_PATH) ./GIT-VERSION-GEN diff --git a/config.mak.dev b/config.mak.dev index 716a14ecc7..1d7aba6a6a 100644 --- a/config.mak.dev +++ b/config.mak.dev @@ -1,4 +1,6 @@ +ifeq ($(filter gentle,$(DEVOPTS)),) CFLAGS += -Werror +endif CFLAGS += -Wdeclaration-after-statement CFLAGS += -Wno-format-zero-length CFLAGS += -Wold-style-definition @@ -21,6 +23,7 @@ CFLAGS += -Wextra # if a function is public, there should be a prototype and the right # header file should be included. If not, it should be static. CFLAGS += -Wmissing-prototypes +ifeq ($(filter no-suppression,$(DEVOPTS)),) # These are disabled because we have these all over the place. CFLAGS += -Wno-empty-body CFLAGS += -Wno-missing-field-initializers @@ -28,6 +31,7 @@ CFLAGS += -Wno-sign-compare CFLAGS += -Wno-unused-function CFLAGS += -Wno-unused-parameter endif +endif # uninitialized warnings on gcc 4.9.2 in xdiff/xdiffi.c and config.c # not worth fixing since newer compilers correctly stop complaining -- 8< --
Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror
On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmasonwrote: > Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag > which (approximately) enables -Wextra so that any combination of them > and -Werror can be set. > > I've long wanted to use DEVELOPER=1 in my production builds, but on > some old systems I still get warnings, and thus the build would > fail. However if the build/tests fail for some other reason, it would > still be useful to scroll up and see what the relevant code is warning > about. > > This change allows for that. Now setting DEVELOPER will set -Werror as > before, but if DEVELOPER_NONFATAL is set you'll get the same warnings, > but without -Werror. > > I've renamed the newly added EAGER_DEVELOPER flag to > DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra, > and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than > inventing some new name of our own (VERY_EAGER_DEVELOPER?). Before we go with zillions of *DEVELOPER* maybe we can have something like DEVOPTS where you can give multiple keywords to a single variable to influence config.mak.dev. This is similar to COMPILER_FEATURES we already have in there, but now it's driven by the dev instead of the compiler. So you can have keywords like "gentle" (no -Werror) "extra" (-Wextra with no suppression) and something else. -- Duy
Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates
On Sat, Mar 31, 2018 at 1:36 PM, Ævar Arnfjörð Bjarmasonwrote: >> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite. >> + >> GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code >> path where there are more than 1024 packs even if the actual number of >> packs in repository is below this limit. >> >> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects >> -code path where we do not cache objecct size in memory and read it >> -from existing packs on demand. This normally only happens when the >> -object size is over 2GB. This variable forces the code path on any >> -object larger than 2^ bytes. > > The docs here say set these env variables, but actually > GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to > set a bool value. > > I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just > copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error > out since it's expecting bool, not the env variable to be set. > > I really don't care which we use, but let's use either if(getenv()) or > if(git_env_bool()) consistently, and then have the docs either say "if > set" or "if set to a boolean value (see git-config(1))". I'll change GIT_TEST_SPLIT_INDEX to boolean too since I document it here anyway. Will wait for a while though to see if anything else should be part of v9. -- Duy