Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:48 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:46AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index e1244918a5..b41610569e 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -29,6 +29,8 @@
>>  #include "list.h"
>>  #include "packfile.h"
>>
>> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
>
> How come this one gets a macro, but the earlier conversions don't?
>
> I guess the problem is that oe_in_pack() is defined in the generic
> pack-objects.h, but _pack is only in builtin/pack-objects.c?
>
> I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> everywhere. It's longer, but it makes the code slightly less magical to
> read.

Longer was exactly why I added these macros (with the hope that the
macro upper case names already ring a "it's magical" bell). Should I
drop all these macros? Some code becomes a lot more verbose though.

>> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
>> +{
>> + struct packed_git **mapping, *p;
>> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
>> +
>> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
>> + /*
>> +  * leave in_pack_by_idx NULL to force in_pack[] to be
>> +  * used instead
>> +  */
>> + return;
>> + }
>> +
>> + ALLOC_ARRAY(mapping, nr);
>> + mapping[cnt++] = NULL; /* zero index must be mapped to NULL */
>
> Why? I guess because index==0 is a sentinel for "we're using the small
> index numbers?"

No because by default all values in object_entry is zero (or NULL). If
I remember correctly, some code will skip setting in_pack pointer to
leave it NULL. When we convert it to an index, it should also point to
NULL.

>> + prepare_packed_git();
>> + for (p = packed_git; p; p = p->next, cnt++) {
>> + if (cnt == nr) {
>> + free(mapping);
>> + return;
>> + }
>> + p->index = cnt;
>> + mapping[cnt] = p;
>> + }
>> + pdata->in_pack_by_idx = mapping;
>> +}
>
> What happens if we later have to reprepare_packed_git() and end up with
> more packs? We only call this for the first pack.
>
> It may well be handled, but I'm having trouble following the code to see
> if it is. And I doubt that case is covered by our test suite (since it
> inherently involves a race).

I don't think I covered this case. But since "index" field in
packed_git should be zero for the new packs, we could check and either
add it to in_pack_by_idx[].

>>  /*
>> + * The size of struct nearly determines pack-objects's memory
>> + * consumption. This struct is packed tight for that reason. When you
>> + * add or reorder something in this struct, think a bit about this.
>> + *
>
> It's funny to see this warning come in the middle. Should it be part of
> the final struct reordering at the end?

It was at the end in some version, the I shuffled the patches and
forgot about this one :)
-- 
Duy


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:59 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:48AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> We only cache deltas when it's smaller than a certain limit. This limit
>> defaults to 1000 but save its compressed length in a 64-bit field.
>> Shrink that field down to 16 bits, so you can only cache 65kb deltas.
>> Larger deltas must be recomputed at when the pack is written down.
>
> Unlike the depth, I don't think there's any _inherent_ reason you
> couldn't throw, say, 1MB deltas into the cache (if you sized it large
> enough). But I doubt such deltas are really all that common. Here are
> the top 10 in linux.git:
>
>   $ git cat-file --batch-all-objects --batch-check='%(deltabase) 
> %(objectsize:disk)' |
> grep -v ^0 | sort -k 2nr | head
>   a02b6794337286bc12c907c33d5d75537c240bd0 769103
>   b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
>   1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
>   a02b6794337286bc12c907c33d5d75537c240bd0 240647
>   c550d99286c01867dfb26e432417f3106acf8611 177896
>   5977795854f852c2b95dd023fd03cace023ee41c 119737
>   4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
>   b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
>   2645d6239b74bebd661436762e819b831095b084 103980
>   b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014
>
> It's possible some weird workload would want to tweak this. Say you were
> storing a ton of delta-capable files that were big and always differed
> by a megabyte. And it was somehow really important to you to tradeoff
> memory for CPU during the write phase of a pack.

We're not short on spare bits so I will try to raise this limit to 1MB
(not because you mentioned 1MB, but because the largest size in your
output is close to 1MB).
-- 
Duy


Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:04 PM, Jeff King  wrote:
> The subject says "clarify" so I was a little surprised to see code
> changes. It looks like we're just avoiding reassigning on top of the
> value repeatedly, which is part of that clarification. It looks like a
> noop to me.

Oh well... I was counting on the new name (in_pack_size, which follows
in_pack_type naming convention) to emphasize it (and the new "delta
size" comment to point out where in_pack_size contains a delta size.
-- 
Duy


Re: [PATCH v7 12/13] pack-objects: shrink delta_size field in struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:24 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:52AM +0100, Nguyễn Thái Ngọc Duy wrote:
>> @@ -2004,10 +2006,12 @@ static int try_delta(struct unpacked *trg, struct 
>> unpacked *src,
>>   delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
>> max_size);
>>   if (!delta_buf)
>>   return 0;
>> + if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
>> + return 0;
>
> This is the other spot that needs to be "1U".
>
> How come this doesn't get a pdata->oe_delta_size_limit like we have
> pdata->oe_size_limit? Would we want a matching
> $GIT_TEST_OE_DELTA_SIZE_BITS to test it, too?

Probably. This change does not look as risky as the others (no
complicated fallback). But without $GIT_TEST_OE_DELTA_SIZE_BITS it's
hard to know how the new code reacts when we get over the limit. I
will add it.
-- 
Duy


Re: [PATCH v7 13/13] pack-objects: reorder members to shrink struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:26 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:53AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> Previous patches leave lots of holes and padding in this struct. This
>> patch reorders the members and shrinks the struct down to 80 bytes
>> (from 136 bytes, before any field shrinking is done) with 16 bits to
>> spare (and a couple more in in_pack_header_size when we really run out
>> of bits).
>
> Out of curiosity, did you count this yourself, or did you double-check
> with a few compilers to make sure they all produce the same result?

I used pahole though only with .o files created by gcc 64-bit. I'll
try the 32-bit version and clang as well.
-- 
Duy


Investment opportunity

2018-03-30 Thread Yi Tan
Greetings,

Please find the content of this mail very confidential. my name is Yi Tan,
I work with a Bank here in Hong Kong. I decided to contact you for an
opportunity to invest in any lucrative business in your country, I am
willing to offer you 40% as my business partner.

We also offer a quick loan at low interest rate, if you are interested
please reply me for more information on my private e-mail:
yitanelect...@gmail.com

Sincerely: Yi Tan




Compliment of the day ,

2018-03-30 Thread kab...@ono.com



--
Hi dear. 
It is wonderful to contact you, I want us to have correspondence. I 
wish you will have the desire so that we can get acquainted to each 
other. Life itself is a mystery, you never know where it might lead 
you.
I'm Tracy.William, a French American . I will be pleased if you reply 
through.(tracy.medce...@outlook.com)
With Love
Tracy
--



Re: [RFC PATCH 0/1] bdl-lib.sh: add bash debug logger

2018-03-30 Thread Wink Saville
On Fri, Mar 30, 2018 at 3:34 AM, Johannes Schindelin
 wrote:
> Hi Wink,
>
> On Tue, 27 Mar 2018, Wink Saville wrote:
>
>> Add bdl-lib.sh which provides functions to assit in debugging git
>> shell scripts and tests.
>
> Interesting idea. It is Bash-only, though... (and it is not a secret that
> I want to discourage using Unix shell scripts in Git's production code,
> they are a decent way to prototype things, but they fall short of being
> robust and portable in practice, and don't get me started on speed
> issues).
>
> So rather than spending time on making it easier to debug shell scripts, I
> would love to see us going into the direction of a consistent C source
> code. I still believe that we can get there, and that the benefits are
> worth the (huge) effort.
>
> Ciao,
> Johannes

I totally agree the code base should use primarily one language!

But that's not the case now and "bdl" gave me insight into the workings
of rebase--interactive and I found little guidance on how to debug
or learn the code base. So I thought I'd see if there was interest
in what I'd done.

If I were to make it non-bash specific would there be any interest?

-- Wink


Re: [PATCH] fast-import: fix a sparse 'NULL pointer' warning

2018-03-30 Thread Junio C Hamano
Ramsay Jones  writes:

> This was going to be an email to Jameson, but I wasn't fast enough! :(
>
> This was originally written against the 'pu' branch, but since the
> 'jm/mem-pool' branch has graduated to 'next', I have rebased on 'next'.
>
> Perhaps this could be squashed into that branch when you re-build
> the 'next' branch at the beginning of the next cycle?

Thanks.  

Yes, that would be the most sensible thing to do, I would think, as
the topic itself (like many other branches that are merged to 'next'
during the pre-release feature freeze period) is not ready for the
'master' branch yet anyway.




[PATCH] fast-import: fix a sparse 'NULL pointer' warning

2018-03-30 Thread Ramsay Jones

Commit a8dfa11562 ("fast-import: introduce mem_pool type", 2018-03-26)
introduces a 'mem_pool' type, along with a file-local global symbol
('fi_mem_poll') which is initialised in its declaration. This causes
sparse to issue a warning, thus:

  SP fast-import.c
  fast-import.c:301:40: warning: Using plain integer as NULL pointer

In order to suppress the warning, replace the '0' used to initialise the
'mp_block' field (of type 'struct mp_block *') with NULL.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

This was going to be an email to Jameson, but I wasn't fast enough! :(

This was originally written against the 'pu' branch, but since the
'jm/mem-pool' branch has graduated to 'next', I have rebased on 'next'.

Perhaps this could be squashed into that branch when you re-build
the 'next' branch at the beginning of the next cycle?

ATB,
Ramsay Jones

 fast-import.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fast-import.c b/fast-import.c
index 0aa148ea4..38356e293 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -298,7 +298,7 @@ static int global_argc;
 static const char **global_argv;
 
 /* Memory pools */
-static struct mem_pool fi_mem_pool =  {0, 2*1024*1024 - sizeof(struct 
mp_block), 0 };
+static struct mem_pool fi_mem_pool =  {NULL, 2*1024*1024 - sizeof(struct 
mp_block), 0 };
 
 /* Atom management */
 static unsigned int atom_table_sz = 4451;
-- 
2.16.0


Re: [GSoC][PATCH v5] test: avoid pipes in git related commands for test

2018-03-30 Thread Junio C Hamano
Eric Sunshine  writes:

> On Tue, Mar 27, 2018 at 1:31 PM, Pratik Karki  wrote:
>> Avoid using pipes downstream of Git commands since the exit codes
>> of commands upstream of pipes get swallowed, thus potentially
>> hiding failure of those commands. Instead, capture Git command
>> output to a file and apply the downstream command(s) to that file.
>>
>>
>> Signed-off-by: Pratik Karki 
>
> Unnecessary double blank line above sign-off.

"git am" would automatically trigger stripspace, which would eat the
extra blank line from that two-blank-line block.

> Aside from that minor hiccup (which Junio fixed when queuing), this
> iteration addresses all my review comments[1] from the previous round
> and does not seem to introduce any new issues.

Thanks for a review.


Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-30 Thread Jeff Hostetler



On 3/30/2018 4:38 PM, Junio C Hamano wrote:


* jh/json-writer (2018-03-28) 1 commit
  - json_writer: new routines to create data in JSON format

  Is this ready for 'next'?


Yes, I believe it is.  This has the V6 fixup for the HEREDOC
with leading whitespace, so I think we're good.

Thanks
Jeff



Re: [GSoC][PATCH v5] test: avoid pipes in git related commands for test

2018-03-30 Thread Eric Sunshine
On Tue, Mar 27, 2018 at 1:31 PM, Pratik Karki  wrote:
> Avoid using pipes downstream of Git commands since the exit codes
> of commands upstream of pipes get swallowed, thus potentially
> hiding failure of those commands. Instead, capture Git command
> output to a file and apply the downstream command(s) to that file.
>
>
> Signed-off-by: Pratik Karki 

Unnecessary double blank line above sign-off.

Aside from that minor hiccup (which Junio fixed when queuing), this
iteration addresses all my review comments[1] from the previous round
and does not seem to introduce any new issues.

Thanks.

[1]: 
https://public-inbox.org/git/CAPig+cS3GjYo+5C_W6WqzK3RP=W+918E6Cz=fsvhky6ewce...@mail.gmail.com/


Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-30 Thread Junio C Hamano
Junio C Hamano  writes:

> --
> [Graduated to "master"]
>
> * jh/partial-clone (2018-03-25) 1 commit
>   (merged to 'next' on 2018-03-28 at 2a0a7aef8e)
>  + unpack-trees: release oid_array after use in check_updates()
>
>  Hotfix.

Not listed in the above, but this round also merges the hotfix
c7620bd0 ("upload-pack: disable object filtering when disabled by
config", 2018-03-28) from Jonathan Nieder that makes sure that
"uploadpack.allowfilter" does disable the feature even when the
client makes an unsolicited request to trigger the "filter" feature.

It is not (yet) clear how I screwed up this report; sorry about
that.





Re: [PATCH v7 13/13] pack-objects: reorder members to shrink struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:53AM +0100, Nguyễn Thái Ngọc Duy wrote:

> Previous patches leave lots of holes and padding in this struct. This
> patch reorders the members and shrinks the struct down to 80 bytes
> (from 136 bytes, before any field shrinking is done) with 16 bits to
> spare (and a couple more in in_pack_header_size when we really run out
> of bits).

Out of curiosity, did you count this yourself, or did you double-check
with a few compilers to make sure they all produce the same result?

So having read the whole thing now, I think most of my original concerns
have been addressed. I do think readability takes a hit, but it's not
_too_ bad. There are a few things that have become more brittle, but I
can't think of anything on the horizon that would bite us.

-Peff


Re: [PATCH v7 12/13] pack-objects: shrink delta_size field in struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:52AM +0100, Nguyễn Thái Ngọc Duy wrote:

> Allowing a delta size of 64 bits is crazy. Shrink this field down to
> 31 bits with one overflow bit.
> 
> If we find an existing delta larger than 2GB, we do not cache
> delta_size at all and will get the value from oe_size(), potentially
> from disk if it's larger than 4GB.

Since we have a fallback, we can put this slider wherever we want.
Probably something like 20 bits would be plenty, if we ever needed to
squeeze in a few more small-bit items.

> @@ -2004,10 +2006,12 @@ static int try_delta(struct unpacked *trg, struct 
> unpacked *src,
>   delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
> max_size);
>   if (!delta_buf)
>   return 0;
> + if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
> + return 0;

This is the other spot that needs to be "1U".

How come this doesn't get a pdata->oe_delta_size_limit like we have
pdata->oe_size_limit? Would we want a matching
$GIT_TEST_OE_DELTA_SIZE_BITS to test it, too?

-Peff


Re: [PATCH v7 11/13] pack-objects: shrink size field in struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:51AM +0100, Nguyễn Thái Ngọc Duy wrote:

> It's very very rare that an uncompressed object is larger than 4GB
> (partly because Git does not handle those large files very well to
> begin with). Let's optimize it for the common case where object size
> is smaller than this limit.
> 
> Shrink size field down to 32 bits [1] and one overflow bit. If the
> size is too large, we read it back from disk. As noted in the previous
> patch, we need to return the delta size instead of canonical size when
> the to-be-reused object entry type is a delta instead of a canonical
> one.
> 
> Add two compare helpers that can take advantage of the overflow
> bit (e.g. if the file is 4GB+, chances are it's already larger than
> core.bigFileThreshold and there's no point in comparing the actual
> value).
> 
> Another note about oe_get_size_slow(). This function MUST be thread
> safe because SIZE() macro is used inside try_delta() which may run in
> parallel. Outside parallel code, no-contention locking should be dirt
> cheap (or insignificant compared to i/o access anyway). To exercise
> this code, it's best to run the test suite with something like
> 
> make test GIT_TEST_OE_SIZE_BITS=2
> 
> which forces this code on all objects larger than 3 bytes.

OK, makes sense. Since we need it to be thread-safe, we have to use
read_lock(). Which means that oe_get_size_slow() is defined in
builtin/pack-objects.c. But the object_entry is defined in the
more-generic pack-objects.h.

So anybody besides builtin/pack-objects.c will have to implement their
own fallback when e->size_valid isn't true. Which is a little odd, but I
guess nobody else needs that field. It might bite us in the future, but
I'm willing to cross my fingers for now (the pack-objects.h header is
really just there to support the bitmap writing code, but even that
could in theory all get shoved into a single translation unit if we had
to).

> [1] it's actually already 32 bits on Windows

And linux-i386. :)

> +unsigned long oe_get_size_slow(struct packing_data *pack,
> +const struct object_entry *e)
> +{

I think I already replied about this earlier, so I'll skim over it this
time.

> diff --git a/pack-objects.c b/pack-objects.c
> index 13f2b2bff2..59c6e40a02 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -120,8 +120,15 @@ struct object_entry *packlist_alloc(struct packing_data 
> *pdata,
>  {
>   struct object_entry *new_entry;
>  
> - if (!pdata->nr_objects)
> + if (!pdata->nr_objects) {
>   prepare_in_pack_by_idx(pdata);
> + if (getenv("GIT_TEST_OE_SIZE_BITS")) {
> + int bits = atoi(getenv("GIT_TEST_OE_SIZE_BITS"));;
> + pdata->oe_size_limit = 1 << bits;
> + }
> + if (!pdata->oe_size_limit)
> + pdata->oe_size_limit = 1 << OE_SIZE_BITS;
> + }

This needs to be "1U << OE_SIZE_BITS". Shifting a signed integer 31 bits
is undefined.

No, I'm not that clever or careful myself. I ran the whole test suite
with SANITIZE=address,undefined and it turned this up, as well as a
similar case for OE_DELTA_SIZE_BITS.

> + uint32_t size_:OE_SIZE_BITS;
> + uint32_t size_valid:1;

A uint32_t bitfield? Would it make more sense to just call these
"unsigned", since we're specifying the precision already?

> +unsigned long oe_get_size_slow(struct packing_data *pack,
> +const struct object_entry *e);
> +static inline unsigned long oe_size(struct packing_data *pack,
> + const struct object_entry *e)
> +{
> + if (e->size_valid)
> + return e->size_;
> +
> + return oe_get_size_slow(pack, e);
> +}

If oe_get_size_slow() fails to find an object's size, it dies. I'm
trying to think of whether that might hit funny corner cases with
racing. I don't _think_ so, because if the object truly goes away, we'd
be screwed during the writing phase anyway.

> +static inline int oe_size_less_than(struct packing_data *pack,
> + const struct object_entry *lhs,
> + unsigned long rhs)
> +{
> + if (lhs->size_valid)
> + return lhs->size_ < rhs;
> + if (rhs < pack->oe_size_limit) /* rhs < 2^x <= lhs ? */
> + return 0;
> + return oe_get_size_slow(pack, lhs) < rhs;
> +}

Clever.

> +static inline void oe_set_size(struct packing_data *pack,
> +struct object_entry *e,
> +unsigned long size)
> +{
> + if (size < pack->oe_size_limit) {
> + e->size_ = size;
> + e->size_valid = 1;
> + } else {
> + e->size_valid = 0;
> + if (oe_get_size_slow(pack, e) != size)
> + die("BUG: 'size' is supposed to be the object size!");
> + }
> +}

That's an expensive assertion. But I guess this isn't supposed to happen
very frequently, so 

Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:50AM +0100, Nguyễn Thái Ngọc Duy wrote:

> While this field most of the time contains the canonical object size,
> there is one case it does not: when we have found that the base object
> of the delta in question is also to be packed, we will very happily
> reuse the delta by copying it over instead of regenerating the new
> delta.
> 
> "size" in this case will record the delta size, not canonical object
> size. Later on in write_reuse_object(), we reconstruct the delta
> header and "size" is used for this purpose. When this happens, the
> "type" field contains a delta type instead of a canonical type.
> Highlight this in the code since it could be tricky to see.

Thanks for digging down here. I have definitely been confused by this in
the past.

The subject says "clarify" so I was a little surprised to see code
changes. It looks like we're just avoiding reassigning on top of the
value repeatedly, which is part of that clarification. It looks like a
noop to me.

-Peff


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:48AM +0100, Nguyễn Thái Ngọc Duy wrote:

> We only cache deltas when it's smaller than a certain limit. This limit
> defaults to 1000 but save its compressed length in a 64-bit field.
> Shrink that field down to 16 bits, so you can only cache 65kb deltas.
> Larger deltas must be recomputed at when the pack is written down.

Unlike the depth, I don't think there's any _inherent_ reason you
couldn't throw, say, 1MB deltas into the cache (if you sized it large
enough). But I doubt such deltas are really all that common. Here are
the top 10 in linux.git:

  $ git cat-file --batch-all-objects --batch-check='%(deltabase) 
%(objectsize:disk)' |
grep -v ^0 | sort -k 2nr | head
  a02b6794337286bc12c907c33d5d75537c240bd0 769103
  b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
  1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
  a02b6794337286bc12c907c33d5d75537c240bd0 240647
  c550d99286c01867dfb26e432417f3106acf8611 177896
  5977795854f852c2b95dd023fd03cace023ee41c 119737
  4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
  b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
  2645d6239b74bebd661436762e819b831095b084 103980
  b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014

It's possible some weird workload would want to tweak this. Say you were
storing a ton of delta-capable files that were big and always differed
by a megabyte. And it was somehow really important to you to tradeoff
memory for CPU during the write phase of a pack.

That seems pretty unlikely to bite anybody (and that was the best I
could come up with as a devil's advocate against it).

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt |  3 ++-
>  builtin/pack-objects.c   | 22 --
>  pack-objects.h   |  3 ++-
>  3 files changed, 20 insertions(+), 8 deletions(-)

Patch looks OK.

-Peff


Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-30 Thread Thomas Gummerer
On 03/30, Junio C Hamano wrote:

> * tg/worktree-add-existing-branch (2018-03-27) 6 commits
>  - t2025: rename now outdated branch name
>  - worktree: teach "add" to check out existing branches
>  - worktree: factor out dwim_branch function
>  - worktree: remove force_new_branch from struct add_opts
>  - worktree: be clearer when "add" dwim-ery kicks in
>  - worktree: improve message when creating a new worktree
> 
>  "git worktree add" learned to check out an existing branch.
> 
>  Is this ready for 'next'?

Not quite yet.  Eric spotted some UI deficiencies which I'm currently
trying to address.  I hope to re-roll this in a few days with those
deficiencies fixed.


Re: [PATCH v7 07/13] pack-objects: refer to delta objects by index instead of pointer

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:47AM +0100, Nguyễn Thái Ngọc Duy wrote:

> These delta pointers always point to elements in the objects[] array
> in packing_data struct. We can only hold maximum 4G of those objects
> because the array size in nr_objects is uint32_t. We could use
> uint32_t indexes to address these elements instead of pointers. On
> 64-bit architecture (8 bytes per pointer) this would save 4 bytes per
> pointer.
> 
> Convert these delta pointers to indexes. Since we need to handle NULL
> pointers as well, the index is shifted by one [1].
> 
> [1] This means we can only index 2^32-2 objects even though nr_objects
> could contain 2^32-1 objects. It should not be a problem in
> practice because when we grow objects[], nr_alloc would probably
> blow up long before nr_objects hits the wall.

Hmm, that may be something we eventually fix. I suspect all of this code
does some pretty horrible things as you approach 2^32 objects, though.
I've never tried to make such a pack, but it may be within the realm of
possibility. The .idx file would be 80+GB, but the packfile might not be
much bigger if specially crafted.

I guess that's outside the realm of reasonable, though, so we can assume
that nobody would _really_ want to do that anytime soon. And anything
malicious would probably die long before this code triggers.

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/pack-objects.c | 116 ++---
>  pack-objects.h |  67 ++--
>  2 files changed, 124 insertions(+), 59 deletions(-)

The patch itself looks OK. This is one of the nicer ones, because it
really doesn't involve any extra storage management, just some accessor
functions.

-Peff


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:46AM +0100, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e1244918a5..b41610569e 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -29,6 +29,8 @@
>  #include "list.h"
>  #include "packfile.h"
>  
> +#define IN_PACK(obj) oe_in_pack(_pack, obj)

How come this one gets a macro, but the earlier conversions don't?

I guess the problem is that oe_in_pack() is defined in the generic
pack-objects.h, but _pack is only in builtin/pack-objects.c?

I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
everywhere. It's longer, but it makes the code slightly less magical to
read.

> @@ -1074,7 +1076,7 @@ static void create_object_entry(const struct object_id 
> *oid,
>   else
>   nr_result++;
>   if (found_pack) {
> - entry->in_pack = found_pack;
> + oe_set_in_pack(_pack, entry, found_pack);
>   entry->in_pack_offset = found_offset;
>   }

it's funny to see in_pack as an external thing, but in_pack_offset still
in the struct. I guess there's nothing to be gained there, since the
offset really does need to be individual (and large).

> diff --git a/cache.h b/cache.h
> index 862bdff83a..b90feb3802 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1635,6 +1635,7 @@ extern struct packed_git {
>   int index_version;
>   time_t mtime;
>   int pack_fd;
> + int index;  /* for builtin/pack-objects.c */
>   unsigned pack_local:1,
>pack_keep:1,
>freshened:1,

It's pretty gross to infect this global struct. But I'm not sure there's
an easier way to do it with constant-time lookups. You'd have to build
the packed_git index preemptively in pack-objects, and then always just
pass around the index numbers.  And even that is kind of dicey, since
the packed_git list can grow while we're running.

The alternative is a hash table mapping packed_git pointers into numeric
indices. Yuck.

> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
> +{
> + struct packed_git **mapping, *p;
> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
> +
> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
> + /*
> +  * leave in_pack_by_idx NULL to force in_pack[] to be
> +  * used instead
> +  */
> + return;
> + }
> +
> + ALLOC_ARRAY(mapping, nr);
> + mapping[cnt++] = NULL; /* zero index must be mapped to NULL */

Why? I guess because index==0 is a sentinel for "we're using the small
index numbers?"

> + prepare_packed_git();
> + for (p = packed_git; p; p = p->next, cnt++) {
> + if (cnt == nr) {
> + free(mapping);
> + return;
> + }
> + p->index = cnt;
> + mapping[cnt] = p;
> + }
> + pdata->in_pack_by_idx = mapping;
> +}

What happens if we later have to reprepare_packed_git() and end up with
more packs? We only call this for the first pack.

It may well be handled, but I'm having trouble following the code to see
if it is. And I doubt that case is covered by our test suite (since it
inherently involves a race).

>  /*
> + * The size of struct nearly determines pack-objects's memory
> + * consumption. This struct is packed tight for that reason. When you
> + * add or reorder something in this struct, think a bit about this.
> + *

It's funny to see this warning come in the middle. Should it be part of
the final struct reordering at the end?

-Peff


What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-30 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Git 2.17 final is expected to be tagged by early next week.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* jh/partial-clone (2018-03-25) 1 commit
  (merged to 'next' on 2018-03-28 at 2a0a7aef8e)
 + unpack-trees: release oid_array after use in check_updates()

 Hotfix.

--
[New Topics]

* rs/status-with-removed-submodule (2018-03-28) 1 commit
  (merged to 'next' on 2018-03-30 at 8a7b618bc1)
 + submodule: check for NULL return of get_submodule_ref_store()

 "git submodule status" misbehaved on a submodule that has been
 removed from the working tree.

 Will cook in 'next'.


* lv/tls-1.3 (2018-03-29) 1 commit
  (merged to 'next' on 2018-03-30 at 4f13731408)
 + http: allow use of TLS 1.3

 When built with more recent cURL, GIT_SSL_VERSION can now specify
 "tlsv1.3" as its value.

 Will cook in 'next'.


* nd/warn-more-for-devs (2018-03-29) 3 commits
 - Makefile: add EAGER_DEVELOPER mode
 - Makefile: detect compiler and enable more warnings in DEVELOPER=1
 - connect.c: mark die_initial_contact() NORETURN

 The build procedure "make DEVELOPER=YesPlease" learned to enable a
 bit more warning options depending on the compiler used to help
 developers more.  There also is "make EAGER_DEVELOPER=YesPlease"
 available now, for those who want to help fixing warnings we
 usually ignore.

 Will merge to 'next'.


* sb/submodule-move-nested (2018-03-29) 6 commits
 - submodule: fixup nested submodules after moving the submodule
 - submodule-config: remove submodule_from_cache
 - submodule-config: add repository argument to submodule_from_{name, path}
 - submodule-config: allow submodule_free to handle arbitrary repositories
 - grep: remove "repo" arg from non-supporting funcs
 - submodule.h: drop declaration of connect_work_tree_and_git_dir
 (this branch uses nd/remove-ignore-env-field and sb/object-store; is tangled 
with sb/packfiles-in-repository.)

 Moving a submodule that itself has submodule in it with "git mv"
 forgot to make necessary adjustment to the nested sub-submodules;
 now the codepath learned to recurse into the submodules.


* tb/config-type (2018-03-29) 1 commit
 - builtin/config.c: prefer `--type=bool` over `--bool`, etc.
 (this branch is used by tb/config-default.)

 The "git config" command uses separate options e.g. "--int",
 "--bool", etc. to specify what type the caller wants the value to
 be interpreted as.  A new "--type=" option has been
 introduced, which would make it cleaner to define new types.

 Will merge to 'next'.


* tb/config-default (2018-03-29) 3 commits
 - builtin/config: introduce `color` type specifier
 - config.c: introduce 'git_config_color' to parse ANSI colors
 - builtin/config: introduce `--default`
 (this branch uses tb/config-type.)

 "git config --get" learned the "--default" option, to help the
 calling script.  Building on top of the tb/config-type topic, the
 "git config" learns "--type=color" type.  Taken together, you can
 do things like "git config --get foo.color --default blue" and get
 the ANSI color sequence for the color given to foo.color variable,
 or "blue" if the variable does not exist.


* eb/cred-helper-ignore-sigpipe (2018-03-29) 1 commit
  (merged to 'next' on 2018-03-30 at c48e98c1b1)
 + credential: ignore SIGPIPE when writing to credential helpers

 When credential helper exits very quickly without reading its
 input, it used to cause Git to die with SIGPIPE, which has been
 fixed.

 Will cook in 'next'.


* jk/flockfile-stdio (2018-03-30) 1 commit
 - config: move flockfile() closer to unlocked functions


* jk/relative-directory-fix (2018-03-30) 5 commits
 - refs: use chdir_notify to update cached relative paths
 - set_work_tree: use chdir_notify
 - add chdir-notify API
 - trace.c: export trace_setup_key
 - set_git_dir: die when setenv() fails

--
[Stalled]

* sb/blame-color (2018-02-13) 3 commits
 - builtin/blame: highlight recently changed lines
 - builtin/blame: add option to color metadata fields separately
 - builtin/blame: dim uninteresting metadata

 Expecting a reroll.
 cf. https://public-inbox.org/git/20171110011002.10179-1-sbel...@google.com/#t
 error messages are funny, can segfault, ...


* av/fsmonitor-updates (2018-01-04) 6 commits
 - fsmonitor: use fsmonitor data in `git diff`
 - fsmonitor: remove debugging lines from t/t7519-status-fsmonitor.sh
 - fsmonitor: make output of test-dump-fsmonitor more concise
 - fsmonitor: update helper tool, now that flags are filled later
 - fsmonitor: 

Re: [PATCH v7 05/13] pack-objects: move in_pack_pos out of struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:45AM +0100, Nguyễn Thái Ngọc Duy wrote:

> This field is only need for pack-bitmap, which is an optional
> feature. Move it to a separate array that is only allocated when
> pack-bitmap is used (it's not freed in the same way that objects[] is
> not).

I had trouble parsing the parenthetical in the last sentence. It does
make sense if you read it hard enough, but maybe:

  (like objects[], it is not freed, since we need it until the end of
  the process)

would be more clear?

The patch itself seems OK.

-Peff


Re: [PATCH v7 04/13] pack-objects: use bitfield for object_entry::depth

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:44AM +0100, Nguyễn Thái Ngọc Duy wrote:

> Because of struct packing from now on we can only handle max depth
> 4095 (or even lower when new booleans are added in this struct). This
> should be ok since long delta chain will cause significant slow down
> anyway.

OK. This is the first user-facing change, but I think it really
shouldn't hurt anybody. My experiments a while ago showed that chains
longer than 50 aren't really worth it, but so this could probably shrink
to something like 8 bits if we really needed it to.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 83f8154865..205e1f646c 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, 
> const char *prefix)
>   if (pack_to_stdout != !base_name || argc)
>   usage_with_options(pack_usage, pack_objects_options);
>  
> + if (depth >= (1 << OE_DEPTH_BITS))
> + die(_("delta chain depth %d is greater than maximum limit %d"),
> + depth, (1 << OE_DEPTH_BITS) - 1);

Since this is introducing a new limit, I wonder if we should issue a
warning and just clamp it to the maximum value. That would be kinder to
people who may have existing (admittedly dumb) setups.

-Peff


Re: [PATCH v3 2/3] config.c: introduce 'git_config_color' to parse ANSI colors

2018-03-30 Thread Eric Sunshine
On Wed, Mar 28, 2018 at 9:16 PM, Taylor Blau  wrote:
> In preparation for adding `--color` to the `git-config(1)` builtin,
> let's introduce a color parsing utility, `git_config_color` in a similar
> fashion to `git_config_`.

Did you mean s/--color/--type=color/ ?

> Signed-off-by: Taylor Blau 


Re: [PATCH v3 1/3] builtin/config: introduce `--default`

2018-03-30 Thread Eric Sunshine
On Wed, Mar 28, 2018 at 9:16 PM, Taylor Blau  wrote:
> For some use cases, callers of the `git-config(1)` builtin would like to
> fallback to default values when the slot asked for does not exist. In
> addition, users would like to use existing type specifiers to ensure
> that values are parsed correctly when they do exist in the
> configuration.
>
> For example, to fetch a value without a type specifier and fallback to
> `$fallback`, the following is required:
>
>   $ git config core.foo || echo "$fallback"
>
> This is fine for most values, but can be tricky for difficult-to-express
> `$fallback`'s, like ANSI color codes.
>
> This motivates `--get-color`, which is a one-off exception to the normal
> type specifier rules wherein a user specifies both the configuration
> slot and an optional fallback. Both are formatted according to their
> type specifier, which eases the burden on the user to ensure that values
> are correctly formatted.
>
> This commit (and those following it in this series) aim to eventually
> replace `--get-color` with a consistent alternative. By introducing
> `--default`, we allow the `--get-color` action to be promoted to a
> `--color` type specifier, retaining the "fallback" behavior via the
> `--default` flag introduced in this commit.

I'm confused. The cover letter said that this iteration no longer
introduces a --color option (favoring instead --type=color), but this
commit message still talks about --color. Did you mean
s/--color/--type=color/ ?

> For example, we aim to replace:
>
>   $ git config --get-color slot [default] [...]
>
> with:
>
>   $ git config --default default --color slot [...]

Ditto: s/--color/--type=color/

> Values filled by `--default` behave exactly as if they were present in
> the affected configuration file; they will be parsed by type specifiers
> without the knowledge that they are not themselves present in the
> configuration.
>
> Specifically, this means that the following will work:
>
>   $ git config --int --default 1M does.not.exist
>   1048576
>
> In subsequent commits, we will offer `--color`, which (in conjunction
> with `--default`) will be sufficient to replace `--get-color`.

Ditto: s/--color/--type=color/

> Signed-off-by: Taylor Blau 


Re: [PATCH v7 03/13] pack-objects: use bitfield for object_entry::dfs_state

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:43AM +0100, Nguyễn Thái Ngọc Duy wrote:

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---

This probably needs some explanation for people digging in history (even
if it's "this is to shrink the size as part of a larger struct-shrinking
effort" so they know to dig around in the nearby history).

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 647c01ea34..83f8154865 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
> char *prefix)
>   OPT_END(),
>   };
>  
> + if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
> + die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");

I thought this was off-by-one at first, but NUM_STATES is one more than
the highest state, so it's right.

I suspect all of the dfs and depth stuff could be pulled into a separate
array that is used only during that depth search. But as you have it
squished down here, I think we may be getting it "for free" in between
other non-word-aligned values in the struct.

-Peff


Re: [PATCH v7 02/13] pack-objects: turn type and in_pack_type to bitfields

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:42AM +0100, Nguyễn Thái Ngọc Duy wrote:

> +static inline void oe_set_type(struct object_entry *e,
> +enum object_type type)
> +{
> + if (type >= OBJ_ANY)
> + die("BUG: OBJ_ANY cannot be set in pack-objects code");

A minor nit, but this (and other new assertions) should probably be
BUG().

> + e->type_valid = type >= OBJ_NONE;
> + e->type_ = (unsigned)type;

Hmm, so if !e->type_valid, then we may write utter garbage into
e->type_. That's OK, since everybody will access it via oe_type(), but I
wonder if we could trigger weird compiler behavior. I guess the unsigned
cast makes the truncation well-defined.

-Peff


Re: [PATCH v2 2/5] trace.c: export trace_setup_key

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 12:50:50PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Fri, Mar 30, 2018 at 12:46:26PM -0700, Junio C Hamano wrote:
> >
> >> Jeff King  writes:
> >> 
> >> > From: Nguyễn Thái Ngọc Duy 
> >> >
> >> > This is so that we can print traces based on this key outside trace.c.
> >> 
> >> "this key" meaning...?  GIT_TRACE_SETUP?
> >
> > I think "based on trace_setup_key".
> >
> > -Peff
> 
> Yeah, I read, but did not pay enough attention to, the subject X-<.

To be fair, one of our guidelines is that the commit message should not
overly rely on the subject line. Though I am certainly guilty of
starting many a message with "This".

Perhaps:

  The setup-tracing code is static-local to trace.c. In preparation for
  new GIT_TRACE_SETUP code outside of trace.c, let's make the trace_key
  globally available.

would be a better commit message.

-Peff


Re: [PATCH] config: move flockfile() closer to unlocked functions

2018-03-30 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Mar 30, 2018 at 09:04:13PM +0200, Johannes Schindelin wrote:
>
>> > Probably the flockfile should go into do_config_from_file(), where we
>> > specify to use the unlocked variants.
>> 
>> Ah, that makes sense now! I am glad I could also help ;-)
>
> :)
>
>> > Yeah, I'll wait to see how your refactor turns out.
>> 
>> I don't think I'll touch too much in that part of the code. My changes
>> should not cause merge conflicts with a patch moving the
>> flockfile()/funlockfile() calls to do_config_from_file().
>
> OK, then let's do this while we're thinking about it:

Yup, what Dscho found was quite amusing ;-) and this obviously makes
the code clearer to follow.

Will queue, thanks.



Re: [PATCH v2 2/5] trace.c: export trace_setup_key

2018-03-30 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Mar 30, 2018 at 12:46:26PM -0700, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > From: Nguyễn Thái Ngọc Duy 
>> >
>> > This is so that we can print traces based on this key outside trace.c.
>> 
>> "this key" meaning...?  GIT_TRACE_SETUP?
>
> I think "based on trace_setup_key".
>
> -Peff

Yeah, I read, but did not pay enough attention to, the subject X-<.


Re: [PATCH v2 2/5] trace.c: export trace_setup_key

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 12:46:26PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > From: Nguyễn Thái Ngọc Duy 
> >
> > This is so that we can print traces based on this key outside trace.c.
> 
> "this key" meaning...?  GIT_TRACE_SETUP?

I think "based on trace_setup_key".

-Peff


Re: [PATCH v2 2/5] trace.c: export trace_setup_key

2018-03-30 Thread Junio C Hamano
Jeff King  writes:

> From: Nguyễn Thái Ngọc Duy 
>
> This is so that we can print traces based on this key outside trace.c.

"this key" meaning...?  GIT_TRACE_SETUP?

>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> Signed-off-by: Jeff King 
> ---
>  trace.c | 14 +++---
>  trace.h |  1 +
>  2 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/trace.c b/trace.c
> index 7f3b08e148..fc623e91fd 100644
> --- a/trace.c
> +++ b/trace.c
> @@ -26,6 +26,7 @@
>  
>  struct trace_key trace_default_key = { "GIT_TRACE", 0, 0, 0 };
>  struct trace_key trace_perf_key = TRACE_KEY_INIT(PERFORMANCE);
> +struct trace_key trace_setup_key = TRACE_KEY_INIT(SETUP);
>  
>  /* Get a trace file descriptor from "key" env variable. */
>  static int get_trace_fd(struct trace_key *key)
> @@ -300,11 +301,10 @@ static const char *quote_crnl(const char *path)
>  /* FIXME: move prefix to startup_info struct and get rid of this arg */
>  void trace_repo_setup(const char *prefix)
>  {
> - static struct trace_key key = TRACE_KEY_INIT(SETUP);
>   const char *git_work_tree;
>   char *cwd;
>  
> - if (!trace_want())
> + if (!trace_want(_setup_key))
>   return;
>  
>   cwd = xgetcwd();
> @@ -315,11 +315,11 @@ void trace_repo_setup(const char *prefix)
>   if (!prefix)
>   prefix = "(null)";
>  
> - trace_printf_key(, "setup: git_dir: %s\n", 
> quote_crnl(get_git_dir()));
> - trace_printf_key(, "setup: git_common_dir: %s\n", 
> quote_crnl(get_git_common_dir()));
> - trace_printf_key(, "setup: worktree: %s\n", 
> quote_crnl(git_work_tree));
> - trace_printf_key(, "setup: cwd: %s\n", quote_crnl(cwd));
> - trace_printf_key(, "setup: prefix: %s\n", quote_crnl(prefix));
> + trace_printf_key(_setup_key, "setup: git_dir: %s\n", 
> quote_crnl(get_git_dir()));
> + trace_printf_key(_setup_key, "setup: git_common_dir: %s\n", 
> quote_crnl(get_git_common_dir()));
> + trace_printf_key(_setup_key, "setup: worktree: %s\n", 
> quote_crnl(git_work_tree));
> + trace_printf_key(_setup_key, "setup: cwd: %s\n", quote_crnl(cwd));
> + trace_printf_key(_setup_key, "setup: prefix: %s\n", 
> quote_crnl(prefix));
>  
>   free(cwd);
>  }
> diff --git a/trace.h b/trace.h
> index 88055abef7..2b6a1bc17c 100644
> --- a/trace.h
> +++ b/trace.h
> @@ -15,6 +15,7 @@ extern struct trace_key trace_default_key;
>  
>  #define TRACE_KEY_INIT(name) { "GIT_TRACE_" #name, 0, 0, 0 }
>  extern struct trace_key trace_perf_key;
> +extern struct trace_key trace_setup_key;
>  
>  extern void trace_repo_setup(const char *prefix);
>  extern int trace_want(struct trace_key *key);


Re: [PATCH v2 0/5] re-parenting relative directories after chdir

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 02:34:25PM -0400, Jeff King wrote:
> On Wed, Mar 28, 2018 at 01:36:56PM -0400, Jeff King wrote:
> 
> > For those just joining us, this fixes a regression that was in v2.13 (so
> > nothing we need to worry about as part of the 2.17-rc track).
> > 
> >   [1/4]: set_git_dir: die when setenv() fails
> >   [2/4]: add chdir-notify API
> >   [3/4]: set_work_tree: use chdir_notify
> >   [4/4]: refs: use chdir_notify to update cached relative paths
> 
> Here's a re-roll based on the feedback I got, including:
> 
>  - fixes the memory leak and vague comment pointed out by Eric
> 
>  - adds in tracing code from Duy
> 
>  - I took Duy's suggestions regarding "least" surprise in some of the
>functions (reparenting NULL is a noop, and registering a reparent
>handler does so even for an absolute path).
> 
> I punted on the "registering the same path twice" thing. That is a
> potential way to misuse this API, but I don't think there's a good
> solution. The "reparent" helper could figure this out for you, but in
> the general case we actually install an arbitrary callback. So the
> caller really has to handle it there.

The series looks good to me.

> 
> I think in the long run we'd want to outlaw calling set_git_dir() twice
> anyway.

Oh yeah. With my latest WIP changes, the bottom of
setup_git_directory_gently() looks like this. Nowhere else in setup
code calls these functions anymore (except the current
setup_work_tree)

-- 8< --
if (result.worktree)
set_git_work_tree(result.worktree);
if (result.gitdir)
set_git_dir(result.gitdir);
if (startup_info->have_repository)
repo_set_hash_algo(the_repository, result.repo_fmt.hash_algo);
...
return result.prefix;
-- 8< --

>From here on, it's not hard to see how to turn set_git_work_tree()
into setup_work_tree_gently() (without doing any set_git_dir) and the
last two calls into "repo_init_gitdir(gitdir, hash_algo)", which
should be where we allocate a new repository object and initialize
related object store, ref store...

But I still have a couple setup corner cases to deal with first :(
--
Duy


[PATCH] config: move flockfile() closer to unlocked functions

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 09:04:13PM +0200, Johannes Schindelin wrote:

> > Probably the flockfile should go into do_config_from_file(), where we
> > specify to use the unlocked variants.
> 
> Ah, that makes sense now! I am glad I could also help ;-)

:)

> > Yeah, I'll wait to see how your refactor turns out.
> 
> I don't think I'll touch too much in that part of the code. My changes
> should not cause merge conflicts with a patch moving the
> flockfile()/funlockfile() calls to do_config_from_file().

OK, then let's do this while we're thinking about it:

-- >8 --
Subject: config: move flockfile() closer to unlocked functions

Commit 260d408e32 (config: use getc_unlocked when reading
from file, 2015-04-16) taught git_config_from_file() to lock
the filehandle so that we could safely use the faster
unlocked functions to access the handle.

However, it split the logic into two places:

  1. The master lock/unlock happens in git_config_from_file().

  2. The decision to use the unlocked functions happens in
 do_config_from_file().

That means that if anybody calls the latter function, they
will accidentally use the unlocked functions without holding
the lock. And indeed, git_config_from_stdin() does so.

In practice, this hasn't been a problem since this code
isn't generally multi-threaded (and even if some Git program
happened to have another thread running, it's unlikely to be
reading from stdin). But it's a good practice to make sure
we're always holding the lock before using the unlocked
functions.

Helped-by: Johannes Schindelin 
Signed-off-by: Jeff King 
---
I wasn't sure if this was "helped by" or "reported by" or
"stumbled-upon-by". :)

 config.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/config.c b/config.c
index b0c20e6cb8..609ef2f58b 100644
--- a/config.c
+++ b/config.c
@@ -1426,6 +1426,7 @@ static int do_config_from_file(config_fn_t fn,
void *data)
 {
struct config_source top;
+   int ret;
 
top.u.file = f;
top.origin_type = origin_type;
@@ -1436,7 +1437,10 @@ static int do_config_from_file(config_fn_t fn,
top.do_ungetc = config_file_ungetc;
top.do_ftell = config_file_ftell;
 
-   return do_config_from(, fn, data);
+   flockfile(f);
+   ret = do_config_from(, fn, data);
+   funlockfile(f);
+   return ret;
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data)
@@ -1451,9 +1455,7 @@ int git_config_from_file(config_fn_t fn, const char 
*filename, void *data)
 
f = fopen_or_warn(filename, "r");
if (f) {
-   flockfile(f);
ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, 
filename, f, data);
-   funlockfile(f);
fclose(f);
}
return ret;
-- 
2.17.0.rc2.594.gdb94a0ce02



Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 8:53 PM, Johannes Schindelin
 wrote:
> I think the best course of action would be to incrementally do away with
> the shell scripted test framework, in the way you outlined earlier this
> year. This would *also* buy us a wealth of other benefits, such as better
> control over the parallelization, resource usage, etc.

If you have not noticed, I'm a bit busy with all sorts of stuff and
probably won't continue that work. And since it affects you the most,
you probably have the best motive to tackle it ;-) I don't think
complaining about slow test suite helps. And avoiding adding more
tests because of that definitely does not help.

> It would also finally make it easier to introduce something like "smart
> testing" where code coverage could be computed (this works only for C
> code, of course, not for the many scripted parts of core Git), and a diff
> could be inspected to discover which tests *really* need to be run,
> skipping the tests that would only touch unchanged code.
-- 
Duy


Re: A potential approach to making tests faster on Windows

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I've wondered for a while whether it wouldn't be a viable approach to
> make something like an interpreter for our test suite to get around this
> problem, i.e. much of it's very repetitive and just using a few shell
> functions we've defined, what if we had C equivalents of those?

I've had a similar thought, though I wonder how far we could get with
just shell. I even tried it out with test_cmp:

  
https://public-inbox.org/git/20161020215647.5no7effvutwep...@sigill.intra.peff.net/

But Johannes Sixt pointed out that they already do this (see
mingw_test_cmp in test-lib-functions).

I also tried to explore a few numbers about process invocations to see
if running shell commands is the problem:

  
https://public-inbox.org/git/20161020123111.qnbsainul2g54...@sigill.intra.peff.net/

There was some discussion there about whether the problem is programs
being exec'd, or if it's forks due to subshells. And if it is programs
being exec'd, whether it's shell programs or if it is simply that we
exec Git a huge number of times.

-Peff


Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 8:32 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Fri, Mar 30, 2018 at 7:10 PM, Junio C Hamano  wrote:
>>
>>> Which fields in candidate are safe to peek by the caller?  How can a
>>> caller tell?
>>
>> To me, all fields should be valid after
>> check_repository_format_gently().
>
> If so, free() is wrong in the first place, and FREE_AND_NULL() is
> making it even worse, no?  We learned there is work_tree set to
> somewhere, the original code by Peff before abade65b ("setup: expose
> enumerated repo info", 2017-11-12) freed it because the code no
> longer needed that piece of information.  If we are passing all we
> learned back to the caller, we should not free the field in the
> function at all.  But it seems (below) the codepath is messier than
> that.

Actually no, NULL is the right value. I was trying to say that this
mysterious code was about _deliberately_ ignore core.worktree. By
keeping repo_fmt->worktree as NULL we tell the caller "core.worktree
is not set". The current code also does that but in a different way:
it sets git_work_tree_cfg based on candidate->worktree, but only for
the "!has_common" block.

>> We still need to free and set NULL here though in addition to a
>> cleanup interface. The reason is, when checking repo config from a
>> worktree, we deliberately ignore core.worktree (which belongs to the
>> main repo only). The implicit line near this
>> free(candidate->work_tree) is "leave git_work_tree_cfg alone, we don't
>> recognize core.worktree". Once we move setting git_work_tree_cfg out
>> of this function, this becomes clear.
>
> So in other words, there is a code that looks at the field and it
> _wants_ to see NULL there---otherwise that brittle code misbehaves
> and FREE_AND_NULL() is a bad-aid to work it around?
>
> Then proposed log message "leaving it dangling is unsanitary" is
> *not* what is going on here, and the real reason why the code should
> be like so deserve to be described both in the log message and in a
> large in-code comment, no?

Let's drop this for now. I'm a bit further along in refactoring this
code that I thought I could. It'll be clearer when the caller is also
updated to show what's wrong.
-- 
Duy


Re: [PATCH] {fetch,upload}-pack: clearly mark unreachable v2 code

2018-03-30 Thread Stefan Beller
On Fri, Mar 30, 2018 at 12:03 PM, Brandon Williams  wrote:
> On 03/30, Ævar Arnfjörð Bjarmason wrote:
>> Change the switch statement driving upload_pack_v2() and
>> do_fetch_pack_v2() to clearly indicate that the FETCH_DONE case is
>> being handled implicitly by other code, instead of giving the reader
>> the impression that the "continue" statement is needed.
>>
>> This issue was flagged as DEADCODE by Coverity[1]. Simply removing the
>> "case FETCH_DONE" would make -Wswitch warn. Instead implement the same
>> solution discussed for my "[PATCH v2 18/29] grep: catch a missing enum
>> in switch statement" patch[2] (which never made it into git.git).
>>
>> 1. 
>> https://public-inbox.org/git/CAGZ79kbAOcwaRzjuMtZ_HVsYvUr_7UAPbOcnrmPgsdE19q=p...@mail.gmail.com/
>> 2. https://public-inbox.org/git/20170513231509.7834-19-ava...@gmail.com/
>
> I understand why you want this change, but I dislike it because it
> removes the ability to have the compiler tell you that your switch
> statements are exhaustive.  Of course it should be noticed rather
> quickly by the addition of those BUG statements :)

I think coverity doesn't flag empty sections, i.e.

case FETCH_DONE:
default:
BUG(...)

would do for coverity. Not sure if we want to add a /*fall thru */
comment, that would aid other compilers to not warn about it.

This would cover both the compiler as well as coverity.

Stefan


Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 04:01:56PM +0200, Johannes Schindelin wrote:
> 
> > You know what is *really* funny?
> > 
> > -- snip --
> > static int git_config_from_stdin(config_fn_t fn, void *data)
> > {
> > return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, 
> > stdin, data, 0);
> > }
> > 
> > int git_config_from_file(config_fn_t fn, const char *filename, void *data)
> > {
> > int ret = -1;
> > FILE *f;
> > 
> > f = fopen_or_warn(filename, "r");
> > if (f) {
> > flockfile(f);
> > ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, 
> > filename, f, data, 0);
> > funlockfile(f);
> > fclose(f);
> > }
> > return ret;
> > }
> > -- snap --
> > 
> > So the _stdin variant *goes out of its way not to flockfile()*...
> 
> *facepalm* That's probably my fault, since git_config_from_stdin()
> existed already when I did the flockfile stuff.
> 
> Probably the flockfile should go into do_config_from_file(), where we
> specify to use the unlocked variants.

Ah, that makes sense now! I am glad I could also help ;-)

> > But I guess all this will become moot when I start handing down the config
> > options. It does mean that I have to change the signatures in header
> > files, oh well ;-)
> > 
> > But then I can drop this here patch and we can stop musing about
> > flockfile()  ;-)
> 
> Yeah, I'll wait to see how your refactor turns out.

I don't think I'll touch too much in that part of the code. My changes
should not cause merge conflicts with a patch moving the
flockfile()/funlockfile() calls to do_config_from_file().

Ciao,
Dscho


Re: [PATCH] {fetch,upload}-pack: clearly mark unreachable v2 code

2018-03-30 Thread Brandon Williams
On 03/30, Ævar Arnfjörð Bjarmason wrote:
> Change the switch statement driving upload_pack_v2() and
> do_fetch_pack_v2() to clearly indicate that the FETCH_DONE case is
> being handled implicitly by other code, instead of giving the reader
> the impression that the "continue" statement is needed.
> 
> This issue was flagged as DEADCODE by Coverity[1]. Simply removing the
> "case FETCH_DONE" would make -Wswitch warn. Instead implement the same
> solution discussed for my "[PATCH v2 18/29] grep: catch a missing enum
> in switch statement" patch[2] (which never made it into git.git).
> 
> 1. 
> https://public-inbox.org/git/CAGZ79kbAOcwaRzjuMtZ_HVsYvUr_7UAPbOcnrmPgsdE19q=p...@mail.gmail.com/
> 2. https://public-inbox.org/git/20170513231509.7834-19-ava...@gmail.com/

I understand why you want this change, but I dislike it because it
removes the ability to have the compiler tell you that your switch
statements are exhaustive.  Of course it should be noticed rather
quickly by the addition of those BUG statements :)

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  fetch-pack.c  | 4 ++--
>  upload-pack.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 216d1368be..3a16b4bc1a 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1393,8 +1393,8 @@ static struct ref *do_fetch_pack_v2(struct 
> fetch_pack_args *args,
>  
>   state = FETCH_DONE;
>   break;
> - case FETCH_DONE:
> - continue;
> + default:
> + BUG("Added a new fetch_state without updating switch");
>   }
>   }
>  
> diff --git a/upload-pack.c b/upload-pack.c
> index 87b4d32a6e..b7a7601c83 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1416,8 +1416,8 @@ int upload_pack_v2(struct repository *r, struct 
> argv_array *keys,
>   create_pack_file();
>   state = FETCH_DONE;
>   break;
> - case FETCH_DONE:
> - continue;
> + default:
> + BUG("Added a new fetch_state without updating switch");
>   }
>   }
>  
> -- 
> 2.16.2.804.g6dcf76e118
> 

-- 
Brandon Williams


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Junio C Hamano
Johannes Schindelin  writes:

> What would be a *really* good strategy is: "Oh, there is a problem! Let's
> acknowledge it and try to come up with a solution rather than a
> work-around".
>
> EXPENSIVE_ON_WINDOWS is a symptom. Not a solution.

Yes, it is a workaround.  Making shell faster on windows would of
course be one possible solution to make t/t*.sh scripts go faster
;-)  Or update parts of t/t*.sh so that the equivalent test coverage
can be kept while running making them go faster on Windows.






Re: Bad refspec messes up bundle.

2018-03-30 Thread Johannes Schindelin
Hi Junio,

On Fri, 30 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> > Is this a bug? Should bundle allow providing multiple refspecs when
> > ...
> > I agree that it is a bug if a bundle file records a ref multiple
> > times.  Luciano, here are some pointers so you can fix it:
> >
> > - probably the best way to start would be to add a new test case to
> >   t/t5607-clone-bundle.sh. The script *should* be relatively easy to
> >   understand and imitate. The new test case would probably look somewhat
> >   like this:
> >
> > test_expect_failure 'bundles must not contain multiple refs' '
> 
> s/multiple/duplicate/.  It is not unusual for a bundle to record
> more than one ref; it is (1) useless and harmful to unsuspecting
> clients to record the same ref twice with the same value and (2)
> nonesnse to record the same ref twice with different value.

Of course! Thanks for pointing this out.

> Other than that, the outline seems to go in the right general
> direction.

Excellent.

Luciano, the ball is in your court now. If you get stuck with anything
(such as getting started with building Git), please let us know. We can
help.

Ciao,
Johannes


Re: A potential approach to making tests faster on Windows

2018-03-30 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Fri, Mar 30 2018, Johannes Schindelin wrote [expressing frustrations
> about Windows test suite slowness]:
>
> I've wondered for a while whether it wouldn't be a viable approach to
> make something like an interpreter for our test suite to get around this
> problem, i.e. much of it's very repetitive and just using a few shell
> functions we've defined, what if we had C equivalents of those?
> ...
>
> I don't have time or interest to work on this now, but thought it was
> interesting to share. This assumes that something in shellscript like:
>
> while echo foo; do echo bar; done
>
> Is no slower on Windows than *nix, since it's purely using built-ins, as
> opposed to something that would shell out.

That's interesting; it certainly is appreciated to be constructive
to find a usable solution.



Re: Draft of Git Rev News edition 37

2018-03-30 Thread Johannes Schindelin
Hi Jake,

On Fri, 30 Mar 2018, Jacob Keller wrote:

> On Fri, Mar 30, 2018 at 2:02 AM, Johannes Schindelin
>  wrote:
> >
> > Jake, I do not know about your availability, but I would love it if
> > you could take a stab, as I trust you to be unbiased. I would not
> > trust myself to be unbiased because (as everybody saw who cared to
> > read) I got a little bit too emotional and should have stayed more
> > professional.
> 
> I hope to be able to make a summary of the discussion as best I can.
> It may take a bit as there is a lot of mails to read. I agree that a
> good summary should come from someone outside the discussion to reduce
> emotional bias.

Thank you so much!
Dscho


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Johannes Schindelin
Hi Duy,

On Fri, 30 Mar 2018, Duy Nguyen wrote:

> On Fri, Mar 30, 2018 at 2:32 PM, Johannes Schindelin
>  wrote:
> >
> > On Thu, 29 Mar 2018, Jeff King wrote:
> >
> >> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
> >>
> >> > > When calling `git config --unset abc.a` on this file, it leaves this
> >> > > (invalid) config behind:
> >> > >
> >> > > [
> >> > > [xyz]
> >> > > key = value
> >> > >
> >> > > The reason is that we try to search for the beginning of the line (or
> >> > > for the end of the preceding section header on the same line) that
> >> > > defines abc.a, but as an optimization, we subtract 2 from the offset
> >> > > pointing just after the definition before we call
> >> > > find_beginning_of_line(). That function, however, *also* performs that
> >> > > optimization and promptly fails to find the section header correctly.
> >> >
> >> > This commit message would be more convincing if we had it in test form.
> >>
> >> I agree a test might be nice. But I don't find the commit message
> >> unconvincing at all. It explains pretty clearly why the bug occurs, and
> >> you can verify it by looking at find_beginning_of_line.
> >>
> >> > [abc]a
> >> >
> >> > is not written by Git, but would be written from an outside tool or 
> >> > person
> >> > and we barely cope with it?
> >>
> >> Yes, I don't think git would ever write onto the same line. But clearly
> >> we should handle anything that's syntactically valid.
> >
> > I was tempted to add the test case, because it is easy to test it.
> >
> > But I then decided *not* to add it. Why? Testing is a balance between "can
> > do" and "need to do".
> >
> > Can you imagine that I did *not* run the entire test suite before
> > submitting this patch series, because it takes an incredible *90 minutes*
> > to run *on a fast Windows machine*?
> 
> What's wrong with firing up a new worktree, run the test suite there
> and go back to do something else so you won't waste time just waiting
> for test results and submit? Sure there is a mental overhead for
> switching tasks, but at 90 minutes, I think it's worth doing.

Of course it is worth doing. That's why I often test the end result on
Windows (waiting those 90 minutes, but I do not fire up a new worktree, I
use my cloud privilege and let Azure/Visual Studio Team Services do the
work for me, without slowing down my laptop).

What I would love to do, however, would be to test all intermediate
patches, too, as that often shows a problem with my frequent reorderings
via interactive rebases. And 90 minutes times 9 is... 13 hours and 30
minutes. That's a really long time.

I think the best course of action would be to incrementally do away with
the shell scripted test framework, in the way you outlined earlier this
year. This would *also* buy us a wealth of other benefits, such as better
control over the parallelization, resource usage, etc.

It would also finally make it easier to introduce something like "smart
testing" where code coverage could be computed (this works only for C
code, of course, not for the many scripted parts of core Git), and a diff
could be inspected to discover which tests *really* need to be run,
skipping the tests that would only touch unchanged code.

Ciao,
Dscho


Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)

2018-03-30 Thread Johannes Schindelin
Hi Ævar,

On Fri, 30 Mar 2018, Ævar Arnfjörð Bjarmason wrote:

> 
> On Thu, Mar 29 2018, Johannes Schindelin wrote:
> 
> > Nonetheless, I would be confortable with this patch going into
> > v2.17.0, even at this late stage. The final verdict is Junio's, of
> > course.
> 
> Thanks a lot for working on this. I'm keen to stress test this, but
> won't have time in the next few days, and in any case think that the
> parts that change functionality should wait until after 2.17 (but e.g.
> the test renaming would be fine for a cherry-pick).

Obviously this was never meant to get into v2.17.0 (apart maybe from 1/9,
which however is so contested over that addition of the test case under
the assumption that anybody but me would dare to touch those parts of the
code).

Ciao,
Dscho

A potential approach to making tests faster on Windows

2018-03-30 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 30 2018, Johannes Schindelin wrote [expressing frustrations
about Windows test suite slowness]:

I've wondered for a while whether it wouldn't be a viable approach to
make something like an interpreter for our test suite to get around this
problem, i.e. much of it's very repetitive and just using a few shell
functions we've defined, what if we had C equivalents of those?

Duy had a WIP patch set a while ago to add C test suite support, but I
thought what if we turn that inside-out, and instead have a shell
interpreter that knows about the likes of test_cmp, and executes them
directly?

Here's proof of concept as a patch to the dash shell:

u dash (debian/master=) $ git diff
diff --git a/src/builtins.def.in b/src/builtins.def.in
index 4441fe4..b214a17 100644
--- a/src/builtins.def.in
+++ b/src/builtins.def.in
@@ -92,3 +92,4 @@ ulimitcmd ulimit
 #endif
 testcmdtest [
 killcmd-u kill
+testcmpcmd test_cmp
diff --git a/src/jobs.c b/src/jobs.c
index c2c2332..905563f 100644
--- a/src/jobs.c
+++ b/src/jobs.c
@@ -1502,3 +1502,12 @@ getstatus(struct job *job) {
jobno(job), job->nprocs, status, retval));
return retval;
 }
+
+#include 
+int
+testcmpcmd(argc, argv)
+   int argc;
+   char **argv;
+{
+   fprintf(stderr, "Got %d arguments\n", argc);
+}

I just added that to jobs.c because it was easiest, then test_cmp
becomes a builtin:

u dash (debian/master=) $ src/dash -c 'type test_cmp'
test_cmp is a shell builtin
u dash (debian/master=) $ src/dash -c 'echo foo && test_cmp 1 2 3'
foo
Got 4 arguments

I.e. it's really easy to add new built in commands to the dash shell
(and probably other shells, but dash is really small & fast).

We could carry some patch like that to dash, and also patch it so
test-lib.sh could know that that was our own custom shell, and we'd then
skip defining functions like test_cmp, and instead use that new builtin.

Similarly, it could then be linked to our own binaries, and the
test-tool would be a builtin that would appropriately dispatch, and we
could even eventually make "git" a shell builtin.

I don't have time or interest to work on this now, but thought it was
interesting to share. This assumes that something in shellscript like:

while echo foo; do echo bar; done

Is no slower on Windows than *nix, since it's purely using built-ins, as
opposed to something that would shell out.


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Johannes Schindelin
Hi Junio,

On Fri, 30 Mar 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
> 
> > I think if it's worth fixing it's worth testing for, a future change to
> > the config code could easily introduce a regression for this, and
> > particularly in this type of code obscure edge cases like this can point
> > to bugs elsewhere.
> 
> Yup.  "The port to my favourite platform is too slow, and everybody
> should learn to live with thin test coverage" would not be a good
> strategy in the longer run.

What would be a *really* good strategy is: "Oh, there is a problem! Let's
acknowledge it and try to come up with a solution rather than a
work-around".

EXPENSIVE_ON_WINDOWS is a symptom. Not a solution.

And you are actively hurting my ability to contribute, I hope you are
aware of that.

Ciao,
Dscho

[PATCH v2 4/5] set_work_tree: use chdir_notify

2018-03-30 Thread Jeff King
When we change to the top of the working tree, we manually
re-adjust $GIT_DIR and call set_git_dir() again, in order to
update any relative git-dir we'd compute earlier.

Instead of the work-tree code having to know to call the
git-dir code, let's use the new chdir_notify interface.
There are two spots that need updating, with a few
subtleties in each:

  1. the set_git_dir() code needs to chdir_notify_register()
 so it can be told when to update its path.

 Technically we could push this down into repo_set_gitdir(),
 so that even repository structs besides the_repository
 could benefit from this. But that opens up a lot of
 complications:

  - we'd still need to touch set_git_dir(), because it
does some other setup (like setting $GIT_DIR in the
environment)

  - submodules using other repository structs get
cleaned up, which means we'd need to remove them
from the chdir_notify list

  - it's unlikely to fix any bugs, since we shouldn't
generally chdir() in the middle of working on a
submodule

  2. setup_work_tree now needs to call chdir_notify(), and
 can lose its manual set_git_dir() call.

 Note that at first glance it looks like this undoes the
 absolute-to-relative optimization added by 044bbbcb63
 (Make git_dir a path relative to work_tree in
 setup_work_tree(), 2008-06-19). But for the most part
 that optimization was just _undoing_ the
 relative-to-absolute conversion which the function was
 doing earlier (and which is now gone).

 It is true that if you already have an absolute git_dir
 that the setup_work_tree() function will no longer make
 it relative as a side effect. But:

   - we generally do have relative git-dir's due to the
 way the discovery code works

   - if we really care about making git-dir's relative
 when possible, then we should be relativizing them
 earlier (e.g., when we see an absolute $GIT_DIR we
 could turn it relative, whether we are going to
 chdir into a worktree or not). That would cover all
 cases, including ones that 044bbbcb63 did not.

Signed-off-by: Jeff King 
---
 environment.c | 23 ++-
 setup.c   |  9 +++--
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/environment.c b/environment.c
index e01acf8b11..903a6c9df7 100644
--- a/environment.c
+++ b/environment.c
@@ -13,6 +13,7 @@
 #include "refs.h"
 #include "fmt-merge-msg.h"
 #include "commit.h"
+#include "chdir-notify.h"
 
 int trust_executable_bit = 1;
 int trust_ctime = 1;
@@ -296,7 +297,7 @@ char *get_graft_file(void)
return the_repository->graft_file;
 }
 
-void set_git_dir(const char *path)
+static void set_git_dir_1(const char *path)
 {
if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
die("could not set GIT_DIR to '%s'", path);
@@ -304,6 +305,26 @@ void set_git_dir(const char *path)
setup_git_env();
 }
 
+static void update_relative_gitdir(const char *name,
+  const char *old_cwd,
+  const char *new_cwd,
+  void *data)
+{
+   char *path = reparent_relative_path(old_cwd, new_cwd, get_git_dir());
+   trace_printf_key(_setup_key,
+"setup: move $GIT_DIR to '%s'",
+path);
+   set_git_dir_1(path);
+   free(path);
+}
+
+void set_git_dir(const char *path)
+{
+   set_git_dir_1(path);
+   if (!is_absolute_path(path))
+   chdir_notify_register(NULL, update_relative_gitdir, NULL);
+}
+
 const char *get_log_output_encoding(void)
 {
return git_log_output_encoding ? git_log_output_encoding
diff --git a/setup.c b/setup.c
index 7287779642..9eb2e808e1 100644
--- a/setup.c
+++ b/setup.c
@@ -3,6 +3,7 @@
 #include "config.h"
 #include "dir.h"
 #include "string-list.h"
+#include "chdir-notify.h"
 
 static int inside_git_dir = -1;
 static int inside_work_tree = -1;
@@ -378,7 +379,7 @@ int is_inside_work_tree(void)
 
 void setup_work_tree(void)
 {
-   const char *work_tree, *git_dir;
+   const char *work_tree;
static int initialized = 0;
 
if (initialized)
@@ -388,10 +389,7 @@ void setup_work_tree(void)
die(_("unable to set up work tree using invalid config"));
 
work_tree = get_git_work_tree();
-   git_dir = get_git_dir();
-   if (!is_absolute_path(git_dir))
-   git_dir = real_path(get_git_dir());
-   if (!work_tree || chdir(work_tree))
+   if (!work_tree || chdir_notify(work_tree))
die(_("this operation must be run in a work tree"));
 
/*
@@ -401,7 +399,6 @@ void setup_work_tree(void)
if (getenv(GIT_WORK_TREE_ENVIRONMENT))
setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
 
-   set_git_dir(remove_leading_path(git_dir, work_tree));
initialized = 1;
 }

[PATCH v2 3/5] add chdir-notify API

2018-03-30 Thread Jeff King
If one part of the code does a permanent chdir(), then this
invalidates any relative paths that may be held by other
parts of the code. For example, setup_work_tree() moves us
to the top of the working tree, which may invalidate a
previously stored relative gitdir.

We've hacked around this case by teaching setup_work_tree()
to re-run set_git_dir() with an adjusted path, but this
stomps all over the idea of module boundaries.
setup_work_tree() shouldn't have to know all of the places
that need to be fed an adjusted path. And indeed, there's at
least one other place (the refs code) which needs adjusting.

Let's provide an API to let code that stores relative paths
"subscribe" to updates to the current working directory.
This means that callers of chdir() don't need to know about
all subscribers ahead of time; they can simply consult a
dynamically built list.

Note that our helper function to reparent relative paths
uses the simple remove_leading_path(). We could in theory
use the much smarter relative_path(), but that led to some
problems as described in 41894ae3a3 (Use simpler
relative_path when set_git_dir, 2013-10-14). Since we're
aiming to replace the setup_work_tree() code here, let's
follow its lead.

Signed-off-by: Jeff King 
---
 Makefile   |  1 +
 chdir-notify.c | 93 ++
 chdir-notify.h | 73 +++
 3 files changed, 167 insertions(+)
 create mode 100644 chdir-notify.c
 create mode 100644 chdir-notify.h

diff --git a/Makefile b/Makefile
index a1d8775adb..0da98b9274 100644
--- a/Makefile
+++ b/Makefile
@@ -772,6 +772,7 @@ LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
+LIB_OBJS += chdir-notify.o
 LIB_OBJS += checkout.o
 LIB_OBJS += color.o
 LIB_OBJS += column.o
diff --git a/chdir-notify.c b/chdir-notify.c
new file mode 100644
index 00..5f7f2c2ac2
--- /dev/null
+++ b/chdir-notify.c
@@ -0,0 +1,93 @@
+#include "cache.h"
+#include "chdir-notify.h"
+#include "list.h"
+#include "strbuf.h"
+
+struct chdir_notify_entry {
+   const char *name;
+   chdir_notify_callback cb;
+   void *data;
+   struct list_head list;
+};
+static LIST_HEAD(chdir_notify_entries);
+
+void chdir_notify_register(const char *name,
+  chdir_notify_callback cb,
+  void *data)
+{
+   struct chdir_notify_entry *e = xmalloc(sizeof(*e));
+   e->name = name;
+   e->cb = cb;
+   e->data = data;
+   list_add_tail(>list, _notify_entries);
+}
+
+static void reparent_cb(const char *name,
+   const char *old_cwd,
+   const char *new_cwd,
+   void *data)
+{
+   char **path = data;
+   char *tmp = *path;
+
+   if (!tmp)
+   return;
+
+   *path = reparent_relative_path(old_cwd, new_cwd, tmp);
+   free(tmp);
+
+   if (name) {
+   trace_printf_key(_setup_key,
+"setup: reparent %s to '%s'",
+name, *path);
+   }
+}
+
+void chdir_notify_reparent(const char *name, char **path)
+{
+   chdir_notify_register(name, reparent_cb, path);
+}
+
+int chdir_notify(const char *new_cwd)
+{
+   struct strbuf old_cwd = STRBUF_INIT;
+   struct list_head *pos;
+
+   if (strbuf_getcwd(_cwd) < 0)
+   return -1;
+   if (chdir(new_cwd) < 0) {
+   int saved_errno = errno;
+   strbuf_release(_cwd);
+   errno = saved_errno;
+   return -1;
+   }
+
+   trace_printf_key(_setup_key,
+"setup: chdir from '%s' to '%s'",
+old_cwd.buf, new_cwd);
+
+   list_for_each(pos, _notify_entries) {
+   struct chdir_notify_entry *e =
+   list_entry(pos, struct chdir_notify_entry, list);
+   e->cb(e->name, old_cwd.buf, new_cwd, e->data);
+   }
+
+   strbuf_release(_cwd);
+   return 0;
+}
+
+char *reparent_relative_path(const char *old_cwd,
+const char *new_cwd,
+const char *path)
+{
+   char *ret, *full;
+
+   if (is_absolute_path(path))
+   return xstrdup(path);
+
+   full = xstrfmt("%s/%s", old_cwd, path);
+   ret = xstrdup(remove_leading_path(full, new_cwd));
+   free(full);
+
+   return ret;
+}
diff --git a/chdir-notify.h b/chdir-notify.h
new file mode 100644
index 00..366e4c1ee9
--- /dev/null
+++ b/chdir-notify.h
@@ -0,0 +1,73 @@
+#ifndef CHDIR_NOTIFY_H
+#define CHDIR_NOTIFY_H
+
+/*
+ * An API to let code "subscribe" to changes to the current working directory.
+ * The general idea is that some code asks to be notified when the working
+ * directory changes, and other code that calls chdir uses a special wrapper
+ * that notifies everyone.
+ */
+
+/*
+ * Callers who need to know about changes 

[PATCH v2 5/5] refs: use chdir_notify to update cached relative paths

2018-03-30 Thread Jeff King
Commit f57f37e2e1 (files-backend: remove the use of
git_path(), 2017-03-26) introduced a regression when a
relative $GIT_DIR is used in a working tree:

  - when we initialize the ref backend, we make a copy of
get_git_dir(), which may be relative

  - later, we may call setup_work_tree() and chdir to the
root of the working tree

  - further calls to the ref code will use the stored git
directory, but relative paths will now point to the
wrong place

The new test in t1501 demonstrates one such instance (the
bug causes us to write the ref update to the nonsense
"relative/relative/.git").

Since setup_work_tree() now uses chdir_notify, we can just
ask it update our relative paths when necessary.

Reported-by: Rafael Ascensao 
Helped-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Jeff King 
---
 refs/files-backend.c  |  6 ++
 refs/packed-backend.c |  3 +++
 t/t1501-work-tree.sh  | 12 
 3 files changed, 21 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index bec8e30e9e..a92a2aa821 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -9,6 +9,7 @@
 #include "../lockfile.h"
 #include "../object.h"
 #include "../dir.h"
+#include "../chdir-notify.h"
 
 /*
  * This backend uses the following flags in `ref_update::flags` for
@@ -106,6 +107,11 @@ static struct ref_store *files_ref_store_create(const char 
*gitdir,
refs->packed_ref_store = packed_ref_store_create(sb.buf, flags);
strbuf_release();
 
+   chdir_notify_reparent("files-backend $GIT_DIR",
+ >gitdir);
+   chdir_notify_reparent("files-backend $GIT_COMMONDIR",
+ >gitcommondir);
+
return ref_store;
 }
 
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 65288c6472..369c34f886 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -5,6 +5,7 @@
 #include "packed-backend.h"
 #include "../iterator.h"
 #include "../lockfile.h"
+#include "../chdir-notify.h"
 
 enum mmap_strategy {
/*
@@ -202,6 +203,8 @@ struct ref_store *packed_ref_store_create(const char *path,
refs->store_flags = store_flags;
 
refs->path = xstrdup(path);
+   chdir_notify_reparent("packed-refs", >path);
+
return ref_store;
 }
 
diff --git a/t/t1501-work-tree.sh b/t/t1501-work-tree.sh
index b06210ec5e..a5db53337b 100755
--- a/t/t1501-work-tree.sh
+++ b/t/t1501-work-tree.sh
@@ -431,4 +431,16 @@ test_expect_success 'error out gracefully on invalid 
$GIT_WORK_TREE' '
)
 '
 
+test_expect_success 'refs work with relative gitdir and work tree' '
+   git init relative &&
+   git -C relative commit --allow-empty -m one &&
+   git -C relative commit --allow-empty -m two &&
+
+   GIT_DIR=relative/.git GIT_WORK_TREE=relative git reset HEAD^ &&
+
+   git -C relative log -1 --format=%s >actual &&
+   echo one >expect &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.17.0.rc2.594.gdb94a0ce02


[PATCH v2 2/5] trace.c: export trace_setup_key

2018-03-30 Thread Jeff King
From: Nguyễn Thái Ngọc Duy 

This is so that we can print traces based on this key outside trace.c.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Jeff King 
---
 trace.c | 14 +++---
 trace.h |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/trace.c b/trace.c
index 7f3b08e148..fc623e91fd 100644
--- a/trace.c
+++ b/trace.c
@@ -26,6 +26,7 @@
 
 struct trace_key trace_default_key = { "GIT_TRACE", 0, 0, 0 };
 struct trace_key trace_perf_key = TRACE_KEY_INIT(PERFORMANCE);
+struct trace_key trace_setup_key = TRACE_KEY_INIT(SETUP);
 
 /* Get a trace file descriptor from "key" env variable. */
 static int get_trace_fd(struct trace_key *key)
@@ -300,11 +301,10 @@ static const char *quote_crnl(const char *path)
 /* FIXME: move prefix to startup_info struct and get rid of this arg */
 void trace_repo_setup(const char *prefix)
 {
-   static struct trace_key key = TRACE_KEY_INIT(SETUP);
const char *git_work_tree;
char *cwd;
 
-   if (!trace_want())
+   if (!trace_want(_setup_key))
return;
 
cwd = xgetcwd();
@@ -315,11 +315,11 @@ void trace_repo_setup(const char *prefix)
if (!prefix)
prefix = "(null)";
 
-   trace_printf_key(, "setup: git_dir: %s\n", 
quote_crnl(get_git_dir()));
-   trace_printf_key(, "setup: git_common_dir: %s\n", 
quote_crnl(get_git_common_dir()));
-   trace_printf_key(, "setup: worktree: %s\n", 
quote_crnl(git_work_tree));
-   trace_printf_key(, "setup: cwd: %s\n", quote_crnl(cwd));
-   trace_printf_key(, "setup: prefix: %s\n", quote_crnl(prefix));
+   trace_printf_key(_setup_key, "setup: git_dir: %s\n", 
quote_crnl(get_git_dir()));
+   trace_printf_key(_setup_key, "setup: git_common_dir: %s\n", 
quote_crnl(get_git_common_dir()));
+   trace_printf_key(_setup_key, "setup: worktree: %s\n", 
quote_crnl(git_work_tree));
+   trace_printf_key(_setup_key, "setup: cwd: %s\n", quote_crnl(cwd));
+   trace_printf_key(_setup_key, "setup: prefix: %s\n", 
quote_crnl(prefix));
 
free(cwd);
 }
diff --git a/trace.h b/trace.h
index 88055abef7..2b6a1bc17c 100644
--- a/trace.h
+++ b/trace.h
@@ -15,6 +15,7 @@ extern struct trace_key trace_default_key;
 
 #define TRACE_KEY_INIT(name) { "GIT_TRACE_" #name, 0, 0, 0 }
 extern struct trace_key trace_perf_key;
+extern struct trace_key trace_setup_key;
 
 extern void trace_repo_setup(const char *prefix);
 extern int trace_want(struct trace_key *key);
-- 
2.17.0.rc2.594.gdb94a0ce02



[PATCH v2 0/5] re-parenting relative directories after chdir

2018-03-30 Thread Jeff King
On Wed, Mar 28, 2018 at 01:36:56PM -0400, Jeff King wrote:

> For those just joining us, this fixes a regression that was in v2.13 (so
> nothing we need to worry about as part of the 2.17-rc track).
> 
>   [1/4]: set_git_dir: die when setenv() fails
>   [2/4]: add chdir-notify API
>   [3/4]: set_work_tree: use chdir_notify
>   [4/4]: refs: use chdir_notify to update cached relative paths

Here's a re-roll based on the feedback I got, including:

 - fixes the memory leak and vague comment pointed out by Eric

 - adds in tracing code from Duy

 - I took Duy's suggestions regarding "least" surprise in some of the
   functions (reparenting NULL is a noop, and registering a reparent
   handler does so even for an absolute path).

I punted on the "registering the same path twice" thing. That is a
potential way to misuse this API, but I don't think there's a good
solution. The "reparent" helper could figure this out for you, but in
the general case we actually install an arbitrary callback. So the
caller really has to handle it there.

I think in the long run we'd want to outlaw calling set_git_dir() twice
anyway. But possibly the handler-installers should be more careful.

I'm dropping poor Rafael from the cc here, since it is turning into
quite a lot of messages. :) I think at this point his bug has been duly
reported and we are working on the fix.

  [1/5]: set_git_dir: die when setenv() fails
  [2/5]: trace.c: export trace_setup_key
  [3/5]: add chdir-notify API
  [4/5]: set_work_tree: use chdir_notify
  [5/5]: refs: use chdir_notify to update cached relative paths

 Makefile  |  1 +
 cache.h   |  2 +-
 chdir-notify.c| 93 +++
 chdir-notify.h| 73 +
 environment.c | 26 ++--
 refs/files-backend.c  |  6 +++
 refs/packed-backend.c |  3 ++
 setup.c   |  9 ++---
 t/t1501-work-tree.sh  | 12 ++
 trace.c   | 14 +++
 trace.h   |  1 +
 11 files changed, 223 insertions(+), 17 deletions(-)
 create mode 100644 chdir-notify.c
 create mode 100644 chdir-notify.h

-Peff


[PATCH v2 1/5] set_git_dir: die when setenv() fails

2018-03-30 Thread Jeff King
The set_git_dir() function returns an error if setenv()
fails, but there are zero callers who pay attention to this
return value. If this ever were to happen, it could cause
confusing results, as sub-processes would see a potentially
stale GIT_DIR (e.g., if it is relative and we chdir()-ed to
the root of the working tree).

We _could_ try to fix each caller, but there's really
nothing useful to do after this failure except die. Let's
just lump setenv() failure into the same category as malloc
failure: things that should never happen and cause us to
abort catastrophically.

Signed-off-by: Jeff King 
---
 cache.h   | 2 +-
 environment.c | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index a61b2d3f0d..5c24394d84 100644
--- a/cache.h
+++ b/cache.h
@@ -477,7 +477,7 @@ extern const char *get_git_common_dir(void);
 extern char *get_object_directory(void);
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
-extern int set_git_dir(const char *path);
+extern void set_git_dir(const char *path);
 extern int get_common_dir_noenv(struct strbuf *sb, const char *gitdir);
 extern int get_common_dir(struct strbuf *sb, const char *gitdir);
 extern const char *get_git_namespace(void);
diff --git a/environment.c b/environment.c
index d6dd64662c..e01acf8b11 100644
--- a/environment.c
+++ b/environment.c
@@ -296,13 +296,12 @@ char *get_graft_file(void)
return the_repository->graft_file;
 }
 
-int set_git_dir(const char *path)
+void set_git_dir(const char *path)
 {
if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
-   return error("Could not set GIT_DIR to '%s'", path);
+   die("could not set GIT_DIR to '%s'", path);
repo_set_gitdir(the_repository, path);
setup_git_env();
-   return 0;
 }
 
 const char *get_log_output_encoding(void)
-- 
2.17.0.rc2.594.gdb94a0ce02



Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Mar 30, 2018 at 7:10 PM, Junio C Hamano  wrote:
>
>> Which fields in candidate are safe to peek by the caller?  How can a
>> caller tell?
>
> To me, all fields should be valid after
> check_repository_format_gently().

If so, free() is wrong in the first place, and FREE_AND_NULL() is
making it even worse, no?  We learned there is work_tree set to
somewhere, the original code by Peff before abade65b ("setup: expose
enumerated repo info", 2017-11-12) freed it because the code no
longer needed that piece of information.  If we are passing all we
learned back to the caller, we should not free the field in the
function at all.  But it seems (below) the codepath is messier than
that.

> We still need to free and set NULL here though in addition to a
> cleanup interface. The reason is, when checking repo config from a
> worktree, we deliberately ignore core.worktree (which belongs to the
> main repo only). The implicit line near this
> free(candidate->work_tree) is "leave git_work_tree_cfg alone, we don't
> recognize core.worktree". Once we move setting git_work_tree_cfg out
> of this function, this becomes clear.

So in other words, there is a code that looks at the field and it
_wants_ to see NULL there---otherwise that brittle code misbehaves
and FREE_AND_NULL() is a bad-aid to work it around?

Then proposed log message "leaving it dangling is unsanitary" is
*not* what is going on here, and the real reason why the code should
be like so deserve to be described both in the log message and in a
large in-code comment, no?


Re: [PATCH] builtin/config.c: prefer `--type=bool` over `--bool`, etc.

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 09:00:05AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > ... But actually, last-one-wins applies only
> > to a _single_ option, not necessarily unrelated ones. Many other
> > multi-action commands actually have a series of separate boolean flags,
> > and then complain when more than one of the flags is set.
> >
> > So maybe it's not such a good idea for the actions (I do still think
> > it's the right path for the types).
> 
> If this were using command verbs (e.g. "git config get foo.bar") as
> opposed to command options (e.g. "git config --get foo.bar"), it
> wouldn't ahve allowed multiple command verbs from the command line,
> and last-one-wins would not have made much sense because there is no
> way to trigger it as a desirable "feature".
> 
> Just like the topic of the discussion unifies --int/--bool/etc. into
> a single --type={int,bool,...}, perhaps the existing command options
> --get/--list/etc. can be taken as if they were a mistaken historical
> way to spell --action={get,list,...}.  I of course am not recommending
> to add a new "--action" option.  I am suggesting it as a thought-aid
> to see if actions are all that different from value type options.
> 
> I agree that a-bit-per-type that is checked with HAS_MULTI_BITS()
> for error at the end does not make much sense.  I also think what
> you did in this patch for actions is a good clean-up for the above
> reason.

I agree the code internally is nicer after the patch. If we throw away
the argument of "last-one-wins is more consistent with other parts of
git" (because I don't really think that it is for this type of option),
I could possibly buy this as a code cleanup. But it does have a
user-visible impact, which makes the question: are we _OK_ with
switching to last-one-wins?

-Peff


Re: [PATCH v3 3/3] builtin/config: introduce `color` type specifier

2018-03-30 Thread Junio C Hamano
Taylor Blau  writes:

> @@ -184,6 +183,7 @@ Valid `[type]`'s include:
>  --bool-or-int::
>  --path::
>  --expiry-date::
> +--color::
>Historical options for selecting a type specifier. Prefer instead `--type`,
>(see: above).
>  
> @@ -223,6 +223,9 @@ Valid `[type]`'s include:
>   output it as the ANSI color escape sequence to the standard
>   output.  The optional `default` parameter is used instead, if
>   there is no color configured for `name`.
> ++
> +It is preferred to use `--type=color`, or `--type=color --default=[default]`
> +instead of `--get-color`.

Wasn't the whole point of the preliminary --type= patch to
avoid having to add thse two hunks?


Re: [PATCH v3 1/3] builtin/config: introduce `--default`

2018-03-30 Thread Junio C Hamano
Taylor Blau  writes:

> For some use cases, callers of the `git-config(1)` builtin would like to
> fallback to default values when the slot asked for does not exist. In
> addition, users would like to use existing type specifiers to ensure
> that values are parsed correctly when they do exist in the
> configuration.
> ...
> +--default value::
> +  When using `--get`, and the requested slot is not found, behave as if value
> +  were the value assigned to the that slot.

For "diff..color", the above is OK, but in general,
configuration variables are not called "slot".  s/slot/variable/.


Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 7:10 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> Dangling pointers are usually bad news. Reset it back to NULL.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>
> Before abade65b ("setup: expose enumerated repo info", 2017-11-12),
> candidate was an on-stack variable local to this function, so there
> was no need to do anything before returning.  After that commit, we
> pass _fmt down the codepath so theoretically the caller could
> peek into repo_fmt.work_tree after this codepath, which may be bad.
> But if candidate->work_tree were not NULL at this point, that would
> mean that such a caller that peeks is getting a WRONG information,
> no?  It thinks there were no core.worktree set but in reality there
> was the configuration set in the repository, no?
>
> Which fields in candidate are safe to peek by the caller?  How can a
> caller tell?

To me, all fields should be valid after
check_repository_format_gently(). Right now the caller does not need
to read any info from repo_fmt because check_repo... passes the
information in another way, via global variables like
is_bare_repository_cfg and git_work_tree_cfg.

> It seems that setup_git_directory_gently() uses repo_fmt when
> calling various variants of setup_*_git_dir() and then uses the
> repo_fmt.hash_algo field later.

Yes. Though if we're going to reduce global state further more, then
the "if (!has_common)" should be done by the caller, then we need
access to all fields in repo_fmt

> If we want to keep fields of repo_fmt alive and valid after
> check_repository_format_gently() and callers of it like
> setup_*_git_dir() returns, then perhaps the right fix is not to free
> candidate->work_tree here, and instead give an interface to clean up
> repository_format instance, so that the ultimate caller like
> setup_git_directory_gently() can safely peek into any fields in it
> and then clean it up after it is done?

We still need to free and set NULL here though in addition to a
cleanup interface. The reason is, when checking repo config from a
worktree, we deliberately ignore core.worktree (which belongs to the
main repo only). The implicit line near this
free(candidate->work_tree) is "leave git_work_tree_cfg alone, we don't
recognize core.worktree". Once we move setting git_work_tree_cfg out
of this function, this becomes clear.

I didn't think all of this when I wrote this patch though. It was
"hey, it's theoretical bug so let's fix it". Only later on when I
refactored more that I realized what it meant.
-- 
Duy


Re: [PATCH 3/4] set_work_tree: use chdir_notify

2018-03-30 Thread Jeff King
On Thu, Mar 29, 2018 at 08:01:33PM +0200, Duy Nguyen wrote:

> > I do kind of like that. I'm reasonably happy with the chdir_notify()
> > interface, but it would be nicer still if we could get rid of it in the
> > first place. It's true that we _could_ chdir from other places, but
> 
> There's another place we do, that I should mention and keep
> forgetting. Our run-command.c code allows to switch cwd, and if
> $GIT_DIR and stuff is relative then we should reparent them too just
> like we do with setup_work_tree(). Your chdir-notify makes it super
> easy to support that, we just need to move the prep_childenv() down
> below chdir(). But since nobody has complaint, I suppose that feature
> is not really popular (or at least not used to launch another git
> process anyway)

Yeah, I hadn't really considered that, since it would only matter for
environment variables, not for internal strings (which are all about to
get thrown out due to execve anyway). And my patch was just focusing on
that.

But I also wonder if rewriting the environment variables would matter
here. If we're going to chdir for a child, we'd generaly be clearing
repo-related env anyway.

-Peff


Re: [PATCH 0/8] Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-30 Thread Jeff King
On Thu, Mar 29, 2018 at 04:57:26PM +0200, Duy Nguyen wrote:

> On Wed, Mar 28, 2018 at 02:19:32PM -0400, Jeff King wrote:
> > 
> > > I will probably rework on top of your chdir-notify instead (and let
> > > yours to be merged earlier)
> > 
> > Thanks. I like some of the related changes you made, like including this
> > in the tracing output. That should be easy to do on top of mine, I
> > think.
> 
> Yeah. But is it possible to sneak something like this in your series
> (I assume you will reroll anyway)? I could do it separately, but it
> looks nicer if it's split out and merged in individual patches that
> add new chdir-notify call site.

Sure.

> -void chdir_notify_register(chdir_notify_callback cb, void *data)
> +void chdir_notify_register(const char *name,
> +chdir_notify_callback cb,
> +void *data)
>  {
>   struct chdir_notify_entry *e = xmalloc(sizeof(*e));
>   e->cb = cb;
>   e->data = data;
> + e->name = name;
>   list_add_tail(>list, _notify_entries);
>  }

I'm tempted to make a copy of the name here (or at least document that
it must remain valid forever).

-Peff


Re: Bad refspec messes up bundle.

2018-03-30 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Luciano,
>
>> > Is this a bug? Should bundle allow providing multiple refspecs when
> ...
> I agree that it is a bug if a bundle file records a ref multiple times.
> Luciano, here are some pointers so you can fix it:
>
> - probably the best way to start would be to add a new test case to
>   t/t5607-clone-bundle.sh. The script *should* be relatively easy to
>   understand and imitate. The new test case would probably look somewhat
>   like this:
>
>   test_expect_failure 'bundles must not contain multiple refs' '

s/multiple/duplicate/.  It is not unusual for a bundle to record
more than one ref; it is (1) useless and harmful to unsuspecting
clients to record the same ref twice with the same value and (2)
nonesnse to record the same ref twice with different value.

Other than that, the outline seems to go in the right general
direction.



Re: [PATCH] setup.c: reset candidate->work_tree after freeing it

2018-03-30 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Dangling pointers are usually bad news. Reset it back to NULL.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---

Before abade65b ("setup: expose enumerated repo info", 2017-11-12),
candidate was an on-stack variable local to this function, so there
was no need to do anything before returning.  After that commit, we
pass _fmt down the codepath so theoretically the caller could
peek into repo_fmt.work_tree after this codepath, which may be bad.
But if candidate->work_tree were not NULL at this point, that would
mean that such a caller that peeks is getting a WRONG information,
no?  It thinks there were no core.worktree set but in reality there
was the configuration set in the repository, no?

Which fields in candidate are safe to peek by the caller?  How can a
caller tell?

It seems that setup_git_directory_gently() uses repo_fmt when
calling various variants of setup_*_git_dir() and then uses the
repo_fmt.hash_algo field later.

If we want to keep fields of repo_fmt alive and valid after
check_repository_format_gently() and callers of it like
setup_*_git_dir() returns, then perhaps the right fix is not to free
candidate->work_tree here, and instead give an interface to clean up
repository_format instance, so that the ultimate caller like
setup_git_directory_gently() can safely peek into any fields in it
and then clean it up after it is done?

>  setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/setup.c b/setup.c
> index 7287779642..d193bee192 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -482,7 +482,7 @@ static int check_repository_format_gently(const char 
> *gitdir, struct repository_
>   inside_work_tree = -1;
>   }
>   } else {
> - free(candidate->work_tree);
> + FREE_AND_NULL(candidate->work_tree);
>   }
>  
>   return 0;


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 2:32 PM, Johannes Schindelin
 wrote:
> Hi,
>
> On Thu, 29 Mar 2018, Jeff King wrote:
>
>> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
>>
>> > > When calling `git config --unset abc.a` on this file, it leaves this
>> > > (invalid) config behind:
>> > >
>> > > [
>> > > [xyz]
>> > > key = value
>> > >
>> > > The reason is that we try to search for the beginning of the line (or
>> > > for the end of the preceding section header on the same line) that
>> > > defines abc.a, but as an optimization, we subtract 2 from the offset
>> > > pointing just after the definition before we call
>> > > find_beginning_of_line(). That function, however, *also* performs that
>> > > optimization and promptly fails to find the section header correctly.
>> >
>> > This commit message would be more convincing if we had it in test form.
>>
>> I agree a test might be nice. But I don't find the commit message
>> unconvincing at all. It explains pretty clearly why the bug occurs, and
>> you can verify it by looking at find_beginning_of_line.
>>
>> > [abc]a
>> >
>> > is not written by Git, but would be written from an outside tool or person
>> > and we barely cope with it?
>>
>> Yes, I don't think git would ever write onto the same line. But clearly
>> we should handle anything that's syntactically valid.
>
> I was tempted to add the test case, because it is easy to test it.
>
> But I then decided *not* to add it. Why? Testing is a balance between "can
> do" and "need to do".
>
> Can you imagine that I did *not* run the entire test suite before
> submitting this patch series, because it takes an incredible *90 minutes*
> to run *on a fast Windows machine*?

What's wrong with firing up a new worktree, run the test suite there
and go back to do something else so you won't waste time just waiting
for test results and submit? Sure there is a mental overhead for
switching tasks, but at 90 minutes, I think it's worth doing.
-- 
Duy


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-30 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Sergey,
>
> On Wed, 28 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > I use rebase every day. I use the Git garden shears every week. If you
>> > do not trust my experience with these things, nothing will convince
>> > you. 
>> 
>> Unfortunately you have exactly zero experience with rebasing merges as
>> you've never actually rebased them till now, and it's rebasing merges
>> that matters in this particular discussion.
>
> Who says that I have 0 experience with that? Oh yes, you do. Like, as if
> you know.

I just didn't see even single symptom of it in the discussion, still I
said nothing about it until you started to use your presumed experience
in place of true arguments.

> Guess what I do with those Git garden shears' merges? Can you guess? Of
> course you can. But you'll never know until I tell you. It is a little
> silly to try to tell me that I do not have any experience with rebasing
> merges when you have no idea what strategies I tried in the past.

Please notice that I never even started to discuss your 'merge'
directive, exactly because I believe you have huge experience both
implementing and using it that could be relied upon. Just don't mix-in
rebasing merge commits into it, as that is fundamentally different
operation.

And the other unspoken strategies you tried are irrelevant here, as you
declined the whole idea of replaying merge-the-commit instead of
replaying merge-the-operation until recently, and it seems you still do,
at least to some level, by attempting to use 'merge' operation to replay
"the (merge) _commit_". I'm afraid that whatever you've tried in the
past likely suffered from the same conceptual confusion, and thus did
not work indeed.

That said, negative experience is still an experience, often helpful,
but what is relevant here is that it likely was not about rebasing merge
commits at all, so trying to use this irrelevant experience in the
discussion to support your arguments, or rather lack of them, seems even
more unfair to me then using relevant experience for that purpose.

> Now, Phillip's strategy is clearly the best strategy I ever heard
> about,

For 'pick' vs 'merge', it's not about strategy, it's about concept. It
looks like you still believe that you somehow "merge" a merge commit
when you actually rebase it (with Phillip's strategy if you wish). You
don't merge it anywhere, period.

> and I am in the process of doing Actual Work to Put It To The Test.

That's the best outcome of the discussion I ever hoped for, seriously,
and I'll be all ears to listen to the outcomes of the experience you
will gain with it.

BTW, when you have it working, use the strategy you've implemented for
non-merge commits as well, as a good one should still work fine.
Moreover, it should better bring exactly the same results as the default
existing strategy being used for non-merge commits, even in conflicting
situations.

Hopefully /that/ experience will finally push you strong enough to get
the concept right, and you will finally understand that what you've 
implemented is nothing else but a _cherry-pick_ of a _merge commit_,
that reads simply _pick_, for brevity.

-- Sergey


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I think if it's worth fixing it's worth testing for, a future change to
> the config code could easily introduce a regression for this, and
> particularly in this type of code obscure edge cases like this can point
> to bugs elsewhere.

Yup.  "The port to my favourite platform is too slow, and everybody
should learn to live with thin test coverage" would not be a good
strategy in the longer run.



Re: [PATCH v4 3/5] stash: convert drop and clear to builtin

2018-03-30 Thread Junio C Hamano
Joel Teichroeb  writes:

> +static int do_clear_stash(void)
> +{
> + struct object_id obj;
> + if (get_oid(ref_stash, ))
> + return 0;
> + return delete_ref(NULL, ref_stash, , 0);
> +}

Here you see if the "refs/stash" is there, and learn what its
current value is, using get_oid(), so that you can call delete_ref()
with it.

> +static int do_drop_stash(const char *prefix, struct stash_info *info)
> +{
> + ...
> + cp.git_cmd = 1;
> + /* Even though --quiet is specified, rev-parse still outputs the hash */
> + cp.no_stdout = 1;
> + argv_array_pushl(, "rev-parse", "--verify", "--quiet", NULL);
> + argv_array_pushf(, "%s@{0}", ref_stash);
> + ret = run_command();
> + if (ret)
> + do_clear_stash();

Here you call out to rev-parse as an external process.  Isn't doing
the same get_oid() sufficient?

Not limited to the above examples, the conversion in this series
feels somewhat unbalanced---doing easy things like this by forking
an external process and then doing a relatively heavyweight thing
like merge operation (in 2/5) in-process feels the other way around.

Thanks.


Re: Draft of Git Rev News edition 37

2018-03-30 Thread Jacob Keller
On Fri, Mar 30, 2018 at 2:02 AM, Johannes Schindelin
 wrote:
> Hi,
>
> On Mon, 19 Mar 2018, Christian Couder wrote:
>
>> On Sun, Mar 18, 2018 at 11:41 PM, Jacob Keller  
>> wrote:
>> >
>> > I don't have a good summary yet, but I think a section about the
>> > discussion regarding the new recreate-merges and rebasing merges
>> > that's been on going might be useful?
>>
>> Yeah sure, we would gladly accept a summary of this discussion.
>
> I would *love* a summary of that discussion, especially since it got
> pretty unwieldy (and partially out of hand, but that part probably does
> not need a lot of detail apart from the adjective "heated").
>
>> > a lot of that discussion occurred prior to git-merge (tho it's been
>> > ongoing since then?).
>>
>> If you want to take the latest discussions into account, the summary
>> could be either split into two parts, one for this edition and the
>> other one for the next edition. Or we could wait and have the whole
>> summary in the next edition.
>
> Jake, I do not know about your availability, but I would love it if you
> could take a stab, as I trust you to be unbiased. I would not trust myself
> to be unbiased because (as everybody saw who cared to read) I got a little
> bit too emotional and should have stayed more professional.
>
> Thanks,
> Dscho

I hope to be able to make a summary of the discussion as best I can.
It may take a bit as there is a lot of mails to read. I agree that a
good summary should come from someone outside the discussion to reduce
emotional bias.

Thanks,
Jake


Re: [PATCH] builtin/config.c: prefer `--type=bool` over `--bool`, etc.

2018-03-30 Thread Junio C Hamano
Jeff King  writes:

> ... But actually, last-one-wins applies only
> to a _single_ option, not necessarily unrelated ones. Many other
> multi-action commands actually have a series of separate boolean flags,
> and then complain when more than one of the flags is set.
>
> So maybe it's not such a good idea for the actions (I do still think
> it's the right path for the types).

If this were using command verbs (e.g. "git config get foo.bar") as
opposed to command options (e.g. "git config --get foo.bar"), it
wouldn't ahve allowed multiple command verbs from the command line,
and last-one-wins would not have made much sense because there is no
way to trigger it as a desirable "feature".

Just like the topic of the discussion unifies --int/--bool/etc. into
a single --type={int,bool,...}, perhaps the existing command options
--get/--list/etc. can be taken as if they were a mistaken historical
way to spell --action={get,list,...}.  I of course am not recommending
to add a new "--action" option.  I am suggesting it as a thought-aid
to see if actions are all that different from value type options.

I agree that a-bit-per-type that is checked with HAS_MULTI_BITS()
for error at the end does not make much sense.  I also think what
you did in this patch for actions is a good clean-up for the above
reason.


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-30 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Sergey,
>
> On Fri, 30 Mar 2018, Sergey Organov wrote:
>
>> Could we please agree to stop using backward compatibility as an
>> objection in the discussion of the  --recreate-merges feature?
>
> No.
>
> The expectation of users as to what a `pick` is has not changed just
> because you wish it would.

As if I ever suggested to change user expectations. Could you please
stop putting words into my mouth?

I _am_ a user, and I expect 'pick' to pick commits, no matter how many
parents they might have.

And no, --preserve-merges did not ever pick commits with number of
parents more than one, it rather threw them away and re-merged the
heads. Calling it 'pick' was a huge mistake indeed! Fixing that mistake
is what I expect, as a user.

Just teach the 'pick' to correctly pick any commit, please!

>
> That is a matter of backwards-compatibility.

OK, fine, at least its only about user expectations and not about some
scripting incompatibility.

> You see, if you are driving a car for a hundred years already, and then
> switch to a different car, and it has a lever in the same place as your
> previous car's windshield wiper, but in the new car it has a button that
> activates the emergency driver seat ejection OMG *it has a seat ejection
> like in the James Bond movies! Where can I get that car?* Sorry for
> disgressing.

Except it's irrelevant as the 'pick' will still pick commits.

> I am really concerned about that willingness to put an innocuous button,
> so to speak, onto something users got really used to, over the course of a
> decade or so, when that button should really be made red and blinking and
> OMG where can I get that car?

It's irrelevant as the 'pick' will still pick commits.

> So to reiterate, I am really interested in a practical solution that won't
> cause nasty surprises.

I rather don't see how it possibly could cause any surprises, especially
compared to using 'merge' to pick commits.

> Meaning: `pick` != merge.

Exactly! Use 'merge' when you merge, as you are already doing. Use 'pick'
when you are picking. You don't merge "merge commit" when you are
picking it!

> That was a mistake in preserve-merges, as I have only mentioned like a
> hundred times, and we won't repeat it.

The mistake was that it used 'pick' to denote re-merge. You already
fixed that mistake by introducing 'merge' to re-merge, thanks God.

Please don't commit yet another mistake by now using 'merge' to pick!

-- Sergey


Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)

2018-03-30 Thread Ævar Arnfjörð Bjarmason

On Thu, Mar 29 2018, Johannes Schindelin wrote:

> Nonetheless, I would be confortable with this patch going into v2.17.0, even 
> at
> this late stage. The final verdict is Junio's, of course.

Thanks a lot for working on this. I'm keen to stress test this, but
won't have time in the next few days, and in any case think that the
parts that change functionality should wait until after 2.17 (but
e.g. the test renaming would be fine for a cherry-pick).


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 30 2018, Johannes Schindelin wrote:

> On Thu, 29 Mar 2018, Jeff King wrote:
>
>> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
>>
>> > > When calling `git config --unset abc.a` on this file, it leaves this
>> > > (invalid) config behind:
>> > >
>> > > [
>> > > [xyz]
>> > > key = value
>> > >
>> > > The reason is that we try to search for the beginning of the line (or
>> > > for the end of the preceding section header on the same line) that
>> > > defines abc.a, but as an optimization, we subtract 2 from the offset
>> > > pointing just after the definition before we call
>> > > find_beginning_of_line(). That function, however, *also* performs that
>> > > optimization and promptly fails to find the section header correctly.
>> >
>> > This commit message would be more convincing if we had it in test form.
>>
>> I agree a test might be nice. But I don't find the commit message
>> unconvincing at all. It explains pretty clearly why the bug occurs, and
>> you can verify it by looking at find_beginning_of_line.
>>
>> > [abc]a
>> >
>> > is not written by Git, but would be written from an outside tool or person
>> > and we barely cope with it?
>>
>> Yes, I don't think git would ever write onto the same line. But clearly
>> we should handle anything that's syntactically valid.
>
> I was tempted to add the test case, because it is easy to test it.
>
> But I then decided *not* to add it. Why? Testing is a balance between "can
> do" and "need to do".
>
> Can you imagine that I did *not* run the entire test suite before
> submitting this patch series, because it takes an incredible *90 minutes*
> to run *on a fast Windows machine*?

I think if it's worth fixing it's worth testing for, a future change to
the config code could easily introduce a regression for this, and
particularly in this type of code obscure edge cases like this can point
to bugs elsewhere.

We have the EXPENSIVE_ON_WINDOWS prerequisite already in master from an
earlier series of mine, maybe we could use that here, or add some other
prereq like OVERLY_EXHAUSTIVE which by default could depend on
EXPENSIVE_ON_WINDOWS, i.e. we'd have a set of overly pedantic tests that
we skip on Windows by default, as there's no reason to suspect they're
platform-dependent, but we'd like to know if they regress.


Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 04:01:56PM +0200, Johannes Schindelin wrote:

> You know what is *really* funny?
> 
> -- snip --
> static int git_config_from_stdin(config_fn_t fn, void *data)
> {
> return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, 
> data, 0);
> }
> 
> int git_config_from_file(config_fn_t fn, const char *filename, void *data)
> {
> int ret = -1;
> FILE *f;
> 
> f = fopen_or_warn(filename, "r");
> if (f) {
> flockfile(f);
> ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, 
> filename, f, data, 0);
> funlockfile(f);
> fclose(f);
> }
> return ret;
> }
> -- snap --
> 
> So the _stdin variant *goes out of its way not to flockfile()*...

*facepalm* That's probably my fault, since git_config_from_stdin()
existed already when I did the flockfile stuff.

Probably the flockfile should go into do_config_from_file(), where we
specify to use the unlocked variants.

> But I guess all this will become moot when I start handing down the config
> options. It does mean that I have to change the signatures in header
> files, oh well ;-)
> 
> But then I can drop this here patch and we can stop musing about
> flockfile()  ;-)

Yeah, I'll wait to see how your refactor turns out.

-Peff


Re: [PATCH v5 0/6] worktree: teach "add" to check out existing branches

2018-03-30 Thread Thomas Gummerer
On 03/27, Eric Sunshine wrote:
> On Sun, Mar 25, 2018 at 9:49 AM, Thomas Gummerer  wrote:
> > Thanks Eric for the review of the previous round and Duy and Junio for
> > additional comments.
> > This round should address all of Eric's comments from the previous round.
> 
> Thanks, it appears to cover my review comments from the previous
> round. I do have some additional comments on this round (which I could
> have raised with the previous round if I had thought of them at the
> time).
> 
> > As explained in more detail in a reply to the review comment directly,
> > I did not add an enum to 'struct add_opts', for 'force_new_branch' and
> > 'checkout_existing_branch', but instead removed 'force_new_branch'
> > from the struct as it's not required.
> 
> Makes sense. In fact, I had thoughts along these lines during your
> previous dwim-ery series. See my comments on patch 3/6.
> 
> > The rest of the updates are mainly in the user facing messages,
> > documentation and one added test.
> > Interdiff below:
> 
> The interdiff looks sane. Unfortunately, due to UI regressions, I'm
> having second thoughts about whether this series is going in the right
> direction. See my comments on patch 2/6.

Thanks for your reviews of this series!  As I mentioned in the reply
there I'm going to see whether or not I can fix those regressions
(hopefully I can :)), and send a re-roll.


Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 03:02:00PM +0200, Johannes Schindelin wrote:
> 
> > > I'm not sure I understand that last paragraph. What does flockfile() have
> > > to do with stdin/stdout?
> > > 
> > > The point of those calls is that we're locking the FILE handle, so that
> > > it's safe for the lower-level config code to run getc_unlocked(), which
> > > is faster.
> > > 
> > > So without those, we're calling getc_unlocked() without holding the
> > > lock. I think it probably works in practice because we know that we're
> > > single-threaded, but it seems a bit sketchy.
> > 
> > Oops. I misunderstood the purpose of flockfile(), then. I thought it was
> > only about multiple users of stdin/stdout.
> > 
> > Will have a look whether flockfile()/funlockfile() can be moved into
> > do_config_from_file() instead.
> 
> In a sense stdin/stdout are much more susceptible to this because
> they're global variables, and any thread may touch them. For the config
> code, we open our own handle that we don't expose elsewhere. So probably
> it would be fine just to use the unlocked variants even without locking.
> 
> But IMHO it's good practice to always flockfile() before using the
> unlocked variants. My reading of POSIX is that it's OK to use the
> unlocked variants without holding the lock (if you know there won't be
> contention), but if it's not hard to err on the side of safety, I'd
> prefer it.

You know what is *really* funny?

-- snip --
static int git_config_from_stdin(config_fn_t fn, void *data)
{
return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, 
data, 0);
}

int git_config_from_file(config_fn_t fn, const char *filename, void *data)
{
int ret = -1;
FILE *f;

f = fopen_or_warn(filename, "r");
if (f) {
flockfile(f);
ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, 
filename, f, data, 0);
funlockfile(f);
fclose(f);
}
return ret;
}
-- snap --

So the _stdin variant *goes out of its way not to flockfile()*...

But I guess all this will become moot when I start handing down the config
options. It does mean that I have to change the signatures in header
files, oh well ;-)

But then I can drop this here patch and we can stop musing about
flockfile()  ;-)

Ciao,
Dscho


Re: [PATCH v5 5/6] worktree: teach "add" to check out existing branches

2018-03-30 Thread Thomas Gummerer
On 03/27, Eric Sunshine wrote:
> On Sun, Mar 25, 2018 at 9:49 AM, Thomas Gummerer  wrote:
> > Currently 'git worktree add ' creates a new branch named after the
> > basename of the path by default.  If a branch with that name already
> > exists, the command refuses to do anything, unless the '--force' option
> > is given.
> >
> > However we can do a little better than that, and check the branch out if
> > it is not checked out anywhere else.  This will help users who just want
> > to check an existing branch out into a new worktree, and save a few
> > keystrokes.
> > [...]
> > Signed-off-by: Thomas Gummerer 
> > ---
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -317,7 +318,10 @@ static int add_worktree(const char *path, const char 
> > *refname,
> > -   if (opts->new_branch)
> > +   if (opts->checkout_existing_branch)
> > +   fprintf_ln(stderr, _("checking out branch '%s'"),
> > +  refname);
> 
> This fprintf_ln() can easily fit on one line; no need to wrap
> 'refname' to the next one.
> 
> Not worth a re-roll, though.
> 
> > +   else if (opts->new_branch)
> > fprintf_ln(stderr, _("creating branch '%s'"), 
> > opts->new_branch);
> > diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> > @@ -198,13 +198,26 @@ test_expect_success '"add" with  omitted' '
> > -test_expect_success '"add" auto-vivify does not clobber existing branch' '
> > +test_expect_success '"add" checks out existing branch of dwimd name' '
> > test_commit c1 &&
> > test_commit c2 &&
> > git branch precious HEAD~1 &&
> > -   test_must_fail git worktree add precious &&
> > +   git worktree add precious &&
> > test_cmp_rev HEAD~1 precious &&
> > -   test_path_is_missing precious
> > +   (
> > +   cd precious &&
> > +   test_cmp_rev precious HEAD
> > +   )
> > +'
> 
> Looking at this more closely, once it stops being a "don't clobber
> precious branch" test, it's doing more than it needs to do, thus is
> confusing for future readers. The setup -- creating two commits and
> making "precious" point one commit back -- was very intentional and
> allowed the test to verify not only that the worktree wasn't created
> but that "precious" didn't change to reference a different commit.
> However, _none_ of this matters once it becomes a dwim'ing test; the
> unnecessary code is confusing since it doesn't make sense in the
> context of dwim'ing. I _think_ the entire test can collapse to:
> 
> git branch existing &&
> git worktree add existing &&
> (
> cd existing &&
> test_cmp_rev existing HEAD
> )
> 
> In other words, this patch should drop the "precious" test altogether
> and just introduce a new dwim'ing test (and drop patch 6/6).
> 
> I do think that with the potential confusion to future readers, this
> does deserve a re-roll.

Hmm I do think that it's important to have the branch we create the
new working tree from different from the branch of the main working
tree, to make sure we don't accidentally overwrite an existing branch,
and to show that the new branch is checked out.

Maybe I'm overly paranoid though?

> > +test_expect_success '"add" auto-vivify fails with checked out branch' '
> > +   git checkout -b test-branch &&
> > +   test_must_fail git worktree add test-branch &&
> > +   test_path_is_missing test-branch
> > +'
> > +
> > +test_expect_success '"add --force" with existing dwimd name doesnt die' '
> > +   git worktree add --force test-branch
> >  '


Re: [PATCH] builtin/config.c: prefer `--type=bool` over `--bool`, etc.

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 01:27:19AM -0400, Taylor Blau wrote:

> > If you really want to go all-out, I think the ACTION flags could use the
> > same cleanup. We treat them as bitflags, and then issue an error when
> > you set more than one, which is just silly.
> 
> Agreed, and I think that this is a good candidate for a future patch.
> Thoughts? :-).

I actually worked this up for fun, though I had second thoughts while
writing the commit message.

Besides the cleanup, my primary motivation was following last-one-wins
rules as we often do elsewhere. But actually, last-one-wins applies only
to a _single_ option, not necessarily unrelated ones. Many other
multi-action commands actually have a series of separate boolean flags,
and then complain when more than one of the flags is set.

So maybe it's not such a good idea for the actions (I do still think
it's the right path for the types).

For reference, here's the patch I wrote:

 builtin/config.c | 137 +--
 1 file changed, 72 insertions(+), 65 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 01169dd628..5581f48ac8 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -25,35 +25,36 @@ static char term = '\n';
 
 static int use_global_config, use_system_config, use_local_config;
 static struct git_config_source given_config_source;
-static int actions, types;
+static int types;
 static int end_null;
 static int respect_includes_opt = -1;
 static struct config_options config_options;
 static int show_origin;
 
-#define ACTION_GET (1<<0)
-#define ACTION_GET_ALL (1<<1)
-#define ACTION_GET_REGEXP (1<<2)
-#define ACTION_REPLACE_ALL (1<<3)
-#define ACTION_ADD (1<<4)
-#define ACTION_UNSET (1<<5)
-#define ACTION_UNSET_ALL (1<<6)
-#define ACTION_RENAME_SECTION (1<<7)
-#define ACTION_REMOVE_SECTION (1<<8)
-#define ACTION_LIST (1<<9)
-#define ACTION_EDIT (1<<10)
-#define ACTION_SET (1<<11)
-#define ACTION_SET_ALL (1<<12)
-#define ACTION_GET_COLOR (1<<13)
-#define ACTION_GET_COLORBOOL (1<<14)
-#define ACTION_GET_URLMATCH (1<<15)
-
+enum config_action {
+   ACTION_NONE = 0,
+   ACTION_GET,
+   ACTION_GET_ALL,
+   ACTION_GET_REGEXP,
+   ACTION_REPLACE_ALL,
+   ACTION_ADD,
+   ACTION_UNSET,
+   ACTION_UNSET_ALL,
+   ACTION_RENAME_SECTION,
+   ACTION_REMOVE_SECTION,
+   ACTION_LIST,
+   ACTION_EDIT,
+   ACTION_SET,
+   ACTION_SET_ALL,
+   ACTION_GET_COLOR,
+   ACTION_GET_COLORBOOL,
+   ACTION_GET_URLMATCH,
+};
 /*
- * The actions "ACTION_LIST | ACTION_GET_*" which may produce more than
- * one line of output and which should therefore be paged.
+ * This must be an int and not an enum because we pass it by address
+ * to OPT_SETINT.
  */
-#define PAGING_ACTIONS (ACTION_LIST | ACTION_GET_ALL | \
-   ACTION_GET_REGEXP | ACTION_GET_URLMATCH)
+static int action;
 
 #define TYPE_BOOL (1<<0)
 #define TYPE_INT (1<<1)
@@ -69,20 +70,20 @@ static struct option builtin_config_options[] = {
OPT_STRING('f', "file", _config_source.file, N_("file"), N_("use 
given config file")),
OPT_STRING(0, "blob", _config_source.blob, N_("blob-id"), 
N_("read config from given blob object")),
OPT_GROUP(N_("Action")),
-   OPT_BIT(0, "get", , N_("get value: name [value-regex]"), 
ACTION_GET),
-   OPT_BIT(0, "get-all", , N_("get all values: key 
[value-regex]"), ACTION_GET_ALL),
-   OPT_BIT(0, "get-regexp", , N_("get values for regexp: 
name-regex [value-regex]"), ACTION_GET_REGEXP),
-   OPT_BIT(0, "get-urlmatch", , N_("get value specific for the 
URL: section[.var] URL"), ACTION_GET_URLMATCH),
-   OPT_BIT(0, "replace-all", , N_("replace all matching variables: 
name value [value_regex]"), ACTION_REPLACE_ALL),
-   OPT_BIT(0, "add", , N_("add a new variable: name value"), 
ACTION_ADD),
-   OPT_BIT(0, "unset", , N_("remove a variable: name 
[value-regex]"), ACTION_UNSET),
-   OPT_BIT(0, "unset-all", , N_("remove all matches: name 
[value-regex]"), ACTION_UNSET_ALL),
-   OPT_BIT(0, "rename-section", , N_("rename section: old-name 
new-name"), ACTION_RENAME_SECTION),
-   OPT_BIT(0, "remove-section", , N_("remove a section: name"), 
ACTION_REMOVE_SECTION),
-   OPT_BIT('l', "list", , N_("list all"), ACTION_LIST),
-   OPT_BIT('e', "edit", , N_("open an editor"), ACTION_EDIT),
-   OPT_BIT(0, "get-color", , N_("find the color configured: slot 
[default]"), ACTION_GET_COLOR),
-   OPT_BIT(0, "get-colorbool", , N_("find the color setting: slot 
[stdout-is-tty]"), ACTION_GET_COLORBOOL),
+   OPT_SET_INT(0, "get", , N_("get value: name [value-regex]"), 
ACTION_GET),
+   OPT_SET_INT(0, "get-all", , N_("get all values: key 
[value-regex]"), ACTION_GET_ALL),
+   OPT_SET_INT(0, "get-regexp", , N_("get values for regexp: 
name-regex [value-regex]"), ACTION_GET_REGEXP),
+   OPT_SET_INT(0, "get-urlmatch", , N_("get value specific for the 
URL: section[.var] URL"), 

Re: [PATCH v5 3/6] worktree: remove force_new_branch from struct add_opts

2018-03-30 Thread Thomas Gummerer
On 03/27, Eric Sunshine wrote:
> On Sun, Mar 25, 2018 at 9:49 AM, Thomas Gummerer  wrote:
> > The 'force_new_branch' flag in 'struct add_opts' is only used inside the
> > add function, where we already have the same information stored in the
> > 'new_branch_force' variable.  Avoid that unnecessary duplication.
> 
> When I was reviewing your original dwim-ery series (inferring 'foo'
> from 'origin/foo'), I noticed that 'struct add_opts' had accumulated a
> number of unnecessary members over time, including this one. My plan
> was to submit a patch removing all those pointless members after your
> dwim-ery series settled, however, I never got around to it.

Ah right, I didn't look at them in detail, so I failed to notice that.
While I'm already in the area I may as well do that, thanks for the
suggestion!

> This patch might be a good opportunity for doing that cleanup
> wholesale, removing all unneeded members rather than just the one. (If
> so, you'd probably want to move to this patch to the front of the
> series as cleanup before the meatier changes.) Anyhow, it's just a
> thought; feel free to respond with "it's outside the scope of this
> series" if you're not interested in doing it.
> 
> > Signed-off-by: Thomas Gummerer 


Re: [PATCH v5 2/6] worktree: be clearer when "add" dwim-ery kicks in

2018-03-30 Thread Thomas Gummerer
On 03/27, Eric Sunshine wrote:
> On Sun, Mar 25, 2018 at 9:49 AM, Thomas Gummerer  wrote:
> > Currently there is no indication in the "git worktree add" output that
> > a new branch was created.  This would be especially useful information
> > in the case where the dwim of "git worktree add " kicks in, as the
> > user didn't explicitly ask for a new branch, but we create one from
> > them.
> 
> Failed to notice this last time around: s/from/for/
> 
> Perhaps Junio can tweak it when queuing.
> 
> > Print some additional output showing that a branch was created and the
> > branch name to help the user.
> >
> > This will also be useful in the next commit, which introduces a new kind
> 
> It's no longer the _next_ commit which does this. Perhaps say instead
> "a subsequent commit". (Again, perhaps Junio can tweak it since it's
> not worth a re-roll.)

Right, will fix.

> > of dwim-ery of checking out the branch if it exists instead of refusing
> > to create a new worktree in that case, and where it's nice to tell the
> > user which kind of dwim-ery kicked in.
> >
> > Signed-off-by: Thomas Gummerer 
> > ---
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -318,6 +318,9 @@ static int add_worktree(const char *path, const char 
> > *refname,
> > +   if (opts->new_branch)
> > +   fprintf_ln(stderr, _("creating branch '%s'"), 
> > opts->new_branch);
> 
> I didn't think of this before, but what happens with the original
> dwim-ery you added which checks out a remote branch matching the
> worktree name if no such local branch exists? I was wondering if we'd
> want to see a distinct and more informative message in that case, such
> as "creating branch '%s' from remote '%s'" or something. However, I
> see that we're already getting a message:
> 
> Branch 'foo' set up to track remote branch 'foo' from 'origin'.
> creating branch 'foo'
> new worktree HEAD is now at d3adb33f boople
> 
> which is a bit weird since tracking appears to be set up before the
> branch itself is created. I thought that this was another UI
> regression of the sort I mentioned in [1], however, the messages were
> out of order even before the current patch series:

Yeah I agree this is a bit odd.  I did think about this, and the main
reason why I didn't change this by passing the '--quiet' flag to 'git
branch' which we call in the 'add()' function is to keep the
information about the remote tracking.

Looking at where this information actually comes from, there's
multiple different ways we'd have to distinguish to print this
information properly.

Now that I put some more thought into this, I think there are a couple
of ways to solve this, the easiest and cleanest of which would
probably be to run 'git branch' through 'pipe_command', save the
output and print it after the "creating branch 'foo'" message. 

> Branch 'foo' set up to track remote branch 'foo' from 'origin'.
> Preparing foo (identifier foo)
> Checking out files: 100% (/), done.
> HEAD is now at d3adb33f boople
> 
> This highlights another regression introduced by this series. Patch
> 1/6 suppresses the "Checking out files:" message, which is a bit
> unfortunate for decent sized worktrees. I'm not sure I'm entirely
> happy about that loss. Thoughts?

Tbh I never really noticed the "Checking out files" output, because
the repositories I used day to day are usually of a smaller size,
where checking out the files takes less than a second, so I would
never even notice this output.

But while I don't care about it, others may, so I think it would be
better to preserve this output.  Maybe the best way to do that would
be to not use 'run_command' to run 'git reset --hard', but to instead
replace that with the internal functions.

I haven't actually looked at how hard this would be, but I'm guessing
this may be the best option to avoid this UI regression.  I'll play
with this some more and see what I can come up with, and send a
re-roll hopefully in a few days.

Thanks for your review!

> [1]: 
> https://public-inbox.org/git/CAPig+cT8i9L9kbhx=b0sg4_qyndoedpw-1xypm9xzbqpmqr...@mail.gmail.com/
> 
> > fprintf(stderr, _("new worktree HEAD is now at %s"),
> > find_unique_abbrev(commit->object.oid.hash, 
> > DEFAULT_ABBREV));


Re: [PATCH v2 2/2] builtin/config.c: prefer `--type=bool` over `--bool`, etc.

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 01:28:30AM -0400, Taylor Blau wrote:

> +static int option_parse_type(const struct option *opt, const char *arg,
> +  int unset)
> +{
> + int *type = opt->value;
> +
> + if (!strcmp(arg, "bool"))
> + *type = TYPE_BOOL;
> + else if (!strcmp(arg, "int"))
> + *type = TYPE_INT;
> + else if (!strcmp(arg, "bool-or-int"))
> + *type = TYPE_BOOL_OR_INT;
> + else if (!strcmp(arg, "path"))
> + *type = TYPE_PATH;
> + else if (!strcmp(arg, "expiry-date"))
> + *type = TYPE_EXPIRY_DATE;
> + else {
> + die(_("unexpected --type argument, %s"), arg);
> + return 1;
> + }
> + return 0;
> +}

You need to handle "unset" here, which will trigger when somebody does:

  git config --no-type

In which case you'd probably want to reset it to "0".

> @@ -1622,4 +1623,21 @@ test_expect_success 'later legacy specifiers are given 
> precedence' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success '--type allows valid type specifiers' '
> + echo "true" >expect &&
> + git config --type=bool core.foo >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success '--type rejects unknown specifiers' '
> + test_must_fail git config --type=nonsense core.foo 2>error &&
> + test_i18ngrep "unexpected --type argument" error
> +'
> +
> +test_expect_success 'later legacy specifiers are given precedence' '
> + git config --bool --int core.number >actual &&
> + echo 10 > expect &&
> + test_cmp expect actual
> +'

I think there's some rebasing funkiness with this last test, which
already exists from patch 1.

Other than those two minor issues and the typos that René noticed, this
looks good to me.

-Peff


Re: [PATCH v2 1/2] builtin/config.c: treat type specifiers singularly

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 01:28:29AM -0400, Taylor Blau wrote:

> Internally, we represent `git config`'s type specifiers as a bitset
> using OPT_BIT. 'bool' is 1<<0, 'int' is 1<<1, and so on. This technique
> allows for the representation of multiple type specifiers in the `int
> types` field, but this multi-representation is left unused.
> 
> In fact, `git config` will not accept multiple type specifiers at a
> time, as indicated by:
> 
>   $ git config --int --bool some.section
>   error: only one type at a time.
> 
> This patch uses `OPT_SET_INT` to prefer the _last_ mentioned type
> specifier, so that the above command would instead be valid, and a
> synonym of:
> 
>   $ git config --bool some.section
> 
> This change is motivated by two urges: (1) it does not make sense to
> represent a singular type specifier internally as a bitset, only to
> complain when there are multiple bits in the set. `OPT_SET_INT` is more
> well-suited to this task than `OPT_BIT` is. (2) a future patch will
> introduce `--type=`, and we would like not to complain in the
> following situation:
> 
>   $ git config --int --type=int

I think you could just end here, since...

> Where--although the legacy and modern type specifier (`--int`, and
> `--type`, respectively) do not conflict--we would store the value of
> `--type=` in a `char *` and the `--int` as a bit in the `int` bitset.
> 
> In the above, when error-checking `if (types != && type_str)`, we do not
> check additionally whether or not these types are the same, and simply
> complain immediately. This change makes `--int` and `--type=int` a true
> synonym of each other, and removes the need for additional complexity,
> as above in the conditional.

None of this type_str stuff exists yet, and you've sufficiently
motivated the change.

> Signed-off-by: Taylor Blau 
> ---
>  builtin/config.c   | 39 +--
>  t/t1300-repo-config.sh | 11 +++
>  2 files changed, 28 insertions(+), 22 deletions(-)

The patch mostly looks good. We probably want to also rewrite the TYPE_*
#defines to be sequential, since somebody may scratch their head
wondering why we use one bit per type when we do not treat them as a
bitfield.

> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 4f8e6f5fd..aa45b9267 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1611,4 +1611,15 @@ test_expect_success '--local requires a repo' '
>   test_expect_code 128 nongit git config --local foo.bar
>  '
>  
> +cat >.git/config <<-\EOF &&
> +[core]
> +number = 10
> +EOF
> +
> +test_expect_success 'later legacy specifiers are given precedence' '
> + git config --bool --int core.number >actual &&
> + echo 10 > expect &&
> + test_cmp expect actual
> +'

Minor style nit: we prefer ">expect" with no space. Though t1300 is
certainly a cornucopia of style violations already. I could buy it under
the guise of matching existing style if there wasn't a counter-example
directly above. :)

We also prefer to put that "cat >.git/config" inside a test block,
though I'm pretty sure that _is_ following existing style in the test.

-Peff


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-30 Thread Johannes Schindelin
Hi Sergey,

On Wed, 28 Mar 2018, Sergey Organov wrote:

> Johannes Schindelin  writes:
> 
> > I use rebase every day. I use the Git garden shears every week. If you
> > do not trust my experience with these things, nothing will convince
> > you. 
> 
> Unfortunately you have exactly zero experience with rebasing merges as
> you've never actually rebased them till now, and it's rebasing merges
> that matters in this particular discussion.

Who says that I have 0 experience with that? Oh yes, you do. Like, as if
you know.

Guess what I do with those Git garden shears' merges? Can you guess? Of
course you can. But you'll never know until I tell you. It is a little
silly to try to tell me that I do not have any experience with rebasing
merges when you have no idea what strategies I tried in the past.

Now, Phillip's strategy is clearly the best strategy I ever heard about,
and I am in the process of doing Actual Work to Put It To The Test.

> > You are just stuck with your pre-existing opinion.
> 
> I'm afraid that it's rather your huge experience with re-creating merges
> that makes you stuck to your pre-existing opinion and carefully shields
> you from experiencing actual paradigm shift.

You know what? Whatevs.

Ciao,
Johannes


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-30 Thread Johannes Schindelin
Hi Sergey,

On Fri, 30 Mar 2018, Sergey Organov wrote:

> Could we please agree to stop using backward compatibility as an
> objection in the discussion of the  --recreate-merges feature?

No.

The expectation of users as to what a `pick` is has not changed just
because you wish it would.

That is a matter of backwards-compatibility.

You see, if you are driving a car for a hundred years already, and then
switch to a different car, and it has a lever in the same place as your
previous car's windshield wiper, but in the new car it has a button that
activates the emergency driver seat ejection OMG *it has a seat ejection
like in the James Bond movies! Where can I get that car?* Sorry for
disgressing.

I am really concerned about that willingness to put an innocuous button,
so to speak, onto something users got really used to, over the course of a
decade or so, when that button should really be made red and blinking and
OMG where can I get that car?

So to reiterate, I am really interested in a practical solution that won't
cause nasty surprises. Meaning: `pick` != merge. That was a mistake in
preserve-merges, as I have only mentioned like a hundred times, and we
won't repeat it.

Now back to that important question: where can I get such a James Bond
car? Ideally also with Turbo Boost. Oh wait, that was somebody else's car.

Ciao,
Johannes


Re: [PATCH v4 01/13] commit-graph: add format document

2018-03-30 Thread Jakub Narebski
Derrick Stolee  writes:

> +== graph-*.graph files have the following format:

What is this '*' here?

[...]
> +  The remaining data in the body is described one chunk at a time, and
> +  these chunks may be given in any order. Chunks are required unless
> +  otherwise specified.

Does Git need to understand all chunks, or could there be optional
chunks that can be safely ignored (like in PNG format)?  Though this may
be overkill, and could be left for later revision of the format if
deemed necessary.

> +
> +CHUNK DATA:
> +
> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> +  The ith entry, F[i], stores the number of OIDs with first
> +  byte at most i. Thus F[255] stores the total
> +  number of commits (N).

All right, it is small enough that can be required even for a very small
number of commits.

> +
> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
> +  The OIDs for all commits in the graph, sorted in ascending order.
> +
> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)

Do commits need to be put here in the ascending order of OIDs?

If so, this would mean that it is not possible to add information about
new commits by only appending data and maybe overwriting some fields, I
think.  You would need to do full rewrite to insert new commit in
appropriate place.

> +* The first H bytes are for the OID of the root tree.
> +* The next 8 bytes are for the int-ids of the first two parents
> +  of the ith commit. Stores value 0x if no parent in that
> +  position. If there are more than two parents, the second value
> +  has its most-significant bit on and the other bits store an array
> +  position into the Large Edge List chunk.
> +* The next 8 bytes store the generation number of the commit and
> +  the commit time in seconds since EPOCH. The generation number
> +  uses the higher 30 bits of the first 4 bytes, while the commit
> +  time uses the 32 bits of the second 4 bytes, along with the lowest
> +  2 bits of the lowest byte, storing the 33rd and 34th bit of the
> +  commit time.
> +
> +  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
> +  This list of 4-byte values store the second through nth parents for
> +  all octopus merges. The second parent value in the commit data stores
> +  an array position within this list along with the most-significant bit
> +  on. Starting at that array position, iterate through this list of 
> int-ids
> +  for the parents until reaching a value with the most-significant bit 
> on.
> +  The other bits correspond to the int-id of the last parent.

All right, that is one chunk that cannot use fixed-length records; this
shouldn't matter much, as we iterate only up to the number of parents
less two.

A question: what happens to the last list of parents?  Is there a
guardian value of 0x at last place?

> +
> +TRAILER:
> +
> + H-byte HASH-checksum of all of the above.
> +

Best,
--
Jakub Narębski


Re: [PATCH 9/9] git_config_set: reuse empty sections

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:09PM +0200, Johannes Schindelin wrote:
> 
> > It can happen quite easily that the last setting in a config section is
> > removed, and to avoid confusion when there are comments in the config
> > about that section, we keep a lone section header, i.e. an empty
> > section.
> > 
> > The code to add new entries in the config tries to be cute by reusing
> > the parsing code that is used to retrieve config settings, but that
> > poses the problem that the latter use case does *not* care about empty
> > sections, therefore even the former user case won't see them.
> > 
> > Fix this by introducing a mode where the parser reports also empty
> > sections (with a trailing '.' as tell-tale), and then using that when
> > adding new config entries.
> 
> Heh, so it seems we are partway to the "event-stream" suggestion I made
> earlier. I agree this is the right way to approach this problem.
> 
> I wondered if we allow keys to end in ".", but it seems that we don't.
> 
> > diff --git a/config.c b/config.c
> > index eb1e0d335fc..b04c40f76bc 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -653,13 +653,15 @@ static int get_base_var(struct strbuf *name)
> > }
> >  }
> >  
> > -static int git_parse_source(config_fn_t fn, void *data)
> > +static int git_parse_source(config_fn_t fn, void *data,
> > +   int include_section_headers)
> 
> We already have a "struct config_options", but we do a terrible job of
> passing it around (since it only impacts the include stuff right now,
> and that all gets handled at a very outer level).
> 
> Rather than plumb this one int through everywhere, should we add it to
> that struct and plumb the struct through?

Yesss!

Again, thank you so much for this really valuable review. This is even
better than what I hoped for.

Ciao,
Dscho


Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 03:02:00PM +0200, Johannes Schindelin wrote:

> > I'm not sure I understand that last paragraph. What does flockfile() have
> > to do with stdin/stdout?
> > 
> > The point of those calls is that we're locking the FILE handle, so that
> > it's safe for the lower-level config code to run getc_unlocked(), which
> > is faster.
> > 
> > So without those, we're calling getc_unlocked() without holding the
> > lock. I think it probably works in practice because we know that we're
> > single-threaded, but it seems a bit sketchy.
> 
> Oops. I misunderstood the purpose of flockfile(), then. I thought it was
> only about multiple users of stdin/stdout.
> 
> Will have a look whether flockfile()/funlockfile() can be moved into
> do_config_from_file() instead.

In a sense stdin/stdout are much more susceptible to this because
they're global variables, and any thread may touch them. For the config
code, we open our own handle that we don't expose elsewhere. So probably
it would be fine just to use the unlocked variants even without locking.

But IMHO it's good practice to always flockfile() before using the
unlocked variants. My reading of POSIX is that it's OK to use the
unlocked variants without holding the lock (if you know there won't be
contention), but if it's not hard to err on the side of safety, I'd
prefer it.

-Peff


Re: [PATCH 7/9] git config --unset: remove empty sections (in normal situations)

2018-03-30 Thread Jeff King
On Fri, Mar 30, 2018 at 03:00:06PM +0200, Johannes Schindelin wrote:

> > I guess the holy grail would be a parser which reports _all_ syntactic
> > events (section names, keys, comments, whitespace, etc) as a stream
> > without storing anything. And then the normal reader could just discard
> > the non-key events, and the writer here could build the tree from those
> > events.
> 
> I already changed the do_config_from_file()/do_config_from() code path to
> allow for handing back section headers. And I *think* that approach should
> be easily extended to allow for an optional callback for these syntactic
> events (and we do not need more than that, as the parsed "tree" really is
> a list: there is nothing nested about ini files, so we really only have a
> linear list of blocks (event type, offset range)).

True. I was thinking we'd want sections with keys, whitespace, and
comments under them. But even that does not really make sense. As this
patch series shows, comments do not "belong" to a section, and the file
really needs to be considered as a stream.

So yeah, if we can parse it into a sequence of events in one
forward-pass and then manipulate that sequence, I think it should be
sufficient (and _way_ more readable than the current code, even before
the bits you are trying to fix here).

> I'll think about this a little bit, and hopefully come back with v2 in a
> while that uses that approach.
> 
> Thank you so much for that suggestion,

Great. Thanks for working on this.

-Peff


Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:04PM +0200, Johannes Schindelin wrote:
> 
> > Technically, it is the git_config_set_multivar_in_file_gently()
> > function that we modify here (but the oneline would get too long if we
> > were that precise).
> > 
> > This change prepares the git_config_set machinery to allow reusing empty
> > sections, by using the file-local function do_config_from_file()
> > directly (whose signature can then be changed without any effect outside
> > of config.c).
> > 
> > An incidental benefit is that we avoid a level of indirection, and we
> > also avoid calling flockfile()/funlockfile() when we already know that
> > we are not operating on stdin/stdout here.
> 
> I'm not sure I understand that last paragraph. What does flockfile() have
> to do with stdin/stdout?
> 
> The point of those calls is that we're locking the FILE handle, so that
> it's safe for the lower-level config code to run getc_unlocked(), which
> is faster.
> 
> So without those, we're calling getc_unlocked() without holding the
> lock. I think it probably works in practice because we know that we're
> single-threaded, but it seems a bit sketchy.

Oops. I misunderstood the purpose of flockfile(), then. I thought it was
only about multiple users of stdin/stdout.

Will have a look whether flockfile()/funlockfile() can be moved into
do_config_from_file() instead.

Ciao,
Dscho


Re: [PATCH 7/9] git config --unset: remove empty sections (in normal situations)

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:00PM +0200, Johannes Schindelin wrote:
> 
> > Let's generalize this observation to this conservative strategy: if we
> > are removing the last entry from a section, and there are no comments
> > inside that section nor surrounding it, then remove the entire section.
> > Otherwise behave as before: leave the now-empty section (including those
> > comments, even the one about the now-deleted entry).
> 
> Yep, as I said earlier, this makes a ton of sense to me.
> 
> [... thorough review ...]

Thank you for taking the time (and figuring out my off-by-ones, am I not
the king of those?).

Your in-depth analysis of the backtracking approach also makes sense, in
particular the awful bug that looks very, very similar to what 1/9 fixes
elsewhere.

I'll take some time to go over your comments in detail, but there is one
suggestion that I think I'll want to pursue first:

> I guess the holy grail would be a parser which reports _all_ syntactic
> events (section names, keys, comments, whitespace, etc) as a stream
> without storing anything. And then the normal reader could just discard
> the non-key events, and the writer here could build the tree from those
> events.

I already changed the do_config_from_file()/do_config_from() code path to
allow for handing back section headers. And I *think* that approach should
be easily extended to allow for an optional callback for these syntactic
events (and we do not need more than that, as the parsed "tree" really is
a list: there is nothing nested about ini files, so we really only have a
linear list of blocks (event type, offset range)).

I'll think about this a little bit, and hopefully come back with v2 in a
while that uses that approach.

Thank you so much for that suggestion,
Dscho


Re: [PATCH 4/9] t1300: remove unreasonable expectation from TODO

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:50PM +0200, Johannes Schindelin wrote:
> 
> > In https://public-inbox.org/git/7vvc8alzat@alter.siamese.dyndns.org/
> > a reasonable patch was made quite a bit less so by changing a test case
> > demonstrating a bug to a test case that demonstrates that we ask for too
> > much: the test case 'unsetting the last key in a section removes header'
> > now expects a future bug fix to be able to determine whether a free-form
> > comment above a section header refers to said section or not.
> > 
> > Rather than shooting for the stars (and not even getting off the
> > ground), let's start shooting for something obtainable and be reasonably
> > confident that we *can* get it.
> 
> As I said before, I'm fine with turning this test into something more
> realistic.

Good.

Of course, I worked hard to come up with a patch series, i.e. I put in
some effort to placate anybody who would be offended by my accompanying
rant.

> An obvious question is whether we should preserve the original
> unrealistic parts by splitting it: the realistic parts into one
> expect_failure (that we'd switch to expect_success by the end of this
> series), and then an unrealistic one to serve as a documentation of the
> ideal, with a comment explaining why it's unrealistic.

As stated before, I think it would be a mistake to mark up this
unrealistic example with `test_expect_failure`. We do, after all, suggest
occasionally to grep for that when somebody asks what they could work on.
And you do not want to set somebody like that up for failure by pointing
them to such a "bug".

However, I did keep the example to demonstrate the expectation that
sections with surrounding comments are kept. That was very much intended.

And the reason I did not change the unrealistic example? So that it is
easier to review in our patch-based review process, where I try to avoid
hunks that might distract from the intent of the change.

Ciao,
Dscho


Re: [PATCH 3/9] t1300: avoid relying on a bug

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:45PM +0200, Johannes Schindelin wrote:
> 
> > The test case 'unset with cont. lines' relied on a bug that is about to
> > be fixed: it tests *explicitly* that removing the last entry from a
> > config section leaves an *empty* section behind.
> > 
> > Let's fix this test case not to rely on that behavior, simply by
> > preventing the section from becoming empty.
> 
> Seems like a good solution. I don't think we care in particular about
> testing a multi-line value at the end of the file.

... and if we did, we should have documented that.

Ciao,
Dscho


Re: [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:40PM +0200, Johannes Schindelin wrote:
> 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  t/{t1300-repo-config.sh => t1300-config.sh} | 0
> >  1 file changed, 0 insertions(+), 0 deletions(-)
> >  rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)
> 
> This has only been bugging me for oh, about 10 years.

Yep.

We should have done that right after moving the builtins' code to
builtins/.

Which reminds me that we *still* do not have a lib/ where all the source
code for libgit.a lives. And then maybe standalone/ for the source code of
the non-builtin tools. And... this would make for a fine micro-project
next year, I guess. Or in ten.

Ciao,
Dscho


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-30 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:
> Hi,
>
> On Thu, 29 Mar 2018, Sergey Organov wrote:
>
>> Jacob Keller  writes:
>> 
>> > I care about the general compatibility of the rebase todo list
>> > regardless of which options you enabled on the command line to
>> > generate it.
>> 
>> It's a good thing in general, yes. However, I recall I was told by the
>> author that --recreate-merges was introduced exactly to break backward
>> compatibility of the todo list. If so, could we please agree to stop
>> using backward compatibility as an objection in the discussion of this
>> particular feature?
>
> That is a serious misrepresentation of what I said.
>
> If I had changed --preserve-merges to the new format, *that* would have
> broken backwards-compatibility.
>
> So the entire reason of introducing --recreate-merges was to *not have to
> break backwards-compatibility*.
>
> I definitely did not say the *exact opposite*.

I'm sorry I committed ambiguity in my wording that allowed it to be
misinterpreted. I actually intended to say roughly the same thing you
are saying, as what matters for the discussion is that new todo list
format does not need to be (backward-)compatible to that of
--preserve-merges. 

> Hopefully this clarifies your confusion,

There was actually no confusion on my side, and I like your wording
better.

Except that you've managed to clarify your intentions without actually
addressing the primary concern:

Could we please agree to stop using backward compatibility as an
objection in the discussion of the  --recreate-merges feature?

Could we?

I understand you are still resistant to change 'pick' syntax, but it's
not because of backward-compatibility, right?

-- Sergey


Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)

2018-03-30 Thread Johannes Schindelin
Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:30PM +0200, Johannes Schindelin wrote:
> 
> > The first patch is somewhat of a "while at it" bug fix that I first
> > thought would be a lot more critical than it actually is: It really
> > only affects config files that start with a section followed
> > immediately (i.e. without a newline) by a one-letter boolean setting
> > (i.e. without a `= ` part). So while it is a real bug fix, I
> > doubt anybody ever got bitten by it.
> 
> That makes me wonder if somebody could craft a malicious config to do
> something bad.

I thought about that, and could not think of anything other than social
engineering vectors. Even in that case, the error message is instructive
enough that the user should be able to fix the config without consulting
StackOverflow.

> > Now, to the really important part: why does this patch series not
> > conflict with my very early statements that we cannot simply remove
> > empty sections because we may end up with stale comments?
> > 
> > Well, the patch in question takes pains to determine *iff* there are
> > any comments surrounding, or included in, the section. If any are
> > found: previous behavior. Under the assumption that the user edited
> > the file, we keep it as intact as possible (see below for some
> > argument against this). If no comments are found, and let's face it,
> > this is probably *the* common case, as few people edit their config
> > files by hand these days (neither should they because it is too easy
> > to end up with an unparseable one), the now-empty section *is*
> > removed.
> 
> I'm not against people editing their config files by hand. But I think
> what you propose here makes a lot of sense, because it works as long as
> you don't intermingle hand- and auto-editing in the same section (and it
> even works if you do intermingle, as long as you don't use comments,
> which are probably even more rare).
> 
> So it seems like quite a sensible compromise, and I think should make
> most people happy.

Thanks for confirming my line of thinking,
Dscho


Re: [PATCH 1/9] git_config_set: fix off-by-two

2018-03-30 Thread Johannes Schindelin
Hi,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
> 
> > > When calling `git config --unset abc.a` on this file, it leaves this
> > > (invalid) config behind:
> > >
> > > [
> > > [xyz]
> > > key = value
> > >
> > > The reason is that we try to search for the beginning of the line (or
> > > for the end of the preceding section header on the same line) that
> > > defines abc.a, but as an optimization, we subtract 2 from the offset
> > > pointing just after the definition before we call
> > > find_beginning_of_line(). That function, however, *also* performs that
> > > optimization and promptly fails to find the section header correctly.
> > 
> > This commit message would be more convincing if we had it in test form.
> 
> I agree a test might be nice. But I don't find the commit message
> unconvincing at all. It explains pretty clearly why the bug occurs, and
> you can verify it by looking at find_beginning_of_line.
> 
> > [abc]a
> > 
> > is not written by Git, but would be written from an outside tool or person
> > and we barely cope with it?
> 
> Yes, I don't think git would ever write onto the same line. But clearly
> we should handle anything that's syntactically valid.

I was tempted to add the test case, because it is easy to test it.

But I then decided *not* to add it. Why? Testing is a balance between "can
do" and "need to do".

Can you imagine that I did *not* run the entire test suite before
submitting this patch series, because it takes an incredible *90 minutes*
to run *on a fast Windows machine*?

Seriously, this is hurting me. I do not complain about this due to some
mental illness forcing me to do it. I complain about this so often
*because it slows me down*, you gentle people. And you don't seem to care,
at least the test suite gets noticably worse by the month. I frankly do
not know what to do about this, as you keep adding and adding and it gets
less and less feasible for me to run the full test suite. I seem to be
totally unable to get through to you with the message that this is a real
problem with a real need to get fixed.

So with this in mind, I do not want to add a test case for a concocted
example that won't affect anybody except users who *want* to trigger this
bug.

I hope you agree,
Dscho

P.S.: Of course I ran the entire test suite. Not on Windows, but in a
Linux VM, because Linux is what Git is fine-tuned for, most obviously so.
An alien digging up ancient Earth history in the far future might be
tempted to assume that Git was developed to develop Linux which was
developed to develop Git, and then ask herself why humans bothered at all.

I actually ran the entire test suite on Linux on every single patch, via
`git rebase -x "make -j15 DEVELOPER=1 test" @{u}`, as I usually do before
submitting a patch series.

And it *did* find an obscure bug in an earlier iteration, where
t5512-ls-remote.sh demonstrated that looking at only one entry at a time
is not enough: `git config --unset-all uploadpack.hiderefs` *also* needs
to remove the now-empty section, because we might end up with the empty
sections in the wrong order, and the order of [transfer] and [uploadpack]
*matters* if the transfer.hiderefs setting is negative and the
uploadpack.hiderefs setting is positive, as is the case in 'overrides work
between mixed transfer/upload-pack hideRefs'. (Side-note: this looks like
a pretty obvious design bug to me, as there is *no tooling* to switch
around the order of these settings. Even worse: if somebody gets
instructions to add those settings, and there is already a [transfer]
section in the config: you're out of luck! You will have to *know* that
the order matters, *and add a second [transfer] section manually*!)


Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)

2018-03-30 Thread Johannes Schindelin
Hi Stefan,

On Thu, 29 Mar 2018, Stefan Beller wrote:

> On Thu, Mar 29, 2018 at 8:18 AM, Johannes Schindelin
>  wrote:
> 
> > So what is the argument against this extra care to detect comments? Well, if
> > you have something like this:
> >
> > [section]
> > ; Here we comment about the variable called snarf
> > snarf = froop
> >
> > and we run `git config --unset section.snarf`, we end up with this config:
> >
> > [section]
> > ; Here we comment about the variable called snarf
> >
> > which obviously does not make sense. However, that is already established
> > behavior for quite a few years, and I do not even try to think of a way how
> > this could be solved.
> 
> By commenting out the key/value pair instead of deleting it.
> It's called --unset, not --delete ;)

That would open the door to new bug reports when a user starts with this
concocted config:

[section]
# This is a comment about the `key` setting
key = value

and then does this:

git config --unset section.key
git config section.key value
git config --unset section.key
git config section.key value
git config --unset section.key
git config section.key value

and then ends up with a config like this:

[section]
# This is a comment about the `key` setting
;key = value
;key = value
;key = value
key = value

And note that the comment might be about `value` instead, so reusing a
commented-out `key` setting won't fly, either.

I *did* give this problem a couple of minutes of thought before writing my
assessment that is quoted above ;-)

Ciao,
Dscho


Re: [PATCH v4 00/13] Serialized Git Commit Graph

2018-03-30 Thread Jakub Narebski
I hope that I am addressing the most recent version of this series.

Derrick Stolee  writes:

> As promised [1], this patch contains a way to serialize the commit graph.
> The current implementation defines a new file format to store the graph
> structure (parent relationships) and basic commit metadata (commit date,
> root tree OID) in order to prevent parsing raw commits while performing
> basic graph walks. For example, we do not need to parse the full commit
> when performing these walks:
>
> * 'git log --topo-order -1000' walks all reachable commits to avoid
>   incorrect topological orders, but only needs the commit message for
>   the top 1000 commits.
>
> * 'git merge-base  ' may walk many commits to find the correct
>   boundary between the commits reachable from A and those reachable
>   from B. No commit messages are needed.
>
> * 'git branch -vv' checks ahead/behind status for all local branches
>   compared to their upstream remote branches. This is essentially as
>   hard as computing merge bases for each.
>
> The current patch speeds up these calculations by injecting a check in
> parse_commit_gently() to check if there is a graph file and using that
> to provide the required metadata to the struct commit.

That's nice.

What are the assumptions about the serialized commit graph format? Does
it need to be:
 - extensible without rewriting (e.g. append-only)?
 - like the above, but may need rewriting for optimal performance?
 - extending it needs to rewrite whole file?

Excuse me if it waas already asked and answered.

>
> The file format has room to store generation numbers, which will be
> provided as a patch after this framework is merged. Generation numbers
> are referenced by the design document but not implemented in order to
> make the current patch focus on the graph construction process. Once
> that is stable, it will be easier to add generation numbers and make
> graph walks aware of generation numbers one-by-one.

As the serialized commit graph format is versioned, I wonder if it would
be possible to speed up graph walks even more by adding to it FELINE
index (pair of numbers) from "Reachability Queries in Very Large Graphs:
A Fast Refined Olnine Search Approach" (2014) - available at
http://openproceedings.org/EDBT/2014/paper_166.pdf

The implementation would probably need adjustments to make it
unambiguous and unambiguously extensible; unless there is place for
indices that are local-only and need to be recalculated from scratch
when graph changes (to cover all graph).

>
> Here are some performance results for a copy of the Linux repository
> where 'master' has 704,766 reachable commits and is behind 'origin/master'
> by 19,610 commits.
>
> | Command  | Before | After  | Rel % |
> |--|||---|
> | log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
> | branch -vv   |  0.42s |  0.27s | -35%  |
> | rev-list --all   |  6.4s  |  1.0s  | -84%  |
> | rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

That's the "Rel %" of "Before", that is delta/before, isn't it?

> To test this yourself, run the following on your repo:
>
>   git config core.commitGraph true
>   git show-ref -s | git commit-graph write --set-latest --stdin-commits
>
> The second command writes a commit graph file containing every commit
> reachable from your refs. Now, all git commands that walk commits will
> check your graph first before consulting the ODB. You can run your own
> performance comparisions by toggling the 'core.commitgraph' setting.

Good.  It is nicely similar to how bitmap indices (of reachability) are
handled.

I just wonder what happens in the (rare) presence of grafts (old
mechanism), or "git replace"-d commits...

Regards,
-- 
Jakub Narębski


Re: [PATCH v3 0/3] add -p: select individual hunk lines

2018-03-30 Thread Phillip Wood
On 29/03/18 19:32, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> Since v2 I've updated the patches to use '-' instead of '^' to invert
>> the selection to match the rest of add -i and clean -i.
>>
>> These patches build on top of the recount fixes in [1]. The commit
>> message for the first patch describes the motivation:
>>
>> "When I end up editing hunks it is almost always because I want to
>> stage a subset of the lines in the hunk. Doing this by editing the
>> hunk is inconvenient and error prone (especially so if the patch is
>> going to be reversed before being applied). Instead offer an option
>> for add -p to stage individual lines. When the user presses 'l' the
>> hunk is redrawn with labels by the insertions and deletions and they
>> are prompted to enter a list of the lines they wish to stage. Ranges
>> of lines may be specified using 'a-b' where either 'a' or 'b' may be
>> omitted to mean all lines from 'a' to the end of the hunk or all lines
>> from 1 upto and including 'b'."
> 
> I haven't seen any review comments on this round, and as I am not a
> heavy user of "add -i" interface (even though I admit that I
> originally wrote it), I haven't had a chance to exercise the code
> myself in the two weeks since the patches have been queued in my
> tree.
> 
> I am inclihned to merge them to 'next' soonish, but please stop me
> if anybody (including the original author) has further comments.
> 
> Thanks.
> 
Hi Junio, if no one else has any comments, then I think it's ready for
next. I've not used this latest incarnation much but I've used the
previous versions quite a bit.

Best Wishes

Phillip


  1   2   >