Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Duy Nguyen
On Fri, Jul 20, 2018 at 7:28 AM Jeff King  wrote:
>
> On Thu, Jul 19, 2018 at 04:11:01PM -0700, Elijah Newren wrote:
>
> > Looking at the output from Peff's instrumentation elsewhere in this
> > thread, I see a lot of lines like
> >mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)
> > Does that mean it was reading the array when it wasn't ready?
>
> Yes, it looks like we saw a "get" without a "set". Though this could
> also be due to threading. The tracing isn't atomic with respect to the
> actual get/set operation, so it's possible that the ordering of the
> trace output does not match the ordering of the actual operations.
>
> > However, it's interesting to also look at the effect on packing
> > linux.git (on the same beefy hardware):
> >
> > Version  Pack (MB)  MaxRSS(kB)  Time (s)
> > -------  ---------  ----------  --------
> >  2.17.0       1279    11382932    632.24
> >  2.18.0       1279    10817568    621.97
> >  fix-v4       1279    11484168   1193.67
> >
> > While the pack size is nice and small, the original memory savings
> > added in 2.18.0 are gone and the performance is much worse.  :-(
>
> Interesting. I can't reproduce here. The fix-v4 case is only slightly
> slower than 2.18.0. Can you double check that your compiler flags, etc,
> were the same? Many times I've accidentally compared -O0 to -O2. :)

He ran 40 threads, though, and that many threads can make lock
contention very expensive. Yeah, my money is also on lock contention.

> You might also try the patch below (on top of fix-v4), which moves the

Another thing Elijah could try is watch CPU utilization. If this is
lock contention, I think core utilization should be much lower because
we spend more time waiting than actually doing things.

> locking to its own dedicated mutex. That should reduce lock contention,

I think we could use cache_lock(), which is for non-odb shared data
(and delta_size[] fits this category).

> and it fixes the remaining realloc where I think we're still racy. On my

Yeah it's not truly racy as you also noted in another mail. I'll make
a note about this in the commit message.

> repack of linux.git, it dropped the runtime from 6m3s to 5m41s, almost
> entirely in system CPU.
-- 
Duy


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Jeff King
On Fri, Jul 20, 2018 at 01:28:29AM -0400, Jeff King wrote:

> @@ -162,8 +166,10 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
>  
>   if (!pdata->in_pack_by_idx)
>   REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
> + pack_delta_lock(pdata);
>   if (pdata->delta_size)
>   REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
> + pack_delta_unlock(pdata);
>   }

This made me wonder if in_pack_by_idx needs a similar lock. But
actually, I don't think either is necessary. We only allocate new
entries during the counting phase, not during the threaded delta search.

So this hunk could be dropped (and the system CPU benefit I saw is just
from reduced lock contention).

-Peff


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 04:11:01PM -0700, Elijah Newren wrote:

> Looking at the output from Peff's instrumentation elsewhere in this
> thread, I see a lot of lines like
>mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)
> Does that mean it was reading the array when it wasn't ready?

Yes, it looks like we saw a "get" without a "set". Though this could
also be due to threading. The tracing isn't atomic with respect to the
actual get/set operation, so it's possible that the ordering of the
trace output does not match the ordering of the actual operations.

> However, it's interesting to also look at the effect on packing
> linux.git (on the same beefy hardware):
> 
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0       1279    11382932    632.24
>  2.18.0       1279    10817568    621.97
>  fix-v4       1279    11484168   1193.67
> 
> While the pack size is nice and small, the original memory savings
> added in 2.18.0 are gone and the performance is much worse.  :-(

Interesting. I can't reproduce here. The fix-v4 case is only slightly
slower than 2.18.0. Can you double check that your compiler flags, etc,
were the same? Many times I've accidentally compared -O0 to -O2. :)

You might also try the patch below (on top of fix-v4), which moves the
locking to its own dedicated mutex. That should reduce lock contention,
and it fixes the remaining realloc where I think we're still racy. On my
repack of linux.git, it dropped the runtime from 6m3s to 5m41s, almost
entirely in system CPU.

I didn't measure my max rss. However, I'd caution slightly against
drawing too much conclusion from it, for two reasons:

  1. RSS includes mmap'd packfiles, which is subject to whatever pages
 the OS feels like keeping in RAM. So using more heap can sometimes
 go unnoticed in that count, since you're just trading heap pages
 for mmap pages. Although that implies some memory pressure, and it
 sounds like your machine is sufficiently beefy to avoid that.

  2. Peak heap is going to depend on the threading. You have one thread
 per CPU working on a window of objects, each of which will be in
 memory at once. So I'd expect a fair bit of fluctuation in the peak
 just depending on how the threads line up with each other (some of
 it random, and some of it maybe impacted by what the code does, but
 in a way that just happens to fall out for this specific workload).

Which isn't to say measuring it is useless. The trends may override the
noise from those two things. I've just run into problems in the past
trying to get consistent measurements.

Here's the lock patch.

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba14a1bfbc..b76ce04cb9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1926,12 +1926,12 @@ static unsigned long oe_delta_size(struct packing_data *pack,
 {
unsigned long size;
 
-   read_lock(); /* to protect access to pack->delta_size[] */
+   pack_delta_lock(pack);
if (pack->delta_size)
size = pack->delta_size[e - pack->objects];
else
size = e->delta_size_;
-   read_unlock();
+   pack_delta_unlock(pack);
return size;
 }
 
@@ -1939,10 +1939,10 @@ static void oe_set_delta_size(struct packing_data *pack,
  struct object_entry *e,
  unsigned long size)
 {
-   read_lock(); /* to protect access to pack->delta_size[] */
+   pack_delta_lock(pack);
if (!pack->delta_size && size < pack->oe_delta_size_limit) {
e->delta_size_ = size;
-   read_unlock();
+   pack_delta_unlock(pack);
return;
}
/*
@@ -1963,7 +1963,7 @@ static void oe_set_delta_size(struct packing_data *pack,
pack->delta_size[i] = pack->objects[i].delta_size_;
}
pack->delta_size[e - pack->objects] = size;
-   read_unlock();
+   pack_delta_unlock(pack);
 }
 
 static int try_delta(struct unpacked *trg, struct unpacked *src,
diff --git a/pack-objects.c b/pack-objects.c
index e3c32bbfc2..8d9c2dfb82 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -148,6 +148,10 @@ void prepare_packing_data(struct packing_data *pdata)
 1U << OE_SIZE_BITS);
pdata->oe_delta_size_limit = git_env_ulong("GIT_TEST_OE_DELTA_SIZE",
   1U << OE_DELTA_SIZE_BITS);
+
+#ifndef NO_PTHREADS
+   pthread_mutex_init(&pdata->delta_lock, NULL);
+#endif
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -162,8 +166,10 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 
if (!pdata->in_pack_by_idx)
REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
+   pack_delta_lock(pdata);
if (pdata->delta_size)
 

Re: [PATCH] t9300: wait for background fast-import process to die after killing it

2018-07-19 Thread Eric Rannaud
On Thu, Jul 19, 2018 at 3:26 PM, SZEDER Gábor  wrote:
> But how do some of these messages end up on the test script's stderr,
> why don't we get them from all five tests, and why do they come from
> different file/line locations?

Thanks for the detailed explanation.

Signed-off-by: Eric Rannaud 


No rule to make target `git-daemon'

2018-07-19 Thread Jeffrey Walton
Hi Everyone,

I'm working from the 2.18 tarball on Solaris 11.3 x86_64. I'm catching
the following when building from sources. This appears to be a new
issue. It was not present in 2.17.1.

gmake: *** No rule to make target `git-daemon'.  Stop.
gmake: *** Waiting for unfinished jobs
Failed to build Git

There does not appear to be an option to control building the daemon:

$ ./configure --help | grep -i daemon
$

Any ideas on how to side-step it?

Thanks in advance


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

I forgot my footnote, which was going to be:

  I'm bringing up that list not because I think it's necessarily a good
  list, but because it's _a_ list. And as I was recently subjected to an
  audit that referenced it, I've been thinking a lot about ban-lists and
  whether they are useful (and specifically for which functions).

  It's at https://msdn.microsoft.com/en-us/library/bb288454.aspx if
  you're curious, but again, that is absolutely not the ban-list I am
  working towards. To what I posted already, I'd probably add strcat()
  and vsprintf() based on discussions here, and then call it done.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 03:46:08PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > For enforcement, we can rely on the compiler and just
> > introduce code which breaks compilation when it sees these
> > functions. This has a few advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >  hard to ignore. Whereas an external linter is an extra
> >  step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >  parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >  grep-based linter).
> >
> > The one big disadvantage is that it will only check code
> > that is actually compiled, so it may miss code that isn't
> > triggered on your particular system. But since presumably
> > people don't add new code without compiling it (and if they
> > do, the banned function list is the least of their worries),
> > we really only care about failing to clean up old code when
> > adding new functions to the list. And that's easy enough to
> > address with a manual audit when adding a new function
> > (which is what I did for the functions here).
> >
> > That leaves only the question of how to trigger the
> > compilation error. The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
> rhetorical question.  I am not convinced there will not be a legit
> circumstance that calling strcpy (or whatever we are going to ban)
> is the best solution and it is safe.  By "best", what I mean is "you
> could instead use memcpy/strncpy/whatever" can legitimately be
> > argued with "but still using memcpy/strncpy/whatever is inferior
> > to using strcpy in this case for such and such reasons".

In my opinion, no, this is not a problem.

My plan is to only add functions which are truly worthless. So with
strcpy(), for example, you _can_ make sure the buffer is big enough
before calling strcpy. But to do so, you by definition must have just
called strlen(), at which point you may as well use memcpy(). It's more
obviously correct, and it's more efficient.

Ditto for sprintf, where you should _always_ be using at least xsnprintf
(or some better tool, depending on the situation).  And for strncpy,
strlcpy (or again, some better tool) is strictly an improvement.  If
these were our functions, I would literally delete them from our
codebase. This is the moral equivalent. ;)

Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
but I think it's silly for it to be. I would never add it to this list.

Likewise snprintf(). That is a function that _can_ be dangerous due to
unexpected truncation, and I think we should avoid it in general. But I
would not ban it, as it is the correct tool in a few situations (you
have a fixed buffer and returning a truncation error code is fine --
many of our syscall wrappers are in this exact situation). So I would
probably not add it to the ban list (but I feel less strongly than I do
about memcpy).

The nuclear option for overriding is "#undef that-function" in a
particular instance. That covers the rest of the translation unit, but I
think that's probably fine. If we have a function which is mostly
questionable, but we need it sometimes, then we ought to be putting a
sane wrapper around it. And _that_ wrapper can sit in its own file and
#undef as it pleases, keeping the unsafety contained within.

Don't get me wrong, if you have another suggestion that allows easier
one-off overrides, I'm happy to hear it. But I think building this into
the compilation step and having it on-by-default is a huge advantage in
simplicity and having people remember to run it. And I couldn't come up
with a better way to do that.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 05:59:47PM -0400, Eric Sunshine wrote:

> > The one I brainstormed (but forgot to mention) is that it might be
> > possible for a platform to have strcpy as a macro already? In which case
> > we'd need to #undef it or risk a compilation error (even if the macro
> > isn't actually used).
> 
> I have some recollection (perhaps long outdated or just wrong) of
> Microsoft headers spewing deprecation warnings about "unsafe"
> functions. I don't know whether they did that by covering functions
> with macros or by decorating the function with a deprecation attribute
> or by some other mechanism, but such concern seems well-founded.
> #undef'ing them might indeed be a very good preventative tactic.

Yeah, these functions are definitely on their "SDL banned list". I don't
know how they implement that. At that level, I'd really expect it to be
done with a deprecated attribute next to the declaration (I also
considered trying to add deprecation attributes, too, but I think it's
hard to do without re-declaring the function, and anyway it's "just" a
warning).

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:47:35PM -0700, Stefan Beller wrote:

> > But that may be overly paranoid.  Once upon a time there was some rules
> > lawyering around CodingGuidelines, but I think that was successfully
> > shut down and hasn't reared its head for several years.
> 
> Heh; My preference would be to keep docs as short and concise as
> possible (and lawyering sounds like "verbosity is a feature" :-P) but
> still giving advice on controversial (i.e. not enforced by a machine in
> a deterministic fashion) things such as having braces around one
> statement for example or leaving them out.

I think we literally had somebody say "I don't have to abide by this in
a review because it wasn't in CodingGuidelines." But then, that is
perhaps indicative of other problems.

> So maybe I would have rather asked, why we start out with these two
> functions. And you seem to call them "obviously bad", and you take both
> of them because they need to be handled differently due to the variadic 
> macros.
> (And those two are "obviously worse" than strcat, as they are used more often.
> But strcat, being on MS ban list[1], would do just as fine)

Ooh, strcat is another one that should be added.

I actually thought about splitting it into three commits (introduce
mechanism, then one per function), but it felt like stringing it out.
You are probably right, though, that each function deserves its own
explanation. And the commit message is already quite long.

> >   We'll include strcpy() and sprintf() in the initial list of banned
> >   functions. While these _can_ be used carefully by surrounding them
> >   with extra code, there's no inherent check that the size of the
> >   destination buffer is big enough, it's very easy to overflow the
> >   buffer.
> 
> Sounds good to me, maybe even add "We've had problems with that
> in the past, see 'git log -S strcpy'", but that may be too much again.

Actually, that's a good point. We've had actual bugs due to strcpy(). I can
similarly point to bad uses of strcat().

I think I have sufficient fodder for a re-roll along these lines
(assuming we like the idea at all; Junio seemed to have some
reservations, but I'll reply there separately).

-Peff


RE: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Randall S. Becker
On July 19, 2018 6:46 PM, Junio wrote:
> Jeff King  writes:
> 
> > For enforcement, we can rely on the compiler and just introduce code
> > which breaks compilation when it sees these functions. This has a few
> > advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >  hard to ignore. Whereas an external linter is an extra
> >  step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >  parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >  grep-based linter).
> >
> > The one big disadvantage is that it will only check code that is
> > actually compiled, so it may miss code that isn't triggered on your
> > particular system. But since presumably people don't add new code
> > without compiling it (and if they do, the banned function list is the
> > least of their worries), we really only care about failing to clean up
> > old code when adding new functions to the list. And that's easy enough
> > to address with a manual audit when adding a new function (which is
> > what I did for the functions here).
> >
> > That leaves only the question of how to trigger the compilation error.
> > The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
> rhetorical question.  I am not convinced there will not be a legit
> circumstance that calling strcpy (or whatever we are going to ban) is
> the best solution and it is safe.  By "best", what I mean is "you
> could instead use memcpy/strncpy/whatever" can legitimately be argued
> with "but still using memcpy/strncpy/whatever is inferior to using
> strcpy in this case for such and such reasons".

Putting on my old-guy compiler hat, this sounds like a more complex
activity, one that something akin to lint might be useful at handling.
Having a post-processor that searches for offending functions but also
supports annotations explaining exceptions (why you really had to use
strncpy because the NULL was hiding in a bad place and you promise to
fix it) might be useful. Personally, I'd rather know that the code
compiles first and then violates rules that I can fix following basic
prototyping than get yelled at up front - but that's just me. I can't
suggest a good tool to handle this, short of augmenting lint, and if we
were in Java, annotations would be the way to go, but this seems like a
problem that other products have solved.

Cheers,
Randall

-- Brief whoami:
  NonStop developer since approximately NonStop(2112884442)
  UNIX developer since approximately 421664400
-- In my real life, I talk too much.





Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Elijah Newren
On Thu, Jul 19, 2018 at 11:24 AM, Duy Nguyen  wrote:
> On Thu, Jul 19, 2018 at 07:31:35PM +0200, Duy Nguyen wrote:
>> On Thu, Jul 19, 2018 at 01:23:58PM -0400, Jeff King wrote:
>> > On Thu, Jul 19, 2018 at 09:42:00AM -0700, Elijah Newren wrote:
>> >
>> > > Thanks for the quick turnaround.  Unfortunately, I have some bad news.
>> > > With this patch, I get the following:
>> > >
>> > > $ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
>> > > Enumerating objects: 4460703, done.
>> > > Counting objects: 100% (4460703/4460703), done.
>> > > Delta compression using up to 40 threads.
>> > > Compressing objects: 100% (3807140/3807140), done.
>> > > Writing objects: 100% (4460703/4460703), done.
>> > > Total 4460703 (delta 2831383), reused 1587071 (delta 0)
>> > > error: failed to unpack compressed delta at offset 183854150 from
>> > > .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
>> > > fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
>> > > in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
>> > > is corrupt
>> > > error: failed to run prune
>> > > MaxRSS:40025196 Time:2531.52
>> >
>> > Looking at that output, my _guess_ is that we somehow end up with a
>> > bogus delta_size value and write out a truncated entry. But I couldn't
>> > reproduce the issue with smaller test cases.
>>
>> Could it be a race condition?
>
> I'm convinced my code is racy (between two writes). I created a broken
> pack once with 32 threads. Elijah please try again with this new
> patch. It should fix this (I only tried repack a few times so far but
> will continue)
>
> The race is this
>
> 1. Thread one sees a large delta size and NULL delta_size[] array,
>allocates the new array and in the middle of copying old delta
>sizes over.
>
> 2. Thread two wants to write a new (large) delta size. It sees that
>delta_size[] is already allocated, it writes the correct size there
>(and truncated one in object_entry->delta_size_)
>
> 3. Back to thread one, it now copies the truncated value in
>delta_size_ from step 2 to delta_size[] array, overwriting the good
>value that thread two wrote.
>
> There is also a potential read/write race where a read from
> pack_size[] happens when the array is not ready. But I don't think it
> can happen with current try_delta() code. I protect it anyway to be
> safe.

Looking at the output from Peff's instrumentation elsewhere in this
thread, I see a lot of lines like
   mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)
Does that mean it was reading the array when it wasn't ready?


Anyway, with your latest patch (which I'm labelling fix-v4), git gc
--aggressive completes, git fsck likes the result, and the new table
of stats on this repo becomes:

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       5498    43513628   2494.85
 2.18.0      10531    40449596   4168.94
 fix-v1       5509    42509784   2480.74
 fix-v2       5509    41644104   2468.25
 fix-v4       5500    44400948   2761.74


So, the pack size is back to what is expected.  The code takes about
10% longer and requires 2% more memory than git-2.17.0, but the pack
size was the main issue.


However, it's interesting to also look at the effect on packing
linux.git (on the same beefy hardware):

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       1279    11382932    632.24
 2.18.0       1279    10817568    621.97
 fix-v4       1279    11484168   1193.67

While the pack size is nice and small, the original memory savings
added in 2.18.0 are gone and the performance is much worse.  :-(


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Junio C Hamano
Jeff King  writes:

> For enforcement, we can rely on the compiler and just
> introduce code which breaks compilation when it sees these
> functions. This has a few advantages:
>
>   1. We know it's run as part of a build cycle, so it's
>  hard to ignore. Whereas an external linter is an extra
>  step the developer needs to remember to do.
>
>   2. Likewise, it's basically free since the compiler is
>  parsing the code anyway.
>
>   3. We know it's robust against false positives (unlike a
>  grep-based linter).
>
> The one big disadvantage is that it will only check code
> that is actually compiled, so it may miss code that isn't
> triggered on your particular system. But since presumably
> people don't add new code without compiling it (and if they
> do, the banned function list is the least of their worries),
> we really only care about failing to clean up old code when
> adding new functions to the list. And that's easy enough to
> address with a manual audit when adding a new function
> (which is what I did for the functions here).
>
> That leaves only the question of how to trigger the
> compilation error. The goals are:

I actually have another question, though.

Is it a downside that it is cumbersome to override?  This is not a
rhetorical question.  I am not convinced there will not be a legit
circumstance that calling strcpy (or whatever we are going to ban)
is the best solution and it is safe.  By "best", what I mean is "you
could instead use memcpy/strncpy/whatever" can legitimately be
argued with "but still using memcpy/strncpy/whatever is inferior
to using strcpy in this case for such and such reasons".



Re: Find commit that referenced a blob first

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 3:38 PM Junio C Hamano  wrote:

> > If the given object refers to a blob, it will be described as
> > :,
> > such that the blob can be found at  in the , which itself
> > describes the first commit in which this blob occurs in a reverse
> > revision walk from HEAD.
>
> You walk from the latest to earlier commit (because there by
> definition can be no reverse pointer from older to newer commit),

but a "reverse walk from HEAD" produces the commits to us in an order
as if we were walking from earlier to latest?


Re: Find commit that referenced a blob first

2018-07-19 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider  
> wrote:
>>
>> Hi,
>>
>> I have a blob hash and I would like to know what commit referenced
>> this blob first in a given Git repo.
>
> git describe 
>
> If the given object refers to a blob, it will be described as
> :,
> such that the blob can be found at  in the , which itself
> describes the first commit in which this blob occurs in a reverse
> revision walk from HEAD.

You walk from the latest to earlier commit (because there by
definition can be no reverse pointer from older to newer commit),
and see if it has the blob, and stop when you find one.  Wouldn't it
generally find the most recent use, not the earliest use as Lars
seems to want?

>
> Since
> 644eb60bd01 (builtin/describe.c: describe a blob, 2017-11-15)
> (included first in 2.16.0)
>
> You're welcome,
> Stefan


Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-19 Thread Junio C Hamano
SZEDER Gábor  writes:

>> Forgetting the code in git-sh-setup, are we?
>> 
>> git_dir_init() rather specifically set GIT_DIR to the absolute path, and
>> since that variable is already exported, the `exec` commands launched via
>> `git-rebase--interactive` all saw it.
>> 
>> That is the reason why the sequencer.c was taught to set GIT_DIR to an
>> absolute path rather than not setting it: for backwards compatibility.
>
> GIT_DIR was not exported to 'exec' commands during an interactive
> rebase prior to 18633e1a22 (rebase -i: use the rebase--helper builtin,
> 2017-02-09) (nor was GIT_PREFIX):
>
>   $ git log -Sgit_dir_init master git-rebase*.sh
>   # Nothing.
>   $ git checkout 18633e1a22a6^ && make -j4 prefix=/tmp/BEFORE install
>   <>
>   $ git checkout 18633e1a22a6 && make -j4 prefix=/tmp/AFTER install
>   <>
>   $ GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"' /tmp/BEFORE/bin/git rebase -i HEAD^
>   Executing: set |grep ^GIT
>   GIT_CHERRY_PICK_HELP=$'\nWhen you have resolved this problem, run "git rebase --continue".\nIf you prefer to skip this patch, run "git rebase --skip" instead.\nTo check out the original branch and stop rebasing, run "git rebase --abort".\n'
>   GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"'
>   GIT_INTERNAL_GETTEXT_SH_SCHEME=gnu
>   GIT_REFLOG_ACTION='rebase -i (start): checkout HEAD^'
>   warning: notes ref refs/notes/commits is invalid
>   Successfully rebased and updated refs/heads/master.
>   $ GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"' /tmp/AFTER/bin/git rebase -i HEAD^
>   Executing: set |grep ^GIT
>   GIT_CHERRY_PICK_HELP='
>   GIT_DIR='.git'
>   GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"'
>   GIT_INTERNAL_GETTEXT_SH_SCHEME='gnu'
>   GIT_PREFIX=''
>   GIT_REFLOG_ACTION='rebase -i (start): checkout HEAD^'
>   warning: notes ref refs/notes/commits is invalid
>   Successfully rebased and updated refs/heads/master.
>
> And then recently came 226c0ddd0d (exec_cmd: RUNTIME_PREFIX on some
> POSIX systems, 2018-04-10), which then started to export GIT_EXEC_PATH
> to 'exec' commands as well...  

Correct.  git-sh-setup does assign to GIT_DIR, but unless the
end-user has it exported, it does not export, which has long been
very deliberate.  But it does not really matter, as exporting BOTH
GIT_DIR and GIT_WORK_TREE is a lot easier to understand and probably
safe (even though technically a regression) solution, and fewer and
fewer things will be relying on git-sh-setup in the future anyway.



Re: [PATCH] t/t5534: do not unset GIT_COMMITTER_EMAIL for other tests

2018-07-19 Thread Junio C Hamano
Henning Schild  writes:

> Looking at "what is cooking" I assume I should not add/fold this into
> the series anymore. So it comes as a separate patch on top.

Thanks.  I only said:

 I think this round is mostly ready, except for a minor nit in the
 last step.  I do not mind merging this to 'next' and leave fixing
 of the test to a later clean-up.

meaning that I _do not mind_ merging the slightly broken one to
'next' and fixing it with a follow-up (i.e. this patch), but fixing
with a replacement patch so that we pretend we never made a mistake
that needs fixing from the beginning is of course preferable ;-)

Let's squash this patch into the last one in the series.


[PATCH] t9300: wait for background fast-import process to die after killing it

2018-07-19 Thread SZEDER Gábor
The five new tests added to 't9300-fast-import.sh' in 30e215a65c
(fast-import: checkpoint: dump branches/tags/marks even if
object_count==0, 2017-09-28), all with the prefix "V:" in their test
description, run 'git fast-import' in the background and then 'kill'
it as part of a 'test_when_finished' cleanup command.  When this test
script is executed with Bash, some or even all of these tests tend to
pollute the test script's stderr, and messages about terminated
processes end up on the terminal:

  $ bash ./t9300-fast-import.sh
  <... snip ...>
  ok 179 - V: checkpoint helper does not get stuck with extra output
  /<...>/test-lib-functions.sh: line 388: 28383 Terminated  git fast-import $options 0<&8 1>&9
  ok 180 - V: checkpoint updates refs after reset
  ./t9300-fast-import.sh: line 3210: 28401 Terminated  git fast-import $options 0<&8 1>&9
  ok 181 - V: checkpoint updates refs and marks after commit
  ok 182 - V: checkpoint updates refs and marks after commit (no new objects)
  ./test-lib.sh: line 634: line 3250: 28485 Terminated  git fast-import $options 0<&8 1>&9
  ok 183 - V: checkpoint updates tags after tag
  ./t9300-fast-import.sh: line 3264: 28510 Terminated  git fast-import $options 0<&8 1>&9

After a background child process terminates, its parent Bash process
always outputs a message like those above to stderr, even when in
non-interactive mode.

But how do some of these messages end up on the test script's stderr,
why don't we get them from all five tests, and why do they come from
different file/line locations?  Well, after sending the TERM signal to
the background child process, it takes a little while until that
process receives the signal and terminates, and then it takes another
while until the parent process notices it.  During this time the
parent Bash process is continuing execution, and by the time it
notices that its child terminated it might have already left
'test_eval_inner_' and its stderr is not redirected to /dev/null
anymore.  That's why such a message can appear on the test script's
stderr, while other times, when the child terminates fast and/or the
parent shell is slow enough, the message ends up in /dev/null, just
like any other output of the test does.  Bash always adds the file
name and line number of the code location it was about to execute when
it notices the termination of its child process as a prefix to that
message, hence the varying and sometimes totally unrelated location
prefixes in those messages (e.g. line 388 in 'test-lib-functions.sh'
is 'test_verify_prereq', and I saw such a message pointing to
'say_color' as well).

Prevent these messages from appearing on the test script's stderr by
'wait'-ing on the pid of the background 'git fast-import' process
after sending it the TERM signal.  This ensures that the executing
shell's stderr is still redirected when the shell notices the
termination of its child process in the background, and that these
messages get a consistent file/line location prefix.

Note that this is not an issue when the test script is run with Bash
and '-v', because then these messages are supposed to go to the test
script's stderr anyway, and indeed all of them do; though the
sometimes seemingly random file/line prefixes could be confusing
still.  Similarly, it's not an issue with Bash and '--verbose-log'
either, because then all messages go to the log file as they should.
Finally, it's not an issue with some other shells (I tried dash, ksh,
ksh93 and mksh) even without any of the verbose options, because they
don't print messages like these in non-interactive mode in the first
place.
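The effect can be illustrated with a minimal snippet (hypothetical demo code, not from the test suite): 'wait'-ing on the killed pid reaps the child in the executing shell while stderr is still redirected, and the child's exit status (128 + SIGTERM = 143) is reported by 'wait' itself instead of via an asynchronous "Terminated" notice later:

```shell
# Start a background child, kill it, and reap it ourselves while our
# stderr redirection is still in effect.
sleep 60 &
pid=$!
{
	kill "$pid" &&
	wait "$pid"
	status=$?
} 2>/dev/null	# any termination notice lands here, not on the terminal
echo "child exited with status $status"
```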

Signed-off-by: SZEDER Gábor 
---
 t/t9300-fast-import.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 9e7f96223d..fac33e524c 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3147,7 +3147,10 @@ background_import_then_checkpoint () {
echo $! >V.pid
# We don't mind if fast-import has already died by the time the test
# ends.
-   test_when_finished "exec 8>&-; exec 9>&-; kill $(cat V.pid) || true"
+   test_when_finished "
+   exec 8>&-; exec 9>&-;
+   kill $(cat V.pid) && wait $(cat V.pid)
+   true"
 
# Start in the background to ensure we adhere strictly to (blocking)
# pipes writing sequence. We want to assume that the write below could
-- 
2.18.0.408.g42635c01bc



Re: Git 2.18: RUNTIME_PREFIX... is it working?

2018-07-19 Thread brian m. carlson
On Wed, Jul 18, 2018 at 02:18:44PM +0200, Johannes Schindelin wrote:
> On Sat, 14 Jul 2018, brian m. carlson wrote:
> > I will say that at cPanel, we have a configuration where end users can
> > end up inside a mount namespace without /proc (depending on the
> > preferences of the administrator).  However, it's easy enough for us to
> > simply build without RUNTIME_PREFIX if necessary.
> > 
> > If we turn it on by default, it would be nice if we documented (maybe in
> > the Makefile) that it requires /proc on Linux for the benefit of other
> > people who might be in a similar situation.
> 
> Is there *really* no other way on Linux to figure out the absolute path of
> the current executable than to open a pseudo file in the `/proc` file
> system?

Nope, not that I'm aware of.  You have to read the destination of
the /proc/PID/exe symlink.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH] xdiff/histogram: remove tail recursion

2018-07-19 Thread Stefan Beller
When running the same reproduction script as the previous patch,
it turns out the stack is too small, which can be easily avoided.

Signed-off-by: Stefan Beller 
---
 xdiff/xhistogram.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index fc2d3cd95d9..ec85f5992bd 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -318,7 +318,9 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
 {
struct region lcs;
int lcs_found;
-   int result = -1;
+   int result;
+redo:
+   result = -1;
 
if (count1 <= 0 && count2 <= 0)
return 0;
@@ -355,11 +357,17 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
line2, lcs.begin2 - line2);
if (result)
goto out;
-   result = histogram_diff(xpp, env,
-   lcs.end1 + 1, LINE_END(1) - 
lcs.end1,
-   lcs.end2 + 1, LINE_END(2) - 
lcs.end2);
-   if (result)
-   goto out;
+   /*
+* result = histogram_diff(xpp, env,
+*lcs.end1 + 1, LINE_END(1) - lcs.end1,
+*lcs.end2 + 1, LINE_END(2) - lcs.end2);
+* but let's optimize tail recursion ourself:
+   */
+   count1 = LINE_END(1) - lcs.end1;
+   line1 = lcs.end1 + 1;
+   count2 = LINE_END(2) - lcs.end2;
+   line2 = lcs.end2 + 1;
+   goto redo;
}
}
 out:
-- 
2.18.0.233.g985f88cf7e-goog



Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-19 Thread SZEDER Gábor
> On Mon, 16 Jul 2018, Junio C Hamano wrote:
> 
> > Jeff King  writes:
> > 
> > > None of which is too surprising. The root of the bug is in the
> > > conversion to rebase--helper, I think, when presumably we started
> > > setting GIT_DIR at all (but I didn't dig further). Then 09d7b6c6fa fixed
> > > _one_ fallout of that, which was relative paths, but didn't help the
> > > subdirectory case.
> > >
> > > Just reading over this thread, I suspect the simplest fix is to pass
> > > GIT_DIR and GIT_WORK_TREE together, which is almost always the right
> > > thing to do.
> > 
> > Perhaps.  Not exporting GIT_DIR (unless the end-user already did to
> > the environment before starting "git rebase"---it would be a bad
> > change to unexport it unconditionally) may probably be a way to make
> > rebase--helper conversion more faithful to the original scripted
> > Porcelain, but I suspect in practice always giving GIT_DIR and
> > GIT_WORK_TREE would work well for many existing hooks.
> 
> Forgetting the code in git-sh-setup, are we?
> 
> git_dir_init() rather specifically set GIT_DIR to the absolute path, and
> since that variable is already exported, the `exec` commands launched via
> `git-rebase--interactive` all saw it.
> 
> That is the reason why the sequencer.c was taught to set GIT_DIR to an
> absolute path rather than not setting it: for backwards compatibility.

GIT_DIR was not exported to 'exec' commands during an interactive
rebase prior to 18633e1a22 (rebase -i: use the rebase--helper builtin,
2017-02-09) (nor was GIT_PREFIX):

  $ git log -Sgit_dir_init master git-rebase*.sh
  # Nothing.
  $ git checkout 18633e1a22a6^ && make -j4 prefix=/tmp/BEFORE install
  <>
  $ git checkout 18633e1a22a6 && make -j4 prefix=/tmp/AFTER install
  <>
  $ GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"' /tmp/BEFORE/bin/git rebase -i 
HEAD^
  Executing: set |grep ^GIT
  GIT_CHERRY_PICK_HELP=$'\nWhen you have resolved this problem, run "git rebase 
--continue".\nIf you prefer to skip this patch, run "git rebase --skip" 
instead.\nTo check out the original branch and stop rebasing, run "git rebase 
--abort".\n'
  GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"'
  GIT_INTERNAL_GETTEXT_SH_SCHEME=gnu
  GIT_REFLOG_ACTION='rebase -i (start): checkout HEAD^'
  warning: notes ref refs/notes/commits is invalid
  Successfully rebased and updated refs/heads/master.
  $ GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"' /tmp/AFTER/bin/git rebase -i 
HEAD^
  Executing: set |grep ^GIT
  GIT_CHERRY_PICK_HELP='
  GIT_DIR='.git'
  GIT_EDITOR='sed -i -e "1ix set |grep ^GIT"'
  GIT_INTERNAL_GETTEXT_SH_SCHEME='gnu'
  GIT_PREFIX=''
  GIT_REFLOG_ACTION='rebase -i (start): checkout HEAD^'
  warning: notes ref refs/notes/commits is invalid
  Successfully rebased and updated refs/heads/master.

And then recently came 226c0ddd0d (exec_cmd: RUNTIME_PREFIX on some
POSIX systems, 2018-04-10), which then started to export GIT_EXEC_PATH
to 'exec' commands as well...  




Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Eric Sunshine
On Thu, Jul 19, 2018 at 5:27 PM Jeff King  wrote:
> On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:
> > On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> > > + * This header lists functions that have been banned from our code base,
> > > + * because they're too easy to misuse (and even if used correctly,
> > > + * complicate audits). Including this header turns them into compile-time
> > > + * errors.
> >
> > When the above talks about "including this header", the implication is
> > that it must be included _after_ the system header(s) which declare
> > the banned functions. I wonder if that requirement should be stated
> > here explicitly.
>
> Hmm, does it need to be? I had originally intended it to be included
> before, actually, though in the end I put it later.
> I guess it would yield declarations like strcpy_is_banned(), which would
> cause _different_ errors (probably link-time ones).

Yes, that's what I meant. You'd only get link-time errors if banned.h
was included before the system headers (assuming I'm thinking about
this correctly).

> > (Probably not worth a re-roll.)
>
> Yeah, I doubt it matters much either way, since the inclusion is done
> automatically in git-compat-util.h.

Exactly.

> I had also originally imagined this to be triggered via DEVELOPER=1,
> with something like "-include banned.h" in CFLAGS. But I think it
> probably is appropriate for everybody to run it, since it shouldn't
> cause any false positives or other compilation issues.

Agreed.

> The one I brainstormed (but forgot to mention) is that it might be
> possible for a platform to have strcpy as a macro already? In which case
> we'd need to #undef it or risk a compilation error (even if the macro
> isn't actually used).

I have some recollection (perhaps long outdated or just wrong) of
Microsoft headers spewing deprecation warnings about "unsafe"
functions. I don't know whether they did that by covering functions
with macros or by decorating the function with a deprecation attribute
or by some other mechanism, but such concern seems well-founded.
#undef'ing them might indeed be a very good preventative tactic.


Re: Gitk doesn't work on macOS Mojave

2018-07-19 Thread Eric Sunshine
[please reply inline rather than top-posting]

On Thu, Jul 19, 2018 at 5:11 PM Evgeny Cherpak  wrote:
> It seems this code placed at the end of the file, after getcommits() does the 
> trick:
>
> if {[tk windowingsystem] eq "aqua"} {
> set openscript [format {
> open -a \"$(ps -p %d -o comm=)\"
> } [pid] ]
> exec osascript -e [format {
> do shell script "%s"
> } "$openscript" ]
> }

Interesting idea. A shortcoming of this approach is that it's
non-deterministic. It will bring _any_ running gitk to the foreground,
rather than the one mentioned by PID, which would be a potentially
confusing regression. Such confusion _might_ be worse than simply not
foregrounding the application at all (though that's a matter of
opinion). Unfortunately, the 'open' command doesn't let you do so by
PID, and Googling shows only answers which predate Mojave (that is,
they all suggest Apple Events).

To avoid such a regression on older MacOS's, it would be nice to
retain the Apple Events code, but that would mean gitk would need to
do a version check or something to avoid executing the Apple Events
code on Mojave.

By the way, the same problem presumably afflicts "git gui", which has
identical code in it for foregrounding the application. Does that
match with your experience?


Re: Find commit that referenced a blob first

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:19:34PM -0700, Stefan Beller wrote:

> > I have a blob hash and I would like to know what commit referenced
> > this blob first in a given Git repo.
> 
> git describe 
> 
> If the given object refers to a blob, it will be described as
> :,
> such that the blob can be found at  in the , which itself
> describes the first commit in which this blob occurs in a reverse
> revision walk from HEAD.
> 
> Since
> 644eb60bd01 (builtin/describe.c: describe a blob, 2017-11-15)
> (included first in 2.16.0)

Hmm.

  $ git describe cfbc47ee2d
  fatal: No tags can describe '83adac3c57ad8cd2c8d44b525414b949950e316d'.
  Try --always, or create some tags.

  $ git describe --always cfbc47ee2d
  83adac3c57:checkout-cache.c

That first output confused me for a moment. We can't produce a nice
descriptive name for the commit in question, so we punt on the whole
thing.

Anyway. I have found your diff --find-object to be more useful in
practice. I.e.:

  git log --find-object=cfbc47ee2d

because I usually care less about a succinct name, and more about
digging into the history (so adding "-p", etc).
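For example, in a throwaway repository (file names and commit message made up for illustration):

```shell
# Create a toy repo, then ask which commit introduced a given blob.
git init -q find-object-demo
cd find-object-demo
echo hello >greeting.txt
git add greeting.txt
git -c user.name=demo -c user.email=demo@example.com \
	commit -qm "add greeting"
blob=$(git rev-parse HEAD:greeting.txt)
# Lists only commits whose diff adds or removes this exact blob.
git log --format=%s --find-object="$blob"
```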

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 2:32 PM Jeff King  wrote:
>
> On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:
>
> > >  Documentation/CodingGuidelines |  3 +++
> >
> > I'd prefer to not add more text to our documentation
> > (It is already long and in case you run into this problem
> > the error message is clear enough IMHO)
>
> I'm fine with that too. I just wondered if somebody would complain in
> the opposite way: your patch enforces this, but we never made it an
> "official" guideline.
>
> But that may be overly paranoid.  Once upon a time there was some rules
> lawyering around CodingGuidelines, but I think that was successfully
> shut down and hasn't reared its head for several years.

Heh; My preference would be to keep docs as short and concise as
possible (and lawyering sounds like "verbosity is a feature" :-P) but
still giving advice on controversial (i.e. not enforced by a machine in
a deterministic fashion) things such as having braces around one
statement for example or leaving them out.

> > Regarding the introduction of the functions to this list,
> > I would imagine people would find the commit that introduced
> > a function to the ban list to look for a reason why.
> > Can we include a link[1] to explain why we discourage
> > strcpy and sprintf in this commit?
>
> I hoped that it was mostly common knowledge, but that's probably not a
> good assumption. I agree if there's a good reference, we should link to
> it.

In this case I am just an advocate for a better history. Countless
times I wanted
to know *why* something is the way it is and mailing list and history are not
very good at that, as my specific question was not addressed there.

Either it was obvious to all people involved at the time or not written down
why that solution (which -- in hindsight -- sucks), was superior to the other
solution (which may or may not have been known at the time).

So maybe I would have rather asked, why we start out with these two
functions. And you seem to call them "obviously bad", and you take both
of them because they need to be handled differently due to the variadic macros.
(And those two are "obviously worse" than strcat, as they are used more often.
But strcat, being on MS ban list[1], would do just as well)

(I agree that) Any other function can be added on top with its own
commit message on *why* that function. Over time I would expect that the
reason is that we got bitten by it, or some other project got famously bitten
by it or that some smart people came up with a list[1]. The exception is
this one, which basically says: "Here is the mechanism how to do it and
$X is the obvious thing to put in first", which I agree with the mechanism
as well as that $X seems bad.

[1] https://msdn.microsoft.com/en-us/library/bb288454.aspx

Now that I am deep down the rathole of discussing a tangent, I just found
[2], when searching for "how much common knowledge is wrong", do you
know if there is such a list for (CS) security related things?

[2] http://www.marcandangel.com/2008/06/12/60-popular-pieces-of-false-knowledge/

>
> > [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
> >   This is the best I found. So not sure if it worth it.
>
> Yeah, this one is so-so, because it goes into a lot more detail. I think
> we can assume that people know that overflowing a buffer is bad. Maybe
> just a short paragraph like:
>
>   We'll include strcpy() and sprintf() in the initial list of banned
>   functions. While these _can_ be used carefully by surrounding them
>   with extra code, there's no inherent check that the size of the
>   destination buffer is big enough, it's very easy to overflow the
>   buffer.

Sounds good to me, maybe even add "We've had problems with that
in the past, see 'git log -S strcpy'", but that may be too much again.

Thanks,
Stefan


Re: Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider


> On Jul 19, 2018, at 11:19 PM, Stefan Beller  wrote:
> 
> On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider  
> wrote:
>> 
>> Hi,
>> 
>> I have a blob hash and I would like to know what commit referenced
>> this blob first in a given Git repo.
> 
> git describe 
> 
> If the given object refers to a blob, it will be described as
> :,
> such that the blob can be found at  in the , which itself
> describes the first commit in which this blob occurs in a reverse
> revision walk from HEAD.
> 
> Since
> 644eb60bd01 (builtin/describe.c: describe a blob, 2017-11-15)
> (included first in 2.16.0)
> 
> You're welcome,
> Stefan

Awesome! Thank you very much :-)

- Lars

Re: [PATCH 2/2] banned.h: mark strncpy as banned

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 11:12:49PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Thu, Jul 19 2018, Jeff King wrote:
> 
> > Since this use of strncpy was verified by manual inspection
> > and since it doesn't trigger the automated ban-list, we're
> > better off leaving it to keep our divergence from glibc
> > minimal.
> 
> FWIW it's s/glibc/gawk/. It's originally from glibc, but gawk
> perma-forked it long ago, and an ancient copy of that is the one we
> use.

Thanks, I didn't know that. Not materially different, but it's worth
correcting the commit message.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:

> >  Documentation/CodingGuidelines |  3 +++
> 
> I'd prefer to not add more text to our documentation
> (It is already long and in case you run into this problem
> the error message is clear enough IMHO)

I'm fine with that too. I just wondered if somebody would complain in
the opposite way: your patch enforces this, but we never made it an
"official" guideline.

But that may be overly paranoid.  Once upon a time there was some rules
lawyering around CodingGuidelines, but I think that was successfully
shut down and hasn't reared its head for several years.

> > +#define strcpy(x,y) BANNED(strcpy)
> > +
> > +#ifdef HAVE_VARIADIC_MACROS
> 
> In a split second I thought you forgot sprintf that was
> mentioned in the commit message, but then I kept on reading
> just to find it here. I wonder if we want put this #ifdef at the
> beginning of the file (after the guard) as then we can have
> a uncluttered list of banned functions here. The downside is that
> the use of strcpy would not be banned in case you have no
> variadic macros, but we'd still catch it quickly as most people
> have them. Undecided.

Right, exactly. We should catch what we can on lesser platforms, and
everything on modern ones. My hope is that people would not generally
have to touch this file. I don't think we'll be adding banned functions
often.

> Regarding the introduction of the functions to this list,
> I would imagine people would find the commit that introduced
> a function to the ban list to look for a reason why.
> Can we include a link[1] to explain why we discourage
> strcpy and sprintf in this commit?

I hoped that it was mostly common knowledge, but that's probably not a
good assumption. I agree if there's a good reference, we should link to
it.

> [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
>   This is the best I found. So not sure if it worth it.

Yeah, this one is so-so, because it goes into a lot more detail. I think
we can assume that people know that overflowing a buffer is bad. Maybe
just a short paragraph like:

  We'll include strcpy() and sprintf() in the initial list of banned
  functions. While these _can_ be used carefully by surrounding them
  with extra code, there's no inherent check that the size of the
  destination buffer is big enough, it's very easy to overflow the
  buffer.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:

> On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> > [...]
> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> > [...]
> > Signed-off-by: Jeff King 
> > ---
> > diff --git a/banned.h b/banned.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * This header lists functions that have been banned from our code base,
> > + * because they're too easy to misuse (and even if used correctly,
> > + * complicate audits). Including this header turns them into compile-time
> > + * errors.
> > + */
> 
> When the above talks about "including this header", the implication is
> that it must be included _after_ the system header(s) which declare
> the banned functions. I wonder if that requirement should be stated
> here explicitly.

Hmm, does it need to be? I had originally intended it to be included
before, actually, though in the end I put it later.

I guess it would yield declarations like strcpy_is_banned(), which would
cause _different_ errors (probably link-time ones).

> (Probably not worth a re-roll.)

Yeah, I doubt it matters much either way, since the inclusion is done
automatically in git-compat-util.h.

I had also originally imagined this to be triggered via DEVELOPER=1,
with something like "-include banned.h" in CFLAGS. But I think it
probably is appropriate for everybody to run it, since it shouldn't
cause any false positives or other compilation issues.

The one I brainstormed (but forgot to mention) is that it might be
possible for a platform to have strcpy as a macro already? In which case
we'd need to #undef it or risk a compilation error (even if the macro
isn't actually used).

-Peff


Re: Find commit that referenced a blob first

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider  wrote:
>
> Hi,
>
> I have a blob hash and I would like to know what commit referenced
> this blob first in a given Git repo.

git describe 

If the given object refers to a blob, it will be described as
:,
such that the blob can be found at  in the , which itself
describes the first commit in which this blob occurs in a reverse
revision walk from HEAD.

Since
644eb60bd01 (builtin/describe.c: describe a blob, 2017-11-15)
(included first in 2.16.0)

You're welcome,
Stefan


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 1:39 PM Jeff King  wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.

Thanks for writing these patches. I would think this is a
good idea to have as I certainly would use banned functions
if not told by the compiler not to.

>  Documentation/CodingGuidelines |  3 +++

I'd prefer to not add more text to our documentation
(It is already long and in case you run into this problem
the error message is clear enough IMHO)


> @@ -0,0 +1,19 @@
> +#ifndef BANNED_H
> +#define BANNED_H
> +
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */
> +
> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> +
> +#ifdef HAVE_VARIADIC_MACROS

In a split second I thought you forgot sprintf that was
mentioned in the commit message, but then I kept on reading
just to find it here. I wonder if we want put this #ifdef at the
beginning of the file (after the guard) as then we can have
a uncluttered list of banned functions here. The downside is that
the use of strcpy would not be banned in case you have no
variadic macros, but we'd still catch it quickly as most people
have them. Undecided.

Regarding the introduction of the functions to this list,
I would imagine people would find the commit that introduced
a function to the ban list to look for a reason why.
Can we include a link[1] to explain why we discourage
strcpy and sprintf in this commit?


[1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
  This is the best I found. So not sure if it worth it.

Thanks,
Stefan


Re: [PATCH 2/2] banned.h: mark strncpy as banned

2018-07-19 Thread Ævar Arnfjörð Bjarmason


On Thu, Jul 19 2018, Jeff King wrote:

> Since this use of strncpy was verified by manual inspection
> and since it doesn't trigger the automated ban-list, we're
> better off leaving it to keep our divergence from glibc
> minimal.

FWIW it's s/glibc/gawk/. It's originally from glibc, but gawk
perma-forked it long ago, and an ancient copy of that is the one we
use.


Re: Gitk doesn't work on macOS Mojave

2018-07-19 Thread Evgeny Cherpak
It seems this code placed at the end of the file, after getcommits() does the 
trick:

if {[tk windowingsystem] eq "aqua"} { 
set openscript [format { 
open -a \"$(ps -p %d -o comm=)\"
} [pid] ]
exec osascript -e [format {
do shell script "%s"
} "$openscript" ]
}


> On 19 Jul 2018, at 22:12, Eric Sunshine  wrote:
> 
> On Thu, Jul 19, 2018 at 2:48 PM Evgeny Cherpak  wrote:
>> You have probably heard this by now already, but gitk doesn’t work on macOS 
>> 10.14 - because it uses Apple Events,
>> And apps on 10.14 require user to give them permissions to control other 
>> apps with Apple Events.
> 
> This hasn't been reported, so thanks for bringing it up.
> 
>> Here is what I get when I try running it on my machine with beta 4 installed:
>> 
>> Error in startup script: 58:102: execution error: Not authorized to send 
>> Apple events to System Events. (-1743)
>>while executing
>> "exec osascript -e [format {
>>tell application "System Events"
>>set frontmost of processes whose unix id is %d to true
>>end te..."
> 
> Fortunately, this feature is merely a convenience, not otherwise
> critical to gitk functioning. It would be ideal if someone running
> Mojave could devise up a patch to work around the problem (either by
> skipping this code on Mojave or discovering a different way to bring
> the application to the foreground). An alternative would be to revert
> 76bf6ff93e (gitk: On OSX, bring the gitk window to front, 2013-04-24),
> which introduced this code.
> 
> (Note, however, that the gitk project is dormant, so it's not clear if
> such a patch will be picked up.)



Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Eric Sunshine
On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> [...]
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.
> [...]
> Signed-off-by: Jeff King 
> ---
> diff --git a/banned.h b/banned.h
> @@ -0,0 +1,19 @@
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */

When the above talks about "including this header", the implication is
that it must be included _after_ the system header(s) which declare
the banned functions. I wonder if that requirement should be stated
here explicitly.

(Probably not worth a re-roll.)

> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> diff --git a/git-compat-util.h b/git-compat-util.h
> @@ -1239,4 +1239,6 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>
> +#include "banned.h"


hello

2018-07-19 Thread patiencebenkirane
my name is Mrs. patience benkirane I am a canadian, a Senior NATO
Officer and also a conventional Military Officer deployed by the UN to
Syria , and I have been here in Syria for 5years and i am a single
mother i have one child he is disturbed me to resign from Military
work , I got your contact from universal search network on my quest
in search of good business partner to assist me in starting up or
investing in a well-established business in your country. I have
little or no idea in business and so, I want your advice on this
issue I mentioned above. Finally, I will give you more details on it
after your response.


Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider
Hi,

I have a blob hash and I would like to know what commit referenced 
this blob first in a given Git repo.

I could iterate through all commits sorted by date (or generation 
number) and then recursively search in the referenced trees until 
I find my blob. I wonder, is this the most efficient way to solve 
my problem? Is there some trick/existing Git command that would 
perform this search more efficiently for large repos?

Thank you,
Lars


Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

2018-07-19 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Tue, 17 Jul 2018, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > I'm OK with having a partial fix, or one that fixes immediate pain
>> > without doing a big cleanup, as long as it doesn't make anything _worse_
>> > in a user-visible way. And that's really my question: is pruning here
>> > going to bite people unexpectedly (not rhetorical -- I really don't
>> > know)?
>> 
>> Yeah, that matches the general guideline I follow when reviewing a
>> patch that claims to make existing things better.  And I do not
>> think I can explain to a third person why pruning here is a good
>> idea and won't cause problems, after seeing these patches and
>> the discussion from the sideline.
>
> It is very easy to explain: `git repack` can drop unreachable commits
> without further warning, making the corresponding entries in
> `.git/shallow` invalid, which causes serious problems when deepening the
> branches.

That explains why you wrote the patch very clearly.

> The solution is easy: drop also the now-invalid entries in `.git/shallow`
> after dropping unreachable commits unceremoniously.

Sorry, but I do not think I can relay that as an explanation why it
won't cause problems to a third person.

The entries in shallow file says that history behind them may not
exist in the repository due to its shallowness but history after
them are supposed to be traversable (otherwise we have a repository
corruption).  It is true that an entry that itself no longer exists
in this repository should not be in shallow file, as the presence of
that entry breaks that promise the file is making---that commit
ought to exist and it is safe to traverse down to it, so keeping the
entry in the file is absolutely a wrong thing to do.

But that does not automatically mean that just simply removing it
makes the resulting repository good, does it?  Wouldn't the solution
for that corruption be to set a new entry to stop history traversal
before reaching that (now-missing) commit?  If your solution and
explanation were like that, then I can understand why it won't cause
problems, because the resulting repository would be shallower than
it originally was, as if you cloned with a smaller depth, but it is
not corrupt.

Perhaps your rationale is that by trading one shape of corrupt
repository (where a commit that does not even exist is in the
shallow file, breaking the early start-up sequence when it tries to
read the entries) for another shape of corrupt repository (where
shallow does not completely tell where to stop, and traversing the
history can eventually hit a missing commit because no entry in the
shallow file stops such a traversal), at least the deepening fetch
can start (instead of dying while trying to see how shallow the
repository currently is), and that can be used to recover the corrupt
repository back into a usable state?  If that is the justification,
I can fully buy it, but that is not what I am hearing in your
easy-to-explain answer, so I am still puzzled.



[PATCH 2/2] banned.h: mark strncpy as banned

2018-07-19 Thread Jeff King
The strncpy() function is less horrible than strcpy(). But
it's still pretty easy to misuse because of its funny
termination semantics. And we already have a ready-made
alternative in strlcpy. So let's ban it, to make sure uses
don't creep in.

Note that there is one instance of strncpy in
compat/regex/regcomp.c. But this doesn't trigger the
ban-list even when compiling with NO_REGEX=1, because we
don't use git-compat-util.h when compiling it (instead we
rely on the system includes from the upstream library).

Since this use of strncpy was verified by manual inspection
and since it doesn't trigger the automated ban-list, we're
better off leaving it to keep our divergence from glibc
minimal.

Signed-off-by: Jeff King 
---
 banned.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/banned.h b/banned.h
index fe81020e0f..ae64a9 100644
--- a/banned.h
+++ b/banned.h
@@ -11,6 +11,7 @@
 #define BANNED(func) sorry_##func##_is_a_banned_function()
 
 #define strcpy(x,y) BANNED(strcpy)
+#define strncpy(x,y,n) BANNED(strncpy)
 
 #ifdef HAVE_VARIADIC_MACROS
 #define sprintf(...) BANNED(sprintf)
-- 
2.18.0.540.g6c38643a7b


[PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
There are a few standard C functions (like strcpy) which are
easy to misuse. We generally discourage these in reviews,
but we haven't put advice in CodingGuidelines, nor provided
any automated enforcement. The latter is especially
important because it's more consistent, and it can often
save a round-trip of review.

Let's start by banning strcpy() and sprintf(). It's not
impossible to use these correctly, but it's easy to do so
incorrectly, and there's always a better option.

For enforcement, we can rely on the compiler and just
introduce code which breaks compilation when it sees these
functions. This has a few advantages:

  1. We know it's run as part of a build cycle, so it's
 hard to ignore. Whereas an external linter is an extra
 step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
 parsing the code anyway.

  3. We know it's robust against false positives (unlike a
 grep-based linter).

The one big disadvantage is that it will only check code
that is actually compiled, so it may miss code that isn't
triggered on your particular system. But since presumably
people don't add new code without compiling it (and if they
do, the banned function list is the least of their worries),
we really only care about failing to clean up old code when
adding new functions to the list. And that's easy enough to
address with a manual audit when adding a new function
(which is what I did for the functions here).

That leaves only the question of how to trigger the
compilation error. The goals are:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The technique used here is to convert the banned function
into a call to a descriptive but non-existent function. The
output from gcc 7 looks like this (on a checkout without
022d2ac1f3, which removes the final strcpy from blame.c):

  CC builtin/blame.o
  In file included from ./git-compat-util.h:1242:0,
   from ./cache.h:4,
   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: implicit declaration of function 
‘sorry_strcpy_is_a_banned_function’ [-Werror=implicit-function-declaration]
   #define BANNED(func) sorry_##func##_is_a_banned_function()
^
  ./banned.h:13:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
   ^~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
  strcpy(repeated_meta_color, GIT_COLOR_CYAN);
  ^~

This pretty clearly shows the original callsite along with
the descriptive function name, and it mentions banned.h,
which contains more information.

What's not perfectly ideal is:

  1. We'll only trigger with -Wimplicit-function-declaration
 (and only stop compilation with -Werror). These are
 generally enabled by DEVELOPER=1. If you _don't_ have
 that set, we'll still catch the problem, but only at
 link-time, with a slightly less useful message:

   /usr/bin/ld: builtin/blame.o: in function `cmd_blame':
   /home/peff/tmp/builtin/blame.c:1074: undefined reference to 
`sorry_strcpy_is_a_banned_function'
   collect2: error: ld returned 1 exit status

 If instead we convert this to a reference to an
 undefined variable, that always dies immediately. But
 gcc seems to print the set of errors twice, which
 clutters things up.

  2. The expansion through BANNED() adds an extra layer of
 error. Curiously, though, removing it (and just
 expanding strcpy directly to the bogus function name)
 causes gcc _not_ to report the original line of code.

So experimentally, this seems to behave pretty well on gcc.
Clang looks OK, too, though:

 - it actually produces a better message for the undeclared
   identifier than for the implicit function (it doesn't
   double the errors, and for the implicit function it
   actually produces an extra complaint about
   strict-prototypes).

 - the BANNED indirection has no effect (it shows the
   original line of code either way)

So if we want to optimize for clang's output, we could
switch both of those decisions.

Signed-off-by: Jeff King 
---
So you might guess that I experimented mostly with gcc
(which is what I normally use), and then just double-checked
clang at the end. After seeing those results, I actually
think clang does a better job of producing good error
messages overall, and perhaps we should not cater to gcc. I
dunno. It probably doesn't matter that much either way.

I'd also 

[PATCH 0/2] fail compilation with strcpy

2018-07-19 Thread Jeff King
This is a patch series to address the discussion in the thread at:

  https://public-inbox.org/git/20180713204350.ga16...@sigill.intra.peff.net/

Basically, the question was: can we declare strcpy banned and have a
linter save us the trouble of finding it in review. The answer is yes,
the compiler is good at that. ;)

There are probably as many lists of banned functions as there are coding
style documents. I don't agree with every entry in the ones I've seen.
And in many cases coccinelle is a better choice, because the problem is
not "this function is so bad your patch should not even make it to the
list with it", but "don't do it like A; we prefer to do it like B
instead". And coccinelle does the latter more flexibly and
automatically.

So I tried to pick some obvious and uncontroversial candidates here.
gets() could be another one, but it's mostly banned already (it's out of
the standard, and most libcs mark it with a deprecated attribute).

Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)

  [1/2]: introduce "banned function" list
  [2/2]: banned.h: mark strncpy as banned

 Documentation/CodingGuidelines |  3 +++
 banned.h   | 20 
 git-compat-util.h  |  2 ++
 3 files changed, 25 insertions(+)
 create mode 100644 banned.h

-Peff


Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-19 Thread Johannes Schindelin
Hi Junio,

On Mon, 16 Jul 2018, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > None of which is too surprising. The root of the bug is in the
> > conversion to rebase--helper, I think, when presumably we started
> > setting GIT_DIR at all (but I didn't dig further). Then 09d7b6c6fa fixed
> > _one_ fallout of that, which was relative paths, but didn't help the
> > subdirectory case.
> >
> > Just reading over this thread, I suspect the simplest fix is to pass
> > GIT_DIR and GIT_WORK_TREE together, which is almost always the right
> > thing to do.
> 
> Perhaps.  Not exporting GIT_DIR (unless the end-user already did to
> the environment before starting "git rebase"---it would be a bad
> change to unexport it unconditionally) may probably be a way to make
> rebase--helper conversion more faithful to the original scripted
> Porcelain, but I suspect in practice always giving GIT_DIR and
> GIT_WORK_TREE would work well for many existing hooks.

Forgetting the code in git-sh-setup, are we?

git_dir_init() rather specifically set GIT_DIR to the absolute path, and
since that variable is already exported, the `exec` commands launched via
`git-rebase--interactive` all saw it.

That is the reason why sequencer.c was taught to set GIT_DIR to an
absolute path rather than not setting it: for backwards compatibility.

Ciao,
Dscho


Re: [PATCH v2 03/23] archive-zip.c: mark more strings for translation

2018-07-19 Thread Junio C Hamano
Duy Nguyen  writes:

> It turns out gettext is smart! I got this in git.pot
>
> msgid "timestamp too large for this system: %<PRItime>"

OK.


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Junio C Hamano
Junio C Hamano  writes:

> Imagine that you create a delta and set its size (which happens to
> fit in the entry) and then create another whose size does not fit in
> the entry.  How does oe_delta_size() know not to look at
> pack->delta_size[] and instead look at e->delta_size_ when it wants
> to know the delta for the first one?  IOW, shouldn't there be a
> "backfill" procedure when oe_set_delta_size() notices that we now
> switch to pack->delta_size[] array?

Ah, ignore this.  That is what happens in the prepare_ thing.



Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Junio C Hamano
Duy Nguyen  writes:

> @@ -330,20 +330,27 @@ static inline void oe_set_size(struct packing_data 
> *pack,
>  static inline unsigned long oe_delta_size(struct packing_data *pack,
> const struct object_entry *e)
>  {
> - if (e->delta_size_valid)
> + if (!pack->delta_size)
>   return e->delta_size_;
> - return oe_size(pack, e);
> + return pack->delta_size[e - pack->objects];
>  }
>  
> +void oe_prepare_delta_size_array(struct packing_data *pack);
>  static inline void oe_set_delta_size(struct packing_data *pack,
>struct object_entry *e,
>unsigned long size)
>  {
>   e->delta_size_ = size;
> - e->delta_size_valid = e->delta_size_ == size;
> - if (!e->delta_size_valid && size != oe_size(pack, e))
> - BUG("this can only happen in check_object() "
> - "where delta size is the same as entry size");
> + if (!pack->delta_size && e->delta_size_ == size)
> + return;
> + /*
> +  * We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
> +  * limit. delta_size_ will not be used anymore. All delta sizes are now
> +  * from the delta_size[] array.
> +  */
> + if (!pack->delta_size)
> + oe_prepare_delta_size_array(pack);
> + pack->delta_size[e - pack->objects] = size;
>  }

Imagine that you create a delta and set its size (which happens to
fit in the entry) and then create another whose size does not fit in
the entry.  How does oe_delta_size() know not to look at
pack->delta_size[] and instead look at e->delta_size_ when it wants
to know the delta for the first one?  IOW, shouldn't there be a
"backfill" procedure when oe_set_delta_size() notices that we now
switch to pack->delta_size[] array?



Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 08:24:42PM +0200, Duy Nguyen wrote:

> > > Looking at that output, my _guess_ is that we somehow end up with a
> > > bogus delta_size value and write out a truncated entry. But I couldn't
> > > reproduce the issue with smaller test cases.
> > 
> > Could it be a race condition?
> 
> I'm convinced my code is racy (between two writes). I created a broken
> pack once with 32 threads. Elijah please try again with this new
> patch. It should fix this (I only tried repack a few times so far but
> will continue)

Good thinking, it's definitely racy. And that's why my tiny reproduction
didn't work. I even tried bumping it up to 10 blobs instead of 2, but
that's not nearly enough.

> The race is this
> 
> 1. Thread one sees a large delta size and NULL delta_size[] array,
>allocates the new array and in the middle of copying old delta
>sizes over.
> 
> 2. Thread two wants to write a new (large) delta size. It sees that
>delta_size[] is already allocated, it writes the correct size there
>(and truncated one in object_entry->delta_size_)
> 
> 3. Back to thread one, it now copies the truncated value in
>delta_size_ from step 2 to delta_size[] array, overwriting the good
>value that thread two wrote.

Right. Or we could even allocate two delta_size arrays, since the
NULL-check and the allocation are not atomic.

> There is also a potential read/write race where a read from
> pack_size[] happens when the array is not ready. But I don't think it
> can happen with current try_delta() code. I protect it anyway to be
> safe.

Hrm. That one's disappointing, because we read much more often than we
write, and this introduces potential lock contention. It may not matter
much in practice, though.

> +static unsigned long oe_delta_size(struct packing_data *pack,
> +const struct object_entry *e)
> +{
> + unsigned long size;
> +
> + read_lock(); /* to protect access to pack->delta_size[] */
> + if (pack->delta_size)
> + size = pack->delta_size[e - pack->objects];
> + else
> + size = e->delta_size_;
> + read_unlock();
> + return size;
> +}

Yuck, we even have to pay the read_lock() cost when we don't overflow
into the pack->delta_size array (but I agree we have to for
correctness).

Again, though, this amount of contention probably doesn't make a big
difference, since we're holding the lock for such a short time
(especially compared to all the work of computing the deltas).

This could be separate from the read_lock(), though, since that one does
block for much longer (e.g., while zlib inflating objects from disk).

> +static void oe_set_delta_size(struct packing_data *pack,
> +   struct object_entry *e,
> +   unsigned long size)
> +{
> + read_lock(); /* to protect access to pack->delta_size[] */
> + if (!pack->delta_size && size < pack->oe_delta_size_limit) {
> + e->delta_size_ = size;
> + read_unlock();
> + return;
> + }

And ditto for this one. I thought we could get away with the "fast case"
skipping the lock, but we have to check pack->delta_size atomically to
even use it.

If each individual delta_size had an overflow bit that indicates "use me
literally" or "look me up in the array", then I think the "quick" ones
could avoid locking. It may not be worth the complexity though.

> @@ -160,6 +162,8 @@ struct object_entry *packlist_alloc(struct packing_data 
> *pdata,
>  
>   if (!pdata->in_pack_by_idx)
>   REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
> + if (pdata->delta_size)
> + REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
>   }
>  

This realloc needs to happen under the lock, too, I think. It would be
OK without locking for an in-place realloc, but if the chunk has to be
moved, somebody in oe_set_delta_size() might write to the old memory.

This is in a file that doesn't even know about read_lock(), of course.
Probably you need a delta mutex as part of the "struct packing_data",
and then it can just be handled inside pack-objects.c entirely.

-Peff



Re: Gitk doesn't work on macOS Mojave

2018-07-19 Thread Eric Sunshine
On Thu, Jul 19, 2018 at 2:48 PM Evgeny Cherpak  wrote:
> You have probably heard this by now already, but gitk doesn’t work on macOS 
> 10.14 - because it uses Apple Events,
> And apps on 10.14 require user to give them permissions to control other apps 
> with Apple Events.

This hasn't been reported, so thanks for bringing it up.

> Here is what I get when I try running it on my machine with beta 4 installed:
>
> Error in startup script: 58:102: execution error: Not authorized to send 
> Apple events to System Events. (-1743)
> while executing
> "exec osascript -e [format {
> tell application "System Events"
> set frontmost of processes whose unix id is %d to true
> end te..."

Fortunately, this feature is merely a convenience, not otherwise
critical to gitk functioning. It would be ideal if someone running
Mojave could devise a patch to work around the problem (either by
skipping this code on Mojave or discovering a different way to bring
the application to the foreground). An alternative would be to revert
76bf6ff93e (gitk: On OSX, bring the gitk window to front, 2013-04-24),
which introduced this code.

(Note, however, that the gitk project is dormant, so it's not clear if
such a patch will be picked up.)


Re: [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue

2018-07-19 Thread Junio C Hamano
Ben Peart  writes:

> +/*
> + * struct mpmcq_entry is an opaque structure representing an entry in the
> + * queue.
> + */
> +struct mpmcq_entry {
> + struct mpmcq_entry *next;
> +};
> +
> +/*
> + * struct mpmcq is the concurrent queue structure. Members should not be
> + * modified directly.
> + */
> +struct mpmcq {
> + struct mpmcq_entry *head;
> + pthread_mutex_t mutex;
> + pthread_cond_t condition;
> + int cancel;
> +};

This calls itself a queue, but a new element goes to the beginning
of a singly linked list, and the only way to take an element out is
from near the beginning of the linked list, so it looks more like a
LIFO stack to me.

I do not know how much it matters, as the name mpmcq is totally
opaque to readers so perhaps readers are not even aware of various
aspects of the service, e.g. how it works, what fairness it gives to
the calling code, etc.



Re: [PATCHv6 00/10] Reroll of sb/diff-color-move-more

2018-07-19 Thread Junio C Hamano
Stefan Beller  writes:

> v6:
> * fixed issues hinted at by Andrei, thanks! (range-diff below)
> * incorporates the new config option, sent separately previously.

Let's declare victory and move to incremental by merging to 'next',
if nobody spots anything more serious than just a missing SP around
comparison operator.


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Basin Ilya
Wow, thanks.

For me it was enough to configure just one rewrite, because my public github 
account is associated with my default key. Note that I added the missing slash 
and the username:

git config --global \
  url.git@gh-org:theorganization/.insteadOf \
  git@github.com:theorganization/



19.07.2018 19:42, Jeff King пишет:
> On Thu, Jul 19, 2018 at 03:24:54PM +0300, Basin Ilya wrote:
> 
>> I have two github accounts, one is for my organization and I want git
>> to automatically choose the correct ssh `IdentityFile` based on the
>> clone URL:
>>
>> git@github.com:other/publicrepo.git
>>~/.ssh/id_rsa
>> git@github.com:theorganization/privaterepo.git
>>~/.ssh/id_rsa.theorganization
>>
>> Unfortunately, both URLs have same host name, therefore I can't
>> configure this in the ssh client config. I could create a host alias
>> there, but sometimes somebody else gives me the github URL and I want
>> it to work out of the box.
> 
> I think you can hack around this using Git's URL rewriting.
> 
> For example, try this:
> 
>   git config --global \
> url.gh-other:other/.insteadOf \
> git@github.com:other/
> 
>   git config --global \
> url.gh-org:theorganization.insteadOf \
> git@github.com:theorganization/
> 
> And then:
> 
>   git clone git@github.com:other/publicrepo.git
> 
> will hit gh-other, which you can configure using an ssh host alias.
> 
> -Peff
> 


[PATCH 1/3] xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff

2018-07-19 Thread Stefan Beller
By passing the 'xpp' and 'env' argument directly to the function
'fall_back_to_classic_diff', we eliminate an occurrence of the 'index'
in histogram_diff, which will prove useful in a bit.

While at it, move it up in the file. This will make the diff of
one of the next patches more legible.

Signed-off-by: Stefan Beller 
---
 xdiff/xhistogram.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 73210cb6f3f..6e20f75fe85 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -233,6 +233,16 @@ static int try_lcs(struct histindex *index, struct region 
*lcs, int b_ptr,
return b_next;
 }
 
+static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
+   int line1, int count1, int line2, int count2)
+{
+   xpparam_t xpparam;
+   xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
+
+   return xdl_fall_back_diff(env, &xpparam,
+ line1, count1, line2, count2);
+}
+
 static int find_lcs(struct histindex *index, struct region *lcs,
int line1, int count1, int line2, int count2) {
int b_ptr;
@@ -248,16 +258,6 @@ static int find_lcs(struct histindex *index, struct region 
*lcs,
return index->has_common && index->max_chain_length < index->cnt;
 }
 
-static int fall_back_to_classic_diff(struct histindex *index,
-   int line1, int count1, int line2, int count2)
-{
-   xpparam_t xpp;
-   xpp.flags = index->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
-
-   return xdl_fall_back_diff(index->env, &xpp,
- line1, count1, line2, count2);
-}
-
 static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
int line1, int count1, int line2, int count2)
 {
@@ -320,7 +320,7 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
 
memset(&lcs, 0, sizeof(lcs));
if (find_lcs(&index, &lcs, line1, count1, line2, count2))
-   result = fall_back_to_classic_diff(&index, line1, count1, 
line2, count2);
+   result = fall_back_to_classic_diff(xpp, env, line1, count1, 
line2, count2);
else {
if (lcs.begin1 == 0 && lcs.begin2 == 0) {
while (count1--)
-- 
2.18.0.233.g985f88cf7e-goog



[PATCH 0/3] histogram diff: Fix memory ballooning in corner case

2018-07-19 Thread Stefan Beller
Currently if you run

seq 1   10 >one
seq 1 4 10 >two
git diff --no-index --histogram one two

you might run into memory issues. After applying the last patch of this series
you only have to wait a bit (as it doesn't fix run time complexity), but
computes the result with less memory pressure.

Thanks,
Stefan

Stefan Beller (3):
  xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff
  xdiff/xhistogram: factor out memory cleanup into free_index()
  xdiff/xhistogram: move index allocation into find_lcs

 xdiff/xhistogram.c | 117 +
 1 file changed, 66 insertions(+), 51 deletions(-)

-- 
2.18.0.233.g985f88cf7e-goog



[PATCH 2/3] xdiff/xhistogram: factor out memory cleanup into free_index()

2018-07-19 Thread Stefan Beller
This will be useful in the next patch as we'll introduce multiple
callers.

Signed-off-by: Stefan Beller 
---
 xdiff/xhistogram.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 6e20f75fe85..5098b6c5021 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -243,6 +243,14 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, 
xdfenv_t *env,
  line1, count1, line2, count2);
 }
 
+static inline void free_index(struct histindex *index)
+{
+   xdl_free(index->records);
+   xdl_free(index->line_map);
+   xdl_free(index->next_ptrs);
+   xdl_cha_free(&index->rcha);
+}
+
 static int find_lcs(struct histindex *index, struct region *lcs,
int line1, int count1, int line2, int count2) {
int b_ptr;
@@ -343,10 +351,7 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
}
 
 cleanup:
-   xdl_free(index.records);
-   xdl_free(index.line_map);
-   xdl_free(index.next_ptrs);
-   xdl_cha_free(&index.rcha);
+   free_index(&index);
 
return result;
 }
-- 
2.18.0.233.g985f88cf7e-goog



[PATCH 3/3] xdiff/xhistogram: move index allocation into find_lcs

2018-07-19 Thread Stefan Beller
This fixes a memory issue when recursing a lot, which can be reproduced as

seq 1   10 >one
seq 1 4 10 >two
git diff --no-index --histogram one two

Before this patch, histogram_diff would call itself recursively before
calling free_index, which would mean a lot of memory is allocated during
the recursion and only freed afterwards. By moving the memory allocation
(and its free call) into find_lcs, the memory is free'd before we recurse,
such that memory is reused in the next step of the recursion instead of
using new memory.

This addresses only the memory pressure, not the run time complexity,
that is also awful for the corner case outlined above.

Helpful in understanding the code (in addition to the sparse history of
this file), was https://stackoverflow.com/a/32367597 which reproduces
most of the code comments of the JGit implementation.

Signed-off-by: Stefan Beller 
---
 xdiff/xhistogram.c | 96 +-
 1 file changed, 53 insertions(+), 43 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 5098b6c5021..fc2d3cd95d9 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -251,44 +251,13 @@ static inline void free_index(struct histindex *index)
xdl_cha_free(&index->rcha);
 }
 
-static int find_lcs(struct histindex *index, struct region *lcs,
-   int line1, int count1, int line2, int count2) {
-   int b_ptr;
-
-   if (scanA(index, line1, count1))
-   return -1;
-
-   index->cnt = index->max_chain_length + 1;
-
-   for (b_ptr = line2; b_ptr <= LINE_END(2); )
-   b_ptr = try_lcs(index, lcs, b_ptr, line1, count1, line2, 
count2);
-
-   return index->has_common && index->max_chain_length < index->cnt;
-}
-
-static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
-   int line1, int count1, int line2, int count2)
+static int find_lcs(xpparam_t const *xpp, xdfenv_t *env,
+   struct region *lcs,
+   int line1, int count1, int line2, int count2)
 {
+   int b_ptr;
+   int sz, ret = -1;
struct histindex index;
-   struct region lcs;
-   int sz;
-   int result = -1;
-
-   if (count1 <= 0 && count2 <= 0)
-   return 0;
-
-   if (LINE_END(1) >= MAX_PTR)
-   return -1;
-
-   if (!count1) {
-   while(count2--)
-   env->xdf2.rchg[line2++ - 1] = 1;
-   return 0;
-   } else if (!count2) {
-   while(count1--)
-   env->xdf1.rchg[line1++ - 1] = 1;
-   return 0;
-   }
 
memset(&index, 0, sizeof(index));
 
@@ -326,8 +295,52 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
index.ptr_shift = line1;
index.max_chain_length = 64;
 
+   if (scanA(&index, line1, count1))
+   goto cleanup;
+
+   index.cnt = index.max_chain_length + 1;
+
+   for (b_ptr = line2; b_ptr <= LINE_END(2); )
+   b_ptr = try_lcs(&index, lcs, b_ptr, line1, count1, line2, 
count2);
+
+   if (index.has_common && index.max_chain_length < index.cnt)
+   ret = 1;
+   else
+   ret = 0;
+
+cleanup:
+   free_index(&index);
+   return ret;
+}
+
+static int histogram_diff(xpparam_t const *xpp, xdfenv_t *env,
+   int line1, int count1, int line2, int count2)
+{
+   struct region lcs;
+   int lcs_found;
+   int result = -1;
+
+   if (count1 <= 0 && count2 <= 0)
+   return 0;
+
+   if (LINE_END(1) >= MAX_PTR)
+   return -1;
+
+   if (!count1) {
+   while(count2--)
+   env->xdf2.rchg[line2++ - 1] = 1;
+   return 0;
+   } else if (!count2) {
+   while(count1--)
+   env->xdf1.rchg[line1++ - 1] = 1;
+   return 0;
+   }
+
memset(&lcs, 0, sizeof(lcs));
-   if (find_lcs(&index, &lcs, line1, count1, line2, count2))
+   lcs_found = find_lcs(xpp, env, &lcs, line1, count1, line2, count2);
+   if (lcs_found < 0)
+   goto out;
+   else if (lcs_found)
result = fall_back_to_classic_diff(xpp, env, line1, count1, 
line2, count2);
else {
if (lcs.begin1 == 0 && lcs.begin2 == 0) {
@@ -341,18 +354,15 @@ static int histogram_diff(xpparam_t const *xpp, xdfenv_t 
*env,
line1, lcs.begin1 - line1,
line2, lcs.begin2 - line2);
if (result)
-   goto cleanup;
+   goto out;
result = histogram_diff(xpp, env,
lcs.end1 + 1, LINE_END(1) - 
lcs.end1,
lcs.end2 + 1, LINE_END(2) - 
lcs.end2);
if (result)
-   goto cleanup;
+  

Re: [PATCH v2 10/23] connect.c: mark more strings for translation

2018-07-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

>   if (flags & CONNECT_VERBOSE) {
> - fprintf_ln(stderr, "done.");
> - fprintf(stderr, "Connecting to %s (port %s) ... ", host, port);
> + /* TRANSLATORS: this is the end of "Looking up %s ... " */
> + fprintf_ln(stderr, _("done."));
> + fprintf(stderr, _("Connecting to %s (port %s) ... "), host, 
> port);

I guess I misread the intention of what 01/23 did to this area and
misjudged its benefit when I said "we can share the same translation
for the same 'done.'"?

If you need to give guidance to translators to translate the same
"done." differently depending on what comes around it, perhaps we
shouldn't have split the original single message line into two.  In
general, use of more fprintf_ln() in 01/23 smelled more like done
for the sake of using them more, than improving the code.

Yes, I know 9a0a30aa ("strbuf: convenience format functions with \n
automatically appended", 2012-04-23) claims that it is convenient
when "we do not want to expose \n to translators", but does not
justify why "not exposing \n" is a good idea in the first place, and
splitting fprintf(stderr, "done.\nConnecting to...") into two like
the above looks like it made more work for us without clear benefit.

So I dunno.


Re: [PATCH v2 03/23] archive-zip.c: mark more strings for translation

2018-07-19 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 8:26 PM Junio C Hamano  wrote:
> > @@ -601,7 +601,7 @@ static void dos_time(timestamp_t *timestamp, int 
> > *dos_date, int *dos_time)
> >   struct tm *t;
> >
> >   if (date_overflows(*timestamp))
> > - die("timestamp too large for this system: %"PRItime,
> > + die(_("timestamp too large for this system: %"PRItime),
> >   *timestamp);
>
> I suspect that this won't do the right "cross-platform" thing.  For
> built-in PRItype formats gettext tool knows about, *.po files let
> translators translate original with "%<PRIuMAX>" into localized text
> with the same "%<PRIuMAX>" left, and the runtime does the right
> thing, but for a custom format like PRItime there is no such
> support.
>

It turns out gettext is smart! I got this in git.pot

msgid "timestamp too large for this system: %<PRItime>"

Your second point, about the runtime, could indeed cause cross-platform
problems, but luckily PRItime is always defined as PRIuMAX on all
platforms, so gettext should be able to do the right thing.

Besides, PRItime has already been wrapped in _() and Q_() in many
places. If people have not screamed now, they probably don't care
about l10n on that platform.
-- 
Duy


Gitk doesn't work on macOS Mojave

2018-07-19 Thread Evgeny Cherpak
Hi

You have probably heard this already, but gitk doesn't work on macOS
10.14, because it uses Apple Events, and apps on 10.14 require the user
to grant them permission to control other apps with Apple Events.

Here is what I get when I try running it on my machine with beta 4 installed:

Error in startup script: 58:102: execution error: Not authorized to send
Apple events to System Events. (-1743)
while executing
"exec osascript -e [format {
tell application "System Events"
set frontmost of processes whose unix id is %d to true
end te..."
invoked from within
"if {[tk windowingsystem] eq "aqua"} {
exec osascript -e [format {
tell application "System Events"
set frontmost of processes ..."
(file "/usr/local/bin/gitk" line 12212)

And Apple doesn’t allow adding apps to “Automation” :(

Best regards.
And thanks for git :) 

Re: [PATCH v2 06/23] builtin/pack-objects.c: mark more strings for translation

2018-07-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> @@ -149,7 +149,7 @@ static void *get_delta(struct object_entry *entry)
>   delta_buf = diff_delta(base_buf, base_size,
>  buf, size, &delta_size, 0);
>   if (!delta_buf || delta_size != DELTA_SIZE(entry))
> - die("delta size changed");
> + die(_("delta size changed"));

I wonder if this should be a BUG("..."), as it falls into the same
category as assert(), just like others turned into BUG("...") with
step 01/23.  If the following observation is correct, then we need
to turn this message into BUG() in 01/23 in a reroll and leave it
untranslated:

In try_delta() we have earlier tried delta against many other
objects and picked the best one as (base object, size of delta,
delta chain length) tuple, and in the caller of this function,
we write out the delta by recomputing it (because the resulting
delta, unless it is trivially small, would have been discarded
long before we come to this phase).  If diff_delta() against
that chosen base object results in a different size from what we
observed earlier, on which we based our decision to use this
object as the base in the first place, we have a problem.



Re: [PATCH v2 01/23] Update messages in preparation for i18n

2018-07-19 Thread Junio C Hamano
Junio C Hamano  writes:

>> @@ -622,7 +626,7 @@ int cmd_config(int argc, const char **argv, const char 
>> *prefix)
>>   * location; error out even if XDG_CONFIG_HOME
>>   * is set and points at a sane location.
>>   */
>> -die("$HOME not set");
>> +die("$HOME is not set");
>
> Meh.  There are many verb-less messages e.g. "only one X at a time"
> that are not given a new verb in this patch.

Just to save wasted work due to misunderstanding, this is merely
"Meh" (I do not think it is worth it, and I wouldn't have included
this hunk if I were doing the patch, but since the patch has already
been written, I do not mind having it, either), not "Reject" (drop
this and resend).

Thanks.



Re: [PATCH v2 03/23] archive-zip.c: mark more strings for translation

2018-07-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> - return error("path too long (%d chars, SHA1: %s): %s",
> + return error(_("path too long (%d chars, SHA1: %s): %s"),
> - return error("unsupported file mode: 0%o (SHA1: %s)", mode,
> + return error(_("unsupported file mode: 0%o (SHA1: %s)"), mode,

The same #leftoverbits comment as 02/23 applies here.

> @@ -601,7 +601,7 @@ static void dos_time(timestamp_t *timestamp, int 
> *dos_date, int *dos_time)
>   struct tm *t;
>  
>   if (date_overflows(*timestamp))
> - die("timestamp too large for this system: %"PRItime,
> + die(_("timestamp too large for this system: %"PRItime),
>   *timestamp);

I suspect that this won't do the right "cross-platform" thing.  For
built-in PRItype formats gettext tool knows about, *.po files let
translators translate original with "%" into localized text
with the same "%" left, and the runtime does the right
thing, but for a custom format like PRItime there is no such
support.



Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 07:31:35PM +0200, Duy Nguyen wrote:
> On Thu, Jul 19, 2018 at 01:23:58PM -0400, Jeff King wrote:
> > On Thu, Jul 19, 2018 at 09:42:00AM -0700, Elijah Newren wrote:
> > 
> > > Thanks for the quick turnaround.  Unfortunately, I have some bad news.
> > > With this patch, I get the following:
> > > 
> > > $ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
> > > Enumerating objects: 4460703, done.
> > > Counting objects: 100% (4460703/4460703), done.
> > > Delta compression using up to 40 threads.
> > > Compressing objects: 100% (3807140/3807140), done.
> > > Writing objects: 100% (4460703/4460703), done.
> > > Total 4460703 (delta 2831383), reused 1587071 (delta 0)
> > > error: failed to unpack compressed delta at offset 183854150 from
> > > .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
> > > fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
> > > in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
> > > is corrupt
> > > error: failed to run prune
> > > MaxRSS:40025196 Time:2531.52
> > 
> > Looking at that output, my _guess_ is that we somehow end up with a
> > bogus delta_size value and write out a truncated entry. But I couldn't
> > reproduce the issue with smaller test cases.
> 
> Could it be a race condition?

I'm convinced my code is racy (between two writes). I created a broken
pack once with 32 threads. Elijah, please try again with this new
patch. It should fix this (I have only tried repacking a few times so
far, but will continue).

The race is this:

1. Thread one sees a large delta size and a NULL delta_size[] array,
   allocates the new array, and is in the middle of copying the old
   delta sizes over.

2. Thread two wants to write a new (large) delta size. It sees that
   delta_size[] is already allocated, so it writes the correct size
   there (and a truncated one in object_entry->delta_size_).

3. Back to thread one: it now copies the truncated value in
   delta_size_ from step 2 to the delta_size[] array, overwriting the
   good value that thread two wrote.

There is also a potential read/write race where a read from
delta_size[] happens when the array is not ready. But I don't think it
can happen with the current try_delta() code. I protect it anyway to be
safe.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..d67997f11c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -32,6 +32,12 @@
 #include "object-store.h"
 #include "dir.h"
 
+static unsigned long oe_delta_size(struct packing_data *pack,
+  const struct object_entry *e);
+static void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size);
+
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
 #define SET_SIZE(obj,size) oe_set_size(&to_pack, obj, size)
@@ -1915,6 +1921,51 @@ unsigned long oe_get_size_slow(struct packing_data *pack,
return size;
 }
 
+static unsigned long oe_delta_size(struct packing_data *pack,
+  const struct object_entry *e)
+{
+   unsigned long size;
+
+   read_lock(); /* to protect access to pack->delta_size[] */
+   if (pack->delta_size)
+   size = pack->delta_size[e - pack->objects];
+   else
+   size = e->delta_size_;
+   read_unlock();
+   return size;
+}
+
+static void oe_set_delta_size(struct packing_data *pack,
+ struct object_entry *e,
+ unsigned long size)
+{
+   read_lock(); /* to protect access to pack->delta_size[] */
+   if (!pack->delta_size && size < pack->oe_delta_size_limit) {
+   e->delta_size_ = size;
+   read_unlock();
+   return;
+   }
+   /*
+* We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
+* limit. delta_size_ will not be used anymore. All delta sizes are now
+* from the delta_size[] array.
+*/
+   if (!pack->delta_size) {
+   uint32_t i;
+
+   /*
+* nr_alloc, not nr_objects to align with realloc() strategy in
+* packlist_alloc()
+*/
+   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
+
+   for (i = 0; i < pack->nr_objects; i++)
+   pack->delta_size[i] = pack->objects[i].delta_size_;
+   }
+   pack->delta_size[e - pack->objects] = size;
+   read_unlock();
+}
+
 static int try_delta(struct unpacked *trg, struct unpacked *src,
 unsigned max_depth, unsigned long *mem_usage)
 {
@@ -2023,10 +2074,6 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1U << 

Re: [PATCH v2 02/23] archive-tar.c: mark more strings for translation

2018-07-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> @@ -256,7 +256,7 @@ static int write_tar_entry(struct archiver_args *args,
>   *header.typeflag = TYPEFLAG_REG;
>   mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;
>   } else {
> - return error("unsupported file mode: 0%o (SHA1: %s)",
> + return error(_("unsupported file mode: 0%o (SHA1: %s)"),
>mode, oid_to_hex(oid));

This is no longer sha1_to_hex(); the "SHA1" in the message should
probably have been updated when it happened.

Cleaning it up is outside the scope of this patch. 

#leftoverbits


Re: [PATCH v2 01/23] Update messages in preparation for i18n

2018-07-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

>  static void check_argc(int argc, int min, int max) {
>   if (argc >= min && argc <= max)
>   return;
> - error("wrong number of arguments");
> + if (min == max)
> + error("wrong number of arguments, should be %d", min);
> + else
> + error("wrong number of arguments, should be from %d to %d",
> +   min, max);
>   usage_with_options(builtin_config_usage, builtin_config_options);
>  }

Good. This was the only instance of

> - some messages are improved to give more information

I spotted.

> @@ -622,7 +626,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
>* location; error out even if XDG_CONFIG_HOME
>* is set and points at a sane location.
>*/
> - die("$HOME not set");
> + die("$HOME is not set");

Meh.  There are many verb-less messages e.g. "only one X at a time"
that are not given a new verb in this patch.

> @@ -819,7 +823,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
>   if (ret < 0)
>   return ret;
>   if (ret == 0)
> - die("No such section!");
> + die("no such section: %s", argv[0]);
>   }
>   else if (actions == ACTION_REMOVE_SECTION) {
>   int ret;
> @@ -830,7 +834,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
>   if (ret < 0)
>   return ret;
>   if (ret == 0)
> - die("No such section!");
> + die("no such section: %s", argv[0]);
>   }

There are check_argc() calls before these two hunks to ensure that
access to argv[0] is safe, so these are good.

> @@ -2638,7 +2638,7 @@ static void read_object_list_from_stdin(void)
>   if (feof(stdin))
>   break;
>   if (!ferror(stdin))
> - die("fgets returned NULL, not EOF, not error!");
> + die("BUG: fgets returned NULL, not EOF, not error!");

This is not BUG("...") because it is not *our* bug but a bug in platform's 
stdio?

> @@ -456,10 +456,10 @@ static int create_graft(int argc, const char **argv, 
> int force, int gentle)
>   return -1;
>   }
>  
> - if (remove_signature()) {
> - warning(_("the original commit '%s' has a gpg signature."), old_ref);
> - warning(_("the signature will be removed in the replacement commit!"));
> - }
> + if (remove_signature())
> + warning(_("the original commit '%s' has a gpg signature.\n"
> +   "The signature will be removed in the replacement commit!"),
> + old_ref);

Unlike a compound sentence, for which translators may appreciate
that they can reorder the parts of it to suit their language, I do
not see why these two independent sentences should be placed in a
single warning().

Or is this primarily about avoiding repeated appearance of
"warning:" labels, i.e.

warning: sentence one
warning: sentence two

I am not sure if these belong to this "simple mechanical conversion
or otherwise uncontroversial improvement" series.

> diff --git a/config.c b/config.c
> index f4a208a166..6ba07989f1 100644
> --- a/config.c
> +++ b/config.c
> @@ -461,7 +461,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
>   envw = xstrdup(env);
>  
>   if (sq_dequote_to_argv(envw, &argv, &nr, &alloc) < 0) {
> - ret = error("bogus format in " CONFIG_DATA_ENVIRONMENT);
> + ret = error("bogus format in %s", CONFIG_DATA_ENVIRONMENT);
>   goto out;
>   }
>  

Good job spotting that the original wanted to say, but failed to
say, that CONFIG_DATA_ENVIRONMENT was the source of the broken data
we detected.  But I am not sure CONFIG_DATA_ENVIRONMENT is what we want
to report as the source of bad data to the end users.  One-shot
configuration we get from "git -c VAR=VAL" is placed in the
environment as an internal implementation detail, so from their
point of view, the place the broken data came from is their
command line "git -c VAR=VAL" one-shot configuration.

> @@ -1409,11 +1409,11 @@ static int git_default_push_config(const char *var, 
> const char *value)
>   push_default = PUSH_DEFAULT_UPSTREAM;
>   else if (!strcmp(value, "current"))
>   push_default = PUSH_DEFAULT_CURRENT;
> - else {
> - error("malformed value for %s: %s", var, value);
> - return error("Must be one of nothing, matching, simple, 
> "
> -  "upstream or current.");
> - }
> + else
> + return error("malformed value for %s: %s\n"
> +   

Re: [PATCH] diff.c: offer config option to control ws handling in move detection

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 10:32 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > "git format-patch HEAD^ --color" produces red/green output
> > (like git log would), so I do not see why --color-moved should impact
> > format-patch differently. (i.e. if the user requests format-patch with
> > color-moved we can do it, otherwise, when colors are off, we do not
> > spend cycles to compute the moves)
> >
> > Maybe I did not understand the gist of your question, still?
>
> "format-patch --color" done by end-user, especially when combined
> with "--stdout", would be useful to show coloured output.  That is
> what you are talking about in the above, but that is not what I was
> asking about.
>
> The question was specifically about configuration.  You may say
> "diff.colorwhatever = on", but "git format-patch" with no command
> line override wouldn't want to be affected by that and produce a
> patch that won't apply, I would think.

The options introduced here (even the command line arguments)
do *nothing* if no color is on, i.e.

git diff --no-color --color-moved=blocks \
--color-moved-ws=allow-indentation-change

is the same as

git diff --no-color

and as format-patch doesn't use colors by default, giving
color-moved settings (even via config) is a no-op, too?

Stefan


Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

2018-07-19 Thread Johannes Schindelin
Hi,

On Tue, 17 Jul 2018, Duy Nguyen wrote:

> On Tue, Jul 17, 2018 at 6:39 PM Duy Nguyen  wrote:
> >
> > On Fri, Jul 13, 2018 at 10:19 PM Johannes Schindelin via GitGitGadget
> >  wrote:
> > >
> > > From: Johannes Schindelin 
> > >
> > > While it is true that we never add unreachable commits into pack files
> > > intentionally (as `git repack`'s documentation states), we must not
> > > forget that a `git fetch --prune` (or even a `git fetch` when a ref was
> > > force-pushed in the meantime) can make a commit unreachable that was
> > > reachable before.
> > >
> > > Therefore it is not safe to assume that a `git repack -adlf` will keep
> > > unreachable commits alone (under the assumption that they had not been
> > > packed in the first place).
> > >
> > > This is particularly important to keep in mind when looking at the
> > > `.git/shallow` file: if any commits listed in that file become
> > > unreachable, it is not a problem, but if they go missing, it *is* a
> > > problem. One symptom of this problem is that a deepening fetch may now
> > > fail with
> > >
> > > fatal: error in object: unshallow 
> > >
> >
> > Could you elaborate a bit more?
> 
> Never mind. The problem is described in another patch.. sigh.. I
> understand you want to flip that "failure" to "success" but personally
> I don't think it worth it to look back in history and have "what?"
> moments like when I read this patch alone.

The split was not meant for your benefit, but for the benefit of verifying
that I indeed fixed a problem.

I don't think it is a wise idea to squash them together.

Ciao,
Dscho


Re: [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository

2018-07-19 Thread Johannes Schindelin
Hi Peff,

On Tue, 17 Jul 2018, Jeff King wrote:

> On Tue, Jul 17, 2018 at 03:15:31PM -0400, Jeff King wrote:
> 
> > > - Also trigger `prune_shallow()` when 
> > > `--unpack-unreachable=` was passed to `git repack`.
> > > - No need to trigger `prune_shallow()` when `git repack` was called with 
> > > `-k`.
> > 
> > I think you might have missed the bigger problem I pointed at, as it was
> > buried deep within my later reply. Try applying this patch on top, which
> > tests the opposite case (reachable shallow commits are retained):
> 
> By the way, I notice that the patches themselves are cc'd to you, but
> the cover letter isn't. So my reply went only to gitgitgadget, which
> (AFAIK) has not yet learned to work as a mail-to-comment gateway.
> 
> So I'm copying this to you directly to make sure you see it, but also
> because I'm not sure if the gitgitgadget cc behavior is intentional or
> not.

It is one of the existing TODOs to Cc: the person who hit "/submit". So I
am aware of it, and it is recorded in the TODO.md file.

But as the GitHub user might not be associated with a public email, this
will need an internal map that will be extended to whoever is in the list
of allowed users.

Ciao,
Dscho



Re: [PATCH] diff.c: offer config option to control ws handling in move detection

2018-07-19 Thread Junio C Hamano
Stefan Beller  writes:

> "git format-patch HEAD^ --color" produces red/green output
> (like git log would), so I do not see why --color-moved should impact
> format-patch differently. (i.e. if the user requests format-patch with
> color-moved we can do it, otherwise, when colors are off, we do not
> spend cycles to compute the moves)
>
> Maybe I did not understand the gist of your question, still?

"format-patch --color" done by end-user, especially when combined
with "--stdout", would be useful to show coloured output.  That is
what you are talking about in the above, but that is not what I was
asking about.

The question was specifically about configuration.  You may say
"diff.colorwhatever = on", but "git format-patch" with no command
line override wouldn't want to be affected by that and produce a
patch that won't apply, I would think.



Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 01:23:58PM -0400, Jeff King wrote:
> On Thu, Jul 19, 2018 at 09:42:00AM -0700, Elijah Newren wrote:
> 
> > Thanks for the quick turnaround.  Unfortunately, I have some bad news.
> > With this patch, I get the following:
> > 
> > $ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
> > Enumerating objects: 4460703, done.
> > Counting objects: 100% (4460703/4460703), done.
> > Delta compression using up to 40 threads.
> > Compressing objects: 100% (3807140/3807140), done.
> > Writing objects: 100% (4460703/4460703), done.
> > Total 4460703 (delta 2831383), reused 1587071 (delta 0)
> > error: failed to unpack compressed delta at offset 183854150 from
> > .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
> > fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
> > in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
> > is corrupt
> > error: failed to run prune
> > MaxRSS:40025196 Time:2531.52
> 
> Looking at that output, my _guess_ is that we somehow end up with a
> bogus delta_size value and write out a truncated entry. But I couldn't
> reproduce the issue with smaller test cases.

Could it be a race condition? I ran the whole test suite with this
fallback code activated (by forcing the delta size limit down to two
bytes) and nothing failed. Perhaps something like this on top? I'll
need to see if helgrind could spot anything...

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7f3546057d..1eccbc91d2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2054,7 +2054,16 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
}
 
SET_DELTA(trg_entry, src_entry);
+
+   /*
+* Locking is needed because SET_DELTA_SIZE() internally calls
+* oe_prepare_delta_size_array(), which may touch other entries
+* that are updated in parallel.
+*/
+   cache_lock();
SET_DELTA_SIZE(trg_entry, delta_size);
+   cache_unlock();
+
trg->depth = src->depth + 1;
 
return 1;
diff --git a/pack-objects.c b/pack-objects.c
index 89699cf15c..9e52af32c3 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -185,13 +185,16 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 void oe_prepare_delta_size_array(struct packing_data *pack)
 {
uint32_t i;
+   uint32_t *delta_size;
 
/*
 * nr_alloc, not nr_objects to align with realloc() strategy in
 * packlist_alloc()
 */
-   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
+   ALLOC_ARRAY(delta_size, pack->nr_alloc);
 
for (i = 0; i < pack->nr_objects; i++)
-   pack->delta_size[i] = pack->objects[i].delta_size_;
+   delta_size[i] = pack->objects[i].delta_size_;
+
+   pack->delta_size = delta_size;
 }
-- 8< --

Elijah, another thing you could try (if you have plenty of time to
spare) is run this repack with a single thread. It's going to take
forever though...

--
Duy


Re: [PATCH] diff.c: offer config option to control ws handling in move detection

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 9:29 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > On Wed, Jul 18, 2018 at 10:45 AM Junio C Hamano  wrote:
> >>
> >> Stefan Beller  writes:
> >>
> >> > diff --git a/Documentation/diff-options.txt 
> >> > b/Documentation/diff-options.txt
> >> > index 143acd9417e..8da7fed4e22 100644
> >> > --- a/Documentation/diff-options.txt
> >> > +++ b/Documentation/diff-options.txt
> >> > @@ -294,8 +294,11 @@ dimmed_zebra::
> >> >
> >> >  --color-moved-ws=::
> >> >   This configures how white spaces are ignored when performing the
> >> > - move detection for `--color-moved`. These modes can be given
> >> > - as a comma separated list:
> >> > + move detection for `--color-moved`.
> >> > +ifdef::git-diff[]
> >> > + It can be set by the `diff.colorMovedWS` configuration setting.
> >> > +endif::git-diff[]
> >>
> >> The patch to diff.c::git_diff_ui_config() we see below does not seem
> >> to make any effort to make sure that this new configuration applies
> >> only to "git diff" and that other commands like "git log" that call
> >> git_diff_ui_config() are not affected.
> >
> > That is as intended. (We want to have it in git-log)
>
> Ah, I think what is going on here, and I think I asked a wrong
> question.
>
>  * diff-options.txt is used by the family of diff plumbing commands
>(which actively do not want to participate in the coloring game
>and do not call git_diff_ui_config()) as well as log family of
>commands (which do pay attention to the config).
>
>  * "git grep '^:git-diff:'" hits only Documentation/git-diff.txt.
>
> What the system currently does (which may not match what it should
> do) is that Porcelain "diff", "log", etc. pay attention to the
> configuration while plumbing "diff-{files,index,tree}" don't, and
> use of ifdef/endif achieves only half of that (i.e. excludes the
> sentence from plumbing manual pages).  It excludes too much and does
> not say "log", "show", etc. also honor the configuration.
>
> I think the set of asciidoc attrs diff-options.txt files uses need
> some serious clean-up.  For example, it defines :git-diff-core: that
> is only used once, whose intent seems to be "we are describing diff
> plumbing".  However, the way it does so is by excluding known
> Porcelain; if we ever add new diff porcelain (e.g. "range-diff"),
> that way of 'definition by exclusion' would break.  The scheme is
> already broken by forcing git-show.txt to define 'git-log' just like
> git-log.txt does, meaning that it is not possible to make "show" to
> be described differently from "log".  But let's leave that outside
> this topic.

Then let's call this #leftoverbits ?

> Back to a more on-topic tangent.
>
> How does this patch (and all the recent "color" options you added
> recently) avoid spending unnecessary cycles and contaminating "git
> format-patch" output, by the way?  builtin/log.c::cmd_format_patch()
> seems to eventually call into git_log_config() that ends with a call
> to diff_ui_config().

The color options are only using CPU cycles when we actually need to
color things (if no-color is set, then the move detection is off).

"git format-patch HEAD^ --color" produces red/green output
(like git log would), so I do not see why --color-moved should impact
format-patch differently. (i.e. if the user requests format-patch with
color-moved we can do it, otherwise, when colors are off, we do not
spend cycles to compute the moves)

Maybe I did not understand the gist of your question, still?
Stefan


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 09:42:00AM -0700, Elijah Newren wrote:

> Thanks for the quick turnaround.  Unfortunately, I have some bad news.
> With this patch, I get the following:
> 
> $ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
> Enumerating objects: 4460703, done.
> Counting objects: 100% (4460703/4460703), done.
> Delta compression using up to 40 threads.
> Compressing objects: 100% (3807140/3807140), done.
> Writing objects: 100% (4460703/4460703), done.
> Total 4460703 (delta 2831383), reused 1587071 (delta 0)
> error: failed to unpack compressed delta at offset 183854150 from
> .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
> fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
> in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
> is corrupt
> error: failed to run prune
> MaxRSS:40025196 Time:2531.52

Looking at that output, my _guess_ is that we somehow end up with a
bogus delta_size value and write out a truncated entry. But I couldn't
reproduce the issue with smaller test cases.

Maybe instrumenting Git with the patch below and running:

  GIT_TRACE_DELTA=$PWD/delta.out git gc --aggressive
  perl -lne '
/(get|put) ([0-9a-f]{40}) (\d+)/ or next;
if ($1 eq "put") {
$h{$2} and print "double put: $2 = ($h{$2}, $3)";
$h{$2} = $3;
} else {
$h{$2} == $3 or print "mismatched get: $2 = ($h{$2}, $3)"
}
  ' delta_size)
@@ -335,11 +337,23 @@ static inline unsigned long oe_delta_size(struct packing_data *pack,
return pack->delta_size[e - pack->objects];
 }
 
+static inline unsigned long oe_delta_size(struct packing_data *pack,
+ const struct object_entry *e)
+{
+   unsigned long ret = oe_delta_size_1(pack, e);
+   trace_printf_key(&trace_delta, "get %s %lu",
+oid_to_hex(&e->idx.oid), ret);
+   return ret;
+}
+
 void oe_prepare_delta_size_array(struct packing_data *pack);
 static inline void oe_set_delta_size(struct packing_data *pack,
 struct object_entry *e,
 unsigned long size)
 {
+   trace_printf_key(&trace_delta, "put %s %lu",
+oid_to_hex(&e->idx.oid), size);
+
e->delta_size_ = size;
if (!pack->delta_size && e->delta_size_ == size)
return;


[PATCH] l10n: de.po: translate 108 new messages

2018-07-19 Thread Ralf Thielow
Translate 108 new messages that came from the git.pot update in
9b7388a85 (l10n: git.pot: v2.18.0 round 1 (108 new, 14 removed)).

Signed-off-by: Ralf Thielow 
---
 po/de.po | 373 +--
 1 file changed, 194 insertions(+), 179 deletions(-)

diff --git a/po/de.po b/po/de.po
index bdc5830e1..47986814c 100644
--- a/po/de.po
+++ b/po/de.po
@@ -10,25 +10,25 @@ msgstr ""
 "POT-Creation-Date: 2018-05-31 23:32+0800\n"
 "PO-Revision-Date: 2016-11-28 18:10+0100\n"
 "Last-Translator: Ralf Thielow \n"
 "Language-Team: German <>\n"
 "Language: de\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
 "Plural-Forms: nplurals=2; plural=(n!=1);\n"
 
 #: advice.c:92
-#, fuzzy, c-format
+#, c-format
 msgid "%shint: %.*s%s\n"
-msgstr "Hinweis: %.*s\n"
+msgstr "%sHinweis: %.*s%s\n"
 
 #: advice.c:137
 msgid "Cherry-picking is not possible because you have unmerged files."
 msgstr ""
 "Cherry-Picken ist nicht möglich, weil Sie nicht zusammengeführte Dateien "
 "haben."
 
 #: advice.c:139
 msgid "Committing is not possible because you have unmerged files."
 msgstr ""
 "Committen ist nicht möglich, weil Sie nicht zusammengeführte Dateien haben."
@@ -1284,44 +1284,52 @@ msgstr "%s %s ist kein Commit!"
 #: commit.c:182
 msgid ""
 "Support for /info/grafts is deprecated\n"
 "and will be removed in a future Git version.\n"
 "\n"
 "Please use \"git replace --convert-graft-file\"\n"
 "to convert the grafts into replace refs.\n"
 "\n"
 "Turn this message off by running\n"
 "\"git config advice.graftFileDeprecated false\""
 msgstr ""
+"Die Unterstützung für /info/grafts ist veraltet\n"
+"und wird in zukünftigen Git Versionen entfernt.\n"
+"\n"
+"Bitte benutzen Sie \"git replace --convert-graft-file\"\n"
+"zum Konvertieren der künstlichen Vorgänger (\"grafts\")\n"
+"in ersetzende Referenzen.\n"
+"\n"
+"Sie können diese Meldung unterdrücken, indem Sie\n"
+"\"git config advice.graftFileDeprecated false\" ausführen."
 
 #: commit.c:1537
 msgid ""
 "Warning: commit message did not conform to UTF-8.\n"
 "You may want to amend it after fixing the message, or set the config\n"
 "variable i18n.commitencoding to the encoding your project uses.\n"
 msgstr ""
 "Warnung: Die Commit-Beschreibung ist nicht UTF-8 konform.\n"
 "Sie können das nachbessern, nachdem Sie die Beschreibung korrigiert haben,\n"
 "oder Sie setzen die Konfigurationsvariable i18n.commitencoding auf das "
 "Encoding,\n"
 "welches von ihrem Projekt verwendet wird.\n"
 
 #: commit-graph.c:669
 #, c-format
 msgid "the commit graph format cannot write %d commits"
-msgstr ""
+msgstr "Das Commit-Graph Format kann nicht %d Commits schreiben."
 
 #: commit-graph.c:696
-#, fuzzy
 msgid "too many commits to write graph"
-msgstr "nur Commits anzeigen, die sich nicht im ersten Branch befinden"
+msgstr "Zu viele Commits zum Schreiben des Graphen."
 
 #: commit-graph.c:707 builtin/init-db.c:516 builtin/init-db.c:521
 #, c-format
 msgid "cannot mkdir %s"
 msgstr "kann Verzeichnis %s nicht erstellen"
 
 #: compat/obstack.c:405 compat/obstack.c:407
 msgid "memory exhausted"
 msgstr "Speicher verbraucht"
 
 #: config.c:187
@@ -1554,56 +1562,61 @@ msgstr "LF würde in %s durch CRLF ersetzt werden."
 msgid ""
 "LF will be replaced by CRLF in %s.\n"
 "The file will have its original line endings in your working directory."
 msgstr ""
 "LF wird in %s durch CRLF ersetzt.\n"
 "Die Datei wird ihre ursprünglichen Zeilenenden im Arbeitsverzeichnis "
 "behalten."
 
 #: convert.c:279
 #, c-format
 msgid "BOM is prohibited in '%s' if encoded as %s"
-msgstr ""
+msgstr "BOM ist in '%s' unzulässig, wenn als %s codiert."
 
 #: convert.c:286
 #, c-format
 msgid ""
 "The file '%s' contains a byte order mark (BOM). Please use UTF-%s as working-"
 "tree-encoding."
 msgstr ""
+"Die Datei '%s' enthält ein Byte-Order-Mark (BOM). Bitte benutzen Sie UTF-%s\n"
+"als Codierung im Arbeitsverzeichnis."
 
 #: convert.c:304
 #, c-format
 msgid "BOM is required in '%s' if encoded as %s"
-msgstr ""
+msgstr "BOM ist erforderlich in '%s', wenn als %s codiert."
 
 #: convert.c:306
 #, c-format
 msgid ""
 "The file '%s' is missing a byte order mark (BOM). Please use UTF-%sBE or UTF-"
 "%sLE (depending on the byte order) as working-tree-encoding."
 msgstr ""
+"Der Datei '%s' fehlt ein Byte-Order-Mark (BOM). Bitte benutzen Sie UTF-%sBE\n"
+"oder UTF-%sLE (abhängig von der Byte-Reihenfolge) als Codierung im\n"
+"Arbeitsverzeichnis."
 
 #: convert.c:424
-#, fuzzy, c-format
+#, c-format
 msgid "failed to encode '%s' from %s to %s"
-msgstr "Fehler beim Kopieren der Notizen von '%s' nach '%s'"
+msgstr "Fehler beim Codieren von '%s' von %s nach %s."
 
 #: convert.c:467
 #, c-format
 msgid "encoding '%s' from %s to %s and back is not the same"
-msgstr ""
+msgstr "Die Codierung '%s' von %s nach %s und zurück ist nicht dasselbe."
 
 #: convert.c:1225
 msgid "true/false are no valid working-tree-encodings"
-msgstr ""
+msgstr "true/false sind keine 

Re: What's cooking in git.git (Jul 2018, #02; Wed, 18)

2018-07-19 Thread Elijah Newren
On Wed, Jul 18, 2018 at 3:03 PM, Junio C Hamano  wrote:

> * en/rebase-i-microfixes (2018-06-27) 3 commits
>   (merged to 'next' on 2018-07-11 at d913ca0f77)
>  + git-rebase--merge: modernize "git-$cmd" to "git $cmd"
>  + Fix use of strategy options with interactive rebases
>  + t3418: add testcase showing problems with rebase -i and strategy options
>
>  Will merge to 'master'.

This series showed up in the "Graduated to master" section of your
email and the series shows up on master; did you just forget to remove
this last line?

> * en/abort-df-conflict-fixes (2018-07-16) 2 commits
>  - read-cache: fix directory/file conflict handling in read_index_unmerged()
>  - t1015: demonstrate directory/file conflict recovery failures
>
>  "git merge --abort" etc. did not clean things up properly when
>  there were conflicted entries in certain order that are involved
>  in D/F conflicts.  This has been corrected.
>
>  This may have to be rebased on an older maintenance track before
>  moving forward.

Would you like me to rebase on maint and resubmit?  Alternatively,
  git cherry-pick -Xours master..en/abort-df-conflict-fixes
will backport the fixes to maint, with the only downside that it
leaves some unnecessary (but innocuous) double 'git reset --hard'
invocations in t6042.

Just let me know whatever is easiest for you.

> * en/t6042-insane-merge-rename-testcases (2018-07-03) 3 commits
>  - t6042: add testcase covering long chains of rename conflicts
>  - t6042: add testcase covering rename/rename(2to1)/delete/delete conflict
>  - t6042: add testcase covering rename/add/delete conflict type
>
>  Various glitches in the heuristics of merge-recursive strategy have
>  been documented in new tests.
>
>  Will merge to 'next'.
>
>  I am not sure if there is a single "correct" answer everybody can
>  agree on for each of these "insane" cases, though.

Yeah, I agree.  I was a little unsure about adding the expected
"correct" answer in these testcases for fear I would just end up
modifying it whenever I finally implement a fix.  However, it was
clear that current handling for these testcases is suboptimal, and
ultimately I decided it'd make sense to just take my best guess at
"correct" for now and deal with any modifications later.  *shrug*  I'm
not sure what, if any changes to make to this series because of this,
though; for now, I think it's fine as-is.


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-19 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> I would like to ask you to reinstate the post-rewrite hook, as it still
>> improves the situation over the current one.
>
> Without post-rewrite I seem to be getting correct amlog entries for
> commits created by "git rebase"; do our rebase--am backend still
> trigger post-applypatch hook in its "am" phase to apply the patches
> created with "format-patch"?

That was a wrong line of thought that led to a dead end.  format-patch
won't recreate Message-Id to its output from notes/amlog, so even if
the "format-patch --stdout | am" pipeline inside rebase-am triggered
the post-applypatch hook, it would not have a chance to carry the
notes forward that way.

What was really happening was I have

$ git config --list | grep amlog
notes.rewriteref=refs/notes/amlog

and that ought to be sufficient to carry "commit-to-original-msg-id"
entries across rebases.  And it seems to correctly work.  I however
suspect that "cherry-pick A..B" may lose the notes, but I haven't
checked.






Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 05:16:42PM +0200, Duy Nguyen wrote:

> On Thu, Jul 19, 2018 at 07:57:37AM +0200, Duy Nguyen wrote:
> > I thought 2M was generous but I was obviously wrong. I'd like to see
> > the biggest delta size in this pack and whether it's still reasonable
> > to increase OE_DELTA_SIZE_BITS. But if it's very close to 4GB limit
> > then we could just store 64-bit delta size in a separate array.
> 
> I realize now that these two options don't have to be mutually
> exclusive and I should provide a good fallback in terms of performance
> anyway.

Yeah, this is what I had assumed was happening in the original code. :)

> +void oe_prepare_delta_size_array(struct packing_data *pack)
> +{
> + int i;
> +
> + /*
> +  * nr_alloc, not nr_objects to align with realloc() strategy in
> +  * packlist_alloc()
> +  */
> + ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
> +
> + for (i = 0; i < pack->nr_objects; i++)
> + pack->delta_size[i] = pack->objects[i].delta_size_;
> +}

This iterator should probably be a uint32_t, to match nr_objects.

The rest of the patch looks OK to me. From Elijah's failure report there
clearly is _some_ problem where we end up with a truncated write of a
delta. But I don't see it from code inspection, nor could I reproduce
it.

-Peff


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:50:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I thought of writing a wrapper script to deduce the key from the arguments:
> >
> > g...@github.com git-upload-pack '/theorganization/privaterepo.git'
> >
> > Is this the only option?
> 
> Yes, I had a similar problem a while ago (which I sent an RFC patch for)
> which shows a script you can use:
> https://public-inbox.org/git/20180103102840.27897-1-ava...@gmail.com/
> 
> It would be nice if this were configurable. Instead of the way you
> suggested, it would be more general if we supported:
> 
> [Include "remote:g...@github.com:theorganization*"]
> path = theorganization.config
> 
> Although I'm sure we'd have some interesting chicken & egg problems
> there when it comes to bootstrapping the config parsing.

I don't think we'd ever support this via the include mechanism. The
idea of "which remote are we looking at" is specific to a particular
part of an operation. Whereas config parsing is generally process-wide,
so it has to be based on a property of the whole process (like "which
directory are we in"). Maybe that's what you meant by chicken and egg.

If we were to make this more configurable, it would probably be more
like existing http.* config, which loads all the config, but then does
URL-specific matching when applying the config to a particular
operation.

-Peff
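The http.* style of matching Peff describes, load all the config up front, then apply the most specific URL match when a particular operation runs, can be sketched as follows. This is illustrative Python, not Git's actual urlmatch code, which is host- and path-aware rather than plain prefix-based:

```python
def most_specific(config, url):
    # Pick the settings block whose URL pattern is the longest prefix of
    # the URL in use; plain prefix matching stands in for Git's real,
    # more elaborate urlmatch logic.
    best = None
    for prefix in config:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return config[best] if best is not None else {}

config = {
    "https://github.com/": {"sslVerify": "true"},
    "https://github.com/theorganization/": {"cookieFile": "org.cookies"},
}

org = most_specific(config, "https://github.com/theorganization/privaterepo.git")
other = most_specific(config, "https://github.com/other/publicrepo.git")
```

A hypothetical per-URL ssh command setting could be applied the same way: keep all blocks loaded, and resolve the winner only once the remote URL is known.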


Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

2018-07-19 Thread Johannes Schindelin
Hi Junio,

On Tue, 17 Jul 2018, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I'm OK with having a partial fix, or one that fixes immediate pain
> > without doing a big cleanup, as long as it doesn't make anything _worse_
> > in a user-visible way. And that's really my question: is pruning here
> > going to bite people unexpectedly (not rhetorical -- I really don't
> > know)?
> 
> Yeah, that matches the general guideline I follow when reviewing a
> patch that claims to make existing things better.  And I do not
> think I can explain to a third person why pruning here is a good
> idea and won't cause problems, after seeing these patches and
> the discussion from the sideline.

It is very easy to explain: `git repack` can drop unreachable commits
without further warning, making the corresponding entries in
`.git/shallow` invalid, which causes serious problems when deepening the
branches.

The solution is easy: drop also the now-invalid entries in `.git/shallow`
after dropping unreachable commits unceremoniously.

While I am sympathetic to Peff's attempt to make this a bit more general,
I think he overthinks it, and we do not even need to complexify the code
for now.

Ciao,
Dscho
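The fix Dscho describes reduces to a filter over `.git/shallow`: keep each entry only if its commit survived the repack. A rough Python sketch of that step, where `commit_exists` is a stand-in for the real object-store lookup:

```python
def prune_shallow(entries, commit_exists):
    # Drop shallow entries whose commit was discarded by repack, so no
    # stale object name is left behind to break a later deepening fetch.
    return [oid for oid in entries if commit_exists(oid)]

surviving = {"c0ffee", "deadbeef"}            # toy object names
shallow = ["c0ffee", "0badf00d", "deadbeef"]  # toy .git/shallow contents
pruned = prune_shallow(shallow, lambda oid: oid in surviving)
```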


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 03:24:54PM +0300, Basin Ilya wrote:

> I have two github accounts, one is for my organization and I want git
> to automatically choose the correct ssh `IdentityFile` based on the
> clone URL:
> 
> g...@github.com:other/publicrepo.git
>~/.ssh/id_rsa
> g...@github.com:theorganization/privaterepo.git
>~/.ssh/id_rsa.theorganization
> 
> Unfortunately, both URLs have same host name, therefore I can't
> configure this in the ssh client config. I could create a host alias
> there, but sometimes somebody else gives me the github URL and I want
> it to work out of the box.

I think you can hack around this using Git's URL rewriting.

For example, try this:

  git config --global \
url.gh-other:other/.insteadOf \
g...@github.com:other/

  git config --global \
url.gh-org:theorganization/.insteadOf \
g...@github.com:theorganization/

And then:

  git clone g...@github.com:other/publicrepo.git

will hit gh-other, which you can configure using an ssh host alias.

-Peff
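The rewriting behind those two commands boils down to replacing the longest matching `insteadOf` value with its `url.<base>` key. A rough model (the archive-obfuscated address is written out here as `git@github.com`, an assumption, and this is a sketch rather than Git's real code):

```python
REWRITES = {
    "git@github.com:other/": "gh-other:other/",
    "git@github.com:theorganization/": "gh-org:theorganization/",
}

def rewrite(url):
    # Git substitutes the url.<base> for the longest matching insteadOf
    # prefix; URLs that match nothing pass through unchanged.
    best = max((p for p in REWRITES if url.startswith(p)), key=len, default=None)
    return REWRITES[best] + url[len(best):] if best is not None else url

clone_url = rewrite("git@github.com:other/publicrepo.git")
```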


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Elijah Newren
On Thu, Jul 19, 2018 at 8:16 AM, Duy Nguyen  wrote:
> On Thu, Jul 19, 2018 at 07:57:37AM +0200, Duy Nguyen wrote:
>> I thought 2M was generous but I was obviously wrong. I'd like to see
>> the biggest delta size in this pack and whether it's still reasonable
>> to increase OE_DELTA_SIZE_BITS. But if it's very close to 4GB limit
>> then we could just store 64-bit delta size in a separate array.
>
> I realize now that these two options don't have to be mutually
> exclusive and I should provide a good fallback in terms of performance
> anyway.
>
> So Elijah, could you try this patch and see if performance and pack
> size go back to 2.17.0? I can write proper commit message and test
> support later if it proves a good improvement.

Thanks for the quick turnaround.  Unfortunately, I have some bad news.
With this patch, I get the following:

$ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
Enumerating objects: 4460703, done.
Counting objects: 100% (4460703/4460703), done.
Delta compression using up to 40 threads.
Compressing objects: 100% (3807140/3807140), done.
Writing objects: 100% (4460703/4460703), done.
Total 4460703 (delta 2831383), reused 1587071 (delta 0)
error: failed to unpack compressed delta at offset 183854150 from
.git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
is corrupt
error: failed to run prune
MaxRSS:40025196 Time:2531.52


It looks like this comes after it deleted the old packs, so running
git fsck afterwards prints lots of errors.  Just in case they're
helpful at all, there are 26 "error: failed to unpack compressed delta
at offset..." messages, 26 "error: cannot unpack ... from ... at
offset ..." messages, 17 "error: failed to read delta base object..."
messages, 7 "broken link from ... to ..." messages, four missing
blobs, and three missing trees.

(This was run on a copy of the repo, so no harm done; I can still use
the original.)


> -- 8< --
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ebc8cefb53..7f3546057d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2023,10 +2023,6 @@ static int try_delta(struct unpacked *trg, struct 
> unpacked *src,
> delta_buf = create_delta(src->index, trg->data, trg_size, 
> &delta_size, max_size);
> if (!delta_buf)
> return 0;
> -   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
> -   free(delta_buf);
> -   return 0;
> -   }
>
> if (DELTA(trg_entry)) {
> /* Prefer only shallower same-sized deltas. */
> diff --git a/pack-objects.c b/pack-objects.c
> index 92708522e7..191ed45faf 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -160,6 +160,8 @@ struct object_entry *packlist_alloc(struct packing_data 
> *pdata,
>
> if (!pdata->in_pack_by_idx)
> REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
> +   if (pdata->delta_size)
> +   REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
> }
>
> new_entry = pdata->objects + pdata->nr_objects++;
> @@ -177,3 +179,17 @@ struct object_entry *packlist_alloc(struct packing_data 
> *pdata,
>
> return new_entry;
>  }
> +
> +void oe_prepare_delta_size_array(struct packing_data *pack)
> +{
> +   int i;
> +
> +   /*
> +* nr_alloc, not nr_objects to align with realloc() strategy in
> +* packlist_alloc()
> +*/
> +   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
> +
> +   for (i = 0; i < pack->nr_objects; i++)
> +   pack->delta_size[i] = pack->objects[i].delta_size_;
> +}
> diff --git a/pack-objects.h b/pack-objects.h
> index edf74dabdd..978500e474 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -14,7 +14,7 @@
>   * above this limit. Don't lower it too much.
>   */
>  #define OE_SIZE_BITS   31
> -#define OE_DELTA_SIZE_BITS 20
> +#define OE_DELTA_SIZE_BITS 21
>
>  /*
>   * State flags for depth-first search used for analyzing delta cycles.
> @@ -93,7 +93,6 @@ struct object_entry {
>  * uses the same base as me
>  */
> unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size 
> (uncompressed) */
> -   unsigned delta_size_valid:1;
> unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
> unsigned z_delta_size:OE_Z_DELTA_BITS;
> unsigned type_valid:1;
> @@ -130,6 +129,7 @@ struct packing_data {
> uint32_t index_size;
>
> unsigned int *in_pack_pos;
> +   unsigned long *delta_size;
>
> /*
>  * Only one of these can be non-NULL and they have different
> @@ -330,20 +330,27 @@ static inline void oe_set_size(struct packing_data 
> *pack,
>  static inline unsigned long oe_delta_size(struct packing_data *pack,
> 

Re: [PATCH 00/16] Consolidate reachability logic

2018-07-19 Thread Johannes Schindelin
Hi Stolee,

On Wed, 18 Jul 2018, Derrick Stolee wrote:

> On 7/18/2018 1:01 PM, Junio C Hamano wrote:
> > No, fixing a tool that throws such a harder-to-read patch series in
> > reader's mailbox is *not* something I'd spend my primary focus on,
> > especially when many contributors are perfectly capable of sending
> > reasonably formatted series without using such a tool under
> > development.
> >
> > That won't stop those who want to improve the tool.  But I'd wish
> > those who want to make Git better spend their time on making Git,
> > over making GitGitGadget, better.
> 
> I appreciate the feedback in how this series caused reviewer pain. Hopefully
> this date issue is now resolved. Any further feedback is welcome.
> 
> I'm choosing to use and contribute to GitGitGadget not because I'm incapable
> of sending series myself, but because I _have_ had difficulty. Using the email
> submissions creates a friction that I'm willing to overcome, but we are
> probably missing out on contributors who are not willing to push through that
> friction. Perhaps having another way for new contributors to feel welcome is
> an indirect way to make Git better.

While I am a seasoned Git contributor, it is *still* too painful to
contribute patches *even for me*.

So hopefully you and I will get this easier contribution process to the
point where other oldtimers do not want to take it.

At least we now have something that does not share the downsides with
SubmitGit, and is extensible enough that we can teach it new tricks.

With a little luck, Junio will fix amlog so that it is not utter garbage
for anybody but himself, and then GitGitGadget can give contributors
useful feedback about the state of their patch series, including automated
notifications when their patches have been mentioned in the What's Cooking
mail (which no irregular contributor reads, as far as I know).

Ciao,
Dscho


Re: What's cooking in git.git (Jul 2018, #02; Wed, 18)

2018-07-19 Thread Junio C Hamano
Eric Sunshine  writes:

> I guess the two fixup!'s were meant to be squashed (and it's probably
> too late to do so now?).

I guess so.  That is what we get from trying to be nice to reduce
the number of iterations and cheating by short-circuiting the
process X-<.


Re: [PATCH 00/16] Consolidate reachability logic

2018-07-19 Thread Johannes Schindelin
Hi Peff,

On Wed, 18 Jul 2018, Jeff King wrote:

> On Wed, Jul 18, 2018 at 02:23:11PM +0200, Johannes Schindelin wrote:
> 
> > > Yeah, they're out of order in mutt's threaded display. And the
> > > back-dating means there's a much higher chance of them getting blocked
> > > as spam (e.g., some of the dates are from weeks ago).
> > > 
> > > git-send-email uses the current time minus an offset, and then
> > > monotonically increases for each patch:
> > > 
> > >   $time = time - scalar $#files;
> > >   ...
> > >   my $date = format_2822_time($time++);
> > > 
> > > which seems to work pretty well in practice. It does mean the original
> > > dates are lost. The committer date is not interesting at all (there will
> > > be a new committer via "git am" anyway). The original author date is
> > > potentially of interest, but could be included as an in-body header.
> > > AFAIK send-email doesn't have such an option, though, and people are
> > > fine with date-of-sending becoming the new author date.
> > > 
> > > +cc Johannes as the GitGitGadget author
> > 
> > Thanks for dumping even more work on my shoulders.
> 
> Wow. Here's my perspective on what I wrote.
> 
> Somebody pointed out an issue in the tool. I tried to add an additional
> data point (how other clients react, and that I've seen spam-related
> problems). And I tried to point to an existing solution in another tool,
> in case that was helpful. I thought cc-ing you would be a favor, since
> you obviously have an interest in the tool, and it is easy to miss
> discussions buried deep in a thread.
> 
> So no, I didn't write the patch for you. But I tried to contribute
> positively to the process. And I got yelled at for it. That makes me a
> lot less inclined to try to help in the future.
> 
> > Next time, I will ask you to jump in, instead of putting the onus on me.
> >
> > I mean, seriously, what is this? "You can use *any* mail program to work
> > with the Git mailing list, *any* mailer. As long as it is mutt. And as
> > long as you spend hours and hours on tooling that oh BTW nobody else can
> > use."
> 
> The irony here is that I actually _did_ look at the GitGitGadget
> repository, and thought about making a patch to be helpful. But as it is
> written in a language I'm not all that familiar with, using tools that I
> don't normally use, I didn't want to spend hours and hours in order to
> make what was probably going to be a one-line patch in software that I
> don't use myself.

I understand that. The web is not based on shell scripting, so there is no
good way to implement a bot on GitHub using Bash scripts.

Ciao,
Dscho


Re: What's cooking in git.git (Jul 2018, #02; Wed, 18)

2018-07-19 Thread Junio C Hamano
Stefan Beller  writes:

>>  It seems to pass its own self-tests standalone, but seems to break
>>  horribly when merged to 'pu'.
>
> It actually breaks combined with sb/submodule-core-worktree  :-(
> as that introduces another user of $name in git-submodule.sh
> which we eliminate in this series.

Thanks for diagnosing it.  I couldn't get around to figure out the
interaction before pushing out.



Re: [PATCH 00/16] Consolidate reachability logic

2018-07-19 Thread Johannes Schindelin
Hi Junio,

On Wed, 18 Jul 2018, Junio C Hamano wrote:

> That won't stop those who want to improve the tool.  But I'd wish
> those who want to make Git better spend their time on making Git,
> over making GitGitGadget, better.

And I'd wish that you would not make this task harder by refusing to fix
your process to update refs/notes/amlog.

Thanks,
Dscho


Re: [PATCH] diff.c: offer config option to control ws handling in move detection

2018-07-19 Thread Junio C Hamano
Stefan Beller  writes:

> On Wed, Jul 18, 2018 at 10:45 AM Junio C Hamano  wrote:
>>
>> Stefan Beller  writes:
>>
>> > diff --git a/Documentation/diff-options.txt 
>> > b/Documentation/diff-options.txt
>> > index 143acd9417e..8da7fed4e22 100644
>> > --- a/Documentation/diff-options.txt
>> > +++ b/Documentation/diff-options.txt
>> > @@ -294,8 +294,11 @@ dimmed_zebra::
>> >
>> >  --color-moved-ws=::
>> >   This configures how white spaces are ignored when performing the
>> > - move detection for `--color-moved`. These modes can be given
>> > - as a comma separated list:
>> > + move detection for `--color-moved`.
>> > +ifdef::git-diff[]
>> > + It can be set by the `diff.colorMovedWS` configuration setting.
>> > +endif::git-diff[]
>>
>> The patch to diff.c::git_diff_ui_config() we see below does not seem
>> to make any effort to make sure that this new configuration applies
>> only to "git diff" and that other commands like "git log" that call
>> git_diff_ui_config() are not affected.
>
> That is as intended. (We want to have it in git-log)

Ah, I think what is going on here, and I think I asked a wrong
question.

 * diff-options.txt is used by the family of diff plumbing commands
   (which actively do not want to participate in the coloring game
   and do not call git_diff_ui_config()) as well as log family of
   commands (which do pay attention to the config).

 * "git grep '^:git-diff:'" hits only Documentation/git-diff.txt.

What the system currently does (which may not match what it should
do) is that Porcelain "diff", "log", etc. pay attention to the
configuration while plumbing "diff-{files,index,tree}" don't, and
use of ifdef/endif achieves only half of that (i.e. excludes the
sentence from plumbing manual pages).  It excludes too much and does
not say "log", "show", etc. also honor the configuration.

I think the set of asciidoc attrs diff-options.txt files uses need
some serious clean-up.  For example, it defines :git-diff-core: that
is only used once, whose intent seems to be "we are describing diff
plumbing".  However, the way it does so is by excluding known
Porcelain; if we ever add new diff porcelain (e.g. "range-diff"),
that way of 'definition by exclusion' would break.  The scheme is
already broken by forcing git-show.txt to define 'git-log' just like
git-log.txt does, meaning that it is not possible to make "show" to
be described differently from "log".  But let's leave that outside
this topic.

Back to a more on-topic tangent.

How does this patch (and all the recent "color" options you added
recently) avoid spending unnecessary cycles and contaminating "git
format-patch" output, by the way?  builtin/log.c::cmd_format_patch()
seems to eventually call into git_log_config() that ends with a call
to diff_ui_config().

Thanks.


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 5:27 PM Elijah Newren  wrote:
>
> On Wed, Jul 18, 2018 at 10:41 PM, Duy Nguyen  wrote:
> > On Thu, Jul 19, 2018 at 12:51 AM Elijah Newren  wrote:
> >>
> >> I had a user report some poor behavior of 'git gc --aggressive' on a
> >> certain repo (which I sadly cannot share).  Turns out that on this
> >> repo, this operation takes about 60% longer and produces a pack
> >> roughly twice the expected size.
> >
> > The intention was to make life better for weaker machines but
> > definitely should not slow down beefier ones, so yes this is
> > definitely a regression.
>
> Not sure if it matters, but the original discovery was on a laptop.
> Probably 4 cores and 16 GB RAM.  I duplicated on my workstation (8
> cores, 24 GB RAM).  However, since the original patch was memory
> related and I noticed the repacking using all available memory, I
> repeated the testcases while measuring memory but did it on a machine
> that wouldn't be memory limited.

That should be plentiful for a 5GB pack with default cache settings, I think.

> > Is it possible to share "verify-pack -v " output of the
> > pack produced by 2.17.0 and 2.18.0? The only sensitive info there is
> > sha-1, which you can replace with just "SHA-1" if you want. I'm more
> > interested in delta sizes and distribution.
>
> For the deltas, assuming the size-in-pack field (4th column) is the relevant 
> one
>
> Number of objects in repo (lines of verify-pack output): 4460755
> Number of deltas: 2831384
> Number of deltas greater than 1MB: 72
> Min: 14
> Median: 47
> Mean: 586
> 99.999 percentile: 11366280.6 (10.8 MB)
> Max: 141664210 (135.1 MB)
>
> If the size field (3rd column) is the relevant one, then the numbers
> change slightly to:
>
> Number of deltas greater than 1MB: 101
> Min: 4
> Median: 33
> Mean: 660
> 99.999 percentile: 12245551.7 (11.7 MB)

I think size is the right one because even deltas are compressed
before stored in the pack file (I probably should check the code...)
but anyway this number is bigger and that sounds to me like 16MB as
delta size limit should cover common case nicely. 32MB to be on the
safe side, but given default cache size of 256MB, a single delta 1/8
the size of the cache sounds too much.

> Max: 144658342 (138.0 MB)

This one would trigger the patch I just sent, which should also handle
it nicely (I hope).

> I checked out the blob which had the biggest delta, as well as the
> blob it was a delta against.  One was a 280 MB file, the other a 278
> MB file.
-- 
Duy
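The field widths implied by the limits weighed above are easy to check: an n-bit field holds sizes up to 2**n - 1, so the patch's 21 bits cap deltas just under 2 MiB, while the 138 MB delta Elijah found would need 28 bits. A quick sanity check:

```python
def bits_needed(max_value):
    # Smallest field width n such that max_value <= 2**n - 1.
    return max_value.bit_length()

MiB = 1 << 20
patch_bits = bits_needed(2 * MiB - 1)   # OE_DELTA_SIZE_BITS = 21 in the patch
cap_32mib = bits_needed(32 * MiB - 1)   # a 32 MiB cap would need 25 bits
biggest = bits_needed(144658342)        # largest delta size in Elijah's repo
```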


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Elijah Newren
On Wed, Jul 18, 2018 at 10:41 PM, Duy Nguyen  wrote:
> On Thu, Jul 19, 2018 at 12:51 AM Elijah Newren  wrote:
>>
>> I had a user report some poor behavior of 'git gc --aggressive' on a
>> certain repo (which I sadly cannot share).  Turns out that on this
>> repo, this operation takes about 60% longer and produces a pack
>> roughly twice the expected size.
>
> The intention was to make life better for weaker machines but
> definitely should not slow down beefier ones, so yes this is
> definitely a regression.

Not sure if it matters, but the original discovery was on a laptop.
Probably 4 cores and 16 GB RAM.  I duplicated on my workstation (8
cores, 24 GB RAM).  However, since the original patch was memory
related and I noticed the repacking using all available memory, I
repeated the testcases while measuring memory but did it on a machine
that wouldn't be memory limited.

> Is it possible to share "verify-pack -v " output of the
> pack produced by 2.17.0 and 2.18.0? The only sensitive info there is
> sha-1, which you can replace with just "SHA-1" if you want. I'm more
> interested in delta sizes and distribution.

For the deltas, assuming the size-in-pack field (4th column) is the relevant one

Number of objects in repo (lines of verify-pack output): 4460755
Number of deltas: 2831384
Number of deltas greater than 1MB: 72
Min: 14
Median: 47
Mean: 586
99.999 percentile: 11366280.6 (10.8 MB)
Max: 141664210 (135.1 MB)

If the size field (3rd column) is the relevant one, then the numbers
change slightly to:

Number of deltas greater than 1MB: 101
Min: 4
Median: 33
Mean: 660
99.999 percentile: 12245551.7 (11.7 MB)
Max: 144658342 (138.0 MB)

I checked out the blob which had the biggest delta, as well as the
blob it was a delta against.  One was a 280 MB file, the other a 278
MB file.
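Statistics like the above can be pulled from `verify-pack -v` output by sorting the size column and indexing by rank; the linear interpolation below is one common convention, and Elijah's exact method isn't shown, so treat this as illustrative:

```python
def percentile(sizes, p):
    # Rank-based percentile with linear interpolation between neighbors.
    s = sorted(sizes)
    rank = (len(s) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)

sizes = [4, 33, 33, 660, 1000]   # toy data, not the real repo's numbers
median = percentile(sizes, 50)
```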


Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-19 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 07:57:37AM +0200, Duy Nguyen wrote:
> I thought 2M was generous but I was obviously wrong. I'd like to see
> the biggest delta size in this pack and whether it's still reasonable
> to increase OE_DELTA_SIZE_BITS. But if it's very close to 4GB limit
> then we could just store 64-bit delta size in a separate array.

I realize now that these two options don't have to be mutually
exclusive and I should provide a good fallback in terms of performance
anyway.

So Elijah, could you try this patch and see if performance and pack
size go back to 2.17.0? I can write proper commit message and test
support later if it proves a good improvement.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..7f3546057d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2023,10 +2023,6 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, 
max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
-   free(delta_buf);
-   return 0;
-   }
 
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
diff --git a/pack-objects.c b/pack-objects.c
index 92708522e7..191ed45faf 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -160,6 +160,8 @@ struct object_entry *packlist_alloc(struct packing_data 
*pdata,
 
if (!pdata->in_pack_by_idx)
REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
+   if (pdata->delta_size)
+   REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
}
 
new_entry = pdata->objects + pdata->nr_objects++;
@@ -177,3 +179,17 @@ struct object_entry *packlist_alloc(struct packing_data 
*pdata,
 
return new_entry;
 }
+
+void oe_prepare_delta_size_array(struct packing_data *pack)
+{
+   int i;
+
+   /*
+* nr_alloc, not nr_objects to align with realloc() strategy in
+* packlist_alloc()
+*/
+   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
+
+   for (i = 0; i < pack->nr_objects; i++)
+   pack->delta_size[i] = pack->objects[i].delta_size_;
+}
diff --git a/pack-objects.h b/pack-objects.h
index edf74dabdd..978500e474 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -14,7 +14,7 @@
  * above this limit. Don't lower it too much.
  */
 #define OE_SIZE_BITS   31
-#define OE_DELTA_SIZE_BITS 20
+#define OE_DELTA_SIZE_BITS 21
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -93,7 +93,6 @@ struct object_entry {
 * uses the same base as me
 */
unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
-   unsigned delta_size_valid:1;
unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_valid:1;
@@ -130,6 +129,7 @@ struct packing_data {
uint32_t index_size;
 
unsigned int *in_pack_pos;
+   unsigned long *delta_size;
 
/*
 * Only one of these can be non-NULL and they have different
@@ -330,20 +330,27 @@ static inline void oe_set_size(struct packing_data *pack,
 static inline unsigned long oe_delta_size(struct packing_data *pack,
  const struct object_entry *e)
 {
-   if (e->delta_size_valid)
+   if (!pack->delta_size)
return e->delta_size_;
-   return oe_size(pack, e);
+   return pack->delta_size[e - pack->objects];
 }
 
+void oe_prepare_delta_size_array(struct packing_data *pack);
 static inline void oe_set_delta_size(struct packing_data *pack,
 struct object_entry *e,
 unsigned long size)
 {
e->delta_size_ = size;
-   e->delta_size_valid = e->delta_size_ == size;
-   if (!e->delta_size_valid && size != oe_size(pack, e))
-   BUG("this can only happen in check_object() "
-   "where delta size is the same as entry size");
+   if (!pack->delta_size && e->delta_size_ == size)
+   return;
+   /*
+* We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
+* limit. delta_size_ will not be used anymore. All delta sizes are now
+* from the delta_size[] array.
+*/
+   if (!pack->delta_size)
+   oe_prepare_delta_size_array(pack);
+   pack->delta_size[e - pack->objects] = size;
 }
 
 #endif
-- 8< --
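The scheme in the patch above — keep each delta size in a narrow bitfield and migrate every value to a full-width side array the moment one size overflows — can be sketched outside of git. The following is an illustrative standalone model under invented names (`pool`, `set_size`, `get_size` are assumptions here, not the actual pack-objects code):

```c
#include <assert.h>
#include <stdlib.h>

#define SIZE_BITS 21                      /* mirrors OE_DELTA_SIZE_BITS */

struct entry {
	unsigned size_ : SIZE_BITS;       /* narrow in-struct copy */
};

struct pool {
	struct entry *entries;
	unsigned long *big_sizes;         /* NULL until some value overflows */
	size_t nr, alloc;
};

/* Store a size; spill everything to a wide side array on first overflow. */
static void set_size(struct pool *p, size_t i, unsigned long size)
{
	p->entries[i].size_ = size;       /* silently truncated to SIZE_BITS */
	if (!p->big_sizes && p->entries[i].size_ == size)
		return;                   /* still fits in the bitfield */
	if (!p->big_sizes) {
		/* first overflow: copy every existing value to the wide array */
		size_t j;
		p->big_sizes = malloc(p->alloc * sizeof(*p->big_sizes));
		if (!p->big_sizes)
			abort();
		for (j = 0; j < p->nr; j++)
			p->big_sizes[j] = p->entries[j].size_;
	}
	p->big_sizes[i] = size;
}

static unsigned long get_size(const struct pool *p, size_t i)
{
	return p->big_sizes ? p->big_sizes[i] : p->entries[i].size_;
}
```

Reads stay branch-light (a single NULL check) and memory stays small until one oversized delta appears, which matches the intent of the patch: the common case pays nothing for the rare case.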


--
Duy


Re: [PATCH] t/t5534: do not unset GIT_COMMITTER_EMAIL for other tests

2018-07-19 Thread Taylor Blau
On Thu, Jul 19, 2018 at 02:14:09PM +0200, Henning Schild wrote:
> Unsetting the varibale for good can have unwanted effects for new

s/varibale/variable

> tests added in the future It also meant we needed to hardcode the

s/future/&.

> value for "user.signingkey".
> Move the unset into a subshell, get rid of the hardcoded
> "commit...@example.com", and switch the GPG variant to using test_config
> just like GPGSM.

OK, sounds good.

> Signed-off-by: Henning Schild 
> ---
>  t/t5534-push-signed.sh | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/t/t5534-push-signed.sh b/t/t5534-push-signed.sh
> index 0cb88aa6f..f6d674156 100755
> --- a/t/t5534-push-signed.sh
> +++ b/t/t5534-push-signed.sh
> @@ -194,10 +194,12 @@ test_expect_success GPG 'fail without key and heed user.signingkey' '
>
>   EOF
>
> - unset GIT_COMMITTER_EMAIL &&
> - git config user.email hasno...@nowhere.com &&
> - test_must_fail git push --signed dst noop ff +noff &&
> - git config user.signingkey commit...@example.com &&
> + test_config user.email hasno...@nowhere.com &&
> + (
> + unset GIT_COMMITTER_EMAIL &&
> + test_must_fail git push --signed dst noop ff +noff
> + ) &&
> + test_config user.signingkey $GIT_COMMITTER_EMAIL &&
>   git push --signed dst noop ff +noff &&

I think replacing the 'git config' invocations with their 'test_config'
equivalents in its own patch would be easier to grok, but I don't think
that splitting the change out is worth a re-roll on its own.

>   (
> @@ -241,11 +243,14 @@ test_expect_success GPGSM 'fail without key and heed user.signingkey x509' '
>   E_O_F
>
>   EOF
> - unset GIT_COMMITTER_EMAIL &&
> +

Nit; remove this newline.

>   test_config user.email hasno...@nowhere.com &&
>   test_config user.signingkey "" &&
> - test_must_fail git push --signed dst noop ff +noff &&
> - test_config user.signingkey commit...@example.com &&
> + (
> + unset GIT_COMMITTER_EMAIL &&
> + test_must_fail git push --signed dst noop ff +noff
> + ) &&
> + test_config user.signingkey $GIT_COMMITTER_EMAIL &&
>   git push --signed dst noop ff +noff &&

Looks good.

>   (
> --
> 2.16.4

Thanks,
Taylor


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Sitaram Chamarty
On 07/19/2018 06:52 PM, Sitaram Chamarty wrote:
> On Thu, Jul 19, 2018 at 03:24:54PM +0300, Basin Ilya wrote:
>> Hi.
>>
>> I have two github accounts, one is for my organization and I want git to 
>> automatically choose the correct ssh `IdentityFile` based on the clone URL:
>>
>> g...@github.com:other/publicrepo.git
>>~/.ssh/id_rsa
>> g...@github.com:theorganization/privaterepo.git
>>~/.ssh/id_rsa.theorganization
>>
>> Unfortunately, both URLs have same host name, therefore I can't configure 
>> this in the ssh client config. I could create a host alias there, but 
>> sometimes somebody else gives me the github URL and I want it to work out of 
>> the box.
>>
>> I thought I could add a per-URL `core` section similar to `user` and `http`, 
>> but this section is ignored by git (2.18):
>>
>> [core "g...@github.com:theorganization"]
>> sshCommand = /bin/false
>> #sshCommand = ssh -i ~/.ssh/id_rsa.theorganization
>>
>> I thought of writing a wrapper script to deduce the key from the arguments:
>>
>> g...@github.com git-upload-pack '/theorganization/privaterepo.git'
>>
>> Is this the only option?
> 
> This is what I do (I don't have two accounts on github, but
> elsewhere; same idea though)

my apologies; I did not read your email fully and went off half-cocked!
Looks like you already tried host aliases and they don't work for you.

Sorry for the noise!


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Sitaram Chamarty
On Thu, Jul 19, 2018 at 03:24:54PM +0300, Basin Ilya wrote:
> Hi.
> 
> I have two github accounts, one is for my organization and I want git to 
> automatically choose the correct ssh `IdentityFile` based on the clone URL:
> 
> g...@github.com:other/publicrepo.git
>~/.ssh/id_rsa
> g...@github.com:theorganization/privaterepo.git
>~/.ssh/id_rsa.theorganization
> 
> Unfortunately, both URLs have same host name, therefore I can't configure 
> this in the ssh client config. I could create a host alias there, but 
> sometimes somebody else gives me the github URL and I want it to work out of 
> the box.
> 
> I thought I could add a per-URL `core` section similar to `user` and `http`, 
> but this section is ignored by git (2.18):
> 
> [core "g...@github.com:theorganization"]
> sshCommand = /bin/false
> #sshCommand = ssh -i ~/.ssh/id_rsa.theorganization
> 
> I thought of writing a wrapper script to deduce the key from the arguments:
> 
> g...@github.com git-upload-pack '/theorganization/privaterepo.git'
> 
> Is this the only option?

This is what I do (I don't have two accounts on github, but
elsewhere; same idea though)

# this goes in ~/.ssh/config

host gh1
user git
hostname github.com
identityfile ~/.ssh/id_rsa_1

host gh2
user git
hostname github.com
identityfile ~/.ssh/id_rsa_2

Now use "gh1:username/reponame" and "gh2:username/reponame" as
URLs.  It all just works.
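One wrinkle with host aliases is that URLs pasted by other people still name the canonical host. A hedged sketch (the alias name `gh2` and the organization path are taken from the example above, not prescribed) of git's `url.<base>.insteadOf` rewriting, which maps pasted URLs onto the alias automatically:

```shell
# Sketch: rewrite canonical GitHub URLs for one organization onto the
# "gh2" ssh alias, so cloning git@github.com:theorganization/x.git
# transparently uses the key configured for gh2 in ~/.ssh/config.
git config --global url."gh2:theorganization/".insteadOf \
	"git@github.com:theorganization/"

# Verify the mapping git will apply:
git config --get url.gh2:theorganization/.insteadOf
```

With this in place, out-of-the-box URLs work for the organization's repos while everything else keeps using the default key.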


Re: Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Ævar Arnfjörð Bjarmason


On Thu, Jul 19 2018, Basin Ilya wrote:

> Hi.
>
> I have two github accounts, one is for my organization and I want git to 
> automatically choose the correct ssh `IdentityFile` based on the clone URL:
>
> g...@github.com:other/publicrepo.git
>~/.ssh/id_rsa
> g...@github.com:theorganization/privaterepo.git
>~/.ssh/id_rsa.theorganization
>
> Unfortunately, both URLs have same host name, therefore I can't configure 
> this in the ssh client config. I could create a host alias there, but 
> sometimes somebody else gives me the github URL and I want it to work out of 
> the box.
>
> I thought I could add a per-URL `core` section similar to `user` and `http`, 
> but this section is ignored by git (2.18):
>
> [core "g...@github.com:theorganization"]
> sshCommand = /bin/false
> #sshCommand = ssh -i ~/.ssh/id_rsa.theorganization
>
> I thought of writing a wrapper script to deduce the key from the arguments:
>
> g...@github.com git-upload-pack '/theorganization/privaterepo.git'
>
> Is this the only option?

Yes, I had a similar problem a while ago (which I sent an RFC patch for)
which shows a script you can use:
https://public-inbox.org/git/20180103102840.27897-1-ava...@gmail.com/

It would be nice if this were configurable. Instead of the way you
suggested, it would be more general if we supported:

[Include "remote:g...@github.com:theorganization*"]
path = theorganization.config

Although I'm sure we'd have some interesting chicken & egg problems
there when it comes to bootstrapping the config parsing.
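A related mechanism already exists for path-based conditions, though not URL-based ones: conditional includes via `includeIf "gitdir:..."` (git >= 2.13). A sketch, assuming the organization's clones all live under one directory (the paths here are illustrative):

```shell
# Sketch: repositories under ~/work/theorganization/ pick up an extra
# config file via a conditional include. Paths are assumptions.
cat >>"$HOME/.gitconfig" <<'EOF'
[includeIf "gitdir:~/work/theorganization/"]
	path = ~/theorganization.config
EOF
cat >"$HOME/theorganization.config" <<'EOF'
[core]
	sshCommand = ssh -i ~/.ssh/id_rsa.theorganization
EOF
```

This sidesteps the chicken-and-egg problem because the condition is evaluated against the already-known gitdir rather than against a remote URL parsed from the config being loaded.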


Use different ssh keys for different github repos (per-url sshCommand)

2018-07-19 Thread Basin Ilya
Hi.

I have two github accounts, one is for my organization and I want git to 
automatically choose the correct ssh `IdentityFile` based on the clone URL:

g...@github.com:other/publicrepo.git
   ~/.ssh/id_rsa
g...@github.com:theorganization/privaterepo.git
   ~/.ssh/id_rsa.theorganization

Unfortunately, both URLs have the same host name, therefore I can't configure this 
in the ssh client config. I could create a host alias there, but sometimes 
somebody else gives me the github URL and I want it to work out of the box.

I thought I could add a per-URL `core` section similar to `user` and `http`, 
but this section is ignored by git (2.18):

[core "g...@github.com:theorganization"]
sshCommand = /bin/false
#sshCommand = ssh -i ~/.ssh/id_rsa.theorganization

I thought of writing a wrapper script to deduce the key from the arguments:

g...@github.com git-upload-pack '/theorganization/privaterepo.git'

Is this the only option?


Re: [PATCH] t/t5534: do not unset GIT_COMMITTER_EMAIL for other tests

2018-07-19 Thread Henning Schild
Looking at "what is cooking" I assume I should not add/fold this into
the series anymore. So it comes as a separate patch on top.

Henning

Am Thu, 19 Jul 2018 14:14:09 +0200
schrieb Henning Schild :

> Unsetting the varibale for good can have unwanted effects for new
> tests added in the future It also meant we needed to hardcode the
> value for "user.signingkey".
> Move the unset into a subshell, get rid of the hardcoded
> "commit...@example.com", and switch the GPG variant to using
> test_config just like GPGSM.
> 
> Signed-off-by: Henning Schild 
> ---
>  t/t5534-push-signed.sh | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/t/t5534-push-signed.sh b/t/t5534-push-signed.sh
> index 0cb88aa6f..f6d674156 100755
> --- a/t/t5534-push-signed.sh
> +++ b/t/t5534-push-signed.sh
> @@ -194,10 +194,12 @@ test_expect_success GPG 'fail without key and heed user.signingkey' '
>   EOF
>  
> - unset GIT_COMMITTER_EMAIL &&
> - git config user.email hasno...@nowhere.com &&
> - test_must_fail git push --signed dst noop ff +noff &&
> - git config user.signingkey commit...@example.com &&
> + test_config user.email hasno...@nowhere.com &&
> + (
> + unset GIT_COMMITTER_EMAIL &&
> + test_must_fail git push --signed dst noop ff +noff
> + ) &&
> + test_config user.signingkey $GIT_COMMITTER_EMAIL &&
>   git push --signed dst noop ff +noff &&
>  
>   (
> @@ -241,11 +243,14 @@ test_expect_success GPGSM 'fail without key and heed user.signingkey x509' '
> E_O_F
>  
>   EOF
> - unset GIT_COMMITTER_EMAIL &&
> +
>   test_config user.email hasno...@nowhere.com &&
>   test_config user.signingkey "" &&
> - test_must_fail git push --signed dst noop ff +noff &&
> - test_config user.signingkey commit...@example.com &&
> + (
> + unset GIT_COMMITTER_EMAIL &&
> + test_must_fail git push --signed dst noop ff +noff
> + ) &&
> + test_config user.signingkey $GIT_COMMITTER_EMAIL &&
>   git push --signed dst noop ff +noff &&
>  
>   (



[PATCH] t/t5534: do not unset GIT_COMMITTER_EMAIL for other tests

2018-07-19 Thread Henning Schild
Unsetting the varibale for good can have unwanted effects for new
tests added in the future It also meant we needed to hardcode the
value for "user.signingkey".
Move the unset into a subshell, get rid of the hardcoded
"commit...@example.com", and switch the GPG variant to using test_config
just like GPGSM.

Signed-off-by: Henning Schild 
---
 t/t5534-push-signed.sh | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/t/t5534-push-signed.sh b/t/t5534-push-signed.sh
index 0cb88aa6f..f6d674156 100755
--- a/t/t5534-push-signed.sh
+++ b/t/t5534-push-signed.sh
@@ -194,10 +194,12 @@ test_expect_success GPG 'fail without key and heed user.signingkey' '
 
EOF
 
-   unset GIT_COMMITTER_EMAIL &&
-   git config user.email hasno...@nowhere.com &&
-   test_must_fail git push --signed dst noop ff +noff &&
-   git config user.signingkey commit...@example.com &&
+   test_config user.email hasno...@nowhere.com &&
+   (
+   unset GIT_COMMITTER_EMAIL &&
+   test_must_fail git push --signed dst noop ff +noff
+   ) &&
+   test_config user.signingkey $GIT_COMMITTER_EMAIL &&
git push --signed dst noop ff +noff &&
 
(
@@ -241,11 +243,14 @@ test_expect_success GPGSM 'fail without key and heed user.signingkey x509' '
E_O_F
 
EOF
-   unset GIT_COMMITTER_EMAIL &&
+
test_config user.email hasno...@nowhere.com &&
test_config user.signingkey "" &&
-   test_must_fail git push --signed dst noop ff +noff &&
-   test_config user.signingkey commit...@example.com &&
+   (
+   unset GIT_COMMITTER_EMAIL &&
+   test_must_fail git push --signed dst noop ff +noff
+   ) &&
+   test_config user.signingkey $GIT_COMMITTER_EMAIL &&
git push --signed dst noop ff +noff &&
 
(
-- 
2.16.4



Antw: Re: Q: Ignore ./foo, but not script/foo

2018-07-19 Thread Ulrich Windl
>>> Sebastian Staudt  wrote on 19.07.2018 at 09:55 in
message:
> Hello Ulrich,
> 
> if you want to ignore a file in the root of the repository (and only
> there) this is the correct syntax:
> 
> /foo

Hi!

Thanks, you are perfectly right: It works, and actually, when read carefully 
enough, the last item in "PATTERN FORMAT" explains that.

Maybe the EXAMPLES could have an example for each item (5 cases) described ;-)

Regards,
Ulrich

> 
> Best regards,
> Sebastian
> Am Do., 19. Juli 2018 um 09:45 Uhr schrieb Ulrich Windl
> :
>>
>> Hi!
>>
>> I have a (simple) question I could not answer elegantly from the 
> gitignore(5) manual page:
>>
>> A project produces a "foo" binary in the root directory that I want to 
> ignore (So I put "foo" into .gitignore)
>> Unfortunately I found out that I cannot have a "script/foo" added while 
> "foo" is in .gitignore.
>> So I changed "foo" to "./foo" in .gitignore. I could add "script/foo", 
> but now "foo" is not ignored any more!
>>
>> Is there a solution other than:?
>> --
>> foo
>> !script/foo
>> !bla/foo
>> #etc.
>> --
>>
>> If "foo" is one exception to generally using foo elsewhere, it seems to be 
> counterproductive to have to add exceptions for all the cases that are not 
> exceptions, while "foo" is the only exception...
>>
>> Did I miss something? If so, maybe add it to a future manual page.
>>
>> Regards,
>> Ulrich
>>
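The behavior Sebastian describes can be checked mechanically with `git check-ignore` in a throwaway repository; a small sketch (assumes git in PATH, uses a temporary directory):

```shell
# Sketch: a leading slash anchors the pattern to the repository root,
# so "/foo" ignores ./foo but leaves script/foo trackable.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
echo "/foo" >.gitignore
mkdir script
touch foo script/foo
git check-ignore -q foo && echo "foo is ignored"
git check-ignore -q script/foo || echo "script/foo is not ignored"
```

`check-ignore` exits 0 when the path matches an ignore pattern and 1 when it does not, which makes it handy for testing .gitignore rules before committing them.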
>>






Re: [PATCH v3 3/3] commit: fix exit code for --short/--porcelain

2018-07-19 Thread Samuel Lijin
Thanks for the review.

On Tue, Jul 17, 2018 at 10:33 AM Junio C Hamano  wrote:
>
> Samuel Lijin  writes:
>
> > diff --git a/wt-status.c b/wt-status.c
> > index 75d389944..4ba657978 100644
> > --- a/wt-status.c
> > +++ b/wt-status.c
> > @@ -718,6 +718,39 @@ static void wt_status_collect_untracked(struct wt_status *s)
> >   s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
> >  }
> >
> > +static int has_unmerged(const struct wt_status *s)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < s->change.nr; i++) {
> > + struct wt_status_change_data *d;
> > + d = s->change.items[i].util;
> > + if (d->stagemask)
> > + return 1;
> > + }
> > + return 0;
> > +}
> > +
> > +static void wt_status_mark_committable(
> > + struct wt_status *s, const struct wt_status_state *state)
> > +{
> > + int i;
> > +
> > + if (state->merge_in_progress && !has_unmerged(s)) {
> > + s->committable = 1;
> > + return;
> > + }
>
> Is this trying to say:
>
> During/after a merge, if there is no higher stage entry in
> the index, we can commit.
>
> I am wondering if we also should say:
>
> During/after a merge, if there is any unresolved conflict in
> the index, we cannot commit.
>
> in which case the above becomes more like this:
>
> if (state->merge_in_progress) {
> s->committable = !has_unmerged(s);
> return;
> }
>
> But with your patch, with no remaining conflict in the index during
> a merge, the control comes here and goes into the next loop.
>
> > + for (i = 0; i < s->change.nr; i++) {
> > + struct wt_status_change_data *d = (s->change.items[i]).util;
> > +
> > + if (d->index_status && d->index_status != DIFF_STATUS_UNMERGED) {
> > + s->committable = 1;
> > + return;
> > + }
> > + }
>
> The loop seems to say "As long as there is one entry in the index
> that is not in conflict and is different from the HEAD, then we can
> commit".  Is that correct?
>
> Imagine there are two paths A and B in the branches involved in a
> merge, and A cleanly resolves (say, we take their version because
> our history did not touch it since we diverged) while B has
> conflict.  We'll come to this loop (because we are in a merge but
> have some unmerged paths) and we find that A is different from HEAD,
> happily set committable bit and return.

I'll be honest: when I wrote this, I didn't think too much about what
the code was actually doing, semantically speaking: I was assuming
that the behavior that set the committable bit in the call tree of
wt_longstatus_print() was correct, and that it was just a matter of
mechanically copying that logic over to the --short/--porcelain call
paths.

Looking into this more deeply, I think you're right, but more
problematically, this is technically a bug with the current Git code
that seems to be cancelled out by another bug: wt_status_state
apparently does not correctly reflect the state of the index when it
reaches wt_longstatus_print_updated(). Working from master
(f55ff46c9), I modified the last test in t7501 to look like this:

	echo "Initial contents, unimportant" | tee test-file1 test-file2 &&
	git add test-file1 test-file2 &&
	git commit -m "Initial commit" &&
	echo "commit-1-state" | tee test-file1 test-file2 &&
	git commit -m "commit 1" -i test-file1 test-file2 &&
	git tag commit-1 &&
	git checkout -b branch-2 HEAD^1 &&
	echo "commit-2-state" | tee test-file1 test-file2 &&
	git commit -m "commit 2" -i test-file1 test-file2 &&
	! $(git merge --no-commit commit-1) &&
	echo "commit-2-state" | tee test-file1 &&
	git add test-file1 &&
	git commit --dry-run &&
	git commit -m "conflicts fixed from merge."

And once inside gdb did this:

(gdb) b wt-status.c:766
Breakpoint 1 at 0x205d73: file wt-status.c, line 766.
(gdb) r
Starting program: /home/pockets/git/git commit --dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
On branch branch-2
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)


Breakpoint 1, wt_longstatus_print_updated (s=0x55a29960 ) at wt-status.c:766
warning: Source file is more recent than executable.
760 for (i = 0; i < s->change.nr; i++) {
(gdb) print s->change.nr
$1 = 1

Can you confirm I'm not crazy, and am analyzing this correctly?

> I _think_ with the change to "what happens during merge" above that
> I suggested, this loop automatically becomes correct, but I didn't
> think it through.  If there are ways other than .merge_in_progress
> that place conflicted entries in the index, then this loop is still
> incorrect and would want to be more like:
>
> for (i = 0; i < s->change.nr; i++) {
> struct 

Re: Q: Ignore ./foo, but not script/foo

2018-07-19 Thread Ævar Arnfjörð Bjarmason


On Thu, Jul 19 2018, Timothy Rice wrote:

>> How did you come up with this "./" syntax?
>
> It is a Unix thing: "./" or just "." refers to the current directory.
>
> When calling scripts or programs in the current directory from a Unix
> command line, it is required to refer to them as, say, "./foo" (not just
> "foo") -- unless "." is in your PATH.
>
> Most people do put "." in their PATH for convenience but it is considered a
> little unsafe [1].
>
> Personally, I am surprised that gitignore does not understand this
> notation. To me, OPs meaning was crystal clear: "./foo" should mean to only
> ignore the foo in the repository's root directory.
>
> [1] https://superuser.com/questions/156582/why-is-not-in-the-path-by-default

To clarify I was trying to fish for whether we'd accidentally documented
"./" somewhere since OP was making references to the docs.


Re: Q: Ignore ./foo, but not script/foo

2018-07-19 Thread Konstantin Khomoutov
On Thu, Jul 19, 2018 at 07:06:57PM +1000, Timothy Rice wrote:

[...]
> Most people do put "." in their PATH for convenience
[...]

IMO this is a gross overstatement: personally, I know of no person using
a Unix-like operating system who does this.



Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-19 Thread Phillip Wood
Hi Junio

On 18/07/18 18:17, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>>> (I think we had code to do so in "git am"
>>> that was rewritten in C first).
>>
>> The code in builtin/am.c doesn't try to write valid posix shell (if
>> one assumes it is the only consumer of the author script then it
>> doesn't need to) which results in simpler code, but external scripts
>> cannot safely eval it anymore.
> 
> Are you sure about that? If so we probably should see if we can fix
> the writer, and better yet, if we can share code with the writer
> discussed here, as presumably we are fixing it in this thread.
> 
> But I do not see how builtin/am.c::write_author_script() would
> produce something that would not eval correctly.  sq_quote_buf() was
> introduced specifically to write correct string for shell's
> consumption.

You're right, I'm not sure how I missed the calls to sq_quote_buf()
yesterday, sharing the am code with the sequencer would clean things up
nicely.

Best Wishes

Phillip


