Re: [PATCH] Fix branch renaming not updating HEADs correctly

2017-08-26 Thread Junio C Hamano
Thanks.  I'll merge this down to at least 'next' before I go
offline for a week.



Re: cat-file timing window on Cygwin

2017-08-26 Thread Ramsay Jones


On 26/08/17 22:11, Adam Dinwoodie wrote:
> On Sat, Aug 26, 2017 at 11:53:37AM -0700, Jeff King wrote:
>> On Sat, Aug 26, 2017 at 01:57:18AM +0100, Ramsay Jones wrote:
>>
 diff --git a/run-command.c b/run-command.c
 index 98621faca8..064ebd1995 100644
 --- a/run-command.c
 +++ b/run-command.c
 @@ -641,7 +641,6 @@ int start_command(struct child_process *cmd)
}
  
trace_argv_printf(cmd->argv, "trace: run_command:");
 -  fflush(NULL);
  
  #ifndef GIT_WINDOWS_NATIVE
  {
>>>
>>> I suspect not, but I can give it a try ...
>>>
>>> ... oh, wow, that works! Ahem. (Hmm, so it's flushing stdin?!)
>>
>> Interesting. I find it a little hard to believe there's so obvious a bug
>> as "fflush(NULL) flushes stdin", but well...that's what it seems like.
>>
>> If that's truly what it is, this is the minimal reproduction I came up
>> with:
>>
>> -- >8 --
>> #include 
>>
>> int main(void)
>> {
>>  char buf[256];
>>  while (fgets(buf, sizeof(buf), stdin)) {
>>  fprintf(stdout, "got: %s", buf);
>>  fflush(NULL);
>>  }
>>  return 0;
>> }
>> -- 8< --
>>
>> If this really is the bug, then doing something like "seq 10 | ./a.out"
>> would drop some of the input lines.
> 
> ...yep.  It does.  Specifically, I consistently only get the firsts
> line:
> 
> $ seq 10 | ./a.exe
> got: 1
> 
> $ 
> 
> If I introduce a delay between the lines of stdin (which I tested by
> just typing stdin from the keyboard), it works as expected.
> 
> Looks like this one will need to go to the Cygwin mailing list; I'll
> take it there shortly.  Thank you all for your help getting this far!

This is apparently fixed in cygwin v2.8.3 [see commit 78ade082fe,
('Revert "errno: Stop using _impure_ptr->_errno completely"', 19-07-2017),
commit 9cc89b0438 ("cygwin: Use errno instead of _impure_ptr->_errno", 
19-07-2017), commit a674199fc9 ("cygwin: Bump DLL version to 2.8.3",
19-07-2017) and commit d2ae2f00b8 ("cygwin: add fflush fix to release
notes", 19-07-2017)].

I haven't had a chance to try v2.8.3 yet (it's 3am and I'm about to
go get some sleep).

ATB,
Ramsay Jones




Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`

2017-08-26 Thread Johan Herland
On Sat, Aug 26, 2017 at 10:28 AM, Michael Haggerty  wrote:
[...]
> plenty that could be cleaned up in the area:
>
> * Make macro `GIT_NIBBLE` safer by adding some parentheses
> * Remove some dead code
> * Fix some memory leaks
> * Fix some obsolete and incorrect comments
> * Reject "notes" that are not blobs
>
> I hope the result is also easier to understand.

I looked through the series, and the patches look good to me, although
I do agree with Junio's comments on #2.

Thanks for a long-overdue cleanup in one of the hairier parts of
the notes code. The end result reads a lot better IMHO.

...Johan


Re: cat-file timing window on Cygwin

2017-08-26 Thread Adam Dinwoodie
On Sat, Aug 26, 2017 at 11:53:37AM -0700, Jeff King wrote:
> On Sat, Aug 26, 2017 at 01:57:18AM +0100, Ramsay Jones wrote:
> 
> > > diff --git a/run-command.c b/run-command.c
> > > index 98621faca8..064ebd1995 100644
> > > --- a/run-command.c
> > > +++ b/run-command.c
> > > @@ -641,7 +641,6 @@ int start_command(struct child_process *cmd)
> > >   }
> > >  
> > >   trace_argv_printf(cmd->argv, "trace: run_command:");
> > > - fflush(NULL);
> > >  
> > >  #ifndef GIT_WINDOWS_NATIVE
> > >  {
> > 
> > I suspect not, but I can give it a try ...
> > 
> > ... oh, wow, that works! Ahem. (Hmm, so it's flushing stdin?!)
> 
> Interesting. I find it a little hard to believe there's so obvious a bug
> as "fflush(NULL) flushes stdin", but well...that's what it seems like.
> 
> If that's truly what it is, this is the minimal reproduction I came up
> with:
> 
> -- >8 --
> #include 
> 
> int main(void)
> {
>   char buf[256];
>   while (fgets(buf, sizeof(buf), stdin)) {
>   fprintf(stdout, "got: %s", buf);
>   fflush(NULL);
>   }
>   return 0;
> }
> -- 8< --
> 
> If this really is the bug, then doing something like "seq 10 | ./a.out"
> would drop some of the input lines.

...yep.  It does.  Specifically, I consistently only get the firsts
line:

$ seq 10 | ./a.exe
got: 1

$ 

If I introduce a delay between the lines of stdin (which I tested by
just typing stdin from the keyboard), it works as expected.

Looks like this one will need to go to the Cygwin mailing list; I'll
take it there shortly.  Thank you all for your help getting this far!


Re: [PATCH/RFC] push: anonymize URL in error output

2017-08-26 Thread Jeff King
On Fri, Aug 25, 2017 at 12:37:43PM -0700, Brandon Williams wrote:

> > > My knee-jerk reaction is if it's worth writing after the dashes, it's
> > > worth putting in the commit message.
> > > 
> > > However, in the case I think it is OK as-is (the motivation of "we
> > > already avoid leaking auth info to stdout, so we should do the same for
> > > error messages" seems self-contained and reasonable)
> > Well, I tend to be wordy, and most of the commit messages I saw were
> > rather short, so decided to split. Wonder, if maybe example command
> > should be included without the rest of it. Would it be useful?
> 
> I'm guilty of writing short commit messages (something I need to work
> on) but when looking through logs I much prefer to see longer messages
> explaining rationals and trade-offs.

I think as with all writing, there is both "too short" and "too long".
I like having those extra bits in the commit message, too, but you have
to make sure they don't drown out the main points.

I find that when a message gets long, it often benefits from revising
and re-ordering. E.g., having an intro paragraph that explains the
motivation and solution in one sentence, and _then_ go into the details
for people who are really digging into the details. Good organization
lets readers get just the level they need and skip the rest.

Easier said than done, of course. :)

-Peff


Re: cat-file timing window on Cygwin

2017-08-26 Thread Jeff King
On Sat, Aug 26, 2017 at 01:57:18AM +0100, Ramsay Jones wrote:

> > diff --git a/run-command.c b/run-command.c
> > index 98621faca8..064ebd1995 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -641,7 +641,6 @@ int start_command(struct child_process *cmd)
> > }
> >  
> > trace_argv_printf(cmd->argv, "trace: run_command:");
> > -   fflush(NULL);
> >  
> >  #ifndef GIT_WINDOWS_NATIVE
> >  {
> 
> I suspect not, but I can give it a try ...
> 
> ... oh, wow, that works! Ahem. (Hmm, so it's flushing stdin?!)

Interesting. I find it a little hard to believe there's so obvious a bug
as "fflush(NULL) flushes stdin", but well...that's what it seems like.

If that's truly what it is, this is the minimal reproduction I came up
with:

-- >8 --
#include 

int main(void)
{
char buf[256];
while (fgets(buf, sizeof(buf), stdin)) {
fprintf(stdout, "got: %s", buf);
fflush(NULL);
}
return 0;
}
-- 8< --

If this really is the bug, then doing something like "seq 10 | ./a.out"
would drop some of the input lines.

-Peff


Re: [PATCH 02/12] load_subtree(): remove unnecessary conditional

2017-08-26 Thread Junio C Hamano
Michael Haggerty  writes:

> At this point in the code, len is *always* <= 20.

This is the kind of log message that makes me unconfortable, as it
lacks "because", and the readers would need to find out themselves
by following the same codepath the patch author already followed.

There is an assert earlier before the control gets in this loop

prefix_len = subtree->key_oid.hash[KEY_INDEX];
assert(prefix_len * 2 >= n);
memcpy(object_oid.hash, subtree->key_oid.hash, prefix_len);

that tries to ensure there is sufficient number of prefix defined in
that key, and the codeflow may ensure that prefix_len is both an
even number and shorter than 20 (the correctness of the code depends
on these, it seems, and if for some reason prefix_len is much
larger, calls to get_oid_hex_segment() will overflow the oid.hash[]
array without checking).  I'd at least feel safer to have an assert
next to the existing one that catches a bug to throw a randomly
large value into subtree->key_oid.hash[KEY_INDEX].  Then we can
safely say "at this point in the code, len is always <= 20", as that
assert will makes it obvious without looking at anything other than
this code and get_oid_hex_segment() implementaiton (combined with
the fact that this function is the only one that coerces len and
puts it into ->key_oid.hash[KEY_INDEX], but that is a weak assurance
as we cannot tell where "subtree" came from---it may have full
20-byte oid in its key_oid field---without following the callchain a
lot more widely).

> Signed-off-by: Michael Haggerty 
> ---
>  notes.c | 35 +--
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/notes.c b/notes.c
> index 00630a9396..f7ce64ff48 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -446,25 +446,24 @@ static void load_subtree(struct notes_tree *t, struct 
> leaf_node *subtree,
>* If object SHA1 is incomplete (len < 20), and current
>* component consists of 2 hex chars, assume note subtree
>*/
> - if (len <= GIT_SHA1_RAWSZ) {
> - type = PTR_TYPE_NOTE;
> - l = (struct leaf_node *)
> - xcalloc(1, sizeof(struct leaf_node));
> - oidcpy(>key_oid, _oid);
> - oidcpy(>val_oid, entry.oid);
> - if (len < GIT_SHA1_RAWSZ) {
> - if (!S_ISDIR(entry.mode) || path_len != 2)
> - goto handle_non_note; /* not subtree */
> - l->key_oid.hash[KEY_INDEX] = (unsigned char) 
> len;
> - type = PTR_TYPE_SUBTREE;
> - }
> - if (note_tree_insert(t, node, n, l, type,
> -  combine_notes_concatenate))
> - die("Failed to load %s %s into notes tree "
> - "from %s",
> - type == PTR_TYPE_NOTE ? "note" : "subtree",
> - oid_to_hex(>key_oid), t->ref);
> + type = PTR_TYPE_NOTE;
> + l = (struct leaf_node *)
> + xcalloc(1, sizeof(struct leaf_node));
> + oidcpy(>key_oid, _oid);
> + oidcpy(>val_oid, entry.oid);
> + if (len < GIT_SHA1_RAWSZ) {
> + if (!S_ISDIR(entry.mode) || path_len != 2)
> + goto handle_non_note; /* not subtree */
> + l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
> + type = PTR_TYPE_SUBTREE;
>   }
> + if (note_tree_insert(t, node, n, l, type,
> +  combine_notes_concatenate))
> + die("Failed to load %s %s into notes tree "
> + "from %s",
> + type == PTR_TYPE_NOTE ? "note" : "subtree",
> + oid_to_hex(>key_oid), t->ref);
> +
>   continue;
>  
>  handle_non_note:


Re: cat-file timing window on Cygwin

2017-08-26 Thread Adam Dinwoodie
On Sat, Aug 26, 2017 at 01:57:18AM +0100, Ramsay Jones wrote:
> On 25/08/17 16:08, Jeff King wrote:
> > On Fri, Aug 25, 2017 at 12:25:29PM +0100, Adam Dinwoodie wrote:
> > 
> >> As of v2.10.0-rc1-4-g321459439 ("cat-file: support --textconv/--filters
> >> in batch mode"), t8010-cat-file-filters.sh has been failing on Cygwin.
> >> Digging into this, the test looks to expose a timing window: it appears
> >> that if `git cat-file --textconv --batch` receives input on stdin too
> >> quickly, it fails to parse some of that input.
> 
> This has not been failing since the above commit (when this
> feature was first introduced), but (for me) when testing the
> v2.13.0-rc0 build. Please refer to my original email:
> 
> https://public-inbox.org/git/bf7db655-d90f-e711-afc8-6565b7137...@ramsayjones.plus.com/
> 
> ... where I was reasonably sure this was caused by a change in
> the cygwin dll while upgrading from 2.7.0-1 -> 2.8.0-1.

The failure started at some point when I wasn't paying much attention to
the state of Cygwin or Git due to personal health stuff.  Now I'm
catching up with the backlog from that period, I bisected the Git builds
to get to the commit referenced above, but it's entirely plausible the
behaviour change is in the Cygwin DLL and is merely exposed by the test
that commit added.

> > Hmm. That commit doesn't seem to actually touch the stdin loop. That
> > happens via strbuf_getline(), which should be pretty robust to timing.
> > But here are a few random thoughts:
> > 
> >   1. Do you build with HAVE_GETDELIM? Does toggling it have any effect?
> 
> If Adam builds using the configure script, then yes, else no. ;-)
> I never use configure, so I don't have HAVE_GETDELIM set.

I do use the configure script and do run with HAVE_GETDELIM set.  As you
might expect given that Ramsay doesn't and they see the same behaviour,
toggling it makes no difference to whether t8010.8 passes or fails.

> >   2. Does the problem happen only with --textconv?
> > 
> >  If so, I wonder if spawning the child process is somehow draining
> >  stdin. I don't see how the child would accidentally read from our
> >  stdin; we replace its stdin with a pipe so it shouldn't get a
> >  chance to see our descriptor at all.
> 
> As I said in my previous email, "the loop in batch_objects()
> (builtin/cat-file.c lines #489-505) only reads one line from
> stdin, then gets EOF on the stream."

Testing now, removing --textconv does seem to make a difference.  With
the following diff, t8010 passes:

diff --git a/t/t8010-cat-file-filters.sh b/t/t8010-cat-file-filters.sh
index d8242e467..7c85bef9b 100755
--- a/t/t8010-cat-file-filters.sh
+++ b/t/t8010-cat-file-filters.sh
@@ -54,9 +54,9 @@
 test_expect_success 'cat-file --textconv --batch works' '
sha1=$(git rev-parse -q --verify HEAD:world.txt) &&
test_config diff.txt.textconv "tr A-Za-z N-ZA-Mn-za-m <" &&
-   printf "%s hello.txt\n%s hello\n" $sha1 $sha1 |
-   git cat-file --textconv --batch >actual &&
-   printf "%s blob 6\nuryyb\r\n\n%s blob 6\nhello\n\n" \
+   printf "%s\n%s\n" $sha1 $sha1 |
+   git cat-file --batch >actual &&
+   printf "%s blob 6\nhello\n\n%s blob 6\nhello\n\n" \
$sha1 $sha1 >expect &&
test_cmp expect actual
 '

(I am mildly baffled by the necessary change in line endings in the
`expect` file, but that's the output that was being produced, and I
actually think the single CRLF line ending in the file that otherwise
uses LF in the original test is the oddity here.)

A similar test using my live version of Git also works as expected:

$ { echo $sha1 ; echo $sha1 ; } | git cat-file --batch
ce013625030ba8dba906f756967f9e9ca394464a blob 6
hello

ce013625030ba8dba906f756967f9e9ca394464a blob 6
hello

> >  If we accidentally called fflush(stdin) that would discard buffered
> >  input and give the results you're seeing. We don't do that
> >  directly, of course, but we do call fflush(NULL) in run-command
> >  before spawning the child. That _should_ just flush output handles,
> >  but it's possible that there's a cygwin bug, I guess.
> 
> I suspect so, see previous email ...
> 
> > I can't reproduce here on Linux. But does the patch below have any
> > impact?
> > 
> > diff --git a/run-command.c b/run-command.c
> > index 98621faca8..064ebd1995 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -641,7 +641,6 @@ int start_command(struct child_process *cmd)
> > }
> >  
> > trace_argv_printf(cmd->argv, "trace: run_command:");
> > -   fflush(NULL);
> >  
> >  #ifndef GIT_WINDOWS_NATIVE
> >  {
> 
> I suspect not, but I can give it a try ...
> 
> ... oh, wow, that works! Ahem. (Hmm, so it's flushing stdin?!)
> 
> Also, t5526-fetch-submodules.sh still works as well (see commit
> 13af8cbd6a "start_command: flush buffers in the WIN32 code path
> as well", 04-02-2011).

Likewise, this gets t8010 passing for me.  I've run through the entire
test 

git describe and "the smallest number of commits possible"

2017-08-26 Thread Kévin Le Gouguec
Hi,

I've asked this question on the git-users Google Groups list[1], and
while the answers there were interesting, I still cannot figure
whether my problem comes from an actual bug, a misleading manpage, my
inability to understand the code and/or the manpage, or a combination
of the three.

I noticed this problem on 2.1.4 (Debian oldstable); I can reproduce it
on next (7ef03bb281b2220d0f2414365e4949beb2337042). Quoting
git-describe(1)'s SEARCH STRATEGY section:

> If multiple tags were found during the walk then the tag which has
> the fewest commits different from the input commit-ish will be
> selected and output. Here fewest commits different is defined as the
> number of commits which would be shown by `git log tag..input` will
> be the smallest number of commits possible.

To put it shortly, after cloning GNU Emacs's repository[2]:

$ git describe --tags
emacs-25.1-129847-gdcc3ef3ee7
$ git log --oneline emacs-25.1.. | wc -l
5126
$ git log --oneline emacs-25.2.. | wc -l
4793

If I am reading it correctly, the manpage suggests that emacs-25.2
should be picked in this situation ("log emacs-25.2.." shows fewer
commits than "log emacs-25.1..").

Once more with --debug:

$ git describe --debug --tags
searching to describe HEAD
 lightweight   129847 emacs-25.1
 lightweight 4086 emacs-25.1-rc2
 lightweight 4126 emacs-25.1-rc1
 annotated   4185 emacs-25.2
 annotated   4220 emacs-25.2-rc2
 lightweight 4226 emacs-25.0.95
 annotated   4236 emacs-25.2-rc1
 annotated   4280 emacs-25.1.91
 lightweight 4305 emacs-25.0.94
 lightweight 4329 emacs-25.1.90
traversed 130257 commits
more than 10 tags found; listed 10 most recent
gave up search at 5c587fdff164e8b90beb47f6da64b4884290e40a
emacs-25.1-129847-gdcc3ef3ee7

I tried to get a sense of what builtin/describe.c is doing (see [1]
for some debug printfs); to summarize what I figured:

- When QSORT() is called in describe(), emacs-25.1's depth is smaller
  than emacs-25.2's.

- finish_depth_computation() updates the best candidate's depth; AFAIU
  this update's only purpose is to make the displayed suffix more
  accurate.

That is all I have right now. I apologize for failing to come up with
a simpler test case (I tried to make toy repositories with a similar
topology to reproduce the issue, to no avail). To conclude, as far as
I can tell, one of the following holds:

- something about this repository[3] causes git-describe(1) to not
  work as advertised;

- I fail at reading manuals.



[1]: https://groups.google.com/forum/?fromgroups#!topic/git-users/tSnX-O-3aNI

[2]: https://git.savannah.gnu.org/git/emacs.git

[3]: The project's workflow sounds straightforward:

- development happens mainly on the master branch;
- the emacs-25 branch receives maintenance fixes and release tags; it
  is periodically merged back into master;
- experimental work can happen on scratch branches; these may
  eventually be merged back into master.

There are some complications (e.g. pull-induced merges) but if
I --simplify-by-decoration I find that the repository's topology
matches this description.


Re: [PATCH 1/2] refs/files-backend: duplicate strings added to affected_refnames

2017-08-26 Thread Martin Ågren
On 25 August 2017 at 23:00, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> files_transaction_prepare() and the functions it calls add strings to a
>> string list without duplicating them, i.e., we keep the original raw
>> pointers we were given. That is "ok", since we keep them only for a
>> short-enough time, but we end up leaking some of them.
>
> Sorry, but I do not understand this statement.  If affected_refnames
> string list borrows strings from other structures who own them, and
> none of these strings are freed by clearing affected_refnames list,
> that is not "leaking"---we didn't acquire the ownership, so it is
> not our job to free them in the first place.  Among the original
> owners of strings we borrow from, some may not properly free, in
> which case that is a leak.
>
> What problem are you solving?

Sorry. Maybe this explains my intentions better:

In lock_ref_for_update(), we populate a strbuf "referent" through
lock_raw_ref(). If we don't have a symref, we don't use "referent"
for anything (and it won't have allocated any memory). Otherwise, we
hand over referent.buf to someone who uses it immediately
(refs_read_ref_full) or to someone who holds on to the pointer
(split_symref_update ends up adding it to a string list). Therefore,
at the end of lock_ref_for_update() we can't unconditionally release
the strbuf, so we end up leaking it.

We could release the strbuf when we know that it's safe (possibly
also only when we know that it's needed). Instead, in preparation
for the next patch, make the string list not hold on to the raw
pointers, i.e., make it duplicate the strings on insertion and
manage its own resources.

Of course, the pointer-keeping and free-avoidance might be by design
and/or wanted, e.g., to avoid excessive mallocing and freeing. I admit
to not knowing what is a realistic number of iterations in the loop that
calls lock_ref_for_update, i.e., how severe this leak might be. Maybe
the "backend" nature of this code does not necessarily imply "this could
be called any number of times throughout the process' lifetime".


Loan

2017-08-26 Thread CAPITAL FINANCE IN
UNSECURED BUSINESS/PERSONAL LOAN BY LOAN CAPITAL FINANCE
  - NO COLLATERAL
  - MINIMUM DOCUMENTATION
  - BUSINESS LOAN UP TO FIVE(5) MILLION US DOLLARS

  CONTACT US TODAY VIA EMAIL: finance.incapi...@hotmail.com


Re: git signed push server-side [and 3 more messages]

2017-08-26 Thread Ian Jackson
Hi.  Thanks to both of you for your helpful comments.

Jonathan Nieder writes ("Re: git signed push server-side"):
> Ian Jackson wrote[1]:
> > 2. git-receive-pack calls gpg (Debian #852684)
> >
> > It would be better if it called gpgv.
...
> think respecting gpg.program would be nicer.  Is there a reason not
> to do that?

I think it very unlikely that anyone would want git-receive-pack's
signed push facility to end up running gpg rather than gpgv.  But, the
git-receive-pack functions here are building blocks which need quite a
lot of extra work to use, anyway.  So having all callers have to pass
-c gpg.program would be quite tolerable, if a bit ugly.

> I suspect receive-pack just forgot to call git_gpg_config.

So, I will send a patch to fix that.

> > 3. No way to specify keyring (Debian #852684, side note)
> >
> > There should be a way to specify the keyring used by
> > git-receive-pack's gpgv invocation.  This should probably be done with
> > a config option, receive.certKeyring perhaps.
> 
> How is the keyring configured for other commands that use GPG, like
> "git tag -v"?  (Forgive my laziness in not looking it up.)

It's not.  You just get whatever keyring and trust db etc. your gpg
has as the default.  As Junio says, being able to set the keyring
explicitly is not essential to use the signed push feature; one can
make a wrapper for gpgv, or set GNUPGHOME.

It just seemed to me that the current interface is not very
convenient.  If it is controversial to provide a more convenient
interface, then I will work with what there is.

> > 4. Trouble with the nonce (Debian #852688 part 2)
...
> I also wonder why you say the git configuration system is unsuited to
> keeping secrets.  E.g. passing an include.path setting with -c or
> GIT_CONFIG_PARAMETERS should avoid the kinds of trouble you described.

I was not aware of include.path.  That would solve this aspect of the
problem (but not the rollover aspect, of course).
GIT_CONFIG_PARAMETERS is not documented.

> > (ii) At some later point, the following enhancement: When
> > !stateless_rpc, certNonceSeedsFile is ignored except that if neither
> > it nor the old certNonceSeed is set, the signed push feature is
> > disabled.
> 
> That seems like an awkward interface.  Shouldn't there be at least
> another config variable to enable signed push without making up a seed
> or filename?

Perhaps.  I don't have a strong opinion about the config parameter
names and semantics.  This seems like a matter of taste and I'm happy
to just do whatever others want.

> >In this state we always get a fresh nonce (from a suitable
> > system random source).
> 
> How does this work with stateless_rpc?  (See "Session State" in
> Documentation/technical/http-protocol.txt.)

It doesn't, which is why I propose this only if !stateless_rpc.

> > Nontrivial because current git doesn't seem to
> > have a "get suitable random number" function, and the mess that is the
> > semantics of /dev/*random* files means that providing one is going to
> > be controversial.
> 
> I think you're overestimating how much pushback adding such a thing
> would get.

Well, I'm happy to give it ago.  (NB: I will use /dev/urandom on
Linux.)

> More references:
> 
> - JGit's push cert handling:
>   https://git.eclipse.org/r/#/q/message:cert

Really helpful, thanks.  I will read these.  (Shame that I can't even
view that information without running their javascript!)

I ended up reading this
 
https://github.com/sitaramc/gitolite/blob/cf062b8bb6b21a52f7c5002d33fbc950762c1aa7/contrib/hooks/repo-specific/save-push-signatures

> - Gerrit's push cert handling:
>   https://gerrit-review.googlesource.com/q/project:gerrit+message:gpg

(This is an empty page for me, even with javascript turned on.)

I'll probably implement the refs/meta/push-certs:REF@{cert} special
branch thing from JGit.  Can anyone point me at a public repo which
has such push-certs recorded, so I can check the details and make my
thing do stuff the same way ?

Junio C Hamano writes ("Re: git signed push server-side"):
> Jonathan Nieder  writes:
> > Ian Jackson wrote[1]:
> >> Proposed change: provide the full fingerprint instead.  Do this
> >> for every caller of gpg-interface.c.
> >
> > Sounds sane.
> 
> I probably should react a bit stronger against that "instead", as
> Ian will not be writing the world's first server side hook that uses
> this interface.

Anyone who is verifying signatures and doing anything with the
16-character key id other than logging it, has a serious security bug.

> But on the other hand, the value of this environment is not meant to
> be used to make decision by the hook anyway, so it perhaps is OK to
> change it in a backward incompatible way to break those who have
> been using the value for any serious purpose.

Precisely.

> The purpose of the signed push is not to replace authentication and
> authorization.  The primary goal behind the signed push mechanism 

[PATCH 12/12] load_subtree(): declare some variables to be `size_t`

2017-08-26 Thread Michael Haggerty
* `prefix_len`
* `path_len`
* `i`

It's good hygiene.

Signed-off-by: Michael Haggerty 
---
 notes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/notes.c b/notes.c
index d5409b55e3..7f5bfa19c7 100644
--- a/notes.c
+++ b/notes.c
@@ -406,7 +406,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
struct int_node *node, unsigned int n)
 {
struct object_id object_oid;
-   unsigned int prefix_len;
+   size_t prefix_len;
void *buf;
struct tree_desc desc;
struct name_entry entry;
@@ -422,7 +422,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
while (tree_entry(, )) {
unsigned char type;
struct leaf_node *l;
-   int path_len = strlen(entry.path);
+   size_t path_len = strlen(entry.path);
 
if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
/* This is potentially the remainder of the SHA-1 */
@@ -486,7 +486,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
{
struct strbuf non_note_path = STRBUF_INIT;
const char *q = oid_to_hex(>key_oid);
-   int i;
+   size_t i;
for (i = 0; i < prefix_len; i++) {
strbuf_addch(_note_path, *q++);
strbuf_addch(_note_path, *q++);
-- 
2.11.0



[PATCH 05/12] load_subtree(): separate logic for internal vs. terminal entries

2017-08-26 Thread Michael Haggerty
There are only two legitimate notes path components:

* A hexadecimal string that fills the rest of the SHA-1

* A two-digit hexadecimal string that constitutes another internal
  node.

So handle those two cases at the top level, and reject others as
non-notes without trying to parse them. The logic separation also
simplifies upcoming changes.

This prevents us from leaking memory for a leaf_node in the case of
wrong-sized paths. There are still memory leaks in this code; they will
be fixed in upcoming commits.

Signed-off-by: Michael Haggerty 
---
 notes.c | 52 +++-
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/notes.c b/notes.c
index 62ab3f4ce3..768902055e 100644
--- a/notes.c
+++ b/notes.c
@@ -433,30 +433,40 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
while (tree_entry(, )) {
unsigned char type;
struct leaf_node *l;
-   int len, path_len = strlen(entry.path);
+   int path_len = strlen(entry.path);
+
+   if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
+   /* This is potentially the remainder of the SHA-1 */
+   if (get_oid_hex_segment(entry.path, path_len,
+   object_oid.hash + prefix_len,
+   GIT_SHA1_RAWSZ - prefix_len) < 
0)
+   goto handle_non_note; /* entry.path is not a 
SHA1 */
+
+   type = PTR_TYPE_NOTE;
+   l = (struct leaf_node *)
+   xcalloc(1, sizeof(struct leaf_node));
+   oidcpy(>key_oid, _oid);
+   oidcpy(>val_oid, entry.oid);
+   } else if (path_len == 2) {
+   /* This is potentially an internal node */
+   if (get_oid_hex_segment(entry.path, 2,
+   object_oid.hash + prefix_len,
+   GIT_SHA1_RAWSZ - prefix_len) < 
0)
+   goto handle_non_note; /* entry.path is not a 
SHA1 */
 
-   len = get_oid_hex_segment(entry.path, path_len,
-   object_oid.hash + prefix_len, GIT_SHA1_RAWSZ - 
prefix_len);
-   if (len < 0)
-   goto handle_non_note; /* entry.path is not a SHA1 */
-   len += prefix_len;
-
-   /*
-* If object SHA1 is complete (len == 20), assume note object
-* If object SHA1 is incomplete (len < 20), and current
-* component consists of 2 hex chars, assume note subtree
-*/
-   type = PTR_TYPE_NOTE;
-   l = (struct leaf_node *)
-   xcalloc(1, sizeof(struct leaf_node));
-   oidcpy(>key_oid, _oid);
-   oidcpy(>val_oid, entry.oid);
-   if (len < GIT_SHA1_RAWSZ) {
-   if (!S_ISDIR(entry.mode) || path_len != 2)
-   goto handle_non_note; /* not subtree */
-   l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
type = PTR_TYPE_SUBTREE;
+   l = (struct leaf_node *)
+   xcalloc(1, sizeof(struct leaf_node));
+   oidcpy(>key_oid, _oid);
+   oidcpy(>val_oid, entry.oid);
+   if (!S_ISDIR(entry.mode))
+   goto handle_non_note; /* not subtree */
+   l->key_oid.hash[KEY_INDEX] = (unsigned char) 
(prefix_len + 1);
+   } else {
+   /* This can't be part of a note */
+   goto handle_non_note;
}
+
if (note_tree_insert(t, node, n, l, type,
 combine_notes_concatenate))
die("Failed to load %s %s into notes tree "
-- 
2.11.0



[PATCH 09/12] load_subtree(): combine some common code

2017-08-26 Thread Michael Haggerty
Write the length into `object_oid` (before copying) rather than
`l->key_oid` (after copying). Then combine some code from the two `if`
blocks.

Signed-off-by: Michael Haggerty 
---
 notes.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/notes.c b/notes.c
index 6ce71bfedb..534fda007e 100644
--- a/notes.c
+++ b/notes.c
@@ -447,10 +447,6 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
type = PTR_TYPE_NOTE;
-   l = (struct leaf_node *)
-   xcalloc(1, sizeof(struct leaf_node));
-   oidcpy(>key_oid, _oid);
-   oidcpy(>val_oid, entry.oid);
} else if (path_len == 2) {
/* This is potentially an internal node */
 
@@ -463,17 +459,17 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
GIT_SHA1_RAWSZ - prefix_len))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
+   object_oid.hash[KEY_INDEX] = (unsigned char) 
(prefix_len + 1);
+
type = PTR_TYPE_SUBTREE;
-   l = (struct leaf_node *)
-   xcalloc(1, sizeof(struct leaf_node));
-   oidcpy(>key_oid, _oid);
-   oidcpy(>val_oid, entry.oid);
-   l->key_oid.hash[KEY_INDEX] = (unsigned char) 
(prefix_len + 1);
} else {
/* This can't be part of a note */
goto handle_non_note;
}
 
+   l = xcalloc(1, sizeof(*l));
+   oidcpy(>key_oid, _oid);
+   oidcpy(>val_oid, entry.oid);
if (note_tree_insert(t, node, n, l, type,
 combine_notes_concatenate))
die("Failed to load %s %s into notes tree "
-- 
2.11.0



[PATCH 10/12] get_oid_hex_segment(): don't pad the rest of `oid`

2017-08-26 Thread Michael Haggerty
Remove the feature of `get_oid_hex_segment()` that it pads the rest of
the `oid` argument with zeros. Instead, do this at the caller who
needs it.

This makes the functionality of this function more coherent and
removes the need for its `oid_len` argument.

Signed-off-by: Michael Haggerty 
---
 notes.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/notes.c b/notes.c
index 534fda007e..ce9ba36179 100644
--- a/notes.c
+++ b/notes.c
@@ -339,15 +339,14 @@ static void note_tree_free(struct int_node *tree)
  * - hex  - Partial SHA1 segment in ASCII hex format
  * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
  * - oid  - Partial SHA1 value is written here
- * - oid_len  - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
  * Return 0 on success or -1 on error (invalid arguments or input not
- * in hex format). Pad oid with NULs up to oid_len.
+ * in hex format).
  */
 static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
-   unsigned char *oid, unsigned int oid_len)
+   unsigned char *oid)
 {
unsigned int i, len = hex_len >> 1;
-   if (hex_len % 2 != 0 || len > oid_len)
+   if (hex_len % 2 != 0)
return -1;
for (i = 0; i < len; i++) {
unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
@@ -356,8 +355,6 @@ static int get_oid_hex_segment(const char *hex, unsigned 
int hex_len,
*oid++ = val;
hex += 2;
}
-   for (; i < oid_len; i++)
-   *oid++ = 0;
return 0;
 }
 
@@ -442,24 +439,29 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
goto handle_non_note;
 
if (get_oid_hex_segment(entry.path, path_len,
-   object_oid.hash + prefix_len,
-   GIT_SHA1_RAWSZ - prefix_len))
+   object_oid.hash + prefix_len))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
type = PTR_TYPE_NOTE;
} else if (path_len == 2) {
/* This is potentially an internal node */
+   size_t len = prefix_len;
 
if (!S_ISDIR(entry.mode))
/* internal nodes must be trees */
goto handle_non_note;
 
if (get_oid_hex_segment(entry.path, 2,
-   object_oid.hash + prefix_len,
-   GIT_SHA1_RAWSZ - prefix_len))
+   object_oid.hash + len++))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
-   object_oid.hash[KEY_INDEX] = (unsigned char) 
(prefix_len + 1);
+   /*
+* Pad the rest of the SHA-1 with zeros,
+* except for the last byte, where we write
+* the length:
+*/
+   memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ - len - 
1);
+   object_oid.hash[KEY_INDEX] = (unsigned char)len;
 
type = PTR_TYPE_SUBTREE;
} else {
-- 
2.11.0



[PATCH 07/12] load_subtree(): only consider blobs to be potential notes

2017-08-26 Thread Michael Haggerty
The old code converted any entry whose path constituted a full SHA-1
as a leaf node, without regard for the type of the entry. But only
blobs can be notes. So treat entries whose paths *look like* notes
paths but that are not blobs as non-notes.

Signed-off-by: Michael Haggerty 
---
 notes.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/notes.c b/notes.c
index ac69c5aa18..46ab15b83a 100644
--- a/notes.c
+++ b/notes.c
@@ -437,6 +437,11 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
 
if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
/* This is potentially the remainder of the SHA-1 */
+
+   if (!S_ISREG(entry.mode))
+   /* notes must be blobs */
+   goto handle_non_note;
+
if (get_oid_hex_segment(entry.path, path_len,
object_oid.hash + prefix_len,
GIT_SHA1_RAWSZ - prefix_len) < 
0)
-- 
2.11.0



[PATCH 02/12] load_subtree(): remove unnecessary conditional

2017-08-26 Thread Michael Haggerty
At this point in the code, len is *always* <= 20.

Signed-off-by: Michael Haggerty 
---
 notes.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/notes.c b/notes.c
index 00630a9396..f7ce64ff48 100644
--- a/notes.c
+++ b/notes.c
@@ -446,25 +446,24 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
 * If object SHA1 is incomplete (len < 20), and current
 * component consists of 2 hex chars, assume note subtree
 */
-   if (len <= GIT_SHA1_RAWSZ) {
-   type = PTR_TYPE_NOTE;
-   l = (struct leaf_node *)
-   xcalloc(1, sizeof(struct leaf_node));
-   oidcpy(>key_oid, _oid);
-   oidcpy(>val_oid, entry.oid);
-   if (len < GIT_SHA1_RAWSZ) {
-   if (!S_ISDIR(entry.mode) || path_len != 2)
-   goto handle_non_note; /* not subtree */
-   l->key_oid.hash[KEY_INDEX] = (unsigned char) 
len;
-   type = PTR_TYPE_SUBTREE;
-   }
-   if (note_tree_insert(t, node, n, l, type,
-combine_notes_concatenate))
-   die("Failed to load %s %s into notes tree "
-   "from %s",
-   type == PTR_TYPE_NOTE ? "note" : "subtree",
-   oid_to_hex(>key_oid), t->ref);
+   type = PTR_TYPE_NOTE;
+   l = (struct leaf_node *)
+   xcalloc(1, sizeof(struct leaf_node));
+   oidcpy(>key_oid, _oid);
+   oidcpy(>val_oid, entry.oid);
+   if (len < GIT_SHA1_RAWSZ) {
+   if (!S_ISDIR(entry.mode) || path_len != 2)
+   goto handle_non_note; /* not subtree */
+   l->key_oid.hash[KEY_INDEX] = (unsigned char) len;
+   type = PTR_TYPE_SUBTREE;
}
+   if (note_tree_insert(t, node, n, l, type,
+combine_notes_concatenate))
+   die("Failed to load %s %s into notes tree "
+   "from %s",
+   type == PTR_TYPE_NOTE ? "note" : "subtree",
+   oid_to_hex(>key_oid), t->ref);
+
continue;
 
 handle_non_note:
-- 
2.11.0



[PATCH 11/12] hex_to_bytes(): simpler replacement for `get_oid_hex_segment()`

2017-08-26 Thread Michael Haggerty
Now that `get_oid_hex_segment()` does less, it makes sense to rename
it and simplify its semantics:

* Instead of a `hex_len` parameter, which was the number of hex
  characters (and had to be even), use a `len` parameter, which is the
  number of resulting bytes. This removes then need for the check that
  `hex_len` is even and to divide it by two to determine the number of
  bytes. For good hygiene, declare the `len` parameter to be `size_t`
  instead of `unsigned int`.

* Change the order of the arguments to the more traditional (dst,
  src, len).

* Rename the function to `hex_to_bytes()`.

* Remove a loop variable: just count `len` down instead.

Signed-off-by: Michael Haggerty 
---
 notes.c | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/notes.c b/notes.c
index ce9ba36179..d5409b55e3 100644
--- a/notes.c
+++ b/notes.c
@@ -335,25 +335,18 @@ static void note_tree_free(struct int_node *tree)
 }
 
 /*
- * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
- * - hex  - Partial SHA1 segment in ASCII hex format
- * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
- * - oid  - Partial SHA1 value is written here
- * Return 0 on success or -1 on error (invalid arguments or input not
- * in hex format).
+ * Read `len` pairs of hexadecimal digits from `hex` and write the
+ * values to `binary` as `len` bytes. Return 0 on success, or -1 if
+ * the input does not consist of hex digits).
  */
-static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
-   unsigned char *oid)
+static int hex_to_bytes(unsigned char *binary, const char *hex, size_t len)
 {
-   unsigned int i, len = hex_len >> 1;
-   if (hex_len % 2 != 0)
-   return -1;
-   for (i = 0; i < len; i++) {
+   for (; len; len--, hex += 2) {
unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+
if (val & ~0xff)
return -1;
-   *oid++ = val;
-   hex += 2;
+   *binary++ = val;
}
return 0;
 }
@@ -438,8 +431,8 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
/* notes must be blobs */
goto handle_non_note;
 
-   if (get_oid_hex_segment(entry.path, path_len,
-   object_oid.hash + prefix_len))
+   if (hex_to_bytes(object_oid.hash + prefix_len, 
entry.path,
+GIT_SHA1_RAWSZ - prefix_len))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
type = PTR_TYPE_NOTE;
@@ -451,8 +444,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
/* internal nodes must be trees */
goto handle_non_note;
 
-   if (get_oid_hex_segment(entry.path, 2,
-   object_oid.hash + len++))
+   if (hex_to_bytes(object_oid.hash + len++, entry.path, 
1))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
/*
-- 
2.11.0



[PATCH 08/12] get_oid_hex_segment(): return 0 on success

2017-08-26 Thread Michael Haggerty
Nobody cares about the return value of get_oid_hex_segment() except to
check whether it failed. So just return 0 on success.

And while we're updating its docstring, update it for some argument
renaming that happened a while ago.

Signed-off-by: Michael Haggerty 
---
 notes.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/notes.c b/notes.c
index 46ab15b83a..6ce71bfedb 100644
--- a/notes.c
+++ b/notes.c
@@ -338,11 +338,10 @@ static void note_tree_free(struct int_node *tree)
  * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
  * - hex  - Partial SHA1 segment in ASCII hex format
  * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
- * - sha1 - Partial SHA1 value is written here
- * - sha1_len - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
- * Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format)).
- * Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
- * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ * - oid  - Partial SHA1 value is written here
+ * - oid_len  - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
+ * Return 0 on success or -1 on error (invalid arguments or input not
+ * in hex format). Pad oid with NULs up to oid_len.
  */
 static int get_oid_hex_segment(const char *hex, unsigned int hex_len,
unsigned char *oid, unsigned int oid_len)
@@ -359,7 +358,7 @@ static int get_oid_hex_segment(const char *hex, unsigned 
int hex_len,
}
for (; i < oid_len; i++)
*oid++ = 0;
-   return len;
+   return 0;
 }
 
 static int non_note_cmp(const struct non_note *a, const struct non_note *b)
@@ -444,7 +443,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
 
if (get_oid_hex_segment(entry.path, path_len,
object_oid.hash + prefix_len,
-   GIT_SHA1_RAWSZ - prefix_len) < 
0)
+   GIT_SHA1_RAWSZ - prefix_len))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
type = PTR_TYPE_NOTE;
@@ -461,7 +460,7 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
 
if (get_oid_hex_segment(entry.path, 2,
object_oid.hash + prefix_len,
-   GIT_SHA1_RAWSZ - prefix_len) < 
0)
+   GIT_SHA1_RAWSZ - prefix_len))
goto handle_non_note; /* entry.path is not a 
SHA1 */
 
type = PTR_TYPE_SUBTREE;
-- 
2.11.0



[PATCH 03/12] load_subtree(): reduce the scope of some local variables

2017-08-26 Thread Michael Haggerty
Declare the variables inside the loop, to make it more obvious that
their values are not carried across loop iterations.

Signed-off-by: Michael Haggerty 
---
 notes.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/notes.c b/notes.c
index f7ce64ff48..fbed8c3013 100644
--- a/notes.c
+++ b/notes.c
@@ -421,9 +421,6 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
void *buf;
struct tree_desc desc;
struct name_entry entry;
-   int len, path_len;
-   unsigned char type;
-   struct leaf_node *l;
 
buf = fill_tree_descriptor(, >val_oid);
if (!buf)
@@ -434,7 +431,10 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
assert(prefix_len * 2 >= n);
memcpy(object_oid.hash, subtree->key_oid.hash, prefix_len);
while (tree_entry(, )) {
-   path_len = strlen(entry.path);
+   unsigned char type;
+   struct leaf_node *l;
+   int len, path_len = strlen(entry.path);
+
len = get_oid_hex_segment(entry.path, path_len,
object_oid.hash + prefix_len, GIT_SHA1_RAWSZ - 
prefix_len);
if (len < 0)
-- 
2.11.0



[PATCH 00/12] Clean up notes-related code around `load_subtree()`

2017-08-26 Thread Michael Haggerty
While putzing around in the notes code quite some time ago, I found
this comment:

/*
 * Determine full path for this non-note entry:
 * The filename is already found in entry.path, but the
 * directory part of the path must be deduced from the subtree
 * containing this entry. We assume here that the overall notes
 * tree follows a strict byte-based progressive fanout
 * structure (i.e. using 2/38, 2/2/36, etc. fanouts, and not
 * e.g. 4/36 fanout). This means that if a non-note is found at
 * path "dead/beef", the following code will register it as
 * being found on "de/ad/beef".
 * On the other hand, if you use such non-obvious non-note
 * paths in the middle of a notes tree, you deserve what's
 * coming to you ;). Note that for non-notes that are not
 * SHA1-like at the top level, there will be no problems.
 *
 * To conclude, it is strongly advised to make sure non-notes
 * have at least one non-hex character in the top-level path
 * component.
 */

This was enough of a nerd snipe to get me to dig into the code.

It turns out that the comment is incorrect, but there was nevertheless
plenty that could be cleaned up in the area:

* Make macro `GIT_NIBBLE` safer by adding some parentheses
* Remove some dead code
* Fix some memory leaks
* Fix some obsolete and incorrect comments
* Reject "notes" that are not blobs

I hope the result is also easier to understand.

This branch is also available from my Git fork [1] as branch
`load-subtree-cleanup`.

Michael

[1] https://github.com/mhagger/git

Michael Haggerty (12):
  notes: make GET_NIBBLE macro more robust
  load_subtree(): remove unnecessary conditional
  load_subtree(): reduce the scope of some local variables
  load_subtree(): fix incorrect comment
  load_subtree(): separate logic for internal vs. terminal entries
  load_subtree(): check earlier whether an internal node is a tree entry
  load_subtree(): only consider blobs to be potential notes
  get_oid_hex_segment(): return 0 on success
  load_subtree(): combine some common code
  get_oid_hex_segment(): don't pad the rest of `oid`
  hex_to_bytes(): simpler replacement for `get_oid_hex_segment()`
  load_subtree(): declare some variables to be `size_t`

 notes.c | 136 +++-
 1 file changed, 66 insertions(+), 70 deletions(-)

-- 
2.11.0



[PATCH 01/12] notes: make GET_NIBBLE macro more robust

2017-08-26 Thread Michael Haggerty
Put parentheses around sha1. Otherwise it could fail for something
like

GET_NIBBLE(n, (unsigned char *)data);

Signed-off-by: Michael Haggerty 
---
 notes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notes.c b/notes.c
index f090c88363..00630a9396 100644
--- a/notes.c
+++ b/notes.c
@@ -64,7 +64,7 @@ struct non_note {
 #define CLR_PTR_TYPE(ptr)   ((void *) ((uintptr_t) (ptr) & ~3))
 #define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
 
-#define GET_NIBBLE(n, sha1) (((sha1[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 0x0f)
+#define GET_NIBBLE(n, sha1) sha1)[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 
0x0f)
 
 #define KEY_INDEX (GIT_SHA1_RAWSZ - 1)
 #define FANOUT_PATH_SEPARATORS ((GIT_SHA1_HEXSZ / 2) - 1)
-- 
2.11.0



[PATCH 04/12] load_subtree(): fix incorrect comment

2017-08-26 Thread Michael Haggerty
This comment was added in 851c2b3791 (Teach notes code to properly
preserve non-notes in the notes tree, 2010-02-13) when the
corresponding code was added. But I believe it was incorrect even
then. The condition `path_len != 2` a dozen lines up prevents a path
like "dead/beef" from being converted to "de/ad/beef", and indeed the
test added in commit 851c2b3 verifies that this case works correctly.

Signed-off-by: Michael Haggerty 
---
 notes.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/notes.c b/notes.c
index fbed8c3013..62ab3f4ce3 100644
--- a/notes.c
+++ b/notes.c
@@ -468,23 +468,13 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
 
 handle_non_note:
/*
-* Determine full path for this non-note entry:
-* The filename is already found in entry.path, but the
-* directory part of the path must be deduced from the subtree
-* containing this entry. We assume here that the overall notes
-* tree follows a strict byte-based progressive fanout
-* structure (i.e. using 2/38, 2/2/36, etc. fanouts, and not
-* e.g. 4/36 fanout). This means that if a non-note is found at
-* path "dead/beef", the following code will register it as
-* being found on "de/ad/beef".
-* On the other hand, if you use such non-obvious non-note
-* paths in the middle of a notes tree, you deserve what's
-* coming to you ;). Note that for non-notes that are not
-* SHA1-like at the top level, there will be no problems.
-*
-* To conclude, it is strongly advised to make sure non-notes
-* have at least one non-hex character in the top-level path
-* component.
+* Determine full path for this non-note entry. The
+* filename is already found in entry.path, but the
+* directory part of the path must be deduced from the
+* subtree containing this entry based on our
+* knowledge that the overall notes tree follows a
+* strict byte-based progressive fanout structure
+* (i.e. using 2/38, 2/2/36, etc. fanouts).
 */
{
struct strbuf non_note_path = STRBUF_INIT;
-- 
2.11.0



[PATCH 06/12] load_subtree(): check earlier whether an internal node is a tree entry

2017-08-26 Thread Michael Haggerty
If an entry is not a tree entry, then it cannot possibly be an
internal node. But the old code checked this condition only after
allocating a leaf_node object and therefore leaked that memory.
Instead, check before even entering this branch of the code.

Signed-off-by: Michael Haggerty 
---
 notes.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/notes.c b/notes.c
index 768902055e..ac69c5aa18 100644
--- a/notes.c
+++ b/notes.c
@@ -449,6 +449,11 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
oidcpy(>val_oid, entry.oid);
} else if (path_len == 2) {
/* This is potentially an internal node */
+
+   if (!S_ISDIR(entry.mode))
+   /* internal nodes must be trees */
+   goto handle_non_note;
+
if (get_oid_hex_segment(entry.path, 2,
object_oid.hash + prefix_len,
GIT_SHA1_RAWSZ - prefix_len) < 
0)
@@ -459,8 +464,6 @@ static void load_subtree(struct notes_tree *t, struct 
leaf_node *subtree,
xcalloc(1, sizeof(struct leaf_node));
oidcpy(>key_oid, _oid);
oidcpy(>val_oid, entry.oid);
-   if (!S_ISDIR(entry.mode))
-   goto handle_non_note; /* not subtree */
l->key_oid.hash[KEY_INDEX] = (unsigned char) 
(prefix_len + 1);
} else {
/* This can't be part of a note */
-- 
2.11.0