Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-16 Thread Michael Haggerty
On Fri, Nov 16, 2018 at 11:38 AM Ævar Arnfjörð Bjarmason
 wrote:
> [...]
> A follow-up on this: We should really fix this for other
> reasons. I.e. compile in some "this is stuff we ourselves think is in
> git".
>
> There's other manifestations of this, e.g.:
>
> git-sizer --help # => shows you help
> git sizer --help # => says it doesn't have a manpage
>
> Because we aren't aware that git-sizer is some external tool, and that
> we should route --help to it.

That would be nice. This has been an annoying for several tools named
`git-foo` that I have worked on (e.g., git-sizer, git-imerge,
git-when-merged, plus many internal tools).

> Non-withstanding the arguable bug that things like git-sizer shouldn't
> be allowing themselves to be invoked by "git" like that without
> guaranteeing that it can consume all the options 'git' expects. When I
> had to deal with a similar problem in an external git-* command I was
> maintaining I simply made it an error to invoke it as "git mything"
> instead of "git-mything".

Hmmm, I always thought that it was intended and supported that an
external tool can name itself `git-foo` so that it can be invoked as
`git foo`.

Which git options do you think that such a tool should be expected to
support? Many useful ones, like `-C `, `--git-dir=`,
`--work-tree=`, `-c =`, and `--no-replace-objects`,
work pretty much for free if the external tool uses `git` to interact
with the repository. I use such options regularly with external tools.
IMO it would be a regression for these tools to refuse to run when
invoked as, say, `git -C path/to/repo foo`.

Michael


Re: [PATCH 00/17] object_id part 14

2018-07-14 Thread Michael Haggerty
On Mon, Jul 9, 2018 at 6:15 AM Derrick Stolee  wrote:
> On 7/8/2018 11:12 PM, Jacob Keller wrote:
> > On Sun, Jul 8, 2018 at 4:39 PM brian m. carlson
> >  wrote:
> >> This is the fourteenth series of patches to switch to using struct
> >> object_id and the_hash_algo.  This series converts several core pieces
> >> to use struct object_id, including the oid* and hex functions.
> >>
> >> All of these patches have been tested with both SHA-1 and a 256-bit
> >> hash.
> >>
> > I read through the series, and didn't spot anything odd, except for
> > the question about reasoning for why we use memcmp directly over using
> > hashcmp. I don't think that's any sort of blocker, it just seemed an
> > odd decision to me.
>
> I also read through the series and only found the 100/200 constants
> confusing. Not worth blocking on, but I'm CC'ing Michael Haggerty to
> comment if he knows how the magic 100 was computed.

The magic 100 blames back to our chief magician, Junio:

8ac65937d0 Make sure we do not write bogus reflog entries. (2007-01-26)

Since then, as far as I can tell, it's just been copy-pasted forward.
It would be easy to compute it precisely based on the length of the
two OIDs, represented as hex strings, plus the few extra characters in
the format string.

Michael


Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-03 Thread Michael Haggerty
On Mon, Jul 2, 2018 at 7:27 PM Stefan Beller  wrote:
> On Sun, Jul 1, 2018 at 8:57 AM Michael Haggerty  wrote:
> [...]
> So this suggests to have MAX_BORING to be
> "hunk size + some small constant offset" ?

That would be my suggestion, yes. There are cases where it will be
more expensive than a fixed `MAX_BORING`, but I bet on average it will
be faster. Plus, it should always give the right answer.

Michael


Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-01 Thread Michael Haggerty
On 06/29/2018 10:28 PM, Stefan Beller wrote:
> [...]
> Adds some threshold to avoid expensive cases, like:
> 
> ```
> #!python
> open('a', 'w').write(" \n" * 100)
> open('b', 'w').write(" \n" * 101)
> ```
> 
> The indent heuristic is O(N * 20) (N = 100) for the above case, and
> makes diff much slower.
> [...]
> +/*
> + * For indentation heuristic, skip searching for better slide position after
> + * checking MAX_BORING lines without finding an improvement. This defends the
> + * indentation heuristic logic against pathological cases. The value is not
> + * picked scientifically but should be good enough.
> + */
> +#define MAX_BORING 100

This is an interesting case, and a speed difference of almost a factor
of five seems impressive. But this is a pretty pathological case, isn't
it? And I'm pretty sure that the algorithm is `O(N)` both before and
after this change. Remember that to find `earliest_end` and `g.end`,
there has already been a scan through all 100 lines. In other words,
you're not improving how the overall algorithm scales with `N`; you're
only changing the constant factor in front. So it's a little bit
questionable whether it is worth complicating the code for this unusual
case.

But *if* we want to improve this case, I think that we could be smarter
about it.

By the time we get to this point in the code, we already know that there
is a "slider" hunk of length `M` (`groupsize`) that can be slid up or
down within a range of `N` (`g.end - earliest_end + 1`) possible
positions. The interesting case here is `N ≫ M`, because then naively
the number of positions to try out is a lot bigger than the size of the
hunk itself. (In the case described above, `N` is 100 and `M` is 1.)

But how can that situation even arise? Remember, a hunk can only be slid
down by a line if the first line *after* the hunk is identical to the
first line *of* the hunk. It follows that if you shift a hunk down `M`
lines, then it has the same contents as when you started—you've just
rotated all of the hunk lines around once.

So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
M` lines that the hunk could possibly overlay. Specifically, it must
consist of `floor((N + M)/M)` identical copies of the hunk, plus
possibly a few leftover lines constituting the start of another repetition.

Given this large amount of repetition, it seems to me that there is
never a need to scan more than the bottom `M + 1` possible positions [1]
plus the highest possible position [2] to be sure of finding the very
best one. In the pathological case that you described above, where `M`
is 1, only three positions have to be evaluated, not 100.

In fact, it *could* be that there is even more repetition, namely if the
hunk itself contains multiple copies of an even shorter block of `K`
lines. In that case, you would only have to scan `K + 1` positions at
the bottom plus one at the top to be sure to find the best hunk
position. This would be an interesting optimization for a case like

> open('a', 'w').write(" \n" * 100)
> open('b', 'w').write(" \n" * 110)

(`N = 100`, `M = 10`, `K = 1`) or

> open('a', 'w').write("\nMISSING\n\n" * 100)
> open('b', 'w').write("\nMISSING\n\n" * 110)

(`N = 300`, `M = 30`, `K = 3`). On the other hand, it's not
entirely trivial to find periodicity in a group of lines (i.e., to find
`K`), and I don't know offhand how that task scales with `M`.

Michael

[1] Actually, to be rigorously correct it might be necessary to check
even a bit more than `M + 1` positions at the bottom because the
heuristic looks a bit beyond the lines of the hunk.

[2] The position at the top has different predecessor lines than the
other positions, so it could have a lower score than all of the others.
It's worth checking it. Here too, to be rigorously correct it might be
necessary to check more than one position at the top because the
heuristic looks a bit beyond the lines of the hunk.


Re: [PATCH] t9104: kosherly remove remote refs

2018-06-02 Thread Michael Haggerty
On Fri, Jun 1, 2018 at 7:08 AM, Christian Couder
 wrote:
> As there are plans to implement other ref storage systems,
> let's use a way to remove remote refs that does not depend
> on refs being files.
>
> This makes it clear to readers that this test does not
> depend on which ref backend is used.
>
> Suggested-by: Michael Haggerty 
> Helped-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
> This was suggested and discussed in:
>
> https://public-inbox.org/git/20180525085906.ga2...@sigill.intra.peff.net/
>
>  t/t9104-git-svn-follow-parent.sh | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/t/t9104-git-svn-follow-parent.sh 
> b/t/t9104-git-svn-follow-parent.sh
> index 9c49b6c1fe..5e0ad19177 100755
> --- a/t/t9104-git-svn-follow-parent.sh
> +++ b/t/t9104-git-svn-follow-parent.sh
> @@ -215,7 +215,9 @@ test_expect_success "multi-fetch continues to work" "
> "
>
>  test_expect_success "multi-fetch works off a 'clean' repository" '
> -   rm -rf "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" &&
> +   rm -rf "$GIT_DIR/svn" &&
> +   git for-each-ref --format="option no-deref%0adelete %(refname)" 
> refs/remotes |
> +   git update-ref --stdin &&
> git reflog expire --all --expire=all &&
> mkdir "$GIT_DIR/svn" &&
> git svn multi-fetch
> --
> 2.17.0.1035.g12039e008f

+1 LGTM.

Michael


Re: [PATCH 1/3] refs/packed-backend.c: close fd of empty file

2018-05-31 Thread Michael Haggerty
On Wed, May 30, 2018 at 7:03 PM, Stefan Beller  wrote:
> Signed-off-by: Stefan Beller 
> ---
>
> This was an oversight in 01caf20d57a (load_contents(): don't try to mmap an
> empty file, 2018-01-24).
>
> This and the following 2 patches apply on master.
>
>  refs/packed-backend.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index cec3fb9e00f..d447a731da0 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -499,6 +499,7 @@ static int load_contents(struct snapshot *snapshot)
> size = xsize_t(st.st_size);
>
> if (!size) {
> +   close(fd);
> return 0;
> } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) {
> snapshot->buf = xmalloc(size);
> --
> 2.17.1.1185.g55be947832-goog
>

+1.

Thanks,
Michael


Re: [PATCH] t990X: use '.git/objects' as 'deep inside .git' path

2018-05-26 Thread Michael Haggerty
On Sat, May 26, 2018 at 8:47 AM, Christian Couder
 wrote:
> Tests t9902-completion.sh and t9903-bash-prompt.sh each have tests
> that check what happens when we are "in the '.git' directory" and
> when we are "deep inside the '.git' directory".
>
> To test the case when we are "deep inside the '.git' directory" the
> test scripts used to perform a `cd .git/refs/heads`.
>
> As there are plans to implement other ref storage systems, let's
> use '.git/objects' instead of '.git/refs/heads' as the "deep inside
> the '.git' directory" path.

Seems reasonable to me. +1.

Michael


Re: [PATCH v2] t: make many tests depend less on the refs being files

2018-05-25 Thread Michael Haggerty
On Fri, May 25, 2018 at 10:59 AM, Jeff King <p...@peff.net> wrote:
> On Fri, May 25, 2018 at 10:48:04AM +0200, Michael Haggerty wrote:
>
>> >  test_expect_success "multi-fetch works off a 'clean' repository" '
>> > -   rm -r "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" "$GIT_DIR/logs" &&
>> > +   rm -rf "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" &&
>> > +   git reflog expire --all --expire=all &&
>> > mkdir "$GIT_DIR/svn" &&
>> > git svn multi-fetch
>> > '
>> >
>>
>> `rm -rf "$GIT_DIR/refs/remotes"` is not kosher. I think it can be written
>>
>> printf 'option no-deref\ndelete %s\n' $(git for-each-ref
>> --format='%(refname)' refs/remotes) | git update-ref --stdin
>>
>> as long as the number of references doesn't exceed command-line limits.
>> This will also take care of the reflogs. Another alternative would be to
>> write it as a loop.
>
> Perhaps:
>
>   git for-each-ref --format="option no-deref%0adelete %(refname)" 
> refs/remotes |
>   git update-ref --stdin

Ah yes, that's nicer. I tried with `\n`, but that's not supported
(wouldn't it be nice if it were?). I didn't think to try `%0a` (let
alone look in the documentation!)

Michael


Re: [PATCH v2] t: make many tests depend less on the refs being files

2018-05-25 Thread Michael Haggerty
On 05/23/2018 07:25 AM, Christian Couder wrote:
> From: David Turner 
> 
> Many tests are very focused on the file system representation of the
> loose and packed refs code. As there are plans to implement other
> ref storage systems, let's migrate these tests to a form that test
> the intent of the refs storage system instead of it internals.
> [...]
> 
> diff --git a/t/t1401-symbolic-ref.sh b/t/t1401-symbolic-ref.sh
> index 9e782a8122..a4ebb0b65f 100755
> --- a/t/t1401-symbolic-ref.sh
> +++ b/t/t1401-symbolic-ref.sh
> @@ -65,7 +65,7 @@ reset_to_sane
>  test_expect_success 'symbolic-ref fails to delete real ref' '
>   echo "fatal: Cannot delete refs/heads/foo, not a symbolic ref" >expect 
> &&
>   test_must_fail git symbolic-ref -d refs/heads/foo >actual 2>&1 &&
> - test_path_is_file .git/refs/heads/foo &&
> + git rev-parse --verify refs/heads/foo &&
>   test_cmp expect actual
>  '
>  reset_to_sane

Should t1401 be considered a backend-agnostic test, or is it needed to
ensure that symbolic refs are written correctly in the files backend?

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index c0ef946811..222dc2c377 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -234,34 +234,34 @@ test_expect_success 'git branch -M master2 master2 
> should work when master is ch
>  
>  test_expect_success 'git branch -v -d t should work' '
>   git branch t &&
> - test_path_is_file .git/refs/heads/t &&
> + git rev-parse --verify refs/heads/t &&
>   git branch -v -d t &&
> - test_path_is_missing .git/refs/heads/t
> + test_must_fail git rev-parse --verify refs/heads/t
>  '
>  
>  test_expect_success 'git branch -v -m t s should work' '
>   git branch t &&
> - test_path_is_file .git/refs/heads/t &&
> + git rev-parse --verify refs/heads/t &&
>   git branch -v -m t s &&
> - test_path_is_missing .git/refs/heads/t &&
> - test_path_is_file .git/refs/heads/s &&
> + test_must_fail git rev-parse --verify refs/heads/t &&
> + git rev-parse --verify refs/heads/s &&
>   git branch -d s
>  '
>  
>  test_expect_success 'git branch -m -d t s should fail' '
>   git branch t &&
> - test_path_is_file .git/refs/heads/t &&
> + git rev-parse refs/heads/t &&
>   test_must_fail git branch -m -d t s &&
>   git branch -d t &&
> - test_path_is_missing .git/refs/heads/t
> + test_must_fail git rev-parse refs/heads/t
>  '
>  
>  test_expect_success 'git branch --list -d t should fail' '
>   git branch t &&
> - test_path_is_file .git/refs/heads/t &&
> + git rev-parse refs/heads/t &&
>   test_must_fail git branch --list -d t &&
>   git branch -d t &&
> - test_path_is_missing .git/refs/heads/t
> + test_must_fail git rev-parse refs/heads/t
>  '
>  
>  test_expect_success 'git branch --list -v with --abbrev' '
> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index aefde7b172..1f871d3cca 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -726,7 +726,7 @@ test_expect_success 'store updates stash ref and reflog' '
>   git reset --hard &&
>   ! grep quux bazzy &&
>   git stash store -m quuxery $STASH_ID &&
> - test $(cat .git/refs/stash) = $STASH_ID &&
> + test $(git rev-parse stash) = $STASH_ID &&
>   git reflog --format=%H stash| grep $STASH_ID &&
>   git stash pop &&
>   grep quux bazzy
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 0680dec808..d4f435155f 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -30,7 +30,7 @@ add () {
>   test_tick &&
>   commit=$(echo "$text" | git commit-tree $tree $parents) &&
>   eval "$name=$commit; export $name" &&
> - echo $commit > .git/refs/heads/$branch &&
> + git update-ref "refs/heads/$branch" "$commit" &&
>   eval ${branch}TIP=$commit
>  }
>  
> @@ -45,10 +45,10 @@ pull_to_client () {
>  
>   case "$heads" in
>   *A*)
> - echo $ATIP > .git/refs/heads/A;;
> + git update-ref refs/heads/A "$ATIP";;
>   esac &&
>   case "$heads" in *B*)
> - echo $BTIP > .git/refs/heads/B;;
> + git update-ref refs/heads/B "$BTIP";;
>   esac &&
>   git symbolic-ref HEAD refs/heads/$(echo $heads \
>   | sed -e "s/^\(.\).*$/\1/") &&
> @@ -92,8 +92,8 @@ test_expect_success 'setup' '
>   cur=$(($cur+1))
>   done &&
>   add B1 $A1 &&
> - echo $ATIP > .git/refs/heads/A &&
> - echo $BTIP > .git/refs/heads/B &&
> + git update-ref refs/heads/A "$ATIP" &&
> + git update-ref refs/heads/B "$BTIP" &&
>   git symbolic-ref HEAD refs/heads/B
>  '
>  
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index ae5a530a2d..e402aee6a2 100755
> --- a/t/t5510-fetch.sh
> 

Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-25 Thread Michael Haggerty
On 05/25/2018 12:08 AM, Jakub Narebski wrote:
> Derrick Stolee <sto...@gmail.com> writes:
>> On 5/22/2018 1:39 AM, Michael Haggerty wrote:
>>> On 05/21/2018 08:10 PM, Derrick Stolee wrote:
>>>> [...]
>>> This may be beyond the scope of what you are working on, but there are
>>> significant advantages to selecting a "best" merge base from among the
>>> candidates. Long ago [1] I proposed that the "best" merge base is the
>>> merge base candidate that minimizes the number of non-merge commits that
>>> are in
>>>
>>>  git rev-list $candidate..$branch
>>>
>>> that are already in master:
>>>
>>>  git rev-list $master
>>>
>>> (assuming merging branch into master), which is equivalent to choosing
>>> the merge base that minimizes
>>>
>>>  git rev-list --count $candidate..$branch
> 
> Is the above correct...
> 
>>> In fact, this criterion is symmetric if you exchange branch ↔ master,
>>> which is a nice property, and indeed generalizes pretty simply to
>>> computing the merge base of more than two commits.
> 
> ...as it doesn't seem to have the described symmetry.

The first email that I referenced [1] demonstrates this in the section
"Symmetry; generalization to more than two branches". The same thing is
demonstrated in a simpler way using set notation in a later email in
that thread [2].

Michael

[1] https://public-inbox.org/git/539a25bf.4060...@alum.mit.edu/
[2] https://public-inbox.org/git/53a06264.9080...@alum.mit.edu/


Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-21 Thread Michael Haggerty
On 05/21/2018 08:10 PM, Derrick Stolee wrote:
> [...]
> In the Discussion section of the `git merge-base` docs [1], we have the
> following:
> 
>     When the history involves criss-cross merges, there can be more than
> one best common ancestor for two commits. For example, with this topology:
> 
>     ---1---o---A
>         \ /
>      X
>         / \
>     ---2---o---o---B
> 
>     both 1 and 2 are merge-bases of A and B. Neither one is better than
> the other (both are best merge bases). When the --all option is not
> given,     it is unspecified which best one is output.
> 
> This means our official documentation mentions that we do not have a
> concrete way to differentiate between these choices. This makes me think
> that this change in behavior is not a bug, but it _is_ a change in
> behavior. It's worth mentioning, but I don't think there is any value in
> making sure `git merge-base` returns the same output.
> 
> Does anyone disagree? Is this something we should solidify so we always
> have a "definitive" merge-base?
> [...]

This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in

git rev-list $candidate..$branch

that are already in master:

git rev-list $master

(assuming merging branch into master), which is equivalent to choosing
the merge base that minimizes

git rev-list --count $candidate..$branch

In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and indeed generalizes pretty simply to
computing the merge base of more than two commits.

In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.

To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.

I don't plan to work on the implementation of this idea myself (though
we've long used a script-based implementation of this algorithm
internally at GitHub).

Michael

[1] https://public-inbox.org/git/539a25bf.4060...@alum.mit.edu/
See the rest of the thread for more interesting discussion.
[2]
https://public-inbox.org/git/8a9b3f20-eed2-c59b-f7ea-3c68b3c30...@alum.mit.edu/
Higher in this thread, Junio proposes a different criterion.


Re: Implementing reftable in Git

2018-05-11 Thread Michael Haggerty
On Wed, May 9, 2018 at 4:33 PM, Christian Couder
 wrote:
> I might start working on implementing reftable in Git soon.
> [...]

Nice. It'll be great to have a reftable implementation in git core
(and ideally libgit2, as well). It seems to me that it could someday
become the new default reference storage method. The file format is
considerably more complicated than the current loose/packed scheme,
which is definitely a disadvantage (for example, for other Git
implementations). But implementing it *with good performance and
without races* might be no more complicated than the current scheme.

Testing will be important. There are already many tests specifically
about testing loose/packed reference storage. These will always have
to run against repositories that are forced to use that reference
scheme. And there will need to be new tests specifically about the
reftable scheme. Both classes of tests should be run every time. That
much is pretty obvious.

But currently, there are a lot of tests that assume the loose/packed
reference format on disk even though the tests are not really related
to references at all. ISTM that these should be converted to work at a
higher level, for example using `for-each-ref`, `rev-parse`, etc. to
examine references rather than reading reference files directly. That
way the tests should run correctly regardless of which scheme is in
use.

And since it's too expensive to run the whole test suite with both
reference storage schemes, it seems to me that the reference storage
scheme that is used while running the scheme-neutral tests should be
easy to choose at runtime.

David Turner did some analogous work for wiring up and testing his
proposed LMDB ref storage backend that might be useful [1]. I'm CCing
him, since he might have thoughts on this topic.

Regarding the reftable spec itself:

I recently gave a little internal talk about it, and while preparing
the talk I noticed a couple of things that should maybe be tweaked:

* The spec proposes to change `$GIT_DIR/refs`, which is currently a
directory that holds the loose refs, into a file that holds the table
of contents of reftable files comprising the full set of references.
This was my suggestion. I was thinking that this would prevent old
refs code from being used accidentally on a reftable-enabled
repository, while still enabling old versions of Git recognize this as
a git directory [2]. I think that the latter is important to make
things like `git rev-parse --git-dir` work correctly, even if the
installed version of git can't actually *read* the repository.

  The problem is that `is_git_directory()` checks not only whether
`$GIT_DIR/refs` exists, but also whether it is executable (i.e., since
it is normally a directory, that it is searchable). It would be silly
to make the reftable table of contents executable, so this doesn't
seem like a good approach after all.

  So probably `$GIT_DIR/refs` should continue to be a directory. If
it's there, it would probably make sense to place the reftable files
and maybe the ToC inside of it. We would have to rely on older Git
versions refusing to work in the directory because its `config` file
has an unrecognized `core.repositoryFormatVersion`, but that should be
OK I think.

* The scheme for naming reftable files [3] is, I believe, just a
suggestion as far as the spec is concerned (except for the use of
`.ref`/`.log` file extensions). It might be more less unwieldy to use
`%d` rather than `%08d`, and more convenient to name compacted files
to `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it
clearer to see by inspection what each file contains. That would also
make it unnecessary, in most cases, to insert a `_${n}` to make the
filename unique.

Michael

[1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends
[2] 
https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347
[3] 
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout

https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction
[4] 
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer


Re: [PATCH] refs: handle null-oid for pseudorefs

2018-05-07 Thread Michael Haggerty
On 05/06/2018 03:35 PM, Martin Ågren wrote:
> According to the documentation on `git update-ref`, it is possible to
> "specify 40 '0' or an empty string as  to make sure that the
> ref you are creating does not exist." But in the code for pseudorefs, we
> do not implement this. If we fail to read the old ref, we immediately
> die. A failure to read would actually be a good thing if we have been
> given the null-oid.
> 
> With the null-oid, allow -- and even require -- the ref-reading to fail.
> This implements the "make sure that the ref ... does not exist" part of
> the documentation.
> 
> Since we have a `strbuf err` for collecting errors, let's use it and
> signal an error to the caller instead of dying hard.
> 
> Reported-by: Rafael Ascensão 
> Helped-by: Rafael Ascensão 
> Signed-off-by: Martin Ågren 

Thanks for the patch. This looks good to me. But it it seems that the
test coverage related to pseudorefs is still not great. Ideally, all of
the following combinations should be tested:

Pre-update value   | ref-update old OID   | Expected result
---|--|
missing| missing  | accept *
missing| value| reject
set| missing  | reject *
set| correct value| accept
set| wrong value  | reject

I think your test only covers the lines with asterisks. Are the other
scenarios already covered by other tests? If not, how about adding them?
That would give us confidence that the new code works in all circumstances.

Michael


Re: [PATCH 12/16] refs: store the main ref store inside the repository struct

2018-04-10 Thread Michael Haggerty
On 04/10/2018 12:45 AM, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
>  refs.c   | 13 +
>  refs.h   |  4 +---
>  repository.h |  3 +++
>  3 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index f58b9fb7df..b5be754a97 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1608,9 +1608,6 @@ static struct ref_store_hash_entry 
> *alloc_ref_store_hash_entry(
>   return entry;
>  }
>  
> -/* A pointer to the ref_store for the main repository: */
> -static struct ref_store *main_ref_store;
> -
>  /* A hashmap of ref_stores, stored by submodule name: */
>  static struct hashmap submodule_ref_stores;
>  
> @@ -1652,13 +1649,13 @@ static struct ref_store *ref_store_init(const char 
> *gitdir,
>   return refs;
>  }
>  
> -struct ref_store *get_main_ref_store_the_repository(void)
> +struct ref_store *get_main_ref_store(struct repository *r)
>  {
> - if (main_ref_store)
> - return main_ref_store;
> + if (r->main_ref_store)
> + return r->main_ref_store;
>  
> - main_ref_store = ref_store_init(get_git_dir(), REF_STORE_ALL_CAPS);
> - return main_ref_store;
> + r->main_ref_store = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
> + return r->main_ref_store;
>  }
>  
>  /*
> diff --git a/refs.h b/refs.h
> index ab3d2bec2f..f5ab68c0ed 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -760,9 +760,7 @@ int reflog_expire(const char *refname, const struct 
> object_id *oid,
>  
>  int ref_storage_backend_exists(const char *name);
>  
> -#define get_main_ref_store(r) \
> - get_main_ref_store_##r()
> -struct ref_store *get_main_ref_store_the_repository(void);
> +struct ref_store *get_main_ref_store(struct repository *r);
>  /*
>   * Return the ref_store instance for the specified submodule. For the
>   * main repository, use submodule==NULL; such a call cannot fail. For
> diff --git a/repository.h b/repository.h
> index 09df94a472..7d0710b273 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -26,6 +26,9 @@ struct repository {
>*/
>   struct raw_object_store *objects;
>  
> + /* The store in which the refs are held. */
> + struct ref_store *main_ref_store;
> +
>   /*
>* Path to the repository's graft file.
>* Cannot be NULL after initialization.
> 

This also makes sense to me, as far as it goes. I have a few comments
and questions:

Why do you call the new member `main_ref_store`? Is there expected to be
some other `ref_store` associated with a repository?

I think the origin of the name `main_ref_store` was to distinguish it
from submodule ref stores. But presumably those will soon become the
"main" ref stores for their respective submodule repository objects,
right? So maybe calling things `repository.ref_store` and
`get_ref_store(repository)` would be appropriate.

There are some places in the reference code that only work with the main
repository. The ones that I can think of are:

* `ref_resolves_to_object()` depends on an object store.

* `peel_object()` and `ref_iterator_peel()` also have to look up objects
(in this case, tag objects).

* Anything that calls `files_assert_main_repository()` or depends on
`REF_STORE_MAIN` isn't implemented for other reference stores (usually,
I think, these are functions that depend on the object store).

Some of these things might be easy to generalize to non-main
repositories, but I didn't bother because AFAIK currently only the main
repository store is ever mutated.

You can move a now-obsolete comment above the definition of `struct
files_ref_store` if you haven't in some other patch (search for
"libification").

Hope that helps,
Michael


Re: [PATCH 07/16] refs: add repository argument to get_main_ref_store

2018-04-10 Thread Michael Haggerty
On 04/10/2018 12:45 AM, Stefan Beller wrote:
> Add a repository argument to allow the get_main_ref_store caller
> to be more specific about which repository to handle. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
> 
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.

This seems OK to me from a refs perspective.

The macro trick is surprising. I guess it gets you a compile-time check,
under the assumption that nothing else is called `the_repository`. But
why actually commit the macro, as opposed to compiling once locally to
check for correctness, then maybe add something like `assert(r ==
the_repository)` for the actual commit?

But I don't care either way, since the macro disappears again soon.

Michael


[RFC 1/1] packed-backend: ignore broken headers

2018-03-26 Thread Michael Haggerty
Prior to

9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 
2017-07-01

we silently ignored any lines in a `packed-refs` file that we didn't
understand. That policy was clearly wrong.

But at the time, unrecognized header lines were processed by the same
code as reference and peeled lines. This means that they were also
subject to the same liberal treatment. For example, any of the
following "header" lines would have been ignored:

* "# arbitrary data that looks like a comment"
* "# pack-refs with peeled fully-peeled" ← note: missing colon
* "# pack-refs"

Loosen up the parser to ignore any first line that begins with `#` but
doesn't start with the exact character sequence "# pack-refs with:".

(In fact, the old liberal policy meant that "comment" lines would have
been ignored anywhere in the file. But the file format isn't actually
documented to allow comments, and comments make little sense in this
file, so we won't go that far.)

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
I *don't* think we actually want to merge this patch. See the cover
letter for my reasoning.

 refs/packed-backend.c | 21 +
 t/t3210-pack-refs.sh  | 33 -
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 65288c6472..f9d71bb60d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -635,21 +635,18 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
 
tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-   if (!skip_prefix(tmp, "# pack-refs with:", (const char **)))
-   die_invalid_line(refs->path,
-snapshot->buf,
-snapshot->eof - snapshot->buf);
+   if (skip_prefix(tmp, "# pack-refs with:", (const char **))) {
+   string_list_split_in_place(, p, ' ', -1);
 
-   string_list_split_in_place(, p, ' ', -1);
+   if (unsorted_string_list_has_string(, 
"fully-peeled"))
+   snapshot->peeled = PEELED_FULLY;
+   else if (unsorted_string_list_has_string(, 
"peeled"))
+   snapshot->peeled = PEELED_TAGS;
 
-   if (unsorted_string_list_has_string(, "fully-peeled"))
-   snapshot->peeled = PEELED_FULLY;
-   else if (unsorted_string_list_has_string(, "peeled"))
-   snapshot->peeled = PEELED_TAGS;
+   sorted = unsorted_string_list_has_string(, 
"sorted");
 
-   sorted = unsorted_string_list_has_string(, "sorted");
-
-   /* perhaps other traits later as well */
+   /* perhaps other traits later as well */
+   }
 
/* The "+ 1" is for the LF character. */
snapshot->start = eol + 1;
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index afa27ffe2d..353ef3e655 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -20,7 +20,8 @@ test_expect_success \
 'echo Hello > A &&
  git update-index --add A &&
  git commit -m "Initial commit." &&
- HEAD=$(git rev-parse --verify HEAD)'
+ HEAD=$(git rev-parse --verify HEAD) &&
+ git tag -m "A tag" annotated-tag'
 
 SHA1=
 
@@ -221,6 +222,36 @@ test_expect_success 'reject packed-refs with a short 
SHA-1' '
test_cmp expected_err err
 '
 
+test_expect_success 'handle packed-refs with a bogus header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# whacked-refs/" <.git/packed-refs.bak >.git/packed-refs 
&&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with a truncated header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# pack-refs/" <.git/packed-refs.bak >.git/packed-refs &&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with no traits in header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# pack-refs with:/" <.git/packed-refs.bak 
>.git/packed-refs &&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
 test_expect_success 'timeout if packed-refs.lock exists' '
LOCK=.git/packed-refs.lock &&
>"$LOCK" &&
-- 
2.14.2



[RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Michael Haggerty
Prior to

9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 
2017-07-01

we silently ignored pretty much any bogus data in a `packed-refs`
file. I think that was pretty clearly a bad policy. The above commit
made parsing quite a bit stricter, calling `die()` if it found any
lines that it didn't understand.

But there's one situation that is maybe not quite so clear-cut. The
first line of a `packed-refs` file can be a header that enumerates
some traits of the file containing it; for example,

# pack-refs with: peeled fully-peeled 

The old code would have tolerated lots of breakage in that line. For
example, any of the following headers would have just been ignored:

# arbitrary data that looks like a comment
# pack-refs with peeled fully-peeled  ← note: missing colon
# pack-refs

Now, any of the above lines are considered errors and cause git to
die.

In my opinion, that is a good thing and we *shouldn't* tolerate broken
header lines; i.e., the status quo is good and we *shouldn't* apply
this patch.

But there might be some tools out in the wild that have been writing
broken headers. In that case, users who upgrade Git might suddenly
find that they can't read repositories that they could read before. In
fact, a tool that we wrote and use internally at GitHub was doing
exactly that, which is how we discovered this "problem".

This patch shows what it would look like to relax the parsing again,
albeit *only* for the first line of the file, and *only* for lines
that start with '#'.

The problem with this patch is that it would make it harder for people
who implement broken tools in the future to discover their mistakes.
The only result of the error would be that it is slower to work with
the `packed-refs` files that they wrote. Such an error could go
undiscovered for a long time.

The tighter check was released quite a while ago, and AFAIK we haven't
had any bug reports from people tripped up by this consistency check.
So I'm inclined to believe that the existing tools are OK and this
patch would be counterproductive. But I wanted to share it with the
list anyway.

Michael

Michael Haggerty (1):
  packed-backend: ignore broken headers

 refs/packed-backend.c | 21 +
 t/t3210-pack-refs.sh  | 33 -
 2 files changed, 41 insertions(+), 13 deletions(-)

-- 
2.14.2



Re: [ANNOUNCE] git-sizer: compute various size-related metrics for your Git repository

2018-03-18 Thread Michael Haggerty
On Fri, Mar 16, 2018 at 10:29 PM, Jeff King  wrote:
> On Fri, Mar 16, 2018 at 09:01:42PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> One thing that can make repositories very pathological is if the ratio
>> of trees to commits is too low.
>>
>> I was dealing with a repo the other day that had several thousand files
>> all in the same root directory, and no subdirectories.
>
> We've definitely run into this problem before (CocoaPods/Specs, for
> example). The metric that would hopefully show this off is "what is the
> tree object with the most entries". Or possibly "what is the average
> number of entries in a tree object".

I find that the best metric for determining this sort of problem is
"Overall repository size -> Trees -> Total tree entries". If you have
a big directory that is being changed frequently, the *real* problem
is that every commit has to rewrite the whole tree, with all of its
many entries. So "Total tree entries" (or equivalently, the total tree
size) skyrockets. And this means that a history traversal has to
*expand* all of those trees again. So a repository that is problematic
for this reason will have a very large number of tree entries.

If you want to detect a bad repository layout like this *before* it
becomes a problem, then probably something like "average tree entries
per commit" might be a good leading indicator of a problem.

Michael


[ANNOUNCE] git-sizer: compute various size-related metrics for your Git repository

2018-03-16 Thread Michael Haggerty
What makes a Git repository unwieldy to work with and host? It turns
out that the respository's on-disk size in gigabytes is only part of
the story. From our experience at GitHub, repositories cause problems
because of poor internal layout at least as often as because of their
overall size. For example,

* blobs or trees that are too large
* large blobs that are modified frequently (e.g., database dumps)
* large trees that are modified frequently
* trees that expand to unreasonable size when checked out (e.g., "Git
bombs" [2])
* too many tiny Git objects
* too many references
* other oddities, such as giant octopus merges, super long reference
names or file paths, huge commit messages, etc.

`git-sizer` [1] is a new open-source tool that computes various
size-related statistics for a Git repository and points out those that
are likely to cause problems or inconvenience to its users.

I tried to make the output of `git-sizer` "opinionated" and easy to
interpret. Example output for the Linux kernel is appended below. I
also made it memory-efficient and resistant against git bombs.

I've written a blog post [3] about `git-sizer` with more explanation
and examples, and the main project page [1] has a long README with
some information about what the individual metrics mean and tips for
fixing problems.

I also put quite a bit of effort into making `git-sizer` fast. It does
its work (including figuring out path names for large objects) based
on a single traversal of the repository history using `git rev-list
--objects --reverse [...]`, followed by using the output of `git
cat-file --batch` or `git cat-file --batch-check` to get information
about individual objects.

On that subject, let me share some more technical details. `git-sizer`
is written in Go. I prototyped several ways of extracting object
information, which is critical to the performance because `git-sizer`
has to read all of the reachable non-blob objects in the repository.
The results surprised me:

| Mechanism for accessing Git data| Time   |
| --- | -: |
| `libgit2/git2go`| 25.5 s |
| `libgit2/git2go` with `ManagedTree` optimization| 18.9 s |
| `src-d/go-git`  | 63.0 s |
| Git command line client |  6.6 s |

It was almost a factor of four faster to read and parse the output of
Git plumbing commands (mainly `git for-each-ref`, `git rev-list
--objects`, `git cat-file --batch-check`, and `git cat-file --batch`)
than it was to use the Go bindings to libgit2. (I expect that part of
the reason is that Go's peculiar stack layout makes it quite expensive
to call out to C.) Even after Carlos Martin implemented an
experimental `ManagedTree` optimization that removed the need to call
C for every entry in a tree, it was still not competitive with the Git
CLI. `go-git`, which is a Git implementation in pure Go, was even
slower. So the final version of `git-sizer` calls `git` for accessing
the repository.

Feedback is welcome, including about the weightings [4] that I use to
compute the "level of concern" of the various metrics.

Have fun,
Michael

[1] https://github.com/github/git-sizer
[2] https://kate.io/blog/git-bomb/
[3] 
https://blog.github.com/2018-03-05-measuring-the-many-sizes-of-a-git-repository/
[4] 
https://github.com/github/git-sizer/blob/2e9a30f241ac357f2af01d42f0dd51fbbbae4b0b/sizes/output.go#L330-L401

$ git-sizer --verbose
Processing blobs: 1652370
Processing trees: 3396199
Processing commits: 722647
Matching commits to trees: 722647
Processing annotated tags: 534
Processing references: 539
| Name | Value | Level of concern   |
|  | - | -- |
| Overall repository size  |   ||
| * Commits|   ||
|   * Count|   723 k   | *  |
|   * Total size   |   525 MiB | ** |
| * Trees  |   ||
|   * Count|  3.40 M   | ** |
|   * Total size   |  9.00 GiB |    |
|   * Total tree entries   |   264 M   | *  |
| * Blobs  |   ||
|   * Count|  1.65 M   | *  |
|   * Total size   |  55.8 GiB | *  |
| * Annotated tags |   ||
|   * Count|   534 ||
| * References |   ||
|   * Count|   539 |  

Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Michael Haggerty
On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider
 wrote:
> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
> and it detected a very large commit of 7.33 MiB in my repo (see chart
> below).
>
> This large commit is expected. I've imported that repo from another
> version control system but excluded all binary files (e.g. images) and
> some 3rd party components as their history is not important [**]. I've
> reintroduced these files in the head commit again. This is where the
> large commit came from.
>
> This repo is not used in production yet but I wonder if this kind of
> approach can cause trouble down the line? Are there any relevant
> implication of a single large commit like this in history?
> [...]
>
> ###
> ## git-sizer output
>
> [...]
> | Name | Value | Level of concern   |
> |  | - | -- |
> [...]
> | Biggest objects  |   ||
> | * Commits|   ||
> |   * Maximum size [1] |  7.33 MiB | !! |
> [...]

The "commit size" that is being referred to here is the size of the
actual commit object; i.e., the author name, parent commits, etc plus
the log message. So a huge commit probably means that you have a huge
log message. This has nothing to do with the number or sizes of the
files added by the commit.

Maybe your migration tool created a huge commit message, for example
listing each of the files that was changed.

AFAIK this won't cause Git itself any problems, but it's likely to be
inconvenient. For example, when you type `git log` and 7 million
characters page by. Or when you use some GUI tool to view your history
and it performs badly because it wasn't built to handle such enormous
commit messages.

Michael


Re: [PATCH] sq_dequote: fix extra consumption of source string

2018-02-18 Thread Michael Haggerty
On Wed, Feb 14, 2018 at 12:41 AM, Jeff King  wrote:
> This fixes a (probably harmless) parsing problem in
> sq_dequote_step(), in which we parse some bogus input
> incorrectly rather than complaining that it's bogus.
> [...]

LGTM. Thanks for taking care of this.

Michael


Re: make git diff output easier to read - use better diff heuristics

2018-02-13 Thread Michael Haggerty
On Tue, Feb 13, 2018 at 7:25 PM, Σπύρος Βαζαίος  wrote:
> While I din't have the experience to express an opinion on this
> matter, I have to say that the --no-indent-heuristic that Jeff
> suggested worked great.
> There were more than a handful of cases that this issue happened in my
> diff file (all were the same: #endif followed by #ifdef).
> Oh, and the language is C indeed.

The "indent heuristic" algorithm that Git now uses by default is
nothing more than that—a heuristic—so it can be fooled. It bases its
decision on the locations of blank lines and the indentations of
non-blank lines. In the vast majority of cases it gives the same or
better results than the old algorithm, but there are some cases, like
yours, where it gives aesthetically less pleasing (though still
correct) results.

The algorithm usually handles C code well, but it tends to be confused
by preprocessor directives, because they are not indented like typical
code. It might be possible to tweak the weights to get it to handle
preprocessor directives better, but that causes it to do worse on
other, more common things like Python code (where blocks are preceded
but not followed by a line with lesser indentation).

Doing significantly better probably would require some amount of
language-awareness, but that's a bigger job than I was willing to take
on.

Michael


[PATCH 4/6] packed_ref_iterator_begin(): make optimization more general

2018-01-24 Thread Michael Haggerty
We can return an empty iterator not only if the `packed-refs` file is
missing, but also if it is empty or if there are no references whose
names succeed `prefix`. Optimize away those cases as well by moving
the call to `find_reference_location()` higher in the function and
checking whether the determined start position is the same as
`snapshot->eof`. (This is possible now because the previous commit
made `find_reference_location()` robust against empty snapshots.)

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 361affd7ad..988c45402b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -927,7 +927,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
 */
snapshot = get_snapshot(refs);
 
-   if (!snapshot->buf)
+   if (prefix && *prefix)
+   start = find_reference_location(snapshot, prefix, 0);
+   else
+   start = snapshot->start;
+
+   if (start == snapshot->eof)
return empty_ref_iterator_begin();
 
iter = xcalloc(1, sizeof(*iter));
@@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
iter->snapshot = snapshot;
acquire_snapshot(snapshot);
 
-   if (prefix && *prefix)
-   start = find_reference_location(snapshot, prefix, 0);
-   else
-   start = snapshot->start;
-
iter->pos = start;
iter->eof = snapshot->eof;
strbuf_init(>refname_buf, 0);
-- 
2.14.2



[PATCH 5/6] load_contents(): don't try to mmap an empty file

2018-01-24 Thread Michael Haggerty
We don't actually create zero-length `packed-refs` files, but they are
valid and we should handle them correctly. The old code `xmmap()`ed
such files, which led to an error when `munmap()` was called. So, if
the `packed-refs` file is empty, leave the snapshot at its zero values
and return 0 without trying to read or mmap the file.

Returning 0 also makes `create_snapshot()` exit early, which avoids
the technically undefined comparison `NULL < NULL`.

Reported-by: Kim Gybels <kgyb...@infogroep.be>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 988c45402b..e829cf206d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -461,7 +461,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
- * existed and was read, or 0 if the file was absent. Die on errors.
+ * existed and was read, or 0 if the file was absent or empty. Die on
+ * errors.
  */
 static int load_contents(struct snapshot *snapshot)
 {
@@ -492,19 +493,17 @@ static int load_contents(struct snapshot *snapshot)
die_errno("couldn't stat %s", snapshot->refs->path);
size = xsize_t(st.st_size);
 
-   switch (mmap_strategy) {
-   case MMAP_NONE:
+   if (!size) {
+   return 0;
+   } else if (mmap_strategy == MMAP_NONE) {
snapshot->buf = xmalloc(size);
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
die_errno("couldn't read %s", snapshot->refs->path);
snapshot->mmapped = 0;
-   break;
-   case MMAP_TEMPORARY:
-   case MMAP_OK:
+   } else {
snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 
0);
snapshot->mmapped = 1;
-   break;
}
close(fd);
 
-- 
2.14.2



[PATCH 0/6] Yet another approach to handling empty snapshots

2018-01-24 Thread Michael Haggerty
This patch series fixes the handling of empty packed-refs snapshots
(i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
by changing `snapshot` to store a pointer to the start of the
post-header `packed-refs` content instead of `header_len`. It makes a
couple of other improvements as well.

I'm not sure whether I like this approach better than the alternative
of always setting `snapshot->buf` to a non-NULL value, by allocating a
length-1 bit of RAM if necessary. The latter is less intrusive, though
even if that approach is taken, I think patches 01, 02, and 04 from
this patch series would be worthwhile improvements.

Michael

Kim Gybels (1):
  packed_ref_cache: don't use mmap() for small files

Michael Haggerty (5):
  struct snapshot: store `start` rather than `header_len`
  create_snapshot(): use `xmemdupz()` rather than a strbuf
  find_reference_location(): make function safe for empty snapshots
  packed_ref_iterator_begin(): make optimization more general
  load_contents(): don't try to mmap an empty file

 refs/packed-backend.c | 106 ++
 1 file changed, 55 insertions(+), 51 deletions(-)

-- 
2.14.2



[PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf

2018-01-24 Thread Michael Haggerty
It's lighter weight.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b872267f02..08698de6ea 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -620,8 +620,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
 
/* If the file has a header line, process it: */
if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {
-   struct strbuf tmp = STRBUF_INIT;
-   char *p, *eol;
+   char *tmp, *p, *eol;
struct string_list traits = STRING_LIST_INIT_NODUP;
 
eol = memchr(snapshot->buf, '\n',
@@ -631,9 +630,9 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
  snapshot->buf,
  snapshot->eof - snapshot->buf);
 
-   strbuf_add(, snapshot->buf, eol - snapshot->buf);
+   tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-   if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char 
**)))
+   if (!skip_prefix(tmp, "# pack-refs with:", (const char **)))
die_invalid_line(refs->path,
 snapshot->buf,
 snapshot->eof - snapshot->buf);
@@ -653,7 +652,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
snapshot->start = eol + 1;
 
string_list_clear(, 0);
-   strbuf_release();
+   free(tmp);
}
 
verify_buffer_safe(snapshot);
-- 
2.14.2



[PATCH 6/6] packed_ref_cache: don't use mmap() for small files

2018-01-24 Thread Michael Haggerty
From: Kim Gybels <kgyb...@infogroep.be>

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

Signed-off-by: Kim Gybels <kgyb...@infogroep.be>
Signed-off-by: Junio C Hamano <gits...@pobox.com>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e829cf206d..8b4b45da67 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -458,6 +458,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -495,7 +497,7 @@ static int load_contents(struct snapshot *snapshot)
 
if (!size) {
return 0;
-   } else if (mmap_strategy == MMAP_NONE) {
+   } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) {
snapshot->buf = xmalloc(size);
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
-- 
2.14.2



[PATCH 3/6] find_reference_location(): make function safe for empty snapshots

2018-01-24 Thread Michael Haggerty
This function had two problems if called for an empty snapshot (i.e.,
`snapshot->start == snapshot->eof == NULL`):

* It checked `NULL < NULL`, which is undefined by C (albeit highly
  unlikely to fail in the real world).

* (Assuming the above comparison behaved as expected), it returned
  NULL when `mustexist` was false, contrary to its docstring.

Change the check and fix the docstring.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 08698de6ea..361affd7ad 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -519,9 +519,11 @@ static int load_contents(struct snapshot *snapshot)
  * `refname` starts. If `mustexist` is true and the reference doesn't
  * exist, then return NULL. If `mustexist` is false and the reference
  * doesn't exist, then return the point where that reference would be
- * inserted. In the latter mode, `refname` doesn't have to be a proper
- * reference name; for example, one could search for "refs/replace/"
- * to find the start of any replace references.
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
  *
  * The record is sought using a binary search, so `snapshot->buf` must
  * be sorted.
@@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot 
*snapshot,
 */
const char *hi = snapshot->eof;
 
-   while (lo < hi) {
+   while (lo != hi) {
const char *mid, *rec;
int cmp;
 
-- 
2.14.2



[PATCH 1/6] struct snapshot: store `start` rather than `header_len`

2018-01-24 Thread Michael Haggerty
Store a pointer to the start of the actual references within the
`packed-refs` contents rather than storing the length of the header.
This is more convenient for most users of this field.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 64 ++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 023243fd5f..b872267f02 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -68,17 +68,21 @@ struct snapshot {
int mmapped;
 
/*
-* The contents of the `packed-refs` file. If the file was
-* already sorted, this points at the mmapped contents of the
-* file. If not, this points at heap-allocated memory
-* containing the contents, sorted. If there were no contents
-* (e.g., because the file didn't exist), `buf` and `eof` are
-* both NULL.
+* The contents of the `packed-refs` file:
+*
+* - buf -- a pointer to the start of the memory
+* - start -- a pointer to the first byte of actual references
+ *   (i.e., after the header line, if one is present)
+* - eof -- a pointer just past the end of the reference
+ *   contents
+*
+* If the `packed-refs` file was already sorted, `buf` points
+* at the mmapped contents of the file. If not, it points at
+* heap-allocated memory containing the contents, sorted. If
+* there were no contents (e.g., because the file didn't
+* exist), `buf`, `start`, and `eof` are all NULL.
 */
-   char *buf, *eof;
-
-   /* The size of the header line, if any; otherwise, 0: */
-   size_t header_len;
+   char *buf, *start, *eof;
 
/*
 * What is the peeled state of the `packed-refs` file that
@@ -169,8 +173,7 @@ static void clear_snapshot_buffer(struct snapshot *snapshot)
} else {
free(snapshot->buf);
}
-   snapshot->buf = snapshot->eof = NULL;
-   snapshot->header_len = 0;
+   snapshot->buf = snapshot->start = snapshot->eof = NULL;
 }
 
 /*
@@ -319,13 +322,14 @@ static void sort_snapshot(struct snapshot *snapshot)
size_t len, i;
char *new_buffer, *dst;
 
-   pos = snapshot->buf + snapshot->header_len;
+   pos = snapshot->start;
eof = snapshot->eof;
-   len = eof - pos;
 
-   if (!len)
+   if (pos == eof)
return;
 
+   len = eof - pos;
+
/*
 * Initialize records based on a crude estimate of the number
 * of references in the file (we'll grow it below if needed):
@@ -391,9 +395,8 @@ static void sort_snapshot(struct snapshot *snapshot)
 * place:
 */
clear_snapshot_buffer(snapshot);
-   snapshot->buf = new_buffer;
+   snapshot->buf = snapshot->start = new_buffer;
snapshot->eof = new_buffer + len;
-   snapshot->header_len = 0;
 
 cleanup:
free(records);
@@ -442,14 +445,14 @@ static const char *find_end_of_record(const char *p, 
const char *end)
  */
 static void verify_buffer_safe(struct snapshot *snapshot)
 {
-   const char *buf = snapshot->buf + snapshot->header_len;
+   const char *start = snapshot->start;
const char *eof = snapshot->eof;
const char *last_line;
 
-   if (buf == eof)
+   if (start == eof)
return;
 
-   last_line = find_start_of_record(buf, eof - 1);
+   last_line = find_start_of_record(start, eof - 1);
if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
die_invalid_line(snapshot->refs->path,
 last_line, eof - last_line);
@@ -495,18 +498,19 @@ static int load_contents(struct snapshot *snapshot)
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
die_errno("couldn't read %s", snapshot->refs->path);
-   snapshot->eof = snapshot->buf + size;
snapshot->mmapped = 0;
break;
case MMAP_TEMPORARY:
case MMAP_OK:
snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 
0);
-   snapshot->eof = snapshot->buf + size;
snapshot->mmapped = 1;
break;
}
close(fd);
 
+   snapshot->start = snapshot->buf;
+   snapshot->eof = snapshot->buf + size;
+
return 1;
 }
 
@@ -539,7 +543,7 @@ static const char *find_reference_location(struct snapshot 
*snapshot,
 * preceding records all have reference names that come
 * *before* `refname`.
 */
-   const char *lo = snapshot->buf + snapshot->header_len;
+   const char *lo = 

Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files

2018-01-24 Thread Michael Haggerty
On 01/22/2018 08:31 PM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
>> (see the earlier code path in `load_contents()`). So either that code
>> path *also* has to get the `xmalloc()` treatment, or my third patch is
>> still necessary. (My second patch wouldn't be necessary because the
>> ENOENT case makes `load_contents()` return 0, triggering the early exit
>> from `create_snapshot()`.)
>>
>> I don't have a strong preference either way.
> 
> Which would be a two-liner, like the attached, which does not look
> too bad by itself.
> 
> The direction, if we take this approach, means that we are declaring
> that .buf being NULL is an invalid state for a snapshot to be in,
> instead of saying "an empty snapshot looks exactly like one that was
> freshly initialized", which seems to be the intention of the original
> design.
> 
> After Kim's fix and with 3/3 in your follow-up series, various
> helpers are still unsafe against .buf being NULL, like
> sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
> when mmapped bit is set), find_reference_location().
> 
> packed_ref_iterator_begin() checks if snapshot->buf is NULL and
> returns early.  At the first glance, this appears a useful short cut
> to optimize the empty case away, but the check also is acting as a
> guard to prevent a snapshot with NULL .buf from being fed to an
> unsafe find_reference_location().  An implicit guard like this feels
> a bit more brittle than my liking.  If we ensure .buf is never NULL,
> that check can become a pure short-cut optimization and stop being
> a correctness thing.
> 
> So...
> 
> 
>  refs/packed-backend.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b6e2bc3c1d..1eeb5c7f80 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
>   if (fd < 0) {
>   if (errno == ENOENT) {
>   /*
> -  * This is OK; it just means that no
> -  * "packed-refs" file has been written yet,
> -  * which is equivalent to it being empty,
> -  * which is its state when initialized with
> -  * zeros.
> +  * Treat missing "packed-refs" as equivalent to
> +  * it being empty.
>*/
> + snapshot->eof = snapshot->buf = xmalloc(0);
> + snapshot->mmapped = 0;
>   return 0;
>   } else {
>   die_errno("couldn't read %s", snapshot->refs->path);
> 

That would work, though if you go this way, please also change the
docstring for `snapshot::buf`, which still says that `buf` and `eof` can
be `NULL`.

The other alternative, making `snapshot` safe for NULLs, becomes easier
if `snapshot` stores a pointer to the start of the reference section of
the `packed-refs` contents (i.e., after the header line), rather than
repeatedly computing that address from `snapshot->buf +
snapshot->header_len`. With this change, code that is technically
undefined when the fields are NULL can more easily be replaced with code
that is safe for NULL. For example,

pos = snapshot->buf + snapshot->header_len

becomes

pos = snapshot->start

, and

len = snapshot->eof - pos;
if (!len) [...]

becomes

if (pos == snapshot->eof) [...]
len = snapshot->eof - pos;

. In this way, most of the special-casing for NULL goes away (and some
code becomes simpler, as well).

In a moment I'll send a patch series illustrating this approach. I think
patches 01, 02, and 04 are improvements regardless of whether we decide
to make NULL safe.

The change to using `read()` rather than `mmap()` for small
`packed-refs` feels like it should be an improvement, but it occurred to
me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
don't use mmap() for small files, 2010-02-21) are not directly
applicable to the `packed-refs` file. As far as I understand, the file
mmapped in `index_fd()` is always read in full, whereas the main point
of mmapping the packed-refs file is to avoid having to read the whole
file at all in some situations. That being said, a 32 KiB file would
only be 8 pages (assuming a page size of 4 KiB), and by the time you've
read the header and binary-searched to find the desired record, you've
probably paged in most of the file anyway. Reading the whole file at
once, in order, is almost certainly cheaper.

Michael


Re: [PATCH] files_initial_transaction_commit(): only unlock if locked

2018-01-22 Thread Michael Haggerty
On 01/19/2018 11:14 PM, Junio C Hamano wrote:
> Jeff King  writes:
> 
>> On Thu, Jan 18, 2018 at 02:38:41PM +0100, Mathias Rav wrote:
>>
>>> Running git clone --single-branch --mirror -b TAGNAME previously
>>> triggered the following error message:
>>>
>>> fatal: multiple updates for ref 'refs/tags/TAGNAME' not allowed.
>>>
>>> This error condition is handled in files_initial_transaction_commit().
>>>
>>> 42c7f7ff9 ("commit_packed_refs(): remove call to `packed_refs_unlock()`", 
>>> 2017-06-23)
>>> introduced incorrect unlocking in the error path of this function,
>>> which changes the error message to
>>>
>>> fatal: BUG: packed_refs_unlock() called when not locked
>>>
>>> Move the call to packed_refs_unlock() above the "cleanup:" label
>>> since the unlocking should only be done in the last error path.
>>
>> Thanks, this solution looks correct to me. It's pretty low-impact since
>> the locking is the second-to-last thing in the function, so we don't
>> have to re-add the unlock to a bunch of error code paths. But one
>> alternative would be to just do:
>>
>>   if (packed_refs_is_locked(refs))
>>  packed_refs_unlock(refs->packed_ref_store);
>>
>> in the cleanup section.
> 
> Yeah, that may be a more future-proof alternative, and just as you
> said the patch as posted would be sufficient, too.

Either solution LGTM. Thanks for finding and fixing this bug.

But let's also take a step back. The invocation

git clone --single-branch --mirror -b TAGNAME

seems curious. Does it even make sense to use `--mirror` and
`--single-branch` at the same time? What should it do?

Normally `--mirror` implies (aside from `--bare`) that the remote
references should be converted 1:1 to local references and should be
overwritten at every fetch; i.e., the refspec should be set to
`+refs/*:refs/*`.

To me the most plausible interpretation of `--mirror --single-branch -b
BRANCHNAME` would be that the single branch should be fetched and made
the HEAD, and the refspec should be set to
`+refs/heads/BRANCHNAME:refs/heads/BRANCHNAME`. It also wouldn't be very
surprising if it were forbidden to use these options together.

Currently, we do neither of those things. Instead we fetch that one
reference (as `refs/heads/BRANCHNAME`) but set the refspec to
`+refs/*:refs/*`; i.e., the next fetch would fetch all of the history.

It's even more mind-bending if `-b` is passed a `TAGNAME` rather than a
`BRANCHNAME`. The documentation says that `-b TAGNAME` "detaches the
HEAD at that commit in the resulting repository". If `--single-branch -b
TAGNAME` is used, then the refspec is set to
`+refs/tags/TAGNAME:refs/tags/TAGNAME`. But what if `--mirror` is also used?

Currently, this fails, apparently because `--mirror` and `-b TAGNAME`
each independently try to set `refs/tags/TAGNAME` (presumably to the
same value). *If* this is a useful use case, we could fix it so that it
doesn't fail. If not, maybe we should prohibit it explicitly and emit a
clearer error message.

Mathias: if you encountered this problem in the real world, what were
you trying to accomplish? What behavior would you have expected?

Maybe the behavior could be made more sane if there were a way to get
the 1:1 reference mapping that `--mirror` implies without also getting
`--bare` [1]. Suppose there were a `--refspec` option. Then instead of

git clone --mirror --single-branch -b BRANCHNAME

with it's non-obvious semantics, you could prohibit that use and instead
support

git clone --bare
--refspec='+refs/heads/BRANCHNAME:refs/heads/BRANCHNAME'

which seems clearer in its intent, if perhaps not super obvious. Or you
could give `clone` a `--no-fetch` option, which would give the user a
time to intervene between setting up the basic clone config and actually
fetching objects.

Michael


[1] It seems like

git clone --config remote.origin.fetch='+refs/*:refs/*' clone ...

might do it, but that actually ends up setting up two refspecs and
only honoring `+refs/heads/*:refs/remotes/origin/*` for the initial
fetch. Plus it is pretty obscure.


Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files

2018-01-20 Thread Michael Haggerty
On 01/17/2018 11:09 PM, Jeff King wrote:
> On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:
> 
>> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
>> small files, 2010-02-21) and use read() instead of mmap() for small
>> packed-refs files.
>>
>> This also fixes the problem[1] where xmmap() returns NULL for zero
>> length[2], for which munmap() later fails.
>>
>> Alternatively, we could simply check for NULL before munmap(), or
>> introduce xmunmap() that could be used together with xmmap(). However,
>> always setting snapshot->buf to a valid pointer, by relying on
>> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
>> easier.
>>
>> [1] https://github.com/git-for-windows/git/issues/1410
>> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>> corrupt databases, 2007-01-11)
>>
>> Signed-off-by: Kim Gybels 
>> ---
>>
>> Change since v2: removed separate case for zero length as suggested by Peff,
>> ensuring that snapshot->buf is always a valid pointer.
> 
> Thanks, this looks fine to me (I'd be curious to hear from Michael if
> this eliminates the need for the other patches).

`snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
(see the earlier code path in `load_contents()`). So either that code
path *also* has to get the `xmalloc()` treatment, or my third patch is
still necessary. (My second patch wouldn't be necessary because the
ENOENT case makes `load_contents()` return 0, triggering the early exit
from `create_snapshot()`.)

I don't have a strong preference either way.

Michael


[PATCH 2/3] create_snapshot(): exit early if the file was empty

2018-01-15 Thread Michael Haggerty
If the `packed_refs` files is entirely empty (i.e., not even a header
line), then `load_contents()` returns 1 even though `snapshot->buf`
and `snapshot->eof` both end up set to NULL. In that case, the
subsequent processing that `create_snapshot()` does is unnecessary,
and also involves computing `NULL - NULL` and `NULL + 0`, which are
probably safe in real life but are technically undefined in C.

Sidestep both issues by exiting early if `snapshot->buf` is NULL.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f20f05b4df..36796d65f0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -613,7 +613,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
acquire_snapshot(snapshot);
snapshot->peeled = PEELED_NONE;
 
-   if (!load_contents(snapshot))
+   if (!load_contents(snapshot) || !snapshot->buf)
return snapshot;
 
/* If the file has a header line, process it: */
-- 
2.14.2



[PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL

2018-01-15 Thread Michael Haggerty
If `snapshot->buf` is NULL, then `find_reference_location()` has two
problems:

1. It relies on behavior that is technically undefined in C, such as
   computing `NULL + 0`.

2. It returns NULL if the reference doesn't exist, even if `mustexist`
   is not set. This problem doesn't come up in the current code,
   because we never call this function with `snapshot->buf == NULL`
   and `mustexist` set. But it is something that future callers need
   to be aware of.

We could fix the first problem by adding some extra logic to the
function. But considering both problems together, it is more
straightforward to document that the function should only be called if
`snapshot->buf` is non-NULL.

Adjust `packed_read_raw_ref()` to return early if `snapshot->buf` is
NULL rather than calling `find_reference_location()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 36796d65f0..ed2b396bef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -521,8 +521,9 @@ static int load_contents(struct snapshot *snapshot)
  * reference name; for example, one could search for "refs/replace/"
  * to find the start of any replace references.
  *
+ * This function must only be called if `snapshot->buf` is non-NULL.
  * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
+ * also be sorted.
  */
 static const char *find_reference_location(struct snapshot *snapshot,
   const char *refname, int mustexist)
@@ -728,6 +729,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 
*type = 0;
 
+   if (!snapshot->buf) {
+   /* There are no packed references */
+   errno = ENOENT;
+   return -1;
+   }
+
rec = find_reference_location(snapshot, refname, 1);
 
if (!rec) {
-- 
2.14.2



[PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files

2018-01-15 Thread Michael Haggerty
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 01a13cb817..f20f05b4df 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -69,11 +69,11 @@ struct snapshot {
 
/*
 * The contents of the `packed-refs` file. If the file was
-* already sorted, this points at the mmapped contents of the
-* file. If not, this points at heap-allocated memory
-* containing the contents, sorted. If there were no contents
-* (e.g., because the file didn't exist), `buf` and `eof` are
-* both NULL.
+* already sorted and if mmapping is allowed, this points at
+* the mmapped contents of the file. If not, this points at
+* heap-allocated memory containing the contents, sorted. If
+* there were no contents (e.g., because the file didn't exist
+* or was empty), `buf` and `eof` are both NULL.
 */
char *buf, *eof;
 
-- 
2.14.2



[PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"

2018-01-15 Thread Michael Haggerty
Thanks for your patch. I haven't measured the performance difference
of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
surprising that `read()` would be faster.

I especially like the fix for zero-length `packed-refs` files. (Even
though AFAIK Git never writes such files, they are totally legitimate
and shouldn't cause Git to fail.) With or without the additions
mentioned below,

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

While reviewing your patch, I realized that some areas of the existing
code use constructs that are undefined according to the C standard,
such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
(and would come up more frequently after your change). Even though
these are unlikely to be problems in the real world, it would be good
to avoid them.

So I will follow up this email with three patches:

1. Mention that `snapshot::buf` can be NULL for empty files

   I suggest squashing this into your patch, to make it clear that
   `snapshot::buf` and `snapshot::eof` can also be NULL if the
   `packed-refs` file is empty.

2. create_snapshot(): exit early if the file was empty

   Avoid undefined behavior by returning early if `snapshot->buf` is
   NULL.

3. find_reference_location(): don't invoke if `snapshot->buf` is NULL

   Avoid undefined behavior and confusing semantics by not calling
   `find_reference_location()` when `snapshot->buf` is NULL.

Michael

Michael Haggerty (3):
  SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  create_snapshot(): exit early if the file was empty
  find_reference_location(): don't invoke if `snapshot->buf` is NULL

 refs/packed-backend.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.14.2



Re: [PATCH] refs: drop "clear packed-refs while locked" assertion

2018-01-01 Thread Michael Haggerty
On Fri, Dec 8, 2017 at 12:22 PM, Jeff King <p...@peff.net> wrote:
> This patch fixes a regression in v2.14.0. It's actually fixed already in
> v2.15.0 because all of the packed-ref code there was rewritten. So
> there's no point in applying this on "master" or even "maint". But I
> figured it was worth sharing here in case somebody else runs across it,
> and in case we ever do a v2.14.4 release.

I forgot to respond to this. +1

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

Michael

> -- >8 --
> In clear_packed_ref_cache(), we assert that we're not
> currently holding the packed-refs lock. But in each of the
> three code paths that can hit this, the assertion is either
> a noop or actively does the wrong thing:
>
>  1. in rollback_packed_refs(), we will have just released
> the lock before calling the function, and so the
> assertion can never trigger.
>
>  2. get_packed_ref_cache() can reach this assertion via
> validate_packed_ref_cache(). But it calls the validate
> function only when it knows that we're not holding the
> lock, so again, the assertion can never trigger.
>
>  3. lock_packed_refs() also calls validate_packed_ref_cache().
> In this case we're _always_ holding the lock, which
> means any time the validate function has to clear the
> cache, we'll trigger this assertion and die.
>
> This doesn't happen often in practice because the
> validate function clears the cache only if we find that
> somebody else has racily rewritten the packed-refs file
> between the time we read it and the time we took the lock.
>
> So most of the time we don't reach the assertion at all
> (nobody has racily written the file so there's no need
> to clear the cache). And when we do, it is not actually
> indicative of a bug; clearing the cache while holding
> the lock is the right thing to do here.
>
> This final case is relatively new, being triggerd by the
> extra validation added in fed6ebebf1 (lock_packed_refs():
> fix cache validity check, 2017-06-12).
>
> Signed-off-by: Jeff King <p...@peff.net>
> ---
>  refs/files-backend.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f21a954ce7..dd41e1d382 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -99,8 +99,6 @@ static void clear_packed_ref_cache(struct files_ref_store 
> *refs)
> if (refs->packed) {
> struct packed_ref_cache *packed_refs = refs->packed;
>
> -   if (is_lock_file_locked(>packed_refs_lock))
> -   die("BUG: packed-ref cache cleared while locked");
> refs->packed = NULL;
> release_packed_ref_cache(packed_refs);
> }
> --
> 2.15.1.659.g8bd2eae3ea


Re: [PATCH v1 2/2] log: add option to choose which refs to decorate

2017-11-05 Thread Michael Haggerty
On 11/05/2017 07:17 AM, Junio C Hamano wrote:
> Junio C Hamano  writes:
>> Rafael Ascensão  writes:
>> ...
>>> Because changing the default behavior of that function has
>>> implications on multiple commands which I think shouldn't change. But
>>> at the same time, would be nice to have the logic that deals with
>>> glob-ref patterns all in one place.
>>>
>>> What's the sane way to do this?
>>
>> Learn to type "--decorate-refs="refs/heads/[m]aster", and not twewak
>> the code at all, perhaps.  The users of existing "with no globbing,
>> /* is appended" interface are already used to that way and they do
>> not have to learn a new and inconsistent interface.
>>
>> After all, "I only want to see 'git log' output with 'master'
>> decorated" (i.e. not specifying "this class of refs I can glob by
>> using the naming convention I am using" and instead enumerating the
>> ones you care about) does not sound like a sensible thing people
>> often want to do, so making it follow the other codepath so that
>> people can say "refs/tags" to get "refs/tags/*", while still allowing
>> such a rare but specific and exact one possible, may not sound too
>> bad to me.
> 
> Having said all that, I can imagine another way out might be to
> change the behaviour of this "normalize" thing to add two patterns,
> the original pattern in addition to the original pattern plus "/*",
> when it sees a pattern without any glob.  Many users who relied on
> the current behaviour fed "refs/tags" knowing that it will match
> everything under "refs/tags" i.e. "refs/tags/*", and they cannot
> have a ref that is exactly "refs/tags", so adding the original
> pattern without an extra trailing "/*" would not hurt them.  And
> this will allow you to say "refs/heads/master" when you know you
> want that exact ref, and in such a repository where that original
> pattern without trailing "/*" would be useful, because you cannot
> have "refs/heads/master/one" at the same time, having an extra
> pattern that is the original plus "/*" would not hurt you, either.
> 
> This however needs a bit of thought to see if there are corner cases
> that may result in unexpected and unwanted fallout, and something I
> am reluctant to declare unilaterally that it is a better way to go.

There's some glob-matching code (somewhere? I don't know if it's allowed
everywhere) that allows "**" to mean "zero or one path components. If
"refs/tags" were massaged to be "refs/tags/**", then it would match not only

refs/tags
refs/tags/foo

but also

refs/tags/foo/bar

, which is probably another thing that the user would expect to see.

There's at least some precedent for this kind of expansion: `git
for-each-ref refs/remotes` lists *all* references under that prefix,
even if they have multiple levels.

Michael


Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

2017-11-05 Thread Michael Haggerty
On 11/06/2017 02:23 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> [1] I say "almost entirely" because putting them in one function means
>> that only `pattern` needs to be scanned for glob characters. But that is
>> an unimportant detail.
> 
> That could actually be an important detail, in that even if prefix
> has wildcard, we'd still append the trailing "/*" as long as the
> pattern does not, right?

That's correct, but I was assuming that the prefix would always be a
hard-coded string like "refs/tags/" or maybe "refs/". (That is the case
now.) It doesn't seem very useful to use a prefix like "refs/*/".

Michael


Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

2017-11-05 Thread Michael Haggerty
On 11/04/2017 01:41 AM, Rafael Ascensão wrote:
> `for_each_glob_ref_in` has some code built into it that converts
> partial refs like 'heads/master' to their full qualified form
> 'refs/heads/master'. It also assume a trailing '/*' if no glob
> characters are present in the pattern.
> 
> Extract that logic to its own function which can be reused elsewhere
> where the same behaviour is needed, and add an ENSURE_GLOB flag
> to toggle if a trailing '/*' is to be appended to the result.
> 
> Signed-off-by: Kevin Daudt 
> Signed-off-by: Rafael Ascensão 
> ---
>  refs.c | 34 --
>  refs.h | 16 
>  2 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index c590a992f..1e74b48e6 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -369,32 +369,38 @@ int head_ref_namespaced(each_ref_fn fn, void *cb_data)
>   return ret;
>  }
>  
> -int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
> - const char *prefix, void *cb_data)
> +void normalize_glob_ref(struct strbuf *normalized_pattern, const char 
> *prefix,
> + const char *pattern, int flags)
>  {
> - struct strbuf real_pattern = STRBUF_INIT;
> - struct ref_filter filter;
> - int ret;
> -
>   if (!prefix && !starts_with(pattern, "refs/"))
> - strbuf_addstr(_pattern, "refs/");
> + strbuf_addstr(normalized_pattern, "refs/");
>   else if (prefix)
> - strbuf_addstr(_pattern, prefix);
> - strbuf_addstr(_pattern, pattern);
> + strbuf_addstr(normalized_pattern, prefix);
> + strbuf_addstr(normalized_pattern, pattern);

I realize that the old code did this too, but while you are in the area
it might be nice to simplify the logic from

if (!prefix && !starts_with(pattern, "refs/"))
strbuf_addstr(normalized_pattern, "refs/");
else if (prefix)
strbuf_addstr(normalized_pattern, prefix);

to

if (prefix)
strbuf_addstr(normalized_pattern, prefix);
else if (!starts_with(pattern, "refs/"))
strbuf_addstr(normalized_pattern, "refs/");

This would avoid having to check twice whether `prefix` is NULL.

> - if (!has_glob_specials(pattern)) {
> + if (!has_glob_specials(pattern) && (flags & ENSURE_GLOB)) {
>   /* Append implied '/' '*' if not present. */
> - strbuf_complete(_pattern, '/');
> + strbuf_complete(normalized_pattern, '/');
>   /* No need to check for '*', there is none. */
> - strbuf_addch(_pattern, '*');
> + strbuf_addch(normalized_pattern, '*');
>   }
> +}
> +
> +int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
> + const char *prefix, void *cb_data)
> +{
> + struct strbuf normalized_pattern = STRBUF_INIT;
> + struct ref_filter filter;
> + int ret;
> +
> + normalize_glob_ref(_pattern, prefix, pattern, ENSURE_GLOB);
>  
> - filter.pattern = real_pattern.buf;
> + filter.pattern = normalized_pattern.buf;
>   filter.fn = fn;
>   filter.cb_data = cb_data;
>   ret = for_each_ref(filter_refs, );
>  
> - strbuf_release(_pattern);
> + strbuf_release(_pattern);
>   return ret;
>  }
>  
> diff --git a/refs.h b/refs.h
> index a02b628c8..9f9a8bb27 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -312,6 +312,22 @@ int for_each_namespaced_ref(each_ref_fn fn, void 
> *cb_data);
>  int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void 
> *cb_data);
>  int for_each_rawref(each_ref_fn fn, void *cb_data);
>  
> +/*
> + * Normalizes partial refs to their full qualified form.
> + * If prefix is NULL, will prepend 'refs/' to the pattern if it doesn't start
> + * with 'refs/'. Results in refs/
> + *
> + * If prefix is not NULL will result in /
> + *
> + * If ENSURE_GLOB is set and no glob characters are found in the
> + * pattern, a trailing <*> will be appended to the result.
> + * (<> characters to avoid breaking C comment syntax)
> + */
> +
> +#define ENSURE_GLOB 1
> +void normalize_glob_ref (struct strbuf *normalized_pattern, const char 
> *prefix,
> + const char *pattern, int flags);

There shouldn't be a space between the function name and the open
parenthesis.

You have complicated the interface by allowing an `ENSURE_BLOB` flag.
This would make sense if the logic for normalizing the prefix were
tangled up with the logic for adding the suffix. But in fact they are
almost entirely orthogonal [1].

So the interface might be simplified by having two functions,

void normalize_glob_ref(normalized_pattern, prefix, pattern);
void ensure_blob(struct strbuf *pattern);

The caller in this patch would call the functions one after the other
(or the `ensure_blob` behavior could be inlined in
`for_each_glob_ref_in()`, since it doesn't yet have any callers). And
the callers introduced in patch 2 would only need to 

Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

2017-11-05 Thread Michael Haggerty
On 11/04/2017 11:45 PM, Kevin Daudt wrote:
> On Sat, Nov 04, 2017 at 11:27:39AM +0900, Junio C Hamano wrote:
>> I however notice that addition of /* to the tail is trying to be
>> careful by using strbuf_complete('/'), but prefixing with "refs/"
>> does not and we would end up with a double-slash if pattern begins
>> with a slash.  The contract between the caller of this function (or
>> its original, which is for_each_glob_ref_in()) and the callee is
>> that prefix must not begin with '/', so it may be OK, but we might
>> want to add "if (*pattern == '/') BUG(...)" at the beginning.
>>
>> I dunno.  In any case, that is totally outside the scope of this two
>> patch series.
> 
> I do think it's a good idea to make future readers of the code aware of
> this contract, and adding a BUG assert does that quite well. Here is a
> patch that implements it.
> 
> This applies of course on top of this patch series.
> 
> -- >8 --
> Subject: [PATCH] normalize_glob_ref: assert implicit contract of prefix
> 
> normalize_glob_ref has an implicit contract of expecting 'prefix' to not
> start with a '/', otherwise the pattern would end up with a
> double-slash.
> 
> Mark it as a BUG when the prefix argument of normalize_glob_ref starts
> with a '/' so that future callers will be aware of this contract.
> 
> Signed-off-by: Kevin Daudt 
> ---
>  refs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index e9ae659ae..6747981d1 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -372,6 +372,8 @@ int head_ref_namespaced(each_ref_fn fn, void *cb_data)
>  void normalize_glob_ref(struct strbuf *normalized_pattern, const char 
> *prefix,
>   const char *pattern, int flags)
>  {
> + if (prefix && *prefix == '/') BUG("prefix cannot not start with '/'");

This should be split onto two lines.

Also, "prefix cannot not start ..." has two "not". I suggest changing it
to "prefix must not start ...", because that makes it clearer that the
caller is at fault.

What if the caller passes the empty string as prefix? In that case, the
end result would be "/", which is also bogus.

> +
>   if (!prefix && !starts_with(pattern, "refs/"))
>   strbuf_addstr(normalized_pattern, "refs/");
>   else if (prefix)

Michael


[PATCH v2 2/9] prune_ref(): call `ref_transaction_add_update()` directly

2017-11-05 Thread Michael Haggerty
`prune_ref()` needs to use the `REF_ISPRUNING` flag, but we want to
make that flag private to the files backend. So instead of calling
`ref_transaction_delete()`, which is a public function and therefore
shouldn't allow the `REF_ISPRUNING` flag, change `prune_ref()` to call
`ref_transaction_add_update()`, which is private to the refs
module. (Note that we don't need any of the other services provided by
`ref_transaction_delete()`.)

This allows us to change `ref_transaction_update()` to reject the
`REF_ISPRUNING` flag. Do so by adjusting
`REF_TRANSACTION_UPDATE_ALLOWED_FLAGS`. Also add parentheses to its
definition to avoid potential future mishaps.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   |  4 +---
 refs/files-backend.c | 25 -
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/refs.h b/refs.h
index 15fd419c7d..4ffef9502d 100644
--- a/refs.h
+++ b/refs.h
@@ -349,9 +349,7 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
flags);
  * Flags that can be passed in to ref_transaction_update
  */
 #define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   REF_ISPRUNING |  \
-   REF_FORCE_CREATE_REFLOG |\
-   REF_NODEREF
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
 
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fadf1036d3..ba72d28b13 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -989,22 +989,29 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
 {
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
+   int ret = -1;
 
if (check_refname_format(r->name, 0))
return;
 
transaction = ref_store_transaction_begin(>base, );
-   if (!transaction ||
-   ref_transaction_delete(transaction, r->name, >oid,
-  REF_ISPRUNING | REF_NODEREF, NULL, ) ||
-   ref_transaction_commit(transaction, )) {
-   ref_transaction_free(transaction);
+   if (!transaction)
+   goto cleanup;
+   ref_transaction_add_update(
+   transaction, r->name,
+   REF_NODEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   _oid, >oid, NULL);
+   if (ref_transaction_commit(transaction, ))
+   goto cleanup;
+
+   ret = 0;
+
+cleanup:
+   if (ret)
error("%s", err.buf);
-   strbuf_release();
-   return;
-   }
-   ref_transaction_free(transaction);
strbuf_release();
+   ref_transaction_free(transaction);
+   return;
 }
 
 /*
-- 
2.14.1



[PATCH v2 4/9] ref_transaction_add_update(): remove a check

2017-11-05 Thread Michael Haggerty
We want to make `REF_ISPRUNING` internal to the files backend. For
this to be possible, `ref_transaction_add_update()` mustn't know about
it. So move the check that `REF_ISPRUNING` is only used with
`REF_NODEREF` from this function to `files_transaction_prepare()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c   | 3 ---
 refs/files-backend.c | 7 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 7c1e206e08..0d9a1348cd 100644
--- a/refs.c
+++ b/refs.c
@@ -906,9 +906,6 @@ struct ref_update *ref_transaction_add_update(
if (transaction->state != REF_TRANSACTION_OPEN)
die("BUG: update called for transaction that is not open");
 
-   if ((flags & REF_ISPRUNING) && !(flags & REF_NODEREF))
-   die("BUG: REF_ISPRUNING set without REF_NODEREF");
-
FLEX_ALLOC_STR(update, refname, refname);
ALLOC_GROW(transaction->updates, transaction->nr + 1, 
transaction->alloc);
transaction->updates[transaction->nr++] = update;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ba72d28b13..a47771e4d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2518,13 +2518,18 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * transaction. (If we end up splitting up any updates using
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
-* same refname as any existing ones.)
+* same refname as any existing ones.) Also fail if any of the
+* updates use REF_ISPRUNING without REF_NODEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
+   if ((update->flags & REF_ISPRUNING) &&
+   !(update->flags & REF_NODEREF))
+   BUG("REF_ISPRUNING set without REF_NODEREF");
+
/*
 * We store a pointer to update in item->util, but at
 * the moment we never use the value of this field
-- 
2.14.1



[PATCH v2 6/9] refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`

2017-11-05 Thread Michael Haggerty
Even after working with this code for years, I still see this constant
name as "ref node ref". Rename it to make it's meaning clearer.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 builtin/am.c   |  2 +-
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/clone.c|  4 ++--
 builtin/notes.c|  2 +-
 builtin/remote.c   |  6 +++---
 builtin/symbolic-ref.c |  2 +-
 builtin/update-ref.c   |  4 ++--
 refs.h |  6 +++---
 refs/files-backend.c   | 40 
 refs/refs-internal.h   |  4 ++--
 sequencer.c|  6 +++---
 12 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c9bb14a6c2..894290e2d3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2151,7 +2151,7 @@ static void am_abort(struct am_state *state)
   has_curr_head ? _head : NULL, 0,
   UPDATE_REFS_DIE_ON_ERR);
else if (curr_branch)
-   delete_ref(NULL, curr_branch, NULL, REF_NODEREF);
+   delete_ref(NULL, curr_branch, NULL, REF_NO_DEREF);
 
free(curr_branch);
am_destroy(state);
diff --git a/builtin/branch.c b/builtin/branch.c
index b1ed649300..33fd5fcfd1 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -258,7 +258,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
}
 
if (delete_ref(NULL, name, is_null_oid() ? NULL : ,
-  REF_NODEREF)) {
+  REF_NO_DEREF)) {
error(remote_branch
  ? _("Error deleting remote-tracking branch '%s'")
  : _("Error deleting branch '%s'"),
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 463a337e5d..114028ee01 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -665,7 +665,7 @@ static void update_refs_for_switch(const struct 
checkout_opts *opts,
/* Nothing to do. */
} else if (opts->force_detach || !new->path) {  /* No longer on any 
branch. */
update_ref(msg.buf, "HEAD", >commit->object.oid, NULL,
-  REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
+  REF_NO_DEREF, UPDATE_REFS_DIE_ON_ERR);
if (!opts->quiet) {
if (old->path &&
advice_detached_head && !opts->force_detach)
diff --git a/builtin/clone.c b/builtin/clone.c
index 695bdd7046..557c6c3c06 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -689,7 +689,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
} else if (our) {
struct commit *c = lookup_commit_reference(>old_oid);
/* --branch specifies a non-branch (i.e. tags), detach HEAD */
-   update_ref(msg, "HEAD", >object.oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >object.oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
} else if (remote) {
/*
@@ -697,7 +697,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
 * HEAD points to a branch but we don't know which one.
 * Detach HEAD in all these cases.
 */
-   update_ref(msg, "HEAD", >old_oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >old_oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
}
 }
diff --git a/builtin/notes.c b/builtin/notes.c
index 12afdf1907..d7754db143 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -686,7 +686,7 @@ static int merge_abort(struct notes_merge_options *o)
 
if (delete_ref(NULL, "NOTES_MERGE_PARTIAL", NULL, 0))
ret += error(_("failed to delete ref NOTES_MERGE_PARTIAL"));
-   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NODEREF))
+   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NO_DEREF))
ret += error(_("failed to delete ref NOTES_MERGE_REF"));
if (notes_merge_abort(o))
ret += error(_("failed to remove 'git notes merge' worktree"));
diff --git a/builtin/remote.c b/builtin/remote.c
index 0fddc64461..3d38c6150c 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -693,7 +693,7 @@ static int mv(int argc, const char **argv)
read_ref_full(item->string, RESOLVE_REF_READING, , );
if (!(flag & REF_ISSYMREF))
continue;
-   if (delete_ref(NULL, item->string, NULL, REF_NODEREF))
+   if (delete_ref(NULL, item->string, NULL, REF_NO_DEREF))
die(_(

[PATCH v2 3/9] ref_transaction_update(): die on disallowed flags

2017-11-05 Thread Michael Haggerty
Callers shouldn't be passing disallowed flags into
`ref_transaction_update()`. So instead of masking them off, treat it
as a bug if any are set.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 62a7621025..7c1e206e08 100644
--- a/refs.c
+++ b/refs.c
@@ -940,7 +940,8 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
-   flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
+   if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
+   BUG("illegal flags 0x%x passed to ref_transaction_update()", 
flags);
 
flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
 
-- 
2.14.1



[PATCH v2 9/9] refs: update some more docs to use "oid" rather than "sha1"

2017-11-05 Thread Michael Haggerty
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c|  2 +-
 refs.h|  8 
 refs/files-backend.c  | 19 +--
 refs/packed-backend.c |  2 +-
 refs/ref-cache.c  |  4 ++--
 refs/refs-internal.h  | 18 +-
 6 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/refs.c b/refs.c
index 0d9a1348cd..339d4318ee 100644
--- a/refs.c
+++ b/refs.c
@@ -770,7 +770,7 @@ static int read_ref_at_ent(struct object_id *ooid, struct 
object_id *noid,
if (cb->cutoff_cnt)
*cb->cutoff_cnt = cb->reccnt - 1;
/*
-* we have not yet updated cb->[n|o]sha1 so they still
+* we have not yet updated cb->[n|o]oid so they still
 * hold the values for the previous record.
 */
if (!is_null_oid(>ooid)) {
diff --git a/refs.h b/refs.h
index d396012367..18582a408c 100644
--- a/refs.h
+++ b/refs.h
@@ -126,7 +126,7 @@ int peel_ref(const char *refname, struct object_id *oid);
 /**
  * Resolve refname in the nested "gitlink" repository in the specified
  * submodule (which must be non-NULL). If the resolution is
- * successful, return 0 and set sha1 to the name of the object;
+ * successful, return 0 and set oid to the name of the object;
  * otherwise, return a non-zero value.
  */
 int resolve_gitlink_ref(const char *submodule, const char *refname,
@@ -260,7 +260,7 @@ struct ref_transaction;
 
 /*
  * The signature for the callback function for the for_each_*()
- * functions below.  The memory pointed to by the refname and sha1
+ * functions below.  The memory pointed to by the refname and oid
  * arguments is only guaranteed to be valid for the duration of a
  * single callback invocation.
  */
@@ -354,7 +354,7 @@ int reflog_exists(const char *refname);
 
 /*
  * Delete the specified reference. If old_oid is non-NULL, then
- * verify that the current value of the reference is old_sha1 before
+ * verify that the current value of the reference is old_oid before
  * deleting it. If old_oid is NULL, delete the reference if it
  * exists, regardless of its old value. It is an error for old_oid to
  * be null_oid. msg and flags are passed through to
@@ -633,7 +633,7 @@ int ref_transaction_abort(struct ref_transaction 
*transaction,
  * It is a bug to call this function when there might be other
  * processes accessing the repository or if there are existing
  * references that might conflict with the ones being created. All
- * old_sha1 values must either be absent or NULL_SHA1.
+ * old_oid values must either be absent or null_oid.
  */
 int initial_ref_transaction_commit(struct ref_transaction *transaction,
   struct strbuf *err);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bb10b715a8..2298f900dd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -240,7 +240,7 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
} else if (is_null_oid()) {
/*
 * It is so astronomically unlikely
-* that NULL_SHA1 is the SHA-1 of an
+* that null_oid is the OID of an
 * actual object that we consider its
 * appearance in a loose reference
 * file to be repo corruption
@@ -473,7 +473,7 @@ static void unlock_ref(struct ref_lock *lock)
  * are passed to refs_verify_refname_available() for this check.
  *
  * If mustexist is not set and the reference is not found or is
- * broken, lock the reference anyway but clear sha1.
+ * broken, lock the reference anyway but clear old_oid.
  *
  * Return 0 on success. On failure, write an error message to err and
  * return TRANSACTION_NAME_CONFLICT or TRANSACTION_GENERIC_ERROR.
@@ -1648,9 +1648,8 @@ static int files_log_ref_write(struct files_ref_store 
*refs,
 }
 
 /*
- * Write sha1 into the open lockfile, then close the lockfile. On
- * errors, rollback the lockfile, fill in *err and
- * return -1.
+ * Write oid into the open lockfile, then close the lockfile. On
+ * errors, rollback the lockfile, fill in *err and return -1.
  */
 static int write_ref_to_lockfile(struct ref_lock *lock,
 const struct object_id *oid, struct strbuf 
*err)
@@ -2272,7 +2271,7 @@ static int split_symref_update(struct files_ref_store 
*refs,
 
/*
 * Change the symbolic ref update to log only. Also, it
-* doesn't need to check its old SHA-1 value, as that will be
+* doesn't need to check its old OID value, as that will be
 * done when new_update is processed.
 */
update->flags |= REF_LOG_ONLY | REF_NO_DEREF;
@@ -2341,7 +2340,7 @@ static int check_old_oid(struct ref_update *update, 
struct object_

[PATCH v2 5/9] refs: tidy up and adjust visibility of the `ref_update` flags

2017-11-05 Thread Michael Haggerty
The constants used for `ref_update::flags` were rather disorganized:

* The definitions in `refs.h` were not close to the functions that
  used them.

* Maybe constants were defined in `refs-internal.h`, making them
  visible to the whole refs module, when in fact they only made sense
  for the files backend.

* Their documentation wasn't very consistent and partly still referred
  to sha1s rather than oids.

* The numerical values followed no rational scheme

Fix all of these problems. The main functional improvement is that
some constants' visibility is now limited to `files-backend.c`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   | 67 ++--
 refs/files-backend.c | 45 +++
 refs/refs-internal.h | 67 
 3 files changed, 99 insertions(+), 80 deletions(-)

diff --git a/refs.h b/refs.h
index 4ffef9502d..261d46c10c 100644
--- a/refs.h
+++ b/refs.h
@@ -335,22 +335,6 @@ void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
  */
 int refs_pack_refs(struct ref_store *refs, unsigned int flags);
 
-/*
- * Flags controlling ref_transaction_update(), ref_transaction_create(), etc.
- * REF_NODEREF: act on the ref directly, instead of dereferencing
- *  symbolic references.
- *
- * Other flags are reserved for internal use.
- */
-#define REF_NODEREF0x01
-#define REF_FORCE_CREATE_REFLOG 0x40
-
-/*
- * Flags that can be passed in to ref_transaction_update
- */
-#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
-
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
  */
@@ -478,22 +462,23 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  *
  * refname -- the name of the reference to be affected.
  *
- * new_sha1 -- the SHA-1 that should be set to be the new value of
- * the reference. Some functions allow this parameter to be
+ * new_oid -- the object ID that should be set to be the new value
+ * of the reference. Some functions allow this parameter to be
  * NULL, meaning that the reference is not changed, or
- * null_sha1, meaning that the reference should be deleted. A
+ * null_oid, meaning that the reference should be deleted. A
  * copy of this value is made in the transaction.
  *
- * old_sha1 -- the SHA-1 value that the reference must have before
+ * old_oid -- the object ID that the reference must have before
  * the update. Some functions allow this parameter to be NULL,
  * meaning that the old value of the reference is not checked,
- * or null_sha1, meaning that the reference must not exist
+ * or null_oid, meaning that the reference must not exist
  * before the update. A copy of this value is made in the
  * transaction.
  *
  * flags -- flags affecting the update, passed to
- * update_ref_lock(). Can be REF_NODEREF, which means that
- * symbolic references should not be followed.
+ * update_ref_lock(). Possible flags: REF_NODEREF,
+ * REF_FORCE_CREATE_REFLOG. See those constants for more
+ * information.
  *
  * msg -- a message describing the change (for the reflog).
  *
@@ -509,11 +494,37 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  */
 
 /*
- * Add a reference update to transaction. new_oid is the value that
- * the reference should have after the update, or null_oid if it
- * should be deleted. If new_oid is NULL, then the reference is not
- * changed at all. old_oid is the value that the reference must have
- * before the update, or null_oid if it must not have existed
+ * The following flags can be passed to ref_transaction_update() etc.
+ * Internally, they are stored in `ref_update::flags`, along with some
+ * internal flags.
+ */
+
+/*
+ * Act on the ref directly; i.e., without dereferencing symbolic refs.
+ * If this flag is not specified, then symbolic references are
+ * dereferenced and the update is applied to the referent.
+ */
+#define REF_NODEREF (1 << 0)
+
+/*
+ * Force the creation of a reflog for this reference, even if it
+ * didn't previously have a reflog.
+ */
+#define REF_FORCE_CREATE_REFLOG (1 << 1)
+
+/*
+ * Bitmask of all of the flags that are allowed to be passed in to
+ * ref_transaction_update() and friends:
+ */
+#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
+
+/*
+ * Add a reference update to transaction. `new_oid` is the value that
+ * the reference should have after the update, or `null_oid` if it
+ * should be deleted. If `new_oid` is NULL, then the reference is not
+ * changed at all. `old_oid` is the value that the reference must have
+ * before the update, or `null_oid` if it must not have existed
  * beforehand. The old value is checked after

[PATCH v2 0/9] Tidy up the constants related to ref_update::flags

2017-11-05 Thread Michael Haggerty
This is a reroll of a patch series that tidies up some stuff around
the ref_update::flags constants. Thanks to Junio and Martin for their
comments about v1 [1].

Relative to v1, this version:

* In patch 5, cleans up the touched comments to refer to OIDs rather
  than SHA-1s.

* Adds a patch 8, which changes `write_packed_entry()` to take
  `object_id` arguments.

* Adds a patch 9, which cleans up some remaining comments across all
  of the refs-related files to refer to OIDs rather than SHA-1s.

This patch series depends on bc/object-id. The patches are also
available from my GitHub fork as branch `tidy-ref-update-flags` [2].

Michael

[1] https://public-inbox.org/git/cover.1509183413.git.mhag...@alum.mit.edu/
[2] https://github.com/mhagger/git

Michael Haggerty (9):
  files_transaction_prepare(): don't leak flags to packed transaction
  prune_ref(): call `ref_transaction_add_update()` directly
  ref_transaction_update(): die on disallowed flags
  ref_transaction_add_update(): remove a check
  refs: tidy up and adjust visibility of the `ref_update` flags
  refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`
  refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`
  write_packed_entry(): take `object_id` arguments
  refs: update some more docs to use "oid" rather than "sha1"

 builtin/am.c   |   2 +-
 builtin/branch.c   |   2 +-
 builtin/checkout.c |   2 +-
 builtin/clone.c|   4 +-
 builtin/notes.c|   2 +-
 builtin/remote.c   |   6 +--
 builtin/symbolic-ref.c |   2 +-
 builtin/update-ref.c   |   4 +-
 refs.c |   8 ++-
 refs.h |  77 -
 refs/files-backend.c   | 132 +++--
 refs/packed-backend.c  |  18 +++
 refs/ref-cache.c   |   4 +-
 refs/refs-internal.h   |  81 +-
 sequencer.c|   6 +--
 15 files changed, 188 insertions(+), 162 deletions(-)

-- 
2.14.1



[PATCH v2 1/9] files_transaction_prepare(): don't leak flags to packed transaction

2017-11-05 Thread Michael Haggerty
The files backend uses `ref_update::flags` for several internal flags.
But those flags have no meaning to the packed backend. So when adding
updates for the packed-refs transaction, only use flags that make
sense to the packed backend.

`REF_NODEREF` is part of the public interface, and it's logically what
we want, so include it. In fact it is actually ignored by the packed
backend (which doesn't support symbolic references), but that's its
own business.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2bd54e11ae..fadf1036d3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2594,8 +2594,8 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
ref_transaction_add_update(
packed_transaction, update->refname,
-   update->flags & ~REF_HAVE_OLD,
-   >new_oid, >old_oid,
+   REF_HAVE_NEW | REF_NODEREF,
+   >new_oid, NULL,
NULL);
}
}
-- 
2.14.1



[PATCH v2 7/9] refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`

2017-11-05 Thread Michael Haggerty
Underscores are cheap, and help readability.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 71e088e811..bb10b715a8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -14,7 +14,7 @@
  * This backend uses the following flags in `ref_update::flags` for
  * internal bookkeeping purposes. Their numerical values must not
  * conflict with REF_NO_DEREF, REF_FORCE_CREATE_REFLOG, REF_HAVE_NEW,
- * REF_HAVE_OLD, or REF_ISPRUNING, which are also stored in
+ * REF_HAVE_OLD, or REF_IS_PRUNING, which are also stored in
  * `ref_update::flags`.
  */
 
@@ -22,7 +22,7 @@
  * Used as a flag in ref_update::flags when a loose ref is being
  * pruned. This flag must only be used when REF_NO_DEREF is set.
  */
-#define REF_ISPRUNING (1 << 4)
+#define REF_IS_PRUNING (1 << 4)
 
 /*
  * Flag passed to lock_ref_sha1_basic() telling it to tolerate broken
@@ -1044,7 +1044,7 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
goto cleanup;
ref_transaction_add_update(
transaction, r->name,
-   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_IS_PRUNING,
_oid, >oid, NULL);
if (ref_transaction_commit(transaction, ))
goto cleanup;
@@ -2177,7 +2177,7 @@ static int split_head_update(struct ref_update *update,
struct ref_update *new_update;
 
if ((update->flags & REF_LOG_ONLY) ||
-   (update->flags & REF_ISPRUNING) ||
+   (update->flags & REF_IS_PRUNING) ||
(update->flags & REF_UPDATE_VIA_HEAD))
return 0;
 
@@ -2564,16 +2564,16 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
 * same refname as any existing ones.) Also fail if any of the
-* updates use REF_ISPRUNING without REF_NO_DEREF.
+* updates use REF_IS_PRUNING without REF_NO_DEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
-   if ((update->flags & REF_ISPRUNING) &&
+   if ((update->flags & REF_IS_PRUNING) &&
!(update->flags & REF_NO_DEREF))
-   BUG("REF_ISPRUNING set without REF_NO_DEREF");
+   BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
/*
 * We store a pointer to update in item->util, but at
@@ -2632,7 +2632,7 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
/*
 * This reference has to be deleted from
 * packed-refs if it exists there.
@@ -2749,7 +2749,7 @@ static int files_transaction_finish(struct ref_store 
*ref_store,
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
strbuf_reset();
files_reflog_path(refs, , update->refname);
if (!unlink_or_warn(sb.buf))
-- 
2.14.1



[PATCH v2 8/9] write_packed_entry(): take `object_id` arguments

2017-11-05 Thread Michael Haggerty
Change `write_packed_entry()` to take `struct object_id *` rather than
`unsigned char *` arguments.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 74f1dea0f4..43ad74fc5a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -961,11 +961,11 @@ static struct ref_iterator *packed_ref_iterator_begin(
  * by the failing call to `fprintf()`.
  */
 static int write_packed_entry(FILE *fh, const char *refname,
- const unsigned char *sha1,
- const unsigned char *peeled)
+ const struct object_id *oid,
+ const struct object_id *peeled)
 {
-   if (fprintf(fh, "%s %s\n", sha1_to_hex(sha1), refname) < 0 ||
-   (peeled && fprintf(fh, "^%s\n", sha1_to_hex(peeled)) < 0))
+   if (fprintf(fh, "%s %s\n", oid_to_hex(oid), refname) < 0 ||
+   (peeled && fprintf(fh, "^%s\n", oid_to_hex(peeled)) < 0))
return -1;
 
return 0;
@@ -1203,8 +1203,8 @@ static int write_with_updates(struct packed_ref_store 
*refs,
int peel_error = ref_iterator_peel(iter, );
 
if (write_packed_entry(out, iter->refname,
-  iter->oid->hash,
-  peel_error ? NULL : peeled.hash))
+  iter->oid,
+  peel_error ? NULL : ))
goto write_error;
 
if ((ok = ref_iterator_advance(iter)) != ITER_OK)
@@ -1224,8 +1224,8 @@ static int write_with_updates(struct packed_ref_store 
*refs,
 );
 
if (write_packed_entry(out, update->refname,
-  update->new_oid.hash,
-  peel_error ? NULL : peeled.hash))
+  >new_oid,
+  peel_error ? NULL : ))
goto write_error;
 
i++;
-- 
2.14.1



Re: [PATCH 5/7] refs: tidy up and adjust visibility of the `ref_update` flags

2017-11-05 Thread Michael Haggerty
On 11/01/2017 09:18 AM, Martin Ågren wrote:
> On 28 October 2017 at 11:49, Michael Haggerty <mhag...@alum.mit.edu> wrote:
>> The constants used for `ref_update::flags` were rather disorganized:
> 
>> * The documentation wasn't very consistent and partly still referred
>>   to sha1s rather than oids.
> 
>> @@ -478,22 +462,23 @@ struct ref_transaction *ref_transaction_begin(struct 
>> strbuf *err);
>>   *
>>   * refname -- the name of the reference to be affected.
>>   *
>> - * new_sha1 -- the SHA-1 that should be set to be the new value of
>> + * new_oid -- the SHA-1 that should be set to be the new value of
>>   * the reference. Some functions allow this parameter to be
>>   * NULL, meaning that the reference is not changed, or
>> - * null_sha1, meaning that the reference should be deleted. A
>> + * null_oid, meaning that the reference should be deleted. A
>>   * copy of this value is made in the transaction.
>>   *
>> - * old_sha1 -- the SHA-1 value that the reference must have before
>> + * old_oid -- the SHA-1 value that the reference must have before
> 
> You still refer to "SHA-1" twice in this hunk. Maybe squash this in, at
> least partially? This addresses all remaining "sha"/"SHA" in refs.h.
> [...]

Thanks for this.

I'll squash the changes that have to do with these flags into this
commit, and change the other docstrings as part of a separate commit
that also fixes up similar problems in other refs-related comments.

I also realized that `write_packed_entry()` still takes `unsigned char
*` arguments. I'll fix that, too, in yet another commit.

Thanks,
Michael



Re: [PATCH 1/7] files_transaction_prepare(): don't leak flags to packed transaction

2017-11-05 Thread Michael Haggerty
On 10/30/2017 05:44 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> The files backend uses `ref_update::flags` for several internal flags.
>> But those flags have no meaning to the packed backend. So when adding
>> updates for the packed-refs transaction, only use flags that make
>> sense to the packed backend.
>>
>> `REF_NODEREF` is part of the public interface, and it's logically what
>> we want, so include it. In fact it is actually ignored by the packed
>> backend (which doesn't support symbolic references), but that's its
>> own business.
>>
>> Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
>> ---
>>  refs/files-backend.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 2bd54e11ae..fadf1036d3 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -2594,8 +2594,8 @@ static int files_transaction_prepare(struct ref_store 
>> *ref_store,
>>  
>>  ref_transaction_add_update(
>>  packed_transaction, update->refname,
>> -update->flags & ~REF_HAVE_OLD,
>> ->new_oid, >old_oid,
>> +REF_HAVE_NEW | REF_NODEREF,
>> +>new_oid, NULL,
> 
> Hmph, so we earlier passed all flags except HAVE_OLD down, which
> meant that update->flags that this transaction for packed backend
> does not have to see are given to it nevertheless.  The new way the
> parameter is prepared does nto depend on update->flags at all, so
> that is about "don't leak flags".
> 
> That much I can understand.  But it is not explained why (1) we do
> not pass old_oid anymore and (2) we do give HAVE_NEW.  
> 
> Presumably the justification for (1) is something like "because we
> are not passing HAVE_OLD, we shouldn't have been passing old_oid at
> all---it was a harmless bug because lack of HAVE_OLD made the callee
> ignore old_oid"

It's debatable whether the old code should even be called a bug. The
callee is documented to ignore `old_oid` if `HAVE_OLD` is not set. But
it was certainly misleading to pass unneeded information to the function.

> (2) is "we need to pass HAVE_NEW, and we have
> been always passing HAVE_NEW because update->flags at this point is
> guaranteed to have it" or something like that?

Correct. `REF_DELETING` is set by `lock_ref_for_update()` only if
`update->flags & REF_HAVE_NEW` was set, and this code is only called if
`REF_DELETING` is set.

Michael


Re: [PATCH 2/2] refs_resolve_ref_unsafe: handle d/f conflicts for writes

2017-11-04 Thread Michael Haggerty
On 10/07/2017 06:36 AM, Michael Haggerty wrote:
> On 10/06/2017 07:16 PM, Jeff King wrote:
>> On Fri, Oct 06, 2017 at 07:09:10PM +0200, Michael Haggerty wrote:
>>
>>> I do have one twinge of uneasiness at a deeper level, that I haven't had
>>> time to check...
>>>
>>> Does this patch make it easier to *set* HEAD to an unborn branch that
>>> d/f conflicts with an existing reference? If so, that might be a
>>> slightly worse UI for users. I'd rather learn about such a problem when
>>> setting HEAD (when I am thinking about the new branch name and am in the
>>> frame of mind to solve the problem) rather than later, when I try to
>>> commit to the new branch.
>>
>> Good question. The answer is no, it's allowed both before and after my
>> patch. At least via git-symbolic-ref.
>>
>> I agree it would be nice to know earlier for such a case. For
>> symbolic-ref, we probably should allow it, because it's plumbing that
>> may be used for tricky things. For things like "checkout -b", you'd
>> generally get a timely warning as we try to create the ref.
>>
>> The odd man out is "checkout --orphan", which leaves the branch unborn.
>> It might be nice if it did a manual check that the ref is available (and
>> also that it's syntactically acceptable, though I think we may do that
>> already).
>>
>> But all of that is orthogonal to this fix, I think.
> 
> Thanks for checking. Yes, I totally agree that this is orthogonal.

I also just checked but there don't seem to be any docstrings that need
updating.

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

(both patches in this series).

Michael






[PATCH 7/7] refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`

2017-10-28 Thread Michael Haggerty
Underscores are cheap, and help readability.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 71e088e811..bb10b715a8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -14,7 +14,7 @@
  * This backend uses the following flags in `ref_update::flags` for
  * internal bookkeeping purposes. Their numerical values must not
  * conflict with REF_NO_DEREF, REF_FORCE_CREATE_REFLOG, REF_HAVE_NEW,
- * REF_HAVE_OLD, or REF_ISPRUNING, which are also stored in
+ * REF_HAVE_OLD, or REF_IS_PRUNING, which are also stored in
  * `ref_update::flags`.
  */
 
@@ -22,7 +22,7 @@
  * Used as a flag in ref_update::flags when a loose ref is being
  * pruned. This flag must only be used when REF_NO_DEREF is set.
  */
-#define REF_ISPRUNING (1 << 4)
+#define REF_IS_PRUNING (1 << 4)
 
 /*
  * Flag passed to lock_ref_sha1_basic() telling it to tolerate broken
@@ -1044,7 +1044,7 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
goto cleanup;
ref_transaction_add_update(
transaction, r->name,
-   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_IS_PRUNING,
_oid, >oid, NULL);
if (ref_transaction_commit(transaction, ))
goto cleanup;
@@ -2177,7 +2177,7 @@ static int split_head_update(struct ref_update *update,
struct ref_update *new_update;
 
if ((update->flags & REF_LOG_ONLY) ||
-   (update->flags & REF_ISPRUNING) ||
+   (update->flags & REF_IS_PRUNING) ||
(update->flags & REF_UPDATE_VIA_HEAD))
return 0;
 
@@ -2564,16 +2564,16 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
 * same refname as any existing ones.) Also fail if any of the
-* updates use REF_ISPRUNING without REF_NO_DEREF.
+* updates use REF_IS_PRUNING without REF_NO_DEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
-   if ((update->flags & REF_ISPRUNING) &&
+   if ((update->flags & REF_IS_PRUNING) &&
!(update->flags & REF_NO_DEREF))
-   BUG("REF_ISPRUNING set without REF_NO_DEREF");
+   BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
/*
 * We store a pointer to update in item->util, but at
@@ -2632,7 +2632,7 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
/*
 * This reference has to be deleted from
 * packed-refs if it exists there.
@@ -2749,7 +2749,7 @@ static int files_transaction_finish(struct ref_store 
*ref_store,
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
strbuf_reset();
files_reflog_path(refs, , update->refname);
if (!unlink_or_warn(sb.buf))
-- 
2.14.1



[PATCH 0/7] Tidy up the constants related to ref_update::flags

2017-10-28 Thread Michael Haggerty
Nothing earth-shattering here; just cleaning up some internal details
that have bothered me for a while:

* Make it clearer which flag values the packed backend might confront.

* Reduce the visibility of the constants that are only relevant to the
  files backend. The most notable of these is `REF_ISPRUNING`, which
  previously was special-cased in
  `REF_TRANSACTION_UPDATE_ALLOWED_FLAGS` even though it shouldn't be
  used by callers outside of the refs module.

* Die (rather then silently ignoring) if any disallowed flags are
  passed to `ref_transaction_update()` or friends.

* Rename `REF_NODEREF` to `REF_NO_DEREF` (otherwise it's easy to read
  as `REF_NODE_REF`!)

* Rename `REF_ISPRUNING` to `REF_IS_PRUNING`.

* Make the constants' numerical values correspond to their order.

* Improve a lot of docstrings.

This patch series depends on `bc/object-id`, mostly so that the
comments can talk about OIDs rather than SHA-1s.

These patches are also available as branch `tidy-ref-update-flags`
from my GitHub fork [1].

Michael

[1] https://github.com/mhagger/git

Michael Haggerty (7):
  files_transaction_prepare(): don't leak flags to packed transaction
  prune_ref(): call `ref_transaction_add_update()` directly
  ref_transaction_update(): die on disallowed flags
  ref_transaction_add_update(): remove a check
  refs: tidy up and adjust visibility of the `ref_update` flags
  refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`
  refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`

 builtin/am.c   |   2 +-
 builtin/branch.c   |   2 +-
 builtin/checkout.c |   2 +-
 builtin/clone.c|   4 +-
 builtin/notes.c|   2 +-
 builtin/remote.c   |   6 +--
 builtin/symbolic-ref.c |   2 +-
 builtin/update-ref.c   |   4 +-
 refs.c |   6 +--
 refs.h |  67 -
 refs/files-backend.c   | 113 +
 refs/refs-internal.h   |  67 +++--
 sequencer.c|   6 +--
 13 files changed, 155 insertions(+), 128 deletions(-)

-- 
2.14.1



[PATCH 1/7] files_transaction_prepare(): don't leak flags to packed transaction

2017-10-28 Thread Michael Haggerty
The files backend uses `ref_update::flags` for several internal flags.
But those flags have no meaning to the packed backend. So when adding
updates for the packed-refs transaction, only use flags that make
sense to the packed backend.

`REF_NODEREF` is part of the public interface, and it's logically what
we want, so include it. In fact it is actually ignored by the packed
backend (which doesn't support symbolic references), but that's its
own business.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2bd54e11ae..fadf1036d3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2594,8 +2594,8 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
ref_transaction_add_update(
packed_transaction, update->refname,
-   update->flags & ~REF_HAVE_OLD,
-   >new_oid, >old_oid,
+   REF_HAVE_NEW | REF_NODEREF,
+   >new_oid, NULL,
NULL);
}
}
-- 
2.14.1



[PATCH 6/7] refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`

2017-10-28 Thread Michael Haggerty
Even after working with this code for years, I still see this constant
name as "ref node ref". Rename it to make it's meaning clearer.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 builtin/am.c   |  2 +-
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/clone.c|  4 ++--
 builtin/notes.c|  2 +-
 builtin/remote.c   |  6 +++---
 builtin/symbolic-ref.c |  2 +-
 builtin/update-ref.c   |  4 ++--
 refs.h |  6 +++---
 refs/files-backend.c   | 40 
 refs/refs-internal.h   |  4 ++--
 sequencer.c|  6 +++---
 12 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c9bb14a6c2..894290e2d3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2151,7 +2151,7 @@ static void am_abort(struct am_state *state)
   has_curr_head ? _head : NULL, 0,
   UPDATE_REFS_DIE_ON_ERR);
else if (curr_branch)
-   delete_ref(NULL, curr_branch, NULL, REF_NODEREF);
+   delete_ref(NULL, curr_branch, NULL, REF_NO_DEREF);
 
free(curr_branch);
am_destroy(state);
diff --git a/builtin/branch.c b/builtin/branch.c
index b1ed649300..33fd5fcfd1 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -258,7 +258,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
}
 
if (delete_ref(NULL, name, is_null_oid() ? NULL : ,
-  REF_NODEREF)) {
+  REF_NO_DEREF)) {
error(remote_branch
  ? _("Error deleting remote-tracking branch '%s'")
  : _("Error deleting branch '%s'"),
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 463a337e5d..114028ee01 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -665,7 +665,7 @@ static void update_refs_for_switch(const struct 
checkout_opts *opts,
/* Nothing to do. */
} else if (opts->force_detach || !new->path) {  /* No longer on any 
branch. */
update_ref(msg.buf, "HEAD", >commit->object.oid, NULL,
-  REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
+  REF_NO_DEREF, UPDATE_REFS_DIE_ON_ERR);
if (!opts->quiet) {
if (old->path &&
advice_detached_head && !opts->force_detach)
diff --git a/builtin/clone.c b/builtin/clone.c
index 695bdd7046..557c6c3c06 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -689,7 +689,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
} else if (our) {
struct commit *c = lookup_commit_reference(>old_oid);
/* --branch specifies a non-branch (i.e. tags), detach HEAD */
-   update_ref(msg, "HEAD", >object.oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >object.oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
} else if (remote) {
/*
@@ -697,7 +697,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
 * HEAD points to a branch but we don't know which one.
 * Detach HEAD in all these cases.
 */
-   update_ref(msg, "HEAD", >old_oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >old_oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
}
 }
diff --git a/builtin/notes.c b/builtin/notes.c
index 12afdf1907..d7754db143 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -686,7 +686,7 @@ static int merge_abort(struct notes_merge_options *o)
 
if (delete_ref(NULL, "NOTES_MERGE_PARTIAL", NULL, 0))
ret += error(_("failed to delete ref NOTES_MERGE_PARTIAL"));
-   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NODEREF))
+   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NO_DEREF))
ret += error(_("failed to delete ref NOTES_MERGE_REF"));
if (notes_merge_abort(o))
ret += error(_("failed to remove 'git notes merge' worktree"));
diff --git a/builtin/remote.c b/builtin/remote.c
index 0fddc64461..3d38c6150c 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -693,7 +693,7 @@ static int mv(int argc, const char **argv)
read_ref_full(item->string, RESOLVE_REF_READING, , );
if (!(flag & REF_ISSYMREF))
continue;
-   if (delete_ref(NULL, item->string, NULL, REF_NODEREF))
+   if (delete_ref(NULL, item->string, NULL, REF_NO_DEREF))
die(_(

[PATCH 5/7] refs: tidy up and adjust visibility of the `ref_update` flags

2017-10-28 Thread Michael Haggerty
The constants used for `ref_update::flags` were rather disorganized:

* The definitions in `refs.h` were not close to the functions that
  used them.

* Maybe constants were defined in `refs-internal.h`, making them
  visible to the whole refs module, when in fact they only made sense
  for the files backend.

* The documentation wasn't very consistent and partly still referred
  to sha1s rather than oids.

* The numerical values followed no rational scheme

Fix all of these problems. The main functional improvement is that
some constants' visibility is now limited to `files-backend.c`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   | 65 +-
 refs/files-backend.c | 45 +++
 refs/refs-internal.h | 67 
 3 files changed, 98 insertions(+), 79 deletions(-)

diff --git a/refs.h b/refs.h
index 4ffef9502d..7dc022a809 100644
--- a/refs.h
+++ b/refs.h
@@ -335,22 +335,6 @@ void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
  */
 int refs_pack_refs(struct ref_store *refs, unsigned int flags);
 
-/*
- * Flags controlling ref_transaction_update(), ref_transaction_create(), etc.
- * REF_NODEREF: act on the ref directly, instead of dereferencing
- *  symbolic references.
- *
- * Other flags are reserved for internal use.
- */
-#define REF_NODEREF0x01
-#define REF_FORCE_CREATE_REFLOG 0x40
-
-/*
- * Flags that can be passed in to ref_transaction_update
- */
-#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
-
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
  */
@@ -478,22 +462,23 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  *
  * refname -- the name of the reference to be affected.
  *
- * new_sha1 -- the SHA-1 that should be set to be the new value of
+ * new_oid -- the SHA-1 that should be set to be the new value of
  * the reference. Some functions allow this parameter to be
  * NULL, meaning that the reference is not changed, or
- * null_sha1, meaning that the reference should be deleted. A
+ * null_oid, meaning that the reference should be deleted. A
  * copy of this value is made in the transaction.
  *
- * old_sha1 -- the SHA-1 value that the reference must have before
+ * old_oid -- the SHA-1 value that the reference must have before
  * the update. Some functions allow this parameter to be NULL,
  * meaning that the old value of the reference is not checked,
- * or null_sha1, meaning that the reference must not exist
+ * or null_oid, meaning that the reference must not exist
  * before the update. A copy of this value is made in the
  * transaction.
  *
  * flags -- flags affecting the update, passed to
- * update_ref_lock(). Can be REF_NODEREF, which means that
- * symbolic references should not be followed.
+ * update_ref_lock(). Possible flags: REF_NODEREF,
+ * REF_FORCE_CREATE_REFLOG. See those constants for more
+ * information.
  *
  * msg -- a message describing the change (for the reflog).
  *
@@ -509,11 +494,37 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  */
 
 /*
- * Add a reference update to transaction. new_oid is the value that
- * the reference should have after the update, or null_oid if it
- * should be deleted. If new_oid is NULL, then the reference is not
- * changed at all. old_oid is the value that the reference must have
- * before the update, or null_oid if it must not have existed
+ * The following flags can be passed to ref_transaction_update() etc.
+ * Internally, they are stored in `ref_update::flags`, along with some
+ * internal flags.
+ */
+
+/*
+ * Act on the ref directly; i.e., without dereferencing symbolic refs.
+ * If this flag is not specified, then symbolic references are
+ * dereferenced and the update is applied to the referent.
+ */
+#define REF_NODEREF (1 << 0)
+
+/*
+ * Force the creation of a reflog for this reference, even if it
+ * didn't previously have a reflog.
+ */
+#define REF_FORCE_CREATE_REFLOG (1 << 1)
+
+/*
+ * Bitmask of all of the flags that are allowed to be passed in to
+ * ref_transaction_update() and friends:
+ */
+#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
+
+/*
+ * Add a reference update to transaction. `new_oid` is the value that
+ * the reference should have after the update, or `null_oid` if it
+ * should be deleted. If `new_oid` is NULL, then the reference is not
+ * changed at all. `old_oid` is the value that the reference must have
+ * before the update, or `null_oid` if it must not have existed
  * beforehand. The old value is checked after the lock is taken to
  * prevent races. If the old value doesn't 

[PATCH 2/7] prune_ref(): call `ref_transaction_add_update()` directly

2017-10-28 Thread Michael Haggerty
`prune_ref()` needs to use the `REF_ISPRUNING` flag, but we want to
make that flag private to the files backend. So instead of calling
`ref_transaction_delete()`, which is a public function and therefore
shouldn't allow the `REF_ISPRUNING` flag, change `prune_ref()` to call
`ref_transaction_add_update()`, which is private to the refs
module. (Note that we don't need any of the other services provided by
`ref_transaction_delete()`.)

This allows us to change `ref_transaction_update()` to reject the
`REF_ISPRUNING` flag. Do so by adjusting
`REF_TRANSACTION_UPDATE_ALLOWED_FLAGS`. Also add parentheses to its
definition to avoid potential future mishaps.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   |  4 +---
 refs/files-backend.c | 25 -
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/refs.h b/refs.h
index 15fd419c7d..4ffef9502d 100644
--- a/refs.h
+++ b/refs.h
@@ -349,9 +349,7 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
flags);
  * Flags that can be passed in to ref_transaction_update
  */
 #define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   REF_ISPRUNING |  \
-   REF_FORCE_CREATE_REFLOG |\
-   REF_NODEREF
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
 
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fadf1036d3..ba72d28b13 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -989,22 +989,29 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
 {
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
+   int ret = -1;
 
if (check_refname_format(r->name, 0))
return;
 
transaction = ref_store_transaction_begin(>base, );
-   if (!transaction ||
-   ref_transaction_delete(transaction, r->name, >oid,
-  REF_ISPRUNING | REF_NODEREF, NULL, ) ||
-   ref_transaction_commit(transaction, )) {
-   ref_transaction_free(transaction);
+   if (!transaction)
+   goto cleanup;
+   ref_transaction_add_update(
+   transaction, r->name,
+   REF_NODEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   _oid, >oid, NULL);
+   if (ref_transaction_commit(transaction, ))
+   goto cleanup;
+
+   ret = 0;
+
+cleanup:
+   if (ret)
error("%s", err.buf);
-   strbuf_release();
-   return;
-   }
-   ref_transaction_free(transaction);
strbuf_release();
+   ref_transaction_free(transaction);
+   return;
 }
 
 /*
-- 
2.14.1



[PATCH 4/7] ref_transaction_add_update(): remove a check

2017-10-28 Thread Michael Haggerty
We want to make `REF_ISPRUNING` internal to the files backend. For
this to be possible, `ref_transaction_add_update()` mustn't know about
it. So move the check that `REF_ISPRUNING` is only used with
`REF_NODEREF` from this function to `files_transaction_prepare()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c   | 3 ---
 refs/files-backend.c | 7 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 7c1e206e08..0d9a1348cd 100644
--- a/refs.c
+++ b/refs.c
@@ -906,9 +906,6 @@ struct ref_update *ref_transaction_add_update(
if (transaction->state != REF_TRANSACTION_OPEN)
die("BUG: update called for transaction that is not open");
 
-   if ((flags & REF_ISPRUNING) && !(flags & REF_NODEREF))
-   die("BUG: REF_ISPRUNING set without REF_NODEREF");
-
FLEX_ALLOC_STR(update, refname, refname);
ALLOC_GROW(transaction->updates, transaction->nr + 1, 
transaction->alloc);
transaction->updates[transaction->nr++] = update;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ba72d28b13..a47771e4d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2518,13 +2518,18 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * transaction. (If we end up splitting up any updates using
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
-* same refname as any existing ones.)
+* same refname as any existing ones.) Also fail if any of the
+* updates use REF_ISPRUNING without REF_NODEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
+   if ((update->flags & REF_ISPRUNING) &&
+   !(update->flags & REF_NODEREF))
+   BUG("REF_ISPRUNING set without REF_NODEREF");
+
/*
 * We store a pointer to update in item->util, but at
 * the moment we never use the value of this field
-- 
2.14.1



[PATCH 3/7] ref_transaction_update(): die on disallowed flags

2017-10-28 Thread Michael Haggerty
Callers shouldn't be passing disallowed flags into
`ref_transaction_update()`. So instead of masking them off, treat it
as a bug if any are set.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 62a7621025..7c1e206e08 100644
--- a/refs.c
+++ b/refs.c
@@ -940,7 +940,8 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
-   flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
+   if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
+   BUG("illegal flags 0x%x passed to ref_transaction_update()", 
flags);
 
flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
 
-- 
2.14.1



[PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily

2017-10-28 Thread Michael Haggerty
There is no need to rewrite the `packed-refs` file except for the case
that we are deleting a reference that has a packed version. Verify
that `packed-refs` is not rewritten when it shouldn't be.

In fact, two of these tests fail:

* A new (empty) `packed-refs` file is created when deleting any loose
  reference and no `packed-refs` file previously existed.

* The `packed-refs` file is rewritten unnecessarily when deleting a
  loose reference that has no packed counterpart.

Both problems will be fixed in the next commit.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 t/t1409-avoid-packing-refs.sh | 118 ++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t1409-avoid-packing-refs.sh

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
new file mode 100755
index 00..a2397c7b71
--- /dev/null
+++ b/t/t1409-avoid-packing-refs.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='avoid rewriting packed-refs unnecessarily'
+
+. ./test-lib.sh
+
+# Add an identifying mark to the packed-refs file header line. This
+# shouldn't upset readers, and it should be omitted if the file is
+# ever rewritten.
+mark_packed_refs () {
+   sed -e "s/^\(#.*\)/\1 t1409 /" <.git/packed-refs >.git/packed-refs.new 
&&
+   mv .git/packed-refs.new .git/packed-refs
+}
+
+# Verify that the packed-refs file is still marked.
+check_packed_refs_marked () {
+   grep -q '^#.* t1409 ' .git/packed-refs
+}
+
+test_expect_success 'setup' '
+   git commit --allow-empty -m "Commit A" &&
+   A=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit B" &&
+   B=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit C" &&
+   C=$(git rev-parse HEAD)
+'
+
+test_expect_failure 'do not create packed-refs file gratuitously' '
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $A &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $C $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref -d refs/heads/foo &&
+   test_must_fail test -f .git/packed-refs
+'
+
+test_expect_success 'check that marking the packed-refs file works' '
+   git for-each-ref >expected &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   check_packed_refs_marked &&
+   git for-each-ref >actual &&
+   test_cmp expected actual &&
+   git pack-refs --all &&
+   test_must_fail check_packed_refs_marked &&
+   git for-each-ref >actual2 &&
+   test_cmp expected actual2
+'
+
+test_expect_success 'leave packed-refs untouched on update of packed' '
+   git update-ref refs/heads/packed-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of packed' '
+   git update-ref refs/heads/packed-checked-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of packed' '
+   git update-ref refs/heads/packed-verify $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   echo "verify refs/heads/packed-verify $A" | git update-ref --stdin &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'touch packed-refs on delete of packed' '
+   git update-ref refs/heads/packed-delete $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref -d refs/heads/packed-delete &&
+   test_must_fail check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-checked-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of loose' '
+ 

[PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily

2017-10-28 Thread Michael Haggerty
This reroll make some logically small changes to v1 [1] that are
textually very big:

* Invert the sense of `is_packed_transaction_noop()` and rename it to
  `is_packed_transaction_needed()`. This makes the logic easier to
  follow and document.

* Add a big comment to that function, describing the cases when it
  returns false positives and explaining why that isn't a problem.

* In the commit message for patch 02, gives a lot more information
  about the regression that it is fixing. Thanks to Eric for the
  suggestion.

These patches are also available as branch
`avoid-rewriting-packed-refs` on my GitHub fork [2]. They now use
`mh/packed-ref-transactions` as the base, since that is where Junio
chose to apply v1.

Michael

[1] https://public-inbox.org/git/cover.1508924577.git.mhag...@alum.mit.edu/
[2] https://github.com/mhagger/git

Michael Haggerty (2):
  t1409: check that `packed-refs` is not rewritten unnecessarily
  files-backend: don't rewrite the `packed-refs` file unnecessarily

 refs/files-backend.c  |  18 ++-
 refs/packed-backend.c |  94 +
 refs/packed-backend.h |   9 
 t/t1409-avoid-packing-refs.sh | 118 ++
 4 files changed, 238 insertions(+), 1 deletion(-)
 create mode 100755 t/t1409-avoid-packing-refs.sh

-- 
2.14.1



[PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-28 Thread Michael Haggerty
Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting only exist
as loose references. Implement this optimization as follows:

* Add a function `is_packed_transaction_needed()`, which checks
  whether a given packed-refs transaction actually needs to be carried
  out (i.e., it returns false if the transaction obviously wouldn't
  have any effect). This function must be called while holding the
  `packed-refs` lock to avoid races.

* Change `files_transaction_prepare()` to check whether the
  packed-refs transaction is actually needed. If not, squelch it, but
  continue holding the `packed-refs` lock until the end of the
  transaction to avoid races.

This fixes a mild regression caused by dc39e09942 (files_ref_store:
use a transaction to update packed refs, 2017-09-08). Before that
commit, unnecessary rewrites of `packed-refs` were suppressed by
`repack_without_refs()`. But the transaction-based writing introduced
by that commit didn't perform that optimization.

Note that the pre-dc39e09942 code still had to *read* the whole
`packed-refs` file to determine that the rewrite could be skipped, so
the performance for the cases that the write could be elided was
`O(N)` in the number of packed references both before and after
dc39e09942. But after that commit the constant factor increased.

This commit reimplements the optimization of eliding unnecessary
`packed-refs` rewrites. That, plus the fact that since
cfa2e29c34 (packed_ref_store: get rid of the `ref_cache` entirely,
2017-03-17) we don't necessarily have to read the whole `packed-refs`
file at all, means that deletes of one or a few loose references can
now be done with `O(n lg N)` effort, where `n` is the number of loose
references being deleted and `N` is the total number of packed
references.

This commit fixes two tests in t1409.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c  | 18 -
 refs/packed-backend.c | 94 +++
 refs/packed-backend.h |  9 +
 t/t1409-avoid-packing-refs.sh |  4 +-
 4 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 961424a4ea..da8a986697 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2562,7 +2562,23 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
goto cleanup;
}
backend_data->packed_refs_locked = 1;
-   ret = ref_transaction_prepare(packed_transaction, err);
+
+   if (is_packed_transaction_needed(refs->packed_ref_store,
+packed_transaction)) {
+   ret = ref_transaction_prepare(packed_transaction, err);
+   } else {
+   /*
+* We can skip rewriting the `packed-refs`
+* file. But we do need to leave it locked, so
+* that somebody else doesn't pack a reference
+* that we are trying to delete.
+*/
+   if (ref_transaction_abort(packed_transaction, err)) {
+   ret = TRANSACTION_GENERIC_ERROR;
+   goto cleanup;
+   }
+   backend_data->packed_transaction = NULL;
+   }
}
 
 cleanup:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 0279aeceea..0b0a17ca8e 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -754,6 +754,100 @@ static int write_with_updates(struct packed_ref_store 
*refs,
return -1;
 }
 
+int is_packed_transaction_needed(struct ref_store *ref_store,
+struct ref_transaction *transaction)
+{
+   struct packed_ref_store *refs = packed_downcast(
+   ref_store,
+   REF_STORE_READ,
+   "is_packed_transaction_needed");
+   struct strbuf referent = STRBUF_INIT;
+   size_t i;
+   int ret;
+
+   if (!is_lock_file_locked(>lock))
+   BUG("is_packed_transaction_needed() called while unlocked");
+
+   /*
+* We're only going to bother returning false for the common,
+* trivial case that references are only being deleted, their
+* old values are not being checked, and the old `packed-refs`
+* file doesn't contain any of those reference(s). This gives
+* false positives for some other cases that could
+* theoretically be optimized away:
+*
+* 1. It could be that the old value is being verified without
+*setting a new value. In this case, we could verify the
+*old value here and skip the update if it agrees. If it
+*disagrees, we 

Re: [PATCH 0/2] Avoid rewriting "packed-refs" unnecessarily

2017-10-27 Thread Michael Haggerty
On 10/25/2017 10:10 PM, Eric Sunshine wrote:
> On Wed, Oct 25, 2017 at 5:53 AM, Michael Haggerty <mhag...@alum.mit.edu> 
> wrote:
>> Since commit dc39e09942 (files_ref_store: use a transaction to update
>> packed refs, 2017-09-08), we've been rewriting the `packed-refs` file
>> unnecessarily when deleting a loose reference that has no packed
>> counterpart. Add some tests demonstrating this problem, then fix it by
>> teaching `files_backend` to see whether any references being deleted
>> have packed versions, and if not, skipping the packed_refs
>> transaction.
>>
>> This is a mild regression since 2.14, which avoided rewriting the
>> `packed-refs` file in these cases [...]
> 
> Should the above information (that this fixes a regression) be
> mentioned in the commit message of at least one of the patches
> (probably 2/2)? Without such context, it may not be clear to someone
> later reading those commit message -- someone who wasn't following
> this email discussion -- that this behavior had been lost and is now
> being restored.

Yes, that's a good idea. I'll send a re-roll with that change.

Michael


Re: [PATCH] t0000: check whether the shell supports the "local" keyword

2017-10-26 Thread Michael Haggerty
On 10/26/2017 10:40 AM, Jacob Keller wrote:
> On Thu, Oct 26, 2017 at 1:28 AM, Eric Sunshine <sunsh...@sunshineco.com> 
> wrote:
>> On Thu, Oct 26, 2017 at 4:18 AM, Michael Haggerty <mhag...@alum.mit.edu> 
>> wrote:
>>> Add a test balloon to see if we get complaints from anybody who is
>>> using a shell that doesn't support the "local" keyword. If so, this
>>> test can be reverted. If not, we might want to consider using "local"
>>> in shell code throughout the git code base.
>>
>> I would guess that the number of people who actually run the Git test
>> suite is microscopic compared to the number of people who use Git
>> itself. It is not clear, therefore, that lack of reports of failure of
>> the new test would imply that "local" can safely be used throughout
>> the Git code base. At best, it might indicate that "local" can be used
>> in the tests.
>>
>> Or, am I missing something?
>>
> 
> I don't think you're missing anything. I think the idea here is: "do
> any users who actively run the test suite care if we start using
> local". I don't think the goal is to allow use of local in non-test
> suite code. At least, that's not how I interpreted it.
> 
> Thus it's fine to be only as part of a test and see if anyone
> complains, since the only people affected would be those which
> actually run the test suite...
> 
> Changing our requirement for regular shell scripts we ship seems a lot
> trickier to gauge.

Actually, I would hope that if this experiment is successful that we can
use "local" in production code, too.

The proper question isn't "what fraction of Git users run the test
suite?", because I agree with Eric that that is microscopic. The correct
question is "on what fraction of platforms where Git will be run has the
test suite been run by *somebody*?", and I think (I hope!) that that
fraction is quite high.

Really...if you are compiling Git on a platform that is so deviant or
archaic that it doesn't have a reasonable Shell, and you don't even
bother running the test suite, you kindof deserve your fate, don't you?

Michael


[PATCH] t0000: check whether the shell supports the "local" keyword

2017-10-26 Thread Michael Haggerty
Add a test balloon to see if we get complaints from anybody who is
using a shell that doesn't support the "local" keyword. If so, this
test can be reverted. If not, we might want to consider using "local"
in shell code throughout the git code base.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
This has been discussed on the mailing list [1,2].

Michael

[1] 
https://public-inbox.org/git/CAPig+cRLB=dGD=+Af=yyl3m709lrpeurtvcdlo9ibkyy2ha...@mail.gmail.com/
[2] https://public-inbox.org/git/20160601163747.ga10...@sigill.intra.peff.net/

 t/t-basic.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 1aa5093f36..7fd87dd544 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -20,6 +20,31 @@ modification *should* take notice and update the test 
vectors here.
 
 . ./test-lib.sh
 
+try_local_x () {
+   local x="local" &&
+   echo "$x"
+}
+
+# This test is an experiment to check whether any Git users are using
+# Shells that don't support the "local" keyword. "local" is not
+# POSIX-standard, but it is very widely supported by POSIX-compliant
+# shells, and if it doesn't cause problems for people, we would like
+# to be able to use it in Git code.
+#
+# For now, this is the only test that requires "local". If your shell
+# fails this test, you can ignore the failure, but please report the
+# problem to the Git mailing list <git@vger.kernel.org>, as it might
+# convince us to continue avoiding the use of "local".
+test_expect_success 'verify that the running shell supports "local"' '
+   x="notlocal" &&
+   echo "local" >expected1 &&
+   try_local_x >actual1 &&
+   test_cmp expected1 actual1 &&
+   echo "notlocal" >expected2 &&
+   echo "$x" >actual2 &&
+   test_cmp expected2 actual2
+'
+
 
 # git init has been done in an empty repository.
 # make sure it is empty.
-- 
2.14.1



Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-26 Thread Michael Haggerty
On 10/26/2017 08:46 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> Even when we are deleting references, we needn't overwrite the
>> `packed-refs` file if the references that we are deleting are not
>> present in that file. Implement this optimization as follows:
> 
> This goal I can understand.  files-transaction-prepare may see a
> deletion and in order to avoid a deletion of a ref from the
> file-backend to expose a stale entry in the packed-refs file, it
> prepares a transaction to remove the same ref that might exist in
> it.  If it turns out that there is no such ref in the packed-refs
> file, then we can leave the packed-refs file as-is without
> rewriting.
> 
>> * Add a function `is_packed_transaction_noop()`, which checks whether
>>   a given packed-refs transaction doesn't actually have to do
>>   anything. This function must be called while holding the
>>   `packed-refs` lock to avoid races.
> 
> This checks three things only to cover the most trivial case (I am
> perfectly happy that it punts on more complex cases).  If an update
> 
>  - checks the old value,
> 
>  - sets a new value, or
> 
>  - deletes a ref that is not on the filesystem,
> 
> it is not a trivial case.  I can understand the latter two (i.e. We
> are special casing the deletion, and an update with a new value is
> not.  If removing a file from the filesystem is not sufficient to
> delete, we may have to rewrite the packed-refs), but not the first
> one.  Is it because currently we do not say "delete this ref only
> when its current value is X"?

It wouldn't be too hard to allow updates with REF_HAVE_OLD to be
optimized away too. I didn't do it because

1. We currently only create updates for that packed_refs backend with
REF_HAVE_OLD turned off, so such could would be unreachable (and
untestable).

2. I wanted to keep the patch as simple as possible in case you decide
to merge it into 2.15.

There would also be a little bit of a leveling violation (or maybe the
function name is not ideal). `is_packed_transaction_noop()` could check
that `old_oid` values are correct, and if they all are, declare the
transaction a NOOP. (It wasn't *really* a NOOP, but its OP, namely
checking old values, has already been carried out.) But what if it finds
that an `old_oid` was incorrect? Right now
`is_packed_transaction_noop()` has no way to report something like a
TRANSACTION_GENERIC_ERROR.

It could be that long-term it makes more sense for this optimization to
happen in `packed_transaction_prepare()`. Except that function is
sometimes called for its side-effects, like sorting an existing file, in
which case overwriting the `packed-refs` file shouldn't be optimized away.

So overall it seemed easier to punt on this optimization at this point.

> Also it is not immediately obvious how the "is this noop" helper is
> checking the absence of the same-named ref in the current
> packed-refs file, which is what matters for the correctness of the
> optimization.

The check is in the second loop in `is_packed_transaction_noop()`:

if (!refs_read_raw_ref(ref_store, update->refname,
   oid.hash, , ) ||
errno != ENOENT) {
/*
 * We might have to actually delete that
 * reference -> not a NOOP.
 */
ret = 0;
break;
}

If the refname doesn't exist, then `refs_read_raw_ref()` returns -1 and
sets errno to ENOENT, and the loop continues. Otherwise we exit with a
value of 0, meaning that the transaction is not a NOOP.

There are a lot of double-negatives here. It might be easier to follow
the logic if the sense of the function were inverted:
`is_packed_transaction_needed()`. Let me know if you have a strong
feeling about it.

Michael


[PATCH 0/2] Avoid rewriting "packed-refs" unnecessarily

2017-10-25 Thread Michael Haggerty
Since commit dc39e09942 (files_ref_store: use a transaction to update
packed refs, 2017-09-08), we've been rewriting the `packed-refs` file
unnecessarily when deleting a loose reference that has no packed
counterpart. Add some tests demonstrating this problem, then fix it by
teaching `files_backend` to see whether any references being deleted
have packed versions, and if not, skipping the packed_refs
transaction.

This is a mild regression since 2.14, which avoided rewriting the
`packed-refs` file in these cases (though it still had to *read* the
whole file to determine whether the rewrite could be skipped).
Therefore, it might be considered for 2.15. On the other hand, it is
late in the release cycle, so the risk of accepting this change might
be considered too risky.

These patches apply on top of master and commute with
mh/ref-locking-fix. They are also available from my GitHub fork [1] as
branch `avoid-rewriting-packed-refs`.

Michael

[1] https://github.com/mhagger/git

Michael Haggerty (2):
  t1409: check that `packed-refs` is not rewritten unnecessarily
  files-backend: don't rewrite the `packed-refs` file unnecessarily

 refs/files-backend.c  |  18 ++-
 refs/packed-backend.c |  68 
 refs/packed-backend.h |   8 +++
 t/t1409-avoid-packing-refs.sh | 118 ++
 4 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100755 t/t1409-avoid-packing-refs.sh

-- 
2.14.1



[PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-25 Thread Michael Haggerty
Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting are not
present in that file. Implement this optimization as follows:

* Add a function `is_packed_transaction_noop()`, which checks whether
  a given packed-refs transaction doesn't actually have to do
  anything. This function must be called while holding the
  `packed-refs` lock to avoid races.

* Change `files_transaction_prepare()` to check whether the
  packed-refs transaction is unneeded. If so, squelch it, but continue
  holding the `packed-refs` lock until the end of the transaction to
  avoid races.

This fixes two tests in t1409.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c  | 18 +++-
 refs/packed-backend.c | 68 +++
 refs/packed-backend.h |  8 +
 t/t1409-avoid-packing-refs.sh |  4 +--
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 014dabb0bf..5689e3a58d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2605,7 +2605,23 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
goto cleanup;
}
backend_data->packed_refs_locked = 1;
-   ret = ref_transaction_prepare(packed_transaction, err);
+
+   if (is_packed_transaction_noop(refs->packed_ref_store,
+  packed_transaction)) {
+   /*
+* We can skip rewriting the `packed-refs`
+* file. But we do need to leave it locked, so
+* that somebody else doesn't pack a reference
+* that we are trying to delete.
+*/
+   if (ref_transaction_abort(packed_transaction, err)) {
+   ret = TRANSACTION_GENERIC_ERROR;
+   goto cleanup;
+   }
+   backend_data->packed_transaction = NULL;
+   } else {
+   ret = ref_transaction_prepare(packed_transaction, err);
+   }
}
 
 cleanup:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 3279d42c5a..064b1b58a2 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1261,6 +1261,74 @@ static int write_with_updates(struct packed_ref_store 
*refs,
return -1;
 }
 
+int is_packed_transaction_noop(struct ref_store *ref_store,
+  struct ref_transaction *transaction)
+{
+   struct packed_ref_store *refs = packed_downcast(
+   ref_store,
+   REF_STORE_READ,
+   "is_packed_transaction_noop");
+   struct strbuf referent = STRBUF_INIT;
+   size_t i;
+   int ret;
+
+   if (!is_lock_file_locked(>lock))
+   BUG("is_packed_transaction_noop() called while unlocked");
+
+   /*
+* We're only going to bother returning true for the common,
+* trivial case that references are only being deleted, their
+* old values are not being checked, and the old `packed-refs`
+* file doesn't contain any of those reference(s). More
+* complicated cases (1) are unlikely to be able to be
+* optimized away anyway, and (2) are more expensive to check.
+* Start with cheap checks:
+*/
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+
+   if (update->flags & REF_HAVE_OLD)
+   /* Have to check the old value -> not a NOOP. */
+   return 0;
+
+   if ((update->flags & REF_HAVE_NEW) && 
!is_null_oid(>new_oid))
+   /* Have to set a new value -> not a NOOP. */
+   return 0;
+   }
+
+   /*
+* The transaction isn't checking any old values nor is it
+* setting any nonzero new values, so it still might be a
+* NOOP. Now do the more expensive check: the update is not a
+* NOOP if one of the updates is a delete, and the old
+* `packed-refs` file contains a value for that reference.
+*/
+   ret = 1;
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+   unsigned int type;
+   struct object_id oid;
+
+   if (!(update->flags & REF_HAVE_NEW))
+   /* This reference isn't being deleted -> NOOP. */
+   continue;
+
+   if (!refs_read_raw_ref(ref_store, update->refname,
+  oid.hash, , ) ||
+   errno

[PATCH 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily

2017-10-25 Thread Michael Haggerty
There is no need to rewrite the `packed-refs` file except for the case
that we are deleting a reference that has a packed version. Verify
that `packed-refs` is not rewritten when it shouldn't be.

In fact, two of these tests fail:

* A new (empty) `packed-refs` file is created when deleting any loose
  reference and no `packed-refs` file previously existed.

* The `packed-refs` file is rewritten unnecessarily when deleting a
  loose reference that has no packed counterpart.

Both problems will be fixed in the next commit.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 t/t1409-avoid-packing-refs.sh | 118 ++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t1409-avoid-packing-refs.sh

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
new file mode 100755
index 00..a2397c7b71
--- /dev/null
+++ b/t/t1409-avoid-packing-refs.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='avoid rewriting packed-refs unnecessarily'
+
+. ./test-lib.sh
+
+# Add an identifying mark to the packed-refs file header line. This
+# shouldn't upset readers, and it should be omitted if the file is
+# ever rewritten.
+mark_packed_refs () {
+   sed -e "s/^\(#.*\)/\1 t1409 /" <.git/packed-refs >.git/packed-refs.new 
&&
+   mv .git/packed-refs.new .git/packed-refs
+}
+
+# Verify that the packed-refs file is still marked.
+check_packed_refs_marked () {
+   grep -q '^#.* t1409 ' .git/packed-refs
+}
+
+test_expect_success 'setup' '
+   git commit --allow-empty -m "Commit A" &&
+   A=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit B" &&
+   B=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit C" &&
+   C=$(git rev-parse HEAD)
+'
+
+test_expect_failure 'do not create packed-refs file gratuitously' '
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $A &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $C $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref -d refs/heads/foo &&
+   test_must_fail test -f .git/packed-refs
+'
+
+test_expect_success 'check that marking the packed-refs file works' '
+   git for-each-ref >expected &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   check_packed_refs_marked &&
+   git for-each-ref >actual &&
+   test_cmp expected actual &&
+   git pack-refs --all &&
+   test_must_fail check_packed_refs_marked &&
+   git for-each-ref >actual2 &&
+   test_cmp expected actual2
+'
+
+test_expect_success 'leave packed-refs untouched on update of packed' '
+   git update-ref refs/heads/packed-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of packed' '
+   git update-ref refs/heads/packed-checked-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of packed' '
+   git update-ref refs/heads/packed-verify $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   echo "verify refs/heads/packed-verify $A" | git update-ref --stdin &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'touch packed-refs on delete of packed' '
+   git update-ref refs/heads/packed-delete $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref -d refs/heads/packed-delete &&
+   test_must_fail check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-checked-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of loose' '
+ 

Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-25 Thread Michael Haggerty
On 10/25/2017 10:03 AM, Michael Haggerty wrote:
> On 10/24/2017 09:46 PM, Jeff King wrote:
>> On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote:
>>
>>> On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty <mhag...@alum.mit.edu> 
>>> wrote:
>>>> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
>>>> @@ -34,6 +34,86 @@ test_update_rejected () {
>>>> +# Test adding and deleting D/F-conflicting references in a single
>>>> +# transaction.
>>>> +df_test() {
>>>> +   local prefix="$1"
>>>> +   shift
>>>> +   local pack=:
>>>
>>> Isn't "local" a Bash-ism we want to keep out of the test scripts?
>>
>> Yeah. It's supported by dash and many other shells, but we do try to
>> avoid it[1]. I think in this case we could just drop it (but keep
>> setting the "local foo" ones to empty with "foo=".
> [...]
> 
> But I agree that this bug fix is not the correct occasion to change
> policy on something like this, so I'll reroll without "local".

Oh, I see Junio has already made this fix in the version that he picked
up, so I won't bother rerolling.

Michael


Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-25 Thread Michael Haggerty
On 10/24/2017 09:46 PM, Jeff King wrote:
> On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote:
> 
>> On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty <mhag...@alum.mit.edu> 
>> wrote:
>>> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
>>> @@ -34,6 +34,86 @@ test_update_rejected () {
>>> +# Test adding and deleting D/F-conflicting references in a single
>>> +# transaction.
>>> +df_test() {
>>> +   local prefix="$1"
>>> +   shift
>>> +   local pack=:
>>
>> Isn't "local" a Bash-ism we want to keep out of the test scripts?
> 
> Yeah. It's supported by dash and many other shells, but we do try to
> avoid it[1]. I think in this case we could just drop it (but keep
> setting the "local foo" ones to empty with "foo=".

I do wish that we could allow "local", as it avoids a lot of headaches
and potential breakage. According to [1],

> However, every single POSIX-compliant shell I've tested implements the
> 'local' keyword, which lets you declare variables that won't be
> returned from the current function. So nowadays you can safely count
> on it working.

He mentions that ksh93 doesn't support "local", but that it differs from
POSIX in many other ways, too.

Perhaps we could slip in a couple of "local" as a compatibility test to
see if anybody complains, like we did with a couple of C features recently.

But I agree that this bug fix is not the correct occasion to change
policy on something like this, so I'll reroll without "local".

Michael

[1] http://apenwarr.ca/log/?m=201102#28


[PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure

2017-10-24 Thread Michael Haggerty
Since dc39e09942 (files_ref_store: use a transaction to update packed
refs, 2017-09-08), failure to lock a reference has been handled
incorrectly by `files_transaction_prepare()`. If
`lock_ref_for_update()` fails in the lock-acquisition loop of that
function, it sets `ret` then breaks out of that loop. Prior to
dc39e09942, that was OK, because the only thing following the loop was
the cleanup code. But dc39e09942 added another blurb of code between
the loop and the cleanup. That blurb sometimes resets `ret` to zero,
making the cleanup code think that the locking was successful.

Specifically, whenever

* One or more reference deletions have been processed successfully in
  the lock-acquisition loop. (Processing the first such reference
  causes a packed-ref transaction to be initialized.)

* Then `lock_ref_for_update()` fails for a subsequent reference. Such
  a failure can happen for a number of reasons, such as the old SHA-1
  not being correct, lock contention, etc. This causes a `break` out
  of the lock-acquisition loop.

* The `packed-refs` lock is acquired successfully and
  `ref_transaction_prepare()` succeeds for the packed-ref transaction.
  This has the effect of resetting `ret` back to 0, and making the
  cleanup code think that lock acquisition was successful.

In that case, any reference updates that were processed prior to
breaking out of the loop would be carried out (loose and packed), but
the reference that couldn't be locked and any subsequent references
would silently be ignored.

This can easily cause data loss if, for example, the user was trying
to push a new name for an existing branch while deleting the old name.
After the push, the branch could be left unreachable, and could even
subsequently be garbage-collected.

This problem was noticed in the context of deleting one reference and
creating another in a single transaction, when the two references D/F
conflict with each other, like

git update-ref --stdin <
---
 refs/files-backend.c |  2 +-
 t/t1404-update-ref-errors.sh | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 014dabb0bf..8cc1e07fdb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2570,7 +2570,7 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
ret = lock_ref_for_update(refs, update, transaction,
  head_ref, _refnames, err);
if (ret)
-   break;
+   goto cleanup;
 
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 8b5e9a83c5..b7838967b8 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -276,11 +276,11 @@ test_expect_success 'D/F conflict prevents add short + 
delete long' '
df_test refs/df-as-dl --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents delete long + add short' '
+test_expect_success 'D/F conflict prevents delete long + add short' '
df_test refs/df-dl-as --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents delete short + add long' '
+test_expect_success 'D/F conflict prevents delete short + add long' '
df_test refs/df-ds-al --del-add foo foo/bar
 '
 
@@ -292,17 +292,17 @@ test_expect_success 'D/F conflict prevents add short + 
delete long packed' '
df_test refs/df-as-dlp --pack --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents delete long packed + add short' '
+test_expect_success 'D/F conflict prevents delete long packed + add short' '
df_test refs/df-dlp-as --pack --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents delete short packed + add long' '
+test_expect_success 'D/F conflict prevents delete short packed + add long' '
df_test refs/df-dsp-al --pack --del-add foo foo/bar
 '
 
 # Try some combinations involving symbolic refs...
 
-test_expect_failure 'D/F conflict prevents indirect add long + delete short' '
+test_expect_success 'D/F conflict prevents indirect add long + delete short' '
df_test refs/df-ial-ds --sym-add --add-del foo/bar foo
 '
 
@@ -314,11 +314,11 @@ test_expect_success 'D/F conflict prevents indirect add 
short + indirect delete
df_test refs/df-ias-idl --sym-add --sym-del --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents indirect delete long + indirect add 
short' '
+test_expect_success 'D/F conflict prevents indirect delete long + indirect add 
short' '
df_test refs/df-idl-ias --sym-add --sym-del --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents indirect add long + delete short 
packed' '
+test_expect_success 'D/F conflict prevents indirect add long + delete short 
packed' '
df_test refs/df-ial-dsp --sym-add --pack --add-del foo/bar foo
 

[PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-24 Thread Michael Haggerty
It is currently not allowed, in a single transaction, to add one
reference and delete another reference if the two reference names D/F
conflict with each other (e.g., like `refs/foo/bar` and `refs/foo`).
The reason is that the code would need to take locks

$GIT_DIR/refs/foo.lock
$GIT_DIR/refs/foo/bar.lock

But the latter lock couldn't coexist with the loose reference file

$GIT_DIR/refs/foo

, because `$GIT_DIR/refs/foo` cannot be both a directory and a file at
the same time (hence the name "D/F conflict).

Add a bunch of tests that we cleanly reject such transactions.

In fact, many of the new tests currently fail. They will be fixed in
the next commit along with an explanation.

Reported-by: Jeff King <p...@peff.net>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 t/t1404-update-ref-errors.sh | 146 +++
 1 file changed, 146 insertions(+)

diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 100d50e362..8b5e9a83c5 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -34,6 +34,86 @@ test_update_rejected () {
 
 Q="'"
 
+# Test adding and deleting D/F-conflicting references in a single
+# transaction.
+df_test() {
+   local prefix="$1"
+   shift
+   local pack=:
+   local symadd=false
+   local symdel=false
+   local add_del=false
+   local addref
+   local delref
+   while test $# -gt 0
+   do
+   case "$1" in
+   --pack)
+   pack="git pack-refs --all"
+   shift
+   ;;
+   --sym-add)
+   # Perform the add via a symbolic reference
+   symadd=true
+   shift
+   ;;
+   --sym-del)
+   # Perform the del via a symbolic reference
+   symdel=true
+   shift
+   ;;
+   --del-add)
+   # Delete first reference then add second
+   add_del=false
+   delref="$prefix/r/$2"
+   addref="$prefix/r/$3"
+   shift 3
+   ;;
+   --add-del)
+   # Add first reference then delete second
+   add_del=true
+   addref="$prefix/r/$2"
+   delref="$prefix/r/$3"
+   shift 3
+   ;;
+   *)
+   echo 1>&2 "Extra args to df_test: $*"
+   return 1
+   ;;
+   esac
+   done
+   git update-ref "$delref" $C &&
+   if $symadd
+   then
+   addname="$prefix/s/symadd" &&
+   git symbolic-ref "$addname" "$addref"
+   else
+   addname="$addref"
+   fi &&
+   if $symdel
+   then
+   delname="$prefix/s/symdel" &&
+   git symbolic-ref "$delname" "$delref"
+   else
+   delname="$delref"
+   fi &&
+   cat >expected-err <<-EOF &&
+   fatal: cannot lock ref $Q$addname$Q: $Q$delref$Q exists; cannot create 
$Q$addref$Q
+   EOF
+   $pack &&
+   if $add_del
+   then
+   printf "%s\n" "create $addname $D" "delete $delname"
+   else
+   printf "%s\n" "delete $delname" "create $addname $D"
+   fi >commands &&
+   test_must_fail git update-ref --stdin output.err &&
+   test_cmp expected-err output.err &&
+   printf "%s\n" "$C $delref" >expected-refs &&
+   git for-each-ref --format="%(objectname) %(refname)" $prefix/r 
>actual-refs &&
+   test_cmp expected-refs actual-refs
+}
+
 test_expect_success 'setup' '
 
git commit --allow-empty -m Initial &&
@@ -188,6 +268,72 @@ test_expect_success 'empty directory should not fool 1-arg 
delete' '
git update-ref --stdin
 '
 
+test_expect_success 'D/F conflict prevents add long + delete short' '
+   df_test refs/df-al-ds --add-del foo/bar foo
+'
+
+test_expect_success 'D/F conflict prevents add short + delete long' '
+   df_test refs/df-as-dl --add-del foo foo/bar
+'
+
+test_expect_failure 'D/F conflict prevents delete long + add short' '
+   df_test refs/df-dl-as --del-add foo/bar foo
+'
+
+test_expect_failure 'D/F conflict prevents delete short + add long' '
+   df_test refs/df-ds-al --del-add foo foo/bar
+'
+
+test_expect_success 'D/F conflict p

[PATCH 0/2] Fix an error-handling path when locking refs

2017-10-24 Thread Michael Haggerty
In [1], Peff described a bug that he found that could cause a
reference transaction to be partly carried-out and partly not, with a
successful return. The scenario that he discussed was the deletion of
one reference and creation of another, where the two references D/F
conflict with each other.

But in fact the problem is more general; it can happen whenever a
reference deletion within a transaction is processed successfully, and
then another reference update in the same transaction fails in
`lock_ref_for_update()`. In such a case the failed update and any
subsequent ones could be silently ignored.

There is a longer explanation in the second patch's commit message.

The tests that I added probe a bunch of D/F update scenarios, which I
think should be characteristic of the scenarios that would trigger
this bug. It would be nice to have tests that examine other types of
failures (e.g., wrong `old_oid` values).

Bit since the fix is obviously a strict improvement and can prevent
data loss, and since the release process is already pretty far along,
I wanted to send this out ASAP. We can follow it up later with
additional tests.

These patches apply to current master. They are also available from my
GitHub fork [2] as branch `ref-locking-fix`.

While looking at this code again, I realized that the new code
rewrites the `packed-refs` file more often than did the old code.
Specifically, after dc39e09942 (files_ref_store: use a transaction to
update packed refs, 2017-09-08), the `packed-refs` file is overwritten
for any transaction that includes a reference delete. Prior to that
commit, `packed-refs` was only overwritten if a deleted reference was
actually present in the existing `packed-refs` file. This is even the
case if there was previously no `packed-refs` file at all; now any
reference deletion causes an empty `packed-refs` file to be created.

I think this is a conscionable regression, since deleting references
that are purely loose is probably not all that common, and the old
code had to read the whole `packed-refs` file anyway. Nevertheless, it
is obviously something that I would like to fix (though maybe not for
2.15? I'm open to input about its urgency.)

[1] 
https://public-inbox.org/git/20171024082409.smwsd6pla64jj...@sigill.intra.peff.net/
[2] https://github.com/mhagger/git

Michael Haggerty (2):
  t1404: add a bunch of tests of D/F conflicts
  files_transaction_prepare(): fix handling of ref lock failure

 refs/files-backend.c |   2 +-
 t/t1404-update-ref-errors.sh | 146 +++
 2 files changed, 147 insertions(+), 1 deletion(-)

-- 
2.14.1



Re: v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Michael Haggerty
On 10/24/2017 01:05 PM, Michael Haggerty wrote:
> On 10/24/2017 10:24 AM, Jeff King wrote:
>> I found a potentially serious bug in v2.15.0-rc2 (and earlier release
>> candidates, too) that we may want to deal with before the release.
>>
>> If I do:
>> [...]
>> then at the end we have no refs at all!
> 
> That's a serious bug. I'm looking into it right now.

The fix is trivial (see below). But let me add some tests and make sure
that there are no similar breakages in the area, then submit a full patch.

Michael

- refs/files-backend.c
-
index 29eb5e826f..fc3f2abcc6 100644
@@ -2523,15 +2523,15 @@ static int files_transaction_prepare(struct
ref_store *ref_store,
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];

ret = lock_ref_for_update(refs, update, transaction,
  head_ref, _refnames, err);
if (ret)
-   break;
+   goto cleanup;

if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
!(update->flags & REF_ISPRUNING)) {
/*
 * This reference has to be deleted from
 * packed-refs if it exists there.


Re: v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Michael Haggerty
On 10/24/2017 10:24 AM, Jeff King wrote:
> I found a potentially serious bug in v2.15.0-rc2 (and earlier release
> candidates, too) that we may want to deal with before the release.
> 
> If I do:
> 
> git init -q repo
> cd repo
> obj=$(git hash-object -w /dev/null)
> git update-ref refs/tags/foo $obj
> git update-ref --stdin <<-EOF
> delete refs/tags/foo
> update refs/tags/foo/bar $obj
> EOF
> git for-each-ref
> 
> then at the end we have no refs at all!

That's a serious bug. I'm looking into it right now.

> I'd expect one of:
> 
>   1. We delete "foo" before updating "foo/bar", and we end up with a
>  single ref.

I don't think that this is possible in the general case in a single
transaction. The problem is that the code would need to take locks

refs/tags/foo.lock
refs/tags/foo/bar.lock

But the latter couldn't coexist with the loose reference file

refs/tags/foo

, which might already exist.

It is only imaginable to do this in a single transaction if you pack and
prune `refs/tags/foo` first, to get the loose reference out of the way,
before executing the transaction. Even then, you would have to beware of
a race where another process writes a loose version of `refs/tags/foo`
between the time that `pack-refs` prunes it and the time that the
transaction obtains the lock again.

> [...]

Michael


Re: [PATCH v2 00/24] object_id part 10

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> This is the tenth in a series of patches to convert from unsigned char
> [20] to struct object_id.  This series mostly involves changes to the
> refs code.  After these changes, there are almost no references to
> unsigned char in the main refs code.
> 
> The series has not been rebased on master since the last submission, but
> I can do so if that's more convenient.
> 
> This series is available from the following URL:
> https://github.com/bk2204/git.git object-id-part10

I read through the whole series medium-thoroughly and left a few
comments, but overall it looks very good and clear. Thanks so much for
working on this!

I took a stab at rebasing this patch series on top of current master
using `git-imerge`. I pushed the results to my GitHub fork [1] as branch
`object-id-part-10-rebased`. I didn't check the results very carefully,
nor whether the commit messages need adjusting, but I did verify that
each of the commits passes the test suite. Junio, it might serve as
something to compare against your conflict resolution.

Michael

[1] https://github.com/mhagger/git


Re: [PATCH v2 24/24] refs/files-backend: convert static functions to object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Convert several static functions to take pointers to struct object_id.
> Change the relevant parameters to write_packed_entry to be const, as we
> don't modify them.  Rename lock_ref_sha1_basic to lock_ref_oid_basic to
> reflect its new argument.
> 
> Signed-off-by: brian m. carlson 
> ---
>  refs/files-backend.c | 48 
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 7281f27f62..84d8e3da99 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -811,7 +811,7 @@ static struct ref_iterator *files_ref_iterator_begin(
>   * return a negative value.
>   */

The docstring for this function needs adjusting. It would also be worth
mentioning that `old_oid` is allowed to be NULL (which was true before
but wasn't mentioned).

> [...]

Michael



Re: [PATCH v2 23/24] refs: convert read_raw_ref backends to struct object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Convert the unsigned char * parameter to struct object_id * for
> files_read_raw_ref and packed_read_raw-ref.  Update the documentation.

Nit:

s/packed_read_raw-ref/packed_read_raw_ref/

> Switch from using get_sha1_hex and a hard-coded 40 to using
> parse_oid_hex.
> [...]

Michael


Re: [PATCH v2 21/24] refs: convert resolve_ref_unsafe to struct object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Convert resolve_ref_unsafe to take a pointer to struct object_id by
> converting one remaining caller to use struct object_id, converting the
> declaration and definition, and applying the following semantic patch:
> 
> @@
> expression E1, E2, E3, E4;
> @@
> - resolve_ref_unsafe(E1, E2, E3.hash, E4)
> + resolve_ref_unsafe(E1, E2, , E4)
> 
> @@
> expression E1, E2, E3, E4;
> @@
> - resolve_ref_unsafe(E1, E2, E3->hash, E4)
> + resolve_ref_unsafe(E1, E2, E3, E4)
> 
> Signed-off-by: brian m. carlson 
> ---
>  blame.c   |  4 ++--
>  builtin/fsck.c|  2 +-
>  refs.c| 28 ++--
>  refs.h|  4 ++--
>  refs/files-backend.c  |  8 
>  sequencer.c   |  2 +-
>  t/helper/test-ref-store.c |  6 +++---
>  transport-helper.c|  7 +++
>  worktree.c|  2 +-
>  9 files changed, 31 insertions(+), 32 deletions(-)
> 
> [...]
> diff --git a/refs.h b/refs.h
> index bb0dcd97af..4eedc880c6 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -62,10 +62,10 @@ struct worktree;

The docstring above this hunk needs to be adjusted.

>  const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>   const char *refname,
>   int resolve_flags,
> - unsigned char *sha1,
> + struct object_id *oid,
>   int *flags);
>  const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
> -unsigned char *sha1, int *flags);
> +struct object_id *oid, int *flags);
>  
>  char *refs_resolve_refdup(struct ref_store *refs,
> const char *refname, int resolve_flags,
> [...]
Michael


Re: [PATCH v2 14/24] refs: convert peel_ref to struct object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Convert peel_ref (and its corresponding backend) to struct object_id.
> 
> This transformation was done with an update to the declaration,
> definition, and test helper and the following semantic patch:
> 
> @@
> expression E1, E2;
> @@
> - peel_ref(E1, E2.hash)
> + peel_ref(E1, )
> 
> @@
> expression E1, E2;
> @@
> - peel_ref(E1, E2->hash)
> + peel_ref(E1, E2)
> 
> Signed-off-by: brian m. carlson 
> ---
>  builtin/describe.c| 2 +-
>  builtin/pack-objects.c| 4 ++--
>  builtin/show-ref.c| 2 +-
>  refs.c| 8 
>  refs.h| 4 ++--
>  refs/files-backend.c  | 8 
>  refs/packed-backend.c | 4 ++--
>  refs/refs-internal.h  | 2 +-
>  t/helper/test-ref-store.c | 6 +++---
>  upload-pack.c | 2 +-
>  10 files changed, 21 insertions(+), 21 deletions(-)
> 
> [...]
> diff --git a/refs.h b/refs.h
> index 8159b7b067..832ade2b13 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -120,8 +120,8 @@ extern int refs_init_db(struct strbuf *err);
>   * ultimately resolve to a peelable tag.
>   */

The comment just above needs to be adjusted.

>  int refs_peel_ref(struct ref_store *refs, const char *refname,
> -   unsigned char *sha1);
> -int peel_ref(const char *refname, unsigned char *sha1);
> +   struct object_id *oid);
> +int peel_ref(const char *refname, struct object_id *oid);
>  
>  /**
>   * Resolve refname in the nested "gitlink" repository in the specified
> [...]

Michael



Re: [PATCH v2 05/24] refs: update ref transactions to use struct object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Update the ref transaction code to use struct object_id.  Remove one
> NULL pointer check which was previously inserted around a dereference;
> since we now pass a pointer to struct object_id directly through, the
> code we're calling handles this for us.
> 
> Signed-off-by: brian m. carlson 
> ---
>  branch.c   |  2 +-
>  builtin/clone.c|  2 +-
>  builtin/commit.c   |  4 ++--
>  builtin/fetch.c|  4 ++--
>  builtin/receive-pack.c |  4 ++--
>  builtin/replace.c  |  2 +-
>  builtin/tag.c  |  2 +-
>  builtin/update-ref.c   |  8 
>  fast-import.c  |  4 ++--
>  refs.c | 44 +---
>  refs.h | 24 
>  refs/files-backend.c   | 12 ++--
>  refs/refs-internal.h   |  4 ++--
>  sequencer.c|  2 +-
>  walker.c   |  2 +-
>  15 files changed, 59 insertions(+), 61 deletions(-)
> 
> [...]
> diff --git a/refs.h b/refs.h
> index 369614d392..543dcc5956 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -519,15 +519,15 @@ struct ref_transaction *ref_transaction_begin(struct 
> strbuf *err);
>   */
>  int ref_transaction_update(struct ref_transaction *transaction,

The docstring for this function needs to be updated.

>  const char *refname,
> -const unsigned char *new_sha1,
> -const unsigned char *old_sha1,
> +const struct object_id *new_oid,
> +const struct object_id *old_oid,
>  unsigned int flags, const char *msg,
>  struct strbuf *err);
>  
> [...]

Michael


Re: [PATCH v2 04/24] refs: convert update_ref and refs_update_ref to use struct object_id

2017-10-11 Thread Michael Haggerty
On 10/09/2017 03:11 AM, brian m. carlson wrote:
> Convert update_ref, refs_update_ref, and write_pseudoref to use struct
> object_id.  Update the existing callers as well.  Remove update_ref_oid,
> as it is no longer needed.
> 
> Signed-off-by: brian m. carlson 
> ---
>  bisect.c  |  5 +++--
>  builtin/am.c  | 14 +++---
>  builtin/checkout.c|  3 +--
>  builtin/clone.c   | 14 +++---
>  builtin/merge.c   | 13 ++---
>  builtin/notes.c   | 10 +-
>  builtin/pull.c|  2 +-
>  builtin/reset.c   |  4 ++--
>  builtin/update-ref.c  |  2 +-
>  notes-cache.c |  2 +-
>  notes-utils.c |  2 +-
>  refs.c| 40 
>  refs.h|  5 +
>  sequencer.c   |  9 +++--
>  t/helper/test-ref-store.c | 10 +-
>  transport-helper.c|  3 ++-
>  transport.c   |  4 ++--
>  17 files changed, 64 insertions(+), 78 deletions(-)
> 
> [...]
> diff --git a/refs.c b/refs.c
> index 0a5b68d6fb..51942df7b3 100644
> --- a/refs.c
> +++ b/refs.c
> [...]
> @@ -1003,12 +995,12 @@ int refs_update_ref(struct ref_store *refs, const char 
> *msg,
>   int ret = 0;
>  
>   if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
> - assert(refs == get_main_ref_store());

Was the deletion of the line above intentional?

> - ret = write_pseudoref(refname, new_sha1, old_sha1, );
> + ret = write_pseudoref(refname, new_oid, old_oid, );

This is not new to your code, but I just noticed a problem here.
`refs_update_ref()` is documented, via its reference to
`ref_transaction_update()`, to allow `new_sha1` (i.e., now `new_oid`) to
be NULL. (NULL signifies that the value of the reference shouldn't be
changed.)

But `write_pseudoref()` dereferences its `oid` argument unconditionally,
so this call would fail if `new_oid` is NULL.

This has all been the case since `write_pseudoref()` was introduced in
74ec19d4be (pseudorefs: create and use pseudoref update and delete
functions, 2015-07-31).

In my opinion, `write_pseudoref()` is broken. It should accept NULL as
its `oid` argument.

>   } else {
>   t = ref_store_transaction_begin(refs, );
>   if (!t ||
> - ref_transaction_update(t, refname, new_sha1, old_sha1,
> + ref_transaction_update(t, refname, new_oid ? new_oid->hash 
> : NULL,
> +old_oid ? old_oid->hash : NULL,
>  flags, msg, ) ||
>   ref_transaction_commit(t, )) {
>   ret = 1;
> [...]

Michael


Re: [PATCH 2/2] refs_resolve_ref_unsafe: handle d/f conflicts for writes

2017-10-06 Thread Michael Haggerty
On 10/06/2017 07:16 PM, Jeff King wrote:
> On Fri, Oct 06, 2017 at 07:09:10PM +0200, Michael Haggerty wrote:
> 
>> I do have one twinge of uneasiness at a deeper level, that I haven't had
>> time to check...
>>
>> Does this patch make it easier to *set* HEAD to an unborn branch that
>> d/f conflicts with an existing reference? If so, that might be a
>> slightly worse UI for users. I'd rather learn about such a problem when
>> setting HEAD (when I am thinking about the new branch name and am in the
>> frame of mind to solve the problem) rather than later, when I try to
>> commit to the new branch.
> 
> Good question. The answer is no, it's allowed both before and after my
> patch. At least via git-symbolic-ref.
> 
> I agree it would be nice to know earlier for such a case. For
> symbolic-ref, we probably should allow it, because it's plumbing that
> may be used for tricky things. For things like "checkout -b", you'd
> generally get a timely warning as we try to create the ref.
> 
> The odd man out is "checkout --orphan", which leaves the branch unborn.
> It might be nice if it did a manual check that the ref is available (and
> also that it's syntactically acceptable, though I think we may do that
> already).
> 
> But all of that is orthogonal to this fix, I think.

Thanks for checking. Yes, I totally agree that this is orthogonal.

Michael


Re: [PATCH 2/2] refs_resolve_ref_unsafe: handle d/f conflicts for writes

2017-10-06 Thread Michael Haggerty
On 10/06/2017 04:42 PM, Jeff King wrote:
> If our call to refs_read_raw_ref() fails, we check errno to
> see if the ref is simply missing, or if we encountered a
> more serious error. If it's just missing, then in "write"
> mode (i.e., when RESOLVE_REFS_READING is not set), this is
> perfectly fine.
> 
> However, checking for ENOENT isn't sufficient to catch all
> missing-ref cases. In the filesystem backend, we may also
> see EISDIR when we try to resolve "a" and "a/b" exists.
> Likewise, we may see ENOTDIR if we try to resolve "a/b" and
> "a" exists. In both of those cases, we know that our
> resolved ref doesn't exist, but we return an error (rather
> than reporting the refname and returning a null sha1).
> 
> This has been broken for a long time, but nobody really
> noticed because the next step after resolving without the
> READING flag is usually to lock the ref and write it. But in
> both of those cases, the write will fail with the same
> errno due to the directory/file conflict.
> 
> There are two cases where we can notice this, though:
> 
>   1. If we try to write "a" and there's a leftover directory
>  already at "a", even though there is no ref "a/b". The
>  actual write is smart enough to move the empty "a" out
>  of the way.
> 
>  This is reasonably rare, if only because the writing
>  code has to do an independent resolution before trying
>  its write (because the actual update_ref() code handles
>  this case fine). The notes-merge code does this, and
>  before the fix in the prior commit t3308 erroneously
>  expected this case to fail.
> 
>   2. When resolving symbolic refs, we typically do not use
>  the READING flag because we want to resolve even
>  symrefs that point to unborn refs. Even if those unborn
>  refs could not actually be written because of d/f
>  conflicts with existing refs.
> 
>  You can see this by asking "git symbolic-ref" to report
>  the target of a symref pointing past a d/f conflict.
> 
> We can fix the problem by recognizing the other "missing"
> errnos and treating them like ENOENT. This should be safe to
> do even for callers who are then going to actually write the
> ref, because the actual writing process will fail if the d/f
> conflict is a real one (and t1404 checks these cases).
> 
> Arguably this should be the responsibility of the
> files-backend to normalize all "missing ref" errors into
> ENOENT (since something like EISDIR may not be meaningful at
> all to a database backend). However other callers of
> refs_read_raw_ref() may actually care about the distinction;
> putting this into resolve_ref() is the minimal fix for now.
> 
> The new tests in t1401 use git-symbolic-ref, which is the
> most direct way to check the resolution by itself.
> Interestingly we actually had a test that setup this case
> already, but we only used it to verify that the funny state
> could be overwritten, not that it could be resolved.
> 
> We also add a new test in t3200, as "branch -m" was the
> original motivation for looking into this. What happens is
> this:
> 
>   0. HEAD is pointing to branch "a"
> 
>   1. The user asks to rename "a" to "a/b".
> 
>   2. We create "a/b" and delete "a".
> 
>   3. We then try to update any worktree HEADs that point to
>  the renamed ref (including the main repo HEAD). To do
>  that, we have to resolve each HEAD. But now our HEAD is
>  pointing at "a", and we get EISDIR due to the loose
>  "a/b". As a result, we think there is no HEAD, and we
>  do not update it. It now points to the bogus "a".
> 
> Interestingly this case used to work, but only accidentally.
> Before 31824d180d (branch: fix branch renaming not updating
> HEADs correctly, 2017-08-24), we'd update any HEAD which we
> couldn't resolve. That was wrong, but it papered over the
> fact that we were incorrectly failing to resolve HEAD.
> 
> So while the bug demonstrated by the git-symbolic-ref is
> quite old, the regression to "branch -m" is recent.

Thanks for your detailed investigation and analysis and for the new tests.

This makes sense to me at the level of fixing the bug.

I do have one twinge of uneasiness at a deeper level, that I haven't had
time to check...

Does this patch make it easier to *set* HEAD to an unborn branch that
d/f conflicts with an existing reference? If so, that might be a
slightly worse UI for users. I'd rather learn about such a problem when
setting HEAD (when I am thinking about the new branch name and am in the
frame of mind to solve the problem) rather than later, when I try to
commit to the new branch.

Even if so, that wouldn't be a problem with this patch per se, but
rather a possible accidental side-effect of fixing the bug.

Michael

> [...]


Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-26 Thread Michael Haggerty
On 09/22/2017 12:42 AM, Jonathan Tan wrote:
> On Thu, 21 Sep 2017 13:57:30 Jeff Hostetler  wrote:
> [...]
>> I struggled with the terms here a little when looking at the source.
>> () A remote responding to a partial-clone is termed a
>> "promisor-remote". () Packfiles received from a promisor-remote are
>> marked with ".promisor" like ".keep" names.
>> () An object actually contained in such packfiles is called a
>> "promisor-object". () An object not-present but referenced by one of
>> the above promisor-objects is called a "promised-object" (aka a
>> "missing-object").
>>
>> I think the similarity of the words "promisOR" and "promisED" threw
>> me here and as I was looking at the code.  The code in is_promised()
>> [1] looked like it was adding all promisor- and promised-objects to
>> the "promised" OIDSET, but I could be mistaken.
>>
>> [1]
>> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960
> 
> I was struggling a bit with the terminology, true.
> 
> Right now I'm thinking of:
>  - promisor remote (as you defined)
>  - promisor packfile (as you defined)
>  - promisor object is an object known to belong to the promisor (whether
>because we have it in a promisor packfile or because it is referenced
>by an object in a promisor packfile)

Maybe naming has been discussed at length before, and I am jumping into
a long-settled topic. And admittedly this is bikeshedding.

But I find these names obscure, even as a developer. And terms like this
will undoubtedly bleed into the UI and documentation, so it would be
good to put some effort into choosing the best names possible.

I suppose that the term "promisor" comes from the computer science term
"promise" [1]. In that sense it is apt, because, say, a promisor object
is something that is known to be obtainable, but we don't have it yet.

But from the user's point of view, I think this isn't a very
illuminating term. I think the user's mental model will be that there is
a distinguished remote repository that holds the project's entire
published history, and she has to remain connected to it for certain Git
operations to work [2]. Another interesting aspect of this remote is
that it has to be trusted never (well, almost never) to discard any
objects [3].

Let me brainstorm about other names or concepts that seem closer to the
user's mental model:

* "backing remote", "backing repository"

* "lazy remote", "live remote", "cloud remote", "shared remote",
  "on-demand remote"

* "full remote", "deep remote", "permanent remote"

* "attached remote", "bound remote", "linked remote"

* "trusted remote", "authoritative remote", "official remote"
  (these terms probably imply more than we want)

* "upstream", "upstream remote" (probably too confusing with how
  the term "upstream" is currently used, even if in practice they
  will often be the same remote)

* "object depot", "depot remote", "depot repository", "remote
  object depot" (I don't like the term "object" that much, because
  it is rather remote from the Git user's daily life)

* "repository of record", "remote of record" (too wordy)

* "history depot", "history warehouse" (meh)

* (dare I say it?) "central repository"

* "object archive", "archival remote" (not so good because we already
  use "archive" for other concepts)

* depository (sounds too much like "repository")

* The thesaurus suggests nouns like: store, bank, warehouse, library,
  chronicle, annals, registry, locker, strongbox, attic, bunker

Personally I think "lazy remote" and "backing remote" are not too bad.

Michael

[1] https://en.wikipedia.org/wiki/Futures_and_promises

[2] I haven't checked whether the current proposal allows for
multiple "promisor remotes". It's certainly thinkable, if not
now then in the future. But I suppose that even then, 99% of
users will configure a single "promisor remote" for each
repository.

[3] For those rare occasions where the server has to discard objects,
it might make sense for the server to remember the names of the
objects that were deleted, so that it can tell clients "no, you're
not insane. I used to have that object but it has intentionally
been obliterated", and possibly even a reason: "it is now taboo"
vs. "I got tired of carrying it around".


[PATCH v3 08/21] read_packed_refs(): read references with minimal copying

2017-09-25 Thread Michael Haggerty
Instead of copying data from the `packed-refs` file one line at time
and then processing it, process the data in place as much as possible.

Also, instead of processing one line per iteration of the main loop,
process a reference line plus its corresponding peeled line (if
present) together.

Note that this change slightly tightens up the parsing of the
`packed-refs` file. Previously, the parser would have accepted
multiple "peeled" lines for a single reference (ignoring all but the
last one). Now it would reject that.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 101 --
 1 file changed, 40 insertions(+), 61 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a45e3ff92f..2b80f244c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -134,33 +134,6 @@ static void clear_packed_ref_cache(struct packed_ref_store 
*refs)
}
 }
 
-/* The length of a peeled reference line in packed-refs, including EOL: */
-#define PEELED_LINE_LENGTH 42
-
-/*
- * Parse one line from a packed-refs file.  Write the SHA1 to sha1.
- * Return a pointer to the refname within the line (null-terminated),
- * or NULL if there was a problem.
- */
-static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
-{
-   const char *ref;
-
-   if (parse_oid_hex(line->buf, oid, ) < 0)
-   return NULL;
-   if (!isspace(*ref++))
-   return NULL;
-
-   if (isspace(*ref))
-   return NULL;
-
-   if (line->buf[line->len - 1] != '\n')
-   return NULL;
-   line->buf[--line->len] = 0;
-
-   return ref;
-}
-
 static NORETURN void die_unterminated_line(const char *path,
   const char *p, size_t len)
 {
@@ -221,8 +194,7 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
size_t size;
char *buf;
const char *pos, *eol, *eof;
-   struct ref_entry *last = NULL;
-   struct strbuf line = STRBUF_INIT;
+   struct strbuf tmp = STRBUF_INIT;
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
@@ -264,9 +236,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(, pos, eol - pos);
+   strbuf_add(, pos, eol - pos);
 
-   if (!skip_prefix(line.buf, "# pack-refs with:", (const char 
**)))
+   if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char 
**)))
die_invalid_line(refs->path, pos, eof - pos);
 
string_list_split_in_place(, p, ' ', -1);
@@ -281,61 +253,68 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
pos = eol + 1;
 
string_list_clear(, 0);
-   strbuf_reset();
+   strbuf_reset();
}
 
dir = get_ref_dir(packed_refs->cache->root);
while (pos < eof) {
+   const char *p = pos;
struct object_id oid;
const char *refname;
+   int flag = REF_ISPACKED;
+   struct ref_entry *entry = NULL;
 
-   eol = memchr(pos, '\n', eof - pos);
+   if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
+   parse_oid_hex(p, , ) ||
+   !isspace(*p++))
+   die_invalid_line(refs->path, pos, eof - pos);
+
+   eol = memchr(p, '\n', eof - p);
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(, pos, eol + 1 - pos);
+   strbuf_add(, p, eol - p);
+   refname = tmp.buf;
+
+   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+   if (!refname_is_safe(refname))
+   die("packed refname is dangerous: %s", refname);
+   oidclr();
+   flag |= REF_BAD_NAME | REF_ISBROKEN;
+   }
+   if (peeled == PEELED_FULLY ||
+   (peeled == PEELED_TAGS && starts_with(refname, 
"refs/tags/")))
+   flag |= REF_KNOWS_PEELED;
+   entry = create_ref_entry(refname, , flag);
+   add_ref_entry(dir, entry);
 
-   refname = parse_ref_line(, );
-   if (refname) {
-   int flag = REF_ISPACKED;
+   pos = eol + 1;
+
+   if (pos < eof && *pos == '^') {
+   p = pos + 1;
+   if (eof - p < GIT_SHA1_HEXSZ + 1 ||
+   parse_oid_hex(

[PATCH v3 14/21] read_packed_refs(): ensure that references are ordered when read

2017-09-25 Thread Michael Haggerty
It doesn't actually matter now, because the references are only
iterated over to fill the associated `ref_cache`, which itself puts
them in the correct order. But we want to get rid of the `ref_cache`,
so we want to be able to iterate directly over the `packed-refs`
buffer, and then the iteration will need to be ordered correctly.

In fact, we already write the `packed-refs` file sorted, but it is
possible that other Git clients don't get it right. So let's not
assume that a `packed-refs` file is sorted unless it is explicitly
declared to be so via a `sorted` trait in its header line.

If it is *not* declared to be sorted, then scan quickly through the
file to check. If it is found to be out of order, then sort the
records into a new memory-only copy. This checking and sorting is done
quickly, without parsing the full file contents. However, it needs a
little bit of care to avoid reading past the end of the buffer even if
the `packed-refs` file is corrupt.

Since *we* always write the file correctly sorted, include that trait
when we write or rewrite a `packed-refs` file. This means that the
scan described in the previous paragraph should only have to be done
for `packed-refs` files that were written by older versions of the Git
command-line client, or by other clients that haven't yet learned to
write the `sorted` trait.

If `packed-refs` was already sorted, then (if the system allows it) we
can use the mmapped file contents directly. But if the system doesn't
allow a file that is currently mmapped to be replaced using
`rename()`, then it would be bad for us to keep the file mmapped for
any longer than necessary. So, on such systems, always make a copy of
the file contents, either as part of the sorting process, or
afterwards.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 223 +++---
 1 file changed, 212 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 75d44cf061..a7fc613c06 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -51,11 +51,12 @@ struct packed_ref_cache {
int mmapped;
 
/*
-* The contents of the `packed-refs` file. If the file is
-* mmapped, this points at the mmapped contents of the file.
-* If not, this points at heap-allocated memory containing the
-* contents. If there were no contents (e.g., because the file
-* didn't exist), `buf` and `eof` are both NULL.
+* The contents of the `packed-refs` file. If the file was
+* already sorted, this points at the mmapped contents of the
+* file. If not, this points at heap-allocated memory
+* containing the contents, sorted. If there were no contents
+* (e.g., because the file didn't exist), `buf` and `eof` are
+* both NULL.
 */
char *buf, *eof;
 
@@ -358,7 +359,7 @@ struct ref_iterator *mmapped_ref_iterator_begin(
if (!packed_refs->buf)
return empty_ref_iterator_begin();
 
-   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable, 0);
+   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable, 1);
 
iter->packed_refs = packed_refs;
acquire_packed_ref_cache(iter->packed_refs);
@@ -371,6 +372,170 @@ struct ref_iterator *mmapped_ref_iterator_begin(
return ref_iterator;
 }
 
+struct packed_ref_entry {
+   const char *start;
+   size_t len;
+};
+
+static int cmp_packed_ref_entries(const void *v1, const void *v2)
+{
+   const struct packed_ref_entry *e1 = v1, *e2 = v2;
+   const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1;
+   const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1;
+
+   while (1) {
+   if (*r1 == '\n')
+   return *r2 == '\n' ? 0 : -1;
+   if (*r1 != *r2) {
+   if (*r2 == '\n')
+   return 1;
+   else
+   return (unsigned char)*r1 < (unsigned char)*r2 
? -1 : +1;
+   }
+   r1++;
+   r2++;
+   }
+}
+
+/*
+ * `packed_refs->buf` is not known to be sorted. Check whether it is,
+ * and if not, sort it into new memory and munmap/free the old
+ * storage.
+ */
+static void sort_packed_refs(struct packed_ref_cache *packed_refs)
+{
+   struct packed_ref_entry *entries = NULL;
+   size_t alloc = 0, nr = 0;
+   int sorted = 1;
+   const char *pos, *eof, *eol;
+   size_t len, i;
+   char *new_buffer, *dst;
+
+   pos = packed_refs->buf + packed_refs->header_len;
+   eof = packed_refs->eof;
+   len = eof - pos;
+
+   if (!len)
+   return;
+
+   /*
+* Initialize entries based on a crude estimate of the number
+* of references in the file (we'll grow it below if needed):
+*/
+   ALLOC_GROW(entries, len / 80 + 20

[PATCH v3 15/21] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`

2017-09-25 Thread Michael Haggerty
Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 109 --
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7fc613c06..abf14a1405 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -397,6 +397,27 @@ static int cmp_packed_ref_entries(const void *v1, const 
void *v2)
}
 }
 
+/*
+ * Compare a packed-refs record pointed to by `rec` to the specified
+ * NUL-terminated refname.
+ */
+static int cmp_entry_to_refname(const char *rec, const char *refname)
+{
+   const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
+   const char *r2 = refname;
+
+   while (1) {
+   if (*r1 == '\n')
+   return *r2 ? -1 : 0;
+   if (!*r2)
+   return 1;
+   if (*r1 != *r2)
+   return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : 
+1;
+   r1++;
+   r2++;
+   }
+}
+
 /*
  * `packed_refs->buf` is not known to be sorted. Check whether it is,
  * and if not, sort it into new memory and munmap/free the old
@@ -503,6 +524,17 @@ static const char *find_start_of_record(const char *buf, 
const char *p)
return p;
 }
 
+/*
+ * Return a pointer to the start of the record following the record
+ * that contains `*p`. If none is found before `end`, return `end`.
+ */
+static const char *find_end_of_record(const char *p, const char *end)
+{
+   while (++p < end && (p[-1] != '\n' || p[0] == '^'))
+   ;
+   return p;
+}
+
 /*
  * We want to be able to compare mmapped reference records quickly,
  * without totally parsing them. We can do so because the records are
@@ -592,6 +624,65 @@ static int load_contents(struct packed_ref_cache 
*packed_refs)
return 1;
 }
 
+/*
+ * Find the place in `cache->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+  const char *refname, int mustexist)
+{
+   /*
+* This is not *quite* a garden-variety binary search, because
+* the data we're searching is made up of records, and we
+* always need to find the beginning of a record to do a
+* comparison. A "record" here is one line for the reference
+* itself and zero or one peel lines that start with '^'. Our
+* loop invariant is described in the next two comments.
+*/
+
+   /*
+* A pointer to the character at the start of a record whose
+* preceding records all have reference names that come
+* *before* `refname`.
+*/
+   const char *lo = cache->buf + cache->header_len;
+
+   /*
+* A pointer to a the first character of a record whose
+* reference name comes *after* `refname`.
+*/
+   const char *hi = cache->eof;
+
+   while (lo < hi) {
+   const char *mid, *rec;
+   int cmp;
+
+   mid = lo + (hi - lo) / 2;
+   rec = find_start_of_record(lo, mid);
+   cmp = cmp_entry_to_refname(rec, refname);
+   if (cmp < 0) {
+   lo = find_end_of_record(mid, hi);
+   } else if (cmp > 0) {
+   hi = rec;
+   } else {
+   return rec;
+   }
+   }
+
+   if (mustexist)
+   return NULL;
+   else
+   return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -888,6 +979,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
const char *prefix, unsigned int flags)
 {
struct packed_ref_store *refs;
+   struct packed_ref_cache *packed_refs;
+   const char *start;
struct packed_ref_iterator *iter;
struct ref_iterator *ref_iterator;
unsigned int required_flags = REF_STORE_READ;
@@ -905,13 +998,23 @@ static struct ref_iterato

[PATCH v3 19/21] ref_cache: remove support for storing peeled values

2017-09-25 Thread Michael Haggerty
Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c |  9 -
 refs/ref-cache.c  | 42 +-
 refs/ref-cache.h  | 32 ++--
 3 files changed, 11 insertions(+), 72 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 3829e9c294..66e5525174 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -2,7 +2,6 @@
 #include "../config.h"
 #include "../refs.h"
 #include "refs-internal.h"
-#include "ref-cache.h"
 #include "packed-backend.h"
 #include "../iterator.h"
 #include "../lockfile.h"
@@ -226,6 +225,14 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * This value is set in `base.flags` if the peeled value of the
+ * current reference is known. In that case, `peeled` contains the
+ * correct peeled value for the reference, which might be `null_sha1`
+ * if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x40
+
 /*
  * An iterator over a packed-refs file that is currently mmapped.
  */
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 54dfb5218c..4f850e1b5c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname,
 
FLEX_ALLOC_STR(ref, name, refname);
oidcpy(>u.value.oid, oid);
-   oidclr(>u.value.peeled);
ref->flag = flag;
return ref;
 }
@@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator 
*ref_iterator)
}
 }
 
-enum peel_status peel_entry(struct ref_entry *entry, int repeel)
-{
-   enum peel_status status;
-
-   if (entry->flag & REF_KNOWS_PEELED) {
-   if (repeel) {
-   entry->flag &= ~REF_KNOWS_PEELED;
-   oidclr(>u.value.peeled);
-   } else {
-   return is_null_oid(>u.value.peeled) ?
-   PEEL_NON_TAG : PEEL_PEELED;
-   }
-   }
-   if (entry->flag & REF_ISBROKEN)
-   return PEEL_BROKEN;
-   if (entry->flag & REF_ISSYMREF)
-   return PEEL_IS_SYMREF;
-
-   status = peel_object(entry->u.value.oid.hash, 
entry->u.value.peeled.hash);
-   if (status == PEEL_PEELED || status == PEEL_NON_TAG)
-   entry->flag |= REF_KNOWS_PEELED;
-   return status;
-}
-
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
   struct object_id *peeled)
 {
-   struct cache_ref_iterator *iter =
-   (struct cache_ref_iterator *)ref_iterator;
-   struct cache_ref_iterator_level *level;
-   struct ref_entry *entry;
-
-   level = >levels[iter->levels_nr - 1];
-
-   if (level->index == -1)
-   die("BUG: peel called before advance for cache iterator");
-
-   entry = level->dir->entries[level->index];
-
-   if (peel_entry(entry, 0))
-   return -1;
-   oidcpy(peeled, >u.value.peeled);
-   return 0;
+   return peel_object(ref_iterator->oid->hash, peeled->hash);
 }
 
 static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index a082bfb06c..eda65e73ed 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -38,14 +38,6 @@ struct ref_value {
 * referred to by the last reference in the symlink chain.
 */
struct object_id oid;
-
-   /*
-* If REF_KNOWS_PEELED, then this field holds the peeled value
-* of this reference, or null if the reference is known not to
-* be peelable.  See the documentation for peel_ref() for an
-* exact definition of "peelable".
-*/
-   struct object_id peeled;
 };
 
 /*
@@ -97,21 +89,14 @@ struct ref_dir {
  * public values; see refs.h.
  */
 
-/*
- * The field ref_entry->u.value.peeled of this value entry contains
- * the correct peeled value for the reference, which might be
- * null_sha1 if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x10
-
 /* ref_entry represents a directory of references */
-#define REF_DIR 0x20
+#define REF_DIR 0x10
 
 /*
  * Entry has not yet been read from disk (used only for REF_DIR
  * entries representing loose references)
  */
-#define REF_INCOMPLETE 0x40
+#define REF_INCOMPLETE 0x20
 
 /*
  * A ref_entry represents either a reference or a "subdirectory" of
@@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct 
ref_cache *cache,
  const char *prefix,
   

[PATCH v3 21/21] packed-backend.c: rename a bunch of things and update comments

2017-09-25 Thread Michael Haggerty
We've made huge changes to this file, and some of the old names and
comments are no longer very fitting. So rename a bunch of things:

* `struct packed_ref_cache` → `struct snapshot`
* `acquire_packed_ref_cache()` → `acquire_snapshot()`
* `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
* `release_packed_ref_cache()` → `release_snapshot()`
* `clear_packed_ref_cache()` → `clear_snapshot()`
* `struct packed_ref_entry` → `struct snapshot_record`
* `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
* `cmp_entry_to_refname()` → `cmp_record_to_refname()`
* `sort_packed_refs()` → `sort_snapshot()`
* `read_packed_refs()` → `create_snapshot()`
* `validate_packed_ref_cache()` → `validate_snapshot()`
* `get_packed_ref_cache()` → `get_snapshot()`
* Renamed local variables and struct members accordingly.

Also update a bunch of comments to reflect the renaming and the
accumulated changes that the code has undergone.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 422 +++---
 1 file changed, 232 insertions(+), 190 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 1ed52d7eca..d500ebfaa5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -37,10 +37,30 @@ static enum mmap_strategy mmap_strategy = MMAP_OK;
 
 struct packed_ref_store;
 
-struct packed_ref_cache {
+/*
+ * A `snapshot` represents one snapshot of a `packed-refs` file.
+ *
+ * Normally, this will be a mmapped view of the contents of the
+ * `packed-refs` file at the time the snapshot was created. However,
+ * if the `packed-refs` file was not sorted, this might point at heap
+ * memory holding the contents of the `packed-refs` file with its
+ * records sorted by refname.
+ *
+ * `snapshot` instances are reference counted (via
+ * `acquire_snapshot()` and `release_snapshot()`). This is to prevent
+ * an instance from disappearing while an iterator is still iterating
+ * over it. Instances are garbage collected when their `referrers`
+ * count goes to zero.
+ *
+ * The most recent `snapshot`, if available, is referenced by the
+ * `packed_ref_store`. Its freshness is checked whenever
+ * `get_snapshot()` is called; if the existing snapshot is obsolete, a
+ * new snapshot is taken.
+ */
+struct snapshot {
/*
 * A back-pointer to the packed_ref_store with which this
-* cache is associated:
+* snapshot is associated:
 */
struct packed_ref_store *refs;
 
@@ -61,26 +81,42 @@ struct packed_ref_cache {
size_t header_len;
 
/*
-* What is the peeled state of this cache? (This is usually
-* determined from the header of the "packed-refs" file.)
+* What is the peeled state of the `packed-refs` file that
+* this snapshot represents? (This is usually determined from
+* the file's header.)
 */
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
 
/*
-* Count of references to the data structure in this instance,
-* including the pointer from files_ref_store::packed if any.
-* The data will not be freed as long as the reference count
-* is nonzero.
+* Count of references to this instance, including the pointer
+* from `packed_ref_store::snapshot`, if any. The instance
+* will not be freed as long as the reference count is
+* nonzero.
 */
unsigned int referrers;
 
-   /* The metadata from when this packed-refs cache was read */
+   /*
+* The metadata of the `packed-refs` file from which this
+* snapshot was created, used to tell if the file has been
+* replaced since we read it.
+*/
struct stat_validity validity;
 };
 
 /*
- * A container for `packed-refs`-related data. It is not (yet) a
- * `ref_store`.
+ * A `ref_store` representing references stored in a `packed-refs`
+ * file. It implements the `ref_store` interface, though it has some
+ * limitations:
+ *
+ * - It cannot store symbolic references.
+ *
+ * - It cannot store reflogs.
+ *
+ * - It does not support reference renaming (though it could).
+ *
+ * On the other hand, it can be locked outside of a reference
+ * transaction. In that case, it remains locked even after the
+ * transaction is done and the new `packed-refs` file is activated.
  */
 struct packed_ref_store {
struct ref_store base;
@@ -91,10 +127,10 @@ struct packed_ref_store {
char *path;
 
/*
-* A cache of the values read from the `packed-refs` file, if
-* it might still be current; otherwise, NULL.
+* A snapshot of the values read from the `packed-refs` file,
+* if it might still be current; otherwise, NULL.
 */
-   struct packed_ref_cache *cache;
+   struct snapshot *snapshot;
 
/*
 * Lock used for the "packed-refs" file. Note that this (and
@@

[PATCH v3 20/21] mmapped_ref_iterator: inline into `packed_ref_iterator`

2017-09-25 Thread Michael Haggerty
Since `packed_ref_iterator` is now delegating to
`mmapped_ref_iterator` rather than `cache_ref_iterator` to do the
heavy lifting, there is no need to keep the two iterators separate. So
"inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This
removes a bunch of boilerplate.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 284 --
 1 file changed, 114 insertions(+), 170 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 66e5525174..1ed52d7eca 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -225,157 +225,6 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
-/*
- * This value is set in `base.flags` if the peeled value of the
- * current reference is known. In that case, `peeled` contains the
- * correct peeled value for the reference, which might be `null_sha1`
- * if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x40
-
-/*
- * An iterator over a packed-refs file that is currently mmapped.
- */
-struct mmapped_ref_iterator {
-   struct ref_iterator base;
-
-   struct packed_ref_cache *packed_refs;
-
-   /* The current position in the mmapped file: */
-   const char *pos;
-
-   /* The end of the mmapped file: */
-   const char *eof;
-
-   struct object_id oid, peeled;
-
-   struct strbuf refname_buf;
-};
-
-static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-   const char *p = iter->pos, *eol;
-
-   strbuf_reset(>refname_buf);
-
-   if (iter->pos == iter->eof)
-   return ref_iterator_abort(ref_iterator);
-
-   iter->base.flags = REF_ISPACKED;
-
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
-   parse_oid_hex(p, >oid, ) ||
-   !isspace(*p++))
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-
-   eol = memchr(p, '\n', iter->eof - p);
-   if (!eol)
-   die_unterminated_line(iter->packed_refs->refs->path,
- iter->pos, iter->eof - iter->pos);
-
-   strbuf_add(>refname_buf, p, eol - p);
-   iter->base.refname = iter->refname_buf.buf;
-
-   if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
-   if (!refname_is_safe(iter->base.refname))
-   die("packed refname is dangerous: %s",
-   iter->base.refname);
-   oidclr(>oid);
-   iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
-   }
-   if (iter->packed_refs->peeled == PEELED_FULLY ||
-   (iter->packed_refs->peeled == PEELED_TAGS &&
-starts_with(iter->base.refname, "refs/tags/")))
-   iter->base.flags |= REF_KNOWS_PEELED;
-
-   iter->pos = eol + 1;
-
-   if (iter->pos < iter->eof && *iter->pos == '^') {
-   p = iter->pos + 1;
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
-   parse_oid_hex(p, >peeled, ) ||
-   *p++ != '\n')
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-   iter->pos = p;
-
-   /*
-* Regardless of what the file header said, we
-* definitely know the value of *this* reference. But
-* we suppress it if the reference is broken:
-*/
-   if ((iter->base.flags & REF_ISBROKEN)) {
-   oidclr(>peeled);
-   iter->base.flags &= ~REF_KNOWS_PEELED;
-   } else {
-   iter->base.flags |= REF_KNOWS_PEELED;
-   }
-   } else {
-   oidclr(>peeled);
-   }
-
-   return ITER_OK;
-}
-
-static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
-   struct object_id *peeled)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-
-   if ((iter->base.flags & REF_KNOWS_PEELED)) {
-   oidcpy(peeled, >peeled);
-   return is_null_oid(>peeled) ? -1 : 0;
-   } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
-   return -1;
-   } else {
-   return !!peel_object(iter->oid.hash, peeled->hash);
-   }
-}
-
-static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator

[PATCH v3 11/21] mmapped_ref_iterator_advance(): no peeled value for broken refs

2017-09-25 Thread Michael Haggerty
If a reference is broken, suppress its peeled value.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 312116a99d..724c88631d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct 
ref_iterator *ref_iterator)
 
/*
 * Regardless of what the file header said, we
-* definitely know the value of *this* reference:
+* definitely know the value of *this* reference. But
+* we suppress it if the reference is broken:
 */
-   iter->base.flags |= REF_KNOWS_PEELED;
+   if ((iter->base.flags & REF_ISBROKEN)) {
+   oidclr(>peeled);
+   iter->base.flags &= ~REF_KNOWS_PEELED;
+   } else {
+   iter->base.flags |= REF_KNOWS_PEELED;
+   }
} else {
oidclr(>peeled);
}
-- 
2.14.1



[PATCH v3 09/21] packed_ref_cache: remember the file-wide peeling state

2017-09-25 Thread Michael Haggerty
Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2b80f244c8..ae276f3445 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,12 @@ struct packed_ref_cache {
 
struct ref_cache *cache;
 
+   /*
+* What is the peeled state of this cache? (This is usually
+* determined from the header of the "packed-refs" file.)
+*/
+   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
+
/*
 * Count of references to the data structure in this instance,
 * including the pointer from files_ref_store::packed if any.
@@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
char *buf;
const char *pos, *eol, *eof;
struct strbuf tmp = STRBUF_INIT;
-   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
+   packed_refs->peeled = PEELED_NONE;
 
fd = open(refs->path, O_RDONLY);
if (fd < 0) {
@@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
string_list_split_in_place(, p, ' ', -1);
 
if (unsorted_string_list_has_string(, "fully-peeled"))
-   peeled = PEELED_FULLY;
+   packed_refs->peeled = PEELED_FULLY;
else if (unsorted_string_list_has_string(, "peeled"))
-   peeled = PEELED_TAGS;
+   packed_refs->peeled = PEELED_TAGS;
/* perhaps other traits later as well */
 
/* The "+ 1" is for the LF character. */
@@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
oidclr();
flag |= REF_BAD_NAME | REF_ISBROKEN;
}
-   if (peeled == PEELED_FULLY ||
-   (peeled == PEELED_TAGS && starts_with(refname, 
"refs/tags/")))
+   if (packed_refs->peeled == PEELED_FULLY ||
+   (packed_refs->peeled == PEELED_TAGS &&
+starts_with(refname, "refs/tags/")))
flag |= REF_KNOWS_PEELED;
entry = create_ref_entry(refname, , flag);
add_ref_entry(dir, entry);
-- 
2.14.1



[PATCH v3 18/21] packed_ref_store: get rid of the `ref_cache` entirely

2017-09-25 Thread Michael Haggerty
Now that everything has been changed to read what it needs directly
out of the `packed-refs` file, `packed_ref_store` doesn't need to
maintain a `ref_cache` at all. So get rid of it.

First of all, this will save a lot of memory and lots of little
allocations. Instead of needing to store complicated parsed data
structures in memory, we just mmap the file (potentially sharing
memory with other processes) and parse only what we need.

Moreover, since the mmapped access to the file reads only the parts of
the file that it needs, this might save reading all of the data from
disk at all (at least if the file starts out sorted).

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dbbba45502..3829e9c294 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -45,8 +45,6 @@ struct packed_ref_cache {
 */
struct packed_ref_store *refs;
 
-   struct ref_cache *cache;
-
/* Is the `packed-refs` file currently mmapped? */
int mmapped;
 
@@ -148,7 +146,6 @@ static void release_packed_ref_buffer(struct 
packed_ref_cache *packed_refs)
 static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
 {
if (!--packed_refs->referrers) {
-   free_ref_cache(packed_refs->cache);
stat_validity_clear(_refs->validity);
release_packed_ref_buffer(packed_refs);
free(packed_refs);
@@ -719,15 +716,10 @@ static const char *find_reference_location(struct 
packed_ref_cache *cache,
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
-   struct ref_dir *dir;
-   struct ref_iterator *iter;
int sorted = 0;
-   int ok;
 
packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
-   packed_refs->cache = create_ref_cache(NULL, NULL);
-   packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
packed_refs->peeled = PEELED_NONE;
 
if (!load_contents(packed_refs))
@@ -800,23 +792,6 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
packed_refs->eof = buf_copy + size;
}
 
-   dir = get_ref_dir(packed_refs->cache->root);
-   iter = mmapped_ref_iterator_begin(
-   packed_refs,
-   packed_refs->buf + packed_refs->header_len,
-   packed_refs->eof);
-   while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-   struct ref_entry *entry =
-   create_ref_entry(iter->refname, iter->oid, iter->flags);
-
-   if ((iter->flags & REF_KNOWS_PEELED))
-   ref_iterator_peel(iter, >u.value.peeled);
-   add_ref_entry(dir, entry);
-   }
-
-   if (ok != ITER_DONE)
-   die("error reading packed-refs file %s", refs->path);
-
return packed_refs;
 }
 
@@ -975,8 +950,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
else
start = packed_refs->buf + packed_refs->header_len;
 
-   iter->iter0 = mmapped_ref_iterator_begin(
-   packed_refs, start, packed_refs->eof);
+   iter->iter0 = mmapped_ref_iterator_begin(packed_refs,
+start, packed_refs->eof);
 
iter->flags = flags;
 
-- 
2.14.1



[PATCH v3 16/21] packed_read_raw_ref(): read the reference from the mmapped buffer

2017-09-25 Thread Michael Haggerty
Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index abf14a1405..be614e79f5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -876,18 +876,22 @@ static int packed_read_raw_ref(struct ref_store 
*ref_store,
 {
struct packed_ref_store *refs =
packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
-
-   struct ref_entry *entry;
+   struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs);
+   const char *rec;
 
*type = 0;
 
-   entry = get_packed_ref(refs, refname);
-   if (!entry) {
+   rec = find_reference_location(packed_refs, refname, 1);
+
+   if (!rec) {
+   /* refname is not a packed reference. */
errno = ENOENT;
return -1;
}
 
-   hashcpy(sha1, entry->u.value.oid.hash);
+   if (get_sha1_hex(rec, sha1))
+   die_invalid_line(refs->path, rec, packed_refs->eof - rec);
+
*type = REF_ISPACKED;
return 0;
 }
-- 
2.14.1



[PATCH v3 10/21] mmapped_ref_iterator: add iterator over a packed-refs file

2017-09-25 Thread Michael Haggerty
Add a new `mmapped_ref_iterator`, which can iterate over the
references in an mmapped `packed-refs` file directly. Use this
iterator from `read_packed_refs()` to fill the packed refs cache.

Note that we are not yet willing to promise that the new iterator
generates its output in order. That doesn't matter for now, because
the packed refs cache doesn't care what order it is filled.

This change adds a lot of boilerplate without providing any obvious
benefits. The benefits will come soon, when we get rid of the
`ref_cache` for packed references altogether.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 207 --
 1 file changed, 152 insertions(+), 55 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index ae276f3445..312116a99d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -163,6 +163,141 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * An iterator over a packed-refs file that is currently mmapped.
+ */
+struct mmapped_ref_iterator {
+   struct ref_iterator base;
+
+   struct packed_ref_cache *packed_refs;
+
+   /* The current position in the mmapped file: */
+   const char *pos;
+
+   /* The end of the mmapped file: */
+   const char *eof;
+
+   struct object_id oid, peeled;
+
+   struct strbuf refname_buf;
+};
+
+static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+   const char *p = iter->pos, *eol;
+
+   strbuf_reset(>refname_buf);
+
+   if (iter->pos == iter->eof)
+   return ref_iterator_abort(ref_iterator);
+
+   iter->base.flags = REF_ISPACKED;
+
+   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
+   parse_oid_hex(p, >oid, ) ||
+   !isspace(*p++))
+   die_invalid_line(iter->packed_refs->refs->path,
+iter->pos, iter->eof - iter->pos);
+
+   eol = memchr(p, '\n', iter->eof - p);
+   if (!eol)
+   die_unterminated_line(iter->packed_refs->refs->path,
+ iter->pos, iter->eof - iter->pos);
+
+   strbuf_add(>refname_buf, p, eol - p);
+   iter->base.refname = iter->refname_buf.buf;
+
+   if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
+   if (!refname_is_safe(iter->base.refname))
+   die("packed refname is dangerous: %s",
+   iter->base.refname);
+   oidclr(>oid);
+   iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
+   }
+   if (iter->packed_refs->peeled == PEELED_FULLY ||
+   (iter->packed_refs->peeled == PEELED_TAGS &&
+starts_with(iter->base.refname, "refs/tags/")))
+   iter->base.flags |= REF_KNOWS_PEELED;
+
+   iter->pos = eol + 1;
+
+   if (iter->pos < iter->eof && *iter->pos == '^') {
+   p = iter->pos + 1;
+   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
+   parse_oid_hex(p, >peeled, ) ||
+   *p++ != '\n')
+   die_invalid_line(iter->packed_refs->refs->path,
+iter->pos, iter->eof - iter->pos);
+   iter->pos = p;
+
+   /*
+* Regardless of what the file header said, we
+* definitely know the value of *this* reference:
+*/
+   iter->base.flags |= REF_KNOWS_PEELED;
+   } else {
+   oidclr(>peeled);
+   }
+
+   return ITER_OK;
+}
+
+static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
+   struct object_id *peeled)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+
+   if ((iter->base.flags & REF_KNOWS_PEELED)) {
+   oidcpy(peeled, >peeled);
+   return is_null_oid(>peeled) ? -1 : 0;
+   } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
+   return -1;
+   } else {
+   return !!peel_object(iter->oid.hash, peeled->hash);
+   }
+}
+
+static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+
+   release_packed_ref_cache(iter->packed_refs);
+   strbuf_release(>refname_buf);
+   base_ref_iterator_free(ref_iterator);
+   return ITER_DONE;
+}
+
+static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
+   mmapped_

  1   2   3   4   5   6   7   8   9   10   >