Re: [PATCH 1/2] test-lib: group system specific FIFO tests by system

2017-09-15 Thread Michael J Gruber
Johannes Schindelin venit, vidit, dixit 15.09.2017 00:21:
> Hi Michael,
> 
> On Thu, 14 Sep 2017, Michael J Gruber wrote:
> 
>> test-lib determines whether a file system supports FIFOs and needs to do
>> special casing for CYGWIN and MINGW. This separates those system-specific
>> settings from those at a more central place.
>>
>> Set mkfifo() to false in the central system-specific place so that the
>> same test works everywhere.
> 
> The mkfifo() emulation of Cygwin seems to work, no? I think it works even
> in MSYS2, but not in MINGW.
> 
> So maybe this patch should affect only the MINGW arm?

I only reorganised the code, so in that sense the patch does not affect
any system ;)

If indeed mkfifo works on CYGWIN, then a separate patch should remove the
exclusion of CYGWIN; alas, I can't confirm (I wish MS still had the old
academic alliance programme).

Michael


Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-09-15 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 15.09.2017 04:48:
> Michael J Gruber <g...@grubix.eu> writes:
> 
>> In fact, per documentation "--fork-point" looks at the reflog in
>> addition to doing the usual walk from the tip. The original design
>> description in d96855ff51 ("merge-base: teach "--fork-point" mode",
>> 2013-10-23) describes this as computing from a virtual merge-base of all
>> the historical tips of refname. They may or may not all be present in
>> the reflog (think pruning, non-ff fetching, fast forwarding etc.),
>> so filtering by the current contents of the reflog is potentially
>> harmful, and it does not seem to fulfill any purpose in the original
>> design.
> 
> Let me think aloud, using the picture from the log message from that
> commit.
> 
>                  o---B1
>                 /
> ---o---o---B2--o---o---o---Base
>         \
>          B3
>           \
>            Derived
> 
> where the current tip of the "base" branch is at Base, but earlier
> fetch observed that its tip used to be B3 and then B2 and then B1
> before getting to the current commit, and the branch being rebased
> on top of the latest "base" is based on commit B3.
> 
> So the logic tries to find a merge base between "Derived" and a
> virtual merge commit across Base, B1, B2 and B3.  And it finds B3.
> 
> If for some reason we didn't have B3 in the reflog, then wouldn't
> the merge base computation between Derived and a virtual merge
> commit across Base, B1 and B2 (but not B3 because we no longer know
> about it due to its lack in the reflog) find 'o' that is the parent
> of B2 and B3? 

Yes.

> Wouldn't that lead to both B3 and Derived replayed
> when the user of the fork-point option rebases the history of
> Derived?

Replayed, yes. What that means would depend on how B3 ended up being
"off base" (e.g. by a reset or rebase): the replay could lead to a reapply
without conflict, or with conflict, or to an empty (discarded) commit,
depending on "how much of B3" is still "on base".

> Perhaps that is the best we could do with a pruned reflog that lacks
> B3, but if that is the case, I wonder if it may be better to fail
> the request saying that we cannot find the fork-point (because,
> well, your reflog no longer has sufficient information), than
> silently give a wrong result, and in this case, we can tell between
> a correct result (i.e. the merge base is one of the commits we still
> know was at the tip) and a wrong one (i.e. the merge base is not any
> of the commits in the reflog).
> 
> If we declare --fork-point is the best effort option and may give an
> answer that is not better than without the option, then I think this
> patch is OK, but that diminishes the value of the option as a
> building block, I am afraid.
> 
> Callers that are more careful could ask merge-base with --fork-point
> (and happily use it knowing that the result is indeed a commit that
> used to be at the tip), or fall back to the result merge-base
> without --fork-point gives (because you could do no better) and deal
> with duplicates that may happen due to the imprecise determination.
> With this change, these callers will get result from a call to
> "merge-base --fork-point" that may or may not be definite, and they
> cannot tell.  For lazy users, making the option itself fall back
> may be simpler to use, and certainly is a valid stance to take when
> implementing a convenience option to a Porcelain command, but I do
> not know if it is a good idea to throw "merge-base --fork-point"
> into that category.

Simply put, "git merge-base ref commit" looks at the (graph) history of
ref and considers merge-base candidates that are also in the graph
history of commit. This is the "graph notion" of merge-base, and the
result is inherently determined by the DAG.

There is also the "reflog notion", where you look at the "reflog history"
of ref. Here the result depends on the reflog, which is itself "volatile"
(think pruning), so the result is volatile, too.

Now, the original documentation of merge-base says that "merge-base
--fork-point" looks at the reflog of ref *in addition to* the DAG. That
is, the merge-base candidates for "merge-base --fork-point" are a
super-set of those for the plain mode, enhanced by the reflog (graph
notion plus reflog notion).

The original implementation makes sure that "merge-base --fork-point"
looks *only* at candidates from the reflog. (strict reflog notion)

That is a discrepancy that we should resolve in any case.


Re: [PATCH 1/3] t6010: test actual test output

2017-09-15 Thread Michael J Gruber
Jeff King venit, vidit, dixit 14.09.2017 16:34:
> On Thu, Sep 14, 2017 at 03:15:18PM +0200, Michael J Gruber wrote:
> 
>> 4f21454b55 ("merge-base: handle --fork-point without reflog",
>> 2016-10-12) introduced a fix for merge-base --fork-point without reflog
>> and a test. While that test is fine, it updated neither the expected nor
>> the actual output, and thus tested the output of the previous test instead.
> 
> Oops. The use of the existing "expect3" was intentional (the idea being
> that we are repeating the identical step from the previous test with the
> slight tweak of not having a reflog). But obviously failing to write to
> "actual" at all was a mistake.
> 
> That said, I'm OK with reiterating the expected output.

expect3 had a different content, too ;)

Michael


Re: [PATCH v3] commit-template: change a message to be more intuitive

2017-09-15 Thread Michael J Gruber
Kaartic Sivaraam venit, vidit, dixit 15.09.2017 06:50:
> It's not good to use the phrase 'do not touch' to convey the information
> that the cut-line should not be modified or removed as it could possibly
> be mis-interpreted by a person who doesn't know that the word 'touch' has
> the meaning of 'tamper with'. Further, it could make translations a little
> difficult as it might not have the intended meaning in a few languages (for
> which translations don't exist yet) when translated as such.
> 
> So, use intuitive terms in the sentence. Replacing the word 'touch' with
> other terms has introduced the possibility of the following sentence to be
> mis-interpreted, so change the terms in that too.
> 
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Changes in v3:
>  
> - changed the wordings of the second sentence as there seemed to be a
>   magical 'or else' connecting the two lines.
>  
>  I didn't expect in the least that this would go up to v3. In case anybody
>  finds something wrong with this change too, it's a lot better to drop this
>  altogether than going for a v4.

That happens more often than not :)

Your original intent was to avoid any misunderstandings, and all the
comments and iterations made sure that we don't trade one possible
source of misunderstanding for another but avoid them all.

I consider v4 to be a strict improvement over the status quo and (as far
as I can see) to serve your original intent as well as possible.

>  wt-status.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/wt-status.c b/wt-status.c
> index 77c27c511..23e87e74d 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -934,7 +934,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
>  
>  void wt_status_add_cut_line(FILE *fp)
>  {
> - const char *explanation = _("Do not touch the line above.\nEverything below will be removed.");
> + const char *explanation = _("Do not modify or remove the line above.\nEverything below it will be ignored.");
>   struct strbuf buf = STRBUF_INIT;
>  
>   fprintf(fp, "%c %s", comment_line_char, cut_line);
> 


Re: [PATCH 00/20] Read `packed-refs` using mmap()

2017-09-14 Thread Michael Haggerty
On 09/14/2017 10:23 PM, Johannes Schindelin wrote:
> On Wed, 13 Sep 2017, Michael Haggerty wrote:
> 
>> * `mmap()` the whole file rather than `read()`ing it.
> 
> On Windows, a memory-mapped file cannot be renamed. As a consequence, the
> following tests fail on `pu`:
> 
> [...]

Thanks for your quick investigation.

> This diff:
> 
> -- snip --
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index f9c5e771bdd..ee8b3603624 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1306,13 +1308,13 @@ static int packed_transaction_finish(struct ref_store *ref_store,
>   char *packed_refs_path;
>  
>   packed_refs_path = get_locked_file_path(&refs->lock);
> + clear_snapshot(refs);
>   if (rename_tempfile(&refs->tempfile, packed_refs_path)) {
>   strbuf_addf(err, "error replacing %s: %s",
>   refs->path, strerror(errno));
>   goto cleanup;
>   }
>  
> - clear_snapshot(refs);
>   ret = 0;
>  
>  cleanup:
> -- snap --
> 
> reduces the failed tests to
> 
> t1410-reflog.counts.sh
> t3210-pack-refs.counts.sh
> t3211-peel-ref.counts.sh
> t5505-remote.counts.sh
> t5510-fetch.counts.sh
> t5600-clone-fail-cleanup.counts.sh

That change is a strict improvement on all systems; I'll integrate it
into the series.

> However, much as I tried to wrap my head around it, I could not even start
> to come up with a solution for e.g. the t1410 regression. The failure
> happens in `git branch -d one/two`.
> 
> The culprit: there is yet another unreleased snapshot while we try to
> finish the transaction.
> 
> It is the snapshot of the main_worktree_ref_store as acquired by
> add_head_info() (which is called from get_main_worktree(), which in turn
> was called from get_worktrees(), in turn having been called from
> find_shared_symref() that was called from delete_branches()), and it seems
> to me that there was never any plan to have that released, including its
> snapshot.

Yeah the idea was that the default situation would be that whenever a
`packed-refs` file needs to be read, it would be kept mmapped for the
life of the program (or until the file was detected to have been
changed). This was meant to be a replacement for the explicit
`ref_cache`. So any long-lived git process could be expected to have the
`packed-refs` file (or even multiple `packed-refs` files in the case of
submodules) mmapped.

That's obviously not going to work on Windows.

> [...]
> Do you have any idea how to ensure that all mmap()ed packed refs are
> released before attempting to finish a transaction?

[And from your other email:]
> This is only one example, as I figure that multiple worktrees could cause
> even more ref_stores to have unreleased snapshots of the packed-refs file.
> 
> I imagine that something like close_all_packs() is needed
> ("clear_all_snapshots()"?).

Yes, I wasn't really familiar with `close_all_packs()`, but it sounds
like it solves a similar problem.

Within a single process, we could make this work via an arduous process
of figuring out what functions might want to overwrite the `packed-refs`
file, and making sure that nobody up the call stack is iterating over
references during those function calls. If that condition is met, then
calling `clear_snapshot()` earlier, as you propose above, would decrease
the reference count to zero, causing the old snapshot to be freed, and
allowing the rename to succeed. We could even do

if (!clear_snapshot(refs))
BUG("packed-refs snapshot is still in use");

, analogously to `close_all_packs()`, to help find code that violates
the condition.

Similarly, it wouldn't be allowed to invoke a subprocess or hook
function while iterating over references, and one would have to clear
any current snapshots before doing so. It might even make sense to do
that at the same time as `close_all_packs()`, for example in a new
function `close_all_unnecessary_files()`.

But what about unrelated processes? Anybody reading `packed-refs` would
block anybody who wants to modify it, for example to delete a reference
or pack the references. I think this is worse than the packfile
situation. Packfiles all have unique names, so they never need to be
overwritten; at worst they are deleted. And the deletion is never
critical to correctness. If you can't delete a packfile, the only
consequence is that it hangs around until the next GC.

However, the `packed-refs` file always has the same name, and
overwriting it is sometimes critical for correctness. So it sounds to me
like even *readers* mustn't keep the file open or mmapped longer than
necessary!

(This makes me wonder: doesn't `rename_tempfile()` for the `packed-refs`
file *already* fail sometimes on Windows due to a concurrent reader?)

Re: [PATCH 00/20] Read `packed-refs` using mmap()

2017-09-14 Thread Michael Haggerty
On 09/13/2017 07:15 PM, Michael Haggerty wrote:
> [...]
> * `mmap()` the whole file rather than `read()`ing it.
> [...]
Apparently this doesn't work on Windows, because the `snapshot` is
keeping the `packed-refs` file open too long, so the new file can't be
renamed on top of it.

I didn't realize that this is even allowed, but TIL that you can close a
file while keeping it mmapped. Does that technique work on Windows? If
so, I'll change v2 to do it as sketched below.

Michael

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 8235ac8506..95c1cd2a27 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -35,11 +35,8 @@ struct snapshot {
 */
struct packed_ref_store *refs;

-   /*
-    * The file descriptor of the `packed-refs` file (open in
-    * read-only mode), or -1 if it is not open.
-    */
-   int fd;
+   /* Is the `packed-refs` file currently mmapped? */
+   int mmapped;

/*
 * The contents of the `packed-refs` file. If the file was
@@ -135,12 +132,11 @@ static void acquire_snapshot(struct snapshot *snapshot)
  */
 static void clear_snapshot_buffer(struct snapshot *snapshot)
 {
-   if (snapshot->fd >= 0) {
+   if (snapshot->mmapped) {
if (munmap(snapshot->buf, snapshot->eof - snapshot->buf))
die_errno("error ummapping packed-refs file %s",
  snapshot->refs->path);
-   close(snapshot->fd);
-   snapshot->fd = -1;
+   snapshot->mmapped = 0;
} else {
free(snapshot->buf);
}
@@ -525,6 +521,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
 static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 {
struct snapshot *snapshot = xcalloc(1, sizeof(*snapshot));
+   int fd;
struct stat st;
size_t size;
int sorted = 0;
@@ -533,8 +530,8 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
acquire_snapshot(snapshot);
snapshot->peeled = PEELED_NONE;

-   snapshot->fd = open(refs->path, O_RDONLY);
-   if (snapshot->fd < 0) {
+   fd = open(refs->path, O_RDONLY);
+   if (fd < 0) {
if (errno == ENOENT) {
/*
 * This is OK; it just means that no
@@ -549,15 +546,16 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
}
}

-   stat_validity_update(&snapshot->validity, snapshot->fd);
+   stat_validity_update(&snapshot->validity, fd);

-   if (fstat(snapshot->fd, &st) < 0)
+   if (fstat(fd, &st) < 0)
die_errno("couldn't stat %s", refs->path);

size = xsize_t(st.st_size);
-   snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE,
- snapshot->fd, 0);
+   snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
snapshot->eof = snapshot->buf + size;
+   snapshot->mmapped = 1;
+   close(fd);

/* If the file has a header line, process it: */
if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {


[PATCH 2/2] test-lib: ulimit does not limit on CYGWIN and MINGW

2017-09-14 Thread Michael J Gruber
ulimit succeeds (by return value) but does not limit on some systems.

Set ulimit() to false on these systems so that we rely on neither its
output nor its effect. As an intended side-effect, ulimit-based
prerequisites are set correctly (to "not-have") on these systems.

Reported-by: Ramsay Jones <ram...@ramsayjones.plus.com>
Reported-by: Adam Dinwoodie <a...@dinwoodie.org>
Reported-by: Johannes Schindelin <johannes.schinde...@gmx.de>
Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
This is independent of my series, but should best go in before it so that
no ulimit-based test is run on CYGWIN and MINGW.

It follows the basic assumption that a tool like ulimit is either
present and functional or not present, and that we work around it by
defines or such when that assumption is broken.
(Alternatively, we could set ULIMIT_LIMITS or so and depend on that.)

 t/test-lib.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index b8a0b05102..f6ac60d487 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -998,6 +998,10 @@ case $uname_s in
mkfifo () {
false
}
+   # ulimit succeeds but does not limit
+   ulimit () {
+   false
+   }
# no POSIX permissions
# backslashes in pathspec are converted to '/'
# exec does not inherit the PID
@@ -1012,6 +1016,10 @@ case $uname_s in
mkfifo () {
false
}
+   # ulimit succeeds but does not limit
+   ulimit () {
+   false
+   }
test_set_prereq POSIXPERM
test_set_prereq EXECKEEPSPID
test_set_prereq CYGWIN
-- 
2.14.1.712.gda4591c8a2



[PATCH 1/2] test-lib: group system specific FIFO tests by system

2017-09-14 Thread Michael J Gruber
test-lib determines whether a file system supports FIFOs and needs to do
special casing for CYGWIN and MINGW. This separates those system-specific
settings from those at a more central place.

Set mkfifo() to false in the central system-specific place so that the
same test works everywhere.

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/test-lib.sh | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 5fbd8d4a90..b8a0b05102 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -994,6 +994,10 @@ case $uname_s in
pwd () {
builtin pwd -W
}
+   # no FIFOs
+   mkfifo () {
+   false
+   }
# no POSIX permissions
# backslashes in pathspec are converted to '/'
# exec does not inherit the PID
@@ -1004,6 +1008,10 @@ case $uname_s in
GIT_TEST_CMP=mingw_test_cmp
;;
 *CYGWIN*)
+   # no FIFOs
+   mkfifo () {
+   false
+   }
test_set_prereq POSIXPERM
test_set_prereq EXECKEEPSPID
test_set_prereq CYGWIN
@@ -1062,14 +1070,7 @@ test_i18ngrep () {
 
 test_lazy_prereq PIPE '
# test whether the filesystem supports FIFOs
-   case $(uname -s) in
-   CYGWIN*|MINGW*)
-   false
-   ;;
-   *)
-   rm -f testfifo && mkfifo testfifo
-   ;;
-   esac
+   rm -f testfifo && mkfifo testfifo
 '
 
 test_lazy_prereq SYMLINKS '
-- 
2.14.1.712.gda4591c8a2



[PATCH 0/3] merge-base --fork-point fixes

2017-09-14 Thread Michael J Gruber
merge-base --fork-point does not quite work as advertised when the
reflog is empty or partial. This series brings it in line with the
documentation and, hopefully, with the original intent.

Michael J Gruber (3):
  t6010: test actual test output
  merge-base: return fork-point outside reflog
  merge-base: find fork-point outside partial reflog

 builtin/merge-base.c  | 20 
 t/t6010-merge-base.sh | 21 +++--
 2 files changed, 23 insertions(+), 18 deletions(-)

-- 
2.14.1.712.gda4591c8a2



[PATCH 1/3] t6010: test actual test output

2017-09-14 Thread Michael J Gruber
4f21454b55 ("merge-base: handle --fork-point without reflog",
2016-10-12) introduced a fix for merge-base --fork-point without reflog
and a test. While that test is fine, it updated neither the expected nor
the actual output, and thus tested the output of the previous test instead.

Fix the test to test the merge-base output, not just its return code.

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/t6010-merge-base.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 31db7b5f91..17fffd7998 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -262,8 +262,9 @@ test_expect_success 'using reflog to find the fork point' '
 
 test_expect_success '--fork-point works with empty reflog' '
git -c core.logallrefupdates=false branch no-reflog base &&
-   git merge-base --fork-point no-reflog derived &&
-   test_cmp expect3 actual
+   git rev-parse base >expect &&
+   git merge-base --fork-point no-reflog derived >actual &&
+   test_cmp expect actual
 '
 
 test_expect_success 'merge-base --octopus --all for complex tree' '
-- 
2.14.1.712.gda4591c8a2



[PATCH 3/3] merge-base: find fork-point outside partial reflog

2017-09-14 Thread Michael J Gruber
In fork-point mode, merge-base adds the commit at refname to a list of
candidates to walk from only when the reflog is empty. Therefore, it
fails to find merge bases that are descendants of the most recent reflog
entry.

Add the commit at the tip unconditionally so that a merge base on the
history of refname is found in any case, independent of the state of the
reflog.

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 builtin/merge-base.c  | 2 +-
 t/t6010-merge-base.sh | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 926a7615ea..b5b4cf7eac 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -174,7 +174,7 @@ static int handle_fork_point(int argc, const char **argv)
revs.initial = 1;
for_each_reflog_ent(refname, collect_one_reflog_ent, &revs);
 
-   if (!revs.nr && !get_oid(refname, &oid))
+   if (!get_oid(refname, &oid))
add_one_commit(&oid, &revs);
 
for (i = 0; i < revs.nr; i++)
diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 850463d4f2..78342896c7 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -275,6 +275,14 @@ test_expect_success '--fork-point works with merge-base outside reflog' '
test_cmp expect actual
 '
 
+test_expect_success '--fork-point works with merge-base outside partial reflog' '
+   git -c core.logallrefupdates=true branch partial-reflog base &&
+   git rev-parse no-reflog >.git/refs/heads/partial-reflog &&
+   git rev-parse no-reflog >expect &&
+   git merge-base --fork-point partial-reflog no-reflog >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'merge-base --octopus --all for complex tree' '
# Best common ancestor for JE, JAA and JDD is JC
# JE
-- 
2.14.1.712.gda4591c8a2



[PATCH 2/3] merge-base: return fork-point outside reflog

2017-09-14 Thread Michael J Gruber
4f21454b55 ("merge-base: handle --fork-point without reflog",
2016-10-12) fixed the case without reflog, but only partially:
The original code checks whether the merge base candidates are from the
list that we started with, which was the list of reflog entries before
4f21454b55 and the list of reflog entries - or the ref itself if empty -
after. The test from 4f21454b55 tested a situation where the merge
base candidate equalled the commit at refname, so it was on this
(1-item) list by accident.

In fact, per documentation "--fork-point" looks at the reflog in
addition to doing the usual walk from the tip. The original design
description in d96855ff51 ("merge-base: teach "--fork-point" mode",
2013-10-23) describes this as computing from a virtual merge-base of all
the historical tips of refname. They may or may not all be present in
the reflog (think pruning, non-ff fetching, fast forwarding etc.),
so filtering by the current contents of the reflog is potentially
harmful, and it does not seem to fulfill any purpose in the original
design.

Remove the filtering and add a test for an out-of-reflog merge base.

Reported-by: Ekelhart Jakob <jakob.ekelh...@fsw.at>
Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 builtin/merge-base.c  | 18 +++---
 t/t6010-merge-base.sh |  8 
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 6dbd167d3b..926a7615ea 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -186,23 +186,11 @@ static int handle_fork_point(int argc, const char **argv)
 * There should be one and only one merge base, when we found
 * a common ancestor among reflog entries.
 */
-   if (!bases || bases->next) {
+   if (!bases || bases->next)
ret = 1;
-   goto cleanup_return;
-   }
-
-   /* And the found one must be one of the reflog entries */
-   for (i = 0; i < revs.nr; i++)
-   if (&bases->item->object == &revs.commit[i]->object)
-   break; /* found */
-   if (revs.nr <= i) {
-   ret = 1; /* not found */
-   goto cleanup_return;
-   }
-
-   printf("%s\n", oid_to_hex(&bases->item->object.oid));
+   else
+   printf("%s\n", oid_to_hex(&bases->item->object.oid));
 
-cleanup_return:
free_commit_list(bases);
return ret;
 }
diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 17fffd7998..850463d4f2 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -267,6 +267,14 @@ test_expect_success '--fork-point works with empty reflog' '
test_cmp expect actual
 '
 
+test_expect_success '--fork-point works with merge-base outside reflog' '
+   git -c core.logallrefupdates=false checkout no-reflog &&
+   git -c core.logallrefupdates=false commit --allow-empty -m "Commit outside reflogs" &&
+   git rev-parse base >expect &&
+   git merge-base --fork-point no-reflog derived >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'merge-base --octopus --all for complex tree' '
# Best common ancestor for JE, JAA and JDD is JC
# JE
-- 
2.14.1.712.gda4591c8a2



Re: [PATCH] test-lib: don't use ulimit in test prerequisites on cygwin

2017-09-14 Thread Michael J Gruber
Jonathan Nieder venit, vidit, dixit 13.09.2017 21:20:
> Ramsay Jones wrote:
> 
>> On cygwin (and MinGW), the 'ulimit' built-in bash command does not have
>> the desired effect of limiting the resources of new processes, at least
>> for the stack and file descriptors. However, it always returns success
>> and leads to several test prerequisites being erroneously set to true.
>>
>> Add a check for cygwin and MinGW to the prerequisite expressions, using
>> 'uname -s', and return false instead of (indirectly) using 'ulimit'.
>> This affects the prerequisite expressions for the ULIMIT_STACK_SIZE,
>> CMDLINE_LIMIT and ULIMIT_FILE_DESCRIPTORS prerequisites.
>>
>> Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com>
>> ---
>>  t/t1400-update-ref.sh | 11 ++-
>>  t/t6120-describe.sh   |  1 -
>>  t/t7004-tag.sh|  1 -
>>  t/test-lib.sh | 22 --
>>  4 files changed, 30 insertions(+), 5 deletions(-)
> 
> Reviewed-by: Jonathan Nieder <jrnie...@gmail.com>
> 
> Thanks.
> 
> An alternative would be to do some more explicit feature-based test
> like using "ulimit" to set a limit and then reading back the limit in
> a separate process, but that feels like overkill.

It's still not clear whether these limits would be in effect or just
reported.

Rather than checking uname -s in several places, I think we should just
define ulimit to be false in one place, or rather set the prerequisite
there.

Michael


Re: merge-base not working as expected when base is ahead

2017-09-14 Thread Michael J Gruber
Ekelhart Jakob venit, vidit, dixit 13.09.2017 17:07:
> Dear Git,
> 
> git merge-base --fork-point "master" is not working if master is already
> newer than my current branch. Very oddly, it seems to work whenever you had
> the expected commit checked out previously - which made it very tricky to
> detect this problem.
> 
> Example:
>  - Clone "https://github.com/jekelhart/GitInfoTry"
>  - Switch to branch "v1.0.0"
>  - git merge-base --fork-point "master"
>- or: git merge-base --fork-point "origin/master" 
>  - expected result: fork point "fbb1db34c6317a6e8b319c1ec261e97ca1672c22"
>  - but result is empty
> 
> In the repo where we created this example tree in the first place the command 
> returned the expected fork point. If you clone it new and fresh it does not 
> return any result anymore.
> 
> Works, however, on branch "v2.0.0". Assumption: because "master" is older.(?)
> I think it works locally because the command uses the reflog in addition(!), 
> however, it should work without the local reflog as well. (since the history 
> was not modified in any way)

The documentation lies, unfortunately. It claims that in fork-point mode,
"git merge-base" "also" looks at the reflog. In fact, the code explicitly
*discards* any merge bases that it finds but which are not in the
reflog. (The merge base that you find for v2.0.0 is in the reflog
because it's the checked-out HEAD.)

Removing the corresponding check gives the merge base for v1.0.0 that
you expect, and git still passes all tests. But I'll look at the history
of these lines before I submit a patch.

Michael



Re: [PATCH v2] commit-template: change a message to be more intuitive

2017-09-14 Thread Michael J Gruber
Kaartic Sivaraam venit, vidit, dixit 13.09.2017 15:05:
> It's not good to use the phrase 'do not touch' to convey the information
> that the cut-line should not be modified or removed as it could possibly
> be mis-interpreted by a person who doesn't know that the word 'touch' has
> the meaning of 'tamper with'. Further, it could make translations a little
> difficult as it might not have the intended meaning in a few languages when
> translated as such.
> 
> So, use more intuitive terms in the sentence.
> 
> Signed-off-by: Kaartic Sivaraam 
> ---
>  wt-status.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/wt-status.c b/wt-status.c
> index 77c27c51134d2..be53579760ee7 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -934,7 +934,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
>  
>  void wt_status_add_cut_line(FILE *fp)
>  {
> - const char *explanation = _("Do not touch the line above.\nEverything below will be removed.");
> + const char *explanation = _("Do not modify or remove the line above.\nEverything below will be removed.");

I don't want to complicate things. But now - due to the repeated usage
of "remove" - these two sentences seem to be connected by an invisible
"or else" ("will" vs. "would" notwithstanding), i.e. "or else
everything below will be removed, too", which is wrong. Before, they
were separated more clearly.

Also, given all the translations that we have, it seems somewhat strange
to try to foresee and work around possible misunderstandings of commonly
used English phrases.

>   struct strbuf buf = STRBUF_INIT;
>  
>   fprintf(fp, "%c %s", comment_line_char, cut_line);
> 
> --
> https://github.com/git/git/pull/401
> 



Investment

2017-09-13 Thread Michael Stewart
Dear Friend

I write to seek if I can have your professional assistance in this important 
matter which I bring to you. Hence I believe you have the right experience 
needed to accomplish this.
There are fund available and ready for investment which we will need your 
assistance to proceed. Please do get back to me as soon as you can.

We will give you more details as we are interested to work with you by having 
you investing and managing the fund accordingly

I wait to hear from you
Yours sincerely,
Michael Stewart 


[PATCH 16/20] ref_store: implement `refs_peel_ref()` generically

2017-09-13 Thread Michael Haggerty
We're about to stop storing packed refs in a `ref_cache`. That means
that the only way we have left to optimize `peel_ref()` is by checking
whether the reference being peeled is the one currently being iterated
over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`.
But this can be done generically; it doesn't have to be implemented
per-backend.

So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()`
method from the refs API.

This removes the last callers of a couple of functions, so delete
them. More cleanup to come...

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c| 18 +-
 refs/files-backend.c  | 38 --
 refs/packed-backend.c | 36 
 refs/refs-internal.h  |  3 ---
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/refs.c b/refs.c
index 101c107ee8..c5e6f79c77 100644
--- a/refs.c
+++ b/refs.c
@@ -1735,7 +1735,23 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
flags)
 int refs_peel_ref(struct ref_store *refs, const char *refname,
  unsigned char *sha1)
 {
-   return refs->be->peel_ref(refs, refname, sha1);
+   int flag;
+   unsigned char base[20];
+
+   if (current_ref_iter && current_ref_iter->refname == refname) {
+   struct object_id peeled;
+
+   if (ref_iterator_peel(current_ref_iter, &peeled))
+   return -1;
+   hashcpy(sha1, peeled.hash);
+   return 0;
+   }
+
+   if (refs_read_ref_full(refs, refname,
+  RESOLVE_REF_READING, base, &flag))
+   return -1;
+
+   return peel_object(base, sha1);
 }
 
 int peel_ref(const char *refname, unsigned char *sha1)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 35648c89fc..7d12de88d0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -655,43 +655,6 @@ static int lock_raw_ref(struct files_ref_store *refs,
return ret;
 }
 
-static int files_peel_ref(struct ref_store *ref_store,
- const char *refname, unsigned char *sha1)
-{
-   struct files_ref_store *refs =
-   files_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB,
-  "peel_ref");
-   int flag;
-   unsigned char base[20];
-
-   if (current_ref_iter && current_ref_iter->refname == refname) {
-   struct object_id peeled;
-
-   if (ref_iterator_peel(current_ref_iter, &peeled))
-   return -1;
-   hashcpy(sha1, peeled.hash);
-   return 0;
-   }
-
-   if (refs_read_ref_full(ref_store, refname,
-  RESOLVE_REF_READING, base, &flag))
-   return -1;
-
-   /*
-* If the reference is packed, read its ref_entry from the
-* cache in the hope that we already know its peeled value.
-* We only try this optimization on packed references because
-* (a) forcing the filling of the loose reference cache could
-* be expensive and (b) loose references anyway usually do not
-* have REF_KNOWS_PEELED.
-*/
-   if (flag & REF_ISPACKED &&
-   !refs_peel_ref(refs->packed_ref_store, refname, sha1))
-   return 0;
-
-   return peel_object(base, sha1);
-}
-
 struct files_ref_iterator {
struct ref_iterator base;
 
@@ -3012,7 +2975,6 @@ struct ref_storage_be refs_be_files = {
files_initial_transaction_commit,
 
files_pack_refs,
-   files_peel_ref,
files_create_symref,
files_delete_refs,
files_rename_ref,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9b10532eb3..88242a49fe 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -780,26 +780,6 @@ static struct packed_ref_cache 
*get_packed_ref_cache(struct packed_ref_store *re
return refs->cache;
 }
 
-static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache 
*packed_ref_cache)
-{
-   return get_ref_dir(packed_ref_cache->cache->root);
-}
-
-static struct ref_dir *get_packed_refs(struct packed_ref_store *refs)
-{
-   return get_packed_ref_dir(get_packed_ref_cache(refs));
-}
-
-/*
- * Return the ref_entry for the given refname from the packed
- * references.  If it does not exist, return NULL.
- */
-static struct ref_entry *get_packed_ref(struct packed_ref_store *refs,
-   const char *refname)
-{
-   return find_ref_entry(get_packed_refs(refs), refname);
-}
-
 static int packed_read_raw_ref(struct ref_store *ref_store,
   const char *refname, unsigned char *sha1,
   struct strbuf *referent, unsigned int *type)
@@ -826,21 +806,6 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
re

[PATCH 13/20] read_packed_refs(): ensure that references are ordered when read

2017-09-13 Thread Michael Haggerty
It doesn't actually matter now, because the references are only
iterated over to fill the associated `ref_cache`, which itself puts
them in the correct order. But we want to get rid of the `ref_cache`,
so we want to be able to iterate directly over the `packed-refs`
buffer, and then the iteration will need to be ordered correctly.

In fact, we already write the `packed-refs` file sorted, but it is
possible that other Git clients don't get it right. So let's not
assume that a `packed-refs` file is sorted unless it is explicitly
declared to be so via a `sorted` trait in its header line.

If it is *not* declared to be sorted, then scan quickly through the
file to check. If it is found to be out of order, then sort the
records into a new memory-only copy rather than using the mmapped file
contents. This checking and sorting is done quickly, without parsing
the full file contents. However, it needs a little bit of care to
avoid reading past the end of the buffer even if the `packed-refs`
file is corrupt.

If `packed-refs` is sorted (i.e., its file header contains the
`sorted` trait or our check shows that it is), then use the mmapped
file contents directly.

Since we write the file correctly sorted, include that trait when we
write or rewrite a `packed-refs` file.
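
The check-then-sort step described above (whose implementation is cut off
at the end of this message) can be sketched in isolation. This is a
simplified, hypothetical stand-in rather than the patch's actual code: each
record here is a bare "refname\n" line (no oid prefix, no '^' peel lines),
and the helper names `split_records` and `sort_if_needed` are invented for
the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for a packed-refs record: just "<refname>\n".
 * (The real records carry a hex-oid prefix and optional '^' peel
 * lines; both are omitted to keep the sketch short.)
 */
struct rec {
	const char *start;
	size_t len;
};

/* Byte-wise comparison of newline-terminated names, as in the patch. */
static int cmp_rec(const void *v1, const void *v2)
{
	const struct rec *e1 = v1, *e2 = v2;
	const char *r1 = e1->start, *r2 = e2->start;

	while (1) {
		if (*r1 == '\n')
			return *r2 == '\n' ? 0 : -1;
		if (*r1 != *r2) {
			if (*r2 == '\n')
				return 1;
			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
		}
		r1++;
		r2++;
	}
}

/* Split a buffer into records; an unterminated trailing line is ignored. */
static size_t split_records(const char *buf, size_t len, struct rec *out)
{
	const char *p = buf, *end = buf + len;
	size_t n = 0;

	while (p < end) {
		const char *eol = memchr(p, '\n', end - p);

		if (!eol)
			break;
		out[n].start = p;
		out[n].len = eol + 1 - p;
		n++;
		p = eol + 1;
	}
	return n;
}

/*
 * One quick pass to see whether the records are already in order;
 * only if they are not, fall back to sorting. Returns 1 if a sort
 * was actually needed, 0 otherwise.
 */
static int sort_if_needed(struct rec *recs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (cmp_rec(&recs[i - 1], &recs[i]) > 0)
			break;
	if (i >= n)
		return 0;	/* already sorted */
	qsort(recs, n, sizeof(*recs), cmp_rec);
	return 1;
}
```

Sorting an index of `(start, len)` pairs rather than the bytes themselves
mirrors the patch's approach: the (mmapped) file contents stay untouched,
and only small record descriptors are moved.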

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 222 ++
 1 file changed, 206 insertions(+), 16 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d209c8e5a0..2a36dd9afb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -25,9 +25,12 @@ struct packed_ref_cache {
int fd;
 
/*
-* The range of memory to which the `packed-refs` file is
-* mmapped. If there were no contents (e.g., because the file
-* didn't exist), `buf` and `eof` are both NULL.
+* The contents of the `packed-refs` file. If the file was
+* already sorted, this points at the mmapped contents of the
+* file. If not, this points at heap-allocated memory
+* containing the contents, sorted. If there were no contents
+* (e.g., because the file didn't exist), `buf` and `eof` are
+* both NULL.
 */
char *buf, *eof;
 
@@ -93,22 +96,24 @@ static void acquire_packed_ref_cache(struct 
packed_ref_cache *packed_refs)
 }
 
 /*
- * If the buffer in `packed_refs` is active, munmap the memory, close the
- * file, and set the buffer pointers to NULL.
+ * If the buffer in `packed_refs` is active, then either munmap the
+ * memory and close the file, or free the memory. Then set the buffer
+ * pointers to NULL.
  */
 static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
 {
-   if (packed_refs->buf) {
+   if (packed_refs->fd >= 0) {
if (munmap(packed_refs->buf,
   packed_refs->eof - packed_refs->buf))
die_errno("error ummapping packed-refs file %s",
  packed_refs->refs->path);
-   packed_refs->buf = packed_refs->eof = NULL;
-   packed_refs->header_len = 0;
-   }
-
-   if (packed_refs->fd >= 0)
close(packed_refs->fd);
+   packed_refs->fd = -1;
+   } else {
+   free(packed_refs->buf);
+   }
+   packed_refs->buf = packed_refs->eof = NULL;
+   packed_refs->header_len = 0;
 }
 
 /*
@@ -329,7 +334,7 @@ struct ref_iterator *mmapped_ref_iterator_begin(
if (!packed_refs->buf)
return empty_ref_iterator_begin();
 
-   base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0);
+   base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 1);
 
iter->packed_refs = packed_refs;
acquire_packed_ref_cache(iter->packed_refs);
@@ -342,6 +347,170 @@ struct ref_iterator *mmapped_ref_iterator_begin(
return ref_iterator;
 }
 
+struct packed_ref_entry {
+   const char *start;
+   size_t len;
+};
+
+static int cmp_packed_ref_entries(const void *v1, const void *v2)
+{
+   const struct packed_ref_entry *e1 = v1, *e2 = v2;
+   const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1;
+   const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1;
+
+   while (1) {
+   if (*r1 == '\n')
+   return *r2 == '\n' ? 0 : -1;
+   if (*r1 != *r2) {
+   if (*r2 == '\n')
+   return 1;
+   else
return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
+   }
+   r1++;
+   r2++;
+   }
+}
+
+/*
+ * `packed_refs->buf` is not known to be sorted. Check whether it is,
+ * and if not, sort it into new memory and munmap/free the old
+ * storage.
+ */
+static void sort_packed

[PATCH 08/20] read_packed_refs(): read references with minimal copying

2017-09-13 Thread Michael Haggerty
Instead of copying data from the `packed-refs` file one line at time
and then processing it, process the data in place as much as possible.

Also, instead of processing one line per iteration of the main loop,
process a reference line plus its corresponding peeled line (if
present) together.

Note that this change slightly tightens up the parsing of the
`packed-refs` file. Previously, the parser would have accepted multiple
"peeled" lines for a single reference (ignoring all but the last one).
Now it would reject that.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 101 --
 1 file changed, 40 insertions(+), 61 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a45e3ff92f..2b80f244c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -134,33 +134,6 @@ static void clear_packed_ref_cache(struct packed_ref_store 
*refs)
}
 }
 
-/* The length of a peeled reference line in packed-refs, including EOL: */
-#define PEELED_LINE_LENGTH 42
-
-/*
- * Parse one line from a packed-refs file.  Write the SHA1 to sha1.
- * Return a pointer to the refname within the line (null-terminated),
- * or NULL if there was a problem.
- */
-static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
-{
-   const char *ref;
-
-   if (parse_oid_hex(line->buf, oid, &ref) < 0)
-   return NULL;
-   if (!isspace(*ref++))
-   return NULL;
-
-   if (isspace(*ref))
-   return NULL;
-
-   if (line->buf[line->len - 1] != '\n')
-   return NULL;
-   line->buf[--line->len] = 0;
-
-   return ref;
-}
-
 static NORETURN void die_unterminated_line(const char *path,
   const char *p, size_t len)
 {
@@ -221,8 +194,7 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
size_t size;
char *buf;
const char *pos, *eol, *eof;
-   struct ref_entry *last = NULL;
-   struct strbuf line = STRBUF_INIT;
+   struct strbuf tmp = STRBUF_INIT;
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
@@ -264,9 +236,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(&line, pos, eol - pos);
+   strbuf_add(&tmp, pos, eol - pos);
 
-   if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
+   if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p))
die_invalid_line(refs->path, pos, eof - pos);
 
string_list_split_in_place(&traits, p, ' ', -1);
@@ -281,61 +253,68 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
pos = eol + 1;
 
string_list_clear(&traits, 0);
-   strbuf_reset(&line);
+   strbuf_reset(&tmp);
}
 
dir = get_ref_dir(packed_refs->cache->root);
while (pos < eof) {
+   const char *p = pos;
struct object_id oid;
const char *refname;
+   int flag = REF_ISPACKED;
+   struct ref_entry *entry = NULL;
 
-   eol = memchr(pos, '\n', eof - pos);
+   if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
+   parse_oid_hex(p, &oid, &p) ||
+   !isspace(*p++))
+   die_invalid_line(refs->path, pos, eof - pos);
+
+   eol = memchr(p, '\n', eof - p);
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(&line, pos, eol + 1 - pos);
+   strbuf_add(&tmp, p, eol - p);
+   refname = tmp.buf;
+
+   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+   if (!refname_is_safe(refname))
+   die("packed refname is dangerous: %s", refname);
+   oidclr(&oid);
+   flag |= REF_BAD_NAME | REF_ISBROKEN;
+   }
+   if (peeled == PEELED_FULLY ||
+   (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
+   flag |= REF_KNOWS_PEELED;
+   entry = create_ref_entry(refname, &oid, flag);
+   add_ref_entry(dir, entry);
 
-   refname = parse_ref_line(&line, &oid);
-   if (refname) {
-   int flag = REF_ISPACKED;
+   pos = eol + 1;
+
+   if (pos < eof && *pos == '^') {
+   p = pos + 1;
+   if (eof - p < GIT_SHA1_HEXSZ + 1 ||
+   parse_oid_hex(
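
The message is truncated above, but the in-place parsing it describes -
one reference line plus at most one following '^' peel line per loop
iteration - can be sketched standalone. `parse_records` and the shortened
`HEXSZ` are hypothetical simplifications for illustration (hex digits are
not validated), not git's actual code:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HEXSZ 4	/* shortened stand-in for GIT_SHA1_HEXSZ (40) */

/*
 * Parse "<oid> <refname>\n" records, each optionally followed by a
 * single "^<oid>\n" peel line. Returns the number of references
 * parsed, or -1 on malformed input (e.g. a peel line without a
 * preceding reference line, or an unterminated line).
 */
static int parse_records(const char *buf, size_t len, int *peeled_out)
{
	const char *pos = buf, *eof = buf + len;
	int nrefs = 0, npeeled = 0;

	while (pos < eof) {
		const char *p = pos, *eol;

		/* reference line: oid, one space, refname, newline */
		if (eof - p < HEXSZ + 2 || !isspace((unsigned char)p[HEXSZ]))
			return -1;
		p += HEXSZ + 1;
		eol = memchr(p, '\n', eof - p);
		if (!eol)
			return -1;
		nrefs++;
		pos = eol + 1;

		/* at most one peel line may follow the reference line */
		if (pos < eof && *pos == '^') {
			p = pos + 1;
			if (eof - p < HEXSZ + 1 || p[HEXSZ] != '\n')
				return -1;
			npeeled++;
			pos = p + HEXSZ + 1;
		}
	}
	*peeled_out = npeeled;
	return nrefs;
}
```

Because a second consecutive '^' line fails the reference-line check at
the top of the loop, this sketch exhibits the tightened behavior the
commit message describes: multiple peel lines for one reference are
rejected rather than silently collapsed.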

[PATCH 04/20] die_unterminated_line(), die_invalid_line(): new functions

2017-09-13 Thread Michael Haggerty
Extract some helper functions for reporting errors. While we're at it,
prevent them from spewing unlimited output to the terminal. These
functions will soon have more callers.

These functions accept the problematic line as a `(ptr, len)` pair
rather than a NUL-terminated string, and `die_invalid_line()` checks
for an EOL itself, because these calling conventions will be
convenient for future callers. (Efficiency is not a concern here
because these functions are only ever called if the `packed-refs` file
is corrupt.)

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a3d9210cb0..5c50c223ef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -161,6 +161,29 @@ static const char *parse_ref_line(struct strbuf *line, 
struct object_id *oid)
return ref;
 }
 
+static NORETURN void die_unterminated_line(const char *path,
+  const char *p, size_t len)
+{
+   if (len < 80)
+   die("unterminated line in %s: %.*s", path, (int)len, p);
+   else
+   die("unterminated line in %s: %.75s...", path, p);
+}
+
+static NORETURN void die_invalid_line(const char *path,
+ const char *p, size_t len)
+{
+   const char *eol = memchr(p, '\n', len);
+
+   if (!eol)
+   die_unterminated_line(path, p, len);
+   else if (eol - p < 80)
+   die("unexpected line in %s: %.*s", path, (int)(eol - p), p);
+   else
+   die("unexpected line in %s: %.75s...", path, p);
+
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -227,7 +250,7 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
const char *traits;
 
if (!line.len || line.buf[line.len - 1] != '\n')
-   die("unterminated line in %s: %s", refs->path, line.buf);
+   die_unterminated_line(refs->path, line.buf, line.len);
 
if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
if (strstr(traits, " fully-peeled "))
@@ -266,8 +289,7 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
 */
last->flag |= REF_KNOWS_PEELED;
} else {
-   strbuf_setlen(&line, line.len - 1);
-   die("unexpected line in %s: %s", refs->path, line.buf);
+   die_invalid_line(refs->path, line.buf, line.len);
}
}
 
-- 
2.14.1



[PATCH 10/20] mmapped_ref_iterator: add iterator over a packed-refs file

2017-09-13 Thread Michael Haggerty
Add a new `mmapped_ref_iterator`, which can iterate over the
references in an mmapped `packed-refs` file directly. Use this
iterator from `read_packed_refs()` to fill the packed refs cache.

Note that we are not yet willing to promise that the new iterator
generates its output in order. That doesn't matter for now, because
the packed refs cache doesn't care what order it is filled.

This change adds a lot of boilerplate without providing any obvious
benefits. The benefits will come soon, when we get rid of the
`ref_cache` for packed references altogether.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 207 --
 1 file changed, 152 insertions(+), 55 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index ae276f3445..312116a99d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -163,6 +163,141 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * An iterator over a packed-refs file that is currently mmapped.
+ */
+struct mmapped_ref_iterator {
+   struct ref_iterator base;
+
+   struct packed_ref_cache *packed_refs;
+
+   /* The current position in the mmapped file: */
+   const char *pos;
+
+   /* The end of the mmapped file: */
+   const char *eof;
+
+   struct object_id oid, peeled;
+
+   struct strbuf refname_buf;
+};
+
+static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+   const char *p = iter->pos, *eol;
+
+   strbuf_reset(&iter->refname_buf);
+
+   if (iter->pos == iter->eof)
+   return ref_iterator_abort(ref_iterator);
+
+   iter->base.flags = REF_ISPACKED;
+
+   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
+   parse_oid_hex(p, &iter->oid, &p) ||
+   !isspace(*p++))
+   die_invalid_line(iter->packed_refs->refs->path,
+iter->pos, iter->eof - iter->pos);
+
+   eol = memchr(p, '\n', iter->eof - p);
+   if (!eol)
+   die_unterminated_line(iter->packed_refs->refs->path,
+ iter->pos, iter->eof - iter->pos);
+
+   strbuf_add(&iter->refname_buf, p, eol - p);
+   iter->base.refname = iter->refname_buf.buf;
+
+   if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
+   if (!refname_is_safe(iter->base.refname))
+   die("packed refname is dangerous: %s",
+   iter->base.refname);
+   oidclr(&iter->oid);
+   iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
+   }
+   if (iter->packed_refs->peeled == PEELED_FULLY ||
+   (iter->packed_refs->peeled == PEELED_TAGS &&
+starts_with(iter->base.refname, "refs/tags/")))
+   iter->base.flags |= REF_KNOWS_PEELED;
+
+   iter->pos = eol + 1;
+
+   if (iter->pos < iter->eof && *iter->pos == '^') {
+   p = iter->pos + 1;
+   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
+   parse_oid_hex(p, &iter->peeled, &p) ||
+   *p++ != '\n')
+   die_invalid_line(iter->packed_refs->refs->path,
+iter->pos, iter->eof - iter->pos);
+   iter->pos = p;
+
+   /*
+* Regardless of what the file header said, we
+* definitely know the value of *this* reference:
+*/
+   iter->base.flags |= REF_KNOWS_PEELED;
+   } else {
+   oidclr(&iter->peeled);
+   }
+
+   return ITER_OK;
+}
+
+static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
+   struct object_id *peeled)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+
+   if ((iter->base.flags & REF_KNOWS_PEELED)) {
+   oidcpy(peeled, &iter->peeled);
+   return is_null_oid(&iter->peeled) ? -1 : 0;
+   } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
+   return -1;
+   } else {
+   return !!peel_object(iter->oid.hash, peeled->hash);
+   }
+}
+
+static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+   struct mmapped_ref_iterator *iter =
+   (struct mmapped_ref_iterator *)ref_iterator;
+
+   release_packed_ref_cache(iter->packed_refs);
+   strbuf_release(>refname_buf);
+   base_ref_iterator_free(ref_iterator);
+   return ITER_DONE;
+}
+
+static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
+   mmapped_

[PATCH 15/20] packed_read_raw_ref(): read the reference from the mmapped buffer

2017-09-13 Thread Michael Haggerty
Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a018a6c8ad..9b10532eb3 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -806,18 +806,22 @@ static int packed_read_raw_ref(struct ref_store 
*ref_store,
 {
struct packed_ref_store *refs =
packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
-
-   struct ref_entry *entry;
+   struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs);
+   const char *rec;
 
*type = 0;
 
-   entry = get_packed_ref(refs, refname);
-   if (!entry) {
+   rec = find_reference_location(packed_refs, refname, 1);
+
+   if (!rec) {
+   /* refname is not a packed reference. */
errno = ENOENT;
return -1;
}
 
-   hashcpy(sha1, entry->u.value.oid.hash);
+   if (get_sha1_hex(rec, sha1))
+   die_invalid_line(refs->path, rec, packed_refs->eof - rec);
+
*type = REF_ISPACKED;
return 0;
 }
-- 
2.14.1



[PATCH 18/20] ref_cache: remove support for storing peeled values

2017-09-13 Thread Michael Haggerty
Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c |  9 -
 refs/ref-cache.c  | 42 +-
 refs/ref-cache.h  | 32 ++--
 3 files changed, 11 insertions(+), 72 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9868a5c198..d76051c7f5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -2,7 +2,6 @@
 #include "../config.h"
 #include "../refs.h"
 #include "refs-internal.h"
-#include "ref-cache.h"
 #include "packed-backend.h"
 #include "../iterator.h"
 #include "../lockfile.h"
@@ -201,6 +200,14 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * This value is set in `base.flags` if the peeled value of the
+ * current reference is known. In that case, `peeled` contains the
+ * correct peeled value for the reference, which might be `null_sha1`
+ * if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x40
+
 /*
  * An iterator over a packed-refs file that is currently mmapped.
  */
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 54dfb5218c..4f850e1b5c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname,
 
FLEX_ALLOC_STR(ref, name, refname);
oidcpy(&ref->u.value.oid, oid);
-   oidclr(&ref->u.value.peeled);
ref->flag = flag;
return ref;
 }
@@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator 
*ref_iterator)
}
 }
 
-enum peel_status peel_entry(struct ref_entry *entry, int repeel)
-{
-   enum peel_status status;
-
-   if (entry->flag & REF_KNOWS_PEELED) {
-   if (repeel) {
-   entry->flag &= ~REF_KNOWS_PEELED;
-   oidclr(&entry->u.value.peeled);
-   } else {
-   return is_null_oid(&entry->u.value.peeled) ?
-   PEEL_NON_TAG : PEEL_PEELED;
-   }
-   }
-   if (entry->flag & REF_ISBROKEN)
-   return PEEL_BROKEN;
-   if (entry->flag & REF_ISSYMREF)
-   return PEEL_IS_SYMREF;
-
-   status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash);
-   if (status == PEEL_PEELED || status == PEEL_NON_TAG)
-   entry->flag |= REF_KNOWS_PEELED;
-   return status;
-}
-
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
   struct object_id *peeled)
 {
-   struct cache_ref_iterator *iter =
-   (struct cache_ref_iterator *)ref_iterator;
-   struct cache_ref_iterator_level *level;
-   struct ref_entry *entry;
-
-   level = &iter->levels[iter->levels_nr - 1];
-
-   if (level->index == -1)
-   die("BUG: peel called before advance for cache iterator");
-
-   entry = level->dir->entries[level->index];
-
-   if (peel_entry(entry, 0))
-   return -1;
-   oidcpy(peeled, &entry->u.value.peeled);
-   return 0;
+   return peel_object(ref_iterator->oid->hash, peeled->hash);
 }
 
 static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index a082bfb06c..eda65e73ed 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -38,14 +38,6 @@ struct ref_value {
 * referred to by the last reference in the symlink chain.
 */
struct object_id oid;
-
-   /*
-* If REF_KNOWS_PEELED, then this field holds the peeled value
-* of this reference, or null if the reference is known not to
-* be peelable.  See the documentation for peel_ref() for an
-* exact definition of "peelable".
-*/
-   struct object_id peeled;
 };
 
 /*
@@ -97,21 +89,14 @@ struct ref_dir {
  * public values; see refs.h.
  */
 
-/*
- * The field ref_entry->u.value.peeled of this value entry contains
- * the correct peeled value for the reference, which might be
- * null_sha1 if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x10
-
 /* ref_entry represents a directory of references */
-#define REF_DIR 0x20
+#define REF_DIR 0x10
 
 /*
  * Entry has not yet been read from disk (used only for REF_DIR
  * entries representing loose references)
  */
-#define REF_INCOMPLETE 0x40
+#define REF_INCOMPLETE 0x20
 
 /*
  * A ref_entry represents either a reference or a "subdirectory" of
@@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct 
ref_cache *cache,
  const char *prefix,
   

[PATCH 19/20] mmapped_ref_iterator: inline into `packed_ref_iterator`

2017-09-13 Thread Michael Haggerty
Since `packed_ref_iterator` is now delegating to
`mmapped_ref_iterator` rather than `cache_ref_iterator` to do the
heavy lifting, there is no need to keep the two iterators separate. So
"inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This
removes a bunch of boilerplate.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 284 --
 1 file changed, 114 insertions(+), 170 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d76051c7f5..a3f8a19b9b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -200,157 +200,6 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
-/*
- * This value is set in `base.flags` if the peeled value of the
- * current reference is known. In that case, `peeled` contains the
- * correct peeled value for the reference, which might be `null_sha1`
- * if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x40
-
-/*
- * An iterator over a packed-refs file that is currently mmapped.
- */
-struct mmapped_ref_iterator {
-   struct ref_iterator base;
-
-   struct packed_ref_cache *packed_refs;
-
-   /* The current position in the mmapped file: */
-   const char *pos;
-
-   /* The end of the mmapped file: */
-   const char *eof;
-
-   struct object_id oid, peeled;
-
-   struct strbuf refname_buf;
-};
-
-static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-   const char *p = iter->pos, *eol;
-
-   strbuf_reset(&iter->refname_buf);
-
-   if (iter->pos == iter->eof)
-   return ref_iterator_abort(ref_iterator);
-
-   iter->base.flags = REF_ISPACKED;
-
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
-   parse_oid_hex(p, &iter->oid, &p) ||
-   !isspace(*p++))
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-
-   eol = memchr(p, '\n', iter->eof - p);
-   if (!eol)
-   die_unterminated_line(iter->packed_refs->refs->path,
- iter->pos, iter->eof - iter->pos);
-
-   strbuf_add(&iter->refname_buf, p, eol - p);
-   iter->base.refname = iter->refname_buf.buf;
-
-   if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
-   if (!refname_is_safe(iter->base.refname))
-   die("packed refname is dangerous: %s",
-   iter->base.refname);
-   oidclr(&iter->oid);
-   iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
-   }
-   if (iter->packed_refs->peeled == PEELED_FULLY ||
-   (iter->packed_refs->peeled == PEELED_TAGS &&
-starts_with(iter->base.refname, "refs/tags/")))
-   iter->base.flags |= REF_KNOWS_PEELED;
-
-   iter->pos = eol + 1;
-
-   if (iter->pos < iter->eof && *iter->pos == '^') {
-   p = iter->pos + 1;
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
-   parse_oid_hex(p, &iter->peeled, &p) ||
-   *p++ != '\n')
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-   iter->pos = p;
-
-   /*
-* Regardless of what the file header said, we
-* definitely know the value of *this* reference. But
-* we suppress it if the reference is broken:
-*/
-   if ((iter->base.flags & REF_ISBROKEN)) {
-   oidclr(&iter->peeled);
-   iter->base.flags &= ~REF_KNOWS_PEELED;
-   } else {
-   iter->base.flags |= REF_KNOWS_PEELED;
-   }
-   } else {
-   oidclr(&iter->peeled);
-   }
-
-   return ITER_OK;
-}
-
-static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
-   struct object_id *peeled)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-
-   if ((iter->base.flags & REF_KNOWS_PEELED)) {
-   oidcpy(peeled, &iter->peeled);
-   return is_null_oid(&iter->peeled) ? -1 : 0;
-   } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
-   return -1;
-   } else {
-   return !!peel_object(iter->oid.hash, peeled->hash);
-   }
-}
-
-static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator

[PATCH 14/20] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`

2017-09-13 Thread Michael Haggerty
Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 109 --
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2a36dd9afb..a018a6c8ad 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -372,6 +372,27 @@ static int cmp_packed_ref_entries(const void *v1, const 
void *v2)
}
 }
 
+/*
+ * Compare a packed-refs record pointed to by `rec` to the specified
+ * NUL-terminated refname.
+ */
+static int cmp_entry_to_refname(const char *rec, const char *refname)
+{
+   const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
+   const char *r2 = refname;
+
+   while (1) {
+   if (*r1 == '\n')
+   return *r2 ? -1 : 0;
+   if (!*r2)
+   return 1;
+   if (*r1 != *r2)
+   return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
+   r1++;
+   r2++;
+   }
+}
+
 /*
  * `packed_refs->buf` is not known to be sorted. Check whether it is,
  * and if not, sort it into new memory and munmap/free the old
@@ -478,6 +499,17 @@ static const char *find_start_of_record(const char *buf, 
const char *p)
return p;
 }
 
+/*
+ * Return a pointer to the start of the record following the record
+ * that contains `*p`. If none is found before `end`, return `end`.
+ */
+static const char *find_end_of_record(const char *p, const char *end)
+{
+   while (++p < end && (p[-1] != '\n' || p[0] == '^'))
+   ;
+   return p;
+}
+
 /*
  * We want to be able to compare mmapped reference records quickly,
  * without totally parsing them. We can do so because the records are
@@ -511,6 +543,65 @@ static void verify_buffer_safe(struct packed_ref_cache 
*packed_refs)
 last_line, eof - last_line);
 }
 
+/*
+ * Find the place in `cache->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+  const char *refname, int mustexist)
+{
+   /*
+* This is not *quite* a garden-variety binary search, because
+* the data we're searching is made up of records, and we
+* always need to find the beginning of a record to do a
+* comparison. A "record" here is one line for the reference
+* itself and zero or one peel lines that start with '^'. Our
+* loop invariant is described in the next two comments.
+*/
+
+   /*
+* A pointer to the character at the start of a record whose
+* preceding records all have reference names that come
+* *before* `refname`.
+*/
+   const char *lo = cache->buf + cache->header_len;
+
+   /*
+* A pointer to the first character of a record whose
+* reference name comes *after* `refname`.
+*/
+   const char *hi = cache->eof;
+
+   while (lo < hi) {
+   const char *mid, *rec;
+   int cmp;
+
+   mid = lo + (hi - lo) / 2;
+   rec = find_start_of_record(lo, mid);
+   cmp = cmp_entry_to_refname(rec, refname);
+   if (cmp < 0) {
+   lo = find_end_of_record(mid, hi);
+   } else if (cmp > 0) {
+   hi = rec;
+   } else {
+   return rec;
+   }
+   }
+
+   if (mustexist)
+   return NULL;
+   else
+   return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -818,6 +909,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
const char *prefix, unsigned int flags)
 {
struct packed_ref_store *refs;
+   struct packed_ref_cache *packed_refs;
+   const char *start;
struct packed_ref_iterator *iter;
struct ref_iterator *ref_iterator;
unsigned int required_flags

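The record-oriented binary search in `find_reference_location()` above can be sketched in miniature. The following standalone version is illustrative only, not the patch's code: it searches a sorted buffer of newline-terminated `key value` records, backing each probe up to the start of the record containing the midpoint. The real implementation additionally handles the header line and `^` peel lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Back up from `p` to the start of the newline-terminated record
 * containing it; `lo` must itself be a record boundary. */
static const char *start_of_record(const char *lo, const char *p)
{
	while (p > lo && p[-1] != '\n')
		p--;
	return p;
}

/* Binary-search a sorted buffer of "key value\n" records for `key`;
 * return a pointer to the matching record, or NULL if it is absent. */
static const char *find_record(const char *buf, size_t len, const char *key)
{
	const char *lo = buf, *hi = buf + len;
	size_t n = strlen(key);

	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = start_of_record(lo, mid);
		int cmp = strncmp(rec, key, n);

		if (!cmp && rec[n] != ' ' && rec[n] != '\n')
			cmp = 1; /* record key is longer, so it sorts after `key` */

		if (cmp < 0) {
			/* skip past the record containing `mid` */
			const char *nl = memchr(mid, '\n', hi - mid);
			lo = nl ? nl + 1 : hi;
		} else if (cmp > 0) {
			hi = rec;
		} else {
			return rec;
		}
	}
	return NULL; /* `lo` would be the insertion point */
}
```

As in the patch, the loop invariant is maintained on record boundaries rather than bytes: `lo` always starts a record, and `hi` always starts a record that sorts after the key.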
[PATCH 17/20] packed_ref_store: get rid of the `ref_cache` entirely

2017-09-13 Thread Michael Haggerty
Now that everything has been changed to read what it needs directly
out of the `packed-refs` file, `packed_ref_store` doesn't need to
maintain a `ref_cache` at all. So get rid of it.

First of all, this will save a lot of memory and lots of little
allocations. Instead of needing to store complicated parsed data
structures in memory, we just mmap the file (potentially sharing
memory with other processes) and parse only what we need.

Moreover, since the mmapped access to the file reads only the parts of
the file that it needs, this might avoid having to read all of the data
from disk at all (at least if the file starts out sorted).
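For readers unfamiliar with the pattern, here is a minimal sketch of read-only mmap usage. This is illustrative only: the helper name is made up, error handling is reduced to returning NULL, and unlike the patch it does not keep the descriptor open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only. On success, set *size_out and return the
 * mapping; return NULL on error or if the file is empty. */
static char *map_file_readonly(const char *path, size_t *size_out)
{
	int fd = open(path, O_RDONLY);
	struct stat st;
	char *buf;

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}
	buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* a MAP_PRIVATE mapping survives closing the fd */
	if (buf == MAP_FAILED)
		return NULL;
	*size_out = st.st_size;
	return buf;
}
```

The pages are faulted in lazily, which is what makes a binary search over the mapping cheap: only the pages the search actually touches need to be read from disk.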

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 88242a49fe..9868a5c198 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -16,8 +16,6 @@ struct packed_ref_cache {
 */
struct packed_ref_store *refs;
 
-   struct ref_cache *cache;
-
/*
 * The file descriptor of the `packed-refs` file (open in
 * read-only mode), or -1 if it is not open.
@@ -123,7 +121,6 @@ static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
 static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
 {
if (!--packed_refs->referrers) {
-   free_ref_cache(packed_refs->cache);
	stat_validity_clear(&packed_refs->validity);
release_packed_ref_buffer(packed_refs);
free(packed_refs);
@@ -640,15 +637,10 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
struct stat st;
size_t size;
-   struct ref_dir *dir;
-   struct ref_iterator *iter;
int sorted = 0;
-   int ok;
 
packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
-   packed_refs->cache = create_ref_cache(NULL, NULL);
-   packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
packed_refs->peeled = PEELED_NONE;
 
packed_refs->fd = open(refs->path, O_RDONLY);
@@ -730,23 +722,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
verify_buffer_safe(packed_refs);
}
 
-   dir = get_ref_dir(packed_refs->cache->root);
-   iter = mmapped_ref_iterator_begin(
-   packed_refs,
-   packed_refs->buf + packed_refs->header_len,
-   packed_refs->eof);
-   while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-   struct ref_entry *entry =
-   create_ref_entry(iter->refname, iter->oid, iter->flags);
-
-   if ((iter->flags & REF_KNOWS_PEELED))
-   ref_iterator_peel(iter, &entry->u.value.peeled);
-   add_ref_entry(dir, entry);
-   }
-
-   if (ok != ITER_DONE)
-   die("error reading packed-refs file %s", refs->path);
-
return packed_refs;
 }
 
@@ -905,8 +880,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
else
start = packed_refs->buf + packed_refs->header_len;
 
-   iter->iter0 = mmapped_ref_iterator_begin(
-   packed_refs, start, packed_refs->eof);
+   iter->iter0 = mmapped_ref_iterator_begin(packed_refs,
+start, packed_refs->eof);
 
iter->flags = flags;
 
-- 
2.14.1



[PATCH 20/20] packed-backend.c: rename a bunch of things and update comments

2017-09-13 Thread Michael Haggerty
We've made huge changes to this file, and some of the old names and
comments are no longer very fitting. So rename a bunch of things:

* `struct packed_ref_cache` → `struct snapshot`
* `acquire_packed_ref_cache()` → `acquire_snapshot()`
* `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
* `release_packed_ref_cache()` → `release_snapshot()`
* `clear_packed_ref_cache()` → `clear_snapshot()`
* `struct packed_ref_entry` → `struct snapshot_record`
* `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
* `cmp_entry_to_refname()` → `cmp_record_to_refname()`
* `sort_packed_refs()` → `sort_snapshot()`
* `read_packed_refs()` → `create_snapshot()`
* `validate_packed_ref_cache()` → `validate_snapshot()`
* `get_packed_ref_cache()` → `get_snapshot()`
* Renamed local variables and struct members accordingly.

Also update a bunch of comments to reflect the renaming and the
accumulated changes that the code has undergone.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 392 --
 1 file changed, 217 insertions(+), 175 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a3f8a19b9b..8235ac8506 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -8,10 +8,30 @@
 
 struct packed_ref_store;
 
-struct packed_ref_cache {
+/*
+ * A `snapshot` represents one snapshot of a `packed-refs` file.
+ *
+ * Normally, this will be a mmapped view of the contents of the
+ * `packed-refs` file at the time the snapshot was created. However,
+ * if the `packed-refs` file was not sorted, this might point at heap
+ * memory holding the contents of the `packed-refs` file with its
+ * records sorted by refname.
+ *
+ * `snapshot` instances are reference counted (via
+ * `acquire_snapshot()` and `release_snapshot()`). This is to prevent
+ * an instance from disappearing while an iterator is still iterating
+ * over it. Instances are garbage collected when their `referrers`
+ * count goes to zero.
+ *
+ * The most recent `snapshot`, if available, is referenced by the
+ * `packed_ref_store`. Its freshness is checked whenever
+ * `get_snapshot()` is called; if the existing snapshot is obsolete, a
+ * new snapshot is taken.
+ */
+struct snapshot {
/*
 * A back-pointer to the packed_ref_store with which this
-* cache is associated:
+* snapshot is associated:
 */
struct packed_ref_store *refs;
 
@@ -35,26 +55,42 @@ struct packed_ref_cache {
size_t header_len;
 
/*
-* What is the peeled state of this cache? (This is usually
-* determined from the header of the "packed-refs" file.)
+* What is the peeled state of the `packed-refs` file that
+* this snapshot represents? (This is usually determined from
+* the file's header.)
 */
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
 
/*
-* Count of references to the data structure in this instance,
-* including the pointer from files_ref_store::packed if any.
-* The data will not be freed as long as the reference count
-* is nonzero.
+* Count of references to this instance, including the pointer
+* from `packed_ref_store::snapshot`, if any. The instance
+* will not be freed as long as the reference count is
+* nonzero.
 */
unsigned int referrers;
 
-   /* The metadata from when this packed-refs cache was read */
+   /*
+* The metadata of the `packed-refs` file from which this
+* snapshot was created, used to tell if the file has been
+* replaced since we read it.
+*/
struct stat_validity validity;
 };
 
 /*
- * A container for `packed-refs`-related data. It is not (yet) a
- * `ref_store`.
+ * A `ref_store` representing references stored in a `packed-refs`
+ * file. It implements the `ref_store` interface, though it has some
+ * limitations:
+ *
+ * - It cannot store symbolic references.
+ *
+ * - It cannot store reflogs.
+ *
+ * - It does not support reference renaming (though it could).
+ *
+ * On the other hand, it can be locked outside of a reference
+ * transaction. In that case, it remains locked even after the
+ * transaction is done and the new `packed-refs` file is activated.
  */
 struct packed_ref_store {
struct ref_store base;
@@ -65,10 +101,10 @@ struct packed_ref_store {
char *path;
 
/*
-* A cache of the values read from the `packed-refs` file, if
-* it might still be current; otherwise, NULL.
+* A snapshot of the values read from the `packed-refs` file,
+* if it might still be current; otherwise, NULL.
 */
-   struct packed_ref_cache *cache;
+   struct snapshot *snapshot;
 
/*
 * Lock used for the "packed-refs" file. Note that this (and
@@ -85,44 +121,43 @@ struct packed_ref_store {
 };
 
 /*
-

[PATCH 09/20] packed_ref_cache: remember the file-wide peeling state

2017-09-13 Thread Michael Haggerty
Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2b80f244c8..ae276f3445 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,12 @@ struct packed_ref_cache {
 
struct ref_cache *cache;
 
+   /*
+* What is the peeled state of this cache? (This is usually
+* determined from the header of the "packed-refs" file.)
+*/
+   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
+
/*
 * Count of references to the data structure in this instance,
 * including the pointer from files_ref_store::packed if any.
@@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
char *buf;
const char *pos, *eol, *eof;
struct strbuf tmp = STRBUF_INIT;
-   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
+   packed_refs->peeled = PEELED_NONE;
 
fd = open(refs->path, O_RDONLY);
if (fd < 0) {
@@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
	string_list_split_in_place(&traits, p, ' ', -1);
 
	if (unsorted_string_list_has_string(&traits, "fully-peeled"))
-   peeled = PEELED_FULLY;
+   packed_refs->peeled = PEELED_FULLY;
	else if (unsorted_string_list_has_string(&traits, "peeled"))
-   peeled = PEELED_TAGS;
+   packed_refs->peeled = PEELED_TAGS;
/* perhaps other traits later as well */
 
/* The "+ 1" is for the LF character. */
@@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
	oidclr(&oid);
flag |= REF_BAD_NAME | REF_ISBROKEN;
}
-   if (peeled == PEELED_FULLY ||
-   (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
+   if (packed_refs->peeled == PEELED_FULLY ||
+   (packed_refs->peeled == PEELED_TAGS &&
+starts_with(refname, "refs/tags/")))
flag |= REF_KNOWS_PEELED;
	entry = create_ref_entry(refname, &oid, flag);
add_ref_entry(dir, entry);
-- 
2.14.1



[PATCH 12/20] packed_ref_cache: keep the `packed-refs` file open and mmapped

2017-09-13 Thread Michael Haggerty
As long as a `packed_ref_cache` object is in use, keep the
corresponding `packed-refs` file open and mmapped.

This is still pointless, because we only read the file immediately
after instantiating the `packed_ref_cache`. But that will soon change.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 137 --
 1 file changed, 89 insertions(+), 48 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 724c88631d..d209c8e5a0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,22 @@ struct packed_ref_cache {
 
struct ref_cache *cache;
 
+   /*
+* The file descriptor of the `packed-refs` file (open in
+* read-only mode), or -1 if it is not open.
+*/
+   int fd;
+
+   /*
+* The range of memory to which the `packed-refs` file is
+* mmapped. If there were no contents (e.g., because the file
+* didn't exist), `buf` and `eof` are both NULL.
+*/
+   char *buf, *eof;
+
+   /* The size of the header line, if any; otherwise, 0: */
+   size_t header_len;
+
/*
 * What is the peeled state of this cache? (This is usually
 * determined from the header of the "packed-refs" file.)
@@ -36,30 +52,6 @@ struct packed_ref_cache {
struct stat_validity validity;
 };
 
-/*
- * Increment the reference count of *packed_refs.
- */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-   packed_refs->referrers++;
-}
-
-/*
- * Decrease the reference count of *packed_refs.  If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
- */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-   if (!--packed_refs->referrers) {
-   free_ref_cache(packed_refs->cache);
-   stat_validity_clear(_refs->validity);
-   free(packed_refs);
-   return 1;
-   } else {
-   return 0;
-   }
-}
-
 /*
  * A container for `packed-refs`-related data. It is not (yet) a
  * `ref_store`.
@@ -92,6 +84,50 @@ struct packed_ref_store {
struct tempfile tempfile;
 };
 
+/*
+ * Increment the reference count of *packed_refs.
+ */
+static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+   packed_refs->referrers++;
+}
+
+/*
+ * If the buffer in `packed_refs` is active, munmap the memory, close the
+ * file, and set the buffer pointers to NULL.
+ */
+static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
+{
+   if (packed_refs->buf) {
+   if (munmap(packed_refs->buf,
+  packed_refs->eof - packed_refs->buf))
+   die_errno("error ummapping packed-refs file %s",
+ packed_refs->refs->path);
+   packed_refs->buf = packed_refs->eof = NULL;
+   packed_refs->header_len = 0;
+   }
+
+   if (packed_refs->fd >= 0)
+   close(packed_refs->fd);
+}
+
+/*
+ * Decrease the reference count of *packed_refs.  If it goes to zero,
+ * free *packed_refs and return true; otherwise return false.
+ */
+static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+   if (!--packed_refs->referrers) {
+   free_ref_cache(packed_refs->cache);
+   stat_validity_clear(&packed_refs->validity);
+   release_packed_ref_buffer(packed_refs);
+   free(packed_refs);
+   return 1;
+   } else {
+   return 0;
+   }
+}
+
 struct ref_store *packed_ref_store_create(const char *path,
  unsigned int store_flags)
 {
@@ -284,13 +320,15 @@ static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
 };
 
 struct ref_iterator *mmapped_ref_iterator_begin(
-   const char *packed_refs_file,
struct packed_ref_cache *packed_refs,
const char *pos, const char *eof)
 {
struct mmapped_ref_iterator *iter = xcalloc(1, sizeof(*iter));
	struct ref_iterator *ref_iterator = &iter->base;
 
+   if (!packed_refs->buf)
+   return empty_ref_iterator_begin();
+
base_ref_iterator_init(ref_iterator, _ref_iterator_vtable, 0);
 
iter->packed_refs = packed_refs;
@@ -336,11 +374,8 @@ struct ref_iterator *mmapped_ref_iterator_begin(
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
-   int fd;
struct stat st;
size_t size;
-   char *buf;
-   const char *pos, *eof;
struct ref_dir *dir;
struct ref_iterator *iter;
int ok;
@@ -351,13 +386,15 @@ static struct packed_ref_cache *read_packed

[PATCH 05/20] read_packed_refs(): use mmap to read the `packed-refs` file

2017-09-13 Thread Michael Haggerty
It's still done in a pretty stupid way, involving more data copying
than necessary. That will improve in future commits.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 42 --
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 5c50c223ef..154abbd83a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -215,8 +215,12 @@ static NORETURN void die_invalid_line(const char *path,
  */
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
-   FILE *f;
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
+   int fd;
+   struct stat st;
+   size_t size;
+   char *buf;
+   const char *pos, *eol, *eof;
struct ref_entry *last = NULL;
struct strbuf line = STRBUF_INIT;
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
@@ -227,8 +231,8 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-   f = fopen(refs->path, "r");
-   if (!f) {
+   fd = open(refs->path, O_RDONLY);
+   if (fd < 0) {
if (errno == ENOENT) {
/*
 * This is OK; it just means that no
@@ -241,16 +245,27 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
}
}
 
-   stat_validity_update(&packed_refs->validity, fileno(f));
+   stat_validity_update(&packed_refs->validity, fd);
+
+   if (fstat(fd, &st) < 0)
+   die_errno("couldn't stat %s", refs->path);
+
+   size = xsize_t(st.st_size);
+   buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+   pos = buf;
+   eof = buf + size;
 
dir = get_ref_dir(packed_refs->cache->root);
-   while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+   while (pos < eof) {
struct object_id oid;
const char *refname;
const char *traits;
 
-   if (!line.len || line.buf[line.len - 1] != '\n')
-   die_unterminated_line(refs->path, line.buf, line.len);
+   eol = memchr(pos, '\n', eof - pos);
+   if (!eol)
+   die_unterminated_line(refs->path, pos, eof - pos);
+
+   strbuf_add(&line, pos, eol + 1 - pos);
 
	if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
if (strstr(traits, " fully-peeled "))
@@ -258,7 +273,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
else if (strstr(traits, " peeled "))
peeled = PEELED_TAGS;
/* perhaps other traits later as well */
-   continue;
+   goto next_line;
}
 
	refname = parse_ref_line(&line, &oid);
@@ -291,11 +306,18 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
} else {
die_invalid_line(refs->path, line.buf, line.len);
}
+
+   next_line:
+   /* The "+ 1" is for the LF character. */
+   pos = eol + 1;
+   strbuf_reset(&line);
}
 
-   fclose(f);
-   strbuf_release(&line);
+   if (munmap(buf, size))
+   die_errno("error ummapping packed-refs file");
+   close(fd);
 
+   strbuf_release(&line);
return packed_refs;
 }
 
-- 
2.14.1



[PATCH 03/20] packed_ref_cache: add a backlink to the associated `packed_ref_store`

2017-09-13 Thread Michael Haggerty
It will prove convenient in upcoming patches.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e411501871..a3d9210cb0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -7,7 +7,15 @@
 #include "../iterator.h"
 #include "../lockfile.h"
 
+struct packed_ref_store;
+
 struct packed_ref_cache {
+   /*
+* A back-pointer to the packed_ref_store with which this
+* cache is associated:
+*/
+   struct packed_ref_store *refs;
+
struct ref_cache *cache;
 
/*
@@ -154,7 +162,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 }
 
 /*
- * Read from `packed_refs_file` into a newly-allocated
+ * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
  * have its reference count incremented.
  *
@@ -182,7 +190,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
  *  compatibility with older clients, but we do not require it
  *  (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
+static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
FILE *f;
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
@@ -191,11 +199,12 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
+   packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-   f = fopen(packed_refs_file, "r");
+   f = fopen(refs->path, "r");
if (!f) {
if (errno == ENOENT) {
/*
@@ -205,7 +214,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 */
return packed_refs;
} else {
-   die_errno("couldn't read %s", packed_refs_file);
+   die_errno("couldn't read %s", refs->path);
}
}
 
@@ -218,7 +227,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
const char *traits;
 
if (!line.len || line.buf[line.len - 1] != '\n')
-   die("unterminated line in %s: %s", packed_refs_file, line.buf);
+   die("unterminated line in %s: %s", refs->path, line.buf);
 
if (skip_prefix(line.buf, "# pack-refs with:", )) {
if (strstr(traits, " fully-peeled "))
@@ -258,7 +267,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
last->flag |= REF_KNOWS_PEELED;
} else {
strbuf_setlen(, line.len - 1);
-   die("unexpected line in %s: %s", packed_refs_file, line.buf);
+   die("unexpected line in %s: %s", refs->path, line.buf);
}
}
 
@@ -293,7 +302,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *refs)
validate_packed_ref_cache(refs);
 
if (!refs->cache)
-   refs->cache = read_packed_refs(refs->path);
+   refs->cache = read_packed_refs(refs);
 
return refs->cache;
 }
-- 
2.14.1



[PATCH 07/20] read_packed_refs(): make parsing of the header line more robust

2017-09-13 Thread Michael Haggerty
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).

So instead, use `string_list_split_in_place()` to split the traits
into tokens then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing
space (or indeed, if it is missing the space after the colon, or if it
has multiple spaces somewhere).

However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.
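The difference between the two parsing strategies can be sketched outside of Git. The helper below is illustrative only (not the patch's code): it does a token-wise membership test, roughly what `string_list_split_in_place()` plus `unsorted_string_list_has_string()` achieve, and does not care about surrounding spaces.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the space-separated trait list `traits` contains the
 * token `want`, comparing whole tokens rather than raw substrings. */
static int has_trait(const char *traits, const char *want)
{
	size_t n = strlen(want);
	const char *p = traits;

	while (*p) {
		while (*p == ' ')
			p++; /* skip separators */
		if (!strncmp(p, want, n) && (p[n] == ' ' || p[n] == '\0'))
			return 1; /* exact token match */
		while (*p && *p != ' ')
			p++; /* skip the non-matching token */
	}
	return 0;
}
```

By contrast, the old `strstr(traits, " fully-peeled ")` style test silently fails on a header whose last trait is not followed by a space, which is exactly the fragility the patch removes.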

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 141f02b9c8..a45e3ff92f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -257,25 +257,30 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
/* If the file has a header line, process it: */
if (pos < eof && *pos == '#') {
-   const char *traits;
+   char *p;
+   struct string_list traits = STRING_LIST_INIT_NODUP;
 
eol = memchr(pos, '\n', eof - pos);
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(&line, pos, eol + 1 - pos);
+   strbuf_add(&line, pos, eol - pos);
 
-   if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+   if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
die_invalid_line(refs->path, pos, eof - pos);
 
-   if (strstr(traits, " fully-peeled "))
+   string_list_split_in_place(&traits, p, ' ', -1);
+
+   if (unsorted_string_list_has_string(&traits, "fully-peeled"))
peeled = PEELED_FULLY;
-   else if (strstr(traits, " peeled "))
+   else if (unsorted_string_list_has_string(&traits, "peeled"))
peeled = PEELED_TAGS;
/* perhaps other traits later as well */
 
/* The "+ 1" is for the LF character. */
pos = eol + 1;
+
+   string_list_clear(&traits, 0);
	strbuf_reset(&line);
}
 
@@ -610,7 +615,11 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 
 /*
  * The packed-refs header line that we write out.  Perhaps other
- * traits will be added later.  The trailing space is required.
+ * traits will be added later.
+ *
+ * Note that earlier versions of Git used to parse these traits by
+ * looking for " trait " in the line. For this reason, the space after
+ * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
"# pack-refs with: peeled fully-peeled \n";
-- 
2.14.1



[PATCH 06/20] read_packed_refs(): only check for a header at the top of the file

2017-09-13 Thread Michael Haggerty
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 154abbd83a..141f02b9c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -255,11 +255,34 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
pos = buf;
eof = buf + size;
 
+   /* If the file has a header line, process it: */
+   if (pos < eof && *pos == '#') {
+   const char *traits;
+
+   eol = memchr(pos, '\n', eof - pos);
+   if (!eol)
+   die_unterminated_line(refs->path, pos, eof - pos);
+
+   strbuf_add(&line, pos, eol + 1 - pos);
+
+   if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+   die_invalid_line(refs->path, pos, eof - pos);
+
+   if (strstr(traits, " fully-peeled "))
+   peeled = PEELED_FULLY;
+   else if (strstr(traits, " peeled "))
+   peeled = PEELED_TAGS;
+   /* perhaps other traits later as well */
+
+   /* The "+ 1" is for the LF character. */
+   pos = eol + 1;
+   strbuf_reset(&line);
+   }
+
dir = get_ref_dir(packed_refs->cache->root);
while (pos < eof) {
struct object_id oid;
const char *refname;
-   const char *traits;
 
eol = memchr(pos, '\n', eof - pos);
if (!eol)
@@ -267,15 +290,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
	strbuf_add(&line, pos, eol + 1 - pos);
 
-   if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
-   if (strstr(traits, " fully-peeled "))
-   peeled = PEELED_FULLY;
-   else if (strstr(traits, " peeled "))
-   peeled = PEELED_TAGS;
-   /* perhaps other traits later as well */
-   goto next_line;
-   }
-
refname = parse_ref_line(, );
if (refname) {
int flag = REF_ISPACKED;
@@ -307,7 +321,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
die_invalid_line(refs->path, line.buf, line.len);
}
 
-   next_line:
/* The "+ 1" is for the LF character. */
pos = eol + 1;
	strbuf_reset(&line);
-- 
2.14.1



[PATCH 11/20] mmapped_ref_iterator_advance(): no peeled value for broken refs

2017-09-13 Thread Michael Haggerty
If a reference is broken, suppress its peeled value.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 312116a99d..724c88631d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
/*
 * Regardless of what the file header said, we
-* definitely know the value of *this* reference:
+* definitely know the value of *this* reference. But
+* we suppress it if the reference is broken:
 */
-   iter->base.flags |= REF_KNOWS_PEELED;
+   if ((iter->base.flags & REF_ISBROKEN)) {
+   oidclr(&iter->peeled);
+   iter->base.flags &= ~REF_KNOWS_PEELED;
+   } else {
+   iter->base.flags |= REF_KNOWS_PEELED;
+   }
} else {
	oidclr(&iter->peeled);
}
-- 
2.14.1



[PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered

2017-09-13 Thread Michael Haggerty
References are iterated over in order by refname, but reflogs are not.
Some consumers of reference iteration care about the difference. Teach
each `ref_iterator` to keep track of whether its output is ordered.

`overlay_ref_iterator` is one of the picky consumers. Add a sanity
check in `overlay_ref_iterator_begin()` to verify that its inputs are
ordered.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c|  4 
 refs/files-backend.c  | 16 +---
 refs/iterator.c   | 15 ++-
 refs/packed-backend.c |  2 +-
 refs/ref-cache.c  |  2 +-
 refs/ref-cache.h  |  3 ++-
 refs/refs-internal.h  | 23 +++
 7 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/refs.c b/refs.c
index b0106b8162..101c107ee8 100644
--- a/refs.c
+++ b/refs.c
@@ -1309,6 +1309,10 @@ struct ref_iterator *refs_ref_iterator_begin(
if (trim)
iter = prefix_ref_iterator_begin(iter, "", trim);
 
+   /* Sanity check for subclasses: */
+   if (!iter->ordered)
+   BUG("reference iterator is not ordered");
+
return iter;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 961424a4ea..35648c89fc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -762,7 +762,7 @@ static struct ref_iterator *files_ref_iterator_begin(
const char *prefix, unsigned int flags)
 {
struct files_ref_store *refs;
-   struct ref_iterator *loose_iter, *packed_iter;
+   struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
struct files_ref_iterator *iter;
struct ref_iterator *ref_iterator;
unsigned int required_flags = REF_STORE_READ;
@@ -772,10 +772,6 @@ static struct ref_iterator *files_ref_iterator_begin(
 
refs = files_downcast(ref_store, required_flags, "ref_iterator_begin");
 
-   iter = xcalloc(1, sizeof(*iter));
-   ref_iterator = &iter->base;
-   base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable);
-
/*
 * We must make sure that all loose refs are read before
 * accessing the packed-refs file; this avoids a race
@@ -811,7 +807,13 @@ static struct ref_iterator *files_ref_iterator_begin(
refs->packed_ref_store, prefix, 0,
DO_FOR_EACH_INCLUDE_BROKEN);
 
-   iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter);
+   overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
+
+   iter = xcalloc(1, sizeof(*iter));
+   ref_iterator = &iter->base;
+   base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable,
+  overlay_iter->ordered);
+   iter->iter0 = overlay_iter;
iter->flags = flags;
 
return ref_iterator;
@@ -2084,7 +2086,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_store)
	struct ref_iterator *ref_iterator = &iter->base;
struct strbuf sb = STRBUF_INIT;
 
-   base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
	files_reflog_path(refs, &sb, NULL);
iter->dir_iterator = dir_iterator_begin(sb.buf);
iter->ref_store = ref_store;
diff --git a/refs/iterator.c b/refs/iterator.c
index 4cf449ef66..c475360f0a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -25,9 +25,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
-   struct ref_iterator_vtable *vtable)
+   struct ref_iterator_vtable *vtable,
+   int ordered)
 {
iter->vtable = vtable;
+   iter->ordered = !!ordered;
iter->refname = NULL;
iter->oid = NULL;
iter->flags = 0;
@@ -72,7 +74,7 @@ struct ref_iterator *empty_ref_iterator_begin(void)
struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
	struct ref_iterator *ref_iterator = &iter->base;
 
-   base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1);
return ref_iterator;
 }
 
@@ -205,6 +207,7 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
+   int ordered,
struct ref_iterator *iter0, struct ref_iterator *iter1,
ref_iterator_select_fn *select, void *cb_data)
 {
@@ -219,7 +222,7 @@ struct ref_iterator *merge_ref_iterator_begin(
 * references through only if they exist in both iterators.
 */
 
-   base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered);
iter->iter0 = iter0;
iter->iter1 = iter1;
	iter->select = select;

[PATCH 02/20] prefix_ref_iterator: break when we leave the prefix

2017-09-13 Thread Michael Haggerty
From: Jeff King <p...@peff.net>

If the underlying iterator is ordered, then `prefix_ref_iterator` can
stop as soon as it sees a refname that comes after the prefix. This
will rarely make a big difference now, because `ref_cache_iterator`
only iterates over the directory containing the prefix (and usually
the prefix will span a whole directory anyway). But if *hint, hint* a
future reference backend doesn't itself know where to stop the
iteration, then this optimization will be a big win.

Note that there is no guarantee that the underlying iterator doesn't
include output preceding the prefix, so we have to skip over any
unwanted references before we get to the ones that we want.

Signed-off-by: Jeff King <p...@peff.net>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
Note that the implementation of `compare_prefix()` here is a bit
different than Peff's original version.

 refs/iterator.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index c475360f0a..bd35da4e62 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -287,6 +287,20 @@ struct prefix_ref_iterator {
int trim;
 };
 
+/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */
+static int compare_prefix(const char *refname, const char *prefix)
+{
+   while (*prefix) {
+   if (*refname != *prefix)
+   return ((unsigned char)*refname < (unsigned char)*prefix) ? -1 : +1;
+
+   refname++;
+   prefix++;
+   }
+
+   return 0;
+}
+
 static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 {
struct prefix_ref_iterator *iter =
@@ -294,9 +308,25 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
int ok;
 
while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
-   if (!starts_with(iter->iter0->refname, iter->prefix))
+   int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
+
+   if (cmp < 0)
continue;
 
+   if (cmp > 0) {
+   /*
+* If the source iterator is ordered, then we
+* can stop the iteration as soon as we see a
+* refname that comes after the prefix:
+*/
+   if (iter->iter0->ordered) {
+   ok = ref_iterator_abort(iter->iter0);
+   break;
+   } else {
+   continue;
+   }
+   }
+
if (iter->trim) {
/*
 * It is nonsense to trim off characters that
-- 
2.14.1



[PATCH 00/20] Read `packed-refs` using mmap()

2017-09-13 Thread Michael Haggerty
Previously, any time we wanted to read even a single reference from
the `packed-refs` file, we parsed the whole file and stored it in an
elaborate structure in memory called a `ref_cache`. Subsequent
reference lookups or iterations over some or all of the references
could be done by reading from the `ref_cache`.

But for large `packed-refs` files, the time needed to parse the file,
and the memory needed to cache its contents, can be quite significant.
This is partly because building the cache costs lots of little memory
allocations. (And lest you think that most Git commands can be
executed by reading at most a couple of loose references, remember
that almost any command that reads objects has to look for replace
references (unless they are turned off in the config), and *that*
necessarily entails reading packed references.)

Following lots of work to extract the `packed_ref_store` into a
separate module and decouple it from the `files_ref_store`, it is now
possible to fundamentally change how the `packed-refs` file is read.

* `mmap()` the whole file rather than `read()`ing it.

* Instead of parsing the complete file at once into a `ref_cache`,
  parse the references out of the file contents on demand.

* Use a binary search to find, very quickly, the reference or group of
  references that needs to be read. Parse *only* those references out
  of the file contents, without creating in-memory data structures at
  all.

In rare cases this change might force parts of the `packed-refs` file
to be read multiple times, but that cost is far outweighed by the fact
that usually most of the `packed-refs` file doesn't have to be read
*at all*.

Note that the binary search optimization requires the `packed-refs`
file to be sorted by reference name. We have always written them
sorted, but just in case there are clients that don't, we assume the
file is unsorted unless its header lists a `sorted` trait. From now
on, we write the file with that trait. If the file is not sorted, it
is sorted on the fly in memory.

For a repository with only a couple thousand references and a warm
disk cache, this change doesn't make a very significant difference.
But for repositories with very large numbers of references, the
differences start to be significant:

A repository with 54k references (warm cache):

                                     git 2.13.1   this branch
git for-each-ref                         464 ms        452 ms
git for-each-ref (no output)              66 ms         47 ms
git for-each-ref (0.6% of refs)           47 ms          9 ms
git for-each-ref (0.6%, no output)        41 ms          2 ms
git rev-parse                             32 ms          2 ms

A repository (admittedly insane, but a real-life example) with 60M
references (warm cache):

                                     git 2.13.1   this branch
git for-each-ref (no output)           84000 ms      61000 ms
git rev-parse                              4 ms          2 ms

This branch applies on top of mh/packed-ref-transactions. It can also
be obtained from my git fork [1] as branch `mmap-packed-refs`.

Michael

[1] https://github.com/mhagger/git

Jeff King (1):
  prefix_ref_iterator: break when we leave the prefix

Michael Haggerty (19):
  ref_iterator: keep track of whether the iterator output is ordered
  packed_ref_cache: add a backlink to the associated `packed_ref_store`
  die_unterminated_line(), die_invalid_line(): new functions
  read_packed_refs(): use mmap to read the `packed-refs` file
  read_packed_refs(): only check for a header at the top of the file
  read_packed_refs(): make parsing of the header line more robust
  read_packed_refs(): read references with minimal copying
  packed_ref_cache: remember the file-wide peeling state
  mmapped_ref_iterator: add iterator over a packed-refs file
  mmapped_ref_iterator_advance(): no peeled value for broken refs
  packed_ref_cache: keep the `packed-refs` file open and mmapped
  read_packed_refs(): ensure that references are ordered when read
  packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
  packed_read_raw_ref(): read the reference from the mmapped buffer
  ref_store: implement `refs_peel_ref()` generically
  packed_ref_store: get rid of the `ref_cache` entirely
  ref_cache: remove support for storing peeled values
  mmapped_ref_iterator: inline into `packed_ref_iterator`
  packed-backend.c: rename a bunch of things and update comments

 refs.c|  22 +-
 refs/files-backend.c  |  54 +--
 refs/iterator.c   |  47 ++-
 refs/packed-backend.c | 896 +-
 refs/ref-cache.c  |  44 +--
 refs/ref-cache.h  |  35 +-
 refs/refs-internal.h  |  26 +-
 7 files changed, 761 insertions(+), 363 deletions(-)

-- 
2.14.1



Re: Unexpected pass for t6120-describe.sh on cygwin

2017-09-13 Thread Michael J Gruber
Johannes Schindelin venit, vidit, dixit 12.09.2017 15:39:
> Hi Ramsay,
> 
> On Sat, 9 Sep 2017, Ramsay Jones wrote:
> 
>> I ran the test-suite on the 'pu' branch last night (simply because that
>> was what I had built at the time!), which resulted in a PASS, but t6120
>> was showing a 'TODO passed' for #52.
>>
>> This is a test introduced by Michael's 'mg/name-rev-tests-with-short-stack'
>> branch, which uses 'ulimit -s' to try and force a stack overflow.
>> Unfortunately, 'ulimit -s' seems to have no effect on cygwin. I created
>> a test program (see below) to eat up the stack and tried running it with
>> various ulimit values (128, 12, 8), but it always seg-faulted at the
>> same stack-frame. (after using approx 2MB stack space).
>>
>> So, it looks like all ULIMIT_STACK_SIZE tests need to be disabled
>> on cygwin. I also wonder about the ULIMIT_FILE_DESCRIPTORS tests, but
>> haven't looked into it.
>>
>> Given that 'ulimit' is a bash built-in, this may also be a problem on
>> MinGW and Git-For-Windows, but I can't test on those platforms.
> 
> It is.

Thanks.

I just dug something up from an old cygwin list post. Could you please
try and report on the following (cygwin, MinGW):

ulimit -s
ulimit -s 4096 # anything lower than the value from above
ulimit -s
bash -c "ulimit -s"

My suspicion from that post is that ulimit affects the sub shell only -
if yes, running a test inside the sub shell to confirm whether the
setting is in effect would be great, of course. /usr/bin/echo $(seq
10) or such does the job on Linux (with stack limit 128), but I
don't know whether this is portably reliable.

If ulimit on these platforms has no effect but does not lie about the
value either, it's a simple prerequisite test (test ulimit present, if
yes set and get and compare the stack size values).

Michael


Re: [PATCH 4/4] packed refs: pass .git dir instead of packed-refs path to init_fn

2017-09-13 Thread Michael Haggerty
On 09/12/2017 07:32 PM, Jonathan Nieder wrote:
> From: Stefan Beller <sbel...@google.com>
> 
> The first argument of a ref_store_init_fn is documented to represent
> the $GIT_DIR, not the path to the packed-refs file. This brings the
> packed-refs store more in line with the usual ref store interface.
> 
> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
> Signed-off-by: Stefan Beller <sbel...@google.com>
> ---
> That's the end of the series.  Thanks for reading.
> 
>  refs/files-backend.c  | 4 ++--
>  refs/packed-backend.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index fccbc24ac4..3b8e13a8b7 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -57,8 +57,8 @@ static struct ref_store *files_ref_store_create(const char 
> *gitdir,
>   refs->gitdir = xstrdup(gitdir);
>   get_common_dir_noenv(&sb, gitdir);
>   refs->gitcommondir = strbuf_detach(&sb, NULL);
> - strbuf_addf(&sb, "%s/packed-refs", refs->gitcommondir);
> - refs->packed_ref_store = packed_ref_store_create(sb.buf, flags);
> + refs->packed_ref_store =
> + packed_ref_store_create(refs->gitcommondir, flags);
>   strbuf_release(&sb);
>  
>   return ref_store;
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 412c85034f..2c5db15279 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -78,7 +78,7 @@ struct packed_ref_store {
>   struct tempfile tempfile;
>  };
>  
> -struct ref_store *packed_ref_store_create(const char *path,
> +struct ref_store *packed_ref_store_create(const char *common_dir,
> unsigned int store_flags)
>  {
>   struct packed_ref_store *refs = xcalloc(1, sizeof(*refs));
> @@ -87,7 +87,7 @@ struct ref_store *packed_ref_store_create(const char *path,
>   base_ref_store_init(ref_store, &refs_be_packed);
>   refs->store_flags = store_flags;
>  
> - refs->path = xstrdup(path);
> + refs->path = xstrfmt("%s/packed-refs", common_dir);
>   return ref_store;
>  }
>  
> 

This little patch, perhaps surprisingly, brings up some deeper issues.

First of all there is a superficial naming inconsistency. This method is
called `init` in the vtable, but the functions implementing it are
called `*_ref_store_create()`. It actually initializes the data
structures for dealing with the reference store, as opposed to
initializing the reference store on disk (*that* virtual function is
called `init_db`). It should maybe be called `open` instead.

But more fundamentally, what is a `ref_store`? Originally it always
represented a *logical* place to store all of the references for a
repository. In that sense, it made sense to have an "open" method that
takes a `gitdir` as argument.

But its role is currently getting stretched. Now it sometimes represents
a *physical* place to store references, which might be combined with
other physical `ref_store`s to make a logical `ref_store`. There is not
necessarily a 1:1 correspondence between gitdirs and physical
`ref_store`s; more to the point you might want to initialize a physical
refstore that doesn't correspond to `$GITDIR`. So requiring every
`ref_store` to have a factory function with a gitdir argument is
probably incorrect.

For example, a subtree has a single logical reference store, but it is
composed out of three physical reference stores: the loose references in
the subtree's gitdir plus loose and packed references in the common gitdir.

It seems to me that we need two separate concepts:

1. Create an object representing a gitdir's *logical* `ref_store`. This
"class method" could always have a single gitdir as parameter. This
would be the method by which the repository initialization code
instantiates its logical `ref_store`, for example by reading the type of
the reference store from the repo's config, then looking up the class by
its type, then calling its "open" method to create an instance.

2. Create an object representing a physical `ref_store`. Different
reference store classes might have different "constructor" arguments.
For example, if it represents a namespace within a larger reference tree
contained in a shared repository, its arguments might be
`(shared_gitdir, namespace)`. (CC to Richard Maw for this angle.)

Since a `packed_ref_store` is definitely a physical `ref_store`, the
function that we're talking about here is the second type, so I don't
see a need for it to take a gitdir as argument. OTOH, the function
certainly doesn't belong in the vtable slot for `init`, because a
`packed_ref_store` can't serve as a repository's logical reference store.

Did you have a particular reason for changing this now, aside from
"consistency"? If not, feel free to drop this patch and I'll implement
something more like what I have described above when I have time.

Michael


Re: [PATCH 3/4] replace-objects: evaluate replacement refs without using the object store

2017-09-13 Thread Michael Haggerty
On 09/12/2017 07:31 PM, Jonathan Nieder wrote:
> From: Stefan Beller <sbel...@google.com>
> 
> Pass DO_FOR_EACH_INCLUDE_BROKEN when iterating over replacement refs
> so that the iteration does not require opening the named objects from
> the object store. This avoids a dependency cycle between object access
> and replace ref iteration.
> 
> Moreover the ref subsystem has not been migrated yet to access the
> object store via passed in repository objects.  As a result, without
> this patch, iterating over replace refs in a repository other than
> the_repository it produces errors:
> 
>error: refs/replace/3afabef75c627b8943bcae86837abc7c32fe does not 
> point to a valid object!
> 
> Noticed while adapting the object store (and in particular its
> evaluation of replace refs) to handle arbitrary repositories.

Have you checked that consumers of this API can handle broken
references? Aside from missing values, broken references can have
illegal names (though hopefully not unsafe in the sense of causing
filesystem traversal outside of `$GITDIR`).

Michael


Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`

2017-09-12 Thread Michael Haggerty
On Sun, Sep 10, 2017 at 9:39 AM, Jeff King <p...@peff.net> wrote:
> On Sun, Sep 10, 2017 at 06:45:08AM +0200, Michael Haggerty wrote:
>
>> > So nothing to see here, but since I spent 20 minutes scratching my head
>> > (and I know others look at Coverity output and may scratch their heads
>> > too), I thought it was worth writing up. And also if I'm wrong, it would
>> > be good to know. ;)
>>
>> Thanks for looking into this. I agree with your analysis.
>>
>> I wonder whether it is the factor of two between path lengths and byte
>> lengths that is confusing Coverity. Perhaps the patch below would help.
>> It requires an extra, superfluous, check, but perhaps makes the code a
>> tad more readable. I'm neutral on whether we would want to make the change.
>
> Yeah, I do agree that it makes the code's assumptions a bit easier to
> follow.
>
>> Is there a way to ask Coverity whether a hypothetical change would
>> remove the warning, short of merging the change to master?
>
> You can download and run the build portion of the coverity tools
> yourself. [...]

Thanks for the info.

My suggested tweak doesn't appease Coverity. Given that, I don't think
I'll bother adding it to the patch series.

Michael


Re: Unexpected pass for t6120-describe.sh on cygwin

2017-09-10 Thread Michael J Gruber
Ramsay Jones venit, vidit, dixit 09.09.2017 15:13:
> Hi Adam,
> 
> I ran the test-suite on the 'pu' branch last night (simply because
> that was what I had built at the time!), which resulted in a PASS,
> but t6120 was showing a 'TODO passed' for #52.
> 
> This is a test introduced by Michael's 'mg/name-rev-tests-with-short-stack'
> branch, which uses 'ulimit -s' to try and force a stack overflow.
> Unfortunately, 'ulimit -s' seems to have no effect on cygwin. I created
> a test program (see below) to eat up the stack and tried running it
> with various ulimit values (128, 12, 8), but it always seg-faulted
> at the same stack-frame. (after using approx 2MB stack space).
> 
> So, it looks like all ULIMIT_STACK_SIZE tests need to be disabled
> on cygwin. I also wonder about the ULIMIT_FILE_DESCRIPTORS tests,
> but haven't looked into it.
> 
> Given that 'ulimit' is a bash built-in, this may also be a problem
> on MinGW and Git-For-Windows, but I can't test on those platforms.
> 
> Unfortunately, I can't spend more time on git today, hence this
> heads up! ;-)

Thanks for the note. We have this in t/test-lib.sh:

run_with_limited_cmdline () {
(ulimit -s 128 && "$@")
}

test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'

This apparently expects "ulimit -s" to fail on platforms that don't
support it, so set the prereq accordingly. I moved the following to
t/test-lib.sh:

run_with_limited_stack () {
(ulimit -s 128 && "$@")
}

test_lazy_prereq ULIMIT_STACK_SIZE 'run_with_limited_stack true'

Same things as above. Two things to note:
- Those prerequisites could be the same, although they are used in different ways.
- "ulimit -s" returning success without doing anything means that, all
along, we ran the existing tests when we didn't mean to (on Win), and
they succeeded for the wrong reason, which we did not notice.

So, I guess, short of testing the effect of "ulimit -s" with another
expensive test, it's best to simply set these prerequisites based on
"uname -s".


> ATB,
> Ramsay Jones
> 
> -- >8 --
> diff --git a/test.c b/test.c
> new file mode 100644
> index 000..bcbb805
> --- /dev/null
> +++ b/test.c
> @@ -0,0 +1,21 @@
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +
> +void test(uint64_t count)
> +{
> + int i, junk[1024];
> +
> + for (i = 0; i < 1024; i++)
> + junk[i] = count;
> + i = junk[count % 1024];
> + printf("%" PRIuMAX "\n", (uintmax_t)count);
> + fflush(stdout);
> + test(count + 1);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + test(0);
> + return 0;
> +}
> 



Re: [PATCH v2 08/11] t1404: demonstrate two problems with reference transactions

2017-09-09 Thread Michael Haggerty
On 09/09/2017 01:17 PM, Jeff King wrote:
> On Fri, Sep 08, 2017 at 03:51:50PM +0200, Michael Haggerty wrote:
> [...]
> So if we get what we want, we execute ":" which should be a successful
> exit code.

I think the `:` is superfluous even if we care about the exit code of
the `case`. I'll remove it.

>> +undefined)
>> +# This is not correct; it means the deletion has happened
>> +# already even though update-ref should not have been
>> +# able to acquire the lock yet.
>> +echo "$prefix/foo deleted prematurely" &&
>> +break
>> +;;
> 
> But if we don't, we hit a "break". But we're not in a loop, so the break
> does nothing. Is the intent to give a false value to the switch so that
> we fail the &&-chain? If so, I'd think "false" would be the right thing
> to use. It's more to the point, and from a few limited tests, it looks
> like "break" will return "0" even outside a loop (bash writes a
> complaint to stderr, but dash doesn't).
> 
> Or did you just forget that you're not writing C and that ";;" is the
> correct way to spell "break" here? :)

An earlier version of the patch used a loop and needed the `break`. But
when I removed the loop, I probably didn't notice the now-unneeded
breaks because of what you said. I'll take them out.

>> [...]
>> +esac >out &&
>> [...]
>> +test_must_be_empty out &&
> 
> The return value of "break" _doesn't_ matter, because you end up using
> the presence of the error message.
> 
> I think we could write this as just:
> 
>   case "$sha1" in
>   $D)
>   # good
>   ;;
>   undefined)
> echo >&2 this is bad
>   false
>   ;;
>   esac &&
> 
> I'm OK with it either way (testing the exit code or testing the output),
> but either way the "break" calls are doing nothing and can be dropped, I
> think.

Yes, using the exit code to decide success is simpler. I'll make that
change, too.

Thanks for your comments.

Michael


Re: [PATCH 00/12] Clean up notes-related code around `load_subtree()`

2017-09-09 Thread Michael Haggerty
On 09/09/2017 12:31 PM, Jeff King wrote:
> On Sat, Aug 26, 2017 at 10:28:00AM +0200, Michael Haggerty wrote:
> 
>> It turns out that the comment is incorrect, but there was nevertheless
>> plenty that could be cleaned up in the area:
>>
>> * Make macro `GIT_NIBBLE` safer by adding some parentheses
>> * Remove some dead code
>> * Fix some memory leaks
>> * Fix some obsolete and incorrect comments
>> * Reject "notes" that are not blobs
>>
>> I hope the result is also easier to understand.
>>
>> This branch is also available from my Git fork [1] as branch
>> `load-subtree-cleanup`.
> 
> FYI, Coverity seems to complain about "pu" after this series is merged, but
> I think it's wrong.  It says:
> 
>   *** CID 1417630:  Memory - illegal accesses  (OVERRUN)
>   /notes.c: 458 in load_subtree()
>   452 
>   453 /*
>   454  * Pad the rest of the SHA-1 with zeros,
>   455  * except for the last byte, where we write
>   456  * the length:
>   457  */
>   >>> CID 1417630:  Memory - illegal accesses  (OVERRUN)
>   >>> Overrunning array of 20 bytes at byte offset 20 by dereferencing 
> pointer "_oid.hash[len]".
>   458 memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ 
> - len - 1);
>   459 object_oid.hash[KEY_INDEX] = (unsigned char)len;
>   460 
>   461 type = PTR_TYPE_SUBTREE;
>   462 } else {
>   463 /* This can't be part of a note */
> 
> I agree that if "len" were 20 here that would be a problem, but I don't
> think that's possible.
> 
> The tool correctly claims that prefix_len can be up to 19, due to the
> assert:
> 
>  3. cond_at_most: Checking prefix_len >= 20UL implies that prefix_len may 
> be up to 19 on the false branch.
>   420if (prefix_len >= GIT_SHA1_RAWSZ)
>   421BUG("prefix_len (%"PRIuMAX") is out of range", 
> (uintmax_t)prefix_len);
> 
> Then it claims:
> 
> 13. Condition path_len == 2 * (20 - prefix_len), taking false branch.
>   430if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
>   431/* This is potentially the remainder of the 
> SHA-1 */
> 
> So we know that either prefix_len is not 19, or that path_len is not 2
> (since that combination would cause us to take the true branch here).
> But then it goes on to say:
> 
> 14. Condition path_len == 2, taking true branch.
>   442} else if (path_len == 2) {
>   443/* This is potentially an internal node */
> 
> which I believe must mean that prefix_len cannot be 19 here. And yet it
> says:
> 
> 15. assignment: Assigning: len = prefix_len. The value of len may now be 
> up to 19.
>   444size_t len = prefix_len;
>   445
>   [...]
>  17. incr: Incrementing len. The value of len may now be up to 20.
>  18. Condition hex_to_bytes(_oid.hash[len++], entry.path, 1), 
> taking false branch.
>   450if (hex_to_bytes(object_oid.hash + len++, 
> entry.path, 1))
>   451goto handle_non_note; /* entry.path is 
> not a SHA1 */
> 
> I think that's impossible, and Coverity simply isn't smart enough to
> shrink the set of possible values for prefix_len based on the set of
> if-else conditions.
> 
> So nothing to see here, but since I spent 20 minutes scratching my head
> (and I know others look at Coverity output and may scratch their heads
> too), I thought it was worth writing up. And also if I'm wrong, it would
> be good to know. ;)

Thanks for looking into this. I agree with your analysis.

I wonder whether it is the factor of two between path lengths and byte
lengths that is confusing Coverity. Perhaps the patch below would help.
It requires an extra, superfluous, check, but perhaps makes the code a
tad more readable. I'm neutral on whether we would want to make the change.

Is there a way to ask Coverity whether a hypothetical change would
remove the warning, short of merging the change to master?

Michael

diff --git a/notes.c b/notes.c
index 27d232f294..34f623f7b1 100644
--- a/notes.c
+++ b/notes.c
@@ -426,8 +426,14 @@ static void load_subtree(struct notes_tree *t,
struct leaf_node *subtree,
unsigned char type;
struct leaf_node *l;
size_t path_len = strlen(entry.path);
+   size_t path_bytes;

-   if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
+

Re: [PATCH v4 00/16] Fix git-gc losing objects in multi worktree

2017-09-09 Thread Michael Haggerty
On 08/23/2017 02:36 PM, Nguyễn Thái Ngọc Duy wrote:
> "git gc" when used in multiple worktrees ignore some per-worktree
> references: object references in the index, HEAD and reflog. This
> series fixes it by making the revision walker include these from all
> worktrees by default (and the series is basically split in three parts
> in the same order). There's a couple more cleanups in refs.c. Luckily
> it does not conflict with anything in 'pu'.
> 
> Compared to v3 [1], the largest change is supporting multi worktree in
> the reflog iterator. The merge iterator is now used (Michael was right
> all along).

I read over all of the refs-related changes in this patch series. Aside
from the comments that I left, they look good to me. Thanks!

Michael


Re: [PATCH v4 16/16] refs.c: reindent get_submodule_ref_store()

2017-09-09 Thread Michael Haggerty
On 08/23/2017 02:37 PM, Nguyễn Thái Ngọc Duy wrote:
> With the new "if (!submodule) return NULL;" code added in the previous
> commit, we don't need to check if submodule is not NULL anymore.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> ---
>  refs.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index a0c5078901..206af61d62 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1590,13 +1590,11 @@ struct ref_store *get_submodule_ref_store(const char 
> *submodule)
>   if (!submodule)
>   return NULL;
>  
> - if (submodule) {
> - len = strlen(submodule);
> - while (len && is_dir_sep(submodule[len - 1]))
> - len--;
> - if (!len)
> - return NULL;
> - }
> + len = strlen(submodule);
> + while (len && is_dir_sep(submodule[len - 1]))
> + len--;
> + if (!len)
> + return NULL;
>  
>   if (submodule[len])
>   /* We need to strip off one or more trailing slashes */
> 

Now I'm confused. Is it still allowed to call
`get_submodule_ref_store(NULL)`? If not, then the previous commit should
probably do

if (!submodule)
BUG(...);

Either way, it seems like it would be natural to combine this commit
with the previous one.

Michael


Re: [PATCH v4 15/16] refs.c: remove fallback-to-main-store code get_submodule_ref_store()

2017-09-09 Thread Michael Haggerty
On 08/23/2017 02:37 PM, Nguyễn Thái Ngọc Duy wrote:
> At this state, there are three get_submodule_ref_store() callers:
> 
>  - for_each_remote_ref_submodule()
>  - handle_revision_pseudo_opt()
>  - resolve_gitlink_ref()
> 
> The first two deal explicitly with submodules (and we should never fall
> back to the main ref store as a result). They are only called from
> submodule.c:
> 
>  - find_first_merges()
>  - submodule_needs_pushing()
>  - push_submodule()
> 
> The last one, as its name implies, deals only with submodules too, and
> the "submodule" (path) argument must be a non-NULL, non-empty string.
> 
> So, this "if NULL or empty string" code block should never ever
> trigger. And it's wrong to fall back to the main ref store
> anyway. Delete it.

Nice! Thanks for the cleanup.

Michael


Re: [PATCH v4 12/16] files-backend: make reflog iterator go through per-worktree reflog

2017-09-09 Thread Michael Haggerty
On 08/23/2017 02:37 PM, Nguyễn Thái Ngọc Duy wrote:
> refs/bisect is unfortunately per-worktree, so we need to look in
> per-worktree logs/refs/bisect in addition to per-repo logs/refs. The
> current iterator only goes through per-repo logs/refs.
> 
> Use merge iterator to walk two ref stores at the same time and pick
> per-worktree refs from the right iterator.
> 
> PS. Note the unsorted order of for_each_reflog in the test. This is
> supposed to be OK, for now. If we enforce order on for_each_reflog()
> then some more work will be required.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> ---
>  refs/files-backend.c  | 59 
> +--
>  t/t1407-worktree-ref-store.sh | 30 ++
>  2 files changed, 75 insertions(+), 14 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5cca55510b..d4d22882ef 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> [...]
> +static enum iterator_selection reflog_iterator_select(
> + struct ref_iterator *iter_worktree,
> + struct ref_iterator *iter_common,
> + void *cb_data)
> +{
> + if (iter_worktree) {
> + /*
> +  * We're a bit loose here. We probably should ignore
> +  * common refs if they are accidentally added as
> +  * per-worktree refs.
> +  */
> + return ITER_SELECT_0;

I don't understand the point of the comment. If we should ignore common
refs here, why not do it rather than commenting about it? Wouldn't it be
really easy to implement? OTOH if it's not needed, then why the comment?

> [...]

Michael


Re: [PATCH v4 11/16] revision.c: --all adds HEAD from all worktrees

2017-09-09 Thread Michael Haggerty
On 08/23/2017 02:36 PM, Nguyễn Thái Ngọc Duy wrote:
> [...]
> diff --git a/revision.c b/revision.c
> index 8d04516266..0e98444857 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2133,6 +2133,14 @@ static int handle_revision_pseudo_opt(const char 
> *submodule,
>   int argcount;
>  
>   if (submodule) {
> + /*
> +  * We need something like get_submodule_worktrees()
> +  * before we can go through all worktrees of a submodule,
> +  * .e.g with adding all HEADs from --all, which is not
> +  * supported right now, so stick to single worktree.
> +  */
> + if (!revs->single_worktree)
> + die("BUG: --single-worktree cannot be used together 
> with submodule");
>   refs = get_submodule_ref_store(submodule);
>   } else
>   refs = get_main_ref_store();

Tiny nit: s/.e.g/e.g./

> [...]

Michael


Re: [PATCH v4 10/16] refs: remove dead for_each_*_submodule()

2017-09-08 Thread Michael Haggerty
On 08/23/2017 02:36 PM, Nguyễn Thái Ngọc Duy wrote:
> These are used in revision.c. After the last patch they are replaced
> with the refs_ version. Delete them.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> ---
>  Documentation/technical/api-ref-iteration.txt |  7 ++
>  refs.c| 33 
> ---
>  refs.h| 10 
>  3 files changed, 2 insertions(+), 48 deletions(-)

What a lovely diffstat :-)

Michael


Re: [PATCH v4 06/16] refs: move submodule slash stripping code to get_submodule_ref_store

2017-09-08 Thread Michael Haggerty
On 08/23/2017 02:36 PM, Nguyễn Thái Ngọc Duy wrote:
> This is a better place that will benefit all submodule callers instead
> of just resolve_gitlink_ref()

This is a nice sentiment, but I have to wonder whether we should rather
be requiring callers to use the "canonical" submodule name rather than
covering up their sloppiness for them (perhaps multiple times?). I
vaguely remember intentionally limiting the number of functions that do
this canonicalization, and having in mind the goal that the number
should become smaller over time, not larger.

For example, there could be some kind of `canonicalize_submodule_name()`
function that callers can use to clean up whatever submodule string they
have in hand, then let them use the cleaned up value from then on.

I don't really know much about the callers, though, so that is more of a
lazy philosophical speculation than a concrete suggestion.

Michael


[PATCH] load_subtree(): check that `prefix_len` is in the expected range

2017-09-08 Thread Michael Haggerty
This value, which is stashed in the last byte of an object_id hash,
gets handed around a lot. So add a sanity check before using it in
`load_subtree()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
This patch is an addendum to v1 of the mh/notes-cleanup patch series
[1]. It adds the assertion that was suggested by Junio [2].

Since the first patch series is already in next, this patch is
constructed to apply on top of that branch.

Thanks to Junio and Johan for their review of v1.

Michael

[1] https://public-inbox.org/git/cover.1503734566.git.mhag...@alum.mit.edu/
[2] https://public-inbox.org/git/xmqqh8wuqo6e@gitster.mtv.corp.google.com/

 notes.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/notes.c b/notes.c
index 40d9ba6252..27d232f294 100644
--- a/notes.c
+++ b/notes.c
@@ -417,7 +417,10 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 oid_to_hex(&subtree->val_oid));
 
prefix_len = subtree->key_oid.hash[KEY_INDEX];
-   assert(prefix_len * 2 >= n);
+   if (prefix_len >= GIT_SHA1_RAWSZ)
+   BUG("prefix_len (%"PRIuMAX") is out of range", (uintmax_t)prefix_len);
+   if (prefix_len * 2 < n)
+   BUG("prefix_len (%"PRIuMAX") is too small", (uintmax_t)prefix_len);
memcpy(object_oid.hash, subtree->key_oid.hash, prefix_len);
while (tree_entry(&desc, &entry)) {
unsigned char type;
-- 
2.14.1



Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-08 Thread Michael Haggerty
On 09/08/2017 02:46 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> I did just realize one thing: `ref_transaction_update()` takes `flags`
>> as an argument and alters it using
>>
>>> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
>>
>> Perhaps gcc is *more* intelligent than we give it credit for, and is
>> actually worried that the `flags` argument passed in by the caller
>> might *already* have one of these bits set. In that case
>> `ref_transaction_add_update()` would indeed be called incorrectly.
>> Does the warning go away if you change that line to
>>
>>> if (new_sha1)
>>> flags |= REF_HAVE_NEW;
>>> else
>>> flags &= ~REF_HAVE_NEW;
>>> if (old_sha1)
>>> flags |= REF_HAVE_OLD;
>>> else
>>> flags &= ~REF_HAVE_OLD;
>>
>> ? This might be a nice change to have anyway, to isolate
>> `ref_transaction_update()` from mistakes by its callers.
> 
> I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not
> the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the
> NEW/OLD flag exist exactly because some callers pass the address of
> an embedded oid.hash[] or null_sha1, instead of NULL, when one side 
> does not exist?  So new|old being NULL is a definite signal that we
> need to drop HAVE_NEW|OLD, but the reverse may not be true, no?  Is
> it OK to overwrite null_sha1[] that is passed from some codepaths?
> 
> ref_transaction_create and _delete pass null_sha1 on the missing
> side, while ref_transaction_verify passes NULL, while calling
> _update().  Should this distinction affect how _add_update() gets
> called?

There are two functions under discussion:

* `ref_transaction_add_update()` is the low-level, private function that
uses the `HAVE_{NEW,OLD}` bits to decide what to do.

* `ref_transaction_update()` (like
`ref_transaction_{create,delete,verify}()`) are public functions that
ignore the `HAVE_{NEW,OLD}` bits and base their behavior on whether
`new_sha1` and `old_sha1` are NULL.

Each of these functions has to support three possibilities for its SHA-1
arguments:

1. The SHA-1 is provided and not `null_sha1`—in this case it must match
the old value (if `old_sha1`) or it is the value to be set as the new
value (if `new_sha1`).

2. The SHA-1 is provided and is equal to `null_sha1`—in this case the
reference must not already exist (if `old_sha1` is `null_sha1`) or it
will be deleted (if `new_sha1` is `null_sha1`).

3. The SHA-1 is not provided at all—in this case the old value is
ignored (if `old_sha1` is not provided) or the reference is left
unchanged (if `new_sha1` is not provided).

Much of the current confusion stems because
`ref_transaction_add_update()` encodes the third condition using the
`REF_HAVE_*` bits, whereas `ref_transaction_update()` and its friends
encode the third condition by setting `old_sha1` or `new_sha1` to `NULL`.

So `ref_transaction_update()` *does* need to set or clear the `HAVE_NEW`
and `HAVE_OLD` bits as I sketched, to impedance-match between the two
conventions.

It's a shame how much time we've wasted discussing this. Maybe the code
is trying to be too clever/efficient and needs a rethink.

Michael


[PATCH v2 07/11] files_initial_transaction_commit(): use a transaction for packed refs

2017-09-08 Thread Michael Haggerty
Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 60031fe3ae..2700e3b5d5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2669,6 +2669,7 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
size_t i;
int ret = 0;
struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+   struct ref_transaction *packed_transaction = NULL;
 
assert(err);
 
@@ -2701,6 +2702,12 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
  &affected_refnames))
die("BUG: initial ref transaction called with existing refs");
 
+   packed_transaction = ref_store_transaction_begin(refs->packed_ref_store, err);
+   if (!packed_transaction) {
+   ret = TRANSACTION_GENERIC_ERROR;
+   goto cleanup;
+   }
+
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
 
@@ -2713,6 +2720,15 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
ret = TRANSACTION_NAME_CONFLICT;
goto cleanup;
}
+
+   /*
+* Add a reference creation for this reference to the
+* packed-refs transaction:
+*/
+   ref_transaction_add_update(packed_transaction, update->refname,
+  update->flags & ~REF_HAVE_OLD,
+  update->new_oid.hash, update->old_oid.hash,
+  NULL);
}
 
if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
@@ -2720,21 +2736,14 @@ static int files_initial_transaction_commit(struct ref_store *ref_store,
goto cleanup;
}
 
-   for (i = 0; i < transaction->nr; i++) {
-   struct ref_update *update = transaction->updates[i];
-
-   if ((update->flags & REF_HAVE_NEW) &&
-   !is_null_oid(&update->new_oid))
-   add_packed_ref(refs->packed_ref_store, update->refname,
-  &update->new_oid);
-   }
-
-   if (commit_packed_refs(refs->packed_ref_store, err)) {
+   if (initial_ref_transaction_commit(packed_transaction, err)) {
ret = TRANSACTION_GENERIC_ERROR;
goto cleanup;
}
 
 cleanup:
+   if (packed_transaction)
+   ref_transaction_free(packed_transaction);
packed_refs_unlock(refs->packed_ref_store);
transaction->state = REF_TRANSACTION_CLOSED;
string_list_clear(&affected_refnames, 0);
-- 
2.14.1



[PATCH v2 11/11] files_transaction_finish(): delete reflogs before references

2017-09-08 Thread Michael Haggerty
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29eb5e826f..961424a4ea 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2636,6 +2636,27 @@ static int files_transaction_finish(struct ref_store *ref_store,
}
}
 
+   /*
+* Now that updates are safely completed, we can perform
+* deletes. First delete the reflogs of any references that
+* will be deleted, since (in the unexpected event of an
+* error) leaving a reference without a reflog is less bad
+* than leaving a reflog without a reference (the latter is a
+* mildly invalid repository state):
+*/
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+   if (update->flags & REF_DELETING &&
+   !(update->flags & REF_LOG_ONLY) &&
+   !(update->flags & REF_ISPRUNING)) {
+   strbuf_reset(&sb);
+   files_reflog_path(refs, &sb, update->refname);
+   if (!unlink_or_warn(sb.buf))
+   try_remove_empty_parents(refs, update->refname,
+REMOVE_EMPTY_PARENTS_REFLOG);
+   }
+   }
+
/*
 * Perform deletes now that updates are safely completed.
 *
@@ -2672,20 +2693,6 @@ static int files_transaction_finish(struct ref_store *ref_store,
}
}
 
-   /* Delete the reflogs of any references that were deleted: */
-   for (i = 0; i < transaction->nr; i++) {
-   struct ref_update *update = transaction->updates[i];
-   if (update->flags & REF_DELETING &&
-   !(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
-   strbuf_reset(&sb);
-   files_reflog_path(refs, &sb, update->refname);
-   if (!unlink_or_warn(sb.buf))
-   try_remove_empty_parents(refs, update->refname,
-REMOVE_EMPTY_PARENTS_REFLOG);
-   }
-   }
-
clear_loose_ref_cache(refs);
 
 cleanup:
-- 
2.14.1



[PATCH v2 08/11] t1404: demonstrate two problems with reference transactions

2017-09-08 Thread Michael Haggerty
Currently, a loose reference is deleted even before locking the
`packed-refs` file, let alone deleting any packed version of the
reference. This leads to two problems, demonstrated by two new tests:

* While a reference is being deleted, other processes might see the
  old, packed value of the reference for a moment before the packed
  version is deleted. Normally this would be hard to observe, but we
  can prolong the window by locking the `packed-refs` file externally
  before running `update-ref`, then unlocking it before `update-ref`'s
  attempt to acquire the lock times out.

* If the `packed-refs` file is locked so long that `update-ref` fails
  to lock it, then the reference can be left permanently in the
  incorrect state described in the previous point.

In a moment, both problems will be fixed.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 t/t1404-update-ref-errors.sh | 73 
 1 file changed, 73 insertions(+)

diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index c34ece48f5..64a81345a8 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -404,4 +404,77 @@ test_expect_success 'broken reference blocks indirect create' '
test_cmp expected output.err
 '
 
+test_expect_failure 'no bogus intermediate values during delete' '
+   prefix=refs/slow-transaction &&
+   # Set up a reference with differing loose and packed versions:
+   git update-ref $prefix/foo $C &&
+   git pack-refs --all &&
+   git update-ref $prefix/foo $D &&
+   git for-each-ref $prefix >unchanged &&
+   # Now try to update the reference, but hold the `packed-refs` lock
+   # for a while to see what happens while the process is blocked:
+   : >.git/packed-refs.lock &&
+   test_when_finished "rm -f .git/packed-refs.lock" &&
+   {
+   # Note: the following command is intentionally run in the
+   # background. We increase the timeout so that `update-ref`
+   # attempts to acquire the `packed-refs` lock for longer than
+   # it takes for us to do the check then delete it:
+   git -c core.packedrefstimeout=3000 update-ref -d $prefix/foo &
+   } &&
+   pid2=$! &&
+   # Give update-ref plenty of time to get to the point where it tries
+   # to lock packed-refs:
+   sleep 1 &&
+   # Make sure that update-ref did not complete despite the lock:
+   kill -0 $pid2 &&
+   # Verify that the reference still has its old value:
+   sha1=$(git rev-parse --verify --quiet $prefix/foo || echo undefined) &&
+   case "$sha1" in
+   $D)
+   # This is what we hope for; it means that nothing
+   # user-visible has changed yet.
+   : ;;
+   undefined)
+   # This is not correct; it means the deletion has happened
+   # already even though update-ref should not have been
+   # able to acquire the lock yet.
+   echo "$prefix/foo deleted prematurely" &&
+   break
+   ;;
+   $C)
+   # This value should never be seen. Probably the loose
+   # reference has been deleted but the packed reference
+   # is still there:
+   echo "$prefix/foo incorrectly observed to be C" &&
+   break
+   ;;
+   *)
+   # WTF?
+   echo "unexpected value observed for $prefix/foo: $sha1" &&
+   break
+   ;;
+   esac >out &&
+   rm -f .git/packed-refs.lock &&
+   wait $pid2 &&
+   test_must_be_empty out &&
+   test_must_fail git rev-parse --verify --quiet $prefix/foo
+'
+
+test_expect_failure 'delete fails cleanly if packed-refs file is locked' '
+   prefix=refs/locked-packed-refs &&
+   # Set up a reference with differing loose and packed versions:
+   git update-ref $prefix/foo $C &&
+   git pack-refs --all &&
+   git update-ref $prefix/foo $D &&
+   git for-each-ref $prefix >unchanged &&
+   # Now try to delete it while the `packed-refs` lock is held:
+   : >.git/packed-refs.lock &&
+   test_when_finished "rm -f .git/packed-refs.lock" &&
+   test_must_fail git update-ref -d $prefix/foo >out 2>err &&
+   git for-each-ref $prefix >actual &&
+   test_i18ngrep "Unable to create $Q.*packed-refs.lock$Q: File exists" err &&
+   test_cmp unchanged actual
+'
+
 test_done
-- 
2.14.1



[PATCH v2 06/11] prune_refs(): also free the linked list

2017-09-08 Thread Michael Haggerty
At least since v1.7, the elements of the `refs_to_prune` linked list
have been leaked. Fix the leak by teaching `prune_refs()` to free the
list elements as it processes them.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 3475c6f8a2..60031fe3ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1057,11 +1057,17 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 strbuf_release(&err);
 }
 
-static void prune_refs(struct files_ref_store *refs, struct ref_to_prune *r)
+/*
+ * Prune the loose versions of the references in the linked list
+ * `*refs_to_prune`, freeing the entries in the list as we go.
+ */
+static void prune_refs(struct files_ref_store *refs, struct ref_to_prune **refs_to_prune)
 {
-   while (r) {
+   while (*refs_to_prune) {
+   struct ref_to_prune *r = *refs_to_prune;
+   *refs_to_prune = r->next;
prune_ref(refs, r);
-   r = r->next;
+   free(r);
}
 }
 
@@ -1148,7 +1154,7 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 
packed_refs_unlock(refs->packed_ref_store);
 
-   prune_refs(refs, refs_to_prune);
+   prune_refs(refs, &refs_to_prune);
strbuf_release(&err);
return 0;
 }
-- 
2.14.1



[PATCH v2 10/11] packed-backend: rip out some now-unused code

2017-09-08 Thread Michael Haggerty
Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 193 --
 refs/packed-backend.h |   8 ---
 2 files changed, 201 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9d5f76b1dc..0279aeceea 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -91,19 +91,6 @@ struct ref_store *packed_ref_store_create(const char *path,
return ref_store;
 }
 
-/*
- * Die if refs is not the main ref store. caller is used in any
- * necessary error messages.
- */
-static void packed_assert_main_repository(struct packed_ref_store *refs,
- const char *caller)
-{
-   if (refs->store_flags & REF_STORE_MAIN)
-   return;
-
-   die("BUG: operation %s only allowed for main ref store", caller);
-}
-
 /*
  * Downcast `ref_store` to `packed_ref_store`. Die if `ref_store` is
  * not a `packed_ref_store`. Also die if `packed_ref_store` doesn't
@@ -321,40 +308,6 @@ static struct ref_dir *get_packed_refs(struct packed_ref_store *refs)
return get_packed_ref_dir(get_packed_ref_cache(refs));
 }
 
-/*
- * Add or overwrite a reference in the in-memory packed reference
- * cache. This may only be called while the packed-refs file is locked
- * (see packed_refs_lock()). To actually write the packed-refs file,
- * call commit_packed_refs().
- */
-void add_packed_ref(struct ref_store *ref_store,
-   const char *refname, const struct object_id *oid)
-{
-   struct packed_ref_store *refs =
-   packed_downcast(ref_store, REF_STORE_WRITE,
-   "add_packed_ref");
-   struct ref_dir *packed_refs;
-   struct ref_entry *packed_entry;
-
-   if (!is_lock_file_locked(&refs->lock))
-   die("BUG: packed refs not locked");
-
-   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
-   die("Reference has invalid format: '%s'", refname);
-
-   packed_refs = get_packed_refs(refs);
-   packed_entry = find_ref_entry(packed_refs, refname);
-   if (packed_entry) {
-   /* Overwrite the existing entry: */
-   oidcpy(&packed_entry->u.value.oid, oid);
-   packed_entry->flag = REF_ISPACKED;
-   oidclr(&packed_entry->u.value.peeled);
-   } else {
-   packed_entry = create_ref_entry(refname, oid, REF_ISPACKED);
-   add_ref_entry(packed_refs, packed_entry);
-   }
-}
-
 /*
  * Return the ref_entry for the given refname from the packed
  * references.  If it does not exist, return NULL.
@@ -596,152 +549,6 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 static const char PACKED_REFS_HEADER[] =
"# pack-refs with: peeled fully-peeled \n";
 
-/*
- * Write the current version of the packed refs cache from memory to
- * disk. The packed-refs file must already be locked for writing (see
- * packed_refs_lock()). Return zero on success. On errors, rollback
- * the lockfile, write an error message to `err`, and return a nonzero
- * value.
- */
-int commit_packed_refs(struct ref_store *ref_store, struct strbuf *err)
-{
-   struct packed_ref_store *refs =
-   packed_downcast(ref_store, REF_STORE_WRITE | REF_STORE_MAIN,
-   "commit_packed_refs");
-   struct packed_ref_cache *packed_ref_cache =
-   get_packed_ref_cache(refs);
-   int ok;
-   int ret = -1;
-   struct strbuf sb = STRBUF_INIT;
-   FILE *out;
-   struct ref_iterator *iter;
-   char *packed_refs_path;
-
-   if (!is_lock_file_locked(&refs->lock))
-   die("BUG: commit_packed_refs() called when unlocked");
-
-   /*
-* If packed-refs is a symlink, we want to overwrite the
-* symlinked-to file, not the symlink itself. Also, put the
-* staging file next to it:
-*/
-   packed_refs_path = get_locked_file_path(&refs->lock);
-   strbuf_addf(&sb, "%s.new", packed_refs_path);
-   if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
-   strbuf_addf(err, "unable to create file %s: %s",
-   sb.buf, strerror(errno));
-   strbuf_release(&sb);
-   goto out;
-   }
-   strbuf_release(&sb);
-
-   out = fdopen_tempfile(&refs->tempfile, "w");
-   if (!out) {
-   strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
-   strerror(errno));
-   goto error;
-   }
-
-   if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0) {

[PATCH v2 09/11] files_ref_store: use a transaction to update packed refs

2017-09-08 Thread Michael Haggerty
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.

This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.

First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.

Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)

Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.

Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 132 +--
 t/t1404-update-ref-errors.sh |   4 +-
 2 files changed, 103 insertions(+), 33 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2700e3b5d5..29eb5e826f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2397,13 +2397,22 @@ static int lock_ref_for_update(struct files_ref_store *refs,
return 0;
 }
 
+struct files_transaction_backend_data {
+   struct ref_transaction *packed_transaction;
+   int packed_refs_locked;
+};
+
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
  */
-static void files_transaction_cleanup(struct ref_transaction *transaction)
+static void files_transaction_cleanup(struct files_ref_store *refs,
+ struct ref_transaction *transaction)
 {
size_t i;
+   struct files_transaction_backend_data *backend_data =
+   transaction->backend_data;
+   struct strbuf err = STRBUF_INIT;
 
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
@@ -2415,6 +2424,17 @@ static void files_transaction_cleanup(struct ref_transaction *transaction)
}
}
 
+   if (backend_data->packed_transaction &&
+   ref_transaction_abort(backend_data->packed_transaction, &err)) {
+   error("error aborting transaction: %s", err.buf);
+   strbuf_release(&err);
+   }
+
+   if (backend_data->packed_refs_locked)
+   packed_refs_unlock(refs->packed_ref_store);
+
+   free(backend_data);
+
transaction->state = REF_TRANSACTION_CLOSED;
 }
 
@@ -2431,12 +2451,17 @@ static int files_transaction_prepare(struct ref_store *ref_store,
char *head_ref = NULL;
int head_type;
struct object_id head_oid;
+   struct files_transaction_backend_data *backend_data;
+   struct ref_transaction *packed_transaction = NULL;
 
assert(err);
 
if (!transaction->nr)
goto cleanup;
 
+   backend_data = xcalloc(1, sizeof(*backend_data));
+   transaction->backend_data = backend_data;
+
/*
 * Fail if a refname appears more than once in the
 * transaction. (If we end up splitting up any updates using
@@ -2503,6 +2528,41 @@ static int files_transaction_prepare(struct ref_store *ref_store,
  head_ref, &affected_refnames, err);
if (ret)
break;
+
+   if (update->flags & REF_DELETING &&
+   !(update->flags & REF_LOG_ONLY) &&
+   !(update->flags &am

[PATCH v2 04/11] packed_delete_refs(): implement method

2017-09-08 Thread Michael Haggerty
Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references, decreasing
the coupling between the classes.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c  |  2 +-
 refs/packed-backend.c | 45 -
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index fccbc24ac4..2c78f63494 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1157,7 +1157,7 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
if (packed_refs_lock(refs->packed_ref_store, 0, &err))
goto error;
 
-   if (repack_without_refs(refs->packed_ref_store, refnames, &err)) {
+   if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) {
packed_refs_unlock(refs->packed_ref_store);
goto error;
}
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9ab65c5a0a..9d5f76b1dc 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1086,7 +1086,50 @@ static int packed_initial_transaction_commit(struct ref_store *ref_store,
 static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
 struct string_list *refnames, unsigned int flags)
 {
-   die("BUG: not implemented yet");
+   struct packed_ref_store *refs =
+   packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
+   struct strbuf err = STRBUF_INIT;
+   struct ref_transaction *transaction;
+   struct string_list_item *item;
+   int ret;
+
+   (void)refs; /* We need the check above, but don't use the variable */
+
+   if (!refnames->nr)
+   return 0;
+
+   /*
+* Since we don't check the references' old_oids, the
+* individual updates can't fail, so we can pack all of the
+* updates into a single transaction.
+*/
+
+   transaction = ref_store_transaction_begin(ref_store, &err);
+   if (!transaction)
+   return -1;
+
+   for_each_string_list_item(item, refnames) {
+   if (ref_transaction_delete(transaction, item->string, NULL,
+  flags, msg, &err)) {
+   warning(_("could not delete reference %s: %s"),
+   item->string, err.buf);
+   strbuf_reset(&err);
+   }
+   }
+
+   ret = ref_transaction_commit(transaction, &err);
+
+   if (ret) {
+   if (refnames->nr == 1)
+   error(_("could not delete reference %s: %s"),
+ refnames->items[0].string, err.buf);
+   else
+   error(_("could not delete references: %s"), err.buf);
+   }
+
+   ref_transaction_free(transaction);
+   strbuf_release(&err);
+   return ret;
 }
 
 static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags)
-- 
2.14.1



[PATCH v2 02/11] struct ref_transaction: add a place for backends to store data

2017-09-08 Thread Michael Haggerty
`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/refs-internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index b02dc5a7e3..d7d344de73 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -242,6 +242,7 @@ struct ref_transaction {
size_t alloc;
size_t nr;
enum ref_transaction_state state;
+   void *backend_data;
 };
 
 /*
-- 
2.14.1



[PATCH v2 03/11] packed_ref_store: implement reference transactions

2017-09-08 Thread Michael Haggerty
Implement the methods needed to support reference transactions for
the packed-refs backend. The new methods are not yet used.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 313 +-
 refs/packed-backend.h |   9 ++
 2 files changed, 319 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b76f14e5b3..9ab65c5a0a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -748,25 +748,332 @@ static int packed_init_db(struct ref_store *ref_store, struct strbuf *err)
return 0;
 }
 
+/*
+ * Write the packed-refs from the cache to the packed-refs tempfile,
+ * incorporating any changes from `updates`. `updates` must be a
+ * sorted string list whose keys are the refnames and whose util
+ * values are `struct ref_update *`. On error, rollback the tempfile,
+ * write an error message to `err`, and return a nonzero value.
+ *
+ * The packfile must be locked before calling this function and will
+ * remain locked when it is done.
+ */
+static int write_with_updates(struct packed_ref_store *refs,
+ struct string_list *updates,
+ struct strbuf *err)
+{
+   struct ref_iterator *iter = NULL;
+   size_t i;
+   int ok;
+   FILE *out;
+   struct strbuf sb = STRBUF_INIT;
+   char *packed_refs_path;
+
+   if (!is_lock_file_locked(&refs->lock))
+   die("BUG: write_with_updates() called while unlocked");
+
+   /*
+* If packed-refs is a symlink, we want to overwrite the
+* symlinked-to file, not the symlink itself. Also, put the
+* staging file next to it:
+*/
+   packed_refs_path = get_locked_file_path(&refs->lock);
+   strbuf_addf(&sb, "%s.new", packed_refs_path);
+   free(packed_refs_path);
+   if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
+   strbuf_addf(err, "unable to create file %s: %s",
+   sb.buf, strerror(errno));
+   strbuf_release(&sb);
+   return -1;
+   }
+   strbuf_release(&sb);
+
+   out = fdopen_tempfile(&refs->tempfile, "w");
+   if (!out) {
+   strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
+   strerror(errno));
+   goto error;
+   }
+
+   if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0)
+   goto write_error;
+
+   /*
+* We iterate in parallel through the current list of refs and
+* the list of updates, processing an entry from at least one
+* of the lists each time through the loop. When the current
+* list of refs is exhausted, set iter to NULL. When the list
+* of updates is exhausted, leave i set to updates->nr.
+*/
+   iter = packed_ref_iterator_begin(&refs->base, "",
+DO_FOR_EACH_INCLUDE_BROKEN);
+   if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+   iter = NULL;
+
+   i = 0;
+
+   while (iter || i < updates->nr) {
+   struct ref_update *update = NULL;
+   int cmp;
+
+   if (i >= updates->nr) {
+   cmp = -1;
+   } else {
+   update = updates->items[i].util;
+
+   if (!iter)
+   cmp = +1;
+   else
+   cmp = strcmp(iter->refname, update->refname);
+   }
+
+   if (!cmp) {
+   /*
+* There is both an old value and an update
+* for this reference. Check the old value if
+* necessary:
+*/
+   if ((update->flags & REF_HAVE_OLD)) {
+   if (is_null_oid(&update->old_oid)) {
+   strbuf_addf(err, "cannot update ref '%s': "
+   "reference already exists",
+   update->refname);
+   goto error;
+   } else if (oidcmp(&update->old_oid, iter->oid)) {
+   strbuf_addf(err, "cannot update ref '%s': "
+   "is at %s but expected %s",
+   update->refname,
+   oid_to_hex(iter->oid),
+   oid_to_hex(&update->old_oid));
+   goto error;
+ 

[PATCH v2 05/11] files_pack_refs(): use a reference transaction to write packed refs

2017-09-08 Thread Michael Haggerty
Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2c78f63494..3475c6f8a2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1100,6 +1100,11 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
int ok;
struct ref_to_prune *refs_to_prune = NULL;
struct strbuf err = STRBUF_INIT;
+   struct ref_transaction *transaction;
+
+   transaction = ref_store_transaction_begin(refs->packed_ref_store, &err);
+   if (!transaction)
+   return -1;
 
packed_refs_lock(refs->packed_ref_store, LOCK_DIE_ON_ERROR, &err);
 
@@ -1115,12 +1120,14 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
continue;
 
/*
-* Create an entry in the packed-refs cache equivalent
-* to the one from the loose ref cache, except that
-* we don't copy the peeled status, because we want it
-* to be re-peeled.
+* Add a reference creation for this reference to the
+* packed-refs transaction:
 */
-   add_packed_ref(refs->packed_ref_store, iter->refname, iter->oid);
+   if (ref_transaction_update(transaction, iter->refname,
+  iter->oid->hash, NULL,
+  REF_NODEREF, NULL, &err))
+   die("failure preparing to create packed reference %s: %s",
+   iter->refname, err.buf);
 
/* Schedule the loose reference for pruning if requested. */
if ((flags & PACK_REFS_PRUNE)) {
@@ -1134,8 +1141,11 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
if (ok != ITER_DONE)
die("error while iterating over references");
 
-   if (commit_packed_refs(refs->packed_ref_store, &err))
-   die("unable to overwrite old ref-pack file: %s", err.buf);
+   if (ref_transaction_commit(transaction, &err))
+   die("unable to write new packed-refs: %s", err.buf);
+
+   ref_transaction_free(transaction);
+
packed_refs_unlock(refs->packed_ref_store);
 
prune_refs(refs, refs_to_prune);
-- 
2.14.1



[PATCH v2 01/11] packed-backend: don't adjust the reference count on lock/unlock

2017-09-08 Thread Michael Haggerty
The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because:

* Another process cannot change the packed-refs file because it is
  locked.

* When we ourselves change the packed-refs file, we do so by first
  modifying the packed ref-cache, and then writing the data from the
  ref-cache to disk. So the packed ref-cache remains fresh because any
  changes that we plan to make to the file are made in the cache first
  anyway.

So there is no reason for the cache to become stale.

Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.

This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 412c85034f..b76f14e5b3 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -525,7 +525,6 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
"packed_refs_lock");
static int timeout_configured = 0;
static int timeout_value = 1000;
-   struct packed_ref_cache *packed_ref_cache;
 
if (!timeout_configured) {
git_config_get_int("core.packedrefstimeout", &timeout_value);
@@ -560,9 +559,11 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 */
validate_packed_ref_cache(refs);
 
-   packed_ref_cache = get_packed_ref_cache(refs);
-   /* Increment the reference count to prevent it from being freed: */
-   acquire_packed_ref_cache(packed_ref_cache);
+   /*
+* Now make sure that the packed-refs file as it exists in the
+* locked state is loaded into the cache:
+*/
+   get_packed_ref_cache(refs);
return 0;
 }
 
@@ -576,7 +577,6 @@ void packed_refs_unlock(struct ref_store *ref_store)
if (!is_lock_file_locked(&refs->lock))
die("BUG: packed_refs_unlock() called when not locked");
rollback_lock_file(&refs->lock);
-   release_packed_ref_cache(refs->cache);
 }
 
 int packed_refs_is_locked(struct ref_store *ref_store)
-- 
2.14.1



[PATCH v2 00/11] Implement transactions for the packed ref store

2017-09-08 Thread Michael Haggerty
This is v2 of a patch series to implement reference transactions for
the packed refs-store. Thanks to Stefan, Brandon, Junio, and Peff for
your review of v1 [1]. I believe I have addressed all of your
comments.

Changes since v1:

* Patch [01/11]: justify the change better in the log message. Add a
  comment explaining why `get_packed_ref_cache()` is being called but
  the return value discarded.

* Patch [05/11]: Lock the `packed-refs` file *after* successfully
  creating the (empty) transaction object. This prevents leaving the
  file locked if `ref_store_transaction_begin()` fails.

* Patch [06/11]: New patch, fixing a leak of the `refs_to_prune`
  linked list.

* Patch [07/11]: Reimplement test "no bogus intermediate values during
  delete" to work without polling. Also incorporate Junio's change
  `s/grep/test_i18ngrep/`.

These changes are also available as branch `packed-ref-transactions`
from my GitHub repo [2].

Michael

[1] https://public-inbox.org/git/cover.1503993268.git.mhag...@alum.mit.edu/
[2] https://github.com/mhagger/git

Michael Haggerty (11):
  packed-backend: don't adjust the reference count on lock/unlock
  struct ref_transaction: add a place for backends to store data
  packed_ref_store: implement reference transactions
  packed_delete_refs(): implement method
  files_pack_refs(): use a reference transaction to write packed refs
  prune_refs(): also free the linked list
  files_initial_transaction_commit(): use a transaction for packed refs
  t1404: demonstrate two problems with reference transactions
  files_ref_store: use a transaction to update packed refs
  packed-backend: rip out some now-unused code
  files_transaction_finish(): delete reflogs before references

 refs/files-backend.c | 214 ++--
 refs/packed-backend.c| 461 +--
 refs/packed-backend.h|  17 +-
 refs/refs-internal.h |   1 +
 t/t1404-update-ref-errors.sh |  73 +++
 5 files changed, 550 insertions(+), 216 deletions(-)

-- 
2.14.1



Re: git diff doesn't quite work as documented?

2017-09-08 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 08.09.2017 03:26:
> Olaf Klischat <olaf.klisc...@gmail.com> writes:
> 
>> `git diff --help' says:
>>
>> git diff [--options] <commit> [--] [<path>...]
>>This form is to view the changes you have in your
>>working tree relative to the named <commit>.
> 
> That help text is poorly phrased.  
> 
> When "git diff" talks about files in your working tree, it always
> looks them _through_ the index.  As far as the command is concerned,
> a cruft left in your working tree that is not in the index does
> *not* exist.
> 
> So when your index does not have bar.txt, even if you have an
> untracked bar.txt in your directory, i.e.
> 
>> oklischat@oklischat:/tmp/gittest$ git status
>> On branch master
>> Untracked files:
>>   (use "git add <file>..." to include in what will be committed)
>>
>>  bar.txt
> 
> and you have a commit that _has_ that file, then the command thinks
> <commit> has the path, and your working tree does *not*.  IOW, this
> is...
> 
>> oklischat@oklischat:/tmp/gittest$ git diff bar-added
>> diff --git a/bar.txt b/bar.txt
>> deleted file mode 100644
> 
> ... totally expected and intended output.
> 
> Hope the above explanation clarifies.  A documentation update might
> be helpful to new users.

Well, there's a difference between "working tree" and "working dir".
The wt is "the tree of actual checked out files" per our glossary. So
maybe the doc could point to the glossary (see the glossary for the
difference to the work dir).

But really, this type of misunderstanding comes up often: people try to
understand the doc based on common language terms (which is okay, of
course) and are unaware of the defined meanings of technical terms.
Explaining them in every place where they are used simply does not scale.

Maybe we should make more use of our glossary (extend it, enhance it,
promote it) and somehow mark all technical terms as such when they are
used (say, italics in HTML), or at least when the exact meaning is
relevant like in the case above, and possibly link to the glossary (-item).

Michael


Re: [PATCH 08/10] files_ref_store: use a transaction to update packed refs

2017-09-08 Thread Michael Haggerty
On 09/08/2017 09:38 AM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 10:20:32AM +0200, Michael Haggerty wrote:
> 
>> First, the old code didn't obtain the `packed-refs` lock until
>> `files_transaction_finish()`. This means that a failure to acquire the
>> `packed-refs` lock (e.g., due to contention with another process)
>> wasn't detected until it was too late (problems like this are supposed
>> to be detected in the "prepare" phase). The new code acquires the
>> `packed-refs` lock in `files_transaction_prepare()`, the same stage of
>> the processing when the loose reference locks are being acquired,
>> removing another reason why the "prepare" phase might succeed and the
>> "finish" phase might nevertheless fail.
> 
> That means we're holding the packed-refs lock for a slightly longer
> period. I think this could mean worse lock contention between otherwise
> unrelated transactions over the packed-refs file. I wonder if the
> lock-retry timeout might need to be increased to accommodate this. On
> the other hand, it looks like we take it after getting the individual
> locks, which I'd think would be the expensive part.

That was my thinking, yes. While the packed-refs lock is held, the
references being created/updated have their reflogs written and are
renamed into place. I don't see how that can be shortened without
compromising on correctness (in particular, that we want to process
creates/updates before deletions to try to preserve reachability as much
as possible during the transaction). As an added optimization, the
packed-refs lock is not acquired at all if no references are being deleted.

> Are there other callers who take the packed-refs and individual locks in
> the reverse order? I think git-pack-refs might. Could we "deadlock" with
> pack-refs? There are timeouts so it would resolve itself quickly, but I
> wonder if we'd have a case that would deadlock-until-timeout that
> otherwise could succeed.

That's not true. `files_pack_refs()` does the following:

1. Lock the `packed-refs` file.

2. Start a packed ref-store transaction.

3. Iterate over the loose ref cache. For each reference that should
   be packed:
   * add it to the packed-refs transaction as an update to set it
 to the loose value (without specifying an old_sha1)
   * if pruning is on, schedule the loose reference to be pruned
 (in an internal data structure, without locking the file).

4. Commit the packed ref-store transaction.

5. Release the packed-refs lock.

6. For each to-prune loose ref:
   * lock the loose reference
   * verify that it still has the value that was just written to
 the `packed-refs` file
   * if so, delete the loose version of the reference
   * unlock the loose reference

The packed-refs file and loose references are never locked at the same
time during pack-refs, so no deadlock is possible.

But you are right to assume that it *should* be so. The algorithm
written above is a tiny bit unsafe (and has been for years). It is
possible, though admittedly very unlikely, for the following to happen
in the gap between steps 5 and 6:

1. One process updates the value of branch "foo" from A to B.
   (Remember that A is the value that was just written to the
   `packed-refs` file by step 4.)

2. `pack-refs` is run *again* (while the first `pack-refs` is out
   on its lunch break), writes value B to the `packed-refs` file
   for "foo", and deletes the loose version of "foo".

3. Yet *another* process changes "foo" back from B to A. This
   creates a loose version of "foo" with value A, which overwrites
   the packed version with value B.

4. The first `pack-refs` process resumes at step 6, sees that
   "foo" "still" has the value "A", so deletes the loose reference.

This would leave "foo" at the obsolete value "B" (i.e., the value
written to the `packed-refs` file for it by the nested `pack-refs` process.

I think that fixing this problem would require the `packed-refs` lock to
be held while `pack-refs` is pruning the loose references. But given how
unlikely that chain of events seems, and that fixing it would increase
contention on the `packed-refs` file and allow the deadlock that you
described, I lean towards leaving it as-is. Though admittedly,
contention over a loose reference lock could make the race more likely
to be hit.

I also just noticed that the `refs_to_prune` linked list is leaked here
(as has also been the case for many years). I'll fix that separately.

>> Second, the old code deleted the loose version of a reference before
>> deleting any packed version of the same reference. This left a moment
>> when another process might think that the packed version of the
>> reference is current, which is incorrect. (Even worse, t

Re: [PATCH 0/4] Test name-rev with small stack

2017-09-08 Thread Michael J Gruber
Jeff King venit, vidit, dixit 07.09.2017 16:54:
> On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote:
> 
>> name-rev segfaults for me in emacs.git with the typical 8192 stack size.
>> The reason is the recursive walk that name-rev uses.
>>
>> This series adds a test to mark this as known failure, after some
>> clean-ups.
> 
> These all look reasonable to me. The size of the test case in the final
> one is presumably arbitrary and just copied from t7004. I don't know if
> it's worth trying to shrink it. It could shorten a rather expensive
> test. OTOH, if we shorten it too much then we might get a false pass
> (e.g., if the algorithm remains recursive but has a smaller stack
> footprint).
> 
>> Michael J Gruber (4):
>>   t7004: move limited stack prereq to test-lib
>>   t6120: test name-rev --all and --stdin
>>   t6120: clean up state after breaking repo
>>   t6120: test describe and name-rev with deep repos
> 
> Now comes the hard part: rewriting the C code. :)

Looking at it more closely, the solution in cbc60b6720 ("git tag
--contains: avoid stack overflow", 2014-04-24) seems to be a bit "ad
hoc" to me:

First of all, there is more than "tag --contains" that may exceed the
stack by recursion. So I would expect the solution to be more general,
and not localised and specialised to builtin/tag.c

Second, this is a walk, so I'm wondering whether our revision walk
machinery should be the place to add missing functionality (if any).
That way, everything would benefit from possible or existing
improvements there. For example, I think some of our "extra walkers"
don't heed object replacements. (I need to test more.)

Michael


Re: [PATCH 07/10] t1404: demonstrate two problems with reference transactions

2017-09-08 Thread Michael Haggerty
On 09/08/2017 06:44 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> +git for-each-ref $prefix >actual &&
>> +grep "Unable to create $Q.*packed-refs.lock$Q: File exists" err &&
> 
> I added a squash for doing s/grep/test_i18n&/ here

Thanks for the fix. I always forget that gotcha.

> are there other
> issues in the series, or is the series more or less ready to be
> polished in 'next'?

I'm working on v2 right now.

Michael


Re: [PATCH 06/10] files_initial_transaction_commit(): use a transaction for packed refs

2017-09-08 Thread Michael Haggerty
On 09/08/2017 09:27 AM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 10:20:30AM +0200, Michael Haggerty wrote:
> 
>> Used a `packed_ref_store` transaction in the implementation of
>> `files_initial_transaction_commit()` rather than using internal
>> features of the packed ref store. This further decouples
>> `files_ref_store` from `packed_ref_store`.
> 
> Very nice to see these couplings going away.
> 
> Minor nit: s/Used/Use/ in the commit message.

Thanks; will fix.

Michael


Re: [PATCH 01/10] packed-backend: don't adjust the reference count on lock/unlock

2017-09-08 Thread Michael Haggerty
On 09/08/2017 08:52 AM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 10:20:25AM +0200, Michael Haggerty wrote:
> 
>> The old code incremented the packed ref cache reference count when
>> acquiring the packed-refs lock, and decremented the count when
>> releasing the lock. This is unnecessary because a locked packed-refs
>> file cannot be changed, so there is no reason for the cache to become
>> stale.
> 
> Hmm, I thought that after your last series, we might hold the lock but
> update the packed-refs from a separate tempfile. I.e., 42dfa7ecef
> (commit_packed_refs(): use a staging file separate from the lockfile,
> 2017-06-23).
> 
>> Moreover, the extra reference count causes a problem if we
>> intentionally clear the packed refs cache, as we sometimes need to do
>> if we change the cache in anticipation of writing a change to disk,
>> but then the write to disk fails. In that case, `packed_refs_unlock()`
>> would have no easy way to find the cache whose reference count it
>> needs to decrement.
>>
>> This whole issue will soon become moot due to upcoming changes that
>> avoid changing the in-memory cache as part of updating the packed-refs
>> on disk, but this change makes that transition easier.
> 
> All of this makes sense, and I'm happy this complexity is going away in
> the long run. But I guess what I'm wondering is in the meantime if we
> can have a sequence like:
> 
>   1. We hold packed-refs.lock
> 
>   2. We update the file without releasing the lock, via 42dfa7ecef.
> 
>   3. Still holding the lock, we try to look at packed-refs. The
>  stat_validity code says no, we're not up to date.
> 
>   4. We discard the old packed_ref_cache and reload it. Because its
>  reference count was not incremented during step 1, we actually
>  free() it.
> 
>   5. We try to look at at the old freed pointer.
> 
> There are several steps in there that might be implausible. So I'm
> mostly playing devil's advocate here.
> 
> I'm wondering if the "don't validate while we hold the lock" logic in
> get_packed_refs_cache() means that step 3 is impossible.

That's one of the reasons your scenario can't happen, but that just begs
the question of whether the "don't validate while we hold the lock" code
is wrongheaded.

In fact it's OK. The method by which the packed-refs file on disk is
modified at this point in the patch series is by modifying the packed
ref-cache and then writing the data from the ref-cache to disk. So the
packed ref-cache remains fresh because any changes that we plan to make
to the file are made in the cache first anyway.

I'll explain that a bit better in the log message.

The next question is whether this change interacts badly with changes
later in the patch series, when we cease to modify the ref-cache before
writing the new packed-refs file. Here we're OK, too, because
immediately after `rename_tempfile()` is used to rename the new
packed-refs file into place, we clear the packed ref-cache, so no
subsequent callers of `get_packed_refs_cache()` should see the stale cache.

Which in turn is partly because your step 5 is also implausible: code
shouldn't be holding a pointer to the packed ref-cache across operations
that might change the file. (Code that calls `get_packed_refs_cache()`
is OK because that function would see that `refs->cache` is NULL and
reload it regardless of whether we hold the lock.)

So I think everything is OK, but thanks for making me think through and
explain it better :-)

>> @@ -560,9 +559,7 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
>>   */
>>  validate_packed_ref_cache(refs);
>>  
>> -packed_ref_cache = get_packed_ref_cache(refs);
>> -/* Increment the reference count to prevent it from being freed: */
>> -acquire_packed_ref_cache(packed_ref_cache);
>> +get_packed_ref_cache(refs);
> 
> It seems a bit funny to call a "get" function and throw away the return
> value. Presumably we care about its side effect of updating refs->cache.
> That might be worth a comment (though if this is all going away soon, I
> care a lot less about niceties like that).

I'm rerolling anyway, so I'll add a comment. Thanks.

Michael


Re: [PATCH 02/10] struct ref_transaction: add a place for backends to store data

2017-09-08 Thread Michael Haggerty
On 09/08/2017 09:02 AM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 10:20:26AM +0200, Michael Haggerty wrote:
> 
>> `packed_ref_store` is going to want to store some transaction-wide
>> data, so make a place for it.
> 
> That makes sense, although...
> 
>> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
>> index b02dc5a7e3..d7d344de73 100644
>> --- a/refs/refs-internal.h
>> +++ b/refs/refs-internal.h
>> @@ -242,6 +242,7 @@ struct ref_transaction {
>>  size_t alloc;
>>  size_t nr;
>>  enum ref_transaction_state state;
>> +void *backend_data;
>>  };
> 
> This is just one pointer. Once we start layering ref backends (and
> already we're moving towards a "files" layer which sits atop loose and
> packed backends, right?), how do we avoid backends stomping on each
> other (or worse, dereferencing somebody else's data as their own
> struct)?

My conception is that layered backends would be separated as much as
possible, and if the "top" layer needs to modify the "bottom" layer, it
would do so via a separate reference transaction on the bottom layer.
That transaction would be owned by the bottom layer, which would be able
to use the corresponding `backend_data` pointers however it likes.

You can see an example of this construct in patch 08.

Michael


[PATCH 0/4] Test name-rev with small stack

2017-09-07 Thread Michael J Gruber
name-rev segfaults for me in emacs.git with the typical 8192 stack size.
The reason is the recursive walk that name-rev uses.

This series adds a test to mark this as known failure, after some
clean-ups.

Michael J Gruber (4):
  t7004: move limited stack prereq to test-lib
  t6120: test name-rev --all and --stdin
  t6120: clean up state after breaking repo
  t6120: test describe and name-rev with deep repos

 t/t6120-describe.sh | 57 +
 t/t7004-tag.sh  |  6 --
 t/test-lib.sh   |  6 ++
 3 files changed, 63 insertions(+), 6 deletions(-)

-- 
2.14.1.603.gf58147c36e



[PATCH 1/4] t7004: move limited stack prereq to test-lib

2017-09-07 Thread Michael J Gruber
The lazy prerequisite ULIMIT_STACK_SIZE is used only in t7004 so far.

Move it to test-lib.sh so that it can be used in other tests (which it will
be in a follow-up commit).

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/t7004-tag.sh | 6 --
 t/test-lib.sh  | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index dbcd6f623c..5bf5ace56b 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1863,12 +1863,6 @@ test_expect_success 'version sort with very long prerelease suffix' '
git tag -l --sort=version:refname
 '
 
-run_with_limited_stack () {
-   (ulimit -s 128 && "$@")
-}
-
-test_lazy_prereq ULIMIT_STACK_SIZE 'run_with_limited_stack true'
-
 # we require ulimit, this excludes Windows
test_expect_success ULIMIT_STACK_SIZE '--contains and --no-contains work in a deep repo' '
>expect &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 5fbd8d4a90..f22c1b260a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1167,6 +1167,12 @@ run_with_limited_cmdline () {
 
 test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
 
+run_with_limited_stack () {
+   (ulimit -s 128 && "$@")
+}
+
+test_lazy_prereq ULIMIT_STACK_SIZE 'run_with_limited_stack true'
+
 build_option () {
git version --build-options |
sed -ne "s/^$1: //p"
-- 
2.14.1.603.gf58147c36e



[PATCH 2/4] t6120: test name-rev --all and --stdin

2017-09-07 Thread Michael J Gruber
name-rev is used in a few tests, but tested only in t6120 along with
describe so far.

Add tests for name-rev with --all and --stdin.

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/t6120-describe.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index aa74eb8f0d..7c5728ebd5 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -198,6 +198,31 @@ test_expect_success 'name-rev with exact tags' '
test_cmp expect actual
 '
 
+test_expect_success 'name-rev --all' '
+   >expect.unsorted &&
+   for rev in $(git rev-list --all)
+   do
+   git name-rev $rev >>expect.unsorted
+   done &&
+   sort <expect.unsorted >expect &&
+   git name-rev --all >actual.unsorted &&
+   sort <actual.unsorted >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'name-rev --stdin' '
+   >expect.unsorted &&
+   for rev in $(git rev-list --all)
+   do
+   name=$(git name-rev --name-only $rev) &&
+   echo "$rev ($name)" >>expect.unsorted
+   done &&
+   sort <expect.unsorted >expect &&
+   git rev-list --all | git name-rev --stdin >actual.unsorted &&
+   sort <actual.unsorted >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'describe --contains with the exact tags' '
echo "A^0" >expect &&
tag_object=$(git rev-parse refs/tags/A) &&
-- 
2.14.1.603.gf58147c36e



[PATCH 4/4] t6120: test describe and name-rev with deep repos

2017-09-07 Thread Michael J Gruber
Depending on the implementation of walks, limited stack size may lead
to problems (for recursion).

Test name-rev and describe with deep repos and limited stack size, and
mark the former as a known failure.

We add these tests (which add gazillions of commits) last so as to keep
the runtime of other subtests the same.

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/t6120-describe.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1997ccde56..dd6dd9df9b 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -279,4 +279,35 @@ test_expect_success 'describe ignoring a borken submodule' '
grep broken out
 '
 
+# we require ulimit, this excludes Windows
+test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
+   i=1 &&
+   while test $i -lt 8000
+   do
+   echo "commit refs/heads/master
+committer A U Thor <aut...@example.com> $((10 + $i * 100)) +0200
+data <<EOF
+commit #$i
+EOF"
+   test $i = 1 && echo "from refs/heads/master^0"
+   i=$(($i + 1))
+   done | git fast-import &&
+   git checkout master &&
+   git tag far-far-away HEAD~7999 &&
+   echo "HEAD~4000 tags/far-far-away~3999" >expect &&
+   git name-rev HEAD~4000 >actual &&
+   test_cmp expect actual &&
+   run_with_limited_stack git name-rev HEAD~4000 >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success ULIMIT_STACK_SIZE 'describe works in a deep repo' '
+   git tag -f far-far-away HEAD~7999 &&
+   echo "far-far-away" >expect &&
+   git describe --tags --abbrev=0 HEAD~4000 >actual &&
+   test_cmp expect actual &&
+   run_with_limited_stack git describe --tags --abbrev=0 HEAD~4000 >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.14.1.603.gf58147c36e



[PATCH 3/4] t6120: clean up state after breaking repo

2017-09-07 Thread Michael J Gruber
t6120 breaks the repo state intentionally in the last tests.

Clean up the breakage afterwards (and before adding more tests).

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 t/t6120-describe.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 7c5728ebd5..1997ccde56 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -275,6 +275,7 @@ test_expect_success 'describe chokes on severely broken submodules' '
 '
 test_expect_success 'describe ignoring a borken submodule' '
git describe --broken >out &&
+   test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
grep broken out
 '
 
-- 
2.14.1.603.gf58147c36e



Re: [PATCH] name-rev: change ULONG_MAX to TIME_MAX

2017-09-07 Thread Michael J Gruber
Jeff King venit, vidit, dixit 06.09.2017 15:35:
> On Wed, Sep 06, 2017 at 01:59:31PM +0200, Michael J Gruber wrote:
> 
>> BTW, there's more fallout from those name-rev changes: In connection
>> with that other thread about surprising describe results for emacs.git I
>> noticed that I can easily get "git name-rev --stdin" to segfault there.
>> As easy as
>>
>> echo bc5d96a0b2a1dccf7c459e40d21b54c977f4 | git name-rev --stdin
>>
>> for example.
>>
>> That's unfortunate for the use-case of name-rev to amend git log output.
>>
>> The reason seems to be that with "--stdin" or "--all", "name-rev" walks
>> and names all commits before beginning to use that those names for even
>> a single commit as above.
>>
>> That segfault bisects to the logic changing commit in
>> jc/name-rev-lw-tag, but I think the changed logic simply leads to more
>> xmallocs() the segfault sooner now. Or something that I dind't spot even
>> after a few hours.
> 
> The segfault seems to be due to running out of stack space. The problem
> is that name_rev() is recursive over the history graph.  That topic
> added a parameter to the function, which increased the memory used for
> each level of the recursion. But the fundamental problem has always been
> there. The right solution is to switch to iteration (with our own stack
> structure if necessary).
> 
> We had similar problems with the recursive --contains traversal in tag,
> and ended up with cbc60b6720 (git tag --contains: avoid stack overflow,
> 2014-04-24).

Cool, thanks for the pointer. ulimit -s is a great way to test this.

Michael


Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Michael Haggerty
On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Thomas Gummerer <t.gumme...@gmail.com> writes:
>
>> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
>> as the second parameter of memcpy.
>> [...]
>
> It is hugely annoying to see a halfway-intelligent compiler forces
> you to add such pointless asserts.
>
> The only way the compiler could error on this is by inferring the
> fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> in ref_transaction_update() where these are used as conditionals to
> set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> doing the whole-program analysis, the other two callsites of the
> function pass the address of oid.hash[] in an oid structure so it
> should know these won't be NULL.
>
> [...]
>
> I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> we have the new side and rewrite the above part to
>
> if (new_sha1)
> hashcpy(update->new_oid.hash, new_sha1);
>
> without an extra and totally pointless assert()?

The ultimate reason for those flags is that `struct ref_update` embeds
`new_oid` and `old_oid` directly in the struct, so there is no way to
set it to "NULL". (The `is_null_sha1` value is used for a different
purpose.) So those flags keep track of whether the corresponding value
is specified or absent.

Four of the five callers of `ref_transaction_add_update()` are
constructing a new `ref_update` from an old one. They currently don't
have to look into `flags`; they just pass it on (possibly changing a
bit or two). Implementing your proposal would oblige those callers to
change from something like

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> update->new_oid.hash, update->old_oid.hash,
> update->msg);

to

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL,
> (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL,
> update->msg);

It's not the end of the world, but it's annoying.
`ref_transaction_add_update()` was meant to be a low-level,
low-overhead way of allocating a `struct ref_update` and add it to a
transaction.

Another solution (also annoying, but maybe a tad less so) would be to
change the one iffy caller, `ref_transaction_update()`, to pass in a
pointer to the null OID for `new_sha1` and `old_sha1` when the
corresponding flags are turned off. That value would never be looked
at, but it would hopefully reassure gcc.

I did just realize one thing: `ref_transaction_update()` takes `flags`
as an argument and alters it using

> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);

Perhaps gcc is *more* intelligent than we give it credit for, and is
actually worried that the `flags` argument passed in by the caller
might *already* have one of these bits set. In that case
`ref_transaction_add_update()` would indeed be called incorrectly.
Does the warning go away if you change that line to

> if (new_sha1)
> flags |= REF_HAVE_NEW;
> else
> flags &= ~REF_HAVE_NEW;
> if (old_sha1)
> flags |= REF_HAVE_OLD;
> else
> flags &= ~REF_HAVE_OLD;

? This might be a nice change to have anyway, to isolate
`ref_transaction_update()` from mistakes by its callers. For that
matter, one might want to be even more selective about which bits are
allowed in the `flags` argument passed in by `ref_transaction_update()`'s
callers:

> flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */

Michael


Re: [PATCH] rev-parse: don't trim bisect refnames

2017-09-06 Thread Michael Haggerty
On Wed, Sep 6, 2017 at 1:53 PM, Jeff King <p...@peff.net> wrote:
> [...]
> Subject: [PATCH] rev-parse: don't trim bisect refnames
>
> Using for_each_ref_in() with a full refname has always been
> a questionable practice, but it became an error with
> b9c8e7f2fb (prefix_ref_iterator: don't trim too much,
> 2017-05-22), making "git rev-parse --bisect" pretty reliably
> show a BUG.
>
> Commit 03df567fbf (for_each_bisect_ref(): don't trim
> refnames, 2017-06-18) fixed this case for revision.c, but
> rev-parse handles this option on its own. We can use the
> same solution here (and piggy-back on its test).

It looks good to me.

Michael


Re: [PATCH] name-rev: change ULONG_MAX to TIME_MAX

2017-09-06 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 06.09.2017 05:35:
> Michael J Gruber <g...@grubix.eu> writes:
> 
>> Earlier, dddbad728c ("timestamp_t: a new data type for timestamps",
>> 2017-04-26) changed several types to timestamp_t.
>>
>> 5589e87fd8 ("name-rev: change a "long" variable to timestamp_t",
>> 2017-05-20) cleaned up a missed variable, but both missed a _MAX
>> constant.
>>
>> Change the remaining constant to the one appropriate for the current
>> type
>>
>> Signed-off-by: Michael J Gruber <g...@grubix.eu>
>> ---
> 
> Thanks.
> 
> I think this (and the earlier 5589e8) was caused by an unnoticed
> semantic conflict at 78089b71 ("Merge branch 'jc/name-rev-lw-tag'",
> 2017-05-30).  Merging is sometimes hard ;-)

Simple merges and semi-simple merges...

BTW, there's more fallout from those name-rev changes: In connection
with that other thread about surprising describe results for emacs.git I
noticed that I can easily get "git name-rev --stdin" to segfault there.
As easy as

echo bc5d96a0b2a1dccf7c459e40d21b54c977f4 | git name-rev --stdin

for example.

That's unfortunate for the use-case of name-rev to amend git log output.

The reason seems to be that with "--stdin" or "--all", "name-rev" walks
and names all commits before beginning to use those names for even
a single commit, as above.

That segfault bisects to the logic-changing commit in
jc/name-rev-lw-tag, but I think the changed logic simply leads to more
xmallocs(), so the segfault shows up sooner now. Or it is something that
I didn't spot even after a few hours.

On the other hand, nearly every time that I try to understand describe
or name-rev I want to get rid of insert_commit_by_date() and the like
and replace them with generations, and maybe a simple rev-walk (per ref)...

> Will queue.
> 
>>  builtin/name-rev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index c41ea7c2a6..598da6c8bc 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -253,7 +253,7 @@ static int name_ref(const char *path, const struct 
>> object_id *oid, int flags, vo
>>  struct commit *commit = (struct commit *)o;
>>  int from_tag = starts_with(path, "refs/tags/");
>>  
>> -if (taggerdate == ULONG_MAX)
>> +if (taggerdate == TIME_MAX)
>>  taggerdate = ((struct commit *)o)->date;
>>  path = name_ref_abbrev(path, can_abbreviate_output);
>>  name_rev(commit, xstrdup(path), taggerdate, 0, 0,


Re: signing commits using gpg2

2017-09-05 Thread Michael J Gruber
shawn wilson venit, vidit, dixit 02.09.2017 23:11:
> tl;dr - how do I get git to use gpg2 to sign things?
> 
> I'm using gpg2 (so no agent options are configured but an agent is
> running) which is configured w/ a Nitrokey (Pro if it matters):
> 
>  % git commit -m "Initial."
> 
>  gits/bash-libs (master ⚡) localhost
> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel:
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel: c
> gpg: selecting openpgp failed: general error
> gpg: signing failed: general error
> gpg: signing failed: general error
> error: gpg failed to sign the data
> fatal: failed to write commit object
> 
> This works with gpg and ssh:

Not really...

>  % touch foo
> 
>  ~ localhost
>  % gpg2 --sign foo

... because you're using gpg2, not gpg.

> 
>  ~ localhost
> gpg: using "846FF490" as default secret key for signing
>  % cat foo*
> 
>  ~ localhost
> -BEGIN PGP MESSAGE-
> Version: GnuPG v2
> 
> owEBuQFG/pANAwAKAYwdY7SEb/SQAcsJYgNmb29ZqxfviQGcBAABCgAGBQJZqxfv
> AAoJEIwdY7SEb/SQAcEL/jonw+HymnlmfebtEwlvfx2Gl1Sbuw0xWWPpQ2Dtjljz
> HtpD+LWczjpOSMTHFNK9xPR2kcs1WNY+mO8M45QI7iDgFkKRzaxEqeNUJkoyF/+I
> 81VMmXDQMXFs4+8jy00b+UxTdvwdXaHMsOtu+6YCtmCR5Bzohg07ADsnXnGGn3Sd
> WTjVMzV6Dlh8LRF+coGJ8JuErBsRAI6vdNgJRVHYBULGNXci4uF/4a+58uiTL4/U
> PvC4ruXCNxCKi89nMERhwlnOvglseX3TDR5ldrc4Hzb+pLsj/l6N4sBW0Zmb8UcE
> 9BG3WjOs4eZvnLmk5XHrwisD2CXuHvyWMl0yH7LTrg+m4Itj0PJ4Px4H9E5t/zfs
> C1vcB/okcigeIyXnO06um02a5oZAYOKadB+6NRnBjULz5GvP2yxj/AO1VPmZprpt
> budMuHZcA0zNE3uBmcnQY5+1tdkyTrlTxsL58lQrn/U3wvgah3AXMEvjRGqbYWHj
> jDikQVJ7ESoevNqlfLPj8Q==
> =hV6v
> -END PGP MESSAGE-
> 
> However, if I try this w/ the old gpg:
> 
>  % gpg -ae -o foo.gpg foo
> 
>  ~ localhost
>  % gpg -d foo.gpg
> 
>  ~ localhost
> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel: c
> gpg: selecting openpgp failed: general error
> gpg: encrypted with 3072-bit RSA key, ID 41826CFB, created 2017-03-13
>   "Shawn Wilson <ag4ve...@gmail.com>"
> gpg: public key decryption failed: general error
> gpg: decryption failed: secret key not available
>  % gpg2 -d foo.gpg
> 
>  ~ localhost
> gpg: encrypted with 3072-bit RSA key, ID E27FA0B841826CFB, created 2017-03-13
>   "Shawn Wilson <ag4ve...@gmail.com>"
> foo
> 
> (yeah I added data to the file)
> 
> And just to prove basic competency checking:
> 
>  % git config --global -l | grep sign
> 
>  ~ localhost
> user.signingkey=846FF490
filter.gitconfig-rmuser.clean=sed -e "s/^\( *email =\).*/\1 <email address>/" -e "s/^\( *name =\).*/\1 <name>/" -e "s/^\( *signingkey =\).*/\1 <signingkey>/"
> filter.gitconfig-rmuser.smudge=egrep "^ *(email|name|signingkey) = "
> commit.gpgsign=true
> 

So, gpg2 works and gpg does not. This is typical for the way in which
the gpg upgrade path is broken, and your distro installs gpg because it
still relies on it.

git sees the two executables gpg and gpg2 and uses the first, so as not
to migrate your secret key store inadvertently.

Short answer: Use

git config --global gpg.program gpg2

to make git use gpg2, which apparently is your working gnupg setup.

Michael


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Michael Haggerty
On Tue, Sep 5, 2017 at 10:45 AM, Jeff King <p...@peff.net> wrote:
> On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:
>
>> [...]
>> Rearrange the handling of `referent`, so that we don't add it directly
>> to `affected_refnames`. Instead, first just check whether `referent`
>> exists in the string list, and later add `new_update->refname`.
>
> Coincidentally[1] I came across this same leak, and my solution ended up
> slightly different. I'll share it here in case it's of interest.
>
> In your solution we end up searching the string list twice: once to see
> if we have the item, and then again to insert it. Whereas in the
> original we did both with a single search.
>
> But we can observe that either:
>
>   1. The item already existed, in which case our insert was a noop, and
>  we're good.
>
> or
>
>   2. We inserted it, in which case we proceed with creating new_update.
>
>  We can then in O(1) replace the pointer in the string list item
>  with the storage in new_update. We know we're not violating any
>  string_list invariants because the strings contain the same bytes.
>
> I.e.:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 9266f5ab9d..1d16c1b33e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store 
> *refs,
> update->flags |= REF_LOG_ONLY | REF_NODEREF;
> update->flags &= ~REF_HAVE_OLD;
>
> +   /*
> +* Re-point at the storage provided by our ref_update, which we know
> +* will last as long as the affected_refnames list.
> +*/
> +   item->string = new_update->refname;
> item->util = new_update;
>
> return 0;
>
> It feels pretty dirty, though. It would certainly be a bug if we ever
> decided to switch affected_refnames to duplicate its strings.
>
> So given that your solution is only a constant-time factor worse in
> efficiency, we should probably prefer it as the more maintainable
> option.

This is clever, but I don't like that it requires outside code to
change internal `string_list` structures in a way that is not
documented to be OK.

If we cared about getting rid of the extra `O(lg N)` search (and I
agree with you that it doesn't matter in this case), I think the clean
way to do it would be for `string_list` to expose a method like

struct string_list_item *string_list_insert_at_index(struct
string_list *list, size_t index, const char *string);

and to use it, together with `string_list_find_insert_index()`, to
avoid having to search twice.

Michael


Re: [PATCH 07/10] t1404: demonstrate two problems with reference transactions

2017-08-30 Thread Michael Haggerty
On 08/30/2017 07:21 PM, Stefan Beller wrote:
> On Tue, Aug 29, 2017 at 1:20 AM, Michael Haggerty <mhag...@alum.mit.edu> 
> wrote:
>> [...]
>> +test_expect_failure 'no bogus intermediate values during delete' '
>> +   prefix=refs/slow-transaction &&
>> +   # Set up a reference with differing loose and packed versions:
>> +   git update-ref $prefix/foo $C &&
>> +   git pack-refs --all &&
>> +   git update-ref $prefix/foo $D &&
>> +   git for-each-ref $prefix >unchanged &&
>> +   # Now try to update the reference, but hold the `packed-refs` lock
>> +   # for a while to see what happens while the process is blocked:
>> +   : >.git/packed-refs.lock &&
>> +   test_when_finished "rm -f .git/packed-refs.lock" &&
>> +   {
>> +   sleep 1 &&
>> +   rm -f .git/packed-refs.lock &
>> +   } &&
>> +   pid1=$! &&
>> +   {
>> +   # Note: the following command is intentionally run in the
>> +   # background. We extend the timeout so that `update-ref`
>> +   # tries to acquire the `packed-refs` lock longer than it
>> +   # takes the background process above to delete it:
>> +   git -c core.packedrefstimeout=2000 update-ref -d $prefix/foo 
>> &
>> +   } &&
>> +   pid2=$! &&
>> +   ok=true &&
>> +   while kill -0 $pid2 2>/dev/null
> 
> If sig is 0, then no signal is sent, but error checking is still
> performed; this can be used to check for the existence of a
> process ID or process group ID.
> 
> So the kill -0 is the idiomatic form of "while $pid2 is still alive"?
> ignoring errors due to the dev/null redirection?
> 
> And due to the nature of this test we have to have a busy
> loop, we cannot rate limit the cpu usage inside the loop
> via some shorter sleeps, as ideally we want to observe
> the ref at any time.

Correct on both counts.

I just noticed that there is a stray line `ok=true &&` from an earlier
draft. I'll remove that in v2.

> In an ideal world this test would instruct the kernel to interrupt
> the executing program (update-ref) at certain events such as
> touching/writing/deleting files and in each interrupt we could
> inspect the file system in a read only fashion.

A tool like `strace` could be used for tests like this, but it would be
terribly non-portable. (But I often use strace manually to check that
the ordering of filesystem events is correct.)

>> +   do
>> +   sha1=$(git rev-parse --verify --quiet $prefix/foo || echo 
>> undefined) &&
>> +   case "$sha1" in
>> +   $D)
>> +   # This is OK; it just means that nothing has 
>> happened yet.
>> +   : ;;
>> +   undefined)
>> +   # This is OK; it means the deletion was successful.
>> +   : ;;
>> +   $C)
>> +   # This value should never be seen. Probably the loose
>> +   # reference has been deleted but the packed reference
>> +   # is still there:
>> +   echo "$prefix/foo incorrectly observed to be C" &&
>> +   break
>> +   ;;
>> +   *)
>> +   # WTF?
>> +   echo "$prefix/foo unexpected value observed: $sha1" 
>> &&
>> +   break
>> +   ;;
>> +   esac
>> +   done >out &&
>> +   wait $pid1 &&
>> +   wait $pid2 &&
> 
> oh, you use explicit pids here to check each exit code.
> 
>> If anybody has suggestions for better ways to test these things,
>> please speak up :-)
> 
> I don't think I'd have a satisfactory answer to that, as the timing is 
> inherent
> to the things we test. In other software projects that are less low level, I
> would have suggested to use a time/clock mock, which can be stopped
> and then inspection can be performed at defined states.

I just realized that, given that the main goal here is to check the
value of the reference while `update-ref` is waiting on the
`packed-refs` lock, we can do the test without a busy loop. Instead, we
roughly

    : >.git/packed-refs.lock &&
{
git -c core.packedrefstimeout=2000 update-ref -d $prefix/foo &
} &&
pid2=$! &&
sleep 1 &&
# Verify that update-ref is still running:
kill -0 $pid2 &&
# ...verify that the reference still has its old value...
rm -f .git/packed-refs.lock &&
wait $pid2 &&
# ...verify that the reference is now gone...

It's true that this version wouldn't discover incorrect transitional
values of the reference that happen at other times, but that was very
unlikely anyway given the speed disparity between C and shell. I'll make
this change in v2.

Michael


[PATCH] name-rev: change ULONG_MAX to TIME_MAX

2017-08-30 Thread Michael J Gruber
Earlier, dddbad728c ("timestamp_t: a new data type for timestamps",
2017-04-26) changed several types to timestamp_t.

5589e87fd8 ("name-rev: change a "long" variable to timestamp_t",
2017-05-20) cleaned up a missed variable, but both missed a _MAX
constant.

Change the remaining constant to the one appropriate for the current
type

Signed-off-by: Michael J Gruber <g...@grubix.eu>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c41ea7c2a6..598da6c8bc 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -253,7 +253,7 @@ static int name_ref(const char *path, const struct 
object_id *oid, int flags, vo
struct commit *commit = (struct commit *)o;
int from_tag = starts_with(path, "refs/tags/");
 
-   if (taggerdate == ULONG_MAX)
+   if (taggerdate == TIME_MAX)
taggerdate = ((struct commit *)o)->date;
path = name_ref_abbrev(path, can_abbreviate_output);
name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-- 
2.14.1.603.gf58147c36e



Re: [PATCH] config: use a static lock_file struct

2017-08-30 Thread Michael Haggerty
On 08/30/2017 07:55 AM, Jeff King wrote:
> On Wed, Aug 30, 2017 at 12:55:55AM -0400, Jeff King wrote:
> [...]
> The patch below demonstrates how this could be used to turn some
> "xcalloc" lock_files into stack variables that are allowed to go out of
> scope after we commit or rollback. This solves the three lock-related
> leaks reported by valgrind when running t.
> 
> _But_ it also demonstrates an interesting downside of this approach.
> Some functions are lazy in their error paths. For instance, look at
> write_index_as_tree(). We take the lock early in the function, but may
> return before a commit or rollback if we hit an error.  With the current
> code this "leaks" the tempfile, which is wrong. But nobody notices
> because in practice the program exits soon after and we clean up the
> tempfile then.
> 
> But with a stack variable lock_file, that minor problem becomes a major
> one: our stack variable got added to a global linked list and the
> atexit() handler will try to read it. But now of course it will contain
> garbage, since the variable went out of scope.
> 
> So it's probably safer to just let tempfile.c handle the whole lifetime,
> and have it put all live tempfiles on the heap, and free them when
> they're deactivated. That means our creation signature becomes more
> like:
> 
>   struct tempfile *create_tempfile(const char *path);
> 
> and delete_tempfile() actually calls free() on it (though we'd probably
> want to skip the free() from a signal handler for the usual reasons).

I agree that the latter would be a nice, and relatively safe, design. It
would involve some fairly intrusive changes to client code, though.

I think it would be possible to implement the new API while leaving the
old one intact, to avoid having to rewrite all clients at once, and
potentially to allow clients to avoid a malloc if they already have a
convenient place to embed a `struct tempfile` (except that now they'd be
able to free it when done). For example, `create_tempfile(tempfile,
path)` and its friends could accept NULL as the first argument, in which
case it would malloc a `struct tempfile` itself, and mark it as being
owned by the tempfile module. Such objects would be freed when
deactivated. But if the caller passes in a non-NULL `tempfile` argument,
the old behavior would be retained.

Michael


Re: [PATCH] config: use a static lock_file struct

2017-08-30 Thread Michael Haggerty
On 08/30/2017 06:55 AM, Jeff King wrote:
> [...]
> Something like this, which AFAICT is about as safe as the existing code
> in its list manipulation. It retains the "active" flag as an additional
> check which I think isn't strictly necessary, but potentially catches
> some logic errors.
> 
> The strbuf_reset() calls become strbuf_release(), since we're promising
> the caller that they could now free the struct (or let it go out of
> scope) if they chose. Probably during a signal handler we should skip
> that (we know the struct is off the list and non-active at that point,
> but we could possibly hit libc's free() mutex).

This looks OK to me aside from the one caveat below.

> diff --git a/tempfile.c b/tempfile.c
> index 6843710670..a7d964ebf8 100644
> --- a/tempfile.c
> +++ b/tempfile.c
> [...]
> +static void deactivate_tempfile(struct tempfile *tempfile)
> +{
> + struct tempfile *volatile *p;
> +
> + if (!tempfile->active)
> + return;
> +
> + tempfile->active = 0;
> + for (p = &tempfile_list; *p; p = &(*p)->next) {
> + if (*p == tempfile) {
> + *p = tempfile->next;
> + break;
> + }
>   }
>  }

`deactivate_tempfile()` is O(N) in the number of active tempfiles. This
could get noticeable for, say, updating 30k references, which involves
obtaining 30k reference locks. I think that code adds the locks in
alphabetical order and also removes them in alphabetical order, so the
overall effort would go like O(N²). I'm guessing that this would be
measurable but not fatal for realistic numbers of references, but it
should at least be benchmarked.

There are three obvious ways to make this O(1) again:

* Make the list doubly-linked.
* Make sure that all significant callers remove items in the *reverse*
order that they were added, in which case the item to be removed would
always be found near the head of the list.
* Change this class to keep track of a tail pointer and add new items at
the end of the list, and ensure that all significant callers remove
locks in the *same* order that they were added.

The second and third options might be easy or difficult (depend on how
much current callers need to be changed) and certainly would be easy to
break again as new callers are added in the future.

> [...]

Michael


Re: [PATCH] config: use a static lock_file struct

2017-08-29 Thread Michael Haggerty
On Wed, Aug 30, 2017 at 6:31 AM, Jeff King <p...@peff.net> wrote:
> On Wed, Aug 30, 2017 at 05:25:18AM +0200, Michael Haggerty wrote:
>> It was surprisingly hard trying to get that code to do the right thing,
>> non-racily, in every error path. Since `remove_tempfiles()` can be
>> called any time (even from a signal handler), the linked list of
>> `tempfile` objects has to be kept valid at all times but can't use
>> mutexes. I didn't have the energy to keep going and make the lock
>> objects freeable.
>>
>> I suppose the task could be made a bit easier by using `sigprocmask(2)`
>> or `pthread_sigmask(3)` to make its running environment a little bit
>> less hostile.
>
> I think there are really two levels of carefulness here:
>
>   1. Avoiding complicated things during a signal handler that may rely
>  on having a sane state from the rest of the program (e.g.,
>  half-formed entries, stdio locks, etc).
>
>   2. Being truly race-free in the face of a signal arriving while we're
>  running arbitrary code that might have a tempfile struct in a funny
>  state.
>
> I feel like right now we meet (1) and not (2). But I think if we keep to
> that lower bar of (1), it might not be that bad. We're assuming now that
> there's no race on the tempfile->active flag, for instance. We could
> probably make a similar assumption about putting items onto or taking
> them off of a linked list (it's not really atomic, but a single pointer
> assignment is probably "atomic enough" for our purposes).
>
> Or I dunno. There's a lot of "volatile" modifiers sprinkled around.
> Maybe those are enough to give us (2) as well (though in that case, I
> think we'd still be as OK with the list manipulation as we are with the
> active flag manipulation).

I did in fact strive for both (1) and (2), though I know that there
are a couple of short races remaining in the code (plus who knows how
many bugs).

Actually, I think you're right that a single "probably-atomic" pointer
assignment would be enough to remove an entry from the list safely. I
was thinking about what it would take to make the list thread-safe
(which it currently is not, but probably doesn't need to be(?)).
Modifying the linked list is not that difficult if you assume that
there is only a single thread modifying the list at any given time.

Michael


Re: [PATCH] config: use a static lock_file struct

2017-08-29 Thread Michael Haggerty
On 08/29/2017 09:12 PM, Jeff King wrote:
> On Tue, Aug 29, 2017 at 12:09:28PM -0700, Brandon Williams wrote:
> 
>>> -- >8 --
>>> Subject: [PATCH] config: use a static lock_file struct
>>>
>>> When modifying git config, we xcalloc() a struct lock_file
>>> but never free it. This is necessary because the tempfile
>>> code (upon which the locking code is built) requires that
>>> the resulting struct remain valid through the life of the
>>> program. However, it also confuses leak-checkers like
>>> valgrind because only the inner "struct tempfile" is still
>>> reachable; no pointer to the outer lock_file is kept.
>>
>> Is this just due to a limitation in the tempfile code?  Would it be
>> possible to improve the tempfile code such that we don't need to require
>> that a tempfile, once created, is required to exist for the remaining
>> life of the program?
> 
> Yes. Like I wrote below:
> 
>>> ---
>>> In the long run we may want to drop the "tempfiles must remain forever"
>>> rule. This is certainly not the first time it has caused confusion or
>>> leaks. And I don't think it's a fundamental issue, just the way the code
>>> is written. But in the interim, this fix is probably worth doing.
> 
> The main issue is that the caller needs to make sure they're removed
> from the list (via commit or rollback) before being freed.
> 
> As far as I know anyway. This restriction dates back to the very early
> days of the lockfile code and has been carried through the various
> tempfile-cleanup refactorings over the years (mostly because each was
> afraid to make functional changes).
> 
> +cc Michael, who did most comprehensive cleanup of that code.

It was surprisingly hard trying to get that code to do the right thing,
non-racily, in every error path. Since `remove_tempfiles()` can be
called any time (even from a signal handler), the linked list of
`tempfile` objects has to be kept valid at all times but can't use
mutexes. I didn't have the energy to keep going and make the lock
objects freeable.

I suppose the task could be made a bit easier by using `sigprocmask(2)`
or `pthread_sigmask(3)` to make its running environment a little bit
less hostile.

Michael


Re: [PATCH 04/10] packed_delete_refs(): implement method

2017-08-29 Thread Michael Haggerty
On 08/29/2017 08:07 PM, Brandon Williams wrote:
> On 08/29, Michael Haggerty wrote:
>> [...]
>> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
>> index d19b3bfba5..83a088118f 100644
>> --- a/refs/packed-backend.c
>> +++ b/refs/packed-backend.c
>> @@ -1082,7 +1082,50 @@ static int packed_initial_transaction_commit(struct 
>> ref_store *ref_store,
>>  static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
>>   struct string_list *refnames, unsigned int flags)
>>  {
>> -die("BUG: not implemented yet");
>> +struct packed_ref_store *refs =
>> +packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
>> +struct strbuf err = STRBUF_INIT;
>> +struct ref_transaction *transaction;
>> +struct string_list_item *item;
>> +int ret;
>> +
>> +(void)refs; /* We need the check above, but don't use the variable */
> 
> Can't say I've seen a line like this before, Is the intent to just mark
> that this variable won't be used like you mention in the comment?


The `(void)refs;` part tells the compiler not to complain about a
variable that is defined but never used.

So why define the variable in the first place? I could instead do

(void)packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");

My reason (which people might disagree with) is that I like to keep this
boilerplate consistent from method to method. To switch to the second
variant, not only would the line look different, but it would also have
to be moved down several lines, past the other variables' declarations,
to avoid the dreaded "ISO C90 forbids mixed declarations and code".

I'd prefer to be able to mix declarations and code, but the project's
style prohibits it.

Michael


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-08-29 Thread Michael Haggerty
v3 looks good to me. Thanks!

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

Michael


Re: git describe and "the smallest number of commits possible"

2017-08-29 Thread Michael J Gruber
.3.93
 lightweight    14176 emacs-24.3.92
 lightweight   129847 emacs-25.1
 lightweight   129864 emacs-25.1-rc2
 lightweight   129904 emacs-25.1-rc1
 lightweight   129971 emacs-25.0.92
 annotated 129981 emacs-25.2
 lightweight   130004 emacs-25.0.95
 annotated 130016 emacs-25.2-rc2
 annotated 130032 emacs-25.2-rc1
 annotated 130076 emacs-25.1.91
 lightweight   130083 emacs-25.0.94
 lightweight   130099 emacs-25.0.91
 lightweight   130125 emacs-25.1.90
 lightweight   130239 emacs-25.0.93
 lightweight   130255 emacs-25.0.90
traversed 130257 commits
more than 26 tags found; listed 26 most recent
gave up search at 96b894717caa773aa6d98ff57385f1c7537e8972
emacs-24.5-rc3-fixed-13470-gdcc3ef3ee7

At least the latter kind of makes sense:
git rev-list --count emacs-24.5-rc3-fixed..dcc3ef3ee7
13470

In short, I think that depth calculation is heavily problematic. First,
commit_list_insert_by_date() influences which tags are considered at all.
Second, and worse, the code for "describe cmt" clearly tries to
calculate the number of commits in cmt that are not in t (for each
candidate t), which ought to be "rev-list --count t..cmt", but fails
spectacularly.

I kind of suspect the priming of the tag depth on first encounter
("t->depth = seen_commits - 1;") to be the culprit because that depends
on the way we walk, not the topology, but I wouldn't know how to do better.

Michael


Re: [PATCH v2 2/2] refs/files-backend: fix memory leak in lock_ref_for_update

2017-08-29 Thread Michael Haggerty
On 08/28/2017 10:32 PM, Martin Ågren wrote:
> After the previous patch, none of the functions we call hold on to
> `referent.buf`, so we can safely release the string buffer before
> returning.

This patch looks good to me, but I did notice a pre-existing problem in
the area...

> ---
>  refs/files-backend.c | 31 ---
>  1 file changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index bdb0e22e5..15f34b10e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> [...]
> @@ -2305,10 +2305,12 @@ static int lock_ref_for_update(struct files_ref_store 
> *refs,
>   strbuf_addf(err, "cannot lock ref '%s': 
> "
>   "error reading reference",
>   
> original_update_refname(update));
> - return -1;
> + ret = -1;
> + goto out;

It is incorrect to return -1 here. First of all, stylistically, the
return value should be a symbolic constant. But in fact, it should be
returning `TRANSACTION_GENERIC_ERROR` here, whereas -1 is
`TRANSACTION_NAME_CONFLICT`. So the code is not just stylistically
wrong; it is functionally wrong.

I know that this is not your mistake, but would you like to add another
patch to your series to fix this up? I'd do it myself, but it's a little
bit awkward because the fix will conflict with your patch.

Thanks,
Michael


Re: [PATCH v2 1/2] refs/files-backend: add longer-scoped copy of string to list

2017-08-29 Thread Michael Haggerty
On 08/28/2017 10:32 PM, Martin Ågren wrote:
> split_symref_update() receives a string-pointer `referent` and adds it
> to the list of `affected_refnames`. The list simply holds on to the
> pointers it is given, it does not copy the strings and it never frees
> them. The `referent` string in split_symref_update() belongs to a string
> buffer in the caller. After we return, the string will be leaked.
> 
> In the next patch, we want to properly release the string buffer in the
> caller, but we can't safely do so until we've made sure that
> `affected_refnames` will not be holding on to a pointer to the string.
> We could configure the list to handle its own resources, but it would
> mean some alloc/free-churning. The list is already handling other
> strings (through other code paths) which we do not need to worry about,
> and we'd be memory-churning those strings too, completely unnecessary.
> 
> Observe that split_symref_update() creates a `new_update`-object and
> that `new_update->refname` is then a copy of `referent`. The difference
> is, this copy will be freed, and it will be freed *after*
> `affected_refnames` has been cleared.
> 
> Rearrange the handling of `referent`, so that we don't add it directly
> to `affected_refnames`. Instead, first just check whether `referent`
> exists in the string list, and later add `new_update->refname`.
> 
> Helped-by: Michael Haggerty <mhag...@alum.mit.edu>
> Signed-off-by: Martin Ågren <martin.ag...@gmail.com>
> ---
> Thanks Junio and Michael for your comments on the first version. This
> first patch is now completely different and much much better (thanks
> Michael!). The commit message should also be better (sorry Junio...).
> The second one has a new commit message, but the diff is the same.
> 
> Martin
> 
>  refs/files-backend.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5cca55510..bdb0e22e5 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2140,13 +2140,12 @@ static int split_symref_update(struct files_ref_store 
> *refs,
>  
>   /*
>* First make sure that referent is not already in the
> -  * transaction. This insertion is O(N) in the transaction
> +  * transaction. This check is O(N) in the transaction

The check is not O(N); it is O(lg N) because it is implemented via a
binary search in the (sorted) `affected_refnames`. The insertion below
is still O(N), because it has to shift entries later in the list to the
right to make room for the new entry.

>* size, but it happens at most once per symref in a
>* transaction.
>*/
> - item = string_list_insert(affected_refnames, referent);
> - if (item->util) {
> - /* An entry already existed */
> + if (string_list_has_string(affected_refnames, referent)) {
> + /* An entry already exists */
>   strbuf_addf(err,
>   "multiple updates for '%s' (including one "
>   "via symref '%s') are not allowed",
> @@ -2181,6 +2180,15 @@ static int split_symref_update(struct files_ref_store 
> *refs,
>   update->flags |= REF_LOG_ONLY | REF_NODEREF;
>   update->flags &= ~REF_HAVE_OLD;
>  
> + /*
> +  * Add the referent. This insertion is O(N) in the transaction
> +  * size, but it happens at most once per symref in a
> +  * transaction. Make sure to add new_update->refname, which will
> +  * be valid as long as affected_refnames is in use, and NOT
> +  * referent, which might soon be freed by our caller.
> +  */
> + item = string_list_insert(affected_refnames, new_update->refname);
> + assert(!item->util);

We generally avoid using `assert()`. It would be preferable to use

if (item->util)
BUG("%s unexpectedly found in affected_refnames",
new_update->refname);

>   item->util = new_update;
>  
>   return 0;
> 

Otherwise, looks good!

Michael


[PATCH 03/10] packed_ref_store: implement reference transactions

2017-08-29 Thread Michael Haggerty
Implement the methods needed to support reference transactions for
the packed-refs backend. The new methods are not yet used.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 313 +-
 refs/packed-backend.h |   9 ++
 2 files changed, 319 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7e348feac3..d19b3bfba5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -744,25 +744,332 @@ static int packed_init_db(struct ref_store *ref_store, 
struct strbuf *err)
return 0;
 }
 
+/*
+ * Write the packed-refs from the cache to the packed-refs tempfile,
+ * incorporating any changes from `updates`. `updates` must be a
+ * sorted string list whose keys are the refnames and whose util
+ * values are `struct ref_update *`. On error, rollback the tempfile,
+ * write an error message to `err`, and return a nonzero value.
+ *
+ * The packed-refs file must be locked before calling this function and will
+ * remain locked when it is done.
+ */
+static int write_with_updates(struct packed_ref_store *refs,
+ struct string_list *updates,
+ struct strbuf *err)
+{
+   struct ref_iterator *iter = NULL;
+   size_t i;
+   int ok;
+   FILE *out;
+   struct strbuf sb = STRBUF_INIT;
+   char *packed_refs_path;
+
+   if (!is_lock_file_locked(&refs->lock))
+   die("BUG: write_with_updates() called while unlocked");
+
+   /*
+* If packed-refs is a symlink, we want to overwrite the
+* symlinked-to file, not the symlink itself. Also, put the
+* staging file next to it:
+*/
+   packed_refs_path = get_locked_file_path(&refs->lock);
+   strbuf_addf(&sb, "%s.new", packed_refs_path);
+   free(packed_refs_path);
+   if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
+   strbuf_addf(err, "unable to create file %s: %s",
+   sb.buf, strerror(errno));
+   strbuf_release(&sb);
+   return -1;
+   }
+   strbuf_release(&sb);
+
+   out = fdopen_tempfile(&refs->tempfile, "w");
+   if (!out) {
+   strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
+   strerror(errno));
+   goto error;
+   }
+
+   if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0)
+   goto write_error;
+
+   /*
+* We iterate in parallel through the current list of refs and
+* the list of updates, processing an entry from at least one
+* of the lists each time through the loop. When the current
+* list of refs is exhausted, set iter to NULL. When the list
+* of updates is exhausted, leave i set to updates->nr.
+*/
+   iter = packed_ref_iterator_begin(&refs->base, "",
+DO_FOR_EACH_INCLUDE_BROKEN);
+   if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+   iter = NULL;
+
+   i = 0;
+
+   while (iter || i < updates->nr) {
+   struct ref_update *update = NULL;
+   int cmp;
+
+   if (i >= updates->nr) {
+   cmp = -1;
+   } else {
+   update = updates->items[i].util;
+
+   if (!iter)
+   cmp = +1;
+   else
+   cmp = strcmp(iter->refname, update->refname);
+   }
+
+   if (!cmp) {
+   /*
+* There is both an old value and an update
+* for this reference. Check the old value if
+* necessary:
+*/
+   if ((update->flags & REF_HAVE_OLD)) {
+   if (is_null_oid(&update->old_oid)) {
+   strbuf_addf(err, "cannot update ref 
'%s': "
+   "reference already exists",
+   update->refname);
+   goto error;
+   } else if (oidcmp(&update->old_oid, iter->oid)) 
{
+   strbuf_addf(err, "cannot update ref 
'%s': "
+   "is at %s but expected %s",
+   update->refname,
+   oid_to_hex(iter->oid),
+   oid_to_hex(&update->old_oid));
+   goto error;
+ 

[PATCH 02/10] struct ref_transaction: add a place for backends to store data

2017-08-29 Thread Michael Haggerty
`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/refs-internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index b02dc5a7e3..d7d344de73 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -242,6 +242,7 @@ struct ref_transaction {
size_t alloc;
size_t nr;
enum ref_transaction_state state;
+   void *backend_data;
 };
 
 /*
-- 
2.14.1



[PATCH 10/10] files_transaction_finish(): delete reflogs before references

2017-08-29 Thread Michael Haggerty
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4f4c47b9db..36f81b4f28 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2630,6 +2630,27 @@ static int files_transaction_finish(struct ref_store 
*ref_store,
}
}
 
+   /*
+* Now that updates are safely completed, we can perform
+* deletes. First delete the reflogs of any references that
+* will be deleted, since (in the unexpected event of an
+* error) leaving a reference without a reflog is less bad
+* than leaving a reflog without a reference (the latter is a
+* mildly invalid repository state):
+*/
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+   if (update->flags & REF_DELETING &&
+   !(update->flags & REF_LOG_ONLY) &&
+   !(update->flags & REF_ISPRUNING)) {
+   strbuf_reset(&sb);
+   files_reflog_path(refs, &sb, update->refname);
+   if (!unlink_or_warn(sb.buf))
+   try_remove_empty_parents(refs, update->refname,
+
REMOVE_EMPTY_PARENTS_REFLOG);
+   }
+   }
+
/*
 * Perform deletes now that updates are safely completed.
 *
@@ -2666,20 +2687,6 @@ static int files_transaction_finish(struct ref_store 
*ref_store,
}
}
 
-   /* Delete the reflogs of any references that were deleted: */
-   for (i = 0; i < transaction->nr; i++) {
-   struct ref_update *update = transaction->updates[i];
-   if (update->flags & REF_DELETING &&
-   !(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
-   strbuf_reset(&sb);
-   files_reflog_path(refs, &sb, update->refname);
-   if (!unlink_or_warn(sb.buf))
-   try_remove_empty_parents(refs, update->refname,
-
REMOVE_EMPTY_PARENTS_REFLOG);
-   }
-   }
-
clear_loose_ref_cache(refs);
 
 cleanup:
-- 
2.14.1



[PATCH 09/10] packed-backend: rip out some now-unused code

2017-08-29 Thread Michael Haggerty
Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 193 --
 refs/packed-backend.h |   8 ---
 2 files changed, 201 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 83a088118f..90f44c1fbb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -91,19 +91,6 @@ struct ref_store *packed_ref_store_create(const char *path,
return ref_store;
 }
 
-/*
- * Die if refs is not the main ref store. caller is used in any
- * necessary error messages.
- */
-static void packed_assert_main_repository(struct packed_ref_store *refs,
- const char *caller)
-{
-   if (refs->store_flags & REF_STORE_MAIN)
-   return;
-
-   die("BUG: operation %s only allowed for main ref store", caller);
-}
-
 /*
  * Downcast `ref_store` to `packed_ref_store`. Die if `ref_store` is
  * not a `packed_ref_store`. Also die if `packed_ref_store` doesn't
@@ -321,40 +308,6 @@ static struct ref_dir *get_packed_refs(struct 
packed_ref_store *refs)
return get_packed_ref_dir(get_packed_ref_cache(refs));
 }
 
-/*
- * Add or overwrite a reference in the in-memory packed reference
- * cache. This may only be called while the packed-refs file is locked
- * (see packed_refs_lock()). To actually write the packed-refs file,
- * call commit_packed_refs().
- */
-void add_packed_ref(struct ref_store *ref_store,
-   const char *refname, const struct object_id *oid)
-{
-   struct packed_ref_store *refs =
-   packed_downcast(ref_store, REF_STORE_WRITE,
-   "add_packed_ref");
-   struct ref_dir *packed_refs;
-   struct ref_entry *packed_entry;
-
-   if (!is_lock_file_locked(&refs->lock))
-   die("BUG: packed refs not locked");
-
-   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
-   die("Reference has invalid format: '%s'", refname);
-
-   packed_refs = get_packed_refs(refs);
-   packed_entry = find_ref_entry(packed_refs, refname);
-   if (packed_entry) {
-   /* Overwrite the existing entry: */
-   oidcpy(&packed_entry->u.value.oid, oid);
-   packed_entry->flag = REF_ISPACKED;
-   oidclr(&packed_entry->u.value.peeled);
-   } else {
-   packed_entry = create_ref_entry(refname, oid, REF_ISPACKED);
-   add_ref_entry(packed_refs, packed_entry);
-   }
-}
-
 /*
  * Return the ref_entry for the given refname from the packed
  * references.  If it does not exist, return NULL.
@@ -592,152 +545,6 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 static const char PACKED_REFS_HEADER[] =
"# pack-refs with: peeled fully-peeled \n";
 
-/*
- * Write the current version of the packed refs cache from memory to
- * disk. The packed-refs file must already be locked for writing (see
- * packed_refs_lock()). Return zero on success. On errors, rollback
- * the lockfile, write an error message to `err`, and return a nonzero
- * value.
- */
-int commit_packed_refs(struct ref_store *ref_store, struct strbuf *err)
-{
-   struct packed_ref_store *refs =
-   packed_downcast(ref_store, REF_STORE_WRITE | REF_STORE_MAIN,
-   "commit_packed_refs");
-   struct packed_ref_cache *packed_ref_cache =
-   get_packed_ref_cache(refs);
-   int ok;
-   int ret = -1;
-   struct strbuf sb = STRBUF_INIT;
-   FILE *out;
-   struct ref_iterator *iter;
-   char *packed_refs_path;
-
-   if (!is_lock_file_locked(&refs->lock))
-   die("BUG: commit_packed_refs() called when unlocked");
-
-   /*
-* If packed-refs is a symlink, we want to overwrite the
-* symlinked-to file, not the symlink itself. Also, put the
-* staging file next to it:
-*/
-   packed_refs_path = get_locked_file_path(&refs->lock);
-   strbuf_addf(&sb, "%s.new", packed_refs_path);
-   if (create_tempfile(&refs->tempfile, sb.buf) < 0) {
-   strbuf_addf(err, "unable to create file %s: %s",
-   sb.buf, strerror(errno));
-   strbuf_release(&sb);
-   goto out;
-   }
-   strbuf_release(&sb);
-
-   out = fdopen_tempfile(&refs->tempfile, "w");
-   if (!out) {
-   strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s",
-   strerror(errno));
-   goto error;
-   }
-
-   if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0) {

[PATCH 08/10] files_ref_store: use a transaction to update packed refs

2017-08-29 Thread Michael Haggerty
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.

This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.

First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.

Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)

Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.

Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 132 +--
 t/t1404-update-ref-errors.sh |   4 +-
 2 files changed, 103 insertions(+), 33 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29c7c78602..4f4c47b9db 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2391,13 +2391,22 @@ static int lock_ref_for_update(struct files_ref_store 
*refs,
return 0;
 }
 
+struct files_transaction_backend_data {
+   struct ref_transaction *packed_transaction;
+   int packed_refs_locked;
+};
+
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
  */
-static void files_transaction_cleanup(struct ref_transaction *transaction)
+static void files_transaction_cleanup(struct files_ref_store *refs,
+ struct ref_transaction *transaction)
 {
size_t i;
+   struct files_transaction_backend_data *backend_data =
+   transaction->backend_data;
+   struct strbuf err = STRBUF_INIT;
 
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
@@ -2409,6 +2418,17 @@ static void files_transaction_cleanup(struct 
ref_transaction *transaction)
}
}
 
+   if (backend_data->packed_transaction &&
+   ref_transaction_abort(backend_data->packed_transaction, &err)) {
+   error("error aborting transaction: %s", err.buf);
+   strbuf_release(&err);
+   }
+
+   if (backend_data->packed_refs_locked)
+   packed_refs_unlock(refs->packed_ref_store);
+
+   free(backend_data);
+
transaction->state = REF_TRANSACTION_CLOSED;
 }
 
@@ -2425,12 +2445,17 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
char *head_ref = NULL;
int head_type;
struct object_id head_oid;
+   struct files_transaction_backend_data *backend_data;
+   struct ref_transaction *packed_transaction = NULL;
 
assert(err);
 
if (!transaction->nr)
goto cleanup;
 
+   backend_data = xcalloc(1, sizeof(*backend_data));
+   transaction->backend_data = backend_data;
+
/*
 * Fail if a refname appears more than once in the
 * transaction. (If we end up splitting up any updates using
@@ -2497,6 +2522,41 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
  head_ref, &affected_refnames, err);
if (ret)
break;
+
+   if (update->flags & REF_DELETING &&
+   !(update->flags & REF_LOG_ONLY) &&
+   !(update->flags &

[PATCH 01/10] packed-backend: don't adjust the reference count on lock/unlock

2017-08-29 Thread Michael Haggerty
The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because a locked packed-refs
file cannot be changed, so there is no reason for the cache to become
stale.

Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.

This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 412c85034f..7e348feac3 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -525,7 +525,6 @@ int packed_refs_lock(struct ref_store *ref_store, int 
flags, struct strbuf *err)
"packed_refs_lock");
static int timeout_configured = 0;
static int timeout_value = 1000;
-   struct packed_ref_cache *packed_ref_cache;
 
if (!timeout_configured) {
git_config_get_int("core.packedrefstimeout", &timeout_value);
@@ -560,9 +559,7 @@ int packed_refs_lock(struct ref_store *ref_store, int 
flags, struct strbuf *err)
 */
validate_packed_ref_cache(refs);
 
-   packed_ref_cache = get_packed_ref_cache(refs);
-   /* Increment the reference count to prevent it from being freed: */
-   acquire_packed_ref_cache(packed_ref_cache);
+   get_packed_ref_cache(refs);
return 0;
 }
 
@@ -576,7 +573,6 @@ void packed_refs_unlock(struct ref_store *ref_store)
if (!is_lock_file_locked(&refs->lock))
die("BUG: packed_refs_unlock() called when not locked");
rollback_lock_file(&refs->lock);
-   release_packed_ref_cache(refs->cache);
 }
 
 int packed_refs_is_locked(struct ref_store *ref_store)
-- 
2.14.1



<    1   2   3   4   5   6   7   8   9   10   >