[PATCH] pack-objects: turn off bitmaps when skipping objects

2014-03-14 Thread Jeff King
The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.

We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with -l when
you have alternates, .keep files.

Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:

  1. The code is much simpler, as we do not have to cleanly
     abort the bitmap-generation process midway through.

  2. We do not waste time partially generating bitmaps only
     to find out that some object deep in the history is not
     being packed.

Signed-off-by: Jeff King <p...@peff.net>
---
I posted this earlier here:

  http://article.gmane.org/gmane.comp.version-control.git/240969

The discussion resulted in the jk/repack-pack-keep-objects topic.
However, I think this is still worth applying, as it means git behaves
sensibly when objects are omitted for other reasons (e.g., because you
tried to use -b with an incremental repack, or because you favor
.keep files to bitmaps by explicitly setting repack.packKeptObjects
to false). In our previous discussions, I had assumed this patch had
already been picked up, but I don't see it anywhere. Without it, setting
repack.packKeptObjects to false is largely pointless (instead of
continuing without bitmaps, git will die).
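
To make the closure requirement from the commit message concrete, here is a
minimal sketch in C. It is a toy model and not git's code: struct toy_object,
toy_fill_bitmap() and TOY_NOT_IN_PACK are hypothetical names invented for
illustration, and the "pack" is limited to 64 objects so that a bitmap fits in
one uint64_t. The point it shows is that a reachable object without a bit
position leaves the walk with nothing to set, so the bitmap cannot be
completed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in git. */
#define TOY_NOT_IN_PACK (-1)
#define TOY_MAX_REFS 4

struct toy_object {
	int pack_pos;            /* bit position in the pack, or TOY_NOT_IN_PACK */
	int nr_refs;             /* number of valid entries in refs[] */
	int refs[TOY_MAX_REFS];  /* table indexes of the objects this one references */
};

/*
 * Set one bit in *bitmap for every object reachable from objs[idx].
 * Returns 0 on success, or -1 as soon as a reachable object has no bit
 * position, i.e. the pack is not closed under reachability.
 */
int toy_fill_bitmap(const struct toy_object *objs, int idx, uint64_t *bitmap)
{
	const struct toy_object *obj = &objs[idx];
	uint64_t bit;
	int i;

	if (obj->pack_pos == TOY_NOT_IN_PACK)
		return -1;
	assert(obj->pack_pos >= 0 && obj->pack_pos < 64);

	bit = UINT64_C(1) << obj->pack_pos;
	if (*bitmap & bit)
		return 0; /* already visited */
	*bitmap |= bit;

	for (i = 0; i < obj->nr_refs; i++)
		if (toy_fill_bitmap(objs, obj->refs[i], bitmap) < 0)
			return -1;
	return 0;
}
```

With a commit, tree and blob that are all in the pack, the walk succeeds and
sets three bits. Mark the blob as TOY_NOT_IN_PACK and the same walk fails
partway through, which roughly mirrors the late failure that the patch avoids
by turning bitmaps off up front.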

 builtin/pack-objects.c  | 12 +++-
 t/t5310-pack-bitmaps.sh |  5 -
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 418801f..4ca3946 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1011,6 +1011,10 @@ static void create_object_entry(const unsigned char *sha1,
 	entry->no_try_delta = no_try_delta;
 }
 
+static const char no_closure_warning[] = N_(
+"disabling bitmap writing, as some objects are not being packed"
+);
+
 static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int exclude)
 {
@@ -1021,8 +1025,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	if (have_duplicate_entry(sha1, exclude, &index_pos))
 		return 0;
 
-	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
+	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
+		/* The pack is missing an object, so it will not have closure */
+		if (write_bitmap_index) {
+			warning(_(no_closure_warning));
+			write_bitmap_index = 0;
+		}
 		return 0;
+	}
 
 	create_object_entry(sha1, type, pack_name_hash(name),
 			    exclude, name && no_try_delta(name),
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d3a3afa..f13525c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -91,7 +91,10 @@ test_expect_success 'fetch (partial bitmap)' '
 
 test_expect_success 'incremental repack cannot create bitmaps' '
 	test_commit more-1 &&
-	test_must_fail git repack -d
+	find .git/objects/pack -name "*.bitmap" >expect &&
+	git repack -d &&
+	find .git/objects/pack -name "*.bitmap" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'incremental repack can disable bitmaps' '
-- 
1.9.0.417.gc6bea4f
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Jeff King
On Wed, Jan 22, 2014 at 06:38:57PM -0800, Siddharth Agarwal wrote:

> Running git-next, writing bitmap indexes fails if a keep file is
> present from an earlier pack.

Right, that's expected.

The bitmap format cannot represent objects that are not present in the
pack. So we cannot write a bitmap index if any object reachable from a
packed commit is omitted from the pack.

We could be nicer and downgrade it to a warning, though. The patch below
does that.

> In our case we have .keep files lying around from ages ago (possibly
> due to kill -9s run on the server).

We ran into that problem at GitHub, too. We just turn off
`--honor-pack-keep` during our repacks, as we never want them on anyway
(and we would prefer to ignore the .keep than to abort the bitmap).

> It also means that running repack -a with bitmap writing enabled on a
> repo becomes problematic if a fetch is run concurrently.

For the most part, no. The .keep file should generally only be set
during the period between indexing the pack and updating the refs (so
while checking connectivity and running hooks). But pack-objects starts
from the ref tips and walks backwards. Until they are updated, it will
not try to pack the objects in the .keep files, as nobody references
them. There are two loopholes, though:

  1. In some instances, a remote may send an object we already have
     (e.g., because it is a blob referenced in an old commit, but newly
     referenced again due to a revert; we do not do a full object
     difference during the protocol negotiation, for reasons of
     efficiency). If that is the case, we may omit it if pack-objects
     starts during the period that the .pack and .keep files exist.

  2. Once the fetch updates the refs, it removes the .keep file. But
     this isn't atomic. A repack which starts between the two may pick
     up the new ref values, but also see the .keep file.

These are both unlikely, but possible on a very busy repository. The
patch below will downgrade each to a warning, rather than aborting the
repack.

So this should just work out of the box with this patch.  But if bitmaps
are important to you (say, you are running a very busy site and want
to make sure you always have bitmaps turned on) and you do not otherwise
care about .keep files, you may want to disable them, too.

-Peff

-- >8 --
Subject: pack-objects: turn off bitmaps when skipping objects

The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.

We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with -l when
you have alternates, .keep files.

Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:

  1. The code is much simpler, as we do not have to cleanly
     abort the bitmap-generation process midway through.

  2. We do not waste time partially generating bitmaps only
     to find out that some object deep in the history is not
     being packed.

Signed-off-by: Jeff King <p...@peff.net>
---
I tried to keep the warning to an 80-character line without making it
too confusing. Suggestions welcome if it doesn't make sense to people.

 builtin/pack-objects.c  | 12 +++-
 t/t5310-pack-bitmaps.sh |  5 -
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8364fbd..76831d9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1000,6 +1000,10 @@ static void create_object_entry(const unsigned char *sha1,
 	entry->no_try_delta = no_try_delta;
 }
 
+static const char no_closure_warning[] = N_(
+"disabling bitmap writing, as some objects are not being packed"
+);
+
 static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int exclude)
 {
@@ -1010,8 +1014,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	if (have_duplicate_entry(sha1, exclude, &index_pos))
 		return 0;
 
-	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
+	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
+   /* The pack is missing an object, so it will not have 

Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Siddharth Agarwal

On 01/23/2014 02:52 PM, Jeff King wrote:

> Right, that's expected.
>
> The bitmap format cannot represent objects that are not present in the
> pack. So we cannot write a bitmap index if any object reachable from a
> packed commit is omitted from the pack.
>
> We could be nicer and downgrade it to a warning, though. The patch below
> does that.


This makes sense.


> > In our case we have .keep files lying around from ages ago (possibly
> > due to kill -9s run on the server).
>
> We ran into that problem at GitHub, too. We just turn off
> `--honor-pack-keep` during our repacks, as we never want them on anyway
> (and we would prefer to ignore the .keep than to abort the bitmap).


Yes, we'd prefer to do that too. How do you actually do this, though? I 
don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its 
inverse?) down to `git-pack-objects`.



> > It also means that running repack -a with bitmap writing enabled on a
> > repo becomes problematic if a fetch is run concurrently.
>
> For the most part, no. The .keep file should generally only be set
> during the period between indexing the pack and updating the refs (so
> while checking connectivity and running hooks). But pack-objects starts
> from the ref tips and walks backwards. Until they are updated, it will
> not try to pack the objects in the .keep files, as nobody references
> them.


The worry is less certain objects not being packed and more the old 
packs being deleted by git repack, isn't it? From the man page for 
git-index-pack:


--keep
Before moving the index into its final destination create an empty .keep 
file for the associated pack file. This option is usually necessary with
--stdin to prevent a simultaneous git repack process from deleting the 
newly constructed pack and index before refs can be updated to use 
objects contained in the pack.


I could be misunderstanding things here, though. From the description in 
the man page it's not clear what the actual failure mode here is.



> There are two loopholes, though:
>
>   1. In some instances, a remote may send an object we already have
>      (e.g., because it is a blob referenced in an old commit, but newly
>      referenced again due to a revert; we do not do a full object
>      difference during the protocol negotiation, for reasons of
>      efficiency). If that is the case, we may omit it if pack-objects
>      starts during the period that the .pack and .keep files exist.
>
>   2. Once the fetch updates the refs, it removes the .keep file. But
>      this isn't atomic. A repack which starts between the two may pick
>      up the new ref values, but also see the .keep file.
>
> These are both unlikely, but possible on a very busy repository. The
> patch below will downgrade each to a warning, rather than aborting the
> repack.
>
> So this should just work out of the box with this patch.  But if bitmaps
> are important to you (say, you are running a very busy site and want
> to make sure you always have bitmaps turned on) and you do not otherwise
> care about .keep files, you may want to disable them, too.


We need to make sure bitmaps are always turned on, but we need to be 
even more certain that pushes don't fail due to races.




Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Siddharth Agarwal

On 01/23/2014 03:45 PM, Siddharth Agarwal wrote:


> The worry is less certain objects not being packed and more the old
> packs being deleted by git repack, isn't it? From the man page for
> git-index-pack:


This should probably be "new pack" and not "old packs", I guess. Not 
knowing much about how this actually works, I'm assuming the scenario 
here is something like:


(1) git receive-pack receives a pack P.pack and writes it to disk
(2) git index-pack runs on P.pack
(3) git repack runs separately, finds pack P.pack with no refs pointing 
to it, and deletes it

(4) everything goes wrong

With a keep file, this would be averted because

(1) git receive-pack receives a pack P.pack and writes it to disk
(2) git index-pack writes a keep file for P.pack, called P.keep
(3) git repack runs separately, finds pack P.pack with a keep file, 
doesn't touch it
(4) git index-pack finishes, and something updates refs to point to 
P.pack and deletes P.keep
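
The two timelines above can be sketched as a toy model in C. This is only an
illustration under stated assumptions: struct toy_pack and toy_repack() are
hypothetical names, and the model assumes a concurrent "git repack -a -d" that
copies only ref-reachable objects into its new pack and then removes every old
pack that has no .keep file.

```c
#include <assert.h>

/* Hypothetical toy model of one pack on disk; not git's data structures. */
struct toy_pack {
	int exists;      /* P.pack is on disk */
	int has_keep;    /* P.keep is on disk */
	int referenced;  /* some ref already reaches the objects in P.pack */
};

/*
 * Toy version of a concurrent "git repack -a -d" (an assumption of this
 * sketch): objects are carried over into the new pack only if a ref
 * reaches them, and the old pack is deleted unless a .keep file protects it.
 * Returns 1 if the pack's objects are still available afterwards.
 */
int toy_repack(struct toy_pack *p)
{
	assert(p->exists);
	if (p->has_keep)
		return 1;       /* pack is left untouched */
	p->exists = 0;          /* old pack is removed */
	return p->referenced;   /* unreferenced objects were never copied */
}
```

Running toy_repack() on a freshly received pack with no refs and no .keep
returns 0 (the "everything goes wrong" case), while the same pack with
has_keep set survives until the refs are updated.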



Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Vicent Martí
On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <s...@fb.com> wrote:
> Yes, we'd prefer to do that too. How do you actually do this, though? I
> don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its
> inverse?) down to `git-pack-objects`.

We run with this patch in production, it may be of use to you:
https://gist.github.com/vmg/8589317

In fact, it may be worth upstreaming too. I'll kindly ask peff to do
it when he has a moment.

Apologies for not attaching the patch inline, the GMail web UI doesn't
mix well with patch workflow.

Cheers,
vmg


Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Jeff King
On Fri, Jan 24, 2014 at 12:56:17AM +0100, Vicent Martí wrote:

> On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <s...@fb.com> wrote:
> > Yes, we'd prefer to do that too. How do you actually do this, though? I
> > don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its
> > inverse?) down to `git-pack-objects`.
>
> We run with this patch in production, it may be of use to you:
> https://gist.github.com/vmg/8589317
>
> In fact, it may be worth upstreaming too. I'll kindly ask peff to do
> it when he has a moment.

I was actually looking at it earlier when I sent this message. The
tricky thing about the patch is that it turns off --honor-pack-keep, but
does _not_ teach git-repack to clean up the .keep file.

Which I think is the right and safe thing to do, as otherwise you might
blow away a pack with .keep, even though you did not just pack its
objects (i.e., because it was written by a fetch or push which did not
yet update the refs). So the safe thing is to actually duplicate those
objects, leave the .keep pack around, and then assume it will get
cleaned up on the next repack.

If you _do_ have a stale .keep file, though, then that stale pack will
hang around forever (presumably with its objects duplicated in the
real pack).
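
The split described above (object selection versus pack deletion) can be
sketched as a small truth table in C. It is a toy model, not the patch in the
gist: struct toy_result and toy_repack_kept() are hypothetical names, the
honor_pack_keep parameter merely stands in for the `--honor-pack-keep` option,
and the sketch assumes the objects in the old pack are reachable from refs.

```c
#include <assert.h>

/* Hypothetical outcome of repacking one old pack whose objects are referenced. */
struct toy_result {
	int objects_in_new_pack;  /* objects were copied into the new pack */
	int old_pack_deleted;     /* the old pack was removed afterwards */
};

struct toy_result toy_repack_kept(int has_keep, int honor_pack_keep)
{
	struct toy_result r;

	assert(has_keep == 0 || has_keep == 1);
	assert(honor_pack_keep == 0 || honor_pack_keep == 1);

	/* Object selection: kept objects are skipped only when .keep is honored. */
	r.objects_in_new_pack = !(has_keep && honor_pack_keep);
	/* Deletion: a pack with a .keep file is never removed, in either mode. */
	r.old_pack_deleted = !has_keep;
	return r;
}
```

With has_keep=1 and honor_pack_keep=0 the result is "objects copied, old pack
still present", i.e. the duplication case. If that .keep file is stale, the
old pack stays in that state on every later repack.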

So I think the patch is doing the right thing, but I was still figuring
out how to explain it (and I hope I just did). I'll post it with a full
commit message tomorrow.

-Peff


Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Jeff King
On Thu, Jan 23, 2014 at 03:53:28PM -0800, Siddharth Agarwal wrote:

> On 01/23/2014 03:45 PM, Siddharth Agarwal wrote:
>
> > The worry is less certain objects not being packed and more the old
> > packs being deleted by git repack, isn't it? From the man page for
> > git-index-pack:
>
> This should probably be "new pack" and not "old packs", I guess. Not
> knowing much about how this actually works, I'm assuming the scenario
> here is something like:
>
> (1) git receive-pack receives a pack P.pack and writes it to disk
> (2) git index-pack runs on P.pack
> (3) git repack runs separately, finds pack P.pack with no refs
> pointing to it, and deletes it
> (4) everything goes wrong
>
> With a keep file, this would be averted because
>
> (1) git receive-pack receives a pack P.pack and writes it to disk
> (2) git index-pack writes a keep file for P.pack, called P.keep
> (3) git repack runs separately, finds pack P.pack with a keep file,
> doesn't touch it
> (4) git index-pack finishes, and something updates refs to point to
> P.pack and deletes P.keep

I think your understanding is accurate here. So we want repack to
respect keep files for deletion, but we do _not_ necessarily want
pack-objects to avoid packing an object just because it's in a pack
marked by .keep (see my other email).

-Peff


Re: [PATCH] pack-objects: turn off bitmaps when skipping objects

2014-01-23 Thread Siddharth Agarwal

On 01/23/2014 06:28 PM, Jeff King wrote:

> I think your understanding is accurate here. So we want repack to
> respect keep files for deletion, but we do _not_ necessarily want
> pack-objects to avoid packing an object just because it's in a pack
> marked by .keep (see my other email).


Yes, that makes sense and sounds pretty safe.

So the right solution for us probably is to apply the patch Vicent 
posted, set repack.honorpackkeep to false, and also have a cron job that 
cleans up stale .keep files so that subsequent repacks clean it up.
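
As a rough sketch of the staleness check such a cron job might use, the C
fragment below treats a .keep file as stale once its modification time is
older than some threshold. Everything here is an assumption made for
illustration: the thread does not say how staleness should be decided,
toy_is_stale() and toy_keep_file_is_stale() are hypothetical helpers, and the
threshold would have to be comfortably longer than the slowest legitimate
fetch or push, since a live one holds a .keep file on purpose.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Pure age check: 1 if mtime lies more than max_age_sec before now, else 0. */
int toy_is_stale(time_t mtime, time_t now, double max_age_sec)
{
	assert(max_age_sec >= 0);
	return difftime(now, mtime) > max_age_sec;
}

/*
 * Returns 1 if the file at path was last modified more than max_age_sec
 * seconds ago, 0 if it is younger, and -1 if it cannot be stat()ed.
 * Using mtime as the staleness signal is an assumption of this sketch.
 */
int toy_keep_file_is_stale(const char *path, double max_age_sec)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return toy_is_stale(st.st_mtime, time(NULL), max_age_sec);
}
```

A job built on this would only report or remove .keep files for which
toy_keep_file_is_stale() returns 1, and would leave anything younger than the
threshold alone.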
