Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Jeff King
On Mon, Mar 03, 2014 at 11:51:06AM -0800, Junio C Hamano wrote:

> > Yes. Do you need a re-roll from me? I think the last version I sent +
> > the squash to tie the default to bitmap-writing makes the most sense.
> 
> I have 9e20b390 (repack: add `repack.packKeptObjects` config var,
> 2014-02-26); I do not recall I've squashed anything into it, though.
> 
> Do you mean this one?
> 
> Here's the interdiff for doing the fallback:
> [...]

Yes. Though I just noticed that the commit message needs updating if
that is squashed in. Here is the whole patch, with that update.

And I am dropping Vicent as the author, since I think there are now
literally zero lines of his left. ;)

-- >8 --
Subject: [PATCH] repack: add `repack.packKeptObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
 already in the repository (e.g., blobs due to a revert of
 an old commit).

  2. Receive-pack updates the refs, making the objects
 reachable, but before it removes the .keep file, the
 repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option.  By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`
is turned on), but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King 
---
 Documentation/config.txt |  7 +++
 Documentation/git-repack.txt |  8 
 builtin/repack.c | 13 -
 t/t7700-repack.sh| 18 +-
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,13 @@ repack.usedeltabaseoffset::
"false" and repack. Access from old Git versions over the
native protocol are unaffected by this option.
 
+repack.packKeptObjects::
+   If set to true, makes `git repack` act as if
+   `--pack-kept-objects` was passed. See linkgit:git-repack[1] for
+   details. Defaults to `false` normally, but `true` if a bitmap
+   index is being written (either via `--write-bitmap-index` or
+   `pack.writeBitmaps`).
+
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..4786a78 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
must be able to refer to all reachable objects. This option
overrides the setting of `pack.writebitmaps`.
 
+--pack-kept-objects::
+   Include objects in `.keep` files when repacking.  Note that we
+   still do not delete `.keep` packs after `pack-objects` finishes.
+   This means that we may duplicate objects, but this makes the
+   option safe to use when there are concurrent pushes or fetches.
+   This option is generally only useful if you are writing bitmaps
+   with `-b` or `pack.writebitmaps`, as it ensures that the
+   bitmapped packfile has the necessary objects.
 
 Configuration
 -
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, 
void *cb)
delta_base_offset = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "repack.packkeptob

Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote:
>
>> > Or the flip side: if the user wants to use .keep, we should drop
>> > bitmaps. My point is that we do not know which way the user wants to
>> > go, so we should not tie the options together.
>> 
>> Hmph.  I think the short of your later explanation is "global config
>> may tell us to use bitmap, in which case we would need a way to
>> defeat that and have existing .keep honored, and it makes it easier
>> to do so if these two are kept separate, because you do not want to
>> run around and selectively disable bitmaps in these repositories.
>> We can instead do so with repack.packKeptObjects in the global
>> configuration." and I tend to agree with the reasoning.
>
> Yes. Do you need a re-roll from me? I think the last version I sent +
> the squash to tie the default to bitmap-writing makes the most sense.

I have 9e20b390 (repack: add `repack.packKeptObjects` config var,
2014-02-26); I do not recall I've squashed anything into it, though.

Do you mean this one?

Here's the interdiff for doing the fallback:

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3a3d84f..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
 repack.packKeptObjects::
If set to true, makes `git repack` act as if
`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
-   details. Defaults to false.
+   details. Defaults to `false` normally, but `true` if a bitmap
+   index is being written (either via `--write-bitmap-index` or
+   `pack.writeBitmaps`).
 
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
diff --git a/builtin/repack.c b/builtin/repack.c
index 49947b2..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,7 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
-static int pack_kept_objects;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, builtin_repack_options,
git_repack_usage, 0);
 
+   if (pack_kept_objects < 0)
+   pack_kept_objects = write_bitmap;
+
packdir = mkpathdup("%s/pack", get_object_directory());
packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f8431a8..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not 
repacked' '
objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
mv pack-* .git/objects/pack/ &&
-   git repack -A -d -l &&
+   git repack --no-pack-kept-objects -A -d -l &&
git prune-packed &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)
@@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not 
repacked' '
test -z "$found_duplicate_object"
 '
 
-test_expect_success '--pack-kept-objects duplicates objects' '
+test_expect_success 'writing bitmaps duplicates .keep objects' '
# build on $objsha1, $packsha1, and .keep state from previous
-   git repack -Adl --pack-kept-objects &&
+   git repack -Adl &&
test_when_finished "found_duplicate_object=" &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Shawn Pearce
On Fri, Feb 28, 2014 at 10:05 PM, Jeff King  wrote:
> On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote:
>
>> > Exactly. The two features (bitmaps and .keep) are not compatible with
>> > each other, so you have to prioritize one. If you are using static .keep
>> > files, you might want them to continue being respected at the expense of
>> > using bitmaps for that repo. So I think you want a separate option from
>> > --write-bitmap-index to allow the appropriate flexibility.
>>
>> Has anyone thought about how to make them compatible?
>
> Yes, but it's complicated and not likely to happen soon.
>
> Having .keep files means that you are not including some objects in the
> newly created pack. Each bit in a commit's bitmap corresponds to one
> object in the pack, and whether it is reachable from that commit. The
> bitmap is only useful if we can calculate the full reachability from it,
> and it has no way to specify objects outside of the pack.
>
> To fix this, you would need to change the on-disk format of the bitmaps
> to somehow reference objects outside of the pack. Either by having the
> bitmaps index a repo-global set of objects, or by permitting a list of
> "edge" objects that are referenced from the pack, but not included (and
> then when assembling the full reachable list, you would have to recurse
> across "edge" objects to find their reachable list in another pack,
> etc).
>
> So it's possible, but it would complicate the scheme quite a bit, and
> would not be backwards compatible with either JGit or C Git.

Colby Ranger always wanted to add this to the bitmap scheme. Construct
a partial pack bitmap on a partial pack of "recent" objects, with edge
pointers naming objects that are not in this pack but whose closures
need to be considered part of the bitmap. Its complicated in-memory
because you need to fuse together two or more bitmaps (the partial
pack one, and the larger historical kept pack) before running the
"want AND NOT have" computation.

Colby did not find time to work on this in JGit, so it just didn't get
implemented. But we did consider it, as the servers at Google we built
bitmap for use a multi-level pack scheme and don't want to rebuild
packs all of the time.

>> We're using Martin Fick's git-exproll script which makes heavy use of
>> keeps to reduce pack file churn. In addition to the on-disk benefits
>> we get there, the driving factor behind creating exproll was to
>> prevent Gerrit from having two large (30GB+) mostly duplicated pack
>> files open in memory at the same time. Repacking in JGit would help in
>> a single-master environment, but we'd be back to having this problem
>> once we go to a multi-master setup.
>>
>> Perhaps the solution here is actually something in JGit where it could
>> aggressively try to close references to pack files
>
> In C git we don't worry about this too much, because our programs tend
> to be short-lived, and references to the old pack will go away quickly.
> Plus it is all mmap'd, so as we simply stop accessing the pages of the
> old pack, they should eventually be dropped if there is memory pressure.
>
> I seem to recall that JGit does not mmap its packfiles. Does it pread?

JGit does not mmap because you can't munmap() until the Java GC gets
around to freeing the tiny little header object that contains the
memory address of the start of the mmap segment. This can take ages,
to the point where you run out of virtual address space in the process
and s**t starts to fail left and right inside of the JVM. The GC is
just unable to prioritize finding those tiny headers and getting them
out of the heap so the munmap can take place safely.

So yea, JGit does pread() for the blocks but it holds those in its own
buffer cache inside of the Java heap. Where a 4K disk block is a 4K
memory array that puts pressure on the GC to actually wake up and free
resources that are unused. What Nasser is talking about is JGit may
take a long time to realize one pack is unused and start kicking those
blocks out of its buffer cache. Those blocks are reference counted and
the file descriptor JGit preads from is held open so long as at least
one block is in the buffer cache. By keeping the file open we force
the filesystem to keep the inode alive a lot longer, which means the
disk needs a huge amount of free space to store the unlinked but still
open 30G pack files from prior GC generations.

> In that case, I'd expect unused bits from the duplicated packfile to get
> dropped from the disk cache over time. If it loads whole packfiles into
> memory, then yes, it should probably close more aggressively.

Its more than that, its the inode being kept alive by the open file
descriptor...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Jeff King
On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote:

> > Or the flip side: if the user wants to use .keep, we should drop
> > bitmaps. My point is that we do not know which way the user wants to
> > go, so we should not tie the options together.
> 
> Hmph.  I think the short of your later explanation is "global config
> may tell us to use bitmap, in which case we would need a way to
> defeat that and have existing .keep honored, and it makes it easier
> to do so if these two are kept separate, because you do not want to
> run around and selectively disable bitmaps in these repositories.
> We can instead do so with repack.packKeptObjects in the global
> configuration." and I tend to agree with the reasoning.

Yes. Do you need a re-roll from me? I think the last version I sent +
the squash to tie the default to bitmap-writing makes the most sense.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-03-03 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote:
>
>> > Exactly. The two features (bitmaps and .keep) are not compatible with
>> > each other, so you have to prioritize one. If you are using static .keep
>> > files, you might want them to continue being respected at the expense of
>> > using bitmaps for that repo. So I think you want a separate option from
>> > --write-bitmap-index to allow the appropriate flexibility.
>> 
>> What is "the appropriate flexibility", though?  If the user wants to
>> use bitmap, we would need to drop .keep, no?
>
> Or the flip side: if the user wants to use .keep, we should drop
> bitmaps. My point is that we do not know which way the user wants to
> go, so we should not tie the options together.

Hmph.  I think the short of your later explanation is "global config
may tell us to use bitmap, in which case we would need a way to
defeat that and have existing .keep honored, and it makes it easier
to do so if these two are kept separate, because you do not want to
run around and selectively disable bitmaps in these repositories.
We can instead do so with repack.packKeptObjects in the global
configuration." and I tend to agree with the reasoning.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote:

> > Exactly. The two features (bitmaps and .keep) are not compatible with
> > each other, so you have to prioritize one. If you are using static .keep
> > files, you might want them to continue being respected at the expense of
> > using bitmaps for that repo. So I think you want a separate option from
> > --write-bitmap-index to allow the appropriate flexibility.
> 
> Has anyone thought about how to make them compatible?

Yes, but it's complicated and not likely to happen soon.

Having .keep files means that you are not including some objects in the
newly created pack. Each bit in a commit's bitmap corresponds to one
object in the pack, and whether it is reachable from that commit. The
bitmap is only useful if we can calculate the full reachability from it,
and it has no way to specify objects outside of the pack.

To fix this, you would need to change the on-disk format of the bitmaps
to somehow reference objects outside of the pack. Either by having the
bitmaps index a repo-global set of objects, or by permitting a list of
"edge" objects that are referenced from the pack, but not included (and
then when assembling the full reachable list, you would have to recurse
across "edge" objects to find their reachable list in another pack,
etc).

So it's possible, but it would complicate the scheme quite a bit, and
would not be backwards compatible with either JGit or C Git.

> We're using Martin Fick's git-exproll script which makes heavy use of
> keeps to reduce pack file churn. In addition to the on-disk benefits
> we get there, the driving factor behind creating exproll was to
> prevent Gerrit from having two large (30GB+) mostly duplicated pack
> files open in memory at the same time. Repacking in JGit would help in
> a single-master environment, but we'd be back to having this problem
> once we go to a multi-master setup.
> 
> Perhaps the solution here is actually something in JGit where it could
> aggressively try to close references to pack files

In C git we don't worry about this too much, because our programs tend
to be short-lived, and references to the old pack will go away quickly.
Plus it is all mmap'd, so as we simply stop accessing the pages of the
old pack, they should eventually be dropped if there is memory pressure.

I seem to recall that JGit does not mmap its packfiles. Does it pread?
In that case, I'd expect unused bits from the duplicated packfile to get
dropped from the disk cache over time. If it loads whole packfiles into
memory, then yes, it should probably close more aggressively.

> , but that still
> doesn't help the disk churn problem. As Peff says below, we would want
> to repack often to get up-to-date bitmaps, but ideally we could do
> that without writing hundreds of GBs to disk (which is obviously worse
> when "disk" is a NFS mount).

Ultimately I think the solution to the churn problem is a packfile-like
storage that allows true appending of deltas. You can come up with a
scheme to allow deltas between on-disk packs (i.e., "thin" packs on
disk). The trick there is handling the dependencies and cycles. I think
you could get by with a strict ordering of packs and a few rules:

  1. An object in a pack with weight A cannot have as a base an object
 in a pack with weight <= A.

  2. A pack with weight A cannot be deleted if there exists a pack with
 weight > A.

But you'd want to also add in a single update-able index over all the
packfiles, and even then you'd still want to pack occasionally (because
you'd end up with deltas on bases going back in time, but you really
prefer your bases to be near the tip of history).

So I am not volunteering to work on it. :)

At GitHub we mostly deal with the churn by throwing more server
resources at it. But we have the advantage of having a very large number
of small-to-medium repos, which is relatively easy to scale up. A small
number of huge repos is trickier.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote:

> > Exactly. The two features (bitmaps and .keep) are not compatible with
> > each other, so you have to prioritize one. If you are using static .keep
> > files, you might want them to continue being respected at the expense of
> > using bitmaps for that repo. So I think you want a separate option from
> > --write-bitmap-index to allow the appropriate flexibility.
> 
> What is "the appropriate flexibility", though?  If the user wants to
> use bitmap, we would need to drop .keep, no?

Or the flip side: if the user wants to use .keep, we should drop
bitmaps. My point is that we do not know which way the user wants to
go, so we should not tie the options together.

> Doesn't always having two copies in two packs degrade performance
> unnecessarily (without even talking about wasted diskspace)?  An
> explicit .keep exists in the repository because it is expensive and
> undesirable to duplicate what is in there in the first place, so it
> feels to me that either

The benefits of static .keep files are (I think):

  1. less I/O during repacks, as you do not rewrite a static set of
 objects

  2. less turnover of packfiles, which can make dumb access more
 efficient (both for dumb clients, but also for things like
 non-git-aware backups).

I think the existence of smart-http more or less nullifies (2). For (1),
it helps at first, but you get diminishing returns as your non-keep
packfile grows. I think it only helps in pathological cases (e.g., you
mark 10GB worth of giant blobs in a .keep pack, and then pack the other
10MB of trees, commits, and normal-sized blobs as usual).

>  - Disable with warning, or outright refuse, the "-b" option if
>there is .keep (if we want to give precedence to .keep); or
> 
>  - Remove .keep with warning when "-b" option is given (if we want
>to give precedence to "-b").
> 
> and nothing else would be a reasonable option.  Unfortunately, we
> can do neither automatically because there could be a transient .keep
> file in an active repository.

Right, the transient ones complicate the issue. But I think even for
static .keep versus bitmaps, there is question. See below...

> > The default is another matter.  I think most people using .bitmaps on a
> > server would probably want to set repack.packKeptObjects.  They would
> > want to repack often to take advantage of the .bitmaps anyway, so they
> > probably don't care about .keep files (any they see are due to races
> > with incoming pushes).
> 
> ... which makes me think that repack.packKeptObjects is merely a
> distraction---it should be enough to just pass "--pack-kept-objects"
> when "-b" is asked, without giving any extra configurability, no?

But you do not necessarily ask for "-b" explicitly; it might come from
the config, too. Imagine you have a server with many repos. You want to
use bitmaps when you can, so you set pack.writeBitmaps in
/etc/gitconfig. But in a few repos, you want to use .keep files, and
it's more important for you to use it than bitmaps (e.g., because it is
one of the pathological cases above). So you set repack.packKeptObjects
to false in /etc/gitconfig, to prefer .keep to bitmaps where
appropriate.

If you did not have that config option, your alternative would be to
turn off pack.writeBitmaps in the repositories with .keep files. But
then you need to per-repo keep that flag in sync with whether or not the
repo has .keep files.

To be clear, at GitHub we do not plan on ever having
repack.packKeptObjects off (for now we have it on explicitly, but if it
were connected to pack.writeBitmaps, then we would be happy with that
default). I am mostly trying to give an escape hatch to let people use
different optimization strategies if they want.

If we are going to have --pack-kept-objects (and I think we should),
I think we also should have a matching config option. Because it is
useful when matched with the bitmap code, and that can be turned on both
from the command-line or from the config. Wherever you are doing that,
you would want to be able to make the matching .keep decision.

And I don't think it hurts much.  With the fallback-to-on behavior, you
do not have to even care that it is there unless you are doing something
clever.

> > So we could do something like falling back to turning the option on if
> > --write-bitmap-index is on _and_ the user didn't specify
> > --pack-kept-objects.
> 
> If you mean "didn't specify --no-pack-kept-objects", then I think
> that is sensible.  I still do not know why we would want the
> configuration variable, though.

Right, I meant "if the pack-kept-objects variable is set, either on or
off, either via the command line or in the config".

As far as what the config is good for, see above.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:
>
>> I wonder if it makes sense to link it with "pack.writebitmaps" more
>> tightly, without even exposing it as a seemingly orthogonal knob
>> that can be tweaked, though.
>> 
>> I think that is because I do not fully understand the ", because ..."
>> part of the below:
>> 
>> >> This patch introduces an option to disable the
>> >> `--honor-pack-keep` option.  It is not triggered by default,
>> >> even when pack.writeBitmaps is turned on, because its use
>> >> depends on your overall packing strategy and use of .keep
>> >> files.
>> 
>> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
>> do want the bitmap-index to be written, and unless you tell
>> pack-objects to ignore the .keep marker, it cannot do so, no?
>> 
>> Does the ", because ..." part above mean "you may have an overall
>> packing strategy to use .keep file to not ever repack some subset of
>> the objects, so we will not silently explode the kept objects into a
>> new pack"?
>
> Exactly. The two features (bitmaps and .keep) are not compatible with
> each other, so you have to prioritize one. If you are using static .keep
> files, you might want them to continue being respected at the expense of
> using bitmaps for that repo. So I think you want a separate option from
> --write-bitmap-index to allow the appropriate flexibility.

What is "the appropriate flexibility", though?  If the user wants to
use bitmap, we would need to drop .keep, no?  Doesn't always having
two copies in two packs degrade performance unnecessarily (without
even talking about wasted diskspace)?  An explicit .keep exists in
the repository because it is expensive and undesirable to duplicate
what is in there in the first place, so it feels to me that either

 - Disable with warning, or outright refuse, the "-b" option if
   there is .keep (if we want to give precedence to .keep); or

 - Remove .keep with warning when "-b" option is given (if we want
   to give precedence to "-b").

and nothing else would be a reasonable option.  Unfortunately, we
can do neither automatically because there could be a transient .keep
file in an active repository.

So I think I agree with this...

> The default is another matter.  I think most people using .bitmaps on a
> server would probably want to set repack.packKeptObjects.  They would
> want to repack often to take advantage of the .bitmaps anyway, so they
> probably don't care about .keep files (any they see are due to races
> with incoming pushes).

... which makes me think that repack.packKeptObjects is merely a
distraction---it should be enough to just pass "--pack-kept-objects"
when "-b" is asked, without giving any extra configurability, no?

> So we could do something like falling back to turning the option on if
> --write-bitmap-index is on _and_ the user didn't specify
> --pack-kept-objects.

If you mean "didn't specify --no-pack-kept-objects", then I think
that is sensible.  I still do not know why we would want the
configuration variable, though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Nasser Grainawi
On Feb 28, 2014, at 1:55 AM, Jeff King  wrote:

> On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:
> 
>> I wonder if it makes sense to link it with "pack.writebitmaps" more
>> tightly, without even exposing it as a seemingly orthogonal knob
>> that can be tweaked, though.
>> 
>> I think that is because I do not fully understand the ", because ..."
>> part of the below:
>> 
 This patch introduces an option to disable the
 `--honor-pack-keep` option.  It is not triggered by default,
 even when pack.writeBitmaps is turned on, because its use
 depends on your overall packing strategy and use of .keep
 files.
>> 
>> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
>> do want the bitmap-index to be written, and unless you tell
>> pack-objects to ignore the .keep marker, it cannot do so, no?
>> 
>> Does the ", because ..." part above mean "you may have an overall
>> packing strategy to use .keep file to not ever repack some subset of
>> the objects, so we will not silently explode the kept objects into a
>> new pack"?
> 
> Exactly. The two features (bitmaps and .keep) are not compatible with
> each other, so you have to prioritize one. If you are using static .keep
> files, you might want them to continue being respected at the expense of
> using bitmaps for that repo. So I think you want a separate option from
> --write-bitmap-index to allow the appropriate flexibility.

Has anyone thought about how to make them compatible? We're using Martin Fick's 
git-exproll script which makes heavy use of keeps to reduce pack file churn. In 
addition to the on-disk benefits we get there, the driving factor behind 
creating exproll was to prevent Gerrit from having two large (30GB+) mostly 
duplicated pack files open in memory at the same time. Repacking in JGit would 
help in a single-master environment, but we'd be back to having this problem 
once we go to a multi-master setup.

Perhaps the solution here is actually something in JGit where it could 
aggressively try to close references to pack files, but that still doesn't help 
the disk churn problem. As Peff says below, we would want to repack often to 
get up-to-date bitmaps, but ideally we could do that without writing hundreds 
of GBs to disk (which is obviously worse when "disk" is a NFS mount).

> 
> The default is another matter.  I think most people using .bitmaps on a
> server would probably want to set repack.packKeptObjects.  They would
> want to repack often to take advantage of the .bitmaps anyway, so they
> probably don't care about .keep files (any they see are due to races
> with incoming pushes).
> 
> So we could do something like falling back to turning the option on if
> --write-bitmap-index is on _and_ the user didn't specify
> --pack-kept-objects. The existing default is mostly there because it is
> the conservative choice (.keep files continue to do their thing as
> normal unless you say otherwise). But the fallback thing would be one
> less knob that bitmap users would need to turn in the common case.
> 
> Here's the interdiff for doing the fallback:
> 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 3a3d84f..a8ddc7f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
> repack.packKeptObjects::
>   If set to true, makes `git repack` act as if
>   `--pack-kept-objects` was passed. See linkgit:git-repack[1] for
> - details. Defaults to false.
> + details. Defaults to `false` normally, but `true` if a bitmap
> + index is being written (either via `--write-bitmap-index` or
> + `pack.writeBitmaps`).
> 
> rerere.autoupdate::
>   When set to true, `git-rerere` updates the index with the
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 49947b2..6b0b62d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -9,7 +9,7 @@
> #include "argv-array.h"
> 
> static int delta_base_offset = 1;
> -static int pack_kept_objects;
> +static int pack_kept_objects = -1;
> static char *packdir, *packtmp;
> 
> static const char *const git_repack_usage[] = {
> @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char 
> *prefix)
>   argc = parse_options(argc, argv, prefix, builtin_repack_options,
>   git_repack_usage, 0);
> 
> + if (pack_kept_objects < 0)
> + pack_kept_objects = write_bitmap;
> +
>   packdir = mkpathdup("%s/pack", get_object_directory());
>   packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> 
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index f8431a8..b1eed5c 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not 
> repacked' '
>   objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
>   sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
>   mv pack-* .git/objec

Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-28 Thread Jeff King
On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:

> I wonder if it makes sense to link it with "pack.writebitmaps" more
> tightly, without even exposing it as a seemingly orthogonal knob
> that can be tweaked, though.
> 
> I think that is because I do not fully understand the ", because ..."
> part of the below:
> 
> >> This patch introduces an option to disable the
> >> `--honor-pack-keep` option.  It is not triggered by default,
> >> even when pack.writeBitmaps is turned on, because its use
> >> depends on your overall packing strategy and use of .keep
> >> files.
> 
> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
> do want the bitmap-index to be written, and unless you tell
> pack-objects to ignore the .keep marker, it cannot do so, no?
> 
> Does the ", because ..." part above mean "you may have an overall
> packing strategy to use .keep file to not ever repack some subset of
> the objects, so we will not silently explode the kept objects into a
> new pack"?

Exactly. The two features (bitmaps and .keep) are not compatible with
each other, so you have to prioritize one. If you are using static .keep
files, you might want them to continue being respected at the expense of
using bitmaps for that repo. So I think you want a separate option from
--write-bitmap-index to allow the appropriate flexibility.

The default is another matter.  I think most people using .bitmaps on a
server would probably want to set repack.packKeptObjects.  They would
want to repack often to take advantage of the .bitmaps anyway, so they
probably don't care about .keep files (any they see are due to races
with incoming pushes).

So we could do something like falling back to turning the option on if
--write-bitmap-index is on _and_ the user didn't specify
--pack-kept-objects. The existing default is mostly there because it is
the conservative choice (.keep files continue to do their thing as
normal unless you say otherwise). But the fallback thing would be one
less knob that bitmap users would need to turn in the common case.

Here's the interdiff for doing the fallback:

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3a3d84f..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
 repack.packKeptObjects::
If set to true, makes `git repack` act as if
`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
-   details. Defaults to false.
+   details. Defaults to `false` normally, but `true` if a bitmap
+   index is being written (either via `--write-bitmap-index` or
+   `pack.writeBitmaps`).
 
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
diff --git a/builtin/repack.c b/builtin/repack.c
index 49947b2..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,7 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
-static int pack_kept_objects;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, builtin_repack_options,
git_repack_usage, 0);
 
+   if (pack_kept_objects < 0)
+   pack_kept_objects = write_bitmap;
+
packdir = mkpathdup("%s/pack", get_object_directory());
packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f8431a8..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not 
repacked' '
objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
mv pack-* .git/objects/pack/ &&
-   git repack -A -d -l &&
+   git repack --no-pack-kept-objects -A -d -l &&
git prune-packed &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)
@@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not 
repacked' '
test -z "$found_duplicate_object"
 '
 
-test_expect_success '--pack-kept-objects duplicates objects' '
+test_expect_success 'writing bitmaps duplicates .keep objects' '
# build on $objsha1, $packsha1, and .keep state from previous
-   git repack -Adl --pack-kept-objects &&
+   git repack -Adl &&
test_when_finished "found_duplicate_object=" &&
for p in .git/objects/pack/*.idx; do
idx=$(basename $p)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-27 Thread Junio C Hamano
Jeff King  writes:

> Of all of them, I think --pack-kept-objects is probably the best. And I
> think we are hitting diminishing returns in thinking too much more on
> the name. :)

True enough.

I wonder if it makes sense to link it with "pack.writebitmaps" more
tightly, without even exposing it as a seemingly orthogonal knob
that can be tweaked, though.

I think that is because I do not fully understand the ", because ..."
part of the below:

>> This patch introduces an option to disable the
>> `--honor-pack-keep` option.  It is not triggered by default,
>> even when pack.writeBitmaps is turned on, because its use
>> depends on your overall packing strategy and use of .keep
>> files.

If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
do want the bitmap-index to be written, and unless you tell
pack-objects to ignore the .keep marker, it cannot do so, no?

Does the ", because ..." part above mean "you may have an overall
packing strategy to use .keep file to not ever repack some subset of
the objects, so we will not silently explode the kept objects into a
new pack"?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-27 Thread Jeff King
On Wed, Feb 26, 2014 at 12:30:36PM -0800, Junio C Hamano wrote:

> >> pack-kept-objects then?
> >
> > Hmm. That does address my point above, but somehow the word "kept" feels
> > awkward to me. I'm ambivalent between the two.
> 
> That word does make my backside somewhat itchy ;-)
> 
> Would it help to take a step back and think what the option really
> does?  Perhaps we should call it --pack-all-objects, which is short
> for --pack-all-objectsregardless-of-where-they-currently-are-stored,
> or something?  The word "all" gives a wrong connotation in a
> different way (e.g. "regardless of reachability" is a possible wrong
> interpretation), so that does not sound too good, either.

I do not think "all" is what we want to say. It only affects "kept"
objects, not any of the other exclusions (e.g., "-l").

> "--repack-kept-objects"?  "--include-kept-objects"?

Of all of them, I think --pack-kept-objects is probably the best. And I
think we are hitting diminishing returns in thinking too much more on
the name. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-26 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote:
>
>> > The best name I could come up with is "--pack-keep-objects", since that
>> > is literally what it is doing. I'm not wild about the name because it is
>> > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
>> > but suggestions are welcome.
>> 
>> pack-kept-objects then?
>
> Hmm. That does address my point above, but somehow the word "kept" feels
> awkward to me. I'm ambivalent between the two.

That word does make my backside somewhat itchy ;-)

Would it help to take a step back and think what the option really
does?  Perhaps we should call it --pack-all-objects, which is short
for --pack-all-objectsregardless-of-where-they-currently-are-stored,
or something?  The word "all" gives a wrong connotation in a
different way (e.g. "regardless of reachability" is a possible wrong
interpretation), so that does not sound too good, either.

"--repack-kept-objects"?  "--include-kept-objects"?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-26 Thread Jeff King
On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote:

> > The best name I could come up with is "--pack-keep-objects", since that
> > is literally what it is doing. I'm not wild about the name because it is
> > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
> > but suggestions are welcome.
> 
> pack-kept-objects then?

Hmm. That does address my point above, but somehow the word "kept" feels
awkward to me. I'm ambivalent between the two.

Here's the "kept" version if you want to apply that.

-- >8 --
From: Vicent Marti 
Subject: [PATCH] repack: add `repack.packKeptObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
 already in the repository (e.g., blobs due to a revert of
 an old commit).

  2. Receive-pack updates the refs, making the objects
 reachable, but before it removes the .keep file, the
 repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King 
---
 Documentation/config.txt |  5 +
 Documentation/git-repack.txt |  8 
 builtin/repack.c | 10 +-
 t/t7700-repack.sh| 16 
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..3a3d84f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset::
"false" and repack. Access from old Git versions over the
native protocol are unaffected by this option.
 
+repack.packKeptObjects::
+   If set to true, makes `git repack` act as if
+   `--pack-kept-objects` was passed. See linkgit:git-repack[1] for
+   details. Defaults to false.
+
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..4786a78 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
must be able to refer to all reachable objects. This option
overrides the setting of `pack.writebitmaps`.
 
+--pack-kept-objects::
+   Include objects in `.keep` files when repacking.  Note that we
+   still do not delete `.keep` packs after `pack-objects` finishes.
+   This means that we may duplicate objects, but this makes the
+   option safe to use when there are concurrent pushes or fetches.
+   This option is generally only useful if you are writing bitmaps
+   with `-b` or `pack.writebitmaps`, as it ensures that the
+   bitmapped packfile has the necessary objects.
 
 Configuration
 -
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..49947b2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int pack_kept_objects;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, 
void *cb)
delta_base_offset = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "repack.packkeptobjects")) {
+   pack_kept_objects = git_config_bool(var, value);
+   return 0;
+   }
return git_default_config(var, value, cb);
 }
 
@@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
N_("limits the maximum delta depth")),
OPT_STRING(0, "max-pack-size", &max_pack_size, 

Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-24 Thread Junio C Hamano
Jeff King  writes:

> Sorry, this one slipped through the cracks. Here's a re-roll addressing
> your comments.
> ...
>>  - In the context of "pack-objects", the name "--honor-pack-keep"
>>makes sense; it is understood that pack-objects will _not_ remove
>>kept packfile, so "honoring" can only mean "do not attempt to
>>pick objects out of kept packs to add to the pack being
>>generated." and there is no room for --no-honor-pack-keep to be
>>mistaken as "you canremove the ones marked to be kept after
>>saving the still-used objects in it away."
>> 
>>But does the same name make sense in the context of "repack"?
>
> I think the distinction you are making is to capture the second second
> from the docs:
>
>   If set to false, include objects in `.keep` files when repacking via
>   `git repack`. Note that we still do not delete `.keep` packs after
>   `pack-objects` finishes.
>
> The best name I could come up with is "--pack-keep-objects", since that
> is literally what it is doing. I'm not wild about the name because it is
> easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
> but suggestions are welcome.

pack-kept-objects then?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-02-24 Thread Jeff King
On Tue, Jan 28, 2014 at 01:21:43AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The git-repack command always passes `--honor-pack-keep`
> > to pack-objects. This has traditionally been a good thing,
> > as we do not want to duplicate those objects in a new pack,
> > and we are not going to delete the old pack.
> > ...
> > Note that this option just disables the pack-objects
> > behavior. We still leave packs with a .keep in place, as we
> > do not necessarily know that we have duplicated all of their
> > objects.
> >
> > Signed-off-by: Jeff King 
> > ---
> > Intended for the jk/pack-bitmap topic.
> 
> Two comments.

Sorry, this one slipped through the cracks. Here's a re-roll addressing
your comments.

>  - It seems that this adds a configuration variable that cannot be
>countermanded from the command line. It has to come with either a
>very good justification in the documentation describing why it is
>a bad idea to even allow overriding from the command line in a
>repository that sets it, or a command line option to let the
>users override it. I personally prefer the latter, because that
>will be one less thing for users to remember (i.e. "usually you
>can override the configured default from the command line, but
>this variable cannot be because of these very good reasons").

It was less "it is a bad idea to override" and more "I cannot conceive
of any good reason to override". And since you can always use "git -c",
I didn't think it was worth cluttering repack's options.

However, I suppose if one were explicitly bitmapping a single invocation
with `git repack -adb`, it might make sense to use it on the command
line. Fixed in the re-roll.

>  - In the context of "pack-objects", the name "--honor-pack-keep"
>makes sense; it is understood that pack-objects will _not_ remove
>kept packfile, so "honoring" can only mean "do not attempt to
>pick objects out of kept packs to add to the pack being
>generated." and there is no room for --no-honor-pack-keep to be
>mistaken as "you canremove the ones marked to be kept after
>saving the still-used objects in it away."
> 
>But does the same name make sense in the context of "repack"?

I think the distinction you are making is to capture the second second
from the docs:

  If set to false, include objects in `.keep` files when repacking via
  `git repack`. Note that we still do not delete `.keep` packs after
  `pack-objects` finishes.

The best name I could come up with is "--pack-keep-objects", since that
is literally what it is doing. I'm not wild about the name because it is
easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
but suggestions are welcome.

-- >8 --
From: Vicent Marti 
Subject: repack: add `repack.packKeepObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
 already in the repository (e.g., blobs due to a revert of
 an old commit).

  2. Receive-pack updates the refs, making the objects
 reachable, but before it removes the .keep file, the
 repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King 
---
I added a test, too, mostly to make sure I didn't bungle the option,
since it's negated from its former name.

 Documentation/config.txt |  5 +
 Documentation/git-repack.txt |  8 
 builtin/repack.c | 10 +-
 t/t7700-repack.sh| 16 
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..8ad081e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset::
 

Re: [PATCH] repack: add `repack.honorpackkeep` config var

2014-01-28 Thread Junio C Hamano
Jeff King  writes:

> The git-repack command always passes `--honor-pack-keep`
> to pack-objects. This has traditionally been a good thing,
> as we do not want to duplicate those objects in a new pack,
> and we are not going to delete the old pack.
> ...
> Note that this option just disables the pack-objects
> behavior. We still leave packs with a .keep in place, as we
> do not necessarily know that we have duplicated all of their
> objects.
>
> Signed-off-by: Jeff King 
> ---
> Intended for the jk/pack-bitmap topic.

Two comments.

 - It seems that this adds a configuration variable that cannot be
   countermanded from the command line. It has to come with either a
   very good justification in the documentation describing why it is
   a bad idea to even allow overriding from the command line in a
   repository that sets it, or a command line option to let the
   users override it. I personally prefer the latter, because that
   will be one less thing for users to remember (i.e. "usually you
   can override the configured default from the command line, but
   this variable cannot be because of these very good reasons").

 - In the context of "pack-objects", the name "--honor-pack-keep"
   makes sense; it is understood that pack-objects will _not_ remove
   kept packfile, so "honoring" can only mean "do not attempt to
   pick objects out of kept packs to add to the pack being
   generated." and there is no room for --no-honor-pack-keep to be
   mistaken as "you canremove the ones marked to be kept after
   saving the still-used objects in it away."

   But does the same name make sense in the context of "repack"?

Thanks. 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] repack: add `repack.honorpackkeep` config var

2014-01-27 Thread Jeff King
On Thu, Jan 23, 2014 at 06:44:43PM -0800, Siddharth Agarwal wrote:

> On 01/23/2014 06:28 PM, Jeff King wrote:
> >I think your understanding is accurate here. So we want repack to
> >respect keep files for deletion, but we _not_ necessarily want
> >pack-objects to avoid packing an object just because it's in a pack
> >marked by .keep (see my other email).
> 
> Yes, that makes sense and sounds pretty safe.
> 
> So the right solution for us probably is to apply the patch Vicent
> posted, set repack.honorpackkeep to false, and also have a cron job
> that cleans up stale .keep files so that subsequent repacks clean it
> up.

Yes, that matches what we do at GitHub.

Here's Vicent's patch, with documentation and an expanded commit
message. I think it should be suitable for upstream git.

-- >8 --
From: Vicent Marti 
Subject: repack: add `repack.honorpackkeep` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
 already in the repository (e.g., blobs due to a revert of
 an old commit).

  2. Receive-pack updates the refs, making the objects
 reachable, but before it removes the .keep file, the
 repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King 
---
Intended for the jk/pack-bitmap topic.

 Documentation/config.txt | 8 
 builtin/repack.c | 8 +++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 947e6f8..5036a10 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2128,6 +2128,14 @@ repack.usedeltabaseoffset::
"false" and repack. Access from old Git versions over the
native protocol are unaffected by this option.
 
+repack.honorPackKeep::
+   If set to false, include objects in `.keep` files when repacking
+   via `git repack`. Note that we still do not delete `.keep` packs
+   after `pack-objects` finishes. This means that we may duplicate
+   objects, but this makes the option safe to use when there are
+   concurrent pushes or fetches. This option is generally only
+   useful if you have set `pack.writeBitmaps`. Defaults to true.
+
 rerere.autoupdate::
When set to true, `git-rerere` updates the index with the
resulting contents after it cleanly resolves conflicts using
diff --git a/builtin/repack.c b/builtin/repack.c
index a9c4593..585c41d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int honor_pack_keep = 1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, 
void *cb)
delta_base_offset = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "repack.honorpackkeep")) {
+   honor_pack_keep = git_config_bool(var, value);
+   return 0;
+   }
return git_default_config(var, value, cb);
 }
 
@@ -190,10 +195,11 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
argv_array_push(&cmd_args, "pack-objects");
argv_array_push(&cmd_args, "--keep-true-parents");
-   argv_array_push(&cmd_args, "--honor-pack-keep");
argv_array_push(&cmd_args, "--non-empty");
argv_array_push(&cmd_args, "--all");
argv_array_push(&cmd_args, "--reflog");
+   if (honor_pack_keep)
+   argv_array_push(&cmd_args, "--honor-pack-keep");
if (window)
argv_array_pushf(&cmd_args, "--window=%u", window);
if (window_memory)
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send