Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Junio C Hamano
Jeff King writes:

> Yes, I think we could just call this "list_move_to_front()" or
> something. The fact that it's operating on a list called
> "packed_git_mru" is probably sufficient to make it clear that the
> purpose is managing recentness.

I said earlier that I wasn't sure, but I fully agree with your
envisioned endgame state, my understanding of which is that we will
not have an API that is used only to manage an MRU list (hence there
will be no mru.[ch] in that future), as there is very little
MRU-specific functionality to offer anyway.  Namely, mru_mark() that
is to be used to "mark the fact that the item was used; do whatever
is necessary to maintain the MRU list".

Instead, everybody understands how the list API works, and the fact
that one instance of list_head is called MRU is sufficient to see
that using the "move this to the front of the list" function means we
are marking the item as most-recently-used.

Thanks.


Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Jeff King
On Fri, Sep 29, 2017 at 11:38:27PM +0300, Оля Тележная wrote:

> > About the minor issues ("tmp" vs "p2", variable scope, space
> > indentation) - fully agree, I will fix them.
> > ...
> > So finally I think that I need to fix those minor issues and that's
> > all.
> 
> I forgot about the leak. I also need to add a check in mru_clear().
> That's not a beautiful solution, but it works reliably.

I'm not sure what you mean about checking in mru_clear(). It may make
sense to just send your v2 patch so that I can see what you do there.
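
[ For illustration only, and not necessarily what is meant above: one
way the leak could be plugged is to free the entries in mru_clear()
using list.h's list_for_each_safe(), which tolerates the current node
being freed. This assumes the layout from the patch under discussion,
i.e. struct mru embedding a list_head and entries of type struct
mru_entry: ]

	void mru_clear(struct mru *mru)
	{
		struct list_head *pos, *tmp;

		/* _safe variant: freeing pos does not break iteration */
		list_for_each_safe(pos, tmp, &mru->list)
			free(list_entry(pos, struct mru_entry, list));
		INIT_LIST_HEAD(&mru->list);
	}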

> The question remains the same - do you see anything more to fix in this patch?

I think that's it. Think about writing a bit in the commit message about
the "why" of this. Like I said earlier, the rationale for this commit is
perhaps obvious, but it never hurts to be explicit (and it may be good
practice, since part of the point of this task is getting you familiar
with the patch submission process).

-Peff


Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Jeff King
On Fri, Sep 29, 2017 at 07:08:28PM +0300, Оля Тележная wrote:

> Many thanks to all of you; I am interested in every opinion. Sorry
> that I wasn't in the discussion; unfortunately I got sick, which is
> why I missed the whole process.

No problem. It's often reasonable to let review comments come in and
think about them as a whole before responding or re-posting anyway.

> > An overlong line (I can locally wrap it, so the patch does not have
> > to be re-sent only to fix this alone).
> I've read only about a 50-character maximum for the commit subject (and
> the syntax highlighting repeats it), but there's nothing about the
> maximum line length in the commit message body. Sorry, next time I will
> make it shorter.

Usually we shoot for making things look good in an 80-column terminal,
including both code and commit messages. For commit message bodies, we
tend to make them a little shorter, since "git log" will indent them. 72
characters is reasonable there. And we tend to make subject lines a
little shorter still, since they often appear with "Subject:" and
"[PATCH]" prefixed. I usually go for about 60 characters, but will go
over if it helps make the subject clearer.

> > I had envisioned leaving mru_mark() as a wrapper for "move to the front"
> > that could operate on any list. But seeing how Olga's patch takes it
> > down to two trivial lines, I'd also be fine with an endgame that just
> > eliminates it.
> Let's add the needed function to list.h directly?

Yes, I think we could just call this "list_move_to_front()" or
something. The fact that it's operating on a list called
"packed_git_mru" is probably sufficient to make it clear that the
purpose is managing recentness.

> I also wanted to add a
> list_for_each_entry function to list.h, as in the Linux kernel.
> https://www.kernel.org/doc/htmldocs/kernel-api/API-list-for-each-entry.html
> It would simplify the code even more, and I guess not only in
> MRU-related code. Maybe we need to do that in a separate patch.

It would be nice to have list_for_each_entry(), but unfortunately it's
not portable. It relies on having a typeof operator, and we build on
platforms that lack it. It was omitted in 94e99012fc (http-walker:
reduce O(n) ops with doubly-linked list, 2016-07-11) for that reason.
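
[ For context, this is roughly the kernel macro in question; the
typeof(*pos) is what makes it non-portable for us. The portable style
that list.h supports iterates over raw nodes and converts with an
explicit type (the "mru" field name below is illustrative): ]

	#define list_for_each_entry(pos, head, member)				\
		for (pos = list_entry((head)->next, typeof(*pos), member);	\
		     &pos->member != (head);					\
		     pos = list_entry(pos->member.next, typeof(*pos), member))

	/* portable alternative without typeof: */
	struct list_head *pos;
	list_for_each(pos, &packed_git_mru) {
		struct packed_git *p = list_entry(pos, struct packed_git, mru);
		/* ... use p ... */
	}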

> About the minor issues ("tmp" vs "p2", variable scope, space
> indentation) - fully agree, I will fix them.

Thanks.

> So finally I think that I need to fix those minor issues and that's
> all. I plan to rewrite (with --amend) my current commit (I think so
> because I will add no new features, so it's better to have a single
> commit for all the changes).
> As I understand it, submitGit will send the update in a new thread, and
> I need to mark it there as [PATCH v2].
> Please correct me if I am wrong on any of the points mentioned earlier.

Correct. Until a patch is merged to Junio's "next" branch (at which
point it is set in stone), we generally prefer to rewrite it with
"--amend" (or git-rebase) to fix anything that comes up during the
review.

> By the way, other contributors write something like "[PATCH v6 0/3]".
> What does "0/3" mean? It's about editing separate commits in a single
> patch series, am I right?

Right, it means multiple commits in a logical series that are meant to
be applied together. Your patch is small enough that it makes sense as a
single patch. If we wanted to do the second step of dropping mru.[ch]
entirely now, then you'd probably have at least a 2-patch series.

(I'm OK with not doing that second step for now, though if we are not
going to polish up the mru_for_each() interface, it may make sense to
take that final step sooner rather than later).

-Peff


Re: [PATCH] oidmap: map with OID as key

2017-09-29 Thread Jeff King
On Fri, Sep 29, 2017 at 11:43:57PM +0200, Johannes Schindelin wrote:

> On Thu, 28 Sep 2017, Jeff King wrote:
> 
> > If you're planning on using an oidset to mark every object in a
> > 100-million-object monorepo, we'd probably care more. But I'd venture to
> > say that any scheme which involves generating that hash table on the fly
> > is doing it wrong. At that scale we'd want to look at compact
> > mmap-able on-disk representations.
> 
> Or maybe you would look at a *not-so-compact* mmap()able on-disk
> representation, to allow for painless updates.
> 
> You really will want to avoid having to write out large files just because
> a small part of them changed. We learned that lesson the hard way, from
> having to write 350MB worth of .git/index for every single, painful `git
> add` operation.

Sure. I didn't mean to start designing the format. I just mean that if
the first step of the process is "read information about all 100 million
objects into an in-RAM hashmap", then that is definitely not going to
fly.
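
[ Purely to illustrate the idea of a compact, mmap-able on-disk
representation (no such file exists in git today): a membership lookup
over a hypothetical file of sorted, fixed-width 20-byte object IDs,
binary-searched in place instead of being loaded into an in-RAM hash
table: ]

	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	static int oid_file_contains(const char *path,
				     const unsigned char sha1[20])
	{
		int fd = open(path, O_RDONLY);
		struct stat st;
		const unsigned char *base;
		size_t lo, hi;
		int found = 0;

		if (fd < 0)
			return -1;
		if (fstat(fd, &st) < 0) {
			close(fd);
			return -1;
		}
		base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		close(fd);
		if (base == MAP_FAILED)
			return -1;

		lo = 0;
		hi = st.st_size / 20;
		while (lo < hi) {	/* binary search over fixed-width records */
			size_t mid = lo + (hi - lo) / 2;
			int cmp = memcmp(base + mid * 20, sha1, 20);
			if (!cmp) {
				found = 1;
				break;
			} else if (cmp < 0)
				lo = mid + 1;
			else
				hi = mid;
		}
		munmap((void *)base, st.st_size);
		return found;
	}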

-Peff


RE: Is finalize_object_file in sha1_file.c handling errno from "rename" correctly?

2017-09-29 Thread Johannes Schindelin
Hi,

On Fri, 15 Sep 2017, Wesley Smith wrote:

> Thanks for your response.  I'm glad to see that you've been able to 
> understand the problem.  I'm working with the Windows git team to properly 
> return EACCES when "rename" fails due to access permissions, but it also
> sounds like there will need to be a fix to finalize_object_file to better 
> handle the case of an existing file when renaming.

The Windows part of the problem was fixed in v2.14.2.

Ciao,
Johannes

[ leaving the rest of the quoted mail intact intentionally, to give
readers a chance to read up on the context ]

> Wesley Smith
> T: 503.597.0556 | wsm...@greatergiving.com
> 
> -----Original Message-----
> From: Junio C Hamano [mailto:gits...@pobox.com] 
> Sent: Thursday, September 14, 2017 11:51 PM
> To: Wesley Smith
> Cc: git@vger.kernel.org
> Subject: Re: Is finalize_object_file in sha1_file.c handling errno from 
> "rename" correctly?
> 
> Wesley Smith writes:
> 
> > 1) This bug is triggered because "git fetch" is causing a pack file to 
> > be written when that same pack file already exists.  It seems like 
> > this is harmless and shouldn't cause a problem.  Is that correct?
> 
> The final name of the packfile is derived from the entire contents of the 
> packfile; it should be harmless when we attempt to rename a new file, which 
> has exactly the same contents as an existing file, to the existing file and 
> see a failure out of that attempt.
> 
> > 2) It seems that finalize_object_file is not accounting for the fact 
> > that "link" will return EEXIST if the destination file already exists 
> > but is not writeable, whereas "rename" will return EACCESS in this 
> > case.  Is that correct?  If so, should finalize_object_file be fixed 
> > to account for this? Perhaps it should check if the newfile exists 
> > before calling rename.  Or, should the Windows mingw_rename function 
> > be modified to return EEXIST in this case, even though that's not the 
> > standard errno for that situation?
> 
> The codepath that is triggered by OBJECT_CREATION_USES_RENAMES ought to 
> behave correctly even on non-Windows platforms, so bending the error code of 
> rename() only on Windows to fit the existing error handling would not be a 
> smart thing to do.  Rather, the rename() emulation should leave a correct 
> errno and the caller should be updated to be aware of that error that is not 
> EEXIST, which it currently knows about.
> 
> Thanks for spotting a problem and digging to its cause.
> 
> This is a #leftoverbits tangent, and should be done separately from your 
> "OBJECT_CREATION_USES_RENAMES is broken" topic, but I think it is a bug to 
> use finalize_object_file() directly to "finalize"
> anything but an individual loose object file in the first place.
> 
> We should create a new shared helper that does what the function currently 
> does, make finalize_object_file() call that new shared helper, and make sure 
> finalize_object_file() is called only on a newly created loose object file.  
> The codepath that creates a new packfile and other things and moves them to 
> the final name should not call finalize_object_file() but the new shared 
> helper instead.
> 
> That way, we could later implement the "collision? check" alluded to by the
> in-code comment in finalize_object_file(), and we won't have to worry about
> affecting callers other than the one that creates a loose object file with 
> such an enhancement.
> 
> 
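
[ A sketch of the split suggested above; the helper name and the exact
error handling are hypothetical, not actual git code: ]

	#include <errno.h>
	#include <stdio.h>
	#include <unistd.h>

	/* shared helper: carries the current link()/rename() dance */
	static int move_temp_to_final(const char *tmpfile, const char *filename)
	{
		if (!link(tmpfile, filename) || errno == EEXIST) {
			/*
			 * Either the link succeeded, or an identically named
			 * file already exists; for content-addressed files
			 * the latter is harmless.
			 */
			unlink(tmpfile);
			return 0;
		}
		/* fallback path, e.g. when OBJECT_CREATION_USES_RENAMES is set */
		return rename(tmpfile, filename) ? -1 : 0;
	}

	/* loose-object-only wrapper where a collision check could later live */
	int finalize_object_file(const char *tmpfile, const char *filename)
	{
		/* TODO: loose object collision check could go here */
		return move_temp_to_final(tmpfile, filename);
	}

	/* packfile finalization would call move_temp_to_final() directly */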


Re: [idea] File history tracking hints

2017-09-29 Thread Johannes Schindelin
Hi Philip,

On Fri, 15 Sep 2017, Philip Oakley wrote:

> From: "Johannes Schindelin" 
>
> > In light of such experiences, I have to admit that the notion that the
> > rename detection can always be improved in hindsight adds quite a bit of
> > insult to injury for those developers who are bitten by it.
> 
> Your list made me think that the hints should be directed toward what may be
> considered existing solutions for those specific awkward cases.
> 
> So the hints could be (by type):
> - template;licence;boiler-plate;standard;reference :: copy
> - word-rename
> - regex for word substitution changes (e.g. which chars are within 'Word-_0`)
> - regex for white-space changes (i.e. which chars are considered whitespace.)
> - move-dir path/glob spec
> - move-file path/glob spec
> (maybe list each 'group' of moves, so that once found the rest of the rename
> detection follows the group.)
> 
> Once the particular hint is detected (path qualified) then the clue/hint is
> used to assist in parsing the files to simplify the comparison task and locate
> common lines, or common word patterns.
> 
> The first example is just a set of alternate terms folk use for the new
> duplicate-file case.
> 
> The second is a hint that there has been a number of fairly global name
> changes in the files, so not only do a word diff but detect & summarise those
> global changes. (your class move example)
> 
> The third is the simpler case of global word changes, based on a limited char set
> for a 'word' token list.
> The fourth is where we are focussed on the white space part (complementing the
> word token viewpoint)
> 
> The move hints are lists of path specs that each have distinctly moved.
> 
> It may be possible to order the hints as well, so that the detections work in
> the right order, giving the heuristics a better chance!

I think my point was: no matter how confident we were that heuristic
rename detection could be perfected over time, history proved that
assumption incorrect.

Therefore, it would be good to have a way to tell Git about renames
explicitly so that it does not even need to use its heuristics.

Ciao,
Dscho


[PATCH v2] oidmap: map with OID as key

2017-09-29 Thread Jonathan Tan
This is similar to using the hashmap in hashmap.c, but with an
easier-to-use API. In particular, custom entry comparisons no longer
need to be written, and lookups can be done without constructing a
temporary entry structure.

This is implemented as a thin wrapper over the hashmap API. In
particular, this means that there is an additional 4-byte overhead due
to the fact that the first 4 bytes of the hash are redundantly stored.
For now, I'm taking the simpler approach, but if need be, we can
reimplement oidmap without affecting the callers significantly.

oidset has been updated to use oidmap.

Signed-off-by: Jonathan Tan 
---
Some replies to v1 [1] [2] seem to indicate that simpler non-duplicated
code should be preferred over optimizing away the storage of the 4-byte
hash code, and I have no objection to that, so I have updated this code
to be a thin wrapper over hashmap with the 4-byte overhead.

After this patch, if the 4-byte overhead is found to be too much, we can
migrate to something similar to v1 relatively easily.

I decided not to go with the util pointer method because we will not be
able to migrate away from it so easily if need be.

[1] https://public-inbox.org/git/xmqqr2ur348z@gitster.mtv.corp.google.com/
[2] 
https://public-inbox.org/git/20170929192624.4ukvpjujgiyzg...@sigill.intra.peff.net/
---
 Makefile |  1 +
 fetch-pack.c |  2 +-
 oidmap.c | 51 +
 oidmap.h | 68 
 oidset.c | 36 +++-
 oidset.h |  6 --
 6 files changed, 133 insertions(+), 31 deletions(-)
 create mode 100644 oidmap.c
 create mode 100644 oidmap.h

diff --git a/Makefile b/Makefile
index ed4ca438b..64136dde4 100644
--- a/Makefile
+++ b/Makefile
@@ -821,6 +821,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
diff --git a/fetch-pack.c b/fetch-pack.c
index 105506e9a..008b25d3d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -611,7 +611,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
 * add to "newlist" between calls, the additions will always be for
 * oids that are already in the set.
 */
-   if (!tip_oids->map.tablesize) {
+   if (!tip_oids->map.map.tablesize) {
add_refs_to_oidset(tip_oids, unmatched);
add_refs_to_oidset(tip_oids, newlist);
}
diff --git a/oidmap.c b/oidmap.c
new file mode 100644
index 0..6db4fffcd
--- /dev/null
+++ b/oidmap.c
@@ -0,0 +1,51 @@
+#include "cache.h"
+#include "oidmap.h"
+
+static int cmpfn(const void *hashmap_cmp_fn_data,
+const void *entry, const void *entry_or_key,
+const void *keydata)
+{
+   const struct oidmap_entry *entry_ = entry;
+   if (keydata)
+   return oidcmp(&entry_->oid, (const struct object_id *) keydata);
+   return oidcmp(&entry_->oid,
+ &((const struct oidmap_entry *) entry_or_key)->oid);
+}
+
+static int hash(const struct object_id *oid)
+{
+   int hash;
+   memcpy(&hash, oid->hash, sizeof(hash));
+   return hash;
+}
+
+void oidmap_init(struct oidmap *map, size_t initial_size)
+{
+   hashmap_init(&map->map, cmpfn, NULL, initial_size);
+}
+
+void oidmap_free(struct oidmap *map, int free_entries)
+{
+   if (!map)
+   return;
+   hashmap_free(&map->map, free_entries);
+}
+
+void *oidmap_get(const struct oidmap *map, const struct object_id *key)
+{
+   return hashmap_get_from_hash(&map->map, hash(key), key);
+}
+
+void *oidmap_remove(struct oidmap *map, const struct object_id *key)
+{
+   struct hashmap_entry entry;
+   hashmap_entry_init(&entry, hash(key));
+   return hashmap_remove(&map->map, &entry, key);
+}
+
+void *oidmap_put(struct oidmap *map, void *entry)
+{
+   struct oidmap_entry *to_put = entry;
+   hashmap_entry_init(&to_put->internal_entry, hash(&to_put->oid));
+   return hashmap_put(&map->map, to_put);
+}
diff --git a/oidmap.h b/oidmap.h
new file mode 100644
index 0..18f54cde1
--- /dev/null
+++ b/oidmap.h
@@ -0,0 +1,68 @@
+#ifndef OIDMAP_H
+#define OIDMAP_H
+
+#include "hashmap.h"
+
+/*
+ * struct oidmap_entry is a structure representing an entry in the hash table,
+ * which must be used as first member of user data structures.
+ *
+ * Users should set the oid field. oidmap_put() will populate the
+ * internal_entry field.
+ */
+struct oidmap_entry {
+   /* For internal use only */
+   struct hashmap_entry internal_entry;
+
+   struct object_id oid;
+};
+
+struct oidmap {
+   struct hashmap map;
+};
+
+#define OIDMAP_INIT { { NULL } }
+
+/*
+ * Initializes an oidmap structure.
+ *
+ * `map` is the oidmap to initialize.
+ *
+ * If the total number of entries is known in advance, the `initial_size`
+ * parameter may be used to preallocate a 

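[ The message is truncated here in the archive. For illustration, a
hypothetical caller of the API above, assuming git's usual headers
(xmalloc, xstrdup, oidcpy); the entry type and payload are made up: ]

	struct example_entry {
		struct oidmap_entry entry;	/* must be the first member */
		char *name;			/* arbitrary payload */
	};

	static void example(const struct object_id *oid)
	{
		struct oidmap map = OIDMAP_INIT;
		struct example_entry *e = xmalloc(sizeof(*e));
		struct example_entry *found;

		oidcpy(&e->entry.oid, oid);	/* caller sets the key... */
		e->name = xstrdup("example");
		oidmap_put(&map, e);		/* ...oidmap_put() fills the rest */

		found = oidmap_get(&map, oid);	/* returns e, or NULL if absent */
		if (found)
			printf("%s\n", found->name);

		oidmap_free(&map, 1);		/* 1 = also free the entries */
	}
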
Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Stephan Beyer wrote:
> On 09/29/2017 08:40 PM, Jonathan Nieder wrote:

>> Going forward, is there an easy way to preview the effect of this kind
>> of change (e.g., to run "make style" on the entire codebase so as to be
>> able to compare the result with two different versions of
>> .clang-format)?
>
> I just ran clang-format before and after the patch and pushed to github.
> The resulting diff is quite big:
>
> https://github.com/sbeyer/git/commit/3d1186c4cf4dd7e40b97453af5fc1170f6868ccd

Thanks.  The first change I see there is

 -char *strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error)
 +char *
 +strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error)

I understand why the line is broken, but the choice of line break is
wrong. Seems like the penalty for putting the return type on its own
line isn't quite high enough.

My Reviewed-by still stands, though.  It gets "make style" to signal
long lines that should be broken, which is an improvement.

> PS: There should be a comment at the beginning of the .clang-format file
> that says what version it is tested with (on my machine it worked with
> 5.0 but not with 4.0) and there should also probably be a remark that the
> clang-format-based style should only be understood as a hint or guidance
> and that most of the Git codebase does not conform to it.

Sounds good to me.  Care to send it as a patch? :)

Thanks,
Jonathan


Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Stephan Beyer
Hi,

On 09/29/2017 08:40 PM, Jonathan Nieder wrote:
> Going forward, is there an easy way to preview the effect of this kind
> of change (e.g., to run "make style" on the entire codebase so as to be
> able to compare the result with two different versions of
> .clang-format)?

I just ran clang-format before and after the patch and pushed to github.
The resulting diff is quite big:

https://github.com/sbeyer/git/commit/3d1186c4cf4dd7e40b97453af5fc1170f6868ccd

Cheers
Stephan

PS: There should be a comment at the beginning of the .clang-format file
that says what version it is tested with (on my machine it worked with
5.0 but not with 4.0) and there should also probably be a remark that the
clang-format-based style should only be understood as a hint or guidance
and that most of the Git codebase does not conform to it.


Re: RFC v3: Another proposed hash function transition plan

2017-09-29 Thread Johannes Schindelin
Hi Joan,

On Fri, 29 Sep 2017, Joan Daemen wrote:

> if ever there was a SHA-2 competition, it must have been held inside the NSA :-)

Oops. My bad, I indeed got confused about that, as you suggest below (I
actually thought of the AES competition, but that was obviously not about
SHA-2). Sorry.

> But maybe you are confusing with the SHA-3 competition. In any case,
> when considering SHA-2 vs SHA-3 for usage in git, you may have a look at
> arguments we give in the following blogpost:
> 
> https://keccak.team/2017/open_source_crypto.html

Thanks for the pointer!

Small nit: the post uses "its" in place of "it's", twice.

It does have a good point, of course: the scientific exchange (which you
call "open-source" in spirit) makes tons of sense.

As far as Git is concerned, we not only care about the source code of the
hash algorithm we use, we need to care even more about what you call
"executable": ready-to-use, high quality, well-tested implementations.

We carry source code for SHA-1 as part of Git's source code, which was
hand-tuned to be as fast as Linus could get it, which was tricky given
that the tuning had to be general enough to apply to all common Intel
CPUs.

This hand-crafted code was blown out of the water by OpenSSL's SHA-1 in
our tests here at Microsoft, thanks to the fact that OpenSSL does
vectorized SHA-1 computation now.

To me, this illustrates why it is not good enough to have only a reference
implementation available at our finger tips. Of course, above-mentioned
OpenSSL supports SHA-256 and SHA3-256, too, and at least recent versions
vectorize those, too.

Also, ARM processors have become a lot more popular, so we'll want to have
high-quality implementations of the hash algorithm also for those
processors.

Likewise, in contrast to 2005, nowadays implementations of Git in
languages as obscure as JavaScript are not only theoretical but do exist
in practice (https://github.com/creationix/js-git). I had a *very* quick
look for libraries providing crypto in JavaScript and immediately found
the Stanford JavaScript Crypto Library
(https://github.com/bitwiseshiftleft/sjcl/), which seems to offer SHA-256
but not SHA3-256 computation.

Back to Intel processors: I read some vague hints about extensions
accelerating SHA-256 computation on future Intel processors, but not
SHA3-256.

It would make sense, of course, that more crypto libraries and more
hardware support would be available for SHA-256 than for SHA3-256 given
the time since publication: 16 vs 5 years (I am playing it loose here,
taking just the year into account, not the exact date, so please treat
that merely as a ballpark figure).

So from a practical point of view, I wonder what your take is on, say,
hardware support for SHA3-256. Do you think this will become a focus soon?

Also, what is your take on the question of whether SHA-256 is good
enough? After all, SHA-1 was already broken theoretically 10 years
after it was published (which unfortunately did not prevent us from
baking it into Git), while SHA-256 is 16 years old and its only known
weakness does not apply to Git's usage.

Also, while I have the attention of somebody who knows a heck more about
cryptography than Git's top 10 committers combined: how soon do you expect
practical SHA-1 attacks that are much worse than what we already have
seen? I am concerned that if we do not move fast enough to a new hash
algorithm, and somebody finds a way in the meantime to craft arbitrary
messages given a prefix and an SHA-1, then we have a huge problem on
our hands.

Ciao,
Johannes


Re: [PATCH] oidmap: map with OID as key

2017-09-29 Thread Johannes Schindelin
Hi Peff,

On Thu, 28 Sep 2017, Jeff King wrote:

> If you're planning on using an oidset to mark every object in a
> 100-million-object monorepo, we'd probably care more. But I'd venture to
> say that any scheme which involves generating that hash table on the fly
> is doing it wrong. At that scale we'd want to look at compact
> mmap-able on-disk representations.

Or maybe you would look at a *not-so-compact* mmap()able on-disk
representation, to allow for painless updates.

You really will want to avoid having to write out large files just because
a small part of them changed. We learned that lesson the hard way, from
having to write 350MB worth of .git/index for every single, painful `git
add` operation.

Ciao,
Dscho


Re: [PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-29 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Junio C Hamano writes:
> 
> >> +  # Client requested to use protocol v1
> >> +  grep "version=1" log &&
> >> +  # Server responded using protocol v1
> >> +  grep "clone< version 1" log
> >
> > This looked a bit strange to check "clone< version 1" for one
> > direction, but did not check "$something> version 1" for the other
> > direction.  Doesn't "version=1" end up producing 2 hits?
> >
> > Not a complaint, but wondering if we can write it in such a way that
> > does not have to make readers wonder.
> 
> Ah, the check for "version=1" is a short-hand for
> 
>   grep "clone> git-upload-pack ...\\0\\0version=1\\0$" log
> 
> and the symmetry I sought is already there.  So ignore the above; if
> we wanted to make the symmetry more explicit, it would not hurt to
> spell the first one as
> 
>   grep "clone> .*\\0\\0version=1\\0$" log

I think you need three '\' to get an escaped backslash, but I agree,
I'll spell this out more explicitly in the tests.

> 
> though.
> 

-- 
Brandon Williams


Re: [PATCH v2 3/9] protocol: introduce protocol extension mechanisms

2017-09-29 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Junio C Hamano writes:
> 
> >> +enum protocol_version determine_protocol_version_server(void)
> >> +{
> >> +  const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> >> +  enum protocol_version version = protocol_v0;
> >> +
> >> +  if (git_protocol) {
> >> +  struct string_list list = STRING_LIST_INIT_DUP;
> >> +  const struct string_list_item *item;
> >> +  string_list_split(&list, git_protocol, ':', -1);
> >> +
> >> +  for_each_string_list_item(item, &list) {
> >> +  const char *value;
> >> +  enum protocol_version v;
> >> +
> >> +  if (skip_prefix(item->string, "version=", &value)) {
> >> +  v = parse_protocol_version(value);
> >> +  if (v > version)
> >> +  version = v;
> >> +  }
> >> +  }
> >> +
> >> +  string_list_clear(&list, 0);
> >> +  }
> >> +
> >> +  return version;
> >> +}
> >
> > This implements "the largest one wins", not "the last one wins".  Is
> > there a particular reason why the former is chosen?
> 
> Let me give my version of why the usual "the last one wins" would
> not necessarily a good idea.  I would imagine that a client
> contacting the server may want to say "I understand v3, v2 (but not
> v1 nor v0)" and in order to influence the server's choice between
> the available two, it may want to somehow say it prefers v3 over v2
> (or v2 over v3).  
> 
> One way to implement such a behaviour would be "the first one that
> is understood is used", i.e. something along this line:
> 
> enum protocol_version version = protocol_unknown;
> 
>   for_each_string_list_item(item, &list) {
>   const char *value;
>   enum protocol_version v;
>   if (skip_prefix(item->string, "version=", &value)) {
>   if (version == protocol_unknown) {
>   v = parse_protocol_version(value);
>   if (v != protocol_unknown)
>   version = v;
>   }
>   }
>   }
> 
>   if (version == protocol_unknown)
>   version = protocol_v0;
> 
> and not "the largest one wins" nor "the last one wins".
> 
> I am not saying your code or the choice of "the largest one wins" is
> necessarily wrong.  I am just illustrating the way to explain
> "because I want to support a use case like _this_, I define the way
> in which multiple values of the version variable are parsed like so,
> hence this code".  IOW, I think this commit should mention how the
> "largest one wins" rule would be useful to the clients and the
> servers when they want to achieve X, and that X is left unexplained.

I believe I mentioned this elsewhere, but this logic will probably have
to be tweaked again at some point so that a server may be able to
prefer one version over another.

That being said, I can definitely add a comment indicating how this
code selects the version and that it can be used to ensure that the
latest and greatest protocol version is used.

-- 
Brandon Williams


Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-09-29 Thread Pranit Bauva
Hey Stephan,

> On Sat, Sep 30, 2017 at 12:24 AM, Stephan Beyer wrote:
>
> Hi Pranit,
>
> On 09/29/2017 08:49 AM, Pranit Bauva wrote:
> > It has been a long time since this series appeared on the mailing list.
> > The previous version v15[1] is now split into many parts and I am
> > sending the first part right now, will focus on getting this merged and
> > then send out the next part.
>
> That's a good idea!
>
> I just reviewed your series by assuming I did the v15 review well (it
> took quite some time at least)... so I just diff'ed the v15 and the v16
> patches. I am totally fine with it!

Thanks for having the patience of waiting for it and reviewing it again.

Regards,
Pranit Bauva
www.bauva.com


Re: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)

2017-09-29 Thread Johannes Schindelin
Hi Jonathan,

On Fri, 29 Sep 2017, Jonathan Tan wrote:

> Jeff Hostetler has sent out some object-filtering patches [1] that are a
> superset of the object-filtering functionality that I have (in the
> pack-objects patches). I have gone for the minimal approach here, but if
> his patches are merged, I'll update my patch set to use those.

I wish there was a way for you to work *with* Jeff on this. It seems that
your aims are similar enough for that (you both need changes in the
protocol) yet different enough to allow for talking past each other (big
blobs vs narrow clone).

And I get the impression that in this instance, it slows everything down
to build competing, large patch series rather than building on top of each
other's work.

Additionally, I am not helping by pestering Jeff all the time about
different issues, so it is partially my fault.

But maybe there is a chance to *really* go for a minimal approach, as in
"incremental enough that you can share the first  patches"? And even
better: "come up with those first  patches together"?

Ciao,
Dscho




Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Оля Тележная
> About the minor issues ("tmp" vs "p2", variable scope, space
> indentation) - fully agree, I will fix them.
> ...
> So finally I think that I need to fix those minor issues and that's
> all.

I forgot about the leak. I also need to add a check in mru_clear().
That's not a beautiful solution, but it works reliably.
The question remains the same - do you see anything more to fix in this patch?

Thank you!
Olga


Re: [PATCH v6 09/40] Add initial external odb support

2017-09-29 Thread Jonathan Tan
On Wed, 27 Sep 2017 18:46:30 +0200
Christian Couder wrote:

> I am OK with splitting the patch series, but I am not sure that 01/40 to
> 09/40 is the right range for the first patch series.
> I would say that 01/40 to 07/40 is better as it can be seen as a
> separate refactoring.

I mentioned 09/40 because this is (as far as I can tell) the first one
that introduces a new design.

> I don't think single-shot processes would be a huge burden, because
> the code is simpler, and because, for example, for filters we already
> have single-shot and long-running processes and no one complains about
> that. It's code that is useful as it makes it much easier for people
> to do some things (see the clone bundle example).
> 
> In fact in Git development we usually start by first implementing
> simpler single-shot solutions, before thinking, when the need arises,
> about making them faster. So perhaps an equally valid opinion could be
> to first only submit the patches for the single-shot protocol and later
> submit the rest of the series when we start getting feedback about how
> external odbs are used.

My concern is that, as far as I understand about the Microsoft use case,
we already know that we need the faster solution, so the need has
already arisen.

> And yeah I could change the order of the patch series to implement the
> long-running processes first and the single-shot process last, so that
> it could be possible to first get feedback about the long-running
> processes, before we decide whether or not to merge the single-shot
> stuff, but I don't think that would be the most logical order.

My thinking was that we would just implement the long-running process
and not implement the single-shot process at all (besides maybe a script
in contrib/). If we are going to do both anyway, I agree that we should
do the single-shot process first.

> > And another possible issue is that we design ourselves into a corner.
> > Thinking about the use cases that I know about (the Android use case and
> > the Microsoft GVFS use case), I don't think we are doing that - for
> > Android, this means that large blob metadata needs to be part of the
> > design (and this patch series does provide for that), and for Microsoft
> > GVFS, "get" is relatively cheap, so a configuration option to not invoke
> > "have" first when loading a missing object might be sufficient.
> 
> If the helper does not advertise the "have" capability, the "have"
> instruction will not be sent to the helper, so the current design is
> already working for that case.

Ah, that's good to know.

> > And I think that my design can be extended to support a use case in
> > which, for example, blobs corresponding to a certain type of filename
> > (defined by a glob like in gitattributes) can be excluded during
> > fetch/clone, much like --blob-max-bytes, and they can be fetched either
> > through the built-in mechanism or through a custom hook.
> 
> Sure, we could probably rebuild something equivalent to what I did on
> top of your design.
> My opinion though is that if we want to eventually get to the same
> goal, it is better to first merge something that gets us very close to
> the end goal and then add some improvements on top of it.

I agree - I mentioned that because I personally prefer to review smaller
patch sets at a time, and my patch set already includes a lot of the
same infrastructure needed by yours - for example, the places in the
code to dynamically fetch objects, exclusion of objects when fetching or
cloning, configuring the cloned repo when cloning, fsck, and gc.

> >  - I get compile errors when I "git am" these onto master. I think
> >'#include "config.h"' is needed in some places.
> 
> It's strange because I get no compile errors even after a "make clean"
> from my branch.
> Could you show the actual errors?

I don't have the error messages with me now, but it was something about
a function being implicitly declared. You will probably get these errors
if you sync past commit e67a57f ("config: create config.h", 2017-06-15).

> > Any reason why you prefer to update the loose object functions than to
> > update the generic one (sha1_object_info_extended)? My concern with just
> > updating the loose object functions was that a caller might have
> > obtained the path by iterating through the loose object dirs, and in
> > that case we shouldn't query the external ODB for anything.
> 
> Are you thinking about fsck or gc?
> Otherwise I don't think it would be clean to iterate through loose object 
> dirs.

Yes, fsck and gc (well, prune, I think) do that. I agree that Git
typically doesn't do that (except for exceptional cases like fsck and
gc), but I was thinking about supporting existing code that does that
iteration, not introducing new code that does that.


Re: [PATCH] gitk: expand $config_file_tmp before reporting to user

2017-09-29 Thread Uxío Prego
Well, something like this is not paying _Paul_ enough for what he gave to
The People, so I do not think it is worth trying.

> On 28 Sep 2017, at 06:37, Junio C Hamano wrote:
> 
> Max Kirillov writes:
> 
>> Tilde-based paths may confuse some users. First, the tilde is not
>> familiar to Windows users; second, it may point to an unexpected
>> location depending on the environment setup.
>> 
>> Expand the path to its "nativename", so that ~/.config/git/gitk-tmp
>> would be "C:\Users\user\.config\git\gitk-tmp", for example.
>> It should be less cryptic.
> 
> It might be less cryptic, but for those of us whose $HOME is a
> looong path, ~/.config/git/gitk-tmp is much easier to understand
> than the same path with ~/ expanded, which would push the part of
> the filename that most matters far to the right hand side of the
> dialog.
> 
> I somehow find this change just robbing Peter to pay Paul.
> 
>>  } else {
>> -error_popup "There appears to be a stale $config_file_tmp\
>> +error_popup "There appears to be a stale \
>> + \"[file nativename $config_file_tmp]\" \
> 



Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-29 Thread Jonathan Tan
On Tue, 26 Sep 2017 17:26:33 +0200
Michael Haggerty wrote:

> Maybe naming has been discussed at length before, and I am jumping into
> a long-settled topic. And admittedly this is bikeshedding.
> 
> But I find these names obscure, even as a developer. And terms like this
> will undoubtedly bleed into the UI and documentation, so it would be
> good to put some effort into choosing the best names possible.

Names are definitely not a long-settled topic. :-)

I agree that naming is important, and thanks for your efforts.

> I suppose that the term "promisor" comes from the computer science term
> "promise" [1]. In that sense it is apt, because, say, a promisor object
> is something that is known to be obtainable, but we don't have it yet.
> 
> But from the user's point of view, I think this isn't a very
> illuminating term. I think the user's mental model will be that there is
> a distinguished remote repository that holds the project's entire
> published history, and she has to remain connected to it for certain Git
> operations to work [2]. Another interesting aspect of this remote is
> that it has to be trusted never (well, almost never) to discard any
> objects [3].

Yes, that is the mental model I have too. I think the ordinary meaning
of the word "promise" works, though - you're not working completely on
things you have, but you're working partly based on the guarantees (or
promises) that this distinguished remote repository gives.

> Personally I think "lazy remote" and "backing remote" are not too bad.

I think these terms are imprecise. "Lazy remote" seems to me to imply
that it is the remote that is lazy, not us.

"Backing remote" does evoke the concept of a "backing store". For me,
the ability to transfer objects to the backing store to be stored
permanently (so that you don't have to store it yourself) is an
essential part of a backing store, and that is definitely something we
don't do here (although, admittedly, such a feature might be useful), so
I don't agree with that term. But if transferring objects is not
essential to a backing store, or if adding such a feature is a natural
fit to the partial clone feature, maybe we could use that.

> [2] I haven't checked whether the current proposal allows for
> multiple "promisor remotes". It's certainly thinkable, if not
> now then in the future. But I suppose that even then, 99% of
> users will configure a single "promisor remote" for each
> repository.

It does not allow for multiple "promisor remotes". Support for that
would require upgrades in the design (including knowing which remote to
fetch a missing object from), but otherwise I agree with your
statements.


[PATCH 10/18] pack-objects: rename want_.* to ignore_.*

2017-09-29 Thread Jonathan Tan
Currently, in pack_objects, add_object_entry() distinguishes between 2
types of non-preferred-base objects:

 (1) objects that should not be in "to_pack" because an option like
 --local or --honor-pack-keep is set
 (2) objects that should be in "to_pack"

A subsequent commit will teach pack-objects to exclude certain objects
(specifically, blobs that are larger than a user-given threshold and are
not potentially a Git special file [1]), but unlike in (1) above, these
exclusions need to be reported. So now we have 3 types of
non-preferred-base objects:

 (a) objects that should not be in "to_pack" and should not be reported
 because an option like --local or --honor-pack-keep is set
 (b) objects that should not be in "to_pack" and should be reported
 because they are blobs that are oversized and non-Git-special
 (c) objects that should be in "to_pack"

add_object_entry() will be taught to distinguish between these 3 types
by the following:

 - renaming want_found_object() and want_object_in_pack() to ignore_.*
   to make it clear that they only check for (a) (this commit)
 - adding a new function to check for (b) and using it within
   add_object_entry() (a subsequent commit)

The alternative would be to retain the names want_found_object() and
want_object_in_pack() and have them handle both the cases (a) and (b),
but that would result in more complicated code.

We also take the opportunity to use the terminology "preferred_base"
instead of "excluded" in these methods. It is true that preferred bases
are not included in the final packfile generation, but at this point in
the code, there is no exclusion taking place - on the contrary, if
something is "excluded", it is in fact guaranteed to be in to_pack.

[1] For the purposes of pack_objects, a blob is a Git special file if it
appears in a to-be-packed tree with a filename beginning with
".git".

Signed-off-by: Jonathan Tan 
---
 builtin/pack-objects.c | 56 +-
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 958822bf4..ef0b61d5f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -949,12 +949,12 @@ static int have_duplicate_entry(const unsigned char *sha1,
return 1;
 }
 
-static int want_found_object(int exclude, struct packed_git *p)
+static int ignore_found_object(int preferred_base, struct packed_git *p)
 {
-   if (exclude)
-   return 1;
-   if (incremental)
+   if (preferred_base)
return 0;
+   if (incremental)
+   return 1;
 
/*
 * When asked to do --local (do not include an object that appears in a
@@ -972,19 +972,19 @@ static int want_found_object(int exclude, struct packed_git *p)
 */
if (!ignore_packed_keep &&
(!local || !have_non_local_packs))
-   return 1;
+   return 0;
 
if (local && !p->pack_local)
-   return 0;
+   return 1;
if (ignore_packed_keep && p->pack_local && p->pack_keep)
-   return 0;
+   return 1;
 
/* we don't know yet; keep looking for more packs */
return -1;
 }
 
 /*
- * Check whether we want the object in the pack (e.g., we do not want
+ * Check whether we should ignore this object (e.g., we do not want
  * objects found in non-local stores if the "--local" option was used).
  *
  * If the caller already knows an existing pack it wants to take the object
@@ -992,16 +992,16 @@ static int want_found_object(int exclude, struct packed_git *p)
  * function finds if there is any pack that has the object and returns the pack
  * and its offset in these variables.
  */
-static int want_object_in_pack(const unsigned char *sha1,
-  int exclude,
-  struct packed_git **found_pack,
-  off_t *found_offset)
+static int ignore_object(const unsigned char *sha1,
+int preferred_base,
+struct packed_git **found_pack,
+off_t *found_offset)
 {
struct mru_entry *entry;
-   int want;
+   int ignore;
 
-   if (!exclude && local && has_loose_object_nonlocal(sha1))
-   return 0;
+   if (!preferred_base && local && has_loose_object_nonlocal(sha1))
+   return 1;
 
/*
 * If we already know the pack object lives in, start checks from that
@@ -1009,9 +1009,9 @@ static int want_object_in_pack(const unsigned char *sha1,
 * are present we will determine the answer right now.
 */
if (*found_pack) {
-   want = want_found_object(exclude, *found_pack);
-   if (want != -1)
-   return want;
+   ignore = ignore_found_object(preferred_base, *found_pack);
+   if (ignore != -1)
+ 

[PATCH 03/18] fsck: support referenced promisor objects

2017-09-29 Thread Jonathan Tan
Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   | 11 +++
 t/t0410-partial-clone.sh | 23 +++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f1529527b..4492a4fab 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_options *options)
if (obj->flags & REACHABLE)
return 0;
obj->flags |= REACHABLE;
+
+   if (is_promisor_object(&obj->oid))
+   /*
+* Further recursion does not need to be performed on this
+* object since it is a promisor object (so it does not need to
+* be added to "pending").
+*/
+   return 0;
+
if (!(obj->flags & HAS_OBJ)) {
if (parent && !has_object_file(&obj->oid)) {
printf("broken link from %7s %s\n",
@@ -213,6 +222,8 @@ static void check_reachable_object(struct object *obj)
 * do a full fsck
 */
if (!(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&obj->oid))
+   return;
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bf75162c1..4f9931f9b 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   test_commit -C repo 2 &&
+   test_commit -C repo 3 &&
+   git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+   C=$(git -C repo rev-parse 1) &&
+   T=$(git -C repo rev-parse 2^{tree}) &&
+   B=$(git hash-object repo/3.t) &&
+   AT=$(git -C repo rev-parse annotated_tag) &&
+
+   promise_and_delete "$C" &&
+   promise_and_delete "$T" &&
+   promise_and_delete "$B" &&
+   promise_and_delete "$AT" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 06/18] introduce fetch-object: fetch one promisor object

2017-09-29 Thread Jonathan Tan
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-haves" to suppress the sending of "have" lines.

This will be tested in a subsequent commit.

NEEDSWORK: update this when we have more information about protocol v2,
which should allow a way to suppress the ref advertisement and
officially allow any object type to be "want"-ed.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/fetch-pack.c |  8 
 builtin/index-pack.c | 16 +---
 fetch-object.c   | 23 +++
 fetch-object.h   |  6 ++
 fetch-pack.c |  8 ++--
 fetch-pack.h |  2 ++
 remote-curl.c| 14 +-
 transport.c  |  8 
 transport.h  |  8 
 10 files changed, 88 insertions(+), 6 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Makefile b/Makefile
index ed5960e6b..4303ef6f8 100644
--- a/Makefile
+++ b/Makefile
@@ -789,6 +789,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f..9f303cf98 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.update_shallow = 1;
continue;
}
+   if (!strcmp("--from-promisor", arg)) {
+   args.from_promisor = 1;
+   continue;
+   }
+   if (!strcmp("--no-haves", arg)) {
+   args.no_haves = 1;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ad170590..14ebb2f17 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, const char *msg,
if (close(fd) != 0)
die_errno(_("cannot close written %s file '%s'"),
  suffix, filename);
-   *report = suffix;
+   if (report)
+   *report = suffix;
}
strbuf_release(_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_msg, unsigned char *sha1)
+ const char *keep_msg, const char *promisor_msg,
+ unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
if (keep_msg)
write_special_file("keep", keep_msg, final_pack_name, sha1,
   );
+   if (promisor_msg)
+   write_special_file("promisor", promisor_msg, final_pack_name,
+  sha1, NULL);
 
if (final_pack_name != curr_pack_name) {
if (!final_pack_name)
@@ -1644,6 +1649,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
const char *curr_index;
const char *index_name = NULL, *pack_name = NULL;
const char *keep_msg = NULL;
+   const char *promisor_msg = NULL;
struct strbuf index_name_buf = STRBUF_INIT;
struct pack_idx_entry **idx_objects;
struct pack_idx_option opts;
@@ -1693,6 +1699,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
keep_msg = "";
} else if (starts_with(arg, "--keep=")) {
keep_msg = arg + 7;
+   } else if (!strcmp(arg, "--promisor")) {
+   promisor_msg = "";
+   } else if (starts_with(arg, "--promisor=")) {
+   promisor_msg = arg + strlen("--promisor=");
} else if (starts_with(arg, "--threads=")) {
char *end;
nr_threads = strtoul(arg+10, &end, 0);
@@ -1803,7 +1813,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
if (!verify)
final(pack_name, curr_pack,
  index_name, curr_index,
- keep_msg,
+ keep_msg, 

[PATCH 17/18] unpack-trees: batch fetching of missing blobs

2017-09-29 Thread Jonathan Tan
When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.

This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.

This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.

Signed-off-by: Jonathan Tan 
---
 fetch-object.c   | 27 +++
 fetch-object.h   |  5 +
 t/t5601-clone.sh | 52 
 unpack-trees.c   | 22 ++
 4 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 369b61c0e..21b4dfafc 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -3,12 +3,12 @@
 #include "pkt-line.h"
 #include "strbuf.h"
 #include "transport.h"
+#include "fetch-object.h"
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+static void fetch_refs(const char *remote_name, struct ref *ref)
 {
struct remote *remote;
struct transport *transport;
-   struct ref *ref;
int original_fetch_if_missing = fetch_if_missing;
 
fetch_if_missing = 0;
@@ -17,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
die(_("Remote with no URL"));
transport = transport_get(remote, remote->url[0]);
 
-   ref = alloc_ref(sha1_to_hex(sha1));
-   hashcpy(ref->old_oid.hash, sha1);
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_HAVES, "1");
transport_fetch_refs(transport, ref);
fetch_if_missing = original_fetch_if_missing;
 }
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+   struct ref *ref = alloc_ref(sha1_to_hex(sha1));
+   hashcpy(ref->old_oid.hash, sha1);
+   fetch_refs(remote_name, ref);
+}
+
+void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+{
+   struct ref *ref = NULL;
+   int i;
+
+   for (i = 0; i < to_fetch->nr; i++) {
+   struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i]));
+   oidcpy(&new_ref->old_oid, &to_fetch->oid[i]);
+   new_ref->next = ref;
+   ref = new_ref;
+   }
+   fetch_refs(remote_name, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
index f371300c8..4b269d07e 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,6 +1,11 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
+#include "sha1-array.h"
+
 extern void fetch_object(const char *remote_name, const unsigned char *sha1);
 
+extern void fetch_objects(const char *remote_name,
+ const struct oid_array *to_fetch);
+
 #endif
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 951b1ffa8..0ca24c215 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does not support blob-max-byt
test_i18ngrep "blob-max-bytes not recognized by server" err
 '
 
+test_expect_success 'batch missing blob request during checkout' '
+   rm -rf server client &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+
+   git -C server commit -m x &&
+   echo aa >server/a &&
+   echo bb >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   test_config -C server uploadpack.advertiseblobmaxbytes 1 &&
+   test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+   git clone --blob-max-bytes=0 "file://$(pwd)/server" client &&
+
+   # Ensure that there is only one negotiation by checking that there is
+   # only "done" line sent. ("done" marks the end of negotiation.)
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ &&
+   grep "git> done" trace >done_lines &&
+   test_line_count = 1 done_lines
+'
+
+test_expect_success 'batch missing blob request does not inadvertently try to fetch gitlinks' '
+   rm -rf server client &&
+
+   test_create_repo repo_for_submodule &&
+   test_commit -C repo_for_submodule x &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   echo aa >server/a &&
+   echo bb >server/b &&
+   # Also add a gitlink pointing to an arbitrary repository
+   git -C server submodule add "$(pwd)/repo_for_submodule" c &&
+   

[PATCH 12/18] fetch-pack: support excluding large blobs

2017-09-29 Thread Jonathan Tan
Teach fetch-pack and upload-pack to support excluding large blobs
through a blob-max-bytes parameter.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/pack-protocol.txt |  9 
 Documentation/technical/protocol-capabilities.txt |  7 ++
 builtin/fetch-pack.c  | 11 +
 fetch-pack.c  | 11 +
 fetch-pack.h  |  1 +
 t/t5500-fetch-pack.sh | 27 +++
 upload-pack.c | 16 +-
 7 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index ed1eae8b8..db0e1150b 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request=  want-list
   *shallow-line
   *1depth-request
+  *1blob-max-bytes
   flush-pkt
 
   want-list =  first-want
@@ -223,10 +224,14 @@ out of what the server said it could do with the first 'want' line.
   PKT-LINE("deepen-since" SP timestamp) /
   PKT-LINE("deepen-not" SP ref)
 
+  blob-max-bytes=  PKT-LINE("blob-max-bytes" SP magnitude)
+
   first-want=  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth =  1*DIGIT
+
+  magnitude =  1*DIGIT
 
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +254,10 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request a partial packfile that omits blobs
+above a certain size threshold using "blob-max-bytes". Files whose names
+start with ".git" are always included in the packfile, however.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f50..7e878fce5 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+blob-max-bytes
+--------------
+
+If the upload-pack server advertises this capability, fetch-pack
+may send "blob-max-bytes" to request the server to omit blobs above a
+certain size from the packfile.
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 9a7ebf6e9..116be9bf5 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "config.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -153,6 +154,16 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.no_haves = 1;
continue;
}
+   if (skip_prefix(arg, "--blob-max-bytes=", &arg)) {
+   unsigned long *ptr = xmalloc(sizeof(*ptr));
+   if (!git_parse_ulong(arg, ptr)) {
+   error("Invalid --blob-max-bytes value: %s",
+ arg);
+   usage(fetch_pack_usage);
+   }
+   args.blob_max_bytes = ptr;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index d376c4ef1..19b8e9322 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -26,6 +26,7 @@ static int prefer_ofs_delta = 1;
 static int no_done;
 static int deepen_since_ok;
 static int deepen_not_ok;
+static int blob_max_bytes_ok;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
@@ -407,6 +408,13 @@ static int find_common(struct fetch_pack_args *args,
packet_buf_write(&req_buf, "deepen-not %s", s->string);
}
}
+   if (args->blob_max_bytes) {
+   if (blob_max_bytes_ok)
+   packet_buf_write(&req_buf, "blob-max-bytes %ld",
+   *args->blob_max_bytes);
+   else
+   

[PATCH 18/18] fetch-pack: restore save_commit_buffer after use

2017-09-29 Thread Jonathan Tan
In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.

Therefore, restore save_commit_buffer to its original value after use.

An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.
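
A minimal reproduction sketch (hedged; assumes a partial clone set up with the
earlier patches in this series):

$ git clone --blob-max-bytes=0 "file://$(pwd)/server" client
$ git -C client log -S foo

The "log -S" walk lazily fetches missing blobs; without this fix, the commit
messages are silently omitted from the output.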

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 19b8e9322..8e6f54547 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -719,6 +719,7 @@ static int everything_local(struct fetch_pack_args *args,
 {
struct ref *ref;
int retval;
+   int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
 
save_commit_buffer = 0;
@@ -786,6 +787,9 @@ static int everything_local(struct fetch_pack_args *args,
print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
  ref->name);
}
+
+   save_commit_buffer = old_save_commit_buffer;
+
return retval;
 }
 
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 15/18] clone: support excluding large blobs

2017-09-29 Thread Jonathan Tan
Teach clone to support excluding large blobs through a blob-max-bytes
parameter.
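
A hedged usage sketch ("server" is a placeholder; size suffixes per the
pack-objects documentation earlier in this series):

$ git clone --blob-max-bytes=1m "file://$(pwd)/server" client

Besides limiting the initial fetch, such a clone marks the new repository as a
partial one: as the code below shows, it sets core.repositoryformatversion to 1
and extensions.partialclone to "origin" so that later object accesses know
where to fetch from.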

Signed-off-by: Jonathan Tan 
---
 builtin/clone.c  | 23 +--
 t/t5601-clone.sh | 49 +
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98f8..4c2193dc4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static char *blob_max_bytes;
 
 static int recurse_submodules_cb(const struct option *opt,
 const char *arg, int unset)
@@ -135,6 +136,8 @@ static struct option builtin_clone_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_STRING(0, "blob-max-bytes", &blob_max_bytes, N_("bytes"),
+  N_("do not fetch blobs above this size")),
OPT_END()
 };
 
@@ -886,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
@@ -1104,7 +1109,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 option_upload_pack);
 
-   if (transport->smart_options && !deepen)
+   if (blob_max_bytes) {
+   transport_set_option(transport, TRANS_OPT_BLOB_MAX_BYTES,
+blob_max_bytes);
+   transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
+
+   if (transport->smart_options && !deepen && !blob_max_bytes)
transport->smart_options->check_self_contained_and_connected = 1;
 
refs = transport_get_remote_refs(transport);
@@ -1164,13 +1175,20 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
write_refspec_config(src_ref_prefix, our_head_points_at,
remote_head_points_at, _top);
 
+   if (blob_max_bytes) {
+   git_config_set("core.repositoryformatversion", "1");
+   git_config_set("extensions.partialclone", "origin");
+   repository_format_partial_clone = "origin";
+   }
+
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport, !is_local);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local && !blob_max_bytes);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
@@ -1191,6 +1209,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
}
 
junk_mode = JUNK_LEAVE_REPO;
+   fetch_if_missing = 1;
err = checkout(submodule_progress);
 
strbuf_release(_msg);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 9c56f771b..951b1ffa8 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' '
git -C replay.git index-pack -v --stdin  err &&
+
+   test_i18ngrep "blob-max-bytes not recognized by server" err
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'partial clone using HTTP' '
+   partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
 test_done
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 11/18] pack-objects: support --blob-max-bytes

2017-09-29 Thread Jonathan Tan
As part of an effort to improve Git support for very large repositories
in which clients typically have only a subset of all version-controlled
blobs, teach pack-objects to support --blob-max-bytes, packing only
blobs not exceeding that size unless the blob corresponds to a file
whose name starts with ".git". upload-pack will eventually be taught to
use this new parameter if needed to exclude certain blobs during a fetch
or clone, potentially drastically reducing network consumption when
serving these very large repositories.

Since any excluded blob should not act as a delta base, I did this by
avoiding such oversized blobs from ever going into the "to_pack" data
structure, which contains both preferred bases (which do not go into the
final packfile but can act as delta bases) and the objects to be packed
(which go into the final packfile and also can act as delta bases). To
that end, add_object_entry() has been modified to exclude the
appropriate non-preferred-base objects. (Preferred bases, regardless of
size, still enter "to_pack" - they are something that the client
indicates that it has, not something that the server needs to serve, so
no exclusion occurs.)

If bitmaps are to be used, we would not know if a blob corresponded to a
file whose name starts with ".git". For this reason, when invoked with
--blob-max-bytes, pack-objects will not use bitmaps.
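
A hedged usage sketch (the option is only accepted together with --stdout, per
the documentation change below):

$ git rev-list --objects HEAD | git pack-objects --stdout --blob-max-bytes=1m >out

Note that "out" then contains the packfile followed by the pkt-line list of
omitted blob names and sizes, so a consumer has to split the two.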

Signed-off-by: Jonathan Tan 
---
 Documentation/git-pack-objects.txt | 12 +-
 builtin/pack-objects.c | 33 ++--
 t/t5300-pack-object.sh | 45 ++
 t/test-lib-functions.sh| 12 ++
 4 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 473a16135..ddce2d4c1 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -12,7 +12,8 @@ SYNOPSIS
 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
-   [--revs [--unpacked | --all]] [--stdout | base-name]
+   [--revs [--unpacked | --all]]
+   [--stdout [--blob-max-bytes=<n>] | base-name]
[--shallow] [--keep-true-parents] < object-list
 
 
@@ -236,6 +237,15 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
With this option, parents that are hidden by grafts are packed
nevertheless.
 
+--blob-max-bytes=<n>::
+   This option can only be used with --stdout. If specified, a blob
+   larger than this will not be packed unless a to-be-packed tree
+   has that blob with a filename beginning with ".git".  The size
+   can be suffixed with "k", "m", or "g", and may be "0".
++
+If specified, after printing the packfile, pack-objects will print, in packet
+format (pkt-line), the names of excluded blobs and their sizes.
+
 SEE ALSO
 
 linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ef0b61d5f..8686b4351 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -26,6 +26,8 @@
 #include "argv-array.h"
 #include "mru.h"
 #include "packfile.h"
+#include "pkt-line.h"
+#include "varint.h"
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -81,6 +83,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static long blob_max_bytes = -1;
+
 /*
  * stats
  */
@@ -1072,6 +1076,27 @@ static const char no_closure_warning[] = N_(
 "disabling bitmap writing, as some objects are not being packed"
 );
 
+/*
+ * Returns 1 if the given blob exceeds the configured blob size
+ * limit.  Blobs that appear as a tree entry whose basename begins with
+ * ".git" are never considered oversized.
+ */
+static int oversized(const unsigned char *sha1, const char *name) {
+   const char *last_slash, *basename;
+   unsigned long size;
+
+   if (blob_max_bytes < 0)
+   return 0;
+
+   last_slash = strrchr(name, '/');
+   basename = last_slash ? last_slash + 1 : name;
+   if (starts_with(basename, ".git"))
+   return 0;
+
+   sha1_object_info(sha1, &size);
+   return size > blob_max_bytes;
+}
+
 static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int preferred_base)
 {
@@ -1082,7 +1107,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
if (have_duplicate_entry(sha1, preferred_base, _pos))
return 0;
 
-   if (ignore_object(sha1, preferred_base, &found_pack, &found_offset)) {
+   if (ignore_object(sha1, preferred_base, &found_pack, &found_offset) ||
+   (!preferred_base && type == OBJ_BLOB && oversized(sha1, name))) {
/* The pack is missing 

[PATCH 07/18] sha1_file: support lazily fetching missing objects

2017-09-29 Thread Jonathan Tan
Teach sha1_file to fetch objects from the remote configured in
extensions.partialClone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.
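
As a hedged illustration of the resulting behavior (repository and path names
are placeholders):

$ git clone --blob-max-bytes=0 "file://$(pwd)/server" client
$ git -C client cat-file -p HEAD:a

The blob for "a" is not in the local object store, so the cat-file invocation
triggers an automatic fetch from origin before printing it.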

In order to determine the code changes necessary in sha1_file.c, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, where the caller does
     not care how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c   |  2 ++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 38 
 t/t0410-partial-clone.sh | 51 
 8 files changed, 100 insertions(+), 13 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4ccbfaac3..7aa1159ce 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -474,6 +474,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, &cb, 0);
for_each_packed_object(batch_packed_object, &cb, 0);
+   if (repository_format_partial_clone)
+   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 9f303cf98..9a7ebf6e9 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index f6cb4d755..155ca663b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -683,6 +683,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 14ebb2f17..440558046 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index 437206d06..58d3f8986 100644
--- a/cache.h
+++ b/cache.h
@@ -1721,6 +1721,14 @@ struct object_info {
 

[PATCH 09/18] gc: do not repack promisor packfiles

2017-09-29 Thread Jonathan Tan
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.
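
A hedged sketch of the intended effect (assumes a partial clone as set up in
earlier patches):

$ ls client/.git/objects/pack/*.promisor
$ git -C client gc
$ ls client/.git/objects/pack/*.promisor

The second listing should show the same promisor packfiles as the first; only
non-promisor packfiles are candidates for repacking and deletion.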

Signed-off-by: Jonathan Tan 
---
 builtin/gc.c |  3 +++
 builtin/pack-objects.c   | 10 ++
 builtin/prune.c  |  7 +++
 builtin/repack.c |  7 +--
 t/t0410-partial-clone.sh | 52 +++-
 5 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c22787ac7..7a9632bba 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(&prune, prune_expire);
if (quiet)
argv_array_push(&prune, "--no-progress");
+   if (repository_format_partial_clone)
+   argv_array_push(&prune,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a57b4f058..958822bf4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -73,6 +73,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -2952,6 +2954,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 N_("use a bitmap index if available to speed up counting objects")),
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 N_("write a bitmap index together with the pack index")),
+   OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+   N_("do not pack objects in promisor packfiles")),
OPT_END(),
};
 
@@ -2997,6 +3001,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
argv_array_push(&rp, "--unpacked");
}
 
+   if (exclude_promisor_objects) {
+   use_internal_rev_list = 1;
+   fetch_if_missing = 0;
+   argv_array_push(&rp, "--exclude-promisor-objects");
+   }
+
if (!reuse_object)
reuse_delta = 0;
if (pack_compression_level == -1)
diff --git a/builtin/prune.c b/builtin/prune.c
index cddabf26a..be34645dc 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -101,12 +101,15 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
struct progress *progress = NULL;
+   int exclude_promisor_objects = 0;
const struct option options[] = {
OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
OPT__VERBOSE(&verbose, N_("report pruned objects")),
OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_EXPIRY_DATE(0, "expire", &expire,
N_("expire objects older than <time>")),
+   OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+   N_("limit traversal to objects outside promisor packfiles")),
OPT_END()
};
char *s;
@@ -139,6 +142,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
show_progress = isatty(2);
if (show_progress)
progress = start_delayed_progress(_("Checking connectivity"), 0);
+   if (exclude_promisor_objects) {
+   fetch_if_missing = 0;
+   revs.exclude_promisor_objects = 1;
+   }
 
mark_reachable_objects(&revs, 1, expire, progress);
stop_progress();
diff --git a/builtin/repack.c b/builtin/repack.c
index f17a68a17..f43317bb5 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -83,7 +83,8 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep file.
+ * have a corresponding .keep or .promisor file. These packs are not to
+ * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
@@ -101,7 +102,8 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
 
fname = xmemdupz(e->d_name, len);
 
-   if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
+   if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
+   !file_exists(mkpath("%s/%s.promisor", packdir, 

[PATCH 14/18] fetch: support excluding large blobs

2017-09-29 Thread Jonathan Tan
Teach fetch to support excluding large blobs through a blob-max-bytes
parameter. This is only allowed for the remote configured in
extensions.partialclone.
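
For example (hedged; mirrors the tests below, with "origin" being the promisor
remote):

$ git -C client config extensions.partialclone origin
$ git -C client fetch --blob-max-bytes=0 origin

Passing the option for any other remote, or without extensions.partialclone
set, dies with an error.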

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c   | 22 --
 connected.c   |  1 +
 remote-curl.c |  7 +++
 t/t5500-fetch-pack.sh | 36 
 transport-helper.c|  4 
 transport.c   | 10 ++
 transport.h   |  4 
 7 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f03923..07beaf5b5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static const char *blob_max_bytes;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -160,6 +161,8 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_STRING(0, "blob-max-bytes", &blob_max_bytes, N_("bytes"),
+  N_("do not fetch blobs above this size")),
OPT_END()
 };
 
@@ -1044,6 +1047,10 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
if (update_shallow)
set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+   if (blob_max_bytes) {
+   set_option(transport, TRANS_OPT_BLOB_MAX_BYTES, blob_max_bytes);
+   set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
return transport;
 }
 
@@ -1328,6 +1335,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
packet_trace_identity("fetch");
 
+   fetch_if_missing = 0;
+
/* Record the command line for the reflog */
strbuf_addstr(_rla, "fetch");
for (i = 1; i < argc; i++)
@@ -1361,6 +1370,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
if (depth || deepen_since || deepen_not.nr)
deepen = 1;
 
+   if (blob_max_bytes && !repository_format_partial_clone)
+   die("--blob-max-bytes can only be used when extensions.partialClone is set");
+
if (all) {
if (argc == 1)
die(_("fetch --all does not take a repository 
argument"));
@@ -1390,10 +1402,16 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
}
}
 
-   if (remote)
-   if (remote)
+   if (remote) {
+   if (blob_max_bytes &&
+   strcmp(remote->name, repository_format_partial_clone))
+   die(_("--blob-max-bytes can only be used with the remote configured in extensions.partialClone"));
result = fetch_one(remote, argc, argv);
-   else
+   } else {
+   if (blob_max_bytes)
+   die(_("--blob-max-bytes can only be used with the remote configured in extensions.partialClone"));
result = fetch_multiple(&list);
+   }
 
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
diff --git a/connected.c b/connected.c
index f416b0505..a51c01d63 100644
--- a/connected.c
+++ b/connected.c
@@ -56,6 +56,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
argv_array_push(&rev_list.args, "rev-list");
argv_array_push(&rev_list.args, "--objects");
argv_array_push(&rev_list.args, "--stdin");
+   argv_array_push(&rev_list.args, "--exclude-promisor-objects");
argv_array_push(&rev_list.args, "--not");
argv_array_push(&rev_list.args, "--all");
argv_array_push(&rev_list.args, "--quiet");
diff --git a/remote-curl.c b/remote-curl.c
index 34a81b8d3..18fd184c1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -24,6 +24,7 @@ struct options {
char *deepen_since;
struct string_list deepen_not;
struct string_list push_options;
+   char *blob_max_bytes;
unsigned progress : 1,
check_self_contained_and_connected : 1,
cloning : 1,
@@ -165,6 +166,9 @@ static int set_option(const char *name, const char *value)
} else if (!strcmp(name, "no-haves")) {
options.no_haves = 1;
return 0;
+   } else if (!strcmp(name, "blob-max-bytes")) {
+   options.blob_max_bytes = xstrdup(value);
+   return 0;
} else {
return 1 /* unsupported */;
}
@@ -834,6 +838,9 @@ static int fetch_git(struct discovery *heads,
argv_array_push(, "--from-promisor");
if (options.no_haves)
argv_array_push(, "--no-haves");
+   if 

[PATCH 04/18] fsck: support promisor objects as CLI argument

2017-09-29 Thread Jonathan Tan
Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.
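
For example (hedged; "$A" stands for the object name of a missing-but-promised
object, as in the test below):

$ git -C repo fsck "$A"

Previously this reported "object missing"; with this patch it passes.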

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  2 ++
 t/t0410-partial-clone.sh | 13 +
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 4492a4fab..f6cb4d755 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -755,6 +755,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&oid))
+   continue;
error("%s: object missing", oid_to_hex(&oid));
errors_found |= ERROR_OBJECT;
continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4f9931f9b..e96f436b0 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck "$A"
+'
+
 test_done
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 16/18] clone: configure blobmaxbytes in created repos

2017-09-29 Thread Jonathan Tan
Teach clone to configure blobmaxbytes in any repos that it generates
when the --blob-max-bytes parameter is set. Also teach fetch to use this
parameter.
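
A hedged end-to-end sketch:

$ git clone --blob-max-bytes=1m "file://$(pwd)/server" client
$ git -C client config remote.origin.blobmaxbytes
1m
$ git -C client fetch origin

The fetch reuses the configured limit without --blob-max-bytes having to be
repeated on the command line.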

Signed-off-by: Jonathan Tan 
---
 builtin/clone.c   |  1 +
 builtin/fetch.c   |  4 
 remote.c  |  2 ++
 remote.h  |  2 ++
 t/t5500-fetch-pack.sh | 64 ++-
 5 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4c2193dc4..58cbc8ae3 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1179,6 +1179,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
git_config_set("core.repositoryformatversion", "1");
git_config_set("extensions.partialclone", "origin");
repository_format_partial_clone = "origin";
+   git_config_set("remote.origin.blobmaxbytes", blob_max_bytes);
}
 
if (is_local)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 07beaf5b5..ace238554 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1050,6 +1050,10 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
if (blob_max_bytes) {
set_option(transport, TRANS_OPT_BLOB_MAX_BYTES, blob_max_bytes);
set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   } else if (remote->blob_max_bytes) {
+   set_option(transport, TRANS_OPT_BLOB_MAX_BYTES,
+  remote->blob_max_bytes);
+   set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
}
return transport;
 }
diff --git a/remote.c b/remote.c
index 411309006..eade3c312 100644
--- a/remote.c
+++ b/remote.c
@@ -440,6 +440,8 @@ static int handle_config(const char *key, const char *value, void *cb)
 key, value);
} else if (!strcmp(subkey, "vcs")) {
return git_config_string(&remote->foreign_vcs, key, value);
+   } else if (!strcmp(subkey, "blobmaxbytes")) {
+   return git_config_string(&remote->blob_max_bytes, key, value);
}
return 0;
 }
diff --git a/remote.h b/remote.h
index 2ecf4c8c7..3d56e62b7 100644
--- a/remote.h
+++ b/remote.h
@@ -56,6 +56,8 @@ struct remote {
 */
char *http_proxy;
char *http_proxy_authmethod;
+
+   const char *blob_max_bytes;
 };
 
 struct remote *remote_get(const char *name);
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index b2682862f..ee533ea32 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -782,9 +782,9 @@ test_expect_success '--blob-max-bytes has no effect if support for it is not adv
test_i18ngrep "blob-max-bytes not recognized by server" err
 '
 
-fetch_blob_max_bytes () {
-   SERVER="$1"
-   URL="$2"
+setup_blob_max_bytes () {
+   SERVER="$1" &&
+   URL="$2" &&
 
rm -rf "$SERVER" client &&
test_create_repo "$SERVER" &&
@@ -794,7 +794,11 @@ fetch_blob_max_bytes () {
git clone "$URL" client &&
test_config -C client extensions.partialclone origin &&
 
-   test_commit -C "$SERVER" two &&
+   test_commit -C "$SERVER" two
+}
+
+do_blob_max_bytes() {
+   SERVER="$1" &&
 
git -C client fetch --blob-max-bytes=0 origin HEAD:somewhere &&
 
@@ -805,14 +809,62 @@ fetch_blob_max_bytes () {
 }
 
 test_expect_success 'fetch with --blob-max-bytes' '
-   fetch_blob_max_bytes server server
+   setup_blob_max_bytes server server &&
+   do_blob_max_bytes server
+'
+
+test_expect_success 'fetch respects configured blobmaxbytes' '
+   setup_blob_max_bytes server server &&
+
+   test_config -C client remote.origin.blobmaxbytes 0 &&
+
+   git -C client fetch origin HEAD:somewhere &&
+
+   # Ensure that commit is fetched, but blob is not
+   test_config -C client extensions.partialclone "arbitrary string" &&
+   git -C client cat-file -e $(git -C server rev-parse two) &&
+   test_must_fail git -C client cat-file -e $(git hash-object server/two.t)
+'
+
+test_expect_success 'pull respects configured blobmaxbytes' '
+   setup_blob_max_bytes server server &&
+
+   # Hide two.t from tip so that client does not load it upon the
+   # automatic checkout that pull performs
+   git -C server rm two.t &&
+   test_commit -C server three &&
+
+   test_config -C server uploadpack.allowanysha1inwant 1 &&
+   test_config -C client remote.origin.blobmaxbytes 0 &&
+
+   git -C client pull origin &&
+
+   # Ensure that commit is fetched, but blob is not
+   test_config -C client extensions.partialclone "arbitrary string" &&
+   git -C client cat-file -e $(git -C server rev-parse two) &&
+   test_must_fail git -C client cat-file -e $(git hash-object server/two.t)
+'
+
+test_expect_success 'clone configures blobmaxbytes' '
+   rm -rf server client &&
+   test_create_repo server &&
+ 

[PATCH 05/18] index-pack: refactor writing of .keep files

2017-09-29 Thread Jonathan Tan
In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan 
---
 builtin/index-pack.c | 99 
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f2be145e1..7ad170590 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+  struct strbuf *buf)
+{
+   size_t len;
+   if (!strip_suffix(pack_name, ".pack", &len))
+   die(_("packfile name '%s' does not end with '.pack'"),
+   pack_name);
+   strbuf_add(buf, pack_name, len);
+   strbuf_addch(buf, '.');
+   strbuf_addstr(buf, suffix);
+   return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+  const char *pack_name, const unsigned char *sha1,
+  const char **report)
+{
+   struct strbuf name_buf = STRBUF_INIT;
+   const char *filename;
+   int fd;
+   int msg_len = strlen(msg);
+
+   if (pack_name)
+   filename = derive_filename(pack_name, suffix, &name_buf);
+   else
+   filename = odb_pack_name(&name_buf, sha1, suffix);
+
+   fd = odb_pack_keep(filename);
+   if (fd < 0) {
+   if (errno != EEXIST)
+   die_errno(_("cannot write %s file '%s'"),
+ suffix, filename);
+   } else {
+   if (msg_len > 0) {
+   write_or_die(fd, msg, msg_len);
+   write_or_die(fd, "\n", 1);
+   }
+   if (close(fd) != 0)
+   die_errno(_("cannot close written %s file '%s'"),
+ suffix, filename);
+   *report = suffix;
+   }
+   strbuf_release(&name_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_name, const char *keep_msg,
- unsigned char *sha1)
+ const char *keep_msg, unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
struct strbuf index_name = STRBUF_INIT;
-   struct strbuf keep_name_buf = STRBUF_INIT;
int err;
 
if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
die_errno(_("error while closing pack file"));
}
 
-   if (keep_msg) {
-   int keep_fd, keep_msg_len = strlen(keep_msg);
-
-   if (!keep_name)
-   keep_name = odb_pack_name(&keep_name_buf, sha1, "keep");
-
-   keep_fd = odb_pack_keep(keep_name);
-   if (keep_fd < 0) {
-   if (errno != EEXIST)
-   die_errno(_("cannot write keep file '%s'"),
- keep_name);
-   } else {
-   if (keep_msg_len > 0) {
-   write_or_die(keep_fd, keep_msg, keep_msg_len);
-   write_or_die(keep_fd, "\n", 1);
-   }
-   if (close(keep_fd) != 0)
-   die_errno(_("cannot close written keep file 
'%s'"),
- keep_name);
-   report = "keep";
-   }
-   }
+   if (keep_msg)
+   write_special_file("keep", keep_msg, final_pack_name, sha1,
+   &report);
 
if (final_pack_name != curr_pack_name) {
if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 
strbuf_release(&pack_name);
strbuf_release(&index_name);
-   strbuf_release(&keep_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-  struct strbuf *buf)
-{
-   size_t len;
-   if (!strip_suffix(pack_name, ".pack", &len))
-   die(_("packfile name '%s' does not end with '.pack'"),
-   pack_name);
-   strbuf_add(buf, pack_name, len);
-   strbuf_addstr(buf, suffix);
-   return buf->buf;
-}
-
 int cmd_index_pack(int 

[PATCH 08/18] rev-list: support termination at promisor objects

2017-09-29 Thread Jonathan Tan
Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the cases where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)
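
A hedged invocation sketch (this is essentially the traversal that the
connectivity check in a later patch runs):

$ git -C client rev-list --objects --exclude-promisor-objects --all

This walks all reachable objects but quietly stops at promisor objects instead
of dying on the missing ones.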

Signed-off-by: Jonathan Tan 
---
 builtin/rev-list.c   |  13 ++
 list-objects.c   |  16 +++-
 object.c |   2 +-
 revision.c   |  33 +++-
 revision.h   |   5 ++-
 t/t0410-partial-clone.sh | 101 +++
 6 files changed, 165 insertions(+), 5 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index c1c74d4a7..bba800cb9 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -12,6 +12,7 @@
 #include "bisect.h"
 #include "progress.h"
 #include "reflog-walk.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -287,6 +288,18 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
init_revisions(&revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+
+   /*
+* Scan the argument list before invoking setup_revisions(), so that we
+* know if fetch_if_missing needs to be set to 0.
+*/
+   for (i = 1; i < argc; i++) {
+   if (!strcmp(argv[i], "--exclude-promisor-objects")) {
+   fetch_if_missing = 0;
+   break;
+   }
+   }
+
argc = setup_revisions(argc, argv, &revs, NULL);

memset(&info, 0, sizeof(info));
diff --git a/list-objects.c b/list-objects.c
index b3931fa43..8a98192dd 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "packfile.h"
 
 static void process_blob(struct rev_info *revs,
 struct blob *blob,
@@ -24,6 +25,13 @@ static void process_blob(struct rev_info *revs,
die("bad blob object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
+   if (!has_object_file(>oid)) {
+   if (revs->exclude_promisor_objects &&
+   is_promisor_object(>oid)) {
+   return;
+   }
+   /* error message will be reported later */
+   }
obj->flags |= SEEN;
 
pathlen = path->len;
@@ -77,6 +85,8 @@ static void process_tree(struct rev_info *revs,
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
all_entries_interesting: entry_not_interesting;
int baselen = base->len;
+   int gently = revs->ignore_missing_links ||
+revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -84,9 +94,13 @@ static void process_tree(struct rev_info *revs,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, revs->ignore_missing_links) < 0) {
+   if (parse_tree_gently(tree, gently) < 0) {
if (revs->ignore_missing_links)
return;
+   if (revs->exclude_promisor_objects &&
+   is_promisor_object(>oid)) {
+   return;
+   }
die("bad tree object %s", oid_to_hex(>oid));
}
 
diff --git a/object.c b/object.c
index 321d7e920..0f2490145 100644
--- a/object.c
+++ b/object.c
@@ -252,7 +252,7 @@ struct object *parse_object(const struct object_id *oid)
if (obj && obj->parsed)
return obj;
 
-   if ((obj && obj->type == OBJ_BLOB) ||
+   if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
(!obj && has_object_file(oid) &&
 sha1_object_info(oid->hash, NULL) == OBJ_BLOB)) {
if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
diff --git a/revision.c b/revision.c
index f9a90d71d..f92d03bfe 100644
--- a/revision.c
+++ b/revision.c
@@ -197,6 +197,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
if (!object) {
if (revs->ignore_missing)
return object;
+   if (revs->exclude_promisor_objects && is_promisor_object(oid))
+  

[PATCH 02/18] fsck: support refs pointing to promisor objects

2017-09-29 Thread Jonathan Tan
Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  8 
 t/t0410-partial-clone.sh | 24 
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 97d1e621e..f1529527b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -439,6 +439,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (is_promisor_object(oid)) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+   default_refs++;
+   return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 3ddb3b98f..bf75162c1 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+   HASH=$(git -C repo rev-parse "$1") &&
+   git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+   git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+   git -C repo tag -d my_annotated_tag &&
+   delete_object repo "$HASH"
+}
+
test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
test_create_repo repo &&
test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension
test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref
+   git -C repo branch my_branch "$A" &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 13/18] fetch: refactor calculation of remote list

2017-09-29 Thread Jonathan Tan
Separate out the calculation of remotes to be fetched from and the
actual fetching. This will allow us to include an additional step before
the actual fetching in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c73492..1b1f03923 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
int i;
struct string_list list = STRING_LIST_INIT_DUP;
-   struct remote *remote;
+   struct remote *remote = NULL;
int result = 0;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
@@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
else if (argc > 1)
die(_("fetch --all does not make sense with refspecs"));
(void) for_each_remote(get_one_remote_for_fetch, &list);
-   result = fetch_multiple();
} else if (argc == 0) {
/* No arguments -- use default remote */
remote = remote_get(NULL);
-   result = fetch_one(remote, argc, argv);
} else if (multiple) {
/* All arguments are assumed to be remotes or groups */
for (i = 0; i < argc; i++)
if (!add_remote_or_group(argv[i], &list))
die(_("No such remote or remote group: %s"), argv[i]);
-   result = fetch_multiple();
} else {
/* Single remote or group */
(void) add_remote_or_group(argv[0], &list);
@@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
/* More than one remote */
if (argc > 1)
die(_("Fetching a group and specifying refspecs 
does not make sense"));
-   result = fetch_multiple();
} else {
/* Zero or one remotes */
remote = remote_get(argv[0]);
-   result = fetch_one(remote, argc-1, argv+1);
+   argc--;
+   argv++;
}
}
 
+   if (remote)
+   result = fetch_one(remote, argc, argv);
+   else
result = fetch_multiple(&list);
+
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
 
-- 
2.14.2.822.g60be5d43e6-goog



[PATCH 01/18] fsck: introduce partialclone extension

2017-09-29 Thread Jonathan Tan
Currently, Git does not cope well with repos that contain very large
numbers of objects, or repos that wish to minimize manipulation of
certain blobs (for example, because they are very large), even when the
user operates mostly on part of the repo: Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Introduce the ability to have missing objects in a repo.  This
functionality is guarded behind a new repository extension option
`extensions.partialClone`. See the update to
Documentation/technical/repository-version.txt in this patch for more
information.
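
A hedged sketch of opting an existing repository in by hand (mirroring what the
new t0410 tests do):

$ git -C repo config core.repositoryformatversion 1
$ git -C repo config extensions.partialclone origin

The extension's value names the promisor remote; the fsck-only tests in this
series use an arbitrary string there, since they never actually fetch.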

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/repository-version.txt | 12 
 builtin/fsck.c |  2 +-
 cache.h|  5 +-
 environment.c  |  1 +
 packfile.c | 77 +++-
 packfile.h | 13 +
 setup.c|  7 ++-
 t/t0410-partial-clone.sh   | 81 ++
 8 files changed, 192 insertions(+), 6 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 00ad37986..074ccb9e0 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,15 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`partialClone`
+~~~~~~~~~~~~~~
+
+When the config key `extensions.partialClone` is set, a remote is
+designated as a "promisor remote". Objects referenced by packed objects
+obtained from that promisor remote do not need to be in the local repo.
+Instead, the promisor remote promises that all such objects can be
+fetched from it in the future, and Git will fetch those objects when
+needed.
+
+The value of this key is the name of the aforementioned promisor remote.
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1e4c471b4..97d1e621e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -403,7 +403,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!is_promisor_object(oid)) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index a916bc79e..437206d06 100644
--- a/cache.h
+++ b/cache.h
@@ -853,10 +853,12 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_partial_clone;
 
 struct repository_format {
int version;
int precious_objects;
+   char *partial_clone;
int is_bare;
char *work_tree;
struct string_list unknown_extensions;
@@ -1578,7 +1580,8 @@ extern struct packed_git {
unsigned pack_local:1,
 pack_keep:1,
 freshened:1,
-do_not_close:1;
+do_not_close:1,
+pack_promisor:1;
unsigned char sha1[20];
struct revindex_entry *revindex;
/* something like ".git/objects/pack/x.pack" */
diff --git a/environment.c b/environment.c
index f1f934b6f..624385fb2 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_partial_clone;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/packfile.c b/packfile.c
index f86fa051c..88f424320 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,11 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -638,10 +643,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)

[PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)

2017-09-29 Thread Jonathan Tan
These patches are also available online:
https://github.com/jonathantanmy/git/commits/partialclone3

(I've announced it in another e-mail, but am now sending the patches to the
mailing list too.)

Here's an update of my work so far. Notable features:
 - These 18 patches allow a user to clone with --blob-max-bytes=<bytes>,
   creating a partial clone that is automatically configured to lazily
   fetch missing objects from the origin. The local repo also has fsck
   working offline, and GC working (albeit only on locally created
   objects).
 - Cloning and fetching is currently only able to exclude blobs by a
   size threshold, but the local repository is already capable of
   fetching missing objects of any type. For example, if a repository
   with missing trees or commits is generated by any tool (for example,
   a future version of Git), current Git with my patches will still be
   able to operate on them, automatically fetching those missing trees
   and commits when needed.
 - Missing blobs are fetched all at once during checkout.

Jeff Hostetler has sent out some object-filtering patches [1] that are a
superset of the object-filtering functionality that I have (in the
pack-objects patches). I have gone for the minimal approach here, but if
his patches are merged, I'll update my patch set to use those.

[1] https://public-inbox.org/git/20170922203017.53986-6-...@jeffhostetler.com/

Demo
====

Obtain a repository.

$ make prefix=$HOME/local install
$ cd $HOME/tmp
$ git clone https://github.com/git/git

Make it advertise the new feature and allow requests for arbitrary blobs.

$ git -C git config uploadpack.advertiseblobmaxbytes 1
$ git -C git config uploadpack.allowanysha1inwant 1

Perform the partial clone and check that it is indeed smaller. Specify
"file://" in order to test the partial clone mechanism. (If not, Git will
perform a local clone, which unselectively copies every object.)

$ git clone --blob-max-bytes=0 "file://$(pwd)/git" git2
$ git clone "file://$(pwd)/git" git3
$ du -sh git2 git3
85M     git2
130M    git3

Observe that the new repo is automatically configured to fetch missing objects
from the original repo. Subsequent fetches will also be partial.

$ cat git2/.git/config
[core]
repositoryformatversion = 1
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = [snip]
fetch = +refs/heads/*:refs/remotes/origin/*
blobmaxbytes = 0
[extensions]
partialclone = origin
[branch "master"]
remote = origin
merge = refs/heads/master

Design
======

Local repository layout
-----------------------

A repository declares its dependence on a *promisor remote* (a remote that
declares that it can serve certain objects when requested) by a repository
extension "partialclone". `extensions.partialclone` must be set to the name of
the remote ("origin" in the demo above).

A packfile can be annotated as originating from the promisor remote by the
existence of a "(packfile name).promisor" file with arbitrary contents (similar
to the ".keep" file). Whenever a promisor remote sends an object, it declares
that it can serve every object directly or indirectly referenced by the sent
object.

A promisor packfile is a packfile annotated with the ".promisor" file. A
promisor object is an object that the promisor remote is known to be able to
serve, because it is an object in a promisor packfile or directly referred to by
one.

(In the future, we might need to add ".promisor" support to loose objects.)
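
For example, the new t0410 tests mark a pack as a promisor pack simply by
creating the file (hedged sketch; pack-$HASH is a placeholder):

$ touch .git/objects/pack/pack-$HASH.promisor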

Connectivity check and gc
-------------------------

The object walk done by the connectivity check (as used by fsck and fetch) stops
at all promisor objects.

The object walk done by gc also stops at all promisor objects. Only non-promisor
packfiles are deleted (if pack deletion is requested); promisor packfiles are
left alone. This maintains the distinction between promisor packfiles and
non-promisor packfiles. (In the future, we might need to do something more
sophisticated with promisor packfiles.)

Fetching of missing objects
---------------------------

When `sha1_object_info_extended()` (or similar) is invoked, it will
automatically attempt to fetch a missing object from the promisor remote if that
object is not in the local repository. For efficiency, no check is made as to
whether that object is known to be a promisor object or not.

This automatic fetching can be toggled on and off by the `fetch_if_missing`
global variable, and it is on by default.

The actual fetch is done through the fetch-pack/upload-pack protocol. Right now,
this uses the fact that upload-pack allows blob and tree "want"s, and this
incurs the overhead of the unnecessary ref advertisement. I hope that protocol
v2 will allow us to declare that blob and tree "want"s are allowed, and allow
the client to declare that it does not want the ref advertisement. All packfiles
downloaded in this way 

Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Brandon Williams
On 09/29, Jonathan Nieder wrote:
> Hi Dscho,
> 
> Johannes Schindelin wrote:
> 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  .clang-format | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Well executed and well explained. Thank you.
> 
> Reviewed-by: Jonathan Nieder 
> 
> Going forward, is there an easy way to preview the effect of this kind
> of change (e.g., to run "make style" on the entire codebase so as to be
> able to compare the result with two different versions of
> .clang-format)?
> 
> Thanks,
> Jonathan

I don't think there's an easy way to do this yet (I'm sure we can make
one) though the biggest barrier to that is that most of the code base
probably isn't consistent with the current .clang-format.

I also took a look at the patch and agree with all your points.  I'm
sure we'll still have to do some tweaking of these parameters but I'll
start using this locally and see if I find any problems.
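
Something like the following untested sketch could serve as a crude preview,
assuming a clean worktree and that the .clang-format change is the tip commit:

$ git ls-files '*.c' '*.h' | xargs clang-format -i
$ git diff >new.diff && git checkout -- .
$ git checkout HEAD^ -- .clang-format
$ git ls-files '*.c' '*.h' | xargs clang-format -i
$ git diff -- '*.c' '*.h' >old.diff && git checkout -- .
$ git checkout HEAD -- .clang-format
$ diff old.diff new.diff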

-- 
Brandon Williams


Re: [PATCH 07/13] object-filter: common declarations for object filtering

2017-09-29 Thread Jonathan Tan
On Thu, 28 Sep 2017 10:33:39 -0400
Jeff Hostetler  wrote:

> Maybe.  What I have here now is the result of adding these arguments to
> rev-list and pack-objects (in the current patch series), and also to
> fetch-pack, fetch, clone, upload-pack, index-pack, and the transport and
> protocol code (in a follow-on patch series that I've omitted for the moment).
> And there will probably be a few more, such as fsck, gc, and etc.  I hesitate
> to refine the macros too much further until we've agreement on the overall
> approach and terms.

Fair enough. My current opinion on the overall approach (others might
differ, of course):
 - Filtering based on a sparse checkout specification in rev-list and
   pack-objects sounds useful to me, and is worth the filtering
   mechanism.
 - Filtering based on size (or based on type) still doesn't seem useful
   to me in rev-list, but if we're going to implement the filtering
   mechanism anyway, we might as well use the mechanism.
 - Besides my comments in [1], I think the API could still be slightly
   better organized. For example, object-filter probably should be the
   one to define the traverse_ function that takes in struct
   object_filter_options, and optionally a set of excluded objects to
   populate.

[1] 
https://public-inbox.org/git/20170927170533.65498396e008fa148a3fd...@google.com/


Re: [PATCH] oidmap: map with OID as key

2017-09-29 Thread Jeff King
On Fri, Sep 29, 2017 at 12:04:56PM -0700, Jonathan Tan wrote:

> > So depending how you count it, we're wasting between 28% (sha1 and no
> > extra hash) and 16% (sha256 plus reusing hashmap). That's not great, but
> > it's probably not breaking the bank.
> 
> Hmm...how did you get the 16% figure? The values, as I see it, are:
>  - 32 for the sha256 hash itself
>  - 8 for the "next" chain pointer
>  - 8 for the padded hash
>  - 8 for the "util" pointer
> 
> For an oidset, the padded hash and the "util" pointer are wasted, which is
> 16/56=0.286. (If you assume, say, 8 bytes of malloc bookkeeping overhead, then
> 16/64=0.25.)

Sorry to be unclear. I was just counting the waste for the "util"
pointer, not the extra padded hash bit (which AFAIK is already wasted in
oidset). So I computed 48 bytes without the util pointer, which means we
waste an additional 16% to add it.

Anyway, my point was mostly to say that this is a fractional percentage
of the total memory. So it falls into the category of "this might help
in tight situations" and less "this will blow up in our faces".

> In a 100-million-object monorepo, we will probably end up only operating on the
> "frontier" objects (objects that we do not have but we know about because some
> object we have references them) at the worst. I don't have numbers but I think
> that that is at least an order of magnitude less than 100M.

Agreed.

> > So I think we may be better off going with the solution here that's
> > simpler and requires introducing less code. If it does turn out to be a
> > memory problem in the future, this is a _really_ easy thing to optimize
> > after the fact, because we have these nice abstractions.
> 
> Optimizing away the padded hash should be straightforward. Optimizing away the
> "util" pointer after we have code that uses it is (slightly?) less
> straightforward, but still doable.

I was thinking here of just oidset. It has a limited API, so swapping
out the implementation for one that does not depend on oidmap or waste
the "util" pointer would be the "easy" part.
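
(For reference, the whole API surface is roughly the following sketch -
give or take what oidset.h actually declares:

    int  oidset_contains(const struct oidset *set, const struct object_id *oid);
    int  oidset_insert(struct oidset *set, const struct object_id *oid);
    void oidset_clear(struct oidset *set);

so an alternative backend only has to provide these three entry points.)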

> I still lean towards saving memory by eliminating the padded hash and
> not using the "util" pointer, because the complication is contained
> within only two files (oidmap.{c,h}), and because the constant factors
> in memory savings might end up mattering. But I don't feel strongly
> about that - I think any of the oidmaps that we have discussed is an
> improvement over what we have now.

My main concern is not so much about complexity bleeding out of the
oidmap code, but that now we have two parallel near-identical hashmap
implementations.  When one gets an optimization, bugfix, or new feature,
we have to port it over manually to the other.

-Peff


Re: [PATCH] oidmap: map with OID as key

2017-09-29 Thread Jonathan Tan
On Thu, 28 Sep 2017 16:05:57 -0400
Jeff King  wrote:

> When I saw that you were implementing "oidset" in terms of "oidmap", I
> was all ready to be crabby about this extra memory. But then I saw that
> the implementation tries hard not to waste any memory. :)
> 
> All of which is to say I gave this some thought when I was in the "ready
> to be crabby" phase, and came to the conclusion that it probably isn't
> that painful. An unused pointer is 8 bytes per entry. We're already
> spending 20 for the oid itself (which is likely to grow to 32
> eventually), plus 8 for the chained "next" pointer. Plus potentially 8
> for a padded version of the hash, if we just use a straight hashmap that
> duplicates the hash field.
> 
> So depending how you count it, we're wasting between 28% (sha1 and no
> extra hash) and 16% (sha256 plus reusing hashmap). That's not great, but
> it's probably not breaking the bank.

Hmm...how did you get the 16% figure? The values, as I see it, are:
 - 32 for the sha256 hash itself
 - 8 for the "next" chain pointer
 - 8 for the padded hash
 - 8 for the "util" pointer

For an oidset, the padded hash and the "util" pointer are wasted, which is
16/56=0.286. (If you assume, say, 8 bytes of malloc bookkeeping overhead, then
16/64=0.25.)

For other uses of an oidmap, the padded hash and "util" pointer are still
wasted, so the numbers are the same as above (except that the malloc
bookkeeping overhead is doubled).
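
To make the byte counting above concrete, here is a rough sketch of the
entry layout being discussed (names are illustrative; sizes assume a
64-bit platform):

    struct hashmap_entry_sketch {
        struct hashmap_entry_sketch *next; /* 8 bytes: chain pointer */
        unsigned int hash;                 /* 4 bytes, padded out to 8 */
    };

    struct oidmap_entry_sketch {
        struct hashmap_entry_sketch ent;   /* 16 bytes */
        struct object_id oid;              /* 20 (SHA-1) or 32 (256-bit) */
        void *util;                        /* 8 bytes; wasted by an oidset */
    };

With a 256-bit hash that is 32 + 16 = 48 bytes without "util"; adding it
costs 8 more, i.e. the 8/48 = ~16% discussed above.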

> Another way of thinking about it. Large-ish (but not insane) repos have
> on the order of 5-10 million objects. If we had an oidset that mentioned
> every single object in the repository, that's 40-80MB wasted in the
> worst case. For current uses of oidset, that's probably fine. It's
> generally used only to collect ref tips (so probably two orders of
> magnitude less).
> 
> If you're planning on using an oidset to mark every object in a
> 100-million-object monorepo, we'd probably care more. But I'd venture to
> say that any scheme which involves generating that hash table on the fly
> is doing it wrong. At at that scale we'd want to look at compact
> mmap-able on-disk representations.

In a 100-million-object monorepo, we will probably end up only operating on the
"frontier" objects (objects that we do not have but we know about because some
object we have references them) at the worst. I don't have numbers but I think
that that is at least an order of magnitude less than 100M.

> So I think we may be better off going with the solution here that's
> simpler and requires introducing less code. If it does turn out to be a
> memory problem in the future, this is a _really_ easy thing to optimize
> after the fact, because we have these nice abstractions.

Optimizing away the padded hash should be straightforward. Optimizing away the
"util" pointer after we have code that uses it is (slightly?) less
straightforward, but still doable.

I still lean towards saving memory by eliminating the padded hash and
not using the "util" pointer, because the complication is contained
within only two files (oidmap.{c,h}), and because the constant factors
in memory savings might end up mattering. But I don't feel strongly
about that - I think any of the oidmaps that we have discussed is an
improvement over what we have now.


Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-09-29 Thread Stephan Beyer
Hi Pranit,

On 09/29/2017 08:49 AM, Pranit Bauva wrote:
> It has been a long time since this series appeared on the mailing list.
> The previous version v15[1] is now split into many parts and I am
> sending the first part right now, will focus on getting this merged and
> then send out the next part.

That's a good idea!

I just reviewed your series by assuming I did the v15 review well (it
took quite some time at least)... so I just diff'ed the v15 and the v16
patches. I am totally fine with it!

Thanks,
Stephan


Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin 
> ---
>  .clang-format | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Well executed and well explained. Thank you.

Reviewed-by: Jonathan Nieder 

Going forward, is there an easy way to preview the effect of this kind
of change (e.g., to run "make style" on the entire codebase so as to be
able to compare the result with two different versions of
.clang-format)?

Thanks,
Jonathan


Re: [PATCH 4/3] branch: fix "copy" to never touch HEAD

2017-09-29 Thread Ævar Arnfjörð Bjarmason

On Wed, Sep 27 2017, Junio C. Hamano jotted:

> Ævar Arnfjörð Bjarmason  writes:
>
>> I do think however that we also need to update the docs, the relevant
>> origin/master...gitster/sd/branch-copy diff is currently:
>>
>> +The `-c` and `-C` options have the exact same semantics as `-m` and
>> +`-M`, except instead of the branch being renamed it along with its
>> +config and reflog will be copied to a new name.
>>
>> Suggestions welcome, but I think that should probably be changed to
>> something like the following as part of this patch:
>>
>> The `-c` and `-C` options have the exact same semantics as `-m` and
>> `-M`, except instead of the branch being renamed it along with its
>> config and reflog will be copied to a new name. Furthermore, unlike
>> its `-m` and `-M` cousins the `-c` and `-C` options will never move
>> the HEAD, whereas the move option will do so if the source branch is
>> the currently checked-out branch.
>
> I do not think anybody even _imagines_ copy to move HEAD, and do not
> think "unlike -m, it doesn't touch HEAD" a worthwhile thing to say.
>
> It is '-m' whose behaviour may look strange wrt HEAD, so _that_ may
> be worth mentioning, though.
>
> I suspect that your reaction probably comes from being too married
> to the idea that HEAD has a string that is the same as the refname
> of the current branch.  But that is a mere implementation detail.
> Users would think that HEAD points at the current branch and does
> not even care how that pointing is implemented.

To cut to the chase instead of pointlessly replying to this
point-by-point, I think your patch quoted below is good and solves the
minor doc issue I had with your patch.

Yes HEAD is an implementation detail, but it's an exposed implementation
detail.

Thus before your patch it was true to say that "[-c] has the exact same
semantics as [-m] [...] except [ s/move/rename/ ]" since that was the
only behavior change, but with your patch adding another "if (!copy &&
...)" we'd now have two things different in the code, but only one thing
enumerated as being different in the docs.

Just rephrasing it as you did is a better way out of that than my
proposed patch. Thanks.

> Conceptually, you can consider that each branch has its own
> identity, separate from various traits it has, like what its
> upstream branch is, what commit it points at, what its reflog says,
> and (most notably) what its name is.
>
> Then you can think of "branch -m old new" to be (1) finding the
> instance of branch that currently has name 'old' and (2) updating
> only one of its trait, namely, its name, without changing anything
> else.  Creating a new instance of a branch by copying an existing
> one, on the other hand, would then be the matter of (1) finding the
> instance of branch with name 'old' and (2) creating another instance
> of branch with the same traits as the original, modulo its name is
> set to 'new'.
>
> A necessary wrinkle for "branch -m" that falls out of the above
> world model is that HEAD needs to be adjusted in order to keep
> pointing at the same (renamed) instance of branch that now has an
> updated name, if HEAD happened to be pointing at the instance of the
> branch whose name trait has been updated.
>
> So, from that point of view, I'd prefer to do something like the
> attached patch instead.  I notice that "branch -m" does not mention
> configuration variables carried over to the new branch, but I do not
> necessarily think we want to exhaustively enumerate what traits are
> carried over here, so perhaps it is OK as is.
>
>  Documentation/git-branch.txt | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index fe029ac6fc..d425e3acd4 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -63,11 +63,13 @@ With a `-m` or `-M` option, <oldbranch> will be renamed
> to <newbranch>.
>  If <oldbranch> had a corresponding reflog, it is renamed to match
>  <newbranch>, and a reflog entry is created to remember the branch
>  renaming. If <newbranch> exists, -M must be used to force the rename
> -to happen.
> +to happen.  If you rename a branch that is currently checked out,
> +`HEAD` is adjusted so that the branch (whose name is now
> +<newbranch>) stays to be the current branch.
>
> -The `-c` and `-C` options have the exact same semantics as `-m` and
> -`-M`, except instead of the branch being renamed it along with its
> -config and reflog will be copied to a new name.
> +With a `-c` or `-C` option, a new branch <newbranch> is created by
> +copying the traits like the reflog contents and `branch.*.*`
> +configuration from an existing <oldbranch>.
>
>  With a `-d` or `-D` option, `` will be deleted.  You may
>  specify more than one branch for deletion.  If the branch currently


[PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Johannes Schindelin
We really, really, really want to limit the columns to 80 per line: one
of the few consistent style comments on the Git mailing list is that
lines should not have more than 80 columns/line (even if 79 columns/line
would make more sense, given that the code is frequently viewed as a
diff, and a diff adds an extra character per line).

The penalty of 5 for excess characters is way too low to guarantee that,
though, as pointed out by Brandon Williams.

From the existing clang-format examples and documentation, it appears
that 100 is a penalty deemed appropriate for Stuff You Really Don't
Want, so let's assign that as the penalty for "excess characters", i.e.
overly long lines.

While at it, adjust the penalties further: we are actually not that keen
on preventing new line breaks within comments or string literals, so the
penalty of 100 seems awfully high.

Likewise, we are not all that adamant about keeping line breaks away
from assignment operators (a lot of Git's code breaks immediately after
the `=` character just to keep that 80 columns/line limit).

We do frown a little bit more about functions' return types being on
their own line than the penalty 0 would suggest, so this was adjusted,
too.

Finally, we do not particularly fancy breaking before the first parameter
in a call, but if it keeps the line shorter than 80 columns/line, that's
what we do, so lower the penalty for breaking before a call's first
parameter, but not quite as much as introducing new line breaks to
comments.

Signed-off-by: Johannes Schindelin 
---
Published-As: https://github.com/dscho/git/releases/tag/clang-format-column-limit-v1
Fetch-It-Via: git fetch https://github.com/dscho/git clang-format-column-limit-v1
 .clang-format | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/.clang-format b/.clang-format
index 3ede2628d2d..56822c116b1 100644
--- a/.clang-format
+++ b/.clang-format
@@ -153,13 +153,13 @@ KeepEmptyLinesAtTheStartOfBlocks: false
 
 # Penalties
 # This decides what order things should be done if a line is too long
-PenaltyBreakAssignment: 100
-PenaltyBreakBeforeFirstCallParameter: 100
-PenaltyBreakComment: 100
+PenaltyBreakAssignment: 10
+PenaltyBreakBeforeFirstCallParameter: 30
+PenaltyBreakComment: 10
 PenaltyBreakFirstLessLess: 0
-PenaltyBreakString: 100
-PenaltyExcessCharacter: 5
-PenaltyReturnTypeOnItsOwnLine: 0
+PenaltyBreakString: 10
+PenaltyExcessCharacter: 100
+PenaltyReturnTypeOnItsOwnLine: 5
 
 # Don't sort #include's
 SortIncludes: false

base-commit: ea220ee40cbb03a63ebad2be902057bf742492fd
-- 
2.14.2.windows.1


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-09-29 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> This document describes what a transition to a new hash function for
>> Git would look like.  Add it to Documentation/technical/ as the plan
>> of record so that future changes can be recorded as patches.
>>
>> Also-by: Brandon Williams 
>> Also-by: Jonathan Tan 
>> Also-by: Stefan Beller 
>> Signed-off-by: Jonathan Nieder 
>> ---
>
> Shouldn't these all be s-o-b: (with a note immediately before that
> to say all four contributed equally or something)?

I don't want to get lost in the weeds in the question of how to
represent such a collaborative effort in git's metadata.

You're right that I should collect their sign-offs!  Your approach of
using text instead of machine-readable data for common authorship also
seems okay.

In any event, this is indeed

Signed-off-by: Brandon Williams 
Signed-off-by: Jonathan Tan 
Signed-off-by: Stefan Beller 

(I just checked :)).

>> +Background
>> +--
>> +At its core, the Git version control system is a content addressable
>> +filesystem. It uses the SHA-1 hash function to name content. For
>> +example, files, directories, and revisions are referred to by hash
>> +values unlike in other traditional version control systems where files
>> +or versions are referred to via sequential numbers. The use of a hash
>
> Traditional systems refer to files via numbers???  Perhaps "where
> versions of files are referred to via sequential numbers" or
> something?

Good point.  The wording you suggested will work well.

>> +function to address its content delivers a few advantages:
>> +
>> +* Integrity checking is easy. Bit flips, for example, are easily
>> +  detected, as the hash of corrupted content does not match its name.
>> +* Lookup of objects is fast.
>
> * There is no ambiguity what the object's name should be, given its
>   content.
>
> * Deduping the same content copied across versions and paths is
>   automatic.

:)  Yep, these are nice too, especially that second one.

It also is how we make diff-ing fast.

>> +SHA-1 still possesses the other properties such as fast object lookup
>> +and safe error checking, but other hash functions are equally suitable
>> +that are believed to be cryptographically secure.
>
> s/secure/more &/, perhaps?

We were looking for a phrase meaning that it should be a cryptographic
hash function in good standing, which SHA-1 is at least approaching
not being.

"more secure" should work fine.  Let's go with that.

>> +Goals
>> +-
>> +...
>> +   c. Users can use SHA-1 and NewHash identifiers for objects
>> +  interchangeably (see "Object names on the command line", below).
>
> Mental note.  This needs to extend to the "index X..Y" lines in the
> patch output, which is used by "apply -3" and "am -3".

Will add a note about this to "Object names on the command line".  Stefan
had already pointed out that that section should really be renamed to
something like "Object names in input and output".

>> +2. Allow a complete transition away from SHA-1.
>> +   a. Local metadata for SHA-1 compatibility can be removed from a
>> +  repository if compatibility with SHA-1 is no longer needed.
>
> I like the emphasis on "Local" here.  Metadata for compatibility that
> is embedded in the objects obviously cannot be removed.
>
> From that point of view, one of the goals ought to be "make sure
> that as much SHA-1 compatibility metadata as possible is local and
> outside the object".  This goal may not be able to say more than "as
> much as possible", as signed objects that came from SHA-1 world
> needs to carry the compatibility metadata somewhere somehow.
>
> Or perhaps we could.  There is nothing that says a signed tag
> created in the SHA-1 world must have the PGP/SHA-1 signature in the
> NewHash payload---it could be split off of the object data and
> stored in a local metadata cache, to be used only when we need to
> convert it back to the SHA-1 world.

That would break round-tripping and would mean that multiple SHA-1
objects could have the same NewHash name.  In other words, from
my point of view there is something that says that such data must
be preserved.

Another way to put it: even after removing all SHA-1 compatibility
metadata, one nice feature of this design is that it can be recovered
if I change my mind, from data in the NewHash based repository alone.

[...]
>> +Non-Goals
>> +-
>> ...
>> +6. Skip fetching some submodules of a project into a NewHash
>> +   repository. (This also depends on NewHash support in Git
>> +   protocol.)
>
> It is unclear what this means.  Around submodule support, one thing
> I can think of is that a NewHash tree in a superproject would record
> a gitlink that is a NewHash commit object name in it, therefore it
> cannot refer to an unconverted SHA-1 submodule repository.  But 

Re: What's cooking in git.git (Sep 2017, #06; Fri, 29)

2017-09-29 Thread Stefan Beller
On Fri, Sep 29, 2017 at 6:08 AM, Derrick Stolee  wrote:
> Hi Junio,
>
> On 9/29/2017 12:34 AM, Junio C Hamano wrote:
>>
>> * ds/find-unique-abbrev-optim (2017-09-19) 4 commits
>>   - SQUASH???
>>   - sha1_name: parse less while finding common prefix
>>   - sha1_name: unroll len loop in find_unique_abbrev_r()
>>   - sha1_name: create perf test for find_unique_abbrev()
>
>
> I'll re-roll my patch on Monday if reviews have stabilized. I think I
> understand your comments this time (especially around 32-bit ints).
> Since I'm new to the list, I'm not sure what the change in messages
> means here.
>
> What does "SQUASH???" mean? Is that why there are three meaningful commits
> in this note, despite my five-commit patch? Would you like me to squash the
> commits in v3?

If you fetch from github/gitster/git, there is a branch
'ds/find-unique-abbrev-optim' containing four commits: three by you, one
by Junio. That one commit is titled 'SQUASH???' because Junio did not
want to write out what the commit is doing (e.g. fixing a typo, a
memleak, indentation, or another small detail that he observed).

You want to take the content of that commit and add it to one of yours,
where appropriate.

>
> Thanks,
> -Stolee


"man git-stash" explanation of "--include-untracked" and "--all" seems ambiguous

2017-09-29 Thread Robert P. J. Day

  from the man page:

"If the --include-untracked option is used, all untracked files are
also stashed and then cleaned up with git clean, leaving the working
directory in a very clean state. If the --all option is used instead
 ^^^
then the ignored files are stashed and cleaned in addition to the
untracked files."

  the use of the word "instead" suggests you can use one of those
options, or the other, but not both at the same time. but it seems you
can combine them, so that paragraph seems a bit misleading, no?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday




Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Оля Тележная
Hi everyone,
Many thanks to all of you, I am interested in every opinion. Sorry
that I wasn't in the discussion, unfortunately I got sick, that's why
I skipped all the process.
I want to reply to the main moments and also ask some questions.

>> Simplify mru.c, mru.h and related code by reusing the double-linked list 
>> implementation from list.h instead of a custom one.
> An overlong line (I can locally wrap it, so the patch does not have
> to be re-sent only to fix this alone).
I've read only about 50 characters max in commit head (and
highlighting repeats it), but there's nothing about max length of line
in commit message. Sorry, next time I will make it shorter.

About the many different opinions on how to improve the code: I agree
with the idea that my commit is an intermediate step towards getting rid
of MRU entirely. If we really need to add an
initializer/mru_for_each/something else - it's absolutely not a problem,
but as was said, I'm not sure that we need it.
It really looks like using the list implementation from list.h directly
won't be any worse.

> I had envisioned leaving mru_mark() as a wrapper for "move to the front"
> that could operate on any list. But seeing how Olga's patch takes it
> down to two trivial lines, I'd also be fine with an endgame that just
> eliminates it.
Let's add needed function to list.h directly? I also wanted to add
a list_for_each_entry function to list.h, like the one in the Linux kernel:
https://www.kernel.org/doc/htmldocs/kernel-api/API-list-for-each-entry.html
It will simplify the code even more, I guess not only in MRU-related
code. Maybe we need to do that in a separate patch.
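
For example, a minimal sketch of such a helper, using only the
list_del() and list_add() that list.h already has:

    static inline void list_move_to_front(struct list_head *item,
                                          struct list_head *head)
    {
        /* detach the item from wherever it currently is ... */
        list_del(item);
        /* ... and re-insert it right after head, i.e. at the front */
        list_add(item, head);
    }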

About minor issues ( "tmp" vs "p2", variable scope, space indentation)
- fully agree, I will fix it.

So finally I think that I need to fix that minor issues and that's
all. I plan to rewrite (with --amend) my current commit (since I will
add no new features, it's better to have a single commit for all
changes).
As I understand it, SubmitGit will send the update in a new thread, and
I need to mark it there as [PATCH v2].
Please correct me if I am wrong on any of the points mentioned earlier.

By the way, other contributors write something like "[PATCH v6 0/3]".
What does the "0/3" mean? It's about separate commits in a single patch
series, am I right?

Thank you one more time!
Olga


[ANNOUNCE] tig-2.3.0

2017-09-29 Thread Jonas Fonseca
Hello,

Really excited to release version 2.3.0 of Tig, which brings several
improvements to rendering performance, new features like search/prompt
history persistence, and important fixes to graph rendering and emoji
display. I'd like to thank all contributors, and Roland Walker and Sven
Wegener in particular, for all the great patches.

What is Tig?

Tig is an ncurses-based text-mode interface for git. It functions mainly
as a Git repository browser, but can also assist in staging changes for
commit at chunk level and act as a pager for output from various Git
commands.

 - Homepage: https://jonas.github.io/tig/
 - Manual: https://jonas.github.io/tig/doc/manual.html
 - Tarballs: https://github.com/jonas/tig/releases
 - Gitter: https://gitter.im/jonas/tig
 - Q: http://stackoverflow.com/questions/tagged/tig

Release notes
-
Incompatibilities:

 - The `width` setting on the `status`, `text` and `commit-title` columns was
   never applied and has been removed. (GH #617)

Improvements:

 - Improve load performance by throttling screen updates. (GH #622, #629)
 - Speed up graph rendering. (GH #638)
 - Enable scroll optimizations for Terminal.app and iTerm2. (GH #637)
 - Improve the test suite portability to not depend on GNU sed. (GH #609, #614)
 - Make build reproducible. (https://reproducible-builds.org/) (GH #613)
 - Enable binding to more symbolic keys and keys with control modifier:
   `F13`-`F19`, `ShiftLeft`, `ShiftRight`, `ShiftDel`, `ShiftHome`, `ShiftEnd`,
   `ShiftTab`, `Ctrl-C`, `Ctrl-V`, `Ctrl-S`, and `Ctrl-@`. (GH #314, #619, #642)
 - Persist readline history to `~/.tig_history` or `$XDG_DATA_HOME/tig/history`.
   Use `history-size` to control the number of entries to save. (GH #620, #713,
   #714, #718)
 - Preload last search from persistent history. (GH #630)
 - Add `view-close-no-quit` action, unbound by default. (GH #607)
 - Add `mouse-wheel-cursor` option (off by default); when set to true it causes
   wheel actions to prefer moving the cursor instead of scrolling. (GH #608)
 - Add `truncation-delimiter` option, set to `~` by default. (GH #646)
 - Add `-q` parameter to `source` for "source-if-present". (GH #612)
 - Add `:echo` prompt command to display text in the status bar. (GH #626, #636)
 - Make `diff-highlight` colors configurable. (GH #625, #633)
 - Let Ctrl-C exit Y/N dialog, menu prompts and the file finder. (GH #632, #648)
 - Hide cursor unless at textual prompt. (GH #643)
 - Expand tilde ('~') in `:script` paths. (GH #674)
 - Show single-line output of external command in status bar. (GH #200, #557,
   #678)
 - Disable the graph when `--no-merges` is passed. (GH #687)
 - Print backtraces on segfault in debug mode.
 - Ignore script lines starting with `#` (comment). (GH #705)
 - Complete `repo:*` variables when readline is enabled. (GH #702)
 - Incorporate XTerm's `wcwidth.c` to find Unicode widths. (GH #691)

Bug fixes:

 - Fix graph display issues. (GH #419, #638)
 - Fix and improve rendering of Unicode characters. (GH #330, #621, #644, #682)
 - Handle hyphenated directory names when listing content. (GH #602)
 - Do not jump to next match when cancelling the search prompt. (GH #627)
 - Fix clearing of the status line after `Ctrl-C`. (GH #623, #649)
 - Fix handling of width on line-number and trimmed width of 1. (GH #617)
 - Set cursor position when not updating prompt contents. (GH #647)
 - Erase status line at exit time for users without altscreen-capable terminals.
   (GH #589)
 - Fix unexpected keys when restoring from suspend (`Ctrl-Z`). (GH #232)
 - contrib/vim.tigrc: Also bind G in the main view as a workaround for
   limitations of the `none` action. (GH #594, #599)
 - Only override `blame-options` when commands are given and fix parsing of
   `-C`. (GH #597)
 - Fix diff name discovery to better handle prefixes.
 - Interpret button5 as wheel-down. (GH #321, #606)
 - Fix `back` / `parent` in tree view. (GH #641)
 - Fix memory corruption in `concat_argv` and file finder. (GH #634, #655)
 - Fix reading from stdin for `tig show`.
 - Document problem of outdated system-wide `tigrc` files in Homebrew. (GH #598)
 - Repaint the display when toggling `line-graphics`. (GH #527)
 - Fix custom date formatting to support longer strings. (GH #522)
 - Don't segfault on ":exec" irregular args. (GH #686)
 - Fix segfault when calling htab_empty. (GH #663, #745)

Change summary
--
The diffstat and log summary for changes made in this release.

 .bookignore   |  16 +
 .gitignore|   1 +
 .travis.yml   |   1 +
 INSTALL.adoc  |  26 +-
 Makefile  |  42 +-
 NEWS.adoc |  69 +-
 README.adoc   |   2 +-
 book.md   |   2 +
 compat/compat.h   |  15 +
 compat/hashtab.h 

Re: RFC v3: Another proposed hash function transition plan

2017-09-29 Thread Joan Daemen

Dear Johannes,

if ever there was a SHA-2 competition, it must have been held inside
NSA :-) But maybe you are confusing it with the SHA-3 competition. In any
case, when considering SHA-2 vs SHA-3 for usage in git, you may have a
look at the arguments we give in the following blogpost:


https://keccak.team/2017/open_source_crypto.html

Kind regards,

Joan Daemen


Re: RFC v3: Another proposed hash function transition plan

2017-09-29 Thread Johannes Schindelin
Hi Gilles,

On Tue, 19 Sep 2017, Gilles Van Assche wrote:

> On 19/09/17 00:16, Johannes Schindelin wrote:
> >>> SHA-256 got much more cryptanalysis than SHA3-256 […].
> >>
> >> I do not think this is true.
> >
> > Please read what I said again: SHA-256 got much more cryptanalysis
> > than SHA3-256.
> 
> Indeed. What I meant is that SHA3-256 got at least as much cryptanalysis
> as SHA-256. :-)

Oh? I got the opposite impression... I got the impression that *everybody*
in the field banged on all the SHA-2 candidates because everybody was
worried that SHA-1 would be utterly broken soon, and I got the impression
that after this SHA-2 competition, people were less worried?

Besides, I would expect the difference in age (at *least* 7 years by
my humble arithmetic skills) to make a difference...

> > I never said that SHA3-256 got little cryptanalysis. Personally, I
> > think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that
> > SHA-256 *still* got more cryptanalysis. But my opinion does not count,
> > really. However, the two experts I pestered with questions over
> > questions left me with that strong impression, and their opinion does
> > count.
> 
> OK, I respect your opinion and that of your two experts. Yet, the "much
> more" part of your statement, in particular, is something that may
> require a bit more explanations.

I would also like to point out the ubiquitousness of SHA-256. I have been
asked to provide SHA-256 checksums for the downloads of Git for Windows,
but not SHA3-256...

And this is a practically-relevant thing: the more users of an algorithm
there are, the more high-quality implementations you can choose from. And
this becomes relevant, say, when you have to switch implementations due to
license changes (*cough, cough looking in OpenSSL's direction*). Or when
you have to support the biggest Git repository on this planet and have to
eke out 5-10% more performance using the latest hardware. All of a sudden,
your consideration cannot only be "security of the algorithm" any longer.

Having said that, I am *really* happy to have SHA3-256 as a valid fallback
option in case SHA-256 should be broken.

Ciao,
Johannes


Re: What's cooking in git.git (Sep 2017, #06; Fri, 29)

2017-09-29 Thread Derrick Stolee

Hi Junio,

On 9/29/2017 12:34 AM, Junio C Hamano wrote:

> * ds/find-unique-abbrev-optim (2017-09-19) 4 commits
>   - SQUASH???
>   - sha1_name: parse less while finding common prefix
>   - sha1_name: unroll len loop in find_unique_abbrev_r()
>   - sha1_name: create perf test for find_unique_abbrev()


I'll re-roll my patch on Monday if reviews have stabilized. I think I 
understand your comments this time (especially around 32-bit ints).

Since I'm new to the list, I'm not sure what the change in messages
means here.

What does "SQUASH???" mean? Is that why there are three meaningful 
commits in this note, despite my five-commit patch? Would you like me to 
squash the commits in v3?


Thanks,
-Stolee


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-29 Thread Johannes Schindelin
Hi Nicolas,

On Tue, 19 Sep 2017, Nicolas Morey-Chaisemartin wrote:

> On 19/09/2017 at 17:43, Johannes Schindelin wrote:
> >
> > C'mon, don't *try* to misunderstand me.
> >
> > Of course there need to be updates as to the state of patch series.
> >
> > It's just that mails only go *so* far when you need to connect and
> > aggregate information. You need the connection between the original patch
> > series, the latest unaddressed reviews, links to the branches, history of
> > the patch series' iterations, and ideally links to the repositories of the
> > contributors with *their* branch names. And then, of course, your verdict
> > as to the state of the patch series and your expectation what happens
> > next.
> >
> > To relate that, you are using a plain text format that is not well defined
> > and not structured, and certainly not machine-readable, for information
> > that is crucial for project management.
> >
> > What you need is a tool to aggregate this information, to help working
> > with it, to manage the project, and to be updated automatically. And to
> > publish this information continuously, without costing you extra effort.
> >
> > I understand that you started before GitHub existed, and that before
> > GitHub was an option, the script-generated What's cooking mail was the
> > best you could do.
> 
> Would something like patchwork fill your need?

Maybe. Here is the link, for other interested parties:
http://jk.ozlabs.org/projects/patchwork/ and
https://github.com/getpatchwork/patchwork

> They now seem to have a REST API, which means it could probably be
> plugged into Junio's scripts and work seamlessly for him (or any other
> happy ML user) while other people browse through the web interface.

It seems that patchwork's design keeps the communication performed as
previously, and just provides a web interface to search a little more
efficiently through the mails containing patch submissions.

Git's mailing list, of course, poses the problem to patchwork that the
status of any patch series is opaque to any automatic system that does not
specifically try to connect the What's cooking dot to the patch mail dots.

Also, a point that came up in a private discussion with another core Git
contributor this week: how many reviewers actually even so much as
test-compile, let alone look at the code in context? I am fairly certain
that none do, *just* because of the shortcomings of the process.

Patchwork would not address this, of course.

In my ideal world (in which there would be world peace, too, so it would
be pretty boring, therefore you should not put much stock into what I am
saying next), the direction would be the other way round: the tool should
not scrape the mailing list and make the results accessible via web
interface. Instead, the tool would let me sidestep the mailing list
altogether, using it just as a lossy communication medium (and keep the
lost information accessible in different ways). SubmitGit "threatened" to
allow me to do this: I could open a PR at https://github.com/git/git and
then hit a button and off it goes. SubmitGit stops there, though; If it
would have continued from there (and did not make the initial step
difficult by requiring some registration not everybody is comfortable
with), it would have fulfilled my wishes. Alas, it is written in Scala,
using a framework I am utterly unfamiliar with, so I could not do anything
about it.

Ciao,
Dscho

RE: [PATCH v8 00/12] Fast git status via a file system watcher

2017-09-29 Thread Ben Peart
> -Original Message-
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Thursday, September 28, 2017 10:21 PM
> To: Ben Peart 
> Cc: david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v8 00/12] Fast git status via a file system watcher
> 
> Ben Peart  writes:
> 
> > The only behavioral change from V7 is the removal of unnecessary uses
> > of CE_MATCH_IGNORE_FSMONITOR.  With a better understanding of *why*
> > the CE_MATCH_IGNORE_* flags are used, it is now clear they are not
> > required in most cases where CE_MATCH_IGNORE_FSMONITOR was being
> > passed out of an abundance of caution.
> 
> The reviews and updates after this round was posted were to
> 
>  * 01/12 had an obvious pointer-vs-pointee thinko, which I think I
>have locally fixed;
> 
>  * 08/12 forgot to add a new test executable to .gitignore file,
>which I think I have locally fixed, too.
> 
> Any other review comments and suggestions for improvements?
> Otherwise I am tempted to declare victory and merge this to 'next'
> soonish.
> 
> For reference, here is the interdiff between what was posted as v8 and what
> I have on 'pu'.

I had accumulated the same set of changes with one addition of removing
a duplicate "the" from a comment in the fsmonitor.h file:

diff --git a/fsmonitor.h b/fsmonitor.h
index 8eb6163455..0de644e01a 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -4,7 +4,7 @@
 extern struct trace_key trace_fsmonitor;
 
 /*
- * Read the the fsmonitor index extension and (if configured) restore the
+ * Read the fsmonitor index extension and (if configured) restore the
  * CE_FSMONITOR_VALID state.
  */
 extern int read_fsmonitor_extension(struct index_state *istate, const void *data, unsigned long sz);

> 
> Thanks.
> 
>  compat/bswap.h  | 4 ++--
>  t/helper/.gitignore | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git b/compat/bswap.h a/compat/bswap.h
> index 6b22c46214..5078ce5ecc 100644
> --- b/compat/bswap.h
> +++ a/compat/bswap.h
> @@ -183,8 +183,8 @@ static inline uint32_t get_be32(const void *ptr)
>  static inline uint64_t get_be64(const void *ptr)
>  {
>   const unsigned char *p = ptr;
> - return  (uint64_t)get_be32(p[0]) << 32 |
> - (uint64_t)get_be32(p[4]) <<  0;
> + return  (uint64_t)get_be32(&p[0]) << 32 |
> + (uint64_t)get_be32(&p[4]) <<  0;
>  }
> 
>  static inline void put_be32(void *ptr, uint32_t value)
> diff --git b/t/helper/.gitignore a/t/helper/.gitignore
> index f9328eebdd..87a648a7cf 100644
> --- b/t/helper/.gitignore
> +++ a/t/helper/.gitignore
> @@ -5,6 +5,7 @@
>  /test-delta
>  /test-drop-caches
>  /test-dump-cache-tree
> +/test-dump-fsmonitor
>  /test-dump-split-index
>  /test-dump-untracked-cache
>  /test-fake-ssh


Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path

2017-09-29 Thread Dridi Boukelmoune
Hi all,

Thanks for the fast feedback, I'll answer everyone in a single email
if you don't mind.

On Fri, Sep 29, 2017 at 5:48 AM, Jonathan Nieder  wrote:
snip
> I wonder if we can make this so intuitive that it doesn't need
> mentioning in CodingGuidelines.  What if the test harness
> t/test-lib.sh were to set a GIT_EXEC_PATH with multiple directories in
> it?  That way, authors of new commands would not have to be careful to
> look out for this issue proactively because the testsuite would catch
> it.

Now that you pointed out that I missed the relevant documentations, I
don't think this belongs in the guidelines at all.

snip
> Do git-mergetool--lib.txt, git-parse-remote.txt, git-sh-i18n.txt,
> and git-sh-setup.txt in Documentation/ need the same treatment?

That is embarrassing, I thought I had done my research properly...

> Summary: I like the goal of this patch but I am nervous about the
> side-effect it introduces of PATH pollution.  Is there a way around
> that?  If there isn't, then this needs a few tweaks and then it should
> be ready to go.

The PATH is already "polluted" when git-* commands are run via git,
and in the context of a script using git-sh-setup I wouldn't consider
that completely irrelevant.

On Fri, Sep 29, 2017 at 5:58 AM, Junio C Hamano  wrote:
> Jonathan Nieder  writes:
>
>> This has been broken for a while, but better late than never to
>> address it.
>
> I am not sure if this is broken in the first place.  We do want to
> make sure that the scripted porcelains will source the shell helper
> library from matching Git release.  The proposed patch goes directly
> against that and I do not see how it could be an improvement.

But the problem is that just by having a GIT_EXEC_PATH you will source
an incorrect file name. If there was something like --exec-dir that
wouldn't take the PATH into account. Before I tried to contribute a
fix, my local patching of git-sh-setup after git-core upgrades was
actually this:

-. "$(git --exec-path)/git-sh-i18n"
+. "$(GIT_EXEC_PATH= git --exec-path)/git-sh-i18n"

That's not pretty, but it gives the guarantee to source from matching
Git release. Considering the PATH semantics, this is how I would fix
it after reading your feedback.

On Fri, Sep 29, 2017 at 6:21 AM, Jonathan Nieder  wrote:
> Junio C Hamano wrote:
>> Jonathan Nieder  writes:
>
>>> This has been broken for a while, but better late than never to
>>> address it.
>>
>> I am not sure if this is broken in the first place.  We do want to
>> make sure that the scripted porcelains will source the shell helper
>> library from matching Git release.  The proposed patch goes directly
>> against that and I do not see how it could be an improvement.
>
> It used to be that git allowed setting a colon-separated list of paths
> in GIT_EXEC_PATH.  (Very long ago, I relied on that feature.)  If we
> can restore that functionality without too much cost, then I think
> it's worthwhile.
>
> But the cost in this patch for that is pretty high.  And
>
> $ git log -S'$(git --exec-path)/'
>
> tells me that colon-separated paths in GIT_EXEC_PATH did not work for
> some use cases as far back as 2006.  Since 2008 the documentation has
> encouraged using "git --exec-path" in a way that does not work with
> colon-separated paths, too.  I hadn't realized it was so long.  Given
> that it hasn't been supported for more than ten years, I was wrong to
> read this as a bug report --- instead, it's a feature request.

Well, from my perspective it's a bug report, upgrading git caused a
regression in my setup. I didn't know I was doing it wrong ;)

snip
> Another possible motivation (the one that led me to use this mechnism
> long ago) is avoiding providing the dashed form git-$cmd in $PATH.  I
> think time has shown that having git-$cmd in $PATH is not too painful.

In my case, yes, I'm maintaining commands but don't really want to
pollute my general-purpose PATH. But I can live with that and use PATH
instead.

On Fri, Sep 29, 2017 at 7:00 AM, Junio C Hamano  wrote:
> Dridi Boukelmoune  writes:
>
>> For end users making use of a custom exec path many commands will simply
>> fail. Adding git's exec path to the PATH also allows overriding git-sh-*
>> scripts, not just adding commands. One can then patch a script without
>> tainting their system installation of git for example.
>
> I think the first sentence is where you went wrong.  It seems that
> you think this ought to work:
>
> rm -fr $HOME/random-stuff
> mkdir $HOME/random-stuff
> echo "echo happy" >$HOME/random-stuff/git-happy
> chmod +x $HOME/random-stuff/git-happy
> GIT_EXEC_PATH=$HOME/random-stuff
> export GIT_EXEC_PATH
> # then...
> git happy

Exactly!

> But that is not the right/officially sanctioned/kosher way to add
> custom git commands (adding your directory that has 

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-29 Thread Johannes Schindelin
Hi Jonathan,

On Tue, 19 Sep 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > To relate that, you are using a plain text format that is not well
> > defined and not structured, and certainly not machine-readable, for
> > information that is crucial for project management.
> >
> > What you need is a tool to aggregate this information, to help working
> > with it, to manage the project, and to be updated automatically. And
> > to publish this information continuously, without costing you extra
> > effort.
> >
> > I understand that you started before GitHub existed, and that before
> > GitHub was an option, the script-generated What's cooking mail was
> > the best you could do.
> 
> I think you are subtly (but not directly, for some reason?) advocating
> moving project management for the Git project to GitHub Issues.

No, I don't. I know how cumbersome it would be to move tons of people over
from one type of project management to another.

However, the current situation is not really tenable, is it? It is tedious
for everybody involved. I know Junio defends the status quo, but I cannot
imagine that he really likes it, as too much is too manual and
labor-intensive.

As I mentioned at the GitMerge (which was a bit pointless, because Junio
was not there, not even via Skype), I do not advocate one radical change,
ever.

> For what it's worth:
> 
>  1. I would be happy to see Git adopt a bug tracker.  As we've
> discussed on the list before, I suspect the only way that this is
> going to happen is if some contributors start using a bug tracker
> and keep up with bugs there, without requiring everyone to use it.
> That is how the Linux Kernel project started using
> bugzilla.kernel.org, for example.

I agree that a bug tracker goes a long way. Personally, I feel Bugzilla is
way too clunky to use, but I am pampered. However, I could imagine that
allowing issues to be opened at https://github.com/git/git, and
encouraging bug submissions there for people who really need to be able to
find out very, very quickly what the current state of their bug report is,
would go a long way.

Of course, this would require a commitment by Junio and others to allow
discussions to move to that bug tracker from the mailing list. Once that
willingness is there, this should be a viable alternative to reporting
bugs on the mailing list (and have those reports go unanswered because
they fell off the radar...).

>  2. GitHub Issues is one of my least favorite bug trackers, for what
> it's worth.  If some sub-project of Git chooses to use it, then
> that's great and I won't get in their way.  I'm just providing
> this single data point that approximately any other tracker
> (Bugzilla, JIRA, debbugs, patchwork) is something I'd be more
> likely to use.

My experience with Git for Windows, where I try to follow Postel's Law by
accepting bug reports via mailing list and GitHub issues (and earlier
Google Code, when that was still alive and kicking), and to a certain
extent even via Twitter: next to nobody likes sending bug reports via mail.

So to add to your sentiment, I like Bugzilla *less* than GitHub issues,
and the worst bug tracker is a mailing list.

Or maybe you have written a shell script that can answer the question
"which of my reported bugs/submitted patch series are still open?" for
the Git mailing list?

>  3. This advice might feel hopeless, because if the maintainer is not
> involved in the initial pilot, then how does the bug tracker get
> notified when a patch has been accepted?  But fortunately this is
> a problem other people have solved: e.g. most bug trackers have an
> API that can be used to automatically notify the bug when a patch
> with a certain subject line appears on a certain branch.

Yes, I agree. The willingness to see the problem, followed by the
willingness to discuss possible solutions, those two need to be the first
steps.

Ciao,
Dscho


Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Christian Couder
On Fri, Sep 29, 2017 at 9:23 AM, Jeff King  wrote:
> On Fri, Sep 29, 2017 at 09:18:11AM +0200, Christian Couder wrote:
>
>> As we use the "prepare_packed_git_run_once" static, this function will
>> only be called once when packed_git_mru has not yet been
>> initialized, so there will be no leak.
>
> Check reprepare_packed_git(). It unsets the run_once flag, and then
> calls prepare_packed_git() again.

Ah ok.
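
(For context, roughly what Peff is pointing at - a simplified sketch,
not the exact source:

    void reprepare_packed_git(void)
    {
        prepare_packed_git_run_once = 0; /* unset the run_once flag ... */
        prepare_packed_git();            /* ... so the list is rebuilt */
    }

so the list does get repopulated a second time.)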


Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-29 Thread Johannes Schindelin
Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Fri, 15 Sep 2017, Junio C Hamano wrote:
> >
> >> --
> >> [Cooking]
> >> 
> >> [...]
> >> 
> >> * mk/diff-delta-uint-may-be-shorter-than-ulong (2017-08-10) 1 commit
> >>  ...
> >>  Dropped, as it was rerolled for review as part of a larger series.
> >>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> >> 
> >> [...]
> >> 
> >> * mk/use-size-t-in-zlib (2017-08-10) 1 commit
> >>  ...
> >>  Dropped, as it was rerolled for review as part of a larger series.
> >>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> >> 
> >> 
> >> * mk/diff-delta-avoid-large-offset (2017-08-11) 1 commit
> >>  ...
> >>  Dropped, as it was rerolled for review as part of a larger series.
> >>  cf. <1502914591-26215-1-git-send-email-martin@mail.zuhause>
> >> 
> >> 
> >> --
> >> [Discarded]
> >> 
> >> [...]
> >
> > These three topics are still in the wrong category. Please fix.
> 
> Hmph, but did the larger series these refer to actually land?

As I have to read these long mails to keep track of the current status of
some submissions, I do not care. However, in the context of this mail, it
does not make sense to have discarded patch series in the cooking section.
It's simply confusing.

Ciao,
Dscho


[PATCH v6 3/3] submodule: port submodule subcommand 'status' from shell to C

2017-09-29 Thread Prathamesh Chavan
This aims to make git-submodule 'status' a built-in. Hence, the function
cmd_status() is ported from shell to C. This is done by introducing
four functions: module_status(), submodule_status_cb(),
submodule_status() and print_status().

The function module_status() acts as the front-end of the subcommand.
It parses the subcommand's options and then calls the function
module_list_compute() for computing the list of submodules. Then
this function calls for_each_listed_submodule(), looping through the
list obtained.

Then for_each_listed_submodule() calls submodule_status_cb() for each of
the submodules in its list. The function submodule_status_cb() calls
submodule_status() with the appropriate arguments. The function
submodule_status() is responsible for generating the status of each
submodule it is called for, and then calls print_status().

Finally, the function print_status() handles the printing of submodule's
status.

Function set_name_rev() is also ported from git-submodule to the
submodule--helper builtin function compute_rev_name(), which now
generates the value of the revision name as required.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 207 
 git-submodule.sh|  61 +
 2 files changed, 208 insertions(+), 60 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 20a1ef868..50d38fc20 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -13,8 +13,13 @@
 #include "remote.h"
 #include "refs.h"
 #include "connect.h"
+#include "revision.h"
+#include "diffcore.h"
+#include "diff.h"
 
 #define CB_OPT_QUIET   (1<<0)
+#define CB_OPT_CACHED  (1<<1)
+#define CB_OPT_RECURSIVE   (1<<2)
 
 typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
  void *cb_data);
@@ -245,6 +250,53 @@ static char *get_submodule_displaypath(const char *path, const char *prefix)
}
 }
 
+static char *compute_rev_name(const char *sub_path, const char* object_id)
+{
+   struct strbuf sb = STRBUF_INIT;
+   const char ***d;
+
+   static const char *describe_bare[] = {
+   NULL
+   };
+
+   static const char *describe_tags[] = {
+   "--tags", NULL
+   };
+
+   static const char *describe_contains[] = {
+   "--contains", NULL
+   };
+
+   static const char *describe_all_always[] = {
+   "--all", "--always", NULL
+   };
+
+   static const char **describe_argv[] = {
+   describe_bare, describe_tags, describe_contains,
+   describe_all_always, NULL
+   };
+
+   for (d = describe_argv; *d; d++) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.dir = sub_path;
+   cp.git_cmd = 1;
+   cp.no_stderr = 1;
+
+   argv_array_push(&cp.args, "describe");
+   argv_array_pushv(&cp.args, *d);
+   argv_array_push(&cp.args, object_id);
+
+   if (!capture_command(&cp, &sb, 0)) {
+   strbuf_strip_suffix(&sb, "\n");
+   return strbuf_detach(&sb, NULL);
+   }
+   }
+
+   strbuf_release(&sb);
+   return NULL;
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -503,6 +555,160 @@ static int module_init(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+struct status_cb {
+   const char *prefix;
+   unsigned int cb_flags;
+};
+#define STATUS_CB_INIT { NULL, 0 }
+
+static void print_status(unsigned int flags, char state, const char *path,
+const struct object_id *oid, const char *displaypath)
+{
+   if (flags & CB_OPT_QUIET)
+   return;
+
+   printf("%c%s %s", state, oid_to_hex(oid), displaypath);
+
+   if (state == ' ' || state == '+')
+   printf(" (%s)", compute_rev_name(path, oid_to_hex(oid)));
+
+   printf("\n");
+}
+
+static int handle_submodule_head_ref(const char *refname,
+const struct object_id *oid, int flags,
+void *cb_data)
+{
+   struct object_id *output = cb_data;
+   if (oid)
+   oidcpy(output, oid);
+
+   return 0;
+}
+
+static void status_submodule(const char *path, const struct object_id *ce_oid,
+unsigned int ce_flags, const char *prefix,
+unsigned int cb_flags)
+{
+   char *displaypath;
+   struct argv_array diff_files_args = ARGV_ARRAY_INIT;
+   struct rev_info rev;
+   int diff_files_result;
+
+   if (!submodule_from_path(&null_oid, path))
+   die(_("no submodule mapping 

[PATCH v6 2/3] submodule--helper: introduce for_each_listed_submodule()

2017-09-29 Thread Prathamesh Chavan
Introduce function for_each_listed_submodule() and replace a loop
in module_init() with a call to it.

The new function will also be used in other parts of the
system in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index cdae54426..20a1ef868 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -14,6 +14,11 @@
 #include "refs.h"
 #include "connect.h"
 
+#define CB_OPT_QUIET   (1<<0)
+
+typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
+ void *cb_data);
+
 static char *get_default_remote(void)
 {
char *dest = NULL, *ret;
@@ -349,7 +354,22 @@ static int module_list(int argc, const char **argv, const char *prefix)
return 0;
 }
 
-static void init_submodule(const char *path, const char *prefix, int quiet)
+static void for_each_listed_submodule(const struct module_list *list,
+ each_submodule_fn fn, void *cb_data)
+{
+   int i;
+   for (i = 0; i < list->nr; i++)
+   fn(list->entries[i], cb_data);
+}
+
+struct init_cb {
+   const char *prefix;
+   unsigned int cb_flags;
+};
+#define INIT_CB_INIT { NULL, 0 }
+
+static void init_submodule(const char *path, const char *prefix,
+  unsigned int cb_flags)
 {
const struct submodule *sub;
struct strbuf sb = STRBUF_INIT;
@@ -411,7 +431,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
if (git_config_set_gently(sb.buf, url))
die(_("Failed to register url for submodule path '%s'"),
displaypath);
-   if (!quiet)
+   if (!(cb_flags & CB_OPT_QUIET))
fprintf(stderr,
_("Submodule '%s' (%s) registered for path 
'%s'\n"),
sub->name, url, displaypath);
@@ -438,12 +458,18 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
free(upd);
 }
 
+static void init_submodule_cb(const struct cache_entry *list_item, void *cb_data)
+{
+   struct init_cb *info = cb_data;
+   init_submodule(list_item->name, info->prefix, info->cb_flags);
+}
+
 static int module_init(int argc, const char **argv, const char *prefix)
 {
+   struct init_cb info = INIT_CB_INIT;
struct pathspec pathspec;
struct module_list list = MODULE_LIST_INIT;
int quiet = 0;
-   int i;
 
struct option module_init_options[] = {
OPT__QUIET(, N_("Suppress output for initializing a 
submodule")),
@@ -468,8 +494,11 @@ static int module_init(int argc, const char **argv, const char *prefix)
if (!argc && git_config_get_value_multi("submodule.active"))
module_list_active();
 
-   for (i = 0; i < list.nr; i++)
-   init_submodule(list.entries[i]->name, prefix, quiet);
+   info.prefix = prefix;
+   if (quiet)
+   info.cb_flags |= CB_OPT_QUIET;
+
+   for_each_listed_submodule(&list, init_submodule_cb, &info);
 
return 0;
 }
-- 
2.13.0



[PATCH v6 1/3] submodule--helper: introduce get_submodule_displaypath()

2017-09-29 Thread Prathamesh Chavan
Introduce the function get_submodule_displaypath() and replace the code
in init_submodule() that generates the submodule's displaypath with a
call to it.

This new function will also be used in other parts of the system
in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 818fe74f0..cdae54426 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -220,6 +220,26 @@ static int resolve_relative_url_test(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+/* the result should be freed by the caller. */
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+   const char *super_prefix = get_super_prefix();
+
+   if (prefix && super_prefix) {
+   BUG("cannot have prefix '%s' and superprefix '%s'",
+   prefix, super_prefix);
+   } else if (prefix) {
+   struct strbuf sb = STRBUF_INIT;
+   char *displaypath = xstrdup(relative_path(path, prefix, &sb));
+   strbuf_release(&sb);
+   return displaypath;
+   } else if (super_prefix) {
+   return xstrfmt("%s%s", super_prefix, path);
+   } else {
+   return xstrdup(path);
+   }
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -335,15 +355,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
struct strbuf sb = STRBUF_INIT;
char *upd = NULL, *url = NULL, *displaypath;
 
-   if (prefix && get_super_prefix())
-   die("BUG: cannot have prefix and superprefix");
-   else if (prefix)
-   displaypath = xstrdup(relative_path(path, prefix, &sb));
-   else if (get_super_prefix()) {
-   strbuf_addf(&sb, "%s%s", get_super_prefix(), path);
-   displaypath = strbuf_detach(&sb, NULL);
-   } else
-   displaypath = xstrdup(path);
+   displaypath = get_submodule_displaypath(path, prefix);
 
sub = submodule_from_path(&null_oid, path);
 
@@ -358,9 +370,9 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 * Set active flag for the submodule being initialized
 */
if (!is_submodule_active(the_repository, path)) {
-   strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
+   strbuf_reset(&sb);
}
 
/*
@@ -368,7 +380,6 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 * To look up the url in .git/config, we must not fall back to
 * .gitmodules, so look it up directly.
 */
-   strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.url", sub->name);
if (git_config_get_string(sb.buf, &url)) {
if (!sub->url)
@@ -405,9 +416,9 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
_("Submodule '%s' (%s) registered for path 
'%s'\n"),
sub->name, url, displaypath);
}
+   strbuf_reset(&sb);
 
/* Copy "update" setting when it is not set yet */
-   strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.update", sub->name);
if (git_config_get_string(sb.buf, &upd) &&
sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) {
-- 
2.13.0



[PATCH v6 0/3] Incremental rewrite of git-submodules

2017-09-29 Thread Prathamesh Chavan
changes in v6:

* The function get_submodule_displaypath() was modified for the case
  when get_super_prefix() returns a non-null value. The condition checking
  whether the super-prefix ends with a '/' was removed. To accommodate this
  change, an appropriate value of super_prefix is instead passed in the
  recursive calls of init_submodule() and status_submodule().

* To accommodate the possibility of a direct call to the function
  init_submodule(), a callback function init_submodule_cb() is introduced
  which takes the cache_entry and init_cb structures as input parameters
  and calls init_submodule() with parameters more appropriate for a
  direct call of that function.

* Similar changes were also made for status_submodule(). But since the
  number of parameters increased a lot due to flags like quiet, recursive,
  cached, etc., and keeping the future subcommands' ported functions in
  mind as well, a single unsigned int called cb_flags was introduced to
  store all of these flags instead of having a parameter for each one.

* Patches [3/4] and [4/4] from the previous series were merged as a single
  step.

* The call to the function cmd_diff_files() was avoided in the function
  status_submodule(); the function run_diff_files() is used instead for
  the same purpose (see the sketch after this list).
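
  A minimal sketch of that run_diff_files() pattern, for reference
  (hedged: this follows the usual revision.h/diff-lib calling
  convention inside builtin/submodule--helper.c, which already includes
  revision.h and diff.h per the patch above; it is not the exact code
  of the truncated hunk):

	struct rev_info rev;
	struct argv_array diff_files_args = ARGV_ARRAY_INIT;
	int diff_files_result;

	/* build an argv as if "git diff-files <path>" had been typed */
	argv_array_push(&diff_files_args, "diff-files");
	argv_array_push(&diff_files_args, path);

	/* parse it into rev_info and run the in-process diff machinery */
	init_revisions(&rev, prefix);
	rev.abbrev = 0;
	setup_revisions(diff_files_args.argc, diff_files_args.argv, &rev, NULL);
	diff_files_result = run_diff_files(&rev, 0);

	argv_array_clear(&diff_files_args);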

Since the patches required many changes, I took more time making them.
Thank you, Junio, for the last reviews; they helped a lot in improving
the patch series.

As before you can find this series at: 
https://github.com/pratham-pc/git/commits/patch-series-1

And its build report is available at: 
https://travis-ci.org/pratham-pc/git/builds/
Branch: patch-series-1
Build #184

Prathamesh Chavan (3):
  submodule--helper: introduce get_submodule_displaypath()
  submodule--helper: introduce for_each_listed_submodule()
  submodule: port submodule subcommand 'status' from shell to C

 builtin/submodule--helper.c | 281 +---
 git-submodule.sh|  61 +-
 2 files changed, 265 insertions(+), 77 deletions(-)

-- 
2.13.0



Re: [PATCH] fast-import: checkpoint: dump branches/tags/marks even if object_count==0

2017-09-29 Thread Junio C Hamano
Eric Rannaud  writes:

> On Thu, Sep 28, 2017 at 8:51 PM, Junio C Hamano  wrote:
>> I think that your patch the last round that feeds fd#8 in the
>> foreground (i.e. fully trusting that the caller is sensibly giving
>> input that produces no output) is already a good place to stop.
>>
>> Your patch this round that feeds fd#8 in the background, plus the
>> attached patch (i.e. not trusting the caller as much and allowing it
>> to use commands that outputs something, within reason), would also
>> be a good place to stop.
>>
>> But I am not sure your patch this round alone is a good place to
>> stop.  It somehow feels halfway either way.
>
> I agree. If we're coding defensively against the caller, we do have to
> include your patch to be effective, you're right. I reckon we likely
> don't need to be quite so paranoid, at least until this has more
> users.

OK, let's then pick the (not too excessively) defensive version by
taking your last one and suggested "while" loop squashed into it.

Thanks.


Re: [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop

2017-09-29 Thread Paolo Bonzini
On 29/09/2017 00:47, Junio C Hamano wrote:
> [jch: a patch that hopefully is applicable is attached at the end;
>  I'd appreciate input from Paolo, the contributor of the original
>  upstream]

I wasn't involved in upstream commit d42461c3, but Linux is also always
overwriting the revents output with zero.

Reviewed-by: Paolo Bonzini 

Paolo

> "Randall S. Becker"  writes:
> 
>> After a whole lot of investigating, we (it is a large "we") have discovered
>> the reason for the hang we occasionally get in git-upload-pack on HPE
>> NonStop servers - reported here well over a year ago. This resulted from a
>> subtle check that the operating system does on file descriptors. When it
>> sees random values in pfd[i].revents,...
> 
> What do you mean by "random values"?  Where do these random values
> come from?  The original code the patch touches is _not_ setting
> anything, so it is leaving it uninitialized and that is seen by the
> system?  If that is the case perhaps should we clear it before we
> call into this codepath?
> 
>>  There is a non-destructive fix
>> that I would like to propose for this that I have already tested.
> 
> I am not sure in what sense this is "non-destructive"; we left the
> value as it was when we didn't notice anything happening in
> compute_revents().  Now we explicitly destroy the value there was
> (just like we do in the case where the corresponding file descriptor
> was negative).  Maybe overwriting with 0 is the right thing, but I
> wouldn't call it "non-destructive", though.  Puzzling...
> 
>> --- a/compat/poll/poll.c
>> +++ b/compat/poll/poll.c
>> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>> pfd[i].revents = happened;
>> rc++;
>>   }
>> +else
>> +  {
>> +pfd[i].revents = 0;
>> +  }
>>}
>>
>>return rc;
> 
> FYI, the upstream gnulib rewrites this part of the code with
> d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and
> it reads like this:
> 
> +{
> +  pfd[i].revents = (pfd[i].fd < 0
> +? 0
> +: compute_revents (pfd[i].fd, pfd[i].events,
> +   &rfds, &wfds, &efds));
> +  rc += pfd[i].revents != 0;
> +}
> 
> The code before your fix (and before d42461c3) used to assign to
> pfd[i].revents only when compute_revents() gave us non-zero value,
> but with d42461c3 in the upstream, pfd[i].revents is assigned
> the return value from compute_revents() even if the value returned
> is 0.  And rc is incremented only when that value is non-zero.
> 
> The result of applying your patch is equivalent, so after all, I
> tend to think that your patch is the right fix in the context of the
> code we have (i.e. the older code we borrowed from them).
> 
> I wonder if we want to refresh the borrowed copy of poll.c to a
> newer snapshot someday, but that is totally a separate topic.  At
> least, I now know that your fix in this patch will not be broken due
> to d42461c3 updating the code in a slightly different way ;-)
> 
> Randall, I forged your Sign-off in the patch below, but please say
> it is OK, after reading Documentation/SubmittingPatches.
> 
> Thanks.
> 
> -- >8 --
> From: Randall S. Becker 
> Subject: poll.c: always set revents, even if to zero.
> 
> Match what happens to pfd[i].revents when compute_revents() returns
> 0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large
> fds", 2015-02-20).  The revents field is set to 0, without
> incrementing the value rc to be returned from the function.
> 
> This fixes occasional hangs in git-upload-pack on HPE NonStop.
> 
> Signed-off-by: Randall S. Becker 
> Signed-off-by: Junio C Hamano 
> ---
>  compat/poll/poll.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/compat/poll/poll.c b/compat/poll/poll.c
> index b10adc780f..ae03b74a6f 100644
> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>   pfd[i].revents = happened;
>   rc++;
> }
> + else
> +   {
> + pfd[i].revents = 0;
> +   }
>}
>  
>return rc;
> 
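
To illustrate why a stale revents value matters to a caller (a
hypothetical sketch, not code from git or gnulib; handle_input() is a
made-up stand-in):

	#include <poll.h>

	extern void handle_input(int fd); /* hypothetical handler */

	static void event_loop(struct pollfd pfd[2])
	{
		for (;;) {
			int i, n = poll(pfd, 2, -1);
			if (n < 0)
				break;
			for (i = 0; i < 2; i++)
				/* if poll() left a stale non-zero revents
				 * here, this fires spuriously */
				if (pfd[i].revents & POLLIN)
					handle_input(pfd[i].fd);
		}
	}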



Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-09-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Or perhaps we could.  There is nothing that says a signed tag
> created in the SHA-1 world must have the PGP/SHA-1 signature in the
> NewHash payload---it could be split off of the object data and
> stored in a local metadata cache, to be used only when we need to
> convert it back to the SHA-1 world.
> ...
>> +The format allows round-trip conversion between newhash-content and
>> +sha1-content.
>
> If it is a goal to eventually be able to lose SHA-1 compatibility
> metadata from the objects, then we might want to remove SHA-1 based
> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
> commit object) from NewHash contents, and instead have them stored
> in a side "metadata" table, only to be used while converting back.
> I dunno if that is desirable.

Let's keep it simple by ignoring all of the above.  Even though
leaving the sha1-gpgsig and other crufts would etch these
compatibility metadata in objects forever, these remain only in
objects that originate from SHA-1 world, or in objects created in
the NewHash world only while the project participants still care
about SHA-1 compatibility.  Strictly speaking, it would be super
nice if we could do without contaminating these newly created objects
with SHA-1 compatibility headers, just like we wish to be able to
drop the SHA-1 vs NewHash mapping table after project participants
stop caring about SHA-1 compatibility, but it may not be worth it.  Of
course, if we decide to spend a few more brain cycles to design how
we push these out of the object proper, the same solution would
automatically allow us to omit SHA-1 compatibility headers from the
objects that were converted from SHA-1 world.
>
>> +  - A table of 4-byte CRC32 values of the packed object data, in the
>> +order that the objects appear in the pack file. This is to allow
>> +compressed data to be copied directly from pack to pack during
>> +repacking without undetected data corruption.
>
> An obvious alternative would be to have the CRC32 checksum near
> (e.g. immediately before) the object data in the packfile (as
> opposed to the .idx file like this document specifies).  I am not
> sure what the pros and cons are between the two, though, and that is
> why I mention the possibility here.
>
> Hmm, as the corresponding packfile stores object data only in
> NewHash content format, it is somewhat curious that this table that
> stores CRC32 of the data appears in the "Tables for each object
> format" section, as they would be identical, no?  Unless I am
> grossly misleading the spec, the checksum should either go outside
> the "Tables for each object format" section but still in .idx, or
> should be eliminated and become part of the packdata stream instead,
> perhaps?

Thinking about this a bit more, I think a single table per .idx file
would be the right way to go, not a checksum immediately after or
before the object data that is embedded in the pack stream.  In the
NewHash world (after this initial migration), we would want to be
able to stream NewHash packstream that comes from the network
straight to disk, which would mean these in-line CRC32 data would
need to be sent over the wire (i.e. 4-byte per object sent); that is
an unneeded overhead, as the packstream has its trailing checksum to
protect the whole thing anyway.
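
For illustration, verifying an object copied pack-to-pack against such
an idx-side table could look roughly like this (a sketch assuming
zlib's crc32(); not actual git code):

	#include <stddef.h>
	#include <stdint.h>
	#include <zlib.h>

	/*
	 * Compare the CRC32 of the compressed object data we are about
	 * to reuse against the value recorded in the .idx table; a
	 * mismatch means the on-disk bytes were corrupted.
	 */
	static int packed_object_crc_ok(const unsigned char *buf, size_t len,
					uint32_t recorded)
	{
		uint32_t crc = crc32(0L, Z_NULL, 0);
		crc = (uint32_t)crc32(crc, buf, (uInt)len);
		return crc == recorded;
	}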


Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Jeff King
On Fri, Sep 29, 2017 at 09:18:11AM +0200, Christian Couder wrote:

> On Fri, Sep 29, 2017 at 12:42 AM, Jeff King  wrote:
> > On Thu, Sep 28, 2017 at 08:38:39AM +, Olga Telezhnaya wrote:
> >
> >> diff --git a/packfile.c b/packfile.c
> >> index f69a5c8d607af..ae3b0b2e9c09a 100644
> >> --- a/packfile.c
> >> +++ b/packfile.c
> >> @@ -876,6 +876,7 @@ void prepare_packed_git(void)
> >>   for (alt = alt_odb_list; alt; alt = alt->next)
> >>   prepare_packed_git_one(alt->path, 0);
> >>   rearrange_packed_git();
> >> + INIT_LIST_HEAD(&packed_git_mru.list);
> >>   prepare_packed_git_mru();
> >>   prepare_packed_git_run_once = 1;
> >>  }
> >
> > I was thinking on this hunk a bit more, and I think it's not quite
> > right.
> >
> > The prepare_packed_git_mru() function will clear the mru list and then
> > re-add each item from the packed_git list. But by calling
> > INIT_LIST_HEAD() here, we're effectively clearing the packed_git_mru
> > list, and we end up leaking whatever was on the list before.
> 
> The current code is:
> 
> static int prepare_packed_git_run_once = 0;
> void prepare_packed_git(void)
> {
> struct alternate_object_database *alt;
> 
> if (prepare_packed_git_run_once)
> return;
> prepare_packed_git_one(get_object_directory(), 1);
> prepare_alt_odb();
> for (alt = alt_odb_list; alt; alt = alt->next)
> prepare_packed_git_one(alt->path, 0);
> rearrange_packed_git();
> prepare_packed_git_mru();
> prepare_packed_git_run_once = 1;
> }
> 
> As we use the "prepare_packed_git_run_once" static, this function will
> only be called only once when packed_git_mru has not yet been
> initialized, so there will be no leak.

Check reprepare_packed_git(). It unsets the run_once flag, and then
calls prepare_packed_git() again.

-Peff
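
To illustrate the hazard: re-running INIT_LIST_HEAD() on a populated
list drops its nodes without freeing them, so a clear step has to walk
the list first. A minimal sketch using the list.h macros (the wrapper
struct here is hypothetical, along the lines of the patch under
discussion):

	struct mru {
		struct list_head list;
		void *item;
	};

	static void mru_clear(struct list_head *head)
	{
		struct list_head *pos, *tmp;

		/* use the _safe variant because we free as we walk */
		list_for_each_safe(pos, tmp, head) {
			list_del(pos);
			free(list_entry(pos, struct mru, list));
		}
	}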


Re: [PATCH Outreachy] mru: use double-linked list from list.h

2017-09-29 Thread Christian Couder
On Fri, Sep 29, 2017 at 12:42 AM, Jeff King  wrote:
> On Thu, Sep 28, 2017 at 08:38:39AM +, Olga Telezhnaya wrote:
>
>> diff --git a/packfile.c b/packfile.c
>> index f69a5c8d607af..ae3b0b2e9c09a 100644
>> --- a/packfile.c
>> +++ b/packfile.c
>> @@ -876,6 +876,7 @@ void prepare_packed_git(void)
>>   for (alt = alt_odb_list; alt; alt = alt->next)
>>   prepare_packed_git_one(alt->path, 0);
>>   rearrange_packed_git();
>> + INIT_LIST_HEAD(&packed_git_mru.list);
>>   prepare_packed_git_mru();
>>   prepare_packed_git_run_once = 1;
>>  }
>
> I was thinking on this hunk a bit more, and I think it's not quite
> right.
>
> The prepare_packed_git_mru() function will clear the mru list and then
> re-add each item from the packed_git list. But by calling
> INIT_LIST_HEAD() here, we're effectively clearing the packed_git_mru
> list, and we end up leaking whatever was on the list before.

The current code is:

static int prepare_packed_git_run_once = 0;
void prepare_packed_git(void)
{
struct alternate_object_database *alt;

if (prepare_packed_git_run_once)
return;
prepare_packed_git_one(get_object_directory(), 1);
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next)
prepare_packed_git_one(alt->path, 0);
rearrange_packed_git();
prepare_packed_git_mru();
prepare_packed_git_run_once = 1;
}

As we use the "prepare_packed_git_run_once" static, this function will
only be called only once when packed_git_mru has not yet been
initialized, so there will be no leak.


[PATCH v16 5/6] t6030: explicitly test for bisection cleanup

2017-09-29 Thread Pranit Bauva
Add test to explicitly check that 'git bisect reset' is working as
expected. This is already covered implicitly by the test suite.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 

---
I faced this problem while converting `bisect_clean_state`: the tests
were showing breakages, but it wasn't clear where exactly they were
breaking. This patch will help with that. I also verified that the test
suite covers this behavior before this patch (by purposely changing the
names of files in git-bisect.sh and running the test suite).
---
 t/t6030-bisect-porcelain.sh | 17 +
 1 file changed, 17 insertions(+)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 8c2c6eaef83fe..f84ff941c3624 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -894,4 +894,21 @@ test_expect_success 'bisect start takes options and revs in any order' '
test_cmp expected actual
 '
 
+test_expect_success 'git bisect reset cleans bisection state properly' '
+   git bisect reset &&
+   git bisect start &&
+   git bisect good $HASH1 &&
+   git bisect bad $HASH4 &&
+   git bisect reset &&
+   test -z "$(git for-each-ref "refs/bisect/*")" &&
+   test_path_is_missing "$GIT_DIR/BISECT_EXPECTED_REV" &&
+   test_path_is_missing "$GIT_DIR/BISECT_ANCESTORS_OK" &&
+   test_path_is_missing "$GIT_DIR/BISECT_LOG" &&
+   test_path_is_missing "$GIT_DIR/BISECT_RUN" &&
+   test_path_is_missing "$GIT_DIR/BISECT_TERMS" &&
+   test_path_is_missing "$GIT_DIR/head-name" &&
+   test_path_is_missing "$GIT_DIR/BISECT_HEAD" &&
+   test_path_is_missing "$GIT_DIR/BISECT_START"
+'
+
 test_done

--
https://github.com/git/git/pull/410


[PATCH v16 3/6] bisect--helper: `write_terms` shell function in C

2017-09-29 Thread Pranit Bauva
Reimplement the `write_terms` shell function in C and add a `write-terms`
subcommand to `git bisect--helper` to call it from git-bisect.sh . Also
remove the subcommand `--check-term-format` as it can now be called from
inside the function write_terms() C implementation.

Also, `|| exit` is added when calling the write-terms subcommand from
git-bisect.sh so as to exit whenever there is an error.

Using `--write-terms` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and its implementation will
be called by some other method.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 36 +---
 git-bisect.sh| 22 +++---
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index db9091e249454..2bd2d396e335d 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -4,9 +4,11 @@
 #include "bisect.h"
 #include "refs.h"
 
+static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
+
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
-   N_("git bisect--helper --check-term-format  "),
+   N_("git bisect--helper --write-terms  "),
NULL
 };
 
@@ -57,18 +59,38 @@ static int check_term_format(const char *term, const char 
*orig_term)
return 0;
 }
 
+static int write_terms(const char *bad, const char *good)
+{
+   FILE *fp = NULL;
+   int res;
+
+   if (!strcmp(bad, good))
+   return error(_("please use two different terms"));
+
+   if (check_term_format(bad, "bad") || check_term_format(good, "good"))
+   return -1;
+
+   fp = fopen(git_path_bisect_terms(), "w");
+   if (!fp)
+   return error_errno(_("could not open the file BISECT_TERMS"));
+
+   res = fprintf(fp, "%s\n%s\n", bad, good);
+   res |= fclose(fp);
+   return (res < 0) ? -1 : 0;
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
enum {
NEXT_ALL = 1,
-   CHECK_TERM_FMT
+   WRITE_TERMS
} cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
OPT_CMDMODE(0, "next-all", &cmdmode,
 N_("perform 'git bisect next'"), NEXT_ALL),
-   OPT_CMDMODE(0, "check-term-format", &cmdmode,
-N_("check format of the term"), CHECK_TERM_FMT),
+   OPT_CMDMODE(0, "write-terms", &cmdmode,
+N_("write the terms to .git/BISECT_TERMS"), WRITE_TERMS),
OPT_BOOL(0, "no-checkout", &no_checkout,
 N_("update BISECT_HEAD instead of checking out the current commit")),
OPT_END()
@@ -83,10 +105,10 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
switch (cmdmode) {
case NEXT_ALL:
return bisect_next_all(prefix, no_checkout);
-   case CHECK_TERM_FMT:
+   case WRITE_TERMS:
if (argc != 2)
-   return error(_("--check-term-format requires two 
arguments"));
-   return check_term_format(argv[0], argv[1]);
+   return error(_("--write-terms requires two arguments"));
+   return write_terms(argv[0], argv[1]);
default:
return error("BUG: unknown subcommand '%d'", cmdmode);
}
diff --git a/git-bisect.sh b/git-bisect.sh
index a727c59250f13..a80fc44e524f2 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -209,7 +209,7 @@ bisect_start() {
eval "$eval true" &&
if test $must_write_terms -eq 1
then
-   write_terms "$TERM_BAD" "$TERM_GOOD"
+   git bisect--helper --write-terms "$TERM_BAD" "$TERM_GOOD" || exit
fi &&
echo "git bisect start$orig_args" >>"$GIT_DIR/BISECT_LOG" || exit
#
@@ -557,18 +557,6 @@ get_terms () {
fi
 }
 
-write_terms () {
-   TERM_BAD=$1
-   TERM_GOOD=$2
-   if test "$TERM_BAD" = "$TERM_GOOD"
-   then
-   die "$(gettext "please use two different terms")"
-   fi
-   git bisect--helper --check-term-format "$TERM_BAD" bad || exit
-   git bisect--helper --check-term-format "$TERM_GOOD" good || exit
-   printf '%s\n%s\n' "$TERM_BAD" "$TERM_GOOD" >"$GIT_DIR/BISECT_TERMS"
-}
-
 check_and_set_terms () {
cmd="$1"
case "$cmd" in
@@ -582,13 +570,17 @@ check_and_set_terms () {
bad|good)
if ! test -s "$GIT_DIR/BISECT_TERMS"
then
-   write_terms bad good
+   TERM_BAD=bad
+ 

[PATCH v16 4/6] bisect--helper: `bisect_clean_state` shell function in C

2017-09-29 Thread Pranit Bauva
Reimplement `bisect_clean_state` shell function in C and add a
`bisect-clean-state` subcommand to `git bisect--helper` to call it from
git-bisect.sh .

Using the `--bisect-clean-state` subcommand is a measure to port the
shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired, but its
implementation will be called by bisect_reset() and bisect_start().

Also introduce a function `mark_for_removal` to store the refs which
need to be removed while iterating through the refs.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 bisect.c | 42 ++
 bisect.h |  2 ++
 builtin/bisect--helper.c | 10 +-
 git-bisect.sh| 26 +++---
 4 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/bisect.c b/bisect.c
index 2549eaf7b1515..d8c4425812eb0 100644
--- a/bisect.c
+++ b/bisect.c
@@ -433,7 +433,12 @@ static int read_bisect_refs(void)
 
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
+static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
+static GIT_PATH_FUNC(git_path_bisect_run, "BISECT_RUN")
+static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
+static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
+static GIT_PATH_FUNC(git_path_head_name, "head-name")
 
 static void read_bisect_paths(struct argv_array *array)
 {
@@ -1043,3 +1048,40 @@ int estimate_bisect_steps(int all)
 
return (e < 3 * x) ? n : n - 1;
 }
+
+static int mark_for_removal(const char *refname, const struct object_id *oid,
+   int flag, void *cb_data)
+{
+   struct string_list *refs = cb_data;
+   char *ref = xstrfmt("refs/bisect%s", refname);
+   string_list_append(refs, ref);
+   return 0;
+}
+
+int bisect_clean_state(void)
+{
+   int result = 0;
+
+   /* There may be some refs packed during bisection */
+   struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
+   for_each_ref_in("refs/bisect", mark_for_removal, (void *) 
_for_removal);
+   string_list_append(_for_removal, xstrdup("BISECT_HEAD"));
+   result = delete_refs("bisect: remove", _for_removal, REF_NODEREF);
+   refs_for_removal.strdup_strings = 1;
+   string_list_clear(_for_removal, 0);
+   unlink_or_warn(git_path_bisect_expected_rev());
+   unlink_or_warn(git_path_bisect_ancestors_ok());
+   unlink_or_warn(git_path_bisect_log());
+   unlink_or_warn(git_path_bisect_names());
+   unlink_or_warn(git_path_bisect_run());
+   unlink_or_warn(git_path_bisect_terms());
+   /* Cleanup head-name if it got left by an old version of git-bisect */
+   unlink_or_warn(git_path_head_name());
+   /*
+* Cleanup BISECT_START last to support the --no-checkout option
+* introduced in the commit 4796e823a.
+*/
+   unlink_or_warn(git_path_bisect_start());
+
+   return result;
+}
diff --git a/bisect.h b/bisect.h
index acd12ef802c61..0ae63d4616dc6 100644
--- a/bisect.h
+++ b/bisect.h
@@ -28,4 +28,6 @@ extern int estimate_bisect_steps(int all);
 
 extern void read_bisect_terms(const char **bad, const char **good);
 
+extern int bisect_clean_state(void);
+
 #endif
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 2bd2d396e335d..0f4d4e41cf233 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -9,6 +9,7 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
N_("git bisect--helper --write-terms  "),
+   N_("git bisect--helper --bisect-clean-state"),
NULL
 };
 
@@ -83,7 +84,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
enum {
NEXT_ALL = 1,
-   WRITE_TERMS
+   WRITE_TERMS,
+   BISECT_CLEAN_STATE
} cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
@@ -91,6 +93,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 N_("perform 'git bisect next'"), NEXT_ALL),
OPT_CMDMODE(0, "write-terms", &cmdmode,
 N_("write the terms to .git/BISECT_TERMS"), WRITE_TERMS),
+   OPT_CMDMODE(0, "bisect-clean-state", &cmdmode,
+N_("cleanup the bisection state"), BISECT_CLEAN_STATE),
OPT_BOOL(0, "no-checkout", &no_checkout,
 N_("update BISECT_HEAD instead of checking out the current commit")),
OPT_END()
@@ -109,6 +113,10 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)

[PATCH v16 6/6] bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C

2017-09-29 Thread Pranit Bauva
Reimplement `is_expected_rev` & `check_expected_revs` shell function in
C and add a `--check-expected-revs` subcommand to `git bisect--helper` to
call it from git-bisect.sh .

Using `--check-expected-revs` subcommand is a temporary measure to port
shell functions to C so as to use the existing test suite. As more
functions are ported, this subcommand would be retired but its
implementation will be called by some other method.

Helped-by: Eric Sunshine 
Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 34 +-
 git-bisect.sh| 20 ++--
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 0f4d4e41cf233..35d2105f941c6 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -5,6 +5,8 @@
 #include "refs.h"
 
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
+static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
+static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
@@ -80,12 +82,37 @@ static int write_terms(const char *bad, const char *good)
return (res < 0) ? -1 : 0;
 }
 
+static int is_expected_rev(const char *expected_hex)
+{
+   struct strbuf actual_hex = STRBUF_INIT;
+   int res = 0;
+   if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 40) {
+   strbuf_trim(_hex);
+   res = !strcmp(actual_hex.buf, expected_hex);
+   }
+   strbuf_release(&actual_hex);
+   return res;
+}
+
+static void check_expected_revs(const char **revs, int rev_nr)
+{
+   int i;
+
+   for (i = 0; i < rev_nr; i++) {
+   if (!is_expected_rev(revs[i])) {
+   unlink_or_warn(git_path_bisect_ancestors_ok());
+   unlink_or_warn(git_path_bisect_expected_rev());
+   }
+   }
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
enum {
NEXT_ALL = 1,
WRITE_TERMS,
-   BISECT_CLEAN_STATE
+   BISECT_CLEAN_STATE,
+   CHECK_EXPECTED_REVS
} cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
@@ -95,6 +122,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 N_("write the terms to .git/BISECT_TERMS"), WRITE_TERMS),
OPT_CMDMODE(0, "bisect-clean-state", &cmdmode,
 N_("cleanup the bisection state"), BISECT_CLEAN_STATE),
+   OPT_CMDMODE(0, "check-expected-revs", &cmdmode,
+N_("check for expected revs"), CHECK_EXPECTED_REVS),
OPT_BOOL(0, "no-checkout", &no_checkout,
 N_("update BISECT_HEAD instead of checking out the current commit")),
OPT_END()
@@ -117,6 +146,9 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
if (argc != 0)
return error(_("--bisect-clean-state requires no arguments"));
return bisect_clean_state();
+   case CHECK_EXPECTED_REVS:
+   check_expected_revs(argv, argc);
+   return 0;
default:
return error("BUG: unknown subcommand '%d'", cmdmode);
}
diff --git a/git-bisect.sh b/git-bisect.sh
index 045830c399873..0138a8860e377 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -237,22 +237,6 @@ bisect_write() {
test -n "$nolog" || echo "git bisect $state $rev" 
>>"$GIT_DIR/BISECT_LOG"
 }
 
-is_expected_rev() {
-   test -f "$GIT_DIR/BISECT_EXPECTED_REV" &&
-   test "$1" = $(cat "$GIT_DIR/BISECT_EXPECTED_REV")
-}
-
-check_expected_revs() {
-   for _rev in "$@"; do
-   if ! is_expected_rev "$_rev"
-   then
-   rm -f "$GIT_DIR/BISECT_ANCESTORS_OK"
-   rm -f "$GIT_DIR/BISECT_EXPECTED_REV"
-   return
-   fi
-   done
-}
-
 bisect_skip() {
all=''
for arg in "$@"
@@ -280,7 +264,7 @@ bisect_state() {
rev=$(git rev-parse --verify "$bisected_head") ||
die "$(eval_gettext "Bad rev input: \$bisected_head")"
bisect_write "$state" "$rev"
-   check_expected_revs "$rev" ;;
+   git bisect--helper --check-expected-revs "$rev" ;;
2,"$TERM_BAD"|*,"$TERM_GOOD"|*,skip)
shift
hash_list=''
@@ -294,7 +278,7 @@ bisect_state() {
do
bisect_write "$state" "$rev"
done
-   check_expected_revs $hash_list ;;
+

[PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-09-29 Thread Pranit Bauva
`--next-all` is meant to be used as a subcommand to support multiple
"operation modes", though the current implementation does not contain
any other subcommand alongside `--next-all`; further commits will
include some more subcommands.

Helped-by: Johannes Schindelin 
Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 

---
Hey,

It has been a long time since this series appeared on the mailing list.
The previous version v15[1] is now split into many parts and I am
sending the first part right now, will focus on getting this merged and
then send out the next part.

The changes in this part:
 * Stephan pointed out that "terms" was missing in patch 2, i.e.
   "bisect--helper: rewrite `check_term_format` shell function in C"

[1]:
https://public-inbox.org/git/CAFZEwPOjK25m84BgTF7AL72DL_K1dHf7OrYoX=_vky9r3ga...@mail.gmail.com/
---
 builtin/bisect--helper.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 3324229025300..b6ee0acb82765 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -10,11 +10,11 @@ static const char * const git_bisect_helper_usage[] = {
 
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
-   int next_all = 0;
+   enum { NEXT_ALL = 1 } cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
-   OPT_BOOL(0, "next-all", _all,
-N_("perform 'git bisect next'")),
+   OPT_CMDMODE(0, "next-all", ,
+N_("perform 'git bisect next'"), NEXT_ALL),
OPT_BOOL(0, "no-checkout", _checkout,
 N_("update BISECT_HEAD instead of checking out the 
current commit")),
OPT_END()
@@ -23,9 +23,14 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, prefix, options,
 git_bisect_helper_usage, 0);
 
-   if (!next_all)
+   if (!cmdmode)
usage_with_options(git_bisect_helper_usage, options);
 
-   /* next-all */
-   return bisect_next_all(prefix, no_checkout);
+   switch (cmdmode) {
+   case NEXT_ALL:
+   return bisect_next_all(prefix, no_checkout);
+   default:
+   return error("BUG: unknown subcommand '%d'", cmdmode);
+   }
+   return 0;
 }

--
https://github.com/git/git/pull/410


[PATCH v16 2/6] bisect--helper: rewrite `check_term_format` shell function in C

2017-09-29 Thread Pranit Bauva
Reimplement the `check_term_format` shell function in C and add
a `--check-term-format` subcommand to `git bisect--helper` to call it
from git-bisect.sh

Using the `--check-term-format` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and its
implementation will be called by some other method/subcommand. For
example, in the conversion of write_terms() of git-bisect.sh, the
subcommand will be removed and check_term_format() will instead be
called in its C implementation, while a new subcommand will be
introduced for write_terms().

Helped-by: Johannes Schindelin 
Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 60 +++-
 git-bisect.sh| 31 ++---
 2 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index b6ee0acb82765..db9091e249454 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -2,19 +2,73 @@
 #include "cache.h"
 #include "parse-options.h"
 #include "bisect.h"
+#include "refs.h"
 
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
+   N_("git bisect--helper --check-term-format  "),
NULL
 };
 
+/*
+ * Check whether the string `term` belongs to the set of strings
+ * included in the variable arguments.
+ */
+LAST_ARG_MUST_BE_NULL
+static int one_of(const char *term, ...)
+{
+   int res = 0;
+   va_list matches;
+   const char *match;
+
+   va_start(matches, term);
+   while (!res && (match = va_arg(matches, const char *)))
+   res = !strcmp(term, match);
+   va_end(matches);
+
+   return res;
+}
+
+static int check_term_format(const char *term, const char *orig_term)
+{
+   int res;
+   char *new_term = xstrfmt("refs/bisect/%s", term);
+
+   res = check_refname_format(new_term, 0);
+   free(new_term);
+
+   if (res)
+   return error(_("'%s' is not a valid term"), term);
+
+   if (one_of(term, "help", "start", "skip", "next", "reset",
+   "visualize", "replay", "log", "run", "terms", NULL))
+   return error(_("can't use the builtin command '%s' as a term"), 
term);
+
+   /*
+* In theory, nothing prevents swapping completely good and bad,
+* but this situation could be confusing and hasn't been tested
+* enough. Forbid it for now.
+*/
+
+   if ((strcmp(orig_term, "bad") && one_of(term, "bad", "new", NULL)) ||
+(strcmp(orig_term, "good") && one_of(term, "good", "old", 
NULL)))
+   return error(_("can't change the meaning of the term '%s'"), 
term);
+
+   return 0;
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
-   enum { NEXT_ALL = 1 } cmdmode = 0;
+   enum {
+   NEXT_ALL = 1,
+   CHECK_TERM_FMT
+   } cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
OPT_CMDMODE(0, "next-all", ,
 N_("perform 'git bisect next'"), NEXT_ALL),
+   OPT_CMDMODE(0, "check-term-format", ,
+N_("check format of the term"), CHECK_TERM_FMT),
OPT_BOOL(0, "no-checkout", _checkout,
 N_("update BISECT_HEAD instead of checking out the 
current commit")),
OPT_END()
@@ -29,6 +83,10 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
switch (cmdmode) {
case NEXT_ALL:
return bisect_next_all(prefix, no_checkout);
+   case CHECK_TERM_FMT:
+   if (argc != 2)
+   return error(_("--check-term-format requires two 
arguments"));
+   return check_term_format(argv[0], argv[1]);
default:
return error("BUG: unknown subcommand '%d'", cmdmode);
}
diff --git a/git-bisect.sh b/git-bisect.sh
index ae3cb013e7d56..a727c59250f13 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -564,38 +564,11 @@ write_terms () {
then
die "$(gettext "please use two different terms")"
fi
-   check_term_format "$TERM_BAD" bad
-   check_term_format "$TERM_GOOD" good
+   git bisect--helper --check-term-format "$TERM_BAD" bad || exit
+   git bisect--helper --check-term-format "$TERM_GOOD" good || exit
printf '%s\n%s\n' "$TERM_BAD" "$TERM_GOOD" >"$GIT_DIR/BISECT_TERMS"
 }
 
-check_term_format () {
-   term=$1
-   git check-ref-format refs/bisect/"$term" ||
-   die "$(eval_gettext "'\$term' is not a valid term")"
-   case "$term" in
-   

Re: What's cooking in git.git (Sep 2017, #06; Fri, 29)

2017-09-29 Thread Ian Campbell
On Fri, 2017-09-29 at 13:34 +0900, Junio C Hamano wrote:
> 
> * ic/fix-filter-branch-to-handle-tag-without-tagger (2017-09-22) 4
> commits
>   (merged to 'next' on 2017-09-25 at c7550033df)
>  + filter-branch: use hash-object instead of mktag
>  + filter-branch: stash away ref map in a branch
>  + filter-branch: preserve and restore $GIT_AUTHOR_* and
> $GIT_COMMITTER_*
>  + filter-branch: reset $GIT_* before cleaning up
> 
>  "git filter-branch" cannot reproduce a history with a tag without
>  the tagger field, which only ancient versions of Git allowed to be
>  created.  This has been corrected.

This set of patches also includes the new --state-branch option to
allow you to do incremental conversions.

Ian.
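
For reference, incremental use of --state-branch looks something like
this (an illustrative sketch of the workflow, not taken from the
patches themselves):

	# first run: rewrite everything, saving the old->new commit map
	git filter-branch --state-branch refs/heads/filter-state \
		--msg-filter cat -- --all

	# later runs: commits already recorded in the saved map are
	# skipped, so only newly added history gets rewritten
	git filter-branch --state-branch refs/heads/filter-state \
		--msg-filter cat -- --all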


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-09-29 Thread Junio C Hamano
Jonathan Nieder  writes:

> This document describes what a transition to a new hash function for
> Git would look like.  Add it to Documentation/technical/ as the plan
> of record so that future changes can be recorded as patches.
>
> Also-by: Brandon Williams 
> Also-by: Jonathan Tan 
> Also-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---

Shouldn't these all be s-o-b: (with a note immediately before that
to say all four contributed equally or something)?

> +Background
> +--
> +At its core, the Git version control system is a content addressable
> +filesystem. It uses the SHA-1 hash function to name content. For
> +example, files, directories, and revisions are referred to by hash
> +values unlike in other traditional version control systems where files
> +or versions are referred to via sequential numbers. The use of a hash

Traditional systems refer to files via numbers???  Perhaps "where
versions of files are referred to via sequential numbers" or
something?

> +function to address its content delivers a few advantages:
> +
> +* Integrity checking is easy. Bit flips, for example, are easily
> +  detected, as the hash of corrupted content does not match its name.
> +* Lookup of objects is fast.

* There is no ambiguity what the object's name should be, given its
  content.

* Deduping the same content copied across versions and paths is
  automatic.
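
(Concretely, identical content always hashes to the same name; for
example:

	$ printf 'hello\n' | git hash-object --stdin
	ce013625030ba8dba906f756967f9e9ca394464a
	$ printf 'hello\n' | git hash-object --stdin
	ce013625030ba8dba906f756967f9e9ca394464a

so the object store keeps a single copy no matter how many paths or
revisions carry that content.)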

> +SHA-1 still possesses the other properties such as fast object lookup
> +and safe error checking, but other hash functions are equally suitable
> +that are believed to be cryptographically secure.

s/secure/more &/, perhaps?

> +Goals
> +-
> +...
> +   c. Users can use SHA-1 and NewHash identifiers for objects
> +  interchangeably (see "Object names on the command line", below).

Mental note.  This needs to extend to the "index X..Y" lines in the
patch output, which is used by "apply -3" and "am -3".

> +2. Allow a complete transition away from SHA-1.
> +   a. Local metadata for SHA-1 compatibility can be removed from a
> +  repository if compatibility with SHA-1 is no longer needed.

I like the emphasis on "Local" here.  Metadata for compatiblity that
is embedded in the objects obviously cannot be removed.

From that point of view, one of the goals ought to be "make sure
that as much SHA-1 compatibility metadata as possible is local and
outside the object".  This goal may not be able to say more than "as
much as possible", as signed objects that came from SHA-1 world
needs to carry the compatibility metadata somewhere somehow.  

Or perhaps we could.  There is nothing that says a signed tag
created in the SHA-1 world must have the PGP/SHA-1 signature in the
NewHash payload---it could be split off of the object data and
stored in a local metadata cache, to be used only when we need to
convert it back to the SHA-1 world.

But I am getting ahead of myself before reading the proposal
through.

> +Non-Goals
> +-
> ...
> +6. Skip fetching some submodules of a project into a NewHash
> +   repository. (This also depends on NewHash support in Git
> +   protocol.)

It is unclear what this means.  Around submodule support, one thing
I can think of is that a NewHash tree in a superproject would record
a gitlink that is a NewHash commit object name in it, therefore it
cannot refer to an unconverted SHA-1 submodule repository.  But it
is unclear if the above description refers to the same issue, or
something else.

> +Overview
> +
> +We introduce a new repository format extension. Repositories with this
> +extension enabled use NewHash instead of SHA-1 to name their objects.
> +This affects both object names and object content --- both the names
> +of objects and all references to other objects within an object are
> +switched to the new hash function.
> +
> +NewHash repositories cannot be read by older versions of Git.
> +
> +Alongside the packfile, a NewHash repository stores a bidirectional
> +mapping between NewHash and SHA-1 object names. The mapping is generated
> +locally and can be verified using "git fsck". Object lookups use this
> +mapping to allow naming objects using either their SHA-1 and NewHash names
> +interchangeably.
> +
> +"git cat-file" and "git hash-object" gain options to display an object
> +in its sha1 form and write an object given its sha1 form.

Both of these are somewhat unclear.  I am guessing that "git
cat-file --convert-to=sha1  " would emit the
object contents converted from their NewHash payload to SHA-1
payload (blobs are unchanged, trees, commits and tags get their
outgoing references converted from NewHash to their SHA-1
counterparts), and that is what you mean by "options to display an
object in its sha1 form".  

I am not sure how "git hash-object" with the option would work,
though.  Do you give an option "--hash=sha1 --stdout --stdin -t
" to feed a NewHash contents (file,