Re: Git on macOS shows committed files as untracked

2017-07-13 Thread Lutz Roeder
Using precomposeunicode still reproduces the issue:

Repro steps:

1. Download https://www.dropbox.com/s/0q5pbpqpckwzj7b/gitstatusrepro.zip?dl=0
2. unzip gitstatusrepro.zip && cd gitstatusrepro
3. git reset --hard
4. git -c core.precomposeunicode=true status

On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

"d\314\207\316\271\314\223\314\200\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\320\243\314\213/"

nothing added to commit but untracked files present (use "git add" to track)

> From: Torsten Bögershausen 
>
> Thanks for the fast analysis -
> in short:
> what does
> git -c core.precomposeunicode=true status
> say ?
>
>
> The easiest thing may be to set
> git config --global core.precomposeunicode true


Re: What's cooking in git.git (Jul 2017, #04; Thu, 13)

2017-07-13 Thread Junio C Hamano
Santiago Torres  writes:

>> Here are the topics that have been cooking.  
>
> I sent a patch (almost a week ago) that would probably[1] be labeled
> as "uninteresting" (as per the notes from the maintainer), but I wanted
> to make sure it wasn't lost in the noise -- I see that there's a lot of
> active development lately. I checked the latest iterations of "what's
> cooking" to see if it was going to be discarded or so, but I see no
> mention of it.

I postponed it when I saw it the first time to see if anybody would
comment on it, and it turned out nobody was interested; it has
remained uninteresting to the list to this day.

Now, after looking at the message again, from the patch description
I am willing to believe that you experienced _some_ trouble when the
gpg-agent that is auto-spawned by gpg gets left behind (as I do not
see any hits from "git grep gpg-agent t/", we are not deliberately
using the agent).  However, I could not convince myself that the
solution is credible.  Here is an excerpt from the patch:

> diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
> index ec2aa8f68..22ef2fa87 100755
> --- a/t/lib-gpg.sh
> +++ b/t/lib-gpg.sh
> @@ -31,6 +31,7 @@ then
>   chmod 0700 ./gpghome &&
>   GNUPGHOME="$(pwd)/gpghome" &&
>   export GNUPGHOME &&
> + gpgconf --kill gpg-agent &&
>   gpg --homedir "${GNUPGHOME}" 2>/dev/null --import \
>   "$TEST_DIRECTORY"/lib-gpg/keyring.gpg &&
>   gpg --homedir "${GNUPGHOME}" 2>/dev/null --import-ownertrust \
> -- 

but the current directory of this part is the $TRASH_DIRECTORY,
which is always created anew from the beginning in a test.  What
agent process could possibly be running there immediately after
creating ./gpghome (which happens with "mkdir gpghome &&", without
"-p", just before the context of this hunk)?  If the agent were
already running, the directory would have been there, mkdir would
have failed, and that would have caused "test_set_prereq GPG" at the
end of the "&&" cascade to be skipped.  In other words, it is
unclear to me, and your log message does not sufficiently explain,
why this is the right solution (or what the exact cause of the
symptom is, for that matter).

Or perhaps the gpg-agent socket is created somewhere outside the
GNUPGHOME and stays behind even after a previous run of the same
test finishes and $TRASH_DIRECTORY gets cleared (I am guessing the
"what the exact cause is" part, as the log message did not explain
it)?  If that is the case, it makes me wonder if either of the two
alternatives may be a more robust solution: (1) running the gpg
tests telling gpg never to use the agent, or (2) telling gpg and
gpg-agent to use a socket inside GNUPGHOME.

After all, "kill"ing agent blindly like the above patch would mean
you do not know what other party is relying on the proper operation
of the thing you are killing.  That sounds more like a workaround
that a solution (unless it is explained with a solid reason why that
is the right way to run more than one instances of GPG).

Perhaps everybody else is running these gpg tests without having to
worry about gpg-agent already because their environment is more
vanilla, but you have some configuration or environment that causes
gpg to use the agent, and that is the reason why nobody is
interested (because they have never seen the symptom)?  It is
possible that the part of t/test-lib.sh that tries to cleanse
environment variables and other "external influence" to give us a
predictable and repeatable test is unaware of such a knob that only
some developers (including you) have, and the rest of us were merely
lucky.  Perhaps we need to throw GPG_AGENT_INFO, SSH_AUTH_SOCK,
etc. into the list of environment variables to nuke there?

Combined with the unknown-ness of the root cause of the issue, I can
only say that the patch may be raising an issue worth addressing,
but it is too sketchy to tell whether it is the right solution, or
what the exact problem being solved is.


Re: Strange behavior with Git add -p in version 2.13.3

2017-07-13 Thread Junio C Hamano
Ben Reed  writes:

> Hello, I updated Git to 2.13.3, and now selecting 's' to split a
> change on a call to `git add -p` is not working. It's showing the list
> of options as if 's' is not a valid choice...
>
> Particularly, I'm getting...
> Stage this hunk [y,n,q,a,d,/,e,?]? s
> y - stage this hunk
> n - do not stage this hunk
> q - quit; do not stage this hunk or any of the remaining ones
> a - stage this hunk and all later hunks in the file
> d - do not stage this hunk or any of the later hunks in the file
> g - select a hunk to go to
> / - search for a hunk matching the given regex
> j - leave this hunk undecided, see next undecided hunk
> J - leave this hunk undecided, see next hunk
> k - leave this hunk undecided, see previous undecided hunk
> K - leave this hunk undecided, see previous hunk
> s - split the current hunk into smaller hunks
> e - manually edit the current hunk
> ? - print help
>
> Is anyone else having this problem? Does anybody know how to resolve
> it? I'm running on macOS Version 10.12.5.

I do not think it is MacOSX specific.  I notice that the prompt does
not even offer 's' as a valid choice, which typically means that the
hunk you are looking at is not splittable.

A splittable hunk has more than one block of "+addition" and/or
"-deletion" lines separated by at least one " context" line.  If
your hunk looks like this, for example:

-- >8 --
diff --git a/COPYING b/COPYING
index 536e55524d..35c4dd4473 100644
--- a/COPYING
+++ b/COPYING
@@ -3,7 +3,7 @@
  is concerned is _this_ particular version of the license (ie v2, not
  v2.2 or v3.x or whatever), unless explicitly otherwise stated.

- HOWEVER, in order to allow a migration to GPLv3 if that seems like
+ However, in order to allow a migration to GPLv3 if that seems like
  a good idea, I also ask that people involved with the project make
  their preferences known. In particular, if you trust me to make that
  decision, you might note so in your copyright message, ie something
Stage this hunk [y,n,q,a,d,/,e,?]?
-- 8< --

you would not see 's' offered as a valid choice.  If you are looking
at a splittable hunk, on the other hand:

-- >8 --
diff --git a/COPYING b/COPYING
index 536e55524d..02a5c58938 100644
--- a/COPYING
+++ b/COPYING
@@ -3,9 +3,9 @@
  is concerned is _this_ particular version of the license (ie v2, not
  v2.2 or v3.x or whatever), unless explicitly otherwise stated.

- HOWEVER, in order to allow a migration to GPLv3 if that seems like
+ However, in order to allow a migration to GPLv3 if that seems like
  a good idea, I also ask that people involved with the project make
- their preferences known. In particular, if you trust me to make that
+ *their* preferences known. In particular, if you trust me to make that
  decision, you might note so in your copyright message, ie something
  like

Stage this hunk [y,n,q,a,d,/,s,e,?]?
-- 8< --

you will see 's' offered as a valid choice.




Re: What's cooking in git.git (Jul 2017, #04; Thu, 13)

2017-07-13 Thread Santiago Torres
> Here are the topics that have been cooking.  

Hi, 

I sent a patch (almost a week ago) that would probably[1] be labeled
as "uninteresting" (as per the notes from the maintainer), but I wanted
to make sure it wasn't lost in the noise -- I see that there's a lot of
active development lately. I checked the latest iterations of "what's
cooking" to see if it was going to be discarded or so, but I see no
mention of it.

Thanks!
-Santiago

[1] 
https://public-inbox.org/git/20170707220729.a3xrsju3rf4guyzs@LykOS.localdomain/T/#t


signature.asc
Description: PGP signature


Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 1:35 PM, Jeff King  wrote:
> On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote:
>
> I agree that a full binary search of a reftable is harder because of the
> prefix compression (it may still be possible by scanning backwards, but
> I think there are ambiguities when you land in the middle of a record,
> since there's no unambiguous end-of-record character).

It's impossible to safely binary search this reftable format using a
naive "divide the byte count in half and find a record boundary"
approach. I actually did design an earlier version of reftable that
was safe to binary search this way within blocks, and wound up
discarding it. It was a slower and more complex implementation than
the format I shared with the list.


> But I don't think
> it matters. If you binary-search to a constant-sized block, then a
> linear scan of the block is acceptable.

Depends on the block size. :)
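
To illustrate what that per-block scan costs, here is a toy sketch
(my illustration for this thread, NOT the actual reftable encoding):
each record stores only the suffix it does not share with the
previous record's name, so a reader cannot start at an arbitrary
byte offset; it has to walk from the block head, and that walk grows
with the block size.

	#include <stdint.h>
	#include <string.h>

	/*
	 * Toy prefix-compressed block: each record is
	 * <prefix_len><suffix_len><suffix bytes>, where prefix_len
	 * counts bytes shared with the previous record's name.
	 * Names are assumed shorter than 256 bytes.
	 */
	static int scan_block(const uint8_t *p, const uint8_t *end,
			      const char *want)
	{
		char name[256];

		while (p + 2 <= end) {
			uint8_t plen = p[0], slen = p[1];
			if (p + 2 + slen > end)
				break;	/* truncated record */
			/* splice this suffix onto the shared prefix */
			memcpy(name + plen, p + 2, slen);
			name[plen + slen] = '\0';
			if (!strcmp(name, want))
				return 1;
			p += 2 + slen;
		}
		return 0;
	}

Landing in the middle of such a block hands you a prefix_len that
refers to a name you have not reconstructed yet, which is exactly
the ambiguity described above.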


> Not that I'm recommending just gzipping the whole packed-refs file. It
> ruins the fast-lookup.

As I just mentioned elsewhere in the thread:

  src file   65306185
  gzip       28338906
  reftable   28782292

The reftable format (for 64k block, 256 restart) is within spitting
distance (432 KiB) of a default-level gzip of packed-refs. We can
get fast lookup and OK compression.


> We _could_ consider gzipping individual blocks of
> a reftable (or any structure that allows you to search to a
> constant-sized block and do a linear search from there). But given that
> they're in the same ballpark, I'm happy with whatever ends up the
> simplest to code and debug. ;)

This does help to shrink the file, e.g. it drops from 28M to 23M.

It makes it more CPU costly to access a block, as we have to inflate
it to walk through the records. It also messes with alignment: when
you touch a block, it may straddle two virtual memory pages in your
kernel/filesystem.

I'm not sure those penalties are worth the additional 16% reduction in size.


>> When Shawn presented the proposal, a couple of colleagues here
>> were as excited as I was, but the daring question is, why Shawn
>> did not give the whole thing in BNF format from top down:
>>
>>   initial-block
>>   content-blocks*
>>   (index-block)
>>   footer
>
> Yeah, I agree it took me a bit to figure out what was going on. A
> high-level overview of the format would have been nice.

Noted, I've added this to my writeup.


Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King  wrote:
> On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote:
>
>> ### Problem statement
>>
>> Some repositories contain a lot of references (e.g.  android at 866k,
>> rails at 31k).  The existing packed-refs format takes up a lot of
>> space (e.g.  62M), and does not scale with additional references.
>> Lookup of a single reference requires linearly scanning the file.
>
> I think the linear scan is actually an implementation short-coming. Even
> though the records aren't fixed-length, the fact that newlines can only
> appear as end-of-record is sufficient to mmap and binary search a
> packed-refs file (you just have to backtrack a little when you land in
> the middle of a record).
>
> I wrote a proof of concept a while ago, but got stuck on integrating it
> into the ref code, because of some of the assumptions that it made.
> Michael Haggerty has been doing several rounds of refactors to remove
> those assumptions. I think we're pretty close (I've actually seen the
> endgame where packed-refs is fully binary searched, but I think there
> are a few more cleanups necessary to cover all cases).

You are correct, this is possible with the current packed-refs format.
It just hasn't materialized in a shipping implementation yet.
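
For readers following along, a minimal sketch of that idea (my
illustration, not the proof-of-concept code): probe the middle of
the mmap'ed buffer, back up to the previous LF to find a record
boundary, compare refnames, and narrow the window.  It assumes a
well-formed buffer of plain "<40-hex-sha1> SP <refname> LF" records
and ignores the "# pack-refs" header and peeled "^" lines that a
real implementation must handle.

	#include <stddef.h>
	#include <string.h>

	static const char *find_ref(const char *buf, size_t len,
				    const char *refname)
	{
		size_t rlen = strlen(refname);
		size_t lo = 0, hi = len; /* always record boundaries */

		while (lo < hi) {
			size_t pos = lo + (hi - lo) / 2;
			const char *name, *eol;
			size_t nlen;
			int cmp;

			/* back up to the start of the probed record */
			while (pos > lo && buf[pos - 1] != '\n')
				pos--;
			name = buf + pos + 41; /* skip "<sha1> " */
			eol = memchr(name, '\n', buf + len - name);
			nlen = eol - name;

			cmp = memcmp(refname, name,
				     rlen < nlen ? rlen : nlen);
			if (!cmp)
				cmp = (rlen > nlen) - (rlen < nlen);
			if (!cmp)
				return buf + pos; /* found the record */
			if (cmp < 0)
				hi = pos;         /* left half */
			else
				lo = eol + 1 - buf; /* after this record */
		}
		return NULL;
	}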


>> Atomic pushes modifying multiple references require copying the
>> entire packed-refs file, which can be a considerable amount of data
>> moved (e.g. 62M in, 62M out) for even small transactions (2 refs
>> modified).
>
> I think your definition of atomic here doesn't match what git.git does.

:-(


> Our atomic push just takes the lock on all of the refs, and then once it
> has all of them, commits all of the locks. So it's atomic in the sense
> that you either get all or none of the writes (modulo a commit failure
> in the middle, which we naturally have no rollback plan for). But it can
> be done without touching the packed-refs file at all.
>
> I imagine that you're looking at atomicity from the perspective of a
> reader. In the git.git scheme, the reader may see a half-committed
> transaction. If you dispense with loose refs entirely and treat the
> packed-refs file as a single poorly-implemented key/value database, then
> you get reader atomicity (but O(size_of_database) write performance).

Yes, I was hoping for reader atomicity. But I may be OK forgoing
that if the transaction either all goes through, or all fails. A
transaction left partially stuck because the process died in the
middle of the commit step creates a mess for an administrator to
undo. Does she rename "foo.lock" to "foo"? Or delete "foo.lock"?


>> Repositories with many loose references occupy a large number of disk
>> blocks from the local file system, as each reference is its own file
>> storing 41 bytes.  This negatively affects the number of inodes
>> available when a large number of repositories are stored on the same
>> filesystem.  Readers are also penalized due to the larger number of
>> syscalls required to traverse and read the `$GIT_DIR/refs` directory.
>
> In my experience, the syscalls involved in loose refs aren't usually a
> big deal. If you have 800k refs, they're not all changing constantly. So
> a single pack-refs "fixes" performance going forward. What _is_ a big
> deal is that the packing process is complicated, readers have a very
> un-atomic view because of the myriad of files involved, and you get
> annoying lock contention during packing, as well as between deletions
> that have to rewrite packed-refs.
>
> But I'm not sure if you meant to contrast here a system where we didn't
> use packed-refs at all (though of course such a system is very much not
> atomic by the definition above).

No, I really did mean the current system. Gerrit Code Review servers
create a lot of references throughout the day. It's easy to
accumulate a few thousand new loose references in a 24-hour period.
Even if you have 600k existing refs in packed-refs, you still have
2k new/modified refs since the last nightly cron ran git gc.


>> ### Objectives
>>
>> - Near constant time lookup for any single reference, even when the
>>   repository is cold and not in process or kernel cache.
>
> Good goal, though TBH I'd be happy with O(log n).
>
> A related one is being able to traverse a subset of refs in
> O(nr_traversed). E.g., "git tag -l" should not have to do work
> proportional to what is in refs/changes. That falls out of most
> proposals that allow fast lookups, but notably not a straight
> hash-table.

Thanks, I missed that in this list, even though it was an explicit
objective going into this work. I added:

- Efficient lookup of an entire namespace, such as `refs/tags/`.
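
To make that concrete, a sketch of why any sorted layout gives this
for free while a straight hash table does not (my illustration over
a sorted string array, not reftable code): lower-bound to the first
name >= the prefix, then walk forward only while the prefix still
matches, so the work is O(log n + refs actually yielded).

	#include <string.h>

	static void for_each_in_namespace(const char **refs, size_t n,
					  const char *prefix,
					  void (*fn)(const char *))
	{
		size_t plen = strlen(prefix);
		size_t lo = 0, hi = n;

		while (lo < hi) { /* lower_bound(prefix) */
			size_t mid = lo + (hi - lo) / 2;
			if (strcmp(refs[mid], prefix) < 0)
				lo = mid + 1;
			else
				hi = mid;
		}
		for (; lo < n && !strncmp(refs[lo], prefix, plen); lo++)
			fn(refs[lo]); /* only matching refs visited */
	}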


>> - Occupy less disk space for large repositories.
>
> Good goal.  Just to play devil's advocate, the simplest way to do that
> with the current code would be to gzip packed-refs (and/or store sha1s
> as binary). That works against the "mmap and binary search" plan,
> though. :)

Yes 

[PATCH v2 03/13] submodule: convert submodule config lookup to use object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/grep.c   |  4 ++--
 builtin/submodule--helper.c  |  8 
 config.c | 12 ++--
 config.h |  4 ++--
 repository.c |  2 +-
 submodule-config.c   | 38 +++---
 submodule-config.h   | 12 ++--
 submodule.c  | 32 
 submodule.h  |  2 +-
 t/helper/test-submodule-config.c | 10 +-
 10 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index fa351c49f4..89fcb5b337 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -653,7 +653,7 @@ static int grep_submodule(struct grep_opt *opt, const 
struct object_id *oid,
 */
if (oid) {
const struct submodule *sub =
-   submodule_from_path(null_sha1, path);
+   submodule_from_path(&null_oid, path);
if (sub)
path = git_path("modules/%s", sub->name);
 
@@ -862,7 +862,7 @@ static int grep_objects(struct grep_opt *opt, const struct 
pathspec *pathspec,
/* load the gitmodules file for this rev */
if (recurse_submodules) {
submodule_free();
-   gitmodules_config_sha1(real_obj->oid.hash);
+   gitmodules_config_oid(&real_obj->oid);
}
if (grep_object(opt, pathspec, real_obj, list->objects[i].name, 
list->objects[i].path)) {
hit = 1;
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6abdad3294..af871f14e7 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -350,7 +350,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
} else
displaypath = xstrdup(path);
 
-   sub = submodule_from_path(null_sha1, path);
+   sub = submodule_from_path(&null_oid, path);
 
if (!sub)
die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -476,7 +476,7 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
usage(_("git submodule--helper name "));
 
gitmodules_config();
-   sub = submodule_from_path(null_sha1, argv[1]);
+   sub = submodule_from_path(&null_oid, argv[1]);
 
if (!sub)
die(_("no submodule mapping found in .gitmodules for path 
'%s'"),
@@ -795,7 +795,7 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
goto cleanup;
}
 
-   sub = submodule_from_path(null_sha1, ce->name);
+   sub = submodule_from_path(&null_oid, ce->name);
 
if (suc->recursive_prefix)
displaypath = relative_path(suc->recursive_prefix,
@@ -1060,7 +1060,7 @@ static const char *remote_submodule_branch(const char 
*path)
gitmodules_config();
git_config(submodule_config, NULL);
 
-   sub = submodule_from_path(null_sha1, path);
+   sub = submodule_from_path(&null_oid, path);
if (!sub)
return NULL;
 
diff --git a/config.c b/config.c
index a9356c1383..014151d032 100644
--- a/config.c
+++ b/config.c
@@ -1460,9 +1460,9 @@ int git_config_from_mem(config_fn_t fn, const enum 
config_origin_type origin_typ
return do_config_from(&top, fn, data);
 }
 
-int git_config_from_blob_sha1(config_fn_t fn,
+int git_config_from_blob_oid(config_fn_t fn,
  const char *name,
- const unsigned char *sha1,
+ const struct object_id *oid,
  void *data)
 {
enum object_type type;
@@ -1470,7 +1470,7 @@ int git_config_from_blob_sha1(config_fn_t fn,
unsigned long size;
int ret;
 
-   buf = read_sha1_file(sha1, &type, &size);
+   buf = read_sha1_file(oid->hash, &type, &size);
if (!buf)
return error("unable to load config blob object '%s'", name);
if (type != OBJ_BLOB) {
@@ -1488,11 +1488,11 @@ static int git_config_from_blob_ref(config_fn_t fn,
const char *name,
void *data)
 {
-   unsigned char sha1[20];
+   struct object_id oid;
 
-   if (get_sha1(name, sha1) < 0)
+   if (get_oid(name, &oid) < 0)
return error("unable to resolve config blob '%s'", name);
-   return git_config_from_blob_sha1(fn, name, sha1, data);
+   return git_config_from_blob_oid(fn, name, &oid, data);
 }
 
 const char *git_etc_gitconfig(void)
diff --git a/config.h b/config.h
index 0352da117b..827f2b0e4a 100644
--- a/config.h
+++ b/config.h
@@ -39,8 +39,8 @@ extern int git_default_config(const char *, const char *, 
void *);
 extern int 

[PATCH v2 00/13] object_id part 9

2017-07-13 Thread brian m. carlson
This is the ninth in a series of series to convert Git to use struct
object_id.  This series converts the remaining callers of get_sha1 and
friends to take and use struct object_id, and in doing so, renames them
to get_oid and friends.

This series should probably be based on Stefan Beller's series, in which
case patch 9 can be dropped.

Changes from v1:
* Restore the check for length in get_sha1_basic.
* Add a patch converting some uses of 40 to GIT_SHA1_HEXSZ as suggested.  This
  is a separate patch because I wanted to minimize the non-automated portions of
  the patch in question.

brian m. carlson (13):
  builtin/fsck: convert remaining caller of get_sha1 to object_id
  builtin/merge-tree: convert remaining caller of get_sha1 to object_id
  submodule: convert submodule config lookup to use object_id
  remote: convert struct push_cas to struct object_id
  sequencer: convert to struct object_id
  builtin/update_ref: convert to struct object_id
  bisect: convert bisect_checkout to struct object_id
  builtin/unpack-file: convert to struct object_id
  builtin/verify-tag: convert to struct object_id
  Convert remaining callers of get_sha1 to get_oid.
  sha1_name: convert get_sha1* to get_oid*
  sha1_name: convert GET_SHA1* flags to GET_OID*
  sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ

 apply.c  |   4 +-
 archive.c|   2 +-
 bisect.c |  18 +--
 builtin/am.c |   6 +-
 builtin/cat-file.c   |   8 +-
 builtin/commit-tree.c|   4 +-
 builtin/commit.c |   8 +-
 builtin/fsck.c   |   8 +-
 builtin/grep.c   |   8 +-
 builtin/log.c|   4 +-
 builtin/merge-tree.c |   6 +-
 builtin/receive-pack.c   |   4 +-
 builtin/replace.c|   4 +-
 builtin/reset.c  |  10 +-
 builtin/rev-parse.c  |   8 +-
 builtin/show-branch.c|   8 +-
 builtin/submodule--helper.c  |   8 +-
 builtin/unpack-file.c|  12 +-
 builtin/update-ref.c |  69 ++-
 builtin/verify-tag.c |   8 +-
 cache.h  |  45 
 commit.c |   4 +-
 config.c |  12 +-
 config.h |   4 +-
 mailmap.c|   6 +-
 notes.c  |   2 +-
 refs.c   |   2 +-
 remote.c |   8 +-
 remote.h |   2 +-
 repository.c |   2 +-
 revision.c   |  16 +--
 sequencer.c  |  65 +--
 sha1_name.c  | 240 +++
 submodule-config.c   |  38 +++
 submodule-config.h   |  12 +-
 submodule.c  |  32 +++---
 submodule.h  |   2 +-
 t/helper/test-submodule-config.c |  10 +-
 transport-helper.c   |   2 +-
 39 files changed, 351 insertions(+), 360 deletions(-)



[PATCH v2 12/13] sha1_name: convert GET_SHA1* flags to GET_OID*

2017-07-13 Thread brian m. carlson
Convert the flags for get_oid_with_context and friends to use "OID"
instead of "SHA1" in their names.

This transform was made by running the following one-liner on the
affected files:

  perl -pi -e 's/GET_SHA1/GET_OID/g'

Signed-off-by: brian m. carlson 
---
 builtin/cat-file.c  |  4 ++--
 builtin/grep.c  |  2 +-
 builtin/log.c   |  2 +-
 builtin/rev-parse.c |  2 +-
 cache.h | 28 +++
 refs.c  |  2 +-
 revision.c  |  6 ++---
 sha1_name.c | 64 ++---
 8 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1f035fb550..62c8cf0ebf 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -63,7 +63,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
if (unknown_type)
flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
-   if (get_oid_with_context(obj_name, GET_SHA1_RECORD_PATH,
+   if (get_oid_with_context(obj_name, GET_OID_RECORD_PATH,
 &oid, &obj_context))
die("Not a valid object name %s", obj_name);
 
@@ -361,7 +361,7 @@ static void batch_one_object(const char *obj_name, struct 
batch_options *opt,
 struct expand_data *data)
 {
struct object_context ctx;
-   int flags = opt->follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0;
+   int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
enum follow_symlinks_result result;
 
result = get_oid_with_context(obj_name, flags, &data->oid, &ctx);
diff --git a/builtin/grep.c b/builtin/grep.c
index 8943626e6c..1618087a60 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1207,7 +1207,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
break;
}
 
-   if (get_oid_with_context(arg, GET_SHA1_RECORD_PATH,
+   if (get_oid_with_context(arg, GET_OID_RECORD_PATH,
 &oid, &oc)) {
if (seen_dashdash)
die(_("unable to resolve revision: %s"), arg);
diff --git a/builtin/log.c b/builtin/log.c
index 60fac9d4dc..3dbdf8cadd 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -484,7 +484,7 @@ static int show_blob_object(const struct object_id *oid, 
struct rev_info *rev, c
!DIFF_OPT_TST(&rev->diffopt, ALLOW_TEXTCONV))
return stream_blob_to_fd(1, oid, NULL, 0);
 
-   if (get_oid_with_context(obj_name, GET_SHA1_RECORD_PATH,
+   if (get_oid_with_context(obj_name, GET_OID_RECORD_PATH,
 &oidc, &obj_context))
die(_("Not a valid object name %s"), obj_name);
if (!obj_context.path ||
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 041b7898c4..2bd28d3c08 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -702,7 +702,7 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
}
if (!strcmp(arg, "--quiet") || !strcmp(arg, "-q")) {
quiet = 1;
-   flags |= GET_SHA1_QUIETLY;
+   flags |= GET_OID_QUIETLY;
continue;
}
if (opt_with_value(arg, "--short", &arg)) {
diff --git a/cache.h b/cache.h
index da7ac495ba..cfc3698eba 100644
--- a/cache.h
+++ b/cache.h
@@ -1283,27 +1283,27 @@ struct object_context {
 */
struct strbuf symlink_path;
/*
-* If GET_SHA1_RECORD_PATH is set, this will record path (if any)
+* If GET_OID_RECORD_PATH is set, this will record path (if any)
 * found when resolving the name. The caller is responsible for
 * releasing the memory.
 */
char *path;
 };
 
-#define GET_SHA1_QUIETLY   01
-#define GET_SHA1_COMMIT02
-#define GET_SHA1_COMMITTISH04
-#define GET_SHA1_TREE 010
-#define GET_SHA1_TREEISH  020
-#define GET_SHA1_BLOB 040
-#define GET_SHA1_FOLLOW_SYMLINKS 0100
-#define GET_SHA1_RECORD_PATH 0200
-#define GET_SHA1_ONLY_TO_DIE04000
+#define GET_OID_QUIETLY   01
+#define GET_OID_COMMIT02
+#define GET_OID_COMMITTISH04
+#define GET_OID_TREE 010
+#define GET_OID_TREEISH  020
+#define GET_OID_BLOB 040
+#define GET_OID_FOLLOW_SYMLINKS 0100
+#define GET_OID_RECORD_PATH 0200
+#define GET_OID_ONLY_TO_DIE04000
 
-#define GET_SHA1_DISAMBIGUATORS \
-   (GET_SHA1_COMMIT | GET_SHA1_COMMITTISH | \
-   GET_SHA1_TREE | GET_SHA1_TREEISH | \
-   GET_SHA1_BLOB)
+#define GET_OID_DISAMBIGUATORS \
+   (GET_OID_COMMIT | GET_OID_COMMITTISH | \
+   GET_OID_TREE | GET_OID_TREEISH | \
+   GET_OID_BLOB)
 
 extern int get_oid(const char *str, struct 

[PATCH v2 04/13] remote: convert struct push_cas to struct object_id

2017-07-13 Thread brian m. carlson
This gets rid of one use of get_sha1.

Signed-off-by: brian m. carlson 
---
 remote.c | 6 +++---
 remote.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/remote.c b/remote.c
index d87482573d..9da9040bf0 100644
--- a/remote.c
+++ b/remote.c
@@ -2294,8 +2294,8 @@ static int parse_push_cas_option(struct push_cas_option 
*cas, const char *arg, i
if (!*colon)
entry->use_tracking = 1;
else if (!colon[1])
-   hashclr(entry->expect);
-   else if (get_sha1(colon + 1, entry->expect))
+   oidclr(&entry->expect);
+   else if (get_oid(colon + 1, &entry->expect))
return error("cannot parse expected object name '%s'", colon + 
1);
return 0;
 }
@@ -2342,7 +2342,7 @@ static void apply_cas(struct push_cas_option *cas,
continue;
ref->expect_old_sha1 = 1;
if (!entry->use_tracking)
-   hashcpy(ref->old_oid_expect.hash, cas->entry[i].expect);
+   oidcpy(&ref->old_oid_expect, &entry->expect);
else if (remote_tracking(remote, ref->name, 
&ref->old_oid_expect))
oidclr(>old_oid_expect);
return;
diff --git a/remote.h b/remote.h
index 6c28cd3e4b..2ecf4c8c74 100644
--- a/remote.h
+++ b/remote.h
@@ -282,7 +282,7 @@ struct ref *get_stale_heads(struct refspec *refs, int 
ref_count, struct ref *fet
 struct push_cas_option {
unsigned use_tracking_for_rest:1;
struct push_cas {
-   unsigned char expect[20];
+   struct object_id expect;
unsigned use_tracking:1;
char *refname;
} *entry;


[PATCH v2 13/13] sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ

2017-07-13 Thread brian m. carlson
There are several uses of the constant 40 in find_unique_abbrev_r.
Convert them to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson 
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index b49303271e..8b7fd7e134 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -501,10 +501,10 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
*sha1, int len)
}
 
sha1_to_hex_r(hex, sha1);
-   if (len == 40 || !len)
-   return 40;
+   if (len == GIT_SHA1_HEXSZ || !len)
+   return GIT_SHA1_HEXSZ;
exists = has_sha1_file(sha1);
-   while (len < 40) {
+   while (len < GIT_SHA1_HEXSZ) {
struct object_id oid_ret;
status = get_short_oid(hex, len, &oid_ret, GET_OID_QUIETLY);
if (exists


[PATCH v2 02/13] builtin/merge-tree: convert remaining caller of get_sha1 to object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/merge-tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index bad6735c76..f12da292cf 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -347,12 +347,12 @@ static void merge_trees(struct tree_desc t[3], const char 
*base)
 
 static void *get_tree_descriptor(struct tree_desc *desc, const char *rev)
 {
-   unsigned char sha1[20];
+   struct object_id oid;
void *buf;
 
-   if (get_sha1(rev, sha1))
+   if (get_oid(rev, &oid))
die("unknown rev %s", rev);
-   buf = fill_tree_descriptor(desc, sha1);
+   buf = fill_tree_descriptor(desc, oid.hash);
if (!buf)
die("%s is not a tree", rev);
return buf;


[PATCH v2 11/13] sha1_name: convert get_sha1* to get_oid*

2017-07-13 Thread brian m. carlson
Now that all the callers of get_sha1 directly or indirectly use struct
object_id, rename the functions starting with get_sha1 to start with
get_oid.  Convert the internals in sha1_name.c to use struct object_id
as well, and eliminate explicit length checks where possible.  Convert a
use of 40 in get_oid_basic to GIT_SHA1_HEXSZ.

Outside of sha1_name.c and cache.h, this transition was made with the
following semantic patch:

@@
expression E1, E2;
@@
- get_sha1(E1, E2.hash)
+ get_oid(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1(E1, E2->hash)
+ get_oid(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2.hash)
+ get_oid_committish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2->hash)
+ get_oid_committish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2.hash)
+ get_oid_treeish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2->hash)
+ get_oid_treeish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2.hash)
+ get_oid_commit(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2->hash)
+ get_oid_commit(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2.hash)
+ get_oid_tree(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2->hash)
+ get_oid_tree(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2.hash)
+ get_oid_blob(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2->hash)
+ get_oid_blob(E1, E2)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3.hash, E4)
+ get_oid_with_context(E1, E2, &E3, E4)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3->hash, E4)
+ get_oid_with_context(E1, E2, E3, E4)

Signed-off-by: brian m. carlson 
---
 apply.c   |   4 +-
 archive.c |   2 +-
 builtin/am.c  |   6 +-
 builtin/cat-file.c|   6 +-
 builtin/commit-tree.c |   4 +-
 builtin/commit.c  |   8 +--
 builtin/grep.c|   4 +-
 builtin/log.c |   4 +-
 builtin/replace.c |   4 +-
 builtin/reset.c   |  10 +--
 builtin/rev-parse.c   |   6 +-
 builtin/show-branch.c |   8 +--
 cache.h   |  17 +++--
 commit.c  |   4 +-
 notes.c   |   2 +-
 remote.c  |   2 +-
 revision.c|  10 +--
 sequencer.c   |   6 +-
 sha1_name.c   | 190 --
 transport-helper.c|   2 +-
 20 files changed, 145 insertions(+), 154 deletions(-)

diff --git a/apply.c b/apply.c
index 4050cebcfa..1743fc4a49 100644
--- a/apply.c
+++ b/apply.c
@@ -3552,7 +3552,7 @@ static int try_threeway(struct apply_state *state,
/* Preimage the patch was prepared for */
if (patch->is_new)
write_sha1_file("", 0, blob_type, pre_oid.hash);
-   else if (get_sha1(patch->old_sha1_prefix, pre_oid.hash) ||
+   else if (get_oid(patch->old_sha1_prefix, &pre_oid) ||
 read_blob_object(&buf, &pre_oid, patch->old_mode))
return error(_("repository lacks the necessary blob to fall 
back on 3-way merge."));
 
@@ -4076,7 +4076,7 @@ static int build_fake_ancestor(struct apply_state *state, 
struct patch *list)
else
return error(_("sha1 information is lacking or "
   "useless for submodule %s"), 
name);
-   } else if (!get_sha1_blob(patch->old_sha1_prefix, oid.hash)) {
+   } else if (!get_oid_blob(patch->old_sha1_prefix, &oid)) {
; /* ok */
} else if (!patch->lines_added && !patch->lines_deleted) {
/* mode-only change: update the current */
diff --git a/archive.c b/archive.c
index 60b3035a7a..557dd2db85 100644
--- a/archive.c
+++ b/archive.c
@@ -358,7 +358,7 @@ static void parse_treeish_arg(const char **argv,
free(ref);
}
 
-   if (get_sha1(name, oid.hash))
+   if (get_oid(name, &oid))
die("Not a valid object name");
 
commit = lookup_commit_reference_gently(&oid, 1);
diff --git a/builtin/am.c b/builtin/am.c
index c973bd96dc..40cc6d6fe8 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1131,7 +1131,7 @@ static int index_has_changes(struct strbuf *sb)
struct object_id head;
int i;
 
-   if (!get_sha1_tree("HEAD", head.hash)) {
+   if (!get_oid_tree("HEAD", )) {
struct diff_options opt;
 
diff_setup(&opt);
@@ -1432,7 +1432,7 @@ static void write_index_patch(const struct am_state 
*state)
struct rev_info rev_info;
FILE *fp;
 
-   if (!get_sha1_tree("HEAD", head.hash))
+   if (!get_oid_tree("HEAD", ))
tree = lookup_tree();
else
tree = lookup_tree(_tree_oid);
@@ -1661,7 +1661,7 @@ static void do_commit(const struct am_state *state)
if (write_cache_as_tree(tree.hash, 0, NULL))
die(_("git write-tree failed to write a tree"));
 
-   if (!get_sha1_commit("HEAD", parent.hash)) {
+ 

[PATCH v2 06/13] builtin/update_ref: convert to struct object_id

2017-07-13 Thread brian m. carlson
Convert the uses of unsigned char * to struct object_id.

Signed-off-by: brian m. carlson 
---
 builtin/update-ref.c | 69 ++--
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 40ccfc193b..6b90c5dead 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -94,10 +94,10 @@ static char *parse_refname(struct strbuf *input, const char 
**next)
  * provided but cannot be converted to a SHA-1, die.  flags can
  * include PARSE_SHA1_OLD and/or PARSE_SHA1_ALLOW_EMPTY.
  */
-static int parse_next_sha1(struct strbuf *input, const char **next,
-  unsigned char *sha1,
-  const char *command, const char *refname,
-  int flags)
+static int parse_next_oid(struct strbuf *input, const char **next,
+ struct object_id *oid,
+ const char *command, const char *refname,
+ int flags)
 {
struct strbuf arg = STRBUF_INIT;
int ret = 0;
@@ -115,11 +115,11 @@ static int parse_next_sha1(struct strbuf *input, const 
char **next,
(*next)++;
*next = parse_arg(*next, &arg);
if (arg.len) {
-   if (get_sha1(arg.buf, sha1))
+   if (get_oid(arg.buf, oid))
goto invalid;
} else {
/* Without -z, an empty value means all zeros: */
-   hashclr(sha1);
+   oidclr(oid);
}
} else {
/* With -z, read the next NUL-terminated line */
@@ -133,13 +133,13 @@ static int parse_next_sha1(struct strbuf *input, const 
char **next,
*next += arg.len;
 
if (arg.len) {
-   if (get_sha1(arg.buf, sha1))
+   if (get_oid(arg.buf, oid))
goto invalid;
} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
/* With -z, treat an empty value as all zeros: */
warning("%s %s: missing , treating as zero",
command, refname);
-   hashclr(sha1);
+   oidclr(oid);
} else {
/*
 * With -z, an empty non-required value means
@@ -182,26 +182,25 @@ static const char *parse_cmd_update(struct 
ref_transaction *transaction,
 {
struct strbuf err = STRBUF_INIT;
char *refname;
-   unsigned char new_sha1[20];
-   unsigned char old_sha1[20];
+   struct object_id new_oid, old_oid;
int have_old;
 
refname = parse_refname(input, &next);
if (!refname)
die("update: missing <ref>");
 
-   if (parse_next_sha1(input, &next, new_sha1, "update", refname,
-   PARSE_SHA1_ALLOW_EMPTY))
+   if (parse_next_oid(input, &next, &new_oid, "update", refname,
+  PARSE_SHA1_ALLOW_EMPTY))
die("update %s: missing <newvalue>", refname);
 
-   have_old = !parse_next_sha1(input, &next, old_sha1, "update", refname,
-   PARSE_SHA1_OLD);
+   have_old = !parse_next_oid(input, &next, &old_oid, "update", refname,
+  PARSE_SHA1_OLD);
 
if (*next != line_termination)
die("update %s: extra input: %s", refname, next);
 
if (ref_transaction_update(transaction, refname,
-  new_sha1, have_old ? old_sha1 : NULL,
+  new_oid.hash, have_old ? old_oid.hash : NULL,
   update_flags | create_reflog_flag,
  msg, &err))
die("%s", err.buf);
@@ -218,22 +217,22 @@ static const char *parse_cmd_create(struct 
ref_transaction *transaction,
 {
struct strbuf err = STRBUF_INIT;
char *refname;
-   unsigned char new_sha1[20];
+   struct object_id new_oid;
 
refname = parse_refname(input, &next);
if (!refname)
die("create: missing <ref>");
 
-   if (parse_next_sha1(input, &next, new_sha1, "create", refname, 0))
+   if (parse_next_oid(input, &next, &new_oid, "create", refname, 0))
die("create %s: missing <newvalue>", refname);
 
-   if (is_null_sha1(new_sha1))
+   if (is_null_oid(&new_oid))
die("create %s: zero <newvalue>", refname);
 
if (*next != line_termination)
die("create %s: extra input: %s", refname, next);
 
-   if (ref_transaction_create(transaction, refname, new_sha1,
+   if (ref_transaction_create(transaction, refname, new_oid.hash,
   update_flags | create_reflog_flag,
   msg, ))
die("%s", err.buf);
@@ -250,18 +249,18 @@ static 

[PATCH v2 08/13] builtin/unpack-file: convert to struct object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/unpack-file.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/unpack-file.c b/builtin/unpack-file.c
index 73f1334191..281ca1db6c 100644
--- a/builtin/unpack-file.c
+++ b/builtin/unpack-file.c
@@ -1,7 +1,7 @@
 #include "builtin.h"
 #include "config.h"
 
-static char *create_temp_file(unsigned char *sha1)
+static char *create_temp_file(struct object_id *oid)
 {
static char path[50];
void *buf;
@@ -9,9 +9,9 @@ static char *create_temp_file(unsigned char *sha1)
unsigned long size;
int fd;
 
-   buf = read_sha1_file(sha1, &type, &size);
+   buf = read_sha1_file(oid->hash, &type, &size);
if (!buf || type != OBJ_BLOB)
-   die("unable to read blob object %s", sha1_to_hex(sha1));
+   die("unable to read blob object %s", oid_to_hex(oid));
 
xsnprintf(path, sizeof(path), ".merge_file_XXXXXX");
fd = xmkstemp(path);
@@ -23,15 +23,15 @@ static char *create_temp_file(unsigned char *sha1)
 
 int cmd_unpack_file(int argc, const char **argv, const char *prefix)
 {
-   unsigned char sha1[20];
+   struct object_id oid;
 
if (argc != 2 || !strcmp(argv[1], "-h"))
usage("git unpack-file ");
-   if (get_sha1(argv[1], sha1))
+   if (get_oid(argv[1], ))
die("Not a valid object name %s", argv[1]);
 
git_config(git_default_config, NULL);
 
-   puts(create_temp_file(sha1));
+   puts(create_temp_file(&oid));
return 0;
 }


[PATCH v2 09/13] builtin/verify-tag: convert to struct object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/verify-tag.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index f9a5f7535a..30e4c826ed 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -56,20 +56,20 @@ int cmd_verify_tag(int argc, const char **argv, const char 
*prefix)
}
 
while (i < argc) {
-   unsigned char sha1[20];
+   struct object_id oid;
const char *name = argv[i++];
-   if (get_sha1(name, sha1)) {
+   if (get_oid(name, &oid)) {
had_error = !!error("tag '%s' not found.", name);
continue;
}
 
-   if (gpg_verify_tag(sha1, name, flags)) {
+   if (gpg_verify_tag(oid.hash, name, flags)) {
had_error = 1;
continue;
}
 
if (fmt_pretty)
-   pretty_print_ref(name, sha1, fmt_pretty);
+   pretty_print_ref(name, oid.hash, fmt_pretty);
}
return had_error;
 }


[PATCH v2 01/13] builtin/fsck: convert remaining caller of get_sha1 to object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/fsck.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf6..0e5a18e843 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -738,12 +738,12 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
heads = 0;
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
-   unsigned char sha1[20];
-   if (!get_sha1(arg, sha1)) {
-   struct object *obj = lookup_object(sha1);
+   struct object_id oid;
+   if (!get_oid(arg, &oid)) {
+   struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
-   error("%s: object missing", sha1_to_hex(sha1));
+   error("%s: object missing", oid_to_hex());
errors_found |= ERROR_OBJECT;
continue;
}


[PATCH v2 07/13] bisect: convert bisect_checkout to struct object_id

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 bisect.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index a9fd9fbc61..2549eaf7b1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -680,16 +680,16 @@ static int is_expected_rev(const struct object_id *oid)
return res;
 }
 
-static int bisect_checkout(const unsigned char *bisect_rev, int no_checkout)
+static int bisect_checkout(const struct object_id *bisect_rev, int no_checkout)
 {
char bisect_rev_hex[GIT_MAX_HEXSZ + 1];
 
-   memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
-   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev, NULL, 0, 
UPDATE_REFS_DIE_ON_ERR);
+   memcpy(bisect_rev_hex, oid_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
+   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev->hash, NULL, 0, 
UPDATE_REFS_DIE_ON_ERR);
 
argv_checkout[2] = bisect_rev_hex;
if (no_checkout) {
-   update_ref(NULL, "BISECT_HEAD", bisect_rev, NULL, 0, 
UPDATE_REFS_DIE_ON_ERR);
+   update_ref(NULL, "BISECT_HEAD", bisect_rev->hash, NULL, 0, 
UPDATE_REFS_DIE_ON_ERR);
} else {
int res;
res = run_command_v_opt(argv_checkout, RUN_GIT_CMD);
@@ -796,7 +796,7 @@ static void check_merge_bases(int no_checkout)
handle_skipped_merge_base(mb);
} else {
printf(_("Bisecting: a merge base must be tested\n"));
-   exit(bisect_checkout(mb->hash, no_checkout));
+   exit(bisect_checkout(mb, no_checkout));
}
}
 
@@ -939,7 +939,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
struct rev_info revs;
struct commit_list *tried;
int reaches = 0, all = 0, nr, steps;
-   const unsigned char *bisect_rev;
+   struct object_id *bisect_rev;
char *steps_msg;
 
read_bisect_terms(_bad, _good);
@@ -977,11 +977,11 @@ int bisect_next_all(const char *prefix, int no_checkout)
exit(4);
}
 
-   bisect_rev = revs.commits->item->object.oid.hash;
+   bisect_rev = &revs.commits->item->object.oid;
 
-   if (!hashcmp(bisect_rev, current_bad_oid->hash)) {
+   if (!oidcmp(bisect_rev, current_bad_oid)) {
exit_if_skipped_commits(tried, current_bad_oid);
-   printf("%s is the first %s commit\n", sha1_to_hex(bisect_rev),
+   printf("%s is the first %s commit\n", oid_to_hex(bisect_rev),
term_bad);
show_diff_tree(prefix, revs.commits->item);
/* This means the bisection process succeeded. */


[PATCH v2 10/13] Convert remaining callers of get_sha1 to get_oid.

2017-07-13 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/receive-pack.c | 4 ++--
 mailmap.c  | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 71c0c768db..1efa48fec4 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -919,9 +919,9 @@ static int update_shallow_ref(struct command *cmd, struct 
shallow_info *si)
  */
 static int head_has_history(void)
 {
-   unsigned char sha1[20];
+   struct object_id oid;
 
-   return !get_sha1("HEAD", sha1);
+   return !get_oid("HEAD", );
 }
 
 static const char *push_to_deploy(unsigned char *sha1,
diff --git a/mailmap.c b/mailmap.c
index c1a79c100c..cb921b4db6 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -214,17 +214,17 @@ static int read_mailmap_blob(struct string_list *map,
 const char *name,
 char **repo_abbrev)
 {
-   unsigned char sha1[20];
+   struct object_id oid;
char *buf;
unsigned long size;
enum object_type type;
 
if (!name)
return 0;
-   if (get_sha1(name, sha1) < 0)
+   if (get_oid(name, &oid) < 0)
return 0;
 
-   buf = read_sha1_file(sha1, &type, &size);
+   buf = read_sha1_file(oid.hash, &type, &size);
if (!buf)
return error("unable to read mailmap object at %s", name);
if (type != OBJ_BLOB)


[PATCH v2 05/13] sequencer: convert to struct object_id

2017-07-13 Thread brian m. carlson
Convert the remaining instances of unsigned char * to struct object_id.
This removes several calls to get_sha1.

Signed-off-by: brian m. carlson 
---
 sequencer.c | 59 ++-
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3010faf863..16d48a4fb3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -691,7 +691,7 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
 
 static int is_original_commit_empty(struct commit *commit)
 {
-   const unsigned char *ptree_sha1;
+   const struct object_id *ptree_oid;
 
if (parse_commit(commit))
return error(_("could not parse commit %s\n"),
@@ -701,12 +701,12 @@ static int is_original_commit_empty(struct commit *commit)
if (parse_commit(parent))
return error(_("could not parse parent commit %s\n"),
oid_to_hex(&parent->object.oid));
-   ptree_sha1 = parent->tree->object.oid.hash;
+   ptree_oid = &parent->tree->object.oid;
} else {
-   ptree_sha1 = EMPTY_TREE_SHA1_BIN; /* commit is root */
+   ptree_oid = &empty_tree_oid; /* commit is root */
}
 
-   return !hashcmp(ptree_sha1, commit->tree->object.oid.hash);
+   return !oidcmp(ptree_oid, &commit->tree->object.oid);
 }
 
 /*
@@ -896,18 +896,18 @@ static int update_squash_messages(enum todo_command 
command,
 
 static void flush_rewritten_pending(void) {
struct strbuf buf = STRBUF_INIT;
-   unsigned char newsha1[20];
+   struct object_id newoid;
FILE *out;
 
-   if (strbuf_read_file(&buf, rebase_path_rewritten_pending(), 82) > 0 &&
-   !get_sha1("HEAD", newsha1) &&
+   if (strbuf_read_file(&buf, rebase_path_rewritten_pending(), 
(GIT_MAX_HEXSZ + 1) * 2) > 0 &&
+   !get_oid("HEAD", &newoid) &&
(out = fopen_or_warn(rebase_path_rewritten_list(), "a"))) {
char *bol = buf.buf, *eol;
 
while (*bol) {
eol = strchrnul(bol, '\n');
fprintf(out, "%.*s %s\n", (int)(eol - bol),
-   bol, sha1_to_hex(newsha1));
+   bol, oid_to_hex(&newoid));
if (!*eol)
break;
bol = eol + 1;
@@ -1594,36 +1594,37 @@ static int rollback_is_safe(void)
return !oidcmp(&actual_head, &expected_head);
 }
 
-static int reset_for_rollback(const unsigned char *sha1)
+static int reset_for_rollback(const struct object_id *oid)
 {
const char *argv[4];/* reset --merge <arg> + NULL */
 
argv[0] = "reset";
argv[1] = "--merge";
-   argv[2] = sha1_to_hex(sha1);
+   argv[2] = oid_to_hex(oid);
argv[3] = NULL;
return run_command_v_opt(argv, RUN_GIT_CMD);
 }
 
 static int rollback_single_pick(void)
 {
-   unsigned char head_sha1[20];
+   struct object_id head_oid;
 
if (!file_exists(git_path_cherry_pick_head()) &&
!file_exists(git_path_revert_head()))
return error(_("no cherry-pick or revert in progress"));
-   if (read_ref_full("HEAD", 0, head_sha1, NULL))
+   if (read_ref_full("HEAD", 0, head_oid.hash, NULL))
return error(_("cannot resolve HEAD"));
-   if (is_null_sha1(head_sha1))
+   if (is_null_oid(&head_oid))
return error(_("cannot abort from a branch yet to be born"));
-   return reset_for_rollback(head_sha1);
+   return reset_for_rollback(&head_oid);
 }
 
 int sequencer_rollback(struct replay_opts *opts)
 {
FILE *f;
-   unsigned char sha1[20];
+   struct object_id oid;
struct strbuf buf = STRBUF_INIT;
+   const char *p;
 
f = fopen(git_path_head_file(), "r");
if (!f && errno == ENOENT) {
@@ -1643,12 +1644,12 @@ int sequencer_rollback(struct replay_opts *opts)
goto fail;
}
fclose(f);
-   if (get_sha1_hex(buf.buf, sha1) || buf.buf[40] != '\0') {
+   if (parse_oid_hex(buf.buf, &oid, &p) || *p != '\0') {
error(_("stored pre-cherry-pick HEAD file '%s' is corrupt"),
git_path_head_file());
goto fail;
}
-   if (is_null_sha1(sha1)) {
+   if (is_null_oid(&oid)) {
error(_("cannot abort from a branch yet to be born"));
goto fail;
}
@@ -1658,7 +1659,7 @@ int sequencer_rollback(struct replay_opts *opts)
warning(_("You seem to have moved HEAD. "
  "Not rewinding, check your HEAD!"));
} else
-   if (reset_for_rollback(sha1))
+   if (reset_for_rollback(&oid))
goto fail;
strbuf_release(&buf);
return sequencer_remove_state(opts);
@@ -1788,13 +1789,13 @@ static int make_patch(struct commit *commit, struct 
replay_opts *opts)
 
 static int 

Strange behavior with Git add -p in version 2.13.3

2017-07-13 Thread Ben Reed
Hello, I updated Git to 2.13.3, and now selecting 's' to split a
change on a call to `git add -p` is not working. It's showing the list
of options as if 's' is not a valid choice...

Particularly, I'm getting...
Stage this hunk [y,n,q,a,d,/,e,?]? s
y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk or any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk or any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

Is anyone else having this problem? Does anybody know how to resolve
it? I'm running on macOS Version 10.12.5.

Thanks in advance!

-Ben


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread brian m. carlson
On Thu, Jul 13, 2017 at 02:49:28PM -0700, Stefan Beller wrote:
> Snarkily I wanted to link to an essay "large patch series
> considered harmful"[1], but could not find any. So a couple
> of bullet points:
> (a) humans dislike larger series, hence fewer reviewers
> (b) the likelihood of a mistake in new code is proportional to its size
>   We can use the number of patches as a proxy for size
> (c) If a mistake is found, the whole series needs rerolling.
>   The effort of rerolling a series can be approximated with
>   its size as well.
> 
> From combining (b) and (c), we conclude that the effort to
> land a patch series is O(n^2) with n as the number of patches.
> Also from (a) we conclude that two smaller series containing
> the same output as one large series have better code quality.
> So with that, we conclude that all series shall be as small
> as possible.
> 
> So I'd ask to queue these 2 separately, asking Brian to drop
> "builtin/verify-tag: convert to struct object_id"
> 
> [1] https://www.cs.princeton.edu/~dpw/papers/hotos.pdf, 2005
> seems interesting to me in hindsight.
> 
> I can also send my patches to Brian, as you (both) like.

I'm literally about to send my series out; I rebased and tested it last
night.  I don't care whose patch gets applied, but to avoid needing to
rebase and retest my series, I'm going to send it out as it is.  Junio
can apply my series on top of yours, in which case he can drop the
relevant patch from my series.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[ANNOUNCE] Git v2.14.0-rc0

2017-07-13 Thread Junio C Hamano
An early preview release Git v2.14.0-rc0 is now available for
testing at the usual places.  It is comprised of 675 non-merge
commits since v2.13.0, contributed by 53 people, 14 of which are
new faces.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/testing/

The following public repositories all have a copy of the
'v2.14.0-rc0' tag and the 'master' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

New contributors whose contributions weren't in v2.13.0 are as follows.
Welcome to the Git development community!

  A. Wilcox, Ben Peart, Brian Malehorn, James Clarke, Jeff Smith,
  Kaartic Sivaraam, Liam Beguin, Phillip Wood, Rikard Falkeborn,
  Sahil Dua, Samuel Lijin, Stephen Kent, Tyler Brazier, and
  xiaoqiang zhao.

Returning contributors who helped this release are as follows.
Thanks for your continued support.

  Adam Dinwoodie, Ævar Arnfjörð Bjarmason, Alejandro
  R. Sedeño, Andreas Heiduk, Beat Bolli, Brandon Williams,
  brian m. carlson, Christian Couder, David Aguilar, David
  Turner, Dennis Kaarsemaker, Eric Wong, Jean-Noel Avila, Jeff
  Hostetler, Jeff King, Johannes Schindelin, Johannes Sixt,
  Jonathan Nieder, Jonathan Tan, Junio C Hamano, Kyle J. McKay,
  Kyle Meyer, Lars Schneider, Marc Branchaud, Michael Haggerty,
  Mike Hommey, Nguyễn Thái Ngọc Duy, Patrick Steinhardt,
  Prathamesh Chavan, Ralf Thielow, Ramsay Jones, René Scharfe,
  Stefan Beller, Štěpán Němec, Sven Strickroth, SZEDER Gábor,
  Thomas Gummerer, Torsten Bögershausen, and Ville Skyttä.



Git 2.14 Release Notes (draft)
==============================

Backward compatibility notes.

 * Use of an empty string as a pathspec element that is used for
   'everything matches' is still warned and Git asks users to use a
   more explicit '.' for that instead.  The hope is that existing
   users will not mind this change, and eventually the warning can be
   turned into a hard error, upgrading the deprecation into removal of
   this (mis)feature.  That is not scheduled to happen in the upcoming
   release (yet).

 * Git now avoids blindly falling back to ".git" when the setup
   sequence said we are _not_ in Git repository.  A corner case that
   happens to work right now may be broken by a call to die("BUG").
   We've tried hard to locate such cases and fixed them, but there
   might still be cases that need to be addressed--bug reports are
   greatly appreciated.

 * The experiment to improve the hunk-boundary selection of textual
   diff output has finished, and the "indent heuristics" has now
   become the default.


Updates since v2.13
-------------------

UI, Workflows & Features

 * The colors in which "git status --short --branch" showed the names
   of the current branch and its remote-tracking branch are now
   configurable.

 * "git clone" learned the "--no-tags" option not to fetch all tags
   initially, and also set up the tagopt not to follow any tags in
   subsequent fetches.

 * "git archive --format=zip" learned to use zip64 extension when
   necessary to go beyond the 4GB limit.

 * "git reset" learned "--recurse-submodules" option.

 * "git diff --submodule=diff" now recurses into nested submodules.

 * "git repack" learned to accept the --threads= option and pass it
   to pack-objects.

 * "git send-email" learned to run sendemail-validate hook to inspect
   and reject a message before sending it out.

 * There is no good reason why "git fetch $there $sha1" should fail
   when the $sha1 names an object at the tip of an advertised ref,
   even when the other side hasn't enabled allowTipSHA1InWant.

 * The recently introduced "[includeIf "gitdir:$dir"] path=..."
   mechanism has further been taught to take symlinks into account.
   The directory "$dir" specified in "gitdir:$dir" may be a symlink to
   a real location, not something that $(getcwd) may return.  In such
   a case, a realpath of "$dir" is compared with the real path of the
   current repository to determine if the contents from the named path
   should be included.

 * Make the "indent" heuristics the default in "diff" and diff.indentHeuristics
   configuration variable an escape hatch for those who do no want it.

 * Many commands learned to pay attention to submodule.recurse
   configuration.

 * The convention for a command line is to follow "git cmdname
   --options" with revisions followed by an optional "--"
   disambiguator and then finally pathspecs.  When "--" is not there,
   we make sure early ones are all interpretable as revs (and do not
   look like paths) and later ones are the other way around.  A
   pathspec with "magic" (e.g. ":/p/a/t/h" that matches p/a/t/h from
   the top-level of the working tree, no matter what subdirectory you
   are working from) are conservatively judged as "not a path", which
   required disambiguation more 

What's cooking in git.git (Jul 2017, #04; Thu, 13)

2017-07-13 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

A maintenance release for 2.13.x series, and the first preview for
2.14 series -rc0, have been tagged.  Let's close the 'master' except
for obvious fixes and clean-ups and concentrate on regressions from
now on.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--------------------------------------------------
[Graduated to "master"]

* ab/grep-lose-opt-regflags (2017-06-30) 6 commits
  (merged to 'next' on 2017-07-05 at 375c0b92ea)
 + grep: remove redundant REG_NEWLINE when compiling fixed regex
 + grep: remove regflags from the public grep_opt API
 + grep: remove redundant and verbose re-assignments to 0
 + grep: remove redundant "fixed" field re-assignment to 0
 + grep: adjust a redundant grep pattern type assignment
 + grep: remove redundant double assignment to 0

 Code cleanup.


* jk/build-with-asan (2017-07-10) 5 commits
  (merged to 'next' on 2017-07-10 at 49757e1119)
 + Makefile: disable unaligned loads with UBSan
 + Makefile: turn off -fomit-frame-pointer with sanitizers
 + Makefile: add helper for compiling with -fsanitize
 + test-lib: turn on ASan abort_on_error by default
 + test-lib: set ASAN_OPTIONS variable before we run git

 The build procedure has been improved to allow building and testing
 Git with address sanitizer more easily.
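
 A sketch of the intended usage (building on the SANITIZE make
 variable these patches extend):

     make SANITIZE=address
     make SANITIZE=address test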


* kn/ref-filter-branch-list (2017-07-10) 4 commits
  (merged to 'next' on 2017-07-10 at 35fd25c0dd)
 + ref-filter.c: drop return from void function
 + branch: set remote color in ref-filter branch immediately
 + branch: use BRANCH_COLOR_LOCAL in ref-filter format
 + branch: only perform HEAD check for local branches

 The rewrite of "git branch --list" using for-each-ref's internals
 that happened in v2.13 regressed its handling of color.branch.local;
 this has been fixed.


* ks/fix-rebase-doc-picture (2017-07-10) 1 commit
  (merged to 'next' on 2017-07-10 at 3acb856b17)
 + doc: correct a mistake in an illustration

 Doc update.


* rs/apply-avoid-over-reading (2017-07-09) 1 commit
  (merged to 'next' on 2017-07-10 at 2d8191ec3f)
 + apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

 Code cleanup.


* rs/urlmatch-cleanup (2017-07-09) 1 commit
  (merged to 'next' on 2017-07-10 at 2dd3e7cab0)
 + urlmatch: use hex2chr() in append_normalized_escapes()

 Code cleanup.


* rs/use-div-round-up (2017-07-10) 1 commit
  (merged to 'next' on 2017-07-10 at accb7919da)
 + use DIV_ROUND_UP

 Code cleanup.


* rs/wt-status-cleanup (2017-07-10) 1 commit
  (merged to 'next' on 2017-07-10 at d8939f683a)
 + wt-status: use separate variable for result of shorten_unambiguous_ref

 Code cleanup.


* sb/hashmap-customize-comparison (2017-06-30) 3 commits
  (merged to 'next' on 2017-07-06 at cc420805f3)
 + hashmap: migrate documentation from Documentation/technical into header
 + patch-ids.c: use hashmap correctly
 + hashmap.h: compare function has access to a data field
 (this branch is used by sb/diff-color-move and sb/hashmap-cleanup.)

 Update the hashmap API so that data to customize the behaviour of
 the comparison function can be specified at the time a hashmap is
 initialized.


* sb/pull-rebase-submodule (2017-06-27) 4 commits
  (merged to 'next' on 2017-07-09 at 48d2c3a51c)
 + builtin/fetch cleanup: always set default value for submodule recursing
 + pull: optionally rebase submodules (remote submodule changes only)
 + builtin/fetch: parse recurse-submodules-default at default options parsing
 + builtin/fetch: factor submodule recurse parsing out to submodule config

 "git pull --rebase --recurse-submodules" learns to rebase the
 branch in the submodules to an updated base.


* sb/submodule-doc (2017-06-22) 1 commit
  (merged to 'next' on 2017-07-09 at fda0ceec31)
 + submodules: overhaul documentation

 Doc update.

--------------------------------------------------
[New Topics]

* jn/hooks-pre-rebase-sample-fix (2017-07-11) 1 commit
  (merged to 'next' on 2017-07-12 at ed86887454)
 + pre-rebase hook: capture documentation in a <<here document

A note from the maintainer

2017-07-13 Thread Junio C Hamano
Welcome to the Git development community.

This message is written by the maintainer and talks about how the Git
project is managed, and how you can work with it.

* Mailing list and the community

The development is primarily done on the Git mailing list. Help
requests, feature proposals, bug reports and patches should be sent to
the list address <git@vger.kernel.org>.  You don't have to be
subscribed to send messages.  The convention on the list is to keep
everybody involved on Cc:, so it is unnecessary to say "Please Cc: me,
I am not subscribed".

Before sending patches, please read Documentation/SubmittingPatches
and Documentation/CodingGuidelines to familiarize yourself with the
project convention.

If you sent a patch and you did not hear any response from anybody for
several days, it could be that your patch was totally uninteresting,
but it also is possible that it was simply lost in the noise.  Please
do not hesitate to send a reminder message in such a case.  Messages
getting lost in the noise may be a sign that those who can evaluate
your patch don't have enough mental/time bandwidth to process them
right at the moment, and it often helps to wait until the list traffic
becomes calmer before sending such a reminder.

The list archive is available at a few public sites:

http://public-inbox.org/git/
http://marc.info/?l=git
http://www.spinics.net/lists/git/

For those who prefer to read it over NNTP:

nntp://news.public-inbox.org/inbox.comp.version-control.git
nntp://news.gmane.org/gmane.comp.version-control.git

are available.

When you point at a message in a mailing list archive, using its
message ID is often the most robust (if not very friendly) way to do
so, like this:


http://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org

Often these web interfaces accept the message ID with enclosing <>
stripped (like the above example to point at one of the most important
messages in the Git list).

Some members of the development community can sometimes be found on
the #git and #git-devel IRC channels on Freenode.  Their logs are
available at:

http://colabti.org/irclogger/irclogger_log/git
http://colabti.org/irclogger/irclogger_log/git-devel

There is a volunteer-run newsletter to serve our community ("Git Rev
News" http://git.github.io/rev_news/rev_news.html).

Git is a member project of the Software Freedom Conservancy, a
non-profit organization (https://sfconservancy.org/).  To reach a
committee of liaisons to the conservancy, contact them at
<git@sfconservancy.org>.


* Reporting bugs

When you think git does not behave as you expect, please do not stop
your bug report with just "git does not work".  "I used git in this
way, but it did not work" is not much better, neither is "I used git
in this way, and X happened, which is broken".  It often is that git is
correct to cause X to happen in such a case, and it is your expectation
that is broken. People would not know what other result Y you expected
to see instead of X, if you left it unsaid.

Please remember to always state

 - what you wanted to achieve;

 - what you did (the version of git and the command sequence to reproduce
   the behavior);

 - what you saw happen (X above);

 - what you expected to see (Y above); and

 - how the last two are different.

See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html for further
hints.

If you think you found a security-sensitive issue and want to disclose
it to us without announcing it to wider public, please contact us at
our security mailing list <git-security@googlegroups.com>.  This is
a closed list that is limited to people who need to know early about
vulnerabilities, including:

  - people triaging and fixing reported vulnerabilities
  - people operating major git hosting sites with many users
  - people packaging and distributing git to large numbers of people

where these issues are discussed without risk of the information
leaking out before we're ready to make public announcements.


* Repositories and documentation.

My public git.git repositories are at:

  git://git.kernel.org/pub/scm/git/git.git/
  https://kernel.googlesource.com/pub/scm/git/git
  git://repo.or.cz/alt-git.git/
  https://github.com/git/git/
  git://git.sourceforge.jp/gitroot/git-core/git.git/
  git://git-core.git.sourceforge.net/gitroot/git-core/git-core/

This one shows not just the main integration branches, but also
individual topics broken out:

  git://github.com/gitster/git/

A few web interfaces are found at:

  http://git.kernel.org/cgit/git/git.git
  https://kernel.googlesource.com/pub/scm/git/git
  http://repo.or.cz/w/alt-git.git

Preformatted documentation from the tip of the "master" branch can be
found in:

  git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
  git://repo.or.cz/git-{htmldocs,manpages}.git/
  https://github.com/gitster/git-{htmldocs,manpages}.git/

The manual pages formatted in HTML for the tip of "master" are also
available online.

Re: "groups of files" in Git?

2017-07-13 Thread Igor Djordjevic
Eh, yet another one, sorry:

On 14/07/2017 01:32, Igor Djordjevic wrote:
> 
> $ git checkout featureA
> $ git commit
> $ git checkout master
> $ git reset --hard HEAD^
> $ git merge 

The last line should stress that we're
not merging exact parent commits the previous merge commit was made 
of, but updated tips of the branches the previous merge commit was 
made of... or something along those lines :)


Re: "groups of files" in Git?

2017-07-13 Thread Igor Djordjevic
Just a small update/fixup:

On 14/07/2017 00:39, Igor Djordjevic wrote:
> I guess it would be a kind of alias to doing:
> 
> $ git checkout featureA
> $ git add ...
> $ git commit
> $ git checkout master
> $ git reset --hard HEAD^
> $ git merge featureA featureB
> 

This should, in fact, be:

$ git checkout featureA
$ git commit
$ git checkout master
$ git reset --hard HEAD^
$ git merge 

(removed "git add" step, as that is needed for proposed single step 
solution as well, as a usual step preceding the commit; also replaced 
concrete branch names in the last step with a more generic 
description, better communicating real intent)

> In the same manner, it should be possible to drop a commit from the 
> feature branch in a single step, for example returning to the state 
> as shown in (1), or even "port" it from one branch to the other, like
> this (without a need for it to be the last commit, even):
> 
> (3)       o---o---o---\ (featureA)
>          /             \
>   ---o---o---o----------M' (master, HEAD)
>          \             /
>           o---o---A'--o (featureB)

Here, the diagram should look like this:

(3)       o---o---o---\ (featureA)
         /             \
  ---o---o---o----------M'' (master, HEAD)
         \             /
          o---o---A''-o (featureB)

(replaced leftover M' from the previous diagram with M'' to show it's
yet another (updated) merge commit, different from both M and M' in 
terms of SHA1, yet the contents would probably, but not necessarily, 
be the same for all three; same for leftover A', replaced with A'')

Regards,
Buga


Re: "groups of files" in Git?

2017-07-13 Thread Igor Djordjevic
Hi astian,

On 12/07/2017 00:27, astian wrote:
> Nikolay Shustov wrote:
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to figure out how to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes to be NOT in the same source tree) is not a viable
>> option for me as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever changes separation Git
>> feature is used).
>> Build takes time and resources and considering that I have to do it on
>> multiple platforms (I do cross-platform development) it really
>> denominates the option of not having multiple changes in the same code
>> tree.
>>
>> Am I ignorant about some Git feature/way of using Git that would help?
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
> 
> I never used Perforce and I'm not even sure I understand your problem,
> but I thought I'd mention something that nobody else seems to have yet
> (unless I missed it):
> 
> First, one thing that seems obvious to me from your description is that
> these "parallel features" you work on are obviously interdependent,
> so I would rather consider the whole thing as a single feature.
> Therefore, it makes sense to me to work in a single "topic branch".
> 
> This doesn't preclude one from separating the changes in logically
> sensible pieces.  Indeed this is par for the course in Git and people do
> it all the time by dividing the bulk of changes into a carefully chosen
> series of commits.
> 
> I think the most common way of doing this is to simply work on the whole
> thing and once you're happy with it you use "git rebase --interactive" in
> order to "prettify" your history.
> 
> But, and here comes the part I think nobody mentioned yet, if your
> feature work is considerably large or spans a considerably long time it
> may be undesirable to postpone all that work until the very end (perhaps
> by then you already forgot important information, or perhaps too many
> changes have accumulated so reviewing them all becomes significantly
> less efficient).  In that case, one solution is to use a "patch
> management system" which will let you do that work incrementally (going
> back and forth as needed).
> 
> If you know Mercurial, this is "hg mq".  I don't think Git has any such
> system built-in, but I know there are at least these external tools that
> integrate with Git:
> https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Patch-management_Interface_layers
> 
> Feel free to ignore this if I totally misunderstood your use case.
> 
> Cheers
This message actually creeped me out the first time I read it, after 
writing an e-mail reply of my own[1].

The tone it's written in, the points you make, and even the
conclusion about "hg mq" -- as if you were reading my mind.

Yours was sent a bit before mine, but I guess we were writing it at 
the same time as well... Just spooky, lol.

That said, I totally understand what you're talking about, and I gave
an example of the desired (yet missing?) Git work flow here[2] :)

[1] https://public-inbox.org/git/6e4096fd-cbab-68f0-7a23-654382cb8...@gmail.com/
[2] https://public-inbox.org/git/27a3c650-5843-d446-1f59-64fabe543...@gmail.com/

Regards,
Buga


Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

>> So when somebody wants to do a "from design and explanation to
>> provider to consumer", we would probably want "doc, *.h, *.c at the
>> top-level and then things inside builtin/ subdirectory" order.  Of
>> course, on the other hand, "I do not trust me not getting swayed by
>> the fact that a developer more competent than me wrote the patch"
>> reviewer would want to use the reverse order.
>
> I do not understand what you say here? Are you saying:
> "I can be tricked easier when the order is top-down,
> specifically when the more competent developer tries to?"

I am not worried about the case in which patch author actively
"tries to" deceive.  It is more like "I am more likely to fail to
spot mistakes the patch author failed to spot himself", when I start
with reasoning, service provider implementations and then service
consumer.  When I am forced to think the problem myself before
seeing the answer and then compare the result with patch author's
answer, I am more likely to find such a mistake.

>> Can we actually express "top-level first and then builtin/*" order
>> with the diff.orderfile mechanism?
>
> By reading the code, I think we snap to the first match. And matching
> is done via the wildmatch.{c,h}, that claims it has special-case '/' matching,
> and '**' works differently than '*'.

I took a brief look at diffcore-order.c; I do not think "/*.c" would
match only top-level .c files like gitignore files would.


Re: [GSoC][PATCH 4/5 v4] submodule: port submodule subcommand 'status' from shell to C

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 1:05 PM, Prathamesh Chavan  wrote:
> This aims to make git-submodule 'status' a built-in. Hence, the function
> cmd_status() is ported from shell to C. This is done by introducing
> three functions: module_status(), submodule_status() and print_status().
>
> The function module_status() acts as the front-end of the subcommand.
> It parses subcommand's options and then calls the function
> module_list_compute() for computing the list of submodules. Then
> this function calls for_each_submodule_list(), looping through the
> list obtained.
>
> Then for_each_submodule_list() calls submodule_status() for each of the
> submodule in its list. The function submodule_status() is responsible
> for generating the status of each submodule it is called for, and
> then calls print_status().
>
> Finally, the function print_status() handles the printing of submodule's
> status.
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> In this new version of patch:
>
> Instead of using cmd_diff_files(), the process is optimized
> by using ce_match_stat().

Apart from this I have reviewed the patch and found it a faithful
conversion.

I am not an expert in the diff area and wonder how the
cmd_diff_files functionality is achieved with just a stat call
and then a comparison via ce_match_stat. 'Using "dirty" ignores
all changes to the work tree of submodules, only changes to the
commits stored in the superproject are shown.' So I'd have
expected ce->oid to be compared (is there an index entry differing,
i.e. more than one stage?)


> Also, the child_process running the command 'rev-parse --verify HEAD'
> is removed for optimization reasons, and instead head_ref_submodule()
> is used with callback function handle_submodule_head_ref().

Cool.


Re: "groups of files" in Git?

2017-07-13 Thread Igor Djordjevic
On 13/07/2017 23:20, Junio C Hamano wrote:
> Nikolay Shustov  writes:
>> My question was about how to robustly handle "multiple pending
>> commits" which in Perforce are represented by concept of pending
>> changelists.
> 
> And in Git, they are represented by concept of commits that are not
> yet pushed out to the public repository to become the final history
> carved in stone.

If I may, I don't think "multiple pending commits" is the issue here 
(as that is indeed what a private branch is), but more something like 
"multiple branches pending/live merge branch", or something.

To illustrate, let's say this is our starting position:

(1)       o---o---o (featureA)
         /         \
  ---o---o---o------M (master, HEAD)
         \         /
          o---o---o (featureB)


We're currently on commit "M", being a merge commit between our 
"master" and two feature branches.

Now, what seems lacking, while still possible through a series of 
steps, is an easy (single step) way to modify current state and 
commit the change to the _feature branch_, while still being on the 
"master" branch, still having everything merged in.

So after I make a "featureA" related change while on "M", to be able 
to issue a single command, for example:
 
$ git commit --branch=featureA

... or:
 
$ git commit -b featureA 
 
..., where "featureA" would need to be one of the parents of the 
current commit we are at (commit "M", in our case), and get a 
situation like this:

(2)       o---o---o---A (featureA)
         /             \
  ---o---o---o----------M' (master, HEAD)
         \             /
          o---o---o---/ (featureB)


Here, "A" is a new commit/change I`ve just made (while still being on 
the "master" branch), and it is automatically commited to related 
"featureA" branch, with merge commit "M" now recreated into "M'" to 
hold the new "featureA" commit "A" as well.

I guess it would be a kind of alias to doing:

$ git checkout featureA
$ git add ...
$ git commit
$ git checkout master
$ git reset --hard HEAD^
$ git merge featureA featureB

... or something, where the last merge step would need to remember
the previous merge commit "M" parent branches and merge them again to
produce an updated "M'" merge commit.
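
To make that concrete, a rough shell sketch (a hypothetical helper;
the branch names are hard-coded for illustration, while a real
implementation would have to rediscover the parents of merge commit
"M"):

  recommit_to_branch () {
          # commit the pending change to the given parent branch,
          # then redo the merge on top of it
          git checkout "$1" &&
          git commit &&
          git checkout master &&
          git reset --hard HEAD^ &&
          git merge featureA featureB
  }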

In the same manner, it should be possible to drop a commit from the 
feature branch in a single step, for example returning to the state 
as shown in (1), or even "port" it from one branch to the other, like
this (without a need for it to be the last commit, even):

(3)       o---o---o---\ (featureA)
         /             \
  ---o---o---o----------M' (master, HEAD)
         \             /
          o---o---A'--o (featureB)


Something like "rebase on steroids", lol, keeping the HEAD where it 
is, and the merge commit beneath it updated.

This indeed seems similar to Mercurial's patch "queues", except being
much better as everything is still version controlled at all times,
no additional tools needed to version control the patches (unless
that's already been addressed in Mercurial as well, dunno).
 
And it still seems to be following Git's "multiple commits per 
feature, single feature per branch" spirit, just allowing for 
easier/faster branch integration testing.

p.s. Even if my short sample might be flawed in one way or the other, 
it should show the essence of the functionality we're discussing 
here, I think.

Regards,
Buga


Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT

2017-07-13 Thread Johannes Sixt

On 12.07.2017 at 21:12, Ævar Arnfjörð Bjarmason wrote:

I think in the context of this desire Johannes Sixt's "Actually, I'm
serious [about let's compile with c++]"[1] should be given some more
consideration.


Thank you for your support.


I've just compiled Git with it and it passes all tests. I think the
endeavor is worthwhile in itself as C++ source-level compatibility for
git.git is clearly easy to achieve, and would effectively give us access
to more compilers (albeit in different modes, but they may discover
novel bugs that also apply to the C mode code).


Please keep in mind that my patches only show that it can be done. 
Nevertheless, the result is far, far away from valid C++ code. It can be 
compiled by GCC (thanks to its -fpermissive flag) and happens to produce 
something that works. But that does not mean that other compilers would 
grok it.


Source-level compatibility is only necessary as a stopgap in the 
transition to C++. If the transition is not going to happen, I doubt 
that there is any value. It is simply too much code churn for too little 
gain. The real gains are in the features of C++(11,14).



But why do it? Aside from the "more compilers" argument, we may find
that it's going to be much easier to use some C99 features we want by
having C++ source-level compatibility, and on compilers like that may
not support those features in C use the C++ mode that may support those.


I would be happy to learn about a C99 feature where C++ mode of some 
compiler would help.


The only C99 feature mentioned so far was designated initializers. 
Unfortunately, that actually widens the gap between C and C++, not 
lessens it. (C++ does not have designated initializers, and they are not 
on the agenda.)



If not C++ support would be interesting for other reasons. Johannes
Sixt: It would be very nice to get those patches on-list.


I don't think it's worth swamping the list with the patches at this 
time. Interested parties can find them here:


https://github.com/j6t/git.git c-plus-plus

I may continue the work, slowly and as long as I find it funny. It's 
just mental exercise, unless the Git community copies the GCC Steering 
Committee's mindset with regard to C++ in the code base 
(https://gcc.gnu.org/ml/gcc/2010-05/msg00705.html).


-- Hannes


Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 2:32 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Stefan Beller  writes:
>>
>>> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King  wrote:
>>>
  builtin/branch.c   | 14 +++---
  builtin/for-each-ref.c | 22 --
  builtin/tag.c  | 30 --
  builtin/verify-tag.c   | 12 ++--
  ref-filter.c   | 22 --
  ref-filter.h   | 22 +-
  6 files changed, 70 insertions(+), 52 deletions(-)
>>>
>>> The patch looks good to me. So some off-topic comments:
>>> I reviewed this patch from bottom up, i.e. I started looking at
>>> ref-filter.h, then  ref-filter.c and then the rest. If only you had 
>>> formatted
>>> the patches with an orderfile. ;)
>>
>> As a reviewer, for this particular patch, I actually appreciated
>> that ref-filter.[ch] came at the end.  That forced me to think.
>> ...
>> I do want to present from Doc to header to code when I am showing my
>> patch to others, so this is probably a good example that illustrates
>> that the preferred presentation order is not just personal
>> preference, but is different on occasion even for the same person.
>
> So when somebody wants to do a "from design and explanation to
> provider to consumer", we would probably want "doc, *.h, *.c at the
> top-level and then things inside builtin/ subdirectory" order.  Of
> course, on the other hand, "I do not trust me not getting swayed by
> the fact that a developer more competent than me wrote the patch"
> reviewer would want to use the reverse order.

I do not understand what you say here? Are you saying:
"I can be tricked easier when the order is top-down,
specifically when the more competent developer tries to?"

> Can we actually express "top-level first and then builtin/*" order
> with the diff.orderfile mechanism?

By reading the code, I think we snap to the first match. And matching
is done via the wildmatch.{c,h}, that claims it has special-case '/' matching,
and '**' works differently than '*'.

> without which it would be cumbersome to make ref-filter.c listed
> before builtin/branch.c in a generic way.

Indeed.


Re: Reducing redundant build at Travis?

2017-07-13 Thread Junio C Hamano
Lars Schneider  writes:

> On Thu, Jul 13, 2017 at 1:44 AM, Junio C Hamano  wrote:
>> I usually try to stay as late as possible to finish all the
>> integration branches in order before pushing out the result; it is
>> more efficient to be able to batch things (for humans).
>>
>> I however noticed that this often means we would have multiple build
>> jobs at Travis for branches, and the Windows build often fails
>
> The Windows build has some kind of problem since June 22.
> Somehow building gitk-git just blocks the build and waits until
> the timeout. I had no time, yet, to investigate this further.
>
>
>> waiting for its response.  Since I tagged the tip of 'maint', and I
>> wanted to give all the build a fair chance to succeed without other
>> build jobs starving it of resources, I pushed out 'maint' and the
>> tag before others, even though I already have all the other
>> integration branches ready.
>>
>> Unfortunately, https://travis-ci.org/git/git/builds/ shows that it
>> does not care if it spawned a job to build the tip of 'maint' and
>> another for 'v2.13.3' that point at the same thing.
>
> That is indeed surprising and wasteful. Looks like other people
> did run into the same issue. How about something like this?
> https://github.com/mockito/mockito/blob/release/2.x/.travis.yml#L26-L29

That unfortunately is exactly what I wanted to avoid.

We'd want to test tagged releases, and we'd want to test usual
updates to integration branches.  It just is that sometimes the tips
of integration branches happen to be at the tagged release, so I'd
prefer to always build tags but skip a branch build if it happens to
be also tagged.  After all, none of the integration branches may
directly point at a tagged release when the tag is pushed out.




Re: reftable: new ref storage format

2017-07-13 Thread Eric Wong
Jeff King  wrote:
> I agree that a full binary search of a reftable is harder because of the
> prefix compression (it may still be possible by scanning backwards, but
> I think there are ambiguities when you land in the middle of a record,
> since there's no unambiguous end-of-record character). But I don't think
> it matters. If you binary-search to a constant-sized block, then a
> linear scan of the block is acceptable.

For a new packed-refs, I think an intrusive critbit tree would
be a good way to store refs which have many common prefixes and
I've always wanted to apply critbit to an on-disk storage
format...

Several years ago, I started writing one in Perl using
pwrite/pread to provide Message-ID <=> NNTP article number
mapping, but gave up on it out of laziness:

  https://80x24.org/spew/1441508596-19511-1-git-send-emai...@80x24.org/raw

The end goal would've been to have two tries sharing the same
storage struct: one keyed by Message-ID, the other keyed by NNTP
article number (and figuring out the node using offsets like
we do with (container_of|list_entry) in list.h).

For git, being able to do an O(hashlength) prefix search based
on the object_id from the reftable would speed up decorations, I
think.  And of course, the O(refnamelength) prefix search would
also apply to the refnames themselves.


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 2:23 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano  wrote:
>>> Stefan Beller  writes:
>>>
 diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
 index f9a5f7535a..ed8329340f 100644
 --- a/builtin/verify-tag.c
 +++ b/builtin/verify-tag.c
 @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const 
 char *prefix)
   }

   while (i < argc) {
 - unsigned char sha1[20];
 + struct object_id oid;
   const char *name = argv[i++];
 - if (get_sha1(name, sha1)) {
 +
 + if (get_oid(name, &oid)) {
   had_error = !!error("tag '%s' not found.", name);
   continue;
   }
>>>
>>> This part is already done, it seems, in bc/object-id topic, even
>>> though other parts are not yet done?
>>
>> Oops. I assumed the latest bc/object-id would have been in master
>> already, but after checking it is not. 967635dc3c2
>> (builtin/verify-tag: convert to struct object_id)
>> converts this part, although there are 2 differences:
>> * I added a stray newline before get_oid
>> * The argument to gpg_verify_tag is a sha1 or oid
>>
>> So yes, this produces a merge conflict. :/
>
> That is OK.  This actually shouldn't create any meaningful conflict.
> Both try to do the same code, with only a blank-line difference.
>
> As Brian said bc/object-id would be rerolled, I was wondering if I
> should queue these two patches (even though I already queued them)
> myself, or it would be better for you to send them to Brian to make
> it part of his series.

+cc Brian

Snarkily I wanted to link to an essay "large patch series
considered harmful"[1], but could not find any. So a couple
of bullet points:
(a) humans dislike larger series, hence fewer reviewers
(b) the likelihood of a mistake in new code is proportional to its size
  We can use the number of patches as a proxy for size
(c) If a mistake is found, the whole series needs rerolling.
  The effort of rerolling a series can be approximated with
  its size as well.

From combining (b) and (c), we conclude that the effort to
land a patch series is O(n^2) with n as the number of patches.
Also from (a) we conclude that two smaller series containing
the same output as one large series, has better code quality.
So with that, we conclude that all series shall be as small
as possible.

So I'd ask to queue these 2 separately, asking Brian to drop
"builtin/verify-tag: convert to struct object_id"

[1] https://www.cs.princeton.edu/~dpw/papers/hotos.pdf, 2005
seems interesting to me in hindsight.

I can also send my patches to Brian, as you (both) like.

Thanks,
Stefan


Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct

2017-07-13 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King  wrote:
>>
>>>  builtin/branch.c   | 14 +++---
>>>  builtin/for-each-ref.c | 22 --
>>>  builtin/tag.c  | 30 --
>>>  builtin/verify-tag.c   | 12 ++--
>>>  ref-filter.c   | 22 --
>>>  ref-filter.h   | 22 +-
>>>  6 files changed, 70 insertions(+), 52 deletions(-)
>>
>> The patch looks good to me. So some off-topic comments:
>> I reviewed this patch from bottom up, i.e. I started looking at
>> ref-filter.h, then  ref-filter.c and then the rest. If only you had formatted
>> the patches with an orderfile. ;)
>
> As a reviewer, for this particular patch, I actually appreciated
> that ref-filter.[ch] came at the end.  That forced me to think.
> ...
> I do want to present from Doc to header to code when I am showing my
> patch to others, so this is probably a good example that illustrates
> that the preferred presentation order is not just personal
> preference, but is different on occasion even for the same person.

So when somebody wants to do a "from design and explanation to
provider to consumer", we would probably want "doc, *.h, *.c at the
top-level and then things inside builtin/ subdirectory" order.  Of
course, on the other hand, "I do not trust me not getting swayed by
the fact that a developer more competent than me wrote the patch"
reviewer would want to use the reverse order.

Can we actually express "top-level first and then builtin/*" order
with the diff.orderfile mechanism?  It's been a while since I last
looked at the orderfile matching (which was when I originally wrote
it) and I do not offhand know if we now allow wildmatch patterns and
the directory level anchoring "/*.c" like we do in .gitignore files,
without which it would be cumbersome to make ref-filter.c listed
before builtin/branch.c in a generic way.
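
To make the question concrete, a naive orderfile such as

  Documentation/*
  *.h
  *.c
  builtin/*

would not give that order as-is, if matching indeed snaps to the first
pattern: "*.c" would capture builtin/*.c before the "builtin/*" entry
is ever consulted, which is exactly the anchoring problem above.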


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> Sorry, I mean the case where you do a merge from the other side, but
> then you end up rewinding the history in some way, taking into account
> that merge and everything they did. For example:
>
>   $ git pull
>   $ git rebase  ;# this will flatten the merge
>   $ git push --force-with-lease
>
> There was never a moment where the other side's tip ref was in your
> local branch, but you did incorporate it via the merge.

Ah, OK, now it is clear what you meant.

Thanks.


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
>>> index f9a5f7535a..ed8329340f 100644
>>> --- a/builtin/verify-tag.c
>>> +++ b/builtin/verify-tag.c
>>> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const 
>>> char *prefix)
>>>   }
>>>
>>>   while (i < argc) {
>>> - unsigned char sha1[20];
>>> + struct object_id oid;
>>>   const char *name = argv[i++];
>>> - if (get_sha1(name, sha1)) {
>>> +
>>> + if (get_oid(name, &oid)) {
>>>   had_error = !!error("tag '%s' not found.", name);
>>>   continue;
>>>   }
>>
>> This part is already done, it seems, in bc/object-id topic, even
>> though other parts are not yet done?
>
> Oops. I assumed the latest bc/object-id would have been in master
> already, but after checking it is not. 967635dc3c2
> (builtin/verify-tag: convert to struct object_id)
> converts this part, although there are 2 differences:
> * I added a stray newline before get_oid
> * The argument to gpg_verify_tag is a sha1 or oid
>
> So yes, this produces a merge conflict. :/

That is OK.  This actually shouldn't create any meaningful conflict.
Both try to do the same code, with only a blank-line difference.

As Brian said bc/object-id would be rerolled, I was wondering if I
should queue these two patches (even though I already queued them)
myself, or it would be better for you to send them to Brian to make
it part of his series.


Re: Reducing redundant build at Travis?

2017-07-13 Thread Lars Schneider
On Thu, Jul 13, 2017 at 1:44 AM, Junio C Hamano  wrote:
> I usually try to stay as late as possible to finish all the
> integration branches in order before pushing out the result; it is
> more efficient to be able to batch things (for humans).
>
I however noticed that this often means we would have multiple build
jobs at Travis for branches, and the Windows build often fails

The Windows build has some kind of problem since June 22.
Somehow building gitk-git just blocks the build and waits until
the timeout. I had no time, yet, to investigate this further.


> waiting for its response.  Since I tagged the tip of 'maint', and I
> wanted to give all the build a fair chance to succeed without other
> build jobs starving it of resources, I pushed out 'maint' and the
> tag before others, even though I already have all the other
> integration branches ready.
>
> Unfortunately, https://travis-ci.org/git/git/builds/ shows that it
> does not care if it spawned a job to build the tip of 'maint' and
> another for 'v2.13.3' that point at the same thing.

That is indeed surprising and wasteful. Looks like other people
did run into the same issue. How about something like this?
https://github.com/mockito/mockito/blob/release/2.x/.travis.yml#L26-L29


- Lars


Re: "groups of files" in Git?

2017-07-13 Thread Junio C Hamano
Nikolay Shustov  writes:

> I am not really trying to ignite the holy war between Perforce and Git
> (and why would one???), but if you are interested in the answer on how
> you'd do your scenario in Perforce, it would be: "use shelved
> changelists".

Oh, that was not my intention, either.  My interest was to see if
there is a good solution that we could steal from the other world.

> In Perforce, you could "shelve" the changelist, similar to "stash" in
> Git, but the difference is that the Perforce shelved changes are
> accessible across clients. I.e. the other developer can "unshelve"
> these pending changes to its sandbox (to the same or the different
> branch) so that sandbox would get the pending changes as well. That
> would be like the developer made these changes himself. Whatever
> automated/manual process is involved, it is typical to run "a trial
> build/tests" on shelved changelist (i.e. uncommitted yet files) to
> verify the quality of changes.
> Git achieves the same through the ease of manipulation with branches
> and I like the way it does it much more.

Thanks.  Shelving and letting others unshelve is like keeping the
changes in separate branches and privately sharing them among
developers, so they sound pretty much equivalent features to me.
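
For concreteness, the usual Git counterpart would be something like
this (names are only an example):

    git checkout -b wip/topic
    git commit -am "WIP: half-done change"
    git push origin wip/topic

and on the other developer's machine:

    git fetch origin
    git checkout wip/topic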

> My question was about how to robustly handle "multiple pending
> commits" which in Perforce are represented by concept of pending
> changelists.

And in Git, they are represented by concept of commits that are not
yet pushed out to the public repository to become the final history
carved in stone.


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
>> index f9a5f7535a..ed8329340f 100644
>> --- a/builtin/verify-tag.c
>> +++ b/builtin/verify-tag.c
>> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const 
>> char *prefix)
>>   }
>>
>>   while (i < argc) {
>> - unsigned char sha1[20];
>> + struct object_id oid;
>>   const char *name = argv[i++];
>> - if (get_sha1(name, sha1)) {
>> +
>> + if (get_oid(name, &oid)) {
>>   had_error = !!error("tag '%s' not found.", name);
>>   continue;
>>   }
>
> This part is already done, it seems, in bc/object-id topic, even
> though other parts are not yet done?

Oops. I assumed the latest bc/object-id would have been in master
already, but after checking it is not. 967635dc3c2
(builtin/verify-tag: convert to struct object_id)
converts this part, although there are 2 differences:
* I added a stray newline before get_oid
* The argument to gpg_verify_tag is a sha1 or oid

So yes, this produces a merge conflict. :/

The rest (tag.{c,h}, builtin/tag.c) is not found in Brian's series.


Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 08:57:01PM +0200, Christian Couder wrote:

> >> We want to make it possible to store the parameters to the 'run'
> >> script in a config file. This will make it easier to store, reuse,
> >> share and compare parameters.
> >
> > Because perf-lib is built on test-lib, it already reads
> > GIT-BUILD-OPTIONS.
> 
> Actually the 'run' script also sources GIT-BUILD-OPTIONS, so maybe
> this is not necessary.

Ah, right. The one that comes via perf-lib gets the variables into the
test scripts themselves. But anything "run" itself needs comes from
the sourcing it does itself. And that's where GIT_PERF_MAKE_OPTS has
an effect.

> Also are the variables in GIT-BUILD-OPTIONS exported already?

No, I don't think so. But because both "run" and the scripts themselves
source them, they're available more or less everywhere, except for
sub-processes inside the scripts.

> > And the Makefile copies several perf-related values
> > into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you
> > can already do:
> >
> >   echo 'GIT_PERF_REPEAT_COUNT = 10' >>config.mak
> >   echo 'GIT_PERF_MAKE_OPTS = CFLAGS="-O2" DEVELOPER=1' >>config.mak
> >   make
> 
> The "make" here might not even be needed as in the 'run' script
> "config.mak" is copied into the "build/$rev" directory where "make" is
> run to build the $rev version.

You need it to bake the config into GIT-BUILD-OPTIONS, which is the only
thing that gets read by "run" and the perf scripts. If you are
just setting MAKE_OPTS to things that your config.mak already sets, then
yes, you can skip that one.

-Peff


Re: [PATCH] submodule: use cheaper check for submodule pushes

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 1:48 PM, Jonathan Nieder  wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> Yes we are safe, because the function itself only spawns a child process
>> (not using any of the objects).
>>
>> Its only caller push_unpushed_submodules also doesn't rely on objects
>> loaded after calling push_submodule.
>>
>> The caller of push_unpushed_submodules (transport.c, transport_push)
>> also doesn't need submodule objects loaded.
>
> Thanks for looking into it.  This is what the commit message should
> say to help reviewers or people trying to understand it later.  The
> footnotes don't help and are distracting, except that it makes sense
> to point out the original GSoC patch to say the alternate submodule
> odb wasn't needed even then.
>
> E.g.:
>
>  Subject: push: do not add submodule odb as an alternate when recursing on 
> demand
>
>  "git push --recurse-submodules=on-demand" adds each submodule as an
>  alternate with add_submodule_odb before checking whether the
>  submodule has anything to push and pushing it if so.
>
>  However, it never accesses any objects from the submodule.  In the
>  parent process it uses the submodule's ref database to see if there
>  is anything to push.  The actual push (which does rely on objects)
>  occurs in a child process.
>
>  The same was true when this call was originally added in
>  v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
>  option, 2012-03-29).  Most likely it was added by analogy with
>  fetch --recurse-submodules=on-demand, which did use the submodule's
>  object database.
>
>  Use is_submodule_populated_gently instead, which is simpler and
>  cheaper.

Thanks for giving a good example of a commit message that I could use
in a reroll.


> With such a commit message change, this seems like a reasonable change
> in principle (though I haven't looked carefully to verify it).
>
> My one doubt is the is_submodule_populated_gently.  Why are we using
> that instead of simpler is_submodule_populated?  The names and API
> comments don't explain.

One could posit this is laziness of thinking.
See 15cdc64776 (make is_submodule_populated gently, 2017-03-14),
and discover there is no non-gentle version of is_submodule_populated.
And for each new use, it may be cheaper to just use the gentle version
instead of adding a non-gentle version.

Thanks,
Stefan


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
> index f9a5f7535a..ed8329340f 100644
> --- a/builtin/verify-tag.c
> +++ b/builtin/verify-tag.c
> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const 
> char *prefix)
>   }
>  
>   while (i < argc) {
> - unsigned char sha1[20];
> + struct object_id oid;
>   const char *name = argv[i++];
> - if (get_sha1(name, sha1)) {
> +
> + if (get_oid(name, &oid)) {
>   had_error = !!error("tag '%s' not found.", name);
>   continue;
>   }

This part is already done, it seems, in bc/object-id topic, even
though other parts are not yet done?

>  
> - if (gpg_verify_tag(sha1, name, flags)) {
> + if (gpg_verify_tag(&oid, name, flags)) {
>   had_error = 1;
>   continue;
>   }
>  
>   if (fmt_pretty)
> - pretty_print_ref(name, sha1, fmt_pretty);
> + pretty_print_ref(name, oid.hash, fmt_pretty);
>   }
>   return had_error;
>  }
> diff --git a/tag.c b/tag.c
> index 47f60ae151..7e10acfb6e 100644
> --- a/tag.c
> +++ b/tag.c
> @@ -33,7 +33,7 @@ static int run_gpg_verify(const char *buf, unsigned long 
> size, unsigned flags)
>   return ret;
>  }
>  
> -int gpg_verify_tag(const unsigned char *sha1, const char *name_to_report,
> +int gpg_verify_tag(const struct object_id *oid, const char *name_to_report,
>   unsigned flags)
>  {
>   enum object_type type;
> @@ -41,20 +41,20 @@ int gpg_verify_tag(const unsigned char *sha1, const char 
> *name_to_report,
>   unsigned long size;
>   int ret;
>  
> - type = sha1_object_info(sha1, NULL);
> + type = sha1_object_info(oid->hash, NULL);
>   if (type != OBJ_TAG)
>   return error("%s: cannot verify a non-tag object of type %s.",
>   name_to_report ?
>   name_to_report :
> - find_unique_abbrev(sha1, DEFAULT_ABBREV),
> + find_unique_abbrev(oid->hash, DEFAULT_ABBREV),
>   typename(type));
>  
> - buf = read_sha1_file(sha1, &type, &size);
> + buf = read_sha1_file(oid->hash, &type, &size);
>   if (!buf)
>   return error("%s: unable to read file.",
>   name_to_report ?
>   name_to_report :
> - find_unique_abbrev(sha1, DEFAULT_ABBREV));
> + find_unique_abbrev(oid->hash, DEFAULT_ABBREV));
>  
>   ret = run_gpg_verify(buf, size, flags);
>  
> diff --git a/tag.h b/tag.h
> index fdfcb4a84a..d469534e82 100644
> --- a/tag.h
> +++ b/tag.h
> @@ -17,7 +17,7 @@ extern int parse_tag_buffer(struct tag *item, const void 
> *data, unsigned long si
>  extern int parse_tag(struct tag *item);
>  extern struct object *deref_tag(struct object *, const char *, int);
>  extern struct object *deref_tag_noverify(struct object *);
> -extern int gpg_verify_tag(const unsigned char *sha1,
> +extern int gpg_verify_tag(const struct object_id *oid,
>   const char *name_to_report, unsigned flags);
>  
>  #endif /* TAG_H */


Re: [PATCH] submodule: use cheaper check for submodule pushes

2017-07-13 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Yes we are safe, because the function itself only spawns a child process
> (not using any of the objects).
>
> Its only caller push_unpushed_submodules also doesn't rely on objects
> loaded after calling push_submodule.
>
> The caller of push_unpushed_submodules (transport.c, transport_push)
> also doesn't need submodule objects loaded.

Thanks for looking into it.  This is what the commit message should
say to help reviewers or people trying to understand it later.  The
footnotes don't help and are distracting, except that it makes sense
to point out the original GSoC patch to say the alternate submodule
odb wasn't needed even then.

E.g.:

 Subject: push: do not add submodule odb as an alternate when recursing on 
demand

 "git push --recurse-submodules=on-demand" adds each submodule as an
 alternate with add_submodule_odb before checking whether the
 submodule has anything to push and pushing it if so.

 However, it never accesses any objects from the submodule.  In the
 parent process it uses the submodule's ref database to see if there
 is anything to push.  The actual push (which does rely on objects)
 occurs in a child process.

 The same was true when this call was originally added in
 v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
 option, 2012-03-29).  Most likely it was added by analogy with
 fetch --recurse-submodules=on-demand, which did use the submodule's
 object database.

 Use is_submodule_populated_gently instead, which is simpler and
 cheaper.

[...]
> On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano  wrote:

>> My hunch (and hope) is that we are probably safe, but that is a lot
>> weaker than "yes this is a good change we want to apply".
>
> Given the above (I went through the code), all I can do is repeating
> "yes this is a good change we want to apply".

With such a commit message change, this seems like a reasonable change
in principle (though I haven't looked carefully to verify it).

My one doubt is the is_submodule_populated_gently.  Why are we using
that instead of simpler is_submodule_populated?  The names and API
comments don't explain.

Thanks for your patient explanations,
Jonathan


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 12:10:39PM -0700, Junio C Hamano wrote:

> >   - I suspect in the third example we probably ought to be using the
> > reflog of the branch that HEAD points to. You would not want:
> >
> >    $ git checkout advanced-branch
> >    $ git checkout older-branch
> >    $ git push --force-with-lease origin HEAD:older-branch
> >
> > to consider that commits in advanced-branch are part of the lease.
> 
> The third one was meant to be rewriting on detached HEAD, not having
> any underlying branch.

Ah, OK, that makes a lot more sense. I'd be tempted to say that it
should not allow force-with-lease at all. The HEAD reflog sees too many
unrelated things to make it a good basis for "does this result account
for everything in its reflog".

> >   - For step 3, I'm not sure if you mean to look for exactly X, or
> > if it would be OK to have any commit whose ancestor is X. I think
> > you'd need the latter to accommodate a non-fast-forward "git pull"
> > (or fetch+merge) where the local ref is never set precisely to the
> > upstream commit.
> 
> But the result in that case is a descendant of upstream you just
> merged, so you do not even want to use any form of forcing---you
> would rather want to rely on the usual "push must fast-forward"
> mechanism, no?

Sorry, I mean the case where you do a merge from the other side, but
then you end up rewinding the history in some way, taking into account
that merge and everything they did. For example:

  $ git pull
  $ git rebase  ;# this will flatten the merge
  $ git push --force-with-lease

There was never a moment where the other side's tip ref was in your
local branch, but you did incorporate it via the merge.

-Peff


Re: [PATCH] RFC: Introduce '.gitorderfile'

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> The point I was trying to make is best demonstrated in
> t5526-fetch-submodules.sh:
>
>> ok 7 - using fetchRecurseSubmodules=true in .gitmodules recurses into 
>> submodules
>> ok 8 - --no-recurse-submodules overrides .gitmodules config
>> ok 9 - using fetchRecurseSubmodules=false in .git/config overrides setting 
>> in .gitmodules
>
> They were not suggestions, but defaults dictated by the project.

Yeah, and that is why I kept thinking that recurse-submodules may be
a huge mistake.


Re: [PATCH 0/15] making user-format colors conditional on config/tty

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> This version also takes into account feedback from the original thread.
> And as I added tests, it surfaced a few corner cases around color config
> that I've dealt with here.  The last two patches are the most
> interesting bits.
>
>   [01/15]: check return value of verify_ref_format()
>   [02/15]: docs/for-each-ref: update pointer to color syntax
>   [03/15]: t: use test_decode_color rather than literal ANSI codes
>   [04/15]: ref-filter: simplify automatic color reset
>   [05/15]: ref-filter: abstract ref format into its own struct
>   [06/15]: ref-filter: move need_color_reset_at_eol into ref_format
>   [07/15]: ref-filter: provide a function for parsing sort options
>   [08/15]: ref-filter: make parse_ref_filter_atom a private function
>   [09/15]: ref-filter: factor out the parsing of sorting atoms
>   [10/15]: ref-filter: pass ref_format struct to atom parsers
>   [11/15]: color: check color.ui in git_default_config()
>   [12/15]: for-each-ref: load config earlier
>   [13/15]: rev-list: pass diffopt->use_colors through to pretty-print
>   [14/15]: pretty: respect color settings for %C placeholders
>   [15/15]: ref-filter: consult want_color() before emitting colors

Overall I think the endgame is what we want to have in the future
(rather, what I wish we had from the beginning).  I'd have to think
about 11, 14 and 15 a bit more before saying anything that would be
remotely useful.

Thanks.

>
>  Documentation/git-for-each-ref.txt |   6 +-
>  Documentation/pretty-formats.txt   |  18 --
>  builtin/branch.c   |  21 +++---
>  builtin/clean.c|   3 +-
>  builtin/for-each-ref.c |  27 
>  builtin/grep.c |   2 +-
>  builtin/rev-list.c |   1 +
>  builtin/show-branch.c  |   2 +-
>  builtin/tag.c  |  61 ++
>  builtin/verify-tag.c   |  14 ++--
>  color.c|   8 ---
>  config.c   |   4 ++
>  diff.c |   3 -
>  pretty.c   |  27 ++--
>  ref-filter.c   | 108 ++-
>  ref-filter.h   |  30 +++--
>  t/t3203-branch-output.sh   |  31 +
>  t/t4207-log-decoration-colors.sh   |  22 +++
>  t/t6006-rev-list-format.sh | 129 -
>  t/t6300-for-each-ref.sh|  39 +++
>  t/t7004-tag.sh |  25 +++
>  t/test-lib-functions.sh|   1 +
>  22 files changed, 362 insertions(+), 220 deletions(-)
>
> -Peff


Re: [PATCH 06/15] ref-filter: move need_color_reset_at_eol into ref_format

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> Calling verify_ref_format() doesn't just confirm that the
> format is sane; it actually sets some global variables that
> will be used later when formatting the refs. These logically
> should belong to the ref_format, which would make it
> possible to use multiple formats within a single program
> invocation.
>
> Let's move one such flag into the ref_format struct. There
> are still others that would need to be moved before it would
> be safe to use multiple formats, but this commit gives a
> blueprint for how that should look.
>
> Signed-off-by: Jeff King 
> ---
> This commit is strictly optional for this series, but I wanted to give a
> sense of how the rest of the movement might look, since I was thinking
> about it. The big thing to move would be the used_atoms array, but I
> punted on that for now.

Heh, when I saw the hunk at 661,+7, used_atom[] was what immediately
came to mind.  It is OK to punt for the purpose of this patch, but
moving these statics into the ref-format structure would be a good
move in the longer term.

Thanks.

>
>  ref-filter.c | 7 +++
>  ref-filter.h | 3 +++
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 66d234bb1..178396e1f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -97,7 +97,6 @@ static struct used_atom {
>   } u;
>  } *used_atom;
>  static int used_atom_cnt, need_tagged, need_symref;
> -static int need_color_reset_at_eol;
>  
>  static void color_atom_parser(struct used_atom *atom, const char 
> *color_value)
>  {
> @@ -661,7 +660,7 @@ int verify_ref_format(struct ref_format *format)
>  {
>   const char *cp, *sp;
>  
> - need_color_reset_at_eol = 0;
> + format->need_color_reset_at_eol = 0;
>   for (cp = format->format; *cp && (sp = find_next(cp)); ) {
>   const char *color, *ep = strchr(sp, ')');
>   int at;
> @@ -673,7 +672,7 @@ int verify_ref_format(struct ref_format *format)
>   cp = ep + 1;
>  
>   if (skip_prefix(used_atom[at].name, "color:", ))
> - need_color_reset_at_eol = !!strcmp(color, "reset");
> + format->need_color_reset_at_eol = !!strcmp(color, 
> "reset");
>   }
>   return 0;
>  }
> @@ -2083,7 +2082,7 @@ void format_ref_array_item(struct ref_array_item *info,
>   sp = cp + strlen(cp);
>   append_literal(cp, sp, &state);
>   }
> - if (need_color_reset_at_eol) {
> + if (format->need_color_reset_at_eol) {
>   struct atom_value resetv;
>   resetv.s = GIT_COLOR_RESET;
>   append_atom(&resetv, &state);
> diff --git a/ref-filter.h b/ref-filter.h
> index 2bb58879d..9e1e89c19 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -79,6 +79,9 @@ struct ref_format {
>*/
>   const char *format;
>   int quote_style;
> +
> + /* Internal state to ref-filter */
> + int need_color_reset_at_eol;
>  };
>  
>  #define REF_FORMAT_INIT { NULL, 0 }


Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote:

> >> ### Problem statement
> >>
> >> Some repositories contain a lot of references (e.g.  android at 866k,
> >> rails at 31k).  The existing packed-refs format takes up a lot of
> >> space (e.g.  62M), and does not scale with additional references.
> >> Lookup of a single reference requires linearly scanning the file.
> >
> > I think the linear scan is actually an implementation short-coming. Even
> > though the records aren't fixed-length, the fact that newlines can only
> > appear as end-of-record is sufficient to mmap and binary search a
> > packed-refs file (you just have to backtrack a little when you land in
> > the middle of a record).
> 
> Except that a record is a "delta" to the previous record, so it's not
> just finding a record, but reconstructing it. Example for records:

I was still talking about the existing packed-refs implementation here.

I agree that a full binary search of a reftable is harder because of the
prefix compression (it may still be possible by scanning backwards, but
I think there are ambiguities when you land in the middle of a record,
since there's no unambiguous end-of-record character). But I don't think
it matters. If you binary-search to a constant-sized block, then a
linear scan of the block is acceptable.
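
Concretely, the two-level lookup could be sketched like this (hypothetical
names and layout, not the JGit or git.git code; the point is only the
complexity argument):

  #include <string.h>

  /*
   * Illustration only: find the block that may contain `refname`,
   * given an in-memory index of the first refname stored in each
   * fixed-size block.  The later linear scan inside that one block
   * is bounded by the constant block size, so the whole lookup
   * stays O(log n).
   */
  struct block_index {
      int nr;                 /* number of blocks */
      const char **first_key; /* sorted: first refname of each block */
  };

  static int find_block(const struct block_index *idx, const char *refname)
  {
      int lo = 0, hi = idx->nr;

      while (lo < hi) {
          int mid = lo + (hi - lo) / 2;
          if (strcmp(idx->first_key[mid], refname) <= 0)
              lo = mid + 1;
          else
              hi = mid;
      }
      return lo - 1; /* -1 means refname sorts before the first block */
  }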

> >> - Occupy less disk space for large repositories.
> >
> > Good goal.  Just to play devil's advocate, the simplest way to do that
> > with the current code would be to gzip packed-refs (and/or store sha1s
> > as binary). That works against the "mmap and binary search" plan,
> > though. :)
> 
> Given the compression by delta-ing the name to the previous change and
> the fact that Gerrit has
> 
>   refs/heads/changes/1
>   refs/heads/changes/2
>   refs/heads/changes/3
>   ...
> 
> I think this format would trump a "dumb" zip.
> (Github having sequentially numbered pull requests would also
> benefit here)

You may be surprised. Let's imagine that you have a set of 4096 refs in
refs/changes/1, refs/changes/2, etc:

  for i in $(seq 1 4096)
  do
echo refs/changes/$i
  done >input

Now let's do a prefix compression, with a single byte for "how many
characters to reuse from the last entry":

  perl -lne '
my $common;
if (defined $last) {
  chop $last while !/^\Q$last\E/;
  $common = length($last);
} else {
  $common = 0;
}
print chr($common), substr($_, $common);
$last = $_;
  ' <input >prefix

And a gzip:

  gzip -c -9 <input >zip

And the results:

  $ wc -c prefix; wc -c zip
  12754 prefix
  10116 zip

The secret sauce is most likely that gzip is bit-packing, using only a
few bits per character and not aligning with byte boundaries.

Not that I'm recommending just gzipping the whole packed-refs file. It
ruins the fast-lookup. We _could_ consider gzipping individual blocks of
a reftable (or any structure that allows you to search to a
constant-sized block and do a linear search from there). But given that
they're in the same ballpark, I'm happy with whatever ends up the
simplest to code and debug. ;)
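
(For anyone curious, per-block compression needs nothing more exotic than
plain zlib; a minimal sketch, with a made-up function name:)

  #include <zlib.h>

  /*
   * Sketch: deflate one block with zlib, so that random access at
   * the block level is preserved.  Returns the compressed size, or
   * -1 on error.
   */
  static long deflate_block(const unsigned char *src, unsigned long srclen,
                            unsigned char *dst, unsigned long dstlen)
  {
      if (compress2(dst, &dstlen, src, srclen, Z_BEST_COMPRESSION) != Z_OK)
          return -1;
      return (long)dstlen;
  }

The trade-off, of course, is paying the inflate cost on every block read,
which makes the block size matter here too.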

Just for fun, here's the decoding script for the prefix-compression:

  perl -e '
while (read(STDIN, $common, 1)) {
  $common = ord($common);
  $rest = <STDIN>;
  if ($common > 0) {
$rest = substr($last, 0, $common) . $rest
  }
  print $rest;
  $last = $rest;
}' <prefix

> > OK, let me try to summarize to see if I understand.
> 
> When Shawn presented the proposal, a couple of colleagues here
> were as excited as I was, but the daring question is, why Shawn
> did not give the whole thing in BNF format from top down:
> 
>   initial-block
>   content-blocks*
>   (index-block)
>   footer

Yeah, I agree it took me a bit to figure out what was going on. A
high-level overview of the format would have been nice.

> >  So lookup really is more
> > like O(block_size * log(n/block_size)), but block_size being a constant,
> > it drops out to O(log n).
> 
> There is also an index block such that you can binary search across
> blocks, so
> 
> O( log(block_count) + log(intra_block_restarting_points) + small linear scan)
> 
> There are 2 binary searches, and the block size is an interesting
> thing to look at when making up trade offs.

Right, the cross-block index was what I was trying to account for.
Either way, from a big-O perspective the block size and the number of
restarts are constants with respect to the total number of entries. I'm
happy with log(n), though. It's hard to do better.

-Peff


Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King  wrote:
>
>>  builtin/branch.c   | 14 +++---
>>  builtin/for-each-ref.c | 22 --
>>  builtin/tag.c  | 30 --
>>  builtin/verify-tag.c   | 12 ++--
>>  ref-filter.c   | 22 --
>>  ref-filter.h   | 22 +-
>>  6 files changed, 70 insertions(+), 52 deletions(-)
>
> The patch looks good to me. So some off-topic comments:
> I reviewed this patch from bottom up, i.e. I started looking at
> ref-filter.h, then  ref-filter.c and then the rest. If only you had formatted
> the patches with an orderfile. ;)

As a reviewer, for this particular patch, I actually appreciated
that ref-filter.[ch] came at the end.  That forced me to think.

When I see something that used to be 0 is lost from the parameter
list of show_ref_array_item() at a callsite, I was forced to guess
what it is by looking at what is moved into the new structure
nearby, and keep doing that "figure out what is going on" game until
the "author's answer" is revealed at the end of the patch.  

By the time I reached ref-filter.[ch], I had a pretty good idea of
what was done just from looking at the callers, and being able to
check whether my understanding matched the "author's answer" at the
end of the patch turned out to be a very good way to double-check
that the patch was sane.  If I were given the author's answer
upfront, especially an answer by somebody as competent as Peff, I'm
sure I would have been swayed into believing that whatever is
written in the patch must be correct, without thinking through the
changes needed in the patch myself.

I do want to present things from documentation to header to code when
I am showing my patch to others, so this is probably a good example
illustrating that the preferred presentation order is not just
personal preference, but differs on occasion even for the same person.


Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> When we put literal ANSI terminal codes into our test
> scripts, it makes diffs on those scripts hard to read (the
> colors may be indistinguishable from diff coloring, or in
> the case of a reset, may not be visible at all).
>
> Some scripts get around this by including human-readable
> names and converting to literal codes with a git-config
> hack. This makes the actual code diffs look OK, but test_cmp
> output suffers from the same problem.
>
> Let's use test_decode_color instead, which turns the codes
> into obvious text tags.

Nice.  Thanks.



Re: [PATCH 02/15] docs/for-each-ref: update pointer to color syntax

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> The documentation for the %(color) placeholder refers to the
> color.branch.* config for more details. But those details
> moved to their own section in b92c1a28f
> (Documentation/config.txt: describe 'color' value type in
> the "Values" section, 2015-03-03).  Let's update our
> pointer. We can steal the text from 30cfe72d3 (pretty: fix
> document link for color specification, 2016-10-11), which
> fixed the same problem in a different place.

Thanks.

> While we're at it, let's give an example, which makes the
> syntax much more clear than just the text.

Nice.  Thanks again.

>
> Signed-off-by: Jeff King 
> ---
>  Documentation/git-for-each-ref.txt | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-for-each-ref.txt 
> b/Documentation/git-for-each-ref.txt
> index 03e187a10..cc42c1283 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -156,8 +156,10 @@ HEAD::
>   otherwise.
>  
>  color::
> - Change output color.  Followed by `:`, where names
> - are described in `color.branch.*`.
> + Change output color. Followed by `:`, where color
> + names are described under Values in the "CONFIGURATION FILE"
> + section of linkgit:git-config[1].  For example,
> + `%(color:bold red)`.
>  
>  align::
>   Left-, middle-, or right-align the content between


[GSoC][PATCH 4/5 v4] submodule: port submodule subcommand 'status' from shell to C

2017-07-13 Thread Prathamesh Chavan
This aims to make git-submodule 'status' a built-in. Hence, the function
cmd_status() is ported from shell to C. This is done by introducing
three functions: module_status(), status_submodule() and print_status().

The function module_status() acts as the front-end of the subcommand.
It parses the subcommand's options and then calls the function
module_list_compute() to compute the list of submodules. Then
this function calls for_each_submodule_list(), looping through the
list obtained.

for_each_submodule_list() then calls status_submodule() for each
submodule in its list. The function status_submodule() is responsible
for generating the status of each submodule it is called for, and it
then calls print_status().

Finally, the function print_status() handles the printing of each
submodule's status.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version of the patch:

Instead of using cmd_diff_files(), the process is optimized
by using ce_match_stat().

Also, the child_process running the command 'rev-parse --verify HEAD'
is removed for optimization reasons, and instead head_ref_submodule()
is used with callback function handle_submodule_head_ref().

The series differs from the complete weekly-update series in that these
patches have been reviewed by mentors as well as on the mailing list
more times than the rest of the patches from the complete series, and
hence, IMO, are ready for inclusion. If not, I would welcome
suggestions for improving them.

The patches pass the complete test suite.

 builtin/submodule--helper.c | 146 
 git-submodule.sh|  49 +--
 2 files changed, 147 insertions(+), 48 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 80f744407..9c1630495 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -560,6 +560,151 @@ static int module_init(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+struct status_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+   unsigned int cached: 1;
+};
+#define STATUS_CB_INIT { NULL, 0, 0, 0 }
+
+static void print_status(struct status_cb *info, char state, const char *path,
+char *sub_sha1, char *displaypath)
+{
+   if (info->quiet)
+   return;
+
+   printf("%c%s %s", state, sub_sha1, displaypath);
+
+   if (state == ' ' || state == '+') {
+   struct argv_array name_rev_args = ARGV_ARRAY_INIT;
+
+   argv_array_pushl(&name_rev_args, "print-name-rev",
+path, sub_sha1, NULL);
+   print_name_rev(name_rev_args.argc, name_rev_args.argv,
+  info->prefix);
+   } else {
+   printf("\n");
+   }
+}
+
+static int handle_submodule_head_ref(const char *refname,
+const struct object_id *oid, int flags,
+void *cb_data)
+{
+   struct strbuf *output = cb_data;
+   if (oid)
+   strbuf_addstr(output, oid_to_hex(oid));
+   return 0;
+}
+
+static void status_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct status_cb *info = cb_data;
+   char *sub_sha1 = xstrdup(oid_to_hex(&list_item->oid));
+   char *displaypath;
+   struct stat st;
+
+   if (!submodule_from_path(null_sha1, list_item->name))
+   die(_("no submodule mapping found in .gitmodules for path 
'%s'"),
+ list_item->name);
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   if (list_item->ce_flags) {
+   print_status(info, 'U', list_item->name,
+sha1_to_hex(null_sha1), displaypath);
+   goto cleanup;
+   }
+
+   if (!is_submodule_active(the_repository, list_item->name)) {
+   print_status(info, '-', list_item->name, sub_sha1, displaypath);
+   goto cleanup;
+   }
+
+   if (!lstat(list_item->name, &st) && !ce_match_stat(list_item, &st, 0)) {
+   print_status(info, ' ', list_item->name, sub_sha1, displaypath);
+   } else {
+   if (!info->cached) {
+   struct strbuf sb = STRBUF_INIT;
+   if (head_ref_submodule(list_item->name,
+  handle_submodule_head_ref, &sb))
+   die(_("could not resolve HEAD ref inside the"
+ "submodule '%s'"), list_item->name);
+   print_status(info, '+', list_item->name, sb.buf,
+displaypath);
+   strbuf_release(&sb);
+   } else {
+   

[GSoC][PATCH 5/5 v4] submodule: port submodule subcommand 'sync' from shell to C

2017-07-13 Thread Prathamesh Chavan
Port the submodule subcommand 'sync' from shell to C using the same
mechanism as that used for porting submodule subcommand 'status'.
Hence, here the function cmd_sync() is ported from shell to C.
This is done by introducing three functions: module_sync(),
sync_submodule() and print_default_remote().

The function print_default_remote() is introduced for printing
the default remote to stdout.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 179 
 git-submodule.sh|  56 +-
 2 files changed, 180 insertions(+), 55 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9c1630495..da91c489b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -44,6 +44,20 @@ static char *get_default_remote(void)
return ret;
 }
 
+static int print_default_remote(int argc, const char **argv, const char *prefix)
+{
+   const char *remote;
+
+   if (argc != 1)
+   die(_("submodule--helper print-default-remote takes no 
arguments"));
+
+   remote = get_default_remote();
+   if (remote)
+   puts(remote);
+
+   return 0;
+}
+
 static int starts_with_dot_slash(const char *str)
 {
return str[0] == '.' && is_dir_sep(str[1]);
@@ -379,6 +393,25 @@ static void module_list_active(struct module_list *list)
*list = active_modules;
 }
 
+static char *get_up_path(const char *path)
+{
+   int i;
+   struct strbuf sb = STRBUF_INIT;
+
+   for (i = count_slashes(path); i; i--)
+   strbuf_addstr(&sb, "../");
+
+   /*
+* Check if 'path' ends with slash or not
+* for having the same output for dir/sub_dir
+* and dir/sub_dir/
+*/
+   if (!is_dir_sep(path[strlen(path) - 1]))
+   strbuf_addstr(&sb, "../");
+
+   return strbuf_detach(&sb, NULL);
+}
+
 static int module_list(int argc, const char **argv, const char *prefix)
 {
int i;
@@ -724,6 +757,150 @@ static int module_name(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+struct sync_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+};
+#define SYNC_CB_INIT { NULL, 0, 0 }
+
+static void sync_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct sync_cb *info = cb_data;
+   const struct submodule *sub;
+   char *sub_key, *remote_key;
+   char *sub_origin_url, *super_config_url, *displaypath;
+   struct strbuf sb = STRBUF_INIT;
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   if (!is_submodule_active(the_repository, list_item->name))
+   return;
+
+   sub = submodule_from_path(null_sha1, list_item->name);
+
+   if (!sub || !sub->url)
+   die(_("no url found for submodule path '%s' in .gitmodules"),
+ list_item->name);
+
+   if (starts_with_dot_dot_slash(sub->url) || starts_with_dot_slash(sub->url)) {
+   char *remote_url, *up_path;
+   char *remote = get_default_remote();
+   char *remote_key = xstrfmt("remote.%s.url", remote);
+
+   if (git_config_get_string(remote_key, &remote_url))
+   remote_url = xgetcwd();
+
+   up_path = get_up_path(list_item->name);
+   sub_origin_url = relative_url(remote_url, sub->url, up_path);
+   super_config_url = relative_url(remote_url, sub->url, NULL);
+
+   free(remote);
+   free(remote_key);
+   free(up_path);
+   free(remote_url);
+   } else {
+   sub_origin_url = xstrdup(sub->url);
+   super_config_url = xstrdup(sub->url);
+   }
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   if (!info->quiet)
+   printf(_("Synchronizing submodule url for '%s'\n"),
+displaypath);
+
+   sub_key = xstrfmt("submodule.%s.url", sub->name);
+   if (git_config_set_gently(sub_key, super_config_url))
+   die(_("failed to register url for submodule path '%s'"),
+ displaypath);
+
+   if (!is_submodule_populated_gently(list_item->name, NULL))
+   goto cleanup;
+
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.git_cmd = 1;
+   cp.dir = list_item->name;
+   argv_array_pushl(&cp.args, "submodule--helper",
+"print-default-remote", NULL);
+   if (capture_command(&cp, &sb, 0))
+   die(_("failed to get the default remote for submodule '%s'"),
+ list_item->name);
+
+   strbuf_strip_suffix(&sb, "\n");
+   remote_key = xstrfmt("remote.%s.url", sb.buf);
+   strbuf_release(&sb);
+
+   child_process_init(&cp);
+   

[GSoC][PATCH 3/5 v4] submodule: port set_name_rev() from shell to C

2017-07-13 Thread Prathamesh Chavan
Function set_name_rev() is ported from git-submodule to the
submodule--helper builtin. The function get_name_rev() generates the
value of the revision name as required, and the function
print_name_rev() handles the formatting and printing of the obtained
revision name.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 63 +
 git-submodule.sh| 16 ++--
 2 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e41572f7a..80f744407 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -244,6 +244,68 @@ static char *get_submodule_displaypath(const char *path, 
const char *prefix)
}
 }
 
+static char *get_name_rev(const char *sub_path, const char* object_id)
+{
+   struct strbuf sb = STRBUF_INIT;
+   const char ***d;
+
+   static const char *describe_bare[] = {
+   NULL
+   };
+
+   static const char *describe_tags[] = {
+   "--tags", NULL
+   };
+
+   static const char *describe_contains[] = {
+   "--contains", NULL
+   };
+
+   static const char *describe_all_always[] = {
+   "--all", "--always", NULL
+   };
+
+   static const char **describe_argv[] = {
+   describe_bare, describe_tags, describe_contains,
+   describe_all_always, NULL
+   };
+
+   for (d = describe_argv; *d; d++) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.dir = sub_path;
+   cp.git_cmd = 1;
+   cp.no_stderr = 1;
+
+   argv_array_push(&cp.args, "describe");
+   argv_array_pushv(&cp.args, *d);
+   argv_array_push(&cp.args, object_id);
+
+   if (!capture_command(&cp, &sb, 0) && sb.len) {
+   strbuf_strip_suffix(&sb, "\n");
+   return strbuf_detach(&sb, NULL);
+   }
+
+   }
+
+   strbuf_release(&sb);
+   return NULL;
+}
+
+static int print_name_rev(int argc, const char **argv, const char *prefix)
+{
+   char *namerev;
+   if (argc != 3)
+   die("print-name-rev only accepts two arguments:  ");
+
+   namerev = get_name_rev(argv[1], argv[2]);
+   if (namerev && namerev[0])
+   printf(" (%s)", namerev);
+   printf("\n");
+
+   return 0;
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -1242,6 +1304,7 @@ static struct cmd_struct commands[] = {
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
{"resolve-relative-url-test", resolve_relative_url_test, 0},
+   {"print-name-rev", print_name_rev, 0},
{"init", module_init, SUPPORT_SUPER_PREFIX},
{"remote-branch", resolve_remote_submodule_branch, 0},
{"push-check", push_check, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index e131760ee..e988167e0 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -759,18 +759,6 @@ cmd_update()
}
 }
 
-set_name_rev () {
-   revname=$( (
-   sanitize_submodule_env
-   cd "$1" && {
-   git describe "$2" 2>/dev/null ||
-   git describe --tags "$2" 2>/dev/null ||
-   git describe --contains "$2" 2>/dev/null ||
-   git describe --all --always "$2"
-   }
-   ) )
-   test -z "$revname" || revname=" ($revname)"
-}
 #
 # Show commit summary for submodules in index or working tree
 #
@@ -1042,14 +1030,14 @@ cmd_status()
fi
if git diff-files --ignore-submodules=dirty --quiet -- "$sm_path"
then
-   set_name_rev "$sm_path" "$sha1"
-   revname=$(git submodule--helper print-name-rev "$sm_path" "$sha1")
say " $sha1 $displaypath$revname"
else
if test -z "$cached"
then
sha1=$(sanitize_submodule_env; cd "$sm_path" && git rev-parse --verify HEAD)
fi
-   set_name_rev "$sm_path" "$sha1"
+   revname=$(git submodule--helper print-name-rev "$sm_path" "$sha1")
say "+$sha1 $displaypath$revname"
fi
 
-- 
2.13.0



[GSoC][PATCH 2/5 v4] submodule--helper: introduce for_each_submodule_list()

2017-07-13 Thread Prathamesh Chavan
Introduce function for_each_submodule_list() and
replace a loop in module_init() with a call to it.

The new function will also be used in other parts of the
system in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7af4de09b..e41572f7a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -14,6 +14,9 @@
 #include "refs.h"
 #include "connect.h"
 
+typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
+ void *cb_data);
+
 static char *get_default_remote(void)
 {
char *dest = NULL, *ret;
@@ -352,17 +355,30 @@ static int module_list(int argc, const char **argv, const char *prefix)
return 0;
 }
 
-static void init_submodule(const char *path, const char *prefix, int quiet)
+static void for_each_submodule_list(const struct module_list list,
+   submodule_list_func_t fn, void *cb_data)
 {
+   int i;
+   for (i = 0; i < list.nr; i++)
+   fn(list.entries[i], cb_data);
+}
+
+struct init_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+};
+#define INIT_CB_INIT { NULL, 0 }
+
+static void init_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct init_cb *info = cb_data;
const struct submodule *sub;
struct strbuf sb = STRBUF_INIT;
char *upd = NULL, *url = NULL, *displaypath;
 
-   /* Only loads from .gitmodules, no overlay with .git/config */
-   gitmodules_config();
-   displaypath = get_submodule_displaypath(path, prefix);
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
 
-   sub = submodule_from_path(null_sha1, path);
+   sub = submodule_from_path(null_sha1, list_item->name);
 
if (!sub)
die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -374,7 +390,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 *
 * Set active flag for the submodule being initialized
 */
-   if (!is_submodule_active(the_repository, path)) {
+   if (!is_submodule_active(the_repository, list_item->name)) {
strbuf_addf(&sb, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
}
@@ -416,7 +432,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
if (git_config_set_gently(sb.buf, url))
die(_("Failed to register url for submodule path '%s'"),
displaypath);
-   if (!quiet)
+   if (!info->quiet)
fprintf(stderr,
_("Submodule '%s' (%s) registered for path 
'%s'\n"),
sub->name, url, displaypath);
@@ -445,10 +461,10 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 
 static int module_init(int argc, const char **argv, const char *prefix)
 {
+   struct init_cb info = INIT_CB_INIT;
struct pathspec pathspec;
struct module_list list = MODULE_LIST_INIT;
int quiet = 0;
-   int i;
 
struct option module_init_options[] = {
OPT__QUIET(&quiet, N_("Suppress output for initializing a submodule")),
@@ -473,8 +489,11 @@ static int module_init(int argc, const char **argv, const char *prefix)
if (!argc && git_config_get_value_multi("submodule.active"))
module_list_active(&list);
 
-   for (i = 0; i < list.nr; i++)
-   init_submodule(list.entries[i]->name, prefix, quiet);
+   info.prefix = prefix;
+   info.quiet = !!quiet;
+
+   gitmodules_config();
+   for_each_submodule_list(list, init_submodule, &info);
 
return 0;
 }
-- 
2.13.0



[GSoC][PATCH 1/5 v4] submodule--helper: introduce get_submodule_displaypath()

2017-07-13 Thread Prathamesh Chavan
Introduce function get_submodule_displaypath() to replace the code
occurring in init_submodule() for generating the displaypath of the
submodule with a call to it.

This new function will also be used in other parts of the system
in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6abdad329..7af4de09b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -220,6 +220,27 @@ static int resolve_relative_url_test(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+   const char *super_prefix = get_super_prefix();
+
+   if (prefix && super_prefix) {
+   BUG("cannot have prefix '%s' and superprefix '%s'",
+   prefix, super_prefix);
+   } else if (prefix) {
+   struct strbuf sb = STRBUF_INIT;
+   char *displaypath = xstrdup(relative_path(path, prefix, &sb));
+   strbuf_release(&sb);
+   return displaypath;
+   } else if (super_prefix) {
+   int len = strlen(super_prefix);
+   const char *format = is_dir_sep(super_prefix[len - 1]) ? "%s%s" : "%s/%s";
+   return xstrfmt(format, super_prefix, path);
+   } else {
+   return xstrdup(path);
+   }
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -339,16 +360,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 
/* Only loads from .gitmodules, no overlay with .git/config */
gitmodules_config();
-
-   if (prefix && get_super_prefix())
-   die("BUG: cannot have prefix and superprefix");
-   else if (prefix)
-   displaypath = xstrdup(relative_path(path, prefix, &sb));
-   else if (get_super_prefix()) {
-   strbuf_addf(&sb, "%s%s", get_super_prefix(), path);
-   displaypath = strbuf_detach(&sb, NULL);
-   } else
-   displaypath = xstrdup(path);
+   displaypath = get_submodule_displaypath(path, prefix);
 
sub = submodule_from_path(null_sha1, path);
 
@@ -363,7 +375,6 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 * Set active flag for the submodule being initialized
 */
if (!is_submodule_active(the_repository, path)) {
-   strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
}
-- 
2.13.0



Re: reftable: new ref storage format

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King  wrote:
> On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote:
>
>> We've been having scaling problems with insane number of references
>> (>866k), so I started thinking a lot about improving ref storage.
>>
>> I've written a simple approach, and implemented it in JGit.
>> Performance is promising:
>>
>>   - 62M packed-refs compresses to 27M
>>   - 42.3 usec lookup
>
> Exciting. I'd love for us to have a simple-ish on-disk structure that
> scales well and doesn't involve a dependency on a third-party database
> structure.
>
> Let me see what holes I can poke in your proposal, though. :)
>
>> ### Problem statement
>>
>> Some repositories contain a lot of references (e.g.  android at 866k,
>> rails at 31k).  The existing packed-refs format takes up a lot of
>> space (e.g.  62M), and does not scale with additional references.
>> Lookup of a single reference requires linearly scanning the file.
>
> I think the linear scan is actually an implementation short-coming. Even
> though the records aren't fixed-length, the fact that newlines can only
> appear as end-of-record is sufficient to mmap and binary search a
> packed-refs file (you just have to backtrack a little when you land in
> the middle of a record).

Except that a record is a "delta" to the previous record, so it's not
just finding a record, but reconstructing it. Example for records:

varint( prefix_length )
varint( (suffix_length << 2) | type )
suffix
value?

First record:

 0,
 16 << 2 | 0x01,
  "refs/heads/maint"
  08f9c32463bf9e578acb7ac5f77afd36e803c6bc

next record (refs/heads/master):

  13
  4 << 2 | 0x01
  "ster",
  80145b1e412719c960036c8c62a9e35dd23a5b2d

Now if you found the second one, you cannot reconstruct its
real name (refs/heads/master) without knowing the name
of the first. The name of the first is easy because the prefix_length
is 0. If it also had a prefix length != 0 you'd have to go back more.
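
To make that concrete, a decoder has to carry the previous name along
as mutable state; a minimal sketch (illustrative C, not the JGit code;
it assumes a plain little-endian base-128 varint, which may not be the
exact encoding JGit uses, a 40-byte hex value for type 0x1, and does no
bounds checking):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* assumed encoding: 7 bits per byte, high bit set = continuation */
  static uint64_t get_varint(const unsigned char **p)
  {
      uint64_t v = 0;
      int shift = 0;
      do {
          v |= (uint64_t)(**p & 0x7f) << shift;
          shift += 7;
      } while (*(*p)++ & 0x80);
      return v;
  }

  static void decode_records(const unsigned char *p, const unsigned char *end)
  {
      char name[4096]; /* state carried from one record to the next */

      while (p < end) {
          uint64_t prefix = get_varint(&p);
          uint64_t word = get_varint(&p);
          uint64_t suffix = word >> 2;

          /* keep the first `prefix` bytes of the previous name */
          memcpy(name + prefix, p, suffix);
          name[prefix + suffix] = '\0';
          p += suffix;

          if ((word & 0x3) == 0x1)
              p += 40; /* skip the hex object id value */

          printf("%s\n", name);
      }
  }

This is also why the restart points mentioned elsewhere in the thread
matter: a reader can only start decoding at an offset where
prefix_length is known to be 0.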


>> - Occupy less disk space for large repositories.
>
> Good goal.  Just to play devil's advocate, the simplest way to do that
> with the current code would be to gzip packed-refs (and/or store sha1s
> as binary). That works against the "mmap and binary search" plan,
> though. :)

Given the compression by delta-ing the name to the previous change and
the fact that Gerrit has

  refs/heads/changes/1
  refs/heads/changes/2
  refs/heads/changes/3
  ...

I think this format would trump a "dumb" zip.
(Github having sequentially numbered pull requests would also
benefit here)

>> ## File format
>
> OK, let me try to summarize to see if I understand.

When Shawn presented the proposal, a couple of colleagues here
were as excited as I was, but the daring question is, why Shawn
did not give the whole thing in BNF format from top down:

  initial-block
  content-blocks*
  (index-block)
  footer

> The reftable file is a sequence of blocks, each of which contains a
> finite set of heavily-compressed refs. You have to read each block
> sequentially,

Each block may have restarting points, that allow for intra-block
binary search.

> but since they're a fixed size, that's still a
> constant-time operation (I'm ignoring the "restarts" thing for now). You
> find the right block by reading the index.

or by reading the footer at the end. If the footer and the index differ
in block size (one bit flipped), we can ask the CRC of the footer
for more guidance.

>  So lookup really is more
> like O(block_size * log(n/block_size)), but block_size being a constant,
> it drops out to O(log n).

There is also an index block such that you can binary search across
blocks, so

O( log(block_count) + log(intra_block_restarting_points) + small linear scan)

There are 2 binary searches, and the block size is an interesting
thing to look at when making up trade offs.


Re: "groups of files" in Git?

2017-07-13 Thread Nikolay Shustov
For me, the roadblock with multiple iterations through merging of the
different parts (S, C, then C+S) is the time that would be spent
rebuilding the mainline.
That's why I would like to have C+S in the same source tree and then
run tests for S, tests for C (if they can be run standalone) and C+S
tests, then tests for whatever other pieces may be affected. (As I
mentioned, there are more layers than client + server in my situation,
e.g. client + transport + server.)

I am not really trying to ignite a holy war between Perforce and Git
(and why would one?), but if you are interested in how you would handle
this scenario in Perforce, the answer would be: "use shelved
changelists".
In Perforce, you can "shelve" a changelist, similar to "stash" in
Git, but the difference is that Perforce shelved changes are
accessible across clients. I.e. another developer can "unshelve"
these pending changes into their sandbox (on the same or a different
branch) so that the sandbox gets the pending changes as well, as if
that developer had made the changes themselves. Whatever
automated/manual process is involved, it is typical to run a trial
build and tests on a shelved changelist (i.e. files not yet committed)
to verify the quality of the changes.
Git achieves the same through the ease of manipulation with branches
and I like the way it does it much more.

My question was about how to robustly handle "multiple pending
commits", which in Perforce are represented by the concept of pending
changelists.

On Thu, Jul 13, 2017 at 2:22 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov
>>
>>> With Git I cannot seem to find a way to achieve the same result.
>>> And the problem is that putting change sets on different Git
>>> branches (or workdirs, or whatever Git offers that makes the
>>> changes NOT be in the same source tree) is not a viable option for
>>> me, as I would have to re-build code as I re-integrate the changes
>>> between the branches (or whatever change-separation Git feature is
>>> used).
>>
>> you would merge the branches and then run the tests/integration. Yes that
>> seems cumbersome.
>
> Sometimes the need to make trial merge for testing cannot be avoided
> and having branches for separate topics is the only sensible
> approach, at least in the Git world.
>
> Imagine your project has two components that are interrelated, say,
> the server and the client, that have to work well with each other.
> In addition, you want to make sure your updated server works well
> with existing clients, and vice versa.
>
> One way that naturally maps this scenario to the development
> workflow is to have a server-update topic and a client-update topic
> branches, and separate changes to update each side with their own
> commits:
>
>  s---s---Sserver-update topic
> /
> ---o---ooMmainline
> \
>  c---c---Cclient-update topic
>
> And during the development of these *-update topics, you try three
> merges:
>
>  (1) Merge S to the mainline M and test the whole thing, to make sure
>  that existing client will still be able to talk with the
>  updated server.
>
>  (2) Merge C to the mainline M and test the whole thing, to make
>  sure that updated clients will still be able to talk with the
>  existing server.
>
>  (3) Merge both S and C to the mainline M and test the whole thing,
>  to make sure the updated ones talk to each other.
>
> If there is no significant development going on on the mainline in
> the meantime, (1) and (2) can be done by trying out S and C alone
> without making a trial merge with M.  The same for (3)---it can be
> just a trial merge between S and C without updates that happened on
> the mainline.
>
> I'd love to hear from folks in Perforce or other world how they
> address this scenario with their system.


Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Christian Couder
On Thu, Jul 13, 2017 at 8:40 PM, Jeff King  wrote:
> On Thu, Jul 13, 2017 at 11:29:10AM -0700, Junio C Hamano wrote:
>
>> > So then I think your config file primarily becomes about defining the
>> > properties of each run. I'm not sure if it would look like what you're
>> > starting on here or not.
>>
>> Yeah, I suspect that the final shape that defines the matrix might
>> have to become quite a bit different.
>
> I think it would help if the perf code was split better into three
> distinct bits:
>
>   1. A data-store capable of storing the run tuples along with their
>  outcomes for each test.
>
>   2. A "run" front-end that runs various profiles (based on config,
>  command-line options, etc) and writes the results to the data
>  store.
>
>   3. A flexible viewer which can slice and dice the contents of the data
>  store according to different parameters.
>
> We're almost there now. The "run" script actually does store results,
> and you can view them via "aggregate.pl" without actually re-running the
> tests. But the data store only indexes on one property: the tree that
> was tested (and all of the other properties are ignored totally; you can
> get some quite confusing results if you do a "./run" using say git.git
> as your test repo, and then a followup with "linux.git").

Yeah I agree, but if possible I'd like to avoid working on the three
different parts at the same time.

I haven't thought much about how to improve the data store yet.
I may have to look at that soon though.

> I have to imagine that somebody else has written such a system already
> that we could reuse.  I don't know of one off-hand, but this is also not
> an area where I've spent a lot of time.

Actually, about the viewer, AEvar suggested having something like
speed.python.org and speed.pypy.org, which seem to be made using
https://github.com/tobami/codespeed

So unless something else is suggested, I plan to make it possible to
import the results of the perf tests into codespeed, but I haven't
looked at that much yet.

> We're sort of drifting off topic from Christian's patches here. But if
> we did have a third-party system, I suspect the interesting work would
> be setting up profiles for the "run" tool to kick off. And we might be
> stuck in such a case using whatever format the tool prefers. So having a
> sense of what the final solution looks like might help us know whether
> it makes sense to introduce a custom config format here.

I don't think we should completely switch to a third-party system for
everything.
Though it would simplify my work if we decide to do that.

I think people might want different viewers, so we should just make
sure that we can easily massage the results from the run script, so
that it will be easy to provide them as input to many different
viewers.

So we are pretty free to decide how we specify which tests should be
performed on which revision, and I think a config file is the best
way.


Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs

2017-07-13 Thread Jonathan Tan
On Wed, 12 Jul 2017 13:29:11 -0400
Jeff Hostetler  wrote:

> My primary concern is scale and managing the list of objects over time.
> 
> My fear is that this list will be quite large.  If we only want to omit
> the very large blobs, then maybe not.  But if we want to expand that
> scope to also omit other objects (such as a clone synchronized with a
> sparse checkout), then that list will get large on large repos.  For
> example, on the Windows repo we have (conservatively) 100M+ blobs (and
> growing).  Assuming 28 bytes per, gives a 2.8GB list to be manipulated.
> 
> If I understand your proposal, newly-omitted blobs would need to be
> merged into the promised-blob list after each fetch.  The fetch itself
> may not have that many new entries, but inserting them into the existing
> list will be slow.  Also, mmap'ing and bsearch'ing will likely have
> issues.  And there's likely to be a very expensive step to remove
> entries from the list as new blobs are received (or locally created).
> 
> In such a "sparse clone", it would be nice to omit unneeded tree objects
> in addition to just blobs.   I say that because we are finding with GVFS
> on the Windows repo, that even with commits-and-trees-only filtering,
> the number of tree objects is overwhelming.

I know that discussion has shifted to the possibility of not having this
list at all, and not sending size information together with the fetch,
but going back to this...maybe omitting trees *is* the solution to both
the large local list and the large amount of size information needing to
be transferred.

So the large-blob (e.g. Android) and many-blob (e.g. Windows) cases
would look like this:

 * Large-blob repositories have no trees omitted and a few blobs
   omitted, and we have sizes for all of them.
 * Many-blob repositories have many trees omitted and either all
   blobs omitted (and we have size information for them, useful for FUSE
   or FUSE-like things, for example) or possibly no blobs omitted (for
   example, if shallow clones are going to be the norm, there won't be
   many blobs to begin with if trees are omitted).

This seems better than an intermediate solution for the many-blob
repository case in which we still keep all the trees but also try to
avoid sending and storing as much information about the blobs as
possible, because that doesn't seem to provide us with much savings
(because the trees as a whole are just as large, if not larger, than the
blob information).

> So I'm also concerned about
> limiting the list to just blobs.  If we need to have this list, it
> should be able to contain any object.  (Suggesting having an object type
> in the entry.)

This makes sense - I'll add it in.

> I also have to wonder about the need to have a complete list of omitted
> blobs up front.  It may be better to just relax the consistency checks
> and assume a missing blob is "intentionally missing" rather than
> indicating a corruption somewhere.  And then let the client do a later
> round-trip to either demand-load the object -or- demand-load the
> existence/size info if/when it really matters.
> 
> Maybe we should add a verb to your new fetch-blob endpoint to just get
> the size of one or more objects to help with this.

If we allow the omission of trees, I don't think the added complexity of
demand-loading sizes is worth it.

What do you think of doing this:
 * add a "type" field to the list of promised objects (formerly the list
   of promised blobs)
 * retain mandatory size for blobs
 * retain single file containing list of promised objects (I don't feel
   too strongly about this, but it has a slight simplicity and
   in-between-GC performance advantage)
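
Concretely, an entry along those lines might look something like this
(a sketch of the proposal only; the names are made up and nothing here
is committed code):

  #include <stdint.h>

  /*
   * One entry in the single promised-objects file.  A 20-byte SHA-1
   * plus an 8-byte size is roughly the ~28 bytes/entry mentioned
   * earlier in the thread; the type field is the addition proposed
   * here.
   */
  struct promised_object {
      unsigned char sha1[20]; /* object name */
      uint8_t type;           /* OBJ_BLOB, OBJ_TREE, OBJ_COMMIT, ... */
      uint64_t size;          /* mandatory for blobs */
  };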


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> [1] Another sticking point is that this really does need to be in the
>> reflog of the ref we are pushing (and not, e.g., HEAD). But one does
>> not always push from a ref. I suspect that's OK in practice, though.
>> If you are doing "git push --force-with-lease HEAD~2:master", it is
>> probably OK for us to error out with "uh, lease from what?".
>
> I actually expect this to bite me personally, as I often do not
> rewind the actual "topic" ref that is my target of rewriting,
> because I am a chicken---I detach the HEAD and build a history there
> to see if I can come up with a better history, compare the result
> with what is on the "topic" (which is not touched at all during the
> rewriting), and after all is done, do a "checkout -B topic".  The
> "remote tip must appear in the local reflog" rule will never be
> satisfied.

Well "bite" is not quite accurate, as I do not push to a repository
racing with others, and there is no reason for me to use --force
with lease.  I always push '+pu', and push '+next" once after each
release, and there is no need for any lease when I push things out.

But if I were collaborating with others in another project that uses
a shared central repository workflow, and if I were in the mood to
see how well/better this "improved safer DWIM" mode behaves, then it
would bite me.



Re: [PATCH] submodule: use cheaper check for submodule pushes

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano  wrote:

>
> I think Jonathan's question (which I concurred) is if we also ended
> up relying on the side effect of calling that function (i.e. being
> able to now find objects that are not in our repository but in the
> submodule's object store).  By looking at the eb21c732d6, we can
> tell that the original didn't mean to and didn't add any code that
> relies on the ability to be able to read from the submodule object
> store.  I am not sure if that is still true after 5 years (i.e. is
> there any new code added in the meantime that made us depend on the
> ability to read from submodule object store?).

Yes we are safe, because the function itself only spawns a child process
(not using any of the objects).

Its only caller, push_unpushed_submodules, also doesn't rely on objects
loaded after calling push_submodule.

The caller of push_unpushed_submodules (transport.c, transport_push)
also doesn't need submodule objects loaded.

> My hunch (and hope) is that we are probably safe, but that is a lot
> weaker than "yes this is a good change we want to apply".

Given the above (I went through the code), all I can do is repeating
"yes this is a good change we want to apply".

Thanks,
Stefan


Re: Git on macOS shows committed files as untracked

2017-07-13 Thread Junio C Hamano
Torsten Bögershausen  writes:

> Thanks for the fast analyzes -
> in short:
> what does
> git -c core.precomposeunicode=true status
> say ?
>
> The easiest thing may be to set
> git config --global core.precomposeunicode true

Good suggestion.

I learned a new thing today.  I somehow thought that the precompose
trick was only about argv[] when a program starts up and did not
apply to paths readdir(3) finds through dir.c, e.g.

$ git add .

But apparently there is a replacement readdir() used in compat/ for
MacOSX, so the paths from the system are also covered by the
configuration.
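
The mismatch is easy to see at the byte level; here is a standalone
illustration (not git code) of why an NFC name stored in the index
never string-compares equal to the NFD name readdir() returns on HFS+:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      const char nfc[] = "\xc3\xa4";  /* "a-umlaut" precomposed: U+00E4 */
      const char nfd[] = "a\xcc\x88"; /* decomposed: 'a' + U+0308 */

      /* prints a non-zero value: same visible name, different bytes */
      printf("%d\n", strcmp(nfc, nfd));
      return 0;
  }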


Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote:

> We've been having scaling problems with insane number of references
> (>866k), so I started thinking a lot about improving ref storage.
> 
> I've written a simple approach, and implemented it in JGit.
> Performance is promising:
> 
>   - 62M packed-refs compresses to 27M
>   - 42.3 usec lookup

Exciting. I'd love for us to have a simple-ish on-disk structure that
scales well and doesn't involve a dependency on a third-party database
structure.

Let me see what holes I can poke in your proposal, though. :)

> ### Problem statement
> 
> Some repositories contain a lot of references (e.g.  android at 866k,
> rails at 31k).  The existing packed-refs format takes up a lot of
> space (e.g.  62M), and does not scale with additional references.
> Lookup of a single reference requires linearly scanning the file.

I think the linear scan is actually an implementation short-coming. Even
though the records aren't fixed-length, the fact that newlines can only
appear as end-of-record is sufficient to mmap and binary search a
packed-refs file (you just have to backtrack a little when you land in
the middle of a record).
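
In sketch form it looks like this (simplified, and not the actual proof
of concept: it assumes every record is "<40-hex-sha1> SP <refname> LF"
in sorted order, the buffer is well-formed and newline-terminated, and
the header and peeled "^" lines are ignored):

  #include <stddef.h>
  #include <string.h>

  static const char *find_ref(const char *buf, size_t len, const char *refname)
  {
      size_t reflen = strlen(refname);
      size_t lo = 0, hi = len;

      while (lo < hi) {
          size_t mid = lo + (hi - lo) / 2;
          const char *name;
          int cmp;

          /* backtrack to the start of the record we landed in */
          while (mid > lo && buf[mid - 1] != '\n')
              mid--;

          name = buf + mid + 41; /* skip "<sha1> " */
          cmp = strncmp(name, refname, reflen);
          if (!cmp && name[reflen] == '\n')
              return buf + mid;  /* found the record */

          if (cmp < 0) {
              /* this record sorts before us: skip past it */
              const char *nl = memchr(name, '\n', buf + len - name);
              lo = (nl - buf) + 1;
          } else {
              /* sorts after us (or has refname as a proper prefix) */
              hi = mid;
          }
      }
      return NULL;
  }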

I wrote a proof of concept a while ago, but got stuck on integrating it
into the ref code, because of some of the assumptions that it made.
Michael Haggerty has been doing several rounds of refactors to remove
those assumptions. I think we're pretty close (I've actually seen the
endgame where packed-refs is fully binary searched, but I think there
are a few more cleanups necessary to cover all cases).

> Atomic pushes modifying multiple references require copying the
> entire packed-refs file, which can be a considerable amount of data
> moved (e.g. 62M in, 62M out) for even small transactions (2 refs
> modified).

I think your definition of atomic here doesn't match what git.git does.

Our atomic push just takes the lock on all of the refs, and then once it
has all of them, commits all of the locks. So it's atomic in the sense
that you either get all or none of the writes (modulo a commit failure
in the middle, which we naturally have no rollback plan for). But it can
be done without touching the packed-refs file at all.

I imagine that you're looking at atomicity from the perspective of a
reader. In the git.git scheme, the reader may see a half-committed
transaction. If you dispense with loose refs entirely and treat the
packed-refs file as a single poorly-implemented key/value database, then
you get reader atomicity (but O(size_of_database) write performance).

> Repositories with many loose references occupy a large number of disk
> blocks from the local file system, as each reference is its own file
> storing 41 bytes.  This negatively affects the number of inodes
> available when a large number of repositories are stored on the same
> filesystem.  Readers are also penalized due to the larger number of
> syscalls required to traverse and read the `$GIT_DIR/refs` directory.

In my experience, the syscalls involved in loose refs aren't usually a
big deal. If you have 800k refs, they're not all changing constantly. So
a single pack-refs "fixes" performance going forward. What _is_ a big
deal is that the packing process is complicated, readers have a very
un-atomic view because of the myriad of files involved, and you get
annoying lock contention during packing, as well as between deletions
that have to rewrite packed-refs.

But I'm not sure if you meant to contrast here a system where we didn't
use packed-refs at all (though of course such a system is very much not
atomic by the definition above).


> ### Objectives
> 
> - Near constant time lookup for any single reference, even when the
>   repository is cold and not in process or kernel cache.

Good goal, though TBH I'd be happy with O(log n).

A related one is being able to traverse a subset of refs in
O(nr_traversed). E.g., "git tag -l" should not have to do work
proportional to what is in refs/changes. That falls out of most
proposals that allow fast lookups, but notably not a straight
hash-table.

> - Occupy less disk space for large repositories.

Good goal.  Just to play devil's advocate, the simplest way to do that
with the current code would be to gzip packed-refs (and/or store sha1s
as binary). That works against the "mmap and binary search" plan,
though. :)

> - Support atomic pushes with lower copying penalities.

"Lower" is vague. I'd hope we could do updates with effort linear to the
number of updated refs (it's OK if there's a constant factor, like
writing extra blocks, as long as a single-ref update is about as
expensive in a 10-ref repo as in a 10K-ref repo).

> Scan (read 866k refs) and lookup (single ref from 866k refs):
> 
format      | scan    | lookup
:-----------|--------:|--------------:
> packed-refs |  380 ms | 375420.0 usec
> reftable|  125 ms | 42.3 usec

Don't forget in git.git that the memory usage for packed-refs is
atrocious (because we parse 

Re: "groups of files" in Git?

2017-07-13 Thread Nikolay Shustov
Thank you, but I am not sure I quite understand the idea.
Could you please elaborate on it for the following example?

I have two Perforce changelists ("A" and "B") that group uncommitted
sets of files (paths to each of files could be different):

changelist A:
file1
file2

changelist B:
file3
file4

In Perforce, I am able to do the following:
- move files between changelists (e.g. file1 could be moved to changelist B)
- add new files to a changelist (e.g. changelist B can get an additional file5)
- revert file changes, which would effectively remove the file from the
changelist (e.g. reverting file2 will remove it from changelist A)

How would I do it with sets of files that would belong to Git commit?


On Thu, Jul 13, 2017 at 2:09 PM, Junio C Hamano  wrote:
> Nikolay Shustov  writes:
>
>> Thank you for the detailed explanation, it looks like merging the
>> commits would be helpful in my case. And I think it is a very good
>> analogy that Perforce changelists are like multiple pending committs,
>> if Git were supporting such.
>>
>> What it won't be achieving by using commits in this schema is the
>> following thing I can do in Perforce:
>> In the uncommitted Perforce changelists I can revert the changed file
>> to the original state and move the files between the changelists.
>> Quite often, while working on something, in the middle I would decide
>> to isolate changes to a certain set of files into a separate changelist
>> - but then I might change my mind. It is all flexible until I actually
>> commit my Perforce changelist, after which it becomes very much like
>> committed changes in any other source control.
>> This is the actual flexibility I am looking to achieve in Git.
>
> I actually think we already have such a flexibility.  Unlike
> Perforce, Git is distributed, and the most important aspect of the
> distinction is that what happens _in_ your local Git repository may
> be called "committed" in Git lingo, but not visible to the public.
>
> You can consider these commits you make in your repository "pending"
> when you think of your workflow in Perforce terms, until you merge
> and push out the result, which roughly corresponds to "submitting"
> in Perforce lingo.
>
> Once you start treating your local commits that you haven't pushed
> out as changes that are still "pending" when observed from the
> outside world, you'd realize that you have as much flexibility, if
> not more, to dice and slice them with the local tools like "rebase
> -i", "add -p", etc., as you would have in your Perforce workflow,
> I would think.
>
>


Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes

2017-07-13 Thread Junio C Hamano
Miguel Torroja  writes:

> I've just sent in reply to your previous e-mail three different patches.
>
> * The first patch is just to show some broken tests,
> * Second patch is to fix the original issue I had (the one that
> initiated this thread)
> * Third patch is the one that filters out "info" messages in p4CmdList
> (this time the default is reversed and set to False, which is the
> original behaviour). The two test cases that are cured with this
> change have to set skip_info=True explicitly.

The approach looks reasonable.  By having tests that expect failure
upfront, the series clearly shows how the code changes in later
steps make things better.

Thanks.  Will replace.


Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

>> > @@ -59,7 +54,8 @@ EOF
>> >  # to this test since it does not contain any decoration, hence 
>> > --first-parent
>> >  test_expect_success 'Commit Decorations Colored Correctly' '
>> > git log --first-parent --abbrev=10 --all --decorate --oneline 
>> > --color=always |
>> > -   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
>> > +   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
>> > +   test_decode_color >out &&
>> 
>> Just some thoughts:
>> 
>> This extension of the pipe-chain is not making it worse as git's exit code
>> is already hidden.
>
> Yes, I noticed the existing pipe-chain. We can fix it while we're here,
> though I think it's not a big deal in practice.
>
>> The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because
>> I would have expected it to break in the future with e.g. the sha1 to longer
>> hash conversion. But as we specify --abbrev=10, this seems future proof.
>> In an ideal world this would be encapsulated in a function (c.f. 
>> t/diff-lib.sh).

Actually, --abbrev=10 does not guarantee that the hex part is always
10 bytes long, so it is not future-proofing, but I expect it would
work out fine in practice.

> I agree it's a bit gross. Possibly:
>
>   git log --format='%C(auto)%d %s'
>
> would be easier for the test to parse (I'm pretty sure that didn't exist
> back when this test was written).

Yeah, that may make the test easier to read, too.

Thanks.


Re: [PATCH] commit & merge: modularize the empty message validator

2017-07-13 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> I have a few doubts for which I need clarification to move on with
> this.
>
> 1. If we abort when the <message> part is empty wouldn't it be too
> restrictive?
>
> IOW, wouldn't it affect users of "git commit --cleanup=verbatim"
> who wish to commit only the comments or parts of it ?
> (I'm not sure if someone would find that useful)
>
> 2. Is it ok to use the "find_trailer_start" function of "trailer.c"
> to locate the trailer? 
>
> Note: It has a little issue in that it wouldn't detect the trailer if
> the message comprises one trailer alone and no other text. This
> case occurs while aborting a commit started using "git commit -s".
> Any possibilities to overcome the issue?
>
> 3. Ignoring point 1 for now, What other helper methods except the
> ones listed below could be helpful in separating the cleaned-up
> commit message into the <message>, <trailer> and <comment> parts?
>
> * ignore_non_trailer
> * find_trailer_start

All good points; if it bothers you too much that "commit" and "merge"
define "emptiness" of the buffer differently, I think you could
persuade me to unify them to "the buffer _must_ contain no bytes",
i.e. not special-casing sign-off lines as only "commit" currently does.

It would be a backward incompatible tightening of the established
rule, but it may not be a bad change.


Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> ... But if
> we did have a third-party system, I suspect the interesting work would
> be setting up profiles for the "run" tool to kick off. And we might be
> stuck in such a case using whatever format the tool prefers. So having a
> sense of what the final solution looks like might help us know whether
> it makes sense to introduce a custom config format here.

Agreed.

Thanks.


Re: [PATCH] RFC: Introduce '.gitorderfile'

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 12:12 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Thu, Jul 13, 2017 at 8:59 AM, Jeff King  wrote:
>>>> This triggers two reactions for me:
>>>>
>>>> (a) We should totally do that.
>>>>
>>>> (b) It's a rabbit hole to go down.
>>>
>>> And yes, I had both of those reactions, too. We've had the
>>> "project-level .gitconfig" discussion many times over the years. And it
>>> generally comes back to "you can ship a snippet of config and then give
>>> people a script which adds it to their repo".
>>
>> I see this "project-level .gitconfig" via the .gitmodules file.
>> See GITMODULES(5), anything except submodule..path is
>> just project-level .gitconfig,...
>
> They were from day one meant as a suggestion made by the project.
> You do not have to follow them and you are free to update them,
> i.e. after "submodule init" copied URL to point at a closer mirror,
> for example.

The URL is somewhat special, as copying it into .git/config
also marks the submodule as interesting (no matter if the URL is
changed by the user).

The point I was trying to make is best demonstrated in
t5526-fetch-submodules.sh:

> ok 7 - using fetchRecurseSubmodules=true in .gitmodules recurses into 
> submodules
> ok 8 - --no-recurse-submodules overrides .gitmodules config
> ok 9 - using fetchRecurseSubmodules=false in .git/config overrides setting in 
> .gitmodules

They were not suggestions, but defaults dictated by the project.

If the user did not change their config, they did as the project
said. I was not there on day one to remember if they are merely
meant as suggestions, but their behavior is asserted by the tests.
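
(The precedence those tests assert is easy to see from the command
line; a sketch, with "sub" as a placeholder submodule name:)

    # the project-provided default lives in .gitmodules:
    $ git config -f .gitmodules submodule.sub.fetchRecurseSubmodules
    true

    # a user setting in .git/config overrides it:
    $ git config submodule.sub.fetchRecurseSubmodules false
    $ git fetch                       ;# does not recurse into "sub"

    # and the command line overrides both:
    $ git fetch --recurse-submodules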


Re: [PATCH 0/15] making user-format colors conditional on config/tty

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 7:55 AM, Jeff King  wrote:
> This is a cleanup of the patch I posted last October:
>
>   
> https://public-inbox.org/git/20161010151517.6wszhuyp57yfn...@sigill.intra.peff.net/
>
> The general idea is that it's rather confusing that "%C(red)" in a
> pretty-print format does not currently respect color.ui, --no-color, or
> the usual isatty check on stdout. This series changes that. Note that
> this is a backwards-incompatible change, but the general sentiment in
> that earlier thread seemed to be that the existing behavior is arguably
> buggy. See patch 14 for more discussion.
>
> The patch stalled back then because I wanted to make sure that
> ref-filter's color placeholders behaved the same. That required some
> refactoring which conflicted badly with kn/ref-filter-branch-list. Now
> that it has graduated, I was able to rebase on top.
>
> This version also takes into account feedback from the original thread.
> And as I added tests, it surfaced a few corner cases around color config
> that I've dealt with here.  The last two patches are the most
> interesting bits.
>

I have reviewed these slightly and think this series is a good change.

Thanks,
Stefan


Re: [PATCH] RFC: Introduce '.gitorderfile'

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Jul 13, 2017 at 8:59 AM, Jeff King  wrote:
>>> This triggers two reactions for me:
>>>
>>> (a) We should totally do that.
>>>
>>> (b) It's a rabbit hole to go down.
>>
>> And yes, I had both of those reactions, too. We've had the
>> "project-level .gitconfig" discussion many times over the years. And it
>> generally comes back to "you can ship a snippet of config and then give
>> people a script which adds it to their repo".
>
> I see this "project-level .gitconfig" via the .gitmodules file.
> See GITMODULES(5), anything except submodule.<name>.path is
> just project-level .gitconfig,...

They were from day one meant as a suggestion made by the project.
You do not have to follow them and you are free to update them,
i.e. after "submodule init" copied URL to point at a closer mirror,
for example.


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> [1] Another sticking point is that this really does need to be in the
> reflog of the ref we are pushing (and not, e.g., HEAD). But one does
> not always push from a ref. I suspect that's OK in practice, though.
> If you are doing "git push --force-with-lease HEAD~2:master", it is
> probably OK for us to error out with "uh, lease from what?".

I actually expect this to bite me personally, as I often do not
rewind the actual "topic" ref that is my target of rewriting,
because I am a chicken---I detach the HEAD and build a history there
to see if I can come up with a better history, compare the result
with what is on the "topic" (which is not touched at all during the
rewriting), and after all is done, do a "checkout -B topic".  The
"remote tip must appear in the local reflog" rule will never be
satisfied.

>> I wonder if this could be a replacement for the current "the user
>> did not even specify what the expected current value is, so we
>> pretend as if the tip of the remote-tracking branch was given"
>> kludge.
>
> Yes, that is exactly what I was thinking of (and why I said that even
> though this really isn't force-with-lease in a strict sense, it slots
> into the same level of safety, so it might be worth using the name).
>
>> Instead,
>> 
>>  git push --force-with-lease origin master
>>  git push --force-with-lease origin topic:master
>>  git push --force-with-lease origin HEAD:master
>> 
>> could
>> 
>>  (1) first learn where 'refs/heads/master' over there is at.  Call
>>  it X (it may be C or D in the earlier example).
>> 
>>  (2) locate from which ref the commit we are pushing out is taken;
>>  in the above examples, they are our refs/heads/master,
>>  refs/heads/topic, and HEAD, respectively.  Call it R.
>> 
>>  (3) see if the reflog of R has X.  If so do a --force push;
>>  otherwise fail.
>
> Yes, more or less. A few thoughts:
>
>   - that step 2 is where the "wait, that isn't even a ref" error above
> would come in
>
>   - I suspect in the third example we probably ought to be using the
> reflog of the branch that HEAD points to. You would not want:
>
> $ git checkout advanced-branch
> $ git checkout older-branch
> $ git push --force-with-lease origin HEAD:older-branch
>
> to consider that commits in advanced-branch are part of the lease.

The third one was meant to be rewriting on detached HEAD, not having
any underlying branch.

>   - For step 3, I'm not sure if you mean to look for exactly X, or
> if it would be OK to have any commit whose ancestor is X. I think
> you'd need the latter to accommodate a non-fast-forward "git pull"
> (or fetch+merge) where the local ref is never set precisely to the
> upstream commit.

But the result in that case is a descendant of upstream you just
merged, so you do not even want to use any form of forcing---you
would rather want to rely on the usual "push must fast-forward"
mechanism, no?
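
(That is, in the fetch-and-merge case no forcing is needed at all; a
sketch:)

    $ git fetch origin
    $ git merge origin/master  ;# result is a descendant of the remote tip
    $ git push origin master   ;# a plain, non-forced push fast-forwards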


Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

>   if (fmt_pretty)
> - pretty_print_ref(name, sha1, fmt_pretty);
> + pretty_print_ref(name, oid.hash, fmt_pretty);

The next step would be to have pretty_print_ref() to take an oid; as
there are only two callers to it, both of which pass oid->hash to it,
that should be a small and conflict-free conversion.

>  
> - type = sha1_object_info(sha1, NULL);
> + type = sha1_object_info(oid->hash, NULL);

sha1_object_info() has a handful of callers that do not pass the
pointer to the hash field in an existing oid object, but it does not
look too bad as a candidate for conversion.

>   if (!buf)
>   return error("%s: unable to read file.",
>   name_to_report ?
>   name_to_report :
> - find_unique_abbrev(sha1, DEFAULT_ABBREV));
> + find_unique_abbrev(oid->hash, DEFAULT_ABBREV));

So does find_unique_abbrev().

Overall both patches look good.  Thanks.


Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Christian Couder
On Thu, Jul 13, 2017 at 6:58 PM, Jeff King  wrote:
> On Thu, Jul 13, 2017 at 08:50:46AM +0200, Christian Couder wrote:
>
>> Goal
>> 
>>
>> Using many long environment variables to give parameters to the 'run'
>> script is error prone and tiring.
>>
>> We want to make it possible to store the parameters to the 'run'
>> script in a config file. This will make it easier to store, reuse,
>> share and compare parameters.
>
> Because perf-lib is built on test-lib, it already reads
> GIT-BUILD-OPTIONS.

Actually the 'run' script also sources GIT-BUILD-OPTIONS, so maybe
this is not necessary.
Also, are the variables in GIT-BUILD-OPTIONS already exported?

> And the Makefile copies several perf-related values
> into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you
> can already do:
>
>   echo 'GIT_PERF_REPEAT_COUNT = 10' >>config.mak
>   echo 'GIT_PERF_MAKE_OPTS = CFLAGS="-O2" DEVELOPER=1' >>config.mak
>   make

The "make" here might not even be needed as in the 'run' script
"config.mak" is copied into the "build/$rev" directory where "make" is
run to build the $rev version.

>   cd t/perf
>   ./run 
>
> I suspect there are still a lot of things that could be made easier with
> a config file, so I'm not against the concept. Your example here:
>
>> [perf "with libpcre"]
>> makeOpts = DEVELOPER=1 CFLAGS='-g -O0' USE_LIBPCRE=YesPlease
>> [perf "without libpcre"]
>> makeOpts = DEVELOPER=1 CFLAGS='-g -O0'
>
> is a lot more compelling. But right now the perf suite is not useful at
> all for comparing two builds of the same tree. For that, I think it
> would be more useful if we could define a tuple of parameters for a run.
> One of which could be the tree we're testing. Build opts are another.
> Tested repository is another. And then we'd fill in a table of results
> and let you slice up the table by any column (e.g., compare times for
> runs against a single tree but with differing build options).

Yeah, improving the output part is another thing that I have discussed
with AEvar and that I have planned to work on.
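
(If a config-file format along these lines is kept, the 'run' script
could read it with the stock config machinery instead of a custom
parser; a sketch, assuming a profile file named "perf.conf" and a
subsection name without spaces:)

    $ cat perf.conf
    [perf "with-libpcre"]
        makeOpts = DEVELOPER=1 CFLAGS='-g -O0' USE_LIBPCRE=YesPlease

    $ GIT_PERF_MAKE_OPTS=$(git config -f perf.conf \
        --get perf.with-libpcre.makeOpts) ./run ...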


Re: [PATCH 1/2] commit: convert lookup_commit_graft to struct object_id

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> With this patch, commit.h doesn't contain the string 'sha1' any more.

;-)  Nice.

commit_graft_pos() still thinks we only deal with SHA-1, but that
needs to wait for oid_pos().  The function has only two callers that
do not pass X->oid.hash so it may be a good candidate to convert.

>
> Signed-off-by: Stefan Beller 
> ---
>
> Before diving into the "RFC object store" series further, I want to get
> rid of the final sha1s in {commit,tag}.{c,h}.
>
>  commit.c  | 6 +++---
>  commit.h  | 2 +-
>  fsck.c| 2 +-
>  shallow.c | 4 ++--
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index cbfd689939..e0888cf0f7 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -199,11 +199,11 @@ static void prepare_commit_graft(void)
>   commit_graft_prepared = 1;
>  }
>  
> -struct commit_graft *lookup_commit_graft(const unsigned char *sha1)
> +struct commit_graft *lookup_commit_graft(const struct object_id *oid)
>  {
>   int pos;
>   prepare_commit_graft();
> - pos = commit_graft_pos(sha1);
> + pos = commit_graft_pos(oid->hash);
>   if (pos < 0)
>   return NULL;
>   return commit_graft[pos];
> @@ -335,7 +335,7 @@ int parse_commit_buffer(struct commit *item, const void 
> *buffer, unsigned long s
>   bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
> 	pptr = &item->parents;
>  
> - graft = lookup_commit_graft(item->object.oid.hash);
> + graft = lookup_commit_graft(&item->object.oid);
>   while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 
> 7)) {
>   struct commit *new_parent;
>  
> diff --git a/commit.h b/commit.h
> index 4127c298cb..6d857f06c1 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -249,7 +249,7 @@ typedef int (*each_commit_graft_fn)(const struct 
> commit_graft *, void *);
>  
>  struct commit_graft *read_graft_line(char *buf, int len);
>  int register_commit_graft(struct commit_graft *, int);
> -struct commit_graft *lookup_commit_graft(const unsigned char *sha1);
> +struct commit_graft *lookup_commit_graft(const struct object_id *oid);
>  
>  extern struct commit_list *get_merge_bases(struct commit *rev1, struct 
> commit *rev2);
>  extern struct commit_list *get_merge_bases_many(struct commit *one, int n, 
> struct commit **twos);
> diff --git a/fsck.c b/fsck.c
> index b4204d772b..2d2d2e9432 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -736,7 +736,7 @@ static int fsck_commit_buffer(struct commit *commit, 
> const char *buffer,
>   buffer += 41;
>   parent_line_count++;
>   }
> - graft = lookup_commit_graft(commit->object.oid.hash);
> + graft = lookup_commit_graft(&commit->object.oid);
>   parent_count = commit_list_count(commit->parents);
>   if (graft) {
>   if (graft->nr_parent == -1 && !parent_count)
> diff --git a/shallow.c b/shallow.c
> index 54359d5490..f5591e56da 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -107,7 +107,7 @@ struct commit_list *get_shallow_commits(struct 
> object_array *heads, int depth,
>   cur_depth++;
>   if ((depth != INFINITE_DEPTH && cur_depth >= depth) ||
>   (is_repository_shallow() && !commit->parents &&
> -  (graft = lookup_commit_graft(commit->object.oid.hash)) != 
> NULL &&
> +  (graft = lookup_commit_graft(&commit->object.oid)) != NULL 
> &&
>graft->nr_parent < 0)) {
> 	commit_list_insert(commit, &result);
>   commit->object.flags |= shallow_flag;
> @@ -398,7 +398,7 @@ void prepare_shallow_info(struct shallow_info *info, 
> struct oid_array *sa)
>   for (i = 0; i < sa->nr; i++) {
>   if (has_object_file(sa->oid + i)) {
>   struct commit_graft *graft;
> - graft = lookup_commit_graft(sa->oid[i].hash);
> + graft = lookup_commit_graft(&sa->oid[i]);
>   if (graft && graft->nr_parent < 0)
>   continue;
>   info->ours[info->nr_ours++] = i;


Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 8:01 AM, Jeff King  wrote:

>  builtin/branch.c   | 14 +++---
>  builtin/for-each-ref.c | 22 --
>  builtin/tag.c  | 30 --
>  builtin/verify-tag.c   | 12 ++--
>  ref-filter.c   | 22 --
>  ref-filter.h   | 22 +-
>  6 files changed, 70 insertions(+), 52 deletions(-)

The patch looks good to me. So some off-topic comments:
I reviewed this patch from bottom up, i.e. I started looking at
ref-filter.h, then  ref-filter.c and then the rest. If only you had formatted
the patches with an orderfile. ;)
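
(The wink aside, reviewers can already get that ordering locally with
an order file; a sketch:)

    $ printf '%s\n' '*.h' '*.c' >~/.gitorder
    $ git config --global diff.orderFile ~/.gitorder
    $ git show   ;# *.h files now come before *.c in the diff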


Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 11:40:32AM -0700, Stefan Beller wrote:

> On Thu, Jul 13, 2017 at 7:58 AM, Jeff King  wrote:
> 
> > I really only need t6300 and t6006 converted to build on for the rest of
> > the series. But t4207 was easy to do. t4026 still uses raw codes, but
> > converting it would be a pretty big job, so I punted.
> 
> I think it is good to have raw codes in at least one place to test
> that raw codes work, but then again it could be a specific test calling
> out that this is tested.

Yeah, that thought crossed my mind, too. t4026 would definitely be the
place for it, as it is about exhaustively testing the various colors.
The others are just about feeding color codes through config and
user-formats.

> > @@ -59,7 +54,8 @@ EOF
> >  # to this test since it does not contain any decoration, hence 
> > --first-parent
> >  test_expect_success 'Commit Decorations Colored Correctly' '
> > git log --first-parent --abbrev=10 --all --decorate --oneline 
> > --color=always |
> > -   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
> > +   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
> > +   test_decode_color >out &&
> 
> Just some thoughts:
> 
> This extension of the pipe-chain is not making it worse as git's exit code
> is already hidden.

Yes, I noticed the existing pipe-chain. We can fix it while we're here,
though I think it's not a big deal in practice.

> The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because
> I would have expected it to break in the future with e.g. the sha1 to longer
> hash conversion. But as we specify --abbrev=10, this seems future proof.
> In an ideal world this would be encapsulated in a function (c.f. 
> t/diff-lib.sh).

I agree it's a bit gross. Possibly:

  git log --format='%C(auto)%d %s'

would be easier for the test to parse (I'm pretty sure that didn't exist
back when this test was written).

-Peff


Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 11:29:10AM -0700, Junio C Hamano wrote:

> > So then I think your config file primarily becomes about defining the
> > properties of each run. I'm not sure if it would look like what you're
> > starting on here or not.
> 
> Yeah, I suspect that the final shape that defines the matrix might
> have to become quite a bit different.

I think it would help if the perf code was split better into three
distinct bits:

  1. A data-store capable of storing the run tuples along with their
 outcomes for each test.

  2. A "run" front-end that runs various profiles (based on config,
 command-line options, etc) and writes the results to the data
 store.

  3. A flexible viewer which can slice and dice the contents of the data
 store according to different parameters.

We're almost there now. The "run" script actually does store results,
and you can view them via "aggregate.pl" without actually re-running the
tests. But the data store only indexes on one property: the tree that
was tested (and all of the other properties are ignored totally; you can
get some quite confusing results if you do a "./run" using say git.git
as your test repo, and then a followup with "linux.git").

I have to imagine that somebody else has written such a system already
that we could reuse.  I don't know of one off-hand, but this is also not
an area where I've spent a lot of time.

We're sort of drifting off topic from Christian's patches here. But if
we did have a third-party system, I suspect the interesting work would
be setting up profiles for the "run" tool to kick off. And we might be
stuck in such a case using whatever format the tool prefers. So having a
sense of what the final solution looks like might help us know whether
it makes sense to introduce a custom config format here.

-Peff


Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 7:58 AM, Jeff King  wrote:

> I really only need t6300 and t6006 converted to build on for the rest of
> the series. But t4207 was easy to do. t4026 still uses raw codes, but
> converting it would be a pretty big job, so I punted.
>

I think it is good to have raw codes in at least one place to test
that raw codes work, but then again it could be a specific test calling
out that this is tested.

> @@ -59,7 +54,8 @@ EOF
>  # to this test since it does not contain any decoration, hence --first-parent
>  test_expect_success 'Commit Decorations Colored Correctly' '
> git log --first-parent --abbrev=10 --all --decorate --oneline 
> --color=always |
> -   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
> +   sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
> +   test_decode_color >out &&

Just some thoughts:

This extension of the pipe-chain is not making it worse as git's exit code
is already hidden.

The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because
I would have expected it to break in the future with e.g. the sha1 to longer
hash conversion. But as we specify --abbrev=10, this seems future proof.
In an ideal world this would be encapsulated in a function (c.f. t/diff-lib.sh).



> @@ -61,8 +61,9 @@ test_format () {
>  # Feed to --format to provide predictable colored sequences.
>  AUTO_COLOR='%C(auto,red)foo%C(auto,reset)'
>  has_color () {
> -   printf '\033[31mfoo\033[m\n' >expect &&
> -   test_cmp expect "$1"
> +   test_decode_color <"$1" >decoded &&
> +   echo "<RED>foo<RESET>" >expect &&
> +   test_cmp expect decoded
>  }

Thanks for removing hard coded colors :)
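
(For anyone unfamiliar with the helper: test_decode_color rewrites raw
escapes into readable tokens, so expectations can be written as plain
text; a sketch:)

    $ printf '\033[31mfoo\033[m\n' | test_decode_color
    <RED>foo<RESET>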


Re: [PATCH] submodule: use cheaper check for submodule pushes

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> On Wed, Jul 12, 2017 at 5:53 PM, Junio C Hamano  wrote:
>> Jonathan Nieder  writes:
>>
>>>> In the function push_submodule[1] we use add_submodule_odb[2] to determine
>>>> if a submodule has been populated. However the function does not work with
>>>> the submodule's objects that are added; instead a new child process is used
>>>> to perform the actual push in the submodule.
>>>>
>>>> Use is_submodule_populated[3], which is cheaper, to guard from unpopulated
>>>> submodules.
>>>>
>>>> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>>>> --recurse-submodules the on-demand option, 2012-03-29)
>>>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>>>> --submodule option to the diff option family, 2009-10-19)
>>>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>>>> add helper to determine if a submodule is populated, 2016-12-16)
>>>
>>> These footnotes don't answer the question that I really have: why did
>>> this use add_submodule_odb in the first place?
>>>
>>> E.g. did the ref iteration code require access to the object store
>>> previously and stop requiring it later?
>>
>> Yes, the most important question is if it is safe to lose the access
>> to the object store of the submodule.  It is an endgame we should
>> aim for to get rid of add_submodule_odb(), but does the rest of this
>> codepath not require objects in the submodule at all or do we still
>> need to change something to make it so?
>
> Yes, as the code in the current form as well as in its first occurrence
> used the result of add_submodule_odb to determine whether to spawn a child process.

The original added so that the return value of the call can be used
for that, and the current code still uses the return value for that
purpose.

That much is already known.  

I think Jonathan's question (with which I concurred) is whether we also ended
up relying on the side effect of calling that function (i.e. being
able to now find objects that are not in our repository but in the
submodule's object store).  By looking at the eb21c732d6, we can
tell that the original didn't mean to and didn't add any code that
relies on the ability to be able to read from the submodule object
store.  I am not sure if that is still true after 5 years (i.e. is
there any new code added in the meantime that made us depend on the
ability to read from submodule object store?).

My hunch (and hope) is that we are probably safe, but that is a lot
weaker than "yes this is a good change we want to apply".



Re: [PATCH v1 0/4] Teach 'run' perf script to read config files

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> Because perf-lib is built on test-lib, it already reads
> GIT-BUILD-OPTIONS. And the Makefile copies several perf-related values
> into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you
> can already do:
> ...
> But right now the perf suite is not useful at
> all for comparing two builds of the same tree.
> For that, I think it
> would be more useful if we could define a tuple of parameters for a run.
> One of which could be the tree we're testing. Build opts are another.
> Tested repository is another. And then we'd fill in a table of results
> and let you slice up the table by any column (e.g., compare times for
> runs against a single tree but with differing build options).

Yeah, I think we saw this discussed in not-so-distant past, for
which we want a good solution, and it might be the case that such a
solution can be made easier to use with a separate configuration
file (for which this topic may or may not be used as a building block).

> So then I think your config file primarily becomes about defining the
> properties of each run. I'm not sure if it would look like what you're
> starting on here or not.

Yeah, I suspect that the final shape that defines the matrix might
have to become quite a bit different.


Re: "groups of files" in Git?

2017-07-13 Thread Junio C Hamano
Stefan Beller  writes:

> On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov
>
>> With Git I cannot seem to find a way to figure out how to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes NOT be in the same source tree) is not a viable
>> option for me, as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever change-separation Git
>> feature is used).
>
> you would merge the branches and then run the tests/integration. Yes that
> seems cumbersome.

Sometimes the need to make a trial merge for testing cannot be avoided
and having branches for separate topics is the only sensible
approach, at least in the Git world.

Imagine your project has two components that are interrelated, say,
the server and the client, that have to work well with each other.
In addition, you want to make sure your updated server works well
with existing clients, and vice versa.

One way that naturally maps this scenario to the development
workflow is to have server-update and client-update topic branches,
and separate the changes that update each side into their own
commits:

      s---s---S        server-update topic
     /
 ---o---o---o---M      mainline
     \
      c---c---C        client-update topic

And during the development of these *-update topics, you try three
merges:

 (1) Merge S to the mainline M and test the whole thing, to make sure
 that existing client will still be able to talk with the
 updated server.

 (2) Merge C to the mainline M and test the whole thing, to make
 sure that updated clients will still be able to talk with the
 existing server.

 (3) Merge both S and C to the mainline M and test the whole thing,
 to make sure the updated ones talk to each other.

If there is no significant development going on on the mainline in
the meantime, (1) and (2) can be done by trying out S and C alone
without making a trial merge with M.  The same for (3)---it can be
just a trial merge between S and C without updates that happened on
the mainline.
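
All three trial merges can be scripted on a detached HEAD so that no
branch is disturbed; a sketch, using the branch names from the picture
above and "make test" standing in for whatever validation the project
uses:

    # trial merges (1), (2) and (3); $topics is left unquoted so the
    # third iteration merges both topics at once
    for topics in server-update client-update "server-update client-update"
    do
        git checkout --detach mainline &&
        git merge --no-edit $topics &&
        make test || exit 1
    done
    git checkout mainline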

I'd love to hear from folks in Perforce or other world how they
address this scenario with their system.


Re: [PATCH] commit & merge: modularize the empty message validator

2017-07-13 Thread Kaartic Sivaraam
On Tue, 2017-07-11 at 13:22 -0700, Junio C Hamano wrote:
> I think the "validation" done with the rest_is_empty() is somewhat
> bogus.  Why should we reject a commit without a message and a
> trailer block with only signed-off-by lines, while accepting a
> commit without a message and a trailer block as long as the trailer
> block has something equally meaningless by itself, like
> "Helped-by:"?  I think we should inspect the proposed commit log
> message taken from the editor, find its tail ignoring the trailing
> comment using ignore_non_trailer, and further separate the result
> into (<message>, <trailer>, <comment>) using the same
> logic used by the interpret-trailers tool, and then complain when
> <message> turns out to be empty, to be truly useful and consistent.
> 
I have a few doubts for which I need clarification to move on with
this. 

1. If we abort when the <message> part is empty wouldn't it be too
restrictive ?

IOW, wouldn't it affect users of "git commit --cleanup=verbatim"
who wish to commit only the comments or parts of it ?
(I'm not sure if someone would find that useful)

2. Is it ok to use the "find_trailer_start" function of "trailer.c"
to locate the trailer? 

Note: It has a little issue that it wouldn't detect the trailer if
the message comprises of one trailer alone and no other text. This
case occurs while aborting a commit started using "git commit -s".
Any possibilities to overcome the issue?

3. Ignoring point 1 for now, what other helper methods except the
ones listed below could be helpful in separating the cleaned up
commit message into the <message>, <trailer>, <comment> parts?

* ignore_non_trailer
* find_trailer_start

-- 
Kaartic


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 10:51:21AM -0700, Junio C Hamano wrote:

> > But imagine force-with-lease rule were more like: when pushing
> > branch "foo" to remote branch "bar", allow only if the current value of
> > "bar" is an ancestor of some entry in the reflog of "foo".
> 
> Would that cover the case that originally motivated the "with lease"
> thing?  Let's see.
> 
>       BC            "foo"
>      /
> o---A---B---C       "bar"
> 
> You are pushing back to "bar" that still has C (or it may have
> acquired D) from "foo" that is now at BC.  It is likely that you
> started to prepare the "A---BC" history from "A---B---C" history, in
> which case your current branch "foo" that is being pushed out would
> have had C at its tip some time in the past.  So seeing that the tip
> of the remote is still C, which is in the reflog of "foo", we can
> happily push it out.  If the tip of the remote "bar" is now D
> because somebody updated it, you did not consider it while preparing
> your BC, and we want to fail the push---because D does not appear in
> the reflog of "foo", that is achieved.

Yeah. I think where this scheme falls down is if you incorporate work
from the remote ref _without_ it ever being in your ref[1]. So for
example I could do something like:

$ git fetch  ;# new commits arrive
$ git log origin/master ;# oh my, that tip commit looks broken
$ git reset --hard origin/master^  ;# lets get rid of it
$ git push --force-with-lease

and that tricks the system. But the nice thing is that it tricks it in
the "safe" direction. We'd refuse such a push, unless you do it more
like:

$ git merge origin/master
$ git reset --hard HEAD^
$ git push --force-with-lease

Which is really the sane way to do it in the first place.

What I'd wonder more is if there are cases where we fail in the unsafe
direction. Here's the most plausible scenario I could come up with:

  $ git pull ;# get some new commits
  $ git reset --hard HEAD^ ;# doh, that merge didn't look right
  $ git rebase -i ;# hack hack hack
  $ git push --force-with-lease ;# oops, I should have re-merged

We get fooled because yes, we were once on the remote's tip. But we
discarded it and then did a _different_ rewriting operation (of course
it is impossible for the tool to tell if the original "reset --hard" was
an intentional rewriting operation, or one where we simply backed out a
mistake).

[1] Another sticking point is that this really does need to be in the
reflog of the ref we are pushing (and not, e.g., HEAD). But one does
not always push from a ref. I suspect that's OK in practice, though.
If you are doing "git push --force-with-lease HEAD~2:master", it is
probably OK for us to error out with "uh, lease from what?".

> I wonder if this could be a replacement for the current "the user
> did not even specify what the expected current value is, so we
> pretend as if the tip of the remote-tracking branch was given"
> kludge.

Yes, that is exactly what I was thinking of (and why I said that even
though this really isn't force-with-lease in a strict sense, it slots
into the same level of safety, so it might be worth using the name).

> Instead,
> 
>   git push --force-with-lease origin master
>   git push --force-with-lease origin topic:master
>   git push --force-with-lease origin HEAD:master
> 
> could
> 
>  (1) first learn where 'refs/heads/master' over there is at.  Call
>  it X (it may be C or D in the earlier example).
> 
>  (2) locate from which ref the commit we are pushing out is taken;
>  in the above examples, they are our refs/heads/master,
>  refs/heads/topic, and HEAD, respectively.  Call it R.
> 
>  (3) see if the reflog of R has X.  If so do a --force push;
>  otherwise fail.

Yes, more or less. A few thoughts:

  - that step 2 is where the "wait, that isn't even a ref" error above
would come in

  - I suspect in the third example we probably ought to be using the
reflog of the branch that HEAD points to. You would not want:

   $ git checkout advanced-branch
   $ git checkout older-branch
   $ git push --force-with-lease origin HEAD:older-branch

to consider that commits in advanced-branch are part of the lease.

  - For step 3, I'm not sure if you mean to look for exactly X, or
if it would be OK to have any commit whose ancestor is X. I think
you'd need the latter to accommodate a non-fast-forward "git pull"
(or fetch+merge) where the local ref is never set precisely to the
upstream commit.
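
(A shell approximation of those steps, with step (2) pre-resolved to a
concrete branch; a sketch only, the real check would live inside the
push machinery:)

    R=refs/heads/master

    # step (1): where is refs/heads/master over there?
    X=$(git ls-remote origin refs/heads/master | cut -f1)

    # step (3): walk R's reflog, accept if some entry contains X
    ok=
    for entry in $(git rev-list -g "$R")
    do
        if git merge-base --is-ancestor "$X" "$entry"
        then
            ok=yes
            break
        fi
    done
    test -n "$ok" || echo >&2 "refusing: $X was never seen from $R"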

-Peff


Re: "groups of files" in Git?

2017-07-13 Thread Junio C Hamano
Nikolay Shustov  writes:

> Thank you for the detailed explanation, it looks like merging the
> commits would be helpful in my case. And I think it is a very good
> analogy that Perforce changelists are like multiple pending committs,
> if Git were supporting such.
>
> What it won't be achieving by using commits in this schema is the
> following thing I can do in Perforce:
> In the uncommitted Perforce changelists I can revert the changed file
> to the original state and move the files between the changelists.
> Quite often, while working on something, in the middle I would decide
> to isolate changes to a certain set of files to a separate changelist
> - but then I might change my mind. It is all flexible until I actually
> commit my Perforce changelist, after which it becomes very much like
> committed changes in any other source control.
> This is the actual flexibility I am looking to achieve in Git.

I actually think we already have such a flexibility.  Unlike
Perforce, Git is distributed, and the most important aspect of the
distinction is that what happens _in_ your local Git repository may
be called "committed" in Git lingo, but not visible to the public.

You can consider these commits you make in your repository "pending"
when you think of your workflow in Perforce terms, until you merge
and push out the result, which roughly corresponds to "submitting"
in Perforce lingo.

Once you start treating your local commits that you haven't pushed
out as changes that are still "pending" when observed from the
outside world, you'd realize that you have as much flexibility, if
not more, to dice and slice them with the local tools like "rebase
-i", "add -p", etc., as you would have in your Perforce workflow,
I would think.




Re: [PATCH] commit & merge: modularize the empty message validator

2017-07-13 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Sometimes I abort a commit from the editor by providing an empty
> commit message. Then I came to know that 'git commit' considers commit
> messages with just signed-off-by lines as an empty message. I tried to
> take advantage of that. I once tried to abort a merge by just removing
> the "Merge ..." line and leaving the "Signed-off" line and was
> surprised to see the merge happen instead of an abort. The rest is
> history. :)

I think many people know about and do use the "delete all lines"
(i.e. ":1,$d" in vi, or \M-< \C-SPC \M-> \C-w in Emacs) to abort out
of a commit or a merge.  I just do not think it is likely for them
to leave Sign-off lines and remove everything else, which is more
work than to delete everything, hence my reaction.



Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-13 Thread Junio C Hamano
Jeff King  writes:

> I do think Dscho is on to something with the "see if it was ever in our
> reflog" idea.

Yes, I found the idea was interesting when I responded with the
"thinking aloud", and I still do think it has some uses elsewhere,
even if not in "pull --rebase" workflow.

> What you really want as a default for force-with-lease is
> some way to say "I want to change that ref from X to Y, but only for
> values of X that I know Y has already considered and incorporated".

Correct.

> But imagine force-with-lease rule were more like: when pushing
> branch "foo" to remote branch "bar", allow only if the current value of
> "bar" is an ancestor of some entry in the reflog of "foo".

Would that cover the case that originally motivated the "with lease"
thing?  Let's see.

  BC"foo"
 /
o---A---B---C   "bar"

You are pushing back to "bar" that still has C (or it may have
acquired D) from "foo" that is now at BC.  It is likely that you
started to prepare the "A---BC" history from "A---B---C" history, in
which case your current branch "foo" that is being pushed out would
have had C at its tip some time in the past.  So seeing that the tip
of the remote is still C, which is in the reflog of "foo", we can
happily push it out.  If the tip of the remote "bar" is now D
because somebody updated it, you did not consider it while preparing
your BC, and we want to fail the push---because D does not appear in
the reflog of "foo", that is achieved.

> That degrades to the usual fast-forward rule if you just check the ref
> tip.  And when checking the reflog entries, it shows that at some point
> you had your local branch set to something that covered all of the
> destination, but then later rewound or rewrote history. We don't have to
> care what that operation was. "rebase" is a likely one, but "git reset
> --hard HEAD^ && git push" would fall out naturally from the same rule.
>
> It's not quite as safe as a true force-with-lease with a value would be.

All correct.

I wonder if this could be a replacement for the current "the user
did not even specify what the expected current value is, so we
pretend as if the tip of the remote-tracking branch was given"
kludge.  Instead, 

git push --force-with-lease origin master
git push --force-with-lease origin topic:master
git push --force-with-lease origin HEAD:master

could

 (1) first learn where 'refs/heads/master' over there is at.  Call
 it X (it may be C or D in the earlier example).

 (2) locate from which ref the commit we are pushing out is taken;
 in the above examples, they are our refs/heads/master,
 refs/heads/topic, and HEAD, respectively.  Call it R.

 (3) see if the reflog of R has X.  If so do a --force push;
 otherwise fail.

With such an update, I suspect that it would become a lot safer than
relying on the stability of the remote tracking branch, which is
what the current code does.

As you said, this is not as safe as a true "the user knows and tells
what commit was considered" force-with-lease, but can be a lot safer
and can become a replacement for the current "the user does not tell
and allows us to DWIM".

A good thing is that the DWIM is advertised like so:

... are still experimental and their semantics may change

so we are free to change it to any better version.  And I think
Dscho's idea could be that better one.

I am still somewhat reserved because in the above, I only
illustrated a case that would work better than the current one,
without trying hard to poke a hole at the reasoning and to come up
with cases that would work worse.  But I agree that this is on to
something good ;-)


[PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a subclass of oidset where each entry has a
field to store the length of the object's content
and an optional pathname.

This will be used in a future commit to build a
manifest of omitted objects in a partial/narrow
clone/fetch.

Signed-off-by: Jeff Hostetler 
---
 Makefile  |   1 +
 oidset2.c | 101 ++
 oidset2.h |  56 ++
 3 files changed, 158 insertions(+)
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h

diff --git a/Makefile b/Makefile
index ffa6da7..d590508 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/oidset2.c b/oidset2.c
new file mode 100644
index 000..806d153
--- /dev/null
+++ b/oidset2.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "oidset2.h"
+
+static int oidset2_hashcmp(const void *va, const void *vb,
+ const void *vkey)
+{
+   const struct oidset2_entry *a = va, *b = vb;
+   const struct object_id *key = vkey;
+   return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct 
object_id *oid)
+{
+   struct hashmap_entry key;
+   struct oidset2_entry *value;
+
+   if (!set->map.cmpfn)
+   return NULL;
+
+   hashmap_entry_init(&key, sha1hash(oid->hash));
+   value = hashmap_get(&set->map, &key, oid);
+
+   return value;
+}
+
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid)
+{
+   return !!oidset2_get(set, oid);
+}
+
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+  int64_t object_length, const char *pathname)
+{
+   struct oidset2_entry *entry;
+
+   if (!set->map.cmpfn)
+   hashmap_init(&set->map, oidset2_hashcmp, 0);
+
+   if (oidset2_contains(set, oid))
+   return 1;
+
+   entry = xcalloc(1, sizeof(*entry));
+   hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+   oidcpy(&entry->oid, oid);
+
+   entry->object_length = object_length;
+   if (pathname)
+   entry->pathname = strdup(pathname);
+
+   hashmap_add(&set->map, entry);
+   return 0;
+}
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid)
+{
+   struct hashmap_entry key;
+   struct oidset2_entry *e;
+
+   hashmap_entry_init(&key, sha1hash(oid->hash));
+   e = hashmap_remove(&set->map, &key, oid);
+
+   free(e->pathname);
+   free(e);
+}
+
+void oidset2_clear(struct oidset2 *set)
+{
+   hashmap_free(&set->map, 1);
+}
+
+static int oidset2_cmp(const void *a, const void *b)
+{
+   const struct oidset2_entry *ae = *((const struct oidset2_entry **)a);
+   const struct oidset2_entry *be = *((const struct oidset2_entry **)b);
+
+   return oidcmp(&ae->oid, &be->oid);
+}
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data)
+{
+   struct hashmap_iter iter;
+   struct oidset2_entry **array;
+   struct oidset2_entry *e;
+   int j, k;
+
+   array = xcalloc(set->map.size, sizeof(*e));
+
+   hashmap_iter_init(&set->map, &iter);
+   k = 0;
+   while ((e = hashmap_iter_next(&iter)))
+   array[k++] = e;
+
+   QSORT(array, k, oidset2_cmp);
+
+   for (j = 0; j < k; j++) {
+   e = array[j];
+   cb(j, k, e, cb_data);
+   }
+
+   free(array);
+}
diff --git a/oidset2.h b/oidset2.h
new file mode 100644
index 000..c498eae
--- /dev/null
+++ b/oidset2.h
@@ -0,0 +1,56 @@
+#ifndef OIDSET2_H
+#define OIDSET2_H
+
+/**
+ * oidset2 is a variant of oidset, but allows additional fields for each 
object.
+ */
+
+/**
+ * A single oidset2; should be zero-initialized (or use OIDSET2_INIT).
+ */
+struct oidset2 {
+   struct hashmap map;
+};
+
+#define OIDSET2_INIT { { NULL } }
+
+struct oidset2_entry {
+   struct hashmap_entry hash;
+   struct object_id oid;
+
+   int64_t object_length;  /* This is SIGNED. Use -1 when unknown. */
+   char *pathname;
+};
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct 
object_id *oid);
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+  int64_t object_length, const char *pathname);
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid);
+
+typedef void (*oidset2_foreach_cb)(
+

[PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a simple filter for traverse_commit_list_filtered() to
omit all blobs from the result.

This filter will be used in a future commit by rev-list and
pack-objects to create a "commits and trees" result.  This
is intended for a narrow/partial clone/fetch.

Signed-off-by: Jeff Hostetler 
---
 Makefile   |  1 +
 list-objects-filters.c | 85 ++
 list-objects-filters.h | 17 ++
 3 files changed, 103 insertions(+)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h

diff --git a/Makefile b/Makefile
index d590508..48fdcf2 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filters.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filters.c b/list-objects-filters.c
new file mode 100644
index 000..f29d8bc
--- /dev/null
+++ b/list-objects-filters.c
@@ -0,0 +1,85 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filters.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ */
+struct filter_omit_all_blobs_data {
+   struct oidset2 omits;
+};
+
+static list_objects_filter_result filter_omit_all_blobs(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_omit_all_blobs_data *filter_data = filter_data_;
+   int64_t object_length = -1;
+   unsigned long s;
+   enum object_type t;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   /* always include all tree objects */
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   case LOFT_BLOB:
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+
+   /*
+* Since we always omit all blobs (and never provisionally 
omit),
+* we should never see a blob twice.
+*/
+   assert(!oidset2_contains(&filter_data->omits, &obj->oid));
+
+   t = sha1_object_info(obj->oid.hash, &s);
+   assert(t == OBJ_BLOB);
+   object_length = (int64_t)((uint64_t)(s));
+
+   /* Insert OID into the omitted list. No need for a pathname. */
+   oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+  NULL);
+   return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+   }
+}
+
+void traverse_commit_list_omit_all_blobs(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   oidset2_foreach_cb print_omitted_object,
+   void *ctx_data)
+{
+   struct filter_omit_all_blobs_data d;
+
+   memset(&d, 0, sizeof(d));
+
+   traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_omit_all_blobs, &d);
+
+   if (print_omitted_object)
+   oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+   oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
new file mode 100644
index 000..b981020
--- /dev/null
+++ b/list-objects-filters.h
@@ -0,0 +1,17 @@
+#ifndef LIST_OBJECTS_FILTERS_H
+#define LIST_OBJECTS_FILTERS_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list_omit_all_blobs(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   oidset2_foreach_cb print_omitted_object,
+   void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3



[PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a filter for traverse_commit_list_filtered() to omit the
blobs that would not be needed by a sparse checkout using the
given sparse-checkout spec.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

A future enhancement should be able to also omit tree objects
not needed by such a sparse checkout, but that is not currently
supported.

Signed-off-by: Jeff Hostetler 
---
 list-objects-filters.c | 179 +
 list-objects-filters.h |  16 +
 2 files changed, 195 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
 
oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+   int defval;
+   int child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+   struct oidset2 omits;
+   struct exclude_list el;
+
+   size_t nr, alloc;
+   struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_use_sparse_data *filter_data = filter_data_;
+   int64_t object_length = -1;
+   int val, dtype;
+   unsigned long s;
+   enum object_type t;
+   struct frame *frame;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   dtype = DT_DIR;
+   val = is_excluded_from_list(pathname, strlen(pathname),
+   filename, &dtype, &filter_data->el);
+   if (val < 0)
+   val = filter_data->array_frame[filter_data->nr].defval;
+
+   ALLOC_GROW(filter_data->array_frame, filter_data->nr + 1,
+  filter_data->alloc);
+   filter_data->nr++;
+   filter_data->array_frame[filter_data->nr].defval = val;
+   filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+   /*
+* A directory with this tree OID may appear in multiple
+* places in the tree. (Think of a directory move, with
+* no other changes.)  And with a different pathname, the
+* is_excluded...() results for this directory and items
+* contained within it may be different.  So we cannot
+* mark it SEEN (yet), since that will prevent process_tree()
+* from revisiting this tree object with other pathnames.
+*
+* Only SHOW the tree object the first time we visit this
+* tree object.
+*
+* We always show all tree objects.  A future optimization
+* may want to attempt to narrow this.
+*/
+   if (obj->flags & FILTER_REVISIT)
+   return LOFR_ZERO;
+   obj->flags |= FILTER_REVISIT;
+   return LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   assert(filter_data->nr > 0);
+
+   frame = &filter_data->array_frame[filter_data->nr];
+   filter_data->nr--;
+
+   /*
+* Tell our parent directory if any of our children were
+* provisionally omitted.
+*/
+   filter_data->array_frame[filter_data->nr].child_prov_omit |=
+   frame->child_prov_omit;
+
+   /*
+* If there are NO provisionally omitted child objects (ALL 
child
+* objects in this folder were INCLUDED), then we can mark the
+* folder as SEEN (so we will not have to revisit it again).
+*/
+   if (!frame->child_prov_omit)
+   return LOFR_MARK_SEEN;
+   return LOFR_ZERO;
+
+   case LOFT_BLOB:
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+
+   frame = &filter_data->array_frame[filter_data->nr];
+
+   /*
+* If we previously provisionally omitted this blob because
+* its pathname was not in the sparse-checkout AND this
+* reference to the blob has the same pathname, we can avoid
+  

[PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a filter for traverse_commit_list_filtered() to omit
blobs larger than a requested size from the result, but always
include ".git*" special files.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

Signed-off-by: Jeff Hostetler 
---
 list-objects-filters.c | 97 ++
 list-objects-filters.h | 12 +++
 2 files changed, 109 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
 
oidset2_clear();
 }
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+   struct oidset2 omits;
+   int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_omit_large_blobs_data *filter_data = filter_data_;
+   int64_t object_length = -1;
+   unsigned long s;
+   enum object_type t;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   /* always include all tree objects */
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   case LOFT_BLOB:
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+
+   /*
+* If previously provisionally omitted (because of size), see 
if the
+* current filename is special and force it to be included.
+*/
+   if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+   if ((strncmp(filename, ".git", 4) == 0) && filename[4]) 
{
+   oidset2_remove(&filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+   }
+   return LOFR_ZERO; /* continue provisionally omitting it 
*/
+   }
+
+   t = sha1_object_info(obj->oid.hash, &s);
+   assert(t == OBJ_BLOB);
+   object_length = (int64_t)((uint64_t)(s));
+
+   if (object_length < filter_data->max_bytes)
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   /*
+* Provisionally omit it.  We've already established that this 
blob
+* is too big and doesn't have a special filename, so we WANT to
+* omit it.  However, there may be a special file elsewhere in 
the
+* tree that references this same blob, so we cannot reject it 
yet.
+* Leave the LOFR_ bits unset so that if the blob appears again 
in
+* the traversal, we will be asked again.
+*
+* No need for a pathname, since we only test for special 
filenames
+* above.
+*/
+   oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+  NULL);
+   return LOFR_ZERO;
+   }
+}
+
+void traverse_commit_list_omit_large_blobs(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   oidset2_foreach_cb print_omitted_object,
+   void *ctx_data,
+   int64_t large_byte_limit)
+{
+   struct filter_omit_large_blobs_data d;
+
+   memset(&d, 0, sizeof(d));
+   d.max_bytes = large_byte_limit;
+
+   traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_omit_large_blobs, &d);
+
+   if (print_omitted_object)
+   oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+   oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
oidset2_foreach_cb print_omitted_object,
void *ctx_data);
 
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   oidset2_foreach_cb print_omitted_object,
+   void *ctx_data,
+   int64_t large_byte_limit);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3



[PATCH v2 10/19] t6112: rev-list object filtering test

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 t/t6112-rev-list-filters-objects.sh | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 t/t6112-rev-list-filters-objects.sh

diff --git a/t/t6112-rev-list-filters-objects.sh 
b/t/t6112-rev-list-filters-objects.sh
new file mode 100644
index 000..ded2b04
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   for n in 1 2 3 4 5 ; do \
+   echo $n > file.$n ; \
+   git add file.$n ; \
+   git commit -m "$n" ; \
+   done
+'
+
+test_expect_success 'omit-all-blobs omitted 5 blobs' '
+   git rev-list HEAD --objects --filter-print-manifest --filter-omit-all-blobs >omit_all &&
+   grep "^~" omit_all >omitted &&
+   test $(cat omitted | wc -l) = 5
+'
+
+test_expect_success 'omit-all-blobs blob sha match' '
+   git rev-list HEAD --objects >normal &&
+   awk "/file/ {print \$1;}" <normal >normal_sha &&
+   sed "s/~//" <omitted >omit_all_sha &&
+   test_cmp normal_sha omit_all_sha
+'
+
+test_expect_success 'omit-all-blobs nothing else changed' '
+   grep -v "file" normal_other &&
+   grep -v "~" omit_other &&
+   test_cmp normal_other omit_other
+'
+
+# TODO test filter-omit-large-blobs
+# TODO test filter-use-sparse
+
+test_done
-- 
2.9.3



[PATCH v2 03/19] list-objects: filter objects in traverse_commit_list

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted (not shown)
during a traversal.

Update traverse_commit_list() to be a wrapper for the above.

Filtering will be used in a future commit by rev-list and
pack-objects for narrow/partial clone/fetch to omit certain
blobs from the output.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.
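
For illustration, a pass-through filter that leaves the traversal behavior unchanged could look like the sketch below. It uses the LOFT_*/LOFR_* names this series introduces; the signature is inferred from the call sites in the hunks that follow, so treat it as a sketch rather than the series' literal typedef.

    /* Minimal pass-through filter: mark every object SEEN and show it,
     * exactly as the unfiltered traversal would. */
    static list_objects_filter_result filter_show_everything(
            list_objects_filter_type filter_type,
            struct object *obj,
            const char *pathname,
            const char *filename,
            void *filter_data)
    {
            return LOFR_MARK_SEEN | LOFR_SHOW;
    }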

Signed-off-by: Jeff Hostetler 
---
 list-objects.c | 66 --
 list-objects.h | 30 ++
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..8dddeda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *path,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
	struct object *obj = &blob->object;
size_t pathlen;
+   list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
if (!revs->blob_objects)
return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
die("bad blob object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   obj->flags |= SEEN;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   show(obj, path->buf, cb_data);
+   if (filter)
+           r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data);
+   if (r & LOFR_MARK_SEEN)
+           obj->flags |= SEEN;
+   if (r & LOFR_SHOW)
+           show(obj, path->buf, cb_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *base,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
	struct object *obj = &tree->object;
struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
all_entries_interesting: entry_not_interesting;
int baselen = base->len;
+   list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
if (!revs->tree_objects)
return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
die("bad tree object %s", oid_to_hex(>oid));
}
 
-   obj->flags |= SEEN;
strbuf_addstr(base, name);
-   show(obj, base->buf, cb_data);
+   if (filter)
+           r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+   if (r & LOFR_MARK_SEEN)
+           obj->flags |= SEEN;
+   if (r & LOFR_SHOW)
+           show(obj, base->buf, cb_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
process_tree(revs,
 lookup_tree(entry.oid->hash),
 show, base, entry.path,
-                        cb_data);
+                        cb_data, filter, filter_data);
else if (S_ISGITLINK(entry.mode))
process_gitlink(revs, entry.oid->hash,
show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
process_blob(revs,
 lookup_blob(entry.oid->hash),
 show, base, entry.path,
-                        cb_data);
+                        cb_data, filter, filter_data);
}
+
+   if (filter) {
+           r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+           if (r & LOFR_MARK_SEEN)
+                   obj->flags |= SEEN;
+           if (r & LOFR_SHOW)
+                   show(obj, base->buf, cb_data);
+   }
+
strbuf_setlen(base, baselen);
free_tree_buffer(tree);
 }
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-                          show_commit_fn show_commit,
-                          show_object_fn show_object,
-                          void *data)

[PATCH v2 09/19] rev-list: add filtering help text

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 Documentation/git-rev-list.txt |  7 ++-
 Documentation/rev-list-options.txt | 26 ++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..d20c2ab 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,12 @@ SYNOPSIS
 [ --fixed-strings | -F ]
[ --date=<date>]
 [ [ --objects | --objects-edge | --objects-edge-aggressive ]
-  [ --unpacked ] ]
+  [ --unpacked ]
+  [ [ --filter-omit-all-blobs |
+      --filter-omit-large-blobs=<n>[kmg] |
+      --filter-use-sparse=<oid> ]
+    [ --filter-print-manifest ] ] ]
+  [ --filter-relax ]
 [ --pretty | --header ]
 [ --bisect ]
 [ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a02f732..e0112dd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -693,6 +693,32 @@ ifdef::git-rev-list[]
 --unpacked::
Only useful with `--objects`; print the object IDs that are not
in packs.
+
+--filter-omit-all-blobs::
+   Only useful with one of the `--objects*` options; omits all blobs
+   from the printed list of objects.
+
+--filter-omit-large-blobs=<n>[kmg]::
+   Only useful with one of the `--objects*` options; omits blobs larger
+   than <n> bytes from the printed list of objects.  May optionally be
+   followed by 'k', 'm', or 'g' units.  Value may be zero.  Special
+   files (matching ".git*") are always included, regardless of size.
+
+--filter-use-sparse=<oid>::
+   Only useful with one of the `--objects*` options; uses a
+   sparse-checkout specification contained in the given object to
+   filter the result by omitting blobs that would not be used by the
+   corresponding sparse checkout.
+
+--filter-print-manifest::
+   Only useful with one of the above `--filter*` options; prints a
+   manifest of the omitted objects.  Object IDs are prefixed with a
+   ``~'' character.  The object size is printed after the ID.
+
+--filter-relax::
+   Relax consistency checking for missing blobs.  Do not warn of
+   missing blobs during normal (non-filtering) object traversal
+   following an earlier partial/narrow clone or fetch.
 endif::git-rev-list[]
 
 --no-walk[=(sorted|unsorted)]::
-- 
2.9.3
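
The manifest emitted by `--filter-print-manifest` is line-oriented: one omitted object per line, a leading "~", the object ID, then the size. A hypothetical consumer (assuming 40-hex-digit SHA-1 object IDs and one entry per line) might parse it like this sketch:

    #include <stdio.h>

    /*
     * Hypothetical parser for one manifest line of the form
     * "~<40-hex-oid> <size>".  Returns 0 on success, -1 otherwise.
     */
    static int parse_manifest_line(const char *line, char hex[41],
                                   long long *size)
    {
            if (line[0] != '~')
                    return -1; /* not a manifest entry */
            if (sscanf(line + 1, "%40s %lld", hex, size) != 2)
                    return -1;
            return 0;
    }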



[PATCH v2 19/19] fetch: add object filtering to fetch

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f2c2ab..306c165 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "object-filter.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
@@ -52,6 +53,7 @@ static const char *recurse_submodules_default;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct object_filter_options filter_options;
 
 static int option_parse_recurse_submodules(const struct option *opt,
   const char *arg, int unset)
@@ -141,6 +143,14 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+
+   OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+   OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+   OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+   /* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+   /* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
OPT_END()
 };
 
@@ -733,6 +743,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
int want_status;
int summary_width = transport_summary_width(ref_map);
+   struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+   /*
+    * Relax consistency check to allow missing blobs (presumably
+    * because they are exactly the set that we requested be
+    * omitted).
+    */
+   opt.filter_relax = object_filter_enabled(&filter_options);
 
fp = fopen(filename, "a");
if (!fp)
@@ -744,7 +762,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
url = xstrdup("foreign");
 
rm = ref_map;
-   if (check_connected(iterate_ref_map, &rm, NULL)) {
+   if (check_connected(iterate_ref_map, &rm, &opt)) {
rc = error(_("%s did not send all necessary objects\n"), url);
goto abort;
}
@@ -885,6 +903,13 @@ static int quickfetch(struct ref *ref_map)
struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
/*
+    * Relax consistency check to allow missing blobs (presumably
+    * because they are exactly the set that we requested be
+    * omitted).
+    */
+   opt.filter_relax = object_filter_enabled(&filter_options);
+
+   /*
 * If we are deepening a shallow clone we already have these
 * objects reachable.  Running rev-list here will return with
 * a good (0) exit status and we'll bypass the fetch that we
-- 
2.9.3



[PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation

2017-07-13 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 Documentation/technical/pack-protocol.txt | 16 
 Documentation/technical/protocol-capabilities.txt |  7 +++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a349171..dce6e04 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
                        *shallow-line
                        *1depth-request
+                       [filter-request]
                        flush-pkt
 
   want-list         =  first-want
@@ -226,7 +227,13 @@ out of what the server said it could do with the first 'want' line.
   first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)

+  filter-request    =  PKT-LINE("filter-omit-all-blobs") /
+                       PKT-LINE("filter-omit-large-blobs" SP magnitude) /
+                       PKT-LINE("filter-use-sparse" SP obj-id)
+
   depth             =  1*DIGIT
+
+  magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +256,15 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial/narrow clone/fetch operations.
+"filter-omit-all-blobs" requests that all blobs be omitted from
+the packfile.  "filter-omit-large-blobs" requests that blobs larger
+than the requested size be omitted, unless they have a ".git*"
+special filename.  "filter-use-sparse" requests blob filtering based
+upon a sparse-checkout specification in the given blob id.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..7011eb3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter-objects
+--------------
+
+If the upload-pack server advertises the 'filter-objects' capability,
+fetch-pack may send "filter-*" commands to request a partial/narrow
+clone/fetch where the server omits various objects from the packfile.
-- 
2.9.3
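
For the `magnitude` token above (1*DIGIT with an optional "k", "m", or "g" suffix), a receiving end might parse it as in the following sketch. This is a hypothetical helper, not code from the series, and it assumes binary units (k = 1024) in line with git's usual handling of such suffixes.

    #include <stdint.h>
    #include <stdlib.h>

    /* Parse "<digits>[k|m|g]" into a byte count; returns 0 on success. */
    static int parse_magnitude(const char *arg, uint64_t *out)
    {
            char *end;
            uint64_t n = strtoull(arg, &end, 10);

            if (end == arg)
                    return -1; /* no digits */
            switch (*end) {
            case 'g': n <<= 10; /* fall through */
            case 'm': n <<= 10; /* fall through */
            case 'k': n <<= 10; end++; break;
            }
            if (*end)
                    return -1; /* trailing garbage */
            *out = n;
            return 0;
    }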


