Re: Git on macOS shows committed files as untracked
Using precomposeunicode still reproduces the issue. Repro steps:

1. Download https://www.dropbox.com/s/0q5pbpqpckwzj7b/gitstatusrepro.zip?dl=0
2. unzip gitstatusrepro.zip && cd gitstatusrepro
3. git reset --hard
4. git -c core.precomposeunicode=true status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	"d\314\207\316\271\314\223\314\200\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\320\243\314\213/"

nothing added to commit but untracked files present (use "git add" to track)

> From: Torsten Bögershausen
>
> Thanks for the fast analysis -
> in short: what does
>     git -c core.precomposeunicode=true status
> say?
>
> The easiest thing may be to set
>     git config --global core.precomposeunicode true
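For readers unfamiliar with the underlying mismatch: HFS+ returns filenames in decomposed form (NFD), while the paths recorded in the index may be precomposed (NFC), so a byte-wise comparison fails even though the names look identical on screen. A short Python sketch illustrates the difference; the filename here is a simplified stand-in for the repro's path (note the repro path likewise begins with "d" plus a combining mark, `\314\207` being the octal spelling of the UTF-8 bytes CC 87):

```python
import unicodedata

# "d" followed by U+0307 COMBINING DOT ABOVE, in both normalization forms.
nfc = unicodedata.normalize("NFC", "d\u0307")   # precomposed: U+1E0B
nfd = unicodedata.normalize("NFD", "d\u0307")   # decomposed: "d" + U+0307

print(nfc.encode("utf-8"))  # b'\xe1\xb8\x8b' -- one code point
print(nfd.encode("utf-8"))  # b'd\xcc\x87'    -- base letter + combining mark
print(nfc == nfd)           # False: byte-wise these are different paths
```

This is exactly the mismatch core.precomposeunicode is meant to paper over: with it enabled, git converts the NFD names coming back from readdir() to NFC before comparing them against the index.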
Re: What's cooking in git.git (Jul 2017, #04; Thu, 13)
Santiago Torres writes:

>> Here are the topics that have been cooking.
>
> I sent a patch almost a week ago that would probably[1] be labeled
> as "uninteresting" (as per the notes from the maintainer), but I wanted
> to make sure it wasn't lost in the noise -- I see that there's a lot of
> active development lately. I checked the latest iterations of "what's
> cooking" to see if it was going to be discarded or so, but I see no
> mention of it.

I postponed it when I saw it the first time to see if anybody comments on it, and then it turned out nobody was interested, and it has remained uninteresting to the list to this day.

Now, after looking at the message again, from the patch description I would believe you that you experienced _some_ trouble when the gpg-agent that is auto-spawned by gpg gets left behind (as I do not see any hits from "git grep gpg-agent t/", we are not deliberately using the agent). However, I could not convince myself that the solution is credible. Here is an excerpt from the patch:

> diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
> index ec2aa8f68..22ef2fa87 100755
> --- a/t/lib-gpg.sh
> +++ b/t/lib-gpg.sh
> @@ -31,6 +31,7 @@ then
> 		chmod 0700 ./gpghome &&
> 		GNUPGHOME="$(pwd)/gpghome" &&
> 		export GNUPGHOME &&
> +		gpgconf --kill gpg-agent &&
> 		gpg --homedir "${GNUPGHOME}" 2>/dev/null --import \
> 			"$TEST_DIRECTORY"/lib-gpg/keyring.gpg &&
> 		gpg --homedir "${GNUPGHOME}" 2>/dev/null --import-ownertrust \

But the current directory of this part is the $TRASH_DIRECTORY, which is always created anew from the beginning in a test. What agent process could possibly be running there immediately after creating ./gpghome (which happens with "mkdir gpghome &&", without "-p", just before the context of this hunk)? If the agent was running already, the directory would have been there, mkdir would have failed, and that would have caused "test_set_prereq GPG" at the end of the "&&" cascade to be skipped.
In other words, it is unclear to me, and your log message does not sufficiently explain, why this is the right solution (or what the exact cause of the symptom is, for that matter).

Or perhaps the gpg-agent socket is created somewhere outside the GNUPGHOME and stays behind even after a previous run of the same test finishes and $TRASH_DIRECTORY gets cleared (I am guessing at the "what the exact cause is" part, as the log message did not explain it)? If that is the case, it makes me wonder if either of two alternatives may be a more robust solution: (1) running gpg tests telling gpg never to use the agent, or (2) telling gpg and gpg-agent to use a socket inside GNUPGHOME. After all, "kill"ing the agent blindly like the above patch does would mean you do not know what other party is relying on the proper operation of the thing you are killing. That sounds more like a workaround than a solution (unless it is explained with a solid reason why that is the right way to run more than one instance of GPG).

Perhaps everybody else is running these gpg tests without having to worry about gpg-agent already because their environment is more vanilla, but you have some configuration or environment that causes gpg to use the agent, and that is the reason why nobody is interested (because they have never seen the symptom)? It is possible that the part of t/test-lib.sh that tries to cleanse environment variables and other "external influence", to give us a predictable and repeatable test, is unaware of such a knob that only some developers (including you) have, and the rest of us were merely lucky. Perhaps we need to throw GPG_AGENT_INFO, SSH_AUTH_SOCK, etc. into the list of environment variables to nuke there?

Combined with the unknown root cause of the issue, I can only say that the patch may be raising an issue worth addressing, but it is too sketchy to tell if it is the right solution or what the exact problem being solved is.
Re: Strange behavior with Git add -p in version 2.13.3
Ben Reed writes:

> Hello, I updated Git to 2.13.3, and now selecting 's' to split a
> change on a call to `git add -p` is not working. It's showing the list
> of options as if 's' is not a valid choice...
>
> Particularly, I'm getting...
>
> Stage this hunk [y,n,q,a,d,/,e,?]? s
> y - stage this hunk
> n - do not stage this hunk
> q - quit; do not stage this hunk or any of the remaining ones
> a - stage this hunk and all later hunks in the file
> d - do not stage this hunk or any of the later hunks in the file
> g - select a hunk to go to
> / - search for a hunk matching the given regex
> j - leave this hunk undecided, see next undecided hunk
> J - leave this hunk undecided, see next hunk
> k - leave this hunk undecided, see previous undecided hunk
> K - leave this hunk undecided, see previous hunk
> s - split the current hunk into smaller hunks
> e - manually edit the current hunk
> ? - print help
>
> Is anyone else having this problem? Does anybody know how to resolve
> it? I'm running on macOS Version 10.12.5.

I do not think it is macOS specific. I notice that the prompt does not even offer 's' as a valid choice, which typically means that the hunk you are looking at is not splittable. A splittable hunk has more than one block of "+addition" and/or "-deletion" lines separated by at least one " context" line.

If your hunk looks like this, for example:

-- >8 --
diff --git a/COPYING b/COPYING
index 536e55524d..35c4dd4473 100644
--- a/COPYING
+++ b/COPYING
@@ -3,7 +3,7 @@
  is concerned is _this_ particular version of the license (ie v2, not
  v2.2 or v3.x or whatever), unless explicitly otherwise stated.
- HOWEVER, in order to allow a migration to GPLv3 if that seems like
+ However, in order to allow a migration to GPLv3 if that seems like
  a good idea, I also ask that people involved with the project make
  their preferences known.  In particular, if you trust me to make that
  decision, you might note so in your copyright message, ie something
Stage this hunk [y,n,q,a,d,/,e,?]?
-- 8< --

you would not see 's' offered as a valid choice. If you are looking at a splittable hunk, on the other hand:

-- >8 --
diff --git a/COPYING b/COPYING
index 536e55524d..02a5c58938 100644
--- a/COPYING
+++ b/COPYING
@@ -3,9 +3,9 @@
  is concerned is _this_ particular version of the license (ie v2, not
  v2.2 or v3.x or whatever), unless explicitly otherwise stated.
- HOWEVER, in order to allow a migration to GPLv3 if that seems like
+ However, in order to allow a migration to GPLv3 if that seems like
  a good idea, I also ask that people involved with the project make
- their preferences known.  In particular, if you trust me to make that
+ *their* preferences known.  In particular, if you trust me to make that
  decision, you might note so in your copyright message, ie something
  like
Stage this hunk [y,n,q,a,d,/,s,e,?]?
-- 8< --

you will see 's' offered as a valid choice.
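The splittability rule described above (more than one block of change lines separated by at least one context line) can be expressed as a small Python check; the helper name and input format are made up for illustration and this is not how git's Perl implementation is written:

```python
def is_splittable(hunk_lines):
    """Return True if a hunk body (the lines after the @@ header) could
    be split by `git add -p`: it contains more than one block of
    '+'/'-' change lines separated by at least one ' ' context line."""
    blocks = 0
    in_change = False
    for line in hunk_lines:
        if line.startswith(("+", "-")):
            if not in_change:
                blocks += 1      # a new run of change lines begins
                in_change = True
        else:
            in_change = False    # a context line ends the current run
    return blocks > 1

one_block = [" ctx", "-old", "+new", " ctx"]
two_blocks = [" ctx", "-old", "+new", " ctx", "+more", " ctx"]
print(is_splittable(one_block))   # False -> 's' is not offered
print(is_splittable(two_blocks))  # True  -> 's' appears in the prompt
```

The first example hunk in the reply above has a single `-`/`+` run, so it fails this test; the second has two runs separated by context, so it passes.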
Re: What's cooking in git.git (Jul 2017, #04; Thu, 13)
> Here are the topics that have been cooking.

Hi,

I sent a patch almost a week ago that would probably[1] be labeled as "uninteresting" (as per the notes from the maintainer), but I wanted to make sure it wasn't lost in the noise -- I see that there's a lot of active development lately. I checked the latest iterations of "what's cooking" to see if it was going to be discarded or so, but I see no mention of it.

Thanks!
-Santiago

[1] https://public-inbox.org/git/20170707220729.a3xrsju3rf4guyzs@LykOS.localdomain/T/#t
Re: reftable: new ref storage format
On Thu, Jul 13, 2017 at 1:35 PM, Jeff King wrote:
> On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote:
>
> I agree that a full binary search of a reftable is harder because of the
> prefix compression (it may still be possible by scanning backwards, but
> I think there are ambiguities when you land in the middle of a record,
> since there's no unambiguous end-of-record character).

It's impossible to safely binary search this reftable format using a naive "divide the byte count in half and find a record boundary" approach. I actually did design an earlier version of reftable that was safe to binary search within blocks using that approach, and wound up discarding it. It was a slower and more complex implementation than the format I shared with the list.

> But I don't think
> it matters. If you binary-search to a constant-sized block, then a
> linear scan of the block is acceptable.

Depends on the block size. :)

> Not that I'm recommending just gzipping the whole packed-refs file. It
> ruins the fast-lookup.

As I just mentioned elsewhere in the thread:

  src file    65306185
  gzip        28338906
  reftable    28782292

The reftable format (for 64k block, 256 restart) is within spitting distance (432 KiB) of a default-level gzip of packed-refs. We can get fast lookup, and OK compression.

> We _could_ consider gzipping individual blocks of
> a reftable (or any structure that allows you to search to a
> constant-sized block and do a linear search from there). But given that
> they're in the same ballpark, I'm happy with whatever ends up the
> simplest to code and debug. ;)

This does help to shrink the file; e.g. it drops from 28M to 23M. It makes it more CPU costly to access a block, as we have to inflate it to walk through the records. It also messes with alignment: when you touch a block, it may be straddling two virtual memory pages in your kernel/filesystem. I'm not sure those penalties are worth the additional 16% reduction in size.
>> When Shawn presented the proposal, a couple of colleagues here
>> were as excited as I was, but the daring question is, why Shawn
>> did not give the whole thing in BNF format from top down:
>>
>>   initial-block
>>   content-blocks*
>>   (index-block)
>>   footer
>
> Yeah, I agree it took me a bit to figure out what was going on. A
> high-level overview of the format would have been nice.

Noted, I've added this to my writeup.
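The "binary-search to a constant-sized block, then linear scan" strategy discussed above can be sketched in a few lines of Python. The block size, record layout, and names here are invented for the example, and it deliberately ignores reftable's prefix compression and restart points:

```python
import bisect

# Records are grouped into fixed-size blocks; each block remembers its
# first key, and a lookup binary-searches the block index, then scans
# one block linearly. BLOCK_SIZE is tiny only to make the example visible.
BLOCK_SIZE = 3

def build(records):
    """records: a sorted list of (refname, value) pairs."""
    blocks = [records[i:i + BLOCK_SIZE]
              for i in range(0, len(records), BLOCK_SIZE)]
    first_keys = [b[0][0] for b in blocks]
    return first_keys, blocks

def lookup(index, name):
    first_keys, blocks = index
    i = bisect.bisect_right(first_keys, name) - 1  # block that could hold name
    if i < 0:
        return None
    for key, value in blocks[i]:                   # linear scan inside block
        if key == name:
            return value
    return None

refs = sorted({"refs/heads/master": "11", "refs/heads/next": "44",
               "refs/tags/v1.0": "22", "refs/tags/v1.1": "33"}.items())
index = build(refs)
print(lookup(index, "refs/tags/v1.0"))  # "22"
print(lookup(index, "refs/tags/v9.9"))  # None
```

The point of the structure is the cost model mentioned in the thread: the binary search touches only the (small, cacheable) list of first keys, and exactly one block is read linearly, so per-lookup work stays bounded by the block size.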
Re: reftable: new ref storage format
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote:
> On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote:
>
>> ### Problem statement
>>
>> Some repositories contain a lot of references (e.g. android at 866k,
>> rails at 31k). The existing packed-refs format takes up a lot of
>> space (e.g. 62M), and does not scale with additional references.
>> Lookup of a single reference requires linearly scanning the file.
>
> I think the linear scan is actually an implementation short-coming. Even
> though the records aren't fixed-length, the fact that newlines can only
> appear as end-of-record is sufficient to mmap and binary search a
> packed-refs file (you just have to backtrack a little when you land in
> the middle of a record).
>
> I wrote a proof of concept a while ago, but got stuck on integrating it
> into the ref code, because of some of the assumptions that it made.
> Michael Haggerty has been doing several rounds of refactors to remove
> those assumptions. I think we're pretty close (I've actually seen the
> endgame where packed-refs is fully binary searched, but I think there
> are a few more cleanups necessary to cover all cases).

You are correct, this is possible with the current packed-refs format. It just hasn't materialized in a shipping implementation yet.

>> Atomic pushes modifying multiple references require copying the
>> entire packed-refs file, which can be a considerable amount of data
>> moved (e.g. 62M in, 62M out) for even small transactions (2 refs
>> modified).
>
> I think your definition of atomic here doesn't match what git.git does. :-(
> Our atomic push just takes the lock on all of the refs, and then once it
> has all of them, commits all of the locks. So it's atomic in the sense
> that you either get all or none of the writes (modulo a commit failure
> in the middle, which we naturally have no rollback plan for). But it can
> be done without touching the packed-refs file at all.
> I imagine that you're looking at atomicity from the perspective of a
> reader. In the git.git scheme, the reader may see a half-committed
> transaction. If you dispense with loose refs entirely and treat the
> packed-refs file as a single poorly-implemented key/value database, then
> you get reader atomicity (but O(size_of_database) write performance).

Yes, I was hoping for reader atomicity. But I may be OK foregoing that if the transaction either all goes through or all fails. A partially stuck transaction, because the process died in the middle of the commit step, creates a mess for an administrator to undo. Does she rename "foo.lock" to "foo"? Or delete "foo.lock"?

>> Repositories with many loose references occupy a large number of disk
>> blocks from the local file system, as each reference is its own file
>> storing 41 bytes. This negatively affects the number of inodes
>> available when a large number of repositories are stored on the same
>> filesystem. Readers are also penalized due to the larger number of
>> syscalls required to traverse and read the `$GIT_DIR/refs` directory.
>
> In my experience, the syscalls involved in loose refs aren't usually a
> big deal. If you have 800k refs, they're not all changing constantly. So
> a single pack-refs "fixes" performance going forward. What _is_ a big
> deal is that the packing process is complicated, readers have a very
> un-atomic view because of the myriad of files involved, and you get
> annoying lock contention during packing, as well as between deletions
> that have to rewrite packed-refs.
>
> But I'm not sure if you meant to contrast here a system where we didn't
> use packed-refs at all (though of course such a system is very much not
> atomic by the definition above).

No, I really did mean the current system. Gerrit Code Review servers create a lot of references throughout the day. It's easy to accumulate a few thousand new loose references in a 24-hour period.
Even if you have 600k existing refs in packed-refs, you still have 2k new/modified refs since the last nightly cron ran git gc.

>> ### Objectives
>>
>> - Near constant time lookup for any single reference, even when the
>>   repository is cold and not in process or kernel cache.
>
> Good goal, though TBH I'd be happy with O(log n).
>
> A related one is being able to traverse a subset of refs in
> O(nr_traversed). E.g., "git tag -l" should not have to do work
> proportional to what is in refs/changes. That falls out of most
> proposals that allow fast lookups, but notably not a straight
> hash-table.

Thanks, I missed that in this list, even though it was an explicit objective going into this work. I added:

  - Efficient lookup of an entire namespace, such as `refs/tags/`.

>> - Occupy less disk space for large repositories.
>
> Good goal. Just to play devil's advocate, the simplest way to do that
> with the current code would be to gzip packed-refs (and/or store sha1s
> as binary). That works against the "mmap and binary search" plan,
> though. :)

Yes.
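The "mmap and binary search packed-refs, backtracking to a record boundary" idea quoted above can be sketched as follows. This is a toy Python version that searches a sorted, newline-delimited byte buffer laid out like `<value> <refname>\n` records; it is not the actual git implementation, and it skips the real file's header and peeled-tag lines:

```python
def find_ref(data: bytes, name: bytes):
    """Binary search a sorted "value name\\n" buffer. Because a probe
    offset usually lands mid-record, we backtrack to the previous
    newline to find the start of the record containing it."""
    lo, hi = 0, len(data)            # invariant: lo is a record start
    while lo < hi:
        mid = (lo + hi) // 2
        nl = data.rfind(b"\n", lo, mid)
        start = nl + 1 if nl != -1 else lo   # backtrack to record start
        end = data.find(b"\n", start)
        value, _, key = data[start:end].partition(b" ")
        if key == name:
            return value.decode()
        if key < name:
            lo = end + 1             # next record start keeps the invariant
        else:
            hi = start
    return None

packed = (b"1111 refs/heads/master\n"
          b"2222 refs/heads/next\n"
          b"3333 refs/tags/v1.0\n")
print(find_ref(packed, b"refs/heads/next"))  # "2222"
print(find_ref(packed, b"refs/tags/v2.0"))   # None
```

The backtracking is why the format's "newline only at end-of-record" property matters: it is what makes any byte offset resolvable to an unambiguous record boundary, which (per the thread) prefix-compressed reftable records lack.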
[PATCH v2 03/13] submodule: convert submodule config lookup to use object_id
Signed-off-by: brian m. carlson--- builtin/grep.c | 4 ++-- builtin/submodule--helper.c | 8 config.c | 12 ++-- config.h | 4 ++-- repository.c | 2 +- submodule-config.c | 38 +++--- submodule-config.h | 12 ++-- submodule.c | 32 submodule.h | 2 +- t/helper/test-submodule-config.c | 10 +- 10 files changed, 62 insertions(+), 62 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index fa351c49f4..89fcb5b337 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -653,7 +653,7 @@ static int grep_submodule(struct grep_opt *opt, const struct object_id *oid, */ if (oid) { const struct submodule *sub = - submodule_from_path(null_sha1, path); + submodule_from_path(_oid, path); if (sub) path = git_path("modules/%s", sub->name); @@ -862,7 +862,7 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec, /* load the gitmodules file for this rev */ if (recurse_submodules) { submodule_free(); - gitmodules_config_sha1(real_obj->oid.hash); + gitmodules_config_oid(_obj->oid); } if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) { hit = 1; diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 6abdad3294..af871f14e7 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -350,7 +350,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet) } else displaypath = xstrdup(path); - sub = submodule_from_path(null_sha1, path); + sub = submodule_from_path(_oid, path); if (!sub) die(_("No url found for submodule path '%s' in .gitmodules"), @@ -476,7 +476,7 @@ static int module_name(int argc, const char **argv, const char *prefix) usage(_("git submodule--helper name ")); gitmodules_config(); - sub = submodule_from_path(null_sha1, argv[1]); + sub = submodule_from_path(_oid, argv[1]); if (!sub) die(_("no submodule mapping found in .gitmodules for path '%s'"), @@ -795,7 +795,7 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce, goto cleanup; } - 
sub = submodule_from_path(null_sha1, ce->name); + sub = submodule_from_path(_oid, ce->name); if (suc->recursive_prefix) displaypath = relative_path(suc->recursive_prefix, @@ -1060,7 +1060,7 @@ static const char *remote_submodule_branch(const char *path) gitmodules_config(); git_config(submodule_config, NULL); - sub = submodule_from_path(null_sha1, path); + sub = submodule_from_path(_oid, path); if (!sub) return NULL; diff --git a/config.c b/config.c index a9356c1383..014151d032 100644 --- a/config.c +++ b/config.c @@ -1460,9 +1460,9 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ return do_config_from(, fn, data); } -int git_config_from_blob_sha1(config_fn_t fn, +int git_config_from_blob_oid(config_fn_t fn, const char *name, - const unsigned char *sha1, + const struct object_id *oid, void *data) { enum object_type type; @@ -1470,7 +1470,7 @@ int git_config_from_blob_sha1(config_fn_t fn, unsigned long size; int ret; - buf = read_sha1_file(sha1, , ); + buf = read_sha1_file(oid->hash, , ); if (!buf) return error("unable to load config blob object '%s'", name); if (type != OBJ_BLOB) { @@ -1488,11 +1488,11 @@ static int git_config_from_blob_ref(config_fn_t fn, const char *name, void *data) { - unsigned char sha1[20]; + struct object_id oid; - if (get_sha1(name, sha1) < 0) + if (get_oid(name, ) < 0) return error("unable to resolve config blob '%s'", name); - return git_config_from_blob_sha1(fn, name, sha1, data); + return git_config_from_blob_oid(fn, name, , data); } const char *git_etc_gitconfig(void) diff --git a/config.h b/config.h index 0352da117b..827f2b0e4a 100644 --- a/config.h +++ b/config.h @@ -39,8 +39,8 @@ extern int git_default_config(const char *, const char *, void *); extern int
[PATCH v2 00/13] object_id part 9
This is the ninth in a series of series to convert Git to use struct object_id. This series converts the remaining callers of get_sha1 and friends to take and use struct object_id, and in doing so, renames them to get_oid and friends. This patch should probably be based on Stefan Beller's series, in which case patch 9 can be dropped.

Changes from v1:
* Restore the check for length in get_sha1_basic.
* Add a patch converting some uses of 40 to GIT_SHA1_HEXSZ as suggested. This is a separate patch because I wanted to minimize the non-automated portions of the patch in question.

brian m. carlson (13):
  builtin/fsck: convert remaining caller of get_sha1 to object_id
  builtin/merge-tree: convert remaining caller of get_sha1 to object_id
  submodule: convert submodule config lookup to use object_id
  remote: convert struct push_cas to struct object_id
  sequencer: convert to struct object_id
  builtin/update_ref: convert to struct object_id
  bisect: convert bisect_checkout to struct object_id
  builtin/unpack-file: convert to struct object_id
  builtin/verify-tag: convert to struct object_id
  Convert remaining callers of get_sha1 to get_oid.
  sha1_name: convert get_sha1* to get_oid*
  sha1_name: convert GET_SHA1* flags to GET_OID*
  sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ

 apply.c                          |   4 +-
 archive.c                        |   2 +-
 bisect.c                         |  18 +--
 builtin/am.c                     |   6 +-
 builtin/cat-file.c               |   8 +-
 builtin/commit-tree.c            |   4 +-
 builtin/commit.c                 |   8 +-
 builtin/fsck.c                   |   8 +-
 builtin/grep.c                   |   8 +-
 builtin/log.c                    |   4 +-
 builtin/merge-tree.c             |   6 +-
 builtin/receive-pack.c           |   4 +-
 builtin/replace.c                |   4 +-
 builtin/reset.c                  |  10 +-
 builtin/rev-parse.c              |   8 +-
 builtin/show-branch.c            |   8 +-
 builtin/submodule--helper.c      |   8 +-
 builtin/unpack-file.c            |  12 +-
 builtin/update-ref.c             |  69 ++-
 builtin/verify-tag.c             |   8 +-
 cache.h                          |  45
 commit.c                         |   4 +-
 config.c                         |  12 +-
 config.h                         |   4 +-
 mailmap.c                        |   6 +-
 notes.c                          |   2 +-
 refs.c                           |   2 +-
 remote.c                         |   8 +-
 remote.h                         |   2 +-
 repository.c                     |   2 +-
 revision.c                       |  16 +--
 sequencer.c                      |  65 +--
 sha1_name.c                      | 240 +++
 submodule-config.c               |  38 +++
 submodule-config.h               |  12 +-
 submodule.c                      |  32 +++---
 submodule.h                      |   2 +-
 t/helper/test-submodule-config.c |  10 +-
 transport-helper.c               |   2 +-
 39 files changed, 351 insertions(+), 360 deletions(-)
[PATCH v2 12/13] sha1_name: convert GET_SHA1* flags to GET_OID*
Convert the flags for get_oid_with_context and friends to use "OID" instead of "SHA1" in their names. This transform was made by running the following one-liner on the affected files: perl -pi -e 's/GET_SHA1/GET_OID/g' Signed-off-by: brian m. carlson--- builtin/cat-file.c | 4 ++-- builtin/grep.c | 2 +- builtin/log.c | 2 +- builtin/rev-parse.c | 2 +- cache.h | 28 +++ refs.c | 2 +- revision.c | 6 ++--- sha1_name.c | 64 ++--- 8 files changed, 55 insertions(+), 55 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 1f035fb550..62c8cf0ebf 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -63,7 +63,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, if (unknown_type) flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE; - if (get_oid_with_context(obj_name, GET_SHA1_RECORD_PATH, + if (get_oid_with_context(obj_name, GET_OID_RECORD_PATH, , _context)) die("Not a valid object name %s", obj_name); @@ -361,7 +361,7 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt, struct expand_data *data) { struct object_context ctx; - int flags = opt->follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0; + int flags = opt->follow_symlinks ? 
GET_OID_FOLLOW_SYMLINKS : 0; enum follow_symlinks_result result; result = get_oid_with_context(obj_name, flags, >oid, ); diff --git a/builtin/grep.c b/builtin/grep.c index 8943626e6c..1618087a60 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1207,7 +1207,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) break; } - if (get_oid_with_context(arg, GET_SHA1_RECORD_PATH, + if (get_oid_with_context(arg, GET_OID_RECORD_PATH, , )) { if (seen_dashdash) die(_("unable to resolve revision: %s"), arg); diff --git a/builtin/log.c b/builtin/log.c index 60fac9d4dc..3dbdf8cadd 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -484,7 +484,7 @@ static int show_blob_object(const struct object_id *oid, struct rev_info *rev, c !DIFF_OPT_TST(>diffopt, ALLOW_TEXTCONV)) return stream_blob_to_fd(1, oid, NULL, 0); - if (get_oid_with_context(obj_name, GET_SHA1_RECORD_PATH, + if (get_oid_with_context(obj_name, GET_OID_RECORD_PATH, , _context)) die(_("Not a valid object name %s"), obj_name); if (!obj_context.path || diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 041b7898c4..2bd28d3c08 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -702,7 +702,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) } if (!strcmp(arg, "--quiet") || !strcmp(arg, "-q")) { quiet = 1; - flags |= GET_SHA1_QUIETLY; + flags |= GET_OID_QUIETLY; continue; } if (opt_with_value(arg, "--short", )) { diff --git a/cache.h b/cache.h index da7ac495ba..cfc3698eba 100644 --- a/cache.h +++ b/cache.h @@ -1283,27 +1283,27 @@ struct object_context { */ struct strbuf symlink_path; /* -* If GET_SHA1_RECORD_PATH is set, this will record path (if any) +* If GET_OID_RECORD_PATH is set, this will record path (if any) * found when resolving the name. The caller is responsible for * releasing the memory. 
*/ char *path; }; -#define GET_SHA1_QUIETLY 01 -#define GET_SHA1_COMMIT02 -#define GET_SHA1_COMMITTISH04 -#define GET_SHA1_TREE 010 -#define GET_SHA1_TREEISH 020 -#define GET_SHA1_BLOB 040 -#define GET_SHA1_FOLLOW_SYMLINKS 0100 -#define GET_SHA1_RECORD_PATH 0200 -#define GET_SHA1_ONLY_TO_DIE04000 +#define GET_OID_QUIETLY 01 +#define GET_OID_COMMIT02 +#define GET_OID_COMMITTISH04 +#define GET_OID_TREE 010 +#define GET_OID_TREEISH 020 +#define GET_OID_BLOB 040 +#define GET_OID_FOLLOW_SYMLINKS 0100 +#define GET_OID_RECORD_PATH 0200 +#define GET_OID_ONLY_TO_DIE04000 -#define GET_SHA1_DISAMBIGUATORS \ - (GET_SHA1_COMMIT | GET_SHA1_COMMITTISH | \ - GET_SHA1_TREE | GET_SHA1_TREEISH | \ - GET_SHA1_BLOB) +#define GET_OID_DISAMBIGUATORS \ + (GET_OID_COMMIT | GET_OID_COMMITTISH | \ + GET_OID_TREE | GET_OID_TREEISH | \ + GET_OID_BLOB) extern int get_oid(const char *str, struct
[PATCH v2 04/13] remote: convert struct push_cas to struct object_id
This gets rid of one use of get_sha1.

Signed-off-by: brian m. carlson
---
 remote.c | 6 +++---
 remote.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/remote.c b/remote.c
index d87482573d..9da9040bf0 100644
--- a/remote.c
+++ b/remote.c
@@ -2294,8 +2294,8 @@ static int parse_push_cas_option(struct push_cas_option *cas, const char *arg, i
 	if (!*colon)
 		entry->use_tracking = 1;
 	else if (!colon[1])
-		hashclr(entry->expect);
-	else if (get_sha1(colon + 1, entry->expect))
+		oidclr(&entry->expect);
+	else if (get_oid(colon + 1, &entry->expect))
 		return error("cannot parse expected object name '%s'", colon + 1);
 	return 0;
 }
@@ -2342,7 +2342,7 @@ static void apply_cas(struct push_cas_option *cas,
 		continue;
 	ref->expect_old_sha1 = 1;
 	if (!entry->use_tracking)
-		hashcpy(ref->old_oid_expect.hash, cas->entry[i].expect);
+		oidcpy(&ref->old_oid_expect, &entry->expect);
 	else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
 		oidclr(&ref->old_oid_expect);
 	return;
diff --git a/remote.h b/remote.h
index 6c28cd3e4b..2ecf4c8c74 100644
--- a/remote.h
+++ b/remote.h
@@ -282,7 +282,7 @@ struct ref *get_stale_heads(struct refspec *refs, int ref_count, struct ref *fet
 struct push_cas_option {
 	unsigned use_tracking_for_rest:1;
 	struct push_cas {
-		unsigned char expect[20];
+		struct object_id expect;
 		unsigned use_tracking:1;
 		char *refname;
 	} *entry;
[PATCH v2 13/13] sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
There are several uses of the constant 40 in find_unique_abbrev_r. Convert them to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index b49303271e..8b7fd7e134 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -501,10 +501,10 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	}
 	sha1_to_hex_r(hex, sha1);
-	if (len == 40 || !len)
-		return 40;
+	if (len == GIT_SHA1_HEXSZ || !len)
+		return GIT_SHA1_HEXSZ;
 	exists = has_sha1_file(sha1);
-	while (len < 40) {
+	while (len < GIT_SHA1_HEXSZ) {
 		struct object_id oid_ret;
 		status = get_short_oid(hex, len, &oid_ret, GET_OID_QUIETLY);
 		if (exists
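For context, the loop in find_unique_abbrev_r grows a candidate abbreviation until it identifies a single object, with GIT_SHA1_HEXSZ (40) as the upper bound. A rough Python analogue, with a made-up object list and none of the real function's existence checks or disambiguation machinery, looks like:

```python
GIT_SHA1_HEXSZ = 40  # length of a full hex SHA-1

def unique_abbrev(hex_id, all_ids, minlen=4):
    """Grow the prefix of hex_id until it matches exactly one entry
    in all_ids, falling back to the full id. Illustrative only."""
    for n in range(minlen, GIT_SHA1_HEXSZ + 1):
        prefix = hex_id[:n]
        if sum(1 for h in all_ids if h.startswith(prefix)) == 1:
            return prefix
    return hex_id

ids = ["deadbeef" + "0" * 32, "deadc0de" + "0" * 32, "cafe" + "0" * 36]
print(unique_abbrev(ids[0], ids))  # "deadb" -- 4 chars collide with deadc0de
print(unique_abbrev(ids[2], ids))  # "cafe"  -- already unique at 4 chars
```

The patch above is purely a rename of the constant; this sketch only shows why the same bound (full hex length) appears both as the early-return value and as the loop limit.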
[PATCH v2 02/13] builtin/merge-tree: convert remaining caller of get_sha1 to object_id
Signed-off-by: brian m. carlson
---
 builtin/merge-tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index bad6735c76..f12da292cf 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -347,12 +347,12 @@ static void merge_trees(struct tree_desc t[3], const char *base)
 static void *get_tree_descriptor(struct tree_desc *desc, const char *rev)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	void *buf;
-	if (get_sha1(rev, sha1))
+	if (get_oid(rev, &oid))
 		die("unknown rev %s", rev);
-	buf = fill_tree_descriptor(desc, sha1);
+	buf = fill_tree_descriptor(desc, oid.hash);
 	if (!buf)
 		die("%s is not a tree", rev);
 	return buf;
[PATCH v2 11/13] sha1_name: convert get_sha1* to get_oid*
Now that all the callers of get_sha1 directly or indirectly use struct object_id, rename the functions starting with get_sha1 to start with get_oid. Convert the internals in sha1_name.c to use struct object_id as well, and eliminate explicit length checks where possible. Convert a use of 40 in get_oid_basic to GIT_SHA1_HEXSZ.

Outside of sha1_name.c and cache.h, this transition was made with the following semantic patch:

@@
expression E1, E2;
@@
- get_sha1(E1, E2.hash)
+ get_oid(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1(E1, E2->hash)
+ get_oid(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2.hash)
+ get_oid_committish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2->hash)
+ get_oid_committish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2.hash)
+ get_oid_treeish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2->hash)
+ get_oid_treeish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2.hash)
+ get_oid_commit(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2->hash)
+ get_oid_commit(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2.hash)
+ get_oid_tree(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2->hash)
+ get_oid_tree(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2.hash)
+ get_oid_blob(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2->hash)
+ get_oid_blob(E1, E2)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3.hash, E4)
+ get_oid_with_context(E1, E2, &E3, E4)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3->hash, E4)
+ get_oid_with_context(E1, E2, E3, E4)

Signed-off-by: brian m.
carlson
---
 apply.c               |   4 +-
 archive.c             |   2 +-
 builtin/am.c          |   6 +-
 builtin/cat-file.c    |   6 +-
 builtin/commit-tree.c |   4 +-
 builtin/commit.c      |   8 +--
 builtin/grep.c        |   4 +-
 builtin/log.c         |   4 +-
 builtin/replace.c     |   4 +-
 builtin/reset.c       |  10 +--
 builtin/rev-parse.c   |   6 +-
 builtin/show-branch.c |   8 +--
 cache.h               |  17 +++--
 commit.c              |   4 +-
 notes.c               |   2 +-
 remote.c              |   2 +-
 revision.c            |  10 +--
 sequencer.c           |   6 +-
 sha1_name.c           | 190 --
 transport-helper.c    |   2 +-
 20 files changed, 145 insertions(+), 154 deletions(-)

diff --git a/apply.c b/apply.c
index 4050cebcfa..1743fc4a49 100644
--- a/apply.c
+++ b/apply.c
@@ -3552,7 +3552,7 @@ static int try_threeway(struct apply_state *state,
 	/* Preimage the patch was prepared for */
 	if (patch->is_new)
 		write_sha1_file("", 0, blob_type, pre_oid.hash);
-	else if (get_sha1(patch->old_sha1_prefix, pre_oid.hash) ||
+	else if (get_oid(patch->old_sha1_prefix, &pre_oid) ||
 		 read_blob_object(&buf, &pre_oid, patch->old_mode))
 		return error(_("repository lacks the necessary blob to fall back on 3-way merge."));
@@ -4076,7 +4076,7 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list)
 		else
 			return error(_("sha1 information is lacking or "
				       "useless for submodule %s"), name);
-	} else if (!get_sha1_blob(patch->old_sha1_prefix, oid.hash)) {
+	} else if (!get_oid_blob(patch->old_sha1_prefix, &oid)) {
 		; /* ok */
 	} else if (!patch->lines_added && !patch->lines_deleted) {
 		/* mode-only change: update the current */
diff --git a/archive.c b/archive.c
index 60b3035a7a..557dd2db85 100644
--- a/archive.c
+++ b/archive.c
@@ -358,7 +358,7 @@ static void parse_treeish_arg(const char **argv,
 		free(ref);
 	}
-	if (get_sha1(name, oid.hash))
+	if (get_oid(name, &oid))
 		die("Not a valid object name");
 	commit = lookup_commit_reference_gently(&oid, 1);
diff --git a/builtin/am.c b/builtin/am.c
index c973bd96dc..40cc6d6fe8 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1131,7 +1131,7 @@ static int index_has_changes(struct strbuf *sb)
 	struct object_id head;
 	int i;
-	if (!get_sha1_tree("HEAD", head.hash)) {
+	if (!get_oid_tree("HEAD", &head)) {
 		struct diff_options opt;
 		diff_setup(&opt);
@@ -1432,7 +1432,7 @@ static void write_index_patch(const struct am_state *state)
 	struct rev_info rev_info;
 	FILE *fp;
-	if (!get_sha1_tree("HEAD", head.hash))
+	if (!get_oid_tree("HEAD", &head))
 		tree = lookup_tree(&head);
 	else
 		tree = lookup_tree(&empty_tree_oid);
@@ -1661,7 +1661,7 @@ static void do_commit(const struct am_state *state)
 	if (write_cache_as_tree(tree.hash, 0, NULL))
 		die(_("git write-tree failed to write a tree"));
-	if (!get_sha1_commit("HEAD", parent.hash)) {
+	if (!get_oid_commit("HEAD", &parent)) {
[PATCH v2 06/13] builtin/update_ref: convert to struct object_id
Convert the uses of unsigned char * to struct object_id.

Signed-off-by: brian m. carlson
---
 builtin/update-ref.c | 69 ++--
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 40ccfc193b..6b90c5dead 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -94,10 +94,10 @@ static char *parse_refname(struct strbuf *input, const char **next)
  * provided but cannot be converted to a SHA-1, die.  flags can
  * include PARSE_SHA1_OLD and/or PARSE_SHA1_ALLOW_EMPTY.
  */
-static int parse_next_sha1(struct strbuf *input, const char **next,
-			   unsigned char *sha1,
-			   const char *command, const char *refname,
-			   int flags)
+static int parse_next_oid(struct strbuf *input, const char **next,
+			  struct object_id *oid,
+			  const char *command, const char *refname,
+			  int flags)
 {
 	struct strbuf arg = STRBUF_INIT;
 	int ret = 0;
@@ -115,11 +115,11 @@ static int parse_next_sha1(struct strbuf *input, const char **next,
 			(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (get_sha1(arg.buf, sha1))
+			if (get_oid(arg.buf, oid))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
-			hashclr(sha1);
+			oidclr(oid);
 		}
 	} else {
 		/* With -z, read the next NUL-terminated line */
@@ -133,13 +133,13 @@ static int parse_next_sha1(struct strbuf *input, const char **next,
 		*next += arg.len;

 		if (arg.len) {
-			if (get_sha1(arg.buf, sha1))
+			if (get_oid(arg.buf, oid))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
 			warning("%s %s: missing <newvalue>, treating as zero",
				command, refname);
-			hashclr(sha1);
+			oidclr(oid);
 		} else {
 			/*
 			 * With -z, an empty non-required value means
@@ -182,26 +182,25 @@ static const char *parse_cmd_update(struct ref_transaction *transaction,
 {
 	struct strbuf err = STRBUF_INIT;
 	char *refname;
-	unsigned char new_sha1[20];
-	unsigned char old_sha1[20];
+	struct object_id new_oid, old_oid;
 	int have_old;

 	refname = parse_refname(input, &next);
 	if (!refname)
 		die("update: missing <ref>");

-	if (parse_next_sha1(input, &next, new_sha1, "update", refname,
-			    PARSE_SHA1_ALLOW_EMPTY))
+	if (parse_next_oid(input, &next, &new_oid, "update", refname,
+			   PARSE_SHA1_ALLOW_EMPTY))
 		die("update %s: missing <newvalue>", refname);

-	have_old = !parse_next_sha1(input, &next, old_sha1, "update", refname,
-				    PARSE_SHA1_OLD);
+	have_old = !parse_next_oid(input, &next, &old_oid, "update", refname,
+				   PARSE_SHA1_OLD);

 	if (*next != line_termination)
 		die("update %s: extra input: %s", refname, next);

 	if (ref_transaction_update(transaction, refname,
-				   new_sha1, have_old ? old_sha1 : NULL,
+				   new_oid.hash, have_old ? old_oid.hash : NULL,
				   update_flags | create_reflog_flag,
				   msg, &err))
 		die("%s", err.buf);
@@ -218,22 +217,22 @@ static const char *parse_cmd_create(struct ref_transaction *transaction,
 {
 	struct strbuf err = STRBUF_INIT;
 	char *refname;
-	unsigned char new_sha1[20];
+	struct object_id new_oid;

 	refname = parse_refname(input, &next);
 	if (!refname)
 		die("create: missing <ref>");

-	if (parse_next_sha1(input, &next, new_sha1, "create", refname, 0))
+	if (parse_next_oid(input, &next, &new_oid, "create", refname, 0))
 		die("create %s: missing <newvalue>", refname);

-	if (is_null_sha1(new_sha1))
+	if (is_null_oid(&new_oid))
 		die("create %s: zero <newvalue>", refname);

 	if (*next != line_termination)
 		die("create %s: extra input: %s", refname, next);

-	if (ref_transaction_create(transaction, refname, new_sha1,
+	if (ref_transaction_create(transaction, refname, new_oid.hash,
				   update_flags | create_reflog_flag,
				   msg, &err))
 		die("%s", err.buf);
@@ -250,18 +249,18 @@ static
[PATCH v2 08/13] builtin/unpack-file: convert to struct object_id
Signed-off-by: brian m. carlson
---
 builtin/unpack-file.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/unpack-file.c b/builtin/unpack-file.c
index 73f1334191..281ca1db6c 100644
--- a/builtin/unpack-file.c
+++ b/builtin/unpack-file.c
@@ -1,7 +1,7 @@
 #include "builtin.h"
 #include "config.h"

-static char *create_temp_file(unsigned char *sha1)
+static char *create_temp_file(struct object_id *oid)
 {
 	static char path[50];
 	void *buf;
@@ -9,9 +9,9 @@ static char *create_temp_file(unsigned char *sha1)
 	unsigned long size;
 	int fd;

-	buf = read_sha1_file(sha1, &type, &size);
+	buf = read_sha1_file(oid->hash, &type, &size);
 	if (!buf || type != OBJ_BLOB)
-		die("unable to read blob object %s", sha1_to_hex(sha1));
+		die("unable to read blob object %s", oid_to_hex(oid));

 	xsnprintf(path, sizeof(path), ".merge_file_XXXXXX");
 	fd = xmkstemp(path);
@@ -23,15 +23,15 @@ static char *create_temp_file(unsigned char *sha1)

 int cmd_unpack_file(int argc, const char **argv, const char *prefix)
 {
-	unsigned char sha1[20];
+	struct object_id oid;

 	if (argc != 2 || !strcmp(argv[1], "-h"))
 		usage("git unpack-file <sha1>");
-	if (get_sha1(argv[1], sha1))
+	if (get_oid(argv[1], &oid))
 		die("Not a valid object name %s", argv[1]);

 	git_config(git_default_config, NULL);
-	puts(create_temp_file(sha1));
+	puts(create_temp_file(&oid));
 	return 0;
 }
[PATCH v2 09/13] builtin/verify-tag: convert to struct object_id
Signed-off-by: brian m. carlson
---
 builtin/verify-tag.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index f9a5f7535a..30e4c826ed 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -56,20 +56,20 @@ int cmd_verify_tag(int argc, const char **argv, const char *prefix)
 	}

 	while (i < argc) {
-		unsigned char sha1[20];
+		struct object_id oid;
 		const char *name = argv[i++];

-		if (get_sha1(name, sha1)) {
+		if (get_oid(name, &oid)) {
 			had_error = !!error("tag '%s' not found.", name);
 			continue;
 		}

-		if (gpg_verify_tag(sha1, name, flags)) {
+		if (gpg_verify_tag(oid.hash, name, flags)) {
 			had_error = 1;
 			continue;
 		}

 		if (fmt_pretty)
-			pretty_print_ref(name, sha1, fmt_pretty);
+			pretty_print_ref(name, oid.hash, fmt_pretty);
 	}
 	return had_error;
 }
[PATCH v2 01/13] builtin/fsck: convert remaining caller of get_sha1 to object_id
Signed-off-by: brian m. carlson
---
 builtin/fsck.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf6..0e5a18e843 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -738,12 +738,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	heads = 0;
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
-		unsigned char sha1[20];
-		if (!get_sha1(arg, sha1)) {
-			struct object *obj = lookup_object(sha1);
+		struct object_id oid;
+		if (!get_oid(arg, &oid)) {
+			struct object *obj = lookup_object(oid.hash);

 			if (!obj || !(obj->flags & HAS_OBJ)) {
-				error("%s: object missing", sha1_to_hex(sha1));
+				error("%s: object missing", oid_to_hex(&oid));
 				errors_found |= ERROR_OBJECT;
 				continue;
 			}
[PATCH v2 07/13] bisect: convert bisect_checkout to struct object_id
Signed-off-by: brian m. carlson
---
 bisect.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index a9fd9fbc61..2549eaf7b1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -680,16 +680,16 @@ static int is_expected_rev(const struct object_id *oid)
 	return res;
 }

-static int bisect_checkout(const unsigned char *bisect_rev, int no_checkout)
+static int bisect_checkout(const struct object_id *bisect_rev, int no_checkout)
 {
 	char bisect_rev_hex[GIT_MAX_HEXSZ + 1];

-	memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
-	update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev, NULL, 0, UPDATE_REFS_DIE_ON_ERR);
+	memcpy(bisect_rev_hex, oid_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
+	update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev->hash, NULL, 0, UPDATE_REFS_DIE_ON_ERR);

 	argv_checkout[2] = bisect_rev_hex;
 	if (no_checkout) {
-		update_ref(NULL, "BISECT_HEAD", bisect_rev, NULL, 0, UPDATE_REFS_DIE_ON_ERR);
+		update_ref(NULL, "BISECT_HEAD", bisect_rev->hash, NULL, 0, UPDATE_REFS_DIE_ON_ERR);
 	} else {
 		int res;
 		res = run_command_v_opt(argv_checkout, RUN_GIT_CMD);
@@ -796,7 +796,7 @@ static void check_merge_bases(int no_checkout)
 			handle_skipped_merge_base(mb);
 		} else {
 			printf(_("Bisecting: a merge base must be tested\n"));
-			exit(bisect_checkout(mb->hash, no_checkout));
+			exit(bisect_checkout(mb, no_checkout));
 		}
 	}
@@ -939,7 +939,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	struct rev_info revs;
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
-	const unsigned char *bisect_rev;
+	struct object_id *bisect_rev;
 	char *steps_msg;

 	read_bisect_terms(&term_bad, &term_good);
@@ -977,11 +977,11 @@ int bisect_next_all(const char *prefix, int no_checkout)
 		exit(4);
 	}

-	bisect_rev = revs.commits->item->object.oid.hash;
+	bisect_rev = &revs.commits->item->object.oid;

-	if (!hashcmp(bisect_rev, current_bad_oid->hash)) {
+	if (!oidcmp(bisect_rev, current_bad_oid)) {
 		exit_if_skipped_commits(tried, current_bad_oid);
-		printf("%s is the first %s commit\n", sha1_to_hex(bisect_rev),
+		printf("%s is the first %s commit\n", oid_to_hex(bisect_rev),
		       term_bad);
 		show_diff_tree(prefix, revs.commits->item);
 		/* This means the bisection process succeeded. */
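The mechanical pattern in conversions like the one above is that raw-buffer helpers (hashcmp, sha1_to_hex) give way to thin struct-typed wrappers. A minimal, self-contained sketch of that relationship (the `_sketch` names are made up for illustration; git's real declarations live in cache.h):

```c
#include <string.h>

/*
 * Illustration only: minimal stand-ins for git's object_id type, to
 * show the shape of the hashcmp -> oidcmp conversion done in this
 * series.  Not git's actual code.
 */
#define GIT_SHA1_RAWSZ 20

struct object_id {
	unsigned char hash[GIT_SHA1_RAWSZ];
};

/* raw-buffer comparison, the style the old callers used */
static int hashcmp_sketch(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, GIT_SHA1_RAWSZ);
}

/* struct-typed comparison, the style callers are converted to */
static int oidcmp_sketch(const struct object_id *a, const struct object_id *b)
{
	return hashcmp_sketch(a->hash, b->hash);
}
```

The wrapper costs nothing at runtime but lets the compiler catch callers that pass a bare buffer where an object name is expected, which is what makes the conversion worthwhile.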
[PATCH v2 10/13] Convert remaining callers of get_sha1 to get_oid.
Signed-off-by: brian m. carlson
---
 builtin/receive-pack.c | 4 ++--
 mailmap.c              | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 71c0c768db..1efa48fec4 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -919,9 +919,9 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
  */
 static int head_has_history(void)
 {
-	unsigned char sha1[20];
+	struct object_id oid;

-	return !get_sha1("HEAD", sha1);
+	return !get_oid("HEAD", &oid);
 }

 static const char *push_to_deploy(unsigned char *sha1,
diff --git a/mailmap.c b/mailmap.c
index c1a79c100c..cb921b4db6 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -214,17 +214,17 @@ static int read_mailmap_blob(struct string_list *map,
			     const char *name,
			     char **repo_abbrev)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	char *buf;
 	unsigned long size;
 	enum object_type type;

 	if (!name)
 		return 0;
-	if (get_sha1(name, sha1) < 0)
+	if (get_oid(name, &oid) < 0)
 		return 0;

-	buf = read_sha1_file(sha1, &type, &size);
+	buf = read_sha1_file(oid.hash, &type, &size);
 	if (!buf)
 		return error("unable to read mailmap object at %s", name);
 	if (type != OBJ_BLOB)
[PATCH v2 05/13] sequencer: convert to struct object_id
Convert the remaining instances of unsigned char * to struct object_id.
This removes several calls to get_sha1.

Signed-off-by: brian m. carlson
---
 sequencer.c | 59 ++-
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3010faf863..16d48a4fb3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -691,7 +691,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,

 static int is_original_commit_empty(struct commit *commit)
 {
-	const unsigned char *ptree_sha1;
+	const struct object_id *ptree_oid;

 	if (parse_commit(commit))
 		return error(_("could not parse commit %s\n"),
@@ -701,12 +701,12 @@ static int is_original_commit_empty(struct commit *commit)
 		if (parse_commit(parent))
 			return error(_("could not parse parent commit %s\n"),
				oid_to_hex(&parent->object.oid));
-		ptree_sha1 = parent->tree->object.oid.hash;
+		ptree_oid = &parent->tree->object.oid;
 	} else {
-		ptree_sha1 = EMPTY_TREE_SHA1_BIN; /* commit is root */
+		ptree_oid = &empty_tree_oid; /* commit is root */
 	}

-	return !hashcmp(ptree_sha1, commit->tree->object.oid.hash);
+	return !oidcmp(ptree_oid, &commit->tree->object.oid);
 }

 /*
@@ -896,18 +896,18 @@ static int update_squash_messages(enum todo_command command,
 static void flush_rewritten_pending(void)
 {
 	struct strbuf buf = STRBUF_INIT;
-	unsigned char newsha1[20];
+	struct object_id newoid;
 	FILE *out;

-	if (strbuf_read_file(&buf, rebase_path_rewritten_pending(), 82) > 0 &&
-	    !get_sha1("HEAD", newsha1) &&
+	if (strbuf_read_file(&buf, rebase_path_rewritten_pending(), (GIT_MAX_HEXSZ + 1) * 2) > 0 &&
+	    !get_oid("HEAD", &newoid) &&
	    (out = fopen_or_warn(rebase_path_rewritten_list(), "a"))) {
 		char *bol = buf.buf, *eol;

 		while (*bol) {
 			eol = strchrnul(bol, '\n');
 			fprintf(out, "%.*s %s\n", (int)(eol - bol),
-				bol, sha1_to_hex(newsha1));
+				bol, oid_to_hex(&newoid));
 			if (!*eol)
 				break;
 			bol = eol + 1;
@@ -1594,36 +1594,37 @@ static int rollback_is_safe(void)
 	return !oidcmp(&expected_head, &actual_head);
 }

-static int reset_for_rollback(const unsigned char *sha1)
+static int reset_for_rollback(const struct object_id *oid)
 {
 	const char *argv[4];	/* reset --merge + NULL */

 	argv[0] = "reset";
 	argv[1] = "--merge";
-	argv[2] = sha1_to_hex(sha1);
+	argv[2] = oid_to_hex(oid);
 	argv[3] = NULL;
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }

 static int rollback_single_pick(void)
 {
-	unsigned char head_sha1[20];
+	struct object_id head_oid;

 	if (!file_exists(git_path_cherry_pick_head()) &&
	    !file_exists(git_path_revert_head()))
 		return error(_("no cherry-pick or revert in progress"));
-	if (read_ref_full("HEAD", 0, head_sha1, NULL))
+	if (read_ref_full("HEAD", 0, head_oid.hash, NULL))
 		return error(_("cannot resolve HEAD"));
-	if (is_null_sha1(head_sha1))
+	if (is_null_oid(&head_oid))
 		return error(_("cannot abort from a branch yet to be born"));
-	return reset_for_rollback(head_sha1);
+	return reset_for_rollback(&head_oid);
 }

 int sequencer_rollback(struct replay_opts *opts)
 {
 	FILE *f;
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct strbuf buf = STRBUF_INIT;
+	const char *p;

 	f = fopen(git_path_head_file(), "r");
 	if (!f && errno == ENOENT) {
@@ -1643,12 +1644,12 @@ int sequencer_rollback(struct replay_opts *opts)
 		goto fail;
 	}
 	fclose(f);
-	if (get_sha1_hex(buf.buf, sha1) || buf.buf[40] != '\0') {
+	if (parse_oid_hex(buf.buf, &oid, &p) || *p != '\0') {
 		error(_("stored pre-cherry-pick HEAD file '%s' is corrupt"),
			git_path_head_file());
 		goto fail;
 	}
-	if (is_null_sha1(sha1)) {
+	if (is_null_oid(&oid)) {
 		error(_("cannot abort from a branch yet to be born"));
 		goto fail;
 	}
@@ -1658,7 +1659,7 @@ int sequencer_rollback(struct replay_opts *opts)
 		warning(_("You seem to have moved HEAD. "
			  "Not rewinding, check your HEAD!"));
 	} else
-	if (reset_for_rollback(sha1))
+	if (reset_for_rollback(&oid))
 		goto fail;
 	strbuf_release(&buf);
 	return sequencer_remove_state(opts);
@@ -1788,13 +1789,13 @@ static int make_patch(struct commit *commit, struct replay_opts *opts)

 static int
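The get_sha1_hex()-to-parse_oid_hex() change above also replaces the manual `buf.buf[40] != '\0'` check with an end pointer reported by the parser. A rough, self-contained sketch of what such a parser does (this is an illustration, not git's implementation; `parse_oid_hex_sketch`, `hexval` and `RAWSZ` are made-up names):

```c
#include <stddef.h>

/*
 * Illustration only: parse a fixed-width hex object name into binary
 * and report where parsing stopped, mirroring the shape of a
 * parse_oid_hex()-style helper.
 */
#define RAWSZ 20	/* bytes in a binary SHA-1 */

struct object_id {
	unsigned char hash[RAWSZ];
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Returns 0 and sets *end past the 40 hex digits, or -1 on bad input. */
static int parse_oid_hex_sketch(const char *s, struct object_id *oid,
				const char **end)
{
	size_t i;

	for (i = 0; i < RAWSZ; i++) {
		int hi = hexval(s[2 * i]);
		int lo;

		if (hi < 0)
			return -1;	/* also stops safely at a NUL */
		lo = hexval(s[2 * i + 1]);
		if (lo < 0)
			return -1;
		oid->hash[i] = (unsigned char)((hi << 4) | lo);
	}
	*end = s + 2 * RAWSZ;
	return 0;
}
```

Reporting the end position lets the caller decide what may legally follow the object name (a NUL here, but a newline or separator in other callers), which is why the end-pointer shape is nicer than a hard-coded `buf.buf[40]` index.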
Strange behavior with Git add -p in version 2.13.3
Hello, I updated Git to 2.13.3, and now selecting 's' to split a change on a call to `git add -p` is not working. It's showing the list of options as if 's' were not a valid choice... In particular, I'm getting:

Stage this hunk [y,n,q,a,d,/,e,?]? s
y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk or any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk or any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

Is anyone else having this problem? Does anybody know how to resolve it? I'm running on macOS Version 10.12.5. Thanks in advance!

-Ben
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
On Thu, Jul 13, 2017 at 02:49:28PM -0700, Stefan Beller wrote: > Snarkily I wanted to link to an essay "large patch series > considered harmful"[1], but could not find any. So a couple > of bullet points: > (a) humans dislike larger series, hence fewer reviewers > (b) the likelihood of a mistake in new code is proportional to its size > We can use the number of patches as a proxy for size > (c) If a mistake is found, the whole series needs rerolling. > The effort of rerolling a series can be approximated with > its size as well. > > From combining (b) and (c), we conclude that the effort to > land a patch series is O(n^2) with n as the number of patches. > Also from (a) we conclude that two smaller series containing > the same output as one large series, has better code quality. > So with that, we conclude that all series shall be as small > as possible. > > So I'd ask to queue these 2 separately, asking Brian to drop > "builtin/verify-tag: convert to struct object_id" > > [1] https://www.cs.princeton.edu/~dpw/papers/hotos.pdf, 2005 > seems interesting to me in hindsight. > > I can also send my patches to Brian, as you (both) like. I'm literally about to send my series out; I rebased and tested it last night. I don't care whose patch gets applied, but to avoid needing to rebase and retest my series, I'm going to send it out as it is. Junio can apply my series on top of yours, in which case he can drop the relevant patch from my series. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
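Stefan's bullet points above can be written down as a toy model. A sketch under his stated assumptions (the numbers and the name `reroll_effort` are made up for illustration, not anything from git):

```c
/*
 * Toy model only: assume the number of rerolls a series needs grows
 * linearly with its length n (one reroll per patch, say), and each
 * reroll costs n units of work since the whole series is resent.
 * Total effort then grows like n * n -- the O(n^2) claim above.
 */
static int reroll_effort(int n)
{
	int effort = 0;
	int reroll;

	for (reroll = 0; reroll < n; reroll++)
		effort += n;	/* each reroll touches all n patches */
	return effort;
}
```

Under this model, two independent series of n/2 patches cost half as much as one series of n patches, which is the argument for splitting a large series where the topics are separable.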
[ANNOUNCE] Git v2.14.0-rc0
An early preview release Git v2.14.0-rc0 is now available for testing at the usual places. It is comprised of 675 non-merge commits since v2.13.0, contributed by 53 people, 14 of which are new faces. The tarballs are found at: https://www.kernel.org/pub/software/scm/git/testing/ The following public repositories all have a copy of the 'v2.14.0-rc0' tag and the 'master' branch that the tag points at: url = https://kernel.googlesource.com/pub/scm/git/git url = git://repo.or.cz/alt-git.git url = https://github.com/gitster/git New contributors whose contributions weren't in v2.13.0 are as follows. Welcome to the Git development community! A. Wilcox, Ben Peart, Brian Malehorn, James Clarke, Jeff Smith, Kaartic Sivaraam, Liam Beguin, Phillip Wood, Rikard Falkeborn, Sahil Dua, Samuel Lijin, Stephen Kent, Tyler Brazier, and xiaoqiang zhao. Returning contributors who helped this release are as follows. Thanks for your continued support. Adam Dinwoodie, Ævar Arnfjörð Bjarmason, Alejandro R. Sedeño, Andreas Heiduk, Beat Bolli, Brandon Williams, brian m. carlson, Christian Couder, David Aguilar, David Turner, Dennis Kaarsemaker, Eric Wong, Jean-Noel Avila, Jeff Hostetler, Jeff King, Johannes Schindelin, Johannes Sixt, Jonathan Nieder, Jonathan Tan, Junio C Hamano, Kyle J. McKay, Kyle Meyer, Lars Schneider, Marc Branchaud, Michael Haggerty, Mike Hommey, Nguyễn Thái Ngọc Duy, Patrick Steinhardt, Prathamesh Chavan, Ralf Thielow, Ramsay Jones, René Scharfe, Stefan Beller, Štěpán Němec, Sven Strickroth, SZEDER Gábor, Thomas Gummerer, Torsten Bögershausen, and Ville Skyttä. Git 2.14 Release Notes (draft) == Backward compatibility notes. * Use of an empty string as a pathspec element that is used for 'everything matches' is still warned and Git asks users to use a more explicit '.' for that instead. The hope is that existing users will not mind this change, and eventually the warning can be turned into a hard error, upgrading the deprecation into removal of this (mis)feature. 
That is not scheduled to happen in the upcoming release (yet).

 * Git now avoids blindly falling back to ".git" when the setup
   sequence said we are _not_ in a Git repository.  A corner case that
   happens to work right now may be broken by a call to die("BUG").
   We've tried hard to locate such cases and fixed them, but there
   might still be cases that need to be addressed--bug reports are
   greatly appreciated.

 * The experiment to improve the hunk-boundary selection of textual
   diff output has finished, and the "indent heuristics" has now
   become the default.

Updates since v2.13
-------------------

UI, Workflows & Features

 * The colors in which "git status --short --branch" showed the names
   of the current branch and its remote-tracking branch are now
   configurable.

 * "git clone" learned the "--no-tags" option not to fetch all tags
   initially, and also set up the tagopt not to follow any tags in
   subsequent fetches.

 * "git archive --format=zip" learned to use zip64 extension when
   necessary to go beyond the 4GB limit.

 * "git reset" learned "--recurse-submodules" option.

 * "git diff --submodule=diff" now recurses into nested submodules.

 * "git repack" learned to accept the --threads=<n> option and pass it
   to pack-objects.

 * "git send-email" learned to run sendemail-validate hook to inspect
   and reject a message before sending it out.

 * There is no good reason why "git fetch $there $sha1" should fail
   when the $sha1 names an object at the tip of an advertised ref,
   even when the other side hasn't enabled allowTipSHA1InWant.

 * The recently introduced "[includeIf "gitdir:$dir"] path=..."
   mechanism has further been taught to take symlinks into account.
   The directory "$dir" specified in "gitdir:$dir" may be a symlink to
   a real location, not something that $(getcwd) may return.  In such
   a case, a realpath of "$dir" is compared with the real path of the
   current repository to determine if the contents from the named path
   should be included.
 * Make the "indent" heuristics the default in "diff" and the
   diff.indentHeuristics configuration variable an escape hatch for
   those who do not want it.

 * Many commands learned to pay attention to submodule.recurse
   configuration.

 * The convention for a command line is to follow "git cmdname
   --options" with revisions followed by an optional "--"
   disambiguator and then finally pathspecs.  When "--" is not there,
   we make sure early ones are all interpretable as revs (and do not
   look like paths) and later ones are the other way around.  A
   pathspec with "magic" (e.g. ":/p/a/t/h" that matches p/a/t/h from
   the top-level of the working tree, no matter what subdirectory you
   are working from) are conservatively judged as "not a path", which
   required disambiguation more
What's cooking in git.git (Jul 2017, #04; Thu, 13)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. A maintenance release for 2.13.x series, and the first preview for 2.14 series -rc0, have been tagged. Let's close the 'master' except for obvious fixes and clean-ups and concentrate on regressions from now on. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to "master"] * ab/grep-lose-opt-regflags (2017-06-30) 6 commits (merged to 'next' on 2017-07-05 at 375c0b92ea) + grep: remove redundant REG_NEWLINE when compiling fixed regex + grep: remove regflags from the public grep_opt API + grep: remove redundant and verbose re-assignments to 0 + grep: remove redundant "fixed" field re-assignment to 0 + grep: adjust a redundant grep pattern type assignment + grep: remove redundant double assignment to 0 Code cleanup. * jk/build-with-asan (2017-07-10) 5 commits (merged to 'next' on 2017-07-10 at 49757e1119) + Makefile: disable unaligned loads with UBSan + Makefile: turn off -fomit-frame-pointer with sanitizers + Makefile: add helper for compiling with -fsanitize + test-lib: turn on ASan abort_on_error by default + test-lib: set ASAN_OPTIONS variable before we run git The build procedure has been improved to allow building and testing Git with address sanitizer more easily. 
* kn/ref-filter-branch-list (2017-07-10) 4 commits (merged to 'next' on 2017-07-10 at 35fd25c0dd) + ref-filter.c: drop return from void function + branch: set remote color in ref-filter branch immediately + branch: use BRANCH_COLOR_LOCAL in ref-filter format + branch: only perform HEAD check for local branches The rewrite of "git branch --list" using for-each-ref's internals that happened in v2.13 regressed its handling of color.branch.local; this has been fixed. * ks/fix-rebase-doc-picture (2017-07-10) 1 commit (merged to 'next' on 2017-07-10 at 3acb856b17) + doc: correct a mistake in an illustration Doc update. * rs/apply-avoid-over-reading (2017-07-09) 1 commit (merged to 'next' on 2017-07-10 at 2d8191ec3f) + apply: use strcmp(3) for comparing strings in gitdiff_verify_name() Code cleanup. * rs/urlmatch-cleanup (2017-07-09) 1 commit (merged to 'next' on 2017-07-10 at 2dd3e7cab0) + urlmatch: use hex2chr() in append_normalized_escapes() Code cleanup. * rs/use-div-round-up (2017-07-10) 1 commit (merged to 'next' on 2017-07-10 at accb7919da) + use DIV_ROUND_UP Code cleanup. * rs/wt-status-cleanup (2017-07-10) 1 commit (merged to 'next' on 2017-07-10 at d8939f683a) + wt-status: use separate variable for result of shorten_unambiguous_ref Code cleanup. * sb/hashmap-customize-comparison (2017-06-30) 3 commits (merged to 'next' on 2017-07-06 at cc420805f3) + hashmap: migrate documentation from Documentation/technical into header + patch-ids.c: use hashmap correctly + hashmap.h: compare function has access to a data field (this branch is used by sb/diff-color-move and sb/hashmap-cleanup.) Update the hashmap API so that data to customize the behaviour of the comparison function can be specified at the time a hashmap is initialized. 
* sb/pull-rebase-submodule (2017-06-27) 4 commits (merged to 'next' on 2017-07-09 at 48d2c3a51c) + builtin/fetch cleanup: always set default value for submodule recursing + pull: optionally rebase submodules (remote submodule changes only) + builtin/fetch: parse recurse-submodules-default at default options parsing + builtin/fetch: factor submodule recurse parsing out to submodule config "git pull --rebase --recurse-submodules" learns to rebase the branch in the submodules to an updated base. * sb/submodule-doc (2017-06-22) 1 commit (merged to 'next' on 2017-07-09 at fda0ceec31) + submodules: overhaul documentation Doc update. -- [New Topics] * jn/hooks-pre-rebase-sample-fix (2017-07-11) 1 commit (merged to 'next' on 2017-07-12 at ed86887454) + pre-rebase hook: capture documentation in a
A note from the maintainer
Welcome to the Git development community. This message is written by the maintainer and talks about how Git project is managed, and how you can work with it. * Mailing list and the community The development is primarily done on the Git mailing list. Help requests, feature proposals, bug reports and patches should be sent to the list address. You don't have to be subscribed to send messages. The convention on the list is to keep everybody involved on Cc:, so it is unnecessary to say "Please Cc: me, I am not subscribed". Before sending patches, please read Documentation/SubmittingPatches and Documentation/CodingGuidelines to familiarize yourself with the project convention. If you sent a patch and you did not hear any response from anybody for several days, it could be that your patch was totally uninteresting, but it also is possible that it was simply lost in the noise. Please do not hesitate to send a reminder message in such a case. Messages getting lost in the noise may be a sign that those who can evaluate your patch don't have enough mental/time bandwidth to process them right at the moment, and it often helps to wait until the list traffic becomes calmer before sending such a reminder. The list archive is available at a few public sites: http://public-inbox.org/git/ http://marc.info/?l=git http://www.spinics.net/lists/git/ For those who prefer to read it over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://news.gmane.org/gmane.comp.version-control.git are available. When you point at a message in a mailing list archive, using its message ID is often the most robust (if not very friendly) way to do so, like this: http://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org Often these web interfaces accept the message ID with enclosing <> stripped (like the above example to point at one of the most important message in the Git list). 
Some members of the development community can sometimes be found on the #git and #git-devel IRC channels on Freenode.  Their logs are available at:

	http://colabti.org/irclogger/irclogger_log/git
	http://colabti.org/irclogger/irclogger_log/git-devel

There is a volunteer-run newsletter to serve our community ("Git Rev News" http://git.github.io/rev_news/rev_news.html).

Git is a member project of software freedom conservancy, a non-profit organization (https://sfconservancy.org/).  To reach a committee of liaisons to the conservancy, contact them at .

* Reporting bugs

When you think git does not behave as you expect, please do not stop your bug report with just "git does not work".  "I used git in this way, but it did not work" is not much better, neither is "I used git in this way, and X happened, which is broken".  It often is that git is correct to cause X to happen in such a case, and it is your expectation that is broken.  People would not know what other result Y you expected to see instead of X, if you left it unsaid.

Please remember to always state

 - what you wanted to achieve;

 - what you did (the version of git and the command sequence to
   reproduce the behavior);

 - what you saw happen (X above);

 - what you expected to see (Y above); and

 - how the last two are different.

See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html for further hints.

If you think you found a security-sensitive issue and want to disclose it to us without announcing it to wider public, please contact us at our security mailing list .  This is a closed list that is limited to people who need to know early about vulnerabilities, including:

 - people triaging and fixing reported vulnerabilities
 - people operating major git hosting sites with many users
 - people packaging and distributing git to large numbers of people

where these issues are discussed without risk of the information leaking out before we're ready to make public announcements.

* Repositories and documentation.
My public git.git repositories are at: git://git.kernel.org/pub/scm/git/git.git/ https://kernel.googlesource.com/pub/scm/git/git git://repo.or.cz/alt-git.git/ https://github.com/git/git/ git://git.sourceforge.jp/gitroot/git-core/git.git/ git://git-core.git.sourceforge.net/gitroot/git-core/git-core/ This one shows not just the main integration branches, but also individual topics broken out: git://github.com/gitster/git/ A few web interfaces are found at: http://git.kernel.org/cgit/git/git.git https://kernel.googlesource.com/pub/scm/git/git http://repo.or.cz/w/alt-git.git Preformatted documentation from the tip of the "master" branch can be found in: git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/ git://repo.or.cz/git-{htmldocs,manpages}.git/ https://github.com/gitster/git-{htmldocs,manpages}.git/ The manual pages formatted in HTML for the tip of
Re: "groups of files" in Git?
Eh, yet another one, sorry:

On 14/07/2017 01:32, Igor Djordjevic wrote:
>
> $ git checkout featureA
> $ git commit
> $ git checkout master
> $ git reset --hard HEAD^
> $ git merge

The last line should stress , as we're not merging exact parent commits the previous merge commit was made of, but updated tips of the branches the previous merge commit was made of... or something along those lines :)
Re: "groups of files" in Git?
Just a small update/fixup:

On 14/07/2017 00:39, Igor Djordjevic wrote:
> I guess it would be a kind of alias to doing:
>
> $ git checkout featureA
> $ git add ...
> $ git commit
> $ git checkout master
> $ git reset --hard HEAD^
> $ git merge featureA featureB
>

This should, in fact, be:

$ git checkout featureA
$ git commit
$ git checkout master
$ git reset --hard HEAD^
$ git merge

(removed the "git add" step, as that is needed for the proposed single-step solution as well, being the usual step preceding the commit; also replaced concrete branch names in the last step with a more generic description, better communicating the real intent)

> In the same manner, it should be possible to drop a commit from the
> feature branch in a single step, for example returning to the state
> as shown in (1), or even "port" it from one branch to the other, like
> this (without a need for it to be the last commit, even):
>
> (3)   o---o---o---\           (featureA)
>      /             \
>  ---o---o---o---M'            (master, HEAD)
>      \             /
>       o---o---A'--o           (featureB)

Here, the diagram should look like this:

(3)   o---o---o---\           (featureA)
     /             \
 ---o---o---o---M''           (master, HEAD)
     \             /
      o---o---A''-o           (featureB)

(replaced leftover M' from the previous diagram with M'' to show it's yet another (updated) merge commit, different from both M and M' in terms of SHA1, yet the contents would probably, but not necessarily, be the same for all three; same for leftover A', replaced with A'')

Regards, Buga
Re: "groups of files" in Git?
Hi astian, On 12/07/2017 00:27, astian wrote: > Nikolay Shustov wrote: >> With Perforce, I can have multiple changelists opened, that group the >> changed files as needed. >> >> With Git I cannot seem to find the possibility to figure out how to >> achieve the same result. And the problem is that putting change sets >> on different Git branches (or workdirs, or whatever Git offers that >> makes the changes to be NOT in the same source tree) is not a viable >> option for me as I would have to re-build code as I re-integrate the >> changes between the branches (or whatever changes separation Git >> feature is used). >> Build takes time and resources and considering that I have to do it on >> multiple platforms (I do cross-platform development) it really >> denominates the option of not having multiple changes in the same code >> tree. >> >> Am I ignorant about some Git feature/way of using Git that would help? >> Is it worth considering adding to Git a feature like "group of files" >> that would offer some virtual grouping of the locally changed files >> in the checked-out branch? > > I never used Perforce and I'm not even sure I understand your problem, > but I thought I'd mention something that nobody else seems to have yet > (unless I missed it): > > First, one thing that seems obvious to me from your description is that > these "parallel features" you work on are obviously interdependent, > therefore I would rather consider the whole thing as a single feature. > Therefore, it makes sense to me to work in a single "topic branch". > > This doesn't preclude one from separating the changes in logically > sensible pieces. Indeed this is par for the course in Git and people do > it all the time by dividing the bulk of changes into a carefully chosen > series of commits. > > I think the most common way of doing this is to simply work on the whole > thing and once you're happy with it you use "git rebase --interactive" in > order to "prettify" your history. 
> > But, and here comes the part I think nobody mentioned yet, if your > feature work is considerably large or spans a considerably long time it > may be undesirable to postpone all that work until the very end (perhaps > by then you already forgot important information, or perhaps too many > changes have accumulated so reviewing them all becomes significantly > less efficient). In that case, one solution is to use a "patch > management system" which will let you do that work incrementally (going > back and forth as needed). > > If you know mercurial, this is "hg mq". I don't think Git has any such > system built-in, but I know there are at least these external tools that > integrate with Git: > https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Patch-management_Interface_layers > > Feel free to ignore this if I totally misunderstood your use case. > > Cheers This message actually creeped me out the first time I read it, after writing an e-mail reply of my own[1]. The tone it`s written in, the points you make, and even the conclusion about "hg mq" -- as if you were reading my mind. Yours was sent a bit before mine, but I guess we were writing it at the same time as well... Just spooky, lol. That said, I totally understand what you`re talking about, and I gave an example of the desired (yet missing?) Git work flow here[2] :) [1] https://public-inbox.org/git/6e4096fd-cbab-68f0-7a23-654382cb8...@gmail.com/ [2] https://public-inbox.org/git/27a3c650-5843-d446-1f59-64fabe543...@gmail.com/ Regards, Buga
Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct
Stefan Beller writes: >> So when somebody wants to do a "from design and explanation to >> provider to consumer", we would probably want "doc, *.h, *.c at the >> top-level and then things inside builtin/ subdirectory" order. Of >> course, on the other hand, "I do not trust me not getting swayed by >> the fact that a developer more competent than me wrote the patch" >> reviewer would want to use the reverse order. > > I do not understand what you say here? Are you saying: > "I can be tricked easier when the order is top-down, > specifically when the more competent developer tries to?" I am not worried about the case in which the patch author actively "tries to" deceive. It is more like "I am more likely to fail to spot mistakes the patch author failed to spot himself", when I start with reasoning, service provider implementations and then service consumer. When I am forced to think through the problem myself before seeing the answer and then compare the result with the patch author's answer, I am more likely to find such a mistake. >> Can we actually express "top-level first and then builtin/*" order >> with the diff.orderfile mechanism? > > By reading the code, I think we snap to the first match. And matching > is done via wildmatch.{c,h}, which claims it has special-case '/' matching, > and '**' works differently than '*'. I took a brief look at diffcore-order.c; I do not think "/*.c" would match only top-level .c files like gitignore files would.
Re: [GSoC][PATCH 4/5 v4] submodule: port submodule subcommand 'status' from shell to C
On Thu, Jul 13, 2017 at 1:05 PM, Prathamesh Chavan wrote: > This aims to make git-submodule 'status' a built-in. Hence, the function > cmd_status() is ported from shell to C. This is done by introducing > three functions: module_status(), submodule_status() and print_status(). > > The function module_status() acts as the front-end of the subcommand. > It parses the subcommand's options and then calls the function > module_list_compute() for computing the list of submodules. Then > this function calls for_each_submodule_list(), looping through the > list obtained. > > Then for_each_submodule_list() calls submodule_status() for each of the > submodules in its list. The function submodule_status() is responsible > for generating the status of each submodule it is called for, and > then calls print_status(). > > Finally, the function print_status() handles the printing of the submodule's > status. > > Mentored-by: Christian Couder > Mentored-by: Stefan Beller > Signed-off-by: Prathamesh Chavan > --- > In this new version of the patch: > > Instead of using cmd_diff_files(), the process is optimized > by using ce_match_stat(). Apart from this I have reviewed the patch and found it a faithful conversion. I am not an expert in the diff area and wonder how the cmd_diff_files functionality is achieved with just a stat call and a comparison via ce_match_stat. 'Using "dirty" ignores all changes to the work tree of submodules, only changes to the commits stored in the superproject are shown.' So I'd have expected ce->oid to be compared (is there an index entry differing, i.e. more than one stage?) > Also, the child_process running the command 'rev-parse --verify HEAD' > is removed for optimization reasons, and instead head_ref_submodule() > is used with the callback function handle_submodule_head_ref(). Cool.
Re: "groups of files" in Git?
On 13/07/2017 23:20, Junio C Hamano wrote:
> Nikolay Shustov writes:
>> My question was about how to robustly handle "multiple pending
>> commits" which in Perforce are represented by concept of pending
>> changelists.
>
> And in Git, they are represented by concept of commits that are not
> yet pushed out to the public repository to become the final history
> carved in stone.

If I may, I don`t think "multiple pending commits" is the issue here (as that is indeed what a private branch is), but more something like "multiple branches pending/live merge branch", or something. To illustrate, let`s say this is our starting position:

 (1)      o---o---o   (featureA)
         /         \
     ---o---o---o---M   (master, HEAD)
         \         /
          o---o---o   (featureB)

We`re currently on commit "M", being a merge commit between our "master" and two feature branches. Now, what seems lacking, while still possible through a series of steps, is an easy (single step) way to modify current state and commit the change to the _feature branch_, while still being on the "master" branch, still having everything merged in. So after I make a "featureA" related change while on "M", to be able to issue a single command, for example:

 $ git commit --branch=featureA

... or:

 $ git commit -b featureA

..., where "featureA" would need to be one of the parents of the current commit we are at (commit "M", in our case), and get a situation like this:

 (2)      o---o---o---A   (featureA)
         /             \
     ---o---o---o---M'   (master, HEAD)
         \             /
          o---o---o---/   (featureB)

Here, "A" is a new commit/change I`ve just made (while still being on the "master" branch), and it is automatically committed to the related "featureA" branch, with merge commit "M" now recreated into "M'" to hold the new "featureA" commit "A" as well. I guess it would be a kind of alias to doing:

 $ git checkout featureA
 $ git add ...
 $ git commit
 $ git checkout master
 $ git reset --hard HEAD^
 $ git merge featureA featureB ... 
or something, where the last merge step would need to remember the previous merge commit "M" parent branches and merge them again to produce an updated "M'" merge commit.

In the same manner, it should be possible to drop a commit from the feature branch in a single step, for example returning to the state as shown in (1), or even "port" it from one branch to the other, like this (without a need for it to be the last commit, even):

 (3)      o---o---o---\   (featureA)
         /             \
     ---o---o---o---M'   (master, HEAD)
         \             /
          o---o---A'--o   (featureB)

Something like "rebase on steroids", lol, keeping the HEAD where it is, and its merge commit beneath updated. This indeed seems similar to Mercurial`s patch "queues", except being much better as everything is still version controlled at all times, no additional tools needed to version control the patches (unless that`s already been addressed in Mercurial as well, dunno). And it still seems to be following Git`s "multiple commits per feature, single feature per branch" spirit, just allowing for easier/faster branch integration testing.

p.s. Even if my short sample might be flawed in one way or the other, it should show the essence of the functionality we`re discussing here, I think. Regards, Buga
Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT
On 12.07.2017 at 21:12, Ævar Arnfjörð Bjarmason wrote: I think in the context of this desire Johannes Sixt's "Actually, I'm serious [about let's compile with c++]"[1] should be given some more consideration. Thank you for your support. I've just compiled Git with it and it passes all tests. I think the endeavor is worthwhile in itself as C++ source-level compatibility for git.git is clearly easy to achieve, and would effectively give us access to more compilers (albeit in different modes, but they may discover novel bugs that also apply to the C mode code). Please keep in mind that my patches only show that it can be done. Nevertheless, the result is far, far away from valid C++ code. It can be compiled by GCC (thanks to its -fpermissive flag) and happens to produce something that works. But that does not mean that other compilers would grok it. Source-level compatibility is only necessary as a stopgap in the transition to C++. If the transition is not going to happen, I doubt that there is any value. It is simply too much code churn for too little gain. The real gains are in the features of C++(11,14). But why do it? Aside from the "more compilers" argument, we may find that it's going to be much easier to use some C99 features we want by having C++ source-level compatibility, and on compilers that may not support those features in C, use the C++ mode that may support those. I would be happy to learn about a C99 feature where C++ mode of some compiler would help. The only C99 feature mentioned so far was designated initializers. Unfortunately, that actually widens the gap between C and C++, not lessens it. (C++ does not have designated initializers, and they are not on the agenda.) If not, C++ support would be interesting for other reasons. Johannes Sixt: It would be very nice to get those patches on-list. I don't think it's worth to swamp the list with the patches at this time. 
Interested parties can find them here: https://github.com/j6t/git.git c-plus-plus I may continue the work, slowly and as long as I find it fun. It's just a mental exercise, unless the Git community copies the GCC Steering Committee's mindset with regard to C++ in the code base (https://gcc.gnu.org/ml/gcc/2010-05/msg00705.html). -- Hannes
Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct
On Thu, Jul 13, 2017 at 2:32 PM, Junio C Hamano wrote:
> Junio C Hamano writes:
>
>> Stefan Beller writes:
>>
>>> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King wrote:
>>> builtin/branch.c | 14 +++---
>>> builtin/for-each-ref.c | 22 --
>>> builtin/tag.c | 30 --
>>> builtin/verify-tag.c | 12 ++--
>>> ref-filter.c | 22 --
>>> ref-filter.h | 22 +-
>>> 6 files changed, 70 insertions(+), 52 deletions(-)
>>>
>>> The patch looks good to me. So some off-topic comments:
>>> I reviewed this patch from bottom up, i.e. I started looking at
>>> ref-filter.h, then ref-filter.c and then the rest. If only you had
>>> formatted the patches with an orderfile. ;)
>>
>> As a reviewer, for this particular patch, I actually appreciated
>> that ref-filter.[ch] came at the end. That forced me to think.
>> ...
>> I do want to present from Doc to header to code when I am showing my
>> patch to others, so this is probably a good example that illustrates
>> that the preferred presentation order is not just personal
>> preference, but is different on occasion even for the same person.
>
> So when somebody wants to do a "from design and explanation to
> provider to consumer", we would probably want "doc, *.h, *.c at the
> top-level and then things inside builtin/ subdirectory" order. Of
> course, on the other hand, "I do not trust me not getting swayed by
> the fact that a developer more competent than me wrote the patch"
> reviewer would want to use the reverse order.

I do not understand what you say here? Are you saying: "I can be tricked easier when the order is top-down, specifically when the more competent developer tries to?"

> Can we actually express "top-level first and then builtin/*" order
> with the diff.orderfile mechanism?

By reading the code, I think we snap to the first match. 
And matching is done via wildmatch.{c,h}, which claims it has special-case '/' matching, and '**' works differently than '*'. > without which it would be cumbersome to make ref-filter.c listed > before builtin/branch.c in a generic way. Indeed.
Re: Reducing redundant build at Travis?
Lars Schneider writes: > On Thu, Jul 13, 2017 at 1:44 AM, Junio C Hamano wrote: >> I usually try to stay as late as possible to finish all the >> integration branches in order before pushing out the result; it is >> more efficient to be able to batch things (for humans). >> >> I however noticed that this often means we would have multiple build >> jobs at Travis for branches and builds on Windows often fail > > The Windows build has had some kind of problem since June 22. > Somehow building gitk-git just blocks the build and waits until > the timeout. I had no time, yet, to investigate this further. > > >> waiting for its response. Since I tagged the tip of 'maint', and I >> wanted to give all the builds a fair chance to succeed without other >> build jobs starving it of resources, I pushed out 'maint' and the >> tag before others, even though I already have all the other >> integration branches ready. >> >> Unfortunately, https://travis-ci.org/git/git/builds/ shows that it >> does not care if it spawned a job to build the tip of 'maint' and >> another for 'v2.13.3' that point at the same thing. > > That is indeed surprising and wasteful. Looks like other people > did run into the same issue. How about something like this? > https://github.com/mockito/mockito/blob/release/2.x/.travis.yml#L26-L29 That unfortunately is exactly what I wanted to avoid. We'd want to test tagged releases, and we'd want to test usual updates to integration branches. It just is that sometimes the tips of integration branches happen to be at the tagged release, so I'd prefer to always build tags but skip a branch build if it happens to be also tagged. After all, none of the integration branches may directly point at a tagged release when the tag is pushed out.
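The rule described here — always build tags, but skip a branch build whose tip happens to be tagged — could be sketched as a guard in the CI script. This is a hedged sketch, not anything the thread settled on: `TRAVIS_TAG` is the variable Travis sets only for tag builds, and `git describe --exact-match` succeeds only when HEAD carries an annotated tag. The demo below just simulates the decision in a throwaway repo:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m "Git 2.13.3"
git tag -a -m "Git 2.13.3" v2.13.3          # tip of 'maint' == tagged release

TRAVIS_TAG=                                  # empty: this is a branch build
if test -z "$TRAVIS_TAG" && git describe --exact-match HEAD >/dev/null 2>&1
then
	result="skipped (tag build covers this commit)"
else
	result="built"
fi
echo "$result"
```

In a real .travis.yml the `then` arm would `exit 0` early instead of only recording the decision; since the tip is tagged and `TRAVIS_TAG` is empty, the branch build is the one that gets skipped.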
Re: reftable: new ref storage format
Jeff King wrote: > I agree that a full binary search of a reftable is harder because of the > prefix compression (it may still be possible by scanning backwards, but > I think there are ambiguities when you land in the middle of a record, > since there's no unambiguous end-of-record character). But I don't think > it matters. If you binary-search to a constant-sized block, then a > linear scan of the block is acceptable. For a new packed-refs, I think an intrusive critbit tree would be a good way to store refs which have many common prefixes, and I've always wanted to apply critbit to an on-disk storage format... Several years ago, I started writing one in Perl using pwrite/pread to provide Message-ID <=> NNTP article number mapping, but gave up on it out of laziness: https://80x24.org/spew/1441508596-19511-1-git-send-emai...@80x24.org/raw The end goal would've been to have two tries sharing the same storage struct: one keyed by Message-ID, the other keyed by NNTP article number (and figuring out the node using offsets like we do with (container_of|list_entry) in list.h). For git, being able to do an O(hashlength) prefix search based on the object_id from the reftable would speed up decorations, I think. And of course, the O(refnamelength) prefix search would also apply to the refnames themselves.
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
On Thu, Jul 13, 2017 at 2:23 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano wrote: >>> Stefan Beller writes: >>> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c index f9a5f7535a..ed8329340f 100644 --- a/builtin/verify-tag.c +++ b/builtin/verify-tag.c @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const char *prefix) } while (i < argc) { - unsigned char sha1[20]; + struct object_id oid; const char *name = argv[i++]; - if (get_sha1(name, sha1)) { + + if (get_oid(name, &oid)) { had_error = !!error("tag '%s' not found.", name); continue; } >>> >>> This part is already done, it seems, in bc/object-id topic, even >>> though other parts are not yet done? >> >> Oops. I assumed the latest bc/object-id would have been in master >> already, but after checking it is not. 967635dc3c2 >> (builtin/verify-tag: convert to struct object_id) >> converts this part, although there are 2 differences: >> * I added a stray newline before get_oid >> * The argument to gpg_verify_tag is a sha1 or oid >> >> So yes, this produces a merge conflict. :/ > > That is OK. This actually shouldn't create any meaningful conflict. > Both try to do the same code, with only a blank-line difference. > > As Brian said bc/object-id would be rerolled, I was wondering if I > should queue these two patches (even though I already queued them) > myself, or it would be better for you to send them to Brian to make > it part of his series. +cc Brian Snarkily I wanted to link to an essay "large patch series considered harmful"[1], but could not find any. So a couple of bullet points: (a) humans dislike larger series, hence fewer reviewers (b) the likelihood of a mistake in new code is proportional to its size We can use the number of patches as a proxy for size (c) If a mistake is found, the whole series needs rerolling. The effort of rerolling a series can be approximated with its size as well. 
From combining (b) and (c), we conclude that the effort to land a patch series is O(n^2) with n as the number of patches. Also from (a) we conclude that two smaller series containing the same output as one large series have better code quality. So with that, we conclude that all series shall be as small as possible. So I'd ask to queue these 2 separately, asking Brian to drop "builtin/verify-tag: convert to struct object_id" [1] https://www.cs.princeton.edu/~dpw/papers/hotos.pdf, 2005 seems interesting to me in hindsight. I can also send my patches to Brian, as you (both) like. Thanks, Stefan
Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct
Junio C Hamano writes: > Stefan Beller writes: > >> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King wrote: >> >>> builtin/branch.c | 14 +++--- >>> builtin/for-each-ref.c | 22 -- >>> builtin/tag.c | 30 -- >>> builtin/verify-tag.c | 12 ++-- >>> ref-filter.c | 22 -- >>> ref-filter.h | 22 +- >>> 6 files changed, 70 insertions(+), 52 deletions(-) >> >> The patch looks good to me. So some off-topic comments: >> I reviewed this patch from bottom up, i.e. I started looking at >> ref-filter.h, then ref-filter.c and then the rest. If only you had formatted >> the patches with an orderfile. ;) > > As a reviewer, for this particular patch, I actually appreciated > that ref-filter.[ch] came at the end. That forced me to think. > ... > I do want to present from Doc to header to code when I am showing my > patch to others, so this is probably a good example that illustrates > that the preferred presentation order is not just personal > preference, but is different on occasion even for the same person. So when somebody wants to do a "from design and explanation to provider to consumer", we would probably want "doc, *.h, *.c at the top-level and then things inside builtin/ subdirectory" order. Of course, on the other hand, "I do not trust me not getting swayed by the fact that a developer more competent than me wrote the patch" reviewer would want to use the reverse order. Can we actually express "top-level first and then builtin/*" order with the diff.orderfile mechanism? It's been a while since I last looked at the orderfile matching (which was when I originally wrote it) and I do not offhand know if we now allow wildmatch patterns and the directory level anchoring "/*.c" like we do in .gitignore files, without which it would be cumbersome to make ref-filter.c listed before builtin/branch.c in a generic way.
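The first-match behavior is easy to check empirically. The sketch below (throwaway repo, file names borrowed from the diffstat quoted above) shows .h files surfacing first, and also illustrates why a generic "top-level first" order is hard to express: the unanchored `*.c` pattern matches builtin/branch.c just as readily as the top-level file:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com && git config user.name You
mkdir builtin
echo h >ref-filter.h
echo c >ref-filter.c
echo b >builtin/branch.c
git add . && git commit -qm base
# touch all three files so they all show up in the diff
for f in ref-filter.h ref-filter.c builtin/branch.c
do
	echo change >>"$f"
done
# orderfile: paths matching an earlier pattern sort first
printf '%s\n' '*.h' '*.c' >order
git diff --name-only -Oorder
```

ref-filter.h is listed first; the two .c files then keep their default relative order (builtin/branch.c before ref-filter.c), since both snap to the same `*.c` pattern.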
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Jeff King writes: > Sorry, I mean the case where you do a merge from the other side, but > then you end up rewinding the history in some way, taking into account > that merge and everything they did. For example: > > $ git pull > $ git rebase ;# this will flatten the merge > $ git push --force-with-lease > > There was never a moment where the other side's tip ref was in your > local branch, but you did incorporate it via the merge. Ah, OK, now it is clear what you meant. Thanks.
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
Stefan Beller writes: > On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano wrote: >> Stefan Beller writes: >> >>> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c >>> index f9a5f7535a..ed8329340f 100644 >>> --- a/builtin/verify-tag.c >>> +++ b/builtin/verify-tag.c >>> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const >>> char *prefix) >>> } >>> >>> while (i < argc) { >>> - unsigned char sha1[20]; >>> + struct object_id oid; >>> const char *name = argv[i++]; >>> - if (get_sha1(name, sha1)) { >>> + >>> + if (get_oid(name, &oid)) { >>> had_error = !!error("tag '%s' not found.", name); >>> continue; >>> } >> >> This part is already done, it seems, in bc/object-id topic, even >> though other parts are not yet done? > > Oops. I assumed the latest bc/object-id would have been in master > already, but after checking it is not. 967635dc3c2 > (builtin/verify-tag: convert to struct object_id) > converts this part, although there are 2 differences: > * I added a stray newline before get_oid > * The argument to gpg_verify_tag is a sha1 or oid > > So yes, this produces a merge conflict. :/ That is OK. This actually shouldn't create any meaningful conflict. Both try to do the same code, with only a blank-line difference. As Brian said bc/object-id would be rerolled, I was wondering if I should queue these two patches (even though I already queued them) myself, or it would be better for you to send them to Brian to make it part of his series.
Re: Reducing redundant build at Travis?
On Thu, Jul 13, 2017 at 1:44 AM, Junio C Hamano wrote: > I usually try to stay as late as possible to finish all the > integration branches in order before pushing out the result; it is > more efficient to be able to batch things (for humans). > > I however noticed that this often means we would have multiple build > jobs at Travis for branches and builds on Windows often fail The Windows build has had some kind of problem since June 22. Somehow building gitk-git just blocks the build and waits until the timeout. I had no time, yet, to investigate this further. > waiting for its response. Since I tagged the tip of 'maint', and I > wanted to give all the builds a fair chance to succeed without other > build jobs starving it of resources, I pushed out 'maint' and the > tag before others, even though I already have all the other > integration branches ready. > > Unfortunately, https://travis-ci.org/git/git/builds/ shows that it > does not care if it spawned a job to build the tip of 'maint' and > another for 'v2.13.3' that point at the same thing. That is indeed surprising and wasteful. Looks like other people did run into the same issue. How about something like this? https://github.com/mockito/mockito/blob/release/2.x/.travis.yml#L26-L29 - Lars
Re: "groups of files" in Git?
Nikolay Shustov writes: > I am not really trying to ignite the holy war between Perforce and Git > (and why would one???), but if you are interested in the answer on how > you'd do your scenario in Perforce, it would be: "use shelved > changelists". Oh, that was not my intention, either. My interest was to see if there is a good solution that we could steal from other world. > In Perforce, you could "shelve" the changelist, similar to "stash" in > Git, but the difference is that the Perforce shelved changes are > accessible across clients. I.e. the other developer can "unshelve" > these pending changes to its sandbox (to the same or the different > branch) so that sandbox would get the pending changes as well. That > would be like the developer made these changes himself. Whatever > automated/manual process is involved, it is typical to run "a trial > build/tests" on shelved changelist (i.e. uncommitted yet files) to > verify the quality of changes. > Git achieves the same through the ease of manipulation with branches > and I like the way it does it much more. Thanks. Shelving and letting others unshelve is like keeping the changes in separate branches and privately share them among developers, so they sound pretty much equivalent features to me. > My question was about how to robustly handle "multiple pending > commits" which in Perforce are represented by concept of pending > changelists. And in Git, they are represented by concept of commits that are not yet pushed out to the public repository to become the final history carved in stone.
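The cross-client half of shelve/unshelve maps directly onto pushing a work-in-progress branch that another clone fetches; a minimal sketch with made-up names (shared.git stands in for the central repository, dev1/dev2 for the two developers' sandboxes):

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare shared.git
git clone -q shared.git dev1 2>/dev/null
(
	cd dev1
	git config user.email a@example.com && git config user.name A
	git checkout -q -b wip/feature
	echo half-done >feature.txt
	git add feature.txt && git commit -qm "WIP: feature"
	git push -q origin wip/feature       # the "shelve": pending work, shared
)
git clone -q shared.git dev2 2>/dev/null
(
	cd dev2
	git checkout -q wip/feature          # the "unshelve" in another sandbox
	cat feature.txt
)
```

The second developer gets the uncommitted-to-mainline work exactly as pushed (feature.txt containing "half-done"), and can build or test it before it ever reaches a shared integration branch.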
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
On Thu, Jul 13, 2017 at 1:52 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c >> index f9a5f7535a..ed8329340f 100644 >> --- a/builtin/verify-tag.c >> +++ b/builtin/verify-tag.c >> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const >> char *prefix) >> } >> >> while (i < argc) { >> - unsigned char sha1[20]; >> + struct object_id oid; >> const char *name = argv[i++]; >> - if (get_sha1(name, sha1)) { >> + >> + if (get_oid(name, &oid)) { >> had_error = !!error("tag '%s' not found.", name); >> continue; >> } > > This part is already done, it seems, in bc/object-id topic, even > though other parts are not yet done? Oops. I assumed the latest bc/object-id would have been in master already, but after checking it is not. 967635dc3c2 (builtin/verify-tag: convert to struct object_id) converts this part, although there are 2 differences: * I added a stray newline before get_oid * The argument to gpg_verify_tag is a sha1 or oid So yes, this produces a merge conflict. :/ The rest (tag.{c,h}, builtin/tag.c) is not found in Brian's series.
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
On Thu, Jul 13, 2017 at 08:57:01PM +0200, Christian Couder wrote: > >> We want to make it possible to store the parameters to the 'run' > >> script in a config file. This will make it easier to store, reuse, > >> share and compare parameters. > > > > Because perf-lib is built on test-lib, it already reads > > GIT-BUILD-OPTIONS. > > Actually the 'run' script also sources GIT-BUILD-OPTIONS, so maybe > this is not necessary. Ah, right. The one that comes via perf-lib gets the variables into the test scripts themselves. But anything "run" would need itself would come from the source it does itself. And that's where GIT_PERF_MAKE_OPTS has an effect. > Also are the variables in GIT-BUILD-OPTIONS exported already? No, I don't think so. But because both "run" and the scripts themselves source them, they're available more or less everywhere, except for sub-processes inside the scripts. > > And the Makefile copies several perf-related values > > into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you > > can already do: > > > > echo 'GIT_PERF_REPEAT_COUNT = 10' >>config.mak > > echo 'GIT_PERF_MAKE_OPTS = CFLAGS="-O2" DEVELOPER=1' >>config.mak > > make > > The "make" here might not even be needed as in the 'run' script > "config.mak" is copied into the "build/$rev" directory where "make" is > run to build the $rev version. You need it to bake the config into GIT-BUILD-OPTIONS, which is the only thing that gets read by "run" and the perf scripts. If you are just setting MAKE_OPTS to things that your config.mak already sets, then yes, you can skip that one. -Peff
Re: [PATCH] submodule: use cheaper check for submodule pushes
On Thu, Jul 13, 2017 at 1:48 PM, Jonathan Nieder wrote: > Hi, > > Stefan Beller wrote: > >> Yes we are safe, because the function itself only spawns a child process >> (not using any of the objects). >> >> It's only caller push_unpushed_submodules also doesn't rely on objects >> loaded after calling push_submodule. >> >> The caller of push_unpushed_submodules (transport.c, transport_push) >> also doesn't need submodule objects loaded. > > Thanks for looking into it. This is what the commit message should > say to help reviewers or people trying to understand it later. The > footnotes don't help and are distracting, except that it makes sense > to point out the original GSoC patch to say the alternate submodule > odb wasn't needed even then. > > E.g.: > > Subject: push: do not add submodule odb as an alternate when recursing on > demand > > "git push --recurse-submodules=on-demand" adds each submodule as an > alternate with add_submodule_odb before checking whether the > submodule has anything to push and pushing it if so. > > However, it never accesses any objects from the submodule. In the > parent process it uses the submodule's ref database to see if there > is anything to push. The actual push (which does rely on objects) > occurs in a child process. > > The same was true when this call was originally added in > v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand > option, 2012-03-29). Most likely it was added by analogy with > fetch --recurse-submodules=on-demand, which did use the submodule's > object database. > > Use is_submodule_populated_gently instead, which is simpler and > cheaper. Thanks for giving a good example of commit message that I could use in a reroll. > With such a commit message change, this seems like a reasonable change > in principle (though I haven't looked carefully to verify it). > > My one doubt is the use of is_submodule_populated_gently. Why are we using > that instead of simpler is_submodule_populated? 
> The names and API comments don't explain.

One could posit this is laziness of thinking. See 15cdc64776 (make
is_submodule_populated gently, 2017-03-14), and discover there is no
non-gentle version of is_submodule_populated. And for each new use, it
may be cheaper to just use the gentle version instead of adding a
non-gentle version.

Thanks,
Stefan
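The gentle/non-gentle naming discussed here follows a general git API convention: a "gently" variant reports failure through its return value, while the plain variant would die() on error. A toy Python sketch of that convention (all names and data shapes here are illustrative, not git's actual implementation):

```python
class SubmoduleError(Exception):
    """Stand-in for git's die() in the non-gentle variant."""
    pass

def is_submodule_populated_gently(path, populated):
    """Gentle variant: report the answer via the return value, never abort.

    'populated' is a hypothetical mapping of submodule path -> bool.
    """
    return populated.get(path, False)

def is_submodule_populated(path, populated):
    """Hypothetical non-gentle variant: abort when the submodule is missing."""
    if not is_submodule_populated_gently(path, populated):
        raise SubmoduleError("submodule '%s' is not populated" % path)
    return True

populated = {"sub/a": True}
print(is_submodule_populated_gently("sub/a", populated))  # True
print(is_submodule_populated_gently("sub/b", populated))  # False
```

As the reply notes, when every caller can cope with a soft failure, only the gentle variant needs to exist.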
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
Stefan Beller writes:

> diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
> index f9a5f7535a..ed8329340f 100644
> --- a/builtin/verify-tag.c
> +++ b/builtin/verify-tag.c
> @@ -56,20 +56,21 @@ int cmd_verify_tag(int argc, const char **argv, const char *prefix)
>  	}
>
>  	while (i < argc) {
> -		unsigned char sha1[20];
> +		struct object_id oid;
>  		const char *name = argv[i++];
> -		if (get_sha1(name, sha1)) {
> +
> +		if (get_oid(name, &oid)) {
>  			had_error = !!error("tag '%s' not found.", name);
>  			continue;
>  		}

This part is already done, it seems, in the bc/object-id topic, even
though other parts are not yet done?

>
> -		if (gpg_verify_tag(sha1, name, flags)) {
> +		if (gpg_verify_tag(&oid, name, flags)) {
>  			had_error = 1;
>  			continue;
>  		}
>
>  		if (fmt_pretty)
> -			pretty_print_ref(name, sha1, fmt_pretty);
> +			pretty_print_ref(name, oid.hash, fmt_pretty);
>  	}
>  	return had_error;
>  }
> diff --git a/tag.c b/tag.c
> index 47f60ae151..7e10acfb6e 100644
> --- a/tag.c
> +++ b/tag.c
> @@ -33,7 +33,7 @@ static int run_gpg_verify(const char *buf, unsigned long size, unsigned flags)
>  	return ret;
>  }
>
> -int gpg_verify_tag(const unsigned char *sha1, const char *name_to_report,
> +int gpg_verify_tag(const struct object_id *oid, const char *name_to_report,
>  		unsigned flags)
>  {
>  	enum object_type type;
> @@ -41,20 +41,20 @@ int gpg_verify_tag(const unsigned char *sha1, const char *name_to_report,
>  	unsigned long size;
>  	int ret;
>
> -	type = sha1_object_info(sha1, NULL);
> +	type = sha1_object_info(oid->hash, NULL);
>  	if (type != OBJ_TAG)
>  		return error("%s: cannot verify a non-tag object of type %s.",
>  				name_to_report ?
>  				name_to_report :
> -				find_unique_abbrev(sha1, DEFAULT_ABBREV),
> +				find_unique_abbrev(oid->hash, DEFAULT_ABBREV),
>  				typename(type));
>
> -	buf = read_sha1_file(sha1, &type, &size);
> +	buf = read_sha1_file(oid->hash, &type, &size);
>  	if (!buf)
>  		return error("%s: unable to read file.",
>  				name_to_report ?
>  				name_to_report :
> -				find_unique_abbrev(sha1, DEFAULT_ABBREV));
> +				find_unique_abbrev(oid->hash, DEFAULT_ABBREV));
>
>  	ret = run_gpg_verify(buf, size, flags);
>
> diff --git a/tag.h b/tag.h
> index fdfcb4a84a..d469534e82 100644
> --- a/tag.h
> +++ b/tag.h
> @@ -17,7 +17,7 @@ extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long si
>  extern int parse_tag(struct tag *item);
>  extern struct object *deref_tag(struct object *, const char *, int);
>  extern struct object *deref_tag_noverify(struct object *);
> -extern int gpg_verify_tag(const unsigned char *sha1,
> +extern int gpg_verify_tag(const struct object_id *oid,
>  		const char *name_to_report, unsigned flags);
>
>  #endif /* TAG_H */
Re: [PATCH] submodule: use cheaper check for submodule pushes
Hi,

Stefan Beller wrote:

> Yes we are safe, because the function itself only spawns a child process
> (not using any of the objects).
>
> Its only caller, push_unpushed_submodules, also doesn't rely on objects
> loaded after calling push_submodule.
>
> The caller of push_unpushed_submodules (transport.c, transport_push)
> also doesn't need submodule objects loaded.

Thanks for looking into it. This is what the commit message should say
to help reviewers or people trying to understand it later. The
footnotes don't help and are distracting, except that it makes sense to
point out the original GSoC patch to say the alternate submodule odb
wasn't needed even then.

E.g.:

Subject: push: do not add submodule odb as an alternate when recursing
on demand

"git push --recurse-submodules=on-demand" adds each submodule as an
alternate with add_submodule_odb before checking whether the submodule
has anything to push and pushing it if so.

However, it never accesses any objects from the submodule. In the
parent process it uses the submodule's ref database to see if there is
anything to push. The actual push (which does rely on objects) occurs
in a child process.

The same was true when this call was originally added in
v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
option, 2012-03-29). Most likely it was added by analogy with
fetch --recurse-submodules=on-demand, which did use the submodule's
object database.

Use is_submodule_populated_gently instead, which is simpler and
cheaper.

[...]
> On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano wrote:
>> My hunch (and hope) is that we are probably safe, but that is a lot
>> weaker than "yes this is a good change we want to apply".
>
> Given the above (I went through the code), all I can do is repeat
> "yes this is a good change we want to apply".

With such a commit message change, this seems like a reasonable change
in principle (though I haven't looked carefully to verify it).

My one doubt is is_submodule_populated_gently. Why are we using that
instead of the simpler is_submodule_populated? The names and API
comments don't explain.

Thanks for your patient explanations,
Jonathan
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
On Thu, Jul 13, 2017 at 12:10:39PM -0700, Junio C Hamano wrote:

> > - I suspect in the third example we probably ought to be using the
> >   reflog of the branch that HEAD points to. You would not want:
> >
> >       $ git checkout advanced-branch
> >       $ git checkout older-branch
> >       $ git push --force-with-lease origin HEAD:older-branch
> >
> >   to consider that commits in advanced-branch are part of the lease.
>
> The third one was meant to be rewriting on detached HEAD, not having
> any underlying branch.

Ah, OK, that makes a lot more sense. I'd be tempted to say that it
should not allow force-with-lease at all. The HEAD reflog sees too many
unrelated things to make it a good basis for "does this result account
for everything in its reflog".

> > - For step 3, I'm not sure if you mean to look for exactly X, or
> >   if it would be OK to have any commit whose ancestor is X. I think
> >   you'd need the latter to accommodate a non-fast-forward "git pull"
> >   (or fetch+merge) where the local ref is never set precisely to the
> >   upstream commit.
>
> But the result in that case is a descendant of the upstream you just
> merged, so you do not even want to use any form of forcing---you
> would rather want to rely on the usual "push must fast-forward"
> mechanism, no?

Sorry, I mean the case where you do a merge from the other side, but
then end up rewinding the history in some way, taking into account that
merge and everything they did. For example:

  $ git pull
  $ git rebase ;# this will flatten the merge
  $ git push --force-with-lease

There was never a moment where the other side's tip ref was in your
local branch, but you did incorporate it via the merge.

-Peff
Re: [PATCH] RFC: Introduce '.gitorderfile'
Stefan Beller writes:

> The point I was trying to make is best demonstrated in
> t5526-fetch-submodules.sh:
>
>> ok 7 - using fetchRecurseSubmodules=true in .gitmodules recurses into
>> submodules
>> ok 8 - --no-recurse-submodules overrides .gitmodules config
>> ok 9 - using fetchRecurseSubmodules=false in .git/config overrides setting
>> in .gitmodules
>
> They were not suggestions, but defaults dictated by the project.

Yeah, and that is why I kept thinking that recurse-submodules may be a
huge mistake.
Re: [PATCH 0/15] making user-format colors conditional on config/tty
Jeff King writes:

> This version also takes into account feedback from the original thread.
> And as I added tests, it surfaced a few corner cases around color config
> that I've dealt with here. The last two patches are the most
> interesting bits.
>
>   [01/15]: check return value of verify_ref_format()
>   [02/15]: docs/for-each-ref: update pointer to color syntax
>   [03/15]: t: use test_decode_color rather than literal ANSI codes
>   [04/15]: ref-filter: simplify automatic color reset
>   [05/15]: ref-filter: abstract ref format into its own struct
>   [06/15]: ref-filter: move need_color_reset_at_eol into ref_format
>   [07/15]: ref-filter: provide a function for parsing sort options
>   [08/15]: ref-filter: make parse_ref_filter_atom a private function
>   [09/15]: ref-filter: factor out the parsing of sorting atoms
>   [10/15]: ref-filter: pass ref_format struct to atom parsers
>   [11/15]: color: check color.ui in git_default_config()
>   [12/15]: for-each-ref: load config earlier
>   [13/15]: rev-list: pass diffopt->use_colors through to pretty-print
>   [14/15]: pretty: respect color settings for %C placeholders
>   [15/15]: ref-filter: consult want_color() before emitting colors

Overall I think the endgame is what we want to have in the future
(rather, what I wish we had from the beginning). I'd have to think
about 11, 14 and 15 a bit more before saying anything that would be
remotely useful.

Thanks.

>  Documentation/git-for-each-ref.txt |   6 +-
>  Documentation/pretty-formats.txt   |  18 --
>  builtin/branch.c                   |  21 +++---
>  builtin/clean.c                    |   3 +-
>  builtin/for-each-ref.c             |  27
>  builtin/grep.c                     |   2 +-
>  builtin/rev-list.c                 |   1 +
>  builtin/show-branch.c              |   2 +-
>  builtin/tag.c                      |  61 ++
>  builtin/verify-tag.c               |  14 ++--
>  color.c                            |   8 ---
>  config.c                           |   4 ++
>  diff.c                             |   3 -
>  pretty.c                           |  27 ++--
>  ref-filter.c                       | 108 ++-
>  ref-filter.h                       |  30 +++--
>  t/t3203-branch-output.sh           |  31 +
>  t/t4207-log-decoration-colors.sh   |  22 +++
>  t/t6006-rev-list-format.sh         | 129 -
>  t/t6300-for-each-ref.sh            |  39 +++
>  t/t7004-tag.sh                     |  25 +++
>  t/test-lib-functions.sh            |   1 +
>  22 files changed, 362 insertions(+), 220 deletions(-)
>
> -Peff
Re: [PATCH 06/15] ref-filter: move need_color_reset_at_eol into ref_format
Jeff King writes:

> Calling verify_ref_format() doesn't just confirm that the
> format is sane; it actually sets some global variables that
> will be used later when formatting the refs. These logically
> should belong to the ref_format, which would make it
> possible to use multiple formats within a single program
> invocation.
>
> Let's move one such flag into the ref_format struct. There
> are still others that would need to be moved before it would
> be safe to use multiple formats, but this commit gives a
> blueprint for how that should look.
>
> Signed-off-by: Jeff King
> ---
> This commit is strictly optional for this series, but I wanted to give a
> sense of how the rest of the movement might look, since I was thinking
> about it. The big thing to move would be the used_atoms array, but I
> punted on that for now.

Heh, when I saw the hunk at 661,+7, used_atom[] was what immediately
came to mind. It is OK to punt for the purpose of this patch, but
moving these statics into the ref-format structure would be a good move
in the longer term.

Thanks.

>  ref-filter.c | 7 +++
>  ref-filter.h | 3 +++
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 66d234bb1..178396e1f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -97,7 +97,6 @@ static struct used_atom {
>  	} u;
>  } *used_atom;
>  static int used_atom_cnt, need_tagged, need_symref;
> -static int need_color_reset_at_eol;
>
>  static void color_atom_parser(struct used_atom *atom, const char *color_value)
>  {
> @@ -661,7 +660,7 @@ int verify_ref_format(struct ref_format *format)
>  {
>  	const char *cp, *sp;
>
> -	need_color_reset_at_eol = 0;
> +	format->need_color_reset_at_eol = 0;
>  	for (cp = format->format; *cp && (sp = find_next(cp)); ) {
>  		const char *color, *ep = strchr(sp, ')');
>  		int at;
> @@ -673,7 +672,7 @@ int verify_ref_format(struct ref_format *format)
>  		cp = ep + 1;
>
>  		if (skip_prefix(used_atom[at].name, "color:", &color))
> -			need_color_reset_at_eol = !!strcmp(color, "reset");
> +			format->need_color_reset_at_eol = !!strcmp(color, "reset");
>  	}
>  	return 0;
>  }
> @@ -2083,7 +2082,7 @@ void format_ref_array_item(struct ref_array_item *info,
>  		sp = cp + strlen(cp);
>  		append_literal(cp, sp, &state);
>  	}
> -	if (need_color_reset_at_eol) {
> +	if (format->need_color_reset_at_eol) {
>  		struct atom_value resetv;
>  		resetv.s = GIT_COLOR_RESET;
>  		append_atom(&resetv, &state);
> diff --git a/ref-filter.h b/ref-filter.h
> index 2bb58879d..9e1e89c19 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -79,6 +79,9 @@ struct ref_format {
>  	 */
>  	const char *format;
>  	int quote_style;
> +
> +	/* Internal state to ref-filter */
> +	int need_color_reset_at_eol;
>  };
>
>  #define REF_FORMAT_INIT { NULL, 0 }
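The direction the patch takes — moving per-invocation state out of file-scope statics and into the ref_format struct — is a general refactoring pattern, not specific to C. A toy Python analogue (names invented for illustration; the parsing here is a deliberately simplified stand-in for verify_ref_format()) showing why per-object state lets two formats coexist in one process:

```python
import re

class RefFormat:
    """Per-format state: the analogue of moving a C file-scope static
    into struct ref_format, so each format carries its own flag."""

    def __init__(self, fmt):
        self.fmt = fmt
        self.need_color_reset_at_eol = False  # previously "global" state

    def verify(self):
        # Simplified model of the real check: a format whose last-seen
        # %(color:...) atom is anything but "reset" needs a reset at EOL.
        colors = re.findall(r"%\(color:([^)]*)\)", self.fmt)
        self.need_color_reset_at_eol = any(c != "reset" for c in colors)
        return self

# Two formats can now be verified and used in a single invocation,
# which a shared module-level flag would have made unsafe:
a = RefFormat("%(color:bold red)%(refname)").verify()
b = RefFormat("%(refname)").verify()
print(a.need_color_reset_at_eol, b.need_color_reset_at_eol)  # True False
```

With a module-level flag, verifying `b` would have clobbered the answer computed for `a`; instance state removes that ordering hazard.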
Re: reftable: new ref storage format
On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote:

> >> ### Problem statement
> >>
> >> Some repositories contain a lot of references (e.g. android at 866k,
> >> rails at 31k). The existing packed-refs format takes up a lot of
> >> space (e.g. 62M), and does not scale with additional references.
> >> Lookup of a single reference requires linearly scanning the file.
> >
> > I think the linear scan is actually an implementation short-coming. Even
> > though the records aren't fixed-length, the fact that newlines can only
> > appear as end-of-record is sufficient to mmap and binary search a
> > packed-refs file (you just have to backtrack a little when you land in
> > the middle of a record).
>
> Except that a record is a "delta" to the previous record, so it's not
> just finding a record, but reconstructing it. Example for records:

I was still talking about the existing packed-refs implementation here.

I agree that a full binary search of a reftable is harder because of
the prefix compression (it may still be possible by scanning backwards,
but I think there are ambiguities when you land in the middle of a
record, since there's no unambiguous end-of-record character). But I
don't think it matters. If you binary-search to a constant-sized block,
then a linear scan of the block is acceptable.

> >> - Occupy less disk space for large repositories.
> >
> > Good goal. Just to play devil's advocate, the simplest way to do that
> > with the current code would be to gzip packed-refs (and/or store sha1s
> > as binary). That works against the "mmap and binary search" plan,
> > though. :)
>
> Given the compression by delta-ing the name to the previous change and
> the fact that Gerrit has
>
>   refs/heads/changes/1
>   refs/heads/changes/2
>   refs/heads/changes/3
>   ...
>
> I think this format would trump a "dumb" zip.
> (GitHub having sequentially numbered pull requests would also
> benefit here)

You may be surprised.
Let's imagine that you have a set of 4096 refs in refs/changes/1,
refs/changes/2, etc:

  for i in $(seq 1 4096)
  do
	echo refs/changes/$i
  done >input

Now let's do a prefix compression, with a single byte for "how many
characters to reuse from the last entry":

  perl -lne '
	my $common;
	if (defined $last) {
		chop $last while !/\Q$last\E/;
		$common = length($last);
	} else {
		$common = 0;
	}
	print chr($common), substr($_, $common);
	$last = $_;
  ' <input >prefix

And a gzip:

  gzip -c -9 <input >zip

And the results:

  $ wc -c prefix; wc -c zip
  12754 prefix
  10116 zip

The secret sauce is most likely that gzip is bit-packing, using only a
few bits per character and not aligning with byte boundaries.

Not that I'm recommending just gzipping the whole packed-refs file. It
ruins the fast-lookup. We _could_ consider gzipping individual blocks
of a reftable (or any structure that allows you to search to a
constant-sized block and do a linear search from there). But given that
they're in the same ballpark, I'm happy with whatever ends up the
simplest to code and debug. ;)

Just for fun, here's the decoding script for the prefix-compression:

  perl -e '
	while (read(STDIN, $common, 1)) {
		$common = ord($common);
		$rest = <STDIN>;
		if ($common > 0) {
			$rest = substr($last, 0, $common) . $rest;
		}
		print $rest;
		$last = $rest;
	}
  ' <prefix

> OK, let me try to summarize to see if I understand.
>
> When Shawn presented the proposal, a couple of colleagues here
> were as excited as I was, but the daring question is, why Shawn
> did not give the whole thing in BNF format from top down:
>
>   initial-block
>   content-blocks*
>   (index-block)
>   footer

Yeah, I agree it took me a bit to figure out what was going on. A
high-level overview of the format would have been nice.

> > So lookup really is more
> > like O(block_size * log(n/block_size)), but block_size being a constant,
> > it drops out to O(log n).
> There is also an index block such that you can binary search across
> blocks, so
>
>   O( log(block_count) + log(intra_block_restarting_points) + small linear scan)
>
> There are 2 binary searches, and the block size is an interesting
> thing to look at when making up trade-offs.

Right, the cross-block index was what I was trying to account for.
Either way, from a big-O perspective the block size and the number of
restarts are constants with respect to the total number of entries.

I'm happy with log(n), though. It's hard to do better.

-Peff
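The byte-oriented prefix compression being benchmarked above is easy to restate in another language. Here is a Python sketch of the same scheme as the perl one-liners (one common-prefix-length byte per record, newline-terminated records); it is illustrative, not the reftable wire format:

```python
def encode(refs):
    """Prefix-compress a list of ref names: for each record, emit one byte
    giving how many leading characters to reuse from the previous ref,
    then the differing suffix terminated by a newline."""
    out = bytearray()
    last = ""
    for ref in refs:
        common = 0
        while common < min(len(last), len(ref)) and last[common] == ref[common]:
            common += 1
        common = min(common, 255)  # the shared-prefix length is one byte
        out.append(common)
        out += (ref[common:] + "\n").encode()
        last = ref
    return bytes(out)

def decode(data):
    """Invert encode(): rebuild each ref from the previous one."""
    refs, last, i = [], "", 0
    while i < len(data):
        common = data[i]
        j = data.index(0x0A, i + 1)  # find the newline terminator
        last = last[:common] + data[i + 1:j].decode()
        refs.append(last)
        i = j + 1
    return refs

refs = ["refs/changes/%d" % i for i in range(1, 4097)]
packed = encode(refs)
assert decode(packed) == refs
print(len(packed))
```

As the thread notes, decoding a record requires the record before it, which is exactly why random access needs the restart points and block index rather than a raw binary search over the compressed bytes.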
Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct
Stefan Beller writes:

> On Thu, Jul 13, 2017 at 8:01 AM, Jeff King wrote:
>
>>  builtin/branch.c       | 14 +++---
>>  builtin/for-each-ref.c | 22 --
>>  builtin/tag.c          | 30 --
>>  builtin/verify-tag.c   | 12 ++--
>>  ref-filter.c           | 22 --
>>  ref-filter.h           | 22 +-
>>  6 files changed, 70 insertions(+), 52 deletions(-)
>
> The patch looks good to me. So some off-topic comments:
> I reviewed this patch from bottom up, i.e. I started looking at
> ref-filter.h, then ref-filter.c and then the rest. If only you had
> formatted the patches with an orderfile. ;)

As a reviewer, for this particular patch, I actually appreciated that
ref-filter.[ch] came at the end. That forced me to think. When I saw
that an argument that used to be 0 was gone from the parameter list of
show_ref_array_item() at a callsite, I was forced to guess what it was
by looking at what was moved into the new structure nearby, and I kept
playing that "figure out what is going on" game until the "author's
answer" was revealed at the end of the patch.

By the time I reached ref-filter.[ch], I had a pretty good idea of what
was done from only looking at the callers, and being able to check
whether my understanding matched the "author's answer" at the end of
the patch turned out to be a very good way to double-check that the
patch was sane. If I were given the author's answer upfront, especially
an answer by somebody as competent as Peff, I'm sure I would have been
swayed into believing that whatever is written in the patch must be
correct, without thinking through the needed changes myself.

I do want to present from Doc to header to code when I am showing my
patch to others, so this is probably a good example that illustrates
that the preferred presentation order is not just personal preference,
but can differ on occasion even for the same person.
Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes
Jeff King writes:

> When we put literal ANSI terminal codes into our test
> scripts, it makes diffs on those scripts hard to read (the
> colors may be indistinguishable from diff coloring, or in
> the case of a reset, may not be visible at all).
>
> Some scripts get around this by including human-readable
> names and converting to literal codes with a git-config
> hack. This makes the actual code diffs look OK, but test_cmp
> output suffers from the same problem.
>
> Let's use test_decode_color instead, which turns the codes
> into obvious text tags.

Nice. Thanks.
Re: [PATCH 02/15] docs/for-each-ref: update pointer to color syntax
Jeff King writes:

> The documentation for the %(color) placeholder refers to the
> color.branch.* config for more details. But those details
> moved to their own section in b92c1a28f
> (Documentation/config.txt: describe 'color' value type in
> the "Values" section, 2015-03-03). Let's update our
> pointer. We can steal the text from 30cfe72d3 (pretty: fix
> document link for color specification, 2016-10-11), which
> fixed the same problem in a different place.

Thanks.

> While we're at it, let's give an example, which makes the
> syntax much more clear than just the text.

Nice. Thanks again.

>
> Signed-off-by: Jeff King
> ---
>  Documentation/git-for-each-ref.txt | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 03e187a10..cc42c1283 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -156,8 +156,10 @@ HEAD::
>  	otherwise.
>
>  color::
> -	Change output color. Followed by `:<colorname>`, where names
> -	are described in `color.branch.*`.
> +	Change output color. Followed by `:<colorname>`, where color
> +	names are described under Values in the "CONFIGURATION FILE"
> +	section of linkgit:git-config[1]. For example,
> +	`%(color:bold red)`.
>
>  align::
>  	Left-, middle-, or right-align the content between
[GSoC][PATCH 4/5 v4] submodule: port submodule subcommand 'status' from shell to C
This aims to make git-submodule 'status' a built-in. Hence, the
function cmd_status() is ported from shell to C. This is done by
introducing three functions: module_status(), submodule_status() and
print_status().

The function module_status() acts as the front-end of the subcommand.
It parses the subcommand's options and then calls
module_list_compute() to compute the list of submodules. It then calls
for_each_submodule_list(), looping through the obtained list, which in
turn calls submodule_status() for each submodule in the list.

The function submodule_status() is responsible for generating the
status of each submodule it is called for, and then calls
print_status().

Finally, the function print_status() handles the printing of the
submodule's status.

Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
---
In this new version of the patch:

Instead of using cmd_diff_files(), the process is optimized by using
ce_match_stat(). Also, the child process running the command
'rev-parse --verify HEAD' is removed for optimization reasons;
instead, head_ref_submodule() is used with the callback function
handle_submodule_head_ref().

This series differs from the complete weekly-update series in that
these patches have been reviewed by the mentors as well as on the
mailing list more times than the rest of the series, and hence, IMO,
are ready for inclusion. If not, I would like suggestions for
improving them.

The patches pass the complete test suite.
 builtin/submodule--helper.c | 146
 git-submodule.sh            |  49 +--
 2 files changed, 147 insertions(+), 48 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 80f744407..9c1630495 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -560,6 +560,151 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	return 0;
 }

+struct status_cb {
+	const char *prefix;
+	unsigned int quiet: 1;
+	unsigned int recursive: 1;
+	unsigned int cached: 1;
+};
+#define STATUS_CB_INIT { NULL, 0, 0, 0 }
+
+static void print_status(struct status_cb *info, char state, const char *path,
+			 char *sub_sha1, char *displaypath)
+{
+	if (info->quiet)
+		return;
+
+	printf("%c%s %s", state, sub_sha1, displaypath);
+
+	if (state == ' ' || state == '+') {
+		struct argv_array name_rev_args = ARGV_ARRAY_INIT;
+
+		argv_array_pushl(&name_rev_args, "print-name-rev",
+				 path, sub_sha1, NULL);
+		print_name_rev(name_rev_args.argc, name_rev_args.argv,
+			       info->prefix);
+	} else {
+		printf("\n");
+	}
+}
+
+static int handle_submodule_head_ref(const char *refname,
+				     const struct object_id *oid, int flags,
+				     void *cb_data)
+{
+	struct strbuf *output = cb_data;
+	if (oid)
+		strbuf_addstr(output, oid_to_hex(oid));
+	return 0;
+}
+
+static void status_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+	struct status_cb *info = cb_data;
+	char *sub_sha1 = xstrdup(oid_to_hex(&list_item->oid));
+	char *displaypath;
+	struct stat st;
+
+	if (!submodule_from_path(null_sha1, list_item->name))
+		die(_("no submodule mapping found in .gitmodules for path '%s'"),
+		    list_item->name);
+
+	displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+	if (list_item->ce_flags) {
+		print_status(info, 'U', list_item->name,
+			     sha1_to_hex(null_sha1), displaypath);
+		goto cleanup;
+	}
+
+	if (!is_submodule_active(the_repository, list_item->name)) {
+		print_status(info, '-', list_item->name, sub_sha1, displaypath);
+		goto cleanup;
+	}
+
+	if (!lstat(list_item->name, &st) && !ce_match_stat(list_item, &st, 0)) {
+		print_status(info, ' ', list_item->name, sub_sha1, displaypath);
+	} else {
+		if (!info->cached) {
+			struct strbuf sb = STRBUF_INIT;
+			if (head_ref_submodule(list_item->name,
+					       handle_submodule_head_ref, &sb))
+				die(_("could not resolve HEAD ref inside the"
+				      "submodule '%s'"), list_item->name);
+			print_status(info, '+', list_item->name, sb.buf,
+				     displaypath);
+			strbuf_release(&sb);
+		} else {
+
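The state characters that status_submodule() feeds into print_status() follow the same decision ladder as the shell version of cmd_status. A hedged Python restatement of that ladder (the dict keys are stand-ins for the C predicates, not real git calls):

```python
def status_state(entry):
    """Return the `git submodule status` prefix character for one entry.

    'entry' is a hypothetical dict summarizing the checks the C code
    performs against the index and the submodule's working tree.
    """
    if entry["unmerged"]:      # list_item->ce_flags set: merge conflicts
        return "U"
    if not entry["active"]:    # !is_submodule_active(...): not initialized
        return "-"
    if entry["clean"]:         # !lstat(...) && !ce_match_stat(...):
        return " "             # checked-out commit matches the index
    return "+"                 # checked-out commit differs from the index

print(status_state({"unmerged": False, "active": True, "clean": False}))  # +
```

The ordering matters: an unmerged entry is reported as 'U' even before the active/populated checks, mirroring the early `goto cleanup` paths in the C code.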
[GSoC][PATCH 5/5 v4] submodule: port submodule subcommand 'sync' from shell to C
Port the submodule subcommand 'sync' from shell to C using the same
mechanism as that used for porting submodule subcommand 'status'.
Hence, here the function cmd_sync() is ported from shell to C. This is
done by introducing three functions: module_sync(), sync_submodule()
and print_default_remote(). The function print_default_remote() is
introduced for printing the default remote to stdout.

Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
---
 builtin/submodule--helper.c | 179
 git-submodule.sh            |  56 +-
 2 files changed, 180 insertions(+), 55 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9c1630495..da91c489b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -44,6 +44,20 @@ static char *get_default_remote(void)
 	return ret;
 }

+static int print_default_remote(int argc, const char **argv, const char *prefix)
+{
+	const char *remote;
+
+	if (argc != 1)
+		die(_("submodule--helper print-default-remote takes no arguments"));
+
+	remote = get_default_remote();
+	if (remote)
+		puts(remote);
+
+	return 0;
+}
+
 static int starts_with_dot_slash(const char *str)
 {
 	return str[0] == '.' && is_dir_sep(str[1]);
 }

@@ -379,6 +393,25 @@ static void module_list_active(struct module_list *list)
 	*list = active_modules;
 }

+static char *get_up_path(const char *path)
+{
+	int i;
+	struct strbuf sb = STRBUF_INIT;
+
+	for (i = count_slashes(path); i; i--)
+		strbuf_addstr(&sb, "../");
+
+	/*
+	 * Check if 'path' ends with slash or not
+	 * for having the same output for dir/sub_dir
+	 * and dir/sub_dir/
+	 */
+	if (!is_dir_sep(path[strlen(path) - 1]))
+		strbuf_addstr(&sb, "../");
+
+	return strbuf_detach(&sb, NULL);
+}
+
 static int module_list(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -724,6 +757,150 @@ static int module_name(int argc, const char **argv, const char *prefix)
 	return 0;
 }

+struct sync_cb {
+	const char *prefix;
+	unsigned int quiet: 1;
+	unsigned int recursive: 1;
+};
+#define SYNC_CB_INIT { NULL, 0, 0 }
+
+static void sync_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+	struct sync_cb *info = cb_data;
+	const struct submodule *sub;
+	char *sub_key, *remote_key;
+	char *sub_origin_url, *super_config_url, *displaypath;
+	struct strbuf sb = STRBUF_INIT;
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	if (!is_submodule_active(the_repository, list_item->name))
+		return;
+
+	sub = submodule_from_path(null_sha1, list_item->name);
+
+	if (!sub || !sub->url)
+		die(_("no url found for submodule path '%s' in .gitmodules"),
+		    list_item->name);
+
+	if (starts_with_dot_dot_slash(sub->url) || starts_with_dot_slash(sub->url)) {
+		char *remote_url, *up_path;
+		char *remote = get_default_remote();
+		char *remote_key = xstrfmt("remote.%s.url", remote);
+
+		if (git_config_get_string(remote_key, &remote_url))
+			remote_url = xgetcwd();
+
+		up_path = get_up_path(list_item->name);
+		sub_origin_url = relative_url(remote_url, sub->url, up_path);
+		super_config_url = relative_url(remote_url, sub->url, NULL);
+
+		free(remote);
+		free(remote_key);
+		free(up_path);
+		free(remote_url);
+	} else {
+		sub_origin_url = xstrdup(sub->url);
+		super_config_url = xstrdup(sub->url);
+	}
+
+	displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+	if (!info->quiet)
+		printf(_("Synchronizing submodule url for '%s'\n"),
+		       displaypath);
+
+	sub_key = xstrfmt("submodule.%s.url", sub->name);
+	if (git_config_set_gently(sub_key, super_config_url))
+		die(_("failed to register url for submodule path '%s'"),
+		    displaypath);
+
+	if (!is_submodule_populated_gently(list_item->name, NULL))
+		goto cleanup;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = list_item->name;
+	argv_array_pushl(&cp.args, "submodule--helper",
+			 "print-default-remote", NULL);
+	if (capture_command(&cp, &sb, 0))
+		die(_("failed to get the default remote for submodule '%s'"),
+		    list_item->name);
+
+	strbuf_strip_suffix(&sb, "\n");
+	remote_key = xstrfmt("remote.%s.url", sb.buf);
+	strbuf_release(&sb);

+	child_process_init(&cp);
+
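The get_up_path() helper in the hunk above just emits one "../" per path component, so a relative submodule URL can be resolved from the superproject root. A Python sketch of the same computation (behavior inferred from the C code, including the trailing-slash handling described in its comment):

```python
def get_up_path(path):
    """Return '../' repeated once per component of 'path'.

    Mirrors the C helper: one '../' per slash in the path, plus one
    more unless the path already ends with a separator, so that
    'dir/sub_dir' and 'dir/sub_dir/' give the same answer.
    """
    up = "../" * path.count("/")
    if not path.endswith("/"):
        up += "../"
    return up

print(get_up_path("dir/sub_dir"))   # ../../
print(get_up_path("dir/sub_dir/"))  # ../../
```

This string is then handed to relative_url() (in the C code) as the walk-back from the submodule's directory to the superproject.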
[GSoC][PATCH 3/5 v4] submodule: port set_name_rev() from shell to C
Function set_name_rev() is ported from git-submodule to the
submodule--helper builtin. The function get_name_rev() generates the
value of the revision name as required, and the function
print_name_rev() handles the formatting and printing of the obtained
revision name.

Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
---
 builtin/submodule--helper.c | 63 +
 git-submodule.sh            | 16 ++--
 2 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e41572f7a..80f744407 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -244,6 +244,68 @@ static char *get_submodule_displaypath(const char *path, const char *prefix)
 	}
 }

+static char *get_name_rev(const char *sub_path, const char* object_id)
+{
+	struct strbuf sb = STRBUF_INIT;
+	const char ***d;
+
+	static const char *describe_bare[] = {
+		NULL
+	};
+
+	static const char *describe_tags[] = {
+		"--tags", NULL
+	};
+
+	static const char *describe_contains[] = {
+		"--contains", NULL
+	};
+
+	static const char *describe_all_always[] = {
+		"--all", "--always", NULL
+	};
+
+	static const char **describe_argv[] = {
+		describe_bare, describe_tags, describe_contains,
+		describe_all_always, NULL
+	};
+
+	for (d = describe_argv; *d; d++) {
+		struct child_process cp = CHILD_PROCESS_INIT;
+		prepare_submodule_repo_env(&cp.env_array);
+		cp.dir = sub_path;
+		cp.git_cmd = 1;
+		cp.no_stderr = 1;
+
+		argv_array_push(&cp.args, "describe");
+		argv_array_pushv(&cp.args, *d);
+		argv_array_push(&cp.args, object_id);
+
+		if (!capture_command(&cp, &sb, 0) && sb.len) {
+			strbuf_strip_suffix(&sb, "\n");
+			return strbuf_detach(&sb, NULL);
+		}
+	}
+
+	strbuf_release(&sb);
+	return NULL;
+}
+
+static int print_name_rev(int argc, const char **argv, const char *prefix)
+{
+	char *namerev;
+	if (argc != 3)
+		die("print-name-rev only accepts two arguments: <path> <sha1>");
+
+	namerev = get_name_rev(argv[1], argv[2]);
+	if (namerev && namerev[0])
+		printf(" (%s)", namerev);
+	printf("\n");
+
+	return 0;
+}
+
 struct module_list {
 	const struct cache_entry **entries;
 	int alloc, nr;
@@ -1242,6 +1304,7 @@ static struct cmd_struct commands[] = {
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
 	{"resolve-relative-url-test", resolve_relative_url_test, 0},
+	{"print-name-rev", print_name_rev, 0},
 	{"init", module_init, SUPPORT_SUPER_PREFIX},
 	{"remote-branch", resolve_remote_submodule_branch, 0},
 	{"push-check", push_check, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index e131760ee..e988167e0 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -759,18 +759,6 @@ cmd_update()
 	}
 }

-set_name_rev () {
-	revname=$( (
-		sanitize_submodule_env
-		cd "$1" && {
-			git describe "$2" 2>/dev/null ||
-			git describe --tags "$2" 2>/dev/null ||
-			git describe --contains "$2" 2>/dev/null ||
-			git describe --all --always "$2"
-		}
-	) )
-	test -z "$revname" || revname=" ($revname)"
-}
 #
 # Show commit summary for submodules in index or working tree
 #
@@ -1042,14 +1030,14 @@ cmd_status()
 			fi
 			if git diff-files --ignore-submodules=dirty --quiet -- "$sm_path"
 			then
-				set_name_rev "$sm_path" "$sha1"
+				revname=$(git submodule--helper print-name-rev "$sm_path" "$sha1")
 				say " $sha1 $displaypath$revname"
 			else
 				if test -z "$cached"
 				then
					sha1=$(sanitize_submodule_env; cd "$sm_path" && git rev-parse --verify HEAD)
 				fi
-				set_name_rev "$sm_path" "$sha1"
+				revname=$(git submodule--helper print-name-rev "$sm_path" "$sha1")
 				say "+$sha1 $displaypath$revname"
 			fi
--
2.13.0
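The ported get_name_rev() preserves the shell helper's fallback chain: plain `git describe`, then `--tags`, then `--contains`, then `--all --always`, returning the first non-empty result. A Python sketch of that control flow, with a stubbed runner standing in for capture_command() (the stub and its return value are invented for the example):

```python
# The four argument sets tried in order, mirroring describe_argv in C.
DESCRIBE_VARIANTS = [
    [],                      # git describe <sha1>
    ["--tags"],              # git describe --tags <sha1>
    ["--contains"],          # git describe --contains <sha1>
    ["--all", "--always"],   # git describe --all --always <sha1>
]

def get_name_rev(run, object_id):
    """Return the first non-empty result from the describe fallback chain.

    'run' stands in for capture_command(): it receives the full argv and
    returns the command's stdout, or None when the command fails.
    """
    for extra in DESCRIBE_VARIANTS:
        out = run(["describe"] + extra + [object_id])
        if out:
            return out.rstrip("\n")  # strbuf_strip_suffix(&sb, "\n")
    return None

# Stub runner: pretend only 'describe --tags' knows this commit.
def fake_run(argv):
    return "v2.13.0\n" if "--tags" in argv else None

print(get_name_rev(fake_run, "deadbeef"))  # v2.13.0
```

The design point the port keeps is that failure is signaled by falling through to the next, progressively more permissive, describe invocation, with `--all --always` as the variant that can always produce some name.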
[GSoC][PATCH 2/5 v4] submodule--helper: introduce for_each_submodule_list()
Introduce the function for_each_submodule_list() and replace a loop in module_init() with a call to it. The new function will also be used in other parts of the system in later patches.

Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
---
 builtin/submodule--helper.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7af4de09b..e41572f7a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -14,6 +14,9 @@
 #include "refs.h"
 #include "connect.h"
 
+typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
+				      void *cb_data);
+
 static char *get_default_remote(void)
 {
 	char *dest = NULL, *ret;
@@ -352,17 +355,30 @@ static int module_list(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
-static void init_submodule(const char *path, const char *prefix, int quiet)
+static void for_each_submodule_list(const struct module_list list,
+				    submodule_list_func_t fn, void *cb_data)
 {
+	int i;
+	for (i = 0; i < list.nr; i++)
+		fn(list.entries[i], cb_data);
+}
+
+struct init_cb {
+	const char *prefix;
+	unsigned int quiet: 1;
+};
+#define INIT_CB_INIT { NULL, 0 }
+
+static void init_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+	struct init_cb *info = cb_data;
 	const struct submodule *sub;
 	struct strbuf sb = STRBUF_INIT;
 	char *upd = NULL, *url = NULL, *displaypath;
 
-	/* Only loads from .gitmodules, no overlay with .git/config */
-	gitmodules_config();
-	displaypath = get_submodule_displaypath(path, prefix);
+	displaypath = get_submodule_displaypath(list_item->name, info->prefix);
 
-	sub = submodule_from_path(null_sha1, path);
+	sub = submodule_from_path(null_sha1, list_item->name);
 
 	if (!sub)
 		die(_("No url found for submodule path '%s' in .gitmodules"),
 		    displaypath);
@@ -374,7 +390,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 	 *
 	 * Set active flag for the submodule being
	 * initialized
	 */
-	if (!is_submodule_active(the_repository, path)) {
+	if (!is_submodule_active(the_repository, list_item->name)) {
 		strbuf_addf(&sb, "submodule.%s.active", sub->name);
 		git_config_set_gently(sb.buf, "true");
 	}
@@ -416,7 +432,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 		if (git_config_set_gently(sb.buf, url))
 			die(_("Failed to register url for submodule path '%s'"),
 			    displaypath);
-		if (!quiet)
+		if (!info->quiet)
 			fprintf(stderr,
 				_("Submodule '%s' (%s) registered for path '%s'\n"),
 				sub->name, url, displaypath);
@@ -445,10 +461,10 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 
 static int module_init(int argc, const char **argv, const char *prefix)
 {
+	struct init_cb info = INIT_CB_INIT;
 	struct pathspec pathspec;
 	struct module_list list = MODULE_LIST_INIT;
 	int quiet = 0;
-	int i;
 
 	struct option module_init_options[] = {
 		OPT__QUIET(&quiet, N_("Suppress output for initializing a submodule")),
@@ -473,8 +489,11 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	if (!argc && git_config_get_value_multi("submodule.active"))
 		module_list_active(&list);
 
-	for (i = 0; i < list.nr; i++)
-		init_submodule(list.entries[i]->name, prefix, quiet);
+	info.prefix = prefix;
+	info.quiet = !!quiet;
+
+	gitmodules_config();
+	for_each_submodule_list(list, init_submodule, &info);
 
 	return 0;
 }
-- 
2.13.0
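The refactor above converts a hand-written loop into the classic callback-plus-opaque-state pattern (fn/cb_data in C). A minimal Python sketch of the same shape, with a stand-in for `struct init_cb` that only records what it would have acted on instead of touching git config:

```python
class InitCb:
    """Stands in for 'struct init_cb' from the patch: per-run options
    plus a place to accumulate results (an addition for this sketch)."""
    def __init__(self, prefix="", quiet=False):
        self.prefix = prefix
        self.quiet = quiet
        self.registered = []

def for_each_submodule_list(entries, fn, cb_data):
    # direct analogue of the new C helper: apply fn to every entry,
    # threading the opaque callback state through cb_data
    for entry in entries:
        fn(entry, cb_data)

def init_submodule(entry, info):
    # the real callback registers config entries; here we just record
    # the displaypath it would have operated on
    info.registered.append(info.prefix + entry)

info = InitCb(prefix="super/")
for_each_submodule_list(["lib/a", "lib/b"], init_submodule, info)
```

The benefit mirrors the C version: later patches can reuse the iteration while swapping in a different callback and callback-state struct.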
[GSoC][PATCH 1/5 v4] submodule--helper: introduce get_submodule_displaypath()
Introduce the function get_submodule_displaypath() and replace the displaypath-generating code in init_submodule() with a call to it. This new function will also be used in other parts of the system in later patches.

Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
---
 builtin/submodule--helper.c | 33 +++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6abdad329..7af4de09b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -220,6 +220,27 @@ static int resolve_relative_url_test(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+	const char *super_prefix = get_super_prefix();
+
+	if (prefix && super_prefix) {
+		BUG("cannot have prefix '%s' and superprefix '%s'",
+		    prefix, super_prefix);
+	} else if (prefix) {
+		struct strbuf sb = STRBUF_INIT;
+		char *displaypath = xstrdup(relative_path(path, prefix, &sb));
+		strbuf_release(&sb);
+		return displaypath;
+	} else if (super_prefix) {
+		int len = strlen(super_prefix);
+		const char *format = is_dir_sep(super_prefix[len - 1]) ?
+			"%s%s" : "%s/%s";
+		return xstrfmt(format, super_prefix, path);
+	} else {
+		return xstrdup(path);
+	}
+}
+
 struct module_list {
 	const struct cache_entry **entries;
 	int alloc, nr;
@@ -339,16 +360,7 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 
 	/* Only loads from .gitmodules, no overlay with .git/config */
 	gitmodules_config();
-
-	if (prefix && get_super_prefix())
-		die("BUG: cannot have prefix and superprefix");
-	else if (prefix)
-		displaypath = xstrdup(relative_path(path, prefix, &sb));
-	else if (get_super_prefix()) {
-		strbuf_addf(&sb, "%s%s", get_super_prefix(), path);
-		displaypath = strbuf_detach(&sb, NULL);
-	} else
-		displaypath = xstrdup(path);
+	displaypath = get_submodule_displaypath(path, prefix);
 
 	sub = submodule_from_path(null_sha1, path);
 
@@ -363,7 +375,6 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
 	 * Set active flag for the submodule being initialized
 	 */
 	if (!is_submodule_active(the_repository, path)) {
-		strbuf_reset(&sb);
 		strbuf_addf(&sb, "submodule.%s.active", sub->name);
 		git_config_set_gently(sb.buf, "true");
 	}
-- 
2.13.0
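The three branches of the new helper (prefix relative to the current directory, super-prefix prepended from the superproject, or the path as-is) can be sketched in Python. This is a rough approximation: `os.path.relpath` only loosely mimics git's `relative_path()`, and the function name here just mirrors the C one for readability.

```python
import os

def get_submodule_displaypath(path, prefix=None, super_prefix=None):
    """Sketch of the branch structure in the C helper. Exactly one of
    prefix / super_prefix may be set; both at once is a bug."""
    if prefix and super_prefix:
        raise AssertionError(
            "cannot have prefix '%s' and superprefix '%s'"
            % (prefix, super_prefix))
    if prefix:
        # the C code uses relative_path(); os.path.relpath is only an
        # approximation of its behavior
        return os.path.relpath(path, prefix)
    if super_prefix:
        # avoid a doubled separator when super_prefix already ends in one
        fmt = "%s%s" if super_prefix.endswith("/") else "%s/%s"
        return fmt % (super_prefix, path)
    return path
```

The separator check corresponds to the `is_dir_sep(super_prefix[len - 1])` test in the patch: a super-prefix may or may not carry its trailing slash, and the displaypath must not end up with two.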
Re: reftable: new ref storage format
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote:
> On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote:
>
>> We've been having scaling problems with insane number of references
>> (>866k), so I started thinking a lot about improving ref storage.
>>
>> I've written a simple approach, and implemented it in JGit.
>> Performance is promising:
>>
>>   - 62M packed-refs compresses to 27M
>>   - 42.3 usec lookup
>
> Exciting. I'd love for us to have a simple-ish on-disk structure that
> scales well and doesn't involve a dependency on a third-party database
> structure.
>
> Let me see what holes I can poke in your proposal, though. :)
>
>> ### Problem statement
>>
>> Some repositories contain a lot of references (e.g. android at 866k,
>> rails at 31k). The existing packed-refs format takes up a lot of
>> space (e.g. 62M), and does not scale with additional references.
>> Lookup of a single reference requires linearly scanning the file.
>
> I think the linear scan is actually an implementation short-coming. Even
> though the records aren't fixed-length, the fact that newlines can only
> appear as end-of-record is sufficient to mmap and binary search a
> packed-refs file (you just have to backtrack a little when you land in
> the middle of a record).

Except that a record is a "delta" to the previous record, so it's not
just finding a record, but reconstructing it. Example for records:

  varint( prefix_length )
  varint( (suffix_length << 2) | type )
  suffix
  value?

First record:

  0, 16 << 2 | 0x01, "refs/heads/maint"
  08f9c32463bf9e578acb7ac5f77afd36e803c6bc

next record (refs/heads/master):

  13, 4 << 2 | 0x01, "ster"
  80145b1e412719c960036c8c62a9e35dd23a5b2d

Now if you found the second one, you cannot reconstruct its real name
(refs/heads/master) without knowing the name of the first. The name of
the first is easy because the prefix_length is 0. If it also had a
prefix length != 0 you'd have to go back further.

>> - Occupy less disk space for large repositories.
>
> Good goal.
> Just to play devil's advocate, the simplest way to do that
> with the current code would be to gzip packed-refs (and/or store sha1s
> as binary). That works against the "mmap and binary search" plan,
> though. :)

Given the compression by delta-ing the name to the previous record, and
the fact that Gerrit has

  refs/heads/changes/1
  refs/heads/changes/2
  refs/heads/changes/3
  ...

I think this format would trump a "dumb" zip. (GitHub having
sequentially numbered pull requests would also benefit here.)

>> ## File format
>
> OK, let me try to summarize to see if I understand.

When Shawn presented the proposal, a couple of colleagues here were as
excited as I was, but the daring question is why Shawn did not give the
whole thing in BNF format from the top down:

  initial-block
  content-blocks*
  (index-block)
  footer

> The reftable file is a sequence of blocks, each of which contains a
> finite set of heavily-compressed refs. You have to read each block
> sequentially,

Each block may have restarting points that allow for intra-block binary
search.

> but since they're a fixed size, that's still a
> constant-time operation (I'm ignoring the "restarts" thing for now). You
> find the right block by reading the index.

or by reading the footer at the end. If the footer and the index differ
in block size (one bit flipped), we can ask the CRC of the footer for
more guidance.

> So lookup really is more
> like O(block_size * log(n/block_size)), but block_size being a constant,
> it drops out to O(log n).

There is also an index block such that you can binary search across
blocks, so

  O( log(block_count) + log(intra_block_restarting_points) + small linear scan )

There are 2 binary searches, and the block size is an interesting thing
to look at when making up trade-offs.
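The prefix-compression scheme discussed in this thread (store only the byte count shared with the previous name, plus the differing suffix) is easy to demonstrate. The sketch below implements just the name-delta part, ignoring the varint encoding and the type/value fields; restart points would simply be records forced to prefix length 0.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def encode(names):
    """Delta-encode sorted ref names as (prefix_length, suffix) pairs."""
    prev = ""
    records = []
    for name in sorted(names):
        p = common_prefix_len(prev, name)
        records.append((p, name[p:]))
        prev = name
    return records

def decode(records):
    """Reconstruct the full names; each record needs its predecessor,
    which is exactly the lookup complication raised in the thread."""
    prev = ""
    out = []
    for p, suffix in records:
        name = prev[:p] + suffix
        out.append(name)
        prev = name
    return out
```

Running this on the thread's own example reproduces its numbers: `refs/heads/maint` followed by `refs/heads/master` shares a 13-byte prefix, leaving only the suffix "ster" to store. It also makes the stated drawback concrete: `decode` is inherently sequential from the last restart point.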
Re: "groups of files" in Git?
For me the roadblock for multiple iterations through merging of the
different parts (S, C, then C+S) is the time that will be spent on
rebuilding the mainline. That's why I would like to have C+S in the same
source tree, then run tests for S, tests for C (if they can be run
standalone) and C+S tests, then tests for whatever other pieces may be
affected. (As I mentioned, there are more layers than client + server in
my situation, e.g. client + transport + server.)

I am not really trying to ignite a holy war between Perforce and Git
(and why would one?), but if you are interested in the answer on how
you'd do your scenario in Perforce, it would be: "use shelved
changelists". In Perforce, you can "shelve" a changelist, similar to
"stash" in Git, but the difference is that Perforce shelved changes are
accessible across clients. I.e. another developer can "unshelve" these
pending changes into his sandbox (on the same or a different branch) so
that the sandbox would get the pending changes as well. That would be as
if the developer had made these changes himself. Whatever
automated/manual process is involved, it is typical to run a trial
build/tests on a shelved changelist (i.e. files not yet committed) to
verify the quality of the changes.

Git achieves the same through the ease of manipulation with branches,
and I like the way it does it much more. My question was about how to
robustly handle "multiple pending commits", which in Perforce are
represented by the concept of pending changelists.

On Thu, Jul 13, 2017 at 2:22 PM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov wrote:
>>
>>> With Git I cannot seem to find the possibility to figure out how to
>>> And the problem is that putting change sets
>>> on different Git branches (or workdirs, or whatever Git offers that
>>> makes the changes be NOT in the same source tree) is not a viable
>>> option for me, as I would have to re-build code as I re-integrate the
>>> changes between the branches (or whatever change-separation Git
>>> feature is used).
>>
>> you would merge the branches and then run the tests/integration. Yes that
>> seems cumbersome.
>
> Sometimes the need to make a trial merge for testing cannot be avoided
> and having branches for separate topics is the only sensible
> approach, at least in the Git world.
>
> Imagine your project has two components that are interrelated, say,
> the server and the client, that have to work well with each other.
> In addition, you want to make sure your updated server works well
> with existing clients, and vice versa.
>
> One way that naturally maps this scenario to the development
> workflow is to have a server-update topic and a client-update topic
> branch, and separate changes to update each side with their own
> commits:
>
>            s---s---S      server-update topic
>           /
>   ---o---o---M             mainline
>           \
>            c---c---C      client-update topic
>
> And during the development of these *-update topics, you try three
> merges:
>
> (1) Merge S to the mainline M and test the whole thing, to make sure
>     that existing clients will still be able to talk with the
>     updated server.
>
> (2) Merge C to the mainline M and test the whole thing, to make
>     sure that updated clients will still be able to talk with the
>     existing server.
>
> (3) Merge both S and C to the mainline M and test the whole thing,
>     to make sure the updated ones talk to each other.
>
> If there is no significant development going on on the mainline in
> the meantime, (1) and (2) can be done by trying out S and C alone
> without making a trial merge with M. The same for (3)---it can be
> just a trial merge between S and C without updates that happened on
> the mainline.
>
> I'd love to hear from folks in the Perforce or other worlds how they
> address this scenario with their system.
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
On Thu, Jul 13, 2017 at 8:40 PM, Jeff King wrote:
> On Thu, Jul 13, 2017 at 11:29:10AM -0700, Junio C Hamano wrote:
>
>> > So then I think your config file primarily becomes about defining the
>> > properties of each run. I'm not sure if it would look like what you're
>> > starting on here or not.
>>
>> Yeah, I suspect that the final shape that defines the matrix might
>> have to become quite a bit different.
>
> I think it would help if the perf code was split better into three
> distinct bits:
>
>   1. A data-store capable of storing the run tuples along with their
>      outcomes for each test.
>
>   2. A "run" front-end that runs various profiles (based on config,
>      command-line options, etc) and writes the results to the data
>      store.
>
>   3. A flexible viewer which can slice and dice the contents of the data
>      store according to different parameters.
>
> We're almost there now. The "run" script actually does store results,
> and you can view them via "aggregate.pl" without actually re-running the
> tests. But the data store only indexes on one property: the tree that
> was tested (and all of the other properties are ignored totally; you can
> get some quite confusing results if you do a "./run" using say git.git
> as your test repo, and then a followup with "linux.git").

Yeah, I agree, but if possible I'd like to avoid working on the three
different parts at the same time. I haven't thought much about how to
improve the data store yet. I may have to look at that soon though.

> I have to imagine that somebody else has written such a system already
> that we could reuse. I don't know of one off-hand, but this is also not
> an area where I've spent a lot of time.
Actually, about the viewer, AEvar suggested having something like
speed.python.org and speed.pypy.org, which seem to be made using
https://github.com/tobami/codespeed

So unless something else is suggested, I plan to make it possible to
import the results of the perf tests into codespeed, but I haven't
looked at that much yet.

> We're sort of drifting off topic from Christian's patches here. But if
> we did have a third-party system, I suspect the interesting work would
> be setting up profiles for the "run" tool to kick off. And we might be
> stuck in such a case using whatever format the tool prefers. So having a
> sense of what the final solution looks like might help us know whether
> it makes sense to introduce a custom config format here.

I don't think we should completely switch to a third-party system for
everything, though it would simplify my work if we decided to do that.

I think people might want different viewers, so we should just make sure
that we can easily massage the results from the run script, so that it
will be easy to provide them as input to many different viewers.

So we are pretty free to decide how we specify which tests should be
performed on which revision, and I think a config file is the best way.
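The three-part split proposed in this thread (data store keyed by the full run tuple, a run front-end, a slicing viewer) can be sketched in a few lines. This is a hypothetical toy, not the perf suite's actual storage; the point is that indexing on the whole tuple, not just the tree, is what prevents the git.git/linux.git confusion mentioned above.

```python
class PerfStore:
    """Toy data store: one row per (tree, repo, test) run tuple,
    plus a viewer-style slice() that filters on any field."""
    def __init__(self):
        self.rows = []

    def record(self, tree, repo, test, seconds):
        # the "run" front-end would call this after each measurement
        self.rows.append({"tree": tree, "repo": repo,
                          "test": test, "seconds": seconds})

    def slice(self, **filters):
        # the "viewer" side: dice the results on arbitrary properties
        return [r for r in self.rows
                if all(r[k] == v for k, v in filters.items())]

store = PerfStore()
store.record("v2.13.0", "git.git", "p0001", 1.2)
store.record("v2.13.0", "linux.git", "p0001", 9.8)
```

With the repo recorded as part of the key, a follow-up run against linux.git lands in separate rows instead of silently overwriting the git.git numbers.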
Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs
On Wed, 12 Jul 2017 13:29:11 -0400, Jeff Hostetler wrote:
> My primary concern is scale and managing the list of objects over time.
>
> My fear is that this list will be quite large. If we only want to omit
> the very large blobs, then maybe not. But if we want to expand that
> scope to also omit other objects (such as a clone synchronized with a
> sparse checkout), then that list will get large on large repos. For
> example, on the Windows repo we have (conservatively) 100M+ blobs (and
> growing). Assuming 28 bytes per entry, that gives a 2.8GB list to be
> manipulated.
>
> If I understand your proposal, newly-omitted blobs would need to be
> merged into the promised-blob list after each fetch. The fetch itself
> may not have that many new entries, but inserting them into the existing
> list will be slow. Also, mmap'ing and bsearch'ing will likely have
> issues. And there's likely to be a very expensive step to remove
> entries from the list as new blobs are received (or locally created).
>
> In such a "sparse clone", it would be nice to omit unneeded tree objects
> in addition to just blobs. I say that because we are finding with GVFS
> on the Windows repo, that even with commits-and-trees-only filtering,
> the number of tree objects is overwhelming.

I know that discussion has shifted to the possibility of not having this
list at all, and not sending size information together with the fetch,
but going back to this... maybe omitting trees *is* the solution to both
the large local list and the large amount of size information needing to
be transferred. So the large-blob (e.g. Android) and many-blob (e.g.
Windows) cases would look like this:

 * Large-blob repositories have no trees omitted and a few blobs
   omitted, and we have sizes for all of them.
* Many-blob repositories have many trees omitted and either all blobs omitted (and we have size information for them, useful for FUSE or FUSE-like things, for example) or possibly no blobs omitted (for example, if shallow clones are going to be the norm, there won't be many blobs to begin with if trees are omitted). This seems better than an intermediate solution for the many-blob repository case in which we still keep all the trees but also try to avoid sending and storing as much information about the blobs as possible, because that doesn't seem to provide us with much savings (because the trees as a whole are just as large, if not larger, than the blob information). > So I'm also concerned about > limiting the list to just blobs. If we need to have this list, it > should be able to contain any object. (Suggesting having an object type > in the entry.) This makes sense - I'll add it in. > I also have to wonder about the need to have a complete list of omitted > blobs up front. It may be better to just relax the consistency checks > and assume a missing blob is "intentionally missing" rather than > indicating a corruption somewhere. And then let the client do a later > round-trip to either demand-load the object -or- demand-load the > existence/size info if/when it really matters. > > Maybe we should add a verb to your new fetch-blob endpoint to just get > the size of one or more objects to help with this. If we allow the omission of trees, I don't think the added complexity of demand-loading sizes is worth it. What do you think of doing this: * add a "type" field to the list of promised objects (formerly the list of promised blobs) * retain mandatory size for blobs * retain single file containing list of promised objects (I don't feel too strongly about this, but it has a slight simplicity and in-between-GC performance advantage)
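The entry layout agreed on above (object type added, mandatory size for blobs) can be made concrete with a small sketch. The fixed-width record format here is hypothetical, chosen only to show where the roughly 28 bytes per entry cited earlier comes from: a 20-byte SHA-1 plus a type byte plus an 8-byte size is 29 bytes.

```python
import struct

# git's object type ids: commit=1, tree=2, blob=3, tag=4
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4

def encode_entry(oid, obj_type, size):
    """Hypothetical fixed-width promised-object record:
    20-byte binary oid + 1-byte type + 8-byte big-endian size
    (the size field is meaningful for blobs)."""
    assert len(oid) == 20
    return oid + struct.pack(">BQ", obj_type, size)

def decode_entry(rec):
    oid = rec[:20]
    obj_type, size = struct.unpack(">BQ", rec[20:])
    return oid, obj_type, size
```

Fixed-width records keep the "single sorted file" design bsearch-friendly: the i-th entry starts at byte `29 * i`, so lookup needs no record scanning at all.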
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Junio C Hamano writes:
> Jeff King writes:
>
>> [1] Another sticking point is that this really does need to be in the
>>     reflog of the ref we are pushing (and not, e.g., HEAD). But one does
>>     not always push from a ref. I suspect that's OK in practice, though.
>>     If you are doing "git push --force-with-lease HEAD~2:master", it is
>>     probably OK for us to error out with "uh, lease from what?".
>
> I actually expect this to bite me personally, as I often do not
> rewind the actual "topic" ref that is my target of rewriting,
> because I am a chicken---I detach the HEAD and build a history there
> to see if I can come up with a better history, compare the result
> with what is on the "topic" (which is not touched at all during the
> rewriting), and after all is done, do a "checkout -B topic". The
> "remote tip must appear in the local reflog" rule will never be
> satisfied.

Well, "bite" is not quite accurate, as I do not push to a repository
racing with others, and there is no reason for me to use --force with a
lease. I always push '+pu', and push '+next' once after each release,
and there is no need for any lease when I push things out.

But if I were collaborating with others in another project that uses a
shared central repository workflow, and if I were in the mood to see how
well/better this "improved safer DWIM" mode behaves, then it would bite
me.
Re: [PATCH] submodule: use cheaper check for submodule pushes
On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano wrote:
>
> I think Jonathan's question (with which I concurred) is if we also ended
> up relying on the side effect of calling that function (i.e. being
> able to now find objects that are not in our repository but in the
> submodule's object store). By looking at eb21c732d6, we can
> tell that the original didn't mean to and didn't add any code that
> relies on the ability to be able to read from the submodule object
> store. I am not sure if that is still true after 5 years (i.e. is
> there any new code added in the meantime that made us depend on the
> ability to read from the submodule object store?).

Yes, we are safe, because the function itself only spawns a child
process (not using any of the objects). Its only caller,
push_unpushed_submodules, also doesn't rely on objects loaded after
calling push_submodule. The caller of push_unpushed_submodules
(transport.c, transport_push) also doesn't need submodule objects
loaded.

> My hunch (and hope) is that we are probably safe, but that is a lot
> weaker than "yes this is a good change we want to apply".

Given the above (I went through the code), all I can do is repeat "yes
this is a good change we want to apply".

Thanks,
Stefan
Re: Git on macOS shows committed files as untracked
Torsten Bögershausen writes:

> Thanks for the fast analysis -
> in short:
> what does
>     git -c core.precomposeunicode=true status
> say?
>
> The easiest thing may be to set
>     git config --global core.precomposeunicode true

Good suggestion. I learned a new thing today. I somehow thought that the
precompose trick was only about argv[] when a program starts up and did
not apply to paths readdir(3) finds through dir.c, e.g.

    $ git add .

But apparently there is a replacement readdir() used in compat/ for
MacOSX, so the paths from the system are also covered by the
configuration.
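The underlying issue is Unicode normalization: the macOS filesystem hands back decomposed (NFD) filenames, while the committed paths are usually precomposed (NFC), so a byte comparison of the "same" name fails. Python's unicodedata module makes the mismatch easy to see:

```python
import unicodedata

# "ä" as one precomposed code point (NFC), as a user would type it
nfc = "\u00e4"
# the decomposed form (NFD) that an HFS+ readdir() would return:
# "a" followed by U+0308 COMBINING DIAERESIS
nfd = unicodedata.normalize("NFD", nfc)

# visually identical, but not equal byte-for-byte
print(nfc == nfd, len(nfc), len(nfd))
```

core.precomposeunicode tells git to normalize paths it reads back to NFC, which is exactly `unicodedata.normalize("NFC", ...)` in the snippet's terms, so the committed and on-disk names compare equal again.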
Re: reftable: new ref storage format
On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > We've been having scaling problems with insane number of references > (>866k), so I started thinking a lot about improving ref storage. > > I've written a simple approach, and implemented it in JGit. > Performance is promising: > > - 62M packed-refs compresses to 27M > - 42.3 usec lookup Exciting. I'd love for us to have a simple-ish on-disk structure that scales well and doesn't involve a dependency on a third-party database structure. Let me see what holes I can poke in your proposal, though. :) > ### Problem statement > > Some repositories contain a lot of references (e.g. android at 866k, > rails at 31k). The existing packed-refs format takes up a lot of > space (e.g. 62M), and does not scale with additional references. > Lookup of a single reference requires linearly scanning the file. I think the linear scan is actually an implementation short-coming. Even though the records aren't fixed-length, the fact that newlines can only appear as end-of-record is sufficient to mmap and binary search a packed-refs file (you just have to backtrack a little when you land in the middle of a record). I wrote a proof of concept a while ago, but got stuck on integrating it into the ref code, because of some of the assumptions that it made. Michael Haggerty has been doing several rounds of refactors to remove those assumptions. I think we're pretty close (I've actually seen the endgame where packed-refs is fully binary searched, but I think there are a few more cleanups necessary to cover all cases). > Atomic pushes modifying multiple references require copying the > entire packed-refs file, which can be a considerable amount of data > moved (e.g. 62M in, 62M out) for even small transactions (2 refs > modified). I think your definition of atomic here doesn't match what git.git does. Our atomic push just takes the lock on all of the refs, and then once it has all of them, commits all of the locks. 
So it's atomic in the sense that you either get all or none of the writes (modulo a commit failure in the middle, which we naturally have no rollback plan for). But it can be done without touching the packed-refs file at all. I imagine that you're looking at atomicity from the perspective of a reader. In the git.git scheme, the reader may see a half-committed transaction. If you dispense with loose refs entirely and treat the packed-refs file as a single poorly-implemented key/value database, then you get reader atomicity (but O(size_of_database) write performance). > Repositories with many loose references occupy a large number of disk > blocks from the local file system, as each reference is its own file > storing 41 bytes. This negatively affects the number of inodes > available when a large number of repositories are stored on the same > filesystem. Readers are also penalized due to the larger number of > syscalls required to traverse and read the `$GIT_DIR/refs` directory. In my experience, the syscalls involved in loose refs aren't usually a big deal. If you have 800k refs, they're not all changing constantly. So a single pack-refs "fixes" performance going forward. What _is_ a big deal is that the packing process is complicated, readers have a very un-atomic view because of the myriad of files involved, and you get annoying lock contention during packing, as well as between deletions that have to rewrite packed-refs. But I'm not sure if you meant to contrast here a system where we didn't use packed-refs at all (though of course such a system is very much not atomic by the definition above). > ### Objectives > > - Near constant time lookup for any single reference, even when the > repository is cold and not in process or kernel cache. Good goal, though TBH I'd be happy with O(log n). A related one is being able to traverse a subset of refs in O(nr_traversed). E.g., "git tag -l" should not have to do work proportional to what is in refs/changes. 
That falls out of most proposals that allow fast lookups, but notably
not a straight hash table.

> - Occupy less disk space for large repositories.

Good goal. Just to play devil's advocate, the simplest way to do that
with the current code would be to gzip packed-refs (and/or store sha1s
as binary). That works against the "mmap and binary search" plan,
though. :)

> - Support atomic pushes with lower copying penalties.

"Lower" is vague. I'd hope we could do updates with effort linear to the
number of updated refs (it's OK if there's a constant factor, like
writing extra blocks, as long as a single-ref update is about as
expensive in a 10-ref repo as in a 10K-ref repo).

> Scan (read 866k refs) and lookup (single ref from 866k refs):
>
>   format      |   scan |        lookup
>   ------------|-------:|--------------:
>   packed-refs | 380 ms | 375420.0 usec
>   reftable    | 125 ms |      42.3 usec

Don't forget in git.git that the memory usage for packed-refs is
atrocious (because we parse
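The mmap-and-binary-search idea raised earlier in this thread (newlines only appear at end-of-record, so a probe into the middle of the file can backtrack to a record boundary) can be sketched directly. This is an illustration over an in-memory string standing in for the mmap'd file, with records simplified to "<sha> <refname>" lines sorted by refname.

```python
def lookup(buf, want):
    """Binary search a sorted, newline-delimited '<sha> <refname>'
    buffer, backtracking to the start of whichever record the probe
    lands in (the proof-of-concept approach described above)."""
    lo, hi = 0, len(buf)           # lo is always a record start
    while lo < hi:
        mid = (lo + hi) // 2
        nl = buf.rfind("\n", lo, mid)
        start = nl + 1 if nl != -1 else lo   # backtrack to record start
        end = buf.find("\n", start)
        if end == -1:
            end = len(buf)
        sha, name = buf[start:end].split(" ", 1)
        if name == want:
            return sha
        elif name < want:
            lo = end + 1           # skip past this whole record
        else:
            hi = start
    return None

# toy "packed-refs" contents, sorted by refname (shas shortened here)
refs = [("08f9c324", "refs/heads/maint"),
        ("80145b1e", "refs/heads/master"),
        ("deadbeef", "refs/tags/v1.0")]
buf = "".join("%s %s\n" % (sha, name) for sha, name in refs)
```

Note how this depends on records being self-contained: it is exactly what the delta-compressed reftable records discussed above would break, since a record found mid-file could not be reconstructed without its predecessors.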
Re: "groups of files" in Git?
Thank you, but I am not sure I quite understand the idea. Could you
please elaborate on it for the following example? I have two Perforce
changelists ("A" and "B") that group uncommitted sets of files (the
paths of the files can differ):

  changelist A:
    file1
    file2

  changelist B:
    file3
    file4

In Perforce, I am able to do the following:

 - move files between changelists (e.g. file1 could be moved to
   changelist B)
 - add new files to a changelist (e.g. changelist B can get an
   additional file5)
 - revert file changes, which effectively removes the file from the
   changelist (e.g. reverting file2 will remove it from changelist A)

How would I do this with sets of files that would belong to a Git
commit?

On Thu, Jul 13, 2017 at 2:09 PM, Junio C Hamano wrote:
> Nikolay Shustov writes:
>
>> Thank you for the detailed explanation, it looks like merging the
>> commits would be helpful in my case. And I think it is a very good
>> analogy that Perforce changelists are like multiple pending commits,
>> if Git were supporting such.
>>
>> What it won't be achieving by using commits in this scheme is the
>> following thing I can do in Perforce:
>> In the uncommitted Perforce changelists I can revert a changed file
>> to its original state and move the files between the changelists.
>> Quite often, while working on something, in the middle I would decide
>> to isolate changes to a certain set of files into a separate changelist
>> - but then I might change my mind. It is all flexible until I actually
>> commit my Perforce changelist, after which it becomes very much like
>> committed changes in any other source control.
>> This is the actual flexibility I am looking to achieve in Git.
>
> I actually think we already have such flexibility. Unlike
> Perforce, Git is distributed, and the most important aspect of the
> distinction is that what happens _in_ your local Git repository may
> be called "committed" in Git lingo, but not visible to the public.
>
> You can consider these commits you make in your repository "pending"
> when you think of your workflow in Perforce terms, until you merge
> and push out the result, which roughly corresponds to "submitting"
> in Perforce lingo.
>
> Once you start treating your local commits that you haven't pushed
> out as changes that are still "pending" when observed from the
> outside world, you'd realize that you have as much flexibility, if
> not more, to dice and slice them with local tools like "rebase -i",
> "add -p", etc., as you would have in your Perforce workflow,
> I would think.
Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes
Miguel Torroja writes:

> I've just sent, in reply to your previous e-mail, three different patches.
>
>  * The first patch is just to show some broken tests,
>  * The second patch fixes the original issue I had (the one that
>    initiated this thread),
>  * The third patch is the one that filters out "info" messages in
>    p4CmdList (this time the default is reversed and set to False, which
>    is the original behaviour). The two test cases that are cured with
>    this change have to set skip_info=True explicitly.

The approach looks reasonable. By having tests that expect failure
upfront, the series clearly shows how the code changes in later steps
make things better.

Thanks. Will replace.
Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes
Jeff King writes:

>> > @@ -59,7 +54,8 @@ EOF
>> >  # to this test since it does not contain any decoration, hence --first-parent
>> >  test_expect_success 'Commit Decorations Colored Correctly' '
>> >  	git log --first-parent --abbrev=10 --all --decorate --oneline --color=always |
>> > -	sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
>> > +	sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
>> > +	test_decode_color >out &&
>>
>> Just some thoughts:
>>
>> This extension of the pipe-chain is not making it worse, as git's exit
>> code is already hidden.
>
> Yes, I noticed the existing pipe-chain. We can fix it while we're here,
> though I think it's not a big deal in practice.
>
>> The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because
>> I would have expected it to break in the future with e.g. the sha1 to
>> longer hash conversion. But as we specify --abbrev=10, this seems
>> future-proof. In an ideal world this would be encapsulated in a
>> function (c.f. t/diff-lib.sh).

Actually, --abbrev=10 does not guarantee that the hex part is always 10
bytes long, so it is not future-proofing, but I expect it would work out
fine in practice.

> I agree it's a bit gross. Possibly:
>
>   git log --format='%C(auto)%d %s'
>
> would be easier for the test to parse (I'm pretty sure that didn't exist
> back when this test was written).

Yeah, that may make the test easier to read, too.

Thanks.
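The idea behind test_decode_color is to turn raw ANSI SGR escape sequences into readable tokens so test expectations stay legible. A simplified Python analogue (the real shell helper in git's test suite handles many more attribute combinations; the name table here covers only a few codes for illustration):

```python
import re

# matches an SGR escape: ESC [ <parameters> m
SGR = re.compile("\x1b\\[([0-9;]*)m")

NAMES = {"": "RESET", "0": "RESET", "1": "BOLD",
         "31": "RED", "32": "GREEN", "33": "YELLOW"}

def decode_color(s):
    """Replace each SGR escape with a readable <TOKEN>, falling back
    to the raw parameter string for combinations we don't name."""
    return SGR.sub(lambda m: "<%s>" % NAMES.get(m.group(1), m.group(1)), s)
```

Comparing `<RED>foo<RESET>` against an expected file is both easier to read and safer than embedding literal escape bytes in test fixtures, which is the motivation for the patch under discussion.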
Re: [PATCH] commit & merge: modularize the empty message validator
Kaartic Sivaraam writes:

> I have a few doubts for which I need clarification to move on with
> this.
>
> 1. If we abort when the part is empty wouldn't it be too
> restrictive?
>
> IOW, wouldn't it affect users of "git commit --cleanup=verbatim"
> who wish to commit only the comments or parts of it?
> (I'm not sure if someone would find that useful)
>
> 2. Is it ok to use the "find_trailer_start" function of "trailer.c"
> to locate the trailer?
>
> Note: It has a little issue that it wouldn't detect the trailer if
> the message comprises one trailer alone and no other text. This
> case occurs while aborting a commit started using "git commit -s".
> Any possibilities to overcome the issue?
>
> 3. Ignoring point 1 for now, what other helper methods except the
> ones listed below could be helpful in separating the cleaned up
> commit message into the , , ?
>
> * ignore_non_trailer
> * find_trailer_start

All good points; if it bothers you too much that "commit" and "merge" define "emptiness" of the buffer differently, I think you could persuade me to unify them to "the buffer _must_ contain no bytes", i.e. not special-casing sign-off lines only in "commit". It would be a backward incompatible tightening of the established rule, but it may not be a bad change.
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
Jeff King writes:

> ... But if we did have a third-party system, I suspect the interesting
> work would be setting up profiles for the "run" tool to kick off. And we
> might be stuck in such a case using whatever format the tool prefers. So
> having a sense of what the final solution looks like might help us know
> whether it makes sense to introduce a custom config format here.

Agreed.

Thanks.
Re: [PATCH] RFC: Introduce '.gitorderfile'
On Thu, Jul 13, 2017 at 12:12 PM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> On Thu, Jul 13, 2017 at 8:59 AM, Jeff King wrote:
>>>> This triggers two reactions for me:
>>>>
>>>> (a) We should totally do that.
>>>> (b) It's a rabbit hole to go down.
>>>
>>> And yes, I had both of those reactions, too. We've had the
>>> "project-level .gitconfig" discussion many times over the years. And it
>>> generally comes back to "you can ship a snippet of config and then give
>>> people a script which adds it to their repo".
>>
>> I see this "project-level .gitconfig" via the .gitmodules file.
>> See GITMODULES(5), anything except submodule.<name>.path is
>> just project-level .gitconfig,...
>
> They were from day one meant as a suggestion made by the project.
> You do not have to follow them and you are free to update them,
> i.e. after "submodule init" copied the URL, to point at a closer mirror,
> for example.

The URL is somewhat special, as its copying into the .git/config also marks the submodule as interesting (no matter if the URL is changed by the user).

The point I was trying to make is best demonstrated in t5526-fetch-submodules.sh:

> ok 7 - using fetchRecurseSubmodules=true in .gitmodules recurses into submodules
> ok 8 - --no-recurse-submodules overrides .gitmodules config
> ok 9 - using fetchRecurseSubmodules=false in .git/config overrides setting in .gitmodules

They were not suggestions, but defaults dictated by the project. If the user did not change their config, they did as the project said. I was not there on day one to remember if they were merely meant as suggestions, but their behavior is what these tests assert.
Re: [PATCH 0/15] making user-format colors conditional on config/tty
On Thu, Jul 13, 2017 at 7:55 AM, Jeff King wrote:
> This is a cleanup of the patch I posted last October:
>
> https://public-inbox.org/git/20161010151517.6wszhuyp57yfn...@sigill.intra.peff.net/
>
> The general idea is that it's rather confusing that "%C(red)" in a
> pretty-print format does not currently respect color.ui, --no-color, or
> the usual isatty check on stdout. This series changes that. Note that
> this is a backwards-incompatible change, but the general sentiment in
> that earlier thread seemed to be that the existing behavior is arguably
> buggy. See patch 14 for more discussion.
>
> The patch stalled back then because I wanted to make sure that
> ref-filter's color placeholders behaved the same. That required some
> refactoring which conflicted badly with kn/ref-filter-branch-list. Now
> that it has graduated, I was able to rebase on top.
>
> This version also takes into account feedback from the original thread.
> And as I added tests, it surfaced a few corner cases around color config
> that I've dealt with here. The last two patches are the most
> interesting bits.

I have reviewed these slightly and think this series is a good change.

Thanks,
Stefan
Re: [PATCH] RFC: Introduce '.gitorderfile'
Stefan Beller writes:

> On Thu, Jul 13, 2017 at 8:59 AM, Jeff King wrote:
>>> This triggers two reactions for me:
>>>
>>> (a) We should totally do that.
>>> (b) It's a rabbit hole to go down.
>>
>> And yes, I had both of those reactions, too. We've had the
>> "project-level .gitconfig" discussion many times over the years. And it
>> generally comes back to "you can ship a snippet of config and then give
>> people a script which adds it to their repo".
>
> I see this "project-level .gitconfig" via the .gitmodules file.
> See GITMODULES(5), anything except submodule.<name>.path is
> just project-level .gitconfig,...

They were from day one meant as a suggestion made by the project. You do not have to follow them and you are free to update them, i.e. after "submodule init" copied the URL, to point at a closer mirror, for example.
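The ".gitmodules is just project-level config" observation can be seen with plain `git config -f`, which is the ordinary config machinery git itself uses to read the file. A minimal sketch (the submodule name, path, and URL below are invented for illustration):

```shell
# Read project-shipped submodule defaults straight out of a checked-in
# .gitmodules file; "git submodule init" later copies only the URL into
# .git/config, where the user is free to override it.
set -e
tmp=$(mktemp -d)
cd "$tmp"

cat >.gitmodules <<\EOF
[submodule "lib"]
	path = lib
	url = https://example.com/lib.git
	fetchRecurseSubmodules = false
EOF

url=$(git config -f .gitmodules submodule.lib.url)
recurse=$(git config -f .gitmodules submodule.lib.fetchRecurseSubmodules)
echo "url=$url recurse=$recurse"
```

Whether such values act as "suggestions" or "defaults" is exactly the question in the thread: git consults the .gitmodules copy only when .git/config has no overriding entry.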
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Jeff King writes:

> [1] Another sticking point is that this really does need to be in the
> reflog of the ref we are pushing (and not, e.g., HEAD). But one does
> not always push from a ref. I suspect that's OK in practice, though.
> If you are doing "git push --force-with-lease HEAD~2:master", it is
> probably OK for us to error out with "uh, lease from what?".

I actually expect this to bite me personally, as I often do not rewind the actual "topic" ref that is my target of rewriting, because I am a chicken---I detach the HEAD and build a history there to see if I can come up with a better history, compare the result with what is on the "topic" (which is not touched at all during the rewriting), and after all is done, do a "checkout -B topic". The "remote tip must appear in the local reflog" rule will never be satisfied.

>> I wonder if this could be a replacement for the current "the user
>> did not even specify what the expected current value is, so we
>> pretend as if the tip of the remote-tracking branch was given"
>> kludge.
>
> Yes, that is exactly what I was thinking of (and why I said that even
> though this really isn't force-with-lease in a strict sense, it slots
> into the same level of safety, so it might be worth using the name).
>
>> Instead,
>>
>> git push --force-with-lease origin master
>> git push --force-with-lease origin topic:master
>> git push --force-with-lease origin HEAD:master
>>
>> could
>>
>> (1) first learn where 'refs/heads/master' over there is at. Call
>> it X (it may be C or D in the earlier example).
>>
>> (2) locate from which ref the commit we are pushing out is taken;
>> in the above examples, they are our refs/heads/master,
>> refs/heads/topic, and HEAD, respectively. Call it R.
>>
>> (3) see if the reflog of R has X. If so do a --force push;
>> otherwise fail.
>
> Yes, more or less.
> A few thoughts:
>
> - that step 2 is where the "wait, that isn't even a ref" error above
>   would come in
>
> - I suspect in the third example we probably ought to be using the
>   reflog of the branch that HEAD points to. You would not want:
>
>     $ git checkout advanced-branch
>     $ git checkout older-branch
>     $ git push --force-with-lease origin HEAD:older-branch
>
>   to consider that commits in advanced-branch are part of the lease.

The third one was meant to be rewriting on detached HEAD, not having any underlying branch.

> - For step 3, I'm not sure if you mean to look for exactly X, or
>   if it would be OK to have any commit whose ancestor is X. I think
>   you'd need the latter to accommodate a non-fast-forward "git pull"
>   (or fetch+merge) where the local ref is never set precisely to the
>   upstream commit.

But the result in that case is a descendant of the upstream you just merged, so you do not even want to use any form of forcing---you would rather want to rely on the usual "push must fast-forward" mechanism, no?
Re: [PATCH 2/2] tag: convert gpg_verify_tag to use struct object_id
Stefan Beller writes:

> if (fmt_pretty)
> - pretty_print_ref(name, sha1, fmt_pretty);
> + pretty_print_ref(name, oid.hash, fmt_pretty);

The next step would be to have pretty_print_ref() take an oid; as there are only two callers to it, both of which pass oid->hash to it, that should be a small and conflict-free conversion.

>
> - type = sha1_object_info(sha1, NULL);
> + type = sha1_object_info(oid->hash, NULL);

sha1_object_info() has a handful of callers that do not pass the pointer to the hash field in an existing oid object, but it does not look too bad as a candidate for conversion.

> if (!buf)
> return error("%s: unable to read file.",
> name_to_report ?
> name_to_report :
> - find_unique_abbrev(sha1, DEFAULT_ABBREV));
> + find_unique_abbrev(oid->hash, DEFAULT_ABBREV));

So does find_unique_abbrev().

Overall both patches look good. Thanks.
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
On Thu, Jul 13, 2017 at 6:58 PM, Jeff King wrote:
> On Thu, Jul 13, 2017 at 08:50:46AM +0200, Christian Couder wrote:
>
>> Goal
>>
>> Using many long environment variables to give parameters to the 'run'
>> script is error prone and tiring.
>>
>> We want to make it possible to store the parameters to the 'run'
>> script in a config file. This will make it easier to store, reuse,
>> share and compare parameters.
>
> Because perf-lib is built on test-lib, it already reads
> GIT-BUILD-OPTIONS.

Actually, the 'run' script also sources GIT-BUILD-OPTIONS, so maybe this is not necessary. Also, are the variables in GIT-BUILD-OPTIONS exported already?

> And the Makefile copies several perf-related values
> into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you
> can already do:
>
> echo 'GIT_PERF_REPEAT_COUNT = 10' >>config.mak
> echo 'GIT_PERF_MAKE_OPTS = CFLAGS="-O2" DEVELOPER=1' >>config.mak
> make

The "make" here might not even be needed, as in the 'run' script "config.mak" is copied into the "build/$rev" directory where "make" is run to build the $rev version.

> cd t/perf
> ./run
>
> I suspect there are still a lot of things that could be made easier with
> a config file, so I'm not against the concept. Your example here:
>
>> [perf "with libpcre"]
>> makeOpts = DEVELOPER=1 CFLAGS='-g -O0' USE_LIBPCRE=YesPlease
>> [perf "without libpcre"]
>> makeOpts = DEVELOPER=1 CFLAGS='-g -O0'
>
> is a lot more compelling. But right now the perf suite is not useful at
> all for comparing two builds of the same tree. For that, I think it
> would be more useful if we could define a tuple of parameters for a run.
> One of which could be the tree we're testing. Build opts are another.
> Tested repository is another. And then we'd fill in a table of results
> and let you slice up the table by any column (e.g., compare times for
> runs against a single tree but with differing build options).
Yeah, improving the output part is another thing that I have discussed with AEvar and that I have planned to work on.
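Since the proposed file uses git-config syntax, the existing config machinery can already slice it; a sketch (profile names taken from the quoted example, and the perf.* key layout is the proposal's, not an existing perf-suite API):

```shell
# Parse per-profile build options from a hypothetical perf config file
# with plain "git config -f"; only the config syntax here is real.
set -e
conf=$(mktemp)
cat >"$conf" <<\EOF
[perf "with libpcre"]
	makeOpts = DEVELOPER=1 CFLAGS='-g -O0' USE_LIBPCRE=YesPlease
[perf "without libpcre"]
	makeOpts = DEVELOPER=1 CFLAGS='-g -O0'
EOF

# Enumerate the profiles, then pull the options of one of them.
profiles=$(git config -f "$conf" --name-only --get-regexp '^perf\..*\.makeopts$')
opts=$(git config -f "$conf" --get 'perf.with libpcre.makeOpts')
echo "$opts"
```

A 'run' front-end could iterate over such profiles and hand each makeOpts value to the build step, which is roughly the shape the series is aiming for.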
Re: [PATCH 1/2] commit: convert lookup_commit_graft to struct object_id
Stefan Beller writes:

> With this patch, commit.h doesn't contain the string 'sha1' any more. ;-)

Nice. commit_graft_pos() still thinks we only deal with SHA-1, but that needs to wait for oid_pos(). The function has only two callers, both of which pass X->oid.hash, so it may be a good candidate to convert.

>
> Signed-off-by: Stefan Beller
> ---
>
> Before diving into the "RFC object store" series further, I want to get
> rid of the final sha1s in {commit,tag}.{c,h}.
>
> commit.c | 6 +++---
> commit.h | 2 +-
> fsck.c | 2 +-
> shallow.c | 4 ++--
> 4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index cbfd689939..e0888cf0f7 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -199,11 +199,11 @@ static void prepare_commit_graft(void)
> 	commit_graft_prepared = 1;
> }
>
> -struct commit_graft *lookup_commit_graft(const unsigned char *sha1)
> +struct commit_graft *lookup_commit_graft(const struct object_id *oid)
> {
> 	int pos;
> 	prepare_commit_graft();
> -	pos = commit_graft_pos(sha1);
> +	pos = commit_graft_pos(oid->hash);
> 	if (pos < 0)
> 		return NULL;
> 	return commit_graft[pos];
> @@ -335,7 +335,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
> 	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
> 	pptr = &item->parents;
>
> -	graft = lookup_commit_graft(item->object.oid.hash);
> +	graft = lookup_commit_graft(&item->object.oid);
> 	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
> 		struct commit *new_parent;
>
> diff --git a/commit.h b/commit.h
> index 4127c298cb..6d857f06c1 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -249,7 +249,7 @@ typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>
> struct commit_graft *read_graft_line(char *buf, int len);
> int register_commit_graft(struct commit_graft *, int);
> -struct commit_graft *lookup_commit_graft(const unsigned char *sha1);
> +struct commit_graft *lookup_commit_graft(const struct object_id *oid);
>
> extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2);
> extern struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos);
> diff --git a/fsck.c b/fsck.c
> index b4204d772b..2d2d2e9432 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -736,7 +736,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
> 		buffer += 41;
> 		parent_line_count++;
> 	}
> -	graft = lookup_commit_graft(commit->object.oid.hash);
> +	graft = lookup_commit_graft(&commit->object.oid);
> 	parent_count = commit_list_count(commit->parents);
> 	if (graft) {
> 		if (graft->nr_parent == -1 && !parent_count)
> diff --git a/shallow.c b/shallow.c
> index 54359d5490..f5591e56da 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -107,7 +107,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
> 			cur_depth++;
> 			if ((depth != INFINITE_DEPTH && cur_depth >= depth) ||
> 			    (is_repository_shallow() && !commit->parents &&
> -			     (graft = lookup_commit_graft(commit->object.oid.hash)) != NULL &&
> +			     (graft = lookup_commit_graft(&commit->object.oid)) != NULL &&
> 			     graft->nr_parent < 0)) {
> 				commit_list_insert(commit, &result);
> 				commit->object.flags |= shallow_flag;
> @@ -398,7 +398,7 @@ void prepare_shallow_info(struct shallow_info *info, struct oid_array *sa)
> 	for (i = 0; i < sa->nr; i++) {
> 		if (has_object_file(sa->oid + i)) {
> 			struct commit_graft *graft;
> -			graft = lookup_commit_graft(sa->oid[i].hash);
> +			graft = lookup_commit_graft(&sa->oid[i]);
> 			if (graft && graft->nr_parent < 0)
> 				continue;
> 			info->ours[info->nr_ours++] = i;
Re: [PATCH 05/15] ref-filter: abstract ref format into its own struct
On Thu, Jul 13, 2017 at 8:01 AM, Jeff King wrote:
> builtin/branch.c | 14 +++---
> builtin/for-each-ref.c | 22 --
> builtin/tag.c | 30 --
> builtin/verify-tag.c | 12 ++--
> ref-filter.c | 22 --
> ref-filter.h | 22 +-
> 6 files changed, 70 insertions(+), 52 deletions(-)

The patch looks good to me.

So some off-topic comments: I reviewed this patch from the bottom up, i.e. I started looking at ref-filter.h, then ref-filter.c and then the rest. If only you had formatted the patches with an orderfile. ;)
Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes
On Thu, Jul 13, 2017 at 11:40:32AM -0700, Stefan Beller wrote:

> On Thu, Jul 13, 2017 at 7:58 AM, Jeff King wrote:
>
>> I really only need t6300 and t6006 converted to build on for the rest of
>> the series. But t4207 was easy to do. t4026 still uses raw codes, but
>> converting it would be a pretty big job, so I punted.
>
> I think it is good to have raw codes in at least one place to test
> that raw codes work, but then again it could be a specific test calling
> out that this is tested.

Yeah, that thought crossed my mind, too. t4026 would definitely be the place for it, as it is about exhaustively testing the various colors. The others are just about feeding color codes through config and user-formats.

>> @@ -59,7 +54,8 @@ EOF
>> # to this test since it does not contain any decoration, hence --first-parent
>> test_expect_success 'Commit Decorations Colored Correctly' '
>> git log --first-parent --abbrev=10 --all --decorate --oneline --color=always |
>> - sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
>> + sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
>> + test_decode_color >out &&
>
> Just some thoughts:
>
> This extension of the pipe-chain is not making it worse as git's exit code
> is already hidden.

Yes, I noticed the existing pipe-chain. We can fix it while we're here, though I think it's not a big deal in practice.

> The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because
> I would have expected it to break in the future with e.g. the sha1 to longer
> hash conversion. But as we specify --abbrev=10, this seems future proof.
> In an ideal world this would be encapsulated in a function (c.f.
> t/diff-lib.sh).

I agree it's a bit gross. Possibly:

  git log --format='%C(auto)%d %s'

would be easier for the test to parse (I'm pretty sure that didn't exist back when this test was written).

-Peff
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
On Thu, Jul 13, 2017 at 11:29:10AM -0700, Junio C Hamano wrote: > > So then I think your config file primarily becomes about defining the > > properties of each run. I'm not sure if it would look like what you're > > starting on here or not. > > Yeah, I suspect that the final shape that defines the matrix might > have to become quite a bit different. I think it would help if the perf code was split better into three distinct bits: 1. A data-store capable of storing the run tuples along with their outcomes for each test. 2. A "run" front-end that runs various profiles (based on config, command-line options, etc) and writes the results to the data store. 3. A flexible viewer which can slice and dice the contents of the data store according to different parameters. We're almost there now. The "run" script actually does store results, and you can view them via "aggregate.pl" without actually re-running the tests. But the data store only indexes on one property: the tree that was tested (and all of the other properties are ignored totally; you can get some quite confusing results if you do a "./run" using say git.git as your test repo, and then a followup with "linux.git"). I have to imagine that somebody else has written such a system already that we could reuse. I don't know of one off-hand, but this is also not an area where I've spent a lot of time. We're sort of drifting off topic from Christian's patches here. But if we did have a third-party system, I suspect the interesting work would be setting up profiles for the "run" tool to kick off. And we might be stuck in such a case using whatever format the tool prefers. So having a sense of what the final solution looks like might help us know whether it makes sense to introduce a custom config format here. -Peff
Re: [PATCH 03/15] t: use test_decode_color rather than literal ANSI codes
On Thu, Jul 13, 2017 at 7:58 AM, Jeff King wrote:

> I really only need t6300 and t6006 converted to build on for the rest of
> the series. But t4207 was easy to do. t4026 still uses raw codes, but
> converting it would be a pretty big job, so I punted.

I think it is good to have raw codes in at least one place to test that raw codes work, but then again it could be a specific test calling out that this is tested.

> @@ -59,7 +54,8 @@ EOF
> # to this test since it does not contain any decoration, hence --first-parent
> test_expect_success 'Commit Decorations Colored Correctly' '
> git log --first-parent --abbrev=10 --all --decorate --oneline --color=always |
> - sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" >out &&
> + sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" |
> + test_decode_color >out &&

Just some thoughts:

This extension of the pipe-chain is not making it worse as git's exit code is already hidden.

The sed "s/[0-9a-f]\{10,10\}/COMMIT_ID/" is sort of funny, because I would have expected it to break in the future with e.g. the sha1 to longer hash conversion. But as we specify --abbrev=10, this seems future proof. In an ideal world this would be encapsulated in a function (c.f. t/diff-lib.sh).

> @@ -61,8 +61,9 @@ test_format () {
> # Feed to --format to provide predictable colored sequences.
> AUTO_COLOR='%C(auto,red)foo%C(auto,reset)'
> has_color () {
> - printf '\033[31mfoo\033[m\n' >expect &&
> - test_cmp expect "$1"
> + test_decode_color <"$1" >decoded &&
> + echo "<RED>foo<RESET>" >expect &&
> + test_cmp expect decoded
> }

Thanks for removing hard-coded colors :)
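For readers outside the test suite, the helper being adopted here can be approximated in a couple of sed substitutions; this is a simplified stand-in for git's test_decode_color (the real one in t/test-lib-functions.sh knows many more codes):

```shell
# A stripped-down stand-in for test_decode_color: turn the raw ANSI
# escapes used in the hunks above into readable <RED>/<RESET> tokens.
esc=$(printf '\033')

decode_color () {
	sed -e "s/${esc}\[31m/<RED>/g" \
	    -e "s/${esc}\[m/<RESET>/g"
}

decoded=$(printf '\033[31mfoo\033[m\n' | decode_color)
echo "$decoded"
```

With such a decoder the test's expected output becomes plain text like `<RED>foo<RESET>` instead of literal escape bytes, which is exactly what the conversion above is after.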
Re: [PATCH] submodule: use cheaper check for submodule pushes
Stefan Beller writes:

> On Wed, Jul 12, 2017 at 5:53 PM, Junio C Hamano wrote:
>> Jonathan Nieder writes:
>>>> In the function push_submodule[1] we use add_submodule_odb[2] to
>>>> determine if a submodule has been populated. However the function does
>>>> not work with the submodules objects that are added, instead a new
>>>> child process is used to perform the actual push in the submodule. Use
>>>> is_submodule_populated[3] that is cheaper to guard from unpopulated
>>>> submodules.
>>>>
>>>> [1] 'push_submodule' was added in eb21c732d6 (push: teach --recurse-submodules the on-demand option, 2012-03-29)
>>>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the --submodule option to the diff option family, 2009-10-19)
>>>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules: add helper to determine if a submodule is populated, 2016-12-16)
>>>
>>> These footnotes don't answer the question that I really have: why did
>>> this use add_submodule_odb in the first place?
>>>
>>> E.g. did the ref iteration code require access to the object store
>>> previously and stop requiring it later?
>>
>> Yes, the most important question is if it is safe to lose the access
>> to the object store of the submodule. It is an endgame we should
>> aim for to get rid of add_submodule_odb(), but does the rest of this
>> codepath not require objects in the submodule at all, or do we still
>> need to change something to make it so?
>
> Yes, as the code in the current form as well as in its first occurrence
> used the result of add_submodule_odb to determine whether to spawn a child
> process.

The original added the call so that its return value can be used for that, and the current code still uses the return value for that purpose. That much is already known. I think Jonathan's question (with which I concurred) is if we also ended up relying on the side effect of calling that function (i.e. being able to now find objects that are not in our repository but in the submodule's object store).
By looking at eb21c732d6, we can tell that the original did not mean to, and did not, add any code that relies on being able to read from the submodule object store. I am not sure if that is still true after 5 years (i.e. is there any new code added in the meantime that made us depend on the ability to read from the submodule object store?). My hunch (and hope) is that we are probably safe, but that is a lot weaker than "yes, this is a good change we want to apply".
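The "cheaper check" in question amounts to a filesystem test: a populated submodule has a .git entry (usually a gitfile) in its working-tree path, so no object store needs to be loaded into the process to notice it. A shell sketch of that idea (directory names and the gitdir path are invented for illustration):

```shell
# Approximate is_submodule_populated in shell: a populated submodule
# working tree carries a .git entry; an unpopulated one is just empty.
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir populated empty

# A populated submodule holds a gitfile pointing back into the
# superproject's .git/modules area (the target path is illustrative).
echo "gitdir: ../.git/modules/populated" >populated/.git

is_populated () {
	test -e "$1/.git"
}

if is_populated populated; then p1=yes; else p1=no; fi
if is_populated empty; then p2=yes; else p2=no; fi
echo "$p1 $p2"
```

That is the whole cost of the guard, versus add_submodule_odb() registering an entire alternate object database just to answer the same yes/no question.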
Re: [PATCH v1 0/4] Teach 'run' perf script to read config files
Jeff King writes:

> Because perf-lib is built on test-lib, it already reads
> GIT-BUILD-OPTIONS. And the Makefile copies several perf-related values
> into it, including GIT_PERF_MAKE_OPTS and GIT_PERF_REPEAT_COUNT. So you
> can already do:
> ...
> But right now the perf suite is not useful at all for comparing two
> builds of the same tree. For that, I think it would be more useful if we
> could define a tuple of parameters for a run. One of which could be the
> tree we're testing. Build opts are another. Tested repository is
> another. And then we'd fill in a table of results and let you slice up
> the table by any column (e.g., compare times for runs against a single
> tree but with differing build options).

Yeah, I think we saw this discussed in the not-so-distant past, for which we want a good solution, and it might be the case that such a solution can be made easier to use with a separate configuration file (which this topic may or may not be used as a building block for).

> So then I think your config file primarily becomes about defining the
> properties of each run. I'm not sure if it would look like what you're
> starting on here or not.

Yeah, I suspect that the final shape that defines the matrix might have to become quite a bit different.
Re: "groups of files" in Git?
Stefan Beller writes:

> On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov wrote:
>
>> With Git I cannot seem to find a way to achieve the same result. And
>> the problem is that putting change sets on different Git branches (or
>> workdirs, or whatever Git offers that makes the changes to be NOT in
>> the same source tree) is not a viable option for me, as I would have
>> to re-build code as I re-integrate the changes between the branches
>> (or whatever changes separation Git feature is used).
>
> you would merge the branches and then run the tests/integration. Yes, that
> seems cumbersome.

Sometimes the need to make a trial merge for testing cannot be avoided, and having branches for separate topics is the only sensible approach, at least in the Git world.

Imagine your project has two components that are interrelated, say, the server and the client, that have to work well with each other. In addition, you want to make sure your updated server works well with existing clients, and vice versa. One way that naturally maps this scenario to the development workflow is to have a server-update topic and a client-update topic branch, and separate changes to update each side with their own commits:

       s---s---S     server-update topic
      /
  ---o---o---M       mainline
      \
       c---c---C     client-update topic

And during the development of these *-update topics, you try three merges:

 (1) Merge S to the mainline M and test the whole thing, to make sure that existing clients will still be able to talk with the updated server.

 (2) Merge C to the mainline M and test the whole thing, to make sure that updated clients will still be able to talk with the existing server.

 (3) Merge both S and C to the mainline M and test the whole thing, to make sure the updated ones talk to each other.

If there is no significant development going on on the mainline in the meantime, (1) and (2) can be done by trying out S and C alone without making a trial merge with M.
The same for (3)---it can be just a trial merge between S and C without updates that happened on the mainline. I'd love to hear from folks in Perforce or other world how they address this scenario with their system.
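The trial merges above can be made without moving either topic ref by merging on a detached HEAD; a sketch of case (3), with throwaway file and branch names taken from the description:

```shell
# Trial merge (3): combine server-update and client-update on a
# detached HEAD so that neither topic branch is touched.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email you@example.com
git config user.name you

echo base >base.txt && git add base.txt && git commit -qm base
git branch server-update
git branch client-update
mainline=$(git symbolic-ref --short HEAD)
echo m >main.txt && git add main.txt && git commit -qm M

git checkout -q server-update
echo s >server.txt && git add server.txt && git commit -qm S
git checkout -q client-update
echo c >client.txt && git add client.txt && git commit -qm C

# Detach at the mainline tip and build the throwaway merge there.
git checkout -q --detach "$mainline"
git merge -q -m trial server-update client-update
test -e server.txt && test -e client.txt && result=ok
echo "$result"
```

After testing, the detached merge commit can simply be abandoned; server-update, client-update, and the mainline are all exactly where they were.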
Re: [PATCH] commit & merge: modularize the empty message validator
On Tue, 2017-07-11 at 13:22 -0700, Junio C Hamano wrote:

> I think the "validation" done with the rest_is_empty() is somewhat
> bogus. Why should we reject a commit without a message and a
> trailer block with only signed-off-by lines, while accepting a
> commit without a message and a trailer block as long as the trailer
> block has something equally meaningless by itself, like
> "Helped-by:"? I think we should inspect the proposed commit log
> message taken from the editor, find its tail ignoring the trailing
> comment using ignore_non_trailer, and further separate the result
> into (, , ) using the same logic used by the interpret-trailers
> tool, and then complain when turns out to be empty, to be truly
> useful and consistent.

I have a few doubts for which I need clarification to move on with this.

1. If we abort when the part is empty wouldn't it be too restrictive?

   IOW, wouldn't it affect users of "git commit --cleanup=verbatim" who wish to commit only the comments or parts of it? (I'm not sure if someone would find that useful)

2. Is it ok to use the "find_trailer_start" function of "trailer.c" to locate the trailer?

   Note: It has a little issue that it wouldn't detect the trailer if the message comprises one trailer alone and no other text. This case occurs while aborting a commit started using "git commit -s". Any possibilities to overcome the issue?

3. Ignoring point 1 for now, what other helper methods except the ones listed below could be helpful in separating the cleaned up commit message into the , , ?

   * ignore_non_trailer
   * find_trailer_start

--
Kaartic
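The rule being debated, "the message is empty if nothing remains after dropping comments, blank lines, and the sign-off trailer", can be sketched as a small filter; this is an approximation of the behaviour under discussion, not git's actual rest_is_empty() implementation:

```shell
# Approximate the "is the log message effectively empty?" check:
# drop comment lines, Signed-off-by trailers and blank lines, then
# see whether anything substantive remains.
msg_is_empty () {
	if sed -e '/^#/d' -e '/^Signed-off-by:/d' -e '/^[[:space:]]*$/d' | grep -q .
	then
		return 1	# something substantive remains
	else
		return 0	# nothing but cruft
	fi
}

printf '# please enter a message\nSigned-off-by: A U Thor <author@example.com>\n' |
	msg_is_empty && r1=empty || r1=nonempty
printf 'Fix the frobnicator\n\nSigned-off-by: A U Thor <author@example.com>\n' |
	msg_is_empty && r2=empty || r2=nonempty
echo "$r1 $r2"
```

Junio's objection is visible right in the sed expression: Signed-off-by is special-cased while an equally content-free Helped-by line would count as a message, which is why he suggests splitting at the trailer block instead.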
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
On Thu, Jul 13, 2017 at 10:51:21AM -0700, Junio C Hamano wrote:

> > But imagine force-with-lease rule were more like: when pushing
> > branch "foo" to remote branch "bar", allow only if the current value of
> > "bar" is an ancestor of some entry in the reflog of "foo".
>
> Would that cover the case that originally motivated the "with lease"
> thing? Let's see.
>
>          BC            "foo"
>         /
>  o---A---B---C         "bar"
>
> You are pushing back to "bar" that still has C (or it may have
> acquired D) from "foo" that is now at BC. It is likely that you
> started to prepare the "A---BC" history from "A---B---C" history, in
> which case your current branch "foo" that is being pushed out would
> have had C at its tip some time in the past. So seeing that the tip
> of the remote is still C, which is in the reflog of "foo", we can
> happily push it out. If the tip of the remote "bar" is now D
> because somebody updated it, you did not consider it while preparing
> your BC, and we want to fail the push---because D does not appear in
> the reflog of "foo", that is achieved.

Yeah. I think where this scheme falls down is if you incorporate work from the remote ref _without_ it ever being in your ref[1]. So for example I could do something like:

  $ git fetch                        ;# new commits arrive
  $ git log origin/master            ;# oh my, that tip commit looks broken
  $ git reset --hard origin/master^  ;# lets get rid of it
  $ git push --force-with-lease

and that tricks the system. But the nice thing is that it tricks it in the "safe" direction. We'd refuse such a push, unless you do it more like:

  $ git merge origin/master
  $ git reset --hard HEAD^
  $ git push --force-with-lease

Which is really the sane way to do it in the first place. What I'd wonder more is if there are cases where we fail in the unsafe direction.
Here's the most plausible scenario I could come up with: $ git pull ;# get some new commits $ git reset --hard HEAD^ ;# doh, that merge didn't look right $ git rebase -i ;# hack hack hack $ git push --force-with-lease ;# oops, I should have re-merged We get fooled because yes, we were once on the remote's tip. But we discarded it and then did a _different_ rewriting operation (of course it is impossible for the tool to tell if the original "reset --hard" was an intentional rewriting operation, or one where we simply backed out a mistake). [1] Another sticking point is that this really does need to be in the reflog of the ref we are pushing (and not, e.g., HEAD). But one does not always push from a ref. I suspect that's OK in practice, though. If you are doing "git push --force-with-lease HEAD~2:master", it is probably OK for us to error out with "uh, lease from what?". > I wonder if this could be a replacement for the current "the user > did not even specify what the expected current value is, so we > pretend as if the tip of the remote-tracking branch was given" > kludge. Yes, that is exactly what I was thinking of (and why I said that even though this really isn't force-with-lease in a strict sense, it slots into the same level of safety, so it might be worth using the name). > Instead, > > git push --force-with-lease origin master > git push --force-with-lease origin topic:master > git push --force-with-lease origin HEAD:master > > could > > (1) first learn where 'refs/heads/master' over there is at. Call > it X (it may be C or D in the earlier example). > > (2) locate from which ref the commit we are pushing out is taken; > in the above examples, they are our refs/heads/master, > refs/heads/topic, and HEAD, respectively. Call it R. > > (3) see if the reflog of R has X. If so do a --force push; > otherwise fail. Yes, more or less. 
A few thoughts:

  - that step 2 is where the "wait, that isn't even a ref" error above
    would come in

  - I suspect in the third example we probably ought to be using the
    reflog of the branch that HEAD points to. You would not want:

      $ git checkout advanced-branch
      $ git checkout older-branch
      $ git push --force-with-lease origin HEAD:older-branch

    to consider that commits in advanced-branch are part of the lease.

  - For step 3, I'm not sure if you mean to look for exactly X, or if
    it would be OK to have any commit whose ancestor is X. I think
    you'd need the latter to accommodate a non-fast-forward "git pull"
    (or fetch+merge) where the local ref is never set precisely to the
    upstream commit.

-Peff
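For readers who want to experiment, the rule being discussed can be approximated from the shell with existing plumbing. This is only a sketch of the semantics (the `lease_ok` helper name is made up, and this is not the eventual implementation): succeed only when the remote's current tip is an ancestor of some entry in the local branch's reflog.

```shell
# Sketch: return 0 only if $1 (the remote's current tip) is an
# ancestor of at least one entry in the reflog of local branch $2.
lease_ok () {
	remote_tip=$1 branch=$2
	for entry in $(git rev-list -g "$branch")
	do
		# --is-ancestor exits 0 when $remote_tip is reachable
		# from $entry; this covers the fetch+merge case where
		# the branch was never set exactly to the remote tip.
		if git merge-base --is-ancestor "$remote_tip" "$entry"
		then
			return 0
		fi
	done
	return 1
}
```

With the "reset --hard origin/master^" example above, the fetched tip never enters the branch's reflog, so the check fails and the push would be refused, erring in the safe direction.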
Re: "groups of files" in Git?
Nikolay Shustov writes:

> Thank you for the detailed explanation, it looks like merging the
> commits would be helpful in my case. And I think it is a very good
> analogy that Perforce changelists are like multiple pending commits,
> if Git were supporting such.
>
> What it won't be achieving by using commits in this schema is the
> following thing I can do in Perforce:
> In the uncommitted Perforce changelists I can revert the changed file
> to the original state and move the files between the changelists.
> Quite often, while working on something, in the middle I would decide
> to isolate changes to a certain set of files to a separate changelist
> - but then I might change my mind. It is all flexible until I
> actually commit my Perforce changelist, after which it becomes very
> much as committed changes in any other source control.
> This is actual flexibility I am looking for achieving in Git.

I actually think we already have such a flexibility. Unlike Perforce,
Git is distributed, and the most important aspect of the distinction
is that what happens _in_ your local Git repository may be called
"committed" in Git lingo, but not visible to the public. You can
consider these commits you make in your repository "pending" when you
think of your workflow in Perforce terms, until you merge and push out
the result, which roughly corresponds to "submitting" in Perforce
lingo.

Once you start treating your local commits that you haven't pushed out
as changes that are still "pending" when observed from the outside
world, you'd realize that you have as much flexibility, if not more,
to dice and slice them with the local tools like "rebase -i",
"add -p", etc., as you would have in your Perforce workflow, I would
think.
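That slicing can even be done non-interactively. A minimal sketch in a throwaway repository (the file names are illustrative); "rebase -i" and "add -p" generalize the same idea to arbitrary commits and hunks:

```shell
# Set up a throwaway repository whose tip commit touches two files.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m base
echo one >a.txt && echo two >b.txt
git add a.txt b.txt && git commit -q -m "both files in one commit"

# Dissolve the tip commit back into "pending" changes and re-slice it.
git reset --soft HEAD^          # the commit's changes are staged again
git reset -q HEAD -- b.txt      # pull b.txt out of the first slice
git commit -q -m "part 1: a.txt"
git add b.txt                   # ...and give it a slice of its own
git commit -q -m "part 2: b.txt"
```

Nothing here has been pushed, so from the outside world both slices are still "pending" and can be reshuffled again at will.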
Re: [PATCH] commit & merge: modularize the empty message validator
Kaartic Sivaraam writes:

> Sometimes I abort a commit from the editor by providing an empty
> commit message. Then I came to know that 'git commit' considers
> commit messages with just signed-off-by lines as an empty message. I
> tried to take advantage of that. I once tried to abort a merge by
> just removing the "Merge ..." line and leaving the "Signed-off" line
> and was surprised to see the merge happen instead of an abort. The
> rest is history. :)

I think many people know about and do use the "delete all lines"
approach (i.e. ":1,$d" in vi, or \M-< \C-SPC \M-> \C-w in Emacs) to
abort out of a commit or a merge. I just do not think it is likely for
them to leave Sign-off lines and remove everything else, which is more
work than to delete everything, hence my reaction.
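The rule under discussion — a message consisting only of comments and sign-off trailers counts as empty — can be sketched in shell. This is a rough approximation for illustration, not git's actual C implementation, and the helper name is made up:

```shell
# Treat a commit message file as "empty" if nothing remains after
# dropping comment lines, Signed-off-by trailers, and whitespace.
msg_is_empty () {
	test -z "$(grep -v -e '^#' -e '^Signed-off-by:' "$1" | tr -d '[:space:]')"
}
```

Under this rule a file containing only a "Signed-off-by:" line aborts the commit, while one that keeps the "Merge ..." subject line does not.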
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Jeff King writes:

> I do think Dscho is on to something with the "see if it was ever in
> our reflog" idea.

Yes, I found the idea interesting when I responded with the "thinking
aloud", and I still do think it has some uses elsewhere, even if not
in the "pull --rebase" workflow.

> What you really want as a default for force-with-lease is some way
> to say "I want to change that ref from X to Y, but only for values
> of X that I know Y has already considered and incorporated".

Correct.

> But imagine force-with-lease rule were more like: when pushing
> branch "foo" to remote branch "bar", allow only if the current value
> of "bar" is an ancestor of some entry in the reflog of "foo".

Would that cover the case that originally motivated the "with lease"
thing? Let's see.

         B'---C'        "foo"
        /
  o---A---B---C         "bar"

You are pushing back to "bar" that still has C (or it may have
acquired D) from "foo" that is now at C'. It is likely that you
started to prepare the "A---B'---C'" history from "A---B---C" history,
in which case your current branch "foo" that is being pushed out would
have had C at its tip some time in the past. So seeing that the tip of
the remote is still C, which is in the reflog of "foo", we can happily
push it out. If the tip of the remote "bar" is now D because somebody
updated it, you did not consider it while preparing your B'---C', and
we want to fail the push---because D does not appear in the reflog of
"foo", that is achieved.

> That degrades to the usual fast-forward rule if you just check the
> ref tip. And when checking the reflog entries, it shows that at some
> point you had your local branch set to something that covered all of
> the destination, but then later rewound or rewrote history. We don't
> have to care what that operation was. "rebase" is a likely one, but
> "git reset --hard HEAD^ && git push" would fall out naturally from
> the same rule.
>
> It's not quite as safe as a true force-with-lease with a value would
> be.

All correct.
I wonder if this could be a replacement for the current "the user did
not even specify what the expected current value is, so we pretend as
if the tip of the remote-tracking branch was given" kludge. Instead,

    git push --force-with-lease origin master
    git push --force-with-lease origin topic:master
    git push --force-with-lease origin HEAD:master

could

 (1) first learn where 'refs/heads/master' over there is at. Call it
     X (it may be C or D in the earlier example).

 (2) locate from which ref the commit we are pushing out is taken; in
     the above examples, they are our refs/heads/master,
     refs/heads/topic, and HEAD, respectively. Call it R.

 (3) see if the reflog of R has X. If so do a --force push; otherwise
     fail.

With such an update, I suspect that it would become a lot safer than
relying on the stability of the remote-tracking branch, which is what
the current code does. As you said, this is not as safe as a true "the
user knows and tells what commit was considered" force-with-lease, but
can be a lot safer and can become a replacement for the current "the
user does not tell and allows us to DWIM". A good thing is that the
DWIM is advertised like so:

    ... are still experimental and their semantics may change

so we are free to change it to any better version. And I think Dscho's
idea could be that better one.

I am still somewhat reserved because in the above, I only illustrated
a case that would work better than the current one, without trying
hard to poke a hole at the reasoning and to come up with cases that
would work worse. But I agree that this is on to something good ;-)
[PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname
From: Jeff Hostetler

Create subclass of oidset where each entry has a field to store the
length of the object's content and an optional pathname. This will be
used in a future commit to build a manifest of omitted objects in a
partial/narrow clone/fetch.

Signed-off-by: Jeff Hostetler
---
 Makefile  |   1 +
 oidset2.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 oidset2.h |  56 ++++++++++++++++++++++++++++
 3 files changed, 158 insertions(+)
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h

diff --git a/Makefile b/Makefile
index ffa6da7..d590508 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += oidset2.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/oidset2.c b/oidset2.c
new file mode 100644
index 0000000..806d153
--- /dev/null
+++ b/oidset2.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "oidset2.h"
+
+static int oidset2_hashcmp(const void *va, const void *vb,
+			   const void *vkey)
+{
+	const struct oidset2_entry *a = va, *b = vb;
+	const struct object_id *key = vkey;
+	return oidcmp(&a->oid, key ? key : &b->oid);
+}
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *value;
+
+	if (!set->map.cmpfn)
+		return NULL;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	value = hashmap_get(&set->map, &key, oid);
+
+	return value;
+}
+
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid)
+{
+	return !!oidset2_get(set, oid);
+}
+
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname)
+{
+	struct oidset2_entry *entry;
+
+	if (!set->map.cmpfn)
+		hashmap_init(&set->map, oidset2_hashcmp, 0);
+
+	if (oidset2_contains(set, oid))
+		return 1;
+
+	entry = xcalloc(1, sizeof(*entry));
+	hashmap_entry_init(&entry->hash, sha1hash(oid->hash));
+	oidcpy(&entry->oid, oid);
+
+	entry->object_length = object_length;
+	if (pathname)
+		entry->pathname = strdup(pathname);
+
+	hashmap_add(&set->map, entry);
+	return 0;
+}
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid)
+{
+	struct hashmap_entry key;
+	struct oidset2_entry *e;
+
+	hashmap_entry_init(&key, sha1hash(oid->hash));
+	e = hashmap_remove(&set->map, &key, oid);
+
+	free(e->pathname);
+	free(e);
+}
+
+void oidset2_clear(struct oidset2 *set)
+{
+	hashmap_free(&set->map, 1);
+}
+
+static int oidset2_cmp(const void *a, const void *b)
+{
+	const struct oidset2_entry *ae = *((const struct oidset2_entry **)a);
+	const struct oidset2_entry *be = *((const struct oidset2_entry **)b);
+
+	return oidcmp(&ae->oid, &be->oid);
+}
+
+void oidset2_foreach(struct oidset2 *set, oidset2_foreach_cb cb, void *cb_data)
+{
+	struct hashmap_iter iter;
+	struct oidset2_entry **array;
+	struct oidset2_entry *e;
+	int j, k;
+
+	array = xcalloc(set->map.size, sizeof(*e));
+
+	hashmap_iter_init(&set->map, &iter);
+	k = 0;
+	while ((e = hashmap_iter_next(&iter)))
+		array[k++] = e;
+
+	QSORT(array, k, oidset2_cmp);
+
+	for (j = 0; j < k; j++) {
+		e = array[j];
+		cb(j, k, e, cb_data);
+	}
+
+	free(array);
+}
diff --git a/oidset2.h b/oidset2.h
new file mode 100644
index 0000000..c498eae
--- /dev/null
+++ b/oidset2.h
@@ -0,0 +1,56 @@
+#ifndef OIDSET2_H
+#define OIDSET2_H
+
+/**
+ * oidset2 is a variant of oidset, but allows additional fields for each object.
+ */
+
+/**
+ * A single oidset2; should be zero-initialized (or use OIDSET2_INIT).
+ */
+struct oidset2 {
+	struct hashmap map;
+};
+
+#define OIDSET2_INIT { { NULL } }
+
+struct oidset2_entry {
+	struct hashmap_entry hash;
+	struct object_id oid;
+
+	int64_t object_length; /* This is SIGNED. Use -1 when unknown. */
+	char *pathname;
+};
+
+struct oidset2_entry *oidset2_get(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset2_contains(const struct oidset2 *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can
+ * be used to perform an efficient check-and-add.
+ */
+int oidset2_insert(struct oidset2 *set, const struct object_id *oid,
+		   int64_t object_length, const char *pathname);
+
+void oidset2_remove(struct oidset2 *set, const struct object_id *oid);
+
+typedef void (*oidset2_foreach_cb)(
[PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter
From: Jeff Hostetler

Create a simple filter for traverse_commit_list_filtered() to omit
all blobs from the result. This filter will be used in a future
commit by rev-list and pack-objects to create a "commits and trees"
result. This is intended for a narrow/partial clone/fetch.

Signed-off-by: Jeff Hostetler
---
 Makefile               |  1 +
 list-objects-filters.c | 85 ++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 17 ++++++++
 3 files changed, 103 insertions(+)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h

diff --git a/Makefile b/Makefile
index d590508..48fdcf2 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filters.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filters.c b/list-objects-filters.c
new file mode 100644
index 0000000..f29d8bc
--- /dev/null
+++ b/list-objects-filters.c
@@ -0,0 +1,85 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filters.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ */
+struct filter_omit_all_blobs_data {
+	struct oidset2 omits;
+};
+
+static list_objects_filter_result filter_omit_all_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_all_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unkown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * Since we always omit all blobs (and never provisionally
+		 * omit), we should never see a blob twice.
+		 */
+		assert(!oidset2_contains(&filter_data->omits, &obj->oid));
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		/* Insert OID into the omitted list. No need for a pathname. */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+	}
+}
+
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data)
+{
+	struct filter_omit_all_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_all_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
new file mode 100644
index 0000000..b981020
--- /dev/null
+++ b/list-objects-filters.h
@@ -0,0 +1,17 @@
+#ifndef LIST_OBJECTS_FILTERS_H
+#define LIST_OBJECTS_FILTERS_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list_omit_all_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3
[PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter
From: Jeff Hostetler

Create a filter for traverse_commit_list_filtered() to omit the blobs
that would not be needed by a sparse checkout using the given
sparse-checkout spec. This filter will be used in a future commit by
rev-list and pack-objects for partial/narrow clone/fetch.

A future enhancement should be able to also omit tree objects not
needed by such a sparse checkout, but that is not currently supported.

Signed-off-by: Jeff Hostetler
---
 list-objects-filters.c | 179 +++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h |  16 ++++
 2 files changed, 195 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+	int defval;
+	int child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+	struct oidset2 omits;
+	struct exclude_list el;
+
+	size_t nr, alloc;
+	struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_use_sparse_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	int val, dtype;
+	unsigned long s;
+	enum object_type t;
+	struct frame *frame;
+
+	switch (filter_type) {
+	default:
+		die("unkown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		dtype = DT_DIR;
+		val = is_excluded_from_list(pathname, strlen(pathname),
+					    filename, &dtype,
+					    &filter_data->el);
+		if (val < 0)
+			val = filter_data->array_frame[filter_data->nr].defval;
+
+		ALLOC_GROW(filter_data->array_frame, filter_data->nr + 1,
+			   filter_data->alloc);
+		filter_data->nr++;
+		filter_data->array_frame[filter_data->nr].defval = val;
+		filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+		/*
+		 * A directory with this tree OID may appear in multiple
+		 * places in the tree. (Think of a directory move, with
+		 * no other changes.) And with a different pathname, the
+		 * is_excluded...() results for this directory and items
+		 * contained within it may be different. So we cannot
+		 * mark it SEEN (yet), since that will prevent process_tree()
+		 * from revisiting this tree object with other pathnames.
+		 *
+		 * Only SHOW the tree object the first time we visit this
+		 * tree object.
+		 *
+		 * We always show all tree objects. A future optimization
+		 * may want to attempt to narrow this.
+		 */
+		if (obj->flags & FILTER_REVISIT)
+			return LOFR_ZERO;
+		obj->flags |= FILTER_REVISIT;
+		return LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		assert(filter_data->nr > 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+		filter_data->nr--;
+
+		/*
+		 * Tell our parent directory if any of our children were
+		 * provisionally omitted.
+		 */
+		filter_data->array_frame[filter_data->nr].child_prov_omit |=
+			frame->child_prov_omit;
+
+		/*
+		 * If there are NO provisionally omitted child objects (ALL
+		 * child objects in this folder were INCLUDED), then we can
+		 * mark the folder as SEEN (so we will not have to revisit
+		 * it again).
+		 */
+		if (!frame->child_prov_omit)
+			return LOFR_MARK_SEEN;
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		frame = &filter_data->array_frame[filter_data->nr];
+
+		/*
+		 * If we previously provisionally omitted this blob because
+		 * its pathname was not in the sparse-checkout AND this
+		 * reference to the blob has the same pathname, we can avoid
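As the patch's comment notes, the sparse spec is read from a blob so the scheme works in a bare repository. Hypothetically, a client would first store the spec as a blob and then hand its OID to the proposed --filter-use-sparse=<oid> option; only the hash-object step below is stock git, the final command is the proposed (not yet existing) usage:

```shell
# Store a sparse-checkout spec as a blob in a throwaway repository;
# the resulting OID is what the proposed filter would read.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
printf 'docs/\nsrc/\n' >spec.txt
oid=$(git hash-object -w spec.txt)
echo "$oid"

# Proposed usage (not in stock git):
#   git rev-list HEAD --objects --filter-use-sparse=$oid
```

The blob round-trips byte-for-byte, so server and client agree on the spec contents by OID alone.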
[PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter
From: Jeff Hostetler

Create a filter for traverse_commit_list_filtered() to omit blobs
larger than a requested size from the result, but always include
".git*" special files. This filter will be used in a future commit by
rev-list and pack-objects for partial/narrow clone/fetch.

Signed-off-by: Jeff Hostetler
---
 list-objects-filters.c | 97 ++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 12 ++++++
 2 files changed, 109 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+	struct oidset2 omits;
+	int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_large_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unkown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * If previously provisionally omitted (because of size),
+		 * see if the current filename is special and force it to
+		 * be included.
+		 */
+		if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+			if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+				oidset2_remove(&filter_data->omits, &obj->oid);
+				return LOFR_MARK_SEEN | LOFR_SHOW;
+			}
+			return LOFR_ZERO; /* continue provisionally omitting it */
+		}
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		if (object_length < filter_data->max_bytes)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		/*
+		 * Provisionally omit it. We've already established that this
+		 * blob is too big and doesn't have a special filename, so we
+		 * WANT to omit it. However, there may be a special file
+		 * elsewhere in the tree that references this same blob, so
+		 * we cannot reject it yet. Leave the LOFR_ bits unset so
+		 * that if the blob appears again in the traversal, we will
+		 * be asked again.
+		 *
+		 * No need for a pathname, since we only test for special
+		 * filenames above.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit)
+{
+	struct filter_omit_large_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+	d.max_bytes = large_byte_limit;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_large_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
 	oidset2_foreach_cb print_omitted_object,
 	void *ctx_data);
 
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3
[PATCH v2 10/19] t6112: rev-list object filtering test
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 t/t6112-rev-list-filters-objects.sh | 37 +++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 t/t6112-rev-list-filters-objects.sh

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
new file mode 100644
index 0000000..ded2b04
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	for n in 1 2 3 4 5 ; do \
+		echo $n > file.$n ; \
+		git add file.$n ; \
+		git commit -m "$n" ; \
+	done
+'
+
+test_expect_success 'omit-all-blobs omitted 5 blobs' '
+	git rev-list HEAD --objects --filter-print-manifest --filter-omit-all-blobs >omit_all &&
+	grep "^~" omit_all >omitted &&
+	test $(cat omitted | wc -l) = 5
+'
+
+test_expect_success 'omit-all-blobs blob sha match' '
+	git rev-list HEAD --objects >normal &&
+	awk "/file/ {print \$1;}" <normal >normal_sha &&
+	sed "s/~//" <omitted >omit_all_sha &&
+	test_cmp normal_sha omit_all_sha
+'
+
+test_expect_success 'omit-all-blobs nothing else changed' '
+	grep -v "file" <normal >normal_other &&
+	grep -v "~" <omit_all >omit_other &&
+	test_cmp normal_other omit_other
+'
+
+# TODO test filter-omit-large-blobs
+# TODO test filter-use-sparse
+
+test_done
-- 
2.9.3
[PATCH v2 03/19] list-objects: filter objects in traverse_commit_list
From: Jeff Hostetler

Create traverse_commit_list_filtered() and add filtering interface to
allow certain objects to be omitted (not shown) during a traversal.

Update traverse_commit_list() to be a wrapper for the above.

Filtering will be used in a future commit by rev-list and pack-objects
for narrow/partial clone/fetch to omit certain blobs from the output.

traverse_bitmap_commit_list() does not work with filtering. If a
packfile bitmap is present, it will not be used.

Signed-off-by: Jeff Hostetler
---
 list-objects.c | 66 ++++++++++++++++++++++++++++++++++++++----------
 list-objects.h | 30 ++++++++++++++++++++++
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..8dddeda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *path,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->blob_objects)
 		return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
 		die("bad blob object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	obj->flags |= SEEN;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	show(obj, path->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, path->buf, cb_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
 			 show_object_fn show,
 			 struct strbuf *base,
 			 const char *name,
-			 void *cb_data)
+			 void *cb_data,
+			 filter_object_fn filter,
+			 void *filter_data)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting : entry_not_interesting;
 	int baselen = base->len;
+	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->tree_objects)
 		return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
 			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
-	obj->flags |= SEEN;
 	strbuf_addstr(base, name);
-	show(obj, base->buf, cb_data);
+	if (filter)
+		r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+	if (r & LOFR_MARK_SEEN)
+		obj->flags |= SEEN;
+	if (r & LOFR_SHOW)
+		show(obj, base->buf, cb_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
 			process_tree(revs,
 				     lookup_tree(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs,
 					entry.oid->hash,
 					show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
 			process_blob(revs,
 				     lookup_blob(entry.oid->hash),
 				     show, base, entry.path,
-				     cb_data);
+				     cb_data, filter, filter_data);
 	}
+
+	if (filter) {
+		r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data);
+		if (r & LOFR_MARK_SEEN)
+			obj->flags |= SEEN;
+		if (r & LOFR_SHOW)
+			show(obj, base->buf, cb_data);
+	}
+
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void
[PATCH v2 09/19] rev-list: add filtering help text
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 Documentation/git-rev-list.txt     |  7 ++++++-
 Documentation/rev-list-options.txt | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..d20c2ab 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,12 @@ SYNOPSIS
 	     [ --fixed-strings | -F ]
 	     [ --date=<format> ]
 	     [ [ --objects | --objects-edge | --objects-edge-aggressive ]
-	       [ --unpacked ] ]
+	       [ --unpacked ]
+	       [ [ --filter-omit-all-blobs |
+		   --filter-omit-large-blobs=<n>[kmg] |
+		   --filter-use-sparse=<oid> ]
+		 [ --filter-print-manifest ] ] ]
+	     [ --filter-relax ]
 	     [ --pretty | --header ]
 	     [ --bisect ]
 	     [ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f732..e0112dd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -693,6 +693,32 @@ ifdef::git-rev-list[]
 --unpacked::
 	Only useful with `--objects`; print the object IDs that are not
 	in packs.
+
+--filter-omit-all-blobs::
+	Only useful with one of the `--objects*`; omits all blobs from
+	the printed list of objects.
+
+--filter-omit-large-blobs=<n>[kmg]::
+	Only useful with one of the `--objects*`; omits blobs larger
+	than <n> bytes from the printed list of objects. May optionally
+	be followed by 'k', 'm', or 'g' units. Value may be zero.
+	Special files (matching ".git*") are always included,
+	regardless of size.
+
+--filter-use-sparse=<oid>::
+	Only useful with one of the `--objects*`; uses a
+	sparse-checkout specification contained in the given object to
+	filter the result by omitting blobs that would not be used by
+	the corresponding sparse checkout.
+
+--filter-print-manifest::
+	Only useful with one of the above `--filter*`; prints a
+	manifest of the omitted objects. Object IDs are prefixed with
+	a ``~'' character. The object size is printed after the ID.
+
+--filter-relax::
+	Relax consistency checking for missing blobs. Do not warn of
+	missing blobs during normal (non-filtering) object traversal
+	following an earlier partial/narrow clone or fetch.
 endif::git-rev-list[]
 
 --no-walk[=(sorted|unsorted)]::
-- 
2.9.3
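Given the manifest format documented above ("~" prefix, then the object ID, then the size), consumers can post-process it with standard tools. A sketch on fabricated sample data (the two object IDs below are made up):

```shell
# Parse --filter-print-manifest output: each omitted object is listed
# as "~<oid> <size>".
manifest='~aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d 1048576
~bbadbeefdcc5e8a2dabede0f3b482cd9aea9434d 2048'

# Extract just the omitted OIDs (strip the "~" prefix and the size):
printf '%s\n' "$manifest" | sed -n 's/^~\([0-9a-f]*\) .*/\1/p'

# Total the omitted bytes:
printf '%s\n' "$manifest" | awk '/^~/ { total += $2 } END { print total }'
```

For the sample above the second pipeline prints 1050624 (1048576 + 2048), i.e. how much data the filter kept out of the output.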
[PATCH v2 19/19] fetch: add object filtering to fetch
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 builtin/fetch.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f2c2ab..306c165 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "object-filter.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -52,6 +53,7 @@ static const char *recurse_submodules_default;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct object_filter_options filter_options;
 
 static int option_parse_recurse_submodules(const struct option *opt,
 					   const char *arg, int unset)
@@ -141,6 +143,14 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+
+	OPT_PARSE_FILTER_OMIT_ALL_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_OMIT_LARGE_BLOBS(&filter_options),
+	OPT_PARSE_FILTER_USE_SPARSE(&filter_options),
+
+	/* OPT_PARSE_FILTER_PRINT_MANIFEST(&filter_options), */
+	/* OPT_PARSE_FILTER_RELAX(&filter_options), */
+
 	OPT_END()
 };
 
@@ -733,6 +743,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
+	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
 
 	fp = fopen(filename, "a");
 	if (!fp)
@@ -744,7 +762,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_connected(iterate_ref_map, &rm, NULL)) {
+	if (check_connected(iterate_ref_map, &rm, &opt)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -885,6 +903,13 @@ static int quickfetch(struct ref *ref_map)
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
 	/*
+	 * Relax consistency check to allow missing blobs (presumably
+	 * because they are exactly the set that we requested be
+	 * omitted).
+	 */
+	opt.filter_relax = object_filter_enabled(&filter_options);
+
+	/*
 	 * If we are deepening a shallow clone we already have these
 	 * objects reachable. Running rev-list here will return with
 	 * a good (0) exit status and we'll bypass the fetch that we
-- 
2.9.3
[PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 Documentation/technical/pack-protocol.txt         | 16 ++++++++++++++++
 Documentation/technical/protocol-capabilities.txt |  7 +++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a349171..dce6e04 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
     upload-request    =  want-list
 			 *shallow-line
 			 *1depth-request
+			 [filter-request]
 			 flush-pkt
 
     want-list         =  first-want
@@ -226,7 +227,13 @@ out of what the server said it could do with the first 'want' line.
     first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
     additional-want   =  PKT-LINE("want" SP obj-id)
 
+    filter-request    =  PKT-LINE("filter-omit-all-blobs") /
+			 PKT-LINE("filter-omit-large-blobs" SP magnitude) /
+			 PKT-LINE("filter-use-sparse" SP obj-id)
+
     depth             =  1*DIGIT
+
+    magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +256,15 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial/narrow clone/fetch operations.
+"filter-omit-all-blobs" requests that all blobs be omitted from
+the packfile.  "filter-omit-large-blobs" requests that blobs larger
+than the requested size be omitted, unless they have a ".git*"
+special filename.  "filter-use-sparse" requests blob filtering based
+upon a sparse-checkout specification in the given blob id.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side that
 it is done sending the list.
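[Editor's note: as a concrete illustration of the grammar above, the following Python sketch frames a filter-request as a pkt-line (four lowercase hex digits giving the total length, including the 4-byte prefix itself, then the payload) and expands a `magnitude` token. The helper names are hypothetical; the assumption that "k"/"m"/"g" are 1024-based multipliers follows git's usual size-suffix convention.]

```python
def pkt_line(payload: str) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits giving the total
    length (prefix included), followed by the payload bytes."""
    data = payload.encode("utf-8")
    return b"%04x" % (len(data) + 4) + data

def parse_magnitude(token: str) -> int:
    """Expand 1*DIGIT ["k" | "m" | "g"], assuming binary (1024-based)
    multipliers as with git's other size options."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    if token[-1] in units:
        return int(token[:-1]) * units[token[-1]]
    return int(token)
```

[So a client wanting no blobs at all would send the single pkt-line `pkt_line("filter-omit-all-blobs")` alongside its want-list.]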
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..7011eb3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter-objects
+--------------
+
+If the upload-pack server advertises the 'filter-objects' capability,
+fetch-pack may send "filter-*" commands to request a partial/narrow
+clone/fetch where the server omits various objects from the packfile.
-- 
2.9.3
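[Editor's note: the gating described above mirrors other capabilities — a client inspects the space-separated capability list the server advertises (after the first ref in the initial advertisement) before sending any filter-* command. A minimal sketch, with a hypothetical function name:]

```python
def may_send_filters(capabilities: str) -> bool:
    """Return True only if 'filter-objects' appears in the server's
    space-separated capability list; a client that does not see it
    advertised must not send filter-* commands."""
    return "filter-objects" in capabilities.split()
```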