Re: [PATCH v1 03/25] structured-logging: add structured logging framework
Jeff Hostetler wrote: [...] > --- a/compat/mingw.h > +++ b/compat/mingw.h > @@ -144,8 +144,15 @@ static inline int fcntl(int fd, int cmd, ...) > errno = EINVAL; > return -1; > } > + > /* bash cannot reliably detect negative return codes as failure */ > +#if defined(STRUCTURED_LOGGING) Git usually spells this as #ifdef. > +#include "structured-logging.h" > +#define exit(code) exit(strlog_exit_code((code) & 0xff)) > +#else > #define exit(code) exit((code) & 0xff) > +#endif This is hard to follow, since it only makes sense in combination with the corresponding code in git-compat-util.h. Can they be defined together? If not, can they have comments to make it easier to know to edit one too when editing the other? [...] > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -1239,4 +1239,13 @@ extern void unleak_memory(const void *ptr, size_t len); > #define UNLEAK(var) do {} while (0) > #endif > > +#include "structured-logging.h" Is this #include needed? Usually git-compat-util.h only defines C standard library functions or utilities that are used everywhere. [...] > --- a/git.c > +++ b/git.c [...] > @@ -700,7 +701,7 @@ static int run_argv(int *argcp, const char ***argv) > return done_alias; > } > > -int cmd_main(int argc, const char **argv) > +static int real_cmd_main(int argc, const char **argv) > { > const char *cmd; > int done_help = 0; > @@ -779,3 +780,8 @@ int cmd_main(int argc, const char **argv) > > return 1; > } > + > +int cmd_main(int argc, const char **argv) > +{ > + return slog_wrap_main(real_cmd_main, argc, argv); > +} Can real_cmd_main get a different name, describing what it does? [...] > --- a/structured-logging.c > +++ b/structured-logging.c > @@ -1,3 +1,10 @@ [...] 
> +static uint64_t my__start_time; > +static uint64_t my__exit_time; > +static int my__is_config_loaded; > +static int my__is_enabled; > +static int my__is_pretty; > +static int my__signal; > +static int my__exit_code; > +static int my__pid; > +static int my__wrote_start_event; > +static int my__log_fd = -1; Please don't use this my__ notation. The inconsistency with the rest of Git makes the code feel out of place and provides an impediment to smooth reading. [...] > +static void emit_start_event(void) > +{ > + struct json_writer jw = JSON_WRITER_INIT; > + > + /* build "cmd_start" event message */ > + jw_object_begin(&jw, my__is_pretty); > + { > + jw_object_string(&jw, "event", "cmd_start"); > + jw_object_intmax(&jw, "clock_us", (intmax_t)my__start_time); > + jw_object_intmax(&jw, "pid", (intmax_t)my__pid); The use of blocks here is unexpected and makes me wonder what kind of macro wizardry is going on. Perhaps this is idiomatic for the json-writer API; if so, can you add an example to json-writer.h to help the next surprised reader? That said, I think json_object_begin(&jw, ...); json_object_string(... json_object_int(... ... json_object_begin_inline_array(&jw, "argv"); for (k = 0; k < argv.argc; k++) json_object_string(... json_object_end(); json_object_end(); is still readable and less unexpected. [...] > +static void emit_exit_event(void) > +{ > + struct json_writer jw = JSON_WRITER_INIT; > + uint64_t atexit_time = getnanotime() / 1000; > + > + /* close unterminated forms */ What are unterminated forms? > + if (my__errors.json.len) > + jw_end(&my__errors); [...] > +int slog_default_config(const char *key, const char *value) > +{ > + const char *sub; > + > + /* > + * git_default_config() calls slog_default_config() with "slog.*" > + * k/v pairs. git_default_config() MAY or MAY NOT be called when > + * cmd_() calls git_config(). No need to shout. [...] 
> +/* > + * If cmd_() did not cause slog_default_config() to be called > + * during git_config(), we try to lookup our config settings the first > + * time we actually need them. > + * > + * (We do this rather than using read_early_config() at initialization > + * because we want any "-c key=value" arguments to be included.) > + */ Which function is initialization referring to here? Lazy loading in order to guarantee loading after a different subsystem sounds a bit fragile, so I wonder if we can make the sequencing more explicit. Stopping here. I still like where this is going, but some aspects of the coding style are making it hard to see the forest for the trees. Perhaps some more details about API in the design doc would help. Thanks and hope that helps, Jonathan
Re: [PATCH v1 02/25] structured-logging: add STRUCTURED_LOGGING=1 to Makefile
g...@jeffhostetler.com wrote: > From: Jeff Hostetler > > Teach the Makefile to take STRUCTURED_LOGGING=1 variable to > compile in/out structured logging feature. > > Signed-off-by: Jeff Hostetler > --- > Makefile | 8 > structured-logging.c | 9 + > structured-logging.h | 13 + > 3 files changed, 30 insertions(+) > create mode 100644 structured-logging.c > create mode 100644 structured-logging.h This should probably be squashed with a later patch (e.g., patch 3). When taken alone, it produces [...] > --- /dev/null > +++ b/structured-logging.c > @@ -0,0 +1,9 @@ > +#if !defined(STRUCTURED_LOGGING) > +/* > + * Structured logging is not available. > + * Stub out all API routines. > + */ > + > +#else > + > +#endif which is not idiomatic (for example, it's missing a #include of git-compat-util.h, etc). Thanks, Jonathan
Re: [PATCH v1 01/25] structured-logging: design document
Hi, Jeff Hostetler wrote: > Signed-off-by: Jeff Hostetler > --- > Documentation/technical/structured-logging.txt | 816 > + > 1 file changed, 816 insertions(+) > create mode 100644 Documentation/technical/structured-logging.txt Can you add this to Documentation/Makefile as well, so that an HTML version will appear once this is merged? See https://public-inbox.org/git/20180814222846.gg142...@aiede.svl.corp.google.com/ for an example. [...] > +++ b/Documentation/technical/structured-logging.txt > @@ -0,0 +1,816 @@ > +Structured Logging > +== > + > +Structured Logging (SLOG) is an optional feature to allow Git to > +generate structured log data for executed commands. This includes > +command line arguments, command run times, error codes and messages, > +child process information, time spent in various critical functions, > +and repository data-shape information. Data is written to a target > +log file in JSON[1,2,3] format. I like the idea of more structured logs for tracing, monitoring, and diagnosis (see also https://research.google.com/archive/papers/dapper-2010-1.pdf on the subject of tracing). My main focus in looking over this initial contribution is 1. Is this something that could eventually merge with our other tracing APIs (e.g. trace_printf), or will they remain separate? 2. Is the API introduced here one that will be easy to work with and adapt? What will calling code that makes use of it look like? [...] > +Background (Git Merge 2018 Barcelona) > += > + > +Performance and/or error logging was discussed during the contributor's > +summit in Barcelona. Here are the relevant notes from the meeting > +minutes[6]. > + > +> Performance misc (Ævar) > +> --- > +> [...] > +> - central error reporting for git > +>- `git status` logging > +>- git config that collects data, pushes to known endpoint with `git > push` > +>- pre_command and post_command hooks, for logs > +>- `gvfs diagnose` that looks at packfiles, etc [etc] I'm not sure what to make of this section. 
Is it a historical artifact, or should I consider it part of the design? [...] > +A Quick Example > +=== > + > +Note: JSON pretty-printing is enabled in all of the examples shown in > +this document. When pretty-printing is turned off, each event is > +written on a single line. Pretty-printing is intended for debugging. > +It should be turned off in production to make post-processing easier. > + > +$ git config slog.pretty > + > +Here is a quick example showing SLOG data for "git status". This > +example has all optional features turned off. It contains 2 events. Thanks for the example! I'd also be interested in what the tracing code looks like, since the API is the 'deepest' change involved (the tracing format can easily change later based on experience). My initial reaction to the example trace is that it feels like an odd compromise: on one hand it uses text (JSON) to be human-readable, and on the other hand it is structured to be machine-readable. The JSON writer isn't using a standard JSON serialization library so parsing difficulties seem likely, and the output is noisy enough that reading it by hand is hard, too. That leads me to wish for a different format, like protobuf (where the binary form is very concise and parseable, and the textual form is IMHO more readable than JSON). All that said, this is just the first tracing output format and it's easy to make new ones later. It seems fine for that purpose. [...] > +Target Log File > +=== > + > +SLOG writes events to a log file. File logging works much like > +GIT_TRACE where events are appended to a file on disk. Does it write with O_APPEND? Does it do anything to guard against interleaved events --- e.g. are messages buffered and then written with a single write() call? [...] > +Comparison with GIT_TRACE > += > + > +SLOG is very similar to the existing GIT_TRACE[4] API because both > +write event messages at various points during a command. 
However, > +there are some fundamental differences that warrant it being > +considered a separate feature rather than just another > +GIT_TRACE_: The natural question would be in the opposite direction: can and should trace_printf be reimplemented on top of this feature? [...] > + > + nit: trailing whitespace (you can find these with "git show --check"). [...] > +A session id (SID) is a cheap, unique-enough string to associate all > +of the events generated by a single process. A child git process inherits > +the SID of their parent git process and incorporates it into their SID. I wonder if we can structure events in a more hierarchical way to avoid having to special-case the toplevel command in this way. [...] > +# The "cmd_exit" event includes the command result and elapsed > +# time and the various configuration
Re: Git Project Leadership Committee
On Thu, Aug 16, 2018 at 06:41:38PM -0400, Jeff King wrote: > - we should avoid anyone who is affiliated with a company that already >has a member on the committee. So nobody from Google and nobody from >GitHub. I would extend that to Microsoft, too, given a certain >impending acquisition. I'd expect anybody who is affiliated with a >company to recuse themselves from decisions that directly affect that >company (which is what we've done so far). > > - I think ideally the candidate would be somebody who represents the >long tail of volunteer community members who don't work on Git as >part of their day-job. The current members have a fairly skewed view >in that respect. At the same time, we can't really represent the >_really_ long tail of infrequent contributors, by the "stick around" >criterion above. Thanks both Christian and Ævar for giving more details on your situations elsewhere in the thread. I do think neither of you is quite in the "I just do this in my spare time" situation. But I also think that situation is going to be inversely correlated with being active in the project and wanting to spend time on governance stuff. So IMHO some compromise there is necessary. And I feel like both of you can represent those interests, even if they're not exactly the situation you're in. So what next? There was a little bit of off-list discussion (mostly nominations to avoid putting the candidate on the spot), but no new public candidates. I'm happy to entertain more discussion here, but it seems like everybody is reasonably happy with these two names. So either Junio and I can pick one, or possibly we could have both (that gives us a 4-person committee, but again, tied votes haven't been an issue so far). Any final thoughts are welcome. Also, on a more meta-level, I'm happy to hear any thoughts about this process that we might want to enshrine for later iterations. This is obviously not nearly as formal as something like Debian elections. 
But I don't think we're a big enough community to need that. So my attempt is to just keep things informal, but try to give as many opportunities as possible for people to speak up. -Peff
[L10N] Kickoff for Git 2.19.0 round 1
Hi, Git v2.19.0-rc0 has been released, and it's time to start a new round of git l10n. This time there are 382 updated messages that need to be translated since the last update: l10n: git.pot: v2.19.0 round 1 (382 new, 30 removed) Generate po/git.pot from v2.19.0-rc0 for git v2.19.0 l10n round 1. Signed-off-by: Jiang Xin You can get it from the usual place: https://github.com/git-l10n/git-po/ For how to update your XX.po and help translate Git, please see "Updating a XX.po file" and other sections in the "po/README" file. -- Jiang Xin
Re: Antw: Re: non-smooth progress indication for git fsck and git gc
On Mon, Aug 20, 2018 at 10:57:13AM +0200, Ævar Arnfjörð Bjarmason wrote: > > That seems to apply. BTW: Is there a way to get some repository statistics > > like a histogram of object sizes (or whatever that might be useful to help > > making decisions)? > > The git-sizer program is really helpful in this regard: > https://github.com/github/git-sizer Yeah, I'd very much agree that's the best tool for a general overview of the repository stats. Ulrich, if you want more analysis (like a histogram), probably something like: git cat-file --batch-all-objects --batch-check='%(objectsize)' might be a good starting place to dump information (see the "BATCH OUTPUT" section of "git help cat-file" for more format items). > > If it's sorting, maybe add some code like (wild guess): > > > > if (objects_to_sort > magic_number) > >message("Sorting something..."); > > I think a good solution to these cases is to just introduce something to > the progress.c mode where it learns how to display a counter where we > don't know what the end-state will be. Something like your proposed > magic_number can just be covered under the more general case where we > don't show the progress bar unless it's been 1 second (which I believe > is the default). Yeah. We already have open-ended counters (e.g., "counting objects"), and delayed meters (we actually normalized the default to 2s recently). I _think_ they should work together OK without further modification. Once upon a time the caller had to say "don't show if we're past N% after M seconds", but I think with the current code we'd just show it if we're not completely finished after 2 seconds. So it really should just be a simple: progress = start_delayed_progress("Hashing packfile", 0); That said, counting bytes would probably be ugly (just because the numbers get really big). We'd probably prefer to show a throughput or something. And as you noted, there's some refactoring to be done with pack-check for it to show multiple progress meters. 
(I still think in the long run we would want to scrap that code, but that's a much bigger job; I'm fine if somebody wants to do incremental improvements in the meantime). -Peff
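Peff's cat-file suggestion above prints one object size per line, so building the histogram Ulrich asked about is a few lines of post-processing. An illustrative sketch (Python, with stand-in sizes; real use would read the command's stdout, and the power-of-two bucketing is just one reasonable choice):

```python
from collections import Counter

# Stand-in for output of:
#   git cat-file --batch-all-objects --batch-check='%(objectsize)'
# (one object size, in bytes, per line)
sizes = [37, 120, 130, 4096, 5000, 1500000]

def bucket(size):
    # Largest power of two <= size, so objects group into coarse bins.
    return 1 << (size.bit_length() - 1) if size else 0

hist = Counter(bucket(s) for s in sizes)
for b in sorted(hist):
    print("%8d: %d" % (b, hist[b]))
```

Piping the real cat-file output through a script like this gives a quick feel for whether a repository is dominated by tiny blobs or a few huge ones.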
Re: [ANNOUNCE] Git v2.19.0-rc0
On Mon, Aug 20, 2018 at 5:27 PM Jonathan Nieder wrote: > > Jonathan Nieder wrote: > > Stefan Beller wrote: > >> Junio C Hamano wrote: > > >>> * "git submodule" did not correctly adjust core.worktree setting that > >>>indicates whether/where a submodule repository has its associated > >>>working tree across various state transitions, which has been > >>>corrected. > >>>(merge 984cd77ddb sb/submodule-core-worktree later to maint). > >> > >> Personally I do not view this as a bug fix but a feature > >> (but then again my thinking might be tainted of too much > >> submodule work) hence I would not merge it down. > > > > Can you elaborate? > > ... ah, I figured it out. You are saying "would not merge it down to > maint". In that case, I agree, since this is not a recent bug > (it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02). Yeah; the behavior was the gold standard for submodules ever since, so I am wary of changing it under the guise of fixing a bug. The core.worktree setting doesn't harm the user by default; you need to craft a very specific situation to benefit from this feature. Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
> At the risk of going on a tangent, I assumed this was because enums > are really ints, and the "default" is there in case the enum somehow > got assigned to an int without a corresponding value. Either because > of a cast from an int that was out-of-range, or new values that were > obtained from arithmetic or bitwise operations on the declared enum > values, which created undeclared values. See 374166cb381 (grep: catch a missing enum in switch statement, 2017-05-25) or, a bit dated but nevertheless an interesting read, b8527d5fa61 (wt-status: fix possible use of uninitialized variable, 2013-03-21) ...compilers these days are just too smart to reason about them :-)
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
> > heh. Thanks for switching the style; I should have emphasized that > > (after reflection) I found them equally good, I am used to one > > over the other more. > It seems marginally better to me. I also noticed a clean-up patch > going by that aggressively switched to test_must_be_empty wherever > possible: > https://public-inbox.org/git/20180819215725.29001-1-szeder@gmail.com/ > > OTOH, if it were up to me I would have just gotten rid of > test_must_be_empty and used an existing function with the right > argument, like `test_cmp /dev/null` - but using some form consistently > is the most important, whatever it is. /dev/null, eh? It shows you don't use Windows on a day to day basis. ;-) But yeah consistency is really good to have. :)
Re: [ANNOUNCE] Git v2.19.0-rc0
Jonathan Nieder wrote: > Stefan Beller wrote: >> Junio C Hamano wrote: >>> * "git submodule" did not correctly adjust core.worktree setting that >>>indicates whether/where a submodule repository has its associated >>>working tree across various state transitions, which has been >>>corrected. >>>(merge 984cd77ddb sb/submodule-core-worktree later to maint). >> >> Personally I do not view this as a bug fix but a feature >> (but then again my thinking might be tainted of too much >> submodule work) hence I would not merge it down. > > Can you elaborate? ... ah, I figured it out. You are saying "would not merge it down to maint". In that case, I agree, since this is not a recent bug (it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02). Thanks, Jonathan
Re: [ANNOUNCE] Git v2.19.0-rc0
(-cc: other lists) Stefan Beller wrote: > Junio C Hamano wrote: >> * "git submodule" did not correctly adjust core.worktree setting that >>indicates whether/where a submodule repository has its associated >>working tree across various state transitions, which has been >>corrected. >>(merge 984cd77ddb sb/submodule-core-worktree later to maint). > > Personally I do not view this as a bug fix but a feature > (but then again my thinking might be tainted of too much > submodule work) hence I would not merge it down. Can you elaborate? The symptom that this series fixes was pretty bad, so I'm pretty glad you wrote it. Thanks, Jonathan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Fri, Aug 17, 2018 at 3:28 PM Stefan Beller wrote: > > On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore wrote: > > > > On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller wrote: > > > > > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > > > > > > > Teach list-objects the "tree:0" filter which allows for filtering > > > > out all tree and blob objects (unless other objects are explicitly > > > > specified by the user). The purpose of this patch is to allow smaller > > > > partial clones. > > > > > > > > The name of this filter - tree:0 - does not explicitly specify that > > > > it also filters out all blobs, but this should not cause much confusion > > > > because blobs are not at all useful without the trees that refer to > > > > them. > > > > > > > > I also considered only:commits as a name, but this is inaccurate because > > > > it suggests that annotated tags are omitted, but actually they are > > > > included. > > > > > > Speaking of tag objects, it is possible to tag anything, including blobs. > > > Would a blob that is tagged (hence reachable without a tree) be not > > > filtered by tree:0 (or in the future any deeper depth) ? > > I think so. If I try to fetch a tagged tree or blob, it should fetch > > that object itself, since I'm referring to it explicitly in the git > > pack-objects arguments (I mention fetch since git rev-list apparently > > doesn't support specifying non-commits on the command line). This is > > similar to how I can fetch a commit that would otherwise be filtered > > *if* I specify it explicitly (rather than a child commit). > > > > If you're fetching a tagged tree, then for depth=0, it will fetch the > > given tree only, and not fetch any referents of an explicitly-given > > tree. For depth=1, it will fetch all direct referents. > > > > If you're fetching a commit, then for depth=0, you will not get any > > tree objects, and for depth=1, you'll get only the root tree object > > and none of its referents. 
So the commit itself is like a "layer" in > > the depth count. > > That seems smart. Thanks! > > > > > > > > > I found this series a good read, despite my unfamiliarity of the > > > partial cloning. > > > > > > One situation where I scratched my head for a second were previous patches > > > that use "test_line_count = 0 rev_list_err" whereas using > > > test_must_be_empty > > > would be an equally good choice (I am more used to the latter than the > > > former) > > > > Done. Here is an interdiff (sorry, the tab characters are not > > maintained by my mail client): > > heh. Thanks for switching the style; I should have emphasized that > (after reflection) I found them equally good, I am used to one > over the other more. It seems marginally better to me. I also noticed a clean-up patch going by that aggressively switched to test_must_be_empty wherever possible: https://public-inbox.org/git/20180819215725.29001-1-szeder@gmail.com/ OTOH, if it were up to me I would have just gotten rid of test_must_be_empty and used an existing function with the right argument, like `test_cmp /dev/null` - but using some form consistently is the most important, whatever it is. > > So if that is the only issue brought up, I would not even ask for a resend. > > Thanks, > Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Mon, Aug 20, 2018 at 11:38 AM Stefan Beller wrote: > > On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore wrote: > > > > There were many instances in this file where it seemed like BUG would be > > better, so I created a new commit before this one to switch them over. The > > interdiff is below. > > > > BTW, why are there so many instances of "die" without "_"? I expect all > > errors that may be caused by a user to be localized. > > Well, there is the porcelain layer to be consumed by a human user > and the plumbing that is good for scripts. And in scripts you might want > to grep for certain errors and react to that, so a non-localized error > message makes the script possible to run in any localisation. > > The BUG is strictly for things that are due to Gits internals, > not for problematic user input. Problematic user input > definitely wants a die(...), and depending on the plumbing/porcelain > layer it may need to be _(translatable). Ah I see. Plumbing commands are not translated. Makes perfect sense now. > > I think BUG() would never go with translated strings. 
> > > I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t > > > > diff --git a/list-objects-filter.c b/list-objects-filter.c > > index 8e3caf5bf..09b2b05d5 100644 > > --- a/list-objects-filter.c > > +++ b/list-objects-filter.c > > @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none( > > > > switch (filter_situation) { > > default: > > - die("unknown filter_situation"); > > - return LOFR_ZERO; > > + BUG("unknown filter_situation: %d", filter_situation); > > > > case LOFS_BEGIN_TREE: > > assert(obj->type == OBJ_TREE); > > @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none( > > > > switch (filter_situation) { > > default: > > - die("unknown filter_situation"); > > - return LOFR_ZERO; > > + BUG("unknown filter_situation: %d", filter_situation); > > > > case LOFS_BEGIN_TREE: > > case LOFS_BLOB: > > @@ -151,8 +149,7 @@ static enum list_objects_filter_result > > filter_blobs_limit( > > > > switch (filter_situation) { > > default: > > - die("unknown filter_situation"); > > - return LOFR_ZERO; > > + BUG("unknown filter_situation: %d", filter_situation); > > > > case LOFS_BEGIN_TREE: > > assert(obj->type == OBJ_TREE); > > @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse( > > > > switch (filter_situation) { > > default: > > - die("unknown filter_situation"); > > - return LOFR_ZERO; > > + BUG("unknown filter_situation: %d", filter_situation); > > Up until here we just have replace the die by BUG in the default > case of the state machine switch. (We need the default due to strict > compile flags, but as filter_situation is an enum I thought we would not > as compilers are smart enough to see we got all values of the enum > covered). At the risk of going on a tangent, I assumed this was because enums are really ints, and the "default" is there in case the enum somehow got assigned to an int without a corresponding value. 
Either because of a cast from an int that was out-of-range, or new values that were obtained from arithmetic or bitwise operations on the declared enum values, which created undeclared values. > > I agree that keeping the defaults and having a BUG() is reasonable. > > > > > > case LOFS_BEGIN_TREE: > > assert(obj->type == OBJ_TREE); > > @@ -439,7 +435,7 @@ void *list_objects_filter__init( > > assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT); > > > > if (filter_options->choice >= LOFC__COUNT) > > - die("invalid list-objects filter choice: %d", > > + BUG("invalid list-objects filter choice: %d", > > filter_options->choice); > > This also makes sense, combined with the assert before, this looks like > really defensive code. > > I think this patch is a good idea! Thank you for your feedback :) > > Thanks, > Stefan
What's cooking in git.git (Aug 2018, #05; Mon, 20)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. An early preview of the upcoming 2.19 release of Git has been tagged as v2.19.0-rc0; before -rc1, I plan to merge three more topics to 'master' from 'next'. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to "master"] * ab/checkout-default-remote (2018-08-18) 1 commit (merged to 'next' on 2018-08-20 at 21fff26f74) + t2024: mark test using "checkout -p" with PERL prerequisite Test fix. * ab/fetch-tags-noclobber (2018-08-13) 7 commits (merged to 'next' on 2018-08-15 at eca0ac8afa) + pull doc: fix a long-standing grammar error + fetch tests: correct a comment "remove it" -> "remove them" + push tests: assert re-pushing annotated tags + push tests: add more testing for forced tag pushing + push tests: fix logic error in "push" test assertion + push tests: remove redundant 'git push' invocation + fetch tests: change "Tag" test tag to "testTag" Test and doc clean-ups. * ab/newhash-is-sha256 (2018-08-07) 2 commits (merged to 'next' on 2018-08-15 at 2e808d75d3) + doc hash-function-transition: pick SHA-256 as NewHash + doc hash-function-transition: note the lack of a changelog Documentation update. * ab/submodule-relative-url-tests (2018-08-14) 1 commit (merged to 'next' on 2018-08-17 at 17b28d8262) + submodule: add more exhaustive up-path testing Test updates. * ab/test-must-be-empty-for-master (2018-07-30) 1 commit (merged to 'next' on 2018-08-15 at 17652a77fb) + tests: make use of the test_must_be_empty function Test updates. * ds/commit-graph-fsck (2018-08-13) 1 commit (merged to 'next' on 2018-08-15 at a2f82d3cbd) + t5318: use 'test_cmp_bin' to compare commit-graph files Test fix. 
* en/incl-forward-decl (2018-08-15) 6 commits (merged to 'next' on 2018-08-17 at 04fc9c11bb) + Remove forward declaration of an enum + compat/precompose_utf8.h: use more common include guard style + urlmatch.h: fix include guard + Move definition of enum branch_track from cache.h to branch.h + alloc: make allocate_alloc_state and clear_alloc_state more consistent + Add missing includes and forward declarations Code hygiene improvement for the header files. * en/t7406-fixes (2018-08-08) 5 commits (merged to 'next' on 2018-08-15 at c6a740d828) + t7406: avoid using test_must_fail for commands other than git + t7406: prefer test_* helper functions to test -[feds] + t7406: avoid having git commands upstream of a pipe + t7406: simplify by using diff --name-only instead of diff --raw + t7406: fix call that was failing for the wrong reason Test fixes. * en/update-index-doc (2018-08-08) 1 commit (merged to 'next' on 2018-08-15 at 3ee0ae14dc) + git-update-index.txt: reword possibly confusing example Doc update. * es/chain-lint-more (2018-08-13) 6 commits (merged to 'next' on 2018-08-15 at bb5150ee96) + chainlint: add test of pathological case which triggered false positive + chainlint: recognize multi-line quoted strings more robustly + chainlint: let here-doc and multi-line string commence on same line + chainlint: recognize multi-line $(...) when command cuddled with "$(" + chainlint: match 'quoted' here-doc tags + chainlint: match arbitrary here-docs tags rather than hard-coded names Improve built-in facility to catch broken &&-chain in the tests. * hn/highlight-sideband-keywords (2018-08-18) 2 commits (merged to 'next' on 2018-08-20 at ec6f953f8c) + sideband: do not read beyond the end of input (merged to 'next' on 2018-08-15 at f8945f3be5) + sideband: highlight keywords in remote sideband output The sideband code learned to optionally paint selected keywords at the beginning of incoming lines on the receiving end. 
* jc/gpg-status (2018-08-09) 1 commit (merged to 'next' on 2018-08-15 at 824781761a) + gpg-interface: propagate exit status from gpg back to the callers "git verify-tag" and "git verify-commit" have been taught to use the exit status of underlying "gpg --verify" to signal bad or untrusted signature they found. * jc/update-index-doc (2018-08-08) 1 commit (merged to 'next' on 2018-08-15 at 055994ccca) + update-index: there no longer is `apply --index-info` Doc update. * jh/partial-clone-doc (2018-08-15) 1 commit (merged to 'next' on 2018-08-15 at cf09e8be6a) + partial-clone: render design doc using asciidoc Doc updates. * jk/for-each-object-iteration (2018-08-14) 11 commits (merged to 'next' on 2018-08-15 at e2558810ff) + for_each_*_object: move declarations to object-store.h + cat-file: use a single strbuf for all output + cat-file: split batch "buf" into two
Re: [PATCH] submodule: add config for where gitdirs are located
Brandon Williams writes: > Introduce the config "submodule.<name>.gitdirpath" which is used to > indicate where a submodule's gitdir is located inside of a repository's > "modules" directory. > > Signed-off-by: Brandon Williams > --- > > Maybe something like this on top? Do you think we should disallow > "../" in this config, even though it is a repository local > configuration and not shipped in .gitmodules? Sounds sensible to start strict and loosen later if/when necessary. If we disallow "../", shouldn't we also reject an absolute path (meaning, you can only specify a subdirectory of $GIT_DIR/modules/ in the super-project)?
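The strict check being discussed could reject both cases up front at config-read time. An illustrative sketch of the proposed rules only (Python stand-in, not Git's code; the `valid_gitdirpath` name and the exact rules are assumptions drawn from the discussion above):

```python
import posixpath

def valid_gitdirpath(value):
    # Proposed rules: the value must stay inside $GIT_DIR/modules/,
    # so reject absolute paths and anything escaping via "..".
    if posixpath.isabs(value):
        return False
    norm = posixpath.normpath(value)
    return norm != ".." and not norm.startswith("../")

print(valid_gitdirpath("sub/x"))       # True
print(valid_gitdirpath("/tmp/x"))      # False
print(valid_gitdirpath("a/../../x"))   # False
```

Note that normalizing before checking is what catches sneaky values like "a/../../x", which contain "../" only after path simplification.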
Re: [PATCH 8/9] midx: test a few commands that use get_all_packs
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee wrote:
> > The new get_all_packs() method exposed the packfiles coverede by

s/coverede/covered/
Re: [PATCH 7/9] treewide: use get_all_packs
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee wrote: > > There are many places in the codebase that want to iterate over > all packfiles known to Git. The purposes are wide-ranging, and > those that can take advantage of the multi-pack-index already > do. So, use get_all_packs() instead of get_packed_git() to be > sure we are iterating over all packfiles.

So get_packed_git should not be used any further? Do we want to annotate it as deprecated, to be deleted in the future? Or even remove it as part of this series (that might be an issue with other series in flight).
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
On Mon, Aug 20, 2018 at 02:03:53PM -0700, Junio C Hamano wrote: > > So taking all of those options into account, what I ended up > > with is a separate list of "external bases" that are not > > part of the main packing list. Each delta entry that points > > to an external base has a single-bit flag to do so; we have a > > little breathing room in the bitfield section of > > object_entry. > > > > This lets us limit the change primarily to the oe_delta() > > and oe_set_delta_ext() functions. And as a bonus, most of > > the rest of the code does not consider these dummy entries > > at all, saving both runtime CPU and code complexity. > > Tricky ;-) > > I wonder if we can move the preferred base objects that we are not > going to send also off of the "main packing list" to this new > mechanism?

That gets trickier. Obviously we don't need to actually send them, so they're ignored during the write phase. But we do put them into the delta-search window along with the other objects, and we care about things like their size and type (which our dummy objects do not even have; they really are just placeholders for the oids). So I'd guess that we'd end up with three arrays of entries:

 - objects we want to send, with full details including delta bases
 - preferred base objects with _some_ details
 - dummy thin objects for reused deltas

The trick is that I don't know which details fall under the "some" in the second class. That could be extracted from a careful reading of the code, but it seems like it has a high chance of regression (and I'm not sure there's a huge upside, given that the code is already written as-is). I also worried about the same thing with these new reused dummy objects, but I think by now I'd have shaken out any problems in production use.

> > +static struct bitmap_index *bitmap_git; > ... > +static int thin = 0; > Please trust what BSS will do to your static vars.

Heh. I copied the non-static declaration, but didn't think about shortening it.
Looks like there were one or two minor comments, so I'll do a re-roll that addresses them. -Peff
Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out
On Tue, 14 Aug 2018 13:36:17 -0700 Junio C Hamano wrote: > Antonio Ospite writes: > > > /* Equivalent to ACTION_SET in builtin/config.c */ > > - if (argc == 3) > > + if (argc == 3) { > > + struct object_id oid; > > + > > + /* > > + * If the .gitmodules file is not in the working tree but it > > + * is in the current branch, stop, as writing new values (and > > + * staging them) would blindly overwrite ALL the old content. > > Hmph, "the file is missing" certainly is a condition we would want > to notice, but wouldn't we in general want to prevent us from > overwriting any local modification, where "missing" is merely a very > special case of local modification? I am wondering if we would want > to stop if .gitmodules file exists both in the working tree and in > the index, and the contents of them differ, or something like that. > TTBOMK checking the index status (with something like is_staging_gitmodules_ok()) when the .gitmodules file *exists* in the working tree would break calling "git submodule add" multiple times before committing the changes. > > diff --git a/git-submodule.sh b/git-submodule.sh > > index ff258e2e8c..b1cb187227 100755 > > --- a/git-submodule.sh > > +++ b/git-submodule.sh > > @@ -159,6 +159,13 @@ cmd_add() > > shift > > done > > > > + # For more details about this check, see > > + # builtin/submodule--helper.c::module_config() > > + if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1 > > No SP between redirection '>' and its target '/dev/null'. > > More importantly, I think it is better to add a submodule--helper > subcommand that exposes the check in question, as the code is > already written ;-) That approach will guarantee that the logic and > the message stay the same between here and in the C code. Then you > do not even need these two line comment. > Yeah I anticipated this concern in the patch annotation, but I was hoping that it would be OK to have this as a followup change.
I guess I can do it for v4 instead. Does the interface suggested in the patch annotation sound acceptable? To recap:

- add an is_gitmodules_safely_writeable() helper;
- expose a "submodule--helper config --is-safely-writeable" subcommand for git-submodule.sh to use.

[...] > > @@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct > > repository *repo) > > static void config_from_gitmodules(config_fn_t fn, struct repository > > *repo, void *data) > > { > > if (repo->worktree) { > > - char *file = repo_worktree_path(repo, GITMODULES_FILE); > > - git_config_from_file(fn, file, data); > > + struct git_config_source config_source = { 0 }; > > + const struct config_options opts = { 0 }; > > + struct object_id oid; > > + char *file; > > + > > + file = repo_worktree_path(repo, GITMODULES_FILE); > > + if (file_exists(file)) > > + config_source.file = file; > > + else if (get_oid(GITMODULES_HEAD, &oid) >= 0) > > + config_source.blob = GITMODULES_HEAD; > > What is the reason why we fall back directly to HEAD when working > tree file does not exist? I thought that our usual fallback was to > the version in the index for other things like .gitignore/attribute > and this codepath looks like an oddball. Are you trying to handle > the case where we are in a bare repository without any file checked > out (and there is not even the index)?

My use case is about *reading* .gitmodules when it's ignored in a sparse checkout, in this scenario there are usually no staged changes to .gitmodules, so I basically just didn't care about the index. Would using ":.gitmodules" instead of "HEAD:.gitmodules" be enough? By reading man gitrevisions(7) and after a quick test with "git cat-file blob :.gitmodules" it looks like this would be more in line with your suggestion, still covering my use case. If so, what name should I use instead of GITMODULES_HEAD? GITMODULES_BLOB is already taken for something different, maybe GITMODULES_REF or GITMODULES_OBJECT?
> > diff --git a/t/t7416-submodule-sparse-gitmodules.sh > > b/t/t7416-submodule-sparse-gitmodules.sh > > new file mode 100755 > > index 00..5341e9b012 > > --- /dev/null > > +++ b/t/t7416-submodule-sparse-gitmodules.sh > > @@ -0,0 +1,90 @@ > > +#!/bin/sh > > +# > > +# Copyright (C) 2018 Antonio Ospite > > +# > > + > > +test_description='Test reading/writing .gitmodules when not in the working > > tree > > + > > +This test verifies that, when .gitmodules is in the current branch but is > > not > > +in the working tree reading from it still works but writing to it does not. > > + > > +The test setup uses a sparse checkout, however the same scenario can be > > set up > > +also by committing .gitmodules and then just removing it from the > > filesystem. > > +' > > + > > +. ./test-lib.sh > > +
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
Stefan Beller writes: >> I find the design a bit iffy in that our usual "missing in the >> working tree? let's use the latest blob" fallback is to take it >> from the index, not from the HEAD. > > I am not sure; why does it feel iffy? Why doesn't it? There is no justification why it has to be different from others and read from HEAD bypassing index.
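For what it's worth, the difference between the two fallbacks is easy to demonstrate with plumbing; a throwaway-repo sketch (assumes a reasonably recent git on PATH):

```shell
cd "$(mktemp -d)" && git init -q
printf 'committed\n' >.gitmodules
git add .gitmodules
git -c user.name=t -c user.email=t@example.com commit -qm 'add .gitmodules'
printf 'staged\n' >.gitmodules
git add .gitmodules
rm .gitmodules                      # now missing from the working tree

git cat-file blob :.gitmodules      # index version, prints "staged"
git cat-file blob HEAD:.gitmodules  # HEAD version, prints "committed"
```

Reading ":.gitmodules" honors staged-but-uncommitted content, matching the usual .gitignore/.gitattributes fallback order; reading "HEAD:.gitmodules" silently ignores it.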
Re: [PATCH 3/9] midx: mark bad packed objects
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee wrote: > > When an object fails to decompress from a pack-file, we mark the object > as 'bad' so we can retry with a different copy of the object (if such a > copy exists). > > Before now, the multi-pack-index did not update the bad objects list for > the pack-files it contains, and we did not check the bad objects list > when reading an object. Now, do both.

This sounds like a bug fix unlike patches 1 & 2 that sound like feature work(2) or making code more readable(1). (After studying the code, this doesn't sound like a bug fix any more, but a safety thing) Is it worth having this on a separate track coming in faster than the rest of this series?

> > Signed-off-by: Derrick Stolee > --- > midx.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/midx.c b/midx.c > index 6824acf5f8..7fa75a37a3 100644 > --- a/midx.c > +++ b/midx.c > @@ -280,6 +280,16 @@ static int nth_midxed_pack_entry(struct multi_pack_index > *m, struct pack_entry * > if (!is_pack_valid(p)) > return 0; > > + if (p->num_bad_objects) { > + uint32_t i;

Is there a reason that i needs to be of 32 bits? Would size_t (for iterating) or 'int' (as a default like in many for loops) be ok, too?
Re: [PATCH 2/9] multi-pack-index: store local property
On Mon, Aug 20, 2018 at 9:52 AM Derrick Stolee wrote: > > A pack-file is 'local' if it is stored within the usual object > directory. If it is stored in an alternate, it is non-local. > > Pack-files are stored using a 'pack_local' member in the packed_git > struct. Add a similar 'local' member to the multi_pack_index struct > and 'local' parameters to the methods that load and prepare multi- > pack-indexes. Going by that example, maybe we'd want to have it be the first bit of a bitfield, c.f. unsigned pack_local:1, pack_keep:1, pack_keep_in_core:1, freshened:1, do_not_close:1, pack_promisor:1; in the definition of packed_git. Is there value in documenting both packfiles as well as midx variables? (When going with the bitfield example, it may be a bit more worthwhile, as the commit message is harder to find, as it will need repeated blaming in the future, if there is a commit on top that adds another bit to the field; I am unsure, but I would lean towards documentation as it becomes a bit unclear: Is the local flag for the midx file that is read, or rather for the underlying packs? What exactly is local? ) AFAICT this patch alone doesn't have any effect, yet, as it only pipes the flag thru, is that worth mentioning or is that obvious? Thanks, Stefan
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
Jeff King writes: > And that's exactly what this patch does: when we're > considering whether to reuse an on-disk delta, if bitmaps > tell us the other side has the object (and we're making a > thin-pack), then we reuse it. That's really the natural extension and logical consequence of the "reuse existing deltas" mechanism from Feb 2006 ;-) > So taking all of those options into account, what I ended up > with is a separate list of "external bases" that are not > part of the main packing list. Each delta entry that points > to an external base has a single-bit flag to do so; we have a > little breathing room in the bitfield section of > object_entry. > > This lets us limit the change primarily to the oe_delta() > and oe_set_delta_ext() functions. And as a bonus, most of > the rest of the code does not consider these dummy entries > at all, saving both runtime CPU and code complexity. Tricky ;-) I wonder if we can move the preferred base objects that we are not going to send also off of the "main packing list" to this new mechanism? > +static struct bitmap_index *bitmap_git; > ... > +static int thin = 0; Please trust what BSS will do to your static vars.
[RFC] Git enumerations
Hi list,

I'm a co-author of the git-dit tool [1] discussed in the thread "git-bug: Distributed bug tracker embedded in git" on this mailing list. Partly motivated by Jonathan Nieder's call for more collaboration, I'd like to request comments on a functionality which I originally planned on introducing as a git-dit internal feature.

# Problem statement:

Especially with tools like git-dit or git-bug, we often list OIDs, usually of commits, and additional data. For example, `git dit list -a` may print something like the following (`-a` is for abbreviated ids):

8640a6b (Mon Dec 19 17:11:14 2016) Implement "subscribe" functionality
ae058fd (Fri Dec 16 10:35:40 2016) Add features to dit-{list,show} to show statuses

For viewing the details of the first issue, you would invoke the "show" subcommand with the respective (possibly abbreviated) id. You generally either have to re-type or copy this id. The first option tends to get on one's nerves, since the IDs have to be unambiguous for the repository. The second may disturb your workflow (since you have to grab your mouse) or may not be possible (e.g. plain TTY). So it would be nice if you could just identify them via a plain number.

For the output of `git log --oneline --decorate` and similar commands, we already have something similar: each commit shown can generally be addressed via some-ref~some-number. So you can, more or less, inspect those without copying IDs. For listing issues with git-dit, it's not that easy, since the issues are generally not connected.

# Solution using symbolic refs

When printing a list of disconnected objects addressable through OIDs, we would generate an enumeration on the fly. For each entry, we then store a symbolic ref reflecting the enumeration in some namespace (e.g. "refs/enumeration/1", "refs/enumeration/2", ...). Obviously, each symbolic ref would point to the respective entry's OID. Essentially, we have persisted the list printed by the first command.
Other subcommands of the tool would contain some minimal logic for detecting and choosing such an enumeration (e.g. "2") in the place of an OID. This may have additional benefits for some tools, since we can also review a previously printed list. This is especially interesting if generating the list is costly.

# Additional tweaks

Instead of plain numbers, the symbolic refs could also contain an optional prefix. `git dit list`, for example, could generate enumerations like "i1", "i2", ... while `git dit show` (which prints "messages") would generate "m1", "m2", "m3", ... This would support workflows like:

 * view the list of open issues
 * inspect an issue
 * show that (exact) list of issues again
 * inspect the next, ...

Alternatively, one could start using namespaces within "refs/enumeration", but imo this would hurt the usability in most cases.

# Integration

As stated initially, I originally planned on introducing this as an internal feature to git-dit. However, I realized that other tools may also benefit from such enumerations. Hence, I'm thinking about providing it via a third-party library as well as a minimal tool/subcommand for managing (e.g. clearing) enumerations.

Regards,
Julian

[1] https://github.com/neithernut/git-dit
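The scheme could be prototyped today with plain plumbing. A sketch (using ordinary refs via update-ref rather than symbolic refs, since a symbolic ref must point at another ref and not directly at an OID; the OID source stands in for whatever a tool's list subcommand emits):

```shell
# persist each OID read from stdin as refs/enumeration/<prefix><n>
enumerate () {
	prefix=$1
	n=1
	while read -r oid; do
		git update-ref "refs/enumeration/$prefix$n" "$oid"
		n=$((n + 1))
	done
}

# resolve an enumeration token such as "i2" back to an OID
resolve () {
	git rev-parse --verify "refs/enumeration/$1"
}
```

For example, `git rev-list --all | enumerate i` persists the list, after which `git show "$(resolve i2)"` inspects the second entry without re-typing or copying an id.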
Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities
On 8/20/2018 3:37 PM, Stefan Beller wrote: On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee wrote: One unresolved issue with the commit-graph feature is that it can cause issues when combined with replace objects, commit grafts, or shallow clones. These are not 100% incompatible, as one could be reasonably successful writing a commit-graph after replacing some objects and not have issues. The problems happen when commits that are already in the commit-graph file are replaced, or when git is run with the `--no-replace-objects` option; this can cause incorrect parents or incorrect generation numbers. Similar things occur with commit grafts and shallow clones, especially when running `git fetch --unshallow` in a shallow repo. Instead of trying (and probably failing) to make these features work together, default to making the commit-graph feature unavailable in these situations. Create a new method 'commit_graph_compatible(r)' that checks if the repository 'r' has any of these features enabled. CHANGES IN V2: * The first two commits regarding the ref iterators are unchanged, despite a lot of discussion on the subject [1]. * I included Peff's changes in jk/core-use-replace-refs, changing the base commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge that brought that topic into 'master'). * I fixed the tests for the interactions with the graft feature. Because of the change of base, it is hard to provide a side-by-side diff from v1. Thanks, -Stolee [1] https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/ Stefan's response recommending we keep the first two commits.

After reviewing my own patches, I flipped again (Sorry!) and would rather not see my patches be merged, but the very original solution by you, that you proposed at [1]. That said, I will not insist on it, and if this is merged as is, we can fix it up later.
With that said, I just read through the remaining patches, I think they are a valuable addition to Git and could be merged as-is. [1] https://github.com/gitgitgadget/git/pull/11/commits/300db80140dacc927db0d46c804ca0ef4dcc1be1 Thanks, Stefan Just to keep things on-list as possible, here is the patch for the commit linked above. It would replace patches 1 & 2 from this series. Junio: I can send a v3 that includes this commit if you need it, or if there are other edits to be made in this series. commit 300db80140dacc927db0d46c804ca0ef4dcc1be1 Author: Derrick Stolee Date: Fri Jul 20 15:39:06 2018 -0400 replace-objects: use arbitrary repositories This is the smallest possible change that makes prepare_replace_objects work properly with arbitrary repositories. By supplying the repository as the cb_data, we do not need to modify any code in the ref iterator logic. We will likely want to do a full replacement of the ref iterator logic to provide a repository struct as a concrete parameter. Signed-off-by: Derrick Stolee diff --git a/replace-object.c b/replace-object.c index 801b5c1678..e99fcd1ff6 100644 --- a/replace-object.c +++ b/replace-object.c @@ -14,6 +14,7 @@ static int register_replace_ref(const char *refname, const char *slash = strrchr(refname, '/'); const char *hash = slash ? 
slash + 1 : refname; struct replace_object *repl_obj = xmalloc(sizeof(*repl_obj)); + struct repository *r = (struct repository *)cb_data; if (get_oid_hex(hash, &repl_obj->original.oid)) { free(repl_obj); @@ -25,7 +26,7 @@ static int register_replace_ref(const char *refname, oidcpy(&repl_obj->replacement, oid); /* Register new object */ - if (oidmap_put(the_repository->objects->replace_map, repl_obj)) + if (oidmap_put(r->objects->replace_map, repl_obj)) die("duplicate replace ref: %s", refname); return 0; @@ -40,7 +41,7 @@ static void prepare_replace_object(struct repository *r) xmalloc(sizeof(*r->objects->replace_map)); oidmap_init(r->objects->replace_map, 0); - for_each_replace_ref(r, register_replace_ref, NULL); + for_each_replace_ref(r, register_replace_ref, r); } /* We allow "recursive" replacement. Only within reason, though */
Re: [PATCH 3/7] submodule: is_submodule_active to differentiate between new and old mode
On Thu, Aug 16, 2018 at 10:37 AM Junio C Hamano wrote: > > Stefan Beller writes: > > > The change a086f921a72 (submodule: decouple url and submodule interest, > > 2017-03-17) enables us to do more than originally thought. > > As the url setting was used both to actually set the url where to > > obtain the submodule from, as well as used as a boolean flag later > > to see if it was active, we would need to keep the url around. > > > > Now that submodules can be activated using the submodule.[.]active > > setting, we could remove the url if the submodule is activated via that > > setting. > > ... as the cloned submodule repository has $GIT_DIR/config which > knows its own origin, we do not need to record URL in superproject's > $GIT_DIR/config. Back before we started using a directory under > $GIT_DIR/modules/ as a more permanent location to store submodule, > and point at it by a gitdir file in $path/.git to allow removal of a > submodule from the working tree of the superproject without having > to reclone when resurrecting the same submodule, it may have been > useful to keep custom URL somewhere in the superproject, but that no > longer is the case. > > > In preparation to do so, pave the way by providing an easy way to see > > if a submodule is considered active via the new .active setting or via > > the old .url setting. > > Makes sense. 
> > > > > Signed-off-by: Stefan Beller > > --- > > submodule.c | 5 + > > submodule.h | 6 ++ > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/submodule.c b/submodule.c > > index 6e14547e9e0..d56350ed094 100644 > > --- a/submodule.c > > +++ b/submodule.c > > @@ -221,9 +221,6 @@ int > > option_parse_recurse_submodules_worktree_updater(const struct option *opt, > > return 0; > > } > > > > -/* > > - * Determine if a submodule has been initialized at a given 'path' > > - */ > > int is_submodule_active(struct repository *repo, const char *path) > > { > > int ret = 0; > > @@ -267,7 +264,7 @@ int is_submodule_active(struct repository *repo, const > > char *path) > > > > /* fallback to checking if the URL is set */ > > key = xstrfmt("submodule.%s.url", module->name); > > - ret = !repo_config_get_string(repo, key, &value); > > + ret = !repo_config_get_string(repo, key, &value) ? 2 : 0; > > > > free(value); > > free(key); > > diff --git a/submodule.h b/submodule.h > > index 4644683e6cb..bfc070e4629 100644 > > --- a/submodule.h > > +++ b/submodule.h > > @@ -45,6 +45,12 @@ extern int git_default_submodule_config(const char *var, > > const char *value, void > > struct option; > > int option_parse_recurse_submodules_worktree_updater(const struct option > > *opt, > > const char *arg, int > > unset); > > +/* > > + * Determine if a submodule has been initialized at a given 'path'. > > + * Returns 1 if it is considered active via the submodule.[.]active > > + * setting, or return 2 if it is active via the older submodule.url > > setting. > > + */ > > +#define SUBMODULE_ACTIVE_VIA_URL 2 > > extern int is_submodule_active(struct repository *repo, const char *path); > > /* > > * Determine if a submodule has been populated at a given 'path' by > > checking if > > This change assumes that all the existing return sites in the > is_submodule_active() function signal true with 1 (or at least some > value that is different from 2).

Yes.
> So I think this patch is insufficient, and needs to at least change > the "submodule.active" codepath to return !!ret; otherwise, a caller > that now expects 0 (not active), 1 (active but can lose URL) and 2 > (active and the presence of URL makes it so) will be confused when > one of the MATCHED_* constants from dir.h comes back.

Yes. I'll resend when appropriate.

Thanks, Stefan
Re: [PATCH 7/7] submodule--helper: introduce new update-module-mode helper
On Sat, Aug 18, 2018 at 9:11 AM Duy Nguyen wrote: > > On Tue, Aug 14, 2018 at 12:45 AM Stefan Beller wrote: > > +static int module_update_module_mode(int argc, const char **argv, const > > char *prefix) > > +{ > > + const char *path, *update = NULL; > > + int just_cloned; > > + struct submodule_update_strategy update_strategy = { .type = > > SM_UPDATE_CHECKOUT }; > > + > > + if (argc < 3 || argc > 4) > > + die("submodule--helper update-module-clone expects > > []"); > > Maybe _() ? I would rather not, as the submodule--helper is "internal only" and these die() calls could be clarified via #define BUG_IN_CALLING_SH(x) die(x) After the conversion to C is done, all these submodule helpers would go away, so I'd not burden the translators too much? Thanks, Stefan
Re: [PATCH v7 09/16] fetch: support filters
On 8/19/2018 7:24 AM, Duy Nguyen wrote: On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler wrote: From: Jeff Hostetler Teach fetch to support filters. This is only allowed for the remote configured in extensions.partialcloneremote. Signed-off-by: Jonathan Tan --- builtin/fetch.c | 23 +-- connected.c | 2 ++ remote-curl.c | 6 ++ t/t5500-fetch-pack.sh | 36 4 files changed, 65 insertions(+), 2 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 1b1f039..14aab71 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -18,6 +18,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { N_("git fetch [] [ [...]]"), @@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND; static int shown_url = 0; static int refmap_alloc, refmap_nr; static const char **refmap_array; +static struct list_objects_filter_options filter_options; static int git_fetch_config(const char *k, const char *v, void *cb) { @@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = { TRANSPORT_FAMILY_IPV4), OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"), TRANSPORT_FAMILY_IPV6), + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), Documentation is missing. Please add something to git-fetch.txt (or fetch-options.txt) about this option. I would make a patch but I don't know enough about this to write and I'm in the middle of something else.

Documentation for --filter= (and --no-filter) was added to rev-list-options.txt. The opening paragraph talks about --objects which would need to be omitted for fetch and clone, but the rest of that text is relevant. I'll push up a patch shortly.

Thanks Jeff
Re: [PATCH 1/3] diff.c: add --output-indicator-{new, old, context}
On Mon, Aug 20, 2018 at 12:31 PM Johannes Schindelin wrote: > > Hi Stefan, > > On Fri, 17 Aug 2018, Stefan Beller wrote: > > > This will prove useful in range-diff in a later patch as we will be able > > to differentiate between adding a new file (that line is starting with > > +++ and then the file name) and regular new lines. > > > > It could also be useful for experimentation in new patch formats, i.e. > > we could teach git to emit moved lines with lines other than +/-. > > Thanks. > > > diff --git a/diff.c b/diff.c > > index c5c7739ce34..03486c35b75 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct > > diff_options *o, > > else if (c == '-') > > set = diff_get_color_opt(o, DIFF_FILE_OLD); > > } > > - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len, > ^ > Here we already pass `o`... so... > > > + emit_line_ws_markup(o, set_sign, set, reset, > > + > > o->output_indicators[OUTPUT_INDICATOR_CONTEXT], > > ^^^ > ... here, we could simply pass `OUTPUT_INDICATOR_CONTEXT` and let the > callee look it up in `o->output_indicators[]`... > > I read all three patches and did not see a reason why we could not > simplify the code that way. > > Other than that: great!

Thanks! I considered it, but was put off by the (small) effort of yet another diff refactoring. I'll include it in a resend if a resend is needed, otherwise I would suggest to make it a patch on top?

Thanks, Stefan
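For readers following along, the new options can be exercised directly once a git that contains this series is installed; a throwaway-repo sketch:

```shell
cd "$(mktemp -d)" && git init -q
printf 'a\n' >f && git add f
git -c user.name=t -c user.email=t@example.com commit -qm one
printf 'a\nb\n' >f

# added lines are now prefixed with '>' instead of '+'
git diff --output-indicator-new='>' -- f
```

Swapping in --output-indicator-old or --output-indicator-context changes the '-' and ' ' markers the same way.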
Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities
On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee wrote: > > One unresolved issue with the commit-graph feature is that it can cause > issues when combined with replace objects, commit grafts, or shallow > clones. These are not 100% incompatible, as one could be reasonably > successful writing a commit-graph after replacing some objects and not > have issues. The problems happen when commits that are already in the > commit-graph file are replaced, or when git is run with the > `--no-replace-objects` option; this can cause incorrect parents or > incorrect generation numbers. Similar things occur with commit grafts > and shallow clones, especially when running `git fetch --unshallow` in a > shallow repo. > > Instead of trying (and probably failing) to make these features work > together, default to making the commit-graph feature unavailable in these > situations. Create a new method 'commit_graph_compatible(r)' that checks > if the repository 'r' has any of these features enabled. > > CHANGES IN V2: > > * The first two commits regarding the ref iterators are unchanged, despite > a lot of discussion on the subject [1]. > > * I included Peff's changes in jk/core-use-replace-refs, changing the base > commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge > that brought that topic into 'master'). > > * I fixed the tests for the interactions with the graft feature. > > Because of the change of base, it is hard to provide a side-by-side diff > from v1. > > Thanks, > -Stolee > > [1] > https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/ > Stefan's response recommending we keep the first two commits.

After reviewing my own patches, I flipped again (Sorry!) and would rather not see my patches be merged, but the very original solution by you, that you proposed at [1]. That said, I will not insist on it, and if this is merged as is, we can fix it up later.
With that said, I just read through the remaining patches, I think they are a valuable addition to Git and could be merged as-is. [1] https://github.com/gitgitgadget/git/pull/11/commits/300db80140dacc927db0d46c804ca0ef4dcc1be1 Thanks, Stefan
pw/rebase-i-author-script-fix, was Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
Team,

On Mon, 20 Aug 2018, Phillip Wood wrote: > On 17/08/2018 23:44, Junio C Hamano wrote: > > Here are the topics that have been cooking. Commits prefixed with > > '-' are only in 'pu' (proposed updates) while commits prefixed with > > '+' are in 'next'. The ones marked with '.' do not appear in any of > > the integration branches, but I am still holding onto them. > > > * pw/rebase-i-author-script-fix (2018-08-07) 2 commits > > - sequencer: fix quoting in write_author_script > > - sequencer: handle errors from read_author_ident() > > > > Recent "git rebase -i" update started to write bogusly formatted > > author-script, with a matching broken reading code. These are > > being fixed. > > > > Undecided. > > Is it the list consensus to favor this "with extra code, read the > > script written by bad writer" approach? > > I think there was agreement between myself and Eric on the last version, > I'm not sure anyone else has expressed an opinion. The problem with > fixing the quoting without any backwards compatibility is that if git is > upgraded while a rebase is stopped, read_author_script() will happily use > the broken quoting to create a corrupted author name in the new commit > if the name contains "'". > > The compatibility code in the latest version relies on the missing "'" > at the end of the GIT_AUTHOR_DATE line which is fixed by > es/rebase-i-author-script-fix which is now in master. If there is a > release with es/rebase-i-author-script-fix but not > pw/rebase-i-author-script-fix we'll have to rethink as the detection > won't be reliable. I have a branch that fixes read_author_script() to use > sq_dequote() at > https://github.com/phillipwood/git/commits/wip/fix-author-script. At the > moment it has compatibility with broken quoting, but I could strip that > out and then sq_dequote() will return an error with the broken quoting > and the user would have to restart the rebase.
> So one option is to drop
> this series and wait for me to finish the improved solution next month.

Having thought about it, I am now convinced that it would be overkill to cater to "upgrade in the middle of a rebase". I think we should drop that part, as it uglifies the code rather a lot, and the impact is not worth the effort IMHO.

Ciao, Johannes
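For context, the quoting rule this thread is about is the standard POSIX single-quote escape; a standalone sketch (not the sequencer's actual code) of writing a value so that eval reads it back verbatim:

```shell
# sq_quote VALUE — emit VALUE wrapped in single quotes, with every
# embedded single quote turned into the '\'' escape sequence
sq_quote () {
	printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

sq_quote "O'Brien"    # prints 'O'\''Brien'
```

A line written this way survives a round trip, e.g. `eval "GIT_AUTHOR_NAME=$(sq_quote "O'Brien")"` restores the original name; the bug under discussion wrote quoting that did not follow this rule, so names containing "'" were corrupted when the author-script was read back.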
Re: [PATCH 1/3] diff.c: add --output-indicator-{new, old, context}
Hi Stefan, On Fri, 17 Aug 2018, Stefan Beller wrote: > This will prove useful in range-diff in a later patch as we will be able > to differentiate between adding a new file (that line is starting with > +++ and then the file name) and regular new lines. > > It could also be useful for experimentation in new patch formats, i.e. > we could teach git to emit moved lines with lines other than +/-. Thanks. > diff --git a/diff.c b/diff.c > index c5c7739ce34..03486c35b75 100644 > --- a/diff.c > +++ b/diff.c > @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct > diff_options *o, > else if (c == '-') > set = diff_get_color_opt(o, DIFF_FILE_OLD); > } > - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len, ^ Here we already pass `o`... so... > + emit_line_ws_markup(o, set_sign, set, reset, > + > o->output_indicators[OUTPUT_INDICATOR_CONTEXT], ^^^ ... here, we could simply pass `OUTPUT_INDICATOR_CONTEXT` and let the callee look it up in `o->output_indicators[]`... I read all three patches and did not see a reason why we could not simplify the code that way. Other than that: great!
Thank you, Dscho > + line, len, > flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0); > break; > case DIFF_SYMBOL_PLUS: > @@ -1324,7 +1326,9 @@ static void emit_diff_symbol_from_struct(struct > diff_options *o, > set = diff_get_color_opt(o, DIFF_CONTEXT_BOLD); > flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK; > } > - emit_line_ws_markup(o, set_sign, set, reset, '+', line, len, > + emit_line_ws_markup(o, set_sign, set, reset, > + o->output_indicators[OUTPUT_INDICATOR_NEW], > + line, len, > flags & DIFF_SYMBOL_CONTENT_WS_MASK, > flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF); > break; > @@ -1367,7 +1371,9 @@ static void emit_diff_symbol_from_struct(struct > diff_options *o, > else > set = diff_get_color_opt(o, DIFF_CONTEXT_DIM); > } > - emit_line_ws_markup(o, set_sign, set, reset, '-', line, len, > + emit_line_ws_markup(o, set_sign, set, reset, > + o->output_indicators[OUTPUT_INDICATOR_OLD], > + line, len, > flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0); > break; > case DIFF_SYMBOL_WORDS_PORCELAIN: > @@ -4382,6 +4388,9 @@ void diff_setup(struct diff_options *options) > > options->file = stdout; > > + options->output_indicators[OUTPUT_INDICATOR_NEW] = '+'; > + options->output_indicators[OUTPUT_INDICATOR_OLD] = '-'; > + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = ' '; > options->abbrev = DEFAULT_ABBREV; > options->line_termination = '\n'; > options->break_opt = -1; > @@ -4869,6 +4878,12 @@ int diff_opt_parse(struct diff_options *options, >options->output_format |= DIFF_FORMAT_DIFFSTAT; > } else if (!strcmp(arg, "--no-compact-summary")) >options->flags.stat_with_summary = 0; > + else if (skip_prefix(arg, "--output-indicator-new=", &arg)) > + options->output_indicators[OUTPUT_INDICATOR_NEW] = arg[0]; > + else if (skip_prefix(arg, "--output-indicator-old=", &arg)) > + options->output_indicators[OUTPUT_INDICATOR_OLD] = arg[0]; > + else if (skip_prefix(arg, "--output-indicator-context=", &arg)) > + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = arg[0]; > > /* renames options */ > else if
(starts_with(arg, "-B") || > diff --git a/diff.h b/diff.h > index e1e54256c18..d7edc64454a 100644 > --- a/diff.h > +++ b/diff.h > @@ -194,6 +194,11 @@ struct diff_options { > FILE *file; > int close_file; > > +#define OUTPUT_INDICATOR_NEW 0 > +#define OUTPUT_INDICATOR_OLD 1 > +#define OUTPUT_INDICATOR_CONTEXT 2 > + char output_indicators[3]; > + > struct pathspec pathspec; > pathchange_fn_t pathchange; > change_fn_t change; > -- > 2.18.0.265.g16de1b435c9.dirty > >
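For the curious, the options under discussion are easier to judge by example than by diff. A throwaway-repo sketch (repo name, file name, and identities invented; this assumes a git new enough to carry the patch, which did end up in git proper):

```shell
git init -q indicator-demo && cd indicator-demo
git config user.email you@example.com && git config user.name You
printf 'a\nb\n' >f && git add f && git commit -qm one
printf 'a\nc\n' >f
# 'N' and 'O' replace the usual '+' and '-' on content lines;
# the ---/+++ file headers are unaffected.
git diff --output-indicator-new=N --output-indicator-old=O -- f
```

The content lines come out as ` a`, `Ob`, and `Nc` instead of ` a`, `-b`, and `+c`, which is exactly what lets range-diff tell a `+++` filename header apart from an added line.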
Re: [PATCH/RFC] commit: new option to abort -a when something is already staged
Hi, Nguyễn Thái Ngọc Duy wrote: > So many times I have carefully cherry picked changes to the index with > `git add -p` then accidentally did `git commit -am <message>` (usually by > retrieving a command from history and pressing Enter too quickly) > which destroyed the beautiful index. > > One way to deal with this is some form of `git undo` that allows me to > retrieve the old index. Yes, I would love such an undo feature! How would you imagine that this information would get stored? We can start with adding that and a low-level (plumbing) interface to it, to avoid being blocked on getting the user-facing (porcelain) "git undo" interface right. (Or we can go straight for the porcelain. It's fine for it to start minimal and gain switches later.) [...] > Instead, let's handle just this problem for now. This new option > commit.preciousDirtyIndex, if set to false, will not allow `commit -a` > to continue if the final index is different from the existing one. I > don't think this can be achieved with hooks because the hooks don't > know about "-a" or different commit modes. I frequently use "git commit -a" this way intentionally, so I would be unlikely to turn this config on. That leads me to suspect it's not a good candidate for configuration: - it's not configuration for the sake of a transition period, since some people would keep it on forever - it's not configuration based on different project needs, either So configuration doesn't feel like a good fit. It might be that I can switch my muscle memory to "git add -u && git commit", in which case this could work as a new default without configuration (or with configuration for a transition period). A separate commandline option "git commit -a --no-clobber-index" might make sense as well, since then I could pass --clobber-index to keep my current workflow. That would also follow the usual progression: introduce commandline option first, then config, then change default.
That said, I lean toward your initial thought, that this is papering over a missing undo feature. Can you say more about how you'd imagine undo working? Thanks, Jonathan
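For concreteness, the accident Duy describes is easy to reproduce in a throwaway repo (paths, messages, and identities invented for the demo):

```shell
git init -q clobber-demo && cd clobber-demo
git config user.email you@example.com && git config user.name You
printf '1\n' >file && git add file && git commit -qm base
printf '1\n2\n' >file && git add file   # carefully staged, think `git add -p`
printf '1\n2\n3\n' >file                # a further edit, meant to stay unstaged
git commit -aqm oops                    # -a restages everything first...
git show HEAD:file                      # ...so the unstaged '3' is committed too
```

The carefully prepared split between staged "2" and unstaged "3" is silently gone after `commit -a`.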
Re: Do not raise conflict when a code in a patch was already added
On 20.08.2018 19:40, Phillip Wood wrote: On 20/08/2018 11:22, Konstantin Kharlamov wrote: It's spectacular that the content of one of the inserted conflict sections is empty, so all you have to do is remove the markers, use `git add` on the file, and then `git rebase --continue`. It's a lot of unnecessary actions; git could just figure out that the code it sees in the patch is already there, being part of another commit. If there are conflict markers where one side is empty it means that some lines from the merge base (which for a rebase is the parent of the commit being picked) have been deleted on one side and modified on the other. Git cannot know if you want to use the deleted version or the modified version. There's another possibility (and I think it is what actually happens in Konstantin's case): When one side added lines 1 2 and the other side added 1 2 3, then the actual conflict is << 1 2 == 1 2 3 >>, but our merge code is able to move the identical part out of the conflicted section: 1 2 << == 3 >>. But this is just a courtesy for the user; the real conflict is the original one. Without this optimization, the work to resolve the conflict would be slightly more arduous. -- Hannes
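Hannes's scenario reproduces readily in a throwaway repo (layout and identities invented for the demo); note the empty side between the conflict markers in the result:

```shell
git init -q zealous-demo && cd zealous-demo
git config user.email you@example.com && git config user.name You
main=$(git symbolic-ref --short HEAD)
printf '0\n' >f && git add f && git commit -qm base
git checkout -qb side && printf '0\n1\n2\n' >f && git commit -aqm side
git checkout -q "$main" && printf '0\n1\n2\n3\n' >f && git commit -aqm more
git merge side >/dev/null 2>&1 || true  # conflicts, as expected
cat f   # the common "1 2" is shifted out; "3" faces an empty other side
```

The underlying conflict is "1 2 3" vs "1 2"; the zealous merge refinement moves the shared "1 2" out of the markers, leaving exactly the empty-side conflict Konstantin ran into.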
Re: [PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities
On Mon, Aug 20, 2018 at 11:24 AM Derrick Stolee wrote: > Because of the change of base, it is hard to provide a side-by-side diff > from v1. I thought that was the whole point of range-diff. ;-) I'll take a look. Thanks, Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore wrote: > > There were many instances in this file where it seemed like BUG would be > better, so I created a new commit before this one to switch them over. The > interdiff is below. > > BTW, why are there so many instances of "die" without "_"? I expect all > errors that may be caused by a user to be localized. Well, there is the porcelain layer to be consumed by a human user and the plumbing that is good for scripts. And in scripts you might want to grep for certain errors and react to that, so a non-localized error message makes it possible to run the script in any localisation. The BUG is strictly for things that are due to Git's internals, not for problematic user input. Problematic user input definitely wants a die(...), and depending on the plumbing/porcelain layer it may need to be _(translatable). I think BUG() would never go with translated strings. > I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t > > diff --git a/list-objects-filter.c b/list-objects-filter.c > index 8e3caf5bf..09b2b05d5 100644 > --- a/list-objects-filter.c > +++ b/list-objects-filter.c > @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none( > > switch (filter_situation) { > default: > - die("unknown filter_situation"); > - return LOFR_ZERO; > + BUG("unknown filter_situation: %d", filter_situation); > > case LOFS_BEGIN_TREE: > assert(obj->type == OBJ_TREE); > @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none( > > switch (filter_situation) { > default: > - die("unknown filter_situation"); > - return LOFR_ZERO; > + BUG("unknown filter_situation: %d", filter_situation); > > case LOFS_BEGIN_TREE: > case LOFS_BLOB: > @@ -151,8 +149,7 @@ static enum list_objects_filter_result > filter_blobs_limit( > > switch (filter_situation) { > default: > - die("unknown filter_situation"); > - return LOFR_ZERO; > + BUG("unknown filter_situation: %d", filter_situation); > > case LOFS_BEGIN_TREE:
> assert(obj->type == OBJ_TREE); > @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse( > > switch (filter_situation) { > default: > - die("unknown filter_situation"); > - return LOFR_ZERO; > + BUG("unknown filter_situation: %d", filter_situation); Up until here we have just replaced die by BUG in the default case of the state machine switch. (We need the default due to strict compile flags, but as filter_situation is an enum I thought we would not need it, as compilers are smart enough to see we have all values of the enum covered.) I agree that keeping the defaults and having a BUG() is reasonable. > > case LOFS_BEGIN_TREE: > assert(obj->type == OBJ_TREE); > @@ -439,7 +435,7 @@ void *list_objects_filter__init( > assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT); > > if (filter_options->choice >= LOFC__COUNT) > - die("invalid list-objects filter choice: %d", > + BUG("invalid list-objects filter choice: %d", > filter_options->choice); This also makes sense; combined with the assert before, this looks like really defensive code. I think this patch is a good idea! Thanks, Stefan
Re: [PATCH v3] checkout: optimize "git checkout -b <new_branch>"
Elijah Newren writes: > == The patch == > - Ben's patch only affects the "checkout -b $NEWBRANCH" case. He > checks for it by looking for any other flag that would be a different > case, and using the old codepath if he finds any. > - This means there is a "giant list of checks" for this optimization, > and an ongoing maintenance burden because if anyone ever adds any > extra options, this optimization might suddenly break things if that > giant list of checks isn't updated. Correct. Having said that, I do not mind leaving it in 'pu' for the rest of the cycle, possibly merging it to 'next' after that and keeping it there, to see how horrible a maintenance burden it would become in real life, though.
[PATCH v2 1/8] refs.c: migrate internal ref iteration to pass thru repository argument
From: Stefan Beller In 60ce76d3581 (refs: add repository argument to for_each_replace_ref, 2018-04-11) and 0d296c57aec (refs: allow for_each_replace_ref to handle arbitrary repositories, 2018-04-11), for_each_replace_ref learned how to iterate over refs by a given arbitrary repository. New attempts in the object store conversion have shown that it is useful to have the repository handle available that the refs iteration is currently iterating over. To achieve this goal we will need to add a repository argument to each_ref_fn in refs.h. However as many callers rely on the signature such a patch would be too large. So convert the internals of the ref subsystem first to pass through a repository argument without exposing the change to the user. Assume the_repository for the passed through repository, although it is not used anywhere yet. Signed-off-by: Stefan Beller Signed-off-by: Derrick Stolee --- refs.c | 39 +-- refs.h | 10 ++ refs/iterator.c | 6 +++--- refs/refs-internal.h | 5 +++-- 4 files changed, 53 insertions(+), 7 deletions(-) diff --git a/refs.c b/refs.c index 457fb78057..7cd76f72d2 100644 --- a/refs.c +++ b/refs.c @@ -1386,17 +1386,50 @@ struct ref_iterator *refs_ref_iterator_begin( * non-zero value, stop the iteration and return that value; * otherwise, return 0. 
*/ +static int do_for_each_repo_ref(struct repository *r, const char *prefix, + each_repo_ref_fn fn, int trim, int flags, + void *cb_data) +{ + struct ref_iterator *iter; + struct ref_store *refs = get_main_ref_store(r); + + if (!refs) + return 0; + + iter = refs_ref_iterator_begin(refs, prefix, trim, flags); + + return do_for_each_repo_ref_iterator(r, iter, fn, cb_data); +} + +struct do_for_each_ref_help { + each_ref_fn *fn; + void *cb_data; +}; + +static int do_for_each_ref_helper(struct repository *r, + const char *refname, + const struct object_id *oid, + int flags, + void *cb_data) +{ + struct do_for_each_ref_help *hp = cb_data; + + return hp->fn(refname, oid, flags, hp->cb_data); +} + static int do_for_each_ref(struct ref_store *refs, const char *prefix, each_ref_fn fn, int trim, int flags, void *cb_data) { struct ref_iterator *iter; + struct do_for_each_ref_help hp = { fn, cb_data }; if (!refs) return 0; iter = refs_ref_iterator_begin(refs, prefix, trim, flags); - return do_for_each_ref_iterator(iter, fn, cb_data); + return do_for_each_repo_ref_iterator(the_repository, iter, + do_for_each_ref_helper, &hp); } int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data) @@ -2025,10 +2058,12 @@ int refs_verify_refname_available(struct ref_store *refs, int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data) { struct ref_iterator *iter; + struct do_for_each_ref_help hp = { fn, cb_data }; iter = refs->be->reflog_iterator_begin(refs); - return do_for_each_ref_iterator(iter, fn, cb_data); + return do_for_each_repo_ref_iterator(the_repository, iter, +do_for_each_ref_helper, &hp); } int for_each_reflog(each_ref_fn fn, void *cb_data) diff --git a/refs.h b/refs.h index cc2fb4c68c..80eec8bbc6 100644 --- a/refs.h +++ b/refs.h @@ -274,6 +274,16 @@ struct ref_transaction; typedef int each_ref_fn(const char *refname, const struct object_id *oid, int flags, void *cb_data); +/* + * The same as each_ref_fn, but also with a repository argument
that + * contains the repository associated with the callback. + */ +typedef int each_repo_ref_fn(struct repository *r, +const char *refname, +const struct object_id *oid, +int flags, +void *cb_data); + /* * The following functions invoke the specified callback function for * each reference indicated. If the function ever returns a nonzero diff --git a/refs/iterator.c b/refs/iterator.c index 2ac91ac340..629e00a122 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -407,15 +407,15 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0, struct ref_iterator *current_ref_iter = NULL; -int do_for_each_ref_iterator(struct ref_iterator *iter, -each_ref_fn fn, void *cb_data) +int do_for_each_repo_ref_iterator(struct repository *r, struct ref_iterator *iter, + each_repo_ref_fn fn, void *cb_data) { int retval = 0, ok; struct ref_iterator *old_ref_iter = current_ref_iter;
[PATCH v2 8/8] commit-graph: close_commit_graph before shallow walk
Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee --- commit-graph.c | 8 commit-graph.h | 1 + upload-pack.c | 2 ++ 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index cee2caab5c..4bd1a4abbf 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -260,10 +260,10 @@ static int prepare_commit_graph(struct repository *r) return !!r->objects->commit_graph; } -static void close_commit_graph(void) +void close_commit_graph(struct repository *r) { - free_commit_graph(the_repository->objects->commit_graph); - the_repository->objects->commit_graph = NULL; + free_commit_graph(r->objects->commit_graph); + r->objects->commit_graph = NULL; } static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos) @@ -875,7 +875,7 @@ void write_commit_graph(const char *obj_dir, write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr); write_graph_chunk_large_edges(f, commits.list, commits.nr); - close_commit_graph(); + close_commit_graph(the_repository); finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC); commit_lock_file(&lk); diff --git a/commit-graph.h b/commit-graph.h index 76e098934a..13d736cdde 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -59,6 +59,7 @@ void write_commit_graph(const char *obj_dir, int verify_commit_graph(struct repository *r, struct commit_graph *g); +void close_commit_graph(struct repository *); void
free_commit_graph(struct commit_graph *); #endif diff --git a/upload-pack.c b/upload-pack.c index 82b393ec31..2ae9d9bb47 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -24,6 +24,7 @@ #include "quote.h" #include "upload-pack.h" #include "serve.h" +#include "commit-graph.h" /* Remember to update object flag allocation in object.h */ #define THEY_HAVE (1u << 11) @@ -740,6 +741,7 @@ static void deepen_by_rev_list(int ac, const char **av, { struct commit_list *result; + close_commit_graph(the_repository); result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW); send_shallow(result); free_commit_list(result); -- 2.18.0.118.gd4f65b8d14
[PATCH v2 2/8] refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
From: Stefan Beller Signed-off-by: Stefan Beller Signed-off-by: Derrick Stolee --- builtin/replace.c | 8 refs.c| 9 - refs.h| 2 +- replace-object.c | 5 +++-- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/builtin/replace.c b/builtin/replace.c index e997a562f0..b5861a0ee9 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -39,7 +39,8 @@ struct show_data { enum replace_format format; }; -static int show_reference(const char *refname, const struct object_id *oid, +static int show_reference(struct repository *r, const char *refname, + const struct object_id *oid, int flag, void *cb_data) { struct show_data *data = cb_data; @@ -56,9 +57,8 @@ static int show_reference(const char *refname, const struct object_id *oid, if (get_oid(refname, &object)) return error("Failed to resolve '%s' as a valid ref.", refname); - obj_type = oid_object_info(the_repository, &object, - NULL); - repl_type = oid_object_info(the_repository, oid, NULL); + obj_type = oid_object_info(r, &object, NULL); + repl_type = oid_object_info(r, oid, NULL); printf("%s (%s) -> %s (%s)\n", refname, type_name(obj_type), oid_to_hex(oid), type_name(repl_type)); diff --git a/refs.c b/refs.c index 7cd76f72d2..c5a5f727e8 100644 --- a/refs.c +++ b/refs.c @@ -1474,12 +1474,11 @@ int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix, return do_for_each_ref(refs, prefix, fn, 0, flag, cb_data); } -int for_each_replace_ref(struct repository *r, each_ref_fn fn, void *cb_data) +int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data) { - return do_for_each_ref(get_main_ref_store(r), - git_replace_ref_base, fn, - strlen(git_replace_ref_base), - DO_FOR_EACH_INCLUDE_BROKEN, cb_data); + return do_for_each_repo_ref(r, git_replace_ref_base, fn, + strlen(git_replace_ref_base), + DO_FOR_EACH_INCLUDE_BROKEN, cb_data); } int for_each_namespaced_ref(each_ref_fn fn, void *cb_data) diff --git a/refs.h b/refs.h index 80eec8bbc6..a0a18223a1 100644 --- a/refs.h +++ b/refs.h @@ -317,7 +317,7 @@
int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data, int for_each_tag_ref(each_ref_fn fn, void *cb_data); int for_each_branch_ref(each_ref_fn fn, void *cb_data); int for_each_remote_ref(each_ref_fn fn, void *cb_data); -int for_each_replace_ref(struct repository *r, each_ref_fn fn, void *cb_data); +int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data); int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data); int for_each_glob_ref_in(each_ref_fn fn, const char *pattern, const char *prefix, void *cb_data); diff --git a/replace-object.c b/replace-object.c index 4162df6324..3c17864eb7 100644 --- a/replace-object.c +++ b/replace-object.c @@ -6,7 +6,8 @@ #include "repository.h" #include "commit.h" -static int register_replace_ref(const char *refname, +static int register_replace_ref(struct repository *r, + const char *refname, const struct object_id *oid, int flag, void *cb_data) { @@ -25,7 +26,7 @@ static int register_replace_ref(const char *refname, oidcpy(&repl_obj->replacement, oid); /* Register new object */ - if (oidmap_put(the_repository->objects->replace_map, repl_obj)) + if (oidmap_put(r->objects->replace_map, repl_obj)) die("duplicate replace ref: %s", refname); return 0; -- 2.18.0.118.gd4f65b8d14
[PATCH v2 5/8] commit-graph: not compatible with replace objects
Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 4 commit-graph.c | 21 + replace-object.c| 2 +- replace-object.h| 2 ++ t/t5318-commit-graph.sh | 22 ++ 5 files changed, 50 insertions(+), 1 deletion(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 0bf0c48657..da737df321 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -120,6 +120,8 @@ static int graph_read(int argc, const char **argv) return 0; } +extern int read_replace_refs; + static int graph_write(int argc, const char **argv) { struct string_list *pack_indexes = NULL; @@ -150,6 +152,8 @@ static int graph_write(int argc, const char **argv) if (!opts.obj_dir) opts.obj_dir = get_object_directory(); + read_replace_refs = 0; + if (opts.reachable) { write_commit_graph_reachable(opts.obj_dir, opts.append); return 0; } diff --git a/commit-graph.c b/commit-graph.c index b0a55ad128..2c01fa433f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -13,6 +13,8 @@ #include "commit-graph.h" #include "object-store.h" #include "alloc.h" +#include "hashmap.h" +#include "replace-object.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -56,6 +58,19 @@ static struct commit_graph *alloc_commit_graph(void) return g; } +extern int read_replace_refs; + +static int commit_graph_compatible(struct repository *r) +{ + if (read_replace_refs) { + prepare_replace_object(r); + if (hashmap_get_size(&r->objects->replace_map->map)) + return 0; + } + + return 1; +} + struct commit_graph *load_commit_graph_one(const char
*graph_file) { void *graph_map; @@ -223,6 +238,9 @@ static int prepare_commit_graph(struct repository *r) */ return 0; + if (!commit_graph_compatible(r)) + return 0; + obj_dir = r->objects->objectdir; prepare_commit_graph_one(r, obj_dir); prepare_alt_odb(r); @@ -693,6 +711,9 @@ void write_commit_graph(const char *obj_dir, int num_extra_edges; struct commit_list *parent; + if (!commit_graph_compatible(the_repository)) + return; + oids.nr = 0; oids.alloc = approximate_object_count() / 4; diff --git a/replace-object.c b/replace-object.c index 3c17864eb7..9821f1477e 100644 --- a/replace-object.c +++ b/replace-object.c @@ -32,7 +32,7 @@ static int register_replace_ref(struct repository *r, return 0; } -static void prepare_replace_object(struct repository *r) +void prepare_replace_object(struct repository *r) { if (r->objects->replace_map) return; diff --git a/replace-object.h b/replace-object.h index 9345e105dd..16528df942 100644 --- a/replace-object.h +++ b/replace-object.h @@ -10,6 +10,8 @@ struct replace_object { struct object_id replacement; }; +void prepare_replace_object(struct repository *r); + /* * This internal function is only declared here for the benefit of * lookup_replace_object(). Please do not call it directly. 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4f17d7701e..e0c3c60f66 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -259,6 +259,28 @@ test_expect_success 'check that gc computes commit-graph' ' test_cmp commit-graph-after-gc $objdir/info/commit-graph ' +test_expect_success 'replace-objects invalidates commit-graph' ' + cd "$TRASH_DIRECTORY" && + test_when_finished rm -rf replace && + git clone full replace && + ( + cd replace && + git commit-graph write --reachable && + test_path_is_file .git/objects/info/commit-graph && + git replace HEAD~1 HEAD~2 && + git -c core.commitGraph=false log >expect && + git -c core.commitGraph=true log >actual && + test_cmp expect actual && + git commit-graph write --reachable && + git -c core.commitGraph=false --no-replace-objects log >expect && + git -c core.commitGraph=true --no-replace-objects log >actual && + test_cmp expect actual && + rm -rf .git/objects/info/commit-graph && + git commit-graph write --reachable && + test_path_is_file .git/objects/info/commit-graph + )
[PATCH v2 3/8] commit-graph: update design document
As it exists right now, the commit-graph feature may provide inconsistent results when combined with commit grafts, replace objects, and shallow clones. Update the design document to discuss why these interactions are difficult to reconcile and how we will avoid errors by preventing updates to and reads from the commit-graph file when these other features exist. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index c664acbd76..001395e950 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -112,12 +112,24 @@ Design Details - The file format includes parameters for the object ID hash function, so a future change of hash algorithm does not require a change in format. +- Commit grafts and replace objects can change the shape of the commit + history. The latter can also be enabled/disabled on the fly using + `--no-replace-objects`. This leads to difficulty storing both possible + interpretations of a commit id, especially when computing generation + numbers. The commit-graph will not be read or written when + replace-objects or grafts are present. + +- Shallow clones create grafts of commits by dropping their parents. This + leads the commit-graph to think those commits have generation number 1. + If and when those commits are made unshallow, those generation numbers + become invalid. Since shallow clones are intended to restrict the commit + history to a very small set of commits, the commit-graph feature is less + helpful for these clones, anyway. The commit-graph will not be read or + written when shallow commits are present. + Future Work --- -- The commit graph feature currently does not honor commit grafts. This can - be remedied by duplicating or refactoring the current graft logic.
- - After computing and storing generation numbers, we must make graph walks aware of generation numbers to gain the performance benefits they enable. This will mostly be accomplished by swapping a commit-date-ordered -- 2.18.0.118.gd4f65b8d14
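The "enabled/disabled on the fly" point from the first new bullet is easy to demonstrate in a throwaway repo (repo name, commits, and identities invented for the demo): the same tip yields a different walk depending on `--no-replace-objects`, so a single cached graph cannot serve both views:

```shell
git init -q replace-demo && cd replace-demo
git config user.email you@example.com && git config user.name You
for m in one two three; do git commit -q --allow-empty -m "$m"; done
git replace HEAD~1 HEAD~2                  # pretend "two" is really "one"
git log --format=%s                        # walks three -> one
git --no-replace-objects log --format=%s   # walks three -> two -> one
```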
[PATCH v2 4/8] test-repository: properly init repo
Signed-off-by: Derrick Stolee --- t/helper/test-repository.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c index 2762ca6562..6a84a53efb 100644 --- a/t/helper/test-repository.c +++ b/t/helper/test-repository.c @@ -15,7 +15,10 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree, struct commit *c; struct commit_list *parent; - repo_init(&r, gitdir, worktree); + setup_git_env(gitdir); + + if (repo_init(&r, gitdir, worktree)) + die("Couldn't init repo"); c = lookup_commit(&r, commit_oid); @@ -38,7 +41,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir, struct commit *c; struct tree *tree; - repo_init(&r, gitdir, worktree); + setup_git_env(gitdir); + + if (repo_init(&r, gitdir, worktree)) + die("Couldn't init repo"); c = lookup_commit(&r, commit_oid); -- 2.18.0.118.gd4f65b8d14
[PATCH v2 7/8] commit-graph: not compatible with uninitialized repo
Signed-off-by: Derrick Stolee --- commit-graph.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index c4eaedf4e5..cee2caab5c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -62,6 +62,9 @@ extern int read_replace_refs; static int commit_graph_compatible(struct repository *r) { + if (!r->gitdir) + return 0; + if (read_replace_refs) { prepare_replace_object(r); if (hashmap_get_size(&r->objects->replace_map->map)) -- 2.18.0.118.gd4f65b8d14
[PATCH v2 6/8] commit-graph: not compatible with grafts
Augment commit_graph_compatible(r) to return false when the given repository r has commit grafts or is a shallow clone. Test that in these situations we ignore existing commit-graph files and we do not write new commit-graph files. Helped-by: Jakub Narebski Signed-off-by: Derrick Stolee --- commit-graph.c | 6 ++ commit.c| 2 +- commit.h| 1 + t/t5318-commit-graph.sh | 38 ++ 4 files changed, 46 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index 2c01fa433f..c4eaedf4e5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -68,6 +68,12 @@ static int commit_graph_compatible(struct repository *r) return 0; } + prepare_commit_graft(r); + if (r->parsed_objects && r->parsed_objects->grafts_nr) + return 0; + if (is_repository_shallow(r)) + return 0; + return 1; } diff --git a/commit.c b/commit.c index 30d1af2b20..ef9a2cbb23 100644 --- a/commit.c +++ b/commit.c @@ -209,7 +209,7 @@ static int read_graft_file(struct repository *r, const char *graft_file) return 0; } -static void prepare_commit_graft(struct repository *r) +void prepare_commit_graft(struct repository *r) { char *graft_file; diff --git a/commit.h b/commit.h index da0db36eba..5459e279fe 100644 --- a/commit.h +++ b/commit.h @@ -202,6 +202,7 @@ typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *); struct commit_graft *read_graft_line(struct strbuf *line); int register_commit_graft(struct repository *r, struct commit_graft *, int); +void prepare_commit_graft(struct repository *r); struct commit_graft *lookup_commit_graft(struct repository *r, const struct object_id *oid); extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index e0c3c60f66..6aee861f78 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -281,6 +281,44 @@ test_expect_success 'replace-objects invalidates commit-graph' ' ) ' +test_expect_success 'commit grafts invalidate commit-graph' ' + cd 
"$TRASH_DIRECTORY" && + test_when_finished rm -rf graft && + git clone full graft && + ( + cd graft && + git commit-graph write --reachable && + test_path_is_file .git/objects/info/commit-graph && + H1=$(git rev-parse --verify HEAD~1) && + H3=$(git rev-parse --verify HEAD~3) && + echo "$H1 $H3" >.git/info/grafts && + git -c core.commitGraph=false log >expect && + git -c core.commitGraph=true log >actual && + test_cmp expect actual && + git commit-graph write --reachable && + git -c core.commitGraph=false --no-replace-objects log >expect && + git -c core.commitGraph=true --no-replace-objects log >actual && + test_cmp expect actual && + rm -rf .git/objects/info/commit-graph && + git commit-graph write --reachable && + test_path_is_missing .git/objects/info/commit-graph + ) +' + +test_expect_success 'shallow clones invalidate commit-graph' ' + cd "$TRASH_DIRECTORY" && + test_when_finished rm -rf shallow && + git clone --depth 2 "file://$TRASH_DIRECTORY/full" shallow && + ( + cd shallow && + git commit-graph write --reachable && + test_path_is_missing .git/objects/info/commit-graph && + git fetch origin --unshallow && + git commit-graph write --reachable && + test_path_is_file .git/objects/info/commit-graph + ) +' + # the verify tests below expect the commit-graph to contain # exactly the commits reachable from the commits/8 branch. # If the file changes the set of commits in the list, then the -- 2.18.0.118.gd4f65b8d14
[PATCH v2 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities
One unresolved issue with the commit-graph feature is that it can cause issues when combined with replace objects, commit grafts, or shallow clones. These are not 100% incompatible, as one could be reasonably successful writing a commit-graph after replacing some objects and not have issues. The problems happen when commits that are already in the commit-graph file are replaced, or when git is run with the `--no-replace-objects` option; this can cause incorrect parents or incorrect generation numbers. Similar things occur with commit grafts and shallow clones, especially when running `git fetch --unshallow` in a shallow repo.

Instead of trying (and probably failing) to make these features work together, default to making the commit-graph feature unavailable in these situations. Create a new method 'commit_graph_compatible(r)' that checks if the repository 'r' has any of these features enabled.

CHANGES IN V2:

* The first two commits regarding the ref iterators are unchanged, despite a lot of discussion on the subject [1].

* I included Peff's changes in jk/core-use-replace-refs, changing the base commit for the series to 1689c22c1c328e9135ed51458e9f9a5d224c5057 (the merge that brought that topic into 'master').

* I fixed the tests for the interactions with the graft feature.

Because of the change of base, it is hard to provide a side-by-side diff from v1.

Thanks,
-Stolee

[1] https://public-inbox.org/git/CAGZ79kZ3PzqpGzXWcmxjzi98gA+LT2MBOf8KaA89hOa-Qig=o...@mail.gmail.com/
    Stefan's response recommending we keep the first two commits.
Derrick Stolee (6):
  commit-graph: update design document
  test-repository: properly init repo
  commit-graph: not compatible with replace objects
  commit-graph: not compatible with grafts
  commit-graph: not compatible with uninitialized repo
  commit-graph: close_commit_graph before shallow walk

Stefan Beller (2):
  refs.c: migrate internal ref iteration to pass thru repository argument
  refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback

 Documentation/technical/commit-graph.txt | 18 +--
 builtin/commit-graph.c                   |  4 ++
 builtin/replace.c                        |  8 ++--
 commit-graph.c                           | 38 +--
 commit-graph.h                           |  1 +
 commit.c                                 |  2 +-
 commit.h                                 |  1 +
 refs.c                                   | 48 ---
 refs.h                                   | 12 -
 refs/iterator.c                          |  6 +--
 refs/refs-internal.h                     |  5 +-
 replace-object.c                         |  7 +--
 replace-object.h                         |  2 +
 t/helper/test-repository.c               | 10 +++-
 t/t5318-commit-graph.sh                  | 60 
 upload-pack.c                            |  2 +
 16 files changed, 194 insertions(+), 30 deletions(-)

base-commit: 1689c22c1c328e9135ed51458e9f9a5d224c5057
-- 
2.18.0.118.gd4f65b8d14
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
Junio C Hamano wrote:
> * jh/structured-logging (2018-07-25) 25 commits
>  - structured-logging: add config data facility
>  - structured-logging: t0420 tests for interacitve child_summary
>  - structured-logging: t0420 tests for child process detail events
>  - structured-logging: add child process classification
>  - structured-logging: add detail-events for child processes
>  - structured-logging: add structured logging to remote-curl
>  - structured-logging: t0420 tests for aux-data
>  - structured-logging: add aux-data for size of sparse-checkout file
>  - structured-logging: add aux-data for index size
>  - structured-logging: add aux-data facility
>  - structured-logging: t0420 tests for timers
>  - structured-logging: add timer around preload_index
>  - structured-logging: add timer around wt-status functions
>  - structured-logging: add timer around do_write_index
>  - structured-logging: add timer around do_read_index
>  - structured-logging: add timer facility
>  - structured-logging: add detail-event for lazy_init_name_hash
>  - structured-logging: add detail-event facility
>  - structured-logging: t0420 basic tests
>  - structured-logging: set sub_command field for checkout command
>  - structured-logging: set sub_command field for branch command
>  - structured-logging: add session-id to log events
>  - structured-logging: add structured logging framework
>  - structured-logging: add STRUCTURED_LOGGING=1 to Makefile
>  - structured-logging: design document
>
> Will merge to 'next'.

I believe this is under-reviewed (which I certainly share some blame for). Please hold off and I can look at it today.
Re: [PATCH v3] checkout: optimize "git checkout -b "
On Mon, Aug 20, 2018 at 6:40 AM Ben Peart wrote: > On 8/18/2018 9:44 PM, Elijah Newren wrote: ... > > == My opinions == > > - The performance wins are big enough that I can see why Ben is pushing > > this. > > - I totally see Duy's point that more general optimizations would be > > really nice. > > - I really dislike the "giant list of checks" and the ongoing > > maintenance burden it implies. > > > > Overall, I have to side with Duy and say I don't think we should take > > the patch as-is. Since that'll be frustrating for Ben to hear, I've > > tried to come up with some additional alternatives: > > Thank you for the thorough review and summary. From my perspective, it > is both fair and accurate. > ... > > - Rewrite this patch so it instead does a very small set of checks at > > the beginning of cmd_checkout(); e.g. check if argc == 3 and argv[1] > > == "-b" and if so then perform the minimum operations needed to create > > and checkout the new branch (maybe even calling in to cmd_branch() and > > cmd_symbolic_ref() and calling some reflog update function). Preface > > it with a comment that it's a performance hack that might eventually > > be able to go away. > > I'm happy to do this if it would make the patch better/more acceptable > to the community. For whatever it's worth, it'd change my vote. While I would like general optimizations more, I do think that's overall a longer term prospect, and the wins here are big enough to justify a performance hack -- if the amount of new code and added maintenance overhead is minimized. I think that checking for the args matching checkout -b rather than checking all flags that might imply something other than args being checkout -b would be a good way to help minimize both. > One other change I could do would be to address Duy's concern about the > config option would be to remove that entirely. 
That means those using > sparse-checkout don't have the option of getting the performance win but > the benefit is that there isn't any documentation or behavior changes > that need to be made - things just get a lot faster if you happen to use > the (rather common) "checkout -b" option. Playing with sparse-checkout, it feels to me like a half-baked feature. It seems like it required too much manual work, and it was sometimes hard to tell if I was misunderstanding configuration rules or I was just running into bugs in the code. I think I hit both but I didn't really want to get side-tracked further, yet. (I do want to eventually come back to it.) The only reason someone would go through that pain is if it provided massive performance benefits. Viewing things when-in-doubt through the lens of strict backwards compatibility may inadvertently mean enforcing buggy and/or slow behavior with sparse-checkouts. So, while git usually focuses strongly on backwards compatibility (which is a good thing), that might actually be wrong in the specific case of the sparse-checkout feature. In particular, since the whole point of the feature is essentially a performance hack, if the initial implementation has ways in which it really hurts performance, it seems like it'd be more in keeping with the point of the feature to take performance fixes and swallow subtle behavioral changes than to require special flags to get decent performance. That might sound like a long winded way of saying I'm totally in favor of dropping the config option in this case, but I'm also trying to build a case that I think there will be other situations where we want to subtly and maybe even not-so-subtly change behavior of sparse-checkouts to make them both more performant and far nicer for end users. I'm not sure others will agree with me on this argument, but sparse checkouts felt painful to me and I think we need something better. 
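For readers who have not tried it, the manual setup Elijah is describing looked roughly like this in the pre-`git sparse-checkout`-command era. This is a minimal sketch in a throwaway repository; the directory layout and the patterns are purely illustrative:

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "A U Thor"

# A toy repository with two top-level directories.
mkdir -p src docs
echo 'int main(void) { return 0; }' >src/main.c
echo 'some notes' >docs/notes.txt
git add . && git commit -qm 'initial commit'

# The manual dance: flip the config bit, write patterns by hand,
# then ask read-tree to re-apply them to the working tree.
git config core.sparseCheckout true
printf 'src/\n' >.git/info/sparse-checkout
git read-tree -mu HEAD

ls   # docs/ is now gone from the working tree; src/ remains
```

Every step here is user-visible plumbing, which is part of why the feature felt so manual at the time.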
> If making these two changes would resolve the remaining concerns, I'm > happy to make them and send another patch series. If it will just > result in another round of "I'm still not comfortable with this" then > I'd rather spend my time elsewhere. :) That's fair. The first change would be enough to resolve the concern for me if others strongly push back on the second (config-option-removal) change, but I'd definitely prefer both changes -- fewer options means less code, less ongoing maintenance work, and less manual setup needed by users. > > - Possibly crazy idea for massive global performance win: modify how > > the cache works so that it can have tree entries instead of just file > > entries; make use of that with sparse checkouts (and partial clones?) > > so that the cache only stores a tree when that entire tree is > > "unwanted". Suddenly, git status, git add, git checkout, etc., etc. > > are all much faster. merge needs some special work to operate, > > though. > > Not crazy at all! It's something we've investigated and discussed > several times over the past couple of years. This is the largest > potential optimization that I am aware of as it
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On 8/20/2018 1:26 PM, Junio C Hamano wrote:
> Jonathan Nieder writes:
>>> Usually, I refrain from merging larger topics in 'next' down to
>>> 'master' when we get close to -rc0, but I am wondering if it is
>>> better to merge all of them to 'master', even the ones on the larger
>>> and possibly undercooked side, expecting that we collectively spend
>>> effort on hunting and fixing bugs in them during the pre-release
>>> freeze period.  If we were to go that route, I'd want everybody's
>>> buy-in and I'll promise to ignore any shiny new toys that appear on
>>> list that are not regression fixes to topics merged to 'master'
>>> since the end of the previous cycle to make sure people are not
>>> distracted.
>>
>> Based on what I see in 'next' (midx, range-diff, etc), I quite like
>> this idea.
>
> The comment above was about the ones that was marked as "Will merge
> to 'master'" that are in 'next', not the ones marked as "Will cook
> in 'next'", like the midx ones.
>
> I am not worried about range-diff, as I do not think it came close
> enough to the core-ish code to break other things; the potential
> damage is fairly isolated and the worst would be that we'd realize
> that we shipped with a new command that was buggy after the release,
> which is not the end of the world.
>
> 'midx' and 'reachable' are quite different stories, as bugs in them
> would in the worst case lead to an immediate and unrecoverable
> repository corruption.  So I am still on the fence, even though I am
> leaning toward merging them down to 'master'.

I'm happy to wait until 2.19 is cut for ds/multi-pack-index to merge down. I sent a new series today [1] that improves the series and corrects some bugs I found while making the full Git test suite work when writing a multi-pack-index on every repack. Nothing was "unrecoverable repository corruption," but I understand the concern there. Making the safest decision is probably the best decision.

I'm happy to contribute to a "pre-2.19 bug bash" after you merge things into master.

Thanks,
-Stolee

[1] https://public-inbox.org/git/20180820165124.152146-1-dsto...@microsoft.com/T/#u
    [PATCH 0/9] multi-pack-index cleanups
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Fri, Aug 17, 2018 at 3:44 PM Junio C Hamano wrote: > > Usually, I refrain from merging larger topics in 'next' down to > 'master' when we get close to -rc0, but I am wondering if it is > better to merge all of them to 'master', even the ones on the larger > and possibly undercooked side, expecting that we collectively spend > effort on hunting and fixing bugs in them during the pre-release > freeze period. If we were to go that route, I'd want everybody's > buy-in and I'll promise to ignore any shiny new toys that appear on > list that are not regression fixes to topics merged to 'master' > since the end of the previous cycle to make sure people are not > distracted. Speaking of releases, linux for example has some releases that are more stable than others, as most distros pick the same release for their 'stable' release, whereas the regular and new releases are just off of the latest release of linux. Would a similar model be an interesting thought to entertain? I guess I could buy into a few weeks of bug fixing. > * nd/config-blame-sort (2018-08-06) 1 commit [...] > * nd/no-extern (2018-08-03) 12 commits [...] Thanks Duy for these two series! I would have expected the no-extern series to have a bit of merge conflicts similar to sb/object-store-lookup as it touches a lot of headers, but so far this looks like smooth sailing? > * sb/submodule-cleanup (2018-08-16) 2 commits > (merged to 'next' on 2018-08-17 at ca9d8aaef4) > + builtin/submodule--helper: remove stray new line > + t7410: update to new style > > A few preliminary minor clean-ups in the area around submodules. > > Will merge to 'master'. Oh, that is the prologue of https://public-inbox.org/git/20180816023100.161626-1-sbel...@google.com/ which I will resend to build on top. Given your question to hunt more bugs, I'd delay its resend until after the release. 
> > * ao/submodule-wo-gitmodules-checked-out (2018-08-14) 7 commits > - submodule: support reading .gitmodules even when it's not checked out > - t7506: clean up .gitmodules properly before setting up new scenario > - submodule: use the 'submodule--helper config' command > - submodule--helper: add a new 'config' subcommand > - t7411: be nicer to future tests and really clean things up > - submodule: factor out a config_set_in_gitmodules_file_gently function > - submodule: add a print_config_from_gitmodules() helper > > The submodule support has been updated to read from the blob at > HEAD:.gitmodules when the .gitmodules file is missing from the > working tree. > > I find the design a bit iffy in that our usual "missing in the > working tree? let's use the latest blob" fallback is to take it > from the index, not from the HEAD. I am not sure; why does it feel iffy? > * bw/submodule-name-to-dir (2018-08-10) 2 commits > In modern repository layout, the real body of a cloned submodule > repository is held in .git/modules/ of the superproject, indexed by > the submodule name. URLencode the submodule name before computing > the name of the directory to make sure they form a flat namespace. > > Will merge to 'next'. Cool! Is the discussion on top of it still going whether to use a new config for special cases or how we distinguish between a/b/ and a%2fb as submodule names? > * md/filter-trees (2018-08-16) 6 commits > - list-objects-filter: implement filter tree:0 > - revision: mark non-user-given objects instead > - rev-list: handle missing tree objects properly > - list-objects: always parse trees gently > - list-objects: refactor to process_tree_contents > - list-objects: store common func args in struct > > The "rev-list --filter" feature learned to exclude all trees via > "tree:0" filter. I gave this a read and think it is good to go. 
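As background for the ao/submodule-wo-gitmodules-checked-out topic discussed above: reading a file's content from the blob at HEAD when the file is absent from the working tree is already possible by hand, which is essentially the fallback that series automates for .gitmodules. A minimal sketch in a throwaway repository (the submodule name and URL are illustrative):

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "A U Thor"

printf '[submodule "sub"]\n\tpath = sub\n\turl = ./sub.git\n' >.gitmodules
git add .gitmodules && git commit -qm 'add .gitmodules'

rm .gitmodules                 # missing from the working tree...
git show HEAD:.gitmodules      # ...but still readable from the blob at HEAD

# git config can parse configuration straight out of a blob:
git config --blob HEAD:.gitmodules submodule.sub.url
```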
> * sb/config-write-fix (2018-08-08) 3 commits > (merged to 'next' on 2018-08-17 at 7d9c7ce81f) > + git-config: document accidental multi-line setting in deprecated syntax > + config: fix case sensitive subsection names on writing > + t1300: document current behavior of setting options > > Recent update to "git config" broke updating variable in a > subsection, which has been corrected. > > Will merge to 'master'. Thanks! > > > * sb/range-diff-colors (2018-08-14) 8 commits > - diff.c: rewrite emit_line_0 more understandably > - diff.c: omit check for line prefix in emit_line_0 > - diff: use emit_line_0 once per line > - diff.c: add set_sign to emit_line_0 > - diff.c: reorder arguments for emit_line_ws_markup > - diff.c: simplify caller of emit_line_0 > - t3206: add color test for range-diff --dual-color > - test_decode_color: understand FAINT and ITALIC > (this branch uses js/range-diff; is tangled with es/format-patch-rangediff.) > > The color output support for recently introduced "range-diff" > command got tweaked a bit. No, this series doesn't tweak the colored range-diff. This might be: Add more test coverage to colored range-diff and refactor the diff machinery to be more
Re: [PATCH/RFC] commit: new option to abort -a something is already staged
On Mon, Aug 20, 2018 at 11:41 AM Nguyễn Thái Ngọc Duy wrote: > One way to deal with this is some form of `git undo` that allows me to > retrieve the old index. That's not a lot of work by itself. The problem > is designing that `git undo` interface because there are more undo > options that this. s/that/than/
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Mon, Aug 20, 2018 at 6:23 AM Phillip Wood wrote:
> On 17/08/2018 23:44, Junio C Hamano wrote:
> > * pw/rebase-i-author-script-fix (2018-08-07) 2 commits
> >  - sequencer: fix quoting in write_author_script
> >  - sequencer: handle errors from read_author_ident()
> >
> > Undecided.
> > Is it the list consensus to favor this "with extra code, read the
> > script written by bad writer" approach?
>
> I think there was agreement between myself and Eric on the last version,
> I'm not sure anyone else has expressed an opinion.

Phillip's v4 [1] does take backward-compatibility into account, but with minimal extra code, which seems a reasonable compromise between no backward-compatibility and the heavyweight approach of v3. I left a few comments on v4 [1], but none were blockers, and changes based upon those comments can be done later, so I think the series can move forward.

[1]: https://public-inbox.org/git/20180807093452.22524-1-phillip.w...@talktalk.net/
Re: Do not raise conflict when a code in a patch was already added
On 20/08/2018 11:22, Konstantin Kharlamov wrote:
> So, steps-to-reproduce below rather full of trivia like setting up a
> repo, but the TL;DR is:
>
> Upon using `git rebase -i HEAD~1` and then `git add -p` to add part
> of a "hunk" as one commit, and then using `git rebase --continue` so
> the other part of hunk would be left in top commit; git raises a
> conflict.

I think this is a misleading error message as in your example below there are no conflicts, just unstaged changes. git-rebase.sh has the following code

	git update-index --ignore-submodules --refresh &&
	git diff-files --quiet --ignore-submodules || {
		echo "$(gettext "You must edit all merge conflicts and then
mark them as resolved using git add")"
		exit 1

I think this pre-dates interactive rebases when the only unstaged changes could be conflicts.

> It's spectacular, that content of one of inserted conflict markers is
> empty, so all you have to do is to remove the markers, and use
> `git add` on the file, and then `git rebase --continue`
>
> Its a lot of unncessary actions, git could just figure that the code
> it sees in the patch is already there, being a part of another
> commit.

If there are conflict markers where one side is empty it means that some lines from the merge base (which for a rebase is the parent of the commit being picked) have been deleted on one side and modified on the other. Git cannot know if you want to use the deleted version or the modified version. You can use 'git diff --cc' to see the combined diff which should show the lines being deleted on both sides and an addition on the side with the modified lines. You can also set the merge.conflictStyle config variable to diff3 to see the original text as well as the text from the merge heads.

Best Wishes

Phillip

> Maybe git could issue a warning, or to question a user interactively
> (y/n); but raising a conflict IMO is unnecessary.
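The delete-versus-modify situation Phillip describes, and the effect of merge.conflictStyle diff3, can be reproduced in isolation. A minimal sketch in a throwaway repository (file contents and branch names are illustrative):

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "A U Thor"
git config merge.conflictStyle diff3

printf 'a\nb\nc\n' >f
git add f && git commit -qm 'base'
git branch side

printf 'a\nB\nc\n' >f && git commit -qam 'modify b'   # on the default branch
git checkout -q side
printf 'a\nc\n' >f && git commit -qam 'delete b'      # on side

# One side deleted "b", the other modified it: git cannot pick for you.
# With the default style the deleting side of the markers is empty;
# with diff3 the ||||||| section additionally shows the original "b".
git merge --no-edit '@{-1}' || true
cat f
```

With the default conflict style the same merge produces markers where one side is empty, which is exactly the confusing output reported above.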
> > # Steps to reproduce > > In empty dir execute: > > $ git init > $ touch test > Initialized empty Git repository in /tmp/test/.git/ > $ git add test > $ git commit > [master (root-commit) a7ce543] 1st commit > 1 file changed, 2 insertions(+) > create mode 100644 test > $ echo -e "foo\nbar" > test # content you'll want to break > $ git add -u && git commit > [detached HEAD 9e28331] 2-nd commit > 1 file changed, 2 insertions(+) > $ git rebase -i --root > Stopped at a7ce543... 1st commit > You can amend the commit now, with > > git commit --amend > > Once you are satisfied with your changes, run > > git rebase --continue > > Put "edit" for the 2-nd commit > > $ git reset HEAD^ > Unstaged changes after reset: > M test > $ git add -p > diff --git a/test b/test > index e69de29..3bd1f0e 100644 > --- a/test > +++ b/test > @@ -0,0 +1,2 @@ > +foo > +bar > Stage this hunk [y,n,q,a,d,e,?]? e > > ╭─constantine@constantine-N61Ja /tmp/test ‹node-› ‹› (e721fa3*) > ╰─$ git commit > [detached HEAD 27b2f63] add foo > 1 file changed, 1 insertion(+) > ╭─constantine@constantine-N61Ja /tmp/test ‹node-› ‹› (27b2f63*) > ╰─$ git rebase --continue > test: needs update > You must edit all merge conflicts and then > mark them as resolved using git add > > What happened is that it's obvious that the hunk was broken to multiple > commits, and git should figure that out, and not to raise a conflict. > > Side note: for some reason in the test git didn't insert conflict > markers. It did in real-world usecase though, and there was simply no > content inside one of them. >
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
Jonathan Nieder writes:

>> Usually, I refrain from merging larger topics in 'next' down to
>> 'master' when we get close to -rc0, but I am wondering if it is
>> better to merge all of them to 'master', even the ones on the larger
>> and possibly undercooked side, expecting that we collectively spend
>> effort on hunting and fixing bugs in them during the pre-release
>> freeze period.  If we were to go that route, I'd want everybody's
>> buy-in and I'll promise to ignore any shiny new toys that appear on
>> list that are not regression fixes to topics merged to 'master'
>> since the end of the previous cycle to make sure people are not
>> distracted.
>
> Based on what I see in 'next' (midx, range-diff, etc), I quite like
> this idea.

The comment above was about the ones that was marked as "Will merge to 'master'" that are in 'next', not the ones marked as "Will cook in 'next'", like the midx ones.

I am not worried about range-diff, as I do not think it came close enough to the core-ish code to break other things; the potential damage is fairly isolated and the worst would be that we'd realize that we shipped with a new command that was buggy after the release, which is not the end of the world.

'midx' and 'reachable' are quite different stories, as bugs in them would in the worst case lead to an immediate and unrecoverable repository corruption. So I am still on the fence, even though I am leaning toward merging them down to 'master'.
[PATCH 8/9] midx: test a few commands that use get_all_packs
The new get_all_packs() method exposes the packfiles covered by a multi-pack-index. Before, the 'git cat-file --batch' and 'git count-objects' commands would skip objects in an environment with a multi-pack-index. Further, a reachability bitmap would be ignored if its pack-file was covered by a multi-pack-index.

Signed-off-by: Derrick Stolee
---
 t/t5319-multi-pack-index.sh | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 4b6e2825a6..424d0c640f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -86,8 +86,14 @@ test_expect_success 'write midx with one v1 pack' '
 '
 
 midx_git_two_modes () {
-	git -c core.multiPackIndex=false $1 >expect &&
-	git -c core.multiPackIndex=true $1 >actual &&
+	if [ "$2" = "sorted" ]
+	then
+		git -c core.multiPackIndex=false $1 | sort >expect &&
+		git -c core.multiPackIndex=true $1 | sort >actual
+	else
+		git -c core.multiPackIndex=false $1 >expect &&
+		git -c core.multiPackIndex=true $1 >actual
+	fi &&
 	test_cmp expect actual
 }
 
@@ -95,7 +101,10 @@ compare_results_with_midx () {
 	MSG=$1
 	test_expect_success "check normal git operations: $MSG" '
 		midx_git_two_modes "rev-list --objects --all" &&
-		midx_git_two_modes "log --raw"
+		midx_git_two_modes "log --raw" &&
+		midx_git_two_modes "count-objects --verbose" &&
+		midx_git_two_modes "cat-file --batch-all-objects --buffer --batch-check" &&
+		midx_git_two_modes "cat-file --batch-all-objects --buffer --batch-check --unsorted" sorted
 	'
 }
 
@@ -149,6 +158,12 @@ test_expect_success 'repack removes multi-pack-index' '
 
 compare_results_with_midx "after repack"
 
+test_expect_success 'multi-pack-index and pack-bitmap' '
+	git -c repack.writeBitmaps=true repack -ad &&
+	git multi-pack-index write &&
+	git rev-list --test-bitmap HEAD
+'
+
 test_expect_success 'multi-pack-index and alternates' '
 	git init --bare alt.git &&
 	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-- 
2.18.0.118.gd4f65b8d14
[PATCH 9/9] pack-objects: consider packs in multi-pack-index
When running 'git pack-objects --local', we want to avoid packing objects that are in an alternate. Currently, we check for these objects using the packed_git_mru list, which excludes the pack-files covered by a multi-pack-index. Add a new iteration over the multi-pack-indexes to find these copies and mark them as unwanted.

Signed-off-by: Derrick Stolee
---
 builtin/pack-objects.c      | 28 
 t/t5319-multi-pack-index.sh |  8 +++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a896d7810..4a9a42d29a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -31,6 +31,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "dir.h"
+#include "midx.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -1034,6 +1035,7 @@ static int want_object_in_pack(const struct object_id *oid,
 {
 	int want;
 	struct list_head *pos;
+	struct multi_pack_index *m;
 
 	if (!exclude && local && has_loose_object_nonlocal(oid))
 		return 0;
@@ -1048,6 +1050,32 @@ static int want_object_in_pack(const struct object_id *oid,
 		if (want != -1)
 			return want;
 	}
+
+	for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+		struct pack_entry e;
+		if (fill_midx_entry(oid, &e, m)) {
+			struct packed_git *p = e.p;
+			off_t offset;
+
+			if (p == *found_pack)
+				offset = *found_offset;
+			else
+				offset = find_pack_entry_one(oid->hash, p);
+
+			if (offset) {
+				if (!*found_pack) {
+					if (!is_pack_valid(p))
+						continue;
+					*found_offset = offset;
+					*found_pack = p;
+				}
+				want = want_found_object(exclude, p);
+				if (want != -1)
+					return want;
+			}
+		}
+	}
+
 	list_for_each(pos, get_packed_git_mru(the_repository)) {
 		struct packed_git *p = list_entry(pos, struct packed_git, mru);
 		off_t offset;

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 424d0c640f..6f56b38674 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -176,7 +176,13 @@ test_expect_success 'multi-pack-index and alternates' '
 compare_results_with_midx "with alternate (local midx)"
 
 test_expect_success 'multi-pack-index in an alternate' '
-	mv .git/objects/pack/* alt.git/objects/pack
+	mv .git/objects/pack/* alt.git/objects/pack &&
+	test_commit add_local_objects &&
+	git repack --local &&
+	git multi-pack-index write &&
+	midx_read_expect 1 3 4 $objdir &&
+	git reset --hard HEAD~1 &&
+	rm -f .git/objects/pack/*
 '
 
 compare_results_with_midx "with alternate (remote midx)"
-- 
2.18.0.118.gd4f65b8d14
[PATCH 2/9] multi-pack-index: store local property
A pack-file is 'local' if it is stored within the usual object directory. If it is stored in an alternate, it is non-local. Pack-files are stored using a 'pack_local' member in the packed_git struct. Add a similar 'local' member to the multi_pack_index struct and 'local' parameters to the methods that load and prepare multi-pack-indexes.

Signed-off-by: Derrick Stolee
---
 midx.c                    | 11 ++-
 midx.h                    |  6 --
 packfile.c                |  4 ++--
 t/helper/test-read-midx.c |  2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/midx.c b/midx.c
index 19b7df338e..6824acf5f8 100644
--- a/midx.c
+++ b/midx.c
@@ -37,7 +37,7 @@ static char *get_midx_filename(const char *object_dir)
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir)
+struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
 	int fd;
@@ -73,6 +73,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir)
 	m->fd = fd;
 	m->data = midx_map;
 	m->data_len = midx_size;
+	m->local = local;
 
 	m->signature = get_be32(m->data);
 	if (m->signature != MIDX_SIGNATURE) {
@@ -209,7 +210,7 @@ static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 
 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
 		    m->pack_names[pack_int_id]);
-	m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1);
+	m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, m->local);
 	strbuf_release(&pack_name);
 	return !m->packs[pack_int_id];
 }
@@ -318,7 +319,7 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
 	return 0;
 }
 
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
 {
 	struct multi_pack_index *m = r->objects->multi_pack_index;
 	struct multi_pack_index *m_search;
@@ -332,7 +333,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 		if (!strcmp(object_dir, m_search->object_dir))
 			return 1;
 
-	r->objects->multi_pack_index = load_multi_pack_index(object_dir);
+	r->objects->multi_pack_index = load_multi_pack_index(object_dir, local);
 
 	if (r->objects->multi_pack_index) {
 		r->objects->multi_pack_index->next = m;
@@ -746,7 +747,7 @@ int write_midx_file(const char *object_dir)
 			midx_name);
 	}
 
-	packs.m = load_multi_pack_index(object_dir);
+	packs.m = load_multi_pack_index(object_dir, 1);
 
 	packs.nr = 0;
 	packs.alloc_list = packs.m ? packs.m->num_packs : 16;

diff --git a/midx.h b/midx.h
index e3b07f1586..8aa79f4b62 100644
--- a/midx.h
+++ b/midx.h
@@ -18,6 +18,8 @@ struct multi_pack_index {
 	uint32_t num_packs;
 	uint32_t num_objects;
 
+	int local;
+
 	const unsigned char *chunk_pack_names;
 	const uint32_t *chunk_oid_fanout;
 	const unsigned char *chunk_oid_lookup;
@@ -29,14 +31,14 @@ struct multi_pack_index {
 	char object_dir[FLEX_ARRAY];
 };
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
 struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, uint32_t n);
 int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_name);
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);

diff --git a/packfile.c b/packfile.c
index 12db1a9d7d..896da460ac 100644
--- a/packfile.c
+++ b/packfile.c
@@ -963,11 +963,11 @@ static void prepare_packed_git(struct repository *r)
 	if (r->objects->packed_git_initialized)
 		return;
-	prepare_multi_pack_index_one(r, r->objects->objectdir);
+	prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
 	prepare_packed_git_one(r, r->objects->objectdir, 1);
 	prepare_alt_odb(r);
 	for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
-		prepare_multi_pack_index_one(r, alt->path);
+		prepare_multi_pack_index_one(r, alt->path, 0);
 		prepare_packed_git_one(r, alt->path, 0);
[PATCH 4/9] midx: stop reporting garbage
When prepare_packed_git is called with the report_garbage method initialized, we report unexpected files in the objects directory as garbage. Stop reporting the multi-pack-index and the pack-files it covers as garbage.

Signed-off-by: Derrick Stolee
---
 packfile.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/packfile.c b/packfile.c
index 896da460ac..fe713a0242 100644
--- a/packfile.c
+++ b/packfile.c
@@ -820,9 +820,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	struct packed_git *p;
 	size_t base_len = full_name_len;
 
-	if (strip_suffix_mem(full_name, &base_len, ".idx")) {
-		if (data->m && midx_contains_pack(data->m, file_name))
-			return;
+	if (strip_suffix_mem(full_name, &base_len, ".idx") &&
+	    !(data->m && midx_contains_pack(data->m, file_name))) {
 		/* Don't reopen a pack we already have. */
 		for (p = data->r->objects->packed_git; p; p = p->next) {
 			size_t len;
@@ -842,6 +841,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 
 	if (!report_garbage)
 		return;
 
+	if (!strcmp(file_name, "multi-pack-index"))
+		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".pack") ||
 	    ends_with(file_name, ".bitmap") ||
-- 
2.18.0.118.gd4f65b8d14
[PATCH 3/9] midx: mark bad packed objects
When an object fails to decompress from a pack-file, we mark the object as 'bad' so we can retry with a different copy of the object (if such a copy exists). Before now, the multi-pack-index did not update the bad objects list for the pack-files it contains, and we did not check the bad objects list when reading an object. Now, do both. Signed-off-by: Derrick Stolee --- midx.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/midx.c b/midx.c index 6824acf5f8..7fa75a37a3 100644 --- a/midx.c +++ b/midx.c @@ -280,6 +280,16 @@ static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry * if (!is_pack_valid(p)) return 0; + if (p->num_bad_objects) { + uint32_t i; + struct object_id oid; + nth_midxed_object_oid(&oid, m, pos); + for (i = 0; i < p->num_bad_objects; i++) + if (!hashcmp(oid.hash, + p->bad_object_sha1 + the_hash_algo->rawsz * i)) + return 0; + } + e->offset = nth_midxed_offset(m, pos); e->p = p; -- 2.18.0.118.gd4f65b8d14
[PATCH 7/9] treewide: use get_all_packs
There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee --- builtin/count-objects.c | 2 +- builtin/fsck.c | 4 ++-- builtin/gc.c | 4 ++-- builtin/pack-objects.c | 14 +++--- builtin/pack-redundant.c | 4 ++-- fast-import.c| 4 ++-- http-backend.c | 4 ++-- pack-bitmap.c| 2 +- pack-objects.c | 2 +- packfile.c | 2 +- server-info.c| 4 ++-- 11 files changed, 23 insertions(+), 23 deletions(-) diff --git a/builtin/count-objects.c b/builtin/count-objects.c index d51e2ce1ec..a7cad052c6 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -123,7 +123,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix) struct strbuf pack_buf = STRBUF_INIT; struct strbuf garbage_buf = STRBUF_INIT; - for (p = get_packed_git(the_repository); p; p = p->next) { + for (p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_local) continue; if (open_pack_index(p)) diff --git a/builtin/fsck.c b/builtin/fsck.c index c96f3f4fcc..184d8e7f4e 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -740,7 +740,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) struct progress *progress = NULL; if (show_progress) { - for (p = get_packed_git(the_repository); p; + for (p = get_all_packs(the_repository); p; p = p->next) { if (open_pack_index(p)) continue; @@ -749,7 +749,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) progress = start_progress(_("Checking objects"), total); } - for (p = get_packed_git(the_repository); p; + for (p = get_all_packs(the_repository); p; p = p->next) { /* verify gives error messages itself */ if (verify_pack(p, fsck_obj_buffer, diff --git a/builtin/gc.c b/builtin/gc.c index 57069442b0..2b592260e9 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ 
-183,7 +183,7 @@ static struct packed_git *find_base_packs(struct string_list *packs, { struct packed_git *p, *base = NULL; - for (p = get_packed_git(the_repository); p; p = p->next) { + for (p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_local) continue; if (limit) { @@ -208,7 +208,7 @@ static int too_many_packs(void) if (gc_auto_pack_limit <= 0) return 0; - for (cnt = 0, p = get_packed_git(the_repository); p; p = p->next) { + for (cnt = 0, p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_local) continue; if (p->pack_keep) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 4391504a91..1a896d7810 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2784,7 +2784,7 @@ static void add_objects_in_unpacked_packs(struct rev_info *revs) memset(&in_pack, 0, sizeof(in_pack)); - for (p = get_packed_git(the_repository); p; p = p->next) { + for (p = get_all_packs(the_repository); p; p = p->next) { struct object_id oid; struct object *o; @@ -2848,7 +2848,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid) struct packed_git *p; p = (last_found != (void *)1) ? last_found : - get_packed_git(the_repository); + get_all_packs(the_repository); while (p) { if ((!p->pack_local || p->pack_keep || @@ -2858,7 +2858,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid) return 1; } if (p == last_found) - p = get_packed_git(the_repository); + p = get_all_packs(the_repository); else p = p->next; if (p == last_found) @@ -2894,7 +2894,7 @@ static void loosen_unused_packed_objects(struct rev_info *revs) uint32_t i; struct object_id oid; - for (p = get_packed_git(the_repository); p; p = p->next) { + for (p = get_all_packs(the_repository); p; p =
[PATCH 1/9] multi-pack-index: provide more helpful usage info
The multi-pack-index builtin has a very simple command-line interface. Instead of simply reporting usage, give the user a hint to why the arguments failed. Reported-by: Eric Sunshine Signed-off-by: Derrick Stolee --- builtin/multi-pack-index.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 6a7aa00cf2..2633efd95d 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -32,16 +32,16 @@ int cmd_multi_pack_index(int argc, const char **argv, opts.object_dir = get_object_directory(); if (argc == 0) - goto usage; + usage_with_options(builtin_multi_pack_index_usage, + builtin_multi_pack_index_options); - if (!strcmp(argv[0], "write")) { - if (argc > 1) - goto usage; + if (argc > 1) { + die(_("too many arguments")); + return 1; + } + if (!strcmp(argv[0], "write")) return write_midx_file(opts.object_dir); - } -usage: - usage_with_options(builtin_multi_pack_index_usage, - builtin_multi_pack_index_options); + die(_("unrecognized verb: %s"), argv[0]); } -- 2.18.0.118.gd4f65b8d14
[PATCH 5/9] midx: fix bug that skips midx with alternates
The logic for constructing the linked list of multi-pack-indexes in the object store is incorrect. If the local object store has a multi-pack-index, but an alternate does not, then the list is dropped. Add tests that would have revealed this bug. Signed-off-by: Derrick Stolee --- midx.c | 11 ++- t/t5319-multi-pack-index.sh | 17 + 2 files changed, 23 insertions(+), 5 deletions(-) diff --git a/midx.c b/midx.c index 7fa75a37a3..0710c4c175 100644 --- a/midx.c +++ b/midx.c @@ -331,7 +331,7 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_name) int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local) { - struct multi_pack_index *m = r->objects->multi_pack_index; + struct multi_pack_index *m; struct multi_pack_index *m_search; int config_value; @@ -339,14 +339,15 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i !config_value) return 0; - for (m_search = m; m_search; m_search = m_search->next) + for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next) if (!strcmp(object_dir, m_search->object_dir)) return 1; - r->objects->multi_pack_index = load_multi_pack_index(object_dir, local); + m = load_multi_pack_index(object_dir, local); - if (r->objects->multi_pack_index) { - r->objects->multi_pack_index->next = m; + if (m) { + m->next = r->objects->multi_pack_index; + r->objects->multi_pack_index = m; return 1; } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index ae1d5d4592..4b6e2825a6 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -149,6 +149,23 @@ test_expect_success 'repack removes multi-pack-index' ' compare_results_with_midx "after repack" +test_expect_success 'multi-pack-index and alternates' ' + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && + echo content1 >file1 && + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && + git cat-file blob $altblob && + git 
rev-list --all +' + +compare_results_with_midx "with alternate (local midx)" + +test_expect_success 'multi-pack-index in an alternate' ' + mv .git/objects/pack/* alt.git/objects/pack +' + +compare_results_with_midx "with alternate (remote midx)" + # usage: corrupt_data [] corrupt_data () { -- 2.18.0.118.gd4f65b8d14
[PATCH 6/9] packfile: add all_packs list
If a repo contains a multi-pack-index, then the packed_git list does not contain the packfiles that are covered by the multi-pack-index. This is important for doing object lookups, abbreviations, and approximating object count. However, there are many operations that really want to iterate over all packfiles. Create a new 'all_packs' linked list that contains this list, starting with the packfiles in the multi-pack-index and then continuing along the packed_git linked list. Signed-off-by: Derrick Stolee --- midx.c | 2 +- midx.h | 1 + object-store.h | 6 ++ packfile.c | 27 +++ packfile.h | 1 + 5 files changed, 36 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 0710c4c175..f3e8dbc108 100644 --- a/midx.c +++ b/midx.c @@ -197,7 +197,7 @@ static void close_midx(struct multi_pack_index *m) FREE_AND_NULL(m->pack_names); } -static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) +int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; diff --git a/midx.h b/midx.h index 8aa79f4b62..a210f1af2a 100644 --- a/midx.h +++ b/midx.h @@ -32,6 +32,7 @@ struct multi_pack_index { }; struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local); +int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, diff --git a/object-store.h b/object-store.h index f9c57e2c26..c69d546392 100644 --- a/object-store.h +++ b/object-store.h @@ -128,6 +128,12 @@ struct raw_object_store { /* A most-recently-used ordered version of the packed_git list. */ struct list_head packed_git_mru; + /* +* A linked list containing all packfiles, starting with those +* contained in the multi_pack_index. +*/ + struct packed_git *all_packs; + /* * A fast, rough count of the number of objects in the repository. 
* These two fields are not meant for direct access. Use diff --git a/packfile.c b/packfile.c index fe713a0242..adcf2e12a0 100644 --- a/packfile.c +++ b/packfile.c @@ -972,6 +972,9 @@ static void prepare_packed_git(struct repository *r) prepare_packed_git_one(r, alt->path, 0); } rearrange_packed_git(r); + + r->objects->all_packs = NULL; + prepare_packed_git_mru(r); r->objects->packed_git_initialized = 1; } @@ -995,6 +998,30 @@ struct multi_pack_index *get_multi_pack_index(struct repository *r) return r->objects->multi_pack_index; } +struct packed_git *get_all_packs(struct repository *r) +{ + prepare_packed_git(r); + + if (!r->objects->all_packs) { + struct packed_git *p = r->objects->packed_git; + struct multi_pack_index *m; + + for (m = r->objects->multi_pack_index; m; m = m->next) { + uint32_t i; + for (i = 0; i < m->num_packs; i++) { + if (!prepare_midx_pack(m, i)) { + m->packs[i]->next = p; + p = m->packs[i]; + } + } + } + + r->objects->all_packs = p; + } + + return r->objects->all_packs; +} + struct list_head *get_packed_git_mru(struct repository *r) { prepare_packed_git(r); diff --git a/packfile.h b/packfile.h index 786686..3b90e2864c 100644 --- a/packfile.h +++ b/packfile.h @@ -51,6 +51,7 @@ extern void install_packed_git(struct repository *r, struct packed_git *pack); struct packed_git *get_packed_git(struct repository *r); struct list_head *get_packed_git_mru(struct repository *r); struct multi_pack_index *get_multi_pack_index(struct repository *r); +struct packed_git *get_all_packs(struct repository *r); /* * Give a rough count of objects in the repository. This sacrifices accuracy -- 2.18.0.118.gd4f65b8d14
[PATCH 0/9] multi-pack-index cleanups
This series is based on ds/multi-pack-index and jk/for-each-object-iteration. The multi-pack-index indexes objects across multiple pack-files. To speed up object lookups and abbreviations, we do not place the pack-files covered by the multi-pack-index into the packed_git linked list or the packed_git_mru list. Existing test coverage focused on typical uses and the main consumers of the multi-pack-index. To better understand the implications of the multi-pack-index with other scenarios, I ran the test suite after adding a step to 'git repack' to write a multi-pack-index, and to default core.multiPackIndex to 'true'. This commit is available as [1]. The following issues were discovered, and are fixed by this series:
1. The multi-pack-index did not distinguish between local and non-local pack-files.
2. A bad packed object was not inspected by object lookups in the multi-pack-index, so would loop infinitely trying to load the same object.
3. 'git count-objects --verbose' would not see the objects in the multi-pack-index and would report the multi-pack-index as garbage.
4. If the local object directory had a multi-pack-index but an alternate did not, then the multi-pack-index would be dropped.
5. If the multi-pack-index covered a pack-file that was paired with a reachability bitmap, then that bitmap would not be loaded.
Several issues were resolved simply by making a new 'all_packs' list in the object store and replacing get_packed_git() calls with get_all_packs() calls. The all_packs list is a linked list that starts with the pack-files in multi-pack-indexes and then continues along the packed_git linked list. Also: I simplified the usage reports in 'git multi-pack-index' to help users who are entering parameters incorrectly. [1] https://github.com/derrickstolee/git/commit/098dd1d515b592fb165a276241d7d68d1cde0036 DO-NOT-MERGE: compute multi-pack-index on repack.
I will send this commit as a separate patch so we can see the change I made and the one test I needed to fix (because it moves a pack-file, thereby making the multi-pack-index invalid). [2] https://github.com/derrickstolee/git/pull/9 A GitHub pull request containing this series. Derrick Stolee (9): multi-pack-index: provide more helpful usage info multi-pack-index: store local property midx: mark bad packed objects midx: stop reporting garbage midx: fix bug that skips midx with alternates packfile: add all_packs list treewide: use get_all_packs midx: test a few commands that use get_all_packs pack-objects: consider packs in multi-pack-index builtin/count-objects.c | 2 +- builtin/fsck.c | 4 ++-- builtin/gc.c| 4 ++-- builtin/multi-pack-index.c | 16 +++--- builtin/pack-objects.c | 42 +-- builtin/pack-redundant.c| 4 ++-- fast-import.c | 4 ++-- http-backend.c | 4 ++-- midx.c | 32 ++- midx.h | 7 -- object-store.h | 6 + pack-bitmap.c | 2 +- pack-objects.c | 2 +- packfile.c | 40 - packfile.h | 1 + server-info.c | 4 ++-- t/helper/test-read-midx.c | 2 +- t/t5319-multi-pack-index.sh | 44 ++--- 18 files changed, 168 insertions(+), 52 deletions(-) -- 2.18.0.118.gd4f65b8d14
Re: [PATCH v3 4/7] submodule--helper: add a new 'config' subcommand
On Tue, 14 Aug 2018 10:10:58 -0700 Brandon Williams wrote: > On 08/14, Antonio Ospite wrote: > > Add a new 'config' subcommand to 'submodule--helper'; this extra level > > of indirection makes it possible to add some flexibility to how the > > submodules configuration is handled. > > > > Signed-off-by: Antonio Ospite > > --- > > builtin/submodule--helper.c | 14 ++ > > > new | 0 > > Looks like you may have accidentally left in an empty file "new"; it should > probably be removed from this commit before it gets merged. > Yeah, I had added it to test "git cat-file -e new" for a later patch and then I must have messed up some rebase. Thanks for pointing it out. Ciao, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up
On Tue, 14 Aug 2018 13:16:38 -0700 Junio C Hamano wrote: > Antonio Ospite writes: > [...] > > test_expect_success 'error message contains blob reference' ' > > + # Remove the error introduced in the previous test. > > + # It is not needed in the following tests. > > + test_when_finished "git -C super reset --hard HEAD^" && > > Hmm, that is ugly. Depending on where in the subshell the previous > test failed, you'd still be taking us to an unexpected place. > Imagine if "git commit -m 'add error'" failed, for example, in the > test before this one. > > I am wondering if the proper fix is to merge the previous one and > this one into a single test. The combined test would > > - remember where the HEAD in super is and arrange to come back > to it when test is done > - break .gitmodules and commit it > - run test-tool and check its output > - also check its error output > > in a single test_expect_success. > I will try that. > > @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' ' > > ' > > > > test_expect_success 'error in history in fetchrecursesubmodule lets > > continue' ' > > + test_when_finished "git -C super reset --hard HEAD^" && > > (cd super && > > git config -f .gitmodules \ > > submodule.submodule.fetchrecursesubmodules blabla && > > @@ -134,8 +138,7 @@ test_expect_success 'error in history in > > fetchrecursesubmodule lets continue' ' > > HEAD b \ > > HEAD submodule \ > > >actual && > > - test_cmp expect_error actual && > > - git reset --hard HEAD^ > > + test_cmp expect_error actual > > ) > > ' > > If we want to be more robust, you'd probably need to find a better > anchoring point than HEAD, which can be pointing different commit > depending on where in the subshell the process is hit with ^C, > i.e. > > ORIG=$(git -C super rev-parse HEAD) && > test_when_finished "git -C super reset --hard $ORIG" && > ( > cd super && > ... > I see, ORIG is set and evaluated immediately but the value will be used only at a later time. 
I remember that you raised concerns also in the previous review round but I didn't quite get what you meant; now I think I do. > The patch is still an improvement compared to the current code, > where a broken test-tool that does not produce expected output in > the file 'actual' is guaranteed to leave us at a commit that we do > not expect to be at, but not entirely satisfactory. I can do a v4 with these fixes since there are also some comments about other patches. Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it
Re: [PATCH/RFC] commit: new option to abort -a something is already staged
Nguyễn Thái Ngọc Duy writes: > +commit.preciousDirtyIndex:: > + If some changes are partially staged in the index (i.e. > + "git commit -a" and "git commit" commit different changes), reject > + "git commit -a". Or perhaps better yet, go into yes/no interaction to confirm if you have access to interactive terminal at fd #0/#1, perhaps? > diff --git a/builtin/commit.c b/builtin/commit.c > index 213fca2d8e..489e4b9f50 100644 > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -98,6 +98,7 @@ static const char *author_message, *author_message_buffer; > static char *edit_message, *use_message; > static char *fixup_message, *squash_message; > static int all, also, interactive, patch_interactive, only, amend, signoff; > +static int allow_dirty_index = 1; > static int edit_flag = -1; /* unspecified */ > static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship; > static int config_commit_verbose = -1; /* unspecified */ > @@ -385,10 +386,24 @@ static const char *prepare_index(int argc, const char > **argv, const char *prefix > * (B) on failure, rollback the real index. > */ > if (all || (also && pathspec.nr)) { > + int compare_oid = all && !allow_dirty_index; > + struct object_id previous_oid; > + > + if (compare_oid) { > + if (update_main_cache_tree(0) || !the_index.cache_tree) > + die(_("error building trees")); Hmph, when we conclude a conflicted merge with "commit -a", wouldn't we fail to compute the "before" picture, with higher-stage entries still in the index? > + if (the_index.cache_tree->entry_count >= 0) > + oidcpy(&previous_oid, &the_index.cache_tree->oid); > + else > + oidclr(&previous_oid); The cache-tree covers no entry, meaning the index has no cache entries? Shouldn't we be setting EMPTY_TREE_SHA1_BIN or something instead? > + } > hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR); > add_files_to_cache(also ?
prefix : NULL, &pathspec, 0); > refresh_cache_or_die(refresh_flags); > update_main_cache_tree(WRITE_TREE_SILENT); > + if (compare_oid && the_index.cache_tree && > + oidcmp(&previous_oid, &the_index.cache_tree->oid)) > + die(_("staged content is different, aborting")); I was hoping that it is an easy change to teach add_files_to_cache() to report how many paths it actually has "added" (modifications and removals are of course also counted), which allows us not to waste computing a throw-away tree object once more. There only are three existing callers to the function, and only one of them relies on the current "non-zero is error, zero is good" semantics, so updating that caller (and perhaps vetting the other callers to see if they also _should_ pay attention to the return value, at least to detect errors, if not the number of paths updated) shouldn't be that much more effort, and would be a good update to the API regardless of this new feature, I would think. > if (write_locked_index(&the_index, &index_lock, 0)) > die(_("unable to write new_index file")); > commit_style = COMMIT_NORMAL; > @@ -1413,6 +1428,10 @@ static int git_commit_config(const char *k, const char > *v, void *cb) > config_commit_verbose = git_config_bool_or_int(k, v, &is_bool); > return 0; > } > + if (!strcmp(k, "commit.preciousdirtyindex")) { > + allow_dirty_index = !git_config_bool(k, v); > + return 0; > + } > > status = git_gpg_config(k, v, NULL); > if (status)
Re: "less -F" is broken
On 8/16/2018 10:10 AM, Linus Torvalds wrote: One option that I didn't try to go for - because I just don't know the less code base well enough - is to basically make the behavior of '-F' be something like this: - as long as all the lines are short and well-behaved, and we haven't seen enough lines to fill the screen, act like 'cat' and just feed them through - when you fill the screen (or when you hit some other condition that makes you go "now I won't exit" - that could be a long line, but maybe it could also be the user giving keyboard input for a less command?) you send the init sequence and just redraw the whole screen. I'm not sure that this would be a very nice user experience. On a terminal where the init sequence opens an alternate screen, some lines from the start of the file would be printed in the main screen, and then the whole file would be viewed in the alternate screen. After exiting less, the user would see his main screen with some lines from the first page of the file displayed and then cut off at a seemingly arbitrary point. Seems like that could be confusing and annoying. But let's say that that was all the user was interested in, and the user presses 'q' to quit less. That doesn't work at all - it will wait for that full ten seconds. That actually happens even without -F too. Wouldn't it be good to react to things like searches to highlight something (and to 'quit' for the 'never mind, already got it' case) even if there isn't enough data to fill the whole screen yet? That said, ^C works, and this is not new behavior, so I'm just throwing this out as a "maybe a different approach would fix _both_ the -F behavior _and_ the above traditional issue"? This issue is, as you say, not related to the -F issue, but arises because less doesn't have a way to be reading a file and simultaneously react to terminal key presses. When I first wrote less, there was no easy way to do this in Unix.
Less also runs on other OSes which don't provide this functionality. The best I was able to do was to allow ctrl-C to interrupt the read. Of course in a modern OS that has select() or similar functionality this could be implemented, but I think it would require some largish changes to the architecture. (Or maybe not; I haven't really investigated this in detail.) BTW, your first message seems to indicate that you didn't find the less project on github. It's at https://github.com/gwsw/less (mentioned in the README). The latest version (v535) has the -F change implemented, but I haven't yet released this for beta testing. --Mark
[PATCH/RFC] commit: new option to abort -a something is already staged
So many times I have carefully cherry picked changes to the index with `git add -p` then accidentally did `git commit -am ` (usually by retrieving a command from history and pressing Enter too quickly) which destroyed beautiful index. One way to deal with this is some form of `git undo` that allows me to retrieve the old index. That's not a lot of work by itself. The problem is designing that `git undo` interface because there are more undo options that this. Instead, let's handle just this problem for now. This new option commit.preciousDirtyIndex, if set to false, will not allow `commit -a` to continue if the final index is different from the existing one. I don't think this can be achieved with hooks because the hooks don't know about "-a" or different commit modes. Or is there a better way to handle this? Signed-off-by: Nguyễn Thái Ngọc Duy --- Documentation/config.txt | 5 + builtin/commit.c | 19 +++ 2 files changed, 24 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 95ad715a44..3937681ee9 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1417,6 +1417,11 @@ commit.gpgSign:: convenient to use an agent to avoid typing your GPG passphrase several times. +commit.preciousDirtyIndex:: + If some changes are partially staged in the index (i.e. + "git commit -a" and "git commit" commit different changes), reject + "git commit -a". 
+ commit.status:: A boolean to enable/disable inclusion of status information in the commit message template when using an editor to prepare the commit diff --git a/builtin/commit.c b/builtin/commit.c index 213fca2d8e..489e4b9f50 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -98,6 +98,7 @@ static const char *author_message, *author_message_buffer; static char *edit_message, *use_message; static char *fixup_message, *squash_message; static int all, also, interactive, patch_interactive, only, amend, signoff; +static int allow_dirty_index = 1; static int edit_flag = -1; /* unspecified */ static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship; static int config_commit_verbose = -1; /* unspecified */ @@ -385,10 +386,24 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix * (B) on failure, rollback the real index. */ if (all || (also && pathspec.nr)) { + int compare_oid = all && !allow_dirty_index; + struct object_id previous_oid; + + if (compare_oid) { + if (update_main_cache_tree(0) || !the_index.cache_tree) + die(_("error building trees")); + if (the_index.cache_tree->entry_count >= 0) + oidcpy(&previous_oid, &the_index.cache_tree->oid); + else + oidclr(&previous_oid); + } hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR); add_files_to_cache(also ?
prefix : NULL, &pathspec, 0); refresh_cache_or_die(refresh_flags); update_main_cache_tree(WRITE_TREE_SILENT); + if (compare_oid && the_index.cache_tree && + oidcmp(&previous_oid, &the_index.cache_tree->oid)) + die(_("staged content is different, aborting")); if (write_locked_index(&the_index, &index_lock, 0)) die(_("unable to write new_index file")); commit_style = COMMIT_NORMAL; @@ -1413,6 +1428,10 @@ static int git_commit_config(const char *k, const char *v, void *cb) config_commit_verbose = git_config_bool_or_int(k, v, &is_bool); return 0; } + if (!strcmp(k, "commit.preciousdirtyindex")) { + allow_dirty_index = !git_config_bool(k, v); + return 0; + } status = git_gpg_config(k, v, NULL); if (status) -- 2.18.0.1003.g5e2e2c8169
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Han-Wen Nienhuys writes: >> @@ -84,6 +86,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, >> const char *src, int n) >> for (i = 0; i < ARRAY_SIZE(keywords); i++) { >> struct keyword_entry *p = keywords + i; >> int len = strlen(p->keyword); >> + >> + if (n <= len) >> + continue; > > I would suggest > > if (n < len) continue; > .. > if (!strncasecmp(p->keyword, src, len) && (n == len || !isalnum(src[len]))) { > > so we colorize a single line that looks like "warning" as well That's the kind of thing I would have mentioned in the initial review of the feature, and I do not think it is a bad idea. I do think it is a bad idea to roll it into this patch, though.
Re: [PATCH v2 05/11] t0027: make hash size independent'
On 20.08.18 00:10, Eric Sunshine wrote: > On Sun, Aug 19, 2018 at 5:57 PM brian m. carlson > wrote: >> On Sun, Aug 19, 2018 at 04:10:21PM -0400, Eric Sunshine wrote: >>> On Sun, Aug 19, 2018 at 1:54 PM brian m. carlson - tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" && + tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp+" && >>> >>> My immediate thought upon reading this was whether "+" is valid in >>> Windows filenames. Apparently, it is, but perhaps (if you re-roll) it >>> would make sense to use a character less likely to cause brain >>> hiccups; for instance, "exp0'. >> >> The reason I picked that is that we use it quite a bit in the Makefile, >> so it seemed like an obvious choice for a temporary file name. If you >> feel strongly I can pick something else, but I thought it would be >> reasonably intuitive for other developers. Maybe I was wrong, though. > > I don't feel strongly about it. My brain tripped over it probably > because it's not an idiom in Git tests. In fact, I see just one place > where "+" has been used like this, in t6026. > Probably "tmp" is a better name than "exp+" (Why the '+' ? Is it better than the non-'+' ?) Anyway, if we re-order a little bit, can we use a simple '|' instead ? tr '\015\000abcdef0123456789' QN0 <"$2" | sed -e "s/+/$ZERO_OID/" >"$exp" && tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Jeff King writes: >> In the longer term we may want to accept size_t as parameter for >> clarity (even though we know that a sideband message we are painting >> typically would fit on a line on a terminal and int is sufficient). >> Write it down as a NEEDSWORK comment. > > This "typically" made me nervous about what happens in the non-typical > case, but I think we can say something even stronger: the length comes > from a pktline, so the maximum is less than 16 bits. I wondered if we > might ever call this on the accumulated string from multiple sidebands, > but it doesn't look like it. I think a sideband packet may be split on one but not the other codepath to result in multiple calls, but I do not think we coalesce them together, so I agree that int is sufficient.
Re: [PATCH v3] checkout: optimize "git checkout -b "
On 8/18/2018 9:44 PM, Elijah Newren wrote: On Fri, Aug 17, 2018 at 5:41 AM Ben Peart wrote: On 8/16/2018 2:37 PM, Duy Nguyen wrote: On Thu, Aug 16, 2018 at 8:27 PM Ben Peart wrote: From: Ben Peart Skip merging the commit, updating the index and working directory if and only if we are creating a new branch via "git checkout -b ." Any other checkout options will still go through the former code path. If sparse_checkout is on, require the user to manually opt in to this optimized behavior by setting the config setting checkout.optimizeNewBranch to true as we will no longer update the skip-worktree bit in the index, nor add/remove files in the working directory to reflect the current sparse checkout settings. For comparison, running "git checkout -b " on a large repo takes: 14.6 seconds - without this patch 0.3 seconds - with this patch I still don't think we should do this. If you want lightning fast branch creation, just use 'git branch'. From the timing breakdown you showed in the other thread it looks like sparse checkout still takes seconds, which could be optimized (or even excluded, I mentioned this too). And split index (or something similar if you can't use it) would give you savings across the board. There is still one idea Elijah gave me that should further lower traverse_trees() cost. We have investigated some of these already - split index ended up slowing things down more than it sped them up due to the higher compute costs. Sparse checkout we've already optimized significantly - limiting the patterns we accept so that we can do the lookup via a hashmap instead of the robust pattern matching. We will continue to look for other optimizations and appreciate any and all ideas! In the end, this optimization makes a huge performance improvement by avoiding doing a lot of work that isn't necessary. Taking a command from 14+ seconds to sub-second is just too much of a win for us to ignore. But anyway, it's not my call. I'll stop here.
It's even less of my call, but since things seem to be stuck in what-should-we-do state (as per Junio's comments on this patch in the last two "What's cooking" emails), and since Ben and Duy obviously have opposite opinions on Ben's patch, let's see if I might be able to help at all. Here's my summary and my findings:

== The pain ==

- For repositories with a really large number of entries (500K as Ben says), some operations are much slower than it feels like they should be.

- This does not seem to be GVFS-specific in any way; I can duplicate the slowness with a simple git-bomb[1]-like repo that has a sparse checkout pattern ignoring the "bomb" side. (It has 1M+1 entries in the index, and .git/info/sparse-checkout ignores the 1M so the working copy only has 1 entry). The timings on my repo for "git checkout -b $NEWBRANCH" are almost exactly double what Ben reports he gets on their repo.

[1] https://kate.io/blog/making-your-own-exploding-git-repos/

== Short term solutions ==

- Alternative git commands exist today to do a fast checkout of a new branch in a huge repo. I also get sub-second timings in my even-bigger repo with this:

    git branch $NEWBRANCH && git symbolic-ref HEAD $NEWBRANCH

  But I do understand that wrapping this into a script or executable (git-fast-new-branch?) and asking users to use it is a usability problem and an uphill battle. (Sidenote: this isn't quite the same operation; it's missing a reflog update. The -m option to symbolic-ref doesn't seem to help; I guess the fact that HEAD's sha1sum is not changing is viewed as not an update? However, that could also be scripted around.)

- Ben's patch successfully drops the time for "git checkout -b $NEWBRANCH" from 26+ seconds (in my cooked-up testcase) to sub-second (in fact, under .1 seconds for me). That's a _huge_ win.

== unpack_trees optimization notes ==

- Ben's patch is extremely focused. It only affects "git checkout -b $NEWBRANCH".
  If someone runs "git branch $NEWBRANCH && git checkout $NEWBRANCH", they get the old 26+ second timing. They also suddenly get the really long timing if they add any other flags or check out a commit that differs in only a single entry in the entire tree. It would be nice if we did general optimization for all issues rather than just special-casing such narrow cases.

- However, optimizing unpack_trees is hard. It's really easy to get lost trying to look at the code. Time has been spent trying to optimize it. Ben really likes the speedup factors of 2-3 that Duy has produced. But he's pessimistic we'll find enough to bridge the gap for this case. And he's worried about breaking unrelated stuff due to the complexity of unpack_trees.

- Duy is pretty sure we can optimize unpack_trees in at least one more way. I've tried looking through the code and think there are others, but then again I'm known to get lost and confused in unpack_trees.

== The patch ==

- Ben's patch only affects the "checkout -b $NEWBRANCH" case. He checks for it
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
There were many instances in this file where it seemed like BUG would be better, so I created a new commit before this one to switch them over. The interdiff is below.

BTW, why are there so many instances of "die" without "_"? I expect all errors that may be caused by a user to be localized. I'm going by the output of this:

    grep -IrE '\Wdie\([^_]' --exclude-dir=t

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 8e3caf5bf..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 	case LOFS_BLOB:
@@ -151,8 +149,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -439,7 +435,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
Re: Git Project Leadership Committee
On Fri, Aug 17, 2018 at 12:41 AM, Jeff King wrote:
> So here are the nominations I came up with. If you'd like to nominate
> somebody else (or yourself!), please do. If you have opinions, let me
> know (public or private, as you prefer).
>
> - Christian Couder
> - Ævar Arnfjörð Bjarmason

Thanks for nominating both!

> Both are active, have been around a long time, and have taken part in
> non-code activities and governance discussions. My understanding is that
> Christian freelances on Git, which doesn't quite fit my "volunteer
> representative" request, but I think contracting on Git is its own
> interesting perspective to represent (and certainly he spent many years
> on the volunteer side).

Yeah, I have been freelancing since October 2015, mostly for GitLab, Booking.com and Protocol Labs, as can be seen on my LinkedIn profile: https://www.linkedin.com/in/christian-couder-569a731/

I feel lucky to be treated mostly like a regular employee, especially by GitLab and Protocol Labs. Both of these companies employ a high ratio of remote developers from around the world, who often have some kind of freelance legal status, so they give them as much as possible the same kind of perks or incentives (like stock options) as regular employees.

GitLab is a very open and transparent company. The way it works is described in detail in its Handbook (https://about.gitlab.com/handbook/). Its informal policy regarding Git has been to use regular released versions of Git in GitLab. If possible GitLab should use a recent version of Git to benefit from the latest improvements, though it should be compatible with old versions of Git, as this can be useful for example to people who want to build GitLab from source on top of a regular Linux distro that comes with an old Git. So for GitLab my work on Git has to be integrated upstream.
I have been working on remote odb related things, which I haven't managed to get merged yet, and on a few other small things like delta islands, for which things have been going better so far. I also do some Git support at GitLab (for Git users, GitLab developers, customers, sales people, ...). I am sponsored by them to participate in or give presentations at conferences (like FOSSASIA 2017, GSoC Mentor Summit, Bloomberg Hackathon, Git Merge, GitLab Summit, ...). And sometimes I do other marketing, security, developer relations or sales related things (like meeting a few French customers).

Ævar already talked in detail about Booking.com and my work for them.

I have been working much less for Protocol Labs than for GitLab or Booking.com since I started working for GitLab around 2 years ago. As with Git, I had started working in my free time on IPFS (https://ipfs.io/) before I became a freelancer working on it. So for Protocol Labs I have been using Sharness (https://github.com/chriscool/sharness/, which was created in 2011 by extracting t/test-lib.sh from Git) to add and maintain end-to-end tests for go-IPFS and other IPFS related projects. Around one year ago Protocol Labs made a successful ICO (Initial Coin Offering) for Filecoin (https://filecoin.io/) and since then things have become a bit more like in a regular company (which is not necessarily bad).

I have also had a few consulting contracts from various French companies, for a few days each, about consulting on or teaching Git/GitLab.
Re: [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree
On 8/18/2018 10:41 AM, Nguyễn Thái Ngọc Duy wrote:

In order to merge one or many trees with the index, unpack-trees code walks multiple trees in parallel with the index and performs n-way merge. If we find out at the start of a directory that all trees are the same (by comparing OID) and cache-tree happens to be available for that directory as well, we could avoid walking the trees because we already know what these trees contain: it's flattened in what's called "the index".

The upside is of course a lot less I/O since we can potentially skip lots of trees (think subtrees). We also save CPU because we don't have to inflate and apply the deltas. The downside is of course more fragile code since the logic in some functions is now duplicated elsewhere.

"checkout -" with this patch on webkit.git (275k files):

    baseline      new
    0.056651714   0.080394752   s: read cache .git/index
    0.183101080   0.216010838   s: preload index
    0.008584433   0.008534301   s: refresh index
    0.633767589   0.251992198   s: traverse_trees
    0.340265448   0.377031383   s: check_updates
    0.381884638   0.372768105   s: cache_tree_update
    1.401562947   1.045887251   s: unpack_trees
    0.338687914   0.314983512   s: write index, changed mask = 2e
    0.411927922   0.062572653   s: traverse_trees
    0.23335       0.22544       s: check_updates
    0.423697246   0.073795585   s: unpack_trees
    0.423708360   0.073807557   s: diff-index
    2.559524127   1.938191592   s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k trees (on the whole series):

    baseline      new
    0.535510167   0.556558733   s: read cache .git/index
    0.3057373     0.3147105     s: initialize name hash
    0.0184082     0.023558433   s: preload index
    0.086910967   0.089085967   s: refresh index
    7.889590767   2.191554433   s: unpack trees
    0.120760833   0.131941267   s: update worktree after a merge
    2.2583504     2.572663167   s: repair cache-tree
    0.8916137     0.959495233   s: write index, changed mask = 28
    3.405199233   0.2710663     s: unpack trees
    0.000999667   0.0021554     s: update worktree after a merge
    3.4063306     0.273318333   s: diff-index
    16.9524923    9.462943133   s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time a 2-way merge and the second a 1-way merge. In both, "unpack trees" time is reduced to one third. Overall time reduction is not that impressive of course because index operations take a big chunk. And there's that repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this code. We do invalidate cache-tree in the _source_ index when we add new entries to the (temporary) "result" index. But we also use the cache-tree from the source index in this optimization. Does this mean we end up having no cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and invalidating it matters. In this case we check for a good cache-tree first in all_trees_same_as_cache_tree(), then we start to merge things and potentially invalidate that same cache-tree in the process. Since cache-tree invalidation happens after the optimization kicks in, we're still good. But we may lose that cache-tree at the very first call_unpack_fn() call in traverse_by_cache_tree().
Signed-off-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Junio C Hamano
---
 unpack-trees.c | 127 +
 1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 6d9f692ea6..8376663b59 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -635,6 +635,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (!o->merge || dirmask != ((1 << n) - 1))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
On Sat, Aug 18, 2018 at 6:16 PM Junio C Hamano wrote:
> > Actually, let's just lose the conditional. strbuf_add() would catch
> > and issue an error message when it notices that we fed negative
> > count anyway ;-).
>
> So I'll have this applied on top of the original topic to prevent a
> buggy version from escaping the lab.
>
> -- >8 --
> Subject: [PATCH] sideband: do not read beyond the end of input
>
> The caller of maybe_colorize_sideband() gives a counted buffer
> , but the callee checked src[] as if it were a NUL terminated
> buffer. If src[] had all isspace() bytes in it, we would have made
> n negative, and then
>
> (1) made number of strncasecmp() calls to see if the remaining
>     bytes in src[] matched keywords, reading beyond the end of the
>     array (this actually happens even if n does not go negative),
>     and/or
>
> (2) called strbuf_add() with negative count, most likely triggering
>     the "you want to use way too much memory" error due to unsigned
>     integer overflow.
>
> Fix both issues by making sure we do not go beyond [n].
>
> In the longer term we may want to accept size_t as parameter for
> clarity (even though we know that a sideband message we are painting
> typically would fit on a line on a terminal and int is sufficient).
> Write it down as a NEEDSWORK comment.
>
> Helped-by: Jonathan Nieder
> Signed-off-by: Junio C Hamano
> ---
>  sideband.c                          |  8 ++--
>  t/t5409-colorize-remote-messages.sh | 14 ++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/sideband.c b/sideband.c
> index 1c6bb0e25b..368647acf8 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -65,6 +65,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref
>   * Optionally highlight one keyword in remote output if it appears at the start
>   * of the line. This should be called for a single line only, which is
>   * passed as the first N characters of the SRC array.
> + *
> + * NEEDSWORK: use "size_t n" instead for clarity.
>   */
>  static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  {
> @@ -75,7 +77,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  		return;
>  	}
>
> -	while (isspace(*src)) {
> +	while (0 < n && isspace(*src)) {
>  		strbuf_addch(dest, *src);
>  		src++;
>  		n--;
> @@ -84,6 +86,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  	for (i = 0; i < ARRAY_SIZE(keywords); i++) {
>  		struct keyword_entry *p = keywords + i;
>  		int len = strlen(p->keyword);
> +
> +		if (n <= len)
> +			continue;

I would suggest

	if (n < len)
		continue;
	..
	if (!strncasecmp(p->keyword, src, len) &&
	    (n == len || !isalnum(src[len]))) {

so we colorize a single line that looks like "warning" as well.

Other than that, LGTM.

--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
and, thanks for cleaning up after me :) -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
[no subject]
Greetings to you, my name is Mrs. Charity Edward. I am of French nationality, a widow, currently hospitalized because of cancer. Meanwhile I have decided to give money to you, as a trustworthy person who will use this money wisely, 3,800,000 million euros, to help the poor and less privileged. So if you are willing to accept this offer, and to do exactly as I instruct, then get back to me for further details. Mrs. Charity Edward Bam.
Do not raise conflict when code in a patch was already added
So, the steps to reproduce below are rather full of trivia like setting up a repo, but the TL;DR is: upon using `git rebase -i HEAD~1` and then `git add -p` to add part of a "hunk" as one commit, and then using `git rebase --continue` so the other part of the hunk would be left in the top commit, git raises a conflict.

Notably, the content of one of the inserted conflict markers is empty, so all you have to do is remove the markers, use `git add` on the file, and then `git rebase --continue`. That is a lot of unnecessary actions; git could just figure out that the code it sees in the patch is already there, being part of another commit. Maybe git could issue a warning, or question the user interactively (y/n); but raising a conflict IMO is unnecessary.

# Steps to reproduce

In an empty dir execute:

$ git init
Initialized empty Git repository in /tmp/test/.git/
$ touch test
$ git add test
$ git commit
[master (root-commit) a7ce543] 1st commit
 1 file changed, 2 insertions(+)
 create mode 100644 test
$ echo -e "foo\nbar" > test    # content you'll want to break
$ git add -u && git commit
[detached HEAD 9e28331] 2-nd commit
 1 file changed, 2 insertions(+)
$ git rebase -i --root
Stopped at a7ce543...  1st commit
You can amend the commit now, with
  git commit --amend
Once you are satisfied with your changes, run
  git rebase --continue

Put "edit" for the 2-nd commit

$ git reset HEAD^
Unstaged changes after reset:
M	test
$ git add -p
diff --git a/test b/test
index e69de29..3bd1f0e 100644
--- a/test
+++ b/test
@@ -0,0 +1,2 @@
+foo
+bar
Stage this hunk [y,n,q,a,d,e,?]? e

╭─constantine@constantine-N61Ja /tmp/test ‹node-› ‹› (e721fa3*)
╰─$ git commit
[detached HEAD 27b2f63] add foo
 1 file changed, 1 insertion(+)
╭─constantine@constantine-N61Ja /tmp/test ‹node-› ‹› (27b2f63*)
╰─$ git rebase --continue
test: needs update
You must edit all merge conflicts and then
mark them as resolved using git add

What happened: it's obvious that the hunk was broken into multiple commits, and git should figure that out and not raise a conflict.

Side note: for some reason in the test git didn't insert conflict markers. It did in the real-world use case though, and there was simply no content inside one of them.
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On 17/08/2018 23:44, Junio C Hamano wrote:
> Here are the topics that have been cooking. Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'. The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.
>
> * pw/rebase-i-author-script-fix (2018-08-07) 2 commits
>  - sequencer: fix quoting in write_author_script
>  - sequencer: handle errors from read_author_ident()
>
>  Recent "git rebase -i" update started to write bogusly formatted
>  author-script, with a matching broken reading code. These are
>  being fixed.
>
>  Undecided.
>  Is it the list consensus to favor this "with extra code, read the
>  script written by bad writer" approach?

I think there was agreement between Eric and me on the last version; I'm not sure anyone else has expressed an opinion.

The problem with fixing the quoting without any backwards compatibility is that if git is upgraded while a rebase is stopped, read_author_script() will happily use the broken quoting to create a corrupted author name in the new commit if the name contains "'". The compatibility code in the latest version relies on the missing "'" at the end of the GIT_AUTHOR_DATE line, which is fixed by es/rebase-i-author-script-fix, now in master. If there is a release with es/rebase-i-author-script-fix but not pw/rebase-i-author-script-fix we'll have to rethink, as the detection won't be reliable.

I have a branch that fixes read_author_script() to use sq_dequote() at https://github.com/phillipwood/git/commits/wip/fix-author-script. At the moment it has compatibility with broken quoting, but I could strip that out; then sq_dequote() will return an error on the broken quoting and the user would have to restart the rebase. So one option is to drop this series and wait for me to finish the improved solution next month.
> > * pw/add-p-select (2018-07-26) 4 commits > - add -p: optimize line selection for short hunks > - add -p: allow line selection to be inverted > - add -p: select modified lines correctly > - add -p: select individual hunk lines > > "git add -p" interactive interface learned to let users choose > individual added/removed lines to be used in the operation, instead > of accepting or rejecting a whole hunk. > > Will hold. > cf. > I found the feature to be hard to explain, and may result in more > end-user complaints, but let's see. Thanks, I'll send some follow up patches to improve the help and documentation next month. Best Wishes Phillip
Re: Antw: Re: non-smooth progress indication for git fsck and git gc
Hi!

Here are some stats from the repository. First the fast-import ones (which had good performance, but probably all cached, also):

% git fast-import <../git-stream
/usr/lib/git/git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:      55000
Total objects:        51959 (    56 duplicates                  )
      blobs  :        47991 (     0 duplicates     0 deltas of     0 attempts)
      trees  :         3946 (    56 duplicates   994 deltas of  3929 attempts)
      commits:           22 (     0 duplicates     0 deltas of     0 attempts)
      tags   :            0 (     0 duplicates     0 deltas of     0 attempts)
Total branches:          15 (    15 loads     )
      marks:          1048576 ( 48013 unique    )
      atoms:          43335
Memory total:          9611 KiB
       pools:          7033 KiB
     objects:          2578 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =       1780
pack_report: pack_mmap_calls          =         23
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =    2905751 /    2905751
---------------------------------------------------------------------

Then the output from git-sizer:

Processing blobs: 47991
Processing trees: 3946
Processing commits: 22
Matching commits to trees: 22
Processing annotated tags: 0
Processing references: 15

| Name                      | Value     | Level of concern |
| ------------------------- | --------- | ---------------- |
| Overall repository size   |           |                  |
| * Blobs                   |           |                  |
|   * Total size            |  13.7 GiB | *                |
|                           |           |                  |
| Biggest objects           |           |                  |
| * Trees                   |           |                  |
|   * Maximum entries [1]   |    13.4 k | *                |
| * Blobs                   |           |                  |
|   * Maximum size [2]      |   279 MiB | *                |
|                           |           |                  |
| Biggest checkouts         |           |                  |
| * Maximum path depth [3]  |        10 | *                |
| * Maximum path length [3] |     130 B | *                |
| * Total size of files [3] |  8.63 GiB | *                |

[1] b701345cbd4317276568b9d9890fd77f38933bdc (refs/heads/master:Resources/default/CIFP)
[2] 19f54c4a7595667329c1be23200234f0fe50ac56 (refs/heads/master:Resources/default/apt.dat)
[3] b0e3d3a2b7f2504117408f13486c905a8eb8fb1e (refs/heads/master^{tree})

Some notes:
[1] is a directory with many short (typically < 1kB) text files
[2] is a very large text file with many changes
[3] Yes, some paths are long

Regards,
Ulrich

>>> Ævar Arnfjörð Bjarmason schrieb am 20.08.2018 um 10:57 in Nachricht
<87woslpg9i@evledraar.gmail.com>: > On Mon, Aug 20 2018, Ulrich Windl wrote: > > Jeff King schrieb am 16.08.2018 um 22:55 in Nachricht >> <20180816205556.ga8...@sigill.intra.peff.net>: >>> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: >>> This is all interesting, but I think unrelated to what Ulrich is talking about. Quote: Between the two phases of "git fsck" (checking directories and checking objects) there was a break of several seconds where no progress was indicated I.e. it's not about the pause you get with your testcase (which is certainly another issue) but the break between the two progress bars. >>> >>> I think he's talking about both. What I said responds to this: >> >> Hi guys! >> >> Yes, I was wondering what git does between the two visible phases, and > between >> the lines I was suggesting another progress message between those phases. At >> least the maximum unspecific three-dot-message "Thinking..." could be > displayed >> ;-) Of course anything more appropriate would be welcome. >> Also that message should only be displayed if it's foreseeable that the >> operation will take significant time. In my case (I just repeated it a few >> minutes ago) the delay is significant (at least 10 seconds). As noted > earlier I >> was hoping to capture the timing in a screencast, but it seems
Re: Draft of Git Rev News edition 42
On Mon, Aug 20, 2018 at 10:51 AM, Ævar Arnfjörð Bjarmason wrote: > On Mon, Aug 20 2018, Christian Couder wrote: >> Jakub, Markus, Gabriel and me plan to publish this edition on >> Wednesday August 22nd. > > Let's mention that we've picked SHA-256 as NewHash, as a follow-up to > last month's edition which covered the discussion around the NewHash > selection. I've submitted a PR for that here: > https://github.com/git/git.github.io/pull/307/files Great, merged! Thanks!
Re: Antw: Re: non-smooth progress indication for git fsck and git gc
On Mon, Aug 20 2018, Ulrich Windl wrote: Jeff King schrieb am 16.08.2018 um 22:55 in Nachricht > <20180816205556.ga8...@sigill.intra.peff.net>: >> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> This is all interesting, but I think unrelated to what Ulrich is talking >>> about. Quote: >>> >>> Between the two phases of "git fsck" (checking directories and >>> checking objects) there was a break of several seconds where no >>> progress was indicated >>> >>> I.e. it's not about the pause you get with your testcase (which is >>> certainly another issue) but the break between the two progress bars. >> >> I think he's talking about both. What I said responds to this: > > Hi guys! > > Yes, I was wondering what git does between the two visible phases, and between > the lines I was suggesting another progress message between those phases. At > least the maximum unspecific three-dot-message "Thinking..." could be > displayed > ;-) Of course anything more appropriate would be welcome. > Also that message should only be displayed if it's foreseeable that the > operation will take significant time. In my case (I just repeated it a few > minutes ago) the delay is significant (at least 10 seconds). As noted earlier > I > was hoping to capture the timing in a screencast, but it seems all the delays > were just optimized away in the recording. > >> >>> >> During "git gc" the writing objects phase did not update for some >>> >> seconds, but then the percentage counter jumped like from 15% to 42%. >> >> But yeah, I missed that the fsck thing was specifically about a break >> between two meters. That's a separate problem, but also worth >> discussing (and hopefully much easier to address). >> >>> If you fsck this repository it'll take around (on my spinning rust >>> server) 30 seconds between 100% of "Checking object directories" before >>> you get any output from "Checking objects". 
>>> >>> The breakdown of that is (this is from approximate eyeballing): >>> >>> * We spend 1-3 seconds just on this: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L181 >> >> OK, so that's checking the sha1 over the .idx file. We could put a meter >> on that. I wouldn't expect it to generally be all that slow outside of >> pathological cases, since it scales with the number of objects (and 1s >> is our minimum update anyway, so that might be OK as-is). Your case has >> 13M objects, which is quite large. > > Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my > CPU is question is an AMD Phenom2 quad-core with 3.2GHz nominal, and there is > a > classic spinning disk with 5400RPM built in... > >> >>> * We spend the majority of the ~30s on this: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L70-L79 >> >> This is hashing the actual packfile. This is potentially quite long, >> especially if you have a ton of big objects. > > That seems to apply. BTW: Is there a way go get some repository statistics > like a histogram of object sizes (or whatever that might be useful to help > making decisions)? The git-sizer program is really helpful in this regard: https://github.com/github/git-sizer >> >> I wonder if we need to do this as a separate step anyway, though. Our >> verification is based on index-pack these days, which means it's going >> to walk over the whole content as part of the "Indexing objects" step to >> expand base objects and mark deltas for later. Could we feed this hash >> as part of that walk over the data? It's not going to save us 30s, but >> it's likely to be more efficient. And it would fold the effort naturally >> into the existing progress meter. 
>> >>> * Wes spend another 3-5 seconds on this QSORT: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L105 >> >> That's a tough one. I'm not sure how we'd count it (how many compares we >> do?). And each item is doing so little work that hitting the progress >> code may make things noticeably slower. > > If it's sorting, maybe add some code like (wild guess): > > if (objects_to_sort > magic_number) >message("Sorting something..."); I think a good solution to these cases is to just introduce something to the progress.c mode where it learns how to display a counter where we don't know what the end-state will be. Something like your proposed magic_number can just be covered under the more general case where we don't show the progress bar unless it's been 1 second (which I believe is the default). >> >> Again, your case is pretty big. Just based on the number of objects, >> linux.git should be 1.5-2.5 seconds on your machine for the same >> operation. Which I think may be small enough to ignore (or even just >> print a generic before/after). It's really the 30s packfile hash that's >> making the whole thing so terrible. >> >> -Peff
Re: Draft of Git Rev News edition 42
On Mon, Aug 20 2018, Christian Couder wrote: > Hi, > > A draft of a new Git Rev News edition is available here: > > > https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-42.md > > Everyone is welcome to contribute in any section either by editing the > above page on GitHub and sending a pull request, or by commenting on > this GitHub issue: > > https://github.com/git/git.github.io/issues/305 > > You can also reply to this email. > > In general all kinds of contribution, for example proofreading, > suggestions for articles or links, help on the issues in GitHub, and > so on, are very much appreciated. > > I tried to cc everyone who appears in this edition, but maybe I missed > some people, sorry about that. > > Jakub, Markus, Gabriel and me plan to publish this edition on > Wednesday August 22nd. Let's mention that we've picked SHA-256 as NewHash, as a follow-up to last month's edition which covered the discussion around the NewHash selection. I've submitted a PR for that here: https://github.com/git/git.github.io/pull/307/files
Antw: Re: non-smooth progress indication for git fsck and git gc
>>> Jeff King schrieb am 16.08.2018 um 22:55 in Nachricht <20180816205556.ga8...@sigill.intra.peff.net>: > On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> This is all interesting, but I think unrelated to what Ulrich is talking >> about. Quote: >> >> Between the two phases of "git fsck" (checking directories and >> checking objects) there was a break of several seconds where no >> progress was indicated >> >> I.e. it's not about the pause you get with your testcase (which is >> certainly another issue) but the break between the two progress bars. > > I think he's talking about both. What I said responds to this: Hi guys! Yes, I was wondering what git does between the two visible phases, and between the lines I was suggesting another progress message between those phases. At least the maximum unspecific three-dot-message "Thinking..." could be displayed ;-) Of course anything more appropriate would be welcome. Also that message should only be displayed if it's foreseeable that the operation will take significant time. In my case (I just repeated it a few minutes ago) the delay is significant (at least 10 seconds). As noted earlier I was hoping to capture the timing in a screencast, but it seems all the delays were just optimized away in the recording. > >> >> During "git gc" the writing objects phase did not update for some >> >> seconds, but then the percentage counter jumped like from 15% to 42%. > > But yeah, I missed that the fsck thing was specifically about a break > between two meters. That's a separate problem, but also worth > discussing (and hopefully much easier to address). > >> If you fsck this repository it'll take around (on my spinning rust >> server) 30 seconds between 100% of "Checking object directories" before >> you get any output from "Checking objects". 
>> The breakdown of that is (this is from approximate eyeballing):
>>
>> * We spend 1-3 seconds just on this:
>>   https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
>
> OK, so that's checking the sha1 over the .idx file. We could put a meter
> on that. I wouldn't expect it to generally be all that slow outside of
> pathological cases, since it scales with the number of objects (and 1s
> is our minimum update anyway, so that might be OK as-is). Your case has
> 13M objects, which is quite large.

Sometimes an oldish CPU could bring performance surprises, maybe. Anyway
the CPU in question is an AMD Phenom II quad-core with 3.2GHz nominal,
and there is a classic spinning disk with 5400 RPM built in...

>> * We spend the majority of the ~30s on this:
>>   https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
>
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.

That seems to apply. BTW: Is there a way to get some repository
statistics like a histogram of object sizes (or whatever might be useful
to help making decisions)?

> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.
>
>> * We spend another 3-5 seconds on this QSORT:
>>   https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105
>
> That's a tough one. I'm not sure how we'd count it (how many compares we
> do?).
> And each item is doing so little work that hitting the progress
> code may make things noticeably slower.

If it's sorting, maybe add some code like (wild guess):

    if (objects_to_sort > magic_number)
        message("Sorting something...");

> Again, your case is pretty big. Just based on the number of objects,
> linux.git should be 1.5-2.5 seconds on your machine for the same
> operation. Which I think may be small enough to ignore (or even just
> print a generic before/after). It's really the 30s packfile hash that's
> making the whole thing so terrible.
>
> -Peff
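[Editor's note: Ulrich's question upthread about a histogram of object sizes can be answered with stock git plumbing. This is an illustrative sketch, not something from the thread; it assumes git v2.6 or later (when `cat-file --batch-all-objects` appeared) and buckets sizes by decimal digit count as a crude log10 histogram. The function name is made up for this example.]

```shell
# Crude log10-style histogram of object sizes in the current repository.
# `git cat-file --batch-all-objects --batch-check='%(objectsize)'` prints
# one size per line; we bucket each size by its number of digits and
# count the buckets.
object_size_histogram() {
    git cat-file --batch-all-objects --batch-check='%(objectsize)' |
    awk '{ print length($0), "digits" }' |
    sort -n |
    uniq -c
}
```

Run `object_size_histogram` from inside the repository; a line like `13402 4 digits` would mean 13402 objects in the 1000-9999 byte range.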
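[Editor's note: Peff's point that per-item progress calls can dominate a tight loop is exactly why a meter throttles its redraws (the "1s is our minimum update" he mentions). Below is a standalone shell mock-up of that throttling idea, not git's actual progress code; the function name and output format are invented for illustration, and a real C version would read the clock far more cheaply than forking `date`.]

```shell
# Mock-up of once-per-second progress throttling: the meter only redraws
# when the wall-clock second changes, so a tight loop is not slowed down
# by terminal output on every single item.
run_with_progress() {
    total=$1
    last=
    i=0
    while [ "$i" -lt "$total" ]; do
        i=$((i + 1))
        now=$(date +%s)
        if [ "$now" != "$last" ]; then
            printf '\rprocessing: %d/%d' "$i" "$total" >&2
            last=$now
        fi
    done
    printf '\rprocessing: %d/%d, done.\n' "$total" "$total" >&2
}
```

For example, `run_with_progress 100000` redraws at most once per second no matter how fast the loop spins.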
Antw: Re: non-smooth progress indication for git fsck and git gc
>>> Duy Nguyen wrote on 16.08.2018 at 17:18 in message :
> On Thu, Aug 16, 2018 at 1:10 PM Ulrich Windl
> wrote:
>>
>> Hi!
>>
>> I'd like to point out some minor issue observed while processing some
>> 5GB repository with many binary objects, but most are rather small:
>>
>> Between the two phases of "git fsck" (checking directories and checking
>> objects) there was a break of several seconds where no progress was
>> indicated.
>>
>> During "git gc" the writing objects phase did not update for some
>> seconds, but then the percentage counter jumped like from 15% to 42%.
>>
>> I understand that updating the progress output too often can be a
>> performance bottleneck, while updating it too rarely might only bore
>> the user... ;-)
>>
>> But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3).
>
> Is it possible to make this repository public? You can also use "git
> fast-export --anonymize" to make a repo with the same structure but no
> real content (but read the man page about that option first)

Hi!

Actually I tried that locally, but with the resulting repository both
fsck and gc are very fast. So I guess it won't be very useful. Also the
original .git directory uses 5.3G, while the anonymous .git just used
4.3M...

I tried to capture the behavior as a screencast, but it seems the
screencast optimized the little changes away, and in the result git
almost had no delay on any operation 8-(

Regards,
Ulrich

>
>> Regards,
>> Ulrich
>
> --
> Duy
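[Editor's note: Duy's `fast-export --anonymize` suggestion can be round-tripped into a shareable repository roughly as below. This is a sketch: the helper name and destination path are illustrative, and the manpage caveats Duy mentions still apply. The dramatic shrink Ulrich observed (5.3G down to 4.3M) is expected, since blob contents are replaced by tiny placeholders.]

```shell
# Export the current repository with contents, paths and messages
# anonymized, then import the stream into a fresh bare repository that
# is safe to publish. $1 is the destination path.
anonymized_copy() {
    dest=$1
    git fast-export --anonymize --all >"$dest.dump" &&
    git init -q --bare "$dest" &&
    git -C "$dest" fast-import --quiet <"$dest.dump"
}
```

For example, `anonymized_copy /tmp/anon.git`, run from inside the source repository, leaves a publishable bare repository at /tmp/anon.git.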
Draft of Git Rev News edition 42
Hi,

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-42.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/305

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and so
on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and I plan to publish this edition on Wednesday
August 22nd.

Thanks,
Christian.