Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
On Fri, 8 Jun 2018 at 15:56, Derrick Stolee wrote: > [..], the following > diff occurs from the previous patch: [...] > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index b24e8b6689..9a0661983c 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' ' > ' > > graph_git_two_modes() { > - git -c core.commitGraph=true $1 >output > - git -c core.commitGraph=false $1 >expect > + git -c core.graph=true $1 >output > + git -c core.graph=false $1 >expect > test_cmp output expect > } It seems that you have accidentally removed the fix from previous version. It needs to be core.commitGraph, not core.graph. Best, -- Jakub Narębski
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 04:53:16PM -0700, David Lang wrote: > the license is granted to the world, so the world has an interest in it. Certainly, but you need to have overriding legitimate grounds. An interest is not enough for justification. You have to weight your interests against those of the subject. > Unless you are going to argue that the GDPR outlawed open source > development. No it certainly did not and I don't see how it could. All the GDPR arguably demands is that the author's identity is deleted from a public repository if he wishes so. Just assume it was a CVS repo. Then removal would not be any issue at all. It is a technical speciality of git that makes the removal so intricate to implement, which is not at all an intrinsic property of open source development. > you are the one arguing that the GDPR prohibits Git from storing and > revealing this license granting data, not me. It prohibits publishing, and only after a request to be forgotten. It does not prohibit storing your private copy. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote: > The problem is you've left undefined who is "you"? With an open > source project, anyone who has contributed to open source project has > a copyright interest. That hobbyist in German who submitted a patch? > They have a copyright interest. That US Company based in Redmond, > Washington? They own a copyright interest. Huawei in China? They > have a copyright interest. > > So there is no "privately". And "you" numbers in the thousands and > thousands of copyright holders of portions of the open source code. Of course there is "privately". Every single one of those who have the author information can keep it, privately, for themselves. But those that have received a request to be forgotten must not keep publishing it on the Internet for download or distribute it to others. > And of course, that's the other thing you seem to fundamentally not > understand about how git works. Every developer in the world working > on that open source project has their own copy. There is > fundamentally no way that you can expunge that information from every > single git repository in the world. The misunderstanding is on your side. If you run a website where the world can access a repository, you are responsible for obeying the GDPR with respect to that repository. If you receive a request to be forgotten, you have to make sure you stop publishing that author's identity as part of the repository. You do NOT need to - delete it from a private copy you have - care about others who publish that data - or even make sure the data is deleted from private copies others may have, even if the number of copies is in the thousands. In practical terms, if someone wishes to exercise his right to be forgotten, he will usually send the request to the maintainer and stop him from distributing the information, and perhaps to a third party he might use as a platform for publication, such as github. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: [RFC PATCH v1] telemetry design overview (part 1)
On Fri, Jun 8, 2018 at 11:40 AM, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jun 07 2018, Johannes Sixt wrote: > >> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com: >>> From: Jeff Hostetler >>> >>> I've been working to add code to Git to optionally collect telemetry data. >>> The goal is to be able to collect performance data from Git commands and >>> allow it to be aggregated over a user community to find "slow commands". >> >> Seriously? "add code to collect telemetry data" said by somebody whose >> email address ends with @microsoft.com is very irritating. I really >> don't want to have yet another switch that I must check after every >> update that it is still off. > > To elaborate a bit on Jeff's reply (since this was discussed in more > detail at Git Merge this year), the point of this feature is not to ship > git.git with some default telemetry, but so that in-house users of git > like Microsoft, Dropbox, Booking.com etc. can build & configure their > internal versions of git to turn on telemetry for their own users. > > There's numerous in-house monkeypatches to git on various sites (at > least Microsoft & Dropbox reported having internal patches already). > Something like this facility would allow us to agree on some > implementation that could be shipped by default (but would be off by > default), those of us who'd make use of this feature already have "root" > on those user's machines, and control what git binary they use etc, > their /etc/gitconfig and so on. This elaboration should have been in the commit message of the first patch (and I hope it will be in v2). You guys knew because it was discussed at Git Merge but the rest of us may not.. It would be much more convincing to have something like this spelled out. -- Duy
Re: [PATCH v4 04/23] unpack-tress: convert clear_ce_flags* to avoid the_index
On Thu, Jun 7, 2018 at 5:11 PM Duy Nguyen wrote: > > On Thu, Jun 7, 2018 at 9:41 AM, Elijah Newren wrote: > > Subject line: unpack-trees rather than unpack-tress. > > > > > > > > On Wed, Jun 6, 2018 at 9:49 AM, Nguyễn Thái Ngọc Duy > > wrote: > >> Prior to fba92be8f7, this code implicitly (and incorrectly) assumes > >> the_index when running the exclude machinery. fba92be8f7 helps show > >> this problem clearer because unpack-trees operation is supposed to > >> work on whatever index the caller specifies... not specifically > >> the_index. > >> > >> Update the code to use "istate" argument that's originally from > >> mark_new_skip_worktree(). From the call sites, both in unpack_trees(), > >> you can see that this function works on two separate indexes: > >> o->src_index and o->result. The second mark_new_skip_worktree() so far > >> has incorecctly applied exclude rules on o->src_index instead of > >> o->result. It's unclear what is the consequences of this, but it's > >> definitely wrong. > >> > >> [1] fba92be8f7 (dir: convert is_excluded_from_list to take an index - > >> 2017-05-05) > >> > >> Signed-off-by: Nguyễn Thái Ngọc Duy > > > > A somewhat curious finding: when I was rebuilding and re-testing all > > 23 patches, I got a failure on this patch in test 31 of > > t7063-status-untracked-cache.sh. I did not get any test failures with > > any of the other patches. However, after re-running that test or the > > whole suite half a dozen times with just up to this patch applied, I > > was not able to trigger the failure again. Is there a rare race in > > that testcase? > > Untracked cache tests are very time-sensitive. I'll try to run and > re-run them a couple times to understand more. Thanks for pointing it > out. after hours of running tests, either with full 23 patches or just the first 4, and failing to catch the failure, i declare (or more like, pray) that you ran into a crack of time that led to a race. I'll take no action on this, but I'll remember this and watch out for untracked cache related mails for some time. -- Duy
Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
On 6/8/2018 11:05 AM, Jakub Narębski wrote: On Fri, 8 Jun 2018 at 15:56, Derrick Stolee wrote: [..], the following diff occurs from the previous patch: [...] diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index b24e8b6689..9a0661983c 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' ' ' graph_git_two_modes() { - git -c core.commitGraph=true $1 >output - git -c core.commitGraph=false $1 >expect + git -c core.graph=true $1 >output + git -c core.graph=false $1 >expect test_cmp output expect } It seems that you have accidentally removed the fix from previous version. It needs to be core.commitGraph, not core.graph. I didn't rebase the fix that I sent as a separate patch [1] (we want that change applied to 'master' while this one targets topics in 'next' and 'pu'). So this specific diff is unfortunate noise. Thanks! -Stolee [1] https://public-inbox.org/git/20180604123906.136417-1-dsto...@microsoft.com/ [PATCH] t5318-commit-graph.sh: use core.commitGraph
BUG: Merge commits are displayed by `--cherry-pick` despite on they introduce same change
The `--cherry-pick` option states: >Omit any commit that introduces the same change as another commit on the >“other side” For next git command I see next tree: $ git log --graph --decorate --pretty=oneline --abbrev-commit --cherry-mark --boundary --left-right < bc2820d (openapi3) Merge branch 'openapi3_exception_handling' into openapi3 |\ | = 4e1e0aa Testcase for not_found/exception requests to api | = b39673d 'not_found' templates should not be probed automatically. Only when fallback | = a8677df Dump empty schemas and non empty data | = 27db48f Implemente before_render hook |/ < aa60f0a Merge branch 'hide_temporal_interface' into openapi3 |\ | = 5dd02f2 Mutate current object after update | = d58c0ab Install missed modules |/ | > b8d09ce (xtucha/openapi3) Merge branch 'openapi3_exception_handling' into openapi3 | |\ | | = b6362ad Testcase for not_found/exception requests to api | | = dc926cc 'not_found' templates should not be probed automatically. Only when fallback | | = 55dc88d Dump empty schemas and non empty data | | = 8369185 Implemente before_render hook | |/ | > c03438d Merge branch 'hide_temporal_interface' into openapi3 | |\ | | = d534cc4 Mutate current object after update | | = 1b6af27 Install modules required by MonkeyMan | |/ | > a3b7230 Clarify schema |/ o f5b06ed Assign some defaults If I want to see only new commit I add `--left-only` option and get merge commits in output. < bc2820d (openapi3) Merge branch 'openapi3_exception_handling' into < aa60f0a Merge branch 'hide_temporal_interface' into openapi3 This commits introduce same change as another merge commit on the "other side" Thus they should not be displayed
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 10:45:51AM -0400, Theodore Y. Ts'o wrote: > *Anyone* can run a repository. It's not just github and gitlab. The > hobbiest in New Zealand, who might never visit Europe (so she can't > be arrested when she visits the fair shores of Europe) and who has no > business interests in Europe, can host such a web site. Just because letters of request are hardly enforced doesn't make it legal to break the GDPR. For sure, a hobbyist would not have much to fear, even if he is violating the GDPR and coming to Europe. The GDPR is mostly about taming the megacorporations, not about arresting tourists. > So the person trying to engage in censorship Censorship? The GDPR is not about censorship. If you want to write an opionion about someone by name, the GDPR gives you all legitimization to do so, against that person's will. This is about removing the data under ordinary circumstances. > would need to contact *everyone*. This is the subject's problem, not the repository provider's. > And someone who has a git note in their private repo who > then pushes to github/gitlab would end up pushing that note back up to > the web server. If that note has been deleted based on the right to be forgotten, you as the repository provider have to make sure you don't publish it again. Since you are allowed to keep a private copy, ensuring that shouldn't be a problem for you. > Great, so you can get github and gitlab to get rid of the information. > But it's *pointless*. It's up to the subject to consider it pointless or not to exercise his rights... > Your problem is in the word: "a" ...and against whom, whether one repository provider, the major ones, all of them he can find. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: is there a canonical doc about how to deal with whitespace issues?
On 6/8/2018 9:18 AM, Robert P. J. Day wrote: for one of my courses, i wanted to write a section about the various techniques for dealing with whitespace issues in git, so i started making a list, things like: - running "git diff --check" - "git commit --cleanup=" possibilities - config options like core.{eol,safecrlf,autocrlf} - i'm sure there are client-side hooks that can be mentioned etc, etc. has anyone ever written a doc that collects these things in one place? if not, i guess i have to write one. rday I don't know of a doc for whitespace issues, but the contributing guide on GitForWindows [1] recommends `git rebase --whitespace=fix`. Thanks, -Stolee [1] https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#polish-your-commits
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 08:26:57AM +0200, Peter Backes wrote: > > If you run a website where the world can access a repository, you are > responsible for obeying the GDPR with respect to that repository. If > you receive a request to be forgotten, you have to make sure you stop > publishing that author's identity as part of the repository. > *Anyone* can run a repository. It's not just github and gitlab. The hobbiest in New Zealand, who might never visit Europe (so she can't be arrested when she visits the fair shores of Europe) and who has no business interests in Europe, can host such a web site. So the person trying to engage in censorship would need to contact *everyone*. And someone who has a git note in their private repo who then pushes to github/gitlab would end up pushing that note back up to the web server. > You do NOT need to > > - delete it from a private copy you have > - care about others who publish that data > - or even make sure the data is deleted from private copies others may > have, even if the number of copies is in the thousands. Great, so you can get github and gitlab to get rid of the information. But it's *pointless*. And given that real developers really do care about who authored a patch, and regularly will do operations that reference the authorship information, the fact that it is stored somewhere else (e.g., in a git note, per your proposal), *will* slow down those operations. > In practical terms, if someone wishes to exercise his right to be > forgotten, he will usually send the request to the maintainer and stop > him from distributing the information, and perhaps to a third party he > might use as a platform for publication, such as github. Your problem is in the word: "a" - Ted
Re: [RFC PATCH v1] telemetry design overview (part 1)
Am 08.06.2018 um 11:07 schrieb Jeff King: > On Thu, Jun 07, 2018 at 11:10:52PM +0200, Johannes Sixt wrote: > >> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com: >>> From: Jeff Hostetler >>> >>> I've been working to add code to Git to optionally collect telemetry data. >>> The goal is to be able to collect performance data from Git commands and >>> allow it to be aggregated over a user community to find "slow commands". >> >> Seriously? "add code to collect telemetry data" said by somebody whose email >> address ends with @microsoft.com is very irritating. I really don't want to >> have yet another switch that I must check after every update that it is >> still off. > > If you look at the design document, it's off by default and would write > to a file on the filesystem. That doesn't seem all that different from > GIT_TRACE. The patch also includes the following part +telemetry.plugin + + +If the config setting "telemetry.plugin" contains the pathname to a shared +library, the library will be dynamically loaded during start up and events +will be sent to it using the plugin API. + +This plugin model allows an organization to define a custom or private +telemetry solution while using a stock version of Git. + +For example, on Windows, it allows telemetry events to go directly to the +kernel via the plugin using the high performance Event Tracing for Windows +(ETW) facility. + +The contrib/telemetry-plugin-examples directory contains two example +plugins: + * A trivial log to stderr + * A trivial ETW writer which is not a file but, if enabled, some windows internal thingie where the data is gone/duplicated/sent out/whatever. I for my part would much rather prefer that to be a compile time option so that I don't need to check on every git update on windows if this is now enabled or not.
Re: [RFC PATCH v1] telemetry: design documenation
On Thu, Jun 07 2018, g...@jeffhostetler.com wrote: > From: Jeff Hostetler > > Create design documentation to describe the telemetry feature. > > Signed-off-by: Jeff Hostetler > --- > Documentation/technical/telemetry.txt | 475 > ++ > 1 file changed, 475 insertions(+) > create mode 100644 Documentation/technical/telemetry.txt > > diff --git a/Documentation/technical/telemetry.txt > b/Documentation/technical/telemetry.txt > new file mode 100644 > index 000..0a708ad > --- /dev/null > +++ b/Documentation/technical/telemetry.txt > @@ -0,0 +1,475 @@ > +Telemetry Design Notes > +== > + > +The telemetry feature allows Git to generate structured telemetry data > +for executed commands. Data includes command line arguments, execution > +times, error codes and messages, and information about child processes. > + > +Structued data is produced in a JSON-like format. (See the UTF-8 related s/Structued/Structured/ > +"limitations" described in json-writer.h) > + > +Telemetry data can be written to a local file or sent to a dynamically > +loaded shared library via a plugin API. > + > +The telemetry feature is similar to the existing trace API (defined in > +Documentation/technical/api-trace.txt). Telemetry events are generated > +thoughout the life of a Git command just like trace messages. But where s/thoughout/throughout/ > +as trace messages are essentially developer debug messages, telemetry > +events are intended for logging and automated analysis. > + > +The goal of the telemetry feature is to be able to gather usage data across > +a group of production users to identify real-world performance problems in > +production. Additionally, it might help identify common user errors and > +guide future user training. > + > +By default, telemetry is disabled. Telemetry is controlled using config > +settings (see "telemetry.*" in Documentation/config.txt). > + > + > +Telemetry Events > + > + > +Telemetry data is generated as a series of events. Each event is written > +as a self-describing JSON object. > + > +Events: cmd_start and cmd_exit > +-- > + > +The `cmd_start` event is emitted the very beginning of the git.exe process > +in cmd_main() and `cmd_exit` event is emitted at the end of the process in > +the atexit cleanup routine. > + > +For example, running "git version" produces: > + > +{ > + "event_name": "cmd_start", > + "argv": [ > +"C:\\work\\gfw\\git.exe", > +"version" > + ], > + "clock": 1525978509976086000, > + "pid": 25460, > + "git_version": "2.17.0.windows.1", > + "telemetry_version": "1", > + "session_id": "1525978509976086000-25460" > +} > +{ > + "event_name": "cmd_exit", > + "argv": [ > +"C:\\work\\gfw\\git.exe", > +"version" > + ], > + "clock": 1525978509980903391, > + "pid": 25460, > + "git_version": "2.17.0.windows.1", > + "telemetry_version": "1", > + "session_id": "1525978509976086000-25460", > + "is_interactive": false, > + "exit_code": 0, > + "elapsed_time_core": 0.004814, > + "elapsed_time_total": 0.004817, > + "builtin": { > +"name": "version" > + } > +} > + > +Fields common to all events: > + * `event_name` is the name of the event. > + * `argv` is the array of command line arguments. > + * `clock` is the time of the event in nanoseconds since the epoch. > + * `pid` is the process id. > + * `git_version` is the git version string. > + * `telemetry_version` is the version of the telemetry format. > + * `session_id` is described in a later section. I wonder if we should add the path to git to these if RUNTIME_PREFIX is defined. Would be useful to log experiments with different git versions, and as noted by the RUNTIME_PREFIX feature this information is not guaranteed to be something you can extract from argv. > +Additional fields in cmd_exit: > + * `is_interactive` is true if git.exe spawned an interactive child process, > + such as a pager, editor, prompt, or gui tool. > + * `exit_code` is the value passed to exit() from main(). > + * `error_message` (not shown) is the array of error messages. > + * `elapsed-core-time` measures the time in seconds until exit() was called. > + * `elapsed-total-time` measures the time until the atexit() routine starts > + (which will include time spend in other atexit() routines cleaning up > + child processes and etc.). > + * `alias` (not shown) the updated argv after alias expansion. > + * `builtin.name` is the canonical command name (from the cmd_struct[] > + table) of a builtin command. > + * `builtin.mode` (not shown) is shown for some commands that have different > + major modes and performance times. For example, checkout can switch > + branches or repair a single file. > + * `child_summary` (not shown) is described in a later section. > + * `timers` (not shown) is described in a later section. > + * `aux` (not shown) is described in a later section. > + > + > +Events:
[PATCH v6 04/21] commit: force commit to parse from object database
In anticipation of verifying commit-graph file contents against the object database, create parse_commit_internal() to allow side-stepping the commit-graph file and parse directly from the object database. Due to the use of generation numbers, this method should not be called unless the intention is explicit in avoiding commits from the commit-graph file. Signed-off-by: Derrick Stolee --- commit.c | 9 +++-- commit.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/commit.c b/commit.c index d53dc16d72..720c6acddf 100644 --- a/commit.c +++ b/commit.c @@ -418,7 +418,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s return 0; } -int parse_commit_gently(struct commit *item, int quiet_on_missing) +int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph) { enum object_type type; void *buffer; @@ -429,7 +429,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return -1; if (item->object.parsed) return 0; - if (parse_commit_in_graph(item)) + if (use_commit_graph && parse_commit_in_graph(item)) return 0; buffer = read_object_file(>object.oid, , ); if (!buffer) @@ -450,6 +450,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return ret; } +int parse_commit_gently(struct commit *item, int quiet_on_missing) +{ + return parse_commit_internal(item, quiet_on_missing, 1); +} + void parse_commit_or_die(struct commit *item) { if (parse_commit(item)) diff --git a/commit.h b/commit.h index 3ad07c2e3d..7e0f273720 100644 --- a/commit.h +++ b/commit.h @@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char *name); struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name); int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph); +int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph); int parse_commit_gently(struct commit *item, int quiet_on_missing); static inline int parse_commit(struct commit *item) { -- 2.18.0.rc1
[PATCH v6 01/21] commit-graph: UNLEAK before die()
Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 37420ae0fd..f0875b8bf3 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv) graph_name = get_commit_graph_filename(opts.obj_dir); graph = load_commit_graph_one(graph_name); - if (!graph) + if (!graph) { + UNLEAK(graph_name); die("graph file %s does not exist", graph_name); + } + FREE_AND_NULL(graph_name); printf("header: %08x %d %d %d %d\n", -- 2.18.0.rc1
[PATCH v6 11/21] commit-graph: verify root tree OIDs
The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee --- commit-graph.c | 17 - t/t5318-commit-graph.sh | 7 +++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index 00e89b71e9..5df18394f9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -866,6 +866,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) return verify_commit_graph_error; for (i = 0; i < g->num_commits; i++) { + struct commit *graph_commit; + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); if (i && oidcmp(_oid, _oid) >= 0) @@ -883,6 +885,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) cur_fanout_pos++; } + + graph_commit = lookup_commit(_oid); + if (!parse_commit_in_graph_one(g, graph_commit)) + graph_report("failed to parse %s from commit-graph", +oid_to_hex(_oid)); } while (cur_fanout_pos < 256) { @@ -899,16 +906,24 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) return verify_commit_graph_error; for (i = 0; i < g->num_commits; i++) { - struct commit *odb_commit; + struct commit *graph_commit, *odb_commit; hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + graph_commit = lookup_commit(_oid); odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r)); if (parse_commit_internal(odb_commit, 0, 0)) { graph_report("failed to parse %s from object database", oid_to_hex(_oid)); continue; } + + if (oidcmp(_commit_tree_in_graph_one(g, graph_commit)->object.oid, + get_commit_tree_oid(odb_commit))) + graph_report("root tree OID for commit %s in commit-graph is %s != %s", +oid_to_hex(_oid), + oid_to_hex(get_commit_tree_oid(graph_commit)), + oid_to_hex(get_commit_tree_oid(odb_commit))); } return verify_commit_graph_error; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index af5e34c0cb..2b9214bc83 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255)) GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256)) GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8)) GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10)) +GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS)) +GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' ' "from object database" ' +test_expect_success 'detect incorrect tree OID' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \ + "root tree OID for commit" +' + test_done -- 2.18.0.rc1
[PATCH v6 14/21] commit-graph: verify commit date
Signed-off-by: Derrick Stolee --- commit-graph.c | 6 ++ t/t5318-commit-graph.sh | 6 ++ 2 files changed, 12 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index e0f71658da..6d6c6beff9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -986,6 +986,12 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) oid_to_hex(_oid), graph_commit->generation, max_generation + 1); + + if (graph_commit->date != odb_commit->date) + graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime, +oid_to_hex(_oid), +graph_commit->date, +odb_commit->date); } return verify_commit_graph_error; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 5b75c4dca2..b7b4410e75 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN)) GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4)) GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3)) GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11)) +GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' ' "non-zero generation number" ' +test_expect_success 'detect incorrect commit date' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \ + "commit date" +' + test_done -- 2.18.0.rc1
[PATCH v6 07/21] commit-graph: verify catches corrupt signature
This is the first of several commits that add a test to check that 'git commit-graph verify' catches corruption in the commit-graph file. The first test checks that the command catches an error in the file signature. This is a check that exists in the existing commit-graph reading code. Add a helper method 'corrupt_graph_and_verify' to the test script t5318-commit-graph.sh. This helper corrupts the commit-graph file at a certain location, runs 'git commit-graph verify', and reports the output to the 'err' file. This data is filtered to remove the lines added by 'test_must_fail' when the test is run verbosely. Then, the output is checked to contain a specific error message. Most messages from 'git commit-graph verify' will not be marked for translation. There will be one exception: the message that reports an invalid checksum will be marked for translation, as that is the only message that is intended for a typical user. Helped-by: Szeder Gábor Signed-off-by: Derrick Stolee --- t/t5318-commit-graph.sh | 43 + 1 file changed, 43 insertions(+) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 6ca451dfd2..8f96e2636c 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full repo' ' test_cmp expect output ' +# the verify tests below expect the commit-graph to contain +# exactly the commits reachable from the commits/8 branch. +# If the file changes the set of commits in the list, then the +# offsets into the binary file will result in different edits +# and the tests will likely break. + test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && + git rev-parse commits/8 | git commit-graph write --stdin-commits && git commit-graph verify >output ' +GRAPH_BYTE_VERSION=4 +GRAPH_BYTE_HASH=5 + +# usage: corrupt_graph_and_verify +# Manipulates the commit-graph file at the position +# by inserting the data, then runs 'git commit-graph verify' +# and places the output in the file 'err'. Test 'err' for +# the given string. +corrupt_graph_and_verify() { + pos=$1 + data="${2:-\0}" + grepstr=$3 + cd "$TRASH_DIRECTORY/full" && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + cp $objdir/info/commit-graph commit-graph-backup && + printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err + test_i18ngrep "$grepstr" err +} + +test_expect_success 'detect bad signature' ' + corrupt_graph_and_verify 0 "\0" \ + "graph signature" +' + +test_expect_success 'detect bad version' ' + corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \ + "graph version" +' + +test_expect_success 'detect bad hash version' ' + corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \ + "hash version" +' + test_done -- 2.18.0.rc1
[PATCH v6 08/21] commit-graph: verify required chunks are present
The commit-graph file requires the following three chunks: * OID Fanout * OID Lookup * Commit Data If any of these are missing, then the 'verify' subcommand should report a failure. This includes the chunk IDs malformed or the chunk count is truncated. Signed-off-by: Derrick Stolee --- commit-graph.c | 9 + t/t5318-commit-graph.sh | 29 + 2 files changed, 38 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 22ef696e18..f30b4ccee9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -848,5 +848,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) return 1; } + verify_commit_graph_error = 0; + + if (!g->chunk_oid_fanout) + graph_report("commit-graph is missing the OID Fanout chunk"); + if (!g->chunk_oid_lookup) + graph_report("commit-graph is missing the OID Lookup chunk"); + if (!g->chunk_commit_data) + graph_report("commit-graph is missing the Commit Data chunk"); + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 8f96e2636c..c03792a8ed 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' ' GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 +GRAPH_BYTE_CHUNK_COUNT=6 +GRAPH_CHUNK_LOOKUP_OFFSET=8 +GRAPH_CHUNK_LOOKUP_WIDTH=12 +GRAPH_CHUNK_LOOKUP_ROWS=5 +GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET +GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ + 1 * $GRAPH_CHUNK_LOOKUP_WIDTH)) +GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ +2 * $GRAPH_CHUNK_LOOKUP_WIDTH)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' ' "hash version" ' +test_expect_success 'detect low chunk count' ' + corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \ + "missing the .* chunk" +' + +test_expect_success 'detect missing OID fanout chunk' ' + corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \ + "missing the OID Fanout chunk" +' + +test_expect_success 'detect missing OID lookup chunk' ' + corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \ + "missing the OID Lookup chunk" +' + +test_expect_success 'detect missing commit data chunk' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \ + "missing the Commit Data chunk" +' + test_done -- 2.18.0.rc1
[PATCH v6 13/21] commit-graph: verify generation number
While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee --- commit-graph.c | 34 ++ t/t5318-commit-graph.sh | 11 +++ 2 files changed, 45 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 6d8d774eb0..e0f71658da 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -846,10 +846,14 @@ static void graph_report(const char *fmt, ...) va_end(ap); } +#define GENERATION_ZERO_EXISTS 1 +#define GENERATION_NUMBER_EXISTS 2 + int verify_commit_graph(struct repository *r, struct commit_graph *g) { uint32_t i, cur_fanout_pos = 0; struct object_id prev_oid, cur_oid; + int generation_zero = 0; if (!g) { graph_report("no commit-graph file loaded"); @@ -911,6 +915,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; + uint32_t max_generation = 0; hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -945,6 +950,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) oid_to_hex(_parents->item->object.oid), oid_to_hex(_parents->item->object.oid)); + if (graph_parents->item->generation > max_generation) + max_generation = graph_parents->item->generation; + graph_parents = graph_parents->next; odb_parents = odb_parents->next; } @@ -952,6 +960,32 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) if (odb_parents != NULL) graph_report("commit-graph parent list for commit %s terminates early", oid_to_hex(_oid)); + + if (!graph_commit->generation) { + if (generation_zero == GENERATION_NUMBER_EXISTS) + graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere", +oid_to_hex(_oid)); + generation_zero = GENERATION_ZERO_EXISTS; + } else if (generation_zero == GENERATION_ZERO_EXISTS) + graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere", +oid_to_hex(_oid)); + + if (generation_zero == GENERATION_ZERO_EXISTS) + continue; + + /* +* If one of our parents has generation GENERATION_NUMBER_MAX, then +* our generation is also GENERATION_NUMBER_MAX. Decrement to avoid +* extra logic in the following condition. +*/ + if (max_generation == GENERATION_NUMBER_MAX) + max_generation--; + + if (graph_commit->generation != max_generation + 1) + graph_report("commit-graph generation for commit %s is %u != %u", +oid_to_hex(_oid), +graph_commit->generation, +max_generation + 1); } return verify_commit_graph_error; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 9a3481c30f..5b75c4dca2 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN)) GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4)) GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3)) +GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '
is there a canonical doc about how to deal with whitespace issues?
for one of my courses, i wanted to write a section about the various techniques for dealing with whitespace issues in git, so i started making a list, things like: - running "git diff --check" - "git commit --cleanup=" possibilities - config options like core.{eol,safecrlf,autocrlf} - i'm sure there are client-side hooks that can be mentioned etc, etc. has anyone ever written a doc that collects these things in one place? if not, i guess i have to write one. rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca/dokuwiki Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
[PATCH v6 17/21] fsck: verify commit-graph
If core.commitGraph is true, verify the contents of the commit-graph during 'git fsck' using the 'git commit-graph verify' subcommand. Run this check on all alternates, as well. We use a new process for two reasons: 1. The subcommand decouples the details of loading and verifying a commit-graph file from the other fsck details. 2. The commit-graph verification requires the commits to be loaded in a specific order to guarantee we parse from the commit-graph file for some objects and from the object database for others. Signed-off-by: Derrick Stolee --- Documentation/git-fsck.txt | 3 +++ builtin/fsck.c | 21 + t/t5318-commit-graph.sh| 8 3 files changed, 32 insertions(+) diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt index b9f060e3b2..ab9a93fb9b 100644 --- a/Documentation/git-fsck.txt +++ b/Documentation/git-fsck.txt @@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives (i.e., you can just remove them and do an 'rsync' with some other site in the hopes that somebody else has the object you have corrupted). +If core.commitGraph is true, the commit-graph file will also be inspected +using 'git commit-graph verify'. See linkgit:git-commit-graph[1]. + Extracted Diagnostics - diff --git a/builtin/fsck.c b/builtin/fsck.c index 3ad4f160f9..9fb2edc69f 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -18,6 +18,7 @@ #include "decorate.h" #include "packfile.h" #include "object-store.h" +#include "run-command.h" #define REACHABLE 0x0001 #define SEEN 0x0002 @@ -47,6 +48,7 @@ static int name_objects; #define ERROR_REACHABLE 02 #define ERROR_PACK 04 #define ERROR_REFS 010 +#define ERROR_COMMIT_GRAPH 020 static const char *describe_object(struct object *obj) { @@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) } check_connectivity(); + + if (core_commit_graph) { + struct child_process commit_graph_verify = CHILD_PROCESS_INIT; + const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL }; + commit_graph_verify.argv = verify_argv; + commit_graph_verify.git_cmd = 1; + + if (run_command(_graph_verify)) + errors_found |= ERROR_COMMIT_GRAPH; + + prepare_alt_odb(the_repository); + for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) { + verify_argv[2] = "--object-dir"; + verify_argv[3] = alt->path; + if (run_command(_graph_verify)) + errors_found |= ERROR_COMMIT_GRAPH; + } + } + return errors_found; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index bd5b8428af..0da5a51552 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' ' "incorrect checksum" ' +test_expect_success 'git fsck (checks commit-graph)' ' + cd "$TRASH_DIRECTORY/full" && + git fsck && + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \ + "incorrect checksum" && + test_must_fail git fsck +' + test_done -- 2.18.0.rc1
[PATCH v6 12/21] commit-graph: verify parent list
The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0x to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee --- commit-graph.c | 28 t/t5318-commit-graph.sh | 18 ++ 2 files changed, 46 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 5df18394f9..6d8d774eb0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -244,6 +244,9 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g, struct commit *c; struct object_id oid; + if (pos >= g->num_commits) + die("invalid parent position %"PRIu64, pos); + hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); c = lookup_commit(); if (!c) @@ -907,6 +910,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit, *odb_commit; + struct commit_list *graph_parents, *odb_parents; hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -924,6 +928,30 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) oid_to_hex(_oid), oid_to_hex(get_commit_tree_oid(graph_commit)), oid_to_hex(get_commit_tree_oid(odb_commit))); + + graph_parents = graph_commit->parents; + odb_parents = odb_commit->parents; + + while (graph_parents) { + if (odb_parents == NULL) { + graph_report("commit-graph parent list for commit %s is too long", +oid_to_hex(_oid)); + break; + } + + if (oidcmp(_parents->item->object.oid, _parents->item->object.oid)) + graph_report("commit-graph parent for %s is %s != %s", +oid_to_hex(_oid), + oid_to_hex(_parents->item->object.oid), + oid_to_hex(_parents->item->object.oid)); + + graph_parents = graph_parents->next; + odb_parents = odb_parents->next; + } + + if (odb_parents != NULL) + graph_report("commit-graph parent list for commit %s terminates early", +oid_to_hex(_oid)); } return verify_commit_graph_error; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2b9214bc83..9a3481c30f 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8)) GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10)) GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS)) GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET +GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN)) +GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4)) +GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' ' "root tree OID for commit" ' +test_expect_success 'detect incorrect parent int-id' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \ + "invalid parent" +' + +test_expect_success 'detect extra parent int-id' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \ + "is too long" +' + +test_expect_success 'detect wrong parent' ' + corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \ + "commit-graph parent for" +' + test_done -- 2.18.0.rc1
[PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup
In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee --- commit-graph.c | 36 t/t5318-commit-graph.sh | 22 ++ 2 files changed, 58 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index f30b4ccee9..866a9e7e41 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct repository *r, struct commit_graph *g) { + uint32_t i, cur_fanout_pos = 0; + struct object_id prev_oid, cur_oid; + if (!g) { graph_report("no commit-graph file loaded"); return 1; @@ -857,5 +860,38 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) if (!g->chunk_commit_data) graph_report("commit-graph is missing the Commit Data chunk"); + if (verify_commit_graph_error) + return verify_commit_graph_error; + + for (i = 0; i < g->num_commits; i++) { + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + if (i && oidcmp(_oid, _oid) >= 0) + graph_report("commit-graph has incorrect OID order: %s then %s", +oid_to_hex(_oid), +oid_to_hex(_oid)); + + oidcpy(_oid, _oid); + + while (cur_oid.hash[0] > cur_fanout_pos) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + if (i != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + } + + while (cur_fanout_pos < 256) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + + if (g->num_commits != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index c03792a8ed..4809cc881f 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' ' git commit-graph verify >output ' +HASH_LEN=20 GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 GRAPH_BYTE_CHUNK_COUNT=6 @@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ 1 * $GRAPH_CHUNK_LOOKUP_WIDTH)) GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ 2 * $GRAPH_CHUNK_LOOKUP_WIDTH)) +GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ + $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS)) +GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4)) +GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255)) +GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256)) +GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' ' "missing the Commit Data chunk" ' +test_expect_success 'detect incorrect fanout' ' + corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \ + "fanout value" +' + +test_expect_success 'detect incorrect fanout final value' ' + corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ + "fanout value" +' + +test_expect_success 'detect incorrect OID order' ' + corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \ + "incorrect OID order" +' + test_done -- 2.18.0.rc1
[PATCH v6 16/21] commit-graph: verify contents match checksum
The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee --- commit-graph.c | 16 ++-- t/t5318-commit-graph.sh | 6 ++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 6d6c6beff9..d926c4b59f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -833,6 +833,7 @@ void write_commit_graph(const char *obj_dir, oids.nr = 0; } +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2 static int verify_commit_graph_error; static void graph_report(const char *fmt, ...) @@ -852,8 +853,10 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct repository *r, struct commit_graph *g) { uint32_t i, cur_fanout_pos = 0; - struct object_id prev_oid, cur_oid; + struct object_id prev_oid, cur_oid, checksum; int generation_zero = 0; + struct hashfile *f; + int devnull; if (!g) { graph_report("no commit-graph file loaded"); @@ -872,6 +875,15 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) if (verify_commit_graph_error) return verify_commit_graph_error; + devnull = open("/dev/null", O_WRONLY); + f = hashfd(devnull, NULL); + hashwrite(f, g->data, g->data_len - g->hash_len); + finalize_hashfile(f, checksum.hash, CSUM_CLOSE); + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) { + graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt")); + verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH; + } + for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit; @@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) cur_fanout_pos++; } - if (verify_commit_graph_error) + if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH) return verify_commit_graph_error; for (i = 0; i < g->num_commits; i++) { diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index cbd6462226..bd5b8428af 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16)) GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \ $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS)) GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4)) +GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' ' "invalid parent" ' +test_expect_success 'detect invalid checksum hash' ' + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \ + "incorrect checksum" +' + test_done -- 2.18.0.rc1
[PATCH v6 21/21] commit-graph: update design document
The commit-graph feature is now integrated with 'fsck' and 'gc', so remove those items from the "Future Work" section of the commit-graph design document. Also remove the section on lazy-loading trees, as that was completed in an earlier patch series. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 22 -- 1 file changed, 22 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index e1a883eb46..c664acbd76 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -118,9 +118,6 @@ Future Work - The commit graph feature currently does not honor commit grafts. This can be remedied by duplicating or refactoring the current graft logic. -- The 'commit-graph' subcommand does not have a "verify" mode that is - necessary for integration with fsck. - - After computing and storing generation numbers, we must make graph walks aware of generation numbers to gain the performance benefits they enable. This will mostly be accomplished by swapping a commit-date-ordered @@ -130,25 +127,6 @@ Future Work - 'log --topo-order' - 'tag --merged' -- Currently, parse_commit_gently() requires filling in the root tree - object for a commit. This passes through lookup_tree() and consequently - lookup_object(). Also, it calls lookup_commit() when loading the parents. - These method calls check the ODB for object existence, even if the - consumer does not need the content. For example, we do not need the - tree contents when computing merge bases. Now that commit parsing is - removed from the computation time, these lookup operations are the - slowest operations keeping graph walks from being fast. Consider - loading these objects without verifying their existence in the ODB and - only loading them fully when consumers need them. Consider a method - such as "ensure_tree_loaded(commit)" that fully loads a tree before - using commit->tree. - -- The current design uses the 'commit-graph' subcommand to generate the graph. - When this feature stabilizes enough to recommend to most users, we should - add automatic graph writes to common operations that create many commits. - For example, one could compute a graph on 'clone', 'fetch', or 'repack' - commands. - - A server could provide a commit graph file as part of the network protocol to avoid extra calculations by clients. This feature is only of benefit if the user is willing to trust the file, because verifying the file is correct -- 2.18.0.rc1
[PATCH v6 10/21] commit-graph: verify objects exist
In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee --- commit-graph.c | 18 ++ t/t5318-commit-graph.sh | 7 +++ 2 files changed, 25 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 866a9e7e41..00e89b71e9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -11,6 +11,7 @@ #include "sha1-lookup.h" #include "commit-graph.h" #include "object-store.h" +#include "alloc.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -242,6 +243,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g, { struct commit *c; struct object_id oid; + hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); c = lookup_commit(); if (!c) @@ -893,5 +895,21 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g) cur_fanout_pos++; } + if (verify_commit_graph_error) + return verify_commit_graph_error; + + for (i = 0; i < g->num_commits; i++) { + struct commit *odb_commit; + + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r)); + if (parse_commit_internal(odb_commit, 0, 0)) { + graph_report("failed to parse %s from object database", +oid_to_hex(_oid)); + continue; + } + } + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4809cc881f..af5e34c0cb 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' ' git commit-graph verify >output ' +NUM_COMMITS=9 HASH_LEN=20 GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 @@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4)) GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255)) GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256)) GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8)) +GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' ' "incorrect OID order" ' +test_expect_success 'detect OID not in object database' ' + corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \ + "from object database" +' + test_done -- 2.18.0.rc1
[PATCH v6 18/21] commit-graph: use string-list API for input
Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 39 +-- commit-graph.c | 15 +++ commit-graph.h | 7 +++ 3 files changed, 23 insertions(+), 38 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 9d108f43a9..ea28bc311a 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv) static int graph_write(int argc, const char **argv) { - const char **pack_indexes = NULL; - int packs_nr = 0; - const char **commit_hex = NULL; - int commits_nr = 0; - const char **lines = NULL; - int lines_nr = 0; - int lines_alloc = 0; + struct string_list *pack_indexes = NULL; + struct string_list *commit_hex = NULL; + struct string_list lines; static struct option builtin_commit_graph_write_options[] = { OPT_STRING(0, "object-dir", _dir, @@ -149,34 +145,25 @@ static int graph_write(int argc, const char **argv) if (!opts.obj_dir) opts.obj_dir = get_object_directory(); + string_list_init(, 0); if (opts.stdin_packs || opts.stdin_commits) { struct strbuf buf = STRBUF_INIT; - lines_nr = 0; - lines_alloc = 128; - ALLOC_ARRAY(lines, lines_alloc); - - while (strbuf_getline(, stdin) != EOF) { - ALLOC_GROW(lines, lines_nr + 1, lines_alloc); - lines[lines_nr++] = strbuf_detach(, NULL); - } - - if (opts.stdin_packs) { - pack_indexes = lines; - packs_nr = lines_nr; - } - if (opts.stdin_commits) { - commit_hex = lines; - commits_nr = lines_nr; - } + + while (strbuf_getline(, stdin) != EOF) + string_list_append(, strbuf_detach(, NULL)); + + if (opts.stdin_packs) + pack_indexes = + if (opts.stdin_commits) + commit_hex = } write_commit_graph(opts.obj_dir, pack_indexes, - packs_nr, commit_hex, - commits_nr, opts.append); + string_list_clear(, 0); return 0; } diff --git a/commit-graph.c b/commit-graph.c index d926c4b59f..a06e7e9549 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -657,10 +657,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits) } void write_commit_graph(const char *obj_dir, - const char **pack_indexes, - int nr_packs, - const char **commit_hex, - int nr_commits, + struct string_list *pack_indexes, + struct string_list *commit_hex, int append) { struct packed_oid_list oids; @@ -701,10 +699,10 @@ void write_commit_graph(const char *obj_dir, int dirlen; strbuf_addf(, "%s/pack/", obj_dir); dirlen = packname.len; - for (i = 0; i < nr_packs; i++) { + for (i = 0; i < pack_indexes->nr; i++) { struct packed_git *p; strbuf_setlen(, dirlen); - strbuf_addstr(, pack_indexes[i]); + strbuf_addstr(, pack_indexes->items[i].string); p = add_packed_git(packname.buf, packname.len, 1); if (!p) die("error adding pack %s", packname.buf); @@ -717,12 +715,13 @@ void write_commit_graph(const char *obj_dir, } if (commit_hex) { - for (i = 0; i < nr_commits; i++) { + for (i = 0; i < commit_hex->nr; i++) { const char *end; struct object_id oid; struct commit *result; - if (commit_hex[i] && parse_oid_hex(commit_hex[i], , )) + if (commit_hex->items[i].string && + parse_oid_hex(commit_hex->items[i].string, , )) continue; result = lookup_commit_reference_gently(, 1); diff --git a/commit-graph.h b/commit-graph.h index 4359812fb4..a79b854470 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -3,6 +3,7 @@ #include "git-compat-util.h" #include "repository.h" +#include "string-list.h" char *get_commit_graph_filename(const char *obj_dir); @@ -48,10 +49,8 @@ struct commit_graph { struct commit_graph *load_commit_graph_one(const char *graph_file); void write_commit_graph(const char *obj_dir, - const
[PATCH v6 06/21] commit-graph: add 'verify' subcommand
If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones Signed-off-by: Derrick Stolee --- Documentation/git-commit-graph.txt | 6 + builtin/commit-graph.c | 39 ++ commit-graph.c | 23 ++ commit-graph.h | 3 +++ t/t5318-commit-graph.sh| 10 5 files changed, 81 insertions(+) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 4c97b555cc..a222cfab08 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -10,6 +10,7 @@ SYNOPSIS [verse] 'git commit-graph read' [--object-dir ] +'git commit-graph verify' [--object-dir ] 'git commit-graph write' [--object-dir ] @@ -52,6 +53,11 @@ existing commit-graph file. Read a graph file given by the commit-graph file and output basic details about the graph file. Used for debugging purposes. +'verify':: + +Read the commit-graph file and verify its contents against the object +database. Used to check for corrupted data. + EXAMPLES diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f0875b8bf3..9d108f43a9 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -3,15 +3,22 @@ #include "dir.h" #include "lockfile.h" #include "parse-options.h" +#include "repository.h" #include "commit-graph.h" static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), + N_("git commit-graph verify [--object-dir ]"), N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), NULL }; +static const char * const builtin_commit_graph_verify_usage[] = { + N_("git commit-graph verify [--object-dir ]"), + NULL +}; + static const char * const builtin_commit_graph_read_usage[] = { N_("git commit-graph read [--object-dir ]"), NULL @@ -29,6 +36,36 @@ static struct opts_commit_graph { int append; } opts; + +static int graph_verify(int argc, const char **argv) +{ + struct commit_graph *graph = NULL; + char *graph_name; + + static struct option builtin_commit_graph_verify_options[] = { + OPT_STRING(0, "object-dir", _dir, + N_("dir"), + N_("The object directory to store the graph")), + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, +builtin_commit_graph_verify_options, +builtin_commit_graph_verify_usage, 0); + + if (!opts.obj_dir) + opts.obj_dir = get_object_directory(); + + graph_name = get_commit_graph_filename(opts.obj_dir); + graph = load_commit_graph_one(graph_name); + FREE_AND_NULL(graph_name); + + if (!graph) + return 0; + + return verify_commit_graph(the_repository, graph); +} + static int graph_read(int argc, const char **argv) { struct commit_graph *graph = NULL; @@ -165,6 +202,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) if (argc > 0) { if (!strcmp(argv[0], "read")) return graph_read(argc, argv); + if (!strcmp(argv[0], "verify")) + return graph_verify(argc, argv); if (!strcmp(argv[0], "write")) return graph_write(argc, argv); } diff --git a/commit-graph.c b/commit-graph.c index 9e228d3bb5..22ef696e18 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir, oids.alloc = 0; oids.nr = 0; } + +static int verify_commit_graph_error; + +static void graph_report(const char *fmt, ...) +{ + va_list ap; + verify_commit_graph_error = 1; + + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + fprintf(stderr, "\n"); + va_end(ap); +} + +int verify_commit_graph(struct repository *r, struct commit_graph *g) +{ + if (!g) { + graph_report("no commit-graph file loaded"); + return 1; + } + + return verify_commit_graph_error; +} diff --git a/commit-graph.h b/commit-graph.h index 96cccb10f3..4359812fb4 100644 ---
[PATCH v6 03/21] commit-graph: parse commit from chosen graph
Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee --- commit-graph.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f83f6d2373..e77b19971d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin } } -int parse_commit_in_graph(struct commit *item) +static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item) { uint32_t pos; @@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item) return 0; if (item->object.parsed) return 1; + + if (find_commit_in_graph(item, g, )) + return fill_commit_in_graph(item, g, pos); + + return 0; +} + +int parse_commit_in_graph(struct commit *item) +{ + if (!core_commit_graph) + return 0; + prepare_commit_graph(); - if (commit_graph && find_commit_in_graph(item, commit_graph, )) - return fill_commit_in_graph(item, commit_graph, pos); + if (commit_graph) + return parse_commit_in_graph_one(commit_graph, item); return 0; } -- 2.18.0.rc1
[PATCH v6 19/21] commit-graph: add '--reachable' option
When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee --- Documentation/git-commit-graph.txt | 8 ++-- builtin/commit-graph.c | 16 commit-graph.c | 18 ++ commit-graph.h | 1 + t/t5318-commit-graph.sh| 10 ++ 5 files changed, 47 insertions(+), 6 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index a222cfab08..dececb79d7 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles. + With the `--stdin-packs` option, generate the new commit graph by walking objects only in the specified pack-indexes. (Cannot be combined -with --stdin-commits.) +with `--stdin-commits` or `--reachable`.) + With the `--stdin-commits` option, generate the new commit graph by walking commits starting at the commits specified in stdin as a list of OIDs in hex, one OID per line. (Cannot be combined with ---stdin-packs.) +`--stdin-packs` or `--reachable`.) ++ +With the `--reachable` option, generate the new commit graph by walking +commits starting at all refs. (Cannot be combined with `--stdin-commits` +or `--stdin-packs`.) + With the `--append` option, include all commits that are present in the existing commit-graph file. diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ea28bc311a..c7d0db5ab4 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; @@ -25,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; static struct opts_commit_graph { const char *obj_dir; + int reachable; int stdin_packs; int stdin_commits; int append; @@ -127,6 +128,8 @@ static int graph_write(int argc, const char **argv) OPT_STRING(0, "object-dir", _dir, N_("dir"), N_("The object directory to store the graph")), + OPT_BOOL(0, "reachable", , + N_("start walk at all refs")), OPT_BOOL(0, "stdin-packs", _packs, N_("scan pack-indexes listed by stdin for commits")), OPT_BOOL(0, "stdin-commits", _commits, @@ -140,11 +143,16 @@ static int graph_write(int argc, const char **argv) builtin_commit_graph_write_options, builtin_commit_graph_write_usage, 0); - if (opts.stdin_packs && opts.stdin_commits) - die(_("cannot use both --stdin-commits and --stdin-packs")); + if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1) + die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs")); if (!opts.obj_dir) opts.obj_dir = get_object_directory(); + if (opts.reachable) { + write_commit_graph_reachable(opts.obj_dir, opts.append); + return 0; + } + string_list_init(, 0); if (opts.stdin_packs || opts.stdin_commits) { struct strbuf buf = STRBUF_INIT; diff --git a/commit-graph.c b/commit-graph.c index a06e7e9549..adf54e3fe7 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -7,6 +7,7 @@ #include "packfile.h" #include "commit.h" #include "object.h" +#include "refs.h" #include "revision.h" #include "sha1-lookup.h" #include "commit-graph.h" @@ -656,6 +657,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits) } } +static int add_ref_to_list(const char *refname, + const struct object_id *oid, + int flags, void *cb_data) +{ + struct string_list *list = (struct string_list*)cb_data; + string_list_append(list, oid_to_hex(oid)); + return 0; +} + +void write_commit_graph_reachable(const char *obj_dir, int append) +{ + struct string_list list; +
[PATCH v6 20/21] gc: automatically write commit-graph files
The commit-graph file is a very helpful feature for speeding up git operations. In order to make it more useful, make it possible to write the commit-graph file during standard garbage collection operations. Add a 'gc.commitGraph' config setting that triggers writing a commit-graph file after any non-trivial 'git gc' command. Defaults to false while the commit-graph feature matures. We specifically do not want to have this on by default until the commit-graph feature is fully integrated with history-modifying features like shallow clones. Helped-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee --- Documentation/config.txt | 10 +- Documentation/git-gc.txt | 4 builtin/gc.c | 6 ++ t/t5318-commit-graph.sh | 14 ++ 4 files changed, 33 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..f2b5ed17c8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -906,7 +906,8 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1]. core.commitGraph:: Enable git commit graph feature. Allows reading from the - commit-graph file. + commit-graph file. See `gc.commitGraph` for automatically + maintaining the file. core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in @@ -1647,6 +1648,13 @@ this configuration variable is ignored, all packs except the base pack will be repacked. After this the number of packs should go below gc.autoPackLimit and gc.bigPackThreshold should be respected again. +gc.commitGraph:: + If true, then gc will rewrite the commit-graph file when + linkgit:git-gc[1] is run. When using linkgit:git-gc[1] + '--auto' the commit-graph will be updated if housekeeping is + required. Default is false. See linkgit:git-commit-graph[1] + for details. + gc.logExpiry:: If the file gc.log exists, then `git gc --auto` won't run unless that file is more than 'gc.logExpiry' old. Default is diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index 24b2dd44fe..f5bc98ccb3 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if it within all non-bare repos or it can be set to a boolean value. This defaults to true. +The optional configuration variable `gc.commitGraph` determines if +'git gc' should run 'git commit-graph write'. This can be set to a +boolean value. This defaults to false. + The optional configuration variable `gc.aggressiveWindow` controls how much time is spent optimizing the delta compression of the objects in the repository when the --aggressive option is specified. The larger diff --git a/builtin/gc.c b/builtin/gc.c index ccfb1ceaeb..4e06e8372d 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,7 @@ #include "sigchain.h" #include "argv-array.h" #include "commit.h" +#include "commit-graph.h" #include "packfile.h" #include "object-store.h" #include "pack.h" @@ -40,6 +41,7 @@ static int aggressive_depth = 50; static int aggressive_window = 250; static int gc_auto_threshold = 6700; static int gc_auto_pack_limit = 50; +static int gc_commit_graph = 0; static int detach_auto = 1; static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; @@ -129,6 +131,7 @@ static void gc_config(void) git_config_get_int("gc.aggressivedepth", _depth); git_config_get_int("gc.auto", _auto_threshold); git_config_get_int("gc.autopacklimit", _auto_pack_limit); + git_config_get_bool("gc.commitgraph", _commit_graph); git_config_get_bool("gc.autodetach", _auto); git_config_get_expiry("gc.pruneexpire", _expire); git_config_get_expiry("gc.worktreepruneexpire", _worktrees_expire); @@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (pack_garbage.nr > 0) clean_pack_garbage(); + if (gc_commit_graph) + write_commit_graph_reachable(get_object_directory(), 0); + if (auto_gc && too_many_loose_objects()) warning(_("There are too many unreachable loose objects; " "run 'git prune' to remove them.")); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 06ecbb3f4a..9a0661983c 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' ' test_cmp expect output ' +test_expect_success 'check that gc computes commit-graph' ' + cd "$TRASH_DIRECTORY/full" && + git commit --allow-empty -m "blank" && + git commit-graph write --reachable && + cp $objdir/info/commit-graph commit-graph-before-gc && + git reset --hard HEAD~1 && + git config gc.commitGraph true && + git gc && + cp
[PATCH v6 15/21] commit-graph: test for corrupted octopus edge
The commit-graph file has an extra chunk to store the parent int-ids for parents beyond the first parent for octopus merges. Our test repo has a single octopus merge that we can manipulate to demonstrate the 'verify' subcommand detects incorrect values in that chunk. Signed-off-by: Derrick Stolee --- t/t5318-commit-graph.sh | 10 ++ 1 file changed, 10 insertions(+) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index b7b4410e75..cbd6462226 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' ' ' NUM_COMMITS=9 +NUM_OCTOPUS_EDGES=2 HASH_LEN=20 GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 @@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4)) GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3)) GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11)) GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12)) +GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16)) +GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \ +$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS)) +GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4)) # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' ' "commit date" ' +test_expect_success 'detect incorrect parent for octopus merge' ' + corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \ + "invalid parent" +' + test_done -- 2.18.0.rc1
[PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE
The GRAPH_MIN_SIZE macro should be the smallest size of a parsable commit-graph file. However, the minimum number of chunks was wrong. It is possible to write a commit-graph file with zero commits, and that violates this macro's value. Rewrite the macro, and use extra macros to better explain the magic constants. Signed-off-by: Derrick Stolee --- commit-graph.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index b63a1fc85e..f83f6d2373 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -35,10 +35,11 @@ #define GRAPH_LAST_EDGE 0x8000 +#define GRAPH_HEADER_SIZE 8 #define GRAPH_FANOUT_SIZE (4 * 256) #define GRAPH_CHUNKLOOKUP_WIDTH 12 -#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \ - GRAPH_OID_LEN + 8) +#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ + + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN) char *get_commit_graph_filename(const char *obj_dir) { -- 2.18.0.rc1
[PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
Sorry for the fast rerolls, but Ævar found some runtime issues with the string-list use and '\*' as multiplication on some platforms. Thus, v4 and v5 are broken. I did get a repro of those issues using VSTS Linux build servers and checked that they are fixed in this version. This version depends on 'next' and sb/object-store-alloc. To accommodate the repository arguments added by sb/object-store-alloc, the following diff occurs from the previous patch: diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 76423b3fa5..c7d0db5ab4 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -3,6 +3,7 @@ #include "dir.h" #include "lockfile.h" #include "parse-options.h" +#include "repository.h" #include "commit-graph.h" static char const * const builtin_commit_graph_usage[] = { @@ -63,7 +64,7 @@ static int graph_verify(int argc, const char **argv) if (!graph) return 0; - return verify_commit_graph(graph); + return verify_commit_graph(the_repository, graph); } static int graph_read(int argc, const char **argv) @@ -152,9 +153,9 @@ static int graph_write(int argc, const char **argv) return 0; } + string_list_init(, 0); if (opts.stdin_packs || opts.stdin_commits) { struct strbuf buf = STRBUF_INIT; - string_list_init(, 0); while (strbuf_getline(, stdin) != EOF) string_list_append(, strbuf_detach(, NULL)); diff --git a/commit-graph.c b/commit-graph.c index 0d5adc8035..adf54e3fe7 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -12,6 +12,7 @@ #include "sha1-lookup.h" #include "commit-graph.h" #include "object-store.h" +#include "alloc.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -866,7 +867,7 @@ static void graph_report(const char *fmt, ...) #define GENERATION_ZERO_EXISTS 1 #define GENERATION_NUMBER_EXISTS 2 -int verify_commit_graph(struct commit_graph *g) +int verify_commit_graph(struct repository *r, struct commit_graph *g) { uint32_t i, cur_fanout_pos = 0; struct object_id prev_oid, cur_oid, checksum; @@ -948,7 +949,7 @@ int verify_commit_graph(struct commit_graph *g) hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); graph_commit = lookup_commit(_oid); - odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node()); + odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r)); if (parse_commit_internal(odb_commit, 0, 0)) { graph_report("failed to parse %s from object database", oid_to_hex(_oid)); diff --git a/commit-graph.h b/commit-graph.h index ee20f5e280..506cb45fb1 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -2,6 +2,7 @@ #define COMMIT_GRAPH_H #include "git-compat-util.h" +#include "repository.h" #include "string-list.h" char *get_commit_graph_filename(const char *obj_dir); @@ -53,6 +54,6 @@ void write_commit_graph(const char *obj_dir, struct string_list *commit_hex, int append); -int verify_commit_graph(struct commit_graph *g); +int verify_commit_graph(struct repository *r, struct commit_graph *g); #endif diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index b24e8b6689..9a0661983c 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' ' ' graph_git_two_modes() { - git -c core.commitGraph=true $1 >output - git -c core.commitGraph=false $1 >expect + git -c core.graph=true $1 >output + git -c core.graph=false $1 >expect test_cmp output expect } @@ -282,17 +282,17 @@ GRAPH_CHUNK_LOOKUP_WIDTH=12 GRAPH_CHUNK_LOOKUP_ROWS=5 GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ - 1 \* $GRAPH_CHUNK_LOOKUP_WIDTH)) + 1 * $GRAPH_CHUNK_LOOKUP_WIDTH)) GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ -2 \* $GRAPH_CHUNK_LOOKUP_WIDTH)) +2 * $GRAPH_CHUNK_LOOKUP_WIDTH)) GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \ - $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS)) -GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4)) -GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255)) -GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256)) -GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8)) -GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10)) -GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS)) + $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
[PATCH v6 05/21] commit-graph: load a root tree from specific graph
When lazy-loading a tree for a commit, it will be important to select the tree from a specific struct commit_graph. Create a new method that specifies the commit-graph file and use that in get_commit_tree_in_graph(). Signed-off-by: Derrick Stolee --- commit-graph.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index e77b19971d..9e228d3bb5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit * return c->maybe_tree; } -struct tree *get_commit_tree_in_graph(const struct commit *c) +static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g, +const struct commit *c) { if (c->maybe_tree) return c->maybe_tree; if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) - BUG("get_commit_tree_in_graph called from non-commit-graph commit"); + BUG("get_commit_tree_in_graph_one called from non-commit-graph commit"); + + return load_tree_for_commit(g, (struct commit *)c); +} - return load_tree_for_commit(commit_graph, (struct commit *)c); +struct tree *get_commit_tree_in_graph(const struct commit *c) +{ + return get_commit_tree_in_graph_one(commit_graph, c); } static void write_graph_chunk_fanout(struct hashfile *f, -- 2.18.0.rc1
Re: [PATCH v5 18/21] commit-graph: use string-list API for input
On 6/6/2018 8:45 AM, Derrick Stolee wrote: On 6/6/2018 8:26 AM, Ævar Arnfjörð Bjarmason wrote: On Wed, Jun 06 2018, Derrick Stolee wrote: On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote: On Wed, Jun 06 2018, Derrick Stolee wrote: Signed-off-by: Derrick Stolee + string_list_clear(, 0); return 0; } This results in an invalid free() & segfault because you're freeing which may not have been allocated by string_list_init(). Good point. Did my tests not catch this? (seems it requires calling `git commit-graph write` with no `--stdin-packs` or `--stdin-commits`). Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably you're on a more forgiving compiler/platform/options. I compiled with -O0 -g on clang 4.0.1-8 + Debian testing. I appreciate the extra platform testing. I'm using GCC on Ubuntu (gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0). While the errors didn't repro on my box, they did on the VSTS Linux build machines [1]. I created an internal PR just so I could run those tests (and on our OSX agents which use clang) and so I can verify I fixed the situation. I'll send a v6 soon so Junio only picks up a version that succeeds in tests. Thanks, -Stolee [1] https://github.com/Microsoft/vsts-agent-docker/blob/master/ubuntu/16.04/standard/Dockerfile What's installed on a VSTS Linux build agent
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote: you are the one arguing that the GDPR prohibits Git from storing and revealing this license granting data, not me. It prohibits publishing, and only after a request to be forgotten. It does not prohibit storing your private copy. Wrong, if you have to delete info, you are not allowed to keep a private copy. There is _nothing_ in the GDPR about publishing information, everything in it is about what you are allowed to store privately, how you are required to protect it (or more precisely, what you are required to do if private data gets hacked), and how you are required to keep it available.
Re: [RFC PATCH v1] telemetry design overview (part 1)
On Thu, Jun 07, 2018 at 11:10:52PM +0200, Johannes Sixt wrote: > Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com: > > From: Jeff Hostetler > > > > I've been working to add code to Git to optionally collect telemetry data. > > The goal is to be able to collect performance data from Git commands and > > allow it to be aggregated over a user community to find "slow commands". > > Seriously? "add code to collect telemetry data" said by somebody whose email > address ends with @microsoft.com is very irritating. I really don't want to > have yet another switch that I must check after every update that it is > still off. If you look at the design document, it's off by default and would write to a file on the filesystem. That doesn't seem all that different from GIT_TRACE. -Peff
Re: [PATCHv1 1/3] git-p4: raise exceptions from p4CmdList based on error from p4 server
>> >>> ErrorId MsgDb::MaxResults = { ErrorOf( ES_DB, 32, >>> E_FAILED, EV_ADMIN, 1 ), "Request too large (over %maxResults%); see >>> 'p4 help maxresults'." } ;//NOTRANS >>> ErrorId MsgDb::MaxScanRows = { ErrorOf( ES_DB, 61, >>> E_FAILED, EV_ADMIN, 1 ), "Too many rows scanned (over %maxScanRows%); >>> see 'p4 help maxscanrows'." } ;//NOTRANS >>> >>> I don't think there's actually a way to make it return any language >>> other than English though. [...] >>> So I think probably the language is always English. >> >> The "NOTRANS" annotation on the error messages is reassuring. > > I'll check it works OK on Windows; charset translation might cause a problem. Works OK on Windows with 2017.2 client/server.
Dear Friend,
Dear Friend, I am Mr.Azzia Mohammed, the head of file department of Bank of Africa (B.O.A) herein Burkina Faso / Ouagadougou. In my department we discover an abandoned sum of (US$18 million US Dollars) in an account that belongs to one of our foreign customer who died along with his family in plane crash. It is therefore upon this discovery that I now decided to make this business proposal to you and release the money to you as the next of kin or relation to the deceased for the safety and subsequent disbursement since nobody is coming for it. I agree that 40% of this money will be for you, while 60% would be for me. Then after the money is been transferred into your account, I will visit your country for an investment under your kind control. You have to contact my Bank directly as the real next of kin of this deceased account with next of kin application form. You have to send me those your information below to enable me use it and get you next of kin application form from bank, so that you will contact Bank for the transfer of this money into your account. Your Full Name___ Your Home Address Your Age ___ Your Handset Number Your Occupation ___ I am waiting for your urgent respond to enable us proceed further for the transfer. Yours faithfully, Mr. Azzia Mohammed.
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote: > Can you walk us through how anyone would be expected to fork (as create > a new project, not the github-ism) existing projects under such a > regiment? I don't see your point. Copy the repository to fork. Nothing changes about that. Nothing prevents anyone from forking a repository which had some of its author names removed from the commits. > As David Lang notes upthread, "the license is granted to the world, so > the world has an interest in it". I wouldn't be so sure that this line > of argument wouldn't work. As I already stressed, having an interest is not enough. You need to have overriding legitimate grounds. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote: > Wrong, if you have to delete info, you are not allowed to keep a private > copy. Yes you are allowed. See Art. 17 (3) lit e GDPR. > There is _nothing_ in the GDPR about publishing information, > everything in it is about what you are allowed to store privately, how you > are required to protect it (or more precisely, what you are required to do > if private data gets hacked), and how you are required to keep it available. Nope, the GDPR is not at all restricted to private copies. The GDPR has special jargon for publishing; the GDPR calls it "disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified recipients (Art. 4 (9) GDPR), including ones in third countries (Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion". Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com: > From: Jeff Hostetler > > Add a series of jw_ routines and "struct json_writer" structure to compose > JSON data. The resulting string data can then be output by commands wanting > to support a JSON output format. > > The json-writer routines can be used to generate structured data in a > JSON-like format. We say "JSON-like" because we do not enforce the Unicode > (usually UTF-8) requirement on string fields. Internally, Git does not > necessarily have Unicode/UTF-8 data for most fields, so it is currently > unclear the best way to enforce that requirement. For example, on Linx > pathnames can contain arbitrary 8-bit character data, so a command like > "status" would not know how to encode the reported pathnames. We may want > to revisit this (or double encode such strings) in the future. > > The initial use for the json-writer routines is for generating telemetry > data for executed Git commands. Later, we may want to use them in other > commands, such as status. > > Helped-by: René Scharfe > Helped-by: Wink Saville > Helped-by: Ramsay Jones > Signed-off-by: Jeff Hostetler > --- > Makefile| 2 + > json-writer.c | 419 > json-writer.h | 113 + > t/helper/test-json-writer.c | 572 > > t/t0019-json-writer.sh | 236 ++ > 5 files changed, 1342 insertions(+) > create mode 100644 json-writer.c > create mode 100644 json-writer.h > create mode 100644 t/helper/test-json-writer.c > create mode 100755 t/t0019-json-writer.sh > > diff --git a/Makefile b/Makefile > index a1d8775..4ae6946 100644 > --- a/Makefile > +++ b/Makefile > @@ -666,6 +666,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh > TEST_PROGRAMS_NEED_X += test-genrandom > TEST_PROGRAMS_NEED_X += test-hashmap > TEST_PROGRAMS_NEED_X += test-index-version > +TEST_PROGRAMS_NEED_X += test-json-writer> TEST_PROGRAMS_NEED_X += > test-lazy-init-name-hash > TEST_PROGRAMS_NEED_X += test-line-buffer > TEST_PROGRAMS_NEED_X += test-match-trees This doesn't apply cleanly on master, because most test helpers have been combined into a single binary to reduce their disk footprint and link times. See efd71f8913 (t/helper: add an empty test-tool program) for the overall rationale. test-json-writer could become a built-in as well, I think. You can see e.g. in c932a5ff28 (t/helper: merge test-string-list into test-tool) what needs to be done to convert a helper. René
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Peter Backes wrote: > On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote: >> The problem is you've left undefined who is "you"? With an open >> source project, anyone who has contributed to open source project has >> a copyright interest. That hobbyist in German who submitted a patch? >> They have a copyright interest. That US Company based in Redmond, >> Washington? They own a copyright interest. Huawei in China? They >> have a copyright interest. >> >> So there is no "privately". And "you" numbers in the thousands and >> thousands of copyright holders of portions of the open source code. > > Of course there is "privately". Every single one of those who have the > author information can keep it, privately, for themselves. But those > that have received a request to be forgotten must not keep publishing > it on the Internet for download or distribute it to others. Can you walk us through how anyone would be expected to fork (as create a new project, not the github-ism) existing projects under such a regiment? E.g. in git.git we have SOB lines for the whole history, in lieu of GNU-style copyright assignment (which is how things mainly worked back in the CVS days) someone can just clone the repository and create a hostile fork, which is one of the central ideas of free software. In the world you're describing the history would have been expunged publicly, and no hosting site would be willing to host it. It might be gone in practical terms to anyone who just doesn't like how (in this example) the Git project is run, and thinks they can do it better. Maybe (again, in this example) the Software Freedom Conservancy's scope would have to expand to retain this private history (right now they have nothing to do with copyright). But then how am I going to fork the Git project if the SFC decides they don't want to cooperate with me? As David Lang notes upthread, "the license is granted to the world, so the world has an interest in it". I wouldn't be so sure that this line of argument wouldn't work.
Re: [RFC PATCH v1] telemetry design overview (part 1)
On Thu, Jun 07 2018, Johannes Sixt wrote: > Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com: >> From: Jeff Hostetler >> >> I've been working to add code to Git to optionally collect telemetry data. >> The goal is to be able to collect performance data from Git commands and >> allow it to be aggregated over a user community to find "slow commands". > > Seriously? "add code to collect telemetry data" said by somebody whose > email address ends with @microsoft.com is very irritating. I really > don't want to have yet another switch that I must check after every > update that it is still off. To elaborate a bit on Jeff's reply (since this was discussed in more detail at Git Merge this year), the point of this feature is not to ship git.git with some default telemetry, but so that in-house users of git like Microsoft, Dropbox, Booking.com etc. can build & configure their internal versions of git to turn on telemetry for their own users. There's numerous in-house monkeypatches to git on various sites (at least Microsoft & Dropbox reported having internal patches already). Something like this facility would allow us to agree on some implementation that could be shipped by default (but would be off by default), those of us who'd make use of this feature already have "root" on those user's machines, and control what git binary they use etc, their /etc/gitconfig and so on.
wir bieten 2% Kredite
Sehr geehrte Damen und Herren, Sie brauchen Geld? Sie sind auf der suche nach einem Darlehen? Seriös und unkompliziert? Dann sind Sie hier bei uns genau richtig. Durch unsere jahrelange Erfahrung und kompetente Beratung sind wir Europaweit tätig. Wir bieten jedem ein GÜNSTIGES Darlehen zu TOP Konditionen an. Darlehnen zwischen 5000 CHF/Euro bis zu 20 Millionen CHF/Euro möglich. Wir erheben dazu 2% Zinssatz. Lassen Sie sich von unserem kompetenten Team beraten. Zögern Sie nicht und kontaktieren Sie mich unter für weitere Infos & Anfragen unter der eingeblendeten Email Adresse. Ich freue mich von Ihnen zu hören.
Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
Derrick Stolee writes: > The commit-graph file stores a condensed version of the commit history. > This helps speed up several operations involving commit walks. This > feature does not work well if those commits "change" using features like > commit grafts, replace objects, or shallow clones. I like to think about those features as providing an overlay for the commit graph (similar to overlay filesystems, like overlayfs); in the case of git-replace quite literally. I will be calling all those features "history-altering features", for short. > Since the commit-graph feature is optional, hidden behind the > 'core.commitGraph' config option, and requires manual maintenance (until > ds/commit-graph-fsck' is merged), these issues only arise for expert > users who decided to opt-in. > > This RFC is a first attempt at rectify the issues that come up when > these features interact. I have not succeeded entirely, as I will > discuss below. What I would like to see here in cover letter is a generic description of _strategy_ to deal with those history-altering features. >From what I understand you have the following options for each of replacements, shallow clones and grafts: - turn off serialized commit-graph if given history-altering feature is present, as if core.commitGraph was set to false - invalidate and optionally refresh commit-graph file if given history-altering feature is present (maybe only if it changed the view of the history, is such check is possible) For automatic invalidation you would need to have either: - cover all possible ways by which given history-altering feature can change the view of history, or - keep state of history-altering change for which commit-graph was created (e.g. in proposed VAL4 chunk) and compare with current view of history if it changed For automatic turning off you would need only to check if history-altering feature is present. Let us examine each of those history-altering features that Git supports: * shallow clone - shallow clone usually means having shorter history, so serialized commit-graph is not needed as much; also changing the depth screws-up assumptions about generation numbers - there are only a few entry points changing the view of history, namely fetch and push with shallow options (--depth=, --deepen=, --shallow-since=, --update-shallow, --shallow-exclude=, --unshallow) - it is easy to check for presence of shallow clone feature by chacking of $GIT_DIR/shallow exists and is not empty - different contents of $GIT_DIR/shallow means different view of history - NOTE: internally uses grafts mechanism (emulated grafts) * replacements (replace objects, git-replace) - git-replace can be used to join current development repository with historical repository, in which case we would certainly want to make use of serialied commit graph; on the other hand the replacements do not necessary alter the view of the history - theoretically you could create replacement refs by hand, but in practice there are TWO ways of getting them: - using git-replace command to create, edit/change and delete replacement objects' ' - fetching or having pushed-to refs in refs/replace/* namespace - you need to check if there are any refs in refs/replace/* namespace to check if the feature is used (but it doesn't necessarily mean that it altered project history) - changed list of refs in refs/replace/* namespace (which you can get with for-each-ref command/API) does not necessarily mean that the view of the history changed: you can replace non-commit object, you can replace commits and not change their parents; it is not as easy as checking if file exists - NOTE: the feature can be turned off manually with GIT_NO_REPLACE_OBJECTS environment variable and with --no-replace-objects option to git wrapper. Also when pushing, fetching and fsck-ing this feature is turned off and refs in refs/replace/* namespace are treated as ordinary refs. This may mean that we would want to create commit-graph with replacements for ordinary non-bare repository, and without replacements for server-side bare repository. * grafts - subset of uses of git-replace, older and *obsolete* feature (because it is unsafe to use; that is you need to be careful with it). - edited by hand, no automatic entry points - if $GIT_DIR/info/grafts file is present, then feature is enabled (barring some corner cases, like empty file or file consisting only of comments) - changed contents of this file means changed view of history (well, except for reordering lines, or removing/adding empty lines and comments) > > The first two "DO NOT MERGE" commits are not intended to be part of the > main product, but are ways to help the full test suite run while > computing and consuming commit-graph files. This is acheived by calling >
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
On 6/7/2018 1:24 PM, Eric Sunshine wrote: On Thu, Jun 7, 2018 at 10:12 AM, wrote: Add a series of jw_ routines and "struct json_writer" structure to compose JSON data. The resulting string data can then be output by commands wanting to support a JSON output format. [...] Signed-off-by: Jeff Hostetler --- diff --git a/t/t0019-json-writer.sh b/t/t0019-json-writer.sh @@ -0,0 +1,236 @@ +test_expect_success 'simple object' ' + cat >expect <<-\EOF && + {"a":"abc","b":42,"c":3.14,"d":true,"e":false,"f":null} + EOF + test-json-writer >actual \ + @object \ + @object-string a abc \ + @object-int b 42 \ + @object-double c 2 3.140 \ + @object-true d \ + @object-false e \ + @object-null f \ + @end && + test_cmp expect actual +' To make it easier on people writing these tests, it might be nice for this to be less noisy by getting rid of "@" and "\". To get rid of "\", the test program could grab its script commands from stdin (one instruction per line) rather than from argv[]. For instance: test-json-writer >actual <<-\EOF && object object-string a abc ... end EOF Not a big deal, and certainly not worth a re-roll. I hadn't thought about doing it that way. Might be a little easier to use. Let me take a look and see if it would be much work to switch. Thanks Jeff
[PATCH] docs/git-blame: explain carets on boundary commits
The caret on boundary commits is only mentioned in the description of --abbrev; this patch adds a description of the behaviour in the paragraph describing how revision range specifiers are handled. Signed-off-by: Stephen Kitt --- Documentation/git-blame.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index 16323eb80..7f814dcef 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -163,7 +163,8 @@ When revision range specifiers are used to limit the annotation, lines that have not changed since the range boundary (either the commit v2.6.18 or the most recent commit that is more than 3 weeks old in the above example) are blamed for that range -boundary commit. +boundary commit. The sha1 is preceded by a caret, '^' to indicate +this (and its length is reduced by 1). A particularly useful way is to see if an added file has lines created by copy-and-paste from existing files. Sometimes this -- 2.11.0
Truncating file names with Unicode characters
# Truncating file names with Unicode characters When shortening file names that contain Unicode characters, git performs truncation without awareness of two-byte characters. That often leads to splitting a character in half and displaying a garbage byte that's left. Unawareness of Unicode also means that filename length is calculated incorrectly and some output gets misaligned. I have tested this with git 2.14.1 on Windows and with git 2.11.0 on Linux. My configuration includes setting `core.quotepath = off` to display Unicode paths. # Example: `git log --stat` ## Bad output: half-characters and wrong text alignment The last file name gets truncated in the middle of the character (`ˆ` is what's left of it). Text alignment is off because string lengths are calculated in bytes instead of characters. Extension/README.md| 28 + .../Catalog.Номенклатура.xml | 32 ++ .../Configuration.xml | 5 +- ...етПереработчика.ObjectModule.txt | 39 ...cument.ОтчетПереработчика.xml | 68 + .../Enum.СтавкиНДС.xml| 24 ...ˆирениеERPПотяркин_2018-06-05.cfe | Bin 0 -> 22018 bytes 7 files changed, 195 insertions(+), 1 deletion(-) ## Good output with ASCII file names Truncation and alignment are done right because each character is represented by a single byte. .../index.html | 14 ++ docs/posts/2017/loops-in-power-query-m-language/index.html | 14 ++ .../index.html | 7 +++ .../temporary-virtual-environment-for-python/index.html| 14 ++ .../index.html | 14 ++ docs/posts/2018/getting-started-with-libpq/index.html | 14 ++ .../index.html | 14 ++ .../2018/unit-testing-in-power-query-m-language/index.html | 7 +++ 8 files changed, 98 insertions(+)
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com: > From: Jeff Hostetler > +/* > + * Add comma if we have already seen a member at this level. > + */ > +static inline void maybe_add_comma(struct json_writer *jw) > +{ > + if (!jw->open_stack.len) > + return; This is impossible. maybe_add_comma() is only called directly after assert_in_object() and assert_in_object(), which abort if open_stack is empty. > + if (jw->first_stack.buf[jw->first_stack.len - 1] == '1') > + jw->first_stack.buf[jw->first_stack.len - 1] = '0'; > + else > + strbuf_addch(>json, ','); You only need a binary flag to indicate if a comma is needed, not a stack. We need a comma at the current level if and only if we already wrote a child item. If we finish a level then we do need a comma at the previous level because we just wrote a sub-object or sub-array there. And that should cover all cases. Am I missing something? I get a sense of déjà vu, by the way. :) Here's what I mean: --- json-writer.c | 17 ++--- json-writer.h | 13 + 2 files changed, 11 insertions(+), 19 deletions(-) diff --git a/json-writer.c b/json-writer.c index f35ce1912a..14a4e89188 100644 --- a/json-writer.c +++ b/json-writer.c @@ -5,7 +5,7 @@ void jw_init(struct json_writer *jw) { strbuf_reset(>json); strbuf_reset(>open_stack); - strbuf_reset(>first_stack); + jw->need_comma = 0; jw->pretty = 0; } @@ -13,7 +13,6 @@ void jw_release(struct json_writer *jw) { strbuf_release(>json); strbuf_release(>open_stack); - strbuf_release(>first_stack); } /* @@ -69,7 +68,7 @@ static inline void begin(struct json_writer *jw, char ch_open, int pretty) strbuf_addch(>json, ch_open); strbuf_addch(>open_stack, ch_open); - strbuf_addch(>first_stack, '1'); + jw->need_comma = 0; } /* @@ -99,12 +98,10 @@ static inline void assert_in_array(const struct json_writer *jw) */ static inline void maybe_add_comma(struct json_writer *jw) { - if (!jw->open_stack.len) - return; - if (jw->first_stack.buf[jw->first_stack.len - 1] == '1') - jw->first_stack.buf[jw->first_stack.len - 1] = '0'; - else + if (jw->need_comma) strbuf_addch(>json, ','); + else + jw->need_comma = 1; } static inline void fmt_double(struct json_writer *jw, int precision, @@ -397,8 +394,6 @@ void jw_end(struct json_writer *jw) char ch_open; int len; - if (jw->open_stack.len != jw->first_stack.len) - BUG("jwon-writer: open_ and first_ stacks out of sync"); if (!jw->open_stack.len) BUG("json-writer: too many jw_end(): '%s'", jw->json.buf); @@ -406,7 +401,7 @@ void jw_end(struct json_writer *jw) ch_open = jw->open_stack.buf[len]; strbuf_setlen(>open_stack, len); - strbuf_setlen(>first_stack, len); + jw->need_comma = 1; if (jw->pretty) strbuf_addch(>json, '\n'); diff --git a/json-writer.h b/json-writer.h index af9c2612f8..c437462ba8 100644 --- a/json-writer.h +++ b/json-writer.h @@ -59,18 +59,15 @@ struct json_writer struct strbuf open_stack; /* -* Another simple stack of the currently open array and object -* forms. This is used in parallel to "open_stack" (same length). -* It contains a string of '1' and '0' where '1' indicates that -* the next child-element seen will be the first child (and does -* not need a leading comma). +* Indicates if the next child item needs a leading comma. +* Only the first item of arrays and objects doesn't need one. */ - struct strbuf first_stack; + unsigned int need_comma:1; - int pretty; + unsigned int pretty:1; }; -#define JSON_WRITER_INIT { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, 0 } +#define JSON_WRITER_INIT { STRBUF_INIT, STRBUF_INIT, 0, 0 } void jw_init(struct json_writer *jw); void jw_release(struct json_writer *jw); -- 2.17.1
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote: On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote: Wrong, if you have to delete info, you are not allowed to keep a private copy. Yes you are allowed. See Art. 17 (3) lit e GDPR. There is _nothing_ in the GDPR about publishing information, everything in it is about what you are allowed to store privately, how you are required to protect it (or more precisely, what you are required to do if private data gets hacked), and how you are required to keep it available. Nope, the GDPR is not at all restricted to private copies. If the GDPR doesn't restrict private copies, then Google and Facebook are free to keep all data about everyone. That is explicitly what the GDPR is trying to prevent. The GDPR has special jargon for publishing; the GDPR calls it "disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified recipients (Art. 4 (9) GDPR), including ones in third countries (Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion". disclosure is what the person who submits the patch is doing, torturing the language of the GDPR to say that hanging on to data that people want you to delete is legal, and echoing public data that people have asked to be public is not legal is not going to be a successful line of argument, it's the exact opposite of the stated goals of the GDPR. David Lang
[PATCH 10/20] abbrev tests: test for "git-diff" behavior
The "diff" family of commands does its own parsing for --abbrev in diff.c, so having dedicated tests for it makes sense. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 31 +++ 1 file changed, 31 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index a66051c040..783807475f 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -213,4 +213,35 @@ do " done +for i in $(test_seq 4 40) +do + test_expect_success "diff --no-index --raw core.abbrev=$i and --abbrev=$i" " + test_must_fail git -c core.abbrev=$i diff --no-index --raw X Y >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = $i diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = $i diff.4 && + + test_must_fail git diff --no-index --raw --abbrev=$i X Y >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = $i diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = $i diff.4 + " + + test_expect_success "diff --raw core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i diff --raw HEAD~ >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = $i diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = $i diff.4 && + + git diff --raw --abbrev=$i HEAD~ >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = $i diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = $i diff.4 + " +done + test_done -- 2.17.0.290.gded63e768a
[PATCH 14/20] config.c: use braces on multiple conditional arms
Adjust this code that'll be modified in a subsequent change to have more than one line per branch to use braces per the CodingGuidelines, this makes the later change easier to understand. Signed-off-by: Ævar Arnfjörð Bjarmason --- config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/config.c b/config.c index fbbf0f8e9f..12f762ad92 100644 --- a/config.c +++ b/config.c @@ -1149,9 +1149,9 @@ static int git_default_core_config(const char *var, const char *value) if (!strcmp(var, "core.abbrev")) { if (!value) return config_error_nonbool(var); - if (!strcasecmp(value, "auto")) + if (!strcasecmp(value, "auto")) { default_abbrev = -1; - else { + } else { int abbrev = git_config_int(var, value); if (abbrev < minimum_abbrev || abbrev > 40) return error("abbrev length out of range: %d", abbrev); -- 2.17.0.290.gded63e768a
[PATCH 16/20] abbrev: unify the handling of non-numeric values
Unify how --abbrev=XYZ and core.abbrev=XYZ is handled. 2/4 parsers for these values would just let these invalid values pass, treating them as though "0" was provided, which due to other inconsistent fallback logic (soon to be fixed) would be treated as providing MINIMUM_ABBREV. Signed-off-by: Ævar Arnfjörð Bjarmason --- diff.c| 5 - revision.c| 5 - t/t0014-abbrev.sh | 10 +- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 136d44b455..75935322f1 100644 --- a/diff.c +++ b/diff.c @@ -4801,7 +4801,10 @@ int diff_opt_parse(struct diff_options *options, else if (!strcmp(arg, "--abbrev")) options->abbrev = DEFAULT_ABBREV; else if (skip_prefix(arg, "--abbrev=", )) { - options->abbrev = strtoul(arg, NULL, 10); + char *end; + options->abbrev = strtoul(arg, , 10); + if (*end) + die("--abbrev expects a numerical value, got '%s'", arg); if (options->abbrev < MINIMUM_ABBREV) options->abbrev = MINIMUM_ABBREV; else if (the_hash_algo->hexsz < options->abbrev) diff --git a/revision.c b/revision.c index 40fd91ff2b..aa87afa77f 100644 --- a/revision.c +++ b/revision.c @@ -2047,7 +2047,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--abbrev")) { revs->abbrev = DEFAULT_ABBREV; } else if (skip_prefix(arg, "--abbrev=", )) { - revs->abbrev = strtoul(optarg, NULL, 10); + char *end; + revs->abbrev = strtoul(optarg, , 10); + if (*end) + die("--abbrev expects a numerical value, got '%s'", optarg); if (revs->abbrev < MINIMUM_ABBREV) revs->abbrev = MINIMUM_ABBREV; else if (revs->abbrev > hexsz) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 6dee92f35e..203fe316b9 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -64,14 +64,14 @@ test_expect_success 'abbrev non-integer value handling differs ' ' test_must_fail git branch -v --abbrev=XYZ 2>stderr && test_i18ngrep "expects a numerical value" stderr && - git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr && - ! test -s stderr && + test_must_fail git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr && + test_i18ngrep "expects a numerical value" stderr && - git diff --raw --abbrev=XYZ HEAD~ 2>stderr && - ! test -s stderr && + test_must_fail git diff --raw --abbrev=XYZ HEAD~ 2>stderr && + test_i18ngrep "expects a numerical value" stderr && test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr && - ! test -s stderr + test_i18ngrep "expects a numerical value" stderr ' for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41 -- 2.17.0.290.gded63e768a
[PATCH 17/20] abbrev: unify the handling of empty values
For no good reason the --abbrev= command-line option was less strict than the core.abbrev config option, which came down to the latter using git_config_int() which rejects an empty string, but the rest of the parsing using strtoul() which will convert it to 0. Signed-off-by: Ævar Arnfjörð Bjarmason --- diff.c | 2 ++ parse-options-cb.c | 2 ++ revision.c | 2 ++ t/t0014-abbrev.sh | 22 -- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/diff.c b/diff.c index 75935322f1..cab79d24ab 100644 --- a/diff.c +++ b/diff.c @@ -4802,6 +4802,8 @@ int diff_opt_parse(struct diff_options *options, options->abbrev = DEFAULT_ABBREV; else if (skip_prefix(arg, "--abbrev=", )) { char *end; + if (!strcmp(arg, "")) + die("--abbrev expects a value, got '%s'", arg); options->abbrev = strtoul(arg, , 10); if (*end) die("--abbrev expects a numerical value, got '%s'", arg); diff --git a/parse-options-cb.c b/parse-options-cb.c index e3cd87fbd6..aa9984f164 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -16,6 +16,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) if (!arg) { v = unset ? 0 : DEFAULT_ABBREV; } else { + if (!strcmp(arg, "")) + return opterror(opt, "expects a value", 0); v = strtol(arg, (char **), 10); if (*arg) return opterror(opt, "expects a numerical value", 0); diff --git a/revision.c b/revision.c index aa87afa77f..d39a292895 100644 --- a/revision.c +++ b/revision.c @@ -2048,6 +2048,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->abbrev = DEFAULT_ABBREV; } else if (skip_prefix(arg, "--abbrev=", )) { char *end; + if (!strcmp(optarg, "")) + die("--abbrev expects a value, got '%s'", optarg); revs->abbrev = strtoul(optarg, , 10); if (*end) die("--abbrev expects a numerical value, got '%s'", optarg); diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 203fe316b9..8448f78560 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -38,23 +38,17 @@ test_expect_success 'abbrev empty value handling differs ' ' test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr && test_i18ngrep "bad numeric config value.*invalid unit" stderr && - git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch && - test_byte_count = 40 branch && + test_must_fail git branch -v --abbrev= 2>stderr && + test_i18ngrep "expects a value" stderr && - git log --abbrev= -1 --pretty=format:%h >log && - test_byte_count = 4 log && + test_must_fail git log --abbrev= -1 --pretty=format:%h 2>stderr && + test_i18ngrep "expects a value" stderr && - git diff --raw --abbrev= HEAD~ >diff && - cut_tr_d_n_field_n 3 diff.3 && - test_byte_count = 4 diff.3 && - cut_tr_d_n_field_n 4 diff.4 && - test_byte_count = 4 diff.4 && + test_must_fail git diff --raw --abbrev= HEAD~ 2>stderr && + test_i18ngrep "expects a value" stderr && - test_must_fail git diff --raw --abbrev= --no-index X Y >diff && - cut_tr_d_n_field_n 3 diff.3 && - test_byte_count = 4 diff.3 && - cut_tr_d_n_field_n 4 diff.4 && - test_byte_count = 4 diff.4 + test_must_fail git diff --raw --abbrev= --no-index X Y 2>stderr && + test_i18ngrep "expects a value" stderr ' test_expect_success 'abbrev non-integer value handling differs ' ' -- 2.17.0.290.gded63e768a
[PATCH 19/20] abbrev: support relative abbrev values
Change the core.abbrev config variable and the corresponding --abbrev command-line option to support relative values such as +1 or -1. Before Linus's e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30) git would default to abbreviating object names to 7-hexdigits, and only picking longer SHA-1s as needed if that was ambiguous. That change instead set the default length as a function of the estimated current count of objects: Based on the expectation that we would see collision in a repository with 2^(2N) objects when using object names shortened to first N bits, use sufficient number of hexdigits to cover the number of objects in the repository. Each hexdigit (4-bits) we add to the shortened name allows us to have four times (2-bits) as many objects in the repository. By supporting relative values for core.abbrev we can allow users to consistently opt-in for either a higher or lower probability of collision, without needing to hardcode a given numeric value like "10", which would be overkill on some repositories, and far to small on others. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/config.txt | 6 ++ Documentation/diff-options.txt | 3 + Documentation/git-blame.txt| 3 + Documentation/git-branch.txt | 3 + Documentation/git-describe.txt | 3 + Documentation/git-ls-files.txt | 3 + Documentation/git-ls-tree.txt | 3 + Documentation/git-show-ref.txt | 3 + cache.h| 1 + config.c | 11 +++ diff.c | 18 +++- environment.c | 1 + parse-options-cb.c | 12 ++- revision.c | 18 +++- sha1-name.c| 11 +++ t/t0014-abbrev.sh | 170 ++--- 16 files changed, 204 insertions(+), 65 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..abf07be7b6 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -919,6 +919,12 @@ core.abbrev:: in your repository, which hopefully is enough for abbreviated object names to stay unique for some time. The minimum length is 4. ++ +This can also be set to relative values such as `+2` or `-2`, which +means to add or subtract N characters from the SHA-1 that Git would +otherwise print, this allows for producing more future-proof SHA-1s +for use within a given project, while adjusting the value for the +current approximate number of objects. add.ignoreErrors:: add.ignore-errors (deprecated):: diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index f466600972..f1114a7b8d 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -384,6 +384,9 @@ endif::git-format-patch[] independent of the `--full-index` option above, which controls the diff-patch output format. Non default number of digits can be specified with `--abbrev=`. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. -B[][/]:: --break-rewrites[=[][/]]:: diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index d6cddbcb2e..8559d3b0c7 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -99,6 +99,9 @@ requested, resulting in unaligned output. Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule and would cause blame to emit output that was unaligned. This bug has since been fixed. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. THE PORCELAIN FORMAT diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt index 02eccbb931..0f8032cec6 100644 --- a/Documentation/git-branch.txt +++ b/Documentation/git-branch.txt @@ -182,6 +182,9 @@ See `--create-reflog` above for details. Alter the sha1's minimum display length in the output listing. The default value is 7 and can be overridden by the `core.abbrev` config option. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. --no-abbrev:: Display the full sha1s in the output listing rather than abbreviating them. diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index e027fb8c4b..a3d5c7e817 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -65,6 +65,9 @@ OPTIONS abbreviated object name, use digits, or as many digits as needed to form a unique object name. An of 0 will suppress long format, only showing the closest tag. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. --candidates=:: Instead of considering only the 10 most recent tags as diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index 5298f1bc30..f9af2b23bf 100644 --- a/Documentation/git-ls-files.txt +++
[PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ
In ac53fe8601 ("sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ", 2017-07-13) the code this is validating user input for in find_unique_abbrev_r() was converted to GIT_SHA1_HEXSZ, but the corresponding validation codepath wasn't change. Let's do that. Signed-off-by: Ævar Arnfjörð Bjarmason --- parse-options-cb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index 0f9f311a7a..298b5735c8 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -21,8 +21,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) return opterror(opt, "expects a numerical value", 0); if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; - else if (v > 40) - v = 40; + else if (v > GIT_SHA1_HEXSZ) + v = GIT_SHA1_HEXSZ; } *(int *)(opt->value) = v; return 0; -- 2.17.0.290.gded63e768a
Re: GDPR compliance best practices?
Hi, Peter Backes wrote: > I'd like to ask whether anyone has best practices for achieving GDPR > compliance for git repos? The GDPR will come into effect in the EU next > month. This is a reasonable question to ask other Git users on this list to share ideas, so thanks for asking it. > In particular, how do you cope with the "Right to erasure" concerning > entries in the history of your git repos? Later in the thread you discussed some changes you would like to make to Git or in front of Git to ensure that people can erase their authorship information from a repository after the fact in a non-disruptive way. I have no opinion about how that relates to GDPR requirements. I tend to expect any legal advice a person gets to be situation-specific; it's much harder to get legal advice that is useful to share. Separate from that legal context, though, I think it's an interesting feature request. I don't think it goes far enough: I would like a way to erase arbitrary information from the history in a repository. For example, if I accidentally check in an encryption key in my repository as content or a commit message, I would like a way to remove it, assuming that others who fetch from the same repo are willing to cooperate with me, of course (i.e. in place of the object, the server would store a placeholder and an _advisory_ token allowing clients to know (1) that this object was deleted, (2) what object to use instead, and (3) an explanatory note about why the deletion occured; clients could make whatever use of this information they choose). I've seen some discussion on this subject at https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html long ago and have some ideas of my own, but nothing concrete yet. Anyway, I thought it might be useful to get people's minds working on it. Thanks, Jonathan
[PATCH 09/20] abbrev tests: test for "git-log" behavior
The "log" family of commands does its own parsing for --abbrev in revision.c, so having dedicated tests for it makes sense. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 10 ++ 1 file changed, 10 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 645bcca1d1..a66051c040 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -203,4 +203,14 @@ do " done +for i in $(test_seq 4 40) +do + test_expect_success "log core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i log --pretty=format:%h -1 | tr_d_n >log && + test_byte_count = $i log && + git log --abbrev=$i --pretty=format:%h -1 | tr_d_n >log && + test_byte_count = $i log + " +done + test_done -- 2.17.0.290.gded63e768a
[PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev
Since 84393bfd73 ("blame: add --abbrev command line option and make it honor core.abbrev", 2011-04-06) first released with v1.7.6 "git blame" has supported both core.abbrev and --abbrev to change the abbreviation length. Initially output wouldn't alter the abbreviation length to account for the boundary commit. This was changed in 91229834c2 ("blame: fix alignment with --abbrev=40", 2017-01-05) first released with v2.11.1. That change had a bug which I'm fixing here. It didn't account for the abbreviation length also being set via core.abbrev, not just via the --abbrev command-line option. So let's handle that. The easiest way to do that is to check if the global default_abbrev variable (-1 by default) has been set by git_default_core_config(), and if so pretend we had the --abbrev option is set, in lieu of making everything that uses the "abbrev" variable now read that OR default_abbrev). The reason I'm documenting these past behaviors is that whatever our desires with --porcelain people do parse the human-readable "git blame" output, and for any machine use are likely to use core.abbrev=40 or --abbrev=40. Documenting how the format has changed over time will help those users avoid nasty surprises. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/git-blame.txt | 8 +++- builtin/blame.c | 2 ++ t/t0014-abbrev.sh | 7 ++- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index 7b562494ac..d6cddbcb2e 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -92,7 +92,13 @@ include::blame-options.txt[] Because of this UI design, the only way to get the full SHA-1 of the boundary commit is to use the `--porcelain` format. With `--abbrev=40` only 39 characters of the boundary SHA-1 will be emitted, since one -will be used for the caret to mark the boundary. +will be used for the caret to mark the boundary. This behavior was +different before 2.11.1, git would then emit the 40 character if +requested, resulting in unaligned output. ++ +Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule +and would cause blame to emit output that was unaligned. This bug has +since been fixed. THE PORCELAIN FORMAT diff --git a/builtin/blame.c b/builtin/blame.c index 4202584f97..6ab08561d1 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -868,6 +868,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) } else if (show_progress < 0) show_progress = isatty(2); + if (default_abbrev >= 0) + abbrev = default_abbrev; if (0 < abbrev && abbrev < GIT_SHA1_HEXSZ) /* one more abbrev length is needed for the boundary commit */ abbrev++; diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 77f15d5b0b..934c54a96b 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -135,7 +135,12 @@ do test_expect_success "blame core.abbrev=$i and --abbrev=$i with boundary" " # See the blame documentation for why this is off-by-one git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | nocaret >blame && - test_byte_count = $i blame && + if test $i -eq 40 + then + test_byte_count = 39 blame + else + test_byte_count = $i blame + fi && git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret >blame && if test $i -eq 40 then -- 2.17.0.290.gded63e768a
[PATCH 18/20] abbrev parsing: use braces on multiple conditional arms
Adjust this code that'll be modified in a subsequent change to have more than one line per branch to use braces per the CodingGuidelines, this makes the later change easier to understand. Signed-off-by: Ævar Arnfjörð Bjarmason --- diff.c | 5 +++-- revision.c | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index cab79d24ab..e0141cfbc0 100644 --- a/diff.c +++ b/diff.c @@ -4807,10 +4807,11 @@ int diff_opt_parse(struct diff_options *options, options->abbrev = strtoul(arg, , 10); if (*end) die("--abbrev expects a numerical value, got '%s'", arg); - if (options->abbrev < MINIMUM_ABBREV) + if (options->abbrev < MINIMUM_ABBREV) { options->abbrev = MINIMUM_ABBREV; - else if (the_hash_algo->hexsz < options->abbrev) + } else if (the_hash_algo->hexsz < options->abbrev) { options->abbrev = the_hash_algo->hexsz; + } } else if ((argcount = parse_long_opt("src-prefix", av, ))) { options->a_prefix = optarg; diff --git a/revision.c b/revision.c index d39a292895..2a75fef22d 100644 --- a/revision.c +++ b/revision.c @@ -2053,10 +2053,11 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->abbrev = strtoul(optarg, , 10); if (*end) die("--abbrev expects a numerical value, got '%s'", optarg); - if (revs->abbrev < MINIMUM_ABBREV) + if (revs->abbrev < MINIMUM_ABBREV) { revs->abbrev = MINIMUM_ABBREV; - else if (revs->abbrev > hexsz) + } else if (revs->abbrev > hexsz) { revs->abbrev = hexsz; + } } else if (!strcmp(arg, "--abbrev-commit")) { revs->abbrev_commit = 1; revs->abbrev_commit_given = 1; -- 2.17.0.290.gded63e768a
[PATCH 05/20] abbrev tests: test "git-blame" behavior
Add tests showing how "git-blame" behaves. As noted in an earlier change there's a behavior difference between core.abbrev=40 and --abbrev=40. Let's also assert that neither way of changing the abbreviation length modifies the porcelain output. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 48 +++ 1 file changed, 48 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 1c60f5ff93..77f15d5b0b 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -12,6 +12,10 @@ cut_tr_d_n_field_n() { cut -d " " -f $1 | tr_d_n } +nocaret() { + sed 's/\^//' +} + test_expect_success 'setup' ' test_commit A && git tag -a -mannotated A.annotated && @@ -115,4 +119,48 @@ do " done +for i in $(test_seq 4 40) +do + for opt in --porcelain --line-porcelain + do + test_expect_success "blame $opt ignores core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i blame $opt A.t | head -n 1 | cut_tr_d_n_field_n 1 >blame && + test_byte_count = 40 blame && + git blame $opt --abbrev=$i A.t | head -n 1 | cut_tr_d_n_field_n 1 >blame && + test_byte_count = 40 blame + " + done + + + test_expect_success "blame core.abbrev=$i and --abbrev=$i with boundary" " + # See the blame documentation for why this is off-by-one + git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | nocaret >blame && + test_byte_count = $i blame && + git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret >blame && + if test $i -eq 40 + then + test_byte_count = 39 blame + else + test_byte_count = $i blame + fi + " + + test_expect_success "blame core.abbrev=$i and --abbrev=$i without boundary" " + git -c core.abbrev=$i blame B.t | cut_tr_d_n_field_n 1 | nocaret >blame && + if test $i -eq 40 + then + test_byte_count = $i blame + else + test_byte_count = \$(($i + 1)) blame + fi && + git blame --abbrev=$i B.t | cut_tr_d_n_field_n 1 | nocaret >blame && + if test $i -eq 40 + then + test_byte_count = $i blame + else + test_byte_count = \$(($i + 1)) blame + fi + " +done + test_done -- 2.17.0.290.gded63e768a
[PATCH 20/20] abbrev: add a core.validateAbbrev setting
Operations that need to abbreviate a lot of SHA-1s such as 'git log --oneline' experience degraded performance when traversing a lot of packs. See [1] and [2] for some relevant performance numbers. One way to address this is something like the MIDX to make looking up the SHA-1s cheaper. This change adds an alternate method of achieving some of the same ends (but possibly not all, see [3] and replies to the original thread at [1]). Instead of trying really hard to find an unambiguous SHA-1 we can with core.validateAbbrev=false, and preferably combined with the new support for relative core.abbrev values (such as +4) unconditionally print a short SHA-1 without doing any disambiguation check. I.e. it allows for picking a trade-off between performance, and the odds that future or remote (or current and local) short SHA-1 will be ambiguous. With the included performance test my git.git repacked with with `git repack -A -d --max-pack-size=X` gives the following results against git.git itself with X=64M: TestHEAD~ HEAD 0014.1: git log --oneline --raw --parents 2.53(2.48+0.05) 2.20(2.14+0.05) -13.0% With one big pack -7.6%, and with 16M packs -23.8%. 1. https://public-inbox.org/git/20180107181459.222909-1-dsto...@microsoft.com/ 2. https://public-inbox.org/git/20180607140338.32440-1-dsto...@microsoft.com/ 3. https://public-inbox.org/git/87lgbsz61p@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/config.txt| 43 ++ cache.h | 1 + config.c| 7 + environment.c | 1 + sha1-name.c | 4 +++ t/perf/p0014-abbrev.sh | 13 t/t1512-rev-parse-disambiguation.sh | 47 + 7 files changed, 116 insertions(+) create mode 100755 t/perf/p0014-abbrev.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index abf07be7b6..df31d1351f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that Git would otherwise print, this allows for producing more future-proof SHA-1s for use within a given project, while adjusting the value for the current approximate number of objects. ++ +This is especially useful in combination with the +`core.validateAbbrev` setting, or to get more future-proof hashes to +reference in the future in a repository whose number of objects is +expected to grow. + +core.validateAbbrev:: + If set to false (true by default) don't do any validation when + printing abbreviated object names to see if they're really + unique. This makes printing objects more performant at the + cost of potentially printing object names that aren't unique + within the current repository. ++ +When printing abbreviated object names Git needs to look through the +local object store. This is an `O(log N)` operation assuming all the +objects are in a single pack file, but `X * O(log N)` given `X` pack +files, which can get expensive on some larger repositories. ++ +This setting changes that to `O(1)`, but with the trade-off that +depending on the value of `core.abbrev` we may be printing abbreviated +hashes that collide. Too see how likely this is, try running: ++ +--- +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr +--- ++ +This shows how many commits were found at each abbreviation length. On +linux.git in June 2018 this shows a bit more than 750,000 commits, +with just 4 needing 11 characters to be fully abbreviated, and the +default heuristic picks a length of 12. ++ +Even without `core.validateAbbrev=false` the results abbreviation +already a bit of a probability game. They're guaranteed at the moment +of generation, but as more objects are added, ambiguities may be +introduced. Likewise, what's unambiguous for you may not be for +somebody else you're communicating with, if they have their own clone. ++ +Therefore the default of `core.validateAbbrev=true` may not save you +in practice if you're sharing the SHA-1 or noting it now to use after +a `git fetch`. You may be better off setting `core.abbrev` to +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine +that with `core.validateAbbrev=false` to get a reasonable trade-off +between safety and performance. add.ignoreErrors:: add.ignore-errors (deprecated):: diff --git a/cache.h b/cache.h index 0fb4211804..6dc5af9482 100644 --- a/cache.h +++ b/cache.h @@ -773,6 +773,7 @@ extern int
[PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N
In a later change I mean to make values like -1 and +1 mean something different, but right now they're implicitly parsed. Let's test for the current behavior before changing it. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 131 -- 1 file changed, 128 insertions(+), 3 deletions(-) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 5a99cbe434..6dee92f35e 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -74,7 +74,7 @@ test_expect_success 'abbrev non-integer value handling differs ' ' ! test -s stderr ' -for i in -41 -20 -10 -1 0 1 2 3 41 +for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41 do test_expect_success "core.abbrev value $i out of range errors out" " test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 2>stderr && @@ -90,7 +90,7 @@ do " done -for i in 0 1 2 3 4 +for i in 0 1 2 3 4 -0 +0 +1 +2 +3 +4 do test_expect_success "non-negative --abbrev=$i value log && @@ -98,7 +98,7 @@ do " done -for i in 41 9001 +for i in 41 9001 +41 +9001 do test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV falls back on 40" " git log --abbrev=$i -1 --pretty=format:%h >log && @@ -116,6 +116,10 @@ do git log --abbrev=$i -1 --pretty=format:%h >log && test_byte_count = $i log && + # core.abbrev=+N is the same as core.abbrev=N + git -c core.abbrev=+$i log -1 --pretty=format:%h >log && + test_byte_count = $i log && + # The --abbrev option should take priority over # core.abbrev git -c core.abbrev=20 log --abbrev=$i -1 --pretty=format:%h >log && @@ -172,16 +176,39 @@ do " done +test_expect_success 'blame core.abbrev=[-+]1 and --abbrev=[-+]1' ' + test_must_fail git -c core.abbrev=+1 blame A.t | cut_tr_d_n_field_n 1 >blame && + test_must_fail git -c core.abbrev=-1 blame A.t | cut_tr_d_n_field_n 1 >blame && + + git blame --abbrev=-1 A.t | cut_tr_d_n_field_n 1 >blame && + test_byte_count = 5 blame && + + git blame --abbrev=+1 A.t | cut_tr_d_n_field_n 1 >blame && + test_byte_count = 5 blame +' + for i in $(test_seq 4 40) do test_expect_success "branch core.abbrev=$i and --abbrev=$i" " git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch && test_byte_count = $i branch && + git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch && test_byte_count = $i branch " done +test_expect_success 'branch core.abbrev=[-+]1 and --abbrev=[-+]1' ' + test_must_fail git -c core.abbrev=+1 branch -v | cut_tr_d_n_field_n 3 >branch && + test_must_fail git -c core.abbrev=-1 branch -v | cut_tr_d_n_field_n 3 >branch && + + git branch --abbrev=-1 -v | cut_tr_d_n_field_n 3 >branch && + test_byte_count = 4 branch && + + git branch --abbrev=+1 -v | cut_tr_d_n_field_n 3 >branch && + test_byte_count = 4 branch +' + test_expect_success 'describe core.abbrev and --abbrev special cases' ' # core.abbrev=0 behaves as usual... test_must_fail git -c core.abbrev=0 describe && @@ -203,6 +230,17 @@ do " done +test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' ' + test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe && + test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe && + + git describe --abbrev=-1 | sed_g_tr_d_n >describe && + test_byte_count = 4 describe && + + git describe --abbrev=+1 | sed_g_tr_d_n >describe && + test_byte_count = 4 describe +' + for i in $(test_seq 4 40) do test_expect_success "log core.abbrev=$i and --abbrev=$i" " @@ -213,6 +251,20 @@ do " done +test_expect_success 'log core.abbrev=[-+]1 and --abbrev=[-+]1' ' + test_must_fail git -c core.abbrev=+1 log --pretty=format:%h -1 2>stderr && + test_i18ngrep "abbrev length out of range" stderr && + + test_must_fail git -c core.abbrev=-1 log --pretty=format:%h -1 2>stderr && + test_i18ngrep "abbrev length out of range" stderr && + + git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log && + test_byte_count = 4 log && + + git log --abbrev=-1 --pretty=format:%h -1 | tr_d_n >log && + test_byte_count = 40 log +' + for i in $(test_seq 4 40) do test_expect_success "diff --no-index --raw core.abbrev=$i and --abbrev=$i" " @@ -244,6 +296,46 @@ do " done +test_expect_success 'diff --no-index --raw core.abbrev=[-+]1 and --abbrev=[-+]1' ' + test_must_fail git -c core.abbrev=+1 diff --no-index --raw X Y 2>stderr && + test_i18ngrep "abbrev length out of range" stderr && + + test_must_fail git -c core.abbrev=-1 diff --no-index --raw X Y 2>stderr && + test_i18ngrep "abbrev length out of
[PATCH 08/20] abbrev tests: test for "git-describe" behavior
The only thing out of the ordinary with git-describe is the --abbrev=0 special-case. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 25 + 1 file changed, 25 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index d8b060d922..645bcca1d1 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -16,6 +16,10 @@ nocaret() { sed 's/\^//' } +sed_g_tr_d_n() { + sed 's/.*g//' | tr_d_n +} + test_expect_success 'setup' ' test_commit A && git tag -a -mannotated A.annotated && @@ -178,4 +182,25 @@ do " done +test_expect_success 'describe core.abbrev and --abbrev special cases' ' + # core.abbrev=0 behaves as usual... + test_must_fail git -c core.abbrev=0 describe && + + # ...but --abbrev=0 is special-cased to print the nearest tag, + # not fall back on "4" like git-log. + echo A.annotated >expected && + git describe --abbrev=0 >actual && + test_cmp expected actual +' + +for i in $(test_seq 4 40) +do + test_expect_success "describe core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i describe | sed_g_tr_d_n >describe && + test_byte_count = $i describe && + git describe --abbrev=$i | sed_g_tr_d_n >describe && + test_byte_count = $i describe + " +done + test_done -- 2.17.0.290.gded63e768a
[PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars
In a later change I'm adding stress testing of the commit abbreviation as it relates to git-blame and others, and initially thought that the inability to extract full SHA-1s from the non-"--porcelain" output was a bug. In hindsight I could have read the existing paragraph more carefully, but let's make this clearer by explicitly stating this limitation of --abbrev as it relates to git-blame, it is not shared by any other command that supports core.abbrev or --abbrev. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/git-blame.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index 16323eb80e..7b562494ac 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -88,6 +88,11 @@ include::blame-options.txt[] Instead of using the default 7+1 hexadecimal digits as the abbreviated object name, use +1 digits. Note that 1 column is used for a caret to mark the boundary commit. ++ +Because of this UI design, the only way to get the full SHA-1 of the +boundary commit is to use the `--porcelain` format. With `--abbrev=40` +only 39 characters of the boundary SHA-1 will be emitted, since one +will be used for the caret to mark the boundary. THE PORCELAIN FORMAT -- 2.17.0.290.gded63e768a
[PATCH 11/20] abbrev tests: test for plumbing behavior
The "git-{ls-files,ls-tree,show-ref}" commands all have in common that the core.abbrev variable is ignored, and only --abbrev works. This is intentional since these are all plumbing. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 32 1 file changed, 32 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 783807475f..5a99cbe434 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -244,4 +244,36 @@ do " done +for i in $(test_seq 4 40) +do + test_expect_success "ls-files core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files && + test_byte_count = 40 ls-files && + git ls-files --abbrev=$i --stage A.t | cut_tr_d_n_field_n 2 >ls-files && + test_byte_count = $i ls-files + " +done + +for i in $(test_seq 4 40) +do + test_expect_success "ls-tree core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree && + test_byte_count = 40 ls-tree && + git ls-tree --abbrev=$i HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree && + test_byte_count = $i ls-tree + " +done + +for i in $(test_seq 4 40) +do + test_expect_success "show-ref core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i show-ref --hash refs/heads/master | tr_d_n >show-ref && + test_byte_count = 40 show-ref && + git show-ref --hash --abbrev=$i refs/heads/master | tr_d_n >show-ref && + test_byte_count = $i show-ref && + git show-ref --hash=$i refs/heads/master | tr_d_n >show-ref && + test_byte_count = $i show-ref + " +done + test_done -- 2.17.0.290.gded63e768a
[PATCH 15/20] parse-options-cb.c: use braces on multiple conditional arms
Adjust this code that'll be modified in a subsequent change to have more than one line per branch to use braces per the CodingGuidelines, this makes the later change easier to understand. Signed-off-by: Ævar Arnfjörð Bjarmason --- parse-options-cb.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index 298b5735c8..e3cd87fbd6 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -19,10 +19,11 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) v = strtol(arg, (char **), 10); if (*arg) return opterror(opt, "expects a numerical value", 0); - if (v && v < MINIMUM_ABBREV) + if (v && v < MINIMUM_ABBREV) { v = MINIMUM_ABBREV; - else if (v > GIT_SHA1_HEXSZ) + } else if (v > GIT_SHA1_HEXSZ) { v = GIT_SHA1_HEXSZ; + } } *(int *)(opt->value) = v; return 0; -- 2.17.0.290.gded63e768a
[PATCH 07/20] abbrev tests: test "git branch" behavior
Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 10 ++ 1 file changed, 10 insertions(+) diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh index 934c54a96b..d8b060d922 100755 --- a/t/t0014-abbrev.sh +++ b/t/t0014-abbrev.sh @@ -168,4 +168,14 @@ do " done +for i in $(test_seq 4 40) +do + test_expect_success "branch core.abbrev=$i and --abbrev=$i" " + git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch && + test_byte_count = $i branch && + git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch && + test_byte_count = $i branch + " +done + test_done -- 2.17.0.290.gded63e768a
Re: [PATCH v6 20/21] gc: automatically write commit-graph files
> The commit-graph file is a very helpful feature for speeding up git > operations. In order to make it more useful, make it possible to > write the commit-graph file during standard garbage collection > operations. > > Add a 'gc.commitGraph' config setting that triggers writing a > commit-graph file after any non-trivial 'git gc' command. Defaults to > false while the commit-graph feature matures. We specifically do not > want to have this on by default until the commit-graph feature is fully > integrated with history-modifying features like shallow clones. So I played around with an earlier version of this patch series a while ago... and as I looked into my gitconfig today I was surprised to have both 'core.commitGraph' and 'gc.commitGraph' config variables set. When I looked into it I came across this email from Ævar: https://public-inbox.org/git/87fu3peni2@evledraar.gmail.com/ > Other than the question if 'gc.commitGraph' and 'core.commitGraph' > should be independent config variables, and the exact wording of the > git-gc docs, it looks good to me. Sans doc errors you pointed out in other places (you need to set core.commitGraph so it's read at all), I think it's very useful to have these split up. It's simliar to pack.useBitmaps & pack.writeBitmaps. I think the comparison with pack bitmaps makes a lot of sense and I have to say that I really like those 'useBitmaps' and 'writeBitmaps' variable names, because it's clear right away which one is which, without consulting the documentation. I think having 'useCommitGraph' and 'writeCommitGraph' variables would be a lot better than the same variable name in two different sections, and I'm sure that then I wouldn't have been caught off guard. Yeah, I know, my timing sucks, with 'core.commitGraph' already out there in the -rc releases... sorry.
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Jonathan Nieder wrote: > Separate from that legal context, though, I think it's an interesting > feature request. I don't think it goes far enough: I would like a way > to erase arbitrary information from the history in a repository. For > example, if I accidentally check in an encryption key in my repository > as content or a commit message, I would like a way to remove it, > assuming that others who fetch from the same repo are willing to > cooperate with me, of course (i.e. in place of the object, the server > would store a placeholder and an _advisory_ token allowing clients to > know (1) that this object was deleted, (2) what object to use instead, > and (3) an explanatory note about why the deletion occured; clients > could make whatever use of this information they choose). > > I've seen some discussion on this subject at > https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html > long ago and have some ideas of my own, but nothing concrete yet. > Anyway, I thought it might be useful to get people's minds working on > it. You may find it interesting to look at how git-annex-forget does this: https://git-annex.branchable.com/git-annex-forget/ & http://git-annex.branchable.com/devblog/day_-4__forgetting/
[PATCH 00/20] unconditional O(1) SHA-1 abbreviation
This patch series implements an entirely alternate method of achieving some of the same ends as the MIDX, using the approach suggested by Jeff King from the RFC thread back in January[1]. You can now do: core.abbrev = 20 core.validateAbbrev = false Or: core.abbrev = +2 core.validateAbbrev = false On linux.git `git log --oneline --raw --parents` with 64MB packs gives this improvement with core.abbrev=15 & core.validateAbbrev=false (v.s. true): TestHEAD~ HEAD 0014.1: git log --oneline --raw --parents 95.68(95.07+0.53) 42.74(42.33+0.39) -55.3% While cleaning up the RFC version of this which I sent in [2] I discovered that almost none of the existing functionality was tested for, and was very inconsistent since we have 4 different places where the abbrev config is parsed. See 19/20 and 20/20 for what this whole thing is building towards, the rest is all tests, cleanup, and preparatory work. (There's still other long-standing issues with the existing behavior which this doesn't change, but I had to stop somewhere to make this digestible). 1. https://public-inbox.org/git/20180108102029.ga21...@sigill.intra.peff.net/ 2. https://public-inbox.org/git/20180606102719.27145-1-ava...@gmail.com/ Ævar Arnfjörð Bjarmason (20): t/README: clarify the description of test_line_count test library: add a test_byte_count blame doc: explicitly note how --abbrev=40 gives 39 chars abbrev tests: add tests for core.abbrev and --abbrev abbrev tests: test "git-blame" behavior blame: fix a bug, core.abbrev should work like --abbrev abbrev tests: test "git branch" behavior abbrev tests: test for "git-describe" behavior abbrev tests: test for "git-log" behavior abbrev tests: test for "git-diff" behavior abbrev tests: test for plumbing behavior abbrev tests: test for --abbrev and core.abbrev=[+-]N parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ config.c: use braces on multiple conditional arms parse-options-cb.c: use braces on multiple conditional arms abbrev: unify the handling of non-numeric values abbrev: unify the handling of empty values abbrev parsing: use braces on multiple conditional arms abbrev: support relative abbrev values abbrev: add a core.validateAbbrev setting Documentation/config.txt| 49 +++ Documentation/diff-options.txt | 3 + Documentation/git-blame.txt | 14 + Documentation/git-branch.txt| 3 + Documentation/git-describe.txt | 3 + Documentation/git-ls-files.txt | 3 + Documentation/git-ls-tree.txt | 3 + Documentation/git-show-ref.txt | 3 + builtin/blame.c | 2 + cache.h | 2 + config.c| 22 +- diff.c | 24 +- environment.c | 2 + parse-options-cb.c | 19 +- revision.c | 24 +- sha1-name.c | 15 + t/README| 6 +- t/perf/p0014-abbrev.sh | 13 + t/t0014-abbrev.sh | 452 t/t1512-rev-parse-disambiguation.sh | 47 +++ t/t6006-rev-list-format.sh | 6 +- t/test-lib-functions.sh | 23 ++ 22 files changed, 722 insertions(+), 16 deletions(-) create mode 100755 t/perf/p0014-abbrev.sh create mode 100755 t/t0014-abbrev.sh -- 2.17.0.290.gded63e768a
[PATCH 01/20] t/README: clarify the description of test_line_count
Referring to the "length" of a file is ambiguous. We mean the number of lines, so let's say that. Changes the description added in fb3340a6a7 ("test-lib: introduce test_line_count to measure files", 2010-10-31). Signed-off-by: Ævar Arnfjörð Bjarmason --- t/README | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/README b/t/README index 8373a27fea..eede11d649 100644 --- a/t/README +++ b/t/README @@ -730,7 +730,7 @@ library for your script to use. - test_line_count (= | -lt | -ge | ...) - Check whether a file has the length it is expected to. + Check whether a file has the number of lines it is expected to. - test_path_is_file [] test_path_is_dir [] -- 2.17.0.290.gded63e768a
[PATCH 02/20] test library: add a test_byte_count
This new function is like test_line_count except it uses "wc -c" instead of "wc -l". Perhaps this should be a parameter, but I don't see us needing "wc -m" (chars), "wc -w" (words) etc. Change a couple of existing tests that use this, I expect to use this in later patches when adding more tests. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/README | 4 t/t6006-rev-list-format.sh | 6 ++ t/test-lib-functions.sh| 23 +++ 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/t/README b/t/README index eede11d649..3139c27e4e 100644 --- a/t/README +++ b/t/README @@ -728,6 +728,10 @@ library for your script to use. Check whether the rev points to the same commit as the rev. + - test_byte_count (= | -lt | -ge | ...) + + Check whether a file has the number of bytes it is expected to. + - test_line_count (= | -lt | -ge | ...) Check whether a file has the number of lines it is expected to. diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh index ec42c2f779..ec068c55ab 100755 --- a/t/t6006-rev-list-format.sh +++ b/t/t6006-rev-list-format.sh @@ -456,14 +456,12 @@ test_expect_success '--abbrev' ' test_expect_success '%H is not affected by --abbrev-commit' ' git log -1 --format=%H --abbrev-commit --abbrev=20 HEAD >actual && - len=$(wc -c actual && - len=$(wc -c output && +# test_byte_count = 1 output +# ' +# +# is like "test $(wc -c
[PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev
How hashes are abbreviated is a core feature of git, and should have its own t0*.sh tests. There's a few tests for core.abbrev and --abbrev already, but none of those stress the feature itself and its edge cases much. We should have tests for those in one place. I don't like some of this behavior of --abbrev being looser about input values that core.abbrev, But let's start by asserting the current behavior we have before we change any of it. That difference in behavior wasn't intentional. The code that does the command-line parsing was initially added in 0ce865b134 ("Add shortcuts for very often used options.", 2007-10-14), and it wasn't until much later in dce9648916 ("Make the default abbrev length configurable", 2010-10-28) when core.abbrev was added with stricter parsing. That's also only most of the command-line parsing. The diff and log family of commands have their own parsing for it in diff.c and revision.c, respectively. Those were added earlier in 47dd0d595d ("diff: --abbrev option", 2005-12-13) and 508d9e372e ("Fix "--abbrev=xyz" for revision listing", 2006-05-27), although note that sometimes diff goes via the revision.c path. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t0014-abbrev.sh | 118 ++ 1 file changed, 118 insertions(+) create mode 100755 t/t0014-abbrev.sh diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh new file mode 100755 index 00..1c60f5ff93 --- /dev/null +++ b/t/t0014-abbrev.sh @@ -0,0 +1,118 @@ +#!/bin/sh + +test_description='test core.abbrev and related features' + +. ./test-lib.sh + +tr_d_n() { + tr -d '\n' +} + +cut_tr_d_n_field_n() { + cut -d " " -f $1 | tr_d_n +} + +test_expect_success 'setup' ' + test_commit A && + git tag -a -mannotated A.annotated && + test_commit B && + test_commit C && + mkdir X Y && + touch X/file1 Y/file2 +' + +test_expect_success 'the FALLBACK_DEFAULT_ABBREV is 7' ' + git log -1 --pretty=format:%h >log && + test_byte_count = 7 log +' + +test_expect_success 'abbrev empty value handling differs ' ' + test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr && + test_i18ngrep "bad numeric config value.*invalid unit" stderr && + + git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch && + test_byte_count = 40 branch && + + git log --abbrev= -1 --pretty=format:%h >log && + test_byte_count = 4 log && + + git diff --raw --abbrev= HEAD~ >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = 4 diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = 4 diff.4 && + + test_must_fail git diff --raw --abbrev= --no-index X Y >diff && + cut_tr_d_n_field_n 3 diff.3 && + test_byte_count = 4 diff.3 && + cut_tr_d_n_field_n 4 diff.4 && + test_byte_count = 4 diff.4 +' + +test_expect_success 'abbrev non-integer value handling differs ' ' + test_must_fail git -c core.abbrev=XYZ log -1 --pretty=format:%h 2>stderr && + test_i18ngrep "bad numeric config value.*invalid unit" stderr && + + test_must_fail git branch -v --abbrev=XYZ 2>stderr && + test_i18ngrep "expects a numerical value" stderr && + + git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr && + ! test -s stderr && + + git diff --raw --abbrev=XYZ HEAD~ 2>stderr && + ! test -s stderr && + + test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr && + ! test -s stderr +' + +for i in -41 -20 -10 -1 0 1 2 3 41 +do + test_expect_success "core.abbrev value $i out of range errors out" " + test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 2>stderr && + test_i18ngrep 'abbrev length out of range' stderr + " +done + +for i in -41 -20 -10 -1 +do + test_expect_success "negative --abbrev=$i value out of range means --abbrev=40" " + git log --abbrev=$i -1 --pretty=format:%h >log && + test_byte_count = 40 log + " +done + +for i in 0 1 2 3 4 +do + test_expect_success "non-negative --abbrev=$i value log && + test_byte_count = 4 log + " +done + +for i in 41 9001 +do + test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV falls back on 40" " + git log --abbrev=$i -1 --pretty=format:%h >log && + test_byte_count = 40 log + " +done + +for i in $(test_seq 4 40) +do + test_expect_success "core.abbrev=$i and --abbrev=$i in combination within the valid range" " + # Both core.abbrev=X and --abbrev=X do the same thing + # in isolation + git -c core.abbrev=$i log -1 --pretty=format:%h >log && + test_byte_count = $i log && + git log --abbrev=$i -1 --pretty=format:%h >log && + test_byte_count = $i log && + + # The --abbrev option should take priority over +
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Peter Backes wrote: > On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote: >> Can you walk us through how anyone would be expected to fork (as create >> a new project, not the github-ism) existing projects under such a >> regiment? > > I don't see your point. Copy the repository to fork. Nothing changes > about that. Nothing prevents anyone from forking a repository which had > some of its author names removed from the commits. This basically the same as saying the whole notion of Signed-off-by should be abandoned entirely, since in this case the fork will only have a partial set of these. The point is that we're recording information so each line in the repository can be traced back to a SOB. These sorts of take-downs would destroy that information, and the proposed solution of having some party retain these creates a special class of free software users who are capable of following that line of attributions.
I NEED YOUR HELP
Dear Nguyễn, I am barrister Malouf Tuma, the solicitor and executor of late (Eng.G.E. Nguyễn,), will" and "Testament"; who died of kidney cancer with UN-identified family or relative. I am contacting you to stand in as a next of kin to his deposit of US$11.350million, with one of the leading bank here in west Africa since you share the same last name with my late client. Note that i find it very difficult to send you this letter due to ip differences therefore; i implore you to create gmail.com, Hotmail.com, or yahoo.com and contact me through it. Thank you for your understanding. Please respond directly to my private email address (malouftum...@gmail.com) Regards, Malouf Tuma (Esq.)
Re: [PATCH v2] completion: reduce overhead of clearing cached --options
Hi, SZEDER Gábor wrote: > Being in RC phase, I'm all for aiming for a minimal solution. > However, I don't think that the better fix would be erm.. any "less > minimal": > > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index f2aa484758..7aeb575cd1 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -3244,7 +3244,10 @@ __gitk_main () > __git_complete_revlist > } > > -if [[ -n ${ZSH_VERSION-} ]]; then > +if [[ -n ${ZSH_VERSION-} ]] && > + # Don't define these functions when sourced from 'git-completion.zsh', > + # it has its own implementations. > + [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then Needs a - before the } to avoid errors in a shell where the user has chosen to use "set -u". See v1.7.4-rc0~159 (completion: fix zsh check under bash with 'set -u', 2010-10-27) for more details. > echo "WARNING: this script is deprecated, please see > git-completion.zsh" 1>&2 > > autoload -U +X compinit && compinit > diff --git a/contrib/completion/git-completion.zsh > b/contrib/completion/git-completion.zsh > index 53cb0f934f..049d6b80f6 100644 > --- a/contrib/completion/git-completion.zsh > +++ b/contrib/completion/git-completion.zsh > @@ -39,7 +39,7 @@ if [ -z "$script" ]; then > test -f $e && script="$e" && break > done > fi > -ZSH_VERSION='' . "$script" > +GIT_SOURCING_ZSH_COMPLETION=y . "$script" > > __gitcomp () > { Except for that tweak, Reviewed-by: Jonathan Nieder Thanks. Now it just needs a commit message. :)
Re: [PATCH v2] completion: reduce overhead of clearing cached --options
On Fri, Jun 8, 2018 at 11:52 PM, Jonathan Nieder wrote: > SZEDER Gábor wrote: >> On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder wrote: > + [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then >>> >>> Needs a - before the } to avoid errors in a shell where the user has >>> chosen to use "set -u". See v1.7.4-rc0~159 (completion: fix zsh check >>> under bash with 'set -u', 2010-10-27) for more details. >> >> Right... I did remember that, but by the time I finished typing out >> that long variable name I forgot about it... :) >> >> However, I'm not sure it's worth caring about, because the >> bash-competion scripts don't work with 'set -u' anyway... > > I'd be happy to see these '-'s go away (it would make the code both > easier to read and easier to maintain), but as a post-2.18 topic, > please. Of course. > Alternatively, if there's a real need for them, I'd be happy to see > the completion tests start using 'set -u'. Our test harness doesn't work with 'set -u' at all...
Re: GDPR compliance best practices?
Am 08.06.2018 um 04:53 schrieb Theodore Y. Ts'o: And of course, that's the other thing you seem to fundamentally not understand about how git works. Every developer in the world working on that open source project has their own copy. Everyone here understands how Git works, of course. "*shrug* but that's how Git works" does *NOT* override the GDPR. -- Hannes
Re: [RFC PATCH v1] telemetry design overview (part 1)
On Fri, Jun 08 2018, Johannes Sixt wrote: > Am 08.06.2018 um 18:00 schrieb Thomas Braun: >> I for my part would much rather prefer that to be a compile time >> option so that I don't need to check on every git update on windows >> if this is now enabled or not. > > This exactly my concern, too! A compile-time option may make it a good > deal less worrisome. Can you elaborate on how someone who can maintain inject malicious code into your git package + config would be thwarted by this being some compile-time option, wouldn't they just compile it in?
[PATCHv2 2/6] git-p4: add option to disable syncing of p4/master with p4
Add an option to the git-p4 submit command to disable syncing with Perforce. This is useful for the case where a git-p4 mirror has been setup on a server somewhere, running from (e.g.) cron, and developers then clone from this. Having the local cloned copy also sync from Perforce just isn't useful. Signed-off-by: Luke Diamand --- Documentation/git-p4.txt | 8 git-p4.py| 31 --- 2 files changed, 28 insertions(+), 11 deletions(-) diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt index 3d83842e47..f0de3b891b 100644 --- a/Documentation/git-p4.txt +++ b/Documentation/git-p4.txt @@ -369,6 +369,11 @@ These options can be used to modify 'git p4 submit' behavior. Disable the automatic rebase after all commits have been successfully submitted. Can also be set with git-p4.disableRebase. +--disable-p4sync:: +Disable the automatic sync of p4/master from Perforce after commits have +been submitted. Implies --disable-rebase. Can also be set with +git-p4.disableP4Sync. Sync with origin/master still goes ahead if possible. + Rebase options ~~ These options can be used to modify 'git p4 rebase' behavior. @@ -693,6 +698,9 @@ git-p4.conflict:: git-p4.disableRebase:: Do not rebase the tree against p4/master following a submit. +git-p4.disableP4Sync:: +Do not sync p4/master with Perforce following a submit. Implies git-p4.disableRebase. + IMPLEMENTATION DETAILS -- * Changesets from p4 are imported using Git fast-import. diff --git a/git-p4.py b/git-p4.py index 5ab9421af8..b61f47cc61 100755 --- a/git-p4.py +++ b/git-p4.py @@ -1368,7 +1368,9 @@ def __init__(self): help="submit only the specified commit(s), one commit or xxx..xxx"), optparse.make_option("--disable-rebase", dest="disable_rebase", action="store_true", help="Disable rebase after submit is completed. Can be useful if you " - "work from a local git branch that is not master") + "work from a local git branch that is not master"), +optparse.make_option("--disable-p4sync", dest="disable_p4sync", action="store_true", + help="Skip Perforce sync of p4/master after submit or shelve"), ] self.description = "Submit changes from git to the perforce depot." self.usage += " [name of git branch to submit into perforce depot]" @@ -1380,6 +1382,7 @@ def __init__(self): self.update_shelve = list() self.commit = "" self.disable_rebase = gitConfigBool("git-p4.disableRebase") +self.disable_p4sync = gitConfigBool("git-p4.disableP4Sync") self.prepare_p4_only = False self.conflict_behavior = None self.isWindows = (platform.system() == "Windows") @@ -2240,11 +2243,14 @@ def run(self, args): sync = P4Sync() if self.branch: sync.branch = self.branch -sync.run([]) +if self.disable_p4sync: +sync.sync_origin_only() +else: +sync.run([]) -if self.disable_rebase is False: -rebase = P4Rebase() -rebase.rebase() +if not self.disable_rebase: +rebase = P4Rebase() +rebase.rebase() else: if len(applied) == 0: @@ -3324,6 +3330,14 @@ def importChanges(self, changes, shelved=False, origin_revision=0): print self.gitError.read() sys.exit(1) +def sync_origin_only(self): +if self.syncWithOrigin: +self.hasOrigin = originP4BranchesExist() +if self.hasOrigin: +if not self.silent: +print 'Syncing with origin first, using "git fetch origin"' +system("git fetch origin") + def importHeadRevision(self, revision): print "Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch) @@ -3402,12 +3416,7 @@ def run(self, args): else: self.refPrefix = "refs/heads/p4/" -if self.syncWithOrigin: -self.hasOrigin = originP4BranchesExist() -if self.hasOrigin: -if not self.silent: -print 'Syncing with origin first, using "git fetch origin"' -system("git fetch origin") +self.sync_origin_only() branch_arg_given = bool(self.branch) if len(self.branch) == 0: -- 2.17.0.392.gdeb1a6e9b7
[PATCHv2 1/6] git-p4: disable-rebase: allow setting this via configuration
This just lets you set the --disable-rebase option with the git configuration options git-p4.disableRebase. If you're using this option, you probably want to set it all the time for a given repo. Signed-off-by: Luke Diamand --- Documentation/git-p4.txt | 5 - git-p4.py| 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt index e8452528fc..3d83842e47 100644 --- a/Documentation/git-p4.txt +++ b/Documentation/git-p4.txt @@ -367,7 +367,7 @@ These options can be used to modify 'git p4 submit' behavior. --disable-rebase:: Disable the automatic rebase after all commits have been successfully -submitted. +submitted. Can also be set with git-p4.disableRebase. Rebase options ~~ @@ -690,6 +690,9 @@ git-p4.conflict:: Specify submit behavior when a conflict with p4 is found, as per --conflict. The default behavior is 'ask'. +git-p4.disableRebase:: +Do not rebase the tree against p4/master following a submit. + IMPLEMENTATION DETAILS -- * Changesets from p4 are imported using Git fast-import. diff --git a/git-p4.py b/git-p4.py index c4581d20a6..5ab9421af8 100755 --- a/git-p4.py +++ b/git-p4.py @@ -1379,7 +1379,7 @@ def __init__(self): self.shelve = False self.update_shelve = list() self.commit = "" -self.disable_rebase = False +self.disable_rebase = gitConfigBool("git-p4.disableRebase") self.prepare_p4_only = False self.conflict_behavior = None self.isWindows = (platform.system() == "Windows") -- 2.17.0.392.gdeb1a6e9b7
[PATCHv2 5/6] git-p4: narrow the scope of exceptions caught when parsing an int
The current code traps all exceptions around some code which parses an integer, and then talks to Perforce. That can result in errors from Perforce being ignored. Change the code to only catch the integer conversion exceptions. Signed-off-by: Luke Diamand --- git-p4.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-p4.py b/git-p4.py index f4b5deeb83..5f59feb5bb 100755 --- a/git-p4.py +++ b/git-p4.py @@ -959,7 +959,7 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize): try: (changeStart, changeEnd) = p4ParseNumericChangeRange(parts) block_size = chooseBlockSize(requestedBlockSize) -except: +except ValueError: changeStart = parts[0][1:] changeEnd = parts[1] if requestedBlockSize: -- 2.17.0.392.gdeb1a6e9b7
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com: > Makefile| 2 + > json-writer.c | 419 > json-writer.h | 113 + > t/helper/test-json-writer.c | 572 > > t/t0019-json-writer.sh | 236 ++ > 5 files changed, 1342 insertions(+) > create mode 100644 json-writer.c > create mode 100644 json-writer.h > create mode 100644 t/helper/test-json-writer.c > create mode 100755 t/t0019-json-writer.sh $ git grep 'static inline' '*json*' json-writer.c:static inline void indent_pretty(struct json_writer *jw) json-writer.c:static inline void begin(struct json_writer *jw, char ch_open, int pretty) json-writer.c:static inline void assert_in_object(const struct json_writer *jw, const char *key) json-writer.c:static inline void assert_in_array(const struct json_writer *jw) json-writer.c:static inline void maybe_add_comma(struct json_writer *jw) json-writer.c:static inline void fmt_double(struct json_writer *jw, int precision, json-writer.c:static inline void object_common(struct json_writer *jw, const char *key) json-writer.c:static inline void array_common(struct json_writer *jw) json-writer.c:static inline void assert_is_terminated(const struct json_writer *jw) t/helper/test-json-writer.c:static inline int scripted(int argc, const char **argv) t/helper/test-json-writer.c:static inline int my_usage(void) Do you need all those inline keywords? I'd rather leave the decision about inlining to the compiler and (via optimization flags) the user as much as possible. Not a biggie, but the high frequency of that word made me blink.. René
[PATCHv2 6/6] git-p4: auto-size the block
git-p4 originally would fetch changes in one query. On large repos this could fail because of the limits that Perforce imposes on the number of items returned and the number of queries in the database. To fix this, git-p4 learned to query changes in blocks of 512 changes, However, this can be very slow - if you have a few million changes, with each chunk taking about a second, it can be an hour or so. Although it's possible to tune this value manually with the "--changes-block-size" option, it's far from obvious to ordinary users that this is what needs doing. This change alters the block size dynamically by looking for the specific error messages returned from the Perforce server, and reducing the block size if the error is seen, either to the limit reported by the server, or to half the current block size. That means we can start out with a very large block size, and then let it automatically drop down to a value that works without error, while still failing correctly if some other error occurs. Signed-off-by: Luke Diamand --- git-p4.py | 27 +-- t/t9818-git-p4-block.sh | 8 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/git-p4.py b/git-p4.py index 5f59feb5bb..0354d4df5c 100755 --- a/git-p4.py +++ b/git-p4.py @@ -47,8 +47,8 @@ def __str__(self): # Only labels/tags matching this will be imported/exported defaultLabelRegexp = r'[a-zA-Z0-9_\-.]+$' -# Grab changes in blocks of this many revisions, unless otherwise requested -defaultBlockSize = 512 +# The block size is reduced automatically if required +defaultBlockSize = 1<<20 p4_access_checked = False @@ -969,7 +969,8 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize): changes = set() # Retrieve changes a block at a time, to prevent running -# into a MaxResults/MaxScanRows error from the server. +# into a MaxResults/MaxScanRows error from the server. If +# we _do_ hit one of those errors, turn down the block size while True: cmd = ['changes'] @@ -983,10 +984,24 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize): for p in depotPaths: cmd += ["%s...@%s" % (p, revisionRange)] +# fetch the changes +try: +result = p4CmdList(cmd, errors_as_exceptions=True) +except P4RequestSizeException as e: +if not block_size: +block_size = e.limit +elif block_size > e.limit: +block_size = e.limit +else: +block_size = max(2, block_size // 2) + +if verbose: print("block size error, retrying with block size {0}".format(block_size)) +continue +except P4Exception as e: +die('Error retrieving changes description ({0})'.format(e.p4ExitCode)) + # Insert changes in chronological order -for entry in reversed(p4CmdList(cmd)): -if entry.has_key('p4ExitCode'): -die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode'])) +for entry in reversed(result): if not entry.has_key('change'): continue changes.add(int(entry['change'])) diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh index 8840a183ac..ce7cb22ad3 100755 --- a/t/t9818-git-p4-block.sh +++ b/t/t9818-git-p4-block.sh @@ -129,6 +129,7 @@ test_expect_success 'Create a repo with multiple depot paths' ' ' test_expect_success 'Clone repo with multiple depot paths' ' + test_when_finished cleanup_git && ( cd "$git" && git p4 clone --changes-block-size=4 //depot/pathA@all //depot/pathB@all \ @@ -138,6 +139,13 @@ test_expect_success 'Clone repo with multiple depot paths' ' ) ' +test_expect_success 'Clone repo with self-sizing block size' ' + test_when_finished cleanup_git && + git p4 clone --changes-block-size=100 //depot@all --destination="$git" && + git -C "$git" log --oneline >log && + test_line_count \> 10 log +' + test_expect_success 'kill p4d' ' kill_p4d ' -- 2.17.0.392.gdeb1a6e9b7
[PATCHv2 3/6] git-p4: better error reporting when p4 fails
Currently when p4 fails to run, git-p4 just crashes with an obscure error message. For example, if the P4 ticket has expired, you get: Error: Cannot locate perforce checkout of in client view This change checks whether git-p4 can talk to the Perforce server when the first P4 operation is attempted, and tries to print a meaningful error message if it fails. Signed-off-by: Luke Diamand --- git-p4.py | 55 + t/t9833-errors.sh | 78 +++ 2 files changed, 133 insertions(+) create mode 100755 t/t9833-errors.sh diff --git a/git-p4.py b/git-p4.py index b61f47cc61..3de12a4a0a 100755 --- a/git-p4.py +++ b/git-p4.py @@ -50,6 +50,8 @@ def __str__(self): # Grab changes in blocks of this many revisions, unless otherwise requested defaultBlockSize = 512 +p4_access_checked = False + def p4_build_cmd(cmd): """Build a suitable p4 command line. @@ -91,6 +93,13 @@ def p4_build_cmd(cmd): real_cmd = ' '.join(real_cmd) + ' ' + cmd else: real_cmd += cmd + +# now check that we can actually talk to the server +global p4_access_checked +if not p4_access_checked: +p4_access_checked = True# suppress access checks in p4_check_access itself +p4_check_access() + return real_cmd def git_dir(path): @@ -264,6 +273,52 @@ def p4_system(cmd): if retcode: raise CalledProcessError(retcode, real_cmd) +def die_bad_access(s): +die("failure accessing depot: {0}".format(s.rstrip())) + +def p4_check_access(min_expiration=1): +""" Check if we can access Perforce - account still logged in +""" +results = p4CmdList(["login", "-s"]) + +if len(results) == 0: +# should never get here: always get either some results, or a p4ExitCode +assert("could not parse response from perforce") + +result = results[0] + +if 'p4ExitCode' in result: +# p4 returned non-zero status, e.g. P4PORT invalid, or p4 not in path +die_bad_access("could not run p4") + +code = result.get("code") +if not code: +# we get here if we couldn't connect and there was nothing to unmarshal +die_bad_access("could not connect") + +elif code == "stat": +expiry = result.get("TicketExpiration") +if expiry: +expiry = int(expiry) +if expiry > min_expiration: +# ok to carry on +return +else: +die_bad_access("perforce ticket expires in {0} seconds".format(expiry)) + +else: +# account without a timeout - all ok +return + +elif code == "error": +data = result.get("data") +if data: +die_bad_access("p4 error: {0}".format(data)) +else: +die_bad_access("unknown error") +else: +die_bad_access("unknown error code {0}".format(code)) + _p4_version_string = None def p4_version_string(): """Read the version string, showing just the last line, which diff --git a/t/t9833-errors.sh b/t/t9833-errors.sh new file mode 100755 index 00..9ba892de7a --- /dev/null +++ b/t/t9833-errors.sh @@ -0,0 +1,78 @@ +#!/bin/sh + +test_description='git p4 errors' + +. ./lib-git-p4.sh + +test_expect_success 'start p4d' ' + start_p4d +' + +test_expect_success 'add p4 files' ' + ( + cd "$cli" && + echo file1 >file1 && + p4 add file1 && + p4 submit -d "file1" + ) +' + +# after this test, the default user requires a password +test_expect_success 'error handling' ' + git p4 clone --dest="$git" //depot@all && + ( + cd "$git" && + P4PORT=: test_must_fail git p4 submit 2>errmsg + ) && + p4 passwd -P newpassword && + ( + P4PASSWD=badpassword test_must_fail git p4 clone //depot/foo 2>errmsg && + grep -q "failure accessing depot.*P4PASSWD" errmsg + ) +' + +test_expect_success 'ticket logged out' ' + P4TICKETS="$cli/tickets" && + echo "newpassword" | p4 login && + ( + cd "$git" && + test_commit "ticket-auth-check" && + p4 logout && + test_must_fail git p4 submit 2>errmsg && + grep -q "failure accessing depot" errmsg + ) +' + +test_expect_success 'create group with short ticket expiry' ' + P4TICKETS="$cli/tickets" && + echo "newpassword" | p4 login && + p4_add_user short_expiry_user && + p4 -u short_expiry_user passwd -P password && + p4 group -i <<-EOF && + Group: testgroup + Timeout: 3 + Users: short_expiry_user + EOF + + p4 users | grep short_expiry_user +' + +test_expect_success 'git operation with expired ticket' ' + P4TICKETS="$cli/tickets" && + P4USER=short_expiry_user && + echo "password" | p4 login && + ( + cd
[PATCHv2 0/6] git-p4: some small fixes updated
This is an updated version of the set of changes I posted recently, following comments on the list: disable automatic sync after git-p4 submit: https://marc.info/?l=git=152818734814838=2 better handling of being logged out by Perforce: https://marc.info/?l=git=152818893815326=2 adapt the block size automatically on git-p4 submit: https://marc.info/?l=git=152819004315688=2 - Spelling corrections (Eric) - Improved comments (Eric) - Exception class hierarchy fix (Merland) - test simplification (Eric) Luke Diamand (6): git-p4: disable-rebase: allow setting this via configuration git-p4: add option to disable syncing of p4/master with p4 git-p4: better error reporting when p4 fails git-p4: raise exceptions from p4CmdList based on error from p4 server git-p4: narrow the scope of exceptions caught when parsing an int git-p4: auto-size the block Documentation/git-p4.txt | 13 +++- git-p4.py| 161 +-- t/t9818-git-p4-block.sh | 8 ++ t/t9833-errors.sh| 78 +++ 4 files changed, 236 insertions(+), 24 deletions(-) create mode 100755 t/t9833-errors.sh -- 2.17.0.392.gdeb1a6e9b7
[PATCHv2 4/6] git-p4: raise exceptions from p4CmdList based on error from p4 server
This change lays some groundwork for better handling of rowcount errors from the server, where it fails to send us results because we requested too many. It adds an option to p4CmdList() to return errors as a Python exception. The exceptions are derived from P4Exception (something went wrong), P4ServerException (the server sent us an error code) and P4RequestSizeException (we requested too many rows/results from the server database). This makes the code that handles the errors a bit easier. The default behavior is unchanged; the new code is enabled with a flag. Signed-off-by: Luke Diamand --- git-p4.py | 44 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/git-p4.py b/git-p4.py index 3de12a4a0a..f4b5deeb83 100755 --- a/git-p4.py +++ b/git-p4.py @@ -566,10 +566,30 @@ def isModeExec(mode): # otherwise False. return mode[-3:] == "755" +class P4Exception(Exception): +""" Base class for exceptions from the p4 client """ +def __init__(self, exit_code): +self.p4ExitCode = exit_code + +class P4ServerException(P4Exception): +""" Base class for exceptions where we get some kind of marshalled up result from the server """ +def __init__(self, exit_code, p4_result): +super(P4ServerException, self).__init__(exit_code) +self.p4_result = p4_result +self.code = p4_result[0]['code'] +self.data = p4_result[0]['data'] + +class P4RequestSizeException(P4ServerException): +""" One of the maxresults or maxscanrows errors """ +def __init__(self, exit_code, p4_result, limit): +super(P4RequestSizeException, self).__init__(exit_code, p4_result) +self.limit = limit + def isModeExecChanged(src_mode, dst_mode): return isModeExec(src_mode) != isModeExec(dst_mode) -def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False): +def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, +errors_as_exceptions=False): if isinstance(cmd,basestring): cmd = "-G " + cmd @@ -616,9 +636,25 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False): pass exitCode = p4.wait() if exitCode != 0: -entry = {} -entry["p4ExitCode"] = exitCode -result.append(entry) +if errors_as_exceptions: +if len(result) > 0: +data = result[0].get('data') +if data: +m = re.search('Too many rows scanned \(over (\d+)\)', data) +if not m: +m = re.search('Request too large \(over (\d+)\)', data) + +if m: +limit = int(m.group(1)) +raise P4RequestSizeException(exitCode, result, limit) + +raise P4ServerException(exitCode, result) +else: +raise P4Exception(exitCode) +else: +entry = {} +entry["p4ExitCode"] = exitCode +result.append(entry) return result -- 2.17.0.392.gdeb1a6e9b7
Re: [PATCH v2] completion: reduce overhead of clearing cached --options
> Related question: what would it take to add a zsh completion sanity > check to t/t9902-completion.sh? I don't know. What I do know is that we can't just run our tests with ZSH, e.g. running 'zsh ./t-basic.sh' shows mostly failures. So it won't be as simple as modifying 't/lib-bash.sh' to somehow support ZSH as well. > Here's a minimal fix, untested. As a followup, as G�bor suggests at [2], > it would presumably make sense to stop overriding ZSH_VERSION, using > this GIT_ scoped variable everywhere instead. > > Thoughts? > > Reported-by: Rick van Hattem > Reported-by: Dave Borowitz > Signed-off-by: Jonathan Nieder > > [1] > https://public-inbox.org/git/01020163c683e753-04629405-15f8-4a30-9dc3-e4e3f2a5aa26-000...@eu-west-1.amazonses.com/ > [2] https://public-inbox.org/git/20180606114147.7753-1-szeder@gmail.com/ > > diff --git i/contrib/completion/git-completion.bash > w/contrib/completion/git-completion.bash > index 12814e9bbf..e4bcc435ea 100644 > --- i/contrib/completion/git-completion.bash > +++ w/contrib/completion/git-completion.bash > @@ -348,7 +348,7 @@ __gitcomp () > > # Clear the variables caching builtins' options when (re-)sourcing > # the completion script. > -if [[ -n ${ZSH_VERSION-} ]]; then > +if [[ -n ${ZSH_VERSION-} || -n ${GIT_SOURCING_ZSH_COMPLETION-} ]]; then > unset $(set |sed -ne > 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null > else > unset $(compgen -v __gitcomp_builtin_) > diff --git i/contrib/completion/git-completion.zsh > w/contrib/completion/git-completion.zsh > index 53cb0f934f..c7be9fd4dc 100644 > --- i/contrib/completion/git-completion.zsh > +++ w/contrib/completion/git-completion.zsh > @@ -39,7 +39,7 @@ if [ -z "$script" ]; then > test -f $e && script="$e" && break > done > fi > -ZSH_VERSION='' . "$script" > +GIT_SOURCING_ZSH_COMPLETION=1 ZSH_VERSION='' . "$script" > > __gitcomp () > { > Being in RC phase, I'm all for aiming for a minimal solution. However, I don't think that the better fix would be erm.. any "less minimal": diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index f2aa484758..7aeb575cd1 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3244,7 +3244,10 @@ __gitk_main () __git_complete_revlist } -if [[ -n ${ZSH_VERSION-} ]]; then +if [[ -n ${ZSH_VERSION-} ]] && + # Don't define these functions when sourced from 'git-completion.zsh', + # it has its own implementations. + [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then echo "WARNING: this script is deprecated, please see git-completion.zsh" 1>&2 autoload -U +X compinit && compinit diff --git a/contrib/completion/git-completion.zsh b/contrib/completion/git-completion.zsh index 53cb0f934f..049d6b80f6 100644 --- a/contrib/completion/git-completion.zsh +++ b/contrib/completion/git-completion.zsh @@ -39,7 +39,7 @@ if [ -z "$script" ]; then test -f $e && script="$e" && break done fi -ZSH_VERSION='' . "$script" +GIT_SOURCING_ZSH_COMPLETION=y . "$script" __gitcomp () {
Re: [PATCH v2] completion: reduce overhead of clearing cached --options
SZEDER Gábor wrote: > On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder wrote: >>> + [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then >> >> Needs a - before the } to avoid errors in a shell where the user has >> chosen to use "set -u". See v1.7.4-rc0~159 (completion: fix zsh check >> under bash with 'set -u', 2010-10-27) for more details. > > Right... I did remember that, but by the time I finished typing out > that long variable name I forgot about it... :) > > However, I'm not sure it's worth caring about, because the > bash-competion scripts don't work with 'set -u' anyway... I'd be happy to see these '-'s go away (it would make the code both easier to read and easier to maintain), but as a post-2.18 topic, please. Alternatively, if there's a real need for them, I'd be happy to see the completion tests start using 'set -u'. Thanks, Jonathan
Re: [PATCH v2] completion: reduce overhead of clearing cached --options
On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder wrote: > Hi, > > SZEDER Gábor wrote: > >> Being in RC phase, I'm all for aiming for a minimal solution. >> However, I don't think that the better fix would be erm.. any "less >> minimal": >> >> diff --git a/contrib/completion/git-completion.bash >> b/contrib/completion/git-completion.bash >> index f2aa484758..7aeb575cd1 100644 >> --- a/contrib/completion/git-completion.bash >> +++ b/contrib/completion/git-completion.bash >> @@ -3244,7 +3244,10 @@ __gitk_main () >> __git_complete_revlist >> } >> >> -if [[ -n ${ZSH_VERSION-} ]]; then >> +if [[ -n ${ZSH_VERSION-} ]] && >> + # Don't define these functions when sourced from 'git-completion.zsh', >> + # it has its own implementations. >> + [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then > > Needs a - before the } to avoid errors in a shell where the user has > chosen to use "set -u". See v1.7.4-rc0~159 (completion: fix zsh check > under bash with 'set -u', 2010-10-27) for more details. Right... I did remember that, but by the time I finished typing out that long variable name I forgot about it... :) However, I'm not sure it's worth caring about, because the bash-competion scripts don't work with 'set -u' anyway... >> echo "WARNING: this script is deprecated, please see >> git-completion.zsh" 1>&2 >> >> autoload -U +X compinit && compinit >> diff --git a/contrib/completion/git-completion.zsh >> b/contrib/completion/git-completion.zsh >> index 53cb0f934f..049d6b80f6 100644 >> --- a/contrib/completion/git-completion.zsh >> +++ b/contrib/completion/git-completion.zsh >> @@ -39,7 +39,7 @@ if [ -z "$script" ]; then >> test -f $e && script="$e" && break >> done >> fi >> -ZSH_VERSION='' . "$script" >> +GIT_SOURCING_ZSH_COMPLETION=y . "$script" >> >> __gitcomp () >> { > > Except for that tweak, > Reviewed-by: Jonathan Nieder > Thanks. > > Now it just needs a commit message. :)
Re: [RFC PATCH v1] telemetry design overview (part 1)
Am 08.06.2018 um 18:00 schrieb Thomas Braun: I for my part would much rather prefer that to be a compile time option so that I don't need to check on every git update on windows if this is now enabled or not. This exactly my concern, too! A compile-time option may make it a good deal less worrisome. -- Hannes
Re: [RFC PATCH v1] telemetry design overview (part 1)
On Sat, Jun 9, 2018 at 12:22 AM Ævar Arnfjörð Bjarmason wrote: > > > On Fri, Jun 08 2018, Johannes Sixt wrote: > > > Am 08.06.2018 um 18:00 schrieb Thomas Braun: > >> I for my part would much rather prefer that to be a compile time > >> option so that I don't need to check on every git update on windows > >> if this is now enabled or not. > > > > This exactly my concern, too! A compile-time option may make it a good > > deal less worrisome. > > Can you elaborate on how someone who can maintain inject malicious code > into your git package + config would be thwarted by this being some > compile-time option, wouldn't they just compile it in? Look at this from a different angle. This is driven by the needs to collect telemetry in _controlled_ environment (mostly server side, I guess) and it should be no problem to make custom builds there for you. Not making it a compile-time option could force [1] linux distro to carry this function to everybody even if they don't use it (and it's kinda dangerous to misuse if you don't anonymize the data properly). I also prefer this a compile time option. [1] Of course many distros can choose to patch it out. But it's the same argument as bringing this option in in the first place: you guys already have that code in private and now want to put it in stock git to reduce maintenance cost, why add extra cost on linux distro maintenance? -- Duy