Re: [PATCH v2 0/4] Port git-add--interactive.perl:status_cmd to C
Hi Daniel, On 16-May-17 6:00 AM, Daniel Ferreira wrote: This is the second version of a patch series to start porting git-add--interactive from Perl to C. Series: v1: https://public-inbox.org/git/1494009820-2090-1-git-send-email-bnm...@gmail.com/ Travis CI build: https://travis-ci.org/theiostream/git/builds/232669894 I have applied most of the changes suggested by Johannes and Ævar (thank you!). The one exception was Johannes' request to make this a WIP patch series with a sort-of "design" of what's next to come. I did not do this because I still have no clue...! I'll begin experimenting on update_cmd() to see if I gain some insight on this issue that I could bring to this series. I do think this would be a good idea, though! :) I am Slavica Đukic, Outreachy intern for Git (Dec-March 2018/19 round). Project I'll be working on is "Turn git add -i to built-in". Would you mind if I use your work so far as head start? -- Daniel. Daniel Ferreira (4): diff: export diffstat interface add--helper: create builtin helper for interactive add add--helper: implement interactive status command add--interactive: use add-interactive--helper for status_cmd .gitignore| 1 + Makefile | 1 + builtin.h | 1 + builtin/add--helper.c | 285 ++ diff.c| 17 +-- diff.h| 19 +++- git-add--interactive.perl | 4 +- git.c | 1 + 8 files changed, 309 insertions(+), 20 deletions(-) create mode 100644 builtin/add--helper.c -- 2.7.4 (Apple Git-66) Thank you, Slavica
Re: [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos
Roger Strain writes: > After doing some testing at scale, determined that one call was > taking too long; replaced that with an alternate call which > returns the same data significantly faster. Curious where the time goes. Do you know? > Also, if anyone has any other feedback on these I'd really love to > hear it. It's working better for us (as in, it actually generates The previous one is already in 'next'; please make it incremental with explanation as to why "show -s" is worse than "log -1" (but see below). > # processing commit without normal parent information; > # fetch from repo > - parents=$(git show -s --pretty=%P "$rev") > + parents=$(git log --pretty=%P -n 1 "$rev") If you want to learn the parents of a given commit: $ git help revisions says ^@, e.g. HEAD^@ A suffix ^ followed by an at sign is the same as listing all parents of (meaning, include anything reachable from its parents, but not the commit itself). so parents=$(git rev-parse "$rev^@") ought to be the most efficient way to do this, I suspect.
[PATCH v2 0/4] Multiple subtree split fixes regarding complex repos
After doing some testing at scale, determined that one call was taking too long; replaced that with an alternate call which returns the same data significantly faster. Also, if anyone has any other feedback on these I'd really love to hear it. It's working better for us (as in, it actually generates a compatible tree version to version) but still isn't perfect, and I'm not sure perfect is achievable, but want to make sure this doesn't things for anyone else. Changes since v1: diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index 1c157dbd9..7dd643998 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -633,7 +633,7 @@ process_split_commit () { else # processing commit without normal parent information; # fetch from repo - parents=$(git show -s --pretty=%P "$rev") + parents=$(git log --pretty=%P -n 1 "$rev") extracount=$(($extracount + 1)) fi Strain, Roger L (4): subtree: refactor split of a commit into standalone method subtree: make --ignore-joins pay attention to adds subtree: use commits before rejoins for splits subtree: improve decision on merges kept in split contrib/subtree/git-subtree.sh | 129 + 1 file changed, 83 insertions(+), 46 deletions(-) -- 2.19.1
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
Josh Steadmon writes: > Yes, the version on my desktop sends version=2 when archiving: > > ∫ which git > /usr/bin/git > ∫ git --version > git version 2.19.0.605.g01d371f741-goog > ∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \ > --enable=upload-archive \ > --base-path=${HOME}/src/bare-repos & > [1] 258496 > ∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar > ∫ grep version ~/server_trace > 15:31:22.377869 pkt-line.c:80 packet: git< > git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0 Ah, that's truly broken. Come to think of it, do we need to be using uniform versions across different endpoints? The archive request could be at v3 while fetch request could still be at v2, in which case the design to use a single protocol.version variable is probably the root cause of the confusion? Perhaps like protocol..allow, we would want protocol..version or something like that (and no protocol.version) to make it clear that protocol v2 used for fetching has nothing to do with protocol v1 or v2 or v3 used for archiving? Luckily, protocol.version is still marked as experimental so it is not too bad that we caught the design mistake (if it is one) and can now correct it before the damage spreads too widely.
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
On 2018.09.27 15:20, Junio C Hamano wrote: > Jonathan Nieder writes: > > > 1. Clients sending version=2 when they do not, in fact, speak protocol > > v2 for a service is a (serious) bug. (Separately from this > > series) we should fix it. > > > > 2. That bug is already in the wild, alas. Fortunately the semantics of > > GIT_PROTOCOL as a list of key/value pairs is well defined. So we > > have choices of (a) bump version to version=3 (b) pass another > > value 'version=2:yesreallyversion=2' (c) etc. > > > > 3. This is likely to affect push, too. > > Do you mean that existing "git push", "git fetch" and "git archive" > sends version=2 even when they are not capable of speaking protocol > v2? I thought that "git archive [--remote]" was left outside of the > protocol update (that was the reason why the earlier attempt took a > hacky route of "shallow clone followed by local archive"), so there > is no "git archive" in the wild that can even say "version=$n" > (which requires you to be at least version=1)? Yes, the version on my desktop sends version=2 when archiving: ∫ which git /usr/bin/git ∫ git --version git version 2.19.0.605.g01d371f741-goog ∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \ --enable=upload-archive \ --base-path=${HOME}/src/bare-repos & [1] 258496 ∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar ∫ grep version ~/server_trace 15:31:22.377869 pkt-line.c:80 packet: git< git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
Jonathan Nieder writes: > 1. Clients sending version=2 when they do not, in fact, speak protocol > v2 for a service is a (serious) bug. (Separately from this > series) we should fix it. > > 2. That bug is already in the wild, alas. Fortunately the semantics of > GIT_PROTOCOL as a list of key/value pairs is well defined. So we > have choices of (a) bump version to version=3 (b) pass another > value 'version=2:yesreallyversion=2' (c) etc. > > 3. This is likely to affect push, too. Do you mean that existing "git push", "git fetch" and "git archive" sends version=2 even when they are not capable of speaking protocol v2? I thought that "git archive [--remote]" was left outside of the protocol update (that was the reason why the earlier attempt took a hacky route of "shallow clone followed by local archive"), so there is no "git archive" in the wild that can even say "version=$n" (which requires you to be at least version=1)?
Re: [RFC PATCH v2 0/4] Avoid ls-refs when possible in protocol v2
Jonathan Tan writes: > To answer Junio's questions in [1], I think it's best to include the > full patch set that I'm developing, so here it is. The original patch is > now patch 3 of this set. Yeah, taking it out of context did make its purpose fuzzy. Without the other patches in the series that set the overall direction of reducing ls-refs, it was unclear why some transports are marked to specifically want to automatically call transport_get_remote_refs(). With these surrounding changes, "we call it as needed, because an earlier step may not have called it" starts to sound quite sensible.
[RFC PATCH v2 0/4] Avoid ls-refs when possible in protocol v2
To answer Junio's questions in [1], I think it's best to include the full patch set that I'm developing, so here it is. The original patch is now patch 3 of this set. [1] https://public-inbox.org/git/xmqq8t3pnphe@gitster-ct.c.googlers.com/ Rearranging Junio's questions: > ... ah, do you mean that this is not a new feature, but is a bugfix > for some callers that are not calling get-remote-refs before calling > fetch-refs, and the bit is to work around the fact that some > transport not just can function without get-remote-refs first but do > not want to call it? Yes, it is the bugfix you describe, except that the bug coincidentally does not cause any bad behavior. fetch-object.c indeed does not call get-remote-refs before fetch-refs, but it calls transport_set_option(), which so happens to do what we need (call set_helper_option()). However, we need it now, because ... > But this I do not quite understand. It looks saying "when asked to > fetch, if the transport does not allow us to do so without first > getting the advertisement, lazily do that", and that may be a good > thing to do, but then aren't the current set of callers already > calling transport-get-remote-refs elsewhere before they call > transport-fetch-refs? IOW, I would have expected to see a matching > removal, or at least a code that turns an unconditional call to > get-remote-refs to a conditional one that is done only for the > transport that lacks the capability, or something along that line. ... this "matching removal" you are talking about is in the subsequent patch 4. And there is no transport_set_option() to save us this time, so we really do need this bugfix. > IOW, I am a bit confused by this comment (copied from an earlier part) > > > + /** > > +* This transport supports the fetch() function being called > > +* without get_refs_list() first being called. > > +*/ > > Shouldn't it read more like "this transport does not want its > get-refs-list called when fetch-refs is done"? > > I dunno. I'm not sure I understand - transports generally don't care if get-refs-list is called after fetch-refs. Also, this already happens when fetching with tag following from a server that does not support tag following, using a transport that supports reuse. Jonathan Tan (4): transport: allow skipping of ref listing transport: do not list refs if possible transport: list refs before fetch if necessary fetch: do not list refs if fetching only hashes builtin/fetch.c | 32 +- fetch-pack.c| 2 +- t/t5551-http-fetch-smart.sh | 15 +++ t/t5702-protocol-v2.sh | 18 + transport-helper.c | 1 + transport-internal.h| 6 + transport.c | 54 - 7 files changed, 115 insertions(+), 13 deletions(-) -- 2.19.0.605.g01d371f741-goog
Re: [PATCH v2 0/4] git-commit-graph.txt: various cleanups
Hi Derrick On Thu, 27 Sep 2018 at 21:16, Derrick Stolee wrote: > Thanks! This version satisfies my concerns and looks good to me. > > Reviewed-by: Derrick Stolee Thanks for the spectacularly snappy review. I don't expect commit graphs to help my use cases a lot, but I still wanted to try them out a little and stumbled on the `*` lists. Thanks for doing this work! Martin
Re: [PATCH v2 0/4] git-commit-graph.txt: various cleanups
On 9/27/2018 3:12 PM, Martin Ågren wrote: This v2 starts with the same two patches as v1 did, then goes on to change "[commit] graph file" to "commit-graph file" with a dash, to match other instances as well as Derrick's feedback. Thanks! This version satisfies my concerns and looks good to me. Reviewed-by: Derrick Stolee
[PATCH v2 0/4] git-commit-graph.txt: various cleanups
This v2 starts with the same two patches as v1 did, then goes on to change "[commit] graph file" to "commit-graph file" with a dash, to match other instances as well as Derrick's feedback. Martin Ågren (4): git-commit-graph.txt: fix bullet lists git-commit-graph.txt: typeset more in monospace git-commit-graph.txt: refer to "*commit*-graph file" Doc: refer to the "commit-graph file" with dash Documentation/git-commit-graph.txt | 31 Documentation/technical/commit-graph.txt | 8 +++--- 2 files changed, 20 insertions(+), 19 deletions(-) Range-diff against v1: 1: 222721870b = 1: 837ef2f231 git-commit-graph.txt: fix bullet lists 2: acac5c3584 = 2: 9759a162ca git-commit-graph.txt: typeset more in monospace 3: 65f42c947a ! 3: 759bc886d8 git-commit-graph.txt: refer to "*commit* graph file" @@ -1,17 +1,17 @@ Author: Martin Ågren -git-commit-graph.txt: refer to "*commit* graph file" +git-commit-graph.txt: refer to "*commit*-graph file" -This document sometimes refers to the "commit graph file" as just "the +This document sometimes refers to the "commit-graph file" as just "the graph file". This saves a couple of words here and there at the risk of confusion. In particular, the documentation for `git commit-graph read` appears to suggest that there are indeed different types of graph files. Let's just write out the full name everywhere. -The full name, by the way, is not the "commit-graph file" with a dash, -cf. the synopsis. Use the dashless form. (The next commit will fix the -remaining few instances of the "commit-graph file" in this document.) +The full name, by the way, is not the dash-less "commit graph file". +Use the dashed form. (The next commit will fix the remaining few +instances of the "commit graph file" in this document.) Signed-off-by: Martin Ågren @@ -24,7 +24,7 @@ -Read a graph file given by the commit-graph file and output basic -details about the graph file. Used for debugging purposes. -+Read the commit graph file and output basic details about it. ++Read the commit-graph file and output basic details about it. +Used for debugging purposes. 'verify':: @@ -35,7 +35,7 @@ -* Write a graph file, extending the current graph file using commits - in ``. -+* Write a commit graph file, extending the current commit graph file ++* Write a commit-graph file, extending the current commit-graph file + using commits in ``. + @@ -43,16 +43,14 @@ -* Write a graph file containing all reachable commits. -+* Write a commit graph file containing all reachable commits. ++* Write a commit-graph file containing all reachable commits. + $ git show-ref -s | git commit-graph write --stdin-commits -* Write a graph file containing all commits in the current -- commit-graph file along with those reachable from `HEAD`. -+* Write a commit graph file containing all commits in the current -+ commit graph file along with those reachable from `HEAD`. ++* Write a commit-graph file containing all commits in the current + commit-graph file along with those reachable from `HEAD`. + - $ git rev-parse HEAD | git commit-graph write --stdin-commits --append 4: fc81147ea4 < -: -- git-commit-graph.txt: refer to the "commit graph file" without dash -: -- > 4: 99b64287ec Doc: refer to the "commit-graph file" with dash -- 2.19.0.216.g2d3b1c576c
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
On 2018.09.27 11:20, Stefan Beller wrote: > On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon wrote: > > > > This is the second version of my series to add a new protocol v2 command > > for archiving, with support for HTTP(S). > > > > NEEDSWORK: a server built with this series is not backwards-compatible > > with clients that set GIT_PROTOCOL=version=2 or configure > > protocol.version=2. The old client will unconditionally send "argument > > ..." packet lines, which breaks the server's expectations of a > > "command=archive" request, > > So if an old client sets protocol to v2, it would only apply that > protocol version > to fetch, not archive, so it would start following a v0 conversation, but > as the protocol version is set, it would be transmitted to the server. > This sounds like a bug in the client? Yeah, basically. We're telling the server we support v2, even if the specific operation we're trying to do doesn't have a v2 implementation on the client. So this is going to make it ugly to replace existing commands. > > while the server's capability advertisement > > in turn breaks the clients expectation of either an ACK or NACK. > > Could a modern client send either another protocol version (3?) > or a special capability along the protocol version ("fixed_archive") > > > I've been discussing workarounds for this with Jonathan Nieder, but > > please let me know if you have any suggestions for v3 of this series. > > Care to open the discussion to the list? What are the different > approaches, what are the pros/cons? Jonathan suggested something along the lines of what you said above, adding a new field in GIT_PROTOCOL. So we'd send something like "version=2:archive_version=2" and have the server detect the latter. I'm not sure if that's the best way to go about this since I'm not familiar with the version detection code for other parts of the system. I worry that it will lead us down the path of having to specify a version for every command that we eventually convert to protocol v2. On the other hand, I don't see any other way to work around this, at least in the archive case. We can't peek at the client's transmissions on the server, because v2 requires that the server speaks first...
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
Stefan Beller wrote: > On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon wrote: >> I've been discussing workarounds for this with Jonathan Nieder, but >> please let me know if you have any suggestions for v3 of this series. > > Care to open the discussion to the list? What are the different > approaches, what are the pros/cons? Do you mean sending video of chatting in the office? Josh and I discussed that 1. Clients sending version=2 when they do not, in fact, speak protocol v2 for a service is a (serious) bug. (Separately from this series) we should fix it. 2. That bug is already in the wild, alas. Fortunately the semantics of GIT_PROTOCOL as a list of key/value pairs is well defined. So we have choices of (a) bump version to version=3 (b) pass another value 'version=2:yesreallyversion=2' (c) etc. 3. This is likely to affect push, too. Thanks and hope that helps, Jonathan
Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support
On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon wrote: > > This is the second version of my series to add a new protocol v2 command > for archiving, with support for HTTP(S). > > NEEDSWORK: a server built with this series is not backwards-compatible > with clients that set GIT_PROTOCOL=version=2 or configure > protocol.version=2. The old client will unconditionally send "argument > ..." packet lines, which breaks the server's expectations of a > "command=archive" request, So if an old client sets protocol to v2, it would only apply that protocol version to fetch, not archive, so it would start following a v0 conversation, but as the protocol version is set, it would be transmitted to the server. This sounds like a bug in the client? > while the server's capability advertisement > in turn breaks the clients expectation of either an ACK or NACK. Could a modern client send either another protocol version (3?) or a special capability along the protocol version ("fixed_archive") > I've been discussing workarounds for this with Jonathan Nieder, but > please let me know if you have any suggestions for v3 of this series. Care to open the discussion to the list? What are the different approaches, what are the pros/cons?
[PATCH v2 0/4] Add proto v2 archive command with HTTP support
This is the second version of my series to add a new protocol v2 command for archiving, with support for HTTP(S). NEEDSWORK: a server built with this series is not backwards-compatible with clients that set GIT_PROTOCOL=version=2 or configure protocol.version=2. The old client will unconditionally send "argument ..." packet lines, which breaks the server's expectations of a "command=archive" request, while the server's capability advertisement in turn breaks the clients expectation of either an ACK or NACK. I've been discussing workarounds for this with Jonathan Nieder, but please let me know if you have any suggestions for v3 of this series. Josh Steadmon (4): archive: follow test standards around assertions archive: use packet_reader for communications archive: implement protocol v2 archive command archive: allow archive over HTTP(S) with proto v2 Documentation/technical/protocol-v2.txt | 21 - builtin/archive.c | 58 +++-- builtin/upload-archive.c| 27 ++-- http-backend.c | 13 +- serve.c | 7 +++ t/t5000-tar-tree.sh | 33 +++--- t/t5701-git-serve.sh| 1 + transport-helper.c | 7 +-- 8 files changed, 130 insertions(+), 37 deletions(-) Range-diff against v1: -: -- > 1: c2e371ad24 archive: follow test standards around assertions 1: b514184273 ! 2: a65f73f627 archive: use packet_reader for communications @@ -6,7 +6,10 @@ handling, which will make implementation of protocol v2 support in git-archive easier. +This refactoring does not change the behavior of "git archive". + Signed-off-by: Josh Steadmon +Reviewed-by: Stefan Beller diff --git a/builtin/archive.c b/builtin/archive.c @@ -42,24 +45,24 @@ - if (!buf) + status = packet_reader_read(); + -+ if (status == PACKET_READ_FLUSH) ++ if (status != PACKET_READ_NORMAL || reader.pktlen <= 0) die(_("git archive: expected ACK/NAK, got a flush packet")); - if (strcmp(buf, "ACK")) { - if (starts_with(buf, "NACK ")) - die(_("git archive: NACK %s"), buf + 5); - if (starts_with(buf, "ERR ")) - die(_("remote error: %s"), buf + 4); -+ if (strcmp(reader.buffer, "ACK")) { -+ if (starts_with(reader.buffer, "NACK ")) -+ die(_("git archive: NACK %s"), reader.buffer + 5); -+ if (starts_with(reader.buffer, "ERR ")) -+ die(_("remote error: %s"), reader.buffer + 4); ++ if (strcmp(reader.line, "ACK")) { ++ if (starts_with(reader.line, "NACK ")) ++ die(_("git archive: NACK %s"), reader.line + 5); ++ if (starts_with(reader.line, "ERR ")) ++ die(_("remote error: %s"), reader.line + 4); die(_("git archive: protocol error")); } - if (packet_read_line(fd[0], NULL)) + status = packet_reader_read(); -+ if (status != PACKET_READ_FLUSH) ++ if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); /* Now, start reading from fd[0] and spit it out to stdout */ 2: 1518c15dc1 < -: -- archive: implement protocol v2 archive command -: -- > 3: 0a8cc5e331 archive: implement protocol v2 archive command 3: 1b7ad8d8f6 ! 4: 97a1424f32 archive: allow archive over HTTP(S) with proto v2 @@ -10,16 +10,20 @@ +++ b/builtin/archive.c @@ status = packet_reader_read(); - if (status != PACKET_READ_FLUSH) + if (status == PACKET_READ_NORMAL && reader.pktlen > 0) die(_("git archive: expected a flush")); - } + } else if (version == protocol_v2 && -+ starts_with(transport->url, "http")) ++ (starts_with(transport->url, "http://;) || ++ starts_with(transport->url, "https://;))) + /* + * Commands over HTTP require two requests, so there's an -+ * additional server response to parse. ++ * additional server response to parse. We do only basic sanity ++ * checking here that the versions presented match across ++ * requests. + */ -+ discover_version(); ++ if (version != discover_version()) ++ die(_("git archive: received different protocol versions in subsequent requests")); /* Now, start reading from fd[0] and spit it out to stdout */ rv = recv_sideband("archive", fd[0], 1); @@ -40,7 +44,10 @@ struct strbuf buf = STRBUF_INIT; + if (!strcmp(service_name, "git-upload-archive")) {
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
Hi, On Mon, 6 Aug 2018, Eric Sunshine wrote: > On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga > wrote: > > But your suggestion did make me think about what behaviour I would > > like to see, exactly. I like that Git removes commits that no longer > > serve any purpose (because I've included their changes in earlier > > commits). So I would not want to keep commits that become empty during > > the rebase. What I would like to see is that commits that _start out_ > > as empty, are retained. Would such behaviour make sense? Or would that > > be considered surprising behaviour? > > I, personally, have no opinion since I don't use empty commits. > Perhaps someone more experienced and more long-sighted will chime in. I got aware about two weeks ago that my mail provider fails to deliver all the mails that are addressed to me. Two days ago, I managed to get this mail (and the thread) via public-inbox (thanks, Eric!!!). Hence my late reply. Hilco: if you provide a patch to extend the test suite to demonstrate the breakage (via `test_expect_failure`), I promise to try my best to provide a fix on top. Ciao, Johannes
[PATCH v2 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. (And even if it is apparently impossible to trigger on Linux.) The culprit is that two processes try to write simultaneously to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even when only looking at the POSIX specification (the documentation of write() [http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html] says "Applications should use some form of concurrency control."). This patch series addresses that by locking the trace fd. I chose to use fcntl() for the Unix(-y) platforms we support instead of flock() (even if the latter has much simpler semantics) because fcntl() is in POSIX while flock() is not. On Windows, I use the Win32 API function LockFileEx() [https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex] . Of course, I have to admit that I am not super solid on the fcntl() semantics. Junio was nice enough to educate me on l_len = 0 meaning the entire file. If anybody has more experience with locking file descriptors referring not to files, but, say, to pipes or to an interactive terminal, I would be very thankful for help (while the POSIX documentation states that the errno should be EINVAL on a file descriptor that cannot be locked, apparently macOS sets errno to EBADF when trying to lock a redirected stdout )? Changes since v1: * Touched up the cover letter and the first commit message (in particular, removed the bogus squash! line giving away just how much I rely on the --autosquash feature these days). * Now we're locking the entire file via l_len = 0 (except on Windows). * To cover all bases, EINVAL is now also treated as "cannot lock this fd" (in addition to EBADF). * Removed some superfluous flock()-related left-overs from a previous attempt (that caused a lot of me fighting with Linux). Johannes Schindelin (4): Introduce a function to lock/unlock file descriptors when appending mingw: implement lock_or_unlock_fd_for_appending() trace: lock the trace file to avoid racy trace_write() calls trace: verify that locking works Makefile | 1 + compat/mingw.c | 19 ++ compat/mingw.h | 3 + git-compat-util.h | 2 + t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/helper/test-trace.c | 130 + t/t0070-fundamental.sh | 6 ++ trace.c| 11 +++- wrapper.c | 16 + 10 files changed, 189 insertions(+), 1 deletion(-) create mode 100644 t/helper/test-trace.c base-commit: 42cc7485a2ec49ecc440c921d2eb0cae4da80549 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-17%2Fdscho%2Ffetch-negotiator-skipping-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-17/dscho/fetch-negotiator-skipping-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/17 Range-diff vs v1: 1: e449ed75f ! 1: a19904682 Introduce a function to lock/unlock file descriptors when appending @@ -30,7 +30,7 @@ added in this commit. But fcntl() allows a lot more versatile file region locking that we -actually need, to by necessity the fcntl() emulation would be quite +actually need, so by necessity the fcntl() emulation would be quite complex: To support the `l_whence = SEEK_CUR` (which we would use, if it did not require so much book-keeping due to our writing between locking and unlocking the file), we would have to call `SetFilePointerEx()` @@ -47,18 +47,16 @@ to do, as we would once again fail to have an abstraction that clearly says what *exact*functionality we want to use. -To set a precedent for a better approach, let's introduce a proper -abstraction: a function that says in its name precisely what Git -wants it to do (as opposed to *how* it does it on Linux): -lock_or_unlock_fd_for_appending(). +This commit expressly tries to set a precedent for a better approach: +Let's introduce a proper abstraction: a function that says in its name +precisely what Git wants it to do (as opposed to *how* it does it on +Linux): lock_or_unlock_fd_for_appending(). The next commit will provide a Windows-specific implementation of this function/functionality. Signed-off-by: Johannes Schindelin -squash! Introduce a function to lock/unlock file descriptors when appending - diff --git a/git-compat-util.h b/git-compat-util.h --- a/git-compat-util.h +++ b/git-compat-util.h @@ -86,9 +84,11 @@ + struct flock flock; + + flock.l_type
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Thu, Aug 9, 2018 at 10:16 AM Ben Peart wrote: > In fact, in the other [1] patch series, we're detecting the number of > cache entries that are the same as the cache tree and using that to > traverse_by_cache_tree(). At that point, couldn't we copy the > corresponding cache tree entries over to the destination so that those > don't have to get recreated in the later call to cache_tree_update()? We could. But as I stated in another mail, I saw this cache-tree optimization as a separate problem and didn't want to mix them up. That way cache_tree_copy() could be used elsewhere if the opportunity shows up. Mixing them up could also complicate the problem. You you merge stuff, you add new cache-entries to o->result with add_index_entry() which tries to invalidate those paths in o->result's cache-tree. Right now the cache-tree is empty so it's really no-op. But if you copy cache-tree over while merging, that invalidation might either invalidate your newly copied cache-tree, or get slowed down because non-empty o->result's cache-tree means you start to need to walk it to find if there's any path to invalidate. PS. This code keeps messing me up. invalidate_ce_path() may also invalidate cache-tree in the _source_ index. For this optimization to really shine, you better keep the the original cache-tree intact (so that you can reuse as much as possible). I don't see the purpose of this source cache tree invalidation at all. My guess at this point is Linus actually made a mistake in 34110cd4e3 (Make 'unpack_trees()' have a separate source and destination index - 2008-03-06) and he should have invalidated _destination_ index instead of the source one. I'm going to dig in some more and probably will send a patch to remove this invalidation. -- Duy
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Wed, Aug 8, 2018 at 10:53 PM Ben Peart wrote: > > > > On 8/1/2018 12:38 PM, Duy Nguyen wrote: > > On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote: > >> > >> > >> On 7/31/2018 12:50 PM, Ben Peart wrote: > >>> > >>> > >>> On 7/31/2018 11:31 AM, Duy Nguyen wrote: > >> > > > In the performance game of whack-a-mole, that call to repair cache-tree > > is now looking quite expensive... > > Yeah and I think we can whack that mole too. I did some measurement. > Best case possible, we just need to scan through two indexes (one with > many good cache-tree, one with no cache-tree), compare and copy > cache-tree over. The scanning takes like 1% time of current repair > step and I suspect it's the hashing that takes most of the time. Of > course real world won't have such nice numbers, but I guess we could > maybe half cache-tree update/repair time. > > >>> > >>> I have some great profiling tools available so will take a look at this > >>> next and see exactly where the time is being spent. > >> > >> Good instincts. In cache_tree_update, the heavy hitter is definitely > >> hash_object_file followed by has_object_file. > >> > >> Name Inc %Inc > >> + git!cache_tree_update 12.4 4,935 > >> |+ git!update_one 11.8 4,706 > >> | + git!update_one11.8 4,706 > >> | + git!hash_object_file 6.1 2,406 > >> | + git!has_object_file 2.0813 > >> | + OTHER <>0.5203 > >> | + git!strbuf_addf 0.4155 > >> | + git!strbuf_release0.4143 > >> | + git!strbuf_add0.3121 > >> | + OTHER <>0.2 93 > >> | + git!strbuf_grow 0.1 25 > > > > Ben, if you work on this, this could be a good starting point. I will > > not work on this because I still have some other things to catch up > > and follow through. You can have my sign off if you reuse something > > from this patch > > > > Even if it's a naive implementation, the initial numbers look pretty > > good. Without the patch we have > > > > 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy > > 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update > > > > And with the patch > > > > 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy > > 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update > > > > Time saving is about 80% by the look of this (best possible case > > because only the top tree needs to be hashed and written out). > > > > -- 8< -- > > diff --git a/cache-tree.c b/cache-tree.c > > index 6b46711996..67a4a93100 100644 > > --- a/cache-tree.c > > +++ b/cache-tree.c > > @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int > > flags) > > return 0; > > } > > > > +static int same(const struct cache_entry *a, const struct cache_entry *b) > > +{ > > + if (ce_stage(a) || ce_stage(b)) > > + return 0; > > + if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED) > > + return 0; > > + return a->ce_mode == b->ce_mode && > > +!oidcmp(>oid, >oid); > > +} > > + > > +static int cache_tree_name_pos(const struct index_state *istate, > > +const struct strbuf *path) > > +{ > > + int pos; > > + > > + if (!path->len) > > + return 0; > > + > > + pos = index_name_pos(istate, path->buf, path->len); > > + if (pos >= 0) > > + BUG("No no no, directory path must not exist in index"); > > + return -pos - 1; > > +} > > + > > +/* > > + * Locate the same cache-tree in two separate indexes. Check the > > + * cache-tree is still valid for the "to" index (i.e. it contains the > > + * same set of entries in the "from" index). > > + */ > > +static int verify_one_cache_tree(const struct index_state *to, > > + const struct index_state *from, > > + const struct cache_tree *it, > > + const struct strbuf *path) > > +{ > > + int i, spos, dpos; > > + > > + spos = cache_tree_name_pos(from, path); > > + if (spos + it->entry_count > from->cache_nr) > > + return -1; > > + > > + dpos = cache_tree_name_pos(to, path); > > + if (dpos + it->entry_count > to->cache_nr) > > + return -1; > > + > > + /* Can we quickly check head and tail and bail out early */ > > + if (!same(from->cache[spos], to->cache[spos]) || > > + !same(from->cache[spos + it->entry_count - 1], > > + to->cache[spos + it->entry_count - 1])) > > + return -1; > > + > > + for (i = 1; i < it->entry_count - 1; i++) > > + if (!same(from->cache[spos + i], > > + to->cache[dpos + i])) > > +
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 8/8/2018 4:53 PM, Ben Peart wrote: On 8/1/2018 12:38 PM, Duy Nguyen wrote: On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote: On 7/31/2018 12:50 PM, Ben Peart wrote: On 7/31/2018 11:31 AM, Duy Nguyen wrote: In the performance game of whack-a-mole, that call to repair cache-tree is now looking quite expensive... Yeah and I think we can whack that mole too. I did some measurement. Best case possible, we just need to scan through two indexes (one with many good cache-tree, one with no cache-tree), compare and copy cache-tree over. The scanning takes like 1% time of current repair step and I suspect it's the hashing that takes most of the time. Of course real world won't have such nice numbers, but I guess we could maybe half cache-tree update/repair time. I have some great profiling tools available so will take a look at this next and see exactly where the time is being spent. Good instincts. In cache_tree_update, the heavy hitter is definitely hash_object_file followed by has_object_file. Name Inc % Inc + git!cache_tree_update 12.4 4,935 |+ git!update_one 11.8 4,706 | + git!update_one 11.8 4,706 | + git!hash_object_file 6.1 2,406 | + git!has_object_file 2.0 813 | + OTHER <> 0.5 203 | + git!strbuf_addf 0.4 155 | + git!strbuf_release 0.4 143 | + git!strbuf_add 0.3 121 | + OTHER <> 0.2 93 | + git!strbuf_grow 0.1 25 Ben, if you work on this, this could be a good starting point. I will not work on this because I still have some other things to catch up and follow through. You can have my sign off if you reuse something from this patch Even if it's a naive implementation, the initial numbers look pretty good. Without the patch we have 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update And with the patch 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update Time saving is about 80% by the look of this (best possible case because only the top tree needs to be hashed and written out). -- 8< -- diff --git a/cache-tree.c b/cache-tree.c index 6b46711996..67a4a93100 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags) return 0; } +static int same(const struct cache_entry *a, const struct cache_entry *b) +{ + if (ce_stage(a) || ce_stage(b)) + return 0; + if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED) + return 0; + return a->ce_mode == b->ce_mode && + !oidcmp(>oid, >oid); +} + +static int cache_tree_name_pos(const struct index_state *istate, + const struct strbuf *path) +{ + int pos; + + if (!path->len) + return 0; + + pos = index_name_pos(istate, path->buf, path->len); + if (pos >= 0) + BUG("No no no, directory path must not exist in index"); + return -pos - 1; +} + +/* + * Locate the same cache-tree in two separate indexes. Check the + * cache-tree is still valid for the "to" index (i.e. it contains the + * same set of entries in the "from" index). + */ +static int verify_one_cache_tree(const struct index_state *to, + const struct index_state *from, + const struct cache_tree *it, + const struct strbuf *path) +{ + int i, spos, dpos; + + spos = cache_tree_name_pos(from, path); + if (spos + it->entry_count > from->cache_nr) + return -1; + + dpos = cache_tree_name_pos(to, path); + if (dpos + it->entry_count > to->cache_nr) + return -1; + + /* Can we quickly check head and tail and bail out early */ + if (!same(from->cache[spos], to->cache[spos]) || + !same(from->cache[spos + it->entry_count - 1], + to->cache[spos + it->entry_count - 1])) + return -1; + + for (i = 1; i < it->entry_count - 1; i++) + if (!same(from->cache[spos + i], + to->cache[dpos + i])) + return -1; + + return 0; +} + +static int verify_and_invalidate(struct index_state *to, + const struct index_state *from, + struct cache_tree *it, + struct strbuf *path) +{ + /* + * Optimistically verify the current tree first. Alternatively + * we could verify all the subtrees first then do this + * last. Any invalid subtree would also invalidates its + * ancestors. + */ + if (it->entry_count != -1 && + verify_one_cache_tree(to, from, it, path)) + it->entry_count = -1; + + /* + * If the current tree is valid, don't bother checking
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 8/1/2018 12:38 PM, Duy Nguyen wrote: On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote: On 7/31/2018 12:50 PM, Ben Peart wrote: On 7/31/2018 11:31 AM, Duy Nguyen wrote: In the performance game of whack-a-mole, that call to repair cache-tree is now looking quite expensive... Yeah and I think we can whack that mole too. I did some measurement. Best case possible, we just need to scan through two indexes (one with many good cache-tree, one with no cache-tree), compare and copy cache-tree over. The scanning takes like 1% time of current repair step and I suspect it's the hashing that takes most of the time. Of course real world won't have such nice numbers, but I guess we could maybe half cache-tree update/repair time. I have some great profiling tools available so will take a look at this next and see exactly where the time is being spent. Good instincts. In cache_tree_update, the heavy hitter is definitely hash_object_file followed by has_object_file. NameInc %Inc + git!cache_tree_update 12.4 4,935 |+ git!update_one11.8 4,706 | + git!update_one 11.8 4,706 | + git!hash_object_file 6.1 2,406 | + git!has_object_file 2.0813 | + OTHER <> 0.5203 | + git!strbuf_addf 0.4155 | + git!strbuf_release 0.4143 | + git!strbuf_add 0.3121 | + OTHER <> 0.2 93 | + git!strbuf_grow 0.1 25 Ben, if you work on this, this could be a good starting point. I will not work on this because I still have some other things to catch up and follow through. You can have my sign off if you reuse something from this patch Even if it's a naive implementation, the initial numbers look pretty good. Without the patch we have 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update And with the patch 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update Time saving is about 80% by the look of this (best possible case because only the top tree needs to be hashed and written out). -- 8< -- diff --git a/cache-tree.c b/cache-tree.c index 6b46711996..67a4a93100 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags) return 0; } +static int same(const struct cache_entry *a, const struct cache_entry *b) +{ + if (ce_stage(a) || ce_stage(b)) + return 0; + if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED) + return 0; + return a->ce_mode == b->ce_mode && + !oidcmp(>oid, >oid); +} + +static int cache_tree_name_pos(const struct index_state *istate, + const struct strbuf *path) +{ + int pos; + + if (!path->len) + return 0; + + pos = index_name_pos(istate, path->buf, path->len); + if (pos >= 0) + BUG("No no no, directory path must not exist in index"); + return -pos - 1; +} + +/* + * Locate the same cache-tree in two separate indexes. Check the + * cache-tree is still valid for the "to" index (i.e. it contains the + * same set of entries in the "from" index). + */ +static int verify_one_cache_tree(const struct index_state *to, +const struct index_state *from, +const struct cache_tree *it, +const struct strbuf *path) +{ + int i, spos, dpos; + + spos = cache_tree_name_pos(from, path); + if (spos + it->entry_count > from->cache_nr) + return -1; + + dpos = cache_tree_name_pos(to, path); + if (dpos + it->entry_count > to->cache_nr) + return -1; + + /* Can we quickly check head and tail and bail out early */ + if (!same(from->cache[spos], to->cache[spos]) || + !same(from->cache[spos + it->entry_count - 1], + to->cache[spos + it->entry_count - 1])) + return -1; + + for (i = 1; i < it->entry_count - 1; i++) + if (!same(from->cache[spos + i], + to->cache[dpos + i])) + return -1; + + return 0; +} + +static int verify_and_invalidate(struct index_state *to, +const struct index_state *from, +struct cache_tree *it, +struct strbuf *path) +{ + /* +* Optimistically verify the current tree first. Alternatively +* we could verify all the subtrees first then do this +* last. Any invalid subtree would also invalidates its +*
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
Eric Sunshine writes: > On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga > wrote: >> But your suggestion did make me think about what behaviour I would >> like to see, exactly. I like that Git removes commits that no longer >> serve any purpose (because I've included their changes in earlier >> commits). So I would not want to keep commits that become empty during >> the rebase. What I would like to see is that commits that _start out_ >> as empty, are retained. Would such behaviour make sense? Or would that >> be considered surprising behaviour? > > I, personally, have no opinion since I don't use empty commits. > Perhaps someone more experienced and more long-sighted will chime in. 0661e49a ("git-rebase.txt: document behavioral differences between modes", 2018-06-27) added the following. In short, "rebase -i" should already behave that way. + * empty commits: + +am-based rebase will drop any "empty" commits, whether the +commit started empty (had no changes relative to its parent to +start with) or ended empty (all changes were already applied +upstream in other commits). + +merge-based rebase does the same. + +interactive-based rebase will by default drop commits that +started empty and halt if it hits a commit that ended up empty. +The `--keep-empty` option exists for interactive rebases to allow +it to keep commits that started empty.
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga wrote: > But your suggestion did make me think about what behaviour I would > like to see, exactly. I like that Git removes commits that no longer > serve any purpose (because I've included their changes in earlier > commits). So I would not want to keep commits that become empty during > the rebase. What I would like to see is that commits that _start out_ > as empty, are retained. Would such behaviour make sense? Or would that > be considered surprising behaviour? I, personally, have no opinion since I don't use empty commits. Perhaps someone more experienced and more long-sighted will chime in.
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Tue, Jul 31, 2018 at 11:22 PM, Eric Sunshine wrote: > On Tue, Jul 31, 2018 at 9:30 PM Hilco Wijbenga > wrote: >> On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine >> wrote: >> > This is a re-roll of [1] which fixes sequencer bugs resulting in commit >> > object corruption when "rebase -i --root" swaps in a new commit as root. >> > Unfortunately, those bugs made it into v2.18.0 and have already >> > corrupted at least one repository (a local project of mine). Patches 3/4 >> > and 4/4 are new. >> >> Does this also fix losing the initial commit if it is empty? >> >> git init ; git commit -m 'Initial commit' --allow-empty ; touch >> file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git >> rebase --root >> >> I would expect there to be 2 commits but the first one has >> disappeared. (This usually happens with "git rebase -i --root" early >> on in a new project.) > > This series should have no impact on that issue; it is fixing root > commit object corruption, and does not touch or change how the commit > graph is maintained or how the rebasing machinery itself works (and > the --allow-empty stuff is part of that machinery). > > As Dscho is cc:'d already, perhaps he can chime in as a resident expert. > > By the way, what happens if you use --keep-empty while rebasing? I was not aware of that flag. But, although I was expecting it to, if I use it with the rebase, I don't see any different behaviour. I can't really make out what its purpose is from "Keep the commits that do not change anything from its parents in the result.". But your suggestion did make me think about what behaviour I would like to see, exactly. I like that Git removes commits that no longer serve any purpose (because I've included their changes in earlier commits). So I would not want to keep commits that become empty during the rebase. What I would like to see is that commits that _start out_ as empty, are retained. Would such behaviour make sense? Or would that be considered surprising behaviour?
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Wed, Aug 1, 2018 at 7:25 PM brian m. carlson wrote: > On Tue, Jul 31, 2018 at 03:33:27AM -0400, Eric Sunshine wrote: > > This is a re-roll of [1] which fixes sequencer bugs resulting in commit > > object corruption when "rebase -i --root" swaps in a new commit as root. > > Unfortunately, those bugs made it into v2.18.0 and have already > > corrupted at least one repository (a local project of mine). Patches 3/4 > > and 4/4 are new. > > I looked at this series and it seems sane and logical to me. Thanks for > acting quickly to fix the corruption. > > Reviewed-by: brian m. carlson Thanks for the review.
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Tue, Jul 31, 2018 at 03:33:27AM -0400, Eric Sunshine wrote: > This is a re-roll of [1] which fixes sequencer bugs resulting in commit > object corruption when "rebase -i --root" swaps in a new commit as root. > Unfortunately, those bugs made it into v2.18.0 and have already > corrupted at least one repository (a local project of mine). Patches 3/4 > and 4/4 are new. > > v1 fixed these bugs: > > * trailing garbage on the commit's "author" header > > * extra trailing digit on "author" header's timezone (caused by two > separate bugs) > > v2 fixes those same bugs, plus: > > * eliminates a bogus "@" prepended to the "author" header timestamp > which renders the header corrupt > > * takes care to validate information coming from > "rebase-merge/author-script" before incorporating it into the "author" > header since that file may be hand-edited, and bogus hand-edited > values could corrupt the commit object. I looked at this series and it seems sane and logical to me. Thanks for acting quickly to fix the corruption. Reviewed-by: brian m. carlson -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote: > > > On 7/31/2018 12:50 PM, Ben Peart wrote: > > > > > > On 7/31/2018 11:31 AM, Duy Nguyen wrote: > > >> > >>> In the performance game of whack-a-mole, that call to repair cache-tree > >>> is now looking quite expensive... > >> > >> Yeah and I think we can whack that mole too. I did some measurement. > >> Best case possible, we just need to scan through two indexes (one with > >> many good cache-tree, one with no cache-tree), compare and copy > >> cache-tree over. The scanning takes like 1% time of current repair > >> step and I suspect it's the hashing that takes most of the time. Of > >> course real world won't have such nice numbers, but I guess we could > >> maybe half cache-tree update/repair time. > >> > > > > I have some great profiling tools available so will take a look at this > > next and see exactly where the time is being spent. > > Good instincts. In cache_tree_update, the heavy hitter is definitely > hash_object_file followed by has_object_file. > > Name Inc %Inc > + git!cache_tree_update12.4 4,935 > |+ git!update_one 11.8 4,706 > | + git!update_one 11.8 4,706 > | + git!hash_object_file 6.1 2,406 > | + git!has_object_file2.0813 > | + OTHER <> 0.5203 > | + git!strbuf_addf0.4155 > | + git!strbuf_release 0.4143 > | + git!strbuf_add 0.3121 > | + OTHER <> 0.2 93 > | + git!strbuf_grow0.1 25 Ben, if you work on this, this could be a good starting point. I will not work on this because I still have some other things to catch up and follow through. You can have my sign off if you reuse something from this patch Even if it's a naive implementation, the initial numbers look pretty good. Without the patch we have 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update And with the patch 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update Time saving is about 80% by the look of this (best possible case because only the top tree needs to be hashed and written out). -- 8< -- diff --git a/cache-tree.c b/cache-tree.c index 6b46711996..67a4a93100 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags) return 0; } +static int same(const struct cache_entry *a, const struct cache_entry *b) +{ + if (ce_stage(a) || ce_stage(b)) + return 0; + if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED) + return 0; + return a->ce_mode == b->ce_mode && + !oidcmp(>oid, >oid); +} + +static int cache_tree_name_pos(const struct index_state *istate, + const struct strbuf *path) +{ + int pos; + + if (!path->len) + return 0; + + pos = index_name_pos(istate, path->buf, path->len); + if (pos >= 0) + BUG("No no no, directory path must not exist in index"); + return -pos - 1; +} + +/* + * Locate the same cache-tree in two separate indexes. Check the + * cache-tree is still valid for the "to" index (i.e. it contains the + * same set of entries in the "from" index). + */ +static int verify_one_cache_tree(const struct index_state *to, +const struct index_state *from, +const struct cache_tree *it, +const struct strbuf *path) +{ + int i, spos, dpos; + + spos = cache_tree_name_pos(from, path); + if (spos + it->entry_count > from->cache_nr) + return -1; + + dpos = cache_tree_name_pos(to, path); + if (dpos + it->entry_count > to->cache_nr) + return -1; + + /* Can we quickly check head and tail and bail out early */ + if (!same(from->cache[spos], to->cache[spos]) || + !same(from->cache[spos + it->entry_count - 1], + to->cache[spos + it->entry_count - 1])) + return -1; + + for (i = 1; i < it->entry_count - 1; i++) + if (!same(from->cache[spos + i], + to->cache[dpos + i])) + return -1; + + return 0; +} + +static int verify_and_invalidate(struct index_state *to, +const struct index_state *from, +struct cache_tree *it, +struct strbuf *path) +{ + /* +* Optimistically verify the current tree first. Alternatively +* we could verify all the subtrees first then do this +* last. Any
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Tue, Jul 31, 2018 at 9:30 PM Hilco Wijbenga wrote: > On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine > wrote: > > This is a re-roll of [1] which fixes sequencer bugs resulting in commit > > object corruption when "rebase -i --root" swaps in a new commit as root. > > Unfortunately, those bugs made it into v2.18.0 and have already > > corrupted at least one repository (a local project of mine). Patches 3/4 > > and 4/4 are new. > > Does this also fix losing the initial commit if it is empty? > > git init ; git commit -m 'Initial commit' --allow-empty ; touch > file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git > rebase --root > > I would expect there to be 2 commits but the first one has > disappeared. (This usually happens with "git rebase -i --root" early > on in a new project.) This series should have no impact on that issue; it is fixing root commit object corruption, and does not touch or change how the commit graph is maintained or how the rebasing machinery itself works (and the --allow-empty stuff is part of that machinery). As Dscho is cc:'d already, perhaps he can chime in as a resident expert. By the way, what happens if you use --keep-empty while rebasing?
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
Hi Eric, On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine wrote: > This is a re-roll of [1] which fixes sequencer bugs resulting in commit > object corruption when "rebase -i --root" swaps in a new commit as root. > Unfortunately, those bugs made it into v2.18.0 and have already > corrupted at least one repository (a local project of mine). Patches 3/4 > and 4/4 are new. > > v1 fixed these bugs: > > * trailing garbage on the commit's "author" header > > * extra trailing digit on "author" header's timezone (caused by two > separate bugs) > > v2 fixes those same bugs, plus: > > * eliminates a bogus "@" prepended to the "author" header timestamp > which renders the header corrupt > > * takes care to validate information coming from > "rebase-merge/author-script" before incorporating it into the "author" > header since that file may be hand-edited, and bogus hand-edited > values could corrupt the commit object. > Does this also fix losing the initial commit if it is empty? Given git init ; git commit -m 'Initial commit' --allow-empty ; touch file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git rebase --root I would expect there to be 2 commits but the first one has disappeared. (This usually happens with "git rebase -i --root" early on in a new project.) Cheers, Hilco
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/31/2018 12:50 PM, Ben Peart wrote: On 7/31/2018 11:31 AM, Duy Nguyen wrote: In the performance game of whack-a-mole, that call to repair cache-tree is now looking quite expensive... Yeah and I think we can whack that mole too. I did some measurement. Best case possible, we just need to scan through two indexes (one with many good cache-tree, one with no cache-tree), compare and copy cache-tree over. The scanning takes like 1% time of current repair step and I suspect it's the hashing that takes most of the time. Of course real world won't have such nice numbers, but I guess we could maybe half cache-tree update/repair time. I have some great profiling tools available so will take a look at this next and see exactly where the time is being spent. Good instincts. In cache_tree_update, the heavy hitter is definitely hash_object_file followed by has_object_file. NameInc %Inc + git!cache_tree_update 12.4 4,935 |+ git!update_one11.8 4,706 | + git!update_one 11.8 4,706 | + git!hash_object_file 6.1 2,406 | + git!has_object_file 2.0813 | + OTHER <> 0.5203 | + git!strbuf_addf 0.4155 | + git!strbuf_release 0.4143 | + git!strbuf_add 0.3121 | + OTHER <> 0.2 93 | + git!strbuf_grow 0.1 25
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/31/2018 11:31 AM, Duy Nguyen wrote: On Mon, Jul 30, 2018 at 8:10 PM Ben Peart wrote: I ran "git checkout" on a large repo and averaged the results of 3 runs. This clearly demonstrates the benefit of the optimized unpack_trees() as even the final "diff-index" is essentially a 3rd call to unpack_trees(). baselinenew -- 0.535510167 0.556558733 s: read cache .git/index 0.3057373 0.3147105 s: initialize name hash 0.0184082 0.023558433 s: preload index 0.086910967 0.089085967 s: refresh index 7.889590767 2.191554433 s: unpack trees 0.120760833 0.131941267 s: update worktree after a merge 2.2583504 2.572663167 s: repair cache-tree 0.8916137 0.959495233 s: write index, changed mask = 28 3.405199233 0.2710663 s: unpack trees 0.000999667 0.0021554 s: update worktree after a merge 3.4063306 0.273318333 s: diff-index 16.9524923 9.462943133 s: git command: 'c:\git-sdk-64\usr\src\git\git.exe' checkout The first call to unpack_trees() saves 72% The 2nd and 3rd call save 92% By the 3rd I guess you meant "diff-index" line. I think it's the same with the second call. diff-index triggers the second unpack-trees but there's no indent here and it's misleading to read this as diff-index and unpack-trees execute one after the other. Total time savings for the entire command was 44% Wow.. I guess you have more trees since I could only save 30% on gcc.git. Yes, with over 500K trees, this optimization really pays off for us. I can't wait to see how this works out in the wild (vs my "lab" based performance testing). Thank you! I definitely owe you lunch. :) In the performance game of whack-a-mole, that call to repair cache-tree is now looking quite expensive... Yeah and I think we can whack that mole too. I did some measurement. Best case possible, we just need to scan through two indexes (one with many good cache-tree, one with no cache-tree), compare and copy cache-tree over. The scanning takes like 1% time of current repair step and I suspect it's the hashing that takes most of the time. Of course real world won't have such nice numbers, but I guess we could maybe half cache-tree update/repair time. I have some great profiling tools available so will take a look at this next and see exactly where the time is being spent.
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Mon, Jul 30, 2018 at 8:10 PM Ben Peart wrote: > I ran "git checkout" on a large repo and averaged the results of 3 runs. > This clearly demonstrates the benefit of the optimized unpack_trees() > as even the final "diff-index" is essentially a 3rd call to unpack_trees(). > > baselinenew > -- > 0.535510167 0.556558733 s: read cache .git/index > 0.3057373 0.3147105 s: initialize name hash > 0.0184082 0.023558433 s: preload index > 0.086910967 0.089085967 s: refresh index > 7.889590767 2.191554433 s: unpack trees > 0.120760833 0.131941267 s: update worktree after a merge > 2.2583504 2.572663167 s: repair cache-tree > 0.8916137 0.959495233 s: write index, changed mask = 28 > 3.405199233 0.2710663 s: unpack trees > 0.000999667 0.0021554 s: update worktree after a merge > 3.4063306 0.273318333 s: diff-index > 16.9524923 9.462943133 s: git command: > 'c:\git-sdk-64\usr\src\git\git.exe' checkout > > The first call to unpack_trees() saves 72% > The 2nd and 3rd call save 92% By the 3rd I guess you meant "diff-index" line. I think it's the same with the second call. diff-index triggers the second unpack-trees but there's no indent here and it's misleading to read this as diff-index and unpack-trees execute one after the other. > Total time savings for the entire command was 44% Wow.. I guess you have more trees since I could only save 30% on gcc.git. > In the performance game of whack-a-mole, that call to repair cache-tree > is now looking quite expensive... Yeah and I think we can whack that mole too. I did some measurement. Best case possible, we just need to scan through two indexes (one with many good cache-tree, one with no cache-tree), compare and copy cache-tree over. The scanning takes like 1% time of current repair step and I suspect it's the hashing that takes most of the time. Of course real world won't have such nice numbers, but I guess we could maybe half cache-tree update/repair time. -- Duy
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Tue, Jul 31, 2018 at 6:46 AM Eric Sunshine wrote: > Anyhow, thanks for reading over the series. I appreciate it even if > our "sense of priority" doesn't always align (as evidenced by your > review comments and my responses). To be clear, the changes you suggest all make sense, and would be welcome (especially the bug fixes), but I consider them lower priority than the fixes in this series, and here's why: The commit object corruption caused by the bugs fixed by this series are unavoidable. Anyone using "rebase -i --root" to swap in a new commit as root is going to end up with a corrupt repository no matter what. There's no way to side step it. And, most people won't know how to fix the corruption, assuming they even notice it. If I understand correctly, the issues you describe are unlikely to come up in practice. The only way they can arise is if someone hand edits the script (something only power users will do) _and_ botches the edit in the process, or if the person's name contains an apostrophe (possible, though perhaps uncommon?). Also, (if again I understand correctly) they are only data "corruptions", not genuine broken-repository corruptions, thus are more likely to be fixable by a typical user. So it's not that I think your proposed fixes and suggestions are unimportant, I just don't think they belong in this series, and would be happy to see them atop it. Thanks.
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
Hi Eric On 31/07/18 11:46, Eric Sunshine wrote: > On Tue, Jul 31, 2018 at 6:06 AM Phillip Wood > wrote: >> On 31/07/18 08:33, Eric Sunshine wrote: >>> Patch 2/4 of this series conflicts with Akinori MUSHA's >>> 'am/sequencer-author-script-fix' which takes a stab at fixing one of the >>> four (or so) bugs fixed by this series (namely, adding a missing closing >>> quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch >>> probably ought to be dropped (without prejudice) in favor of this series >>> for the following reasons: >>> [...] >>> The test added by Akinori MUSHA's patch may still have value, and it may >>> make sense to re-submit it, however, doing so need not hold up this >>> (higher priority) series. >> >> Yes I think it does, also the patch that Johannes and I have on top of >> it to fix the quoting of "'" in write_author_script() relies on fixing >> the missing trailing quote and handling of "'" at the same time, >> hopefully we can get that rebased on top of these asap. > > I'm not sure if "Yes I think it does" means that Akinori's test has > value or if it means that this series should be held back waiting for > other changes built atop it. It means we should use a test based on that with the quoting fixes on top of this series and progress them together if possible. I think the quoting patch I just sent is good now so hopefully there wont be any issue with it holding up your fixes. > Anyhow, thanks for reading over the series. I appreciate it even if > our "sense of priority" doesn't always align (as evidenced by your > review comments and my responses). My "sense of priority" was informed by the hope that the quoting patch wouldn't hold things up (let's hope I was right about that!). Best Wishes Phillip
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
On Tue, Jul 31, 2018 at 6:06 AM Phillip Wood wrote: > On 31/07/18 08:33, Eric Sunshine wrote: > > Patch 2/4 of this series conflicts with Akinori MUSHA's > > 'am/sequencer-author-script-fix' which takes a stab at fixing one of the > > four (or so) bugs fixed by this series (namely, adding a missing closing > > quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch > > probably ought to be dropped (without prejudice) in favor of this series > > for the following reasons: > > [...] > > The test added by Akinori MUSHA's patch may still have value, and it may > > make sense to re-submit it, however, doing so need not hold up this > > (higher priority) series. > > Yes I think it does, also the patch that Johannes and I have on top of > it to fix the quoting of "'" in write_author_script() relies on fixing > the missing trailing quote and handling of "'" at the same time, > hopefully we can get that rebased on top of these asap. I'm not sure if "Yes I think it does" means that Akinori's test has value or if it means that this series should be held back waiting for other changes built atop it. Anyhow, thanks for reading over the series. I appreciate it even if our "sense of priority" doesn't always align (as evidenced by your review comments and my responses).
Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
Hi Eric On 31/07/18 08:33, Eric Sunshine wrote: This is a re-roll of [1] which fixes sequencer bugs resulting in commit object corruption when "rebase -i --root" swaps in a new commit as root. Unfortunately, those bugs made it into v2.18.0 and have already corrupted at least one repository (a local project of mine). Patches 3/4 and 4/4 are new. v1 fixed these bugs: * trailing garbage on the commit's "author" header * extra trailing digit on "author" header's timezone (caused by two separate bugs) v2 fixes those same bugs, plus: * eliminates a bogus "@" prepended to the "author" header timestamp which renders the header corrupt * takes care to validate information coming from "rebase-merge/author-script" before incorporating it into the "author" header since that file may be hand-edited, and bogus hand-edited values could corrupt the commit object. Patch 2/4 of this series conflicts with Akinori MUSHA's 'am/sequencer-author-script-fix' which takes a stab at fixing one of the four (or so) bugs fixed by this series (namely, adding a missing closing quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch probably ought to be dropped (without prejudice) in favor of this series for the following reasons: * It has the undesirable property of adding an unwanted extra blank line to "rebase-merge/author-script"; this series doesn't make that mistake. * The fix in this series (patch 2/4) is more complete, also ensuring that the return value of sq_dequote() is checked. * And, most importantly, this series sells the change as a fix for a genuine serious bug (commit object corruption), whereas that patch talks only about adjusting the file content to make it parseable by the shell. It's important for future readers of this change to understand that the bug (missing closing quote) had much more dire consequences than merely being syntactically invalid as a shell script. The test added by Akinori MUSHA's patch may still have value, and it may make sense to re-submit it, however, doing so need not hold up this (higher priority) series. Yes I think it does, also the patch that Johannes and I have on top of it to fix the quoting of "'" in write_author_script() relies on fixing the missing trailing quote and handling of "'" at the same time, hopefully we can get that rebased on top of these asap. Best Wishes Phillip Phillip's proposed[2] unification of parsers and other fixes and cleanups can easily be built atop this series and, likewise, need not prevent these (higher priority) patches from graduating independently. [1]: https://public-inbox.org/git/20180730092929.71114-1-sunsh...@sunshineco.com/ [2]: https://public-inbox.org/git/1f172fc1-4b51-cdf7-e841-5ca41e209...@talktalk.net/ Eric Sunshine (4): sequencer: fix "rebase -i --root" corrupting author header sequencer: fix "rebase -i --root" corrupting author header timezone sequencer: fix "rebase -i --root" corrupting author header timestamp sequencer: don't die() on bogus user-edited timestamp sequencer.c | 39 +-- t/t3404-rebase-interactive.sh | 10 - 2 files changed, 33 insertions(+), 16 deletions(-) Range-diff against v1: 1: ba10ae4e5d ! 1: 8c47555bcf sequencer: fix "rebase -i --root" corrupting author header @@ -11,8 +11,8 @@ This is a result of read_author_ident() neglecting to NUL-terminate the buffer into which it composes the "author" header. -(Note that the extra "0" in the timezone is a separate bug which will be -fixed subsequently.) +(Note that the "@" preceding the timestamp and the extra "0" in the +timezone are separate bugs which will be fixed subsequently.) Security considerations: Construction of the "author" header by read_author_ident() happens in-place and in parallel with parsing the @@ -26,6 +26,7 @@ inserted as part of the parsing process. Signed-off-by: Eric Sunshine +Acked-by: Johannes Schindelin diff --git a/sequencer.c b/sequencer.c --- a/sequencer.c @@ -61,7 +62,7 @@ + set_fake_editor && + FAKE_LINES="2 1" git rebase -i --root && + git cat-file commit HEAD^ >out && -+ grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out ++ grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out +' + test_done 2: c0400cda85 ! 2: 1d4064147e sequencer: fix "rebase -i --root" corrupting author header timezone @@ -22,6 +22,9 @@ a NUL-terminator at the end of the shifted content, which explains the duplicated last digit in the timezone. +(Note that the "@" preceding the timestamp is a separate bug which +will be fixed subsequently.) + Signed-off-by: Eric Sunshine diff --git a/sequencer.c b/sequencer.c @@ -56,8
[PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
This is a re-roll of [1] which fixes sequencer bugs resulting in commit object corruption when "rebase -i --root" swaps in a new commit as root. Unfortunately, those bugs made it into v2.18.0 and have already corrupted at least one repository (a local project of mine). Patches 3/4 and 4/4 are new. v1 fixed these bugs: * trailing garbage on the commit's "author" header * extra trailing digit on "author" header's timezone (caused by two separate bugs) v2 fixes those same bugs, plus: * eliminates a bogus "@" prepended to the "author" header timestamp which renders the header corrupt * takes care to validate information coming from "rebase-merge/author-script" before incorporating it into the "author" header since that file may be hand-edited, and bogus hand-edited values could corrupt the commit object. Patch 2/4 of this series conflicts with Akinori MUSHA's 'am/sequencer-author-script-fix' which takes a stab at fixing one of the four (or so) bugs fixed by this series (namely, adding a missing closing quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch probably ought to be dropped (without prejudice) in favor of this series for the following reasons: * It has the undesirable property of adding an unwanted extra blank line to "rebase-merge/author-script"; this series doesn't make that mistake. * The fix in this series (patch 2/4) is more complete, also ensuring that the return value of sq_dequote() is checked. * And, most importantly, this series sells the change as a fix for a genuine serious bug (commit object corruption), whereas that patch talks only about adjusting the file content to make it parseable by the shell. It's important for future readers of this change to understand that the bug (missing closing quote) had much more dire consequences than merely being syntactically invalid as a shell script. The test added by Akinori MUSHA's patch may still have value, and it may make sense to re-submit it, however, doing so need not hold up this (higher priority) series. Phillip's proposed[2] unification of parsers and other fixes and cleanups can easily be built atop this series and, likewise, need not prevent these (higher priority) patches from graduating independently. [1]: https://public-inbox.org/git/20180730092929.71114-1-sunsh...@sunshineco.com/ [2]: https://public-inbox.org/git/1f172fc1-4b51-cdf7-e841-5ca41e209...@talktalk.net/ Eric Sunshine (4): sequencer: fix "rebase -i --root" corrupting author header sequencer: fix "rebase -i --root" corrupting author header timezone sequencer: fix "rebase -i --root" corrupting author header timestamp sequencer: don't die() on bogus user-edited timestamp sequencer.c | 39 +-- t/t3404-rebase-interactive.sh | 10 - 2 files changed, 33 insertions(+), 16 deletions(-) Range-diff against v1: 1: ba10ae4e5d ! 1: 8c47555bcf sequencer: fix "rebase -i --root" corrupting author header @@ -11,8 +11,8 @@ This is a result of read_author_ident() neglecting to NUL-terminate the buffer into which it composes the "author" header. -(Note that the extra "0" in the timezone is a separate bug which will be -fixed subsequently.) +(Note that the "@" preceding the timestamp and the extra "0" in the +timezone are separate bugs which will be fixed subsequently.) Security considerations: Construction of the "author" header by read_author_ident() happens in-place and in parallel with parsing the @@ -26,6 +26,7 @@ inserted as part of the parsing process. Signed-off-by: Eric Sunshine +Acked-by: Johannes Schindelin diff --git a/sequencer.c b/sequencer.c --- a/sequencer.c @@ -61,7 +62,7 @@ + set_fake_editor && + FAKE_LINES="2 1" git rebase -i --root && + git cat-file commit HEAD^ >out && -+ grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out ++ grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out +' + test_done 2: c0400cda85 ! 2: 1d4064147e sequencer: fix "rebase -i --root" corrupting author header timezone @@ -22,6 +22,9 @@ a NUL-terminator at the end of the shifted content, which explains the duplicated last digit in the timezone. +(Note that the "@" preceding the timestamp is a separate bug which +will be fixed subsequently.) + Signed-off-by: Eric Sunshine diff --git a/sequencer.c b/sequencer.c @@ -56,8 +59,8 @@ set_fake_editor && FAKE_LINES="2 1" git rebase -i --root && git cat-file commit HEAD^ >out && -- grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out -+ grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out +- grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out ++ grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out '
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote: This series speeds up unpack_trees() a bit by using cache-tree. unpack-trees could bit split in three big parts - the actual tree unpacking and running n-way merging - update worktree, which could be expensive depending on how much I/O is involved - repair cache-tree This series focuses on the first part alone and could give 700% speedup (best case possible scenario, real life ones probably not that impressive). It also shows that the reparing cache-tree is kinda expensive. I have an idea of reusing cache-tree from the original index, but I'll leave that to Ben or others to try out and see if it helps at all. v2 fixes the comments from Junio, adds more performance tracing and reduces the cost of adding index entries. Nguyễn Thái Ngọc Duy (4): unpack-trees.c: add performance tracing unpack-trees: optimize walking same trees with cache-tree unpack-trees: reduce malloc in cache-tree walk unpack-trees: cheaper index update when walking by cache-tree cache-tree.c | 2 + cache.h| 1 + read-cache.c | 3 +- unpack-trees.c | 161 - unpack-trees.h | 1 + 5 files changed, 166 insertions(+), 2 deletions(-) I have a limited understanding of this code path so I'm not the best person to review this but I didn't see any issues that concerned me. I also was able to run our internal functional and performance tests in addition to the git tests and the results were positive. Ben
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote: This series speeds up unpack_trees() a bit by using cache-tree. unpack-trees could bit split in three big parts - the actual tree unpacking and running n-way merging - update worktree, which could be expensive depending on how much I/O is involved - repair cache-tree This series focuses on the first part alone and could give 700% speedup (best case possible scenario, real life ones probably not that impressive). It also shows that the reparing cache-tree is kinda expensive. I have an idea of reusing cache-tree from the original index, but I'll leave that to Ben or others to try out and see if it helps at all. v2 fixes the comments from Junio, adds more performance tracing and reduces the cost of adding index entries. Nguyễn Thái Ngọc Duy (4): unpack-trees.c: add performance tracing unpack-trees: optimize walking same trees with cache-tree unpack-trees: reduce malloc in cache-tree walk unpack-trees: cheaper index update when walking by cache-tree cache-tree.c | 2 + cache.h| 1 + read-cache.c | 3 +- unpack-trees.c | 161 - unpack-trees.h | 1 + 5 files changed, 166 insertions(+), 2 deletions(-) I ran "git checkout" on a large repo and averaged the results of 3 runs. This clearly demonstrates the benefit of the optimized unpack_trees() as even the final "diff-index" is essentially a 3rd call to unpack_trees(). baselinenew -- 0.535510167 0.556558733 s: read cache .git/index 0.3057373 0.3147105 s: initialize name hash 0.0184082 0.023558433 s: preload index 0.086910967 0.089085967 s: refresh index 7.889590767 2.191554433 s: unpack trees 0.120760833 0.131941267 s: update worktree after a merge 2.2583504 2.572663167 s: repair cache-tree 0.8916137 0.959495233 s: write index, changed mask = 28 3.405199233 0.2710663 s: unpack trees 0.000999667 0.0021554 s: update worktree after a merge 3.4063306 0.273318333 s: diff-index 16.9524923 9.462943133 s: git command: 'c:\git-sdk-64\usr\src\git\git.exe' checkout The first call to unpack_trees() saves 72% The 2nd and 3rd call save 92% Total time savings for the entire command was 44% In the performance game of whack-a-mole, that call to repair cache-tree is now looking quite expensive... Ben
[PATCH v2 0/4] Speed up unpack_trees()
This series speeds up unpack_trees() a bit by using cache-tree. unpack-trees could bit split in three big parts - the actual tree unpacking and running n-way merging - update worktree, which could be expensive depending on how much I/O is involved - repair cache-tree This series focuses on the first part alone and could give 700% speedup (best case possible scenario, real life ones probably not that impressive). It also shows that the reparing cache-tree is kinda expensive. I have an idea of reusing cache-tree from the original index, but I'll leave that to Ben or others to try out and see if it helps at all. v2 fixes the comments from Junio, adds more performance tracing and reduces the cost of adding index entries. Nguyễn Thái Ngọc Duy (4): unpack-trees.c: add performance tracing unpack-trees: optimize walking same trees with cache-tree unpack-trees: reduce malloc in cache-tree walk unpack-trees: cheaper index update when walking by cache-tree cache-tree.c | 2 + cache.h| 1 + read-cache.c | 3 +- unpack-trees.c | 161 - unpack-trees.h | 1 + 5 files changed, 166 insertions(+), 2 deletions(-) -- 2.18.0.656.gda699b98b3
[PATCH v2 0/4] fail compilation with strcpy
On Thu, Jul 19, 2018 at 04:32:59PM -0400, Jeff King wrote: > This is a patch series to address the discussion in the thread at: > > https://public-inbox.org/git/20180713204350.ga16...@sigill.intra.peff.net/ > > Basically, the question was: can we declare strcpy banned and have a > linter save us the trouble of finding it in review. The answer is yes, > the compiler is good at that. ;) Here's a v2 that I think addresses the comments on the earlier series. Thanks everybody for your review. Changes here include: - drop the mention in CodingGuidelines. That list is already long, and we don't need to to waste mental effort on things that will be enforced automatically - we now #undef all macros before defining them to avoid redefinition warnings - the first patch now covers _just_ strcpy(), and each function gets its own patch with an explanation (and suggested alternatives). My thought is that these should be easy to dig up via blame, "log -S", or "log --grep". Though another option would be comments in banned.h. - added strcat and vsprintf to the banned list - tweaked the first patch's commit message for clarity and to address points raised in discussion As before, this needs to go on top of 022d2ac1f3 (which I hope should hit master soon anyway). [1/4]: automatically ban strcpy() [2/4]: banned.h: mark strcat() as banned [3/4]: banned.h: mark sprintf() as banned [4/4]: banned.h: mark strncpy() as banned banned.h | 30 ++ git-compat-util.h | 6 ++ 2 files changed, 36 insertions(+) create mode 100644 banned.h -Peff
Re: [PATCH v2 0/4] Automatic transfer encoding for patches
LGTM, thanks for the v2.
[PATCH v2 0/4] Automatic transfer encoding for patches
This series introduces an "auto" value for git send-email --transfer-encoding that uses 8bit when possible (i.e. when lines are 998 octets or shorter) and quoted-printable otherwise; it then makes this the default behavior. It also makes --validate aware of transfer encoding so it doesn't complain when using quoted-printable or base64. Changes from v1: * Update commit messages to refer to RFC 5322. * Add a missing space. * Remove the needless capture of stderr. * Define "suitable transfer encoding". * Invert test to better capture failures. * Wrap --validate code in an if block instead of returning early. * Update documentation to reflect correct, modern RFC. brian m. carlson (4): send-email: add an auto option for transfer encoding send-email: accept long lines with suitable transfer encoding send-email: automatically determine transfer-encoding docs: correct RFC specifying email line length Documentation/git-send-email.txt | 17 ++ git-send-email.perl | 46 +- t/t9001-send-email.sh| 57 3 files changed, 91 insertions(+), 29 deletions(-)
[GSoC] [PATCH v2 0/4] rebase: rewrite rebase in C
As a GSoC project, I have been working on the builtin rebase. The motivation behind the rewrite of rebase i.e. from shell script to C are for following reasons: 1. Writing shell scripts and getting it to production is much faster than doing the equivalent in C but lacks in performance and extra workarounds are needed for non-POSIX platforms. 2. Git for Windows is at loss as the installer size increases due to addition of extra dependencies for the shell scripts which are usually available in POSIX compliant platforms. This series of patches serves to demonstrate a minimal builtin rebase which supports running `git rebase ` and also serves to ask for reviews. Changes since v1: - Remove the TODO-rebase.sh which shouldn't be merged into `pu`. - Add newline at the end of the file `builtin/rebase.c` in `rebase: start implementing it as a builtin`. - Fix unintentional ordering in `git-rebase--common.sh` and fix the commit message in `rebase: refactor common shell functions into their own file`. - Fix wrong expression of `argc` in the handle upstream loop, fix wrong type casting code, make the condition simple and fix commit message in `builtin/rebase: support running "git-rebase ". Pratik Karki (4): rebase: start implementing it as a builtin rebase: refactor common shell functions into their own file sequencer: refactor the code to detach HEAD to checkout.c builtin/rebase: support running "git rebase " .gitignore| 2 + Makefile | 4 +- builtin.h | 1 + builtin/rebase.c | 282 ++ checkout.c| 64 ++ checkout.h| 3 + git-rebase.sh => git-legacy-rebase.sh | 62 +- git-rebase--common.sh | 61 ++ git.c | 6 + sequencer.c | 58 +- 10 files changed, 428 insertions(+), 115 deletions(-) create mode 100644 builtin/rebase.c rename git-rebase.sh => git-legacy-rebase.sh (91%) create mode 100644 git-rebase--common.sh -- 2.18.0
[PATCH v2 0/4] branch -l deprecation revisited
This series replaces the contents of jk/branch-l-0-deprecation, jk/branch-l-1-removal, and jk/branch-l-2-reincarnation. It implements the idea discussed in the subthread in: https://public-inbox.org/git/xmqqzi0hety4@gitster-ct.c.googlers.com/ Namely that "branch -l" would warn about deprecation only when we're not in list mode, and then we'll eventually jump straight to repurposing "-l" as "--list". This gets us to the end result faster, without the annoying "-l doesn't mean anything" step in between, and without useless warnings on "git branch -l". It also sidesteps any pager issues since list mode will not print the warning. The first 3 patches would become jk/branch-l-0-deprecate, and then the final one would become jk/branch-l-1-repurpose or similar. No third step required. [1/4]: t3200: unset core.logallrefupdates when testing reflog creation [2/4]: t: switch "branch -l" to "branch --create-reflog" [3/4]: branch: deprecate "-l" option [4/4]: branch: make "-l" a synonym for "--list" Documentation/git-branch.txt | 2 +- builtin/branch.c | 4 ++-- t/t1410-reflog.sh| 4 ++-- t/t3200-branch.sh| 34 +- 4 files changed, 22 insertions(+), 22 deletions(-) -Peff
Re: [RFC PATCH v2 0/4] contrib/credential/netrc Makefile & test improvements
On Tue, Jun 12, 2018 at 11:10 PM Todd Zullinger wrote: > > This replaces my 2/2 "use in-tree Git.pm for tests" with > Luis's improved version. It also adds Luis's fix to ensure > the proper exit status on test failures and a minor > whitespace cleanup. > > Is it alright to forge your signoff Luis? Thanks, that's fine. You should sign-off on those, too: you identified the problems, told me the solutions, tested them, and deserve the credit. Moreover, Documentation/SubmittingPatches encourages it. When the patches are ready, I think we'll need the primary author (Ted) to acknowledge before the maintainer (Junio) can approve. > Luis Marsano (2): > git-credential-netrc: use in-tree Git.pm for tests > git-credential-netrc: fix exit status when tests fail > > Todd Zullinger (2): > git-credential-netrc: make "all" default target of Makefile > git-credential-netrc: minor whitespace cleanup in test script > > contrib/credential/netrc/Makefile | 3 +++ > contrib/credential/netrc/t-git-credential-netrc.sh | 9 + > contrib/credential/netrc/test.pl | 5 +++-- > 3 files changed, 11 insertions(+), 6 deletions(-)
[RFC PATCH v2 0/4] contrib/credential/netrc Makefile & test improvements
This replaces my 2/2 "use in-tree Git.pm for tests" with Luis's improved version. It also adds Luis's fix to ensure the proper exit status on test failures and a minor whitespace cleanup. Is it alright to forge your signoff Luis? Luis Marsano (2): git-credential-netrc: use in-tree Git.pm for tests git-credential-netrc: fix exit status when tests fail Todd Zullinger (2): git-credential-netrc: make "all" default target of Makefile git-credential-netrc: minor whitespace cleanup in test script contrib/credential/netrc/Makefile | 3 +++ contrib/credential/netrc/t-git-credential-netrc.sh | 9 + contrib/credential/netrc/test.pl | 5 +++-- 3 files changed, 11 insertions(+), 6 deletions(-)
[PATCH v2 0/4] Fix i-t-a entries in git-diff and git-apply
v2 first fixes a bug in --ita-invisible-in-index that accidentally affects diffing a worktree and a tree. This caused a regression when v1's 1/2 turned this option on by default. Other than that, 4/4 should address Junio's comments on v1's 2/2. Nguyễn Thái Ngọc Duy (4): diff: ignore --ita-[in]visible-in-index when diffing worktree-to-tree diff: turn --ita-invisible-in-index on by default t2203: add a test about "diff HEAD" case apply: add --intent-to-add Documentation/git-apply.txt | 10 +- apply.c | 19 +++ apply.h | 1 + builtin/diff.c | 7 diff-lib.c | 8 +++-- t/t2203-add-intent.sh | 64 + t/t4011-diff-symlink.sh | 10 +++--- 7 files changed, 99 insertions(+), 20 deletions(-) -- 2.17.0.705.g3525833791
[GSoC][PATCH v2 0/4] rebase: split rebase -p from rebase -i
This splits the `rebase --preserve-merges` functionnality from git-rebase--interactive.sh. All the dead code left by the duplication of git-rebase--interactive.sh is also removed. This will make it easier to rewrite this script in C. This patch series is based of js/sequencer-and-root-commits. Changes since v1: - Duplicating the correct version of git-rebase--interactive.sh (thanks Stefan!) Alban Gruin (4): rebase: duplicate git-rebase--interactive.sh to git-rebase--preserve-merges.sh rebase: strip unused code in git-rebase--preserve-merges.sh rebase: use the new git-rebase--preserve-merges.sh rebase: remove -p code from git-rebase--interactive.sh .gitignore |1 + Makefile |2 + git-rebase--interactive.sh | 708 +--- git-rebase--preserve-merges.sh | 1012 git-rebase.sh | 32 +- 5 files changed, 1048 insertions(+), 707 deletions(-) create mode 100644 git-rebase--preserve-merges.sh -- 2.16.1
Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.
On Fri, May 11, 2018 at 1:37 AM, Jeff Kingwrote: > On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote: > >> This series replaces the two commits that were queued on >> sb/object-store-replace, >> fixing memory leaks that were recently introduced. >> >> Compared to v1, I merged the two independent series from yesterday, >> rewrote the commit message to clear up Junios confusion and addresses Peffs >> comments for the packfiles as well. > > Mostly. :) > > My one remaining complaint is that the bitmap code may hold on to a > dangling pointer to a packed_git after this series. Ok, I'll look into that. > > I think that is part of a larger problem, though, which is that the > bitmap code's globals need to be part of the struct raw_object_store. > I think this can already cause problems before your series if we were to > try to use bitmaps in both a superproject and a submodule in the same > process, though I think we'd at least hit the "ignoring extra bitmap > file" code path in open_pack_bitmap_1(). So right now it's an annoyance, > but after your series it becomes a potential segfault. Ok, maybe we'll need to convert bitmaps into the object store for that. Thanks for the pointer, Stefan
Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.
On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote: > This series replaces the two commits that were queued on > sb/object-store-replace, > fixing memory leaks that were recently introduced. > > Compared to v1, I merged the two independent series from yesterday, > rewrote the commit message to clear up Junios confusion and addresses Peffs > comments for the packfiles as well. Mostly. :) My one remaining complaint is that the bitmap code may hold on to a dangling pointer to a packed_git after this series. I think that is part of a larger problem, though, which is that the bitmap code's globals need to be part of the struct raw_object_store. I think this can already cause problems before your series if we were to try to use bitmaps in both a superproject and a submodule in the same process, though I think we'd at least hit the "ignoring extra bitmap file" code path in open_pack_bitmap_1(). So right now it's an annoyance, but after your series it becomes a potential segfault. -Peff
[PATCH v2 0/4] Fix mem leaks of recent object store conversions.
This series replaces the two commits that were queued on sb/object-store-replace, fixing memory leaks that were recently introduced. Compared to v1, I merged the two independent series from yesterday, rewrote the commit message to clear up Junios confusion and addresses Peffs comments for the packfiles as well. Also added another free to free the oidmap for the replaces themselves. Thanks, Stefan Stefan Beller (4): packfile: close and free packs upon releasing an object store packfile.h: remove all extern keywords object.c: free replace map in raw_object_store_clear replace-object.c: remove the_repository from prepare_replace_object object.c | 7 +++-- packfile.c | 13 packfile.h | 79 replace-object.c | 2 +- 4 files changed, 58 insertions(+), 43 deletions(-) -- 2.17.0.255.g8bfb7c0704
Re: [PATCH v2 0/4] Finish the conversion from die("BUG: ...") to BUG()
On Wed, May 02, 2018 at 11:38:13AM +0200, Johannes Schindelin wrote: > The BUG() macro was introduced in this patch series: > https://public-inbox.org/git/20170513032414.mfrwabt4hovuj...@sigill.intra.peff.net > > The second patch in that series converted one caller from die("BUG: ") > to use the BUG() macro. > > It seems that there was no concrete plan to address the same issue in > the rest of the code base. I had a plan; it was that people would convert these as they touched the relevant areas. :) I'm happy to see a mass-conversion, though. > For that reason, the commit message contains the precise Unix shell > invocation (GNU sed semantics, not BSD sed ones, because I know that the > Git maintainer as well as the author of the patch introducing BUG() both > use Linux and not macOS or any other platform that would offer a BSD > sed). It should be straight-forward to handle merge > conflicts/non-applying patches by simply re-running that command. I suspect this could have been done with coccinelle, but it's definitely not worth going back to re-do it now. > Changes since v2: > > - Avoided the entire discussion about the previous 2/6 (now dropped) > that prepared t1406 to handle SIGABRT by side-stepping the issue: the > ref-store test helper will no longer call abort() in BUG() calls but > exit with exit code 99 instead. I actually think this should be a runtime flag in the environment, like GIT_BUG_EXIT_CODE (and if not set, continue to abort). That can help if you come across a BUG() in real Git code, but for some reason your environment makes aborting a pain. For example, maybe you're debugging a racy BUG() and don't want to generate a bunch of coredumps. Or here's an interesting related case I came across a few months ago. t6210 has a test that's known to segfault due to stack exhaustion. It's marked test_expect_failure, so all is good, right? Normally, yes, but when I run the test suite inside our local Jenkins setup, it detects a segfault in _any_ child process of the test runner and aborts the whole thing. This is great as a belt-and-suspenders if we miss an unexpected segfault, but is obviously annoying in this case. Triggering BUG()s in the test suite, even inside an expect_failure, would introduce the same headache. It would be nice if I could just do: GIT_BUG_EXIT_CODE=134 make test to avoid it. Possibly expect_failure should even set that automatically. I also suspect that nobody really needs to set a specific exit code. Using 134 is enough to avoid all of the unpleasantness of SIGABRT, but still enough to trigger test_must_fail to distinguish it from a non-BUG death. The callers that intentionally trigger bugs probably ought to be using test_expect_code to make sure they are hitting a BUG() and not some other death, anyway. So we could probably simplify it to something like this: diff --git a/usage.c b/usage.c index cdd534c9df..50651bed40 100644 --- a/usage.c +++ b/usage.c @@ -221,7 +221,11 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis snprintf(prefix, sizeof(prefix), "BUG: "); vreportf(prefix, fmt, params); - abort(); + + if (git_env_bool("GIT_BUG_ABORT"), 1) + abort(); + else + exit(134); } #ifdef HAVE_VARIADIC_MACROS -Peff
[PATCH v2 0/4] Finish the conversion from die("BUG: ...") to BUG()
The BUG() macro was introduced in this patch series: https://public-inbox.org/git/20170513032414.mfrwabt4hovuj...@sigill.intra.peff.net The second patch in that series converted one caller from die("BUG: ") to use the BUG() macro. It seems that there was no concrete plan to address the same issue in the rest of the code base. This patch series tries to do that. Note: I would be very surprised if the monster commit 3/4 ("Replace all die("BUG: ...") calls by BUG() ones") would apply cleanly on `pu` (I develop this on top of `master`). For that reason, the commit message contains the precise Unix shell invocation (GNU sed semantics, not BSD sed ones, because I know that the Git maintainer as well as the author of the patch introducing BUG() both use Linux and not macOS or any other platform that would offer a BSD sed). It should be straight-forward to handle merge conflicts/non-applying patches by simply re-running that command. Changes since v2: - Avoided the entire discussion about the previous 2/6 (now dropped) that prepared t1406 to handle SIGABRT by side-stepping the issue: the ref-store test helper will no longer call abort() in BUG() calls but exit with exit code 99 instead. - As a consequence, I could fold 3/6 into 4/6 (i.e. *not* special-case the refs/*.c code but do one wholesale die("BUG: ...") -> BUG() conversion). - Further consequence: 1/6, which added handling for ok=sigabrt to test_must_fail, was dropped. - I also used the opportunity to explain in the commit message of the last commit why the localization was dropped from one die(_("BUG: ...")) call. Johannes Schindelin (4): test-tool: help verifying BUG() code paths run-command: use BUG() to report bugs, not die() Replace all die("BUG: ...") calls by BUG() ones Convert remaining die*(BUG) messages apply.c | 4 ++-- archive-tar.c| 2 +- attr.c | 10 +- blame.c | 2 +- builtin/am.c | 20 +-- builtin/branch.c | 2 +- builtin/cat-file.c | 4 ++-- builtin/clone.c | 2 +- builtin/commit.c | 2 +- builtin/config.c | 2 +- builtin/fast-export.c| 2 +- builtin/fsck.c | 2 +- builtin/index-pack.c | 4 ++-- builtin/init-db.c| 2 +- builtin/ls-files.c | 2 +- builtin/notes.c | 8 builtin/pack-objects.c | 4 ++-- builtin/pull.c | 2 +- builtin/receive-pack.c | 2 +- builtin/replace.c| 2 +- builtin/update-index.c | 2 +- bulk-checkin.c | 2 +- color.c | 4 ++-- column.c | 2 +- config.c | 12 +-- date.c | 2 +- diff.c | 12 +-- dir-iterator.c | 2 +- git-compat-util.h| 2 +- grep.c | 16 +++ http.c | 8 imap-send.c | 2 +- lockfile.c | 2 +- mailinfo.c | 2 +- merge-recursive.c| 12 +-- notes-merge.c| 4 ++-- pack-bitmap-write.c | 2 +- pack-bitmap.c| 6 +++--- pack-objects.c | 2 +- packfile.c | 6 +++--- pathspec.c | 12 +-- pkt-line.c | 2 +- prio-queue.c | 2 +- ref-filter.c | 4 ++-- refs.c | 34 refs/files-backend.c | 20 +-- refs/iterator.c | 6 +++--- refs/packed-backend.c| 16 +++ refs/ref-cache.c | 2 +- remote.c | 2 +- revision.c | 4 ++-- run-command.c| 33 ++- setup.c | 4 ++-- sha1-lookup.c| 2 +- sha1-name.c | 4 ++-- shallow.c| 6 +++--- sigchain.c | 2 +- strbuf.c | 4 ++-- submodule.c | 8 t/helper/test-example-decorate.c | 16 +++ t/helper/test-tool.c | 2 ++ tmp-objdir.c | 2 +- trailer.c| 6 +++--- transport.c | 4 ++-- unpack-trees.c | 2 +- usage.c | 5 + vcs-svn/fast_export.c| 6 -- worktree.c | 2 +- wrapper.c| 4 ++-- wt-status.c
[PATCH v2 0/4] rebase -i: avoid stale "# This is a combination of" in commit messages
Eric Sunshine pointed out that I had such a commit message in https://public-inbox.org/git/CAPig+cRrS0_nYJJY=o6cbov630snqhpv5qgrqdd8mw-syzn...@mail.gmail.com/ and I went on a hunt to figure out how the heck this happened. Turns out that if there is a fixup/squash chain where the *last* command fails with merge conflicts, and we either --skip ahead or resolve the conflict to a clean tree and then --continue, our code does not do a final cleanup. Contrary to my initial gut feeling, this bug was not introduced by my rewrite in C of the core parts of rebase -i, but it looks to me as if that bug was with us for a very long time (at least the --skip part). The developer (read: user of rebase -i) in me says that we would want to fast-track this, but the author of rebase -i in me says that we should be cautious and cook this in `next` for a while. Fixes since v1: - Using test_i18ngrep instead of grep, because "This is a combination of commits" is marked for translation. - Added a patch to actually fix `rebase -i` when building with GETTEXT_POISON, because we used to assume that numbers are encoded as ASCII so that we can increment it when writing the next commit message in the fixup/squash chain. This also seems to be a long-standing bug that has been with us since the the beginning of the localization of rebase -i's commit messages. - The test case now starts with test_when_finished "test_might_fail git rebase --abort" to be allow for failing more gently. - Fixed grammar of 2/3 (now 3/4): thanks, Eric! - Fixed the description of the new test case (it purported to test --continue, but it really tests --skip). Johannes Schindelin (4): rebase -i: demonstrate bugs with fixup!/squash! commit messages rebase -i: Handle "combination of commits" with GETTEXT_POISON sequencer: leave a tell-tale when a fixup/squash failed rebase --skip: clean up commit message after a failed fixup/squash sequencer.c| 93 -- t/t3418-rebase-continue.sh | 22 + 2 files changed, 92 insertions(+), 23 deletions(-) base-commit: fe0a9eaf31dd0c349ae4308498c33a5c3794b293 Published-As: https://github.com/dscho/git/releases/tag/clean-msg-after-fixup-continue-v2 Fetch-It-Via: git fetch https://github.com/dscho/git clean-msg-after-fixup-continue-v2 Interdiff vs v1: diff --git a/sequencer.c b/sequencer.c index f067b7b24c5..881503a6463 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1350,19 +1350,18 @@ static int update_squash_messages(enum todo_command command, eol = strchrnul(buf.buf, '\n'); if (buf.buf[0] != comment_line_char || (p += strcspn(p, "0123456789\n")) == eol) - return error(_("unexpected 1st line of squash message:" - "\n\n\t%.*s"), - (int)(eol - buf.buf), buf.buf); - count = strtol(p, NULL, 10); - - if (count < 1) - return error(_("invalid 1st line of squash message:\n" - "\n\t%.*s"), - (int)(eol - buf.buf), buf.buf); + count = -1; + else + count = strtol(p, NULL, 10); strbuf_addf(, "%c ", comment_line_char); - strbuf_addf(, - _("This is a combination of %d commits."), ++count); + if (count < 1) + strbuf_addf(, _("This is a combination of " + "several commits.")); + else + strbuf_addf(, + _("This is a combination of %d commits."), + ++count); strbuf_splice(, 0, eol - buf.buf, header.buf, header.len); strbuf_release(); } else { @@ -1405,13 +1404,22 @@ static int update_squash_messages(enum todo_command command, if (command == TODO_SQUASH) { unlink(rebase_path_fixup_msg()); strbuf_addf(, "\n%c ", comment_line_char); - strbuf_addf(, _("This is the commit message #%d:"), count); + if (count < 2) + strbuf_addf(, _("This is the next commit " + "message:")); + else + strbuf_addf(, _("This is the commit message #%d:"), + count); strbuf_addstr(, "\n\n"); strbuf_addstr(, body); } else if (command == TODO_FIXUP) { strbuf_addf(, "\n%c ", comment_line_char); - strbuf_addf(, _("The commit message #%d will be skipped:"), - count); + if (count < 2) + strbuf_addf(, _("The next commit message will be " +
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
Derrick Stoleewrites: > On 4/7/2018 2:40 PM, Jakub Narebski wrote: >> Derrick Stolee writes: >> >> [...] >>> On the Linux repository, performance tests were run for the following >>> command: >>> >>> git log --graph --oneline -1000 >>> >>> Before: 0.92s >>> After: 0.66s >>> Rel %: -28.3% >>> >>> Adding '-- kernel/' to the command requires loading the root tree >>> for every commit that is walked. There was no measureable performance >>> change as a result of this patch. >> >> In the "Git Merge contributor summit notes" [1] one can read that: >> >>> - VSTS adds bloom filters to know which paths have changed on the commit >>> - tree-same check in the bloom filter is fast; speeds up file history checks >>> - if the file history is _very_ sparse, then bloom filter is useful >> >> Could this method speed up also the second case mentioned here? Can >> anyone explain how this "path-changed bloom filter" works in VSTS? >> > > The idea is simple: for every commit, store a Bloom filter containing > the list of paths that are not TREESAME against the first parent. (A > slight detail: have a max cap on the number of paths, and store simply > "TOO_BIG" for commits with too many diffs.) Isn't Bloom filter with all bits set to 1 equivalent of "too big"? It would answer each query with "possibly in set, check it". > When performing 'git log -- path' queries, the most important detail > for considering how to advance the walk is whether the commit is > TREESAME to its first parent. For a deep path in a large repo, this is > almost always true. When a Bloom filter says "TREESAME" (i.e. "this > path is not in my set") it is always correct, so we can set the > treesame bit and continue without walking any trees. When a Bloom > filter says "MAYBE NOT TREESAME" (i.e. "this path is probably in my > set") you only need to do the same work as before: walk the trees to > compare against your first parent. > > If a Bloom filter has a false-positive rate of X%, then you can > possibly drop your number of tree comparisons by (100-X)%. This is > very important for large repos where some paths were changed only ten > times or so, the full graph needs to be walked and it is helpful to > avoid parsing too many trees. So this works only for exact file pathnames, or does checking for subdirectory also work? I guess it cannot work for globbing patterns (like "git log -- '*.c'"). I guess that it speeds up "git log -- ", but also "git blame ". Sidenote: I have stumbled upon https://github.com/efficient/ffbf project (fast-forward Bllom filters - for exact matching of large number of patterns), where they invent cache-efficient version of Bloom filter algorithm. The paper that the project is based on also might be interesting, if only because it points to other improvements of classic Bloom filters. >> Could we add something like this to the commit-graph file? > > I'm not sure if it is necessary for client-side operations, but it is > one of the reasons the commit-graph file has the idea of an "optional > chunk". It could be added to the file format (without changing version > numbers) and be ignored by clients that don't understand it. I could > also be gated by a config setting for computing them. My guess is that > only server-side operations will need the added response time, and can > bear the cost of computing them when writing the commit-graph > file. Clients are less likely to be patient waiting for a lot of diff > calculations. So the problem is performing large amount of diff calculations. I wonder if it would be possible to add Bloom filters incrementally, when for some reason we need to calculate diff anyway. > > If we add commit-graph file downloads to the protocol, then the server > could do this computation and send the data to all clients. But that > would be "secondary" information that maybe clients want to verify, > which is as difficult as computing it themselves. Can you tell us how much work (how much time) must spend the server to calculate those per-commit "files changed" Bloom fillters? Best, -- Jakub Narębski
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On Mon, Apr 9, 2018 at 6:15 AM, Derrick Stoleewrote: > I don't understand how folding the patches makes the correctness clearer, > since the rename (1/4) is checked by the compiler and the Coccinelle script > (3/4) only works after that rename is complete. > > The only thing I can imagine is that it makes smaller patch emails, since > there is only one large patch instead of two. In this case, I prefer to make > changes that are easier to check by automation (compiler and coccinelle). > I prefer the offloading to automation, too. So I would vouch for keeping it as-is.
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On 4/8/2018 7:18 PM, Junio C Hamano wrote: Jeff Kingwrites: If I were doing it myself, I probably would have folded patches 1 and 3 together. They are touching all the same spots, and it would be an error for any case converted in patch 1 to not get converted in patch 3. I'm assuming you caught them all due to Coccinelle, though IMHO it is somewhat overkill here. By folding them together the compiler could tell you which spots you missed. Yeah, that approach would probably be a more sensible way to assure the safety/correctness of the result to readers better. I don't understand how folding the patches makes the correctness clearer, since the rename (1/4) is checked by the compiler and the Coccinelle script (3/4) only works after that rename is complete. The only thing I can imagine is that it makes smaller patch emails, since there is only one large patch instead of two. In this case, I prefer to make changes that are easier to check by automation (compiler and coccinelle). However, I will defer to the experts in this regard. If a v3 is necessary, then I will fold the commits together. Thanks, -Stolee
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
Jeff Kingwrites: > If I were doing it myself, I probably would have folded patches 1 and 3 > together. They are touching all the same spots, and it would be an error > for any case converted in patch 1 to not get converted in patch 3. I'm > assuming you caught them all due to Coccinelle, though IMHO it is > somewhat overkill here. By folding them together the compiler could tell > you which spots you missed. Yeah, that approach would probably be a more sensible way to assure the safety/correctness of the result to readers better. > > And going forward, I doubt it is going to be a common error for people > to use maybe_tree directly. Between the name and the warning comment, > you'd have to really try to shoot yourself in the foot with it. The > primary concern was catching people using the existing "tree" name, > whose semantics changed. Yup.
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On 4/7/2018 2:40 PM, Jakub Narebski wrote: Derrick Stoleewrites: [...] On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. In the "Git Merge contributor summit notes" [1] one can read that: - VSTS adds bloom filters to know which paths have changed on the commit - tree-same check in the bloom filter is fast; speeds up file history checks - if the file history is _very_ sparse, then bloom filter is useful Could this method speed up also the second case mentioned here? Can anyone explain how this "path-changed bloom filter" works in VSTS? The idea is simple: for every commit, store a Bloom filter containing the list of paths that are not TREESAME against the first parent. (A slight detail: have a max cap on the number of paths, and store simply "TOO_BIG" for commits with too many diffs.) When performing 'git log -- path' queries, the most important detail for considering how to advance the walk is whether the commit is TREESAME to its first parent. For a deep path in a large repo, this is almost always true. When a Bloom filter says "TREESAME" (i.e. "this path is not in my set") it is always correct, so we can set the treesame bit and continue without walking any trees. When a Bloom filter says "MAYBE NOT TREESAME" (i.e. "this path is probably in my set") you only need to do the same work as before: walk the trees to compare against your first parent. If a Bloom filter has a false-positive rate of X%, then you can possibly drop your number of tree comparisons by (100-X)%. This is very important for large repos where some paths were changed only ten times or so, the full graph needs to be walked and it is helpful to avoid parsing too many trees. Could we add something like this to the commit-graph file? I'm not sure if it is necessary for client-side operations, but it is one of the reasons the commit-graph file has the idea of an "optional chunk". It could be added to the file format (without changing version numbers) and be ignored by clients that don't understand it. I could also be gated by a config setting for computing them. My guess is that only server-side operations will need the added response time, and can bear the cost of computing them when writing the commit-graph file. Clients are less likely to be patient waiting for a lot of diff calculations. If we add commit-graph file downloads to the protocol, then the server could do this computation and send the data to all clients. But that would be "secondary" information that maybe clients want to verify, which is as difficult as computing it themselves. Thanks, -Stolee
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
Derrick Stoleewrites: [...] > On the Linux repository, performance tests were run for the following > command: > > git log --graph --oneline -1000 > > Before: 0.92s > After: 0.66s > Rel %: -28.3% > > Adding '-- kernel/' to the command requires loading the root tree > for every commit that is walked. There was no measureable performance > change as a result of this patch. In the "Git Merge contributor summit notes" [1] one can read that: > - VSTS adds bloom filters to know which paths have changed on the commit > - tree-same check in the bloom filter is fast; speeds up file history checks > - if the file history is _very_ sparse, then bloom filter is useful Could this method speed up also the second case mentioned here? Can anyone explain how this "path-changed bloom filter" works in VSTS? Could we add something like this to the commit-graph file? [1]: https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/ Best regards, -- Jakub Narębski
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On Fri, Apr 6, 2018 at 12:21 PM, Jeff Kingwrote: > On Fri, Apr 06, 2018 at 07:09:30PM +, Derrick Stolee wrote: > >> Derrick Stolee (4): >> treewide: rename tree to maybe_tree >> commit: create get_commit_tree() method >> treewide: replace maybe_tree with accessor methods >> commit-graph: lazy-load trees for commits > > I gave this only a cursory read, but it addresses my concern from the > previous round. Same here. Thanks, Stefan
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On 4/6/2018 3:21 PM, Jeff King wrote: On Fri, Apr 06, 2018 at 07:09:30PM +, Derrick Stolee wrote: Derrick Stolee (4): treewide: rename tree to maybe_tree commit: create get_commit_tree() method treewide: replace maybe_tree with accessor methods commit-graph: lazy-load trees for commits I gave this only a cursory read, but it addresses my concern from the previous round. If I were doing it myself, I probably would have folded patches 1 and 3 together. They are touching all the same spots, and it would be an error for any case converted in patch 1 to not get converted in patch 3. I'm assuming you caught them all due to Coccinelle, though IMHO it is somewhat overkill here. By folding them together the compiler could tell you which spots you missed. And going forward, I doubt it is going to be a common error for people to use maybe_tree directly. Between the name and the warning comment, you'd have to really try to shoot yourself in the foot with it. The primary concern was catching people using the existing "tree" name, whose semantics changed. All that said, I'm fine with having it done this way, too. Thanks. As a double-check that I caught all of the 'maybe_tree' accesses, I ran the following: $ git grep maybe_tree | grep -v get_commit_tree commit-graph.c: item->maybe_tree = NULL; commit-graph.c: c->maybe_tree = lookup_tree(); commit-graph.c: return c->maybe_tree; commit-graph.c: if (c->maybe_tree) commit-graph.c: return c->maybe_tree; commit.c: if (commit->maybe_tree || !commit->object.parsed) commit.c: return commit->maybe_tree; commit.c: item->maybe_tree = lookup_tree(); commit.h: struct tree *maybe_tree; contrib/coccinelle/commit.cocci:- >maybe_tree->object.oid contrib/coccinelle/commit.cocci:- c->maybe_tree->object.oid.hash contrib/coccinelle/commit.cocci:- c->maybe_tree contrib/coccinelle/commit.cocci:+ c->maybe_tree = s contrib/coccinelle/commit.cocci:+ return c->maybe_tree; merge-recursive.c: commit->maybe_tree = tree; Thanks, -Stolee
Re: [PATCH v2 0/4] Lazy-load trees when reading commit-graph
On Fri, Apr 06, 2018 at 07:09:30PM +, Derrick Stolee wrote: > Derrick Stolee (4): > treewide: rename tree to maybe_tree > commit: create get_commit_tree() method > treewide: replace maybe_tree with accessor methods > commit-graph: lazy-load trees for commits I gave this only a cursory read, but it addresses my concern from the previous round. If I were doing it myself, I probably would have folded patches 1 and 3 together. They are touching all the same spots, and it would be an error for any case converted in patch 1 to not get converted in patch 3. I'm assuming you caught them all due to Coccinelle, though IMHO it is somewhat overkill here. By folding them together the compiler could tell you which spots you missed. And going forward, I doubt it is going to be a common error for people to use maybe_tree directly. Between the name and the warning comment, you'd have to really try to shoot yourself in the foot with it. The primary concern was catching people using the existing "tree" name, whose semantics changed. All that said, I'm fine with having it done this way, too. -Peff
[PATCH v2 0/4] Lazy-load trees when reading commit-graph
There are several commit-graph walks that require loading many commits but never walk the trees reachable from those commits. However, the current logic in parse_commit() requires the root tree to be loaded. This only uses lookup_tree(), but when reading commits from the commit- graph file, the hashcpy() to load the root tree hash and the time spent checking the object cache take more time than parsing the rest of the commit. In this patch series, all direct references to accessing the 'tree' member of struct commit are replaced instead by one of the following methods: struct tree *get_commit_tree(struct commit *) struct object_id *get_commit_tree_oid(struct commit *) This replacement was assisted by a Coccinelle script, but the 'tree' member is overloaded in other types, so we first rename the 'tree' member to 'maybe_tree' and use the compiler to ensure we caught all examples. Then, contrib/coccinelle/commit.cocci generates the patch to replace all accessors of 'maybe_tree' to the methods above. After all access is restricted to use these methods, we can then change the postcondition of parse_commit_in_graph() to allow 'maybe_tree' to be NULL. If the tree is accessed later, we can load the tree's OID from the commit-graph in constant time and perform the lookup_tree(). On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. This patch series depends on v7 of ds/commit-graph. Derrick Stolee (4): treewide: rename tree to maybe_tree commit: create get_commit_tree() method treewide: replace maybe_tree with accessor methods commit-graph: lazy-load trees for commits blame.c | 18 +- builtin/checkout.c | 18 -- builtin/diff.c | 2 +- builtin/fast-export.c | 6 +++--- builtin/log.c | 4 ++-- builtin/reflog.c| 2 +- commit-graph.c | 27 +++ commit-graph.h | 7 +++ commit.c| 18 +- commit.h| 5 - contrib/coccinelle/commit.cocci | 30 ++ fsck.c | 8 +--- http-push.c | 2 +- line-log.c | 4 ++-- list-objects.c | 10 +- log-tree.c | 6 +++--- merge-recursive.c | 5 +++-- notes-merge.c | 9 + packfile.c | 2 +- pretty.c| 5 +++-- ref-filter.c| 2 +- revision.c | 8 sequencer.c | 12 ++-- sha1_name.c | 2 +- tree.c | 4 ++-- walker.c| 2 +- 26 files changed, 152 insertions(+), 66 deletions(-) create mode 100644 contrib/coccinelle/commit.cocci -- 2.17.0
[PATCH v2 0/4] Colorize push errors
To help users discern large chunks of white text (when the push succeeds) from large chunks of white text (when the push fails), let's add some color to the latter. The original contribution came in via Pull Request from the Git for Windows project: https://github.com/git-for-windows/git/pull/1429 Changes since v1: - implemented want_color_fd() as suggested by Junio - added color.{advice,push,transport} to be able to force this thing on - added a test - added documentation to `git config`'s man page - fixed a bug where the push config looked at color.advice. - fixed a bug where `color.advise.` was not parsed because git_default_config() would fail to hand `color.advise.*` settings to git_default_advice_config() - wrapped a couple of too-long lines - changed the strstr() hack to detect push failures to use push_had_errors() instead - renamed the transport.color.* settings to color.transport.* (to make them consistent with all the other color.. settings) Johannes Schindelin (3): color: introduce support for colorizing stderr Add a test to verify that push errors are colorful Document the new color.* settings to colorize push errors/hints Ryan Dammrose (1): push: colorize errors Documentation/config.txt | 28 advice.c | 49 ++-- builtin/push.c | 44 - color.c| 20 +++- color.h| 4 ++- config.c | 2 +- t/t5541-http-push-smart.sh | 18 ++ transport.c| 67 +- 8 files changed, 217 insertions(+), 15 deletions(-) base-commit: 468165c1d8a442994a825f3684528361727cd8c0 Published-As: https://github.com/dscho/git/releases/tag/colorize-push-errors-v2 Fetch-It-Via: git fetch https://github.com/dscho/git colorize-push-errors-v2 Interdiff vs v1: diff --git a/Documentation/config.txt b/Documentation/config.txt index 4e0cff87f62..40e3828b85f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1088,6 +1088,16 @@ clean.requireForce:: A boolean to make git-clean do nothing unless given -f, -i or -n. Defaults to true. +color.advice:: + A boolean to enable/disable color in hints (e.g. when a push + failed, see `advice.*` for a list). May be set to `always`, + `false` (or `never`) or `auto` (or `true`), in which case colors + are used only when the error output goes to a terminal. If + unset, then the value of `color.ui` is used (`auto` by default). + +color.advice.advice:: + Use customized color for hints. + color.branch:: A boolean to enable/disable color in the output of linkgit:git-branch[1]. May be set to `always`, @@ -1190,6 +1200,15 @@ color.pager:: A boolean to enable/disable colored output when the pager is in use (default is true). +color.push:: + A boolean to enable/disable color in push errors. May be set to + `always`, `false` (or `never`) or `auto` (or `true`), in which + case colors are used only when the error output goes to a terminal. + If unset, then the value of `color.ui` is used (`auto` by default). + +color.push.error:: + Use customized color for push errors. + color.showBranch:: A boolean to enable/disable color in the output of linkgit:git-show-branch[1]. May be set to `always`, @@ -1218,6 +1237,15 @@ color.status.:: status short-format), or `unmerged` (files which have unmerged changes). +color.transport:: + A boolean to enable/disable color when pushes are rejected. May be + set to `always`, `false` (or `never`) or `auto` (or `true`), in which + case colors are used only when the error output goes to a terminal. + If unset, then the value of `color.ui` is used (`auto` by default). + +color.transport.rejected:: + Use customized color when a push was rejected. + color.ui:: This variable determines the default value for variables such as `color.diff` and `color.grep` that control the use of color diff --git a/advice.c b/advice.c index 42460877ef6..2121c1811fd 100644 --- a/advice.c +++ b/advice.c @@ -43,7 +43,7 @@ static int parse_advise_color_slot(const char *slot) static const char *advise_get_color(enum color_advice ix) { - if (want_color(advice_use_color)) + if (want_color_stderr(advice_use_color)) return advice_colors[ix]; return ""; } @@ -87,8 +87,10 @@ void advise(const char *advice, ...) for (cp = buf.buf; *cp; cp = np) { np = strchrnul(cp, '\n'); - fprintf(stderr, _("%shint: %.*s%s\n"), advise_get_color(ADVICE_COLOR_HINT), - (int)(np - cp), cp, advise_get_color(ADVICE_COLOR_RESET)); + fprintf(stderr, _("%shint: %.*s%s\n"), +
Re: [PATCH v2 0/4] Teach `--default` to `git-config(1)`
On Fri, Mar 23, 2018 at 08:55:52PM -0400, Taylor Blau wrote: > Attached is 'v2' of my patch series to add a `--default` option to `git > config`. Thanks, this is looking much better. I did come up with a few suggestions/fixes. Some minor which would make v3 easy, and some that would require expanding the series quite a bit. I'm not sure if those bigger ones are worth doing or not. :) > Thank you again for all of your feedback, and my apologies that it has taken > me > so long to respond. I was out of the office last week, and have been quite > busy > since catching up on Git LFS-related issues. No problem. Open source moves at the pace of patch submissions. -Peff
[PATCH v2 0/4] Teach `--default` to `git-config(1)`
Hi, Attached is 'v2' of my patch series to add a `--default` option to `git config`. I have addressed the following concerns since the first iteration: 1. Better motivation in 'builtin/config: introduce `--default`'. (cc: Peff, Eric) 2. Fixed a typo in the message of 'builtin/config: introduce `--default`'. (cc: Eric). 3. Changed the first mention of type specifiers in `git config`'s documentation from a paragraph to a brief list (cc: Peff, Junio). 4. Changed the signatures of some new functions, particularly in 'config.c: introduce 'git_config_color' to parse ANSI colors'. I have thus-far avoided mentioning any specific deprecation of `--get-color` and `--get-colorbool`, but would not be opposed to changing that in this series before queuing. Thank you again for all of your feedback, and my apologies that it has taken me so long to respond. I was out of the office last week, and have been quite busy since catching up on Git LFS-related issues. Thanks, Taylor Taylor Blau (4): builtin/config: introduce `--default` Documentation: list all type specifiers in config prose config.c: introduce 'git_config_color' to parse ANSI colors builtin/config: introduce `--color` type specifier Documentation/git-config.txt | 38 +++--- builtin/config.c | 30 config.c | 10 +++ config.h | 1 + t/t1300-repo-config.sh | 24 +++ t/t1310-config-default.sh| 131 +++ 6 files changed, 226 insertions(+), 8 deletions(-) create mode 100755 t/t1310-config-default.sh -- 2.16.2.440.gc6284da4f
[PATCH v2 0/4] nd/parseopt-completion fixups
v2 fixes the comments from v1 and adds to new patches to improve _git_notes() (sorry I couldn't resist) Interdiff with what's on 'pu' diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 5f7495cda3..2e30950299 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1815,7 +1815,7 @@ _git_name_rev () _git_notes () { - local subcommands='add append copy edit list prune remove show' + local subcommands='add append copy edit get-ref list merge prune remove show' local subcommand="$(__git_find_on_cmdline "$subcommands")" case "$subcommand,$cur" in @@ -1832,18 +1832,15 @@ _git_notes () ;; esac ;; - add,--reuse-message=*|append,--reuse-message=*|\ - add,--reedit-message=*|append,--reedit-message=*) + *,--reuse-message=*|*,--reedit-message=*) __git_complete_refs --cur="${cur#*=}" ;; - prune,--*) - __gitcomp_builtin notes_prune - ;; - prune,*) - ;; *,--*) __gitcomp_builtin notes_$subcommand ;; + prune,*|get-ref,*) + # this command does not take a ref, do not complete it + ;; *) case "$prev" in -m|-F) @@ -1956,6 +1953,7 @@ _git_rebase () --autostash --no-autostash --verify --no-verify --keep-empty --root --force-rebase --no-ff + --rerere-autoupdate --exec " Nguyễn Thái Ngọc Duy (4): completion: don't set PARSE_OPT_NOCOMPLETE on --rerere-autoupdate completion: simplify _git_notes completion: complete --{reuse,reedit}-message= for all notes subcmds completion: more subcommands in _git_notes() contrib/completion/git-completion.bash | 25 - parse-options.h| 4 ++-- rerere.h | 3 +-- 3 files changed, 11 insertions(+), 21 deletions(-) -- 2.16.2.785.g429c04a1b9
Re: [PATCH v2 0/4] Delete ignore_env member in struct repository
On Tue, Feb 27, 2018 at 1:58 AM, Nguyễn Thái Ngọc Duywrote: > v2 fixes the incorrect use of consecutive getenv() and adds a comment > to clarify the role of old_gitdir > > Interdiff: > > diff --git a/environment.c b/environment.c > index 95de419de8..47c6e31559 100644 > --- a/environment.c > +++ b/environment.c > @@ -14,6 +14,7 @@ > #include "fmt-merge-msg.h" > #include "commit.h" > #include "object-store.h" > +#include "argv-array.h" > > int trust_executable_bit = 1; > int trust_ctime = 1; > @@ -148,18 +149,34 @@ static char *expand_namespace(const char *raw_namespace) > return strbuf_detach(, NULL); > } > > +/* Wrapper of getenv() that returns a strdup value. This value is kept > + * in argv to be freed later. > + */ /* comment style, see also Erics reply */ I do not understand why we need to use an argv_array here? I see that the push calls xstrdup and then puts it into the array. So to me this looks like a clever way that we can easily free them all at once using the predefined argv_array_clear() after calling repo_set_gitdir. That makes sense, though is confusing at first; I would have expected a set_gitdir_args_clear() function that just frees all its strings, whether they are NULL or not. But given that we are clever with not pushing onto the argv_array in case the getenv returns NULL, this seems to be less work in the usual case of no env variables set. Ok, I think I like it. > +static const char *getenv_safe(struct argv_array *argv, const char *name) I would have expected that we already have such a (or similar) helper in wrapper.c, but it turns out we always take care of getenv manually, and we do not have a wrapper yet. So only the comment style remains as a nit. Thanks, Stefan
Re: [PATCH v2 0/4] Delete ignore_env member in struct repository
On Tue, Feb 27, 2018 at 4:58 AM, Nguyễn Thái Ngọc Duywrote: > v2 fixes the incorrect use of consecutive getenv() and adds a comment > to clarify the role of old_gitdir > > diff --git a/environment.c b/environment.c > @@ -148,18 +149,34 @@ static char *expand_namespace(const char *raw_namespace) > +/* Wrapper of getenv() that returns a strdup value. This value is kept > + * in argv to be freed later. > + */ /* * Comment style. */ > +static const char *getenv_safe(struct argv_array *argv, const char *name) > +{ > + const char *value = getenv(name); > + > + if (!value) > + return NULL; > + > + argv_array_push(argv, value); > + return argv->argv[argv->argc - 1]; > +} > + > void setup_git_env(const char *git_dir) > { > const char *shallow_file; > const char *replace_ref_base; > struct set_gitdir_args args = { NULL }; > + struct argv_array to_free = ARGV_ARRAY_INIT; Cute. Another example[1] showing that renaming argv_array to something else might be a good idea. [1]: https://public-inbox.org/git/20180227062128.gg65...@aiede.svl.corp.google.com/ > - args.shared_root = getenv(GIT_COMMON_DIR_ENVIRONMENT); > - args.object_dir = getenv(DB_ENVIRONMENT); > - args.graft_file = getenv(GRAFT_ENVIRONMENT); > - args.index_file = getenv(INDEX_ENVIRONMENT); > - args.alternate_db = getenv(ALTERNATE_DB_ENVIRONMENT); > + args.shared_root = getenv_safe(_free, GIT_COMMON_DIR_ENVIRONMENT); > + args.object_dir = getenv_safe(_free, DB_ENVIRONMENT); > + args.graft_file = getenv_safe(_free, GRAFT_ENVIRONMENT); > + args.index_file = getenv_safe(_free, INDEX_ENVIRONMENT); > + args.alternate_db = getenv_safe(_free, ALTERNATE_DB_ENVIRONMENT); > repo_set_gitdir(the_repository, git_dir, ); > + argv_array_clear(_free); > diff --git a/repository.c b/repository.c > @@ -48,6 +48,11 @@ void repo_set_gitdir(struct repository *repo, > const char *gitfile = read_gitfile(root); > + /* > +* repo->gitdir is saved because the caller could pass "root" > +* that also points to repo->gitdir. We want to keep it alive > +* until after xstrdup(root). Then we can free it. > +*/ > char *old_gitdir = repo->gitdir; Good. This comment makes the reasoning clear. > repo->gitdir = xstrdup(gitfile ? gitfile : root);
[PATCH v2 0/4] Delete ignore_env member in struct repository
v2 fixes the incorrect use of consecutive getenv() and adds a comment to clarify the role of old_gitdir Interdiff: diff --git a/environment.c b/environment.c index 95de419de8..47c6e31559 100644 --- a/environment.c +++ b/environment.c @@ -14,6 +14,7 @@ #include "fmt-merge-msg.h" #include "commit.h" #include "object-store.h" +#include "argv-array.h" int trust_executable_bit = 1; int trust_ctime = 1; @@ -148,18 +149,34 @@ static char *expand_namespace(const char *raw_namespace) return strbuf_detach(, NULL); } +/* Wrapper of getenv() that returns a strdup value. This value is kept + * in argv to be freed later. + */ +static const char *getenv_safe(struct argv_array *argv, const char *name) +{ + const char *value = getenv(name); + + if (!value) + return NULL; + + argv_array_push(argv, value); + return argv->argv[argv->argc - 1]; +} + void setup_git_env(const char *git_dir) { const char *shallow_file; const char *replace_ref_base; struct set_gitdir_args args = { NULL }; + struct argv_array to_free = ARGV_ARRAY_INIT; - args.shared_root = getenv(GIT_COMMON_DIR_ENVIRONMENT); - args.object_dir = getenv(DB_ENVIRONMENT); - args.graft_file = getenv(GRAFT_ENVIRONMENT); - args.index_file = getenv(INDEX_ENVIRONMENT); - args.alternate_db = getenv(ALTERNATE_DB_ENVIRONMENT); + args.shared_root = getenv_safe(_free, GIT_COMMON_DIR_ENVIRONMENT); + args.object_dir = getenv_safe(_free, DB_ENVIRONMENT); + args.graft_file = getenv_safe(_free, GRAFT_ENVIRONMENT); + args.index_file = getenv_safe(_free, INDEX_ENVIRONMENT); + args.alternate_db = getenv_safe(_free, ALTERNATE_DB_ENVIRONMENT); repo_set_gitdir(the_repository, git_dir, ); + argv_array_clear(_free); if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT)) check_replace_refs = 0; diff --git a/repository.c b/repository.c index 8f6386022f..c555dacad2 100644 --- a/repository.c +++ b/repository.c @@ -48,6 +48,11 @@ void repo_set_gitdir(struct repository *repo, const struct set_gitdir_args *o) { const char *gitfile = read_gitfile(root); + /* +* repo->gitdir is saved because the caller could pass "root" +* that also points to repo->gitdir. We want to keep it alive +* until after xstrdup(root). Then we can free it. +*/ char *old_gitdir = repo->gitdir; repo->gitdir = xstrdup(gitfile ? gitfile : root); Nguyễn Thái Ngọc Duy (4): repository.c: move env-related setup code back to environment.c repository.c: delete dead functions sha1_file.c: move delayed getenv(altdb) back to setup_git_env() repository: delete ignore_env member cache.h| 2 +- environment.c | 30 -- object-store.h | 5 ++- object.c | 1 + repository.c | 84 -- repository.h | 21 +++-- setup.c| 3 +- sha1_file.c| 6 +--- 8 files changed, 86 insertions(+), 66 deletions(-) -- 2.16.1.435.g8f24da2e1a
[PATCH v2 0/4] making test-suite tracing more useful
This series fixes some rough edges in the "-x" feature of the test suite. With it, it should be possible to turn on tracing for CI runs. This is mostly a repost of v1 at: https://public-inbox.org/git/20171019210140.64lb52cqtgdh2...@sigill.intra.peff.net which had some discussion, but wasn't picked up. I fixed a few typos pointed out by reviewers, and I tried to summarize the discussion around "magic descriptor 4" in the commit message of patch 2. My conclusion there was that none of the options is particularly good. The only thing thing that might make sense is making "7" the magical descriptor. But given that "4" only had one troubling call, I'm inclined to just leave it as-is and see if this ever comes up again (especially if we start using "-x" in the Travis builds, then we'd catch any problems). [1/4]: test-lib: silence "-x" cleanup under bash [2/4]: t5615: avoid re-using descriptor 4 [3/4]: test-lib: make "-x" work with "--verbose-log" [4/4]: t/Makefile: introduce TEST_SHELL_PATH Makefile | 8 t/Makefile | 6 -- t/t5615-alternate-env.sh | 6 +++--- t/test-lib.sh| 46 +- 4 files changed, 48 insertions(+), 18 deletions(-) -Peff
[PATCH v2 0/4] Fix issues with rename detection limits
This patchset fixes some issues around rename detection limits. Changes since the original submission[1]: * Patches 2 and 3 are swapped in order so as to not temporarily introduce any bugs (even if only cosmetic) * Commit message fixups * Using uint64_t instead of double for holding multiplication of integers * Corrected format specifier for printing 64-bit ints. [1] https://public-inbox.org/git/20171110173956.25105-1-new...@gmail.com/ Elijah Newren (4): sequencer: warn when internal merge may be suboptimal due to renameLimit progress: fix progress meters when dealing with lots of work Remove silent clamp of renameLimit sequencer: show rename progress during cherry picks diff.c| 2 +- diffcore-rename.c | 15 ++- progress.c| 22 +++--- progress.h| 8 sequencer.c | 2 ++ 5 files changed, 24 insertions(+), 25 deletions(-) -- 2.15.0.44.gf995e411c7
[PATCH v2 0/4] bisect: assorted leak-fixes
This fixes a couple of leaks around `find_bisection(). The first patch is new since v1 [1] to make the calling-convention of `find_bisection()` less prone to misuse. Patch 2 is similar to v1, the only difference is that the "while at it" has moved into patch 1. Patches 3 and 4 are identical to v1. Martin [1] https://public-inbox.org/git/4795556016c25e4a78241362547c5468877f808d.1509557518.git.martin.ag...@gmail.com/ Martin Ågren (4): bisect: change calling-convention of `find_bisection()` bisect: fix memory leak in `find_bisection()` bisect: fix off-by-one error in `best_bisection_sorted()` bisect: fix memory leak when returning best element bisect.h | 12 +--- bisect.c | 33 +++-- builtin/rev-list.c | 3 +-- 3 files changed, 29 insertions(+), 19 deletions(-) -- 2.15.0.415.gac1375d7e
[PATCH v2 0/4] convert diff flags to be stored in a struct
Changes in v2: * removed the diff_flags_cleared singleton * eliminated the 'touched' parallel flags * pass structs by reference instead of by value Now that the 'touched' flags have been removed it may be valuable to go ahead and remove the macros all together (including making the flags lower case). If that's what we want to do, I can go ahead and send those patches out as a follow on to these. Brandon Williams (4): add, reset: use DIFF_OPT_SET macro to set a diff flag diff: convert flags to be stored in bitfields diff: add flag to indicate textconv was set via cmdline diff: remove touched flags builtin/add.c| 2 +- builtin/commit.c | 7 +++-- builtin/log.c| 3 +- builtin/reset.c | 2 +- diff-lib.c | 7 +++-- diff.c | 10 +++--- diff.h | 96 +--- sequencer.c | 5 +-- 8 files changed, 77 insertions(+), 55 deletions(-) -- 2.15.0.403.gc27cc4dac6-goog
Re: [PATCH v2 0/4] Hash Abstraction
On Sat, Oct 28, 2017 at 11:12 AM, brian m. carlsonwrote: > This is a series proposing a basic abstraction for hash functions. > > As we get closer to converting the remainder of the codebase to use > struct object_id, we should think about the design we want our hash > function abstraction to take. This series is a proposal for one idea. > Input on any aspect of this proposal is welcome. I like the series/idea. > This series exposes a struct git_hash_algo that contains basic > information about a given hash algorithm that distinguishes it from > other algorithms: name, identifiers, lengths, implementing functions, > and empty tree and blob constants. It also exposes an array of hash > algorithms, and a constant for indexing them. > > The series also demonstrates a simple conversion using the abstraction > over empty blob and tree values. That looks good, though I wonder if we'll have to give a way a little performance during the transition period (or even indefinitely, as the hash abstraction won't go away; we'd rather want to keep it in place long term). I guess we can measure after all is said and done, no big deal. > > In order to avoid conflicting with the struct repository work and with > the goal of avoiding global variables as much as possible, I've pushed > the hash algorithm into struct repository and exposed it via a #define. If you are referring to that long series[1] that Jonathan and I were working on, then no worries. Unfortunately that series got some delays, but I rebased its prepared successors (of over 100 patches) on Friday and it was surprisingly easy to do so; just tedious. [1] https://public-inbox.org/git/20170830064634.ga153...@aiede.mtv.corp.google.com/ > > I propose this series now as it will inform the way we go about > converting other parts of the codebase, especially some of the pack > algorithms. Thanks for doing this now. > Because we share some hash computation code between pack > checksums and object hashing, we need to decide whether to expose pack > checksums as struct object_id, even though they are technically not > object IDs. I think we used sha1 as checksums, because it was a hash function easily accessible, not because its other properties were the best to do its job. So I am undecided if we want to just "keep the same hash function for checksum as well as object hashing" or pickup another checksum, that actually *only* does checksumming (we don't need the cryptographic properties, such that an easy hash [such as crc, djb, fnv or murmur] would do; raw speed, barely protecting the pack file against bit flips). For the sake of symmetry I tend to go with the former, i.e use the object hash function for pack files, too. > Furthermore, if we end up needing to stuff an algorithm > value into struct object_id, we'll no longer be able to directly > reference object IDs in a pack without a copy. > > I've updated this series in some significant ways to reflect and better > implement the transition plan as it's developed. If there are ways > in which this series (or future series) can converge better on the > transition plan, that input would be valuable. > > This series is available from the usual places as branch hash-struct, > based against master as of 2.15-rc2. Thanks, Stefan
Re: [PATCH v2 0/4] fsmonitor fixes
Hi Alex, On Wed, 25 Oct 2017, Alex Vandiver wrote: > Updated based on comments from Dscho and Ben. Thanks for those! Thank you for this excellent improvement. Ciao, Dscho
[PATCH v2 0/4] Hash Abstraction
This is a series proposing a basic abstraction for hash functions. As we get closer to converting the remainder of the codebase to use struct object_id, we should think about the design we want our hash function abstraction to take. This series is a proposal for one idea. Input on any aspect of this proposal is welcome. This series exposes a struct git_hash_algo that contains basic information about a given hash algorithm that distinguishes it from other algorithms: name, identifiers, lengths, implementing functions, and empty tree and blob constants. It also exposes an array of hash algorithms, and a constant for indexing them. The series also demonstrates a simple conversion using the abstraction over empty blob and tree values. In order to avoid conflicting with the struct repository work and with the goal of avoiding global variables as much as possible, I've pushed the hash algorithm into struct repository and exposed it via a #define. I propose this series now as it will inform the way we go about converting other parts of the codebase, especially some of the pack algorithms. Because we share some hash computation code between pack checksums and object hashing, we need to decide whether to expose pack checksums as struct object_id, even though they are technically not object IDs. Furthermore, if we end up needing to stuff an algorithm value into struct object_id, we'll no longer be able to directly reference object IDs in a pack without a copy. I've updated this series in some significant ways to reflect and better implement the transition plan as it's developed. If there are ways in which this series (or future series) can converge better on the transition plan, that input would be valuable. This series is available from the usual places as branch hash-struct, based against master as of 2.15-rc2. Changes from v1: * Rebase onto 2.15-rc2. * Fix the uninitialized value that Peff pointed out. This fixes the error, but leaves the code in the same place, since I think it's where it should be. * Improve commit message to explain the meaning of current_hash WRT the transition plan. * Added an unknown hash algorithm constant and value to better implement the transition plan. * Explain in the commit message why hex size and binary size are both provided. * Add a format_id field to the struct, in coordination with the transition plan. * Improve comments for struct fields and constants. brian m. carlson (4): setup: expose enumerated repo info Add structure representing hash algorithm Integrate hash algorithm support with repo setup Switch empty tree and blob lookups to use hash abstraction builtin/am.c | 2 +- builtin/checkout.c | 2 +- builtin/diff.c | 2 +- builtin/pull.c | 2 +- cache.h| 67 ++ diff-lib.c | 2 +- merge-recursive.c | 2 +- notes-merge.c | 2 +- repository.c | 7 ++ repository.h | 5 sequencer.c| 6 ++--- setup.c| 49 ++- sha1_file.c| 43 +++ submodule.c| 2 +- 14 files changed, 157 insertions(+), 36 deletions(-) -- 2.15.0.rc2
[PATCH v2 0/4] fsmonitor fixes
Updated based on comments from Dscho and Ben. Thanks for those! - Alex
[PATCH v2 0/4] filter-branch: support for incremental update + fix for ancient tag format
This is the second version of my patches to add incremental support to git-filter-branch. Since the last time I have: * addressed the review feedback (see changelog embedded in final patch) * switched to using the (newly introduced) `--allow-missing-tagger` option to `git mktag` to allow early Linux kernel tags to be rewritten * split out some of the preparatory changes to make the final patch easier to read. I've force pushed to [1] where Travis seems happy and have set off the process of re-rewriting the devicetree tree from scratch (a multi-day affair) to validate (it's looking good). Ian. [0] https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/ [1] https://github.com/ijc/git/tree/git-filter-branch
Re: [PATCH v2 0/4] Some ThreadSanitizer-results
On 8/21/2017 1:43 PM, Martin Ågren wrote: This is the second version of my series to try to address some issues ... 2) hashmap_add, which I could try my hands on if Jeff doesn't beat me to it -- his proposed change should fix it and I doubt I could come up with anything "better", considering he knows the code. Now that I'm back from vacation, let me take another stab at this. Jeff
[GSoC][PATCH v2 0/4] submodule: Incremental rewrite of git-submodules
Changes in v2: * In the function get_submodule_displaypath(), I tried avoiding the extra memory allocation, but it rather invoked test 5 from t7406-submodule-update. The details of the test failiure can be seen in the build report as well. (Build #162) This failure occured as even though the function relative_path(), returned the value correctly, NULL was stored in sb.buf Hence currently the function is still using the old way of allocating extra memory and releasing the strbuf first, and return the extra allocated memory. * It was observed that strbuf_reset was being done just before the strbuf was being reused. It made more sense to reset them just after their use instead. Hence, these function calls of strbuf_reset() were moved accordingly. (The above change was limited to the function init_submodule() only, since already there was a deletion of one such funciton call, and IMO, it won't be suitable to carryout the operation for the complete file in the same commit.) * The function name for_each_submodule_list() was changed to for_each_submodule(). * Instead of passing the complete list to the function for_each_submodule(), only its pointer is passed to avoid the compiler from making copy of the list, and to make the code more efficient. * The function names of print_name_rev() and get_name_rev() were changed to get_rev_name() and compute_rev_name(). Now the function get_rev_name() acts as the front end of the subcommand and calls the function compute_rev_name() for generating and receiving the value of revision name. * The above change was also accompanied by the change in names of some variables used internally in the functions. As before you can find this series at: https://github.com/pratham-pc/git/commits/week-14-1 And the build report is available at: https://travis-ci.org/pratham-pc/git/builds/ Branch: week-14-1 Build #163 Prathamesh Chavan (4): submodule--helper: introduce get_submodule_displaypath() submodule--helper: introduce for_each_submodule() submodule: port set_name_rev() from shell to C submodule: port submodule subcommand 'status' from shell to C builtin/submodule--helper.c | 294 git-submodule.sh| 61 + 2 files changed, 273 insertions(+), 82 deletions(-) -- 2.13.0
[PATCH v2 0/4] Some ThreadSanitizer-results
This is the second version of my series to try to address some issues noted by ThreadSanitizer. Thanks to all who took the time to provide input on the first version. The largest change is in the third patch, which moves the "avoid writing to slopbuf"-logic into strbuf_setlen, and compiles it unconditionally. Patch 2 hasn't been changed. The others have seen some minor changes. The patch introducing GIT_THREAD_SANITIZER is gone. The end result as far as ThreadSanitizer is concerned is the same: 1) set_try_to_free_routine, where Peff outlined some possibilities, and where I don't feel like I have any idea which of them is better. 2) hashmap_add, which I could try my hands on if Jeff doesn't beat me to it -- his proposed change should fix it and I doubt I could come up with anything "better", considering he knows the code. Martin Martin Ågren (4): convert: always initialize attr_action in convert_attrs pack-objects: take lock before accessing `remaining` strbuf_setlen: don't write to strbuf_slopbuf ThreadSanitizer: add suppressions strbuf.h | 5 - builtin/pack-objects.c | 6 ++ color.c| 7 +++ convert.c | 5 +++-- transport-helper.c | 7 +++ .tsan-suppressions | 10 ++ 6 files changed, 37 insertions(+), 3 deletions(-) create mode 100644 .tsan-suppressions -- 2.14.1.151.gdfeca7a7e
[PATCH v2 0/4] Modernize read_graft_line implementation
Compared to v1: - the first patch is dropped to make it easier to merge - free_graft is now static function in commit.c I don't know, what are exact rules about adding Reviewed-by footer, so I didn't add any. Patryk Obara (4): sha1_file: fix hardcoded size in null_sha1 commit: replace the raw buffer with strbuf in read_graft_line commit: implement free_commit_graft commit: rewrite read_graft_line builtin/blame.c | 2 +- commit.c| 55 --- commit.h| 4 ++-- sha1_file.c | 2 +- shallow.c | 1 + 5 files changed, 37 insertions(+), 27 deletions(-) -- 2.9.5
[RFC PATCH v2 0/4] Partial clone: promised objects (not only blobs)
Thanks for all your comments on the earlier version. This is a substantially different version. In particular: - Now supports all types (tag, commit, tree) of objects, not only blobs - fsck works - Incorporates Ben Peart's code that uses a long-living process (lifetime of the Git invocation) to obtain objects - Implemented as a repository extension If anyone would like to comment on the overall direction of this approach, that would be great. In particular, I'm not too sure on the names of things. I know that I have some things to still work on (documentation and coding style, more improvements to and tests for fsck, maybe division of commits?) but I would like some early feedback on the bigger picture first. If you want to patch this in, this is built off my recent cleanup patch [1]. About inability to scale if we have the list of promised blobs: In this design, we will have a promised blob entry only if we have a concrete tree, so in a repository in which many trees are omitted, there will not be many promised blob entry objects. In fact, the minimal partial clone will be one in which there will be one promised commit for each ref (assuming no duplicates), no promised trees, and no promised blobs. (I'm not planning to implement such a clone, but someone else could do so.) About having multiple promise lists: I have retained the single list, but am still open to change. The support of all the object types might be sufficient mitigation for the issues that caused us to investigate multiple promise lists in the first place. [1] https://public-inbox.org/git/20170718222848.1453-1-jonathanta...@google.com/ Jonathan Tan (4): object: remove "used" field from struct object promised-object, fsck: introduce promised objects sha1-array: support appending unsigned char hash sha1_file: support promised object hook Documentation/config.txt | 8 + Documentation/gitrepository-layout.txt | 8 + Documentation/technical/read-object-protocol.txt | 102 +++ Documentation/technical/repository-version.txt | 6 + Makefile | 1 + builtin/cat-file.c | 9 + builtin/fsck.c | 42 ++- cache.h | 4 + environment.c| 1 + fsck.c | 6 +- object.c | 21 +- object.h | 21 +- promised-object.c| 324 +++ promised-object.h| 34 +++ setup.c | 7 +- sha1-array.c | 7 + sha1-array.h | 1 + sha1_file.c | 44 ++- t/t3907-promised-object.sh | 73 + t/t3907/read-object | 114 t/test-lib-functions.sh | 6 + 21 files changed, 808 insertions(+), 31 deletions(-) create mode 100644 Documentation/technical/read-object-protocol.txt create mode 100644 promised-object.c create mode 100644 promised-object.h create mode 100755 t/t3907-promised-object.sh create mode 100755 t/t3907/read-object -- 2.14.0.rc0.284.gd933b75aa4-goog
Re: [PATCH v2 0/4] reflog-walk fixes for maint
Thanks. All made sense to me after reading them twice.
[PATCH v2 0/4] reflog-walk fixes for maint
On Wed, Jul 05, 2017 at 03:55:08AM -0400, Jeff King wrote: > The first patch is my original small fix with an extra test. I think > that would be appropriate for 'maint'. Its behavior still has some > quirks, but it avoids the confusion that you experienced and has a low > risk of breaking anything else. > > The rest of it replaces the fake-parent thing with a more > straight-forward iteration over the reflogs (i.e., a cleanup of the > further patches I've been posting). After digging into it and especially > after writing the new tests, I think I've convinced myself that this is > the right way forward. Here's an updated version of the bug-fix patch, along with the fix for the problem that Eric noticed, and some other problems I noticed while fixing that one. So I've split these immediate fixes for maint off into their own series. These are based on maint itself, rather than Kyle's original commit that introduces the double-null reflog, since some of the bugs came later in the v2.13 cycle (if we really wanted to, we could split it again into two more series, but I don't think it's worth the trouble). [1/4]: reflog-walk: skip over double-null oid due to HEAD rename This is the fix for the pseudo-truncation in v2.13, and is the same as the previous round. [2/4]: reflog-walk: duplicate strings in complete_reflogs list This fixes Eric's bug, and is the same as what I showed earlier. It's a triggerable use-after-free, which is why I think it's important to get it into maint. [3/4]: reflog-walk: don't free reflogs added to cache This is another use-after-free, though it's slightly harder to trigger. [4/4]: reflog-walk: include all fields when freeing complete_reflogs This one is an optional cleanup, but worth doing, I think. reflog-walk.c | 33 + t/t1411-reflog-show.sh | 10 ++ t/t3200-branch.sh | 11 +++ 3 files changed, 42 insertions(+), 12 deletions(-) -Peff
[PATCH v2 0/4] Improvements to sha1_file
Peff suggested putting in a new field in struct object_info for the object contents; I tried it and it seems to work out quite well. Patch 1 is unmodified from the previous version. Patches 2-3 have been rewritten, and patch 4 is similar except that the missing-lookup change is made to sha1_object_info_extended() instead of the now gone get_object(). As before, I would like review on patches 1-3 to go into the tree. (Patch 4 is a work in progress, and is here just to demonstrate the effectiveness of the refactoring.) Jonathan Tan (4): sha1_file: teach packed_object_info about typename sha1_file: move delta base cache code up sha1_file: consolidate storage-agnostic object fns sha1_file, fsck: add missing blob support Documentation/config.txt | 10 + builtin/fsck.c | 7 + cache.h | 13 ++ sha1_file.c | 506 +-- t/t3907-missing-blob.sh | 69 +++ 5 files changed, 418 insertions(+), 187 deletions(-) create mode 100755 t/t3907-missing-blob.sh -- 2.13.1.518.g3df882009-goog
[PATCH v2 0/4] Port git-add--interactive.perl:status_cmd to C
This is the second version of a patch series to start porting git-add--interactive from Perl to C. Series: v1: https://public-inbox.org/git/1494009820-2090-1-git-send-email-bnm...@gmail.com/ Travis CI build: https://travis-ci.org/theiostream/git/builds/232669894 I have applied most of the changes suggested by Johannes and Ævar (thank you!). The one exception was Johannes' request to make this a WIP patch series with a sort-of "design" of what's next to come. I did not do this because I still have no clue...! I'll begin experimenting on update_cmd() to see if I gain some insight on this issue that I could bring to this series. I do think this would be a good idea, though! :) -- Daniel. Daniel Ferreira (4): diff: export diffstat interface add--helper: create builtin helper for interactive add add--helper: implement interactive status command add--interactive: use add-interactive--helper for status_cmd .gitignore| 1 + Makefile | 1 + builtin.h | 1 + builtin/add--helper.c | 285 ++ diff.c| 17 +-- diff.h| 19 +++- git-add--interactive.perl | 4 +- git.c | 1 + 8 files changed, 309 insertions(+), 20 deletions(-) create mode 100644 builtin/add--helper.c -- 2.7.4 (Apple Git-66)
[PATCH v2 0/4] travis-ci: build docs with asciidoctor
Hi, changes since v1: * check Asciidoctor stderr output (Brian) http://public-inbox.org/git/20170418104411.hdkzh3psvej63...@genre.crustytoothpaste.net/ * fix make style nit (Junio) http://public-inbox.org/git/xmqq37dcorr7@gitster.mtv.corp.google.com/ Thanks, Lars Base Ref: master Web-Diff: https://github.com/larsxschneider/git/commit/315affa7c0 Checkout: git fetch https://github.com/larsxschneider/git travisci/asciidoctor-v2 && git checkout 315affa7c0 Interdiff (v1..v2): diff --git a/ci/test-documentation.sh b/ci/test-documentation.sh index 81f123e68d..6214e6acb4 100755 --- a/ci/test-documentation.sh +++ b/ci/test-documentation.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/usr/bin/env bash # # Perform sanity checks on documentation and build it. # @@ -9,7 +9,8 @@ make check-builtins make check-docs # Build docs with AsciiDoc -make --jobs=2 doc +make --jobs=2 doc > >(tee stdout.log) 2> >(tee stderr.log >&2) +! test -s stderr.log test -s Documentation/git.html test -s Documentation/git.xml test -s Documentation/git.1 @@ -17,6 +18,8 @@ grep
[PATCH v2 0/4] recursing submodules with relative pathspec (grep and ls-files)
v2 of the series tackles the problem slightly differently than v1 did, and in a less invasive way in my opinion. v1 also didn't fix everything. v2 introduces a way to pass a 'prefix' that is respected by a git process. This allows the git child processes (which operate on submodules) to have the super_prefix (the path from the root of the superproject to the submodule) and the prefix (path from the root of the superproject to the directory in which the original command was launched). With these two pieces of information the child process can correctly handle pathspec matching as well as correctly formatting their output with relative paths. In order to pass the prefix to a child I made a new env var since the existing GIT_PREFIX isn't respected. I'm not sure this is the best method so I'm open to ideas on the best way to convey this information to a child process. Brandon Williams (4): grep: fix help text typo setup: allow for prefix to be passed to git commands grep: fix bug when recursing with relative pathspec ls-files: fix bug when recursing with relative pathspec builtin/grep.c | 41 +++ builtin/ls-files.c | 41 ++- cache.h| 1 + git.c | 2 - setup.c| 6 +++ t/t3007-ls-files-recurse-submodules.sh | 39 ++ t/t7814-grep-recurse-submodules.sh | 75 ++ 7 files changed, 167 insertions(+), 38 deletions(-) -- 2.12.0.367.g23dc2f6d3c-goog
Re: [PATCH v2 0/4] delete_ref: support reflog messages
Junio C Hamanowrites: > These looked reasonable. I had to resolve conflicts with another > topic in flight that removed set_worktree_head_symref() function > while queuing, and I think I resolved it correctly, but please > double check what is pushed out on 'pu'. The merge looks right to me, thanks. Thanks also for touching up the tab/space issue in t3200-branch.sh. -- Kyle
Re: [PATCH v2 0/4] delete_ref: support reflog messages
Kyle Meyerwrites: > Kyle Meyer (4): > delete_ref: accept a reflog message argument > update-ref: pass reflog message to delete_ref() > rename_ref: replace empty message in HEAD's log > branch: record creation of renamed branch in HEAD's log These looked reasonable. I had to resolve conflicts with another topic in flight that removed set_worktree_head_symref() function while queuing, and I think I resolved it correctly, but please double check what is pushed out on 'pu'. Thanks.
Re: [PATCH v2 0/4] delete_ref: support reflog messages
On Mon, Feb 20, 2017 at 08:10:31PM -0500, Kyle Meyer wrote: > Kyle Meyer (4): > delete_ref: accept a reflog message argument > update-ref: pass reflog message to delete_ref() > rename_ref: replace empty message in HEAD's log > branch: record creation of renamed branch in HEAD's log These look good to me. I think you did a nice job of summarizing the discussion in the commit messages. -Peff
[PATCH v2 0/4] delete_ref: support reflog messages
Thanks to Junio and Peff for their feedback on v1. https://public-inbox.org/git/20170217035800.13214-1-k...@kyleam.com/T/#u Changes from v1: * "msg" is now positioned as the first argument to delete_ref() to match update_ref(). 20170217081205.zn7j6d5cffgdv...@sigill.intra.peff.net * A "delete by head" test case has been added for the update-ref change. 20170217082253.kxjezkxfqkfxj...@sigill.intra.peff.net * The added tests no longer send grep's output to /dev/null. 20170217082253.kxjezkxfqkfxj...@sigill.intra.peff.net * Renaming the current branch is represented by two entries in HEAD's log, both which reuse the log message passed to rename_ref(). 20170217083112.vn7m4udsopmlv...@sigill.intra.peff.net, 20170217195549.z6uyy7hbbhj5a...@sigill.intra.peff.net * Corrected a few places in the commit messages where "delete_refs" was written instead of "delete_ref". Kyle Meyer (4): delete_ref: accept a reflog message argument update-ref: pass reflog message to delete_ref() rename_ref: replace empty message in HEAD's log branch: record creation of renamed branch in HEAD's log branch.c | 5 +++-- branch.h | 3 ++- builtin/am.c | 4 ++-- builtin/branch.c | 7 --- builtin/notes.c| 4 ++-- builtin/remote.c | 4 ++-- builtin/replace.c | 2 +- builtin/reset.c| 2 +- builtin/symbolic-ref.c | 2 +- builtin/tag.c | 2 +- builtin/update-ref.c | 2 +- fast-import.c | 2 +- refs.c | 6 +++--- refs.h | 7 --- refs/files-backend.c | 10 +- t/t1400-update-ref.sh | 18 ++ t/t3200-branch.sh | 6 ++ transport.c| 2 +- 18 files changed, 58 insertions(+), 30 deletions(-) -- 2.11.1
[PATCH v2 0/4] stash: create filename argument
Previous round is at: http://public-inbox.org/git/20170121200804.19009-1-t.gumme...@gmail.com/. Thanks Junio, Peff, Øyvind, Jakub and Johannes for your feedback on the previous round. Changes since the previous round: - Re-phrased the Documentation update. - Added missing $ in 2/3 - Added an extra patch introducing a new syntax for git stash create, where the message can be specified with the -m flag, instead of as a positional argument - Filenames with $IFS in their name are now supported. Added a test for that as well. Interdiff below: diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt index 871a3b246c..8306bac397 100644 --- a/Documentation/git-stash.txt +++ b/Documentation/git-stash.txt @@ -20,6 +20,8 @@ SYNOPSIS [--] [...] 'git stash' clear 'git stash' create [] +'git stash' create [-m ] [-u|--include-untracked] +[-- ...] 'git stash' store [-m|--message ] [-q|--quiet] DESCRIPTION @@ -51,8 +53,8 @@ OPTIONS save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] []:: push [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [-m|--message ] [--] [...]:: - Save your local modifications to a new 'stash', and revert the - the changes in the working tree to match the index. + Save your local modifications to a new 'stash' and roll them + back both in the working tree and in the index. The part is optional and gives the description along with the stashed state. For quickly making a snapshot, you can omit _both_ "save" and , but giving diff --git a/git-stash.sh b/git-stash.sh index 7dcce629bd..0072a38b4c 100755 --- a/git-stash.sh +++ b/git-stash.sh @@ -56,25 +56,57 @@ clear_stash () { } create_stash () { + stash_msg= + untracked= + new_style= files= while test $# != 0 do case "$1" in + -m|--message) + shift + stash_msg="$1" + new_style=t + ;; + -u|--include-untracked) + shift + untracked="$1" + new_style=t + ;; --) shift + files="$@" + new_style=t break ;; - --files) - ;; *) - files="$1 $files" + if test -n "$new_style" + then + echo "invalid argument" + option="$1" + # TRANSLATORS: $option is an invalid option, like + # `--blah-blah'. The 7 spaces at the beginning of the + # second line correspond to "error: ". So you should line + # up the second line with however many characters the + # translation of "error: " takes in your language. E.g. in + # English this is: + # + #$ git stash save --blah-blah 2>&1 | head -n 2 + #error: unknown option for 'stash save': --blah-blah + # To provide a message, use git stash save -- '--blah-blah' + eval_gettextln "error: unknown option for 'stash create': \$option" + usage + fi + break ;; esac shift done - stash_msg="$1" - untracked="$2" + if test -z "$new_style" + then + stash_msg="$*" + fi git update-index -q --refresh if no_changes @@ -284,7 +316,7 @@ push_stash () { git reflog exists $ref_stash || clear_stash || die "$(gettext "Cannot initialize stash")" - create_stash --files $files -- "$stash_msg" "$untracked" + create_stash -m "$stash_msg" -u "$untracked" -- $files store_stash -m "$stash_msg" -q $w_commit || die "$(gettext "Cannot save the current status")" say "$(eval_gettext "Saved working directory and index state \$stash_msg")" @@ -293,9 +325,9 @@ push_stash () { then if test -n "$files" then - git reset -- $files - git checkout HEAD -- $(git ls-files --modified -- $files) - git clean --force --quiet -- $(git ls-files --others -- $files) + git ls-files -z -- "$@" | xargs -0 git reset -- + git ls-files -z --modified -- "$@" | xargs -0 git
[PATCH v2 0/4] urlmatch: allow regexp-based matches
Hi, This is version two of my patch series. The use case is to be able to configure an HTTP proxy for all subdomains of a domain where there are hundreds of subdomains. Previously, I have been using complete regular expressions with an escape-mechanism to match the configuration key's URLs. According to Junio's comments, I changed this mechanism to a much simpler one, where the user is only allowed to use globbing for the host part of the URL. That is a user can now specify a key `http.https://*.example.com` to match all sub-domains of `example.com`. For now I've decided to implement it such that a single `*` matches a single subdomain only, so for example `https://foo.bar.example.com` would not match in this case. This is similar to how shell-globbing works usually, so it should not be of much surprise. It's also highlighted in the documentation. I did not include an interdiff as too much has changed between the two versions. Regards Patrick Patrick Steinhardt (4): mailmap: add Patrick Steinhardt's work address urlmatch: enable normalization of URLs with globs urlmatch: split host and port fields in `struct url_info` urlmatch: allow globbing for the URL host part .mailmap | 1 + Documentation/config.txt | 5 +++- t/t1300-repo-config.sh | 36 + urlmatch.c | 68 +--- urlmatch.h | 9 --- 5 files changed, 104 insertions(+), 15 deletions(-) -- 2.11.0
[PATCH v2 0/4] fix mergetool+rerere+subdir regression
If rerere is enabled, no pathnames are given, and mergetool is run from a subdirectory, mergetool always prints "No files need merging". Fix the bug. This regression was introduced in 57937f70a09c12ef484c290865dac4066d207c9c (v2.11.0). Changes since v1: * Patch 2/4 was reworked to improve the commit message, improve test case independence even further, and use 'test_when_finished "git reset --hard"' instead of a plain 'git reset --hard'. Richard Hansen (4): t7610: update branch names to match test number t7610: make tests more independent and debuggable t7610: add test case for rerere+mergetool+subdir bug mergetool: fix running in subdir when rerere enabled git-mergetool.sh | 1 + t/t7610-mergetool.sh | 251 ++- 2 files changed, 147 insertions(+), 105 deletions(-) -- 2.11.0.390.gc69c2f50cf-goog smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH v2 0/4] road to reentrant real_path
On Sat, Dec 10, 2016 at 2:42 AM, Brandon Williamswrote: > On 12/09, Duy Nguyen wrote: >> On Fri, Dec 9, 2016 at 6:58 AM, Brandon Williams wrote: >> > diff --git a/setup.c b/setup.c >> > index fe572b8..0d9fdd0 100644 >> > --- a/setup.c >> > +++ b/setup.c >> > @@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const >> > char *gitdir) >> > if (!is_absolute_path(data.buf)) >> > strbuf_addf(, "%s/", gitdir); >> > strbuf_addbuf(, ); >> > - strbuf_addstr(sb, real_path(path.buf)); >> > + strbuf_realpath(sb, path.buf, 1); >> >> This is not the same because of this hunk in strbuf_realpath() > > Then perhaps I shouldn't make this change (and just leave it as is) > since the way real_path_internal/strbuf_realpath is written requires > that the strbuf being used for the resolved path only contains the > resolved path (see the lstat(resolved->buf ) call). Sidenote it > looks like strbuf_getcwd() also does a reset, though more subtlety, > since it just passes its buffer to getcwd(). Yeah that's ok too (I did not see this subtlety when I suggested the change). -- Duy