Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"): > I'll have to take a (lengthy) dinner break now, but this is what I have so > far: a regression test that verifies the breakage (see the > `fix-reflog-action` branch at https://github.com/dscho/git). I'll continue > after dinner and am confident that this bug will be fixed within the next > four hours. That seems super speedy to me! When you have a fix I will leave it up to the Debian git maintainers to decide whether they want to cherry pick your fix into their package, or await an updated upstream branch with rc, or what. > [Ian:] > > While you're looking at this, I observe that the fact that the `rebase > > finished' message also does not honour GIT_REFLOG_ACTION appears to be > > a pre-existing bug. > > I noticed that, too, but at this point I am only fixing regressions. We > can try to fix this long-standing bug in the v2.20 cycle. Right. Thanks, Ian. -- Ian JacksonThese opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"): > > In a successful run with older git I get a reflog like this: > > > >4833d74 HEAD@{0}: rebase finished: returning to > > refs/heads/with-preexisting > >4833d74 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new > > upstream file > >cabd5ec HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file > >0b362ce HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new > > upstream file > >29653e5 HEAD@{4}: debrebase new-upstream 2.1-1: rebase: checkout > > 29653e5a17bee4ac23a68bba3e12bc1f52858ac3 > >85e0c46 HEAD@{5}: debrebase: launder for new upstream ... > > This breaks the test because my test suite is checking that I set > > GIT_REFLOG_ACTION appropriately. > > > > If you want I can provide a minimal test case but this should suffice > > to see the bug I hope... > > This should be plenty for me to get going. Thank you! Happy hunting. While you're looking at this, I observe that the fact that the `rebase finished' message also does not honour GIT_REFLOG_ACTION appears to be a pre-existing bug. (In general one often can't rely on GIT_REFLOG_ACTION still being set because the rebase might have been interrupted and restarted, which I think is why my test case looks for it in the initial `checkout' message.) Regards, Ian. -- Ian JacksonThese opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"): > if you could pry more information (or better information) out of that bug > reporter, that would be nice. Apparently my email address is blacklisted > by his mail provider, so he is unlikely to have received my previous mail > (nor will he receive this one, I am sure). (I did receive this mail. Sorry for the inconvenience, which sadly is inevitable occasionally in the modern email world. FTR in future feel free to send the bounce to postmaster@chiark and I will make a you-shaped hole in my spamfilter. Also with Debian bugs you can launder your messages by, eg, emailing 914695-submitter@bugs.) > > > At https://bugs.debian.org/914695 is a report of a test regression in > > > an outside project that is very likely to have been triggered by the > > > new faster rebase code. As I wrote in the bug report last night: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914695#15 I have investigated and the bug seems to be that git-rebase --onto now fails to honour GIT_REFLOG_ACTION for the initial checkout. In a successful run with older git I get a reflog like this: 4833d74 HEAD@{0}: rebase finished: returning to refs/heads/with-preexisting 4833d74 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new upstream file cabd5ec HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file 0b362ce HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new upstream file 29653e5 HEAD@{4}: debrebase new-upstream 2.1-1: rebase: checkout 29653e5a17bee4ac23a68bba3e12bc1f52858ac3 85e0c46 HEAD@{5}: debrebase: launder for new upstream With a newer git I get this: 6d3fb91 HEAD@{0}: rebase finished: returning to refs/heads/master 6d3fb91 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new upstream file 86c0721 HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file 50ba56c HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new upstream file 8272825 HEAD@{4}: rebase: checkout 8272825bb4ff6eba89afa936e32b6460f963a36a c78db55 HEAD@{5}: debrebase: launder for new upstream This breaks the test because my test suite is checking that I set GIT_REFLOG_ACTION appropriately. If you want I can provide a minimal test case but this should suffice to see the bug I hope... Regards Ian. -- Ian JacksonThese opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: want option to git-rebase
Jonathan Nieder writes ("Re: want option to git-rebase"): > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Great. I indeed did not know about this. > Ian, does that work well for you? If so, any ideas where it should go > in the documentation to be more discoverable for next time? Thanks for asking exactly the right question :-). I didn't make a record of exactly where I looked but I'm pretty sure I looked at the manpages for git-reflog and git-rebase. I think I probably also looked at git-update-ref; I have read git-update-ref a number of times. Right now in Debian unstable I see that none of these places document this convention. I think git-reflog ought to mention it, so that it says where the information it provides comes from. It also ought to be mentioned in git-update-ref, because all callers of git-update-ref need to implement it ! Indeed, because I didn't know about this convention, dgit and git-debrebase do not honour it. At least in my case, if it had been in git-update-ref I would have implemented it myself and then I would obviously have thought of making use of it myself. Also, I have to say, the documentation for GIT_REFLOG_ACTION in git(1) is very obscure. It sort of generally waffles around what it is for, but it does not say: * what does this variable contain * who can and should set it * who should consume it * what the rules are for modifying it I don't think simply adding a cross-reference to GIT_REFLOG_ACTION in git(1) would be sufficient, without also improving this part. The explanations provided by you and Johannes, here in these emails, are much much better: > >> "git rebase" sets this itself, so it doesn't solve your problem. > > > > If it does so unconditionally, then that is a bug. If a script > > wants to set GIT_REFLOG_ACTION, but finds that it is already set, > > then it must not change the value. set_reflog_action in > > git-sh-setup does the right thing. > > > > So, if there is another script or application around git-rebase, then it > > should just set GIT_REFLOG_ACTION (if it is not already set) and export the > > environment variable to git-rebase. > > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Maybe some of this prose, which explains things quite well, could be reworked into a form suitable for the git docs. (Even though there seems to be disagreement about whether a subcommand may *append* to GIT_REFLOG_ACTION; which, ISTM, is a practice which ought to be encouraged rather than discouraged.) Regards, Ian. -- Ian JacksonThese opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: want option to git-rebase
Jonathan Nieder writes ("Re: want option to git-rebase"): > Ian Jackson wrote[1]: > > git-rebase leaves entries like this in the reflog: > > > > c15f4d5391 HEAD@{33}: rebase: checkout > > c15f4d5391ff07a718431aca68a73e672fe8870e ... > GIT_REFLOG_ACTION > When a ref is updated, reflog entries are created to keep > track of the reason why the ref was updated (which is > typically the name of the high-level command that updated the > ref), in addition to the old and new values of the ref. A > scripted Porcelain command can use set_reflog_action helper > function in git-sh-setup to set its name to this variable when > it is invoked as the top level command by the end user, to be > recorded in the body of the reflog. > > "git rebase" sets this itself, so it doesn't solve your problem. Hrm. > Can you say more about what your tool does? I'm wondering if it would > make sense for it to use lower-level commands where GIT_REFLOG_ACTION > applies, instead of the more user-facing git rebase. Sure. http://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git-manpage/dgit.git/git-debrebase.1 See the description of git-rebase new-upstream. It does a lot of complicated work, synthesising a new pair of commits using plumbing etc., and then does git rebase --onto If the user says git rebase --abort, everything should be undone. Another alternative solution would be to be able to make git reflog entries without actually updating any ref. Ian. -- Ian JacksonThese opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: git signed push server-side [and 3 more messages]
Hi. Thanks to both of you for your helpful comments. Jonathan Nieder writes ("Re: git signed push server-side"): > Ian Jackson wrote[1]: > > 2. git-receive-pack calls gpg (Debian #852684) > > > > It would be better if it called gpgv. ... > think respecting gpg.program would be nicer. Is there a reason not > to do that? I think it very unlikely that anyone would want git-receive-pack's signed push facility to end up running gpg rather than gpgv. But, the git-receive-pack functions here are building blocks which need quite a lot of extra work to use, anyway. So having all callers have to pass -c gpg.program would be quite tolerable, if a bit ugly. > I suspect receive-pack just forgot to call git_gpg_config. So, I will send a patch to fix that. > > 3. No way to specify keyring (Debian #852684, side note) > > > > There should be a way to specify the keyring used by > > git-receive-pack's gpgv invocation. This should probably be done with > > a config option, receive.certKeyring perhaps. > > How is the keyring configured for other commands that use GPG, like > "git tag -v"? (Forgive my laziness in not looking it up.) It's not. You just get whatever keyring and trust db etc. your gpg has as the default. As Junio says, being able to set the keyring explicitly is not essential to use the signed push feature; one can make a wrapper for gpgv, or set GNUPGHOME. It just seemed to me that the current interface is not very convenient. If it is controversial to provide a more convenient interface, then I will work with what there is. > > 4. Trouble with the nonce (Debian #852688 part 2) ... > I also wonder why you say the git configuration system is unsuited to > keeping secrets. E.g. passing an include.path setting with -c or > GIT_CONFIG_PARAMETERS should avoid the kinds of trouble you described. I was not aware of include.path. That would solve this aspect of the problem (but not the rollover aspect, of course). GIT_CONFIG_PARAMETERS is not documented. > > (ii) At some later point, the following enhancement: When > > !stateless_rpc, certNonceSeedsFile is ignored except that if neither > > it nor the old certNonceSeed is set, the signed push feature is > > disabled. > > That seems like an awkward interface. Shouldn't there be at least > another config variable to enable signed push without making up a seed > or filename? Perhaps. I don't have a strong opinion about the config parameter names and semantics. This seems like a matter of taste and I'm happy to just do whatever others want. > >In this state we always get a fresh nonce (from a suitable > > system random source). > > How does this work with stateless_rpc? (See "Session State" in > Documentation/technical/http-protocol.txt.) It doesn't, which is why I propose this only if !stateless_rpc. > > Nontrivial because current git doesn't seem to > > have a "get suitable random number" function, and the mess that is the > > semantics of /dev/*random* files means that providing one is going to > > be controversial. > > I think you're overestimating how much pushback adding such a thing > would get. Well, I'm happy to give it ago. (NB: I will use /dev/urandom on Linux.) > More references: > > - JGit's push cert handling: > https://git.eclipse.org/r/#/q/message:cert Really helpful, thanks. I will read these. (Shame that I can't even view that information without running their javascript!) I ended up reading this https://github.com/sitaramc/gitolite/blob/cf062b8bb6b21a52f7c5002d33fbc950762c1aa7/contrib/hooks/repo-specific/save-push-signatures > - Gerrit's push cert handling: > https://gerrit-review.googlesource.com/q/project:gerrit+message:gpg (This is an empty page for me, even with javascript turned on.) I'll probably implement the refs/meta/push-certs:REF@{cert} special branch thing from JGit. Can anyone point me at a public repo which has such push-certs recorded, so I can check the details and make my thing do stuff the same way ? Junio C Hamano writes ("Re: git signed push server-side"): > Jonathan Nieder <jrnie...@gmail.com> writes: > > Ian Jackson wrote[1]: > >> Proposed change: provide the full fingerprint instead. Do this > >> for every caller of gpg-interface.c. > > > > Sounds sane. > > I probably should react a bit stronger against that "instead", as > Ian will not be writing the world's first server side hook that uses > this interface. Anyone who is verifying signatures and doing anything with the 16-character key id other than logging it, has a serious security bug. > But on the other hand, the value of this environment is not meant to > be used to make de
git signed push server-side
I have been investigating git signed pushes. I found a number of infelicities in the server side implementation which make using this in practice rather difficult. I'm emailing here (before writing patches) to see what people think of my proposed changes. 1. PUSH_CERT_KEY has truncated keyid (Debian #852647) I see this: GIT_PUSH_CERT_KEY=A3DBCBC039B13D8A There is almost no purpose for which this 64-bit keyid can be safely used. The full key fingerprint should be provided instead. Proposed change: provide the full fingerprint instead. Do this for every caller of gpg-interface.c. 2. git-receive-pack calls gpg (Debian #852684) It would be better if it called gpgv. gpg does all sorts of complicated things, including automatically starting or connecting to a gpg-agent, which are not appropriate for use in a daemon on a server. Additionally, I find that passing -c gpg.program=/usr/bin/gpgv to git receive-pack is not effective, and there seems to be no sensible way to specify the keyrings to use (although that could be done by setting GNUPGHOME perhaps). Proposed change: call gpgv instead (and make any needed changes to adapt to gpgv). Do this only when we are in git-receive-pack; other call sites of gpg-interface.c will continue to use gpg. 3. No way to specify keyring (Debian #852684, side note) There should be a way to specify the keyring used by git-receive-pack's gpgv invocation. This should probably be done with a config option, receive.certKeyring perhaps. 4. Trouble with the nonce (Debian #852688 part 2) To use the signed push feature it is necessary to provide a nonce seed to git-receive-pack. The docs say the seed must be secret but there is no documented way to pass this seed to git that does not either write it to a git configuration file somewhere, or pass it on a command line. The git configuration system is unsuited to keeping secrets. Command lines can be seen in ps etc. Additionally, the seed should be changed occasionally (since unchangeable secrets are a bad idea). Ideally this should be done automatically. Analysis: the seed is used to allow the server to mostly-statelessly verify the freshness of a client's nonce. (This is necessary only in connectionless transport protocols, "stateless_rpc" as builtin/receive-pack.c has it.) Proposed fix (in two parts): (i) Provide a new config option receive.certNonceSeedsFile. It contains seeds, one per line. When stateless_rpc, we send a nonce computed from the first seed. We accept nonces computed from any of the listed seeds. The documentation will say that the file should normally contain two seeds; rollover is achieved by mving into place a new seed at the top, and dropping one from the bottom. An example script will be provided. (ii) At some later point, the following enhancement: When !stateless_rpc, certNonceSeedsFile is ignored except that if neither it nor the old certNonceSeed is set, the signed push feature is disabled. In this state we always get a fresh nonce (from a suitable system random source). Nontrivial because current git doesn't seem to have a "get suitable random number" function, and the mess that is the semantics of /dev/*random* files means that providing one is going to be controversial. 5. There are no docs on how to use this feature properly (Debian #852695, #852688 part 1) Using the signed push feature requires careful programming on the server side. There should be a doc explaining how to do this. Proposed fix: provide a .txt file containing much the same contents as seen here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852695 If I send patches like the above, would they be welcomed (subject to detailed review, of course) ? Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: RFC: Another proposed hash function transition plan
Linus Torvalds writes ("Re: RFC: Another proposed hash function transition plan"): > Also, since 256 isn't evenly divisible by 6, and because you'd want > some way to explictly disambiguate the new hashes, the rule *could* be > that the ASCII representation of a new hash is the base64 encoding of > the 258-bit value that has "10" prepended to it as padding. > > That way the first character of the hash would be guaranteed to not be > a hex digit, because it would be in the range [g-v] (indexes 32..47). We should arrange for this to be an uppercase, not a lowercase, letter, for the reasons I explained in my own proposal. To summarise: It would be undesirable to further increase the overlap between object names and ref names. Few people use uppercase in ref names because of the case-insensitive filesystem problem; so object names starting with uppercase ascii are distinct from most object names. > Of course, having written that, I now realize how it would cause > problems for the usual shit-for-brains case-insensitive filesystems. > So I guess base64 encoding doesn't work well for that reason. AFAIAA object names occur in publicly-visible filenames only in notes tree objects, which are manipulated by git internally and do not necessarily need to appear in the filesystem. The filenames in .git/objects/ can be in whatever encoding we like, so are not an obstacle. Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: RFC: Another proposed hash function transition plan
Jonathan Nieder writes ("RFC: Another proposed hash function transition plan"): > This past week we came up with this idea for what a transition to a new > hash function for Git would look like. I'd be interested in your > thoughts (especially if you can make them as comments on the document, > which makes it easier to address them and update the document). Thanks for this. This is a reasonable plan. It corresponds to approaches (2) and (B) of my survey mail from the other day. Ie, two parallel homogeneous hash trees, rather than a unified but heterogeneous hash tree, with old vs new object names distinguished by length. I still prefer my proposal with the mixed hash tree, mostly because the handling of signatures here is very awkward, and because my proposal does not involve altering object ids stored other than in the git object graph (eg CI system databases, etc.) One thing you've missed, I think, is notes: notes have to be dealt with in a more complicated way. Do you intend to rewrite the tree objects for notes commits so that the notes are annotations for the new names for the annotated objects ? And if so, when ? Also I think you need to specify how abbreviated object names are interpreted. Regards, Ian.
Re: Transition plan for git to move to a new hash function
brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"): > Instead, I was referring to areas like the notes code. It has extensive > use of the last byte as a type of lookup table key. It's very dependent > on having exactly one hash, since it will always want to use the last > byte. You mean note_tree_search ? (My tree here may be a bit out of date.) This doesn't seem difficult to fix. The nontrivial changes would be mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE. It's true that like most of git there's a lot of hardcoded `sha1'. Are you arguing in favour of "replace git with git2 by simply s/20/64/g; s/sha1/blake/g" ? This seems to me to be a poor idea. Takeup of the new `git2' would be very slow because of the pain involved. Any sensible method of moving to a new hash that isn't "make a completely incompatible new version of git" is going to involve teaching the code we have in git right now to handle new hashes as well as sha1 hashes. Even if the plan is to try to convert old data, rather than keep it and be able to refer to it from new data, something will have to be able to parse old packfiles, old commits, old tags, old notes, etc. etc. etc. Either that's going to be some separate conversion utility, or it has to be the same code in git that's there already.[1] The ability to handle both old-format and new-format data can be achieved in the code by doing away with the hardcoded sha1s, so that instead the hash is an abstract data type with operations like "initialise", "compare", "get a nybble", etc. We've already seen patches going in this direction. [1] I've heard suggestions here that instead we should expect users to "git1 fast-export", which you would presumably feed into "git2 fast-import". But what is `git1' here ? Is it the current git codebase frozen in time ? I don't think it can be. With this conversion strategy, we will need to maintain git1 for decades. It will need portability fixes, security fixes, fixes for new hostile compiler optimisations, and so on. The difficulty of conversion means there will be pressure to backport new features from `git2' to `git1'. (Also this approach means that all signatures are definitively lost during the conversion process.) So if we want to provide both `git1' and `git2', it's still better to compile `git' and `git2' from the same codebase. But if we do that, the resulting ifdeffery and/or other hash abstractions are most of the work to be hash-agile. It's just the difference between a compile-time and runtime switch. I think the incompatibile approach is much more work in the medium and long term - and it leads to a longer transition period. Bear in mind that our objective is not to minimise the time until the new version of git is available. Our objective is to minimise the time until (most) people are using it. An approach which takes longer for the git community to develop, but which is easier to deploy, can easily be better. Or maybe the objective is to minimise overall effort. In which case more work on git, for an easier transition for all the users, seems like a no-brainer. I think this is arguably true even from the point of view of effort amongst the community of git contributors. git contributors start out as git users - and if git's users are all busy struggling with a difficult transition, they will have less time to improve other stuff and will tend less to get involved upstream. (And they may be less inclined to feel that the git upstream developers understand their needs well.) The better alternative is to adopt a plan that has a clear and straightforward transition for users, and ask git users to help with implementation. I think many git users, including sophisticated users and competent organisations, are concerned about sha1. Currently most of those users will find it difficult to help, because it's not clear to them what needs to be done. Thanks, Ian.
Re: SHA1 collisions found
with a hash function indication (sha256: or blake_ or H) (B) Infer the hash function from the object name length (and do some kind of bodge for abbreviated object names). (C) Hash function is implicit from context. (This is compatible with (2) only, because (1) requires any object to be able to refer to any hash function.) I think (A) is best because it means everything is unambiguous, and it allows future hash function changes without further difficulty. (B) is a reasonable possibility although the abbreviated object name bodge would be quite ugly and probably involve thinking about several annoying edge cases. I think (C) is really bad, because it instantly makes all existing application code which calls git to be buggy. Such application code would need to be adjusted to know for itself which of the object names it has recorded are what hash function, and explicitly specify this to its git operations somehow. All of these options involve updating many callers of git. In any case any git caller which explicitly checks the object name length will need to be changed. For (a), many git callers which match object names using something like [0-9a-f]+ rather than \w+ will need to be changed - but at least it's a simple change with little semantic import. (A) has the additional advantage that it becomes possible to make object names syntactically distinguishable from ref names. The final argument I would make is this: We don't know what hash function research will look like in 10-20 years. We would like to not have a bunch of pain again. So ideally we would deploy a framework now that would let us switch hash function again without further history-rewriting. (1)(A) and perhaps (1)(B) are the only options which support this well. Ian. [1] I'm going to keep assuming that the bikeshed will be blue, because I think BLAKE2b has is a better choice. It has probably had more serious people looking at it than SHA-3, at least, and it has good performance. The web page has an impressive adoption list - probably wider than SHA-3. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: Transition plan for git to move to a new hash function
brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"): > On Mon, Feb 27, 2017 at 01:00:01PM +, Ian Jackson wrote: > > Objects of one hash may refer to objects named by a different hash > > function to their own. Preference rules arrange that normally, new > > hash objects refer to other new hash objects. > > The existing codebase isn't really intended with that in mind. Yes. I've seen the attempts to start to replace char* with a hash struct. > I like Peff's suggested approach in which we essentially rewrite history > under the hood, but have a lookup table which looks up the old hash > based on the new hash. That allows us to refer to old objects, but not > have to share serialized data that mentions both hashes. I think this means that the when a project converts, every copy of the history must be rewritten (separately). Also, this leaves the whole system lacking in algorithm agililty. Meaning we may have to do all of this again some time. I also think that we need to distinguish old hashes from new hashes in the command line interface etc. Otherwise there is a possible ambiguity. > > The object name textual syntax is extended. The new syntax may be > > used in all textual git objects and protocols (commits, tags, command > > lines, etc.). > > > > We declare that the object name syntax is henceforth > > [A-Z]+[0-9a-z]+ | [0-9a-f]+ > > and that names [A-Z].* are deprecated as ref name components. > > I'd simply say that we have data always be in the new format if it's > available, and tag the old SHA-1 versions instead. Otherwise, as Peff > pointed out, we're going to be stuck typing a bunch of identical stuff > every time. Again, this encourages migration. The hash identifier is only one character. Object names are not normally typed very much anyway. If you say we must decorate old hashes, then all existing data everywhere in the world which refers to any git objects by object name will become invalid. I don't mean just data in git here. I mean CI systems, mailing list archives, commit messages (perhaps in other version control systems), test cases, and so on. Ian.
Re: Why BLAKE2?
Markus Trippelsdorf writes ("Re: Why BLAKE2?"): > On 2017.02.27 at 13:00 +, Ian Jackson wrote: > > For brevity I will write `SHA' for hashing with SHA-1, using current > > unqualified object names, and `BLAKE' for hasing with BLAKE2b, using > > H object names. > > Why do you choose BLAKE2? SHA-2 is generally considered still fine and > would be the obvious choice. And if you want to be adventurous then > SHA-3 (Keccak) would be the next logical candidate. I don't have a strong opinion. Keccak would be fine too. We should probably avoid SHA-2. The main point of my posting was not to argue in favour of a particular hash function :-). Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Transition plan for git to move to a new hash function
so. (In both cases, by setting configuration options.) Y4: BLAKE by default for new projects. Conversion enabled for existing projects. Old git software is going to start rotting. Default configuration change: BLAKE > SHA BLAKE enabled (even in trees without working trees) Suggested multi-project hosting site configuration change: Newly created projects should get BLAKE enabled Existing projects should retain BLAKE disabled by default Button should be provided to start conversion (see below) Effects: When creating a new working tree, it starts using BLAKE. Servers which have been updated will accept BLAKE. Servers which have not been updated to Y4's git will need a small configuration change (enabling BLAKE) to cope with the new projects that are using BLAKE. To convert a project, an administrator (or project owner) would set BLAKE to enabled, and SHA to deprecated, on the server(s). On the next pull the server will provide ref hints naming BLAKE, which will get copied to the client's HEAD. So the client is infected with BLAKE. To convert a project branch-by-branch, the administrator would set BLAKE to enabled but leave SHA enabled. Then each branch retains its own hash. A branch can be converted by pushing a BLAKE commit to it, or by setting a ref hint on the server (or on the next client to push). Y6: BLAKE by default for all projects Existing projects start being converted infectiously. It is hard for a project to stop this happening if any of their servers are updated. Old git software is firmly stuffed. Default configuration change SHA deprecated in trees without working trees Effects: Existing projects are, by default, `converted', as described above. Y8: Clients hate SHA Clients insist on trying to convert existing projects It is very hard to stop this happening. Unrepentant servers start being very hard to use. Default configuration change SHA deprecated (even in trees without working trees) Effects: Clients will generate only BLAKE. Hopefully their server will accept this! Y10: Stop accepting new SHA No-one can manage to make new SHA commits Default configuration change SHA disabled in new trees, except during initial `clone', `mirror' and similar Effects: Existing SHA history is retained, and copied to new clients and servers. But established clients and servers reject any newly introduced SHA. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: SHA1 collisions found
Ian Jackson writes ("Re: SHA1 collisions found"): > The idea of using the length is a neat trick, but it cannot support > the dcurrent object name abbreviation approach unworkable. Sorry, it's late here and my grammar seems to have disintegrated ! Ian.
Re: SHA1 collisions found
Junio C Hamano writes ("Re: SHA1 collisions found"): > Ian Jackson <ijack...@chiark.greenend.org.uk> writes: > > * Therefore the transition needs to be done by giving every object > >two names (old and new hash function). Objects may refer to each > >other by either name, but must pick one. The usual shape of > > I do not think it is necessrily so. Indeed. And my latest thoughts involve instead having two parallel systems of old and new objects. > *1* In the above toy example, length being 40 vs 64 is used as a > sign between SHA-1 and the new hash, and careful readers may > wonder if we should use sha-3,20769079d22... or something like > that that more explicity identifies what hash is used, so that > we can pick a hash whose length is 64 when we transition again. I have an idea for this. I think we should prefix new hashes with a single uppercase letter, probably H. Uppercase because: case-only-distinguished ref names are already discouraged because they do not work properly on case-insensitive filesystems; convention is that ref names are lowercase; so an uppercase letter probably won't appear at the start of a ref name component even though almost all existing software will treat it as legal. So the result is that the new object names are unlikely to collide with ref names. (There is of course no need to store the H as a literal in filenames, so the case-insensitive filesystem problem does not apply to ref names.) We should definitely not introduce new punctuation into object names. That will cause a great deal of grief for existing software which has to handle git object names and may thy to store them in representations which assume that they match \w+. The idea of using the length is a neat trick, but it cannot support the dcurrent object name abbreviation approach unworkable. Ian.
Re: SHA1 collisions found
Joey Hess writes ("SHA1 collisions found"): > https://shattered.io/static/shattered.pdf > https://freedom-to-tinker.com/2017/02/23/rip-sha-1/ > > IIRC someone has been working on parameterizing git's SHA1 assumptions > so a repository could eventually use a more secure hash. How far has > that gotten? There are still many "40" constants in git.git HEAD. I have been thinking about how to do a transition from SHA1 to another hash function. I have concluded that: * We can should avoid expecting everyone to rewrite all their history. * Unfortunately, because the data formats (particularly, the commit header) are not in practice extensible (because of the way existing code parses them), it is not useful to try generate new data (new commits etc.) containing both new hashes and old hashes: old clients will mishandle the new data. * Therefore the transition needs to be done by giving every object two names (old and new hash function). Objects may refer to each other by either name, but must pick one. The usual shape of project histories will be a pile of old commits referring to each other by old names, surmounted by new commits referrring to each other by new names. * It is not possible to solve this problem without extending the object name format. Therefore all software which calls git and expects to handle object names will need to be updated. I have been writing a more detailed transition plan. I hope to post this within a few days. Ian.
Re: [PATCH 0/5] git check-ref-format --stdin --report-errors
Michael Haggerty writes ("Re: [PATCH 0/5] git check-ref-format --stdin --report-errors"): > Thanks for your patches. I left some comments about the individual patches. Thanks for your review. > I don't know whether this feature will be popular, but it's not a lot of > code to add it, so it would be OK with me. Great. > Especially given that the output is not especially machine-readable, it > might be more consistent with other commands to call the new feature > `--verbose` rather than `--report-errors`. Sure. > If it is thought likely that scripts will want to leave a pipe open to > this command and feed it one query at a time, then it would be helpful > to flush stdout after each reference's result is written. If the > opposite use case is common (mass processing of refnames), we could > always add a `--buffer` option like the one that `git cat-file --batch` has. I think it should be unbuffered by default, so I will make that change, along with the fixes from your other mails, and resubmit. Regards, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 1/5] check-ref-format: Refactor out check_one_ref_format
Michael Haggerty writes ("Re: [PATCH 1/5] check-ref-format: Refactor out check_one_ref_format"): > On 11/04/2016 08:13 PM, Ian Jackson wrote: > > +static int check_one_ref_format(const char *refname) ... > This function needs to `return 0` if it gets to the end. Indeed it does. I'm kind of surprised my compiler didn't spot that. Thanks for the careful review! Regards, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 2/5] check-ref-format: Refactor to make --branch code more common
Michael Haggerty writes ("Re: [PATCH 2/5] check-ref-format: Refactor to make --branch code more common"): > On 11/04/2016 08:13 PM, Ian Jackson wrote: > > static int normalize = 0; > > +static int check_branch = 0; > > static int flags = 0; > > > > static int check_one_ref_format(const char *refname) > > { > > + int got; > > `got` is an unusual name for this variable, and I don't really > understand what the word means in this context. Is there a reason not to > use the more usual `err`? I have no real opinion about the name of this variable. `err' is a fine name too. > > + if (check_branch && (flags || normalize)) > > Is there a reason not to allow `--normalize` with `--branch`? > (Currently, `git check-ref-format --branch` *does* allow input like > `refs/heads/foo`.) It was like that when I found it :-). I wasn't sure why this restriction was there so I left it alone. Looking at it again: AFAICT from the documentation --branch is a completely different mode. The effect of --normalize is not to do additional work, but simply to produce additional output. It really means --print-normalized. --branch already prints output, but AFAICT it does not collapse slashes. This seems like a confusing collection of options. But, sorting that out is beyond the scope of what I was trying to do. In my series I have at least managed not to make any of this any worse, I think: the --stdin option I introduce applies to both modes equally, and doesn't make future improvements to the conflict between --branch and --normalize any harder. (In _this_ patch, certainly, allowing --normalize with --branch would be wrong, since _this_ patch is just refactoring.) Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate
Junio C Hamano writes ("Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate"): > Ian Jackson <ijack...@chiark.greenend.org.uk> writes: > > This is not correct, because as I have explained, this should be a > > per-tree configuration: > > I do not have fundamental opposition to make it part of .git/config, > but the name "gitk.something" or if you are enhancing git-gui at the > time perhaps "gui.something" would be appropriate. > > But it is still silly to have this kind of information that is very > specific to Gitk in two places, one that is pretty Gitk specific > that core-git does not know anything about, the other that are part > of the configuration storage of the core-git. In the longer term, > it is necessary for them to be accessible from gitk's "Edit -> > Preferences" mechanism somehow, I would think, rather than forcing > users to sometimes go to GUI to tweak and sometimes run "git config". I am proposing to set this configuration setting automatically in dgit. Other tools that work with particular git tags would do the same. There would be no need for users to do anything. Having this as an option in a menu would be quite wrong, because it would end up with the user and the tooling fighting. This is why I don't want to put this in gitk's existing config file mechanism. It would be wrong for dgit to edit the user's gitk config file, for many reasons. To put it another way, this setting is a way for a tool like dgit to communicate with gitk (or other programs which have to make guesses about how prominently to present certain information to the user). It's not intended to be a way for users, certainly not non-expert users, to communicate with gitk. The way I have structured my proposed patches in gitk would make it easy to provide a gui option to adjust these settings. Such a gui option ought to save its value in the gitk config file, and those values ought to override what comes from `git config'. But such a system would not obviate the need for a legitimate way for programs like dgit to communicate with gitk. Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate
Junio C Hamano writes ("Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate"): > And I do not think we would want "log" or any core side Porcelain > command to have too many "information losing" options like this > "truncate refnames down to a point where it is no longer unique and > meaningful". GUI tools can get away with doing sos because they can > arrange these truncated labels to react to end-user input (e.g. the > truncated Tag in the history display of gitk could be made to react > to mouse-over and pop-up to show a full name, for example), but the > output from the core side is pretty much fixed once it is emitted. > > So my first preference would be to teach gitk such a "please > clarify" UI-reaction, if it does not know how to do so yet. There > is no need for a configuration variable anywhere with this approach. gitk already has a way for the user to find out what the elided tag names are. The underlying difficulty is that the situation that the gitk behaviour is designed for (long tag names, perhaps several to a commit, not particularly interesting), is not applicable to these particular tags. Whether the tag is `particularly interesting' depends, as I say, on both what tree it is in, and on its name. It might be appropriate for terminal-based tools to highlight these tags too, or show them when tags are not normally displayed. `core.interestingTags' ? > If you do want to add a configuration to show fuller name in the > tag, which would make it unnecessary for the user to do "please > clarify, as I am hovering over what I want to get details of" > action, that may also be a good way to go. I think in my use case, which I hope to become common within Debian, this is going to be essential. > But I think the right > place to do so would be Edit -> Preferences menu in Gitk, and the > settings will be stored in ~/.gitk or ~/.config/git/gitk or whatever > gitk-specific place. This is not correct, because as I have explained, this should be a per-tree configuration: If it can't be a `git config' option, even `git config gui.something', then I guess I will have to teach gitk to read a config file in GIT_DIR too. But I think that is silly given that git already has a config file reading system which handles per-tree configs. If we can't get agreement from the git-core developers on a config to be used, and documented, for any tool which has similar behaviour, I think the right answer is `git config gitk.', which would be documented in gitk. Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate
Jeff King writes ("Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate"): > Yeah, I think git's config system was always designed to carry options > for porcelains outside of git-core itself. So your new option fits into > that. Good, thanks. > I think the two things I found weird were: > > - it's in the "log" section, which makes me think it's an option for > git-log. But it's not. I'm not sure what the _right_ section is, but > hopefully it would make it clear that this is command-agnostic. > > Something like "gui.abbrevTags" might be OK (and as you note, has > precedence). But of course it's possible that a command like "tig" > could learn to support it. I'm not sure if that counts as a GUI or > not. :) I don't really have an opinion about the name. gui.abbrevTags would be a possibility. (It's a bit odd that implicitly, the default would be `*'.) > - The description talks about tag abbreviation, but doesn't really > define it. Not being a gitk user, it was hard for me to figure out > whether this was even relevant. Does it mean turning > "refs/tags/v1.0" into "1.0"? From the rest of the series, it sounds > like no. That should be more clear from the documentation. I can do that, sure. By default, gitk doesn't like to use much screen real estate for tags. If there are long tag names, or many tags, it shows them all as a single small indication saying just `<tag...|' or whatever with the literal `tag...', not with the tag value. Maybe a better name would be gui.alwaysShowTags ? I'm happy to be just told what the name ought to be, if the gitk and git maintainers can agree. It seems largely a matter of taste. Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 0/6] Provide for config to specify tags not to abbreviate
Markus Hitter writes ("Re: [PATCH 0/6] Provide for config to specify tags not to abbreviate"): > TBH, I see a violation of tool independence with the choice of > preference storage. Abbreviation of tags isn't a property of the > repository, but a pure visual thing (screen real estate, whatever), > so it should be handled by the tool doing the visuals, only. As I explained in my cover letter, the set of tags which are important enough not to abbreviate, even if they would normally be abbreviated, is indeed a property of the repository. The alternative would be for a tool like gitk to grow an ever-increasing set of heuristics. Or, worse, for a tool like dgit (which knows that archive/* are special) to edit the user's personal gitk settings. > Your use case looks like a nice opportunity for > > - adding a Gitk user preference on how long displayed tags are > allowed to be (instead of distinguishing between abbreviated and > unabbreviated ones; set it to 999 for your use case) and/or This would be wrong, because it's only certain tags that ought not to be abbreviated. The right way to identify those tags is by 1. what repo they are in 2. what their name is. (It might be possible to identify them by content or something - for example, the interesting archive/* tags all refer to commits whose trees contain debian/ - but that is getting quite out of hand.) What you propose are possible general improvements to the abbreviation system in gitk. But they do not address the fundamental point that some tags are much more interesting than others. It is this latter point that I am trying to deal with. Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate
Jacob Keller writes ("Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate"): > On Mon, Nov 7, 2016 at 4:52 PM, Ian Jackson > <ijack...@chiark.greenend.org.uk> wrote: > > +log.noAbbrevTags:: > > + Each value is a glob pattern, specifying tag nammes which > > + should always be displayed in full, even when other tags may > > + be omitted or abbreviated (for example, by linkgit:gitk[1]). > > + Values starting with `^` specify tags which should be > > + abbreviated. The order is important: the last match, in the > > + most-local configuration, wins. > > + > > It seems weird that this description implies some sort of behavior > change in core git itself, but in fact is only used as a reference for > other tools that may or may not honor it. I guess the reasoning here > is to try to get other external tools that abbreviate tags to also > honor this? But it still seems pretty weird to have a documented > config that has no code in core git to honor it... Thanks for your attention. Yes, I agree that it does seem weird. But the alternatives seem worse. I think it's probably best if options like this (currently only honoured by out-of-core tools but of general usefulness) are collected together here. There is a precedent: `git config gui.encoding' is, according to the documentation, honoured only by git-gui and gitk. Calling the config option `gitk.noAbbrevTags' would be possible but that would invite everyone else to invent their own, which would be quite annoying. (Also, gitk does not have any gitk-specific git config options right now, AIUI. It does honour `git config gui.encoding'.) Would it help to add a sentence to the documentation saying that this is currently only honoured by gitk ? (The paragraph for gui.encoding says something similar.) Of course I don't know who else abbreviates tags, but as they gain support they could be added to the docs. Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [PATCH 0/6] Provide for config to specify tags not to abbreviate
Ian Jackson writes ("[PATCH 0/6] Provide for config to specify tags not to abbreviate"): > Please find in the following mails patches which provide a way to make > gitk display certain tags in full, even if they would normally be > abbreviated. > > There are four patches to gitk, three to prepare the ground, and one > to introduce the new feature. > > There is one patch for git, to just document the new config variable. The eagle-eyed reader will spot that that makes 5 patches, not 6. There are indeed only 5. The subject mentioning 6 is a mistake - sorry. Thanks, Ian. -- Ian Jackson <ijack...@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
[PATCH 0/6] Provide for config to specify tags not to abbreviate
Hi. Please find in the following mails patches which provide a way to make gitk display certain tags in full, even if they would normally be abbreviated. There are four patches to gitk, three to prepare the ground, and one to introduce the new feature. There is one patch for git, to just document the new config variable. I hope this is the right way to submit this series. Thanks for your attention. As I say in the patch "gitk: Provide for config to specify tags not to abbreviate": The config setting is in git config logs.* rather than gitk's own configuration, because: - Tools which manage git trees may want to set this, depending on their knowledge of the nature of the tags likely to be present; - Whether this property ought to be set is mostly a property of the contents of the tag namespaces in the tree, not a user preference. (Although of course user preferences are supported.) - Other git utilities (or out of tree utilities) may want to reference this setting for their own display purposes. There will be another, separate, patch to the `git' tree to document this config option. Background motivation: Debian's dgit archive gateway tool generates and uses tags called archive/debian/VERSION. If such a tag refers to a Debian source tree, it is probably very interesting because it refers to a version actually uploaded to Debian by the Debian package maintainer. We would therefore like a way to specify that such tags should be displayed in full. dgit will be able to set an appropriate config setting in the trees it deals with. Ian Jackson (4): gitk: Internal: drawtags: Abolish "singletag" variable gitk: Internal: drawtags: Idempotently reset "ntags" gitk: drawtags: Introduce concept of unabbreviated marks gitk: Provide for config to specify tags not to abbreviate gitk | 34 ++ 1 file changed, 30 insertions(+), 4 deletions(-) Ian Jackson (1): config docs: Provide for config to specify tags not to abbreviate Documentation/config.txt | 8 1 file changed, 8 insertions(+) -- 2.10.1
[PATCH GITK 2/6] gitk: Internal: drawtags: Idempotently reset "ntags"
The previous code tracked its change to the length of `marks' by updateing the variable `ntags'. This is a bit fragile and cumbersome, and we are going to want to modify `marks' some more in a moment. Instead, simply reset ntags to the length of marks, after we have possibly done any needed abbreviation. No functional change. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- gitk | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gitk b/gitk index 42fa41a..31aecda 100755 --- a/gitk +++ b/gitk @@ -6575,9 +6575,10 @@ proc drawtags {id x xt y1} { } else { set marks [list [format "%d tags..." $ntags]] } - set ntags 1 } } +set ntags [llength $marks] + if {[info exists idheads($id)]} { set marks [concat $marks $idheads($id)] set nheads [llength $idheads($id)] -- 2.10.1
[PATCH GITK 1/6] gitk: Internal: drawtags: Abolish "singletag" variable
We are going to want to make the contents of `marks' somewhat more complicated in a moment, so it won't be possible to use what is effectively a single variable to represent the status of the whole of the non-heads part of the marks list. Luckily the strings that replace actual tag names, in the `singletag' case, are not themselves valid tag names. So they can be detected directly. (Also, `singletag' was not quite right anyway: really it meant that the tag name(s) had been abbreviated.) No functional change. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- gitk | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gitk b/gitk index 805a1c7..42fa41a 100755 --- a/gitk +++ b/gitk @@ -6570,7 +6570,6 @@ proc drawtags {id x xt y1} { if {$ntags > $maxtags || [totalwidth $marks mainfont $extra] > $maxwidth} { # show just a single "n tags..." tag - set singletag 1 if {$ntags == 1} { set marks [list "tag..."] } else { @@ -6620,7 +6619,7 @@ proc drawtags {id x xt y1} { $xr $yt $xr $yb $xl $yb $x [expr {$yb - $delta}] \ -width 1 -outline $tagoutlinecolor -fill $tagbgcolor \ -tags tag.$id] - if {$singletag} { + if {[regexp {^tag\.\.\.|^\d+ } $tag]} { set tagclick [list showtags $id 1] } else { set tagclick [list showtag $tag_quoted 1] -- 2.10.1
[PATCH GITK 4/6] gitk: Provide for config to specify tags not to abbreviate
Tags matching a new multi-valued config option log.noAbbrevTags are not abbreviated. The config setting is in git config logs.* rather than gitk's own configuration, because: - Tools which manage git trees may want to set this, depending on their knowledge of the nature of the tags likely to be present; - Whether this property ought to be set is mostly a property of the contents of the tag namespaces in the tree, not a user preference. (Although of course user preferences are supported.) - Other git utilities (or out of tree utilities) may want to reference this setting for their own display purposes. There will be another, separate, patch to the `git' tree to document this config option. Background motivation: Debian's dgit archive gateway tool generates and uses tags called archive/debian/VERSION. If such a tag refers to a Debian source tree, it is probably very interesting because it refers to a version actually uploaded to Debian by the Debian package maintainer. We would therefore like a way to specify that such tags should be displayed in full. dgit will be able to set an appropriate config setting in the trees it deals with. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- gitk | 13 + 1 file changed, 13 insertions(+) diff --git a/gitk b/gitk index d76f1e3..515d7b0 100755 --- a/gitk +++ b/gitk @@ -6547,6 +6547,14 @@ proc totalwidth {l font extra} { } proc tag_want_unabbrev {tag} { +global noabbrevtags +# noabbrevtags was reversed when we read config, so take first match +foreach pat $noabbrevtags { + set inverted [regsub {^\^} $pat {} pat] + if {[string match $pat $tag]} { + return [expr {!$inverted}] + } +} return 0 } @@ -12138,6 +12146,11 @@ set tclencoding [tcl_encoding $gitencoding] if {$tclencoding == {}} { puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" } +set noabbrevtags {} +catch { +set noabbrevtags [exec git config --get-all log.noAbbrevTags] +} +set noabbrevtags [lreverse [split $noabbrevtags "\n"]] set gui_encoding [encoding system] catch { -- 2.10.1
[PATCH GITK 3/6] gitk: drawtags: Introduce concept of unabbreviated marks
We are going to want to show some tags in full, even if they are long or there are other tags. Do this by filtering the tags into `marks_unabbrev' and `marks'. `marks_unabbrev' bypasses the tag abbreviation, and is put on the front of the marks array after any abbreviation has been done. No functional change right now because no tags are considered `unabbrev'. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- gitk | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/gitk b/gitk index 31aecda..d76f1e3 100755 --- a/gitk +++ b/gitk @@ -6546,6 +6546,10 @@ proc totalwidth {l font extra} { return $tot } +proc tag_want_unabbrev {tag} { +return 0 +} + proc drawtags {id x xt y1} { global idtags idheads idotherrefs mainhead global linespc lthickness @@ -6564,8 +6568,16 @@ proc drawtags {id x xt y1} { set delta [expr {int(0.5 * ($linespc - $lthickness))}] set extra [expr {$delta + $lthickness + $linespc}] +set marks_unabbrev {} if {[info exists idtags($id)]} { - set marks $idtags($id) + set marks {} + foreach tag $idtags($id) { + if {[tag_want_unabbrev $tag]} { + lappend marks_unabbrev $tag + } else { + lappend marks $tag + } + } set ntags [llength $marks] if {$ntags > $maxtags || [totalwidth $marks mainfont $extra] > $maxwidth} { @@ -6577,6 +6589,7 @@ proc drawtags {id x xt y1} { } } } +set marks [concat $marks_unabbrev $marks] set ntags [llength $marks] if {[info exists idheads($id)]} { -- 2.10.1
[PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate
Tags matching a new multi-valued config option log.noAbbrevTags should not be abbreviated. Currently this config option is used only by gitk (and the patch to gitk will come via the gitk maintainer tree). The config setting is in git config logs.* rather than gitk's own configuration, because: - Tools which manage git trees may want to set this, depending on their knowledge of the nature of the tags likely to be present; - Whether this property ought to be set is mostly a property of the contents of the tag namespaces in the tree, not a user preference. (Although of course user preferences are supported.) - Other git utilities (or out of tree utilities) may want to reference this setting for their own display purposes. Background motivation: Debian's dgit archive gateway tool generates and uses tags called archive/debian/VERSION. If such a tag refers to a Debian source tree, it is probably very interesting because it refers to a version actually uploaded to Debian by the Debian package maintainer. We would therefore like a way to specify that such tags should be displayed in full. dgit will be able to set an appropriate config setting in the trees it deals with. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- Documentation/config.txt | 8 1 file changed, 8 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index a0ab66a..6aade4f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2002,6 +2002,14 @@ log.abbrevCommit:: linkgit:git-whatchanged[1] assume `--abbrev-commit`. You may override this option with `--no-abbrev-commit`. +log.noAbbrevTags:: + Each value is a glob pattern, specifying tag nammes which + should always be displayed in full, even when other tags may + be omitted or abbreviated (for example, by linkgit:gitk[1]). + Values starting with `^` specify tags which should be + abbreviated. The order is important: the last match, in the + most-local configuration, wins. + log.date:: Set the default date-time mode for the 'log' command. Setting a value for log.date is similar to using 'git log''s -- 2.10.1
[PATCH 5/5] check-ref-format: New --stdin option
Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- Documentation/git-check-ref-format.txt | 10 -- builtin/check-ref-format.c | 34 +++--- 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt index e9a2657..5a213ce 100644 --- a/Documentation/git-check-ref-format.txt +++ b/Documentation/git-check-ref-format.txt @@ -10,8 +10,9 @@ SYNOPSIS [verse] 'git check-ref-format' [--report-errors] [--normalize] [--[no-]allow-onelevel] [--refspec-pattern] - -'git check-ref-format' [--report-errors] --branch +| --stdin +'git check-ref-format' [--report-errors] --branch +| --stdin DESCRIPTION --- @@ -109,6 +110,11 @@ OPTIONS If any ref does not check OK, print a message to stderr. (By default, git check-ref-format is silent.) +--stdin:: + Instead of checking on ref supplied on the command line, + read refs, one per line, from stdin. The exit status is + 0 if all the refs were OK. + EXAMPLES diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 559d5c2..87f52fa 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -76,6 +76,7 @@ static int check_one_ref_format(const char *refname) int cmd_check_ref_format(int argc, const char **argv, const char *prefix) { int i; + int use_stdin = 0; if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); @@ -93,6 +94,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) check_branch = 1; else if (!strcmp(argv[i], "--report-errors")) report_errors = 1; + else if (!strcmp(argv[i], "--stdin")) + use_stdin = 1; else usage(builtin_check_ref_format_usage); } @@ -100,8 +103,33 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (check_branch && (flags || normalize)) usage(builtin_check_ref_format_usage); - if (! (i == argc - 1)) - usage(builtin_check_ref_format_usage); + if (!use_stdin) { + if (! (i == argc - 1)) + usage(builtin_check_ref_format_usage); + + return check_one_ref_format(argv[i]); + } else { + char buffer[2048]; + int worst = 0; - return check_one_ref_format(argv[i]); + if (! (i == argc)) + usage(builtin_check_ref_format_usage); + + while (fgets(buffer, sizeof(buffer), stdin)) { + char *newline = strchr(buffer, '\n'); + if (!newline) { + fprintf(stderr, "%s --stdin: missing final newline or line too long\n", *argv); + exit(127); + } + *newline = 0; + int got = check_one_ref_format(buffer); + if (got > worst) + worst = got; + } + if (!feof(stdin)) { + perror("reading from stdin"); + exit(127); + } + return worst; + } } -- 2.10.1
[PATCH 4/5] check-ref-format: New --report-errors option
Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- Documentation/git-check-ref-format.txt | 8 ++-- builtin/check-ref-format.c | 10 -- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt index 8611a99..e9a2657 100644 --- a/Documentation/git-check-ref-format.txt +++ b/Documentation/git-check-ref-format.txt @@ -8,10 +8,10 @@ git-check-ref-format - Ensures that a reference name is well formed SYNOPSIS [verse] -'git check-ref-format' [--normalize] +'git check-ref-format' [--report-errors] [--normalize] [--[no-]allow-onelevel] [--refspec-pattern] -'git check-ref-format' --branch +'git check-ref-format' [--report-errors] --branch DESCRIPTION --- @@ -105,6 +105,10 @@ OPTIONS with a status of 0. (`--print` is a deprecated way to spell `--normalize`.) +--report-errors:: + If any ref does not check OK, print a message to stderr. +(By default, git check-ref-format is silent.) + EXAMPLES diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 020ebe8..559d5c2 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -9,7 +9,7 @@ static const char builtin_check_ref_format_usage[] = "git check-ref-format [--normalize] [] \n" -" or: git check-ref-format --branch "; +" or: git check-ref-format [] --branch "; /* * Return a copy of refname but with leading slashes removed and runs @@ -51,6 +51,7 @@ static int check_ref_format_branch(const char *arg) static int normalize = 0; static int check_branch = 0; static int flags = 0; +static int report_errors = 0; static int check_one_ref_format(const char *refname) { @@ -61,8 +62,11 @@ static int check_one_ref_format(const char *refname) got = check_branch ? check_ref_format_branch(refname) : check_refname_format(refname, flags); - if (got) + if (got) { + if (report_errors) + fprintf(stderr, "bad ref format: %s\n", refname); return 1; + } if (normalize) { printf("%s\n", refname); free((void*)refname); @@ -87,6 +91,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) flags |= REFNAME_REFSPEC_PATTERN; else if (!strcmp(argv[i], "--branch")) check_branch = 1; + else if (!strcmp(argv[i], "--report-errors")) + report_errors = 1; else usage(builtin_check_ref_format_usage); } -- 2.10.1
[PATCH 0/5] git check-ref-format --stdin --report-errors
I wanted to be able to syntax check lots of proposed refs quickly (please don't ask why - it's complicated!) So I added a --stdin option to git-check-ref-format. Also it has --report-errors now too so you can get some kind of useful error message if it complains. It's still not really a good batch mode but it's good enough for my use case. To improve it would involve a new command line option to offer a suitable stdout output format. There are three small refactoring patches and the two patches with new options and corresponding docs. Thanks for your attention. FYI I am not likely to need this again in the near future: it's a one-off use case. So my effort for rework is probably limited. I thought I'd share what I'd done in what I hope is a useful form, anyway. Regards, Ian.
[PATCH 2/5] check-ref-format: Refactor to make --branch code more common
We are going to want to permit other options with --branch. So, replace the special case with just an entry for --branch in the parser for ordinary options, and check for option compatibility at the end. No overall functional change. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- builtin/check-ref-format.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 4d56caa..f12c19c 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -49,13 +49,19 @@ static int check_ref_format_branch(const char *arg) } static int normalize = 0; +static int check_branch = 0; static int flags = 0; static int check_one_ref_format(const char *refname) { + int got; + if (normalize) refname = collapse_slashes(refname); - if (check_refname_format(refname, flags)) + got = check_branch + ? check_ref_format_branch(refname) + : check_refname_format(refname, flags); + if (got) return 1; if (normalize) printf("%s\n", refname); @@ -68,9 +74,6 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); - if (argc == 3 && !strcmp(argv[1], "--branch")) - return check_ref_format_branch(argv[2]); - for (i = 1; i < argc && argv[i][0] == '-'; i++) { if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], "--print")) normalize = 1; @@ -80,9 +83,15 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) flags &= ~REFNAME_ALLOW_ONELEVEL; else if (!strcmp(argv[i], "--refspec-pattern")) flags |= REFNAME_REFSPEC_PATTERN; + else if (!strcmp(argv[i], "--branch")) + check_branch = 1; else usage(builtin_check_ref_format_usage); } + + if (check_branch && (flags || normalize)) + usage(builtin_check_ref_format_usage); + if (! (i == argc - 1)) usage(builtin_check_ref_format_usage); -- 2.10.1
[PATCH 1/5] check-ref-format: Refactor out check_one_ref_format
We are going to want to reuse this. No functional change right now. It currently has a hidden memory leak if --normalize is used. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- builtin/check-ref-format.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index eac4994..4d56caa 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -48,12 +48,22 @@ static int check_ref_format_branch(const char *arg) return 0; } +static int normalize = 0; +static int flags = 0; + +static int check_one_ref_format(const char *refname) +{ + if (normalize) + refname = collapse_slashes(refname); + if (check_refname_format(refname, flags)) + return 1; + if (normalize) + printf("%s\n", refname); +} + int cmd_check_ref_format(int argc, const char **argv, const char *prefix) { int i; - int normalize = 0; - int flags = 0; - const char *refname; if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); @@ -76,13 +86,5 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (! (i == argc - 1)) usage(builtin_check_ref_format_usage); - refname = argv[i]; - if (normalize) - refname = collapse_slashes(refname); - if (check_refname_format(refname, flags)) - return 1; - if (normalize) - printf("%s\n", refname); - - return 0; + return check_one_ref_format(argv[i]); } -- 2.10.1
[PATCH 3/5] check-ref-format: Abolish leak of collapsed refname
collapse_slashes always returns a value from xmallocz. Right now this leak is not very interesting, since we only call check_one_ref_format once. Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk> --- builtin/check-ref-format.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index f12c19c..020ebe8 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -63,8 +63,10 @@ static int check_one_ref_format(const char *refname) : check_refname_format(refname, flags); if (got) return 1; - if (normalize) + if (normalize) { printf("%s\n", refname); + free((void*)refname); + } } int cmd_check_ref_format(int argc, const char **argv, const char *prefix) -- 2.10.1