Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)

2018-11-29 Thread Ian Jackson
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as 
an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"):
> I'll have to take a (lengthy) dinner break now, but this is what I have so
> far: a regression test that verifies the breakage (see the
> `fix-reflog-action` branch at https://github.com/dscho/git). I'll continue
> after dinner and am confident that this bug will be fixed within the next
> four hours.

That seems super speedy to me!

When you have a fix I will leave it up to the Debian git maintainers
to decide whether they want to cherry pick your fix into their
package, or await an updated upstream branch with rc, or what.

> [Ian:]
> > While you're looking at this, I observe that the fact that the `rebase
> > finished' message also does not honour GIT_REFLOG_ACTION appears to be
> > a pre-existing bug.
> 
> I noticed that, too, but at this point I am only fixing regressions. We
> can try to fix this long-standing bug in the v2.20 cycle.

Right.

Thanks,
Ian.

-- 
Ian JacksonThese opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)

2018-11-29 Thread Ian Jackson
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as 
an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"):
> >  In a successful run with older git I get a reflog like this:
> > 
> >4833d74 HEAD@{0}: rebase finished: returning to 
> > refs/heads/with-preexisting
> >4833d74 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new 
> > upstream file
> >cabd5ec HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file
> >0b362ce HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new 
> > upstream file
> >29653e5 HEAD@{4}: debrebase new-upstream 2.1-1: rebase: checkout 
> > 29653e5a17bee4ac23a68bba3e12bc1f52858ac3
> >85e0c46 HEAD@{5}: debrebase: launder for new upstream
...
> >  This breaks the test because my test suite is checking that I set
> >  GIT_REFLOG_ACTION appropriately.
> > 
> >  If you want I can provide a minimal test case but this should suffice
> >  to see the bug I hope...
> 
> This should be plenty for me to get going. Thank you!

Happy hunting.

While you're looking at this, I observe that the fact that the `rebase
finished' message also does not honour GIT_REFLOG_ACTION appears to be
a pre-existing bug.

(In general one often can't rely on GIT_REFLOG_ACTION still being set
because the rebase might have been interrupted and restarted, which I
think is why my test case looks for it in the initial `checkout'
message.)

Regards,
Ian.

-- 
Ian JacksonThese opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)

2018-11-29 Thread Ian Jackson
Johannes Schindelin writes ("Re: [PATCH] rebase: mark the C reimplementation as 
an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)"):
> if you could pry more information (or better information) out of that bug
> reporter, that would be nice. Apparently my email address is blacklisted
> by his mail provider, so he is unlikely to have received my previous mail
> (nor will he receive this one, I am sure).

(I did receive this mail.  Sorry for the inconvenience, which sadly is
inevitable occasionally in the modern email world.  FTR in future feel
free to send the bounce to postmaster@chiark and I will make a
you-shaped hole in my spamfilter.  Also with Debian bugs you can
launder your messages by, eg, emailing 914695-submitter@bugs.)

> > > At https://bugs.debian.org/914695 is a report of a test regression in
> > > an outside project that is very likely to have been triggered by the
> > > new faster rebase code.

As I wrote in the bug report last night:

 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914695#15

 I have investigated and the bug seems to be that git-rebase --onto now
 fails to honour GIT_REFLOG_ACTION for the initial checkout.

 In a successful run with older git I get a reflog like this:

   4833d74 HEAD@{0}: rebase finished: returning to refs/heads/with-preexisting
   4833d74 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new 
upstream file
   cabd5ec HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file
   0b362ce HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new upstream 
file
   29653e5 HEAD@{4}: debrebase new-upstream 2.1-1: rebase: checkout 
29653e5a17bee4ac23a68bba3e12bc1f52858ac3
   85e0c46 HEAD@{5}: debrebase: launder for new upstream

 With a newer git I get this:

   6d3fb91 HEAD@{0}: rebase finished: returning to refs/heads/master
   6d3fb91 HEAD@{1}: debrebase new-upstream 2.1-1: rebase: Add another new 
upstream file
   86c0721 HEAD@{2}: debrebase new-upstream 2.1-1: rebase: Edit the .c file
   50ba56c HEAD@{3}: debrebase new-upstream 2.1-1: rebase: Add a new upstream 
file
   8272825 HEAD@{4}: rebase: checkout 8272825bb4ff6eba89afa936e32b6460f963a36a
   c78db55 HEAD@{5}: debrebase: launder for new upstream

 This breaks the test because my test suite is checking that I set
 GIT_REFLOG_ACTION appropriately.

 If you want I can provide a minimal test case but this should suffice
 to see the bug I hope...

Regards
Ian.

-- 
Ian JacksonThese opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: want option to git-rebase

2018-06-20 Thread Ian Jackson
Jonathan Nieder writes ("Re: want  option to git-rebase"):
> Oh, good catch.  "git rebase" already generally does the right thing
> when GIT_REFLOG_ACTION is set (by only appending to it and never
> replacing it).

Great.  I indeed did not know about this.

> Ian, does that work well for you?  If so, any ideas where it should go
> in the documentation to be more discoverable for next time?

Thanks for asking exactly the right question :-).

I didn't make a record of exactly where I looked but I'm pretty sure I
looked at the manpages for git-reflog and git-rebase.  I think I
probably also looked at git-update-ref; I have read git-update-ref a
number of times.

Right now in Debian unstable I see that none of these places document
this convention.

I think git-reflog ought to mention it, so that it says where the
information it provides comes from.

It also ought to be mentioned in git-update-ref, because all callers
of git-update-ref need to implement it !

Indeed, because I didn't know about this convention, dgit and
git-debrebase do not honour it.  At least in my case, if it had been
in git-update-ref I would have implemented it myself and then I would
obviously have thought of making use of it myself.


Also, I have to say, the documentation for GIT_REFLOG_ACTION
in git(1) is very obscure.  It sort of generally waffles around what
it is for, but it does not say:
 * what does this variable contain
 * who can and should set it
 * who should consume it
 * what the rules are for modifying it

I don't think simply adding a cross-reference to GIT_REFLOG_ACTION in
git(1) would be sufficient, without also improving this part.


The explanations provided by you and Johannes, here in these emails,
are much much better:

> >> "git rebase" sets this itself, so it doesn't solve your problem.
> >
> > If it does so unconditionally, then that is a bug. If a script
> > wants to set GIT_REFLOG_ACTION, but finds that it is already set,
> > then it must not change the value. set_reflog_action in
> > git-sh-setup does the right thing.
> >
> > So, if there is another script or application around git-rebase, then it
> > should just set GIT_REFLOG_ACTION (if it is not already set) and export the
> > environment variable to git-rebase.
> 
> Oh, good catch.  "git rebase" already generally does the right thing
> when GIT_REFLOG_ACTION is set (by only appending to it and never
> replacing it).

Maybe some of this prose, which explains things quite well, could be
reworked into a form suitable for the git docs.  (Even though there
seems to be disagreement about whether a subcommand may *append* to
GIT_REFLOG_ACTION; which, ISTM, is a practice which ought to be
encouraged rather than discouraged.)


Regards,
Ian.

-- 
Ian JacksonThese opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: want option to git-rebase

2018-06-19 Thread Ian Jackson
Jonathan Nieder writes ("Re: want  option to git-rebase"):
> Ian Jackson wrote[1]:
> > git-rebase leaves entries like this in the reflog:
> >
> >   c15f4d5391 HEAD@{33}: rebase: checkout 
> > c15f4d5391ff07a718431aca68a73e672fe8870e
...
>  GIT_REFLOG_ACTION
>   When a ref is updated, reflog entries are created to keep
>   track of the reason why the ref was updated (which is
>   typically the name of the high-level command that updated the
>   ref), in addition to the old and new values of the ref. A
>   scripted Porcelain command can use set_reflog_action helper
>   function in git-sh-setup to set its name to this variable when
>   it is invoked as the top level command by the end user, to be
>   recorded in the body of the reflog.
> 
> "git rebase" sets this itself, so it doesn't solve your problem.

Hrm.

> Can you say more about what your tool does?  I'm wondering if it would
> make sense for it to use lower-level commands where GIT_REFLOG_ACTION
> applies, instead of the more user-facing git rebase.

Sure.

http://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git-manpage/dgit.git/git-debrebase.1

See the description of git-rebase new-upstream.  It does a lot of
complicated work, synthesising a new pair of commits using plumbing
etc., and then does
  git rebase --onto  

If the user says git rebase --abort, everything should be undone.

Another alternative solution would be to be able to make git reflog
entries without actually updating any ref.

Ian.

-- 
Ian JacksonThese opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: git signed push server-side [and 3 more messages]

2017-08-26 Thread Ian Jackson
Hi.  Thanks to both of you for your helpful comments.

Jonathan Nieder writes ("Re: git signed push server-side"):
> Ian Jackson wrote[1]:
> > 2. git-receive-pack calls gpg (Debian #852684)
> >
> > It would be better if it called gpgv.
...
> think respecting gpg.program would be nicer.  Is there a reason not
> to do that?

I think it very unlikely that anyone would want git-receive-pack's
signed push facility to end up running gpg rather than gpgv.  But, the
git-receive-pack functions here are building blocks which need quite a
lot of extra work to use, anyway.  So having all callers have to pass
-c gpg.program would be quite tolerable, if a bit ugly.

> I suspect receive-pack just forgot to call git_gpg_config.

So, I will send a patch to fix that.

> > 3. No way to specify keyring (Debian #852684, side note)
> >
> > There should be a way to specify the keyring used by
> > git-receive-pack's gpgv invocation.  This should probably be done with
> > a config option, receive.certKeyring perhaps.
> 
> How is the keyring configured for other commands that use GPG, like
> "git tag -v"?  (Forgive my laziness in not looking it up.)

It's not.  You just get whatever keyring and trust db etc. your gpg
has as the default.  As Junio says, being able to set the keyring
explicitly is not essential to use the signed push feature; one can
make a wrapper for gpgv, or set GNUPGHOME.

It just seemed to me that the current interface is not very
convenient.  If it is controversial to provide a more convenient
interface, then I will work with what there is.

> > 4. Trouble with the nonce (Debian #852688 part 2)
...
> I also wonder why you say the git configuration system is unsuited to
> keeping secrets.  E.g. passing an include.path setting with -c or
> GIT_CONFIG_PARAMETERS should avoid the kinds of trouble you described.

I was not aware of include.path.  That would solve this aspect of the
problem (but not the rollover aspect, of course).
GIT_CONFIG_PARAMETERS is not documented.

> > (ii) At some later point, the following enhancement: When
> > !stateless_rpc, certNonceSeedsFile is ignored except that if neither
> > it nor the old certNonceSeed is set, the signed push feature is
> > disabled.
> 
> That seems like an awkward interface.  Shouldn't there be at least
> another config variable to enable signed push without making up a seed
> or filename?

Perhaps.  I don't have a strong opinion about the config parameter
names and semantics.  This seems like a matter of taste and I'm happy
to just do whatever others want.

> >In this state we always get a fresh nonce (from a suitable
> > system random source).
> 
> How does this work with stateless_rpc?  (See "Session State" in
> Documentation/technical/http-protocol.txt.)

It doesn't, which is why I propose this only if !stateless_rpc.

> > Nontrivial because current git doesn't seem to
> > have a "get suitable random number" function, and the mess that is the
> > semantics of /dev/*random* files means that providing one is going to
> > be controversial.
> 
> I think you're overestimating how much pushback adding such a thing
> would get.

Well, I'm happy to give it ago.  (NB: I will use /dev/urandom on
Linux.)

> More references:
> 
> - JGit's push cert handling:
>   https://git.eclipse.org/r/#/q/message:cert

Really helpful, thanks.  I will read these.  (Shame that I can't even
view that information without running their javascript!)

I ended up reading this
 
https://github.com/sitaramc/gitolite/blob/cf062b8bb6b21a52f7c5002d33fbc950762c1aa7/contrib/hooks/repo-specific/save-push-signatures

> - Gerrit's push cert handling:
>   https://gerrit-review.googlesource.com/q/project:gerrit+message:gpg

(This is an empty page for me, even with javascript turned on.)

I'll probably implement the refs/meta/push-certs:REF@{cert} special
branch thing from JGit.  Can anyone point me at a public repo which
has such push-certs recorded, so I can check the details and make my
thing do stuff the same way ?

Junio C Hamano writes ("Re: git signed push server-side"):
> Jonathan Nieder <jrnie...@gmail.com> writes:
> > Ian Jackson wrote[1]:
> >> Proposed change: provide the full fingerprint instead.  Do this
> >> for every caller of gpg-interface.c.
> >
> > Sounds sane.
> 
> I probably should react a bit stronger against that "instead", as
> Ian will not be writing the world's first server side hook that uses
> this interface.

Anyone who is verifying signatures and doing anything with the
16-character key id other than logging it, has a serious security bug.

> But on the other hand, the value of this environment is not meant to
> be used to make de

git signed push server-side

2017-08-25 Thread Ian Jackson
I have been investigating git signed pushes.  I found a number of
infelicities in the server side implementation which make using this
in practice rather difficult.  I'm emailing here (before writing
patches) to see what people think of my proposed changes.


1. PUSH_CERT_KEY has truncated keyid (Debian #852647)

I see this:
  GIT_PUSH_CERT_KEY=A3DBCBC039B13D8A

There is almost no purpose for which this 64-bit keyid can be safely
used.  The full key fingerprint should be provided instead.

Proposed change: provide the full fingerprint instead.  Do this
for every caller of gpg-interface.c.


2. git-receive-pack calls gpg (Debian #852684)

It would be better if it called gpgv.  gpg does all sorts of
complicated things, including automatically starting or connecting to
a gpg-agent, which are not appropriate for use in a daemon on a
server.

Additionally, I find that passing -c gpg.program=/usr/bin/gpgv
to git receive-pack is not effective, and there seems to be no
sensible way to specify the keyrings to use (although that could be
done by setting GNUPGHOME perhaps).

Proposed change: call gpgv instead (and make any needed changes to
adapt to gpgv).  Do this only when we are in git-receive-pack; other
call sites of gpg-interface.c will continue to use gpg.


3. No way to specify keyring (Debian #852684, side note)

There should be a way to specify the keyring used by
git-receive-pack's gpgv invocation.  This should probably be done with
a config option, receive.certKeyring perhaps.


4. Trouble with the nonce (Debian #852688 part 2)

To use the signed push feature it is necessary to provide a nonce seed
to git-receive-pack.

The docs say the seed must be secret but there is no documented way to
pass this seed to git that does not either write it to a git
configuration file somewhere, or pass it on a command line.  The git
configuration system is unsuited to keeping secrets.  Command lines
can be seen in ps etc.

Additionally, the seed should be changed occasionally (since
unchangeable secrets are a bad idea).  Ideally this should be done
automatically.

Analysis: the seed is used to allow the server to mostly-statelessly
verify the freshness of a client's nonce.  (This is necessary only in
connectionless transport protocols, "stateless_rpc" as
builtin/receive-pack.c has it.)

Proposed fix (in two parts):

(i) Provide a new config option receive.certNonceSeedsFile.  It
contains seeds, one per line.  When stateless_rpc, we send a nonce
computed from the first seed.  We accept nonces computed from any of
the listed seeds.  The documentation will say that the file should
normally contain two seeds; rollover is achieved by mving into place a
new seed at the top, and dropping one from the bottom.  An example
script will be provided.

(ii) At some later point, the following enhancement: When
!stateless_rpc, certNonceSeedsFile is ignored except that if neither
it nor the old certNonceSeed is set, the signed push feature is
disabled.  In this state we always get a fresh nonce (from a suitable
system random source).  Nontrivial because current git doesn't seem to
have a "get suitable random number" function, and the mess that is the
semantics of /dev/*random* files means that providing one is going to
be controversial.


5. There are no docs on how to use this feature properly
   (Debian #852695, #852688 part 1)

Using the signed push feature requires careful programming on the
server side.  There should be a doc explaining how to do this.

Proposed fix: provide a .txt file containing much the same contents as
seen here:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852695


If I send patches like the above, would they be welcomed (subject to
detailed review, of course) ?

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: RFC: Another proposed hash function transition plan

2017-03-08 Thread Ian Jackson
Linus Torvalds writes ("Re: RFC: Another proposed hash function transition 
plan"):
> Also, since 256 isn't evenly divisible by 6, and because you'd want
> some way to explictly disambiguate the new hashes, the rule *could* be
> that the ASCII representation of a new hash is the base64 encoding of
> the 258-bit value that has "10" prepended to it as padding.
> 
> That way the first character of the hash would be guaranteed to not be
> a hex digit, because it would be in the range [g-v] (indexes 32..47).

We should arrange for this to be an uppercase, not a lowercase,
letter, for the reasons I explained in my own proposal.  To summarise:
It would be undesirable to further increase the overlap between object
names and ref names.  Few people use uppercase in ref names because of
the case-insensitive filesystem problem; so object names starting with
uppercase ascii are distinct from most object names.

> Of course, having written that, I now realize how it would cause
> problems for the usual shit-for-brains case-insensitive filesystems.
> So I guess base64 encoding doesn't work well for that reason.

AFAIAA object names occur in publicly-visible filenames only in notes
tree objects, which are manipulated by git internally and do not
necessarily need to appear in the filesystem.

The filenames in .git/objects/ can be in whatever encoding we like, so
are not an obstacle.

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: RFC: Another proposed hash function transition plan

2017-03-07 Thread Ian Jackson
Jonathan Nieder writes ("RFC: Another proposed hash function transition plan"):
> This past week we came up with this idea for what a transition to a new
> hash function for Git would look like.  I'd be interested in your
> thoughts (especially if you can make them as comments on the document,
> which makes it easier to address them and update the document).

Thanks for this.

This is a reasonable plan.  It corresponds to approaches (2) and (B)
of my survey mail from the other day.  Ie, two parallel homogeneous
hash trees, rather than a unified but heterogeneous hash tree, with
old vs new object names distinguished by length.

I still prefer my proposal with the mixed hash tree, mostly because
the handling of signatures here is very awkward, and because my
proposal does not involve altering object ids stored other than in the
git object graph (eg CI system databases, etc.)

One thing you've missed, I think, is notes: notes have to be dealt
with in a more complicated way.  Do you intend to rewrite the tree
objects for notes commits so that the notes are annotations for the
new names for the annotated objects ?  And if so, when ?

Also I think you need to specify how abbreviated object names are
interpreted.

Regards,
Ian.


Re: Transition plan for git to move to a new hash function

2017-03-05 Thread Ian Jackson
brian m. carlson writes ("Re: Transition plan for git to move to a new hash 
function"):
> Instead, I was referring to areas like the notes code.  It has extensive
> use of the last byte as a type of lookup table key.  It's very dependent
> on having exactly one hash, since it will always want to use the last
> byte.

You mean note_tree_search ?  (My tree here may be a bit out of date.)
This doesn't seem difficult to fix.  The nontrivial changes would be
mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.

It's true that like most of git there's a lot of hardcoded `sha1'.


Are you arguing in favour of "replace git with git2 by simply
s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
Takeup of the new `git2' would be very slow because of the pain
involved.

Any sensible method of moving to a new hash that isn't "make a
completely incompatible new version of git" is going to involve
teaching the code we have in git right now to handle new hashes as
well as sha1 hashes.

Even if the plan is to try to convert old data, rather than keep it
and be able to refer to it from new data, something will have to be
able to parse old packfiles, old commits, old tags, old notes,
etc. etc. etc.  Either that's going to be some separate conversion
utility, or it has to be the same code in git that's there already.[1]

The ability to handle both old-format and new-format data can be
achieved in the code by doing away with the hardcoded sha1s, so that
instead the hash is an abstract data type with operations like
"initialise", "compare", "get a nybble", etc.  We've already seen
patches going in this direction.

[1] I've heard suggestions here that instead we should expect users to
"git1 fast-export", which you would presumably feed into "git2
fast-import".  But what is `git1' here ?  Is it the current git
codebase frozen in time ?  I don't think it can be.  With this
conversion strategy, we will need to maintain git1 for decades.  It
will need portability fixes, security fixes, fixes for new hostile
compiler optimisations, and so on.  The difficulty of conversion means
there will be pressure to backport new features from `git2' to `git1'.
(Also this approach means that all signatures are definitively lost
during the conversion process.)

So if we want to provide both `git1' and `git2', it's still better to
compile `git' and `git2' from the same codebase.  But if we do that,
the resulting ifdeffery and/or other hash abstractions are most of the
work to be hash-agile.  It's just the difference between a
compile-time and runtime switch.

I think the incompatibile approach is much more work in the medium and
long term - and it leads to a longer transition period.


Bear in mind that our objective is not to minimise the time until the
new version of git is available.  Our objective is to minimise the
time until (most) people are using it.  An approach which takes longer
for the git community to develop, but which is easier to deploy, can
easily be better.

Or maybe the objective is to minimise overall effort.  In which case
more work on git, for an easier transition for all the users, seems
like a no-brainer.  I think this is arguably true even from the point
of view of effort amongst the community of git contributors.  git
contributors start out as git users - and if git's users are all busy
struggling with a difficult transition, they will have less time to
improve other stuff and will tend less to get involved upstream.  (And
they may be less inclined to feel that the git upstream developers
understand their needs well.)

The better alternative is to adopt a plan that has a clear and
straightforward transition for users, and ask git users to help with
implementation.

I think many git users, including sophisticated users and competent
organisations, are concerned about sha1.  Currently most of those
users will find it difficult to help, because it's not clear to them
what needs to be done.

Thanks,
Ian.


Re: SHA1 collisions found

2017-03-03 Thread Ian Jackson
 with a hash function indication
 (sha256: or blake_ or H)

 (B) Infer the hash function from the object name length
 (and do some kind of bodge for abbreviated object names).

 (C) Hash function is implicit from context.  (This is compatible with
 (2) only, because (1) requires any object to be able to refer to
 any hash function.)

I think (A) is best because it means everything is unambiguous, and it
allows future hash function changes without further difficulty.

(B) is a reasonable possibility although the abbreviated object name
bodge would be quite ugly and probably involve thinking about several
annoying edge cases.

I think (C) is really bad, because it instantly makes all existing
application code which calls git to be buggy.  Such application code
would need to be adjusted to know for itself which of the object names
it has recorded are what hash function, and explicitly specify this to
its git operations somehow.

All of these options involve updating many callers of git.  In any
case any git caller which explicitly checks the object name length
will need to be changed.  For (a), many git callers which match object
names using something like [0-9a-f]+ rather than \w+ will need to be
changed - but at least it's a simple change with little semantic
import.

(A) has the additional advantage that it becomes possible to make
object names syntactically distinguishable from ref names.


The final argument I would make is this:

We don't know what hash function research will look like in 10-20
years.  We would like to not have a bunch of pain again.  So ideally
we would deploy a framework now that would let us switch hash function
again without further history-rewriting.

(1)(A) and perhaps (1)(B) are the only options which support this
well.


Ian.

[1] I'm going to keep assuming that the bikeshed will be blue, because
I think BLAKE2b has is a better choice.  It has probably had more
serious people looking at it than SHA-3, at least, and it has good
performance.  The web page has an impressive adoption list - probably
wider than SHA-3.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: Transition plan for git to move to a new hash function

2017-03-02 Thread Ian Jackson
brian m. carlson writes ("Re: Transition plan for git to move to a new hash 
function"):
> On Mon, Feb 27, 2017 at 01:00:01PM +, Ian Jackson wrote:
> > Objects of one hash may refer to objects named by a different hash
> > function to their own.  Preference rules arrange that normally, new
> > hash objects refer to other new hash objects.
> 
> The existing codebase isn't really intended with that in mind.

Yes.  I've seen the attempts to start to replace char* with a hash
struct.

> I like Peff's suggested approach in which we essentially rewrite history
> under the hood, but have a lookup table which looks up the old hash
> based on the new hash.  That allows us to refer to old objects, but not
> have to share serialized data that mentions both hashes.

I think this means that the when a project converts, every copy of the
history must be rewritten (separately).  Also, this leaves the whole
system lacking in algorithm agililty.  Meaning we may have to do all
of this again some time.

I also think that we need to distinguish old hashes from new hashes in
the command line interface etc.  Otherwise there is a possible
ambiguity.

> > The object name textual syntax is extended.  The new syntax may be
> > used in all textual git objects and protocols (commits, tags, command
> > lines, etc.).
> > 
> > We declare that the object name syntax is henceforth
> >   [A-Z]+[0-9a-z]+ | [0-9a-f]+
> > and that names [A-Z].* are deprecated as ref name components.
> 
> I'd simply say that we have data always be in the new format if it's
> available, and tag the old SHA-1 versions instead.  Otherwise, as Peff
> pointed out, we're going to be stuck typing a bunch of identical stuff
> every time.  Again, this encourages migration.

The hash identifier is only one character.  Object names are not
normally typed very much anyway.

If you say we must decorate old hashes, then all existing data
everywhere in the world which refers to any git objects by object name
will become invalid.  I don't mean just data in git here.  I mean CI
systems, mailing list archives, commit messages (perhaps in other
version control systems), test cases, and so on.

Ian.


Re: Why BLAKE2?

2017-02-27 Thread Ian Jackson
Markus Trippelsdorf writes ("Re: Why BLAKE2?"):
> On 2017.02.27 at 13:00 +, Ian Jackson wrote:
> > For brevity I will write `SHA' for hashing with SHA-1, using current
> > unqualified object names, and `BLAKE' for hasing with BLAKE2b, using
> > H object names.
> 
> Why do you choose BLAKE2? SHA-2 is generally considered still fine and
> would be the obvious choice. And if you want to be adventurous then
> SHA-3 (Keccak) would be the next logical candidate.

I don't have a strong opinion.  Keccak would be fine too.
We should probably avoid SHA-2.

The main point of my posting was not to argue in favour of a
particular hash function :-).

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Transition plan for git to move to a new hash function

2017-02-27 Thread Ian Jackson
 so.  (In both cases, by setting configuration options.)

Y4: BLAKE by default for new projects.
Conversion enabled for existing projects.
Old git software is going to start rotting.

Default configuration change:
   BLAKE > SHA
   BLAKE enabled (even in trees without working trees)

Suggested multi-project hosting site configuration change:
   Newly created projects should get BLAKE enabled
   Existing projects should retain BLAKE disabled by default
   Button should be provided to start conversion (see below)

Effects:

When creating a new working tree, it starts using BLAKE.

Servers which have been updated will accept BLAKE.

Servers which have not been updated to Y4's git will need a small
configuration change (enabling BLAKE) to cope with the new
projects that are using BLAKE.

To convert a project, an administrator (or project owner) would
set BLAKE to enabled, and SHA to deprecated, on the server(s).  On
the next pull the server will provide ref hints naming BLAKE,
which will get copied to the client's HEAD.  So the client is
infected with BLAKE.

To convert a project branch-by-branch, the administrator would set
BLAKE to enabled but leave SHA enabled.  Then each branch retains
its own hash.  A branch can be converted by pushing a BLAKE commit
to it, or by setting a ref hint on the server (or on the next
client to push).

Y6: BLAKE by default for all projects
Existing projects start being converted infectiously.
It is hard for a project to stop this happening if any of
 their servers are updated.
Old git software is firmly stuffed.

Default configuration change
   SHA deprecated in trees without working trees

Effects:

Existing projects are, by default, `converted', as described
above.

Y8: Clients hate SHA
Clients insist on trying to convert existing projects
It is very hard to stop this happening.
Unrepentant servers start being very hard to use.

Default configuration change
   SHA deprecated (even in trees without working trees)

Effects:

Clients will generate only BLAKE.  Hopefully their server will
accept this!

Y10: Stop accepting new SHA
No-one can manage to make new SHA commits

Default configuration change
   SHA disabled in new trees, except during initial
  `clone', `mirror' and similar

Effects:

Existing SHA history is retained, and copied to new clients and
servers.  But established clients and servers reject any newly
introduced SHA.


-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: SHA1 collisions found

2017-02-24 Thread Ian Jackson
Ian Jackson writes ("Re: SHA1 collisions found"):
> The idea of using the length is a neat trick, but it cannot support
> the dcurrent object name abbreviation approach unworkable.

Sorry, it's late here and my grammar seems to have disintegrated !

Ian.


Re: SHA1 collisions found

2017-02-24 Thread Ian Jackson
Junio C Hamano writes ("Re: SHA1 collisions found"):
> Ian Jackson <ijack...@chiark.greenend.org.uk> writes:
> >  * Therefore the transition needs to be done by giving every object
> >two names (old and new hash function).  Objects may refer to each
> >other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessrily so.

Indeed.  And my latest thoughts involve instead having two parallel
systems of old and new objects.

> *1* In the above toy example, length being 40 vs 64 is used as a
> sign between SHA-1 and the new hash, and careful readers may
> wonder if we should use sha-3,20769079d22... or something like
> that that more explicity identifies what hash is used, so that
> we can pick a hash whose length is 64 when we transition again.

I have an idea for this.  I think we should prefix new hashes with a
single uppercase letter, probably H.

Uppercase because: case-only-distinguished ref names are already
discouraged because they do not work properly on case-insensitive
filesystems; convention is that ref names are lowercase; so an
uppercase letter probably won't appear at the start of a ref name
component even though almost all existing software will treat it as
legal.  So the result is that the new object names are unlikely to
collide with ref names.

(There is of course no need to store the H as a literal in filenames,
so the case-insensitive filesystem problem does not apply to ref
names.)

We should definitely not introduce new punctuation into object names.
That will cause a great deal of grief for existing software which has
to handle git object names and may thy to store them in
representations which assume that they match \w+.

The idea of using the length is a neat trick, but it cannot support
the dcurrent object name abbreviation approach unworkable.

Ian.


Re: SHA1 collisions found

2017-02-24 Thread Ian Jackson
Joey Hess writes ("SHA1 collisions found"):
> https://shattered.io/static/shattered.pdf
> https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
> 
> IIRC someone has been working on parameterizing git's SHA1 assumptions
> so a repository could eventually use a more secure hash. How far has
> that gotten? There are still many "40" constants in git.git HEAD.

I have been thinking about how to do a transition from SHA1 to another
hash function.

I have concluded that:

 * We can should avoid expecting everyone to rewrite all their
   history.

 * Unfortunately, because the data formats (particularly, the commit
   header) are not in practice extensible (because of the way existing
   code parses them), it is not useful to try generate new data (new
   commits etc.) containing both new hashes and old hashes: old
   clients will mishandle the new data.

 * Therefore the transition needs to be done by giving every object
   two names (old and new hash function).  Objects may refer to each
   other by either name, but must pick one.  The usual shape of
   project histories will be a pile of old commits referring to each
   other by old names, surmounted by new commits referrring to each
   other by new names.

 * It is not possible to solve this problem without extending the
   object name format.  Therefore all software which calls git and
   expects to handle object names will need to be updated.

I have been writing a more detailed transition plan.  I hope to post
this within a few days.

Ian.


Re: [PATCH 0/5] git check-ref-format --stdin --report-errors

2016-12-19 Thread Ian Jackson
Michael Haggerty writes ("Re: [PATCH 0/5] git check-ref-format --stdin 
--report-errors"):
> Thanks for your patches. I left some comments about the individual patches.

Thanks for your review.

> I don't know whether this feature will be popular, but it's not a lot of
> code to add it, so it would be OK with me.

Great.

> Especially given that the output is not especially machine-readable, it
> might be more consistent with other commands to call the new feature
> `--verbose` rather than `--report-errors`.

Sure.

> If it is thought likely that scripts will want to leave a pipe open to
> this command and feed it one query at a time, then it would be helpful
> to flush stdout after each reference's result is written. If the
> opposite use case is common (mass processing of refnames), we could
> always add a `--buffer` option like the one that `git cat-file --batch` has.

I think it should be unbuffered by default, so I will make that
change, along with the fixes from your other mails, and resubmit.

Regards,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 1/5] check-ref-format: Refactor out check_one_ref_format

2016-12-19 Thread Ian Jackson
Michael Haggerty writes ("Re: [PATCH 1/5] check-ref-format: Refactor out 
check_one_ref_format"):
> On 11/04/2016 08:13 PM, Ian Jackson wrote:
> > +static int check_one_ref_format(const char *refname)
...
> This function needs to `return 0` if it gets to the end.

Indeed it does.  I'm kind of surprised my compiler didn't spot that.

Thanks for the careful review!

Regards,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 2/5] check-ref-format: Refactor to make --branch code more common

2016-12-19 Thread Ian Jackson
Michael Haggerty writes ("Re: [PATCH 2/5] check-ref-format: Refactor to make 
--branch code more common"):
> On 11/04/2016 08:13 PM, Ian Jackson wrote:
> >  static int normalize = 0;
> > +static int check_branch = 0;
> >  static int flags = 0;
> >  
> >  static int check_one_ref_format(const char *refname)
> >  {
> > +   int got;
> 
> `got` is an unusual name for this variable, and I don't really
> understand what the word means in this context. Is there a reason not to
> use the more usual `err`?

I have no real opinion about the name of this variable.  `err' is a
fine name too.

> > +   if (check_branch && (flags || normalize))
> 
> Is there a reason not to allow `--normalize` with `--branch`?
> (Currently, `git check-ref-format --branch` *does* allow input like
> `refs/heads/foo`.)

It was like that when I found it :-).  I wasn't sure why this
restriction was there so I left it alone.

Looking at it again: AFAICT from the documentation --branch is a
completely different mode.  The effect of --normalize is not to do
additional work, but simply to produce additional output.  It really
means --print-normalized.  --branch already prints output, but AFAICT
it does not collapse slashes.  This seems like a confusing collection
of options.  But, sorting that out is beyond the scope of what I was
trying to do.

In my series I have at least managed not to make any of this any
worse, I think: the --stdin option I introduce applies to both modes
equally, and doesn't make future improvements to the conflict between
--branch and --normalize any harder.

(In _this_ patch, certainly, allowing --normalize with --branch would
be wrong, since _this_ patch is just refactoring.)

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate

2016-11-09 Thread Ian Jackson
Junio C Hamano writes ("Re: [PATCH 5/6] config docs: Provide for config to 
specify tags not to abbreviate"):
> Ian Jackson <ijack...@chiark.greenend.org.uk> writes:
> > This is not correct, because as I have explained, this should be a
> > per-tree configuration:
> 
> I do not have fundamental opposition to make it part of .git/config,
> but the name "gitk.something" or if you are enhancing git-gui at the
> time perhaps "gui.something" would be appropriate.  
> 
> But it is still silly to have this kind of information that is very
> specific to Gitk in two places, one that is pretty Gitk specific
> that core-git does not know anything about, the other that are part
> of the configuration storage of the core-git.  In the longer term,
> it is necessary for them to be accessible from gitk's "Edit ->
> Preferences" mechanism somehow, I would think, rather than forcing
> users to sometimes go to GUI to tweak and sometimes run "git config".

I am proposing to set this configuration setting automatically in
dgit.  Other tools that work with particular git tags would do the
same.  There would be no need for users to do anything.

Having this as an option in a menu would be quite wrong, because it
would end up with the user and the tooling fighting.  This is why I
don't want to put this in gitk's existing config file mechanism.

It would be wrong for dgit to edit the user's gitk config file, for
many reasons.

To put it another way, this setting is a way for a tool like dgit to
communicate with gitk (or other programs which have to make guesses
about how prominently to present certain information to the user).
It's not intended to be a way for users, certainly not non-expert
users, to communicate with gitk.

The way I have structured my proposed patches in gitk would make it
easy to provide a gui option to adjust these settings.  Such a gui
option ought to save its value in the gitk config file, and those
values ought to override what comes from `git config'.

But such a system would not obviate the need for a legitimate way for
programs like dgit to communicate with gitk.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate

2016-11-09 Thread Ian Jackson
Junio C Hamano writes ("Re: [PATCH 5/6] config docs: Provide for config to 
specify tags not to abbreviate"):
> And I do not think we would want "log" or any core side Porcelain
> command to have too many "information losing" options like this
> "truncate refnames down to a point where it is no longer unique and
> meaningful".  GUI tools can get away with doing sos because they can
> arrange these truncated labels to react to end-user input (e.g. the
> truncated Tag in the history display of gitk could be made to react
> to mouse-over and pop-up to show a full name, for example), but the
> output from the core side is pretty much fixed once it is emitted.
> 
> So my first preference would be to teach gitk such a "please
> clarify" UI-reaction, if it does not know how to do so yet.  There
> is no need for a configuration variable anywhere with this approach.

gitk already has a way for the user to find out what the elided tag
names are.  The underlying difficulty is that the situation that the
gitk behaviour is designed for (long tag names, perhaps several to a
commit, not particularly interesting), is not applicable to these
particular tags.

Whether the tag is `particularly interesting' depends, as I say, on
both what tree it is in, and on its name.  It might be appropriate for
terminal-based tools to highlight these tags too, or show them when
tags are not normally displayed.

`core.interestingTags' ?

> If you do want to add a configuration to show fuller name in the
> tag, which would make it unnecessary for the user to do "please
> clarify, as I am hovering over what I want to get details of"
> action, that may also be a good way to go.

I think in my use case, which I hope to become common within Debian,
this is going to be essential.

>  But I think the right
> place to do so would be Edit -> Preferences menu in Gitk, and the
> settings will be stored in ~/.gitk or ~/.config/git/gitk or whatever
> gitk-specific place.

This is not correct, because as I have explained, this should be a
per-tree configuration:

If it can't be a `git config' option, even `git config gui.something',
then I guess I will have to teach gitk to read a config file in
GIT_DIR too.  But I think that is silly given that git already has a
config file reading system which handles per-tree configs.

If we can't get agreement from the git-core developers on a config to
be used, and documented, for any tool which has similar behaviour, I
think the right answer is `git config gitk.', which would
be documented in gitk.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate

2016-11-08 Thread Ian Jackson
Jeff King writes ("Re: [PATCH 5/6] config docs: Provide for config to specify 
tags not to abbreviate"):
> Yeah, I think git's config system was always designed to carry options
> for porcelains outside of git-core itself. So your new option fits into
> that.

Good, thanks.

> I think the two things I found weird were:
> 
>   - it's in the "log" section, which makes me think it's an option for
> git-log. But it's not. I'm not sure what the _right_ section is, but
> hopefully it would make it clear that this is command-agnostic.
> 
> Something like "gui.abbrevTags" might be OK (and as you note, has
> precedence). But of course it's possible that a command like "tig"
> could learn to support it.  I'm not sure if that counts as a GUI or
> not. :)

I don't really have an opinion about the name.  gui.abbrevTags would
be a possibility.  (It's a bit odd that implicitly, the default would
be `*'.)

>   - The description talks about tag abbreviation, but doesn't really
> define it. Not being a gitk user, it was hard for me to figure out
> whether this was even relevant. Does it mean turning
> "refs/tags/v1.0" into "1.0"? From the rest of the series, it sounds
> like no. That should be more clear from the documentation.

I can do that, sure.

By default, gitk doesn't like to use much screen real estate for tags.
If there are long tag names, or many tags, it shows them all as a
single small indication saying just `<tag...|' or whatever with the
literal `tag...', not with the tag value.

Maybe a better name would be
   gui.alwaysShowTags
?

I'm happy to be just told what the name ought to be, if the gitk and
git maintainers can agree.  It seems largely a matter of taste.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 0/6] Provide for config to specify tags not to abbreviate

2016-11-08 Thread Ian Jackson
Markus Hitter writes ("Re: [PATCH 0/6] Provide for config to specify tags not 
to abbreviate"):
> TBH, I see a violation of tool independence with the choice of
> preference storage. Abbreviation of tags isn't a property of the
> repository, but a pure visual thing (screen real estate, whatever),
> so it should be handled by the tool doing the visuals, only.

As I explained in my cover letter, the set of tags which are important
enough not to abbreviate, even if they would normally be abbreviated,
is indeed a property of the repository.

The alternative would be for a tool like gitk to grow an
ever-increasing set of heuristics.  Or, worse, for a tool like dgit
(which knows that archive/* are special) to edit the user's personal
gitk settings.

> Your use case looks like a nice opportunity for
> 
> - adding a Gitk user preference on how long displayed tags are
>   allowed to be (instead of distinguishing between abbreviated and
>   unabbreviated ones; set it to 999 for your use case) and/or

This would be wrong, because it's only certain tags that ought not to
be abbreviated.  The right way to identify those tags is by 1. what
repo they are in 2. what their name is.  (It might be possible to
identify them by content or something - for example, the interesting
archive/* tags all refer to commits whose trees contain debian/ - but
that is getting quite out of hand.)

What you propose are possible general improvements to the abbreviation
system in gitk.  But they do not address the fundamental point that
some tags are much more interesting than others.  It is this latter
point that I am trying to deal with.

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate

2016-11-08 Thread Ian Jackson
Jacob Keller writes ("Re: [PATCH 5/6] config docs: Provide for config to 
specify tags not to abbreviate"):
> On Mon, Nov 7, 2016 at 4:52 PM, Ian Jackson
> <ijack...@chiark.greenend.org.uk> wrote:
> > +log.noAbbrevTags::
> > +   Each value is a glob pattern, specifying tag nammes which
> > +   should always be displayed in full, even when other tags may
> > +   be omitted or abbreviated (for example, by linkgit:gitk[1]).
> > +   Values starting with `^` specify tags which should be
> > +   abbreviated.  The order is important: the last match, in the
> > +   most-local configuration, wins.
> > +
> 
> It seems weird that this description implies some sort of behavior
> change in core git itself, but in fact is only used as a reference for
> other tools that may or may not honor it. I guess the reasoning here
> is to try to get other external tools that abbreviate tags to also
> honor this? But it still seems pretty weird to have a documented
> config that has no code in core git to honor it...

Thanks for your attention.

Yes, I agree that it does seem weird.  But the alternatives seem
worse.  I think it's probably best if options like this (currently
only honoured by out-of-core tools but of general usefulness) are
collected together here.

There is a precedent: `git config gui.encoding' is, according to the
documentation, honoured only by git-gui and gitk.

Calling the config option `gitk.noAbbrevTags' would be possible but
that would invite everyone else to invent their own, which would be
quite annoying.  (Also, gitk does not have any gitk-specific git
config options right now, AIUI.  It does honour `git config
gui.encoding'.)

Would it help to add a sentence to the documentation saying that this
is currently only honoured by gitk ?  (The paragraph for gui.encoding
says something similar.)  Of course I don't know who else abbreviates
tags, but as they gain support they could be added to the docs.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: [PATCH 0/6] Provide for config to specify tags not to abbreviate

2016-11-07 Thread Ian Jackson
Ian Jackson writes ("[PATCH 0/6] Provide for config to specify tags not to 
abbreviate"):
> Please find in the following mails patches which provide a way to make
> gitk display certain tags in full, even if they would normally be
> abbreviated.
> 
> There are four patches to gitk, three to prepare the ground, and one
> to introduce the new feature.
>
> There is one patch for git, to just document the new config variable.

The eagle-eyed reader will spot that that makes 5 patches, not 6.
There are indeed only 5.  The subject mentioning 6 is a mistake -
sorry.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


[PATCH 0/6] Provide for config to specify tags not to abbreviate

2016-11-07 Thread Ian Jackson
Hi.

Please find in the following mails patches which provide a way to make
gitk display certain tags in full, even if they would normally be
abbreviated.

There are four patches to gitk, three to prepare the ground, and one
to introduce the new feature.

There is one patch for git, to just document the new config variable.

I hope this is the right way to submit this series.  Thanks for your
attention.


As I say in the patch "gitk: Provide for config to specify tags not to
abbreviate":

The config setting is in git config logs.* rather than gitk's
own configuration, because:

 - Tools which manage git trees may want to set this, depending
   on their knowledge of the nature of the tags likely to be
   present;

 - Whether this property ought to be set is mostly a property of the
   contents of the tag namespaces in the tree, not a user preference.
   (Although of course user preferences are supported.)

 - Other git utilities (or out of tree utilities) may want to
   reference this setting for their own display purposes.

There will be another, separate, patch to the `git' tree to document
this config option.

Background motivation:

Debian's dgit archive gateway tool generates and uses tags called
archive/debian/VERSION.  If such a tag refers to a Debian source tree,
it is probably very interesting because it refers to a version
actually uploaded to Debian by the Debian package maintainer.

We would therefore like a way to specify that such tags should be
displayed in full.  dgit will be able to set an appropriate config
setting in the trees it deals with.



Ian Jackson (4):
  gitk: Internal: drawtags: Abolish "singletag" variable
  gitk: Internal: drawtags: Idempotently reset "ntags"
  gitk: drawtags: Introduce concept of unabbreviated marks
  gitk: Provide for config to specify tags not to abbreviate

 gitk | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)


Ian Jackson (1):
  config docs: Provide for config to specify tags not to abbreviate

 Documentation/config.txt | 8 
 1 file changed, 8 insertions(+)


-- 
2.10.1



[PATCH GITK 2/6] gitk: Internal: drawtags: Idempotently reset "ntags"

2016-11-07 Thread Ian Jackson
The previous code tracked its change to the length of `marks' by
updateing the variable `ntags'.  This is a bit fragile and cumbersome,
and we are going to want to modify `marks' some more in a moment.

Instead, simply reset ntags to the length of marks, after we have
possibly done any needed abbreviation.

No functional change.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 gitk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gitk b/gitk
index 42fa41a..31aecda 100755
--- a/gitk
+++ b/gitk
@@ -6575,9 +6575,10 @@ proc drawtags {id x xt y1} {
} else {
set marks [list [format "%d tags..." $ntags]]
}
-   set ntags 1
}
 }
+set ntags [llength $marks]
+
 if {[info exists idheads($id)]} {
set marks [concat $marks $idheads($id)]
set nheads [llength $idheads($id)]
-- 
2.10.1



[PATCH GITK 1/6] gitk: Internal: drawtags: Abolish "singletag" variable

2016-11-07 Thread Ian Jackson
We are going to want to make the contents of `marks' somewhat more
complicated in a moment, so it won't be possible to use what is
effectively a single variable to represent the status of the whole of
the non-heads part of the marks list.

Luckily the strings that replace actual tag names, in the `singletag'
case, are not themselves valid tag names.  So they can be detected
directly.

(Also, `singletag' was not quite right anyway: really it meant that
the tag name(s) had been abbreviated.)

No functional change.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 gitk | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gitk b/gitk
index 805a1c7..42fa41a 100755
--- a/gitk
+++ b/gitk
@@ -6570,7 +6570,6 @@ proc drawtags {id x xt y1} {
if {$ntags > $maxtags ||
[totalwidth $marks mainfont $extra] > $maxwidth} {
# show just a single "n tags..." tag
-   set singletag 1
if {$ntags == 1} {
set marks [list "tag..."]
} else {
@@ -6620,7 +6619,7 @@ proc drawtags {id x xt y1} {
   $xr $yt $xr $yb $xl $yb $x [expr {$yb - $delta}] \
   -width 1 -outline $tagoutlinecolor -fill $tagbgcolor \
   -tags tag.$id]
-   if {$singletag} {
+   if {[regexp {^tag\.\.\.|^\d+ } $tag]} {
set tagclick [list showtags $id 1]
} else {
set tagclick [list showtag $tag_quoted 1]
-- 
2.10.1



[PATCH GITK 4/6] gitk: Provide for config to specify tags not to abbreviate

2016-11-07 Thread Ian Jackson
Tags matching a new multi-valued config option log.noAbbrevTags
are not abbreviated.

The config setting is in git config logs.* rather than gitk's
own configuration, because:

 - Tools which manage git trees may want to set this, depending
   on their knowledge of the nature of the tags likely to be
   present;

 - Whether this property ought to be set is mostly a property of the
   contents of the tag namespaces in the tree, not a user preference.
   (Although of course user preferences are supported.)

 - Other git utilities (or out of tree utilities) may want to
   reference this setting for their own display purposes.

There will be another, separate, patch to the `git' tree to document
this config option.

Background motivation:

Debian's dgit archive gateway tool generates and uses tags called
archive/debian/VERSION.  If such a tag refers to a Debian source tree,
it is probably very interesting because it refers to a version
actually uploaded to Debian by the Debian package maintainer.

We would therefore like a way to specify that such tags should be
displayed in full.  dgit will be able to set an appropriate config
setting in the trees it deals with.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 gitk | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gitk b/gitk
index d76f1e3..515d7b0 100755
--- a/gitk
+++ b/gitk
@@ -6547,6 +6547,14 @@ proc totalwidth {l font extra} {
 }
 
 proc tag_want_unabbrev {tag} {
+global noabbrevtags
+# noabbrevtags was reversed when we read config, so take first match
+foreach pat $noabbrevtags {
+   set inverted [regsub {^\^} $pat {} pat]
+   if {[string match $pat $tag]} {
+   return [expr {!$inverted}]
+   }
+}
 return 0
 }
 
@@ -12138,6 +12146,11 @@ set tclencoding [tcl_encoding $gitencoding]
 if {$tclencoding == {}} {
 puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
 }
+set noabbrevtags {}
+catch {
+set noabbrevtags [exec git config --get-all log.noAbbrevTags]
+}
+set noabbrevtags [lreverse [split $noabbrevtags "\n"]]
 
 set gui_encoding [encoding system]
 catch {
-- 
2.10.1



[PATCH GITK 3/6] gitk: drawtags: Introduce concept of unabbreviated marks

2016-11-07 Thread Ian Jackson
We are going to want to show some tags in full, even if they are long
or there are other tags.  Do this by filtering the tags into
`marks_unabbrev' and `marks'.  `marks_unabbrev' bypasses the tag
abbreviation, and is put on the front of the marks array after any
abbreviation has been done.

No functional change right now because no tags are considered
`unabbrev'.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 gitk | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gitk b/gitk
index 31aecda..d76f1e3 100755
--- a/gitk
+++ b/gitk
@@ -6546,6 +6546,10 @@ proc totalwidth {l font extra} {
 return $tot
 }
 
+proc tag_want_unabbrev {tag} {
+return 0
+}
+
 proc drawtags {id x xt y1} {
 global idtags idheads idotherrefs mainhead
 global linespc lthickness
@@ -6564,8 +6568,16 @@ proc drawtags {id x xt y1} {
 set delta [expr {int(0.5 * ($linespc - $lthickness))}]
 set extra [expr {$delta + $lthickness + $linespc}]
 
+set marks_unabbrev {}
 if {[info exists idtags($id)]} {
-   set marks $idtags($id)
+   set marks {}
+   foreach tag $idtags($id) {
+   if {[tag_want_unabbrev $tag]} {
+   lappend marks_unabbrev $tag
+   } else {
+   lappend marks $tag
+   }
+   }
set ntags [llength $marks]
if {$ntags > $maxtags ||
[totalwidth $marks mainfont $extra] > $maxwidth} {
@@ -6577,6 +6589,7 @@ proc drawtags {id x xt y1} {
}
}
 }
+set marks [concat $marks_unabbrev $marks]
 set ntags [llength $marks]
 
 if {[info exists idheads($id)]} {
-- 
2.10.1



[PATCH 5/6] config docs: Provide for config to specify tags not to abbreviate

2016-11-07 Thread Ian Jackson
Tags matching a new multi-valued config option log.noAbbrevTags
should not be abbreviated.  Currently this config option is
used only by gitk (and the patch to gitk will come via the
gitk maintainer tree).

The config setting is in git config logs.* rather than gitk's
own configuration, because:

 - Tools which manage git trees may want to set this, depending
   on their knowledge of the nature of the tags likely to be
   present;

 - Whether this property ought to be set is mostly a property of the
   contents of the tag namespaces in the tree, not a user preference.
   (Although of course user preferences are supported.)

 - Other git utilities (or out of tree utilities) may want to
   reference this setting for their own display purposes.

Background motivation:

Debian's dgit archive gateway tool generates and uses tags called
archive/debian/VERSION.  If such a tag refers to a Debian source tree,
it is probably very interesting because it refers to a version
actually uploaded to Debian by the Debian package maintainer.

We would therefore like a way to specify that such tags should be
displayed in full.  dgit will be able to set an appropriate config
setting in the trees it deals with.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 Documentation/config.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a0ab66a..6aade4f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2002,6 +2002,14 @@ log.abbrevCommit::
linkgit:git-whatchanged[1] assume `--abbrev-commit`. You may
override this option with `--no-abbrev-commit`.
 
+log.noAbbrevTags::
+   Each value is a glob pattern, specifying tag nammes which
+   should always be displayed in full, even when other tags may
+   be omitted or abbreviated (for example, by linkgit:gitk[1]).
+   Values starting with `^` specify tags which should be
+   abbreviated.  The order is important: the last match, in the
+   most-local configuration, wins.
+
 log.date::
Set the default date-time mode for the 'log' command.
Setting a value for log.date is similar to using 'git log''s
-- 
2.10.1



[PATCH 5/5] check-ref-format: New --stdin option

2016-11-04 Thread Ian Jackson
Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 Documentation/git-check-ref-format.txt | 10 --
 builtin/check-ref-format.c | 34 +++---
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt 
b/Documentation/git-check-ref-format.txt
index e9a2657..5a213ce 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -10,8 +10,9 @@ SYNOPSIS
 [verse]
 'git check-ref-format' [--report-errors] [--normalize]
[--[no-]allow-onelevel] [--refspec-pattern]
-   
-'git check-ref-format' [--report-errors] --branch 
+| --stdin
+'git check-ref-format' [--report-errors] --branch
+| --stdin
 
 DESCRIPTION
 ---
@@ -109,6 +110,11 @@ OPTIONS
If any ref does not check OK, print a message to stderr.
 (By default, git check-ref-format is silent.)
 
+--stdin::
+   Instead of checking on ref supplied on the command line,
+   read refs, one per line, from stdin.  The exit status is
+   0 if all the refs were OK.
+
 
 EXAMPLES
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 559d5c2..87f52fa 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -76,6 +76,7 @@ static int check_one_ref_format(const char *refname)
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
int i;
+   int use_stdin = 0;
 
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
@@ -93,6 +94,8 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
check_branch = 1;
else if (!strcmp(argv[i], "--report-errors"))
report_errors = 1;
+   else if (!strcmp(argv[i], "--stdin"))
+   use_stdin = 1;
else
usage(builtin_check_ref_format_usage);
}
@@ -100,8 +103,33 @@ int cmd_check_ref_format(int argc, const char **argv, 
const char *prefix)
if (check_branch && (flags || normalize))
usage(builtin_check_ref_format_usage);
 
-   if (! (i == argc - 1))
-   usage(builtin_check_ref_format_usage);
+   if (!use_stdin) {
+   if (! (i == argc - 1))
+   usage(builtin_check_ref_format_usage);
+
+   return check_one_ref_format(argv[i]);
+   } else {
+   char buffer[2048];
+   int worst = 0;
 
-   return check_one_ref_format(argv[i]);
+   if (! (i == argc))
+   usage(builtin_check_ref_format_usage);
+
+   while (fgets(buffer, sizeof(buffer), stdin)) {
+   char *newline = strchr(buffer, '\n');
+   if (!newline) {
+   fprintf(stderr, "%s --stdin: missing final 
newline or line too long\n", *argv);
+   exit(127);
+   }
+   *newline = 0;
+   int got = check_one_ref_format(buffer);
+   if (got > worst)
+   worst = got;
+   }
+   if (!feof(stdin)) {
+   perror("reading from stdin");
+   exit(127);
+   }
+   return worst;
+   }
 }
-- 
2.10.1



[PATCH 4/5] check-ref-format: New --report-errors option

2016-11-04 Thread Ian Jackson
Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 Documentation/git-check-ref-format.txt |  8 ++--
 builtin/check-ref-format.c | 10 --
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt 
b/Documentation/git-check-ref-format.txt
index 8611a99..e9a2657 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,10 +8,10 @@ git-check-ref-format - Ensures that a reference name is well 
formed
 SYNOPSIS
 
 [verse]
-'git check-ref-format' [--normalize]
+'git check-ref-format' [--report-errors] [--normalize]
[--[no-]allow-onelevel] [--refspec-pattern]

-'git check-ref-format' --branch 
+'git check-ref-format' [--report-errors] --branch 
 
 DESCRIPTION
 ---
@@ -105,6 +105,10 @@ OPTIONS
with a status of 0.  (`--print` is a deprecated way to spell
`--normalize`.)
 
+--report-errors::
+   If any ref does not check OK, print a message to stderr.
+(By default, git check-ref-format is silent.)
+
 
 EXAMPLES
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 020ebe8..559d5c2 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -9,7 +9,7 @@
 
 static const char builtin_check_ref_format_usage[] =
 "git check-ref-format [--normalize] [] \n"
-"   or: git check-ref-format --branch ";
+"   or: git check-ref-format [] --branch ";
 
 /*
  * Return a copy of refname but with leading slashes removed and runs
@@ -51,6 +51,7 @@ static int check_ref_format_branch(const char *arg)
 static int normalize = 0;
 static int check_branch = 0;
 static int flags = 0;
+static int report_errors = 0;
 
 static int check_one_ref_format(const char *refname)
 {
@@ -61,8 +62,11 @@ static int check_one_ref_format(const char *refname)
got = check_branch
? check_ref_format_branch(refname)
: check_refname_format(refname, flags);
-   if (got)
+   if (got) {
+   if (report_errors)
+   fprintf(stderr, "bad ref format: %s\n", refname);
return 1;
+   }
if (normalize) {
printf("%s\n", refname);
free((void*)refname);
@@ -87,6 +91,8 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
flags |= REFNAME_REFSPEC_PATTERN;
else if (!strcmp(argv[i], "--branch"))
check_branch = 1;
+   else if (!strcmp(argv[i], "--report-errors"))
+   report_errors = 1;
else
usage(builtin_check_ref_format_usage);
}
-- 
2.10.1



[PATCH 0/5] git check-ref-format --stdin --report-errors

2016-11-04 Thread Ian Jackson
I wanted to be able to syntax check lots of proposed refs quickly
(please don't ask why - it's complicated!)

So I added a --stdin option to git-check-ref-format.  Also it has
--report-errors now too so you can get some kind of useful error
message if it complains.

It's still not really a good batch mode but it's good enough for my
use case.  To improve it would involve a new command line option to
offer a suitable stdout output format.

There are three small refactoring patches and the two patches with new
options and corresponding docs.

Thanks for your attention.

FYI I am not likely to need this again in the near future: it's a
one-off use case.  So my effort for rework is probably limited.  I
thought I'd share what I'd done in what I hope is a useful form,
anyway.

Regards,
Ian.


[PATCH 2/5] check-ref-format: Refactor to make --branch code more common

2016-11-04 Thread Ian Jackson
We are going to want to permit other options with --branch.

So, replace the special case with just an entry for --branch in the
parser for ordinary options, and check for option compatibility at the
end.

No overall functional change.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 builtin/check-ref-format.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 4d56caa..f12c19c 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -49,13 +49,19 @@ static int check_ref_format_branch(const char *arg)
 }
 
 static int normalize = 0;
+static int check_branch = 0;
 static int flags = 0;
 
 static int check_one_ref_format(const char *refname)
 {
+   int got;
+
if (normalize)
refname = collapse_slashes(refname);
-   if (check_refname_format(refname, flags))
+   got = check_branch
+   ? check_ref_format_branch(refname)
+   : check_refname_format(refname, flags);
+   if (got)
return 1;
if (normalize)
printf("%s\n", refname);
@@ -68,9 +74,6 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
 
-   if (argc == 3 && !strcmp(argv[1], "--branch"))
-   return check_ref_format_branch(argv[2]);
-
for (i = 1; i < argc && argv[i][0] == '-'; i++) {
if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], 
"--print"))
normalize = 1;
@@ -80,9 +83,15 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
flags &= ~REFNAME_ALLOW_ONELEVEL;
else if (!strcmp(argv[i], "--refspec-pattern"))
flags |= REFNAME_REFSPEC_PATTERN;
+   else if (!strcmp(argv[i], "--branch"))
+   check_branch = 1;
else
usage(builtin_check_ref_format_usage);
}
+
+   if (check_branch && (flags || normalize))
+   usage(builtin_check_ref_format_usage);
+
if (! (i == argc - 1))
usage(builtin_check_ref_format_usage);
 
-- 
2.10.1



[PATCH 1/5] check-ref-format: Refactor out check_one_ref_format

2016-11-04 Thread Ian Jackson
We are going to want to reuse this.  No functional change right now.

It currently has a hidden memory leak if --normalize is used.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 builtin/check-ref-format.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index eac4994..4d56caa 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -48,12 +48,22 @@ static int check_ref_format_branch(const char *arg)
return 0;
 }
 
+static int normalize = 0;
+static int flags = 0;
+
+static int check_one_ref_format(const char *refname)
+{
+   if (normalize)
+   refname = collapse_slashes(refname);
+   if (check_refname_format(refname, flags))
+   return 1;
+   if (normalize)
+   printf("%s\n", refname);
+}
+
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
int i;
-   int normalize = 0;
-   int flags = 0;
-   const char *refname;
 
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
@@ -76,13 +86,5 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
if (! (i == argc - 1))
usage(builtin_check_ref_format_usage);
 
-   refname = argv[i];
-   if (normalize)
-   refname = collapse_slashes(refname);
-   if (check_refname_format(refname, flags))
-   return 1;
-   if (normalize)
-   printf("%s\n", refname);
-
-   return 0;
+   return check_one_ref_format(argv[i]);
 }
-- 
2.10.1



[PATCH 3/5] check-ref-format: Abolish leak of collapsed refname

2016-11-04 Thread Ian Jackson
collapse_slashes always returns a value from xmallocz.

Right now this leak is not very interesting, since we only call
check_one_ref_format once.

Signed-off-by: Ian Jackson <ijack...@chiark.greenend.org.uk>
---
 builtin/check-ref-format.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index f12c19c..020ebe8 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -63,8 +63,10 @@ static int check_one_ref_format(const char *refname)
: check_refname_format(refname, flags);
if (got)
return 1;
-   if (normalize)
+   if (normalize) {
printf("%s\n", refname);
+   free((void*)refname);
+   }
 }
 
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
-- 
2.10.1