Re: Git should preserve modification times at least on request

2018-02-19 Thread Theodore Ts'o
On Mon, Feb 19, 2018 at 11:08:19PM +0100, Peter Backes wrote:
> Is there some existing code that could be used? I think I read 
> somewhere that git once did preserve mtimes, but that this code was 
> removed because of the build tool issues. Perhaps that code could 
> simply be put back in, and surrounded by conditions.

I don't believe that was ever true, because the mod times are simply
not *stored* anywhere.

You might want to consider trying to implement it as hook scripts
first, and see how well/poorly it works for you.  I do have a use
case, which is to maintain the timestamps for guilt (a quilt-like
patch management system which uses git).  At the moment I just use a
manual script, save-timestamps, which looks like this:

#!/bin/sh
stat -c "touch -d @%Y %n" * | grep -v "~$" | sort -k3 > timestamps

and then I just include the timestamps file in the commit.  When I
unpack the file elsewhere, I just run the command ". timestamps", or
if I am manually editing a single file, I might do:

grep file-name-of-patch timestamps | sh

This works because the timestamps file has lines which look like
this:

touch -d @1519007593 jbd2-clarify-recovery-checksum-error-msg

I've been too lazy to automate this using a "pre-commit" and
"post-checkout" hook, but it *really* wouldn't be that hard.  Right
now it also only works for files in the top-level of the repo, which
is all I have in my guilt patch repo.  Making this work in a
multiple-directory environment is also left as an exercise to the
reader.  :-)
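
That automation is easy to exercise end to end.  Here is a minimal
sketch of the save/restore cycle (the temporary directory is just for
illustration; the stat/touch trick is the same as above):

```shell
#!/bin/sh
# Sketch: save file mtimes into a "timestamps" file, clobber one, restore it.
set -e
cd "$(mktemp -d)"
touch -d @1519007593 jbd2-clarify-recovery-checksum-error-msg

# Save: one "touch -d @<epoch> <name>" line per file, skipping backup files.
stat -c "touch -d @%Y %n" * | grep -v "~$" | sort -k3 > timestamps

# Clobber the mtime, then restore it by executing the saved commands.
touch jbd2-clarify-recovery-checksum-error-msg
. ./timestamps
stat -c %Y jbd2-clarify-recovery-checksum-error-msg   # prints 1519007593
```

Wiring the first half into .git/hooks/pre-commit (plus a "git add
timestamps") and the ". ./timestamps" half into
.git/hooks/post-checkout is the part left as an exercise.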

Cheers,

- Ted

P.S.  Also left to the reader is making it work on legacy OS's like
Windows.  :-)


Re: git send-email sets date

2018-01-28 Thread Theodore Ts'o
On Sun, Jan 28, 2018 at 03:56:57PM -, Philip Oakley wrote:
> Michal, you may want to hack up an option that can automatically create 
> that format if it is of use. I sometimes find the sort order an issue in 
> some of my mail clients.

If there is a From: header at the beginning of the mail body, it is
used as the Author instead of the From: header in the mail header.  It
would make sense that if there is a Date: header at the beginning of
the mail body, it should be used instead of the Date: field in the
mail header.

The problem is that if existing git clients don't support this, it
wouldn't be safe to start emitting patches with that format for at
least a year or two until the prerequisite version of git gets wide
adoption.  Alternatively, there could be a git option which causes
something like X-Git-Author-Date: to be set in the mail header.
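
For reference, the in-body From: override looks like this at the top
of the patch mail's body; the in-body Date: line is the proposed
extension under discussion, not something existing clients honor
(addresses invented):

```
From: Real Author <author@example.org>
Date: Sun, 28 Jan 2018 15:56:57 +0000

The commit message body starts here...
```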

- Ted


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote:
> 
> I think Ævar is talking about the case of:
> 
>   1. You make 100 objects that aren't referenced. They're loose.
> 
>   2. You run git-gc. They're still too recent to be deleted.
> 
> Right now those recent loose objects sit loose, and have zero cost at
> the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
> them into the cruft pack, and some CPU to zlib and delta-compress them.
> I think that's probably fine, though.

I wasn't assuming that git-gc would create a cruft pack --- although I
guess it could.  As you say, recent loose objects have relatively zero
cost at the time of gc.  To the extent that the gc has to read lots of
loose files, there may be more seeks in the cold cache case, so there
is actually *some* cost to having the loose objects, but it's not
great.

What I was thinking about instead is that in cases where we know we
are likely to be creating a large number of loose objects (whether
they are referenced or not), in a world where we will be calling fsync(2)
after every single loose object being created, pack files start
looking *way* more efficient.  So in general, if you know you will be
creating N loose objects, where N is probably around 50 or so, you'll
want to create a pack instead.

One of those cases is "repack -A", and in that case the loose objects
are all going to be unreferenced, so it would be a "cruft pack".
Another case is importing from another DVCS, where doing an fsync(2)
after every loose object creation (and I have sometimes seen importers
create *all* the objects loose, without using a pack at all) is going
to get extremely slow and painful.
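
The difference is easy to see on a toy repo: a commit leaves its
objects loose, and a single repack (one pack file, one fsync) sweeps
them all up.  A sketch using standard git commands:

```shell
#!/bin/sh
# Sketch: loose objects from a commit, then swept into a single pack.
set -e
cd "$(mktemp -d)"
git init -q
echo hello > f
git add f
git -c user.name=t -c user.email=t@example.org commit -qm init

# Loose objects live under two-hex-digit fan-out directories.
find .git/objects -type f -path '*/objects/[0-9a-f][0-9a-f]/*' | wc -l

git repack -adq   # pack everything reachable, drop the loose copies

find .git/objects -type f -path '*/objects/[0-9a-f][0-9a-f]/*' | wc -l   # 0
ls .git/objects/pack/*.pack | wc -l                                      # 1
```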

> So if we pack all the loose objects into a cruft pack, the mtime of the
> cruft pack becomes the new gauge for "recent". And if we migrate objects
> from old cruft pack to new cruft pack at each gc, then they'll keep
> getting their mtimes refreshed, and we'll never drop them.

Well, I was assuming that gc would be a special case which doesn't
update the mtime of the old cruft pack.  (Or more generally, any time
an object gets copied out of the cruft pack, either to a loose object
or to another pack, the mtime on the source pack should not be
touched.)

   - Ted


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts.  It could be that a good
> > way to resurrect them is to explode them to loose form when they are
> > accessed out of a cruft pack.  We need to worry about interactions
> > with read-only users if we go that route, but with the current
> > "explode unreachable to loose, touch their mtime when they are
> > accessed" scheme ends up ignoring accesses from read-only users that
> > cannot update mtime, so it might not be too bad.
> 
> Wouldn't it also make gc pruning more expensive? Now you can repack
> regularly and loose objects will be left out of the pack, and then just
> rm'd, whereas now it would entail creating new packs (unless the whole
> pack was objects meant for removal).

The idea is that the cruft pack would be all objects that were no
longer referenced.  Hence the proposal that if they ever *are*
accessed, they would be exploded to a loose object at that point.  So
in the common case, the GC would go quickly since the entire pack
could just be rm'ed once it hit the designated expiry time.

Another way of doing things would be to use the mtime of the cruft
pack for the expiry time, and if the cruft pack is ever referenced,
its mtime would get updated.  Yet a third way would be to simply clear
the "cruft" bit if it ever *is* referenced.  In the common case, it
would never be referenced, so it could just get deleted, but in the
case where the user has manually "rescued" a set of commits (perhaps
by explicitly setting a branch head to commit id found from a reflog),
the objects would be saved.

So there are many ways it could be managed.
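
None of this exists in git today, but the "just rm the pack at expiry
time" case is easy to sketch.  The .cruft marker file below is purely
hypothetical; it just stands in for however the cruft bit would get
recorded:

```shell
#!/bin/sh
# Sketch: expire "cruft" packs by mtime.  The *.cruft marker is invented;
# git has no such file.  This only illustrates the rm-at-expiry idea.
set -e
cd "$(mktemp -d)"
mkdir -p .git/objects/pack
cd .git/objects/pack
touch pack-old.pack pack-old.idx pack-old.cruft
touch pack-new.pack pack-new.idx pack-new.cruft
touch -d @100 pack-old.cruft        # ancient mtime: well past any expiry

# Anything whose cruft marker is older than 14 days goes away wholesale.
find . -name '*.cruft' -mtime +14 | while read -r marker; do
    base=${marker%.cruft}
    rm -f "$marker" "$base.pack" "$base.idx"
done
ls                                  # only the pack-new.* files remain
```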

- Ted


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-20 Thread Theodore Ts'o
On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote:
> So..., is it fair to say that the one you sent in
> 
>   https://public-inbox.org/git/20180117193510.ga30...@lst.de/
> 
> is the best variant we have seen in this thread so far?  I'll keep
> that in my inbox so that I do not forget, but I think we would want
> to deal with a hotfix for 2.16 on case insensitive platforms before
> this topic.

It's a simplistic fix, but it will work.  There may very well be
certain workloads which generate a large number of loose objects
(e.g., git repack -A), which will make things go significantly more
slowly as a result.  It might very well be the case that if nothing
else is going on, something like "write all the files without
fsync(2), then use syncfs(2)" would be much faster.  The downside of
that approach is that if you were indeed downloading a multi-gigabyte
DVD image at the same time, the syncfs(2) will force a writeback of
the partially written DVD image, or some other unrelated files.

But if the goal is to just change the default, and then see what
shakes out, and then apply other optimizations later, that's certainly
a valid result.  I've never been fond of the "git repack -A" behavior
where it can generate huge numbers of loose files.  I'd much prefer it
if the other objects ended up in a separate pack file, and then some
other provision made for nuking that pack file some time later.  But
that's expanding the scope significantly over what's currently being
discussed.

- Ted


Re: should any build system legitimately change any tracked files?

2018-01-19 Thread Theodore Ts'o
On Fri, Jan 19, 2018 at 12:51:52PM -0500, Robert P. J. Day wrote:
> that's all the info i was given, but it *seems* clear that the build
> process itself was making changes to one or more tracked files.
> 
>   technically, i guess one can design a build system to do pretty
> much anything, but is it fair to say that this is a really poor design
> decision? admittedly, this isn't specifically a git question, but i'm
> open to opinions on something that strikes me as a bad idea.

I agree that in general it's a bad idea.  I can see how it happens,
though, which is because two things come into tension:

1) The general desire not to check in generated files into the git
repository --- including configure files generated by autoconf,
Makefiles generated by automake, libtool files, etc.

2) Wanting to give users trying to build from source a non-hostile
experience.  Unfortunately, autoconf/automake/libtool systems are
notorious for not having a stable interface, such that if you have the
wrong or outdated version of the tools and generate the configure
script, Makefile, etc., with a different version than the one the
developer used, your results may vary.

What I do is use "Maintainer mode" which means that the generated
files are *not* automatically rebuilt by the build system unless you
configure with --enable-maintainer-mode, and then I *do* check in the
generated files into git.  That way I can run with
--enable-maintainer-mode, and check in updates to Makefile, configure,
etc., as necessary when the input files change, but that way, end
users don't have to worry about getting ambushed by version skew
caused by using an old (or unexpectedly newer) version of the
autoconf/automake/libtool tools.
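
In automake terms this is AM_MAINTAINER_MODE; the resulting split
looks roughly like the following sketch (file names illustrative):

```
# Maintainer: rebuild rules for configure/Makefile.in are enabled, so
# regenerate and commit the generated files whenever the inputs change.
./configure --enable-maintainer-mode
make
git add configure Makefile.in
git commit -m "Update generated autotools files"

# End user: rebuild rules stay disabled; the generated files come
# straight from git, so autotools version skew can't ambush the build.
./configure && make
```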

Heck, I even have had config.guess/config.sub change on me in
incompatible ways(*), so I ship my own version and don't enable a blind
update of those files from the upstream FSF sources --- mainly because
I don't trust them to preserve a stable interface.  Better that I
manually pull them into the repo, and test them before I do a public
release.

- Ted

(*) Although to be fair it's been years since I've been screwed in
this fashion.  But once bitten, twice shy.


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Theodore Ts'o
On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
> 
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
> 
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).

Well, let's be fair; this is something *ext3* got wrong, and it was
the default file system back then.  All of the modern file systems now
do delayed allocation, which means that an fsync of one file doesn't
actually imply an fsync of another file.  Hence...

> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

This isn't really safe any more.  Yes, there's a single log.  But
files which are subject to delayed allocation are in the page cache,
and just because you fsync the index file doesn't mean that the object
file is now written to disk.  It was true for ext3, but it's not true
for ext4, xfs, btrfs, etc.

The good news is that if you have another process downloading a huge
ISO image, the fsync of the index file won't force the ISO file to be
written out.  The bad news is that it won't force out the other git
object files, either.

Now, there is a potential downside to fsync'ing each object file,
which is that the cost of doing a CACHE FLUSH on an HDD is
non-trivial, and even on an SSD, it's not optimal to call CACHE FLUSH
thousands of times in a second.  So if you are creating thousands of
tiny files and you fsync each one, each fsync(2) call is a serializing
instruction, which means it won't return until that one file is
written to disk.  If you are writing lots of small files on an HDD,
you'll be bottlenecked at around 30 files per second on a 5400 RPM
drive, and this is true regardless of what file system you use,
because the bottleneck is the CACHE FLUSH operation; how you organize
the metadata and do the block allocation is largely lost in the noise
compared to the CACHE FLUSH command, which serializes everything.

There are solutions to this; you could simply not call fsync(2) a
thousand times, and instead write a pack file, and call fsync once on
the pack file.  That's probably the smartest approach.

You could also create a thousand threads, and call fsync(2) on those
thousand threads at roughly the same time.  Or you could use a
bleeding edge kernel with the latest AIO patch, and use the newly
added IOCB_CMD_FSYNC support.

But I'd simply recommend writing a pack and fsync'ing the pack,
instead of trying to write a gazillion object files.  (git-repack -A,
I'm looking at you)
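
The arithmetic is the whole point: N loose objects cost N cache
flushes, while one pack costs one.  As an illustration (using the
coreutils sync utility, which fsyncs a file when given one as an
argument):

```shell
#!/bin/sh
# Sketch: N small files fsync'ed individually vs the same data written
# into one pack-like file with a single fsync at the end.
set -e
cd "$(mktemp -d)"

for i in $(seq 1 50); do
    echo "object $i" > "obj.$i"
    sync "obj.$i"            # one cache flush per object: 50 flushes
done

cat obj.* > pack
sync pack                    # same data, exactly one flush
```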

- Ted


Re: Bring together merge and rebase

2018-01-06 Thread Theodore Ts'o
On Sat, Jan 06, 2018 at 10:29:21AM -0700, Carl Baldwin wrote:
> > When n==m==1, "amended" pointer from X1 to A1 may allow you to
> > answer "Is this the first attempt?  If this is refined, what did the
> > earlier one look like?" when given X1, but you would also want to
> > answer a related question "This was a good start, but did the effort
> > result in a refined patch, and if so what is it?" when given A1, and
> > "amended" pointer won't help at all.  Needless to say, the "pointer"
> > approach breaks down when !(n==m==1).
> 
> It doesn't break down. It merely presents more sophisticated situations
> that may be more work for the tool to help out with. This is where I
> think a prototype will help see these situations and develop the tool to
> manage them.

That's another way of saying "break down".

And if the goal is a prototype, may I gently suggest that the way
forward is trailers in the commit body, ala:

Change-Id: I0b793feac9664bcc8935d8ec04ca16d5

or

Upstream-4.15-SHA1: 73875fc2b3934e45b4b9a94eb57ca8cd

Making changes in the commit header is complex, and has all *sorts* of
forward and backwards compatibility challenges, especially when it's
not clear what the proper data model should be.
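
Trailers also come with existing tooling; git interpret-trailers will
add or parse them without touching the commit header at all (the
Change-Id value below is made up):

```shell
#!/bin/sh
# Sketch: append a Change-Id trailer to a commit message with stock git.
set -e
cd "$(mktemp -d)"
git init -q
msg=$(printf 'Fix the frobnicator\n\nLonger explanation here.\n' |
      git interpret-trailers \
          --trailer 'Change-Id: I0b793feac9664bcc8935d8ec04ca16d5')
echo "$msg"
```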

Cheers,

 -Ted


Re: Bring together merge and rebase

2017-12-26 Thread Theodore Ts'o
On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> At this point, you might wonder why I'm not proposing to simply add a
> "change-id" to the commit object. The short answer is that the
> "change-id" Gerrit uses in the commit messages cannot stand on its own.
> It depends on data stored on the server which maintains a relationship
> of commits to a review number and a linear ordering of commits within
> the review (hopefully I'm not over simplifying this). The "replaces"
> reference is an attempt to make something which can stand on its own. I
> don't think we need to solve the problem of where to keep comments at
> this point.

I strongly disagree, and one way to see that is with a real-life
experiment.  Take a look at a gerrit change which, in my experience,
can have up to ten or twelve revisions, and strip out the comments, so
that all you get to look at is the half-dozen or more revisions
themselves.  How useful is it *really*?  How does it get used in
practice?  What development problem does it help to solve?

And when you say that it is a bug that the Gerrit Change-Id does not
stand alone, consider that it can also be a *feature*.  If you keep
all of this in the main repo, the number of commits can easily grow by
an order of magnitude.  And these are commits that you have to keep
forever, which means it slows down every subsequent git clone, git gc
operation, git tag --contains search, etc.

So what are the benefits, and what are the costs?  If the benefits
were huge, then perhaps it would be worthwhile.  But if you lose a
huge amount of the value because you are missing the *why* between the
half-dozen to dozen past revisions of the commit, then is it really
worth it to adopt that particular workflow?

It seems to me your argument is contrasting a "replaces" pointer
versus the github PR.  But compared to the Gerrit solution, I don't
think the "replaces" pointer proposal is as robust or as featureful.
Also, please keep in mind that just because it's in core git doesn't
guarantee that Github will support it.  As far as I know, github has
zero support for git notes, for example.

- Ted


Re: Bring together merge and rebase

2017-12-24 Thread Theodore Ts'o
On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.

As a suggestion, before diving into the technical details of your
proposal, it might be useful to consider the usage scenario you are
targeting.  Things like "git rebase" and "git merge" and your proposed
"git replace/replay" are *mechanisms*.

But how they fit into a particular workflow is much more important
from a design perspective, given that there are many different git
workflows, used by different projects and by different developers
within a particular project.

For example, rebase gets used in many different ways, and many of the
debates when people talk about "git rebase" being evil generally
presuppose a particular workflow that the advocate has in mind.  If
someone is using git rebase or git commit --amend before the commits
have ever been pushed out to a public repository, or to anyone else,
that's a very different case from one where the commits have been
visible elsewhere.  Even the most strident believers in "you must
never rewrite a commit and all history must be preserved" generally
don't insist that every single edit must be preserved on the theory
that "all history is valuable".

> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.

If your goal is to preserve the history of the change, one of the
problems with any git-centric solution is that you generally lose the
code review feedback and the discussions that are involved with a
commit.  Just simply preserving the different versions of the commits
is going to lose a huge amount of the context that makes the history
valuable.

So for example, I would claim that if *that* is your goal, a better
solution is to use Gerrit, so that all of the different versions of
the commits are preserved along with the line-by-line comments and
discussions that were part of the code review.  In that model, each
commit has something like this in the commit trailer:

Change-Id: I8d89b33683274451bcd6bfbaf75bce98

You can then cut and paste the Change-Id into the Gerrit user
interface, and see the different commits, more important, the
discussion surrounding each change.


If the complaint about Gerrit is that it's not a core part of Git, the
challenge is (a) how to carry the code review comments in the git
repository, and (b) do so in a while that it doesn't bloat the core
repository, since most of the time, you *don't* want or need to keep a
local copy of all of the code review comments going back since the
beginning of the project.

-

Here's another potential use case.  The stable kernels (e.g., 3.18.y,
4.4.y, 4.9.y, etc.) have cherry picks from the upstream kernel,
and this is handled by putting in the commit body something like this:

[ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]



And here's yet another use case.  For internal Google kernel
development, we maintain a kernel that has a large number of patches
on top of a kernel version.  When we backport an upstream fix (say,
one that first appeared in the 4.12 version of the upstream kernel),
we include a line in the commit body that looks like this:

Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5

This is useful, because when we switch to a newer upstream kernel, we
need to make sure we can account for all patches that were built on
top of the 3xx kernel (which might have been using 4.10, for the sake
of argument) when moving to the 4xx kernel series (which might be
using 4.15 --- the version numbers have been changed to protect the
innocent).  This means going through each and every patch that was on
top of the 3xx kernel, and if it has a line such as
"Upstream-4.12-SHA1", we know that it will already be included in a
4.15-based kernel, so we don't need to worry about carrying that patch
forward.
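
That accounting step is mechanical.  A toy sketch (the trailer and the
"base" tag follow this workflow's conventions, not anything built into
git):

```shell
#!/bin/sh
# Toy sketch: one carried patch has an Upstream-*-SHA1 trailer (already
# upstream), one does not and still needs forward-porting.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.name=t -c user.email=t@example.org "$@"; }
g commit -q --allow-empty -m 'base'
git tag base
g commit -q --allow-empty -m 'backport: fix foo

Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5'
g commit -q --allow-empty -m 'local-only: tweak bar'

# List patches on top of "base" whose body lacks the trailer.
git log --format='%H' base..HEAD | while read -r sha; do
    git show -s --format=%b "$sha" | grep -q '^Upstream-.*-SHA1:' ||
        git show -s --format='still needed: %s' "$sha"
done                      # prints: still needed: local-only: tweak bar
```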

In other cases, we might decide that the patch is no longer needed.
It could be because the patch has already been included upstream, in

Re: should "git bisect" support "git bisect next?"

2017-11-12 Thread Theodore Ts'o
On Sun, Nov 12, 2017 at 03:21:57PM +0100, Christian Couder wrote:
> 
> Yeah I agree that it might be something interesting for the user to do.
> But in this case the sequence in which you give the good and the bad
> commits is not important.
> Only the last bad commit and the set of good commits that were given
> are important.

Is it really true that of the bad commits, only the last one is significant?

Suppose we have a git tree that looks like this:

  *---*---*---*---*---*---M2---*---B1
  |                       |
  G1--*--D1---*---*---*---B2--\
           \                   \
            *---*---*---B3--*---M1

If we know that commits B2 and B3 are bad, and we assume that all
commits before the first "bad" commit are good and all commits after
it are bad, can we not deduce that commit D1 should also be "bad"?
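
That deduction is a common-ancestor question, which git can answer
directly.  A toy version of the diagram (empty commits standing in for
the real ones):

```shell
#!/bin/sh
# Toy repo: D1 forks into two branches whose tips contain the bad
# commits B2 and B3; their merge-base is D1.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.name=t -c user.email=t@example.org "$@"; }
g commit -q --allow-empty -m G1
g commit -q --allow-empty -m D1
D1=$(git rev-parse HEAD)
g commit -q --allow-empty -m B2
B2=$(git rev-parse HEAD)
git checkout -q "$D1"        # fork the second branch at D1
g commit -q --allow-empty -m B3
B3=$(git rev-parse HEAD)

git merge-base "$B2" "$B3"   # prints D1's commit id
```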

  - Ted


Re: should "git bisect" support "git bisect next?"

2017-11-11 Thread Theodore Ts'o
On Sat, Nov 11, 2017 at 11:38:23PM +0900, Junio C Hamano wrote:
> 
> Thanks for saving me time to explain why 'next' is still a very
> important command but the end users do not actually need to be
> strongly aware of it, because most commands automatically invokes it
> as their final step due to the importance of what it does ;-)

This reminds me; is there a way to suppress it when I'm about to give
a large set of good and bad commits (perhaps because I'm replaying
part of a git bisect log, minus one or two lines that are suspected of
being bogus thanks to flaky reproduction), so that git bisect doesn't
bother figuring out the "next" commit to try until I'm done giving it
the list of good/bad commits?
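
One existing way to get that effect is to edit and replay a saved log,
since replay only has to pick the "next" commit once, at the end (the
log file name is arbitrary):

```
git bisect log > bisect.log     # save the good/bad verdicts so far
# ...edit bisect.log, deleting the one or two suspect lines...
git bisect reset
git bisect replay bisect.log    # re-enter everything in one shot
```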

  - Ted


Re: "git rm" seems to do recursive removal even without "-r"

2017-10-08 Thread Theodore Ts'o
On Sun, Oct 08, 2017 at 03:44:14PM -0400, Robert P. J. Day wrote:
> >
> > find  | xargs git rm
> >
> > myself.
> 
>   that's what i would have normally used until i learned about git's
> magical globbing capabilities, and i'm going to go back to using it,
> because git's magical globbing capabilities now scare me.

Hmm, I wonder if the reason why git's magical globbing capabilities
even exist at all is for those poor benighted souls on Windows, whose
shell (and associated utilities) doesn't have advanced tools like
"find" and "xargs".
- Ted


Re: "git rm" seems to do recursive removal even without "-r"

2017-10-08 Thread Theodore Ts'o
On Sun, Oct 08, 2017 at 10:32:40AM -0400, Paul Smith wrote:
> Personally I don't use Git's magical globbing capabilities, and use "git
> rm" as if it were UNIX rm.  So in your request above I'd use:
> 
>git rm $(find . -name Makefile)
> 
> which I find simpler.

I have to agree that git's magical globbing capabilities
are... strange.  (And apologies to Robert for my earlier post; I
didn't understand what he was complaining about.)  I don't use it
either, although I tend to use:

find  | xargs git rm

myself.

One thing which is interesting is that not only do git's magical
globbing capabilities have somewhat unusual semantics, but the way
globbing is done in .gitignore entries is completely different.

Shrug.  I put this in the same category as "tabs are significant in
Makefile's", "whitespace is significant in python", and "the many
varied different behaviours and uses of 'git reset'".

They are all idiosyncrasies in the semantics of various highly popular
tools (which being highly popular, would make changing the details
quite difficult due to backwards compatibility concerns, even if we
wanted to change them).

- Ted


Re: "git rm" seems to do recursive removal even without "-r"

2017-10-07 Thread Theodore Ts'o
On Sat, Oct 07, 2017 at 03:43:43PM -0400, Robert P. J. Day wrote:
> >   -r
> > Recursively remove the contents of any directories that match
> > ``.
> >
> > or something.
> 
>   it's been a long week, so take this in the spirit in which it is
> intended ... i think the "git rm" command and its man page should be
> printed out, run through a paper shredder, then set on fire. i can't
> remember the last time i saw such a thoroughly badly-designed,
> badly-documented and non-intuitive utility.
> 
>   i'm going to go watch football now and try to forget this horror.

It sounds like the real issue here is that you are interpreting
"recursively" to mean "globbing".  Your original complaint seemed to
be a surprise that "git rm book/\*.asc" would delete all of the files
in the directory "book" that ended in .asc, even without the -r flag.

That's because the operation of matching *.asc is considered
"globbing".  Now if there were directories whose name ended in .asc,
then they would only be deleted if the -r flag is given.  Deleting
directories and their contents is what is considered "recursive
removal".
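
The distinction is easy to demonstrate in a toy repo (quoting the
pathspec so that git, not the shell, expands the glob):

```shell
#!/bin/sh
# Toy demo: a quoted glob pathspec removes matching *files* without -r;
# only removing a matching directory's contents would need -r.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.name=t -c user.email=t@example.org "$@"; }
mkdir book
echo one > book/ch1.asc
echo two > book/ch2.asc
g add book
g commit -qm init

git rm -q 'book/*.asc'   # globbing, not recursion: both files go away
```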

That's not particularly surprising to me as a long-time Unix/Linux
user/developer, since that's how things work in Unix/Linux:

% touch 1.d 2.d ; mkdir 3.d 4.d
% /bin/ls
1.d  2.d  3.d  4.d
% rm -r *.d
% touch 1.d 2.d ; mkdir 3.d 4.d
% rm *.d
rm: cannot remove '3.d': Is a directory
rm: cannot remove '4.d': Is a directory

I'm going to guess that you don't come from a Unix background?

- Ted


Re: Git "Keeping Original Dates"

2017-06-06 Thread Theodore Ts'o
On Mon, Jun 05, 2017 at 07:36:58PM -0400, Hector Santos wrote:
> Do you see any technical issues with using programmable hooks or something
> like this would have to be patched in? I am giving it a serious thought to
> exploring a fix to the Git Daemon over the wire completion issues on
> Windows. It appears to be a Half Close socket issue.

You can certainly do it with some kind of hook script.

This is how I do things to maintain the modtimes for a set of patches
that I maintain using guilt (git://repo.or.cz/guilt.git).  The
following is done using Linux, but I imagine you could translate it
into something that would work with powershell, or cygwin, or just use
the Windows Subsystem for Linux.

#!/bin/sh
stat -c "touch -d @%Y %n" * | grep -v "~$" | sort -k3 > timestamps

I have this shell script saved as ~/bin/save-timestamps.  The generated file
has lines which look like this:

touch -d @1496078695 fix-fdatasync-after-extent-manipulation-operations
touch -d @1496081597 status
touch -d @1496082752 series

... and when you execute those commands, they will restore the
timestamps to the values checked into the git repository.  If you want
to restore the timestamp of only a single file, you can do something
like this:

grep ^fix-fdatasync-after-extent timestamps | bash

Cheers,

- Ted


Re: Another git repo at kernel.org?

2017-05-23 Thread Theodore Ts'o
So Junio owns the pub/scm/git/git.git tree on kernel.org, and he may
already have access to create new repos under the pub/scm/git
hierarchy, in which case we might not need to bug the kernel.org
administrators at all.

Also, I'll note that it is possible to set up some repos such that a
group of people have access to push to them.  You'll see for example
on git.kernel.org that certain repositories have as their owner "XFS
FS Group", or "ARM64 Group" or "Intel Wireless Group".

Cheers,

- Ted



Re: Will OpenSSL's license change impact us?

2017-03-25 Thread Theodore Ts'o
On Sat, Mar 25, 2017 at 06:51:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
> In GPLv3 projects only, not GPLv2 projects. The paragraphs you're
> quoting all explicitly mention v3 only, so statements like
> "incompatible in one direction" only apply to Apache 2 && GPLv3, but
> don't at all apply to GPLv2, which is what we're using.

It's complicated.

It's fair enough to say that the FSF adopts a copyright maximalist
position, and by their interpretation, the two licenses are
incompatible, and it doesn't matter whether the two pieces of code are
linked statically, dynamically, or whether one calls the other over an RPC
call.

Not everyone agrees with their legal analysis.  May I suggest that we
not play amateur lawyer on the mailing list, and not try to settle the
question here?  Each distribution can make its own decision, which may
be based
on its legal advice, the local laws and legal precedents in which they
operate, etc.  And indeed, different distributions have already come
to different conclusions with respect to various license compatibility
issues.  (Examples: dynamically linking GPL programs with OpenSSL
libraries under the old license, distributing ZFS modules for Linux,
etc.)

We don't expect lawyers to debug edge cases in a compiler's code
generation.  Programmers shouldn't try to parse edge cases in the law,
or try to use a soldering iron, unless they have explicit training and
expertise to do so.  :-)

- Ted



Re: Stable GnuPG interface, git should use GPGME

2017-03-10 Thread Theodore Ts'o
On Fri, Mar 10, 2017 at 10:54:19AM -0800, Linus Torvalds wrote:
>  - library versioning.
> 
>I don't know why, but I've never *ever* met a library developer who
> realized that libraries were all about stable API's, and the library
> users don't want to fight different versions.

Actually, you have.  (Raises hand :-)

libext2fs has a stable API *and* ABI.  We add new functions instead of
changing function parameters (so ext2fs_block_iterate2() is
implemented in terms of ext2fs_block_iterate3(), and so on).  And
structures have magic numbers that have served as a versioning signal.
This is actually not rocket science.  If you've met anyone who's
programmed for Multics, they did something similar.  And of course,
that's why we have the wait3(2) and wait4(2) system calls.

I do have to agree with your general point, that most developers tend
to be *incredibly* sloppy with their interfaces.  That being said, not
all library developers are as bad as GNOME.  :-)

 - Ted


Re: [RFH] gpg --import entropy while running tests

2016-12-28 Thread Theodore Ts'o
On Wed, Dec 28, 2016 at 03:39:30AM -0500, Jeff King wrote:
> >   
> > https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=commit;h=4473db1ef24031ff4e26c9a9de95dbe898ed2b97
> > 
> > So this does seem like a gpg bug.
> 
> I've submitted a bug report to gpg:
> 
>   https://bugs.gnupg.org/gnupg/issue2897
> 
> so we'll see what they say.

Yeah, they are definitely doing something very hard to explain.

Pid 8348 is the gpg-agent process which the main gpg program (pid
8344) connected to.  It starts trying to get randomness in response to
a KEYWRAP command:

8348  10:58:57.882909 access("/dev/random", R_OK) = 0
8348  10:58:57.883205 access("/dev/urandom", R_OK) = 0
8348  10:58:57.883472 open("/dev/urandom", O_RDONLY) = 9
8348  10:58:57.883729 fcntl(9, F_GETFD) = 0
8348  10:58:57.883914 fcntl(9, F_SETFD, FD_CLOEXEC) = 0

It opens /dev/urandom, but then never uses fd 9 ever again.  Instead,
it uses getrandom, but in a pretty silly fashion, with lots of sleeps
in between, and not between each progress report, either:

8348  10:58:57.884129 write(8, "S PROGRESS need_entropy X 30 120", 32 

8344  10:58:57.884338 <... read resumed> "S PROGRESS need_entropy X 30 120", 
1002) = 32
8348  10:58:57.884424 <... write resumed> ) = 32
8344  10:58:57.884488 read(5,  
8348  10:58:57.884550 write(8, "\n", 1 
8344  10:58:57.884715 <... read resumed> "\n", 970) = 1
8348  10:58:57.884800 <... write resumed> ) = 1
8344  10:58:57.884883 read(5,  
8348  10:58:57.884951 nanosleep({0, 1}, NULL) = 0
8348  10:58:57.985363 select(10, [9], NULL, NULL, {0, 10}) = 1 (in [9], 
left {0, 4})
8348  10:58:57.985593 
getrandom("&\275\354^\256\320\3w\21:R]`eJ\t\t\350\245\202>\255\237\324\324\340\24^c\323\210\376"...,
 90, 0) = 90
8348  10:58:57.985751 write(8, "S PROGRESS need_entropy X 120 12"..., 33) = 33
8344  10:58:57.985885 <... read resumed> "S PROGRESS need_entropy X 120 12"..., 
1002) = 33
8348  10:58:57.985934 write(8, "\n", 1 
8344  10:58:57.985982 read(5,  
8348  10:58:57.986015 <... write resumed> ) = 1
8344  10:58:57.986048 <... read resumed> "\n", 969) = 1
8348  10:58:57.986090 nanosleep({0, 1},  
8344  10:58:57.986142 read(5,  
8348  10:58:58.086253 <... nanosleep resumed> NULL) = 0
8348  10:58:58.086370 write(8, "S PROGRESS need_entropy X 30 120", 32) = 32
8344  10:58:58.086502 <... read resumed> "S PROGRESS need_entropy X 30 120", 
1002) = 32
8348  10:58:58.086541 write(8, "\n", 1 
8344  10:58:58.086579 read(5,  
8348  10:58:58.086604 <... write resumed> ) = 1
8344  10:58:58.086630 <... read resumed> "\n", 970) = 1
8348  10:58:58.086661 nanosleep({0, 1},  
8344  10:58:58.086703 read(5,  
8348  10:58:58.186815 <... nanosleep resumed> NULL) = 0
8348  10:58:58.186894 select(10, [9], NULL, NULL, {0, 10}) = 1 (in [9], 
left {0, 5})
8348  10:58:58.187038 
getrandom("\365\221\374m\360\235\27\330\264\223\365\363<6\302\324F\5\354Q|,\366\253\337u\226\265\345\250CA"...,
 90, 0) = 90

The worst part of this is that the commit description claims this is a
workaround for libgcrypt using /dev/random, but it's not using
/dev/random --- it's using getrandom, and it pointlessly opened
/dev/urandom first (having never opened /dev/random).

This looks like a classic case of Lotus Notes / Websphere disease ---
too many d*mned layers of abstraction.

- Ted


Re: Git and SHA-1 security (again)

2016-07-17 Thread Theodore Ts'o
On Sun, Jul 17, 2016 at 03:42:34PM +, brian m. carlson wrote:
> As I said, I'm not planning on multiple hash support at first, but it
> doesn't appear impossible if we go this route.  We might still have to
> rewrite objects, but we can verify signatures over the legacy SHA-1
> objects by forcing them into the old-style object format.

How hard would it be to make the on-disk format be multihash, even if
there is no support for anything other than a single hash, at least
for now?  That way we won't have to rewrite the objects twice.

Personally, so long as the newer versions of the tree are secured, I
wouldn't mind if the older commits stayed using SHA1 only.  The newer
commits are the ones that are most important and security-critical
anyway.  It seems like the main reason to rewrite all of the objects
is to simplify the initial rollout of a newer hash algorithm, no?

   - Ted
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] date: document and test "raw-local" mode

2016-07-11 Thread Theodore Ts'o
On Mon, Jul 11, 2016 at 01:06:17AM -0400, Jeff King wrote:
> 
> The documentation claims that "raw-local" does not work. It
> does, but the end result is rather subtle. Let's describe it
> in better detail, and test to make sure it works (namely,
> the epoch time doesn't change, but the zone does).

Maybe add an editorial statement that in most cases this isn't
particularly useful?  Documenting raw-local implies that someone might
want to consider using it, and it's not clear to me folks should ever
try --- they're more likely to confuse themselves more than anything
else.

- Ted


Re: [PATCH 3/5] doc/pretty-formats: describe index/time formats for %gd

2016-07-11 Thread Theodore Ts'o
On Mon, Jul 11, 2016 at 01:05:13AM -0400, Jeff King wrote:
> The "reflog selector" format changes based on a series of
> heuristics, and that applies equally to both stock "log -g"
> output, as well as "--format=%gd". The documentation for
> "%gd" doesn't cover this. Let's mention the multiple formats
> and refer the user back to the "-g" section for the complete
> rules.

Is it worth mentioning that the shortening only happens if the user
specifies a selector with '/' in it in the first place?  I was
confused when I was first playing with these selectors because %gd and
%gD are identical if you run

git reflog --format=%gd -3 master
git reflog --format=%gD -3 master

and are only different if you run:

git reflog --format=%gd -3 refs/heads/master
git reflog --format=%gD -3 refs/heads/master

- Ted



Re: [PATCH] pretty: add format specifiers: %gr, %gt, %gI, gi

2016-07-11 Thread Theodore Ts'o
On Mon, Jul 11, 2016 at 01:02:02AM -0400, Jeff King wrote:
> Yeah, I'd have hoped for %gd, as well. One thing I think we should move
> towards in the long run is giving more readable names to our
> placeholders for git-log, the way for-each-ref and cat-file do (but
> keeping the existing ones for compatibility and as a shorthand).
> 
> So ideally the answer in the long run is:
> 
>   %(reflog-ref)@{%(reflog-index)}
> 
> or possibly:
> 
>   %(reflog:index)
> 
> for the whole thing. Or something like that. I haven't thought that hard
> about the exact syntax.

Yes, FWIW, I agree that long term, using % followed by one or two
characters is just a mess, and using some kind of human-readable
format is going to make a lot of sense.  I can imagine a few places
where I might still want to type --format=%at in some kind of ad-hoc
shell command, but in most places, if you're using a complex --format
specifier, it's going either in a shell script or in a .gitconfig
file, where being verbose is probably more of an advantage than a
disadvantage.

>   1. It's half-implemented. Why can we do format X, but not format Y
>  (for that matter, why can you do %ct, but there is no --date format
>  that matches it?). That sort of non-orthogonality ends up
>  frustrating for users and makes git look creaky and poorly thought
>  out.

Git *is* creaky and not thought-out in advance; that's just the nature
of how most successful open source projects grow; might as well be
proud of it.  :-)   As Greg K-H has said: "We believe in evolution, and
not intelligent design."  :-)

> > ... although I doubt whether git would ever want to do the equivalent of:
> > 
> > gcloud compute images list  
> > --format='table[box,title=Images](name:sort=1,family)'
> > 
> > which will print something like this:
> 
> That's neat, though I think I'd really prefer just making it easy to get
> the data out of git in a structured way, and then applying some cool
> json-formatting script to it. Surely "turn this json into a table" is a
> thing that could be solved once for everybody (I don't work with it
> enough to know, but maybe "jq" can do that already).

Oh, agreed.  I used that as an over-the-top example of something we
probably wouldn't want to put in the git core.  jq can't, but I'm sure
there must be some JSON tool out there which can.

> But let's get back to reality for a moment. Here are some patches that
> address the issues you brought up above.
> 
>   [1/5]: doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
>   [2/5]: doc/rev-list-options: explain "-g" output formats
>   [3/5]: doc/pretty-formats: describe index/time formats for %gd
>   [4/5]: date: document and test "raw-local" mode
>   [5/5]: date: add "unix" format
> 
> The next step is either:
> 
>   - add specific reflog-time-formats, as your patch does
> 
>   - add a generic reflog-date placeholder, so you can do:
> 
>   git log --date=unix --format='%gT'
> 
> or whatever. That still doesn't give you multiple date types in a
> single invocation, though. It's probably not much code to do so, but
> designing the syntax and supporting existing placeholders would be
> some work.
> 
> I'm on the fence, so I'll let you decide how you want to proceed. I can
> live with "%gr" and "%gt", as they are at least symmetric with their
> author/committer counterparts.

I'm on the fence myself.  I can live with either, since either way the
long message command line will be going in .gitconfig.  I have a
slight preference for %gr and %gt, as %gT isn't orthogonal with
%ad/%cd, but I could be easily persuaded otherwise.

Does anyone else have a strong opinion?

- Ted


Re: [PATCH] pretty: add format specifiers: %gr, %gt, %gI, gi

2016-07-10 Thread Theodore Ts'o
On Sun, Jul 10, 2016 at 06:05:31PM +0200, Duy Nguyen wrote:
> On Sun, Jul 10, 2016 at 4:26 PM, Theodore Ts'o <ty...@mit.edu> wrote:
> > One other long-term thought.  Maybe past a certain point, we should
> > just make it easy to get the data from git-log into a perl or pythons
> > script, where it becomes possible to do conditionals, more flexible
> > padding rules, etc.  So some kind of --format=yaml or --format=json
> > sort of thing.
> 
> I thought libgit2 would already give you all the information you need.

libgit2 isn't really all that useful if you are writing a shell
script.  Even from perl or python, setting up SWIG bindings and then
linking libgit2 into perl or python isn't exactly the most convenient
thing in the world.

Also, my original use case was something I could drop into
~/.gitconfig as an git alias, although I don't object to having a
separate shell script if that was the only way to do what I wanted.

> Putting everything in columns is my thing :) We can do something like
> that. It should not be so hard to put titles on top and draw some
> lines, I think, if you set fixed column widths. I'm just not sure if
> it will be really helpful. What sort of use case do you have in mind
> (besides git-log --oneline with customizable columns)?

I didn't; it was the example of something which was over the top.  :-)

That being said, it is nice if you can have columns where the
pretty-printer auto-sizes the column widths.  Most databases which
have a REPL for SQL statements will do this, as does gcloud's
--format='table[box]...' scheme.  That unfortunately means a two-pass
scheme, although I could imagine something which looks at the first N
commits to be printed, figured out column widths, and then either
truncates or autowraps if there are commits after the first N which
require a field wider than what was autosized.

It may be too much to think that all of this should be in git's core
implementation, though.  This is where it might be simpler to easily
get the information into perl or python, and then do the final
formatting in perl/python.  Hence my suggestion for some kind of yaml
or json format.  Although I suppose a CPAN or Python Module that
dlopen's libgit2 could also work, so long as it was super-easy for
someone who just wants to create a git-log-like report to do so
without having to create their own C program or C language bindings to
libgit2.

- Ted


Re: [PATCH] pretty: add format specifiers: %gr, %gt, %gI, gi

2016-07-10 Thread Theodore Ts'o
On Sun, Jul 10, 2016 at 02:16:45AM -0400, Jeff King wrote:
> I wonder if a better approach would be:
> 
>   1. In the short term, add specific designators for the fields you'd
>  want. One for HEAD@{n} that is unaffected by date, as %gd is (or
>  even one for the branch-name and one for "n"). And one for the
>  reflog date, by itself, in whatever format --date= asked for.
> 
>  That would let you do your format above, though it does not let you
>  show the reflog date in multiple formats.

Hrm, maybe.  I didn't realize that %gd and %gD displayed something
very different if --date is specified.  Is this documented?  I looked
everywhere, and the closest I could find is a mention in the
description of -g that if you specify commit@{now}, the output will
use commit@{timestamp} notation --- but that's different from
--date=xxx, and it doesn't actually specify which pretty-printer
format string this affects, although I suppose that's not that hard to
infer.

One other thing I'll note in passing is that the --date notation
doesn't support Unix timestamps.  So you can't actually do the
equivalent of %gt as proposed in this patch.

I'm not sure what designators we'd use for a HEAD@{n} that is
unaffected by date, and as far as which arbitrary two-letter code for
"reflog date in the default date format", we can't use %gd (ala %ad or
%cd), since it's already spoken for.  %gr, %gt, etc., at least have
the advantage that they are somewhat orthogonal to %ar/%at, %cr/%ct,
etc.

So I definitely understand the concern about the PP format string
being somewhat creaky and obscure.  It's not entirely clear to me
that adding the new designators actually doesn't add more bloat or
non-orthogonality.  I suppose we could add %gb for branch name, and
%gU for the HEAD@{n} nUm --- since %gn and %gN are already spoken for
--- and then use %gt for the reflog date in the default date format.
So that only adds three new two-letter formats, instead of the four in
my patch.

(BTW, I really only care about %gt and %gr --- so if the concern is
bloat, we could just add those two specifiers.  I just added %gi and
%gI because it wasn't hard, and I thought orthogonality was better
where it was possible.)

>   2. In the long term, teach log's pretty formatter to handle less
>  obscure syntax, that can include arguments. The pretty-printer in
>  for-each-ref can already do "%(authordate:relative)", and accepts
>  any date-format that git knows about. We should do the same here.

See the above comment about our currently not supporting Unix time as
one of the date-formats.  So if the goal was to make the proposed new
pretty formatter be a superset of the percent expansion rules, there
isn't really a clean way of doing %at.

One possibility is %{authordate:format:%s} --- but it suffers from two
drawbacks:

(a) It's kind of ugly/obscure, since it gets us back to using
not-so-human-friendly percent expansions.

(b) It's not portable, since apparently %s isn't one of the strftime
formats which is guaranteed by the Single Unix Specification or the
C99 standard.  (Maybe it is implemented in all of the platforms we
care about (e.g., Windows, MacOS, etc.), though.)


One other long-term thought.  Maybe past a certain point, we should
just make it easy to get the data from git-log into a perl or pythons
script, where it becomes possible to do conditionals, more flexible
padding rules, etc.  So some kind of --format=yaml or --format=json
sort of thing.  Some interesting ideas of how we could do this can be
found here:

https://cloud.google.com/sdk/gcloud/reference/topic/formats

... although I doubt whether git would ever want to do the equivalent of:

gcloud compute images list  
--format='table[box,title=Images](name:sort=1,family)'

which will print something like this:

+----------------------------------------------------------+
|                          Images                          |
+------------------------------------------+---------------+
|                   NAME                   |     FAMILY    |
+------------------------------------------+---------------+
| centos-6-v20160629                       | centos-6      |
| centos-7-v20160629                       | centos-7      |
| coreos-alpha-1097-0-0-v20160702          | coreos-alpha  |
| coreos-beta-1068-3-0-v20160627           | coreos-beta   |
| coreos-stable-1010-6-0-v20160628         | coreos-stable |
| debian-8-jessie-v20160629                | debian-8      |
| freebsd-101-release-amd64-20150101032704 |               |
| opensuse-13-2-v20160222                  |               |
| opensuse-leap-42-1-v20160302             |               |
| rhel-6-v20160629                         | rhel-6        |
| rhel-7-v20160629                         | rhel-7        |
| sles-11-sp4-v20160301                    |               |
| sles-12-sp1-v20160301                    |               |
| ubuntu-1204-precise-v20160627

[PATCH] pretty: add format specifiers: %gr, %gt, %gI, gi

2016-07-09 Thread Theodore Ts'o
Add new format specifiers which allow the printing of reflog
timestamp.  This allows us to know when operations which change HEAD
take place (e.g., guilt pop -a, which does the equivalent of a "git
reset --hard commit"), since using %cr will display when the commit
was originally made, instead of when HEAD was moved to that commit.

This allows something like:

git log -g --pretty=format:'%Cred%h%Creset %gd %gs %Cgreen(%gr)%Creset %s' 
--abbrev-commit

to provide what (for me) is a much more useful "git reflog" type of
report.

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
---
 Documentation/pretty-formats.txt |  4 
 cache.h  |  1 +
 date.c   |  2 +-
 pretty.c | 18 
 reflog-walk.c| 45 ++--
 reflog-walk.h|  3 +++
 6 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 29b19b9..7927754 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -156,6 +156,10 @@ endif::git-rev-list[]
 - '%gE': reflog identity email (respecting .mailmap, see
   linkgit:git-shortlog[1] or linkgit:git-blame[1])
 - '%gs': reflog subject
+- '%gr': reflog date, relative
+- '%gt': reflog date, UNIX timestamp
+- '%gi': reflog date, ISO 8601-like format
+- '%gI': reflog date, strict ISO 8601 format
 - '%Cred': switch color to red
 - '%Cgreen': switch color to green
 - '%Cblue': switch color to blue
diff --git a/cache.h b/cache.h
index f1dc289..5dd2805 100644
--- a/cache.h
+++ b/cache.h
@@ -1237,6 +1237,7 @@ struct date_mode {
 #define DATE_MODE(t) date_mode_from_type(DATE_##t)
 struct date_mode *date_mode_from_type(enum date_mode_type type);
 
+time_t gm_time_t(unsigned long time, int tz);
 const char *show_date(unsigned long time, int timezone, const struct date_mode 
*mode);
 void show_date_relative(unsigned long time, int tz, const struct timeval *now,
struct strbuf *timebuf);
diff --git a/date.c b/date.c
index 4c7aa9b..f98502e 100644
--- a/date.c
+++ b/date.c
@@ -39,7 +39,7 @@ static const char *weekday_names[] = {
"Sundays", "Mondays", "Tuesdays", "Wednesdays", "Thursdays", "Fridays", 
"Saturdays"
 };
 
-static time_t gm_time_t(unsigned long time, int tz)
+time_t gm_time_t(unsigned long time, int tz)
 {
int minutes;
 
diff --git a/pretty.c b/pretty.c
index 330a5e0..eb1f44e 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1212,6 +1212,24 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
placeholder[1],
c->pretty_ctx->reflog_info,
	&c->pretty_ctx->date_mode);
+   case 'r':   /* date, relative */
+   strbuf_addstr(sb,
+   show_reflog_date(c->pretty_ctx->reflog_info,
+   DATE_MODE(RELATIVE)));
+   return 2;
+   case 'i':   /* date, ISO 8601-like */
+   strbuf_addstr(sb,
+   show_reflog_date(c->pretty_ctx->reflog_info,
+   DATE_MODE(ISO8601)));
+   return 2;
+   case 'I':   /* date, ISO 8601 strict */
+   strbuf_addstr(sb,
+   show_reflog_date(c->pretty_ctx->reflog_info,
+   DATE_MODE(ISO8601_STRICT)));
+   return 2;
+   case 't':
+   strbuf_addf(sb, "%lu", 
get_reflog_time_t(c->pretty_ctx->reflog_info));
+   return 2;
}
return 0;   /* unknown %g placeholder */
case 'N':
diff --git a/reflog-walk.c b/reflog-walk.c
index a246af2..d0aa2d0 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -292,17 +292,24 @@ void get_reflog_selector(struct strbuf *sb,
strbuf_addch(sb, '}');
 }
 
-void get_reflog_message(struct strbuf *sb,
-   struct reflog_walk_info *reflog_info)
+static struct reflog_info *get_reflog_info(struct reflog_walk_info 
*reflog_info)
 {
struct commit_reflog *commit_reflog = reflog_info->last_commit_reflog;
-   struct reflog_info *info;
-   size_t len;
 
if (!commit_reflog)
-   return;
+   return NULL;
+
+   return &commit_reflog->reflogs->items[commit_reflog->recno+1];
+}
 
-   info = &commit_reflog->reflogs->items[commit_reflog->recno+1];
+void get_reflog_message(struct strbuf *sb,
+   struct reflog_walk_info *reflog_info)
+{
+   struct reflog_info *

[PATCH] guilt: fix portability problem with using find -perm +111

2016-07-09 Thread Theodore Ts'o
GNU find no longer accepts -perm +111, even though the rest of the
world (MacOS, Solaris, BSD) still does.  Work around this problem by
using -executable if the system find utility will accept it.

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
---
 guilt | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/guilt b/guilt
index 38d426b..b90f02d 100755
--- a/guilt
+++ b/guilt
@@ -73,8 +73,17 @@ GUILT_PATH="$(dirname "$0")"
 
 guilt_commands()
 {
-   find "$GUILT_PATH/../lib/guilt" -maxdepth 1 -name "guilt-*" -type f 
-perm +111 2> /dev/null | sed -e "s/.*\\/$GUILT-//"
-   find "$GUILT_PATH" -maxdepth 1 -name "guilt-*" -type f -perm +111 | sed 
-e "s/.*\\/$GUILT-//"
+   # GNU find no longer accepts -perm +111, even though the rest of the
+   # world (MacOS, Solaris, BSD, etc.) does.  Sigh.  Using -executable
+   # is arguably better, but it is a GNU extension.  Since this isn't
+   # a fast path and guilt doesn't use autoconf, test for it as needed.
+   if find . -maxdepth 0 -executable > /dev/null 2>&1 ; then
+   exe_test="-executable"
+   else
+   exe_test="-perm +111"
+   fi
+   find "$GUILT_PATH/../lib/guilt" -maxdepth 1 -name "guilt-*" -type f 
$exe_test 2> /dev/null | sed -e "s/.*\\/$GUILT-//"
+   find "$GUILT_PATH" -maxdepth 1 -name "guilt-*" -type f $exe_test | sed 
-e "s/.*\\/$GUILT-//"
 }
 
 # by default, we shouldn't fail
-- 
2.5.0



[PATCH] guilt: update reflog with annotations of guilt-command being run

2016-07-09 Thread Theodore Ts'o
Many of the updates made by guilt use git update-ref, which means that
the output of "git reflog" is extremely unedifying, e.g:

ff0031d HEAD@{177}: reset: moving to ff0031d848a0cd7002606f9feef958de8d5edf19
90f4305 HEAD@{178}:
a638d43 HEAD@{179}:
ff0031d HEAD@{180}:
079788d HEAD@{181}:
87a6280 HEAD@{182}:
5b9554d HEAD@{183}:
de9e918 HEAD@{184}: reset: moving to de9e9181bc066d63d78b768e95b5d949e2a8673a
5b9554d HEAD@{185}:

So teach guilt to use the "set_reflog_action" helper, and since
git-update-ref doesn't respect the GIT_REFLOG_ACTION environment
variable, use its -m option so that "git reflog" can look like this
instead:

1eaa566 HEAD@{11}: guilt-push: track-more-dependencies-on-transaction-commit
ab714af HEAD@{12}: guilt-push: move-lockdep-tracking-to-journal_s
7a4b188 HEAD@{13}: guilt-push: move-lockdep-instrumentation-for-jbd2-handles
78d9625 HEAD@{14}: guilt-push: respect-nobarrier-mount-option-in-nojournal-mode
d08854f HEAD@{15}: guilt-pop: updating HEAD
d08854f HEAD@{16}: guilt-pop: updating HEAD
d08854f HEAD@{17}: guilt-push: 
optimize-ext4_should_retry_alloc-to-improve-ENOSPC-performance

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Cc: Josef 'Jeff' Sipek <jef...@josefsipek.net>
---
 guilt | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/guilt b/guilt
index 35177b9..38d426b 100755
--- a/guilt
+++ b/guilt
@@ -114,6 +114,7 @@ if [ $# -ne 0 ]; then
disp "" >&2
exit 1
fi
+   set_reflog_action "guilt-$CMDNAME"
 
shift
 else
@@ -640,7 +641,7 @@ commit()
commitish=`git commit-tree $treeish -p $2 < "$TMP_MSG"`
if $old_style_prefix || git rev-parse --verify --quiet 
refs/heads/$GUILT_PREFIX$branch >/dev/null
then
-   git update-ref HEAD $commitish
+   git update-ref -m "$GIT_REFLOG_ACTION" HEAD $commitish
else
git branch $GUILT_PREFIX$branch $commitish
git symbolic-ref HEAD refs/heads/$GUILT_PREFIX$branch
@@ -687,7 +688,8 @@ push_patch()
fi
fi
 
-   commit "$pname" HEAD
+   GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: $pname" \
+   commit "$pname" HEAD
 
echo "$pname" >> "$applied"
 
-- 
2.5.0



Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-08 Thread Theodore Ts'o
On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote:
> 
> I can imagine I'd start hacking on a project that I rarely touch, give up
> resolving a "git pull" from an unconfigured place (hence, some stuff is
> only reachable from FETCH_HEAD), and after 2*N days, come back
> to the repository and find that I cannot continue working on it.

Sure, but that's something that could happen today, and no one has
really complained, have they?

> Turning the rule to "*_HEAD we know about, and those we don't that
> are young" would not change the situation, as I may be depending on
> some third-party tool that uses its OWN_HEAD just like we use
> FETCH_HEAD in the above example.
> 
> So I dunno if that is a good solution. If we are going to declare that
> transient stuff will now be kept, i.e. keeping them alive is no longer
> end user's responsibility, then probably we should make it end user's
> responsibility to clean things up.

Well, the question is what does "transient" stuff really mean?  If we
keep them forever, then are they really any different from stuff under
refs/heads?

Maybe pester the user if there is stale *_HEAD files, but don't
actually get rid of the objects?

- Ted



Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-08 Thread Theodore Ts'o
On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote:
> 
> It cannot be "anything directly under .git that has all-caps name
> that ends with _HEAD".  The ones we write we know are going to be
> removed at some point in time (e.g. "git reset", "git bisect reset",
> "git merge --abort", etc.).  We do not have any control on random
> ones that the users and third-party tools leave behind, holding onto
> irrelevant objects forever.

What about anything that is all-caps and ends in _HEAD which has a
mod-time within the last N days?  (Where N is 2-7 days.)  If it's
older than that, it's almost certainly stale...

- Ted


Re: Migrating away from SHA-1?

2016-04-14 Thread Theodore Ts'o
On Thu, Apr 14, 2016 at 10:28:50AM -0700, H. Peter Anvin wrote:
> 
> Either way, I agree with Ted, that we have enough time to do it
> right, but that is a good reason to do it sooner rather than later
> (see also my note about freezing the cryptographic properties.)

Sure, I think we should do it as well.  But the fact that the attacker
will likely need to get a commit into the tree in order to be able to
carry out a collision attack means that it's easier (and probably less
detectable) to get some underhanded C code into the tree.  For one
thing, you just need to introduce it via a patch ("Hi, I'm super eager
newbie Nick, here's a cleanup patch!"), as opposed to getting a
sublieutenant to accept a git pull request.

Also, remember that while we can write programs that look for
suspicious git objects that have stuff hidden after the null
terminator (in fact, maybe that would be a good thing to add to git,
hmmm?), the state of the art in detecting underhanded C code which is
deliberately designed to not be noticed by static code checkers (or
humans doing a superficial code review, for that matter) is not
particularly encouraging to me.

- Ted


Re: Migrating away from SHA-1?

2016-04-13 Thread Theodore Ts'o
On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
> 
> If SHA-1 is broken (in certain ways), someone *can* replace an
> arbitrary blob.  GPG does not help in this case, because the signature
> is over the commit object (which points to a tree, which eventually
> points to the blob), and the commit hasn't changed.  So the GPG
> signature will still verify.

The "in certain ways" is the critical bit.  The question is whether
you are trying to replace an arbitrary blob, or a blob that was
submitted under your control.

If you are trying to replace an arbitrary blob that is not under your
control, you need to carry out a preimage attack.  That means that
given a particular hash, you need to find another blob that has the
same hash.  SHA-1 is currently resistant against preimage attacks
(that is, you need to use brute force, so the work factor is 2**159).

If you are trying to replace a blob which is under your
control, then all you need is a collision attack, and this is where
SHA-1 has been weakened.  It is now possible to find a collision with
a work factor of 2**69, instead of the requisite 2**80.

It was a MD5 collision which was involved with the Flame attack.
Someone (probably in the US or Israeli intelligence services)
submitted a Certificate Signing Request (CSR) to the Microsoft
Terminal Services Licensing server.  That CSR was under the control of
the attacker, and it resulted in a certificate where parts of the
certificate could be swapped out with the corresponding fields from
another CSR (which was not submitted to the Certifying Authority)
which had the code signing bit set.

So in order to carry out this attack, not only did the (cough)
"unknown" attackers have to come up with a collision, but the two
colliding blobs had to parse as valid CSRs, one of which had
to pass inspection by the automated CA signing authority, and the
other which had to contain the desired code signing bits set so the
attacker could sabotage an Iranian nuclear centrifuge.

OK, so how does this map to git?  First of all, from a collision
perspective, the two blobs have to map into valid C code, one of which
has to be innocuous enough such that any humans who review the patch
and/or git pull request don't notice anything wrong.  The second has
to contain whatever security backdoor the attacker is going to try to
introduce into the git tree.  Ideally this should also pass muster
with humans who are inspecting the code, but if the attack is targeted
against a specific victim who is not likely to look at the code, it
might be okay if something like this:

#if 0  /* this is needed to make the hash collision work */
aev2Ein4Hagh8eimshood5aTeteiVo9hOhchohN6jiem6AiNEipeeR3Pie4ePaeJ
fo8eLa9ateeKie5VeG5eZuu2Sahqu1Ohai9ohGhuAevoot5OtohQuai7koo4IeTh
ohCefae4Ahkah0eiku2Efo0iuHai8ideaRooth8wVahlia0nuu1eeSh5oht1Kaer
aiJi4chunahK9oozpaiWu7viee5aiFahud6Ee2zieich1veKque6PhiaAit1shie
#endif

... was hidden in the middle of the replacement blob.  One would
*hope*, though, that if something like this appeared in a blob that
was being sent to the upstream repository, that even a sloppy github
pull request reviewer would notice.

That's because in this scenario, the attacker needs to be able to get
the first blob into the git tree first, which means they need to be
trusted enough to get the first blob in.  And so the question which
comes to mind is if you are that trusted (or if the git pull review
process is that crappy), might it not be easier to simply introduce
obfuscated code that has a security weakness?  That is, something from
the Underhanded C contest, or an accidental buffer overrun, hopefully
one that isn't noticed by static code checkers.  If you do that, you
don't even need to figure out how to create a SHA-1 collision.

Does that mean that we shouldn't figure out how to migrate to another
hash function?  No, it's probably worth planning how to do it.  But we
probably have a fair amount of time to get this right.

Cheers,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Malicously tampering git metadata?

2015-12-19 Thread Theodore Ts'o
On Sat, Dec 19, 2015 at 12:30:18PM -0500, Santiago Torres wrote:
> > Now, the crazy behavior where github users randomly and promiscuously
> > do pushes and pull without doing any kind of verification may very
> > well be dangerous. 
> 
> Yes, we were mostly familiar with this workflow before starting this
> research. I can see how the "github generation" is open to many attacks
> of this nature. Would git be interested in integrating a defense that
> covers users of this nature (which seems to be a growing userbase)?

One of the interesting challenges is that git is a pretty low-level
tool, and so people have built all sorts of different workflows on top
of it.

For example, at $WORK, we use gerrit, which is a code review tool, so
all git commits that are to be merged into the "upstream" repository
get pushed to a gerrit server, where they go through a code review
process where a second engineer can review the code, request changes,
make comments, or ask questions, and where the git commits can go
through multiple rounds of review / revision before they are finally
accepted (at least one reviewer must give a +2 review, and there must
be no -2 reviews; and there can be automated tools that do build or
regression tests that can give automated -1 or -2 reviews) --- and
where all of the information collected during the code review process
is saved as part of the audit trail for a Sarbanes-Oxley (SOX)
compliance regime.

Other people use github-style workflows, and others use signed tags
with e-mail code reviews, etc.  And I'm sure there must be many others.

So the challenge is that in order to accommodate many possible
workflows, some of which use third-party tools, changes to make git
more secure for one workflow must not get in the way of these other
workflows --- which means that enforcement of new controls for the
"github generation" probably will have to be optional.  But then
people belonging to the "github generation" can also easily turn off
these features.  And as the NSA learned the hard way in Vietnam, if
the tools cause any inconvenience, or are perceived as constraining
legitimate users, security features have a way of getting turned
off.[1]

[1] A History of US Communications Security, The David G. Boak
lectures, Volume II, "Nestor in Vietnam".  pg 43-44.  (A declassified
version can be found at:
http://www.governmentattic.org/18docs/Hist_US_COMSEC_Boak_NSA_1973u.pdf)

> > But so is someone who saves a 80 patch series from
> > their inbox, and without reading or verifying all of the patches
> > applies them blindly to their tree using "git am" --- or if they were
> > using cvs or svn, bulk applied the patches without doing any
> > verification
> 
> Just out of curiosity, are there known cases of projects in which this
> has happened (I've noticed that both Git and Linux are quite stringent
> in their review/merge process so this wouldn't be the case).

I can't point at specific instances, but given that in the "github
generation", people are fine with blindly pulling someone else's
Docker images and running them on their production servers or
workstations, and where software installation gets done with "wget
http://example.org | bash" or the equivalent, it probably happens more
often than we might be comfortable with.

I also suspect that a bad guy would probably find inserting a
man-in-the-middle server into one of these installation flows is
probably a much more practical attack in terms of real world
considerations.  :-)

Cheers,

- Ted


Re: [RFC] Malicously tampering git metadata?

2015-12-18 Thread Theodore Ts'o
On Tue, Dec 15, 2015 at 10:26:39PM -0500, Santiago Torres wrote:
> 4) The developer pushes to upstream. All the traffic can be re-routed
> back to the original repository. The target branch now contains a
> vulnerable piece of code.

I assume you are assuming here that the "push to upstream" doesn't
involve some kind of human verification?  If someone tried pushing
something like this to Linus, he would be checking the git diff stats
and git commit structure for things that might look like "developer
negligence".  He's been known to complain to subsystem developers in
his own inimitable way when the git commit structure looks suspicious, so
I'm pretty sure he would notice this.

But normally in that development process we don't talk about "pushing
to upstream" as much as "requesting a pull".  So it would be useful when
you describe the attack to explicitly describe the development workflow
that is vulnerable to your attack.

For example, in my use case, I'm authoritative over changes in fs/ext4.
So when I pull from Linus's repo, I examine (using "gitk fs/ext4") all
commits coming from upstream that modify fs/ext4.  So if someone tries
introducing a change in fs/ext4 coming from "upstream", I will notice.
Then when I request a pull request from Linus, the git pull request
describes what commits are new in my tree that are not in his, and
shows the diffstats from upstream.  When Linus verifies my pull, there
are multiple opportunities where he will notice funny business:

a) He would notice that my origin commit is something that is not in
his upstream tree.

b) His diffstat is different from my diffstat (since thanks to the
man-in-the middle, the conception of what commits are new in the git
pull request will be different from his).

c) His diffstat will show that files outside of fs/ext4 have been
modified, which is a red flag that will merit more close examination.
(And if the attacker had tried to introduce a change in fs/ext4, I
would have noticed when I pulled from the man-in-the-middle git
repo.)

Now, if there is zero checking when the user pushes to upstream, then
yes, there are all sorts of potential problems.  But that's one of the
reasons why it's generally considered a good thing for Linux
developers to use as the origin commit for their work official
releases (which can be demarked using GPG-signed git tags).

So for example, the changes for ext4 that were sent to Linus for v4.4
was based off of v4.3-rc2:

git tag  --verify v4.3-rc2
object 1f93e4a96c9109378204c147b3eec0d0e8100fde
type commit
tag v4.3-rc2
tagger Linus Torvalds <torva...@linux-foundation.org> 1442784761 -0700

Linux 4.3-rc2
gpg: Signature made Sun 20 Sep 2015 05:32:41 PM EDT using RSA key ID 00411886
gpg: Good signature from "Linus Torvalds <torva...@linux-foundation.org>" [full]


And the changes which I sent to Linus were also signed by a tag, and
better yet, someone can independently verify this after the fact:

% git show --oneline --show-signature f41683a204ea61568f0fd0804d47c19561f2ee39
f41683a merged tag 'ext4_for_linus_stable'
gpg: Signature made Sun 06 Dec 2015 10:35:27 PM EST using RSA key ID 950D81A3
gpg: Good signature from "Theodore Ts'o <ty...@mit.edu>" [ultimate]
gpg: aka "Theodore Ts'o <ty...@debian.org>" [ultimate]
gpg: aka "Theodore Ts'o <ty...@google.com>" [ultimate]

They can also verify that the chain of commits that I sent to Linus
were rooted in Linus's signed v4.3-rc2 tag, so this kind of
after-the-fact auditing means that if there *were* funny business, it
could be caught even if Linus slipped up in his checking.
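That after-the-fact audit can be scripted.  The sketch below builds a toy
repository (an unsigned tag stands in for a GPG-signed release tag, and all
names are made up; in real use you would run "git tag --verify" on the tag
first) and then checks that everything on a branch is rooted in the tagged
release:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
git tag v4.3-rc2-demo          # in real life: a GPG-signed tag from upstream
git checkout -q -b ext4-demo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "ext4 fix"
# Exits 0 iff the tagged commit is an ancestor of the branch, i.e. the
# chain of commits is rooted in the (signed) release.
if git merge-base --is-ancestor v4.3-rc2-demo ext4-demo; then
    echo "branch is rooted in the signed tag"
fi
```

Combined with verifying the tag's signature, this is the whole
"rooted in Linus's signed v4.3-rc2 tag" check in two commands.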


Now, the crazy behavior where github users randomly and promiscuously
do pushes and pull without doing any kind of verification may very
well be dangerous.  But so is someone who saves an 80-patch series from
their inbox, and without reading or verifying all of the patches
applies them blindly to their tree using "git am" --- or if they were
using cvs or svn, bulk applied the patches without doing any
verification

Cheers,

- Ted


Re: Why not git reset --hard ?

2015-09-28 Thread Theodore Ts'o
I personally have in my .gitconfig:

[alias]
revert-file = checkout HEAD --

I'm not sure revert-file is the best name, but it's what I've used
because I've been contaminated by the concept/naming of "p4 revert",
which I do use a fair amount to undo local edits for one or more files
when I've been forced to use perforce or perforce-like systems.  Given
that it confuses the concept of how "git revert" works, maybe
something like "git unedit " would work better.

Given that it's so easy to address this with a single line in a
user's .gitconfig, I guess the question is whether it's worthwhile to
make a change that would be visible to all users, and perhaps more
importantly, all new users to git.

  - Ted


Re: Specifying N revisions after the initial commit

2015-09-22 Thread Theodore Ts'o
On Tue, Sep 22, 2015 at 04:11:23PM -0400, Josh Boyer wrote:
> Oh, context would help, yes.  In the case of the tree I'm parsing, I
> know for a fact that the commit history is entirely linear and will
> (should) always remain so.  E.g.
> 
> A - B - C - D - E - F ... {N}
> 
> So yes, finding e.g. the second commit after the root is complicated
> for something resembling anything like a typical git repo, but this
> isn't like that.  In other words, I can cheat.  Or at least I'm pretty
> sure I can cheat :).

I'd suggest making your script make sure "git rev-list --merges A..N"
doesn't output any commits, so you know for sure that the commit
history is linear.  That way you'll be certain that you can cheat.  :-)
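A minimal sketch of that guard (toy repository; the commit messages are made
up): refuse to "cheat" unless the range really is merge-free, after which the
Nth commit past the root is just the Nth line of a reversed rev-list.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
for i in 1 2 3 4; do
    git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "commit $i"
done
root=$(git rev-list --max-parents=0 HEAD)
# Guard: any output here means a merge commit exists and cheating is unsafe
[ -z "$(git rev-list --merges "$root"..HEAD)" ] || { echo "not linear" >&2; exit 1; }
# Linear, so "the 2nd commit after the root" is well defined:
git rev-list --reverse "$root"..HEAD | sed -n 2p | xargs git log -1 --format=%s
```

On the toy history above this prints "commit 3", the second commit after the
root.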

- Ted


Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-24 Thread Theodore Ts'o
On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote:
 
 Regarding loose object files, given that we write to a temporary,
 optionally fsync, close and then move to the final name, would we
 still see partially written file if we omit the fsync, or would the
 corruption be limited to either empty or missing?

*Most* of the time the corruption will be an empty or missing file.
It's possible that the file could be partially written.  This is a
relatively low-probability event, with the probability going up if the
object file is large, and/or if the system is under memory pressure.

 The reason I am wondering is because the codepath to create an
 object (i.e. update-index --add, hash-object -w, or add) first
 checks if a packed or a loose object file _exists_ and if so
 bypasses writing the same thing anew, but the existence check for a
 loose object is to merely making sure that access(F_OK) (and
 optionally utime()) succeeds.  If the potential breakage is limited
 to truncation to empty, then we could replace it with stat(2) and
 st.st_size check, as no loose object file can be empty.

It would certainly be a good thing to do an st_size check; it can't
possibly hurt, and it will catch a large number of failures after a
power failure.  I could also imagine some heuristics that force an
fsync if the object file is larger than a certain size (say, 4k if you
are very paranoid, a few hundred kilobytes if you are less so), but
past a certain point, it might be better just to tell the user to use
fsyncObjectFiles and be done with it.
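The size heuristic might look something like this sketch (the threshold and
function names are hypothetical, and this is not git's actual code path):
write to a temporary file, fsync only past a cutoff, then rename into place.

```python
import os
import tempfile

FSYNC_THRESHOLD = 4096  # hypothetical cutoff: be paranoid past one page

def write_object(path: str, data: bytes) -> None:
    """Write-then-rename, fsync'ing only objects past the size cutoff."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        if len(data) >= FSYNC_THRESHOLD:
            os.fsync(fd)        # force large objects out to stable storage
    finally:
        os.close(fd)
    os.rename(tmp, path)        # atomic replace on POSIX filesystems

d = tempfile.mkdtemp()
write_object(os.path.join(d, "obj-large"), b"x" * 8192)  # fsync'ed
write_object(os.path.join(d, "obj-small"), b"tiny")      # left to writeback
```

Small objects keep the fast path; large ones, which are the likeliest to be
caught half-written by a crash, pay for the fsync.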

- Ted


Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Theodore Ts'o
The main issue is that non-expert users might not realize that they
really need to run git fsck after a crash; otherwise, what might
happen is that although git is only appending, that you might have
some zero-length (or partially written) git object or pack files, and
while you might not notice at that moment, it might come and bite you
later.  If you do try to recover immediately after a crash, in the
worst case you might have to redo that "git am -s
/tmp/mbox-filled-with-patches" command, but otherwise you won't lose
much data.

So perhaps one alternative strategy to calling fsync(2) after every
single git object file write might be to have git create a zero-length
".git/in-progress-pid" file, which gets fsync'ed, and then it can do
the "git am -s /tmp/akpm-big-bag-o-patches" processing nice and fast,
and once git is done, it can call sync(2) and then delete the
in-progress file.

If there is an in-progress file in the .git directory, git would then
automatically run git fsck to make sure the repository is consistent.
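A sketch of that scheme (toy repository; the marker file name is made up):
create and sync a marker, do the fast un-fsync'ed work, issue one sync(2),
and remove the marker.  A marker that survives a crash would trigger "git
fsck" on the next run.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
marker=.git/in-progress-$$
: > "$marker"; sync                 # marker reaches disk before the fast work
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "fast work"
sync                                # one sync(2) instead of per-object fsyncs
rm -f "$marker"
# Recovery check, as it would run on the next startup:
if ls .git/in-progress-* >/dev/null 2>&1; then
    git fsck                        # marker survived: repository is suspect
else
    echo "clean shutdown"
fi
```

The cost is two whole-system syncs per batch instead of one fsync per object,
which is exactly the trade the "nice and fast" batch processing wants.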

For people who care, maybe that's a good compromise.  (Me, the way
things are right now is just fine since I have a nice fast SSD, and so
setting fsyncObjectFiles is a perfectly fine thing as far as I am
concerned. :-)

   - Ted


Re: broken repo after power cut

2015-06-22 Thread Theodore Ts'o
On Mon, Jun 22, 2015 at 01:19:59PM +0200, Richard Weinberger wrote:
 
  The bottome lins is that if you care about files being written, you
  need to use fsync().  Should git use fsync() by default?  Well, if you
  are willing to accept that if your system crashes within a second or
  so of your last git operation, you might need to run git fsck and
  potentially recover from a busted repo, maybe speed is more important
  for you (and git is known for its speed/performance, after all. :-)

I made a typo in the above.  s/second/minute/.  (Linux's writeback
timer is 30 seconds, but if the disk is busy it might take a bit
longer to get all of the data blocks written out to disk and
committed.)

 I think core.fsyncObjectFiles documentation really needs an update.
 What about this one?
 
 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index 43bb53c..b08fa11 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -693,10 +693,16 @@ core.whitespace::
  core.fsyncObjectFiles::
   This boolean will enable 'fsync()' when writing object files.
  +
 -This is a total waste of time and effort on a filesystem that orders
 -data writes properly, but can be useful for filesystems that do not use
 -journalling (traditional UNIX filesystems) or that only journal metadata
 -and not file contents (OS X's HFS+, or Linux ext3 with data=writeback).
 +For performance reasons git does not call 'fsync()' after writing object
 +files. This means that after a power cut your git repository can get
 +corrupted as not all data hit the storage media. Especially on modern
 +filesystems like ext4, xfs or btrfs this can happen very easily.
 +If you have to face power cuts and care about your data it is strongly
 +recommended to enable this setting.
 +Please note that git's behavior used to be safe on ext3 with data=ordered,
 +for any other filesystems or mount settings this is not the case as
 +POSIX clearly states that you have to call 'fsync()' to make sure that
 +all data is written.


My main complaint about this is that it's a bit Linux-centric.  For
example, the fact that fsync(2) is needed to push data out of the
cache is also true for MacOS (and indeed all other Unix systems going
back three decades) as well as Windows.  In fact, it's not so much a
matter of what POSIX says as what POSIX documented, but since
standards are held in high esteem, it's sometimes a bit more
convenient to use them as an appeal to authority.  :-)

(Ext3's data=ordered behaviour is an outlier, and in fact, the reason
why it was mostly safe to skip fsync(2) calls when using ext3
data=ordered was an accidental side effect of another problem ext3 was
trying to solve, based on the relatively primitive way it handled
block allocation.)

Cheers,

- Ted


Re: broken repo after power cut

2015-06-21 Thread Theodore Ts'o
On Sun, Jun 21, 2015 at 03:07:41PM +0200, Richard Weinberger wrote:

  I was then shocked to learn that ext4 apparently has a default
  setting that allows it to truncate files upon power failure
  (something about a full journal vs a fast journal or some such)

s/ext4/all modern file systems/

POSIX makes **no guarantees** about what happens after a power failure
unless you use fsync() --- which git does not do by default (see below).

 You mean the ext4 delayed block allocation feature/issue?
 IIRC Ted added some hacks to ext4 to detect misbehaving applications (Gnome 
 and KDE).
 But to my knowledge such an file corruption must not happen if the 
 application behaves well. And it can happen on all file systems.
 Ted, maybe you can help us? BTW: I'm using ext4's default mount options from 
 openSUSE, data=ordered.

The hacks (which were agreed upon by all of the major file system
developers --- ext4, btrfs, xfs --- at the Linux File Systems and
Storage summit a couple of years ago) protect against the default
text editors of GNOME and KDE, which were saving files without using
fsync(), and in one particularly egregious example (although I don't
remember which program was doing this), updated files by opening the
file with O_TRUNC and then rewriting the new contents of the file.  So
if you crashed just after the open(2), and before the file data was
written, you were guaranteed to lose data.

The hack protects against data loss when programs update a file
incompetently.  What we agreed to do was that upon renaming a fileA on
top of another fileB, there is an implicit writeback initiated of
fileA.  If the program properly called fsync(2) before closing the
file descriptor for fileA and doing the rename, this implicit
writeback would be a no-op.  Similarly, if a file descriptor was opened
with O_TRUNC, when the file descriptor is closed, we start an implicit
writeback at that point.  Note that this is not the same as a full
fsync; it merely closes the race window from 30 seconds down to a
second or so (depending on how busy the disk is).
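In other words, a well-behaved updater makes the implicit writeback a no-op
by fsync'ing before the rename.  A minimal sketch of that pattern (the
function name is made up):

```python
import os
import tempfile

def replace_atomically(path: str, data: bytes) -> None:
    """Update `path` the well-behaved way: temp file, write, fsync, rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        os.write(fd, data)
        os.fsync(fd)        # data is durable BEFORE the new name is visible,
                            # so the filesystem's implicit writeback is a no-op
    finally:
        os.close(fd)
    os.rename(tmp, path)    # atomically replaces the old contents

d = tempfile.mkdtemp()
target = os.path.join(d, "settings")
replace_atomically(target, b"old\n")
replace_atomically(target, b"new\n")  # readers only ever see old or new
```

A crash at any point leaves either the complete old contents or the complete
new contents at `path`, never a truncated or half-written file.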

But this hack does not protect against freshly written files, which is
the case of git object files or git pack files.  The basic idea here
is that you could have just as easily crashed before the commit as
after the commit, and doing an implicit writeback after all file
closes would have destroyed performance and penalized programs that
didn't really care so much about the file hitting disk.  (For example,
if you do a compile, and you crash, it's not such a big deal.)

The bottom line is that if you care about files being written, you
need to use fsync().  Should git use fsync() by default?  Well, if you
are willing to accept that if your system crashes within a second or
so of your last git operation, you might need to run git fsck and
potentially recover from a busted repo, maybe speed is more important
for you (and git is known for its speed/performance, after all. :-)

The actual state of the source tree would have been written using a
text editor which tends to be paranoid about using fsync (at least, if
you use a real editor like Emacs or Vi, as opposed to the toy notepad
editors shipped with GNOME or KDE :-).  So as long as you know what
you're doing, it's unlikely that you will actually lose any work.

Personally, I have core.fsyncobjectfiles set to yes in my .gitconfig.
Part of this is because I have an SSD, so the speed hit really doesn't
bother me, and needing to recover a corrupted git repository is a pain
(although I have certainly done it in the past).

- Ted


Re: co-authoring commits

2015-06-17 Thread Theodore Ts'o
On Wed, Jun 17, 2015 at 10:26:32PM +0200, Tuncer Ayaz wrote:
 
 By allowing multiple authors, you don't have to decide who's the
 primary author, as in such situations usually there is no primary at
 all. I sometimes deliberately override the author when committing and
 add myself just as another co-author in the commit message, but as
 others have noted it would be really great if we can just specify
 multiple authors.

Just recently, there was a major thread on the IETF mailing list where
an IETF working group had drafts where people were listed as co-authors
without their permission, and were upset that the fact that their name
was added made it seem as if they agreed with the end product (i.e.,
that they were endorsing the I-D).  So while adding a formal coauthor
might solve (a few) problems, it can also introduce others.

Ultimately there is one person who can decide which parts of the
changes to put in the commit that gets sent to the maintainer.  So
there *is* someone who is the primary author; the person who takes the
final pass on the patch and then hits the send key.

One could imagine a frankly quite rare example where there is a
team of people who vote on each commit before it gets sent out and
where everyone is equal and there is no hierarchy.  In that case,
perhaps you could set the From field to a mailing list address.  But
honestly, how often is it that *all* of the authors are completely
equal[1]?

In my personal practice, if I make significant changes to a patch, I
will indeed simply change the submitter, and then give credit to the
original author.  This is the case where I'm essentially saying, "Bob
did a lot of work, but I made a bunch of changes, so if things break
horribly, blame *me*, not Bob."

Alternatively, if I just need to make a few cosmetic changes to
Alice's patch (i.e., fix white spaces, correct spelling, change the
commit description so it's validly parsable and understandable
English, etc.), I'll just add a comment in square brackets indicating
what changes I made before I committed the change.  This seems to work
just fine, and I don't think we should try to fix something that isn't
broken.

- Ted


[1]  Gilbert and Sullivan attacked this notion in a comedic way in
The Gondoliers, especially in the songs "Replying We Sing As One
Individual" and "There Lived a King":

 https://www.youtube.com/watch?v=YD0dgXTQ3K0
 https://www.youtube.com/watch?v=oSaVdqcDgZc


Re: git rebase: yet another newbie quest.

2014-09-08 Thread Theodore Ts'o
On Mon, Sep 08, 2014 at 05:52:44PM +0400, Sergey Organov wrote:
 
 I didn't intend to make a topic branch from the very beginning, and
 already made a commit or two on the remote tracking branch before I
 realized I'd better use a topic branch. It'd create no problem as far as I
 can see, provided vanilla git rebase has sane defaults. That said,
 I've already been once pointed to by Junio that my definition of "sane"
 doesn't take into account workflows of others, so now I try to be
 careful calling vanilla git rebase names.

Right, so what I typically do in that situation is the following:

on the master branch
hack hack hack
git commit
hack hack hack
git commit
oops, I should have created a topic branch
git checkout -b topic-branch
git branch -f master origin/master

This resets the master branch to only have what is in the upstream
commit.

 Please also notice that I didn't pull immediately after I've re-arranged
 my branches, and this fact only made it more difficult to find and
 isolate the problem.

It's also the case that I rarely will do a git rebase without taking
a look at the branches to make sure it will do what I expect.  I'll do
that using either gitk or "git lgt", where "git lgt" is defined in my
.gitconfig as:

[alias]
lgt = log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit

And typically what I will do is something like this:

gitk -20 master origin/master topic

-or-

git lgt -20 master origin/master topic

The git lgt command is very handy when I want to see how the
branches are arranged, and I'm logged remotely over ssh/tmux or some
such, so gitk isn't really available to me.

Cheers,

- Ted


Re: git rebase: yet another newbie quest.

2014-09-08 Thread Theodore Ts'o
On Mon, Sep 08, 2014 at 07:47:38PM +0400, Sergey Organov wrote:
 
 except that I wanted to configure upstream as well for the topic-branch,
 which looks like a pretty legit desire. If I didn't, I'd need to specify
 the upstream explicitly in the git rebase, and I'd not notice the problem
 at all, as the actual problem is that "git rebase" and "git rebase
 upstream" work differently!

Right, so I never do that.  I have master track origin/master, where
it automagically does the right thing, but I'm not even sure I can
articulate what it *means* to have topic also track origin/master.  I
just don't have a mental model for it, and so it falls in the category
of "it's too complicated for my simple brain to figure out".

So I just do "git rebase master", and I would never even *consider*
doing a "git pull --rebase".  I'll do a "git fetch", then look at
what just landed, then check out master, update it to
origin/master, and run the regression tests to make sure what
just came in from outside actually was *sane*; only then would I
do a "git checkout topic; git rebase master" and re-run the
regression tests a third time.

Otherwise, how would I know whether the regression came in from
origin/master, or from my topic branch, or from the result of rebasing
the topic branch on top of origin/master?

And of course, this goes back to my observation that I don't rebase my
topic branches all that often anyway, just because the moment you do
the rebase, you've invalidated all of the testing that you've done to
date.  In fact, some upstreams will explicitly tell you never to
rebase a topic branch before you ask them to pull it in, unless you
need to handle some non-trivial merge conflict.

Cheers,

- Ted


Re: git rebase: yet another newbie quest.

2014-09-05 Thread Theodore Ts'o
I'm not going to say what you *should* have done, since it's not clear
whether anything close to what you were doing is a supported workflow.
But I can tell you what I *do* myself.  Personally, I vastly distrust
"git pull --rebase".

So in general, my pulls are all the equivalent of "git pull
--ff-only", and I rarely rebase the topic branch (rebasing it
regularly is in general a bad idea; I will generally not do it at
all until I'm almost done).  So I'll branch the topic branch off of
origin (which tracks origin/master, typically):

git checkout -b topic1 origin
hack hack hack
git commit
.
.
.


Then I might do something like this to do a build:

git fetch origin ; git branch -f origin origin/master   # this is optional
git checkout -B build origin
git merge topic1
git merge topic2
...
make

In general, I will only rebase a topic branch when it's needed to fix
a serious conflict caused by significant changes upstream.  And in
that case, I might do something like this:

git checkout topic1
git rebase origin/master
make
make check


This basically goes to a philosophical question of whether it's
simpler to tell users to use a single command, such as "git pull
--rebase", or whether to tell users to use 2 or 3 commands that
are conceptually much simpler.  Personally, I type fast enough that I
tend to use simple commands, and not try to use things like automated
branch tracking.  That way I don't have to strain my brain thinking
about things like fork points.  :-)

OTOH, some people feel that it's better to make things like "git pull
--rebase" work and do the right thing automatically, because a
competing DSCM allows you to do it in a single command.  And indeed,
if you use "git pull --rebase" without any topic branches, it works
fine.  But then when you start wanting to do things that are more
complicated, the automated command starts getting actually harder and
more confusing (at least in my opinion).

I don't know if a workflow involving topic branches was even expected
to work with "git pull --rebase", and if so, how to set things up so
that they do work smoothly.  All I know is that the issue never arises
with me, because it's rare that I use "git pull", let alone "git pull
--rebase".  That's because I usually like to take a quick look at what
I've pulled (using gitk, or git log) before doing the merge operation.

If I'm doing a pull from a repo that I control, where I'm sure I
know what's there, I might skip the "git fetch" and do a "git
pull --ff-only" instead.  But in general I prefer to do the merging
separately from the pull operation.

Cheers,

- Ted

P.S.  There is a separate, and completely valid discussion which is
how to prevent a newbie from falling into the same trap you did.  I'll
defer that discussion to others...



Re: [ANNOUNCE] Git v2.1.0

2014-08-15 Thread Theodore Ts'o
On Fri, Aug 15, 2014 at 03:46:29PM -0700, Junio C Hamano wrote:
 The latest feature release Git v2.1.0 is now available at the
 usual places.

I pulled down git v2.1.0, and when I tried to build it via:

   make prefix=/usr/local profile-fast

The build died with this:

cannot open test-results/p5302-pack-index.subtests: No such file or directory 
at ./aggregate.perl line 77.
Makefile:7: recipe for target 'perf' failed
make[2]: *** [perf] Error 2
make[2]: Leaving directory '/usr/projects/git/git/t/perf'

Not a big deal, but I thought I would mention it.

Cheers,

- Ted


Re: Use case (was Re: Should branches be objects?)

2014-06-25 Thread Theodore Ts'o
On Wed, Jun 25, 2014 at 10:42:49AM -0700, Junio C Hamano wrote:
 Nico Williams n...@cryptonector.com writes:
 
  On Tue, Jun 24, 2014 at 6:09 AM, Theodore Ts'o ty...@mit.edu wrote:
  ...
  This seems pretty close to what we have with signed tags.  When I send
  a pull request to Linus, I create a signed tag which contains a
  message about a set of commits, and this message is automatically
  included in the pull request message generated with git
  request-pull, and when Linus merges my pull request, the
  cryptographically signed tag, along with the message, date of the
  signature, etc., is preserved for all posterity.
 
  Thanks for pointing this out.  Signed tags are objects -- that's a
  clear and strong precedent..
 
 Sounds as if you are interpreting what Ted said as a supporting
 argument for having branches as separate type of objects, but the
 way I read it was signed tags are sufficient for what you want to
 do; adding a new branch type does not make much sense at this
 point.

Yes, that's what I was saying.  If you want to record a reliable who
pushed this (or who requested this to be pulled), you really want
to use a GPG signature, since otherwise the identity of the pusher can
be completely faked --- especially if you have a tiered system
where you have sub-maintainers in the mix.  So if you want any kind of
auditability long after the fact, you want digital signatures, and so
a signed tag maps exactly to what you want --- modulo needing a
standardized Linus Torvalds bot.  But the nice thing about creating
such an automated pull request processing system is that it doesn't
require making any changes to core git.

If you insist that it has to be done via a git push, I suspect it
wouldn't be that hard to add changes to Gerrit (which already has a
concept of access control over which ssh keys are allowed to push a
change), and extended it to include a hook that validated whether the
push included a signed tag.  Again, no core changes needed to git, or
to the repository format.
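
A minimal sketch of such a server-side check, written as a standard
git update hook; the policy details are assumptions, and nothing here
is Gerrit-specific:

```shell
# Write the hook sketch to a file (on a server it would live at
# .git/hooks/update).  An update hook receives <refname> <old> <new>
# and rejects the push by exiting non-zero.
cat > update-hook-sketch <<'EOF'
#!/bin/sh
refname="$1"; newrev="$3"

case "$refname" in
refs/tags/*)
    # git verify-tag exits non-zero unless the tag carries a valid
    # GPG signature from a key in the server's keyring.
    git verify-tag "$newrev" >/dev/null 2>&1 || {
        echo "rejected: $refname is not a validly signed tag" >&2
        exit 1
    }
    ;;
esac
exit 0
EOF
chmod +x update-hook-sketch
```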

- Ted


Re: Use case (was Re: Should branches be objects?)

2014-06-24 Thread Theodore Ts'o
On Mon, Jun 23, 2014 at 10:20:14PM -0500, Nico Williams wrote:
 
 Now, suppose that branches were objects.  Then at push time one might
 push with a message about the set of commits being pushed, and this
 message (and time of push, and pusher ID) would get recorded in the
 branch object.  At fetch time the branch objects's histories would be
 pulled (but usually never pushed), and would be available for browsing
 with git log at remotes/remote/branch.  Each commit of the branch
 object (as it were) would record each logical set of commits.

This seems pretty close to what we have with signed tags.  When I send
a pull request to Linus, I create a signed tag which contains a
message about a set of commits, and this message is automatically
included in the pull request message generated with git
request-pull, and when Linus merges my pull request, the
cryptographically signed tag, along with the message, date of the
signature, etc., is preserved for all posterity.

 Problem: if pushing via an intermediary the push metadata would get
 lost.  This would argue for either a stronger still notion of related
 commits, or none stronger than what exists now (because ETOOMUCH).
 But this branch object concept could also be just right: if pushing
 through an intermediary (what at Sun was called a project gate) then
 it becomes that intermediary's (gatekeeper's) job to squash, rebase,
 regroup, edit, drop, reword, ... commits.

With signed tags, the metadata is preserved even when the set of
commits is sent via an intermediary.

It seems the major difference is that it's a pull model, where some
projects seem much happier with a push model.  But that sounds like
what is needed is that someone replaces Linus Torvalds with a shell
script --- namely, an e-mail bot that receives pull requests, checks
the signed tag against an access control list, and if it is an
authorized committer, accepts the pull request automatically (or
rejects it if there are merge conflicts).

Not that I am suggesting for even a second that Linus could be fully
replaced by a shell script.  For example, he handles trivial merge
conflicts, and more importantly, applies an "oh my G*d, you must be
kidding" taste filter on incoming pull requests, which I think would
be hard to automate.  Then again, neural networks have automatically
evolved to recognize cat videos, so we can't rule it out in the
future.  :-)

Cheers,

- Ted


Re: [GUILT v2 00/29] Teach guilt import-commit how to create legal patch names, and more

2014-05-13 Thread Theodore Ts'o
On Tue, May 13, 2014 at 10:30:36PM +0200, Per Cederqvist wrote:
 I recently found myself sitting on a train with a computer in front of
 me.  I tried to use guilt import-commit, which seemed to work, but
 when I tried to guilt push the commits I had just imported I got
 some errors.  It turned out that guilt import-commit had generated
 invalid patch names.

Thanks, I ran into this just last night (although I had manually
created the patch file from an e-mail I received instead of using
guilt import-commit).

  - Changed behavior: by default, guilt no longer changes branch when
you push a patch.  You need to do git config guilt.reusebranch
false to re-enable that.  This patch sets the default value of
guilt.reusebranch to true; it should in my opinion change to false
a year or two after the next release.

We've been living with the origin -> guilt/origin branch change
for a year already, and in fact, these days I've gotten used to the
new behavior.  Is it really worth it to change the default?

   - Ted


Re: What is missing from Git v2.0

2014-04-25 Thread Theodore Ts'o
On Fri, Apr 25, 2014 at 09:48:53AM +0200, Philippe Vaucher wrote:
 
 I agree. The stage area is a very important concept in git, why not
 talk git commands that refers to it? Then we could add flags like
 --new-files or --deleted-files for better granularity than the current
 --all flag.

One caution: The term stage/staged is already a little overloaded.
We generally use the word staged to refer to changes that are in the
index, but the term stage as a noun generally refers to referencing
the different versions of a file during a merge operation (cf git
ls-files --stage).
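
The two senses can be seen side by side in a self-contained demo:

```shell
# After "git add" a file is "staged" (in the index), while
# "git ls-files --stage" shows the stage *number*: 0 outside a merge,
# 1-3 labelling base/ours/theirs versions during a conflict.
git init -q stage-demo && cd stage-demo
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q --allow-empty -m base
echo hello > f
git add f              # f is now "staged"
git ls-files --stage   # prints mode, blob sha, stage number 0, name
```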

 I think starting by documenting the issues is a good idea, maybe on a
 wiki, and start some draft of a proposed solution that would improve
 in an iterative process.

And it would be nice if the issues were discussed in a way that
acknowledged that all changes have tradeoffs, both positive and
negative, and to clearly articulate whether the concern is just
someone going uh, 'index' is a weird term, but once they learn it,
it's pretty clear, versus a case where there is continuous confusion
due to overloaded meanings, or for people for whom English might not
be the first language.

And most importantly, to avoid rhetoric.  In fact, given that strong
use of rhetoric is often used to disguise a weakness of a position
that can't be defended using logic and data, someone who tries to win
arguments using the last post wins style of discourse, and a heavy
use of rhetoric, may find that people just simply decide that it's a
better use of their time not to engage and to just kill the entire
thread.

Regards,

- Ted


Re: What is missing from Git v2.0

2014-04-25 Thread Theodore Ts'o
On Fri, Apr 25, 2014 at 04:23:43PM +0200, Philippe Vaucher wrote:
 
 I agree, but I think it's better than index tho. That one is heavily
 overloaded and easily confused with other meaning in other softwares.

There is a big difference between a term being used in a different sense
than other software --- there is a one-time learning curve after which
point people can generally understand that a term in a given context
has a single meaning --- and when we have two very easily confused
terms (i.e., stage versus staged) or a single identical term,
overloaded within a single context.  So I'm much more worried about
the git documentation using the same term or two closely related terms
in an overloaded fashion, much more than I am with index meaning one
thing for databases, and another thing for book publishers, and yet
another for compilers.

 Yes, of course there should be a list of both positive and negative
 tradeoffs. But I think the overloaded argument can be easily solved
 by renaming one of the overloads.

And renaming a term also has costs, especially if it is one
that is in use in large amounts of documentation, both in the git man
pages, and in web pages across the web.

And my plea for data extends even here.  For example, things like
this:

www.google.com/trends/explore#q=git%20staging%20area%2C%20git%20index&cmpt=q

 Unfortunately yes, I see many people being silly in order to win
 arguments, both in the pro-changes and against-changes side of the
 discussion. It'd be much simpler to simply gather arguments on some
 wiki and eventually do a vote when the list is complete about the
 proposed change.

Voting is not a good way to do software development.  That way lies
people wanting to whip up clueless folks using rhetoric (exhibit one:
Fox News) to vote and so it's not necessarily the best way to make
thoughtful decisions.  Using hard data, including possibly formal UX
experiments, is a much better way to make such decisions.

Cheers,

- Ted



Re: What is missing from Git v2.0

2014-04-24 Thread Theodore Ts'o
On Thu, Apr 24, 2014 at 03:23:54AM -0500, Felipe Contreras wrote:
 
 There is evidence for the claim that there won't be those problems. You have
 absolutely no evidence that there will.

Felipe,

It's clear that you've not been able to produce evidence that can
convince most of the people on this thread.  Simply repeating the same
assertions over and over again, in a shrill fashion, is not likely to
convince those of us who think that this would not be a good idea for git
v2.0.

Creating a ~/.gitconfig file if one doesn't already exist is one I agree
with, and at least on Unix systems, telling them that the config file
lives in ~/.gitconfig, or where ever it might happen to be on other
platforms, is a good one.  If it's in some really weird place on
Windows, then sure, we can tell them about git config -e.  But the
point is to let the user look at the default .gitconfig file, where we
can put in comments to help explain what is going on, and perhaps have
links to web pages for more information.

I don't even think we need to query the user to fill out all of the
fields.  We can prepopulate a lot of the fields (name, e-mail address,
etc.) from OS specific defaults that are available on most systems ---
specifically, the default values we would use for the name and e-mail
address if they are not specified in a config file.

We can just tell the user that we have created a default .gitconfig
file, and tell them how they can take a look at it.
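
What that could look like is easy to sketch; everything below (file
layout, fallback rules) is an assumption, and it writes a sample file
in the current directory rather than touching a real ~/.gitconfig:

```shell
#!/bin/sh
# Hypothetical sketch: build a commented default gitconfig from OS
# defaults.  Writes ./gitconfig.sample so it never clobbers a real one.
conf=gitconfig.sample

# Full name from the passwd GECOS field, falling back to the login name.
name=$(getent passwd "$(id -un)" 2>/dev/null | cut -d: -f5 | cut -d, -f1)
[ -n "$name" ] || name=$(id -un)
email="$(id -un)@$(uname -n)"

cat > "$conf" <<EOF
# Default git configuration, generated automatically.
# Edit this file directly, or run "git config --global -e".
[user]
        name = $name
        email = $email
EOF
echo "Created $conf; please review it."
```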

In the long term, if the worry is how to bridge the gap between
complete newbies, one way of dealing with this is to have a tutorial
mode (off by default, on in the default .gitconfig) which dispenses
some helpful hints at certain strategic points (i.e., after five
commits, give a message that introduces git log --oneline, after the
third merge commit is created by the user, give a message which
introduces git log --merge, and so on).  The challenge is not straying
over the line to the point where the hints become as annoying as
clippy, but that is what UX labs are for, to tune the experience for
completely new users to git.

Without doing a formal UX experiment, all of us are going to making
assertions without formal evidence --- at best some of us who have
tutored a few newbies might have some anecdotes, but remember the old
saying about the plural of anecdote not being data.

Cheers,

- Ted


Re: What is missing from Git v2.0

2014-04-22 Thread Theodore Ts'o
On Tue, Apr 22, 2014 at 02:23:18PM -0500, Felipe Contreras wrote:
  I am not fundamentally opposed.  I just do not think it would add
  much value to new people at this point, and it will actively hurt
  if we shoved barely cooked one in 2.0.
 
 You are probably biased in that you've used Git far more than
 the average user has (or future new users).

I think Junio has a really strong point.  If the goal is to make life
easier for new users, allowing them to save a few keystrokes is
probably not the most significant thing we can do.  And we have to
balance this with the additional cognitive load in remembering how a
particular two character alias maps to the real command.  This is
especially true for commands which might not be used as often -- e.g.,
rebase, and for commands where the meaning of git commit without
any argument is qualitatively different from what ci (for checkin)
means in most other source management systems.

So I do think it's worth thinking about this very carefully.  For
certain, I would **not** recommend using shortcuts in example command
sequences.  If the user reads git rebase or git cherry-pick it
means a lot more than if they see a series of apparent chicken
scratches filled with things like git rb, git pi, git st, etc.

In fact, to be fair, you may be getting biased because you're used to
using the two character shortcuts, so for you, of *course* rb and
pi and ci make a lot of sense.  But for someone who is starting
from scratch, I really question how much it helps, and how much it
might hurt, to see the two character shortcuts or even to have to
remember the two character shortcuts.  And for a command like rebase
where the user can very easily shoot themselves in the foot to begin
with, I'd actually suggest that it's a _good_ thing that they have to
type it out in full.

- Ted


Re: What is missing from Git v2.0

2014-04-21 Thread Theodore Ts'o
On Mon, Apr 21, 2014 at 09:47:57PM +0200, Sebastian Schuberth wrote:
 On Mon, Apr 21, 2014 at 9:34 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:
 
  I have these aliases as well, except br = b, and cp = pi. 'br' is probably
  better, but not sure as 'cp' which can be confusing.
 
 If by confusing you refer to cp to copy files, that's actually what
 I like about it: cherry-pick is somewhat like copying commits, thus
 cp makes much sense to me.

The problem is that between git rm and git mv, if we default git
cp to mean cherry-pick there could easily be user confusion.

I'm not sure that cherry-pick is used so often that it really needs a two
character shortcut.  Maybe just git pick?
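
For those who do want such shortcuts, git already supports per-user
aliases, one command per alias (note these write to the real
~/.gitconfig; the names are just the ones from this thread, not
recommendations):

```shell
# Each alias becomes an entry in the [alias] section of ~/.gitconfig.
git config --global alias.pick cherry-pick
git config --global alias.br branch
git config --global alias.co checkout

# After this, "git pick <commit>" runs "git cherry-pick <commit>".
# Querying the alias prints its expansion:
git config --global alias.pick
```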

Personally, git branch and git checkout are finger macros that I
type very quickly, so creating two character alias probably wouldn't
save me that much time.  But I do appreciate that there are folks for
which such aliases might be useful.

- Ted


Re: `git stash pop` UX Problem

2014-02-26 Thread Theodore Ts'o
On Tue, Feb 25, 2014 at 11:12:10AM -0800, Junio C Hamano wrote:
 So, I tend to agree with you, while I do understand where "I want to
 know about what is in stash" is coming from (and that is why we do
 have the "git stash list" command).

One thing that would be nice is if there was a built-in git stash list
option which only shows the stash items which match the current
branch.  The discussion on this thread inspired me to create the
following:

#!/bin/sh

b=$(git symbolic-ref HEAD | sed -e 's;refs/heads/;;')
git stash list --pretty="%gd %cr on: %s" | grep "WIP on $b" | \
	sed -e "s/ WIP on $b: [0-9a-f]*//"

This results in:

stash@{0} 4 weeks ago on: mke2fs: add make_hugefile feature
stash@{1} 5 weeks ago on: e2fsck, mke2fs: enable octal integers in the profile/config file
stash@{2} 5 weeks ago on: e2fsck, mke2fs: enable octal integers in the profile/config file
stash@{3} 5 weeks ago on: mke2fs: optimize fix_cluster_bg_counts()
stash@{4} 8 weeks ago on: e4defrag: choose the best available posix_fadvise variant
stash@{5} 9 weeks ago on: e2image: add -c option to optimize file system copying for flash devices
stash@{6} 9 weeks ago on: e2image: clean up gcc -Wall and sparse nits
stash@{7} 9 weeks ago on: e2fsck: fix printf conversion specs in ea_refcount.c

(Yes, I have a lot of junk on my git stash; showing the relative time
is going to help me GC what I have left on my git stash list.)

Cheers,

- Ted



Re: [PATCH] commit: Add -f, --fixes commit option to add Fixes: line

2013-10-27 Thread Theodore Ts'o
One of the uses of the Fixes: commit line is that when we fix a
security bug that has been in mainline for a while, it can be tricky
to determine whether it should be backported in to the various stable
branches.  For example, let's suppose the security bug (or any bug,
but one of the contexts where this came up was for security fixes) was
introduced in 3.5, and backported into the 3.2.x kernel series, but
couldn't be applied into the 3.2.0 kernel series.  The security fix
was introduced in 3.12, and so it would be obvious that it should be
backported to the 3.10 kernel series, but it might not be so obvious
that it would also be required for the 3.2.x long-term stable series.

So the inclusion of the Fixes: line provides this critical bit of
information.  It's also useful not just for the long-term stable tree
maintainers, but the maintainers of distro kernels would also find it
to be very useful.

 I see that there a consistency check that the --fixes argument is a
 valid commit.  But is there/should there be a check that it is an
 ancestor of the commit being created?  Is there/should there be a check
 that both of these facts remain true if the commit containing it is
 rebased, cherry-picked, etc?
 
 In workflows that make more use of cherry-picking, it could be that the
 original buggy commit was cherry-picked to a different branch.  In this
 case the user would probably want to cherry-pick the fixing commit to
 the other branch, too.  But then the commit that it would be fixing
 would have a different SHA-1 than it did on the original branch.  A
 check that the Fixes: line refers to an ancestor of the current commit
 could warn against such errors.  (In some cases it might be possible to
 use cherry-pick's -x lines to figure out how to rewrite the Fixes:
 line, but I doubt that would work often enough to be worthwhile.)

I believe that in the discussions we had, it was assumed that the
Fixes: line would reference the commit in the mainline kernel tree.
i.e., it would always reference the commit which introduced the bug in
3.5, even if the commit-id after the buggy commit was backported to
3.2.x would obviously be different.  Presumably the distro kernel
maintainer would be able to find the commit in Linus's tree and then
try to find the corresponding commit in the distro kernel git tree,
probably by doing string searches over git log.
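
Where the backport was made with git cherry-pick -x, the string search
is particularly easy, because -x records the mainline commit id.  A
self-contained illustration (the commit id 1234abcd is a made-up
placeholder):

```shell
# Toy repository whose "backport" commit carries the -x line.
git init -q backport-demo && cd backport-demo
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q --allow-empty -m "fix: close the hole

(cherry picked from commit 1234abcd)"

# Given the mainline commit id, find where it was cherry-picked:
git log --all --grep="cherry picked from commit 1234abcd" --oneline

# Without -x, searching for the subject line of the fix still works:
git log --all --grep="fix: close the hole" --oneline
```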

We could actually do a much more elegant job if we did have the
concept of commit identity (i.e., ChangeID's) baked into git.  That
way, there would be a ChangeID that would remain constant not
only across revisions of a patch under development, but also when the
commit is cherry picked into stable branches.  If we had that, then
instead of doing string searches on git log output, we could imagine a
web and/or command line interface where given a ChangeID, it would
tell you which branches or which tags contained the same semantic
patch.

Of course, as soon as you do that, then if the multiple commits get
squashed together, you might need to have to support multiple
ChangeID's associated with one commit, at which point it becomes
incompatible with Gerrit's use of this feature.

So we could add all sorts of complexity, but it's not obvious to me
that it's worth it.

 First of all, let me show my ignorance.  How formalized is the use of
 metadata lines at the end of a commit message?  I don't remember seeing
 documentation about such lines in general (as opposed to documentation
 about particular types of lines).  Is the format defined well enough
 that tools that don't know about a particular line could nonetheless
 preserve it correctly?  Is there/should there be a standard recommended
 order of metadata lines?  (For example, should Fixes: lines always
 appear before Signed-off-by lines, or vice versa?)  If so, is it
 documented somewhere and preserved by tools when such lines are
 added/modified?  Should there be support for querying such lines?

Internally inside Google, we have tools that will assist in forward
porting local changes from a 3.x based kernel to a 3.y kernel, to make
sure that all local changes are properly accounted for and none are
accidentally dropped during the rebase operation.  So we have various
new metadata lines that we add internally, for example:

Upstream-3.x-SHA1: commit-id
for commits in newer kernels that have been backported
Origin-3.x-SHA1: commit-id
to indicate the commit-id of a patch that was forward ported
as part of a rebase operation from 3.x to 3.y
Upstream-Dropped-3.x-SHA1: commit-id
As part of an empty commit to indicate that a patch that was
originally in our tree, has since been pushed upstream, so we
can drop it as part of the rebase to the 3.y kernel.

etc.

Other projects have various metadata lines to reference a bug-tracker
id number; folks may have seen commits with various metadata id's in
public git repositories such as:

Google-Bug-Id: 12345
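
Later git versions grew some generic tooling for such trailer lines;
whether these commands and format specifiers are available depends on
your git version (git interpret-trailers appeared around v2.2, the
trailers log format later):

```shell
# Normalize/add a trailer to a commit message file; the result goes to
# stdout (use --in-place to rewrite the file).
git init -q trailer-demo && cd trailer-demo
printf 'fix the frobnicator\n' > msg
git interpret-trailers --trailer "Google-Bug-Id: 12345" msg

# In a repository with history, a given trailer can also be extracted
# per commit with newer git:
#   git log --format='%h %(trailers:key=Google-Bug-Id,valueonly)'
```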
 

Re: My patches

2013-10-18 Thread Theodore Ts'o
On Fri, Oct 18, 2013 at 06:41:41AM -0500, Felipe Contreras wrote:
  And I hazard to guess that the vast majority agree with Junio on this 
  (based,
  again, on email evidence). Not with you.
 
 That is irrelevant, and a fallacy. The vast majority of people thought the
 Earth was the center of the universe, and they were all wrong.
 
 It's called ad populum fallacy, look it up. Whether the majority of Git
 developers agree that there's something more than a disagreement is 
 irrelevant,
 their opinion doesn't change the truth.

Look, the problem is that you insist on being right, even on matters
which may be more about taste and preference than anything that can be
proven mathematically.  Worse, you insist on trying to convince people
even when it might be better to just give up and decide that maybe
something's not worth the effort to get the last word in.  This is how
we get to centithreads.  If every time someone disagrees, you insist
on replying, and then if people give up in disgust, you then try to
use that as proof that you must be right, since you've dazzled them
with your brilliance, that's not good for the development community.

Sometimes a question is important enough that it's worth doing this.
But I'd suggest to you that you should ask yourself whether you're
doing it too often.

After all, this is open source.  If you are convinced that you are
right, and everyone else in the community is wrong, it is within your
power to fork the code base, and to prove us wrong by creating a
better product.

Or you can decide to just keep a patch set to yourself, or perhaps
post it periodically, if it is some key feature that you are certain
you or your company can't live without.  Heck, I've done this with
ext4, even though I'm the maintainer --- there have been features that
I know are critical for my company, but the rest of the ext4
development community are stridently against.  I've just simply posted
the patch set once, and if it gets strongly opposed, I'll just keep it
as a out-of-tree patch.

The fallocate NO_HIDE_STALE flag is a good example of that; it's used
in production on thousands and thousands of servers by Google and Tao
Bao, but since there was strong opposition on the ext4 list, we've
kept it as an out-of-tree patch.  Note what I did not do.  I did not
force the patch in, even though it might be within my power as the
maintainer; nor did I try to convince people over and OVER and OVER
again about the rightness of my position, and turn it into a
centithread.

 My claim is that all I did was disagree with Junio. You can invalidate that
 claim easily by providing *a single* example where I did more than disagree.

The problem is when you disagree with a number of people (not just
Junio), and you are, in my opinion, overly persistent.  We can argue
whether you've stepped over the line in terms of impugning people's
motives or sanity, but that's not necessarily the most important
issue.  People sometimes step over the line, and while that's
unfortunate, it's when it becomes a persistent pattern, and it happens
frequently enough, that it becomes a real problem.

Alternatively, if you are right that Junio is mad, and he's being
capriciously insulting, then I'm sure that when you establish your own
fork, lots of developers will come flocking to your flag.  If they do
not, then perhaps you might want to take that as some objective
evidence that the community is perhaps, more willing to work with him,
then they are to work with you.

Best regards,

- Ted

P.S.  There are plenty of things that I consider to be unfortunate
about git's command line interface, in terms of inconsistencies and
confusing terminology.  Over the past 5+ years, I've come to think the
way commit selection in git format-patch works is inconsistent with
how we handle commit selection for other commands, e.g., git log
<commit> vs. git format-patch <commit>.  Even if you think that
this is a matter of self-inherent truth, versus just a matter of
taste, there is also the consideration of backwards compatibility, and
the question of how important consistency and ease of learning gets
traded off against backwards compatibility and invalidating
potentially huge numbers of shell scripts and documentation.  So it's
not something where I've made a nuisance of myself, because it's a
settled issue.

As another example, people have agreed for a long time that the fact
that tab characters are significant in Makefiles is highly
unfortunate.  However, no one is running around calling the GNU Make
maintainers insane for not being willing to make a change that would
break huge numbers of Makefiles in the world.  More importantly,
people aren't bringing up the same subject over and over and over again
on the GNU Makefile mailing list.  Perhaps you might consider what
would be the appropriate response if someone insisted on creating
centithreads on the GNU Makefile discuss list on that subject.

Re: [PATCH/RFC] Developer's Certificate of Origin: default to COPYING

2013-09-12 Thread Theodore Ts'o
I certainly wouldn't recommend messing with the text of the DCO
without first consulting some lawyers.  There should also be some
centralized coordination about any changes in the text and the version
number.

- Ted


Re: [PATCH] Documentation/CommunityGuidelines

2013-06-12 Thread Theodore Ts'o
On Tue, Jun 11, 2013 at 07:10:11PM +0530, Ramkumar Ramachandra wrote:
 
 Presumably, Felipe is the fire hazard that we are talking about, and
 nobody else is to blame.  He must be removed to prevent future
 fires.  This is the perception of the regulars, correct?
 
 Then why haven't you removed him yet?  What are you waiting for?  You
 don't need my approval.

He (and you) get removed when individuals decide that the vast
majority of your e-mails shed more heat than light, and so that it's
not worth reading them.  I have personally made this determination for
both you and for Felipe; for you, your participation in this thread
was what set the bozo bit.

Now, I'm not a major developer for git, so my personal decision
doesn't make a huge amount of difference.

But if people who *are* senior developers in the git community decide,
on their own, that someone isn't worth listening to, then the
punishment has been inflicted, and this happens without banning
someone from posting or removing them from the mailing list.

Please stop.

Regards,

- Ted


Re: [PATCH] Documentation/CommunityGuidelines

2013-06-12 Thread Theodore Ts'o
On Tue, Jun 11, 2013 at 06:19:23PM -0500, Felipe Contreras wrote:
 Fair? Fairness requires to judge each action without biases, nor
 double standards. In the case of an open source community it requires
 you to listen to the arguments before dismissing them, and consider
 the patches before dropping them on the floor. Fairness requires no
 favoritism.

At least in development communities that *I* run, if someone were as
rude to me as you have been in some previous exchanges to Junio, I
would have set the bozo bit a long time ago and reviewed your
submissions with a very jaundiced eye, and treated your non-technical
arguments with same amount of attention as I give madmen and drunkards
in the street.  Junio has given you *far* more latitude than I would
have.

Keep in mind, the demands for respect go in both directions, and in
non-technical matters about style and good taste, at the end of the
day the maintainer does get to have the final say, because he or she
is the one who applies the patches or accepts the pull request.  So if
the maintainer says something like, maintaining ABI backwards
compatibility for libext2fs (or for kernel syscalls) is critically
important, that's not up to you.  Sending me abusive e-mails about
how I'm not listening to your arguments isn't going to help.  You can
try to change my mind with reasoned arguments, but for questions like
that, or what functions do or don't belong in a library, the
maintainer is the benevolent dictator.

Things are very different for things like this change causes a 30%
performance regression in a particular workload.  For those sorts of
technical questions, a much more collaborative discussion style is
important.  But for questions of what is and isn't good taste, it's
not a good idea to reply to a maintainer's e-mail with that's your
opinion over and over again.  For things like that it *IS* his (or
her) opinion, and if you can't live with it, you'll save a lot of
bandwidth on the mailing list by moving on to some other project.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Added guilt.reusebranch configuration option.

2013-05-23 Thread Theodore Ts'o
On Thu, May 23, 2013 at 03:22:50PM +0530, Ramkumar Ramachandra wrote:
> Theodore Ts'o wrote:
> > Right now I do this just by being careful, but if there was an
> > automatic safety mechanism, it would save me a bit of work, since
> > otherwise I might not catch my mistake until I do the git push
> > publish, at which point I curse and then start consulting the reflog
> > to back the state of my tree out, and then reapplying the work I had
> > to the right tree.
>
> My scenario is a bit different, and I think this safety feature is
> highly overrated.  It's not that I'll never rewind some branches, but
> rewind other branches, but rather I might rewind anything at any
> time, but I want immediate information so I can quickly inspect @{1}
> to see if that was undesirable.

Speaking of which, what I'd really appreciate is timestamps associated
with the reflog.  That's because the most common time when I've
screwed something up is after doing a git rebase -i and so the
reflog has a *huge* number of entries on it, and figuring out which
entry in the reflog is the right one is painful.  If could tell at a
glance when each entry of the reflog was created, it would make it a
lot easier to untangle a tree mangled by git rebase -i.
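As an aside, each reflog entry does record the time it was created, and git can be asked to print it; a hedged sketch, demonstrated in a throwaway repository (names and commit messages are made up):

```shell
#!/bin/sh
# Sketch: reflog entries carry creation timestamps; --date= makes git
# print them in place of the @{n} selectors.  Self-contained: works in
# a temporary repository.
set -e
cd "$(mktemp -d)"
git init -q .
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
git reflog --date=iso              # each entry shown as HEAD@{<timestamp>}
git log -g --format='%gd %ci %gs'  # same data, custom format
```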

In practice, it means I waste five minutes carefully inspecting a few
dozen entries on the reflog, so it's not a disaster, although I'm
generally cursing the whole time while I'm trying to untangle the
whole mess.

This issue with reflogs not having timestamps isn't primarily about
rewind safety, BTW; it's just one of the things which make consulting
the reflog painful --- and it much more likely happens after I screw
up a git rebase -i, generally because of what happens when there's a
merge conflict and then I accidentally fold two commits together
unintentionally.  The times when I've screwed up a non-rewinding
branch and then needed to recover after discovering the problem when I
try to publish said branch are admittedly rare; maybe once or twice
in the past twelve months.

> So, do you still need this rewinding safety thing?

Meh; I don't *need* it.  But then again, I'm a fairly experienced git
user.  The fact that I use guilt without the guilt/master safety
feature and have never gotten bitten by it --- in fact I deliberately
publish rewindable branches with a guilt patch series applied, which
speaks to the fact that I'm pretty experienced with rewindable heads.

The only reason why I suggested it is because I believe it would be
useful for people with less experience, and perhaps it would help make
rewindable branches less scary, and less subject to a lot of the
fearmongering that you see on the blogosphere.

 
> > So what I do is something like this:
> >
> > git push publish ; git push repo ; git push code
>
> While we can definitely make the UI better for this (maybe push
> --multiple?), there is no fundamental change: we have to re-initialize
> all the refspecs, connect to the remote via the transport layer and
> prepare a packfile to send.  In other words, it's impossible to make
> it any faster than what you get with the above.

Sure, and if I cared I'd make a git alias to automate this, instead of
depending on finger macros.
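A minimal version of such an alias (the name is made up) could live in .git/config:

```
[alias]
	# hypothetical alias; chains the three pushes, stopping at the
	# first failure
	push-all = !git push publish && git push repo && git push code
```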

> So you're a batched-push person.  And the above makes it clear that
> you don't want to explicitly differentiate between a push and push -f
> (the +pu thing).  And this assumes that you never create any new
> branches (I branch out all the time), otherwise you'd have rules for
> refs/heads/*.

I create new branches all the time.  But they are for my own personal
testing purposes.  So it's fairer to say that I rarely *publish* new
branches; I generally stick to the standard set of next, master,
maint, and pu.  And part of that is that even publishing this number
of branches is enough to sometimes confuse the e2fsprogs developers
who are pulling from my tree.

So what I've done in the past is to create a whole bunch of feature
branches, and then merge them into the pu branch, and then only
publish the pu branch.  And I try to get the feature branches cleaned
up as quickly as I have time, so they can appear on the maint or
master/next branches sooner rather than later.

> Just out of curiosity, do you ever have ref-renaming
> requirements (like push = refs/heads/*:refs/heads/tt/*)?  We were
> discussing that on another thread, but I haven't found an
> implementation I'm happy with yet.

In general, no, I don't do that, for the reasons stated above --- even
publishing four branches gets to be confusing enough for people who
are looking at my tree.

I'm sure other people and other communities use git differently, so
please insert the standard disclaimer that there's more than one way
to skin a cat.

Regards,

- Ted



Re: [PATCH] guilt: fix date parsing

2013-05-22 Thread Theodore Ts'o
On Tue, May 21, 2013 at 11:39:21PM -0400, Josef 'Jeff' Sipek wrote:
> I applied this one and the "guilt: skip empty line after..." patch.

Thanks!  BTW, it looks like you are not using git am -s to apply
these patches?  The reason why I ask is that whatever you're using
isn't removing the [XXX] subject prefix (e.g., [PATCH] or [PATCH -v2]),
which is useful for mailing lists, but less useful in the git commit
descriptions.

If you're using guilt, do you have some script that preformats a Unix
mbox into guilt-friendly files?  If so, maybe it would be good to
modify it to strip out the [PATCH] annotations.  If not, let me know,
since I've been thinking about writing a script that takes a Unix mbox
and bursts it into a separate patch-per-file with a series file
suitable for use by guilt, removing mail headers and doing other
appropriate pre-parsing --- basically, a "guilt am" which works much
like "git am".  But if someone else has done this already, no point
duplicating effort.  :-)

- Ted


[PATCH -v2] guilt: force the use of bare branches

2013-05-22 Thread Theodore Ts'o
To make it harder to accidentally do "git push" with a guilt patch
applied, "guilt push" changes the branch from e.g. "master" to
"guilt/master" starting with commit 67d3af63f422.  This is a feature
which I use for ext4 development; I actually *do* want to be able to
push patches to the dev branch, which is a rewindable branch much like
git's pu branch.

Allow the use of the configuration option guilt.bareBranch which,
when set to true, disables the new behavior introduced by commit
67d3af63f422.

Signed-off-by: Theodore Ts'o ty...@mit.edu
Cc: Per Cederqvist ced...@opera.com
---
 guilt | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/guilt b/guilt
index e9b2aab..35a84dc 100755
--- a/guilt
+++ b/guilt
@@ -914,13 +914,22 @@ else
 	die "Unsupported operating system: $UNAME_S"
 fi
 
-if [ "$branch" = "$raw_git_branch" ] && [ -n "`get_top 2>/dev/null`" ]
-then
-	# This is for compat with old repositories that still have a
-	# pushed patch without the new-style branch prefix.
+if [ -n "`get_top 2>/dev/null`" ]; then
+  #
+  # If the repository has patches pushed, then use whatever scheme
+  # is currently in use
+  #
+  if [ "$branch" = "$raw_git_branch" ]; then
 	old_style_prefix=true
+  else
+    old_style_prefix=false
+  fi
 else
+  if [ "$(git config --bool --get guilt.bareBranch)" = "true" ]; then
+    old_style_prefix=true
+  else
 	old_style_prefix=false
+  fi
 fi
 
 _main "$@"
-- 
1.7.12.rc0.22.gcdd159b



Re: [PATCH] Added guilt.reusebranch configuration option.

2013-05-22 Thread Theodore Ts'o
I just had another idea (although I haven't had a chance to code up
anything yet).  Perhaps instead of, or in addition to, a global
setting (i.e., guilt.reusebranch), perhaps we should have a per-branch
setting, such as branch.<branch>.guiltReuseBranch?

I was actually thinking that it might be interesting to have a
branch.<branch>.rewindable, which would change the guilt defaults, and
could also key changes in key git behavior which makes it less likely
that a user shoots him or herself in the foot --- i.e., give warnings
if he or she has modified the branch in such a way that
remotes.origin.<branch> is no longer contained within the branch head.

  - Ted


Re: [PATCH] Added guilt.reusebranch configuration option.

2013-05-22 Thread Theodore Ts'o
On Wed, May 22, 2013 at 10:58:49AM -0700, Junio C Hamano wrote:
> Theodore Ts'o <ty...@mit.edu> writes:
>
> > I was actually thinking that it might be interesting to have a
> > branch.<branch>.rewindable, which would change the guilt defaults, and
> > could also key changes in key git behavior which makes it less likely
> > that a user shoots him or herself in the foot --- i.e., give warnings
> > if he or she has modified the branch in such a way that
> > remotes.origin.<branch> is no longer contained within the branch head.
>
> At least rebase can pay attention to it and might make the world a
> better place.

Yeah, rebase was the primary command I was thinking about.  The other
one would be git commit --amend after the branch had been pushed
out.

- Ted


Re: [PATCH] Added guilt.reusebranch configuration option.

2013-05-22 Thread Theodore Ts'o
On Wed, May 22, 2013 at 11:55:00AM -0700, Junio C Hamano wrote:
> But in a triangular workflow, the way to make the result reach the
> upstream is *not* by pushing there yourself.  For developers at
> the leaf level, it is to push to their own repository (often on
> GitHub), which is different from where they (initially) clone from
> in order to bootstrap themselves, and (subsequently) pull from in
> order to keep them up-to-date.  And then they request the published
> work to be pulled by the upstream.

Yep, what I do personally is to call the destination of this "publish", i.e.:

[remote "publish"]
	url = ssh://gitol...@ra.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.g
	push = +master:master
	push = +origin:origin
	push = +dev:dev

So my typical work flow when I am ready to submit to Linus is:

   git tag -s ext4_for_linus
   git push publish
wait for this to propagate from ra.kernel.org to git.kernel.org,
 typically ~5 minutes
   git request-pull git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git origin > /tmp/pull
use /tmp/pull as the e-mail body to send to Linus, cc'ing
 LKML and linux-e...@vger.kernel.org

But actually, it's much more common that I am doing a git push
publish so that (a) it can get picked up by the daily linux-next tree
(for integration testing even before Linus pulls it into his tree),
and (b) so that other ext4 developers can either test or develop
against the ext4 tree in progress.

I suppose it would be convenient for "git push" to push to the
"publish" target, but I don't get confused about pushing to origin,
since semantically what I am doing is publishing the current state of
the ext4 tree so other people can see it.  So "git push publish" makes
a lot of sense to me.

> Even in a triangular workflow, @{u} should still refer to the place
> you integrate with, i.e. your upstream, not to the place you push
> to publish the result of your work.
>
> This branch.<branch>.rewindable safety however cannot be tied to
> @{u}.  The bottom boundary you want to be warned when you cross is
> the change you pushed out to your publishing repository, and it may
> not have reached remotes.origin.<branch> yet.

Indeed, and in fact for my use case what I promise people is that all
of the commits between origin..master are non-rewindable.  It's the
commits between master..dev which are rewindable.  So for me, I'd
still use the safety feature even for my rewindable branch, but
instead of using remotes/publish/dev as the no-rewind point, I'd want
to use remotes/publish/master as the no-rewind point.

Right now I do this just by being careful, but if there was an
automatic safety mechanism, it would save me a bit of work, since
otherwise I might not catch my mistake until I do the git push
publish, at which point I curse and then start consulting the reflog
to back the state of my tree out, and then reapplying the work I had
to the right tree.

> We will be introducing remote.pushdefault configuration in the
> upcoming 1.8.3 release, so that you can say.
>
> and hopefully it would let you do this:
>
>     git checkout master
>     ... after working on it ...
>     git push

Yes, that would be convenient.  BTW, one of the other things which I
do for e2fsprogs is that I use multiple publishing points, which is
mostly for historical reasons --- it used to be that repo.or.cz wasn't
all that reliable, and the 10-15 minute replication time from
ra.kernel.org to git.kernel.org got really old.

So what I do is something like this:

git push publish ; git push repo ; git push code

where

[remote "publish"]
url = ssh://gitol...@ra.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
fetch = +refs/heads/*:refs/heads/*
push = next
push = master
push = maint
push = debian
push = +pu

[remote "code"]
url = https://code.google.com/p/e2fsprogs/
fetch = +refs/heads/*:refs/heads/*
push = next
push = master
push = maint
push = debian
push = +pu

[remote "repo"]
url = ssh://repo.or.cz/srv/git/e2fsprogs.git
push = next
push = master
push = maint
push = debian
push = +pu

I don't know if this is something you'd want git to encourage, or
support explicitly, but I thought I'd mention it.
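As an aside, later versions of git let a single remote carry several push URLs, so one "git push publish" fans out to all of them; a hedged sketch reusing the URLs above (exact minimum version should be checked):

```
[remote "publish"]
	url = ssh://gitol...@ra.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
	# when any pushurl is present, pushes go to every pushurl and the
	# plain url is used only for fetching
	pushurl = ssh://gitol...@ra.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
	pushurl = https://code.google.com/p/e2fsprogs/
	pushurl = ssh://repo.or.cz/srv/git/e2fsprogs.git
```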

- Ted




[PATCH] guilt: skip empty line after from: line in patch description

2013-05-21 Thread Theodore Ts'o
Commit 2cc8d353d7ecb broke manually written patch descriptions of the
form:

 Frobnozzle: this is a patch subject

 From: Fred McNurk f...@mcnurt.foo

 This is the patch description

Commit 8f88f953580a0 partially fixed things by filtering out the From:
field, but it did not filter out the empty line (if present) after the
From: field, so it resulted in commit bodies which looked like this:

 Frobnozzle: this is a patch subject


 This is the patch description

instead of

 Frobnozzle: this is a patch subject

 This is the patch description

The ext4 patch queue has used this format for years, and this change
should not break other patches which look like mail headers and
bodies.

Signed-off-by: Theodore Ts'o ty...@mit.edu
---
 guilt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guilt b/guilt
index 4edd1ad..309437a 100755
--- a/guilt
+++ b/guilt
@@ -365,7 +365,7 @@ do_get_header()
 BEGIN{body=0; subj=0}
 /^Subject:/ && (body == 0 && subj == 0){subj=1; print substr($0, 10) "\n"; next}
 /^(Subject:|Author:|Date:|commit)/ && (body == 0){next}
-/^From:/ {next}
+/^From:/ {body=0; next}
 /^(Commit-ID:|Gitweb:|AuthorDate:|Committer:|CommitDate:)/ && (body == 0){next}
 /^[ \t\f\n\r\v]*$/ && (body==0){next}
 /^.*$/ && (body==0){body=1}
-- 
1.7.12.rc0.22.gcdd159b



[PATCH] guilt: force the use of bare branches

2013-05-21 Thread Theodore Ts'o
To make it harder to accidentally do "git push" with a guilt patch
applied, "guilt push" changes the branch from e.g. "master" to
"guilt/master" starting with commit 67d3af63f422.  This is a feature
which I use for ext4 development; I actually *do* want to be able to
push patches to the dev branch, which is a rewindable branch much like
git's pu branch.

Allow the use of the environment variable GUILT_FORCE_BARE_BRANCH
which disables the new behavior introduced by commit 67d3af63f422.

Signed-off-by: Theodore Ts'o ty...@mit.edu
Cc: Per Cederqvist ced...@opera.com
---
 guilt | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/guilt b/guilt
index 309437a..9953bdf 100755
--- a/guilt
+++ b/guilt
@@ -914,13 +914,22 @@ else
 	die "Unsupported operating system: $UNAME_S"
 fi
 
-if [ "$branch" = "$raw_git_branch" ] && [ -n "`get_top 2>/dev/null`" ]
-then
-	# This is for compat with old repositories that still have a
-	# pushed patch without the new-style branch prefix.
+if [ -n "`get_top 2>/dev/null`" ]; then
+  #
+  # If the repository has patches pushed, then use whatever scheme
+  # is currently in use
+  #
+  if [ "$branch" = "$raw_git_branch" ]; then
 	old_style_prefix=true
+  else
+    old_style_prefix=false
+  fi
 else
+  if [ -n "$GUILT_FORCE_BARE_BRANCH" ]; then
+    old_style_prefix=true
+  else
 	old_style_prefix=false
+  fi
 fi
 
 _main "$@"
-- 
1.7.12.rc0.22.gcdd159b



[PATCH] guilt: fix date parsing

2013-05-21 Thread Theodore Ts'o
If the date field has a space in it, such as:

   Date: Tue, 14 May 2013 18:37:15 +0200

previously guilt would go belly up:

   + export GIT_AUTHOR_DATE=Tue, 14 May 2013 18:37:15 +0200
   /usr/local/bin/guilt: 571: export: 14: bad variable name

Fix this.

Signed-off-by: Theodore Ts'o ty...@mit.edu
---
 guilt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guilt b/guilt
index 9953bdf..6e8d542 100755
--- a/guilt
+++ b/guilt
@@ -568,7 +568,7 @@ commit()
 		author_date_str=`sed -n -e '/^Date:/ { s/^Date: //; p; q; }; /^(diff |---$|--- )/ q' $p`
 	fi
 	if [ ! -z "$author_date_str" ]; then
-		export GIT_AUTHOR_DATE=`echo $author_date_str`
+		export GIT_AUTHOR_DATE="$author_date_str"
 	fi
 fi
 
-- 
1.7.12.rc0.22.gcdd159b
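A tiny standalone illustration of the quoting issue the patch addresses (the behavior of the unquoted form is shell-dependent):

```shell
#!/bin/sh
# Sketch: in POSIX shells such as dash, the unquoted expansion in
# `export VAR=$value` is field-split, so "14" is taken as a second
# variable name ("export: 14: bad variable name").  bash's export
# builtin does not split, so the symptom varies by shell.  The quoted
# form below, as in the patch, is safe everywhere.
author_date_str="Tue, 14 May 2013 18:37:15 +0200"
export GIT_AUTHOR_DATE="$author_date_str"
echo "$GIT_AUTHOR_DATE"
```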



Re: linux-next: unneeded merge in the security tree

2013-03-12 Thread Theodore Ts'o
What if we added the ability to do something like this:

[remote "origin"]
	url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
	fetch = +refs/heads/master:refs/heads/master
	mergeoptions = --ff-only

This would be an analog to branch.<name>.mergeoptions, but it would
apply to the source of the pull request, instead of the destination.

That way, people who do a git pull from Linus's tree would get the
protection of --ff-only, while pulls from submaintainer trees would
automatically get a merge commit, which is what we want.

It doesn't handle the case of a submaintainer pulling from a
maintainer in a back-merge scenario, but that should be a pretty rare
case, so maybe that's OK.
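As far as I know a per-remote mergeoptions never materialized, but two related knobs exist: the per-branch one named above, and the later repo-wide pull.ff; a hedged sketch in a throwaway repository:

```shell
#!/bin/sh
# Sketch: closest existing configuration, shown in a temporary repo.
# Neither knob is per-remote, so this only approximates the proposal.
set -e
cd "$(mktemp -d)"
git init -q .
git config branch.master.mergeoptions --ff-only  # per-branch, as in the text
git config pull.ff only                          # later addition; repo-wide
git config pull.ff
```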

- Ted


Re: linux-next: unneeded merge in the security tree

2013-03-12 Thread Theodore Ts'o
On Tue, Mar 12, 2013 at 02:30:04PM -0700, Junio C Hamano wrote:
> Theodore Ts'o <ty...@mit.edu> writes:
>
> > [remote "origin"]
> > 	url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > 	fetch = +refs/heads/master:refs/heads/master
> > 	mergeoptions = --ff-only
>
> Is there an escape hatch for that rare case?  IOW, how does a
> submaintainer who configured the above to override --ff-only?

Hmm, maybe we would need to add a --no-ff-only?  Or they could just
do:

git fetch origin
git merge FETCH_HEAD

On Tue, Mar 12, 2013 at 02:28:39PM -0700, Linus Torvalds wrote:

> Of course, I'm not really sure if we want to list the flags.  Maybe
> it's better to just introduce the notion of "upstream" directly, and
> make that a flag, and make "origin" default to that when you clone.
> And then have git use different heuristics for pulling upstream (like
> warning by default when doing a back-merge, perhaps?)

What if git automatically set up the origin remote to have a certain
set of mergeoptions by default?  That would probably be right for most
users, but it makes it obvious what's going on when they take a look
at the .git/config file, and doesn't treat the remote that happens to
have the name "origin" as having certain magic properties.  Using a
set of mergeoptions would also be bit more general, and might have
applications in the future.

   - Ted


Re: rebase destroys branches

2013-03-04 Thread Theodore Ts'o
On Tue, Mar 05, 2013 at 02:05:32PM +1300, Gene Thomas [DATACOM] wrote:
 
> > The original branch is not 'destroyed', rather the pointer to the previous
> > tip is within the logs.
>
> Is that the 'git log' log or internal logs? Are you sure? There doesn't
> appear to be a way to check out that tip or see the log back from that tip.

See the documentation for "git reflog".

> > All the content is still available until the logs expire.
>
> So we will be unable to checkout content after a time?

You need to make a distinction between a user's local repository, and
the team's central repository.  The workflow of the individual user is
one where they can and should be allowed to use rebase and git commit
--amend if they like.  Consider this the same thing as the user who
chooses to use quilt on their local machine while they are preparing
their patches, so they are carefully honed before they are cast into
concrete.  Whether they use quilt, or manual patching, or simply
don't bother checking things into the central SCM until things are
cleaned up, the end result is the same.

The team's central repository is one where you don't want to allow
history to be lost, and so there you can enforce rules to prevent
this.  For example, if you use Gerrit, you can limit the ability to
reset branches to administrators only.  Everyone else can only add new
commits, not change older ones.

(If someone accidentally checks in NDA'ed material belonging to
someone else, or some other IP content guaranteed to cause your
general counsel to have heart palpitations, trust me, you'll want to
allow administrators to rewind a git branch.  :-)

You can also use Gerrit to enforce code reviews, so that no change
goes in until a second engineer reviews the commit and gives it a
thumbs up (with a permanent record of the code review kept in Gerrit,
something which can be important for pointy-haired corporate types who
worry about Sarbanes Oxley controls --- although from your e-mail
address, you may be lucky enough to be exempt from needing to worry
about SOX controls :-).

Regards,

- Ted


Re: [PATCH 0/7] guilt patches, including git 1.8 support

2013-01-15 Thread Theodore Ts'o
On Tue, Jan 15, 2013 at 06:26:06PM -0800, Jonathan Nieder wrote:
> Hi Jeff and other guilty parties,
>
> I collected all the guilt patches I could find on-list and added one
> of my own.  Completely untested, except for running the regression
> tests.  These are also available via git protocol from
>
>   git://repo.or.cz/guilt/mob.git mob

Jonathan, thanks for collecting all of the guilt patches!  Your repo
was also very useful, since I hadn't grabbed the latest patches from
jeffpc's repo before it disappeared after the kernel.org security
shutdown.

Jeff, do you need some help getting your repo on kernel.org
re-established?

- Ted



Re: git.wiki.kernel.org spam ...

2013-01-04 Thread Theodore Ts'o
On Sat, Jan 05, 2013 at 12:27:12AM +0100, Johannes Schindelin wrote:
 
> I was. John Hawley trusted me when I asked for admin privileges to keep
> the spam at bay, but a very vocal voice on the mailing list tried to
> discredit my work, and in the wake of the ensuing mailing list thread I
> got the impression that that feeling was universal, so I abided and
> stopped.
>
> > this leaves me a little confused. who would be then be responsible? who
> > would be responsible for upgrading / installing anything at the wiki?
>
> That would be John Hawley.

John is one of the Linux Foundation staff members that are responsible
for the system administration of wiki.kernel.org (and kernel.org, and
bugzilla.kernel.org, etc.)  They are *not* responsible for the
contents of the *.wiki.kernel.org; someone from the project has to be
the wiki maintainer.

(Note: the *.wiki.kernel.org infrastructure was originally set up at
my request, and the first such hosted wiki was ext4.wiki.kernel.org;
the second was rt.wiki.kernel.org, for which I was also the primary
wiki administrator initially.  I'm confident the policy on this hasn't
changed since those early days because LF sysadmins (e.g., John and
Konstantin) do *not* have time to police the various wikis for
spam)

 - Ted


Re: Exploiting SHA1's XOR weakness allows for faster hash calculation

2012-12-05 Thread Theodore Ts'o
On Wed, Dec 05, 2012 at 10:19:43AM +0100, Sebastian Schuberth wrote:
 
> to say it in advance: I do not want to trigger any bogus security
> discussion here. Instead, I believe the findings from [1] allow for
> an up to 20% faster SHA1 calculation, if my brief reading of the
> presentation is correct. Any opinions on integrating this
> optimization into Git?
>
> [1] https://hashcat.net/p12/js-sha1exp_169.pdf

It's only useful if you are trying to do brute-force password
cracking, where the password is being hashed in a very specific way.
(If for example the password was replicated N times in the input
buffer for SHA-1, instead of keeping the padding constant in the rest
of the input buffer, this particular optimization wouldn't apply.)

In any case, it's not at all applicable for general purpose checksum
calculations, and hence wouldn't apply to git.

Regards,

- Ted


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Theodore Ts'o
I seem to recall that there was at least some discussion at one point
about adding some extra fields to the commit object in a backwards
compatible way by adding it after the trailing NUL.  We didn't end up
doing it, but I could see it being a useful thing nonetheless (for
example, we could potentially put the backup SHA-2/SHA-3 pointer there).

What if we explicitly allow a length plus SHA-2/3 hash of the commit
plus the fields after the SHA-2/3 hash as an extension?  This would
allow a secure way of adding an extension, including perhaps adding
backup SHA-2/3 parent pointers, which is something that would be
useful to do from a security perspective if we really are worried
about a catastrophic hash failure.

The one reason why we *might* want to use SHA-3, BTW, is that it is a
radically different design from SHA-1 and SHA-2.  And if there is a
crypto hash failure which is bad enough that the security of git would
be affected, there's a chance that the same attack could significantly
affect SHA-2 as well.  The fact that SHA-3 is fundamentally different
from a cryptographic design perspective means that an attack that
impacts SHA-1/SHA-2 will not likely impact SHA-3, and vice versa.

   - Ted


Re: Android Replies to Git List getting rejected

2012-08-07 Thread Theodore Ts'o
On Tue, Aug 07, 2012 at 01:33:23PM -0600, John 'Warthog9' Hawley wrote:
> It's pretty simple: you sent HTML mail to vger.kernel.org, and it
> explicitly rejects all HTML e-mail.  GMail, particularly from Android,
> apparently doesn't have a way to bypass sending HTML mail (it's been a
> much maligned bug).

Yeah, sigh.  Drew, I suggest that you star the following bug:

http://code.google.com/p/android/issues/detail?id=8712

... and perhaps leave a comment in the bug report that you can't
interact with the git mailing list because of this limitation.

I'm sure you know (since you indicated that you sent your e-mail via
the web interface of Gmail), that this is at least something you can
control in the desktop/web version of Gmail (just enable "Plain text
mode") --- but it would certainly be nice if users had the choice of
whether they wanted to participate on vger mailing lists using the
Android application, versus the Web interface, or using Mutt or Pine
on a Linux box.

Regards,

- Ted


Re: Merge with git-pasky II.

2005-04-15 Thread Theodore Ts'o
On Fri, Apr 15, 2005 at 02:03:08PM +0200, Johannes Schindelin wrote:
> I disagree. In order to be trusted, this thing has to catch the following
> scenario:
>
> Skywalker and Solo start from the same base. They commit quite a lot to
> their trees. In between, Skywalker commits a tree, where the function
> kazoom() has been added to the file deathstar.c, but Solo also added
> this function, but to the file moon.c. A file-based merge would have no
> problem merging each file, such that in the end, kazoom() is defined
> twice.
>
> The same problems arise when one tries to merge line-wise, i.e. when for
> each line a (possibly different) merge-parent is sought.

Be careful.  There is a very big tradeoff between 100% perfection in
catching these sorts of errors, and usability.  There exist SCMs
where you are not allowed to commit such merges until you do a test
compile, or run a regression test suite (that being the only way to
catch these sorts of problems when we merge two branches like this).

BitKeeper never caught this sort of thing, and we trusted it.  In
practice it was also rarely a problem.

I'll also note that BitKeeper doesn't restrict you from committing a
changeset when you have modified files that have yet to be checked in
to the tree.  Same issue; you can accidentally check in changesets
that result in trees that won't build, but if we added this kind of
SCM-by-straightjacket philosophy it would decrease our productivity
and people would simply not use such an SCM, thus negating its
effectiveness.

- Ted