Re: bogus merges

2005-09-05 Thread Linus Torvalds


On Mon, 5 Sep 2005, Wayne Scott wrote:

 A recent commit in linux-2.6 looks like this:

It hopefully shouldn't happen any more with the improved and fixed
git-merge-base.

 Author: Russell King [EMAIL PROTECTED]
 Date:   Wed Aug 31 10:12:14 2005 +0100

I suspect rmk is using cogito-0.13, and that it still has the older 
git-merge-base that can get confused by the date-ordering problem. And 
when git-merge-base gives the wrong result (not either of the commit 
objects it was given as an argument), then git resolve will do the wrong 
thing.

I just checked, and the _current_ git-merge-base definitely gives the 
right result:

linux$ git-merge-base -a \
6b39374a27eb4be7e9d82145ae270ba02ea90dc8 \
194d0710e1a7fe92dcf860ddd31fded8c3103b7a

results in

194d0710e1a7fe92dcf860ddd31fded8c3103b7a

ie it correctly notices that the second commit is the parent of the first
one.

 Really 'git commit' should detect problems like this automatically and
 prevent them from getting in the tree.

Well, that would depend on having the fixed git-merge-base in the first 
place, which in turn would mean that such a commit wouldn't happen at all, 
so it's kind of circular. It's not worth fixing anywhere else, since once 
you fix it in git-merge-base, it just becomes a non-issue.

Hopefully we'll have a new cogito release soon, and this particular bug
will be a thing of the past.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tool renames? was Re: First stab at glossary

2005-09-05 Thread Linus Torvalds


On Mon, 5 Sep 2005, David Kågedal wrote:
 
 But to the users (like myself), there's no point in naming it by
 whether it's a script or a binary. 

So? There's no downside.

To you, as a user, you never see the -script ending anyway. You'd never 
type it out, or you're already doing something wrong.

So to users it doesn't matter, and to developers it _does_ matter (and 
calling them .pl or .sh or something would be _bad_), why not please 
the developers?

 So your argument that it makes it easier for git developers to work
 with the source doesn't help the user.

It doesn't _help_ the user, but since it doesn't hurt him either, why the 
hell would we even _care_ about the user?

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tool renames? was Re: First stab at glossary

2005-09-06 Thread Linus Torvalds


On Tue, 6 Sep 2005, Martin Langhoff wrote:
 
 Grep knows how to ignore binary files.

That wasn't the _point_.

The point is, naming things as being scripts is useful. Grep is just an 
example. Naming things as being .pl or .sh is _not_ useful.

So with grep you can use -I, but what about doing things like em * when
doing global renames (I use micro-emacs - em - as my editor). Again, em
*-script actually works.

The point being that if we have naming rules, make them USEFUL. *-script 
is useful - it works wonderfully well for git xxx (which knows to add 
-script), and it works wonderfully well for developers. 

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tool renames? was Re: First stab at glossary

2005-09-06 Thread Linus Torvalds


On Tue, 6 Sep 2005, Junio C Hamano wrote:

 Linus Torvalds [EMAIL PROTECTED] writes:
 
  The point is, naming things as being scripts is useful. Grep is just an 
  example. Naming things as being .pl or .sh is _not_ useful.
 
 Sorry, but why not?

What's the upside?

I can point to one downside: git. That script right now is simple. If 
you rewrite git-cvsimport-script from shell to perl, it looks the same to 
git. 

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Junio C Hamano wrote:
 
 Yes, the reading of three trees upfront is probably the culprit
 in your case

However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.

Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.

But remember what the most common use for the index is - it's the single 
tree case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.

So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a next pointer to the cache entries and 
not allow the in-place thing that read_cache() does).

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Chuck Lever wrote:
 
 in my case the merges were taking significantly longer than a half 
 second.  making this change is certainly not worth it if merges are 
 running fast...

Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full git pull will do the diffstat 
etc, and then the expense ends up being in the actual git diff part.

So read-tree itself may be half a second, but a merge ends up having other 
parts.

 they are still read-only with my linked list implementation.

Btw, in the sparse project, we have this really smart pointer list data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.

Any interest?

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git applymbox is too anal

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Greg KH wrote:
 
 Or am I missing some option to 'git applymbox' that I can't seem to
 find?

No. git-apply wants an exact bit-for-bit match. Partly because fuzz is 
hard, but mostly because I don't like it. I apply a _lot_ of patches, and 
if a unforgiving git-apply works for me, it should work for you too.

One issue is that I want to apply patches that apply cleanly, and if the 
patch was generated against some older kernel version, then it should 
quite likely be _applied_ towards that older kernel version. Then you can 
use git to merge the two.

Of course, I only do that occasionally, for bigger clashes. For smaller 
things, I just edit the patch by hand. And sometimes, I ask the sending 
side to re-generate the patch.

Applying with a fuzz does the right thing most of the time. But most 
isn't good enough. I'd rather have a human look at anything that requires 
fuzz.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems

2012-07-14 Thread Linus Torvalds
On Sat, Jul 14, 2012 at 12:59 AM, Jonathan Nieder jrnie...@gmail.com wrote:

 Unfortunately, on big-endian architectures, if p is a pointer to
 unsigned int then current gcc assumes it is properly aligned and
 converts this construct to a 32-bit load.

This patch seems to entirely depend on the location of the cast. And
as far as I can tell, that workaround will in turn depend on just what
gets inlined.

My gut feel is that this makes the code uglier (the manual multiply
by four being a prime example) while not really even addressing the
problem.

I think a much better approach would be to just mark the unsigned int
data pointer as being unaligned, or add a get_unaligned() helper
function (you have to do a structure member and mark the structure
packed, I think - I don't think you can just mark an int pointer
packed). Sure, that's compiler-dependent, but if a compiler does
something like gcc apparently does, it had better support the notion
of unaligned pointers.

And then gcc might actually do the unaligned word read *optimally* on
big-endian architectures, instead of that idiotic byte-at-a-time crap
with shifting.

Anyway, the whole noticed on alpha makes no sense, since alpha isn't
even big-endian. So the commit log is insane and misleading too. Alpha
is very much little-endian, but maybe gcc turns the thing into an
unaligned load followed by a bswap.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems

2012-07-14 Thread Linus Torvalds
On Sat, Jul 14, 2012 at 12:50 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 After the patch, what reason does gcc have to expect that 'block' is
 32-bit aligned except when it is?  The code (including the code I
 didn't touch) never casts from char * to int * except in get/put_be32
 on arches that don't mind unaligned accesses.

Ahh. I was looking at the cast you added:

blk_SHA1_Block(ctx, (const unsigned char *) ctx-W);

but I guess that if gcc inlines that and sees the original int type,
it doesn't matter, because that *is* aligned.

So it's ok, but please just make blk_SHA1_Block() take a const void
* instead and that cast can go away.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints

2012-07-14 Thread Linus Torvalds
Looks good to me. I'd suggest doing the macro argument expansion in

   #define SHA_SRC(t) get_be32((unsigned char *) block + t*4)

with parenthesis around 't', but that's a fairly independent thing.
The old code didn't do that either.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Linus Torvalds
On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano gits...@pobox.com wrote:

  * This is not even compile tested, so it needs testing and
benchmarking, as I do not even know how costly the calls to
open/close are when we do not have to call iconv() itself.

Ok, so it's easily compile-tested: just add

+   COMPAT_OBJS += compat/precompose_utf8.o
+   BASIC_CFLAGS += -DPRECOMPOSE_UNICODE

to the makefile for Linux too.

Actually testing how well it *works* is hard, since I don't really
have a mac (well, I do, but it no longer has OS X on it ;), and the
whole utf-8-mac thing does not make sense.

HOWEVER. I actually tested it with the conversion being from Latin1 to
UTF-8 instead, and it does interesting things, and kind of works. I
say kind of, because for the case of the filesystem being in Latin1,
we actually have to convert things back to the filesystem character
set when doing open() and lstat(), and this patch obviously
doesn't do that, because OS X does the conversion back to NFD on its
own.

But ACK on the patch.

If I had more time, I'd actually be interested to do the generic case
of namespace conversion, and we could make this a generic git feature
- it's something I wanted to do long ago. However, right now I'm in
the merge window and will go on a vacation to Finland after that, so I
probably won't get around to it.

I do have one suggestion: please rename the has_utf8() function to
has_nonascii() too. Why? Because that's what the function actually
does. It has nothing to do with testing for UTF-8 (the utf-8 rules are
more complex than just high bit set), and *if* I ever get around to
doing a more generic character set conversion for the filenames, the
decision really would be about non-ASCII, not about non-UTF8.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Linus Torvalds
On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano gits...@pobox.com wrote:

 Eek.

Oh, I agree. Doing a full character set conversion both ways is quite
a bit more work.

 Not just write_entry() codepath that creates the final paths on the
 filesystem, you would need to touch lstat() calls that check the
 existence and freshness of the path, once you go that route.  I am
 sure such a change can be made to work, but I am not sure how much
 we would gain from one.

I think it might be interesting. I doubt it matters all that much any
more in Western Europe (Unicode really does seem to have largely taken
over), but I think Japan still uses Shift-JIS a lot.

Although maybe that is starting to fade too.

And it really is just a generalization of the OS X filesystem damage.

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/7] Rework git core for native submodules

2013-04-04 Thread Linus Torvalds
On Thu, Apr 4, 2013 at 11:30 AM, Ramkumar Ramachandra
artag...@gmail.com wrote:

 The purpose of this series is to convince you that we've made a lot of
 fundamental mistakes while designing submodules, and that we should
 fix them now.  [1/7] argues for a new object type, and this is the
 core of the idea.

I don't dispute that a new link object might be a good idea, but
there's no explanation of the actual format of this thing anywhere,
and what the real advantages would be. A clearer this is the design,
this is the format of the link object, and this is what it buys us
would be a good idea. Also, one of the arguments against using link
objects originally was that the format wasn't stable, and in
particular the address of the actual submodule repository might differ
for different people. So when adding a new object type, explaining
*why* the format of such an object is globally stable (since it will
be part of the SHA1 of the object) is a big deal.

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/7] Rework git core for native submodules

2013-04-04 Thread Linus Torvalds
On Thu, Apr 4, 2013 at 11:52 AM, Ramkumar Ramachandra
artag...@gmail.com wrote:

 1. upstream_url: this records the upstream URL.  No need to keep a 
 .gitmodules.

 2. checkout_rev: this records the ref to check out the submodule to.
 As opposed to a concrete SHA-1, this allows for more flexibility; you
 can put refs/heads/master and have truly floating submodules.

 3. ref_name: this specifies what name the ref under
 refs/modules/branch/ should use.

 4. floating: this bit specifies whether to record a concrete SHA-1 in
 checkout_rev.

 5. statthrough: this bit specifies whether git should stat() through
 the worktree.  We can turn it off on big repositories for performance
 reasons.

So the thing is (and this was pretty much the original basis for
.gitmodules) that pretty  much *all* of the above fields are quite
possibly site-specific, rather than globally stable.

So I actually conceptually like (and liked) the notion of a link
object, but I just don't think it is necessarily practically useful,
exactly because different installations of the *same* supermodule
might well want to have different setups wrt these submodule fields.

My gut feel is that yes, .gitmodules was always a bit of a hack, but
it's a *working* hack, and it does have advantages exactly because
it's more fluid than an actual git object (which by definition has to
be set 100% in stone). If there are things you feel it does wrong
(like the git add bug that is being discussed elsewhere), I wonder
if it's not best to at least try to fix/extend them in the current
model. The features you seem to be after (ie that whole
floating/refname thing) don't seem fundamentally antithetical to the
current model (a commit SHA1 of all zeroes for floating, with a new
refname field in .submodules? I dunno)..

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/7] Rework git core for native submodules

2013-04-04 Thread Linus Torvalds
On Thu, Apr 4, 2013 at 12:36 PM, Ramkumar Ramachandra
artag...@gmail.com wrote:

 Let's compare the two alternatives: .gitmodules versus link object.
 If I want my fork of .gitmodules, I create a commit on top.

Or you could also just edit and carry a dirty .gitmodules around for
your personal use-case.

I don't know if anybody does that, but it should work fine.

And I don't see what you can do with the link objects that you cannot
do with .gitmodules. That's what it really boils down to. .gitmodules
do actually work. Your extensions would work with them too.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/7] Rework git core for native submodules

2013-04-05 Thread Linus Torvalds
On Thu, Apr 4, 2013 at 1:04 PM, Ramkumar Ramachandra artag...@gmail.com wrote:
 Linus Torvalds wrote:
 Or you could also just edit and carry a dirty .gitmodules around for
 your personal use-case.

 I'm sorry, but a dirty worktree is unnecessarily painful to work with.

Bzzt. Wrong.

A dirty worktree is not only easy to work with (I do it all the time,
having random test-patches in my tree that I never even intend to
commit), it's a *requirement*.

One thing that git does really really well is merging. And one of the
reasons why git does merging well (apart from the obvious meta-issue:
it's what I care about) is that it not only has the stable information
in the object database, it also has the staging information in the
index, *and* it has dirty data in the working tree.

You absolutely need all three. Having an edit command to edit stable
data (or staging data) is broken. Trust me, I've been there, done
that, got the T-shirt and know it is wrong. The whole stable objects
+ index + dirty worktree is FUNDAMENTALLY the right way to work,
and it *has* to work that way for merges to work well.

The only things that we don't have dirty data for in the worktree is
creating commits and tags, but those aren't relevant for the merging
process anyway, in the sense that you never change them for merging,
you create them *after* merging (and this is fundamental, and not just
a git implementation issue).

So you absolutely need a dirty worktree. You need it for testing, and
you need it for merging. Having a model where you don't have a
in-progress entity that works as a temporary is absolutely and
entirely wrong.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git grep parallelism question

2013-04-26 Thread Linus Torvalds
Since I reboot fairly regularly to test new kernels, I don't *always*
have the kernel source tree in my caches, so I still care about some
cold-cache performance. And git grep tends to be the most noticeable
one.

Now, I have a SSD, and even the cold-cache case takes just five
seconds or so, but that's still somethng I react to, since the normal
kernel tree in cache case ends up bring close enough to
instantaneous (half a second) that then when it takes longer I react
to it.

And I started thinking about it, and our git grep parallelism seems
to be limited to 8.

Which is fine most of the time for CPU parallelism (although maybe
some people with big machines would prefer higher numbers), but for IO
parallelism I thought that maybe we'd like a higher number...

So I tried it out, and with THREADS set to 32, I get a roughly 15%
performance boost for the cold-cache case (the error bar is high
enough to not give a very precise number, but I see it going from
~4.8-4.9s on my machine down to 4.2..4.6s).

That's on an SSD, though - the performance implications might be very
different for other use cases (NFS would likely prefer higher IO
parallelism and might show bigger improvement, while a rotational disk
might end up just thrashing more)

Is this a big deal? Probably not. But I did react to how annoying it
was to set the parallelism factor (recompile git with a new number).
Wouldn't it be lovely if it was slightly smarter (something more akin
to the index preloading that takes number of files into account) or at
least allowed people to set the parallelism explicitly with a command
line switch?

Right now it disables the parallel grep entirely for UP, for example.
Which makes perfect sense if grep is all about CPU use. But even UP
might improve from parallel IO..

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git grep parallelism question

2013-04-26 Thread Linus Torvalds
On Fri, Apr 26, 2013 at 11:47 AM, Junio C Hamano gits...@pobox.com wrote:

 The real issue may be that we do not have a good estimate of how
 many paths are involved in the request before starting these
 threads, though.

Yes. Also, I'm not sure if the 15% possible improvement on my SSD case
is even worth it for something that in the end isn't necessarily the
common case. I *suspect* that it might be a much bigger deal on NFS
(IO parallelism really does end up being a big deal sometimes, and
caching tends to be less aggressive too), but on rotational media it
might be much less clear, or even a loss..

Are there people out there who use git grep over NFS and have been
unhappy with performance? If are willing to recompile git with a
different THREAD value in builtin/grep.c, then on a Linux client you
can try

echo 3  /proc/sys/vm/drop_caches

to largely force cold-cache behavior for testing (I say largely,
because it won't drop busy/dirty pages, but for git grep kind of
loads it should be good).

Of course, you need root for it, so..

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git grep parallelism question

2013-04-26 Thread Linus Torvalds
On Fri, Apr 26, 2013 at 12:19 PM, Junio C Hamano gits...@pobox.com wrote:

 OK, you have to recompile at least once for experiment, so perhaps
 building the test binary with this patch may help.

So here's my experiment on my machine with this patch and the kernel
tree. First I did the warm-cache case:

  for i in 1 4 8 16 32 64
  do
echo $i:
for j in 1 2 3 4
do
  t=$(sh -c time git grep --threads=$i hjahsja 21 | grep real)
  echo $i $t
done
  done

and the numbers are pretty stable, here's just he summary (best of
four tries for each case):

   1 real 0m0.598s
   4 real 0m0.253s
   8 real 0m0.235s
  16 real 0m0.269s
  32 real 0m0.412s
  64 real 0m0.420s

so for this machine, 8 threads (our old number) is actually optimal
even if it has just four cores (actually, two cores with HT). I
suspect it's just because the load is slightly unbalanced, so a few
extra threads helps. Looks like anything in the 4-16 thread range is
ok, though. But 32 threads is clearly too much.

Then I did the exact same thing, but with the echo 3 
/proc/sys/vm/drop_caches just before the timing of the git grep.
Again, summarizing (best-of-four number, the variation wasn't that
big):

   1 real 0m17.866s
   4 real 0m6.367s
   8 real 0m4.855s
  16 real 0m4.307s
  32 real 0m4.153s
  64 real 0m4.128s

here, the numbers continue to improve up to 64 threads, although the
difference between 32 and 64 is starting to be in the noise. I suspect
it's a combination of better IO overlap (although not all SSD's will
even improve all that much from overlapping IO past a certain point)
and probably a more noticeable imbalance between threads.

Anyway, for *my* machine and for *this* particular load, I'd say that
we're already pretty close to optimal. I did some trials just to see,
but the best hot-cache numbers were fairly reliably for 7 or 8
threads.

Looking at the numbers, I can't really convince myself that it would
be worth it to do (say) 12 threads on this machine. Yes, the
cold-cache case improves from the 8-thread case (best-of-four for 12
threads: 0m4.467s), but the hot-cache case has gotten sufficiently
worse (0m0.251s) that I'm not sure..

Of course, in *absolute* numbers the cold-cache case is so much slower
that a half-second improvement from going to 16 threads might be
considered worth it, because while the the hot-cache case gets worse,
we may just not care because it's fast enough that it's not
noticeable.

Anyway, I think your patch is good if for no other reason that it
allows this kind of testing, but at least for my machine, clearly the
current default of eight threads is actually good enough. Maybe
somebody with a very different machine might want to run the above
script and see if how sensitive other machines are to this parameter..

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree

2013-05-11 Thread Linus Torvalds
On Sat, May 11, 2013 at 12:54 PM, John Keeping j...@keeping.me.uk wrote:
 This adds a new configuration variable patchid.cacheRef which controls
 whether (and where) patch IDs will be cached in a notes tree.

Patch ID's aren't stable wrt different diff options, so I think this
is potentially a very bad idea.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree

2013-05-11 Thread Linus Torvalds
On Sat, May 11, 2013 at 2:49 PM, John Keeping j...@keeping.me.uk wrote:

 Hmm... I hadn't realised that.  Looking a bit closer, it looks like
 init_patch_ids sets up its own diffopts so its not affected by the
 command line (except for pathspecs which would be easy to check for).
 Of course that still means it can be affected by settings in the user's
 configuration.

.. and in the actual diff algorithm.

The thing is - patches ARE NOT STABLE. There are many valid ways to
get a patch from one version to another, and even without command line
changes, we've had different versions of git generating different
patches. There are heuristics in xdiff to avoid some nasty use up
tons of CPU-time things that have been tweaked over time. And even
for simple cases there are ambiguous ways to describe the patch.

Now, maybe we don't care, because in practice, most of the time this
just doesn't much matter. And anybody who uses patch-ID's had better
not rely on them being guaranteed to be unique anyway, so it's more of
a if the patch ID is the same, it's almost guaranteed to be the same
patch, but that's a big almost. And no, it's not some theoretical
SHA1 collisions are very unlikely kind of thing, it's a the patch
ID actually ignores a lot of data in order to give the same ID even if
lins have been added above it, and the patch is at different line
numbers etc.

So maybe it doesn't matter. But at the same time, I really think
caching patch ID's should be something people should be aware of is
fundamentally wrong, even if it might work.

And quite frankly, if you do rebases etc so much that you think patch
ID's are so important that they need to be cached, you may be doing
odd/wrong things.

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-compat-util: Avoid strcasecmp() being inlined

2013-09-18 Thread Linus Torvalds
On Wed, Sep 18, 2013 at 5:43 AM, Sebastian Schuberth
sschube...@gmail.com wrote:

 My feeling is that Linus' reaction was more about that this
 work-around is even necessary (and MinGW is buggy) rather than
 applying it to git-compat-util.h and not elsewhere.

So I think it's an annoying MinGW bug, but the reason I dislike the
no-inline approach is two-fold:

 - it's *way* too intimate with the bug.

   When you have a bug like this, the *last* thing you want to do is
to make sweet sweet love to it, and get really involved with it.

   You want to say Eww, what a nasty little bug, I don't want to have
anything to do with you.

   And quite frankly, delving into the details of exactly *what* MinGW
does wrong, and defining magic __NO_INLINE__ macros, knowing that that
is the particular incantation that hides the MinGW bug, that's being
too intimate. That's simply a level of detail that *nobody* should
ever have to know.

   The other patch (having just a wrapper function) doesn't have those
kinds of intimacy issues. That patch just says MinGW is buggy and
cannot do this function uninlined, so we wrap it. Notice the lack of
detail, and lack of *interest* in the exact particular pattern of the
bug.

The other reason I'm not a fan of the __NO_INLINE__ approach is even
more straightforward:

 - Why should we disable the inlining of everything in string.h (and
possibly elsewhere too - who the hell knows what __NO_INLINE__ will do
to other header files), when in 99% of all the cases we don't care,
and in fact inlining may well be good and the right thing to do.

So the __NO_INLINE__ games seem to be both too big of a hammer, and
too non-specific, and at the same time it gets really intimate with
MinGW in unhealthy ways.

If you know something is diseased, you keep your distance, you don't
try to embrace it.

 I tried to put the __NO_INLINE__ stuff in compat/mingw.h but failed,
 it involved the need to shuffle includes in git-compat-util.h around
 because winsock2.h already seems to include string.h, and I did not
 find a working include order. So I came up with the following, do you
 like that better?

Ugh, so now that patch is fragile, so we have to complicate it even more.

Really, just make a wrapper function. It doesn't even need to be
conditional on MinGW. Just a single one-liner function, with a comment
above it that says MinGW is broken and doesn't have an out-of-line
copy of strcasecmp(), so we wrap it here.

No unnecessary details about internal workings of a buggy MinGW header
file. No complexity. No subtle issues with include file ordering. Just
a straightforward workaround that is easy to explain.

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] git log: support auto decorations

2014-05-29 Thread Linus Torvalds

From: Linus Torvalds torva...@linux-foundation.org
Date: Thu, 29 May 2014 15:19:40 -0700
Subject: [RFC PATCH] git log: support auto decorations

This works kind of like --color=auto - add decorations for interactive
use, but do not change defaults when scripting or when piping the output
to anything but a terminal.

You can use either

[log]
 decorate=auto

in the git config files, or the --decorate=auto command line option to
choose this behavior.

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---

I actually like seeing decorations by default, but I do *not* think our 
current log.decorate options make sense, since they will change any 
random use of git log to have decorations. I much prefer the 
ui.color=auto behavior that we have for coloration. This is a trivial 
patch that tries to approximate that.

It's marked with RFC because

 (a) that isatty(1) || pager_in_use() test is kind of hacky, maybe we 
 would be better off sharing something with the auto-coloration?

 (b) I also think it would be nice to have the equivalent for 
 --show-signature, but there we don't have any preexisting config 
 file option.

 (c) maybe somebody would like a way to combine auto and full, 
 although personally that doesn't seem to strike me as all that useful 
 (would you really want to see the full refname when not scripting it)

but the patch is certainly simple and seems to work. Comments?

 builtin/log.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/log.c b/builtin/log.c
index 39e883635279..df6396c9c3d9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -63,6 +63,8 @@ static int parse_decoration_style(const char *var, const char 
*value)
return DECORATE_FULL_REFS;
else if (!strcmp(value, short))
return DECORATE_SHORT_REFS;
+   else if (!strcmp(value, auto))
+   return (isatty(1) || pager_in_use()) ? DECORATE_SHORT_REFS : 0;
return -1;
 }
 
-- 
2.0.0.1.g5beb60c

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] git log: support auto decorations

2014-05-29 Thread Linus Torvalds
On Thu, May 29, 2014 at 6:58 PM, Jeff King p...@peff.net wrote:

 I will spare you the usual lecture on having these lines in the message
 body. ;)

We do it for the kernel because they often get lost otherwise.
Particularly the date/author.

git doesn't tend to have the same kind of deep email forwarding
models, and nobody uses quilt to develop git (I hope!) but it's a
habit I like to encourage for the kernel, so I use it in general..

 Yeah, I think this makes a lot of sense. I do use log.decorate=true, and
 it is usually not a big deal. However, I think I have run into
 annoyances once or twice when piping it. I'd probably use
 log.decorate=auto if we had it.

I have various scripts to generate releases etc, using git log and
piping it to random other stuff. I don't _think_ they care, but quite
frankly, I'd rather not even have to think about it.

Also, I actually find the un-colorized version of the decorations to
be distracting. With color (not that I really like the default color
scheme, but I'm too lazy to set it up to anything else), the
colorization ends up making the decorations much easier to visually
separate, so I like them there.

 It's marked with RFC because

  (a) that isatty(1) || pager_in_use() test is kind of hacky, maybe we
  would be better off sharing something with the auto-coloration?

 The magic for this is in color.c, want_color() and check_auto_color().

Oh, I know. That's where I stole it from. But the colorization
actually has very different rules, and looks at TERM etc. So I only
stole the actual non-color-specific parts of it, but I was wondering
whether it might make sense to unify them.

  (b) I also think it would be nice to have the equivalent for
  --show-signature, but there we don't have any preexisting config
  file option.

 Potentially yes, though there is a real performance impact for log
 --show-signature if you actually have a lot of signatures. Even on
 linux.git, a full git log is 15s with --show-signature, and 5s
 without. Maybe that is acceptable for interactive use (and certainly it
 is not a reason to make it an _option_, if somebody wants to turn it
 on).

Yes. This is an example of where the whole tty is fundamentally
different from scripting happens. It really is about the whole user
is looking at it vs scripting. There is no way in hell that
--show-signature should be on by default for anybody sane.

That said, part of it is just that show-signature is so suboptimal
performance-wise, re-parsing the commit buffer for each commit when
show_signature is set. That's just crazy, we've already parsed the
commit text, we already *could* know if it has a signature or not, and
skip it if it doesn't. That would require one of the flag bits in the
object, though, or something, so it's probably not worth doing.

Sure, performance might matter if you end up searching for something
in the pager after you've done git log, but personally I think I'd
never even notice..

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.0.0 regression? request pull does not seem to find head

2014-06-02 Thread Linus Torvalds
On Mon, Jun 2, 2014 at 2:01 PM, Michael S. Tsirkin m...@redhat.com wrote:

 [mst@robin linux]$ git request-pull net-next/master  
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git net-next
 warn: No match for commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc found at 
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
 warn: Are you sure you pushed 'net-next' there?

git request-pull is clearly correct. There is no net-next in that
public repository.

It *used* to be that request-pull then magically tried to fix it up
for you, which in turn resulted in the guess not being right, like
pointing to the wrong branch that just happened to have the same SHA1,
or pointing to a branch when it _should_ have pointed to a tag.

Now, if you pushed your local net-next branch to another branch name
(I can find a branch name called vhost-next at that repository, then
you can *tell* git that, using the same syntax you must have used for
the push.

So something like

  git request-pull net-next/master
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
net-next:vhost-next

should work so that git doesn't end up having to guess (and
potentially guessing wrong).

But it may actually be a simpler driver error, and you wanted to use
vhost-next, and that net-next was actually incorrect?

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v2.0.0

2014-06-02 Thread Linus Torvalds
On Mon, Jun 2, 2014 at 7:08 PM, NeilBrown ne...@suse.de wrote:

   git request-pull master git://neil.brown.name/md

 after tagging the current commit as md/3.15-fixes and pushing out the tag

You should *tell* git request-pull what you're actually requesting to pull.

The old let's guess based on the commit at the top of your tree may
have worked for you (because you only ever had one matching thing),
but it did not work in general.

So the new git request-pull does not guess. If you want to request a
tag to be pulled, you name it.  Because if you don't name it, it now
defaults to HEAD (like all other git log commands) rather than
guessing. So please write it as

   git request-pull master git://neil.brown.name/md md/3.15-fixes

I guess the man-page should be made more explicit about this too.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: git request-pull broken for plain branches

2014-06-25 Thread Linus Torvalds
On Wed, Jun 25, 2014 at 2:55 AM, Uwe Kleine-König
u.kleine-koe...@pengutronix.de wrote:

 $ git rev-parse HEAD
 9e065e4a5a58308f1a0da4bb80b830929dfa90b3
 $ git ls-remote origin | grep 9e065e4a5a58308f1a0da4bb80b830929dfa90b3
 9e065e4a5a58308f1a0da4bb80b830929dfa90b3
 refs/heads/ukl/for-mainline
 $ git request-pull origin/master origin HEAD  /dev/null
 warn: No match for commit 9e065e4a5a58308f1a0da4bb80b830929dfa90b3 
 found at origin
 warn: Are you sure you pushed 'HEAD' there?

Notice how HEAD does not match.

The error message is perhaps misleading. It's not enough to match the
commit. You need to match the branch name too. git used to guess the
branch name (from the commit), and it often guessed wrongly. So now
they need to match.

So you should do

git request-pull origin/master origin ukl/for-mainline

to let request-pull know that you're requesting a pull for ukl/for-mainline.

If you have another name for that branch locally (ie you did something
like git push origin local:remote), then you can say

git request-pull origin/master origin local-name:remote-name

to specify what the branch to be pulled is called locally vs remotely.

In other words, what used to be pick some branch randomly is now
you need to _specify_ the branch.

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files

2014-06-27 Thread Linus Torvalds
On Fri, Jun 27, 2014 at 10:48 AM, Junio C Hamano gits...@pobox.com wrote:

 Even though the original question mentioned delta discovery, I
 think what was being asked is not delta in the Git sense (which
 your answer is about) but is can we diff two long sequences of text
 (that happens to consist of only 4-letter alphabet but that is a
 irrelevant detail) without holding both in-core in their entirety?,
 which is a more relevant question/desire from the application point
 of view.

.. even there, there's another issue. With enough memory, the diff
itself should be fairly reasonable to do, but we do not have any sane
*format* for diffing those kinds of things.

The regular textual diff is line-based, and is not amenable to
comparing two long lines. You'll just get a diff that says the two
really long lines are different.

The binary diff option should work, but it is a horrible output
format, and not very helpful. It contains all the relevant data (copy
this chunk from here to here), but it's then shown in a binary
encoding that isn't really all that useful if you want to say what
are the differences between these two chromosomes.

I think it might be possible to just specify a special diff algorithm
(git already supports that, obviously), and just introduce a new use
binary diffs with a textual representation model.

But it also sounds like there might be some actual performance problem
with these 1GB file delta-calculations. Which I wouldn't be surprised
about at all.

Jarrad - is there any public data you could give as an example and for
people to play with?

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files

2014-06-27 Thread Linus Torvalds
On Fri, Jun 27, 2014 at 12:38 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 I think it might be possible to just specify a special diff algorithm
 (git already supports that, obviously), and just introduce a new use
 binary diffs with a textual representation model.

Another model would be to just insert newlines in the data, and use
the regular textual diff on that preprocessed format.

The problem of *where* to insert the newlines is somewhat interesting,
since the stupid approaches (chunk it up in 64-byte lines) don't
work with data insertion/deletion (all the lines will now be different
just because the data is offset), but there are algorithms that handle
that reasonably well, like breaking lines at certain well-defined
patterns (the patterns can then be defined either explicitly or
algorithmically - like calculating a hash/crc over the last rolling N
characters and breaking if the result  matches some modulo
calculation).

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files

2014-06-27 Thread Linus Torvalds
On Fri, Jun 27, 2014 at 12:55 PM, Jason Pyeron jpye...@pdinc.us wrote:

 The issue will be, if we talk about changes other than same length 
 substitutions
 (e.g. Down's Syndrome where it has an insertion of code) would require one 
 code
 per line for the diffs to work nicely.

Not my area of expertise, but depending on what you are interested in
- like protein encoding etc, I really think you don't need to do
things character-per-character. You might want to break at interesting
sequences (TATA box, and/or known long repeating sequences).

So you could basically turn the one long line representation into
multiple lines, by just looking for particular known interesting (or
known particularly *UN*interesting) patterns, and whenever you see the
pattern you create a new line, describing the pattern (TATAAA or
run of 128 U), and then continue on the next line.

Then you diff those semantically enriched streams instead of the raw data.

But it probably depends on what you are looking for and at. Sometimes
you might be looking at individual base pairs. And sometimes maybe you
want to look at the codons, and consider condons that transcribe to
the same amino acid to be the same, and not show up as a difference.
So I could well imagine that you might want to have multiple different
ways to generate these diffs. No?

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch-2.7.3 no longer applies relative symbolic link patches

2015-01-26 Thread Linus Torvalds
On Mon, Jan 26, 2015 at 1:35 PM, Junio C Hamano gits...@pobox.com wrote:

 What is your take on CVE-2015-1196, which brought this /regression/ to
 GNU patch?
 If git apply get /fixed/ for that same CVE, would that /break/ your fix?

I _think_ we allow arbitrary symlinks to be created, but then we
should be careful about actually _following_ them.

At least I _thought_ we were already quite careful not to do that,
even if it's been a long time since I looked at the code. So even if
we create a symlink to outside the repository, it normally shouldn't
matter. We have that whole lstat_cache() thing that exists exactly
to make it efficient to do pathname lookups while at the same time
being aware of symlinks in the middle.

Of course, our lstat cache is racy if somebody else modifies the tree
concurrently and changes things, but that's a non-issue, because if
somebody can just directly create random symlinks in the middle of the
tree, I don't think we care about any symlinks _git_ might be creating
concurrently ;)

But it is entirely possible that git apply - especially when used
outside of a real git directory - ends up doing that. And it's not
like we necessarily always use the whole lstat-cache mechanism to
begin with, so the fact that we have the infrastructure to be careful
in no way means that we necessarily always _are_ careful...

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch-2.7.3 no longer applies relative symbolic link patches

2015-01-26 Thread Linus Torvalds
On Mon, Jan 26, 2015 at 1:07 PM, Josh Boyer jwbo...@fedoraproject.org wrote:

 Or did I miss a way that git-apply can take a git patch and apply it
 to a tree that isn't a git repo?

Exactly. git apply works as a straight patch replacement outside
of a git repository. It doesn't actually need a git tree to work.

(Of course, git apply is _not_ a patch replacement in the general
sense. It only applies context diffs - preferentially git style ones -
 so no old-style patches etc need apply. And it's not
replacement-compatible in a syntax sense either, in that while many of
the options are the same, not all are etc etc).

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] request-pull: do something if $3 is passed

2015-02-17 Thread Linus Torvalds
On Tue, Feb 17, 2015 at 12:34 PM, Paolo Bonzini bonz...@gnu.org wrote:

 I guess only Linus could answer that, since he wrote 024d34cb0 and he
 knows the intent better than anyone else.

I don't even understand your problem.

You said

  when $3 is not passed git will try to use HEAD as the default but
it cannot be resolved to a tag, neither locally (patch 2) nor remotely
(patch 3)

which makes absolutely no sense.

HEAD is not a tag. Never has been, never will be. If you want me to
pull a tag, then you damn well should say what tag you want, not just
randomly say HEAD.

So what is it you want to do? At no point is HEAD should resolve as a
tag sensible.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] request-pull: do something if $3 is passed

2015-02-17 Thread Linus Torvalds
On Tue, Feb 17, 2015 at 12:53 PM, Paolo Bonzini bonz...@gnu.org wrote:

 Without $3, git tries to do things that make no sense like git show-ref
 --heads --tags HEAD; or that make little sense when requesting a pull,
 like looking for HEAD in the output of git ls-remote.  But from the
 release notes of 2.0 it looks like it's intended and the script is just
 taking shortcuts.

It is *you* who make no sense.

Looking for HEAD in git ls-remote? Perfectly sensible:

[torvalds@i7 linux]$ git ls-remote origin | grep HEAD
cc4f9c2a91b7be7b3590bb1cbe8148873556aa3f HEAD

that's the default thing when you don't specify any particular branch or tag.

 Ok, in 1.9.x I used to not say anything; if the new workflow is to
 always specify a tag, that's okay.

Indeed. You have to specify what you want me to pull. Exactly because
in 1.9.x people didn't, and I got *really* tired of getting bogus pull
requests that didn't work, or pointed at the wrong branch when people
had multiple branches with the same contents etc.

 I wanted git to find the matching tag on the remote side when I use git
 request-pull origin/master URL with no third parameter, since I never
 request pulls except with a single signed tag.

The thing is, HEAD works. Not for you, because you don't use HEAD. But
because you don't use HEAD, you shouldn't use the default.

I *would* agree to making $3 be mandatory, but there are still people
out there who just use a single branch per repository and no signed
branches. Which is the only reason that default HEAD' thing exists.

   :Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] request-pull: do something if $3 is passed

2015-02-17 Thread Linus Torvalds
On Tue, Feb 17, 2015 at 1:03 PM, Junio C Hamano gits...@pobox.com wrote:

 HEAD should resolve as a tag is not sensible, but HEAD should
 locally DWIM to something sensible is still possible, no?

I disagree. Why? Because what you have locally is *not* necessarily
the same thing you have remotely.

And that's *exactly* why people used to send me broken pull requests.
git pull-request would guess on things, and it would get the guesses
wrong, and write the pull request wrong.

 We could for example make the rule for unset $3 case like this:
 instead of the current missing $3 is a request to pull HEAD:

 If you have one and only one signed tag that happens to point at
 the commit sitting at HEAD, behave as if that tag was given as
 the third argument from the command line.

If you verify that one and only to be true both locally and
remotely, then I guess I would be ok with it. But it really would have
to be unique. And truly unique, as in no confusion about branches or
tags, only one or the other. Because the tag vs branch was one of
the main sources of confusion that made me repeatedly get bad pull
requests, particularly when there was something locally that wasn't
actually named the same thing remotely.

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] request-pull: do something if $3 is passed

2015-02-17 Thread Linus Torvalds
On Tue, Feb 17, 2015 at 1:10 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Sure.  But if I got a pull request saying please pull
 git://example.org/foo.git HEAD I would think that the sender
 messed up the pull request.  So *in the context of git-request-pull*
 ${remote:-HEAD} makes little sense to me.

Umm. If somebody actually leaves off the third argument THAT IS NOT AT
ALL what it prints.

It will show

The following changes since commit base...

.. base commit description ..

   are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

for you to fetch changes up to cc4f9c2a91b7be7b3590bb1cbe8148873556aa3f:
...

IOW, it does exactly the right thing. It gives the contents of HEAD,
but it doesn't actually say HEAD anywhere.

And just look at lkml. The above kind of branch-less and tag-less pull
requests are still fairly common. It's the original git model, and it
may be a bit archaic, and I much prefer people to send me signed tags,
but hey, that's what don't mention a branch or tag means.

And no, I don't think git request-pull is at all different from other
git commands. git log means the same thing as git log HEAD. Exact
same thing, and nobody would actually write out that HEAD (except
inside scripts, perhaps).

So basically I agree that git request-pull has changed behavior, but
the new behavior is *more* in line with other git commands, and the
old behavior was actually really really odd with that whole extensive
guess what the user means. No other git command ever did that
guessing thing (ok, famous last words, maybe somebody can come up with
one), and not mentioning a branch/tag/commit explicitly pretty much
always means HEAD.

  :Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch-2.7.3 no longer applies relative symbolic link patches

2015-01-26 Thread Linus Torvalds
On Mon, Jan 26, 2015 at 8:32 AM, Josh Boyer jwbo...@fedoraproject.org wrote:

 I went to do the Fedora 3.19-rc6 build this morning and it failed in
 our buildsystem with:

 + '[' '!' -f /builddir/build/SOURCES/patch-3.19-rc6.xz ']'
 + case $patch in
 + unxz
 + patch -p1 -F1 -s
 symbolic link target '../../../../../include/dt-bindings' is invalid
 error: Bad exit status from /var/tmp/rpm-tmp.mWE3ZL (%prep)

Ugh. I don't see anything we can do about this on the git side, and I
do kind of understand why 'patch' would be worried about '..' files.
In a perfect world, patch would parse the filename and see that it
stays within the directory structure of the project, but that is a
rather harder thing to do than just say no dot-dot files.

The short-term fix is likely to just use git apply instead of patch.

The long-term fix? I dunno. I don't see us not using symlinks, and a
quick check says that every *single* symlink we have in the kernel
source tree is one that points to a different directory using ..
format. And while I could imagine that patch ends up counting the
dot-dot entries and checking that it's all inside the same tree it is
patching, I could also easily see patch *not* doing that. So using
git apply _might_ end up being the long-term fix too.

I suspect that if patch cannot apply even old-style kernel patches
due to the symlinks we have in the tree, and people end up having to
use git apply for them, I might end up starting to just use
rename-patches (ie using git diff -M) for the kernel.

I've considered that for a while already, because patch _does_ kind
of understand them these days, although I think it gets the
cross-rename case wrong because it fundamentally works on a
file-by-file basis. But if patch just ends up not working at all,
the argument for trying to maintain backwards compatibility gets
really weak.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] hex.c: reduce memory footprint of sha1_to_hex static buffers

2015-02-13 Thread Linus Torvalds
On Fri, Feb 13, 2015 at 1:56 PM, Stefan Beller sbel...@google.com wrote:

 As I could not find any documentation on the
 magical 50 in the early days, I cc'd Linus
 in case there is something I did not think of yet.

Nothing magical, it's just rounded up from 40 + NUL character.

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git very slow ?

2015-03-08 Thread Linus Torvalds
On Sun, Mar 8, 2015 at 12:37 PM, David Kastrup d...@gnu.org wrote:

 Since git blame outputs everything once it is finished (the first
 screen is purely the pager's business), it needs to unpack the entire
 history of the file (unless no blameable lines remain at all) and look
 at it.  6 seconds tends not to be all that excessive for extracting more
 than 5 years of a file's history.

Yeah, git blame can easily be several seconds without anything being wrong.

But git commit should be fairly instantaneous. Even over NFS.

That said, on NFS in particular, make sure you don't have

[core]
PreloadIndex = false

in your .gitconfig to disable the threaded index preloading.

But core.preloadindex _should_ be enabled by default in anything but
the most ancient git versions, and it can make a huge difference on
NFS because it allows the 'lstat()' calls to check that the index is
up-to-date to be done in parallel. Without that, git on NFS can be a
bit sluggish.

On local filesystems it normally doesn't make as much of a difference,
since things tend to be either cached or seek-limited.

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git very slow ?

2015-03-08 Thread Linus Torvalds
On Sun, Mar 8, 2015 at 12:02 PM, Ken Moffat zarniwh...@ntlworld.com wrote:

 The comments on git bisect were for linus'skernel tree, on a local
 disk.  2.3GB of repo, just under 57000 files.

Ugh. I hope you are talking about checked-out size.

[torvalds@i7 linux]$ du -sh .git
   850M .git

because otherwise it sounds like that repo hasn't been repacked in forever.

To really pack things (which can slow things down for old history as
people said, but on the whole it tends to be a big win due to less
IO), do

   git repack -adf --window=200 --depth=200

and go away for a while. Oh, and make sure your machine has enough
memory and CPU to make that for a while not be *too* long.

You should have a few hundred files (just a few tens of files directly
after the repack) and that roughly 850MB of space for the repository
information itself.

But yeah, fully checked out and built with all the modules etc, you
would have much more.

That said, if you have something fairly that is consistently really
slow (like the git commit you mentioned), *before* doing the repack,
do

   strace -o ../trace-file -Ttt git commit

and we can get a much better guess about why it's so slow. Send it to
me in private email if you don't want to make it public, and I can
take a look.

 ping between them gives times of 0.25 to 0.3 seconds

.. and I *really* hope that was not seconds, but ms. Otherwise your
nfsv3 setup is going to be really really painful.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?

2015-04-20 Thread Linus Torvalds
On Mon, Apr 20, 2015 at 10:41 AM, Junio C Hamano gits...@pobox.com wrote:

 Ahh, OK.  And not just -S and -G, the fields in headers may be
 something user may want to switch independently?

So personally, I hate extra command line flags for this. I'd much
rather see is use something in the regular expression itself, and make
*that* be the way you do it, and make it be the preferred format.

Otherwise, you'll always have the issue that you want *part* to be
case-ignoring, and another entry not, and then it's just messy with
the ignore case being some other thing.

And we support that with perl regexps, but those are only enabled with
libpcre. I wonder if we could just make some simple pattern extension
that we make work even *without* libpcre.

IOW, instead of making people use -regexp-ignore-case, could we just
say that we *always* support the syntax of appending (?i) in front
of the regexp. So that your

git log --regexp-ignore-case --author=tiM --grep=wip

example would be

git log --author=(?i)tiM --grep=wip

and it would match the _author_ with ignoring case, but the
--grep=wip part would be an exact grep.

Right now the above already works (I think) if you:

 - build with USE_LIBPCRE

 - add that --perl-regexp switch.

but what I'm suggesting is that we'd make a special case for the
magical perl modifier pattern at the beginning for (?i), and make it
work even without USE_LIBPCRE, and without specifying --perl-regexp.

We'd just special-case that pattern (and perhaps _only_ that special
four-byte sequence of (?i) at the beginning of the search string),
but perhaps we could support '(?s)' too?

Hmm? I realize that this would be theoretically an incompatible
change, but it would be very convenient and if we document it well it
might be ok. I doubt people really search for (?i) at the beginning
of strings _except_ if they already know about the perl syntax and
want it.

And to clarify: I don't suggest always building with libpcre. I
literally suggest having something like

 /* hacky mac-hack hack */
if (strncmp((?i), p-pattern, 4)) {
p-pattern += 4;
p-ignore_case = true;
}

just in front of the regcomp() call, and nothing more fancy than that.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd broken --date=now behavior in current git

2015-04-15 Thread Linus Torvalds
On Wed, Apr 15, 2015 at 12:22 AM, Eric Sunshine sunsh...@sunshineco.com wrote:

 The fix seems to be simply:

Yup, that seems to do it for me. I'm not sure how we get to
match_digit() with the time string now, though.

So your patch fixes things for me, but I think:

 - we should move the tm.tm_isdst = -1; up a bit and add a comment

 - I'd like to know why it affected the author date but not the
committer date, and how it got to match_digit() with that date string
that didn't contain any digits.

I'd spend the time on this myself, but I'm in the middle of the kernel
merge window, so..

Thanks,
 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Odd broken --date=now behavior in current git

2015-04-14 Thread Linus Torvalds
I just noticed this because I had amended some merge commits with

   git commit --amend --date=now

to update them, and that gets some funny broken timezones. I suspect
it's some silly daylight savings time issue.

Lookie here, I can reproduce it trivially with current git (in the git
repo itself):

[torvalds@i7 git]$ date; git commit -m Test --allow-empty --date=now
Tue Apr 14 21:11:03 PDT 2015
[master ec7733db5360] Test
 Date: Tue Apr 14 20:11:03 2015 -0800

notice how the commit date message shows something funny. It shows an
hour earlier, but in -0800.

And the resulting commit is broken:

[torvalds@i7 git]$ git show --pretty=fuller
commit ec7733db5360966434e03eab1a849e6d4227231c (HEAD - master)
Author: Linus Torvalds torva...@linux-foundation.org
AuthorDate: Tue Apr 14 20:11:03 2015 -0800
Commit: Linus Torvalds torva...@linux-foundation.org
CommitDate: Tue Apr 14 21:11:03 2015 -0700

Test

notice how the AuthorDate has that -0800, but the CommitDate has -0700.

Hmm.

I can't be the only one seeing this? My guess is that there's a
missing initialization of tm.tm_isdst somewhere or whatever.

The above is with current git:

[torvalds@i7 git]$ git version
git version 2.4.0.rc2

Anybody?

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] am --abort: merge ORIG_HEAD tree into index

2015-08-17 Thread Linus Torvalds
On Mon, Aug 17, 2015 at 2:48 AM, Paul Tan pyoka...@gmail.com wrote:

 It's true that we need to merge the ORIG_HEAD tree into the index
 instead of overwriting it. Patch below.

Seems to work for me. Thanks,

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git am --abort screwing up index?

2015-08-16 Thread Linus Torvalds
On Sun, Aug 16, 2015 at 12:46 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Maybe it has always done this, and I just haven't noticed (I usually
 _just_ do the git reset --hard thing, don't ask me why I wanted to
 be doubly sure this time). But maybe it's an effect of the new
 built-in am.

I bisected this. It's definitely used to work, and the regression is
from the new built-in am. But I cannot bisect into that branch
'pt/am-builtin', because git am doesn't actually work in the middle
of that branch.

So I've verified that commit c1e5ca90dba8 (Merge branch
'es/worktree-add') is good, and that commit 7aa2da616208 (Merge
branch 'pt/am-builtin') is bad, but I cannot pinpoint the exact
commit where git am --abort starts breaking the index.

But I assume it's simply that initial implementation of --abort in
commit 33388a71d23e (builtin-am: implement --abort) that already
ends up rewriting the index from scratch without applying the old stat
data.

The test-case is pretty simple: just force a git am failure, then do
git am --abort, and then you can check whether the index stat()
information is valid in various ways. For the kernel, doing a git
reset --hard makes it obvious because the reset will force all files
to be written out (since the index stat information doesn't match the
current tree). But you can do it by just counting system calls for a
git diff too. On the git tree, for example, when the index has
matching stat information, you get something like

  [torvalds@i7 git]$ strace -cf git diff
  ..
0.040.25   126 4 open
  ..

ie you only actually ended up with 26 open() system calls. When the
index is not in sync with the stat information, git diff will have
to open each file to see what the actual contents are, and you get

  [torvalds@i7 git]$ strace -cf git diff
  ...
0.300.70   0  5987   302 open
  ...

so now it opened about 6k files instead (and for the kernel, that
number will be much larger, of course).

I _think_ it's because git-am (in clean_index()) uses read_tree(),
while it probably should use unpack_trees with opts.update and
opts.reset set (like reset_index() does in builtin/reset.h).

I have to go off do my weekly -rc now, and probably won't get to
debugging this much further. Adding Stefan to the cc, since he helped
with that --abort implementation.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git am --abort screwing up index?

2015-08-16 Thread Linus Torvalds
So I just noticed while applying a patch with git am when I had a
dirty tree, and I ended up getting a failure and starting over:

   [torvalds@i7 linux]$ git am --abort
   [torvalds@i7 linux]$ git reset --hard
   Checking out files: 100% (50794/50794), done.0794)
   HEAD is now at 1efdb5f0a924 Merge tag 'scsi-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

and the thing I reacted to is that the git reset --hard re-checked
out all the files.

That implies that git am --abort ended up leaving the index in a bad
state, presumably it re-did the index entirely from HEAD, without
filling it in with the stat() details from the old index.

Maybe it has always done this, and I just haven't noticed (I usually
_just_ do the git reset --hard thing, don't ask me why I wanted to
be doubly sure this time). But maybe it's an effect of the new
built-in am.

I'm about to go out and don't have time to debug this any further
right now, but I'll try to get back to it later. I thought I'd send
out this email in case it makes Paul goes ahh, yes.. obvious

Not a big deal - things *work* fine. But forcing checking out every
file obviously also means that subsequent builds end up being slowed
down etc.,.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git_open_noatime: return with errno=0 on success

2015-08-05 Thread Linus Torvalds
On Tue, Aug 4, 2015 at 11:03 PM, Junio C Hamano gits...@pobox.com wrote:

 I would agree it is a good idea to clear it after seeing the first
 open fail due to lack of O_NOATIME before trying open for the second
 time, iow, more like this?

So I don't think this is _wrong_ per se, but I think the deeper issue
is that somebody cares about 'errno' here in the first place.

A stale 'errno' generally shouldn't matter, because we either

 (a) return success (and nobody should look at errno)

or

 (b) return an error later, without setting errno for that _later_ error.

and I think either of those two situations are the real bug, and this
clear stale errno is just a workaround.

But as mentioned, I don't think clearign errno is wrong, so I'm not
objecting to the patch. I just suspect there's something else goign on
too..

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git log fails to show all changes for a file

2015-07-14 Thread Linus Torvalds
On Tue, Jul 14, 2015 at 12:59 AM, Olaf Hering o...@aepfle.de wrote:
 On Tue, Jul 14, John Keeping wrote:

 It was added in an evil merge (f9da455b93f6ba076935b4ef4589f61e529ae046),

That's not an evil merge. That's just a regular merge. One side had
changed the argument to const:

 - 1b9d48f2a579 (hyper-v: make uuid_le const)

while the other side had renamed the function and added an argument,
and changed the return type:

 - d3ba720dd58c (Drivers: hv: Eliminate the channel spinlock in the
callback path)

an evil merge is something that makes changes that came from neither
side and aren't actually resolving a conflict.

That said, I do wonder if we should just make -p imply --cc. Right
now we have the kind of odd situation that git log -p doesn't show
merge patches, but git show cmit does show them. And you kind of
have to know a lot about git to know the --cc option.

In fact, that git show behavior really is very subtle, but very
useful. It comes from show_rev_tweak_rev(), which is a magic git-show
thing.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git log fails to show all changes for a file

2015-07-15 Thread Linus Torvalds
On Wed, Jul 15, 2015 at 11:17 AM, Junio C Hamano gits...@pobox.com wrote:

 So this is a suggested change to -p -m behavior?

 Not really.  This is a suggested behaviour for git log -p

Oh, in that case I say NAK NAK NAK.

That would be totally horrible, and completely unacceptable.

I do git log -p all the time, and merge-diffs with --first-parent
are regularly hundreds of thousands of lines for the kernel.

So no. That is COMPLETELY unacceptable. Not even worth discussing.

   Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git log fails to show all changes for a file

2015-07-15 Thread Linus Torvalds
On Wed, Jul 15, 2015 at 9:29 AM, Junio C Hamano gits...@pobox.com wrote:

 I would think git log -p that implies --cc would be a good
 change, as long as there is an easy escape hatch to let us do what
 we have to do with a rather lengthy git log -p --first-parent -m
 (i.e. show the change relative to its first parent while traversing
 the first parent chain) today.  Perhaps only when there is no
 explicit -m but -p is given, automatically enable --cc, or
 something like that.

That's very close to what git show does through that magic
show_rev_tweak_rev() logic, with the crucial difference being that
git show is designed to always show the diff, so the -p is
implied.

That said, having thought about it more, I'm not entirely sure we can
do it.  The big conceptual difference between git log and git show
is obviously that git show doesn't walk the revision list, and you
always explicitly say show _this_ commit.

And that means that with git show, you kind of _know_ that the
commit is relevant and interesting, in a way that git log does not.
So got git show, it's very natural to say show all the relevant
information, while for git log we did make the choice that maybe
commit diffs aren't relevant by default.

And the whole issue ends up boiling down to maybe we picked the wrong
choice default. We default to that ignore_merges = 1 behavior.

Now, we defaulted to ignoring merge diffs because long long ago, in a
galaxy far away, we didn't have a great way to show the diffs. The
whole --cc option goes back to January -06 and commit d8f4790e6fe7
(diff-tree --cc: denser combined diff output for a merge commit) .
And before that option - so for about 8 months - we had no good way to
show the diffs of merges in a good dense way.

So the whole don't show diffs for merges by default actually made a
lot of sense originally, because our merge diffs were not very useful.

But that does mean that if we now enable --cc by default when you
ask for diffs, we have no good way to _disable_ it. We picked disable
by default, and -m means enable merge diffs. And that made sense
back in 2005 because we really wanted to disable the whole show diffs
for merges thing.

Of course, you can use --no-merges to basically not show merges at
all, so maybe that's ok. But I get the feeling that if we make -p
imply --cc, we should probably add a --no-merge-diff option too to
replace the (broken) -m flag properly. And I'm a tiny bit worried
that it might break some script (although I'm really not seeing
how/why you'd script git log -p output and not want to get a --cc
patch for a merge).

And -m? We should probably get rid of it. The diffs we get for
merges when -c or --cc isn't given are _not_ useful. The original
non-combined diff format was really just useful for showing that
yeah, we have multiple parents, and they are different in all these
ways.  So there is no actual valid use for -m that I can imagine.
It's simply just an odd historical artifact from a time when we didn't
know how to show merges.

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git log fails to show all changes for a file

2015-07-15 Thread Linus Torvalds
On Wed, Jul 15, 2015 at 10:58 AM, Junio C Hamano gits...@pobox.com wrote:

  * When '-p' is given, we show only diff with first-parent by
default, regardless of the traversal (i.e. --first-parent option
currently controls both traversal and patch display, but in the
new world order, it reverts back to purely a traversal option).

So this is a suggested change to -p -m behavior?

Yes, that sounds sane. The current -p -m behavior is not useful at all.

So if I understand rightly, we'd have:

 -p would be what is currently  -p --cc
 -p -m would be what is currently -p --first-parent
 -p --no-show-merge-diffs would be what is currently -p

and the rationale would be that

 (a) the current -p is hiding things, and while you can add --cc,
that requires that you really understand what is being hidden, which
is a bad default (the complaint that started this discussion)

 (b) the current -p -m is useless crazy stuff, and you'd rather use
it for something that you actually find very common and useful

If so, I agree entirely.

  Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Linus Torvalds
On Mon, Nov 9, 2015 at 10:32 AM, Victor Leschuk
 wrote:
> On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
>>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
>
> Actually I have never said the nCPUs played main role in it. T

The pseudo-code you sent disagrees. Not that "online_cpus() <= 1" is
likely to ever be really an issue on any development platform from the
last decade.

However, I do have to admit that that "online_cpus()" check goes back
a long time, so I guess I can't really blame you.

At least in the index preloading, I was very conscious of the IO
issues. It doesn't actually make a big difference on traditional disks
(seek times dominate, and concurrent IO often doesn't help at all),
but the reason I keep on bringing up NFS is that back when I used CVS
(oh, the horrors), I *also* worked at a company that did everything
over NFS.  CPU ended up almost never being the limiting factor for any
SCM operation.

So I don't have a very good idea of *what* we should use for automatic
thread detection, but I'm pretty sure online_cpu's should not be it.
Except, like Jeff mentioned, for pack formation (which does tend to be
all about CPU).

Sadly, detecting what kind of filesystem you are on and how well
cached it is, is really pretty hard. Even when you have OS-specific
knowledge, and can look up the *type* of the filesystem, what often
matters more is things like "is the filesystem on a rotational media
or using flash?" etc.

In the meantime I'd argue for just getting rid of the online_cpu's
check, because

 (a) I think it's actively misleading

 (b) the threaded grep probably doesn't hurt much even on a single
CPU, and the _potential_ upside from IO could easily dwarf the cost.

 (c) do developers actually have single-core machines any more?

But if somebody can come up with a smarter model, that would certainly
be good too. The IO advantages really don't tend to be there for
rotational media, but for both flash and network filesystems, threaded
IO can be a huge deal.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Linus Torvalds
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
 wrote:
>
> Maybe use the simplest version (and keep num_numbers == 0 also as flag for 
> all other checks in code like if(num_flags)  ):
>
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

I will say this AGAIN.

The number of threads is *not* about the number of CPU's. Stop this
craziness. It's wrong.

The number of threads is about parallelism. Yes, CPU's is a small part
of it. But as mentioned earlier, the *big* wins are for slow
filesystems, NFS in particular. On NFS, even if you have things
cached, the cache revalidation is going to cause network traffic
almost every time. Being able to have several of those outstanding is
a big deal.

So stop with the "online_cpus()" stuff. And don't base your benchmarks
purely on the CPU-bound case. Because the CPU-bound case is the case
that is already generally so good that few people will care all *that*
deeply.

Many of the things git does are not for "best-case" behavior, but to
avoid bad "worst-case" situations. Look at things like the index
preloading (also threaded). The big win there is - again - when the
stat() calls may need IO. Sure, it can help for CPU use too, but
especially on Linux, cached "stat()" calls are really quite cheap. The
big upside is, again, in situations like git repositories over NFS.

In the CPU-intensive case, the threading might make things go from a
couple of seconds to half a second. Big deal. You're not getting up to
get a coffee in either case.

In the network traffic case, the threading might make things go from
one minute to ten seconds. And *that* is a big deal. That's huge.
That's "annoyingly slow" to "oh, this is the fastest SCM I have ever
worked with in my life".

That can literally be something that changes how you work for a
developer. You start doing things that you simply would never
otherwise do.

So *none* of the threading in git is about CPU's. Maybe we should add
big honking comments about that.

And that big honking comment should be in the documentation for the
new flag too. Because it would be really really sad if people say "I
have a laptop with just two cores, so I'll set the threading option to
2", when they then work mostly over a slow wireless network and their
company wants minimal local installs.

The biggest reason to NOT EVER add that configuration is literally the
confusion about this being about CPU's. That was what got the whole
thread started, that was what the original benchmark numbers were
about, and that was WRONG.

So I would strongly suggest that Junio ignore these patches unless
they very clearly talk about the whole IO issue. Both in the source
code, in the commit messages, and in the documentation for the config
option.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
Ok, this may not be new either, but I'm trying to be careful when
using "git am" these days, because I know it got rewritten.

And I _think_ the whitespace handling for adding sign-offs got scrogged.

I just applied the usual patch-bomb from Andrew, and several of the
commits (but not all) end up looking like this:

Cc: <sta...@vger.kernel.org> [3.15+]
Signed-off-by: Andrew Morton <a...@linux-foundation.org>

    Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

note the extraneous whitespace line between Andrew's sign-off and mine.

What's odd is that the emails I'm applying literally don't have that
extra empty line, so it's git that somehow decides to add it.  Only
for a few cases, though.

The pattern *seems* to be that git now looks at the *first* line of
the sign-off block and decides that "this is a sign-off block if that
first line has a sign-ff on it, ie this is fine:

Signed-off-by: Andrea Arcangeli <aarca...@redhat.com>
Cc: Pavel Emelyanov <xe...@parallels.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

but the failing cases have a comment by Andrew:

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxi...@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
Cc: Taku Izumi <izumi.t...@jp.fujitsu.com>
Cc: Gu Zheng <guz.f...@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Mel Gorman <mgor...@techsingularity.net>
Cc: David Rientjes <rient...@google.com>
Cc: <sta...@vger.kernel.org>[4.2.x]
Signed-off-by: Andrew Morton <a...@linux-foundation.org>

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

ie that "[a...@linux-foundation.org: coding-style fixes]" makes git am
now decide that the previous block of text was not a sign-off block,
so it adds an empty line before adding my sign-off. But very obviously
it *was* a sign-off block.

Maybe this isn't new at all, and it's just that I notice because I'm
looking for "git am" oddities.  Something is clearly wrong in
"has_conforming_footer()".

I *think* it's this part:

if (!(found_rfc2822 ||
  is_cherry_picked_from_line(buf + i, k - i - 1)))
return 0;

which basically returns 0 for _any_ line in the footer that doesn't
match the found_rfc2822 format.

I really think that if we find any "Signed-off-by:" in that last
chunk, we should not add a whitespace.

Comments?

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2015 at 4:47 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I *think* it's this part:
>
> if (!(found_rfc2822 ||
>   is_cherry_picked_from_line(buf + i, k - i - 1)))
> return 0;
>
> which basically returns 0 for _any_ line in the footer that doesn't
> match the found_rfc2822 format.

Confirmed. I hacked up a version that just doesn't do that check at
all, and it works fine (but obviously only on well-formatted emails
that really do have a sign-off).

So I think that logic should basically be extended to saying

 - if any line in the last chunk has a "Signed-off-by:", set a flag.

 - at the end of the loop, if that flag wasn't set, return 0.

Instead of that thing that basically returns zero immediately when it
sees a line it doesn't like.

I'm in the middle of my merge window, so I'm not going to get around
to writing a patch until that's over. Hopefully somebody will step up
in the meantime. Hint, hint.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2015 at 6:06 PM, Junio C Hamano  wrote:
>
> that rule would still not think this is a signature block, but at
> that point, do we really want to consider such a block of text a
> signature block?

So exactly why are you arguing for these rules that are known to break
in real life, that I gave actual examples for existing, and that I
also gave an actual example for not just giving a false negative, but
also a false positive?

I'm also pretty sure that what you are arguing for is a regression.

Now, as mentioned, it may well be true that we've had this odd
behavior before, and it's not a real regression - I may just have
picked up on this problem because I've been more careful. Maybe I
didn't notice these problems before.

But looking at the old git-am.sh script, it does simply seem to look
for that '^Signed-off-by:' pattern. It did

  ADD_SIGNOFF=$(
  test "$LAST_SIGNED_OFF_BY" = "$SIGNOFF" || {
  test '' = "$LAST_SIGNED_OFF_BY" && echo
  echo "$SIGNOFF"
  })

which seems to literally just check the last sign-off line it found.
If it matches the new sign-off, it doesn't do anything (and doesn't
add the new one either), and if it doesn't exist at all (so it's
empty) it adds teh empty line.

Quite frankly, that not only worked for a long time, it's simply less
ambiguous than your made up rule.

It's very simple. "if you find a sign-off in the commit message, don't
add an new empty line before the new signoff".

Much better than "every line in the last set of lines must match some
weak format that isn't even true, and is too non-specific anyway".

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2015 at 6:23 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> OK, I didn't know that was acceptable in the kernel community
> to have random comments like that inside the block.  Can these
> comments span multiple paragraphs?  What I am wondering is what
> you want to see in a case like this:
>
>  Signed-off-by: Noam Camus <no...@ezchip.com>
>  Acked-by: Vineet Gupta <vgu...@synopsys.com>
>  [ Also removed pointless cast from "void *". - Linus]
>  Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
>  [ Ahh, we have to tell at least these people
>
>   - stakeholder class 1
>   - staeholder class 2
>  ]
>  Cc: foo
>  Cc: bar

So quite frankly, at that point if git doesn't recognize it as a
sign-off block, I don't think it's a big deal.

That said, the original "git am" rules actually seem to be rather
straightforward: it's never an issue about "last block of text", and
it's simply an issue of "is there a sign-ff _anywhere_ in the text".

That simplicity has a certain appeal to me. I don't think it was
necessarily written that way because it was "well designed" - I
suspect it is more an issue of "easy to implement in a shell-script".

And it's possible that I'm mis-reading the scripts too. It's not like
I _remember_ what the exact behavior was, I just think it used to work
really well for us (ie I don't recall seeing lots of those empty lines
in the middle of the sign-off block before, and this current merge
window it happened for four of the emails I applied from Andrew's
119-patch series..

Four out of 119 emails may not be a big percentage, but it does mean
that it's not horribly unusual either..

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] am: match --signoff to the original scripted version

2015-09-06 Thread Linus Torvalds
On Sat, Sep 5, 2015 at 9:56 PM, Junio C Hamano  wrote:
>
>  For
> the upcoming release, stop using the append_signoff() in "git am"
> and reimplement the looser definition used by the scripted version
> to use only in "git am" to fix this regression in "am" while
> avoiding new regressions to other users of append_signoff().

I've tested this by re-doing the same patch-bomb from Andrew that I
had problems with originally, and it WorksForMe(tm).

Thanks,

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2015 at 5:07 PM, Jeff King  wrote:
>
> Do we want to make Signed-off-by a special token here? What about "Cc:",
> and other project-specific trailers? You wouldn't want to end up with:
>
>   [some comment]
>   Cc: somebody
>
>   Signed-off-by: somebody else
>
> It's not a problem for git-am, which should be taking in patches that
> are already signed-off (and after all, if your project does not use
> signoffs, why are you using "am -s"?). But won't "git commit -s" run
> into the same problem?

So I'm a *bit* worried about taking anything else than Signed-off-by:
exactly because of the problem you mention:

> We could look for an arbitrary rfc2822-ish string, but I'd be worried
> slightly about false positives, like:
>
>   This is a paragraph with arbitrary text. But one
>   of the lines will use a colon, and that a causes a
>   problem: because of wrapping, this line kind of looks
>   like a trailer.

which clearly needs an empty line before the sign-off.

And even if we limit it to just the last line, like you suggest:

> Maybe only the final line needs to look rfc2822-ish?

I'd still worry, for the same exact reason. The "rfc2822-ish" check is
_so_ weak that it can be trivially triggered by normal text.

So maybe it doesn't require a strict "Signed-off-by:" match, but I
think it needs something stricter than the found_rfc2822 format (like
maybe "looks like a name/email").  We just don't want to require that
*all* the lines are that way, because at least in the kernel we often
do end up adding small comments in that section.

The git project sign-off rules seem to be stricter, and it looks like
it's almost universally of the form "Signed-off-by:" with a few
"Helped-by:" and "Reviewed-by:". In the kernel, we really do tend to
be messier in that area, with the sign-off chunk not just having the
sign-offs and cc's etc, but also tends to have those occasional small
notes left by the people forwarding the emails.

And we also often don't have _strictly_ just an email. We tend to have
things like

Cc:  # v3.13+

etc, and sometimes we don't have have any email at all, but things like

Reported-by: coverity
Tested-by: MysterX on #openelec
Generated-by: Coccinelle SmPL

so it's hard to give a really strict rule. It's fairly free-format.

That said, I would argue that when you apply a patch with sign-off,
pretty much by definition that patch _should_ have a sign-off from the
originator. So I suspect that the "there is an existing sign-off in
that last chunk" is a fairly good rule. Even if there are lots of
other lines too.

If you're using "git commit -s" to just commit your own work, you
presumably would normally want the extra sign-off. Or at least you can
do a "git commit --amend" fairly easily to fix things up. Doing the
amending later for "git am" is rather impractical, when it might have
been a series of 100+ emails..

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: More builtin git-am issues..

2015-09-04 Thread Linus Torvalds
On Fri, Sep 4, 2015 at 5:54 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> How about a bit looser rule like this?
>
> A block of text at the end of the message, each and every
> line in which must match "^[^:  ]+:[  ]" (that is,
> a "keyword" that does not contain a whitespace nor a colon,
> followed by a colon and whitespace, and arbitrary value thru
> the end of line) is a signature block.

No. That's still broken.

The thing is, and that was what the report was all about, not every
line _is_ of that format. We have commetns from the sign-off people.
Things like this:

Signed-off-by: Noam Camus <no...@ezchip.com>
Acked-by: Vineet Gupta <vgu...@synopsys.com>
[ Also removed pointless cast from "void *".  - Linus ]
Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

or

Signed-off-by: Andi Kleen <a...@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Link: 
http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-a...@firstfloor.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>

so no, it is simply not true that "every line must match".

I'm not even seeing why you argue for that, since clearly having a
sign-off-line is actually a safer choice too. The "every line must
match" rule is bad, not just because it's not true like above, but
also because it can be true without it being a sign-off block.

For example, it's not at all unlikely that you have perfectly normal
comments that just list some subsystem and their changes. Which could
easily look like

   Trivial fixes all over the tree

   drm: fix whitespace
   mm: speeling errors
   kernel: indentation and codign style

The above looks like a perfectly sane commit log to me.

Do you seriously think that it makes for a better "sign-off block
test" than one that actually checks for "is there a sign-off line"?

I'd much rather have special cases like testing for specific keywords
or looking for things that look like emails, than make it about being
"every line has this very generic format".

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: dynamic "auto" date formats

2016-05-26 Thread Linus Torvalds
This is a throw-away idea with a simple patch attached, which I don't
think anybody should really take all that seriously per se, but I
thought I'd throw it out and see if it generates any discussion.

I almost never use anything but the default date format (DATE_NORMAL),
but every once in a while I will use "--date=relative", typically
because I'm looking at my own commits and I'm just checking when I did
something.

It struck me that I've made the default for most of the logging things
be that when I'm just browsing with the pager, git will automatically
do the right thing. So I have

  [color]
  ui=auto

  [log]
  decorate = auto

in my ~/.gitconfig, and it shows me all those other things I tench to
want to know (like "thar's what I've pushed out" thanks to
decorations).

So I started thinking about when I care about dates, and decided that
maybe I can have a "--date=auto" too. It basically uses relative date
formats for the last 24 hours, and then switches over to the default
format.

I've used it a bit, and like Katy Perry said, I think I might like it.

Note that this doesn't add any gitconfig setting to do this, which
would be part of the whole point if this is actually sensible. But I'm
not entirely convinced it's worth it in the first place, thus this
email to see how people react ("That's just stupid" vs "yeah, I didn't
even know I wanted it, but now I need it").

And no, I'm not at all sure that the 24-hour cut-off is the right
thing, but it didn't seem completely crazy either. I tend to like the
relative date format when it is "19 minutes ago" vs "2 hours ago", at
some point it's long enough ago that it's more useful to know "Tuesday
at 3pm" than about how long ago it was.

(And yes, it would be even better to have the "short term relative
date" turn into a "medium-term 'day of the week at time x'" and then
turn into "full date" when it's more than a week ago, but this patch
only has the two modes of "short term" and "long term" and nothing in
between).

   Linus
 builtin/blame.c |  1 +
 cache.h |  3 ++-
 date.c  | 10 +++---
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0b62b8..4d87181dc6cd 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2640,6 +2640,7 @@ parse_done:
   fewer display columns. */
blame_date_width = utf8_strwidth(_("4 years, 11 months ago")) + 
1; /* add the null */
break;
+   case DATE_AUTO:
case DATE_NORMAL:
blame_date_width = sizeof("Thu Oct 19 16:00:04 2006 -0700");
break;
diff --git a/cache.h b/cache.h
index 6049f8671138..624817c20414 100644
--- a/cache.h
+++ b/cache.h
@@ -1223,7 +1223,8 @@ struct date_mode {
DATE_ISO8601_STRICT,
DATE_RFC2822,
DATE_STRFTIME,
-   DATE_RAW
+   DATE_RAW,
+   DATE_AUTO,
} type;
const char *strftime_fmt;
int local;
diff --git a/date.c b/date.c
index 7c9f76998ac7..c38414a3d968 100644
--- a/date.c
+++ b/date.c
@@ -184,13 +184,15 @@ const char *show_date(unsigned long time, int tz, const 
struct date_mode *mode)
return timebuf.buf;
}
 
-   if (mode->type == DATE_RELATIVE) {
+   if (mode->type == DATE_RELATIVE || mode->type == DATE_AUTO) {
struct timeval now;
 
strbuf_reset();
gettimeofday(, NULL);
-   show_date_relative(time, tz, , );
-   return timebuf.buf;
+   if (mode->type != DATE_AUTO || time + 24*60*60 > now.tv_sec) {
+   show_date_relative(time, tz, , );
+   return timebuf.buf;
+   }
}
 
tm = time_to_tm(time, tz);
@@ -792,6 +794,8 @@ static enum date_mode_type parse_date_type(const char 
*format, const char **end)
return DATE_RAW;
if (skip_prefix(format, "format", end))
return DATE_STRFTIME;
+   if (skip_prefix(format, "auto", end))
+   return DATE_AUTO;
 
die("unknown date format %s", format);
 }


Re: Clarification on the git+ssh and ssh+git schemes

2016-02-05 Thread Linus Torvalds
On Fri, Feb 5, 2016 at 11:30 AM, Jeff King  wrote:
>
> I suspect they were not really documented because nobody wanted to
> encourage their use. I don't think it would be wrong to document that
> they exist and are deprecated, though.

They exist because some people seemed to think that people shouldn't
use "ssh://" since they thought that only ssh should use that.

Which is obviously bullshit, since by that logic all the other formats
should have that idiotic "git+" format too ("git+https", anybody?). It
doesn't actually help anything, and it only pushes somebodys broken
agenda.

So there was a push for that silly thing by a couple of people, but it
was always wrong. Don't even document it.

Leave it in the source code as an option, and maybe add a comment
about "This is stupid, but we support it for hysterical raisins".

Don't add it to any real documentation. Not even as deprecated. That
just validates it further.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-18 Thread Linus Torvalds

From: Linus Torvalds <torva...@linux-foundation.org>
Date: Wed, 16 Mar 2016 09:15:53 -0700
Subject: [PATCH] pretty-print: de-tabify indented logs to make things line up 
properly

This should all line up:

  Column 1  Column 2
    
  A B
  ABCD  EFGH
  SPACESInstead of Tabs

Even with multi-byte UTF8 characters:

  Column 1  Column 2
    
  Ä B
  åäö   100
  A Møøse   once bit my sister..

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---

This seems to work for me, and while there is some cost, it's minimal. 
Doing a "git log > /dev/null" of the current git tree is about 1% slower 
because of the tab-finding. A tree with a lot of tabs in the commit 
messages would be more noticeable, because then you actually end up 
hitting the whole "how wide is this" issue.

(But if the tabs are all at the beginning of a line, you'd still be ok 
and avoid the utf8 width calculations).

Comments?

 pretty.c | 76 ++--
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/pretty.c b/pretty.c
index 92b2870a7eab..0b40457f99f0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1629,6 +1629,76 @@ void pp_title_line(struct pretty_print_context *pp,
strbuf_release();
 }
 
+static int pp_utf8_width(const char *start, const char *end)
+{
+   int width = 0;
+   size_t remain = end - start;
+
+   while (remain) {
+   int n = utf8_width(, );
+   if (n < 0 || !start)
+   return -1;
+   width += n;
+   }
+   return width;
+}
+
+/*
+ * pp_handle_indent() prints out the intendation, and
+ * perhaps the whole line (without the final newline)
+ *
+ * Why "perhaps"? If there are tabs in the indented line
+ * it will print it out in order to de-tabify the line.
+ *
+ * But if there are no tabs, we just fall back on the
+ * normal "print the whole line".
+ */
+static int pp_handle_indent(struct strbuf *sb, int indent,
+const char *line, int linelen)
+{
+   const char *tab;
+
+   strbuf_addchars(sb, ' ', indent);
+
+   tab = memchr(line, '\t', linelen);
+   if (!tab)
+   return 0;
+
+   do {
+   int width = pp_utf8_width(line, tab);
+
+   /*
+* If it wasn't well-formed utf8, or it
+* had characters with badly defined
+* width (control characters etc), just
+* give up on trying to align things.
+*/
+   if (width < 0)
+   break;
+
+   /* Output the data .. */
+   strbuf_add(sb, line, tab - line);
+
+   /* .. and the de-tabified tab */
+   strbuf_addchars(sb, ' ', 8-(width & 7));
+
+   /* Skip over the printed part .. */
+   linelen -= 1+tab-line;
+   line = tab + 1;
+
+   /* .. and look for the next tab */
+   tab = memchr(line, '\t', linelen);
+   } while (tab);
+
+   /*
+* Print out everything after the last tab without
+* worrying about width - there's nothing more to
+* align.
+*/
+   strbuf_add(sb, line, linelen);
+   return 1;
+}
+
 void pp_remainder(struct pretty_print_context *pp,
  const char **msg_p,
  struct strbuf *sb,
@@ -1652,8 +1722,10 @@ void pp_remainder(struct pretty_print_context *pp,
first = 0;
 
strbuf_grow(sb, linelen + indent + 20);
-   if (indent)
-   strbuf_addchars(sb, ' ', indent);
+   if (indent) {
+   if (pp_handle_indent(sb, indent, line, linelen))
+   linelen = 0;
+   }
strbuf_add(sb, line, linelen);
strbuf_addch(sb, '\n');
}
-- 
2.8.0.rc2
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] GPIO bulk changes for kernel v4.6

2016-03-18 Thread Linus Torvalds
On Fri, Mar 18, 2016 at 9:37 AM, Junio C Hamano  wrote:
>
>> I don't think the original "resolve" did it, for example. You can't do
>> a three-way merge without a base.
>
> Yes, and that continues to this day:

Yeah, "octopus" also refuses it cleanly:

common=$(git merge-base --all $SHA1 $MRC) ||
die "Unable to find common commit with $pretty_name"

The code in the recursive merge that allows this to happen is this:

if (merged_common_ancestors == NULL) {
/* if there is no common ancestor, use an empty tree */
struct tree *tree;

tree = lookup_tree(EMPTY_TREE_SHA1_BIN);
merged_common_ancestors = make_virtual_commit(tree, "ancestor");
}

so the "no common ancestors" is just considered to be an empty merge base.

And I do think that's right, and I think it's clever, and it goes back to 2006:

  934d9a24078e merge-recur: if there is no common ancestor, fake empty one

but I think there should be an option there.

> This is a tangent but I wonder if we should say why we refuse to
> the standard error before calling these two "exit"s.

As mentioned, Octopus does.

That said, there's probably no reason to ever use the old three-way
merge, so I'm not even sure it's worth fixing the old
git-merge-resolve.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
So I end up doing this manually when I notice, but I was wondering ig
maybe git could just have an option to "git am" and friends to
de-tabify the commit message.

It's particularly noticeable when people line things up using tabs
(for the kernel, it's often things like "cpu1 does X, cpu2 does Y"),
and then when you do "git log" it looks like a unholy mess, because
the 4-char indentation of the log message ends up causing those things
to not line up at all after all.

The natural thing to do would be to pass in a "tab size" parameter to
strbuf_stripspace(), and default it to 0 (for no change), but have
some way to let people say "expand tabs to spaces at 8-character
tab-stops" or similar (but let people use different tab-stops if they
want).

Do people hate that idea? I may not get around to it for a while (it's
the kernel merge window right now), but I can write the patch
eventually - I just wanted to do an RFC first.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:23 PM, Stefan Beller  wrote:
>
> Could you point at some example to better understand the problem?

So in the kernel repo, I just randomly looked for tabs that show this
problem, and take for example commit
ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd.

You can see how the original lined up, by doing

git show --pretty=email ff9a9b4c4334

because the email format doesn't indent the commit message. But if you do just

git show ff9a9b4c4334

and get the usual indentation, you'll see things not line up at all.

In case you don't want to bother with the kernel repo, here's what it
looks like:

email format:

--- snip snip 8< ---
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.43.8 seconds
 4.5-rc13.7 seconds
 4.5-rc1 + first patch  3.3 seconds
 4.5-rc1 + first 3 patches  3.1 seconds
 4.5-rc1 + all patches  2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels1.86 seconds
--- snip snip 8< ---

Normal "git show" format:

--- snip snip 8< ---
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.43.8 seconds
 4.5-rc13.7 seconds
 4.5-rc1 + first patch  3.3 seconds
 4.5-rc1 + first 3 patches  3.1 seconds
 4.5-rc1 + all patches  2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels1.86 seconds
--- snip snip 8< ---

which hopefully clarifies.

In the above case, it really isn't very annoying. It's just slightly
ugly. In some other cases, it can get quite hard to see what's up, but
the ones that come through me I actually tend to try to edit, so many
of them have been corrected.

For other examples (again, in the kernel), look at 19b2c30d3cce, or
0dc8c730c98a.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano  wrote:
>
> Wouldn't it be nicer to do this kind of thing at the output side?  A
> stupid way would be to have an option to indent the log text with a
> tab instead of 4-spaces, but a more sensible way would be to keep
> the visual 4-space-indent and do the expand-tab for each line of
> text.

Yes, that would certainly work for me too. It's just the "ascii boxes
don't line up" that is problematic..

(Also people including example source code etc, where the first tab
looks wildly different from the second one)

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] GPIO bulk changes for kernel v4.6

2016-03-18 Thread Linus Torvalds
On Fri, Mar 18, 2016 at 7:32 AM, Johannes Schindelin
<johannes.schinde...@gmx.de> wrote:
>
> On Fri, 18 Mar 2016, Linus Torvalds wrote:
>
>> I thought git didn't merge two branches that have no common base by
>> default, but it seems it will happily do so.
>
> What happened to "The coolest merge EVER!"?
>
> http://thread.gmane.org/gmane.comp.version-control.git/5126/

I'm not complaining about multi-root capability in general - it's
still cool. In the kernel, we have aef8755711a2 ("Merge Btrfs into
fs/btrfs") that does something slightly similar.

It's literally just the fact that "git merge" does it with no extra
flags or checks. I'd like people to have to be aware of what they are
doing when they merge two different projects, not do it by mistake.

So making it conditional on a flag like "--no-common-root" is what I'd look for.

Or just make it about the merge stategy. For example, "subtree" makes
sense exactly for merging one project into a subdirectory of another.
But the default merge shouldn't do it.

I don't think the original "resolve" did it, for example. You can't do
a three-way merge without a base.

Note how that above "coolest merge" definitely wasn't done by just
"git merge gitk". I had to play around with the git internals. Now, it
shouldn't be _that_ hard, but at the same time it really shouldn't be
as easy as "I didn't know, so I merged two independent projects".

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-18 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 2:37 PM, Junio C Hamano  wrote:
>
> What surprised me was that this new expand logic triggered for
> shortlog, actually.  I somehow assumed the caller that called
> de-tabify helper was only called for --pretty=medium.

I guess that would be ok, since shortlog by definition can't have any
issues with multiple lines lining up with each other.

At the same time, it might be a bit odd to show tabs in that first
line differently for the one-line vs multi-line log version. But maybe
it isn't - I think shortlog is the only thing that does that wrapping
anyway, so shortlog is already special.

I think the reason shortlog output gets both the de-tab and the
wrapping is that shortlog_add_commit() just calls pretty_print_commit
with CMIT_FMT_USERFORMAT.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-19 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 12:47 PM, Junio C Hamano  wrote:
>
> Strangely running t4201 with your patch (without any squashing)
> seems to show a breakage in shortlog.  I won't be able to come back
> to this topic for at least a few hours, so this is just a single bit
> "breaks" report, without "how and why" analysis, sorry.

It's because those things have tabs in their first line, so the output
now differs from the expected one exactly because of the tab-vs-space
expansion.

The wrapping logic is then also different, because the .wrapping code
does the tabs as "align to 8 chars" while the new code does tabs as
"align to 8 chars modulo the indent offset".

I only looked at the first case, but I assume the others are just more
of the same. We'd just adjust the expected output, I assume.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] pretty-print: add --pretty=noexpand

2016-03-19 Thread Linus Torvalds
On Thu, Mar 17, 2016 at 4:16 PM, Junio C Hamano  wrote:
> It is reasonable for tweak the default output mode for "git log" to
> untabify the commit log message, it sometimes may be necessary to
> see the output without tab expansion.

Thanks, these all look good to me.

Sorry for not following up, it's just that I'm in the middle of the
kernel merge window and haven't had the time to worry about it.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] merge: refuse to create too cool a merge by default

2016-03-19 Thread Linus Torvalds
On Fri, Mar 18, 2016 at 1:21 PM, Junio C Hamano <gits...@pobox.com> wrote:
> While it makes sense to allow merging unrelated histories of two
> projects that started independently into one, in the way "gitk" was
> merged to "git" itself aka "the coolest merge ever", such a merge is
> still an unusual event.  Worse, if somebody creates an independent
> history by starting from a tarball of an established project and
> sends a pull request to the original project, "git merge" however
> happily creates such a merge without any sign of something unusual
> is happening.
>
> Teach "git merge" to refuse to create such a merge by default,
> unless the user passes a new "--allow-unrelated-histories" option to
> tell it that the user is aware that two unrelated projects are
> merged.

Heh. I had a separate set of patches for you, but hadn't sent them out
because of the other test failures (which you also worked out).

Mine was slightly different, I just went with a "unrelated" merge option.

I'll attach my two patches anyway, if for no other reason than the
fact that I actually wrote a new test for this.

Feel free to ignore my patches, they have nothing really different
from yours, just slightly different implementation.

  Linus
From 0f3e4a9294eeda6799e3e50e28809133233126db Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torva...@linux-foundation.org>
Date: Fri, 18 Mar 2016 12:46:06 -0700
Subject: [PATCH 1/2] t3035: test merging of unrelated branches

Right now this succeeds, and the test actually verifies that behavior.

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---
 t/t3035-merge-recursive-across-project.sh | 28 
 1 file changed, 28 insertions(+)
 create mode 100755 t/t3035-merge-recursive-across-project.sh

diff --git a/t/t3035-merge-recursive-across-project.sh b/t/t3035-merge-recursive-across-project.sh
new file mode 100755
index ..b5d614db6b08
--- /dev/null
+++ b/t/t3035-merge-recursive-across-project.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='merge-recursive with unrelated projects
+
+Test rename detection by examining rename/delete conflicts.
+
+ * A: file A
+
+ * B: file B
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup repository' \
+	'echo Hello >A &&
+	 git add A &&
+	 git commit -m "Branch A" A &&
+	 git branch A &&
+	 git mv A B &&
+	 echo Hi >B &&
+	 git commit -m "Branch B" --amend B &&
+	 git branch B'
+
+test_expect_success 'merge unrelated branches' \
+	'git checkout -b merged A &&
+	 git merge B'
+
+test_done
-- 
2.8.0.rc3.9.g44915db

From cd6b2388c73f37b3dd6180d9a42993fd219ebb31 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torva...@linux-foundation.org>
Date: Fri, 18 Mar 2016 12:57:30 -0700
Subject: [PATCH 2/2] merge: fail merging of unrelated branches

Add test for this, and add the "unrelated" merge option to allow it to succeed.

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---
 merge-recursive.c | 6 ++
 merge-recursive.h | 1 +
 t/t3035-merge-recursive-across-project.sh | 7 +--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b880ae50e7ee..92779db0bbe6 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1927,6 +1927,9 @@ int merge_recursive(struct merge_options *o,
 		/* if there is no common ancestor, use an empty tree */
 		struct tree *tree;
 
+		if (!o->allow_unrelated)
+			die(_("will not merge unrelated branches"));
+
 		tree = lookup_tree(EMPTY_TREE_SHA1_BIN);
 		merged_common_ancestors = make_virtual_commit(tree, "ancestor");
 	}
@@ -2039,6 +2042,7 @@ void init_merge_options(struct merge_options *o)
 	memset(o, 0, sizeof(struct merge_options));
 	o->verbosity = 2;
 	o->buffer_output = 1;
+	o->allow_unrelated = 0;
 	o->diff_rename_limit = -1;
 	o->merge_rename_limit = -1;
 	o->renormalize = 0;
@@ -2092,6 +2096,8 @@ int parse_merge_opt(struct merge_options *o, const char *s)
 		o->renormalize = 1;
 	else if (!strcmp(s, "no-renormalize"))
 		o->renormalize = 0;
+	else if (!strcmp(s, "unrelated"))
+		o->allow_unrelated = 1;
 	else if (!strcmp(s, "no-renames"))
 		o->detect_rename = 0;
 	else if (!strcmp(s, "find-renames")) {
diff --git a/merge-recursive.h b/merge-recursive.h
index 52f0201f68a3..19eb52eeb732 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -15,6 +15,7 @@ struct merge_options {
 	const char *subtree_shift;
 	unsigned buffer_output : 1;
 	unsigned renormalize : 1;
+	unsigned allow_unrelated : 1;
 	long xdl_opts;
 	int verbosity;
 	int detect_rename;
diff --git a/t/t3035-merge-recursive-across-project.sh b/t/t

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano <gits...@pobox.com> wrote:
>>
>> Wouldn't it be nicer to do this kind of thing at the output side?  A
>> stupid way would be to have an option to indent the log text with a
>> tab instead of 4-spaces, but a more sensible way would be to keep
>> the visual 4-space-indent and do the expand-tab for each line of
>> text.
>
> Yes, that would certainly work for me too. It's just the "ascii boxes
> don't line up" that is problematic..

Ok, that actually turns out to be pretty easy.

Here's a first try at it. It does tab expansion only for the cases
that indent the commit message, so for things like "pretty=email" it
makes no difference at all.

However, it hardcodes the hardtab size to 8, and makes this all
unconditional. That's how a vt100 terminal works, but fancer terminals
may allow you set actual tab-stops, and if you're in emacs you can
probably do just about anything)

Comments? This does make the kernel commit 0dc8c730c98a look fine, so
it would make me happy.

I can write a commit log etc if people think this is ok, but right now
this is just a silly raw patch in the attachement..

   Linus
 pretty.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/pretty.c b/pretty.c
index 92b2870a7eab..dcd6105d1eb2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1652,8 +1652,26 @@ void pp_remainder(struct pretty_print_context *pp,
first = 0;
 
strbuf_grow(sb, linelen + indent + 20);
-   if (indent)
+   if (indent) {
+   int i, last = 0, pos = 0;
+
strbuf_addchars(sb, ' ', indent);
+   for (i = 0; i < linelen; i++) {
+   int expand;
+   if (line[i] != '\t')
+   continue;
+   strbuf_add(sb, line+last, i-last);
+   pos += i-last;
+   expand = (pos + 8) & ~7;
+   strbuf_addchars(sb, ' ', expand - pos);
+   pos = expand;
+   last = i+1;
+   }
+
+   // Handle the tail non-tab content
+   line += last;
+   linelen -= last;
+   }
strbuf_add(sb, line, linelen);
strbuf_addch(sb, '\n');
}


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano  wrote:
>
> It also ignores that byte counts of non-HT bytes may not necessarily
> match display columns.  There is utf8_width() function exported from
> utf8.c for that purpose.

Hmm. I did that to now make it horribly slower. Doing the
per-character widths really does horrible things for performance.

That said, I think I can make it do the fast thing for lines that have
no tabs at all, which is the big bulk of it. So it would do the width
calculations only in the rare "yes, I found a tab" case.

I already wrote it in a way that non-tab lines end up just looping
over the line looking for a tab and then fall back to the old code.

I might just have to make that behavior a bit more explicit. Let me
stare at it a bit, but it won't happen until tomorrow, I think.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] merge: refuse to create too cool a merge by default

2016-04-06 Thread Linus Torvalds
On Wed, Apr 6, 2016 at 11:36 AM, Junio C Hamano  wrote:
>
> I was reviewing the topics to merge to 'master', and a thought
> crossed my mind.  Both of our series only refuse to create a merge
> that does not have any common ancestor, but shouldn't the right
> solution refuse to add a new root?

So I think that's an independent thing, and I think you're right, it
would be a good feature to have. As a maintainer, it would have caught
the whole "my submaintainers did something really odd, I need to talk
to them" before-the-fact rather than after-the-fact.

That said, I'm less worried about my submaintainers doing
_intentionally_ annoying things, than people doing silly things by
mistake.

So if I had a version of git that allowed me to say "don't allow pulls
that add a new root commit unless I specify a '--new-root-allowed'
flag", then yes, I'd use that. And I guess it might not be too nasty
to add: it could be done as part of the object checking pass after
downloading the pack. Was that what you were thinking of?

But at the same time, I think the existing series you have, which
hopefully results in these things not happening by mistake in the
future is the more important piece. I'd rather get good pull requests
than get errors and have to tell people "you screwed up, now we'll
need to re-do this".

But if you cook something up that checks that there are no new roots
in the pull, I'd certainly appreciate that safety net.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] pretty-print: add --pretty=noexpand

2016-03-19 Thread Linus Torvalds
On Thu, Mar 17, 2016 at 10:08 PM, Jeff King  wrote:
>
> Hmm. Isn't "expand tabs" orthogonal to the rest of the pretty format?
> That is, couldn't one want "--pretty=fuller, but with tabs expanded"?

Yeah, you are right, one easily could. And in fact I end up doing
"fuller" myself occasionally, because I check peoples commit
timestamps (some people have a nasty habit of rebasing when they
shouldn't).

So it's not just the medium format that would want detab by default,
it's "full" and "fuller" too (but probably not "raw": that indents the
message too, but the only real reason to use "raw" is for scripting).

So it would probably be better to make it a separate flag, and not tie
it to a particular log format (and just make the log format set the
default).

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-20 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 12:32 PM, Junio C Hamano  wrote:
>
> may give us a better structure if we are going to give users a knob
> to disable this tab expansion, i.e. move the addition of 4 spaces to
> the caller, name the body of such a function strbuf_expand_add(),
> and then make the caller do something like this perhaps?

I'd suggest just putting that knob into the "pp_handle_indent()"
function, and passing it the "pp" pointer.

In fact, maybe it should just be renamed as "pp_add_line()", and
handle every case, and keep "pp_remainder()" as just the "loop over
each line and handle the empty line and PP_SHORT special case" thing.

That makes it easty to add that CMIT_FMT_EXPAND_TABS kind of code later.

Here's an incremental patch that could be just smushed into my
previous one. It doesn't change the behavior of "pp_handle_indent()",
but I think it clarifies the code and makes future changes much easier
(partly because now nobody has to worry about the continue case and
the newline at the end of the line, so you can just print whatever you
want and then return).

What do you think?

 Linus
 pretty.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/pretty.c b/pretty.c
index 0b40457f99f0..b9374a1708d1 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1699,6 +1699,18 @@ static int pp_handle_indent(struct strbuf *sb, int 
indent,
return 1;
 }
 
+static void pp_add_line(struct pretty_print_context *pp,
+   struct strbuf *sb, int indent,
+   const char *line, int linelen)
+{
+   strbuf_grow(sb, linelen + indent + 20);
+   if (indent) {
+   if (pp_handle_indent(sb, indent, line, linelen))
+   return;
+   }
+   strbuf_add(sb, line, linelen);
+}
+
 void pp_remainder(struct pretty_print_context *pp,
  const char **msg_p,
  struct strbuf *sb,
@@ -1721,12 +1733,7 @@ void pp_remainder(struct pretty_print_context *pp,
}
first = 0;
 
-   strbuf_grow(sb, linelen + indent + 20);
-   if (indent) {
-   if (pp_handle_indent(sb, indent, line, linelen))
-   linelen = 0;
-   }
-   strbuf_add(sb, line, linelen);
+   pp_add_line(pp, sb, indent, line, linelen);
strbuf_addch(sb, '\n');
}
 }


Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-20 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 9:29 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> This should all line up:
>
>   Column 1  Column 2
>     
>   A B
>   ABCD  EFGH
>   SPACESInstead of Tabs
>
> Even with multi-byte UTF8 characters:
>
>   Column 1  Column 2
>     
>   Ä B
>   åäö   100
>   A Møøse   once bit my sister..

So with current git, it looks like this for me:

  Column 1  Column 2
    
  A B
  ABCD  EFGH
  SPACESInstead of Tabs

Even with multi-byte UTF8 characters:

  Column 1  Column 2
    
  Ä B
  åäö   100
  A Møøse   once bit my sister..

and with that patch it looks much better:

  Column 1  Column 2
    
  A B
  ABCD  EFGH
  SPACESInstead of Tabs

Even with multi-byte UTF8 characters:

  Column 1  Column 2
    
  Ä B
  åäö   100
  A Møøse   once bit my sister..

(of course, to see that assumes that you have a fixed-sized font in
your mail viewer, and that the spacing didn't get screwed up by my
cut-and-paste).

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/5] Expanding tabs in "git log" output

2016-03-23 Thread Linus Torvalds
On Wed, Mar 23, 2016 at 4:23 PM, Junio C Hamano  wrote:
> So here is the third try (previous round is found at $gmane/289166
> and the very first one is at $gmane/288987).
>
> The first three patches are essentially the same as v2.  The last
> two updates how the tab-expansion is internally controlled:

I tested this (as it in in 'pu', rather than applying the patches),
and it all seems to work fine. So Ack.

And I agree that it would be good if all the commit printout logic was
unified rather than having some ad-hoc "let's just set the format",
but I think that's a separate cleanup.

It might be more regular to have that "--expand-tabs" flag too (which
would then work with the email and raw formats), but I don't see any
actual real use for it so it really doesn't matter.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-19 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaud  wrote:
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

I don't think that's a good idea at least by default, since then it
will break things like running diff in emacs capture mode.

So even if you're in a terminal, you can't assume that you can munge
the output too much.

Of course, if colorization is on, you might as well pretty-print the
diff by indenting things properly too, since the end result isn't
going to be used as a _diff_.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] GPIO bulk changes for kernel v4.6

2016-03-19 Thread Linus Torvalds
On Thu, Mar 17, 2016 at 11:07 PM, Laxman Dewangan  wrote:
>
> For creating the repo and branch, I just followed the instruction from wiki
> https://help.github.com/articles/create-a-repo/

So you shouldn't have created a new repo at all, you should just have
cloned an existing one (that gets you a repo, of course).

You basically ended up starting a new project.

But I guess the github connection explains why there was a crazy
README.md file there, and I can see why that documentation would make
you think it's the right thing to do.

Oh well.

Just a "git init" wouldn't have done that kind of damage, the github
documentation is just misleading in this respect.

We may have to make it really really really clear for the kernel that
people should not use github in any way except purely for hosting..

> I jut use git (git version 2.1.4) for pushing the changes in github repo.
>
> There is no other tools used.

I thought git didn't merge two branches that have no common base by
default, but it seems it will happily do so.

So once you made the mistake of starting a new project, git merge
ended up "helpfully" allowing you to merge the remote tracking branch
into that new project, and we ended up with a silly new root.

"git pull-request" will complain about not having a commit in common,
but "git merge" apparently does not even warn.

Adding Junio and the git list. This seems like much too easy a way to screw up.

Junio (and git people), the problem is that github seems to have
caused Laxman to think he should start a new project, and then git
happily merged the new root just because nobody knew better. And
sadly, I didn't notice the history screw-up until too late, so now we
in the kernel have a third root commit (the first two are
intentional): commit a101ad945113 was merged by commit e5451c8f8330.

So how about a patch something like this:

  --- a/builtin/merge.c
  +++ b/builtin/merge.c
  @@ -1398,7 +1398,7 @@ int cmd_merge(int argc, const char **argv,
const char *prefix)
 NULL, 0, UPDATE_REFS_DIE_ON_ERR);

  if (remoteheads && !common)
  -   ; /* No common ancestors found. We need a real merge. */
  +   die(_("No common ancestor - not merging"));
  else if (!remoteheads ||
   (!remoteheads->next && !common->next &&
common->item == remoteheads->item)) {

(the above is explicitly whitespace-damaged on purpose - it's not
meant as a serious patch, because we do want the *ability* to merge
different projects across different roots, and there are even git
tests for it, it's just that I think it's too easy to make this
mistake and not even realize).

So the real thing having a special option required to merge
non-related projects? Or at least a humongous warning?

You can recreate (for testing only!) this by doing this in a kernel repo:

  git checkout a101ad945113
  git merge 3cf42efc3479

and you'll see how it happily ends up creating a merge commit with no
common ancestors and no warning that anything might be wrong. It will
take a while - walking all the way up to the root to not find the
common object - but it will happily merge those two totally unrelated
branches without complaining at all.

Now I'm starting to wonder just how many github projects have lots of
separate root commits..

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

2016-03-19 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 11:01 AM, Junio C Hamano  wrote:
>
>  (1) if turning your "preparation; do { ... } while()" into
>  "while () { }" would make the result a bit easier to read;

So it's probably partly taste, but I will also disagree with your
"easier to read", because of the way the code is logically structured.

In particular, the "no TAB" case is actually *fundamentally* different
from the "no TAB at the end" case. The return value is different, and
the caller does very different things - the code tries to make it very
clear that that "no TAB" situation is very different from "we found a
TAB".

So it's not "preparation + do-while".

It's "preparation + handle the no-TAB case differently", and then the
"do-while" is very natural because by the time we get to the "ok, we
are now going to need to do something about the line" stage, we
already know we have a tab.

But the code *could* be made to just always do the whole
"strbuf_add()", and not return a return value at all, and the no-tab
case wouldn't be explicitly written to be different.

Let me know if you'd prefer that variant, and I'll send a new version.

>  (2) if we can somehow eliminate duplication of "tab + 1" (spelled
>  differently on the previous line as "1+tab"), the end result
>  may get easier to follow.

Yeah, I considered that. Either by just doing "tab++" before (so the
+1" would come from that in both cases), or by introducing a new
variable like

ptrdiff_t bytes_used;
...
bytes_used = 1 + tab - line;

and then just doing

line += bytes_used;
linelen -= bytes_used;

and the code I wrote just didn't do any of those temporary updates,
and instead just did the "+1" by hand in both cases.

Again, I can redo the patch, just tell me which model you prefer.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> This seems to be a git bug.
>
> That commit aed06b9 can also be described as
>
> v3.13-rc7~9^2~14^2~42
>
> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

Hmm. I think I see what's up. The git distance function has a special
hack for preferring first-parent traversal, introduced long long ago
with commit ac076c29ae8d ("name-rev: Fix non-shortest description").

Changing that

  #define MERGE_TRAVERSAL_WEIGHT 65535

to be a smaller value makes git find the shorter path.

I do not know what the correct fix is, though.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:08 AM, Jeff King  wrote:
>
> Right, because it makes the names longer. We give the second-parent
> traversal a heuristic cost. If we drop that cost to "1", like:

So I dropped it to 500 (removed the two last digits), and it gave a
reasonable answer. With 1000, it gave the same "based on 4.6" answer
as the current 65536 value does.

> which is technically true, but kind of painful to read. It may be that a
> reasonable weight is somewhere between "1" and "65535", though.

Based on my tests, the "right" number is somewhere in the 500-1000
range for this particular case. But it's still a completely made up
number.

> However, I think the more fundamental confusion with git-describe is
> that people expect the shortest distance to be the "first" tag that
> contained the commit, and that is clearly not true in a branchy history.

Yeah.

And I don't think people care *too* much, because I'm sure this has
happened before, it's just that before when it happened it wasn't
quite _so_ far off the expected path..

> I actually think most people would be happy with an algorithm more like:
>
>   1. Find the "oldest" tag (either by timestamp, or by version-sorting
>  the tags) that contains the commit in question.

Yes, we might want to base the "distance" at least partly on the age
of the base commits.

>   2. Find the "simplest" path from that tag to the commit, where we
>  are striving mostly for shortness of explanation, not of path (so
>  "~500" is way better than "~20^2~30^2~14", even though the latter
>  is technically a shorter path).

Well, so the three different paths I've seen are:

 - standard git (65536), and 1000:
   aed06b9 tags/v4.6-rc1~9^2~792

 - non-first-parent cost: 500:
   aed06b9 tags/v3.13-rc7~9^2~14^2~42

 - non-first parent cost: 1:
   aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42

so there clearly are multiple valid answers.

I would actually claim that the middle one is the best one - but I
claim that based on your algorithm case #1. The last one may be the
shortest actual path, but it's a shorter path to a newer tag that is a
superset of the older tag, so the middle one is actually not just
better based on age, but is a better choice based on "minimal actual
history".

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:23 AM, Junio C Hamano  wrote:
>
> I think avoiding side branches to describe with the weight is a
> right thing to do, i.e. if you have this history:
>
> X---o---o---o---o---v4.6
>  \ /
>   o---o
>
> you do not want to explain X as "v4.6~^2~2", and instead you want it
> as "v4.6~5", even though the former is 4 hops while the latter is 5
> hops (which is longer).

Yes. I have a new suggestion: make the "merge weight" depend on how
complex the name ends up being.

And that algorithm actually gives a completely new and improved path:

   aed06b9 tags/v3.13-rc2~32^2^2~47

which is still complex, but is actually the best one I've found so far.

To compare against the previous ones I looked at:

   v4.6-rc1~9^2~792<- current git code
   v3.13-rc2~32^2^2~47 <- new with attached patch
   v3.13-rc7~9^2~14^2~42 <- merge weight 500
   v3.13~5^2~4^2~2^2~1^2~42   <- merge weight 1

that new one is actually the second-most dense, and uses the oldest
tag. So it's a good combination of denseness, but still gets the best
actual base tag.

The logic of the patch is that there are different "complexities" in
the naming, and it's not just whether you are following a second
parent, it's also if you're doing generational hops.

So when you do a first-parent change, the name stays simple (the last
number changes), and that means that the "distance" weight is low (so
that's the normal "+1" we do for first-parent.

But if it's not a first parent, there are two different cases:

 - generation > 0: we add "~%d^%d", and we approximate that with "four
characters", and use a cost that is four orders of magnitude higher
(so 1).

 - generation == 0: we add just "^%d", so generally just two
characters. So use a cost that is just two orders of magnitude higher
(so 100).

In other words, I'm trying to convince people that my patch not only
gives a good result, but that the "weight numbers" I use make some
kind of conceptual sense from a complexity cost angle.

With that, the patch itself is attached.

I think it's better than what "git name-rev" does now. Is it optimal?
No, I think the *optimal* would use some kind of "does one tag contain
the other" logic, and discarding all base names that are just
supersets of another base that still reaches the target.

But this patch is small and simple, and has some excuses for its
behavior. What do people think?

 Linus
 builtin/name-rev.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 092e03c3cc9b..0354c8d222e1 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -16,9 +16,6 @@ typedef struct rev_name {
 
 static long cutoff = LONG_MAX;
 
-/* How many generations are maximally preferred over _one_ merge traversal? */
-#define MERGE_TRAVERSAL_WEIGHT 65535
-
 static void name_rev(struct commit *commit,
const char *tip_name, int generation, int distance,
int deref)
@@ -55,19 +52,26 @@ copy_data:
parents;
parents = parents->next, parent_number++) {
if (parent_number > 1) {
+   int weight;
size_t len;
char *new_name;
 
strip_suffix(tip_name, "^0", );
-   if (generation > 0)
+
+   // The extra merge traversal "weight" depends
+   // on how complex the resulting name is.
+   if (generation > 0) {
+   weight = 1;
new_name = xstrfmt("%.*s~%d^%d", (int)len, 
tip_name,
   generation, parent_number);
-   else
+   } else {
+   weight = 100;
new_name = xstrfmt("%.*s^%d", (int)len, 
tip_name,
   parent_number);
+   }
 
name_rev(parents->item, new_name, 0,
-   distance + MERGE_TRAVERSAL_WEIGHT, 0);
+   distance + weight, 0);
} else {
name_rev(parents->item, tip_name, generation + 1,
distance + 1, 0);


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 6:24 AM, Andreas Schwab  wrote:
>
> The branches recently pulled by Linus contain commits with rather old
> author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or
> 281baf7a702693deaa45c98ef0c5161006b48257.  That will probably cause git
> describe --contains to take a different path through the history.

Nothing in git should look at author dates (except for the obvious
code that then shows the date).

The committer date is in fact used for the traversal heuristics for
some cases, but those are different and recent in the examples you
note.

This seems to be a git bug.

That commit aed06b9 can also be described as

v3.13-rc7~9^2~14^2~42

so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

However did git decide to use that further-away name? That looks odd.

I'm wondering if it's because of the new root commit we have, and that
confuses the distance calculation somehow? That came in somewhat
recently (mid March) with the GPIO pull.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:43 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> In other words, I'm trying to convince people that my patch not only
> gives a good result, but that the "weight numbers" I use make some
> kind of conceptual sense from a complexity cost angle.

Basically, the patch approximates the numerical representation of the
distance measure with the complexity of the suffix.

It's not a *true* length of the suffix (which does heavily favor
first-parent use, kind of like the current code), but I think it's
better than the pretty random "+65535" that the current git code has.
That number is clearly just completely made up. The new numbers at
least have some kind of logic behind them.

And the current code obviously does give really bad results. Picking
the v4.6-rc1 tag as a base just because it happens to get a lot of
first-parent traversals (800!) from one second-parent commit that is
close to 4.6-rc1 is just obscene.

So the more I look at my patch, the more I go "it's a real improvement
on the current situation".

That said, I do think that a much bigger conceptual change that
actually does full traversal and be much more complicated might be the
only "correct" solution.

So my patch is just a "improve heuristics" small fixlet rather than
something optimal.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 12:27 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Linus Torvalds <torva...@linux-foundation.org> writes:
>
>> But this patch is small and simple, and has some excuses for its
>> behavior. What do people think?
>
> I like it that you call it "excuse" not "rationale", as I couldn't
> form a logical connection between your "4 (2) letters" and "1
> (100)" at all ;-)

Think of the distance number as a "order of magnitude in complexity",
and it actually makes a certain amount of sense.

It's not the same as the length of the string, but the "log()" of the
distance number really does give a kind of complexity value.

Think of it this way: if things are entirely linear (all just first
parenthood), there will be just a single simple number, and the
relationship between the simple distance number (that just increments
by one for each parent traversed) and the length of the string that
describes it will really be "log10(distance)". That's literally how
many characters you need to describe the linear distance number.

So a simple linear distance of 'n' commits will need on the order of
'log10(n)' digits to describe it (ie a number around a thousand will
need around three digits).

The "100" and "1" are just extending that notion of distance to
the more complex cases., and expresses their complexity in the same
logarithmic units. The same way you need four digits to express a
_linear_ distance of 1, you need four characters to express that
"~n^p" case of "merge parent p, n generations back".

And if you don't have the generation thing, you only need two
characters to express parent #'p': "^p".

So two characters really *are* equivalent to ~100 linear steps, and
four characters really *are* equivalent to ~1 linear steps.

So it's not _just_ an excuse. There's an actual rationale for picking
those numbers, and why they are equivalent in a complexity measure.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 11:05 AM, Jeff King  wrote:
>
> I actually think the best name for aed06b9 is probably:
>
>   v3.13-rc1~65^2^2~42

Sounds likely. I don't know how to find that best match without a
complete rewrite, though.

My recent patch that got

  v3.13-rc2~32^2^2~47

comes close to that, and the complexity is similar but the numbers are
actually smaller, so I guess my heuristic did indeed find a "simpler"
name, but yes, the one based on 3.13-rc1 would definitely be the
better one.

> which I found by picking the oldest tag from "git tag --contains" and
> plugging it into "git describe --match".

Yeah, so you basically did the "let's figure out minimal inclusion" by hand.

> Sadly, neither git's internal
> version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before
> "v1.0", so I had to rely on "--sort=taggerdate".

I'm not seeing the "sadly".

I think "--sort=taggerdate" is pretty much the only sane sort there is
for tags, unless you do a true and full topological one (ie sort based
on by how many commits that tag encompasses, but also by how each tag
contains another tag).

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] name-rev: include taggerdate in considering the best name

2016-04-22 Thread Linus Torvalds
On Fri, Apr 22, 2016 at 11:11 AM, Jeff King  wrote:
>
> I confirmed that it does find the "optimal" tag for the case we've been
> discussing.

Yes. I'm a bit more worried about the date behavior for projects that
merge back stable branches into their development trees (is the
development tag better than the stable tag? the date doesn't really
say much), but I think this is still the simplest model we can use
without trying to really do a topo-sort. And in many ways it's the
simplest one to explain to people too: "we try to use the oldest
reference we can find as a base for the resulting name" is not a
complex or hard concept to explain.

> We could _also_ tweak the merge-weight as Linus's patch did, just
> because 1 has more basis than 65535. But I think it really matters a
> lot less at this point.

Yes. I still think that my tweak makes more sense than the existing
code, but it's a tiny tweak, compared to the date-based approach.
Unlikely to ever matter much.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Small trivial annoyance with the nice new builtin "git am"

2016-07-28 Thread Linus Torvalds
Ok, it's no longer *that* new, but I only now noticed..

So I noticed that when I applied the last patch-bomb series from
Andrew, all the commit date-stamps are idential.

Now, it would be lovely if the new builtin git-am really was *so* fast
that it applies a 100+-patch series in under a second, but no, that's
not it. It's just that it only looks up the current time once.

That seems entirely accidental, I think that what happened is that
"ident_default_date()" just ends up initializing the default date
string once, and then the date is cached there, because it's now run
as a single process for the whole series.

I think I'd rather get the "real" commit dates, even if they show that
git only does a handful of commits a second rather than hundreds of
commits..

Something that just clears git_default_date in between "git am"
iterations, perhaps?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-28 Thread Linus Torvalds
On Thu, Jul 28, 2016 at 5:29 PM, Jeff King  wrote:
>
> I guess we want something like:
>
> +void reset_ident_date(void)
> +{
> +   strbuf_reset(_default_date);
> +}

Looks sane.

> and then to sprinkle calls liberally through builtin-ified programs when
> they move from one unit of work to the next.

Maybe we can just add it to the end of commit_tree_extended(), and
just say "the cache is reset between commits".

That way there is no sprinking in random places.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-28 Thread Linus Torvalds
On Thu, Jul 28, 2016 at 5:21 PM, Jeff King  wrote:
>
> I do wonder if you would be happier giving each commit a "fake"
> monotonically increasing time, so they are correctly ordered by commit
> date.

No, that would be nasty, partly for the corner case you mention, but
partly because I just think it's wrong to try to lie about reality.

The reason I noticed this in the first place was actually that I just
wanted to take a look whether things had gotten slower or faster over
time, and see how many patches per second I get from the patch-bombs
Andrew sends me.

So getting real time was what I was looking for.

Also, before somebody asks: the reason git has always cached the
"default time" string is because there's a reverse annoying thing,
which is looking up time twice, and getting a difference of a second
between author times and committer times just because of bad luck.
That makes no sense either, and is actually why we have that
"ident_default_date()" cache thing going on.

So we do want to cache things for a single commit, it's just that for
things like "git am" (or, like Junio wondered, "git rebase" - I didn't
check) we probabyl just just flush the cache in between commits.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd git overrflow bug?

2016-07-10 Thread Linus Torvalds
On Sun, Jul 10, 2016 at 2:21 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I'm not sure why it doesn't happen in current git master, because that
> function is the same, and the logic around expand_tabs_in_log looks
> similar too.

Ahh. Commit 43ec55091553 ("bisect: always call setup_revisions after
init_revisions") is in master, but not in v2.9.0

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd git overrflow bug?

2016-07-10 Thread Linus Torvalds
On Sun, Jul 10, 2016 at 2:05 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I'm getting "extra" being -1 in strbuf_grow(). Let me dog deeper.

"dog deeper"? My typing skills are deteriorating.

Anyway, I dug deeper, and the reason is that "tabwidth" is -1, and then

 pretty.c:1669: strbuf_add_tabexpand():

strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));

ends up having the number be -1.

I'm not sure why it doesn't happen in current git master, because that
function is the same, and the logic around expand_tabs_in_log looks
similar too.

That all came from

fe37a9c586a6 (:pretty: allow tweaking tabwidth in --expand-tabs")

and I suspect the code just needs to protect against negative or zero,
rather than just zero.

Junio?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Odd git overrflow bug?

2016-07-10 Thread Linus Torvalds
We have an odd bug report in the kernel, where somebody had trouble
bisecting all the way due to

  "git is failing with "you are trying to use to much memory"(?!)"

which can't be an exact  error message quote, but the closest I can
find smells like the "unsigned_add_overflows()" check in the strbuf
code. Very odd.

See

https://bugzilla.kernel.org/show_bug.cgi?id=121701

for the bug report.

I'm not seeing how that could *possibly* happen,  but if it indeed is
that strbuf code, maybe it could print out the function name (there
are two different cases) and values so give a hint about where/how
this happens..

Of course, since the bug report isn't an exact quote anyway, maybe
that wouldn't have helped anyway. I asked for more information.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd git overrflow bug?

2016-07-10 Thread Linus Torvalds
On Sun, Jul 10, 2016 at 2:01 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I'll try to figure out why git-2.9.0 fails.

I'm getting "extra" being -1 in strbuf_grow(). Let me dog deeper.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd git overrflow bug?

2016-07-10 Thread Linus Torvalds
On Sun, Jul 10, 2016 at 11:41 AM, Andreas Schwab  wrote:
>
> I've seen that too, but only at the end of bisection, when it tries to
> display the bad commit.

That's apparently what the kernel bug reporter sees too now.

However, I cannot reproduce the problem with the particular kernel
bisect that the reporter is using. Judging by what he bisected things
down to, he must have done

git bisect start
git bisect bad 7ed18e2d1b6782989eb399ef79a8cc1a1b583b3c
git bisect good 7ed18e2d1b6782989eb399ef79a8cc1a1b583b3c^
git bisect good
git bisect bad

but that works fine for me.

[ Oops. I tested something. It works for me with current git, but with
git-2.9.0 I get the failure ].

I'll try to figure out why git-2.9.0 fails.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Linus Torvalds
On Fri, Jul 29, 2016 at 11:05 AM, Jeff King  wrote:
>
> Linus suggested resetting the timestamp after making the commit, to
> clear the way for the next commit. But if we reset any cached value
> _before_ making the commit, this has a few advantages:

Looks fine to me. It should trivially fix the git-am issue, and I
can't see what it could break. Famous last words.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: topological index field for commit objects

2016-06-30 Thread Linus Torvalds
On Thu, Jun 30, 2016 at 3:30 AM, Jakub Narębski  wrote:
>
> P.S. Having Git ensure that committerdate (as an epoch) is greater
> than committerdates of its parents at the commit creation time (with
> providing warning about time skew, perhaps not doing it if skew is
> too large) would be not costly. No need to add anything, and it would
> help future queries to be faster, I think.

So I think git should check the committer date (ie verify that the
commitetr date of a new commit is always equal to or more recent than
all the parents).

But we should probably allow some slop for when machines are out of
sync (think build farms etc automation that may do automated commits,
but the clocks on different machines may be off by a few seconds). We
already do things like that in some of the use of committer dates -
see for example CUTOFF_DATE_SLOP)

And we should probably allow the user to override the refusal to
create new commits with bogus dates - think importing repositories etc
where you may have reasons to just say "that's just how it was".

And it will take years for that to percolate out to all users, so it's
not like it will fix things immediately, but then some time from now,
we can consider commit dates to be more reliably generation numbers.

I do think that it's ok to cache generation numbers somewhere if there
is an algorithm that can make use of them, but every time this comes
up, it's just not been important enough to make a big deal and a new
incompatible object format for it. The committer date is preexisting
and has existing pseudo-generation-number usage, so..improving on the
quality of it sounds like a good idea.

The first step should probably be to make fsck warn about the existing
cases of "commit has older date than parents". Something like the
attached patch, perhaps?

  Linus
 fsck.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fsck.c b/fsck.c
index 05315451c56f..722b65371d38 100644
--- a/fsck.c
+++ b/fsck.c
@@ -60,6 +60,7 @@
FUNC(NULL_SHA1, WARN) \
FUNC(ZERO_PADDED_FILEMODE, WARN) \
FUNC(NUL_IN_COMMIT, WARN) \
+   FUNC(DATE_ORDERING, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO)
@@ -604,6 +605,17 @@ static int fsck_ident(const char **ident, struct object 
*obj, struct fsck_option
return 0;
 }
 
+static void fsck_commit_date(struct fsck_options *options, struct commit 
*commit)
+{
+   struct commit_list *p;
+
+   for (p = commit->parents; p; p = p->next) {
+   struct commit *parent = p->item;
+   if (commit->date < parent->date)
+   report(options, >object, 
FSCK_MSG_DATE_ORDERING, "Bad commit date ordering with parent");
+   }
+}
+
 static int fsck_commit_buffer(struct commit *commit, const char *buffer,
unsigned long size, struct fsck_options *options)
 {
@@ -650,6 +662,7 @@ static int fsck_commit_buffer(struct commit *commit, const 
char *buffer,
return err;
}
}
+   fsck_commit_date(options, commit);
author_count = 0;
while (skip_prefix(buffer, "author ", )) {
author_count++;


<    1   2   3   4   5   6   >