Re: bogus merges
On Mon, 5 Sep 2005, Wayne Scott wrote: A recent commit in linux-2.6 looks like this: It hopefully shouldn't happen any more with the improved and fixed git-merge-base. Author: Russell King [EMAIL PROTECTED] Date: Wed Aug 31 10:12:14 2005 +0100 I suspect rmk is using cogito-0.13, and that it still has the older git-merge-base that can get confused by the date-ordering problem. And when git-merge-base gives the wrong result (not either of the commit objects it was given as an argument), then git resolve will do the wrong thing. I just checked, and the _current_ git-merge-base definitely gives the right result: linux$ git-merge-base -a \ 6b39374a27eb4be7e9d82145ae270ba02ea90dc8 \ 194d0710e1a7fe92dcf860ddd31fded8c3103b7a results in 194d0710e1a7fe92dcf860ddd31fded8c3103b7a ie it correctly notices that the second commit is the parent of the first one. Really 'git commit' should detect problems like this automatically and prevent them from getting in the tree. Well, that would depend on having the fixed git-merge-base in the first place, which in turn would mean that such a commit wouldn't happen at all, so it's kind of circular. It's not worth fixing anywhere else, since once you fix it in git-merge-base, it just becomes a non-issue. Hopefully we'll have a new cogito release soon, and this particular bug will be a thing of the past. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tool renames? was Re: First stab at glossary
On Mon, 5 Sep 2005, David Kågedal wrote: But to the users (like myself), there's no point in naming it by whether it's a script or a binary. So? There's no downside. To you, as a user, you never see the -script ending anyway. You'd never type it out, or you're already doing something wrong. So to users it doesn't matter, and to developers it _does_ matter (and calling them .pl or .sh or something would be _bad_), why not please the developers? So your argument that it makes it easier for git developers to work with the source doesn't help the user. It doesn't _help_ the user, but since it doesn't hurt him either, why the hell would we even _care_ about the user? Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tool renames? was Re: First stab at glossary
On Tue, 6 Sep 2005, Martin Langhoff wrote: Grep knows how to ignore binary files. That wasn't the _point_. The point is, naming things as being scripts is useful. Grep is just an example. Naming things as being .pl or .sh is _not_ useful. So with grep you can use -I, but what about doing things like em * when doing global renames (I use micro-emacs - em - as my editor). Again, em *-script actually works. The point being that if we have naming rules, make them USEFUL. *-script is useful - it works wonderfully well for git xxx (which knows to add -script), and it works wonderfully well for developers. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tool renames? was Re: First stab at glossary
On Tue, 6 Sep 2005, Junio C Hamano wrote: Linus Torvalds [EMAIL PROTECTED] writes: The point is, naming things as being scripts is useful. Grep is just an example. Naming things as being .pl or .sh is _not_ useful. Sorry, but why not? What's the upside? I can point to one downside: git. That script right now is simple. If you rewrite git-cvsimport-script from shell to perl, it looks the same to git. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] A new merge algorithm, take 3
On Thu, 8 Sep 2005, Junio C Hamano wrote: Yes, the reading of three trees upfront is probably the culprit in your case However, note that _most_ tree reading just reads one. Merges may take half a second, and yes, when I did it, the fact that we move things around in the array is by far the highest cost. But the thing is, if merges take half a second, that's still not only damn good, it's not even the most common operation. Yes, the active_cache[] layout as one big array is inconvenient for read_tree(), which tends to want to interleave different trees in the array and thus forces things to be moved around. But remember what the most common use for the index is - it's the single tree case from read_cache(). That's _so_ much more common than read_tree() that it's not even funny. So the data structure is optimized for a different case than reading in trees. Big deal. That optimization is definitely worth it: it allows us to do the read_cache() with the actual index entries being totally read-only (a linked list would have to add a next pointer to the cache entries and not allow the in-place thing that read_cache() does). Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] A new merge algorithm, take 3
On Thu, 8 Sep 2005, Chuck Lever wrote: in my case the merges were taking significantly longer than a half second. making this change is certainly not worth it if merges are running fast... Note that in cold-cache cases, all the expense of read-tree is in actually reading the tree objects themselves (a kernel tree has more than a thousand subdirectories). Also, a full git pull will do the diffstat etc, and then the expense ends up being in the actual git diff part. So read-tree itself may be half a second, but a merge ends up having other parts. they are still read-only with my linked list implementation. Btw, in the sparse project, we have this really smart pointer list data structure, which is extremely space- and time-efficient. It ends up _looking_ like a linked list, but it batches things up in hunks of 29 entries (29 pointers plus overhead gives you blocks of 32 longwords, which is the allocation size) and thus gives basically a cache-friendly doubly-linked list. It knows how to do insertions, traversals etc very efficiently. Any interest? Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git applymbox is too anal
On Thu, 8 Sep 2005, Greg KH wrote: Or am I missing some option to 'git applymbox' that I can't seem to find? No. git-apply wants an exact bit-for-bit match. Partly because fuzz is hard, but mostly because I don't like it. I apply a _lot_ of patches, and if a unforgiving git-apply works for me, it should work for you too. One issue is that I want to apply patches that apply cleanly, and if the patch was generated against some older kernel version, then it should quite likely be _applied_ towards that older kernel version. Then you can use git to merge the two. Of course, I only do that occasionally, for bigger clashes. For smaller things, I just edit the patch by hand. And sometimes, I ask the sending side to re-generate the patch. Applying with a fuzz does the right thing most of the time. But most isn't good enough. I'd rather have a human look at anything that requires fuzz. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems
On Sat, Jul 14, 2012 at 12:59 AM, Jonathan Nieder jrnie...@gmail.com wrote: Unfortunately, on big-endian architectures, if p is a pointer to unsigned int then current gcc assumes it is properly aligned and converts this construct to a 32-bit load. This patch seems to entirely depend on the location of the cast. And as far as I can tell, that workaround will in turn depend on just what gets inlined. My gut feel is that this makes the code uglier (the manual multiply by four being a prime example) while not really even addressing the problem. I think a much better approach would be to just mark the unsigned int data pointer as being unaligned, or add a get_unaligned() helper function (you have to do a structure member and mark the structure packed, I think - I don't think you can just mark an int pointer packed). Sure, that's compiler-dependent, but if a compiler does something like gcc apparently does, it had better support the notion of unaligned pointers. And then gcc might actually do the unaligned word read *optimally* on big-endian architectures, instead of that idiotic byte-at-a-time crap with shifting. Anyway, the whole noticed on alpha makes no sense, since alpha isn't even big-endian. So the commit log is insane and misleading too. Alpha is very much little-endian, but maybe gcc turns the thing into an unaligned load followed by a bswap. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems
On Sat, Jul 14, 2012 at 12:50 PM, Jonathan Nieder jrnie...@gmail.com wrote: After the patch, what reason does gcc have to expect that 'block' is 32-bit aligned except when it is? The code (including the code I didn't touch) never casts from char * to int * except in get/put_be32 on arches that don't mind unaligned accesses. Ahh. I was looking at the cast you added: blk_SHA1_Block(ctx, (const unsigned char *) ctx-W); but I guess that if gcc inlines that and sees the original int type, it doesn't matter, because that *is* aligned. So it's ok, but please just make blk_SHA1_Block() take a const void * instead and that cast can go away. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints
Looks good to me. I'd suggest doing the macro argument expansion in #define SHA_SRC(t) get_be32((unsigned char *) block + t*4) with parenthesis around 't', but that's a fairly independent thing. The old code didn't do that either. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano gits...@pobox.com wrote: * This is not even compile tested, so it needs testing and benchmarking, as I do not even know how costly the calls to open/close are when we do not have to call iconv() itself. Ok, so it's easily compile-tested: just add + COMPAT_OBJS += compat/precompose_utf8.o + BASIC_CFLAGS += -DPRECOMPOSE_UNICODE to the makefile for Linux too. Actually testing how well it *works* is hard, since I don't really have a mac (well, I do, but it no longer has OS X on it ;), and the whole utf-8-mac thing does not make sense. HOWEVER. I actually tested it with the conversion being from Latin1 to UTF-8 instead, and it does interesting things, and kind of works. I say kind of, because for the case of the filesystem being in Latin1, we actually have to convert things back to the filesystem character set when doing open() and lstat(), and this patch obviously doesn't do that, because OS X does the conversion back to NFD on its own. But ACK on the patch. If I had more time, I'd actually be interested to do the generic case of namespace conversion, and we could make this a generic git feature - it's something I wanted to do long ago. However, right now I'm in the merge window and will go on a vacation to Finland after that, so I probably won't get around to it. I do have one suggestion: please rename the has_utf8() function to has_nonascii() too. Why? Because that's what the function actually does. It has nothing to do with testing for UTF-8 (the utf-8 rules are more complex than just high bit set), and *if* I ever get around to doing a more generic character set conversion for the filenames, the decision really would be about non-ASCII, not about non-UTF8. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano gits...@pobox.com wrote: Eek. Oh, I agree. Doing a full character set conversion both ways is quite a bit more work. Not just write_entry() codepath that creates the final paths on the filesystem, you would need to touch lstat() calls that check the existence and freshness of the path, once you go that route. I am sure such a change can be made to work, but I am not sure how much we would gain from one. I think it might be interesting. I doubt it matters all that much any more in Western Europe (Unicode really does seem to have largely taken over), but I think Japan still uses Shift-JIS a lot. Although maybe that is starting to fade too. And it really is just a generalization of the OS X filesystem damage. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 0/7] Rework git core for native submodules
On Thu, Apr 4, 2013 at 11:30 AM, Ramkumar Ramachandra artag...@gmail.com wrote: The purpose of this series is to convince you that we've made a lot of fundamental mistakes while designing submodules, and that we should fix them now. [1/7] argues for a new object type, and this is the core of the idea. I don't dispute that a new link object might be a good idea, but there's no explanation of the actual format of this thing anywhere, and what the real advantages would be. A clearer this is the design, this is the format of the link object, and this is what it buys us would be a good idea. Also, one of the arguments against using link objects originally was that the format wasn't stable, and in particular the address of the actual submodule repository might differ for different people. So when adding a new object type, explaining *why* the format of such an object is globally stable (since it will be part of the SHA1 of the object) is a big deal. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 0/7] Rework git core for native submodules
On Thu, Apr 4, 2013 at 11:52 AM, Ramkumar Ramachandra artag...@gmail.com wrote: 1. upstream_url: this records the upstream URL. No need to keep a .gitmodules. 2. checkout_rev: this records the ref to check out the submodule to. As opposed to a concrete SHA-1, this allows for more flexibility; you can put refs/heads/master and have truly floating submodules. 3. ref_name: this specifies what name the ref under refs/modules/branch/ should use. 4. floating: this bit specifies whether to record a concrete SHA-1 in checkout_rev. 5. statthrough: this bit specifies whether git should stat() through the worktree. We can turn it off on big repositories for performance reasons. So the thing is (and this was pretty much the original basis for .gitmodules) that pretty much *all* of the above fields are quite possibly site-specific, rather than globally stable. So I actually conceptually like (and liked) the notion of a link object, but I just don't think it is necessarily practically useful, exactly because different installations of the *same* supermodule might well want to have different setups wrt these submodule fields. My gut feel is that yes, .gitmodules was always a bit of a hack, but it's a *working* hack, and it does have advantages exactly because it's more fluid than an actual git object (which by definition has to be set 100% in stone). If there are things you feel it does wrong (like the git add bug that is being discussed elsewhere), I wonder if it's not best to at least try to fix/extend them in the current model. The features you seem to be after (ie that whole floating/refname thing) don't seem fundamentally antithetical to the current model (a commit SHA1 of all zeroes for floating, with a new refname field in .submodules? I dunno).. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 0/7] Rework git core for native submodules
On Thu, Apr 4, 2013 at 12:36 PM, Ramkumar Ramachandra artag...@gmail.com wrote: Let's compare the two alternatives: .gitmodules versus link object. If I want my fork of .gitmodules, I create a commit on top. Or you could also just edit and carry a dirty .gitmodules around for your personal use-case. I don't know if anybody does that, but it should work fine. And I don't see what you can do with the link objects that you cannot do with .gitmodules. That's what it really boils down to. .gitmodules do actually work. Your extensions would work with them too. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 0/7] Rework git core for native submodules
On Thu, Apr 4, 2013 at 1:04 PM, Ramkumar Ramachandra artag...@gmail.com wrote: Linus Torvalds wrote: Or you could also just edit and carry a dirty .gitmodules around for your personal use-case. I'm sorry, but a dirty worktree is unnecessarily painful to work with. Bzzt. Wrong. A dirty worktree is not only easy to work with (I do it all the time, having random test-patches in my tree that I never even intend to commit), it's a *requirement*. One thing that git does really really well is merging. And one of the reasons why git does merging well (apart from the obvious meta-issue: it's what I care about) is that it not only has the stable information in the object database, it also has the staging information in the index, *and* it has dirty data in the working tree. You absolutely need all three. Having an edit command to edit stable data (or staging data) is broken. Trust me, I've been there, done that, got the T-shirt and know it is wrong. The whole stable objects + index + dirty worktree is FUNDAMENTALLY the right way to work, and it *has* to work that way for merges to work well. The only things that we don't have dirty data for in the worktree is creating commits and tags, but those aren't relevant for the merging process anyway, in the sense that you never change them for merging, you create them *after* merging (and this is fundamental, and not just a git implementation issue). So you absolutely need a dirty worktree. You need it for testing, and you need it for merging. Having a model where you don't have a in-progress entity that works as a temporary is absolutely and entirely wrong. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git grep parallelism question
Since I reboot fairly regularly to test new kernels, I don't *always* have the kernel source tree in my caches, so I still care about some cold-cache performance. And git grep tends to be the most noticeable one. Now, I have a SSD, and even the cold-cache case takes just five seconds or so, but that's still somethng I react to, since the normal kernel tree in cache case ends up bring close enough to instantaneous (half a second) that then when it takes longer I react to it. And I started thinking about it, and our git grep parallelism seems to be limited to 8. Which is fine most of the time for CPU parallelism (although maybe some people with big machines would prefer higher numbers), but for IO parallelism I thought that maybe we'd like a higher number... So I tried it out, and with THREADS set to 32, I get a roughly 15% performance boost for the cold-cache case (the error bar is high enough to not give a very precise number, but I see it going from ~4.8-4.9s on my machine down to 4.2..4.6s). That's on an SSD, though - the performance implications might be very different for other use cases (NFS would likely prefer higher IO parallelism and might show bigger improvement, while a rotational disk might end up just thrashing more) Is this a big deal? Probably not. But I did react to how annoying it was to set the parallelism factor (recompile git with a new number). Wouldn't it be lovely if it was slightly smarter (something more akin to the index preloading that takes number of files into account) or at least allowed people to set the parallelism explicitly with a command line switch? Right now it disables the parallel grep entirely for UP, for example. Which makes perfect sense if grep is all about CPU use. But even UP might improve from parallel IO.. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git grep parallelism question
On Fri, Apr 26, 2013 at 11:47 AM, Junio C Hamano gits...@pobox.com wrote: The real issue may be that we do not have a good estimate of how many paths are involved in the request before starting these threads, though. Yes. Also, I'm not sure if the 15% possible improvement on my SSD case is even worth it for something that in the end isn't necessarily the common case. I *suspect* that it might be a much bigger deal on NFS (IO parallelism really does end up being a big deal sometimes, and caching tends to be less aggressive too), but on rotational media it might be much less clear, or even a loss.. Are there people out there who use git grep over NFS and have been unhappy with performance? If are willing to recompile git with a different THREAD value in builtin/grep.c, then on a Linux client you can try echo 3 /proc/sys/vm/drop_caches to largely force cold-cache behavior for testing (I say largely, because it won't drop busy/dirty pages, but for git grep kind of loads it should be good). Of course, you need root for it, so.. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git grep parallelism question
On Fri, Apr 26, 2013 at 12:19 PM, Junio C Hamano gits...@pobox.com wrote: OK, you have to recompile at least once for experiment, so perhaps building the test binary with this patch may help. So here's my experiment on my machine with this patch and the kernel tree. First I did the warm-cache case: for i in 1 4 8 16 32 64 do echo $i: for j in 1 2 3 4 do t=$(sh -c time git grep --threads=$i hjahsja 21 | grep real) echo $i $t done done and the numbers are pretty stable, here's just he summary (best of four tries for each case): 1 real 0m0.598s 4 real 0m0.253s 8 real 0m0.235s 16 real 0m0.269s 32 real 0m0.412s 64 real 0m0.420s so for this machine, 8 threads (our old number) is actually optimal even if it has just four cores (actually, two cores with HT). I suspect it's just because the load is slightly unbalanced, so a few extra threads helps. Looks like anything in the 4-16 thread range is ok, though. But 32 threads is clearly too much. Then I did the exact same thing, but with the echo 3 /proc/sys/vm/drop_caches just before the timing of the git grep. Again, summarizing (best-of-four number, the variation wasn't that big): 1 real 0m17.866s 4 real 0m6.367s 8 real 0m4.855s 16 real 0m4.307s 32 real 0m4.153s 64 real 0m4.128s here, the numbers continue to improve up to 64 threads, although the difference between 32 and 64 is starting to be in the noise. I suspect it's a combination of better IO overlap (although not all SSD's will even improve all that much from overlapping IO past a certain point) and probably a more noticeable imbalance between threads. Anyway, for *my* machine and for *this* particular load, I'd say that we're already pretty close to optimal. I did some trials just to see, but the best hot-cache numbers were fairly reliably for 7 or 8 threads. Looking at the numbers, I can't really convince myself that it would be worth it to do (say) 12 threads on this machine. Yes, the cold-cache case improves from the 8-thread case (best-of-four for 12 threads: 0m4.467s), but the hot-cache case has gotten sufficiently worse (0m0.251s) that I'm not sure.. Of course, in *absolute* numbers the cold-cache case is so much slower that a half-second improvement from going to 16 threads might be considered worth it, because while the the hot-cache case gets worse, we may just not care because it's fast enough that it's not noticeable. Anyway, I think your patch is good if for no other reason that it allows this kind of testing, but at least for my machine, clearly the current default of eight threads is actually good enough. Maybe somebody with a very different machine might want to run the above script and see if how sensitive other machines are to this parameter.. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree
On Sat, May 11, 2013 at 12:54 PM, John Keeping j...@keeping.me.uk wrote: This adds a new configuration variable patchid.cacheRef which controls whether (and where) patch IDs will be cached in a notes tree. Patch ID's aren't stable wrt different diff options, so I think this is potentially a very bad idea. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree
On Sat, May 11, 2013 at 2:49 PM, John Keeping j...@keeping.me.uk wrote: Hmm... I hadn't realised that. Looking a bit closer, it looks like init_patch_ids sets up its own diffopts so its not affected by the command line (except for pathspecs which would be easy to check for). Of course that still means it can be affected by settings in the user's configuration. .. and in the actual diff algorithm. The thing is - patches ARE NOT STABLE. There are many valid ways to get a patch from one version to another, and even without command line changes, we've had different versions of git generating different patches. There are heuristics in xdiff to avoid some nasty use up tons of CPU-time things that have been tweaked over time. And even for simple cases there are ambiguous ways to describe the patch. Now, maybe we don't care, because in practice, most of the time this just doesn't much matter. And anybody who uses patch-ID's had better not rely on them being guaranteed to be unique anyway, so it's more of a if the patch ID is the same, it's almost guaranteed to be the same patch, but that's a big almost. And no, it's not some theoretical SHA1 collisions are very unlikely kind of thing, it's a the patch ID actually ignores a lot of data in order to give the same ID even if lins have been added above it, and the patch is at different line numbers etc. So maybe it doesn't matter. But at the same time, I really think caching patch ID's should be something people should be aware of is fundamentally wrong, even if it might work. And quite frankly, if you do rebases etc so much that you think patch ID's are so important that they need to be cached, you may be doing odd/wrong things. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-compat-util: Avoid strcasecmp() being inlined
On Wed, Sep 18, 2013 at 5:43 AM, Sebastian Schuberth sschube...@gmail.com wrote: My feeling is that Linus' reaction was more about that this work-around is even necessary (and MinGW is buggy) rather than applying it to git-compat-util.h and not elsewhere. So I think it's an annoying MinGW bug, but the reason I dislike the no-inline approach is two-fold: - it's *way* too intimate with the bug. When you have a bug like this, the *last* thing you want to do is to make sweet sweet love to it, and get really involved with it. You want to say Eww, what a nasty little bug, I don't want to have anything to do with you. And quite frankly, delving into the details of exactly *what* MinGW does wrong, and defining magic __NO_INLINE__ macros, knowing that that is the particular incantation that hides the MinGW bug, that's being too intimate. That's simply a level of detail that *nobody* should ever have to know. The other patch (having just a wrapper function) doesn't have those kinds of intimacy issues. That patch just says MinGW is buggy and cannot do this function uninlined, so we wrap it. Notice the lack of detail, and lack of *interest* in the exact particular pattern of the bug. The other reason I'm not a fan of the __NO_INLINE__ approach is even more straightforward: - Why should we disable the inlining of everything in string.h (and possibly elsewhere too - who the hell knows what __NO_INLINE__ will do to other header files), when in 99% of all the cases we don't care, and in fact inlining may well be good and the right thing to do. So the __NO_INLINE__ games seem to be both too big of a hammer, and too non-specific, and at the same time it gets really intimate with MinGW in unhealthy ways. If you know something is diseased, you keep your distance, you don't try to embrace it. I tried to put the __NO_INLINE__ stuff in compat/mingw.h but failed, it involved the need to shuffle includes in git-compat-util.h around because winsock2.h already seems to include string.h, and I did not find a working include order. So I came up with the following, do you like that better? Ugh, so now that patch is fragile, so we have to complicate it even more. Really, just make a wrapper function. It doesn't even need to be conditional on MinGW. Just a single one-liner function, with a comment above it that says MinGW is broken and doesn't have an out-of-line copy of strcasecmp(), so we wrap it here. No unnecessary details about internal workings of a buggy MinGW header file. No complexity. No subtle issues with include file ordering. Just a straightforward workaround that is easy to explain. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH] git log: support auto decorations
From: Linus Torvalds torva...@linux-foundation.org Date: Thu, 29 May 2014 15:19:40 -0700 Subject: [RFC PATCH] git log: support auto decorations This works kind of like --color=auto - add decorations for interactive use, but do not change defaults when scripting or when piping the output to anything but a terminal. You can use either [log] decorate=auto in the git config files, or the --decorate=auto command line option to choose this behavior. Signed-off-by: Linus Torvalds torva...@linux-foundation.org --- I actually like seeing decorations by default, but I do *not* think our current log.decorate options make sense, since they will change any random use of git log to have decorations. I much prefer the ui.color=auto behavior that we have for coloration. This is a trivial patch that tries to approximate that. It's marked with RFC because (a) that isatty(1) || pager_in_use() test is kind of hacky, maybe we would be better off sharing something with the auto-coloration? (b) I also think it would be nice to have the equivalent for --show-signature, but there we don't have any preexisting config file option. (c) maybe somebody would like a way to combine auto and full, although personally that doesn't seem to strike me as all that useful (would you really want to see the full refname when not scripting it) but the patch is certainly simple and seems to work. Comments? builtin/log.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/log.c b/builtin/log.c index 39e883635279..df6396c9c3d9 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -63,6 +63,8 @@ static int parse_decoration_style(const char *var, const char *value) return DECORATE_FULL_REFS; else if (!strcmp(value, short)) return DECORATE_SHORT_REFS; + else if (!strcmp(value, auto)) + return (isatty(1) || pager_in_use()) ? DECORATE_SHORT_REFS : 0; return -1; } -- 2.0.0.1.g5beb60c -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support auto decorations
On Thu, May 29, 2014 at 6:58 PM, Jeff King p...@peff.net wrote: I will spare you the usual lecture on having these lines in the message body. ;) We do it for the kernel because they often get lost otherwise. Particularly the date/author. git doesn't tend to have the same kind of deep email forwarding models, and nobody uses quilt to develop git (I hope!) but it's a habit I like to encourage for the kernel, so I use it in general.. Yeah, I think this makes a lot of sense. I do use log.decorate=true, and it is usually not a big deal. However, I think I have run into annoyances once or twice when piping it. I'd probably use log.decorate=auto if we had it. I have various scripts to generate releases etc, using git log and piping it to random other stuff. I don't _think_ they care, but quite frankly, I'd rather not even have to think about it. Also, I actually find the un-colorized version of the decorations to be distracting. With color (not that I really like the default color scheme, but I'm too lazy to set it up to anything else), the colorization ends up making the decorations much easier to visually separate, so I like them there. It's marked with RFC because (a) that isatty(1) || pager_in_use() test is kind of hacky, maybe we would be better off sharing something with the auto-coloration? The magic for this is in color.c, want_color() and check_auto_color(). Oh, I know. That's where I stole it from. But the colorization actually has very different rules, and looks at TERM etc. So I only stole the actual non-color-specific parts of it, but I was wondering whether it might make sense to unify them. (b) I also think it would be nice to have the equivalent for --show-signature, but there we don't have any preexisting config file option. Potentially yes, though there is a real performance impact for log --show-signature if you actually have a lot of signatures. Even on linux.git, a full git log is 15s with --show-signature, and 5s without. Maybe that is acceptable for interactive use (and certainly it is not a reason to make it an _option_, if somebody wants to turn it on). Yes. This is an example of where the whole tty is fundamentally different from scripting happens. It really is about the whole user is looking at it vs scripting. There is no way in hell that --show-signature should be on by default for anybody sane. That said, part of it is just that show-signature is so suboptimal performance-wise, re-parsing the commit buffer for each commit when show_signature is set. That's just crazy, we've already parsed the commit text, we already *could* know if it has a signature or not, and skip it if it doesn't. That would require one of the flag bits in the object, though, or something, so it's probably not worth doing. Sure, performance might matter if you end up searching for something in the pager after you've done git log, but personally I think I'd never even notice.. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.0.0 regression? request pull does not seem to find head
On Mon, Jun 2, 2014 at 2:01 PM, Michael S. Tsirkin m...@redhat.com wrote: [mst@robin linux]$ git request-pull net-next/master git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git net-next warn: No match for commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc found at git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git warn: Are you sure you pushed 'net-next' there? git request-pull is clearly correct. There is no net-next in that public repository. It *used* to be that request-pull then magically tried to fix it up for you, which in turn resulted in the guess not being right, like pointing to the wrong branch that just happened to have the same SHA1, or pointing to a branch when it _should_ have pointed to a tag. Now, if you pushed your local net-next branch to another branch name (I can find a branch name called vhost-next at that repository, then you can *tell* git that, using the same syntax you must have used for the push. So something like git request-pull net-next/master git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git net-next:vhost-next should work so that git doesn't end up having to guess (and potentially guessing wrong). But it may actually be a simpler driver error, and you wanted to use vhost-next, and that net-next was actually incorrect? Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Git v2.0.0
On Mon, Jun 2, 2014 at 7:08 PM, NeilBrown ne...@suse.de wrote: git request-pull master git://neil.brown.name/md after tagging the current commit as md/3.15-fixes and pushing out the tag You should *tell* git request-pull what you're actually requesting to pull. The old let's guess based on the commit at the top of your tree may have worked for you (because you only ever had one matching thing), but it did not work in general. So the new git request-pull does not guess. If you want to request a tag to be pulled, you name it. Because if you don't name it, it now defaults to HEAD (like all other git log commands) rather than guessing. So please write it as git request-pull master git://neil.brown.name/md md/3.15-fixes I guess the man-page should be made more explicit about this too. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: git request-pull broken for plain branches
On Wed, Jun 25, 2014 at 2:55 AM, Uwe Kleine-König u.kleine-koe...@pengutronix.de wrote: $ git rev-parse HEAD 9e065e4a5a58308f1a0da4bb80b830929dfa90b3 $ git ls-remote origin | grep 9e065e4a5a58308f1a0da4bb80b830929dfa90b3 9e065e4a5a58308f1a0da4bb80b830929dfa90b3 refs/heads/ukl/for-mainline $ git request-pull origin/master origin HEAD /dev/null warn: No match for commit 9e065e4a5a58308f1a0da4bb80b830929dfa90b3 found at origin warn: Are you sure you pushed 'HEAD' there? Notice how HEAD does not match. The error message is perhaps misleading. It's not enough to match the commit. You need to match the branch name too. git used to guess the branch name (from the commit), and it often guessed wrongly. So now they need to match. So you should do git request-pull origin/master origin ukl/for-mainline to let request-pull know that you're requesting a pull for ukl/for-mainline. If you have another name for that branch locally (ie you did something like git push origin local:remote), then you can say git request-pull origin/master origin local-name:remote-name to specify what the branch to be pulled is called locally vs remotely. In other words, what used to be pick some branch randomly is now you need to _specify_ the branch. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files
On Fri, Jun 27, 2014 at 10:48 AM, Junio C Hamano gits...@pobox.com wrote: Even though the original question mentioned delta discovery, I think what was being asked is not delta in the Git sense (which your answer is about) but is can we diff two long sequences of text (that happens to consist of only 4-letter alphabet but that is a irrelevant detail) without holding both in-core in their entirety?, which is a more relevant question/desire from the application point of view. .. even there, there's another issue. With enough memory, the diff itself should be fairly reasonable to do, but we do not have any sane *format* for diffing those kinds of things. The regular textual diff is line-based, and is not amenable to comparing two long lines. You'll just get a diff that says the two really long lines are different. The binary diff option should work, but it is a horrible output format, and not very helpful. It contains all the relevant data (copy this chunk from here to here), but it's then shown in a binary encoding that isn't really all that useful if you want to say what are the differences between these two chromosomes. I think it might be possible to just specify a special diff algorithm (git already supports that, obviously), and just introduce a new use binary diffs with a textual representation model. But it also sounds like there might be some actual performance problem with these 1GB file delta-calculations. Which I wouldn't be surprised about at all. Jarrad - is there any public data you could give as an example and for people to play with? Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files
On Fri, Jun 27, 2014 at 12:38 PM, Linus Torvalds torva...@linux-foundation.org wrote: I think it might be possible to just specify a special diff algorithm (git already supports that, obviously), and just introduce a new use binary diffs with a textual representation model. Another model would be to just insert newlines in the data, and use the regular textual diff on that preprocessed format. The problem of *where* to insert the newlines is somewhat interesting, since the stupid approaches (chunk it up in 64-byte lines) don't work with data insertion/deletion (all the lines will now be different just because the data is offset), but there are algorithms that handle that reasonably well, like breaking lines at certain well-defined patterns (the patterns can then be defined either explicitly or algorithmically - like calculating a hash/crc over the last rolling N characters and breaking if the result matches some modulo calculation). Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files
On Fri, Jun 27, 2014 at 12:55 PM, Jason Pyeron jpye...@pdinc.us wrote: The issue will be, if we talk about changes other than same length substitutions (e.g. Down's Syndrome where it has an insertion of code) would require one code per line for the diffs to work nicely. Not my area of expertise, but depending on what you are interested in - like protein encoding etc, I really think you don't need to do things character-per-character. You might want to break at interesting sequences (TATA box, and/or known long repeating sequences). So you could basically turn the one long line representation into multiple lines, by just looking for particular known interesting (or known particularly *UN*interesting) patterns, and whenever you see the pattern you create a new line, describing the pattern (TATAAA or run of 128 U), and then continue on the next line. Then you diff those semantically enriched streams instead of the raw data. But it probably depends on what you are looking for and at. Sometimes you might be looking at individual base pairs. And sometimes maybe you want to look at the codons, and consider condons that transcribe to the same amino acid to be the same, and not show up as a difference. So I could well imagine that you might want to have multiple different ways to generate these diffs. No? Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch-2.7.3 no longer applies relative symbolic link patches
On Mon, Jan 26, 2015 at 1:35 PM, Junio C Hamano gits...@pobox.com wrote: What is your take on CVE-2015-1196, which brought this /regression/ to GNU patch? If git apply get /fixed/ for that same CVE, would that /break/ your fix? I _think_ we allow arbitrary symlinks to be created, but then we should be careful about actually _following_ them. At least I _thought_ we were already quite careful not to do that, even if it's been a long time since I looked at the code. So even if we create a symlink to outside the repository, it normally shouldn't matter. We have that whole lstat_cache() thing that exists exactly to make it efficient to do pathname lookups while at the same time being aware of symlinks in the middle. Of course, our lstat cache is racy if somebody else modifies the tree concurrently and changes things, but that's a non-issue, because if somebody can just directly create random symlinks in the middle of the tree, I don't think we care about any symlinks _git_ might be creating concurrently ;) But it is entirely possible that git apply - especially when used outside of a real git directory - ends up doing that. And it's not like we necessarily always use the whole lstat-cache mechanism to begin with, so the fact that we have the infrastructure to be careful in no way means that we necessarily always _are_ careful... Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch-2.7.3 no longer applies relative symbolic link patches
On Mon, Jan 26, 2015 at 1:07 PM, Josh Boyer jwbo...@fedoraproject.org wrote: Or did I miss a way that git-apply can take a git patch and apply it to a tree that isn't a git repo? Exactly. git apply works as a straight patch replacement outside of a git repository. It doesn't actually need a git tree to work. (Of course, git apply is _not_ a patch replacement in the general sense. It only applies context diffs - preferentially git style ones - so no old-style patches etc need apply. And it's not replacement-compatible in a syntax sense either, in that while many of the options are the same, not all are etc etc). Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] request-pull: do something if $3 is passed
On Tue, Feb 17, 2015 at 12:34 PM, Paolo Bonzini bonz...@gnu.org wrote: I guess only Linus could answer that, since he wrote 024d34cb0 and he knows the intent better than anyone else. I don't even understand your problem. You said when $3 is not passed git will try to use HEAD as the default but it cannot be resolved to a tag, neither locally (patch 2) nor remotely (patch 3) which makes absolutely no sense. HEAD is not a tag. Never has been, never will be. If you want me to pull a tag, then you damn well should say what tag you want, not just randomly say HEAD. So what is it you want to do? At no point is HEAD should resolve as a tag sensible. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] request-pull: do something if $3 is passed
On Tue, Feb 17, 2015 at 12:53 PM, Paolo Bonzini bonz...@gnu.org wrote: Without $3, git tries to do things that make no sense like git show-ref --heads --tags HEAD; or that make little sense when requesting a pull, like looking for HEAD in the output of git ls-remote. But from the release notes of 2.0 it looks like it's intended and the script is just taking shortcuts. It is *you* who make no sense. Looking for HEAD in git ls-remote? Perfectly sensible: [torvalds@i7 linux]$ git ls-remote origin | grep HEAD cc4f9c2a91b7be7b3590bb1cbe8148873556aa3f HEAD that's the default thing when you don't specify any particular branch or tag. Ok, in 1.9.x I used to not say anything; if the new workflow is to always specify a tag, that's okay. Indeed. You have to specify what you want me to pull. Exactly because in 1.9.x people didn't, and I got *really* tired of getting bogus pull requests that didn't work, or pointed at the wrong branch when people had multiple branches with the same contents etc. I wanted git to find the matching tag on the remote side when I use git request-pull origin/master URL with no third parameter, since I never request pulls except with a single signed tag. The thing is, HEAD works. Not for you, because you don't use HEAD. But because you don't use HEAD, you shouldn't use the default. I *would* agree to making $3 be mandatory, but there are still people out there who just use a single branch per repository and no signed branches. Which is the only reason that default HEAD' thing exists. :Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] request-pull: do something if $3 is passed
On Tue, Feb 17, 2015 at 1:03 PM, Junio C Hamano gits...@pobox.com wrote: HEAD should resolve as a tag is not sensible, but HEAD should locally DWIM to something sensible is still possible, no? I disagree. Why? Because what you have locally is *not* necessarily the same thing you have remotely. And that's *exactly* why people used to send me broken pull requests. git pull-request would guess on things, and it would get the guesses wrong, and write the pull request wrong. We could for example make the rule for unset $3 case like this: instead of the current missing $3 is a request to pull HEAD: If you have one and only one signed tag that happens to point at the commit sitting at HEAD, behave as if that tag was given as the third argument from the command line. If you verify that one and only to be true both locally and remotely, then I guess I would be ok with it. But it really would have to be unique. And truly unique, as in no confusion about branches or tags, only one or the other. Because the tag vs branch was one of the main sources of confusion that made me repeatedly get bad pull requests, particularly when there was something locally that wasn't actually named the same thing remotely. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] request-pull: do something if $3 is passed
On Tue, Feb 17, 2015 at 1:10 PM, Paolo Bonzini pbonz...@redhat.com wrote: Sure. But if I got a pull request saying please pull git://example.org/foo.git HEAD I would think that the sender messed up the pull request. So *in the context of git-request-pull* ${remote:-HEAD} makes little sense to me. Umm. If somebody actually leaves off the third argument THAT IS NOT AT ALL what it prints. It will show The following changes since commit base... .. base commit description .. are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git for you to fetch changes up to cc4f9c2a91b7be7b3590bb1cbe8148873556aa3f: ... IOW, it does exactly the right thing. It gives the contents of HEAD, but it doesn't actually say HEAD anywhere. And just look at lkml. The above kind of branch-less and tag-less pull requests are still fairly common. It's the original git model, and it may be a bit archaic, and I much prefer people to send me signed tags, but hey, that's what don't mention a branch or tag means. And no, I don't think git request-pull is at all different from other git commands. git log means the same thing as git log HEAD. Exact same thing, and nobody would actually write out that HEAD (except inside scripts, perhaps). So basically I agree that git request-pull has changed behavior, but the new behavior is *more* in line with other git commands, and the old behavior was actually really really odd with that whole extensive guess what the user means. No other git command ever did that guessing thing (ok, famous last words, maybe somebody can come up with one), and not mentioning a branch/tag/commit explicitly pretty much always means HEAD. :Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: patch-2.7.3 no longer applies relative symbolic link patches
On Mon, Jan 26, 2015 at 8:32 AM, Josh Boyer jwbo...@fedoraproject.org wrote: I went to do the Fedora 3.19-rc6 build this morning and it failed in our buildsystem with: + '[' '!' -f /builddir/build/SOURCES/patch-3.19-rc6.xz ']' + case $patch in + unxz + patch -p1 -F1 -s symbolic link target '../../../../../include/dt-bindings' is invalid error: Bad exit status from /var/tmp/rpm-tmp.mWE3ZL (%prep) Ugh. I don't see anything we can do about this on the git side, and I do kind of understand why 'patch' would be worried about '..' files. In a perfect world, patch would parse the filename and see that it stays within the directory structure of the project, but that is a rather harder thing to do than just say no dot-dot files. The short-term fix is likely to just use git apply instead of patch. The long-term fix? I dunno. I don't see us not using symlinks, and a quick check says that every *single* symlink we have in the kernel source tree is one that points to a different directory using .. format. And while I could imagine that patch ends up counting the dot-dot entries and checking that it's all inside the same tree it is patching, I could also easily see patch *not* doing that. So using git apply _might_ end up being the long-term fix too. I suspect that if patch cannot apply even old-style kernel patches due to the symlinks we have in the tree, and people end up having to use git apply for them, I might end up starting to just use rename-patches (ie using git diff -M) for the kernel. I've considered that for a while already, because patch _does_ kind of understand them these days, although I think it gets the cross-rename case wrong because it fundamentally works on a file-by-file basis. But if patch just ends up not working at all, the argument for trying to maintain backwards compatibility gets really weak. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] hex.c: reduce memory footprint of sha1_to_hex static buffers
On Fri, Feb 13, 2015 at 1:56 PM, Stefan Beller sbel...@google.com wrote: As I could not find any documentation on the magical 50 in the early days, I cc'd Linus in case there is something I did not think of yet. Nothing magical, it's just rounded up from 40 + NUL character. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Git very slow ?
On Sun, Mar 8, 2015 at 12:37 PM, David Kastrup d...@gnu.org wrote: Since git blame outputs everything once it is finished (the first screen is purely the pager's business), it needs to unpack the entire history of the file (unless no blameable lines remain at all) and look at it. 6 seconds tends not to be all that excessive for extracting more than 5 years of a file's history. Yeah, git blame can easily be several seconds without anything being wrong. But git commit should be fairly instantaneous. Even over NFS. That said, on NFS in particular, make sure you don't have [core] PreloadIndex = false in your .gitconfig to disable the threaded index preloading. But core.preloadindex _should_ be enabled by default in anything but the most ancient git versions, and it can make a huge difference on NFS because it allows the 'lstat()' calls to check that the index is up-to-date to be done in parallel. Without that, git on NFS can be a bit sluggish. On local filesystems it normally doesn't make as much of a difference, since things tend to be either cached or seek-limited. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Git very slow ?
On Sun, Mar 8, 2015 at 12:02 PM, Ken Moffat zarniwh...@ntlworld.com wrote: The comments on git bisect were for linus'skernel tree, on a local disk. 2.3GB of repo, just under 57000 files. Ugh. I hope you are talking about checked-out size. [torvalds@i7 linux]$ du -sh .git 850M .git because otherwise it sounds like that repo hasn't been repacked in forever. To really pack things (which can slow things down for old history as people said, but on the whole it tends to be a big win due to less IO), do git repack -adf --window=200 --depth=200 and go away for a while. Oh, and make sure your machine has enough memory and CPU to make that for a while not be *too* long. You should have a few hundred files (just a few tens of files directly after the repack) and that roughly 850MB of space for the repository information itself. But yeah, fully checked out and built with all the modules etc, you would have much more. That said, if you have something fairly that is consistently really slow (like the git commit you mentioned), *before* doing the repack, do strace -o ../trace-file -Ttt git commit and we can get a much better guess about why it's so slow. Send it to me in private email if you don't want to make it public, and I can take a look. ping between them gives times of 0.25 to 0.3 seconds .. and I *really* hope that was not seconds, but ms. Otherwise your nfsv3 setup is going to be really really painful. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
On Mon, Apr 20, 2015 at 10:41 AM, Junio C Hamano gits...@pobox.com wrote: Ahh, OK. And not just -S and -G, the fields in headers may be something user may want to switch independently? So personally, I hate extra command line flags for this. I'd much rather see is use something in the regular expression itself, and make *that* be the way you do it, and make it be the preferred format. Otherwise, you'll always have the issue that you want *part* to be case-ignoring, and another entry not, and then it's just messy with the ignore case being some other thing. And we support that with perl regexps, but those are only enabled with libpcre. I wonder if we could just make some simple pattern extension that we make work even *without* libpcre. IOW, instead of making people use -regexp-ignore-case, could we just say that we *always* support the syntax of appending (?i) in front of the regexp. So that your git log --regexp-ignore-case --author=tiM --grep=wip example would be git log --author=(?i)tiM --grep=wip and it would match the _author_ with ignoring case, but the --grep=wip part would be an exact grep. Right now the above already works (I think) if you: - build with USE_LIBPCRE - add that --perl-regexp switch. but what I'm suggesting is that we'd make a special case for the magical perl modifier pattern at the beginning for (?i), and make it work even without USE_LIBPCRE, and without specifying --perl-regexp. We'd just special-case that pattern (and perhaps _only_ that special four-byte sequence of (?i) at the beginning of the search string), but perhaps we could support '(?s)' too? Hmm? I realize that this would be theoretically an incompatible change, but it would be very convenient and if we document it well it might be ok. I doubt people really search for (?i) at the beginning of strings _except_ if they already know about the perl syntax and want it. And to clarify: I don't suggest always building with libpcre. I literally suggest having something like /* hacky mac-hack hack */ if (strncmp((?i), p-pattern, 4)) { p-pattern += 4; p-ignore_case = true; } just in front of the regcomp() call, and nothing more fancy than that. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Odd broken --date=now behavior in current git
On Wed, Apr 15, 2015 at 12:22 AM, Eric Sunshine sunsh...@sunshineco.com wrote: The fix seems to be simply: Yup, that seems to do it for me. I'm not sure how we get to match_digit() with the time string now, though. So your patch fixes things for me, but I think: - we should move the tm.tm_isdst = -1; up a bit and add a comment - I'd like to know why it affected the author date but not the committer date, and how it got to match_digit() with that date string that didn't contain any digits. I'd spend the time on this myself, but I'm in the middle of the kernel merge window, so.. Thanks, Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Odd broken --date=now behavior in current git
I just noticed this because I had amended some merge commits with git commit --amend --date=now to update them, and that gets some funny broken timezones. I suspect it's some silly daylight savings time issue. Lookie here, I can reproduce it trivially with current git (in the git repo itself): [torvalds@i7 git]$ date; git commit -m Test --allow-empty --date=now Tue Apr 14 21:11:03 PDT 2015 [master ec7733db5360] Test Date: Tue Apr 14 20:11:03 2015 -0800 notice how the commit date message shows something funny. It shows an hour earlier, but in -0800. And the resulting commit is broken: [torvalds@i7 git]$ git show --pretty=fuller commit ec7733db5360966434e03eab1a849e6d4227231c (HEAD - master) Author: Linus Torvalds torva...@linux-foundation.org AuthorDate: Tue Apr 14 20:11:03 2015 -0800 Commit: Linus Torvalds torva...@linux-foundation.org CommitDate: Tue Apr 14 21:11:03 2015 -0700 Test notice how the AuthorDate has that -0800, but the CommitDate has -0700. Hmm. I can't be the only one seeing this? My guess is that there's a missing initialization of tm.tm_isdst somewhere or whatever. The above is with current git: [torvalds@i7 git]$ git version git version 2.4.0.rc2 Anybody? Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] am --abort: merge ORIG_HEAD tree into index
On Mon, Aug 17, 2015 at 2:48 AM, Paul Tan pyoka...@gmail.com wrote: It's true that we need to merge the ORIG_HEAD tree into the index instead of overwriting it. Patch below. Seems to work for me. Thanks, Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git am --abort screwing up index?
On Sun, Aug 16, 2015 at 12:46 PM, Linus Torvalds torva...@linux-foundation.org wrote: Maybe it has always done this, and I just haven't noticed (I usually _just_ do the git reset --hard thing, don't ask me why I wanted to be doubly sure this time). But maybe it's an effect of the new built-in am. I bisected this. It's definitely used to work, and the regression is from the new built-in am. But I cannot bisect into that branch 'pt/am-builtin', because git am doesn't actually work in the middle of that branch. So I've verified that commit c1e5ca90dba8 (Merge branch 'es/worktree-add') is good, and that commit 7aa2da616208 (Merge branch 'pt/am-builtin') is bad, but I cannot pinpoint the exact commit where git am --abort starts breaking the index. But I assume it's simply that initial implementation of --abort in commit 33388a71d23e (builtin-am: implement --abort) that already ends up rewriting the index from scratch without applying the old stat data. The test-case is pretty simple: just force a git am failure, then do git am --abort, and then you can check whether the index stat() information is valid in various ways. For the kernel, doing a git reset --hard makes it obvious because the reset will force all files to be written out (since the index stat information doesn't match the current tree). But you can do it by just counting system calls for a git diff too. On the git tree, for example, when the index has matching stat information, you get something like [torvalds@i7 git]$ strace -cf git diff .. 0.040.25 126 4 open .. ie you only actually ended up with 26 open() system calls. When the index is not in sync with the stat information, git diff will have to open each file to see what the actual contents are, and you get [torvalds@i7 git]$ strace -cf git diff ... 0.300.70 0 5987 302 open ... so now it opened about 6k files instead (and for the kernel, that number will be much larger, of course). I _think_ it's because git-am (in clean_index()) uses read_tree(), while it probably should use unpack_trees with opts.update and opts.reset set (like reset_index() does in builtin/reset.h). I have to go off do my weekly -rc now, and probably won't get to debugging this much further. Adding Stefan to the cc, since he helped with that --abort implementation. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git am --abort screwing up index?
So I just noticed while applying a patch with git am when I had a dirty tree, and I ended up getting a failure and starting over: [torvalds@i7 linux]$ git am --abort [torvalds@i7 linux]$ git reset --hard Checking out files: 100% (50794/50794), done.0794) HEAD is now at 1efdb5f0a924 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi and the thing I reacted to is that the git reset --hard re-checked out all the files. That implies that git am --abort ended up leaving the index in a bad state, presumably it re-did the index entirely from HEAD, without filling it in with the stat() details from the old index. Maybe it has always done this, and I just haven't noticed (I usually _just_ do the git reset --hard thing, don't ask me why I wanted to be doubly sure this time). But maybe it's an effect of the new built-in am. I'm about to go out and don't have time to debug this any further right now, but I'll try to get back to it later. I thought I'd send out this email in case it makes Paul goes ahh, yes.. obvious Not a big deal - things *work* fine. But forcing checking out every file obviously also means that subsequent builds end up being slowed down etc.,. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git_open_noatime: return with errno=0 on success
On Tue, Aug 4, 2015 at 11:03 PM, Junio C Hamano gits...@pobox.com wrote: I would agree it is a good idea to clear it after seeing the first open fail due to lack of O_NOATIME before trying open for the second time, iow, more like this? So I don't think this is _wrong_ per se, but I think the deeper issue is that somebody cares about 'errno' here in the first place. A stale 'errno' generally shouldn't matter, because we either (a) return success (and nobody should look at errno) or (b) return an error later, without setting errno for that _later_ error. and I think either of those two situations are the real bug, and this clear stale errno is just a workaround. But as mentioned, I don't think clearign errno is wrong, so I'm not objecting to the patch. I just suspect there's something else goign on too.. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git log fails to show all changes for a file
On Tue, Jul 14, 2015 at 12:59 AM, Olaf Hering o...@aepfle.de wrote: On Tue, Jul 14, John Keeping wrote: It was added in an evil merge (f9da455b93f6ba076935b4ef4589f61e529ae046), That's not an evil merge. That's just a regular merge. One side had changed the argument to const: - 1b9d48f2a579 (hyper-v: make uuid_le const) while the other side had renamed the function and added an argument, and changed the return type: - d3ba720dd58c (Drivers: hv: Eliminate the channel spinlock in the callback path) an evil merge is something that makes changes that came from neither side and aren't actually resolving a conflict. That said, I do wonder if we should just make -p imply --cc. Right now we have the kind of odd situation that git log -p doesn't show merge patches, but git show cmit does show them. And you kind of have to know a lot about git to know the --cc option. In fact, that git show behavior really is very subtle, but very useful. It comes from show_rev_tweak_rev(), which is a magic git-show thing. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git log fails to show all changes for a file
On Wed, Jul 15, 2015 at 11:17 AM, Junio C Hamano gits...@pobox.com wrote: So this is a suggested change to -p -m behavior? Not really. This is a suggested behaviour for git log -p Oh, in that case I say NAK NAK NAK. That would be totally horrible, and completely unacceptable. I do git log -p all the time, and merge-diffs with --first-parent are regularly hundreds of thousands of lines for the kernel. So no. That is COMPLETELY unacceptable. Not even worth discussing. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git log fails to show all changes for a file
On Wed, Jul 15, 2015 at 9:29 AM, Junio C Hamano gits...@pobox.com wrote: I would think git log -p that implies --cc would be a good change, as long as there is an easy escape hatch to let us do what we have to do with a rather lengthy git log -p --first-parent -m (i.e. show the change relative to its first parent while traversing the first parent chain) today. Perhaps only when there is no explicit -m but -p is given, automatically enable --cc, or something like that. That's very close to what git show does through that magic show_rev_tweak_rev() logic, with the crucial difference being that git show is designed to always show the diff, so the -p is implied. That said, having thought about it more, I'm not entirely sure we can do it. The big conceptual difference between git log and git show is obviously that git show doesn't walk the revision list, and you always explicitly say show _this_ commit. And that means that with git show, you kind of _know_ that the commit is relevant and interesting, in a way that git log does not. So got git show, it's very natural to say show all the relevant information, while for git log we did make the choice that maybe commit diffs aren't relevant by default. And the whole issue ends up boiling down to maybe we picked the wrong choice default. We default to that ignore_merges = 1 behavior. Now, we defaulted to ignoring merge diffs because long long ago, in a galaxy far away, we didn't have a great way to show the diffs. The whole --cc option goes back to January -06 and commit d8f4790e6fe7 (diff-tree --cc: denser combined diff output for a merge commit) . And before that option - so for about 8 months - we had no good way to show the diffs of merges in a good dense way. So the whole don't show diffs for merges by default actually made a lot of sense originally, because our merge diffs were not very useful. But that does mean that if we now enable --cc by default when you ask for diffs, we have no good way to _disable_ it. We picked disable by default, and -m means enable merge diffs. And that made sense back in 2005 because we really wanted to disable the whole show diffs for merges thing. Of course, you can use --no-merges to basically not show merges at all, so maybe that's ok. But I get the feeling that if we make -p imply --cc, we should probably add a --no-merge-diff option too to replace the (broken) -m flag properly. And I'm a tiny bit worried that it might break some script (although I'm really not seeing how/why you'd script git log -p output and not want to get a --cc patch for a merge). And -m? We should probably get rid of it. The diffs we get for merges when -c or --cc isn't given are _not_ useful. The original non-combined diff format was really just useful for showing that yeah, we have multiple parents, and they are different in all these ways. So there is no actual valid use for -m that I can imagine. It's simply just an odd historical artifact from a time when we didn't know how to show merges. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git log fails to show all changes for a file
On Wed, Jul 15, 2015 at 10:58 AM, Junio C Hamano gits...@pobox.com wrote: * When '-p' is given, we show only diff with first-parent by default, regardless of the traversal (i.e. --first-parent option currently controls both traversal and patch display, but in the new world order, it reverts back to purely a traversal option). So this is a suggested change to -p -m behavior? Yes, that sounds sane. The current -p -m behavior is not useful at all. So if I understand rightly, we'd have: -p would be what is currently -p --cc -p -m would be what is currently -p --first-parent -p --no-show-merge-diffs would be what is currently -p and the rationale would be that (a) the current -p is hiding things, and while you can add --cc, that requires that you really understand what is being hidden, which is a bad default (the complaint that started this discussion) (b) the current -p -m is useless crazy stuff, and you'd rather use it for something that you actually find very common and useful If so, I agree entirely. Linus -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 10:32 AM, Victor Leschukwrote: > On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk >> num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; > > Actually I have never said the nCPUs played main role in it. T The pseudo-code you sent disagrees. Not that "online_cpus() <= 1" is likely to ever be really an issue on any development platform from the last decade. However, I do have to admit that that "online_cpus()" check goes back a long time, so I guess I can't really blame you. At least in the index preloading, I was very conscious of the IO issues. It doesn't actually make a big difference on traditional disks (seek times dominate, and concurrent IO often doesn't help at all), but the reason I keep on bringing up NFS is that back when I used CVS (oh, the horrors), I *also* worked at a company that did everything over NFS. CPU ended up almost never being the limiting factor for any SCM operation. So I don't have a very good idea of *what* we should use for automatic thread detection, but I'm pretty sure online_cpu's should not be it. Except, like Jeff mentioned, for pack formation (which does tend to be all about CPU). Sadly, detecting what kind of filesystem you are on and how well cached it is, is really pretty hard. Even when you have OS-specific knowledge, and can look up the *type* of the filesystem, what often matters more is things like "is the filesystem on a rotational media or using flash?" etc. In the meantime I'd argue for just getting rid of the online_cpu's check, because (a) I think it's actively misleading (b) the threaded grep probably doesn't hurt much even on a single CPU, and the _potential_ upside from IO could easily dwarf the cost. (c) do developers actually have single-core machines any more? But if somebody can come up with a smarter model, that would certainly be good too. The IO advantages really don't tend to be there for rotational media, but for both flash and network filesystems, threaded IO can be a huge deal. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschukwrote: > > Maybe use the simplest version (and keep num_numbers == 0 also as flag for > all other checks in code like if(num_flags) ): > > if (list.nr || cached ) > num_threads = 0; // do not use threads > else if (num_threads == 0) > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; I will say this AGAIN. The number of threads is *not* about the number of CPU's. Stop this craziness. It's wrong. The number of threads is about parallelism. Yes, CPU's is a small part of it. But as mentioned earlier, the *big* wins are for slow filesystems, NFS in particular. On NFS, even if you have things cached, the cache revalidation is going to cause network traffic almost every time. Being able to have several of those outstanding is a big deal. So stop with the "online_cpus()" stuff. And don't base your benchmarks purely on the CPU-bound case. Because the CPU-bound case is the case that is already generally so good that few people will care all *that* deeply. Many of the things git does are not for "best-case" behavior, but to avoid bad "worst-case" situations. Look at things like the index preloading (also threaded). The big win there is - again - when the stat() calls may need IO. Sure, it can help for CPU use too, but especially on Linux, cached "stat()" calls are really quite cheap. The big upside is, again, in situations like git repositories over NFS. In the CPU-intensive case, the threading might make things go from a couple of seconds to half a second. Big deal. You're not getting up to get a coffee in either case. In the network traffic case, the threading might make things go from one minute to ten seconds. And *that* is a big deal. That's huge. That's "annoyingly slow" to "oh, this is the fastest SCM I have ever worked with in my life". That can literally be something that changes how you work for a developer. You start doing things that you simply would never otherwise do. So *none* of the threading in git is about CPU's. Maybe we should add big honking comments about that. And that big honking comment should be in the documentation for the new flag too. Because it would be really really sad if people say "I have a laptop with just two cores, so I'll set the threading option to 2", when they then work mostly over a slow wireless network and their company wants minimal local installs. The biggest reason to NOT EVER add that configuration is literally the confusion about this being about CPU's. That was what got the whole thread started, that was what the original benchmark numbers were about, and that was WRONG. So I would strongly suggest that Junio ignore these patches unless they very clearly talk about the whole IO issue. Both in the source code, in the commit messages, and in the documentation for the config option. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
More builtin git-am issues..
Ok, this may not be new either, but I'm trying to be careful when using "git am" these days, because I know it got rewritten. And I _think_ the whitespace handling for adding sign-offs got scrogged. I just applied the usual patch-bomb from Andrew, and several of the commits (but not all) end up looking like this: Cc: <sta...@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> note the extraneous whitespace line between Andrew's sign-off and mine. What's odd is that the emails I'm applying literally don't have that extra empty line, so it's git that somehow decides to add it. Only for a few cases, though. The pattern *seems* to be that git now looks at the *first* line of the sign-off block and decides that "this is a sign-off block if that first line has a sign-ff on it, ie this is fine: Signed-off-by: Andrea Arcangeli <aarca...@redhat.com> Cc: Pavel Emelyanov <xe...@parallels.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> but the failing cases have a comment by Andrew: [a...@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com> Cc: Xishi Qiu <qiuxi...@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hir...@jp.fujitsu.com> Cc: Taku Izumi <izumi.t...@jp.fujitsu.com> Cc: Gu Zheng <guz.f...@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horigu...@ah.jp.nec.com> Cc: Vlastimil Babka <vba...@suse.cz> Cc: Mel Gorman <mgor...@techsingularity.net> Cc: David Rientjes <rient...@google.com> Cc: <sta...@vger.kernel.org>[4.2.x] Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> ie that "[a...@linux-foundation.org: coding-style fixes]" makes git am now decide that the previous block of text was not a sign-off block, so it adds an empty line before adding my sign-off. But very obviously it *was* a sign-off block. Maybe this isn't new at all, and it's just that I notice because I'm looking for "git am" oddities. Something is clearly wrong in "has_conforming_footer()". I *think* it's this part: if (!(found_rfc2822 || is_cherry_picked_from_line(buf + i, k - i - 1))) return 0; which basically returns 0 for _any_ line in the footer that doesn't match the found_rfc2822 format. I really think that if we find any "Signed-off-by:" in that last chunk, we should not add a whitespace. Comments? Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More builtin git-am issues..
On Fri, Sep 4, 2015 at 4:47 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > I *think* it's this part: > > if (!(found_rfc2822 || > is_cherry_picked_from_line(buf + i, k - i - 1))) > return 0; > > which basically returns 0 for _any_ line in the footer that doesn't > match the found_rfc2822 format. Confirmed. I hacked up a version that just doesn't do that check at all, and it works fine (but obviously only on well-formatted emails that really do have a sign-off). So I think that logic should basically be extended to saying - if any line in the last chunk has a "Signed-off-by:", set a flag. - at the end of the loop, if that flag wasn't set, return 0. Instead of that thing that basically returns zero immediately when it sees a line it doesn't like. I'm in the middle of my merge window, so I'm not going to get around to writing a patch until that's over. Hopefully somebody will step up in the meantime. Hint, hint. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More builtin git-am issues..
On Fri, Sep 4, 2015 at 6:06 PM, Junio C Hamanowrote: > > that rule would still not think this is a signature block, but at > that point, do we really want to consider such a block of text a > signature block? So exactly why are you arguing for these rules that are known to break in real life, that I gave actual examples for existing, and that I also gave an actual example for not just giving a false negative, but also a false positive? I'm also pretty sure that what you are arguing for is a regression. Now, as mentioned, it may well be true that we've had this odd behavior before, and it's not a real regression - I may just have picked up on this problem because I've been more careful. Maybe I didn't notice these problems before. But looking at the old git-am.sh script, it does simply seem to look for that '^Signed-off-by:' pattern. It did ADD_SIGNOFF=$( test "$LAST_SIGNED_OFF_BY" = "$SIGNOFF" || { test '' = "$LAST_SIGNED_OFF_BY" && echo echo "$SIGNOFF" }) which seems to literally just check the last sign-off line it found. If it matches the new sign-off, it doesn't do anything (and doesn't add the new one either), and if it doesn't exist at all (so it's empty) it adds teh empty line. Quite frankly, that not only worked for a long time, it's simply less ambiguous than your made up rule. It's very simple. "if you find a sign-off in the commit message, don't add an new empty line before the new signoff". Much better than "every line in the last set of lines must match some weak format that isn't even true, and is too non-specific anyway". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More builtin git-am issues..
On Fri, Sep 4, 2015 at 6:23 PM, Junio C Hamano <gits...@pobox.com> wrote: > > OK, I didn't know that was acceptable in the kernel community > to have random comments like that inside the block. Can these > comments span multiple paragraphs? What I am wondering is what > you want to see in a case like this: > > Signed-off-by: Noam Camus <no...@ezchip.com> > Acked-by: Vineet Gupta <vgu...@synopsys.com> > [ Also removed pointless cast from "void *". - Linus] > Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> > [ Ahh, we have to tell at least these people > > - stakeholder class 1 > - staeholder class 2 > ] > Cc: foo > Cc: bar So quite frankly, at that point if git doesn't recognize it as a sign-off block, I don't think it's a big deal. That said, the original "git am" rules actually seem to be rather straightforward: it's never an issue about "last block of text", and it's simply an issue of "is there a sign-ff _anywhere_ in the text". That simplicity has a certain appeal to me. I don't think it was necessarily written that way because it was "well designed" - I suspect it is more an issue of "easy to implement in a shell-script". And it's possible that I'm mis-reading the scripts too. It's not like I _remember_ what the exact behavior was, I just think it used to work really well for us (ie I don't recall seeing lots of those empty lines in the middle of the sign-off block before, and this current merge window it happened for four of the emails I applied from Andrew's 119-patch series.. Four out of 119 emails may not be a big percentage, but it does mean that it's not horribly unusual either.. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] am: match --signoff to the original scripted version
On Sat, Sep 5, 2015 at 9:56 PM, Junio C Hamanowrote: > > For > the upcoming release, stop using the append_signoff() in "git am" > and reimplement the looser definition used by the scripted version > to use only in "git am" to fix this regression in "am" while > avoiding new regressions to other users of append_signoff(). I've tested this by re-doing the same patch-bomb from Andrew that I had problems with originally, and it WorksForMe(tm). Thanks, Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More builtin git-am issues..
On Fri, Sep 4, 2015 at 5:07 PM, Jeff Kingwrote: > > Do we want to make Signed-off-by a special token here? What about "Cc:", > and other project-specific trailers? You wouldn't want to end up with: > > [some comment] > Cc: somebody > > Signed-off-by: somebody else > > It's not a problem for git-am, which should be taking in patches that > are already signed-off (and after all, if your project does not use > signoffs, why are you using "am -s"?). But won't "git commit -s" run > into the same problem? So I'm a *bit* worried about taking anything else than Signed-off-by: exactly because of the problem you mention: > We could look for an arbitrary rfc2822-ish string, but I'd be worried > slightly about false positives, like: > > This is a paragraph with arbitrary text. But one > of the lines will use a colon, and that a causes a > problem: because of wrapping, this line kind of looks > like a trailer. which clearly needs an empty line before the sign-off. And even if we limit it to just the last line, like you suggest: > Maybe only the final line needs to look rfc2822-ish? I'd still worry, for the same exact reason. The "rfc2822-ish" check is _so_ weak that it can be trivially triggered by normal text. So maybe it doesn't require a strict "Signed-off-by:" match, but I think it needs something stricter than the found_rfc2822 format (like maybe "looks like a name/email"). We just don't want to require that *all* the lines are that way, because at least in the kernel we often do end up adding small comments in that section. The git project sign-off rules seem to be stricter, and it looks like it's almost universally of the form "Signed-off-by:" with a few "Helped-by:" and "Reviewed-by:". In the kernel, we really do tend to be messier in that area, with the sign-off chunk not just having the sign-offs and cc's etc, but also tends to have those occasional small notes left by the people forwarding the emails. And we also often don't have _strictly_ just an email. We tend to have things like Cc: # v3.13+ etc, and sometimes we don't have have any email at all, but things like Reported-by: coverity Tested-by: MysterX on #openelec Generated-by: Coccinelle SmPL so it's hard to give a really strict rule. It's fairly free-format. That said, I would argue that when you apply a patch with sign-off, pretty much by definition that patch _should_ have a sign-off from the originator. So I suspect that the "there is an existing sign-off in that last chunk" is a fairly good rule. Even if there are lots of other lines too. If you're using "git commit -s" to just commit your own work, you presumably would normally want the extra sign-off. Or at least you can do a "git commit --amend" fairly easily to fix things up. Doing the amending later for "git am" is rather impractical, when it might have been a series of 100+ emails.. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More builtin git-am issues..
On Fri, Sep 4, 2015 at 5:54 PM, Junio C Hamano <gits...@pobox.com> wrote: > > How about a bit looser rule like this? > > A block of text at the end of the message, each and every > line in which must match "^[^: ]+:[ ]" (that is, > a "keyword" that does not contain a whitespace nor a colon, > followed by a colon and whitespace, and arbitrary value thru > the end of line) is a signature block. No. That's still broken. The thing is, and that was what the report was all about, not every line _is_ of that format. We have commetns from the sign-off people. Things like this: Signed-off-by: Noam Camus <no...@ezchip.com> Acked-by: Vineet Gupta <vgu...@synopsys.com> [ Also removed pointless cast from "void *". - Linus ] Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> or Signed-off-by: Andi Kleen <a...@linux.intel.com> [ Updated comments and changelog a bit. ] Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-a...@firstfloor.org Signed-off-by: Ingo Molnar <mi...@kernel.org> so no, it is simply not true that "every line must match". I'm not even seeing why you argue for that, since clearly having a sign-off-line is actually a safer choice too. The "every line must match" rule is bad, not just because it's not true like above, but also because it can be true without it being a sign-off block. For example, it's not at all unlikely that you have perfectly normal comments that just list some subsystem and their changes. Which could easily look like Trivial fixes all over the tree drm: fix whitespace mm: speeling errors kernel: indentation and codign style The above looks like a perfectly sane commit log to me. Do you seriously think that it makes for a better "sign-off block test" than one that actually checks for "is there a sign-off line"? I'd much rather have special cases like testing for specific keywords or looking for things that look like emails, than make it about being "every line has this very generic format". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: dynamic "auto" date formats
This is a throw-away idea with a simple patch attached, which I don't think anybody should really take all that seriously per se, but I thought I'd throw it out and see if it generates any discussion. I almost never use anything but the default date format (DATE_NORMAL), but every once in a while I will use "--date=relative", typically because I'm looking at my own commits and I'm just checking when I did something. It struck me that I've made the default for most of the logging things be that when I'm just browsing with the pager, git will automatically do the right thing. So I have [color] ui=auto [log] decorate = auto in my ~/.gitconfig, and it shows me all those other things I tench to want to know (like "thar's what I've pushed out" thanks to decorations). So I started thinking about when I care about dates, and decided that maybe I can have a "--date=auto" too. It basically uses relative date formats for the last 24 hours, and then switches over to the default format. I've used it a bit, and like Katy Perry said, I think I might like it. Note that this doesn't add any gitconfig setting to do this, which would be part of the whole point if this is actually sensible. But I'm not entirely convinced it's worth it in the first place, thus this email to see how people react ("That's just stupid" vs "yeah, I didn't even know I wanted it, but now I need it"). And no, I'm not at all sure that the 24-hour cut-off is the right thing, but it didn't seem completely crazy either. I tend to like the relative date format when it is "19 minutes ago" vs "2 hours ago", at some point it's long enough ago that it's more useful to know "Tuesday at 3pm" than about how long ago it was. (And yes, it would be even better to have the "short term relative date" turn into a "medium-term 'day of the week at time x'" and then turn into "full date" when it's more than a week ago, but this patch only has the two modes of "short term" and "long term" and nothing in between). Linus builtin/blame.c | 1 + cache.h | 3 ++- date.c | 10 +++--- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 21f42b0b62b8..4d87181dc6cd 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -2640,6 +2640,7 @@ parse_done: fewer display columns. */ blame_date_width = utf8_strwidth(_("4 years, 11 months ago")) + 1; /* add the null */ break; + case DATE_AUTO: case DATE_NORMAL: blame_date_width = sizeof("Thu Oct 19 16:00:04 2006 -0700"); break; diff --git a/cache.h b/cache.h index 6049f8671138..624817c20414 100644 --- a/cache.h +++ b/cache.h @@ -1223,7 +1223,8 @@ struct date_mode { DATE_ISO8601_STRICT, DATE_RFC2822, DATE_STRFTIME, - DATE_RAW + DATE_RAW, + DATE_AUTO, } type; const char *strftime_fmt; int local; diff --git a/date.c b/date.c index 7c9f76998ac7..c38414a3d968 100644 --- a/date.c +++ b/date.c @@ -184,13 +184,15 @@ const char *show_date(unsigned long time, int tz, const struct date_mode *mode) return timebuf.buf; } - if (mode->type == DATE_RELATIVE) { + if (mode->type == DATE_RELATIVE || mode->type == DATE_AUTO) { struct timeval now; strbuf_reset(); gettimeofday(, NULL); - show_date_relative(time, tz, , ); - return timebuf.buf; + if (mode->type != DATE_AUTO || time + 24*60*60 > now.tv_sec) { + show_date_relative(time, tz, , ); + return timebuf.buf; + } } tm = time_to_tm(time, tz); @@ -792,6 +794,8 @@ static enum date_mode_type parse_date_type(const char *format, const char **end) return DATE_RAW; if (skip_prefix(format, "format", end)) return DATE_STRFTIME; + if (skip_prefix(format, "auto", end)) + return DATE_AUTO; die("unknown date format %s", format); }
Re: Clarification on the git+ssh and ssh+git schemes
On Fri, Feb 5, 2016 at 11:30 AM, Jeff Kingwrote: > > I suspect they were not really documented because nobody wanted to > encourage their use. I don't think it would be wrong to document that > they exist and are deprecated, though. They exist because some people seemed to think that people shouldn't use "ssh://" since they thought that only ssh should use that. Which is obviously bullshit, since by that logic all the other formats should have that idiotic "git+" format too ("git+https", anybody?). It doesn't actually help anything, and it only pushes somebodys broken agenda. So there was a push for that silly thing by a couple of people, but it was always wrong. Don't even document it. Leave it in the source code as an option, and maybe add a comment about "This is stupid, but we support it for hysterical raisins". Don't add it to any real documentation. Not even as deprecated. That just validates it further. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] pretty-print: de-tabify indented logs to make things line up properly
From: Linus Torvalds <torva...@linux-foundation.org> Date: Wed, 16 Mar 2016 09:15:53 -0700 Subject: [PATCH] pretty-print: de-tabify indented logs to make things line up properly This should all line up: Column 1 Column 2 A B ABCD EFGH SPACESInstead of Tabs Even with multi-byte UTF8 characters: Column 1 Column 2 Ä B åäö 100 A Møøse once bit my sister.. Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> --- This seems to work for me, and while there is some cost, it's minimal. Doing a "git log > /dev/null" of the current git tree is about 1% slower because of the tab-finding. A tree with a lot of tabs in the commit messages would be more noticeable, because then you actually end up hitting the whole "how wide is this" issue. (But if the tabs are all at the beginning of a line, you'd still be ok and avoid the utf8 width calculations). Comments? pretty.c | 76 ++-- 1 file changed, 74 insertions(+), 2 deletions(-) diff --git a/pretty.c b/pretty.c index 92b2870a7eab..0b40457f99f0 100644 --- a/pretty.c +++ b/pretty.c @@ -1629,6 +1629,76 @@ void pp_title_line(struct pretty_print_context *pp, strbuf_release(); } +static int pp_utf8_width(const char *start, const char *end) +{ + int width = 0; + size_t remain = end - start; + + while (remain) { + int n = utf8_width(, ); + if (n < 0 || !start) + return -1; + width += n; + } + return width; +} + +/* + * pp_handle_indent() prints out the intendation, and + * perhaps the whole line (without the final newline) + * + * Why "perhaps"? If there are tabs in the indented line + * it will print it out in order to de-tabify the line. + * + * But if there are no tabs, we just fall back on the + * normal "print the whole line". + */ +static int pp_handle_indent(struct strbuf *sb, int indent, +const char *line, int linelen) +{ + const char *tab; + + strbuf_addchars(sb, ' ', indent); + + tab = memchr(line, '\t', linelen); + if (!tab) + return 0; + + do { + int width = pp_utf8_width(line, tab); + + /* +* If it wasn't well-formed utf8, or it +* had characters with badly defined +* width (control characters etc), just +* give up on trying to align things. +*/ + if (width < 0) + break; + + /* Output the data .. */ + strbuf_add(sb, line, tab - line); + + /* .. and the de-tabified tab */ + strbuf_addchars(sb, ' ', 8-(width & 7)); + + /* Skip over the printed part .. */ + linelen -= 1+tab-line; + line = tab + 1; + + /* .. and look for the next tab */ + tab = memchr(line, '\t', linelen); + } while (tab); + + /* +* Print out everything after the last tab without +* worrying about width - there's nothing more to +* align. +*/ + strbuf_add(sb, line, linelen); + return 1; +} + void pp_remainder(struct pretty_print_context *pp, const char **msg_p, struct strbuf *sb, @@ -1652,8 +1722,10 @@ void pp_remainder(struct pretty_print_context *pp, first = 0; strbuf_grow(sb, linelen + indent + 20); - if (indent) - strbuf_addchars(sb, ' ', indent); + if (indent) { + if (pp_handle_indent(sb, indent, line, linelen)) + linelen = 0; + } strbuf_add(sb, line, linelen); strbuf_addch(sb, '\n'); } -- 2.8.0.rc2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] GPIO bulk changes for kernel v4.6
On Fri, Mar 18, 2016 at 9:37 AM, Junio C Hamanowrote: > >> I don't think the original "resolve" did it, for example. You can't do >> a three-way merge without a base. > > Yes, and that continues to this day: Yeah, "octopus" also refuses it cleanly: common=$(git merge-base --all $SHA1 $MRC) || die "Unable to find common commit with $pretty_name" The code in the recursive merge that allows this to happen is this: if (merged_common_ancestors == NULL) { /* if there is no common ancestor, use an empty tree */ struct tree *tree; tree = lookup_tree(EMPTY_TREE_SHA1_BIN); merged_common_ancestors = make_virtual_commit(tree, "ancestor"); } so the "no common ancestors" is just considered to be an empty merge base. And I do think that's right, and I think it's clever, and it goes back to 2006: 934d9a24078e merge-recur: if there is no common ancestor, fake empty one but I think there should be an option there. > This is a tangent but I wonder if we should say why we refuse to > the standard error before calling these two "exit"s. As mentioned, Octopus does. That said, there's probably no reason to ever use the old three-way merge, so I'm not even sure it's worth fixing the old git-merge-resolve. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tabs in commit messages - de-tabify option in strbuf_stripspace()?
So I end up doing this manually when I notice, but I was wondering ig maybe git could just have an option to "git am" and friends to de-tabify the commit message. It's particularly noticeable when people line things up using tabs (for the kernel, it's often things like "cpu1 does X, cpu2 does Y"), and then when you do "git log" it looks like a unholy mess, because the 4-char indentation of the log message ends up causing those things to not line up at all after all. The natural thing to do would be to pass in a "tab size" parameter to strbuf_stripspace(), and default it to 0 (for no change), but have some way to let people say "expand tabs to spaces at 8-character tab-stops" or similar (but let people use different tab-stops if they want). Do people hate that idea? I may not get around to it for a while (it's the kernel merge window right now), but I can write the patch eventually - I just wanted to do an RFC first. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?
On Tue, Mar 15, 2016 at 5:23 PM, Stefan Bellerwrote: > > Could you point at some example to better understand the problem? So in the kernel repo, I just randomly looked for tabs that show this problem, and take for example commit ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd. You can see how the original lined up, by doing git show --pretty=email ff9a9b4c4334 because the email format doesn't indent the commit message. But if you do just git show ff9a9b4c4334 and get the usual indentation, you'll see things not line up at all. In case you don't want to bother with the kernel repo, here's what it looks like: email format: --- snip snip 8< --- A microbenchmark calling an invalid syscall number 10 million times in a row speeds up an additional 30% over the numbers with just the previous patches, for a total speedup of about 40% over 4.4 and 4.5-rc1. Run times for the microbenchmark: 4.43.8 seconds 4.5-rc13.7 seconds 4.5-rc1 + first patch 3.3 seconds 4.5-rc1 + first 3 patches 3.1 seconds 4.5-rc1 + all patches 2.3 seconds A non-NOHZ_FULL cpu (not the housekeeping CPU): all kernels1.86 seconds --- snip snip 8< --- Normal "git show" format: --- snip snip 8< --- A microbenchmark calling an invalid syscall number 10 million times in a row speeds up an additional 30% over the numbers with just the previous patches, for a total speedup of about 40% over 4.4 and 4.5-rc1. Run times for the microbenchmark: 4.43.8 seconds 4.5-rc13.7 seconds 4.5-rc1 + first patch 3.3 seconds 4.5-rc1 + first 3 patches 3.1 seconds 4.5-rc1 + all patches 2.3 seconds A non-NOHZ_FULL cpu (not the housekeeping CPU): all kernels1.86 seconds --- snip snip 8< --- which hopefully clarifies. In the above case, it really isn't very annoying. It's just slightly ugly. In some other cases, it can get quite hard to see what's up, but the ones that come through me I actually tend to try to edit, so many of them have been corrected. For other examples (again, in the kernel), look at 19b2c30d3cce, or 0dc8c730c98a. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?
On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamanowrote: > > Wouldn't it be nicer to do this kind of thing at the output side? A > stupid way would be to have an option to indent the log text with a > tab instead of 4-spaces, but a more sensible way would be to keep > the visual 4-space-indent and do the expand-tab for each line of > text. Yes, that would certainly work for me too. It's just the "ascii boxes don't line up" that is problematic.. (Also people including example source code etc, where the first tab looks wildly different from the second one) Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] GPIO bulk changes for kernel v4.6
On Fri, Mar 18, 2016 at 7:32 AM, Johannes Schindelin <johannes.schinde...@gmx.de> wrote: > > On Fri, 18 Mar 2016, Linus Torvalds wrote: > >> I thought git didn't merge two branches that have no common base by >> default, but it seems it will happily do so. > > What happened to "The coolest merge EVER!"? > > http://thread.gmane.org/gmane.comp.version-control.git/5126/ I'm not complaining about multi-root capability in general - it's still cool. In the kernel, we have aef8755711a2 ("Merge Btrfs into fs/btrfs") that does something slightly similar. It's literally just the fact that "git merge" does it with no extra flags or checks. I'd like people to have to be aware of what they are doing when they merge two different projects, not do it by mistake. So making it conditional on a flag like "--no-common-root" is what I'd look for. Or just make it about the merge stategy. For example, "subtree" makes sense exactly for merging one project into a subdirectory of another. But the default merge shouldn't do it. I don't think the original "resolve" did it, for example. You can't do a three-way merge without a base. Note how that above "coolest merge" definitely wasn't done by just "git merge gitk". I had to play around with the git internals. Now, it shouldn't be _that_ hard, but at the same time it really shouldn't be as easy as "I didn't know, so I merged two independent projects". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
On Wed, Mar 16, 2016 at 2:37 PM, Junio C Hamanowrote: > > What surprised me was that this new expand logic triggered for > shortlog, actually. I somehow assumed the caller that called > de-tabify helper was only called for --pretty=medium. I guess that would be ok, since shortlog by definition can't have any issues with multiple lines lining up with each other. At the same time, it might be a bit odd to show tabs in that first line differently for the one-line vs multi-line log version. But maybe it isn't - I think shortlog is the only thing that does that wrapping anyway, so shortlog is already special. I think the reason shortlog output gets both the de-tab and the wrapping is that shortlog_add_commit() just calls pretty_print_commit with CMIT_FMT_USERFORMAT. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
On Wed, Mar 16, 2016 at 12:47 PM, Junio C Hamanowrote: > > Strangely running t4201 with your patch (without any squashing) > seems to show a breakage in shortlog. I won't be able to come back > to this topic for at least a few hours, so this is just a single bit > "breaks" report, without "how and why" analysis, sorry. It's because those things have tabs in their first line, so the output now differs from the expected one exactly because of the tab-vs-space expansion. The wrapping logic is then also different, because the .wrapping code does the tabs as "align to 8 chars" while the new code does tabs as "align to 8 chars modulo the indent offset". I only looked at the first case, but I assume the others are just more of the same. We'd just adjust the expected output, I assume. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] pretty-print: add --pretty=noexpand
On Thu, Mar 17, 2016 at 4:16 PM, Junio C Hamanowrote: > It is reasonable for tweak the default output mode for "git log" to > untabify the commit log message, it sometimes may be necessary to > see the output without tab expansion. Thanks, these all look good to me. Sorry for not following up, it's just that I'm in the middle of the kernel merge window and haven't had the time to worry about it. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] merge: refuse to create too cool a merge by default
On Fri, Mar 18, 2016 at 1:21 PM, Junio C Hamano <gits...@pobox.com> wrote: > While it makes sense to allow merging unrelated histories of two > projects that started independently into one, in the way "gitk" was > merged to "git" itself aka "the coolest merge ever", such a merge is > still an unusual event. Worse, if somebody creates an independent > history by starting from a tarball of an established project and > sends a pull request to the original project, "git merge" however > happily creates such a merge without any sign of something unusual > is happening. > > Teach "git merge" to refuse to create such a merge by default, > unless the user passes a new "--allow-unrelated-histories" option to > tell it that the user is aware that two unrelated projects are > merged. Heh. I had a separate set of patches for you, but hadn't sent them out because of the other test failures (which you also worked out). Mine was slightly different, I just went with a "unrelated" merge option. I'll attach my two patches anyway, if for no other reason than the fact that I actually wrote a new test for this. Feel free to ignore my patches, they have nothing really different from yours, just slightly different implementation. Linus From 0f3e4a9294eeda6799e3e50e28809133233126db Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torva...@linux-foundation.org> Date: Fri, 18 Mar 2016 12:46:06 -0700 Subject: [PATCH 1/2] t3035: test merging of unrelated branches Right now this succeeds, and the test actually verifies that behavior. Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> --- t/t3035-merge-recursive-across-project.sh | 28 1 file changed, 28 insertions(+) create mode 100755 t/t3035-merge-recursive-across-project.sh diff --git a/t/t3035-merge-recursive-across-project.sh b/t/t3035-merge-recursive-across-project.sh new file mode 100755 index ..b5d614db6b08 --- /dev/null +++ b/t/t3035-merge-recursive-across-project.sh @@ -0,0 +1,28 @@ +#!/bin/sh + +test_description='merge-recursive with unrelated projects + +Test rename detection by examining rename/delete conflicts. + + * A: file A + + * B: file B +' + +. ./test-lib.sh + +test_expect_success 'setup repository' \ + 'echo Hello >A && + git add A && + git commit -m "Branch A" A && + git branch A && + git mv A B && + echo Hi >B && + git commit -m "Branch B" --amend B && + git branch B' + +test_expect_success 'merge unrelated branches' \ + 'git checkout -b merged A && + git merge B' + +test_done -- 2.8.0.rc3.9.g44915db From cd6b2388c73f37b3dd6180d9a42993fd219ebb31 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torva...@linux-foundation.org> Date: Fri, 18 Mar 2016 12:57:30 -0700 Subject: [PATCH 2/2] merge: fail merging of unrelated branches Add test for this, and add the "unrelated" merge option to allow it to succeed. Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> --- merge-recursive.c | 6 ++ merge-recursive.h | 1 + t/t3035-merge-recursive-across-project.sh | 7 +-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index b880ae50e7ee..92779db0bbe6 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1927,6 +1927,9 @@ int merge_recursive(struct merge_options *o, /* if there is no common ancestor, use an empty tree */ struct tree *tree; + if (!o->allow_unrelated) + die(_("will not merge unrelated branches")); + tree = lookup_tree(EMPTY_TREE_SHA1_BIN); merged_common_ancestors = make_virtual_commit(tree, "ancestor"); } @@ -2039,6 +2042,7 @@ void init_merge_options(struct merge_options *o) memset(o, 0, sizeof(struct merge_options)); o->verbosity = 2; o->buffer_output = 1; + o->allow_unrelated = 0; o->diff_rename_limit = -1; o->merge_rename_limit = -1; o->renormalize = 0; @@ -2092,6 +2096,8 @@ int parse_merge_opt(struct merge_options *o, const char *s) o->renormalize = 1; else if (!strcmp(s, "no-renormalize")) o->renormalize = 0; + else if (!strcmp(s, "unrelated")) + o->allow_unrelated = 1; else if (!strcmp(s, "no-renames")) o->detect_rename = 0; else if (!strcmp(s, "find-renames")) { diff --git a/merge-recursive.h b/merge-recursive.h index 52f0201f68a3..19eb52eeb732 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -15,6 +15,7 @@ struct merge_options { const char *subtree_shift; unsigned buffer_output : 1; unsigned renormalize : 1; + unsigned allow_unrelated : 1; long xdl_opts; int verbosity; int detect_rename; diff --git a/t/t3035-merge-recursive-across-project.sh b/t/t
Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano <gits...@pobox.com> wrote: >> >> Wouldn't it be nicer to do this kind of thing at the output side? A >> stupid way would be to have an option to indent the log text with a >> tab instead of 4-spaces, but a more sensible way would be to keep >> the visual 4-space-indent and do the expand-tab for each line of >> text. > > Yes, that would certainly work for me too. It's just the "ascii boxes > don't line up" that is problematic.. Ok, that actually turns out to be pretty easy. Here's a first try at it. It does tab expansion only for the cases that indent the commit message, so for things like "pretty=email" it makes no difference at all. However, it hardcodes the hardtab size to 8, and makes this all unconditional. That's how a vt100 terminal works, but fancer terminals may allow you set actual tab-stops, and if you're in emacs you can probably do just about anything) Comments? This does make the kernel commit 0dc8c730c98a look fine, so it would make me happy. I can write a commit log etc if people think this is ok, but right now this is just a silly raw patch in the attachement.. Linus pretty.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/pretty.c b/pretty.c index 92b2870a7eab..dcd6105d1eb2 100644 --- a/pretty.c +++ b/pretty.c @@ -1652,8 +1652,26 @@ void pp_remainder(struct pretty_print_context *pp, first = 0; strbuf_grow(sb, linelen + indent + 20); - if (indent) + if (indent) { + int i, last = 0, pos = 0; + strbuf_addchars(sb, ' ', indent); + for (i = 0; i < linelen; i++) { + int expand; + if (line[i] != '\t') + continue; + strbuf_add(sb, line+last, i-last); + pos += i-last; + expand = (pos + 8) & ~7; + strbuf_addchars(sb, ' ', expand - pos); + pos = expand; + last = i+1; + } + + // Handle the tail non-tab content + line += last; + linelen -= last; + } strbuf_add(sb, line, linelen); strbuf_addch(sb, '\n'); }
Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?
On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamanowrote: > > It also ignores that byte counts of non-HT bytes may not necessarily > match display columns. There is utf8_width() function exported from > utf8.c for that purpose. Hmm. I did that to now make it horribly slower. Doing the per-character widths really does horrible things for performance. That said, I think I can make it do the fast thing for lines that have no tabs at all, which is the big bulk of it. So it would do the width calculations only in the rare "yes, I found a tab" case. I already wrote it in a way that non-tab lines end up just looping over the line looking for a tab and then fall back to the old code. I might just have to make that behavior a bit more explicit. Let me stare at it a bit, but it won't happen until tomorrow, I think. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] merge: refuse to create too cool a merge by default
On Wed, Apr 6, 2016 at 11:36 AM, Junio C Hamanowrote: > > I was reviewing the topics to merge to 'master', and a thought > crossed my mind. Both of our series only refuse to create a merge > that does not have any common ancestor, but shouldn't the right > solution refuse to add a new root? So I think that's an independent thing, and I think you're right, it would be a good feature to have. As a maintainer, it would have caught the whole "my submaintainers did something really odd, I need to talk to them" before-the-fact rather than after-the-fact. That said, I'm less worried about my submaintainers doing _intentionally_ annoying things, than people doing silly things by mistake. So if I had a version of git that allowed me to say "don't allow pulls that add a new root commit unless I specify a '--new-root-allowed' flag", then yes, I'd use that. And I guess it might not be too nasty to add: it could be done as part of the object checking pass after downloading the pack. Was that what you were thinking of? But at the same time, I think the existing series you have, which hopefully results in these things not happening by mistake in the future is the more important piece. I'd rather get good pull requests than get errors and have to tell people "you screwed up, now we'll need to re-do this". But if you cook something up that checks that there are no new roots in the pull, I'd certainly appreciate that safety net. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] pretty-print: add --pretty=noexpand
On Thu, Mar 17, 2016 at 10:08 PM, Jeff Kingwrote: > > Hmm. Isn't "expand tabs" orthogonal to the rest of the pretty format? > That is, couldn't one want "--pretty=fuller, but with tabs expanded"? Yeah, you are right, one easily could. And in fact I end up doing "fuller" myself occasionally, because I check peoples commit timestamps (some people have a nasty habit of rebasing when they shouldn't). So it's not just the medium format that would want detab by default, it's "full" and "fuller" too (but probably not "raw": that indents the message too, but the only real reason to use "raw" is for scripting). So it would probably be better to make it a separate flag, and not tie it to a particular log format (and just make the log format set the default). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
On Wed, Mar 16, 2016 at 12:32 PM, Junio C Hamanowrote: > > may give us a better structure if we are going to give users a knob > to disable this tab expansion, i.e. move the addition of 4 spaces to > the caller, name the body of such a function strbuf_expand_add(), > and then make the caller do something like this perhaps? I'd suggest just putting that knob into the "pp_handle_indent()" function, and passing it the "pp" pointer. In fact, maybe it should just be renamed as "pp_add_line()", and handle every case, and keep "pp_remainder()" as just the "loop over each line and handle the empty line and PP_SHORT special case" thing. That makes it easty to add that CMIT_FMT_EXPAND_TABS kind of code later. Here's an incremental patch that could be just smushed into my previous one. It doesn't change the behavior of "pp_handle_indent()", but I think it clarifies the code and makes future changes much easier (partly because now nobody has to worry about the continue case and the newline at the end of the line, so you can just print whatever you want and then return). What do you think? Linus pretty.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/pretty.c b/pretty.c index 0b40457f99f0..b9374a1708d1 100644 --- a/pretty.c +++ b/pretty.c @@ -1699,6 +1699,18 @@ static int pp_handle_indent(struct strbuf *sb, int indent, return 1; } +static void pp_add_line(struct pretty_print_context *pp, + struct strbuf *sb, int indent, + const char *line, int linelen) +{ + strbuf_grow(sb, linelen + indent + 20); + if (indent) { + if (pp_handle_indent(sb, indent, line, linelen)) + return; + } + strbuf_add(sb, line, linelen); +} + void pp_remainder(struct pretty_print_context *pp, const char **msg_p, struct strbuf *sb, @@ -1721,12 +1733,7 @@ void pp_remainder(struct pretty_print_context *pp, } first = 0; - strbuf_grow(sb, linelen + indent + 20); - if (indent) { - if (pp_handle_indent(sb, indent, line, linelen)) - linelen = 0; - } - strbuf_add(sb, line, linelen); + pp_add_line(pp, sb, indent, line, linelen); strbuf_addch(sb, '\n'); } }
Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
On Wed, Mar 16, 2016 at 9:29 AM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > This should all line up: > > Column 1 Column 2 > > A B > ABCD EFGH > SPACESInstead of Tabs > > Even with multi-byte UTF8 characters: > > Column 1 Column 2 > > Ä B > åäö 100 > A Møøse once bit my sister.. So with current git, it looks like this for me: Column 1 Column 2 A B ABCD EFGH SPACESInstead of Tabs Even with multi-byte UTF8 characters: Column 1 Column 2 Ä B åäö 100 A Møøse once bit my sister.. and with that patch it looks much better: Column 1 Column 2 A B ABCD EFGH SPACESInstead of Tabs Even with multi-byte UTF8 characters: Column 1 Column 2 Ä B åäö 100 A Møøse once bit my sister.. (of course, to see that assumes that you have a fixed-sized font in your mail viewer, and that the spacing didn't get screwed up by my cut-and-paste). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/5] Expanding tabs in "git log" output
On Wed, Mar 23, 2016 at 4:23 PM, Junio C Hamanowrote: > So here is the third try (previous round is found at $gmane/289166 > and the very first one is at $gmane/288987). > > The first three patches are essentially the same as v2. The last > two updates how the tab-expansion is internally controlled: I tested this (as it in in 'pu', rather than applying the patches), and it all seems to work fine. So Ack. And I agree that it would be good if all the commit printout logic was unified rather than having some ad-hoc "let's just set the format", but I think that's a separate cleanup. It might be more regular to have that "--expand-tabs" flag too (which would then work with the email and raw formats), but I don't see any actual real use for it so it really doesn't matter. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?
On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaudwrote: > > Could this also help with diff output, where the leading + or - mars the > indentation in a similar way? I don't think that's a good idea at least by default, since then it will break things like running diff in emacs capture mode. So even if you're in a terminal, you can't assume that you can munge the output too much. Of course, if colorization is on, you might as well pretty-print the diff by indenting things properly too, since the end result isn't going to be used as a _diff_. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] GPIO bulk changes for kernel v4.6
On Thu, Mar 17, 2016 at 11:07 PM, Laxman Dewanganwrote: > > For creating the repo and branch, I just followed the instruction from wiki > https://help.github.com/articles/create-a-repo/ So you shouldn't have created a new repo at all, you should just have cloned an existing one (that gets you a repo, of course). You basically ended up starting a new project. But I guess the github connection explains why there was a crazy README.md file there, and I can see why that documentation would make you think it's the right thing to do. Oh well. Just a "git init" wouldn't have done that kind of damage, the github documentation is just misleading in this respect. We may have to make it really really really clear for the kernel that people should not use github in any way except purely for hosting.. > I jut use git (git version 2.1.4) for pushing the changes in github repo. > > There is no other tools used. I thought git didn't merge two branches that have no common base by default, but it seems it will happily do so. So once you made the mistake of starting a new project, git merge ended up "helpfully" allowing you to merge the remote tracking branch into that new project, and we ended up with a silly new root. "git pull-request" will complain about not having a commit in common, but "git merge" apparently does not even warn. Adding Junio and the git list. This seems like much too easy a way to screw up. Junio (and git people), the problem is that github seems to have caused Laxman to think he should start a new project, and then git happily merged the new root just because nobody knew better. And sadly, I didn't notice the history screw-up until too late, so now we in the kernel have a third root commit (the first two are intentional): commit a101ad945113 was merged by commit e5451c8f8330. So how about a patch something like this: --- a/builtin/merge.c +++ b/builtin/merge.c @@ -1398,7 +1398,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix) NULL, 0, UPDATE_REFS_DIE_ON_ERR); if (remoteheads && !common) - ; /* No common ancestors found. We need a real merge. */ + die(_("No common ancestor - not merging")); else if (!remoteheads || (!remoteheads->next && !common->next && common->item == remoteheads->item)) { (the above is explicitly whitespace-damaged on purpose - it's not meant as a serious patch, because we do want the *ability* to merge different projects across different roots, and there are even git tests for it, it's just that I think it's too easy to make this mistake and not even realize). So the real thing having a special option required to merge non-related projects? Or at least a humongous warning? You can recreate (for testing only!) this by doing this in a kernel repo: git checkout a101ad945113 git merge 3cf42efc3479 and you'll see how it happily ends up creating a merge commit with no common ancestors and no warning that anything might be wrong. It will take a while - walking all the way up to the root to not find the common object - but it will happily merge those two totally unrelated branches without complaining at all. Now I'm starting to wonder just how many github projects have lots of separate root commits.. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
On Wed, Mar 16, 2016 at 11:01 AM, Junio C Hamanowrote: > > (1) if turning your "preparation; do { ... } while()" into > "while () { }" would make the result a bit easier to read; So it's probably partly taste, but I will also disagree with your "easier to read", because of the way the code is logically structured. In particular, the "no TAB" case is actually *fundamentally* different from the "no TAB at the end" case. The return value is different, and the caller does very different things - the code tries to make it very clear that that "no TAB" situation is very different from "we found a TAB". So it's not "preparation + do-while". It's "preparation + handle the no-TAB case differently", and then the "do-while" is very natural because by the time we get to the "ok, we are now going to need to do something about the line" stage, we already know we have a tab. But the code *could* be made to just always do the whole "strbuf_add()", and not return a return value at all, and the no-tab case wouldn't be explicitly written to be different. Let me know if you'd prefer that variant, and I'll send a new version. > (2) if we can somehow eliminate duplication of "tab + 1" (spelled > differently on the previous line as "1+tab"), the end result > may get easier to follow. Yeah, I considered that. Either by just doing "tab++" before (so the +1" would come from that in both cases), or by introducing a new variable like ptrdiff_t bytes_used; ... bytes_used = 1 + tab - line; and then just doing line += bytes_used; linelen -= bytes_used; and the code I wrote just didn't do any of those temporary updates, and instead just did the "+1" by hand in both cases. Again, I can redo the patch, just tell me which model you prefer. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > This seems to be a git bug. > > That commit aed06b9 can also be described as > > v3.13-rc7~9^2~14^2~42 > > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. Hmm. I think I see what's up. The git distance function has a special hack for preferring first-parent traversal, introduced long long ago with commit ac076c29ae8d ("name-rev: Fix non-shortest description"). Changing that #define MERGE_TRAVERSAL_WEIGHT 65535 to be a smaller value makes git find the shorter path. I do not know what the correct fix is, though. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:08 AM, Jeff Kingwrote: > > Right, because it makes the names longer. We give the second-parent > traversal a heuristic cost. If we drop that cost to "1", like: So I dropped it to 500 (removed the two last digits), and it gave a reasonable answer. With 1000, it gave the same "based on 4.6" answer as the current 65536 value does. > which is technically true, but kind of painful to read. It may be that a > reasonable weight is somewhere between "1" and "65535", though. Based on my tests, the "right" number is somewhere in the 500-1000 range for this particular case. But it's still a completely made up number. > However, I think the more fundamental confusion with git-describe is > that people expect the shortest distance to be the "first" tag that > contained the commit, and that is clearly not true in a branchy history. Yeah. And I don't think people care *too* much, because I'm sure this has happened before, it's just that before when it happened it wasn't quite _so_ far off the expected path.. > I actually think most people would be happy with an algorithm more like: > > 1. Find the "oldest" tag (either by timestamp, or by version-sorting > the tags) that contains the commit in question. Yes, we might want to base the "distance" at least partly on the age of the base commits. > 2. Find the "simplest" path from that tag to the commit, where we > are striving mostly for shortness of explanation, not of path (so > "~500" is way better than "~20^2~30^2~14", even though the latter > is technically a shorter path). Well, so the three different paths I've seen are: - standard git (65536), and 1000: aed06b9 tags/v4.6-rc1~9^2~792 - non-first-parent cost: 500: aed06b9 tags/v3.13-rc7~9^2~14^2~42 - non-first parent cost: 1: aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42 so there clearly are multiple valid answers. I would actually claim that the middle one is the best one - but I claim that based on your algorithm case #1. The last one may be the shortest actual path, but it's a shorter path to a newer tag that is a superset of the older tag, so the middle one is actually not just better based on age, but is a better choice based on "minimal actual history". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:23 AM, Junio C Hamanowrote: > > I think avoiding side branches to describe with the weight is a > right thing to do, i.e. if you have this history: > > X---o---o---o---o---v4.6 > \ / > o---o > > you do not want to explain X as "v4.6~^2~2", and instead you want it > as "v4.6~5", even though the former is 4 hops while the latter is 5 > hops (which is longer). Yes. I have a new suggestion: make the "merge weight" depend on how complex the name ends up being. And that algorithm actually gives a completely new and improved path: aed06b9 tags/v3.13-rc2~32^2^2~47 which is still complex, but is actually the best one I've found so far. To compare against the previous ones I looked at: v4.6-rc1~9^2~792<- current git code v3.13-rc2~32^2^2~47 <- new with attached patch v3.13-rc7~9^2~14^2~42 <- merge weight 500 v3.13~5^2~4^2~2^2~1^2~42 <- merge weight 1 that new one is actually the second-most dense, and uses the oldest tag. So it's a good combination of denseness, but still gets the best actual base tag. The logic of the patch is that there are different "complexities" in the naming, and it's not just whether you are following a second parent, it's also if you're doing generational hops. So when you do a first-parent change, the name stays simple (the last number changes), and that means that the "distance" weight is low (so that's the normal "+1" we do for first-parent. But if it's not a first parent, there are two different cases: - generation > 0: we add "~%d^%d", and we approximate that with "four characters", and use a cost that is four orders of magnitude higher (so 1). - generation == 0: we add just "^%d", so generally just two characters. So use a cost that is just two orders of magnitude higher (so 100). In other words, I'm trying to convince people that my patch not only gives a good result, but that the "weight numbers" I use make some kind of conceptual sense from a complexity cost angle. With that, the patch itself is attached. I think it's better than what "git name-rev" does now. Is it optimal? No, I think the *optimal* would use some kind of "does one tag contain the other" logic, and discarding all base names that are just supersets of another base that still reaches the target. But this patch is small and simple, and has some excuses for its behavior. What do people think? Linus builtin/name-rev.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 092e03c3cc9b..0354c8d222e1 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -16,9 +16,6 @@ typedef struct rev_name { static long cutoff = LONG_MAX; -/* How many generations are maximally preferred over _one_ merge traversal? */ -#define MERGE_TRAVERSAL_WEIGHT 65535 - static void name_rev(struct commit *commit, const char *tip_name, int generation, int distance, int deref) @@ -55,19 +52,26 @@ copy_data: parents; parents = parents->next, parent_number++) { if (parent_number > 1) { + int weight; size_t len; char *new_name; strip_suffix(tip_name, "^0", ); - if (generation > 0) + + // The extra merge traversal "weight" depends + // on how complex the resulting name is. + if (generation > 0) { + weight = 1; new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name, generation, parent_number); - else + } else { + weight = 100; new_name = xstrfmt("%.*s^%d", (int)len, tip_name, parent_number); + } name_rev(parents->item, new_name, 0, - distance + MERGE_TRAVERSAL_WEIGHT, 0); + distance + weight, 0); } else { name_rev(parents->item, tip_name, generation + 1, distance + 1, 0);
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 6:24 AM, Andreas Schwabwrote: > > The branches recently pulled by Linus contain commits with rather old > author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or > 281baf7a702693deaa45c98ef0c5161006b48257. That will probably cause git > describe --contains to take a different path through the history. Nothing in git should look at author dates (except for the obvious code that then shows the date). The committer date is in fact used for the traversal heuristics for some cases, but those are different and recent in the examples you note. This seems to be a git bug. That commit aed06b9 can also be described as v3.13-rc7~9^2~14^2~42 so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. However did git decide to use that further-away name? That looks odd. I'm wondering if it's because of the new root commit we have, and that confuses the distance calculation somehow? That came in somewhat recently (mid March) with the GPIO pull. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:43 AM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > In other words, I'm trying to convince people that my patch not only > gives a good result, but that the "weight numbers" I use make some > kind of conceptual sense from a complexity cost angle. Basically, the patch approximates the numerical representation of the distance measure with the complexity of the suffix. It's not a *true* length of the suffix (which does heavily favor first-parent use, kind of like the current code), but I think it's better than the pretty random "+65535" that the current git code has. That number is clearly just completely made up. The new numbers at least have some kind of logic behind them. And the current code obviously does give really bad results. Picking the v4.6-rc1 tag as a base just because it happens to get a lot of first-parent traversals (800!) from one second-parent commit that is close to 4.6-rc1 is just obscene. So the more I look at my patch, the more I go "it's a real improvement on the current situation". That said, I do think that a much bigger conceptual change that actually does full traversal and be much more complicated might be the only "correct" solution. So my patch is just a "improve heuristics" small fixlet rather than something optimal. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 12:27 PM, Junio C Hamano <gits...@pobox.com> wrote: > Linus Torvalds <torva...@linux-foundation.org> writes: > >> But this patch is small and simple, and has some excuses for its >> behavior. What do people think? > > I like it that you call it "excuse" not "rationale", as I couldn't > form a logical connection between your "4 (2) letters" and "1 > (100)" at all ;-) Think of the distance number as a "order of magnitude in complexity", and it actually makes a certain amount of sense. It's not the same as the length of the string, but the "log()" of the distance number really does give a kind of complexity value. Think of it this way: if things are entirely linear (all just first parenthood), there will be just a single simple number, and the relationship between the simple distance number (that just increments by one for each parent traversed) and the length of the string that describes it will really be "log10(distance)". That's literally how many characters you need to describe the linear distance number. So a simple linear distance of 'n' commits will need on the order of 'log10(n)' digits to describe it (ie a number around a thousand will need around three digits). The "100" and "1" are just extending that notion of distance to the more complex cases., and expresses their complexity in the same logarithmic units. The same way you need four digits to express a _linear_ distance of 1, you need four characters to express that "~n^p" case of "merge parent p, n generations back". And if you don't have the generation thing, you only need two characters to express parent #'p': "^p". So two characters really *are* equivalent to ~100 linear steps, and four characters really *are* equivalent to ~1 linear steps. So it's not _just_ an excuse. There's an actual rationale for picking those numbers, and why they are equivalent in a complexity measure. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 11:05 AM, Jeff Kingwrote: > > I actually think the best name for aed06b9 is probably: > > v3.13-rc1~65^2^2~42 Sounds likely. I don't know how to find that best match without a complete rewrite, though. My recent patch that got v3.13-rc2~32^2^2~47 comes close to that, and the complexity is similar but the numbers are actually smaller, so I guess my heuristic did indeed find a "simpler" name, but yes, the one based on 3.13-rc1 would definitely be the better one. > which I found by picking the oldest tag from "git tag --contains" and > plugging it into "git describe --match". Yeah, so you basically did the "let's figure out minimal inclusion" by hand. > Sadly, neither git's internal > version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before > "v1.0", so I had to rely on "--sort=taggerdate". I'm not seeing the "sadly". I think "--sort=taggerdate" is pretty much the only sane sort there is for tags, unless you do a true and full topological one (ie sort based on by how many commits that tag encompasses, but also by how each tag contains another tag). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] name-rev: include taggerdate in considering the best name
On Fri, Apr 22, 2016 at 11:11 AM, Jeff Kingwrote: > > I confirmed that it does find the "optimal" tag for the case we've been > discussing. Yes. I'm a bit more worried about the date behavior for projects that merge back stable branches into their development trees (is the development tag better than the stable tag? the date doesn't really say much), but I think this is still the simplest model we can use without trying to really do a topo-sort. And in many ways it's the simplest one to explain to people too: "we try to use the oldest reference we can find as a base for the resulting name" is not a complex or hard concept to explain. > We could _also_ tweak the merge-weight as Linus's patch did, just > because 1 has more basis than 65535. But I think it really matters a > lot less at this point. Yes. I still think that my tweak makes more sense than the existing code, but it's a tiny tweak, compared to the date-based approach. Unlikely to ever matter much. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Small trivial annoyance with the nice new builtin "git am"
Ok, it's no longer *that* new, but I only now noticed.. So I noticed that when I applied the last patch-bomb series from Andrew, all the commit date-stamps are idential. Now, it would be lovely if the new builtin git-am really was *so* fast that it applies a 100+-patch series in under a second, but no, that's not it. It's just that it only looks up the current time once. That seems entirely accidental, I think that what happened is that "ident_default_date()" just ends up initializing the default date string once, and then the date is cached there, because it's now run as a single process for the whole series. I think I'd rather get the "real" commit dates, even if they show that git only does a handful of commits a second rather than hundreds of commits.. Something that just clears git_default_date in between "git am" iterations, perhaps? Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small trivial annoyance with the nice new builtin "git am"
On Thu, Jul 28, 2016 at 5:29 PM, Jeff Kingwrote: > > I guess we want something like: > > +void reset_ident_date(void) > +{ > + strbuf_reset(_default_date); > +} Looks sane. > and then to sprinkle calls liberally through builtin-ified programs when > they move from one unit of work to the next. Maybe we can just add it to the end of commit_tree_extended(), and just say "the cache is reset between commits". That way there is no sprinking in random places. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small trivial annoyance with the nice new builtin "git am"
On Thu, Jul 28, 2016 at 5:21 PM, Jeff Kingwrote: > > I do wonder if you would be happier giving each commit a "fake" > monotonically increasing time, so they are correctly ordered by commit > date. No, that would be nasty, partly for the corner case you mention, but partly because I just think it's wrong to try to lie about reality. The reason I noticed this in the first place was actually that I just wanted to take a look whether things had gotten slower or faster over time, and see how many patches per second I get from the patch-bombs Andrew sends me. So getting real time was what I was looking for. Also, before somebody asks: the reason git has always cached the "default time" string is because there's a reverse annoying thing, which is looking up time twice, and getting a difference of a second between author times and committer times just because of bad luck. That makes no sense either, and is actually why we have that "ident_default_date()" cache thing going on. So we do want to cache things for a single commit, it's just that for things like "git am" (or, like Junio wondered, "git rebase" - I didn't check) we probabyl just just flush the cache in between commits. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Odd git overrflow bug?
On Sun, Jul 10, 2016 at 2:21 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > I'm not sure why it doesn't happen in current git master, because that > function is the same, and the logic around expand_tabs_in_log looks > similar too. Ahh. Commit 43ec55091553 ("bisect: always call setup_revisions after init_revisions") is in master, but not in v2.9.0 Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Odd git overrflow bug?
On Sun, Jul 10, 2016 at 2:05 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > I'm getting "extra" being -1 in strbuf_grow(). Let me dog deeper. "dog deeper"? My typing skills are deteriorating. Anyway, I dug deeper, and the reason is that "tabwidth" is -1, and then pretty.c:1669: strbuf_add_tabexpand(): strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth)); ends up having the number be -1. I'm not sure why it doesn't happen in current git master, because that function is the same, and the logic around expand_tabs_in_log looks similar too. That all came from fe37a9c586a6 (:pretty: allow tweaking tabwidth in --expand-tabs") and I suspect the code just needs to protect against negative or zero, rather than just zero. Junio? Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Odd git overrflow bug?
We have an odd bug report in the kernel, where somebody had trouble bisecting all the way due to "git is failing with "you are trying to use to much memory"(?!)" which can't be an exact error message quote, but the closest I can find smells like the "unsigned_add_overflows()" check in the strbuf code. Very odd. See https://bugzilla.kernel.org/show_bug.cgi?id=121701 for the bug report. I'm not seeing how that could *possibly* happen, but if it indeed is that strbuf code, maybe it could print out the function name (there are two different cases) and values so give a hint about where/how this happens.. Of course, since the bug report isn't an exact quote anyway, maybe that wouldn't have helped anyway. I asked for more information. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Odd git overrflow bug?
On Sun, Jul 10, 2016 at 2:01 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > I'll try to figure out why git-2.9.0 fails. I'm getting "extra" being -1 in strbuf_grow(). Let me dog deeper. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Odd git overrflow bug?
On Sun, Jul 10, 2016 at 11:41 AM, Andreas Schwabwrote: > > I've seen that too, but only at the end of bisection, when it tries to > display the bad commit. That's apparently what the kernel bug reporter sees too now. However, I cannot reproduce the problem with the particular kernel bisect that the reporter is using. Judging by what he bisected things down to, he must have done git bisect start git bisect bad 7ed18e2d1b6782989eb399ef79a8cc1a1b583b3c git bisect good 7ed18e2d1b6782989eb399ef79a8cc1a1b583b3c^ git bisect good git bisect bad but that works fine for me. [ Oops. I tested something. It works for me with current git, but with git-2.9.0 I get the failure ]. I'll try to figure out why git-2.9.0 fails. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Fri, Jul 29, 2016 at 11:05 AM, Jeff Kingwrote: > > Linus suggested resetting the timestamp after making the commit, to > clear the way for the next commit. But if we reset any cached value > _before_ making the commit, this has a few advantages: Looks fine to me. It should trivially fix the git-am issue, and I can't see what it could break. Famous last words. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: topological index field for commit objects
On Thu, Jun 30, 2016 at 3:30 AM, Jakub Narębskiwrote: > > P.S. Having Git ensure that committerdate (as an epoch) is greater > than committerdates of its parents at the commit creation time (with > providing warning about time skew, perhaps not doing it if skew is > too large) would be not costly. No need to add anything, and it would > help future queries to be faster, I think. So I think git should check the committer date (ie verify that the commitetr date of a new commit is always equal to or more recent than all the parents). But we should probably allow some slop for when machines are out of sync (think build farms etc automation that may do automated commits, but the clocks on different machines may be off by a few seconds). We already do things like that in some of the use of committer dates - see for example CUTOFF_DATE_SLOP) And we should probably allow the user to override the refusal to create new commits with bogus dates - think importing repositories etc where you may have reasons to just say "that's just how it was". And it will take years for that to percolate out to all users, so it's not like it will fix things immediately, but then some time from now, we can consider commit dates to be more reliably generation numbers. I do think that it's ok to cache generation numbers somewhere if there is an algorithm that can make use of them, but every time this comes up, it's just not been important enough to make a big deal and a new incompatible object format for it. The committer date is preexisting and has existing pseudo-generation-number usage, so..improving on the quality of it sounds like a good idea. The first step should probably be to make fsck warn about the existing cases of "commit has older date than parents". Something like the attached patch, perhaps? Linus fsck.c | 13 + 1 file changed, 13 insertions(+) diff --git a/fsck.c b/fsck.c index 05315451c56f..722b65371d38 100644 --- a/fsck.c +++ b/fsck.c @@ -60,6 +60,7 @@ FUNC(NULL_SHA1, WARN) \ FUNC(ZERO_PADDED_FILEMODE, WARN) \ FUNC(NUL_IN_COMMIT, WARN) \ + FUNC(DATE_ORDERING, WARN) \ /* infos (reported as warnings, but ignored by default) */ \ FUNC(BAD_TAG_NAME, INFO) \ FUNC(MISSING_TAGGER_ENTRY, INFO) @@ -604,6 +605,17 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option return 0; } +static void fsck_commit_date(struct fsck_options *options, struct commit *commit) +{ + struct commit_list *p; + + for (p = commit->parents; p; p = p->next) { + struct commit *parent = p->item; + if (commit->date < parent->date) + report(options, >object, FSCK_MSG_DATE_ORDERING, "Bad commit date ordering with parent"); + } +} + static int fsck_commit_buffer(struct commit *commit, const char *buffer, unsigned long size, struct fsck_options *options) { @@ -650,6 +662,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, return err; } } + fsck_commit_date(options, commit); author_count = 0; while (skip_prefix(buffer, "author ", )) { author_count++;