Re: RFC v3: Another proposed hash function transition plan
Hi Brandon, On Mon, 11 Sep 2017, Brandon Williams wrote: > On 09/08, Junio C Hamano wrote: > > Junio C Hamanowrites: > > > > > One thing I still do not know how I feel about after re-reading the > > > thread, and I didn't find the above doc, is Linus's suggestion to > > > use the objects themselves as NewHash-to-SHA-1 mapper [*1*]. > > > ... > > > [Reference] > > > > > > *1* > > > > I think this falls into the same category as the often-talked-about > > addition of the "generation number" field. It is very tempting to add > > these "mechanically derivable but expensive to compute" pieces of > > information to the sha3-content while converting from sha1-content and > > creating anew. > > We didn't discuss that in the doc since this particular transition plan > we made uses an external NewHash-to-SHA1 map instead of an internal one > because we believe that at some point we would be able to drop > compatibility with SHA1. Is there even a question about that? I mean, why would *any* project that switches entirely to SHA-256 want to carry the SHA-1 baggage around? So even if the code to generate a bidirectional old <-> new hash mapping might be with us forever, it *definitely* should be optional ("optional" at least as in "config setting"), allowing developers who only work with new-hash repositories to save the time and electrons. > Now I suspect that wont happen for a long time but I think it would be > preferable over carrying the SHA1 luggage indefinitely. It should be possible to push back the SHA-1 ginny into a small gin bottle inside Git's source code, so to say, i.e. encapsulate it to the point where it is a compile-time option, in addition to a runtime option. Of course, that's only unless the SHA-1 calculation is made mandatory as suggested above. I really shudder at the idea of requiring SHA-1 to be required forever. We ignored advice in 2005 against making ourselves too dependent on SHA-1, and I would hope that we would learn from this. > At some point, then, we would be able to stop hashing objects twice > (once with SHA1 and once with NewHash) instead of always requiring that > we hash them with each hash function which was used historically. Yes, please. > > Because the "sha1-name" or the "generation number" can mechanically > > be computed, ... as long as a shallow clone you do not have, of course... > > as long as everybody agrees to _always_ place them in the > > sha3-content, the same sha1-content will be converted into exactly the > > same sha3-content without ambiguity, and converting them back to > > sha1-content while pushing to an older repository will correctly > > produce the original sha1-content, as it would just be the matter of > > simply stripping these extra pieces of information. ... or Git would simply handle the absence of the generation number header gracefully, so that sha1-content == sha3-content... > > The same thing could happen if we decide to bake "generation number" > > in the SHA-3 commit objects. One possible definition would be that a > > root commit will have gen #0; a commit with 1 or more parents will get > > max(parents' gen numbers) + 1 as its gen number. But somebody may > > botch the counting and records sum(parents' gen numbers) as its gen > > number. > > > > In these cases, not just the SHA3-content but also the resulting SHA-3 > > object name would be different from the name of the object that would > > have recorded the same contents correctly. So converting back to > > SHA-1 world from these botched SHA-3 contents may produce the original > > contents, but we may end up with multiple "plausibly looking" set of > > SHA-3 objects that (clain to) correspond to a single SHA-1 object, > > only one of which is a valid one. > > > > Our "git fsck" already treats certain brokenness (like a tree whose > > entry has mode that is 0-padded to the left) as broken but still > > tolerate them. I am not sure if it is sufficient to diagnose and > > declare broken and invalid when we see sha3-content that records > > these "mechanically derivable but expensive to compute" pieces of > > information incorrectly. > > > > I am leaning towards saying "yes, catching in fsck is enough" and > > suggesting to add generation number to sha3-content of the commit > > objects, and to add even the "original sha1 name" thing if we find > > good use of it. But I cannot shake this nagging feeling off that I > > am missing some huge problems that adding these fields and opening > > ourselves to more classes of broken objects. > > > > Thoughts? Seeing as current Git versions would always ignore the generation number (and therefore work perfectly even with erroneous baked-in generation numbers), and seeing as it would be easy to add a config option to force Git to ignore the embedded generation numbers, I would consider `fsck` catching those problems the best idea. It
Bug: git branch --unset-upstream command can nuke config when disk is full.
After being away for a while I saw the following message in one of my git repos: $ git status On branch yves/xxx Your branch is based on 'origin/yves/xxx', but the upstream is gone. (use "git branch --unset-upstream" to fixup) nothing to commit, working tree clean $ git branch --unset-upstream fatal: could not unset 'branch.yves/simple_projection.merge' At this point my .git/config file was empty, and all of my config was lost. I assume that things that rewrite .git/config do not check for a successful write before deleting the old version of the file. This was git version 2.14.1 Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Re: [idea] File history tracking hints
Hi Philip, On Mon, 11 Sep 2017, Philip Oakley wrote: > From: "Pavel Kretov"> > Hi all, > > > > Excuse me if the topic I'm going to raise here has been already discussed > > on the mailing list, forums, or IRC, but I couldn't find anything related. > > > > > > The problem: > > > > Git, being "a stupid content tracker", doesn't try to keep an eye on > > operations which happens to individual files; things like file renames > > aren't recorded during commit, but heuristically detected later. > > > > Unfortunately, the heuristic can only deal with simple file renames with > > no substantial content changes; it's helpless when you: > > > > - rename file and change it's content significantly; > > - split single file into several files; > > - merge several files into another; > > - copy entire file from another commit, and do other things like these. > > > > However, if we're able to preserve this information, it's possible > > not only to do more accurate 'git blame', but also merge revisions with > > fewer conflicts. > > > > > > The proposal: > > > > The idea is to let user give hints about what was changed during > > the commit. For example, if user did a rename which wasn't automatically > > detected, he would append something like the following to his commit > > message: > > > >Tracking-hints: rename dev-vcs/git/git-1.0.ebuild -> > > dev-vcs/git/git-2.0.ebuild > > > > or (if full paths of affected files can be unambiguously omitted): > > > >Tracking-hints: rename git-1.0.ebuild -> git-2.0.ebuild > > > > There may be other hint types: > > > >Tracking-hint: recreate LICENSE.txt > >Tracking-hint: split main.c -> main.c cmdline.c > >Tracking-hint: merge linalg.py <- vector.py matrix.py > > > > or even something like this: > > > >Tracking-hint: copy json.py <- > > libs/json.py@4db88291251151d8c5c8e4f20430fa4def2cb2ed > > > > If file transformation cannot be described by a single tracking hint, it > > shall > > be possible to specify a sequence of hints at once: > > > >Tracking-hint: > >split Utils.java -> AppHelpers.java StringHelpers.java > >recreate Utils.java > > > > Note that in the above example the order of operations really matters, so > > both lines have to reside in one 'Tracking-hint' block. > > > > * * * > > > > How do you think, is this idea worth implementing? > > Any other thoughts on this? > > > > -- Pavel Kretov. > > Maybe use the "interpret-trailers" methods for standardising your hints > locally (in your team / workplace) to see how it goes and flesh out what works > and what doesn't. Trying to decide, a-priori, what are the right hints is > likely to be the hard part. I think this adds a very valuable insight to this discussion: the current state of Git's rename handling is based on the idea that you either record the renames, or you detect them. Like, there is either "on" or "off". No middle ground. However, if you understand that there is also the possibility of hints that can help any erroneous rename detection (and *everybody* who seriously worked on a massive code base has seen that rename detection fail in the most inopportune ways [*1*]), then you are on to something. So I totally like the idea of introducing hints, possibly as trailers in the commit message (or as refs/notes/rename/* or whatever) that can be picked up by Git versions that know about them, and can be ignored by Git versions that insist on the rename detection du jour. With a config option to control the behavior, maybe, too. Ciao, Dscho Footnote *1*: Just to name a couple of examples from my personal experience, off the top of my head: - license boiler plates often let Git detect renames/copies where there are none, - even something as trivial as moving Java classes (and their dependent classes) between packages changes every line referring to said packages, causing Git's rename detection to go for a drink instead of doing its job, - indentation changes overwhelm Git's rename detection, - when rename detection would matter most, like, really a lot, to lift the burden of the human beings in front of the computer pouring over hundreds of thousands of files moved from one directory tree to another, that's exactly when Git's rename detection says that there are too many files, here are my union rights, I am going home, good luck to you. In light of such experiences, I have to admit that the notion that the rename detection can always be improved in hindsight puts quite a bit of insult to injury for those developers who are bitten by it.
Re: [PATCH] commit-template: change a message to be more intuitive
On Wed, Sep 13, 2017 at 12:29:15PM +0200, Kevin Daudt wrote: > On Tue, Sep 12, 2017 at 04:25:36PM +0530, Kaartic Sivaraam wrote: > > It's not possible to 'touch' the cut-line that is shown when the > > user requests a patch in his commit template. > > > > Touching something can also mean to disturb or change something, which > is the meaning being used here, so it is not an incorrect use of the > word. > > > So, make the sentence more intuitive. > > I can understand however that it might be not so clear for people with > less fluency in English. I agree with both of your points. It is very clear to me as a native speaker, but I can see how it might not be for everyone. Interestingly, the change here: > > - const char *explanation = _("Do not touch the line above.\nEverything > > below will be removed."); > > + const char *explanation = _("Do not edit the line above.\nEverything > > below will be removed."); actually seems less clear to me. I think of "edit" as "modify". But obviously it also should not be removed. Perhaps Do not modify or remove the line above. would be the most clear. Or perhaps it is overly verbose. -Peff
Re: [PATCH 3/4] replace-objects: evaluate replacement refs without using the object store
On Wed, Sep 13, 2017 at 10:03:58AM +0200, Michael Haggerty wrote: > On 09/12/2017 07:31 PM, Jonathan Nieder wrote: > > From: Stefan Beller> > > > Pass DO_FOR_EACH_INCLUDE_BROKEN when iterating over replacement refs > > so that the iteration does not require opening the named objects from > > the object store. This avoids a dependency cycle between object access > > and replace ref iteration. > > > > Moreover the ref subsystem has not been migrated yet to access the > > object store via passed in repository objects. As a result, without > > this patch, iterating over replace refs in a repository other than > > the_repository it produces errors: > > > >error: refs/replace/3afabef75c627b8943bcae86837abc7c32fe does not > > point to a valid object! > > > > Noticed while adapting the object store (and in particular its > > evaluation of replace refs) to handle arbitrary repositories. > > Have you checked that consumers of this API can handle broken > references? Aside from missing values, broken references can have > illegal names (though hopefully not unsafe in the sense of causing > filesystem traversal outside of `$GITDIR`). It looks like there are only two callers. One complains about names that aren't 40-hex. The other ("git replace -l") parses the 40-hex in "long" mode, but will print anything in short mode (not that printing is very dangerous). I do have to wonder if for_each_replace_ref() should just do the 40-hex syntactic check itself (and issue a warning for other refs). It seems like that would be the point of calling for_each_replace_ref() instead of just for_each_ref_in("refs/replace") yourself, and both of the callers would benefit. -Peff
Re: BUG: attempt to trim too many characters
On Tue, Sep 12, 2017 at 09:29:49PM -0700, Linus Torvalds wrote: > Just reminding people that this issue would seem to still exist in > current master.. Yeah, the fix is in 1d0538e4860, but it's still working it's way up through the integration branches. -Peff
Re: [PATCH] commit-template: change a message to be more intuitive
On Tue, Sep 12, 2017 at 04:25:36PM +0530, Kaartic Sivaraam wrote: > It's not possible to 'touch' the cut-line that is shown when the > user requests a patch in his commit template. > Touching something can also mean to disturb or change something, which is the meaning being used here, so it is not an incorrect use of the word. > So, make the sentence more intuitive. I can understand however that it might be not so clear for people with less fluency in English. > > Signed-off-by: Kaartic Sivaraam> --- > Just a small tweak. May or may not be worth the patch. > > wt-status.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/wt-status.c b/wt-status.c > index 77c27c511..1f54b1df2 100644 > --- a/wt-status.c > +++ b/wt-status.c > @@ -934,7 +934,7 @@ size_t wt_status_locate_end(const char *s, size_t len) > > void wt_status_add_cut_line(FILE *fp) > { > - const char *explanation = _("Do not touch the line above.\nEverything > below will be removed."); > + const char *explanation = _("Do not edit the line above.\nEverything > below will be removed."); > struct strbuf buf = STRBUF_INIT; > > fprintf(fp, "%c %s", comment_line_char, cut_line); > -- > 2.14.1.1006.g90ad9a07c >
BUSINESS PROPOSAL
Please I like you to keep this proposal as a top secret and delete it if you are not interested and get back to me if you are interested for details as regards to the transfer of $24,500,000 to you. This money initially belongs to a Libyan client who died in the libya crisis and had no next of kin in his account-opening package in my bank here in Hong kong where I am a bank director. In other to achieve this, I shall require your full name, and telephone number to reach you.Most importantly, a confirmation of acceptance from you will be needed after which I shall furnish you with the full details of this transaction. Kindly reply to linglung0...@gmail.com Respectfully, Ling Lung
Re: Unexpected pass for t6120-describe.sh on cygwin
Johannes Schindelin venit, vidit, dixit 12.09.2017 15:39: > Hi Ramsay, > > On Sat, 9 Sep 2017, Ramsay Jones wrote: > >> I ran the test-suite on the 'pu' branch last night (simply because that >> was what I had built at the time!), which resulted in a PASS, but t6120 >> was showing a 'TODO passed' for #52. >> >> This is a test introduced by Michael's 'mg/name-rev-tests-with-short-stack' >> branch, which uses 'ulimit -s' to try and force a stack overflow. >> Unfortunately, 'ulimit -s' seems to have no effect on cygwin. I created >> a test program (see below) to eat up the stack and tried running it with >> various ulimit values (128, 12, 8), but it always seg-faulted at the >> same stack-frame. (after using approx 2MB stack space). >> >> So, it looks like all ULIMIT_STACK_SIZE tests need to be disabled >> on cygwin. I also wonder about the ULIMIT_FILE_DESCRIPTORS tests, but >> haven't looked into it. >> >> Given that 'ulimit' is a bash built-in, this may also be a problem on >> MinGW and Git-For-Windows, but I can't test on those platforms. > > It is. Thanks. I just dug something up from an old cygwin list post. Could you please try and report on the following (cygwin, MinGW): ulimit -s ulimit -s 4096 # anything lower than the value from above ulimit -s bash -c "ulimit -s" My suspicion from that post is that ulimit affects the sub shell only - if yes, running a test inside the sub shell to confirm whether the setting is in effect would be great, of course. /usr/bin/echo $(seq 10) or such does the job on Linux (with stack limit 128), but I don't know whether this is portably reliable. If ulimit on these platforms has no effect but does not lie about the value either it's a simple prerequisite test (test ulimit present, if yes set and get and compare the stack size values). Michael
Re: Buffered value should be shown when requesting username for remote authentication
On Tuesday 12 September 2017 09:03 PM, Jeff King wrote: If I understand right, you typed "sivaraam" once, then the network lagged, then you typed "sivaraam" again. Almost there but I should have been more clearer. What I actually did was I run `git push` and knowing it would ask for a username I started typing it immediately without looking at the terminal to notice that the request for the username hasn't shown up yet due to the slow network. the terminal was like this before the request showed up, $ git push sivaraam After the request showed up the terminal was like, $ git push sivaraamUsername for 'https://github.com' : I thought the buffered input wasn't recognised as it didn't show up after the request. So, I typed the username once more to get this, $ git push -u fork sivaraamUsername for 'https://github.com' : sivaraam Password for 'https://sivaraamsivar...@github.com : I have been accustomed with this now a days that I don't do the same mistake now. I thought it would nice if this could be fixed as I have seen utilities show the buffered input after the request but as I stated before I'm not able to recollect which utility it was (hope I had better memory) That isn't really something that Git can fix reliably. Reading those characters and echoing them back to the terminal is handled by your terminal driver (and potentially other things like ssh). Git may have received "sivaraamsivaraam" all at once, depending on where the lag is. I expected this but wasn't sure which "potential thing" was handling the 'https' handling in the background. I guess it's 'curl' but not sure. --- Kaartic
Re: [PATCH 4/4] packed refs: pass .git dir instead of packed-refs path to init_fn
On 09/12/2017 07:32 PM, Jonathan Nieder wrote: > From: Stefan Beller> > The first argument of a ref_store_init_fn is documented to represent > the $GIT_DIR, not the path to the packed-refs file. This brings the > packed-refs store more in line with the usual ref store interface. > > Signed-off-by: Jonathan Nieder > Signed-off-by: Stefan Beller > --- > That's the end of the series. Thanks for reading. > > refs/files-backend.c | 4 ++-- > refs/packed-backend.c | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/refs/files-backend.c b/refs/files-backend.c > index fccbc24ac4..3b8e13a8b7 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -57,8 +57,8 @@ static struct ref_store *files_ref_store_create(const char > *gitdir, > refs->gitdir = xstrdup(gitdir); > get_common_dir_noenv(, gitdir); > refs->gitcommondir = strbuf_detach(, NULL); > - strbuf_addf(, "%s/packed-refs", refs->gitcommondir); > - refs->packed_ref_store = packed_ref_store_create(sb.buf, flags); > + refs->packed_ref_store = > + packed_ref_store_create(refs->gitcommondir, flags); > strbuf_release(); > > return ref_store; > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 412c85034f..2c5db15279 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -78,7 +78,7 @@ struct packed_ref_store { > struct tempfile tempfile; > }; > > -struct ref_store *packed_ref_store_create(const char *path, > +struct ref_store *packed_ref_store_create(const char *common_dir, > unsigned int store_flags) > { > struct packed_ref_store *refs = xcalloc(1, sizeof(*refs)); > @@ -87,7 +87,7 @@ struct ref_store *packed_ref_store_create(const char *path, > base_ref_store_init(ref_store, _be_packed); > refs->store_flags = store_flags; > > - refs->path = xstrdup(path); > + refs->path = xstrfmt("%s/packed-refs", common_dir); > return ref_store; > } > > This little patch, perhaps surprisingly, brings up some deeper issues. First of all there is a superficial naming inconsistency. This method is called `init` in the vtable, but the functions implementing it are called `*_ref_store_create()`. It actually initializes the data structures for dealing with the reference store, as opposed to initializing the reference store on disk (*that* virtual function is called `init_db`). It should maybe be called `open` instead. But more fundamentally, what is a `ref_store`? Originally it always represented a *logical* place to store all of the references for a repository. In that sense, it made sense to have an "open" method that takes a `gitdir` as argument. But its role is currently getting stretched. Now it sometimes represents a *physical* place to store references, which might be combined with other physical `ref_store`s to make a logical `ref_store`. There is not necessarily a 1:1 correspondence between gitdirs and physical `ref_store`s; more to the point you might want to initialize a physical refstore that doesn't correspond to `$GITDIR`. So requiring every `ref_store` to have a factory function with a gitdir argument is probably incorrect. For example, a subtree has a single logical reference store, but it is composed out of three physical reference stores: the loose references in the subtree's gitdir plus loose and packed references in the common gitdir. It seems to me that we need two separate concepts: 1. Create an object representing a gitdir's *logical* `ref_store`. This "class method" could always have a single gitdir as parameter. This would be the method by which the repository initialization code instantiates its logical `ref_store`, for example by reading the type of the reference store from the repo's config, then looking up the class by its type, then calling its "open" method to create an instance. 2. Create an object representing a physical `ref_store`. Different reference store classes might have different "constructor" arguments. For example, if it represents a namespace within a larger reference tree contained in a shared repository, its arguments might be `(shared_gitdir, namespace)`. (CC to Richard Maw for this angle.) Since a `packed_ref_store` is definitely a physical `ref_store`, the function that we're talking about here is the second type, so I don't see a need for it to take a gitdir as argument. OTOH, the function certainly doesn't belong in the vtable slot for `init`, because a `packed_ref_store` can't serve as a repository's logical reference store. Did you have a particular reason for changing this now, aside from "consistency"? If not, feel free to drop this patch and I'll implement something more like what I have described above when I have time. Michael
Re: [PATCH 3/4] replace-objects: evaluate replacement refs without using the object store
On 09/12/2017 07:31 PM, Jonathan Nieder wrote: > From: Stefan Beller> > Pass DO_FOR_EACH_INCLUDE_BROKEN when iterating over replacement refs > so that the iteration does not require opening the named objects from > the object store. This avoids a dependency cycle between object access > and replace ref iteration. > > Moreover the ref subsystem has not been migrated yet to access the > object store via passed in repository objects. As a result, without > this patch, iterating over replace refs in a repository other than > the_repository it produces errors: > >error: refs/replace/3afabef75c627b8943bcae86837abc7c32fe does not > point to a valid object! > > Noticed while adapting the object store (and in particular its > evaluation of replace refs) to handle arbitrary repositories. Have you checked that consumers of this API can handle broken references? Aside from missing values, broken references can have illegal names (though hopefully not unsafe in the sense of causing filesystem traversal outside of `$GITDIR`). Michael
Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives
On Tuesday 12 September 2017 08:35 PM, Jeff King wrote: But theta-well isn't a pun. :P :) It is true that prepending to a linked list is also Θ(1), but I'm not sure if it's carelessness that causes many programmers to use big-O. It's that what we care about is worst-case performance. So knowing that we have a lower bound isn't usually that interesting. What we want to know is whether it will blow up in our face as "n" gets large. This seems quite acceptable. Plus typing non-ascii characters is annoying. :) I expected this to be a reason. :) --- Kaartic
Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives
On Tuesday 12 September 2017 08:59 PM, Jeff King wrote: Like all good writing rules, I think it's important to know when to break them. :) That's right. "Have guidelines but 'Be bold' enough to break them when they seem to be inducing counter productivity." Writing in the imperative is _most_ important in the subject. You're likely to see a lot of subjects in a list, and it makes the list easier to read if they all match. It also tends to be shorter, which is good for subjects. For short commit messages, I think the imperative also keeps things tight and to the point: describe the problem and then say how to fix it. The recent 0db3dc75f is a good example (which I picked by skimming recent "git log" output). But saying "this patch" is IMHO not that big a problem there, as long as it isn't done excessively. When you the explanation is longer or more complicated, the imperative can actually be a bit _too_ terse. In longer text it helps to guide readers in the direction you want their thoughts to take. Having a three-paragraph explanation of the problem or current state of things and then jumping right into "Do this. Do that." lacks context. A marker like "this patch" helps the reader know that you're switching gears to talking about the solution. I also think that "I" is useful in avoiding the passive voice. It can certainly be used gratuitously and make things less clear, but in most cases I'd rather see something like "I tested performance under these conditions" than "Performance was tested under these conditions". I also often use the "academic we" here even when I worked on something myself. Thanks for taking the time to give the detailed and clear explanation. --- Kaartic