Looking for data on git
Hello, my name is Jensen. I am interested in git. I would be glad to hear from you. Greetings from the States, -- Cal Dershowitz aka Merrill Jensen in real life -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)
On 07/13/2016 12:20 AM, Joey Hess wrote: Torsten Bögershausen wrote re jh/clean-smudge-annex: The thing is that we need to check the file system to find .gitattributes, even if we just did it 1 nanosecond ago. So the .gitattributes lookup is done 3 times: -1 would_convert_to_git_filter_fd( -2 assert(would_convert_to_git_filter_fd(path)); -3 convert.c/convert_to_git_filter_fd() The only situation where this could be useful is when the .gitattributes change between -1 and -2, but then they would have changed between -1 and -3, and convert.c will die(). Does it make sense to remove -2 ? There's less redundant work going on than at first seems, because .gitattributes files are only read once and cached. Verified by strace. OK, I think I missed that work (not enough time for Git at the moment). Junio, please help me out, do we have a cache here now? I tried to figure that out by following your attr branch, but failed. So, the redundant work is only in the processing that convert_attrs() does of the cached .gitattributes. Notice that there was a similar redundant triple call to convert_attrs() before my patch set: 1. index_fd checks would_convert_to_git_filter_fd 2. index_stream_convert_blob does assert(would_convert_to_git_filter_fd(path)) (Again redundantly, since 1. is its only caller and has already checked.) 3. in convert_to_git_filter_fd If convert_attrs() is somehow expensive, it might be worth passing a struct conv_attrs * into the functions that currently call convert_attrs(). But it does not look expensive to me. I have that on the list, but it seems to be unnecessary now. I think it would be safe enough to remove both asserts, at least as the code is structured now. OK.
Re: Https password present in git output
On Thu, Jul 14, 2016 at 01:36:52AM +0300, ervion wrote: > It is in fact the case, that git fetch output is scrubbed, sorry I did not > notice previously. > But (on my device: git version 2.9.0 arch linux) git push is not. > $ git push origin --all Maybe this? -- >8 -- Subject: [PATCH] push: anonymize URL in status output Commit 47abd85 (fetch: Strip usernames from url's before storing them, 2009-04-17) taught fetch to anonymize URLs. The primary purpose there was to avoid sticking passwords in merge-commit messages, but as a side effect, we also avoid printing them to stderr. The push side does not have the merge-commit problem, but it probably should avoid printing them to stderr. We can reuse the same anonymizing function. Note that for this to come up, the credentials would have to appear either on the command line or in a git config file, neither of which is particularly secure. So people _should_ be switching to using credential helpers instead, which makes this problem go away. But that's no excuse not to improve the situation for people who for whatever reason end up using credentials embedded in the URL. 
Signed-off-by: Jeff King
---
 t/t5541-http-push-smart.sh | 7 +++
 transport.c                | 7 +--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index fd7d06b..8d08e06 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -368,5 +368,12 @@ test_expect_success GPG 'push with post-receive to inspect certificate' '
 	test_cmp expect "$HTTPD_DOCUMENT_ROOT_PATH/push-cert-status"
 '
 
+test_expect_success 'push status output scrubs password' '
+	test_commit scrub &&
+	git push --porcelain "$HTTPD_URL_USER_PASS/smart/test_repo.git" >status &&
+	# should have been scrubbed down to vanilla URL
+	grep "^To $HTTPD_URL/smart/test_repo.git" status
+'
+
 stop_httpd
 test_done
diff --git a/transport.c b/transport.c
index 095e61f..be4a63e 100644
--- a/transport.c
+++ b/transport.c
@@ -359,8 +359,11 @@ static void print_ok_ref_status(struct ref *ref, int porcelain)
 static int print_one_push_status(struct ref *ref, const char *dest,
 				 int count, int porcelain)
 {
-	if (!count)
-		fprintf(porcelain ? stdout : stderr, "To %s\n", dest);
+	if (!count) {
+		char *url = transport_anonymize_url(dest);
+		fprintf(porcelain ? stdout : stderr, "To %s\n", url);
+		free(url);
+	}
 
 	switch(ref->status) {
 	case REF_STATUS_NONE:
--
2.9.1.356.g3c37bc7
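The scrubbing that transport_anonymize_url() performs can be sketched in a few lines of Python. This is only an illustration of the behavior (dropping the userinfo part of the URL); git's actual implementation is in C and handles more URL forms:

```python
from urllib.parse import urlsplit, urlunsplit

def anonymize_url(url):
    """Drop the userinfo (user:password) portion of a URL, keeping the rest.

    Illustrative sketch of what transport_anonymize_url() achieves;
    not git's real code.
    """
    parts = urlsplit(url)
    # netloc is "user:pass@host:port"; keep only the part after the last '@'
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(anonymize_url("https://username:secret@domeen.com/git/repo.git"))
# https://domeen.com/git/repo.git
```

A URL with no embedded credentials passes through unchanged, which is why the same function can be applied unconditionally in the status path.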
Re: Https password present in git output
I completely agree that it is not a head-on-fire kind of problem; there are ways to avoid it. It is simply nice to have. It is in fact the case that git fetch output is scrubbed; sorry I did not notice previously. But (on my device: git version 2.9.0, Arch Linux) git push is not. $ git push origin --all Results in: /---/ To https://username:passw...@domeen.com/git/repo.git xxx..zzz master -> master On 13.07.2016 21:16, Junio C Hamano wrote: On Wed, Jul 13, 2016 at 11:09 AM, Junio C Hamano wrote: ervion writes: Sometimes using ssh is not possible, and saving the https password in plain text to disk may be desirable (in the case of an encrypted disk it would be of equivalent security with caching the password in memory). One possibility for this in git is to save the remote in the https://username:passw...@domain.com/repo.git format. Wasn't netrc support added exactly because users do not want to do this? Interesting. Even with "auth in URL", I seem to get this: $ git fetch -v -v https://gitster:p...@github.com/git/git refs/tags/v2.9.1 From https://github.com/git/git * tag v2.9.1 -> FETCH_HEAD Notice that "From $URL" has the userinfo (3.2.1 in RFC 3986) scrubbed. If you are seeing somewhere we forgot to scrub userinfo in a similar way in the output, we should. Where do you see "present in git OUTPUT" as you said in the subject? What command with what options exactly, and in what part of the output? Thanks.
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
On Wed, Jul 13, 2016 at 03:41:01PM -0700, Junio C Hamano wrote: > Stefan Beller writes: > > >>> I think Shawn's proposal to have a receive.maxCommandBytes is a > >>> good way for an overall upper bound, but how does it stop us from > >>> going forward with this series? > >> > >> If we were to do maxcommandbytes, then max_options would become > >> irrelevant, no? > > > > Maybe? > > > > I do not know what kind of safety measures we want in place here, and > > if we want to go for overlapping things? > > > > Currently there are none at all in your upstream code, although you cannot > > push arbitrary large things to either Shawn's or Peff's $Dayjob servers, so > > I wonder if we want to either agree on one format or on many overlapping > > things, as some different hosts may perceive different things as DoS > > threats, > > so they can fine tune as they want? > > I think those extra knobs can come later. If we are not going to > limit with max_options in the end, however, wouldn't it be more > natural for the initial iteration without any configuration not to > have hard-coded max_options at all? Yeah, I am OK with adding restrictive knobs later as a separate topic. As Stefan notes, upstream does not have the other knobs anyway, and IIRC the push-options feature is not even enabled by default. -Peff
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
Stefan Beller writes: >>> I think Shawn's proposal to have a receive.maxCommandBytes is a >>> good way for an overall upper bound, but how does it stop us from >>> going forward with this series? >> >> If we were to do maxcommandbytes, then max_options would become >> irrelevant, no? > > Maybe? > > I do not know what kind of safety measures we want in place here, and > if we want to go for overlapping things? > > Currently there are none at all in your upstream code, although you cannot > push arbitrary large things to either Shawn's or Peff's $Dayjob servers, so > I wonder if we want to either agree on one format or on many overlapping > things, as some different hosts may perceive different things as DoS threats, > so they can fine tune as they want? I think those extra knobs can come later. If we are not going to limit with max_options in the end, however, wouldn't it be more natural for the initial iteration without any configuration not to have hard-coded max_options at all? As to the "SQUASH???" compilation fix, I can squash it to the one immediately below it locally; I didn't do so in today's pushout, as it was still unclear if you are already working on a reroll (in which case anything I would do would be a wasted effort).
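The two knobs under discussion could overlap roughly as follows. This is a sketch of the policy only: both limit names and their exact semantics are still being debated in the thread, and git's real implementation would be in C inside receive-pack:

```python
def check_push_options(options, max_options=None, max_command_bytes=None):
    """Enforce two overlapping limits on push options (hypothetical policy).

    max_options caps the number of options; max_command_bytes caps the
    total byte budget, as in Shawn's receive.maxCommandBytes proposal.
    """
    if max_options is not None and len(options) > max_options:
        raise ValueError("too many push options")
    total = sum(len(opt) + 1 for opt in options)  # +1 per line terminator
    if max_command_bytes is not None and total > max_command_bytes:
        raise ValueError("push options exceed byte budget")

# A host could fine-tune either limit, or both, to match its DoS concerns:
check_push_options(["ci.skip", "notify=none"], max_options=32, max_command_bytes=4096)
```

With only the byte limit configured, a count limit becomes redundant for most practical inputs, which is the point Junio raises about dropping the hard-coded max_options.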
Re: [PATCH v14 00/21] index-helper/watchman
On 07/12/2016 02:24 PM, Duy Nguyen wrote: Just thinking out loud. I've been thinking about this some more. After the move from signal-based to unix-socket communication, we are probably better off with a simpler design than the shm-like one we have now. What if we send everything over a socket or a pipe? Sending 500MB over a unix socket takes 253ms; that's insignificant when operations on an index that size usually take seconds. If we send everything over a socket/pipe, we can trust data integrity and don't have to verify, even the trailing SHA-1 in the shm file. I think it would be good to make index operations not take seconds. In general, we should not need to verify the trailing SHA-1 for shm data. The index-helper verifies it when it loads it, but git (e.g. git status) should not need to verify. Also, if we have two git commands running at the same time, the index-helper can only serve one at a time; with shm, both can run at full speed. So, what I have in mind is this: at read-index time, instead of opening a socket, we run a separate program and communicate via pipes. We can exchange capabilities if needed, then the program sends back the entire current index and the list of updated files (and/or the list of dirs to invalidate). The design looks very much like a smudge/clean filter. This seems very complicated. Now git status talks to the separate program, which talks to the index-helper, which talks to watchman. That is a lot of steps! I think we should wait until we hear from the Windows folks what the problems with the current solution are, and see what design they come up with. For people who don't want an extra daemon, they can write a short script that saves indexes somewhere in tmpfs and talks to watchman or something else. I haven't written this script, but I don't think it takes long to write one. Windows folks have total freedom to implement a daemon, a service or whatever, and use this program as a front end.
How the service talks to this program is totally up to them. For people who want to centralize everything, they can have just one daemon and have the script talk to this daemon. I can see that getting rid of file-based stuff simplifies some patches. We can still provide a daemon to do more advanced stuff (or to make it work out of the box). But it's not a hard requirement, and we probably don't need to include one right now. And I think it makes testing easier as well, because we can just go with some fake file-monitor service instead of real watchman. I think the daemon also has the advantage that it can reload the index as soon as it changes. This is not quite implemented, but it would be pretty easy to do. That would save a lot of time in the typical workflow.
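The quoted 500MB-in-253ms figure is easy to reproduce in miniature. Below is a rough timing toy (not git code) that ships bytes over an AF_UNIX socketpair; absolute numbers depend heavily on the machine, so no expected timing is claimed:

```python
import socket
import threading
import time

def time_transfer(nbytes, chunk=1 << 20):
    """Measure shipping nbytes over a unix socketpair in-process.

    Sketch only: a real index-helper would be a separate process, but the
    kernel copy cost being measured is the same.
    """
    a, b = socket.socketpair()  # AF_UNIX on POSIX
    payload = b"x" * chunk

    def sender():
        sent = 0
        while sent < nbytes:
            n = min(chunk, nbytes - sent)
            a.sendall(payload[:n])
            sent += n
        a.close()  # EOF tells the receiver we are done

    t = threading.Thread(target=sender)
    start = time.perf_counter()
    t.start()
    received = 0
    while True:
        buf = b.recv(1 << 16)
        if not buf:
            break
        received += len(buf)
    t.join()
    b.close()
    return received, time.perf_counter() - start

n, elapsed = time_transfer(16 << 20)  # 16 MB keeps the demo quick
print(n, elapsed)
```

Scaling the measured rate up to a 500MB index gives a feel for whether the "insignificant compared to seconds-long index operations" claim holds on a given box.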
Re: [feature request] Warn about or prevent --amend commits that don't change anything
Junio C Hamano wrote: Bergi writes: when nothing is staged in the index then `git commit` warns about this fact with either "nothing to commit, working directory clean" or "no changes added to commit". However, `git commit --amend --no-edit` will happily record a new commit that differs in nothing but its commit date from the original. What kind of "mistake" are you afraid of? The particular mistake that happened to me was that I `git push -f origin`'ed the newly created commit (which worked as expected, overwriting the original commit) and then left my PC in the belief that I had successfully pushed my changes. I can sort of see that "git commit --amend" might want to see two summary diffstat outputs at the end, unlike "git commit" that shows what changes were recorded relative to the parent. In addition to that "final result is different in this way from the parent" output, you might also want "this is the change you made by amending", and knowing the fact that you can notice you didn't add anything by the latter being empty _might_ give you an additional peace of mind. Yes, that would be incredibly helpful as well, but is not what I am after here. However, `git commit --amend` already shows me that I still have "changes not staged for commit" when editing the commit message, so that would've been enough for my use case. The problem is that I used `git commit --amend --no-edit`, which did neither warn me about missing staged changes nor existing unstaged changes. If nothing is staged and the user does not intend to edit any metadata, git should be able to deduce that the user (me) did forget something. But is that the kind of mistake you are worried about? IOW, you thought you made and added changes X, Y and Z to the index before running your "commit --amend", but you forgot the "add" step and did not add anything? Yes, exactly.
If so, […] your "you need --allow-empty if you really do not want any changes to the tree" would not [help much] either, if you added X and Y but forgot to add Z. A good remark. Maybe in that case it could warn about unstaged edits in the case of --no-edit but still succeed. So I am sensing a slight XY problem here. That might well be, I'm just here to tell my story, it's the designers that need to decide at which step I went wrong and which parts could be improved. Kind regards, Bergi
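The check Bergi is asking for boils down to comparing HEAD's tree with the tree the index would commit. The helper below is hypothetical (git does not run such a check today) and shells out to real git plumbing as a sketch:

```python
import subprocess

def amend_would_be_noop(repo="."):
    """Would `git commit --amend --no-edit` record the same tree as HEAD?

    Hypothetical helper sketching the warning being requested: if the
    index's tree equals HEAD's tree, an amend only bumps the commit date.
    """
    def git(*args):
        return subprocess.check_output(("git", "-C", repo) + args).strip()

    head_tree = git("rev-parse", "HEAD^{tree}")
    index_tree = git("write-tree")  # the tree the next commit would record
    return head_tree == index_tree
```

As Junio notes, this only catches the "forgot to add anything" case; it cannot notice that you added X and Y but forgot Z, which is why he senses an XY problem.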
Re: [ANNOUNCE] Git v2.9.1
Junio C Hamano writes: > It is somewhat disturbing that nobody seems to be regularly building > on 32-bit platforms these days, which is the only reason I can think > of why this was never reported until it hit a maintenance track. > This should have been caught last week at f6a729f3 (Merge branch > 'jk/tzoffset-fix', 2016-07-06) when the topic hit 'master' at the > latest, and more preferably it should have already been caught last > month at 08ec8c5e (Merge branch 'jk/tzoffset-fix' into next, > 2016-06-28). > > Those who care about 32-bit builds need to start building and > testing 'next' and 'master' regularly, or similar breakages are > bound to continue happening X-<. > > Volunteers? We might eventually see a volunteer or two, but that hasn't happened yet, at least in the past few days. Does Travis CI testing have an option to run our tests on some 32-bit platforms?
Re: Bug report: --invert-grep and --author seems to be incompatible.
Jehan Pagès writes: > ... A better name would have been --invert-matches, or even just --invert. > And the man description should definitely be completed, IMO. Renaming the options would not work well without harming existing users, but a patch to update the documentation is certainly very welcome.
Re: Bug report: --invert-grep and --author seems to be incompatible.
Hi, On Wed, Jul 13, 2016 at 7:37 PM, Junio C Hamano wrote: > Jehan Pagès writes: > >>> I think --author=someone greps the "author " field in the commit >>> object looking for the hit with "someone", and your request asks to >>> show commits that either do not have "something" or was not written >>> by "someone", I would guess. >> >> Note that I can still see commits with "something", and I can also see >> commits by "someone" in my results. So my request actually asks for >> commits which neither have "something" nor are done by "someone". >> >> Anyway I don't think that's the expected result, hence it is still a bug. >> Am I wrong? > > Unlike "git grep", "git log" works with a boolean expression that does > not explicitly have a way to say "--and" and "--or", so only one > interpretation was chosen a long time ago. All the terms are > ORed together, and then the whole thing can be inverted with > "--invert-grep". i.e. you are telling an equivalent of > > $ git grep --not \( -e "author someone" --or -e "something" \) > > with the command line, and there is no way to combine the requested > match criteria (there are two, "author must be somebody" and > "something must be in the message") differently. > > Given that, that is the "right" expectation, and as long as you get > the behaviour, there is no "bug". I see. Well, it's a little counter-intuitive though, since the option is called --invert-grep. And the man page indicates: « Limit the commits output to ones with log message that do not match the pattern specified with --grep=. » So it gives the impression that the option is made only to invert the --grep part. Whereas in fact, you are saying it inverts any other optional selection (or at least also --author). A better name would have been --invert-matches, or even just --invert. And the man description should definitely be completed, IMO.
> You can call it a lack of feature, though, and patches to implement > a more flexible combination of match criteria like "git grep" allows > is always welcome. Maybe I will! It would be useful to have a little more capability for commit selection. Thanks. Jehan -- ZeMarmot open animation film http://film.zemarmot.net Patreon: https://patreon.com/zemarmot Tipeee: https://www.tipeee.com/zemarmot
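Junio's OR-then-invert semantics can be modeled in a few lines. This is a simplified illustration (real git matches patterns against the raw commit object text, not dictionary fields):

```python
def log_selects(commit, grep=(), authors=(), invert=False):
    """Model of how `git log` combines limiting patterns.

    All terms are ORed into one predicate; --invert-grep negates that
    whole disjunction, not just the --grep terms.
    """
    hit = (any(p in commit["message"] for p in grep)
           or any(a in commit["author"] for a in authors))
    return hit != invert  # XOR: invert flips the combined result

c1 = {"author": "someone", "message": "tweak something"}
c2 = {"author": "other", "message": "unrelated fix"}

# --grep=something --author=someone --invert-grep:
# c1 matches a term, so it is excluded; c2 matches none, so it is shown.
shown = [c["author"] for c in (c1, c2)
         if log_selects(c, grep=["something"], authors=["someone"], invert=True)]
print(shown)  # ['other']
```

This is exactly why the reporter still saw neither "something" commits nor "someone" commits: a hit on either term is enough to exclude a commit once the whole disjunction is inverted.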
Hi git
Hi git http://beabuyer.org/smell.php?cat=qcugur1unp6m3q98 Rgds Harsh
Re: [ANNOUNCE] Git v2.9.1
Johannes Schindelin writes: >> I was hoping to hear from you sooner and do v2.9.2 with your t0006 >> workaround with lazy-prereq changes from Peff (i.e. "Makes sense!" >> above), so that you do not have to do two releases in a row >> (i.e. skipping v2.9.1 saying "Git for Windows skipped that one >> because it was not quite right; this release fixes the issue" in >> your v2.9.2 announcement). > > I am sorry that you expected me to be more available. It is a pretty crazy > week already (and trying to get a Git for Windows v2.9.1 out of the door > after dropping everything else on Tuesday morning added quite a bit to the > load), so I am at times unable to even read the Git mailing list. Actually, this back-and-forth was an attempt to help you reduce the load by not having to worry about the t0006 workaround. Checking your inbox would have been quicker than writing another version of your own. > As I am more concerned with Jeff Hostetler's patch now (the "very verbose > porcelain status"; I merged the Pull Request after seeing no comments on > his mail, but then Peff provided some good feedback, so now I am not quite > certain how to go about it: revert, or wait if the 2nd iteration is > acceptable and take that), so I am not immediately releasing any version, > anyway. As I said, I'd be waiting for a reroll of that to queue on 'pu', but as a new feature, it won't appear in any of the v2.9.x releases.
Re: Multiple Keys in ssh-agent, fail to clone
Benjamin Fritsch writes: > I read the Changelog for 2.9 and couldn’t find any reference to changed key > handling. Is there anything that I can add to the `git clone` command to get > the old behavior? I do not think this has much to do with the version of Git, unless you are getting an updated SSH client together with your new version of Git from whoever distributes it. And it may not even be related to SSH version. Did you change your ~/.ssh/config recently by any chance? I personally do load many keys (one per destination) in the agent, and back when I didn't know better, I didn't have an IdentityFile line per host, i.e. the last lines in these two entries were missing in my ~/.ssh/config: Host foo.bar.com Protocol 2 User gitolite IdentityFile ~/.ssh/foo-bar-com Host foo.baz.com Protocol 2 User junio IdentityFile ~/.ssh/foo-baz-com If you try to do "ssh -v -v foo.bar.com" with such a configuration, you would observe that many keys are "offered" to the other side to see if it is the one that it recognises, and you end up offering too many of them before the right one. An output from such a failing session of "ssh -v" ends like this for me: $ ssh -v foo.bar.com ... debug1: Offering DSA public key: foo-baz-com debug1: Authentications that can continue: publickey debug1: Offering RSA public key: xxy-fsa-com debug1: Authentications that can continue: publickey debug1: Offering DSA public key: github debug1: Authentications that can continue: publickey debug1: ... debug1: Offering RSA public key: gitorious debug1: Authentications that can continue: publickey Received disconnect from 192.168.1.1: 2: Too many authentication failures for gitolite Disconnected from 192.168.1.1 If I do not have the "IdentityFile ~/.ssh/foo-bar-com" line for the "Host foo.bar.com" part, "ssh -v foo.bar.com" cannot know which one of the keys it has available can be used to authenticate you with foo.bar.com, so it ends up asking "do you know this key and would you allow me to access you?"
for each and every key. Having the line lets it use the appropriate key right at the beginning, would not leak other public keys you have (they are "public" keys, so "leak" is not that serious a problem, though), and your authentication is likely to succeed.
Re: [feature request] Warn about or prevent --amend commits that don't change anything
Bergi writes: > when nothing is staged in the index then `git commit` warns about this > fact with either "nothing to commit, working directory clean" or "no > changes added to commit". > However, `git commit --amend --no-edit` will happily record a new > commit that differs in nothing but its commit date from the original. > > This is unexpected and can lead to mistakes. What kind of "mistake" are you afraid of? I can sort of see that "git commit --amend" might want to see two summary diffstat outputs at the end, unlike "git commit" that shows what changes were recorded relative to the parent. In addition to that "final result is different in this way from the parent" output, you might also want "this is the change you made by amending", and knowing the fact that you can notice you didn't add anything by the latter being empty _might_ give you an additional peace of mind. But is that the kind of mistake you are worried about? IOW, you thought you made and added changes X, Y and Z to the index before running your "commit --amend", but you forgot the "add" step and did not add anything? If so, even the second diff, i.e. "this is what you changed by amending", would not help very much, and your "you need --allow-empty if you really do not want any changes to the tree" would not either, if you added X and Y but forgot to add Z. So I am sensing a slight XY problem here.
Re: [ANNOUNCE] Git v2.9.1
Hi Junio, On Wed, 13 Jul 2016, Junio C Hamano wrote: > Johannes Schindelin writes: > > > This was just a quick fix, intended to allow me to release Git for Windows > > v2.9.1 in a timely manner (which is now delayed for other reasons). > > ... > >> You'll need to adjust check_show as I did in my earlier patch. > > > > Makes sense! > > Hmph, so what is your preferred approach? You'll do your own v2.9.1 > that is different from others at t0006? I may do a Git for Windows v2.9.1 that is different from Git v2.9.1 in t0006, yes. Git for Windows still has tons of patches on top of Git because I seem to be unable to drain the patch queue as fast as I want, so I do not believe it is a big deal. > I was hoping to hear from you sooner and do v2.9.2 with your t0006 > workaround with lazy-prereq changes from Peff (i.e. "Makes sense!" > above), so that you do not have to do two releases in a row > (i.e. skipping v2.9.1 saying "Git for Windows skipped that one > because it was not quite right; this release fixes the issue" in > your v2.9.2 announcement). I am sorry that you expected me to be more available. It is a pretty crazy week already (and trying to get a Git for Windows v2.9.1 out of the door after dropping everything else on Tuesday morning added quite a bit to the load), so I am at times unable to even read the Git mailing list. As I am more concerned with Jeff Hostetler's patch now (the "very verbose porcelain status"; I merged the Pull Request after seeing no comments on his mail, but then Peff provided some good feedback, so now I am not quite certain how to go about it: revert, or wait if the 2nd iteration is acceptable and take that), so I am not immediately releasing any version, anyway. Ciao, Dscho
Multiple Keys in ssh-agent, fail to clone
Hey all, We recently upgraded from Git 2.8 to 2.9 and saw an issue when there are multiple keys added to my ssh-agent. I have two keys. - KeyA (my company key that has access to the repository I want to clone) - KeyB (just my personal key with access to my personal stuff) With both keys loaded and listed in `ssh-add -L`, cloning the repository fails. I tried to change the order of the keys in the agent, but neither KeyA, KeyB nor KeyB, KeyA will work. The only case that works is if I have KeyA loaded and no other key is added to the ssh-agent. Having multiple keys loaded works with Git 2.8 and Git 2.7 (I didn’t try older versions). Cloning fails with “Unauthorized Access” from our Git provider. (It’s Bitbucket in this case.) I read the Changelog for 2.9 and couldn’t find any reference to changed key handling. Is there anything that I can add to the `git clone` command to get the old behavior? Thank you for your help, Best Ben
Re: [ANNOUNCE] Git v2.9.1
Johannes Schindelin writes: > How about this one instead (which is part of the time_t-may-be-int64 > branch on https://github.com/dscho/git which I still have to complete, as > some unsigned longs still slipped out of my previous net)? It strikes me > as much more robust: Hmm, sorry, I do not see much upside here (iow, the 2038 test you came up with is as robust). When the internal time representation is updated from "unsigned long" to a signed and wider type [*1*], test-date has to stop reading the seconds-from-epoch input with strtol(), whose property that overflow will always result in LONG_MAX gives the robustness of the 2038 test, and needs to be updated. With this "is64bit" patch, you explicitly measure "unsigned long", knowing that our internal time representation currently is that type, and that would need to be updated when widening happens. So both need to be updated anyway in the future. The update to check_show Peff suggested is the same as the previous one, so there is no upside nor downside. The prerequisite name 64BITTIME that lost an underscore is harder to read, so there is a slight downside. Moving of lazy_prereq to test-lib might be an upside if we were planning to add a test that depends on the system having or not having 64-bit timestamps elsewhere, but I do not think of a reason why such a new test cannot go to t0006-date, which has the perfect name for such a test and is not overly long right now (114 lines). So, unless you have a more solid reason to reject the updated t0006 earlier in the thread, I am not sure what we'd gain by replacing it with this version. > -- snipsnap -- > From abe59dbb2235bb1d7aad8e78a66e196acb372ec8 Mon Sep 17 00:00:00 2001 > From: Johannes Schindelin > Date: Tue, 12 Jul 2016 13:19:53 +0200 > Subject: [PATCH] t0006: dates absurdly far require a 64-bit data type > > Git's source code refers to timestamps as unsigned longs.
> On 32-bit
> platforms, as well as on Windows, unsigned long is not large enough to
> capture dates that are "absurdly far in the future".
>
> Let's skip those tests if we know they cannot succeed.
>
> Signed-off-by: Johannes Schindelin
> ---
>  t/helper/test-date.c | 5 -
>  t/t0006-date.sh      | 6 +++---
>  t/test-lib.sh        | 2 ++
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/t/helper/test-date.c b/t/helper/test-date.c
> index d9ab360..1e12d93 100644
> --- a/t/helper/test-date.c
> +++ b/t/helper/test-date.c
> @@ -4,7 +4,8 @@ static const char *usage_msg = "\n"
>  " test-date relative [time_t]...\n"
>  " test-date show: [time_t]...\n"
>  " test-date parse [date]...\n"
> -" test-date approxidate [date]...\n";
> +" test-date approxidate [date]...\n"
> +" test-date is64bit\n";
>
>  static void show_relative_dates(char **argv, struct timeval *now)
>  {
> @@ -93,6 +94,8 @@ int main(int argc, char **argv)
>  		parse_dates(argv+1, &now);
>  	else if (!strcmp(*argv, "approxidate"))
>  		parse_approxidate(argv+1, &now);
> +	else if (!strcmp(*argv, "is64bit"))
> +		return sizeof(unsigned long) == 8 ? 0 : 1;
>  	else
>  		usage(usage_msg);
>  	return 0;
> diff --git a/t/t0006-date.sh b/t/t0006-date.sh
> index 04ce535..52f6b62 100755
> --- a/t/t0006-date.sh
> +++ b/t/t0006-date.sh
> @@ -31,7 +31,7 @@ check_show () {
>  	format=$1
>  	time=$2
>  	expect=$3
> -	test_expect_${4:-success} "show date ($format:$time)" '
> +	test_expect_success $4 "show date ($format:$time)" '
>  		echo "$time -> $expect" >expect &&
>  		test-date show:$format "$time" >actual &&
>  		test_cmp expect actual
> @@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +'
>
>  # arbitrary time absurdly far in the future
>  FUTURE="5758122296 -0400"
> -check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400"
> -check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +"
> +check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400" 64BITTIME
> +check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +" 64BITTIME
>
>  check_parse() {
>  	echo "$1 -> $2" >expect
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 0055ebb..4e1afb0 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -,3 +,5 @@ run_with_limited_cmdline () {
>  }
>
>  test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
> +
> +test_lazy_prereq 64BITTIME 'test-date is64bit'
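The arithmetic behind the 64BITTIME prerequisite is simple: the timestamp t0006 uses for its "absurdly far in the future" date does not fit in a 32-bit unsigned long, so only 64-bit builds can format it correctly:

```python
FUTURE = 5758122296  # seconds since epoch for 2152-06-19, as used by t0006

# A 32-bit unsigned long tops out at 2**32 - 1, which is below FUTURE,
# so the two check_show lines above must be skipped on 32-bit builds.
UINT32_MAX = 2**32 - 1
print(FUTURE > UINT32_MAX)   # True: out of range for a 32-bit counter
print(FUTURE & UINT32_MAX)   # what the value would silently wrap to
```

This is also why `sizeof(unsigned long) == 8` is the right runtime probe: it measures exactly the type git currently stores timestamps in, as Junio notes above.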
Issue with git p4 clone when using multiple depots and multiple branches
Hi, We have a perforce structure at work that looks like this: //depot/basic/{main,branch1,branch2}/component{1,2} //depot/extra/{main,branch1,branch2}/extra{1,2} I'm trying to create 3 branches, i.e. main/branch1/branch2, with all sub-components together, i.e. component1/component2/extra1/extra2. For this reason I have created a p4 client, with all 12 paths used like this: //depot/basic/main/component1/... //client.name/main/component1/... //depot/basic/main/component2/... //client.name/main/component2/... //depot/extra/main/extra1/... //client.name/main/extra1/... //depot/extra/main/extra2/... //client.name/main/extra2/... ... similar also for branch1 and branch2. I have set up my git config using these: git config git-p4.client client.name git config git-p4.branchList main/extra1 git config --add git-p4.branchList main/extra2 The problem is that all //depot/extra paths are ignored when I'm running the following command: > git p4 clone --use-client-spec --detect-branches //depot/basic@all //depot/extra@all --destination project I think the issue is in git-p4 and especially in importNewBranch. Instead of using: """ branchPrefix = self.depotPaths[0] + branch + "/" self.branchPrefixes = [ branchPrefix ] """ We should be using """ self.branchPrefixes = [x + branch + "/" for x in self.depotPaths] """ Same also for "p4ChangesForPaths([branchPrefix], ...". What do you think? Thanks, Nikos
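The proposed change can be exercised in isolation. A minimal sketch (a plain function standing in for the `importNewBranch` method, so the name `branch_prefixes` is illustrative):

```python
def branch_prefixes(depot_paths, branch):
    # Proposed importNewBranch behavior: build one prefix per depot path,
    # instead of using only depotPaths[0] as git-p4 currently does.
    return [x + branch + "/" for x in depot_paths]
```

With both depots configured, branch "branch1" then yields prefixes under //depot/basic and //depot/extra, so the //depot/extra changes would no longer be ignored.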
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Wed, 13 Jul 2016, Duy Nguyen wrote: > On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin >wrote: > > > > On Tue, 12 Jul 2016, Duy Nguyen wrote: > > > >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King wrote: > >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: > >> > > >> >> I'm not opposed to letting one worktree see everything, but this move > >> >> makes it harder to write new scripts (or new builtin commands, even) > >> >> that works with both single and multiple worktrees because you refer > >> >> to one ref (in current worktree perspective) differently. If we kill > >> >> of the main worktree (i.e. git init always creates a linked worktree) > >> >> then it's less of a problem, but still a nuisance to write > >> >> refs/worktree/$CURRENT/ everywhere. > >> > > >> > True. I gave a suggestion for the reading side, but the writing side > >> > would still remain tedious. > >> > > >> > I wonder if, in a worktree, we could simply convert requests to read or > >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? > >> > That makes it a read/write-time alias conversion, but the actual storage > >> > is just vanilla (so the ref storage doesn't need to care, and > >> > reachability just works). > >> > >> A conversion like that is already happening, but it works at > >> git_path() level instead and maps anything outside refs/ to > >> worktrees/$CURRENT. > > > > Wouldn't you agree that the entire discussion goes into a direction that > > reveals that it might simply be a better idea to require commands that want > > to have per-worktree refs to do that explicitly? > > No. Oh? So you do not see that we are already heading into serious trouble ourselves? > I prefer we have a single interface for dealing with _any_ worktree. I agree. So far, I did not see an interface, though, but the idea that we should somehow fake things so that there does not *have* to be an interface. > > The same holds true for the config, BTW. 
I really have no love for the > > idea to make the config per-worktree. It just holds too many nasty > > opportunities to violate the Law of Least Surprises. > > > > Just to name one: imagine you check out a different branch in worktree A, > > then switch worktree B to the branch that A had, and all of a sudden you > > may end up with a different upstream! > > Everything in moderation. You wouldn't want to enable sparse checkout > on one worktree and it suddenly affects all worktrees because > core.sparsecheckout is shared. And submodules are known not to work > when core.worktree is still shared. We have precedent for config variables that are global but also allow per- overrides. Think e.g. the http.* variables. The point is: this solution still uses *one* config per repo. > I will not enforce any rule (unless it's very obvious that the other > way is wrong, like core.worktree). I will give you a rifle and you can > either hunt for food or shoot your foot. In other words, you should be > able to share everything if you like it that way while someone else > can share just some config vars, or even nothing in config file. You gave me a rifle alright, and I shot myself in the foot (by losing objects to auto-gc). I just did not expect it to be a rifle. To keep with the analogy: let's not build arms, but a kick-ass tool. And I seriously disagree that per-worktree refs, reflogs or config are part of said tool. Ciao, Dscho
[feature request] Warn about or prevent --amend commits that don't change anything
Hello, when nothing is staged in the index then `git commit` warns about this fact with either "nothing to commit, working directory clean" or "no changes added to commit". However, `git commit --amend --no-edit` will happily record a new commit that differs in nothing but its commit date from the original. This is unexpected and can lead to mistakes. Without running `git status`, the user will not notice that his unstaged changes were not committed, as everything behaves as expected otherwise (the success output from `commit`, the new commit id in the log, `push` requiring the force option, etc.). I understand that `--amend` is (or can be) used for editing commit messages, authors, authoring dates etc. I would however like to see any `--amend` command that results in no changes to the tree, the commit message and the authoring metadata reject the commit with an appropriate warning similar to the one that a plain `git commit` would present. It should be overridable by the `--allow-empty` parameter as well. If this change detection is somehow unfeasible, I would at least like the `git commit --amend --no-edit` command (with no other flags) to check the tree in the same way as `git commit` does, as the intention of `--no-edit` is even clearer and running the command is more obviously a mistake/lapse. Kind regards, Bergi
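The requested check amounts to comparing the amended commit with its predecessor on tree and authoring metadata. A rough sketch of the decision (field names are illustrative dictionaries, not git internals):

```python
def amend_is_noop(old, new):
    # Reject the amend when nothing the user can observe changed:
    # same tree, same message, same author identity and author date.
    # (The committer date alone differing is exactly the reported case.)
    keys = ("tree", "message", "author", "author_date")
    return all(old[k] == new[k] for k in keys)
```

A `--allow-empty` style override would simply bypass this check.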
Re: [ANNOUNCE] Git v2.9.1
Hi Junio, On Wed, 13 Jul 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > On Tue, 12 Jul 2016, Junio C Hamano wrote: > > > >> Jeff King writes: > >> > >> > In case it wasn't clear, I was mostly guessing there. So I dug a bit > >> > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t > >> > on i386 because of the ABI headaches. > >> > >> X-< (yes, I knew). > >> > >> > That being said, I still think the "clamp to time_t" strategy is > >> > reasonable. Unless you are doing something really exotic like pretending > >> > to be from the future, nobody will care for 20 years. > >> > >> Yup. It is a minor regression for them to go from ulong to time_t, > >> because they didn't have to care for 90 years or so but now they do > >> in 20 years, I'd guess, but hopefully after that many years, > >> everybody's time_t would be sufficiently large. > >> > >> I suspect Cobol programmers in the 50s would have said a similar > >> thing about the y2k timebomb they created back then, though ;-) > >> > >> > And at that point, systems with a 32-bit time_t are going to have > >> > to do _something_, because time() is going to start returning > >> > bogus values. So as long as we behave reasonably (e.g., clamping > >> > values and not generating wrapped nonsense), I think that's a fine > >> > solution. > >> > >> OK. > > > > I kept the unsigned long -> time_t conversion after reading the thread so > > far. > > That's OK at this point; it is not v2.9.x material anyway. Got it. I will try to get the patches submitted soon, anyway. > The primary reason why I cared 32-bit time_t is not about 2038, by > the way. I recall that people wanted to store historical document > with ancient timestamp; even if we update to support negative > timestamps, they cannot go back to 19th century with 32-bit time_t, > but they can with long long or whatever intmax_t is on their system. Fair enough. 
Ciao, Dscho
Re: [ANNOUNCE] Git v2.9.1
Hi Junio, On Wed, 13 Jul 2016, Junio C Hamano wrote: > Jeff King writes: > > > Definitely keep that paragraph. I am quite familiar with the test > > helper and it was not the outcome I initially expected. > > > >> +test_lazy_prereq 64BIT_TIME ' > >> + case "$(test-date show:iso 99)" in > >> + *" -> 2038-"*) > >> + # on this platform, unsigned long is 32-bit, i.e. not large > >> enough > >> + false > > > > I see you tightened up the match a little. TBH, I think we could > > probably just match the whole output string, but I doubt there's much > > chance of a false positive either way. > > Ah, it wasn't meant to be a tightening; rather the above is for > documentation value, i.e. make it stand out what 2038 we are > matching---its answer being "the year portion of the result of the > conversion". How about this one instead (which is part of the time_t-may-be-int64 branch on https://github.com/dscho/git which I still have to complete, as some unsigned longs still slipped out of my previous net)? It strikes me as much more robust: -- snipsnap -- From abe59dbb2235bb1d7aad8e78a66e196acb372ec8 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Tue, 12 Jul 2016 13:19:53 +0200 Subject: [PATCH] t0006: dates absurdly far require a 64-bit data type Git's source code refers to timestamps as unsigned longs. On 32-bit platforms, as well as on Windows, unsigned long is not large enough to capture dates that are "absurdly far in the future". Let's skip those tests if we know they cannot succeed.
Signed-off-by: Johannes Schindelin --- t/helper/test-date.c | 5 - t/t0006-date.sh | 6 +++--- t/test-lib.sh | 2 ++ 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/t/helper/test-date.c b/t/helper/test-date.c index d9ab360..1e12d93 100644 --- a/t/helper/test-date.c +++ b/t/helper/test-date.c @@ -4,7 +4,8 @@ static const char *usage_msg = "\n" " test-date relative [time_t]...\n" " test-date show: [time_t]...\n" " test-date parse [date]...\n" -" test-date approxidate [date]...\n"; +" test-date approxidate [date]...\n" +" test-date is64bit\n"; static void show_relative_dates(char **argv, struct timeval *now) { @@ -93,6 +94,8 @@ int main(int argc, char **argv) parse_dates(argv+1, ); else if (!strcmp(*argv, "approxidate")) parse_approxidate(argv+1, ); + else if (!strcmp(*argv, "is64bit")) + return sizeof(unsigned long) == 8 ? 0 : 1; else usage(usage_msg); return 0; diff --git a/t/t0006-date.sh b/t/t0006-date.sh index 04ce535..52f6b62 100755 --- a/t/t0006-date.sh +++ b/t/t0006-date.sh @@ -31,7 +31,7 @@ check_show () { format=$1 time=$2 expect=$3 - test_expect_${4:-success} "show date ($format:$time)" ' + test_expect_success $4 "show date ($format:$time)" ' echo "$time -> $expect" >expect && test-date show:$format "$time" >actual && test_cmp expect actual @@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +' # arbitrary time absurdly far in the future FUTURE="5758122296 -0400" -check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400" -check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +" +check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400" 64BITTIME +check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +" 64BITTIME check_parse() { echo "$1 -> $2" >expect diff --git a/t/test-lib.sh b/t/test-lib.sh index 0055ebb..4e1afb0 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -,3 +,5 @@ run_with_limited_cmdline () { } test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true' + +test_lazy_prereq 64BITTIME 'test-date is64bit' -- 2.9.0.278.g1caae67
Re: Https password present in git output
On Wed, 2016-07-13 at 20:26 +0300, ervion wrote: > One possibility for this in git is to save remote in the > https://username:passw...@domain.com/repo.git format. This is not recommended. Git has credential helpers to help you store passwords outside the git configuration. Which then makes your original problem go away :) D.
Re: Https password present in git output
On Wed, Jul 13, 2016 at 11:09 AM, Junio C Hamano wrote: > ervion writes: > >> Sometimes using ssh is not possible and saving https password in plain >> text to disk may be desirable >> (in case of encrypted disk it would be equivalent security with >> caching password in memory). >> >> One possibility for this in git is to save remote in the >> https://username:passw...@domain.com/repo.git format. > > Wasn't netrc support added exactly because users do not want to do > this? Interesting. Even with "auth in URL", I seem to get this: $ git fetch -v -v https://gitster:p...@github.com/git/git refs/tags/v2.9.1 From https://github.com/git/git * tag v2.9.1 -> FETCH_HEAD Notice that "From $URL" has the userinfo (3.2.1 in RFC 3986) scrubbed. If you are seeing somewhere we forgot to scrub userinfo in a similar way in the output, we should. Where do you see "present in git OUTPUT" as you said in the subject? What command with what options exactly and in what part of the output? Thanks.
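Scrubbing the userinfo subcomponent (RFC 3986, section 3.2.1) before printing a URL can be sketched like this (illustrative Python, not git's actual C implementation; `scrub_userinfo` is a hypothetical name):

```python
from urllib.parse import urlsplit, urlunsplit

def scrub_userinfo(url):
    # Drop "user:password@" from the authority, keep everything else.
    # Note: urlsplit normalizes the hostname to lowercase.
    p = urlsplit(url)
    host = p.hostname or ""
    if p.port is not None:
        host = "%s:%d" % (host, p.port)
    return urlunsplit((p.scheme, host, p.path, p.query, p.fragment))
```

This is the transformation behind the "From https://github.com/git/git" line above: the credentials present on the command line never reach the output.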
Re: Https password present in git output
ervion writes: > Sometimes using ssh is not possible and saving https password in plain > text to disk may be desirable > (in case of encrypted disk it would be equivalent security with > caching password in memory). > > One possibility for this in git is to save remote in the > https://username:passw...@domain.com/repo.git format. Wasn't netrc support added exactly because users do not want to do this?
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
On Wed, Jul 13, 2016 at 10:52 AM, Stefan Beller wrote: > On Wed, Jul 13, 2016 at 10:32 AM, Junio C Hamano wrote: >> Stefan Beller writes: >> * sb/push-options (2016-07-12) 5 commits - add a test for push options - push: accept push options - SQUASH??? >>> >>> Squash? I do not find a squashable commit in what you pushed, >>> do you intend to squash the first 2 patches instead? > > Oh I pulled a few minutes before you sent this email, and forgot > that you likely have pushed again when sending this email. :/ There were no multiple pushes involved. I prepare what is to be pushed out, update the what's cooking report, push, and then send out the what's cooking report (which also is on the 'todo' branch). But there is this thing called propagation delay ;-)
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
On Wed, Jul 13, 2016 at 10:32 AM, Junio C Hamanowrote: > Stefan Beller writes: > >>> * sb/push-options (2016-07-12) 5 commits >>> - add a test for push options >>> - push: accept push options >>> - SQUASH??? >> >> Squash? I do not find a squashable commit in what you pushed, >> do you intend to squash the first 2 patches instead? Oh I pulled a few minutes before you sent this email, and forgot that you likely have pushed again when sending this email. :/ Thanks! >> Yeah there were some late comments, so I did not reroll right away. >> I think Shawns proposal to have a receive.maxCommandBytes is a >> good way for an overall upper bound, but how does it stop us from >> going forward with this series? > > If we were to do maxcommandbytes, then max_options would become > irrelevant, no? Maybe? I do not know what kind of safety measures we want in place here, and if we want to go for overlapping things? Currently there are none at all in your upstream code, although you cannot push arbitrary large things to either Shawns or Peffs $Dayjob servers, so I wonder if we want to either agree on one format or on many overlapping things, as some different hosts may perceive different things as DoS threats, so they can fine tune as they want? In the Gerrit world, you have a ref per code review, such that it is easy to have 50k refs or more, similar to the repo Jeff pointed out [1], that has 40k tags (and getting a new tag every 2 hours apparently). 
So I could understand if different services care about the different loads (refs vs push options) differently (one would want to allow unlimited ref pushing for mirroring such repos as pointed out above, while another one might care about the total load of the server from a single rogue user). Thanks, Stefan [1] https://github.com/JetBrains/intellij-community
Https password present in git output
Sometimes using ssh is not possible and saving an https password in plain text to disk may be desirable (in case of an encrypted disk it would be equivalent in security to caching the password in memory). One possibility for this in git is to save the remote in the https://username:passw...@domain.com/repo.git format. However, in this case every time you push or pull, the remote address, including the plain text password, is printed. That introduces additional security issues and is unreasonable. Wouldn't it make sense to scrub the password part of the remote's url before printing it to output?
Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
On Wed, Jul 13, 2016 at 5:16 PM, Duy Nguyenwrote: > On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder > wrote: >> On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen wrote: >>> >>> No. People could create an index file anywhere in theory. So you don't >>> know how many index files there are. >> >> Maybe when an index file is created, its path and its sharedindex file >> could be appended into a log file. >> We could check this log file to see if we can remove sharedindex files. >> We would need to remove the entries in this log file for the indexes >> that are no longer there. >> >> Or instead of one log file we could have a file for each index file in >> a special directory called for example "indexinfo". >> So we could just delete the file if its related index is no longer there. > > New files will require locking so people don't append at the same > time. If we create a new file for each index file with a name that depends on the index path (for example maybe the sha1 of the index path) then as many index files with the same path are not possible we should be safe. > And maybe another new host of problems. I think we can just go > with the garbage collection way that we have done for unreachable > objects. > > Your indexinfo idea looks very close to multiple locking, an index > would lock the shared index it's linked to, preventing it from being > removed. For single locking, we can just create a file named $X.lock, > but for multiple locks, maybe we can go with > $X.lock-$index_trailing_sha1. Will it work? I don't know. Just > thinking out loud. Isn't is possible that the same index (so with the same trailing sha1) be created in two different places? >>> It really depends. If the shared part is too small for old indexes, we >>> might as well unsplit them. In practice though, the only long-term >>> index file is $GIT_DIR/index. If we don't delete old shared index >>> files too close to their creation time, temp index files will go away. 
>> We could treat $GIT_DIR/index specially so that if there are no temp >> index files, there should be nothing in "indexinfo". > > No, temp index files are needed. And it will be hard to treat > $GIT_DIR/index specially because updating it involves a temp index: > you first prepare a new index in $GIT_DIR/index.lock. If the new index is always prepared in $GIT_DIR/index.lock, then there is no problem in the first place because when my original patch calls clean_shared_index_files() the $GIT_DIR/index.lock is taken. Or maybe I am missing something? > If everything > goes well, you atomically rename it to $GIT_DIR/index. You may be able > to treat $GIT_DIR/index.lock special too, but that's starting to get > out of hand.
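Christian's path-based variant of the lock naming (deriving the suffix from the dependent index's path rather than its trailing content SHA-1, which sidesteps the "same index created in two places" collision question) can be sketched as (hypothetical helper, not git code):

```python
import hashlib

def shared_index_lock(shared_index, index_path):
    # One lock per (shared index, dependent index) pair. The suffix is
    # derived from the dependent index's PATH, not its content, so two
    # index files at different paths never contend for the same lock
    # file even when their contents (and trailing SHA-1s) are identical.
    suffix = hashlib.sha1(index_path.encode("utf-8")).hexdigest()
    return "%s.lock-%s" % (shared_index, suffix)
```

Since no two index files can share the same path, the lock names are unique per dependent index by construction.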
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
Duy Nguyen writes: > On the subject of truncation, there is something else I should note. > The field sd_size in struct stat_data is 32 bits, so large files will > overflow it too, regardless of platforms. I did not do anything > because I checked and double checked and was pretty sure we did not > use it for anything meaningful (as a file size). To us, it's just > another random number, like st_ino, that we check to detect if a file > has changed. Yes, the comparison to flip the DATA_CHANGED bit near the end of match_stat_data() has a cast for that exact reason. It might deserve a comment there.
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
On Wed, Jul 13, 2016 at 6:56 PM, Junio C Hamano wrote: > * nd/pack-ofs-4gb-limit (2016-07-13) 7 commits > - fsck: use streaming interface for large blobs in pack > - pack-objects: do not truncate result in-pack object size on 32-bit systems > - index-pack: correct "offset" type in unpack_entry_data() > - index-pack: report correct bad object offsets even if they are large > - index-pack: correct "len" type in unpack_data() > - sha1_file.c: use type off_t* for object_info->disk_sizep > - pack-objects: pass length to check_pack_crc() without truncation > > "git pack-objects" and "git index-pack" mostly operate with off_t > when talking about the offset of objects in a packfile, but there > were a handful of places that used "unsigned long" to hold that > value, leading to an unintended truncation. On the subject of truncation, there is something else I should note. The field sd_size in struct stat_data is 32 bits, so large files will overflow it too, regardless of platform. I did not do anything because I checked and double checked and was pretty sure we did not use it for anything meaningful (as a file size). To us, it's just another random number, like st_ino, that we check to detect if a file has changed. It's probably just an oversight (it comes from the very first "the information manager from hell" commit). But it's not worth changing the index format now to extend it to 64 bits, I think. So it's ok, no worry about it, but we should probably make clear that this is not the true file size, and nobody should ever use it as such. Maybe rename it to "size_hash" or something. -- Duy
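Since sd_size is only compared for equality as a change-detection number, the 32-bit wrap-around is mostly harmless. Illustratively (a sketch of the arithmetic, not git's C code):

```python
def sd_size(st_size):
    # struct stat_data's sd_size is 32 bits: large files wrap modulo 2**32.
    return st_size & 0xFFFFFFFF

def data_changed(sd, st_size):
    # Equality against the truncated value still detects almost any change;
    # it only misses a size change that is an exact multiple of 4 GiB,
    # which other stat fields (mtime, ctime, inode) would normally catch.
    return sd != sd_size(st_size)
```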
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
> * sb/push-options (2016-07-12) 5 commits > - add a test for push options > - push: accept push options > - SQUASH??? Squash? I do not find a squashable commit in what you pushed, do you intend to squash the first 2 patches instead? > - receive-pack: implement advertising and receiving push options > - push options: {pre,post}-receive hook learns about push options > > "git push" learned to accept and pass extra options to the > receiving end so that hooks can read and react to them. > > Discussion continues, expecting a further reroll. > ($gmane/299156) Yeah there were some late comments, so I did not reroll right away. I think Shawn's proposal to have a receive.maxCommandBytes is a good way for an overall upper bound, but how does it stop us from going forward with this series? It seems like a good series to implement on top of this? (We also have no code for limiting the number of refs pushed, so I do not quite understand why this DoS paranoia comes up with this series all of a sudden.)
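The two limits under discussion (the series' per-option count cap and Shawn's overall receive.maxCommandBytes byte budget) are not mutually exclusive; a sketch of enforcing both while reading options (names and defaults are illustrative, loosely mirroring the max_options/max_size constants in the SQUASH??? patch):

```python
def read_push_options(lines, max_options=32, max_bytes=1024):
    # Reject the push when either the option count or the cumulative
    # byte budget is exceeded, mirroring the two proposed knobs.
    opts, total = [], 0
    for line in lines:
        total += len(line)
        if len(opts) >= max_options or total > max_bytes:
            raise ValueError("too many push options or option bytes")
        opts.append(line)
    return opts
```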
Re: Bug report: --invert-grep and --author seems to be incompatible.
Jehan Pagès writes: >> I think --author=someone greps the "author " field in the commit >> object looking for the hit with "someone", and your request asks to >> show commits that either do not have "something" or was not written >> by "someone", I would guess. > > Note that I can still see commits with "something", and I can also see > commits by "someone" in my results. So my request actually ask for > commits which have neither "something" nor are done by "someone". > > Anyway I don't think that's the expected result, hence is still a bug. > Am I wrong? Unlike "git grep", "git log" works with a boolean expression that does not explicitly have a way to say "--and" and "--or", so only one interpretation was chosen a long time ago. All the terms are ORed together, and then the whole thing can be inverted with "--invert-grep", i.e. you are telling an equivalent of $ git grep --not \( -e "author someone" --or -e "something" \) with the command line, and there is no way to combine the requested match criteria (there are two: "author must be somebody" and "something must be in the message") differently. Given that, that is the "right" expectation, and as long as you get the behaviour, there is no "bug". You can call it a lack of a feature, though, and patches to implement a more flexible combination of match criteria like "git grep" allows are always welcome.
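The matching model described in this message — all --grep/--author terms are ORed together, and --invert-grep negates the single combined verdict — can be modeled as (illustrative sketch, not git's grep machinery):

```python
def commit_selected(haystacks, patterns, invert=False):
    # haystacks: strings to search (commit message, "author ..." header);
    # a commit is a hit when ANY pattern occurs in ANY haystack (OR),
    # and --invert-grep flips that one combined verdict.
    hit = any(p in h for p in patterns for h in haystacks)
    return not hit if invert else hit
```

Under this model, `--author=someone --invert-grep --grep=something` selects exactly the commits that match neither criterion, which is what Jehan observed.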
Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)
Stefan Bellerwrites: >> * sb/push-options (2016-07-12) 5 commits >> - add a test for push options >> - push: accept push options >> - SQUASH??? > > Squash? I do not find a squashable commit in what you pushed, > do you intend to squash the first 2 patches instead? $ git log --oneline --first-parent master..pu | grep push-options f8e50d4 Merge branch 'sb/push-options' into pu $ git log --oneline master..f8e50d4^2 d6fd535 add a test for push options ef00034 push: accept push options 6c5282d SQUASH??? ed0c716 receive-pack: implement advertising and receiving push options f7c760f push options: {pre,post}-receive hook learns about push options $ git show 6c5282d commit 6c5282d7c5b50f362aaee2059c0253ab17b4fcd3 Author: Junio C Hamano Date: Tue Jul 12 14:54:58 2016 -0700 SQUASH??? diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 754db6e..4d8041a 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1503,8 +1503,6 @@ static struct string_list *read_push_options(void) { int i; struct string_list *ret = xmalloc(sizeof(*ret)); - string_list_init(ret, 1); - /* NEEDSWORK: expose the limitations to be configurable. */ int max_options = 32; @@ -1518,6 +1516,7 @@ static struct string_list *read_push_options(void) */ int max_size = 1024; + string_list_init(ret, 1); for (i = 0; i < max_options; i++) { char *line; int len; i.e. compilation fix for decl-after-stmt. >> - receive-pack: implement advertising and receiving push options >> - push options: {pre,post}-receive hook learns about push options >> >> "git push" learned to accept and pass extra options to the >> receiving end so that hooks can read and react to them. >> >> Discussion continues, expecting a further reroll. >> ($gmane/299156) > > Yeah there were some late comments, so I did not reroll right away. > I think Shawns proposal to have a receive.maxCommandBytes is a > good way for an overall upper bound, but how does it stop us from > going forward with this series? 
If we were to do maxcommandbytes, then max_options would become irrelevant, no?
Re: Bug report: --invert-grep and --author seems to be incompatible.
Hi, On Wed, Jul 13, 2016 at 7:04 PM, Junio C Hamano wrote: > Jehan Pagès writes: > >>> git log --author=someone --invert-grep --grep=something >> >> But it does not work. Actually it looks like it just returns >> everything (as though I had done a simple `git log`). > > Do you see a commit that is by "someone" and has "something" in it > in the output from the command? Indeed you are right! Commits which are by "someone" and have "something" in them at the same time are missing. So here --invert-grep works as a big "NOT" operator on the whole rest of the command line, which is not expected (at least by me). > I think --author=someone greps the "author " field in the commit > object looking for the hit with "someone", and your request asks to > show commits that either do not have "something" or was not written > by "someone", I would guess. Note that I can still see commits with "something", and I can also see commits by "someone" in my results. So my request actually asks for commits which neither have "something" nor are done by "someone". Anyway I don't think that's the expected result, hence it is still a bug. Am I wrong? Jehan -- ZeMarmot open animation film http://film.zemarmot.net Patreon: https://patreon.com/zemarmot Tipeee: https://www.tipeee.com/zemarmot
Bug report: --invert-grep and --author seems to be incompatible.
Hello, > $ git version > git version 2.5.5 Tested on Fedora 23. You can combine `git log --author=someone --grep=something` to get all commits by "someone" where "something" can be grepped. If I want all commits by "someone" where "something" is not grepped, I would expect this command to return them: > git log --author=someone --invert-grep --grep=something But it does not work. Actually it looks like it just returns everything (as though I had done a simple `git log`). So there seems to be a problem here. Thanks! Jehan -- ZeMarmot open animation film http://film.zemarmot.net Patreon: https://patreon.com/zemarmot Tipeee: https://www.tipeee.com/zemarmot
Re: Bug report: --invert-grep and --author seems to be incompatible.
Jehan Pagès writes: >> git log --author=someone --invert-grep --grep=something > > But it does not work. Actually it looks like it just returns > everything (as though I had done a simple `git log`). Do you see a commit that is by "someone" and has "something" in it in the output from the command? I think --author=someone greps the "author " field in the commit object looking for the hit with "someone", and your request asks to show commits that either do not have "something" or were not written by "someone", I would guess.
What's cooking in git.git (Jul 2016, #05; Wed, 13)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. It turns out that v2.9.1 had tests that weren't even meant to pass on platforms with 32-bit time_t/unsigned long, but did not properly exclude them. A fix has been prepared and I expect we'll cut v2.9.2 with only that change relative to v2.9.1 soonish. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [New Topics] * js/t0006-for-v2.9.2 (2016-07-12) 1 commit - t0006: skip "far in the future" test when unsigned long is not long enough A test merged to v2.9.1 forgot that the feature it was testing would not work on a platform with 32-bit time_t/unsigned long and reported breakage. Skip the tests that cannot run correctly on such platforms. Waiting for an Ack; will fast-track down to 'maint' to cut v2.9.2.
* js/am-3-merge-recursive-direct (2016-07-12) 21 commits - merge-recursive: flush output buffer even when erroring out - merge_trees(): ensure that the output buffer is released after calling it - merge-recursive: offer an option to retain the output in 'obuf' - merge-recursive: write the commit title in one go - merge-recursive: flush output buffer before printing error messages - am -3: use merge_recursive() directly again - merge-recursive: switch to returning errors instead of dying - merge-recursive: handle return values indicating errors - merge-recursive: allow write_tree_from_memory() to error out - merge-recursive: avoid returning a wholesale struct - merge_recursive: abort properly upon errors - prepare the builtins for a libified merge_recursive() - merge-recursive: clarify code in was_tracked() - die(_("BUG")): avoid translating bug messages - die("bug"): report bugs consistently - t5520: verify that `git pull --rebase` shows the advice when failing - Merge branch 'jh/clean-smudge-annex' into HEAD - Merge branch 'bc/cocci' into js/am-3-merge-recursive-direct - Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' into js/am-3-merge-recursive-direct - Merge branch 'cc/apply-am' into js/am-3-merge-recursive-direct - Merge branch 'va/i18n-even-more' into js/am-3-merge-recursive-direct (this branch uses bc/cocci, cc/apply-am, jh/clean-smudge-annex, js/am-call-theirs-theirs-in-fallback-3way and va/i18n-even-more.) "git am -3" calls "git merge-recursive" when it needs to fall back to a three-way merge; this call has been turned into an internal subroutine call instead of spawning a separate subprocess. Will need to wait until all the topics this depends on graduate to 'master'. * ls/travis-enable-httpd-tests (2016-07-12) 1 commit - travis-ci: enable web server tests t55xx on Linux Allow http daemon tests in Travis CI tests.
* nd/cache-tree-ita (2016-07-12) 4 commits - cache-tree: do not generate empty trees as a result of all i-t-a subentries - cache-tree.c: fix i-t-a entry skipping directory updates sometimes - test-lib.sh: introduce and use $_EMPTY_BLOB - test-lib.sh: introduce and use $_EMPTY_TREE "git add -N dir/file && git write-tree" produced an incorrect tree when there are other paths in the same directory that sorts after "file". Looked mostly OK. Further review comments are welcome. * nd/pack-ofs-4gb-limit (2016-07-13) 7 commits - fsck: use streaming interface for large blobs in pack - pack-objects: do not truncate result in-pack object size on 32-bit systems - index-pack: correct "offset" type in unpack_entry_data() - index-pack: report correct bad object offsets even if they are large - index-pack: correct "len" type in unpack_data() - sha1_file.c: use type off_t* for object_info->disk_sizep - pack-objects: pass length to check_pack_crc() without truncation "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. Will merge to 'next'. * sb/push-options (2016-07-12) 5 commits - add a test for push options - push: accept push options - SQUASH??? - receive-pack: implement advertising and receiving push options - push options: {pre,post}-receive hook learns about push options "git push" learned to accept and pass extra options to the receiving end so that hooks can read and react to them. Discussion continues, expecting a further reroll. ($gmane/299156) * ew/http-walker (2016-07-12) 3 commits - http-walker: reduce O(n) ops with doubly-linked list - http: avoid disconnecting on 404s for loose objects - http-walker: remove unused parameter from fetch_object Optimize dumb http transport on the
Re: [ANNOUNCE] Git v2.9.1
Johannes Schindelin writes: > On Tue, 12 Jul 2016, Junio C Hamano wrote: > >> Jeff King writes: >> >> > In case it wasn't clear, I was mostly guessing there. So I dug a bit >> > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t >> > on i386 because of the ABI headaches. >> >> X-< (yes, I knew). >> >> > That being said, I still think the "clamp to time_t" strategy is >> > reasonable. Unless you are doing something really exotic like pretending >> > to be from the future, nobody will care for 20 years. >> >> Yup. It is a minor regression for them to go from ulong to time_t, >> because they didn't have to care for 90 years or so but now they do >> in 20 years, I'd guess, but hopefully after that many years, >> everybody's time_t would be sufficiently large. >> >> I suspect Cobol programmers in the 50s would have said a similar >> thing about the y2k timebomb they created back then, though ;-) >> >> > And at that point, systems with a 32-bit time_t are going to have >> > to do _something_, because time() is going to start returning >> > bogus values. So as long as we behave reasonably (e.g., clamping >> > values and not generating wrapped nonsense), I think that's a fine >> > solution. >> >> OK. > > I kept the unsigned long -> time_t conversion after reading the thread so > far. That's OK at this point; it is not v2.9.x material anyway. The primary reason I cared about 32-bit time_t is not 2038, by the way. I recall that people wanted to store historical documents with ancient timestamps; even if we update to support negative timestamps, they cannot go back to the 19th century with 32-bit time_t, but they can with long long or whatever intmax_t is on their system.
Re: [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems
Nguyễn Thái Ngọc Duy writes: > A diff from nd/pack-ofs-4gb-limit can explain the changes better than > me. > > I did not add PRIdMAX or similar because that carries a risk to exotic > platforms that people rarely test. Just casting to unsigned should be > fine. Looks very sensible. Thanks.
Re: [ANNOUNCE] Git v2.9.1
Jeff King writes: > Definitely keep that paragraph. I am quite familiar with the test > helper and it was not the outcome I initially expected. > >> +test_lazy_prereq 64BIT_TIME ' >> +case "$(test-date show:iso 99)" in >> +*" -> 2038-"*) >> +# on this platform, unsigned long is 32-bit, i.e. not large >> enough >> +false > > I see you tightened up the match a little. TBH, I think we could > probably just match the whole output string, but I doubt there's much > chance of a false positive either way. Ah, it wasn't meant to be a tightening; rather, the above is for documentation value, i.e. to make it stand out which 2038 we are matching---its answer being "the year portion of the result of the conversion".
[RFC] Speed up git status via a file system daemon
Idea taken and code refactored from [1]: The intent of this patch series is to separate the index-helper logic from the Watchman logic. With very large repos, the time required to read the index from disk becomes a much smaller percentage of the overall time. The Watchman logic provides a huge performance win for git status (8 seconds vs 12 minutes in our testing) vs the minimal benefit index-helper provides (<1 second). The current index-helper/watchman combination design requires two daemons, IPC between git and the index-helper, IPC between the index helper and Watchman, shared memory, adding a WAMA extension to the index, etc. The benefit of splitting these two efforts is a significantly simpler set of changes that gives the benefits of utilizing a file system monitoring daemon like Watchman without the additional complexity and overhead associated with index-helper. The overall design of this refactored patch series is that when the index is read from disk, a hook proc is called that returns the list of potentially changed files. Git then uses this set of files to flag the potentially dirty index and untracked cache entries. The logic that uses this information to limit the scope of which files and directories need to be checked for changes is largely the same as in the original patch series. One benefit of the hook proc is that it provides a simple, platform-independent way to interface with the file system daemon. It does not require another daemon to be running to act as an intermediary between git and the file system daemon to support communicating via named pipes or sockets. The prototype currently has no mechanism for saving the date/time of the query into the index, so the list of potentially changed files grows unbounded. Writing the additional 64-bit date/time data into an optional extension seems like a lot of overhead, but changing the on-disk index format is expensive as well from a compatibility perspective.
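The core idea (flag only the entries named by the hook, so everything else can skip filesystem checks) might look roughly like this toy sketch. Apart from CE_FSDAEMON_DIRTY, which the patch below introduces, all names here are made up for illustration and this is not the patch's actual code.

```c
#include <string.h>

#define CE_FSDAEMON_DIRTY 0x0001 /* flag from the patch; the rest is made up */

struct toy_entry {
	const char *name;
	unsigned int flags;
};

/* Given the list of potentially-changed paths returned by the hook,
 * flag only the matching entries; unflagged entries can then skip the
 * usual lstat()/opendir() work entirely. */
static void mark_fsdaemon_dirty(struct toy_entry *cache, int nr,
				const char **changed, int changed_nr)
{
	int i, j;

	for (i = 0; i < nr; i++)
		for (j = 0; j < changed_nr; j++)
			if (!strcmp(cache[i].name, changed[j]))
				cache[i].flags |= CE_FSDAEMON_DIRTY;
}
```

In a very large repo the `changed` list is expected to be tiny relative to the index, which is where the 8-seconds-vs-12-minutes win comes from: the cost of `git status` becomes proportional to the number of changed files rather than the number of tracked files.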
Other ideas or suggestions welcome. Thanks, Ben [1] http://article.gmane.org/gmane.comp.version-control.git/298243 --- cache.h | 4 ++ dir.c| 14 - dir.h| 5 ++ read-cache.c | 174 ++- 4 files changed, 194 insertions(+), 3 deletions(-) diff --git a/cache.h b/cache.h index f1dc289..45c2eff 100644 --- a/cache.h +++ b/cache.h @@ -182,6 +182,8 @@ struct cache_entry { #define CE_VALID (0x8000) #define CE_STAGESHIFT 12 +#define CE_FSDAEMON_DIRTY (0x0001) + /* * Range 0x0FFF in ce_flags is divided into * two parts: in-memory flags and on-disk ones. @@ -338,6 +340,8 @@ struct index_state { struct hashmap dir_hash; unsigned char sha1[20]; struct untracked_cache *untracked; + /* BENPEART-TODO: persist last_update in the index, use gmtime? */ + time_t last_update; }; extern struct index_state the_index; diff --git a/dir.c b/dir.c index 6172b34..bce921f 100644 --- a/dir.c +++ b/dir.c @@ -584,7 +584,7 @@ static void trim_trailing_spaces(char *buf) * * If "name" has the trailing slash, it'll be excluded in the search. */ -static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, +struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, struct untracked_cache_dir *dir, const char *name, int len) { @@ -1551,6 +1551,17 @@ static int valid_cached_dir(struct dir_struct *dir, if (!untracked) return 0; + if (dir->untracked->use_fsdaemon) { + /* +* With file system daemon, we can trust the untracked cache's +* valid field. +*/ + if (untracked->valid) + goto skip_stat; + else + invalidate_directory(dir->untracked, untracked); + } + if (stat(path->len ? 
path->buf : ".", &st)) { invalidate_directory(dir->untracked, untracked); memset(&untracked->stat_data, 0, sizeof(untracked->stat_data)); @@ -1564,6 +1575,7 @@ static int valid_cached_dir(struct dir_struct *dir, return 0; } +skip_stat: if (untracked->check_only != !!check_only) { invalidate_directory(dir->untracked, untracked); return 0; diff --git a/dir.h b/dir.h index bfde698..8b7b98a 100644 --- a/dir.h +++ b/dir.h @@ -139,6 +139,8 @@ struct untracked_cache { int gitignore_invalidated; int dir_invalidated; int dir_opened; + /* file system daemon invalidation data */ + unsigned int use_fsdaemon : 1; }; struct dir_struct { @@ -308,4 +310,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked); void add_untracked_cache(struct
Re: [PATCH v3] config: add conditional include
On Wed, Jul 13, 2016 at 9:21 AM, Matthieu Moy wrote: > Nguyễn Thái Ngọc Duy writes: > >> +`gitdir`:: >> + The environment variable `GIT_DIR` must match the following >> + pattern for files to be included. The pattern can contain >> + standard globbing wildcards and two additional ones, `**/` and >> + `/**`, that can match multiple path components. Please refer >> + to linkgit:gitignore[5] for details. For convenience: > > It's unclear to me whether the whole content of GIT_DIR must match, or > whether the pattern should be included (or be a prefix) of $GIT_DIR. > From this text, I read it as "the whole content", but ... > >> + ; include for all repositories inside /path/to/group >> + [include "gitdir:/path/to/group/"] >> + path = /path/to/foo.inc >> + >> + ; include for all repositories inside $HOME/to/group >> + [include "gitdir:~/to/group/"] >> + path = /path/to/foo.inc > > ... here it seems it only has to be a prefix. I should have written "with two additional ones... and a few exceptions". One of the bullet points below would say the trailing slash is rewritten to "/**", so it becomes a prefix match. If it proves confusing, I will probably just get rid of that. -- Duy
Re: [ANNOUNCE] Git v2.9.1
Johannes Schindelin writes: > This was just a quick fix, intended to allow me to release Git for Windows > v2.9.1 in a timely manner (which is now delayed for other reasons). > ... >> You'll need to adjust check_show as I did in my earlier patch. > > Makes sense! Hmph, so what is your preferred approach? You'll do your own v2.9.1 that is different from others at t0006? I was hoping to hear from you sooner and do v2.9.2 with your t0006 workaround with lazy-prereq changes from Peff (i.e. "Makes sense!" above), so that you do not have to do two releases in a row (i.e. skipping v2.9.1 saying "Git for Windows skipped that one because it was not quite right; this release fixes the issue" in your v2.9.2 announcement). Thanks.
Re: [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
> On 12 Jul 2016, at 00:45, Joey Hesswrote: > > This adds new smudgeToFile and cleanFromFile filter commands, > which are similar to smudge and clean but allow direct access to files on > disk. > > This interface can be much more efficient when operating on large files, > because the whole file content does not need to be streamed through the > filter. It even allows for things like cleanFromFile commands that avoid > reading the whole content of the file, and for smudgeToFile commands that > populate a work tree file using an efficient Copy On Write operation. > > The new filter commands will not be used for all filtering. They are > efficient to use when git add is adding a file, or when the work tree is > being updated, but not a good fit when git is internally filtering blob > objects in memory for eg, a diff. > > So, a user who wants to use smudgeToFile should also provide a smudge > command to be used in cases where smudgeToFile is not used. And ditto > with cleanFromFile and clean. To avoid foot-shooting configurations, the > new commands are not used unless the old commands are also configured. > > That also ensures that a filter driver configuration that includes these > new commands will work, although less efficiently, when used with an older > version of git that does not support them. > > Signed-off-by: Joey Hess > --- > Documentation/config.txt| 18 ++- > Documentation/gitattributes.txt | 37 ++ > convert.c | 111 +++- > convert.h | 10 > 4 files changed, 160 insertions(+), 16 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 19493aa..a55bed8 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1325,15 +1325,29 @@ format.useAutoBase:: > format-patch by default. > > filter..clean:: > - The command which is used to convert the content of a worktree > + The command which is used as a filter to convert the content of a > worktree > file to a blob upon checkin. 
See linkgit:gitattributes[5] for > details. > > filter..smudge:: > - The command which is used to convert the content of a blob > + The command which is used as a filter to convert the content of a blob > object to a worktree file upon checkout. See > linkgit:gitattributes[5] for details. > > +filter..cleanFromFile:: > + Similar to filter..clean but the specified command > + directly accesses a worktree file on disk, rather than > + receiving the file content from standard input. > + Only used when filter..clean is also configured. > + See linkgit:gitattributes[5] for details. > + > +filter..smudgeToFile:: > + Similar to filter..smudge but the specified command > + writes the content of a blob directly to a worktree file, > + rather than to standard output. > + Only used when filter..smudge is also configured. > + See linkgit:gitattributes[5] for details. > + > fsck.:: > Allows overriding the message type (error, warn or ignore) of a > specific message ID such as `missingEmail`. > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 197ece8..a58aafc 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge > and clean commands > should not try to access the file on disk, but only act as filters on the > content provided to them on standard input. > > +There are two extra commands "cleanFromFile" and "smudgeToFile", which > +can optionally be set in a filter driver. These are similar to the "clean" > +and "smudge" commands, but avoid needing to pipe the contents of files > +through the filters, and instead read/write files in the filesystem. > +This can be more efficient when using filters with large files that are not > +directly stored in the repository. > + > +Both "cleanFromFile" and "smudgeToFile" are provided a path as an > +added parameter after the configured command line. 
> + > +The "cleanFromFile" command is provided the path to the file that > +it should clean. Like the "clean" command, it should output the cleaned > +version to standard output. > + > +The "smudgeToFile" command is provided a path to the file that it > +should write to. (This file will already exist, as an empty file that can > +be written to or replaced.) Like the "smudge" command, "smudgeToFile" > +is fed the blob object from its standard input. > + > +Some git operations that need to apply filters cannot use "cleanFromFile" > +and "smudgeToFile", since the files are not present to disk. So, to avoid > +inconsistent behavior, "cleanFromFile" will only be used if "clean" is > +also configured, and "smudgeToFile" will only be used if "smudge" is also > +configured. > + > +An example large file storage
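The quoted documentation cuts off before its example, but based on the config keys it introduces, a filter driver using the new commands could look roughly like the hedged sketch below; the "bigstore" driver name and its commands are invented for illustration.

```ini
; Illustrative only: the driver name and commands are made up.
; clean/smudge stay configured: per the patch they are a prerequisite
; for the *File variants to be used at all, and they remain the
; fallback for in-memory filtering and for older versions of git.
[filter "bigstore"]
	clean = bigstore-clean
	smudge = bigstore-smudge
	cleanFromFile = bigstore-clean-from-file
	smudgeToFile = bigstore-smudge-to-file
```

A matching attribute line such as `*.bin filter=bigstore` in .gitattributes would then route those paths through the driver, with "smudgeToFile" receiving the work-tree path as an extra parameter so it can, for example, populate the file via a copy-on-write clone instead of streaming the blob.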
[PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large
Use the right type for offsets in this case, off_t, which makes a difference on 32-bit systems with large file support, and change formatting code accordingly. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index cafaab7..e2d8ae4 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -338,10 +338,10 @@ static void parse_pack_header(void) use(sizeof(struct pack_header)); } -static NORETURN void bad_object(unsigned long offset, const char *format, +static NORETURN void bad_object(off_t offset, const char *format, ...) __attribute__((format (printf, 2, 3))); -static NORETURN void bad_object(unsigned long offset, const char *format, ...) +static NORETURN void bad_object(off_t offset, const char *format, ...) { va_list params; char buf[1024]; @@ -349,7 +349,8 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...) va_start(params, format); vsnprintf(buf, sizeof(buf), format, params); va_end(params); - die(_("pack has bad object at offset %lu: %s"), offset, buf); + die(_("pack has bad object at offset %"PRIuMAX": %s"), + (uintmax_t)offset, buf); } static inline struct thread_local *get_thread_data(void) -- 2.9.1.564.gb2f7278
[PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep
This field, filled by sha1_object_info(), contains the on-disk size of an object, which could go over the 4 GB limit of unsigned long on 32-bit systems. Use off_t for it instead and update all callers. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 4 ++-- cache.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 54db118..13ed944 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -131,7 +131,7 @@ struct expand_data { unsigned char sha1[20]; enum object_type type; unsigned long size; - unsigned long disk_size; + off_t disk_size; const char *rest; unsigned char delta_base_sha1[20]; @@ -184,7 +184,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, if (data->mark_query) data->info.disk_sizep = &data->disk_size; else - strbuf_addf(sb, "%lu", data->disk_size); + strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size); } else if (is_atom("rest", atom, len)) { if (data->mark_query) data->split_on_whitespace = 1; diff --git a/cache.h b/cache.h index 4ff196c..ea64b51 100644 --- a/cache.h +++ b/cache.h @@ -1502,7 +1502,7 @@ struct object_info { /* Request */ enum object_type *typep; unsigned long *sizep; - unsigned long *disk_sizep; + off_t *disk_sizep; unsigned char *delta_base_sha1; struct strbuf *typename; -- 2.9.1.564.gb2f7278
Re: [PATCH 00/38] Virtualization of the refs API
On 07/10/2016 05:09 PM, Duy Nguyen wrote: > On Fri, Jun 3, 2016 at 11:03 PM, Michael Haggerty wrote: >> Since the ref-iterator [1] changes seem to have gotten a positive >> reception, let's try to keep up the momentum. I hope I'm not >> overloading the review pipeline... >> >> I think all of the groundwork is in place now to virtualize the refs >> API. This will open the way to storing refs in ways other than the >> familiar loose refs / packed refs format, such as David Turner's >> proposed LMDB-based storage [2]. >> >> This is a long patch series, but most of the patches are pretty simple >> and formulaic. The goal is to implement a `ref_store`. In the language >> of object-oriented programming, `ref_store` is an abstract base class >> representing a reference storage backend. It provides methods to read, >> write, and delete references and symrefs, and to iterate over >> references, reflogs, and reflog entries, plus a number of other >> things—19 methods in all. > > I probably don't know what I'm talking about because I don't follow > your work closely enough. Please ignore if this is nonsense. But if we > extend/change the API, we might need to update git-for-each-ref too, to > expose it to shell scripts and external commands. I guess for > iteration there's nothing else more needed, but we may need to > introduce new options for the storage thing, e.g. to select storage... This patch series doesn't change the external API in any significant way. It only wraps it up in a virtualization layer so that different reference storage backends can be plugged in. So as long as people are using plumbing commands to work with references (rather than reading/writing files under $GIT_DIR directly), they should notice no difference. There are only two exceptions that I know of: 1. Users will need to be able to request that a repository use a non-default reference backend, and (less importantly) inquire about which reference backend a particular repository is using.
Those facilities will be added when the first non-files reference backend is added. 2. At least one command (`git pack-refs`) is particular to the files backend, and won't be needed (at least not in its current form) for other backends. Conversely, it is conceivable that future reference backends will require their own "maintenance" commands. Such commands would be added as they are needed. If there were operations that other reference backends could do much more efficiently than the files backend (like, hypothetically, return all references matching a regular expression without having to iterate through all references), then it might make sense for performance reasons to provide commands to access that functionality. But at the moment I don't know of any such cases. Michael
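A vtable-style backend of the kind described above can be sketched as follows. The struct layout and method names here are purely illustrative (the real abstract class has about 19 methods covering reads, writes, iteration, and reflogs); the point is only to show how callers stay backend-agnostic.

```c
#include <stdio.h>
#include <string.h>

struct ref_store;

/* The "abstract base class": one function pointer per operation.
 * Only two of the many methods are sketched here. */
struct ref_storage_be {
	const char *name;
	int (*read_ref)(struct ref_store *refs, const char *refname,
			char *out, size_t outlen);
	int (*delete_ref)(struct ref_store *refs, const char *refname);
};

struct ref_store {
	const struct ref_storage_be *be; /* backend implementing this store */
};

/* Callers always dispatch through the store, never to a backend directly */
static int read_ref(struct ref_store *refs, const char *refname,
		    char *out, size_t outlen)
{
	return refs->be->read_ref(refs, refname, out, outlen);
}

/* A toy read-only backend that knows a single ref */
static int toy_read_ref(struct ref_store *refs, const char *refname,
			char *out, size_t outlen)
{
	if (strcmp(refname, "HEAD"))
		return -1;
	snprintf(out, outlen, "ref: refs/heads/master");
	return 0;
}

static int toy_delete_ref(struct ref_store *refs, const char *refname)
{
	return -1; /* read-only toy backend */
}

static const struct ref_storage_be toy_be = {
	"toy", toy_read_ref, toy_delete_ref
};
```

An LMDB-backed (or any other) backend would simply supply its own `ref_storage_be` table, and all plumbing built on `read_ref()` and friends would keep working unchanged.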
[PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data()
unpack_entry_data() receives an off_t value from unpack_raw_entry(), which could be larger than unsigned long on 32-bit systems with large file support. Correct the type so truncation does not happen. This only affects bad object reporting, though. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index e2d8ae4..1008d7f 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -430,7 +430,7 @@ static int is_delta_type(enum object_type type) return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA); } -static void *unpack_entry_data(unsigned long offset, unsigned long size, +static void *unpack_entry_data(off_t offset, unsigned long size, enum object_type type, unsigned char *sha1) { static char fixed_buf[8192]; -- 2.9.1.564.gb2f7278
[PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems
A typical diff will not show what's going on and you need to see full functions. The core code is like this, at the end of write_one(): e->idx.offset = *offset; size = write_object(f, e, *offset); if (!size) { e->idx.offset = recursing; return WRITE_ONE_BREAK; } written_list[nr_written++] = &e->idx; /* make sure off_t is sufficiently large not to wrap */ if (signed_add_overflows(*offset, size)) die("pack too large for current definition of off_t"); *offset += size; Here we can see that the in-pack object size is returned by write_object (or indirectly by write_reuse_object). And it's used to calculate object offsets, which end up in the pack index file, generated at the end. If "size" overflows (on 32-bit systems, unsigned long is 32-bit while off_t can be 64-bit), we get wrong offsets and produce an incorrect .idx file, which may make it look like the .pack file is corrupted. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a3a98c5..ac7a3a5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -341,8 +341,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent } /* Return 0 if we will bust the pack-size limit */ -static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry, - unsigned long limit, int usable_delta) +static off_t write_reuse_object(struct sha1file *f, struct object_entry *entry, + unsigned long limit, int usable_delta) { struct packed_git *p = entry->in_pack; struct pack_window *w_curs = NULL; @@ -415,11 +415,12 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry } /* Return 0 if we will bust the pack-size limit */ -static unsigned long write_object(struct sha1file *f, - struct object_entry *entry, - off_t write_offset) +static off_t write_object(struct sha1file *f, + struct object_entry
*entry, + off_t write_offset) { - unsigned long limit, len; + unsigned long limit; + off_t len; int usable_delta, to_reuse; if (!pack_to_stdout) @@ -491,7 +492,7 @@ static enum write_one_status write_one(struct sha1file *f, struct object_entry *e, off_t *offset) { - unsigned long size; + off_t size; int recursing; /* -- 2.9.1.564.gb2f7278
[PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation
On 32-bit systems with large file support, unsigned long is 32-bit while the two offsets in the subtraction expression (pack-objects has the exact same expression as in sha1_file.c, but not shown in the diff) are 64-bit. If an in-pack object is larger than 2^32, len/datalen is truncated and we get a misleading "error: bad packed object CRC for ..." as a result. Use off_t for len and datalen. check_pack_crc() already accepts this argument as off_t and can deal with 4+ GB. Noticed-by: Christoph Michelbach Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 2 +- sha1_file.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index b6664ce..a3a98c5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -349,7 +349,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry struct revindex_entry *revidx; off_t offset; enum object_type type = entry->type; - unsigned long datalen; + off_t datalen; unsigned char header[10], dheader[10]; unsigned hdrlen; diff --git a/sha1_file.c b/sha1_file.c index d0f2aa0..cd9b560 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2282,7 +2282,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset, if (do_check_packed_object_crc && p->index_version > 1) { struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); - unsigned long len = revidx[1].offset - obj_offset; + off_t len = revidx[1].offset - obj_offset; if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) { const unsigned char *sha1 = nth_packed_object_sha1(p, revidx->nr); -- 2.9.1.564.gb2f7278
[PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems
A diff from nd/pack-ofs-4gb-limit can explain the changes better than I can. I did not add PRIdMAX or similar because that carries a risk on exotic platforms that people rarely test. Just casting to unsigned should be fine. diff --git a/builtin/fsck.c b/builtin/fsck.c index 55eac75..b08bc8b 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1) static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type, unsigned long size, void *buffer, int *eaten) { + /* + * Note, buffer may be NULL if type is OBJ_BLOB. See + * verify_packfile(), data_valid variable for details. + */ struct object *obj; obj = parse_object_buffer(sha1, type, size, buffer, eaten); if (!obj) { diff --git a/pack-check.c b/pack-check.c index 14e8cb0..d123846 100644 --- a/pack-check.c +++ b/pack-check.c @@ -106,7 +106,7 @@ static int verify_packfile(struct packed_git *p, enum object_type type; unsigned long size; off_t curpos; - int data_valid = 0; + int data_valid; if (p->index_version > 1) { off_t offset = entries[i].offset; @@ -130,6 +130,7 @@ static int verify_packfile(struct packed_git *p, * the data in-core only to discard.
 */ data = NULL; + data_valid = 0; } else { data = unpack_entry(p, entries[i].offset, &type, &size); data_valid = 1; } diff --git a/t/t1050-large.sh b/t/t1050-large.sh index f9f3d13..096dbff 100755 --- a/t/t1050-large.sh +++ b/t/t1050-large.sh @@ -177,10 +177,9 @@ test_expect_success 'zip achiving, deflate' ' git archive --format=zip HEAD >/dev/null ' -test_expect_success 'fsck' ' - test_must_fail git fsck 2>err && - n=$(grep "error: attempting to allocate .* over limit" err | wc -l) && - test "$n" -gt 1 +test_expect_success 'fsck large blobs' ' - git fsck 2>err && - test_must_be_empty err — rather: +test_expect_success 'fsck large blobs' ' + git fsck 2>err && + test_must_be_empty err ' test_done Nguyễn Thái Ngọc Duy (7): pack-objects: pass length to check_pack_crc() without truncation sha1_file.c: use type off_t* for object_info->disk_sizep index-pack: correct "len" type in unpack_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "offset" type in unpack_entry_data() pack-objects: do not truncate result in-pack object size on 32-bit systems fsck: use streaming interface for large blobs in pack builtin/cat-file.c | 4 ++-- builtin/fsck.c | 4 builtin/index-pack.c | 23 --- builtin/pack-objects.c | 17 + cache.h | 2 +- pack-check.c | 23 +-- pack.h | 1 + sha1_file.c | 2 +- t/t1050-large.sh | 7 +++ 9 files changed, 54 insertions(+), 29 deletions(-) -- 2.9.1.564.gb2f7278
[PATCH v2 3/7] index-pack: correct "len" type in unpack_data()
On 32-bit systems with large file support, one entry could be larger than 4GB and overflow "len". Correct it so we can unpack a full entry. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index e8c71fc..cafaab7 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -549,13 +549,13 @@ static void *unpack_data(struct object_entry *obj, void *cb_data) { off_t from = obj[0].idx.offset + obj[0].hdr_size; - unsigned long len = obj[1].idx.offset - from; + off_t len = obj[1].idx.offset - from; unsigned char *data, *inbuf; git_zstream stream; int status; data = xmallocz(consume ? 64*1024 : obj->size); - inbuf = xmalloc((len < 64*1024) ? len : 64*1024); + inbuf = xmalloc((len < 64*1024) ? (int)len : 64*1024); memset(&stream, 0, sizeof(stream)); git_inflate_init(&stream); @@ -563,15 +563,15 @@ static void *unpack_data(struct object_entry *obj, stream.avail_out = consume ? 64*1024 : obj->size; do { - ssize_t n = (len < 64*1024) ? len : 64*1024; + ssize_t n = (len < 64*1024) ? (ssize_t)len : 64*1024; n = xpread(get_thread_data()->pack_fd, inbuf, n, from); if (n < 0) die_errno(_("cannot pread pack file")); if (!n) - die(Q_("premature end of pack file, %lu byte missing", - "premature end of pack file, %lu bytes missing", - len), - len); + die(Q_("premature end of pack file, %"PRIuMAX" byte missing", + "premature end of pack file, %"PRIuMAX" bytes missing", + (unsigned int)len), + (uintmax_t)len); from += n; len -= n; stream.next_in = inbuf; -- 2.9.1.564.gb2f7278
[PATCH v2 7/7] fsck: use streaming interface for large blobs in pack
For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque; there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such a memory allocation. And even on 64-bit, where it's not a problem, allocating a lot more memory could push other parts of the system out to swap, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature(), which supports a streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But it will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/fsck.c | 4 pack-check.c | 23 +-- pack.h | 1 + t/t1050-large.sh | 7 +++ 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 55eac75..b08bc8b 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1) static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type, unsigned long size, void *buffer, int *eaten) { + /* + * Note, buffer may be NULL if type is OBJ_BLOB. See + * verify_packfile(), data_valid variable for details. + */ struct object *obj; obj = parse_object_buffer(sha1, type, size, buffer, eaten); if (!obj) { diff --git a/pack-check.c b/pack-check.c index 1da89a4..d123846 100644 --- a/pack-check.c +++ b/pack-check.c @@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p, void *data; enum object_type type; unsigned long size; + off_t curpos; + int data_valid; if (p->index_version > 1) { off_t offset = entries[i].offset; @@ -116,8 +118,25 @@ static int verify_packfile(struct packed_git *p, sha1_to_hex(entries[i].sha1), p->pack_name, (uintmax_t)offset); } - data = unpack_entry(p, entries[i].offset, &type, &size); - if (!data) + + curpos = entries[i].offset; + type = unpack_object_header(p, &w_curs, &curpos, &size); + unuse_pack(&w_curs); + + if (type == OBJ_BLOB && big_file_threshold <= size) { + /* + * Let check_sha1_signature() check it with + * the streaming interface; no point slurping + * the data in-core only to discard. + */ + data = NULL; + data_valid = 0; + } else { + data = unpack_entry(p, entries[i].offset, &type, &size); + data_valid = 1; + } + + if (data_valid && !data) err = error("cannot unpack %s from %s at offset %"PRIuMAX"", sha1_to_hex(entries[i].sha1), p->pack_name, (uintmax_t)entries[i].offset); diff --git a/pack.h b/pack.h index 3223f5a..0e77429 100644 --- a/pack.h +++ b/pack.h @@ -74,6 +74,7 @@ struct pack_idx_entry { struct progress; +/* Note, the data argument could be NULL if object type is blob */ typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*); extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1); diff --git a/t/t1050-large.sh b/t/t1050-large.sh index f9f3d13..096dbff 100755 --- a/t/t1050-large.sh +++ b/t/t1050-large.sh @@ -177,10 +177,9 @@ test_expect_success 'zip achiving, deflate' ' git archive --format=zip HEAD >/dev/null ' -test_expect_success 'fsck' ' - test_must_fail git fsck 2>err && - n=$(grep "error: attempting to allocate .* over limit" err | wc -l) && - test "$n" -gt 1 +test_expect_success 'fsck large blobs' ' + git fsck 2>err && + test_must_be_empty err ' test_done -- 2.9.1.564.gb2f7278
Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder wrote: > On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen wrote: >> >> No. People could create an index file anywhere in theory. So you don't >> know how many index files there are. > > Maybe when an index file is created, its path and its sharedindex file > could be appended into a log file. > We could check this log file to see if we can remove sharedindex files. > We would need to remove the entries in this log file for the indexes > that are no longer there. > > Or instead of one log file we could have a file for each index file in > a special directory called for example "indexinfo". > So we could just delete the file if its related index is no longer there. New files will require locking so people don't append at the same time. And maybe a host of new problems. I think we can just go with the garbage collection way that we have done for unreachable objects. Your indexinfo idea looks very close to multiple locking: an index would lock the shared index it's linked to, preventing it from being removed. For single locking, we can just create a file named $X.lock, but for multiple locks, maybe we can go with $X.lock-$index_trailing_sha1. Will it work? I don't know. Just thinking out loud. >> It really depends. If the shared part is too small for old indexes, we >> might as well unsplit them. In practice though, the only long-term >> index file is $GIT_DIR/index. If we don't delete old shared index >> files too close to their creation time, temp index files will go away. > > We could treat $GIT_DIR/index specially so that if there are no temp > index files, there should be nothing in "indexinfo". No, temp index files are needed. And it will be hard to treat $GIT_DIR/index specially because updating it involves a temp index: you first prepare a new index in $GIT_DIR/index.lock. If everything goes well, you atomically rename it to $GIT_DIR/index.
You may be able to treat $GIT_DIR/index.lock specially too, but that's starting to get out of hand. -- Duy
Re: [PATCH v3 1/4] test-lib.sh: introduce and use $_EMPTY_TREE
On Tue, Jul 12, 2016 at 10:40 PM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > >> This is a special SHA1. Let's keep it at one place, easier to replace >> later when the hash change comes, easier to recognize. Start with an >> underscore to reduce the chances somebody may override it without >> realizing it's predefined. >> >> Signed-off-by: Nguyễn Thái Ngọc Duy >> --- >> t/t0000-basic.sh | 2 +- >> t/t1100-commit-tree-options.sh | 2 +- >> t/t4010-diff-pathspec.sh | 6 ++ >> t/t4054-diff-bogus-tree.sh | 10 -- >> t/t5504-fetch-receive-strict.sh | 4 ++-- >> t/test-lib.sh | 4 +++- >> 6 files changed, 13 insertions(+), 15 deletions(-) >> >> diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh >> index 60811a3..48214e9 100755 >> --- a/t/t0000-basic.sh >> +++ b/t/t0000-basic.sh >> @@ -834,7 +834,7 @@ test_expect_success 'git write-tree should be able to >> write an empty tree' ' >> ' >> >> test_expect_success 'validate object ID of a known tree' ' >> - test "$tree" = 4b825dc642cb6eb9a060e54bf8d69288fbee4904 >> + test "$tree" = $_EMPTY_TREE >> ' > > I doubt the point of, and I'd rather not to see, the leading > underscore. Are there existing uses of the name that want it to > mean something different? No. There is EMPTY_TREE in use, but it's exactly what we expect. It's probably still a good idea to separate "global" variables from per-file ones. But I don't feel strongly about this, so if a re-roll is required (or somebody votes for underscore removal, including you), I'll remove the underscore. -- Duy
Re: [PATCH v3 4/4] cache-tree: do not generate empty trees as a result of all i-t-a subentries
On Tue, Jul 12, 2016 at 10:49 PM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > >> diff --git a/cache-tree.c b/cache-tree.c >> index c2676e8..2d50640 100644 >> --- a/cache-tree.c >> +++ b/cache-tree.c >> @@ -380,6 +380,13 @@ static int update_one(struct cache_tree *it, >> continue; >> } >> >> + /* >> + * "sub" can be an empty tree if subentries are i-t-a. >> + */ >> + if (sub && sub->cache_tree->entry_count < 0 && >> + !hashcmp(sha1, EMPTY_TREE_SHA1_BIN)) >> + continue; >> + > > Looks sensible, except that it is unclear if it is correct to assume > that "subentries are ita" always equals to "entry_count < 0" here. > I _think_ it is OK, as the function itself does use the latter as > the sign that it hit to_invalidate=1 case when it returns. I had the same concern and double checked it. If one day we have a new entry_count < 0 case that's not i-t-a, we'll need to refactor this code a bit. -- Duy
Re: gc and repack ignore .git/*HEAD when checking reachability
On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin wrote: > Hi Duy, > > On Tue, 12 Jul 2016, Duy Nguyen wrote: > >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King wrote: >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: >> > >> >> I'm not opposed to letting one worktree see everything, but this move >> >> makes it harder to write new scripts (or new builtin commands, even) >> >> that work with both single and multiple worktrees because you refer >> >> to one ref (in current worktree perspective) differently. If we kill >> >> off the main worktree (i.e. git init always creates a linked worktree) >> >> then it's less of a problem, but still a nuisance to write >> >> refs/worktree/$CURRENT/ everywhere. >> > >> > True. I gave a suggestion for the reading side, but the writing side >> > would still remain tedious. >> > >> > I wonder if, in a worktree, we could simply convert requests to read or >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? >> > That makes it a read/write-time alias conversion, but the actual storage >> > is just vanilla (so the ref storage doesn't need to care, and >> > reachability just works). >> >> A conversion like that is already happening, but it works at >> git_path() level instead and maps anything outside refs/ to >> worktrees/$CURRENT. > > Wouldn't you agree that the entire discussion goes into a direction that > reveals that it might simply be a better idea to require commands that want > to have per-worktree refs to do that explicitly? No. To me that's equivalent to letting people deal explicitly with file-based and lmdb refs backends everywhere. Unless the main worktree concept dies (I doubt it will), it will remain the common use case that people care about, and extra worktrees become second-class citizens that are rarely tested. I prefer we have a single interface for dealing with _any_ worktree. If there are fallouts, we deal with them. > The same holds true for the config, BTW.
I really have no love for the > idea to make the config per-worktree. It just holds too many nasty > opportunities to violate the Law of Least Surprises. > > Just to name one: imagine you check out a different branch in worktree A, > then switch worktree B to the branch that A had, and all of a sudden you > may end up with a different upstream! Everything in moderation. You wouldn't want to enable sparse checkout on one worktree and have it suddenly affect all worktrees because core.sparsecheckout is shared. And submodules are known not to work when core.worktree is still shared. I will not enforce any rule (unless it's very obvious that the other way is wrong, like core.worktree). I will give you a rifle and you can either hunt for food or shoot your foot. In other words, you should be able to share everything if you like it that way, while someone else can share just some config vars, or even nothing in the config file. -- Duy
Re: [ANNOUNCE] Git v2.9.1
Hi, On Tue, 12 Jul 2016, Junio C Hamano wrote: > Jeff King writes: > > > In case it wasn't clear, I was mostly guessing there. So I dug a bit > > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t > > on i386 because of the ABI headaches. > > X-< (yes, I knew). > > > That being said, I still think the "clamp to time_t" strategy is > > reasonable. Unless you are doing something really exotic like pretending > > to be from the future, nobody will care for 20 years. > > Yup. It is a minor regression for them to go from ulong to time_t, > because they didn't have to care for 90 years or so but now they do > in 20 years, I'd guess, but hopefully after that many years, > everybody's time_t would be sufficiently large. > > I suspect COBOL programmers in the 50s would have said a similar > thing about the y2k timebomb they created back then, though ;-) > > > And at that point, systems with a 32-bit time_t are going to have > > to do _something_, because time() is going to start returning > > bogus values. So as long as we behave reasonably (e.g., clamping > > values and not generating wrapped nonsense), I think that's a fine > > solution. > > OK. I kept the unsigned long -> time_t conversion after reading the thread so far. Ciao, Dscho
Loan Offer
Instant cash Loan with same day payout on all kinds of Loan are available at Quick Financial Home were loan is offered at 2% per annul. Email: quickloa...@foxmail.com
Re: [PATCH v3] config: add conditional include
Jeff King writes: > On Wed, Jul 13, 2016 at 09:21:37AM +0200, Matthieu Moy wrote: > >> > +static int prepare_include_condition_pattern(struct strbuf *pat) >> > +{ >> > + int prefix = 0; >> > + >> > + /* TODO: maybe support ~user/ too */ >> > + if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) { >> > + struct strbuf path = STRBUF_INIT; >> > + const char *home = getenv("HOME"); >> > + >> > + if (!home) >> > + return error(_("$HOME is not defined")); >> >> expand_user_path in path.c seems to do the same as you're doing (but >> does deal with ~user). Any reason not to use it? > > I had a similar question, which Duy answered in: > > http://article.gmane.org/gmane.comp.version-control.git/298528 > > It does feel pretty hacky, though (especially for a case that seems > unlikely to come up: people having wildcard patterns in the name of > their home directory). Ah, OK. Then I'd suggest at least an improvement in the comment: /* -* perform literal matching on the prefix part so that -* any wildcard character in it can't create side effects. +* perform literal matching on the expanded prefix part +* so that any wildcard character in it (e.g. in the +* expansion of ~) can't create side effects. */ I would also rename the variable prefix -> expanded_prefix. As-is, the code is really hard to understand IMHO. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: [ANNOUNCE] Git v2.9.1
Hi Peff, On Tue, 12 Jul 2016, Jeff King wrote: > On Tue, Jul 12, 2016 at 01:25:20PM +0200, Johannes Schindelin wrote: > > > [PATCH] Work around test failures due to timestamps being unsigned long > > > > Git's source code refers to timestamps as unsigned longs. On 32-bit > > platforms, as well as on Windows, unsigned long is not large enough to > > capture dates that are "absurdly far in the future". > > > > While we will fix this issue properly by replacing unsigned long -> > > time_t, on the maint track we want to be a bit more conservative and > > just skip those tests. > > > > Signed-off-by: Johannes Schindelin > > --- > > This looks like a reasonable interim fix for both Windows and for the > general maint track. If we reliably produce "2038" for the 999... value > then that's simpler than adding in magic to ask about sizeof(ulong). I > also wondered if we should test for the _correct_ value, but that would > defeat the point of the test (999... is also far in the future). This was just a quick fix, intended to allow me to release Git for Windows v2.9.1 in a timely manner (which is now delayed for other reasons). I think I can do even better than inspecting the source code to see whether it reliably clamps the dates to PI hours on January 19th, 2038: We already have a t/helper/test-date and we can easily teach it one new trick.
> One minor comment: > > > diff --git a/t/t0006-date.sh b/t/t0006-date.sh > > index 04ce535..d185640 100755 > > --- a/t/t0006-date.sh > > +++ b/t/t0006-date.sh > > @@ -48,10 +48,17 @@ check_show default "$TIME" 'Wed Jun 15 16:13:20 2016 > > +0200' > > check_show raw "$TIME" '1466000000 +0200' > > check_show iso-local "$TIME" '2016-06-15 14:13:20 +0000' > > > > -# arbitrary time absurdly far in the future > > -FUTURE="5758122296 -0400" > > -check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400" > > -check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" > > +case "$(test-date show:iso 99)" in > > +*2038*) > > + # on this platform, unsigned long is 32-bit, i.e. not large enough > > + ;; > > +*) > > + # arbitrary time absurdly far in the future > > + FUTURE="5758122296 -0400" > > + check_show iso "$FUTURE" "2152-06-19 18:24:56 -0400" > > + check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" > > + ;; > > +esac > > It would be nice to wrap this in a prereq, like: > > test_lazy_prereq 64BIT_TIME ' > case "$(test-date show:short 99)" in > *2038*) > false > ;; > *) > true > ;; > esac > ' > > ... > check_show 64BIT_TIME iso "$FUTURE" "2152-06-19 18:24:56 -0400" > check_show 64BIT_TIME iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" > > The main advantage is that it will number the tests consistently, and > give us a "skipped" message (which can remind us to drop the prereq > later when everything goes 64-bit). > > But it also will do the right thing with test-date's output with respect > to "-v" (not that we expect it to produce any output). > > You'll need to adjust check_show as I did in my earlier patch. Makes sense! Thanks, Dscho
Re: [ANNOUNCE] Git v2.9.1
Hi Duy, On Tue, 12 Jul 2016, Duy Nguyen wrote: > On Tue, Jul 12, 2016 at 3:46 PM, Jeff King wrote: > > On Tue, Jul 12, 2016 at 03:31:00PM +0200, Andreas Schwab wrote: > > > >> Johannes Schindelin writes: > >> > >> > On Tue, 12 Jul 2016, Andreas Schwab wrote: > >> > > >> >> Johannes Schindelin writes: > >> >> > >> >> >> PRIuMAX isn't compatible with time_t. > >> >> > > >> >> > That statement is wrong. > >> >> > >> >> No, it isn't. PRIuMAX is for uintmax_t, and time_t is not uintmax_t > >> >> (even if they happen to have the same representation). > >> > > >> > Sigh. > >> > > >> > So if it is wrong, what is right? > >> > >> The right thing is to add a cast, of course. > > > > In general, I think the right cast for time_t should be to (intmax_t), > > and the formatting string should be PRIdMAX (which, as an aside, needs > > an entry in git-compat-util.h). > > Coincidentally, I have the same problem with unsigned long being > 32-bit and have to switch to off_t in some places. Does anybody know > what a fallback in git-compat-util for PRIdMAX would look like? I > guess it's "lld"... Yes, judging from the existing fallback for PRIuMAX, "lld" would be the correct thing to do. And then it would be nice to introduce #define PRIdMAX "I64d" next to the PRIuMAX definition in compat/mingw.h, too. Ciao, Dscho
Re: [ANNOUNCE] Git v2.9.1
Hi Andreas, On Tue, 12 Jul 2016, Andreas Schwab wrote: > Johannes Schindelin writes: > > > On Tue, 12 Jul 2016, Andreas Schwab wrote: > > > >> Johannes Schindelin writes: > >> > >> >> PRIuMAX isn't compatible with time_t. > >> > > >> > That statement is wrong. > >> > >> No, it isn't. PRIuMAX is for uintmax_t, and time_t is not uintmax_t > >> (even if they happen to have the same representation). > > > > Sigh. > > > > So if it is wrong, what is right? > > The right thing is to add a cast, of course. This was not helpful. Ciao, Johannes
Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
On Wed, Jul 13, 2016 at 04:26:53AM -0400, Jeff King wrote: > On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote: > > > > - we will not compute the same write order (which is based on > > > traversal order), leading to packs that have less efficient cache > > > characteristics > > > > I agree the order can be not exactly the same. Still if original pack is > > packed well (with good recency order), while using bitmap we will tend > > to traverse it in close to original order. > > > > Maybe I'm not completely right on this, but to me it looks to be the > > case because if objects in original pack are put there linearly sorted > > by recency order, and we use bitmap index to set of all reachable > > objects from a root, and then just _linearly_ gather all those objects > > from original pack by 1s in bitmap and put them in the same order into > > destination pack, the recency order won't be broken. > > > > Or am I maybe misunderstanding something? > > Yeah, I think you can go some of the way by reusing the order from the > old pack. But keep in mind that the bitmap result may also contain > objects that are not yet packed. Those will just come in a big lump at > the end of the bitmap (these are the "extended entries" in the bitmap > code). > > So I think if you were to repeatedly "git repack -adb" over time, you > would get worse and worse ordering as objects are added to the > repository. Jeff, first of all thanks for clarifying. So it is not-yet-packed-objects which make packing with bitmap less efficient. I was originally keeping in mind fresh repacked repository with just built bitmap index and for that case extracting pack with bitmap index seems to be just ok, but the more not-yet-packed objects we have the worse the result can be. > As an aside, two other things that pack order matters for: it makes the > bitmaps themselves compress better (because it increases locality of > reachability, so you get nice runs of "1" or "0" bits). 
Yes I agree and thanks for bringing this up - putting objects in recency order in pack also makes bitmap index to have larger runs of same 1 or 0. > It also makes > the pack-reuse code more efficient (since in an ideal case, you can just > dump a big block of data from the front of the pack). Note that the > pack-reuse code that's in upstream git isn't that great; I have a better > system on my big pile of patches to send upstream (that never seems to > get smaller; ). Yes, it also make sense. I saw write_reused_pack() in upstream git just copy raw bytes from original to destination pack. You mentioned you have something better for pack reuse - in your patch queue, in two words, is it now reusing pack based on object, not raw bytes, or is it something else? In other words in which way it works better? (I'm just curious here as it is interesting to know) > > > - we don't learn about the filename of trees and blobs, which is going > > > to make the delta step much less efficient. This might be mitigated > > > by turning on the bitmap name-hash cache; I don't recall how much > > > detail pack-objects needs on the name (i.e., the full name versus > > > just the hash). > > > > If I understand it right, it uses only uint32_t name hash while searching. > > From > > pack-objects.{h,c} : > > Yeah, I think you are right. Not having the real names is a problem for > doing rev-list output, but I think pack-objects doesn't care (though do > note that the name-hash cache is not enabled by default). Yes, for packing it is only hash which is used. And I assume name-hash for bitmap is not enabled by default for compatibility with JGit code. It would make sense to me to eventually enable name-hash bitmap extension by default, as packing result is much better with it. And those who care about compatibility with JGit can just turn it off in their git config. Just my thoughts. > > > There may be other subtle things, too. 
The general idea of tying the > > > bitmap use to pack_to_stdout is that you _do_ want to use it for > > > serving fetches and pushes, but for a full on-disk repack via gc, it's > > > more important to generate a good pack. > > > > It is better we send good packs to clients too, right? And with > > pack.writeBitmapHashCache=true and retaining recency order (please see > > above, but again maybe I'm not completely right) to me we should be still > > generating a good pack while using bitmap reachability index for object > > graph traversal. > > We do want to send the client a good pack, but it's always a tradeoff. > We could spend much more time searching for the perfect delta, but at > some point we have to decide on how much CPU to spend serving them. > Likewise, even if the bitmapped packs we send are in slightly worse > order, saving a minute of CPU time off of every clone of the kernel is a > big deal. Yes, this I understand and agree. Like I said above I was imagining freshly repacked repo with recently rebuilt
Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
On Tue, Jul 12, 2016 at 10:08:08PM +0300, Kirill Smelkov wrote: > > Or are we fine with my arguments about recency order staying the same > > when using bitmap reachability index for object graph traversal, and this > > way the patch is fine to go in as it is? > > Since there is no reply I assume the safe way to go is to let default > for pack-to-file case to be "not using bitmap index". Please find updated > patch and interdiff below. I would still be grateful for feedback on > my above use-bitmap-for-pack-to-file arguments. Yeah, I think that is a reasonable approach. I see here you've added new config, though, and I don't think we want that. For your purposes, where you're driving pack-objects individually, I think a command-line option makes more sense. If we did want to have a flag for "use bitmaps when repacking via repack", I think it should be "repack.useBitmaps", and git-repack should pass the command-line option to pack-objects. pack-objects is plumbing and should not really be reading config at all. You'll note that pack.writeBitmaps was a mistake and got deprecated in favor of repack.writeBitmaps. I think pack.useBitmaps is a mistake, too, but nobody has really noticed or cared because there's no good reason to set it (the more interesting question is: are there bitmaps available? and if so, we try to use them). -Peff
Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote:

> > - we will not compute the same write order (which is based on
> >   traversal order), leading to packs that have less efficient cache
> >   characteristics
>
> I agree the order can be not exactly the same. Still if original pack
> is packed well (with good recency order), while using bitmap we will
> tend to traverse it in close to original order.
>
> Maybe I'm not completely right on this, but to me it looks to be the
> case because if objects in original pack are put there linearly sorted
> by recency order, and we use bitmap index to get the set of all
> reachable objects from a root, and then just _linearly_ gather all
> those objects from original pack by 1s in bitmap and put them in the
> same order into destination pack, the recency order won't be broken.
>
> Or am I maybe misunderstanding something?

Yeah, I think you can go some of the way by reusing the order from the
old pack. But keep in mind that the bitmap result may also contain
objects that are not yet packed. Those will just come in a big lump at
the end of the bitmap (these are the "extended entries" in the bitmap
code). So I think if you were to repeatedly "git repack -adb" over
time, you would get worse and worse ordering as objects are added to
the repository.

As an aside, two other things that pack order matters for: it makes
the bitmaps themselves compress better (because it increases locality
of reachability, so you get nice runs of "1" or "0" bits). It also
makes the pack-reuse code more efficient (since in an ideal case, you
can just dump a big block of data from the front of the pack). Note
that the pack-reuse code that's in upstream git isn't that great; I
have a better system on my big pile of patches to send upstream (that
never seems to get smaller ;) ).

> > - we don't learn about the filename of trees and blobs, which is
> >   going to make the delta step much less efficient.
> > This might be mitigated by turning on the bitmap name-hash cache; I
> > don't recall how much detail pack-objects needs on the name (i.e.,
> > the full name versus just the hash).
>
> If I understand it right, it uses only a uint32_t name hash while
> searching. From pack-objects.{h,c}:

Yeah, I think you are right. Not having the real names is a problem for
doing rev-list output, but I think pack-objects doesn't care (though do
note that the name-hash cache is not enabled by default).

> > There may be other subtle things, too. The general idea of tying the
> > bitmap use to pack_to_stdout is that you _do_ want to use it for
> > serving fetches and pushes, but for a full on-disk repack via gc,
> > it's more important to generate a good pack.
>
> It is better we send good packs to clients too, right? And with
> pack.writeBitmapHashCache=true and retaining recency order (please see
> above, but again maybe I'm not completely right) to me we should still
> be generating a good pack while using bitmap reachability index for
> object graph traversal.

We do want to send the client a good pack, but it's always a tradeoff.
We could spend much more time searching for the perfect delta, but at
some point we have to decide on how much CPU to spend serving them.
Likewise, even if the bitmapped packs we send are in slightly worse
order, saving a minute of CPU time off of every clone of the kernel is
a big deal.

We also take robustness shortcuts when sending to clients. For example,
when doing an on-disk repack we re-crc32 all of the delta data we are
reusing, even if we don't actually inflate it (because we would want to
stop immediately if we see even a single bit flipped on disk). But we
don't check them when sending to a client, because we know they are
going to actually `index-pack` it and get a stronger consistency check
anyway, and don't want to waste server CPU.

The bitmaps are sort of the same.
If there is a bug or corruption in the bitmap, the worst case is that
we send a broken pack to the client, who will complain that we did not
give them all of the objects. It's a momentary problem that can be
fixed. If you use them for an on-disk repack, then the next step is
usually to delete all of the old packs. So a corruption there carries
forward, and is irreversible.

As I understand your use case, it is OK to do the less careful things.
It's just that pack-objects until now has been split into two modes:
packing to a file is careful, and packing to stdout is less so. And you
want to pack to a file in the non-careful mode.

> > but I wonder if you should be using "pack-objects --stdout"
> > yourself.
>
> I already tried --stdout. The problem is on repository extraction we
> need to both extract the pack and index it. While `pack-objects file`
> does both, for the --stdout case we need to additionally index the
> extracted pack with `git index-pack`, and standalone `git index-pack`
> is very slow - in my experience much slower than generating the pack
> itself:

Ah, right, that
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy,

On Tue, 12 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King wrote:
> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
> >
> >> I'm not opposed to letting one worktree see everything, but this
> >> move makes it harder to write new scripts (or new builtin commands,
> >> even) that work with both single and multiple worktrees because you
> >> refer to one ref (in the current worktree perspective) differently.
> >> If we kill off the main worktree (i.e. git init always creates a
> >> linked worktree) then it's less of a problem, but still a nuisance
> >> to write refs/worktree/$CURRENT/ everywhere.
> >
> > True. I gave a suggestion for the reading side, but the writing side
> > would still remain tedious.
> >
> > I wonder if, in a worktree, we could simply convert requests to read
> > or write names that do not begin with "refs/" as
> > "refs/worktree/$CURRENT/"? That makes it a read/write-time alias
> > conversion, but the actual storage is just vanilla (so the ref
> > storage doesn't need to care, and reachability just works).
>
> A conversion like that is already happening, but it works at
> git_path() level instead and maps anything outside refs/ to
> worktrees/$CURRENT.

Wouldn't you agree that the entire discussion goes in a direction that
reveals it might simply be a better idea to require commands that want
per-worktree refs to ask for that explicitly? I mean, it looks to me
like the harder we try to avoid that, the more problems crop up, some
of them as serious as my reported data loss.

I do not see any indication that trying even harder to "protect"
commands from knowing that they are running in one of many worktrees is
making things easier. To the contrary, I expect that direction to hold
many more awful surprises for us.

The same holds true for the config, BTW. I really have no love for the
idea of making the config per-worktree.
It just holds too many nasty opportunities to violate the Law of Least
Surprise. Just to name one: imagine you check out a different branch in
worktree A, then switch worktree B to the branch that A had, and all of
a sudden you may end up with a different upstream!

Ciao,
Dscho
Re: [PATCH] Add very verbose porcelain output to status
Hi Jeff,

[please note that top-posting is discouraged on this mailing list]

On Tue, 12 Jul 2016, Jeff Hostetler wrote:

> Thanks for the feedback. I like the overall direction of your
> suggestions. Going for a porcelain V2 feels better than piling onto
> verbose. I think that would also give us a little more license to
> address some of the line format issues that you point out. Let me
> retool what I have along these lines and see how it looks.

Thank you for working on this. I will hold off from releasing Git for
Windows v2.9.1 until this is sorted out, as I already merged v1 into
our `master` and do not want to end up with a bogus release.

Ciao,
Dscho
Re: What's cooking in git.git (Jul 2016, #04; Mon, 11)
Hi Junio,

On Tue, 12 Jul 2016, Junio C Hamano wrote:

> On Tue, Jul 12, 2016 at 6:16 AM, Johannes Schindelin wrote:
> >
> > On Mon, 11 Jul 2016, Junio C Hamano wrote:
> >
> >> [New Topics]
> >>
> >> [...]
> >
> > What about
> > http://thread.gmane.org/gmane.comp.version-control.git/299050?
>
> Not forgotten.
>
> It just is not one of the "New Topics" yet, together with several
> other patches sent to the list recently, hence it is not listed there.

Thanks. I just nudged you because this is a patch I needed to include
in Git for Windows and would have loved more reviews than just mine.
Not a big deal, especially because it is not forgotten ;-)

Ciao,
Dscho
Re: [PATCH 0/9] Resend of gitster/pb/bisect
Hi Pranit,

On Wed, Jul 13, 2016 at 12:35 AM, Pranit Bauva wrote:
> Hey Junio,
>
> A small mistake went unnoticed by me which Lars recently pointed out.
> The naming convention is "git_path_" with an underscore instead of
> spaces.

It's a good thing to resend when you find mistakes, but please use a
version number for your patch series (like "PATCH v3" or something).

Thanks,
Christian.
Re: [PATCH v3] config: add conditional include
Nguyễn Thái Ngọc Duy writes:

> +`gitdir`::
> +	The environment variable `GIT_DIR` must match the following
> +	pattern for files to be included. The pattern can contain
> +	standard globbing wildcards and two additional ones, `**/` and
> +	`/**`, that can match multiple path components. Please refer
> +	to linkgit:gitignore[5] for details. For convenience:

It's unclear to me whether the whole content of GIT_DIR must match, or
whether the pattern should be included in (or be a prefix of) $GIT_DIR.
From this text, I read it as "the whole content", but ...

> +	; include for all repositories inside /path/to/group
> +	[include "gitdir:/path/to/group/"]
> +		path = /path/to/foo.inc
> +
> +	; include for all repositories inside $HOME/to/group
> +	[include "gitdir:~/to/group/"]
> +		path = /path/to/foo.inc

... here it seems it only has to be a prefix.

> +static int prepare_include_condition_pattern(struct strbuf *pat)
> +{
> +	int prefix = 0;
> +
> +	/* TODO: maybe support ~user/ too */
> +	if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
> +		struct strbuf path = STRBUF_INIT;
> +		const char *home = getenv("HOME");
> +
> +		if (!home)
> +			return error(_("$HOME is not defined"));

expand_user_path in path.c seems to do the same as you're doing (but
does deal with ~user). Any reason not to use it?

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Re: [PATCH v3] config: add conditional include
On Wed, Jul 13, 2016 at 09:21:37AM +0200, Matthieu Moy wrote:

> > +static int prepare_include_condition_pattern(struct strbuf *pat)
> > +{
> > +	int prefix = 0;
> > +
> > +	/* TODO: maybe support ~user/ too */
> > +	if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
> > +		struct strbuf path = STRBUF_INIT;
> > +		const char *home = getenv("HOME");
> > +
> > +		if (!home)
> > +			return error(_("$HOME is not defined"));
>
> expand_user_path in path.c seems to do the same as you're doing (but
> does deal with ~user). Any reason not to use it?

I had a similar question, which Duy answered in:

  http://article.gmane.org/gmane.comp.version-control.git/298528

It does feel pretty hacky, though (especially for a case that seems
unlikely to come up: people having wildcard patterns in the name of
their home directory).

-Peff
Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
On Tue, Jul 12, 2016 at 10:38 PM, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy writes:
>
>> Since I now could reproduce the problem that Christoph showed, I
>> decided to send the good patches out. To sum up, we use "unsigned
>> long" in some places related to file size. On 32-bit systems, it's
>> limited to 32 bits even though the system can handle files larger
>> than that (off_t is 64-bit). This fixes it.
>>
>> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
>> have a couple more patches to clean up all these warnings, but some
>> need more code study to see what is the right way to do it.
>>
>> Most of the rest seems harmless, except for the local variable
>> "size" in builtin/pack-objects.c:write_one(). I might send 6/5 for
>> that one.
>
> With 1-7/5 it appears t1050 is broken?

Oops, sorry, I focused on testing big repos and forgot the test suite.
But heh, it's a good thing. That test was added by 735efde, where fsck
could keep going after failing to inflate some blobs (instead of
hitting a malloc failure and aborting). Now it's even better, fsck
passes. I'll fix that up and resend with your squashes in.
--
Duy