Looking for data on git

2016-07-13 Thread Fred

Hello,

My name is Jensen. I am interested in git. I would like to hear
from you.


Greetings from the States,
--
Cal Dershowitz

aka Merrill Jensen in real life
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-07-13 Thread Torsten Bögershausen



On 07/13/2016 12:20 AM, Joey Hess wrote:

Torsten Bögershausen wrote re jh/clean-smudge-annex:

The thing is that we need to check the file system to find .gitattributes,
even if we just did it 1 nanosecond ago.

So the .gitattributes lookup is done 3 times:
-1 would_convert_to_git_filter_fd(
-2 assert(would_convert_to_git_filter_fd(path));
-3 convert.c/convert_to_git_filter_fd()

The only situation where this could be useful is when the .gitattributes
change between -1 and -2,
but then they would have changed between -1 and -3, and convert.c
will die().

Does it make sense to remove -2 ?


There's less redundant work going on than it at first seems, because
.gitattributes files are only read once and cached. Verified by strace.


OK, I think I missed that work (not enough time for Git at the moment).
Junio, please help me out: do we have a cache here now?
I tried to figure that out by following your attr branch, but failed.

So, the redundant work is only in the processing that convert_attrs() does
of the cached .gitattributes.

Notice that there was a similar redundant triple call to convert_attrs()
before my patch set:

1. index_fd checks would_convert_to_git_filter_fd
2. index_stream_convert_blob does assert(would_convert_to_git_filter_fd(path))
   (Again redundantly since 1. is its only caller and has already
   checked.)
3. in convert_to_git_filter_fd

If convert_attrs() is somehow expensive, it might be worth passing a
struct conv_attrs * into the functions that currently call
convert_attrs(). But it does not look expensive to me.

I have that on the list, but it seems to be unnecessary now.


I think it would be safe enough to remove both asserts, at least as the
code is structured now.


OK.


Re: Https password present in git output

2016-07-13 Thread Jeff King
On Thu, Jul 14, 2016 at 01:36:52AM +0300, ervion wrote:

> It is in fact the case that git fetch output is scrubbed; sorry I did not
> notice previously.
> But (on my device: git version 2.9.0 arch linux) git push is not.
> $ git push origin --all

Maybe this?

-- >8 --
Subject: [PATCH] push: anonymize URL in status output

Commit 47abd85 (fetch: Strip usernames from url's before
storing them, 2009-04-17) taught fetch to anonymize URLs.
The primary purpose there was to avoid sticking passwords in
merge-commit messages, but as a side effect, we also avoid
printing them to stderr.

The push side does not have the merge-commit problem, but it
probably should avoid printing them to stderr. We can reuse
the same anonymizing function.

Note that for this to come up, the credentials would have to
appear either on the command line or in a git config file,
neither of which is particularly secure. So people _should_
be switching to using credential helpers instead, which
makes this problem go away. But that's no excuse not to
improve the situation for people who for whatever reason end
up using credentials embedded in the URL.

Signed-off-by: Jeff King 
---
 t/t5541-http-push-smart.sh | 7 +++++++
 transport.c                | 7 +++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index fd7d06b..8d08e06 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -368,5 +368,12 @@ test_expect_success GPG 'push with post-receive to inspect certificate' '
test_cmp expect "$HTTPD_DOCUMENT_ROOT_PATH/push-cert-status"
 '
 
+test_expect_success 'push status output scrubs password' '
+   test_commit scrub &&
+   git push --porcelain "$HTTPD_URL_USER_PASS/smart/test_repo.git" >status &&
+   # should have been scrubbed down to vanilla URL
+   grep "^To $HTTPD_URL/smart/test_repo.git" status
+'
+
 stop_httpd
 test_done
diff --git a/transport.c b/transport.c
index 095e61f..be4a63e 100644
--- a/transport.c
+++ b/transport.c
@@ -359,8 +359,11 @@ static void print_ok_ref_status(struct ref *ref, int porcelain)
 
static int print_one_push_status(struct ref *ref, const char *dest, int count, int porcelain)
 {
-   if (!count)
-   fprintf(porcelain ? stdout : stderr, "To %s\n", dest);
+   if (!count) {
+   char *url = transport_anonymize_url(dest);
+   fprintf(porcelain ? stdout : stderr, "To %s\n", url);
+   free(url);
+   }
 
switch(ref->status) {
case REF_STATUS_NONE:
-- 
2.9.1.356.g3c37bc7



Re: Https password present in git output

2016-07-13 Thread ervion


I completely agree that it is not a head-on-fire kind of problem; there
are ways to avoid it.

Simply nice to have.

It is in fact the case that git fetch output is scrubbed; sorry I did
not notice previously.

But (on my device: git version 2.9.0 arch linux) git push is not.
$ git push origin --all

Results in:
/---/
To https://username:passw...@domeen.com/git/repo.git
   xxx..zzz  master -> master

On 13.07.2016 21:16, Junio C Hamano wrote:
On Wed, Jul 13, 2016 at 11:09 AM, Junio C Hamano wrote:

ervion  writes:

Sometimes using ssh is not possible and saving the https password in
plain text to disk may be desirable
(in case of an encrypted disk it would be equivalent security to
caching the password in memory).

One possibility for this in git is to save remote in the
https://username:passw...@domain.com/repo.git format.


Wasn't netrc support added exactly because users do not want to do
this?


Interesting. Even with "auth in URL", I seem to get this:

$ git fetch -v -v https://gitster:p...@github.com/git/git refs/tags/v2.9.1

From https://github.com/git/git
 * tag   v2.9.1 -> FETCH_HEAD

Notice that "From $URL" has the userinfo (3.2.1 in RFC 3986) scrubbed.

If you are seeing somewhere we forgot to scrub userinfo in a similar way in
the output, we should. Where do you see "present in git OUTPUT" as you
said in the subject? What command with what options exactly and in what
part of the output?

Thanks.



Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Jeff King
On Wed, Jul 13, 2016 at 03:41:01PM -0700, Junio C Hamano wrote:

> Stefan Beller  writes:
> 
> >>> I think Shawn's proposal to have a receive.maxCommandBytes is a
> >>> good way for an overall upper bound, but how does it stop us from
> >>> going forward with this series?
> >>
> >> If we were to do maxcommandbytes, then max_options would become
> >> irrelevant, no?
> >
> > Maybe?
> >
> > I do not know what kind of safety measures we want in place here, and
> > if we want to go for overlapping things?
> >
> > Currently there are none at all in your upstream code, although you cannot
> > push arbitrarily large things to either Shawn's or Peff's $Dayjob servers, so
> > I wonder if we want to either agree on one format or on many overlapping
> > things, as some different hosts may perceive different things as DoS threats,
> > so they can fine tune as they want?
> 
> I think those extra knobs can come later.  If we are not going to
> limit with max_options in the end, however, wouldn't it be more
> natural for the initial iteration without any configuration not to
> have hard-coded max_options at all?

Yeah, I am OK with adding restrictive knobs later as a separate topic.
As Stefan notes, upstream does not have the other knobs anyway, and IIRC
the push-options feature is not even enabled by default.

-Peff


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Junio C Hamano
Stefan Beller  writes:

>>> I think Shawn's proposal to have a receive.maxCommandBytes is a
>>> good way for an overall upper bound, but how does it stop us from
>>> going forward with this series?
>>
>> If we were to do maxcommandbytes, then max_options would become
>> irrelevant, no?
>
> Maybe?
>
> I do not know what kind of safety measures we want in place here, and
> if we want to go for overlapping things?
>
> Currently there are none at all in your upstream code, although you cannot
> push arbitrarily large things to either Shawn's or Peff's $Dayjob servers, so
> I wonder if we want to either agree on one format or on many overlapping
> things, as some different hosts may perceive different things as DoS threats,
> so they can fine tune as they want?

I think those extra knobs can come later.  If we are not going to
limit with max_options in the end, however, wouldn't it be more
natural for the initial iteration without any configuration not to
have hard-coded max_options at all?

As to the "SQUASH???" compilation fix, I can squash it to the one
immediately below it locally; I didn't do so in today's pushout, as
it was still unclear if you are already working on a reroll (in
which case anything I would do would be a wasted effort).




Re: [PATCH v14 00/21] index-helper/watchman

2016-07-13 Thread David Turner

On 07/12/2016 02:24 PM, Duy Nguyen wrote:

Just thinking out loud. I've been thinking about this some more.
After the move from signal-based to unix socket for communication, we
probably are better off with a simpler design than the shm-alike one
we have now.

What if we send everything over a socket or a pipe? Sending 500MB over
a unix socket takes 253ms, that's insignificant when operations on an
index that size usually take seconds. If we send everything over
socket/pipe, we can trust data integrity and don't have to verify,
even the trailing SHA-1 in shm file.


I think it would be good to make index operations not take seconds.

In general, we should not need to verify the trailing SHA-1 for shm
data.  The index-helper verifies it when it loads it, but git
(e.g. git status) should not need to verify it again.


Also, if we have two git commands running at the same time, the 
index-helper can only serve one at a time; with shm, both can run at 
full speed.



So, what I have in mind is this: at index read time, instead of opening a
socket, we run a separate program and communicate via pipes. We can
exchange capabilities if needed, then the program sends the entire
current index, the list of updated files back (and/or the list of dirs
to invalidate). The design looks very much like a smudge/clean filter.


This seems very complicated.  Now git status talks to the separate 
program, which talks to the index-helper, which talks to watchman.  That 
is a lot of steps!


I think we should wait until we have heard from the Windows folks what the 
problems with the current solution are, and see what design they come up 
with.



For people who don't want extra daemon, they can write a short script
that saves indexes somewhere in tmpfs, and talk to watchman or
something else. I haven't written this script, but I don't think it
takes long to write one. Windows folks have total freedom to implement
a daemon, a service or whatever and use this program as front end. How
the service talks to this program is totally up to them. For people
who want to centralize everything, they can have just one daemon and
have the script talk to this daemon.

I can see that getting rid of file-based stuff simplifies some
patches. We can still provide a daemon to do more advanced stuff (or
to make it work out of the box). But it's not a hard requirement and
we probably don't need to include one right now. And I think it makes
it easier to test as well because we can just go with some fake file
monitor service instead of real watchman.


I think the daemon also has the advantage that it can reload the index 
as soon as it changes.  This is not quite implemented, but it would be 
pretty easy to do.  That would save a lot of time in the typical workflow.



Re: [feature request] Warn about or prevent --amend commits that don't change anything

2016-07-13 Thread Bergi

Junio C Hamano wrote:

Bergi  writes:


when nothing is staged in the index then `git commit` warns about this
fact with either "nothing to commit, working directory clean" or "no
changes added to commit".
However, `git commit --amend --no-edit` will happily record a new
commit that differs in nothing but its commit date from the original.


What kind of "mistake" are you afraid of?


The particular mistake that happened to me was that I `git push -f 
origin`'ed the newly created commit (which worked as expected, 
overwriting the original commit) and then left my PC in the belief 
that I had successfully pushed my changes.



I can sort of see that "git commit --amend" might want to show two
summary diffstat outputs at the end, unlike "git commit" that shows
what changes were recorded relative to the parent.  In addition to
that "final result is different in this way from the parent" output,
you might also want "this is the change you made by amending", and
the fact that you can notice you didn't add anything by the
latter being empty _might_ give you additional peace of mind.


Yes, that would be incredibly helpful as well, but is not what I am 
after here.


However `git commit --amend` already shows me that I still have "changes 
not staged for commit" when editing the commit message, so that would've 
been enough for my use case.
The problem is that I used `git commit --amend --no-edit`, which 
neither warned me about missing staged changes nor about existing 
unstaged changes.
If nothing is staged and the user does not intend to edit any metadata, 
git should be able to deduce that the user (me) forgot something.



But is that the kind of mistake you are worried about?  IOW, you
thought you made and added changes X, Y and Z to the index before
running your "commit --amend", but you forgot the "add" step and did
not add anything?


Yes, exactly.


If so, […] your "you need --allow-empty if you really do not want
any changes to the tree" would not [help much] either, if you added
X and Y but forgot to add Z.


A good remark.
Maybe in that case it could warn about unstaged edits when --no-edit 
is used, but still succeed.



So I am sensing a slight XY problem here.


That might well be, I'm just here to tell my story, it's the designers 
that need to decide at which step I went wrong and which parts could be 
improved.


Kind regards,
 Bergi


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Junio C Hamano  writes:

> It is somewhat disturbing that nobody seems to be regularly building
> on 32-bit platforms these days, which is the only reason I can think
> of why this was never reported until it hit a maintenance track.
> This should have been caught last week at f6a729f3 (Merge branch
> 'jk/tzoffset-fix', 2016-07-06) when the topic hit 'master' at the
> latest, and more preferably it should have already been caught last
> month at 08ec8c5e (Merge branch 'jk/tzoffset-fix' into next,
> 2016-06-28).
>
> Those who care about 32-bit builds need to start building and
> testing 'next' and 'master' regularly, or similar breakages are
> bound to continue happening X-<.
>
> Volunteers?

We might eventually see a volunteer or two but that hasn't happened
yet, at least in the past few days.

Does Travis CI testing have an option to run our tests on some
32-bit platforms?


Re: Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Junio C Hamano
Jehan Pagès  writes:

> ... A better name would
> have been --invert-matches, or even just --invert.
> And the man description should definitely be completed, IMO.

Renaming the options would not work well without harming existing
users, but a patch to update the documentation is certainly very
welcome.


Re: Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Jehan Pagès
Hi,

On Wed, Jul 13, 2016 at 7:37 PM, Junio C Hamano  wrote:
> Jehan Pagès  writes:
>
>>> I think --author=someone greps the "author " field in the commit
>>> object looking for the hit with "someone", and your request asks to
>>> show commits that either do not have "something" or were not written
>>> by "someone", I would guess.
>>
>> Note that I can still see commits with "something", and I can also see
>> commits by "someone" in my results. So my request actually asks for
>> commits which neither contain "something" nor are by "someone".
>>
>> Anyway I don't think that's the expected result, hence is still a bug.
>> Am I wrong?
>
> Unlike "git grep", "git log" works with a boolean expression that does
> not explicitly have a way to say "--and" and "--or", so only one
> interpretation has been chosen a long time ago.  All the terms are
> ORed together, and then the whole thing can be inverted with
> "--invert-grep".  i.e. you are telling an equivalent of
>
> $ git grep --not \( -e "author someone" --or -e "something" \)
>
> with the command line, and there is no way to combine the requested
> match criteria (there are two, "author must be somebody" and
> "something must be in the message") differently.
>
> Given that, that is the "right" expectation, and as long as you get
> the behaviour, there is no "bug".

I see. Well, it's a little counter-intuitive though, since the option
is called --invert-grep. And the man page says:

« Limit the commits output to ones with log message that do not match
the pattern specified with --grep=. »

So it gives the impression that the option is made only to invert the
--grep part. Whereas in fact, you are saying it inverts any other
optional selection (or at least also --author). A better name would
have been --invert-matches, or even just --invert.
And the man description should definitely be completed, IMO.

> You can call it a lack of feature, though, and patches to implement
> a more flexible combination of match criteria like "git grep" allows
> are always welcome.

Maybe I will! It would be useful to have a few more capabilities
for commit selection.
Thanks.

Jehan

-- 
ZeMarmot open animation film
http://film.zemarmot.net
Patreon: https://patreon.com/zemarmot
Tipeee: https://www.tipeee.com/zemarmot


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Johannes Schindelin  writes:

>> I was hoping to hear from you sooner and do v2.9.2 with your t0006
>> workaround with lazy-prereq changes from Peff (i.e. "Makes sense!"
>> above), so that you do not have to do two releases in a row
>> (i.e. skipping v2.9.1 saying "Git for Windows skipped that one
>> because it was not quite right; this release fixes the issue" in
>> your v2.9.2 announcement).
>
> I am sorry that you expected me to be more available. It is a pretty crazy
> week already (and trying to get a Git for Windows v2.9.1 out of the door
> after dropping everything else on Tuesday morning added quite a bit to the
> load), so I am at times unable to even read the Git mailing list.

Actually this back-and-forth was an attempt to help you reduce the
load by not having to worry about the t0006 workaround.  Checking
your inbox would have been quicker than writing another version of
your own.

> As I am more concerned with Jeff Hostetler's patch now (the "very verbose
> porcelain status"; I merged the Pull Request after seeing no comments on
> his mail, but then Peff provided some good feedback, so now I am not quite
> certain how to go about it: revert, or wait if the 2nd iteration is
> acceptable and take that), so I am not immediately releasing any version,
> anyway.

As I said, I'd be waiting for a reroll of that to queue on 'pu', but
as a new feature, it won't appear in any of the v2.9.x releases.


Re: Multiple Keys in ssh-agent, fail to clone

2016-07-13 Thread Junio C Hamano
Benjamin Fritsch  writes:

> I read the Changelog for 2.9 and couldn’t find any reference to changed key 
> handling. Is there anything that I can add to the `git clone` command to get 
> the old behavior?

I do not think this has much to do with the version of Git, unless
you are getting an updated SSH client together with your new version
of Git from whoever distributes it.

And it may not even be related to the SSH version.  Did you change your
~/.ssh/config recently by any chance?  I personally do load many
keys (one per destination) in the agent, and back when I didn't know
better, I didn't have an IdentityFile line for each host, i.e. the last
lines in these two entries were missing in my ~/.ssh/config:

Host foo.bar.com
  Protocol 2
  User gitolite
  IdentityFile ~/.ssh/foo-bar-com

Host foo.baz.com
  Protocol 2
  User junio
  IdentityFile ~/.ssh/foo-baz-com

If you try to do "ssh -v -v foo.bar.com" with such a configuration,
you would observe that many keys are "offered" to the other side to
see if it is the one that it recognises, and you end up offering too
many of them before the right one.  An output from such a failing
session of "ssh -v" ends like this for me:

$ ssh -v foo.bar.com
...
debug1: Offering DSA public key: foo-baz-com
debug1: Authentications that can continue: publickey
debug1: Offering RSA public key: xxy-fsa-com
debug1: Authentications that can continue: publickey
debug1: Offering DSA public key: github
debug1: Authentications that can continue: publickey
debug1: ...
debug1: Offering RSA public key: gitorious
debug1: Authentications that can continue: publickey
Received disconnect from 192.168.1.1: 2: Too many authentication failures for gitolite
Disconnected from 192.168.1.1

If I do not have "IdentityFile ~/.ssh/foo-bar-com" line for the
"Host foo.bar.com" part, "ssh -v foo.bar.com" cannot know which
one of the keys it has available can be used to authenticate you
with foo.bar.com, so it ends up asking "do you know this key and
would you allow me to access you?" for each and every key.  Having
the line lets it use the appropriate key right at the beginning,
avoids leaking (they are "public" keys, so a "leak" is not that
serious a problem, though) the other public keys you have, and makes
your authentication likely to succeed.


Re: [feature request] Warn about or prevent --amend commits that don't change anything

2016-07-13 Thread Junio C Hamano
Bergi  writes:

> when nothing is staged in the index then `git commit` warns about this
> fact with either "nothing to commit, working directory clean" or "no
> changes added to commit".
> However, `git commit --amend --no-edit` will happily record a new
> commit that differs in nothing but its commit date from the original.
>
> This is unexpected and can lead to mistakes.

What kind of "mistake" are you afraid of?

I can sort of see that "git commit --amend" might want to show two
summary diffstat outputs at the end, unlike "git commit" that shows
what changes were recorded relative to the parent.  In addition to
that "final result is different in this way from the parent" output,
you might also want "this is the change you made by amending", and
the fact that you can notice you didn't add anything by the
latter being empty _might_ give you additional peace of mind.

But is that the kind of mistake you are worried about?  IOW, you
thought you made and added changes X, Y and Z to the index before
running your "commit --amend", but you forgot the "add" step and did
not add anything?  If so, even the second diff i.e. "this is what
you changed by amending" would not help very much, and your "you
need --allow-empty if you really do not want any changes to the
tree" would not either, if you added X and Y but forgot to add Z.

So I am sensing a slight XY problem here.


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 13 Jul 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > This was just a quick fix, intended to allow me to release Git for Windows
> > v2.9.1 in a timely manner (which is now delayed for other reasons).
> > ...
> >> You'll need to adjust check_show as I did in my earlier patch.
> >
> > Makes sense!
> 
> Hmph, so what is your preferred approach?  You'll do your own v2.9.1
> that is different from others at t0006?

I may do a Git for Windows v2.9.1 that is different from Git v2.9.1 in
t0006, yes. Git for Windows still has tons of patches on top of Git
because I seem to be unable to drain the patch queue as fast as I want, so
I do not believe it is a big deal.

> I was hoping to hear from you sooner and do v2.9.2 with your t0006
> workaround with lazy-prereq changes from Peff (i.e. "Makes sense!"
> above), so that you do not have to do two releases in a row
> (i.e. skipping v2.9.1 saying "Git for Windows skipped that one
> because it was not quite right; this release fixes the issue" in
> your v2.9.2 announcement).

I am sorry that you expected me to be more available. It is a pretty crazy
week already (and trying to get a Git for Windows v2.9.1 out of the door
after dropping everything else on Tuesday morning added quite a bit to the
load), so I am at times unable to even read the Git mailing list.

As I am more concerned with Jeff Hostetler's patch now (the "very verbose
porcelain status"; I merged the Pull Request after seeing no comments on
his mail, but then Peff provided some good feedback, so now I am not quite
certain how to go about it: revert, or wait if the 2nd iteration is
acceptable and take that), so I am not immediately releasing any version,
anyway.

Ciao,
Dscho


Multiple Keys in ssh-agent, fail to clone

2016-07-13 Thread Benjamin Fritsch
Hey all,

We recently upgraded from Git 2.8 to 2.9 and saw an issue when multiple 
keys are added to my ssh-agent.

I have two keys. 
- KeyA (my company that has access to the repository I want to clone)
- KeyB (just my personal key with access to my personal stuff)

Having both keys loaded and listed in `ssh-add -L` fails to clone the 
repository. I tried to change the order of the keys in the agent, but neither 
KeyA, KeyB nor KeyB, KeyA will work. The only case that works is if I have KeyA 
loaded and no other key added to the ssh-agent.

Having multiple keys loaded works with Git 2.8 and Git 2.7 (I didn’t try older 
versions).
Cloning fails with “Unauthorized Access” from our Git provider. (It’s Bitbucket 
in this case.)

I read the Changelog for 2.9 and couldn’t find any reference to changed key 
handling. Is there anything that I can add to the `git clone` command to get 
the old behavior?

Thank you for your help,
Best
Ben


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Johannes Schindelin  writes:

> How about this one instead (which is part of the time_t-may-be-int64
> branch on https://github.com/dscho/git which I still have to complete, as
> some unsigned longs still slipped out of my previous net)? It strikes me
> as much more robust:

Hmm, sorry, I do not see much upside here (iow, the 2038 test you
came up with is as robust).  When the internal time representation
is updated from "unsigned long" to a signed and wider type [*1*],
test-date has to stop reading the seconds-from-epoch input with
strtol(), whose property that overflow will always result in
LONG_MAX gives the robustness of the 2038 test, and needs to be
updated.  With this "is64bit" patch, you explicitly measure
"unsigned long", knowing that our internal time representation
currently is that type, and that would need to be updated when
widening happens.  So both need to be updated anyway in the future.

The update to check_show Peff suggested is the same as the previous
one, so there is neither upside nor downside.

The prerequisite name 64BITTIME that lost an underscore is harder to
read, so there is a slight downside.

Moving of lazy_prereq to test-lib might be an upside if we were
planning to add a test that depends on the system having or not
having 64-bit timestamp elsewhere, but I do not think of a reason
why such a new test cannot go to t0006-date, which has the perfect
name for such a test and is not overly long right now (114 lines).

So, unless you have a more solid reason to reject the updated t0006
earlier in the thread, I am not sure what we'd gain by replacing it
with this version.


> -- snipsnap --
> From abe59dbb2235bb1d7aad8e78a66e196acb372ec8 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin 
> Date: Tue, 12 Jul 2016 13:19:53 +0200
> Subject: [PATCH] t0006: dates absurdly far require a 64-bit data type
>
> Git's source code refers to timestamps as unsigned longs. On 32-bit
> platforms, as well as on Windows, unsigned long is not large enough to
> capture dates that are "absurdly far in the future".
>
> Let's skip those tests if we know they cannot succeed.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  t/helper/test-date.c | 5 -
>  t/t0006-date.sh  | 6 +++---
>  t/test-lib.sh| 2 ++
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/t/helper/test-date.c b/t/helper/test-date.c
> index d9ab360..1e12d93 100644
> --- a/t/helper/test-date.c
> +++ b/t/helper/test-date.c
> @@ -4,7 +4,8 @@ static const char *usage_msg = "\n"
>  "  test-date relative [time_t]...\n"
> "  test-date show:<format> [time_t]...\n"
>  "  test-date parse [date]...\n"
> -"  test-date approxidate [date]...\n";
> +"  test-date approxidate [date]...\n"
> +"  test-date is64bit\n";
>  
>  static void show_relative_dates(char **argv, struct timeval *now)
>  {
> @@ -93,6 +94,8 @@ int main(int argc, char **argv)
>   parse_dates(argv+1, &now);
>   else if (!strcmp(*argv, "approxidate"))
>   parse_approxidate(argv+1, &now);
> + else if (!strcmp(*argv, "is64bit"))
> + return sizeof(unsigned long) == 8 ? 0 : 1;
>   else
>   usage(usage_msg);
>   return 0;
> diff --git a/t/t0006-date.sh b/t/t0006-date.sh
> index 04ce535..52f6b62 100755
> --- a/t/t0006-date.sh
> +++ b/t/t0006-date.sh
> @@ -31,7 +31,7 @@ check_show () {
>   format=$1
>   time=$2
>   expect=$3
> - test_expect_${4:-success} "show date ($format:$time)" '
> + test_expect_success $4 "show date ($format:$time)" '
>   echo "$time -> $expect" >expect &&
>   test-date show:$format "$time" >actual &&
>   test_cmp expect actual
> @@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +0000'
>  
>  # arbitrary time absurdly far in the future
>  FUTURE="5758122296 -0400"
> -check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400"
> -check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000"
> +check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400" 64BITTIME
> +check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" 64BITTIME
>  
>  check_parse() {
>   echo "$1 -> $2" >expect
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 0055ebb..4e1afb0 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -,3 +,5 @@ run_with_limited_cmdline () {
>  }
>  
>  test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
> +
> +test_lazy_prereq 64BITTIME 'test-date is64bit'
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Issue with git p4 clone when using multiple depots and multiple branches

2016-07-13 Thread Nikos Andrikos
Hi,

We have a perforce structure at work that looks like this:

//depot/basic/{main,branch1,branch2}/component{1,2}
//depot/extra/{main,branch1,branch2}/extra{1,2}

I'm trying to create 3 branches, i.e. main/branch1/branch2, with all
sub-components together, i.e. component1/component2/extra1/extra2.

For this reason I have created a p4 client, with all 12 paths used like this:
//depot/basic/main/component1/... //client.name/main/component1/...
//depot/basic/main/component2/... //client.name/main/component2/...
//depot/extra/main/extra1/... //client.name/main/extra1/...
//depot/extra/main/extra2/... //client.name/main/extra2/...
... similar also for branch1 and branch2.

I have setup my git using these:
git config git-p4.client client.name
git config git-p4.branchList main/extra1
git config --add git-p4.branchList main/extra2

The problem is that all //depot/extra paths are ignored when I'm
running the following command
> git p4 clone --use-client-spec --detect-branches //depot/basic@all 
> //depot/extra@all --destination project

I think the issue is in git-p4 and especially in importNewBranch.
Instead of using:
"""
branchPrefix = self.depotPaths[0] + branch + "/"
self.branchPrefixes = [ branchPrefix ]
"""
We should be using
"""
self.branchPrefixes = [x + branch + "/" for x in self.depotPaths]
"""
Same also for "p4ChangesForPaths([branchPrefix], ..."

What do you think?

Thanks,
Nikos

--
=Do-
N.AND


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-13 Thread Johannes Schindelin
Hi Duy,

On Wed, 13 Jul 2016, Duy Nguyen wrote:

> On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin
>  wrote:
> >
> > On Tue, 12 Jul 2016, Duy Nguyen wrote:
> >
> >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King  wrote:
> >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
> >> >
> >> >> I'm not opposed to letting one worktree see everything, but this move
> >> >> makes it harder to write new scripts (or new builtin commands, even)
> >> >> that works with both single and multiple worktrees because you refer
> >> >> to one ref (in current worktree perspective) differently. If we kill
> >> off the main worktree (i.e. git init always creates a linked worktree)
> >> >> then it's less of a problem, but still a nuisance to write
> >> >> refs/worktree/$CURRENT/ everywhere.
> >> >
> >> > True. I gave a suggestion for the reading side, but the writing side
> >> > would still remain tedious.
> >> >
> >> > I wonder if, in a worktree, we could simply convert requests to read or
> >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
> >> > That makes it a read/write-time alias conversion, but the actual storage
> >> > is just vanilla (so the ref storage doesn't need to care, and
> >> > reachability just works).
> >>
> >> A conversion like that is already happening, but it works at
> >> git_path() level instead and maps anything outside refs/ to
> >> worktrees/$CURRENT.
> >
> > Wouldn't you agree that the entire discussion goes into a direction that
> > reveals that it might simply be a better idea to require commands that want
> > to have per-worktree refs to do that explicitly?
> 
> No.

Oh? So you do not see that we are already getting ourselves into
serious trouble?

> I prefer we have a single interface for dealing with _any_ worktree.

I agree. So far, though, I have not seen an interface, only the idea
that we should somehow fake things so that there does not *have* to be
an interface.

> > The same holds true for the config, BTW. I really have no love for the
> > idea to make the config per-worktree. It just holds too many nasty
> > opportunities to violate the Law of Least Surprises.
> >
> > Just to name one: imagine you check out a different branch in worktree A,
> > then switch worktree B to the branch that A had, and all of a sudden you
> > may end up with a different upstream!
> 
> Everything in moderation. You wouldn't want to enable sparse checkout
> on one worktree and it suddenly affects all worktrees because
> core.sparsecheckout is shared. And submodules are known not to work
> when core.worktree is still shared.

We have precedent for config variables that are global but also allow
more specific overrides. Think e.g. the http.* variables.

The point is: this solution still uses *one* config per repo.

> I will not enforce any rule (unless it's very obvious that the other
> way is wrong, like core.worktree). I will give you a rifle and you can
> either hunt for food or shoot your foot. In other words, you should be
> able to share everything if you like it that way while someone else
> can share just some config vars, or even nothing in config file.

You gave me a rifle all right, and I shot myself in the foot (by losing
objects to auto-gc). I just did not expect it to be a rifle.

To keep with the analogy: let's not build arms, but a kick-ass tool. And I
seriously disagree that per-worktree refs, reflogs or config are part of
said tool.

Ciao,
Dscho


[feature request] Warn about or prevent --amend commits that don't change anything

2016-07-13 Thread Bergi

Hello,

when nothing is staged in the index, `git commit` warns about this 
fact with either "nothing to commit, working directory clean" or "no 
changes added to commit".
However, `git commit --amend --no-edit` will happily record a new commit 
that differs in nothing but its commit date from the original.


This is unexpected and can lead to mistakes. Without running `git 
status`, the user will not notice that his unstaged changes were not 
committed, as everything otherwise behaves as expected (the success 
output from `commit`, the new commit id in the log, `push` requiring the 
force option, etc.).


I understand that `--amend` is (or can be) used for editing commit 
messages, authors, authoring dates etc.
I would however like any `--amend` command that results in no 
changes to the tree, the commit message or the authoring metadata 
to reject the commit with an appropriate warning similar to the one a 
plain `git commit` would present. It should be overridable by the 
`--allow-empty` parameter as well.


If this change detection is somehow infeasible, I would at least like 
the `git commit --amend --no-edit` command (with no other flags) to 
check the tree in the same way as `git commit` does, as the intention of 
`--no-edit` is even clearer and running the command is more obviously 
a mistake/lapse.


Kind regards,
 Bergi


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 13 Jul 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Tue, 12 Jul 2016, Junio C Hamano wrote:
> >
> >> Jeff King  writes:
> >> 
> >> > In case it wasn't clear, I was mostly guessing there. So I dug a bit
> >> > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t
> >> > on i386 because of the ABI headaches.
> >> 
> >> X-< (yes, I knew).
> >> 
> >> > That being said, I still think the "clamp to time_t" strategy is
> >> > reasonable. Unless you are doing something really exotic like pretending
> >> > to be from the future, nobody will care for 20 years.
> >> 
> >> Yup.  It is a minor regression for them to go from ulong to time_t,
> >> because they didn't have to care for 90 years or so but now they do
> >> in 20 years, I'd guess, but hopefully after that many years,
> >> everybody's time_t would be sufficiently large.
> >> 
> >> I suspect Cobol programmers in the 50s would have said a similar
> >> thing about the y2k timebomb they created back then, though ;-)
> >> 
> >> > And at that point, systems with a 32-bit time_t are going to have
> >> > to do _something_, because time() is going to start returning
> >> > bogus values. So as long as we behave reasonably (e.g., clamping
> >> > values and not generating wrapped nonsense), I think that's a fine
> >> > solution.
> >> 
> >> OK.
> >
> > I kept the unsigned long -> time_t conversion after reading the thread so
> > far.
> 
> That's OK at this point; it is not v2.9.x material anyway.

Got it. I will try to get the patches submitted soon, anyway.

> The primary reason why I cared about 32-bit time_t is not 2038, by
> the way.  I recall that people wanted to store historical documents
> with ancient timestamps; even if we update to support negative
> timestamps, they cannot go back to the 19th century with 32-bit time_t,
> but they can with long long or whatever intmax_t is on their system.

Fair enough.

Ciao,
Dscho


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 13 Jul 2016, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Definitely keep that paragraph. I am quite familiar with the test
> > helper and it was not the outcome I initially expected.
> >
> >> +test_lazy_prereq 64BIT_TIME '
> >> +  case "$(test-date show:iso 99)" in
> >> +  *" -> 2038-"*)
> >> +  # on this platform, unsigned long is 32-bit, i.e. not large enough
> >> +  false
> >
> > I see you tightened up the match a little. TBH, I think we could
> > probably just match the whole output string, but I doubt there's much
> > chance of a false positive either way.
> 
> Ah, it wasn't meant to be a tightening; rather the above is for
> documentation value, i.e. make it stand out what 2038 we are
> matching---its answer being "the year portion of the result of the
> conversion".

How about this one instead (which is part of the time_t-may-be-int64
branch on https://github.com/dscho/git that I still have to complete, as
some unsigned longs still slipped through my previous net)? It strikes me
as much more robust:

-- snipsnap --
From abe59dbb2235bb1d7aad8e78a66e196acb372ec8 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin 
Date: Tue, 12 Jul 2016 13:19:53 +0200
Subject: [PATCH] t0006: dates absurdly far require a 64-bit data type

Git's source code refers to timestamps as unsigned longs. On 32-bit
platforms, as well as on Windows, unsigned long is not large enough to
capture dates that are "absurdly far in the future".

Let's skip those tests if we know they cannot succeed.

Signed-off-by: Johannes Schindelin 
---
 t/helper/test-date.c | 5 -
 t/t0006-date.sh  | 6 +++---
 t/test-lib.sh| 2 ++
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-date.c b/t/helper/test-date.c
index d9ab360..1e12d93 100644
--- a/t/helper/test-date.c
+++ b/t/helper/test-date.c
@@ -4,7 +4,8 @@ static const char *usage_msg = "\n"
 "  test-date relative [time_t]...\n"
 "  test-date show:<format> [time_t]...\n"
 "  test-date parse [date]...\n"
-"  test-date approxidate [date]...\n";
+"  test-date approxidate [date]...\n"
+"  test-date is64bit\n";
 
 static void show_relative_dates(char **argv, struct timeval *now)
 {
@@ -93,6 +94,8 @@ int main(int argc, char **argv)
	parse_dates(argv+1, &now);
	else if (!strcmp(*argv, "approxidate"))
	parse_approxidate(argv+1, &now);
+   else if (!strcmp(*argv, "is64bit"))
+   return sizeof(unsigned long) == 8 ? 0 : 1;
else
usage(usage_msg);
return 0;
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 04ce535..52f6b62 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -31,7 +31,7 @@ check_show () {
format=$1
time=$2
expect=$3
-   test_expect_${4:-success} "show date ($format:$time)" '
+   test_expect_success $4 "show date ($format:$time)" '
echo "$time -> $expect" >expect &&
test-date show:$format "$time" >actual &&
test_cmp expect actual
@@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +0000'
 
 # arbitrary time absurdly far in the future
 FUTURE="5758122296 -0400"
-check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400"
-check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000"
+check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400" 64BITTIME
+check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" 64BITTIME
 
 check_parse() {
echo "$1 -> $2" >expect
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0055ebb..4e1afb0 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -,3 +,5 @@ run_with_limited_cmdline () {
 }
 
 test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
+
+test_lazy_prereq 64BITTIME 'test-date is64bit'
-- 
2.9.0.278.g1caae67



Re: Https password present in git output

2016-07-13 Thread Dennis Kaarsemaker
On wo, 2016-07-13 at 20:26 +0300, ervion wrote:
> One possibility for this in git is to save remote in the 
> https://username:passw...@domain.com/repo.git format.

This is not recommended. Git has credential helpers to help you store
passwords outside the git configuration.

Which then makes your original problem go away :)

D.


Re: Https password present in git output

2016-07-13 Thread Junio C Hamano
On Wed, Jul 13, 2016 at 11:09 AM, Junio C Hamano  wrote:
> ervion  writes:
>
>> Sometimes using ssh is not possible and saving https password in plain
>> text to disk may be desirable
>> (in case of encrypted disk it would be equivalent security with
>> caching password in memory).
>>
>> One possibility for this in git is to save remote in the
>> https://username:passw...@domain.com/repo.git format.
>
> Wasn't netrc support added exactly because users do not want to do
> this?

Interesting. Even with "auth in URL", I seem to get this:

$ git fetch -v -v https://gitster:p...@github.com/git/git  refs/tags/v2.9.1
From https://github.com/git/git
 * tag   v2.9.1 -> FETCH_HEAD

Notice that "From $URL" has the userinfo (3.2.1 in RFC 3986) scrubbed.

If you are seeing somewhere we forgot to scrub userinfo in a similar way in
the output, we should. Where do you see "present in git OUTPUT" as you
said in the subject? What command with what options exactly and in what
part of the output?

Thanks.


Re: Https password present in git output

2016-07-13 Thread Junio C Hamano
ervion  writes:

> Sometimes using ssh is not possible and saving https password in plain
> text to disk may be desirable
> (in case of encrypted disk it would be equivalent security with
> caching password in memory).
>
> One possibility for this in git is to save remote in the
> https://username:passw...@domain.com/repo.git format.

Wasn't netrc support added exactly because users do not want to do
this?


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Junio C Hamano
On Wed, Jul 13, 2016 at 10:52 AM, Stefan Beller  wrote:
> On Wed, Jul 13, 2016 at 10:32 AM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
 * sb/push-options (2016-07-12) 5 commits
  - add a test for push options
  - push: accept push options
  - SQUASH???
>>>
>>> Squash? I do not find a squashable commit in what you pushed,
>>> do you intend to squash the first 2 patches instead?
>
> Oh I pulled a few minutes before you sent this email, and forgot
> that you likely have pushed again when sending this email. :/

There were no multiple pushes involved.

I prepare what is to be pushed out, update the what's cooking report,
push and then send out the what's cooking report (which also is on
the 'todo' branch).

But there is this thing called propagation delay ;-)


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Stefan Beller
On Wed, Jul 13, 2016 at 10:32 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> * sb/push-options (2016-07-12) 5 commits
>>>  - add a test for push options
>>>  - push: accept push options
>>>  - SQUASH???
>>
>> Squash? I do not find a squashable commit in what you pushed,
>> do you intend to squash the first 2 patches instead?

Oh I pulled a few minutes before you sent this email, and forgot
that you likely have pushed again when sending this email. :/
Thanks!

>> Yeah there were some late comments, so I did not reroll right away.
>> I think Shawn's proposal to have a receive.maxCommandBytes is a
>> good way for an overall upper bound, but how does it stop us from
>> going forward with this series?
>
> If we were to do maxcommandbytes, then max_options would become
> irrelevant, no?

Maybe?

I do not know what kind of safety measures we want in place here, and
whether we want to go for overlapping things.

Currently there are none at all in your upstream code, although you cannot
push arbitrarily large things to either Shawn's or Peff's $Dayjob servers, so
I wonder if we want to either agree on one format or on many overlapping
things, as different hosts may perceive different things as DoS threats
and would want to fine-tune as they see fit?

In the Gerrit world, you have a ref per code review, such that it is easy to
have 50k refs or more, similar to the repo Jeff pointed out [1], which has
40k tags (and gets a new tag every 2 hours, apparently).

So I could understand if different services care about the different loads
(refs vs push options) differently (one would want to allow unlimited refs
pushing for mirroring such repos as pointed out above, while another
one might care about the total load of the server for a single rogue user)

Thanks,
Stefan

[1] https://github.com/JetBrains/intellij-community


Https password present in git output

2016-07-13 Thread ervion


Sometimes using ssh is not possible, and saving the https password in 
plain text to disk may be desirable
(with an encrypted disk this is security-equivalent to caching the 
password in memory).


One possibility for this in git is to save the remote in the 
https://username:passw...@domain.com/repo.git format.
However, in this case every time you push or pull, the remote address, 
including the plain-text password, is printed.

That would introduce additional security issues and is unreasonable, no?

Wouldn't it make sense to scrub the password part of the remote's URL 
before printing it to the output?



Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files

2016-07-13 Thread Christian Couder
On Wed, Jul 13, 2016 at 5:16 PM, Duy Nguyen  wrote:
> On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder
>  wrote:
>> On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen  wrote:
>>>
>>> No. People could create an index file anywhere in theory. So you don't
>>> know how many index files there are.
>>
>> Maybe when an index file is created, its path and its sharedindex file
>> could be appended into a log file.
>> We could check this log file to see if we can remove sharedindex files.
>> We would need to remove the entries in this log file for the indexes
>> that are no longer there.
>>
>> Or instead of one log file we could have a file for each index file in
>> a special directory called for example "indexinfo".
>> So we could just delete the file if its related index is no longer there.
>
> New files will require locking so people don't append at the same
> time.

If we create a new file for each index file, with a name that depends
on the index path (for example the sha1 of the index path), then,
since two index files cannot have the same path, we should be
safe.

> And maybe another new host of problems. I think we can just go
> with the garbage collection way that we have done for unreachable
> objects.
>
> Your indexinfo idea looks very close to multiple locking, an index
> would lock the shared index it's linked to, preventing it from being
> removed. For single locking, we can just create a file named $X.lock,
> but for multiple locks, maybe we can go with
> $X.lock-$index_trailing_sha1. Will it work? I don't know. Just
> thinking out loud.

Isn't it possible that the same index (so with the same trailing sha1)
could be created in two different places?

>>> It really depends. If the shared part is too small for old indexes, we
>>> might as well unsplit them. In practice though, the only long-term
>>> index file is $GIT_DIR/index. If we don't delete old shared index
>>> files too close to their creation time, temp index files will go away.
>>
>> We could treat $GIT_DIR/index specially so that if there are no temp
>> index files, there should be nothing in "indexinfo".
>
> No, temp index files are needed. And it will be hard to treat
> $GIT_DIR/index specially because updating it involves a temp index:
> you first prepare a new index in $GIT_DIR/index.lock.

If the new index is always prepared in $GIT_DIR/index.lock, then there
is no problem in the first place because when my original patch calls
clean_shared_index_files() the $GIT_DIR/index.lock is taken. Or maybe
I am missing something?

> If everything
> goes well, you atomically rename it to $GIT_DIR/index. You may be able
> to treat $GIT_DIR/index.lock special too, but that's starting to get
> out of hand.


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Junio C Hamano
Duy Nguyen  writes:

> On the subject of truncation, there is something else I should note.
> The field sd_size in struct stat_data is 32 bits, so large files will
> overflow it too, regardless of platforms. I did not do anything
> because I checked and double checked and was pretty sure we did not
> use it for anything meaningful (as a file size). To us, it's just
> another random number, like st_ino, that we check to detect if a file
> has changed.

Yes, the comparison to flip DATA_CHANGED bit near the end of
match_stat_data() has a cast for that exact reason.  It might
deserve a comment there.


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Duy Nguyen
On Wed, Jul 13, 2016 at 6:56 PM, Junio C Hamano  wrote:
> * nd/pack-ofs-4gb-limit (2016-07-13) 7 commits
>  - fsck: use streaming interface for large blobs in pack
>  - pack-objects: do not truncate result in-pack object size on 32-bit systems
>  - index-pack: correct "offset" type in unpack_entry_data()
>  - index-pack: report correct bad object offsets even if they are large
>  - index-pack: correct "len" type in unpack_data()
>  - sha1_file.c: use type off_t* for object_info->disk_sizep
>  - pack-objects: pass length to check_pack_crc() without truncation
>
>  "git pack-objects" and "git index-pack" mostly operate with off_t
>  when talking about the offset of objects in a packfile, but there
>  were a handful of places that used "unsigned long" to hold that
>  value, leading to an unintended truncation.

On the subject of truncation, there is something else I should note.
The field sd_size in struct stat_data is 32 bits, so large files will
overflow it too, regardless of platforms. I did not do anything
because I checked and double checked and was pretty sure we did not
use it for anything meaningful (as a file size). To us, it's just
another random number, like st_ino, that we check to detect if a file
has changed.

It's probably just an oversight (it comes from the very first "the
information manager from hell" commit). But it's not worth changing
the index format now to extend it to 64 bits, I think. So it's OK, no
need to worry about it, but we should probably make clear that this is
not the true file size, and nobody should ever use it as such. Maybe
rename it to "size_hash" or something.
-- 
Duy


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Stefan Beller
> * sb/push-options (2016-07-12) 5 commits
>  - add a test for push options
>  - push: accept push options
>  - SQUASH???

Squash? I do not find a squashable commit in what you pushed,
do you intend to squash the first 2 patches instead?

>  - receive-pack: implement advertising and receiving push options
>  - push options: {pre,post}-receive hook learns about push options
>
>  "git push" learned to accept and pass extra options to the
>  receiving end so that hooks can read and react to them.
>
>  Discussion continues, expecting a further reroll.
>  ($gmane/299156)

Yeah there were some late comments, so I did not reroll right away.
I think Shawn's proposal to have a receive.maxCommandBytes is a
good way to get an overall upper bound, but how does it stop us from
going forward with this series? It seems like a good thing to implement
on top of this series?

(We also have no code for limiting the number of refs pushed, so I
do not quite understand why this DoS paranoia comes up with this
series all of a sudden.)


Re: Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Junio C Hamano
Jehan Pagès  writes:

>> I think --author=someone greps the "author " field in the commit
>> object looking for the hit with "someone", and your request asks to
>> show commits that either do not have "something" or was not written
>> by "someone", I would guess.
>
> Note that I can still see commits with "something", and I can also see
> commits by "someone" in my results. So my request actually asks for
> commits which neither have "something" nor are done by "someone".
>
> Anyway I don't think that's the expected result, hence this is still a bug.
> Am I wrong?

Unlike "git grep", "git log" works with a boolean expression that does
not explicitly have a way to say "--and" and "--or", so only one
interpretation was chosen a long time ago.  All the terms are
ORed together, and then the whole thing can be inverted with
"--invert-grep".  I.e. you are telling an equivalent of

$ git grep --not \( -e "author someone" --or -e "something" \)

with the command line, and there is no way to combine the requested
match criteria (there are two, "author must be somebody" and
"something must be in the message") differently.

Given that, that is the "right" expectation, and as long as you get
the behaviour, there is no "bug".

You can call it a lack of a feature, though, and patches to implement
a more flexible combination of match criteria like "git grep" allows
are always welcome.


Re: What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Junio C Hamano
Stefan Beller  writes:

>> * sb/push-options (2016-07-12) 5 commits
>>  - add a test for push options
>>  - push: accept push options
>>  - SQUASH???
>
> Squash? I do not find a squashable commit in what you pushed,
> do you intend to squash the first 2 patches instead?

$ git log --oneline --first-parent master..pu | grep push-options
f8e50d4 Merge branch 'sb/push-options' into pu
$ git log --oneline master..f8e50d4^2
d6fd535 add a test for push options
ef00034 push: accept push options
6c5282d SQUASH???
ed0c716 receive-pack: implement advertising and receiving push options
f7c760f push options: {pre,post}-receive hook learns about push options
$ git show 6c5282d
commit 6c5282d7c5b50f362aaee2059c0253ab17b4fcd3
Author: Junio C Hamano 
Date:   Tue Jul 12 14:54:58 2016 -0700

SQUASH???

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 754db6e..4d8041a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1503,8 +1503,6 @@ static struct string_list *read_push_options(void)
 {
int i;
struct string_list *ret = xmalloc(sizeof(*ret));
-   string_list_init(ret, 1);
-
/* NEEDSWORK: expose the limitations to be configurable. */
int max_options = 32;
 
@@ -1518,6 +1516,7 @@ static struct string_list *read_push_options(void)
 */
int max_size = 1024;
 
+   string_list_init(ret, 1);
for (i = 0; i < max_options; i++) {
char *line;
int len;

i.e. compilation fix for decl-after-stmt.

>>  - receive-pack: implement advertising and receiving push options
>>  - push options: {pre,post}-receive hook learns about push options
>>
>>  "git push" learned to accept and pass extra options to the
>>  receiving end so that hooks can read and react to them.
>>
>>  Discussion continues, expecting a further reroll.
>>  ($gmane/299156)
>
> Yeah there were some late comments, so I did not reroll right away.
> I think Shawns proposal to have a receive.maxCommandBytes is a
> good way for an overall upper bound, but how does it stop us from
> going forward with this series?

If we were to do maxcommandbytes, then max_options would become
irrelevant, no?


Re: Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Jehan Pagès
Hi,

On Wed, Jul 13, 2016 at 7:04 PM, Junio C Hamano  wrote:
> Jehan Pagès  writes:
>
>>> git log --author=someone --invert-grep --grep=something
>>
>> But it does not work. Actually it looks like it just returns
>> everything (as though I had done a simple `git log`).
>
> Do you see a commit that is by "someone" and has "something" in it
> in the output from the command?

Indeed you are right! Commits which are by "someone" and have
"something" in them at the same time are missing.
So here --invert-grep works as a big "NOT" operator on the whole rest
of the command line, which is not expected (at least by me).

> I think --author=someone greps the "author " field in the commit
> object looking for the hit with "someone", and your request asks to
> show commits that either do not have "something" or was not written
> by "someone", I would guess.

Note that I can still see commits with "something", and I can also see
commits by "someone" in my results. So my request actually asks for
commits which neither contain "something" nor are by "someone".

Anyway I don't think that's the expected result, hence it is still a bug.
Am I wrong?

Jehan

-- 
ZeMarmot open animation film
http://film.zemarmot.net
Patreon: https://patreon.com/zemarmot
Tipeee: https://www.tipeee.com/zemarmot


Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Jehan Pagès
Hello,

> $ git version
> git version 2.5.5

Tested on a Fedora 23.

You can combine `git log --author=someone --grep=something` to have
all commits by "someone" where "something" can be grepped. If I want
all commits by someone where "something" is not grepped, I would
expect this command to return it:

> git log --author=someone --invert-grep --grep=something

But it does not work. Actually it looks like it just returns
everything (as though I had done a simple `git log`). So there seems
to be a problem here.
Thanks!

Jehan

-- 
ZeMarmot open animation film
http://film.zemarmot.net
Patreon: https://patreon.com/zemarmot
Tipeee: https://www.tipeee.com/zemarmot


Re: Bug report: --invert-grep and --author seems to be incompatible.

2016-07-13 Thread Junio C Hamano
Jehan Pagès  writes:

>> git log --author=someone --invert-grep --grep=something
>
> But it does not work. Actually it looks like it just returns
> everything (as though I had done a simple `git log`).

Do you see a commit that is by "someone" and has "something" in it
in the output from the command?

I think --author=someone greps the "author " field in the commit
object looking for the hit with "someone", and your request asks to
show commits that either do not have "something" or was not written
by "someone", I would guess.



What's cooking in git.git (Jul 2016, #05; Wed, 13)

2016-07-13 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

It turns out that v2.9.1 had tests that weren't even meant to pass
on platforms with 32-bit time_t/unsigned long, but did not properly
exclude them.  A fix has been prepared and I expect we'll cut v2.9.2
with only that change relative to v2.9.1 soonish.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* js/t0006-for-v2.9.2 (2016-07-12) 1 commit
 - t0006: skip "far in the future" test when unsigned long is not long enough

 A test merged to v2.9.1 forgot that the feature it was testing
 would not work on a platform with 32-bit time_t/unsigned long and
 reported breakage.  Skip the tests that cannot run correctly on
 such platforms.

 Waiting for an Ack; will fast-track down to 'maint' to cut v2.9.2.


* js/am-3-merge-recursive-direct (2016-07-12) 21 commits
 - merge-recursive: flush output buffer even when erroring out
 - merge_trees(): ensure that the output buffer is released after calling it
 - merge-recursive: offer an option to retain the output in 'obuf'
 - merge-recursive: write the commit title in one go
 - merge-recursive: flush output buffer before printing error messages
 - am -3: use merge_recursive() directly again
 - merge-recursive: switch to returning errors instead of dying
 - merge-recursive: handle return values indicating errors
 - merge-recursive: allow write_tree_from_memory() to error out
 - merge-recursive: avoid returning a wholesale struct
 - merge_recursive: abort properly upon errors
 - prepare the builtins for a libified merge_recursive()
 - merge-recursive: clarify code in was_tracked()
 - die(_("BUG")): avoid translating bug messages
 - die("bug"): report bugs consistently
 - t5520: verify that `git pull --rebase` shows the advice when failing
 - Merge branch 'jh/clean-smudge-annex' into HEAD
 - Merge branch 'bc/cocci' into js/am-3-merge-recursive-direct
 - Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' into js/am-3-merge-recursive-direct
 - Merge branch 'cc/apply-am' into js/am-3-merge-recursive-direct
 - Merge branch 'va/i18n-even-more' into js/am-3-merge-recursive-direct
 (this branch uses bc/cocci, cc/apply-am, jh/clean-smudge-annex, js/am-call-theirs-theirs-in-fallback-3way and va/i18n-even-more.)

 "git am -3" calls "git merge-recursive" when it needs to fall back
 to a three-way merge; this call has been turned into an internal
 subroutine call instead of spawning a separate subprocess.

 Will need to wait for all the topics this depends on graduate to 'master'.


* ls/travis-enable-httpd-tests (2016-07-12) 1 commit
 - travis-ci: enable web server tests t55xx on Linux

 Allow http daemon tests in Travis CI tests.


* nd/cache-tree-ita (2016-07-12) 4 commits
 - cache-tree: do not generate empty trees as a result of all i-t-a subentries
 - cache-tree.c: fix i-t-a entry skipping directory updates sometimes
 - test-lib.sh: introduce and use $_EMPTY_BLOB
 - test-lib.sh: introduce and use $_EMPTY_TREE

 "git add -N dir/file && git write-tree" produced an incorrect tree
 when there are other paths in the same directory that sorts after
 "file".

 Looked mostly OK.  Further review comments are welcome.


* nd/pack-ofs-4gb-limit (2016-07-13) 7 commits
 - fsck: use streaming interface for large blobs in pack
 - pack-objects: do not truncate result in-pack object size on 32-bit systems
 - index-pack: correct "offset" type in unpack_entry_data()
 - index-pack: report correct bad object offsets even if they are large
 - index-pack: correct "len" type in unpack_data()
 - sha1_file.c: use type off_t* for object_info->disk_sizep
 - pack-objects: pass length to check_pack_crc() without truncation

 "git pack-objects" and "git index-pack" mostly operate with off_t
 when talking about the offset of objects in a packfile, but there
 were a handful of places that used "unsigned long" to hold that
 value, leading to an unintended truncation.

 Will merge to 'next'.


* sb/push-options (2016-07-12) 5 commits
 - add a test for push options
 - push: accept push options
 - SQUASH???
 - receive-pack: implement advertising and receiving push options
 - push options: {pre,post}-receive hook learns about push options

 "git push" learned to accept and pass extra options to the
 receiving end so that hooks can read and react to them.

 Discussion continues, expecting a further reroll.
 ($gmane/299156)


* ew/http-walker (2016-07-12) 3 commits
 - http-walker: reduce O(n) ops with doubly-linked list
 - http: avoid disconnecting on 404s for loose objects
 - http-walker: remove unused parameter from fetch_object

 Optimize dumb http transport on the 

Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Tue, 12 Jul 2016, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > In case it wasn't clear, I was mostly guessing there. So I dug a bit
>> > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t
>> > on i386 because of the ABI headaches.
>> 
>> X-< (yes, I knew).
>> 
>> > That being said, I still think the "clamp to time_t" strategy is
>> > reasonable. Unless you are doing something really exotic like pretending
>> > to be from the future, nobody will care for 20 years.
>> 
>> Yup.  It is a minor regression for them to go from ulong to time_t,
>> because they didn't have to care for 90 years or so but now they do
>> in 20 years, I'd guess, but hopefully after that many years,
>> everybody's time_t would be sufficiently large.
>> 
>> I suspect Cobol programmers in the 50s would have said a similar
>> thing about the y2k timebomb they created back then, though ;-)
>> 
>> > And at that point, systems with a 32-bit time_t are going to have
>> > to do _something_, because time() is going to start returning
>> > bogus values. So as long as we behave reasonably (e.g., clamping
>> > values and not generating wrapped nonsense), I think that's a fine
>> > solution.
>> 
>> OK.
>
> I kept the unsigned long -> time_t conversion after reading the thread so
> far.

That's OK at this point; it is not v2.9.x material anyway.

The primary reason why I cared about 32-bit time_t is not 2038, by
the way.  I recall that people wanted to store historical documents
with ancient timestamps; even if we update to support negative
timestamps, they cannot go back to the 19th century with 32-bit time_t,
but they can with long long or whatever intmax_t is on their system.


Re: [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems

2016-07-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> A diff from nd/pack-ofs-4gb-limit can explain the changes better than
> me.
>
> I did not add PRIdMAX or similar because that carries a risk to exotic
> platforms that people rarely test. Just casting to unsigned should be
> fine.

Looks very sensible.  Thanks.


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Jeff King  writes:

> Definitely keep that paragraph. I am quite familiar with the test
> helper and it was not the outcome I initially expected.
>
>> +test_lazy_prereq 64BIT_TIME '
>> +case "$(test-date show:iso 99)" in
>> +*" -> 2038-"*)
>> +# on this platform, unsigned long is 32-bit, i.e. not large enough
>> +false
>
> I see you tightened up the match a little. TBH, I think we could
> probably just match the whole output string, but I doubt there's much
> chance of a false positive either way.

Ah, it wasn't meant to be a tightening; rather the above is for
documentation value, i.e. make it stand out what 2038 we are
matching---its answer being "the year portion of the result of the
conversion".



[RFC] Speed up git status via a file system daemon

2016-07-13 Thread Ben Peart
Idea taken and code refactored from [1]:

The intent of this patch series is to separate the index-helper logic
from the Watchman logic.  With very large repos, the percentage of time
required to read the index from disk becomes a much smaller percentage
of the overall time.  The Watchman logic provides a huge performance win
for git status (8 seconds vs 12 minutes in our testing) vs the minimal
benefits index-helper provides (<1 second).

The current index-helper/watchman combination design requires two
daemons, IPC between git and the index-helper, IPC between the index
helper and Watchman, shared memory, adding a WAMA extension to the
index, etc.

The benefit of splitting these two efforts is a significantly simpler
set of changes that give the benefits of utilizing a file system
monitoring daemon like Watchman without the additional complexity and
overhead associated with index-helper.

The overall design of this refactored patch series is that when the
index is read from disk, a hook proc is called that returns the list of
potentially changed files.  Git then uses this set of files to flag the
potentially dirty index and untracked cache entries.

The logic to use this information to limit the scope of which files and
directories need to be checked for changes is largely the same as
the original patch series.

One benefit of the hook proc is that it provides a simple, platform 
independent way to interface with the file system daemon.  It does not
require another daemon to be running to act as an intermediary between
git and the file system daemon to support communicating via named pipes 
or sockets.

The prototype currently has no mechanism for saving the date/time of the
query into the index so the list of potentially changed files grows
unbounded. Writing the additional 64 bit date/time data into an optional
extension seems like a lot of overhead but changing the on disk index
format is expensive as well from a compatibility perspective.  Other
ideas or suggestions welcome.

Thanks,

Ben

[1] http://article.gmane.org/gmane.comp.version-control.git/298243

---
 cache.h  |   4 ++
 dir.c|  14 -
 dir.h|   5 ++
 read-cache.c | 174 ++-
 4 files changed, 194 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index f1dc289..45c2eff 100644
--- a/cache.h
+++ b/cache.h
@@ -182,6 +182,8 @@ struct cache_entry {
 #define CE_VALID (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_FSDAEMON_DIRTY  (0x0001)
+
 /*
  * Range 0x0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -338,6 +340,8 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   /* BENPEART-TODO: persist last_update in the index, use gmtime? */
+   time_t last_update;
 };
 
 extern struct index_state the_index;
diff --git a/dir.c b/dir.c
index 6172b34..bce921f 100644
--- a/dir.c
+++ b/dir.c
@@ -584,7 +584,7 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
						struct untracked_cache_dir *dir,
						const char *name, int len)
 {
@@ -1551,6 +1551,17 @@ static int valid_cached_dir(struct dir_struct *dir,
if (!untracked)
return 0;
 
+   if (dir->untracked->use_fsdaemon) {
+   /*
+* With file system daemon, we can trust the untracked cache's
+* valid field.
+*/
+   if (untracked->valid)
+   goto skip_stat;
+   else
+   invalidate_directory(dir->untracked, untracked);
+   }
+
	if (stat(path->len ? path->buf : ".", &st)) {
		invalidate_directory(dir->untracked, untracked);
		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1564,6 +1575,7 @@ static int valid_cached_dir(struct dir_struct *dir,
return 0;
}
 
+skip_stat:
if (untracked->check_only != !!check_only) {
invalidate_directory(dir->untracked, untracked);
return 0;
diff --git a/dir.h b/dir.h
index bfde698..8b7b98a 100644
--- a/dir.h
+++ b/dir.h
@@ -139,6 +139,8 @@ struct untracked_cache {
int gitignore_invalidated;
int dir_invalidated;
int dir_opened;
+   /* file system daemon invalidation data */
+   unsigned int use_fsdaemon : 1;
 };
 
 struct dir_struct {
@@ -308,4 +310,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct 

Re: [PATCH v3] config: add conditional include

2016-07-13 Thread Duy Nguyen
On Wed, Jul 13, 2016 at 9:21 AM, Matthieu Moy
 wrote:
> Nguyễn Thái Ngọc Duy  writes:
>
>> +`gitdir`::
>> + The environment variable `GIT_DIR` must match the following
>> + pattern for files to be included. The pattern can contain
>> + standard globbing wildcards and two additional ones, `**/` and
>> + `/**`, that can match multiple path components. Please refer
>> + to linkgit:gitignore[5] for details. For convenience:
>
> It's unclear to me whether the whole content of GIT_DIR must match, or
> whether the pattern should be included (or a be prefix) of $GIT_DIR.
> From this text, I read it as "the whole content", but ...
>
>> + ; include for all repositories inside /path/to/group
>> + [include "gitdir:/path/to/group/"]
>> + path = /path/to/foo.inc
>> +
>> + ; include for all repositories inside $HOME/to/group
>> + [include "gitdir:~/to/group/"]
>> + path = /path/to/foo.inc
>
> ... here it seems it only has to be a prefix.

I should have written "with two additional ones... and a few
exceptions". One of the bullet points below would say the trailing
slash is rewritten to "/**" so it becomes a prefix match. If it proves
confusing, I will probably just get rid of that.
-- 
Duy


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Junio C Hamano
Johannes Schindelin  writes:

> This was just a quick fix, intended to allow me to release Git for Windows
> v2.9.1 in a timely manner (which is now delayed for other reasons).
> ...
>> You'll need to adjust check_show as I did in my earlier patch.
>
> Makes sense!

Hmph, so what is your preferred approach?  You'll do your own v2.9.1
that is different from others at t0006?

I was hoping to hear from you sooner and do v2.9.2 with your t0006
workaround with lazy-prereq changes from Peff (i.e. "Makes sense!"
above), so that you do not have to do two releases in a row
(i.e. skipping v2.9.1 saying "Git for Windows skipped that one
because it was not quite right; this release fixes the issue" in
your v2.9.2 announcement).

Thanks.


Re: [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs

2016-07-13 Thread Lars Schneider

> On 12 Jul 2016, at 00:45, Joey Hess  wrote:
> 
> This adds new smudgeToFile and cleanFromFile filter commands,
> which are similar to smudge and clean but allow direct access to files on
> disk.
> 
> This interface can be much more efficient when operating on large files,
> because the whole file content does not need to be streamed through the
> filter. It even allows for things like cleanFromFile commands that avoid
> reading the whole content of the file, and for smudgeToFile commands that
> populate a work tree file using an efficient Copy On Write operation.
> 
> The new filter commands will not be used for all filtering. They are
> efficient to use when git add is adding a file, or when the work tree is
> being updated, but not a good fit when git is internally filtering blob
> objects in memory for eg, a diff.
> 
> So, a user who wants to use smudgeToFile should also provide a smudge
> command to be used in cases where smudgeToFile is not used. And ditto
> with cleanFromFile and clean. To avoid foot-shooting configurations, the
> new commands are not used unless the old commands are also configured.
> 
> That also ensures that a filter driver configuration that includes these
> new commands will work, although less efficiently, when used with an older
> version of git that does not support them.
> 
> Signed-off-by: Joey Hess 
> ---
> Documentation/config.txt|  18 ++-
> Documentation/gitattributes.txt |  37 ++
> convert.c   | 111 +++-
> convert.h   |  10 
> 4 files changed, 160 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 19493aa..a55bed8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1325,15 +1325,29 @@ format.useAutoBase::
>   format-patch by default.
> 
> filter..clean::
> - The command which is used to convert the content of a worktree
> + The command which is used as a filter to convert the content of a 
> worktree
>   file to a blob upon checkin.  See linkgit:gitattributes[5] for
>   details.
> 
> filter..smudge::
> - The command which is used to convert the content of a blob
> + The command which is used as a filter to convert the content of a blob
>   object to a worktree file upon checkout.  See
>   linkgit:gitattributes[5] for details.
> 
> +filter..cleanFromFile::
> + Similar to filter..clean but the specified command
> + directly accesses a worktree file on disk, rather than
> + receiving the file content from standard input.
> + Only used when filter..clean is also configured.
> + See linkgit:gitattributes[5] for details.
> +
> +filter..smudgeToFile::
> + Similar to filter..smudge but the specified command
> + writes the content of a blob directly to a worktree file,
> + rather than to standard output.
> + Only used when filter..smudge is also configured.
> + See linkgit:gitattributes[5] for details.
> +
> fsck.::
>   Allows overriding the message type (error, warn or ignore) of a
>   specific message ID such as `missingEmail`.
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 197ece8..a58aafc 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge 
> and clean commands
> should not try to access the file on disk, but only act as filters on the
> content provided to them on standard input.
> 
> +There are two extra commands "cleanFromFile" and "smudgeToFile", which
> +can optionally be set in a filter driver. These are similar to the "clean"
> +and "smudge" commands, but avoid needing to pipe the contents of files
> +through the filters, and instead read/write files in the filesystem.
> +This can be more efficient when using filters with large files that are not
> +directly stored in the repository.
> +
> +Both "cleanFromFile" and "smudgeToFile" are provided a path as an
> +added parameter after the configured command line.
> +
> +The "cleanFromFile" command is provided the path to the file that
> +it should clean. Like the "clean" command, it should output the cleaned
> +version to standard output.
> +
> +The "smudgeToFile" command is provided a path to the file that it
> +should write to. (This file will already exist, as an empty file that can
> +be written to or replaced.) Like the "smudge" command, "smudgeToFile"
> +is fed the blob object from its standard input.
> +
> +Some git operations that need to apply filters cannot use "cleanFromFile"
> +and "smudgeToFile", since the files are not present to disk. So, to avoid
> +inconsistent behavior, "cleanFromFile" will only be used if "clean" is
> +also configured, and "smudgeToFile" will only be used if "smudge" is also
> +configured.
> +
> +An example large file storage 
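Based on the description in this patch, a filter driver using the new commands might be configured as below. This is an illustrative sketch: the `bigfiles` driver name and its commands are hypothetical, and the file-based variants are only consulted because the plain `clean`/`smudge` pair is also present, as the patch requires.

```ini
# .git/config -- hypothetical driver name and commands
[filter "bigfiles"]
	# required fallback pair, used when git filters blobs in memory
	clean = bigfiles-clean --stdin
	smudge = bigfiles-smudge --stdin
	# file-based variants; each receives the worktree path as an
	# extra argument appended to the configured command line
	cleanFromFile = bigfiles-clean --from-file
	smudgeToFile = bigfiles-smudge --to-file
```

With a matching `.gitattributes` entry such as `*.bin filter=bigfiles`, an older git that knows nothing about the new keys still works, just less efficiently, via the plain `clean`/`smudge` commands.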

[PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large

2016-07-13 Thread Nguyễn Thái Ngọc Duy
Use the right type for offsets in this case, off_t, which makes a
difference on 32-bit systems with large file support, and change
formatting code accordingly.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/index-pack.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cafaab7..e2d8ae4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -338,10 +338,10 @@ static void parse_pack_header(void)
use(sizeof(struct pack_header));
 }
 
-static NORETURN void bad_object(unsigned long offset, const char *format,
+static NORETURN void bad_object(off_t offset, const char *format,
   ...) __attribute__((format (printf, 2, 3)));
 
-static NORETURN void bad_object(unsigned long offset, const char *format, ...)
+static NORETURN void bad_object(off_t offset, const char *format, ...)
 {
va_list params;
char buf[1024];
@@ -349,7 +349,8 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
va_start(params, format);
vsnprintf(buf, sizeof(buf), format, params);
va_end(params);
-   die(_("pack has bad object at offset %lu: %s"), offset, buf);
+   die(_("pack has bad object at offset %"PRIuMAX": %s"),
+   (uintmax_t)offset, buf);
 }
 
 static inline struct thread_local *get_thread_data(void)
-- 
2.9.1.564.gb2f7278



[PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep

2016-07-13 Thread Nguyễn Thái Ngọc Duy
This field, filled by sha1_object_info(), contains the on-disk size of
an object, which could go over the 4GB limit of unsigned long on 32-bit
systems. Use off_t for it instead and update all callers.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/cat-file.c | 4 ++--
 cache.h| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 54db118..13ed944 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -131,7 +131,7 @@ struct expand_data {
unsigned char sha1[20];
enum object_type type;
unsigned long size;
-   unsigned long disk_size;
+   off_t disk_size;
const char *rest;
unsigned char delta_base_sha1[20];
 
@@ -184,7 +184,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
	if (data->mark_query)
		data->info.disk_sizep = &data->disk_size;
else
-   strbuf_addf(sb, "%lu", data->disk_size);
+   strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
} else if (is_atom("rest", atom, len)) {
if (data->mark_query)
data->split_on_whitespace = 1;
diff --git a/cache.h b/cache.h
index 4ff196c..ea64b51 100644
--- a/cache.h
+++ b/cache.h
@@ -1502,7 +1502,7 @@ struct object_info {
/* Request */
enum object_type *typep;
unsigned long *sizep;
-   unsigned long *disk_sizep;
+   off_t *disk_sizep;
unsigned char *delta_base_sha1;
struct strbuf *typename;
 
-- 
2.9.1.564.gb2f7278



Re: [PATCH 00/38] Virtualization of the refs API

2016-07-13 Thread Michael Haggerty
On 07/10/2016 05:09 PM, Duy Nguyen wrote:
> On Fri, Jun 3, 2016 at 11:03 PM, Michael Haggerty  
> wrote:
>> Since the that ref-iterator [1] changes seem to have gotten a positive
>> reception, let's try to keep up the momentum. I hope I'm not
>> overloading the review pipeline...
>>
>> I think all of the groundwork is in place now to virtualize the refs
>> API. This will open the way to storing refs in ways other than the
>> familiar loose refs / packed refs format, such as David Turner's
>> proposed LMDB-based storage [2].
>>
>> This is a long patch series, but most of the patches are pretty simple
>> and formulaic. The goal is to implement a `ref_store`. In the language
>> of object-oriented programming, `ref_store` is an abstract base class
>> representing a reference storage backend. It provides methods to read,
>> write, and delete references and symrefs, and to iterate over
>> references, reflogs, and reflog entries, plus a number of other
>> things—19 methods in all.
> 
> I probably don't know what I'm talking about because I don't follow
> your work closely enough. Please ignore if this is nonsense. But if we
> extend/change API, we might need to update git-for-each-ref too, to
> expose it to shell scripts and external commands. I guess for
> iteration there's nothing else more needed, but we may need to
> introduction new options for the storage thing, e.g. to select
> storage...

This patch series doesn't change the external API in any significant
way. It only wraps it in a virtualization layer so that different
reference storage backends can be plugged in.

So as long as people are using plumbing commands to work with references
(rather than reading/writing files under $GIT_DIR directly), they should
notice no difference.

There are only two exceptions that I know of:

1. Users will need to be able to request that a repository use a
non-default reference backend, and (less importantly) inquire about
which reference backend a particular repository is using. Those
facilities will be added when the first non-files reference backend is
added.

2. At least one command (`git pack-refs`) is particular to the files
backend, and won't be needed (at least not in its current form) for
other backends. Conversely, it is conceivable that future reference
backends will require their own "maintenance" commands. Such commands
would be added as they are needed.

If there were operations that other reference backends could do much
more efficiently than the files backend (like, hypothetically, return
all references matching a regular expression without having to iterate
through all references), then it might make sense for performance
reasons to provide commands to access that functionality. But at the
moment I don't know of any such cases.

Michael



[PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data()

2016-07-13 Thread Nguyễn Thái Ngọc Duy
unpack_entry_data() receives an off_t value from unpack_raw_entry(),
which could be larger than unsigned long on 32-bit systems with large
file support. Correct the type so truncation does not happen. This
only affects bad object reporting though.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/index-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e2d8ae4..1008d7f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -430,7 +430,7 @@ static int is_delta_type(enum object_type type)
return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
 }
 
-static void *unpack_entry_data(unsigned long offset, unsigned long size,
+static void *unpack_entry_data(off_t offset, unsigned long size,
   enum object_type type, unsigned char *sha1)
 {
static char fixed_buf[8192];
-- 
2.9.1.564.gb2f7278



[PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems

2016-07-13 Thread Nguyễn Thái Ngọc Duy
A typical diff will not show what's going on and you need to see the full
functions. The core code is like this, at the end of write_one():

e->idx.offset = *offset;
size = write_object(f, e, *offset);
if (!size) {
e->idx.offset = recursing;
return WRITE_ONE_BREAK;
}
written_list[nr_written++] = &e->idx;

/* make sure off_t is sufficiently large not to wrap */
if (signed_add_overflows(*offset, size))
die("pack too large for current definition of off_t");
*offset += size;

Here we can see that the in-pack object size is returned by
write_object (or indirectly by write_reuse_object). And it's used to
calculate object offsets, which end up in the pack index file,
generated at the end.

If "size" overflows (on 32-bit systems, unsigned long is 32-bit while
off_t can be 64-bit), we get wrong offsets and produce an incorrect .idx
file, which may make it look like the .pack file is corrupted.
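The guard shown above can be illustrated with a standalone sketch. This is
modeled on git's type-generic signed_add_overflows() macro but specialized
here to a fixed-width int64_t standing in for off_t; the helper name is made
up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: detect whether offset + size would overflow a
 * 64-bit signed offset, without actually performing the addition.
 * For non-negative offset, INT64_MAX - offset never underflows, so
 * the comparison itself is always safe.
 */
static int offset_add_overflows(int64_t offset, int64_t size)
{
	return size > INT64_MAX - offset;
}
```

If the check fires, the caller can die() with a clear message instead of
silently wrapping the offset, which is exactly the behavior of the hunk
above.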

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/pack-objects.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a3a98c5..ac7a3a5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -341,8 +341,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry,
-   unsigned long limit, int usable_delta)
+static off_t write_reuse_object(struct sha1file *f, struct object_entry *entry,
+   unsigned long limit, int usable_delta)
 {
struct packed_git *p = entry->in_pack;
struct pack_window *w_curs = NULL;
@@ -415,11 +415,12 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_object(struct sha1file *f,
- struct object_entry *entry,
- off_t write_offset)
+static off_t write_object(struct sha1file *f,
+ struct object_entry *entry,
+ off_t write_offset)
 {
-   unsigned long limit, len;
+   unsigned long limit;
+   off_t len;
int usable_delta, to_reuse;
 
if (!pack_to_stdout)
@@ -491,7 +492,7 @@ static enum write_one_status write_one(struct sha1file *f,
   struct object_entry *e,
   off_t *offset)
 {
-   unsigned long size;
+   off_t size;
int recursing;
 
/*
-- 
2.9.1.564.gb2f7278



[PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation

2016-07-13 Thread Nguyễn Thái Ngọc Duy
On 32-bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
64-bit. If an in-pack object is larger than 2^32, len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.

Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.

Noticed-by: Christoph Michelbach 
Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/pack-objects.c | 2 +-
 sha1_file.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b6664ce..a3a98c5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -349,7 +349,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
struct revindex_entry *revidx;
off_t offset;
enum object_type type = entry->type;
-   unsigned long datalen;
+   off_t datalen;
unsigned char header[10], dheader[10];
unsigned hdrlen;
 
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..cd9b560 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2282,7 +2282,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 
if (do_check_packed_object_crc && p->index_version > 1) {
		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-		unsigned long len = revidx[1].offset - obj_offset;
+		off_t len = revidx[1].offset - obj_offset;
		if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
const unsigned char *sha1 =
nth_packed_object_sha1(p, revidx->nr);
-- 
2.9.1.564.gb2f7278



[PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems

2016-07-13 Thread Nguyễn Thái Ngọc Duy
A diff from nd/pack-ofs-4gb-limit can explain the changes better than
me.

I did not add PRIdMAX or similar because that carries a risk on exotic
platforms that people rarely test. Just casting to unsigned should be
fine.
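As a hedged sketch of the casting idiom this series settles on (the
format_offset() helper is made up for illustration, not code from the
patches): widen the off_t value to uintmax_t and print it with PRIuMAX so
the format string and the argument always match, on any platform.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a possibly-64-bit offset portably.
 * Casting to uintmax_t guarantees the value fits the %PRIuMAX
 * conversion regardless of how wide off_t is on this platform.
 */
static void format_offset(char *buf, size_t n, int64_t offset)
{
	snprintf(buf, n, "bad object at offset %"PRIuMAX, (uintmax_t)offset);
}
```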

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 55eac75..b08bc8b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1)
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
   unsigned long size, void *buffer, int *eaten)
 {
+   /*
+* Note, buffer may be NULL if type is OBJ_BLOB. See
+* verify_packfile(), data_valid variable for details.
+*/
struct object *obj;
obj = parse_object_buffer(sha1, type, size, buffer, eaten);
if (!obj) {
diff --git a/pack-check.c b/pack-check.c
index 14e8cb0..d123846 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -106,7 +106,7 @@ static int verify_packfile(struct packed_git *p,
enum object_type type;
unsigned long size;
off_t curpos;
-   int data_valid = 0;
+   int data_valid;
 
if (p->index_version > 1) {
off_t offset = entries[i].offset;
@@ -130,6 +130,7 @@ static int verify_packfile(struct packed_git *p,
 * the data in-core only to discard.
 */
data = NULL;
+   data_valid = 0;
} else {
data = unpack_entry(p, entries[i].offset, &type, &size);
data_valid = 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f9f3d13..096dbff 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -177,10 +177,9 @@ test_expect_success 'zip archiving, deflate' '
git archive --format=zip HEAD >/dev/null
 '
 
-test_expect_success 'fsck' '
-   test_must_fail git fsck 2>err &&
-   n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
-   test "$n" -gt 1
+test_expect_success 'fsck large blobs' '
+   git fsck 2>err &&
+   test_must_be_empty err
 '
 
 test_done

Nguyễn Thái Ngọc Duy (7):
  pack-objects: pass length to check_pack_crc() without truncation
  sha1_file.c: use type off_t* for object_info->disk_sizep
  index-pack: correct "len" type in unpack_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "offset" type in unpack_entry_data()
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  fsck: use streaming interface for large blobs in pack

 builtin/cat-file.c |  4 ++--
 builtin/fsck.c |  4 
 builtin/index-pack.c   | 23 ---
 builtin/pack-objects.c | 17 +
 cache.h|  2 +-
 pack-check.c   | 23 +--
 pack.h |  1 +
 sha1_file.c|  2 +-
 t/t1050-large.sh   |  7 +++
 9 files changed, 54 insertions(+), 29 deletions(-)

-- 
2.9.1.564.gb2f7278



[PATCH v2 3/7] index-pack: correct "len" type in unpack_data()

2016-07-13 Thread Nguyễn Thái Ngọc Duy
On 32-bit systems with large file support, one entry could be larger
than 4GB and overflow "len". Correct it so we can unpack a full entry.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/index-pack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e8c71fc..cafaab7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -549,13 +549,13 @@ static void *unpack_data(struct object_entry *obj,
 void *cb_data)
 {
off_t from = obj[0].idx.offset + obj[0].hdr_size;
-   unsigned long len = obj[1].idx.offset - from;
+   off_t len = obj[1].idx.offset - from;
unsigned char *data, *inbuf;
git_zstream stream;
int status;
 
data = xmallocz(consume ? 64*1024 : obj->size);
-   inbuf = xmalloc((len < 64*1024) ? len : 64*1024);
+   inbuf = xmalloc((len < 64*1024) ? (int)len : 64*1024);
 
	memset(&stream, 0, sizeof(stream));
	git_inflate_init(&stream);
@@ -563,15 +563,15 @@ static void *unpack_data(struct object_entry *obj,
stream.avail_out = consume ? 64*1024 : obj->size;
 
do {
-   ssize_t n = (len < 64*1024) ? len : 64*1024;
+   ssize_t n = (len < 64*1024) ? (ssize_t)len : 64*1024;
n = xpread(get_thread_data()->pack_fd, inbuf, n, from);
if (n < 0)
die_errno(_("cannot pread pack file"));
if (!n)
-   die(Q_("premature end of pack file, %lu byte missing",
-  "premature end of pack file, %lu bytes missing",
-  len),
-   len);
+   die(Q_("premature end of pack file, %"PRIuMAX" byte missing",
+  "premature end of pack file, %"PRIuMAX" bytes missing",
+  (unsigned int)len),
+   (uintmax_t)len);
from += n;
len -= n;
stream.next_in = inbuf;
-- 
2.9.1.564.gb2f7278



[PATCH v2 7/7] fsck: use streaming interface for large blobs in pack

2016-07-13 Thread Nguyễn Thái Ngọc Duy
For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.

For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.
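The slurp-versus-stream tradeoff can be shown with a toy checksum
(illustrative only, nothing like SHA-1 or git's streaming interface):
feeding the data in fixed-size chunks yields the same result as processing
one big buffer, while peak memory stays at the chunk size instead of the
blob size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sequential checksum: fold one buffer into the running state. */
static uint32_t sum_update(uint32_t state, const unsigned char *buf, size_t len)
{
	size_t i;
	for (i = 0; i < len; i++)
		state = state * 31 + buf[i];
	return state;
}

/*
 * Stream the same data in fixed-size chunks.  Only `chunk` bytes ever
 * need to be resident at once, yet the result matches the one-shot
 * computation because the fold is purely sequential.
 */
static uint32_t sum_streamed(const unsigned char *buf, size_t len, size_t chunk)
{
	uint32_t state = 0;
	size_t off;
	for (off = 0; off < len; off += chunk) {
		size_t n = (len - off < chunk) ? len - off : chunk;
		state = sum_update(state, buf + off, n);
	}
	return state;
}
```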

For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But it will be called when "data_valid && !data" is
false.

We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/fsck.c   |  4 
 pack-check.c | 23 +--
 pack.h   |  1 +
 t/t1050-large.sh |  7 +++
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 55eac75..b08bc8b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1)
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
   unsigned long size, void *buffer, int *eaten)
 {
+   /*
+* Note, buffer may be NULL if type is OBJ_BLOB. See
+* verify_packfile(), data_valid variable for details.
+*/
struct object *obj;
obj = parse_object_buffer(sha1, type, size, buffer, eaten);
if (!obj) {
diff --git a/pack-check.c b/pack-check.c
index 1da89a4..d123846 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p,
void *data;
enum object_type type;
unsigned long size;
+   off_t curpos;
+   int data_valid;
 
if (p->index_version > 1) {
off_t offset = entries[i].offset;
@@ -116,8 +118,25 @@ static int verify_packfile(struct packed_git *p,
sha1_to_hex(entries[i].sha1),
p->pack_name, (uintmax_t)offset);
}
-   data = unpack_entry(p, entries[i].offset, &type, &size);
-   if (!data)
+
+   curpos = entries[i].offset;
+   type = unpack_object_header(p, w_curs, &curpos, &size);
+   unuse_pack(w_curs);
+
+   if (type == OBJ_BLOB && big_file_threshold <= size) {
+   /*
+* Let check_sha1_signature() check it with
+* the streaming interface; no point slurping
+* the data in-core only to discard.
+*/
+   data = NULL;
+   data_valid = 0;
+   } else {
+   data = unpack_entry(p, entries[i].offset, &type, &size);
+   data_valid = 1;
+   }
+
+   if (data_valid && !data)
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
sha1_to_hex(entries[i].sha1), p->pack_name,
(uintmax_t)entries[i].offset);
diff --git a/pack.h b/pack.h
index 3223f5a..0e77429 100644
--- a/pack.h
+++ b/pack.h
@@ -74,6 +74,7 @@ struct pack_idx_entry {
 
 
 struct progress;
+/* Note, the data argument could be NULL if object type is blob */
typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f9f3d13..096dbff 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -177,10 +177,9 @@ test_expect_success 'zip archiving, deflate' '
git archive --format=zip HEAD >/dev/null
 '
 
-test_expect_success 'fsck' '
-   test_must_fail git fsck 2>err &&
-   n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
-   test "$n" -gt 1
+test_expect_success 'fsck large blobs' '
+   git fsck 2>err &&
+   test_must_be_empty err
 '
 
 test_done
-- 
2.9.1.564.gb2f7278



Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files

2016-07-13 Thread Duy Nguyen
On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder
 wrote:
> On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen  wrote:
>>
>> No. People could create an index file anywhere in theory. So you don't
>> know how many index files there are.
>
> Maybe when an index file is created, its path and its sharedindex file
> could be appended into a log file.
> We could check this log file to see if we can remove sharedindex files.
> We would need to remove the entries in this log file for the indexes
> that are no longer there.
>
> Or instead of one log file we could have a file for each index file in
> a special directory called for example "indexinfo".
> So we could just delete the file if its related index is no longer there.

New files will require locking so people don't append at the same
time, and maybe a whole new host of problems. I think we can just go
with the garbage collection way that we have done for unreachable
objects.

Your indexinfo idea looks very close to multiple locking: an index
would lock the shared index it's linked to, preventing it from being
removed. For single locking, we can just create a file named $X.lock,
but for multiple locks, maybe we can go with
$X.lock-$index_trailing_sha1. Will it work? I don't know. Just
thinking out loud.

>> It really depends. If the shared part is too small for old indexes, we
>> might as well unsplit them. In practice though, the only long-term
>> index file is $GIT_DIR/index. If we don't delete old shared index
>> files too close to their creation time, temp index files will go away.
>
> We could treat $GIT_DIR/index specially so that if there are no temp
> index files, there should be nothing in "indexinfo".

No, temp index files are needed. And it will be hard to treat
$GIT_DIR/index specially because updating it involves a temp index:
you first prepare a new index in $GIT_DIR/index.lock. If everything
goes well, you atomically rename it to $GIT_DIR/index. You may be able
to treat $GIT_DIR/index.lock specially too, but that's starting to get
out of hand.
-- 
Duy


Re: [PATCH v3 1/4] test-lib.sh: introduce and use $_EMPTY_TREE

2016-07-13 Thread Duy Nguyen
On Tue, Jul 12, 2016 at 10:40 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> This is a special SHA1. Let's keep it at one place, easier to replace
>> later when the hash change comes, easier to recognize. Start with an
>> underscore to reduce the chances somebody may override it without
>> realizing it's predefined.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  t/t-basic.sh|  2 +-
>>  t/t1100-commit-tree-options.sh  |  2 +-
>>  t/t4010-diff-pathspec.sh|  6 ++
>>  t/t4054-diff-bogus-tree.sh  | 10 --
>>  t/t5504-fetch-receive-strict.sh |  4 ++--
>>  t/test-lib.sh   |  4 +++-
>>  6 files changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/t/t-basic.sh b/t/t-basic.sh
>> index 60811a3..48214e9 100755
>> --- a/t/t-basic.sh
>> +++ b/t/t-basic.sh
>> @@ -834,7 +834,7 @@ test_expect_success 'validate object ID of a known tree' '
>>  '
>>
>>  test_expect_success 'validate object ID of a known tree' '
>> - test "$tree" = 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>> + test "$tree" = $_EMPTY_TREE
>>  '
>
> I doubt the point of, and I'd rather not to see, the leading
> underscore.  Are there existing uses of the name that want it to
> mean something different?

No. There is EMPTY_TREE in use, but it's exactly what we expect. It's
probably still a good idea to separate "global" variables from
per-file ones. But I don't feel strongly about this, so if a re-roll
is required (or somebody votes for underscore removal, including you),
I'll remove the underscore.
-- 
Duy


Re: [PATCH v3 4/4] cache-tree: do not generate empty trees as a result of all i-t-a subentries

2016-07-13 Thread Duy Nguyen
On Tue, Jul 12, 2016 at 10:49 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> diff --git a/cache-tree.c b/cache-tree.c
>> index c2676e8..2d50640 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -380,6 +380,13 @@ static int update_one(struct cache_tree *it,
>>   continue;
>>   }
>>
>> + /*
>> +  * "sub" can be an empty tree if subentries are i-t-a.
>> +  */
>> + if (sub && sub->cache_tree->entry_count < 0 &&
>> + !hashcmp(sha1, EMPTY_TREE_SHA1_BIN))
>> + continue;
>> +
>
> Looks sensible, except that it is unclear if it is correct to assume
> that "subentries are ita" always equals to "entry_count < 0" here.
> I _think_ it is OK, as the function itself does use the latter as
> the sign that it hit to_invalidate=1 case when it returns.

I had the same concern and double checked it. If one day we have a new
entry_count < 0 case that's not i-t-a, we'll need to refactor this
code a bit.
-- 
Duy


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-13 Thread Duy Nguyen
On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin
 wrote:
> Hi Duy,
>
> On Tue, 12 Jul 2016, Duy Nguyen wrote:
>
>> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King  wrote:
>> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
>> >
>> >> I'm not opposed to letting one worktree see everything, but this move
>> >> makes it harder to write new scripts (or new builtin commands, even)
>> >> that works with both single and multiple worktrees because you refer
>> >> to one ref (in current worktree perspective) differently. If we kill
>> >> of the main worktree (i.e. git init always creates a linked worktree)
>> >> then it's less of a problem, but still a nuisance to write
>> >> refs/worktree/$CURRENT/ everywhere.
>> >
>> > True. I gave a suggestion for the reading side, but the writing side
>> > would still remain tedious.
>> >
>> > I wonder if, in a worktree, we could simply convert requests to read or
>> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
>> > That makes it a read/write-time alias conversion, but the actual storage
>> > is just vanilla (so the ref storage doesn't need to care, and
>> > reachability just works).
>>
>> A conversion like that is already happening, but it works at
>> git_path() level instead and maps anything outside refs/ to
>> worktrees/$CURRENT.
>
> Wouldn't you agree that the entire discussion goes into a direction that
> reveals that it might simply be a better idea to require commands that want
> to have per-worktree refs to do that explicitly?

No. To me that's equivalent to letting people deal explicitly with
file-based and lmdb refs backends everywhere. Unless the main worktree
concept dies (I doubt it), it may remain the common use case that
people care about, and extra worktrees become second-class citizens
that are rarely tested. I prefer we have a single interface for dealing
with _any_ worktree. If there are fallouts, we deal with them.

> The same holds true for the config, BTW. I really have no love for the
> idea to make the config per-worktree. It just holds too many nasty
> opportunities to violate the Law of Least Surprises.
>
> Just to name one: imagine you check out a different branch in worktree A,
> then switch worktree B to the branch that A had, and all of a sudden you
> may end up with a different upstream!

Everything in moderation. You wouldn't want to enable sparse checkout
on one worktree and it suddenly affects all worktrees because
core.sparsecheckout is shared. And submodules are known not to work
when core.worktree is still shared.

I will not enforce any rule (unless it's very obvious that the other
way is wrong, like core.worktree). I will give you a rifle and you can
either hunt for food or shoot your foot. In other words, you should be
able to share everything if you like it that way while someone else
can share just some config vars, or even nothing in config file.
-- 
Duy


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi,

On Tue, 12 Jul 2016, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > In case it wasn't clear, I was mostly guessing there. So I dug a bit
> > further, and indeed, I am wrong. Linux never bumped to a 64-bit time_t
> > on i386 because of the ABI headaches.
> 
> X-< (yes, I knew).
> 
> > That being said, I still think the "clamp to time_t" strategy is
> > reasonable. Unless you are doing something really exotic like pretending
> > to be from the future, nobody will care for 20 years.
> 
> Yup.  It is a minor regression for them to go from ulong to time_t,
> because they didn't have to care for 90 years or so but now they do
> in 20 years, I'd guess, but hopefully after that many years,
> everybody's time_t would be sufficiently large.
> 
> I suspect Cobol programmers in the 50s would have said a similar
> thing about the y2k timebomb they created back then, though ;-)
> 
> > And at that point, systems with a 32-bit time_t are going to have
> > to do _something_, because time() is going to start returning
> > bogus values. So as long as we behave reasonably (e.g., clamping
> > values and not generating wrapped nonsense), I think that's a fine
> > solution.
> 
> OK.

I kept the unsigned long -> time_t conversion after reading the thread so
far.

Ciao,
Dscho


Loan Offer

2016-07-13 Thread Quick Loans
Instant cash Loan with same day payout on all kinds of Loan are available at 
Quick Financial Home were loan is offered at 2% per annul. Email: 
quickloa...@foxmail.com




Re: [PATCH v3] config: add conditional include

2016-07-13 Thread Matthieu Moy
Jeff King  writes:

> On Wed, Jul 13, 2016 at 09:21:37AM +0200, Matthieu Moy wrote:
>
>> > +static int prepare_include_condition_pattern(struct strbuf *pat)
>> > +{
>> > +  int prefix = 0;
>> > +
>> > +  /* TODO: maybe support ~user/ too */
>> > +  if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
>> > +  struct strbuf path = STRBUF_INIT;
>> > +  const char *home = getenv("HOME");
>> > +
>> > +  if (!home)
>> > +  return error(_("$HOME is not defined"));
>> 
>> expand_user_path in path.c seems to do the same as you're doing (but
>> does deal with ~user). Any reason not to use it?
>
> I had a similar question, which Duy answered in:
>
>   http://article.gmane.org/gmane.comp.version-control.git/298528
>
> It does feel pretty hacky, though (especially for a case that seems
> unlikely to come up: people having wildcard patterns in the name of
> their home directory).

Ah, OK. Then I'd suggest at least an improvement in the comment:

/*
-* perform literal matching on the prefix part so that
-* any wildcard character in it can't create side effects.
+* perform literal matching on the expanded prefix part
+* so that any wildcard character in it (e.g in the
+* expansion of ~) can't create side effects.
 */

I would also rename the variable prefix -> expanded_prefix. As-is, the
code is really hard to understand IMHO.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Peff,

On Tue, 12 Jul 2016, Jeff King wrote:

> On Tue, Jul 12, 2016 at 01:25:20PM +0200, Johannes Schindelin wrote:
> 
> > [PATCH] Work around test failures due to timestamps being unsigned long
> > 
> > Git's source code refers to timestamps as unsigned longs. On 32-bit
> > platforms, as well as on Windows, unsigned long is not large enough to
> > capture dates that are "absurdly far in the future".
> > 
> > While we will fix this issue properly by replacing unsigned long ->
> > time_t, on the maint track we want to be a bit more conservative and
> > just skip those tests.
> > 
> > Signed-off-by: Johannes Schindelin 
> > ---
> 
> This looks like a reasonable interim fix for both Windows and for the
> general maint track. If we reliably produce "2038" for the 999... value
> then that's simpler than adding in magic to ask about sizeof(ulong). I
> also wondered if we should test for the _correct_ value, but that would
> defeat the point of the test (999... is also far in the future).

This was just a quick fix, intended to allow me to release Git for Windows
v2.9.1 in a timely manner (which is now delayed for other reasons).

I think I can do even better than inspecting the source code to see
whether it reliably clamps the dates to PI hours on January 19th, 2038:
we already have t/helper/test-date and we can easily teach it one new
trick.

> One minor comment:
> 
> > diff --git a/t/t0006-date.sh b/t/t0006-date.sh
> > index 04ce535..d185640 100755
> > --- a/t/t0006-date.sh
> > +++ b/t/t0006-date.sh
> > @@ -48,10 +48,17 @@ check_show default "$TIME" 'Wed Jun 15 16:13:20 2016 +0200'
> >  check_show raw "$TIME" '146600 +0200'
> >  check_show iso-local "$TIME" '2016-06-15 14:13:20 +'
> >  
> > -# arbitrary time absurdly far in the future
> > -FUTURE="5758122296 -0400"
> > -check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400"
> > -check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +"
> > +case "$(test-date show:iso 99)" in
> > +*2038*)
> > +   # on this platform, unsigned long is 32-bit, i.e. not large enough
> > +   ;;
> > +*)
> > +   # arbitrary time absurdly far in the future
> > +   FUTURE="5758122296 -0400"
> > +   check_show iso   "$FUTURE" "2152-06-19 18:24:56 -0400"
> > +   check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +"
> > +   ;;
> > +esac
> 
> It would be nice to wrap this in a prereq, like:
> 
>   test_lazy_prereq 64BIT_TIME '
>   case "$(test-date show:short 99)" in
>   *2038*)
>   false
>   ;;
>   *)
>   true
>   ;;
>   esac
>   '
> 
>   ...
>   check_show 64BIT_TIME iso   "$FUTURE" "2152-06-19 18:24:56 -0400"
>   check_show 64BIT_TIME iso-local "$FUTURE" "2152-06-19 22:24:56 +"
> 
> The main advantage is that it will number the tests consistently, and
> give us a "skipped" message (which can remind us to drop the prereq
> later when everything goes 64-bit).
> 
> But it also will do the right thing with test-date's output with respect
> to "-v" (not that we expect it to produce any output).
> 
> You'll need to adjust check_show as I did in my earlier patch.

Makes sense!

Thanks,
Dscho


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Duy,

On Tue, 12 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 12, 2016 at 3:46 PM, Jeff King  wrote:
> > On Tue, Jul 12, 2016 at 03:31:00PM +0200, Andreas Schwab wrote:
> >
> >> Johannes Schindelin  writes:
> >>
> >> > On Tue, 12 Jul 2016, Andreas Schwab wrote:
> >> >
> >> >> Johannes Schindelin  writes:
> >> >>
> >> >> >> PRIuMAX isn't compatible with time_t.
> >> >> >
> >> >> > That statement is wrong.
> >> >>
> >> >> No, it isn't.  PRIuMAX is for uintmax_t, and time_t is not uintmax_t
> >> >> (even if they happen to have the same representation).
> >> >
> >> > Sigh.
> >> >
> >> > So if it is wrong, what is right?
> >>
> >> The right thing is to add a cast, of course.
> >
> > In general, I think the right cast for time_t should be to (intmax_t),
> > and the formatting string should be PRIdMAX (which, as an aside, needs
> > an entry in git-compat-util.h).
> 
> Coincidentally, I have the same problem with unsigned long being
> 32-bit and have to switch to off_t in some places. Does anybody know
> what a fallback in git-compat-util for PRIdMAX would look like? I
> guess it's "lld"...

Yes, judging from the existing fallback for PRIuMAX, "lld" would be the
correct thing to do. And then it would be nice to introduce

#define PRIdMAX "I64d"

next to the PRIuMAX definition in compat/mingw.h, too.
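For illustration, a sketch of how such a fallback would be used (the macro
value is the conventional one, not copied from git's headers, and
show_time() is a made-up helper):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fallback in the style of git-compat-util.h's PRIuMAX fallback;
 * on MinGW/MSVC the definition would instead be "I64d". */
#ifndef PRIdMAX
#define PRIdMAX "lld"
#endif

/* Cast to intmax_t so the format and the argument always match,
 * whatever width time_t (or off_t) has on this platform. */
static void show_time(char *buf, size_t n, int64_t t)
{
	snprintf(buf, n, "timestamp: %"PRIdMAX, (intmax_t)t);
}
```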

Ciao,
Dscho


Re: [ANNOUNCE] Git v2.9.1

2016-07-13 Thread Johannes Schindelin
Hi Andreas,

On Tue, 12 Jul 2016, Andreas Schwab wrote:

> Johannes Schindelin  writes:
> 
> > On Tue, 12 Jul 2016, Andreas Schwab wrote:
> >
> >> Johannes Schindelin  writes:
> >> 
> >> >> PRIuMAX isn't compatible with time_t.
> >> >
> >> > That statement is wrong.
> >> 
> >> No, it isn't.  PRIuMAX is for uintmax_t, and time_t is not uintmax_t
> >> (even if they happen to have the same representation).
> >
> > Sigh.
> >
> > So if it is wrong, what is right?
> 
> The right thing is to add a cast, of course.

This was not helpful.

Ciao,
Johannes


Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too

2016-07-13 Thread Kirill Smelkov
On Wed, Jul 13, 2016 at 04:26:53AM -0400, Jeff King wrote:
> On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote:
> 
> > >   - we will not compute the same write order (which is based on
> > > traversal order), leading to packs that have less efficient cache
> > > characteristics
> > 
> > I agree the order can be not exactly the same. Still if original pack is
> > packed well (with good recency order), while using bitmap we will tend
> > to traverse it in close to original order.
> > 
> > Maybe I'm not completely right on this, but to me it looks to be the
> > case because if objects in original pack are put there linearly sorted
> > by recency order, and we use bitmap index to set of all reachable
> > objects from a root, and then just _linearly_ gather all those objects
> > from original pack by 1s in bitmap and put them in the same order into
> > destination pack, the recency order won't be broken.
> > 
> > Or am I maybe misunderstanding something?
> 
> Yeah, I think you can go some of the way by reusing the order from the
> old pack. But keep in mind that the bitmap result may also contain
> objects that are not yet packed. Those will just come in a big lump at
> the end of the bitmap (these are the "extended entries" in the bitmap
> code).
> 
> So I think if you were to repeatedly "git repack -adb" over time, you
> would get worse and worse ordering as objects are added to the
> repository.

Jeff, first of all thanks for clarifying.

So it is the not-yet-packed objects that make packing with the bitmap less
efficient. I originally had in mind a freshly repacked repository with a
just-built bitmap index; for that case, extracting a pack with the bitmap
index seems fine, but the more not-yet-packed objects there are, the worse
the result can be.

> As an aside, two other things that pack order matters for: it makes the
> bitmaps themselves compress better (because it increases locality of
> reachability, so you get nice runs of "1" or "0" bits).

Yes, I agree, and thanks for bringing this up: putting objects in recency
order in the pack also gives the bitmap index longer runs of the same 1s
and 0s.

> It also makes
> the pack-reuse code more efficient (since in an ideal case, you can just
> dump a big block of data from the front of the pack). Note that the
> pack-reuse code that's in upstream git isn't that great; I have a better
> system on my big pile of patches to send upstream (that never seems to
get smaller).

Yes, that also makes sense. I saw that write_reused_pack() in upstream git
just copies raw bytes from the original to the destination pack. You
mentioned you have something better for pack reuse in your patch queue; in
short, does it reuse the pack per object rather than as raw bytes, or is
it something else?

In other words, in what way does it work better? (I'm just curious; it is
interesting to know.)


> > >   - we don't learn about the filename of trees and blobs, which is going
> > > to make the delta step much less efficient. This might be mitigated
> > > by turning on the bitmap name-hash cache; I don't recall how much
> > > detail pack-objects needs on the name (i.e., the full name versus
> > > just the hash).
> > 
> > If I understand it right, it uses only uint32_t name hash while searching. 
> > From
> > pack-objects.{h,c} :
> 
> Yeah, I think you are right. Not having the real names is a problem for
> doing rev-list output, but I think pack-objects doesn't care (though do
> note that the name-hash cache is not enabled by default).

Yes, for packing only the hash is used. And I assume the name-hash cache
for bitmaps is not enabled by default for compatibility with JGit.

It would make sense to me to eventually enable the name-hash bitmap
extension by default, as the packing result is much better with it; those
who care about compatibility with JGit could just turn it off in their
git config.

Just my thoughts.

> > > There may be other subtle things, too. The general idea of tying the
> > > bitmap use to pack_to_stdout is that you _do_ want to use it for
> > > serving fetches and pushes, but for a full on-disk repack via gc, it's
> > > more important to generate a good pack.
> > 
> > It is better we send good packs to clients too, right? And with
> > pack.writeBitmapHashCache=true and retaining recency order (please see
> > above, but again maybe I'm not completely right) to me we should be still
> > generating a good pack while using bitmap reachability index for object
> > graph traversal.
> 
> We do want to send the client a good pack, but it's always a tradeoff.
> We could spend much more time searching for the perfect delta, but at
> some point we have to decide on how much CPU to spend serving them.
> Likewise, even if the bitmapped packs we send are in slightly worse
> order, saving a minute of CPU time off of every clone of the kernel is a
> big deal.

Yes, this I understand and agree with. Like I said above, I was imagining
a freshly repacked repo with a recently rebuilt 

Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too

2016-07-13 Thread Jeff King
On Tue, Jul 12, 2016 at 10:08:08PM +0300, Kirill Smelkov wrote:

> > Or are we fine with my arguments about recency order staying the same
> > when using bitmap reachability index for object graph traversal, and this
> > way the patch is fine to go in as it is?
> 
> Since there is no reply I assume the safe way to go is to let default
> for pack-to-file case to be "not using bitmap index". Please find updated
> patch and interdiff below. I would still be grateful for feedback on
> my above use-bitmap-for-pack-to-file arguments.

Yeah, I think that is a reasonable approach. I see here you've added new
config, though, and I don't think we want that.

For your purposes, where you're driving pack-objects individually, I
think a command-line option makes more sense.

If we did want to have a flag for "use bitmaps when repacking via
repack", I think it should be "repack.useBitmaps", and git-repack should
pass the command-line option to pack-objects. pack-objects is plumbing
and should not really be reading config at all. You'll note that
pack.writeBitmaps was a mistake and got deprecated in favor of
repack.writeBitmaps. I think pack.useBitmaps is a mistake, too, but
nobody has really noticed or cared because there's no good reason to set
it (the more interesting question is: are there bitmaps available? and
if so, we try to use them).

-Peff


Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too

2016-07-13 Thread Jeff King
On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote:

> >   - we will not compute the same write order (which is based on
> > traversal order), leading to packs that have less efficient cache
> > characteristics
> 
> I agree the order can be not exactly the same. Still if original pack is
> packed well (with good recency order), while using bitmap we will tend
> to traverse it in close to original order.
> 
> Maybe I'm not completely right on this, but to me it looks to be the
> case because if objects in original pack are put there linearly sorted
> by recency order, and we use bitmap index to set of all reachable
> objects from a root, and then just _linearly_ gather all those objects
> from original pack by 1s in bitmap and put them in the same order into
> destination pack, the recency order won't be broken.
> 
> Or am I maybe misunderstanding something?

Yeah, I think you can go some of the way by reusing the order from the
old pack. But keep in mind that the bitmap result may also contain
objects that are not yet packed. Those will just come in a big lump at
the end of the bitmap (these are the "extended entries" in the bitmap
code).

So I think if you were to repeatedly "git repack -adb" over time, you
would get worse and worse ordering as objects are added to the
repository.

As an aside, two other things that pack order matters for: it makes the
bitmaps themselves compress better (because it increases locality of
reachability, so you get nice runs of "1" or "0" bits). It also makes
the pack-reuse code more efficient (since in an ideal case, you can just
dump a big block of data from the front of the pack). Note that the
pack-reuse code that's in upstream git isn't that great; I have a better
system on my big pile of patches to send upstream (that never seems to
get smaller).

> >   - we don't learn about the filename of trees and blobs, which is going
> > to make the delta step much less efficient. This might be mitigated
> > by turning on the bitmap name-hash cache; I don't recall how much
> > detail pack-objects needs on the name (i.e., the full name versus
> > just the hash).
> 
> If I understand it right, it uses only uint32_t name hash while searching. 
> From
> pack-objects.{h,c} :

Yeah, I think you are right. Not having the real names is a problem for
doing rev-list output, but I think pack-objects doesn't care (though do
note that the name-hash cache is not enabled by default).

> > There may be other subtle things, too. The general idea of tying the
> > bitmap use to pack_to_stdout is that you _do_ want to use it for
> > serving fetches and pushes, but for a full on-disk repack via gc, it's
> > more important to generate a good pack.
> 
> It is better we send good packs to clients too, right? And with
> pack.writeBitmapHashCache=true and retaining recency order (please see
> above, but again maybe I'm not completely right) to me we should be still
> generating a good pack while using bitmap reachability index for object
> graph traversal.

We do want to send the client a good pack, but it's always a tradeoff.
We could spend much more time searching for the perfect delta, but at
some point we have to decide on how much CPU to spend serving them.
Likewise, even if the bitmapped packs we send are in slightly worse
order, saving a minute of CPU time off of every clone of the kernel is a
big deal.

We also take robustness shortcuts when sending to clients. For example,
when doing an on-disk repack we re-crc32 all of the delta data we are
reusing, even if we don't actually inflate it (because we would want to
stop immediately if we see even a single bit flipped on disk). But we
don't check them when sending to a client, because we know they are
going to actually `index-pack` it and get a stronger consistency check
anyway, and don't want to waste server CPU.

The bitmaps are sort of the same. If there is a bug or corruption in the
bitmap, the worst case is that we send a broken pack to the client, who
will complain that we did not give them all of the objects. It's a
momentary problem that can be fixed. If you use them for an on-disk
repack, then the next step is usually to delete all of the old packs. So
a corruption there carries forward, and is irreversible.

As I understand your use case, it is OK to do the less careful things.
It's just that pack-objects until now has been split into two modes:
packing to a file is careful, and packing to stdout is less so. And you
want to pack to a file in the non-careful mode.

> > but I wonder if you should be using "pack-objects --stdout" yourself.
> 
> I already tried --stdout. The problem is on repository extraction we
> need to both extract the pack and index it. While `pack-object file`
> does both, for --stdout case we need to additionally index extracted
> pack with `git index-pack`, and standalone `git index-pack` is very slow
> - in my experience much slower than generating the pack itself:

Ah, right, that 

Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-13 Thread Johannes Schindelin
Hi Duy,

On Tue, 12 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King  wrote:
> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
> >
> >> I'm not opposed to letting one worktree see everything, but this move
> >> makes it harder to write new scripts (or new builtin commands, even)
> >> that work with both single and multiple worktrees, because you refer
> >> to one ref (in the current worktree's perspective) differently. If we kill
> >> off the main worktree (i.e. git init always creates a linked worktree)
> >> then it's less of a problem, but still a nuisance to write
> >> refs/worktree/$CURRENT/ everywhere.
> >
> > True. I gave a suggestion for the reading side, but the writing side
> > would still remain tedious.
> >
> > I wonder if, in a worktree, we could simply convert requests to read or
> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
> > That makes it a read/write-time alias conversion, but the actual storage
> > is just vanilla (so the ref storage doesn't need to care, and
> > reachability just works).
> 
> A conversion like that is already happening, but it works at
> git_path() level instead and maps anything outside refs/ to
> worktrees/$CURRENT.

Wouldn't you agree that the entire discussion goes in a direction that
reveals it might simply be a better idea to require commands that want
per-worktree refs to do that explicitly?

I mean, it looks to me that the harder we try to avoid that, the more
problems crop up, some of them as serious as my reported data loss.

I do not see any indication that trying even harder to "protect" commands
from knowing that they are running in one of many worktrees is making
things easier. To the contrary, I expect that direction to hold many more
awful surprises for us.

The same holds true for the config, BTW. I really have no love for the
idea of making the config per-worktree. It just holds too many nasty
opportunities to violate the Law of Least Surprise.

Just to name one: imagine you check out a different branch in worktree A,
then switch worktree B to the branch that A had, and all of a sudden you
may end up with a different upstream!

Ciao,
Dscho


Re: [PATCH] Add very verbose porcelain output to status

2016-07-13 Thread Johannes Schindelin
Hi Jeff,

[please note that top-posting is discouraged on this mailing list]

On Tue, 12 Jul 2016, Jeff Hostetler wrote:

> Thanks for the feedback.  I like the overall direction of your
> suggestions.  Going for a porcelain V2 feels better than piling
> onto verbose.  I think that would also give us a little more license
> to address some of the line format issues that you point out.
> Let me retool what I have along these lines and see how it looks.

Thank you for working on this. I will hold off from releasing Git for
Windows v2.9.1 until this is sorted out, as I already merged v1 into our
`master` and do not want to end up with a bogus release.

Ciao,
Dscho


Re: What's cooking in git.git (Jul 2016, #04; Mon, 11)

2016-07-13 Thread Johannes Schindelin
Hi Junio,

On Tue, 12 Jul 2016, Junio C Hamano wrote:

> On Tue, Jul 12, 2016 at 6:16 AM, Johannes Schindelin
>  wrote:
> >
> > On Mon, 11 Jul 2016, Junio C Hamano wrote:
> >
> >> [New Topics]
> >>
> >> [...]
> >
> > What about http://thread.gmane.org/gmane.comp.version-control.git/299050?
> 
> Not forgotten.
> 
> It just is not one of the "New Topics" yet, together with several
> other patches sent to
> the list recently, hence it is not listed there.

Thanks.

I just nudged you because this is a patch I needed to include in Git
for Windows, and I would have loved more reviews than just mine.

Not a big deal, especially because it is not forgotten ;-)

Ciao,
Dscho


Re: [PATCH 0/9] Resend of gitster/pb/bisect

2016-07-13 Thread Christian Couder
Hi Pranit,

On Wed, Jul 13, 2016 at 12:35 AM, Pranit Bauva  wrote:
> Hey Junio,
>
> A small mistake went unnoticed by me, which Lars recently pointed out.
> The naming convention is "git_path_" with underscores
> instead of spaces.

It's a good thing to resend when you find mistakes, but please use a
version number for your patch series (like "PATCH v3" or something).

Thanks,
Christian.


Re: [PATCH v3] config: add conditional include

2016-07-13 Thread Matthieu Moy
Nguyễn Thái Ngọc Duy  writes:

> +`gitdir`::
> + The environment variable `GIT_DIR` must match the following
> + pattern for files to be included. The pattern can contain
> + standard globbing wildcards and two additional ones, `**/` and
> + `/**`, that can match multiple path components. Please refer
> + to linkgit:gitignore[5] for details. For convenience:

It's unclear to me whether the whole content of GIT_DIR must match, or
whether the pattern should be included in (or be a prefix of) $GIT_DIR.
From this text, I read it as "the whole content", but ...

> + ; include for all repositories inside /path/to/group
> + [include "gitdir:/path/to/group/"]
> + path = /path/to/foo.inc
> +
> + ; include for all repositories inside $HOME/to/group
> + [include "gitdir:~/to/group/"]
> + path = /path/to/foo.inc

... here it seems it only has to be a prefix.

> +static int prepare_include_condition_pattern(struct strbuf *pat)
> +{
> + int prefix = 0;
> +
> + /* TODO: maybe support ~user/ too */
> + if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
> + struct strbuf path = STRBUF_INIT;
> + const char *home = getenv("HOME");
> +
> + if (!home)
> + return error(_("$HOME is not defined"));

expand_user_path in path.c seems to do the same as you're doing (but
does deal with ~user). Any reason not to use it?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH v3] config: add conditional include

2016-07-13 Thread Jeff King
On Wed, Jul 13, 2016 at 09:21:37AM +0200, Matthieu Moy wrote:

> > +static int prepare_include_condition_pattern(struct strbuf *pat)
> > +{
> > +   int prefix = 0;
> > +
> > +   /* TODO: maybe support ~user/ too */
> > +   if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
> > +   struct strbuf path = STRBUF_INIT;
> > +   const char *home = getenv("HOME");
> > +
> > +   if (!home)
> > +   return error(_("$HOME is not defined"));
> 
> expand_user_path in path.c seems to do the same as you're doing (but
> does deal with ~user). Any reason not to use it?

I had a similar question, which Duy answered in:

  http://article.gmane.org/gmane.comp.version-control.git/298528

It does feel pretty hacky, though (especially for a case that seems
unlikely to come up: people having wildcard patterns in the name of
their home directory).

-Peff


Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems

2016-07-13 Thread Duy Nguyen
On Tue, Jul 12, 2016 at 10:38 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> Since I now could reproduce the problem that Christoph showed, I
>> decided to send the good patches out. To sum up, we use "unsigned
>> long" in some places related to file size. On 32-bit systems, it's
>> limited to 32 bits even though the system can handle files larger than
>> that (off_t is 64-bit). This fixes it.
>>
>> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
>> have a couple more patches to clean all these warnings, but some need
>> more code study to see what is the right way to do.
>>
>> Most of the rest seems harmless, except for the local variable "size"
>> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.
>
> With 1-7/5 it appears t1050 is broken?

Oops, sorry, I focused on testing big repos and forgot the test suite.
But heh, it's a good thing. That test was added by 735efde, where fsck
could keep going after failing to inflate some blobs (instead of hitting
a malloc failure and aborting). Now it's even better: fsck passes. I'll
fix that up and resend with your squashes in.
-- 
Duy