Re: Seems to be pushing more than necessary

2015-03-23 Thread Graham Hay
If I push straight to the other repo, it only pushes the 3 objects I'd
expect (instead of 10,000+). So it looks like that is the problem, but
I don't really understand why.

From my point of view, there should be no difference, but I clearly
don't understand how it actually works. How does git decide what refs
and/or objects are the same?

For a bit of background, the reason I have 2 remotes is to try and
avoid pushing to master. We work in a highly regulated industry, and
our code needs to be reviewed before hitting the mainline. So I push
to my fork and create a PR to the blessed repo, that way if I
accidentally commit to master (I have form!) then I have an extra
chance to catch it and don't have to back it out.

The two repos started out the same, though; the only differences should
be the new work I have done. Is there any way I can continue to work
like this, or do I have to choose between slow pushes and safety?

On 23 March 2015 at 10:41, Duy Nguyen pclo...@gmail.com wrote:
 On Mon, Mar 23, 2015 at 5:35 PM, Graham Hay grahamr...@gmail.com wrote:
 Hmm. I'm using a private fork of a repo, I pull from one and push to
 the other, e.g.

 git fetch foo
 git rebase foo/master
 git push --set-upstream origin bar

 It's quite possible my workflow is causing the problem, but I'm not
 sure what I could do differently. What do you mean by a no-share
 remote?

 I mean the refs (and associated objects) that are available on foo
 may not be available on bar, so when you push to origin you just
 need to send more. That rebase could generate lots of new objects to
 push out too, I think.
 --
 Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seems to be pushing more than necessary

2015-03-23 Thread Duy Nguyen
On Mon, Mar 23, 2015 at 5:35 PM, Graham Hay grahamr...@gmail.com wrote:
 Hmm. I'm using a private fork of a repo, I pull from one and push to
 the other, e.g.

 git fetch foo
 git rebase foo/master
 git push --set-upstream origin bar

 It's quite possible my workflow is causing the problem, but I'm not
 sure what I could do differently. What do you mean by a no-share
 remote?

I mean the refs (and associated objects) that are available on foo
may not be available on bar, so when you push to origin you just
need to send more. That rebase could generate lots of new objects to
push out too, I think.
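To make the point above concrete, here is a rough, self-contained sketch (not taken from the thread) that builds a throwaway repo with two remotes and counts the objects reachable from a branch but not from any remote-tracking ref of each remote. The repo layout and file names are made up for illustration, and the count is only an approximation: a real push negotiates against the refs the remote advertises, not against the local remote-tracking refs.

```shell
#!/bin/sh
# Sketch: estimate how many objects a push would need to send to each remote.
set -e
work=$(mktemp -d)
cd "$work"

# Two bare repos stand in for the upstream ("foo") and the private fork ("origin").
git init -q --bare foo.git
git init -q --bare origin.git

git init -q clone
cd clone
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add foo ../foo.git
git remote add origin ../origin.git

# One shared commit, pushed to both remotes.
echo base > file.txt
git add file.txt
git commit -q -m "base"
git push -q foo HEAD:refs/heads/master
git push -q origin HEAD:refs/heads/master
git fetch -q foo
git fetch -q origin

# Upstream moves ahead; the fork is not updated.
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"
git push -q foo HEAD:refs/heads/master
git fetch -q foo

# Local branch "bar" with one more commit on top of foo/master.
git checkout -q -b bar
echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"

# Objects reachable from bar but not from any ref we know each remote has.
to_foo=$(git rev-list --objects bar --not --remotes=foo | wc -l | tr -d ' ')
to_origin=$(git rev-list --objects bar --not --remotes=origin | wc -l | tr -d ' ')
echo "objects needed by foo:    $to_foo"
echo "objects needed by origin: $to_origin"
```

In this toy setup the first count should be 3 (one commit, one tree, one blob) and the second 6, because the fork is also missing the upstream commit. The same gap, scaled up to a fork that is far behind, would be one plausible way to end up with thousands of objects in a push.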
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-23 Thread Graham Hay
Hmm. I'm using a private fork of a repo, I pull from one and push to
the other, e.g.

git fetch foo
git rebase foo/master
git push --set-upstream origin bar

It's quite possible my workflow is causing the problem, but I'm not
sure what I could do differently. What do you mean by a no-share
remote?

On 23 March 2015 at 10:05, Duy Nguyen pclo...@gmail.com wrote:
 On Thu, Mar 19, 2015 at 6:11 PM, Graham Hay grahamr...@gmail.com wrote:
 Try fast-export --anonymize as that would help us understand this.

 Attached.

 The bad news is it seems to be working for me (I recreated the remote
 repo from this dump). I notice that you have two remotes, one shares
 many refs (the remote ref39). The other, ref2, does not share any
 SHA-1 with refs in .git/refs/heads/. Any chance you push to a
 no-share remote, which results in a lot of objects to be sent?
 --
 Duy


Re: Seems to be pushing more than necessary

2015-03-23 Thread Duy Nguyen
On Thu, Mar 19, 2015 at 6:11 PM, Graham Hay grahamr...@gmail.com wrote:
 Try fast-export --anonymize as that would help us understand this.

 Attached.

The bad news is that it seems to work for me (I recreated the remote
repo from this dump). I notice that you have two remotes. One (the
remote ref39) shares many refs with your local branches; the other,
ref2, does not share any SHA-1 with the refs in .git/refs/heads/. Any
chance you are pushing to the no-share remote, which would result in a
lot of objects being sent?
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-19 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 10:14 PM, Graham Hay grahamr...@gmail.com wrote:
 Got there eventually!

 $ git verify-pack --verbose bar.pack
 e13e21a1f49704ed35ddc3b15b6111a5f9b34702 commit 220 152 12
 03691863451ef9db6c69493da1fa556f9338a01d commit 334 227 164
 ... snip ...
 chain length = 50: 2 objects
 bar.pack: ok

 Now what do I do with it :)

Try fast-export --anonymize as that would help us understand this.
Or you can try to see if these commits exist in the remote repo. If
they do, that only confirms that push sends more than it should, but
it's hard to know why. Maybe if you fire up gitk and mark those commits,
you'll figure out a connection. There are actually objects in this
pack that are expected to exist in the remote repo, but it's hard to
tell.
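One way to check whether a commit from the pack is already known to be on a remote is to ask which remote-tracking branches contain it. The following is a minimal sketch in a throwaway repo; the helper name known_on_remote is made up for illustration, and it only reflects what the last fetch saw, not the live state of the remote.

```shell
#!/bin/sh
# Sketch: test whether a commit is reachable from any remote-tracking branch.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git init -q clone
cd clone
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin ../remote.git

# A commit that is pushed, and whose remote-tracking ref is fetched.
echo one > file.txt
git add file.txt
git commit -q -m "pushed"
git push -q origin HEAD:refs/heads/master
git fetch -q origin
pushed=$(git rev-parse HEAD)

# A commit that exists only locally.
echo two >> file.txt
git commit -q -am "local only"
local_only=$(git rev-parse HEAD)

# Hypothetical helper: succeeds if some remote-tracking branch contains the commit.
known_on_remote() {
    [ -n "$(git branch -r --contains "$1")" ]
}

for sha in "$pushed" "$local_only"; do
    if known_on_remote "$sha"; then
        echo "$sha is already on a remote-tracking branch"
    else
        echo "$sha is not on any remote-tracking branch"
    fi
done
```

Against a real repo, the SHA-1s in the first column of the verify-pack output (commit lines only) could be fed to the same check after a git fetch of the remote in question.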
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Graham Hay
 It would help if you pasted the push output. For example, does it stop
 at 20% at the compressing objects line or writing objects. How
 many total objects does it say?

It rattles through compressing objects, and the first 20% of
writing objects, then slows to a crawl.

Writing objects:  33% (3647/10804), 80.00 MiB | 112.00 KiB/s


 Another question is how big are these binary files on average? Git
 considers a file big if its size is 512MB or more (see
 core.bigFileThreshold). If your binary files are mostly under this
 limit, but still big enough, then git may still try to compare new
 objects with these to find the smallest diff to send. If that's the
 case, you could set core.bigFileThreshold to cover these binary files.

None of the files are very big (KB rather than MB), but there's a lot
of them. I'll try setting the threshold to something lower, thanks.


Re: Seems to be pushing more than necessary

2015-03-18 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 5:55 PM, Graham Hay grahamr...@gmail.com wrote:
 We have a fairly large repo (~2.4GB), mainly due to binary resources
 (for an ios app). I know this can generally be a problem, but I have a
 specific question.

 If I cut a branch, and edit a few (non-binary) files, and push, what
 should be uploaded? I assumed it was just the diff (I know whole
 compressed files are used, I mean the differences between my branch
 and where I cut it from). Is that correct?

 Because when I push, it grinds to a halt at the 20% mark, and feels
 like it's trying to push the entire repo. If I run git diff --stat
 --cached origin/foo I see the files I would expect (i.e. just those
 that have changed). If I run git format-patch origin/foo..foo the
 patch files total 1.7MB, which should upload in just a few seconds,
 but I've had pushes take over an hour. I'm using git 2.2.2 on Mac OS X
 (Mavericks), and ssh (g...@github.com).

 Am I doing it wrong? Is this the expected behaviour? If not, is
 there anything I can do to debug it?

It would help if you pasted the push output. For example, does it stop
at 20% on the "Compressing objects" line or the "Writing objects" line?
How many objects does it report in total?

Another question: how big are these binary files on average? Git
considers a file big if its size is 512MB or more (see
core.bigFileThreshold). If your binary files are mostly under this
limit but still fairly large, git may still try to compare new
objects against them to find the smallest delta to send. If that's the
case, you could lower core.bigFileThreshold so that it covers these
binary files.
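For reference, a minimal sketch of inspecting and lowering the threshold in a single repo. The value 1m is only a placeholder picked for illustration, not a recommendation; a sensible value depends on the size of the binaries involved.

```shell
#!/bin/sh
# Sketch: inspect and lower core.bigFileThreshold for one (throwaway) repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# "git config --get" prints nothing and exits non-zero when the key is unset,
# in which case the built-in default of 512 MiB applies.
git config --get core.bigFileThreshold || echo "unset (default of 512 MiB applies)"

# Placeholder value: files of this size or larger are stored without delta attempts.
git config core.bigFileThreshold 1m

threshold=$(git config --get core.bigFileThreshold)
echo "core.bigFileThreshold = $threshold"
```

Since the setting is written to the repo's own config here, it would not affect other clones.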
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 6:26 PM, Graham Hay grahamr...@gmail.com wrote:
 It would help if you pasted the push output. For example, does it stop
 at 20% at the compressing objects line or writing objects. How
 many total objects does it say?

 It rattles through compressing objects, and the first 20% of
 writing objects, then slows to a crawl.

 Writing objects:  33% (3647/10804), 80.00 MiB | 112.00 KiB/s

This 10804 looks wrong (i.e. sending that many compressed objects),
and so does the 80 MiB already sent at that point. If you modified just
a couple of files, something is really wrong, because the number of new
objects should be in the hundreds at most, not thousands.

v2.2.2 supports git fast-export --anonymize [1] to create an
anonymized clone of your repo that you can share, which might help
us understand the problem.

There's also the environment variable GIT_TRACE_PACKET that can help
see what's going on at the protocol level, but I think you're on your
own because without access to this repo, SHA-1s from that trace may
not make much sense.

[1] https://github.com/git/git/commit/a8722750985a53cc502a66ae3d68a9e42c7fdb98
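As a rough illustration of the GIT_TRACE_PACKET suggestion, the sketch below pushes to a throwaway local bare repo and writes the packet trace to a file (the variable accepts an absolute file path as well as 1 for stderr). The repo names and the log location are made up for the example.

```shell
#!/bin/sh
# Sketch: capture the protocol-level trace of a push into a file.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git init -q clone
cd clone
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin ../remote.git

# First push, untraced, so the remote has a ref to advertise next time.
echo one > file.txt
git add file.txt
git commit -q -m "first"
git push -q origin HEAD:refs/heads/master

echo two >> file.txt
git commit -q -am "second"

# Second push, traced. An absolute path sends the trace to that file.
trace="$work/packet.log"
GIT_TRACE_PACKET="$trace" git push -q origin HEAD:refs/heads/master

# Lines tagged "push<" are what the client read from the remote,
# which includes the refs the remote advertised.
grep 'push<' "$trace" | head -n 5
packet_lines=$(grep -c 'packet:' "$trace" || true)
echo "trace lines captured: $packet_lines"
```

The advertised refs are the interesting part here: anything not reachable from them is a candidate for being sent.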
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 7:26 PM, Duy Nguyen pclo...@gmail.com wrote:
 It's quite a lot of work :) I created this script named git and put
 it in $PATH to capture input for pack-objects. You'll need to update
 /path/to/real/git to point to the real binary then you'll get
 /tmp/stdin

Forgot one important sentence: You need to push again using this fake
git program to save data in /tmp/stdin. Also, you can stop the push
once it reaches the "Compressing objects" phase.
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Graham Hay
Are there any commands that I can use to show exactly what it is trying to push?

I'll see if I can create a (public) repo that has the same problem.
Thanks for your help.


 This 10804 looks wrong (i.e. sending that many compressed objects).
 Also 80 MiB sent at that point. If you modify just a couple files,
 something is really wrong because the number of new objects may be
 hundreds at most, not thousands.

 v2.2.2 supports git fast-export --anonymize [1] to create an
 anonymized clone of your repo that you can share, which might help
 us understand the problem.

 There's also the environment variable GIT_TRACE_PACKET that can help
 see what's going on at the protocol level, but I think you're on your
 own because without access to this repo, SHA-1s from that trace may
 not make much sense.

 [1] https://github.com/git/git/commit/a8722750985a53cc502a66ae3d68a9e42c7fdb98
 --
 Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 7:03 PM, Graham Hay grahamr...@gmail.com wrote:
 Are there any commands that I can use to show exactly what it is trying to push?

It's a bit more than a command. If you push with GIT_TRACE set to
2, you'll see that it executes the git pack-objects command with all
its arguments. This command expects some input on stdin. If you can
capture that, you can run it yourself to create the exact pack that
is transferred over the network. Running that pack through git
index-pack --verify-stat will show you the SHA-1 of every object sent.

It's quite a lot of work :) I created this script named git and put
it in $PATH to capture input for pack-objects. You'll need to update
/path/to/real/git to point to the real binary then you'll get
/tmp/stdin

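-- 8< --
#!/bin/sh

# Wrapper named "git": save whatever push feeds to pack-objects on stdin,
# then hand everything over to the real binary.
if [ "$1" = pack-objects ]; then
    tee /tmp/stdin | /path/to/real/git "$@"
else
    exec /path/to/real/git "$@"
fi
-- 8< --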

The remaining steps might look like this (may need tweaking):

$ git pack-objects '--all-progress-implied' '--revs' '--stdout' '--thin' \
    '--delta-base-offset' '--progress' < /tmp/stdin |
    git index-pack --fix-thin --stdin
pack 708538afeda8eb331858680e227f7713228ce782   <-- new pack
$ git verify-pack --verbose \
    .git/objects/pack/pack-708538afeda8eb331858680e227f7713228ce782.pack
d75631bd83ebdf03d4b0d925ff6734380f801fc6 commit 567 377 12
dd44100a7cdad113b23d31876e469b74fbe21e1b tree   15069 10492 389
8f4bbccea759d7a47616e29bd55b3f205b3615c2 tree   3869 2831 10881
3db0460935bc843a2a70a0e087222eec61a0ff0d blob   12379 3529 13712

Here we can see this push of mine sends four objects: one commit, two
trees and one blob.
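For anyone who wants to try the pack-objects and verify-pack steps without the wrapper script, here is a small sketch on a throwaway repo. It writes the revision list by hand instead of capturing it from a push (a tip to include and a ^-prefixed base to exclude, which is the kind of input --revs reads on stdin). It also leaves out --thin so the pack can be indexed as a plain file; the file name sample.pack is made up.

```shell
#!/bin/sh
# Sketch: build a pack for "tip minus base" by hand and list its objects.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "tip"
tip=$(git rev-parse HEAD)

# "--revs" reads revision arguments from stdin: include tip, exclude base.
printf '%s\n^%s\n' "$tip" "$base" |
    git pack-objects -q --revs --stdout > "$work/sample.pack"

# verify-pack needs an index (.idx) next to the pack file.
git index-pack "$work/sample.pack" > /dev/null

# One line per object: SHA, type, size, size in pack, offset.
git verify-pack --verbose "$work/sample.pack"
object_count=$(git verify-pack --verbose "$work/sample.pack" |
    grep -c -E '^[0-9a-f]+ (commit|tree|blob) ')
echo "objects in pack: $object_count"
```

For this one-file change the pack should hold three objects (commit, tree, blob), which is the sort of number you would expect from a small push.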
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Duy Nguyen
On Wed, Mar 18, 2015 at 8:16 PM, Graham Hay grahamr...@gmail.com wrote:
 I created a repo with over 1GB of images, but it works as expected
 (only pushed 3 objects).

 Sorry, I must have done something wrong. I put that script in
 ~/Applications, and checked it worked. Then I ran this:

 $ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf

I think I encountered the same problem. Inserting
--exec-path=$HOME/Applications between git and push was probably
what made it work for me. Haven't investigated the reason yet. We
really should have an easier way to get this info without jumping
through hoops like this.
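A likely explanation, offered as an assumption rather than something confirmed in this thread: when git runs a sub-command it puts its own exec path at the front of PATH, so a wrapper placed elsewhere in PATH is bypassed for the inner git pack-objects call. The sketch below only shows how to inspect the exec path and how --exec-path=<dir> overrides it; the directory used is a temporary stand-in for ~/Applications.

```shell
#!/bin/sh
# Sketch: inspect git's exec path and override it for one invocation.
set -e

# Directory git searches first when it runs sub-commands.
exec_dir=$(git --exec-path)
echo "default exec path:    $exec_dir"

# Stand-in for the directory holding the wrapper script.
wrapper_dir=$(mktemp -d)

# "--exec-path=<dir>" sets the path; a following bare "--exec-path" prints it.
overridden=$(git --exec-path="$wrapper_dir" --exec-path)
echo "overridden exec path: $overridden"
```

With the override in place, a script named git inside that directory would be the first git found when push spawns pack-objects.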
-- 
Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Graham Hay
I created a repo with over 1GB of images, but it works as expected
(only pushed 3 objects).

Sorry, I must have done something wrong. I put that script in
~/Applications, and checked it worked. Then I ran this:

$ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf
12:48:28.839026 git.c:349   trace: built-in: git 'push'
'--set-upstream' 'origin' 'git-wtf'
12:48:28.907605 run-command.c:351   trace: run_command: 'ssh'
'g...@github.com' 'git-receive-pack
'\''grahamrhay/bornlucky-ios.git'\'''
12:48:30.137410 run-command.c:351   trace: run_command:
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
12:48:30.138246 exec_cmd.c:130  trace: exec: 'git'
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
12:48:30.144783 git.c:349   trace: built-in: git
'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin'
'--delta-base-offset' '--progress'
Counting objects: 10837, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9301/9301), done.
Writing objects:  21% (2276/10837)

but there was nothing in /tmp/stdin. Have I missed a step? I tried
changing the tee to point to ~ in case it was permissions related.

I fear this is some Mac nonsense. I added an echo in the script, but
it only gets called for the first git incantation.


On 18 March 2015 at 12:34, Duy Nguyen pclo...@gmail.com wrote:
 On Wed, Mar 18, 2015 at 7:26 PM, Duy Nguyen pclo...@gmail.com wrote:
 It's quite a lot of work :) I created this script named git and put
 it in $PATH to capture input for pack-objects. You'll need to update
 /path/to/real/git to point to the real binary then you'll get
 /tmp/stdin

 Forgot one important sentence: You need to push again using this fake
 git program to save data in /tmp/stdin. Also you can stop the push
 when it goes to compressing objects phase.
 --
 Duy


Re: Seems to be pushing more than necessary

2015-03-18 Thread Graham Hay
Got there eventually!

$ git verify-pack --verbose bar.pack
e13e21a1f49704ed35ddc3b15b6111a5f9b34702 commit 220 152 12
03691863451ef9db6c69493da1fa556f9338a01d commit 334 227 164
... snip ...
chain length = 50: 2 objects
bar.pack: ok

Now what do I do with it :)

On 18 March 2015 at 13:33, Duy Nguyen pclo...@gmail.com wrote:
 On Wed, Mar 18, 2015 at 8:16 PM, Graham Hay grahamr...@gmail.com wrote:
 I created a repo with over 1GB of images, but it works as expected
 (only pushed 3 objects).

 Sorry, I must have done something wrong. I put that script in
 ~/Applications, and checked it worked. Then I ran this:

 $ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf

 I think I encountered the same problem. Inserting
 --exec-path=$HOME/Applications between git and push was probably
 what made it work for me. Haven't investigated the reason yet. We
 really should have an easier way to get this info without jumping
 through hoops like this.
 --
 Duy