Re: Seems to be pushing more than necessary
If I push straight to the other repo, it only pushes the 3 objects I'd expect (instead of 10,000+). So it looks like that is the problem, but I don't really understand why. From my point of view, there should be no difference, but I clearly don't understand how it actually works. How does git decide what refs and/or objects are the same?

For a bit of background, the reason I have 2 remotes is to try and avoid pushing to master. We work in a highly regulated industry, and our code needs to be reviewed before hitting the mainline. So I push to my fork and create a PR to the blessed repo, that way if I accidentally commit to master (I have form!) then I have an extra chance to catch it and don't have to back it out.

The two repos started out the same though, the only differences should be the new work I have done. Is there any way I can continue to work like this, or do I have to choose between slow pushes and safety?

On 23 March 2015 at 10:41, Duy Nguyen pclo...@gmail.com wrote:
> On Mon, Mar 23, 2015 at 5:35 PM, Graham Hay grahamr...@gmail.com wrote:
>> Hmm. I'm using a private fork of a repo, I pull from one and push to the other, e.g.
>>
>>     git fetch foo
>>     git rebase foo/master
>>     git push --set-upstream origin bar
>>
>> It's quite possible my workflow is causing the problem, but I'm not sure what I could do differently. What do you mean by a no-share remote?
>
> I mean the refs (and associated objects) that are available on foo may not be available on bar, so when you push to origin you just need to send more. That rebase could generate lots of new objects to push out too, I think.
> --
> Duy

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
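On the question of how git decides what is the same: roughly, a push sends the objects reachable from the ref being pushed, minus those reachable from the refs the receiving repository advertises. Here is a minimal sketch of that difference, using throwaway repositories and the names from this thread (`foo` for the blessed repo, `origin` for the fork, `bar` for the branch). The directory layout, file contents and demo identity are invented for illustration, and remote-tracking refs are only an approximation of what the server really advertises.

```shell
#!/bin/sh
# Sketch: count objects reachable from branch "bar" that each remote lacks.
# Repos live in a temp dir; names foo/origin/bar follow the thread, the rest
# (paths, file contents, demo identity) is made up for illustration.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "foo" is the blessed repo; the fork is taken after its first commit.
git init -q "$work/foo"
cd "$work/foo"
echo one > file && git add file && git commit -q -m one
git clone -q --bare "$work/foo" "$work/origin.git"
echo two > file && git commit -q -am two    # foo moves on, the fork does not

# Working clone: fetch from foo, push to origin.
git clone -q -o foo "$work/foo" "$work/local"
cd "$work/local"
git remote add origin "$work/origin.git"
git fetch -q origin
git checkout -q -b bar
echo three > file && git commit -q -am three

# Objects (commits, trees, blobs) on bar that each remote does not have yet.
vs_foo=$(git rev-list --objects bar --not --remotes=foo | wc -l | tr -d ' ')
vs_origin=$(git rev-list --objects bar --not --remotes=origin | wc -l | tr -d ' ')
echo "missing from foo: $vs_foo, missing from origin: $vs_origin"
```

In this toy setup the fork is one commit behind, so the second count is the larger one. In a real repository, bringing the fork's master up to date first (for example with `git push origin refs/remotes/foo/master:refs/heads/master`) might shrink the later branch push; that is a suggestion, not something confirmed in this thread.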
Re: Seems to be pushing more than necessary
On Mon, Mar 23, 2015 at 5:35 PM, Graham Hay grahamr...@gmail.com wrote:
> Hmm. I'm using a private fork of a repo, I pull from one and push to the other, e.g.
>
>     git fetch foo
>     git rebase foo/master
>     git push --set-upstream origin bar
>
> It's quite possible my workflow is causing the problem, but I'm not sure what I could do differently. What do you mean by a no-share remote?

I mean the refs (and associated objects) that are available on foo may not be available on bar, so when you push to origin you just need to send more. That rebase could generate lots of new objects to push out too, I think.
--
Duy
Re: Seems to be pushing more than necessary
Hmm. I'm using a private fork of a repo, I pull from one and push to the other, e.g.

    git fetch foo
    git rebase foo/master
    git push --set-upstream origin bar

It's quite possible my workflow is causing the problem, but I'm not sure what I could do differently. What do you mean by a no-share remote?

On 23 March 2015 at 10:05, Duy Nguyen pclo...@gmail.com wrote:
> On Thu, Mar 19, 2015 at 6:11 PM, Graham Hay grahamr...@gmail.com wrote:
>>> Try fast-export --anonymize as that would help us understand this.
>>
>> Attached.
>
> The bad news is it seems to be working for me (I recreated the remote repo from this dump). I notice that you have two remotes, one shares many refs (the remote ref39). The other, ref2, does not share any SHA-1 with refs in .git/refs/heads/. Any chance you push to a no-share remote, which results in a lot of objects to be sent?
> --
> Duy
Re: Seems to be pushing more than necessary
On Thu, Mar 19, 2015 at 6:11 PM, Graham Hay grahamr...@gmail.com wrote:
>> Try fast-export --anonymize as that would help us understand this.
>
> Attached.

The bad news is it seems to be working for me (I recreated the remote repo from this dump). I notice that you have two remotes, one shares many refs (the remote ref39). The other, ref2, does not share any SHA-1 with refs in .git/refs/heads/. Any chance you push to a no-share remote, which results in a lot of objects to be sent?
--
Duy
Re: Fwd: Seems to be pushing more than necessary
That all seems quite reasonable, and is what I would expect to happen. However at the moment, if I create a branch from master and edit one line in one file, with no other changes on the remote, it takes me over an hour to push the new branch.

On 19 March 2015 at 18:36, Junio C Hamano gits...@pobox.com wrote:
> Graham Hay grahamr...@gmail.com writes:
>
>> We have a fairly large repo (~2.4GB), mainly due to binary resources (for an iOS app). I know this can generally be a problem, but I have a specific question.
>>
>> If I cut a branch, and edit a few (non-binary) files, and push, what should be uploaded? I assumed it was just the diff (I know whole compressed files are used, I mean the differences between my branch and where I cut it from). Is that correct?
>
> If you start from this state:
>
>     (the 'origin')                  (you)
>     ---Z---A        clone ->        ---Z---A
>
> and edit a few files, say, a/b, a/c and d/e/f, and commit to make the history look like this:
>
>     (the 'origin')                  (you)
>     ---Z---A                        ---Z---A---B
>
> i.e. git diff --name-only A B would show these three files, then the next push from you to the origin, i.e.
>
>     (the 'origin')                  (you)
>     ---Z---A---B    <- push         ---Z---A---B
>
> would involve transferring the following from you to the origin:
>
>  * The commit object that holds the message, authorship, etc. for B
>  * The top-level tree object of commit B (as that is different from that of A)
>  * The tree objects for 'a', 'd', 'd/e' and the blob objects for 'a/b', 'a/c', and 'd/e/f'.
>
> However, that assumes that nothing is happening on the 'origin' side.
>
> If the 'origin', for example, rewound its head to Z before you attempt to push your B, then you may end up sending objects that do not exist in Z that are reachable from B. Just like the above bullet points enumerated what is different between A and B, you can enumerate what is different between Z and A and add that to the above set. That would be what will be sent.
>
> If the 'origin' updated its tip to a commit you do not even know about, normally you will be prevented from pushing B because we would not want you to lose somebody else's work. If you forced such a push, then you may end up sending a lot more.
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 10:14 PM, Graham Hay grahamr...@gmail.com wrote:
> Got there eventually!
>
>     $ git verify-pack --verbose bar.pack
>     e13e21a1f49704ed35ddc3b15b6111a5f9b34702 commit 220 152 12
>     03691863451ef9db6c69493da1fa556f9338a01d commit 334 227 164
>     ... snip ...
>     chain length = 50: 2 objects
>     bar.pack: ok
>
> Now what do I do with it :)

Try fast-export --anonymize as that would help us understand this. Or you can try to see if these commits exist in the remote repo. If yes, that only confirms that push sends more than it should, but it's hard to know why. Maybe if you fire up gitk and mark these commits, you'll figure out a connection. There are actually objects in this pack that are expected to exist in the remote repo, but it's hard to tell.
--
Duy
Re: Fwd: Seems to be pushing more than necessary
Graham Hay grahamr...@gmail.com writes:

> We have a fairly large repo (~2.4GB), mainly due to binary resources (for an iOS app). I know this can generally be a problem, but I have a specific question.
>
> If I cut a branch, and edit a few (non-binary) files, and push, what should be uploaded? I assumed it was just the diff (I know whole compressed files are used, I mean the differences between my branch and where I cut it from). Is that correct?

If you start from this state:

    (the 'origin')                  (you)
    ---Z---A        clone ->        ---Z---A

and edit a few files, say, a/b, a/c and d/e/f, and commit to make the history look like this:

    (the 'origin')                  (you)
    ---Z---A                        ---Z---A---B

i.e. git diff --name-only A B would show these three files, then the next push from you to the origin, i.e.

    (the 'origin')                  (you)
    ---Z---A---B    <- push         ---Z---A---B

would involve transferring the following from you to the origin:

 * The commit object that holds the message, authorship, etc. for B
 * The top-level tree object of commit B (as that is different from that of A)
 * The tree objects for 'a', 'd', 'd/e' and the blob objects for 'a/b', 'a/c', and 'd/e/f'.

However, that assumes that nothing is happening on the 'origin' side.

If the 'origin', for example, rewound its head to Z before you attempt to push your B, then you may end up sending objects that do not exist in Z that are reachable from B. Just like the above bullet points enumerated what is different between A and B, you can enumerate what is different between Z and A and add that to the above set. That would be what will be sent.

If the 'origin' updated its tip to a commit you do not even know about, normally you will be prevented from pushing B because we would not want you to lose somebody else's work. If you forced such a push, then you may end up sending a lot more.
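The A-to-B enumeration above can be reproduced in a scratch repository. This is only a sketch: the temporary repo, the file contents, the extra `unchanged` file and the demo identity are made up, while the paths a/b, a/c and d/e/f and the commit names A and B follow the message.

```shell
#!/bin/sh
# Sketch: list the objects that commit B adds on top of A, mirroring the
# a/b, a/c, d/e/f example. Repo path, file contents and identity are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Commit A: four files, one of which will stay untouched.
mkdir -p a d/e
for f in a/b a/c d/e/f unchanged; do echo "A: $f" > "$f"; done
git add . && git commit -q -m A && git tag A

# Commit B: edit the three files named in the message.
for f in a/b a/c d/e/f; do echo "B: $f" > "$f"; done
git commit -q -am B && git tag B

git diff --name-only A B          # the three edited paths
git rev-list --objects A..B       # commit B, then the new trees and blobs
new_objects=$(git rev-list --objects A..B | wc -l | tr -d ' ')
echo "objects introduced by B: $new_objects"
```

The listing should contain one commit, the top-level tree, the trees for a, d and d/e, and the three blobs; the `unchanged` blob does not appear because it is already reachable from A.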
Re: Seems to be pushing more than necessary
> It would help if you pasted the push output. For example, does it stop at 20% at the compressing objects line or writing objects. How many total objects does it say?

It rattles through compressing objects, and the first 20% of writing objects, then slows to a crawl.

    Writing objects: 33% (3647/10804), 80.00 MiB | 112.00 KiB/s

> Another question is how big are these binary files on average? Git considers a file big if its size is 512MB or more (see core.bigFileThreshold). If your binary files are mostly under this limit, but still big enough, then git may still try to compare new objects with these to find the smallest diff to send. If it's the case, you could set core.bigFileThreshold to cover these binary files.

None of the files are very big (KB rather than MB), but there's a lot of them. I'll try setting the threshold to something lower, thanks.
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 5:55 PM, Graham Hay grahamr...@gmail.com wrote:
> We have a fairly large repo (~2.4GB), mainly due to binary resources (for an iOS app). I know this can generally be a problem, but I have a specific question.
>
> If I cut a branch, and edit a few (non-binary) files, and push, what should be uploaded? I assumed it was just the diff (I know whole compressed files are used, I mean the differences between my branch and where I cut it from). Is that correct?
>
> Because when I push, it grinds to a halt at the 20% mark, and feels like it's trying to push the entire repo. If I run git diff --stat --cached origin/foo I see the files I would expect (i.e. just those that have changed). If I run git format-patch origin/foo..foo the patch files total 1.7MB, which should upload in just a few seconds, but I've had pushes take over an hour. I'm using git 2.2.2 on Mac OS X (Mavericks), and ssh (g...@github.com).
>
> Am I doing it wrong? Is this the expected behaviour? If not, is there anything I can do to debug it?

It would help if you pasted the push output. For example, does it stop at 20% at the compressing objects line or writing objects. How many total objects does it say?

Another question is how big are these binary files on average? Git considers a file big if its size is 512MB or more (see core.bigFileThreshold). If your binary files are mostly under this limit, but still big enough, then git may still try to compare new objects with these to find the smallest diff to send. If it's the case, you could set core.bigFileThreshold to cover these binary files.
--
Duy
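For reference, this is one way to inspect and change core.bigFileThreshold. It is a sketch run against a scratch repository; the 1m value is purely an example and not a recommendation made in this thread.

```shell
#!/bin/sh
# Sketch: inspect and lower core.bigFileThreshold in a scratch repository.
# The 1m value is only an example, not a recommendation from the thread.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# When the key is unset, git falls back to its built-in default (512 MiB).
git config --get core.bigFileThreshold || echo "core.bigFileThreshold not set"

# Set it for this repository only, then read it back normalised to bytes.
git config core.bigFileThreshold 1m
threshold_bytes=$(git config --int --get core.bigFileThreshold)
echo "files larger than $threshold_bytes bytes are stored without delta compression"
```

The setting is per repository unless `--global` is used, so it can be tried on one clone without affecting others.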
Fwd: Seems to be pushing more than necessary
We have a fairly large repo (~2.4GB), mainly due to binary resources (for an iOS app). I know this can generally be a problem, but I have a specific question.

If I cut a branch, and edit a few (non-binary) files, and push, what should be uploaded? I assumed it was just the diff (I know whole compressed files are used, I mean the differences between my branch and where I cut it from). Is that correct?

Because when I push, it grinds to a halt at the 20% mark, and feels like it's trying to push the entire repo. If I run git diff --stat --cached origin/foo I see the files I would expect (i.e. just those that have changed). If I run git format-patch origin/foo..foo the patch files total 1.7MB, which should upload in just a few seconds, but I've had pushes take over an hour. I'm using git 2.2.2 on Mac OS X (Mavericks), and ssh (g...@github.com).

Am I doing it wrong? Is this the expected behaviour? If not, is there anything I can do to debug it?

Any help gratefully received.

Thanks,

Graham
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 6:26 PM, Graham Hay grahamr...@gmail.com wrote:
>> It would help if you pasted the push output. For example, does it stop at 20% at the compressing objects line or writing objects. How many total objects does it say?
>
> It rattles through compressing objects, and the first 20% of writing objects, then slows to a crawl.
>
>     Writing objects: 33% (3647/10804), 80.00 MiB | 112.00 KiB/s

This 10804 looks wrong (i.e. sending that many compressed objects). Also 80 MiB sent at that point. If you modify just a couple of files, something is really wrong because the number of new objects may be hundreds at most, not thousands.

v2.2.2 supports git fast-export --anonymize [1] to create an anonymized clone of your repo that you can share, which might help us understand the problem.

There's also the environment variable GIT_TRACE_PACKET that can help see what's going on at the protocol level, but I think you're on your own because without access to this repo, SHA-1s from that trace may not make much sense.

[1] https://github.com/git/git/commit/a8722750985a53cc502a66ae3d68a9e42c7fdb98
--
Duy
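A minimal sketch of building such an anonymized copy, by piping fast-export into fast-import in a fresh repository. The scratch repos, the file name and the "secret" strings are invented for the demonstration; only `git fast-export --anonymize` itself comes from the message above.

```shell
#!/bin/sh
# Sketch: build an anonymized copy of a repository with fast-export/fast-import.
# The scratch repos, file name and "secret" strings are made up for the demo.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A small "real" repository with two commits.
git init -q "$work/real"
cd "$work/real"
echo "secret contents" > secret-name.txt
git add . && git commit -q -m "secret message one"
echo "more secrets" >> secret-name.txt
git commit -q -am "secret message two"

# Same history shape, but paths, messages, identities and blobs are replaced.
git init -q "$work/anon"
git fast-export --anonymize --all | git -C "$work/anon" fast-import --quiet

real_commits=$(git rev-list --count --all)
anon_commits=$(git -C "$work/anon" rev-list --count --all)
echo "commits: real=$real_commits anon=$anon_commits"
git -C "$work/anon" log --all --format='%s'
```

For the protocol-level view, running the push as `GIT_TRACE_PACKET=1 git push ...` prints the packet exchange to stderr.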
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 7:26 PM, Duy Nguyen pclo...@gmail.com wrote:
> It's quite a lot of work :) I created this script named git and put it in $PATH to capture input for pack-objects. You'll need to update /path/to/real/git to point to the real binary then you'll get /tmp/stdin

Forgot one important sentence: You need to push again using this fake git program to save data in /tmp/stdin. Also you can stop the push when it goes to the compressing objects phase.
--
Duy
Re: Seems to be pushing more than necessary
Are there any commands that I can use to show exactly what it is trying to push? I'll see if I can create a (public) repo that has the same problem. Thanks for your help.

> This 10804 looks wrong (i.e. sending that many compressed objects). Also 80 MiB sent at that point. If you modify just a couple of files, something is really wrong because the number of new objects may be hundreds at most, not thousands.
>
> v2.2.2 supports git fast-export --anonymize [1] to create an anonymized clone of your repo that you can share, which might help us understand the problem.
>
> There's also the environment variable GIT_TRACE_PACKET that can help see what's going on at the protocol level, but I think you're on your own because without access to this repo, SHA-1s from that trace may not make much sense.
>
> [1] https://github.com/git/git/commit/a8722750985a53cc502a66ae3d68a9e42c7fdb98
> --
> Duy
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 7:03 PM, Graham Hay grahamr...@gmail.com wrote:
> Are there any commands that I can use to show exactly what it is trying to push?

It's a bit more than a command. If you push when GIT_TRACE is set to 2, you'll see it executes the git pack-objects command with all its arguments. This command expects some input from stdin. If you can capture that, you can run it by yourself to create the exact pack that is transferred over the network. Running that pack through git index-pack --verify-stat will show you the SHA-1 of all sent objects.

It's quite a lot of work :) I created this script named git and put it in $PATH to capture input for pack-objects. You'll need to update /path/to/real/git to point to the real binary then you'll get /tmp/stdin

-- 8< --
#!/bin/sh
# Fake "git": saves the stdin of pack-objects to /tmp/stdin,
# then hands every invocation over to the real binary.

if [ "$1" = pack-objects ]; then
    exec tee /tmp/stdin | /path/to/real/git "$@"
else
    exec /path/to/real/git "$@"
fi
-- 8< --

The remaining steps may be this (may need tweaking)

    git pack-objects '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress' < /tmp/stdin | git index-pack --fix-thin --stdin
    pack    708538afeda8eb331858680e227f7713228ce782    <-- new pack
    git verify-pack --verbose .git/objects/pack/pack-708538afeda8eb331858680e227f7713228ce782.pack
    d75631bd83ebdf03d4b0d925ff6734380f801fc6 commit 567 377 12
    dd44100a7cdad113b23d31876e469b74fbe21e1b tree 15069 10492 389
    8f4bbccea759d7a47616e29bd55b3f205b3615c2 tree 3869 2831 10881
    3db0460935bc843a2a70a0e087222eec61a0ff0d blob 12379 3529 13712

Here we can see this push of mine sends four objects, 1 commit, 2 trees and 1 blob.
--
Duy
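The pack-building steps can also be tried end to end without the wrapper, by writing the revision list for pack-objects by hand. This is a sketch under stated assumptions: the scratch repo, the file contents and the branch names `base` and `bar` are made up, `base` stands in for whatever the remote already has, and `--thin` is left out so the resulting pack contains only the new objects.

```shell
#!/bin/sh
# Sketch: reproduce the pack a push would build, without the wrapper script.
# Scratch repo, file contents and the base/bar branch names are made up;
# --thin is left out so the pack holds only the new objects.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo one > file && git add file && git commit -q -m one
git branch base                      # stands in for what the remote already has
git checkout -q -b bar
echo two > file && git commit -q -am two

# pack-objects --revs reads revisions on stdin: include bar, exclude base.
printf '%s\n' bar '^base' |
    git pack-objects --revs --stdout --delta-base-offset -q > "$repo/push.pack"

git index-pack "$repo/push.pack"     # writes push.idx next to the pack
git verify-pack --verbose "$repo/push.pack"

# Count the object lines (hash, type, sizes, offset) in the listing.
sent=$(git verify-pack --verbose "$repo/push.pack" |
    grep -c -E '^[0-9a-f]+ (commit|tree|blob|tag) ')
echo "objects in pack: $sent"
```

For a one-file change like this, the listing should show one commit, one tree and one blob, which matches the "3 objects" expectation mentioned elsewhere in the thread.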
Re: Seems to be pushing more than necessary
On Wed, Mar 18, 2015 at 8:16 PM, Graham Hay grahamr...@gmail.com wrote:
> I created a repo with over 1GB of images, but it works as expected (only pushed 3 objects).
>
> Sorry, I must have done something wrong. I put that script in ~/Applications, and checked it worked. Then I ran this:
>
>     $ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf

I think I encountered the same problem. Inserting --exec-path=$HOME/Applications between git and push was probably what made it work for me. Haven't investigated the reason yet. We really should have an easier way to get this info without jumping through hoops like this.
--
Duy
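A possible explanation for the wrapper being skipped, offered as an assumption rather than something established in this thread: git puts its exec path at the front of PATH before running its own sub-commands, so the `git pack-objects` child is resolved there and a wrapper earlier in the user's PATH is not consulted. The sketch below only prints the directories involved; the scratch directory standing in for ~/Applications is invented.

```shell
#!/bin/sh
# Sketch: show where git looks for its sub-commands, and how --exec-path
# overrides that location. The wrapper directory here is a made-up stand-in.
set -e

# With no value, --exec-path prints the directory currently in use.
default_exec_path=$(git --exec-path)
echo "default exec path: $default_exec_path"

# With a value, it replaces that directory for this invocation.
wrapper_dir=$(mktemp -d)
overridden=$(git --exec-path="$wrapper_dir" --exec-path)
echo "with override:     $overridden"
```

If that reading is right, `git --exec-path=$HOME/Applications push ...` makes the directory holding the wrapper the first place git searches, which would be why it started to work.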
Re: Seems to be pushing more than necessary
I created a repo with over 1GB of images, but it works as expected (only pushed 3 objects).

Sorry, I must have done something wrong. I put that script in ~/Applications, and checked it worked. Then I ran this:

    $ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf
    12:48:28.839026 git.c:349 trace: built-in: git 'push' '--set-upstream' 'origin' 'git-wtf'
    12:48:28.907605 run-command.c:351 trace: run_command: 'ssh' 'g...@github.com' 'git-receive-pack '\''grahamrhay/bornlucky-ios.git'\'''
    12:48:30.137410 run-command.c:351 trace: run_command: 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress'
    12:48:30.138246 exec_cmd.c:130 trace: exec: 'git' 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress'
    12:48:30.144783 git.c:349 trace: built-in: git 'pack-objects' '--all-progress-implied' '--revs' '--stdout' '--thin' '--delta-base-offset' '--progress'
    Counting objects: 10837, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (9301/9301), done.
    Writing objects: 21% (2276/10837)

but there was nothing in /tmp/stdin. Have I missed a step? I tried changing the tee to point to ~ in case it was permissions related.

I fear this is some Mac nonsense. I added an echo in the script, but it only gets called for the first git incantation.

On 18 March 2015 at 12:34, Duy Nguyen pclo...@gmail.com wrote:
> On Wed, Mar 18, 2015 at 7:26 PM, Duy Nguyen pclo...@gmail.com wrote:
>> It's quite a lot of work :) I created this script named git and put it in $PATH to capture input for pack-objects. You'll need to update /path/to/real/git to point to the real binary then you'll get /tmp/stdin
>
> Forgot one important sentence: You need to push again using this fake git program to save data in /tmp/stdin. Also you can stop the push when it goes to the compressing objects phase.
> --
> Duy
Re: Seems to be pushing more than necessary
Got there eventually!

    $ git verify-pack --verbose bar.pack
    e13e21a1f49704ed35ddc3b15b6111a5f9b34702 commit 220 152 12
    03691863451ef9db6c69493da1fa556f9338a01d commit 334 227 164
    ... snip ...
    chain length = 50: 2 objects
    bar.pack: ok

Now what do I do with it :)

On 18 March 2015 at 13:33, Duy Nguyen pclo...@gmail.com wrote:
> On Wed, Mar 18, 2015 at 8:16 PM, Graham Hay grahamr...@gmail.com wrote:
>> I created a repo with over 1GB of images, but it works as expected (only pushed 3 objects).
>>
>> Sorry, I must have done something wrong. I put that script in ~/Applications, and checked it worked. Then I ran this:
>>
>>     $ GIT_TRACE=2 PATH=~/Applications:$PATH git push --set-upstream origin git-wtf
>
> I think I encountered the same problem. Inserting --exec-path=$HOME/Applications between git and push was probably what made it work for me. Haven't investigated the reason yet. We really should have an easier way to get this info without jumping through hoops like this.
> --
> Duy