RE: Why does send-pack call pack-objects for all remote refs?
> You might also try repacking with "git repack -adb", which will > build reachability bitmaps. Pack-objects can use them to compute > the set of required objects much faster. Running "git repack -adb" caused my push time to incease by about 5x. I made some fresh clones and tried other options with repack, and consistently anything I tried with -b caused the push time to increase about 5x. I don't know much about reachability bitmaps, but perhaps it is important to note that I timed the pushes after repacking on Git for Windows. My earlier timings were done on both Linux and Windows and I did not see a significant difference. > It's definitely a lot, but it's not unheard of. The git project has > over 500 tags. That's not 2000, but you're within an order of > magnitude. > > I have seen repositories with 20,000+ tags. I consider that a bit > more ridiculous, but it does work in practice. Thanks for the extra context. I'll keep that in mind while I decide how to approach this. Daniel
Re: Why does send-pack call pack-objects for all remote refs?
Jeff King wrote: > Hmm. I guess that makes sense. The bitmap we want is the set difference > between the objects we are sending, and the tips the other side has. If > we have a bitmap at each ref tip, that's very fast. But if you have a > very large number of refs, we don't make one for each ref, and it has to > fallback to walking to the nearest one (and it ends up worse than a > regular walk, because it's filling in the bitmap for each tree, rather > than just doing the "good enough" commit walk that we usually do). > > I suspect there's room for improvement in the way we select commits to > store bitmaps for (so that the average walk is smaller). But it's rather > tricky; there's not a single constant to change to make it work better. Git gc and JGit GC differ here. JGit partitions the commits being packed by branch and then runs a selection algorithm on each part. Git runs a selection once on a list of all commits. Some effects: - JGit selects more bitmaps, so the gc takes longer and the resulting bitmap file is larger (bad) - JGit is more likely to have bitmaps for the commits involved in pushes and fetches (good) The commit selection code, for reference: https://eclipse.googlesource.com/jgit/jgit/+/86af34e1/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/pack/PackWriterBitmapPreparer.java#151 https://kernel.googlesource.com/pub/scm/git/git/+/ed1c9977/pack-bitmap-write.c#383 Thoughts? Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
On Mon, Dec 14, 2015 at 02:31:55PM -0800, Jonathan Nieder wrote: > > I suspect there's room for improvement in the way we select commits to > > store bitmaps for (so that the average walk is smaller). But it's rather > > tricky; there's not a single constant to change to make it work better. > > Git gc and JGit GC differ here. JGit partitions the commits being > packed by branch and then runs a selection algorithm on each part. > Git runs a selection once on a list of all commits. > > Some effects: > - JGit selects more bitmaps, so the gc takes longer and the resulting > bitmap file is larger (bad) > - JGit is more likely to have bitmaps for the commits involved in > pushes and fetches (good) > > The commit selection code, for reference: > > https://eclipse.googlesource.com/jgit/jgit/+/86af34e1/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/pack/PackWriterBitmapPreparer.java#151 > https://kernel.googlesource.com/pub/scm/git/git/+/ed1c9977/pack-bitmap-write.c#383 > > Thoughts? My thought is it would be great if somebody wanted to work on this. :) My understanding is that JGit's approach has some problems, too. Terry's message doesn't seem to have made it to the list, but you can see in the quoted bits he mentions some OOM problems during the bitmap write: http://article.gmane.org/gmane.comp.version-control.git/281476 That may not be a big deal to work around. I really just haven't looked at it at all. Vicent did the original bitmap selection code for C Git, and I don't think it has been touched since then. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
On Mon, Dec 14, 2015 at 01:47:39PM +, Daniel Koverman wrote: > > You might also try repacking with "git repack -adb", which will > > build reachability bitmaps. Pack-objects can use them to compute > > the set of required objects much faster. > > Running "git repack -adb" caused my push time to incease by about 5x. > I made some fresh clones and tried other options with repack, and > consistently anything I tried with -b caused the push time to > increase about 5x. > > I don't know much about reachability bitmaps, but perhaps it is > important to note that I timed the pushes after repacking on Git for > Windows. My earlier timings were done on both Linux and Windows and I > did not see a significant difference. Hmm. I guess that makes sense. The bitmap we want is the set difference between the objects we are sending, and the tips the other side has. If we have a bitmap at each ref tip, that's very fast. But if you have a very large number of refs, we don't make one for each ref, and it has to fallback to walking to the nearest one (and it ends up worse than a regular walk, because it's filling in the bitmap for each tree, rather than just doing the "good enough" commit walk that we usually do). I suspect there's room for improvement in the way we select commits to store bitmaps for (so that the average walk is smaller). But it's rather tricky; there's not a single constant to change to make it work better. Thanks for trying out my suggestion. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
> On Dec 9, 2015, at 9:19 PM, Jeff Kingwrote: > > On Tue, Dec 08, 2015 at 05:34:43PM +, Daniel Koverman wrote: > >> It is also good to know that 2000 remote refs is insane. The lower >> hanging fruit here sounds like trimming that to a reasonable >> number, so I'll try that approach first. > > It's definitely a lot, but it's not unheard of. The git project has over > 500 tags. That's not 2000, but you're within an order of magnitude. > > I have seen repositories with 20,000+ tags. I consider that a bit more > ridiculous, but it does work in practice. > We have one at $DAY_JOB with 400,000+ refs. It presents some issues, but Martin has raised those with the community and it works pretty well now. Ref advertisement is still a pain... > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
On Tue, Dec 08, 2015 at 05:34:43PM +, Daniel Koverman wrote: > Your interpretation of my email was correct. As you picked up on, I > had a fundamental misunderstanding of what pack-objects was doing. > Thanks for the explanation, I have a much better idea of what is > going on now. > > Given my use pattern, it may be reasonable for me to patch in an > option to compute > > git rev-list --objects $my_topic --not $subset_of_remote_refs You might also try repacking with "git repack -adb", which will build reachability bitmaps. Pack-objects can use them to compute the set of required objects much faster. > It is also good to know that 2000 remote refs is insane. The lower > hanging fruit here sounds like trimming that to a reasonable > number, so I'll try that approach first. It's definitely a lot, but it's not unheard of. The git project has over 500 tags. That's not 2000, but you're within an order of magnitude. I have seen repositories with 20,000+ tags. I consider that a bit more ridiculous, but it does work in practice. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Why does send-pack call pack-objects for all remote refs?
Your interpretation of my email was correct. As you picked up on, I had a fundamental misunderstanding of what pack-objects was doing. Thanks for the explanation, I have a much better idea of what is going on now. Given my use pattern, it may be reasonable for me to patch in an option to compute git rev-list --objects $my_topic --not $subset_of_remote_refs capitalizing on my knowledge of this particular repository to come up with heuristics for picking a reasonable subset. This will come with the risk of sometimes producing an unnecessarily large (potentially an obscenely large) packfile. You have thoroughly convinced me that an option like that will not generalize and would be unsuitable for main line git. It is also good to know that 2000 remote refs is insane. The lower hanging fruit here sounds like trimming that to a reasonable number, so I'll try that approach first. Thanks again, Junio and Peff. Daniel -Original Message- From: Jeff King [mailto:p...@peff.net] Sent: Monday, December 07, 2015 5:57 PM To: Daniel Koverman Cc: Junio C Hamano; git@vger.kernel.org Subject: Re: Why does send-pack call pack-objects for all remote refs? On Mon, Dec 07, 2015 at 02:41:00PM -0800, Junio C Hamano wrote: > Also it was unclear if you are working with a shallow repository. > The performance trade-off made between the packsize and the cycles > is somewhat different between a normal and a shallow repository, > e.g. 2dacf26d (pack-objects: use --objects-edge-aggressive for > shallow repos, 2014-12-24) might be a good starting point to think > about this issue. Also note that for a while the "aggressive" form was used everywhere. I think that started in fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), and was fixed in 1684c1b (rev-list: add an option to mark fewer edges as uninteresting, 2014-12-24). So it was present from v1.8.4.2 up to v2.3.0. -Peff N�r��yb�X��ǧv�^�){.n�+ا���ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf
Why does send-pack call pack-objects for all remote refs?
I have a repository which has ~2000 branches on the remote, and it takes ~8 seconds to push a change to one ref. The majority of this time is spent in pack-object. I wrote a hack so that only the ref being updated would be packed (the normal behavior is to pack for every ref on the remote). The push time dropped to <1 second with (seemingly) no consequences. This raised a couple of questions: 1) Are there consequences for not packing refs that are not being updated? Can all operations in send-pack which operate on other refs be skipped? 2) Why isn't git more selective about what it packs? Is it simply a performance optimization not worth the implementation complexity? 3) Is something about my repository strange? ex: 2000 is too many branches, packing for each ref takes me an unusually long time, etc. Thanks in advance to anyone who takes the time to clarify this for me. Daniel -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
Daniel Kovermanwrites: > I have a repository which has ~2000 branches on the remote, and it > takes ~8 seconds to push a change to one ref. The majority of this > time is spent in pack-object. I wrote a hack so that only the ref > being updated would be packed (the normal behavior is to pack for > every ref on the remote). I am having a hard time understanding what you are trying to say, as nobody's pack-objects "packs for a ref" or "packs a ref", so my response has to be based on my best guess---I think you are talking about feeding the object names of the tips of all remote refs as the bottoms of the revision range to pack-objects. When you are pushing your 'topic' branch to update the 'topic' branch at the remote, it is true that we compute git rev-list --objects $your_topic --not $all_of_the_remote_refs to produce a packfile. And by tweaking this to git rev-list --objects $your_topic --not $their_topic you will cut down the processing time of 'rev-list', especially if you have insane number of refs at the remote end. There is a price you would pay for doing so, though. An obvious one is what if the 'topic' branch does not exist yet at the remote. Without the "--not ..." part, you would end up sending the entire history behind $your_topic, and the way you prevent that from happening is to give what are known to exist at the remote end. Even when there already is 'topic' at the remote, the contents at the paths that are different between your 'topic' and the 'topic' as exists at the remote may already exist on some other branches that are already at the remote (e.g. you may have merged some branches that are common between your repository and the remote, and the only object missing from the remote that your repository has to send may be a merge commit and the top-level tree object), but limiting the bottoms of the revision range only to "--not $their_topic" would rob this obvious optimization opportunity from you. There has to be some way to limit the list of remote-refs that are used as bottoms of the revision range. For example, if you know that the remote has all the tags, and that everything in the v1.0 tag is contained in the v2.0 tag, then a single "--not v2.0" should give the same result as "--not v1.0 v2.0" that lists both. But the computation that is needed to figure out which tags and branches are not worth listing as bottoms would need to look at all of them at least once anyway, so a naive implementation of such would end up spending the same cycles, I would suspect. Also it was unclear if you are working with a shallow repository. The performance trade-off made between the packsize and the cycles is somewhat different between a normal and a shallow repository, e.g. 2dacf26d (pack-objects: use --objects-edge-aggressive for shallow repos, 2014-12-24) might be a good starting point to think about this issue. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does send-pack call pack-objects for all remote refs?
On Mon, Dec 07, 2015 at 02:41:00PM -0800, Junio C Hamano wrote: > Also it was unclear if you are working with a shallow repository. > The performance trade-off made between the packsize and the cycles > is somewhat different between a normal and a shallow repository, > e.g. 2dacf26d (pack-objects: use --objects-edge-aggressive for > shallow repos, 2014-12-24) might be a good starting point to think > about this issue. Also note that for a while the "aggressive" form was used everywhere. I think that started in fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), and was fixed in 1684c1b (rev-list: add an option to mark fewer edges as uninteresting, 2014-12-24). So it was present from v1.8.4.2 up to v2.3.0. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html