Re: question about: Facebook makes Mercurial faster than Git
On Mon, Mar 10, 2014 at 6:28 PM, demerphq wrote:
> I had the impression, and I would not be surprised if they had the
> impression that the git development community is relatively
> unconcerned about performance issues on larger repositories.
>
> There have been other reports, which are difficult to keep track of
> without a bug tracking system, but the ones I know of are:
>
> Poor performance of git status with large number of excluded files and
> large repositories.

I thought this had been improved lately. I think we could do better still, but my WIP is nowhere near ready for anybody's eyes.

> Poor performance, and breakage, on repositories with very large
> numbers of files in them.

Index v5 and sparse checkout should help a bit. The ultimate solution, though, is narrow clone, which is nowhere near finished. Then again, if you need all files present in the worktree, narrow clone does not help either.

Along the same lines, there is also poor performance on repos with a lot of very large files. Junio's split-blob series was a start, but no one picked it up, so I guess your impression was right.

> (Rebase for instance will break if you rebase a commit that contains a *lot*
> of files.)

Interesting. I guess it hits the shell's limitations? Roughly how many files does it take to break it?

> Poor performance in protocol layer (and other places) with repos with
> large numbers of refs. (Maybe this is fixed, not sure.)

Ah.. no it's not. It's being stirred up again though, in both protocol and ref backend.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: question about: Facebook makes Mercurial faster than Git
On Mon, Mar 10, 2014 at 10:56:51AM -0700, David Lang wrote:
> On Mon, 10 Mar 2014, Ondřej Bílka wrote:
>
> >On Mon, Mar 10, 2014 at 03:13:45AM -0700, David Lang wrote:
> >>On Mon, 10 Mar 2014, Dennis Luehring wrote:
> >>
> >>>according to these blog posts
> >>>
> >>>http://www.infoq.com/news/2014/01/facebook-scaling-hg
> >>>https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
> >>>
> >>>mercurial "can" be faster than git
> >>>
> >>>but I haven't found any reply from the git community on whether it is a
> >>>real problem or whether there are ongoing (maybe git 2.0) changes to
> >>>compete better in this case
> >>
> >>As I understand this, the biggest part of what happened is that
> >>Facebook made a tweak to mercurial so that when it needs to know
> >>what files have changed in their massive tree, their version asks
> >>their special storage array, while git would have to look at it
> >>through the filesystem interface (by doing stat calls on the
> >>directories and files to see if anything has changed)
> >>
> >That is mostly a kernel problem. Long ago there was a proposed patch to
> >add a recursive mtime so you could check what subtrees changed. If
> >somebody resurrected that patch it would give a similar boost.
>
> btrfs could actually implement this efficiently, but for a lot of
> other filesystems this could be very expensive. The question is if it
> could be enough of a win to make it a good choice for people who are
> doing a heavy git workload as opposed to more generic uses.
>
See the next paragraph of my mail (quoted below) for how to do that efficiently: a directory needs to be updated at most once between application runs. Also, there is no overhead when it is not used (except if it makes the headers bigger).

> there's also the issue of managed vs generated files, if you update
> the mtime all the way up the tree because a source file was compiled
> and a binary created, that will quickly defeat the value of the
> recursive mtime.
>
You could do the marking on a per-file basis. I am not sure that is needed, though: larger projects use makefiles to avoid recompiling everything, so a binary is probably rebuilt because a source file in the same directory changed. Also, if your compile time is five minutes, a half-second status would not make much difference.

>
> >There are two issues that need to be handled, first if you are concerned
> >about one mtime change doing a lot of updates an application needs to mark
> >all directories it is interested in, when we do an update we unmark the
> >directory and by that we update each directory at most once per
> >application run.
> >
> >Second problem is hard links, where probably the best course is to keep
> >a list of these and stat them separately.
Re: question about: Facebook makes Mercurial faster than Git
On Mon, Mar 10, 2014 at 1:56 PM, David Lang wrote:
> there's also the issue of managed vs generated files, if you update the
> mtime all the way up the tree because a source file was compiled and a
> binary created, that will quickly defeat the value of the recursive mtime.

I think this points us again to an inotify-based strategy, where git can run an event listener daemon that registers just the watches it needs and filters the events by its own conditions. The kernel and fs have no good way of knowing about this stuff.

cheers,

m
--
martin.langh...@gmail.com
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff
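The "filters the events by its own conditions" step in the message above can be illustrated with a small sketch. It is only a model, not how any existing git daemon works: the events are a plain list of tuples standing in for what inotify would deliver, and the function name and ignore patterns are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical ignore rules; a real daemon would presumably derive these
# from the repository's ignore files rather than hard-code them.
IGNORED_PATTERNS = ["*.o", "*.pyc", "build/*"]


def filter_events(events, ignored_patterns=IGNORED_PATTERNS):
    """Return the sorted set of changed paths that are not ignored.

    `events` is an iterable of (event_name, path) tuples, a stand-in for
    the notifications a file watcher would deliver.
    """
    changed = set()
    for _event_name, path in events:
        if any(fnmatch(path, pattern) for pattern in ignored_patterns):
            continue  # generated file: drop the event
        changed.add(path)
    return sorted(changed)
```

With this kind of filter, a compiler writing `src/main.o` produces an event that is dropped before it reaches the status code, which is the distinction between managed and generated files that the kernel cannot make on its own.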
Re: question about: Facebook makes Mercurial faster than Git
On Mon, Mar 10, 2014 at 03:13:45AM -0700, David Lang wrote:
> On Mon, 10 Mar 2014, Dennis Luehring wrote:
>
> >according to these blog posts
> >
> >http://www.infoq.com/news/2014/01/facebook-scaling-hg
> >https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
> >
> >mercurial "can" be faster than git
> >
> >but I haven't found any reply from the git community on whether it is a
> >real problem or whether there are ongoing (maybe git 2.0) changes to
> >compete better in this case
>
> As I understand this, the biggest part of what happened is that
> Facebook made a tweak to mercurial so that when it needs to know
> what files have changed in their massive tree, their version asks
> their special storage array, while git would have to look at it
> through the filesystem interface (by doing stat calls on the
> directories and files to see if anything has changed)
>
That is mostly a kernel problem. Long ago a patch was proposed to add a recursive mtime, so you could check which subtrees changed. If somebody resurrected that patch, it would give a similar boost.

There are two issues that need to be handled. First, if you are concerned about one mtime change causing a lot of updates, an application needs to mark all directories it is interested in; when an update happens we unmark the directory, and that way each directory is updated at most once per application run.

The second problem is hard links, where probably the best course is to keep a list of them and stat them separately.
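The mark/unmark scheme described above can be modelled in a few lines. This is an in-memory simulation only, not the proposed kernel patch: the class and method names are hypothetical, and it assumes the application marks a directory together with all of its ancestors, so that propagation can stop at the first unmarked directory.

```python
import posixpath


class MarkedTree:
    """In-memory model of directories carrying a 'marked' flag (hypothetical)."""

    def __init__(self):
        self.marked = set()   # directories the application is interested in
        self.changed = set()  # directories flagged since the last scan
        self.updates = 0      # number of directory updates performed

    def mark(self, directory):
        self.marked.add(directory)

    def file_changed(self, path):
        """Propagate a change upward, updating each marked directory once."""
        directory = posixpath.dirname(path)
        while directory in self.marked:
            self.marked.discard(directory)  # unmark: no more updates this run
            self.changed.add(directory)
            self.updates += 1
            if directory in ("", "/"):
                break  # reached the top of the tree
            directory = posixpath.dirname(directory)

    def scan(self):
        """Return the changed directories and re-mark them for the next run."""
        result = sorted(self.changed)
        self.marked.update(self.changed)
        self.changed.clear()
        return result
```

In this model a second change under an already-flagged directory costs nothing, which is the "at most once per application run" property; hard links, the second issue raised above, are not modelled at all.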
Re: question about: Facebook makes Mercurial faster than Git
On Mon, 10 Mar 2014, Ondřej Bílka wrote:

>On Mon, Mar 10, 2014 at 03:13:45AM -0700, David Lang wrote:
>>On Mon, 10 Mar 2014, Dennis Luehring wrote:
>>
>>>according to these blog posts
>>>
>>>http://www.infoq.com/news/2014/01/facebook-scaling-hg
>>>https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
>>>
>>>mercurial "can" be faster than git
>>>
>>>but I haven't found any reply from the git community on whether it is a
>>>real problem or whether there are ongoing (maybe git 2.0) changes to
>>>compete better in this case
>>
>>As I understand this, the biggest part of what happened is that
>>Facebook made a tweak to mercurial so that when it needs to know
>>what files have changed in their massive tree, their version asks
>>their special storage array, while git would have to look at it
>>through the filesystem interface (by doing stat calls on the
>>directories and files to see if anything has changed)
>>
>That is mostly a kernel problem. Long ago there was a proposed patch to
>add a recursive mtime so you could check what subtrees changed. If
>somebody resurrected that patch it would give a similar boost.

btrfs could actually implement this efficiently, but for a lot of other filesystems this could be very expensive. The question is if it could be enough of a win to make it a good choice for people who are doing a heavy git workload as opposed to more generic uses.

there's also the issue of managed vs generated files, if you update the mtime all the way up the tree because a source file was compiled and a binary created, that will quickly defeat the value of the recursive mtime.

David Lang

>There are two issues that need to be handled, first if you are concerned
>about one mtime change doing a lot of updates an application needs to mark
>all directories it is interested in, when we do an update we unmark the
>directory and by that we update each directory at most once per
>application run.
>
>Second problem is hard links, where probably the best course is to keep
>a list of these and stat them separately.
Re: question about: Facebook makes Mercurial faster than Git
On 03/10/2014 01:10 PM, Johan Herland wrote:
> It should be possible to teach Git to do similar things, and IINM
> there are (and have previously been) several attempts to do similar
> things in Git, e.g.:
>
> - http://thread.gmane.org/gmane.comp.version-control.git/240339
>
> - http://thread.gmane.org/gmane.comp.version-control.git/217817
>
> I haven't looked closely at these attempts (it is not my itch to
> scratch), and I don't know if/how they would work on top of Watchman, but
> in principle I don't see why Git shouldn't be able to leverage
> Watchman the same way Mercurial does.

This touches on the most important thing that we should take to heart from this episode:

Of course Facebook could have modified either Git or Mercurial to do what they want. Why did they pick Mercurial?

The article seems to claim that they were initially biased towards Git, but they chose Mercurial because its code base is easier to modify. This is a claim that I can easily believe.

The two projects are almost exactly the same age. The number of commits in the two projects is similar. Mercurial has had fewer contributors active at any given time over its project lifetime.

But let's see how much code is in the main part of Mercurial vs. Git:

    $ find mercurial hgext \( -name '*.c' -o -name '*.py' \) -print | xargs cat | wc -l
    46164
    $ cat *.c *.h *.sh *.perl builtin/*.c | wc -l
    188530

These are just crude estimates and I hope I got the right directories for Mercurial. But, by these numbers, Git has 4 times as much code as Mercurial. That alone will go a long way to making Git harder to modify. I don't think that Git has anywhere near 4 times the features of Mercurial. Probably most of the difference can be explained by the choice of implementation languages; 94% of the code in these hg directories is Python, whereas 88% of Git's core code is C.

How can we make Git easier to hack (short of switching languages)? Here are my suggestions:

* Better function docstrings -- don't make developers have to read the whole call stack to find out what a function does, or who owns the memory that is passed around.

* More modularity -- more coherent and abstract APIs between different parts of the system, and less pawing around in your neighbor's data structures.

* Higher-level abstractions -- make more use of APIs like strbuf and string_list as opposed to handling every malloc() and realloc() by hand.

I personally wish that we as a project would be more willing to spend a few extra CPU microseconds to make our code easier to read and modify and more robust.

Michael

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
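The crude line counts in the message above, and the per-language percentages, could be reproduced with something like the following sketch. The function names are hypothetical, and the counting differs slightly from `wc -l` (a final line without a trailing newline is counted here), so the totals would be estimates of the same kind.

```python
import os
from collections import Counter


def count_lines_by_extension(root, extensions):
    """Walk `root` and count lines per file extension (e.g. '.py', '.c')."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in extensions:
                # Read as bytes so that odd encodings do not raise errors.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    counts[ext] += sum(1 for _ in handle)
    return counts


def share(counts, ext):
    """Fraction of all counted lines that belong to extension `ext`."""
    total = sum(counts.values())
    return counts[ext] / total if total else 0.0
```

For reference, the two totals quoted above give 188530 / 46164, which is about 4.08, matching the "4 times as much code" estimate.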
Re: question about: Facebook makes Mercurial faster than Git
On 10.03.2014 12:42, Dennis Luehring wrote:
> On 10.03.2014 12:28, demerphq wrote:
>> I had the impression, and I would not be surprised if they had the
>> impression that the git development community is relatively
>> unconcerned about performance issues on larger repositories.
>
> so the question is whether the git community is interested in being
> competitive in such large scale scenarios - something mercurial now
> seems to be out of the box
>

The hgwatchman site claims (https://bitbucket.org/facebook/hgwatchman) "On a real-world repository with over 200,000 files, hg status normally takes over 3 seconds. With hgwatchman it takes under 0.6 seconds."

There have been a few performance improvements in git status to support such large repositories. I just re-checked git status performance with the WebKit repo (~200k files):

Linux (with core.preloadIndex)
  git status -uall: 0.620s
  git status -uno : 0.255s

Windows (with core.preloadIndex and core.fscache)
  git status -uall: 1.006s
  git status -uno : 0.695s

Of course, for more reliable benchmark data, you'd have to compare the same repo on the same platform. But at first glance, it seems that mercurial with the hgwatchman extension may be as fast as git is out of the box, not the other way around.

This comes at the cost of running a background daemon, which may slow down the entire system. E.g. if the daemon activates whenever the compiler creates a .o file, it will probably slow down build performance.

Note that hgwatchman doesn't support Windows, so git is probably much faster there.
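Timings like the ones above could be collected with a small helper such as this sketch. It is not how the numbers in the message were measured; the helper name is hypothetical, and it simply reports the fastest of several wall-clock runs so that cold-cache effects are less likely to dominate.

```python
import subprocess
import time


def best_wall_time(command, runs=3, cwd=None):
    """Run `command` `runs` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit status raises CalledProcessError.
        subprocess.run(command, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Assuming git is on the PATH and `cwd` points at a work tree, `best_wall_time(["git", "status", "-uall"], cwd=repo)` and `best_wall_time(["git", "status", "-uno"], cwd=repo)` would give figures comparable in kind to those listed, for that machine and repository only.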
Re: question about: Facebook makes Mercurial faster than Git
On Mon, Mar 10, 2014 at 12:42 PM, Dennis Luehring wrote:
> On 10.03.2014 12:28, demerphq wrote:
>
>> I had the impression, and I would not be surprised if they had the
>> impression that the git development community is relatively
>> unconcerned about performance issues on larger repositories.
>
> so the question is whether the git community is interested in being
> competitive in such large scale scenarios - something mercurial now
> seems to be out of the box

AFAIK, David Lang's comment is not far off the mark. Facebook has made a tool called Watchman (https://github.com/facebook/watchman) that watches your work tree (i.e. wrapping inotify on Linux) and triggers various commands when files within are changed (e.g. do an auto-build whenever a file in your project changes). Since this tool will discover when files change, they have adjusted Mercurial to discover changes by querying Watchman instead of stat-ing the entire work tree.

AFAICS, this is basically a tradeoff between the time it takes to stat your work tree and the overhead/administrivia of running a daemon to monitor the work tree. It seems Facebook has organized their code and infrastructure in a way that makes the latter approach worthwhile for them, and has contributed their solution back to Mercurial.

It should be possible to teach Git to do similar things, and IINM there are (and have previously been) several attempts to do similar things in Git, e.g.:

 - http://thread.gmane.org/gmane.comp.version-control.git/240339

 - http://thread.gmane.org/gmane.comp.version-control.git/217817

I haven't looked closely at these attempts (it is not my itch to scratch), and I don't know if/how they would work on top of Watchman, but in principle I don't see why Git shouldn't be able to leverage Watchman the same way Mercurial does.

...Johan

--
Johan Herland,
www.herland.net
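The tradeoff described above, stat-ing the whole work tree versus asking a watcher what changed, can be sketched as follows. This is an illustration only: it uses neither Git's index nor the real Watchman protocol, the function names are hypothetical, and the watcher is reduced to a plain list of reported paths.

```python
import os


def snapshot(root):
    """Stat every file under `root` and record (mtime_ns, size) per path."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_mtime_ns, info.st_size)
    return state


def changed_by_stat(old_state, root):
    """Full scan: the cost grows with the number of files in the tree."""
    new_state = snapshot(root)
    changed = {p for p in new_state if old_state.get(p) != new_state[p]}
    changed |= old_state.keys() - new_state.keys()  # deleted files
    return sorted(changed)


def changed_by_watcher(reported_paths):
    """Watcher query: the cost grows only with the number of reported changes."""
    return sorted(set(reported_paths))
```

The first approach needs no daemon but touches every file on each call; the second is cheap per query but depends on a background process having recorded every change since the previous query.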
Re: question about: Facebook makes Mercurial faster than Git
On 10.03.2014 12:28, demerphq wrote:
> I had the impression, and I would not be surprised if they had the
> impression that the git development community is relatively
> unconcerned about performance issues on larger repositories.

so the question is whether the git community is interested in being competitive in such large scale scenarios - something mercurial now seems to be out of the box
Re: question about: Facebook makes Mercurial faster than Git
On 10 March 2014 11:07, Dennis Luehring wrote:
> according to these blog posts
>
> http://www.infoq.com/news/2014/01/facebook-scaling-hg
> https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
>
> mercurial "can" be faster than git
>
> but I haven't found any reply from the git community on whether it is a
> real problem or whether there are ongoing (maybe git 2.0) changes to
> compete better in this case

They mailed the list about performance issues in git. From what I saw there was relatively little feedback.

I had the impression, and I would not be surprised if they had the impression that the git development community is relatively unconcerned about performance issues on larger repositories.

There have been other reports, which are difficult to keep track of without a bug tracking system, but the ones I know of are:

Poor performance of git status with large number of excluded files and large repositories.

Poor performance, and breakage, on repositories with very large numbers of files in them. (Rebase for instance will break if you rebase a commit that contains a *lot* of files.)

Poor performance in protocol layer (and other places) with repos with large numbers of refs. (Maybe this is fixed, not sure.)

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: question about: Facebook makes Mercurial faster than Git
On Mon, 10 Mar 2014, Dennis Luehring wrote:

>according to these blog posts
>
>http://www.infoq.com/news/2014/01/facebook-scaling-hg
>https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
>
>mercurial "can" be faster than git
>
>but I haven't found any reply from the git community on whether it is a
>real problem or whether there are ongoing (maybe git 2.0) changes to
>compete better in this case

As I understand this, the biggest part of what happened is that Facebook made a tweak to mercurial so that when it needs to know what files have changed in their massive tree, their version asks their special storage array, while git would have to look at it through the filesystem interface (by doing stat calls on the directories and files to see if anything has changed)

In other words, unless you have a very high end storage system that can keep track of such things for you, the Facebook 'fix' won't help you. And even if it does have such a capability, unless you use the same storage system that Facebook uses, you would have to port it to your class of device.

Now, in addition to this, they did some other tweaks and changes, but compared to this status change, everything else is minor.

David Lang
question about: Facebook makes Mercurial faster than Git
according to these blog posts

http://www.infoq.com/news/2014/01/facebook-scaling-hg
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/

mercurial "can" be faster than git

but I haven't found any reply from the git community on whether it is a real problem or whether there are ongoing (maybe git 2.0) changes to compete better in this case