Re: Long clone time after done.
Hi guys,

Any further interest in this scalability problem, or should I move on?

Thanks,
Uri

On Thu, Nov 8, 2012 at 5:35 PM, Uri Moszkowicz <u...@4refs.com> wrote:
> I tried on the local disk as well and it didn't help. I managed to find a
> SUSE11 machine and tried it there, but no luck, so I think we can eliminate
> NFS and the OS as significant factors now. I ran with perf and here's the
> report:
> [...]
Re: Long clone time after done.
On Wed, Nov 07, 2012 at 11:32:37AM -0600, Uri Moszkowicz wrote:

> #4  parse_object (sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D") at object.c:212
> #5  0x004ae9ec in handle_one_ref (path=0xb0eec0 "refs/tags/removed",
>     sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D",
>     flags=2, cb_data=<optimized out>) at pack-refs.
> [...]
>
> It looks like handle_one_ref() is called for each ref and most result in
> a call to read_sha1_file().

Right. When generating the packed-refs file, we include the peeled
reference for a tag (i.e., the commit that a tag object points to). So we
have to actually read any tag objects to get the value.

The upload-pack program generates a similar list, and I recently added
some optimizations. This code path could benefit from some of them by
using peel_ref instead of hand-rolling the tag dereferencing. The main
optimization, though, is reusing peeled values that are already in
packed-refs; we would probably need some additional magic to reuse the
values from the source repository. However:

> It only takes a second or so for each call, but when you have thousands
> of them (one for each ref) it adds up.

I am more concerned that it takes a second to read each tag. Even in my
pathological tests for optimizing upload-pack, peeling 50,000 refs took
only half a second.

> Adding "--single-branch --branch branch" doesn't appear to help, as it is
> implemented afterwards. I would like to debug this problem further but am
> not familiar enough with the implementation to know what the next step
> is. Can anyone offer some suggestions? I don't see why a clone should
> depend on O(#refs) operations.

Does this patch help? In a sample repo with 5000 annotated tags, it drops
my local clone time from 0.20s to 0.11s. Which is a big percentage
speedup, but this code isn't taking a long time in the first place for me.

---
diff --git a/pack-refs.c b/pack-refs.c
index f09a054..3344749 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -40,13 +40,9 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
 
 	fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path);
 	if (is_tag_ref) {
-		struct object *o = parse_object(sha1);
-		if (o->type == OBJ_TAG) {
-			o = deref_tag(o, path, 0);
-			if (o)
-				fprintf(cb->refs_file, "^%s\n",
-					sha1_to_hex(o->sha1));
-		}
+		unsigned char peeled[20];
+		if (!peel_ref(path, peeled))
+			fprintf(cb->refs_file, "^%s\n", sha1_to_hex(peeled));
 	}
 
 	if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) {
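The peeled entries described above look like this in a packed-refs file:
each annotated tag's line is followed by a "^" line giving the commit the
tag points to. A rough illustration; the tag name and hashes below are
placeholders, not taken from the repository under discussion:

  $ grep -A1 ' refs/tags/some-tag$' .git/packed-refs
  1111111111111111111111111111111111111111 refs/tags/some-tag
  ^2222222222222222222222222222222222222222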
Re: Long clone time after done.
I tried the patch, but it doesn't appear to have helped :( Clone time with
it was ~32m.

Do you all by any chance have a tool to obfuscate a repository? I probably
still wouldn't be permitted to distribute it, but it might make the option
slightly more palatable. Anything else that I can do to help debug this
problem?

On Thu, Nov 8, 2012 at 9:56 AM, Jeff King <p...@peff.net> wrote:
> [...]
Re: Long clone time after done.
On Thu, Nov 08, 2012 at 11:20:29AM -0600, Uri Moszkowicz wrote:

> I tried the patch, but it doesn't appear to have helped :( Clone time
> with it was ~32m.

That sounds ridiculously long.

> Do you all by any chance have a tool to obfuscate a repository? I
> probably still wouldn't be permitted to distribute it, but it might make
> the option slightly more palatable. Anything else that I can do to help
> debug this problem?

I don't have anything already written.

What platform are you on? If it's Linux, can you try using perf to record
where the time is going?

How many refs do you have? What does:

  echo "heads: $(git for-each-ref refs/heads | wc -l)"
  echo "tags: $(git for-each-ref refs/tags | wc -l)"

report? How long does it take to look up a tag, like:

  time git cat-file tag refs/tags/some-tag

?

-Peff
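A rough way to measure that across every tag at once is a loop like the
sketch below. It assumes GNU time is installed at /usr/bin/time, and that
"git cat-file commit" peels an annotated tag to the commit it points to;
the tag names come from whatever the repository contains:

  # Print elapsed seconds for each tag lookup, slowest first.
  git for-each-ref --format='%(refname)' refs/tags |
  while read -r tag; do
      /usr/bin/time -f "%e $tag" git cat-file commit "$tag" >/dev/null
  done 2>&1 | sort -rn | head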
Re: Long clone time after done.
I'm using RHEL4. Looks like perf is only available with RHEL6.

heads: 308
tags: 9614

Looking up the tags that way took a very long time, by the way; "git tag |
wc -l" was much quicker. I've already pruned a lot of tags to get to this
number as well. The original repository had ~37k tags, since we used to
tag every commit with CVS.

All my tags are packed, so cat-file doesn't work:

  fatal: git cat-file refs/tags/some-tag: bad file

On Thu, Nov 8, 2012 at 2:33 PM, Jeff King <p...@peff.net> wrote:
> [...]
Re: Long clone time after done.
On Thu, Nov 08, 2012 at 03:49:32PM -0600, Uri Moszkowicz wrote:

> I'm using RHEL4. Looks like perf is only available with RHEL6.

Yeah, RHEL4 is pretty ancient; I think it predates the invention of perf.

> heads: 308
> tags: 9614
>
> Looking up the tags that way took a very long time, by the way; "git tag
> | wc -l" was much quicker. I've already pruned a lot of tags to get to
> this number as well. The original repository had ~37k tags, since we
> used to tag every commit with CVS.

Hmm. I think for-each-ref will actually open the tag objects, but "git
tag" will not. That would imply that reading the refs is fast, but opening
objects is slow. I wonder why.

How many packs do you have in .git/objects/pack of the repository?

> All my tags are packed, so cat-file doesn't work:
>
>   fatal: git cat-file refs/tags/some-tag: bad file

The packing shouldn't matter. The point of the command is to look up the
refs/tags/some-tag ref (in packed-refs or in the filesystem), and then
open and write the pointed-to object to stdout. If that is not working,
then there is something seriously wrong going on.

-Peff
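A quick way to test that hypothesis, and to answer the pack question, is
something along these lines (a sketch; the gap between the two timings is
what matters, not the absolute numbers):

  # "git tag" only lists refnames; for-each-ref may also open the tag
  # objects, so a large difference points at object access, not ref access.
  time git tag >/dev/null
  time git for-each-ref refs/tags >/dev/null

  # Number of packs in the repository.
  ls .git/objects/pack/*.pack | wc -l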
Re: Long clone time after done.
I ran "git cat-file commit some-tag" for every tag. The times seem to be
roughly uniformly distributed between 0s and 2s, and about 2/3 of the time
seems to be system. My disk is mounted over NFS, so I tried on the local
disk and it didn't make a difference.

I have only one 1.97GB pack. I ran "git gc --aggressive" before.

On Thu, Nov 8, 2012 at 4:11 PM, Jeff King <p...@peff.net> wrote:
> [...]
Re: Long clone time after done.
On Thu, Nov 08, 2012 at 04:16:59PM -0600, Uri Moszkowicz wrote:

> I ran "git cat-file commit some-tag" for every tag. The times seem to be
> roughly uniformly distributed between 0s and 2s, and about 2/3 of the
> time seems to be system. My disk is mounted over NFS, so I tried on the
> local disk and it didn't make a difference.
>
> I have only one 1.97GB pack. I ran "git gc --aggressive" before.

Ah. NFS. That is almost certainly the source of the problem. Git will
aggressively mmap. I would not be surprised to find that RHEL4's NFS
implementation is not particularly fast at mmap-ing 2G files, and is
spending a bunch of time in the kernel servicing the requests.

Aside from upgrading your OS or getting off of NFS, I don't have a lot of
advice. The performance characteristics you are seeing are so grossly off
of what is normal that using git is probably going to be painful. Your 2s
cat-files should be more like .002s.

I don't think there's anything for git to fix here. You could try building
with NO_MMAP, which will emulate it with pread. That might fare better
under your NFS implementation. Or it might be just as bad.

-Peff
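For anyone wanting to try that, NO_MMAP is a knob in git's Makefile, and a
build might look roughly like the following sketch; the install prefix is
just a placeholder:

  # Build and install a git that emulates mmap with pread().
  make NO_MMAP=YesPlease prefix=$HOME/git-nommap all
  make NO_MMAP=YesPlease prefix=$HOME/git-nommap install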
Re: Long clone time after done.
I tried on the local disk as well and it didn't help. I managed to find a
SUSE11 machine and tried it there, but no luck, so I think we can
eliminate NFS and the OS as significant factors now. I ran with perf and
here's the report:

  69.07%  git  /lib64/libc-2.11.1.so                [.] memcpy
  12.33%  git  prefix/git-1.8.0.rc2.suse11/bin/git  [.] blk_SHA1_Block
   5.11%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] inflate_fast
   2.61%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] adler32
   1.98%  git  /lib64/libc-2.11.1.so                [.] _int_malloc
   0.86%  git  [kernel]                             [k] clear_page_c

Does this help? The machine has 396GB of RAM, if it matters.

On Thu, Nov 8, 2012 at 4:33 PM, Jeff King <p...@peff.net> wrote:
> [...]
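For reference, a profile like the one above can be captured with perf
along these lines (the repository path and output file name here are
placeholders):

  # Record a call-graph profile of the clone, then show the hot symbols.
  perf record -g -o clone.data -- git clone -n /path/to/big-repo.git repos
  perf report -i clone.data --sort dso,symbol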
Re: Long clone time after done.
It all goes to pack_refs() in write_remote_refs(), called from
update_remote_refs().

On Tue, Oct 23, 2012 at 11:29 PM, Nguyen Thai Ngoc Duy <pclo...@gmail.com> wrote:
> [...]
Long clone time after done.
I have a large repository, which I ran "git gc --aggressive" on, that I'm
trying to clone on a local file system. I would expect it to complete very
quickly with hard links, but it's taking about 6min to complete with no
checkout (git clone -n). I see the message "Cloning into 'repos'... done."
appear after a few seconds, but then git just hangs there for another
6min.

Any idea what it's doing at this point, and how I can speed it up?
Re: Long clone time after done.
On Wed, Oct 24, 2012 at 1:30 AM, Uri Moszkowicz <u...@4refs.com> wrote:
> I have a large repository, which I ran "git gc --aggressive" on, that I'm
> trying to clone on a local file system. I would expect it to complete
> very quickly with hard links, but it's taking about 6min to complete with
> no checkout (git clone -n). I see the message "Cloning into 'repos'...
> done." appear after a few seconds, but then git just hangs there for
> another 6min. Any idea what it's doing at this point, and how I can
> speed it up?

"done." is printed by clone_local(), which is called in cmd_clone(). After
that there are just a few more calls. Maybe you could add a few printf
calls in between, to see which one takes the most time?
--
Duy
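If rebuilding git with printf instrumentation is inconvenient, a coarser
first cut is to let strace summarize where the time goes at the syscall
level. This won't name git functions, only syscalls, but it can separate
I/O from CPU time; the paths below are placeholders:

  # -c summarizes syscall counts and times; -f follows child processes;
  # the summary is written to clone.strace.
  strace -c -f -o clone.strace git clone -n /path/to/big-repo.git repos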