Re: Long clone time after done.

2012-11-26 Thread Uri Moszkowicz
Hi guys,
Any further interest in this scalability problem, or should I move on?

Thanks,
Uri

On Thu, Nov 8, 2012 at 5:35 PM, Uri Moszkowicz u...@4refs.com wrote:
 I tried on the local disk as well and it didn't help. I managed to
 find a SUSE11 machine and tried it there, but no luck, so I think we
 can eliminate NFS and the OS as significant factors now.

 I ran with perf and here's the report:

 69.07%  git  /lib64/libc-2.11.1.so                [.] memcpy
 12.33%  git  prefix/git-1.8.0.rc2.suse11/bin/git  [.] blk_SHA1_Block
  5.11%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] inflate_fast
  2.61%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] adler32
  1.98%  git  /lib64/libc-2.11.1.so                [.] _int_malloc
  0.86%  git  [kernel]                             [k] clear_page_c

 Does this help? Machine has 396GB of RAM if it matters.

 On Thu, Nov 8, 2012 at 4:33 PM, Jeff King p...@peff.net wrote:
 On Thu, Nov 08, 2012 at 04:16:59PM -0600, Uri Moszkowicz wrote:

 I ran "git cat-file commit some-tag" for every tag. They seem to be
 roughly uniformly distributed between 0s and 2s and about 2/3 of the
 time seems to be system. My disk is mounted over NFS so I tried on the
 local disk and it didn't make a difference.

 I have only one 1.97GB pack. I ran "git gc --aggressive" before.

 Ah. NFS. That is almost certainly the source of the problem. Git will
 aggressively mmap. I would not be surprised to find that RHEL4's NFS
 implementation is not particularly fast at mmap-ing 2G files, and is
 spending a bunch of time in the kernel servicing the requests.

 Aside from upgrading your OS or getting off of NFS, I don't have a lot
 of advice.  The performance characteristics you are seeing are so
 grossly off of what is normal that using git is probably going to be
 painful. Your 2s cat-files should be more like .002s. I don't think
 there's anything for git to fix here.

 You could try building with NO_MMAP, which will emulate it with pread.
 That might fare better under your NFS implementation. Or it might be
 just as bad.

 -Peff


Re: Long clone time after done.

2012-11-08 Thread Jeff King
On Wed, Nov 07, 2012 at 11:32:37AM -0600, Uri Moszkowicz wrote:

   #4  parse_object (sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D")
       at object.c:212
   #5  0x004ae9ec in handle_one_ref (path=0xb0eec0 "refs/tags/removed",
       sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D",
       flags=2, cb_data=<optimized out>) at pack-refs.

 [...]
 
 It looks like handle_one_ref() is called for each ref and most result
 in a call to read_sha1_file().

Right. When generating the packed-refs file, we include the peeled
reference for a tag (i.e., the commit that a tag object points to). So
we have to actually read any tag objects to get the value.
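
For illustration, a packed-refs file with peeled entries looks roughly
like this (object names invented):

  # pack-refs with: peeled
  1234567890abcdef1234567890abcdef12345678 refs/tags/v1.0
  ^fedcba0987654321fedcba0987654321fedcba09

The "^" line records the object the tag peels to, so later readers do
not have to open the tag object again.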

The upload-pack program generates a similar list, and I recently added
some optimizations. This code path could benefit from some of them by
using peel_ref instead of hand-rolling the tag dereferencing. The main
optimization, though, is reusing peeled values that are already in
packed-refs; we would probably need some additional magic to reuse the
values from the source repository.

However:

 It only takes a second or so for each call but when you have thousands
 of them (one for each ref) it adds up.

I am more concerned that it takes a second to read each tag. Even in my
pathological tests for optimizing upload-pack, peeling 50,000 refs took
only half a second.
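
As a rough reproduction of that measurement, something like this forces
every tag to be peeled (the "*" prefix dereferences tag objects):

  time git for-each-ref --format='%(objectname) %(*objectname)' refs/tags >/dev/null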

 Adding "--single-branch --branch <branch>" doesn't appear to help, as
 it is implemented afterwards. I would like to debug this problem
 further but am not familiar enough with the implementation to know
 what the next step is. Can anyone offer some suggestions? I don't see
 why a clone should depend on O(#refs) operations.

Does this patch help? In a sample repo with 5000 annotated tags, it
drops my local clone time from 0.20s to 0.11s. Which is a big percentage
speedup, but this code isn't taking a long time in the first place for
me.

---
diff --git a/pack-refs.c b/pack-refs.c
index f09a054..3344749 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -40,13 +40,9 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
 
 	fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path);
 	if (is_tag_ref) {
-		struct object *o = parse_object(sha1);
-		if (o->type == OBJ_TAG) {
-			o = deref_tag(o, path, 0);
-			if (o)
-				fprintf(cb->refs_file, "^%s\n",
-					sha1_to_hex(o->sha1));
-		}
+		unsigned char peeled[20];
+		if (!peel_ref(path, peeled))
+			fprintf(cb->refs_file, "^%s\n", sha1_to_hex(peeled));
 	}
 
 	if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) {


Re: Long clone time after done.

2012-11-08 Thread Uri Moszkowicz
I tried the patch but it doesn't appear to have helped :( Clone time
with it was ~32m.

Do you all by any chance have a tool to obfuscate a repository?
Probably I still wouldn't be permitted to distribute it but might make
the option slightly more palatable. Anything else that I can do to
help debug this problem?

On Thu, Nov 8, 2012 at 9:56 AM, Jeff King p...@peff.net wrote:
 On Wed, Nov 07, 2012 at 11:32:37AM -0600, Uri Moszkowicz wrote:

   #4  parse_object (sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D")
       at object.c:212
   #5  0x004ae9ec in handle_one_ref (path=0xb0eec0 "refs/tags/removed",
       sha1=0xb0ee98 "\017C\205Wj\001`\254\356\307Z\332\367\353\233.\375P}D",
       flags=2, cb_data=<optimized out>) at pack-refs.

 [...]

 It looks like handle_one_ref() is called for each ref and most result
 in a call to read_sha1_file().

 Right. When generating the packed-refs file, we include the peeled
 reference for a tag (i.e., the commit that a tag object points to). So
 we have to actually read any tag objects to get the value.

 The upload-pack program generates a similar list, and I recently added
 some optimizations. This code path could benefit from some of them by
 using peel_ref instead of hand-rolling the tag dereferencing. The main
 optimization, though, is reusing peeled values that are already in
 packed-refs; we would probably need some additional magic to reuse the
 values from the source repository.

 However:

 It only takes a second or so for each call but when you have thousands
 of them (one for each ref) it adds up.

 I am more concerned that it takes a second to read each tag. Even in my
 pathological tests for optimizing upload-pack, peeling 50,000 refs took
 only half a second.

 Adding "--single-branch --branch <branch>" doesn't appear to help, as
 it is implemented afterwards. I would like to debug this problem
 further but am not familiar enough with the implementation to know
 what the next step is. Can anyone offer some suggestions? I don't see
 why a clone should depend on O(#refs) operations.

 Does this patch help? In a sample repo with 5000 annotated tags, it
 drops my local clone time from 0.20s to 0.11s. Which is a big percentage
 speedup, but this code isn't taking a long time in the first place for
 me.

 ---
 diff --git a/pack-refs.c b/pack-refs.c
 index f09a054..3344749 100644
 --- a/pack-refs.c
 +++ b/pack-refs.c
 @@ -40,13 +40,9 @@ static int handle_one_ref(const char *path, const unsigned char *sha1,
  
  	fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path);
  	if (is_tag_ref) {
 -		struct object *o = parse_object(sha1);
 -		if (o->type == OBJ_TAG) {
 -			o = deref_tag(o, path, 0);
 -			if (o)
 -				fprintf(cb->refs_file, "^%s\n",
 -					sha1_to_hex(o->sha1));
 -		}
 +		unsigned char peeled[20];
 +		if (!peel_ref(path, peeled))
 +			fprintf(cb->refs_file, "^%s\n", sha1_to_hex(peeled));
  	}
  
  	if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) {


Re: Long clone time after done.

2012-11-08 Thread Jeff King
On Thu, Nov 08, 2012 at 11:20:29AM -0600, Uri Moszkowicz wrote:

 I tried the patch but it doesn't appear to have helped :( Clone time
 with it was ~32m.

That sounds ridiculously long.

 Do you all by any chance have a tool to obfuscate a repository?
 Probably I still wouldn't be permitted to distribute it but might make
 the option slightly more palatable. Anything else that I can do to
 help debug this problem?

I don't have anything already written. What platform are you on? If it's
Linux, can you try using perf to record where the time is going?
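
Something along these lines should do (paths are placeholders):

  perf record -g -o clone.perf git clone -n /path/to/repo.git /tmp/clone-test
  perf report -i clone.perf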

How many refs do you have? What does:

  echo "heads: $(git for-each-ref refs/heads | wc -l)"
  echo " tags: $(git for-each-ref refs/tags | wc -l)"

report? How long does it take to look up a tag, like:

  time git cat-file tag refs/tags/some-tag

?

-Peff


Re: Long clone time after done.

2012-11-08 Thread Uri Moszkowicz
I'm using RHEL4. Looks like perf is only available with RHEL6.

heads: 308
tags: 9614

Looking up the tags that way took a very long time by the way; "git
tag | wc -l" was much quicker. I've already pruned a lot of tags to
get to this number as well. The original repository had ~37k tags
since we used to tag every commit with CVS.

All my tags are packed so cat-file doesn't work:
fatal: git cat-file refs/tags/some-tag: bad file

On Thu, Nov 8, 2012 at 2:33 PM, Jeff King p...@peff.net wrote:
 On Thu, Nov 08, 2012 at 11:20:29AM -0600, Uri Moszkowicz wrote:

 I tried the patch but it doesn't appear to have helped :( Clone time
 with it was ~32m.

 That sounds ridiculously long.

 Do you all by any chance have a tool to obfuscate a repository?
 Probably I still wouldn't be permitted to distribute it but might make
 the option slightly more palatable. Anything else that I can do to
 help debug this problem?

 I don't have anything already written. What platform are you on? If it's
 Linux, can you try using perf to record where the time is going?

 How many refs do you have? What does:

   echo "heads: $(git for-each-ref refs/heads | wc -l)"
   echo " tags: $(git for-each-ref refs/tags | wc -l)"

 report? How long does it take to look up a tag, like:

   time git cat-file tag refs/tags/some-tag

 ?

 -Peff


Re: Long clone time after done.

2012-11-08 Thread Jeff King
On Thu, Nov 08, 2012 at 03:49:32PM -0600, Uri Moszkowicz wrote:

 I'm using RHEL4. Looks like perf is only available with RHEL6.

Yeah, RHEL4 is pretty ancient; I think it predates the invention of
perf.

 heads: 308
 tags: 9614
 
 Looking up the tags that way took a very long time by the way; "git
 tag | wc -l" was much quicker. I've already pruned a lot of tags to
 get to this number as well. The original repository had ~37k tags
 since we used to tag every commit with CVS.

Hmm. I think for-each-ref will actually open the tag objects, but "git
tag" will not. That would imply that reading the refs is fast, but
opening objects is slow. I wonder why.
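
A quick, if crude, way to check that split is to time both commands on
the same repository:

  time git for-each-ref refs/tags >/dev/null
  time git tag >/dev/null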

How many packs do you have in .git/objects/pack of the repository?
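
Something like this will count them:

  ls .git/objects/pack/*.pack | wc -l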

 All my tags are packed so cat-file doesn't work:
 fatal: git cat-file refs/tags/some-tag: bad file

The packing shouldn't matter. The point of the command is to look up the
refs/tags/some-tag ref (in packed-refs or in the filesystem), and then
open and write the pointed-to object to stdout. If that is not working,
then there is something seriously wrong going on.
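
If the object type is in doubt (e.g. lightweight tags that point
directly at commits, for which "git cat-file tag" would fail with "bad
file"), the type-agnostic form avoids that failure:

  git cat-file -p refs/tags/some-tag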

-Peff


Re: Long clone time after done.

2012-11-08 Thread Uri Moszkowicz
I ran "git cat-file commit some-tag" for every tag. They seem to be
roughly uniformly distributed between 0s and 2s and about 2/3 of the
time seems to be system. My disk is mounted over NFS so I tried on the
local disk and it didn't make a difference.

I have only one 1.97GB pack. I ran "git gc --aggressive" before.
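
One way to take that per-tag measurement (illustrative; assumes GNU
time):

  for t in $(git tag); do
    /usr/bin/time -f "$t: %es elapsed, %Ss system" \
      git cat-file commit "$t" >/dev/null
  done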

On Thu, Nov 8, 2012 at 4:11 PM, Jeff King p...@peff.net wrote:
 On Thu, Nov 08, 2012 at 03:49:32PM -0600, Uri Moszkowicz wrote:

 I'm using RHEL4. Looks like perf is only available with RHEL6.

 Yeah, RHEL4 is pretty ancient; I think it predates the invention of
 perf.

 heads: 308
 tags: 9614

 Looking up the tags that way took a very long time by the way; "git
 tag | wc -l" was much quicker. I've already pruned a lot of tags to
 get to this number as well. The original repository had ~37k tags
 since we used to tag every commit with CVS.

 Hmm. I think for-each-ref will actually open the tag objects, but "git
 tag" will not. That would imply that reading the refs is fast, but
 opening objects is slow. I wonder why.

 How many packs do you have in .git/objects/pack of the repository?

 All my tags are packed so cat-file doesn't work:
 fatal: git cat-file refs/tags/some-tag: bad file

 The packing shouldn't matter. The point of the command is to look up the
 refs/tags/some-tag ref (in packed-refs or in the filesystem), and then
 open and write the pointed-to object to stdout. If that is not working,
 then there is something seriously wrong going on.

 -Peff


Re: Long clone time after done.

2012-11-08 Thread Jeff King
On Thu, Nov 08, 2012 at 04:16:59PM -0600, Uri Moszkowicz wrote:

 I ran "git cat-file commit some-tag" for every tag. They seem to be
 roughly uniformly distributed between 0s and 2s and about 2/3 of the
 time seems to be system. My disk is mounted over NFS so I tried on the
 local disk and it didn't make a difference.

 I have only one 1.97GB pack. I ran "git gc --aggressive" before.

Ah. NFS. That is almost certainly the source of the problem. Git will
aggressively mmap. I would not be surprised to find that RHEL4's NFS
implementation is not particularly fast at mmap-ing 2G files, and is
spending a bunch of time in the kernel servicing the requests.
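
If getting off NFS is impossible, shrinking git's pack mmap windows
might be worth an experiment (these are real config knobs; the values
here are guesses):

  git config core.packedGitWindowSize 32m
  git config core.packedGitLimit 256m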

Aside from upgrading your OS or getting off of NFS, I don't have a lot
of advice.  The performance characteristics you are seeing are so
grossly off of what is normal that using git is probably going to be
painful. Your 2s cat-files should be more like .002s. I don't think
there's anything for git to fix here.

You could try building with NO_MMAP, which will emulate it with pread.
That might fare better under your NFS implementation. Or it might be
just as bad.
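
A sketch of such a build, assuming a git source checkout and a scratch
install prefix:

  make NO_MMAP=YesPlease prefix=$HOME/git-nommap all
  make NO_MMAP=YesPlease prefix=$HOME/git-nommap install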

-Peff


Re: Long clone time after done.

2012-11-08 Thread Uri Moszkowicz
I tried on the local disk as well and it didn't help. I managed to
find a SUSE11 machine and tried it there, but no luck, so I think we
can eliminate NFS and the OS as significant factors now.

I ran with perf and here's the report:

69.07%  git  /lib64/libc-2.11.1.so                [.] memcpy
12.33%  git  prefix/git-1.8.0.rc2.suse11/bin/git  [.] blk_SHA1_Block
 5.11%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] inflate_fast
 2.61%  git  prefix/zlib/local/lib/libz.so.1.2.5  [.] adler32
 1.98%  git  /lib64/libc-2.11.1.so                [.] _int_malloc
 0.86%  git  [kernel]                             [k] clear_page_c

Does this help? Machine has 396GB of RAM if it matters.

On Thu, Nov 8, 2012 at 4:33 PM, Jeff King p...@peff.net wrote:
 On Thu, Nov 08, 2012 at 04:16:59PM -0600, Uri Moszkowicz wrote:

 I ran "git cat-file commit some-tag" for every tag. They seem to be
 roughly uniformly distributed between 0s and 2s and about 2/3 of the
 time seems to be system. My disk is mounted over NFS so I tried on the
 local disk and it didn't make a difference.

 I have only one 1.97GB pack. I ran "git gc --aggressive" before.

 Ah. NFS. That is almost certainly the source of the problem. Git will
 aggressively mmap. I would not be surprised to find that RHEL4's NFS
 implementation is not particularly fast at mmap-ing 2G files, and is
 spending a bunch of time in the kernel servicing the requests.

 Aside from upgrading your OS or getting off of NFS, I don't have a lot
 of advice.  The performance characteristics you are seeing are so
 grossly off of what is normal that using git is probably going to be
 painful. Your 2s cat-files should be more like .002s. I don't think
 there's anything for git to fix here.

 You could try building with NO_MMAP, which will emulate it with pread.
 That might fare better under your NFS implementation. Or it might be
 just as bad.

 -Peff


Re: Long clone time after done.

2012-10-24 Thread Uri Moszkowicz
It all goes to pack_refs() in write_remote_refs(), called from
update_remote_refs().

On Tue, Oct 23, 2012 at 11:29 PM, Nguyen Thai Ngoc Duy
pclo...@gmail.com wrote:
 On Wed, Oct 24, 2012 at 1:30 AM, Uri Moszkowicz u...@4refs.com wrote:
 I have a large repository, on which I ran "git gc --aggressive", that
 I'm trying to clone on a local file system. I would expect it to
 complete very quickly with hard links, but it's taking about 6min to
 complete with no checkout ("git clone -n"). I see the message "Cloning
 into 'repos'... done." appear after a few seconds, but then Git just
 hangs there for another 6min. Any idea what it's doing at this point
 and how I can speed it up?

 "done." is printed by clone_local(), which is called in cmd_clone().
 After that there are just a few more calls. Maybe you could add a few
 printf calls in between these calls and see which one takes the most time?
 --
 Duy


Long clone time after done.

2012-10-23 Thread Uri Moszkowicz
I have a large repository, on which I ran "git gc --aggressive", that
I'm trying to clone on a local file system. I would expect it to
complete very quickly with hard links, but it's taking about 6min to
complete with no checkout ("git clone -n"). I see the message "Cloning
into 'repos'... done." appear after a few seconds, but then Git just
hangs there for another 6min. Any idea what it's doing at this point
and how I can speed it up?


Re: Long clone time after done.

2012-10-23 Thread Nguyen Thai Ngoc Duy
On Wed, Oct 24, 2012 at 1:30 AM, Uri Moszkowicz u...@4refs.com wrote:
 I have a large repository, on which I ran "git gc --aggressive", that
 I'm trying to clone on a local file system. I would expect it to
 complete very quickly with hard links, but it's taking about 6min to
 complete with no checkout ("git clone -n"). I see the message "Cloning
 into 'repos'... done." appear after a few seconds, but then Git just
 hangs there for another 6min. Any idea what it's doing at this point
 and how I can speed it up?

"done." is printed by clone_local(), which is called in cmd_clone().
After that there are just a few more calls. Maybe you could add a few
printf calls in between these calls and see which one takes the most time?
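
A minimal sketch of that kind of instrumentation (the helper is not
part of git, and its placement inside cmd_clone() is illustrative):

	#include <stdio.h>
	#include <sys/time.h>

	/* print a microsecond timestamp with a label to stderr */
	static void stamp(const char *label)
	{
		struct timeval tv;
		gettimeofday(&tv, NULL);
		fprintf(stderr, "%ld.%06ld %s\n",
			(long)tv.tv_sec, (long)tv.tv_usec, label);
	}

	/* then bracket each call that follows clone_local(), e.g.:
	 *
	 *   stamp("before update_remote_refs");
	 *   ... the call itself ...
	 *   stamp("after update_remote_refs");
	 */
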
-- 
Duy