Re: [git-users] worlds slowest git repo- what to do?

2014-05-16 Thread Duy Nguyen
On Fri, May 16, 2014 at 2:06 AM, Philip Oakley philipoak...@iee.org wrote:
> From: John Fisher fishook2...@gmail.com
>
> I assert, based on one piece of evidence (a post from a Facebook dev), that
> I now have the world's biggest and slowest git
> repository, and I am not a happy guy. I used to have the world's biggest
> CVS repository, but CVS can't handle multi-gigabyte
> sized files. So I moved the repo to git, because we are using that for our
> new projects.
>
> goal:
> keep 150 GB of files (mostly binary) from tiny to over 8 GB in a
> version-control system.

I think your best bet so far is git-annex (or maybe bup) for dealing
with huge files. I plan on resurrecting Junio's split-blob series to
make core git handle huge files better, but there's no ETA on that.
The problem here is about file size, not the number of files or
history depth, right?
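
For reference, a minimal git-annex sketch (the file name and remote are
made up for illustration; it assumes git-annex is installed and that
"origin" is an annex-aware remote):

    git init big-assets && cd big-assets
    git annex init "main working copy"
    # (assuming huge-scan.raw has been copied into the work tree)
    git annex add huge-scan.raw     # hashes the file; git itself only
                                    # tracks a small symlink
    git commit -m "add huge-scan.raw via git-annex"
    git annex copy --to=origin huge-scan.raw   # ship the content itself

Plain git then only ever sees the pointers, so status and commit should
stay fast no matter how big the annexed files are.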

> problem:
> git is absurdly slow, think hours, on fast hardware.

Probably known issues, but some elaboration would be nice (e.g. which
operation is slow, how slow, and some more detailed characteristics of
the repo...) in case new problems pop up.
-- 
Duy


Re: [git-users] worlds slowest git repo- what to do?

2014-05-16 Thread Duy Nguyen
On Sat, May 17, 2014 at 4:22 AM, John Fisher fishook2...@gmail.com wrote:
>> Probably known issues, but some elaboration would be nice (e.g. which
>> operation is slow, how slow, and some more detailed
>> characteristics of the repo...) in case new problems pop up.

> So far I have done add, commit, status, and clone - commit and status are
> slow; add seems to depend on the files involved,
> and clone seems to run at network speed.
> I can provide metrics later, see above. Email me offline with what you want.

OK, commit -a should be just as slow as add, but as-is commit and
status should be fast unless there are lots of files (how many are in
your worktree?) or we hit something that makes us look into (large)
file content anyway.
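
A quick, hedged way to gather those numbers with plain git commands
(the threshold value below is only an example, not a recommendation):

    # How many paths do status/commit have to examine?
    git ls-files | wc -l

    # How much data is actually in the object store?
    git count-objects -vH

    # Optionally stop git from delta-compressing huge blobs
    # (core.bigFileThreshold defaults to 512m)
    git config core.bigFileThreshold 100m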
-- 
Duy


Re: [git-users] worlds slowest git repo- what to do?

2014-05-15 Thread Philip Oakley

From: John Fisher fishook2...@gmail.com
I assert, based on one piece of evidence (a post from a Facebook dev),
that I now have the world's biggest and slowest git
repository, and I am not a happy guy. I used to have the world's
biggest CVS repository, but CVS can't handle multi-gigabyte
sized files. So I moved the repo to git, because we are using that for
our new projects.


goal:
keep 150 GB of files (mostly binary) from tiny to over 8 GB in a
version-control system.


problem:
git is absurdly slow, think hours, on fast hardware.

question:
any suggestions beyond these-
http://git-annex.branchable.com/
https://github.com/jedbrown/git-fat
https://github.com/schacon/git-media
http://code.google.com/p/boar/
subversion

?


At the moment some of the developers are looking to speed up some of the
code on very large repos, though I think they are looking at code repos
rather than large-file repos. They were looking for large repos to test
some of the code on ;-)


I've copied the Git list in case they want to make any suggestions.




Thanks.

--
Philip 




Re: [git-users] worlds slowest git repo- what to do?

2014-05-15 Thread Sam Vilain
On 05/15/2014 12:06 PM, Philip Oakley wrote:
> From: John Fisher fishook2...@gmail.com
> I assert, based on one piece of evidence (a post from a Facebook dev),
> that I now have the world's biggest and slowest git
> repository, and I am not a happy guy. I used to have the world's
> biggest CVS repository, but CVS can't handle multi-gigabyte
> sized files. So I moved the repo to git, because we are using that
> for our new projects.
>
> goal:
> keep 150 GB of files (mostly binary) from tiny to over 8 GB in a
> version-control system.
>
> problem:
> git is absurdly slow, think hours, on fast hardware.
>
> question:
> any suggestions beyond these-
> http://git-annex.branchable.com/
> https://github.com/jedbrown/git-fat
> https://github.com/schacon/git-media
> http://code.google.com/p/boar/
> subversion


You could shard.  Break the problem up into smaller repositories, e.g. via
submodules.  Try ~128 shards, and I'd expect that 129 small clones should
complete faster than a single 150 GB clone, as well as being resumable etc.
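
For what it's worth, a hedged sketch of what that could look like (the
server paths, shard names, and the count of 128 are invented purely for
illustration, and the shard repositories are assumed to already exist):

    # Register the shards as submodules of one thin super-repo
    for i in $(seq -w 0 127); do
        git submodule add ssh://server/srv/shards/shard-$i.git assets/shard-$i
    done
    git commit -m "register 128 asset shards as submodules"

    # Clones don't fetch submodule content by default, so a consumer can
    # pull in only the shard(s) they actually need:
    git clone ssh://server/srv/super.git && cd super
    git submodule update --init assets/shard-127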

The first challenge will be figuring out what to shard on, and how to
lay out the repository.  You could keep all of the large files in their
own directory, with the main repository holding just symlinks into the
sharded area.  In that case, I would recommend sharding by the date the
blob was introduced, so that there's a good chance you won't need to
clone everything forever, since shards with few files in the current
version could in theory be retired.  Or, if the directory structure
already suits it, you could use submodules directly.

The second challenge will be writing the filter-branch script for this :-)
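
Something roughly along these lines, perhaps (untested, and "blobs/" is
a made-up path; you would run it on a throwaway clone, then re-add the
extracted files to the shard repositories):

    # Strip the big-file directory out of every commit so the remaining
    # history stays small.
    git filter-branch --prune-empty --tag-name-filter cat \
        --index-filter 'git rm -r --cached --ignore-unmatch blobs/' \
        -- --all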

Good luck,
Sam

