On Fri, 18 Aug 2006 06:04:37 +0300 Eugen Minciu <[EMAIL PROTECTED]> babbled:
> Hi everyone.
>
> I've been doing some thinking today. And I've been doing some testing as
> well. And there are a few things I realized.
>
> The first thing I realized is a reason why the pseudo-benchmark I created was
> giving out evil data. In git's case this is because git does a lot of extra
> operations on the client's disk (unpacking and such) which take up a lot of
> time. During that time the server wouldn't be under any load, which shows why
> the load on the server wasn't anywhere near constant.
>
> And then I realized why git wasn't really doing so well. And at the same time
> this is the reason why cvs isn't doing so well and, frankly, no scm possibly
> could.
>
> The problem is that you have a truckload of binary data in the repository.
> There are many reasons why this shouldn't be so.
>
> 1) Binary data is way better off distributed in the form of archives, that
> can be mirrored by anyone (I'm thinking at least SF). That way people can get
> that data a lot faster and your server is happy too.
>
> 2) You don't change the binary data that much. And even when you do so, you
> could package your data into archives like imlib2-data.tar.bz2 so that you
> repackage less.
>
> 3) Changes in binary data don't generally affect dependencies. They're not
> like API changes or whatever. Most of the time people will just need to grab
> one updated archive and that's it.
>
> 4) You could then use pkg-config to ensure the right version of the data is
> actually installed from your configure scripts.
>
> 5) Let's do some simple math.
>
> You have 100MB worth of files. These amount to 60MB binary and 40MB text.
> When you try to compress this, as git does, you get around 50MB binary and
> 8MB text. So that accounts for almost 60MB.
>
> That means for every 60 people that would simultaneously download through CVS
> you can have 100 download through git (let's just ignore the other factors
> and focus on bandwidth a little).
>
> Now suppose you have 40MB of text.
> With git you can then get down to about 20% of
> the original size (maybe less, who knows). That means you could (in theory)
> actually have 5 times more downloads with git than with CVS.
>
> Now I'm not saying to not keep that data in a repo. You obviously have to.
> I'm just saying there's no need for people to have anonymous access to that
> repo, it could be for developers only.
>
> So, my suggestions are:
> 1) Move the data into its own repository

not going to happen. the data is an internal part of the projects - it gets
modified (new icons, images etc.) and is part of the build process. so not
going to happen. the code is useless without the data - there is no point
splitting it and doing so is a tonne of work that makes building more painful
for developers and users.

> 2) Convert the two repositories to git
> 3) Make that data repository devel-only.
> 4) Split the data into small packages (one for each data/ dir in the tree, I
> guess)
> 5) Make the source require the data through pkg-config
> 6) Have the data released as tarballs once it's changed (you can have that
> happen automatically with git, I'm assuming you can with the others as well)

at this point - why bother with git at all. just make tarball snaps. much less
effort.

> And that's it. But for all this babbling, is this really worth it?
> Like I said, I found client-side disk I/O to make the benchmarks mostly
> useless. But they still provide me with a good overview on server-side CPU &
> Memory usage
>
> So I opted for a new approach. I would have two terminals on my client. In
> one I'd do something like 'sleep 5 ; svn checkout ...'. In the second I'd do
> 'time read'. I would press enter once when network traffic actually began and
> once again when it stopped and that showed me how much everything took.
>
> So here's the timings. The repos have no history attached.
>
> Repo with data:
> CVS: 0:46
> SVN(svnserve): 1:16
> SVN(HTTP): 1:58
> GIT(git): 1:23
> GIT(HTTP): 1:53
>
> Same repo without data:
> CVS: 0:12
> SVN(svnserve): 0:28
> SVN(http): 0:37
> GIT(http): 0:13
>
> And what about Git with its built in protocol? Just six seconds. How's that
> for taking some load off :) Of course you have to add/subtract 1s for my
> timings on the keyboard but you get the overall idea.
>
> This is a very complicated way of doing things. But data should probably be
> separated from code. And it should probably be distributed in small archives.
> And people shouldn't have to use an SCM to get it.
>
> So ... Wadda ya say. Is this too complicated/ not worth it / stupid /
> braindamaged / interesting ?
>
> My brain farts more things like that on a regular basis. If the above makes
> sense, let me know and I'll give you a couple of other ideas as well :d
>
> Eugen.
>
> P.S: I knew Linus wouldn't lie ;)

though git seems nice - i am beginning to think it's not going to solve a lot.
we need to really just provide alternate mechanisms to get the code and more
anoncvs mirrors i think.

> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> enlightenment-devel mailing list
> enlightenment-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
>

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
裸好多
Tokyo, Japan (東京 日本)
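
For anyone who wants to check the bandwidth arithmetic in point 5 of the quoted message, here is a rough sketch. The sizes are the ones Eugen gives; the helper name `download_ratio` is made up for illustration, and the whole thing assumes (as the quoted message does) that CVS sends the files uncompressed and that bandwidth is the only limit.

```python
# Sizes in MB, taken from point 5 of the quoted message.
BINARY_MB = 60
TEXT_MB = 40
BINARY_COMPRESSED_MB = 50  # "around 50MB binary"
TEXT_COMPRESSED_MB = 8     # "8MB text", i.e. 20% of 40MB


def download_ratio(raw_mb: float, compressed_mb: float) -> float:
    """Hypothetical helper: compressed downloads that fit in the
    bandwidth used by one uncompressed download."""
    return raw_mb / compressed_mb


# Case 1: data and code together in one repo.
mixed_raw = BINARY_MB + TEXT_MB
mixed_compressed = BINARY_COMPRESSED_MB + TEXT_COMPRESSED_MB
mixed = download_ratio(mixed_raw, mixed_compressed)

# Case 2: code only, data shipped some other way.
text_only = download_ratio(TEXT_MB, TEXT_COMPRESSED_MB)

print(f"data + code: {mixed_compressed} MB on the wire, {mixed:.2f}x downloads")
print(f"code only:   {TEXT_COMPRESSED_MB} MB on the wire, {text_only:.2f}x downloads")
```

With data and code together the transfer is 58MB, which the quoted message rounds to 60MB to get "100 git downloads for every 60 CVS ones". With code only the ratio is the 5x quoted above.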