Hi, I have implemented first part of the hashed index. This is sort of "mtime optimisation", although it works somewhat differently. The idea is to produce a binary index for each directory of the working copy, containing hashes, names and mtimes. A little like git.
Anyhow, I am stuck with any further optimisations due to darcs pristine format. The fact it's hashed is great, but the directory formatting makes it hard to hash efficiently. Moreover, to produce right kind of hashes, I would need to include file sizes, which I currently don't keep (although I probably should). Anyway, I have produced two synthetic benchmark repos: many-files: 20 directories, 2000 files each many-dirs: 20 directories, 20 subdirs each, 100 files per subdir I have imported the two into both git and darcs. (Per-dir with darcs, and it was still very painful... It was a lot better with git, although not super-smooth either.) So the benchmark (darcs-diff has a pregenerated index for these tests, but it does check its validity, like real VCS tool would): (Btw., it seems to me that timestamps used by darcs mtime opt go out of sync all the time. I had to run darcs rev on several occasions to get any reasonable time from darcs wh. The rev operation also ate over 200M RAM and took over some 3 minutes, although with a cold cache... iew.) many-files darcs wh: 2.7s many-files darcs-diff: 1.1s many-files git diff: .15s many-dirs darcs: 2.5s many-dirs darcs-diff: .75 many-dirs git: .2s In other words, there are still ways to go. Coincidentally, 40k files are roughly the size of the linux kernel repository. If we can get reasonable performance (ie. rivalling git) on such trees, I am sure it would make many people much more comfortable with darcs. I am currently at something like factor 4 to 7 slower than git (depending on shape of the tree). I believe that by solving the hashing strategy discrepancies between pristine.hashed and my own index, I can maybe cut my times in half. (About half the time is currently spent checking that the index is up to date... In theory, diffing pristine and the index is very fast, once the index is known to be good.) This will also probably need incompatible pristine format. I'll report back again later. Yours, Petr. -- Peter Rockai | me()mornfall!net | prockai()redhat!com http://blog.mornfall.net | http://web.mornfall.net "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton on the subject of C program indentation _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
