Alex,

On 4/25/07, Alexandru Popescu ☀ <[EMAIL PROTECTED]> wrote:
I may be misreading something, but my main concern with this approach is that while it minimizes the size of the storage (which is very cheap right now and almost infinite), it comes with a penalty on access performance: reading a value needs two "I/O" operations. A caching strategy may address this problem, but even if memory is also cheap, it is still limited. So, while I see this solution as a good fit for cases where huge amounts of duplicate data would be stored, for all the other cases I see it as suboptimal.
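
(To make the read-path concern concrete, here is a minimal, hypothetical Java sketch - not Jackrabbit's actual API, all names made up for illustration - of a content-addressed store where a property holds only an identifier and the bytes live elsewhere, so a read costs two lookups unless a bounded cache absorbs the second one:)

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical content-addressed read path: property -> identifier -> bytes,
    // i.e. two storage reads per value unless the blob is already cached.
    class BlobReader {

        // Bounded LRU cache: memory is cheap, but not unlimited.
        private final Map<String, byte[]> cache =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > 1000;
                }
            };

        byte[] readValue(String propertyPath) {
            String contentId = readIdentifier(propertyPath); // "I/O" #1: look up the identifier
            byte[] cached = cache.get(contentId);
            if (cached != null) {
                return cached;                               // cache hit: second read avoided
            }
            byte[] value = readBlob(contentId);              // "I/O" #2: fetch the actual bytes
            cache.put(contentId, value);
            return value;
        }

        // Stand-ins for the two underlying storage reads.
        private String readIdentifier(String propertyPath) { return "d41d8cd9"; }
        private byte[] readBlob(String contentId) { return new byte[0]; }
    }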
Hm - not sure I agree with the assumption that storage is cheap/infinite. Try dealing with backups etc. on a repository that is 50GB in size, then try with 100GB+ - it gets to be a major headache. Even with lots of bandwidth, copying 100GB over a WAN can do all sorts of nasty things, like crashing firewalls. With a versioning repository using multiple workspaces, disk space usage can grow extremely fast, and we're finding we have many GB of data, 90%+ of which is duplicates. Something like what Jukka is suggesting would help enormously. I guess it's one of those "depends on the use case" things :-)

miro
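
(For the duplicate-heavy case described above, the write side of such a scheme - again a purely hypothetical sketch, not Jukka's actual proposal or Jackrabbit code - would derive the identifier from a digest of the content, so the same bytes referenced from many nodes, versions or workspaces are stored only once:)

    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical deduplicating write path: the identifier is a digest of the
    // content, so identical binaries collapse into a single stored copy.
    class BlobWriter {

        private final Map<String, byte[]> store = new HashMap<String, byte[]>(); // stand-in backend

        String writeValue(byte[] value) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(value)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            String contentId = hex.toString();
            if (!store.containsKey(contentId)) {
                store.put(contentId, value);   // duplicates from other workspaces cost nothing extra
            }
            return contentId;                  // the node property stores only this identifier
        }
    }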
