> (we couldn't use hashing for traffic reductions, safely).

yes you can. you can use hashes to build a hash table with a collision policy. there is some company (whose name escapes me; maybe someone else will remember) that makes exactly this product: once network A has sent a particular chunk of data to network B, future transmissions of that chunk are replaced transparently with a shorter name. kind of like lempel-ziv on steroids. apparently it makes cross-country ms exchange servers and file servers much more bearable.
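a minimal sketch of the hash-table-plus-collision-policy idea, assuming a reliable, in-order channel so both ends see the same sequence of messages. SHA-1 is only an example hash, the names Encoder/Decoder are hypothetical, and this is not the unnamed company's product. the collision policy: the first chunk to claim a digest owns it, and any later chunk with the same digest but different bytes is always sent literally. both tables grow without bound here; a real system would cap them.

```python
import hashlib
from typing import Callable, Dict, Tuple

Digest = Callable[[bytes], bytes]
Message = Tuple[str, bytes]  # ("ref", digest) or ("data", raw bytes)


def sha1_digest(chunk: bytes) -> bytes:
    # Example hash only; any hash works as long as both ends agree on it.
    return hashlib.sha1(chunk).digest()


class Encoder:
    """Network A's side: remembers every chunk it has already sent to B."""

    def __init__(self, digest: Digest = sha1_digest):
        self.digest = digest
        self.sent: Dict[bytes, bytes] = {}

    def encode(self, chunk: bytes) -> Message:
        d = self.digest(chunk)
        prev = self.sent.get(d)
        if prev == chunk:
            # B already holds these exact bytes: send only the short name.
            return ("ref", d)
        if prev is None:
            # New chunk: B will store it under d as well.
            self.sent[d] = chunk
        # else: a different chunk already owns d (a collision), so this one
        # is always sent literally and never referred to by name.
        return ("data", chunk)


class Decoder:
    """Network B's side: mirrors the encoder's table."""

    def __init__(self, digest: Digest = sha1_digest):
        self.digest = digest
        self.received: Dict[bytes, bytes] = {}

    def decode(self, msg: Message) -> bytes:
        kind, payload = msg
        if kind == "ref":
            return self.received[payload]
        # Keep only the first chunk seen per digest, exactly as the encoder does.
        self.received.setdefault(self.digest(payload), payload)
        return payload
```

because the encoder compares the stored bytes before sending a reference, a collision costs bandwidth (the chunk goes out in full) but never corrupts data. the price is that the sender must keep the chunks themselves, not just their hashes.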
> it would be an interesting feature. Of course the fs on top then
> MUST refresh from time to time, but this can be done while the
> system is idle (good for situations with high load peaks and enough
> idle time on the other hand).

sorry, but this is just a fantastically terrible idea. you're taking a reliable system and making it unreliable. if you were really concerned, it would be better to implement a garbage collector that you could hand a root set. even that would worry me (a simple bug would wipe out your entire archive), but it wouldn't be as bad as relying on timeouts.

> For this I need to be *sure* that there will be
> *no* collisions, even if the system runs for a long time and
> grows really big (maybe several PB on thousands of nodes).
>
> Another interesting question: can the risk of collisions be
> reduced by combining several different hash functions in
> parallel?

sure. use sha-256 and your probability of collision goes down even further. but *you* (probably) still won't be *sure*.

russ
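to put numbers on "probably": for an idealized b-bit hash over n distinct blocks, the birthday bound gives P(any collision) <= n(n-1)/2 * 2^-b. the sketch below assumes 10 PB of data in 8 KiB blocks; both figures are illustrative choices, not from the thread, and the model ignores deliberate (adversarial) collisions. under the same idealized model, two independent hashes of b1 and b2 bits used together behave like one (b1+b2)-bit hash, which is the same lever as simply moving to sha-256.

```python
import math


def collision_log2_bound(total_bytes: float, block_size: int, hash_bits: int) -> float:
    """log2 of the birthday upper bound n(n-1)/2 * 2**-hash_bits."""
    n = math.ceil(total_bytes / block_size)  # number of distinct blocks (assumed)
    return math.log2(n) + math.log2(n - 1) - 1 - hash_bits


PB = 2**50
for bits, name in ((160, "sha-1"), (256, "sha-256")):
    v = collision_log2_bound(10 * PB, 8192, bits)
    print(f"{name}: P(collision) <= 2^{v:.1f}")
# prints:
# sha-1: P(collision) <= 2^-80.4
# sha-256: P(collision) <= 2^-176.4
```

either number is far below the odds of undetected hardware or software failure, but it is a probability, not a guarantee.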