> (we couldn't use hashing for traffic reductions, safely).

yes you can.  you can use hashes to build a hash table
with a collision policy.  there is some company
(whose name escapes me; maybe someone else will
remember) that makes exactly this product, so that
once network A has sent a particular chunk of data to
network B, future transmissions of that chunk are replaced
transparently with a shorter name.  kind of like
lempel-ziv on steroids.  apparently it makes 
cross-country ms exchange servers and file servers
much more bearable.
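
a minimal sketch of the idea, not that company's actual protocol.
the names Sender, Receiver, encode and decode are made up for
illustration, and sha-256 is just an assumed choice of hash.  the
collision policy here is the simplest one: if a new chunk hashes to
a name already in the table but the bytes differ, send it in full
and never substitute the name.

```python
import hashlib

def sha256_name(chunk):
    # assumed hash; any function from bytes to a short name works
    return hashlib.sha256(chunk).digest()

class Sender:
    """remembers every chunk already sent, keyed by its hash."""
    def __init__(self, hashfn=sha256_name):
        self.hashfn = hashfn
        self.sent = {}                    # name -> chunk bytes

    def encode(self, chunk):
        name = self.hashfn(chunk)
        old = self.sent.get(name)
        if old is None:
            self.sent[name] = chunk
            return ("data", chunk)        # first time: send in full
        if old == chunk:
            return ("ref", name)          # seen before: send the short name
        return ("raw", chunk)             # collision: send in full, don't store

class Receiver:
    """mirrors the sender's table so short names can be expanded."""
    def __init__(self, hashfn=sha256_name):
        self.hashfn = hashfn
        self.seen = {}                    # name -> chunk bytes

    def decode(self, msg):
        kind, body = msg
        if kind == "data":
            self.seen[self.hashfn(body)] = body
            return body
        if kind == "ref":
            return self.seen[body]
        return body                       # "raw": pass through untouched
```

the sender compares the actual bytes before substituting a name, so
a collision costs bandwidth but never correctness.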

> it would be an interesting feature. Of course the fs on top then
> MUST refresh from time to time, but this can be done while the 
> system is idle (good for situations with high load peaks and enough
> idle time on the other hand). 

sorry, but this is just a fantastically terrible idea.
you're taking a reliable system and making it unreliable.

if you were really concerned, it would be better
to implement a garbage collector that you could
hand a root set.  even that would worry me (a simple
bug would wipe out your entire archive), but it
wouldn't be as bad as relying on timeouts.
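
for concreteness, a rough mark-and-sweep sketch of what "hand it a
root set" could look like.  everything here is hypothetical: the
store is modeled as a dict from name to block, and children is an
assumed callback that returns the names a block points to.

```python
def collect(store, roots, children):
    """delete every block not reachable from roots; return the deleted names.

    store:    dict mapping name -> block (hypothetical layout)
    roots:    iterable of names that must be kept
    children: function mapping a block to the names it refers to
    """
    marked = set()
    stack = list(roots)
    # mark phase: walk everything reachable from the root set
    while stack:
        name = stack.pop()
        if name in marked or name not in store:
            continue
        marked.add(name)
        stack.extend(children(store[name]))
    # sweep phase: anything unmarked is garbage
    dead = [name for name in store if name not in marked]
    for name in dead:
        del store[name]
    return dead
```

note that calling this with an empty or incomplete root set deletes
live data, which is exactly the kind of simple bug that worries me.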

> For this I need to be *sure* that there will be
> *no* collisions, even if the system runs for a long time and
> grows really big (maybe several PB on thousands of nodes). 
> 
> Another interesting question: can the risk of collisions be
> reduced by combining several different hash functions in
> parallel?

sure.  use sha-256 and your probability of collision goes
down even further.  but *you* (probably) still won't be *sure*.
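
to put rough numbers on it, here is the usual birthday bound.  the
figures are assumptions for illustration only: 4 PiB of data stored
in 8 KiB blocks, a hash that behaves like a uniform random function,
and 160 bits standing in for a sha-1-sized hash.

```python
from fractions import Fraction

def collision_bound(nblocks, hashbits):
    """birthday upper bound on any collision: n(n-1)/2 pairs, each 2^-bits."""
    return Fraction(nblocks * (nblocks - 1), 2 ** (hashbits + 1))

# assumed workload: 4 PiB in 8 KiB blocks = 2**39 distinct blocks
nblocks = (4 * 2**50) // (8 * 2**10)

p160 = collision_bound(nblocks, 160)   # below 2**-83
p256 = collision_bound(nblocks, 256)   # below 2**-179
```

tiny, and tinier with more bits, but it is still a probability and
not a proof.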

russ
