On 19 May 2011, at 09:51, Bela Ban wrote:

> As someone mentioned, the biggest issue would be to make sure the data 
> read from the file system isn't stale. If a node has been down for an 
> extended period of time, then this process might slow things down.
> 
> We'd have to implement some rsync-like algorithm, which checks the local 
> data against the cluster data, and this might be costly if the data set 
> is small. If it's big, then that cost would be amortized over the 
> smaller deltas sent over the network to update a local cache.
> 
> I don't think this makes sense as (1) data sets in replicated mode are 
> usually small and (2) Infinispan's focus is on distributed data.

I think it may still make sense in some cases, in both repl and dist modes.
E.g., in dist, when a node joins, existing owners could, rather than push data
to the joiner, just push a list of {key: version} tuples, which may be
significantly smaller than the values.  The joiner can then load entries from a
cache loader based on key/version.  We'd need a new API on the CacheLoader,
like load(Set<KeyVersionPair> keys), which can be implemented pretty
efficiently in many cache stores such as JDBC.  Any keys that the cache loader
doesn't retrieve would need to be pulled across the network.
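
To make the idea concrete, here is a rough sketch of the joiner side.  None of
this is existing Infinispan API: KeyVersionPair, VersionedValue,
VersionedCacheLoader, MapBackedLoader and keysToFetchRemotely are all
hypothetical names, versions are assumed to be plain longs, and an in-memory
map stands in for a real store such as JDBC.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in Infinispan. */
final class VersionedJoinSketch {

    /** A key plus the version the current owners hold for it. */
    record KeyVersionPair(String key, long version) {}

    /** A locally persisted value, tagged with the version it was written at. */
    record VersionedValue(String value, long version) {}

    /** Assumed loader extension, modelled on load(Set<KeyVersionPair> keys). */
    interface VersionedCacheLoader {
        /** Returns only the entries whose stored version equals the requested one. */
        Map<String, String> load(Set<KeyVersionPair> keys);
    }

    /** In-memory stand-in for a persistent cache store. */
    static final class MapBackedLoader implements VersionedCacheLoader {
        private final Map<String, VersionedValue> store;

        MapBackedLoader(Map<String, VersionedValue> store) {
            this.store = store;
        }

        @Override
        public Map<String, String> load(Set<KeyVersionPair> keys) {
            Map<String, String> hits = new HashMap<>();
            for (KeyVersionPair kv : keys) {
                VersionedValue local = store.get(kv.key());
                // Skip entries that are absent or stored at a different version.
                if (local != null && local.version() == kv.version()) {
                    hits.put(kv.key(), local.value());
                }
            }
            return hits;
        }
    }

    /** Keys the joiner could not satisfy locally and must pull over the network. */
    static Set<String> keysToFetchRemotely(Set<KeyVersionPair> advertised,
                                           Map<String, String> loadedLocally) {
        Set<String> missing = new HashSet<>();
        for (KeyVersionPair kv : advertised) {
            if (!loadedLocally.containsKey(kv.key())) {
                missing.add(kv.key());
            }
        }
        return missing;
    }
}
```

So if the owners advertise {a: 1, b: 2, c: 1} and the joiner's store holds a at
version 1 and b at version 1, load() returns only a, and b (stale) and c
(absent) are the keys that still have to come across the network.  A JDBC store
could presumably do the same match in a single batched query rather than one
lookup per key.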

Certainly not high prio, but something to think about for Infinispan.next().

Cheers
Manik
--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
