Fair enough Vikas, as I said, I had also proposed a Bloom filter. At that stage I "gave up", mainly because of a "synchronization issue" at startup: since the cache initialization ran in a separate thread, I had to handle that situation carefully and keep the Bloom filter properly updated. With this new approach of limiting the size of the vanity path entries, I think the Bloom filter can come back into the solution :)
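To make the idea concrete, here is a rough sketch of what I have in mind. It is neither the BloomFilterUtils attached to SLING-3290 nor Guava's implementation; the class name, the method names and the seeded FNV-1a-style hash are only assumptions for illustration. It uses several hash positions per value (derived from two base hashes), guards updates with synchronized so a background initialization thread and request threads can share it, and can be dumped to bytes on clean shutdown and restored at startup.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch only: not the SLING-3290 BloomFilterUtils, not Guava's BloomFilter. */
final class VanityPathBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    VanityPathBloomFilter(int numBits, int numHashes) {
        this(new BitSet(numBits), numBits, numHashes);
    }

    private VanityPathBloomFilter(BitSet bits, int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = bits;
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Marks a path as present; synchronized so an init thread and readers can share the filter. */
    synchronized void add(String path) {
        byte[] data = path.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x9747B28C);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** Returns false only if the path was never added; true may be a false positive. */
    synchronized boolean mightContain(String path) {
        byte[] data = path.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x9747B28C);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    /** Snapshot of the bit set, e.g. to persist on clean shutdown. */
    synchronized byte[] toBytes() {
        return bits.toByteArray();
    }

    /** Rebuilds a filter from a snapshot; numBits and numHashes must match the original. */
    static VanityPathBloomFilter fromBytes(byte[] snapshot, int numBits, int numHashes) {
        return new VanityPathBloomFilter(BitSet.valueOf(snapshot), numBits, numHashes);
    }

    /** i-th bit position, combining the two base hashes (double hashing). */
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Seeded FNV-1a-style 32-bit hash; chosen here only to keep the sketch self-contained. */
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

The resolver would call mightContain(path) first and only look up the repository when it returns true. After a crash there is no snapshot, so the filter would simply be rebuilt by calling add for every vanity path.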
Thanks for the "reminder".

regards
antonio

On Dec 5, 2014, at 6:02 PM, Vikas Saurabh <vikas.saur...@gmail.com> wrote:

>> we already took in consideration Bloom Filter for a related issue [2].
>> We decided that is still not too optimal since it leads toward content
>> duplication and I would like to avoid that for now
>>
>> [2] https://issues.apache.org/jira/browse/SLING-3290
>>
>
> Well, imho, bloom filters won't duplicate content -- they'd just have
> bit-masks to tentatively mark existence of a value. Moreover, if we
> use guava's implementation (which I think sling doesn't want to do...
> if I am reading SLING-3290 correctly), then we can serialize them on
> clean shutdown to have practically no work done during startup. For
> crashes, we can probably live with re-creating the filter again.
>
> About, BloomFilterUtils attached in SLING-3290, I think it's just
> using 1 hash function to create mask. In general, bloom filter
> implementation would have more number of hashes to configure less
> false-positives.
>
> About caching actual data in RAM (and assuming sling would sit on top
> of Oak??) -- should caching of most used nodes be a responsibility of
> repository implementation?.. but, that's probably a different
> discussion.
>
> Thanks,
> Vikas