
There is some interesting research on similarity search at my university,
driven by David Novák (CC-ed). If anyone is interested, see [1][2][3].

In short: they achieved similarity search on arbitrary data (images, songs,
etc.) by building a custom index that stores a "similarity vector" for each
object in the database. This index can answer queries like
"give me the most similar images to this example". So why am I posting this?

The architecture is designed on top of Infinispan, and they want to use it to
speed things up. Basically, they would like to distribute the entries across
the cluster, with each node holding the similarity index for its own entries.
Then, when a query arrives, it would be sent to all the nodes, each node would
run the custom search against its local index, and the results would be
returned. This is approximately what Index.LOCAL and ClusteredQuery could do.
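
To make the query flow above a bit more concrete, here is a rough sketch in
plain Java. It does not use any Infinispan API: Node, Hit and search() are
hypothetical names, each "node" is just an in-memory list, and the local
search is a brute-force Euclidean scan standing in for the real similarity
index. It only shows the shape of the idea: every node answers from its own
entries, and the caller merges the partial results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ScatterGatherSketch {

    /** One search hit: an entry key and its distance to the query (smaller = more similar). */
    static final class Hit {
        final String key;
        final double distance;

        Hit(String key, double distance) {
            this.key = key;
            this.distance = distance;
        }
    }

    /** Hypothetical stand-in for a cluster node that indexes only its own entries. */
    static final class Node {
        private final List<String> keys = new ArrayList<>();
        private final List<double[]> vectors = new ArrayList<>();

        void put(String key, double[] vector) {
            keys.add(key);
            vectors.add(vector);
        }

        /** Local search: brute-force scan that keeps the k nearest entries of this node. */
        List<Hit> searchLocal(double[] query, int k) {
            Comparator<Hit> byDistance = Comparator.comparingDouble(h -> h.distance);
            // Max-heap on distance, so the worst of the current best k sits on top.
            PriorityQueue<Hit> best = new PriorityQueue<>(byDistance.reversed());
            for (int i = 0; i < keys.size(); i++) {
                best.add(new Hit(keys.get(i), euclidean(query, vectors.get(i))));
                if (best.size() > k) {
                    best.poll(); // drop the farthest candidate
                }
            }
            return new ArrayList<>(best);
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sends the query to every node, then merges the partial results into a global top k. */
    static List<Hit> search(List<Node> nodes, double[] query, int k) {
        List<Hit> merged = new ArrayList<>();
        for (Node node : nodes) {
            merged.addAll(node.searchLocal(query, k));
        }
        Comparator<Hit> byDistance = Comparator.comparingDouble(h -> h.distance);
        merged.sort(byDistance);
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        Node a = new Node();
        a.put("a1", new double[] {0.0, 0.0});
        a.put("a2", new double[] {5.0, 5.0});
        Node b = new Node();
        b.put("b1", new double[] {1.0, 0.0});
        b.put("b2", new double[] {10.0, 10.0});

        List<Node> cluster = new ArrayList<>();
        cluster.add(a);
        cluster.add(b);
        for (Hit hit : search(cluster, new double[] {0.9, 0.0}, 2)) {
            System.out.println(hit.key + " " + hit.distance);
        }
    }
}
```

Asking each node for its own top k is enough to get the correct global top k
after merging, which is why the per-node indexes never need to see each
other's entries.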

The difference is that the indexing and searching mechanism must be custom. So
I wanted to ask what you think about implementing such a feature in
Infinispan. I was thinking about somehow extracting a general API for
indexing/searching, so that e.g. our Lucene search would become one of its
implementations.
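
Just to give an idea of what a general indexing/searching API could look like,
here is a minimal sketch. Everything in it is an assumption for illustration
only: LocalIndex, entryWritten, entryRemoved and search are hypothetical names,
not existing Infinispan interfaces, and NearestNumberIndex is a toy
implementation in which "similar" simply means numerically close. A real
design would also have to cover configuration, persistence and state transfer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluggableIndexSketch {

    /**
     * Hypothetical SPI: a per-node index that is notified about local writes
     * and removals and answers queries over the entries it has seen.
     */
    interface LocalIndex<K, V, Q> {
        void entryWritten(K key, V value);

        void entryRemoved(K key);

        /** Returns at most limit keys, best match first. */
        List<K> search(Q query, int limit);
    }

    /** Toy implementation: values are numbers, and the best match is the closest number. */
    static final class NearestNumberIndex implements LocalIndex<String, Double, Double> {
        private final Map<String, Double> values = new HashMap<>();

        @Override
        public void entryWritten(String key, Double value) {
            values.put(key, value);
        }

        @Override
        public void entryRemoved(String key) {
            values.remove(key);
        }

        @Override
        public List<String> search(Double query, int limit) {
            List<String> keys = new ArrayList<>(values.keySet());
            // Order keys by how close their value is to the query.
            Comparator<String> byCloseness =
                    Comparator.comparingDouble(k -> Math.abs(values.get(k) - query));
            keys.sort(byCloseness);
            return new ArrayList<>(keys.subList(0, Math.min(limit, keys.size())));
        }
    }

    public static void main(String[] args) {
        LocalIndex<String, Double, Double> index = new NearestNumberIndex();
        index.entryWritten("a", 1.0);
        index.entryWritten("b", 4.0);
        index.entryWritten("c", 9.0);
        System.out.println(index.search(4.2, 2));
        index.entryRemoved("b");
        System.out.println(index.search(4.2, 1));
    }
}
```

The point of the three type parameters is that the cache would not need to know
what a query looks like: a full-text index could take a query string, while a
similarity index could take an example object.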

I would be happy to take this on as a contribution, since I find it an
extremely interesting topic, and I would also like to write my diploma thesis
on it.
So here are some questions:
1) Is it doable?
2) Do we want this feature?
3) How to design it/where to start?

Any input is more than welcome :)


[1] https://drive.google.com/file/d/0B4sztQSfpi3rRlJBQjJHMkR2LXc/view
[2] https://drive.google.com/file/d/0B4sztQSfpi3rU2p2MV9jRE9iTUk/view
[3] https://drive.google.com/file/d/0B4sztQSfpi3rZUpld24ydzJNclk/view

