It has been a couple of years since the Neu-IR workshop (https://staff.fnwi.uva.nl/m.derijke/wp-content/papercite-data/pdf/craswell-report-2016.pdf). I'm wondering if anyone has tinkered with storing word/document embeddings and using them inside Lucene to improve the core relevance model.
One of the key ideas of neural search is to leverage such representations to improve the effectiveness of search engines. It would be very nice to have a retrieval model that relies on word and document vectors (also called *embeddings*), so that we could compute and exploit word and document similarities very efficiently by looking at their nearest neighbours.

I found this code that can generate word2vec embeddings from a Lucene index: https://github.com/kojisekig/word2vec-lucene. The closest work I know of along the lines of using deep learning in Lucene is the paper "Large Scale Indexing and Searching Deep Convolutional Neural Network Features" (https://link.springer.com/chapter/10.1007/978-3-319-43946-4_14), which applies mainly to content-based image retrieval.

-- J
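To make the nearest-neighbour idea concrete, here is a minimal, Lucene-agnostic Java sketch of brute-force k-nearest-neighbour search over dense vectors using cosine similarity. The class name, the toy 3-dimensional vectors, and the `nearest` helper are all hypothetical illustrations, not anything from Lucene or the thread:

```java
import java.util.*;
import java.util.stream.Collectors;

public class EmbeddingKnn {

    // Cosine similarity between two dense vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force k-nearest-neighbour search: score every vector in the
    // index against the query and keep the k highest-scoring ids.
    static List<String> nearest(Map<String, float[]> index, float[] query, int k) {
        return index.entrySet().stream()
                .sorted((e1, e2) -> Double.compare(cosine(e2.getValue(), query),
                                                   cosine(e1.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" for illustration only.
        Map<String, float[]> index = new HashMap<>();
        index.put("doc1", new float[]{1f, 0f, 0f});
        index.put("doc2", new float[]{0.9f, 0.1f, 0f});
        index.put("doc3", new float[]{0f, 0f, 1f});

        // doc1 and doc2 point in nearly the same direction as the query.
        System.out.println(nearest(index, new float[]{1f, 0f, 0f}, 2));
    }
}
```

One way this could plug into Lucene (an assumption on my part, not something anyone in the thread has confirmed) is to serialize each document's vector into a binary doc-values field and use a scan like the above only to re-rank the top results of a normal query; a full brute-force pass clearly does not scale to large indexes, where approximate nearest-neighbour structures would be needed.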
