I have been fighting for some time with an issue in the application I am developing, and since I am starting to run out of ideas I figured I'd try reaching out for help.
I store a couple of million records in Ehcache and want to use Lucene to quickly find the keys of the elements I need. My cache key has the following fields:

- subscriberId - String, indexed as LongField (Lucene 5.5) or LongPoint + StoredField (Lucene 6.0)
- date - Integer, IntField (Lucene 5.5) or IntPoint + StoredField (Lucene 6.0)
- hour - Integer, StoredField
- networkId - Long, StoredField
- sessionId - Long, StoredField

(For example, the date is converted to an Integer like 20160530 and then stored.)

The above allows me to do quick range queries like:

    +subscriberId:[12345 TO 12345] +date:[20160501 TO 20160531]

I have written my own Collector that extends SimpleCollector and just adds the document ids to a Set<Integer>. After the search I loop through the set, call IndexSearcher.doc(id) to get the document, create my cache key object from the stored fields, and get the element from the cache using that key.

I have an Ehcache CacheEventListener which:

- when an element is added to the cache: adds a Document to Lucene with the fields from the key
- when an element is removed from the cache: removes the Document from Lucene using the Term from the key

When the application starts it reads all entries from the database in a serial fashion and everything is fine. The application then launches several threads which consume messages from a message queue and add them to the cache (which in turn adds them to Lucene through the listener); we get a burst of 2000-3000 messages every 5 minutes. This is where I run into problems: a search returns the correct number of hits (verified against the database), but some (not all) of the documents are not the correct ones - they contain values for another subscriber/date/etc.

At startup I create the Directory and IndexWriter in a synchronized block, so all threads/instances use a single shared IndexWriter. I have tried three ways of reading/searching:

- DirectoryReader.open(IndexWriter)
- DirectoryReader.open(<the Directory created at startup and used to create the IndexWriter>)
- DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectory)))

I have also tried with and without IndexWriter.commit() after each addDocument and deleteDocuments. I must get all matching documents when I search, but getting back already-deleted documents is not an issue. I create/close a new IndexReader for each search request.

Clearly things work as long as everything runs in a serial fashion, but once the application starts consuming messages from the queue it runs into problems. Once the problem appears it manifests itself even if there are no more writes to the index (i.e. we stop consuming new messages and then try a single search, which creates a new IndexReader).

I have also noticed that I only get this issue when the search range includes the most recent date added to the application. So given this example:

- data from 20160501 to 20160510 read from the database on startup
- data for 20160511 and 20160512 received from the message queue

A search for date:[20160501 TO 20160511] = no problem
A search for date:[20160501 TO 20160512] = problems

I have appended rough sketches of the relevant code at the end of this mail.

Any ideas on what I am doing wrong? I only started using Lucene a few weeks ago, so all I have so far is from reading the API docs and various online examples.

Regards,
Conny
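P.S. Here are the code sketches mentioned above. They are simplified for this mail, not the actual application code: the field names match the description, helper names like KeyIndex.addKey and rangeQuery exist only here, and the subscriberId String is assumed to have been parsed to a long before indexing.

First, indexing a key and building the range query with the Lucene 6.0 field types:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.IntPoint;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;

    class KeyIndex {
        // Each queryable value is added twice: as a Point (indexed, enables
        // range queries) and as a StoredField (so it can be read back from a hit).
        static void addKey(IndexWriter writer, long subscriberId, int date,
                           int hour, long networkId, long sessionId) throws IOException {
            Document doc = new Document();
            doc.add(new LongPoint("subscriberId", subscriberId));
            doc.add(new StoredField("subscriberId", subscriberId));
            doc.add(new IntPoint("date", date));
            doc.add(new StoredField("date", date));
            doc.add(new StoredField("hour", hour));
            doc.add(new StoredField("networkId", networkId));
            doc.add(new StoredField("sessionId", sessionId));
            writer.addDocument(doc);
        }

        // Equivalent of +subscriberId:[12345 TO 12345] +date:[20160501 TO 20160531]
        static Query rangeQuery(long subscriberId, int dateFrom, int dateTo) {
            return new BooleanQuery.Builder()
                .add(LongPoint.newExactQuery("subscriberId", subscriberId), BooleanClause.Occur.MUST)
                .add(IntPoint.newRangeQuery("date", dateFrom, dateTo), BooleanClause.Occur.MUST)
                .build();
        }
    }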
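The collector that gathers document ids into a set. Note that SimpleCollector.collect() is called with segment-local doc ids, so the docBase of the current leaf reader has to be added to get index-wide ids that IndexSearcher.doc() accepts:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.SimpleCollector;

    class DocIdCollector extends SimpleCollector {
        final Set<Integer> docIds = new HashSet<>();
        private int docBase; // offset of the current segment within the index

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
            // collect(doc) receives segment-local ids; remember the segment's
            // base so index-wide ids can be stored.
            docBase = context.docBase;
        }

        @Override
        public void collect(int doc) {
            docIds.add(docBase + doc);
        }

        @Override
        public boolean needsScores() {
            return false; // only the matching doc ids are needed
        }
    }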
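The per-search flow, shown here with the DirectoryReader.open(IndexWriter) variant; a new reader is created and closed for every search request (the snippet is assumed to run inside a method that declares IOException):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;

    try (DirectoryReader reader = DirectoryReader.open(writer)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        DocIdCollector collector = new DocIdCollector();
        searcher.search(KeyIndex.rangeQuery(12345L, 20160501, 20160531), collector);
        for (int id : collector.docIds) {
            Document doc = searcher.doc(id); // load the stored fields
            long subscriberId = doc.getField("subscriberId").numericValue().longValue();
            int date = doc.getField("date").numericValue().intValue();
            // ... build the cache key from the stored fields and look it up in Ehcache
        }
    }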
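Finally, the removal path of the cache listener. Above I wrote that I remove the Document "with the Term from the key"; for the Lucene 6.0 Point fields the closest equivalent is IndexWriter.deleteDocuments(Query) with exact-match point queries, which is what this sketch shows. Only the indexed fields can be matched here - hour, networkId and sessionId are stored-only:

    import java.io.IOException;
    import org.apache.lucene.document.IntPoint;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;

    // On cache removal: delete the document(s) matching the key's indexed fields.
    static void removeKey(IndexWriter writer, long subscriberId, int date) throws IOException {
        writer.deleteDocuments(new BooleanQuery.Builder()
            .add(LongPoint.newExactQuery("subscriberId", subscriberId), BooleanClause.Occur.MUST)
            .add(IntPoint.newExactQuery("date", date), BooleanClause.Occur.MUST)
            .build());
    }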