Hi, I would like to continuously iterate over the documents in my lucene index as the index is updated. Kind of like a "stream" of documents. Is there a way I can achieve this?
Would something like this be sufficient (untested): int currentDocId = 0; while(true) { for(; currentDocId < reader.maxDoc(); currentDocId++) { if(!reader.isDeleted(currentDocId)) { Document d = reader.document(currentDocId); } } // Maybe sleep here or something IndexReader newReader = reader.reopen(); if(newReader != reader) { reader.close(); reader = newReader; } } Right now, I do some NLP on the index that would slow down my indexing if done at the same time, so that is why I'm looking for a solution that works in the background like this. Another concern I have is that starting from scratch (fresh invocation of my program) requires me to load a lot of extra data and then iterate through hundreds of thousands of documents just to get to the newest docs that I haven't processed yet. I would rather just start from the new newest doc and go forward. I am currently checking whether or not I've processed a Document by looking up a field in the Document in a Mongo db, but is there a way I could reliably use the id of the document from the reader to check to see if I've looked at this document already? I've heard that IndexReader.document() is slow so I would like to skip that call if I know I've processed the document already. Any ideas? Thanks, Max