I don't know of any classes that would be suitable, but if the queries are ordered, simple code to do this could easily be written.
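As a rough illustration of the kind of simple code meant here, the sketch below turns the usual index around: it builds a small inverted index over the *queries* (term to query ids), so each incoming document only touches the queries that share at least one of its tokens. This is a minimal sketch under loud assumptions: every query is treated as a plain conjunction of terms (all must appear), the tokens are assumed to already be the analyzed output from indexing, and `QueryMatcher`, `addQuery` and `match` are hypothetical names, not a Lucene API. Phrases, OR clauses, wildcards and scoring are not handled.

```java
import java.util.*;

/**
 * Hypothetical sketch (not a Lucene class): matches each incoming document
 * against a fixed set of conjunctive (all-terms-must-appear) queries by using
 * an inverted index over the queries instead of over the documents.
 */
public class QueryMatcher {
    // term -> ids of the queries that contain that term
    private final Map<String, List<String>> queriesByTerm = new HashMap<>();
    // query id -> number of distinct terms the query requires
    private final Map<String, Integer> requiredCounts = new HashMap<>();

    /** Registers a query that matches when every one of its terms occurs in a document. */
    public void addQuery(String id, Collection<String> terms) {
        Set<String> distinct = new HashSet<>(terms);
        requiredCounts.put(id, distinct.size());
        for (String term : distinct) {
            queriesByTerm.computeIfAbsent(term, t -> new ArrayList<>()).add(id);
        }
    }

    /** Returns the ids of all queries whose terms all appear among the document's tokens. */
    public SortedSet<String> match(Collection<String> docTokens) {
        Map<String, Integer> hits = new HashMap<>();
        // Each distinct token is looked up once; only queries sharing that term are touched.
        for (String token : new HashSet<>(docTokens)) {
            for (String id : queriesByTerm.getOrDefault(token, Collections.emptyList())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        // A query matches when all of its distinct terms were seen in the document.
        SortedSet<String> matched = new TreeSet<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (e.getValue().equals(requiredCounts.get(e.getKey()))) {
                matched.add(e.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        QueryMatcher matcher = new QueryMatcher();
        matcher.addQuery("q1", List.of("lucene", "index"));
        matcher.addQuery("q2", List.of("memory"));
        matcher.addQuery("q3", List.of("lucene", "solr"));
        // Tokens as they might come out of the analyzer already used for indexing.
        List<String> tokens = List.of("scan", "lucene", "index", "memory", "lucene");
        System.out.println(matcher.match(tokens)); // prints [q1, q2]
    }
}
```

The per-document cost here grows with the number of distinct tokens and the number of queries sharing them, rather than with the total number of queries, which is what makes this shape attractive for a thousand or so standing queries.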
On Mon, Feb 22, 2010 at 9:59 PM, Nigel <nigelspl...@gmail.com> wrote:
> I'd like to scan documents as they're being indexed, to find out immediately
> if any of them match certain queries. The goal is to find out if there are
> any new hits for these queries as soon as possible, without re-searching the
> index over and over (which would be inefficient, and higher latency). The
> documents still need to be indexed (not just scanned) so they can be
> searched later with different queries not known at index time.
>
> The indexing throughput is in the tens of millions per day, and there are
> maybe a thousand queries or so to be matched. So this has to work pretty
> fast. (-: Fortunately the number and size of fields are both fairly small.
>
> This scanning could of course be completely decoupled from the indexing
> process. But my thinking was that since we already have the documents in
> hand, and we'll be analyzing various fields in the course of indexing, we
> could ideally reuse those token streams somehow for this on-the-fly
> scanning process.
>
> I took a look at the org.apache.lucene.index.memory.MemoryIndex class in
> contrib. It looks like that would work, but I'm not sure if it's the most
> appropriate solution (for one thing, it would have to re-analyze all the
> fields). Has anyone here done something similar and/or know of other
> classes that would be suitable?
>
> Thanks,
> Chris
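For a rough sense of the load described in the quoted message, take 10 million documents per day as the low end of "tens of millions" and assume, naively, that every one of the roughly 1,000 queries is checked against every document:

$$
\frac{10^{7}\ \text{docs/day}}{86\,400\ \text{s/day}} \approx 116\ \text{docs/s},
\qquad
116\ \text{docs/s} \times 1000\ \text{queries} \approx 1.2 \times 10^{5}\ \text{query checks/s}
$$

At the upper end of "tens of millions" these figures scale up proportionally, which is why avoiding a check of every query against every document matters here.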