Re: Scanning docs at index time

Apoorv Sharma Mon, 22 Feb 2010 23:48:17 -0800

I don't know of any classes that would be suitable, but if they are ordered
queries, some simple code could easily be written.
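
As a rough illustration, here is a minimal sketch of that kind of code. It
assumes "ordered" means every term of a query must appear in the document in
the given order, with gaps allowed. The class OrderedQueryScanner and its
methods are hypothetical names made up for this example, and tokenize() is
only a stand-in for whatever analyzer the indexer already runs; nothing here
uses the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: matches registered ordered-term queries against the
// tokens of one document. Not part of Lucene.
final class OrderedQueryScanner {
    // Query name -> terms that must occur in this order.
    private final Map<String, List<String>> queries = new LinkedHashMap<>();

    void addQuery(String name, String... orderedTerms) {
        queries.put(name, Arrays.asList(orderedTerms));
    }

    // Stand-in for a real analyzer (assumption): lowercase the text and
    // split on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if all terms occur in the token list in order, gaps allowed.
    static boolean matchesInOrder(List<String> tokens, List<String> terms) {
        int next = 0; // index of the next term still to be found
        for (String token : tokens) {
            if (next < terms.size() && token.equals(terms.get(next))) {
                next++;
            }
        }
        return next == terms.size();
    }

    // Returns the names of all registered queries the tokens satisfy.
    List<String> scan(List<String> tokens) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> q : queries.entrySet()) {
            if (matchesInOrder(tokens, q.getValue())) {
                hits.add(q.getKey());
            }
        }
        return hits;
    }
}
```

The token list would be produced once per field and then passed to scan(), so
the same tokens serve every query. This loop costs one pass over the tokens
per query; with around a thousand queries, grouping the queries by their
first term would probably be the next thing to try.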


On Mon, Feb 22, 2010 at 9:59 PM, Nigel <[email protected]> wrote:

> I'd like to scan documents as they're being indexed, to find out
> immediately if any of them match certain queries.  The goal is to find out
> if there are any new hits for these queries as soon as possible, without
> re-searching the index over and over (which would be inefficient, and
> higher latency).  The documents still need to be indexed (not just scanned)
> so they can be searched later with different queries not known at index
> time.
>
> The indexing throughput is in the tens of millions per day, and there are
> maybe a thousand queries or so to be matched.  So this has to work pretty
> fast.  (-:  Fortunately the number and size of fields are both fairly small.
>
> This scanning could of course be completely decoupled from the indexing
> process.  But my thinking was that since we already have the documents in
> hand, and we'll be analyzing various fields in the course of indexing, we
> could ideally reuse those token streams somehow for this on-the-fly
> scanning process.
>
> I took a look at the org.apache.lucene.index.memory.MemoryIndex class in
> contrib.  It looks like that would work, but I'm not sure if it's the most
> appropriate solution (for one thing, it would have to re-analyze all the
> fields).  Has anyone here done something similar and/or know of other
> classes that would be suitable?
>
> Thanks,
> Chris
>
