> You could have a field within each doc say "Processed" and store a
> value Yes/No, next run a searcher query which should give you the
> collection of unprocessed ones.

That sounds like a reasonable idea, and I just realized that I could have done that in a way specific to my application. However, I already tried doing something with a MatchAllDocsQuery with a custom collector and sort by date. I store the last date and time of a doc I processed and process only newer ones.
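
For anyone following along, here is a rough sketch of the "remember the last processed date" idea in plain Java. It deliberately leaves Lucene out: the Doc class, the IncrementalProcessor name and the in-memory list are all hypothetical stand-ins for the documents a collector would hand back, not anything from the Lucene API or from my actual code.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: keeps a "last processed" timestamp and only
// hands back documents that are strictly newer than it.
final class IncrementalProcessor {

    // Stand-in for an indexed document: an id plus its stored date field.
    public static final class Doc {
        public final String id;
        public final Instant date;

        public Doc(String id, Instant date) {
            this.id = id;
            this.date = date;
        }
    }

    // Date of the newest document processed so far.
    private Instant lastProcessed;

    public IncrementalProcessor(Instant lastProcessed) {
        this.lastProcessed = lastProcessed;
    }

    // Returns the documents newer than the stored date, oldest first,
    // and advances the stored date to the newest one returned.
    public List<Doc> processNewer(List<Doc> allDocs) {
        List<Doc> pending = new ArrayList<>();
        for (Doc d : allDocs) {
            if (d.date.isAfter(lastProcessed)) {
                pending.add(d);
            }
        }
        pending.sort(Comparator.comparing((Doc d) -> d.date));
        if (!pending.isEmpty()) {
            lastProcessed = pending.get(pending.size() - 1).date;
        }
        return pending;
    }

    public Instant getLastProcessed() {
        return lastProcessed;
    }

    public static void main(String[] args) {
        IncrementalProcessor p =
                new IncrementalProcessor(Instant.parse("2010-01-01T00:00:00Z"));
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("old", Instant.parse("2009-12-31T00:00:00Z")));
        docs.add(new Doc("new", Instant.parse("2010-01-02T00:00:00Z")));
        for (Doc d : p.processNewer(docs)) {
            System.out.println(d.id + " " + d.date);
        }
    }
}
```

One caveat with this approach: the comparison is strictly "newer than", so a doc that shows up later with exactly the same date and time as the stored one would be skipped. The "Processed" field suggested above does not have that problem.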