
I have to index (tokenized) documents which may have a great many pages, up to 
I also have to know on which pages the search phrase occurs.
I have to update some stored index fields for my document.
The content is never changed.

Thus I think I have to add one Lucene document with the index fields and one 
Lucene document per page.


-Field 1-N
-Page 1-N

-Lucene Document with ID, page number 0 and Field1 - N (stored fields)
-Lucene Document 1 with ID, page number 1 and tokenized content of Page 1
-Lucene Document N with ID, page number N and tokenized content of Page N
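
To make the layout above concrete, here is a minimal plain-Java sketch of how one of my documents would be split into entries. It deliberately uses no Lucene classes: each map just stands in for one Lucene document, and the field names "id", "page" and "content" as well as the class name are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the layout above; no Lucene classes are used.
// Field names "id", "page" and "content" are illustrative assumptions.
class PageLayoutSketch {

    // Each map models one Lucene document as field name -> value.
    static List<Map<String, String>> toEntries(String id,
                                               Map<String, String> indexFields,
                                               List<String> pages) {
        List<Map<String, String>> entries = new ArrayList<>();

        // Page number 0: the entry that carries the stored index fields.
        Map<String, String> meta = new LinkedHashMap<>(indexFields);
        meta.put("id", id);
        meta.put("page", "0");
        entries.add(meta);

        // Page numbers 1..N: one entry per page with that page's content.
        for (int i = 0; i < pages.size(); i++) {
            Map<String, String> page = new LinkedHashMap<>();
            page.put("id", id);
            page.put("page", Integer.toString(i + 1));
            page.put("content", pages.get(i));
            entries.add(page);
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("author", "smith");
        List<String> pages = Arrays.asList("first page text", "second page text");
        for (Map<String, String> entry : toEntries("foo", fields, pages)) {
            System.out.println(entry);
        }
    }
}
```

A document with N pages therefore produces N + 1 entries, all sharing the same ID.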

Delete of MyDocument -> IndexWriter#deleteDocuments(Term:ID=foo)

Update of stored index fields -> IndexWriter#updateDocument(Term: ID=foo, page 
number = 0)
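
As far as I can tell from the API, IndexWriter#updateDocument takes a single Term, deletes every document matching it and then adds the new one, so I cannot pass "ID=foo AND page number=0" directly. The plain-Java sketch below models that delete-then-add behaviour without any Lucene classes. The combined "key" field (ID plus page number) is a hypothetical workaround of mine, not something Lucene provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of delete/update by a single term; no Lucene classes.
class UpdateByTermSketch {

    final List<Map<String, String>> entries = new ArrayList<>();

    // Builds one entry; "key" is a hypothetical combined field (id + "_" + page).
    static Map<String, String> entry(String id, int page, String payload) {
        Map<String, String> e = new HashMap<>();
        e.put("id", id);
        e.put("page", Integer.toString(page));
        e.put("key", id + "_" + page);
        e.put("payload", payload);
        return e;
    }

    void add(Map<String, String> e) {
        entries.add(e);
    }

    // Models deleteDocuments(Term): removes every entry where field == value.
    int deleteByTerm(String field, String value) {
        int before = entries.size();
        entries.removeIf(e -> value.equals(e.get(field)));
        return before - entries.size();
    }

    // Models updateDocument(Term, doc): delete by ONE term, then add.
    void updateByTerm(String field, String value, Map<String, String> replacement) {
        deleteByTerm(field, value);
        entries.add(replacement);
    }

    public static void main(String[] args) {
        UpdateByTermSketch index = new UpdateByTermSketch();
        index.add(entry("foo", 0, "old index fields"));
        index.add(entry("foo", 1, "content of page 1"));
        index.add(entry("foo", 2, "content of page 2"));

        // Updating via the combined key touches only the page-0 entry.
        index.updateByTerm("key", "foo_0", entry("foo", 0, "new index fields"));
        System.out.println("entries after update: " + index.entries.size());

        // Deleting via the ID removes the whole document, pages included.
        System.out.println("deleted: " + index.deleteByTerm("id", "foo"));
    }
}
```

In this model, updating with the term ID=foo alone would also remove all page entries, which is why the sketch uses the combined key for the update and the plain ID only for the delete.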

Search on index fields and content:

Step 1: Search on stored index fields -> List of IDs
Step 2: Search on ID field (list from above OR'ed together) and content -> List 
of IDs and page numbers
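
The two steps above can be sketched in plain Java as follows. This is only a model of the logic, not Lucene code: the index is a list of maps, the field names and helper methods are my own assumptions, and the lowercase substring test is a crude stand-in for a real tokenized phrase query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the two-step search; no Lucene classes are used.
class TwoStepSearchSketch {

    // Entry with page number 0 that carries one index field.
    static Map<String, String> meta(String id, String field, String value) {
        Map<String, String> e = new HashMap<>();
        e.put("id", id);
        e.put("page", "0");
        e.put(field, value);
        return e;
    }

    // Entry for one page of content.
    static Map<String, String> page(String id, int pageNo, String content) {
        Map<String, String> e = new HashMap<>();
        e.put("id", id);
        e.put("page", Integer.toString(pageNo));
        e.put("content", content);
        return e;
    }

    // Step 1: search the page-0 entries on an index field -> set of IDs.
    static Set<String> findIds(List<Map<String, String>> index, String field, String value) {
        Set<String> ids = new TreeSet<>();
        for (Map<String, String> e : index) {
            if ("0".equals(e.get("page")) && value.equals(e.get(field))) {
                ids.add(e.get("id"));
            }
        }
        return ids;
    }

    // Step 2: restrict to those IDs and match the content -> ID -> page numbers.
    static Map<String, List<Integer>> findPages(List<Map<String, String>> index,
                                                Set<String> ids, String phrase) {
        Map<String, List<Integer>> hits = new TreeMap<>();
        String needle = phrase.toLowerCase(Locale.ROOT);
        for (Map<String, String> e : index) {
            String content = e.get("content");
            if (content == null || !ids.contains(e.get("id"))) {
                continue; // skip page-0 entries and documents filtered out in step 1
            }
            if (content.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.computeIfAbsent(e.get("id"), k -> new ArrayList<>())
                    .add(Integer.parseInt(e.get("page")));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = Arrays.asList(
            meta("foo", "author", "smith"),
            page("foo", 1, "Lucene in action"),
            page("foo", 2, "nothing relevant here"),
            meta("bar", "author", "jones"),
            page("bar", 1, "Lucene in action again"));

        Set<String> ids = findIds(index, "author", "smith");
        System.out.println(findPages(index, ids, "lucene in action"));
    }
}
```

The point of the sketch is that step 2 needs the complete ID list from step 1 before it can run, so the size of that list directly affects the second query.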

Does this work?

What drawbacks does this approach have?
Is there another way to achieve what I want?

Thank you.


There are millions of documents, each with between 1 and 10,000 pages.

