[EMAIL PROTECTED] wrote:
Does anyone else have any ideas why the whole documents wouldn't be indexed as described below?

Or perhaps someone can enlighten me on how to use Luke to find out whether the whole document was indexed or not.
I have not used Luke in this capacity before, so I'm not sure what to do or what to look for.

Well, you could try the "Reconstruct & Edit" function: it will give you an idea of which tokens ended up in the index, and which one was last. In Luke 0.6, if the field is stored you will see two tabs: one for the stored content, the other showing the tokenized content with tokens separated by commas. If the field was un-stored, the only tab you get is the reconstructed content. In either case, just scroll down and check what the last tokens are.


You could also look for the presence of some distinctive terms that occur only at the end of that document, and check whether they are present in the index.
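If you'd rather check programmatically than through Luke, the classic Lucene API lets you ask for a term's document frequency directly. This is only a sketch; the index path and the field name "contents" are assumptions you'd replace with your own:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class CheckLastTerm {
    public static void main(String[] args) throws Exception {
        // Path and field name are placeholders for your setup.
        IndexReader reader = IndexReader.open("/path/to/index");
        // docFreq() > 0 means at least one document contains this term;
        // pick a term that occurs only near the end of the document.
        int df = reader.docFreq(new Term("contents", "lastword"));
        System.out.println("documents containing term: " + df);
        reader.close();
    }
}
```

If the term from the tail of your document comes back with a document frequency of zero, the tail never made it into the index.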

There are really only a few reasons why this might be happening:

* your extractor has a bug, or
* the max token limit is wrongly set, or
* the indexing process doesn't close the IndexWriter properly.
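The last two causes are easy to rule out in code. A minimal sketch, assuming the classic IndexWriter API (the path and analyzer choice are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class IndexAll {
    public static void main(String[] args) throws Exception {
        // true = create a new index at this (placeholder) path.
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        // The default limit is 10,000 tokens per field; anything past
        // that is silently dropped, which looks exactly like a
        // truncated document. Raise it for large documents.
        writer.setMaxFieldLength(Integer.MAX_VALUE);

        // ... writer.addDocument(...) calls go here ...

        // Without close(), buffered documents may never be flushed
        // to disk, so the index ends up incomplete.
        writer.close();
    }
}
```

Note that the silent 10,000-token cutoff is by far the most common culprit for "my document is only partially indexed" reports.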


--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com




