I have a project where I need to index documents using Lucene 4.1.0. One
of the fields of the stored Document is the actual text from the
document (.pdf, .docx, etc.). I want to be able to highlight text from the
documents in the search results. I was looking at some older tutorials about
storing the field with TermVectors and also storing it in the index with
Store.COMPRESS. However, Store.COMPRESS has been removed in Lucene 4.1.
Is there still a way to compress the field?
I am worried about how much space the index will take up if
I have to keep the "body" Field stored and uncompressed.
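To make the compression part of the question concrete, here is a minimal sketch of what compressing the body text by hand might look like, using only the JDK's java.util.zip. The class and method names (BodyCompression, compress, decompress) are hypothetical and not part of Lucene. The assumption is that the resulting bytes would go into a binary stored field in place of the plain text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not Lucene API: compresses text before it is stored.
public class BodyCompression {

    // Encode the text as UTF-8 and deflate it into a byte array.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the bytes produced by compress() and decode them back to text.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the input is truncated or invalid.
                    throw new IllegalArgumentException("truncated or invalid data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The catch with this approach is that the stored bytes have to be decompressed back into a String before Highlighter can work on them.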
Are there ways around having to store the whole Field in its original form?
Since I am already going to be storing the actual documents on the server,
would it be feasible, in terms of time, to neither store TermVectors nor store
the field at all, and instead, when the user runs a search, re-index the top
docs from the original documents in RAM and use Highlighter to return fragments?
Thanks