Hi all,
I'm trying to index documents so that (a) every document is indexed
'normally' (so that I can search for documents that match certain words),
and (b) parts of the document that I consider important, such as author and
title, are ALSO stored in their own indexed fields.
I have (a) working fine, and (b) is almost working. However, I want the
separate field to keep the offsets it had in the original text. For
example, if the title was at characters 76-200 in the original text, I'd
like the field to carry those offsets, so that when I look at the field I
can find the corresponding place in the document quickly.
I don't seem to be able to do this. I have my own analyzer that finds the
tokens and sets their start and end offsets accordingly. However, when I
create the new field and write it to the index, these offsets seem to be
ignored: when I pull the offsets out later, they start at 0 and move up
from there.
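To make the expected behaviour concrete, here is a minimal sketch in plain
Java of the offsets I would like the field to carry. It does not use Lucene
at all: the class OffsetShiftSketch, the method tokenOffsets, and the simple
whitespace tokenization are made up for illustration and are not my real
analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical helper, not part of Lucene or of my analyzer.
class OffsetShiftSketch {

    // Splits fieldText on whitespace and returns one {start, end} pair per
    // token, expressed in original-document coordinates. baseOffset is the
    // character position at which fieldText begins in the original text.
    static int[][] tokenOffsets(String fieldText, int baseOffset) {
        List<int[]> spans = new ArrayList<int[]>();
        int n = fieldText.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(fieldText.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(fieldText.charAt(i))) {
                i++;
            }
            if (i > start) {
                // Shift the local offsets by the field's position in the document.
                spans.add(new int[] { baseOffset + start, baseOffset + i });
            }
        }
        return spans.toArray(new int[0][]);
    }
}
```

For a title that begins at character 76, tokenOffsets("Moby Dick", 76) gives
76-80 and 81-85, whereas the offsets I currently get back from the index
start at 0.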
I am creating the field like:
    CASAnnotationAnalyzer psa = new CASAnnotationAnalyzer();
    analyzer.addAnalyzer(info.indexName, psa);
    TokenStream ts = psa.tokenStream(info.indexName,
                                     new StringReader(info.value));
    Field stemF = new Field(info.indexName, ts,
                            Field.TermVector.WITH_POSITIONS_OFFSETS);
    d.add(stemF);
(d is the document being indexed).
I have tried various permutations of creating the field and token stream -
does anyone have any insights, please?
Thanks in advance,
Steve