Hi all,

I'm trying to index documents so that a) all the documents are indexed 'normally' (in that I can search for documents that match certain words), and b) parts of the document that I consider important, such as author and title, are ALSO stored in their own indexed fields.

I have (a) working fine, and (b) is almost working. However, I'm trying to force the separate field to keep the original offsets of where its content appeared in the text. For example, if the title occupied characters 76-200 of the original text, I'd like the field's tokens to carry those offsets, so that when I look at the field I can quickly find the corresponding place in the document.

I don't seem to be able to do this - I have my own analyzer that finds the tokens and sets the start and end offsets accordingly. However, when I create the new field and write it to the index, it seems like these offsets are ignored? When I pull offsets out later, they start at 0 and move up from there.
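
To make the intent concrete, here is a minimal sketch in plain Java of the offsets I expect to get back. It deliberately uses no Lucene classes and is not my actual analyzer: OffsetToken, OffsetSketch and tokenizeWithBase are names made up for this illustration, and the whitespace splitting is only an assumed stand-in for the real tokenization. The point is just that every token's offsets are shifted by the position where the field's text begins in the full document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder: one token plus its character offsets in the ORIGINAL document. */
final class OffsetToken {
    final String text;
    final int start; // inclusive, relative to the full document
    final int end;   // exclusive, relative to the full document

    OffsetToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return text + "[" + start + "," + end + ")";
    }
}

final class OffsetSketch {
    /**
     * Splits fieldValue on whitespace and shifts every offset by baseOffset,
     * i.e. the character position where fieldValue begins in the full document.
     */
    static List<OffsetToken> tokenizeWithBase(String fieldValue, int baseOffset) {
        List<OffsetToken> tokens = new ArrayList<>();
        int i = 0;
        int n = fieldValue.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(fieldValue.charAt(i))) {
                i++;
            }
            int tokenStart = i;
            // Consume one run of non-whitespace characters.
            while (i < n && !Character.isWhitespace(fieldValue.charAt(i))) {
                i++;
            }
            if (i > tokenStart) {
                tokens.add(new OffsetToken(fieldValue.substring(tokenStart, i),
                                           baseOffset + tokenStart,
                                           baseOffset + i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A title that starts at character 76 of the original text.
        for (OffsetToken t : tokenizeWithBase("Moby Dick", 76)) {
            System.out.println(t);
        }
    }
}
```

With a base offset of 76, the first token should come back with a start offset of 76 rather than 0, which is the behaviour I am hoping to see from the index.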

I am creating the field like this:

CASAnnotationAnalyzer psa = new CASAnnotationAnalyzer();
analyzer.addAnalyzer(info.indexName, psa);

TokenStream ts = psa.tokenStream(info.indexName,
                                 new StringReader(info.value));
Field stemF = new Field(info.indexName, ts,
                        Field.TermVector.WITH_POSITIONS_OFFSETS);
d.add(stemF);

(d is the document being indexed).

I have tried various permutations of creating the field and token stream - does anyone have any insights, please?

Thanks in advance,
Steve
