Offset Questions

Steve Suppe Fri, 07 Mar 2008 10:39:14 -0800

Hi all,

I'm trying to index documents so that (a) all the documents are indexed 'normally' (in that I can search for documents that match certain words), and (b) parts of the document that I consider important, such as author and title, are ALSO stored in their own indexed fields.

I have (a) working fine, and (b) is almost working - however, I'm trying to force the separate field to have the original offsets of where it existed in the text. As in, if the title was at characters 76-200 in the original text, I'd like the field to have that as its information, so when I look at the field I can find the place in the document quickly.
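To make the goal concrete, here is a minimal sketch in plain Java of the offsets being asked for. It does not use Lucene at all: OffsetSketch, Span, and tokenize are hypothetical names made up for illustration, and the whitespace splitting is only an assumption standing in for whatever the real analyzer does. The point is just that each token's offsets are shifted by the position where the field value began in the original text.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {

    /** A token plus its character span in the ORIGINAL document text (hypothetical type). */
    public static final class Span {
        public final String term;
        public final int start; // inclusive offset in the original text
        public final int end;   // exclusive offset in the original text

        Span(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits fieldValue on whitespace and shifts every token's offsets by
     * baseOffset, the position where fieldValue began in the original text.
     */
    public static List<Span> tokenize(String fieldValue, int baseOffset) {
        List<Span> spans = new ArrayList<Span>();
        int i = 0;
        int n = fieldValue.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(fieldValue.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one token.
            while (i < n && !Character.isWhitespace(fieldValue.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new Span(fieldValue.substring(start, i),
                                   baseOffset + start, baseOffset + i));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assume the title started at character 76 of the original text.
        for (Span s : tokenize("Offset Questions", 76)) {
            System.out.println(s.term + " [" + s.start + "," + s.end + ")");
        }
    }
}
```

With a base offset of 76, the first token comes back as 76-82 rather than 0-6, which is the behaviour wanted from the separate field.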

I don't seem to be able to do this - I have my own analyzer that finds the tokens and sets the start and end offsets accordingly. However, when I create the new field and write it to the index, it seems like these offsets are ignored? When I pull offsets out later, they start at 0 and move up from there.


I am creating the field like:

CASAnnotationAnalyzer psa = new CASAnnotationAnalyzer();
analyzer.addAnalyzer(info.indexName, psa);

TokenStream ts = psa.tokenStream(info.indexName,
                                 new StringReader(info.value));
Field stemF = new Field(info.indexName, ts,
                        Field.TermVector.WITH_POSITIONS_OFFSETS);
d.add(stemF);

(d is the document being indexed).

I have tried various permutations of creating the field and token stream - does anyone have any insights, please?


Thanks in advance,
Steve
