On Fri, 2001-10-19 at 17:01, Doug Cutting wrote:
> > Rather than highlight terms, I would just extract the first hit token,
> > and a certain number of characters either side of it.
>
> I think this is the best approach. Since you'll probably only be displaying
> around ten hits at a time, the cost of re-tokenizing is fairly small.
> Please consider contributing your code when it is complete.

I'm trying to implement this and should be able to contribute any
successful results, but I need to produce context on a per-field basis.
E.g. if I got a token hit in the text body of a document, but the first
hit token was a word in the section title, I'd want to generate context
around the token in the text body.

I had been using a TokenStream to try this. However, Lucene's Token
class doesn't seem to have any concept of fields (even when I call
tokenStream() on a document that is in the index with a whole bunch of
fields). Is there any reason for this? Moreover, any suggestions on how
to find the information I need?

The natural thing seems to be to have a field-aware token stream, but
I'm not sure how I'd go about implementing that...

Regards,

-- 
Lee Mallabone
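
P.S. To make the per-field requirement concrete, here is a minimal sketch of the behaviour I'm after, written in plain Java with no Lucene classes at all. Everything in it is hypothetical: the class name FieldContext, the methods contextFor() and contextByField(), and the assumption that each field's original text is available as a String keyed by field name. It matches the query term with a regular expression rather than an analyzer, so it only illustrates the idea and is not how Lucene tokenizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: none of these names come from Lucene.
class FieldContext {

    // Returns up to `radius` characters either side of the first
    // whole-word, case-insensitive occurrence of `term` in `text`,
    // or null if the term does not occur in this text.
    static String contextFor(String text, String term, int radius) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return null;
        }
        int start = Math.max(0, m.start() - radius);
        int end = Math.min(text.length(), m.end() + radius);
        return text.substring(start, end);
    }

    // Assumes the caller supplies field name -> field text. Each field is
    // searched separately, so a hit in the title never hides a hit in the
    // body. Fields without a hit are left out of the result.
    static Map<String, String> contextByField(Map<String, String> fields,
                                              String term, int radius) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String ctx = contextFor(e.getValue(), term, radius);
            if (ctx != null) {
                out.put(e.getKey(), ctx);
            }
        }
        return out;
    }
}
```

Calling contextByField() with a "title" and a "body" entry returns one context string per field that contains the term, which is the piece of information a field-unaware token stream doesn't give me.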