Hi Jong,

I think these are useful for things like highlighting (I believe contrib/highlighter can use them) and for other post-processing algorithms such as question answering or calculating co-occurrences (e.g., find the 6 terms to the left and right of the term at position 16). You might also want to give higher scores to documents where your terms occur in a certain part of the document (like the beginning).

Really, they help in any application where you need to know the relationships between the terms in a document, or between the document and the original text.
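
To make the co-occurrence and highlighting cases a bit more concrete, here is a rough sketch in plain Java. It deliberately uses no Lucene classes: the class name TermVectorSketch, the methods window and highlight, and the sample sentence are all made up for illustration, and the arrays just stand in for what a term vector stored with positions and offsets would hand back. Treat it as one possible way to use that data, not as how Lucene or the highlighter does it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain arrays stand in for stored term vector data.
class TermVectorSketch {

    // Positions use case: return the terms within `radius` token positions
    // to the left and right of `center`. termsByPosition[i] is the term at
    // token position i. The center term itself is not included.
    static List<String> window(String[] termsByPosition, int center, int radius) {
        List<String> neighbors = new ArrayList<>();
        int start = Math.max(0, center - radius);
        int end = Math.min(termsByPosition.length - 1, center + radius);
        for (int i = start; i <= end; i++) {
            if (i != center) {
                neighbors.add(termsByPosition[i]);
            }
        }
        return neighbors;
    }

    // Offsets use case: wrap the character range [startOffset, endOffset)
    // of the original text in brackets, as a highlighter might.
    static String highlight(String text, int startOffset, int endOffset) {
        return text.substring(0, startOffset)
                + "[" + text.substring(startOffset, endOffset) + "]"
                + text.substring(endOffset);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        String[] terms = text.split(" ");

        // Neighbors of "fox" (token position 3) within 2 positions.
        System.out.println(window(terms, 3, 2));

        // "fox" starts at character offset 16 and ends before offset 19.
        System.out.println(highlight(text, 16, 19));
    }
}
```

The point is that positions count tokens (good for "what is near this term?"), while offsets count characters in the original text (good for marking up the text you show to the user).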

HTH,
Grant

On Nov 21, 2006, at 10:36 AM, Jong Kim wrote:

Hi,

When I look at org.apache.lucene.document.Field.TermVector,
it defines the following 5 options as to the detailed info
that can be stored wrt term vectors.

1. NO
2. WITH_OFFSETS
3. WITH_POSITIONS
4. WITH_POSITIONS_OFFSETS
5. YES

It isn't difficult to understand where the basic term vector
information (i.e., terms and their number of occurrences - option 5)
might be useful. I believe it can be used to implement features
like "concept search" or "more like this" functionalities.

However, it isn't clear to me how the other extra info (i.e.,
token position information and/or token offset information)
might be used. Can anyone help me understand what kind of
(advanced) search techniques people use this extra
information for, or even better, give any pointers to real-world
examples?

Thanks
/Jong


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org



