Re: Use case for term vector's token position/offset?

Grant Ingersoll Tue, 21 Nov 2006 19:20:06 -0800

Hi Jong,

I think these are useful for things like highlighting (I think contrib/highlighter can use them) and for other post-processing algorithms such as question answering or calculating co-occurrences (e.g., find the 6 terms to the left and right of the term at position 16). Perhaps you want to give higher scores to documents where your terms occur in a certain part of the document (like the beginning).
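To make the co-occurrence idea concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: PositionalTermVector is a hypothetical stand-in holding, for each term, the token positions that a WITH_POSITIONS term vector would record. Inverting that into a position -> term map lets you pull out the terms surrounding any given position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a positional term vector: for each distinct term,
 * the token positions at which it occurs (the information WITH_POSITIONS stores).
 */
final class PositionalTermVector {
    // position -> term, kept sorted so a range of positions can be read in order
    private final TreeMap<Integer, String> byPosition = new TreeMap<>();

    PositionalTermVector(Map<String, int[]> termPositions) {
        // Invert term -> positions into position -> term so neighbours can be found.
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey());
            }
        }
    }

    /** Terms within `radius` positions left and right of `center`, in position order. */
    List<String> window(int center, int radius) {
        List<String> out = new ArrayList<>();
        // Inclusive range [center - radius, center + radius]; the centre token itself is skipped.
        for (Map.Entry<Integer, String> e :
                byPosition.subMap(center - radius, true, center + radius, true).entrySet()) {
            if (e.getKey() != center) {
                out.add(e.getValue());
            }
        }
        return out;
    }
}
```

For the example above, window(16, 6) would return the terms at positions 10-15 and 17-22. Positions with no stored term (for instance, stop words dropped by the analyzer) are simply skipped, and the terms are the analyzed forms, not the raw text.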

Really, any application where you need to know the relationships between the terms in a document, or between the document and the original text.
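Offsets are what tie the terms back to the original text. A minimal highlighting sketch, again plain Java rather than contrib/highlighter or any Lucene API: TokenOffset and OffsetHighlighter are hypothetical names, and the code assumes you already have the start/end character offsets of the matching tokens (what WITH_OFFSETS records) plus the original stored text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one token's character offsets in the original text. */
final class TokenOffset {
    final int start; // inclusive
    final int end;   // exclusive

    TokenOffset(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class OffsetHighlighter {
    /**
     * Wraps each [start, end) span of `original` in `pre`/`post`.
     * Assumes the spans do not overlap; they may be passed in any order.
     */
    static String highlight(String original, List<TokenOffset> hits, String pre, String post) {
        List<TokenOffset> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((TokenOffset o) -> o.start));
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (TokenOffset o : sorted) {
            sb.append(original, cursor, o.start) // untouched text before the hit
              .append(pre)
              .append(original, o.start, o.end)  // the hit itself, copied from the original
              .append(post);
            cursor = o.end;
        }
        sb.append(original.substring(cursor)); // remainder after the last hit
        return sb.toString();
    }
}
```

Because the offsets point into the original string, the highlighted output keeps the original casing and punctuation even though the indexed terms may have been lowercased or stemmed.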


HTH,
Grant

On Nov 21, 2006, at 10:36 AM, Jong Kim wrote:

Hi,

When I look at org.apache.lucene.document.Field.TermVector,
I see it defines the following 5 options controlling how much
detail is stored with term vectors.

1. NO
2. WITH_OFFSETS
3. WITH_POSITIONS
4. WITH_POSITIONS_OFFSETS
5. YES

It isn't difficult to understand where the basic term vector
information (i.e., terms and their number of occurrences, option 5)
might be useful. I believe it can be used to implement features
like "concept search" or "more like this" functionality.

However, it isn't clear to me how the extra information (i.e.,
token position information and/or token offset information)
might be used. Can anyone help me understand what kind of
(advanced) search techniques people use this extra
information for, or even better, point me to real-world
examples?

Thanks
/Jong


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org




