I'm not sure what applications people have in mind for Term Vector support, but I
would prefer to have the original text positions (not term sequence positions) stored,
so that I can offer the following:
1) Significant terms/phrases identification
Like "Gigabits" on gigablast.com: used to offer the end user a choice of (unstemmed)
"significant" terms and phrases for query expansion.
2) Optimised Highlighting
No more re-tokenizing of text to find the unstemmed forms.
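
A minimal sketch of the highlighting case, assuming the term vector stores, for each
stemmed term, the start and end offsets of every occurrence in the original text. The
names TermOccurrence and highlight, and the dict layout, are hypothetical illustrations
and not an existing API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermOccurrence:
    start: int  # offset in the original text where the unstemmed word begins
    end: int    # offset one past the last character of the unstemmed word


def highlight(text, term_vector, query_terms, pre="<b>", post="</b>"):
    """Wrap every occurrence of the query terms, using stored offsets only."""
    # Collect the spans of all matching terms; no tokenizer or stemmer is run here.
    spans = sorted(
        (occ.start, occ.end)
        for term in query_terms
        for occ in term_vector.get(term, [])
    )
    out = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            continue  # skip a span that overlaps one already emitted
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hypothetical term vector for one document: stemmed term -> text offsets.
text = "Running dogs run faster"
term_vector = {
    "run": [TermOccurrence(0, 7), TermOccurrence(13, 16)],
    "dog": [TermOccurrence(8, 12)],
    "fast": [TermOccurrence(17, 23)],
}
highlighted = highlight(text, term_vector, ["run"])
```

Because the offsets point into the original text, the unstemmed forms ("Running",
"run") can be sliced out directly; with term sequence positions alone the text would
have to be tokenized again to find them.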

The current "more like this" query can be optimised if it uses TermVectors too: it
simply takes a document ID and obtains a list of significant terms without the need
to re-tokenize (it doesn't need to know any positions, just term frequencies).
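
A minimal sketch of that lookup, assuming per-document term frequencies are already
stored and document frequencies are available from the index. The tf * idf weighting,
the function name significant_terms and the sample numbers are illustrative assumptions,
not a description of how the existing query scores terms:

```python
import math


def significant_terms(doc_id, term_freqs, doc_freqs, num_docs, max_terms=2):
    """Rank a document's terms by tf * idf using stored frequencies only."""
    scored = []
    for term, tf in term_freqs[doc_id].items():
        # Rare terms get a higher weight; this idf form is an assumed choice.
        idf = math.log(num_docs / doc_freqs[term]) + 1.0
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


# Hypothetical stored data: doc ID -> {term: frequency in that document}.
term_freqs = {7: {"lucene": 4, "the": 9, "vector": 3, "index": 1}}
# Hypothetical index statistics: term -> number of documents containing it.
doc_freqs = {"lucene": 10, "the": 1000, "vector": 5, "index": 200}

terms = significant_terms(7, term_freqs, doc_freqs, num_docs=1000)
```

No positions of either kind are consulted here; the document text is never read.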

Am I missing something, or are there other applications where term sequence position is
more useful than term text position?
