I'm not sure what applications people have in mind for Term Vector support, but I would prefer to have the original text positions (not term sequence positions) stored so I can offer this:

1) Significant terms/phrases identification
Like "Gigabits" on gigablast.com - used to offer choices of (unstemmed) "significant" terms and phrases for query expansion to the end user.

2) Optimised Highlighting
No more re-tokenizing of text to find unstemmed forms.

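
To make the highlighting case concrete, here is a rough sketch in Python of what I mean. It is not Lucene code: build_offset_vector, naive_stem and highlight are hypothetical names, and the "stemmer" is a crude stand-in. The only point is that once the character offsets are stored against each stemmed term, highlighting slices the original text and never re-tokenizes it.

```python
from collections import defaultdict


def naive_stem(word):
    # Crude stand-in for a real stemmer (assumption): strip a trailing "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_offset_vector(text, stem):
    """Index time: map each stemmed term to the (start, end) character
    offsets of its occurrences in the original text."""
    vector = defaultdict(list)
    start = None
    # The appended space flushes a token that runs to the end of the text.
    for i, ch in enumerate(text + " "):
        if ch.isalnum():
            if start is None:
                start = i
        elif start is not None:
            vector[stem(text[start:i].lower())].append((start, i))
            start = None
    return dict(vector)


def highlight(text, vector, query_terms, stem, open_tag="<b>", close_tag="</b>"):
    """Search time: wrap every occurrence of the query terms using only the
    stored offsets. The text is sliced, never tokenized again."""
    spans = sorted({span
                    for term in query_terms
                    for span in vector.get(stem(term.lower()), [])})
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(open_tag + text[start:end] + close_tag)
        last = end
    out.append(text[last:])
    return "".join(out)
```

With term sequence positions only (first term, second term, ...), highlight() would have no way to find the unstemmed surface form without running the analyzer over the text again.
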
The current "more like this" query can be optimised if it uses TermVectors too: it simply takes a document ID and obtains a list of significant terms without the need to re-tokenize (it doesn't need to know any positions, just term frequencies).

Am I missing something, or are there other applications where term sequence position is more useful than term text position?
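
For the "more like this" case, a rough sketch in Python of picking significant terms from stored term frequencies alone. This is not the existing implementation: significant_terms is a hypothetical name, the term vectors are plain dicts, and the tf * idf weighting is just one plausible choice of "significance".

```python
import math


def significant_terms(doc_id, term_vectors, top_n=3):
    """Rank the terms of one document by tf * idf, using only the stored
    term frequencies. No positions and no re-tokenizing are needed.

    term_vectors: {doc_id: {term: frequency}} (assumed in-memory layout)
    """
    n_docs = len(term_vectors)

    # Document frequency: in how many documents each term occurs.
    doc_freq = {}
    for vector in term_vectors.values():
        for term in vector:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = {
        term: tf * math.log(n_docs / doc_freq[term])
        for term, tf in term_vectors[doc_id].items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

The terms returned for a document ID could then be turned into the query that finds similar documents.
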