Doug Cutting wrote:

Grant Ingersoll wrote:

Do you see any reason to write position information at all for the term vectors?


It could be useful to some folks, for example if you only want to expand a query with terms that occur near query terms, as in automatic phrase identification. In general, the vector stuff is just a constant-factor improvement over re-tokenizing the text of the document, but hopefully a substantial one. If folks are doing computations which require positional information but don't require the actual text (e.g., they don't need user-readable fragments), then positions could be handy.

But, certainly, most applications for term vectors do not need positions, and I would not be upset if these were left out of the first version.
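
To make the "terms that occur near query terms" case concrete, here is a minimal sketch. It assumes a term vector with positions can be read back as a plain mapping from term to a list of token positions; the function name terms_near, the window parameter, and the sample data are all hypothetical and are not part of any Lucene API.

```python
from collections import Counter


def terms_near(term_positions, query_terms, window=2):
    """Count occurrences of non-query terms within `window` tokens of a query term.

    term_positions: dict mapping term -> list of token positions in one document
                    (an assumed, simplified view of a term vector with positions).
    query_terms:    set of terms taken from the user's query.
    """
    # Gather every position at which any query term occurs in this document.
    query_positions = [p for t in query_terms for p in term_positions.get(t, [])]

    counts = Counter()
    for term, positions in term_positions.items():
        if term in query_terms:
            continue  # only candidate expansion terms are counted
        for p in positions:
            if any(abs(p - q) <= window for q in query_positions):
                counts[term] += 1
    return counts


# Hypothetical vector for the text:
# "apache lucene stores term vectors for each indexed document"
vector = {
    "apache": [0], "lucene": [1], "stores": [2], "term": [3], "vectors": [4],
    "for": [5], "each": [6], "indexed": [7], "document": [8],
}
near = terms_near(vector, {"lucene"}, window=1)
```

Nothing here touches the document text, which is the point: only the stored positions are needed to find candidate expansion terms.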

Forgive me for being thick, but what position information are we talking about here? The start and end position of the token in the source text that the term came from? If so, I think it would be useful to have them at some point, as I believe they could be used to optimize the query highlighting code that Mark Harwood contributed, so that it does not have to reanalyze the text every time one wants to generate a highlighted search summary.
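
As a rough illustration of what I mean, here is a sketch of highlighting driven by stored offsets instead of re-analysis. It assumes the vector can hand back, for each term, a list of (start, end) character offsets into the stored text; the highlight function, its parameters, and the sample data are hypothetical and do not reflect the contributed highlighter or any existing Lucene API.

```python
def highlight(text, term_offsets, query_terms, pre="<b>", post="</b>"):
    """Wrap every occurrence of a query term in pre/post markers.

    text:         the stored document text.
    term_offsets: dict mapping term -> list of (start, end) character offsets
                  (an assumed, simplified view of a term vector with offsets).
    query_terms:  set of terms taken from the user's query.
    """
    # Collect the spans to mark, in document order.
    spans = sorted(
        (start, end)
        for term in query_terms
        for (start, end) in term_offsets.get(term, [])
    )

    out, cursor = [], 0
    for start, end in spans:
        if start < cursor:
            continue  # skip a span that overlaps one already emitted
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hypothetical stored text and offsets.
text = "Term vectors store terms"
offsets = {
    "term": [(0, 4)],
    "vectors": [(5, 12)],
    "store": [(13, 18)],
    "terms": [(19, 24)],
}
summary = highlight(text, offsets, {"vectors"})
```

The analyzer never runs at search time in this sketch; the cost is the extra space needed to store the offsets alongside each term.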


Regards,

Bruce Ritchie
http://www.jivesoftware.com/
