It is the location of the token in the document (see IndexReader.termPositions()). This information is already stored in other parts of the index; it just isn't very efficient to get at.
I think it would be useful to add to IndexReader a way to get a list of positions given a term and a document; then we wouldn't have to store this info twice. Something like:

    TermPositions termPositions(Term term, Document doc);

which would return the subset of IndexReader.termPositions(Term term) containing only those positions that are in the given Document. This would need to be implemented efficiently, not just by the brute-force method of looping over termPositions(Term term). I don't know how easy this would be to do, as I am not familiar with the file structure of the position information.

At least that is my understanding of it; perhaps others have more insight.

-Grant

>>> [EMAIL PROTECTED] 02/24/04 04:20PM >>>
Doug Cutting wrote:
> Grant Ingersoll wrote:
>
>> Do you see any reason to write position information at all for the
>> term vectors?
>
> It could be useful to some folks. If, for example, you only want to
> expand a query with terms that occur near query terms, like automatic
> phrase identification. In general, the vector stuff is just a constant
> factor improvement over re-tokenizing the text of the document, but
> hopefully a substantial one. If folks are doing computations which
> require positional information, but don't require the actual text (e.g.,
> they don't need user-readable fragments) then positions could be handy.
>
> But, certainly, most applications for term vectors do not need
> positions, and I would not be upset if these were left out of the first
> version.

Forgive me for being thick, but what position information are we talking about here? The start and end position of the token in the source text that the term came from? If so, I think it would be useful to have them in at some point, as I believe they could be used to optimize the query highlighting code that Mark Harwood contributed, so that it does not have to reanalyze the text every time one wants to generate a highlighted search summary.

Regards,
Bruce Ritchie
http://www.jivesoftware.com/
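
For illustration only, here is a minimal sketch of the brute-force lookup discussed above: walk every posting for a term and keep the positions for a single document. It does not use Lucene at all. The Posting class and the positionsInDoc method are hypothetical stand-ins, and the in-memory list is an assumption standing in for what IndexReader.termPositions(Term term) iterates over. The point is only to show why this approach costs time proportional to the number of documents containing the term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TermPositionsSketch {

    /** Hypothetical stand-in for one posting: a document number and the term's positions in it. */
    static final class Posting {
        final int doc;
        final int[] positions;

        Posting(int doc, int... positions) {
            this.doc = doc;
            this.positions = positions;
        }
    }

    /**
     * Brute force: scan the posting list for one term (assumed sorted by
     * document number) until the requested document is found.
     */
    static int[] positionsInDoc(List<Posting> postingsForTerm, int targetDoc) {
        for (Posting p : postingsForTerm) {
            if (p.doc == targetDoc) {
                return p.positions.clone();
            }
            if (p.doc > targetDoc) {
                break; // sorted by doc, so the term does not occur in targetDoc
            }
        }
        return new int[0];
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting(0, 3, 17));
        postings.add(new Posting(4, 1));
        postings.add(new Posting(9, 2, 8, 40));

        // prints [2, 8, 40]
        System.out.println(Arrays.toString(positionsInDoc(postings, 9)));
    }
}
```

An efficient version of the proposed method would need some way to jump directly to the target document's entry rather than scanning from the start, which is exactly the part that depends on the on-disk layout of the position data.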