On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote:
Thanks for the resources about payloads, I'll have a look over it.
About the positions/offsets in .tvf, please tell me if I've well
understood:
The .tvd provides the needed informations concerning the
occurrences of each term in documents, and thanks to these
informations, Lucene is able to determinate how many documents
contain the term "foo".
Not exactly, Term Vectors only could tell you how many times foo
occurs in a particular, known document. If you are looking for
general information on a Term and the documents it occurs in (i.e.
the inverted index) have a look at the TermEnum and TermDocs.
Thus the position/offset data contained in .tvf can just consist in
a list of positions in the different documents containing "foo"
concatenated ? I mean, if foo appears in positions 1,30,65 in doc
0, and positions 27 & 52 in doc 2, the .tvf will give "1 30 65 27
52" and Lucene rests on .tvd to determine which positions belongs
to which document? (or rather "1 29 35 27 25" as it is delta-
positions)
No, you only could find out about doc 0 or doc 2 separately using
TermVectors.
HTH,
Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]