On Jun 28, 2007, at 9:06 AM, Samuel LEMOINE wrote:
Grant Ingersoll wrote:
On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote:
Thanks for the resources about payloads, I'll have a look at them.
About the positions/offsets in .tvf, please tell me if I've understood
correctly:
The .
(quote)
Field (.tvf) --> TVFVersion<NumTerms, Position/Offset, TermFreqs>^NumFields   // this structure is repeated for each Field
TVFVersion --> Int
NumTerms --> VInt
Position/Offset --> Byte
TermFreqs --> <TermText, TermFreq, Positions?, Offsets?>^NumTerms   // this structure is repeated for each Term of each Field
TermText --> <PrefixLength, Suffix>
PrefixLength --> VInt
Suffix --> String
TermFreq --> VInt
Positions --> <VInt>^TermFreq   // this "Position" data appears once per occurrence of each Term of each Field... but as far as I know, TermFreq is the number of occurrences of a Term across all documents, however many there are (not sure about that, actually)
Offsets --> <VInt, VInt>^TermFreq
(/quote)
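To make the VInt and TermText --> <PrefixLength, Suffix> entries above concrete, here is a minimal sketch. It decodes a VInt (7 data bits per byte, low-order group first, high bit set on every byte except the last) and rebuilds sorted term texts from (PrefixLength, Suffix) pairs. The class and method names are hypothetical, not Lucene's API. The suffixes are passed in as already-decoded Java strings; the sketch does not read the on-disk String encoding.

```java
import java.util.ArrayList;
import java.util.List;

public class TermTextSketch {

    // Decodes one VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Rebuilds each term from the number of leading chars it shares
    // with the previous term (PrefixLength) plus its own Suffix.
    static List<String> decodeTerms(int[] prefixLengths, String[] suffixes) {
        List<String> terms = new ArrayList<>();
        String previous = "";
        for (int i = 0; i < suffixes.length; i++) {
            String term = previous.substring(0, prefixLengths[i]) + suffixes[i];
            terms.add(term);
            previous = term;
        }
        return terms;
    }

    public static void main(String[] args) {
        // 300 = 0x12C: low 7 bits 0x2C plus continuation bit -> 0xAC, then 0x02
        byte[] encoded = {(byte) 0xAC, 0x02, 0x05};
        int[] pos = {0};
        System.out.println(readVInt(encoded, pos)); // 300
        System.out.println(readVInt(encoded, pos)); // 5

        // "apple", "apply", "apricot" stored with shared prefixes
        int[] prefixes = {0, 4, 2};
        String[] suffixes = {"apple", "y", "ricot"};
        System.out.println(decodeTerms(prefixes, suffixes)); // [apple, apply, apricot]
    }
}
```

Because terms are stored in sorted order, neighbouring terms tend to share long prefixes. As a result, most entries only need a small PrefixLength and a short Suffix.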
I doubt that the "TermFreq" found in this description is the same as
the one found in the Frequencies section
(http://lucene.apache.org/java/2_2_0/fileformats.html#Frequencies).
You are correct, it is not the same; hence its definition as:
TermFreq -> VInt
right above the Positions declaration and below the definition of
Suffix in this section (the Term Vectors section). Thus Positions is,
essentially, an array of VInts with one entry for each time the term
occurs in that particular document.
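The following is a minimal sketch of that per-document view. It assumes a document has already been tokenized into an array of strings, with positions counted from 0; the class and method names are hypothetical. Collecting each term's positions within one document gives the Positions array, and its length is the TermFreq written to the term vector. No other document affects these counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorSketch {

    // Maps each term of ONE document to the positions where it occurs.
    // TreeMap keeps terms sorted, like the TermFreqs list in .tvf.
    static Map<String, List<Integer>> termPositions(String[] tokens) {
        Map<String, List<Integer>> vector = new TreeMap<>();
        for (int position = 0; position < tokens.length; position++) {
            vector.computeIfAbsent(tokens[position], t -> new ArrayList<>()).add(position);
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] doc = {"to", "be", "or", "not", "to", "be"};
        for (Map.Entry<String, List<Integer>> e : termPositions(doc).entrySet()) {
            // TermFreq is simply the number of positions recorded in this document.
            System.out.println(e.getKey() + " freq=" + e.getValue().size()
                    + " positions=" + e.getValue());
        }
    }
}
```

Here "to" and "be" each get TermFreq 2 because they occur twice in this one document, whatever the rest of the index contains.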
Here is the relevant code in TermVectorsReader with my comments:
int freq = tvf.readVInt(); // notice it is not decoding any DocDelta, it is just getting a VInt
int[] positions = null;
if (storePositions) { // read in the positions
  positions = new int[freq];
  int prevPosition = 0;
  for (int j = 0; j < freq; j++) { // for each occurrence of this term in this document
    positions[j] = prevPosition + tvf.readVInt(); // read in the delta-encoded position
    prevPosition = positions[j];
  }
}
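For illustration, here is a self-contained round trip that mirrors the reader loop above. A hypothetical writer emits the frequency as a VInt, then writes each position as its difference from the previous one, and the reader adds the deltas back up. This is a sketch only. It uses a plain byte array instead of Lucene's IndexInput/IndexOutput and models only the freq-plus-positions part of the record. The class and method names are not Lucene's.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PositionDeltaSketch {

    // Writes an int as a VInt: 7 bits per byte, high bit set when more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads one VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Writes freq, then each (ascending) position as a delta from the previous one.
    static byte[] encode(int[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, positions.length);
        int prevPosition = 0;
        for (int p : positions) {
            writeVInt(out, p - prevPosition);
            prevPosition = p;
        }
        return out.toByteArray();
    }

    // Same logic as the reader excerpt: read freq, then accumulate the deltas.
    static int[] decode(byte[] buf) {
        int[] pos = {0};
        int freq = readVInt(buf, pos);
        int[] positions = new int[freq];
        int prevPosition = 0;
        for (int j = 0; j < freq; j++) {
            positions[j] = prevPosition + readVInt(buf, pos);
            prevPosition = positions[j];
        }
        return positions;
    }

    public static void main(String[] args) {
        int[] positions = {3, 10, 200};
        byte[] bytes = encode(positions);
        System.out.println(Arrays.toString(bytes));         // [3, 3, 7, -66, 1]
        System.out.println(Arrays.toString(decode(bytes))); // [3, 10, 200]
    }
}
```

Delta encoding keeps the stored numbers small: position 200 costs two bytes as an absolute value, but its delta of 190 from position 10 is what actually gets written.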
-Grant