On Jun 28, 2007, at 9:06 AM, Samuel LEMOINE wrote:

Grant Ingersoll wrote:

On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote:
Thanks for the resources about payloads, I'll have a look over it.
About the positions/offsets in .tvf, please tell me if I've understood correctly:
The .
(quote)
Field (.tvf) --> TVFVersion<NumTerms, Position/Offset, TermFreqs> ^NumFields // this structure is repeated for each Field
TVFVersion --> Int
NumTerms --> VInt
Position/Offset --> Byte
TermFreqs --> <TermText, TermFreq, Positions?, Offsets?> ^NumTerms //this structure is repeated for each Term of each Field
TermText --> <PrefixLength, Suffix>
PrefixLength --> VInt
Suffix --> String
TermFreq --> VInt
Positions --> <VInt>^TermFreq //this "Position" data appears once per occurrence of each Term of each Field... but as far as I know, TermFreq is the number of occurrences of a Term across all documents, regardless of how many documents there are (not sure of that, actually)
Offsets --> <VInt, VInt>^TermFreq
(/quote)

I doubt that the "TermFreq" found in this description is the same as the one found in the Frequencies section (http://lucene.apache.org/java/2_2_0/fileformats.html#Frequencies):

You are correct, it is not the same, hence the definition of it as:
TermFreq -> VInt

right above the Positions declaration and below the definition of Suffix in this section (the Term Vectors section). Thus, Positions is essentially an array of VInts, with one entry for each time the term occurs in that particular document.

Here is the relevant code in TermVectorsReader with my comments:
      int freq = tvf.readVInt(); // notice it is not decoding any docdelta, it is just getting a VInt
      int[] positions = null;
      if (storePositions) { // read in the positions
        positions = new int[freq];
        int prevPosition = 0;
        for (int j = 0; j < freq; j++) // for each occurrence of this term in this document
        {
          positions[j] = prevPosition + tvf.readVInt(); // read in the delta-encoded position
          prevPosition = positions[j];
        }
      }
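
To make the delta encoding concrete, here is a minimal, self-contained sketch of the same idea outside of Lucene. It is not the TermVectorsReader code: the class name TermVectorPositionsSketch and the helpers encodePositions and decodePositions are hypothetical, and an in-memory byte array stands in for the .tvf stream. It assumes the usual VInt layout (7 data bits per byte, low-order group first, high bit set when more bytes follow) and writes only TermFreq followed by the position deltas for a single term in a single document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
final class TermVectorPositionsSketch {

    // Write a non-negative int as a VInt: 7 data bits per byte, low-order
    // group first, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Read one VInt; ByteBuffer.get() throws BufferUnderflowException
    // if the data is truncated.
    static int readVInt(ByteBuffer in) {
        byte b = in.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encode TermFreq, then each position as the difference from the
    // previous position. Positions must be in ascending order.
    static byte[] encodePositions(int[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, positions.length); // TermFreq for this term in this document
        int prevPosition = 0;
        for (int position : positions) {
            writeVInt(out, position - prevPosition);
            prevPosition = position;
        }
        return out.toByteArray();
    }

    // Mirror of the loop above: read TermFreq, then sum up the deltas.
    static int[] decodePositions(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int freq = readVInt(in);
        int[] positions = new int[freq];
        int prevPosition = 0;
        for (int j = 0; j < freq; j++) {
            positions[j] = prevPosition + readVInt(in);
            prevPosition = positions[j];
        }
        return positions;
    }

    public static void main(String[] args) {
        byte[] bytes = encodePositions(new int[] {3, 7, 200});
        System.out.println(Arrays.toString(decodePositions(bytes))); // prints [3, 7, 200]
    }
}
```

In this sketch the term occurs three times in the document, so TermFreq is 3 and the stored deltas are 3, 4 and 193; the last delta needs two bytes, which is why the whole record takes five bytes.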


-Grant


