cutting 2004/03/30 09:28:11
Modified: docs fileformats.html
xdocs fileformats.xml
Log:
Fixed a few problems with file format doc.
Revision Changes Path
1.23 +15 -32 jakarta-lucene/docs/fileformats.html
Index: fileformats.html
===================================================================
RCS file: /home/cvs/jakarta-lucene/docs/fileformats.html,v
retrieving revision 1.22
retrieving revision 1.23
diff -u -r1.22 -r1.23
--- fileformats.html 29 Mar 2004 22:30:40 -0000 1.22
+++ fileformats.html 30 Mar 2004 17:28:11 -0000 1.23
@@ -1350,7 +1350,7 @@
<TermInfo><sup>TermCount</sup>
</p>
<p>TermInfo -->
- <Term, DocFreq, FreqDelta, ProxDelta>
+ <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta>
</p>
<p>Term -->
<PrefixLength, Suffix, FieldNum>
@@ -1359,7 +1359,7 @@
String
</p>
<p>PrefixLength,
- DocFreq, FreqDelta, ProxDelta<br /> --> VInt
+ DocFreq, FreqDelta, ProxDelta, SkipDelta<br />
--> VInt
</p>
<p>This
file is sorted by Term. Terms are ordered first
lexicographically
@@ -1394,6 +1394,13 @@
this term's data in that file and the position of the
previous
term's data (or zero, for the first term in the file.
</p>
+ <p>SkipDelta determines the position of this
+ term's SkipData within the .frq file. In
+ particular, it is the number of bytes
+ after TermFreqs that the SkipData starts.
+ In other words, it is the length of the
+ TermFreq data.
+ </p>
</li>
<li>
<p>
@@ -1451,8 +1458,7 @@
document.
</p>
<p>FreqFile (.frq) -->
- <TermFreqs><sup>TermCount</sup>
- <SkipDatum><sup>TermCount/SkipInterval</sup>
+ <TermFreqs, SkipData><sup>TermCount</sup>
</p>
<p>TermFreqs -->
<TermFreq><sup>DocFreq</sup>
@@ -1460,7 +1466,10 @@
<p>TermFreq -->
DocDelta, Freq?
</p>
- <p>SkipDatum -->
+ <p>SkipData -->
+ <SkipDatum><sup>DocFreq/SkipInterval</sup>
+ </p>
+ <p>SkipDatum -->
DocSkip,FreqSkip,ProxSkip
</p>
<p>DocDelta,Freq,DocSkip,FreqSkip,ProxSkip -->
@@ -1497,7 +1506,7 @@
relative to the start of TermFreqs and Positions,
to the previous SkipDatum in the sequence.
</p>
- <p>For example, if TermCount=35 and SkipInterval=16,
+ <p>For example, if DocFreq=35 and SkipInterval=16,
then there are two SkipData entries, containing
the 15<sup>th</sup> and 31<sup>st</sup> document
numbers in TermFreqs. The first FreqSkip names
@@ -1725,32 +1734,6 @@
billion. This is not today a problem, but, in the long term,
probably will be. These should therefore be replaced with either
UInt64 values, or better yet, with VInt values which have no limit.
- </p>
- <p>There
- are only two places where the code requires that a value be fixed
- size. These are:
- </p>
- <ol>
- <li><p>
- The FieldValuesPosition (in the stored field index file, .fdx).
- This already uses a UInt64, and so is not a problem.
- </p></li>
- <li><p>The
- TermCount (in the term info file, .tis). This is written last but
- is read when the file is first opened, and so is stored at the
- front. The indexing code first writes an zero here, then overwrites
- it after the rest of the file has been written. So unless this is
- stored elsewhere, it must be fixed size and should be changed to a
- UInt64.
- </p>
- </li>
- </ol>
- <p>Other
- than these, all UInt values could be converted to VInt to remove
- limitations.
- </p>
- <p><br /><br />
-
</p>
</blockquote>
</p>
1.11 +15 -6 jakarta-lucene/xdocs/fileformats.xml
Index: fileformats.xml
===================================================================
RCS file: /home/cvs/jakarta-lucene/xdocs/fileformats.xml,v
retrieving revision 1.10
retrieving revision 1.11
diff -u -r1.10 -r1.11
--- fileformats.xml 29 Mar 2004 23:40:31 -0000 1.10
+++ fileformats.xml 30 Mar 2004 17:28:11 -0000 1.11
@@ -923,7 +923,7 @@
<TermInfo><sup>TermCount</sup>
</p>
<p>TermInfo -->
- <Term, DocFreq, FreqDelta, ProxDelta>
+ <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta>
</p>
<p>Term -->
<PrefixLength, Suffix, FieldNum>
@@ -932,7 +932,7 @@
String
</p>
<p>PrefixLength,
- DocFreq, FreqDelta, ProxDelta<br/> --> VInt
+ DocFreq, FreqDelta, ProxDelta, SkipDelta<br/>
--> VInt
</p>
<p>This
file is sorted by Term. Terms are ordered first
lexicographically
@@ -967,6 +967,13 @@
this term's data in that file and the position of the
previous
term's data (or zero, for the first term in the file.
</p>
+ <p>SkipDelta determines the position of this
+ term's SkipData within the .frq file. In
+ particular, it is the number of bytes
+ after TermFreqs that the SkipData starts.
+ In other words, it is the length of the
+ TermFreq data.
+ </p>
</li>
<li>
<p>
@@ -1016,8 +1023,7 @@
document.
</p>
<p>FreqFile (.frq) -->
- <TermFreqs><sup>TermCount</sup>
- <SkipDatum><sup>TermCount/SkipInterval</sup>
+ <TermFreqs, SkipData><sup>TermCount</sup>
</p>
<p>TermFreqs -->
<TermFreq><sup>DocFreq</sup>
@@ -1025,7 +1031,10 @@
<p>TermFreq -->
DocDelta, Freq?
</p>
- <p>SkipDatum -->
+ <p>SkipData -->
+ <SkipDatum><sup>DocFreq/SkipInterval</sup>
+ </p>
+ <p>SkipDatum -->
DocSkip,FreqSkip,ProxSkip
</p>
<p>DocDelta,Freq,DocSkip,FreqSkip,ProxSkip -->
@@ -1062,7 +1071,7 @@
relative to the start of TermFreqs and Positions,
to the previous SkipDatum in the sequence.
</p>
- <p>For example, if TermCount=35 and SkipInterval=16,
+ <p>For example, if DocFreq=35 and SkipInterval=16,
then there are two SkipData entries, containing
the 15<sup>th</sup> and 31<sup>st</sup> document
numbers in TermFreqs. The first FreqSkip names
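
The per-term SkipData layout described in the revised text can be sketched roughly as follows. This is an illustrative sketch only, not code from Lucene: the function names are hypothetical, the division is assumed to round down (which agrees with the DocFreq=35, SkipInterval=16 example), the "15th and 31st" documents are assumed to be zero-based ordinals within TermFreqs, and freq_pointer is assumed to be the absolute .frq offset obtained by summing FreqDelta values.

```python
from typing import List


def skip_datum_count(doc_freq: int, skip_interval: int) -> int:
    """Number of SkipDatum entries for one term.

    Follows SkipData --> <SkipDatum>^(DocFreq/SkipInterval);
    rounding down is an assumption consistent with the doc's example.
    """
    return doc_freq // skip_interval


def skipped_doc_ordinals(doc_freq: int, skip_interval: int) -> List[int]:
    """Ordinals within TermFreqs of the documents recorded in SkipData.

    Assumption: ordinals are zero-based, which reproduces the
    "15th and 31st" wording of the example.
    """
    count = skip_datum_count(doc_freq, skip_interval)
    return [k * skip_interval - 1 for k in range(1, count + 1)]


def skip_data_start(freq_pointer: int, skip_delta: int) -> int:
    """Byte offset in the .frq file where this term's SkipData begins.

    freq_pointer (hypothetical name) is the start of the term's TermFreqs;
    SkipDelta is the length in bytes of that TermFreqs data.
    """
    return freq_pointer + skip_delta
```

With DocFreq=35 and SkipInterval=16 this yields two SkipDatum entries at ordinals 15 and 31, matching the example in the patched text.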