On Sep 22, 2005, at 1:16 PM, Yonik Seeley wrote:
I'd lean toward keeping UInt32 in general, so at least that will scale
to 4B documents. SegSize is the only place where UInt32 is used that it
will matter (all of the other uses will never approach that size).
OK, sounds good.
writeInt() writes both signed and unsigned integers (or rather the bit
pattern could be interpreted as either, and it's up to the definition
to decide which it is).
Good point. On the Perl side, I'm specifying how they are
interpreted within the IO method, rather than by a cast outside the
method. Effectively I have writeSignedInt and writeUnsignedInt.
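To make the signed/unsigned point concrete, here is a minimal Java sketch. The class and method names (IntInterpretation, readSignedInt, readUnsignedInt) are made up for illustration and are not part of Lucene or of the Perl port. It only shows that the same four bytes yield different values depending on which interpretation the reader picks.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not Lucene code: one bit pattern, two readings.
final class IntInterpretation {
    // Interpret four big-endian bytes as a signed 32-bit integer.
    static int readSignedInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt(); // ByteBuffer is big-endian by default
    }

    // Interpret the same four bytes as an unsigned 32-bit integer.
    // Java has no unsigned int type, so the value is widened to a long.
    static long readUnsignedInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readSignedInt(bytes));   // prints -1
        System.out.println(readUnsignedInt(bytes)); // prints 4294967295
    }
}
```

The bytes on disk are identical in both cases; only the reader's definition differs.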
You're right about FORMAT... something should be changed to make it
consistent.
It could be defined as 0xffffffff instead of -1.
That would be a little strange because the test to determine whether
an index is in the new format is whether or not FORMAT is less than 0.
The present implementation isn't buggy or problematic, it's just that
the logical inconsistency in the specs doc is confusing for people
like me who are trying to write compliant code. If FORMAT gets
redefined as 0xffffffff, that suggests to me that the algorithm for
identifying the new format should change, to something like if
(FORMAT > 0x7FFFFFFF). I don't think either of us wants to change any
code outside the specs doc.
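
As a rough illustration of why the two definitions imply different tests, the sketch below compares them in Java. FormatCheck and its methods are hypothetical names, not anything in the Lucene source. The only assumption is that FORMAT occupies 32 bits, so an unsigned reading has to be held in a long.

```java
// Hypothetical sketch, not Lucene code: two ways to phrase the same test.
final class FormatCheck {
    // FORMAT defined as a signed 32-bit value: new format if negative.
    static boolean isNewFormatSigned(int format) {
        return format < 0;
    }

    // FORMAT defined as an unsigned 32-bit value (held in a long):
    // new format if the high bit is set.
    static boolean isNewFormatUnsigned(long format) {
        return format > 0x7FFFFFFFL;
    }

    public static void main(String[] args) {
        int raw = -1;                        // bit pattern 0xffffffff
        long unsigned = raw & 0xFFFFFFFFL;   // same bits, unsigned reading
        System.out.println(isNewFormatSigned(raw));        // prints true
        System.out.println(isNewFormatUnsigned(unsigned)); // prints true
    }
}
```

Both tests agree for every 32-bit pattern, which is why the choice only affects how the specs doc reads, not the bytes in the index.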
I believe that FORMAT in segments and FORMAT in .tis/.tii are the
only places in the Lucene file format where negative numbers are
required. Would it be overkill to define an Int32 primitive datatype
just for those?
Int32
32-bit signed integers are written as four bytes,
high-order bytes first, in twos-complement encoding.
Int32 --> <Byte>^4
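
A minimal Java sketch of the proposed Int32 encoding follows. Int32Codec, writeInt32 and readInt32 are illustrative names only, not existing Lucene methods. It writes the value as four bytes, high-order byte first, relying on Java's int already being twos-complement.

```java
// Hypothetical sketch of the proposed Int32 primitive, not Lucene code.
final class Int32Codec {
    // Encode a signed 32-bit integer as four bytes, high-order byte first.
    static byte[] writeInt32(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    // Decode four bytes, high-order byte first, back into a signed int.
    static int readInt32(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] encoded = writeInt32(-1);
        System.out.println(readInt32(encoded)); // prints -1
    }
}
```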
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/