On Sep 22, 2005, at 1:16 PM, Yonik Seeley wrote:

> I'd lean toward keeping UInt32 in general, so at least that will scale to 4B
> documents. SegSize is the only place where UInt32 is used that it will
> matter (all of the other uses will never approach that size).

OK, sounds good.

> writeInt() writes both signed and unsigned integers (or rather the bit
> pattern could be interpreted as either, and it's up to the definition to
> decide which it is).

Good point. On the Perl side, I'm specifying how they are interpreted within the IO method, rather than by a cast outside the method. Effectively I have writeSignedInt and writeUnsignedInt.

> You're right about FORMAT... something should be changed to make it
> consistent.
> It could be defined as 0xffffffff instead of -1.

That would be a little strange, because the test for whether an index is in the new format is whether FORMAT is less than 0. The present implementation isn't buggy or problematic; it's just that the logical inconsistency in the specs doc is confusing for people like me who are trying to write compliant code. If FORMAT gets redefined as 0xffffffff, that suggests to me that the algorithm for identifying the new format should change to something like if (FORMAT > 0x7FFFFFFF). I don't think either of us wants to change any code outside the specs doc.
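
To make the point concrete, here is a rough sketch in Python (not Lucene or Perl code; the function names are made up for illustration, and the FORMAT value of -1 is taken from the discussion above). It shows that -1 and 0xffffffff are the same four bytes on disk, and that only the reader's interpretation decides which comparison is the right one.

```python
import struct

# FORMAT value as discussed in this thread; assumed here for illustration.
FORMAT = -1

# Big-endian signed (-1) and unsigned (0xffffffff) encodings are identical bytes.
signed_bytes = struct.pack(">i", FORMAT)
unsigned_bytes = struct.pack(">I", 0xFFFFFFFF)

# The same bytes read back under each interpretation.
as_signed = struct.unpack(">i", signed_bytes)[0]
as_unsigned = struct.unpack(">I", signed_bytes)[0]


def is_new_format_signed(value):
    """Hypothetical check when FORMAT is read as a signed 32-bit integer."""
    return value < 0


def is_new_format_unsigned(value):
    """Hypothetical equivalent check when FORMAT is read as unsigned."""
    return value > 0x7FFFFFFF
```

Both checks agree for any 32-bit pattern, so which one the spec describes is purely a documentation choice, as long as the declared datatype and the comparison match.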

I believe that FORMAT in segments and FORMAT in .tis/.tii are the only places in the Lucene file format where negative numbers are required. Would it be overkill to define an Int32 primitive datatype just for those?

 Int32

    32-bit signed integers are written as four bytes,
    high-order bytes first, in two's-complement encoding.

    Int32 --> <Byte>4
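
For what it's worth, a minimal sketch of how that proposed definition might translate into reader and writer routines, again in Python rather than Java or Perl. The names write_int32 and read_int32 are hypothetical and not part of any Lucene API; the sketch only assumes the four-byte, high-order-first, two's-complement layout described above.

```python
import io
import struct


def write_int32(stream, value):
    """Write a signed 32-bit integer as four bytes, high-order byte first."""
    # struct raises struct.error if value is outside the signed 32-bit range.
    stream.write(struct.pack(">i", value))


def read_int32(stream):
    """Read four bytes and interpret them as a two's-complement signed integer."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("expected 4 bytes for an Int32")
    return struct.unpack(">i", data)[0]


# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_int32(buf, -1)
write_int32(buf, 2)
raw = buf.getvalue()

buf.seek(0)
first = read_int32(buf)
second = read_int32(buf)
```

Keeping the signedness inside the read and write routines means the caller never needs a cast, which is the same arrangement as the writeSignedInt and writeUnsignedInt split mentioned earlier.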

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

