Hi <[EMAIL PROTECTED]>,
>While I was studying the String functions and the Excel structure, Patrick
>had already done the patch. :)
>Thank you, Patrick, but I have one remark: it seems to me that there
>is a way to make it better.
I believe so. This is my first cut, and I have a lot of questions about
how to improve it.
>The most unpleasant thing is that saving does not work on my side.
>When opening such a file, MS Excel reports that the name is incorrect
>and fixes the error. :(
>
>So simply calling StringUtil.putUncompressedUnicode(getSheetname(), data, 12 +
>offset); does not seem to work.
>I tried:
> public int serialize(int offset, byte [] data)
> {
>     LittleEndian.putShort(data, 0 + offset, sid);
>     LittleEndian.putShort( data, 2 + offset,
>         (short)( 0x08 + getSheetnameLength() ) );
>     LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
>     LittleEndian.putShort(data, 8 + offset, getOptionFlags());
>     /*
>     data[ 10 + offset ] = getSheetnameLength();
>     data[ 11 + offset ] = getCompressedUnicodeFlag();
>     */
>     UnicodeString name = new UnicodeString();
>     name.setOptionFlags( (byte)( field_4_compressed_unicode_flag & 0x01 ) );
>     name.setString( getSheetname() );
>     System.arraycopy( name.serialize(), 0, data, 10 + offset,
>         name.getRecordSize() );
>
>     return getRecordSize();
> }
>But it does not work either. :(
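Try this: UnicodeString.serialize() returns a byte array that starts with a
3-byte header ((short) charcount and (byte) optionFlag), so skip that header,
copy only the characters to offset 12, and restore the two commented-out
lines that write the 1-byte length and the flag at offsets 10 and 11:
System.arraycopy( name.serialize(), 3, data, 12 + offset,
name.getRecordSize() - 3 );
and see if it works for you.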
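For comparison, here is a minimal stand-alone sketch of the sheet-name part of a BIFF8 BOUNDSHEET record, assuming the field offsets used in this thread: a 1-byte character count at offset 10, a 1-byte option flag at offset 11 (bit 0 set means 16-bit characters) and the characters from offset 12. It deliberately avoids POI's UnicodeString and LittleEndian; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: writes the sheet-name part of a BIFF8 BOUNDSHEET record. */
final class SheetNameWriter {

    /**
     * Writes cch (1 byte), the option flag (1 byte) and the characters
     * starting at data[10 + offset]. Returns the number of bytes written.
     */
    static int writeName(String name, boolean uncompressed, byte[] data, int offset) {
        if (name.length() > 255) {
            throw new IllegalArgumentException("cch is a single byte");
        }
        data[10 + offset] = (byte) name.length();          // 1-byte count, not a short
        data[11 + offset] = (byte) (uncompressed ? 0x01 : 0x00);
        byte[] chars = uncompressed
                ? name.getBytes(StandardCharsets.UTF_16LE)   // 2 bytes per char, no BOM
                : name.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char; only valid for chars <= U+00FF
        System.arraycopy(chars, 0, data, 12 + offset, chars.length);
        return 2 + chars.length;
    }

    public static void main(String[] args) {
        byte[] data = new byte[32];
        System.out.println(writeName("Sheet1", true, data, 0)); // 14 = 2 header bytes + 6 chars * 2
    }
}
```

The point of the sketch is only that the count occupies one byte here, whereas UnicodeString writes it as a two-byte short.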
>And from reading the Excel specification, the length field of a Unicode
>string may be 1 or 2 bytes.
>How can we detect when it is 1 and when it is 2?
>
>What is your opinion?
>
I believe this is the problem. BoundSheetRecord uses only one
byte (instead of the two used by UnicodeString) for the charcount.
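As far as I understand the BIFF8 format, the width of the count cannot be detected from the bytes themselves; it is fixed by the record that contains the string (1 byte in BOUNDSHEET, 2 bytes in SST strings). A minimal sketch under that assumption, using a hypothetical helper that takes the width as a parameter and ignores the rich-text and Far-East options:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for a BIFF8 string whose count width depends on the record. */
final class Biff8StringReader {

    /**
     * Reads a string at data[pos]. countWidth is 1 (e.g. BOUNDSHEET) or 2 (e.g. SST);
     * the caller knows it from the record type. Rich-text and Far-East data
     * are not handled in this sketch.
     */
    static String read(byte[] data, int pos, int countWidth) {
        int cch;
        if (countWidth == 1) {
            cch = data[pos] & 0xFF;                                    // unsigned byte
        } else if (countWidth == 2) {
            cch = (data[pos] & 0xFF) | ((data[pos + 1] & 0xFF) << 8);  // little-endian short
        } else {
            throw new IllegalArgumentException("count width must be 1 or 2");
        }
        int flags = data[pos + countWidth] & 0xFF;
        int start = pos + countWidth + 1;
        if ((flags & 0x01) != 0) {
            // 16-bit characters, little-endian
            return new String(data, start, cch * 2, StandardCharsets.UTF_16LE);
        }
        // "compressed": one byte per character, high byte implicitly 0
        return new String(data, start, cch, StandardCharsets.ISO_8859_1);
    }
}
```

Passing the width in from the caller, rather than guessing it from the data, is one possible alternative to a separate 1-byte subclass.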
I have more questions about my patch.
1) Can the BoundSheetRecord be cut off and continued in an extension record?
If so, my code fails.
2) People should be aware of the difference between the UnicodeString class and
the BIFF8 record format. I reuse SSTDeserializer just to cater for the rich-text
formatting information and Far-East information in BIFF8. Actually, my code
fails on BIFF8 strings with either rich-text formatting information or Far-East
information, because I didn't allocate enough space for the arraycopy below.
+ BinaryTree tempBT = new BinaryTree();
+ SSTDeserializer deserializer;
+ deserializer = new SSTDeserializer( tempBT);
+ int length = LittleEndian.ubyteToInt( field_3_sheetname_length);
+ if ((field_4_compressed_unicode_flag & 0x01)==1) {
+     // rebuild an SST-style string: 2-byte count, then flag byte + 16-bit chars
+     byte [] newData = new byte[length*2 +3];
+     System.arraycopy(data,7+offset,newData,2,length*2+1);
+     // use the unsigned length; (short)data[6+offset] goes negative above 127
+     LittleEndian.putShort(newData,0,(short)length);
+//    System.out.println("calling manufactureStrings!");
+     deserializer.manufactureStrings(newData,0, (short)(length *2+3));
+//    System.out.println("returned from manufactureStrings!");
+     field_5_sheetname =
+         ((UnicodeString)tempBT.get(new Integer(0))).getString();
+
+     tempBT=null;
+ }
This raises the question of whether we want to refactor the BIFF8 recognition code in
SSTDeserializer.manufactureStrings() into the UnicodeString class.
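On the allocation problem in point 2: when the rich-text or Far-East bits are set, a BIFF8 string is longer than count + flag + characters. Below is a sketch that computes the full size of an SST-style string (2-byte count) so a buffer can be sized before copying. It assumes the BIFF8 layout as I understand it: flag bit 0x01 means 16-bit characters; 0x08 means rich text (a 2-byte run count after the flag, then 4 bytes per run after the characters); 0x04 means Far-East data (a 4-byte size, then that many bytes after the runs). The class name is hypothetical, and strings split across CONTINUE records are not handled.

```java
/** Hypothetical size calculator for a BIFF8 SST-style string (2-byte count). */
final class Biff8StringSize {

    static final int HIGH_BYTE = 0x01; // characters are 16-bit
    static final int FAR_EAST  = 0x04; // Far-East (phonetic) block present
    static final int RICH_TEXT = 0x08; // formatting runs present

    private static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | ((d[p + 1] & 0xFF) << 8);
    }

    private static int i32(byte[] d, int p) {
        return u16(d, p) | (u16(d, p + 2) << 16);
    }

    /** Total number of bytes the string starting at data[pos] occupies. */
    static int sizeAt(byte[] data, int pos) {
        int cch = u16(data, pos);
        int flags = data[pos + 2] & 0xFF;
        int size = 3;                        // 2-byte count + flag byte
        int runs = 0;
        int farEast = 0;
        if ((flags & RICH_TEXT) != 0) {
            runs = u16(data, pos + size);    // number of formatting runs
            size += 2;
        }
        if ((flags & FAR_EAST) != 0) {
            farEast = i32(data, pos + size); // size of the Far-East block
            size += 4;
        }
        size += ((flags & HIGH_BYTE) != 0) ? cch * 2 : cch; // the characters
        size += runs * 4;                    // each run: 2-byte char index + 2-byte font index
        size += farEast;                     // the Far-East block itself
        return size;
    }
}
```

Sizing the copy from this total, rather than from the character count alone, keeps the trailing run and Far-East bytes inside the buffer.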
>And from reading the Excel specification, the length of the Unicode String may
>be 1 or 2.
3) Regarding this, should we have a UnicodeString1 class that extends
UnicodeString, the only difference being that it reads the charcount from a
1-byte field and serializes it to a 1-byte field?
4) Regarding the autodetection, can we make the limit "less than 127" instead of 255?
I am not sure whether this solves any problem at all, but I just want to discuss it
here and learn from all of you.
5) Regarding my patch, even I have a display problem when the
sheet name contains Portuguese characters, e.g. 'cao': the first
2 characters are displayed as a single Chinese character. I suppose that expanding
the BIFF7 field to a String might solve the problem. Does anybody have a clue?
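On point 5, one plausible explanation (an assumption, not checked against the patch) is that 8-bit "compressed" characters are being decoded as 16-bit ones, so each pair of bytes becomes one character: 'c','a' (0x63, 0x61) read as UTF-16LE is U+6163, which lies in the CJK ideograph range. The hypothetical demo below shows that effect, and also why a 1-byte count should be read with `& 0xFF` (relevant to point 4): a plain cast sign-extends values above 127.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of two decoding pitfalls discussed in this thread. */
final class DecodingPitfalls {

    /** Decodes 8-bit ("compressed") name bytes correctly. */
    static String decodeCompressed(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Wrongly treats the same bytes as 16-bit little-endian characters. */
    static String decodeAsUtf16(byte[] raw) {
        // use an even number of bytes so every char gets two bytes
        return new String(raw, 0, raw.length & ~1, StandardCharsets.UTF_16LE);
    }

    /** Reads a 1-byte count as unsigned; a plain (short) cast goes negative above 127. */
    static int unsignedCount(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] raw = "cao".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeCompressed(raw));                              // cao
        System.out.println(Integer.toHexString(decodeAsUtf16(raw).charAt(0)));  // 6163
        byte count = (byte) 200;
        System.out.println((short) count);        // -56
        System.out.println(unsignedCount(count)); // 200
    }
}
```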
Regards.
Patrick Lee