SSTDeserializer problem?

Sergey Dubovitsky Thu, 13 Jul 2006 04:22:34 -0700

Hi Gentlemen,

We are using POI 2.5 (2.5.1 shows the same problem) to parse Excel files
and have encountered a problem with the SSTDeserializer class.


We have solved the problem, but we would like to ask whether our
corrections are acceptable.
It is also possible that this is a known problem and an official fix
already exists.

We would greatly appreciate any help.

Reproducing the problem:
Unfortunately we cannot provide a sample Excel file, because it contains
a lot of confidential information, and we failed to generate a file
showing the same problem from synthetic data (a detailed description of
the problem, which shows how rare the case is, can be found below).

The following exception occurs when opening the Excel file:
java.lang.NullPointerException
    at org.apache.poi.hssf.record.SSTRecord.getString(SSTRecord.java:277)
    at org.apache.poi.hssf.model.Workbook.getSSTString(Workbook.java:649)
    at org.apache.poi.hssf.usermodel.HSSFCell.<init>(HSSFCell.java:283)
    at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:198)
    at org.apache.poi.hssf.usermodel.HSSFSheet.setPropertiesFromSheet(HSSFSheet.java:156)
    at org.apache.poi.hssf.usermodel.HSSFSheet.<init>(HSSFSheet.java:110)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:177)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:210)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:191)
 
Problem Description:
The problem occurs when a string in a CONTINUE record ends exactly at the
end of the record and a further CONTINUE record follows.
The readStringRemainder method reads the remainder of the current string
but does not update the continuationCharsRead variable with the new length.
As a result, when the next CONTINUE record is handled by
processContinueRecord, isStringFinished returns an invalid result
and the string is treated as unfinished.
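
A minimal sketch of the state problem described above. This is not POI code: the class ContinuationModel is hypothetical, and the body of isStringFinished (comparing continuationCharsRead with charCount) is an assumption made for illustration. It only shows how a counter that is not updated leaves a fully read string looking unfinished.

```java
// Simplified, hypothetical model of the state described above; not POI code.
class ContinuationModel {
    private final int charCount;        // total characters in the string
    private int continuationCharsRead;  // characters consumed so far

    ContinuationModel(int charCount, int charsReadBeforeContinue) {
        this.charCount = charCount;
        this.continuationCharsRead = charsReadBeforeContinue;
    }

    // Assumed check: the string is finished once every character was read.
    boolean isStringFinished() {
        return continuationCharsRead == charCount;
    }

    // Models readStringRemainder. With updateCounter == false the counter
    // stays stale, which is the behaviour reported in this message.
    void readStringRemainder(boolean updateCounter) {
        int remaining = charCount - continuationCharsRead;
        // ... the remaining characters would be copied here ...
        if (updateCounter) {
            continuationCharsRead += remaining;
        }
    }

    public static void main(String[] args) {
        ContinuationModel stale = new ContinuationModel(10, 4);
        stale.readStringRemainder(false);
        System.out.println("without update: finished = " + stale.isStringFinished());

        ContinuationModel updated = new ContinuationModel(10, 4);
        updated.readStringRemainder(true);
        System.out.println("with update: finished = " + updated.isStringFinished());
    }
}
```

In this model the first case reports the string as unfinished even though all ten characters were consumed, so the next CONTINUE record would be misread as more string data.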


Solution:
We suggest correcting this problem by modifying the readStringRemainder
code like this:

    private void readStringRemainder( final byte[] record )
    {
        int stringRemainderSizeInBytes =
                calculateByteCount( charCount-getContinuationCharsRead() );
        byte[] unicodeStringData =
                new byte[SSTRecord.STRING_MINIMAL_OVERHEAD
                + stringRemainderSizeInBytes];

        // write the string length
        LittleEndian.putShort( unicodeStringData, 0,
                (short) (charCount-getContinuationCharsRead()) );

        // write the options flag
        unicodeStringData[LittleEndianConsts.SHORT_SIZE] =
                createOptionByte( wideChar, richText, extendedText );

        // copy the bytes/words making up the string; skipping
        // past all the overhead of the str_data array
        arraycopy( record, LittleEndianConsts.BYTE_SIZE,
                unicodeStringData,
                SSTRecord.STRING_MINIMAL_OVERHEAD,
                stringRemainderSizeInBytes );

        // use special constructor to create the final string
        UnicodeString string = new UnicodeString( UnicodeString.sid,
                (short) unicodeStringData.length, unicodeStringData,
                unfinishedString );
        Integer integer = new Integer( strings.size() );

        addToStringTable( strings, integer, string );

        int newOffset =
                offsetForContinuedRecord( stringRemainderSizeInBytes );
    

        // ----------------------- CORRECTIONS BEGIN -------------------------
        /*
         * The original function does not update the continuationCharsRead
         * variable with the new string length (unfinished string length +
         * remaining string length).
         * Because the string variable is the concatenation of
         * unfinishedString and the string remainder, its length in
         * characters can be used as the new value for continuationCharsRead.
         */

        setContinuationCharsRead( string.getCharCount() );

        /*
         * If we did not reach the end of the current record, we have to call
         * manufactureStrings to process the other strings in this record.
         * Because manufactureStrings checks whether the end of the record
         * has been reached, it could be called unconditionally.
         * The problem is that manufactureStrings calls initVars first and
         * resets the continuationCharsRead value, which isStringFinished
         * needs in order to work correctly when processContinueRecord is
         * called next.
         */
        if (newOffset < record.length)
        {
            manufactureStrings( record, newOffset );
        }

        // ----------------------- CORRECTIONS END ---------------------------

    }
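
The reason for guarding the manufactureStrings call can be sketched in isolation. The class GuardedCallModel and its method finishRemainder are hypothetical stand-ins, not POI code; the only behaviour taken from the patch comments is that manufactureStrings calls initVars, which resets continuationCharsRead.

```java
// Hypothetical model of the guarded call; names mirror the patch, bodies are stand-ins.
class GuardedCallModel {
    int continuationCharsRead;

    // Per the patch comments, initVars resets the continuation counter.
    void initVars() {
        continuationCharsRead = 0;
    }

    // Stand-in: the real method would parse further strings from offset onward.
    void manufactureStrings(byte[] record, int offset) {
        initVars();
    }

    // Models the tail of readStringRemainder once the remainder was consumed.
    void finishRemainder(byte[] record, int newOffset, int totalChars, boolean guarded) {
        continuationCharsRead = totalChars;
        if (!guarded || newOffset < record.length) {
            manufactureStrings(record, newOffset);
        }
    }

    public static void main(String[] args) {
        byte[] record = new byte[8];  // the string ends exactly at the record end

        GuardedCallModel guarded = new GuardedCallModel();
        guarded.finishRemainder(record, 8, 10, true);
        System.out.println("guarded counter = " + guarded.continuationCharsRead);

        GuardedCallModel unguarded = new GuardedCallModel();
        unguarded.finishRemainder(record, 8, 10, false);
        System.out.println("unguarded counter = " + unguarded.continuationCharsRead);
    }
}
```

When the string ends exactly at the end of the record, the guarded variant keeps the counter at the full string length, while the unconditional call resets it to zero before the next CONTINUE record is examined.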

 
 

 Thank you.
Sergey.

 
