https://bz.apache.org/bugzilla/show_bug.cgi?id=61045

--- Comment #3 from Tim Allison <talli...@mitre.org> ---
Thank you, Nick.  It turns out that there isn't a hard limit of 50.  Adding
[Red] can make for a format string > 50.

In our regression corpus, there are 13 examples where the requested length is
51 but only 50 bytes are available, and there's one example from TIKA-2154 that
shows some even greater, um, flexibility in RecordFormat.

If we add a custom readStringCommon to RecordFormat, everything works [1]:


    private String readStringCommon(RecordInputStream ris, int requestedLength,
            boolean pIsCompressedEncoding) {
        // Sanity check to detect garbage string lengths
        // 0x100000 = 1,048,576, i.e. about 1 million chars
        if (requestedLength < 0 || requestedLength > 0x100000) {
            throw new IllegalArgumentException("Bad requested string length ("
                    + requestedLength + ")");
        }
        char[] buf = null;
        boolean isCompressedEncoding = pIsCompressedEncoding;
        //compressed strings use one byte per char, uncompressed use two
        int availableChars = isCompressedEncoding ? ris.remaining()
                : ris.remaining() / LittleEndianConsts.SHORT_SIZE;
        //everything worked out.  Great!
        if (requestedLength == availableChars) {
            buf = new char[requestedLength];
        } else {
            //sometimes in older Excel 97 .xls files,
            //the requested length is wrong.
            //Read all available characters.
            buf = new char[availableChars];
        }
        for (int i = 0; i < buf.length; i++) {
            char ch;
            if (isCompressedEncoding) {
                ch = (char) ris.readUByte();
            } else {
                ch = (char) ris.readShort();
            }
            buf[i] = ch;
        }

        //TIKA-2154's file shows that even in a unicode string
        //there can be a remaining byte (without proper final '00')
        //that should be read as a byte
        if (ris.available() == 1) {
            char[] tmp = new char[buf.length+1];
            System.arraycopy(buf, 0, tmp, 0, buf.length);
            tmp[buf.length] = (char)ris.readUByte();
            buf = tmp;
        }
        String ret = new String(buf);

        //swallow what's left
        while (ris.available() > 0) {
            ris.readByte();
        }
        return ret;
    }

[1]  Well, not quite all, turns out that a DimensionsRecord can have an extra
short in these files, too...argh...
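
The lenient-read idea above can be tried in isolation, without POI on the
classpath. What follows is only a minimal sketch under stated assumptions:
LenientStringReader and readLenient are hypothetical names, and a
java.nio.ByteBuffer stands in for RecordInputStream. As in the method above,
the requested length is only sanity-checked; the number of characters actually
read is driven by what is available.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-alone helper for illustration; not part of POI.
final class LenientStringReader {

    private LenientStringReader() {}

    static String readLenient(byte[] data, int requestedLength, boolean compressed) {
        // Same sanity check as above: reject negative or absurdly large lengths.
        if (requestedLength < 0 || requestedLength > 0x100000) {
            throw new IllegalArgumentException(
                    "Bad requested string length (" + requestedLength + ")");
        }
        ByteBuffer in = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // One byte per char when compressed, two (little-endian) otherwise.
        int availableChars = compressed ? in.remaining() : in.remaining() / 2;
        StringBuilder sb = new StringBuilder(availableChars + 1);
        for (int i = 0; i < availableChars; i++) {
            sb.append(compressed ? (char) (in.get() & 0xFF) : in.getChar());
        }
        // A lone trailing byte (the TIKA-2154 case) is read as one more char.
        if (in.remaining() == 1) {
            sb.append((char) (in.get() & 0xFF));
        }
        return sb.toString();
    }
}
```

For example, readLenient(new byte[]{0x41, 0, 0x42, 0, 0x43}, 3, false) returns
"ABC": two full 16-bit chars plus the dangling final byte.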
