https://bz.apache.org/bugzilla/show_bug.cgi?id=61045
--- Comment #3 from Tim Allison <talli...@mitre.org> ---
Thank you, Nick. It turns out that there isn't a hard limit of 50. Adding [Red] can make for a format string > 50.

In our regression corpus, there are 13 examples where the requested length is 51 but only 50 bytes are available, and there's one example from TIKA-2154 that shows some even greater, um, flexibility in RecordFormat.

If we add a custom readStringCommon to RecordFormat, all works[1]:

    private String readStringCommon(RecordInputStream ris, int requestedLength,
                                    boolean pIsCompressedEncoding) {
        // Sanity check to detect garbage string lengths
        // (0x100000 = 1,048,576, i.e. about 1 million chars)
        if (requestedLength < 0 || requestedLength > 0x100000) {
            throw new IllegalArgumentException(
                    "Bad requested string length (" + requestedLength + ")");
        }
        char[] buf = null;
        boolean isCompressedEncoding = pIsCompressedEncoding;
        int availableChars = isCompressedEncoding
                ? ris.remaining()
                : ris.remaining() / LittleEndianConsts.SHORT_SIZE;
        if (requestedLength == availableChars) {
            //everything worked out. Great!
            buf = new char[requestedLength];
        } else {
            //sometimes in older Excel 97 .xls files,
            //the requested length is wrong.
            //Read all available characters.
            buf = new char[availableChars];
        }
        for (int i = 0; i < buf.length; i++) {
            char ch;
            if (isCompressedEncoding) {
                ch = (char) ris.readUByte();
            } else {
                ch = (char) ris.readShort();
            }
            buf[i] = ch;
        }

        //TIKA-2154's file shows that even in a unicode string
        //there can be a remaining byte (without proper final '00')
        //that should be read as a byte
        if (ris.available() == 1) {
            char[] tmp = new char[buf.length + 1];
            System.arraycopy(buf, 0, tmp, 0, buf.length);
            tmp[buf.length] = (char) ris.readUByte();
            buf = tmp;
        }

        //swallow what's left
        while (ris.available() > 0) {
            ris.readByte();
        }
        return new String(buf);
    }

[1] Well, not quite all. It turns out that a DimensionsRecord can have an extra short in these files, too...argh...
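To make the two failure cases easier to see outside of POI, here is a minimal standalone sketch of the same lenient-read idea. It is not the proposed patch: it swaps RecordInputStream for a plain java.nio.ByteBuffer, and the class name LenientStringSketch and method name readLenient are hypothetical, chosen only for illustration. Like the helper above, it reads every available character regardless of the requested length and then picks up a single trailing byte if one is left over.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the lenient read described above.
// Assumption: a ByteBuffer holding only the string payload replaces
// POI's RecordInputStream, so no POI classes are needed to run this.
final class LenientStringSketch {

    static String readLenient(ByteBuffer in, int requestedLength, boolean compressed) {
        // Same sanity check as in the comment: reject garbage lengths.
        if (requestedLength < 0 || requestedLength > 0x100000) {
            throw new IllegalArgumentException(
                    "Bad requested string length (" + requestedLength + ")");
        }
        in.order(ByteOrder.LITTLE_ENDIAN);

        // One byte per char when compressed, two bytes (UTF-16LE) otherwise.
        int bytesPerChar = compressed ? 1 : 2;
        int availableChars = in.remaining() / bytesPerChar;

        // The requested length is not trusted: read whatever is really there.
        StringBuilder sb = new StringBuilder(availableChars + 1);
        for (int i = 0; i < availableChars; i++) {
            char ch = compressed ? (char) (in.get() & 0xFF) : (char) in.getShort();
            sb.append(ch);
        }

        // A two-byte string may end in a lone byte with no final 0x00;
        // read it as a single-byte character.
        if (in.remaining() == 1) {
            sb.append((char) (in.get() & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "[Red]" in UTF-16LE with the last 0x00 missing: 9 bytes, 5 chars requested.
        byte[] truncated = {'[', 0, 'R', 0, 'e', 0, 'd', 0, ']'};
        System.out.println(readLenient(ByteBuffer.wrap(truncated), 5, false));
    }
}
```

The trailing-byte branch is what recovers the final character in the truncated case. Whether reading past a too-short requested length is desirable in general is a separate question that this sketch does not try to answer.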