https://bz.apache.org/bugzilla/show_bug.cgi?id=63323
Bug ID: 63323
Summary: HwmfText's getText can throw StringIndexOutOfRange on
shiftjis encoded text
Product: POI
Version: 4.0.x-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
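When upgrading Tika to POI 4.1.0-rc3, one of our unit tests that checks for
correct encoding handling started failing. Multibyte character encodings need
to be handled more carefully than relying on stringLength in the call to
substring: once the bytes are decoded, the resulting String can contain fewer
chars than stringLength, so substring throws StringIndexOutOfBoundsException: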
    public String getText(Charset charset) throws IOException {
        return (new String(this.rawTextBytes, charset)).substring(0,
                this.stringLength);
    }
The triggering test file is here:
https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testWMF_charset.wmf
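One possible direction, sketched below under the assumption that stringLength counts bytes of rawTextBytes rather than decoded chars, is to limit the byte range before decoding instead of calling substring afterwards. SafeTextDecode and decode are hypothetical names used only for illustration; this is not POI code and not necessarily how the fix should look. The demo uses UTF-8 because it is always available in the JDK; the same length mismatch applies to Shift_JIS.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone helper for illustration; not part of POI.
class SafeTextDecode {

    // Assumption: stringLength is a byte count into rawTextBytes.
    static String decode(byte[] rawTextBytes, int stringLength, Charset charset) {
        // Clamp the length so it can never run past the array or go negative.
        int byteCount = Math.max(0, Math.min(stringLength, rawTextBytes.length));
        // Decode only the counted bytes; no substring on the decoded String.
        return new String(rawTextBytes, 0, byteCount, charset);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        String text = "\u65E5\u672C\u8A9E"; // three chars, nine bytes in UTF-8
        byte[] raw = text.getBytes(cs);

        try {
            // Same pattern as the getText quoted above: decode, then substring
            // using the byte length as if it were a char count.
            new String(raw, cs).substring(0, raw.length);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring failed: " + e.getMessage());
        }

        // Limiting the bytes before decoding avoids the exception.
        System.out.println(decode(raw, raw.length, cs).equals(text));
    }
}
```

If stringLength ever ends in the middle of a multibyte sequence, this approach yields a replacement character at the end of the text rather than an exception, which may or may not be acceptable.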