[ https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-390:
--------------------------------------
Attachment: ASCIIHexFilter_390-Patch.diff
I've created a patch with the changes suggested by Mathias. Does anyone have a
sample document to test this with?
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
> Key: PDFBOX-390
> URL: https://issues.apache.org/jira/browse/PDFBOX-390
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 0.8.0-incubator
> Reporter: Mathias Bosch
> Fix For: 0.8.0-incubator
>
> Attachments: ASCIIHexFilter_390-Patch.diff
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> According to the specification (pdf_reference_1-7.pdf), all whitespace
> characters between the ASCII hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source decodes (or attempts to decode) those whitespace
> characters, so the resulting byte values are wrong: every character that
> is not in [0-9a-f] yields -1, but processing continues anyway.
> The result is an invalid byte stream.
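
To make the failure concrete, here is a minimal sketch. It is not the PDFBox code: `NaiveHexDemo`, `naiveDecode` and `hexValue` are hypothetical names, and `hexValue` only stands in for a reverse lookup that returns -1 for non-hex characters. It pairs characters blindly, as described above, and shows how one space shifts and corrupts the output.

```java
import java.io.ByteArrayOutputStream;

public class NaiveHexDemo {
    // Hypothetical stand-in for a reverse lookup table: -1 for any non-hex character.
    static int hexValue(int c) {
        return Character.digit(c, 16);
    }

    // Pairs characters blindly and never skips whitespace.
    static byte[] naiveDecode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < encoded.length(); i += 2) {
            int value = hexValue(encoded.charAt(i)) * 16
                    + hexValue(encoded.charAt(i + 1));
            out.write(value); // only the low 8 bits of value are written
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The intended bytes are FF CE.
        for (byte b : naiveDecode("FF CE")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints "FF FC"
    }
}
```

The space is paired with 'C', giving -1 * 16 + 12 = -4, which is written as 0xFC. The trailing 'E' has no partner left and is dropped.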
> The ASCIIHexDecode filter section also defines '>' as the EOD (end-of-data)
> marker of the byte stream, which might ease the parsing of inline images
> (the EI operator should follow the EOD in the case of an inline image).
> Example for ASCII-Hex encoded value, copied from the Spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
> I fixed the problem locally so that I could continue with my work.
> I am pasting the changed code here as a hint that might help to fix the bug.
> public class ASCIIHexFilter
>     implements Filter
> {
>     /**
>      * Whitespace
>      *  0 0x00 Null (NUL)
>      *  9 0x09 Tab (HT)
>      * 10 0x0A Line feed (LF)
>      * 12 0x0C Form feed (FF)
>      * 13 0x0D Carriage return (CR)
>      * 32 0x20 Space (SP)
>      */
>     protected boolean isWhitespace(int c) {
>         return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>     }
>
>     protected boolean isEOD(int c) {
>         return (c == 62); // '>' - EOD
>     }
>     /**
>      * {@inheritDoc}
>      */
>     public void decode(InputStream compressedData, OutputStream result,
>             COSDictionary options, int filterIndex) throws IOException {
>         int value = 0;
>         int firstByte = 0;
>         int secondByte = 0;
>         while ((firstByte = compressedData.read()) != -1) {
>
>             // skip whitespace in front of the first hex digit
>             while (isWhitespace(firstByte))
>                 firstByte = compressedData.read();
>             // stop at EOD, or if the stream ended while skipping whitespace
>             if (firstByte == -1 || isEOD(firstByte))
>                 break;
>
>             if (REVERSE_HEX[firstByte] == -1)
>                 System.out.println("Invalid Hex Code; int: " + firstByte
>                         + " char: " + (char) firstByte);
>             value = REVERSE_HEX[firstByte] * 16;
>
>             // whitespace may also separate the two digits of a pair
>             secondByte = compressedData.read();
>             while (isWhitespace(secondByte))
>                 secondByte = compressedData.read();
>
>             if (isEOD(secondByte)) {
>                 // second value behaves like 0 in case of EOD
>                 result.write(value);
>                 break;
>             }
>             if (secondByte >= 0) {
>                 if (REVERSE_HEX[secondByte] == -1)
>                     System.out.println("Invalid Hex Code; int: " + secondByte
>                             + " char: " + (char) secondByte);
>                 value += REVERSE_HEX[secondByte];
>             }
>             result.write(value);
>         }
>
>         result.flush();
>     }
> // .....................................................
> // other code remains unchanged
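
In the absence of a sample PDF, the spec example quoted above can serve as test input. The following is a self-contained sketch of the same idea (skip whitespace, stop at '>', pad an odd trailing digit with 0) that runs without PDFBox. All names in it (`AsciiHexSketch`, `nextSignificant`, `decodeString`) are hypothetical. Unlike the pasted code, it throws on an invalid digit instead of printing a message, and it treats a stream that ends without '>' like EOD; both are choices of this sketch, not something the issue prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class AsciiHexSketch {
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Returns the next non-whitespace byte, or -1 at end of stream.
    static int nextSignificant(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();
        } while (isWhitespace(c));
        return c;
    }

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int first = nextSignificant(in);
            if (first == -1 || first == '>') {
                break; // end of stream or EOD before a new pair
            }
            int hi = Character.digit(first, 16);
            if (hi < 0) {
                throw new IOException("Invalid hex digit: " + first);
            }
            int second = nextSignificant(in);
            if (second == -1 || second == '>') {
                out.write(hi << 4); // odd digit count: act as if a 0 followed
                break;
            }
            int lo = Character.digit(second, 16);
            if (lo < 0) {
                throw new IOException("Invalid hex digit: " + second);
            }
            out.write((hi << 4) | lo);
        }
        return out.toByteArray();
    }

    // Convenience wrapper for tests; wraps the checked exception.
    static byte[] decodeString(String s) {
        try {
            return decode(new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = decodeString(
            "FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >");
        System.out.println(data.length + " bytes"); // prints "21 bytes"
    }
}
```

An input such as "7>" decodes to the single byte 0x70, which matches the "second value behaves like 0" branch in the pasted code.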