org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
---------------------------------------------------------
Key: PDFBOX-390
URL: https://issues.apache.org/jira/browse/PDFBOX-390
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 0.8.0-incubator
Reporter: Mathias Bosch
Fix For: 0.8.0-incubator
According to the specification (pdf_reference_1-7.pdf), all whitespace
characters between the ASCII hex digits must be ignored (see section 3.3.1,
ASCIIHexDecode Filter).
The 0.8.0-incubator source instead tries to decode those whitespace
characters, so the resulting byte values are wrong: every character that is
not a hex digit maps to -1, but processing continues anyway.
The result is an invalid byte stream.
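
To illustrate the failure mode described above, here is a minimal sketch of
a decoder that pairs up characters without skipping whitespace. It is not
the PDFBox source: the class NaiveHexPairing and its methods are
hypothetical names, and the "-1 for non-hex characters" lookup is modeled
with Character.digit as an assumption.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; this is NOT the 0.8.0-incubator code.
final class NaiveHexPairing {

    // Assumed lookup: value of a hex digit, or -1 for any other character.
    static int reverseHex(int c) {
        return Character.digit(c, 16);
    }

    // Pairs characters two at a time and never skips whitespace.
    static byte[] decodeWithoutSkipping(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < encoded.length(); i += 2) {
            int high = reverseHex(encoded.charAt(i));
            int low = reverseHex(encoded.charAt(i + 1));
            // A space yields -1, so " C" becomes -16 + 12 = -4.
            // write(int) keeps only the low 8 bits, so -4 is emitted as 0xFC.
            out.write(high * 16 + low);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : decodeWithoutSkipping("FF CE")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

In this sketch the input "FF CE" should decode to the two bytes FF CE, but
the space is paired with 'C' and the second byte comes out wrong.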
The ASCIIHexDecode Filter section also defines '>' as the EOD (end-of-data)
marker of the byte stream, which might ease the parsing of inline images
(for an inline image, the EI operator should follow the EOD marker).
Example of an ASCII-hex encoded value, copied from the spec:
FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
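
As a rough, stand-alone sketch of the decoding rules (skip whitespace, stop
at '>', treat a trailing single digit as if a 0 followed it), the example
above can be decoded as shown below. AsciiHexSketch and its methods are
hypothetical names chosen for illustration; this is not the PDFBox filter,
and it works on a String rather than on streams to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not the PDFBox ASCIIHexFilter class.
final class AsciiHexSketch {

    // Whitespace characters as listed in the PDF specification.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Decodes ASCII-hex text: skips whitespace, stops at '>' (EOD) and
    // pads a trailing single digit with 0. As a lenient assumption, a
    // missing EOD marker is treated like end of data.
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 if none is pending
        for (byte b : encoded.getBytes(StandardCharsets.US_ASCII)) {
            int c = b & 0xFF;
            if (isWhitespace(c)) {
                continue;
            }
            if (c == '>') {
                break;
            }
            int digit = Character.digit(c, 16);
            if (digit < 0) {
                throw new IllegalArgumentException(
                        "Invalid hex character: " + (char) c);
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd number of digits: assume trailing 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = decode(
                "FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >");
        System.out.println(data.length + " bytes decoded");
    }
}
```

Unlike the patch in this report, the sketch throws on an invalid character
instead of printing a warning; that is a design choice of the sketch only.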
I fixed the problem locally so that I could continue with my work.
I am pasting the changed code here as a hint that might help to fix the bug.
public class ASCIIHexFilter
    implements Filter
{
    /**
     * Whitespace characters as defined by the PDF specification:
     *    0  0x00  Null (NUL)
     *    9  0x09  Tab (HT)
     *   10  0x0A  Line feed (LF)
     *   12  0x0C  Form feed (FF)
     *   13  0x0D  Carriage return (CR)
     *   32  0x20  Space (SP)
     */
    protected boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    protected boolean isEOD(int c) {
        return (c == 62); // '>' - EOD
    }

    /**
     * {@inheritDoc}
     */
    public void decode(InputStream compressedData, OutputStream result,
            COSDictionary options, int filterIndex) throws IOException {
        int value = 0;
        int firstByte = 0;
        int secondByte = 0;
        while ((firstByte = compressedData.read()) != -1) {
            // skip whitespace in front of the first hex digit
            while (isWhitespace(firstByte))
                firstByte = compressedData.read();
            // stop at EOD, or if the stream ended while skipping whitespace
            if (firstByte == -1 || isEOD(firstByte))
                break;
            if (REVERSE_HEX[firstByte] == -1)
                System.out.println("Invalid Hex Code; int: " + firstByte
                        + " char: " + (char) firstByte);
            value = REVERSE_HEX[firstByte] * 16;
            // skip whitespace between the two hex digits of a pair
            secondByte = compressedData.read();
            while (isWhitespace(secondByte))
                secondByte = compressedData.read();
            if (isEOD(secondByte)) {
                // second value behaves like 0 in case of EOD
                result.write(value);
                break;
            }
            if (secondByte >= 0) {
                if (REVERSE_HEX[secondByte] == -1)
                    System.out.println("Invalid Hex Code; int: " + secondByte
                            + " char: " + (char) secondByte);
                value += REVERSE_HEX[secondByte];
            }
            result.write(value);
        }
        result.flush();
    }
    // .....................................................
    // other code remains unchanged