[
https://issues.apache.org/jira/browse/PDFBOX-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109001#comment-15109001
]
Andreas Lehmkühler commented on PDFBOX-3201:
--------------------------------------------
I've copied [~Sumit Saha]'s comment from PDFBOX-2976:
{quote}
Hi Everyone,
I have been following the PDFBox-related discussion for a long time, and I have a better
fix in mind that would avoid any data loss.
The "incorrect data check" exception arises when the Adler-32 checksum
verification fails or the stream is corrupted.
The Adler-32 check can fail when the byte order of the last 4 bytes of the
stream (the checksum trailer) has been changed from big-endian to little-endian.
One remedy is to bypass the Adler-32 check, which allows all the data in the
stream to be extracted using InflaterInputStream.
To do so in code, do one of the following before creating the InflaterInputStream
object:
Option 1: Change the byte order of the last 4 bytes in the raw stream, i.e.
from little-endian to big-endian, before feeding the stream to
InflaterInputStream.
Option 2: If Option 1 fails, do the following:
inStm.skip(2); // inStm is the raw (compressed) stream; check skip()'s return value
Inflater inf = new Inflater(true); // true = nowrap: the zlib header and checksum fields are not used
Then create the InflaterInputStream object:
InflaterInputStream ifis = new InflaterInputStream(inStm, inf);
Skipping the first two bytes is required because they are the zlib header,
which a nowrap Inflater does not expect when the Adler-32 check is bypassed.
With this approach, even small data losses can be avoided.
For more information, see the similar question I raised on Stack Overflow:
http://stackoverflow.com/questions/33348192/attached-code-throws-java-util-zip-zipexception-incorrect-data-check-for-given
{quote}
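A minimal Java sketch of Option 1, assuming the compressed stream is already in a byte array that ends exactly with the 4-byte Adler-32 trailer (real PDF stream data may carry trailing end-of-line bytes, which would have to be stripped first). The class and method names (`AdlerTrailerFix`, `swapTrailer`, `inflate`, `deflate`) are hypothetical; only JDK `java.util.zip` classes are used, and `readAllBytes()` needs Java 9 or later. The demo builds a valid zlib stream with `Deflater`, reverses its trailer to simulate the little-endian checksum described in the comment, shows that standard inflation rejects it, and then restores the big-endian order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

public class AdlerTrailerFix {

    // Hypothetical helper: returns a copy of a zlib stream whose last 4 bytes
    // (the Adler-32 trailer) are reversed, turning a little-endian trailer
    // back into the big-endian order zlib expects (and vice versa).
    // Assumes the array ends exactly at the trailer.
    static byte[] swapTrailer(byte[] zlib) {
        if (zlib.length < 6) { // 2-byte header + 4-byte trailer at minimum
            throw new IllegalArgumentException("too short to be a zlib stream");
        }
        byte[] fixed = Arrays.copyOf(zlib, zlib.length);
        int start = fixed.length - 4;
        for (int i = 0; i < 2; i++) {
            byte tmp = fixed[start + i];
            fixed[start + i] = fixed[fixed.length - 1 - i];
            fixed[fixed.length - 1 - i] = tmp;
        }
        return fixed;
    }

    // Standard zlib inflation: the Adler-32 trailer IS verified here.
    static byte[] inflate(byte[] zlib) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(zlib))) {
            return in.readAllBytes();
        }
    }

    // Produces a zlib stream: 2-byte header, deflate data, big-endian Adler-32.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int len = deflater.deflate(buf);
            out.write(buf, 0, len);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] broken = swapTrailer(deflate(content)); // simulate a little-endian trailer
        try {
            inflate(broken);
            System.out.println("checksum unexpectedly accepted");
        } catch (ZipException e) {
            System.out.println("standard inflate failed: " + e.getMessage());
        }
        byte[] repaired = inflate(swapTrailer(broken)); // Option 1: restore big-endian order
        System.out.println(new String(repaired, StandardCharsets.US_ASCII));
    }
}
```

Reversing the trailer of a stream whose checksum was already correct would break it, so a caller would normally try the standard path first, apply the swap only after a `ZipException`, and fall back to Option 2 if that also fails.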
> Skip zlib-header and checksum to avoid DataFormatException
> ----------------------------------------------------------
>
> Key: PDFBOX-3201
> URL: https://issues.apache.org/jira/browse/PDFBOX-3201
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
>
> This is a follow up to PDFBOX-2976
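The technique named in the issue title (Option 2 of the quoted comment) can be sketched in Java as follows. This is a minimal illustration, not PDFBox's actual filter code; the class and method names (`RawInflateSketch`, `inflateIgnoringChecksum`, `deflate`) are hypothetical. Instead of calling `skip(2)`, whose return value may be smaller than requested, it reads the two header bytes and checks them (compression method 8 and `(CMF * 256 + FLG) % 31 == 0`, per RFC 1950), and it rejects streams that declare a preset dictionary, since those carry 4 extra header bytes. `new Inflater(true)` then decodes the raw deflate data, and the Adler-32 trailer is simply left unread. `readAllBytes()` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

public class RawInflateSketch {

    // Consumes and validates the 2-byte zlib header (CMF, FLG), then inflates
    // the raw deflate data that follows. With nowrap=true the Inflater neither
    // expects a header nor verifies the Adler-32 trailer.
    static byte[] inflateIgnoringChecksum(InputStream zlib) throws IOException {
        DataInputStream in = new DataInputStream(zlib);
        int cmf = in.readUnsignedByte();
        int flg = in.readUnsignedByte();
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0) {
            throw new ZipException("not a zlib header with deflate compression");
        }
        if ((flg & 0x20) != 0) { // FDICT set: a 4-byte dictionary id would follow
            throw new ZipException("preset dictionaries are not handled in this sketch");
        }
        Inflater inflater = new Inflater(true); // nowrap: raw deflate, no checksum
        try (InputStream raw = new InflaterInputStream(in, inflater)) {
            return raw.readAllBytes(); // stops after the final deflate block
        } finally {
            inflater.end(); // close() does not end a caller-supplied Inflater
        }
    }

    // Produces a zlib stream: 2-byte header, deflate data, big-endian Adler-32.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int len = deflater.deflate(buf);
            out.write(buf, 0, len);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] zlib = deflate(content);
        // Zero the Adler-32 trailer so a standard InflaterInputStream would fail.
        for (int i = zlib.length - 4; i < zlib.length; i++) {
            zlib[i] = 0;
        }
        byte[] recovered = inflateIgnoringChecksum(new ByteArrayInputStream(zlib));
        System.out.println(new String(recovered, StandardCharsets.US_ASCII)); // prints the original content
    }
}
```

Validating the header rather than skipping it blindly means a stream that is not zlib-wrapped at all fails with a clear message instead of being decoded from the wrong offset.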
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)