> "Allison, Timothy B." <talli...@mitre.org> wrote on 28 March 2016 at 21:02:
> 
> 
> Oh, wow, so it really might be possible without too much work?  I'm more than
> happy to supply examples. :) 
Oops, it isn't as simple as it sounds. If we simply swallow the exception, PDFBox
will most likely run into an NPE. IMHO we have to implement some sort of on-demand
parser that can handle null values for specific parts of a PDF without
throwing an exception.
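
To illustrate the idea, here is a minimal sketch in plain Java. It is not PDFBox code: the class LenientDictionarySketch and its methods are hypothetical, and the whitespace-separated "dictionary" syntax is a toy stand-in for real PDF parsing. It only shows the two halves of the approach: the parser records null for a value that was cut off instead of throwing, and the consumer checks for null instead of dereferencing blindly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in PDFBox.
final class LenientDictionarySketch {

    /**
     * Parses whitespace-separated "/Key value" pairs, optionally wrapped in
     * "<<" and ">>". If the input is truncated, the key whose value is
     * missing is mapped to null and the entries read so far are returned.
     */
    static Map<String, String> parse(String input) {
        Map<String, String> entries = new LinkedHashMap<>();
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return entries;
        }
        String[] tokens = trimmed.split("\\s+");
        int i = 0;
        if (tokens[i].equals("<<")) {
            i++; // skip the opening delimiter
        }
        while (i < tokens.length && !tokens[i].equals(">>")) {
            String key = tokens[i++];
            if (!key.startsWith("/")) {
                continue; // not a name token: skip it instead of failing
            }
            String value = null; // stays null if the input ends here
            if (i < tokens.length && !tokens[i].equals(">>")) {
                value = tokens[i++];
            }
            entries.put(key, value);
        }
        return entries;
    }

    /** Null-tolerant consumer: a missing or malformed /Length yields the fallback. */
    static int lengthOrDefault(Map<String, String> dict, int fallback) {
        String raw = dict.get("/Length");
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

With a truncated input such as "<< /Type /Page /Length", parse keeps /Type and maps /Length to null, and lengthOrDefault returns the fallback rather than failing with an NPE.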

> Should I open an issue?
Thanks, but I'm going to do that soon, as some other things should be done as
well.

BR
Andreas
> 
> 
> -----Original Message-----
> From: Andreas Lehmkuehler [mailto:andr...@lehmi.de] 
> Sent: Monday, March 28, 2016 10:58 AM
> To: dev@pdfbox.apache.org
> Subject: Re: shading/relocating 1.8.x?
> 
> On 25.03.2016 at 17:39, John Hewson wrote:
> >
> >> On 23 Mar 2016, at 06:20, Allison, Timothy B. <talli...@mitre.org> wrote:
> >>
> >> All,
> >>   We've upgraded to 2.0.0 on Tika.  Many thanks again!
> >>   One of our users is interested in continuing to use the
> >> classic/SequentialParser, or at least having it available as a back-off
> >> parser for corrupt pdfs [0].
> >
> > Using the old parser really isn’t a good idea, it’s known to be pretty
> > broken. I think that we would be much better off making sure the new parser
> > can handle truncated files. We already do a lot of repair in the new parser,
> > so this doesn’t seem like too much work? Maybe Andreas can comment further?
> The biggest issue here is a truncated stream or dictionary. The current
> version simply throws an exception when it runs into such cases. We
> have to implement some algorithm to ignore such incomplete parts of a PDF if
> possible.
> 
> BR
> Andreas
> 
> >
> > Do we have some JIRA issues which identify some of these cases?
> >
> > — John
> >
> >>   Would you be willing to distribute a shaded/relocated 1.8.x app so that
> >> we could load both 1.8.x and 2.0.0 in the same jvm without collisions?  Or,
> >> is there a better solution?
> >
> > I wouldn’t recommend doing that, because you’re going to be stuck with using
> > 1.8 for everything, not just parsing, at least as far as corrupt/truncated
> > files are concerned.
> >
> > — John
> >
> >>   Thank you!
> >>
> >>               Cheers,
> >>
> >>                          Tim
> >>
> >> [0]
> >> https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208360#comment-15208360
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> >>