> "Allison, Timothy B." <talli...@mitre.org> wrote on 28 March 2016 at 21:02:
>
>
> Oh, wow, so it really might be possible without too much work? I'm more than
> happy to supply examples. :)

Oops, it isn't as simple as it sounds. If we simply swallow the exception, PDFBox will most likely run into an NPE. IMHO we have to implement some sort of on-demand parser that is able to handle null values for specific parts of a PDF without throwing any exception.
> Should I open an issue?

Thanks, but I'm going to do that soon, as some other things should be done as well.

BR
Andreas

>
>
> -----Original Message-----
> From: Andreas Lehmkuehler [mailto:andr...@lehmi.de]
> Sent: Monday, March 28, 2016 10:58 AM
> To: dev@pdfbox.apache.org
> Subject: Re: shading/relocating 1.8.x?
>
> On 25.03.2016 at 17:39, John Hewson wrote:
> >
> >> On 23 Mar 2016, at 06:20, Allison, Timothy B. <talli...@mitre.org> wrote:
> >>
> >> All,
> >> We've upgraded to 2.0.0 on Tika. Many thanks again!
> >> One of our users is interested in continuing to use the
> >> classic/SequentialParser, or at least having it available as a back-off
> >> parser for corrupt pdfs [0].
> >
> > Using the old parser really isn’t a good idea, it’s known to be pretty
> > broken. I think that we would be much better off making sure the new parser
> > can handle truncated files. We already do a lot of repair in the new parser,
> > so this doesn’t seem like too much work? Maybe Andreas can comment further?
> The biggest issue here is the truncated stream or dictionary. The current
> version simply throws an exception when running into such constellations. We
> have to implement some algorithm to ignore such incomplete parts of a pdf if
> possible.
>
> BR
> Andreas
>
> >
> > Do we have some JIRA issues which identify some of these cases?
> >
> > — John
> >
> >> Would you be willing to distribute a shaded/relocated 1.8.x app so that
> >> we could load both 1.8.x and 2.0.0 in the same jvm without collisions? Or,
> >> is there a better solution?
> >
> > I wouldn’t recommend doing that, because you’re going to be stuck with using
> > 1.8 for everything, not just parsing, at least as far as corrupt/truncated
> > files are concerned.
> >
> > — John
> >
> >> Thank you!
> >>
> >> Cheers,
> >>
> >> Tim
> >>
> >> [0]
> >> https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208360#comment-15208360
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: dev-h...@pdfbox.apache.org