No need to, IMHO.

Tilman
--- Original Message ---
From: Andreas Lehmkühler
Subject: Re: PDFBox 2.0.29 release?
Date: 25 June 2023, 15:16
To: dev@pdfbox.apache.org

@Tilman thanks for fixing this.

Should we run another test before cutting the release?

Andreas

On 03.06.23 at 05:53, Tilman Hausherr wrote:
> Thank you. This is related to PDFBOX-5606. parseNextToken() closes
> the content stream if an error occurs, but it sometimes calls itself.
> Because the content stream is closed, the method returns null, which is
> reported together with the position. Trying to get the position on a
> closed stream throws the exception.
>
> Tilman
>
> On 02.06.2023 17:08, Tim Allison wrote:
>> Reports are here:
>> <https://corpora.tika.apache.org/base/reports/pdfbox-2.0.29-pre-rc1-reports.tgz>
>>
>> One new exception, which is reproducible with the pure PDFBox app's
>> ExtractText.
>>
>> <https://corpora.tika.apache.org/base/docs/govdocs1/819/819127.pdf>
>>
>> Exception in thread "main" org.apache.tika.exception.TikaException: Unable to extract PDF content
>>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
>>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:212)
>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:199)
>>     at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:164)
>>     at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:518)
>>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
>>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256)
>> Caused by: java.io.IOException: Stream closed
>>     at java.base/java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:75)
>>     at java.base/java.io.PushbackInputStream.read(PushbackInputStream.java:132)
>>     at org.apache.pdfbox.pdfparser.InputStreamSource.read(InputStreamSource.java:47)
>>     at org.apache.pdfbox.pdfparser.BaseParser.skipSpaces(BaseParser.java:1257)
>>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:138)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:548)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
>>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:155)
>>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:363)
>>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:137)
>>     at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:1370)
>>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:238)
>>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:108)
>>
>> On Wed, May 31, 2023 at 1:41 PM Tilman Hausherr <thaush...@t-online.de> wrote:
>>
>>> Yes please
>>>
>>> Thanks
>>>
>>> Tilman
>>>
>>> On 31.05.2023 17:15, Tim Allison wrote:
>>>> +1
>>>>
>>>> Let me know when/if I should run the text extraction regression tests.
>>>>
>>>> On Thu, May 25, 2023 at 12:32 PM sahy...@fileaffairs.de <sahy...@fileaffairs.de> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Maruan
>>>>>
>>>>> On Wednesday, 24.05.2023 at 07:48 +0200, Andreas Lehmkuehler wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm inclined to release 2.0.29 soon because of the regression that
>>>>>> was fixed with PDFBOX-5606.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Andreas
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
>>>>>> For additional commands, e-mail: dev-h...@pdfbox.apache.org
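
For readers following the thread, here is a minimal, self-contained sketch of the failure mode Tilman describes in the quoted message: a parse method that closes its stream on error and also calls itself, so the outer call touches a stream the inner call already closed. This is not PDFBox code. The class ToyParser, the '!' and '[' markers, and the guard flag are all hypothetical names made up for illustration, and the guard is only one possible way to avoid the problem, not necessarily what the PDFBOX-5606 follow-up fix does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class ClosedStreamSketch {

    /** Toy stand-in for a recursive token parser; hypothetical, not PDFBox. */
    static final class ToyParser {
        private final PushbackInputStream in;
        private final boolean guard;   // hypothetical switch: check "closed" before touching the stream
        private boolean closed = false;

        ToyParser(String data, boolean guard) {
            this.in = new PushbackInputStream(
                    new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII)));
            this.guard = guard;
        }

        /** Returns the next token, or null at end of input or after an error. */
        String parseNextToken() throws IOException {
            int c = in.read();
            if (c == -1) {
                return null;
            }
            if (c == '!') {            // '!' plays the role of malformed input
                in.close();            // error handling closes the stream
                closed = true;
                return null;
            }
            if (c == '[') {            // '[' makes the method call itself
                String inner = parseNextToken();
                if (inner == null) {
                    if (guard && closed) {
                        return null;   // do not touch a stream the inner call closed
                    }
                    // Unguarded: the outer call reads from the stream to report
                    // where parsing stopped, which fails if it is already closed.
                    return "stopped before byte " + in.read();
                }
                return "[" + inner;
            }
            return String.valueOf((char) c);
        }
    }

    /** Runs the toy parser once and turns the outcome into a string. */
    static String run(String data, boolean guard) {
        try {
            return String.valueOf(new ToyParser(data, guard).parseNextToken());
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("[!", false)); // inner call closes, outer call reads
        System.out.println(run("[!", true));  // guarded variant returns null instead
        System.out.println(run("[a", false)); // well-formed input is unaffected
    }
}
```

In the unguarded run the IOException comes from PushbackInputStream itself, the same class at the bottom of the "Caused by" section of the stack trace above.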