No need to IMHO
Tilman
--- Original Message ---
From: Andreas Lehmkühler
Subject: Re: PDFBox 2.0.29 release?
Date: 25 June 2023, 15:16
To: dev@pdfbox.apache.org
@Tilman thanks for fixing this
Should we run another test before cutting the release?
Andreas
On 03.06.23 at 05:53, Tilman Hausherr wrote:
Thank you. This is related to PDFBOX-5606. parseNextToken() closes
the content stream if an error occurs, but it sometimes calls itself.
Because the content stream is already closed, the method returns null,
which is reported together with the position. Trying to get the
position on a closed stream throws the exception.
Tilman
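The failure mode described above can be reproduced in isolation. The following is only a minimal sketch, not PDFBox code: ToyParser, nextTokenUnguarded(), nextTokenGuarded() and the '!' error marker are hypothetical names invented for illustration. It assumes nothing beyond java.io.PushbackInputStream, which also appears in the stack trace, and shows that a parser which closes its stream on error and then calls itself ends up reading from a closed stream. The guarded variant is one possible way to avoid that, not necessarily the fix applied for PDFBOX-5606.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ClosedStreamSketch {

    // Hypothetical toy parser; it only mimics the close-then-recurse pattern.
    static final class ToyParser {
        private final PushbackInputStream in;
        private boolean closed; // used by the guarded variant only

        ToyParser(String data) {
            in = new PushbackInputStream(
                    new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII)));
        }

        // Returns the next non-space character as a token, or null at end of input.
        // '!' stands in for a malformed token: the stream is closed, then the
        // method calls itself and reads from the closed stream.
        String nextTokenUnguarded() throws IOException {
            int c = in.read(); // throws IOException("Stream closed") after close()
            while (c == ' ') {
                c = in.read();
            }
            if (c == -1) {
                return null;
            }
            if (c == '!') {
                in.close();
                return nextTokenUnguarded();
            }
            return String.valueOf((char) c);
        }

        // Same logic, but remembers that the stream was closed and stops early.
        String nextTokenGuarded() throws IOException {
            if (closed) {
                return null;
            }
            int c = in.read();
            while (c == ' ') {
                c = in.read();
            }
            if (c == -1) {
                return null;
            }
            if (c == '!') {
                in.close();
                closed = true;
                return nextTokenGuarded();
            }
            return String.valueOf((char) c);
        }
    }

    // Runs the unguarded parser to the end; returns the exception message, or null if none.
    public static String unguardedMessage(String input) {
        ToyParser parser = new ToyParser(input);
        try {
            while (parser.nextTokenUnguarded() != null) {
                // consume tokens until end of input
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Runs the guarded parser to the end and collects the tokens it returned.
    public static List<String> guardedTokens(String input) throws IOException {
        ToyParser parser = new ToyParser(input);
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = parser.nextTokenGuarded()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(unguardedMessage("a ! b")); // prints "Stream closed"
        System.out.println(guardedTokens("a ! b"));    // prints "[a]"
    }
}
```

In this sketch the exception comes from the read() call itself, which matches the "Caused by" part of the trace (PushbackInputStream.ensureOpen). The real parser has more state, so the exact point of failure there may differ.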
On 02.06.2023 17:08, Tim Allison wrote:
Reports are here:
<https://corpora.tika.apache.org/base/reports/pdfbox-2.0.29-pre-rc1-reports.tgz>
One new exception, which is reproducible with the plain PDFBox app's
ExtractText:
<https://corpora.tika.apache.org/base/docs/govdocs1/819/819127.pdf>
Exception in thread "main" org.apache.tika.exception.TikaException: Unable to extract PDF content
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:212)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:199)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:164)
    at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:518)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256)
Caused by: java.io.IOException: Stream closed
    at java.base/java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:75)
    at java.base/java.io.PushbackInputStream.read(PushbackInputStream.java:132)
    at org.apache.pdfbox.pdfparser.InputStreamSource.read(InputStreamSource.java:47)
    at org.apache.pdfbox.pdfparser.BaseParser.skipSpaces(BaseParser.java:1257)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:138)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:548)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:155)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:363)
    at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:137)
    at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:1370)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:238)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:108)
On Wed, May 31, 2023 at 1:41 PM Tilman Hausherr <thaush...@t-online.de> wrote:
Yes please
Thanks
Tilman
On 31.05.2023 17:15, Tim Allison wrote:
+1
Let me know when/if I should run the text extraction regression tests.
On Thu, May 25, 2023 at 12:32 PM sahy...@fileaffairs.de <sahy...@fileaffairs.de> wrote:
+1
Maruan
On Wednesday, 24.05.2023 at 07:48 +0200, Andreas Lehmkuehler wrote:
Hi,
I am inclined to release 2.0.29 soon because of the regression that was
fixed with PDFBOX-5606.
WDYT?
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org