On 09.04.2023 17:35, Andreas Lehmkuehler wrote:
Hi,
I've fixed the issue with 2 of the 3 pdfs.
GHOSTSCRIPT-702891-0.pdf is left as the only problematic pdf. I didn't
find a solution which fixes the regressions and still fixes the
original issue from PDFBOX-5178. The parser from the trunk is able to
handle that pdf well.
IMHO we should leave it alone, as it is malformed and doesn't contain
any useful content. More importantly, it is one pdf out of hundreds of
thousands, just a corner case.
WDYT?
I agree!
Tilman
Andreas
On 05.04.23 08:10, Andreas Lehmkuehler wrote:
On 04.04.23 07:40, Andreas Lehmkuehler wrote:
On 03.04.23 19:50, Tim Allison wrote:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz
Haven't had a chance to take a look yet. :(
Thanks Tim!
There are still 5 new exceptions listed. All of them are related to
the very same change coming from PDFBOX-5178 which I fixed the
other day. But these cases are different and the trunk is affected
as well. My bad for not taking a deeper look in the first place.
I'm going to investigate those issues.
All pdfs are more or less broken. Two of them are totally useless and
the new exception just replaces another one. The other three contain some
more or less readable content and we are hitting the well-known
dilemma: should we stop reading once we hit something bad, or should
we try to read as much as possible and maybe run into much bigger
issues than before?
I guess these are all special corner cases. I'm still thinking
about a solution to support both strategies.
Andreas
Andreas
On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr
<thaush...@t-online.de> wrote:
Don't wait please
Thanks
Tilman
--- Original Message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: 03 April 2023, 12:47
To: dev@pdfbox.apache.org
Y. I can kick that off now. Or should I wait?
On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:
@Tim
Is there any chance to re-run the tests?
Andreas
On 01.04.23 17:08, Andreas Lehmkuehler wrote:
On 01.04.23 17:05, Andreas Lehmkuehler wrote:
I accidentally sent this to Tim only :-|
-------- Forwarded Message --------
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>
On 30.03.23 16:27, Tim Allison wrote:
Reports are here:
<https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz>
Thanks Tim.
Looks like we have a regression. There is a handful of new exceptions.
Some of them just replace another exception and it is unclear if the
result is better or worse. But at least one of the pdfs works in 2.0.27
and doesn't in 2.0.28:
bug_trackers/PDFBOX/PDFBOX-4424-1.pdf
I'll have a look
The regression was related to PDFBOX-5178. I've fixed it so that the
exceptions should be gone.
Andreas
Andreas
On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr
<thaush...@t-online.de> wrote:
Yes please!
Thanks
Tilman
On 28.03.2023 19:22, Tim Allison wrote:
+1
Should I run the regression tests now or is there anything else
text-related that is still being worked on?
On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr
<thaush...@t-online.de> wrote:
+1
Tilman
On 28.03.2023 08:46, Andreas Lehmkuehler wrote:
Hi,
How about cutting a 2.0.28 release next week on Monday?
There is a bunch of solved tickets and the last release dates
back 6 months.
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org