On 09.04.2023 17:35, Andreas Lehmkuehler wrote:
Hi,

I've fixed the issue with 2 of the 3 pdfs.

GHOSTSCRIPT-702891-0.pdf is left as the only problematic pdf. I didn't find a solution that fixes the regressions and still fixes the original issue from PDFBOX-5178. The parser from the trunk is able to handle that pdf well.

IMHO we should leave it alone, as it is malformed and doesn't contain any useful content. More importantly, it is one pdf out of hundreds of thousands, just a corner case.

WDYT?

I agree!

Tilman



Andreas

On 05.04.23 at 08:10, Andreas Lehmkuehler wrote:
On 04.04.23 at 07:40, Andreas Lehmkuehler wrote:
On 03.04.23 at 19:50, Tim Allison wrote:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-20230403-reports.tgz

Haven't had a chance to take a look yet. :(
Thanks Tim!

There are still 5 new exceptions listed. All of them are related to the very same change coming from PDFBOX-5178, which I fixed the other day. But these cases are different, and the trunk is affected as well. My bad for not having taken a deeper look in the first place.

I'm going to investigate those issues.
All pdfs are more or less broken. Two of them are totally useless, and the new exception just replaces a different one. The other three contain some more or less readable content, and we are hitting the well-known dilemma: should we stop reading once we hit something bad, or should we try to read as much as possible and maybe run into much bigger issues than before?

I guess these are all some special corner cases. I'm still thinking about a solution to support both strategies.

Andreas


Andreas



On Mon, Apr 3, 2023 at 6:53 AM Tilman Hausherr <thaush...@t-online.de> wrote:

Don't wait please
Thanks
Tilman



--- Original Message ---
From: Tim Allison
Subject: Re: Fwd: 2.0.28 release?
Date: 03 April 2023, 12:47
To: dev@pdfbox.apache.org




Y. I can kick that off now. Or should I wait?

On Sat, Apr 1, 2023 at 2:06 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:

@Tim
Is there any chance to re-run the tests?

Andreas

On 01.04.23 at 17:08, Andreas Lehmkuehler wrote:
On 01.04.23 at 17:05, Andreas Lehmkuehler wrote:

I've accidentally sent this to Tim only :-|

-------- Forwarded Message --------
Subject: Re: 2.0.28 release?
Date: Fri, 31 Mar 2023 07:50:10 +0200
From: Andreas Lehmkuehler <andr...@lehmi.de>
To: Tim Allison <talli...@apache.org>

On 30.03.23 at 16:27, Tim Allison wrote:
Reports are here:

<https://corpora.tika.apache.org/base/reports/pdfbox-2.0.27-v-2.0.28-SNAPSHOT.tgz>
Thanks Tim.

Looks like we have a regression. There are a handful of new exceptions. Some of them just replace another exception, and it is unclear whether the result is better or worse. But at least one of the pdfs works in 2.0.27 and doesn't in 2.0.28:

bug_trackers/PDFBOX/PDFBOX-4424-1.pdf

I'll have a look.
The regression was related to PDFBOX-5178. I've fixed it so that the exceptions should be gone.

Andreas



Andreas


On Tue, Mar 28, 2023 at 10:42 PM Tilman Hausherr <thaush...@t-online.de> wrote:

Yes please!

Thanks

Tilman

On 28.03.2023 19:22, Tim Allison wrote:
+1

Should I run the regression tests now or is there anything else text-related that is still being worked on?

On Tue, Mar 28, 2023 at 1:05 PM Tilman Hausherr <thaush...@t-online.de> wrote:
+1

Tilman

On 28.03.2023 08:46, Andreas Lehmkuehler wrote:
Hi,

How about cutting a 2.0.28 release next week on Monday?

There are a bunch of solved tickets, and the last release dates back 6 months.

Andreas




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org


