Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text extraction issue.

commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted", and in 2.0.32 some special character appears instead. But the remaining part looks good to me.


cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is able to extract them. 2.0.32 seems to mix up some of the content.

I guess it is somehow font-related. I need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update.

I'm going to postpone the release, as I'll need every helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

