See https://issues.apache.org/jira/browse/PDFBOX-5838
I hope that it's all the same problem.
Tilman
On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests.
The exceptions part looks good, but I'm afraid we have a text
extraction issue.
commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI
Some of the special characters changed. In 2.0.31 they were "omitted"
and in 2.0.32 there is some special char. But the remaining part looks
good to me.
cc-main-2021-31-pdf-untruncated/0085/0085885.pdf
It seems to contain some special characters as well, but 2.0.31 is
able to extract them. 2.0.32 seems to mix up some of the content.
I guess it is somehow font related. I need to investigate more.
Andreas
On 12.06.24 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz
No new exceptions but many content differences. I haven't
investigated yet.
Tilman
On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the
results tomorrow.
Tilman
On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update.
I'm going to postpone the release as I'll need any helping hand I
can get.
Andreas
On 02.06.24 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time
Tilman
On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,
IMHO it is time to cut another 2.0.x release.
I'm planning to do so in a week or so.
Any objections or is there something we should add/fix first?
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org