Hi,
I've just started a new "B" test.
Tilman
On 06.07.2024 13:29, Andreas Lehmkühler wrote:
Hi,
after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd
like to finally cut the 2.0.32 release.
Do we need a new regression test due to the latest changes?
There are some related changes, such as
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent
refactoring in fontbox.
Andreas
On 14.06.24 at 13:03, Tilman Hausherr wrote:
Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz
From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour
to create the A vs B report (tika-eval).
Tilman
On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests while locally reverting the change
from PDFBOX-5790 but locally adding my proposed xmpbox change from
PDFBOX-5835. This way we'll know whether there are other problems.
Tilman
On 13.06.2024 19:23, Tilman Hausherr wrote:
See https://issues.apache.org/jira/browse/PDFBOX-5838
I hope that it's all the same problem.
Tilman
On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests.
The exceptions part looks good, but I'm afraid we have a text
extraction issue.
commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI
Some of the special characters changed. In 2.0.31 they were
"omitted" and in 2.0.32 there is some special char. But the
remaining part looks good to me.
cc-main-2021-31-pdf-untruncated/0085/0085885.pdf
It seems to contain some special characters as well, but 2.0.31
is able to extract them. 2.0.32 seems to mix some of the content.
I guess it is somehow font-related. I need to investigate more.
Andreas
On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz
No new exceptions but many content differences. I haven't
investigated yet.
Tilman
On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have
the results tomorrow.
Tilman
On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update.
I'm going to postpone the release as I'll need any helping hand
I can get.
Andreas
On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time
Tilman
On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,
IMHO it is time to cut another 2.0.x release.
I'm planning to do so in a week or so.
Any objections or is there something we should add/fix first?
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org