Re: PDFBox 2.0.32 release

2024-07-12 Thread Constantine Dokolas
It was really meant as a joke. Hope no one was insulted.
Funny thing though, Apache's status page showed nothing.

Have a great day,
C.D.

On Wed, Jul 10, 2024 at 3:54 PM Tilman Hausherr wrote:

> On 10.07.2024 09:46, Constantine Dokolas wrote:
> > It seems like you managed to take down pdfbox.apache.org... 
>
> It's back. Also, I don't think it was him; there was no website-related
> posting.
>
> Tilman
>


Re: PDFBox 2.0.32 release

2024-07-12 Thread Andreas Lehmkühler
I've fixed the tagging issue. I mixed up the pom when reverting the 
release preparation :-(


Now I'm facing another issue. Maven runs things in the wrong order: it tries
to gather all build artifacts at the beginning of the process, before they
are built. Still investigating ...


Sorry for the delay
Andreas


On 10.07.24 at 08:00, Andreas Lehmkühler wrote:

There is some issue with tagging the release when executing the
release:prepare goal.

I'm still searching :-(

Andreas

On 09.07.24 at 07:46, Andreas Lehmkühler wrote:

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:

There is an issue with the changes from
https://issues.apache.org/jira/browse/PDFBOX-5789

I have to postpone the release to solve that issue first.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find any visible difference except the file sizes. This
might be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

After closing https://issues.apache.org/jira/browse/PDFBOX-5838
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests, locally reverting the
change from PDFBOX-5790 but locally adding my proposed xmpbox
change from PDFBOX-5835. This way we'll know whether there are
other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a
text extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were
"omitted" and in 2.0.32 there is some special char. But the
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but
2.0.31 is able to extract them. 2.0.32 seems to mix up some of
the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there isn't any trouble, I'll
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any 
helping hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections, or is there something we should add/fix
first?


Andreas



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org




Re: PDFBox 2.0.32 release

2024-07-10 Thread Tilman Hausherr

On 10.07.2024 09:46, Constantine Dokolas wrote:

It seems like you managed to take down pdfbox.apache.org... 


It's back. Also, I don't think it was him; there was no website-related
posting.


Tilman


Re: PDFBox 2.0.32 release

2024-07-10 Thread Timo Boehme
I have to correct myself: lucene.apache.org seems to be a bad example,
since https://lucene.apache.org/core/ works; other projects work as well, e.g.
https://tika.apache.org/

Thus it seems pdfbox really has a problem.


Best regards,
Timo


On 10.07.24 at 09:54, Constantine Dokolas wrote:

Yikes!

C.D.

On Wed, Jul 10, 2024 at 10:52 AM Timo Boehme wrote:

No, it's the whole X.apache.org that is currently not available (e.g.
lucene.apache.org).

Best regards,
Timo


On 10.07.24 at 09:46, Constantine Dokolas wrote:

It seems like you managed to take down pdfbox.apache.org... 

C.D.

--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman



Re: PDFBox 2.0.32 release

2024-07-10 Thread Constantine Dokolas
Yikes!

C.D.

On Wed, Jul 10, 2024 at 10:52 AM Timo Boehme wrote:

> No, it's the whole X.apache.org that is currently not available (e.g.
> lucene.apache.org).
>
> Best regards,
> Timo
>
>
> On 10.07.24 at 09:46, Constantine Dokolas wrote:
> > It seems like you managed to take down pdfbox.apache.org... 
> >
> > C.D.

Re: PDFBox 2.0.32 release

2024-07-10 Thread Timo Boehme
No, it's the whole X.apache.org that is currently not available (e.g.
lucene.apache.org).


Best regards,
Timo


On 10.07.24 at 09:46, Constantine Dokolas wrote:

It seems like you managed to take down pdfbox.apache.org... 

C.D.

--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman



Re: PDFBox 2.0.32 release

2024-07-10 Thread Constantine Dokolas
It seems like you managed to take down pdfbox.apache.org... 

C.D.

--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman


On Wed, Jul 10, 2024 at 9:00 AM Andreas Lehmkühler wrote:

> There is some issue with tagging the release when executing the
> release:prepare goal
>
> I'm still searching  :-(
>
> Andreas

Re: PDFBox 2.0.32 release

2024-07-10 Thread Andreas Lehmkühler
There is some issue with tagging the release when executing the 
release:prepare goal


I'm still searching  :-(

Andreas


Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

It is still tricky but I'm on it.

Sorry for the noise

Andreas


Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first.

Sorry for the inconvenience

Andreas






Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

Looks good to me, I'm starting the release process ...





Re: PDFBox 2.0.32 release

2024-07-08 Thread Tilman Hausherr

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman




Re: PDFBox 2.0.32 release

2024-07-07 Thread Andreas Lehmkühler

@Tilman
Thanks again for running the tests.

Looks good to me, so I'm planning to cut the release tomorrow 
evening, about 28 hours from now.


Andreas





Re: PDFBox 2.0.32 release

2024-07-06 Thread Tilman Hausherr

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find any visual difference except for the file sizes. This might 
be because of the path names or some metadata.


Tilman




Re: PDFBox 2.0.32 release

2024-07-06 Thread Tilman Hausherr

Hi,

I've just started a new "B" test.

Tilman




Re: PDFBox 2.0.32 release

2024-07-06 Thread Andreas Lehmkühler

Hi,

After closing https://issues.apache.org/jira/browse/PDFBOX-5838, I'd like 
to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas





Re: PDFBox 2.0.32 release

2024-06-14 Thread Tilman Hausherr

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, there is nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour to 
create the A vs B report (tika-eval).


Tilman




Re: PDFBox 2.0.32 release

2024-06-14 Thread Tilman Hausherr
I'll repeat the regression tests, locally reverting the change from 
PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part looks 
good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-13 Thread Tilman Hausherr

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part looks 
good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-13 Thread Andreas Lehmkühler

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text extraction 
issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" and 
in 2.0.32 there is some special char. But the remaining part looks good 
to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is able 
to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-12 Thread Tilman Hausherr

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-12 Thread Tilman Hausherr
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 2.0.32 release

2024-06-05 Thread Andreas Lehmkühler

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 2.0.32 release

2024-06-02 Thread Tilman Hausherr

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas
