Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789

I have to postpone the release and solve the issue first.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find any visual difference except for the file sizes. This 
might be due to the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

After closing https://issues.apache.org/jira/browse/PDFBOX-5838 
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes, such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, there is nothing to do.
And I know how long it takes: 3 hours for the A (or B) test, and 
1 hour to create the A vs B report (tika-eval).


Tilman
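For readers unfamiliar with the A/B procedure: the same corpus is processed once with the old version (A, 2.0.31) and once with the new one (B, 2.0.32), and the two outputs are compared. With the timings above, a full cycle is 3 + 3 + 1 = 7 hours, and redoing only the B side plus the report takes 3 + 1 = 4 hours. The Java sketch below is a hypothetical, much simplified stand-in for the comparison step; it is not tika-eval's code or API, and it assumes each run has already been reduced to a map from file name to extracted text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical, much simplified A vs B comparison: each map holds file name -> extracted
 * text for one run (A = old version, B = new version). Not tika-eval's implementation.
 */
final class AbTextComparison {

    private AbTextComparison() {
    }

    /** One report line per file that is missing from a run or whose extracted text differs. */
    static List<String> compare(Map<String, String> runA, Map<String, String> runB) {
        // union of file names, sorted so the report order is stable between runs
        TreeSet<String> files = new TreeSet<>(runA.keySet());
        files.addAll(runB.keySet());
        List<String> report = new ArrayList<>();
        for (String file : files) {
            String a = runA.get(file);
            String b = runB.get(file);
            if (a == null) {
                report.add("ONLY_IN_B " + file);
            } else if (b == null) {
                report.add("ONLY_IN_A " + file);
            } else if (!a.equals(b)) {
                report.add("CHANGED   " + file);
            }
        }
        return report;
    }
}
```

The report discussed in this thread covers more than this (for example an exceptions part as well as content differences); the sketch only checks exact text equality per file.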

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with the change from PDFBOX-5790 
locally reverted and my proposed xmpbox change from PDFBOX-5835 
locally applied. This way we'll know whether there are other 
problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted", and in 2.0.32 some special character appears instead. 
But the remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix up some of 
the content.


I guess it is somehow font related. I need to investigate further.

Andreas
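A quick way to narrow down "special character" differences like these is to list the non-ASCII code points that appear in only one version's output. The sketch below is a hypothetical JDK-only helper, not part of PDFBox or tika-eval; it assumes the extracted text of the same file from 2.0.31 and 2.0.32 is already available as two strings.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of PDFBox: compares non-ASCII code points of two extractions. */
final class SpecialCharDelta {

    private SpecialCharDelta() {
    }

    /** All code points above U+007F in the text, sorted numerically. */
    static TreeSet<Integer> nonAscii(String text) {
        return text.codePoints()
                .filter(cp -> cp > 0x7F)
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Non-ASCII code points that occur in text but not in other. */
    static Set<Integer> onlyIn(String text, String other) {
        TreeSet<Integer> result = nonAscii(text);
        result.removeAll(nonAscii(other));
        return result;
    }

    /** Formats code points as "U+XXXX", separated by spaces, for a report line. */
    static String describe(Set<Integer> codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        }
        return sb.toString();
    }
}
```

Calling describe(onlyIn(textB, textA)) lists what the newer version produced that the older one did not. It only shows which characters changed; it says nothing about which font or code path caused the change.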


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix 
first?


Andreas



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org




[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864013#comment-17864013
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919053 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919053 ]

PDFBOX-5789: move phase of ant task to deploy

> Remove release subproject
> -
>
> Key: PDFBOX-5789
> URL: https://issues.apache.org/jira/browse/PDFBOX-5789
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.30, 3.0.2 PDFBox, 4.0.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> PDFBOX-5699 introduced the new subproject "release" in order to fix an issue 
> with the SCM URL. 
> In hindsight, this turned out to be a problem. The release project doesn't 
> include any source code and is therefore excluded from the source zip. But as 
> it is still part of the project itself, the build breaks if someone uses the 
> zip to build PDFBox.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




Build failed in Jenkins: PDFBox » PDFBox-2.0.x #1263

2024-07-08 Thread Apache Jenkins Server
See 


Changes:

[Andreas Lehmkühler] rollback release preparation due to a build issue

[Andreas Lehmkühler] [maven-release-plugin] prepare for next development 
iteration

[Andreas Lehmkühler] [maven-release-plugin] prepare release 2.0.32


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on builds24 (ubuntu) in workspace 

Cleaning up 
Deleting 

Deleting 

Deleting 

Deleting 

Deleting 

Deleting 

Deleting 
Deleting 

Deleting 

Deleting 

Deleting 

Updating https://svn.apache.org/repos/asf/pdfbox/branches/2.0 at revision 
'2024-07-08T18:51:08.545 +' --quiet
At revision 1919049

Parsing POMs
Modules changed, recalculating dependency graph
Established TCP socket on 45261
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[PDFBox-2.0.x] $ /home/jenkins/tools/java/latest11/bin/java -cp 
/home/jenkins/jenkins-agent/maven35-agent.jar:/home/jenkins/tools/maven/latest/boot/plexus-classworlds-2.7.0.jar:/home/jenkins/tools/maven/latest/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/latest 
/home/jenkins/jenkins-agent/agent.jar 
/home/jenkins/jenkins-agent/maven35-interceptor.jar 
/home/jenkins/jenkins-agent/maven3-interceptor-commons.jar 45261
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
 clean 
deploy -Ppedantic -Dmaven.source.skip=true
[INFO] Scanning for projects...
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Apache PDFBox parent                [pom]
[INFO] Apache FontBox                      [bundle]
[INFO] Apache XmpBox                       [bundle]
[INFO] Apache PDFBox                       [bundle]
[INFO] Apache Preflight                    [bundle]
[INFO] Apache Preflight application        [bundle]
[INFO] Apache PDFBox Debugger              [jar]
[INFO] Apache PDFBox tools                 [jar]
[INFO] Apache PDFBox application           [bundle]
[INFO] Apache PDFBox Debugger application  [bundle]
[INFO] Apache PDFBox examples              [jar]
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[INFO] 
[INFO] --< org.apache.pdfbox:pdfbox-parent >---
[INFO] Building Apache PDFBox parent 2.0.32  [1/11]
[INFO]   from pom.xml
[INFO] [ pom ]-
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ pdfbox-parent ---
[INFO] 
[INFO] --- enforcer:1.4.1:enforce (default) @ pdfbox-parent ---
[INFO] 
[INFO] --- enforcer:1.4.1:enforce (enforce-maven-version) @ pdfbox-parent ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer:1.17:check (check-java-version) @ pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- site:3.7:attach-descriptor (attach-descriptor) @ pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] --- 

Build failed in Jenkins: PDFBox » PDFBox-2.0.x » Apache PDFBox parent #1263

2024-07-08 Thread Apache Jenkins Server
See 


Changes:


--
Established TCP socket on 45261
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 

 clean deploy -Ppedantic -Dmaven.source.skip=true
[INFO] Scanning for projects...
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Apache PDFBox parent                [pom]
[INFO] Apache FontBox                      [bundle]
[INFO] Apache XmpBox                       [bundle]
[INFO] Apache PDFBox                       [bundle]
[INFO] Apache Preflight                    [bundle]
[INFO] Apache Preflight application        [bundle]
[INFO] Apache PDFBox Debugger              [jar]
[INFO] Apache PDFBox tools                 [jar]
[INFO] Apache PDFBox application           [bundle]
[INFO] Apache PDFBox Debugger application  [bundle]
[INFO] Apache PDFBox examples              [jar]
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[HUDSON] Collecting dependencies info
[INFO] 
[INFO] --< org.apache.pdfbox:pdfbox-parent >---
[INFO] Building Apache PDFBox parent 2.0.32  [1/11]
[INFO]   from pom.xml
[INFO] [ pom ]-
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ pdfbox-parent ---
[INFO] 
[INFO] --- enforcer:1.4.1:enforce (default) @ pdfbox-parent ---
[INFO] 
[INFO] --- enforcer:1.4.1:enforce (enforce-maven-version) @ pdfbox-parent ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer:1.17:check (check-java-version) @ pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- site:3.7:attach-descriptor (attach-descriptor) @ pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] --- source:3.0.1:jar-no-fork (attach-sources) @ pdfbox-parent ---
[INFO] Skipping source per configuration.
[INFO] 
[INFO] --- apache-rat:0.16.1:check (default) @ pdfbox-parent ---
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 5 licenses.
[INFO] 
[INFO] --- dependency-check:10.0.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] Skipping the NVD API Update as it was completed within the last 240 
minutes
[INFO] Skipping Known Exploited Vulnerabilities update check since last check 
was within 24 hours.
[INFO] Check for updates complete (951 ms)
[INFO] 

Dependency-Check is an open source tool performing a best effort analysis of 
3rd party dependencies; false positives and false negatives may exist in the 
analysis performed by the tool. Use of the tool and the reporting provided 
constitutes acceptance for use in an AS IS condition, and there are NO 
warranties, implied or otherwise, with regard to the analysis or its use. Any 
use of the tool and the reporting provided is at the user's risk. In no event 
shall the copyright holder or OWASP be held liable for any damages whatsoever 
arising out of or in connection with the use of this tool, the analysis 
performed, or the resulting report.


   About ODC: 
https://jeremylong.github.io/DependencyCheck/general/internals.html
   False Positives: 
https://jeremylong.github.io/DependencyCheck/general/suppression.html

 Sponsor: https://github.com/sponsors/jeremylong


[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] Finished Hint Analyzer (0 seconds)
[INFO] Finished Version Filter Analyzer (0 seconds)
[INFO] Created CPE Index (3 seconds)
[INFO] Finished CPE Analyzer (3 seconds)
[INFO] Finished False Positive Analyzer (0 seconds)
[INFO] Finished NVD CVE Analyzer (0 seconds)
[INFO] Finished Sonatype OSS Index Analyzer (0 seconds)
[INFO] Finished Vulnerability Suppression Analyzer (0 seconds)
[INFO] Finished Known Exploited Vulnerability 

Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release and solve the issue first.

Sorry for the inconvenience

Andreas


[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863881#comment-17863881
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919041 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919041 ]

PDFBOX-5789: adjust path for release build




[jira] [Resolved] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5848.

Resolution: Fixed

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that causes an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it with PDFBox 3.0.2 (I haven't tried 
> other versions):
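> {code:java}
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> 
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> 
> public class SplitPdf {
>     private static void splitPdf(File fileToSplit) {
>         try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>             int documentPages = document.getNumberOfPages();
>             Splitter splitter = new Splitter();
>             // by default the Splitter returns one new PDDocument per source page
>             List<PDDocument> pages = splitter.split(document);
>             Iterator<PDDocument> iterator = pages.listIterator();
>             while (iterator.hasNext()) {
>                 PDDocument pd = iterator.next();
>                 // file name: original name + "-" + index of the part + ".pdf"
>                 pd.save(fileToSplit.getName() + "-" + pages.indexOf(pd) + ".pdf");
>                 pd.close();
>             }
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         }
>     }
> } {code}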
> The PDF file is attached to the issue
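The resolution above does not say what caused the loop. Purely as a generic illustration, and not PDFBox's actual fix, the sketch below shows the usual defence when walking a graph of objects that can reference each other (as objects inside a PDF can): remember what has already been visited, by identity, and skip it on a second encounter. GraphNode and CycleSafeWalk are hypothetical names; no PDFBox classes are used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical object graph: a node may reference any other node, including an ancestor. */
final class GraphNode {
    final String name;
    final List<GraphNode> refs = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }
}

final class CycleSafeWalk {

    private CycleSafeWalk() {
    }

    /** Depth-first walk that visits every reachable node exactly once, even with cycles. */
    static List<String> reachable(GraphNode start) {
        // identity-based set: two distinct objects are never treated as the same node
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<GraphNode, Boolean>());
        Deque<GraphNode> todo = new ArrayDeque<>();
        List<String> order = new ArrayList<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            GraphNode n = todo.pop();
            if (!seen.add(n)) {
                continue; // already visited: following it again could loop forever
            }
            order.add(n.name);
            for (GraphNode r : n.refs) {
                todo.push(r);
            }
        }
        return order;
    }
}
```

Without the seen set, a reference from a node back to an ancestor would keep the loop running indefinitely, which is the general shape of an "infinite loop" symptom like the one reported.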






[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863873#comment-17863873
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919032 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919032 ]

PDFBOX-5789: remove release subproject




Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

Looks good to me, I'm starting the release process ...





[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863760#comment-17863760
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 9fc2565315df666b0f2da18a7dafbbf959806836 in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=9fc2565 ]

PDFBOX-5660: update owasp plugin

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for improving code quality using the SonarQube 
> report, hints from different IDEs, the FindBugs tool and other code quality 
> tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863730#comment-17863730
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919013 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1919013 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863729#comment-17863729 ]

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919012 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1919012 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863728#comment-17863728 ]

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919011 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919011 ]

PDFBOX-5660: update owasp plugin




Re: PDFBox 2.0.32 release

2024-07-08 Thread Tilman Hausherr

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find any visual difference except in the file sizes. This
might be due to the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd 
like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes, such as
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent
refactoring in fontbox.


Andreas


On 14.06.24 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour 
to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests, locally reverting the change from
PDFBOX-5790 but locally applying my proposed xmpbox change from
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were
"omitted", while in 2.0.32 some special character appears instead. But the
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31
is able to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font-related. I need to investigate further.

Andreas


On 12.06.24 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz 



No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any problems, I'll have
the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas


