Re: owasp fails

2024-10-22 Thread Andreas Lehmkühler

Looks like the issue was fixed.

On 21.10.24 at 20:55, Tilman Hausherr wrote:

The bug has been reported here

https://github.com/jeremylong/DependencyCheck/issues/7067



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







Re: Apache PDFBox Board Report October 2024 due

2024-10-08 Thread Andreas Lehmkühler

I've posted the report as proposed.

Thanks for the reviews

Andreas

On 07.10.24 at 17:37, Andreas Lehmkühler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report wizard template which can be 
found at [1]


Any comments or additions are appreciated ...

Sorry for the short notice, but I wasn't able to prepare a report 
earlier due to some personal reasons.



## Description:
The mission of PDFBox is the creation and maintenance of software
related to Java library for working with PDF documents

## Project Status:
Current project status: Ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (15 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     3.0.3 was released on 2024-08-08.
     2.0.32 was released on 2024-07-24.
     2.0.31 was released on 2024-03-24.

## Community Health:
- there is a steady stream of contributions, bug reports and questions
  on the mailing lists
- it was a quieter quarter due to the holiday season
- another 3.0.x and 2.0.x will most likely be released before xmas




Apache PDFBox Board Report October 2024 due

2024-10-07 Thread Andreas Lehmkühler

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report wizard template which can be 
found at [1]


Any comments or additions are appreciated ...

Sorry for the short notice, but I wasn't able to prepare a report 
earlier due to some personal reasons.



## Description:
The mission of PDFBox is the creation and maintenance of software
related to Java library for working with PDF documents

## Project Status:
Current project status: Ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (15 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

3.0.3 was released on 2024-08-08.
2.0.32 was released on 2024-07-24.
2.0.31 was released on 2024-03-24.

## Community Health:
- there is a steady stream of contributions, bug reports and questions
  on the mailing lists
- it was a quieter quarter due to the holiday season
- another 3.0.x and 2.0.x will most likely be released before xmas




[ANNOUNCE] Apache PDFBox 3.0.3 released

2024-08-08 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 3.0.3. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 3.0.3

Introduction


The Apache PDFBox library is an open source Java tool for working with
PDF documents.

This is an incremental bugfix release based on the earlier 3.0.2
release. It contains a couple of fixes and small improvements.

A migration guide is available at
https://pdfbox.apache.org/3.0/migration.html. It is still a work in
progress and we are happy to include any valuable feedback from our
community.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug
[PDFBOX-5303] - preflight-app fails on Java 11+ with NoClassDefFoundError: javax/activation/DataSource
[PDFBOX-5784] - AppearanceGeneratorHelper assumes fontscale 1000
[PDFBOX-5785] - Issue with embedded Font and descendant Font
[PDFBOX-5786] - NPE in COSWriter.getObjectKey() when saving broken file
[PDFBOX-5787] - LCMS error 13: Mismatched alpha channels
[PDFBOX-5789] - Remove release subproject
[PDFBOX-5790] - Don't use a predefined CMap if a ToUnicode CMap is present
[PDFBOX-5792] - Regression NPE in Splitter
[PDFBOX-5794] - The content of the specified font is lost,Google Chrome can display it
[PDFBOX-5795] - Crash for Softmask with incorrect backdrop color components
[PDFBOX-5798] - Observable Timing Discrepancy (Timing Attack)
[PDFBOX-5799] - Page with thousands of content streams takes extremely long to render or extract
[PDFBOX-5802] - Black rectangle over image
[PDFBOX-5806] - Wrong font substitution for Wingdings
[PDFBOX-5809] - PDDocument#importPage slowed down by factor 1300
[PDFBOX-5810] - Wrong glyph in Single Substitution Format 2 extraction to map
[PDFBOX-5811] - Split aborts with broken destinations
[PDFBOX-5812] - IllegalStateException are thrown by surrogate pair character 𩸽
[PDFBOX-5822] - IllegalArgumentException: Parameter must be 1-based, but is 0 when using PDFTextStripperByArea
[PDFBOX-5825] - Files created with PDFMergerExample are not correct PDF/A
[PDFBOX-5826] - Missing /Subtype and /Type in Metadata not detected
[PDFBOX-5827] - Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
[PDFBOX-5829] - IOException: Error expected floating point numberactual='-12.-1'
[PDFBOX-5830] - NullPointerException: Cannot invoke "String.codePointAt(int)" because "uni" is null
[PDFBOX-5831] - Radio button can't be set
[PDFBOX-5832] - Error when writing a document with OutlineItems containing null SE objects
[PDFBOX-5835] - DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[PDFBOX-5839] - ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[PDFBOX-5841] - First split result document misses metadata after split
[PDFBOX-5842] - IllegalArgumentException: Width (26) and height (0) must be non-zero
[PDFBOX-5843] - There is an exception when getting embedded font, is it compatible?
[PDFBOX-5845] - potential memory leak in TrueTypeCollection.java
[PDFBOX-5848] - Infinite loop after splitting and saving PDF / giant result files
[PDFBOX-5850] - Add fix MNG-8180
[PDFBOX-5853] - the PDDocument.documentId does not seem to be written into the flat byteStream
[PDFBOX-5855] - Fix last step of the build process
[PDFBOX-5859] - StringIndexOutOfBoundsException in AppearanceGeneratorHelper
[PDFBOX-5861] - ClassCastException in SetLineJoinStyle.process()
[PDFBOX-5863] - bad comparison of byte with 128

New Feature
[PDFBOX-5808] - Add support for GSUB Lookup Type 3

Improvement
[PDFBOX-5675] - org.apache.pdfbox.filter.Filter#decode() Java heap space
[PDFBOX-5793] - Remove redundant values for an object key
[PDFBOX-5807] - JPEGFactory. Reduce logging severity when no image metadata is present
[PDFBOX-5814] - Limit overwrite warning to non empty files
[PDFBOX-5817] - Detect extreme componentCount values
[PDFBOX-5818] - Update unicode Scripts.txt
[PDFBOX-5819] - Make Type2CharStringParser thread-safe
[PDFBOX-5821] - Include a PDFA check with VeraPDF for CreatePDFATest
[PDFBOX-5823] - StringUtil.PATTERN_SPACE memory optmisation
[PDFBOX-5824] - Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[PDFBOX-5837] - Add center constructor parameter to PDFPageable and to pdfbox-app
[PDFBOX-5840] - When splitting, keep named page destinations that are part of target document(s)
[PDFBOX-5847] - Improve performance of FileSystemFontProvider.scanFonts()
[PDFBOX-5851] - When this PDF is rendered with the "f" Operator, a black screen appears.

Task
[PDFBOX-5820] - Investigate why we get "response contains wrong nonce value" during build tests



[RESULT][VOTE] Release Apache PDFBox 3.0.3

2024-08-08 Thread Andreas Lehmkühler

On 05.08.24 at 18:54, Andreas Lehmkühler wrote:

Please vote on releasing this package as Apache PDFBox 3.0.3.


   +1 Tilman Hausherr
   +1 Timo Boehme
   +1 Maruan Sahyoun
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas




[VOTE] Release Apache PDFBox 3.0.3

2024-08-05 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 3.0.3 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.3/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/3.0.3/

The SHA-512 checksum of the archive is 
ebec1e21abc3a185ad053f4c8a046cf57529446e886d3723602607bdb19749c6cfa4459747c8b60cce33e6d2387b77eae609d65d5df71dbf7579f9d0cd781ffa.
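For anyone who wants to check a downloaded archive against the published value, here is a minimal sketch using only the JDK. The class name Sha512Check and the command-line arguments are hypothetical and not part of the PDFBox build; the file path is whatever local copy of the archive you have.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: compares a file's SHA-512 with a published hex value.
final class Sha512Check {

    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hashes an in-memory byte array.
    static String sha512Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    // Streams the file through the digest so large archives are not loaded into memory.
    static String sha512Hex(Path file) throws IOException {
        MessageDigest digest = newDigest();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    // Case-insensitive comparison; surrounding whitespace in the published value is ignored.
    static boolean matches(String actualHex, String publishedHex) {
        return actualHex.equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded archive, args[1]: checksum copied from the vote mail
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

Run it with the archive path and the checksum from this mail as the two arguments. A matching checksum only shows the download is intact; the PGP signature is still what ties the archive to the release manager.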


Please vote on releasing this package as Apache PDFBox 3.0.3.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.3
[ ] -1 Do not release this package because...
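The voting rule above can be written down as a small check. This is only a sketch of how I read the rule, assuming "majority" means more binding +1 than binding -1 votes in addition to the minimum of three +1; the class and method names are made up for illustration.

```java
// Hypothetical illustration of the release vote rule quoted above.
final class ReleaseVote {

    // Passes with at least three binding +1 votes and more +1 than -1 votes.
    static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
        return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
    }

    public static void main(String[] args) {
        System.out.println(passes(3, 0)); // three +1 and no -1
        System.out.println(passes(2, 0)); // fewer than three +1
    }
}
```

Only PMC votes count towards the result; other votes are welcome but advisory.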


Here is my +1

Andreas




Re: PDFBox 3.0.3 release

2024-08-05 Thread Andreas Lehmkühler
Thanks for the fast feedback. I'm going to cut the release now. 
Hopefully it'll work this time :-o


Andreas


On 05.08.24 at 14:24, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.2_vs_3.0.3_3.tar.xz

The new exceptions and differences I had a closer look at are all 
because of Flate decompression errors, so I'd say go ahead.


Tilman

On 05.08.2024 09:34, Tilman Hausherr wrote:


@Tilman
Are there any issues with the text extraction, otherwise I'm going to 
cut the release this evening.


The text extraction differences seem all to be related to the Flate 
decompression errors, I'm gonna do another test that will hopefully be 
finished in time.


Tilman








Re: PDFBox 3.0.3 release

2024-08-04 Thread Andreas Lehmkühler




On 03.08.24 at 05:48, Tilman Hausherr wrote:

I thought I had posted the link, but it seems I didn't?! Here it is
https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.2_vs_3.0.3_2.tar.xz
I had a look at the new exception. The changes from [1] are responsible 
for the different behavior. IMHO that issue can be ignored. The file is 
a mess and 3.0.2 isn't able to extract anything. 3.0.3 simply produces 
another error message.


@Tilman
Are there any issues with the text extraction, otherwise I'm going to 
cut the release this evening.


Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5786


Tilman

On 01.08.2024 10:47, Tilman Hausherr wrote:
Thanks, I'll run another "B" and eval job but with the change from 
PDFBOX-5790 reverted, like I did for 2.0.32, so we get less noise in 
the content results.


Tilman

On 01.08.2024 07:56, Andreas Lehmkühler wrote:



On 31.07.24 at 11:45, Tilman Hausherr wrote:

On 31.07.2024 06:47, Andreas Lehmkühler wrote:
Bad news is there are a lot of new exceptions. Good news is, it 
looks like they are all the same.


I had a quick look and it seems to be related to [1]. I've tested 
some of the pdfs and they all contain corrupt streams. I guess the 
issue is a different exception handling in such cases. 3.0.2 
catches such exceptions when reading corrupt streams and 3.0.3 
seems to struggle and stops the parsing process.


Not the ones at the bottom of new_exceptions_in_B_details.xlsx (e.g. 
500436.pdf), although these also seem to be related to 
PDFBOX-5675. They do a rewind() near the end of the stream.
I've found a fix for both cases. Please rerun the tests whenever you 
have some cycles to do so.


Thanks in advance




Tilman






Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5675



Tilman





Re: PDFBox 3.0.3 release

2024-07-31 Thread Andreas Lehmkühler




On 31.07.24 at 11:45, Tilman Hausherr wrote:

On 31.07.2024 06:47, Andreas Lehmkühler wrote:
Bad news is there are a lot of new exceptions. Good news is, it looks 
like they are all the same.


I had a quick look and it seems to be related to [1]. I've tested some 
of the pdfs and they all contain corrupt streams. I guess the issue is 
a different exception handling in such cases. 3.0.2 catches such 
exceptions when reading corrupt streams and 3.0.3 seems to struggle 
and stops the parsing process.


Not the ones at the bottom of new_exceptions_in_B_details.xlsx (e.g. 
500436.pdf), although these also seem to be related to PDFBOX-5675. 
They do a rewind() near the end of the stream.
I've found a fix for both cases. Please rerun the tests whenever you 
have some cycles to do so.


Thanks in advance




Tilman






Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5675



Tilman





Re: PDFBox 3.0.3 release

2024-07-30 Thread Andreas Lehmkühler




On 30.07.24 at 21:11, Tilman Hausherr wrote:

regression test reports:

Thanks for running the tests



https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.2_vs_3.0.3.tar.xz

I haven't investigated the details, but it looks like work to do :-(
Bad news is there are a lot of new exceptions. Good news is, it looks 
like they are all the same.


I had a quick look and it seems to be related to [1]. I've tested some of 
the pdfs and they all contain corrupt streams. I guess the issue is a 
different exception handling in such cases. 3.0.2 catches such 
exceptions when reading corrupt streams and 3.0.3 seems to struggle and 
stops the parsing process.
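To illustrate the difference between the two behaviours, here is a small JDK-only sketch of lenient Flate decoding: it keeps whatever could be inflated before the stream turned out to be corrupt or truncated, instead of letting the exception abort the caller. This is not the PDFBox code from [1]; the class LenientFlate and its Result type are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical example, not PDFBox code: decode a zlib/Flate stream and tolerate corruption.
final class LenientFlate {

    // Bytes recovered so far and whether the stream was read to its regular end.
    static final class Result {
        final byte[] data;
        final boolean complete;

        Result(byte[] data, boolean complete) {
            this.data = data;
            this.complete = complete;
        }
    }

    static Result decode(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new Result(out.toByteArray(), true);
        } catch (IOException e) {
            // ZipException (corrupt data) and EOFException (truncated input) both end up here.
            // Keep the partial output so the caller can continue with what was recovered.
            return new Result(out.toByteArray(), false);
        }
    }
}
```

A strict variant would rethrow the IOException from the catch block, which stops the caller at the first broken stream. Which of the two is right for a parser depends on whether partial content is more useful than a clear failure.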



Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5675



Tilman





PDFBox 3.0.3 release

2024-07-29 Thread Andreas Lehmkühler

Hi,

2.0.32 is out of the door and I'd like to cut the next 3.0 release. How 
about next Monday?


WDYT? Any objections?

Andreas

P.S.: I have to fix the build process first, but I'm almost done. Maybe we 
have to redo the 2.0 release as there is an issue with the checksums, 
which I created manually due to the broken last step of the build 
process. :-(





[ANNOUNCE] Apache PDFBox 2.0.32 released

2024-07-24 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.32. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.32

Introduction


The Apache PDFBox library is an open source Java tool for working with
PDF documents.

This is an incremental bugfix release based on the earlier 2.0.31
release. It contains a couple of fixes and small improvements.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.


Release Contents


Bug
[PDFBOX-5303] - preflight-app fails on Java 11+ with NoClassDefFoundError: javax/activation/DataSource
[PDFBOX-5784] - AppearanceGeneratorHelper assumes fontscale 1000
[PDFBOX-5789] - Remove release subproject
[PDFBOX-5790] - Don't use a predefined CMap if a ToUnicode CMap is present
[PDFBOX-5792] - Regression NPE in Splitter
[PDFBOX-5794] - The content of the specified font is lost,Google Chrome can display it
[PDFBOX-5795] - Crash for Softmask with incorrect backdrop color components
[PDFBOX-5798] - Observable Timing Discrepancy (Timing Attack)
[PDFBOX-5802] - Black rectangle over image
[PDFBOX-5806] - Wrong font substitution for Wingdings
[PDFBOX-5809] - PDDocument#importPage slowed down by factor 1300
[PDFBOX-5811] - Split aborts with broken destinations
[PDFBOX-5822] - IllegalArgumentException: Parameter must be 1-based, but is 0 when using PDFTextStripperByArea
[PDFBOX-5825] - Files created with PDFMergerExample are not correct PDF/A
[PDFBOX-5826] - Missing /Subtype and /Type in Metadata not detected
[PDFBOX-5827] - Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
[PDFBOX-5829] - IOException: Error expected floating point numberactual='-12.-1'
[PDFBOX-5830] - NullPointerException: Cannot invoke "String.codePointAt(int)" because "uni" is null
[PDFBOX-5835] - DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[PDFBOX-5839] - ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[PDFBOX-5842] - IllegalArgumentException: Width (26) and height (0) must be non-zero
[PDFBOX-5843] - There is an exception when getting embedded font, is it compatible?
[PDFBOX-5848] - Infinite loop after splitting and saving PDF / giant result files


Improvement
[PDFBOX-5807] - JPEGFactory. Reduce logging severity when no image metadata is present
[PDFBOX-5813] - Add test for surrogate pair character 𩸽
[PDFBOX-5818] - Update unicode Scripts.txt
[PDFBOX-5821] - Include a PDFA check with VeraPDF for CreatePDFATest
[PDFBOX-5837] - Add center constructor parameter to PDFPageable and to pdfbox-app
[PDFBOX-5840] - When splitting, keep named page destinations that are part of target document(s)
[PDFBOX-5851] - When this PDF is rendered with the "f" Operator, a black screen appears.


Task
[PDFBOX-5820] - Investigate why we get "response contains wrong nonce value" during build tests



This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/




[RESULT][VOTE] Release Apache PDFBox 2.0.32

2024-07-24 Thread Andreas Lehmkühler

On 21.07.24 at 12:18, Andreas Lehmkühler wrote:

Please vote on releasing this package as Apache PDFBox 2.0.32.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas





Re: PDFBox 2.0.32 release

2024-07-21 Thread Andreas Lehmkühler

Hi,

sorry for the delay. Finally I've cut the release and pushed it out.


The bad news is, the build process is still broken. :-(

The good news is, the artifacts themselves are OK, but the last step somehow 
doesn't work. I've gathered all artifacts manually and created the 
sha512 checksum using the command line.


I've activated the detailed maven logging using -X and I hope it 
contains some hint where to look for the issue.


Cheers
Andreas

On 12.07.24 at 08:14, Andreas Lehmkühler wrote:
I've fixed the tagging issue. I mixed up the pom when reverting the 
release preparation :-(


Now I'm facing another issue. Maven runs in the wrong order. It tries to 
gather all build artifacts at the beginning of the process before they 
are built. Still investigating ...


Sorry for the delay
Andreas


On 10.07.24 at 08:00, Andreas Lehmkühler wrote:
There is some issue with tagging the release when executing the 
release:prepare goal


I'm still searching  :-(

Andreas

On 09.07.24 at 07:46, Andreas Lehmkühler wrote:

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some meta data.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the 
recent refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the 
change from PDFBOX-5790 but locally adding my proposed xmpbox 
change from PDFBOX-5835. This way we'll know whether there 
are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a 
text extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

it seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix some of 
the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any 
helping hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix 
first?


Andreas




[VOTE] Release Apache PDFBox 2.0.32

2024-07-21 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 2.0.32 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.32/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/2.0.32/

The SHA-512 checksum of the archive is 
bdad289bda79e78774dd4dedb8b0531f20382038e96232eb6c55508e2187ca3d7512072e87cd293fe6d1b9967d7e6a44f39b09a3af59872bf2f307275a65f546.


Please vote on releasing this package as Apache PDFBox 2.0.32.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.32
[ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: PDFBox 2.0.32 release

2024-07-11 Thread Andreas Lehmkühler
I've fixed the tagging issue. I mixed up the pom when reverting the 
release preparation :-(


Now I'm facing another issue. Maven runs in the wrong order. It tries to 
gather all build artifacts at the beginning of the process before they 
are built. Still investigating ...


Sorry for the delay
Andreas


On 10.07.24 at 08:00, Andreas Lehmkühler wrote:
There is some issue with tagging the release when executing the 
release:prepare goal


I'm still searching  :-(

Andreas

On 09.07.24 at 07:46, Andreas Lehmkühler wrote:

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some meta data.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the 
change from PDFBOX-5790 but locally adding my proposed xmpbox 
change from PDFBOX-5835. This way we'll know whether there are 
other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a 
text extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

it seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix some of 
the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any 
helping hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix 
first?


Andreas




Re: PDFBox 2.0.32 release

2024-07-09 Thread Andreas Lehmkühler
There is some issue with tagging the release when executing the 
release:prepare goal


I'm still searching  :-(

Andreas

On 09.07.24 at 07:46, Andreas Lehmkühler wrote:

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the 
change from PDFBOX-5790 but locally adding my proposed xmpbox 
change from PDFBOX-5835. This way we'll know whether there are 
other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix up some of 
the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix 
first?


Andreas




Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 
I'd like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the 
change from PDFBOX-5790 but locally adding my proposed xmpbox 
change from PDFBOX-5835. This way we'll know whether there are 
other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix up some of 
the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix 
first?


Andreas




Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler
There is an issue with the changes from 
https://issues.apache.org/jira/browse/PDFBOX-5789



I have to postpone the release to solve the issue first.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd 
like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 
hour to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the 
change from PDFBOX-5790 but locally adding my proposed xmpbox 
change from PDFBOX-5835. This way we'll know whether there are 
other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 
2.0.31 is able to extract them. 2.0.32 seems to mix up some of 
the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll 
have the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 2.0.32 release

2024-07-08 Thread Andreas Lehmkühler

Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:

Last one:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz

This is because of the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This 
might be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd 
like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour 
to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the change 
from PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 
is able to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have 
the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 2.0.32 release

2024-07-07 Thread Andreas Lehmkühler

@Tilman
Thanks again for running the tests.

Looks good to me, so I'm planning to cut the release tomorrow 
evening, about 28 hours from now.


Andreas


On 06.07.24 at 19:17, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This might 
be because of the path names or some metadata.


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd 
like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour 
to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the change 
from PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 
is able to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have 
the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-07-06 Thread Andreas Lehmkühler

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd like 
to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

 From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour to 
create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with locally reverting the change 
from PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part 
looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-13 Thread Andreas Lehmkühler

Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text extraction 
issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" and 
in 2.0.32 there is some special char. But the remaining part looks good 
to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is able 
to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas
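
For anyone comparing the extracted text of two versions by hand, here is a 
minimal sketch, assuming the two outputs are already available as plain 
strings. It lists the code points that show up in only one of them, which is 
roughly the kind of "special character" change described above. The class 
name ExtractionDiff, the method codePointsOnlyIn and the sample strings are 
made up for illustration; this is not how tika-eval builds its reports.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ExtractionDiff {

    /** Returns the code points that occur in text but not in other, in ascending order. */
    static Set<Integer> codePointsOnlyIn(String text, String other) {
        Set<Integer> result = text.codePoints().boxed()
                .collect(Collectors.toCollection(TreeSet::new));
        Set<Integer> seen = other.codePoints().boxed()
                .collect(Collectors.toSet());
        result.removeAll(seen);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical extraction results of the same page from two versions.
        String versionA = "Preis: 10 EUR";
        String versionB = "Preis: 10 \u20AC EUR";
        for (int cp : codePointsOnlyIn(versionB, versionA)) {
            System.out.printf("only in B: U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

Working on code points rather than chars keeps characters outside the BMP 
intact, which matters for fonts with unusual mappings.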


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






Re: PDFBox 2.0.32 release

2024-06-04 Thread Andreas Lehmkühler

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




PDFBox 2.0.32 release

2024-06-01 Thread Andreas Lehmkühler

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas




Re: about Scripts.txt

2024-03-31 Thread Andreas Lehmkühler




On 31.03.24 at 14:19, Tilman Hausherr wrote:
I ran a test a few days ago, the build worked, but I'm wondering what 
the effect should be?

New Unicode versions just add new Unicode mappings. I guess most of them 
are unrelated to OTF, but who knows, maybe some are useful.


However, I don't expect any real improvement, just an up-to-date version 
of the scripts file.


Andreas



Tilman

On 31.03.2024 14:01, Andreas Lehmkühler wrote:

Hi,

thanks for the pointer.

As far as I understand Unicode, this should be just an update with 
additional values, so IMHO there isn't any reason not to update the file.


WDYT?

Andreas

On 25.03.24 at 19:38, Dieter von Holten wrote:

Hi there,

while browsing through the sources I came across OpenTypeScript.java, which
loads the resource file Scripts.txt.

The file contains a list of circa 2700 Unicode codepoints.
The file is version 10.0.0 of 2017-03-11.

A reference points to a newer version of this file:

http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt

which is version 15.1.0 of 2023-07-28; it contains circa 3000 Unicode
codepoints.

I propose to investigate whether this newer file can be included in PDFBox
and works for older JDK versions, as supported for PDFBox 2.

Best regards

DvH












Re: about Scripts.txt

2024-03-31 Thread Andreas Lehmkühler

Hi,

thanks for the pointer.

As far as I understand Unicode, this should be just an update with 
additional values, so IMHO there isn't any reason not to update the file.


WDYT?

Andreas

On 25.03.24 at 19:38, Dieter von Holten wrote:

Hi there,

while browsing through the sources I came across OpenTypeScript.java, which
loads the resource file Scripts.txt.

The file contains a list of circa 2700 Unicode codepoints.
The file is version 10.0.0 of 2017-03-11.

A reference points to a newer version of this file:

http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt

which is version 15.1.0 of 2023-07-28; it contains circa 3000 Unicode
codepoints.

I propose to investigate whether this newer file can be included in PDFBox
and works for older JDK versions, as supported for PDFBox 2.

Best regards

DvH
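
For illustration, a minimal sketch of reading one data line of Scripts.txt, 
assuming the usual UCD layout "start..end ; Script # comment" (or a single 
code point instead of a range). ScriptsLineParser and ScriptRange are 
hypothetical names chosen for this example; this is not the code in 
OpenTypeScript.java. It avoids newer language features so that it should 
also compile on the older JDK versions mentioned above.

```java
import java.util.Optional;

final class ScriptsLineParser {

    /** Inclusive code point range with the script name assigned to it. */
    static final class ScriptRange {
        final int start;
        final int end;
        final String script;

        ScriptRange(int start, int end, String script) {
            this.start = start;
            this.end = end;
            this.script = script;
        }
    }

    /** Parses one line; returns empty for blank lines and pure comment lines. */
    static Optional<ScriptRange> parseLine(String line) {
        // Everything after '#' is a comment.
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (data.isEmpty()) {
            return Optional.empty();
        }
        String[] fields = data.split(";");
        if (fields.length < 2) {
            return Optional.empty();
        }
        // The first field is either "XXXX" or "XXXX..YYYY" in hexadecimal.
        String[] bounds = fields[0].trim().split("\\.\\.");
        int start = Integer.parseInt(bounds[0], 16);
        int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
        return Optional.of(new ScriptRange(start, end, fields[1].trim()));
    }

    public static void main(String[] args) {
        String sample = "0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z";
        parseLine(sample).ifPresent(r ->
                System.out.printf("U+%04X..U+%04X -> %s%n", r.start, r.end, r.script));
    }
}
```

Since a newer file only adds or extends ranges in the same layout, a parser 
of this shape should not care which Unicode version the file comes from.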













Re: 2.0.30 vs 2.0.31 reloaded

2024-03-28 Thread Andreas Lehmkühler

Cool!

@Tilman thanks again

On 27.03.24 at 20:47, Tilman Hausherr wrote:
During the tika regression tests it turned out that there is a longer 
PDF list, so I ran the tests again with that longer list. The good news 
is that all is perfect, no new exceptions, and no loss of content.


Tilman

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.30_vs_2.0.31_1.tar.xz





Re: NVD update hangs during release build

2024-03-24 Thread Andreas Lehmkühler

Looks like they solved their issue. I've reactivated the plugin.

Andreas

On 21.03.24 at 18:55, Andreas Lehmkühler wrote:
I've simply deactivated the plugin as proposed, so that we can do the
release.


I don't like the idea of going back to an old version. I'm pretty sure
someone will fix that issue, as we aren't the only ones using that plugin.


Andreas

On 21.03.24 at 18:13, sahy...@fileaffairs.de wrote:

OK - I can replicate the issue too. It works for me locally up to
dependency-check-maven 8.4.3 - would that be an option?

BR
Maruan

On Thursday, 21.03.2024 at 17:38 +0100, Tilman Hausherr wrote:

add

-Ppedantic

Tilman

On 21.03.2024 17:28, sahy...@fileaffairs.de wrote:

Which mvn command do I need to issue to trigger the check? mvn clean
install didn't for me. Am I missing something?

BR
Maruan

On Thursday, 21.03.2024 at 17:24 +0100, Tilman Hausherr wrote:

Jeremy Long wrote something that I haven't really understood. Maybe it
means building the NVD archive on a separate system and then
transferring it.

https://github.com/jeremylong/DependencyCheck/issues/6515#issuecomment-2011824975

However, a later message in the same issue made more sense. I'm testing
locally with

https://dependency-check.github.io/DependencyCheck_Builder/nvd_cache/


Tilman

On 21.03.2024 09:48, sahy...@fileaffairs.de wrote:

Mhmm - is there a way to build locally and test the NVD update?

Ran it on a different project I have for a client locally and the NVD
update worked without issues and without an API key.

BR
Maruan

On Thursday, 21.03.2024 at 08:36 +0100, Tilman Hausherr wrote:

I meant adding true to the  part.

Something isn't ok with NVD, maybe it got worse since then:
https://blog.fefe.de/?ts=9b0740e0
https://www.heise.de/news/Sicherheitsforscher-genervt-Luecken-Datenbank-NVD-seit-Wochen-unvollstaendig-9656574.html

Tilman

On 20.03.2024 22:05, Andreas Lehmkühler wrote:

On 20.03.24 at 21:16, Tilman Hausherr wrote:

If you still have the time, you could add a "skip" for that plugin;
the last successful build was this morning and no library changes
were made since then. (and we still have a few days to find out if
any libraries are now considered risky)

Good idea, but -Ddependency-check.skip=true doesn't work either, it
still tries to update :-(

I'm going to continue tomorrow

Andreas


Tilman

On 20.03.2024 21:13, Tilman Hausherr wrote:

Seems it's a general problem:
https://github.com/jeremylong/DependencyCheck/issues/6515#issuecomment-2009879851


It also hangs on my local machine now; I don't have an API key.

Tilman


On 20.03.2024 20:57, Andreas Lehmkühler wrote:

Hi,

I'm trying to cut the 2.0.31 release but it always hangs when the
build tries to update the NVD data.

Last week when I built the 3.0.2 release I had a similar effect.
The update was very slow but in the end it finished and worked.

Now, nothing happens, the last words are

[INFO] [WARNING] An NVD API Key was not provided - it is highly
recommended to use an NVD API key as the update can take a VERY
long time without an API Key

nothing more after that. It simply hangs

I've requested an API key, got one and now I'm trying to get it to
work, but it doesn't.

I've tried

* the mvn option -DnvdApiKey=
* define a server "nvd" in .m2/settings.xml including the key and
add -DnvdApiServerId=nvd  to the commandline
* define the environment variable NVD_API_KEY and add
-DnvdApiKeyEnvironmentVariable=NVD_API_KEY to the commandline

Nothing works, I've always got those famous words: An NVD API Key
was not provided

Any idea to get around this?

Andreas

P.S.: I'm on linux using corretto-8.332 and mvn 3.9.3



[ANNOUNCE] Apache PDFBox 2.0.31 released

2024-03-24 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.31. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.31

Introduction


The Apache PDFBox library is an open source Java tool for working with
PDF documents.

This is an incremental bugfix release based on the earlier 2.0.30
release. It contains a couple of fixes and small improvements.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-2725] - [PATCH] Split pdf lose accessibility tags
[PDFBOX-5375] - Allow creating of PDFXObjectImage without accessing to the image stream
[PDFBOX-5713] - PfbParser fails to parse PFB font with multiple binary records.
[PDFBOX-5715] - Lines vanish when printing on MacOS
[PDFBOX-5718] - java.lang.IllegalArgumentException: Provided dictionary is not of type 'COSName{OCG}'
[PDFBOX-5721] - The embedded font DroidSansFallbackFull reports an error when parsing, and finally uses lastResortFont, resulting in garbled fonts.
[PDFBOX-5723] - COSName caches already cached hashCode
[PDFBOX-5727] - Font operation takes a long time with 3.0.1
[PDFBOX-5728] - NullPointerException in TTFSubsetter.buildPostTable()
[PDFBOX-5732] - Problem converting PDF to image (java.awt.color.CMMException: Can not access specified profile)
[PDFBOX-5735] - Set the default value for PDNonTerminalField
[PDFBOX-5737] - java.lang.ArrayIndexOutOfBoundsException Bug Report
[PDFBOX-5738] - Wrong colors in PDF since PDFBOX-5488
[PDFBOX-5740] - Java 7 support on 2.0
[PDFBOX-5751] - Convert to image exception
[PDFBOX-5754] - PDF conversion in this format is very slow. Is there any room for optimization?
[PDFBOX-5763] - IllegalArgumentException: -Infinity is not a finite number
[PDFBOX-5772] - Inconsistent signature page handling when signing in existing signature fields
[PDFBOX-5773] - Add leading "0" for octal values in MacOSRomanEncoding
[PDFBOX-5776] - DataFormatException: invalid distance too far back
[PDFBOX-5778] - Grayscale JPEG rendered multicolor
[PDFBOX-5781] - OutOfMemoryError in FileSystemFontsProvider.scanFonts
[PDFBOX-5782] - NPE in PageDrawer.getPaint()
[PDFBOX-5785] - Issue with embedded Font and descendant Font
[PDFBOX-5787] - LCMS error 13: Mismatched alpha channels

New Feature

[PDFBOX-5768] - Enable Native Markdown Extraction in Apache PDFBox

Improvement

[PDFBOX-5762] - When splitting, keep page destinations that are part of target document(s)
[PDFBOX-5783] - Replace Exception with some repair attempt

Task

[PDFBOX-5739] - Add test for PDFBOX-3347
[PDFBOX-5741] - Add test for PDFBOX-4106

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
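
As an illustration of the checksum part of that verification, here is a minimal sketch using only the JDK. The class and method names (Sha512CheckSketch, sha512Hex, matches) are hypothetical and not part of PDFBox or its build. The sketch assumes the published checksum is available as a plain hex string, and it does not replace checking the PGP signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares a file's SHA-512 digest with a published hex checksum. */
final class Sha512CheckSketch
{
    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1)
        {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest())
        {
            // mask to 0..255 so negative bytes are not sign-extended
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Compares ignoring case and any whitespace inside the published value. */
    static boolean matches(String actualHex, String publishedChecksum)
    {
        String normalized = publishedChecksum.replaceAll("\\s+", "");
        return actualHex.equalsIgnoreCase(normalized);
    }

    public static void main(String[] args) throws Exception
    {
        // args[0]: path to the downloaded archive, args[1]: published checksum (hex)
        try (InputStream in = Files.newInputStream(Paths.get(args[0])))
        {
            boolean ok = matches(sha512Hex(in), args[1]);
            System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
        }
    }
}
```

Streaming the file in fixed-size chunks keeps memory use constant regardless of the archive size.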

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/




[RESULT][VOTE] Release Apache PDFBox 2.0.31

2024-03-24 Thread Andreas Lehmkühler

On 21.03.24 at 18:51, Andreas Lehmkühler wrote:

Please vote on releasing this package as Apache PDFBox 2.0.31.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.31

2024-03-23 Thread Andreas Lehmkühler



On 22.03.24 at 08:29, Tilman Hausherr wrote:

On 22.03.2024 06:53, Andreas Lehmkühler wrote:
Is this a showstopper, shall I cancel the release? 


No

Or do we just live with another/the last release with that issue? 


I prefer it to be fixed


Of course. I was referring to the former releases, which are broken too
w.r.t. the sources zip.


Andreas



Tilman





Re: [VOTE] Release Apache PDFBox 2.0.31

2024-03-21 Thread Andreas Lehmkühler




On 21.03.24 at 20:07, Tim Allison wrote:

In the parent pom.xml in the zip file, there's a "release" submodule
specified. However, there's no release directory in the src zip that would
match: https://svn.apache.org/repos/asf/pdfbox/tags/2.0.31/release/

Is that expected?

Hmmm, of course not. Thanks for the pointer.

I've rearranged the structure in [1] and never realized that the empty 
"release" subproject won't show up in the sources-zip. Obviously nobody 
tried to build one of the last releases from the sources-zip.


However, I'm going to look into this.

Is this a showstopper, shall I cancel the release? Or do we just live 
with another/the last release with that issue?



[1] https://issues.apache.org/jira/browse/PDFBOX-5699




On Thu, Mar 21, 2024 at 1:53 PM Andreas Lehmkühler 
wrote:


Hi,

a candidate for the PDFBox 2.0.31 release is available at:

  https://dist.apache.org/repos/dist/dev/pdfbox/2.0.31/

The release candidate is a zip archive of the sources in:

  https://svn.apache.org/repos/asf/pdfbox/tags/2.0.31/

The SHA-512 checksum of the archive is

c231ccebf918b8aa0dc80d3162fc88ff4ab78d586bcead0ef0cc44a6cab4f6d455112497ad866901e3948a6c76320d19487c3be7e7c1e66c5e2733de82fe3f09.

Please vote on releasing this package as Apache PDFBox 2.0.31.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

  [ ] +1 Release this package as Apache PDFBox 2.0.31
  [ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: NVD update hangs during release build

2024-03-21 Thread Andreas Lehmkühler
I've simply deactivated the plugin as proposed, so that we can do the 
release.


I don't like the idea of going back to an old version. I'm pretty sure
someone will fix that issue, as we aren't the only ones using that plugin.


Andreas

On 21.03.24 at 18:13, sahy...@fileaffairs.de wrote:

OK - I can replicate the issue too. It works for me locally up to
dependency-check-maven 8.4.3 - would that be an option?

BR
Maruan

On Thursday, 21.03.2024 at 17:38 +0100, Tilman Hausherr wrote:

add

-Ppedantic

Tilman

On 21.03.2024 17:28, sahy...@fileaffairs.de wrote:

Which mvn command do I need to issue to trigger the check? mvn clean
install didn't for me. Am I missing something?

BR
Maruan

On Thursday, 21.03.2024 at 17:24 +0100, Tilman Hausherr wrote:

Jeremy Long wrote something that I haven't really understood. Maybe it
means building the NVD archive on a separate system and then
transferring it.

https://github.com/jeremylong/DependencyCheck/issues/6515#issuecomment-2011824975

However, a later message in the same issue made more sense. I'm testing
locally with

https://dependency-check.github.io/DependencyCheck_Builder/nvd_cache/


Tilman

On 21.03.2024 09:48, sahy...@fileaffairs.de wrote:

Mhmm - is there a way to build locally and test the NVD update?

Ran it on a different project I have for a client locally and the NVD
update worked without issues and without an API key.

BR
Maruan

On Thursday, 21.03.2024 at 08:36 +0100, Tilman Hausherr wrote:

I meant adding true to the  part.

Something isn't ok with NVD, maybe it got worse since then:
https://blog.fefe.de/?ts=9b0740e0
https://www.heise.de/news/Sicherheitsforscher-genervt-Luecken-Datenbank-NVD-seit-Wochen-unvollstaendig-9656574.html

Tilman

On 20.03.2024 22:05, Andreas Lehmkühler wrote:

On 20.03.24 at 21:16, Tilman Hausherr wrote:

If you still have the time, you could add a "skip" for that plugin;
the last successful build was this morning and no library changes
were made since then. (and we still have a few days to find out if
any libraries are now considered risky)

Good idea, but -Ddependency-check.skip=true doesn't work either, it
still tries to update :-(

I'm going to continue tomorrow

Andreas


Tilman

On 20.03.2024 21:13, Tilman Hausherr wrote:

Seems it's a general problem:
https://github.com/jeremylong/DependencyCheck/issues/6515#issuecomment-2009879851




It also hangs on my local machine now; I don't have an API key.

Tilman


On 20.03.2024 20:57, Andreas Lehmkühler wrote:

Hi,

I'm trying to cut the 2.0.31 release but it always hangs when the
build tries to update the NVD data.

Last week when I built the 3.0.2 release I had a similar effect.
The update was very slow but in the end it finished and worked.

Now, nothing happens, the last words are

[INFO] [WARNING] An NVD API Key was not provided - it is highly
recommended to use an NVD API key as the update can take a VERY
long time without an API Key

nothing more after that. It simply hangs

I've requested an API key, got one and now I'm trying to get it to
work, but it doesn't.

I've tried

* the mvn option -DnvdApiKey=
* define a server "nvd" in .m2/settings.xml including the key and
add -DnvdApiServerId=nvd  to the commandline
* define the environment variable NVD_API_KEY and add
-DnvdApiKeyEnvironmentVariable=NVD_API_KEY to the commandline

Nothing works, I've always got those famous words: An NVD API Key
was not provided

Any idea to get around this?

Andreas

P.S.: I'm on linux using corretto-8.332 and mvn 3.9.3



[VOTE] Release Apache PDFBox 2.0.31

2024-03-21 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 2.0.31 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.31/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/2.0.31/

The SHA-512 checksum of the archive is 
c231ccebf918b8aa0dc80d3162fc88ff4ab78d586bcead0ef0cc44a6cab4f6d455112497ad866901e3948a6c76320d19487c3be7e7c1e66c5e2733de82fe3f09.


Please vote on releasing this package as Apache PDFBox 2.0.31.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.31
[ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: NVD update hangs during release build

2024-03-20 Thread Andreas Lehmkühler




On 20.03.24 at 21:16, Tilman Hausherr wrote:
If you still have the time, you could add a "skip" for that plugin; the
last successful build was this morning and no library changes were made
since then. (and we still have a few days to find out if any libraries
are now considered risky)

Good idea, but -Ddependency-check.skip=true doesn't work either, it
still tries to update :-(


I'm going to continue tomorrow 

Andreas



Tilman

On 20.03.2024 21:13, Tilman Hausherr wrote:

Seems it's a general problem:
https://github.com/jeremylong/DependencyCheck/issues/6515#issuecomment-2009879851

it also hangs on my local machine now, I don't have an API key.

Tilman


On 20.03.2024 20:57, Andreas Lehmkühler wrote:

Hi,

I'm trying to cut the 2.0.31 release but it always hangs when the 
build tries to update the NVD data.


Last week when I built the 3.0.2 release I had a similar effect. The
update was very slow but in the end it finished and worked.


Now, nothing happens, the last words are

[INFO] [WARNING] An NVD API Key was not provided - it is highly 
recommended to use an NVD API key as the update can take a VERY long 
time without an API Key


nothing more after that. It simply hangs

I've requested an API key, got one and now I'm trying to get it to work,
but it doesn't.


I've tried

* the mvn option -DnvdApiKey=
* define a server "nvd" in .m2/settings.xml including the key and add 
-DnvdApiServerId=nvd  to the commandline
* define the environment variable NVD_API_KEY and add 
-DnvdApiKeyEnvironmentVariable=NVD_API_KEY to the commandline


Nothing works, I've always got those famous words: An NVD API Key was
not provided



Any idea to get around this?

Andreas

P.S.: I'm on linux using corretto-8.332 and mvn 3.9.3





NVD update hangs during release build

2024-03-20 Thread Andreas Lehmkühler

Hi,

I'm trying to cut the 2.0.31 release but it always hangs when the build 
tries to update the NVD data.


Last week when I built the 3.0.2 release I had a similar effect. The
update was very slow but in the end it finished and worked.


Now, nothing happens, the last words are

[INFO] [WARNING] An NVD API Key was not provided - it is highly 
recommended to use an NVD API key as the update can take a VERY long 
time without an API Key


nothing more after that. It simply hangs

I've requested an API key, got one and now I'm trying to get it to work,
but it doesn't.


I've tried

* the mvn option -DnvdApiKey=
* define a server "nvd" in .m2/settings.xml including the key and add 
-DnvdApiServerId=nvd  to the commandline
* define the environment variable NVD_API_KEY and add 
-DnvdApiKeyEnvironmentVariable=NVD_API_KEY to the commandline


Nothing works, I've always got those famous words: An NVD API Key was
not provided



Any idea to get around this?

Andreas

P.S.: I'm on linux using corretto-8.332 and mvn 3.9.3





Re: PDFBox 2.0.31 release

2024-03-19 Thread Andreas Lehmkühler

@Tilman, thanks again for running the regression tests.

I'm going to cut the release tomorrow, in about 24 hours from now.

Andreas

On 15.03.24 at 19:34, Tilman Hausherr wrote:

Regression tests result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.30_vs_2.0.31.tar.xz

Nothing to do, only improvements.

Tilman

On 14.03.2024 22:06, Andreas Lehmkühler wrote:

Hi,

now that 3.0.2 is out of the door I'd like to continue with a new 2.0 
release.


How about cutting a 2.0.31 release next Wednesday?

Any objections or is there something we should add/fix first?

Andreas




PDFBox 2.0.31 release

2024-03-14 Thread Andreas Lehmkühler

Hi,

now that 3.0.2 is out of the door I'd like to continue with a new 2.0 
release.


How about cutting a 2.0.31 release next Wednesday?

Any objections or is there something we should add/fix first?

Andreas




[ANNOUNCE] Apache PDFBox 3.0.2 released

2024-03-14 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 3.0.2. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 3.0.2

Introduction


The Apache PDFBox library is an open source Java tool for working with
PDF documents.

This is an incremental bugfix release based on the earlier 3.0.1
release. It contains a couple of fixes and small improvements.

A migration guide is available at
https://pdfbox.apache.org/3.0/migration.html. It is still a work in
progress and we are happy to include any valuable feedback from our
community.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-2725] - [PATCH] Split pdf lose accessibility tags
[PDFBOX-5375] - Allow creating of PDFXObjectImage without accessing to the image stream
[PDFBOX-5704] - char not rendered
[PDFBOX-5714] - PDFBox 3.0 regression: duplicate references in dictionary values
[PDFBOX-5715] - Lines vanish when printing on MacOS
[PDFBOX-5717] - NullPointerException calling saveIncrementalForExternalSigning
[PDFBOX-5721] - The embedded font DroidSansFallbackFull reports an error when parsing, and finally uses lastResortFont, resulting in garbled fonts.
[PDFBOX-5722] - Wrong scope for maven dependencies
[PDFBOX-5723] - COSName caches already cached hashCode
[PDFBOX-5724] - CharStringCommand.equals() does not conform to the contract of Object.equals
[PDFBOX-5727] - Font operation takes a long time with 3.0.1
[PDFBOX-5728] - NullPointerException in TTFSubsetter.buildPostTable()
[PDFBOX-5730] - The expected SubstFormat for ExtensionSubstFormat1 subtable is 108 but should be 1
[PDFBOX-5732] - Problem converting PDF to image (java.awt.color.CMMException: Can not access specified profile)
[PDFBOX-5733] - lookupType is to be replaced by extensionLookupType in type 7 lookup table
[PDFBOX-5735] - Set the default value for PDNonTerminalField
[PDFBOX-5737] - java.lang.ArrayIndexOutOfBoundsException Bug Report
[PDFBOX-5738] - Wrong colors in PDF since PDFBOX-5488
[PDFBOX-5742] - Split result PDFs broken
[PDFBOX-5744] - EOFException while readMultipleSubstitutionSubtable()
[PDFBOX-5745] - EOFException while readSingleLookupSubTable()
[PDFBOX-5748] - Cannot get overlayPDF working on command line interface
[PDFBOX-5751] - Convert to image exception
[PDFBOX-5752] - Font errors after copying a page to another document
[PDFBOX-5754] - PDF conversion in this format is very slow. Is there any room for optimization?
[PDFBOX-5757] - streamCacheCreateFunction not passed to PDFParser
[PDFBOX-5758] - ExceptionInInitializerError when unmapping is not supported
[PDFBOX-5760] - NPE in FIlter.decode() when called with empty list
[PDFBOX-5763] - IllegalArgumentException: -Infinity is not a finite number
[PDFBOX-5764] - Wrong chunksize when using a ByteBuffer to initialize a RandomAccessReadBuffer
[PDFBOX-5772] - Inconsistent signature page handling when signing in existing signature fields
[PDFBOX-5773] - Add leading "0" for octal values in MacOSRomanEncoding
[PDFBOX-5775] - importPage destroys annotations
[PDFBOX-5776] - DataFormatException: invalid distance too far back
[PDFBOX-5778] - Grayscale JPEG rendered multicolor
[PDFBOX-5781] - OutOfMemoryError in FileSystemFontsProvider.scanFonts
[PDFBOX-5782] - NPE in PageDrawer.getPaint()

New Feature

[PDFBOX-5768] - Enable Native Markdown Extraction in Apache PDFBox

Improvement

[PDFBOX-5729] - GsubWorkerForDevanagari and GsubWorkerForGujarati created
[PDFBOX-5762] - When splitting, keep page destinations that are part of target document(s)
[PDFBOX-5783] - Replace Exception with some repair attempt

Task

[PDFBOX-5739] - Add test for PDFBOX-3347
[PDFBOX-5741] - Add test for PDFBOX-4106

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by SHA512 checksums and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation

Re: [VOTE] Release Apache PDFBox 3.0.2

2024-03-14 Thread Andreas Lehmkühler



On 11.03.24 at 20:24, Andreas Lehmkühler wrote:

Please vote on releasing this package as Apache PDFBox 3.0.2.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas





[VOTE] Release Apache PDFBox 3.0.2

2024-03-11 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 3.0.2 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.2/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/3.0.2/

The SHA-512 checksum of the archive is 
d2eaaa4e7a139b00d79d7518ca66ee2c33300dbeed11c05554413e478b2a76814a7404a9467cb2dc3502840259188965a3483342c7d44e3280b68649aec670f8.


Please vote on releasing this package as Apache PDFBox 3.0.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.2
[ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: PDFBox 3.0.2 release?

2024-03-10 Thread Andreas Lehmkühler

Hi,

as there aren't any objections, I'm going to cut the release tomorrow or
the day after tomorrow.


Andreas

On 04.03.24 at 07:54, Andreas Lehmkühler wrote:

Hi,

the import content issue seems to be solved, see PDFBOX-5752 and 
PDFBOX-5775.


How about cutting a 3.0.2 release in a week from now?

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 3.0.2 release?

2024-03-09 Thread Andreas Lehmkühler




On 08.03.24 at 13:34, Tilman Hausherr wrote:

On 08.03.2024 07:13, Tilman Hausherr wrote:

regression test result:

https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.1_vs_3.0.2.tar.xz

Thanks for running the regression tests.


Re exceptions:

- The OOM can't be reproduced

- The two others are related to the zip bomb protection and (probably) a 
recent change (PDFBOX-5704)

I've found a solution for that case, see PDFBOX-5783


Andreas


Re text extraction:

commoncrawl3/TQ/TQVMNMW5ACPU3CZL46OBNGWMPSSXC5MO: that file is a mess 
anyway


commoncrawl3/Y2/Y2PVHNL43FBNKZRAJTSX5J5BLLHMCNLY: same

bug_trackers/pdf.js/pdf.js-11651-0.pdf: might be related to the 
exception I mentioned, the stack trace looks similar. The result is that 
a broken font is no longer replaced. It can be fixed by catching the 
exception when fontFile.createView() is called in PDFontFactory and
returning null.


bug_trackers/poppler-gitlab/poppler-748-0.tgz-1.pdf: messy file. But 
there is an NPE on page 2, that can be fixed easily


commoncrawl3/JP/JPO3LX6ABADSDNC5BIX3KZJBRFT5BIEQ: messy file

commoncrawl3/4L/4L2UKWSZNPXPSGS3OTXQZBBKJH6XF7G4: same

Tilman







PDFBox 3.0.2 release?

2024-03-03 Thread Andreas Lehmkühler

Hi,

the import content issue seems to be solved, see PDFBOX-5752 and 
PDFBOX-5775.


How about cutting a 3.0.2 release in a week from now?

Any objections or is there something we should add/fix first?

Andreas




Re: PDFBox 3.0.1 / RandomAccessReadBuffer bug?

2024-02-08 Thread Andreas Lehmkühler

Done, I've fixed the issue.

@David thanks for the report

Andreas

On 08.02.24 at 07:58, Andreas Lehmkühler wrote:

Hi David,

thanks for the bug report. You are right and the proposed solution seems 
to be a valid fix.


I've created https://issues.apache.org/jira/browse/PDFBOX-5764 to handle 
it.


Andreas

On 08.02.24 at 00:00, david.kl...@atlas.cz wrote:

Hello


I think that this is not correct in some cases:

   public RandomAccessReadBuffer(ByteBuffer input) {
       chunkSize = input.capacity();

IMHO input.limit() should be used instead of input.capacity().


When it matters: I have a ByteArrayOutputStream to which a PDF document
is written. Later I want to open the PDF document using PDFBox again. If
I use part of the internal buffer directly (without copying), e.g.
ByteBuffer.wrap(bos.getInternalBuffer(), 0, bos.size()), I get an exception
like this:


java.lang.IllegalArgumentException: newPosition > limit: (31556 > 20960)


 at
java.base/java.nio.Buffer.createPositionException(Buffer.java:352)

 at java.base/java.nio.Buffer.position(Buffer.java:327)

 at
java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1551)

 at
java.base/java.nio.ByteBuffer.position(ByteBuffer.java:285)

 at
org.apache.pdfbox.io.RandomAccessReadBuffer.seek(RandomAccessReadBuffer.java
:187)

 at
org.apache.pdfbox.pdfparser.COSParser.getStartxrefOffset(COSParser.java:506)

 at
org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:259)

 at
org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:107)

 at
org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:171)

 at
org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:136)

 at org.apache.pdfbox.Loader.loadPDF(Loader.java:466)

 at org.apache.pdfbox.Loader.loadPDF(Loader.java:369)


31556 is the buffer capacity, 20960 is its limit


I think the buffer should not be read beyond its limit.


Regards

David Klika




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: PDFBox 3.0.1 / RandomAccessReadBuffer bug?

2024-02-07 Thread Andreas Lehmkühler

Hi David,

thanks for the bug report. You are right and the proposed solution seems 
to be a valid fix.


I've created https://issues.apache.org/jira/browse/PDFBOX-5764 to handle it.

Andreas

Am 08.02.24 um 00:00 schrieb david.kl...@atlas.cz:

Hello

  


I think that this is not correct in some cases:

   public RandomAccessReadBuffer(ByteBuffer input) {
       chunkSize = input.capacity();

IMHO input.limit() should be used instead of input.capacity().

When it matters: I write a PDF document to a ByteArrayOutputStream and
later want to open that document with PDFBox again. If I use part of the
internal buffer directly (without copying), e.g.
ByteBuffer.wrap(bos.getInternalBuffer(), 0, bos.size()), I get an
exception like this:

java.lang.IllegalArgumentException: newPosition > limit: (31556 > 20960)
    at java.base/java.nio.Buffer.createPositionException(Buffer.java:352)
    at java.base/java.nio.Buffer.position(Buffer.java:327)
    at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1551)
    at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:285)
    at org.apache.pdfbox.io.RandomAccessReadBuffer.seek(RandomAccessReadBuffer.java:187)
    at org.apache.pdfbox.pdfparser.COSParser.getStartxrefOffset(COSParser.java:506)
    at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:259)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:107)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:171)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:136)
    at org.apache.pdfbox.Loader.loadPDF(Loader.java:466)
    at org.apache.pdfbox.Loader.loadPDF(Loader.java:369)

  


31556 is the buffer capacity, 20960 is its limit

  


I think the buffer should not be read beyond its limit.
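
The difference between capacity and limit described in this report can be reproduced with plain java.nio, without PDFBox. The sketch below uses the two numbers from the stack trace; the class name BufferLimitDemo and its helper methods are made up for this illustration and are not part of PDFBox.

```java
import java.nio.ByteBuffer;

final class BufferLimitDemo {
    /** Wraps only the first 'used' bytes of a larger backing array. */
    static ByteBuffer wrapUsedPart(byte[] backing, int used) {
        // capacity becomes backing.length, limit becomes 'used'
        return ByteBuffer.wrap(backing, 0, used);
    }

    /** Returns true if the buffer rejects a move to 'newPosition'. */
    static boolean seekFails(ByteBuffer buffer, int newPosition) {
        try {
            // Work on a duplicate so the caller's position stays untouched.
            buffer.duplicate().position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = wrapUsedPart(new byte[31556], 20960);
        System.out.println("capacity = " + buffer.capacity());
        System.out.println("limit    = " + buffer.limit());
        System.out.println("seek to capacity fails: " + seekFails(buffer, buffer.capacity()));
        System.out.println("seek to limit fails:    " + seekFails(buffer, buffer.limit()));
    }
}
```

Any code that treats capacity() as the amount of readable data will eventually try to position past the limit of such a wrapped buffer, which is the failure shown in the stack trace.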

  


Regards

David Klika




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: About https://issues.apache.org/jira/browse/PDFBOX-5704

2024-01-21 Thread Andreas Lehmkühler

I've fixed the issue in the trunk and the 3.0-branch.

The fix is limited to incorrect CID font definitions.

Let us know if there are any other cases and provide a sample.

Andreas

Am 21.01.24 um 11:23 schrieb Andreas Lehmkühler:
I had a look and found a solution based on the pdf.js implementation. 
I'm going to commit it once I've improved the code; for now it is still 
somewhat hacky.


And yes, PDFBOX-5704 is related to this proposal.

Please follow up on PDFBOX-5704

@Mike thanks for the valuable input

Andreas

Am 19.01.24 um 08:03 schrieb Andreas Lehmkühler:

Hi,

I'm not sure if both issues are similar. However, your proposal is an 
interesting idea and I guess it shouldn't be that hard to implement it.



Thanks for the input, I'm going to have a look.

Andreas

Am 19.01.24 um 04:49 schrieb Mike Li:

Hello team,

I recently encountered the problem that PDFBox cannot render Chinese 
text; the problem is very similar to 
https://issues.apache.org/jira/browse/PDFBOX-5704.


In this case, the attached PDF file embeds a CFF font file. The 
correct font type/subtype would be /CIDFontType0 and /CIDFontType0C, 
and the file should declare the property /FontFile3. But it wrongly 
declares the subfont as TrueType, which makes PDFBox use the TTF parser 
to parse the font file stream based on the declared type.


According to the spec, PDFBox does it right, but from a user's 
perspective this looks more like a "bug", because the file displays 
fine in the other widely used PDF readers (Adobe, Foxit, pdf.js etc.).


I have many years of working experience in PDF generation (iText, 
PDFBox, etc.), and I know that once a PDF is generated, it is 
considered correct as long as it displays correctly in Adobe Reader. 
If another program cannot display it correctly, that is considered a 
bug in the other program. It's not fair, but it's reality. 
Many low-quality PDF generation tools/libraries are still widely used.


pdf.js parses the font file first and prefers the font type found in 
the font file over the type declared in the font dictionary.

https://github.com/mozilla/pdf.js/blob/1cdbcfef821c7f6e81ea22fe68a8b815bca01c4e/src/core/fonts.js#L1052

So my question is: is it possible for PDFBox to provide some font 
processing workaround logic to handle such cases?


Thanks
Mike




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: About https://issues.apache.org/jira/browse/PDFBOX-5704

2024-01-21 Thread Andreas Lehmkühler
I had a look and found a solution based on the pdf.js implementation. 
I'm going to commit it once I've improved the code; for now it is still 
somewhat hacky.


And yes, PDFBOX-5704 is related to this proposal.

Please follow up on PDFBOX-5704

@Mike thanks for the valuable input

Andreas

Am 19.01.24 um 08:03 schrieb Andreas Lehmkühler:

Hi,

I'm not sure if both issues are similar. However, your proposal is an 
interesting idea and I guess it shouldn't be that hard to implement it.



Thanks for the input, I'm going to have a look.

Andreas

Am 19.01.24 um 04:49 schrieb Mike Li:

Hello team,

I recently encountered the problem that PDFBox cannot render Chinese 
text; the problem is very similar to 
https://issues.apache.org/jira/browse/PDFBOX-5704.


In this case, the attached PDF file embeds a CFF font file. The 
correct font type/subtype would be /CIDFontType0 and /CIDFontType0C, 
and the file should declare the property /FontFile3. But it wrongly 
declares the subfont as TrueType, which makes PDFBox use the TTF parser 
to parse the font file stream based on the declared type.


According to the spec, PDFBox does it right, but from a user's 
perspective this looks more like a "bug", because the file displays 
fine in the other widely used PDF readers (Adobe, Foxit, pdf.js etc.).


I have many years of working experience in PDF generation (iText, 
PDFBox, etc.), and I know that once a PDF is generated, it is 
considered correct as long as it displays correctly in Adobe Reader. 
If another program cannot display it correctly, that is considered a 
bug in the other program. It's not fair, but it's reality. 
Many low-quality PDF generation tools/libraries are still widely used.


pdf.js parses the font file first and prefers the font type found in 
the font file over the type declared in the font dictionary.

https://github.com/mozilla/pdf.js/blob/1cdbcfef821c7f6e81ea22fe68a8b815bca01c4e/src/core/fonts.js#L1052

So my question is: is it possible for PDFBox to provide some font 
processing workaround logic to handle such cases?


Thanks
Mike




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: About https://issues.apache.org/jira/browse/PDFBOX-5704

2024-01-18 Thread Andreas Lehmkühler

Hi,

I'm not sure if both issues are similar. However, your proposal is an 
interesting idea and I guess it shouldn't be that hard to implement it.



Thanks for the input, I'm going to have a look.

Andreas

Am 19.01.24 um 04:49 schrieb Mike Li:

Hello team,

I recently encountered the problem that PDFBox cannot render Chinese text; the 
problem is very similar to https://issues.apache.org/jira/browse/PDFBOX-5704.

In this case, the attached PDF file embeds a CFF font file. The correct font 
type/subtype would be /CIDFontType0 and /CIDFontType0C, and the file should 
declare the property /FontFile3. But it wrongly declares the subfont as 
TrueType, which makes PDFBox use the TTF parser to parse the font file stream 
based on the declared type.

According to the spec, PDFBox does it right, but from a user's perspective this 
looks more like a "bug", because the file displays fine in the other widely 
used PDF readers (Adobe, Foxit, pdf.js etc.).

I have many years of working experience in PDF generation (iText, PDFBox, 
etc.), and I know that once a PDF is generated, it is considered correct as 
long as it displays correctly in Adobe Reader. If another program cannot 
display it correctly, that is considered a bug in the other program. It's 
not fair, but it's reality. Many low-quality PDF generation tools/libraries are 
still widely used.

pdf.js parses the font file first and prefers the font type found in the font 
file over the type declared in the font dictionary.
https://github.com/mozilla/pdf.js/blob/1cdbcfef821c7f6e81ea22fe68a8b815bca01c4e/src/core/fonts.js#L1052
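
To illustrate the idea of trusting the font data over the declared type, here is a minimal sketch that guesses a font format from the first four bytes of the stream. It is only a heuristic written for this note: FontFormatSniffer, its Format enum and the exact checks are assumptions, not the pdf.js code behind the link above and not what PDFBox does.

```java
final class FontFormatSniffer {
    enum Format { TRUETYPE, OPENTYPE_CFF, TRUETYPE_COLLECTION, BARE_CFF, TYPE1, UNKNOWN }

    /** Guesses the font format from the first four bytes of the font program. */
    static Format sniff(byte[] data) {
        if (data == null || data.length < 4) {
            return Format.UNKNOWN;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        int b2 = data[2] & 0xFF;
        int b3 = data[3] & 0xFF;
        int tag = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;

        if (tag == 0x00010000 || tag == 0x74727565) { // version 1.0 or 'true'
            return Format.TRUETYPE;
        }
        if (tag == 0x4F54544F) {                      // 'OTTO'
            return Format.OPENTYPE_CFF;
        }
        if (tag == 0x74746366) {                      // 'ttcf'
            return Format.TRUETYPE_COLLECTION;
        }
        if ((b0 == '%' && b1 == '!') || (b0 == 0x80 && b1 == 0x01)) {
            return Format.TYPE1;                      // PostScript text or PFB segment marker
        }
        // Bare CFF header: major version 1, header size >= 4, offSize in 1..4.
        if (b0 == 1 && b2 >= 4 && b3 >= 1 && b3 <= 4) {
            return Format.BARE_CFF;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0, 1, 0, 0}));         // TrueType header
        System.out.println(sniff(new byte[] {'O', 'T', 'T', 'O'})); // OpenType with CFF outlines
        System.out.println(sniff(new byte[] {1, 0, 4, 2}));         // bare CFF header
    }
}
```

A caller could compare the sniffed format with the subtype declared in the font dictionary and pick the parser that matches the data when the two disagree.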

So my question is: is it possible for PDFBox to provide some font processing 
workaround logic to handle such cases?

Thanks
Mike




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: Apache PDFBox Board Report January 2024 due

2024-01-09 Thread Andreas Lehmkühler

I've submitted the report as is.

Thanks for the reviews.

Andreas

Am 08.01.24 um 08:14 schrieb Andreas Lehmkühler:

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report wizard template which can be 
found at [1]


Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software 
related to Java library for working with PDF documents

## Project Status:
Current project status: ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     3.0.1 was released on 2023-11-30.
     2.0.30 was released on 2023-11-04.
     3.0.0 was released on 2023-08-17.

## Community Health:
- there is a steady stream of contributions, bug reports and questions 
on the mailing lists
- we released the first minor release of our new 3.0.x line to fix some 
regression issues. A couple of improvements and further fixes were 
included as well.
- the development of the current trunk version 4.0.0 is an ongoing 
effort, e.g. we switched to Log4j2 and did some major refactorings



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Apache PDFBox Board Report January 2024 due

2024-01-07 Thread Andreas Lehmkühler

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report wizard template which can be 
found at [1]


Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software 
related to Java library for working with PDF documents

## Project Status:
Current project status: ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

3.0.1 was released on 2023-11-30.
2.0.30 was released on 2023-11-04.
3.0.0 was released on 2023-08-17.

## Community Health:
- there is a steady stream of contributions, bug reports and questions 
on the mailing lists
- we released the first minor release of our new 3.0.x line to fix some 
regression issues. A couple of improvements and further fixes were 
included as well.
- the development of the current trunk version 4.0.0 is an ongoing 
effort, e.g. we switched to Log4j2 and did some major refactorings



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 3.0.1 released

2023-11-30 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 3.0.1. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 3.0.1

Introduction


The Apache PDFBox library is an open source Java tool for working with 
PDF documents.


This is an incremental bugfix release based on the earlier 3.0.0 
release. It contains a couple of fixes and small improvements.


A migration guide is available at 
https://pdfbox.apache.org/3.0/migration.html. It is still a work in 
progress and we are happy to include any valuable feedback from our 
community.


For more details on these changes and all the other fixes and 
improvements included in this release, please refer to the following 
issues on the PDFBox issue tracker at 
https://issues.apache.org/jira/browse/PDFBOX.


Sub-task

[PDFBOX-5663] - Implement "about" dialog

Bug

[PDFBOX-5350] - Regression unicode mapping in Korean document
[PDFBOX-5649] - NPE in DomXmpParser.parseLiDescription
[PDFBOX-5654] - Avoid NPE when processing CFF2 based fonts
[PDFBOX-5658] - IllegalArgumentException: Dimensions (width=458477041 height=26) are too large
[PDFBOX-5662] - Can not see checkbox check
[PDFBOX-5665] - NPE when converting pdf to image.
[PDFBOX-5666] - error encountered in splitting pdf using ver 3.0.0
[PDFBOX-5668] - NullPointerException in XMPMetadata.getSchema()
[PDFBOX-5672] - PDFToImage might not correctly detect unsupported image formats
[PDFBOX-5673] - Refactor Stream operations and operations on collections
[PDFBOX-5681] - ConcurrentModificationException in getObjectsByType() in 3.x
[PDFBOX-5682] - Long/permanent hang in PDFBox 3.x
[PDFBOX-5684] - Font cache isn't effective on my machine, always rebuilds
[PDFBOX-5687] - PDFBox 3.0 OSGi bundle requires sun.java2d.cmm.kcms package
[PDFBOX-5689] - Many new warnings "newGlyph ... newValue: ... is trying to override the oldValue" after upgrade to V3.0.0
[PDFBOX-5694] - PDF to Image conversion results in different converted image
[PDFBOX-5696] - COSStream lost, becomes a COSDictionary
[PDFBOX-5702] - Text in a certain font is lost when converting pdf to image
[PDFBOX-5706] - Incorrect colors in image from PDFs (DCTDecode)
[PDFBOX-5707] - Avoid NPE when accessing the elements of a COSArray
[PDFBOX-5712] - Stackoverflow in split
[PDFBOX-5713] - PfbParser fails to parse PFB font with multiple binary records.
[PDFBOX-5718] - java.lang.IllegalArgumentException: Provided dictionary is not of type 'COSName{OCG}'

New Feature

[PDFBOX-5670] - Allow repeatable subcommands in the command line tools
[PDFBOX-5683] - Inconsistent/incomplete PDF rendering

Improvement

[PDFBOX-4892] - Improve code quality (4)
[PDFBOX-5664] - 3.0.0: PDFCloneUtility needs a protected constructor to be useable outside of PDFBox when using Java 9 JPMS
[PDFBOX-5685] - Reduce number of copies to lower memory footprint
[PDFBOX-5693] - Consolidate bouncycastle configuration
[PDFBOX-5699] - Consistent scm.url values for pom.xml
[PDFBOX-5703] - use comparison operators for enums
[PDFBOX-5705] - update log4j dependency to 2.21.0
[PDFBOX-5711] - Loader: add support for java.nio.file.Path

Test

[PDFBOX-5667] - Can't create test for ExtractText command line tool

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by SHA512 checksums and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
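
As a rough illustration of the checksum part of that verification, the following sketch computes the SHA-512 digest of a downloaded file with the JDK's MessageDigest and compares it with a published hex string. The class name Sha512Check, its helper methods and the command line arguments are assumptions for this example; checking the PGP signature against the KEYS file is a separate step and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha512Check {
    /** Converts a digest to a lowercase hex string. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Streams the input through SHA-512 and returns the digest as hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return toHex(digest.digest());
    }

    /** Compares two hex digests, ignoring case and surrounding whitespace. */
    static boolean matches(String actualHex, String publishedHex) {
        return actualHex.trim().equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check.java <archive> <published-sha512-hex>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha512Hex(in);
            System.out.println(matches(actual, args[1]) ? "checksum OK" : "checksum MISMATCH");
        }
    }
}
```

A matching checksum only shows that the download is intact; the signature check is what ties the archive to a release manager's key.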

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[RESULT][VOTE] Release Apache PDFBox 3.0.1

2023-11-30 Thread Andreas Lehmkühler



Am 27.11.23 um 17:46 schrieb Andreas Lehmkühler:

Please vote on releasing this package as Apache PDFBox 3.0.1.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[VOTE] Release Apache PDFBox 3.0.1

2023-11-27 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 3.0.1 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.1/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/3.0.1/

The SHA-512 checksum of the archive is 
8ca8f3297ec04efaa23ab6d9ca421c1b39d8fb2de392e0f7b5aa6e7053eac75066e8b2872dc6b6847a0194b557aa8570de7f1d1a122fcf3888bf9ed21eae0257.


Please vote on releasing this package as Apache PDFBox 3.0.1.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.1
[ ] -1 Do not release this package because...

Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.1 release?

2023-11-26 Thread Andreas Lehmkühler




Am 26.11.23 um 09:45 schrieb Tilman Hausherr:

I looked at some suspicious differences:

N76NZUPHNGNM6TCEHWSLSDA5UKNH5C7D.pdf page 20: this one got better 
(akunavām instead of akunav m)

172096.pdf page 5: also better (bullet points)

poppler-11994-0.pdf: I couldn't reproduce the OOM. Maybe it's temporary, 
maybe it's a tika bug.

I wasn't able to reproduce the OOM either



👍

I'm going to cut the release tomorrow

@Tilman thanks for running the tests again

Andreas



Tilman



On 26.11.2023 04:41, Tilman Hausherr wrote:

Done:

https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.0_vs_3.0.1.tar.xz

Tilman

On 22.11.2023 08:08, Andreas Lehmkühler wrote:

Hi,

after fixing the latest regressions I'd like to cut the 3.0.1 release 
next Monday/Tuesday.


WDYT?

@Tim, @Tilman do you have the time to run the extraction tests 3.0.0 
vs 3.0.1 ?


Andreas


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







PDFBox 3.0.1 release?

2023-11-21 Thread Andreas Lehmkühler

Hi,

after fixing the latest regressions I'd like to cut the 3.0.1 release 
next Monday/Tuesday.


WDYT?

@Tim, @Tilman do you have the time to run the extraction tests 3.0.0 vs 
3.0.1 ?


Andreas


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: Converting obfuscate PDFs to PS and back to PDF

2023-11-07 Thread Andreas Lehmkühler




Am 03.11.23 um 17:24 schrieb Tilman Hausherr:

https://www.danisch.de/blog/2023/10/31/aktennotiz-zu-pdftotext-bei-vermurksten-zeichensaetzen/

The text is in German, but what he says is that he was able to extract text 
from obfuscated PDFs by converting them to PostScript and then back to 
PDF. I didn't test this myself, but I suspect that the conversion to 
PostScript dumps the /ToUnicode stream, and that it is rebuilt from the 
font itself when the conversion is done.

The information has to be somewhere, otherwise such a "conversion" won't work.

@Tilman did you try to contact the author to ask for an example?

Andreas


Tilman


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






[ANNOUNCE] Apache PDFBox 2.0.30 released

2023-11-05 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.30. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.30

Introduction


The Apache PDFBox library is an open source Java tool for working with 
PDF documents.


This is an incremental bugfix release based on the earlier 2.0.29 
release. It contains a couple of fixes and small improvements.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-5350] - Regression unicode mapping in Korean document
[PDFBOX-5359] - Operators "q" and "Q" should also preserve text matrices
[PDFBOX-5623] - Signature Image not Rendered starting with PDFBox 2.0.23 + patch provided
[PDFBOX-5627] - Fonts are not subsetted when saving incrementally
[PDFBOX-5628] - Bug in PDFMergerUtility#mergeFields
[PDFBOX-5639] - Password protected PDF opens in GUI apps but PDFbox says invalid password
[PDFBOX-5642] - Wrong error message "2.4.1 : Invalid Color space, The operator "rg" can't be used with CMYK Profile"
[PDFBOX-5644] - Make FDF annotations more compliant with the specification
[PDFBOX-5649] - NPE in DomXmpParser.parseLiDescription
[PDFBOX-5651] - Regression: NoSuchElementException in PDFXrefStreamParser
[PDFBOX-5653] - The PageDrawer.strokePath method is blocked, and cpu100%
[PDFBOX-5654] - Avoid NPE when processing CFF2 based fonts
[PDFBOX-5658] - IllegalArgumentException: Dimensions (width=458477041 height=26) are too large
[PDFBOX-5662] - Can not see checkbox check
[PDFBOX-5665] - NPE when converting pdf to image.
[PDFBOX-5668] - NullPointerException in XMPMetadata.getSchema()
[PDFBOX-5672] - PDFToImage might not correctly detect unsupported image formats
[PDFBOX-5684] - Font cache isn't effective on my machine, always rebuilds
[PDFBOX-5694] - PDF to Image conversion results in different converted image
[PDFBOX-5702] - Text in a certain font is lost when converting pdf to image
[PDFBOX-5706] - Incorrect colors in image from PDFs (DCTDecode)

New Feature

[PDFBOX-5683] - Inconsistent/incomplete PDF rendering

Improvement

[PDFBOX-4892] - Improve code quality (4)
[PDFBOX-5630] - Add PDRectangle#TABLOID paper size
[PDFBOX-5631] - Support version 0.5 of MaximumProfileTable
[PDFBOX-5632] - loca-table isn't mandatory for TTF/OTF-fonts using CFF outlines
[PDFBOX-5636] - Implement PDF 2.0 dash phase clarification
[PDFBOX-5637] - Add getter and setter for the CO array under PDAcroForm
[PDFBOX-5645] - Make UTC timezone static
[PDFBOX-5650] - Facilitate migration to PDFBox 3.0
[PDFBOX-5693] - Consolidate bouncycastle configuration
[PDFBOX-5699] - Consistent scm.url values for pom.xml
[PDFBOX-5703] - use comparison operators for enums

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[RESULT][VOTE] Release Apache PDFBox 2.0.30

2023-11-05 Thread Andreas Lehmkühler



Am 01.11.23 um 20:23 schrieb Andreas Lehmkühler:

Please vote on releasing this package as Apache PDFBox 2.0.30.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: [VOTE] Release Apache PDFBox 2.0.30

2023-11-04 Thread Andreas Lehmkühler

Hi,

just a friendly reminder, the vote ends in about 12 hours from now.

Andreas

Am 01.11.23 um 20:23 schrieb Andreas Lehmkühler:

Hi,

a candidate for the PDFBox 2.0.30 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.30/

The release candidate is a zip archive of the sources in:

     https://svn.apache.org/repos/asf/pdfbox/tags/2.0.30/

The SHA-512 checksum of the archive is 
c1e66695af16396f6a36d02972270651a4630b36799e1fe13262c5748b18cfcbb46829c847ab4993832018f5f8a0546eb468cafdb36019314e275351569d52cc.


Please vote on releasing this package as Apache PDFBox 2.0.30.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.30
     [ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: 2.0.30 build issues

2023-11-01 Thread Andreas Lehmkühler
I did it. I fixed most of the issues so that I was able to cut the 
2.0.30 release. There is one issue left: the antrun plugin still 
doesn't run automatically, so I had to trigger it manually at the 
end of the process.


I'm going to investigate that one later, for today I'm done. I'll port 
those changes to the 3.0 branch and the trunk version as well, but not 
today ...


Sorry for the svn noise.

Andreas

Am 01.11.23 um 14:32 schrieb Andreas Lehmkühler:
I've fixed an issue due to a major update of the maven-antrun-plugin and 
ran into the next one :-(


Stay tuned ...

Am 01.11.23 um 13:32 schrieb Andreas Lehmkühler:

Hi,

I've solved the build issue and restarted the release process. It 
looks good ...


The changes from PDFBOX-5699 introduced the issue. Moving the scm 
definition to the parent pom was correct, but the maven release plugin 
stumbled upon the fact that we were holding the parent pom in its own 
subdirectory, so the tagging of the release failed.


I've fixed that by moving everything from the parent pom to the main 
pom in the root directory. Finally the prepare step of the release works.


But it looks like the second step doesn't :-(

I'll have a look ...

Am 30.10.23 um 20:02 schrieb Andreas Lehmkühler:

Hi,

I'm experiencing some issues with the release build for 2.0.30. I 
have an idea on how to fix it, but it will take some time, so 
I'm going to postpone the release for a couple of days.


Sorry for the svn noise.

Cheers
Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






[VOTE] Release Apache PDFBox 2.0.30

2023-11-01 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 2.0.30 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.30/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/2.0.30/

The SHA-512 checksum of the archive is 
c1e66695af16396f6a36d02972270651a4630b36799e1fe13262c5748b18cfcbb46829c847ab4993832018f5f8a0546eb468cafdb36019314e275351569d52cc.


Please vote on releasing this package as Apache PDFBox 2.0.30.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.30
[ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: 2.0.30 build issues

2023-11-01 Thread Andreas Lehmkühler
I've fixed an issue due to a major update of the maven-antrun-plugin and 
ran into the next one :-(


Stay tuned ...

On 01.11.23 at 13:32, Andreas Lehmkühler wrote:

Hi,

I've solved the build issue and restarted the release process. It looks 
good ...


The changes from PDFBOX-5699 introduced the issue. Moving the scm 
definition to the parent pom was correct, but the maven release plugin 
stumbled over the fact that we were holding the parent pom in its own 
subdirectory, so the tagging of the release failed.


I've fixed that by moving everything from the parent pom to the main pom 
in the root directory. Finally the prepare step of the release works.


But it looks like the second step doesn't :-(

I'll have a look ...

On 30.10.23 at 20:02, Andreas Lehmkühler wrote:

Hi,

I'm experiencing some issues with the release build for 2.0.30. I 
have an idea on how to fix it, but it will take some time, so I'm 
going to postpone the release for a couple of days.


Sorry for the svn noise.

Cheers
Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: 2.0.30 build issues

2023-11-01 Thread Andreas Lehmkühler

Hi,

I've solved the build issue and restarted the release process. It looks 
good ...


The changes from PDFBOX-5699 introduced the issue. Moving the scm 
definition to the parent pom was correct, but the maven release plugin 
stumbled over the fact that we were holding the parent pom in its own 
subdirectory, so the tagging of the release failed.


I've fixed that by moving everything from the parent pom to the main pom 
in the root directory. Finally the prepare step of the release works.


But it looks like the second step doesn't :-(

I'll have a look ...

On 30.10.23 at 20:02, Andreas Lehmkühler wrote:

Hi,

I'm experiencing some issues with the release build for 2.0.30. I have 
an idea on how to fix it, but it will take some time, so I'm going 
to postpone the release for a couple of days.


Sorry for the svn noise.

Cheers
Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






2.0.30 build issues

2023-10-30 Thread Andreas Lehmkühler

Hi,

I'm experiencing some issues with the release build for 2.0.30. I have 
an idea on how to fix it, but it will take some time, so I'm going 
to postpone the release for a couple of days.


Sorry for the svn noise.

Cheers
Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 2.0.30/3.0.1 release?

2023-10-29 Thread Andreas Lehmkühler




On 28.10.23 at 18:10, Tilman Hausherr wrote:

It's really just the name - I tested 2.0.29 against 2.0.30.SNAPSHOT.

Thanks for the confirmation. I can't see any issue in the results, so 
I'm planning to cut the 2.0.30 release tomorrow.



Btw this new SO question looks like a bug to me:
https://stackoverflow.com/questions/77376559/pdfbox-version-3-0-0-splitter-class-nullpointerexception-at-org-apache-pdfbox-co

Yes, that is an issue, but it is limited to the 3.0 branch and the trunk 
version. I've created https://issues.apache.org/jira/browse/PDFBOX-5707 
to deal with it.


Andreas



Tilman


On 28.10.2023 17:02, Andreas Lehmkühler wrote:



On 28.10.23 at 13:12, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.29_vs_3.0.0.tar.xz

Thanks for running the test, but I'm a little bit puzzled about the 
file name. According to the stacktraces in A and B I guess you've 
compared 2.0.29 and 2.0.30 and not 2.0.29 and 3.0.0?






Tilman

On 23.10.2023 19:11, Tilman Hausherr wrote:

+1 for both.

I can do a regression test for 2.0.29 / 2.0.30, but not today; 
hopefully I'll start by Saturday.


I don't expect any surprises because I did a regression test not 
long ago in connection with the extraction of Korean documents.


Tilman

On 22.10.2023 19:37, Andreas Lehmkühler wrote:

Hi,

I'd like to cut the 2.0.30 release in a week from now, on Monday or 
Tuesday.


A week later I'd like to go for the first 3.0 bugfix release 3.0.1

WDYT?

@Tim, @Tilman do you have the time to run the extraction tests?

Andreas 





-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: PDFBox 2.0.30/3.0.1 release?

2023-10-28 Thread Andreas Lehmkühler




On 28.10.23 at 13:12, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.29_vs_3.0.0.tar.xz

Thanks for running the test, but I'm a little bit puzzled about the file 
name. According to the stacktraces in A and B I guess you've compared 
2.0.29 and 2.0.30 and not 2.0.29 and 3.0.0?






Tilman

On 23.10.2023 19:11, Tilman Hausherr wrote:

+1 for both.

I can do a regression test for 2.0.29 / 2.0.30, but not today; 
hopefully I'll start by Saturday.


I don't expect any surprises because I did a regression test not long 
ago in connection with the extraction of Korean documents.


Tilman

On 22.10.2023 19:37, Andreas Lehmkühler wrote:

Hi,

I'd like to cut the 2.0.30 release in a week from now, on Monday or 
Tuesday.


A week later I'd like to go for the first 3.0 bugfix release 3.0.1

WDYT?

@Tim, @Tilman do you have the time to run the extraction tests?

Andreas


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







Re: PDFBox 2.0.30/3.0.1 release?

2023-10-25 Thread Andreas Lehmkühler




On 23.10.23 at 19:11, Tilman Hausherr wrote:

+1 for both.

I can do a regression test for 2.0.29 / 2.0.30, but not today; 
hopefully I'll start by Saturday.

I'm not in a hurry, take your time.


I don't expect any surprises because I did a regression test not long 
ago in connection with the extraction of Korean documents.


Tilman

On 22.10.2023 19:37, Andreas Lehmkühler wrote:

Hi,

I'd like to cut the 2.0.30 release in a week from now, on Monday or 
Tuesday.


A week later I'd like to go for the first 3.0 bugfix release 3.0.1

WDYT?

@Tim, @Tilman do you have the time to run the extraction tests?

Andreas


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







PDFBox 2.0.30/3.0.1 release?

2023-10-22 Thread Andreas Lehmkühler

Hi,

I'd like to cut the 2.0.30 release in a week from now, on Monday or Tuesday.

A week later I'd like to go for the first 3.0 bugfix release 3.0.1

WDYT?

@Tim, @Tilman do you have the time to run the extraction tests?

Andreas


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: log4j2 revert

2023-10-21 Thread Andreas Lehmkühler

@Axel, thanks for the patch. @Tilman, thanks for applying it.

Be aware that I changed the builds for the trunk and for 3.0 so that the 
absence of the mentioned log no longer breaks the build. There should be 
a warning instead if the log is missing.


Andreas

On 21.10.23 at 14:13, Tilman Hausherr wrote:
Thanks, I'm testing it and will commit after that. Nice find. I'm 
wondering why this worked for so many years.


Tilman

On 21.10.2023 08:02, axh wrote:

Hi,

I opened PDFBOX-5705 
<https://issues.apache.org/jira/browse/PDFBOX-5705> for this and 
created a patch. Everything seems to work, but please verify in 
Jenkins. Sorry that Jira somehow changed priority to „important“ 
without me noticing. My internet connection is somewhat unreliable 
today and every click in Jira takes several tries to get through so I 
will just leave it at that if it’s ok for you.


Axel


On 20.10.2023 at 09:52, axh wrote:

Ah, this is weird. Log4J2 currently isn’t used at all in the PDFBox 
code base. But when running tests and examples, log4j-core and 
log4j-jcl are included to reroute commons logging to log4j2 which is 
then used to set the output format and create the log file you 
mentioned. It seems that with the updated version, the commons 
logging to log4j bridge simply isn’t loaded anymore.


This is also something that will get better once we switch directly 
to log4j.


I’ll keep you updated when I find out why log4j-jcl isn’t loaded.

Axel


On 20.10.2023 at 08:11, Tilman Hausherr wrote:

Yes, although the log file isn't part of the distribution (or is 
it?) I wondered why it wasn't there. And then I noticed that the 
logging didn't work anymore, i.e. the typical output format wasn't 
there in the console. And not in the file either. And the same 
happened at work with another software of mine.


@Axel the "Files differ" lines are not a problem, this always 
happens. I check these manually or with a modified code and my own 
"expected" files.


Tilman

On 20.10.2023 07:55, Andreas Lehmkühler wrote:


On 20.10.23 at 07:17, axh wrote:
Hm… I just did a clean checkout of trunk and did mvn clean verify 
and everything passes, both with log4j2.version set to 2.20.0 and 
2.21.0. I can however see file differences reported in the log 
like this:

The build itself works fine after the update. The Jenkins build 
adds another step to the end which fails. An expected log file is 
missing:


ERROR: Step ‘Archive the artifacts’ failed: No artifacts found that 
match the file pattern "pdfbox/target/pdfbox.log". Configuration 
error?



See [1] for further details

[1] 
https://ci-builds.apache.org/job/PDFBox/job/PDFBox-trunk/1823/console



[INFO] Running org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormTest
Files differ: /Users/axelhowind/IdeaProjects/pdfbox/pdfbox/src/test/resources/org/apache/pdfbox/pdmodel/interactive/form/MultilineFields.pdf-1.png
   /Users/axelhowind/IdeaProjects/pdfbox/pdfbox/target/test-output/MultilineFields.pdf-1.png
Rendering of target/test-output/MultilineFields.pdf failed or is not identical to expected rendering in src/test/resources/org/apache/pdfbox/pdmodel/interactive/form directory
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.027 s -- in org.apache.pdfbox.pdmodel.interactive.form.MultilineFieldsTest


But these are not reported as test failures. In the test code, I 
can see that this is by design:


// compare rendering
if (!TestPDFToImage.doTestFile(pdf, IN_DIR.getAbsolutePath(), OUT_DIR.getAbsolutePath()))
{
    // don't fail, rendering is different on different systems, result must be viewed manually
    System.err.println("Rendering of " + pdf + " failed or is not identical to expected rendering in " + IN_DIR + " directory");
}
What exactly does "it no longer works" mean? Is it related to the 
above, or is it the build failures reported by Jenkins on the list?


Axel


On 20.10.2023 at 06:50, axh wrote:

Hi,

just saw your message here. As I just started on replacing 
commons-logging by log4j, I will also look into this. I also 
overlooked that there’s already a property for the log4j version. 
Will update the patch I just submitted and then see if I can find 
out what’s causing the test failure with 2.21.0.


Axel

On 19.10.2023 at 19:06, Tilman Hausherr wrote:


I have reverted the change to the log4j2 version. It no longer 
works. I'll wait a bit if there is an issue about it, there was 
nothing on the mailing list today.


Tilman


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







---

Re: log4j2 revert

2023-10-19 Thread Andreas Lehmkühler




On 20.10.23 at 07:17, axh wrote:

Hm… I just did a clean checkout of trunk and did mvn clean verify and 
everything passes, both with log4j2.version set to 2.20.0 and 2.21.0. I can 
however see file differences reported in the log like this:

The build itself works fine after the update. The Jenkins build adds 
another step to the end which fails. An expected log file is missing:


ERROR: Step ‘Archive the artifacts’ failed: No artifacts found that 
match the file pattern "pdfbox/target/pdfbox.log". Configuration error?



See [1] for further details

[1] https://ci-builds.apache.org/job/PDFBox/job/PDFBox-trunk/1823/console




[INFO] Running org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormTest
Files differ: /Users/axelhowind/IdeaProjects/pdfbox/pdfbox/src/test/resources/org/apache/pdfbox/pdmodel/interactive/form/MultilineFields.pdf-1.png
   /Users/axelhowind/IdeaProjects/pdfbox/pdfbox/target/test-output/MultilineFields.pdf-1.png
Rendering of target/test-output/MultilineFields.pdf failed or is not identical to expected rendering in src/test/resources/org/apache/pdfbox/pdmodel/interactive/form directory
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.027 s -- in org.apache.pdfbox.pdmodel.interactive.form.MultilineFieldsTest

But these are not reported as test failures. In the test code, I can see that 
this is by design:

// compare rendering
if (!TestPDFToImage.doTestFile(pdf, IN_DIR.getAbsolutePath(), OUT_DIR.getAbsolutePath()))
{
    // don't fail, rendering is different on different systems, result must be viewed manually
    System.err.println("Rendering of " + pdf + " failed or is not identical to expected rendering in " + IN_DIR + " directory");
}
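
To illustrate the "report, don't fail" pattern in the snippet above: the internals of TestPDFToImage are not shown in this thread, so the following JDK-only sketch is an assumption about what such a comparison could look like, not the actual PDFBox test code. RenderingDiff and its method names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not the PDFBox implementation.
final class RenderingDiff {

    /** Returns the number of differing pixels, or -1 if the image sizes differ. */
    static long countDifferentPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        long different = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    different++;
                }
            }
        }
        return different;
    }

    /** Logs a mismatch to stderr and returns false instead of throwing. */
    static boolean sameRendering(String name, BufferedImage expected, BufferedImage actual) {
        long different = countDifferentPixels(expected, actual);
        if (different != 0) {
            // don't fail: rendering can differ between systems, so only report it
            System.err.println("Rendering of " + name + " is not identical to the expected rendering ("
                    + (different < 0 ? "size mismatch" : different + " pixels differ") + ")");
            return false;
        }
        return true;
    }
}
```

Returning a boolean and writing to stderr leaves the decision to the caller, which matches the intent of the comment in the test: a difference is surfaced for manual review but does not turn the build red.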
What exactly does "it no longer works" mean? Is it related to the above, or is 
it the build failures reported by Jenkins on the list?

Axel


On 20.10.2023 at 06:50, axh wrote:

Hi,

just saw your message here. As I just started on replacing commons-logging by 
log4j, I will also look into this. I also overlooked that there’s already a 
property for the log4j version. Will update the patch I just submitted and then 
see if I can find out what’s causing the test failure with 2.21.0.

Axel


On 19.10.2023 at 19:06, Tilman Hausherr wrote:

I have reverted the change to the log4j2 version. It no longer works. I'll wait 
a bit if there is an issue about it, there was nothing on the mailing list 
today.

Tilman


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







Re: PDFBox 4.0 and development plans

2023-10-19 Thread Andreas Lehmkühler



@Maruan, thanks for starting this initiative :-)



On 11.10.23 at 07:53, sahy...@fileaffairs.de wrote:

Dear colleagues,

with 3.0 being released and 4.0 being started I'd like to start
discussing what the major plans are for 4.0. And maybe in a way that
the release can be made faster than what we had for 3.0. (maybe size it
in a way that we can do the dev stuff by spring 2024 and then release
in summer 2024 followed by a 4.1 release to add to that instead of
doing a big bang like 3.0)

Sounds good to me.



Shall we share some ideas via the mailing list or start a page on our
website (I think ml is easier to do). We can still document the major
initiatives as soon as we have agreed in a blog post.

I agree, we need some sort of plan for the next version to avoid another 
big bang release. It doesn't have to be that formal, but we should agree on 
the bigger changes to be added to the next major release.



Here are my current thoughts (some of which might also be backported to
3.0) in no particular order

- appearance stream handlers for interactive form widgets (similar to
what we have for annotations) also allowing one to add their own
handler
- replacement or at least new base for XMPBox (current thought is to
have a new base parser and add if possible XMPBox current end user api
on top - might be able to reuse xmlgraphics XMP lib). Would allow to
better deal with XMPs which are not standard and make it easier to add
to existing XMPs low level.

IMHO XMP support is not essential but optional, so it is a good idea 
to use an existing lib instead of implementing our own.



- then we had the discussion about an event handler/listener similar to
what fop provides so one can listen to corrections/repairs done under
the hood (I know that we can only lay the ground for that as this is a
major undertaking given all the places where we correct things)

That might be a big thing ...


- enhance the parsing to keep the information about incremental
versions (better debugging, trace of changes done ...)

I'm not sure which details may be important, but let us start a discussion


- review and add some more PDF 2.0 capabilities

In most cases this can be done in little steps


- better text formatting/language support (maybe by including fop parts
or looking into using HarfBuzz)
- I'd also like to discuss reaching out to fop to look at integrating
some of their font handling into fontbox

Good ideas as well 


...

That list is already long and I think would be too much given above
idea of release planning.

;-)


With regards to versioning I'd like to propose that we have 2.0 as LTS
and 4.x being the next LTS.

First of all, what is your definition of an LTS version? Of course it is a 
long-term version, but what is long, and when does such a version reach EOL?


Why did you choose 2.0 as LTS? 2.0.0 was released in 2016, doesn't that 
already qualify as LTS? 2.0 requires java 6, a very old version.
Why not choose 3.0 as LTS? It requires java 8, a more or less old 
version but still widely used and the last version before they started 
removing APIs. 3.0 is the last version including preflight.
We should discuss that in a separate thread, I just wanted to share my 
thoughts as a starter.





Thoughts
BR
Maruan



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org






Re: PDFBox 4.0 and development plans

2023-10-12 Thread Andreas Lehmkühler




On 13.10.23 at 04:40, axh wrote:

Hi,

I suggest to also revisit logging. Last week I opened an issue for that (PDFBOX-5695), 
but it seems everybody is tired of this subject and no one even looked at it. 
Nonetheless, please take a look. The last time a switch to a logging facade was 
proposed (and rejected) was 10 years ago. I think it is worth reconsidering, and a 
new major release would be the right time to do a change like that. More details in 
the issue.

Please don't give up on us too early. We are all volunteers with limited 
time and different priorities.



Whatever the project decides, I am willing to contribute the required patch(es).

We highly appreciate that.

I personally don't feel any pressure to switch the logging framework, but 
I see that an overhaul of that part of PDFBox is long overdue.


I tend to agree with Tilman and I'd like to use log4j2. I hope I'll find 
some time to comment on your proposal at the next weekend.



Andreas



Cheers,
Axel


On 11.10.2023 at 07:53, sahy...@fileaffairs.de wrote:

Dear colleagues,

with 3.0 being released and 4.0 being started I'd like to start
discussing what the major plans are for 4.0. And maybe in a way that
the release can be made faster than what we had for 3.0. (maybe size it
in a way that we can do the dev stuff by spring 2024 and then release
in summer 2024 followed by a 4.1 release to add to that instead of
doing a big bang like 3.0)

Shall we share some ideas via the mailing list or start a page on our
website (I think ml is easier to do). We can still document the major
initiatives as soon as we have agreed in a blog post.

Here are my current thoughts (some of which might also be backported to
3.0) in no particular order

- appearance stream handlers for interactive form widgets (similar to
what we have for annotations) also allowing one to add their own
handler
- replacement or at least new base for XMPBox (current thought is to
have a new base parser and add if possible XMPBox current end user api
on top - might be able to reuse xmlgraphics XMP lib). Would allow to
better deal with XMPs which are not standard and make it easier to add
to existing XMPs low level.
- then we had the discussion about an event handler/listener similar to
what fop provides so one can listen to corrections/repairs done under
the hood (I know that we can only lay the ground for that as this is a
major undertaking given all the places where we correct things)
- enhance the parsing to keep the information about incremental
versions (better debugging, trace of changes done ...)
- review and add some more PDF 2.0 capabilities
- better text formatting/language support (maybe by including fop parts
or looking into using HarfBuzz)
- I'd also like to discuss reaching out to fop to look at integrating
some of their font handling into fontbox
...

That list is already long and I think would be too much given above
idea of release planning.

With regards to versioning I'd like to propose that we have 2.0 as LTS
and 4.x being the next LTS.

Thoughts
BR
Maruan



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org









Re: Apache PDFBox Board Report October 2023 due

2023-10-10 Thread Andreas Lehmkühler
Thanks for the feedback. Maybe "minor" is the wrong word. I had severe 
issues in my mind which prevent users from using the new version.


I'm going to paraphrase that sentence

Andreas

On 10.10.23 at 19:40, Tilman Hausherr wrote:
+1 although I see PDFBOX-5696 and PDFBOX-5666 as more than "minor" 
because these are so weird and surprising.


Tilman

On 08.10.2023 18:53, Andreas Lehmkühler wrote:

Hi,

find attached a quick draft of the board report we're expected to 
submit this month. It's based upon the report wizard template which 
can be found at [1]


Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software 
related to Java library for working with PDF documents

## Project Status:
Current project status: Ongoing with moderate activity
Issues for the board: There are no issues requiring board attention at 
this time



## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

    3.0.0 was released on 2023-08-17.
    2.0.29 was released on 2023-07-01.
    2.0.28 was released on 2023-04-13.

## Community Health:
- there is a steady stream of contributions, bug reports and questions 
on the mailing lists
- finally the new major release 3.0.0 was released after 7 years of 
development
- there are some minor issues with the release but nothing serious. I 
expect the first bugfix release 3.0.1 in a couple of weeks
- the development of 4.0.0 already started with two fundamental 
changes. We switched to java 11 as minimum requirement and removed the 
sub project preflight due to inactivity





-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org








Recent build failures

2023-10-01 Thread Andreas Lehmkühler

Hi,

I've downgraded the version of the download-maven-plugin in the trunk to 
see if it is related to the recent build failures.


I can't reproduce the issue at home. The expected sha512 hash is still 
correct so that I assume an issue with the plugin itself.


Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: Multithreading PDFRenderer

2023-09-20 Thread Andreas Lehmkühler




On 18.09.23 at 14:00, Arno Dietsche wrote:

Hi,

We are using pdfbox 3.0.0 as part of our project which aims at finding 
discrepancies between two similar documents created by external services. One 
thing we use it for is to render the pages of those documents to images and 
compare the rendered images. Those documents can be very large and therefore we 
are trying to optimize our resource usage. So we want to parallelize the page 
rendering if possible. This leads to my question in relation to the PDFRenderer 
class (v3.0.0):

Officially, PDFBox is not thread safe, but we removed some 
of the limitations and tried to make new features thread safe.



In the past we could observe problems with this multithreaded approach. And I 
understand that PDDocument is not thread safe, but what if I get all the PDPage 
objects first and then render them multithreaded? Essentially if the method 
PDFRenderer.renderImage(int pageIndex, float scale, ImageType imageType, 
RenderDestination destination) is passed the PDPage object directly and not 
the pageIndex, there would be no need to get the PDPage object from the 
PDPageTree. Do you know of possible limitations regarding multithreading the 
remainder of this renderImage method?

I guess that adding the PDPage instance to that method won't change that 
much, as 3.0.0 uses an on-demand parser and most likely the related PDPage 
objects aren't fully loaded, so the parser has to dereference most of 
the objects in question during rendering. But the good news is that this part 
should be thread safe.
Our own debugger is multithreaded, and at the beginning of the 
implementation of the on-demand parser I stumbled upon that and had to make 
the new IO classes thread safe.


That said, I'd like to encourage you to give it a try, but there is no 
guarantee from our side ;-)
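
For such an experiment, the fan-out over pages could look roughly like the sketch below. It deliberately contains no PDFBox calls: the renderPage callback is a hypothetical stand-in for whatever per-page rendering method is used (for example the subclass method mentioned in the question), and ParallelPages is a made-up name. Whether the callback itself is safe to call concurrently is exactly the open question of this thread and is not answered by the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical helper; renderPage stands in for the real per-page rendering call.
final class ParallelPages {

    /** Runs renderPage for pages 0..pageCount-1 on a fixed pool, results in page order. */
    static <T> List<T> renderAll(int pageCount, int threads, IntFunction<T> renderPage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                final int pageIndex = i;
                Callable<T> task = () -> renderPage.apply(pageIndex);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                // get() blocks until the page is done and keeps the page order
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while rendering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("rendering failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A bounded pool keeps memory usage predictable, which matters when every result is a large image; collecting the futures in submission order means the caller does not have to re-sort the pages.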



Andreas


To clarify I am currently testing this with a subclass of PDFRenderer so I 
could add this method: renderImage(PDPage page, float scale, ImageType 
imageType, RenderDestination destination)

Thank you very much for your time and help


Best regards
Arno Dietsche

brainsphere informationworks GmbH
Elsenheimerstrasse 41
80687 Muenchen
Germany

Phone:  +49 89 203004-830
Fax:  +49 89 203004-849

Registered office: Muenchen
Register court: Amtsgericht Muenchen HRB 154535
Managing directors: Hans-Joerg Kamm, Volker Mattes

This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient (or have received this e-mail in error) please 
notify the sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-mail is strictly 
forbidden.




-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: CipherInputStream may not be closed

2023-09-10 Thread Andreas Lehmkühler




On 08.09.23 at 17:32, axh wrote:

Hi Anna-Katharina,

what version are you using? In the current 3.0, the stream is closed 
(implicitly) by using the try-with-resources syntax 
(https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html):

try (CipherInputStream cis = new CipherInputStream(data, cipher))
{
 …
}
According to Git Blame, try-with-resources has been used at that point since 
2017, so there should be no problem. Disclaimer: I am not a maintainer, I just 
sometimes contribute code.

In PDFBox 2.0.x the stream is closed in a finally block.

I guess we are fine here.
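
For readers who want to see the pattern in isolation, here is a small JDK-only sketch. It is not the code from SecurityHandler; the class name CipherStreamDemo, the transform method and the AES/CBC/PKCS5Padding choice are assumptions made for the example. It shows that try-with-resources closes the CipherInputStream (and with it the wrapped stream) on every exit path, which is the behaviour a finally block provides in the 2.0.x code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, not taken from PDFBox.
final class CipherStreamDemo {

    /** Encrypts or decrypts input, depending on mode (Cipher.ENCRYPT_MODE or DECRYPT_MODE). */
    static byte[] transform(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // the stream is closed automatically, also when read() throws
            try (CipherInputStream cis =
                    new CipherInputStream(new ByteArrayInputStream(input), cipher)) {
                byte[] buffer = new byte[256];
                int n;
                while ((n = cis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Before Java 7 the same guarantee needs an explicit close() in a finally block; both forms satisfy the API constraint discussed in this thread.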

Andreas



Axel



On 08.09.2023 at 14:08, Anna-Katharina Wickert wrote:

Hei dear maintainers,

For a benchmark [1], we randomly sampled JCA usages to decide if the API usage 
is a violation of any API usage constraint.
We believe we found one for the JCA class CipherInputStream.
The call to *close* is missing for the call sequence to *CipherInputStream*. 
Thus, the input stream and the resources it holds are not released. 
[More Details in the JDK 17 
documentation](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/javax/crypto/CipherInputStream.html)
The instance that we sampled is located in:
- file: 
pdfbox/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/encryption/SecurityHandler.java
- method: private void encryptDataAES256(InputStream data, OutputStream output, 
boolean decrypt) throws IOException
- line: 379

To the best of my knowledge, this JCA usage does not result in a vulnerability 
(directly). However, it violates the API constraint discussed above. Therefore, 
we consider adding this usage as a violation into the benchmark.

Best,
Anna-Katharina Wickert
For the CamBench team

[1] https://github.com/CROSSINGTUD/CamBench





-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 1.8.x End-Of-Life (EOL) Announcement

2023-08-19 Thread Andreas Lehmkühler

The Apache PDFBox Team would like to inform you that PDFBox 1.8.17
is the last release of the 1.8 branch, which has reached its end of life 
and will no longer be officially supported.


The current community mainly maintains the 2.0.x branch and the brand 
new 3.0.x branch. We recommend everyone to upgrade at least to the 2.0.x 
branch for the best experience.


[1] https://pdfbox.apache.org/2.0/migration.html
[2] https://pdfbox.apache.org/3.0/migration.html


Thanks,
The Apache PDFBox Team

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 3.0.0 released

2023-08-17 Thread Andreas Lehmkühler
The Apache PDFBox community is pleased to announce the release of Apache 
PDFBox 3.0.0. It is available for download at:


https://pdfbox.apache.org/download.html

The Apache PDFBox library is an open source Java tool for working with 
PDF documents.


This is the new major release 3.0.0 of PDFBox. This release contains a 
lot of improvements, fixes and refactorings. The API is supposed to be 
stable.


A migration guide is available at

https://pdfbox.apache.org/3.0/migration.html.

It is still a work in progress and we are happy to include any valuable 
feedback from our community.


For more details on these changes and all the other fixes and 
improvements included in this release, please refer to the following 
issues on the PDFBox issue tracker at


https://issues.apache.org/jira/browse/PDFBOX.

The full release notes are available at:

https://www.apache.org/dist/pdfbox/3.0.0/RELEASE-NOTES.txt


The Apache PDFBox website can be found at:

https://pdfbox.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[RESULT][VOTE] Release Apache PDFBox 3.0.0

2023-08-17 Thread Andreas Lehmkühler




Am 14.08.23 um 20:29 schrieb Andreas Lehmkühler:

Please vote on releasing this package as Apache PDFBox 3.0.0.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: New PDFBox 3.0 branch

2023-08-15 Thread Andreas Lehmkühler
Thanks for the hint. The benchmark package isn't part of the reactor 
build and wasn't updated. I've fixed that.


Andreas

Am 15.08.23 um 20:00 schrieb Tilman Hausherr:

One small thing re trunk: the benchmark package still has the old version.

Tilman


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: [VOTE] Release Apache PDFBox 3.0.0

2023-08-14 Thread Andreas Lehmkühler

Hi,

when updating the trunk I just realized that pdfbox-io is missing in the 
dist area within svn. This is our fifth 3.0 release and interestingly no 
one missed that part before :-o


However, IMHO that is no show stopper. The artifact was created during 
the build and deployed to nexus. I've added the missing piece to the 
dist area including the signature and hash.


That said, the vote is still open unless someone objects to my conclusion.

Andreas

Am 14.08.23 um 20:29 schrieb Andreas Lehmkühler:

Hi,

a candidate for the PDFBox 3.0.0 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0/

The release candidate is a zip archive of the sources in:

     https://svn.apache.org/repos/asf/pdfbox/tags/3.0.0/

The SHA-512 checksum of the archive is 
279f283f8f97e3adb5e58546f6242b495eef26dacfc256129f790064a73934f16ceb0a7a9164293d506fc0fff462783d296b844611ed18e12b9de0f1724294b5.


Please vote on releasing this package as Apache PDFBox 3.0.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 3.0.0
     [ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



New PDFBox 3.0 branch

2023-08-14 Thread Andreas Lehmkühler

Hi,

due to the preparations for the final release of PDFBox 3.0.0 I've 
created a new branch "3.0" in svn [1].


I've created a job in jenkins to build that branch as well [2]

Andreas

[1] https://svn.apache.org/viewvc/pdfbox/branches/3.0/
[2] https://ci-builds.apache.org/job/PDFBox/job/PDFBox-3.0.x/

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.0 rebuild

2023-08-14 Thread Andreas Lehmkühler
Sorry for the noise, I struggled with the SCM connection. There were 
two and I changed the one which wasn't used.


However, I've finally built the release and the vote is open.

Thanks for your patience

Andreas

Am 14.08.23 um 19:28 schrieb Andreas Lehmkühler:

Hi,

something went totally wrong when building the final release. I ended up 
with a 4.0.0-SNAPSHOT release ?!?!?


I have no idea what went wrong. However, I'm going to roll back the 
release and rebuild the whole thing.


Stay tuned ;-)

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[VOTE] Release Apache PDFBox 3.0.0

2023-08-14 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 3.0.0 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/3.0.0/

The SHA-512 checksum of the archive is 
279f283f8f97e3adb5e58546f6242b495eef26dacfc256129f790064a73934f16ceb0a7a9164293d506fc0fff462783d296b844611ed18e12b9de0f1724294b5.


Please vote on releasing this package as Apache PDFBox 3.0.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.0
[ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



PDFBox 3.0.0 rebuild

2023-08-14 Thread Andreas Lehmkühler

Hi,

something went totally wrong when building the final release. I ended up 
with a 4.0.0-SNAPSHOT release ?!?!?


I have no idea what went wrong. However, I'm going to roll back the 
release and rebuild the whole thing.


Stay tuned ;-)

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.0 final release

2023-08-13 Thread Andreas Lehmkühler

I've had a look at the remaining 4 exceptions.

I can't reproduce the OutOfMemory and the 
ConcurrentModificationException. Maybe both are somehow related to TIKA 
or the test environment.


The IllegalArgumentException is thrown due to an issue with a font. I'm 
not sure if it is really a font bug or some issue with the code.


However, IMHO those issues aren't show stoppers for the planned final 
3.0 release :-)


Andreas




Am 13.08.23 um 13:19 schrieb Andreas Lehmkühler:

@Tilman thanks for running the regression tests

I had a look at the new exceptions.

6 out of 10 files were throwing the same NoSuchElementException in 
PDFXrefStreamParser. It's a regression and both 2.x and 3.x were 
affected. I've applied a fix, see [1]


I'm going to have a look at the remaining 4 exceptions.

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5651

Am 13.08.23 um 09:24 schrieb Tilman Hausherr:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.29_vs_3.0.0.tar.xz

I had only a short look but I'm optimistic. Some differences may be 
because of the XMP bug.


Tilman



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.0 final release

2023-08-13 Thread Andreas Lehmkühler

@Tilman thanks for running the regression tests

I had a look at the new exceptions.

6 out of 10 files were throwing the same NoSuchElementException in 
PDFXrefStreamParser. It's a regression and both 2.x and 3.x were 
affected. I've applied a fix, see [1]


I'm going to have a look at the remaining 4 exceptions.

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5651

Am 13.08.23 um 09:24 schrieb Tilman Hausherr:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.29_vs_3.0.0.tar.xz

I had only a short look but I'm optimistic. Some differences may be 
because of the XMP bug.


Tilman



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: TimeZone + Calendar

2023-08-09 Thread Andreas Lehmkühler

Hi,

see inline

Am 08.08.23 um 19:30 schrieb Daniel Gredler:

Hi,

I think I've said this before, but thanks again for such a great library!
I'm thinking about submitting a patch to improve PDFBox, but wanted to get
the team's thoughts first.

What I've seen is that sometimes under very heavy load across multiple
threads, PDF creation stalls due to lock contention, because
`TTFDataStream.readInternationalDate()` uses
`TimeZone.getTimeZone(String)`, which is synchronized. I assume that the
inverse operation,
`TTFSubsetter.writeLongDateTime(DataOutputStream,Calendar)`, has a similar
issue, though I have not seen it myself.
Good catch. PDFBox is not supposed to be thread-safe, but it is always a 
good idea to eliminate obvious issues.



If these classes were using the newer `java.time` APIs, they could simply
use the `ZoneOffset.UTC` constant... but they are using the older
`Calendar` API (and exposing this fact), so `TimeZone` must be used. So a
few questions from my side:

Is there a plan to move off of the older time APIs and onto the newer
`java.time` APIs, perhaps as part of version 3.0? (e.g. `Calendar` ->
`ZonedDateTime`) Either way, I'm pretty sure such a breaking change is not
contemplated for the 2.x release series, correct?
No plan so far. There were some discussions in the past but other things 
were more important or more interesting ;-) There were some small 
changes in the trunk.


We already released a beta of 3.0, which implies a stable API, so we 
must not change it. Furthermore, I'm planning to cut the final release 
next Monday and I guess there isn't enough time for such changes anyway.
2.x is a no-go as well, as it relies on Java 6 and the java.time API 
requires Java 8.
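
For reference, a rough sketch of what a java.time based conversion could look like on a Java 8+ baseline. This is not PDFBox code: the class and method names are made up for illustration, and it assumes the TrueType convention that font dates count seconds since 1904-01-01 00:00 UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, not part of PDFBox.
final class JavaTimeTtfDates {
    // Start of the TrueType date epoch (assumption: 1904-01-01T00:00:00Z).
    private static final Instant EPOCH_1904 =
            ZonedDateTime.of(1904, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC).toInstant();

    private JavaTimeTtfDates() {
    }

    // ZoneOffset.UTC is an immutable constant, so no synchronized lookup is involved.
    static ZonedDateTime fromSecondsSince1904(long secondsSince1904) {
        return EPOCH_1904.plusSeconds(secondsSince1904).atZone(ZoneOffset.UTC);
    }
}
```

Returning `ZonedDateTime` instead of `Calendar` would change public signatures, which is why this only fits a release where API breaks are acceptable.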



`TimeZone` is not technically thread-safe, but there are only a couple of
rarely-used dangerous mutators, AFAIK (`setRawOffset(int)`,
`setID(String)`). Would it be OK to keep a static final UTC `TimeZone` and
just reuse that, even though `TimeZone` is only 95% immutable?
I concur with Tilman, this would be a small but good solution for 3.0 
and 2.x as well


Let's discuss the topic after releasing 3.0. Maybe it is a good idea to 
target 4.x for such a change.
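
A minimal sketch of the shared-constant idea, to make the proposal concrete. It is an illustration under assumptions, not the actual implementation of `TTFDataStream.readInternationalDate()`: the class and method names are hypothetical, and the 1904 epoch is the TrueType convention for font dates.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, not part of PDFBox.
final class TtfDates {
    // Looked up once when the class is initialised; later conversions no longer
    // call the synchronized TimeZone.getTimeZone(String).
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    private TtfDates() {
    }

    // Converts seconds since 1904-01-01 00:00 UTC into a Calendar in UTC.
    static Calendar fromSecondsSince1904(long secondsSince1904) {
        Calendar cal = new GregorianCalendar(UTC);
        cal.clear();
        cal.set(1904, Calendar.JANUARY, 1, 0, 0, 0);
        cal.setTimeInMillis(cal.getTimeInMillis() + secondsSince1904 * 1000L);
        return cal;
    }
}
```

The shared instance is only safe as long as nobody calls one of its mutators, which is the "95% immutable" caveat raised in the question.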



If not, what about keeping a static final UTC `TimeZone` constant and
creating a clone each time it is needed, to avoid the synchronization?
(`Calendar.getTimeZone()` makes liberal use of `clone()`, for example).

If that doesn't work, what about subclassing `TimeZone` and overriding the
mutators with methods that just throw `UnsupportedOperationException`s, and
using the subclass for the static final UTC constant? Commons Lang 3 does
something similar with `GmtTimeZone` and the `FastTimeZone` utility class
here:

https://github.com/apache/commons-lang/blob/bf5865ae915ececcdbfa7a473b0d708e3e235bcf/src/main/java/org/apache/commons/lang3/time/FastTimeZone.java#L38
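
To illustrate the subclassing option, here is a small sketch of a fixed UTC `TimeZone` whose mutators throw. It is written from scratch for this discussion and is not the Commons Lang `GmtTimeZone` code; the class name is hypothetical.

```java
import java.util.Date;
import java.util.TimeZone;

// Hypothetical class, not part of PDFBox or Commons Lang.
final class ImmutableUtcTimeZone extends TimeZone {
    private static final long serialVersionUID = 1L;

    // Single shared instance; safe to reuse because the mutators below are blocked.
    static final TimeZone INSTANCE = new ImmutableUtcTimeZone();

    private ImmutableUtcTimeZone() {
    }

    @Override
    public int getOffset(int era, int year, int month, int day, int dayOfWeek, int milliseconds) {
        return 0; // UTC has no offset
    }

    @Override
    public int getRawOffset() {
        return 0;
    }

    @Override
    public boolean useDaylightTime() {
        return false; // UTC has no daylight saving time
    }

    @Override
    public boolean inDaylightTime(Date date) {
        return false;
    }

    @Override
    public String getID() {
        return "UTC";
    }

    @Override
    public void setRawOffset(int offsetMillis) {
        throw new UnsupportedOperationException("immutable time zone");
    }

    @Override
    public void setID(String id) {
        throw new UnsupportedOperationException("immutable time zone");
    }
}
```

A calendar created with `new GregorianCalendar(ImmutableUtcTimeZone.INSTANCE)` then behaves like a UTC calendar, while any attempt to change the zone fails fast instead of silently affecting other threads.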

Thanks for the feedback!

Daniel



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.0 final release

2023-08-06 Thread Andreas Lehmkühler

Hi,

@Tilman thanks for the feedback

I'm planning to cut the final release next Monday, a week from now.

Andreas

Am 25.07.23 um 20:16 schrieb Tilman Hausherr:

I'm surprised that there hasn't been any feedback.

But 3 weeks from now would be ok. 3.0.0.beta-1 was released 2 weeks ago, 
that would mean 5 weeks total, and most people don't go on vacation for 
more than 3 weeks.


+1

Tilman

On 24.07.2023 19:22, Andreas Lehmkühler wrote:

Hi,

the first beta of PDFBox 3.0.0 is out of the door and I'm wondering 
when to do the final release.


I'm in favor of doing the final release soon, let's say in 3 weeks 
from now. Or should we wait a little bit longer for some feedback on 
the beta version. Is there maybe anything you want to do first?


WDYT?

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org




PDFBox 3.0.0 final release

2023-07-24 Thread Andreas Lehmkühler

Hi,

the first beta of PDFBox 3.0.0 is out of the door and I'm wondering when 
to do the final release.


I'm in favor of doing the final release soon, let's say in 3 weeks from 
now. Or should we wait a little bit longer for some feedback on the beta 
version. Is there maybe anything you want to do first?


WDYT?

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 3.0.0-beta1 released

2023-07-13 Thread Andreas Lehmkühler
The Apache PDFBox community is pleased to announce the release of the 
first beta release for Apache PDFBox 3.0.0. It is available for download at:


https://pdfbox.apache.org/download.html

The Apache PDFBox library is an open source Java tool for working with 
PDF documents.


This is the first beta release candidate for the upcoming major release 
3.0.0 of PDFBox. This release contains a lot of improvements, fixes and 
refactorings. The API is supposed to be stable.


A migration guide is available at 
https://pdfbox.apache.org/3.0/migration.html. It is still a work in 
progress and we are happy to include any valuable feedback from our 
community.


For more details on these changes and all the other fixes and 
improvements included in this release, please refer to the following 
issues on the PDFBox issue tracker at 
https://issues.apache.org/jira/browse/PDFBOX.



The full release notes are available at:

https://www.apache.org/dist/pdfbox/3.0.0-beta1/RELEASE-NOTES.txt


The Apache PDFBox website can be found at:

https://pdfbox.apache.org/


-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[RESULT][VOTE] Release Apache PDFBox 3.0.0-beta1

2023-07-13 Thread Andreas Lehmkühler



Am 11.07.23 um 07:55 schrieb Andreas Lehmkühler:

Please vote on releasing this package as Apache PDFBox 3.0.0-beta1.


   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: Apache PDFBox Board Report July 2023 due

2023-07-12 Thread Andreas Lehmkühler

Hi,

I've submitted the report as proposed, thanks for the reviews

Andreas

Am 11.07.23 um 08:16 schrieb Andreas Lehmkühler:

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report wizard template which can be 
found at [1]


Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software 
related to Java library for working with PDF documents


## Project Status:
Current project status: Ongoing with moderate activity
Issues for the board: There are no issues requiring board attention at 
this time


## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.29 was released on 2023-07-01.
     2.0.28 was released on 2023-04-13.
     2.0.27 was released on 2022-09-29.

## Community Health:
- there is a steady stream of contributions, bug reports and questions 
on the mailing lists
- there are a lot of refactorings, improvements and bugfixes
- 2.0.29 was released a few days ago
- the new release consists of small improvements and bug fixes. Two of 
the latter fix two regressions introduced/revealed in the former 2.0.28 
release
- a vote for the first beta version of PDFBox 3.0.0 is ongoing

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[VOTE] Release Apache PDFBox 3.0.0-beta1

2023-07-10 Thread Andreas Lehmkühler

Hi,

a candidate for the PDFBox 3.0.0-beta1 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-beta1/

The release candidate is a zip archive of the sources in:

https://svn.apache.org/repos/asf/pdfbox/tags/3.0.0-beta1/

The SHA-512 checksum of the archive is 
07a697c6d31854a74eb0452b792644da33fe5e0f3954040465498869059d8a47b11285e6c1472ab8f7c0be76373b86cfd0d1d5963fc1ed9c08ffbad1aadc5651.


Please vote on releasing this package as Apache PDFBox 3.0.0-beta1.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.0-beta1
[ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 3.0.0-beta1 release, next attempt

2023-07-10 Thread Andreas Lehmkühler

Hi,

I have some issues with my build environment: the signing doesn't work, and 
I don't have any idea why. It worked fine the other day when building 2.0.29.


I've to investigate first and therefore postpone the release for another 
day.


Andreas


Am 06.07.23 um 19:56 schrieb Andreas Lehmkühler:

Hi,

now that the 2.0.29 is out I'd like to cut the first beta of 3.0.0.

How about next Monday? Or is there anything we have to do first and 
maybe wait another week or two?


WDYT?

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



PDFBox 3.0.0-beta1 release, next attempt

2023-07-06 Thread Andreas Lehmkühler

Hi,

now that the 2.0.29 is out I'd like to cut the first beta of 3.0.0.

How about next Monday? Or is there anything we have to do first and 
maybe wait another week or two?


WDYT?

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[ANNOUNCE] Apache PDFBox 2.0.29 released

2023-07-01 Thread Andreas Lehmkühler

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.29. The release is available for download at:

https://pdfbox.apache.org/download.html

See the full release notes below for details about this release.

Release Notes -- Apache PDFBox -- Version 2.0.29

Introduction


The Apache PDFBox library is an open source Java tool for working with 
PDF documents.


This is an incremental bugfix release based on the earlier 2.0.28 
release. It contains a couple of fixes and small improvements.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

Bug

[PDFBOX-4010] - A (rotated) barcode is missing from a pdf when printed
[PDFBOX-5587] - NullPointerException in PDTrueTypeFont.java getPath( )
[PDFBOX-5591] - Parsing of XMP metadata without optional xmpmeta element
[PDFBOX-5593] - Avoid division by 0 in shading function interpolation
[PDFBOX-5596] - MyPageDrawer#getPaint may produce UnsupportedOperationException
[PDFBOX-5601] - Barcode corrupted when printing document
[PDFBOX-5604] - The text in some fonts is lost when converting pdf to image
[PDFBOX-5606] - PDFTextStripper runs out of memory in 2.0.28 but not in 2.0.27 same code
[PDFBOX-5609] - all values in the signature dictionary shall be direct objects
[PDFBOX-5611] - Glyphs not rendered
[PDFBOX-5612] - PDF with mangled font rendering in some environments
[PDFBOX-5614] - RadioButtons disappear when printing PDF
[PDFBOX-5620] - BitsPerComponent 16 not allowed in PDF/A-1b
[PDFBOX-5621] - NullPointerException in PDFStreamEngine.showText
[PDFBOX-5624] - Infinite loop when parsing Type1 font

Improvement

[PDFBOX-5571] - Add duplex and tray parameters to PrintPDF
[PDFBOX-5598] - Create command line utility to extract XMP data
[PDFBOX-5605] - Improve Opaque PDFRenderer example

Task

[PDFBOX-4932] - Implement /RunLengthDecode encoder
[PDFBOX-5595] - Slight regression on corrupt bug tracker file
[PDFBOX-5625] - move and update bc from jdk15on to jdk15to18

Release Contents


This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by a SHA512 checksum and a PGP signature
that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://www.apache.org/dist/pdfbox/KEYS.
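
As an illustration of the checksum part of that verification, the following sketch computes the SHA-512 digest of a downloaded file and compares it with the published hex value. The class and method names are hypothetical. It only detects a corrupted or altered download; checking the PGP signature against the KEYS file is a separate step that is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the PDFBox distribution.
final class Sha512Check {
    private Sha512Check() {
    }

    // Streams the file through SHA-512 and returns the digest as lowercase hex.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder(128);
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares against the published checksum, ignoring case and surrounding whitespace.
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha512Hex(file).equalsIgnoreCase(expectedHex.trim());
    }
}
```

Typical use would be `Sha512Check.matches(pathToArchive, publishedChecksum)`, where both arguments are supplied by the caller.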

About Apache PDFBox
---

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit https://pdfbox.apache.org/

About The Apache Software Foundation


Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit https://www.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[RESULT][VOTE] Release Apache PDFBox 2.0.29

2023-07-01 Thread Andreas Lehmkühler



Am 28.06.23 um 18:54 schrieb Andreas Lehmkühler:

Please vote on releasing this package as Apache PDFBox 2.0.29.



   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

Andreas



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org


