Re: PDFBox 2.0.32 release

2024-07-06 Thread Tilman Hausherr

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz

to be compared against

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find any visual difference apart from the file sizes. This might 
be because of the path names or some metadata.
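
One way to narrow this down would be to compare the two unpacked report trees file by file. The following is only a rough sketch under assumptions: it presumes both archives were already extracted into two local directories, and the class and method names (ReportDirDiff, differingFiles) are made up for illustration and are not part of tika-eval or PDFBox.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists files that differ between two extracted report directories. */
class ReportDirDiff {

    /** Returns the sorted relative paths that are missing on one side or differ in content. */
    static List<String> differingFiles(Path dirA, Path dirB) {
        try {
            Set<String> names = new TreeSet<>(relativeFiles(dirA));
            names.addAll(relativeFiles(dirB));
            List<String> diffs = new ArrayList<>();
            for (String name : names) {
                Path a = dirA.resolve(name);
                Path b = dirB.resolve(name);
                // Byte-wise comparison; fine for a sketch, large files would need streaming.
                boolean same = Files.isRegularFile(a) && Files.isRegularFile(b)
                        && Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
                if (!same) {
                    diffs.add(name);
                }
            }
            return diffs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Collects all regular files below root as '/'-separated relative paths. */
    private static Set<String> relativeFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ReportDirDiff <dirA> <dirB>");
            return;
        }
        differingFiles(Paths.get(args[0]), Paths.get(args[1])).forEach(System.out::println);
    }
}
```

An empty result would suggest that the size difference comes from the archive itself (path names, timestamps, compression) rather than from the report contents.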


Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:

Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:

Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd 
like to finally cut the 2.0.32 release.


Do we need a new regression test due to the latest changes?

There are some related changes such as 
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent 
refactoring in fontbox.


Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz 



From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour 
to create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests after locally reverting the change 
from PDFBOX-5790 and locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were 
"omitted" and in 2.0.32 there is some special char. But the 
remaining part looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 
is able to extract them. 2.0.32 seems to mix up some of the content.


I guess it is somehow font-related. I need to investigate more.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz 



No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If nothing goes wrong, I'll have 
the results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping 
hand I can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas



- 


To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org





Re: PDFBox 2.0.32 release

2024-07-06 Thread Tilman Hausherr

Hi,

I've just started a new "B" test.

Tilman


[jira] [Closed] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-07-06 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5838.
---
Resolution: Won't Do

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, 
> PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




jenkins build timeout strategy changed

2024-07-05 Thread Tilman Hausherr
I've changed the build timeout strategy on Jenkins because we 
got too many timed-out builds. I've set an inactivity timeout of 10 
minutes. The timeouts come from problems downloading the NVD database 
without an NVD API key whenever the plugin gets updated. I suspect that 
Apache is penalized because we're not the only ones who make these calls.


Tilman


[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863059#comment-17863059
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

I forgot to mention: our snapshots are not available on Maven Central.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862998#comment-17862998
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

Just added  [^706213.pdf] if we ever want to add a test or improve this. 
Official US document thus no copyright.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Attachment: 706213.pdf

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Comment Edited] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984
 ] 

Tilman Hausherr edited comment on PDFBOX-5848 at 7/4/24 9:42 AM:
-

If you don't need the annotations (especially link annotations) then it's a 
solution. Alternatively copy the current source code of the splitter class from 
the repository and use that one instead of the class from the jar.


was (Author: tilman):
If you don't need the annotations (especially link annotations) then it's a 
solution. Alternatively copy the current source code of the splitter class from 
the repository.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

If you don't need the annotations (especially link annotations) then it's a 
solution. Alternatively copy the current source code of the splitter class from 
the repository.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Affects Version/s: 2.0.31

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Component/s: Utilities

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862971#comment-17862971
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

[~jfisbein-clarity] Please try with the new snapshot
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
it's likely that this fixes your problem as well, because there is less to save 
now.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Summary: Infinite loop after splitting and saving PDF / giant result files  
(was: Infinite loop processing PDF)

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862916#comment-17862916
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

It finished with 3.0.2 (while I slept) and the snapshot too (with a dirty fix 
for the /Parent problem). I also tried with "-startPage 1 -endPage 442" because 
I'm not sure about the default settings of the splitter class and I never tried 
her code.

I'll do a less dirty fix for the /Parent problem in the next few days.

[~jfisbein-clarity] try setting a higher stack size with "-Xss". The snapshot 
version is at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
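
If changing the JVM options is not possible, a larger stack can also be requested for a single thread from code. This is only a sketch of that idea, not something taken from PDFBox: the class name StackSizeSketch, the recursion depth and the 64 MB value are arbitrary choices for illustration, and according to the Thread Javadoc the stack size argument is a hint that some platforms may ignore.

```java
/** Hypothetical demo: run deeply recursive work on a thread with a larger stack. */
class StackSizeSketch {

    /** Simple non-tail recursion that needs one stack frame per level. */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /**
     * Runs depth(n) on a new thread created with the requested stack size.
     * Returns n on success, or -1 if the stack still overflowed.
     */
    static int runWithStack(int n, long stackBytes) {
        int[] result = {-1};
        Runnable work = () -> {
            try {
                result[0] = depth(n);
            } catch (StackOverflowError e) {
                result[0] = -1; // the requested stack was not enough (or was ignored)
            }
        };
        // Thread(ThreadGroup, Runnable, String, long): the last argument is the stack size hint.
        Thread t = new Thread(null, work, "deep-recursion", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runWithStack(50_000, 64L * 1024 * 1024));
    }
}
```

In a real application the body of the Runnable would be the split and save call instead of the toy recursion.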


> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867
 ] 

Tilman Hausherr edited comment on PDFBOX-5848 at 7/3/24 6:15 PM:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages. Opening and saving it with Adobe Reader 
produces a much smaller file, where the /Parent entry value is set to null.
 !screenshot-1.png! 
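
The effect of such a shared /Parent can be sketched with a toy object graph. This is not PDFBox code: the Node class, the method names and the assumption that every annotation also points back to its own page (as the optional /P entry would) are invented for illustration only. The sketch just counts which pages become reachable from one page when all references are followed, which is roughly what a deep copy of a single page has to carry along.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a PDF object graph; hypothetical, not the PDFBox API. */
class OrphanPageSketch {

    /** Stands in for a PDF dictionary; refs are its outgoing references. */
    static final class Node {
        final String name;
        final boolean isPage;
        final List<Node> refs = new ArrayList<>();

        Node(String name, boolean isPage) {
            this.name = name;
            this.isPage = isPage;
        }
    }

    /** Returns the names of all pages reachable from start, including start itself. */
    static Set<String> reachablePages(Node start) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> todo = new ArrayDeque<>();
        Set<String> pages = new LinkedHashSet<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (!seen.add(n)) {
                continue; // already visited, avoids endless cycles
            }
            if (n.isPage) {
                pages.add(n.name);
            }
            n.refs.forEach(todo::push);
        }
        return pages;
    }

    /** Builds pageCount pages with one annotation each and returns the first page. */
    static Node buildFirstPage(int pageCount, boolean sharedParent) {
        Node parent = new Node("parent", false);
        Node first = null;
        for (int i = 1; i <= pageCount; i++) {
            Node page = new Node("page" + i, true);
            Node annot = new Node("annot" + i, false);
            page.refs.add(annot);        // models /Annots
            annot.refs.add(page);        // models /P (assumed present)
            if (sharedParent) {
                annot.refs.add(parent);  // models /Parent
                parent.refs.add(annot);  // models /Kids
            }
            if (first == null) {
                first = page;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(reachablePages(buildFirstPage(3, true)).size());
        System.out.println(reachablePages(buildFirstPage(3, false)).size());
    }
}
```

With the shared parent, all three pages are reachable from the first one; without it (comparable to /Parent being null after saving with Adobe Reader), only the first page is.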


was (Author: tilman):
I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png



[jira] [Updated] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Attachment: screenshot-1.png

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that causes an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue.






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However, 
there's a different problem: lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that causes an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue.






jbig2 git

2024-07-03 Thread Tilman Hausherr
Sorry for the mess. I sent the wrong commit message, and tried different 
(partly unsuccessful) tactics to squash several commit messages into 
one. At least the tika message is gone now. I'll stop now because it 
might only get worse.


Tilman





[jira] [Assigned] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-5847:
---

Assignee: Tilman Hausherr

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Updated] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5847:

Fix Version/s: 3.0.3 PDFBox
   4.0.0

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Comment Edited] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-07-01 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861103#comment-17861103
 ] 

Tilman Hausherr edited comment on PDFBOX-5225 at 7/1/24 9:28 AM:
-

No I'm not / yes please. I just clarified what it is about.


was (Author: tilman):
No I'm not. I just clarified what it is about.

> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>    Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.
>  !screenshot-1.png! 
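
The filtering described above can be sketched without PDFBox. The Widget class and the buildPageWidgetMap method below are hypothetical stand-ins, not the actual buildPagesWidgetsMap code; they only show the idea of consulting the field list while the page-to-widgets map is built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WidgetFilterSketch {

    /** Hypothetical stand-in for a widget: the name of its field and the page it sits on. */
    static final class Widget {
        final String fieldName;
        final int pageIndex;

        Widget(String fieldName, int pageIndex) {
            this.fieldName = fieldName;
            this.pageIndex = pageIndex;
        }
    }

    /** Builds a page-to-widgets map that only contains widgets of the selected fields. */
    static Map<Integer, List<Widget>> buildPageWidgetMap(List<Widget> allWidgets,
                                                         Set<String> selectedFields) {
        Map<Integer, List<Widget>> map = new HashMap<>();
        for (Widget widget : allWidgets) {
            if (!selectedFields.contains(widget.fieldName)) {
                continue; // not in the field list, so it must not be flattened
            }
            map.computeIfAbsent(widget.pageIndex, k -> new ArrayList<>()).add(widget);
        }
        return map;
    }
}
```

With a field list containing only VN_NAME, widgets of all other fields stay out of the map and would therefore be left alone.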






[jira] [Created] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5847:
---

 Summary: Improve performance of FileSystemFontProvider.scanFonts()
 Key: PDFBOX-5847
 URL: https://issues.apache.org/jira/browse/PDFBOX-5847
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr


PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
parsers where each table reads as little information as possible.






[jira] [Closed] (PDFBOX-5383) JAVA program Crashes

2024-06-29 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5383.
---
Resolution: Not A Bug

Closing because this isn't "our" bug, it's in JDK8.

> JAVA program Crashes
> 
>
> Key: PDFBOX-5383
> URL: https://issues.apache.org/jira/browse/PDFBOX-5383
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.24, 2.0.25, 3.0.0 PDFBox
>Reporter: krishna prasad
>Priority: Major
>  Labels: crash, jdk8
> Attachments: crash.pdf
>
>
> I am trying to convert the PDF into images by using render. It hangs up the 
> program.






[jira] [Closed] (PDFBOX-5289) java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 13377272 (start offset: 13377272)

2024-06-29 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5289.
---
Resolution: Won't Fix

Won't fix in 2.0, but works in 3.0 as long as you don't try to access the 
docinfo.

> java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at 
> offset 13377272 (start offset: 13377272)
> -
>
> Key: PDFBOX-5289
> URL: https://issues.apache.org/jira/browse/PDFBOX-5289
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.24
>Reporter: Stephen
>Priority: Major
> Attachments: Diplomacy by Henry Kissinger (1).pdf
>
>
> {code:java}
> java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 13377272 (start offset: 13377272)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796)
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128)
> {code}
> Please find the problematic PDF attached.






[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-06-28 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5225:

Attachment: screenshot-1.png

> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>    Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.






[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-06-28 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5225:

Description: 
{code}
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
List<PDField> list = new ArrayList<>();
list.add(acroForm.getField("VN_NAME"));
acroForm.flatten(list, true); 
{code}
The code from buildPagesWidgetsMap that is run when there are widgets with 
missing page references does not consider the field list. So all widgets end up 
in the map instead of only those we care about.

 !screenshot-1.png! 

  was:
{code}
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
List<PDField> list = new ArrayList<>();
list.add(acroForm.getField("VN_NAME"));
acroForm.flatten(list, true); 
{code}
The code from buildPagesWidgetsMap that is run when there are widgets with 
missing page references does not consider the field list. So all widgets end up 
in the map instead of only those we care about.


> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.
>  !screenshot-1.png! 






[jira] [Resolved] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero

2024-06-27 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5842.
-
Resolution: Fixed

> IllegalArgumentException: Width (26) and height (0) must be non-zero
> 
>
> Key: PDFBOX-5842
> URL: https://issues.apache.org/jira/browse/PDFBOX-5842
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Reported by Patrycja Zaremba on the users mailing list:
> https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp
> {quote}When the page that I try to convert has any element that is a PNG with
> a height of only 1px (28x1, 54x1, etc.), it is scaled down to 0 and I get this:{quote}
> IllegalArgumentException: Width (26) and height (0) must be non-zero
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)
> 
> org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
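
One way to avoid a zero dimension when scaling is to clamp the result to at least one pixel. The helper below is a hypothetical sketch of that idea and not necessarily how the issue was fixed in PageDrawer; the scale factor 0.48 is made up for illustration.

```java
final class ScaledImageSizeSketch {

    /** Scales a pixel dimension and never returns less than 1. */
    static int scaleDimension(int pixels, double scale) {
        if (pixels <= 0) {
            throw new IllegalArgumentException("pixels must be positive: " + pixels);
        }
        // Rounding alone turns 1 * 0.48 into 0, so clamp to a minimum of one pixel.
        return Math.max(1, (int) Math.round(pixels * scale));
    }

    public static void main(String[] args) {
        // A 54x1 image scaled by 0.48; prints 26x1
        System.out.println(scaleDimension(54, 0.48) + "x" + scaleDimension(1, 0.48));
    }
}
```

The clamp distorts the aspect ratio of very thin images slightly, which seems preferable to failing with an exception.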






[jira] [Resolved] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5845.
-
Resolution: Fixed

fixed in 1918648 (3.0) and in 1918649 (trunk)

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5845:

Fix Version/s: 3.0.3 PDFBox
   4.0.0

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5845:

Affects Version/s: 3.0.2 PDFBox

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Created] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5845:
---

 Summary: potential memory leak in TrueTypeCollection.java
 Key: PDFBOX-5845
 URL: https://issues.apache.org/jira/browse/PDFBOX-5845
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr


This is part of PR#189 (which will be done in a future ticket) and is done 
separately to shorten / clarify the patch.






[jira] [Closed] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text

2024-06-25 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5844.
---
Resolution: Not A Bug

> The font "Symbol"  throw an exception when rendering text
> -
>
> Key: PDFBOX-5844
> URL: https://issues.apache.org/jira/browse/PDFBOX-5844
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.24
>Reporter: bai yuan
>Priority: Major
> Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, 
> image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf
>
>
> When the attached font is loaded with PDType0Font.load, an exception is thrown 
> when rendering text. Excel renders it normally; see “exportByExcel.pdf”.
>  !image-2024-06-24-16-05-20-296.png! 






[jira] [Closed] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression

2024-06-24 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5836.
---
Resolution: Invalid

> PDF A-1 falsely validated as invalid for ICC color profile regression
> -
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 3.0.2 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
>
> PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to 
> parse the ICC Profile" error on the attached LibreOffice-generated PDF/A-1 file. 
> veraPDF validates the file as valid. It worked with PDFBox 2, and I need it to be 
> fixed as part of my upgrade to PDFBox 3 
> (https://github.com/ZUGFeRD/mustangproject/issues/373).






[jira] [Reopened] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression

2024-06-24 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-5836:
-

> PDF A-1 falsely validated as invalid for ICC color profile regression
> -
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 3.0.2 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
>
> PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to 
> parse the ICC Profile" error on the attached LibreOffice-generated PDF/A-1 file. 
> veraPDF validates the file as valid. It worked with PDFBox 2, and I need it to be 
> fixed as part of my upgrade to PDFBox 3 
> (https://github.com/ZUGFeRD/mustangproject/issues/373).






[jira] [Updated] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text

2024-06-24 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5844:

Attachment: PDFBOX5844_Symbol_Mu.pdf

> The font "Symbol"  throw an exception when rendering text
> -
>
> Key: PDFBOX-5844
> URL: https://issues.apache.org/jira/browse/PDFBOX-5844
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.24
>Reporter: bai yuan
>Priority: Major
> Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, 
> image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf
>
>
> When the attached font is loaded with PDType0Font.load, an exception is thrown 
> when rendering text. Excel renders it normally; see “exportByExcel.pdf”.
>  !image-2024-06-24-16-05-20-296.png! 






[jira] [Commented] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text

2024-06-24 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859620#comment-17859620
 ] 

Tilman Hausherr commented on PDFBOX-5844:
-

Use "\uf06d" and it works, as shown in  [^PDFBOX5844_Symbol_Mu.pdf] .
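
A possible explanation for why that code point works, offered as an assumption rather than something stated in the issue: symbol-encoded TrueType fonts commonly expose their glyphs in the private-use range U+F000 to U+F0FF, at U+F000 plus the single-byte symbol code. The helper below is a hypothetical sketch of that mapping; the value 0x6D for the mu glyph is likewise an assumption and was not verified against the attached symbol.ttf.

```java
final class SymbolPrivateUseSketch {

    /** Maps a single-byte symbol code to the private-use code point U+F000 + code. */
    static char toPrivateUse(int symbolCode) {
        if (symbolCode < 0 || symbolCode > 0xFF) {
            throw new IllegalArgumentException("symbol code out of range: " + symbolCode);
        }
        return (char) (0xF000 + symbolCode);
    }

    public static void main(String[] args) {
        // 0x6D is assumed to be the code of the mu glyph; prints f06d
        System.out.println(Integer.toHexString(toPrivateUse(0x6D)));
    }
}
```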

> The font "Symbol"  throw an exception when rendering text
> -
>
> Key: PDFBOX-5844
> URL: https://issues.apache.org/jira/browse/PDFBOX-5844
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.24
>Reporter: bai yuan
>Priority: Major
> Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, 
> image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf
>
>
> When the attached font is loaded with PDType0Font.load, an exception is thrown 
> when rendering text. Excel renders it normally; see “exportByExcel.pdf”.
>  !image-2024-06-24-16-05-20-296.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Resolved] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5843.
-
Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0
 Assignee: Tilman Hausherr
   Resolution: Fixed

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff
>
>







[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856317#comment-17856317
 ] 

Tilman Hausherr commented on PDFBOX-5843:
-

Fixed in 1918445, 1918446, 1918447, 1918448, 1918449, 1918450 (svn2jira is 
down). Thanks for the report!

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff
>
>







[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283
 ] 

Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 3:38 PM:
--

Let's assume that the font is correct. I fixed the bug locally so that empty 
entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) 
continue;}} after {{for (byte[] bytes : fdIndex)}}.
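
The guard described above can be shown in isolation. The class and method names below are hypothetical and the list of byte arrays merely stands in for the fdIndex data; this is a sketch of the skip pattern, not the CFFParser code itself.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyIndexEntriesSketch {

    /** Returns the entries that would be processed once empty ones are skipped. */
    static List<byte[]> nonEmptyEntries(List<byte[]> index) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] bytes : index) {
            if (bytes.length == 0) {
                continue; // nothing to parse in an empty entry
            }
            result.add(bytes);
        }
        return result;
    }
}
```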

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

For code 9987 (which has this "over the top" path), I checked that the offset and 
length make sense; the bytes before and after are 0x0e, which is ENDCHAR.

type2 charstring:
[-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, 
-122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, 
-17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, 
-145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 
160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, 
-19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, 
-7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, 
HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO|]

The many negative values are a "zone of interest".


was (Author: tilman):
Let's assume that the font is correct. I fixed the bug locally so that empty 
entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) 
continue;}} after {{for (byte[] bytes : fdIndex)}}.

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR

type2 charstring:
[-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, 
-122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, 
-17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, 
-145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 
160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, 
-19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, 
-7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, 
HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 

[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5843:

Attachment: screenshot-3.png

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856289#comment-17856289
 ] 

Tilman Hausherr commented on PDFBOX-5843:
-

It turns out to be completely different. I ran a 2.0.1 source code build with 
the change and hit an ArrayIndexOutOfBoundsException in 
CFFCIDFont.getLocalSubrIndex(). That means we can't just skip empty entries. 
Now I get this:
 !screenshot-3.png! 
I'll investigate some more but it seems promising.
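
One possible reading of the exception above (an assumption, not something confirmed in this thread) is that other parts of the font refer to these entries by position, so dropping the empty ones shifts every later entry. The sketch below keeps one slot per entry instead; the class and method names are hypothetical and are not FontBox code.

```java
import java.util.ArrayList;
import java.util.List;

public class FdIndexAlignment
{
    /**
     * Keeps one slot per entry and stores null for empty ones, so a
     * position that was valid in the original index is still valid here.
     */
    static List<byte[]> keepSlots(byte[][] entries)
    {
        List<byte[]> result = new ArrayList<>();
        for (byte[] bytes : entries)
        {
            // Hypothetical placeholder: null marks "empty, nothing to parse".
            result.add(bytes.length == 0 ? null : bytes);
        }
        return result;
    }

    public static void main(String[] args)
    {
        byte[][] entries = { {1}, {}, {}, {2} };
        List<byte[]> slots = keepSlots(entries);
        // Position 3 still resolves; a compacted list would only have 2 slots.
        System.out.println(slots.size() + " slots, entry 3 starts with " + slots.get(3)[0]);
    }
}
```

Callers then have to handle the null placeholder, which is the price for keeping positions stable.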

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff
>
>







[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283
 ] 

Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 2:17 PM:
--

Let's assume that the font is correct. I changed CFFParser.java locally so that 
empty entries now get skipped, by adding {{if (bytes.length == 0) continue;}} 
after {{for (byte[] bytes : fdIndex)}}.

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR

type2 charstring:
[-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, 
-122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, 
-17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, 
-145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 
160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, 
-19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, 
-7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, 
HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO|]

The many negative values are a "zone of interest".


was (Author: tilman):
Let's assume that the font is correct. I changed CFFParser.java locally so that 
empty entries now get skipped, by adding {{if (bytes.length == 0) continue;}} 
after {{for (byte[] bytes : fdIndex)}}.

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR

type2 charstring:
[-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 
54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, 
-16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, 
-27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 
14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, 
RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, 
RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 
30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO

[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283
 ] 

Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 2:16 PM:
--

Let's assume that the font is correct. I changed CFFParser.java locally so that 
empty entries now get skipped, by adding {{if (bytes.length == 0) continue;}} 
after {{for (byte[] bytes : fdIndex)}}.

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR

type2 charstring:
[-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 
54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, 
-16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, 
-27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 
14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, 
RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, 
RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 
30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO|]

The many negative values are a "zone of interest".


was (Author: tilman):
Let's assume that the font is correct. I changed CFFParser.java locally so that 
empty entries now get skipped, by adding {{if (bytes.length == 0) continue;}} 
after {{for (byte[] bytes : fdIndex)}}.

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR

type2 charstring:
[-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 
54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, 
-16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, 
-27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 
14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, 
RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, 
RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 
30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO|]

The many negative values are a "zone of interest".

> There is an exception w

[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283
 ] 

Tilman Hausherr commented on PDFBOX-5843:
-

Let's assume that the font is correct. I changed CFFParser.java locally so that 
empty entries now get skipped, by adding {{if (bytes.length == 0) continue;}} 
after {{for (byte[] bytes : fdIndex)}}.
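
As a rough illustration of the local change described above, here is a standalone sketch of the skip-empty-entries loop. Only the loop header and the length check come from the comment; the class name, method name and the list being filled are hypothetical, and this is not the actual CFFParser code.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipEmptyEntries
{
    /** Collects only the non-empty entries, mirroring the local "continue" change. */
    static List<byte[]> nonEmptyEntries(byte[][] fdIndex)
    {
        List<byte[]> result = new ArrayList<>();
        for (byte[] bytes : fdIndex)
        {
            if (bytes.length == 0)
            {
                continue; // empty entry: nothing to parse
            }
            result.add(bytes);
        }
        return result;
    }

    public static void main(String[] args)
    {
        byte[][] fdIndex = { {1, 2}, {}, {3}, {} };
        System.out.println(nonEmptyEntries(fdIndex).size() + " non-empty entries");
    }
}
```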

My first thought was that something goes wrong with the offsets because there 
is more than one fdindex entry. But I haven't been able to prove this.

I also tried to install FontForge but it doesn't show anything.

readIndexData char strings:
code 9987 len 153 at offset 298684
code 12431 len 245 at offset 301280
code 14225 len 135 at offset 303318

for code 9987 (which has this "over the top" path) I checked that position and 
length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR
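
The position/length check described above could be automated along these lines. This is a minimal sketch that assumes the raw CFF bytes are available as a byte array; the class and method names are hypothetical.

```java
public class EndCharCheck
{
    /** Type 2 charstring operator "endchar". */
    static final int ENDCHAR = 0x0e;

    /**
     * Returns true if the byte directly before the charstring and the byte
     * directly after it are both ENDCHAR, i.e. the range [offset, offset + len)
     * is bounded by neighbouring charstrings that end with "endchar".
     */
    static boolean surroundedByEndChar(byte[] data, int offset, int len)
    {
        int before = offset - 1;
        int after = offset + len;
        if (before < 0 || after >= data.length)
        {
            return false; // range touches the edge of the data, nothing to compare
        }
        return (data[before] & 0xFF) == ENDCHAR && (data[after] & 0xFF) == ENDCHAR;
    }

    public static void main(String[] args)
    {
        byte[] data = { 0x0e, 0x0e, 1, 2, 3, 0x0e };
        System.out.println(surroundedByEndChar(data, 2, 3));
    }
}
```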

type2 charstring:
[-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 
54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, 
-16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, 
-27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 
14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, 
RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, 
RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 
30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|]

converted to type1 sequence:
[0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 
RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, 
RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 
RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, 
CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 
16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, 
HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, 
-49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, 
HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, 
HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, 
-46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, 
RRCURVETO|]

The many negative values are a "zone of interest".

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, xxx.cff
>
>







[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5843:

Attachment: xxx.cff

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, 
> screenshot-1.png, screenshot-2.png, xxx.cff
>
>







[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856218#comment-17856218
 ] 

Tilman Hausherr commented on PDFBOX-5843:
-

I'm wondering whether something is wrong on our side. The fdindex contains many 
empty entries. When I tried to skip these, I got this rendering:
 !screenshot-2.png! 

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, screenshot-1.png, screenshot-2.png
>
>







[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?

2024-06-19 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5843:

Attachment: screenshot-2.png

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 123.pdf, screenshot-1.png, screenshot-2.png
>
>







[jira] [Commented] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero

2024-06-18 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855852#comment-17855852
 ] 

Tilman Hausherr commented on PDFBOX-5842:
-

Thanks, yes, good observation.

> IllegalArgumentException: Width (26) and height (0) must be non-zero
> 
>
> Key: PDFBOX-5842
> URL: https://issues.apache.org/jira/browse/PDFBOX-5842
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> reported by Patrycja Zaremba in the users mailing list
> https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp
> {quote}When the page which I try to convert have any element which is png with
> only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote}
> IllegalArgumentException: Width (26) and height (0) must be non-zero
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)
> 
> org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)






[jira] [Commented] (PDFBOX-5841) First split result document misses metadata after split

2024-06-17 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855787#comment-17855787
 ] 

Tilman Hausherr commented on PDFBOX-5841:
-

I'm worried because this is unexpected for an average user wanting to do 
similar manipulations. So the point of this change is that we assign the 
indirect object? I'm wondering if the problem would happen with the other 
assignments, e.g. ViewerPreferences if they exist.

> First split result document misses metadata after split
> ---
>
> Key: PDFBOX-5841
> URL: https://issues.apache.org/jira/browse/PDFBOX-5841
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>    Reporter: Tilman Hausherr
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: splitresult1.pdf, splitresult2.pdf
>
>
> This happens with the test file of PDFBOX-5840 and can also be reproduced 
> with the command line utility: the first split result file doesn't have the 
> metadata.
> Alternatively it can be reproduced programmatically by adding this code below 
> {{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in 
> {code:java}
> assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> dstDoc.save(baos);
> PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
> assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
> reloadedDoc.close();
> {code}
> I believe this is another writing problem, because the metadata exists, but 
> gets lost during the first save, not during a second one (not part of the 
> test code). It is expected to be object 116. It doesn't happen with 2.0. 
> Attached: two saved files by splitting so that the entire file is the result.






[jira] [Created] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero

2024-06-17 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5842:
---

 Summary: IllegalArgumentException: Width (26) and height (0) must 
be non-zero
 Key: PDFBOX-5842
 URL: https://issues.apache.org/jira/browse/PDFBOX-5842
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0


reported by Patrycja Zaremba in the users mailing list
https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp

{quote}When the page which I try to convert have any element which is png with
only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote}

IllegalArgumentException: Width (26) and height (0) must be non-zero
org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)

org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
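
The failure mode in the report (a 1px-high image whose scaled height rounds down to 0) can be avoided in principle by clamping each scaled dimension to at least one pixel. The following is only a sketch of that idea, with a hypothetical helper name; it is not the code in PageDrawer and not necessarily how the issue gets fixed.

```java
public class ScaledImageSize
{
    /**
     * Scales a pixel dimension and never returns less than 1, so that
     * very thin images do not end up with a zero width or height.
     */
    static int scaledDimension(int original, double scale)
    {
        return Math.max(1, (int) Math.round(original * scale));
    }

    public static void main(String[] args)
    {
        // A 28x1 image scaled down: the height stays at 1 instead of becoming 0.
        System.out.println(scaledDimension(28, 0.3) + "x" + scaledDimension(1, 0.3));
    }
}
```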








[jira] [Commented] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression

2024-06-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855242#comment-17855242
 ] 

Tilman Hausherr commented on PDFBOX-5836:
-

I used the command line application. My jdk11 version:
java version "11.0.21" 2023-10-17 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.21+9-LTS-193)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.21+9-LTS-193, mixed mode)

> PDF A-1 falsely validated as invalid for ICC color profile regression
> -
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 3.0.2 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
>
> PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to 
> parse the ICC Profile" error on the attached, LibreOffice-generated PDF/A-1. 
> veraPDF validates the file as valid. It worked with PDFBox 2 and I need it to be 
> fixed in the context of my upgrade to PDFBox 3 
> (https://github.com/ZUGFeRD/mustangproject/issues/373).






[jira] [Commented] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog

2024-06-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855217#comment-17855217
 ] 

Tilman Hausherr commented on PDFBOX-5834:
-

I have attached two files. This is really weird stuff, which relies on JS 
usage. I wonder why this would have to be split at all.

> [PATCH] PDF split missing names from documentCatalog
> 
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used 
> to store pdf templates






[jira] [Updated] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog

2024-06-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5834:

Attachment: 726725.pdf

> [PATCH] PDF split missing names from documentCatalog
> 
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used 
> to store pdf templates






[jira] [Updated] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog

2024-06-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5834:

Attachment: 801500.pdf

> [PATCH] PDF split missing names from documentCatalog
> 
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used 
> to store pdf templates






[jira] [Updated] (PDFBOX-5841) First split result document misses metadata after split

2024-06-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5841:

Summary: First split result document misses metadata after split  (was: 
Split result document misses metadata after split)

> First split result document misses metadata after split
> ---
>
> Key: PDFBOX-5841
> URL: https://issues.apache.org/jira/browse/PDFBOX-5841
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>    Reporter: Tilman Hausherr
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: splitresult1.pdf, splitresult2.pdf
>
>
> This happens with the test file of PDFBOX-5840 and can also be reproduced 
> with the command line utility: the first split result file doesn't have the 
> metadata.
> Alternatively it can be reproduced programmatically by adding this code below 
> {{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in 
> {code:java}
> assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> dstDoc.save(baos);
> PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
> assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
> reloadedDoc.close();
> {code}
> I believe this is another writing problem, because the metadata exists, but 
> gets lost during the first save, not during a second one (not part of the 
> test code). It is expected to be object 116. It doesn't happen with 2.0. 
> Attached: two saved files by splitting so that the entire file is the result.






[jira] [Created] (PDFBOX-5841) Split result document misses metadata after split

2024-06-15 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5841:
---

 Summary: Split result document misses metadata after split
 Key: PDFBOX-5841
 URL: https://issues.apache.org/jira/browse/PDFBOX-5841
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Affects Versions: 3.0.3 PDFBox, 4.0.0
Reporter: Tilman Hausherr
 Fix For: 3.0.3 PDFBox, 4.0.0
 Attachments: splitresult1.pdf, splitresult2.pdf

This happens with the test file of PDFBOX-5840 and can also be reproduced with 
the command line utility: the first split result file doesn't have the metadata.

Alternatively it can be reproduced programmatically by adding this code below 
{{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in 
{code:java}
assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
dstDoc.save(baos);
PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
reloadedDoc.close();
{code}
I believe this is another writing problem, because the metadata exists, but 
gets lost during the first save, not during a second one (not part of the test 
code). It is expected to be object 116. It doesn't happen with 2.0. Attached: 
two saved files by splitting so that the entire file is the result.






[jira] [Resolved] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5835.
-
  Assignee: Tilman Hausherr
Resolution: Fixed

[~O.Schmidtmer] thanks for reporting

[~msahyoun] thanks for the help

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/n

[jira] [Updated] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5835:

Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.

[jira] [Updated] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5835:

Affects Version/s: 2.0.31

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\

[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855075#comment-17855075
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I have created a reduced file from yours and have verified that the two 
modifications are hit and have an effect. I will add a test with that file.
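
For readers following the stack trace, here is a minimal sketch of how this kind of exception can arise. It assumes (the thread does not confirm this) that the null prefix comes from a DOM element in a default namespace, such as the xmlns="http://www.aiim.org/pdfa/ns/schema#" declarations in the reported metadata, for which getPrefix() returns null in a namespace-aware parse. The class and method names are hypothetical and only JDK classes are used; this is not the xmpbox code and not the committed fix.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of PDFBox or xmpbox.
public class QNamePrefixSketch
{
    // An element in a default namespace, similar in shape to the reported XMP.
    public static final String XML =
            "<schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</schema>";

    // Parses the XML with namespace awareness and returns the root element.
    public static Element parseRoot(String xml) throws Exception
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Passes the DOM prefix through unchanged; a null prefix makes QName throw.
    public static QName naiveQName(Element e)
    {
        return new QName(e.getNamespaceURI(), e.getLocalName(), e.getPrefix());
    }

    // Substitutes the empty default prefix for null before building the QName.
    public static QName guardedQName(Element e)
    {
        String prefix = e.getPrefix();
        if (prefix == null)
        {
            prefix = XMLConstants.DEFAULT_NS_PREFIX;
        }
        return new QName(e.getNamespaceURI(), e.getLocalName(), prefix);
    }

    // Returns true if the naive conversion fails with IllegalArgumentException.
    public static boolean naiveThrows(String xml) throws Exception
    {
        try
        {
            naiveQName(parseRoot(xml));
            return false;
        }
        catch (IllegalArgumentException ex)
        {
            return true;
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("naive conversion throws: " + naiveThrows(XML));
        System.out.println("guarded local part: "
                + guardedQName(parseRoot(XML)).getLocalPart());
    }
}
```

Whether a null prefix should be mapped to the empty prefix or handled some other way is a design decision for the parser; the sketch only shows that the JDK's QName constructor rejects null.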

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
>

[jira] [Resolved] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5840.
-
Resolution: Fixed

> When splitting, keep named page destinations that are part of target 
> document(s)
> 
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
>
> Keep named destinations. The current code just ignores them. I wrote some 40 
> lines that would create a name tree in the destination document, but this 
> didn't work because the destination name gets modified when retrieved as a 
> string. So I just keep the actual destination and forget the name, which is a 
> single code line. It's a new document anyway and the average user expectation 
> is that the links "just work".






[jira] [Comment Edited] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog

2024-06-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854917#comment-17854917
 ] 

Tilman Hausherr edited comment on PDFBOX-5834 at 6/14/24 1:51 PM:
--

I'd like to see an example of such a PDF. And I'm also wondering whether the 
current solution misses named destinations, which would be a more common 
problem. (update: done in PDFBOX-5840)


was (Author: tilman):
I'd like to see an example of such a PDF. And I'm also wondering whether the 
current solution misses named destinations, which would be a more common 
problem.

> [PATCH] PDF split missing names from documentCatalog
> 
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: tmp.patch
>
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used 
> to store pdf templates






[jira] [Commented] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)

2024-06-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855042#comment-17855042
 ] 

Tilman Hausherr commented on PDFBOX-5840:
-

Copyright: the document is published by the USDA, see 
https://web.archive.org/web/20050411153046/http://www.nal.usda.gov/awic/pubs/Fishwelfare/
which links to
https://web.archive.org/web/20050411172414/http://www.nal.usda.gov/awic/pubs/Fishwelfare/culture.htm
which links to
https://web.archive.org/web/20050411212421/http://www.nal.usda.gov/awic/pubs/Fishwelfare/aquar.htm
which links to our PDF.

> When splitting, keep named page destinations that are part of target 
> document(s)
> 
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
>
> Keep named destinations. The current code just ignores them. I wrote some 40 
> lines that would create a name tree in the destination document, but this 
> didn't work because the destination name gets modified when retrieved as a 
> string. So I just keep the actual destination and forget the name, which is a 
> single code line. It's a new document anyway and the average user expectation 
> is that the links "just work".






[jira] [Created] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)

2024-06-14 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5840:
---

 Summary: When splitting, keep named page destinations that are 
part of target document(s)
 Key: PDFBOX-5840
 URL: https://issues.apache.org/jira/browse/PDFBOX-5840
 Project: PDFBox
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
 Attachments: 410609.pdf, named-dest-handling abandoned code.txt

Keep named destinations. The current code just ignores them. I wrote some 40 
lines that would create a name tree in the destination document, but this 
didn't work because the destination name gets modified when retrieved as a 
string. So I just keep the actual destination and forget the name, which is a 
single code line. It's a new document anyway and the average user expectation 
is that the links "just work".
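
The idea of keeping the actual destination and dropping the name can be sketched roughly as follows. This is a simplified model with hypothetical names and plain JDK collections, not the PDFBox splitter code: a name is looked up in a map from destination names to source page indexes, and the link is kept only if that page is part of the target document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of PDFBox.
public class NamedDestSketch
{
    // Resolves a named destination to a page index inside the split result.
    // nameToSourcePage: destination name -> page index in the source document
    // pagesInTarget: source page indexes copied into the target, in order
    // Returns the index within the target document, or -1 if the name is
    // unknown or its page is not part of the target document.
    public static int resolve(String name,
                              Map<String, Integer> nameToSourcePage,
                              List<Integer> pagesInTarget)
    {
        Integer sourcePage = nameToSourcePage.get(name);
        if (sourcePage == null)
        {
            return -1;
        }
        return pagesInTarget.indexOf(sourcePage);
    }

    public static void main(String[] args)
    {
        Map<String, Integer> names = new HashMap<>();
        names.put("chapter2", 5);
        names.put("appendix", 9);
        List<Integer> target = Arrays.asList(4, 5, 6);

        // The link now carries an explicit page index; the name itself is dropped.
        System.out.println(resolve("chapter2", names, target));
        System.out.println(resolve("appendix", names, target));
    }
}
```

Since the name is discarded, nothing in the target document needs a name tree, which matches the "single code line" trade-off described above.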






[jira] [Updated] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5840:

Attachment: named-dest-handling abandoned code.txt

> When splitting, keep named page destinations that are part of target 
> document(s)
> 
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
>
> Keep named destinations. The current code just ignores them. I wrote some 40 
> lines that would create a name tree in the destination document, but this 
> didn't work because the destination name gets modified when retrieved as a 
> string. So I just keep the actual destination and forget the name, which is a 
> single code line. It's a new document anyway and the average user expectation 
> is that the links "just work".






Re: PDFBox 2.0.32 release

2024-06-14 Thread Tilman Hausherr

Result:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do.
And I know the time it takes: 3 hours for the A (or B) test, 1 hour to 
create the A vs B report (tika-eval).


Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests, locally reverting the change 
from PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part 
looks good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas


Am 12.06.24 um 20:23 schrieb Tilman Hausherr:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz 



No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

Am 02.06.24 um 14:22 schrieb Tilman Hausherr:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so?

Any objections or is there something we should add/fix first?

Andreas





[jira] [Resolved] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5839.
-
Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0
 Assignee: Tilman Hausherr
   Resolution: Fixed

ok thanks. I have added both files to my local rendering test set.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
> ---
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>    Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png, 
> image-2024-06-14-16-35-39-381.png, image-2024-06-14-16-39-47-557.png
>
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
>  






[jira] [Commented] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary

2024-06-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854991#comment-17854991
 ] 

Tilman Hausherr commented on PDFBOX-5839:
-

Yeah, I fixed many in the first commit series, but then I didn't fix the rest 
because I had used a narrow search string to copy my change everywhere and then 
forgot to use a general search to find other occurrences. Thanks. I hope this 
time I got all of them.
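
As an illustration of the general pattern behind this kind of exception, here is a minimal sketch with hypothetical stand-in classes (Base, NullObject, Dictionary), not the PDFBox COS classes. The thread does not show the actual change; the sketch only assumes that an unchecked cast met a "null object" placeholder, and shows an instanceof guard as one common way to avoid that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; the nested types are stand-ins, not PDFBox classes.
public class SafeCastSketch
{
    static class Base
    {
    }

    // Placeholder object that represents an explicit "null" entry.
    static final class NullObject extends Base
    {
        static final NullObject NULL = new NullObject();
    }

    static final class Dictionary extends Base
    {
    }

    // Unchecked cast: fails with ClassCastException when the entry is a NullObject.
    static Dictionary unsafeGet(Map<String, Base> items, String key)
    {
        return (Dictionary) items.get(key);
    }

    // Guarded access: returns null for anything that is not a Dictionary.
    static Dictionary safeGet(Map<String, Base> items, String key)
    {
        Base value = items.get(key);
        return value instanceof Dictionary ? (Dictionary) value : null;
    }

    // Returns true if the unchecked variant throws for a NullObject entry.
    public static boolean unsafeThrowsOnNullObject()
    {
        Map<String, Base> items = new HashMap<>();
        items.put("Resources", NullObject.NULL);
        try
        {
            unsafeGet(items, "Resources");
            return false;
        }
        catch (ClassCastException ex)
        {
            return true;
        }
    }

    // Returns true if the guarded variant yields null for a NullObject entry.
    public static boolean safeReturnsNullOnNullObject()
    {
        Map<String, Base> items = new HashMap<>();
        items.put("Resources", NullObject.NULL);
        return safeGet(items, "Resources") == null;
    }

    public static void main(String[] args)
    {
        System.out.println("unchecked cast throws: " + unsafeThrowsOnNullObject());
        System.out.println("guarded access returns null: " + safeReturnsNullOnNullObject());
    }
}
```

A guard like this has to be applied at every cast site, which is why a search that is too narrow can leave some occurrences unfixed.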

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
> ---
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png, 
> image-2024-06-14-16-35-39-381.png, image-2024-06-14-16-39-47-557.png
>
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
>  






[jira] [Commented] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary

2024-06-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854974#comment-17854974
 ] 

Tilman Hausherr commented on PDFBOX-5839:
-

I didn't get an exception abort with file 2, but the change fixes the potential 
exception from the image.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
> ---
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
>  






[jira] [Updated] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5839:

Description: 
[^1.pdf][^2.pdf]

^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^

When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.

 

  was:
[^1.pdf][^2.pdf]

^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^

When converting 1.pdf and 2.pdf, there will be a ClassCastException problem.

 


> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
> ---
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
>  






[jira] [Updated] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary

2024-06-14 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5839:

Component/s: Rendering

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
> ---
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When converting 1.pdf and 2.pdf, there will be a ClassCastException problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: PDFBox 2.0.32 release

2024-06-14 Thread Tilman Hausherr
I'll repeat the regression tests, locally reverting the change from 
PDFBOX-5790 but locally adding my proposed xmpbox change from 
PDFBOX-5835. This way we'll know whether there are other problems.


Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part looks 
good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more.

Andreas


Am 12.06.24 um 20:23 schrieb Tilman Hausherr:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






[jira] [Commented] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854917#comment-17854917
 ] 

Tilman Hausherr commented on PDFBOX-5834:
-

I'd like to see an example of such a PDF. And I'm also wondering whether the 
current solution misses named destinations, which would be a more common 
problem.

> [PATCH] PDF split missing names from documentCatalog
> 
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: tmp.patch
>
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used 
> to store pdf templates






Re: PDFBox 2.0.32 release

2024-06-13 Thread Tilman Hausherr

See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:

Thanks for running the tests.

the exceptions part looks good, but I'm afraid we have a text 
extraction issue.


commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

some of the special characters changed. In 2.0.31 they were "omitted" 
and in 2.0.32 there is some special char. But the remaining part looks 
good to me.



cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but 2.0.31 is 
able to extract them. 2.0.32 seems to mix some of the content.


I guess it is somehow font related. Need to investigate more

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't 
investigated yet.


Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I 
can get.


Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854765#comment-17854765
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

My problem is that I'd like to have a non-copyrighted file for tests and to put 
into our repository, although I don't know if your file can be copyrighted at 
all, assuming that it was created by a machine.

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"htt

[jira] [Comment Edited] (PDFBOX-3117) Left margin cut off when printing

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854743#comment-17854743
 ] 

Tilman Hausherr edited comment on PDFBOX-3117 at 6/13/24 12:58 PM:
---

Back to this because of this question:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative 
value. Landscape labels can be printed without rotation if PORTRAIT is used as 
a parameter.


was (Author: tilman):
Back to this:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative 
value. Landscape labels can be printed without rotation if PORTRAIT is used as 
a parameter.

> Left margin cut off when printing
> -
>
> Key: PDFBOX-3117
> URL: https://issues.apache.org/jira/browse/PDFBOX-3117
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 2.0.0
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: print, printing
> Attachments: PDFBOX-3117-1468001565.pdf, PDFBOX-3117.pdf
>
>
> This is about the margin problem when printing that was mentioned on the user 
> mailing list. What I know at this time:
> - media box is (0 0 233.29 3600)
> - used fonts: Times-Roman and ArialUnicodeMS not embedded
> Effect happens with a real printer, but not when printing to PDF or to XPS.
> First todo is to create such a file in the hope of getting the effect because 
> the file can't be shared.






[jira] [Commented] (PDFBOX-3117) Left margin cut off when printing

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854743#comment-17854743
 ] 

Tilman Hausherr commented on PDFBOX-3117:
-

Back to this:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative 
value. Landscape labels can be printed without rotation if PORTRAIT is used as 
a parameter.
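
A minimal sketch of such a check, assuming the centering translation is half of the leftover space on each axis. The class and method names are hypothetical, the imageable width of 200 is an illustrative number, and the actual PDFPrintable change may differ (for example by testing each axis separately). The page size is the media box quoted in the issue below.

```java
class CenteringCheck {
    /**
     * Returns the {x, y} translation to apply before drawing the page.
     * If centering would need a negative translation on either axis,
     * the page does not fit and {0, 0} is returned instead.
     */
    static double[] centeringOffset(double imageableWidth, double imageableHeight,
                                    double pageWidth, double pageHeight, double scale) {
        double dx = (imageableWidth - pageWidth * scale) / 2;
        double dy = (imageableHeight - pageHeight * scale) / 2;
        if (dx < 0 || dy < 0) {
            // Assumption: a negative value on either axis disables centering entirely.
            return new double[] {0, 0};
        }
        return new double[] {dx, dy};
    }

    public static void main(String[] args) {
        // Page of 233.29 x 3600 points on a narrower (hypothetical) imageable area.
        double[] offset = centeringOffset(200, 3600, 233.29, 3600, 1.0);
        System.out.println(offset[0] + ", " + offset[1]);
    }
}
```

With a negative translation the left part of the page would be moved outside the imageable area, which matches the "left margin cut off" symptom in the issue title.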

> Left margin cut off when printing
> -
>
> Key: PDFBOX-3117
> URL: https://issues.apache.org/jira/browse/PDFBOX-3117
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 2.0.0
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: print, printing
> Attachments: PDFBOX-3117-1468001565.pdf, PDFBOX-3117.pdf
>
>
> This is about the margin problem when printing that was mentioned on the user 
> mailing list. What I know at this time:
> - media box is (0 0 233.29 3600)
> - used fonts: Times-Roman and ArialUnicodeMS not embedded
> Effect happens with a real printer, but not when printing to PDF or to XPS.
> First todo is to create such a file in the hope of getting the effect because 
> the file can't be shared.






[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854703#comment-17854703
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I'm waiting for PDFBOX-5838 to be fixed to commit my proposed change. After 
PDFBOX-5838 is fixed and there are no further problems, I'd like to test the 
change to see if it gets better or worse. Ideally there should be fewer 
exceptions.
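
For context, the exception in the issue title can be reproduced with the JDK alone: the three-argument javax.xml.namespace.QName constructor rejects a null prefix. The sketch below only demonstrates that behaviour. The helper names are hypothetical, and substituting the empty default prefix is shown purely for illustration; it is not the change proposed for PDFBOX-5835.

```java
import javax.xml.namespace.QName;

class QNamePrefixDemo {
    /** Returns true if constructing a QName with a null prefix is rejected. */
    static boolean throwsOnNullPrefix(String namespaceUri, String localName) {
        try {
            new QName(namespaceUri, localName, null);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Illustrative workaround: use the empty default prefix when none is given. */
    static QName withDefaultPrefix(String namespaceUri, String localName, String prefix) {
        return new QName(namespaceUri, localName, prefix == null ? "" : prefix);
    }

    public static void main(String[] args) {
        String ns = "http://www.aiim.org/pdfa/ns/schema#";
        System.out.println(throwsOnNullPrefix(ns, "schema"));
        System.out.println(withDefaultPrefix(ns, "schema", null));
    }
}
```

Elements that declare their namespace with a plain xmlns attribute, as in the metadata quoted below, carry no prefix, which is presumably how a null prefix reaches the constructor.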

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
>

[jira] [Comment Edited] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854692#comment-17854692
 ] 

Tilman Hausherr edited comment on PDFBOX-5838 at 6/13/24 9:56 AM:
--

Another file, just to see if it is the same bug:

OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf: font F3, the A glyph is decoded as $.



was (Author: tilman):
More files, just to see if it is the same bug:

OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf: font F3, the A glyph is decoded as $.


> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, 
> PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854724#comment-17854724
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

[~O.Schmidtmer] the xmp isn't from "test-landscape2.pdf", at least not from 
ours. It seems to be some ZUGFeRD template / demo file. Or was it created by 
"enriching" the original PDF file?

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
> "  xmlns=\"http://www.aiim

[jira] [Commented] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854711#comment-17854711
 ] 

Tilman Hausherr commented on PDFBOX-5838:
-

I think it's because of PDFBOX-5790. This might be a tricky decision: Adobe 
fails to extract the text of the file here, I get "HRQRUV ReVeaUch PURMecW". 
The other file also fails.
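
A quick arithmetic check on the two symptoms reported for this issue, offered as an observation rather than a diagnosis: 'A' (U+0041) and '$' (U+0024) are 29 code points apart, and the Adobe output quoted above is what one gets by lowering some letters of an ordinary English phrase by the same 29. The source phrase used below ("Honors Research Project") and the set of affected letters are guesses made to fit the garbled string; neither comes from the PDF, and the class is purely illustrative.

```java
class GlyphShiftDemo {
    /**
     * Lowers every character of text that occurs in affected by offset
     * code points and leaves all other characters unchanged.
     */
    static String shift(String text, String affected, int offset) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            out.append(affected.indexOf(c) >= 0 ? (char) (c - offset) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A' is 65 and '$' is 36, a difference of 29.
        System.out.println(shift("A", "A", 29));
        // Guessed source phrase; only the letters o, n, r, s, j, t are shifted.
        System.out.println(shift("Honors Research Project", "onrsjt", 29));
    }
}
```

If the guess is right, both files show the same constant offset, which would support the hope that they are the same bug.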

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, 
> PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5837:

Description: 
Add an optional {{center}} parameter to the telescopic {{PDFPageable}} 
constructor and pass it to {{PDFPrintable}}, and add the parameter to the 
command line class. This may also help with the printing of landscape labels, 
see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

  was:
Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor and 
pass it to {{PDFPrintable}}, and add the parameter to the comment line class. 
This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3


> Add center constructor parameter to PDFPageable and to pdfbox-app
> -
>
> Key: PDFBOX-5837
> URL: https://issues.apache.org/jira/browse/PDFBOX-5837
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: print, printing
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Add an optional {{center}} parameter to the telescopic {{PDFPageable}} 
> constructor and pass it to {{PDFPrintable}}, and add the parameter to the 
> command line class. This may also help with the printing of landscape labels, 
> see also
> https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3
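
As an illustration of what "telescopic" means here, the sketch below shows a chain of constructors in which each shorter one delegates to the next longer one with a default value, and a new trailing center parameter is added at the end of the chain. PageableSketch is a hypothetical stand-in, not the PDFBox class; the orientation is simplified to a string, and the default values are placeholders rather than the ones PDFBox uses.

```java
class PageableSketch {
    final String orientation;
    final boolean showPageBorder;
    final float dpi;
    final boolean center;

    PageableSketch() {
        this("AUTO");
    }

    PageableSketch(String orientation) {
        this(orientation, false);
    }

    PageableSketch(String orientation, boolean showPageBorder) {
        this(orientation, showPageBorder, 0f);
    }

    PageableSketch(String orientation, boolean showPageBorder, float dpi) {
        // Existing callers keep their behaviour: the new parameter gets a default.
        this(orientation, showPageBorder, dpi, true);
    }

    PageableSketch(String orientation, boolean showPageBorder, float dpi, boolean center) {
        this.orientation = orientation;
        this.showPageBorder = showPageBorder;
        this.dpi = dpi;
        this.center = center;
    }

    public static void main(String[] args) {
        PageableSketch labels = new PageableSketch("PORTRAIT", false, 0f, false);
        System.out.println(labels.orientation + " center=" + labels.center);
    }
}
```

Adding the parameter at the end of the chain keeps the shorter constructors source-compatible, which is the usual reason for choosing this pattern.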






[jira] [Resolved] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5837.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Add center constructor parameter to PDFPageable and to pdfbox-app
> -
>
> Key: PDFBOX-5837
> URL: https://issues.apache.org/jira/browse/PDFBOX-5837
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: print, printing
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor 
> and pass it to {{PDFPrintable}}, and add the parameter to the comment line 
> class. This may also help with the printing of landscape labels, see also
> https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3






[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5838:

Attachment: (was: PDFBOX-5838-0024320.pdf)

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5838:

Attachment: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, 
> PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5838:

Attachment: PDFBOX-5838-0024320-reduced.pdf

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5838:

Attachment: PDFBOX-5838-0024320.pdf

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>    Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-5838-0024320.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Created] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-06-13 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5838:
---

 Summary: Text extraction garbled in this file, was OK in 3.0.2 / 
2.0.31
 Key: PDFBOX-5838
 URL: https://issues.apache.org/jira/browse/PDFBOX-5838
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.32, 3.0.3 PDFBox
Reporter: Tilman Hausherr
 Attachments: PDFBOX-5838-0024320.pdf

discovered in 2.0.32 regression tests






[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5837:

Description: 
Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor and 
pass it to {{PDFPrintable}}, and add the parameter to the comment line class. 
This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

  was:
Add center constructor parameter to PDFPageable and pass it to PDFPrintable, 
and add the parameter to the comment line class. This may also help with the 
printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3


> Add center constructor parameter to PDFPageable and to pdfbox-app
> -
>
> Key: PDFBOX-5837
> URL: https://issues.apache.org/jira/browse/PDFBOX-5837
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: print, printing
> Fix For: 2.0.32, 3.0.3 PDFBox
>
>
> Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor 
> and pass it to {{PDFPrintable}}, and add the parameter to the comment line 
> class. This may also help with the printing of landscape labels, see also
> https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3






[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app

2024-06-13 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5837:

Description: 
Add center constructor parameter to PDFPageable and pass it to PDFPrintable, 
and add the parameter to the comment line class. This may also help with the 
printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

  was:Add center constructor parameter to PDFPageable and pass it to 
PDFPrintable, and add the parameter to the comment line class. This may also 
help with the printing of landscape labels.


> Add center constructor parameter to PDFPageable and to pdfbox-app
> -
>
> Key: PDFBOX-5837
> URL: https://issues.apache.org/jira/browse/PDFBOX-5837
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>    Reporter: Tilman Hausherr
>    Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: print, printing
> Fix For: 2.0.32, 3.0.3 PDFBox
>
>
> Add center constructor parameter to PDFPageable and pass it to PDFPrintable, 
> and add the parameter to the comment line class. This may also help with the 
> printing of landscape labels, see also
> https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3






[jira] [Created] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app

2024-06-13 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5837:
---

 Summary: Add center constructor parameter to PDFPageable and to 
pdfbox-app
 Key: PDFBOX-5837
 URL: https://issues.apache.org/jira/browse/PDFBOX-5837
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox


Add center constructor parameter to PDFPageable and pass it to PDFPrintable, 
and add the parameter to the comment line class. This may also help with the 
printing of landscape labels.






Re: PDFBox 2.0.32 release

2024-06-12 Thread Tilman Hausherr

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

Am 02.06.24 um 14:22 schrieb Tilman Hausherr:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas






[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-12 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854441#comment-17854441
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I'm able to run the code by adding {{nsFinder.push()}} at two places: below 
{{Element first = elements.get(0);}} and at the beginning of the loop in 
{{parseChildrenAsProperties()}}, and also adding pop() at the ends. It doesn't 
fix PDFBOX-2913.
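
The push/pop idea can be sketched as a stack of namespace scopes: one scope is opened when the parser enters an element and closed when it leaves, so that an `xmlns` declaration is visible only inside the element that carries it. This is an illustration of the technique only, not the XmpBox code; the class `NamespaceScopes` and its methods are hypothetical, and the empty string is assumed to stand for the default namespace.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class NamespaceScopes {

    // innermost scope is at the head of the deque
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    /** Opens a scope on entering an element, recording its xmlns declarations. */
    void push(Map<String, String> declared) {
        scopes.push(new HashMap<>(declared));
    }

    /** Closes the scope on leaving the element. */
    void pop() {
        scopes.pop();
    }

    /** Looks up a prefix ("" = default namespace); the innermost declaration wins. */
    String resolve(String prefix) {
        for (Map<String, String> scope : scopes) { // iterates from the most recent push
            String uri = scope.get(prefix);
            if (uri != null) {
                return uri;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NamespaceScopes ns = new NamespaceScopes();
        ns.push(Collections.singletonMap("pdfaExtension", "http://www.aiim.org/pdfa/ns/extension/"));
        ns.push(Collections.singletonMap("", "http://www.aiim.org/pdfa/ns/schema#"));
        System.out.println(ns.resolve(""));              // default namespace of the inner element
        System.out.println(ns.resolve("pdfaExtension")); // still visible from the outer element
        ns.pop();
        System.out.println(ns.resolve(""));              // null: the inner declaration is out of scope
    }
}
```

Forgetting a `pop()` would leak a child's declarations into its siblings, which is why the comment pairs each `push()` with a `pop()` at the end.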

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when 
> creating a QName
> --
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Oliver Schmidtmer
>Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an 
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a 
> QName
>   at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
>   at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
>   at 
> org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
>   at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> "  xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> "   3\n" +
> "   A\n" +
> "  \n" +
> "   xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; 
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; 
> xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; 
> xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; 
> xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" +
> "  \n" +
> "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" +
> "\n" +
> " \n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension 
> Schema\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n"
>  +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" +
> "   xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" +
> "   \n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML 
> invoice file\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" +
> "\n" +
> "\n" +
> "  xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" +
>

Re: PDFBox 2.0.32 release

2024-06-12 Thread Tilman Hausherr
I've started the tests. If there aren't any troubles I'll have the 
results tomorrow.


Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:

Thanks for the update.

I'm going to postpone the release as I'll need any helping hand I can 
get.


Andreas

Am 02.06.24 um 14:22 schrieb Tilman Hausherr:

+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:

Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas


[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-12 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854271#comment-17854271
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I read https://www.w3schools.com/xml/xml_namespaces.asp (because I don't know 
much about xml) and then I changed the segment to
{code:xml}
 http://www.aiim.org/pdfa/ns/extension/;>
   

 
  http://www.aiim.org/pdfa/ns/schema#;>ZUGFeRD PDFA 
Extension Schema
{code}
and now the exception comes much later: Schema is not set in this document : 
http://www.aiim.org/pdfa/ns/schema#

Thus the question is, why does defining the namespace after "

[jira] [Commented] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression

2024-06-11 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854162#comment-17854162
 ] 

Tilman Hausherr commented on PDFBOX-5836:
-

It works for me with the app, on jdk8 and jdk22 on Windows. What jdk are you 
using on what OS?

> PDF A-1 falsely validated as invalid for ICC color profile regression
> -
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 3.0.2 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
>
> PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to 
> parse the ICC Profile" error on the attached, LibreOffice-generated PDF/A-1. 
> VeraPDF validates the file as valid. It worked with PDFBox 2 and I need it to be 
> fixed in the context of my upgrade to PDFBox 3 
> (https://github.com/ZUGFeRD/mustangproject/issues/373).






[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-11 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854082#comment-17854082
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

Yes maybe there is a difference. If I deactivate the exception throw then the 
code for this issue succeeds, but not the code for PDFBOX-2913.


[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-11 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853936#comment-17853936
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I fixed that bug anyway; I'll try to work on the actual issue at a later time. 
(I tried last year and didn't succeed)


[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName

2024-06-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853759#comment-17853759
 ] 

Tilman Hausherr commented on PDFBOX-5835:
-

I can avoid the IllegalArgumentException, but then you'll get 
XmpParsingException: Schema is not set in this document : 
http://www.aiim.org/pdfa/ns/extension/  which is a 9-year-old unfixed bug 
(PDFBOX-2913). Would this be helpful?

