Re: PDFBox 2.0.32 release
Result: https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz
to be compared against https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

I couldn't find a difference visually except the file sizes. This might be because of the path names or some metadata.

Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:
Hi, I've just started a new "B" test.
Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:
Hi, after closing https://issues.apache.org/jira/browse/PDFBOX-5838 I'd like to finally cut the 2.0.32 release. Do we need a new regression test due to the latest changes? There are some related changes such as https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent refactoring in fontbox.
Andreas

On 14.06.24 at 13:03, Tilman Hausherr wrote:
Result: https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz
From what I see, nothing to do. And I know the time it takes: 3 hours for the A (or B) test, 1 hour to create the A vs B report (tika-eval).
Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests, locally reverting the change from PDFBOX-5790 but locally adding my proposed xmpbox change from PDFBOX-5835. This way we'll know whether there are other problems.
Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:
See https://issues.apache.org/jira/browse/PDFBOX-5838
I hope that it's all the same problem.
Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests. The exceptions part looks good, but I'm afraid we have a text extraction issue.
commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI: some of the special characters changed. In 2.0.31 they were "omitted" and in 2.0.32 there is some special char. But the remaining part looks good to me.
cc-main-2021-31-pdf-untruncated/0085/0085885.pdf: it seems to contain some special characters as well, but 2.0.31 is able to extract them. 2.0.32 seems to mix some of the content.
I guess it is somehow font related. Need to investigate more.
Andreas

On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz
No new exceptions but many content differences. I haven't investigated yet.
Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll have the results tomorrow.
Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.
Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time
Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi, IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so. Any objections or is there something we should add/fix first?
Andreas

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
Re: PDFBox 2.0.32 release
Hi, I've just started a new "B" test. Tilman
[jira] [Closed] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-5838.
Resolution: Won't Do

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, PDFBOX-5838-0024320-reduced.pdf
>
> discovered in 2.0.32 regression tests

--
This message was sent by Atlassian Jira (v8.20.10#820010)
jenkins build timeout strategy changed
I've changed the build timeout strategy on Jenkins because we got too many timed-out builds. I've set an inactivity timeout of 10 minutes. This is because of the problems getting the NVD database without an NVD API key when the plugin gets updated. I suspect that Apache is penalized because we're not the only ones who make these calls. Tilman
[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863059#comment-17863059 ]

Tilman Hausherr commented on PDFBOX-5848:

I forgot to mention: our snapshots are not available on Maven Central.

> Infinite loop after splitting and saving PDF / giant result files
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Joan Fisbein
> Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, screenshot-1.png
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works
> flawlessly, but I just received a PDF that generates an infinite loop when I
> try to split it.
>
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>     try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>         int documentPages = document.getNumberOfPages();
>         Splitter splitter = new Splitter();
>         List<PDDocument> Pages = splitter.split(document);
>         Iterator<PDDocument> iterator = Pages.listIterator();
>         while (iterator.hasNext()) {
>             PDDocument pd = iterator.next();
>             pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>             pd.close();
>         }
>     } catch (IOException e) {
>         throw new RuntimeException(e);
>     }
> }
> {code}
> The PDF file is attached to the issue

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862998#comment-17862998 ]

Tilman Hausherr commented on PDFBOX-5848:

Just added [^706213.pdf] if we ever want to add a test or improve this. Official US document thus no copyright.
[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Attachment: 706213.pdf
[jira] [Comment Edited] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984 ]

Tilman Hausherr edited comment on PDFBOX-5848 at 7/4/24 9:42 AM:

If you don't need the annotations (especially link annotations) then it's a solution. Alternatively copy the current source code of the splitter class from the repository and use that one instead of the class from the jar.

was (Author: tilman):
If you don't need the annotations (especially link annotations) then it's a solution. Alternatively copy the current source code of the splitter class from the repository.
[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984 ]

Tilman Hausherr commented on PDFBOX-5848:

If you don't need the annotations (especially link annotations) then it's a solution. Alternatively copy the current source code of the splitter class from the repository.
[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Affects Version/s: 2.0.31
[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Component/s: Utilities
[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Fix Version/s: 2.0.32, 3.0.3 PDFBox, 4.0.0
[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862971#comment-17862971 ]

Tilman Hausherr commented on PDFBOX-5848:

[~jfisbein-clarity] Please try with the new snapshot https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/ it's likely that this fixes your problem as well, because there is less to save now.
[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Summary: Infinite loop after splitting and saving PDF / giant result files (was: Infinite loop processing PDF)
[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862916#comment-17862916 ]

Tilman Hausherr commented on PDFBOX-5848:

It finished with 3.0.2 (while I slept) and the snapshot too (with a dirty fix for the /Parent problem). I also tried with "-startPage 1 -endPage 442" because I'm not sure about the default settings of the splitter class and I never tried her code. I'll do a less dirty fix for the /Parent problem in the next few days.

[~jfisbein-clarity] try setting a higher stack size with "-Xss". The snapshot version is at https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
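The "-Xss" flag suggested in this comment raises the thread stack size for the JVM, which matters when code recurses deeply. As a rough illustration only (not taken from PDFBox), a larger stack can also be requested for a single worker thread through the four-argument Thread constructor. The class name DeepRecursionDemo, the recursion depth and the 256 MB figure are made up for this sketch, and the JVM is allowed to treat the requested size as a hint.

```java
public class DeepRecursionDemo {

    // Stand-in for a recursive walk over a deeply nested object graph.
    static int depth(int remaining) {
        return remaining == 0 ? 0 : 1 + depth(remaining - 1);
    }

    // Runs the recursion on a worker thread that asks for stackBytes of stack.
    // Returns the reached depth, or -1 if the stack was still too small.
    static int runWithStack(int levels, long stackBytes) {
        int[] result = {-1};
        Thread worker = new Thread(null, () -> {
            try {
                result[0] = depth(levels);
            } catch (StackOverflowError e) {
                result[0] = -1; // requested stack size was not sufficient (or was ignored)
            }
        }, "deep-recursion", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 100,000 nested calls on a 256 MB stack.
        System.out.println(runWithStack(100_000, 256L * 1024 * 1024));
    }
}
```

For a command-line run, "-Xss" (for example "java -Xss256m ...") remains the simpler choice, since it needs no code change.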
[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867 ]

Tilman Hausherr edited comment on PDFBOX-5848 at 7/3/24 6:15 PM:

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However there's a different problem, lots of orphan pages. The reason is that some annotations have a /Parent entry which has a /Kids entry whose children are annotations on *different* pages. Opening and saving it with Adobe Reader brings a much smaller file, where the /Parent entry value is set to null.

!screenshot-1.png!

was (Author: tilman):
I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However there's a different problem, lots of orphan pages. The reason is that some annotations have a /Parent entry which has a /Kids entry whose children are annotations on *different* pages.

!screenshot-1.png!
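The /Parent and /Kids relationship described in this comment can be pictured with a toy object graph. The sketch below is not PDFBox code and does not reproduce the Splitter; it only assumes that copying a page pulls in everything reachable from it. All class and method names (OrphanPageSketch, Node, pagesReachable) are hypothetical, and the five-page document is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrphanPageSketch {

    // Minimal stand-in for a PDF dictionary: a name plus keyed references.
    static final class Node {
        final String name;
        final Map<String, List<Node>> refs = new LinkedHashMap<>();
        Node(String name) { this.name = name; }
        void link(String key, Node target) {
            refs.computeIfAbsent(key, k -> new ArrayList<>()).add(target);
        }
    }

    // Collects the names of all nodes reachable from start, not following skippedKeys.
    static Set<String> reachable(Node start, Set<String> skippedKeys) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            Node node = todo.pop();
            if (!seen.add(node.name)) {
                continue; // already visited
            }
            for (Map.Entry<String, List<Node>> entry : node.refs.entrySet()) {
                if (!skippedKeys.contains(entry.getKey())) {
                    entry.getValue().forEach(todo::push);
                }
            }
        }
        return seen;
    }

    // Builds pages that each hold one annotation; all annotations share one /Parent.
    static Node buildDocument(int pageCount) {
        Node parent = new Node("parent");
        Node firstPage = null;
        for (int i = 1; i <= pageCount; i++) {
            Node page = new Node("page" + i);
            Node annot = new Node("annot" + i);
            page.link("/Annots", annot);
            annot.link("/P", page);          // annotation points back to its page
            annot.link("/Parent", parent);   // shared parent across pages
            parent.link("/Kids", annot);
            if (firstPage == null) {
                firstPage = page;
            }
        }
        return firstPage;
    }

    // Counts how many pages are dragged along when only the first page is wanted.
    static long pagesReachable(int pageCount, boolean ignoreParent) {
        Set<String> skipped = ignoreParent
                ? Collections.singleton("/Parent")
                : Collections.<String>emptySet();
        return reachable(buildDocument(pageCount), skipped).stream()
                .filter(name -> name.startsWith("page"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println(pagesReachable(5, false)); // prints 5
        System.out.println(pagesReachable(5, true));  // prints 1
    }
}
```

In this model, following /Parent from a single page reaches every other page through /Kids, which matches the "orphan pages" and large output files mentioned above; ignoring /Parent leaves only the wanted page.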
[jira] [Updated] (PDFBOX-5848) Infinite loop processing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5848:
Attachment: screenshot-1.png
[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867 ] Tilman Hausherr commented on PDFBOX-5848: - I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However, there's a different problem: lots of orphan pages. The reason is that some annotations have a /Parent entry which has a /Kids entry whose children are annotations on *different* pages. !screenshot-1.png! > Infinite loop processing PDF > > > Key: PDFBOX-5848 > URL: https://issues.apache.org/jira/browse/PDFBOX-5848 > Project: PDFBox > Issue Type: Bug >Affects Versions: 3.0.2 PDFBox >Reporter: Joan Fisbein >Priority: Major > Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, > screenshot-1.png > > > I use PDFBox to split hundreds of PDFs per day. Generally, everything works > flawlessly, but I just received a PDF that generates an infinite loop when I > try to split it. > > I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other > versions): > {code:java} > private static void splitPdf(File fileToSplit) { > try (PDDocument document = Loader.loadPDF(fileToSplit)) { > int documentPages = document.getNumberOfPages(); > Splitter splitter = new Splitter(); > List<PDDocument> Pages = splitter.split(document); > Iterator<PDDocument> iterator = Pages.listIterator(); > while (iterator.hasNext()) { > PDDocument pd = iterator.next(); > pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf"); > pd.close(); > } > } catch (IOException e) { > throw new RuntimeException(e); > } > } {code} > The PDF file is attached to the issue
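A side note on the reproduction snippet: it names each output file via `Pages.indexOf(pd)`, which searches the list again for every page. A plain counter gives the same numbering without the search. The sketch below is only an illustration using stand-in strings instead of documents; `SplitNamingSketch` and `outputNames` are made-up names, and nothing here is claimed to be related to the reported hang.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitNamingSketch {
    // Build one output name per split part, numbering by position
    // instead of looking each part up again with indexOf().
    static <T> List<String> outputNames(String baseName, List<T> parts) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            names.add(baseName + "-" + i + ".pdf");
        }
        return names;
    }

    public static void main(String[] args) {
        // Strings stand in for the split documents here.
        List<String> parts = Arrays.asList("page0", "page1", "page2");
        System.out.println(outputNames("in.pdf", parts));
    }
}
```

In the real loop the same counter would simply replace the `indexOf` call inside the file name expression.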
jbig2 git
Sorry for the mess. I sent the wrong commit message, and tried different (partly unsuccessful) tactics to squash several commit messages into one. At least the tika message is gone now. I'll stop now because it might only get worse. Tilman
[jira] [Assigned] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()
[ https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned PDFBOX-5847: --- Assignee: Tilman Hausherr > Improve performance of FileSystemFontProvider.scanFonts() > - > > Key: PDFBOX-5847 > URL: https://issues.apache.org/jira/browse/PDFBOX-5847 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.31, 3.0.2 PDFBox > Reporter: Tilman Hausherr > Assignee: Tilman Hausherr >Priority: Major > Fix For: 3.0.3 PDFBox, 4.0.0 > > > PR by Mykola Bohdiuk which introduces an "only headers" mode for the font > parsers where each table reads as little information as possible.
[jira] [Updated] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()
[ https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5847: Fix Version/s: 3.0.3 PDFBox 4.0.0 > Improve performance of FileSystemFontProvider.scanFonts() > - > > Key: PDFBOX-5847 > URL: https://issues.apache.org/jira/browse/PDFBOX-5847 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.31, 3.0.2 PDFBox > Reporter: Tilman Hausherr >Priority: Major > Fix For: 3.0.3 PDFBox, 4.0.0 > > > PR by Mykola Bohdiuk which introduces an "only headers" mode for the font > parsers where each table reads as little information as possible.
[jira] [Comment Edited] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page
[ https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861103#comment-17861103 ] Tilman Hausherr edited comment on PDFBOX-5225 at 7/1/24 9:28 AM: - No I'm not / yes please. I just clarified what it is about. was (Author: tilman): No I'm not. I just clarified what it is about. > Flattening removes all annotations when widget annotation has no page > - > > Key: PDFBOX-5225 > URL: https://issues.apache.org/jira/browse/PDFBOX-5225 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Affects Versions: 2.0.24 > Reporter: Tilman Hausherr >Priority: Major > Attachments: SourceFailure.pdf, screenshot-1.png > > > {code} > PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(); > List<PDField> list = new ArrayList<>(); > list.add(acroForm.getField("VN_NAME")); > acroForm.flatten(list, true); > {code} > The code from buildPagesWidgetsMap that is run when there are widgets with > missing page references does not consider the field list. So all widgets end > up in the map instead of only those we care about. > !screenshot-1.png!
[jira] [Created] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()
Tilman Hausherr created PDFBOX-5847: --- Summary: Improve performance of FileSystemFontProvider.scanFonts() Key: PDFBOX-5847 URL: https://issues.apache.org/jira/browse/PDFBOX-5847 Project: PDFBox Issue Type: Improvement Affects Versions: 3.0.2 PDFBox, 2.0.31 Reporter: Tilman Hausherr PR by Mykola Bohdiuk which introduces an "only headers" mode for the font parsers where each table reads as little information as possible.
[jira] [Closed] (PDFBOX-5383) JAVA program Crashes
[ https://issues.apache.org/jira/browse/PDFBOX-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-5383. --- Resolution: Not A Bug Closing because this isn't "our" bug, it's in JDK8. > JAVA program Crashes > > > Key: PDFBOX-5383 > URL: https://issues.apache.org/jira/browse/PDFBOX-5383 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.24, 2.0.25, 3.0.0 PDFBox >Reporter: krishna prasad >Priority: Major > Labels: crash, jdk8 > Attachments: crash.pdf > > > I am trying to convert the PDF into images by using render. It hangs up the > program.
[jira] [Closed] (PDFBOX-5289) java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 13377272 (start offset: 13377272)
[ https://issues.apache.org/jira/browse/PDFBOX-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-5289. --- Resolution: Won't Fix Won't fix in 2.0, but works in 3.0 as long as you don't try to access the docinfo. > java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at > offset 13377272 (start offset: 13377272) > - > > Key: PDFBOX-5289 > URL: https://issues.apache.org/jira/browse/PDFBOX-5289 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.24 >Reporter: Stephen >Priority: Major > Attachments: Diplomacy by Henry Kissinger (1).pdf > > > {code:java} > java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at > offset 13377272 (start offset: 13377272) at > org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913) at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218) > at > org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857) at > org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907) at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796) > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858) > at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175) at > org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) at > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228) at > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128) > {code} > Please find the problematic PDF attached.
[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page
[ https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5225: Attachment: screenshot-1.png > Flattening removes all annotations when widget annotation has no page > - > > Key: PDFBOX-5225 > URL: https://issues.apache.org/jira/browse/PDFBOX-5225 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Affects Versions: 2.0.24 > Reporter: Tilman Hausherr >Priority: Major > Attachments: SourceFailure.pdf, screenshot-1.png > > > {code} > PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(); > List<PDField> list = new ArrayList<>(); > list.add(acroForm.getField("VN_NAME")); > acroForm.flatten(list, true); > {code} > The code from buildPagesWidgetsMap that is run when there are widgets with > missing page references does not consider the field list. So all widgets end > up in the map instead of only those we care about.
[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page
[ https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5225: Description: {code} PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(); List<PDField> list = new ArrayList<>(); list.add(acroForm.getField("VN_NAME")); acroForm.flatten(list, true); {code} The code from buildPagesWidgetsMap that is run when there are widgets with missing page references does not consider the field list. So all widgets end up in the map instead of only those we care about. !screenshot-1.png! was: {code} PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(); List<PDField> list = new ArrayList<>(); list.add(acroForm.getField("VN_NAME")); acroForm.flatten(list, true); {code} The code from buildPagesWidgetsMap that is run when there are widgets with missing page references does not consider the field list. So all widgets end up in the map instead of only those we care about. > Flattening removes all annotations when widget annotation has no page > - > > Key: PDFBOX-5225 > URL: https://issues.apache.org/jira/browse/PDFBOX-5225 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Affects Versions: 2.0.24 >Reporter: Tilman Hausherr >Priority: Major > Attachments: SourceFailure.pdf, screenshot-1.png > > > {code} > PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(); > List<PDField> list = new ArrayList<>(); > list.add(acroForm.getField("VN_NAME")); > acroForm.flatten(list, true); > {code} > The code from buildPagesWidgetsMap that is run when there are widgets with > missing page references does not consider the field list. So all widgets end > up in the map instead of only those we care about. > !screenshot-1.png!
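The PDFBOX-5225 description says the page-to-widgets map ends up containing every widget because the requested field list is ignored. The idea of restricting that map to the requested fields can be sketched roughly as follows. This is not PDFBox code: `Widget`, `buildPageWidgetMap` and the field names other than "VN_NAME" are hypothetical stand-ins, and the actual fix in PDFBox may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class WidgetMapSketch {
    // Hypothetical minimal model: a widget belongs to a field and sits on a page.
    static final class Widget {
        final String fieldName;
        final int page;

        Widget(String fieldName, int page) {
            this.fieldName = fieldName;
            this.page = page;
        }
    }

    // Collect widgets per page, but only for the fields that should be flattened.
    static Map<Integer, List<Widget>> buildPageWidgetMap(List<Widget> allWidgets,
                                                         Set<String> fieldsToFlatten) {
        Map<Integer, List<Widget>> map = new TreeMap<>();
        for (Widget widget : allWidgets) {
            if (!fieldsToFlatten.contains(widget.fieldName)) {
                continue; // not requested, so it must not be touched later
            }
            map.computeIfAbsent(widget.page, k -> new ArrayList<>()).add(widget);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Widget> all = Arrays.asList(
                new Widget("VN_NAME", 0),
                new Widget("OTHER", 0),
                new Widget("OTHER2", 1));
        Set<String> wanted = new HashSet<>(Arrays.asList("VN_NAME"));
        Map<Integer, List<Widget>> map = buildPageWidgetMap(all, wanted);
        System.out.println(map.keySet() + " " + map.get(0).size());
    }
}
```

With the filter in place only page 0 appears in the map, holding the single "VN_NAME" widget; without it all three widgets would be collected.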
[jira] [Resolved] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero
[ https://issues.apache.org/jira/browse/PDFBOX-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5842. - Resolution: Fixed > IllegalArgumentException: Width (26) and height (0) must be non-zero > > > Key: PDFBOX-5842 > URL: https://issues.apache.org/jira/browse/PDFBOX-5842 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.31, 3.0.2 PDFBox > Reporter: Tilman Hausherr >Assignee: Tilman Hausherr >Priority: Major > Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0 > > > reported by Patrycja Zaremba in the users mailing list > https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp > {quote}When the page which I try to convert have any element which is png with > only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote} > IllegalArgumentException: Width (26) and height (0) must be non-zero > org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281) > > org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
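The report above describes a 1px-high image being scaled down to a height of 0. One common way to guard against that kind of rounding is to clamp each scaled dimension to at least one pixel. The sketch below only illustrates that idea; `scaledDimension` is a made-up helper, the scale factor is invented, and this is not necessarily how PageDrawer was fixed.

```java
public class ScaledSizeSketch {
    // Scale a pixel dimension, but never let it collapse below one pixel.
    static int scaledDimension(int original, double scale) {
        return Math.max(1, (int) Math.round(original * scale));
    }

    public static void main(String[] args) {
        // A 54x1 image drawn at 40 % would otherwise round to a height of 0.
        int width = scaledDimension(54, 0.4);
        int height = scaledDimension(1, 0.4);
        System.out.println(width + "x" + height);
    }
}
```

Clamping changes the aspect ratio slightly for very thin images, which is usually preferable to an exception while rendering.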
[jira] [Resolved] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java
[ https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5845. - Resolution: Fixed fixed in 1918648 (3.0) and in 1918649 (trunk) > potential memory leak in TrueTypeCollection.java > > > Key: PDFBOX-5845 > URL: https://issues.apache.org/jira/browse/PDFBOX-5845 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 3.0.2 PDFBox > Reporter: Tilman Hausherr >Assignee: Tilman Hausherr >Priority: Minor > Fix For: 3.0.3 PDFBox, 4.0.0 > > > This is part of PR#189 (which will be done in a future ticket) and is done > separately to shorten / clarify the patch.
[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java
[ https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5845: Fix Version/s: 3.0.3 PDFBox 4.0.0 > potential memory leak in TrueTypeCollection.java > > > Key: PDFBOX-5845 > URL: https://issues.apache.org/jira/browse/PDFBOX-5845 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 3.0.2 PDFBox > Reporter: Tilman Hausherr >Assignee: Tilman Hausherr >Priority: Minor > Fix For: 3.0.3 PDFBox, 4.0.0 > > > This is part of PR#189 (which will be done in a future ticket) and is done > separately to shorten / clarify the patch.
[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java
[ https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5845: Affects Version/s: 3.0.2 PDFBox > potential memory leak in TrueTypeCollection.java > > > Key: PDFBOX-5845 > URL: https://issues.apache.org/jira/browse/PDFBOX-5845 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 3.0.2 PDFBox > Reporter: Tilman Hausherr >Assignee: Tilman Hausherr >Priority: Minor > > This is part of PR#189 (which will be done in a future ticket) and is done > separately to shorten / clarify the patch.
[jira] [Created] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java
Tilman Hausherr created PDFBOX-5845: --- Summary: potential memory leak in TrueTypeCollection.java Key: PDFBOX-5845 URL: https://issues.apache.org/jira/browse/PDFBOX-5845 Project: PDFBox Issue Type: Bug Components: FontBox Reporter: Tilman Hausherr Assignee: Tilman Hausherr This is part of PR#189 (which will be done in a future ticket) and is done separately to shorten / clarify the patch.
[jira] [Closed] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text
[ https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-5844. --- Resolution: Not A Bug > The font "Symbol" throw an exception when rendering text > - > > Key: PDFBOX-5844 > URL: https://issues.apache.org/jira/browse/PDFBOX-5844 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.24 >Reporter: bai yuan >Priority: Major > Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, > image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf > > > Using PDType0Font.load to load the attached font, it will throw an exception > when rendering text. Excel can render it normally, see the “exportByExcel.pdf” > !image-2024-06-24-16-05-20-296.png!
[jira] [Closed] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression
[ https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-5836. --- Resolution: Invalid > PDF A-1 falsely validated as invalid for ICC color profile regression > - > > Key: PDFBOX-5836 > URL: https://issues.apache.org/jira/browse/PDFBOX-5836 > Project: PDFBox > Issue Type: Bug > Components: Preflight >Affects Versions: 3.0.2 PDFBox >Reporter: Jochen Stärk >Priority: Major > Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf > > > PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to > parse the ICC Profile" on the attached, LibreOffice-generated PDF/A-1. > VeraPDF validates the file as valid. It worked with PDFBox 2 and I need it to be > fixed in context of my upgrade to PDFBox 3 > (https://github.com/ZUGFeRD/mustangproject/issues/373).
[jira] [Reopened] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression
[ https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened PDFBOX-5836: - > PDF A-1 falsely validated as invalid for ICC color profile regression > - > > Key: PDFBOX-5836 > URL: https://issues.apache.org/jira/browse/PDFBOX-5836 > Project: PDFBox > Issue Type: Bug > Components: Preflight >Affects Versions: 3.0.2 PDFBox >Reporter: Jochen Stärk >Priority: Major > Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf > > > PreflightParser.validate(theFile.toFile()).isValid() throws an "Unable to > parse the ICC Profile" on the attached, LibreOffice-generated PDF/A-1. > VeraPDF validates the file as valid. It worked with PDFBox 2 and I need it to be > fixed in context of my upgrade to PDFBox 3 > (https://github.com/ZUGFeRD/mustangproject/issues/373).
[jira] [Updated] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text
[ https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5844: Attachment: PDFBOX5844_Symbol_Mu.pdf > The font "Symbol" throw an exception when rendering text > - > > Key: PDFBOX-5844 > URL: https://issues.apache.org/jira/browse/PDFBOX-5844 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.24 >Reporter: bai yuan >Priority: Major > Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, > image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf > > > Using PDType0Font.load to load the attached font, it will throw an exception > when rendering text. Excel can render it normally, see the “exportByExcel.pdf” > !image-2024-06-24-16-05-20-296.png!
[jira] [Commented] (PDFBOX-5844) The font "Symbol" throw an exception when rendering text
[ https://issues.apache.org/jira/browse/PDFBOX-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859620#comment-17859620 ] Tilman Hausherr commented on PDFBOX-5844: - Use "\uf06d" and it works, as shown in [^PDFBOX5844_Symbol_Mu.pdf] . > The font "Symbol" throw an exception when rendering text > - > > Key: PDFBOX-5844 > URL: https://issues.apache.org/jira/browse/PDFBOX-5844 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.24 >Reporter: bai yuan >Priority: Major > Attachments: PDFBOX5844_Symbol_Mu.pdf, exportByExcel.pdf, > image-2024-06-24-16-05-20-296.png, pdfboxtest.java, symbol.ttf > > > Using PDType0Font.load to load the attached font, it will throw an exception > when rendering text. Excel can render it normally, see the “exportByExcel.pdf” > !image-2024-06-24-16-05-20-296.png!
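Some background on where "\uf06d" probably comes from: symbol-encoded TrueType fonts commonly expose their glyphs in the Unicode Private Use Area at U+F000 plus the single-byte code, and mu sits at code 0x6D in the classic Symbol encoding. The sketch below just does that arithmetic. It assumes the attached symbol.ttf follows this convention, which the thread does not state explicitly, and `toPrivateUse` is a made-up helper name.

```java
public class SymbolPuaSketch {
    // Assumed convention: a symbol font maps single-byte code c to U+F000 + c.
    static char toPrivateUse(int symbolCode) {
        if (symbolCode < 0x20 || symbolCode > 0xFF) {
            throw new IllegalArgumentException("not a single-byte symbol code: " + symbolCode);
        }
        return (char) (0xF000 + symbolCode);
    }

    public static void main(String[] args) {
        // 0x6D is the code of mu in the classic Symbol encoding.
        char mu = toPrivateUse(0x6D);
        System.out.println("U+" + Integer.toHexString(mu).toUpperCase());
    }
}
```

The resulting character is the one to pass to the text-drawing call when the font only has such Private Use Area mappings.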
[jira] [Resolved] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5843. - Fix Version/s: 2.0.32 3.0.3 PDFBox 4.0.0 Assignee: Tilman Hausherr Resolution: Fixed > There is an exception when getting embedded font, is it compatible? > --- > > Key: PDFBOX-5843 > URL: https://issues.apache.org/jira/browse/PDFBOX-5843 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: liu >Assignee: Tilman Hausherr >Priority: Major > Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0 > > Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, > screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff > >
[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856317#comment-17856317 ] Tilman Hausherr commented on PDFBOX-5843: - Fixed in 1918445, 1918446, 1918447, 1918448, 1918449, 1918450 (svn2jira is down). Thanks for the report! > There is an exception when getting embedded font, is it compatible? > --- > > Key: PDFBOX-5843 > URL: https://issues.apache.org/jira/browse/PDFBOX-5843 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: liu >Priority: Major > Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, > screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff > >
[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283 ] Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 3:38 PM: -- Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that offset and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, -17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, -19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, -7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, 
-74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|] The many negative values are a "zone of interest". was (Author: tilman): Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, -17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, -19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, -7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102] converted to type1 
sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60,
[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5843: Attachment: screenshot-3.png > There is an exception when getting embedded font, is it compatible? > --- > > Key: PDFBOX-5843 > URL: https://issues.apache.org/jira/browse/PDFBOX-5843 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: liu >Priority: Major > Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, > screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff > >
[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856289#comment-17856289 ] Tilman Hausherr commented on PDFBOX-5843: - It turns out to be completely different. I ran a 2.0.1 source code build with the change and hit an ArrayIndexOutOfBoundsException in CFFCIDFont.getLocalSubrIndex(). That means we can't just skip empty entries. Now I get this: !screenshot-3.png! I'll investigate some more but it seems promising. > There is an exception when getting embedded font, is it compatible? > --- > > Key: PDFBOX-5843 > URL: https://issues.apache.org/jira/browse/PDFBOX-5843 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: liu >Priority: Major > Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, > screenshot-1.png, screenshot-2.png, screenshot-3.png, xxx.cff > >
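Why skipping empty entries can lead to an out-of-bounds access, as observed above, can be shown with a small stand-alone sketch: if other data refers to the entries by position, dropping one shifts every later position. The code below is not CFFParser code. The string "dictionaries", the `fdSelect` value and both method names are hypothetical, and it does not show how the issue was actually fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FdIndexAlignmentSketch {
    // Variant 1: drop empty entries. Later entries move to lower positions.
    static List<String> parseSkippingEmpty(List<byte[]> fdIndex) {
        List<String> dicts = new ArrayList<>();
        for (byte[] bytes : fdIndex) {
            if (bytes.length == 0) {
                continue;
            }
            dicts.add("dict(" + bytes.length + ")");
        }
        return dicts;
    }

    // Variant 2: keep a placeholder so that positions stay aligned with the index.
    static List<String> parseKeepingSlots(List<byte[]> fdIndex) {
        List<String> dicts = new ArrayList<>();
        for (byte[] bytes : fdIndex) {
            dicts.add(bytes.length == 0 ? null : "dict(" + bytes.length + ")");
        }
        return dicts;
    }

    public static void main(String[] args) {
        List<byte[]> fdIndex = Arrays.asList(new byte[3], new byte[0], new byte[5]);
        int fdSelect = 2; // hypothetical glyph that refers to the third entry

        System.out.println(parseKeepingSlots(fdIndex).get(fdSelect));
        // After skipping, only two entries remain, so get(fdSelect) would throw.
        System.out.println(parseSkippingEmpty(fdIndex).size());
    }
}
```

Keeping a placeholder means callers have to handle the empty slot explicitly, but every positional reference still points at the entry it was meant for.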
[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283 ] Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 2:17 PM: -- Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-79, 67, 592, 63, HSTEM|, 164, 64, 388, 66, VSTEM|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, 17, -9, 28, -18, 13, -10, 62, 84, 54, 127, 33, 132, RRCURVETO|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, 63, 22, RCURVELINE|, -17, 92, -52, 139, -59, 106, RRCURVETO|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, 16, -10, 26, -22, 12, -11, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, -575, HLINETO|, -13, -5, -4, -12, -14, -43, -1, 2, -49, VHCURVETO|, 10, -19, 11, 9342, 8, 21, 39, VVCURVETO|, 575, 200, VLINETO|, -9, -53, -10, -55, -7, -37, 57, -11, RCURVELINE|, 13, 53, 18, 86, 13, 72, -46, -55, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|, -265, 16, -102] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, 
-74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|]

The many negative values are a "zone of interest".

was (Author: tilman):
Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything.
readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, -16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, -27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO
[jira] [Comment Edited] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283 ] Tilman Hausherr edited comment on PDFBOX-5843 at 6/19/24 2:16 PM: -- Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, -16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, -27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 
33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|] The many negative values are a "zone of interest". was (Author: tilman): Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, -16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, -27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 
127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|] The many negative values are a "zone of interest". > There is an exception w
[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856283#comment-17856283 ] Tilman Hausherr commented on PDFBOX-5843: - Lets assume that the font is correct. I fixed the bug locally that empty entries now get skipped in CFFParser.java, by adding {{if (bytes.length == 0) continue;}} after {{for (byte[] bytes : fdIndex)}}. My first thought was that something goes wrong with the offsets because there is more than one fdindex entry. But I haven't been able to prove this. I also tried to install FontForge but it doesn't show anything. readIndexData char strings: code 9987 len 153 at offset 298684 code 12431 len 245 at offset 301280 code 14225 len 135 at offset 303318 for code 9987 (which has this "over the top" path) I checked that position and length make sense, all the other bytes before/after are 0x0e, which is ENDCHAR type2 charstring: [-78, 63, 362, 64, 158, 63, 69, 61, HSTEM|, 664, 69, VSTEM|, 67, 290, RMOVETO|, 54, -35, 57, -42, 52, -43, -54, -90, -68, -64, -81, -40, 15, -12, 20, -25, 8, -16, 85, 47, 70, 65, 57, 90, 43, -39, 37, -39, 25, -33, 47, 56, RCURVELINE|, -27, 35, -42, 41, -49, 41, 55, 112, 36, 143, 16, 182, -42, -55, -153, HLINETO|, 14, 70, 12, 69, 9, 63, -66, 4, RCURVELINE|, -7, -63, -12, -71, -14, -72, RRCURVETO|, -110, -63, 97, HLINETO|, -22, -105, -27, -102, -24, -72, RRCURVETO|, 286, 279, RMOVETO|, -16, -135, -31, -112, -43, -92, -39, 30, -42, 30, -40, 26, 21, 73, 22, 89, 19, 91, RRCURVETO|, 460, -39, 12902, ENDCHAR|] converted to type1 sequence: [0, 1000.0, HSBW|, 453, 414, RMOVETO|, -29, -122, -48, -120, -64, -78, RRCURVETO|, 17, -9, 28, -18, 13, -10, RRCURVETO|, 62, 84, 54, 127, 33, 132, RRCURVETO|, CLOSEPATH|, 242, 1, RMOVETO|, 56, -106, 52, -142, 17, -92, RRCURVETO|, 63, 22, RLINETO|, -17, 92, -52, 139, -59, 106, RRCURVETO|, CLOSEPATH|, -353, 414, RMOVETO|, -33, -149, -57, -145, -74, -94, RRCURVETO|, 16, -10, 26, -22, 12, -11, RRCURVETO|, 36, 49, 33, 60, 29, 68, 
RRCURVETO|, 160, HLINETO|, -575, VLINETO|, 0, -13, -5, -4, -12, 0, RRCURVETO|, -14, 0, -43, -1, -49, 2, RRCURVETO|, 0, 10, -19, 11, 0, 9342, RRCURVETO|, 575, VLINETO|, 200, HLINETO|, -9, -53, -10, -55, -7, -37, RRCURVETO|, 57, -11, RLINETO|, 13, HLINETO|, 53, VLINETO|, 18, HLINETO|, 86, VLINETO|, 13, HLINETO|, 72, VLINETO|, -46, HLINETO|, -55, VLINETO|, -419, HLINETO|, 21, 56, 18, 60, 14, 60, RRCURVETO|]

The many negative values are a "zone of interest".

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, screenshot-1.png, screenshot-2.png, xxx.cff

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
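The dumps in this thread pair a Type 2 charstring with its Type 1 conversion. In the edited version of the comment, the operands `56, -106, 52, -142, 17, -92, 63, 22` before RCURVELINE reappear in the Type 1 sequence as one RRCURVETO followed by one RLINETO. A minimal sketch of that expansion follows. The class and method names are hypothetical and this is not the FontBox converter code; it only assumes the usual operand layout of six values per curve plus two trailing values for the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FontBox.
final class RCurveLineSketch
{
    /**
     * Expands the operands of one Type 2 rcurveline into Type 1 style
     * commands. Every group of six operands becomes an RRCURVETO, the
     * final two operands become an RLINETO.
     */
    static List<String> expand(int[] args)
    {
        if (args.length < 8 || (args.length - 2) % 6 != 0)
        {
            throw new IllegalArgumentException(
                    "rcurveline needs 6n + 2 operands, got " + args.length);
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        for (; i + 6 <= args.length - 2; i += 6)
        {
            out.add(join(args, i, 6) + " RRCURVETO");
        }
        out.add(join(args, i, 2) + " RLINETO");
        return out;
    }

    // Joins count operands starting at index from, separated by spaces.
    private static String join(int[] args, int from, int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int k = from; k < from + count; k++)
        {
            if (sb.length() > 0)
            {
                sb.append(' ');
            }
            sb.append(args[k]);
        }
        return sb.toString();
    }
}
```

Calling `RCurveLineSketch.expand(new int[] {56, -106, 52, -142, 17, -92, 63, 22})` yields the two commands seen in the converted sequence. An operand count that does not fit the 6n + 2 pattern is rejected, which can be a useful check when a dump looks suspicious.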
[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5843:
Attachment: xxx.cff

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 123.pdf, image-2024-06-19-16-49-40-186.png, screenshot-1.png, screenshot-2.png, xxx.cff
[jira] [Commented] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856218#comment-17856218 ]

Tilman Hausherr commented on PDFBOX-5843:

I'm wondering whether something is wrong on our side. The fdindex contains many empty entries. When I tried to skip these, I got this rendering:

!screenshot-2.png!

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 123.pdf, screenshot-1.png, screenshot-2.png
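For readers unfamiliar with why an index can contain empty entries: in a CFF INDEX, each entry spans the range between two consecutive offsets, so two equal offsets describe a zero-length entry. The sketch below reads such an INDEX from a byte array and then skips the empty entries. It is a simplified illustration under stated assumptions: `CffIndexSketch`, `readIndex` and `nonEmpty` are hypothetical names, the INDEX is assumed to start at position 0, and none of this is the actual CFFParser code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of FontBox.
final class CffIndexSketch
{
    /** Reads a CFF INDEX that starts at position 0 of data. */
    static byte[][] readIndex(byte[] data)
    {
        // Card16 count, big-endian
        int count = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
        if (count == 0)
        {
            return new byte[0][];
        }
        // Card8 offSize: number of bytes per offset
        int offSize = data[2] & 0xFF;
        int[] offsets = new int[count + 1];
        int pos = 3;
        for (int i = 0; i <= count; i++)
        {
            int value = 0;
            for (int b = 0; b < offSize; b++)
            {
                value = (value << 8) | (data[pos++] & 0xFF);
            }
            offsets[i] = value;
        }
        // Offsets are 1-based, relative to the byte preceding the object data.
        int base = pos - 1;
        byte[][] entries = new byte[count][];
        for (int i = 0; i < count; i++)
        {
            entries[i] = Arrays.copyOfRange(data, base + offsets[i], base + offsets[i + 1]);
        }
        return entries;
    }

    /** Returns only the entries that contain at least one byte. */
    static List<byte[]> nonEmpty(byte[][] index)
    {
        List<byte[]> result = new ArrayList<>();
        for (byte[] bytes : index)
        {
            if (bytes.length == 0)
            {
                continue; // equal consecutive offsets, nothing to parse
            }
            result.add(bytes);
        }
        return result;
    }
}
```

With the bytes `{0, 3, 1, 1, 3, 3, 4, 10, 11, 12}` the INDEX has three entries, and the middle one is empty because its start and end offsets are both 3. Whether skipping such entries is the right behaviour for the font in this issue is exactly what the thread is still trying to establish.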
[jira] [Updated] (PDFBOX-5843) There is an exception when getting embedded font, is it compatible?
[ https://issues.apache.org/jira/browse/PDFBOX-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5843:
Attachment: screenshot-2.png

> There is an exception when getting embedded font, is it compatible?
> ---
>
> Key: PDFBOX-5843
> URL: https://issues.apache.org/jira/browse/PDFBOX-5843
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 123.pdf, screenshot-1.png, screenshot-2.png
[jira] [Commented] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero
[ https://issues.apache.org/jira/browse/PDFBOX-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855852#comment-17855852 ]

Tilman Hausherr commented on PDFBOX-5842:

Thanks, yes, good observation.

> IllegalArgumentException: Width (26) and height (0) must be non-zero
>
> Key: PDFBOX-5842
> URL: https://issues.apache.org/jira/browse/PDFBOX-5842
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> reported by Patrycja Zaremba in the users mailing list
> https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp
> {quote}When the page which I try to convert have any element which is png with
> only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote}
> IllegalArgumentException: Width (26) and height (0) must be non-zero
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)
> org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
[jira] [Commented] (PDFBOX-5841) First split result document misses metadata after split
[ https://issues.apache.org/jira/browse/PDFBOX-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855787#comment-17855787 ]

Tilman Hausherr commented on PDFBOX-5841:

I'm worried because this is unexpected for an average user wanting to do similar manipulations. So the point of this change is that we assign the indirect object? I'm wondering if the problem would happen with the other assignments, e.g. ViewerPreferences if they exist.

> First split result document misses metadata after split
> ---
>
> Key: PDFBOX-5841
> URL: https://issues.apache.org/jira/browse/PDFBOX-5841
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.3 PDFBox, 4.0.0
> Reporter: Tilman Hausherr
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
> Attachments: splitresult1.pdf, splitresult2.pdf
>
> This happens with the test file of PDFBOX-5840 and can also be reproduced
> with the command line utility: the first split result file doesn't have the
> metadata.
> Alternatively it can be reproduced programmatically by adding this code below
> {{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in
> {code:java}
> assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> dstDoc.save(baos);
> PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
> assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
> reloadedDoc.close();
> {code}
> I believe this is another writing problem, because the metadata exists, but
> gets lost during the first save, not during a second one (not part of the
> test code). It is expected to be object 116. It doesn't happen with 2.0.
> Attached: two saved files by splitting so that the entire file is the result.
[jira] [Created] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero
Tilman Hausherr created PDFBOX-5842:
---
Summary: IllegalArgumentException: Width (26) and height (0) must be non-zero
Key: PDFBOX-5842
URL: https://issues.apache.org/jira/browse/PDFBOX-5842
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0

reported by Patrycja Zaremba in the users mailing list
https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp
{quote}When the page which I try to convert have any element which is png with only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote}
IllegalArgumentException: Width (26) and height (0) must be non-zero
org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
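The report says a 1px-high image "is scaled down to 0". A minimal sketch of one way to keep a scaled dimension from reaching zero follows, assuming the target size is computed by rounding a scaled pixel count. `ScaledSizeSketch` and `scaledDimension` are hypothetical names, and this is not taken from the actual PDFBox change. The message text has the same shape as the one `java.awt.image.ReplicateScaleFilter` produces for a zero width or height, but the truncated stack trace does not show that frame, so treat that link as a guess.

```java
// Hypothetical helper, not part of PDFBox.
final class ScaledSizeSketch
{
    /**
     * Scales a pixel dimension and clamps the result to at least 1,
     * so that very thin images do not collapse to a zero dimension.
     */
    static int scaledDimension(int sourcePixels, double scale)
    {
        int scaled = (int) Math.round(sourcePixels * scale);
        return Math.max(1, scaled);
    }
}
```

For a 54x1 image drawn at a scale of 0.3, `scaledDimension(54, 0.3)` gives 16 and `scaledDimension(1, 0.3)` gives 1 instead of 0. Clamping slightly distorts the aspect ratio of such images, which seems an acceptable trade-off for a line that is one pixel thick.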
[jira] [Commented] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression
[ https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855242#comment-17855242 ]

Tilman Hausherr commented on PDFBOX-5836:

I used the command line application. My jdk11 version:

java version "11.0.21" 2023-10-17 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.21+9-LTS-193)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.21+9-LTS-193, mixed mode)

> PDF A-1 falsely validated as invalid for ICC color profile regression
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
> Issue Type: Bug
> Components: Preflight
> Affects Versions: 3.0.2 PDFBox
> Reporter: Jochen Stärk
> Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
> PreflightParser.validate(theFile.toFile()).isValid() throws a "Unable to
> parse the ICC Profile" on the attached, Libreoffice-generated PDF/A-1.
> VeraPDF validates the file as valid. It worked with PDF 2 and I need it to be
> fixed in context of my upgrade to PDFbox 3
> (https://github.com/ZUGFeRD/mustangproject/issues/373).
[jira] [Commented] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog
[ https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855217#comment-17855217 ]

Tilman Hausherr commented on PDFBOX-5834:

I have attached two files. This is really weird stuff, which relies on JS usage. I wonder why this would have to be split at all.

> [PATCH] PDF split missing names from documentCatalog
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
> Issue Type: Bug
> Reporter: Simon Steiner
> Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used
> to store pdf templates
[jira] [Updated] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog
[ https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5834:
Attachment: 726725.pdf

> [PATCH] PDF split missing names from documentCatalog
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
> Issue Type: Bug
> Reporter: Simon Steiner
> Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used
> to store pdf templates
[jira] [Updated] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog
[ https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5834:
Attachment: 801500.pdf

> [PATCH] PDF split missing names from documentCatalog
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
> Issue Type: Bug
> Reporter: Simon Steiner
> Priority: Major
> Attachments: 726725.pdf, 801500.pdf, tmp.patch
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used
> to store pdf templates
[jira] [Updated] (PDFBOX-5841) First split result document misses metadata after split
[ https://issues.apache.org/jira/browse/PDFBOX-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5841:
Summary: First split result document misses metadata after split (was: Split result document misses metadata after split)

> First split result document misses metadata after split
> ---
>
> Key: PDFBOX-5841
> URL: https://issues.apache.org/jira/browse/PDFBOX-5841
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.3 PDFBox, 4.0.0
> Reporter: Tilman Hausherr
> Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
> Attachments: splitresult1.pdf, splitresult2.pdf
>
> This happens with the test file of PDFBOX-5840 and can also be reproduced
> with the command line utility: the first split result file doesn't have the
> metadata.
> Alternatively it can be reproduced programmatically by adding this code below
> {{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in
> {code:java}
> assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> dstDoc.save(baos);
> PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
> assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
> reloadedDoc.close();
> {code}
> I believe this is another writing problem, because the metadata exists, but
> gets lost during the first save, not during a second one (not part of the
> test code). It is expected to be object 116. It doesn't happen with 2.0.
> Attached: two saved files by splitting so that the entire file is the result.
[jira] [Created] (PDFBOX-5841) Split result document misses metadata after split
Tilman Hausherr created PDFBOX-5841:
---
Summary: Split result document misses metadata after split
Key: PDFBOX-5841
URL: https://issues.apache.org/jira/browse/PDFBOX-5841
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 3.0.3 PDFBox, 4.0.0
Reporter: Tilman Hausherr
Fix For: 3.0.3 PDFBox, 4.0.0
Attachments: splitresult1.pdf, splitresult2.pdf

This happens with the test file of PDFBOX-5840 and can also be reproduced with the command line utility: the first split result file doesn't have the metadata.

Alternatively it can be reproduced programmatically by adding this code below {{assertEquals(5, pageTree.indexOf(pd5.getPage()));}} in
{code:java}
assertNotNull(dstDoc.getDocumentCatalog().getMetadata());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
dstDoc.save(baos);
PDDocument reloadedDoc = Loader.loadPDF(baos.toByteArray());
assertNotNull(reloadedDoc.getDocumentCatalog().getMetadata());
reloadedDoc.close();
{code}
I believe this is another writing problem, because the metadata exists, but gets lost during the first save, not during a second one (not part of the test code). It is expected to be object 116. It doesn't happen with 2.0.

Attached: two saved files by splitting so that the entire file is the result.
[jira] [Resolved] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5835. - Assignee: Tilman Hausherr Resolution: Fixed [~O.Schmidtmer] thanks for reporting [~msahyoun] thanks for the help > DomXmpParser - IllegalArgumentException: prefix cannot be "null" when > creating a QName > -- > > Key: PDFBOX-5835 > URL: https://issues.apache.org/jira/browse/PDFBOX-5835 > Project: PDFBox > Issue Type: Bug > Components: XmpBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: Oliver Schmidtmer >Assignee: Tilman Hausherr >Priority: Major > Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0 > > > I've got a PDF from, where parsing the metadata fails with an > IllegalArgumentException > {code:java} > java.lang.IllegalArgumentException: prefix cannot be "null" when creating a > QName > at java.xml/javax.xml.namespace.QName.(QName.java:192) > at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99) > at > org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306) > at > org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250) > at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201) > at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112) > {code} > This can be reproduced with a simple test, using the extracted metadata: > {code:java} > @Test > void testDomXmpParser() throws XmpParsingException > { > // taken from file test-landscape2.pdf > String xmpmeta = " standalone=\"no\"?>\n" + > " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" + > " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" + > " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" + > " 3\n" + > " A\n" + > " \n" + > " xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; > xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; > xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; > 
xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; > xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" + > " \n" + > "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" + > "\n" + > " \n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension > Schema\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n" > + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" + > " \n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML > invoice file\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" + > " xmlns=\"http://www.aiim.org/pdfa/n
[jira] [Updated] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5835: Fix Version/s: 2.0.32 3.0.3 PDFBox 4.0.0 > DomXmpParser - IllegalArgumentException: prefix cannot be "null" when > creating a QName > -- > > Key: PDFBOX-5835 > URL: https://issues.apache.org/jira/browse/PDFBOX-5835 > Project: PDFBox > Issue Type: Bug > Components: XmpBox >Affects Versions: 2.0.31, 3.0.2 PDFBox >Reporter: Oliver Schmidtmer >Priority: Major > Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0 > > > I've got a PDF from, where parsing the metadata fails with an > IllegalArgumentException > {code:java} > java.lang.IllegalArgumentException: prefix cannot be "null" when creating a > QName > at java.xml/javax.xml.namespace.QName.(QName.java:192) > at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99) > at > org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306) > at > org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250) > at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201) > at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112) > {code} > This can be reproduced with a simple test, using the extracted metadata: > {code:java} > @Test > void testDomXmpParser() throws XmpParsingException > { > // taken from file test-landscape2.pdf > String xmpmeta = " standalone=\"no\"?>\n" + > " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" + > " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" + > " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" + > " 3\n" + > " A\n" + > " \n" + > " xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\; > xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\; > xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; > xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; > xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" + > " \n" + > 
"xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" + > "\n" + > " \n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension > Schema\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n" > + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" + > " \n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML > invoice file\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.
[jira] [Updated] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5835:
    Affects Version/s: 2.0.31

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Oliver Schmidtmer
> Priority: Major
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855075#comment-17855075 ]

Tilman Hausherr commented on PDFBOX-5835:

I have created a reduced file from yours and have verified that the two modifications are hit and have an effect. I will add a test with that file.

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 3.0.2 PDFBox
> Reporter: Oliver Schmidtmer
> Priority: Major
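The exception reported in PDFBOX-5835 is raised by the `javax.xml.namespace.QName` constructor, which rejects a null prefix. A namespace-aware DOM parser returns null from `getPrefix()` for an element that relies on a default namespace (`xmlns="..."`), which is the pattern visible in the reported metadata. The following is only a minimal sketch of that mechanism and of one possible guard. The class `QNamePrefixSketch`, its helpers and the `example` element name are hypothetical, and this is not necessarily the change that was made in xmpbox.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class QNamePrefixSketch
{
    // Hypothetical helper (not the xmpbox code): substitute the empty
    // default prefix when the DOM element carries no prefix at all.
    static QName toQName(Element element)
    {
        String prefix = element.getPrefix();
        if (prefix == null)
        {
            prefix = XMLConstants.DEFAULT_NS_PREFIX; // the empty string
        }
        return new QName(element.getNamespaceURI(), element.getLocalName(), prefix);
    }

    // Parses an XML string with namespace awareness and returns the root element.
    static Element parseRoot(String xml) throws Exception
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement();
    }

    public static void main(String[] args) throws Exception
    {
        // "example" is a made-up element name; only the default-namespace
        // declaration mirrors the metadata from the report.
        Element root = parseRoot(
                "<example xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">zf</example>");

        try
        {
            // Passing the null prefix straight through fails.
            new QName(root.getNamespaceURI(), root.getLocalName(), root.getPrefix());
        }
        catch (IllegalArgumentException ex)
        {
            System.out.println("unguarded: " + ex.getMessage());
        }

        QName guarded = toQName(root);
        System.out.println("guarded: " + guarded);
    }
}
```

Whether an empty prefix is the right fallback for the XMP parser depends on how the resulting `QName` is used later, so this should be read as an illustration of the failure mode rather than a recommended patch.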
[jira] [Resolved] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)
[ https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-5840.
    Resolution: Fixed

> When splitting, keep named page destinations that are part of target document(s)
> --------------------------------------------------------------------------------
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
> Keep named destinations. The current code just ignores them. I wrote some 40 lines that would create a name tree in the destination document, but this didn't work because the destination name gets modified when retrieved as a string. So I just keep the actual destination and forget the name, which is a single code line. It's a new document anyway and the average user expectation is that the links "just work".

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Comment Edited] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog
[ https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854917#comment-17854917 ]

Tilman Hausherr edited comment on PDFBOX-5834 at 6/14/24 1:51 PM:

I'd like to see an example of such a PDF. And I'm also wondering whether the current solution misses named destinations, which would be a more common problem.

(update: done in PDFBOX-5840)

was (Author: tilman):
I'd like to see an example of such a PDF. And I'm also wondering whether the current solution misses named destinations, which would be a more common problem.

> [PATCH] PDF split missing names from documentCatalog
> -----------------------------------------------------
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
> Issue Type: Bug
> Reporter: Simon Steiner
> Priority: Major
> Attachments: tmp.patch
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used to store pdf templates
[jira] [Commented] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)
[ https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855042#comment-17855042 ]

Tilman Hausherr commented on PDFBOX-5840:

Copyright: the document is published by the USDA, see
https://web.archive.org/web/20050411153046/http://www.nal.usda.gov/awic/pubs/Fishwelfare/
which links to
https://web.archive.org/web/20050411172414/http://www.nal.usda.gov/awic/pubs/Fishwelfare/culture.htm
which links to
https://web.archive.org/web/20050411212421/http://www.nal.usda.gov/awic/pubs/Fishwelfare/aquar.htm
which links to our PDF.

> When splitting, keep named page destinations that are part of target document(s)
> --------------------------------------------------------------------------------
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
> Keep named destinations. The current code just ignores them. I wrote some 40 lines that would create a name tree in the destination document, but this didn't work because the destination name gets modified when retrieved as a string. So I just keep the actual destination and forget the name, which is a single code line. It's a new document anyway and the average user expectation is that the links "just work".
[jira] [Created] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)
Tilman Hausherr created PDFBOX-5840:

             Summary: When splitting, keep named page destinations that are part of target document(s)
                 Key: PDFBOX-5840
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5840
             Project: PDFBox
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 3.0.2 PDFBox, 2.0.31
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
         Attachments: 410609.pdf, named-dest-handling abandoned code.txt

Keep named destinations. The current code just ignores them. I wrote some 40 lines that would create a name tree in the destination document, but this didn't work because the destination name gets modified when retrieved as a string. So I just keep the actual destination and forget the name, which is a single code line. It's a new document anyway and the average user expectation is that the links "just work".
[jira] [Updated] (PDFBOX-5840) When splitting, keep named page destinations that are part of target document(s)
[ https://issues.apache.org/jira/browse/PDFBOX-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5840:
    Attachment: named-dest-handling abandoned code.txt

> When splitting, keep named page destinations that are part of target document(s)
> --------------------------------------------------------------------------------
>
> Key: PDFBOX-5840
> URL: https://issues.apache.org/jira/browse/PDFBOX-5840
> Project: PDFBox
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Attachments: 410609.pdf, named-dest-handling abandoned code.txt
>
> Keep named destinations. The current code just ignores them. I wrote some 40 lines that would create a name tree in the destination document, but this didn't work because the destination name gets modified when retrieved as a string. So I just keep the actual destination and forget the name, which is a single code line. It's a new document anyway and the average user expectation is that the links "just work".
Re: PDFBox 2.0.32 release
Result: https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz

From what I see, nothing to do. And I know the time it takes: 3 hours for the A (or B) test, 1 hour to create the A vs B report (tika-eval).

Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests, locally reverting the change from PDFBOX-5790 but locally adding my proposed xmpbox change from PDFBOX-5835. This way we'll know whether there are other problems.

Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:
See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests. The exceptions part looks good, but I'm afraid we have a text extraction issue.

commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI
Some of the special characters changed. In 2.0.31 they were "omitted" and in 2.0.32 there is some special char. But the remaining part looks good to me.

cc-main-2021-31-pdf-untruncated/0085/0085885.pdf
It seems to contain some special characters as well, but 2.0.31 is able to extract them. 2.0.32 seems to mix some of the content. I guess it is somehow font related.

Need to investigate more.

Andreas

On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there isn't any trouble I'll have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,

IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas
[jira] [Resolved] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-5839.
    Fix Version/s: 2.0.32
                   3.0.3 PDFBox
                   4.0.0
         Assignee: Tilman Hausherr
       Resolution: Fixed

ok thanks. I have added both files to my local rendering test set.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png, image-2024-06-14-16-35-39-381.png, image-2024-06-14-16-39-47-557.png
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
[jira] [Commented] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854991#comment-17854991 ]

Tilman Hausherr commented on PDFBOX-5839:

Yeah, I fixed many in the first commit series, but then I didn't fix the rest because I had used a narrow search string to copy my change everywhere and then forgot to use a general search to find other occurrences. Thanks. I hope this time I got all of them.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png, image-2024-06-14-16-35-39-381.png, image-2024-06-14-16-39-47-557.png
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
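The thread does not show the actual change for PDFBOX-5839, so the following is only a generic sketch of the failure mode named in the issue title: an unchecked cast of a value that can be a "null object" instead of a dictionary, and an `instanceof` guard that avoids it. The types `Base`, `NullObject` and `Dict` and the key `"Resources"` are stand-ins invented for this illustration. They are not the PDFBox `COSBase`, `COSNull` or `COSDictionary` classes.

```java
import java.util.HashMap;
import java.util.Map;

class SafeCastSketch
{
    // Stand-in types for illustration only; NOT the PDFBox COS classes.
    static class Base
    {
    }

    static final class NullObject extends Base
    {
        static final NullObject NULL = new NullObject();
    }

    static final class Dict extends Base
    {
        final Map<String, Base> items = new HashMap<>();
    }

    // Unsafe pattern: throws ClassCastException when the entry is a NullObject.
    static Dict unsafeGet(Dict parent, String key)
    {
        return (Dict) parent.items.get(key);
    }

    // Guarded pattern: returns null unless the entry really is a dictionary.
    static Dict safeGet(Dict parent, String key)
    {
        Base value = parent.items.get(key);
        return value instanceof Dict ? (Dict) value : null;
    }

    public static void main(String[] args)
    {
        Dict page = new Dict();
        page.items.put("Resources", NullObject.NULL); // hypothetical key

        System.out.println(safeGet(page, "Resources")); // prints "null"
        try
        {
            unsafeGet(page, "Resources");
        }
        catch (ClassCastException ex)
        {
            System.out.println("unsafe cast failed"); // prints "unsafe cast failed"
        }
    }
}
```

The comment above about a "narrow search string" suggests why such guards are easy to miss: the same cast can appear in slightly different spellings across a code base, so each call site has to be found and checked individually.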
[jira] [Commented] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854974#comment-17854974 ]

Tilman Hausherr commented on PDFBOX-5839:

I didn't get an exception abort with file 2, but the change fixes the potential exception from the image.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
[jira] [Updated] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5839:
    Description:
[^1.pdf][^2.pdf]
^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.

  was:
[^1.pdf][^2.pdf]
^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
When converting 1.pdf and 2.pdf, there will be a ClassCastException problem.

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When rendering 1.pdf and 2.pdf, there will be a ClassCastException problem.
[jira] [Updated] (PDFBOX-5839) ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5839:
    Component/s: Rendering

> ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5839
> URL: https://issues.apache.org/jira/browse/PDFBOX-5839
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 1.pdf, 2.pdf, image-2024-06-14-15-36-01-099.png
>
> [^1.pdf][^2.pdf]
> ^!image-2024-06-14-15-36-01-099.png|width=395,height=214!^
> When converting 1.pdf and 2.pdf, there will be a ClassCastException problem.
Re: PDFBox 2.0.32 release
I'll repeat the regression tests, locally reverting the change from PDFBOX-5790 but locally adding my proposed xmpbox change from PDFBOX-5835. This way we'll know whether there are other problems.

Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:
See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests. The exceptions part looks good, but I'm afraid we have a text extraction issue.

commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI
Some of the special characters changed. In 2.0.31 they were "omitted" and in 2.0.32 there is some special char. But the remaining part looks good to me.

cc-main-2021-31-pdf-untruncated/0085/0085885.pdf
It seems to contain some special characters as well, but 2.0.31 is able to extract them. 2.0.32 seems to mix some of the content. I guess it is somehow font related.

Need to investigate more.

Andreas

On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there isn't any trouble I'll have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,

IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas
[jira] [Commented] (PDFBOX-5834) [PATCH] PDF split missing names from documentCatalog
[ https://issues.apache.org/jira/browse/PDFBOX-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854917#comment-17854917 ]

Tilman Hausherr commented on PDFBOX-5834:

I'd like to see an example of such a PDF. And I'm also wondering whether the current solution misses named destinations, which would be a more common problem.

> [PATCH] PDF split missing names from documentCatalog
> -----------------------------------------------------
>
> Key: PDFBOX-5834
> URL: https://issues.apache.org/jira/browse/PDFBOX-5834
> Project: PDFBox
> Issue Type: Bug
> Reporter: Simon Steiner
> Priority: Major
> Attachments: tmp.patch
>
> java -jar app/target/pdfbox-app-2.0.32-SNAPSHOT.jar PDFSplit xxx.pdf
> I would expect to see the names dict inside the documentCatalog which is used to store pdf templates
Re: PDFBox 2.0.32 release
See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests. The exceptions part looks good, but I'm afraid we have a text extraction issue.

commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI
Some of the special characters changed. In 2.0.31 they were "omitted" and in 2.0.32 there is some special char. But the remaining part looks good to me.

cc-main-2021-31-pdf-untruncated/0085/0085885.pdf
It seems to contain some special characters as well, but 2.0.31 is able to extract them. 2.0.32 seems to mix some of the content. I guess it is somehow font related.

Need to investigate more.

Andreas

On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there isn't any trouble I'll have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1 but I won't be able to help with tests this time

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,

IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so.

Any objections or is there something we should add/fix first?

Andreas
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854765#comment-17854765 ]

Tilman Hausherr commented on PDFBOX-5835:

My problem is that I'd like to have a non-copyrighted file for tests and to put into our repository, although I don't know if your file can be copyrighted at all, assuming that it was created by a machine.

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> --------------------------------------------------------------------------------------
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 3.0.2 PDFBox
> Reporter: Oliver Schmidtmer
> Priority: Major
[jira] [Comment Edited] (PDFBOX-3117) Left margin cut off when printing
[ https://issues.apache.org/jira/browse/PDFBOX-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854743#comment-17854743 ]

Tilman Hausherr edited comment on PDFBOX-3117 at 6/13/24 12:58 PM:

Back to this because of this question:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative value.

Landscape labels can be printed without rotation if PORTRAIT is used as a parameter.

was (Author: tilman):
Back to this:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative value.

Landscape labels can be printed without rotation if PORTRAIT is used as a parameter.

> Left margin cut off when printing
> ---------------------------------
>
> Key: PDFBOX-3117
> URL: https://issues.apache.org/jira/browse/PDFBOX-3117
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.8.10, 1.8.11, 2.0.0
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: print, printing
> Attachments: PDFBOX-3117-1468001565.pdf, PDFBOX-3117.pdf
>
> This is about the margin problem when printing that was mentioned on the user mailing list. What I know at this time:
> - media box is (0 0 233.29 3600)
> - used fonts: Times-Roman and ArialUnicodeMS not embedded
> Effect happens with a real printer, but not when printing to PDF or to XPS.
> First todo is to create such a file in the hope of getting the effect because the file can't be shared.
[jira] [Commented] (PDFBOX-3117) Left margin cut off when printing
[ https://issues.apache.org/jira/browse/PDFBOX-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854743#comment-17854743 ]

Tilman Hausherr commented on PDFBOX-3117:
---
Back to this:
https://lists.apache.org/thread/12s9tc93ofgmjfq1dpqfps9p725l0wwr

I'm adding a check to disable centering if the translation has a negative value. Landscape labels can be printed without rotation if PORTRAIT is used as a parameter.
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854703#comment-17854703 ]

Tilman Hausherr commented on PDFBOX-5835:
---
I'm waiting for PDFBOX-5838 to be fixed to commit my proposed change. After PDFBOX-5838 is fixed and there are no further problems, I'd like to test the change to see if it gets better or worse. Ideally there should be fewer exceptions.

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
> ---
>
> Key: PDFBOX-5835
> URL: https://issues.apache.org/jira/browse/PDFBOX-5835
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 3.0.2 PDFBox
> Reporter: Oliver Schmidtmer
> Priority: Major
>
> I've got a PDF from, where parsing the metadata fails with an
> IllegalArgumentException
> {code:java}
> java.lang.IllegalArgumentException: prefix cannot be "null" when creating a QName
> at java.xml/javax.xml.namespace.QName.<init>(QName.java:192)
> at org.apache.xmpbox.xml.DomHelper.getQName(DomHelper.java:99)
> at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:306)
> at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:250)
> at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)
> at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:112)
> {code}
> This can be reproduced with a simple test, using the extracted metadata:
> {code:java}
> @Test
> void testDomXmpParser() throws XmpParsingException
> {
> // taken from file test-landscape2.pdf
> String xmpmeta = " standalone=\"no\"?>\n" +
> " id=\"W5M0MpCehiHzreSzNTczkc9d\"?> x:xmptk=\"FIS/xee\">\n" +
> " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\;>\n" +
> " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\;>\n" +
> " 3\n" +
> " A\n" +
> " \n" +
> " xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\;
> xmlns:pdfaField=\"http://www.aiim.org/pdfa/ns/field#\;
>
xmlns:pdfaProperty=\"http://www.aiim.org/pdfa/ns/property#\; > xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\; > xmlns:pdfaType=\"http://www.aiim.org/pdfa/ns/type#\; rdf:about=\"\"/>\n" + > " \n" + > "xmlns=\"http://www.aiim.org/pdfa/ns/extension/\;>\n" + > "\n" + > " \n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>ZUGFeRD PDFA Extension > Schema\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#\n" > + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>zf\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/schema#\;>\n" + > " \n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentFileName\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>name of the embedded XML > invoice file\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>DocumentType\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Text\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>external\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>INVOICE\n" + > "\n" + > "\n" + > " xmlns=\"http://www.aiim.org/pdfa/ns/property#\;>Version\n" + >
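The exception in the stack trace above comes from the JDK's `javax.xml.namespace.QName` constructor, which rejects a `null` prefix. The following minimal sketch shows that behaviour and one possible defensive mapping of `null` to the empty default prefix. The class and method names are hypothetical, and the mapping is only an illustration; it is not the change proposed in the issue.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

public class QNamePrefixSketch {
    /** Returns true if the three-argument QName constructor rejects a null prefix. */
    static boolean rejectsNullPrefix(String namespaceUri, String localName) {
        try {
            new QName(namespaceUri, localName, null);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Hypothetical workaround: treat a missing prefix as the default (empty) prefix. */
    static QName withSafePrefix(String namespaceUri, String localName, String prefix) {
        String safePrefix = (prefix == null) ? XMLConstants.DEFAULT_NS_PREFIX : prefix;
        return new QName(namespaceUri, localName, safePrefix);
    }

    public static void main(String[] args) {
        String ns = "http://www.aiim.org/pdfa/ns/schema#";
        System.out.println(rejectsNullPrefix(ns, "schema"));
        System.out.println(withSafePrefix(ns, "schema", null));
    }
}
```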
[jira] [Comment Edited] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854692#comment-17854692 ]

Tilman Hausherr edited comment on PDFBOX-5838 at 6/13/24 9:56 AM:
---
Another file, just to see if it is the same bug:

OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf: font F3, the A glyph is decoded as $.

was (Author: tilman):
More files, just to see if it is the same bug:

OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf: font F3, the A glyph is decoded as $.

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> ---
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, PDFBOX-5838-0024320-reduced.pdf
>
> discovered in 2.0.32 regression tests
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854724#comment-17854724 ]

Tilman Hausherr commented on PDFBOX-5835:
---
[~O.Schmidtmer] the xmp isn't from "test-landscape2.pdf", at least not from ours. It seems to be some ZUGFeRD template / demo file. Or was it created by "enriching" the original PDF file?
[jira] [Commented] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854711#comment-17854711 ]

Tilman Hausherr commented on PDFBOX-5838:
---
I think it's because of PDFBOX-5790. This might be a tricky decision: Adobe fails to extract the text of the file here, I get "HRQRUV ReVeaUch PURMecW". The other file also fails.
[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app
[ https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5837:
---
Description:
Add an optional {{center}} parameter to the telescopic {{PDFPageable}} constructor and pass it to {{PDFPrintable}}, and add the parameter to the command line class. This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

was:
Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor and pass it to {{PDFPrintable}}, and add the parameter to the comment line class. This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

> Add center constructor parameter to PDFPageable and to pdfbox-app
> ---
>
> Key: PDFBOX-5837
> URL: https://issues.apache.org/jira/browse/PDFBOX-5837
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Labels: print, printing
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Add an optional {{center}} parameter to the telescopic {{PDFPageable}}
> constructor and pass it to {{PDFPrintable}}, and add the parameter to the
> command line class. This may also help with the printing of landscape labels,
> see also
> https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3
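The "telescopic" constructor pattern mentioned in the description can be sketched as follows. This is a stand-alone illustration, not the PDFBox class: `PageableSketch`, its fields, and in particular the default value `true` for `center` are assumptions chosen for the example.

```java
public class PageableSketch {
    enum Orientation { AUTO, LANDSCAPE, PORTRAIT }

    private final Orientation orientation;
    private final boolean showPageBorder;
    private final float dpi;
    private final boolean center;

    // Each shorter constructor delegates to the next longer one,
    // supplying a default for the parameter it omits.
    PageableSketch() {
        this(Orientation.AUTO);
    }

    PageableSketch(Orientation orientation) {
        this(orientation, false);
    }

    PageableSketch(Orientation orientation, boolean showPageBorder) {
        this(orientation, showPageBorder, 0);
    }

    PageableSketch(Orientation orientation, boolean showPageBorder, float dpi) {
        // Assumed default: keep centering on so existing callers see no change.
        this(orientation, showPageBorder, dpi, true);
    }

    // New, longest constructor: the only one that assigns the fields.
    PageableSketch(Orientation orientation, boolean showPageBorder, float dpi, boolean center) {
        this.orientation = orientation;
        this.showPageBorder = showPageBorder;
        this.dpi = dpi;
        this.center = center;
    }

    Orientation getOrientation() { return orientation; }
    boolean isShowPageBorder() { return showPageBorder; }
    float getDpi() { return dpi; }
    boolean isCenter() { return center; }
}
```

Adding the parameter at the end of the chain keeps the existing shorter constructors source-compatible, which is presumably why the issue describes it as an optional parameter.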
[jira] [Resolved] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app
[ https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-5837.
---
Fix Version/s: 4.0.0
Resolution: Fixed
[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5838:
---
Attachment: (was: PDFBOX-5838-0024320.pdf)
[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5838:
---
Attachment: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf
[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5838:
---
Attachment: PDFBOX-5838-0024320-reduced.pdf
[jira] [Updated] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
[ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5838:
---
Attachment: PDFBOX-5838-0024320.pdf
[jira] [Created] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
Tilman Hausherr created PDFBOX-5838:
---
Summary: Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
Key: PDFBOX-5838
URL: https://issues.apache.org/jira/browse/PDFBOX-5838
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.32, 3.0.3 PDFBox
Reporter: Tilman Hausherr
Attachments: PDFBOX-5838-0024320.pdf

discovered in 2.0.32 regression tests
[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app
[ https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5837:
---
Description:
Add optional {{center}} parameter to telescopic {{PDFPageable}} constructor and pass it to {{PDFPrintable}}, and add the parameter to the comment line class. This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

was:
Add center constructor parameter to PDFPageable and pass it to PDFPrintable, and add the parameter to the comment line class. This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3
[jira] [Updated] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app
[ https://issues.apache.org/jira/browse/PDFBOX-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5837:
---
Description:
Add center constructor parameter to PDFPageable and pass it to PDFPrintable, and add the parameter to the comment line class. This may also help with the printing of landscape labels, see also
https://lists.apache.org/thread/oqpzf93onp3ytvgjh4hvkcdty4y4tbd3

was:
Add center constructor parameter to PDFPageable and pass it to PDFPrintable, and add the parameter to the comment line class. This may also help with the printing of landscape labels.
[jira] [Created] (PDFBOX-5837) Add center constructor parameter to PDFPageable and to pdfbox-app
Tilman Hausherr created PDFBOX-5837:
---
Summary: Add center constructor parameter to PDFPageable and to pdfbox-app
Key: PDFBOX-5837
URL: https://issues.apache.org/jira/browse/PDFBOX-5837
Project: PDFBox
Issue Type: Improvement
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 2.0.32, 3.0.3 PDFBox

Add center constructor parameter to PDFPageable and pass it to PDFPrintable, and add the parameter to the comment line class. This may also help with the printing of landscape labels.
Re: PDFBox 2.0.32 release
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz

No new exceptions but many content differences. I haven't investigated yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
> I've started the tests. If there aren't any troubles I'll have the results tomorrow.
>
> Tilman
>
> On 05.06.2024 08:07, Andreas Lehmkühler wrote:
>> Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.
>>
>> Andreas
>>
>> On 02.06.24 at 14:22, Tilman Hausherr wrote:
>>> +1 but I won't be able to help with tests this time
>>>
>>> Tilman
>>>
>>> On 01.06.2024 12:15, Andreas Lehmkühler wrote:
>>>> Hi,
>>>>
>>>> IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so. Any objections or is there something we should add/fix first?
>>>>
>>>> Andreas
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854441#comment-17854441 ]

Tilman Hausherr commented on PDFBOX-5835:
---
I'm able to run the code by adding {{nsFinder.push()}} at two places: below {{Element first = elements.get(0);}} and at the beginning of the loop in {{parseChildrenAsProperties()}}, and also adding pop() at the ends. It doesn't fix PDFBOX-2913.
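The push/pop idea from the comment above can be illustrated with a small scoped-namespace stack: each element opens a scope, registers its own `xmlns` declarations there, and lookups search from the innermost scope outwards. This is a generic sketch only. `NamespaceScopes` and its methods are hypothetical and do not reproduce the `nsFinder` class in xmpbox.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class NamespaceScopes {
    // Head of the deque is the innermost (most recently pushed) scope.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    /** Opens a new scope; call when entering an element. */
    void push() {
        scopes.push(new HashMap<>());
    }

    /** Closes the innermost scope; call when leaving the element. */
    void pop() {
        scopes.pop();
    }

    /** Registers a declaration in the innermost scope ("" is the default prefix). */
    void declare(String prefix, String uri) {
        if (scopes.isEmpty()) {
            throw new IllegalStateException("push() must be called before declare()");
        }
        scopes.peek().put(prefix, uri);
    }

    /** Resolves a prefix, innermost scope first; returns null if undeclared. */
    String lookup(String prefix) {
        for (Map<String, String> scope : scopes) {
            String uri = scope.get(prefix);
            if (uri != null) {
                return uri;
            }
        }
        return null;
    }
}
```

A declaration made on a child element is then visible only until the matching `pop()`, which is the behaviour XML namespace scoping requires.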
Re: PDFBox 2.0.32 release
I've started the tests. If there aren't any troubles I'll have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
> Thanks for the update. I'm going to postpone the release as I'll need any helping hand I can get.
>
> Andreas
>
> On 02.06.24 at 14:22, Tilman Hausherr wrote:
>> +1 but I won't be able to help with tests this time
>>
>> Tilman
>>
>> On 01.06.2024 12:15, Andreas Lehmkühler wrote:
>>> Hi,
>>>
>>> IMHO it is time to cut another 2.0.x release. I'm planning to do so in a week or so. Any objections or is there something we should add/fix first?
>>>
>>> Andreas
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854271#comment-17854271 ]

Tilman Hausherr commented on PDFBOX-5835:
---
I read https://www.w3schools.com/xml/xml_namespaces.asp (because I don't know much about xml) and then I changed the segment to
{code:xml}
http://www.aiim.org/pdfa/ns/extension/;> http://www.aiim.org/pdfa/ns/schema#;>ZUGFeRD PDFA Extension Schema
{code}
and now the exception comes much later:

Schema is not set in this document : http://www.aiim.org/pdfa/ns/schema#

Thus the question is, why does defining the namespace after "

> DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
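To make the namespace question above concrete: with a namespace-aware JDK DOM parser, an element that declares its namespace through a default `xmlns="..."` attribute has a namespace URI but no prefix, so `getPrefix()` returns `null`. The sketch below demonstrates this with standard JDK classes only; the class name, the helper methods and the one-element sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class DefaultNamespaceSketch {
    /** Parses the XML namespace-aware and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static String prefixOf(String xml) {
        return parseRoot(xml).getPrefix();
    }

    static String namespaceOf(String xml) {
        return parseRoot(xml).getNamespaceURI();
    }

    public static void main(String[] args) {
        String defaultNs = "<schema xmlns=\"http://www.aiim.org/pdfa/ns/schema#\">x</schema>";
        String prefixed = "<p:schema xmlns:p=\"http://www.aiim.org/pdfa/ns/schema#\">x</p:schema>";
        System.out.println(prefixOf(defaultNs) + " / " + namespaceOf(defaultNs));
        System.out.println(prefixOf(prefixed) + " / " + namespaceOf(prefixed));
    }
}
```

Code that passes such a `null` prefix straight into a `QName` constructor would fail in the way the issue's stack trace shows.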
[jira] [Commented] (PDFBOX-5836) PDF A-1 falsely validated as invalid for ICC color profile regression
[ https://issues.apache.org/jira/browse/PDFBOX-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854162#comment-17854162 ]

Tilman Hausherr commented on PDFBOX-5836:
---
It works for me with the app, on jdk8 and jdk22 on Windows. What jdk are you using on what OS?

> PDF A-1 falsely validated as invalid for ICC color profile regression
> ---
>
> Key: PDFBOX-5836
> URL: https://issues.apache.org/jira/browse/PDFBOX-5836
> Project: PDFBox
> Issue Type: Bug
> Components: Preflight
> Affects Versions: 3.0.2 PDFBox
> Reporter: Jochen Stärk
> Priority: Major
> Attachments: MustangGnuaccountingBeispielRE-20190610_507blanko.pdf
>
> PreflightParser.validate(theFile.toFile()).isValid() throws a "Unable to
> parse the ICC Profile" on the attached, Libreoffice-generated PDF/A-1.
> VeraPDF validates the file as valid. It worked with PDF 2 and I need it to be
> fixed in context of my upgrade to PDFbox 3
> (https://github.com/ZUGFeRD/mustangproject/issues/373).
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854082#comment-17854082 ]

Tilman Hausherr commented on PDFBOX-5835:
---
Yes maybe there is a difference. If I deactivate the exception throw then the code for this issue succeeds, but not the code for PDFBOX-2913.
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853936#comment-17853936 ]

Tilman Hausherr commented on PDFBOX-5835:
-----------------------------------------

I fixed that bug anyway; I'll try to work on the actual issue at a later time. (I tried last year and didn't succeed)
[jira] [Commented] (PDFBOX-5835) DomXmpParser - IllegalArgumentException: prefix cannot be "null" when creating a QName
[ https://issues.apache.org/jira/browse/PDFBOX-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853759#comment-17853759 ]

Tilman Hausherr commented on PDFBOX-5835:
-----------------------------------------

I can avoid the IllegalArgumentException, but now you'll get

XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/extension/

which is a 9-year-old unfixed bug (PDFBOX-2913). Would this be helpful?
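
For readers following the stack trace in the issue: the exception text is the one the three-argument javax.xml.namespace.QName constructor throws when it is handed a null prefix, and a DOM node declared through a default namespace (xmlns="...", as in the test data) reports a null prefix. The following is only a minimal sketch of that failure and of one possible guard. It is not the actual xmpbox change; the class QNamePrefixSketch and the helper safeQName are hypothetical names, and the element name "schema" is purely illustrative.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

class QNamePrefixSketch {

    // Hypothetical helper (not DomHelper.getQName): substitutes the empty
    // default prefix when the DOM node reported no prefix at all.
    static QName safeQName(String namespaceUri, String localName, String prefix) {
        String effectivePrefix = (prefix == null) ? XMLConstants.DEFAULT_NS_PREFIX : prefix;
        return new QName(namespaceUri, localName, effectivePrefix);
    }

    public static void main(String[] args) {
        String ns = "http://www.aiim.org/pdfa/ns/schema#";

        // Passing a null prefix straight through is rejected by the constructor.
        try {
            new QName(ns, "schema", null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // With the guard, the QName is created with an empty prefix instead.
        QName guarded = safeQName(ns, "schema", null);
        System.out.println("prefix is empty: " + guarded.getPrefix().isEmpty());
    }
}
```

As the comment above notes, getting past this exception does not make the document parse: the next failure is the XmpParsingException about the extension schema tracked in PDFBOX-2913.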