[jira] [Commented] (PDFBOX-5445) Add command line options to accept file containing list of files to merge
[ https://issues.apache.org/jira/browse/PDFBOX-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542756#comment-17542756 ]

Tilman Hausherr commented on PDFBOX-5445:

I agree that the first half would make sense, but the second part is something completely different and very tricky.

> Add command line options to accept file containing list of files to merge
>
>                 Key: PDFBOX-5445
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5445
>             Project: PDFBox
>          Issue Type: New Feature
>    Affects Versions: 3.0.0 PDFBox
>         Environment: windows 11, 10
>            Reporter: Zbigniew Minciel
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
> Users of the free MBox Mail Viewer need to export hundreds of emails to PDF
> and merge them into a single PDF document. Due to command line length
> limitations, smaller subsets of the PDF mails have to be merged first and
> then merged again, multiple times. Merging must preserve the order of the
> emails. An option to accept a file containing a list of PDF files to merge
> would be very helpful.
> Another very useful option would be to suppress the "Page Break" at the end
> of a PDF document to allow multiple small emails to fit on a page.
> Frequently, emails within the same thread are small.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-5445) Add command line options to accept file containing list of files to merge
Zbigniew Minciel created PDFBOX-5445:

             Summary: Add command line options to accept file containing list of files to merge
                 Key: PDFBOX-5445
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5445
             Project: PDFBox
          Issue Type: New Feature
    Affects Versions: 3.0.0 PDFBox
         Environment: windows 11, 10
            Reporter: Zbigniew Minciel
             Fix For: 3.0.0 PDFBox

Users of the free MBox Mail Viewer need to export hundreds of emails to PDF and merge them into a single PDF document. Due to command line length limitations, smaller subsets of the PDF mails have to be merged first and then merged again, multiple times. Merging must preserve the order of the emails. An option to accept a file containing a list of PDF files to merge would be very helpful.

Another very useful option would be to suppress the "Page Break" at the end of a PDF document to allow multiple small emails to fit on a page. Frequently, emails within the same thread are small.
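
For illustration only, the list-file idea requested above could look roughly like the following sketch. It is not how PDFBox implements or plans to implement the option. The file format (one path per line, blank lines and lines starting with "#" ignored) and the names MergeListReader and readMergeList are assumptions made up for this example. The sketch only reads the list and preserves its order; the resulting paths could then be passed, in that order, to PDFMergerUtility.addSource before merging.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a text file that lists the PDFs to merge.
// Assumed format (not defined by PDFBox): one path per line, UTF-8,
// blank lines and lines starting with '#' are skipped.
public class MergeListReader {

    // Returns the listed paths in the same order as they appear in the file,
    // because the merge must preserve the order of the emails.
    static List<Path> readMergeList(Path listFile) throws IOException {
        List<Path> sources = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            sources.add(Paths.get(trimmed));
        }
        return sources;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MergeListReader <list-file>");
            return;
        }
        // Print the sources in merge order; a real tool would hand each one
        // to the merger instead of printing it.
        for (Path source : readMergeList(Paths.get(args[0]))) {
            System.out.println(source);
        }
    }
}
```

Reading the names from a file sidesteps the command line length limit, since only the path of the list file is passed as an argument.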
[jira] [Commented] (PDFBOX-5030) Create Migration guide for 3.0.0
[ https://issues.apache.org/jira/browse/PDFBOX-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542648#comment-17542648 ]

ASF subversion and git services commented on PDFBOX-5030:

Commit d8585e577fbefc810f5188ec833f039ae26a0a1d in pdfbox-docs's branch refs/heads/master from Andreas Lehmkühler
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=d8585e57 ]

PDFBOX-5030: update migration guide

> Create Migration guide for 3.0.0
>
>                 Key: PDFBOX-5030
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5030
>             Project: PDFBox
>          Issue Type: Task
>          Components: Documentation
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
> To start educating users about the migration effort needed to get to 3.0.0,
> there should be a migration guide (evolving over time) to prepare for the
> release.
[jira] [Assigned] (PDFBOX-5444) DataFormatException: invalid block type
[ https://issues.apache.org/jira/browse/PDFBOX-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-5444:

    Assignee: Andreas Lehmkühler

> DataFormatException: invalid block type
>
>                 Key: PDFBOX-5444
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5444
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: regression
>         Attachments: 148186.pdf,
> PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf
>
> No page can be displayed in the attached file; this worked in 2.0.26.
[jira] [Created] (PDFBOX-5444) DataFormatException: invalid block type
Tilman Hausherr created PDFBOX-5444:

             Summary: DataFormatException: invalid block type
                 Key: PDFBOX-5444
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5444
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 3.0.0 PDFBox
            Reporter: Tilman Hausherr
         Attachments: 148186.pdf

No page can be displayed in the attached file; this worked in 2.0.26.
[jira] [Updated] (PDFBOX-5444) DataFormatException: invalid block type
[ https://issues.apache.org/jira/browse/PDFBOX-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-5444:

    Attachment: PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf

> DataFormatException: invalid block type
>
>                 Key: PDFBOX-5444
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5444
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>         Attachments: 148186.pdf,
> PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf
>
> No page can be displayed in the attached file; this worked in 2.0.26.
Re: text extraction regression tests for 3.x?
Apologies for my delay. I ran trunk/3.x on May 12 against 2.0.26. The reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler wrote:

> On 06.05.22 at 14:30, Tim Allison wrote:
> > All,
> >    Let me know when it makes sense to run the text extraction regression
> Yes, it'd be useful to have some updated results.
>
> How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
> 3.0.0-alpha3?
>
> > tests for 3.x. I regret I haven't been following our mailing list as
> > closely as I should be.
> No need to worry, everything is fine.
>
> Andreas
>
> > Cheers,
> >
> > Tim
[jira] [Commented] (PDFBOX-5441) Use org.apache.pdfbox.io for CMapParser
[ https://issues.apache.org/jira/browse/PDFBOX-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542427#comment-17542427 ]

ASF subversion and git services commented on PDFBOX-5441:

Commit 1901276 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1901276 ]

PDFBOX-5441: simplify/refactor

> Use org.apache.pdfbox.io for CMapParser
>
>                 Key: PDFBOX-5441
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5441
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: FontBox
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
> Use org.apache.pdfbox.io for CMapParser to simplify the usage of resources
> when pdfbox interacts with fontbox. Most likely this should reduce the
> memory/resources footprint as well.