[jira] [Commented] (PDFBOX-5445) Add command line options to accept file containing list of files to merge

2022-05-26 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542756#comment-17542756
 ] 

Tilman Hausherr commented on PDFBOX-5445:
-

I agree that the first half would make sense, but the second part is something 
completely different and very tricky.

> Add command line options to accept file containing list of files to merge
> -
>
> Key: PDFBOX-5445
> URL: https://issues.apache.org/jira/browse/PDFBOX-5445
> Project: PDFBox
> Issue Type: New Feature
> Affects Versions: 3.0.0 PDFBox
> Environment: windows 11, 10
> Reporter: Zbigniew Minciel
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> Users of the free MBox Mail Viewer need to export hundreds of emails to PDF
> and merge them into a single PDF document. Due to command-line length
> limitations, smaller subsets of the PDF mails have to be merged first and
> then merged again, multiple times. Merging must preserve the order of the
> emails. An option to accept a file containing a list of PDF files to merge
> would be very helpful.
> Another very useful option would be to suppress the "Page Break" at the end
> of a PDF document so that multiple small emails can fit on a page. Emails
> within the same thread are frequently small.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Created] (PDFBOX-5445) Add command line options to accept file containing list of files to merge

2022-05-26 Thread Zbigniew Minciel (Jira)
Zbigniew Minciel created PDFBOX-5445:


Summary: Add command line options to accept file containing list
         of files to merge
Key: PDFBOX-5445
URL: https://issues.apache.org/jira/browse/PDFBOX-5445
Project: PDFBox
Issue Type: New Feature
Affects Versions: 3.0.0 PDFBox
Environment: windows 11, 10
Reporter: Zbigniew Minciel
Fix For: 3.0.0 PDFBox


Users of the free MBox Mail Viewer need to export hundreds of emails to PDF
and merge them into a single PDF document. Due to command-line length
limitations, smaller subsets of the PDF mails have to be merged first and
then merged again, multiple times. Merging must preserve the order of the
emails. An option to accept a file containing a list of PDF files to merge
would be very helpful.

Another very useful option would be to suppress the "Page Break" at the end
of a PDF document so that multiple small emails can fit on a page. Emails
within the same thread are frequently small.
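
For illustration only, here is a minimal sketch of how the requested list
file could be read while preserving order. It is not taken from PDFBox: the
class name MergeListReader, the method readMergeList, the one-path-per-line
layout, the UTF-8 encoding, and the convention of skipping blank lines and
lines starting with '#' are all assumptions made for this example. The
returned list could then be handed, in order, to whatever performs the merge
(for example PDFBox's PDFMergerUtility).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class MergeListReader {

    private MergeListReader() {
    }

    /**
     * Reads a list file containing one PDF path per line and returns the
     * paths in the order they appear in the file. Blank lines and lines
     * starting with '#' are skipped (an assumed convention).
     */
    static List<Path> readMergeList(Path listFile) throws IOException {
        List<Path> sources = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Insertion order of the ArrayList keeps the order of the emails.
            sources.add(Paths.get(trimmed));
        }
        return sources;
    }
}
```

Because only the path of the list file has to be passed on the command line,
the number of PDF files to merge is no longer bounded by the command-line
length limit.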






[jira] [Commented] (PDFBOX-5030) Create Migration guide for 3.0.0

2022-05-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542648#comment-17542648
 ] 

ASF subversion and git services commented on PDFBOX-5030:
-

Commit d8585e577fbefc810f5188ec833f039ae26a0a1d in pdfbox-docs's branch 
refs/heads/master from Andreas Lehmkühler
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=d8585e57 ]

PDFBOX-5030: update migration guide


> Create Migration guide for 3.0.0
> 
>
> Key: PDFBOX-5030
> URL: https://issues.apache.org/jira/browse/PDFBOX-5030
> Project: PDFBox
> Issue Type: Task
> Components: Documentation
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> To start educating users about the migration effort needed to get to 3.0.0,
> there should be a migration guide (evolving over time) to prepare for the
> release.






[jira] [Assigned] (PDFBOX-5444) DataFormatException: invalid block type

2022-05-26 Thread Andreas Lehmkühler (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5444:
--

Assignee: Andreas Lehmkühler

> DataFormatException: invalid block type
> ---
>
> Key: PDFBOX-5444
> URL: https://issues.apache.org/jira/browse/PDFBOX-5444
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Andreas Lehmkühler
> Priority: Major
> Labels: regression
> Attachments: 148186.pdf,
>              PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf
>
>
> No page can be displayed in the attached file; this worked in 2.0.26.






[jira] [Created] (PDFBOX-5444) DataFormatException: invalid block type

2022-05-26 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5444:
---

Summary: DataFormatException: invalid block type
Key: PDFBOX-5444
URL: https://issues.apache.org/jira/browse/PDFBOX-5444
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 3.0.0 PDFBox
Reporter: Tilman Hausherr
Attachments: 148186.pdf

No page can be displayed in the attached file; this worked in 2.0.26.






[jira] [Updated] (PDFBOX-5444) DataFormatException: invalid block type

2022-05-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5444:

Attachment: PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf

> DataFormatException: invalid block type
> ---
>
> Key: PDFBOX-5444
> URL: https://issues.apache.org/jira/browse/PDFBOX-5444
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Attachments: 148186.pdf,
>              PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf
>
>
> No page can be displayed in the attached file; this worked in 2.0.26.






Re: text extraction regression tests for 3.x?

2022-05-26 Thread Tim Allison
Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

  Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler wrote:

> On 06.05.22 at 14:30, Tim Allison wrote:
> > All,
> >    Let me know when it makes sense to run the text extraction regression
> > tests for 3.x.
> Yes, it'd be useful to have some updated results.
>
> How about comparing 2.0.26 vs. 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
> 3.0.0-alpha3?
>
>
> > I regret I haven't been following our mailing list as
> > closely as I should be.
> No need to worry, everything is fine.
>
> Andreas
>
> >
> > Cheers,
> >
> > Tim
> >


[jira] [Commented] (PDFBOX-5441) Use org.apache.pdfbox.io for CMapParser

2022-05-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542427#comment-17542427
 ] 

ASF subversion and git services commented on PDFBOX-5441:
-

Commit 1901276 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1901276 ]

PDFBOX-5441: simplify/refactor

> Use org.apache.pdfbox.io for CMapParser
> ---
>
> Key: PDFBOX-5441
> URL: https://issues.apache.org/jira/browse/PDFBOX-5441
> Project: PDFBox
> Issue Type: Improvement
> Components: FontBox
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> Use org.apache.pdfbox.io for CMapParser to simplify the usage of resources 
> when pdfbox interacts with fontbox. Most likely this should reduce the 
> memory/resources footprint as well.


