[jira] [Commented] (PDFBOX-5447) Missing root object specification in trailer

2022-05-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543718#comment-17543718
 ] 

Andreas Lehmkühler commented on PDFBOX-5447:


The issue was introduced with 
[r1900461|http://svn.apache.org/viewvc?view=revision&revision=1900461]

> Missing root object specification in trailer
> 
>
> Key: PDFBOX-5447
> URL: https://issues.apache.org/jira/browse/PDFBOX-5447
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 3.0.0 PDFBox
>
> Attachments: GHOSTSCRIPT-693853-1.pdf
>
>
> Rendering the attached PDF fails with the current trunk. It works 
> with 3.0.0-alpha2.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: text extraction regression tests for 3.x?

2022-05-29 Thread Andreas Lehmkuehler

Thanks Tim,

looks like there are some regressions, see PDFBOX-5444 and PDFBOX-5447.

Maybe there are more to come 

Andreas


On 26.05.22 at 15:04, Tim Allison wrote:

Apologies for my delay.  I ran trunk/3.x on May 12 against 2.0.26.  The
reports are here:
https://corpora.tika.apache.org/base/reports/reports_pdfbox_3x_20220512.tgz

Happy to rerun with a more recent version of trunk.

Cheers,

   Tim

On Sun, May 8, 2022 at 1:21 PM Andreas Lehmkuehler wrote:


On 06.05.22 at 14:30, Tim Allison wrote:

All,
Let me know when it makes sense to run the text extraction regression
tests for 3.x.

Yes, it'd be useful to have some updated results.

How about comparing 2.0.26 vs 3.0.0-alpha3 and maybe 3.0.0-alpha2 vs.
3.0.0-alpha3?

I regret I haven't been following our mailing list as
closely as I should be.

No need to worry, everything is fine.

Andreas

 Cheers,

 Tim



[jira] [Assigned] (PDFBOX-5446) Split package org.apache.pdfbox.io

2022-05-29 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5446:
--

Assignee: Andreas Lehmkühler

> Split package org.apache.pdfbox.io
> --
>
> Key: PDFBOX-5446
> URL: https://issues.apache.org/jira/browse/PDFBOX-5446
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Axel Howind
>Assignee: Andreas Lehmkühler
>Priority: Major
>
> Recently a new Maven module 'io' has been created to remove duplicated code. 
> This module contains the package 'org.apache.pdfbox.io'. However, the 
> 'pdfbox' module also contains classes in the same package. This makes it 
> impossible to use pdfbox in a modularized (Jigsaw) project, because a package 
> must not be split between modules. The result is the following error:
> {noformat}
> the unnamed module reads package org.apache.pdfbox.io from both 
> org.apache.pdfbox and org.apache.pdfbox.io
> {noformat}
> The remaining classes in said package should also be moved to the new 'io' 
> module to solve this.
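
The split-package rule described in the issue can be illustrated with a small
check. This is only a sketch, not part of PDFBox or the JDK: the class
SplitPackageCheck and its method findSplitPackages are hypothetical, and the
package lists in main are assumptions made for illustration. Only the two
module names and the package org.apache.pdfbox.io come from the error message
quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: reports packages that are contained in more than one module. */
final class SplitPackageCheck {

    /**
     * @param packagesByModule module name mapped to the packages that module contains
     * @return package name mapped to the modules containing it, restricted to
     *         packages that appear in two or more modules
     */
    static Map<String, List<String>> findSplitPackages(Map<String, Set<String>> packagesByModule) {
        Map<String, List<String>> owners = new TreeMap<>();
        // Iterate modules in sorted order so the reported module lists are deterministic.
        for (Map.Entry<String, Set<String>> entry : new TreeMap<>(packagesByModule).entrySet()) {
            for (String pkg : entry.getValue()) {
                owners.computeIfAbsent(pkg, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // A package owned by exactly one module is fine; drop those entries.
        owners.values().removeIf(modules -> modules.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        // Assumed layout: the package lists are illustrative, not the real module contents.
        Map<String, Set<String>> layout = Map.of(
                "org.apache.pdfbox", Set.of("org.apache.pdfbox.io", "org.apache.pdfbox.pdmodel"),
                "org.apache.pdfbox.io", Set.of("org.apache.pdfbox.io"));
        findSplitPackages(layout).forEach((pkg, modules) ->
                System.out.println("split package " + pkg + " in " + modules));
    }
}
```

With the assumed layout, the only package reported is org.apache.pdfbox.io,
which matches the situation in the error message: both modules contain it, so
the module system refuses to resolve them together.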






[jira] [Created] (PDFBOX-5447) Missing root object specification in trailer

2022-05-29 Thread Jira
Andreas Lehmkühler created PDFBOX-5447:
--

 Summary: Missing root object specification in trailer
 Key: PDFBOX-5447
 URL: https://issues.apache.org/jira/browse/PDFBOX-5447
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 3.0.0 PDFBox
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler
 Fix For: 3.0.0 PDFBox
 Attachments: GHOSTSCRIPT-693853-1.pdf

Rendering the attached PDF fails with the current trunk. It works with 
3.0.0-alpha2.






[jira] [Resolved] (PDFBOX-5444) DataFormatException: invalid block type

2022-05-29 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5444.

Resolution: Fixed

Thanks for the fast feedback

> DataFormatException: invalid block type
> ---
>
> Key: PDFBOX-5444
> URL: https://issues.apache.org/jira/browse/PDFBOX-5444
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: 148186.pdf, 
> PDFBOX-5444-H6IRRY3AIIPNDY7BGZ5IORSUOPNYRCWD.pdf
>
>
> No page can be displayed in the attached file; this worked in 2.0.26.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543620#comment-17543620
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1901384 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1901384 ]

PDFBOX-4892: update junit

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for improving code quality, using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints from different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543619#comment-17543619
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1901383 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1901383 ]

PDFBOX-4892: update mockito

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for improving code quality, using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints from different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543618#comment-17543618
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1901382 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1901382 ]

PDFBOX-4892: update mockito

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for improving code quality, using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints from different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543614#comment-17543614
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1901381 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1901381 ]

PDFBOX-4892: remove unused import

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for improving code quality, using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints from different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-5446) Split package org.apache.pdfbox.io

2022-05-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543613#comment-17543613
 ] 

Andreas Lehmkühler commented on PDFBOX-5446:


Thanks [~axh]. I wasn't aware of this restriction.

However, I didn't forget to move those classes to the new module; I left them 
there intentionally. The implementation isn't generic but PDFBox-specific. It 
is used to store the data of new COSStreams. For now I'm thinking about moving 
them to another package instead of the new module.

> Split package org.apache.pdfbox.io
> --
>
> Key: PDFBOX-5446
> URL: https://issues.apache.org/jira/browse/PDFBOX-5446
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Axel Howind
>Priority: Major
>
> Recently a new Maven module 'io' has been created to remove duplicated code. 
> This module contains the package 'org.apache.pdfbox.io'. However, the 
> 'pdfbox' module also contains classes in the same package. This makes it 
> impossible to use pdfbox in a modularized (Jigsaw) project, because a package 
> must not be split between modules. The result is the following error:
> {noformat}
> the unnamed module reads package org.apache.pdfbox.io from both 
> org.apache.pdfbox and org.apache.pdfbox.io
> {noformat}
> The remaining classes in said package should also be moved to the new 'io' 
> module to solve this.


