
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-05 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887125#comment-17887125
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921134 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921134 ]

PDFBOX-5660: update junit, download-maven-plugin

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
> Attachments: AnnotationSample.Standard.pdf,
> DRY_refactoring_Typ2CharStringParser.patch,
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
> Simplify_string_conversion_in_PDFHighlighter.patch,
> Update_string_handling_and_regex_in_several_classes.patch,
> avoid_multiple_unboxing.patch, code_cleanup.patch,
> do_not_create_temporary_File_instance.patch,
> extract_common_code,_move_toUpperCase()_out_of_loop.patch,
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch,
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
> introduce_StringUtil_class_for_reusable_functionality.patch,
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
> make_inner_class_static.patch, refactor_isEndOfName.patch,
> remove_code_duplication_in_Type2CharStringParser.patch,
> remove_obsolete_class_NullOutputStream.patch,
> remove_unnecessary_calls_to_toString()_String_valueOf().patch,
> replace_System_getProperty()_calls.patch, screenshot-1.png,
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
> simplify_stream_operations.patch, use_Map_ofEntries().patch,
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch,
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
> use_String_join().patch, use_switch_for_readability.patch,
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for improving code quality, using the SonarQube
> report, hints in different IDEs, the FindBugs tool and other code quality
> tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-05 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887123#comment-17887123
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921133 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921133 ]

PDFBOX-5660: update junit; add comment


[jira] [Resolved] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5881.
-
Resolution: Fixed

> CVE for Lucene libraries
> 
>
> Key: PDFBOX-5881
> URL: https://issues.apache.org/jira/browse/PDFBOX-5881
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.33, 3.0.4 PDFBox
>
>
> It looks like Lucene won't release any patched older jar files that fix
> CVE-2024-45772, so I'll add a suppression file.




[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887003#comment-17887003
 ] 

ASF subversion and git services commented on PDFBOX-5881:
-

Commit 1921120 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1921120 ]

PDFBOX-5881: add suppressions.xml file


[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887001#comment-17887001
 ] 

ASF subversion and git services commented on PDFBOX-5881:
-

Commit 1921118 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921118 ]

PDFBOX-5881: add comment


[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887002#comment-17887002
 ] 

ASF subversion and git services commented on PDFBOX-5881:
-

Commit 1921119 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1921119 ]

PDFBOX-5881: add comment


[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886999#comment-17886999
 ] 

ASF subversion and git services commented on PDFBOX-5881:
-

Commit 1921117 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921117 ]

PDFBOX-5881: add suppressions.xml file


[jira] [Created] (PDFBOX-5881) CVE for Lucene libraries

2024-10-04 Thread Tilman Hausherr (Jira)

Tilman Hausherr created PDFBOX-5881:
---

Summary: CVE for Lucene libraries
Key: PDFBOX-5881
URL: https://issues.apache.org/jira/browse/PDFBOX-5881
Project: PDFBox
Issue Type: Bug
Affects Versions: 3.0.3 PDFBox, 2.0.32
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 2.0.33, 3.0.4 PDFBox


It looks like Lucene won't release any patched older jar files that fix
CVE-2024-45772, so I'll add a suppression file.




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886984#comment-17886984
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921114 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921114 ]

PDFBOX-5660: update lucene


[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886939#comment-17886939
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921107 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921107 ]

PDFBOX-5660: update lucene


[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886930#comment-17886930
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921106 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921106 ]

PDFBOX-5660: update lucene


[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-04 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886928#comment-17886928
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921105 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921105 ]

PDFBOX-5660: update lucene


[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Andreas Lehmkühler (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886738#comment-17886738
 ] 

Andreas Lehmkühler commented on PDFBOX-4718:


I expected that. I'll have a look ... thanks again for the fast feedback.

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.17, 3.0.3 PDFBox, 4.0.0
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
> Reporter: Serhii Kolesnyk
> Assignee: Andreas Lehmkühler
> Priority: Blocker
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>
> While rendering a PDF we receive _java.lang.OutOfMemoryError: Java heap space_
> {code:java}
> Exception in thread "AWT-Shutdown" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "AWT-Shutdown" java.lang.OutOfMemoryError: Java heap space
>  at java.desktop/sun.awt.AppContext.getAppContexts(AppContext.java:167)
>  at java.desktop/sun.awt.AppContext.stopEventDispatchThreads(AppContext.java:610)
>  at java.desktop/sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:322)
>  at java.base/java.lang.Thread.run(Thread.java:834)
> java.lang.OutOfMemoryError: Java heap space
>  at java.desktop/sun.awt.geom.AreaOp.pruneEdges(AreaOp.java:362)
>  at java.desktop/sun.awt.geom.AreaOp.calculate(AreaOp.java:159)
>  at java.desktop/java.awt.geom.Area.intersect(Area.java:293)
>  at org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.intersectClippingPath(PDGraphicsState.java:618)
>  at org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.intersectClippingPath(PDGraphicsState.java:597)
>  at org.apache.pdfbox.rendering.PageDrawer.endPath(PageDrawer.java:936)
>  at org.apache.pdfbox.contentstream.operator.graphics.EndPath.process(EndPath.java:35)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:869)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:505)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:479)
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:152)
>  at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:262)
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:314)
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243)
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:229)
> {code}
> We tried different MemoryUsageSetting values (TempFileOnly, MainMemoryOnly)
> and different DPI settings.
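
For readers unfamiliar with the JDK call at the top of the second stack trace: the error surfaces inside java.awt.geom.Area.intersect, reached from PDGraphicsState.intersectClippingPath. The following is only a minimal sketch of that JDK operation on two simple rectangles. The class name ClipIntersectSketch, the helper intersectClip and the shapes are hypothetical and chosen for illustration; this is not PDFBox's implementation, and it does not reproduce the reported OutOfMemoryError.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

public class ClipIntersectSketch {

    // Hypothetical helper: narrows a clip region by intersecting it with a
    // new path. Works on a copy so the caller's Area is left unmodified.
    static Area intersectClip(Area currentClip, Area newPath) {
        Area result = new Area(currentClip); // copy constructor taking a Shape
        result.intersect(newPath);           // the JDK call seen in the stack trace
        return result;
    }

    public static void main(String[] args) {
        // Two overlapping squares standing in for a clip and a new clipping path.
        Area clip = new Area(new Rectangle2D.Double(0, 0, 100, 100));
        Area path = new Area(new Rectangle2D.Double(50, 50, 100, 100));

        Area narrowed = intersectClip(clip, path);
        System.out.println("Narrowed clip bounds: " + narrowed.getBounds2D());
    }
}
```

With simple shapes like these the intersection is cheap. The trace in the report suggests the memory is consumed inside the JDK's area arithmetic (AreaOp.pruneEdges), presumably when the paths involved are far more complex than in this sketch.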




[jira] [Comment Edited] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737
 ] 

Tilman Hausherr edited comment on PDFBOX-4718 at 10/3/24 5:39 PM:
--

Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW 
logo missing), PDFBOX-3116.pdf (half-circles bottom right)


was (Author: tilman):
Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW 
logo missing), PDFBOX-3116.pdf (circles bottom right)


[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737
 ] 

Tilman Hausherr commented on PDFBOX-4718:
-

Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW 
logo missing), PDFBOX-3116.pdf (circles bottom right)


[jira] [Commented] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2024-10-03 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886734#comment-17886734
 ] 

Andreas Lehmkühler commented on PDFBOX-4743:


My changes from PDFBOX-4718 speed up the rendering by a factor of 3. On my 
machine it takes about 14-15 seconds to render [^slow_rendering.pdf].

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: image-2020-01-18-04-11-00-132.png, slow_rendering.pdf, 
> without_images.pdf, without_text.pdf
>
>
> Hi Team.
>  
> We have found a PDF that takes a long time to render images.
>  
> After some checking, we found that one page takes more than 2 minutes to 
> render, but if we remove the font information and render the PDF without 
> text, it takes 3 seconds.
>  
> Just looking at the font information, it doesn't seem to be a lot of data: 
> 3-5 KB per font, and there are only about seven fonts defined. So there must 
> be something else that complicates things.
>  
> Best regards
> Daniel




[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4743:
---
Fix Version/s: 2.0.33
   3.0.4 PDFBox
   4.0.0

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: image-2020-01-18-04-11-00-132.png, slow_rendering.pdf, 
> without_images.pdf, without_text.pdf
>
>




[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4718:
---
Fix Version/s: 2.0.33
   3.0.4 PDFBox

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.17, 3.0.3 PDFBox, 4.0.0
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4718:
---
Affects Version/s: 3.0.3 PDFBox
   4.0.0
   (was: 2.0.12)

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.17, 3.0.3 PDFBox, 4.0.0
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4743:
---
Affects Version/s: 3.0.3 PDFBox
   2.0.32
   4.0.0

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.32, 3.0.3 PDFBox, 4.0.0
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Attachments: image-2020-01-18-04-11-00-132.png, slow_rendering.pdf, 
> without_images.pdf, without_text.pdf
>
>




[jira] [Assigned] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-4743:
--

Assignee: Andreas Lehmkühler

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Attachments: image-2020-01-18-04-11-00-132.png, slow_rendering.pdf, 
> without_images.pdf, without_text.pdf
>
>




[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886732#comment-17886732
 ] 

Andreas Lehmkühler commented on PDFBOX-4718:


I've found a workaround so that the attached PDF is rendered in about 4-5 
seconds.

Some details:
* I calculate an overall bounding box by intersecting the bounding boxes of all 
clipping paths
* the overall bounding box is used as the starting point for calculating the 
intersected clipping path. This can reduce the complexity in some cases, so 
that the call to Area.intersect needs less time and fewer resources
* clipping paths that are rectangular are skipped, as they were already 
taken into account when calculating the overall bounding box
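
The steps listed above can be sketched roughly as follows. This is only an 
illustration using the standard java.awt.geom classes, not the code from the 
actual commit; the class name ClipIntersectSketch and the method intersectAll 
are hypothetical, and it assumes all clipping paths are available as a list 
up front.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: illustrates the bounding-box idea only.
final class ClipIntersectSketch
{
    static Area intersectAll(List<? extends Shape> clippingPaths)
    {
        if (clippingPaths.isEmpty())
        {
            return new Area();
        }
        // Step 1: intersect the bounding boxes of all clipping paths.
        Rectangle2D bounds = clippingPaths.get(0).getBounds2D();
        for (Shape path : clippingPaths)
        {
            bounds = bounds.createIntersection(path.getBounds2D());
            if (bounds.isEmpty())
            {
                // The bounding boxes do not overlap, so nothing is visible.
                return new Area();
            }
        }
        // Step 2: start from the overall bounding box instead of the first path.
        Area result = new Area(bounds);
        for (Shape path : clippingPaths)
        {
            Area area = new Area(path);
            // Step 3: a rectangular path equals its own bounding box,
            // which is already part of the overall bounding box.
            if (area.isRectangular())
            {
                continue;
            }
            result.intersect(area);
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<Shape> paths = Arrays.asList(
                new Rectangle2D.Double(0, 0, 10, 10),
                new Ellipse2D.Double(0, 0, 20, 20));
        Area clip = intersectAll(paths);
        System.out.println("bounds of intersected clip: " + clip.getBounds2D());
    }
}
```

The result is the same as intersecting every path one after another, because 
a rectangular path contributes nothing beyond its bounding box; the saving 
comes from giving Area.intersect a small rectangular starting area and from 
skipping the rectangular paths entirely.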

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.12, 2.0.17
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886731#comment-17886731
 ] 

ASF subversion and git services commented on PDFBOX-4718:
-

Commit 1921096 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921096 ]

PDFBOX-4718: optimize intersection of clipping paths

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.12, 2.0.17
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-4718:
---
Fix Version/s: 4.0.0

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.12, 2.0.17
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Assigned] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI

2024-10-03 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-4718:
--

Assignee: Andreas Lehmkühler

> OutOfMemoryError - during renderImageWithDPI
> 
>
> Key: PDFBOX-4718
> URL: https://issues.apache.org/jira/browse/PDFBOX-4718
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.12, 2.0.17
> Environment: macOS Mojave (10.14.6)
> Java 11.0.2 -Xmx10G -Xms10G
>Reporter: Serhii Kolesnyk
>Assignee: Andreas Lehmkühler
>Priority: Blocker
> Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, 
> example.pdf, image-2019-12-19-05-55-57-648.png
>
>




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-10-03 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886594#comment-17886594
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921093 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921093 ]

PDFBOX-5660: update mockito

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task of improving code quality, using the 
> SonarQube report, hints from different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up to PDFBOX-4892, which was getting too long.




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885808#comment-17885808
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921026 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921026 ]

PDFBOX-5660: update log4j

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885807#comment-17885807
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921025 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921025 ]

PDFBOX-5660: update log4j

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task of improving code quality, using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.




[jira] [Updated] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.

2024-09-29 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5876:
---
Fix Version/s: (was: 4.0.0)
   (was: 2.0.33)
   (was: 3.0.4 PDFBox)

> This jpeg2000 takes up a lot of memory, causing overflow.
> -
>
> Key: PDFBOX-5876
> URL: https://issues.apache.org/jira/browse/PDFBOX-5876
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.32, 3.0.2 PDFBox
>Reporter: liu
>Assignee: Tilman Hausherr
>Priority: Major
> Attachments: jpeg2000.pdf
>
>
> pdf: [^jpeg2000.pdf]
> JVM: -Xmx600m
> {code:java}
> // code placeholder
> public static void main(String[] args) throws IOException, InterruptedException {
>     File file = new File("C:\\Users\\LYCIT\\Downloads\\jpeg2000.pdf");
>     PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>     PDFRenderer renderer = new PDFRenderer(pdf);
>     int numPages = 0;
>     renderer.setSubsamplingAllowed(true);
>     BufferedImage bi = renderer.renderImage(numPages, 0.5f);
>     pdf.close();
> }
> {code}




[jira] [Resolved] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected e

2024-09-29 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5880.

Resolution: Fixed

The given pdf is a corner case and it works now.

[~tilman] thanks again for your help

[~jezerinac] thanks for the report

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885689#comment-17885689
 ] 

ASF subversion and git services commented on PDFBOX-5880:
-

Commit 1921020 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921020 ]

PDFBOX-5880: set missing/replace invalid stream length

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 2.0.33, 3.0.4 PDFBox
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-29 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5880:
---
Fix Version/s: 4.0.0

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-29 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5880:
---
Fix Version/s: 2.0.33

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 2.0.33, 3.0.4 PDFBox
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885687#comment-17885687
 ] 

ASF subversion and git services commented on PDFBOX-5880:
-

Commit 1921019 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1921019 ]

PDFBOX-5880: don't restore invalid stream length values <= 0, mark stream 
length values <= 0 as invalid
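
As a rough illustration of the idea in the commit message above, the sketch below treats a non-positive declared stream length as invalid and, in that case, derives a length by scanning for the next "endstream" keyword. This is not the PDFBox code: the class name, the method names and the scanning strategy are assumptions made for the example, and a real parser would also have to deal with the end-of-line marker before "endstream".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and recovery strategy are assumptions, not PDFBox code.
class StreamLengthRecoverySketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** A declared length is only trusted if it is strictly positive. */
    static boolean isValidLength(long declaredLength) {
        return declaredLength > 0;
    }

    /**
     * Returns the declared length if it is valid. Otherwise returns the number
     * of bytes between streamStart and the next "endstream" keyword, or -1 if
     * the keyword is not found.
     */
    static long effectiveLength(byte[] data, int streamStart, long declaredLength) {
        if (isValidLength(declaredLength)) {
            return declaredLength;
        }
        for (int i = streamStart; i + ENDSTREAM.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                return i - streamStart;
            }
        }
        return -1;
    }

    /** Compares the bytes at offset with the "endstream" keyword. */
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < ENDSTREAM.length; j++) {
            if (data[offset + j] != ENDSTREAM[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "xxIMAGEDATAendstream".getBytes(StandardCharsets.US_ASCII);
        System.out.println(effectiveLength(data, 2, 0)); // prints 9
        System.out.println(effectiveLength(data, 2, 5)); // prints 5
    }
}
```

With a declared length of 0, as in the log output quoted in the issue, this sketch ignores the declared value and measures the data instead of reporting an empty stream.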

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 3.0.4 PDFBox
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-28 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885635#comment-17885635
 ] 

Tilman Hausherr commented on PDFBOX-5880:
-

Now it works!

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 3.0.4 PDFBox
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-28 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5880:
---
Fix Version/s: 3.0.4 PDFBox

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 3.0.4 PDFBox
>
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-28 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885621#comment-17885621
 ] 

Andreas Lehmkühler commented on PDFBOX-5880:


It should work again. I had mixed up the logic in validateStreamLength, so 
the pointer into the source wasn't reset to the origin offset in one case.
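
The kind of bug described above (a probe that moves the read position and forgets to restore it on one path) can be avoided by restoring the position in a finally block. The following is a minimal sketch of that pattern using a java.nio.ByteBuffer as the source; it is not the actual validateStreamLength from PDFBox, and the class name, the signature and the simplified "endstream" check are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not the PDFBox implementation of validateStreamLength.
class StreamLengthSketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the "endstream" keyword is found exactly `length` bytes
     * past the current position. The position is restored on every path.
     */
    static boolean validateStreamLength(ByteBuffer source, int length) {
        int originOffset = source.position();
        try {
            if (length < 0 || length > source.remaining() - ENDSTREAM.length) {
                return false; // early exit: finally still restores the position
            }
            source.position(originOffset + length);
            byte[] found = new byte[ENDSTREAM.length];
            source.get(found);
            return Arrays.equals(found, ENDSTREAM);
        } finally {
            source.position(originOffset); // always seek back to the origin offset
        }
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap("abcdendstream".getBytes(StandardCharsets.US_ASCII));
        System.out.println(validateStreamLength(source, 4)); // prints true
        System.out.println(validateStreamLength(source, 2)); // prints false
        System.out.println(source.position());               // prints 0
    }
}
```

Putting the reset in finally means that no early return or exception can leave the source at the probed offset, which is the failure mode the comment describes.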

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-28 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885620#comment-17885620
 ] 

ASF subversion and git services commented on PDFBOX-5880:
-

Commit 1921011 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921011 ]

PDFBOX-5880: always seek to the origin offset, set/replace length value only if 
needed

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-27 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885525#comment-17885525
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1921003 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921003 ]

PDFBOX-5660: update mockito

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task of improving code quality, using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-27 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885428#comment-17885428
 ] 

Andreas Lehmkühler commented on PDFBOX-5880:


Thanks for the pointer, I'm going to look into it.


> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't 
> point to the correct offset, using workaround to read the stream, stream 
> start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-27 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5880:

Attachment: PDFBOX-1094-PDFBOX-269.pdf

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.32, 3.0.3 PDFBox
>Reporter: Joseph Jezerinac
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-27 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885251#comment-17885251
 ] 

Tilman Hausherr commented on PDFBOX-5880:
-

Several rendering differences, e.g. in [^PDFBOX-1094-PDFBOX-269.pdf] from page 2 
onward the light background is different. The file from PDFBOX-1738 also differs.


[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-26 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885226#comment-17885226
 ] 

Andreas Lehmkühler commented on PDFBOX-5880:


[~tilman] that's a good idea, but I'd prefer to do so in the COSParser so that 
the context doesn't get lost.


[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-26 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885227#comment-17885227
 ] 

ASF subversion and git services commented on PDFBOX-5880:
-

Commit 1920970 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920970 ]

PDFBOX-5880: set missing/replace invalid stream length
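
The commit message does not say how the replacement length is found. One common recovery strategy, sketched here purely as an illustration and not taken from r1920970 or from COSParser, is to distrust a declared /Length that cannot be right (such as the 0 in the log above) and scan forward from the stream start for the "endstream" keyword. The names StreamLengthSketch and recoverLength are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class StreamLengthSketch {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the number of data bytes between start and the next
     * "endstream" keyword, not counting one end-of-line marker
     * (LF, CR, or CR LF) directly before the keyword.
     * Returns -1 if the keyword is not found.
     */
    static int recoverLength(byte[] pdf, int start) {
        for (int i = start; i + ENDSTREAM.length <= pdf.length; i++) {
            if (matchesAt(pdf, i)) {
                int end = i;
                if (end > start && pdf[end - 1] == '\n') {
                    end--;
                }
                if (end > start && pdf[end - 1] == '\r') {
                    end--;
                }
                return end - start;
            }
        }
        return -1;
    }

    /** True if the keyword bytes occur in data starting at pos. */
    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < ENDSTREAM.length; j++) {
            if (data[pos + j] != ENDSTREAM[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String object = "<< /Length 0 >>\nstream\nABCDE\nendstream\nendobj";
        byte[] pdf = object.getBytes(StandardCharsets.US_ASCII);
        int start = object.indexOf("stream\n") + "stream\n".length();
        // The dictionary claims length 0, but data is present.
        System.out.println("recovered length: " + recoverLength(pdf, start));
    }
}
```

A plain keyword scan like this can stop too early when binary stream data happens to contain the bytes "endstream", so a real parser needs additional checks; the sketch only shows the basic idea.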


[jira] [Assigned] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected e

2024-09-26 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5880:
--

Assignee: Andreas Lehmkühler


[jira] [Resolved] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5852.

Resolution: Fixed

I guess we are done here.

[~larry.l...@workiva.com] thanks for the report and the sample pdf

[~tilman] thanks for your input and help

> Hi CPU and memory usage when converting a PDF with type 4 shading
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
> Issue Type: Wish
> Components: Rendering
> Affects Versions: 2.0.28
> Reporter: Larry Lynn
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.33, 4.0.0, 3.0.3 PDFBox
> Attachments: CIB-coonsmesh.pdf, minimal.pdf
>
>
> We've observed excessive CPU and memory consumption when converting a PDF to 
> images when the PDF contains type 4 shading.  This is especially noticeable 
> when the conversion is done with a high DPI.  Can this be improved?
>  
> Conversation from the PDFBox users mailing list follows
> Initial email:
> {quote}
> Hi CPU and memory usage when converting a PDF with type 4 shading
> Hello PDFBox users and maintainers,
> We have a PDF that causes performance problems when we use PDFBox to
> convert it to an image with renderImageWithDPI().  We're calling
> renderImageWithDPI() with 650 DPI.  I realize this is a very high value - 
> we're using it for
> high fidelity original images that will later be downsampled.  On my work
> laptop which has fairly strong hardware, the conversion takes 25 minutes
> and consumes 20GB of memory.  CPU and memory usage is reduced if we use a
> lower DPI.
> The PDF is 1 page long.  It contains type 4 shading / Gouraud free form
> triangle meshes.  We've been aware of some performance issues with type 4
> shading for a little while now, but the PDFs that contained the type 4
> shading belonged to our customers and we were not authorized to share
> them.  We finally found a problem input document that is non-sensitive and
> that we are authorized to share.  I've attached a copy of the problem PDF
> to this email.
> I searched the archives for the users and the developers mailing list and I
> didn't find anything specifically about this issue.
> I searched through the PDFBox jira tickets and I found a couple of tickets
> that looked similar: PDFBOX-2901 & PDFBOX-4491.  PDFBOX-2901 seems to most
> closely describe what we're seeing, but that was closed in PDFBox 2.0.0,
> and our issue still reproduces with PDFBox 2.0.28.
> Should I refer this issue over to the developers mailing list or create a
> PDFBox Jira ticket for this?
> Thanks and Regards,
> Larry Lynn {quote}
> Response:
> {quote}
> Hi,
> Yes shading can be very slow, especially at high dpi. The attachment 
> didn't get through, please upload to a sharehoster or create a ticket. 
> If you need to register then add a meaningful text, e.g. the subject of 
> this post so we know you're not a spammer. Also retry with 2.0.31 and 
> 3.0.2 just to be sure. However I'm pessimistic that this can be fixed.
> Tilman {quote}
>  
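
For a rough sense of why DPI matters so much in the report above, the arithmetic below estimates the size of the output raster alone. It assumes a US Letter page (8.5 x 11 inches) and 4 bytes per pixel; neither value comes from the report, and the working memory of the shading code is not counted. The class name RenderSizeEstimate is hypothetical.

```java
final class RenderSizeEstimate {
    /** Pixels along one edge: inches times DPI, rounded up. */
    static long pixels(double inches, int dpi) {
        return (long) Math.ceil(inches * dpi);
    }

    /** Bytes needed for the raster alone at the given bytes per pixel. */
    static long rasterBytes(double widthIn, double heightIn, int dpi, int bytesPerPixel) {
        return pixels(widthIn, dpi) * pixels(heightIn, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed page size and pixel format, see the note above.
        double widthIn = 8.5;
        double heightIn = 11.0;
        int bytesPerPixel = 4;
        for (int dpi : new int[] {72, 300, 650}) {
            long bytes = rasterBytes(widthIn, heightIn, dpi, bytesPerPixel);
            System.out.printf("%d DPI: %d x %d px, %d bytes%n",
                    dpi, pixels(widthIn, dpi), pixels(heightIn, dpi), bytes);
        }
    }
}
```

The pixel count grows with the square of the DPI, so under these assumptions going from 300 to 650 DPI multiplies the number of pixels to shade by roughly 4.7. That is consistent with the reporter's observation that a lower DPI reduces CPU and memory usage.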




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884871#comment-17884871
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920945 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920945 ]

PDFBOX-5852: don't create an unused array


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884869#comment-17884869
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920943 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920943 ]

PDFBOX-5852: don't create an unused array


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884870#comment-17884870
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920944 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920944 ]

PDFBOX-5852: don't create an unused array


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884866#comment-17884866
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920942 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920942 ]

PDFBOX-5852: replace Integer with int, add some minor optimizations
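
The commit message is terse. As a generic illustration only, and not the code from r1920942, the sketch below shows the kind of difference between boxed Integer values and primitive int values that can matter in hot loops such as per-pixel shading work. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingSketch {
    /** Sums boxed values: each element is an Integer object that must be unboxed. */
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit v.intValue()
        }
        return total;
    }

    /** Sums primitives: no wrapper objects are allocated or dereferenced. */
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i); // autoboxing via Integer.valueOf(i)
        }
        // Both variants compute the same sum; only the representation differs.
        System.out.println(sumBoxed(boxed) == sumPrimitive(primitive));
    }
}
```

Both methods return the same result. The primitive version avoids one object per value, which reduces allocation and garbage-collection pressure when the values number in the millions.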


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884864#comment-17884864
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920941 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920941 ]

PDFBOX-5852: replace Integer with int, add some minor optimizations


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884739#comment-17884739
 ] 

Tilman Hausherr commented on PDFBOX-5852:
-

All good now, thanks!

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
> Issue Type: Wish
> Components: Rendering
> Affects Versions: 2.0.28
> Reporter: Larry Lynn
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: CIB-coonsmesh.pdf, minimal.pdf
>
>
> We've observed excessive CPU and memory consumption when converting a PDF to 
> images when the PDF contains type 4 shading.  This is especially noticeable 
> when the conversion is done with a high DPI.  Can this be improved?
>  
> Conversation from the PDFBox users mailing list follows
> Initial email:
> {quote}
> Hi CPU and memory usage when converting a PDF with type 4 shading
> Hello PDFBox users and maintainers,
> We have a PDF that causes performance problems when we use PDFBox to
> convert it to an image with renderImageWithDPI().  We're calling
> renderImageWithDPI()
> with 650 DPI.  I realize this is a very high value - we're using it for
> high fidelity original images that will later be downsampled.  On my work
> laptop which has fairly strong hardware, the conversion takes 25 minutes
> and consumes 20GB of memory.  CPU and memory usage is reduced if we use a
> lower DPI.
> The PDF is 1 page long.  It contains type 4 shading / Gouraud free form
> triangle meshes.  We've been aware of some performance issues with type 4
> shading for a little while now, but the PDFs that contained the type 4
> shading belonged to our customers and we were not authorized to share
> them.  We finally found a problem input document that is non-sensitive and
> that we are authorized to share.  I've attached a copy of the problem PDF
> to this email.
> I searched the archives for the users and the developers mailing list and I
> didn't find anything specifically about this issue.
> I searched through the PDFBox jira tickets and I found a couple of tickets
> that looked similar: PDFBOX-2901 & PDFBOX-4491.  PDFBOX-2901 seems to most
> closely describe what we're seeing, but that was closed in PDFBox 2.0.0,
> and our issue still reproduces with PDFBox 2.0.28.
> Should I refer this issue over to the developers mailing list or create a
> PDFBox Jira ticket for this?
> Thanks and Regards,
> Larry Lynn {quote}
> Response:
> {quote}
> Hi,
> Yes shading can be very slow, especially at high dpi. The attachment 
> didn't get through, please upload to a sharehoster or create a ticket. 
> If you need to register then add a meaningful text, e.g. the subject of 
> this post so we know you're not a spammer. Also retry with 2.0.31 and 
> 3.0.2 just to be sure. However I'm pessimistic that this can be fixed.
> Tilman {quote}
>  
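To put the DPI numbers from the report in perspective, here is a minimal sketch of how the output raster grows with DPI. It assumes a US Letter page (612 x 792 points) and 4 bytes per pixel; neither is stated in the ticket, and the class and method names are made up for illustration. Under those assumptions a 650 DPI render is 5525 x 7150 pixels, or about 158 million bytes for the image alone. Because the pixel count grows with the square of the DPI, any per-pixel shading work grows the same way. The sketch does not account for the 20GB reported, which would come from intermediate data during rendering.

```java
// Rough estimate of the raster size for one page rendered at a given DPI.
// Assumption: US Letter page, 612 x 792 points; the ticket does not state the page size.
final class RenderSizeEstimate
{
    static final float POINTS_PER_INCH = 72f;

    /** Pixel count along one side for a length in PDF points at the given DPI. */
    static long pixels(float points, float dpi)
    {
        return Math.round((double) points / POINTS_PER_INCH * dpi);
    }

    /** Bytes needed for one raster of the page, assuming 4 bytes per pixel. */
    static long rasterBytes(float widthPt, float heightPt, float dpi)
    {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }

    public static void main(String[] args)
    {
        for (float dpi : new float[] { 150f, 300f, 650f })
        {
            long w = pixels(612f, dpi);
            long h = pixels(792f, dpi);
            System.out.printf("%.0f DPI: %d x %d px, about %d MB%n",
                    dpi, w, h, rasterBytes(612f, 792f, dpi) / (1024 * 1024));
        }
    }
}
```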




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884694#comment-17884694
 ] 

Andreas Lehmkühler commented on PDFBOX-5852:


That was an easy fix ;-) The implementations of {{calcPixelTableArray}} weren't 
in line with each other. I forgot to add an offset of one in one case.
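As an illustration only, the kind of off-by-one described here can be sketched as follows. This is not the PDFBox code: the class, the method names and the one-dimensional table are hypothetical. The sketch assumes a table that must cover the inclusive range 0..width, so it needs width + 1 slots.

```java
// Hypothetical sketch of an off-by-one between two table builders.
// Not the actual PDFBox implementation of calcPixelTableArray.
final class PixelTableSketch
{
    /** Covers the inclusive range 0..width, so it allocates width + 1 slots. */
    static int[] inclusiveTable(int width)
    {
        int[] table = new int[width + 1];
        for (int x = 0; x <= width; x++)
        {
            table[x] = x;
        }
        return table;
    }

    /** Same loop, but the allocation lacks the offset of one. */
    static int[] missingOffsetTable(int width)
    {
        int[] table = new int[width];
        for (int x = 0; x <= width; x++)
        {
            table[x] = x; // throws ArrayIndexOutOfBoundsException when x == width
        }
        return table;
    }
}
```

With a width of 400, the second variant fails on index 400 of an array of length 400.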


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884693#comment-17884693
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920923 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920923 ]

PDFBOX-5852: fix ArrayIndexOutOfBoundsException


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884692#comment-17884692
 ] 

Andreas Lehmkühler commented on PDFBOX-5852:


[~tilman] thanks for the feedback and the sample PDF. I'm going to have a look.


[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884598#comment-17884598
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1920908 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920908 ]

PDFBOX-5660: update junit

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-25 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884597#comment-17884597
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1920907 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920907 ]

PDFBOX-5660: update junit


[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884548#comment-17884548
 ] 

Tilman Hausherr commented on PDFBOX-5880:
-

The proposed change is to add {{stream.setLong(COSName.LENGTH, streamLength);}} or 
to change the foreach loop so that it doesn't overwrite the length entry.
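The two options can be sketched with plain maps standing in for the stream dictionary. This is a hypothetical illustration, not the parser code: the class, the method names and the use of a String-keyed map instead of COSDictionary are all assumptions. The parsed dictionary is assumed to carry a wrong Length of 0, as in the log output, and streamLength is the length actually found.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the stream dictionary.
final class LengthEntrySketch
{
    /** Option 1: copy every entry, then set the length again afterwards. */
    static Map<String, Object> copyThenRestoreLength(Map<String, Object> parsedDict, long streamLength)
    {
        Map<String, Object> stream = new LinkedHashMap<>(parsedDict);
        stream.put("Length", streamLength);
        return stream;
    }

    /** Option 2: the copy loop skips the length entry so it cannot overwrite it. */
    static Map<String, Object> copySkippingLength(Map<String, Object> parsedDict, long streamLength)
    {
        Map<String, Object> stream = new LinkedHashMap<>();
        stream.put("Length", streamLength);
        for (Map.Entry<String, Object> entry : parsedDict.entrySet())
        {
            if (!"Length".equals(entry.getKey()))
            {
                stream.put(entry.getKey(), entry.getValue());
            }
        }
        return stream;
    }
}
```

Either way the stream ends up with the measured length rather than the value from the damaged dictionary.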

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Joseph Jezerinac
> Priority: Major
> Labels: regression
> Attachments: test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> // pdf is the input File; try-with-resources closes the document after rendering
> try (PDDocument pdDocument = Loader.loadPDF(pdf))
> {
>     PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
>     ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> }
> {code}




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884540#comment-17884540
 ] 

Tilman Hausherr commented on PDFBOX-5852:
-

E.g. with this file: [^CIB-coonsmesh.pdf] 

ArrayIndexOutOfBoundsException: Index 400 out of bounds for length 400

org.apache.pdfbox.pdmodel.graphics.shading.PatchMeshesShadingContext.calcPixelTableArray(PatchMeshesShadingContext.java:67)

org.apache.pdfbox.pdmodel.graphics.shading.TriangleBasedShadingContext.createPixelTable(TriangleBasedShadingContext.java:67)

org.apache.pdfbox.pdmodel.graphics.shading.PatchMeshesShadingContext.<init>(PatchMeshesShadingContext.java:57)

org.apache.pdfbox.pdmodel.graphics.shading.Type6ShadingContext.<init>(Type6ShadingContext.java:45)

org.apache.pdfbox.pdmodel.graphics.shading.Type6ShadingPaint.createContext(Type6ShadingPaint.java:63)


[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5852:

Attachment: CIB-coonsmesh.pdf


[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884533#comment-17884533
 ] 

Tilman Hausherr commented on PDFBOX-5852:
-

Lots of regressions. I need to check whether this is because of another change 
I just did, or whether the first test didn't have the new code activated.

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf

[jira] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



[ https://issues.apache.org/jira/browse/PDFBOX-5852 ]


Tilman Hausherr deleted comment on PDFBOX-5852:
-

was (Author: tilman):
No regressions 👍

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
> Issue Type: Wish
> Components: Rendering
> Affects Versions: 2.0.28
> Reporter: Larry Lynn
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>
> We've observed excessive CPU and memory consumption when converting a PDF to 
> images when the PDF contains type 4 shading.  This is especially noticeable 
> when the conversion is done with a high DPI.  Can this be improved?
>  
> Conversation from the PDFBox users mailing list follows
> Initial email:
> {quote}
> Hi CPU and memory usage when converting a PDF with type 4 shading
> Hello PDFBox users and maintainers,
> We have a PDF that causes performance problems when we use PDFBox to
> convert it to an image with renderImageWithDPI().  We're calling
> renderImageWithDPI() with 650 DPI.  I realize this is a very high value - we're using it for
> high fidelity original images that will later be downsampled.  On my work
> laptop which has fairly strong hardware, the conversion takes 25 minutes
> and consumes 20GB of memory.  CPU and memory usage is reduced if we use a
> lower DPI.
> The PDF is 1 page long.  It contains type 4 shading / Gouraud free form
> triangle meshes.  We've been aware of some performance issues with type 4
> shading for a little while now, but the PDFs that contained the type 4
> shading belonged to our customers and we were not authorized to share
> them.  We finally found a problem input document that is non-sensitive and
> that we are authorized to share.  I've attached a copy of the problem PDF
> to this email.
> I searched the archives for the users and the developers mailing list and I
> didn't find anything specifically about this issue.
> I searched through the PDFBox jira tickets and I found a couple of tickets
> that looked similar: PDFBOX-2901 & PDFBOX-4491.  PDFBOX-2901 seems to most
> closely describe what we're seeing, but that was closed in PDFBox 2.0.0,
> and our issue still reproduces with PDFBox 2.0.28.
> Should I refer this issue over to the developers mailing list or create a
> PDFBox Jira ticket for this?
> Thanks and Regards,
> Larry Lynn {quote}
> Response:
> {quote}
> Hi,
> Yes, shading can be very slow, especially at high DPI. The attachment 
> didn't get through; please upload it to a file-sharing host or create a ticket. 
> If you need to register, add some meaningful text, e.g. the subject of 
> this post, so we know you're not a spammer. Also retry with 2.0.31 and 
> 3.0.2 just to be sure. However, I'm pessimistic that this can be fixed.
> Tilman {quote}
>  




[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884531#comment-17884531
 ] 

Tilman Hausherr commented on PDFBOX-5880:
-

The problem is here:
{code:java}
    public COSStream createCOSStream(COSDictionary dictionary, long startPosition,
            long streamLength) throws IOException
    {
        COSStream stream = new COSStream(streamCache,
                parser.createRandomAccessReadView(startPosition, streamLength));
        dictionary.forEach(stream::setItem);
        stream.setKey(dictionary.getKey());
        return stream;
    }
 {code}
The forEach call copies every dictionary entry into the new stream and thereby 
overwrites the length. For some reason this caused no trouble in the past with 
wrong lengths, only this time, with a zero length that is an indirect object.
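
The overwrite described above can be reproduced without PDFBox. The following is only a sketch: plain `Map` objects stand in for `COSDictionary` and `COSStream`, it assumes the stream already carries the length of its data view before the copy, and `copyWithoutLength` is a hypothetical guard, not the change that was committed. The value 695645 is the corrected length from the 1.8.16 log message quoted in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LengthOverwriteSketch
{
    // Same order as in the snippet above: the stream has its own length
    // first, then every dictionary entry is copied over it.
    static Map<String, Object> copyAll(Map<String, Object> dictionary, long viewLength)
    {
        Map<String, Object> stream = new LinkedHashMap<>();
        stream.put("Length", viewLength);
        dictionary.forEach(stream::put); // also copies the dictionary's "Length"
        return stream;
    }

    // Hypothetical guard: copy everything except "Length".
    static Map<String, Object> copyWithoutLength(Map<String, Object> dictionary, long viewLength)
    {
        Map<String, Object> stream = new LinkedHashMap<>();
        stream.put("Length", viewLength);
        dictionary.forEach((key, value) ->
        {
            if (!"Length".equals(key))
            {
                stream.put(key, value);
            }
        });
        return stream;
    }

    public static void main(String[] args)
    {
        Map<String, Object> dictionary = new LinkedHashMap<>();
        dictionary.put("Type", "XObject");
        dictionary.put("Length", 0L); // the incorrect length from the file

        System.out.println(copyAll(dictionary, 695645L).get("Length"));           // prints 0
        System.out.println(copyWithoutLength(dictionary, 695645L).get("Length")); // prints 695645
    }
}
```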

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Joseph Jezerinac
> Priority: Major
> Labels: regression
> Attachments: test.pdf
>
>
> When rendering page one of the attached PDF the image does not render.
> In the logs, I see the following:
> {noformat}
> 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
> 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
> java.io.IOException: Image stream is empty
>   at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
>   at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
>   at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
> {noformat}
> I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
> issue.
> Here's the render code used:
> {code:java}
> File out = File.createTempFile("test-", ".png");
> PDDocument pdDocument = Loader.loadPDF(pdf);
> final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
> ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
> {code}




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-25 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884528#comment-17884528
 ] 

Tilman Hausherr commented on PDFBOX-5852:
-

No regressions 👍





[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884511#comment-17884511
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920894 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920894 ]

PDFBOX-5852: replace Integer with int, add some minor optimizations
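
As a generic illustration of why replacing `Integer` with `int` can help in hot loops (this is not the code from r1920894, and the class and method names are made up), the sketch below computes the same sum once with boxed types and once with primitives. The boxed version unboxes and re-boxes its counter and accumulator on every iteration; the primitive version does not.

```java
public class BoxingSketch
{
    // Boxed variant: every i++ and every += unboxes, computes, and boxes again.
    static long sumBoxed(int n)
    {
        Long total = 0L;
        for (Integer i = 0; i < n; i++)
        {
            total += i;
        }
        return total;
    }

    // Primitive variant: same arithmetic without any wrapper objects.
    static long sumPrimitive(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += i;
        }
        return total;
    }

    public static void main(String[] args)
    {
        // Both variants return the same value; only the amount of boxing differs.
        System.out.println(sumBoxed(1000) == sumPrimitive(1000));
    }
}
```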





[jira] [Comment Edited] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expe

2024-09-24 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492
 ] 

Tilman Hausherr edited comment on PDFBOX-5880 at 9/25/24 3:55 AM:
--

The PDF image stream has an (incorrect) length of 0. The workaround fails for 
some reason. Amusingly, this worked in 1.8.16, which displays the message 
"WARNUNG: /Length of COSObject{1, 0} corrected from 0 to 695645".


was (Author: tilman):
The image has an (incorrect) length of 0. The workaround fails for some reason. 
Amusingly, this worked in 1.8.16, which displays the message "WARNUNG: /Length 
of COSObject{1, 0} corrected from 0 to 695645".





[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected

2024-09-24 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492
 ] 

Tilman Hausherr commented on PDFBOX-5880:
-

The image has an (incorrect) length of 0. The workaround fails for some reason. 
Amusingly, this worked in 1.8.16, which displays the message "WARNUNG: /Length 
of COSObject{1, 0} corrected from 0 to 695645".





[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5880:

Labels: regression  (was: )





[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5880:

Affects Version/s: 2.0.32

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Joseph Jezerinac
> Priority: Major
> Attachments: test.pdf
>
>




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5880:

Component/s: Parsing
 (was: Rendering)

> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> 
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.3 PDFBox
> Reporter: Joseph Jezerinac
> Priority: Major
> Attachments: test.pdf
>
>




[jira] [Created] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Joseph Jezerinac (Jira)

Joseph Jezerinac created PDFBOX-5880:


Summary: PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
Key: PDFBOX-5880
URL: https://issues.apache.org/jira/browse/PDFBOX-5880
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 3.0.3 PDFBox
Reporter: Joseph Jezerinac
Attachments: test.pdf

When rendering page one of the attached PDF the image does not render.

In the logs, I see the following:

{noformat}
2024-09-24 13:25:56:702 [main] WARN DocManagerImpl - Aspose.PDF/Words license initialized
2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
java.io.IOException: Image stream is empty
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
{noformat}

I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
issue.

Here's the render code used:
{code:java}
File out = File.createTempFile("test-", ".png");
PDDocument pdDocument = Loader.loadPDF(pdf);
final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
{code}




[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en

2024-09-24 Thread Joseph Jezerinac (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Jezerinac updated PDFBOX-5880:
-
Description: 
When rendering page one of the attached PDF the image does not render.

In the logs, I see the following:

{noformat}
2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
java.io.IOException: Image stream is empty
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
{noformat}

I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
issue.

Here's the render code used:
{code:java}
File out = File.createTempFile("test-", ".png");
PDDocument pdDocument = Loader.loadPDF(pdf);
final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
{code}

  was:
When rendering page one of the attached PDF the image does not render.

In the logs, I see the following:

{noformat}
2024-09-24 13:25:56:702 [main] WARN DocManagerImpl - Aspose.PDF/Words license initialized
2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196
2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty
java.io.IOException: Image stream is empty
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107)
{noformat}

I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an 
issue.

Here's the render code used:
{code:java}
File out = File.createTempFile("test-", ".png");
PDDocument pdDocument = Loader.loadPDF(pdf);
final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out);
{code}


> PDF render blank page: The end of the stream doesn't point to the correct 
> offset, using workaround to read the stream, stream start position: 196, 
> length: 0, expected end position: 196
> ----
>
> Key: PDFBOX-5880
> URL: https://issues.apache.org/jira/browse/PDFBOX-5880
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 3.0.3 PDFBox
> Reporter: Joseph Jezerinac
> Priority: Major
> Attachments: test.pdf
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-21 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883536#comment-17883536
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1920834 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920834 ]

PDFBOX-5660: update commons-io

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-09-21 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883535#comment-17883535
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1920833 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920833 ]

PDFBOX-5660: update commons-io

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor




[jira] [Commented] (PDFBOX-5561) qpdf shows warnings trying to linearize file modified by PDFBOX

2024-09-19 Thread HABA (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882984#comment-17882984
 ] 

HABA commented on PDFBOX-5561:
--

Hello,

I am encountering the same warning message.

I am currently using version *3.0.3*. Have you found any solutions or 
workarounds to resolve this issue?

Thank you for your assistance.

Best regards,
HABA

> qpdf shows warnings trying to linearize file modified by PDFBOX
> ---
>
> Key: PDFBOX-5561
> URL: https://issues.apache.org/jira/browse/PDFBOX-5561
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.27
>Reporter: menteith
>Priority: Minor
>
> I have a PDF file* that was generated by software other than PDFBox. When 
> the PDF is modified with PDFBox using the code given below, *qpdf* shows the 
> following warning:
> {code:java}
> WARNING: modified.pdf: reported number of objects (12991) is not one plus the 
> highest object number (12989)
> qpdf: operation succeeded with warnings; resulting file may have some 
> problems{code}
> Note that the warning is not shown when *qpdf* analyses the original PDF file 
> (i.e. the PDF not modified by PDFBox).
> Here's the code used to modify the PDF in question:
>  
> {code:java}
> for (final PDPage page : document.getPages()) {
>     // Point every link annotation at the document's second page (index 1).
>     page.getAnnotations().forEach(annotation -> {
>         if (annotation instanceof PDAnnotationLink link) {
>             final PDPageXYZDestination destination = new PDPageXYZDestination();
>             destination.setPage(document.getPage(1));
>             final PDActionGoTo action = new PDActionGoTo();
>             action.setDestination(destination);
>             link.setAction(action);
>         }
>     });
> } {code}
>  
> I forgot to mention that the result file generated by PDFBox is almost 
> twice as big as the original one.
> *I've sent the file to Tilman Hausherr.
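
For context on the qpdf warning quoted above: the /Size entry of a PDF trailer is expected to be one greater than the highest object number in the file. The following is only a minimal sketch of that arithmetic, using the two numbers from the warning. The class and method names are hypothetical, and this is not how qpdf or PDFBox implement the check.

```java
// Hypothetical illustration of the "/Size == highest object number + 1" rule.
final class XrefSizeCheck {

    // The /Size value that would be consistent with the given highest object number.
    static int expectedSize(int highestObjectNumber) {
        return highestObjectNumber + 1;
    }

    // True when the reported /Size matches the highest object number actually present.
    static boolean isConsistent(int reportedSize, int highestObjectNumber) {
        return reportedSize == expectedSize(highestObjectNumber);
    }

    public static void main(String[] args) {
        int reportedSize = 12991;   // "reported number of objects" from the warning
        int highestObject = 12989;  // "highest object number" from the warning
        System.out.println("expected /Size: " + expectedSize(highestObject));
        System.out.println("consistent: " + isConsistent(reportedSize, highestObject));
    }
}
```

With the values from the warning, the reported size is one higher than the consistent value, which is what qpdf complains about.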




[jira] [Comment Edited] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-18 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882794#comment-17882794
 ] 

Andreas Lehmkühler edited comment on PDFBOX-5852 at 9/18/24 7:52 PM:
-

Some details can still be optimized with regard to memory consumption. One is 
the Integer vs. int question. I'm still working on it: the code logic has to 
change because an int value can't be null, and some of the existing logic 
relies on null. It's not a big issue, so I'm going to come up with some 
additional changes.


was (Author: lehmi):
There are some details which might be optimized with regard to memory 
consumption. One is the Integer vs. int thing. I'm still on it as the code 
logic has to be changed due to the fact that an int value can't be null and 
there is some logic which relies on that. No big issue so that I'm going to 
come up with some more changes 

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
>     URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>
> We've observed excessive CPU and memory consumption when converting a PDF to 
> images when the PDF contains type 4 shading.  This is especially noticeable 
> when the conversion is done with a high DPI.  Can this be improved?
>  
> Conversation from the PDFBox users mailing list follows
> Initial email:
> {quote}
> Hi CPU and memory usage when converting a PDF with type 4 shading
> Hello PDFBox users and maintainers,
> We have a PDF that causes performance problems when we use PDFBox to
> convert it to an image with renderImageWithDPI().  We're calling
> renderImageWithDPI()
> with 650 DPI.  I realize this is a very high value - we're using it for
> high fidelity original images that will later be downsampled.  On my work
> laptop which has fairly strong hardware, the conversion takes 25 minutes
> and consumes 20GB of memory.  CPU and memory usage is reduced if we use a
> lower DPI.
> The PDF is 1 page long.  It contains type 4 shading / Gouraud free form
> triangle meshes.  We've been aware of some performance issues with type 4
> shading for a little while now, but the PDFs that contained the type 4
> shading belonged to our customers and we were not authorized to share
> them.  We finally found a problem input document that is non-sensitive and
> that we are authorized to share.  I've attached a copy of the problem PDF
> to this email.
> I searched the archives for the users and the developers mailing list and I
> didn't find anything specifically about this issue.
> I searched through the PDFBox jira tickets and I found a couple of tickets
> that looked similar: PDFBOX-2901 & PDFBOX-4491.  PDFBOX-2901 seems to most
> closely describe what we're seeing, but that was closed in PDFBox 2.0.0,
> and our issue still reproduces with PDFBox 2.0.28.
> Should I refer this issue over to the developers mailing list or create a
> PDFBox Jira ticket for this?
> Thanks and Regards,
> Larry Lynn {quote}
> Response:
> {quote}
> Hi,
> Yes shading can be very slow, especially at high dpi. The attachment 
> didn't get through, please upload to a sharehoster or create a ticket. 
> If you need to register then add a meaningful text, e.g. the subject of 
> this post so we know you're not a spammer. Also retry with 2.0.31 and 
> 3.0.2 just to be sure. However I'm pessimistic that this can be fixed.
> Tilman {quote}
>  




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-18 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882794#comment-17882794
 ] 

Andreas Lehmkühler commented on PDFBOX-5852:


Some details can still be optimized with regard to memory consumption. One is 
the Integer vs. int question. I'm still working on it: the code logic has to 
change because an int value can't be null, and some of the existing logic 
relies on null. It's not a big issue, so I'm going to come up with some 
more changes.
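
To illustrate the point about null: a cell of an Integer[][] can be null to mean "no value stored", whereas a cell of an int[][] always holds some number. One possible way to keep that distinction is sketched below, under the assumption that every int may be a legitimate value, so that no sentinel value is free. It tracks presence in a separate java.util.BitSet. The class PixelTable and its methods are hypothetical and are not taken from the PDFBox sources.

```java
import java.util.BitSet;

// Hypothetical illustration; not taken from the PDFBox sources.
final class PixelTable {
    private final int width;
    private final int[][] values;  // primitive storage, no per-cell object
    private final BitSet present;  // one bit per cell: has a value been stored?

    // Assumes width * height fits in an int.
    PixelTable(int width, int height) {
        this.width = width;
        this.values = new int[height][width];
        this.present = new BitSet(width * height);
    }

    void put(int x, int y, int value) {
        values[y][x] = value;
        present.set(y * width + x);
    }

    // Takes the place of the "cell == null" test that an Integer[][] would allow.
    boolean contains(int x, int y) {
        return present.get(y * width + x);
    }

    int get(int x, int y) {
        if (!contains(x, y)) {
            throw new IllegalStateException("no value stored at (" + x + ", " + y + ")");
        }
        return values[y][x];
    }

    public static void main(String[] args) {
        PixelTable table = new PixelTable(4, 3);
        table.put(1, 2, 0);  // 0 is a real value here, not "empty"
        System.out.println(table.contains(1, 2));
        System.out.println(table.contains(0, 0));
    }
}
```

If some int value can never occur in the data, a plain sentinel in the int[][] would be simpler than the extra BitSet. Whether such a value exists depends on what the array stores.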

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
>     URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-18 Thread Larry Lynn (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882781#comment-17882781
 ] 

Larry Lynn commented on PDFBOX-5852:


I see the updated code uses an
{code:java}
Integer[][] {code}
Previously, we needed an Integer rather than an int because Java doesn't 
support primitive values in Maps (at least, not without an extra library). Now 
that we're not using a Map, could we use an
{code:java}
int[][] {code}
instead?

When I was running this code in a debugger, I saw that the map could get very 
big, especially when a conversion was requested at very high resolution. I 
think I saw sizes in excess of 10 million elements. If int would work instead 
of Integer, I think that could yield a fair saving in memory usage, since the 
primitive type doesn't carry the memory overhead of an object:

[https://stackoverflow.com/questions/6081955/memory-footprint-of-int-and-integer-arrays]

[https://www.javamex.com/tutorials/memory/object_memory_usage.shtml]

A 2-d int array would probably be faster too.
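
As a rough back-of-the-envelope sketch of the saving discussed above, the following compares the two layouts for 10 million cells. The per-element sizes for references and Integer objects are assumptions (a 64-bit JVM with compressed references, every cell non-null and holding its own Integer object, array headers ignored). Actual figures depend on the JVM and its settings, and the class and constant names are hypothetical.

```java
// Rough, assumption-laden estimate; not a measurement.
final class FootprintEstimate {
    static final long INT_BYTES = Integer.BYTES;          // 4 bytes, fixed by the language
    static final long ASSUMED_REFERENCE_BYTES = 4;        // assumption: compressed references
    static final long ASSUMED_INTEGER_OBJECT_BYTES = 16;  // assumption: header + value + padding

    // Payload of a primitive int array with the given number of cells.
    static long primitiveBytes(long elements) {
        return elements * INT_BYTES;
    }

    // One reference per cell plus one Integer object per cell.
    static long boxedBytes(long elements) {
        return elements * (ASSUMED_REFERENCE_BYTES + ASSUMED_INTEGER_OBJECT_BYTES);
    }

    public static void main(String[] args) {
        long elements = 10_000_000L;
        System.out.println("int[][]     ~ " + primitiveBytes(elements) + " bytes");
        System.out.println("Integer[][] ~ " + boxedBytes(elements) + " bytes");
    }
}
```

Under these assumptions the primitive layout needs about 40 MB against about 200 MB for the boxed one, roughly a factor of five.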

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
>     URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-18 Thread Larry Lynn (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882778#comment-17882778
 ] 

Larry Lynn commented on PDFBOX-5852:


Thank you all very much for your work on this ticket.

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882471#comment-17882471
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920756 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920756 ]

PDFBOX-5852: replace IntPoint with Point

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>




[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-17 Thread Jira



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5852:
---
Fix Version/s: 2.0.33

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>




[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882474#comment-17882474
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920757 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920757 ]

PDFBOX-5852: deprecate IntPoint

> Hi CPU and memory usage when converting a PDF with type 4 shading
> -
>
> Key: PDFBOX-5852
> URL: https://issues.apache.org/jira/browse/PDFBOX-5852
> Project: PDFBox
>  Issue Type: Wish
>  Components: Rendering
>Affects Versions: 2.0.28
>Reporter: Larry Lynn
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: minimal.pdf
>
>
> We've observed excessive CPU and memory consumption when converting a PDF to 
> images when the PDF contains type 4 shading.  This is especially noticeable 
> when the conversion is done with a high DPI.  Can this be improved?
>  
> Conversation from the PDFBox users mailing list follows
> Initial email:
> {quote}
> Hi CPU and memory usage when converting a PDF with type 4 shadingHello PDFBox 
> users and maintainers,
> We have a PDF that causes performance problems when we use PDFBox to
> convert it to an image with renderImageWithDPI().  We're calling
> renderImageWithDPI()
> with 650 DPI.  I realize this is a very high value - we're using it for
> high fidelity original images that will later be downsampled.  On my work
> laptop which has fairly strong hardware, the conversion takes 25 minutes
> and consumes 20GB of memory.  CPU and memory usage is reduced if we use a
> lower DPI.
> The PDF is 1 page long.  It contains type 4 shading / Gouraud free form
> triangle meshes.  We've been aware of some performance issues with type 4
> shading for a little while now, but the PDFs that contained the type 4
> shading belonged to our customers and we were not authorized to share
> them.  We finally found a problem input document that is non-sensitive and
> that we are authorized to share.  I've attached a copy of the problem PDF
> to this email.
> I searched the archives for the users and the developers mailing list and I
> didn't find anything specifically about this issue.
> I searched through the PDFBox jira tickets and I found a couple of tickets
> that looked similar: PDFBOX-2901 & PDFBOX-4491.  PDFBOX-2901 seems to most
> closely describe what we're seeing, but that was closed in PDFBox 2.0.0,
> and our issue still reproduces with PDFBox 2.0.28.
> Should I refer this issue over to the developers mailing list or create a
> PDFBox Jira ticket for this?
> Thanks and Regards,
> Larry Lynn {quote}
> Response:
> {quote}
> Hi,
> Yes shading can be very slow, especially at high dpi. The attachment 
> didn't get through, please upload to a sharehoster or create a ticket. 
> If you need to register then add a meaningful text, e.g. the subject of 
> this post so we know you're not a spammer. Also retry with 2.0.31 and 
> 3.0.2 just to be sure. However I'm pessimistic that this can be fixed.
> Tilman {quote}
>  
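
As a rough illustration of why the DPI setting matters so much in the report above: the number of pixels to render grows with the square of the DPI. The sketch below is only arithmetic, not PDFBox code. The page size (US Letter, 8.5 x 11 inches) and the 4 bytes per pixel are assumptions, since the ticket states neither, and the class and method names are made up for this example.

```java
class DpiCostSketch {
    /** Pixels needed to render a page of the given size (in inches) at the given DPI. */
    static long pixelCount(double widthInches, double heightInches, int dpi) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        return widthPx * heightPx;
    }

    /** Size in bytes of a raster with the given number of bytes per pixel. */
    static long rasterBytes(long pixels, int bytesPerPixel) {
        return pixels * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 8.5 x 11 inches (not taken from the ticket).
        for (int dpi : new int[] {72, 300, 650}) {
            long pixels = pixelCount(8.5, 11.0, dpi);
            System.out.printf("%d DPI: %,d pixels, %,d bytes at 4 bytes/pixel%n",
                    dpi, pixels, rasterBytes(pixels, 4));
        }
    }
}
```

Under these assumptions 650 DPI needs about 4.7 times as many pixels as 300 DPI, because (650/300)^2 = 169/36. Any per-pixel bookkeeping done while rendering scales by the same factor.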



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882470#comment-17882470
 ] 

ASF subversion and git services commented on PDFBOX-5852:
-

Commit 1920755 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920755 ]

PDFBOX-5852: replace Map with a two-dimensional array
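
The commit itself is not reproduced in this thread. As a hedged sketch of the general idea named in the commit message, the example below stores the same per-pixel values once in a map keyed by coordinate objects and once in a two-dimensional int array. All names here (PixelTableSketch, Cell, colorAt, EMPTY) are hypothetical and are not PDFBox classes. The use of -1 as an "empty" marker assumes that stored colors are non-negative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PixelTableSketch {
    // Assumption: stored colors are non-negative, so -1 can mark "no value".
    static final int EMPTY = -1;

    /** Hypothetical coordinate key, standing in for a point class used as a map key. */
    static final class Cell {
        final int x;
        final int y;
        Cell(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Cell && ((Cell) o).x == x && ((Cell) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    // Made-up color function so that both tables hold the same data.
    static int colorAt(int x, int y) { return ((x & 0xFF) << 8) | (y & 0xFF); }

    // Only some cells are covered, to mimic a sparse table.
    static boolean covered(int x, int y) { return (x + y) % 2 == 0; }

    // Map-based table: each stored pixel needs a key object, a boxed Integer and a map entry.
    static Map<Cell, Integer> fillMap(int width, int height) {
        Map<Cell, Integer> table = new HashMap<>();
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                if (covered(x, y)) {
                    table.put(new Cell(x, y), colorAt(x, y));
                }
            }
        }
        return table;
    }

    // Array-based table: one primitive int per pixel, addressed directly by [x][y].
    static int[][] fillArray(int width, int height) {
        int[][] table = new int[width][height];
        for (int[] column : table) {
            Arrays.fill(column, EMPTY);
        }
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                if (covered(x, y)) {
                    table[x][y] = colorAt(x, y);
                }
            }
        }
        return table;
    }

    // Bounds-checked lookup that returns EMPTY for missing or out-of-range cells.
    static int lookup(int[][] table, int x, int y) {
        if (x < 0 || x >= table.length || y < 0 || y >= table[x].length) {
            return EMPTY;
        }
        return table[x][y];
    }
}
```

The array variant avoids allocating a key and a boxed value per pixel and replaces hashing with index arithmetic, at the price of reserving space for every cell of the bounding area, covered or not.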


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882355#comment-17882355
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920743 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920743 ]

PDFBOX-5879: respect code conventions

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf
>
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
> {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>         at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>         at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>         at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>         at picocli.CommandLine.access$1500(CommandLine.java:148)
>         at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>         at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>         at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>         at picocli.CommandLine.execute(CommandLine.java:2174)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
> {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from 
> [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf]
> and is also attached.
> The root cause appears to be this change: 
> [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
>  from PDFBOX-5841
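
The exception message above suggests that a value was cast to an array type while it was still wrapped in an indirect object reference (COSObject is PDFBox's indirect object class). The following is a minimal sketch of that failure mode and of a defensive alternative. It uses hypothetical stand-in classes (Value, ArrayValue, Reference) rather than the PDFBox classes, and it is not the fix that was committed for this ticket.

```java
import java.util.ArrayList;
import java.util.List;

class IndirectCastSketch {
    /** Hypothetical base type for all values (a stand-in, not a PDFBox class). */
    static class Value { }

    /** Hypothetical array value. */
    static final class ArrayValue extends Value {
        final List<Value> items = new ArrayList<>();
    }

    /** Hypothetical indirect reference that wraps another value. */
    static final class Reference extends Value {
        final Value target;
        Reference(Value target) { this.target = target; }
    }

    // Unsafe: assumes the entry is always stored directly as an array.
    static ArrayValue unsafeAsArray(Value entry) {
        return (ArrayValue) entry; // throws ClassCastException when entry is a Reference
    }

    // Safer: resolve references first, then check the type instead of casting blindly.
    static ArrayValue safeAsArray(Value entry) {
        Value resolved = entry;
        while (resolved instanceof Reference) {
            resolved = ((Reference) resolved).target;
        }
        return resolved instanceof ArrayValue ? (ArrayValue) resolved : null;
    }
}
```

With these stand-ins, unsafeAsArray(new Reference(new ArrayValue())) throws a ClassCastException, while safeAsArray returns the wrapped array and yields null for values that are not arrays.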




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882354#comment-17882354
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920742 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920742 ]

PDFBOX-5879: respect code conventions


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327
 ] 

Tilman Hausherr commented on PDFBOX-5879:
-

I added a simple test for the feature because it turns out we didn't have any. 
However, this isn't a test of the fixed bug: creating a file for that would 
have been more difficult, and there is no risk that this fix gets reverted anyway.


[jira] [Comment Edited] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327
 ] 

Tilman Hausherr edited comment on PDFBOX-5879 at 9/17/24 9:08 AM:
--

I added a simple test for the rotationMagic feature because it turns out we 
didn't have any. However, this isn't a test of the fixed bug: creating a file 
for that would have been more difficult, and there is no risk that this fix 
gets reverted anyway.


was (Author: tilman):
I added a simple test for the feature because it turns out we didn't have any. 
However, this isn't a test of the fixed bug: creating a file for that would 
have been more difficult, and there is no risk that this fix gets reverted anyway.


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882326#comment-17882326
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920739 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920739 ]

PDFBOX-5879: remove test message


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882325#comment-17882325
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920738 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920738 ]

PDFBOX-5879: remove test message


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882324#comment-17882324
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920737 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920737 ]

PDFBOX-5879: add test for rotationMagic


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882322#comment-17882322
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920736 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920736 ]

PDFBOX-5879: add test for rotationMagic


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882318#comment-17882318
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920735 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920735 ]

PDFBOX-5879: add test for rotationMagic


[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882299#comment-17882299
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920732 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920732 ]

PDFBOX-5879: remove unused import

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf
>
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic 
> -i="MVM_Aram_augusztus.pdf" {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be 
> cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject 
> and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>         at 
> org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>         at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>         at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>         at picocli.CommandLine.access$1500(CommandLine.java:148)
>         at 
> picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>         at 
> picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>         at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>         at picocli.CommandLine.execute(CommandLine.java:2174)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76) {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from 
> [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf],
>  and is also attached.
> The root cause appears to be this change: 
> [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
>  from PDFBOX-5841




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882298#comment-17882298
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920731 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920731 ]

PDFBOX-5879: remove unused import

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882297#comment-17882297
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920730 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920730 ]

PDFBOX-5879: remove unused import

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Resolved] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5879.
-
Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0
Assignee: Tilman Hausherr
Resolution: Fixed

Thank you. It's not the commit, it's poor programming that got exposed because 
of the commit.
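
For illustration only: the stack trace in the report shows a COSObject being cast to COSArray. The sketch below uses hypothetical stand-in classes (Base, Arr and Ref are assumptions made for this example, not the PDFBox API) to show how an unchecked cast on an indirect reference fails, and how resolving the reference and checking the type first avoids the exception. It is not the change that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class IndirectCastSketch {
    // Hypothetical stand-ins for a COS-like object model; NOT the PDFBox classes.
    static class Base {}

    // Stand-in for an array object.
    static class Arr extends Base {
        final List<Base> items = new ArrayList<>();
    }

    // Stand-in for an indirect reference that wraps another object.
    static class Ref extends Base {
        private final Base target;
        Ref(Base target) { this.target = target; }
        Base getObject() { return target; }
    }

    // Unsafe: assumes the entry is always a direct array.
    static Arr unsafeAsArray(Base entry) {
        return (Arr) entry; // throws ClassCastException when entry is a Ref
    }

    // Safer: resolve an indirect reference first, then check the type.
    // Returns null when the resolved entry is not an array.
    static Arr safeAsArray(Base entry) {
        Base resolved = entry instanceof Ref ? ((Ref) entry).getObject() : entry;
        return resolved instanceof Arr ? (Arr) resolved : null;
    }

    public static void main(String[] args) {
        Base indirect = new Ref(new Arr());
        try {
            unsafeAsArray(indirect);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
        System.out.println("safe lookup found array: " + (safeAsArray(indirect) != null));
    }
}
```

The safer variant trades an exception for a null result, so a caller still has to decide what to do when the entry is not an array.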

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Updated] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread Tilman Hausherr (Jira)



 [ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5879:

Affects Version/s: 2.0.32

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882284#comment-17882284
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920729 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920729 ]

PDFBOX-5879: avoid ClassCastException

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882282#comment-17882282
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920728 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920728 ]

PDFBOX-5879: avoid ClassCastException

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: MVM_Aram_augusztus.pdf




[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page

2024-09-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882281#comment-17882281
 ] 

ASF subversion and git services commented on PDFBOX-5879:
-

Commit 1920727 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920727 ]

PDFBOX-5879: avoid ClassCastException

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for 
> PDF with multiple content streams in a page
> --
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: MVM_Aram_augusztus.pdf



