[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887125#comment-17887125 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921134 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1921134 ] PDFBOX-5660: update junit, download-maven-plugin > Improve code quality (5) > > > Key: PDFBOX-5660 > URL: https://issues.apache.org/jira/browse/PDFBOX-5660 > Project: PDFBox > Issue Type: Improvement >Reporter: Tilman Hausherr >Priority: Minor > Attachments: AnnotationSample.Standard.pdf, > DRY_refactoring_Typ2CharStringParser.patch, > Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch, > > Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch, > Simplify_string_conversion_in_PDFHighlighter.patch, > Update_string_handling_and_regex_in_several_classes.patch, > avoid_multiple_unboxing.patch, code_cleanup.patch, > do_not_create_temporary_File_instance.patch, > extract_common_code,_move_toUpperCase()_out_of_loop.patch, > fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, > introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch, > introduce_StringUtil_class_for_reusable_functionality.patch, > introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch, > make_inner_class_static.patch, refactor_isEndOfName.patch, > remove_code_duplication_in_Type2CharStringParser.patch, > remove_obsolete_class_NullOutputStream.patch, > remove_unnecessary_calls_to_toString()_String_valueOf().patch, > replace_System_getProperty()_calls.patch, screenshot-1.png, > simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch, > simplify_stream_operations.patch, use_Map_ofEntries().patch, > use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, > 
use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch, > use_String_join().patch, use_switch_for_readability.patch, > use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch > > > This is a long-term issue for improving code quality, using the > SonarQube report, hints from different IDEs, the FindBugs tool and other code > quality tools. > This is a follow-up of PDFBOX-4892, which was getting too long. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
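Several of the attached patch names refer to standard JDK idioms. The Java sketch below illustrates a few of them: `Map.ofEntries()`, `Objects.equals()`, `String.join()`, `isEmpty()`, and the Java 9 try-with-resources form that names an existing variable. It is a hypothetical illustration only; the class `CodeQualityIdioms` and its methods are made up and are not taken from the actual patches or from PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical examples of the JDK idioms named in the patch list (Java 9+).
class CodeQualityIdioms {

    // Map.ofEntries() builds an immutable map in one expression,
    // instead of creating a HashMap and calling put() repeatedly.
    static final Map<String, Integer> WEIGHTS = Map.ofEntries(
            Map.entry("Regular", 400),
            Map.entry("Bold", 700));

    // Objects.equals() is null-safe on both sides, replacing
    // "a == null ? b == null : a.equals(b)".
    static boolean sameName(String a, String b) {
        return Objects.equals(a, b);
    }

    // String.join() replaces a StringBuilder loop with manual separator handling.
    static String joinNames(List<String> names) {
        return String.join(", ", names);
    }

    // isEmpty() states the intent more directly than size() == 0.
    static boolean hasNoNames(List<String> names) {
        return names.isEmpty();
    }

    // Since Java 9, an effectively final resource can be named in the try
    // header directly; it is still closed when the block exits.
    static int firstByte(InputStream in) throws IOException {
        try (in) {
            return in.read();
        }
    }
}
```

One caveat for this kind of cleanup: `Map.ofEntries()` rejects null keys, null values and duplicate keys, so it is only a drop-in replacement where the original map never relied on those.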
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887123#comment-17887123 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921133 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921133 ] PDFBOX-5660: update junit; add comment
[jira] [Resolved] (PDFBOX-5881) CVE for Lucene libraries
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5881. - Resolution: Fixed > CVE for Lucene libraries > > > Key: PDFBOX-5881 > URL: https://issues.apache.org/jira/browse/PDFBOX-5881 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Tilman Hausherr >Assignee: Tilman Hausherr >Priority: Minor > Fix For: 2.0.33, 3.0.4 PDFBox > > > It looks like Lucene won't release fixed versions of its older jar files for > CVE-2024-45772, so I'll add a suppression file.
[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887003#comment-17887003 ] ASF subversion and git services commented on PDFBOX-5881: - Commit 1921120 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1921120 ] PDFBOX-5881: add suppressions.xml file
[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887001#comment-17887001 ] ASF subversion and git services commented on PDFBOX-5881: - Commit 1921118 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921118 ] PDFBOX-5881: add comment
[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887002#comment-17887002 ] ASF subversion and git services commented on PDFBOX-5881: - Commit 1921119 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1921119 ] PDFBOX-5881: add comment
[jira] [Commented] (PDFBOX-5881) CVE for Lucene libraries
[ https://issues.apache.org/jira/browse/PDFBOX-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886999#comment-17886999 ] ASF subversion and git services commented on PDFBOX-5881: - Commit 1921117 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921117 ] PDFBOX-5881: add suppressions.xml file
[jira] [Created] (PDFBOX-5881) CVE for Lucene libraries
Tilman Hausherr created PDFBOX-5881: --- Summary: CVE for Lucene libraries Key: PDFBOX-5881 URL: https://issues.apache.org/jira/browse/PDFBOX-5881 Project: PDFBox Issue Type: Bug Affects Versions: 3.0.3 PDFBox, 2.0.32 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 2.0.33, 3.0.4 PDFBox It looks like Lucene won't release fixed versions of its older jar files for CVE-2024-45772, so I'll add a suppression file.
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886984#comment-17886984 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921114 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921114 ] PDFBOX-5660: update lucene
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886939#comment-17886939 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921107 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1921107 ] PDFBOX-5660: update lucene
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886930#comment-17886930 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921106 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921106 ] PDFBOX-5660: update lucene
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886928#comment-17886928 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921105 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1921105 ] PDFBOX-5660: update lucene
[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886738#comment-17886738 ] Andreas Lehmkühler commented on PDFBOX-4718: I expected that. I'll have a look... thanks again for the fast feedback > OutOfMemoryError - during renderImageWithDPI > > > Key: PDFBOX-4718 > URL: https://issues.apache.org/jira/browse/PDFBOX-4718 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.17, 3.0.3 PDFBox, 4.0.0 > Environment: macOS Mojave (10.14.6) > Java 11.0.2 -Xmx10G -Xms10G >Reporter: Serhii Kolesnyk >Assignee: Andreas Lehmkühler >Priority: Blocker > Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0 > > Attachments: PDFBOX-4718-reduced.pdf, PDFBox4718Intersect.java, > example.pdf, image-2019-12-19-05-55-57-648.png > > > While rendering the PDF we receive _java.lang.OutOfMemoryError: Java heap space_ > {code:java} > Exception in thread "AWT-Shutdown" java.lang.OutOfMemoryError: Java heap > space Exception in thread "AWT-Shutdown" java.lang.OutOfMemoryError: Java heap > space at java.desktop/sun.awt.AppContext.getAppContexts(AppContext.java:167) > at > java.desktop/sun.awt.AppContext.stopEventDispatchThreads(AppContext.java:610) > at java.desktop/sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:322) at > java.base/java.lang.Thread.run(Thread.java:834) > java.lang.OutOfMemoryError: Java heap space > at java.desktop/sun.awt.geom.AreaOp.pruneEdges(AreaOp.java:362) at > java.desktop/sun.awt.geom.AreaOp.calculate(AreaOp.java:159) at > java.desktop/java.awt.geom.Area.intersect(Area.java:293) at > org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.intersectClippingPath(PDGraphicsState.java:618) > at > org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.intersectClippingPath(PDGraphicsState.java:597) > at org.apache.pdfbox.rendering.PageDrawer.endPath(PageDrawer.java:936) at > org.apache.pdfbox.contentstream.operator.graphics.EndPath.process(EndPath.java:35) > at > 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:869) > at > org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:505) > at > org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:479) > at > org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:152) > at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:262) at > org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:314) at > org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243) at > org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:229){code} > We tried different MemoryUsageSetting options (TempFileOnly, > MainMemoryOnly) and different DPI settings.
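The trace shows the heap being exhausted inside `java.awt.geom.Area.intersect()`, called from `PDGraphicsState.intersectClippingPath()`. As a rough sketch of one general mitigation (not necessarily what the PDFBOX-4718 changes do), a clip intersection can skip `Area` entirely when both shapes are axis-aligned rectangles, because their intersection is plain arithmetic. The class `ClipUtil` and method `intersectClip` are hypothetical, not PDFBox API. Real PDF clipping paths usually arrive as general paths, so an actual implementation would first need to detect paths that are really rectangles.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not PDFBox API: intersect the current clip with a new path.
class ClipUtil {

    static Shape intersectClip(Shape current, Shape next) {
        if (current instanceof Rectangle2D && next instanceof Rectangle2D) {
            // Fast path: rectangle intersection is four min/max operations.
            // If the rectangles do not overlap, the result has a negative
            // width or height and isEmpty() returns true.
            Rectangle2D result = new Rectangle2D.Double();
            Rectangle2D.intersect((Rectangle2D) current, (Rectangle2D) next, result);
            return result;
        }
        // General case: Area performs exact boolean operations on the outlines,
        // which can be costly in time and memory for complex paths; this is the
        // call where the reported stack trace ran out of heap.
        Area area = new Area(current);
        area.intersect(new Area(next));
        return area;
    }
}
```

The fallback keeps results exact for arbitrary shapes, so only the common rectangle-on-rectangle case avoids `Area`; whether that helps a given PDF depends on how many of its clipping paths are rectangular.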
[jira] [Comment Edited] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737 ] Tilman Hausherr edited comment on PDFBOX-4718 at 10/3/24 5:39 PM: -- Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW logo missing), PDFBOX-3116.pdf (half-circles bottom right) was (Author: tilman): Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW logo missing), PDFBOX-3116.pdf (circles bottom right)
[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886737#comment-17886737 ] Tilman Hausherr commented on PDFBOX-4718: - Sadly some differences in rendering: PDFBOX-2557, PDFBOX-3182, PDFBOX-5842 (VW logo missing), PDFBOX-3116.pdf (circles bottom right)
[jira] [Commented] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886734#comment-17886734 ] Andreas Lehmkühler commented on PDFBOX-4743: My changes from PDFBOX-4718 speed up rendering by a factor of 3. On my machine it takes about 14-15 seconds to render [^slow_rendering.pdf] > Long rendering time of fonts in a specific PDF > -- > > Key: PDFBOX-4743 > URL: https://issues.apache.org/jira/browse/PDFBOX-4743 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.32, 3.0.3 PDFBox, 4.0.0 > Environment: Gentoo Linux, Java 8 >Reporter: Daniel Persson >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0 > > Attachments: image-2020-01-18-04-11-00-132.png, slow_rendering.pdf, > without_images.pdf, without_text.pdf > > > Hi Team. > > We have found a PDF that takes a long time to render to images. > > After some checking, we found that a single page takes more than 2 minutes to > render, but if we remove the font information and render the PDF without > text, it takes 3 seconds. > > The font information itself doesn't seem to be a lot of data: > 3-5 KB per font, and only about seven fonts are defined. So something else > must be complicating things. > > Best regards > Daniel
[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-4743:
---------------------------------------
Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0
[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-4718:
---------------------------------------
Fix Version/s: 2.0.33, 3.0.4 PDFBox
[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-4718:
---------------------------------------
Affects Version/s: 3.0.3 PDFBox, 4.0.0 (was: 2.0.12)
[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-4743:
---------------------------------------
Affects Version/s: 3.0.3 PDFBox, 2.0.32, 4.0.0
[jira] [Assigned] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-4743:
------------------------------------------
Assignee: Andreas Lehmkühler
[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886732#comment-17886732 ]

Andreas Lehmkühler commented on PDFBOX-4718:
--------------------------------------------

I've found a workaround so that the attached PDF is rendered in about 4-5 seconds. Some details:
* I've calculated an intersected overall bounding box from all clipping paths.
* The overall bounding box is used as a starting point for the calculation of the intersected clipping path. This can reduce the complexity in some cases, so that the call of Area.intersect needs fewer resources and less time.
* Clipping paths which represent a rectangle are skipped, as they were already taken into account when calculating the intersected overall bounding box.
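The three steps of this workaround can be sketched with the JDK's `java.awt.geom` classes. This is a hypothetical, simplified illustration, not the code from the actual PDFBox commit: the class `ClipIntersect` and the method `intersectAll` are made up, the clipping paths are assumed to be plain `Shape`s in one coordinate space, and "represents a rectangle" is assumed to mean `Area.isRectangular()`. The bounding boxes are taken from the `Area` objects rather than the raw shapes, so that a rectangular area always coincides with its own bounding box.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the clipping-path optimization described above; not the PDFBox code.
final class ClipIntersect {

    private ClipIntersect() {
    }

    /** Returns the intersection of all clipping shapes (the list must not be empty). */
    static Area intersectAll(List<? extends Shape> clips) {
        if (clips.isEmpty()) {
            throw new IllegalArgumentException("at least one clipping path is required");
        }
        List<Area> areas = new ArrayList<>(clips.size());
        for (Shape clip : clips) {
            areas.add(new Area(clip));
        }
        // Step 1: intersect the bounding boxes of all clipping paths (cheap rectangle math).
        Rectangle2D bounds = areas.get(0).getBounds2D();
        for (Area area : areas) {
            bounds = bounds.createIntersection(area.getBounds2D());
        }
        if (bounds.isEmpty()) {
            return new Area(); // the clipping paths do not overlap, so nothing is visible
        }
        // Step 2: start from the overall bounding box instead of the first clipping path.
        Area result = new Area(bounds);
        for (Area area : areas) {
            // Step 3: an axis-aligned rectangle coincides with its own bounding box,
            // which step 1 has already applied, so the costly intersect is skipped.
            if (!area.isRectangular()) {
                result.intersect(area);
            }
        }
        return result;
    }
}
```

Because the intersection of all clipping paths always lies inside the intersection of their bounding boxes, starting from that box does not change the result; it only shrinks the region each remaining `Area.intersect` call has to work on.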
[jira] [Commented] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886731#comment-17886731 ]

ASF subversion and git services commented on PDFBOX-4718:
----------------------------------------------------------

Commit 1921096 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921096 ]

PDFBOX-4718: optimize intersection of clipping paths
[jira] [Updated] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-4718:
---------------------------------------
Fix Version/s: 4.0.0
[jira] [Assigned] (PDFBOX-4718) OutOfMemoryError - during renderImageWithDPI
[ https://issues.apache.org/jira/browse/PDFBOX-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-4718:
------------------------------------------
Assignee: Andreas Lehmkühler
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886594#comment-17886594 ]

ASF subversion and git services commented on PDFBOX-5660:
----------------------------------------------------------

Commit 1921093 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921093 ]

PDFBOX-5660: update mockito
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885808#comment-17885808 ]

ASF subversion and git services commented on PDFBOX-5660:
----------------------------------------------------------

Commit 1921026 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1921026 ]

PDFBOX-5660: update log4j
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885807#comment-17885807 ]

ASF subversion and git services commented on PDFBOX-5660:
----------------------------------------------------------

Commit 1921025 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1921025 ]

PDFBOX-5660: update log4j
[jira] [Updated] (PDFBOX-5876) This jpeg2000 takes up a lot of memory, causing overflow.
[ https://issues.apache.org/jira/browse/PDFBOX-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5876:
---------------------------------------
Fix Version/s: (was: 4.0.0)
               (was: 2.0.33)
               (was: 3.0.4 PDFBox)

> This jpeg2000 takes up a lot of memory, causing overflow.
> ---------------------------------------------------------
>
> Key: PDFBOX-5876
> URL: https://issues.apache.org/jira/browse/PDFBOX-5876
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.32, 3.0.2 PDFBox
> Reporter: liu
> Assignee: Tilman Hausherr
> Priority: Major
> Attachments: jpeg2000.pdf
>
>
> pdf: [^jpeg2000.pdf]
> JVM: -Xmx600m
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.io.IOUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public class Jpeg2000Repro { // class name added for illustration
>     public static void main(String[] args) throws IOException {
>         File file = new File("C:\\Users\\LYCIT\\Downloads\\jpeg2000.pdf");
>         // try-with-resources closes the document even if rendering fails
>         try (PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache())) {
>             PDFRenderer renderer = new PDFRenderer(pdf);
>             int pageIndex = 0; // first page
>             renderer.setSubsamplingAllowed(true);
>             BufferedImage bi = renderer.renderImage(pageIndex, 0.5f);
>         }
>     }
> }
> {code}
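The reporter's snippet enables `setSubsamplingAllowed(true)`. In general terms, subsampling means the image decoder delivers only every n-th pixel in each direction, so the decoded raster needs roughly 1/n² of the memory; the decoder's own working buffers may not shrink accordingly. The JDK-only sketch below demonstrates the general mechanism with `ImageReadParam.setSourceSubsampling` on a PNG. The helper name `SubsampledDecode` is hypothetical, and whether PDFBox's JPEG 2000 path hands subsampling to the installed ImageIO plugin in exactly this way is an assumption, not something stated in the issue.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper showing ImageIO source subsampling; not PDFBox code.
final class SubsampledDecode {

    private SubsampledDecode() {
    }

    /** Decodes only every n-th column and row, so the result has roughly 1/(n*n) of the pixels. */
    static BufferedImage readSubsampled(byte[] encoded, int n) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("no ImageIO reader available for this image format");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                // keep pixel (0, 0), then every n-th pixel horizontally and vertically
                param.setSourceSubsampling(n, n, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

For a 100 x 80 source and n = 4, the decoded image is 25 x 20 pixels, i.e. one sixteenth of the original raster.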
[jira] [Resolved] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected e
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-5880. Resolution: Fixed The given pdf is a corner case and it works now. [~tilman] thanks again for your help [~jezerinac] thanks for the report > PDF render blank page: The end of the stream doesn't point to the correct > offset, using workaround to read the stream, stream start position: 196, > length: 0, expected end position: 196 > > > Key: PDFBOX-5880 > URL: https://issues.apache.org/jira/browse/PDFBOX-5880 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Joseph Jezerinac >Assignee: Andreas Lehmkühler >Priority: Major > Labels: regression > Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0 > > Attachments: PDFBOX-1094-PDFBOX-269.pdf, test.pdf > > > When rendering page one of the attached PDF the image does not render. > In the logs, I see the following: > {noformat} > 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't > point to the correct offset, using workaround to read the stream, stream > start position: 196, length: 0, expected end position: 196 > 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty > java.io.IOException: Image stream is empty > at > org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) > at > org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) > {noformat} > I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an > issue. 
> Here's the render code used: > {code:java} > File out = File.createTempFile("test-", ".png"); > PDDocument pdDocument = Loader.loadPDF(pdf); > final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); > ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); > {code}
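The warning states the arithmetic directly: expected end position = stream start position + /Length, here 196 + 0 = 196. The sketch below shows a check of that kind. It assumes, for illustration only, that a declared length is trusted when the `endstream` keyword begins at that position, optionally preceded by CR/LF bytes. `StreamLengthCheck` is a hypothetical class, not PDFBox's `COSParser` code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the /Length check behind the COSParser warning;
// not PDFBox's actual code.
final class StreamLengthCheck {

    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    // Where the data should end according to the declared /Length
    static long expectedEnd(long start, long length) {
        return start + length;
    }

    // True if "endstream" begins at pos, after skipping optional CR/LF bytes
    static boolean endstreamAt(byte[] pdf, int pos) {
        if (pos < 0) {
            return false;
        }
        while (pos < pdf.length && (pdf[pos] == '\r' || pdf[pos] == '\n')) {
            pos++;
        }
        if (pos + ENDSTREAM.length > pdf.length) {
            return false;
        }
        for (int i = 0; i < ENDSTREAM.length; i++) {
            if (pdf[pos + i] != ENDSTREAM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(expectedEnd(196, 0)); // 196, as in the log
        byte[] pdf = "stream\nABCDEF\nendstream".getBytes(StandardCharsets.US_ASCII);
        int start = 7; // first byte after "stream\n"
        System.out.println(endstreamAt(pdf, (int) expectedEnd(start, 0))); // false: length 0 is wrong
        System.out.println(endstreamAt(pdf, (int) expectedEnd(start, 6))); // true
    }
}
```

With a declared length of 0, the expected end coincides with the first data byte, so the keyword is not found and the parser has to fall back to a workaround.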
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885689#comment-17885689 ] ASF subversion and git services commented on PDFBOX-5880: - Commit 1921020 from le...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1921020 ] PDFBOX-5880: set missing/replace invalid stream length
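Commit r1921020 describes setting a missing /Length or replacing an invalid one. The sketch below shows one way such a repair could work. It rests on three assumptions. First, the stream data runs from the start offset to the next `endstream` keyword. Second, a single trailing EOL (CR, LF or CRLF) before the keyword is not part of the data. Third, the stream dictionary can be modelled as a plain `Map`. The class and method names are hypothetical, and PDFBox's implementation may differ. A naive scan like this can also be fooled if binary stream data happens to contain the bytes `endstream`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "set missing/replace invalid stream length";
// not the PDFBox implementation.
final class StreamLengthRepair {

    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    // Index of the first occurrence of key at or after from, or -1
    static int indexOf(byte[] data, byte[] key, int from) {
        for (int i = from; i <= data.length - key.length; i++) {
            int j = 0;
            while (j < key.length && data[i + j] == key[j]) {
                j++;
            }
            if (j == key.length) {
                return i;
            }
        }
        return -1;
    }

    // Bytes between start and "endstream", excluding one trailing EOL; -1 if not found
    static long scanLength(byte[] pdf, int start) {
        int end = indexOf(pdf, ENDSTREAM, start);
        if (end < 0) {
            return -1;
        }
        if (end > start && pdf[end - 1] == '\n') {
            end--;
        }
        if (end > start && pdf[end - 1] == '\r') {
            end--;
        }
        return end - start;
    }

    // Write the scanned length only if /Length is missing or differs from it
    static void setOrReplaceLength(Map<String, Long> dict, byte[] pdf, int start) {
        long actual = scanLength(pdf, start);
        Long declared = dict.get("Length");
        if (actual >= 0 && (declared == null || declared != actual)) {
            dict.put("Length", actual);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = "stream\nABCDEF\nendstream".getBytes(StandardCharsets.US_ASCII);
        Map<String, Long> dict = new HashMap<>();
        dict.put("Length", 0L);           // invalid declared length, as in the report
        setOrReplaceLength(dict, pdf, 7); // data starts after "stream\n"
        System.out.println(dict);         // {Length=6}
    }
}
```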
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5880: --- Fix Version/s: 4.0.0
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5880: --- Fix Version/s: 2.0.33
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885687#comment-17885687 ] ASF subversion and git services commented on PDFBOX-5880: - Commit 1921019 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1921019 ] PDFBOX-5880: don't restore invalid stream length values <= 0, mark stream length values <= as invalid
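The first half of the r1921019 commit message says that stream length values <= 0 are not restored. The sketch below assumes the rule is simply "a declared /Length is usable only if it is positive". The bounds check against the file size is an added, hypothetical assumption, as are the class name and the fallback to a scanned length. Note that a length of 0 is syntactically legal for an empty stream, so treating it as invalid is a robustness heuristic rather than a requirement of the PDF format.

```java
// Hypothetical sketch of treating non-positive /Length values as invalid;
// the bounds check against the file size is an added assumption.
final class LengthValidity {

    // A declared length is trusted only if it is positive and stays inside the file
    static boolean isUsable(long start, long declaredLength, long fileSize) {
        return declaredLength > 0 && start + declaredLength <= fileSize;
    }

    // Fall back to a length found by scanning (e.g. for "endstream") when the
    // declared one is unusable; scanned may be -1 if no keyword was found
    static long effectiveLength(long start, long declared, long scanned, long fileSize) {
        return isUsable(start, declared, fileSize) ? declared : scanned;
    }

    public static void main(String[] args) {
        // values from the report: start 196, declared length 0; file size is hypothetical
        System.out.println(isUsable(196, 0, 1_000));            // false
        System.out.println(effectiveLength(196, 0, 42, 1_000)); // 42 (hypothetical scanned length)
    }
}
```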
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885635#comment-17885635 ] Tilman Hausherr commented on PDFBOX-5880: - Now it works!
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5880: --- Fix Version/s: 3.0.4 PDFBox
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885621#comment-17885621 ] Andreas Lehmkühler commented on PDFBOX-5880: It should work again. I've mixed up the logic in validateStreamLength so that the pointer into the source wasn't reset to the origin offset in one case
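The bug described here is a common peek-ahead pitfall. Code that moves a read position to inspect later bytes must restore that position on every exit path. The sketch below shows the pattern, using `java.nio.ByteBuffer` as a stand-in for PDFBox's random-access source. `PeekAndRestore` and `endstreamFollows` are hypothetical names, not PDFBox's `validateStreamLength`. Putting the restore in a `finally` block is one way to make "always seek to the origin offset" hold by construction, including on early returns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical peek-and-restore sketch; not PDFBox's validateStreamLength.
final class PeekAndRestore {

    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    // True if "endstream" (after optional CR/LF) starts at origin + length.
    // The buffer position is restored on every path, including early returns.
    static boolean endstreamFollows(ByteBuffer source, int length) {
        int origin = source.position();
        try {
            int target = origin + length;
            if (length < 0 || target > source.limit()) {
                return false;
            }
            source.position(target);
            while (source.hasRemaining()
                    && (source.get(source.position()) == '\r' || source.get(source.position()) == '\n')) {
                source.get(); // skip an end-of-line byte
            }
            if (source.remaining() < ENDSTREAM.length) {
                return false;
            }
            for (byte b : ENDSTREAM) {
                if (source.get() != b) {
                    return false;
                }
            }
            return true;
        } finally {
            source.position(origin); // always seek back to the origin offset
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("stream\nABCDEF\nendstream".getBytes(StandardCharsets.US_ASCII));
        buf.position(7); // start of the stream data
        System.out.println(endstreamFollows(buf, 0) + " " + buf.position()); // false 7
        System.out.println(endstreamFollows(buf, 6) + " " + buf.position()); // true 7
    }
}
```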
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885620#comment-17885620 ] ASF subversion and git services commented on PDFBOX-5880: - Commit 1921011 from le...@apache.org in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1921011 ] PDFBOX-5880: always seek to the origin offset, set/replace length value only if needed
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885525#comment-17885525 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1921003 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1921003 ] PDFBOX-5660: update mockito
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885428#comment-17885428 ] Andreas Lehmkühler commented on PDFBOX-5880: Thanks for the pointer, I'm going to look into it
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Attachment: PDFBOX-1094-PDFBOX-269.pdf
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885251#comment-17885251 ] Tilman Hausherr commented on PDFBOX-5880: - Several differences, e.g. [^PDFBOX-1094-PDFBOX-269.pdf] page 2ff, the light background is different. Also the file of PDFBOX-1738.
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885226#comment-17885226 ] Andreas Lehmkühler commented on PDFBOX-5880: [~tilman] that's a good idea. but I'd prefer to do so in the COSParser so that the context don't get lost. > PDF render blank page: The end of the stream doesn't point to the correct > offset, using workaround to read the stream, stream start position: 196, > length: 0, expected end position: 196 > > > Key: PDFBOX-5880 > URL: https://issues.apache.org/jira/browse/PDFBOX-5880 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Joseph Jezerinac >Assignee: Andreas Lehmkühler >Priority: Major > Labels: regression > Attachments: test.pdf > > > When rendering page one of the attached PDF the image does not render. > In the logs, I see the following: > {noformat} > 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't > point to the correct offset, using workaround to read the stream, stream > start position: 196, length: 0, expected end position: 196 > 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty > java.io.IOException: Image stream is empty > at > org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) > at > org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) > {noformat} > I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an > issue. 
> Here's the render code used: > {code:java} > File out = File.createTempFile("test-", ".png"); > PDDocument pdDocument = Loader.loadPDF(pdf); > final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); > ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); > {code}
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885227#comment-17885227 ] ASF subversion and git services commented on PDFBOX-5880: - Commit 1920970 from le...@apache.org in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920970 ] PDFBOX-5880: set missing/replace invalid stream length
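The warning in this report ("stream start position: 196, length: 0, expected end position: 196") describes a stream whose declared /Length does not land on the `endstream` keyword. A minimal sketch of the general recovery idea behind "set missing/replace invalid stream length", assuming the whole file is already in a byte array: keep the declared length if it is followed by an optional end-of-line marker and `endstream`, otherwise scan forward for the keyword. The class and method names are hypothetical; this is not PDFBox's COSParser code.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration, not PDFBox's COSParser: decide whether a
 * stream's declared /Length is usable and, if not, recover the data length
 * by scanning forward for the "endstream" keyword.
 */
public class StreamLengthRecovery {

    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    /** True if dataStart + declaredLength is followed by an optional EOL and "endstream". */
    public static boolean isPlausibleLength(byte[] pdf, int dataStart, int declaredLength) {
        if (declaredLength < 0 || dataStart + declaredLength > pdf.length) {
            return false;
        }
        int pos = dataStart + declaredLength;
        // The EOL between the data and "endstream" is not counted in /Length.
        if (pos < pdf.length && pdf[pos] == '\r') pos++;
        if (pos < pdf.length && pdf[pos] == '\n') pos++;
        return matchesAt(pdf, pos, ENDSTREAM);
    }

    /** Scans for the first "endstream" after dataStart; returns the data length or -1. */
    public static int recoverLength(byte[] pdf, int dataStart) {
        for (int i = dataStart; i + ENDSTREAM.length <= pdf.length; i++) {
            if (matchesAt(pdf, i, ENDSTREAM)) {
                int end = i;
                // Strip a preceding LF, CR or CRLF end-of-line marker.
                if (end > dataStart && pdf[end - 1] == '\n') end--;
                if (end > dataStart && pdf[end - 1] == '\r') end--;
                return end - dataStart;
            }
        }
        return -1;
    }

    /** Keeps a valid declared length, otherwise falls back to scanning. */
    public static int effectiveLength(byte[] pdf, int dataStart, int declaredLength) {
        return isPlausibleLength(pdf, dataStart, declaredLength)
                ? declaredLength
                : recoverLength(pdf, dataStart);
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (offset < 0 || offset + pattern.length > data.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "stream\r\nABCDEFGH\r\nendstream"
                .getBytes(StandardCharsets.US_ASCII);
        int dataStart = "stream\r\n".length();
        // A declared /Length of 0 does not land on "endstream", so it is replaced.
        System.out.println(effectiveLength(pdf, dataStart, 0)); // prints 8
    }
}
```

A naive forward scan like this can stop too early if binary stream data happens to contain the bytes `endstream`, which is one reason a real parser prefers a valid /Length and only falls back to scanning.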
[jira] [Assigned] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected e
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reassigned PDFBOX-5880: -- Assignee: Andreas Lehmkühler
[jira] [Resolved] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-5852. Resolution: Fixed. I guess we are done here. [~larry.l...@workiva.com], thanks for the report and the sample PDF. [~tilman], thanks for your input and help. > Hi CPU and memory usage when converting a PDF with type 4 shading > - > > Key: PDFBOX-5852 > URL: https://issues.apache.org/jira/browse/PDFBOX-5852 > Project: PDFBox > Issue Type: Wish > Components: Rendering >Affects Versions: 2.0.28 >Reporter: Larry Lynn >Assignee: Andreas Lehmkühler >Priority: Major > Fix For: 2.0.33, 4.0.0, 3.0.3 PDFBox > > Attachments: CIB-coonsmesh.pdf, minimal.pdf > > > We've observed excessive CPU and memory consumption when converting a PDF to > images when the PDF contains type 4 shading. This is especially noticeable > when the conversion is done with a high DPI. Can this be improved? > > Conversation from the PDFBox users mailing list follows > Initial email: > {quote} > Hi CPU and memory usage when converting a PDF with type 4 shading > Hello PDFBox > users and maintainers, > We have a PDF that causes performance problems when we use PDFBox to > convert it to an image with renderImageWithDPI(). We're calling > renderImageWithDPI() > with 650 DPI. I realize this is a very high value - we're using it for > high fidelity original images that will later be downsampled. On my work > laptop which has fairly strong hardware, the conversion takes 25 minutes > and consumes 20GB of memory. CPU and memory usage is reduced if we use a > lower DPI. > The PDF is 1 page long. It contains type 4 shading / Gouraud free form > triangle meshes. We've been aware of some performance issues with type 4 > shading for a little while now, but the PDFs that contained the type 4 > shading belonged to our customers and we were not authorized to share > them. 
We finally found a problem input document that is non-sensitive and > that we are authorized to share. I've attached a copy of the problem PDF > to this email. > I searched the archives for the users and the developers mailing list and I > didn't find anything specifically about this issue. > I searched through the PDFBox jira tickets and I found a couple of tickets > that looked similar: PDFBOX-2901 & PDFBOX-4491. PDFBOX-2901 seems to most > closely describe what we're seeing, but that was closed in PDFBox 2.0.0, > and our issue still reproduces with PDFBox 2.0.28. > Should I refer this issue over to the developers mailing list or create a > PDFBox Jira ticket for this? > Thanks and Regards, > Larry Lynn {quote} > Response: > {quote} > Hi, > Yes shading can be very slow, especially at high dpi. The attachment > didn't get through, please upload to a sharehoster or create a ticket. > If you need to register then add a meaningful text, e.g. the subject of > this post so we know you're not a spammer. Also retry with 2.0.31 and > 3.0.2 just to be sure. However I'm pessimistic that this can be fixed. > Tilman {quote}
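Part of why shading gets so slow at high DPI is simple arithmetic: PDF user space has 72 units per inch by default, so rendering at D DPI scales each page side by D/72 and the pixel count grows with D squared. A small sketch of that estimate, assuming a US Letter page (612 × 792 pt; the reporter's page size is not stated) and a 4-byte ARGB raster; the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate of raster size versus DPI. */
public class RenderSizeEstimate {

    static final double POINTS_PER_INCH = 72.0;

    /** Pixel extent of one page side given in PDF points, rendered at dpi (floored). */
    public static int pixels(double points, double dpi) {
        return (int) Math.floor(points * dpi / POINTS_PER_INCH);
    }

    /** Bytes for one 4-byte-per-pixel (ARGB int) raster covering the whole page. */
    public static long argbBytes(double widthPt, double heightPt, double dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // Assumption: US Letter, 8.5 x 11 in = 612 x 792 pt.
        for (double dpi : new double[] {72, 300, 650}) {
            System.out.printf("%4.0f dpi: %d x %d px, %,d bytes%n",
                    dpi, pixels(612, dpi), pixels(792, dpi), argbBytes(612, 792, dpi));
        }
    }
}
```

Under these assumptions, 650 DPI gives 5525 × 7150 px, roughly 158 MB for the final ARGB image alone and about 4.7 times the pixel count at 300 DPI. Any per-pixel work or per-pixel bookkeeping done by the shading code scales the same way.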
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884871#comment-17884871 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920945 from le...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920945 ] PDFBOX-5852: don't create an unused array
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884869#comment-17884869 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920943 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920943 ] PDFBOX-5852: don't create an unused array
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884870#comment-17884870 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920944 from le...@apache.org in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920944 ] PDFBOX-5852: don't create an unused array
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884866#comment-17884866 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920942 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920942 ] PDFBOX-5852: replace Integer with int, add some minor optimizations
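The "replace Integer with int" commits do not show their diff here. A hypothetical sketch of why that kind of change matters for a per-pixel color table: a `Map<Point, Integer>` allocates a `Point` key, an `Integer` value and a map entry for every pixel, while a flat `int[]` indexed by `y * width + x` stores just 4 bytes per pixel with no per-pixel objects. This is an illustration under those assumptions, not the actual PDFBox data structure; `colorAt` is a made-up placeholder.

```java
import java.awt.Point;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of a boxed and a primitive per-pixel color table. */
public class PixelTableSketch {

    /** Boxed variant: a Point key, an Integer value and a map entry per pixel. */
    public static Map<Point, Integer> boxedTable(int width, int height) {
        Map<Point, Integer> table = new HashMap<>();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                table.put(new Point(x, y), colorAt(x, y)); // the int is autoboxed here
            }
        }
        return table;
    }

    /** Primitive variant: one flat int[] indexed by y * width + x. */
    public static int[] primitiveTable(int width, int height) {
        int[] table = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                table[y * width + x] = colorAt(x, y);
            }
        }
        return table;
    }

    /** Placeholder color function: opaque ARGB with red = x and green = y (low 8 bits). */
    public static int colorAt(int x, int y) {
        return 0xFF000000 | ((x & 0xFF) << 16) | ((y & 0xFF) << 8);
    }

    public static void main(String[] args) {
        int w = 3, h = 2;
        Map<Point, Integer> boxed = boxedTable(w, h);
        int[] flat = primitiveTable(w, h);
        // Both tables hold the same colors; only the representation differs.
        System.out.println(boxed.get(new Point(2, 1)) == flat[1 * w + 2]); // prints true
    }
}
```

At millions of pixels the boxed form costs several heap objects per pixel plus garbage-collection pressure, which fits the kind of memory growth described in the report, though the real hot spots in PDFBox may differ.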
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884864#comment-17884864 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920941 from le...@apache.org in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920941 ] PDFBOX-5852: replace Integer with int, add some minor optimizations
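The shading these commits optimize is type 4, a free-form Gouraud-shaded triangle mesh: each vertex carries a color, and the renderer blends those colors across every triangle. A minimal sketch of that per-pixel work for a single triangle, using barycentric weights at pixel centres and writing into a primitive `int[]` ARGB raster. It assumes vertex colors are already RGB in [0, 1] (real type 4 shadings may use other color spaces or a /Function) and is not PDFBox's implementation; all names are hypothetical.

```java
/**
 * Hypothetical sketch of Gouraud shading for one triangle (not PDFBox's
 * implementation): every pixel whose centre lies inside the triangle gets
 * the barycentric blend of the three vertex colors.
 */
public class GouraudTriangleSketch {

    /**
     * raster: ARGB pixels, row-major, width * height entries;
     * xs, ys: vertex coordinates in device pixels;
     * rgb: per-vertex {r, g, b} with components in [0, 1].
     */
    public static void fillTriangle(int[] raster, int width, int height,
                                    double[] xs, double[] ys, double[][] rgb) {
        double area = edge(xs[0], ys[0], xs[1], ys[1], xs[2], ys[2]);
        if (area == 0) {
            return; // degenerate triangle covers no pixels
        }
        // Only visit the triangle's bounding box, clipped to the raster.
        int minX = Math.max(0, (int) Math.floor(min3(xs)));
        int maxX = Math.min(width - 1, (int) Math.ceil(max3(xs)));
        int minY = Math.max(0, (int) Math.floor(min3(ys)));
        int maxY = Math.min(height - 1, (int) Math.ceil(max3(ys)));
        for (int y = minY; y <= maxY; y++) {
            for (int x = minX; x <= maxX; x++) {
                double px = x + 0.5, py = y + 0.5; // sample at the pixel centre
                double w0 = edge(xs[1], ys[1], xs[2], ys[2], px, py) / area;
                double w1 = edge(xs[2], ys[2], xs[0], ys[0], px, py) / area;
                double w2 = 1.0 - w0 - w1;
                if (w0 < 0 || w1 < 0 || w2 < 0) {
                    continue; // centre is outside the triangle
                }
                int argb = 0xFF000000;
                for (int c = 0; c < 3; c++) {
                    double v = w0 * rgb[0][c] + w1 * rgb[1][c] + w2 * rgb[2][c];
                    int channel = (int) Math.round(Math.max(0.0, Math.min(1.0, v)) * 255);
                    argb |= channel << (16 - 8 * c); // R, G, B
                }
                raster[y * width + x] = argb;
            }
        }
    }

    /** Twice the signed area of the triangle (a, b, p). */
    static double edge(double ax, double ay, double bx, double by, double px, double py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    static double min3(double[] v) { return Math.min(v[0], Math.min(v[1], v[2])); }

    static double max3(double[] v) { return Math.max(v[0], Math.max(v[1], v[2])); }

    public static void main(String[] args) {
        int w = 4, h = 4;
        int[] raster = new int[w * h];
        double[][] colors = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}; // red, green, blue
        fillTriangle(raster, w, h, new double[] {0, 4, 0}, new double[] {0, 0, 4}, colors);
        System.out.printf("%08X%n", raster[0]); // prints FFBF2020
    }
}
```

The work per triangle is proportional to its bounding box in device pixels, so a mesh with many triangles rendered at 650 DPI does far more of these inner-loop iterations than at 72 or 300 DPI.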
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884739#comment-17884739 ] Tilman Hausherr commented on PDFBOX-5852: - All good now, thanks!
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884694#comment-17884694 ] Andreas Lehmkühler commented on PDFBOX-5852: That was an easy fix ;-) The implementations of {{calcPixelTableArray}} weren't in line. I forgot to add an offset of one in one case
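A missing "offset of one" followed by an ArrayIndexOutOfBoundsException (fixed in commit 1920923) is the classic inclusive-range off-by-one: a table covering pixels min..max inclusive needs max - min + 1 slots. The sketch below illustrates that general pattern only. The real `calcPixelTableArray` is not shown in this thread, and every name here is hypothetical.

```java
/**
 * Hypothetical illustration of the off-by-one behind an
 * ArrayIndexOutOfBoundsException: a table covering the inclusive pixel range
 * [min, max] needs max - min + 1 slots, not max - min.
 */
public class InclusiveBoundsTable {

    /** Number of slots for the inclusive integer range [min, max]. */
    public static int slotCount(int min, int max) {
        return max - min + 1; // both endpoints are valid positions
    }

    /** Row-major table covering the inclusive box [minX, maxX] x [minY, maxY]. */
    public static int[][] allocate(int minX, int maxX, int minY, int maxY) {
        return new int[slotCount(minY, maxY)][slotCount(minX, maxX)];
    }

    /** Stores a value at absolute pixel (x, y), relative to the box origin. */
    public static void put(int[][] table, int minX, int minY, int x, int y, int value) {
        table[y - minY][x - minX] = value;
    }

    public static void main(String[] args) {
        int[][] table = allocate(10, 12, 5, 5); // 1 row, 3 columns
        put(table, 10, 5, 12, 5, 42);           // x == maxX must still fit
        System.out.println(table[0][2]);        // prints 42

        int[] tooSmall = new int[12 - 10];      // forgetting the "+ 1"
        try {
            tooSmall[12 - 10] = 42;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("off by one: " + e.getMessage());
        }
    }
}
```

When two code paths compute the same table size, as Andreas describes, keeping the "+ 1" in one shared helper such as `slotCount` is one way to stop them from drifting apart.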
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884693#comment-17884693 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920923 from le...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920923 ] PDFBOX-5852: fix ArrayIndexOutOfBoundsException
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884692#comment-17884692 ] Andreas Lehmkühler commented on PDFBOX-5852: [~tilman] thanks for the feedback and the sample pdf. I'm going to have a look
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884598#comment-17884598 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1920908 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920908 ] PDFBOX-5660: update junit > Improve code quality (5) > > > Key: PDFBOX-5660 > URL: https://issues.apache.org/jira/browse/PDFBOX-5660 > Project: PDFBox > Issue Type: Improvement >Reporter: Tilman Hausherr >Priority: Minor > Attachments: AnnotationSample.Standard.pdf, > DRY_refactoring_Typ2CharStringParser.patch, > Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch, > > Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch, > Simplify_string_conversion_in_PDFHighlighter.patch, > Update_string_handling_and_regex_in_several_classes.patch, > avoid_multiple_unboxing.patch, code_cleanup.patch, > do_not_create_temporary_File_instance.patch, > extract_common_code,_move_toUpperCase()_out_of_loop.patch, > fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, > introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch, > introduce_StringUtil_class_for_reusable_functionality.patch, > introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch, > make_inner_class_static.patch, refactor_isEndOfName.patch, > remove_code_duplication_in_Type2CharStringParser.patch, > remove_obsolete_class_NullOutputStream.patch, > remove_unnecessary_calls_to_toString()_String_valueOf().patch, > replace_System_getProperty()_calls.patch, screenshot-1.png, > simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch, > simplify_stream_operations.patch, use_Map_ofEntries().patch, > use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, > 
use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch, > use_String_join().patch, use_switch_for_readability.patch, > use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch > > > This is a long-term issue for the task to improve code quality, by using the > SonarQube report, hints in different IDEs, the FindBugs tool and other code > quality tools. > This is a follow-up of PDFBOX-4892, which was getting too long. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884597#comment-17884597 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1920907 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920907 ] PDFBOX-5660: update junit
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884548#comment-17884548 ] Tilman Hausherr commented on PDFBOX-5880: - Proposed change: add {{stream.setLong(COSName.LENGTH, streamLength);}} after the loop, or change the forEach loop so that it doesn't overwrite the Length entry. > PDF render blank page: The end of the stream doesn't point to the correct > offset, using workaround to read the stream, stream start position: 196, > length: 0, expected end position: 196 > > > Key: PDFBOX-5880 > URL: https://issues.apache.org/jira/browse/PDFBOX-5880 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Joseph Jezerinac >Priority: Major > Labels: regression > Attachments: test.pdf > > > When rendering page one of the attached PDF the image does not render. > In the logs, I see the following: > {noformat} > 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't > point to the correct offset, using workaround to read the stream, stream > start position: 196, length: 0, expected end position: 196 > 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty > java.io.IOException: Image stream is empty > at > org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) > at > org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) > {noformat} > I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an > issue.
> Here's the render code used: > {code:java} > File out = File.createTempFile("test-", ".png"); > PDDocument pdDocument = Loader.loadPDF(pdf); > final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); > ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884540#comment-17884540 ] Tilman Hausherr commented on PDFBOX-5852: - E.g. with this file: [^CIB-coonsmesh.pdf]
ArrayIndexOutOfBoundsException: Index 400 out of bounds for length 400
org.apache.pdfbox.pdmodel.graphics.shading.PatchMeshesShadingContext.calcPixelTableArray(PatchMeshesShadingContext.java:67)
org.apache.pdfbox.pdmodel.graphics.shading.TriangleBasedShadingContext.createPixelTable(TriangleBasedShadingContext.java:67)
org.apache.pdfbox.pdmodel.graphics.shading.PatchMeshesShadingContext.<init>(PatchMeshesShadingContext.java:57)
org.apache.pdfbox.pdmodel.graphics.shading.Type6ShadingContext.<init>(Type6ShadingContext.java:45)
org.apache.pdfbox.pdmodel.graphics.shading.Type6ShadingPaint.createContext(Type6ShadingPaint.java:63)
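An exception like "Index 400 out of bounds for length 400" is the usual sign of an inclusive upper bound used to index an array sized for an exclusive range. That fits Andreas's explanation that one implementation of calcPixelTableArray was missing an offset of one. The Java sketch below is hypothetical and does not reproduce PDFBox's code. PixelTableSketch, allocate and put are made-up names. The sketch only shows that a table covering the inclusive pixel range [minX, maxX] needs maxX - minX + 1 columns.

```java
final class PixelTableSketch
{
    /**
     * Allocates a table with one slot per device pixel in the inclusive
     * range [minX, maxX] x [minY, maxY]. Hypothetical helper, not PDFBox API.
     */
    static int[][] allocate(int minX, int maxX, int minY, int maxY)
    {
        // Inclusive bounds: [minX, maxX] contains maxX - minX + 1 pixels.
        int width = maxX - minX + 1;
        int height = maxY - minY + 1;
        return new int[width][height];
    }

    /** Stores a packed RGB value for device pixel (x, y), translated by the table origin. */
    static void put(int[][] table, int minX, int minY, int x, int y, int rgb)
    {
        table[x - minX][y - minY] = rgb;
    }

    public static void main(String[] args)
    {
        // maxX - minX = 400, so the table needs 401 columns to hold x = 400.
        int[][] table = allocate(0, 400, 0, 10);
        put(table, 0, 0, 400, 10, 0xFF0000);
        System.out.println(table.length + " columns, last = " + Integer.toHexString(table[400][10]));
    }
}
```

Suppose the table were sized as maxX - minX instead (400 columns here). Then the put call for x = 400 would throw the same kind of exception shown in the trace.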
[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5852: Attachment: CIB-coonsmesh.pdf
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884533#comment-17884533 ] Tilman Hausherr commented on PDFBOX-5852: - Lots of regressions. I need to check whether this is because of another change I just did, or whether the first test didn't have the new code activated.
[jira] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852 ] Tilman Hausherr deleted comment on PDFBOX-5852: - was (Author: tilman): No regressions 👍
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884531#comment-17884531 ] Tilman Hausherr commented on PDFBOX-5880: - The problem is here:
{code:java}
public COSStream createCOSStream(COSDictionary dictionary, long startPosition, long streamLength) throws IOException
{
    COSStream stream = new COSStream(streamCache, parser.createRandomAccessReadView(startPosition, streamLength));
    dictionary.forEach(stream::setItem);
    stream.setKey(dictionary.getKey());
    return stream;
}
{code}
The forEach loop overwrites the length. For some reason this didn't cause trouble in the past with wrong lengths, only this time with a zero length that is an indirect object.
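The ordering problem in createCOSStream can be reproduced without PDFBox. In this minimal sketch, a LinkedHashMap stands in for the stream's dictionary and the key "Length" stands in for COSName.LENGTH. Both are assumptions for illustration; COSStream and COSDictionary are not used. Copying every parsed entry into the stream leaves the stale parsed Length in place. Setting the length after the copy, as in the proposed {{stream.setLong(COSName.LENGTH, streamLength);}}, makes the entry match the length actually used to read the stream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StreamLengthSketch
{
    static final String LENGTH = "Length"; // stands in for COSName.LENGTH

    /** Buggy order: copying every parsed entry keeps whatever Length the parser found. */
    static Map<String, Object> copyOnly(Map<String, Object> parsed)
    {
        Map<String, Object> stream = new LinkedHashMap<>();
        parsed.forEach(stream::put);
        return stream;
    }

    /** Proposed order: copy the parsed entries, then set Length to the length actually used. */
    static Map<String, Object> copyThenSetLength(Map<String, Object> parsed, long streamLength)
    {
        Map<String, Object> stream = copyOnly(parsed);
        stream.put(LENGTH, streamLength);
        return stream;
    }

    public static void main(String[] args)
    {
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("Type", "XObject");
        parsed.put(LENGTH, 0L);     // value the indirect /Length object resolved to
        long streamLength = 1234L;  // hypothetical length determined by the parser's workaround
        System.out.println(copyOnly(parsed).get(LENGTH));                        // 0
        System.out.println(copyThenSetLength(parsed, streamLength).get(LENGTH)); // 1234
    }
}
```

The other option from the comment, skipping the Length key inside the loop, gives the same result. It has to be paired with setting the real length somewhere else.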
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884528#comment-17884528 ] Tilman Hausherr commented on PDFBOX-5852: - No regressions 👍 > Hi CPU and memory usage when converting a PDF with type 4 shading > - > > Key: PDFBOX-5852 > URL: https://issues.apache.org/jira/browse/PDFBOX-5852 > Project: PDFBox > Issue Type: Wish > Components: Rendering >Affects Versions: 2.0.28 >Reporter: Larry Lynn >Assignee: Andreas Lehmkühler >Priority: Major > Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0 > > Attachments: minimal.pdf > > > We've observed excessive CPU and memory consumption when converting a PDF to > images when the PDF contains type 4 shading. This is especially noticeable > when the conversion is done with a high DPI. Can this be improved? > > Conversation from the PDFBox users mailing list follows > Initial email: > {quote} > Hi CPU and memory usage when converting a PDF with type 4 shadingHello PDFBox > users and maintainers, > We have a PDF that causes performance problems when we use PDFBox to > convert it to an image with renderImageWithDPI(). We're calling > renderImageWithDPI() > with 650 DPI. I realize this is a very high value - we're using it for > high fidelity original images that will later be downsampled. On my work > laptop which has fairly strong hardware, the conversion takes 25 minutes > and consumes 20GB of memory. CPU and memory usage is reduced if we use a > lower DPI. > The PDF is 1 page long. It contains type 4 shading / Gouraud free form > triangle meshes. We've been aware of some performance issues with type 4 > shading for a little while now, but the PDFs that contained the type 4 > shading belonged to our customers and we were not authorized to share > them. We finally found a problem input document that is non-sensitive and > that we are authorized to share. I've attached a copy of the problem PDF > to this email. 
> I searched the archives for the users and the developers mailing list and I > didn't find anything specifically about this issue. > I searched through the PDFBox jira tickets and I found a couple of tickets > that looked similar: PDFBOX-2901 & PDFBOX-4491. PDFBOX-2901 seems to most > closely describe what we're seeing, but that was closed in PDFBox 2.0.0, > and our issue still reproduces with PDFBox 2.0.28. > Should I refer this issue over to the developers mailing list or create a > PDFBox Jira ticket for this? > Thanks and Regards, > Larry Lynn {quote} > Response: > {quote} > Hi, > Yes shading can be very slow, especially at high dpi. The attachment > didn't get through, please upload to a sharehoster or create a ticket. > If you need to register then add a meaningful text, e.g. the subject of > this post so we know you're not a spammer. Also retry with 2.0.31 and > 3.0.2 just to be sure. However I'm pessimistic that this can be fixed. > Tilman {quote} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
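A rough back-of-the-envelope sketch of why the cost depends so strongly on DPI: the pixel count grows with the square of the resolution. The page size of minimal.pdf is not stated in the ticket, so this sketch assumes US Letter (612 x 792 pt), and it only sizes the output raster (one 32-bit int per pixel); it does not model the shading work itself. `DpiEstimate` is a hypothetical helper, not a PDFBox class.

```java
// Hypothetical helper, not part of PDFBox: sizes the output raster only.
final class DpiEstimate {
    private DpiEstimate() {}

    // 1 PDF point = 1/72 inch, so device pixels = points * dpi / 72 (floored)
    static long pixels(int widthPt, int heightPt, int dpi) {
        long widthPx = (long) widthPt * dpi / 72;
        long heightPx = (long) heightPt * dpi / 72;
        return widthPx * heightPx;
    }

    // Assumes one 32-bit int per pixel, as in an INT_RGB or INT_ARGB BufferedImage
    static long rasterBytes(int widthPt, int heightPt, int dpi) {
        return pixels(widthPt, heightPt, dpi) * Integer.BYTES;
    }
}
```

Under that assumption, 650 DPI gives 5525 x 7150 = 39,503,750 pixels, about 4.7 times the 8,415,000 pixels at 300 DPI, yet the raster itself is only about 158 MB. That is far below the reported 20 GB, which suggests (without proving it) that most of the memory goes into the shading computation rather than into the final image.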
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884511#comment-17884511 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920894 from le...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920894 ] PDFBOX-5852: replace Integer with int, add some minor optimizations
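For readers unfamiliar with type 4 shading: each vertex of a free-form triangle mesh carries a color, and every point inside a triangle gets a linear (Gouraud) blend of the three vertex colors, weighted by its barycentric coordinates. The sketch below is a hypothetical illustration, not PDFBox's shading code; it evaluates that blend for a single point. A straightforward rasterizer repeats this for every device pixel a triangle covers, which is one reason the work grows with DPI.

```java
// Hypothetical illustration of Gouraud interpolation; not PDFBox's shading code.
final class GouraudTriangle {
    private GouraudTriangle() {}

    /**
     * Returns the color at (px, py) blended from the three vertex colors using
     * barycentric weights, or null if the point lies outside the triangle.
     * Vertices are {x, y} pairs; colors are component arrays of equal length.
     */
    static float[] colorAt(double[] a, double[] b, double[] c,
                           float[] colorA, float[] colorB, float[] colorC,
                           double px, double py) {
        double det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1]);
        if (det == 0) {
            return null; // degenerate (zero-area) triangle
        }
        double wa = ((b[1] - c[1]) * (px - c[0]) + (c[0] - b[0]) * (py - c[1])) / det;
        double wb = ((c[1] - a[1]) * (px - c[0]) + (a[0] - c[0]) * (py - c[1])) / det;
        double wc = 1.0 - wa - wb;
        if (wa < 0 || wb < 0 || wc < 0) {
            return null; // point is outside the triangle
        }
        float[] result = new float[colorA.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (float) (wa * colorA[i] + wb * colorB[i] + wc * colorC[i]);
        }
        return result;
    }
}
```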
[jira] [Comment Edited] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expe
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492 ] Tilman Hausherr edited comment on PDFBOX-5880 at 9/25/24 3:55 AM: -- The PDF image stream has an (incorrect) length of 0. The workaround fails for some reason. Amusingly, this worked in 1.8.16, which displays the message "WARNUNG: /Length of COSObject\{1, 0} corrected from 0 to 695645". was (Author: tilman): The image has an (incorrect) length of 0. The workaround fails for some reason. Amusingly, this worked in 1.8.16, which displays the message "WARNUNG: /Length of COSObject\{1, 0} corrected from 0 to 695645". > PDF render blank page: The end of the stream doesn't point to the correct > offset, using workaround to read the stream, stream start position: 196, > length: 0, expected end position: 196 > > > Key: PDFBOX-5880 > URL: https://issues.apache.org/jira/browse/PDFBOX-5880 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Joseph Jezerinac >Priority: Major > Labels: regression > Attachments: test.pdf > > > When rendering page one of the attached PDF the image does not render. 
> In the logs, I see the following: > {noformat} > 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't > point to the correct offset, using workaround to read the stream, stream > start position: 196, length: 0, expected end position: 196 > 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty > java.io.IOException: Image stream is empty > at > org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) > at > org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) > at > org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) > {noformat} > I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an > issue. > Here's the render code used: > {code:java} > File out = File.createTempFile("test-", ".png"); > PDDocument pdDocument = Loader.loadPDF(pdf); > final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); > ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); > {code}
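Tilman's comment notes that 1.8.16 corrected the /Length from 0 to 695645. One way such a correction can work, sketched under assumptions: ignore the declared /Length and search forward from the start of the stream data for the `endstream` keyword, dropping the end-of-line marker that the PDF specification places before `endstream` (that marker is not counted in the length). `StreamLengthRecovery` is hypothetical and is not PDFBox's `COSParser` logic; a real parser must also cope with binary data that happens to contain the bytes `endstream`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox's COSParser: recover a stream's data length
// by locating the "endstream" keyword instead of trusting /Length.
final class StreamLengthRecovery {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    private StreamLengthRecovery() {}

    /** Returns the data length starting at dataStart, or -1 if no "endstream" is found. */
    static int recoverLength(byte[] pdf, int dataStart) {
        int end = indexOf(pdf, ENDSTREAM, dataStart);
        if (end < 0) {
            return -1;
        }
        // The EOL marker (LF, CR or CRLF) before "endstream" is not part of the data
        if (end > dataStart && pdf[end - 1] == '\n') {
            end--;
        }
        if (end > dataStart && pdf[end - 1] == '\r') {
            end--;
        }
        return end - dataStart;
    }

    // Naive byte search; good enough for a sketch
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```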
[jira] [Commented] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884492#comment-17884492 ] Tilman Hausherr commented on PDFBOX-5880: - The image has an (incorrect) length of 0. The workaround fails for some reason. Amusingly, this worked in 1.8.16, which displays the message "WARNUNG: /Length of COSObject\{1, 0} corrected from 0 to 695645".
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Labels: regression (was: )
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Affects Version/s: 2.0.32
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5880: Component/s: Parsing (was: Rendering)
[jira] [Created] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
Joseph Jezerinac created PDFBOX-5880: Summary: PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196 Key: PDFBOX-5880 URL: https://issues.apache.org/jira/browse/PDFBOX-5880 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 3.0.3 PDFBox Reporter: Joseph Jezerinac Attachments: test.pdf When rendering page one of the attached PDF the image does not render. In the logs, I see the following: {noformat} 2024-09-24 13:25:56:702 [main] WARN DocManagerImpl - Aspose.PDF/Words license initialized 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty java.io.IOException: Image stream is empty at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) {noformat} I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an issue. Here's the render code used: {code:java} File out = File.createTempFile("test-", ".png"); PDDocument pdDocument = Loader.loadPDF(pdf); final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); {code}
[jira] [Updated] (PDFBOX-5880) PDF render blank page: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected en
[ https://issues.apache.org/jira/browse/PDFBOX-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Jezerinac updated PDFBOX-5880: - Description: When rendering page one of the attached PDF the image does not render. In the logs, I see the following: {noformat} 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty java.io.IOException: Image stream is empty at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) {noformat} I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an issue. Here's the render code used: {code:java} File out = File.createTempFile("test-", ".png"); PDDocument pdDocument = Loader.loadPDF(pdf); final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); {code} was: When rendering page one of the attached PDF the image does not render. 
In the logs, I see the following: {noformat} 2024-09-24 13:25:56:702 [main] WARN DocManagerImpl - Aspose.PDF/Words license initialized 2024-09-24 13:25:56:924 [main] WARN COSParser - The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 196, length: 0, expected end position: 196 2024-09-24 13:25:56:930 [main] WARN PDFStreamEngine - Image stream is empty java.io.IOException: Image stream is empty at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:182) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:477) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438) at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1107) {noformat} I assume this is a bad PDF, but Acrobat, Chrome, etc., display it without an issue. Here's the render code used: {code:java} File out = File.createTempFile("test-", ".png"); PDDocument pdDocument = Loader.loadPDF(pdf); final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument); ImageIO.write(pdfRenderer.renderImageWithDPI(0, 300), "png", out); {code}
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883536#comment-17883536 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1920834 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920834 ] PDFBOX-5660: update commons-io > Improve code quality (5) > > > Key: PDFBOX-5660 > URL: https://issues.apache.org/jira/browse/PDFBOX-5660 > Project: PDFBox > Issue Type: Improvement >Reporter: Tilman Hausherr >Priority: Minor > Attachments: AnnotationSample.Standard.pdf, > DRY_refactoring_Typ2CharStringParser.patch, > Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch, > > Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch, > Simplify_string_conversion_in_PDFHighlighter.patch, > Update_string_handling_and_regex_in_several_classes.patch, > avoid_multiple_unboxing.patch, code_cleanup.patch, > do_not_create_temporary_File_instance.patch, > extract_common_code,_move_toUpperCase()_out_of_loop.patch, > fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, > introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch, > introduce_StringUtil_class_for_reusable_functionality.patch, > introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch, > make_inner_class_static.patch, refactor_isEndOfName.patch, > remove_code_duplication_in_Type2CharStringParser.patch, > remove_obsolete_class_NullOutputStream.patch, > remove_unnecessary_calls_to_toString()_String_valueOf().patch, > replace_System_getProperty()_calls.patch, screenshot-1.png, > simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch, > simplify_stream_operations.patch, use_Map_ofEntries().patch, > use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, > 
use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch, > use_String_join().patch, use_switch_for_readability.patch, > use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch > > > This is a long-term issue for the task of improving code quality by using the > SonarQube report, hints in different IDEs, the FindBugs tool and other code > quality tools. > This is a follow-up of PDFBOX-4892, which was getting too long.
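Two of the attached patch titles, `use_Objects_equals()` and `simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive`, describe a common pattern: compare the cheap field before the expensive one so that `&&` short-circuits, and let `java.util.Objects` handle nulls. A minimal sketch on a hypothetical `NamedEntry` class; it is not taken from the patches themselves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class illustrating the pattern; not PDFBox code.
final class NamedEntry {
    private final String name;
    private final Map<String, Object> attributes;

    NamedEntry(String name, Map<String, ?> attributes) {
        this.name = name;
        this.attributes = new HashMap<>(attributes);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NamedEntry)) {
            return false;
        }
        NamedEntry other = (NamedEntry) o;
        // Cheap, null-safe String comparison first: if the names differ,
        // && short-circuits and the costlier Map.equals() never runs.
        return Objects.equals(name, other.name) && attributes.equals(other.attributes);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, attributes);
    }
}
```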
[jira] [Commented] (PDFBOX-5660) Improve code quality (5)
[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883535#comment-17883535 ] ASF subversion and git services commented on PDFBOX-5660: - Commit 1920833 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920833 ] PDFBOX-5660: update commons-io
[jira] [Commented] (PDFBOX-5561) qpdf shows warnings trying to linearize file modified by PDFBOX
[ https://issues.apache.org/jira/browse/PDFBOX-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882984#comment-17882984 ] HABA commented on PDFBOX-5561: -- Hello, I am encountering the same warning message. I am currently using version {*}3.0.3{*}. Have you found any solutions or workarounds to resolve this issue? Thank you for your assistance. Best regards, HABA > qpdf shows warnings trying to linearize file modified by PDFBOX > --- > > Key: PDFBOX-5561 > URL: https://issues.apache.org/jira/browse/PDFBOX-5561 > Project: PDFBox > Issue Type: Bug > Components: Writing >Affects Versions: 2.0.27 >Reporter: menteith >Priority: Minor > > I have a PDF file* that is generated by software other than PDFBox. When > the PDF is modified by the code given below using PDFBox, *qpdf* shows the > following warning: > {code:java} > WARNING: modified.pdf: reported number of objects (12991) is not one plus the > highest object number (12989) > qpdf: operation succeeded with warnings; resulting file may have some > problems{code} > Note that the warning is not shown when *qpdf* analyses the original PDF file (i.e. the PDF > not modified by PDFBox). > Here's the code to modify the PDF in question: > > {code:java} > for (final PDPage page: document.getPages()) { > page.getAnnotations().forEach(annotation -> { > if (annotation instanceof PDAnnotationLink link) { > final PDPageXYZDestination destination = new > PDPageXYZDestination(); > destination.setPage(document.getPage(1)); > final PDActionGoTo action = new PDActionGoTo(); > action.setDestination(destination); > link.setAction(action); > } > }); > } {code} > > I forgot to mention that the result file generated by PDFBox is almost > twice as big as the original one. > *I've sent the file to Tilman Hausherr.
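The qpdf warning refers to the trailer's /Size entry, which the PDF specification (ISO 32000, trailer dictionary) requires to be one greater than the highest object number defined in the file. With the numbers from the report, the highest object number is 12989, so /Size should be 12990, but the file reports 12991. A hypothetical consistency check illustrating that rule; `TrailerSizeCheck` is not qpdf or PDFBox code.

```java
import java.util.Collection;

// Hypothetical check, not qpdf or PDFBox code.
final class TrailerSizeCheck {
    private TrailerSizeCheck() {}

    // The trailer /Size shall be one greater than the highest object number.
    static int expectedSize(Collection<Integer> objectNumbers) {
        int highest = 0;
        for (int n : objectNumbers) {
            highest = Math.max(highest, n);
        }
        return highest + 1;
    }

    static boolean isConsistent(int reportedSize, Collection<Integer> objectNumbers) {
        return reportedSize == expectedSize(objectNumbers);
    }
}
```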
[jira] [Comment Edited] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882794#comment-17882794 ] Andreas Lehmkühler edited comment on PDFBOX-5852 at 9/18/24 7:52 PM: - There are some details which might be optimized with regard to memory consumption. One is the Integer vs. int thing. I'm still on it as the code logic has to be changed due to the fact that an int value can't be null and there is some logic which relies on that. No big issue so that I'm going to come up with some additional changes was (Author: lehmi): There are some details which might be optimized with regard to memory consumption. One is the Integer vs. int thing. I'm still on it as the code logic has to be changed due to the fact that an int value can't be null and there is some logic which relies on that. No big issue so that I'm going to come up with some more changes > Hi CPU and memory usage when converting a PDF with type 4 shading > - > > Key: PDFBOX-5852 > URL: https://issues.apache.org/jira/browse/PDFBOX-5852 > Project: PDFBox > Issue Type: Wish > Components: Rendering >Affects Versions: 2.0.28 >Reporter: Larry Lynn >Assignee: Andreas Lehmkühler >Priority: Major > Fix For: 2.0.33, 3.0.3 PDFBox, 4.0.0 > > Attachments: minimal.pdf > > > We've observed excessive CPU and memory consumption when converting a PDF to > images when the PDF contains type 4 shading. This is especially noticeable > when the conversion is done with a high DPI. Can this be improved? > > Conversation from the PDFBox users mailing list follows > Initial email: > {quote} > Hi CPU and memory usage when converting a PDF with type 4 shadingHello PDFBox > users and maintainers, > We have a PDF that causes performance problems when we use PDFBox to > convert it to an image with renderImageWithDPI(). We're calling > renderImageWithDPI() > with 650 DPI. 
I realize this is a very high value - we're using it for > high-fidelity original images that will later be downsampled. On my work > laptop, which has fairly strong hardware, the conversion takes 25 minutes > and consumes 20 GB of memory. CPU and memory usage are reduced if we use a > lower DPI. > The PDF is 1 page long. It contains type 4 shading / free-form Gouraud-shaded > triangle meshes. We've been aware of some performance issues with type 4 > shading for a little while now, but the PDFs that contained the type 4 > shading belonged to our customers, and we were not authorized to share > them. We finally found a problem input document that is non-sensitive and > that we are authorized to share. I've attached a copy of the problem PDF > to this email. > I searched the archives of the users and developers mailing lists and > didn't find anything specifically about this issue. > I searched through the PDFBox Jira tickets and found a couple of tickets > that looked similar: PDFBOX-2901 & PDFBOX-4491. PDFBOX-2901 seems to most > closely describe what we're seeing, but that was closed in PDFBox 2.0.0, > and our issue still reproduces with PDFBox 2.0.28. > Should I refer this issue to the developers mailing list or create a > PDFBox Jira ticket for this? > Thanks and Regards, > Larry Lynn {quote} > Response: > {quote} > Hi, > Yes, shading can be very slow, especially at high DPI. The attachment > didn't get through; please upload it to a file-sharing host or create a ticket. > If you need to register, add some meaningful text, e.g. the subject of > this post, so we know you're not a spammer. Also retry with 2.0.31 and > 3.0.2 just to be sure. However, I'm pessimistic that this can be fixed. > Tilman {quote}
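Andreas's point that an int value can't be null is the crux of moving from Integer[][] to int[][]: any logic that uses null to mean "no value yet" needs another marker. One common pattern is to pre-fill the array with a sentinel. The sketch below is hypothetical (the class name and the choice of Integer.MIN_VALUE as sentinel are assumptions, not the PDFBox change) and only works if the sentinel can never be a legitimate value; otherwise a separate boolean[][] or java.util.BitSet could track which cells are set.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: an int[][] grid that uses a sentinel instead of null
 * to mark cells that have no value yet. Assumes UNSET never occurs as a real value.
 */
public final class SentinelGrid {
    /** Marker for "no value"; an assumption, not taken from PDFBox. */
    public static final int UNSET = Integer.MIN_VALUE;

    private final int[][] cells;

    public SentinelGrid(int width, int height) {
        cells = new int[width][height];
        for (int[] column : cells) {
            // A new int[] is zero-filled, so mark every cell as unset explicitly.
            Arrays.fill(column, UNSET);
        }
    }

    /** Replaces a check such as {@code grid[x][y] != null} on an Integer[][]. */
    public boolean isSet(int x, int y) {
        return cells[x][y] != UNSET;
    }

    public int get(int x, int y) {
        return cells[x][y];
    }

    public void set(int x, int y, int value) {
        cells[x][y] = value;
    }

    public static void main(String[] args) {
        SentinelGrid grid = new SentinelGrid(3, 2);
        System.out.println(grid.isSet(1, 1)); // false
        grid.set(1, 1, 42);
        System.out.println(grid.isSet(1, 1)); // true
        System.out.println(grid.get(1, 1));   // 42
    }
}
```

Note that 0 counts as a real value here, which is exactly what a zero-filled int[][] without a sentinel could not express.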
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882794#comment-17882794 ] Andreas Lehmkühler commented on PDFBOX-5852: There are some details which might be optimized with regard to memory consumption. One is the Integer vs. int thing. I'm still on it, as the code logic has to be changed: an int value can't be null, and some of the logic relies on that. It's not a big issue, so I'm going to come up with some more changes.
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882781#comment-17882781 ] Larry Lynn commented on PDFBOX-5852: I see the updated code uses an {code:java} Integer[][] {code} Previously, we needed an Integer rather than an int because Java doesn't support primitive values in Maps (at least, not without an extra library). Now that we're not using a Map, could we use an {code:java} int[][] {code} instead? When I was running this code in a debugger, I saw that the map could get very big, especially when a conversion was requested at very high resolutions. I think I saw sizes in excess of 10 million elements. If int works instead of Integer, I think that could yield a fair saving in memory usage, since a primitive doesn't carry the memory overhead of an object: [https://stackoverflow.com/questions/6081955/memory-footprint-of-int-and-integer-arrays] [https://www.javamex.com/tutorials/memory/object_memory_usage.shtml] A 2-D int array would probably be faster, too.
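To put Larry's numbers in perspective, here is a back-of-the-envelope estimate of the per-element payload of Integer[] versus int[]. The sizes are assumptions for a typical 64-bit HotSpot JVM with compressed references (4-byte int, 4-byte reference, 16-byte Integer object); array headers and the extra per-row headers of a 2-D array are ignored, and values in the Integer cache (-128 to 127) would share instances, so real measurements will differ.

```java
/**
 * Back-of-the-envelope payload estimate for int[] vs Integer[].
 * Assumed sizes (typical 64-bit HotSpot with compressed references, not measured):
 * 4-byte int, 4-byte reference, 16-byte Integer object (12-byte header + 4-byte value).
 * Array headers are ignored; every element is assumed to be a distinct Integer.
 */
public final class BoxingEstimate {
    static final long INT_BYTES = 4;
    static final long REFERENCE_BYTES = 4;
    static final long INTEGER_OBJECT_BYTES = 16;

    /** Estimated bytes for n elements stored as primitive ints. */
    public static long primitiveBytes(long n) {
        return n * INT_BYTES;
    }

    /** Estimated bytes for n elements stored as references to distinct Integer objects. */
    public static long boxedBytes(long n) {
        return n * (REFERENCE_BYTES + INTEGER_OBJECT_BYTES);
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // order of magnitude Larry reported seeing in the debugger
        System.out.println(primitiveBytes(n)); // 40000000
        System.out.println(boxedBytes(n));     // 200000000
    }
}
```

Under these assumptions, 10 million elements take about 40 MB as int versus about 200 MB as distinct boxed Integer values, roughly a factor of five.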
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882778#comment-17882778 ] Larry Lynn commented on PDFBOX-5852: Thank you all very much for your work on this ticket.
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882471#comment-17882471 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920756 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920756 ] PDFBOX-5852: replace IntPoint with Point
[jira] [Updated] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5852: --- Fix Version/s: 2.0.33
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882474#comment-17882474 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920757 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920757 ] PDFBOX-5852: deprecate IntPoint
[jira] [Commented] (PDFBOX-5852) Hi CPU and memory usage when converting a PDF with type 4 shading
[ https://issues.apache.org/jira/browse/PDFBOX-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882470#comment-17882470 ] ASF subversion and git services commented on PDFBOX-5852: - Commit 1920755 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920755 ] PDFBOX-5852: replace Map with a two-dimensional array
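Replacing a Map with a two-dimensional array trades per-entry key and value objects for one dense block of memory covering the whole bounding box. A minimal sketch of that trade-off, assuming the pixel bounds of the region are known up front; PixelTable and its constructor are hypothetical and not PDFBox's code, and java.awt.Point is used as the map key here purely for illustration:

```java
import java.awt.Point;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: per-pixel values for a bounded region stored in a 2-D
 * int array instead of a Map keyed by Point. Indices are shifted by the
 * region's minimum coordinates so offset or negative coordinates still work.
 */
public final class PixelTable {
    private final int minX;
    private final int minY;
    private final int[][] values; // values[x - minX][y - minY]

    public PixelTable(int minX, int minY, int maxX, int maxY) {
        this.minX = minX;
        this.minY = minY;
        this.values = new int[maxX - minX + 1][maxY - minY + 1];
    }

    public void put(int x, int y, int value) {
        values[x - minX][y - minY] = value;
    }

    public int get(int x, int y) {
        return values[x - minX][y - minY];
    }

    public static void main(String[] args) {
        // Map-based version: one Point key and one boxed Integer per entry.
        Map<Point, Integer> map = new HashMap<>();
        map.put(new Point(-2, 5), 7);

        // Array-based version: a single int per cell, addressed by offset.
        PixelTable table = new PixelTable(-2, 5, 10, 20);
        table.put(-2, 5, 7);

        System.out.println(map.get(new Point(-2, 5)) == table.get(-2, 5)); // true
    }
}
```

The array allocates one int for every cell of the bounding box even if only a few are used, so this layout pays off when most cells are actually filled.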
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882355#comment-17882355 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920743 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920743 ] PDFBOX-5879: respect code conventions > Regression from PDFBOX-5841: Text extraction with rotation magic fails for > PDF with multiple content streams in a page > -- > > Key: PDFBOX-5879 > URL: https://issues.apache.org/jira/browse/PDFBOX-5879 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Gábor Stefanik >Assignee: Tilman Hausherr >Priority: Major > Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0 > > Attachments: MVM_Aram_augusztus.pdf > > > {code:java} > java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic > -i="MVM_Aram_augusztus.pdf" {code} > fails with the following error: > {code:java} > java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be > cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject > and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app') > at > org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336) > at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225) > at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62) > at picocli.CommandLine.executeUserObject(CommandLine.java:2045) > at picocli.CommandLine.access$1500(CommandLine.java:148) > at > picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2457) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2419) > at > picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277) > at picocli.CommandLine$RunLast.execute(CommandLine.java:2421) > at 
picocli.CommandLine.execute(CommandLine.java:2174) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76) {code} > The same command succeeds in 3.0.2. > The triggering PDF can be downloaded from > [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf], > and is also attached. > The root cause appears to be this change: > [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2] > from PDFBOX-5841.
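In a PDF page dictionary, /Contents may be a single stream or an array of streams, and the array itself may be stored as an indirect object. The stack trace suggests that a value was cast to COSArray while it was still a COSObject, PDFBox's wrapper for an indirect object. The toy model below illustrates the general resolve-before-cast pattern with hypothetical stand-in classes; it is not PDFBox's API and not the actual fix for this ticket. Within PDFBox itself, COSDictionary.getDictionaryObject() returns the dereferenced value of a dictionary entry.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model, not PDFBox's API: Base, Arr, ContentStream and Ref are hypothetical
 * stand-ins for COSBase, COSArray, COSStream and COSObject. It shows why a value
 * must be resolved before it is tested or cast as an array.
 */
public final class ResolveBeforeCast {
    public interface Base {}

    public static final class ContentStream implements Base {}

    public static final class Arr implements Base {
        public final List<Base> items;

        public Arr(Base... items) {
            this.items = Arrays.asList(items);
        }
    }

    /** An indirect reference that points at another object. */
    public static final class Ref implements Base {
        public final Base target;

        public Ref(Base target) {
            this.target = target;
        }
    }

    /** Follows indirect references until a direct object is reached (no cycle detection in this toy). */
    public static Base resolve(Base value) {
        Base current = value;
        while (current instanceof Ref) {
            current = ((Ref) current).target;
        }
        return current;
    }

    /** Counts content streams; /Contents may be a single stream or an array of streams. */
    public static int countContentStreams(Base contents) {
        Base resolved = resolve(contents); // casting 'contents' to Arr directly would fail for a Ref
        if (resolved instanceof Arr) {
            return ((Arr) resolved).items.size();
        }
        return resolved instanceof ContentStream ? 1 : 0;
    }

    public static void main(String[] args) {
        Base array = new Arr(new ContentStream(), new ContentStream());
        System.out.println(countContentStreams(array));               // 2
        System.out.println(countContentStreams(new Ref(array)));      // 2
        System.out.println(countContentStreams(new ContentStream())); // 1
    }
}
```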
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882354#comment-17882354 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920742 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920742 ] PDFBOX-5879: respect code conventions
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327 ] Tilman Hausherr commented on PDFBOX-5879: - I added a simple test for the feature because it turns out we didn't have any. However this isn't a test of the fixed bug, that would have been more difficult to create a file, and there is no risk that this fix gets reverted anyway. > Regression from PDFBOX-5841: Text extraction with rotation magic fails for > PDF with multiple content streams in a page > -- > > Key: PDFBOX-5879 > URL: https://issues.apache.org/jira/browse/PDFBOX-5879 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.32, 3.0.3 PDFBox >Reporter: Gábor Stefanik >Assignee: Tilman Hausherr >Priority: Major > Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0 > > Attachments: MVM_Aram_augusztus.pdf > > > {code:java} > java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic > -i="MVM_Aram_augusztus.pdf" {code} > fails with the following error: > {code:java} > java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be > cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject > and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app') > at > org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336) > at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225) > at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62) > at picocli.CommandLine.executeUserObject(CommandLine.java:2045) > at picocli.CommandLine.access$1500(CommandLine.java:148) > at > picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2457) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2419) > at > picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277) > at 
picocli.CommandLine$RunLast.execute(CommandLine.java:2421) > at picocli.CommandLine.execute(CommandLine.java:2174) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76) {code} > The same command succeeds in 3.0.2. > The triggering PDF can be downloaded from > [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf], > and is also attached. > The root cause appears to be this change: > [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2] > from PDFBOX-5841 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
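The ClassCastException above says that the value reaching the cast was an org.apache.pdfbox.cos.COSObject, which is an indirect-reference wrapper, rather than a COSArray. For a page with multiple content streams, /Contents is an array of streams. Presumably that array was stored as an indirect object here, so a blind cast fails. The usual defensive pattern is to dereference first and only then check the type.

The sketch below illustrates that pattern with hypothetical, simplified stand-in classes. CosBase, CosObject, CosArray, CosStream and ContentsResolver are made up for this illustration and are not PDFBox's API. It is not the fix that was actually committed for PDFBOX-5879. In real PDFBox, COSDictionary.getDictionaryObject() resolves indirect references, while getItem() returns the raw value, which may be a COSObject.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: simplified stand-ins for COS types, NOT the real PDFBox classes. */
public class ContentsResolver {

    /** Common supertype of all stand-in values. */
    interface CosBase {}

    /** A direct array value, e.g. [ 4 0 R 5 0 R ]. */
    static final class CosArray implements CosBase {
        final List<CosBase> items = new ArrayList<>();
    }

    /** An indirect reference such as "7 0 R", wrapping the object it points to. */
    static final class CosObject implements CosBase {
        private final CosBase target;

        CosObject(CosBase target) {
            this.target = target;
        }

        CosBase getObject() {
            return target;
        }
    }

    /** Placeholder for a single content stream. */
    static final class CosStream implements CosBase {}

    /** Unwraps (possibly nested) indirect references until a direct value or null is reached. */
    static CosBase dereference(CosBase base) {
        CosBase current = base;
        while (current instanceof CosObject) {
            current = ((CosObject) current).getObject();
        }
        return current;
    }

    /**
     * Returns /Contents as an array, or null if it is not an array
     * (for example, a single content stream).
     */
    static CosArray contentsAsArray(CosBase contents) {
        CosBase resolved = dereference(contents);
        return resolved instanceof CosArray ? (CosArray) resolved : null;
    }

    public static void main(String[] args) {
        CosArray array = new CosArray();
        array.items.add(new CosStream());
        array.items.add(new CosStream());

        // /Contents stored as an indirect reference to the array.
        CosBase contents = new CosObject(array);

        // A blind "(CosArray) contents" cast would throw ClassCastException here.
        CosArray resolved = contentsAsArray(contents);
        System.out.println(resolved == null
                ? "not an array"
                : resolved.items.size() + " content streams");
    }
}
```

Running main prints "2 content streams". Returning null instead of throwing lets a caller fall back to the single-stream case, since /Contents may legitimately be either one stream or an array of streams.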
[jira] [Comment Edited] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327 ] Tilman Hausherr edited comment on PDFBOX-5879 at 9/17/24 9:08 AM: -- I added a simple test for the rotationMagic feature because it turns out we didn't have any. However, this isn't a test of the fixed bug itself; creating a file for that would have been more difficult, and there is no risk that this fix gets reverted anyway. was (Author: tilman): I added a simple test for the feature because it turns out we didn't have any. However, this isn't a test of the fixed bug itself; creating a file for that would have been more difficult, and there is no risk that this fix gets reverted anyway.
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882326#comment-17882326 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920739 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920739 ] PDFBOX-5879: remove test message
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882325#comment-17882325 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920738 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920738 ] PDFBOX-5879: remove test message
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882324#comment-17882324 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920737 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920737 ] PDFBOX-5879: add test for rotationMagic
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882322#comment-17882322 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920736 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920736 ] PDFBOX-5879: add test for rotationMagic
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882318#comment-17882318 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920735 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920735 ] PDFBOX-5879: add test for rotationMagic
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882299#comment-17882299 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920732 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1920732 ] PDFBOX-5879: remove unused import
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882298#comment-17882298 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920731 from Tilman Hausherr in branch 'pdfbox/branches/3.0' [ https://svn.apache.org/r1920731 ] PDFBOX-5879: remove unused import
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882297#comment-17882297 ] ASF subversion and git services commented on PDFBOX-5879: - Commit 1920730 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1920730 ] PDFBOX-5879: remove unused import
[jira] [Resolved] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-5879. - Fix Version/s: 2.0.33, 3.0.4 PDFBox, 4.0.0 Assignee: Tilman Hausherr Resolution: Fixed Thank you. It's not the commit itself; it's poor programming that was exposed by the commit.
[jira] [Updated] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5879: Affects Version/s: 2.0.32
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882284#comment-17882284 ]

ASF subversion and git services commented on PDFBOX-5879:

Commit 1920729 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1920729 ]
PDFBOX-5879: avoid ClassCastException

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: MVM_Aram_augusztus.pdf
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
> {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>     at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>     at picocli.CommandLine.access$1500(CommandLine.java:148)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>     at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>     at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>     at picocli.CommandLine.execute(CommandLine.java:2174)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
> {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf], and is also attached.
> The root cause appears to be this change: [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2] from PDFBOX-5841.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
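The stack trace suggests that `ExtractText.extractPages` received a page's `/Contents` entry as an indirect reference (`COSObject`) at a point where it expected a direct `COSArray`. A page's `/Contents` value is either a single stream or an array of streams, and either may be reached through an indirect reference, which fits the issue title's "multiple content streams" case. The Java 16+ sketch below illustrates the general defensive pattern of dereferencing first and type-checking second. It uses hypothetical classes (`Base`, `Stream`, `Array`, `Ref`, `ContentsResolver`) rather than PDFBox's API, and it is not the change made in r1920727. In PDFBox itself, `COSDictionary.getDictionaryObject()` dereferences indirect objects, whereas `getItem()` returns them unresolved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of PDF objects; these are NOT PDFBox classes.
public class ContentsResolver {

    interface Base {}

    // A content stream; only a label is kept for illustration.
    record Stream(String label) implements Base {}

    // A direct array, e.g. the list of content streams of a page.
    record Array(List<Base> items) implements Base {}

    // An indirect reference such as "12 0 R" that points at another object.
    record Ref(Base target) implements Base {}

    // Follows indirect references until a direct object (or null) is reached.
    static Base resolve(Base obj) {
        while (obj instanceof Ref ref) {
            obj = ref.target();
        }
        return obj;
    }

    // Collects the content streams of a /Contents value, which may be a single
    // stream, an array of streams, or an indirect reference to either.
    // A blind cast such as (Array) contents would throw ClassCastException
    // when the value is a Ref, analogous to the COSObject/COSArray failure above.
    static List<Stream> contentStreams(Base contents) {
        List<Stream> result = new ArrayList<>();
        Base direct = resolve(contents);
        if (direct instanceof Stream single) {
            result.add(single);
        } else if (direct instanceof Array array) {
            for (Base item : array.items()) {
                // Array elements are normally references as well.
                if (resolve(item) instanceof Stream stream) {
                    result.add(stream);
                }
            }
        }
        // Anything else (missing or malformed) yields an empty list, not an exception.
        return result;
    }

    public static void main(String[] args) {
        // /Contents stored as an indirect reference to an array of two stream references.
        Base contents = new Ref(new Array(List.of(new Ref(new Stream("a")), new Ref(new Stream("b")))));
        System.out.println(contentStreams(contents).size()); // prints 2
    }
}
```

Resolving before the `instanceof` check keeps the caller indifferent to whether the writer of the PDF stored `/Contents` directly or indirectly, which is the distinction that separates files that work from files like the attached one.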
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882282#comment-17882282 ]

ASF subversion and git services commented on PDFBOX-5879:

Commit 1920728 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1920728 ]
PDFBOX-5879: avoid ClassCastException
[jira] [Commented] (PDFBOX-5879) Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882281#comment-17882281 ]

ASF subversion and git services commented on PDFBOX-5879:

Commit 1920727 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1920727 ]
PDFBOX-5879: avoid ClassCastException