[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862920#comment-17862920
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918897 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918897 ]

PDFBOX-5660: fix compiler warning

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862919#comment-17862919
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918896 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918896 ]

PDFBOX-5660: fix compiler warning




[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862916#comment-17862916
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

It finished with 3.0.2 (while I slept) and the snapshot too (with a dirty fix 
for the /Parent problem). I also tried with "-startPage 1 -endPage 442" because 
I'm not sure about the default settings of the splitter class and I never tried 
her code.

I'll do a less dirty fix for the /Parent problem in the next few days.

[~jfisbein-clarity] try setting a higher stack size with "-Xss". The snapshot 
version is at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
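
When PDFBox is embedded in an application rather than run from the command line, the idea behind "-Xss" can also be applied to a single thread. The following is only a sketch under assumptions: BigStackRunner, runWithStack and depth are made-up names, the recursive depth method merely stands in for a deeply nested document walk, and the JVM may treat the requested stack size as a hint.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs a task on a thread with an explicitly requested stack size. */
final class BigStackRunner {

    /** Runs the task on a new thread with the requested stack size and waits for it to finish. */
    static void runWithStack(Runnable task, long stackBytes) {
        Throwable[] failure = new Throwable[1];
        // The fourth constructor argument is the requested stack size in bytes (a hint to the JVM).
        Thread worker = new Thread(null, task, "big-stack-worker", stackBytes);
        worker.setUncaughtExceptionHandler((thread, error) -> failure[0] = error);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (failure[0] != null) {
            throw new IllegalStateException("task failed", failure[0]);
        }
    }

    /** Deliberately recursive method standing in for a deeply nested document walk. */
    static int depth(int remaining) {
        return remaining == 0 ? 0 : 1 + depth(remaining - 1);
    }

    public static void main(String[] args) {
        AtomicInteger result = new AtomicInteger();
        runWithStack(() -> result.set(depth(100_000)), 256L * 1024 * 1024);
        System.out.println("reached depth " + result.get());
    }
}
```

In an application, the Runnable would wrap the split-and-save call instead of depth.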


> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that triggers an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> }
> {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862878#comment-17862878
 ] 

Maruan Sahyoun commented on PDFBOX-5848:


It was also slow for me (approx. 3 min), but I was only looking at the infinite 
loop question.
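
The stack trace posted by the reporter shows ArrayList.contains being called from a long chain of addElements/addStructure calls. As a general illustration only (this is not PDFBox's code, and GraphNode and GraphWalk are hypothetical names), an object graph can be walked with an explicit work stack and an identity-based set, which avoids both deep recursion and linear membership scans:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy object graph node; a stand-in for nested objects, not a PDFBox class. */
final class GraphNode {
    final String name;
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }
}

final class GraphWalk {

    /** Returns every node reachable from root exactly once, without using recursion. */
    static List<GraphNode> collect(GraphNode root) {
        // Identity-based set: membership is a hash lookup, not a scan of a list.
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<GraphNode> pending = new ArrayDeque<>();
        List<GraphNode> order = new ArrayList<>();
        seen.add(root);
        pending.push(root);
        while (!pending.isEmpty()) {
            GraphNode current = pending.pop();
            order.add(current);
            for (GraphNode child : current.children) {
                // add() returns true only the first time a node is seen, so cycles terminate.
                if (seen.add(child)) {
                    pending.push(child);
                }
            }
        }
        return order;
    }
}
```

Because the pending nodes live on the heap, the depth of the graph no longer depends on the thread's stack size.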




[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862867#comment-17862867
 ] 

Tilman Hausherr edited comment on PDFBOX-5848 at 7/3/24 6:15 PM:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However, 
there's a different problem: lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages. Opening and saving it with Adobe Reader 
produces a much smaller file, where the /Parent entry value is set to null.
 !screenshot-1.png! 


was (Author: tilman):
I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 
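
The /Parent and /Kids situation described above can be illustrated with a toy model. This is a sketch under stated assumptions, not PDFBox's API and not the fix planned for this issue: ToyAnnotation, ToyParent and CrossPageParents are invented names, and an annotation is reduced to a page index plus an optional parent link. The detach step mirrors the null /Parent value seen in the file saved by Adobe Reader.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an annotation: the page it sits on and an optional parent link. */
final class ToyAnnotation {
    final int page;
    ToyParent parent; // null models a missing or null /Parent entry

    ToyAnnotation(int page) {
        this.page = page;
    }
}

/** Toy stand-in for the object a /Parent entry points to; kids models its /Kids array. */
final class ToyParent {
    final List<ToyAnnotation> kids = new ArrayList<>();

    void adopt(ToyAnnotation annotation) {
        kids.add(annotation);
        annotation.parent = this;
    }
}

final class CrossPageParents {

    /** True if the annotation's parent lists at least one kid on a different page. */
    static boolean reachesOtherPage(ToyAnnotation annotation) {
        if (annotation.parent == null) {
            return false;
        }
        for (ToyAnnotation kid : annotation.parent.kids) {
            if (kid.page != annotation.page) {
                return true;
            }
        }
        return false;
    }

    /** Nulls the parent link of every affected annotation and returns how many were changed. */
    static int detachCrossPageParents(List<ToyAnnotation> annotations) {
        int detached = 0;
        for (ToyAnnotation annotation : annotations) {
            if (reachesOtherPage(annotation)) {
                annotation.parent = null; // the kids list itself is left untouched
                detached++;
            }
        }
        return detached;
    }
}
```

In this model, anything that copies a single page and follows parent links would also reach annotations belonging to other pages, which is presumably how the orphan pages come about.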




[jira] [Updated] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Attachment: screenshot-1.png




[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862867#comment-17862867
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 




[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862853#comment-17862853
 ] 

Maruan Sahyoun commented on PDFBOX-5848:


Tried with 3.0.3-SNAPSHOT and it works for me using the command line split command:

{code}
java -jar pdfbox-app-3.0.3-SNAPSHOT.jar split -i cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf
{code}

Can you try the same with the 3.0.2 version and, if that doesn't work for you, 
with 3.0.3-SNAPSHOT?





[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862846#comment-17862846
 ] 

Joan Fisbein commented on PDFBOX-5848:
--

This is the stacktrace from my production application trying to process this 
PDF file:

 
{code:java}
   java.lang.Thread.State: RUNNABLE
    at java.base@21.0.3/java.util.ArrayList.indexOfRange(ArrayList.java:299)
    at java.base@21.0.3/java.util.ArrayList.indexOf(ArrayList.java:286)
    at java.base@21.0.3/java.util.ArrayList.contains(ArrayList.java:275)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:199)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:1

[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862846#comment-17862846
 ] 

Joan Fisbein edited comment on PDFBOX-5848 at 7/3/24 4:39 PM:
--

This is the stacktrace from my production application trying to split the same 
PDF file:

 
{code:java}
   java.lang.Thread.State: RUNNABLE
    at java.base@21.0.3/java.util.ArrayList.indexOfRange(ArrayList.java:299)
    at java.base@21.0.3/java.util.ArrayList.indexOf(ArrayList.java:286)
    at java.base@21.0.3/java.util.ArrayList.contains(ArrayList.java:275)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:199)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterCompression

[jira] [Created] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)
Joan Fisbein created PDFBOX-5848:


 Summary: Infinite loop processing PDF
 Key: PDFBOX-5848
 URL: https://issues.apache.org/jira/browse/PDFBOX-5848
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.2 PDFBox
Reporter: Joan Fisbein
 Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf

I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
flawlessly, but I just received a PDF that triggers an infinite loop when I try 
to split it.

 

I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
versions):
{code:java}
private static void splitPdf(File fileToSplit) {
  try (PDDocument document = Loader.loadPDF(fileToSplit)) {
    int documentPages = document.getNumberOfPages();
    Splitter splitter = new Splitter();
    List<PDDocument> Pages = splitter.split(document);
    Iterator<PDDocument> iterator = Pages.listIterator();
    while (iterator.hasNext()) {
      PDDocument pd = iterator.next();
      pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
      pd.close();
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}
{code}
The PDF file is attached to the issue






jbig2 git

2024-07-03 Thread Tilman Hausherr
Sorry for the mess. I sent the wrong commit message, and tried different 
(partly unsuccessful) tactics to squash several commit messages into 
one. At least the tika message is gone now. I'll stop now because it 
might only get worse.


Tilman





[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862676#comment-17862676
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit aa0bed934f3c370969aee0126f7836a91fd5e3eb in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=aa0bed9 ]

PDFBOX-5660: update owasp plugin

PDFBOX-5660: update owasp plugin

PDFBOX-5660: update owasp plugin

PDFBOX-5660: update owasp plugin

