[ https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085883#comment-15085883 ]
Tilman Hausherr commented on PDFBOX-2996:
-----------------------------------------

Impossible to tell, because I can't read Arabic. I copied all test files from 2.0 to 1.8 and there were two more differences, but they occurred in files where it is hard to tell what the correct result should be (two columns with lines whose y-coordinates are near each other). When comparing, sometimes the "old" output was better, sometimes the "new" output was better (PDFBOX-3062-005021.pdf).

IMHO the sort option shouldn't be used by default, unless one expects tables (e.g. invoices).

> StackOverflow in Quicksort
> --------------------------
>
>                 Key: PDFBOX-2996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2996
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 2.0.0
>         Environment: Java 7
>            Reporter: Manuel Aristaran
>             Fix For: 1.8.11, 2.0.0
>
>         Attachments: 001991.pdf, Basiswissen-Vorschriften.pdf-diffs.png,
> Basiswissen-Vorschriften.pdf-sorted-bubble.txt,
> Basiswissen-Vorschriften.pdf-sorted-java8-legacyMergeSort.txt,
> Basiswissen-Vorschriften.pdf-sorted-java8-timsort.txt,
> Basiswissen-Vorschriften.pdf-sorted-qs-iterative-withMiddlePivot.txt,
> Basiswissen-Vorschriften.pdf-sorted-qs-iterative-withRightPivot.txt,
> Basiswissen-Vorschriften.pdf-sorted-qs-recursive.txt,
> Lars-v0-PDFBOX-2996.patch, Lars-v1-PDFBOX-2996.patch,
> Lars-v2-PDFBOX-2996.patch, PDFBOX-1292.pdf-diffs.png,
> PDFBOX-1292.pdf-sorted-bubble.txt,
> PDFBOX-1292.pdf-sorted-java8-legacyMergeSort.txt,
> PDFBOX-1292.pdf-sorted-java8-timsort.txt,
> PDFBOX-1292.pdf-sorted-qs-iterative-withMiddlePivot.txt,
> PDFBOX-1292.pdf-sorted-qs-iterative-withRightPivot.txt,
> PDFBOX-1292.pdf-sorted-qs-recursive.txt, QuickSort.java,
> TestSortingAlgorithms.java, artikel1_20_arab.pdf-diffs.png,
> artikel1_20_arab.pdf-sorted-bubble.txt, artikel1_20_arab.pdf-sorted-diff.txt,
> artikel1_20_arab.pdf-sorted-iter-withRightPivot.txt,
> artikel1_20_arab.pdf-sorted-iter.txt,
> artikel1_20_arab.pdf-sorted-java8-legacyMergeSort.txt,
> artikel1_20_arab.pdf-sorted-java8-timsort.txt,
> artikel1_20_arab.pdf-sorted-qs-iterative-withMiddlePivot.txt,
> artikel1_20_arab.pdf-sorted-qs-iterative-withRightPivot.txt,
> artikel1_20_arab.pdf-sorted-qs-recursive.txt,
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow
> exception in the QuickSort implementation for [this particular
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort
> failing_sort.pdf}}
> (Related to PDFBOX-1512)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
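
For readers unfamiliar with the "qs-iterative-withMiddlePivot" variant named in the attachments, here is a minimal sketch of an iterative quicksort with a middle pivot. It is not the code from QuickSort.java or any of the attached patches; the class name `IterativeQuickSortSketch` and its methods are hypothetical, and the Lomuto-style partition is just one possible choice. The point it illustrates is that keeping pending sub-ranges on an explicit heap-allocated stack, instead of recursing, means deep partitioning cannot overflow the Java call stack.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not PDFBox's QuickSort implementation.
final class IterativeQuickSortSketch {
    private IterativeQuickSortSketch() {
    }

    // Sorts the list in place. Pending sub-ranges live in a Deque on the heap,
    // so the depth of partitioning never grows the Java call stack.
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        if (list.size() < 2) {
            return;
        }
        Deque<int[]> pending = new ArrayDeque<>();
        pending.push(new int[] {0, list.size() - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int lo = range[0];
            int hi = range[1];
            if (lo >= hi) {
                continue; // ranges of length 0 or 1 are already sorted
            }
            int p = partition(list, cmp, lo, hi);
            // The pivot at index p is in its final place; queue both sides.
            pending.push(new int[] {lo, p - 1});
            pending.push(new int[] {p + 1, hi});
        }
    }

    // Lomuto-style partition using the middle element as pivot.
    private static <T> int partition(List<T> list, Comparator<? super T> cmp,
                                     int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        Collections.swap(list, mid, hi); // park the pivot at the end
        T pivot = list.get(hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store);
                store++;
            }
        }
        Collections.swap(list, store, hi); // move the pivot to its final index
        return store;
    }
}
```

Because every index stays within `lo..hi` and each step removes the pivot from further work, this sketch terminates even if the comparator is not perfectly consistent, although the resulting order is then not well defined. That matches the observation above that lines with nearly equal y-coordinates can come out in different orders depending on the algorithm.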