[ https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030058#comment-15030058 ]
Tilman Hausherr commented on PDFBOX-2996:
-----------------------------------------

005021-reduced.pdf isn't part of my test set. Whether the "W" is part of the rest, and where it should be positioned, is a tricky question. IMHO a user can't expect the "W" to be part of the rest, but of course it would be nice if it were.

Just for fun, I first tested forcing Java's own sort algorithm. That spd file has different results than the trunk. No exception is thrown.

Java classic: 8.4sec
Quicksort classic: 23.6sec
Quicksort iterative: 8.3sec
Bubble sort: 13.9sec

Btw the results of bubble sort are also different for the spd file.

So my favorite at this time would be the iterative quicksort. We shouldn't use any algorithm that Wikipedia describes as "slow and impractical" in the first paragraph. My memory of bubble sort is that it was taught to explain the concept of describing complexity (here: O(n^2)) on a bad algorithm.

> StackOverflow in Quicksort
> --------------------------
>
>                 Key: PDFBOX-2996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2996
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 2.0.0
>         Environment: Java 7
>            Reporter: Manuel Aristaran
>         Attachments: 001991.pdf, Lars-v0-PDFBOX-2996.patch,
> Lars-v1-PDFBOX-2996.patch, QuickSort.java,
> artikel1_20_arab.pdf-sorted-diff.txt, artikel1_20_arab.pdf-sorted-iter.txt,
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow
> exception in the QuickSort implementation for [this particular
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort failing_sort.pdf}}
> (Related to PDFBOX-1512)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
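
For readers unfamiliar with the "iterative quicksort" favoured in the comment above, here is a minimal sketch of the general technique. It is not the code from the attached QuickSort.java or from any of the patches. The class name IterativeQuickSort, the middle-element pivot and the Lomuto-style partition are all assumptions made for illustration. The point is only that the pending sub-ranges live in a heap-allocated deque instead of on the call stack, so a badly unbalanced input cannot cause a StackOverflowError.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper for illustration; not the PDFBox QuickSort class. */
final class IterativeQuickSort {
    private IterativeQuickSort() {
    }

    /** Sorts the list in place without recursion. */
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        if (list.size() < 2) {
            return;
        }
        // Each entry is an inclusive {low, high} index range still to be sorted.
        Deque<int[]> pending = new ArrayDeque<>();
        pending.push(new int[] {0, list.size() - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int lo = range[0];
            int hi = range[1];
            if (lo >= hi) {
                continue; // zero or one element: nothing to do
            }
            int p = partition(list, cmp, lo, hi);
            // Instead of two recursive calls, queue both halves.
            pending.push(new int[] {lo, p - 1});
            pending.push(new int[] {p + 1, hi});
        }
    }

    /** Lomuto-style partition; the middle element is used as pivot (an assumption). */
    private static <T> int partition(List<T> list, Comparator<? super T> cmp, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        Collections.swap(list, mid, hi); // park the pivot at the end
        T pivot = list.get(hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store);
                store++;
            }
        }
        Collections.swap(list, store, hi); // move the pivot to its final position
        return store;
    }
}
```

A call would look like IterativeQuickSort.sort(list, comparator). This removes the deep recursion but not the quadratic worst-case running time of quicksort, and like any quicksort it is not stable, so elements that compare as equal may end up in a different order than with Java's own sort.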