[ 
https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030058#comment-15030058
 ] 

Tilman Hausherr commented on PDFBOX-2996:
-----------------------------------------

005021-reduced.pdf isn't part of my test set. Whether the "W" belongs with the 
rest of the text, and where it should be positioned, is a tricky question. IMHO 
a user can't expect the "W" to be part of the rest, but of course it would be 
nice if it were.

Just for fun, I first tested forcing Java's own sort algorithm. That spd 
file gives different results than the trunk. No exception is thrown.

Java classic: 8.4sec

Quicksort classic: 23.6sec

Quicksort iterative: 8.3sec

Bubble sort: 13.9sec

Btw the results of bubble sort are also different for the spd file.

So my favorite at this time would be the iterative quicksort.
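
For readers who have not looked at the attached patches, the idea behind an 
iterative quicksort can be sketched as below. This is only a minimal 
illustration, not the code from quicksort.patch or QuickSort.java: the class 
name IterativeQuickSortSketch, the Lomuto partition and the middle-element 
pivot are all assumptions made for the example. The point is that pending 
index ranges live in an explicit heap-allocated stack, so the depth of the 
Java call stack no longer depends on the input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; not the PDFBox implementation.
final class IterativeQuickSortSketch {

    /** Sorts the list in place, using an explicit stack of index ranges instead of recursion. */
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        if (list.size() < 2) {
            return;
        }
        // Each entry is an inclusive {low, high} range that still has to be sorted.
        Deque<int[]> pending = new ArrayDeque<int[]>();
        pending.push(new int[] {0, list.size() - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int low = range[0];
            int high = range[1];
            if (low >= high) {
                continue;
            }
            int p = partition(list, cmp, low, high);
            // Push the larger half first so that the smaller half is popped next;
            // this keeps the explicit stack at O(log n) entries.
            if (p - low < high - p) {
                pending.push(new int[] {p + 1, high});
                pending.push(new int[] {low, p - 1});
            } else {
                pending.push(new int[] {low, p - 1});
                pending.push(new int[] {p + 1, high});
            }
        }
    }

    // Lomuto partition with the middle element as pivot (an arbitrary choice for this sketch).
    private static <T> int partition(List<T> list, Comparator<? super T> cmp, int low, int high) {
        int mid = low + (high - low) / 2;
        Collections.swap(list, mid, high);
        T pivot = list.get(high);
        int store = low;
        for (int i = low; i < high; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store);
                store++;
            }
        }
        Collections.swap(list, store, high);
        return store;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>(Arrays.asList(5, 3, 9, 1, 3, 7));
        sort(data, Comparator.<Integer>naturalOrder());
        System.out.println(data); // prints [1, 3, 3, 5, 7, 9]
    }
}
```

Because every partition step places the pivot and hands on strictly smaller 
ranges, the loop terminates even if the comparator is inconsistent, although 
the resulting order is then not well defined.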

We shouldn't use any algorithm that Wikipedia describes as "slow and 
impractical" in the first paragraph. As I remember it, bubble sort was taught 
to introduce the concept of describing complexity (here: O(n^2)), using a bad 
algorithm as the example.
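
To make the O(n^2) remark concrete, here is a small sketch that counts the 
comparisons bubble sort performs on reversed input. The class and method names 
are hypothetical and the code is unrelated to the bubble sort variant that was 
timed above. On reversed input of length n it performs n(n-1)/2 comparisons, 
so doubling n roughly quadruples the work.

```java
// Hypothetical sketch for illustration only.
final class BubbleSortCostSketch {

    /** Sorts the array in place and returns how many element comparisons were made. */
    static long bubbleSortCountingComparisons(int[] a) {
        long comparisons = 0;
        boolean swapped = true;
        // After each pass the largest remaining element sits at index 'end'.
        for (int end = a.length - 1; end > 0 && swapped; end--) {
            swapped = false;
            for (int i = 0; i < end; i++) {
                comparisons++;
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return comparisons;
    }

    /** Builds the worst case for this variant: n, n-1, ..., 1. */
    static int[] reversed(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(bubbleSortCountingComparisons(reversed(1000))); // prints 499500
        System.out.println(bubbleSortCountingComparisons(reversed(2000))); // prints 1999000
    }
}
```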

> StackOverflow in Quicksort
> --------------------------
>
>                 Key: PDFBOX-2996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2996
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 2.0.0
>         Environment: Java 7
>            Reporter: Manuel Aristaran
>         Attachments: 001991.pdf, Lars-v0-PDFBOX-2996.patch, 
> Lars-v1-PDFBOX-2996.patch, QuickSort.java, 
> artikel1_20_arab.pdf-sorted-diff.txt, artikel1_20_arab.pdf-sorted-iter.txt, 
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow 
> exception in the QuickSort implementation for [this particular 
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort 
> failing_sort.pdf}}
> (Related to PDFBOX-1512)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
