[
https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030058#comment-15030058
]
Tilman Hausherr commented on PDFBOX-2996:
-----------------------------------------
005021-reduced.pdf isn't part of my test set. Whether the "W" is part of the
rest of the text, and where it should be positioned, is a tricky question.
IMHO a user can't expect the "W" to be part of the rest, but of course it
would be nice if it were.
Just for fun, I first tested forcing Java's own sort algorithm. That spd
file has different results than with the trunk. No exception is thrown.
Java classic: 8.4 sec
Quicksort classic: 23.6 sec
Quicksort iterative: 8.3 sec
Bubble sort: 13.9 sec
Btw the results of bubble sort are also different for the spd file.
So my favorite at this time would be the iterative quicksort.
We shouldn't use any algorithm that Wikipedia describes as "slow and
impractical" in the first paragraph. My memory of bubble sort is that it was
taught as an example of a bad algorithm, to explain the concept of
complexity (here: O(n^2)).
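
To illustrate why an iterative quicksort avoids the StackOverflow, here is a
minimal sketch. It is not the code from QuickSort.java or from any of the
attached patches. The class name IterativeQuickSortSketch, the middle-element
pivot and the Lomuto partition are assumptions chosen for brevity. The
pending index ranges live in a heap-allocated deque instead of on the call
stack, and the smaller partition is always processed first, so the deque
holds only O(log n) entries.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not the PDFBox implementation.
final class IterativeQuickSortSketch {

    private IterativeQuickSortSketch() {
    }

    // Sorts the list in place without recursion.
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        if (list.size() < 2) {
            return;
        }
        // Explicit stack of inclusive {lo, hi} ranges still to be sorted.
        Deque<int[]> pending = new ArrayDeque<int[]>();
        pending.push(new int[] {0, list.size() - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int lo = range[0];
            int hi = range[1];
            while (lo < hi) {
                int p = partition(list, cmp, lo, hi);
                // Defer the larger side and continue with the smaller one,
                // which keeps the number of deferred ranges logarithmic.
                if (p - lo < hi - p) {
                    pending.push(new int[] {p + 1, hi});
                    hi = p - 1;
                } else {
                    pending.push(new int[] {lo, p - 1});
                    lo = p + 1;
                }
            }
        }
    }

    // Lomuto partition using the middle element as pivot.
    // Returns the final index of the pivot.
    private static <T> int partition(List<T> list, Comparator<? super T> cmp,
                                     int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        Collections.swap(list, mid, hi);
        T pivot = list.get(hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store);
                store++;
            }
        }
        Collections.swap(list, store, hi);
        return store;
    }
}
```

Like any quicksort, this sketch is not stable, and inputs with many equal
keys can still cost O(n^2) time with this partition scheme. All loops are
bounded by indices, though, so it cannot exhaust the call stack.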
> StackOverflow in Quicksort
> --------------------------
>
> Key: PDFBOX-2996
> URL: https://issues.apache.org/jira/browse/PDFBOX-2996
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.10, 2.0.0
> Environment: Java 7
> Reporter: Manuel Aristaran
> Attachments: 001991.pdf, Lars-v0-PDFBOX-2996.patch,
> Lars-v1-PDFBOX-2996.patch, QuickSort.java,
> artikel1_20_arab.pdf-sorted-diff.txt, artikel1_20_arab.pdf-sorted-iter.txt,
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow
> exception in the QuickSort implementation for [this particular
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort
> failing_sort.pdf}}
> (Related to PDFBOX-1512)