[
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168121#comment-14168121
]
Uwe commented on PDFBOX-1512:
-----------------------------
> The infallibility of the previous author is probably not a safe assumption.
I wouldn't go so far as to assume the infallibility of any coder ;-)
The current algo will have its weaknesses, no doubt. But devising one that
works reliably may not be as easy as it seems.
Why don't we:
* Fix the released versions (1.7.x, 1.8.x) with the patch I provided? It's safe
to do so, because it won't change the algorithm. It just makes it work on Java
1.7+
* Fix the trunk (2.0+) properly, with a new comparator/algorithm
This way, everyone wins:
* Users of the released versions (like myself) get a quick fix that they
desperately need.
* PDFBox Developers get a chance to clean up this issue properly in the trunk,
and by the same token improve the text extraction feature.
What do you think?
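
For illustration only: the patch itself is not reproduced in this message. Going purely by the attachment name quicksort.patch, one plausible reading of "makes it work on Java 1.7+ without changing the algorithm" is to keep the comparator as it is and sort with a routine that never checks the comparator contract. The sketch below is an assumption along those lines, not the actual patch; the class name LenientQuickSort is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, NOT the code from quicksort.patch.
// A plain quicksort (Lomuto partition) that never validates the comparator,
// so it cannot throw "Comparison method violates its general contract!".
class LenientQuickSort {

    /** Sorts the list in place using the given comparator. */
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        sort(list, cmp, 0, list.size() - 1);
    }

    private static <T> void sort(List<T> list, Comparator<? super T> cmp, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: nothing to do
        }
        T pivot = list.get(hi);
        int store = lo; // next slot for an element smaller than the pivot
        for (int i = lo; i < hi; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store);
                store++;
            }
        }
        Collections.swap(list, store, hi); // move the pivot to its final slot
        // The pivot slot is excluded from both halves, so the recursion
        // always shrinks, even if the comparator is inconsistent.
        sort(list, cmp, lo, store - 1);
        sort(list, cmp, store + 1, hi);
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(5, 3, 1, 4, 2));
        sort(values, Comparator.<Integer>naturalOrder());
        System.out.println(values);
    }
}
```

With a comparator that breaks transitivity, a routine like this still terminates and keeps every element, but the resulting order is not well defined. That is the trade-off of the quick fix: no exception, same (imperfect) ordering behaviour as before.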
> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1, 2.0.0
> Environment: Java 7
> Reporter: Benjamin Papez
> Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
> Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf,
> Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf,
> TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt,
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf,
> quicksort.patch
>
>
> The TextPositionComparator causes the following exception when running on Java 7:
> Unexpected RuntimeException from
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
> (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
> (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive:
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 now is strict and throws exceptions when the contract is violated.
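
To make the transitivity problem in the report concrete, here is a minimal sketch. The Pos class and the comparator are simplified, hypothetical stand-ins and not the PDFBox code; only the idea of a 0.1 tolerance on the y difference is taken from the check quoted above (the overlap tests are left out).

```java
import java.util.Comparator;

// Simplified illustration, NOT the real TextPositionComparator.
class ToleranceComparatorDemo {

    /** Hypothetical stand-in for a text position: just x and y. */
    static final class Pos {
        final double x;
        final double y;

        Pos(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Positions whose y values differ by less than 0.1 count as "same line"
     * and are ordered by x; otherwise they are ordered by y.
     */
    static final Comparator<Pos> LINE_THEN_X = (p1, p2) -> {
        if (Math.abs(p1.y - p2.y) < 0.1) {
            return Double.compare(p1.x, p2.x);
        }
        return Double.compare(p1.y, p2.y);
    };

    /** True if a > b and b > c, yet a is not greater than c. */
    static boolean violatesTransitivity(Pos a, Pos b, Pos c) {
        return LINE_THEN_X.compare(a, b) > 0
                && LINE_THEN_X.compare(b, c) > 0
                && LINE_THEN_X.compare(a, c) <= 0;
    }

    public static void main(String[] args) {
        // a/b and b/c are each within the tolerance, but a/c are not.
        Pos a = new Pos(3, 0.00);
        Pos b = new Pos(2, 0.08);
        Pos c = new Pos(1, 0.16);
        System.out.println(violatesTransitivity(a, b, c)); // prints "true"
    }
}
```

Here a sorts after b and b sorts after c (both pairs are on the "same line" and ordered by x), but a sorts before c because that pair is compared by y. Any tolerance-based "same line" test can chain like this, which is the kind of inconsistency the sort in Java 7 may detect and reject.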