[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151927#comment-14151927 ]
Tom Crossland commented on PDFBOX-1512:
---------------------------------------

Part of the issue here is that {{TextPositionComparator}} breaks the contract for {{Comparator}}. Specifically, it doesn't ensure that the ordering is transitive. The [Comparator|http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html] interface specifies that:

{quote}
The implementor must also ensure that the relation is transitive: {{((compare(x, y)>0) && (compare(y, z)>0))}} implies {{compare(x, z)>0}}.
{quote}

This does not currently hold because of the vertical position tolerance: two positions can each be treated as being on the same line as a third position without being treated as on the same line as each other. So {{TextPositionComparator}} is not a valid {{Comparator}}, and it shouldn't be used for sorting.

Using a custom Quicksort probably won't help much. It would not throw, but because the comparator is inconsistent, the order of the results would still depend on the order in which elements happen to be compared.

> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Java 7
> Reporter: Benjamin Papez
> Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf,
> Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf,
> TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt,
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf,
> quicksort.patch
>
>
> The TextPositionComparator causes the following exception when running on Java 7:
> Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2
> Original cause: Comparison method violates its general contract!
> I think the problem is with this check:
>
>     if ( yDifference < .1 ||
>          (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>          (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
>
> as it violates the contract requirement:
>
> The implementor must also ensure that the relation is transitive:
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
>
> Java 7 is now strict: its new object sort (TimSort) can throw an IllegalArgumentException when it detects that the contract is violated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
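To make the transitivity problem concrete, here is a minimal, Java 7 compatible sketch. It is not PDFBox code: {{Pos}}, {{TOLERANT}} and {{insertionSort}} are hypothetical names, the same-line test is copied from the check quoted above, and the "different lines" branch (order by bottom y, with y growing downward) is an assumption. Three 10pt glyphs whose baselines step down 8pt at a time give compare(a, b) > 0 and compare(b, c) > 0 but compare(a, c) < 0. A plain insertion sort, standing in for a hand-written Quicksort, then returns different orders for different input orders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ToleranceComparatorDemo {

    /** Hypothetical stand-in for a glyph: x, bottom y (y grows downward), height. */
    static final class Pos {
        final String name;
        final float x;
        final float yBottom;
        final float height;

        Pos(String name, float x, float yBottom, float height) {
            this.name = name;
            this.x = x;
            this.yBottom = yBottom;
            this.height = height;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** Same-line test copied from the issue; the "different lines" branch is an assumption. */
    static final Comparator<Pos> TOLERANT = new Comparator<Pos>() {
        @Override
        public int compare(Pos p1, Pos p2) {
            float pos1YBottom = p1.yBottom;
            float pos2YBottom = p2.yBottom;
            float pos1YTop = pos1YBottom - p1.height;
            float pos2YTop = pos2YBottom - p2.height;
            float yDifference = Math.abs(pos1YBottom - pos2YBottom);
            if (yDifference < .1
                    || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom)
                    || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) {
                // Treated as the same line: order left to right.
                if (p1.x < p2.x) return -1;
                if (p1.x > p2.x) return 1;
                return 0;
            }
            // Treated as different lines: order top to bottom.
            return pos1YBottom < pos2YBottom ? -1 : 1;
        }
    };

    /** Plain insertion sort; like a custom Quicksort, it never checks the contract. */
    static List<Pos> insertionSort(List<Pos> input, Comparator<Pos> cmp) {
        List<Pos> list = new ArrayList<Pos>(input);
        for (int i = 1; i < list.size(); i++) {
            Pos key = list.get(i);
            int j = i - 1;
            // Shift elements the comparator says are "greater" than key one slot right.
            while (j >= 0 && cmp.compare(list.get(j), key) > 0) {
                list.set(j + 1, list.get(j));
                j--;
            }
            list.set(j + 1, key);
        }
        return list;
    }

    public static void main(String[] args) {
        // Three 10pt glyphs whose baselines step down by 8pt each.
        Pos a = new Pos("a", 30f, 100f, 10f);
        Pos b = new Pos("b", 20f, 108f, 10f);
        Pos c = new Pos("c", 10f, 116f, 10f);

        // a overlaps b and b overlaps c vertically, but a does not overlap c.
        System.out.println("compare(a, b) = " + TOLERANT.compare(a, b)); // 1
        System.out.println("compare(b, c) = " + TOLERANT.compare(b, c)); // 1
        System.out.println("compare(a, c) = " + TOLERANT.compare(a, c)); // -1, so not transitive

        // Same three elements, two input orders, two different "sorted" results.
        System.out.println(insertionSort(Arrays.asList(a, b, c), TOLERANT)); // [b, a, c]
        System.out.println(insertionSort(Arrays.asList(c, b, a), TOLERANT)); // [c, b, a]
    }
}
```

Each result is locally consistent with the comparisons the sort happened to make, but neither is a stable reading order, which is why replacing the sort algorithm does not address the underlying problem.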