[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151927#comment-14151927 ]

Tom Crossland commented on PDFBOX-1512:
---------------------------------------

Part of the issue here is that {{TextPositionComparator}} breaks the contract 
for {{Comparator}}. Specifically, it doesn't ensure that the ordering is 
transitive. The 
[Comparator|http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html] 
interface specifies that:

{quote}
The implementor must also ensure that the relation is transitive: 
{{((compare(x, y)>0) && (compare(y, z)>0))}} implies {{compare(x, z)>0}}.
{quote}

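This is currently not the case because of the vertical position tolerance. Two 
positions are treated as being on the same line, and ordered by x, when their 
bottoms are within 0.1 of each other or when one position's bottom falls within 
the other's vertical extent; otherwise they are ordered by y. Being "on the same 
line" is not transitive, though: a tall glyph can overlap two positions that are 
on different lines. So {{TextPositionComparator}} is not a valid {{Comparator}} 
and it shouldn't be used for sorting. Using a custom Quicksort probably won't 
help much, as the ordering of the results would still depend on the order in 
which elements happen to be compared.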
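To make the transitivity failure concrete, here is a small standalone sketch. 
{{Pos}} and {{TOLERANT}} are hypothetical names; the comparator re-implements only 
the same-line check quoted in the issue description (it ignores text direction and 
everything else in the real {{TextPositionComparator}}), and it assumes, as the 
variable names in that check suggest, that y grows downward and a glyph's top is 
its bottom minus its height. A tall glyph {{b}} overlaps both {{a}} on an upper 
line and {{c}} on a lower one, so a > b and b > c, yet a < c.

```java
import java.util.Comparator;

public class TolerantComparatorDemo {

    /** Hypothetical stand-in for TextPosition with only the fields the check reads. */
    static final class Pos {
        final float x;       // direction-adjusted x
        final float yBottom; // direction-adjusted bottom; y grows downward
        final float height;

        Pos(float x, float yBottom, float height) {
            this.x = x;
            this.yBottom = yBottom;
            this.height = height;
        }

        float yTop() {
            return yBottom - height; // assumption: top = bottom - height in y-down space
        }
    }

    /** Same-line test as quoted in the issue; same line orders by x, otherwise by y. */
    static final Comparator<Pos> TOLERANT = (p1, p2) -> {
        float yDifference = Math.abs(p1.yBottom - p2.yBottom);
        boolean sameLine = yDifference < .1
                || (p2.yBottom >= p1.yTop() && p2.yBottom <= p1.yBottom)
                || (p1.yBottom >= p2.yTop() && p1.yBottom <= p2.yBottom);
        if (sameLine) {
            return Float.compare(p1.x, p2.x);
        }
        return p1.yBottom < p2.yBottom ? -1 : 1;
    };

    public static void main(String[] args) {
        Pos a = new Pos(10, 10, 2); // small glyph on an upper line
        Pos b = new Pos(5, 20, 15); // tall glyph spanning both lines
        Pos c = new Pos(0, 18, 2);  // small glyph on a lower line

        System.out.println(TOLERANT.compare(a, b)); // 1: b's extent covers a's bottom, a.x > b.x
        System.out.println(TOLERANT.compare(b, c)); // 1: b's extent covers c's bottom, b.x > c.x
        System.out.println(TOLERANT.compare(a, c)); // -1: no overlap, and a is higher on the page
    }
}
```

No single order of a, b and c is consistent with all three answers, which is why 
any sort, custom or not, can only produce an order that depends on which pairs 
it happened to compare.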

> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
>                 Key: PDFBOX-1512
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1512
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Java 7
>            Reporter: Benjamin Papez
>            Assignee: Andreas Lehmkühler
>         Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf, 
> Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf, 
> TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt, 
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf, 
> quicksort.patch
>
>
> The TextPositionComparator causes the following exception when running on Java 7: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
>     (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>     (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: 
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that 
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 is now strict and throws an exception when it detects that the contract 
> has been violated.
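The second rule quoted above is broken even by the {{yDifference < .1}} part of 
the check on its own. Below is a minimal sketch of a brute-force checker for both 
quoted rules; {{findViolation}} and {{Y_WITH_TOLERANCE}} are hypothetical names, 
and the comparator keeps only the y tolerance (no overlap test, no x ordering). 
With it, 0 ties with 0.06 and 0.06 ties with 0.12, but 0 sorts before 0.12.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ContractCheck {

    /**
     * Hypothetical helper: brute-force check of the two Comparator rules quoted in the issue.
     * Returns a description of the first violation found, or null if there is none.
     * Tries every (x, y, z) triple, so it is only practical for small samples.
     */
    static <T> String findViolation(List<T> items, Comparator<? super T> cmp) {
        for (T x : items) {
            for (T y : items) {
                for (T z : items) {
                    int xy = cmp.compare(x, y);
                    int yz = cmp.compare(y, z);
                    int xz = cmp.compare(x, z);
                    // compare(x, y) > 0 && compare(y, z) > 0 implies compare(x, z) > 0
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        return "not transitive: " + x + " > " + y + " > " + z;
                    }
                    // compare(x, y) == 0 implies sgn(compare(x, z)) == sgn(compare(y, z))
                    if (xy == 0 && Integer.signum(xz) != Integer.signum(yz)) {
                        return "inconsistent tie: " + x + " == " + y + ", compared with " + z;
                    }
                }
            }
        }
        return null;
    }

    /** Hypothetical comparator on y alone, keeping only the "yDifference < .1" tolerance. */
    static final Comparator<Float> Y_WITH_TOLERANCE = (y1, y2) -> {
        if (Math.abs(y1 - y2) < .1) {
            return 0; // close enough: treated as the same line
        }
        return y1 < y2 ? -1 : 1;
    };

    public static void main(String[] args) {
        List<Float> ys = Arrays.asList(0f, 0.06f, 0.12f);
        System.out.println(findViolation(ys, Y_WITH_TOLERANCE)); // reports an inconsistent tie
        System.out.println(findViolation(ys, Float::compare));   // null: exact ordering is fine
    }
}
```

Because the sort only throws when it happens to detect a violation, whether a 
given page fails depends on its data; a checker like this can flag a broken 
comparator on small hand-picked inputs instead.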



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
