[ https://issues.apache.org/jira/browse/PDFBOX-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632129#action_12632129 ]
Andreas Lehmkühler commented on PDFBOX-374:
-------------------------------------------
You should have a look at PDFBOX-363. I tried to fix a problem with the page
rotation and I provided a patch for some minor problems which are perhaps
related to your problem, too.
> text areas not properly being sorted because of page rotation
> -------------------------------------------------------------
>
> Key: PDFBOX-374
> URL: https://issues.apache.org/jira/browse/PDFBOX-374
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Reporter: Brian Carrier
> Attachments: PDFStreamEngine.diff, PDFTextStripper.diff,
> rotation.pdf, TextPositionComparator.diff
>
>
> When PDFTextStripper is set to sort the text before outputting, the sorting
> is not correct if a page rotation exists. The reason is that both
> TextPositionComparator and PDFStreamEngine take the rotation into account,
> so the rotation has been applied twice by the time the comparison is done in
> TextPositionComparator.
> Also, it seems that the rotation code in PDFStreamEngine is not consistent. I
> verified the code for 0 and 90 degrees works, but the 180 and 270 situations
> do not seem consistent with the goal of adjusting the X and Y values so that
> 0,0 is in the upper left, which is what the 0 and 90 code does. I do not
> have examples of 180 and 270 to test with. There are no comments in this
> section, so I have been guessing about its purpose.
> The attached patches:
> - Remove the rotation from TextPositionComparator
> - Add comments and change the 180 and 270 situations to make them
> consistent with 0 and 90.
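
For readers following the issue, here is a rough sketch of the idea described above: apply the page rotation exactly once so that 0,0 ends up in the upper left, then sort with a comparator that knows nothing about rotation. It is not the PDFBox code or the attached patches. The names RotationSortSketch, Pos, normalize and sortedText are hypothetical, and it assumes clockwise rotations in multiples of 90 degrees, input coordinates in unrotated page space with the origin at the bottom left, and pageWidth/pageHeight given for the unrotated page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; none of these names are PDFBox classes.
final class RotationSortSketch {

    /** Hypothetical stand-in for a piece of extracted text and its position. */
    static final class Pos {
        final String text;
        final double x;
        final double y;

        Pos(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Maps a point from unrotated page space (origin bottom-left, y up) to the
     * displayed page (origin top-left, y down). Assumes clockwise rotation.
     */
    static Pos normalize(Pos p, int rotation, double pageWidth, double pageHeight) {
        int r = ((rotation % 360) + 360) % 360;
        switch (r) {
            case 0:
                return new Pos(p.text, p.x, pageHeight - p.y);
            case 90:
                return new Pos(p.text, p.y, p.x);
            case 180:
                return new Pos(p.text, pageWidth - p.x, p.y);
            case 270:
                return new Pos(p.text, pageHeight - p.y, pageWidth - p.x);
            default:
                throw new IllegalArgumentException("unsupported rotation: " + rotation);
        }
    }

    /** Top-to-bottom, then left-to-right. No rotation handling here on purpose. */
    static final Comparator<Pos> READING_ORDER =
            Comparator.comparingDouble((Pos p) -> p.y).thenComparingDouble(p -> p.x);

    /** Normalizes every position once, sorts, and returns the text in order. */
    static List<String> sortedText(List<Pos> raw, int rotation,
                                   double pageWidth, double pageHeight) {
        List<Pos> normalized = new ArrayList<>();
        for (Pos p : raw) {
            normalized.add(normalize(p, rotation, pageWidth, pageHeight));
        }
        normalized.sort(READING_ORDER);
        List<String> out = new ArrayList<>();
        for (Pos p : normalized) {
            out.add(p.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pos> raw = new ArrayList<>();
        raw.add(new Pos("second", 50, 20));
        raw.add(new Pos("A", 10, 80));
        raw.add(new Pos("first", 10, 20));
        // Page is 200 x 100 before rotation and is displayed rotated by 90 degrees.
        System.out.println(sortedText(raw, 90, 200, 100));
    }
}
```

If the comparator also adjusted for rotation, the positions would be transformed a second time and the order would no longer match what is seen on the displayed page, which is the double application described in the report.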
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.