Rotated text isn't extracted correctly from PDFs
------------------------------------------------

                 Key: TIKA-723
                 URL: https://issues.apache.org/jira/browse/TIKA-723
             Project: Tika
          Issue Type: Bug
          Components: parser
            Reporter: Michael McCandless
            Priority: Minor
         Attachments: rotated.pdf

I have an example PDF (rotated.pdf, attached) with 90-degree rotation;
Tika emits the text broken up, roughly one character per line.  I.e.,
the doc has "Some rotated text, here!" but Tika produces this:

{noformat}
<body><div class="page"><p>So
m
e
 
r
o
t
a
t
e
d
 
t
e
x
t
,
 
h
e
r
e
!</p>
{noformat}

I'm able to copy/paste the text out correctly.
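
To make the symptom concrete: if the line breaks are dropped from the
paragraph content above, the expected string comes back, so the
characters appear to be extracted in the right order and only the line
structure is wrong. A minimal Java sketch of that check follows; the
class and method names are hypothetical and not part of Tika, and the
input string is copied by hand from the output above with the tags
omitted.

```java
public class RotatedTextCheck {
    // Hypothetical helper (not a Tika API): removes the line breaks that
    // separate the extracted fragments and joins what remains.
    static String joinFragments(String extracted) {
        return extracted.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        // Paragraph content from the reported output, one fragment per line.
        String extracted =
            "So\nm\ne\n \nr\no\nt\na\nt\ne\nd\n \nt\ne\nx\nt\n,\n \nh\ne\nr\ne\n!";
        String expected = "Some rotated text, here!";

        // prints true
        System.out.println(expected.equals(joinFragments(extracted)));
    }
}
```

This only characterizes the output; it is not a proposed fix, since
joining every line would also destroy legitimate line breaks in
unrotated text.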



        
