Fixed Page rotation
-------------------
Key: PDFBOX-363
URL: https://issues.apache.org/jira/browse/PDFBOX-363
Project: PDFBox
Issue Type: Improvement
Reporter: Jukka Zitting
Attachments: landscape_rot90.pdf, PageRotation-Patch.diff,
test-landscape2.pdf
[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834
Hi all,
Daniel asked me for my patch for the rotation issue described in
https://sourceforge.net/forum/message.php?msg_id=4992032
Note that I didn't apply the newest patches to the classes
PDFStreamEngine and PageDrawer.
There are 4 more classes that are probably affected, because they call
the page.findRotation method. I didn't change them, because I haven't
had to use them (until now):
org.pdfbox.util.operator.pagedrawer.Invoke
org.pdfbox.util.TextPositionComparator
org.pdfbox.examples.pdmodel.PrintURLs
org.pdfbox.examples.util.PrintImageLocations
I've attached a PDF in DIN A4 landscape. The text is misplaced whenever
I try to print or display it with pdfbox (using the pdfbox PDFReader
and convertToImage within my application). Acrobat Reader has no
problems with my documents.
After my patch everything works fine. Perhaps it is open to discussion
whether the convertToImage method should rotate the image or whether
the user has to do it. The PDFPagePanel doesn't do it (yet).
Andreas
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
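As a minimal sketch of the second option raised above (the caller
rotating the rendered image itself), the following uses only java.awt
and no PDFBox classes. It assumes the page has already been rendered to
a BufferedImage, for example by convertToImage, and that the rotation is
a multiple of 90 degrees. The class and method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class RotateRenderedPage {

    // Rotates src clockwise by a multiple of 90 degrees and returns a new image.
    static BufferedImage rotate(BufferedImage src, int degrees) {
        int normalized = ((degrees % 360) + 360) % 360;
        if (normalized % 90 != 0) {
            throw new IllegalArgumentException("rotation must be a multiple of 90");
        }
        int quadrants = normalized / 90;
        int w = src.getWidth();
        int h = src.getHeight();
        boolean swap = quadrants % 2 == 1;

        // Width and height trade places for 90 and 270 degrees.
        BufferedImage dst = new BufferedImage(
                swap ? h : w, swap ? w : h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Shift the origin so the rotated image lands inside dst.
            switch (quadrants) {
                case 1: g.translate(h, 0); break;
                case 2: g.translate(w, h); break;
                case 3: g.translate(0, w); break;
                default: break;
            }
            // Exact quadrant rotation; positive values turn clockwise
            // in Java2D's y-down coordinate system.
            g.transform(AffineTransform.getQuadrantRotateInstance(quadrants));
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

A caller would pass the page's rotation value (0, 90, 180 or 270) as
`degrees`. Whether that belongs in the library or in the application is
exactly the open question in the message above.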
[Comment from SourceForge]
Date: 2008-05-29 12:42
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
I've just tried your sample PDF w/ the latest code -- prior to
application of your patch. It doesn't work.
I'll work on incorporating your change for a full regression test in
the next hour or so.
[Comment from SourceForge]
Date: 2008-05-29 15:16
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
Hi Daniel,
I've just added my patch to the newest sources you sent me earlier
today. I guess it works. During testing I found another problem
concerning graphics within landscape docs. I solved it by patching the
class org.pdfbox.util.operator.pagedrawer.Invoke in the same way I
patched the others. And, to be consistent, I've also patched the new
methods in org.pdfbox.pdfviewer.PageDrawer.
For me everything works fine, incl. the 4PP PDF.
I've attached the patched files and another test PDF with an embedded
graphic.
Andreas
File Added: pdfbox_rotation_patch_2.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429
[Comment from SourceForge]
Date: 2008-05-29 18:12
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
Your code works w/ the 4PP test ... and with the other rendering stuff
I've tried so far.
However ... the text extraction test fails with it. I can't figure
that one out ... ideas?
[Comment from SourceForge]
Date: 2008-05-29 18:19
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
Can you give me some more details? I never do any text extraction with
pdfbox. Perhaps you could provide me with the code for the test
program, or is it part of pdfbox, so that I can find it in the CVS?
However, it has to wait until tomorrow.
[Comment from SourceForge]
Date: 2008-05-29 18:39
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
If you've got the whole project set up, try
ant testextract
I'll see if I can narrow it down some.
[Comment from SourceForge]
Date: 2008-05-29 21:00
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
The extraction problem seems to have to do w/ the changes to
PDFStreamEngine.
If I revert that file, extraction succeeds. Unfortunately ... with
that reverted but your other changes in place, image rendering hangs.
Will work on it more ... probably tomorrow.
[Comment from SourceForge]
Date: 2008-05-29 21:12
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
Correction ... it doesn't hang ... it's just slow on the first PDF to
render ... maybe just due to the first one I'm sending it.
Will look more tomorrow.
[Comment from SourceForge]
Date: 2008-05-30 07:11
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
I've found one bug. While deleting the if-clauses for the rotation, I
also deleted line 394, which is still needed.
I've attached the corrected file.
File Added: PDFStreamEngine.java
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429
[Comment from SourceForge]
Date: 2008-05-30 07:43
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
I forgot to mention that I can't run the test suite. When I tried to
get the whole project, I realized that I'm behind a firewall here in
my office. Consequently my CVS client doesn't work. I have to do it
from home. :-(
I've only tested one file: 601501018.pdf
There were additional blanks, and they disappear after adding the
missing line. But starting at page 21, where the document orientation
changes from portrait to landscape, there are additional CR or LF
characters. Hmmmm ??
[Comment from SourceForge]
Date: 2008-05-30 08:25
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
I've continued testing and I guess the problem starts somewhere in
org.pdfbox.util.PDFTextStripper.showCharacter(..). Apparently it
handles the coordinates for rotated pages differently from the
implementation of showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
But for the moment I don't understand what's happening in the
TextStripper; perhaps I'll find out later.
I hope this hint helps ...
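To make "coordinates for rotated pages" concrete, here is a minimal
sketch of how a point on an unrotated page maps to the page as displayed
after a clockwise rotation of 0, 90, 180 or 270 degrees. It is not the
code of PDFTextStripper or PageDrawer. It assumes PDF-style coordinates
(origin at the bottom left, y pointing up), and the class and method
names are hypothetical.

```java
// Hypothetical helper, not part of PDFBox.
final class RotatedPageCoords {

    // Maps (x, y) on an unrotated page of size width x height to the
    // coordinates of the same point after the page is turned clockwise
    // by `rotation` degrees. Both systems have the origin bottom-left.
    static double[] toDisplay(int rotation, double x, double y,
                              double width, double height) {
        int r = ((rotation % 360) + 360) % 360;
        switch (r) {
            case 0:
                return new double[] { x, y };
            case 90:
                // Displayed page is height wide and width tall.
                return new double[] { y, width - x };
            case 180:
                return new double[] { width - x, height - y };
            case 270:
                return new double[] { height - y, x };
            default:
                throw new IllegalArgumentException(
                        "rotation must be a multiple of 90");
        }
    }
}
```

If two code paths apply a mapping like this at different stages, or one
of them skips it, the same glyph ends up at different positions in the
renderer and in the text extractor, which is consistent with the symptom
described above.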
[Comment from SourceForge]
Date: 2008-05-30 16:20
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
I've put a couple more hours into this, and I don't know the answer.
I do know the text extraction is the more mature side of this library.
For the moment, I'll be skipping over your changes to PDFStreamEngine.
Thanks for the other changes!
[Comment from SourceForge]
Date: 2008-06-02 09:21
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
Hi Daniel,
I guess I've solved the problem. The text position handling has to be
adjusted within the method PDFTextStripper.flushText(). Of course my
former changes to the class PDFStreamEngine are still needed. During
debugging I found a bug in the class TextPositionComparator (line 82).
I solved it by removing the rotation if-clauses. Whenever you compare
two TextPositions, there is no need to look at the rotation: they are
on the same page, so the comparison is independent of the rotation.
Furthermore my PDFTextStripper patch seems to correct some minor
problems, which are described in
https://sourceforge.net/forum/message.php?msg_id=4976730.
I've tested the following cases:
Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
lines 251-257, 278/279, 502/503, 574/575, and the other differences are
some kind of special-character issue. I guess you have to correct the
input first.
I've attached my changes based on the newest versions of both classes.
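The point about TextPositionComparator can be illustrated with a small
sketch: once all positions on a page are expressed in the same
coordinate system, the ordering needs no rotation branches. This is not
the patched PDFBox class. `Pos` is a hypothetical stand-in for a text
position, and the sketch assumes y grows downward, so a smaller y means
an earlier line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not the PDFBox TextPositionComparator.
final class PositionOrder {

    // Stand-in for a text position already adjusted for page rotation.
    static final class Pos {
        final String text;
        final float x;
        final float y;

        Pos(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Orders top to bottom, then left to right. No rotation checks:
    // both arguments live on the same page and share one coordinate
    // system.
    static final Comparator<Pos> READING_ORDER = new Comparator<Pos>() {
        @Override
        public int compare(Pos a, Pos b) {
            int byLine = Float.compare(a.y, b.y);
            return byLine != 0 ? byLine : Float.compare(a.x, b.x);
        }
    };

    // Sorts a copy of the input and concatenates the text.
    static String join(List<Pos> positions) {
        List<Pos> sorted = new ArrayList<Pos>(positions);
        Collections.sort(sorted, READING_ORDER);
        StringBuilder out = new StringBuilder();
        for (Pos p : sorted) {
            out.append(p.text);
        }
        return out.toString();
    }
}
```

A real comparator would also need a tolerance for glyphs whose baselines
differ slightly within one line; the exact comparison here is kept
simple on purpose.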
[Comment from SourceForge]
Date: 2008-06-02 09:22
Sender: lehmialk
Logged In: YES
user_id=2069622
Originator: YES
File Added: pdfbox_rotation_patch_3.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429