Re: [iText-questions] PdfContentStreamProcessor not handling TJ operator correctly (maybe)

Kevin Day Mon, 10 Sep 2012 08:51:06 -0700

Michael-

The API was designed to support advanced extraction and analysis.  The first
cut focused on simple text extraction because that required the least
amount of work and could be used to prove out the general concepts involved
in the parser.  The architecture was very intentionally designed to allow
exposing additional parameters - in user space - as the need arises.

If there is part of the graphics state that you or anyone else finds they
need, just ask and it'll get added quickly. What I will not do is just
expose the graphics state object and require API users to take care of the
coordinate transformations, etc... If someone wants access to the graphics
state directly like that, they should sub-class PdfContentStreamProcessor.
Note that I really don't recommend that - it is much better for the iText
community at large if we continue to flesh out TextRenderInfo.

Your question about text rise is a perfect example of where TextRenderInfo
needs to be expanded - and why it wasn't just blindly added to the
TextRenderInfo API at the outset. I think that a brief discussion of how it
should be incorporated is called for. For example, should we increase the
baseline, ascentline and descentline by the rise? If we do that, it makes
it easier for simple rendering, but then we lose the fact that there *was* a
rise explicitly specified. Alternatively, we could just expose the amount
of the rise - but then this puts a much larger burden on the users of
TextRenderInfo. My inclination is to include the rise in the baseline,
ascentline and descentline, then expose getRise() so that someone who needs
the actual value can get to it (along with clear javadocs indicating that
the rise has already been added to all three lines). What do you think?
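
To make the trade-off concrete, here is a minimal sketch of the "include the
rise in the lines, and also expose getRise()" option. It is not iText code:
Pt, Seg and RiseAwareTextInfo are hypothetical stand-ins, and it assumes the
rise has already been converted to user-space units and that the text matrix
has no skew, so the rise acts perpendicular to the baseline.

```java
// Hypothetical sketch: these classes are illustrations, not iText API.
final class Pt {
    final double x, y;
    Pt(double x, double y) { this.x = x; this.y = y; }
}

final class Seg {
    final Pt start, end;
    Seg(Pt start, Pt end) { this.start = start; this.end = end; }

    /** Returns a copy of this segment moved by 'distance' along its left-hand normal. */
    Seg offsetAlongNormal(double distance) {
        double dx = end.x - start.x;
        double dy = end.y - start.y;
        double len = Math.hypot(dx, dy);
        // Degenerate (zero-length) segment: assume unrotated text, so "up" is +y.
        double nx = (len == 0) ? 0 : -dy / len;
        double ny = (len == 0) ? 1 : dx / len;
        return new Seg(
            new Pt(start.x + nx * distance, start.y + ny * distance),
            new Pt(end.x + nx * distance, end.y + ny * distance));
    }
}

final class RiseAwareTextInfo {
    // The three lines are stored WITHOUT the rise applied.
    private final Seg baseline, ascentLine, descentLine;
    // Assumption: rise is already expressed in user-space units.
    private final double rise;

    RiseAwareTextInfo(Seg baseline, Seg ascentLine, Seg descentLine, double rise) {
        this.baseline = baseline;
        this.ascentLine = ascentLine;
        this.descentLine = descentLine;
        this.rise = rise;
    }

    /** Baseline with the rise already added. */
    Seg getBaseline() { return baseline.offsetAlongNormal(rise); }

    /** Ascent line with the rise already added. */
    Seg getAscentLine() { return ascentLine.offsetAlongNormal(rise); }

    /** Descent line with the rise already added. */
    Seg getDescentLine() { return descentLine.offsetAlongNormal(rise); }

    /** The explicit rise; note it is already included in the three lines above. */
    double getRise() { return rise; }
}
```

With this shape, a simple renderer only ever reads the three lines, while a
caller that cares about superscripts or subscripts can check getRise() != 0
and, if needed, subtract the rise again to recover the unshifted lines.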

String width can be obtained from the baseline - in all of my work, I've
found that having the baseline vector is the important thing (as opposed to
the width of the string) - I'm certainly open to exposing getStringWidth()
if there is a valid use-case for it - but using a scalar instead of a vector
can cause huge problems in rendering (for example, if the text is rotated),
so I'm having a hard time thinking of a use case...
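
As a small illustration of the point about scalars and vectors, the sketch
below derives a width from the two endpoints of a baseline. BaselineMath and
its methods are hypothetical helpers, not iText API, and the coordinates are
assumed to be in user space.

```java
// Hypothetical helper, not part of iText: plain geometry on baseline endpoints.
final class BaselineMath {
    private BaselineMath() {}

    /** Width of the text run: the length of the baseline from start to end. */
    static double width(double x0, double y0, double x1, double y1) {
        return Math.hypot(x1 - x0, y1 - y0);
    }

    /** Direction of the baseline in degrees, counter-clockwise from the +x axis. */
    static double angleDegrees(double x0, double y0, double x1, double y1) {
        return Math.toDegrees(Math.atan2(y1 - y0, x1 - x0));
    }
}
```

A baseline from (0, 0) to (5, 0) and one from (0, 0) to (3, 4) both give a
width of 5, but only the endpoints say where the second run actually ends.
That is the information a scalar getStringWidth() would drop for rotated text.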

I think that preparing a white paper or something along those lines to
describe the parser would be a good idea - as with most projects, it is
difficult to find time to create such things, but I will add it to my list.

--
View this message in context:
http://itext-general.2136553.n4.nabble.com/PdfContentStreamProcessor-not-handling-TJ-operator-correctly-maybe-tp4656117p4656244.html
Sent from the iText - General mailing list archive at Nabble.com.

