[iText-questions] [SPAM] RE: PdfContentStreamProcessor not handling TJ operator correctly (maybe)

mkl Mon, 10 Sep 2012 02:39:05 -0700

Kevin,

Kevin Day wrote
> The reason we don't expose the graphics state is because that would break
> abstraction between the logical levels of the API.  The goal of the text
> render listener was to provide a neutral mechanism for describing a text
> draw operation.  If the information being provided isn't sufficient, then
> that's something that absolutely needs to be addressed.  Exposing the full
> graphics state seems like overkill in this regard, though (and it would
> encourage breaking the encapsulation).

API layer separation is important, I agree. First, though, the aim of the
API itself has to be made clear.

Initially the aim had been to enable iText to extract complete text blocks
such as whole lines or even paragraphs. Thus, details like character
spacing and word spacing inside a text segment did not seem that important
(well, for exotic cases they actually are important, but I doubt anyone
seriously developed a RenderListener for those cases).

Meanwhile, though, the API seems to be used more and more in other contexts,
too, like exact location-based extraction (cf. the "marked for redaction"
case) or even exact rendering (cf. the OP's case), both of which require the
exact locations of individual characters in the text segments.

Therefore, first of all you should decide whether the rendering API is
intended for such use cases or not, and enhance it only if it is.

If such exact location use cases are to be supported, there should perhaps
also be a public scaled getStringWidth method allowing RenderListeners to
get the length of parts of the text fragments. Or else a method which
returns a list of the individual characters of the text segment and their
respective widths, or even a list of TextRenderInfo instances, one for each
individual character of the segment.
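
A rough sketch of what such a per-character width calculation might look
like. This is standalone Java, not iText code: the class and method names
are made up for illustration, and the glyph widths are passed in as an array
instead of being read from a font. It uses the glyph advance formula from the
PDF specification, (w0 * Tfs + Tc + Tw) * Th, where Tw applies only to the
single-byte space character, and it ignores TJ position adjustments.

```java
public final class GlyphAdvanceSketch {

    /**
     * Hypothetical helper: horizontal advance of each character in
     * unscaled text space units.
     *
     * @param text            the characters of the text segment
     * @param widths          glyph widths in 1/1000 text space units,
     *                        one entry per character (assumed given)
     * @param fontSize        Tfs
     * @param charSpacing     Tc
     * @param wordSpacing     Tw, applied to the space character only
     * @param horizontalScale Th as a factor (1.0 means 100%)
     */
    public static double[] advances(String text, int[] widths, double fontSize,
                                    double charSpacing, double wordSpacing,
                                    double horizontalScale) {
        if (text.length() != widths.length) {
            throw new IllegalArgumentException("one width per character expected");
        }
        double[] result = new double[text.length()];
        for (int i = 0; i < text.length(); i++) {
            double w0 = widths[i] / 1000.0;
            // Simplification: treats ' ' as the single-byte code 32.
            double tw = text.charAt(i) == ' ' ? wordSpacing : 0.0;
            result[i] = (w0 * fontSize + charSpacing + tw) * horizontalScale;
        }
        return result;
    }

    /** Start offset of each character relative to the segment start. */
    public static double[] startOffsets(double[] advances) {
        double[] offsets = new double[advances.length];
        double x = 0.0;
        for (int i = 0; i < advances.length; i++) {
            offsets[i] = x;
            x += advances[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        double[] adv = advances("a b", new int[] {500, 250, 500},
                                12.0, 1.0, 2.0, 1.0);
        double[] off = startOffsets(adv);
        for (int i = 0; i < adv.length; i++) {
            System.out.println(i + ": offset=" + off[i] + " advance=" + adv[i]);
        }
    }
}
```

The offsets are still in text space; a listener would have to map them
through the text matrix and the CTM to get user space positions.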

Additionally, there seems to be a need for some general documentation of the
API, as users seem not to be aware of the subtleties of PDF text segments.
Perhaps a white paper like Bruno's on the iText signature API, though not
necessarily quite as extensive.

Kevin Day wrote
> I'm not sure that leading and rise is going to be super helpful - let me
> know how you plan to use it. Would have expected that getBaseline(),
> getAscentLine() and getDescentLine() would be what you need.

While leading is factored into the coordinates in TextMoveNextLine, rise
does not seem to be respected at all (at least I couldn't find it quickly).
Is that intentional? In that case a getter for a scaled rise (maybe actually
a Vector) would be important.
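
A scaled rise vector could, for instance, be computed along the following
lines. This is a minimal standalone Java sketch, not iText code: the class
and method names are hypothetical, and matrices are plain six-element arrays
[a b c d e f] in the PDF convention. It assumes the rise is the offset
(0, Trise) in text space and maps it through the linear part of the text
matrix times the CTM, dropping the translation because it is a direction
and not a point.

```java
public final class RiseVectorSketch {

    /** Multiplies two PDF matrices [a b c d e f], returning m1 x m2. */
    public static double[] multiply(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /**
     * Hypothetical helper: the text rise as a vector in device/user space.
     *
     * @param rise       Trise in unscaled text space units
     * @param textMatrix Tm as [a b c d e f]
     * @param ctm        current transformation matrix as [a b c d e f]
     */
    public static double[] riseVector(double rise, double[] textMatrix, double[] ctm) {
        double[] m = multiply(textMatrix, ctm);
        // (0, rise) as a vector: x' = c * rise, y' = d * rise (no translation).
        return new double[] { m[2] * rise, m[3] * rise };
    }

    public static void main(String[] args) {
        double[] tm = {2, 0, 0, 2, 100, 200};   // text scaled by 2, moved
        double[] ctm = {0, 1, -1, 0, 0, 0};     // page rotated by 90 degrees
        double[] v = riseVector(5.0, tm, ctm);
        System.out.println("rise vector: (" + v[0] + ", " + v[1] + ")");
    }
}
```

Returning a vector instead of a scalar keeps the direction of the rise
available for rotated text, where "up" on the baseline is no longer the y
axis of the page.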

Kevin Day wrote
> Superscript and subscript - that's going to be quite tricky.

If rise is used in a document for superscript and subscript, exposing it
might make recognizing that really easy. If a document uses it for other
stuff, interpreting rise like that may be misleading. Oh well...

Regards, Michael
