Kevin,

Kevin Day wrote
> The reason we don't expose the graphics state is because that would break
> abstraction between the logical levels of the API. The goal of the text
> render listener was to provide a neutral mechanism for describing a text
> draw operation. If the information being provided isn't sufficient, then
> that's something that absolutely needs to be addressed. Exposing the full
> graphics state seems like overkill in this regard, though (and it would
> encourage breaking the encapsulation).

API layer separation is important, I agree. First, though, the aim of the API itself has to be made clear.

Initially the aim had been to enable iText to extract complete text blocks such as whole lines or even paragraphs. Thus, details like character spacing and word spacing inside a text segment did not seem that important (well, for exotic cases they actually are important, but I doubt anyone seriously developed a RenderListener for those cases).

Meanwhile, though, the API seems to be used more and more in other contexts, too, like exact location-based extraction (cf. the "marked for redaction" case) or even exact rendering (cf. the OP's case), which require exact locations of the individual characters in the text segments.

Therefore, first of all you should decide whether the rendering API is intended for such use cases or not, and enhance it only if it is.

If such exact-location use cases are to be supported, there should perhaps also be a public scaled getStringWidth method allowing RenderListeners to get the length of parts of the text fragments. Or else a method which returns a list of the individual characters of the text segment and their respective widths, or even a list of TextRenderInfo instances, one for each individual character of the segment.

Additionally, there seems to be a need for some general documentation of the API, as users seem not to be aware of the subtleties of PDF text segments. Perhaps a white paper like Bruno's one on the iText signature API, not necessarily quite as extensive, though.

Kevin Day wrote
> I'm not sure that leading and rise is going to be super helpful - let me
> know how you plan to use it. Would have expected that getBaseline(),
> getAscentLine() and getDescentLine() would be what you need.

While leading is calculated into the coordinates in TextMoveNextLine, rise does not seem to be respected at all (at least I couldn't find it quickly). Is that by intent?
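
To make concrete what per-character information a RenderListener would need, here is a minimal sketch of how the PDF text state parameters (font size, character spacing, word spacing, horizontal scaling, rise) determine each glyph's origin and advance in text space. It is not iText code: the class GlyphPositions, the holder Glyph and the method layout are hypothetical names made up for illustration, the glyph widths are assumed to be already known in 1/1000 units, TJ kerning adjustments are left out, and a space character is simply treated as the single-byte code 32 that word spacing applies to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of iText.
public class GlyphPositions {

    /** Origin and horizontal advance of one glyph in text space (hypothetical holder). */
    public static final class Glyph {
        public final char c;
        public final double x;       // horizontal offset from the start of the segment
        public final double y;       // vertical offset from the baseline (the text rise)
        public final double advance; // horizontal displacement to the next glyph

        Glyph(char c, double x, double y, double advance) {
            this.c = c;
            this.x = x;
            this.y = y;
            this.advance = advance;
        }
    }

    /**
     * Lays out one text segment in text space.
     *
     * @param text              decoded characters of the segment
     * @param widths1000        glyph widths in 1/1000 units, one per character (assumed known)
     * @param fontSize          Tfs
     * @param charSpacing       Tc
     * @param wordSpacing       Tw, added for space characters only
     * @param horizontalScaling Th as a fraction (1.0 means 100%)
     * @param rise              Trise
     */
    public static List<Glyph> layout(String text, double[] widths1000, double fontSize,
                                     double charSpacing, double wordSpacing,
                                     double horizontalScaling, double rise) {
        List<Glyph> glyphs = new ArrayList<>();
        double x = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Horizontal displacement without TJ adjustment: (w0 * Tfs + Tc + Tw) * Th
            double tx = (widths1000[i] / 1000.0 * fontSize
                    + charSpacing
                    + (c == ' ' ? wordSpacing : 0.0)) * horizontalScaling;
            // Rise shifts the glyph origin vertically; it does not change the advance.
            glyphs.add(new Glyph(c, x, rise, tx));
            x += tx;
        }
        return glyphs;
    }

    public static void main(String[] args) {
        // Made-up widths, purely for demonstration.
        for (Glyph g : layout("a b", new double[] {500, 250, 500}, 10, 1, 2, 1.0, 3)) {
            System.out.printf("'%c' x=%.2f y=%.2f advance=%.2f%n", g.c, g.x, g.y, g.advance);
        }
    }
}
```

The resulting positions are still in text space; a listener would have to map them through the text matrix and the CTM to get page coordinates, which is the part that is hard to do from outside without access to the graphics state.
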
If rise is ignored by intent, a getter for a scaled rise (maybe actually a Vector) would be important.

Kevin Day wrote
> Superscript and subscript - that's going to be quite tricky.

If rise is used in a document for superscript and subscript, exposing it might make recognizing that really easy. If a document uses it for other stuff, interpreting rise like that may be misleading. Oh well...

Regards,
Michael

--
View this message in context: http://itext-general.2136553.n4.nabble.com/PdfContentStreamProcessor-not-handling-TJ-operator-correctly-maybe-tp4656117p4656239.html
Sent from the iText - General mailing list archive at Nabble.com.
