Michael- The API was designed to support advanced extraction and analysis. The first cut focused on simple text extraction because that required the least amount of work and could be used to prove out the general concepts involved in the parser. The architecture was very intentionally designed to allow exposing additional parameters - in user space - as the need arises.
If there is part of the graphics state that you or anyone else finds they need, just ask and it'll get added quickly. What I will not do is just expose the graphics state object and require API users to take care of the coordinate transformations, etc. If someone wants access to the graphics state directly like that, they should subclass PdfContentStreamProcessor. Note that I really don't recommend that - it is much better for the iText community at large if we continue to flesh out TextRenderInfo.
Your question about text rise is a perfect example of where TextRenderInfo needs to be expanded - and why it wasn't just blindly added to the TextRenderInfo API at the outset. I think that a brief discussion of how it should be incorporated is called for. For example, should we shift the baseline, ascentline and descentline by the rise? If we do that, it makes simple rendering easier, but then we lose the fact that there *was* a rise explicitly specified. Alternatively, we could just expose the amount of the rise - but then this puts a much larger burden on the users of TextRenderInfo. My inclination is to include the rise in the baseline, ascentline and descentline, then expose getRise() so that someone who needs the actual value can get to it (along with clear Javadoc indicating that the rise has already been added to those three lines). What do you think?
String width can be obtained from the baseline - in all of my work, I've found that having the baseline vector is the important thing (as opposed to the width of the string). I'm certainly open to exposing getStringWidth() if there is a valid use case for it, but using a scalar instead of a vector can cause huge problems in rendering (for example, if the text is rotated), so I'm having a hard time thinking of a use case.
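To make the rise proposal and the baseline-versus-width point concrete, here is a minimal sketch in plain Java. None of it is iText API: RiseSketch, Vec and RenderInfoSketch are hypothetical names made up for illustration. It also assumes the rise is already expressed in user space units and that the text is only rotated and uniformly scaled, so the rise acts perpendicular to the baseline.

```java
// Hypothetical sketch, NOT iText API: the rise is folded into the baseline,
// while the raw rise value stays available through getRise().
public class RiseSketch {

    /** Minimal 2D point/vector in user space. */
    static final class Vec {
        final double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
        Vec plus(Vec o) { return new Vec(x + o.x, y + o.y); }
        Vec minus(Vec o) { return new Vec(x - o.x, y - o.y); }
        Vec scale(double s) { return new Vec(x * s, y * s); }
        double length() { return Math.hypot(x, y); }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    /** Stand-in for what a render-info object could expose (assumption). */
    static final class RenderInfoSketch {
        private final Vec start; // baseline start, rise already applied
        private final Vec end;   // baseline end, rise already applied
        private final double rise;

        RenderInfoSketch(Vec unrisenStart, Vec unrisenEnd, double rise) {
            Vec dir = unrisenEnd.minus(unrisenStart);
            double len = dir.length();
            // Unit normal: baseline direction rotated 90 degrees counter-clockwise.
            // Falls back to "straight up" for a zero-length baseline.
            Vec normal = (len == 0) ? new Vec(0, 1) : new Vec(-dir.y / len, dir.x / len);
            Vec offset = normal.scale(rise);
            this.start = unrisenStart.plus(offset);
            this.end = unrisenEnd.plus(offset);
            this.rise = rise;
        }

        Vec getBaselineStart() { return start; }
        Vec getBaselineEnd() { return end; }

        /** Raw rise; note it is already included in the baseline endpoints. */
        double getRise() { return rise; }

        /** Scalar width derived from the baseline vector; direction is lost. */
        double getStringWidth() { return end.minus(start).length(); }
    }

    public static void main(String[] args) {
        // Horizontal text and text rotated by 90 degrees, same length and rise.
        RenderInfoSketch horizontal =
                new RenderInfoSketch(new Vec(0, 0), new Vec(10, 0), 2);
        RenderInfoSketch rotated =
                new RenderInfoSketch(new Vec(0, 0), new Vec(0, 10), 2);

        // The widths are equal, but the baselines point in different directions.
        System.out.println("horizontal: " + horizontal.getBaselineStart()
                + " -> " + horizontal.getBaselineEnd()
                + ", width " + horizontal.getStringWidth());
        System.out.println("rotated:    " + rotated.getBaselineStart()
                + " -> " + rotated.getBaselineEnd()
                + ", width " + rotated.getStringWidth());
    }
}
```

Both strings report the same scalar width, yet only the baseline endpoints tell a renderer where the text actually runs, which is the reason for preferring the vector. A caller that needs the original, unshifted position can still recover it from getRise().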
I think that preparing a white paper or something along those lines to describe the parser would be a good idea - as with most projects, it is difficult to find time to create such things, but I will add it to my list.
