> I will be thrilled to have the tests - a week or two delay is well worth it
> to have test coverage.
>
> As a starting point, there is a unit test for TextRenderInfo that includes
> code for creating a PDF. The LocationTextExtractionStrategyTest has quite
> a few different scenarios involving rotating text, etc...
>
> My biggest hangup here is that I just have no idea what the various
> use-cases are that involve this functionality, so I could create tests, but
> I'd have no idea if they were doing what you wanted or not.
>
> I suppose one simple test would be to check that the baseline now doesn't
> include the extra character space at the end - but what's the best way to
> test this? I suppose we could set a really large character space value,
> render a word, then compute the length of the baseline. But am I comparing
> it against just a constant (i.e. 12.724)?? How do I even know what that
> distance *should* be? I'm sure there is a good answer (and when you tell
> me, I'll feel a bit foolish)!
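One way to avoid a hard-coded constant would be to compute the expected length from the text-state parameters, using the glyph advance rule of the PDF specification (ISO 32000-1, section 9.4.4): tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th. The sketch below is only an illustration under stated assumptions: a single Tj without TJ kerning adjustments, no extra scaling from the text matrix or the CTM (so text-space and user-space lengths agree), and glyph widths supplied by the caller, for example from the font's /Widths array. The class and method names are hypothetical, not part of iText. The result could then be compared, within a small tolerance, to the length of the baseline reported by TextRenderInfo.getBaseline().

```java
/**
 * Hypothetical helper (not part of iText): expected advance of a run of glyphs
 * shown by one Tj operator, following the PDF glyph-advance rule
 *   tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th
 * Assumes no TJ adjustments (Tj term = 0) and no scaling from Tm or the CTM.
 */
final class ExpectedBaseline {

    private ExpectedBaseline() {}

    /**
     * Total horizontal advance, including the character spacing after the last glyph.
     *
     * @param widths    glyph widths in glyph-space units (1/1000 of text space)
     * @param isSpace   true where the character code is the single-byte code 32,
     *                  the only glyphs that receive word spacing (Tw)
     * @param fontSize  Tfs
     * @param charSpace Tc
     * @param wordSpace Tw
     * @param hScale    Th, i.e. the Tz value divided by 100
     */
    static double totalAdvance(double[] widths, boolean[] isSpace, double fontSize,
                               double charSpace, double wordSpace, double hScale) {
        double tx = 0.0;
        for (int i = 0; i < widths.length; i++) {
            double advance = widths[i] / 1000.0 * fontSize + charSpace;
            if (isSpace[i]) {
                advance += wordSpace;
            }
            tx += advance * hScale;
        }
        return tx;
    }

    /** Expected baseline length: the total advance minus the trailing character spacing. */
    static double baselineLength(double[] widths, boolean[] isSpace, double fontSize,
                                 double charSpace, double wordSpace, double hScale) {
        if (widths.length == 0) {
            return 0.0;
        }
        return totalAdvance(widths, isSpace, fontSize, charSpace, wordSpace, hScale)
                - charSpace * hScale;
    }
}
```

For example, "Hello" in Helvetica (AFM widths H=722, e=556, l=222, o=556) at 12 pt with Tc = 10 gives a total advance of 27.336 + 5 * 10 = 77.336, so a baseline without the trailing character spacing should measure 67.336. A large Tc makes the difference between the two values hard to miss.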
Very sorry for this delay; I have also been quite busy these days.

I have tested my sample files by simply extracting all the characters with their coordinates and checking whether they match the character positions in the image of the PDF file itself. For this I used our own tool, which displays the image of the PDF file with the generated structure (lines, words and characters) drawn on top of it. As I mentioned, we work only with scientific articles, which rarely contain rotated text, etc.

So one possibility would be to test the returned values against coordinates captured from the PDF images. I think I could use our tool to capture those values, although the tool itself would need a small improvement. Of course, we would have to use a small tolerance value in the comparison.

I have also been thinking of an alternative. In my test cases I observed, for example, a huge overlap between neighbouring characters, or no space between different words (this happened when character spacing was used to produce the spaces). So maybe we could generate small PDFs containing exactly the same sentence in the same position, but produced differently (e.g. large character spacing used as the space between words, spaces written directly by the Tj operator, the text matrix set after writing each character, etc.), and then check whether the characters in the same word are close together but do not overlap much, whether neighbouring words are separated by a gap, etc. And if our files look the same, we could also check that we get similar coordinates in every case. Here, too, we would have to use small tolerance values.

In both cases, the bugs that occurred initially should be detected, assuming that we use a few very different ways of rendering text in the sample PDFs.

Best regards,
Dominika
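A minimal sketch of the first approach, assuming the reference positions have already been captured from the page images. CharPos and ReferenceComparison are hypothetical names, not iText classes. In a real test the extracted side would be collected by a RenderListener registered with iText's PdfReaderContentParser, for example from the baseline start points of TextRenderInfo.getCharacterRenderInfos(); that wiring is left out here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: one character and the reference point of its box. */
final class CharPos {
    final String text;
    final double x;
    final double y;

    CharPos(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ReferenceComparison {

    private ReferenceComparison() {}

    /**
     * Compares extracted characters with reference positions captured from the
     * rendered page image, pairwise in reading order. Returns one message per
     * mismatch; an empty list means every position agrees within tol.
     */
    static List<String> mismatches(List<CharPos> extracted, List<CharPos> reference, double tol) {
        List<String> problems = new ArrayList<>();
        if (extracted.size() != reference.size()) {
            problems.add("count: extracted " + extracted.size() + ", reference " + reference.size());
            return problems;
        }
        for (int i = 0; i < extracted.size(); i++) {
            CharPos e = extracted.get(i);
            CharPos r = reference.get(i);
            if (!e.text.equals(r.text)) {
                problems.add(i + ": text '" + e.text + "' != '" + r.text + "'");
            } else if (Math.abs(e.x - r.x) > tol || Math.abs(e.y - r.y) > tol) {
                problems.add(i + ": '" + e.text + "' at (" + e.x + ", " + e.y
                        + "), expected (" + r.x + ", " + r.y + ")");
            }
        }
        return problems;
    }
}
```

Characters are paired in reading order, so a missing or extra character is reported once as a count mismatch instead of as a cascade of position errors.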
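The sample PDFs for the second approach could differ only in how the same sentence is written into the page content stream. The sketch below (hypothetical names) builds three such variants of "Hello world": spaces inside a single Tj, the word gap produced by character spacing (Tc) applied to the last letter of the first word, and a text matrix (Tm) set for every character. It assumes a font resource named /F1 that is Helvetica at 12 pt, so positions can be computed from the standard Helvetica widths; wrapping the streams into actual PDF files is left out.

```java
import java.util.Locale;

/** Hypothetical generator of content streams that should all render identically. */
final class TextVariants {
    static final double FONT_SIZE = 12;
    static final double X0 = 100;
    static final double Y0 = 700;
    static final String WORD1 = "Hello";
    static final String WORD2 = "world";

    private TextVariants() {}

    // Helvetica AFM widths in glyph-space units for the characters used here.
    static double width(char c) {
        switch (c) {
            case 'H': case 'w': return 722;
            case 'e': case 'o': case 'd': return 556;
            case 'l': return 222;
            case 'r': return 333;
            case ' ': return 278;
            default: throw new IllegalArgumentException("no width for '" + c + "'");
        }
    }

    private static String num(double v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }

    private static String begin() {
        return "BT /F1 " + num(FONT_SIZE) + " Tf " + num(X0) + " " + num(Y0) + " Td ";
    }

    /** Variant 1: the space is an ordinary glyph inside one Tj. */
    static String spacesInTj() {
        return begin() + "(" + WORD1 + " " + WORD2 + ") Tj ET";
    }

    /** Variant 2: no space glyph; Tc on the last letter of WORD1 creates the gap. */
    static String charSpacingAsWordGap() {
        String head = WORD1.substring(0, WORD1.length() - 1);
        char last = WORD1.charAt(WORD1.length() - 1);
        double gap = width(' ') / 1000.0 * FONT_SIZE; // same advance as a space glyph
        return begin() + "(" + head + ") Tj " + num(gap) + " Tc (" + last + ") Tj 0 Tc ("
                + WORD2 + ") Tj ET";
    }

    /** Variant 3: an absolute text matrix before every non-space character. */
    static String matrixPerCharacter() {
        StringBuilder sb = new StringBuilder("BT /F1 " + num(FONT_SIZE) + " Tf ");
        double x = X0;
        for (char c : (WORD1 + " " + WORD2).toCharArray()) {
            if (c != ' ') {
                sb.append("1 0 0 1 ").append(num(x)).append(' ').append(num(Y0))
                  .append(" Tm (").append(c).append(") Tj ");
            }
            x += width(c) / 1000.0 * FONT_SIZE; // advance with Tc = Tw = 0 and Th = 100 %
        }
        return sb.append("ET").toString();
    }
}
```

If extraction is correct, all three variants should yield characters at the same positions, e.g. "w" starting at x = 130.672 (100 + 27.336 for "Hello" + 3.336 for the space).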
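The geometric checks could be expressed as pairwise rules over the glyphs of one line in reading order. This is a sketch, not an existing iText API: GlyphSpan and LayoutChecks are hypothetical, the word of each glyph is taken from the known test sentence, and the thresholds (maximum gap between letters, allowed overlap as a fraction of the narrower glyph, minimum gap between words) are assumptions that would need tuning on the sample files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: a glyph's extent along the baseline and the index of its word. */
final class GlyphSpan {
    final char ch;
    final int word;
    final double left;
    final double right;

    GlyphSpan(char ch, int word, double left, double right) {
        this.ch = ch;
        this.word = word;
        this.left = left;
        this.right = right;
    }

    double width() {
        return right - left;
    }
}

final class LayoutChecks {

    private LayoutChecks() {}

    /**
     * Checks consecutive non-space glyphs of one text line, in reading order.
     * Same word: the gap must not exceed maxLetterGap and any overlap must not
     * exceed maxOverlapFraction of the narrower glyph. Different words: the gap
     * must be at least minWordGap. Returns one message per violation.
     */
    static List<String> violations(List<GlyphSpan> glyphs, double maxLetterGap,
                                   double maxOverlapFraction, double minWordGap) {
        List<String> problems = new ArrayList<>();
        for (int i = 1; i < glyphs.size(); i++) {
            GlyphSpan a = glyphs.get(i - 1);
            GlyphSpan b = glyphs.get(i);
            double gap = b.left - a.right; // negative means the glyphs overlap
            if (a.word == b.word) {
                double maxOverlap = maxOverlapFraction * Math.min(a.width(), b.width());
                if (gap > maxLetterGap) {
                    problems.add("'" + a.ch + b.ch + "': letters " + gap + " apart");
                } else if (-gap > maxOverlap) {
                    problems.add("'" + a.ch + b.ch + "': overlap " + (-gap));
                }
            } else if (gap < minWordGap) {
                problems.add("'" + a.ch + "|" + b.ch + "': word gap only " + gap);
            }
        }
        return problems;
    }
}
```

Both failures described above map onto a rule: words that touch (no space between them) violate minWordGap, and heavily overlapping neighbouring letters violate maxOverlapFraction.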
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
