Hi Christian,

Christian Eric Paran wrote
> My Problem here is that I do not know what kind of Strategy it needs. I
> can not find a specific sample that would help me understand how it works.
> What are other class and methods that could help me Search the PDF and
> Extract a Paragraph/Sentences?
> Can you give some examples(Maybe Links) of how you did it?

The content of a PDF (generally) does not contain information on which part of it forms a paragraph or a sentence. All it contains are positioned letter groups, each with a font and an affine transformation which positions, rotates, skews, and stretches those letters. The foremost task of a text extraction strategy (or more generally a RenderListener) is to somehow make sense of these letter groups and return some text or other information.

iText includes some such strategies:

* SimpleTextExtractionStrategy: A very simple strategy which assumes the letter groups are already in the correct order in the PDF and can thus simply be concatenated in the order they are received; only some spaces or line feeds are added. Font information and affine deformation are ignored.

* LocationTextExtractionStrategy: A slightly more complex strategy which collects the letter groups and eventually combines them by their starting coordinate when asked for the text. Font information and affine deformation are ignored.

* FilteredTextRenderListener: This is merely a wrapper for some other strategy. It lets you filter the incoming letter groups before they are forwarded to the wrapped listener, e.g. to restrict extraction to a given region on the page.

As you see, there is no strategy in the iText base distribution that does more with respect to text analysis.

Depending on your requirements, you can do this analysis on the string returned by the text extraction, e.g. by splitting at periods '.' or at line feeds '\n'.

If you need more, copy the LocationTextExtractionStrategy as a start (it already collects text chunks and does the chunk analysis in the end) and expand the analysis to return the information you need.

Regards,
Michael

PS: You can find some examples at http://itextpdf.com/book/chapter.php?id=15 (look for Extract* in the column titled "Examples").
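
To illustrate the simplest option mentioned above, analysing the string returned by the text extraction, here is a minimal sketch in plain Java. It deliberately does not call iText at all: the class ExtractedTextSplitter and its methods are hypothetical helpers, and the input is assumed to be whatever string your chosen strategy returned for a page. The sentence split is naive (it will break at abbreviations like "e.g."), so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of iText: post-processes the plain string
// that a text extraction strategy returned for a page.
final class ExtractedTextSplitter {

    // Splits at line feeds and drops empty lines.
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Naive sentence split: line feeds are treated as spaces, and a sentence
    // is assumed to end at '.', '!' or '?' followed by whitespace.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        String joined = text.replace('\n', ' ');
        for (String sentence : joined.split("(?<=[.!?])\\s+")) {
            String trimmed = sentence.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Returns the sentences that contain the search term, ignoring case.
    static List<String> sentencesContaining(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String sentence : sentences(text)) {
            if (sentence.toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the text extraction.
        String extracted = "First line of text. It goes\non here. Another one.\n";
        for (String hit : sentencesContaining(extracted, "goes")) {
            System.out.println(hit);
        }
    }
}
```

Whether line feeds should count as spaces or as paragraph boundaries depends on how the strategy you use inserts them, so you may have to adjust that part for your documents.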
