On 20/07/2011 20:58, Dániel Kékesi wrote:
Dear All,
I am using iTextSharp in my application and found its text extraction
capabilities excellent. I am facing a problem though. I use the
PdfTextExtractor.GetTextFromPage method but it returns text pieces
that are far apart separated by a single space. Take the following
example (as displayed in Acrobat):
User name: abcdef                    Password: Cool1234
In the PDF there are no spaces between "abcdef" and "Password".
No, there aren't. But there aren't any tabs either.
Both PDF strings are added at (about) the same Y coordinate,
but with X coordinates that put them far apart.
If I extract the above text using PdfTextExtractor.GetTextFromPage
I'll get the following result:
User name: abcdef Password: Cool1234
That's correct.
So the distance between the two words was cut down to a single space.
What I need to achieve is that the words that are not separated by a
space but a larger distance would be separated by a TAB in the
resultant text.
That's not trivial. You'd need to examine the X coordinates of the text
chunks on each line and measure the gaps between them.
I am guessing that I should abandon PdfTextExtractor.GetTextFromPage
and use the LocationTextExtractionStrategy class combined with
TextRenderInfo
Yes, TextRenderInfo will give you the info about the coordinates, but
you'll have to do plenty of programming.
Either you'll have to do that programming yourself, or you'll have to
hire somebody to do it for you.
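
To give an idea of the kind of programming involved, here is a minimal
sketch of the gap test only. It does not use iText at all: the Chunk
class, the join method and the threshold parameters are hypothetical
names made up for this illustration. It assumes you have already
collected, for each piece of text, its start X, end X and baseline Y
(for instance from the TextRenderInfo objects passed to your own render
listener), and that the chunks arrive in reading order.

```java
import java.util.List;

final class TabJoiner {

    /** Hypothetical holder for one extracted piece of text and its position. */
    static final class Chunk {
        final String text;
        final float startX; // X where the chunk begins
        final float endX;   // X where the chunk ends
        final float y;      // baseline Y of the chunk

        Chunk(String text, float startX, float endX, float y) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.y = y;
        }
    }

    /**
     * Joins chunks that are assumed to be in reading order.
     * A horizontal gap wider than tabFactor * spaceWidth becomes a TAB,
     * a gap wider than half a space becomes a single space, and a change
     * in Y larger than lineTolerance starts a new line.
     */
    static String join(List<Chunk> chunks, float spaceWidth,
                       float tabFactor, float lineTolerance) {
        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null) {
                if (Math.abs(c.y - prev.y) > lineTolerance) {
                    out.append('\n');              // different baseline: new line
                } else {
                    float gap = c.startX - prev.endX;
                    if (gap > tabFactor * spaceWidth) {
                        out.append('\t');          // wide gap: treat as a column break
                    } else if (gap > spaceWidth / 2f) {
                        out.append(' ');           // ordinary word gap
                    }
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }
}
```

With a space width of 4 units and a tab factor of 3, any gap wider than
12 units is emitted as a TAB. Those numbers are arbitrary; the right
threshold depends on the fonts and layout of your documents and needs
tuning.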