Hi,

Thanks for the quick response.

> On 20/07/2011 20:58, Dániel Kékesi wrote:
>> Dear All,
>>
>> I am using iTextSharp in my application and found its text extraction
>> capabilities excellent. I am facing a problem though. I use the
>> PdfTextExtractor.GetTextFromPage method but it returns text pieces that are
>> far apart separated by a single space. Take the following example (as
>> displayed in Acrobat):
>>
>> User name: abcdef             Password: Cool1234
>>
>> In the PDF there are no spaces between "abcdef" and "Password".
>
> No, there aren't. But there aren't any tabs either.
> Both PDF strings are added at coordinates with (about) the same X coordinate,
> but with a Y coordinate that puts them far apart.

I am aware that there are no spaces, and the text extraction method knows
this too. It simply inserts a single space there for lack of a better option.

>> If I extract the above text using PdfTextExtractor.GetTextFromPage I'll get
>> the following result:
>>
>> User name: abcdef Password: Cool1234
>
> That's correct.
>
>> So the distance between the two words was cut down to a single space. What
>> I need to achieve is that the words that are not separated by a space but a
>> larger distance would be separated by a TAB in the resultant text.
>
What I was looking for is a property that tells the text extraction mechanism
which character(s) to use for separating distant text pieces instead of just
putting a space there. That way I could introduce TABs, multiple spaces or even
XML tags into the string. Is there such a property or setting? If not, do you
plan to add one? :)
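
To make the request concrete, here is a rough sketch of the behaviour I have in
mind. It is plain Java and deliberately uses no iText or iTextSharp classes: the
Chunk class, the joinLine method, the gapThreshold parameter and the coordinate
values are all names and numbers I made up for illustration, not an existing
API. It only assumes that each text piece on a line comes with a start and an
end X coordinate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names exist in iText or iTextSharp.
class SeparatorSketch {

    // A piece of text plus the X coordinates where it starts and ends on the line.
    static final class Chunk {
        final String text;
        final float startX;
        final float endX;

        Chunk(String text, float startX, float endX) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
        }
    }

    // Joins the chunks of one line from left to right. A gap wider than
    // gapThreshold gets the caller-supplied separator; a narrower gap gets nothing.
    static String joinLine(List<Chunk> chunks, float gapThreshold, String separator) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble(c -> c.startX));

        StringBuilder result = new StringBuilder();
        Chunk previous = null;
        for (Chunk current : sorted) {
            if (previous != null && current.startX - previous.endX > gapThreshold) {
                result.append(separator);
            }
            result.append(current.text);
            previous = current;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates for the example line from my first mail.
        List<Chunk> line = new ArrayList<>();
        line.add(new Chunk("User name: abcdef", 50f, 150f));
        line.add(new Chunk("Password: Cool1234", 300f, 400f));
        System.out.println(joinLine(line, 20f, "\t"));
    }
}
```

Passing "\t", several spaces or an XML tag as the separator would cover all the
cases I mentioned; the threshold would presumably have to depend on the font
size, which this sketch ignores.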

> That's not trivial. You'd need to examine the Y coordinates.

Do you mean the X coordinates?

>
>> I am guessing that I should abandon PdfTextExtractor.GetTextFromPage and use
>> the LocationTextExtractionStrategy class combined with TextRenderInfo
>
> Yes, TextRenderInfo will give you the info about the coordinates, but you'll
> have to do plenty of programming.
> Either you'll have to do that programming yourself, or you'll have to hire
> somebody to do it for you.

Can you recommend someone with the appropriate skills?

Thanks again.

Best Regards,
Daniel

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions