From: Shishir Mane-Patil <[email protected]>
Date: March 25, 2009 6:57:37 AM EDT
To: [email protected], [email protected]
Subject: Re: Finding the x-coordinate and width of a sub-string


Hi,
I already have all the TextPosition objects for a particular PDF page, so I can always retrieve the font and font size for a particular string. For
instance, if we consider the earlier example:

String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
width=108.87001]Primary Diagnosis: elder

Earlier, if I had to find the x-coordinate of the word "Diagnosis", I would
perform the following steps (considering the above example):

1. Find the PDFont object using the TextPosition

2. Then use the stringWidth function to calculate the string width of
"Primary ". Let's say it is sw. The current value of x-coordinate is x, the
x-scale is xs and the font size is fs.

3. Then, to calculate the new x-coordinate of, let's say, the word
"Diagnosis", I use the following formula:
         New X-Coordinate = x+((sw/1000)*xs*fs)

4. Similarly, I also found the string width for the word "Diagnosis".
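
The formula in step 3 can be sketched in plain Java as below. This is only an illustration of the arithmetic, not PDFBox code: the class name SubstringOffset, the method offsetX, and the value used for sw are all hypothetical, and sw is assumed to be in the 1/1000 glyph units implied by the division in the formula.

```java
// Sketch of the offset arithmetic from step 3; no PDFBox calls are made here.
final class SubstringOffset {

    /**
     * x  : x-coordinate of the whole string (from the TextPosition)
     * sw : width of the text before the substring, assumed to be in 1/1000 units
     * xs : x-scale
     * fs : font size
     */
    static float offsetX(float x, float sw, float xs, float fs) {
        return x + (sw / 1000f) * xs * fs;
    }

    public static void main(String[] args) {
        // Hypothetical width for "Primary "; a real value would come from the font.
        float sw = 3500f;
        // x, xs and fs are taken from the example string above.
        System.out.println(offsetX(75.0f, sw, 1.0f, 10.0f));
    }
}
```

If the font reports sw as zero for a non-empty prefix, this returns the unchanged x, which matches the failure described for some PDFs.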

The above steps worked satisfactorily for substrings in many PDFs, but they seem to fail for some. When they succeeded, the string width returned from the TextPosition object was very close to the one calculated by the above formula. When they failed,
the string width returned by the PDFont object was either zero or
incorrect.

It is known that some fonts return a character width of zero. Bugs have been logged for this.

Are you using the incubator trunk version or the 0.7.3 version? Justin LeFebvre and I have done a lot of work in the newest version on text extraction, layout, and spacing. The widths[] array stores the width of each character in a TextPosition (you can get it from getIndividualWidths()). More spacing fixes should be checked in by the end of the week.
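
As a rough sketch of how the per-character widths could replace the font lookup: sum the widths of the characters before the substring to get its x-coordinate, and sum the widths of its own characters to get its width. The code below works on a plain float[] standing in for the array returned by getIndividualWidths(); the class and method names are hypothetical, the sample widths are made up, and it assumes the widths are already in the same units as the x-coordinate.

```java
// Sketch only: operates on a plain float[] rather than a real TextPosition.
final class PrefixWidths {

    /** Sum of widths[start..end), i.e. the width of that character range. */
    static float rangeWidth(float[] widths, int start, int end) {
        float total = 0f;
        for (int i = start; i < end; i++) {
            total += widths[i];
        }
        return total;
    }

    /** x-coordinate of the character at index start, given the string's x. */
    static float substringX(float x, float[] widths, int start) {
        return x + rangeWidth(widths, 0, start);
    }

    public static void main(String[] args) {
        // Made-up widths for a four-character string starting at x = 75.
        float[] widths = {5f, 4f, 3f, 2f};
        System.out.println(substringX(75f, widths, 2));  // x of the third character
        System.out.println(rangeWidth(widths, 1, 3));    // width of characters 1 and 2
    }
}
```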

brian
