From: Shishir Mane-Patil <[email protected]>
Date: March 25, 2009 6:57:37 AM EDT
To: [email protected], [email protected]
Subject: Re: Finding the x-coordinate and width of a sub-string
Hi,
I already have all the TextPosition objects for a particular PDF page,
so I can always retrieve the font and font size for a particular
string. For instance, consider the earlier example:
String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001
width=108.87001]Primary Diagnosis: elder
Previously, to find the x-coordinate of the word "Diagnosis", I would
perform the following steps (using the above example):
1. Find the PDFont object using the TextPosition.
2. Use the stringWidth function to calculate the string width of
"Primary ". Let's say it is sw. The current value of the x-coordinate
is x, the x-scale is xs and the font size is fs.
3. To calculate the new x-coordinate of, let's say, the word
"Diagnosis", I use the following formula:
New X-Coordinate = x+((sw/1000)*xs*fs)
4. Similarly, I also find the string width for the word "Diagnosis".
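For reference, the arithmetic in step 3 can be sketched in plain Java.
This is only an illustration of the formula, not PDFBox code: the class
and method names are made up, the font width is assumed to be reported
in 1/1000 units (as the division by 1000 implies), and the value used
for sw is hypothetical rather than a measured width of "Primary ".

```java
// Sketch of the offset formula from step 3; names are hypothetical.
final class SubstringGeometry {

    /** Converts a font-reported width (assumed 1/1000 units) to page units. */
    static double toPageUnits(double fontWidth, double xScale, double fontSize) {
        return (fontWidth / 1000.0) * xScale * fontSize;
    }

    /** x-coordinate where a substring starts, given the width of the text before it. */
    static double substringX(double x, double prefixWidth, double xScale, double fontSize) {
        return x + toPageUnits(prefixWidth, xScale, fontSize);
    }

    public static void main(String[] args) {
        double x = 75.0, xs = 1.0, fs = 10.0; // taken from the example string
        double sw = 3500.0;                   // hypothetical width of "Primary "
        System.out.println(substringX(x, sw, xs, fs));
    }
}
```

With these inputs the offset is (3500/1000)*1.0*10.0 = 35.0, so the
sketch places the word at x = 110.0. The same toPageUnits conversion
applies to step 4.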
The above steps worked satisfactorily for substrings in many PDFs, but
they seem to fail for some. In the successful cases, the string width
returned by the TextPosition object was very close to the one
calculated by the above formula. In the failing cases, the string
width returned by the PDFont object was either zero or calculated
incorrectly.
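One way to detect the failing cases up front is to compare the width of
the whole string as computed from the font with the width reported by
the TextPosition. The sketch below is only an illustration: the class
name, the method name and the tolerance parameter are assumptions, and
the font width is again assumed to be in 1/1000 units.

```java
// Hypothetical sanity check: does the font-derived width agree with the
// width reported for the whole string?
final class WidthSanityCheck {

    /**
     * @param fontWidth     width of the whole string as reported by the font (assumed 1/1000 units)
     * @param reportedWidth width of the whole string as reported by the TextPosition
     * @param relTolerance  allowed relative difference, e.g. 0.05 for 5% (an arbitrary choice)
     */
    static boolean fontWidthLooksUsable(double fontWidth, double xScale, double fontSize,
                                        double reportedWidth, double relTolerance) {
        double computed = (fontWidth / 1000.0) * xScale * fontSize;
        // A zero (or negative) width on either side cannot be used for offsets.
        if (computed <= 0.0 || reportedWidth <= 0.0) {
            return false;
        }
        return Math.abs(computed - reportedWidth) <= relTolerance * reportedWidth;
    }
}
```

If the check fails for a string, the formula in step 3 is unlikely to
give a usable offset for any of its substrings.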
It is known that some fonts return a character width of zero. Bugs
have been logged for this.
Are you using the incubator trunk version or the 0.7.3 version?
Justin LeFebvre and I have done a bunch of work on text extraction,
layout and spacing in the newest version. The widths[] array stores the
width of each character in a TextPosition (you can get it from
getIndividualWidths()). Some more spacing fixes should be checked in
by the end of the week.
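A rough sketch of how the per-character widths could be used for the
substring problem, in plain Java. It assumes the array has one entry
per character of the string and that the values are already in the
same units as the x-coordinate. Neither assumption is confirmed here,
and the class and method names are made up.

```java
// Hypothetical helper working on a per-character width array.
final class PerCharacterWidths {

    /** Sums widths[from] .. widths[to - 1]. */
    static float sum(float[] widths, int from, int to) {
        float total = 0f;
        for (int i = from; i < to; i++) {
            total += widths[i];
        }
        return total;
    }

    /** x-coordinate of the character at index start, given the x of the whole string. */
    static float substringX(float x, float[] widths, int start) {
        return x + sum(widths, 0, start);
    }

    /** Width of the characters in the range [start, end). */
    static float substringWidth(float[] widths, int start, int end) {
        return sum(widths, start, end);
    }
}
```

The start index for a word such as "Diagnosis" could come from
String.indexOf on the text of the TextPosition, with the end index
being start plus the word length.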
brian