Hi, I am facing some issues with extracting text wrapped by the PDAnnotationLink. First a little background:
I am using the PDFTextStripper class to extract a bounding box for each character on the page. Then I get the rectangle from the PDAnnotationLink instance. Finally, I traverse the list of characters and check which of them lie inside the link's bounding rectangle. This works fine in most cases, but it fails in two scenarios:

a) The link text breaks at the end of a line and continues on the next line. The link's bounding rectangle then covers the entire text of both lines. As a result, my algorithm fails.

b) Sometimes the coordinates of a character's bounding rectangle lie outside the link's bounding rectangle, even though the character visibly appears to be inside the link. As a result, I am unable to select those characters.

Does anyone have a better idea about how to approach this problem?

Thanks,
Navendu
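
P.S. In case it helps, here is a simplified sketch of the containment test I described. It is not my actual code: it uses java.awt.geom.Rectangle2D instead of the PDFBox types, the names LinkTextSketch, CharBox, charsInside and sample are made up for illustration, and the coordinates are invented. It also assumes the character boxes and the link rectangle are already in the same coordinate space.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class LinkTextSketch {

    // Hypothetical holder for one character and its bounding box.
    static final class CharBox {
        final char ch;
        final Rectangle2D.Float box;

        CharBox(char ch, float x, float y, float w, float h) {
            this.ch = ch;
            this.box = new Rectangle2D.Float(x, y, w, h);
        }
    }

    // Collects every character whose box lies entirely inside the link rectangle.
    static String charsInside(List<CharBox> chars, Rectangle2D link) {
        StringBuilder sb = new StringBuilder();
        for (CharBox c : chars) {
            if (link.contains(c.box)) {
                sb.append(c.ch);
            }
        }
        return sb.toString();
    }

    // Invented sample data: four characters on one line.
    static List<CharBox> sample() {
        List<CharBox> chars = new ArrayList<>();
        chars.add(new CharBox('a', 12f, 11f, 5f, 8f));   // fully inside the link
        chars.add(new CharBox('b', 18f, 11f, 5f, 8f));   // fully inside the link
        chars.add(new CharBox('c', 24f, 9.5f, 5f, 8f));  // sticks out by 0.5 units (scenario b)
        chars.add(new CharBox('d', 50f, 11f, 5f, 8f));   // clearly outside the link
        return chars;
    }

    public static void main(String[] args) {
        // Invented link rectangle: x from 10 to 40, y from 10 to 20.
        Rectangle2D link = new Rectangle2D.Float(10f, 10f, 30f, 10f);
        System.out.println(charsInside(sample(), link)); // prints "ab"
    }
}
```

With a strict containment test like this, 'c' is dropped although it sits almost entirely inside the link, which is the behaviour I see in scenario b.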
