FYI

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 6:01 AM
To: [email protected]
Subject: [iText-questions] Extracting text location for highlighting in reader
I'm looking into how you can ask the Acrobat Reader web plugin to highlight words so that we can get hit-highlighting of web search working for an application.

I've read through this document:

http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf

It seems that you pass an XML document to the reader defining where you want highlighting. However, I then need to know the offset on the page of where I want to highlight (the offset is a count either in characters or in words).

So, is iText a good way to extract just the text of a page so that we can use it to calculate the offsets?

--
Chris

At 06:01 AM 3/9/2006, [EMAIL PROTECTED] wrote:
>http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf
>
>It seems that you pass an XML document to the reader defining where you
>want highlighting.

Correct.

>However, I then need to know the offset on the page of where I want to
>highlight (the offset is a count either in characters or in words).

Correct.

>So, is iText a good way to extract just the text of a page so that we
>can use it to calculate the offsets?

No.

Look at PDFBox or Multivalent.

Leonard

On Thu, Mar 09, 2006 at 06:50:09AM -0500, Leonard Rosenthol wrote:
>
> >So, is iText a good way to extract just the text of a page so that
> >we can use it to calculate the offsets?
>
> No.
>
> Look at PDFBox or Multivalent.

Thanks for the pointer.

It seems the character-offset method isn't very reliable: something that's 150 characters into the text file from PDFBox is 200 characters in according to the highlighter in Reader. But with word-based offsets (and a lot of guesswork as to what Acrobat Reader thinks is a word boundary), this looks like it might actually fly :)

--
Chris Searle
[EMAIL PROTECTED]
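A minimal Java sketch of the word-offset approach, assuming the text of one page has already been extracted (for example with PDFBox) into a String. The class and method names (`WordOffsets`, `wordOffsets`, `charOffsets`) are hypothetical, offsets are assumed to be zero-based, and treating a word as a whitespace-delimited run with leading/trailing ASCII punctuation stripped is only a guess at Acrobat Reader's word boundaries, not its documented rule. It also shows one plausible (unconfirmed) source of the character-offset drift Chris saw: extra spaces or line breaks in the extracted text shift character counts but not word counts.

```java
import java.util.ArrayList;
import java.util.List;

public class WordOffsets {

    /**
     * Zero-based word offsets at which term occurs in pageText (case-sensitive).
     * ASSUMPTION: a "word" is a maximal run of non-whitespace characters with
     * leading/trailing ASCII punctuation removed; Acrobat Reader may differ.
     */
    static List<Integer> wordOffsets(String pageText, String term) {
        List<Integer> hits = new ArrayList<>();
        String[] words = pageText.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // Strip punctuation such as "search," or "results." before comparing.
            String word = words[i].replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (word.equals(term)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Zero-based character offsets at which term occurs in pageText (case-sensitive). */
    static List<Integer> charOffsets(String pageText, String term) {
        List<Integer> hits = new ArrayList<>();
        int from = pageText.indexOf(term);
        while (from >= 0) {
            hits.add(from);
            from = pageText.indexOf(term, from + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical extracted page text with a double space and a line break.
        String pageText = "Hit highlighting  for\nweb search results.";
        System.out.println(wordOffsets(pageText, "search")); // prints [4]
        System.out.println(charOffsets(pageText, "search")); // prints [26]
    }
}
```

Here the double space and the newline push the character offset of "search" to 26, while its word offset stays 4 whatever the whitespace looks like. Whether Reader counts from 0 or 1, and how it treats punctuation, should be checked against Adobe's highlight file format document before relying on these numbers.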
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
