FYI

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 6:01 AM
To: [email protected]
Subject: [iText-questions] Extracting text location for highlighting in
reader


I'm looking into how to ask the Acrobat Reader web plugin to
highlight words, so that we can get hit-highlighting of web search
working for an application.

I've read through this document:

http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf

It seems that you pass an XML document to the reader defining where you
want highlighting.

However I then need to know the offset on the page of where I want to
highlight (offset is a count either in characters or words).

So - is iText a good way to extract just the text of a page so that we
can use it to calculate the offsets?

-- 
Chris

At 06:01 AM 3/9/2006, [EMAIL PROTECTED] wrote:
>http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf
>
>It seems that you pass an XML document to the reader defining where you
>want highlighting.

         Correct.


>However I then need to know the offset on the page of where I want to 
>highlight (offset is a count either in characters or words).

         Correct.


>So - is iText a good way to extract just the text of a page so that we 
>can use it to calculate the offsets?

         No.

         Look at PDFBox or Multivalent.


Leonard



On Thu, Mar 09, 2006 at 06:50:09AM -0500, Leonard Rosenthol wrote:
> 
> >So - is iText a good way to extract just the text of a page so that 
> >we can use it to calculate the offsets?
> 
>         No.
> 
>         Look at PDFBox or Multivalent.

Thanks for the pointer. It seems the character-offset method isn't too
reliable: something that's 150 chars into the text file from PDFBox is
200 chars in according to the highlighter in Reader.

But with word-based offsets (and a lot of guesswork as to what Acrobat
Reader thinks is a word boundary), this looks like it might actually
fly :)
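
For illustration, here is a minimal sketch of the word-offset idea. It
assumes the page text has already been extracted to a plain string (for
example with PDFBox), and it guesses that a "word" is a run of letters,
digits, or apostrophes. That boundary rule is only an assumption, since
what Acrobat Reader actually counts as a word is exactly the guesswork
mentioned above. The names split_words and find_word_hits are made up
for this example.

```python
import re

# Assumption: a word is a maximal run of ASCII letters, digits, or apostrophes.
# Acrobat Reader's real word-boundary rule may differ and needs testing.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def split_words(page_text):
    """Split extracted page text into words using the assumed boundary rule."""
    return WORD_RE.findall(page_text)


def find_word_hits(page_text, query):
    """Return a (word_offset, word_count) pair for each match of query.

    Offsets are zero-based counts of words from the start of the page.
    Matching is case-insensitive and compares whole words only.
    """
    words = [w.lower() for w in split_words(page_text)]
    needle = [w.lower() for w in split_words(query)]
    if not needle:
        return []
    hits = []
    for i in range(len(words) - len(needle) + 1):
        if words[i:i + len(needle)] == needle:
            hits.append((i, len(needle)))
    return hits


page_text = "Hit highlighting works.\nThe reader can highlight search hits."
print(find_word_hits(page_text, "highlight"))
print(find_word_hits(page_text, "search hits"))
```

Each pair would then become one position/length entry in the highlight
XML, with the unit set to words. Whether Reader counts from zero, and
how it treats hyphens and punctuation, would need checking against real
documents.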

-- 
Chris Searle
[EMAIL PROTECTED]




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
