Highlighting a PDF document, last time I looked (quite a while ago),
involves supplying an XML file that describes offsets for highlighting.
You can specify the file in the URL. You can also do simple highlighting
by passing in a list of words to be highlighted, but this does not catch
even minor differences, such as singular versus plural.
If someone knows more about using the Lucene highlighter to highlight
PDFs, then please speak up. I think I will have to get into this soon.
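
As a rough illustration of the word-list approach, here is a minimal sketch in Java. It assumes the viewer honours a "search" open parameter in the URL fragment, as Adobe's PDF open parameters describe; the class name, method name, and example.org URL are made up for illustration, and some viewers may ignore the parameter or need the quotes and spaces encoded.

```java
import java.util.Arrays;
import java.util.List;

public class PdfSearchUrlSketch {

    // Appends a "search" open parameter to the PDF URL. The word list is
    // wrapped in double quotes and the words are separated by spaces.
    // Assumption: the viewer supports this parameter; the words are matched
    // as given, so "document" would not highlight "documents".
    static String withSearchWords(String pdfUrl, List<String> words) {
        return pdfUrl + "#search=\"" + String.join(" ", words) + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical location of the hello.pdf file discussed in this thread.
        String url = withSearchWords("http://example.org/hello.pdf",
                                     Arrays.asList("Acrobat"));
        System.out.println(url);
    }
}
```

The words are passed through unchanged, which is why this approach misses variants such as plurals.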
- Mark
Mag Gam wrote:
Thanks for the quick response, Erik. I will be getting my LIA book back
very soon; I forgot it at a destination :-(
Let's assume there is a document called "hello.pdf" and it has the
content "this is hello.pdf. It uses Acrobat".
When I perform a search for "Acrobat", I want hello.pdf to show up,
and also the fragment 'It uses <highlight>Acrobat</highlight>',
or something like that.
tia
On 9/7/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
There are test cases in the Highlighter codebase that exercise it and
show its use, as well as a few examples of it in the "Lucene in
Action" codebase.
These examples output plain text with some prefix and suffix
surrounding the highlighted terms. Highlighting text in a PDF is
possible, I'm pretty sure, but I don't think the same would be easily
possible with Microsoft document formats. I'm not sure if you are
asking for these document types to be highlighted or just a plain
text representation of them, though.
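
To make the "prefix and suffix" idea concrete, here is a minimal plain-Java sketch. It is not the Lucene Highlighter API; it only mimics the shape of the output on already-extracted plain text, using the hello.pdf content from this thread. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainTextHighlightSketch {

    // Wraps every case-insensitive, whole-word occurrence of `term` in `text`
    // with the given prefix and suffix. The term is quoted so that any regex
    // metacharacters in it are treated literally.
    static String highlight(String text, String term, String prefix, String suffix) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the matched text as it appears in the document.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(prefix + m.group() + suffix));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "this is hello.pdf. It uses Acrobat";
        System.out.println(highlight(content, "Acrobat", "<highlight>", "</highlight>"));
        // prints: this is hello.pdf. It uses <highlight>Acrobat</highlight>
    }
}
```

The real Highlighter works from the query and an analyzer rather than a literal word, so it can also pick the best-scoring fragments; the sketch above only shows what the marked-up text looks like.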
Erik
On Sep 7, 2006, at 6:37 PM, Mag Gam wrote:
> Hey
>
> Anyone have a search result highlighter example?
>
> I have various docs (PDF, DOC, TXT, PPT), and I would like to show a
> highlight, similar to how Google does it...
>
> tia
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]