On 12/05/2011 15:47, Wulf Berschin wrote:
I think support for highlighting documents would be a very welcome
feature. Highlighting HTML documents is already possible with the
org.apache.solr.analysis.HTMLStripCharFilter and a NullFragmenter, but
ther seems to be nothing for highlighting PDF files...
It would be very useful. That said being able to highlight words in any
pictorial representation of a document would be a huge bonus.
As starting point I quarried out from
org.apache.lucene.search.highlight.Highlighter the class below which
just returns the Tokens contributing to the hit.
I use a similar (home brew) solution to extract the hit terms, and then
pass them to the Adobe PDF viewer plugin as a search term via the PDF URL.
--
Rgds.
*Dawn Raison*
Technical Director, Digitorial Ltd.
E:d...@digitorial.co.uk W:http://www.digitorial.co.uk
M: 07956 609 618 T: 01428 729 431
Reg: 04644583, England& Wales
Church Villas Ecchinswell, Newbury, RG20 4TT
This email and any attached files are for the exclusive use of the
addressee and may contain privileged and/or confidential information. If
you receive this email in error you should not disclose the contents to
any other person nor take copies but should delete it immediately.
Digitorial Ltd makes no warranty as to the accuracy or completeness of
this email and accepts no liability for its contents or use. Any
opinions expressed in this email are those of the author and do not
necessarily reflect the opinions of Digitorial Ltd.