The main code has now been updated in the new SVN repository here: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/
To encode your content, simply pass an encoder to the Highlighter, e.g.:
//create an example doc for this test
String myDocContent = "\"Smith & sons' prices < 3 and >4\" claims article";
//Ordinarily you'd get the doc content like this..
//myDocContent=hits.doc(i).get(FIELD_NAME)
//create a query - you'd normally get this from QueryParser.parse
Query myDocQuery=new TermQuery(new Term("contents","prices"));
//Create a highlighter and pass a QueryScorer to provide the list of query tokens
Highlighter highlighter = new Highlighter(new QueryScorer(myDocQuery));
//set the choice of encoder to our simple encoder - otherwise default is no encoding
highlighter.setEncoder(new SimpleHTMLEncoder());
//Tokenize the document content to get the positions using an analyzer:
Analyzer analyzer=new WhitespaceAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("contents", new StringReader(myDocContent));
//As a faster alternative to re-analyzing doc content you can
//use "TokenSources" to take advantage of any pre-tokenized content held in any term vectors:
//TokenStream tokenStream=TokenSources.getAnyTokenStream(indexReader,docId, fieldName,analyzer);
//Now pass the tokenStream to the highlighter to process
String encodedSnippet = highlighter.getBestFragments(tokenStream, myDocContent,1,"...");
System.out.println(encodedSnippet);
//Should print &quot;Smith &amp; sons' <B>prices</B> &lt; 3 and &gt;4&quot; claims article
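For anyone wondering what the encoder step does to the text, here is a standalone sketch. It assumes the encoder only escapes &, <, > and double quotes; the real SimpleHTMLEncoder may handle more characters. HtmlEscapeSketch and escape are hypothetical names made up for illustration and are not part of the highlighter package.

```java
// Hypothetical stand-in for an HTML encoder; NOT the Lucene SimpleHTMLEncoder source.
final class HtmlEscapeSketch {

    // Replace the four HTML-significant characters with entity references.
    static String escape(String text) {
        if (text == null || text.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "\"Smith & sons' prices < 3 and >4\" claims article";
        // prints: &quot;Smith &amp; sons' prices &lt; 3 and &gt;4&quot; claims article
        System.out.println(escape(doc));
    }
}
```

The highlighter applies the encoder to the document text and adds the <B> tags separately, which is why the tags themselves come out unescaped in the snippet.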
Cheers Mark