RE: HTML pages highlighter

Yagnesh Shah Wed, 30 Mar 2005 13:46:32 -0800

Hi! Eric,
        I try to modified that with this but I get compile error. Do you have 
any code snippet of highlighting code to pull the contents from the original 
source? or Do you know how I can do field store?


      doc.add(new Field("contents", parser.getReader(), Field.Store.YES, 
Field.Index.NO));


-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 30, 2005 4:35 PM
To: [email protected]
Subject: Re: HTML pages highlighter



On Mar 30, 2005, at 4:17 PM, Yagnesh Shah wrote:

> Hi! Eric,
>       One more thing, I am using the same HTMLDocument.java that comes with 
> /trunk/src/demo/org/apache/lucene/demo

Which does this:

         doc.add(new Field("contents", parser.getReader()));

That is not a stored field.  In other words, the original "contents" 
are not available from the Lucene index.   You will have to adjust your 
indexing code to store the contents, or adjust your highlighting code 
to pull the contents from the original source again.

        Erik


>
> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 30, 2005 4:01 PM
> To: [email protected]
> Subject: Re: HTML pages highlighter
>
>
> How did you index "contents"?  If you did not use a stored field type,
> then that is the issue.
>
>       Erik
>
> On Mar 30, 2005, at 12:31 PM, Yagnesh Shah wrote:
>
>> Hello Lucene-User,
>>      Is any one try to do highlighting with HTML pages?
>>
>> I am trying to do this using demo example by Keld H. Hansen article
>> "Unweaving a Tangled Web HTMLParser and Lucene" but I am getting
>> "null" value for text at line #47 Any Idea?
>>
>>       1 package org.apache.lucene.search.highlight;
>>       2
>>       3 import java.io.StringReader;
>>       4
>>       5 import org.apache.lucene.analysis.Analyzer;
>>       6 import org.apache.lucene.analysis.TokenStream;
>>       7 import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>       8 import org.apache.lucene.queryParser.QueryParser;
>>       9 import org.apache.lucene.search.Hits;
>>      10 import org.apache.lucene.search.IndexSearcher;
>>      11 import org.apache.lucene.search.Query;
>>      12 import org.apache.lucene.search.highlight.Formatter;
>>      13 import org.apache.lucene.search.highlight.Highlighter;
>>      14 import org.apache.lucene.search.highlight.QueryScorer;
>>      15 import org.apache.lucene.search.highlight.SimpleFragmenter;
>>      16
>>      17 public class Searcher {
>>      18
>>      19    static Query query;
>>      20    static Hits hits;
>>      21
>>      22    private static final String FIELD_NAME = "contents";
>>      23    private static final String indexDir =
>> "/opt/dynamo/prod/hww-doc/hww/help/index";
>>      24
>>      25    private static Analyzer analyzer = new StandardAnalyzer();
>>      26
>>      27    public static void main(String[] args) throws Exception {
>>      28
>>      29       IndexSearcher is   = new IndexSearcher(indexDir);
>>      30       String searchCriteria = "scholarly";
>>      31       query = QueryParser.parse(searchCriteria, "contents",
>> analyzer);
>>      32
>>      33       hits  = is.search(query);
>>      34       System.out.println("found in: " + query
>> +"\nhits-length:" +hits.length());
>>      35
>>      36       doStandardHighlights();
>>      37
>>      38       is.close();
>>      39    }
>>      40
>>      41    static void doStandardHighlights() throws Exception {
>>      42       Highlighter highlighter = new Highlighter(new
>> MyBolder(), new QueryScorer(query));
>>      43       System.out.println("Highlighter: " + highlighter
>> +"\nhits-length:" +hits.length());
>>      44       highlighter.setTextFragmenter(new SimpleFragmenter(20));
>>      45       for (int i = 0; i < hits.length(); i++) {
>>      46          System.out.println("URL " + (i + 1) + ": " +
>> hits.doc(i).getField("path").stringValue());
>>      47          String text = hits.doc(i).get("FIELD_NAME");
>>      48          int maxNumFragmentsRequired = 2;
>>      49          String fragmentSeparator = "...";
>>      50          TokenStream tokenStream =
>> analyzer.tokenStream(FIELD_NAME, new StringReader(text));
>>      51
>>      52          String result =
>>      53             highlighter.getBestFragments(
>>      54                tokenStream,
>>      55                text,
>>      56                maxNumFragmentsRequired,
>>      57                fragmentSeparator);
>>      58          System.out.println("\tfound in: " + result);
>>      59       }
>>      60    }
>>      61
>>      62    private static class MyBolder implements Formatter {
>>      63         public String highlightTerm(String originalText ,
>> TokenGroup group)
>>      64         {
>>      65                 if(group.getTotalScore()<=0)
>>      66                 {
>>      67                         return originalText;
>>      68                 }
>>      69                 return "<b>" + originalText + "</b>";
>>      70         }
>>      71    }
>>      72
>>      73 }
>>
>> Yagnesh N. Shah
>> Senior Technology Engineer
>> CS Dept., 4th Floor
>> H. W. Wilson
>> 950 University Avenue,
>> Bronx NY 10452
>> (718) 588 8400 x2721
>> http://www.hwwilson.com
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: HTML pages highlighter

Reply via email to