Hi Erik,

Yes, the basic example seems to be working.

a) My problem is that there is a chance the query is not present in the stored content of a file, so sometimes I get empty strings at line#106, and I have to put a special check at line#109 and line#126. I guess this is not a problem. What do you think?

b) When I click on a doc path that was generated by line#120 and line#121, the files that open do not have the searched query highlighted. Any suggestions for this? How can I do it?

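For point (a), the special check can live in one small helper, so the display code never has to deal with an empty fragment. This is only a minimal sketch in plain Java: the class name SnippetFallback and the method chooseSnippet are hypothetical, and it assumes the highlighter result and the stored summary are already available as Strings.

```java
// Hypothetical helper, not part of Lucene: picks the text to show for a hit.
final class SnippetFallback {

    // Returns the highlighted fragment when there is one; otherwise falls
    // back to the stored summary (or an empty string if that is missing too).
    static String chooseSnippet(String fragment, String fallback) {
        if (fragment == null || fragment.trim().isEmpty()) {
            return fallback == null ? "" : fallback;
        }
        return fragment;
    }

    public static void main(String[] args) {
        String summary = "Stored summary text";
        // Empty highlighter result: the summary is shown instead.
        System.out.println(chooseSnippet("", summary));
        // Non-empty highlighter result: shown unchanged.
        System.out.println(
            chooseSnippet("lorem <span class=\"highlight\">ipsum</span>", summary));
    }
}
```

With something like this, the checks at the individual display lines collapse into a single call per hit.
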
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, April 04, 2005 8:45 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter

On Apr 4, 2005, at 5:35 PM, Yagnesh Shah wrote:
> I end up purchasing your book "Lucene in Action". I have downloaded
> your code samples. I am able to retrieve "result" only some time.
> Below is the code I have taken from Search.jhtml in lucene demo. I
> have 2 problem
>
> a) I am unable to display "result" using
> b) When I click on the title to retrieve document I do not see my
> query highlighted.

First things first.... get something very very simple working and expand from there. Here is the simple code from our HighlightIt.java:

    TermQuery query = new TermQuery(new Term("f", "ipsum"));
    QueryScorer scorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter =
        new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");
    Highlighter highlighter = new Highlighter(formatter, scorer);
    Fragmenter fragmenter = new SimpleFragmenter(50);
    highlighter.setTextFragmenter(fragmenter);

    TokenStream tokenStream = new StandardAnalyzer()
        .tokenStream("f", new StringReader(text));

    String result =
        highlighter.getBestFragments(tokenStream, text, 5, "...");

One trick is that you must ensure the query you are passing to QueryScorer has been rewritten. In our simple TermQuery case, that is not necessary, but in a general application it is. You can call query.rewrite(reader) where reader is your IndexReader instance. This ensures that range, fuzzy, and wildcard queries are expanded and highlightable.

I'm not sure what is wrong with the code you are trying. But again, start simple, just try out our HighlightIt or our HighlightTest. If those work fine for you then move on to integrating further with your index. Besides the Query.rewrite() trick, you have to be sure that the text you want to highlight is available.
If you're pulling it from the index, it must be in a stored field, otherwise you need to retrieve it from elsewhere.

    Erik

>
> <java>
>
>   Searcher searcher = new IndexSearcher(getReader(indexName));
>
>   // get query from request
>   String queryString = request.getParameter("query");
>
>   query = QueryParser.parse(queryString, "contents", analyzer);
>   Hits hits = searcher.search(query);
>   SimpleHTMLFormatter formatter = new SimpleHTMLFormatter();
>   Highlighter highlighter =
>       new Highlighter(formatter, new QueryScorer(query));
>   highlighter.setTextFragmenter(new SimpleFragmenter(50));
>   String FIELD_NAME = "contents";
>
>   for (int i = start; i < end; i++) {   // display the hits
>     Document doc = hits.doc(i);
>     String text = hits.doc(i).get(FIELD_NAME);
>     int maxNumFragmentsRequired = 5;
>     String fragmentSeparator = "...";
>     if ( text != null){
>       TokenStream tokenStream = new StandardAnalyzer()
>           .tokenStream(FIELD_NAME, new java.io.StringReader(text));
>       String result = highlighter.getBestFragments(
>           tokenStream, text, maxNumFragmentsRequired, fragmentSeparator);
>       System.out.println("result=" + result);
>     }
>
>     String title = doc.get("title");
>     if (title.equals(""))               // use url for docs w/o title
>       title = doc.get("path");
> </java>
>   <p><b><java type=print>(int)(hits.score(i) * 100.0f)</java>%
>   <a href="`doc.get("path")`">
>   <java type=print>Entities.encode(title)</java>
>   </b></a>
> <java>
>     if (showSummaries) {                // maybe show summary
> </java>
>   <ul><i>Summary</i>:
>   <java type=print>Entities.encode(doc.get("summary"))</java>
>   </ul>
> <java>
>     }
>   }
> </java>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 31, 2005 8:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: HTML pages highlighter
>
> On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
>>  try {
>>    fis = new FileInputStream(f);
>>    HTMLParser parser = new HTMLParser(fis);
>>
>>    // Add the tag-stripped contents as a Reader-valued Text field so it will
>>    // get tokenized and indexed.
>>    // doc.add(new Field("contents", parser.getReader()));
>>    LineNumberReader reader = new LineNumberReader(parser.getReader());
>>    for (String l = reader.readLine(); l != null; l = reader.readLine())
>>      // System.out.println(l);
>>      doc.add(Field.Text("contents", l));
>
> Notice that your loop here is adding a "contents" field for *every*
> line read since that is where the first semi-colon is.
>
> Look at using Luke to explore your index. Try indexing just a dummy
> String:
>
>     doc.add(Field.Text("contents", "some dummy text"));
>
> to show that it works. Always always always simplify a complicated
> situation by doing the most obvious thing that _should_ work.
>
> Also, the demo Lucene code is not really designed to be used in a
> production application (sadly), so you're better off borrowing code
> from the many articles or our book to begin with.
>
>     Erik
>
>>
>>    // Add the summary as a field that is stored and returned with
>>    // hit documents for display.
>>    doc.add(new Field("summary", parser.getSummary(),
>>        Field.Store.YES, Field.Index.NO));
>>
>>    // Add the title as a field that it can be searched and that is stored.
>>    doc.add(new Field("title", parser.getTitle(), Field.Store.YES,
>>        Field.Index.TOKENIZED));
>>  }
>>
>> -----Original Message-----
>> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, March 30, 2005 7:38 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: HTML pages highlighter
>>
>> On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
>>
>>> Hi! Eric,
>>
>> Erik - with a 'k' - Sorry, I let it slide once though :)
>>
>>> I try to modified that with this but I get compile error. Do you have
>>> any code snippet of highlighting code to pull the contents from the
>>> original source?
>>
>> I have a whole book full of code examples :)
>> http://www.lucenebook.com - Grab the source code and look in
>> src/lia/tools at Highlight*.java
>>
>>> or Do you know how I can do field store?
>>>
>>> doc.add(new Field("contents", parser.getReader(),
>>>     Field.Store.YES, Field.Index.NO));
>>
>> You cannot store it with a Reader. You need to use Field.Text(String,
>> String), or one of the other variations.
>>
>>     Erik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]

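The two indexing remarks quoted above (a field cannot be stored from a Reader, and the brace-less loop adds one "contents" field per line) both point toward reading the parser's Reader into a single String first and adding the field once. What follows is a minimal sketch using only java.io. The class name ContentsReader and the method readAll are hypothetical, and the Lucene call appears only as a comment, on the assumption that Field.Text(String, String) is used as suggested in the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene: drains a Reader into one String.
final class ContentsReader {

    // Reads every line from the Reader and joins the lines with '\n'.
    static String readAll(Reader in) {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder text = new StringBuilder();
        try {
            for (String line = reader.readLine(); line != null;
                    line = reader.readLine()) {
                // Braces keep the whole accumulation inside the loop body.
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // StringReader stands in for parser.getReader() from the thread.
        String contents = readAll(new StringReader("first line\nsecond line"));
        // In the indexing code this String would be added once, e.g.
        //   doc.add(Field.Text("contents", contents));
        System.out.println(contents);
    }
}
```

Adding the field once, outside the loop, leaves a single stored "contents" value that can later be fetched from the hit and passed to the highlighter.
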