Re: [2] Highlighting problems with HTML tagged fields
It is tracked in http://issues.apache.org/jira/browse/SOLR-42 ...there are currently no patches.

: Date: Tue, 6 Mar 2007 15:04:25 -0800 (PST)
: From: nick19701 [EMAIL PROTECTED]
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: [2] Highlighting problems with HTML tagged fields
:
: Yonik Seeley wrote:
:
: HTMLStripWhitespaceTokenizerFactory works in two phases...
: HTMLStripReader removes the HTML and passes the result to
: WhitespaceTokenizer... at that point, Tokens are generated, but the
: offsets will correspond to the text after HTML removal, not before.
:
: I did it this way so that HTMLStripReader could go before any
: tokenizer (like StandardTokenizer).
:
: Can you open a JIRA bug for this? The fix would be a special version
: of HTMLStripReader integrated with a WhitespaceTokenizer to keep
: offsets correct.
:
: -Yonik
:
: Is there a fix for this problem?
:
: My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
: highlighting still doesn't work. All the wrong items are highlighted.

-Hoss
Re: [2] Highlighting problems with HTML tagged fields
Chris Hostetter wrote:
: It is tracked in http://issues.apache.org/jira/browse/SOLR-42
: ...there are currently no patches.

The suggested fix from Mirko seems very simple. Hopefully a patch will be applied very soon. In the meantime, I'll use my backup solution:
http://fucoder.com/code/se-hilite/
Re: [2] Highlighting problems with HTML tagged fields
Chris Hostetter wrote:
: patches for issues can't be applied until someone who cares about them
: writes them and contributes them for committers to consider/apply :)

It seems I'm one of the very few people who care about this feature :)
Unfortunately, my daily languages are C++ and C#, and I only know a little Java. Otherwise I would contribute.
Re: [2] Highlighting problems with HTML tagged fields
Yonik Seeley wrote:
: HTMLStripWhitespaceTokenizerFactory works in two phases...
: HTMLStripReader removes the HTML and passes the result to
: WhitespaceTokenizer... at that point, Tokens are generated, but the
: offsets will correspond to the text after HTML removal, not before.
:
: I did it this way so that HTMLStripReader could go before any
: tokenizer (like StandardTokenizer).
:
: Can you open a JIRA bug for this? The fix would be a special version
: of HTMLStripReader integrated with a WhitespaceTokenizer to keep
: offsets correct.
:
: -Yonik

Is there a fix for this problem?

My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory + highlighting still doesn't work. All the wrong items are highlighted.
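
For anyone trying to picture the offset problem Yonik describes, here is a minimal sketch in Python rather than Solr's Java. It is not Solr code and not the SOLR-42 fix. The regex tag stripper and the function names (strip_html_with_offsets, whitespace_tokens, corrected_tokens) are assumptions made up for illustration; the real HTMLStripReader also has to deal with entities, comments and similar cases that this ignores. The idea is to record, for every character kept after stripping, where it sat in the original HTML, and then translate each token's offsets back through that map.

```python
import re

# Simplified stand-in for HTML removal: anything between '<' and '>' is a tag.
TAG = re.compile(r"<[^>]*>")


def strip_html_with_offsets(html):
    """Return (stripped_text, offset_map).

    offset_map[i] is the index in `html` of the character stripped_text[i].
    """
    chars = []
    offset_map = []
    pos = 0
    for tag in TAG.finditer(html):
        # Keep the text before this tag, remembering where each char came from.
        for i in range(pos, tag.start()):
            chars.append(html[i])
            offset_map.append(i)
        pos = tag.end()
    # Keep whatever follows the last tag.
    for i in range(pos, len(html)):
        chars.append(html[i])
        offset_map.append(i)
    return "".join(chars), offset_map


def whitespace_tokens(text):
    """Return (term, start, end) for each whitespace-delimited token in text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def corrected_tokens(html):
    """Tokenize the stripped text, but report offsets into the original HTML."""
    stripped, offset_map = strip_html_with_offsets(html)
    tokens = []
    for term, start, end in whitespace_tokens(stripped):
        # `end` is exclusive, so map the last character and add one.
        tokens.append((term, offset_map[start], offset_map[end - 1] + 1))
    return tokens
```

With html = "<b>Solr</b> highlights <i>terms</i>", the token "Solr" has offsets (0, 4) in the stripped text, and html[0:4] is "<b>S", which is the kind of wrong highlight reported in this thread. corrected_tokens(html) reports (3, 7) for the same token, and html[3:7] is "Solr". One caveat of this sketch: when a tag falls inside a token, the corrected span includes the tag.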