Re: [2] Highlighting problems with HTML tagged fields

2007-03-07 Thread Chris Hostetter

It is tracked in http://issues.apache.org/jira/browse/SOLR-42

...there are currently no patches.


: Date: Tue, 6 Mar 2007 15:04:25 -0800 (PST)
: From: nick19701 [EMAIL PROTECTED]
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: [2] Highlighting problems with HTML tagged fields
:
:
:
: Yonik Seeley wrote:
: 
:  HTMLStripWhitespaceTokenizerFactory works in two phases...
:  HTMLStripReader removes the HTML and passes the result to
:  WhitespaceTokenizer... at that point, Tokens are generated, but the
:  offsets will correspond to the text after HTML removal, not before.
: 
:  I did it this way so that HTMLStripReader  could go before any
:  tokenizer (like StandardTokenizer).
: 
:  Can you open a JIRA bug for this?  The fix would be a special version
:  of HTMLStripReader integrated with a WhitespaceTokenizer to keep
:  offsets correct.
: 
:  -Yonik
: 
: 
: Is there a fix for this problem?
:
: My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
: highlighting still doesn't work: the highlighter marks the wrong spans
: of text.
: --
: View this message in context:
: http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9343253
: Sent from the Solr - User mailing list archive at Nabble.com.
:



-Hoss



Re: [2] Highlighting problems with HTML tagged fields

2007-03-07 Thread nick19701


Chris Hostetter wrote:
 
 
 It is tracked in http://issues.apache.org/jira/browse/SOLR-42
 
 ...there are currently no patches.
 
 

The suggested fix from Mirko seems very simple. Hopefully a patch will be
applied very soon. In the meantime, I'll use my backup solution:
http://fucoder.com/code/se-hilite/





Re: [2] Highlighting problems with HTML tagged fields

2007-03-07 Thread nick19701


Chris Hostetter wrote:
 
 
 patches for issues can't be applied until someone who cares about them
 writes them and contributes them for committers to consider/apply :)
 
 

It seems I'm one of the very few people who care about this feature :)

Unfortunately my daily languages are C++ and C#. I only know a little bit
of Java; otherwise I would contribute.




Re: [2] Highlighting problems with HTML tagged fields

2007-03-06 Thread nick19701


Yonik Seeley wrote:
 
 HTMLStripWhitespaceTokenizerFactory works in two phases...
 HTMLStripReader removes the HTML and passes the result to
 WhitespaceTokenizer... at that point, Tokens are generated, but the
 offsets will correspond to the text after HTML removal, not before.
 
 I did it this way so that HTMLStripReader  could go before any
 tokenizer (like StandardTokenizer).
 
 Can you open a JIRA bug for this?  The fix would be a special version
 of HTMLStripReader integrated with a WhitespaceTokenizer to keep
 offsets correct.
 
 -Yonik
 
 
Is there a fix for this problem?

My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
highlighting still doesn't work: the highlighter marks the wrong spans
of text.
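
For anyone following along, the offset mismatch Yonik describes can be
reproduced outside Solr. The sketch below is only an illustration in
Python, not Solr code: the function names (strip_html, whitespace_tokens,
strip_html_with_offsets, corrected_tokens) are made up for this example,
and a simple regex stands in for HTMLStripReader, which is an assumption
and ignores entities, comments, and malformed markup. It shows the
two-phase behaviour (offsets computed against the stripped text) and one
possible way to keep offsets correct by remembering where each kept
character came from.

```python
import re

# Simplified stand-in for HTML stripping (assumption: tags only).
TAG = re.compile(r"<[^>]*>")


def strip_html(text):
    """Phase 1: drop tags, keeping only the text between them."""
    return TAG.sub("", text)


def whitespace_tokens(text):
    """Phase 2: split on whitespace, recording (term, start, end) offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def strip_html_with_offsets(text):
    """Strip tags, but remember each kept character's index in the original."""
    kept, origin = [], []
    pos = 0
    for m in TAG.finditer(text):
        for i in range(pos, m.start()):
            kept.append(text[i])
            origin.append(i)
        pos = m.end()
    for i in range(pos, len(text)):
        kept.append(text[i])
        origin.append(i)
    return "".join(kept), origin


def corrected_tokens(text):
    """Tokenize the stripped text, then map offsets back to the original."""
    stripped, origin = strip_html_with_offsets(text)
    return [
        (term, origin[start], origin[end - 1] + 1)
        for term, start, end in whitespace_tokens(stripped)
    ]


original = "<p>Hello <b>Solr</b> users</p>"
stripped = strip_html(original)

# Offsets are relative to the stripped text, as in the two-phase approach.
term, start, end = whitespace_tokens(stripped)[1]
print(term, repr(stripped[start:end]))   # correct against the stripped text
print(term, repr(original[start:end]))   # wrong span against the stored HTML

# Offsets mapped back to the original text select the right span.
for term, start, end in corrected_tokens(original):
    print(term, repr(original[start:end]))
```

Applying the stripped-text offsets to the stored HTML selects the wrong
characters, which matches the symptom above. Note that in this sketch a
token interrupted by a tag would map to a span that includes the tag; a
real fix would have to decide how to handle that case.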