Re: CharFilter, analysis.jsp

2009-08-17 Thread Yonik Seeley
I broke it with reusable token streams.  Just checked in a fix - can
you try now?

-Yonik
http://www.lucidimagination.com


On Mon, Aug 17, 2009 at 10:17 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
 I'm interested in using a CharFilter, something like this:

    <fieldType name="html_text" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

 I'm hoping to be able to put in a value like
 <html><body>whatever</body></html> and have "whatever" come back out. In
 analysis.jsp, I see that happening in the verbose output, but it doesn't make
 it to the tokenizer input; the original string makes it there instead.

 I must be misunderstanding something about CharFilters and how to use them
 in Solr. HTMLStripWhitespaceTokenizerFactory is deprecated in favor of the
 above design, I think, but it does what I'm after.

 Solr only seems to use CharFilters in analysis.jsp. Is that correct?
 Shouldn't they be factored into the analyzer for each field (as in
 FieldAnalysisRequestHandler)?

 Thanks,
        Erik
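To illustrate what Erik expects the charFilter to do, here is a minimal, stdlib-only Java sketch. The classes TagStrippingReader and CharFilterSketch are hypothetical stand-ins, not Solr's HTMLStripCharFilter or WhitespaceTokenizer: the "char filter" is simply a Reader wrapper that drops everything from '<' through '>', and the tokenizer splits on whitespace. It shows why the tokenizer must read from the filtered Reader, not the original string, for "whatever" to come out.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified char filter: a Reader wrapper that drops everything
// from '<' through '>' before the tokenizer sees it. Solr's real
// HTMLStripCharFilter also handles entities, comments, scripts and offsets.
class TagStrippingReader extends FilterReader {
    private boolean inTag = false;

    TagStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }
}

public class CharFilterSketch {
    // Whitespace tokenizer: splits whatever characters the Reader hands it.
    static List<String> whitespaceTokenize(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append((char) c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Runs the tokenizer on the raw input, or on the char-filtered input.
    static List<String> analyze(String text, boolean useCharFilter) {
        Reader source = new StringReader(text);
        Reader input = useCharFilter ? new TagStrippingReader(source) : source;
        try {
            return whitespaceTokenize(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body>whatever</body></html>";
        System.out.println(analyze(html, false)); // [<html><body>whatever</body></html>]
        System.out.println(analyze(html, true));  // [whatever]
    }
}
```

The first call reproduces the symptom Erik saw (the tokenizer receives the original string, which has no whitespace, so it becomes one token); the second is the intended behavior.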





2009-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2009 at 11:03 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
 That fixes it with analysis.jsp, but I don't think it fixes
 FieldAnalysisRequestHandler. Using the field definition below and this request:

 http://localhost:8983/solr/analysis/field?analysis.fieldtype=html_text&analysis.fieldvalue=%3Chtml%3E%3Cbody%3Ewhatever%3C/body%3E%3C/html%3E

 I still see <str name="text"><html><body>whatever</body></html></str> come
 out of WhitespaceTokenizer.

 Does the consumer of an Analyzer from a FieldType have to do anything
 special to utilize CharFilters? Or should it all just work?
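A request like the one Erik quotes can be built programmatically so that the two parameters are joined with `&` and the field value is URL-encoded. This sketch uses only `java.net.URLEncoder`; the class name FieldAnalysisUrl is hypothetical, and the host, port and `/solr/analysis/field` path are copied from the message and may differ in another setup. Note that URLEncoder also encodes `/` as `%2F`, which the quoted URL left literal; both decode to the same value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FieldAnalysisUrl {
    // Builds a field-analysis request URL. Host, port and path are assumptions
    // taken from the mailing-list message, not fixed Solr defaults.
    static String buildUrl(String fieldType, String fieldValue) {
        return "http://localhost:8983/solr/analysis/field"
                + "?analysis.fieldtype=" + URLEncoder.encode(fieldType, StandardCharsets.UTF_8)
                + "&analysis.fieldvalue=" + URLEncoder.encode(fieldValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // '<' -> %3C, '>' -> %3E, '/' -> %2F
        System.out.println(buildUrl("html_text", "<html><body>whatever</body></html>"));
    }
}
```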

Normal users of the Analyzer should see it just work - but
FieldAnalysisRequestHandler doesn't use the Analyzer... it pulls it
apart and uses the parts separately.  It would be up to that code to
apply any char filters, and apparently it doesn't.

-Yonik