Hi,
I added some log tracing so I can see more detail.
Findings so far:
- Both NutchBean and the webapp fail; only Luke succeeds (when I manually select the
correct analyzer).
- The analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer,
for the zh locale. I followed the multilingual support page on the Nutch wiki,
http://wiki.apache.org/nutch/MultiLingualSupport
Since the language identifier does not support the zh locale, I applied the hack
mentioned in another post on Nutch-User.
- The analyzer is loaded and the detected lang is what I expect.
- But only English and single-word queries return results. When I click explain, I
see only single-word terms, not what the CJKAnalyzer should produce.
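To make the expectation concrete: as far as I understand, CJKAnalyzer indexes a run of Chinese characters as overlapping two-character terms, so 中文搜索 should become 中文, 文搜, 搜索. The sketch below is only an illustration of that idea in plain Java, not Lucene's actual implementation; the class BigramSketch and the helpers bigrams and matches are hypothetical names, and the sample phrase is written with Unicode escapes. It also shows why a query that is not analyzed the same way as the index cannot match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper: split a run of CJK characters into overlapping
    // two-character terms. A single character is returned as one term.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<String>();
        if (text.length() == 1) {
            terms.add(text);
            return terms;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    // Hypothetical helper: a query matches only if every query term
    // is present among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Sample phrase of four Chinese characters (U+4E2D U+6587 U+641C U+7D22).
        String phrase = "\u4e2d\u6587\u641c\u7d22";
        List<String> indexed = bigrams(phrase);

        // Query analyzed the same way as the index: its single bigram is indexed.
        List<String> sameAnalysis = bigrams("\u4e2d\u6587");
        System.out.println(matches(indexed, sameAnalysis));   // true

        // Query kept as one keyword-style term: no such term was indexed.
        List<String> keywordStyle = Collections.singletonList(phrase);
        System.out.println(matches(indexed, keywordStyle));   // false
    }
}
```

If the explain output shows terms that do not look like these two-character pairs, that would suggest the query side is not going through the same analyzer as the index side.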
Plugin.xml:
<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="analysis-zh"
   name="Chinese Analysis Plug-in"
   version="1.0.0"
   provider-name="org.apache.nutch">
   <runtime>
      <library name="analysis-zh.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
      <import plugin="lib-lucene-analyzers"/>
   </requires>
   <extension id="org.apache.nutch.analysis.zh"
              name="ChineseAnalyzer"
              point="org.apache.nutch.analysis.NutchAnalyzer">
      <implementation id="org.apache.nutch.analysis.zh.ChineseAnalyzer"
                      class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
         <parameter name="lang" value="zh"/>
      </implementation>
   </extension>
</plugin>
Java class:
package org.apache.nutch.analysis.zh;

// JDK imports
import java.io.Reader;

// Lucene imports
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Nutch imports
import org.apache.nutch.analysis.NutchAnalyzer;

public class ChineseAnalyzer extends NutchAnalyzer {

    // Shared CJKAnalyzer instance that does the actual tokenization
    private final static Analyzer ANALYZER =
        new org.apache.lucene.analysis.cjk.CJKAnalyzer();

    /** Creates a new instance of ChineseAnalyzer */
    public ChineseAnalyzer() { }

    // Delegate tokenization of every field to the wrapped CJKAnalyzer
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return ANALYZER.tokenStream(fieldName, reader);
    }
}
plugin.includes setting:
<property>
   <name>plugin.includes</name>
   <value>analysis-(zh)|language-identifier|protocol-http|urlfilter-regex|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
Can anyone give me more hints about how the query is processed, so I can work out
which part is failing?
Sorry for posting repeatedly; I really don't know where the failure
point is...
Vinci wrote:
>
> Hi all,
>
> I have changed the Analyzer of Nutch to make it work with the Lucene
> sandbox analyzer. I used Luke to check the language and the query, and they
> look fine. However, the method posted in the wiki does not work
> for me, and most posts only mention how to make indexing work,
> not how to deal with the query when the plugin is in use.
>
> Now, looking at the catalina log, I see that it knows the language of the
> query...
>
> <timestamp> lang:zh
>
> But the result is not correct. When I trace the explain, I find the query is
> split in keyword-analyzer manner, not by the analyzer I use for zh via the plugin.
>
> Can anybody help me? Thanks a lot.
>
--
View this message in context:
http://www.nabble.com/Chnage-the-Analyzer-by-plugin---how-to-dealing-with-the-query--tp16077090p16077954.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.