Thank you for the help.

I'll be sure to give feedback once I am done.

On 1/12/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> > Could you tell me where I can find documentation on how to use
> > NGramProfile to train the language identifier, and how to run the
> > detection?
>
> Unfortunately, there is no help document.
> Here is how to use the NGramProfile:
> java org.apache.nutch.analysis.lang.NGramProfile -create <profile-name>
> <filename> <encoding>
> Where:
> * profile-name is the ISO-639 language code (en, fr, de, ...) of the
> language profile you want to create (mr for Marathi)
> * filename is the name of the file you want to use to create the profile.
> * encoding is the encoding of the file named filename
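
For intuition about what a profile built with -create contains, here is a minimal sketch of a character n-gram frequency profile. It is an assumption-laden illustration, not the actual NGramProfile code: the class NGramProfileSketch, its method names, the '_' padding character, and the n-gram length range are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: NOT the Nutch NGramProfile implementation.
final class NGramProfileSketch {

    // Count every character n-gram of length minLen..maxLen in each word.
    // Words are padded with '_' so that word boundaries become part of the
    // n-grams (the padding character is an assumption of this sketch).
    static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty token on leading whitespace
            }
            String padded = "_" + word + "_";
            for (int n = minLen; n <= maxLen; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Order n-grams by descending count (ties broken alphabetically) and
    // keep only the `limit` most frequent ones.
    static List<String> topNGrams(Map<String, Integer> counts, int limit) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(grams.subList(0, Math.min(limit, grams.size())));
    }
}
```

A language identifier of this kind would typically compare the ranked n-grams of an unknown document against the ranked n-grams of each stored language profile, which is why a reasonably large and clean Marathi training file matters.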
>
> Once your profile is created, there is nothing more to do for detection.
> Just add the languageidentifier plugin to your Nutch conf.
> Perform a crawl, and if all is working fine you should see a trace with
> something like:
> Analysis .... with analyzer ..... (language-code)
>
> Since you don't provide a specific analyzer associated with your new
> language code (mr), the default NutchAnalyzer will be used.
>
> Then create an Analyzer for Marathi by creating a new plugin (see for
> instance analysis-de or analysis-fr plugins provided in the Nutch source).
> Here is what your plugin must provide:
> * An analyzer extension that implements the
> org.apache.nutch.analysis.NutchAnalyzer interface.
> * The plugin.xml descriptor of your plugin must declare the association
> between your analyzer and the language it should be used for. Something
> like:
> <implementation id="org.apache.nutch.analysis.mr.MarathiAnalyzer"
>     class="org.apache.nutch.analysis.mr.MarathiAnalyzer" lang="mr"/>
>
> Once this plugin is finished, just add it to the list of activated plugins
> in your configuration. Then the next time you perform a crawl, this analyzer
> will be used for documents identified as Marathi documents.
>
>
> >
> > Will it be OK if I use Stop Analyzer instead of NutchDocumentAnalyzer with
> > my custom stopwords?
>
> It's a first step toward a language-specific analyzer.
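
To make the stopword idea concrete, here is a minimal, dependency-free sketch of what a stopword-based analysis step does: lowercase the text, split it into tokens, and drop the tokens found in a custom stopword list. It does not use the Lucene or Nutch classes; StopwordFilterSketch and its methods are hypothetical names, and the tokenization rule (keeping letters, combining marks, and digits together, so that Devanagari vowel signs stay attached to their words) is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: NOT Lucene's StopAnalyzer or a Nutch analyzer.
final class StopwordFilterSketch {

    private final Set<String> stopwords = new HashSet<>();

    // Store the custom stopwords in lowercase so lookups match the
    // lowercased tokens produced by tokenize().
    StopwordFilterSketch(String... words) {
        for (String word : words) {
            stopwords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Lowercase the text, split on anything that is not a letter, a
    // combining mark, or a digit, and drop empty tokens and stopwords.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String token : lowered.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, `new StopwordFilterSketch("the", "of").tokenize("The history of Nutch")` keeps only `history` and `nutch`. A real Marathi analyzer would plug the same idea into the analyzer extension point described above, with a Marathi stopword list.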
>
>
> > Where do I have to make changes in the Nutch code?
>
> As you can see, no changes to the Nutch code are needed.
> Just provide some additional code to plug into Nutch.
>
> If you can give us feedback on integrating Marathi into Nutch, it would be
> much appreciated.
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>
>
