Thank you for the help; I'll surely give feedback after I am done.
On 1/12/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>
> > Could you tell me where I can find documentation on how to use NGramProfile
> > to train the language identifier, and how to detect the language?
>
> Unfortunately, there's no help document.
> Here is how to use the NGramProfile:
> java org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
> Where:
> * profile-name is the ISO-639 language code (en, fr, de, ...) of the
>   language profile you want to create (mr for Marathi)
> * filename is the name of the file you want to use to create the profile.
> * encoding is the encoding of the file named filename.
>
> Once your profile is created, the detection part is done.
> Just add the languageidentifier plugin to your Nutch conf.
> Perform a crawl, and if everything is working fine you should see a trace
> with something like:
> Analysis .... with analyzer ..... (language-code)
>
> Since you don't provide a specific analyzer associated with your new
> language code (mr), the default NutchAnalyzer will be used.
>
> Then create an Analyzer for Marathi by creating a new plugin (see, for
> instance, the analysis-de or analysis-fr plugins provided in the Nutch source).
> Here is what your plugin must provide:
> * An analyzer extension that implements the
>   org.apache.nutch.analysis.NutchAnalyzer interface.
> * A plugin.xml descriptor that declares the association between your
>   analyzer and the language it should be used for. Something like:
>   <implementation id="org.apache.nutch.analysis.mr.MarathiAnalyzer" class="org.apache.nutch.analysis.mr.MarathiAnalyzer" lang="mr"/>
>
> Once this plugin is finished, just add it to the list of activated plugins
> in your configuration. The next time you perform a crawl, this analyzer
> will be used for documents identified as Marathi.
>
> >
> > Will it be OK if I use StopAnalyzer instead of NutchDocumentAnalyzer with
> > my custom stopwords?
>
> It's a first step toward a language-specific analyzer.
>
> > Where do I have to make changes in the Nutch code?
>
> As you can see, there are no changes to make in the Nutch code.
> Just provide some more pieces of code to plug into Nutch.
>
> If you can give us feedback on integrating Marathi into Nutch, it will be
> much appreciated.
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
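For anyone following this thread: as I understand it, NGramProfile -create builds a character n-gram profile from a sample file, and the language identifier picks the stored profile closest to a document's profile. The plain-Java sketch below only illustrates that idea; it is not Nutch's code, and the class name SimpleNGramProfile, the n-gram lengths 1 to 3, the "_" word padding, and the sum-of-absolute-differences distance are all assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified character n-gram profile (NOT Nutch's NGramProfile).
 * Each word is padded with '_' and every n-gram of length MIN_N..MAX_N is counted.
 * Counts are normalized to relative frequencies so that profiles built from
 * texts of different sizes can be compared.
 */
public class SimpleNGramProfile {
    static final int MIN_N = 1; // assumed minimum n-gram length
    static final int MAX_N = 3; // assumed maximum n-gram length

    private final Map<String, Double> freqs = new HashMap<>();

    public SimpleNGramProfile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        // Split on whitespace only, so Devanagari vowel signs stay inside words.
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String padded = "_" + word + "_";
            for (int n = MIN_N; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                    total++;
                }
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (double) total);
        }
    }

    /** Sum of absolute frequency differences over all n-grams; 0 means identical profiles. */
    public double distance(SimpleNGramProfile other) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : freqs.entrySet()) {
            sum += Math.abs(e.getValue() - other.freqs.getOrDefault(e.getKey(), 0.0));
        }
        // N-grams present only in the other profile contribute their full frequency.
        for (Map.Entry<String, Double> e : other.freqs.entrySet()) {
            if (!freqs.containsKey(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    /** Returns the label (e.g. "en", "mr") of the profile closest to text. */
    public static String identify(String text, Map<String, SimpleNGramProfile> profiles) {
        SimpleNGramProfile query = new SimpleNGramProfile(text);
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<String, SimpleNGramProfile> e : profiles.entrySet()) {
            double d = query.distance(e.getValue());
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For example, with one profile built from an English sample and one from a Marathi sample, identify("the dog jumps", profiles) returns "en": a Latin-script query shares no n-grams with the Devanagari profile, so that distance is the maximum value 2.0, while any overlap with the English profile makes its distance smaller.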
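On the StopAnalyzer question, a stop-word analyzer essentially splits text into words and drops any word found in a stop list. The sketch below shows that step in plain Java with a custom stop list; it does not use Lucene's or Nutch's classes, and the class name StopWordTokenizer and the sample stop words are hypothetical. One point worth checking with whatever tokenizer you use for Marathi: Devanagari vowel signs (for example U+093E) are combining marks, not letters, according to Character.isLetter, so a tokenizer that keeps only Character.isLetter characters would split Marathi words apart. This sketch therefore treats combining marks as word characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical stop-word filtering tokenizer (plain Java, NOT Lucene's StopAnalyzer).
 * Splits text into words and drops any word found in a custom stop list.
 */
public class StopWordTokenizer {
    private final Set<String> stopWords = new HashSet<>();

    public StopWordTokenizer(String... stopWords) {
        for (String w : stopWords) {
            this.stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    /**
     * Letters and digits are word characters, and so are combining marks
     * (Devanagari vowel signs and the virama), which Character.isLetter rejects.
     */
    static boolean isWordChar(int cp) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    /** Returns the lower-cased tokens of text that are not stop words, in order. */
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isWordChar(cp)) {
                current.appendCodePoint(cp);
            } else {
                flush(current, tokens);
            }
        });
        flush(current, tokens); // emit the last word, if any
        return tokens;
    }

    /** Adds the buffered word (unless it is a stop word) and clears the buffer. */
    private void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString().toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }
}
```

For instance, new StopWordTokenizer("the", "a").tokenize("The cat, a dog") returns [cat, dog]. In a real Nutch plugin this filtering would live inside the analyzer class declared in plugin.xml; the sketch only shows the filtering logic itself.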
