I used an existing ThaiAnalyzer from the Lucene package. OK, I renamed lucene.analysis.th.* to nutch.analysis.th.*, compiled it, and placed all the class files in a jar, analysis-th.jar. (Do I need to bundle the NGP file in the jar as well?)
1. You don't have to refactor the Lucene analyzer. Just wrap it, as I do with the French and German analyzers (both use analyzers from Lucene).
2. The analyzer doesn't need NGP files. I think you misunderstood something:
   2.1 On one side, there is the language identifier, which uses NGP files to identify the language of a document.
   2.2 On the other side, if a suitable analyzer is found for the identified language, it is used to analyze the document.

Regards
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
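To make points 1 and 2 concrete, here is a rough, self-contained sketch of the idea. It does not use the real Lucene or Nutch classes: `Analyzer`, `LuceneThaiAnalyzerStandIn`, `ThaiAnalyzerWrapper`, `identifyLanguage` and `analyzerFor` are all hypothetical names invented for illustration, the language check is a toy Unicode-range test rather than real NGP profile matching, and the fallback to a default analyzer is an assumption (the message above only says what happens when a suitable analyzer is found).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnalyzerSelectionSketch {

    /** Hypothetical stand-in for an analyzer interface; not the real Lucene API. */
    interface Analyzer {
        List<String> tokens(String text);
    }

    /** Stand-in for the existing Lucene ThaiAnalyzer, left in its own package, unmodified. */
    static class LuceneThaiAnalyzerStandIn implements Analyzer {
        public List<String> tokens(String text) {
            return Arrays.asList(text.trim().split("\\s+"));
        }
    }

    /** Point 1: wrap the existing analyzer and delegate to it instead of renaming it. */
    static class ThaiAnalyzerWrapper implements Analyzer {
        private final Analyzer delegate = new LuceneThaiAnalyzerStandIn();

        public List<String> tokens(String text) {
            return delegate.tokens(text);
        }
    }

    /** Assumed fallback used when no language-specific analyzer is registered. */
    static class DefaultAnalyzer implements Analyzer {
        public List<String> tokens(String text) {
            return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
        }
    }

    /**
     * Point 2.1: language identification is a separate step.
     * The real identifier compares n-gram profiles (NGP files); this toy
     * version only checks for characters in the Thai Unicode block.
     */
    static String identifyLanguage(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x0E00 && c <= 0x0E7F) {
                return "th";
            }
        }
        return "unknown";
    }

    static final Map<String, Analyzer> ANALYZERS = new HashMap<>();
    static final Analyzer DEFAULT = new DefaultAnalyzer();

    static {
        ANALYZERS.put("th", new ThaiAnalyzerWrapper());
    }

    /** Point 2.2: pick the analyzer registered for the identified language, if any. */
    static Analyzer analyzerFor(String lang) {
        Analyzer found = ANALYZERS.get(lang);
        return found != null ? found : DEFAULT;
    }

    public static void main(String[] args) {
        // Two Thai words written with Unicode escapes to avoid source-encoding issues.
        String doc = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35 \u0E04\u0E23\u0E31\u0E1A";
        String lang = identifyLanguage(doc);
        Analyzer analyzer = analyzerFor(lang);
        System.out.println(lang + " -> " + analyzer.getClass().getSimpleName()
                + " -> " + analyzer.tokens(doc).size() + " tokens");
    }
}
```

The point of the sketch is that the analyzer never touches the profile data: only the identification step would read NGP files, so the analyzer jar would not need to contain them.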