On Mar 21, 2012, at 8:51am, Julien Nioche wrote:

> Hi guys,
>
> Just wondering about the best way to make the language detection pluggable
> instead of having it hard-wired as it is now. We know that the resources
> that are currently in Tika are both slow and inaccurate [1] and there are
> other libraries that we could leverage. Why not have the option to select
> a different implementation just like we do for parsers? Obviously we'd need
> a common interface for the parsers etc...
>
> What do you think?

I'd be more in favor of using that time to integrate a better language
detector into Tika, so that everybody wins from the work :)

-- Ken

> [1]
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr