On Mar 21, 2012, at 8:51am, Julien Nioche wrote:

> Hi guys,
> 
> Just wondering about the best way to make the language detection pluggable
> instead of having it hard-wired as it is now. We know that the resources
> that are currently in Tika are both slow and inaccurate [1] and there are
> other libraries that we could leverage. Why not have the option to select
> a different implementation just like we do for parsers? Obviously we'd need
> a common interface for the parsers etc...
> 
> What do you think?

I'd be more in favor of using that time to integrate a better language detector 
into Tika, so that everybody wins from the work :)

-- Ken


> [1]
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
> 
> -- 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr



