You'll need to be careful of the classloader issues if you do that...

The core Nutch code needs only the mime type detection, but if you load
Tika from the lib directory rather than from the plugins/lib
directory, it won't be able to find any extensions.  I've used Tika to
implement a docx plugin, and ran into all of these problems.
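
To illustrate the visibility rule behind this, here is a minimal sketch.
It is not Nutch's actual plugin loader: the class name, the temp
directory standing in for plugins/lib, and the registration file name
are all made up for the example. It only shows that a parent class
loader cannot see what a child loader adds, while the child can:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderVisibility {
    // Hypothetical registration file; not a real Tika or Nutch resource name.
    static final String RESOURCE = "META-INF/services/example.hypothetical.Parser";

    // Creates a temp directory that stands in for a plugin's lib directory
    // and drops the registration file into it.
    static Path makePluginDir() throws IOException {
        Path dir = Files.createTempDirectory("plugin-lib");
        Path res = dir.resolve(RESOURCE);
        Files.createDirectories(res.getParent());
        Files.write(res, "example.hypothetical.DocxParser\n".getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // True if the given loader (or one of its parents) can find the file.
    static boolean visibleFrom(ClassLoader loader) {
        return loader.getResource(RESOURCE) != null;
    }

    public static void main(String[] args) throws IOException {
        Path pluginDir = makePluginDir();
        // Stand-in for the loader that holds the core jars (the lib directory).
        ClassLoader core = ClassLoader.getSystemClassLoader();
        // Stand-in for a plugin loader: a child of the core loader.
        try (URLClassLoader plugin =
                 new URLClassLoader(new URL[] { pluginDir.toUri().toURL() }, core)) {
            System.out.println("core sees registration:   " + visibleFrom(core));
            System.out.println("plugin sees registration: " + visibleFrom(plugin));
        }
    }
}
```

Lookups delegate from child to parent and never the other way, so code
loaded by the core loader has no way to discover anything that only
lives under a plugin's loader.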

Kirby


On Thu, Nov 12, 2009 at 8:41 AM, Julien Nioche
<lists.digitalpeb...@gmail.com> wrote:
> Speaking of which, I'm planning to do some work on the Tika integration
> within the next week or so. Basically, I'll create a new plugin which will
> be used for the mime types that Tika can already handle while keeping some
> of the existing plugins for the more complex cases. This should allow us to
> have a first version of the Tika integration without losing any of the
> existing functionality. I will update the list as soon as I have something
> working and will create a JIRA issue.
>
> J.
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
> 2009/11/10 Andrzej Bialecki <a...@getopt.org>
>>
>> BrunoWL wrote:
>>>
>>> Hi. I'm a beginner in Nutch. Can anybody tell me how to make Nutch use
>>> parsers from Tika?
>>> I did all kinds of searches and didn't find an answer.
>>
>> Tika parsers are not integrated yet with Nutch - we use our own parsers,
>> and in most cases they are of similar quality to those in Tika (since most
>> Tika parsers originated in Nutch). Tight Tika integration is on the roadmap.
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>
