Y.  Will do.  I'll be interested to compare the performance.  One of the
obvious pros of tika-server is that you can move Tika off your vm and m.
:)  The downside is that you have to manage it and open ports, which is
simple in many applications and impossible in others.
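
For anyone weighing the tika-server option, here is a minimal sketch of what the
client side could look like. It assumes a tika-server instance is already running
on localhost at the default port 9998 and that plain text is requested from the
/tika endpoint. The helper names (build_tika_request, extract_text) are made up
for this example.

```python
import urllib.request


def build_tika_request(data: bytes, base_url: str = "http://localhost:9998"):
    """Build a PUT request for tika-server's /tika endpoint (hypothetical helper)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
    )
    # Ask for plain extracted text rather than XHTML.
    req.add_header("Accept", "text/plain")
    return req


def extract_text(data: bytes, base_url: str = "http://localhost:9998",
                 timeout_s: float = 30.0) -> str:
    """Send the document to the server; the timeout bounds how long the caller waits."""
    req = build_tika_request(data, base_url)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        # Assumption: the server returns UTF-8 text for Accept: text/plain.
        return resp.read().decode("utf-8", errors="replace")
```

Note that the timeout only protects the caller. If the server itself hangs or runs
out of memory on a document, something still has to restart it.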

Once my refactoring is done, we should offer the use of the ForkParser
within tika-server, because tika-server is itself vulnerable to permanent
hangs and OOMs...
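
To make the idea concrete, here is a rough sketch of why a separate process helps
where threads do not: the parent can enforce a timeout and survive a child that
hangs or dies. This only illustrates the general technique, not how the ForkParser
is implemented. The function name and the stand-in "parsers" are invented for the
example, and it starts a fresh child per document for simplicity.

```python
import subprocess
import sys

# Stand-in "parsers", each run in its own child process (all hypothetical).
GOOD = "import sys; sys.stdout.write(sys.stdin.read().upper())"
HANG = "import time; time.sleep(60)"
CRASH = "import os; os._exit(3)"


def parse_in_child(child_code: str, payload: bytes, timeout_s: float = 5.0):
    """Run child_code in a fresh interpreter and return (ok, output_or_reason)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code],
            input=payload,
            capture_output=True,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode != 0:
        # A crash (or an OOM kill) takes down only the child, not the caller.
        return False, f"exit code {proc.returncode}"
    return True, proc.stdout.decode("utf-8", errors="replace")
```

Calling parse_in_child(HANG, b"", timeout_s=1.0) gives back a failure instead of
blocking forever, and the CRASH case is reported the same way. A thread inside the
same JVM or interpreter cannot offer that guarantee.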

On Tue, May 29, 2018 at 3:35 PM, Luís Filipe Nassif <lfcnas...@gmail.com>
wrote:

> Hi Tim,
>
> Could you clarify the pros and cons between ForkParser (after your
> refactoring) and TikaServer? Maybe we should send those to users list and
> wiki...
>
> Thanks
>
> 2018-05-29 16:27 GMT-03:00 Tim Allison <talli...@apache.org>:
>
>> Ken,
>>   Once TIKA-2653 is done and 1.19(?) is released, I'll propose switching
>> ERH to the ForkParser.  There's also an open ticket for using tika-server.
>> I think users should have both options.
>>
>> On Tue, May 29, 2018 at 3:25 PM, Tim Allison <talli...@apache.org> wrote:
>>
>>> 1: CORRECTION: the ForkParser by itself (without my mods) will protect
>>> against ooms, permanent hangs, and native lib crashing.  My proposed mods
>>> (on TIKA-2653) only move the parser dependencies out of Solr's
>>> dependencies.
>>>
>>> 2: NOTE: See also the discussion on where to place this information.
>>> Cassandra Targett advocates putting this guidance in the main users' guide.
>>>
>>> On Tue, May 29, 2018 at 3:22 PM, Tim Allison <talli...@apache.org>
>>> wrote:
>>>
>>>> Y, my mods to the ForkParser should make it more robust, and will help
>>>> with OOMs, permanent hangs and native lib crashing.  But those changes are
>>>> still in the works...
>>>>
>>>> On Tue, May 29, 2018 at 3:18 PM, Luís Filipe Nassif <
>>>> lfcnas...@gmail.com> wrote:
>>>>
>>>>> Hi Ken,
>>>>>
>>>>> Threads will not help with OutOfMemoryErrors or crashes caused by
>>>>> native libs. ForkParser can help, after the refactoring started by Tim
>>>>> to handle some of its limitations. See TIKA-2653
>>>>>
>>>>> 2018-05-29 16:11 GMT-03:00 Ken Krugler <kkrugler_li...@transpac.com>:
>>>>>
>>>>> > Thanks for the ref, Tim.
>>>>> >
>>>>> > I’m curious why SolrCell doesn’t fire up threads when parsing docs
>>>>> > with Tika (or use the fork parser), to mitigate issues with hangs &
>>>>> > crashes?
>>>>> >
>>>>> > — Ken
>>>>> >
>>>>> > > On May 29, 2018, at 11:54 AM, Tim Allison <talli...@apache.org> wrote:
>>>>> > >
>>>>> > > All,
>>>>> > >
>>>>> > >  Over the weekend, Shawn Heisey very kindly drafted a wikipage about
>>>>> > > the challenges of using Solr's ExtractingRequestHandler and the
>>>>> > > guidance to avoid it in production.
>>>>> > >
>>>>> > >   I completely agree with this point, and I think that Shawn did a
>>>>> > > very nice job of capturing some of the challenges.  If you have any
>>>>> > > feedback or would like to make edits, see:
>>>>> > >
>>>>> > > https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>>>>> > >
>>>>> > >   Cheers,
>>>>> > >
>>>>> > >                 Tim
>>>>> >
>>>>> > --------------------------------------------
>>>>> > http://about.me/kkrugler
>>>>> > +1 530-210-6378
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
