Re: Guidance to avoid Tika's integration with Solr's ExtractingRequestHandler in production

Dave Fisher Tue, 29 May 2018 12:22:16 -0700

Having run a Solr service, I can say that you strive for quick responses to 
queries and want to avoid anything that can pause the JVM. You work hard to 
keep your updates quick and NRT. Text extraction from XML-based documents like 
Office files and from big object files like PDF is memory intensive and should 
be sandboxed on separate VMs.
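
To illustrate the sandboxing idea, here is a minimal sketch that runs 
extraction in a child process with a hard timeout, so a hang or crash stays 
out of the indexing process. It is not how SolrCell or Tika work internally. 
The function name extract_text and the command argument are hypothetical 
placeholders for whatever out-of-process extractor you actually run, and the 
stand-in commands in the demo just invoke the Python interpreter.

```python
import subprocess
import sys
from typing import Optional, Sequence


def extract_text(command: Sequence[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run an extractor command in a child process; return its stdout or None.

    `command` is a hypothetical placeholder for the real extractor invocation.
    """
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,            # decode output to str
            timeout=timeout_s,    # hard upper bound on extraction time
            check=True,           # raise on a non-zero exit status
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a hung parse
        # does not linger in the background.
        return None
    except subprocess.CalledProcessError:
        # A crash in the child (for example, running out of memory) shows up
        # here as a non-zero exit and does not take down the caller.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in commands: one that finishes quickly and one that hangs.
    fast = [sys.executable, "-c", "print('extracted text')"]
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(extract_text(fast))
    print(extract_text(hung, timeout_s=1))
```

The caller can then send the returned text to Solr as an ordinary update and 
log or skip any document for which extraction returned None.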


Regards,
Dave

> On May 29, 2018, at 12:11 PM, Ken Krugler <[email protected]> wrote:
> 
> Thanks for the ref, Tim.
> 
> I’m curious why SolrCell doesn’t fire up threads when parsing docs with Tika 
> (or use the fork parser), to mitigate issues with hangs & crashes?
> 
> — Ken
> 
>> On May 29, 2018, at 11:54 AM, Tim Allison <[email protected]> wrote:
>> 
>> All,
>> 
>> Over the weekend, Shawn Heisey very kindly drafted a wikipage about the
>> challenges of using Solr's ExtractingRequestHandler and the guidance to
>> avoid it in production.
>> 
>>  I completely agree with this point, and I think that Shawn did a very
>> nice job of capturing some of the challenges.  If you have any feedback or
>> would like to make edits, see:
>> 
>> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>> 
>>  Cheers,
>> 
>>                Tim
> 
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>

