When you run a Solr service, you want fast query responses and you want to 
avoid anything that can pause the JVM. You also work hard to keep updates 
quick and near real time (NRT). Text extraction from XML-based documents such 
as Office files, and from large binary files such as PDFs, is memory 
intensive and should be sandboxed on separate VMs.
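
On the question of threads: below is a minimal sketch, not anything SolrCell 
actually does, of running an extraction in a worker thread with a timeout. 
The class name, the method name, and the `extractor` function are all 
hypothetical; `extractor` stands in for whatever Tika call you would make. 
Note that this only guards against hangs. A parser that exhausts the heap 
still affects the JVM it runs in, and interrupting a stuck thread is not 
guaranteed to stop it, which is why a separate process or VM is the safer 
choice.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical helper: runs a text extractor in a worker thread and gives up
// after a timeout. The extractor is a stand-in for a real parser call.
class SandboxedExtraction {

    static Optional<String> extractWithTimeout(
            Function<byte[], String> extractor, byte[] doc, long timeoutMillis) {
        // Daemon thread, so a parse that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extract-worker");
            t.setDaemon(true);
            return t;
        });
        Callable<String> task = () -> extractor.apply(doc);
        Future<String> future = pool.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker, it cannot force it to stop.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The extractor threw; treat the document as unparseable.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Trivial extractor used only for demonstration.
        Function<byte[], String> fake = bytes -> new String(bytes);
        System.out.println(extractWithTimeout(fake, "hello".getBytes(), 1000));
    }
}
```

An empty result means the document timed out or failed to parse, so the 
indexing client can skip or log it instead of stalling.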

Regards,
Dave

> On May 29, 2018, at 12:11 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> 
> Thanks for the ref, Tim.
> 
> I’m curious why SolrCell doesn’t fire up threads when parsing docs with Tika 
> (or use the fork parser), to mitigate issues with hangs & crashes?
> 
> — Ken
> 
>> On May 29, 2018, at 11:54 AM, Tim Allison <talli...@apache.org> wrote:
>> 
>> All,
>> 
>> Over the weekend, Shawn Heisey very kindly drafted a wiki page about the
>> challenges of using Solr's ExtractingRequestHandler and the guidance to
>> avoid it in production.
>> 
>>  I completely agree with this point, and I think that Shawn did a very
>> nice job of capturing some of the challenges.  If you have any feedback or
>> would like to make edits, see:
>> 
>> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>> 
>>  Cheers,
>> 
>>                Tim
> 
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
> 
