Hi,

I increased the maximum time for text extraction (I set it to 300 seconds) and
tested it with a PDF file with many pages. The timeout shows up in the log at
the expected time:
2019-08-23 09:02:38,380 DEBUG [org.apache.jackrabbit.oak.plugins.index.search.spi.binary.FulltextBinaryTextExtractor] (async-index-update-async) Extracting /repo1/Carpeta1/File1/jcr:content@jcr:data, 4332681 bytes
2019-08-23 09:07:38,389 WARN [org.apache.jackrabbit.oak.plugins.index.search.spi.binary.FulltextBinaryTextExtractor] (async-index-update-async) [/oak:index/LuceneFullText] Failed to extract text from a binary property due to timeout: /repo1/Carpeta1/File1/jcr:content@jcr:data.

However, I have a problem: the thread that processes the PDF file keeps
running, rendering images and performing OCR. Is this expected? Should I
check for something in that thread? (By the way, my application server is
WildFly 10; I don't know whether that matters.)
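The log only shows that the indexer stopped waiting. A generic java.util.concurrent sketch (not Oak's code; whether Oak cancels the extraction task on timeout is an assumption to check in its source) illustrates why a worker can outlive a timeout: Future.get(timeout) only bounds how long the caller waits, and Future.cancel(true) merely interrupts the worker, which stops only if the running code checks for interruption. TimeoutDemo and its methods are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutDemo {
    // Hypothetical stand-in for a long parse (e.g. PDF rendering + OCR) that never checks for interruption.
    static Callable<Void> stubbornTask(AtomicBoolean finished, long millis) {
        return () -> {
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
            while (System.nanoTime() < end) {
                // busy work; the interrupt flag is never consulted
            }
            finished.set(true);
            return null;
        };
    }

    // Same work, but it gives up as soon as its thread is interrupted.
    static Callable<Void> cooperativeTask(AtomicBoolean finished, long millis) {
        return () -> {
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
            while (System.nanoTime() < end) {
                if (Thread.currentThread().isInterrupted()) {
                    return null; // stop early; finished stays false
                }
            }
            finished.set(true);
            return null;
        };
    }

    // Submits a 500 ms task, waits only 100 ms for it, optionally cancels it,
    // then reports whether the task still ran to completion in the background.
    static boolean finishedAfterTimeout(boolean cooperative, boolean cancelOnTimeout) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        Callable<Void> task = cooperative ? cooperativeTask(finished, 500) : stubbornTask(finished, 500);
        Future<Void> future = pool.submit(task);
        try {
            future.get(100, TimeUnit.MILLISECONDS);
            throw new IllegalStateException("task was expected to exceed the timeout");
        } catch (TimeoutException e) {
            // The caller gives up here, but the worker thread keeps running...
            if (cancelOnTimeout) {
                future.cancel(true); // ...unless cancelled, which only sets its interrupt flag
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        pool.shutdown(); // does not interrupt a task that is still running
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("stubborn, cancelled:    " + finishedAfterTimeout(false, true));
        System.out.println("cooperative, no cancel: " + finishedAfterTimeout(true, false));
        System.out.println("cooperative, cancelled: " + finishedAfterTimeout(true, true));
    }
}
```

On an unloaded machine this should print true, true, false: the stubborn task finishes even after being cancelled, while the cooperative one stops only when it is actually cancelled. Whether the PDF rendering and OCR code reacts to interrupts is something this sketch cannot tell.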

I will try again with oak.extraction.inCallerThread=true to see what
happens.
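For comparison, a minimal sketch of what the two switches discussed in this thread might amount to. It assumes (not verified against Oak) that oak.extraction.timeoutSeconds is read with Integer.getInteger and a default of 60, that oak.extraction.inCallerThread is read with Boolean.getBoolean, and that the inline path applies no timeout at all; ExtractionSettings and run are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ExtractionSettings {
    final int timeoutSeconds;
    final boolean inCallerThread;

    ExtractionSettings(int timeoutSeconds, boolean inCallerThread) {
        this.timeoutSeconds = timeoutSeconds;
        this.inCallerThread = inCallerThread;
    }

    // Reads the two -D switches; the default of 60 seconds is an assumption, not taken from Oak.
    static ExtractionSettings fromSystemProperties() {
        return new ExtractionSettings(
                Integer.getInteger("oak.extraction.timeoutSeconds", 60),
                Boolean.getBoolean("oak.extraction.inCallerThread"));
    }

    // Runs an extraction either inline (no timeout) or on a worker with a bounded wait.
    <T> T run(Callable<T> extraction, ExecutorService pool) throws Exception {
        if (inCallerThread) {
            // The calling (indexing) thread does the work itself, however long it takes.
            return extraction.call();
        }
        Future<T> future = pool.submit(extraction);
        // May throw TimeoutException; that does not stop the worker thread.
        return future.get(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExtractionSettings s = fromSystemProperties();
        System.out.println("timeout=" + s.timeoutSeconds + "s, inCallerThread=" + s.inCallerThread);
    }
}
```

Started as `java -Doak.extraction.timeoutSeconds=300 -Doak.extraction.inCallerThread=true ExtractionSettings`, it prints timeout=300s, inCallerThread=true. The trade-off in this sketch: running inline leaves no orphaned worker thread, but nothing stops a slow extraction from holding up the indexing thread. If Oak reads these values only once when the class loads (the linked source line is where to check), they have to be set at JVM startup.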

Regards,

Jorge Flórez

On Fri, 23 Aug 2019 at 7:13, jorgeeflorez . (<jorgeeduardoflo...@gmail.com>) wrote:

> Hi Vikas,
>
> Thank you for your reply. I will try changing those parameters and see
> what happens.
> To answer one of my own questions: I found that text is extracted from PDF
> files only if I add <mime>application/pdf</mime> to the DefaultParser in
> the index's Tika config file.
>
> Regards.
> Jorge Flórez
>
>
> On Thu, 22 Aug 2019 at 12:43, Vikas Saurabh (<vikas.saur...@gmail.com>)
> wrote:
>
>> Hi,
>>
>> > Is it possible to change the maximum time for that text extraction?
>>
>> You should be able to configure the timeout by setting
>> -Doak.extraction.timeoutSeconds=120
>> [0] on the JVM command line.
>>
>> Alternatively, you could disable running the extraction in a separate
>> thread by setting -Doak.extraction.inCallerThread=true.
>>
>> Hope that helps.
>>
>> [0]: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ExtractedTextCache.java?view=markup&pathrev=1814745#l61
>>
>> --Vikas
>> (sent from mobile)
>>
>
