Hi,

I increased the maximum time for the text extraction (I set it to 300 seconds) and tested it using a PDF file with many pages. The timeout shows up in the log at the expected time:

2019-08-23 09:02:38,380 DEBUG [org.apache.jackrabbit.oak.plugins.index.search.spi.binary.FulltextBinaryTextExtractor] (async-index-update-async) Extracting /repo1/Carpeta1/File1/jcr:content@jcr:data, 4332681 bytes
2019-08-23 09:07:38,389 WARN [org.apache.jackrabbit.oak.plugins.index.search.spi.binary.FulltextBinaryTextExtractor] (async-index-update-async) [/oak:index/LuceneFullText] Failed to extract text from a binary property due to timeout: /repo1/Carpeta1/File1/jcr:content@jcr:data.

However, I am having a problem: the thread that processes the PDF file keeps running after the timeout, creating images and performing OCR. Is this supposed to happen? Should I check for something in that thread? (BTW, my application server is WildFly 10; I don't know whether that affects this.)

I will try again with oak.extraction.inCallerThread=true to see what happens.

Regards,
Jorge Flórez

On Fri, Aug 23, 2019 at 7:13, jorgeeflorez . (<jorgeeduardoflo...@gmail.com>) wrote:

> Hi Vikas,
>
> Thank you for your reply. I will try to change those parameters and see
> what happens.
> To answer one of my questions, I found that text is extracted only from
> PDF if I add <mime>application/pdf</mime> to DefaultParser in the index
> Tika config file.
>
> Regards.
> Jorge Flórez
>
>
> On Thu, Aug 22, 2019 at 12:43, Vikas Saurabh (<vikas.saur...@gmail.com>)
> wrote:
>
>> Hi,
>>
>> > Is it possible to change the maximum time for that text extraction
>>
>> You should be able to configure the timeout by setting
>> -Doak.extraction.timeoutSeconds=120
>> [0] on the JVM command line.
>>
>> Alternatively, you could also disable running in a different thread by
>> setting -Doak.extraction.inCallerThread=true
>>
>> Hope that helps.
>>
>> [0]:
>> http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ExtractedTextCache.java?view=markup&pathrev=1814745#l61
>>
>> --Vikas
>> (sent from mobile)
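
A note on the question above about the extraction thread that keeps running after the timeout. This is only a sketch of general Java behavior, not Oak's actual code: it assumes the work is submitted to a java.util.concurrent executor and the caller waits with a timed Future.get. The class name TimeoutDemo, the method taskFinishedAfterTimeout and the busy-wait workload are hypothetical stand-ins for the real extraction. Under those assumptions, a timeout only ends the caller's wait; the worker thread carries on until its task returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutDemo {

    /**
     * Submits a slow task, waits for it with a timeout, and reports whether
     * the task still ran to completion after the caller stopped waiting.
     */
    public static boolean taskFinishedAfterTimeout(long workMillis, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        try {
            Future<String> future = executor.submit(() -> {
                // Hypothetical stand-in for a long extraction: busy work that
                // never checks the thread's interrupt flag.
                long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMillis);
                while (System.nanoTime() < end) {
                    // simulated work
                }
                finished.set(true);
                return "extracted text";
            });
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Only the wait is abandoned here; nothing stops the worker.
                System.out.println("Timed out; task done yet? " + future.isDone());
            }
        } finally {
            // shutdown() lets the already submitted task run to completion.
            executor.shutdown();
            executor.awaitTermination(workMillis * 10, TimeUnit.MILLISECONDS);
        }
        return finished.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Task completed after timeout: "
                + taskFinishedAfterTimeout(500, 100));
    }
}
```

Calling future.cancel(true) in the catch block would only set the worker's interrupt flag, so a task that never checks that flag or blocks on an interruptible call would still continue. Whether the PDF and OCR code reacts to interruption is something I have not verified.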