> but I am having a problem: the thread that processes the pdf file keeps running, creating images and performing OCR. Is this supposed to happen?
TL;DR: yes, because there is no safe way to kill a thread.

Yes, that's supposed to happen. This feature was implemented because, in most cases, text extraction should finish within a reasonable time. At times, though, a bad file or a bug in the parser keeps the extraction process running, and that used to hold up indexing for the whole setup. With a timed-out extraction, the assumption is that Tika, or whichever parser is in play, might be stuck. Thread.stop could leave things in an incorrect state and potentially affect subsequent operations, so the extraction thread is left running while indexing moves on.

-Vikas (sent from mobile)
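
For illustration only, here is a minimal Java sketch of the general "wait with a timeout, then move on" pattern. It is not the actual indexing code: the class name `TimedExtraction`, the method `extractWithTimeout`, and the use of a plain `Callable<String>` as the parser are all assumptions made for the example. It shows why the worker can keep running after the timeout: the caller stops waiting, and `Future.cancel(true)` only requests an interrupt, which a stuck parser may never notice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of any real library.
public class TimedExtraction {

    // Daemon threads, so a stuck parser cannot keep the JVM alive at shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "extraction-worker");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the parser task and waits at most timeoutMs for its result.
     * Returns null when the caller gives up waiting.
     */
    static String extractWithTimeout(Callable<String> parser, long timeoutMs) {
        Future<String> future = POOL.submit(parser);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only sets the worker's interrupt flag. Code that
            // never checks it keeps running. Thread.stop is deliberately avoided.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("parser failed", e.getCause());
        }
    }
}
```

With this shape, the caller gets control back after the timeout and can continue with the next document, while a worker that ignores interrupts carries on in the background until it finishes by itself.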