I tried increasing the timeout value, but I always got an 'idle
timeout' exception from Jetty.
After analysing the source code, GitHub Copilot suggested that the
extraction code in ExtractingDocumentLoader.java was not using the
tikaserver.timeoutSecs value from the configuration file.
I then applied the suggested fix, and it worked fine.
The following code was added at the end of the
public TikaServerExtractionBackend(String baseUrl, int timeoutSeconds,
NamedList<?> initArgs, long maxCharsLimit) constructor:

    // Configure the shared Jetty HttpClient's idle timeout so long-running
    // Tika requests don't get dropped by the client's default (30s) idle
    // timeout.
    try {
      HttpClient sharedClient = acquiredResourcesRef.get().client;
      if (sharedClient != null) {
        long idleMs = this.defaultTimeout.toMillis();
        sharedClient.setIdleTimeout(idleMs);
        if (log.isInfoEnabled()) {
          log.info("Set shared HttpClient idle timeout to {} ms", idleMs);
        }
      }
    } catch (Throwable t) {
      log.warn(
          "Unable to configure shared HttpClient idle timeout to {} ms",
          this.defaultTimeout.toMillis(),
          t);
    }

Should I open a new issue for this?
The problem I'm facing is that a large file has to go through OCR when
it is sent to Solr, and the request fails with an idle timeout exception.
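
In case it helps to reproduce the symptom without Solr or Tika, here is
a minimal sketch of the same kind of failure using plain JDK sockets. It
is only an analogy, not Solr or Jetty code: the class name
IdleTimeoutDemo, the method replyArrives and the delays are made up for
illustration, and Socket.setSoTimeout stands in for the HTTP client's
idle timeout. A server that stays silent longer than the client's
timeout (as a long OCR run would) makes the read fail, and raising the
timeout lets the same request succeed.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class IdleTimeoutDemo {

  /**
   * Hypothetical helper: returns true if the reply arrives before the
   * client's read timeout expires, false if the timeout fires first.
   */
  public static boolean replyArrives(int serverDelayMs, int idleTimeoutMs)
      throws Exception {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
      // Stand-in for a slow backend: accept, stay silent, then answer.
      Thread slowServer = new Thread(() -> {
        try (Socket s = server.accept()) {
          Thread.sleep(serverDelayMs);
          s.getOutputStream().write(1);
        } catch (IOException | InterruptedException ignored) {
          // The client may already have given up; nothing to do here.
        }
      });
      slowServer.setDaemon(true);
      slowServer.start();

      try (Socket client = new Socket(loopback, server.getLocalPort())) {
        // Plays the role of the client's idle timeout.
        client.setSoTimeout(idleTimeoutMs);
        return client.getInputStream().read() == 1;
      } catch (SocketTimeoutException e) {
        return false;
      } finally {
        slowServer.join();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Timeout shorter than the server's silence.
    System.out.println(replyArrives(500, 100));
    // Timeout longer than the server's silence.
    System.out.println(replyArrives(500, 2000));
  }
}
```

The first call models my situation before the change (timeout shorter
than the processing time), the second one models it after the idle
timeout is raised to match the configured value.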