Thanks for the clue, Tim. BUT the same 3.x-SNAPSHOT jars and configs work smoothly on a bare Linux host (taking Docker out of the equation):

DEBUG [pool-3-thread-4] 17:01:27,928 org.apache.tika.pipes.PipesClient pipesClientId=3: commandline: [java, -cp, Downloads/tika-emitter-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-pipes-iterator-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-app-3.3.0-jdk21-SNAPSHOT.jar:Downloads/tika-fetcher-s3-3.3.0-SNAPSHOT.jar, -Djava.awt.headless=true, -DpipesClientId=3, -Dlog4j.configurationFile=log4j2.xml, org.apache.tika.pipes.PipesServer, /home/mikhail-khludnev/git/norn-budget-control-demo/v-conversion-tikamd-job/docker/tika-config.xml, 100000, 300000, 1500000]
DEBUG [pool-3-thread-1] 17:01:27,979 org.apache.tika.pipes.async.AsyncProcessor fetchEmitWorker finished, total 1
DEBUG [pool-3-thread-6] 17:01:27,989 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
2026-02-19T14:01:28.290084314Z main DEBUG Apache Tika application 3.3 initializing configuration XmlConfiguration[location=/home/mikhail-khludnev/log4j2.xml, lastModified=2026-02-17T23:05:22.608Z]
...
DEBUG [main] 17:01:29,263 org.apache.tika.pipes.PipesServer pipes server initialized
DEBUG [main] 17:01:29,303 org.apache.tika.pipes.fetcher.s3.S3Fetcher about to fetch fetchkey=path/to/4Mb.pdf from bucket (test-bucket)

However, after I directed the child server process logs to a log file, the Docker run passed! Thanks, Tim. I wonder how it works and how the container environment affects console redirection. Looking forward to the 3.3 release with Markdown!

Overall, could you share your vision regarding using Tika in short-lived containers, and whether it makes sense at all? Which should I choose: the Tika app CLI in batch mode, or TikaPipes? Thank you twice!

On Fri, Feb 20, 2026 at 3:52 AM Tim Allison <[email protected]> wrote:

> Ugh. Thank you for reporting this.
>
> The problem may be that the logger from the forked process is writing to
> stdout or stderr (can’t remember off the top of my head), which is the comms
> channel in 3.x to the forking process. We’ve fixed this in 4.x.
>
> If you modify the forked process logging to write to a file or to the other
> stream, you should be ok.
>
> Please let us know how it goes.
>
>
> On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev <[email protected]> wrote:
>
> > FWIW, just to let you know about the dead end.
> >
> > I'm a big fan of serverless containers (see TIKA-4529), but I decided to go
> > further and use the S3 fetcher and S3 emitter, which led me to TikaAsyncCLI.
> > I've put it into Docker with tesseract, etc.
> > Finally, it pulls a 4Mb pdf from S3 and spins up the TikaServer JVM, which
> > launches those binary tools to check their availability and then just dies:
> >
> > org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java, -cp, /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar, -Djava.awt.headless=true, -DpipesClientId=0, -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport, -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15, org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000, 1500000]
> >
> > .PipesClient pipesClientId=0: From forked process before start byte: DEBUG [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing requests
> > org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path: [tesseract]): true
> > s.PipesServer timer -- initialize parser and other resources: 939 ms
> > DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server initialized
> >
> > TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient pipesClientId=0: timer -- write tuple: 24 ms
> > ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient pipesClientId=0: execution exception
> > java.util.concurrent.ExecutionException: java.io.IOException: problem reading response from server: 54
> >
> > Caused by: java.lang.IllegalArgumentException: byte with index 83 must be < 17
> >     at org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
> >     at org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
> >     ... 5 more
> > TRACE [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
> > DEBUG [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
> > WARN [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
> > TRACE [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client process: 1646 ms
> >
> > The only clue I have is [..with exit code 137]. It implies OOM, but I can't
> > see any other evidence: counters, logs, or whatever.
> >
> > We can count it as a bug that the failed server isn't propagated as a
> > failure of TikaAsyncCLI:
> >
> > DEBUG [pool-3-thread-6] 16:25:15,813 org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
> > DEBUG [pool-3-thread-1] 16:25:15,820 org.apache.tika.pipes.async.AsyncProcessor emitter thread finished, total 1
> > INFO [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI Successfully finished processing 1 files in 3001 ms
> >
> > I've tweaked the settings a little (memory size, etc.), but it didn't help.
> > The same configuration works fine on the Linux host without a container.
> >
> > So I gave up and turned back to the tika-app CLI. FYI.
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>

--
Sincerely yours
Mikhail Khludnev
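
P.S. For anyone finding this thread later, here is a minimal sketch of the failure mode Tim describes: a parent that expects a single status byte on the child's stdout, and a child whose logger writes text to the same stream. This is not Tika's actual code. The class `StatusByteDemo`, the `Status` enum and its five values, and both method names are hypothetical and only mimic the shape of the error in the stack trace above. Note that 83 is the ASCII code for 'S', which would be consistent with log text landing on the channel, though I have not confirmed which line it came from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a one-byte status protocol; NOT Tika's actual code.
class StatusByteDemo {

    // Illustrative status codes; the real set and its size are assumptions.
    enum Status { READY, OK, PARSE_ERROR, TIMEOUT, OOM }

    // Maps a raw byte to a status, rejecting anything outside the enum's range.
    static Status lookup(int b) {
        Status[] values = Status.values();
        if (b < 0 || b >= values.length) {
            throw new IllegalArgumentException(
                    "byte with index " + b + " must be < " + values.length);
        }
        return values[b];
    }

    // Reads exactly one status byte from what the parent believes is a clean channel.
    static Status readStatus(InputStream childStdout) throws IOException {
        int b = childStdout.read();
        if (b < 0) {
            throw new IOException("stream closed before a status byte arrived");
        }
        return lookup(b);
    }

    public static void main(String[] args) throws IOException {
        // Clean channel: the child writes only the status byte.
        InputStream clean = new ByteArrayInputStream(new byte[] {1});
        System.out.println(readStatus(clean)); // prints OK

        // Polluted channel: a log line reaches stdout ahead of the status byte.
        byte[] polluted = "Some log line\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readStatus(new ByteArrayInputStream(polluted));
        } catch (IllegalArgumentException e) {
            // 'S' is byte 83, so this prints: byte with index 83 must be < 5
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the parent cannot tell a log character from a status code, so the first stray byte breaks the exchange. Sending the forked process's logging to a file, as Tim suggested, keeps the channel limited to protocol bytes.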
