[ https://issues.apache.org/jira/browse/TIKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334819#comment-17334819 ]

Tim Allison commented on TIKA-3370:
-----------------------------------

I pushed a bare minimum for this issue. I pretty much copied/pasted from the ForkParser. Lots more to do.

This forks the parsing per fetchemittuple into a separate process, but brings back all the emit data into the primary process so that they can be batched for emitting. This is a memory risk and needs to be fixed somehow...

> Refactor the AsyncProcessor in 2.x
> ----------------------------------
>
>                 Key: TIKA-3370
>                 URL: https://issues.apache.org/jira/browse/TIKA-3370
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Yesterday, I finally got back to trying to wire the AsyncProcessor in
> tika-pipes into the AsyncHandler in tika-server. I've now convinced myself
> that the notorious antipattern of using a db as a queue is in fact a really,
> really bad idea -- there's every chance that I wasn't doing it right or that
> H2 isn't a great choice...my $ is on the former.
> Nevertheless, I think removing H2 from that process and going with a
> modification of our ForkParser or a lightweight purpose-built knock-off to
> handle fetchers and emitters will be as robust, a bunch cleaner, have fewer
> dependencies and hopefully be more performant than what I had in the
> AsyncProcessor.
> Immediate term, I'd like to get this running and wired into tika-server.
> Longer term, we can use this instead of tika-batch in tika-app...more use,
> fewer bugs.
> This is the last item I'd like to finish before 2.0.0-BETA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)