[ 
https://issues.apache.org/jira/browse/TIKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334819#comment-17334819
 ] 

Tim Allison commented on TIKA-3370:
-----------------------------------

I pushed a bare-minimum implementation for this issue, mostly copied and 
pasted from the ForkParser.  There is a lot more to do.  This forks the parsing 
of each FetchEmitTuple into a separate process, but brings all of the emit data 
back into the primary process so that it can be batched for emitting.  This is 
a memory risk and needs to be fixed somehow...
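
One possible way to limit the memory risk described above is to cap how much 
emit data the primary process holds before it flushes a batch.  The following 
is only a minimal sketch under stated assumptions, not the code that was 
pushed and not Tika's API: the class name BoundedEmitBatcher, the byte 
estimate, and the use of plain strings to stand in for emit data are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not part of Tika): collects emit data returned by
 * forked parse processes and flushes a batch once an estimated size cap
 * is reached, so the primary process never holds an unbounded amount.
 */
class BoundedEmitBatcher {
    private final long maxBatchBytes;
    private final Consumer<List<String>> emitter; // stands in for a real emitter
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    BoundedEmitBatcher(long maxBatchBytes, Consumer<List<String>> emitter) {
        this.maxBatchBytes = maxBatchBytes;
        this.emitter = emitter;
    }

    /** Adds one item, flushing first if it would push the batch over the cap. */
    void add(String emitData) {
        long size = estimateBytes(emitData);
        if (!pending.isEmpty() && pendingBytes + size > maxBatchBytes) {
            flush();
        }
        pending.add(emitData);
        pendingBytes += size;
        // A single oversized item is emitted on its own right away.
        if (pendingBytes >= maxBatchBytes) {
            flush();
        }
    }

    /** Emits whatever is pending; call once more after the last item. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        emitter.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    /** Rough assumption: two bytes per char, ignoring object overhead. */
    static long estimateBytes(String s) {
        return 2L * s.length();
    }
}
```

A cap like this bounds only what is waiting to be emitted; it does not bound 
the size of a single parse result, which would still have to cross from the 
forked process into the primary process in full.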

> Refactor the AsyncProcessor in 2.x
> ----------------------------------
>
>                 Key: TIKA-3370
>                 URL: https://issues.apache.org/jira/browse/TIKA-3370
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Yesterday, I finally got back to trying to wire the AsyncProcessor in 
> tika-pipes into the AsyncHandler in tika-server.  I've now convinced myself 
> that the notorious antipattern of using a db as a queue is in fact a really, 
> really bad idea -- there's every chance that I wasn't doing it right or that 
> H2 isn't a great choice...my $ is on the former.
> Nevertheless, I think removing H2 from that process and going with a 
> modification of our ForkParser or a lightweight purpose-built knock-off to 
> handle fetchers and emitters will be as robust, a bunch cleaner, have fewer 
> dependencies and hopefully be more performant than what I had in the 
> AsyncProcessor.
> In the immediate term, I'd like to get this running and wired into 
> tika-server.  Longer term, we can use this instead of tika-batch in 
> tika-app...more use, fewer bugs.
> This is the last item I'd like to finish before 2.0.0-BETA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
