[ https://issues.apache.org/jira/browse/TIKA-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888440#comment-16888440 ]
Nicholas DiPiazza commented on TIKA-2575:
-----------------------------------------
Hey [[email protected]], I ended up going with my own project because it had
commons-pool integration, which gave me the fine-grained control I needed over
the Tika fork processes: https://github.com/nddipiazza/tika-fork/
It also sets up a File Reaper thread that deletes tmp files left over on
Windows after a certain delay.
I was wondering if you or any of the committers might have some time to get on
a Zoom / WebEx session to chat about it? I am nicholas.dipiazza at gmail if
you have time.
I'd like to at the very least get a section on the Apache Tika website about
how to fork JVMs, with a working example.
> Provide a way to abort tika parses when tika input stream buffer grows past
> a certain threshold
> -------------------------------------------------------------------------------------------------
>
> Key: TIKA-2575
> URL: https://issues.apache.org/jira/browse/TIKA-2575
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Nicholas DiPiazza
> Priority: Major
> Attachments: screenshot-1.png
>
>
> Sometimes, for example, you use Tika to parse an XLS file that isn't really
> that big, maybe 60 MB, and suddenly the JVM heap usage is >800 MB, which
> causes an OOM in my case.
> Can we add an "abort threshold" where the Tika parse halts if the parse
> output bytes exceed this value?
> Or is it already possible for users to do this themselves by somehow watching
> the input stream as it grows?
>
>
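For illustration, the "abort threshold" asked about above could be approximated on the output side with a SAX handler that counts what the parser emits and throws once a limit is passed. This is only a minimal sketch using JDK classes: the names AbortThresholdHandler, ThresholdExceededException, charsSeen and text are hypothetical and not part of Tika, and it counts characters rather than bytes. It also only bounds the extracted text; it would not cap heap that a parser allocates internally before it emits anything, which may be what happens in the XLS case.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: aborts once more than maxChars characters have been emitted. */
class AbortThresholdHandler extends DefaultHandler {

    /** Hypothetical exception type so callers can tell an abort from a real parse error. */
    static class ThresholdExceededException extends SAXException {
        ThresholdExceededException(String message) {
            super(message);
        }
    }

    private final long maxChars;
    private final StringBuilder text = new StringBuilder();
    private long seen = 0;

    AbortThresholdHandler(long maxChars) {
        this.maxChars = maxChars;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Count first, so the chunk that crosses the limit is never buffered.
        seen += length;
        if (seen > maxChars) {
            throw new ThresholdExceededException(
                "parse output exceeded " + maxChars + " characters");
        }
        text.append(ch, start, length);
    }

    /** Total characters offered to the handler, including a rejected final chunk. */
    long charsSeen() {
        return seen;
    }

    /** Text accepted before any abort. */
    String text() {
        return text.toString();
    }
}
```

Since Tika parsers report their output through a SAX ContentHandler, a handler along these lines could be passed to a parse call, with the caller catching the exception and treating it as an abort. Tika's own BodyContentHandler also accepts a write limit, which may already cover the output-side half of this request.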