[ https://issues.apache.org/jira/browse/TIKA-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862690#action_12862690 ]
Chris A. Mattmann commented on TIKA-416:
----------------------------------------
+1, this sounds like a great idea!
We did some work on this in OODT, in the form of simple external metadata
("met") extractors. Maybe we could follow a similar approach here. Check out:
http://svn.apache.org/repos/asf/incubator/oodt/cas-metadata/trunk/src/main/java/gov/nasa/jpl/oodt/cas/metadata/extractors/ExternMetExtractor.java
and
http://svn.apache.org/repos/asf/incubator/oodt/cas-metadata/trunk/src/main/resources/examples/extern-config.xml
for examples of how to deal with this. (Note: in OODT-3 we are still in the
process of converting over the licenses, and there are no "official" incubator
releases of OODT yet; I just wanted to point you to it as one way to get this
done.) You rock and I can't wait for this feature!

> Out-of-process text extraction
> ------------------------------
>
> Key: TIKA-416
> URL: https://issues.apache.org/jira/browse/TIKA-416
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> There's currently no easy way to guard against JVM crashes or excessive
> memory or CPU use caused by parsing very large, broken or intentionally
> malicious input documents. To better protect against such cases and to
> generally improve the manageability of resource consumption by Tika it would
> be great if we had a way to run Tika parsers in separate JVM processes. This
> could be handled either as a separate "Tika parser daemon" or as an
> explicitly managed pool of forked JVMs.
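
To make the "forked JVM" idea in the description above concrete, here is a
minimal sketch of launching a child JVM with a heap cap and a wall-clock
timeout, so that a crash or runaway parse stays contained in the child. This is
not Tika API: the class name ForkedJvmRunner, the method runInChildJvm, the
64 MB heap limit and the 30 second timeout are all hypothetical choices made
for illustration. A real setup would pass the classpath and main class of a
parser wrapper instead of "-version".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: runs work in a separate JVM process.
class ForkedJvmRunner {

    /** Path to the java launcher of the JVM that is running this code. */
    static String javaBinary() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    /**
     * Starts a child JVM with the given arguments and returns its combined
     * stdout/stderr. The child is killed if it outlives the timeout.
     */
    static String runInChildJvm(List<String> jvmArgs, long timeoutMillis)
            throws IOException, InterruptedException, TimeoutException {
        List<String> command = new ArrayList<>();
        command.add(javaBinary());
        command.addAll(jvmArgs);

        // Send output to a temp file so a chatty child cannot block on a full pipe.
        Path output = Files.createTempFile("forked-jvm", ".out");
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(output.toFile())
                    .start();

            // Guard against excessive CPU use or hangs: enforce a wall-clock limit.
            if (!child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                child.destroyForcibly().waitFor();
                throw new TimeoutException("child JVM exceeded " + timeoutMillis + " ms");
            }
            // A crash or OutOfMemoryError in the child surfaces here as a non-zero exit.
            if (child.exitValue() != 0) {
                throw new IOException("child JVM exited with code " + child.exitValue());
            }
            return new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(output);
        }
    }

    public static void main(String[] args) throws Exception {
        // -Xmx caps the child's heap; -version stands in for a real parser main class.
        System.out.println(runInChildJvm(Arrays.asList("-Xmx64m", "-version"), 30_000));
    }
}
```

The sketch forks one JVM per call, which is the simplest form of isolation. The
"parser daemon" and "managed pool" variants mentioned in the description would
keep children alive across documents to avoid paying JVM startup cost each time.
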