GitHub user asmehra95 opened a pull request: https://github.com/apache/tika/pull/159
fix for TIKA-2298 contributed by asmehra95

I have imported the VGG16 model into Apache Tika using deeplearning4j. This recogniser is used in much the same way as TensorFlowRESTrecogniser, but it needs no external setup, such as the running REST service that TensorFlowRESTrecogniser depends on. You can read more about TensorFlowRESTrecogniser at https://wiki.apache.org/tika/TikaAndVision

To use the DL4JImageRecogniser, set the class param to org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser and the modelType param to VGG16. A sample configuration is given below for reference.

    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <parsers>
        <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
          <mime>image/jpeg</mime>
          <params>
            <param name="topN" type="int">5</param>
            <param name="minConfidence" type="double">0.015</param>
            <param name="class" type="string">org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser</param>
            <param name="modelType" type="string">VGG16</param>
          </params>
        </parser>
      </parsers>
    </properties>

Save the configuration at:

    tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml

To run it, build the project, move to the root directory of the project and run the command:

    java -Xmx3G -jar tika-app/target/tika-app-1.14.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml <path to your image file>

-Xmx3G is required because the VGG16 model needs a lot of memory to run. If your system still cannot run it, try raising the heap limit further.

When the recogniser runs, it automatically downloads the model file, using DL4J helper functions, and stores it locally at .dl4j/trainedModels

To speed up later runs, once the model has been loaded from the original hash files it is serialized and saved on disk at .dl4j/trainedModels/tikaPreprocessed, which significantly reduces resource usage (especially memory consumption) for future loads.
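Because loading VGG16 is slow and memory hungry, it may be worth confirming that the configuration really names the intended recogniser before starting Tika. The following is only a minimal sketch of such a check, using nothing but the JDK's XML API. The class name ConfigCheck and the method readParams are hypothetical helpers made up for illustration; they are not part of Tika or of this pull request. The embedded XML is the sample configuration given above.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: lists the <param> entries of a config.
public class ConfigCheck {

    // Same content as the sample configuration in this description.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<properties><parsers>"
        + "<parser class=\"org.apache.tika.parser.recognition.ObjectRecognitionParser\">"
        + "<mime>image/jpeg</mime><params>"
        + "<param name=\"topN\" type=\"int\">5</param>"
        + "<param name=\"minConfidence\" type=\"double\">0.015</param>"
        + "<param name=\"class\" type=\"string\">"
        + "org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser</param>"
        + "<param name=\"modelType\" type=\"string\">VGG16</param>"
        + "</params></parser></parsers></properties>";

    // Returns every <param> in the document as name -> text value, in file order.
    static Map<String, String> readParams(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("param");
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                params.put(e.getAttribute("name"), e.getTextContent().trim());
            }
            return params;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, check that file; otherwise check the embedded sample.
        String xml = args.length > 0
            ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
            : SAMPLE;
        Map<String, String> p = readParams(xml);
        System.out.println("recogniser = " + p.get("class"));
        System.out.println("modelType  = " + p.get("modelType"));
    }
}
```

Run it with the path of the saved configuration as the only argument. It only reads the file and never loads the model, so it does not need the -Xmx3G setting.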
For more details you can read this gist: https://gist.github.com/asmehra95/a16c49ec91f7f0d7b39c5bf6c2483e4d

Issue Link: https://issues.apache.org/jira/browse/TIKA-2298

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/asmehra95/tika master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/159.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #159

----
commit a5cd6f42dcded603f2b6de9476280c4bd95b6806
Author: asmehra95 <asmehr...@gmail.com>
Date: 2017-03-24T14:21:40Z

    Added dependencies for DL4JImageRecogniser parser

commit f777f21b47c8d122e6b7a0819b44977f1d571c59
Author: asmehra95 <asmehr...@gmail.com>
Date: 2017-03-24T14:28:54Z

    Imported VGG16 model via deeplearning4j
----