[ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964635#comment-15964635 ]
ASF GitHub Bot commented on TIKA-2298:
--------------------------------------

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by asmehra95 - Import of VGG16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293141458

Hello folks, I have fixed the formatting issues. @thammegowda, please review it and let me know if any changes are required.

I have also made it a little more customizable: you can now choose whether or not to save the model to disk. Serializing the model takes a lot of space (around 500 MB), but it saves a lot of runtime memory once the model has been saved.

How to use: add a field to the config file; the value can be "yes" or "no":

```xml
<param name="serialize" type="string">no</param>
```

Observations:
* When the model is loaded from the serialized file on disk, it needs only around 1200 MB of RAM to run.
* When the model is loaded from the .h5 files using the helper functions, it needs around 2500 MB of RAM to run.

I think we can distribute serialized models for VGG16 instead of the original .h5 files. Will that cause any problems, @saudet @agibsonccc?

One more thing: the VGG16 model doesn't work completely offline. It connects to the internet after processing the image in order to decode the output. Can we make it entirely offline?
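For context, the `serialize` param above would sit inside a parser definition in `tika-config.xml`. The sketch below is illustrative only: `ObjectRecognitionParser` is a real Tika parser class, but the neighbouring param names and values shown here are assumptions, not the exact configuration from this pull request.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tika-config.xml fragment.
     Only the "serialize" param is from the comment above; the
     other params are placeholders for whatever the recogniser
     implementation actually accepts. -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
      <params>
        <!-- Placeholder params, shown only to situate "serialize" -->
        <param name="topN" type="int">2</param>
        <!-- The switch discussed in the comment: "yes" or "no" -->
        <param name="serialize" type="string">no</param>
      </params>
    </parser>
  </parsers>
</properties>
```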
> To improve object recognition parser so that it may work without external
> RESTful service setup
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2298
>                 URL: https://issues.apache.org/jira/browse/TIKA-2298
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Avtar Singh
>              Labels: ObjectRecognitionParser
>             Fix For: 1.15
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support in Java frameworks. All the popular neural networks were in
> C++ or Python. Since there was nothing that ran within the JVM, we tried
> several ways to glue them to Tika (such as CLI, JNI, gRPC, and REST).
> However, this is slowly changing now. Deeplearning4j, the best-known
> neural-network library for the JVM, now supports importing models that were
> pre-trained in Python/C++ based kits [5].
> *Improvement:*
> It would be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup (such as installing native libraries or
> starting REST services). Reasons: it is easier to distribute and also cuts the
> IO time.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
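On the offline question raised in the comment: the network call happens when class indices are decoded into human-readable labels. One way to avoid it is to ship the label list with the parser and map the model's output scores to labels locally. The sketch below is a minimal, hypothetical illustration of that idea using only the JDK; it is not the actual code from the pull request, and the label list here is a stand-in for the real 1000-entry ImageNet class file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of offline label decoding for an image classifier such as VGG16.
 * Instead of contacting a remote service to map class indices to names,
 * the label list is loaded from a file bundled with the parser.
 */
public class OfflineLabelDecoder {

    /** Return the indices of the k highest scores, best first. */
    public static int[] topK(float[] scores, int k) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < scores.length; i++) idx[i] = i;
        // Sort indices by descending score.
        Arrays.sort(idx, (a, b) -> Float.compare(scores[b], scores[a]));
        int[] out = new int[Math.min(k, scores.length)];
        for (int i = 0; i < out.length; i++) out[i] = idx[i];
        return out;
    }

    /** Map the top-k indices to labels loaded from local storage. */
    public static List<String> decode(float[] scores, List<String> labels, int k) {
        List<String> result = new ArrayList<>();
        for (int i : topK(scores, k)) {
            result.add(labels.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical local label list; in practice this would be read
        // from the ImageNet class file distributed with the model.
        List<String> labels = Arrays.asList("cat", "dog", "tiger", "kit fox");
        float[] scores = {0.05f, 0.10f, 0.70f, 0.15f};
        System.out.println(decode(scores, labels, 2)); // prints [tiger, kit fox]
    }
}
```

The point of the sketch is the design choice: once the labels live on disk next to the serialized model, the decode step needs no network access at all.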