[ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964635#comment-15964635 ]

ASF GitHub Bot commented on TIKA-2298:
--------------------------------------

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293141458
 
 
   Hello folks,
   I have fixed the formatting issues. @thammegowda please review, and let me 
know if any changes are required.
   I have also made it a little more customizable: you can now choose whether 
to save the model to disk.
   Saving the model to disk requires a lot of space (around 500 MB), but it 
saves a lot of runtime memory once the model has been saved.
   
   How to use:
   Add a field to the config file:
   ```xml
   <param name="serialize" type="string">no</param> 
   ```
   Valid values are yes and no.
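   
   For context, this param would sit inside the parser's params block in a 
tika-config.xml; a minimal sketch of the surrounding structure (the recogniser 
class name below is hypothetical, not taken from this PR):
   ```xml
   <properties>
     <parsers>
       <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
         <mime>image/jpeg</mime>
         <params>
           <!-- hypothetical DL4J-backed recogniser class name -->
           <param name="class" type="string">org.apache.tika.parser.recognition.DL4JVGG16Net</param>
           <!-- the new option added by this PR -->
           <param name="serialize" type="string">no</param>
         </params>
       </parser>
     </parsers>
   </properties>
   ```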
   
   Observations:
   When loading the model from disk (serialized), it only requires around 
1200 MB of RAM to run.
   
   When the model is loaded from the H5 files using the helper functions, it 
requires around 2500 MB of RAM to run.
   
   I think we can distribute serialized models for VGG16 instead of the 
original H5 files. Would that cause any problems, @saudet @agibsonccc? One 
more thing: the VGG16 model doesn't work completely offline. It connects to 
the internet after processing the image, to decode the output. Can we make it 
entirely offline?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> To improve object recognition parser so that it may work without external 
> RESTful service setup
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2298
>                 URL: https://issues.apache.org/jira/browse/TIKA-2298
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Avtar Singh
>              Labels: ObjectRecognitionParser
>             Fix For: 1.15
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support for Java frameworks.  All the popular neural networks were in
> C++ or Python.  Since there was nothing that runs within the JVM, we tried
> several ways to glue them to Tika (like CLI, JNI, gRPC, REST).
> However, this game is changing slowly now. Deeplearning4j, the most famous
> neural network library for the JVM, now supports importing models that are
> pre-trained in Python/C++ based kits [5].
> *Improvement:*
> It will be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup (like installation of native libraries
> or starting REST services). Reasons: easy to distribute and also to cut
> the IO time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
