[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145788#comment-16145788
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-325745416
 
 
   thanks @boegel I'll get this committed today!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> To improve object recognition parser so that it may work without external 
> RESTful service setup
> ---
>
>                 Key: TIKA-2298
>                 URL: https://issues.apache.org/jira/browse/TIKA-2298
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Avtar Singh
>            Assignee: Chris A. Mattmann
>              Labels: ObjectRecognitionParser, gsoc, memex
>             Fix For: 1.16
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support for Java frameworks. All the popular neural networks were in
> C++ or Python. Since nothing ran within the JVM, we tried
> several ways to glue them to Tika (CLI, JNI, gRPC, REST).
> However, this is slowly changing. Deeplearning4j, the most famous
> neural network library for the JVM, now supports importing models that are
> pre-trained in Python/C++ based kits [5].
> *Improvement:*
> It would be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup (such as installing native libraries or
> starting REST services). Reasons: it is easier to distribute and cuts I/O
> time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145786#comment-16145786
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-325745167
 
 
   @chrismattmann Yup, I just increased the "read timeout". PR at #203
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145661#comment-16145661
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-325726468
 
 
   ping @boegel 
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142072#comment-16142072
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-325015785
 
 
   Thanks @boegel, if you can submit a PR I'll commit the above. Looks like you 
just increased the max timeout, right?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141511#comment-16141511
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324894453
 
 
   I was able to dance around the issue with the following patch:
   
   ```
   --- tika-1.16/tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JInceptionV3Net.java.orig 2017-08-25 10:01:28.324036746 +0200
   +++ tika-1.16/tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JInceptionV3Net.java 2017-08-25 10:01:49.534306082 +0200
   @@ -213,7 +213,7 @@
    }
    LOG.info("Cache doesn't exist. Going to make a copy");
    LOG.info("This might take a while! GET {}", uri);
   -FileUtils.copyURLToFile(uri.toURL(), cacheFile, 5000, 5000);
   +FileUtils.copyURLToFile(uri.toURL(), cacheFile, 5000, 5);
    //restore the success flag again
    FileUtils.write(successFlag,
    "CopiedAt:" + System.currentTimeMillis(),
   ```
   
   The download of the 90MB `inception-model-weights.h5` was timing out after 
5s, which seems a bit tight to me?
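
A minimal sketch of the same adjustment using only JDK classes instead of commons-io, in case it helps reproduce the behaviour outside Tika. The names `TimedDownload` and `download` are hypothetical and not part of Tika; the timeouts are assumed to be in milliseconds, as they are for `FileUtils.copyURLToFile`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not Tika code: copies a URL to a file with explicit timeouts.
final class TimedDownload {

    /**
     * Copies the content behind url into target and returns the number of bytes written.
     * connectTimeoutMs bounds connection setup; readTimeoutMs bounds each blocking read.
     */
    static long download(URL url, Path target, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The read timeout applies to each individual read rather than to the whole transfer, so a large file is not by itself the problem. The stack trace in this thread shows the timeout being hit while the HTTP response header was still being parsed, which suggests a slow first response rather than a slow transfer.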
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141200#comment-16141200
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

agibsonccc commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324821344
 
 
   Timeout issues like these are common. They usually have to do with a VPN or proxy. 
If you have issues, please feel free to come talk to us directly. Thanks!
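
If a proxy is the suspect, one way to test that is to open the connection through it explicitly. This is only a sketch using standard `java.net` classes; `ProxiedConnection`, the host name and the port are hypothetical placeholders, and nothing here is taken from Tika or Deeplearning4j.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: prepares (but does not yet open) a connection routed via an HTTP proxy.
final class ProxiedConnection {

    static URLConnection open(URL url, String proxyHost, int proxyPort,
                              int connectTimeoutMs, int readTimeoutMs) throws IOException {
        // createUnresolved defers the DNS lookup of the proxy until the connection is used.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        URLConnection conn = url.openConnection(proxy);
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        return conn;
    }
}
```

No network traffic happens until the caller invokes `connect()` or `getInputStream()` on the returned connection.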
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140696#comment-16140696
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324757211
 
 
   @chrismattmann I think it's actually 
https://raw.githubusercontent.com/USCDataScience/dl4j-kerasimport-examples/98ec48b56a5b8fb7d54a2994ce9cb23bfefac821/dl4j-import-example/data/inception-model-weights.h5,
 which is a 90MB download...
   
   Cfr. 
https://github.com/apache/tika/blob/master/tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-inception3-config.xml#L27
 (which is used as input to `TikaConfig`).
   
   I guess the download is taking too long for some reason?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140652#comment-16140652
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324750562
 
 
   @boegel check out: 
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/trainedmodels/TrainedModels.java
 looks like it's a GitHub URL?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140637#comment-16140637
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324747665
 
 
   @chrismattmann No, network works fine... I am behind a firewall though, 
maybe that's the issue.
   What is the test trying to download exactly, and where can I seed in what it 
wants?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140632#comment-16140632
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324746641
 
 
   hi @boegel are you on a computer that doesn't have a net connection? You 
just need that model to download once...
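
A rough sketch of the download-once idea: copy the file into a cache and record success in a separate flag file, so later runs skip the download. It is modelled loosely on the "CopiedAt:" success flag visible in the patch elsewhere in this thread, but `DownloadCache`, `fetchOnce` and the `.success` suffix are hypothetical and this is not the actual Tika implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not Tika code: downloads a resource at most once.
final class DownloadCache {

    /** Returns true if a download happened, false if the cached copy was reused. */
    static boolean fetchOnce(URL source, Path cacheFile) throws IOException {
        Path successFlag = cacheFile.resolveSibling(cacheFile.getFileName() + ".success");
        if (Files.exists(cacheFile) && Files.exists(successFlag)) {
            return false; // a previous run completed the copy
        }
        // Remove a stale flag so an interrupted copy is never mistaken for a complete one.
        Files.deleteIfExists(successFlag);
        try (InputStream in = source.openStream()) {
            Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        }
        // Write the flag only after the copy has finished.
        Files.write(successFlag,
                ("CopiedAt:" + System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```

Writing the flag last means a download that fails partway leaves no flag behind, so the next run retries instead of loading a truncated model.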
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140622#comment-16140622
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-324745266
 
 
   I'm trying to build and install Tika 1.16 from source, and I'm running into 
a failing test; it seems like this test was added in this PR.
   
   Any pointers to what is wrong here? How can I debug this further?
   
   ```
   Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
   Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.559 sec <<< FAILURE! - in org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
   recognise(org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest)  Time elapsed: 5.556 sec  <<< ERROR!
   org.apache.tika.exception.TikaConfigException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
   at java.net.SocketInputStream.read(SocketInputStream.java:171)
   at java.net.SocketInputStream.read(SocketInputStream.java:141)
   at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
   at sun.security.ssl.InputRecord.read(InputRecord.java:503)
   at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
   at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
   at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
   at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
   at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
   at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1506)
   at org.apache.tika.dl.imagerec.DL4JInceptionV3Net.cachedDownload(DL4JInceptionV3Net.java:216)
   at org.apache.tika.dl.imagerec.DL4JInceptionV3Net.initialize(DL4JInceptionV3Net.java:232)
   at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
   at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
   at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:168)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:161)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:157)
   at org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest.recognise(DL4JInceptionV3NetTest.java:33)
   
   Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
   Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.6 sec - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest
   
   Results :
   
   Tests in error:
 DL4JInceptionV3NetTest.recognise:33 » TikaConfig Read timed out
   
   Tests run: 2, Failures: 0, Errors: 1, Skipped: 0
   ```
 


[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077692#comment-16077692
 ] 

Chris A. Mattmann commented on TIKA-2298:
-

docs added here: https://wiki.apache.org/tika/AgeDetectionParser and linked 
from front page



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076824#comment-16076824
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313446688
 
 
   Thank you guys!! @chrismattmann @thammegowda @tballison. This is my first 
merge in a major repository and I am very excited! Once again, thanks!
   I will surely come up with the documentation soon, Chris.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076419#comment-16076419
 ] 

Tim Allison commented on TIKA-2298:
---

I knew this was the wrong week to go off coffee...



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075859#comment-16075859
 ] 

Chris A. Mattmann commented on TIKA-2298:
-

Fixed. It was a simple typo: the config object was never set to the new 
TikaConfig.
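
A minimal sketch of the kind of typo described here, for readers following along. It does not reproduce the actual test code: `Config` and `Facade` are hypothetical stand-ins (not Tika classes), and the assumption is only that a freshly built configuration was never stored in the variable later handed to a constructor that rejects null.

```java
import java.util.Objects;

class ConfigAssignmentSketch {

    /** Hypothetical stand-in for a parsed configuration object. */
    static final class Config {
        final String source;

        Config(String source) {
            this.source = source;
        }
    }

    /** Hypothetical stand-in for a facade that rejects a null configuration. */
    static final class Facade {
        final Config config;

        Facade(Config config) {
            this.config = Objects.requireNonNull(config, "config");
        }
    }

    /** Buggy shape: the new Config is created but its result is discarded. */
    static Facade buggy(String source) {
        Config config = null;
        new Config(source);        // constructed, never assigned
        return new Facade(config); // throws NullPointerException
    }

    /** Fixed shape: the variable is set to the newly created Config. */
    static Facade fixed(String source) {
        Config config = new Config(source);
        return new Facade(config);
    }

    public static void main(String[] args) {
        System.out.println(fixed("dl4j-vgg16-config.xml").config.source);
        try {
            buggy("dl4j-vgg16-config.xml");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the unassigned config");
        }
    }
}
```

The only difference between the two methods is the assignment, which is why this class of bug compiles cleanly and only shows up at run time.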



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075855#comment-16075855
 ] 

Chris A. Mattmann commented on TIKA-2298:
-

docs added in: https://wiki.apache.org/tika/TikaAndVisionDL4J



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075849#comment-16075849
 ] 

Chris A. Mattmann commented on TIKA-2298:
-

[~talli...@apache.org] your latest update causes Jenkins and my local build to 
fail:

{noformat}
---
 T E S T S
---
Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.268 sec - in 
org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.353 sec <<< 
FAILURE! - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest
recognise(org.apache.tika.dl.imagerec.DL4JVGG16NetTest)  Time elapsed: 6.353 
sec  <<< ERROR!
java.lang.NullPointerException: null
at org.apache.tika.Tika.<init>(Tika.java:109)
at 
org.apache.tika.dl.imagerec.DL4JVGG16NetTest.recognise(DL4JVGG16NetTest.java:40)


Results :

Tests in error: 
  DL4JVGG16NetTest.recognise:40 » NullPointer

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Tika parent . SUCCESS [  1.169 s]
[INFO] Apache Tika core ... SUCCESS [ 23.745 s]
[INFO] Apache Tika parsers  SUCCESS [03:20 min]
[INFO] Apache Tika XMP  SUCCESS [  1.323 s]
[INFO] Apache Tika serialization .. SUCCESS [  1.114 s]
[INFO] Apache Tika batch .. SUCCESS [01:47 min]
[INFO] Apache Tika language detection . SUCCESS [  2.683 s]
[INFO] Apache Tika application  SUCCESS [ 43.016 s]
[INFO] Apache Tika OSGi bundle  SUCCESS [ 18.439 s]
[INFO] Apache Tika translate .. SUCCESS [  1.794 s]
[INFO] Apache Tika server . SUCCESS [ 36.437 s]
[INFO] Apache Tika examples ... SUCCESS [  5.494 s]
[INFO] Apache Tika Java-7 Components .. SUCCESS [  1.815 s]
[INFO] Apache Tika eval ... SUCCESS [ 22.354 s]
[INFO] Apache Tika Deep Learning (powered by DL4J)  FAILURE [ 14.242 s]
[INFO] Apache Tika  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 08:01 min
[INFO] Finished at: 2017-07-05T18:33:59-07:00
[INFO] Final Memory: 126M/1659M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on 
project tika-dl: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/Users/mattmann/tmp/tika1.15/tika-dl/target/surefire-reports for the individual 
test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tika-dl
LMC-053601:tika1.15 mattmann$ 

{noformat}

I'm going to try and fix real quick.



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075786#comment-16075786
 ] 

Hudson commented on TIKA-2298:
--

FAILURE: Integrated in Jenkins build Tika-trunk #1310 (See 
[https://builds.apache.org/job/Tika-trunk/1310/])
TIKA-2298: DL4J-VGG16 simplify conf, implementation (thammegowda: 
[https://github.com/apache/tika/commit/c476ec14efe2d9007f461ecf09ccd2ade4ffc197])
* (edit) 
tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml
* (edit) tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
Record change for TIKA-2298: Very Deep Convolutional Networks for (mattmann: 
[https://github.com/apache/tika/commit/b58cfcf1935d138065eb4a090ba4c1fef17ddacd])
* (edit) CHANGES.txt
TIKA-2298 -- skip test if no network connectivity.  Should rework for 
(tallison: 
[https://github.com/apache/tika/commit/158675def02810d116e7cdab8409c121a88e77eb])
* (edit) tika-dl/src/test/java/org/apache/tika/dl/imagerec/DL4JVGG16NetTest.java




[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075755#comment-16075755
 ] 

Chris A. Mattmann commented on TIKA-2298:
-

YES sounds perfect thanks [~talli...@apache.org]



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075743#comment-16075743
 ] 

Tim Allison commented on TIKA-2298:
---

I'm having the usual proxy problems in my environment with the network call.  
Mind if I try/catch/swallow TikaConfigurationException with 
message.contains("Connection refused")?
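
A rough sketch of what that try/catch/swallow could look like, not the code that was committed. `ConfigException`, `Loader` and `loadOrSkip` are hypothetical names introduced for illustration (the real exception type would come from Tika); only the "Connection refused" message check comes from the comment above.

```java
class NetworkSkipSketch {

    /** Hypothetical stand-in for the configuration exception named above. */
    static class ConfigException extends Exception {
        ConfigException(String message) {
            super(message);
        }
    }

    /** Hypothetical loader that may need the network, e.g. to fetch model weights. */
    interface Loader<T> {
        T load() throws ConfigException;
    }

    /** True when the failure looks like a refused network connection. */
    static boolean isConnectionRefused(Exception e) {
        String message = e.getMessage();
        return message != null && message.contains("Connection refused");
    }

    /**
     * Returns the loaded value, or null when loading failed only because the
     * connection was refused. Any other configuration error is rethrown.
     */
    static <T> T loadOrSkip(Loader<T> loader) throws ConfigException {
        try {
            return loader.load();
        } catch (ConfigException e) {
            if (isConnectionRefused(e)) {
                return null; // swallowed: the caller should skip the test
            }
            throw e;
        }
    }

    public static void main(String[] args) throws ConfigException {
        Object config = loadOrSkip(() -> {
            throw new ConfigException("java.net.ConnectException: Connection refused");
        });
        System.out.println(config == null ? "skipping test" : "running test");
    }
}
```

Rethrowing everything except the refused connection keeps genuine configuration errors visible, so the test is skipped only in the offline/proxy case.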



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075740#comment-16075740
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313268447
 
 
   @asmehra95 and @thammegowda please add a page like 
https://wiki.apache.org/tika/TikaAndVisionDL4J on the Tika Wiki or add to that 
page and show how to use the VGG16 model. Should be pretty quick thanks!
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075732#comment-16075732
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r125792563
 
 

 ##
 File path: tika-dl/pom.xml
 ##
 @@ -87,6 +87,11 @@
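       <artifactId>nd4j-native-platform</artifactId>
       <version>${dl4j.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-compress</artifactId>
+      <version>1.14</version>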
 
 Review comment:
   fixed in 
https://github.com/apache/tika/commit/94f8b9fe5fdaebd11a99e76dd742bdc6df427389
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075730#comment-16075730
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann closed pull request #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182
 
 
   
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075729#comment-16075729
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r125792055
 
 

 ##
 File path: tika-dl/pom.xml
 ##
 @@ -87,6 +87,11 @@
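       <artifactId>nd4j-native-platform</artifactId>
       <version>${dl4j.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-compress</artifactId>
+      <version>1.14</version>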
 
 Review comment:
   on it! thanks @tballison 
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075727#comment-16075727
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

tballison commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r125791986
 
 

 ##
 File path: tika-dl/pom.xml
 ##
 @@ -87,6 +87,11 @@
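       <artifactId>nd4j-native-platform</artifactId>
       <version>${dl4j.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-compress</artifactId>
+      <version>1.14</version>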
 
 Review comment:
   commons.compress.version is set in tika-parent's pom.  Reference it here as 
${commons.compress.version} so we don't have to worry about 
coordination/conflicts.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075728#comment-16075728
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265776
 
 
   OK, I got it working, great job @asmehra95! I am good to merge this into 
1.16. Let me double check there are no objections (if so we can back it out).
   
   ## Build passes
   
   ```
   [INFO] Loading classes to check...
   [INFO] Scanning classes for violations...
   [INFO] Scanned 2 (and 230 related) class file(s) for forbidden API 
invocations (in 0.04s), 0 error(s).
   [INFO] 
   [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
   [INFO] Installing 
/Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
   [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
   [INFO] Installing 
/Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
 to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
   [INFO] 

   [INFO] BUILD SUCCESS
   [INFO] 

   [INFO] Total time: 03:48 min
   [INFO] Finished at: 2017-07-05T17:24:47-07:00
   [INFO] Final Memory: 129M/1177M
   [INFO] 

   LMC-053601:tika-dl mattmann$ 
   ```
   
   ## Running Lion Image Recognition Test
   ```bash
   $ cat test.sh
   java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
   ```
   
   ```bash
   LMC-053601:tika1.15 mattmann$ sh test.sh
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The 
ImageParser will skip jbig2 images
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files.
   This may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Loaded [CpuBackend] backend
   INFO  Number of threads used for NativeOps: 4
   INFO  Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 
values 
   INFO  Number of threads used for BLAS: 4
   INFO  Backend used: [CPU]; OS: [Mac OS X]
   INFO  Cores: [8]; Memory: [2.7GB];
   INFO  Blas vendor: [OPENBLAS]
   WARN  could not create Vfs.Dir from url. ignoring the exception and 
continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no 
matching UrlType was found 
[file:/System/Library/Java/Extensions/libJ3DAudio.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the 
static setDefaultURLTypes(final List<UrlType> urlTypes) or 
addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at 
org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at 

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075725#comment-16075725
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313267082
 
 
   I have not configured Maven based on available memory so far. It should be
   possible to hack it based on an ENV variable or a system property.
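
   One way the system-property idea could be sketched, under stated assumptions: the property name `tika.dl.test.minHeapMb` and its 2560 MB default are hypothetical, chosen only for illustration, and nothing here is an existing Tika or Maven setting. The check compares the JVM's maximum heap against the threshold so a memory-hungry test can decide whether to run.

```java
class MemoryGateSketch {

    /** Hypothetical system property holding the minimum heap, in megabytes. */
    static final String MIN_HEAP_PROPERTY = "tika.dl.test.minHeapMb";

    /** Hypothetical default threshold, in megabytes. */
    static final long DEFAULT_MIN_HEAP_MB = 2560L;

    /** True when the given heap size (bytes) reaches the required megabytes. */
    static boolean heapAtLeast(long requiredMb, long maxHeapBytes) {
        return maxHeapBytes >= requiredMb * 1024L * 1024L;
    }

    /** Reads the threshold from the system property and checks this JVM's max heap. */
    static boolean shouldRunHeavyTest() {
        long requiredMb = Long.getLong(MIN_HEAP_PROPERTY, DEFAULT_MIN_HEAP_MB);
        return heapAtLeast(requiredMb, Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println(shouldRunHeavyTest() ? "run heavy test" : "skip heavy test");
    }
}
```

   In a JUnit 4 test, the result could be passed to `org.junit.Assume.assumeTrue(...)` so the test is reported as skipped rather than failed. The threshold probably needs to sit somewhat below the `-Xmx` value, since the heap a JVM reports can be smaller than the configured maximum.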
   
   On Jul 5, 2017 5:49 PM, "Chris Mattmann"  wrote:
   
   > @thammegowda  I would say it's OK - do
   > you know if there is a Maven plugin to only run tests if a certain amount
   > of RAM is available? I think I could easily hack this using properties, but
   > just checking first.
   
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075718#comment-16075718
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265776
 
 
   OK, I got it working, great job @asmehra95! I am good to merge this into 
1.16. Let me double check there are no objections (if so we can back it out).
   
   h2. Build passes
   
   ```
   [INFO] Loading classes to check...
   [INFO] Scanning classes for violations...
   [INFO] Scanned 2 (and 230 related) class file(s) for forbidden API 
invocations (in 0.04s), 0 error(s).
   [INFO] 
   [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
   [INFO] Installing 
/Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
   [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
   [INFO] Installing 
/Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
 to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
   [INFO] 

   [INFO] BUILD SUCCESS
   [INFO] 

   [INFO] Total time: 03:48 min
   [INFO] Finished at: 2017-07-05T17:24:47-07:00
   [INFO] Final Memory: 129M/1177M
   [INFO] 

   LMC-053601:tika-dl mattmann$ 
   ```
   
   h2. Running Lion Image Recognition Test
   ```bash
   $ cat test.sh
   java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
   ```
   
   ```bash
   LMC-053601:tika1.15 mattmann$ sh test.sh
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The 
ImageParser will skip jbig2 images
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files.
   This may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Loaded [CpuBackend] backend
   INFO  Number of threads used for NativeOps: 4
   INFO  Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 
values 
   INFO  Number of threads used for BLAS: 4
   INFO  Backend used: [CPU]; OS: [Mac OS X]
   INFO  Cores: [8]; Memory: [2.7GB];
   INFO  Blas vendor: [OPENBLAS]
   WARN  could not create Vfs.Dir from url. ignoring the exception and 
continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no 
matching UrlType was found 
[file:/System/Library/Java/Extensions/libJ3DAudio.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the 
static setDefaultURLTypes(final List<UrlType> urlTypes) or 
addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
       at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
       at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
       at org.reflections.Reflections.scan(Reflections.java:237)
       at org.reflections.Reflections.scan(Reflections.java:204)
       at org.reflections.Reflections.<init>(Reflections.java:129)
       at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at 

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075711#comment-16075711
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265222
 
 
   @thammegowda I would say it's OK - do you know if there is a Maven plugin to 
only run tests if a certain amount of RAM is available? I think I could easily 
hack this using properties, but just checking first.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075678#comment-16075678
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313261299
 
 
   Now the question is: to accommodate this VGG model (with unit tests), we need 
to increase the memory requirement of the Tika build system to 3GB.
Is this okay to do?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075677#comment-16075677
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313261053
 
 
   The export statements in bash are not picked up by Maven.
   That's because the value set in pom.xml overrides those exports.
   
   https://github.com/apache/tika/blob/master/tika-parent/pom.xml#L359
   This model requires 3GB and hence the tika-parent should be updated to 
reflect the same.
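
For a sense of scale, the allocation that failed in the test run reported in this thread was `FloatPointer(102760448)`. Assuming 4 bytes per float, that single buffer is 392 MiB. The element count also equals 25088 × 4096, which matches the first fully connected layer of the standard VGG16 architecture (7 × 7 × 512 inputs, 4096 outputs). The sketch below only reproduces that arithmetic; the class and method names are illustrative and not taken from Tika or Deeplearning4j.

```java
// Illustrative arithmetic only: size of the float buffer whose allocation
// failed in the OutOfMemoryError reported in this thread.
final class Vgg16BufferSize {

    private Vgg16BufferSize() {}

    // Bytes needed for a buffer of 32-bit floats.
    static long floatBufferBytes(long elementCount) {
        return elementCount * Float.BYTES;
    }

    public static void main(String[] args) {
        long elements = 102_760_448L;            // from the error message
        long bytes = floatBufferBytes(elements);

        System.out.println(bytes + " bytes");                   // 411041792 bytes
        System.out.println(bytes / (1024 * 1024) + " MiB");     // 392 MiB
        // 7 x 7 x 512 inputs times 4096 outputs (VGG16 first dense layer)
        System.out.println(7L * 7 * 512 * 4096 == elements);    // true
    }
}
```

This is one weight matrix only; the rest of the network and any temporary copies made during initialisation come on top of it.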
   
   
   
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075605#comment-16075605
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313252683
 
 
   BTW see my branch (I had to fix some errors in compilation along the way):
   
   https://github.com/apache/tika/compare/master...chrismattmann:TIKA-2298
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075602#comment-16075602
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313252275
 
 
   So @asmehra95 @thammegowda I have been testing this out. I can't get the 
unit tests to pass. See below:
   
   ```bash
   LMC-053601:tika-dl mattmann$ history | grep export
 546  export MAVEN_OPTS="-Xms2048m"
 548  export MAVEN_OPTS="-Xmx3G"
 550  history | grep export
   LMC-053601:tika-dl mattmann$ 
   
   ```
   
   ```bash
   [INFO] 
   [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ 
tika-dl ---
   [INFO] Using 'UTF-8' encoding to copy filtered resources.
   [INFO] Copying 2 resources
   [INFO] Copying 3 resources
   [INFO] 
   [INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ tika-dl ---
   [INFO] Changes detected - recompiling the module!
   [INFO] Compiling 2 source files to 
/Users/mattmann/tmp/tika1.15/tika-dl/target/classes
   [INFO] 
   [INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) 
@ tika-dl ---
   [INFO] Using 'UTF-8' encoding to copy filtered resources.
   [INFO] Copying 4 resources
   [INFO] Copying 3 resources
   [INFO] 
   [INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ 
tika-dl ---
   [INFO] Changes detected - recompiling the module!
   [INFO] Compiling 2 source files to 
/Users/mattmann/tmp/tika1.15/tika-dl/target/test-classes
   [INFO] 
   [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ tika-dl ---
   [INFO] Surefire report directory: 
/Users/mattmann/tmp/tika1.15/tika-dl/target/surefire-reports
   
   ---
T E S T S
   ---
   Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
   Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.691 sec - 
in org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
   Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
   Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 130.047 sec 
<<< FAILURE! - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest
   recognise(org.apache.tika.dl.imagerec.DL4JVGG16NetTest)  Time elapsed: 
130.047 sec  <<< ERROR!
   java.lang.OutOfMemoryError: Cannot allocate new FloatPointer(102760448): 
totalBytes = 1G, physicalBytes = 2G
       at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:568)
       at org.bytedeco.javacpp.Pointer.init(Pointer.java:121)
       at org.bytedeco.javacpp.FloatPointer.allocateArray(Native Method)
       at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:68)
       at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:445)
       at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:57)
       at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createFloat(DefaultDataBufferFactory.java:236)
       at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1301)
       at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1275)
       at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:252)
       at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:109)
       at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:247)
       at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4768)
       at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.toFlattened(CpuNDArrayFactory.java:502)
       at org.nd4j.linalg.factory.BaseNDArrayFactory.toFlattened(BaseNDArrayFactory.java:321)
       at org.nd4j.linalg.factory.Nd4j.toFlattened(Nd4j.java:1846)
       at org.deeplearning4j.nn.weights.WeightInitUtil.initWeights(WeightInitUtil.java:111)
       at org.deeplearning4j.nn.weights.WeightInitUtil.initWeights(WeightInitUtil.java:61)
       at org.deeplearning4j.nn.params.DefaultParamInitializer.createWeightMatrix(DefaultParamInitializer.java:145)
       at org.deeplearning4j.nn.params.DefaultParamInitializer.createWeightMatrix(DefaultParamInitializer.java:133)
       at org.deeplearning4j.nn.params.DefaultParamInitializer.init(DefaultParamInitializer.java:82)
       at org.deeplearning4j.nn.conf.layers.DenseLayer.instantiate(DenseLayer.java:56)
       at org.deeplearning4j.nn.conf.graph.LayerVertex.instantiate(LayerVertex.java:92)
       at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:370)
       at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:274)
 

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075221#comment-16075221
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313190499
 
 
   @thammegowda 
   Thanks for the reply. It would be fine if it is released in 1.16, but is it 
working fine?
   A code review would be extremely useful so that I can fix any issues that 
may be present. This would ensure smooth integration of this branch into the 
main branch.
   Thanks.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074889#comment-16074889
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313128325
 
 
   
   Thanks for pushing the changes. 
   We probably have to hold this for the release of tika 1.16.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-07-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074274#comment-16074274
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313001290
 
 
   @chrismattmann @thammegowda any update guys?
   
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063668#comment-16063668
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-311165331
 
 
   Hey guys,
   @chrismattmann @thammegowda I fixed the pending issues; please review.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059860#comment-16059860
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-310470419
 
 
   looks like the PR was merged @thammegowda and @asmehra95 thanks. Let's work 
on the pending things now and I'll be ready to test when done.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058434#comment-16058434
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r123379685
 
 

 ##
 File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
 ##
 @@ -0,0 +1,161 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.dl.imagerec;
+
+import org.apache.tika.config.Field;
+import org.apache.tika.config.Param;
+import org.apache.tika.exception.TikaConfigException;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.mime.MediaType;
+import org.apache.tika.parser.ParseContext;
+import org.apache.tika.parser.external.ExternalParser;
+import org.apache.tika.parser.recognition.ObjectRecogniser;
+import org.apache.tika.parser.recognition.RecognisedObject;
+import org.datavec.image.loader.NativeImageLoader;
+import org.deeplearning4j.nn.graph.ComputationGraph;
+import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper;
+import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModels;
+import org.deeplearning4j.util.ModelSerializer;
+import org.nd4j.linalg.api.ndarray.INDArray;
+import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
+import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.SAXException;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.*;
+import java.util.regex.Pattern;
+
+public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {
+
+    private static final Logger LOG = LoggerFactory.getLogger(DL4JVGG16Net.class);
+    public static final Set<MediaType> SUPPORTED_MIMES = Collections.singleton(MediaType.image("jpeg"));
+    private static final LineConsumer IGNORED_LINE_LOGGER = new LineConsumer() {
+        @Override
+        public void consume(String line) {
+            LOG.debug(line);
+        }
+    };
+    private static final String HOME_DIR = System.getProperty("user.home");
+    private static final String BASE_DIR = ".dl4j/trainedmodels";
+    private static String MODEL_DIR = HOME_DIR + File.separator + BASE_DIR;
+    private static String MODEL_DIR_PREPROCESSED = MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator;
+    @Field
+    private String modelType = "VGG16";
+    @Field
+    private File modelFile;
+    @Field
+    private String outPattern = "(.*) \\(score = ([0-9]+\\.[0-9]+)\\)$";
+    @Field
+    private String serialize = "yes";
+    private File locationToSave;
+    private boolean available = false;
+    private ComputationGraph model;
+
+    public Set<MediaType> getSupportedMimes() {
+        return SUPPORTED_MIMES;
+    }
+
+    @Override
+    public boolean isAvailable() {
+        return available;
+    }
+
+    @Override
+    public void initialize(Map<String, Param> params) throws TikaConfigException {
+        try {
+            TrainedModelHelper helper;
+            switch (modelType) {
+                case "VGG16NOTOP":
+                    throw new TikaConfigException("VGG16NOTOP is not supported right now");
+                    /*# TODO hookup VGGNOTOP by uncommenting following code once the issue is resolved by dl4j team
+                    modelFile = new File(MODEL_DIR_PREPROCESSED+File.separator+"vgg16_notop.zip");
+                    locationToSave= new File(MODEL_DIR+File.separator+"tikaPreprocessed"+File.separator+"vgg16.zip");
+                    helper = new TrainedModelHelper(TrainedModels.VGG16NOTOP);
+                    break;*/
+                case "VGG16":
+                    helper = new TrainedModelHelper(TrainedModels.VGG16);
+                    modelFile = new File(MODEL_DIR_PREPROCESSED + File.separator + "vgg16.zip");
+   

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058438#comment-16058438
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r123379000
 
 

 ##
 File path: 
tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml
 ##
 @@ -0,0 +1,32 @@
+
+
+
+
+
+
+image/jpeg
+
+2
+0.015
+org.apache.tika.dl.imagerec.DL4JVGG16Net
+   VGG16
+   yes
 
 Review comment:
   Let's make it
   ```xml
   true
   ```
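 
   As a rough illustration of why a typed boolean is easier to consume than the string `yes`, the sketch below shows the conversion a string-typed flag would otherwise need. The class `SerializeFlag` and the helper `parseFlag` are hypothetical names used only for this example; they are not part of Tika or of this pull request.

   ```java
   import java.util.Locale;

   class SerializeFlag {
       // Hypothetical helper (not part of Tika): accepts the "yes"/"no"
       // spelling used in the config under review as well as "true"/"false".
       static boolean parseFlag(String raw) {
           String value = raw.trim().toLowerCase(Locale.ROOT);
           switch (value) {
               case "yes":
               case "true":
                   return true;
               case "no":
               case "false":
                   return false;
               default:
                   // Anything else is a configuration mistake, so fail loudly.
                   throw new IllegalArgumentException("Not a boolean flag: " + raw);
           }
       }

       public static void main(String[] args) {
           System.out.println(parseFlag("yes"));
           System.out.println(parseFlag("true"));
       }
   }
   ```

   With a boolean-typed parameter this parsing step would not be needed in the recogniser itself, which is presumably the motivation for the suggestion.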
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058436#comment-16058436
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r123390913
 
 

 ##
 File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
 ##
 @@ -0,0 +1,161 @@
+public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {
 
 Review comment:
   I do not think it is actually needed to extend `ExternalParser`
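 
   A minimal sketch of the shape this comment points at, assuming the recogniser runs entirely in-process: the class implements the recogniser interface directly and inherits from nothing. `Recogniser` below is a simplified, hypothetical stand-in for Tika's `ObjectRecogniser` (the real interface, as the diff shows, also declares `initialize` and `getSupportedMimes`), and `InProcessRecogniser` is an illustrative name, not code from this pull request.

   ```java
   import java.util.Collections;
   import java.util.List;

   class RecogniserDemo {
       public static void main(String[] args) {
           Recogniser recogniser = new InProcessRecogniser(true);
           System.out.println("available: " + recogniser.isAvailable());
           System.out.println("labels: " + recogniser.recognise(new byte[0]));
       }
   }

   // Hypothetical, simplified stand-in for Tika's ObjectRecogniser interface.
   interface Recogniser {
       boolean isAvailable();

       List<String> recognise(byte[] image);
   }

   // Implements the interface only: there is no external command to launch,
   // so no parser base class is extended.
   class InProcessRecogniser implements Recogniser {
       private final boolean modelLoaded;

       InProcessRecogniser(boolean modelLoaded) {
           this.modelLoaded = modelLoaded;
       }

       @Override
       public boolean isAvailable() {
           return modelLoaded;
       }

       @Override
       public List<String> recognise(byte[] image) {
           // Placeholder: a real implementation would run the network here.
           return Collections.emptyList();
       }
   }
   ```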
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058435#comment-16058435
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r123379864
 
 

 ##
 File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
 ##
 @@ -0,0 +1,161 @@

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058437#comment-16058437
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on a change in pull request #182: Creation of TIKA-2298 
contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#discussion_r123391080
 
 

 ##
 File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
 ##
 @@ -0,0 +1,161 @@

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058324#comment-16058324
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-310216712
 
 
   @asmehra95 Sorry for the delay (vacations!). Reviewing it today
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036495#comment-16036495
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-306096976
 
 
   @chrismattmann ping.. any update?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027492#comment-16027492
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-304462311
 
 
   @thammegowda @chrismattmann awaiting review for this pull request...
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027486#comment-16027486
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-304462094
 
 
   frickin' awesome! I'm going to test this today @asmehra95 
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027483#comment-16027483
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann closed pull request #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159
 
 
   
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027482#comment-16027482
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-304461966
 
 
   superseded by #182 
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027480#comment-16027480
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 opened a new pull request #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182
 
 
   Note: This is a modified form of #159, which I raised earlier.
   I have imported the VGG16 model into the tika-dl module using Deeplearning4j.
   This recogniser is used in much the same way as TensorFlowRESTrecogniser, but 
it doesn't require any external setup, such as running a REST service as in the 
case of TensorFlowRESTrecogniser.
   You can read more about TensorFlowRESTrecogniser at 
https://wiki.apache.org/tika/TikaAndVision
   
   To use DL4JVGG16Net, set
   the class param to org.apache.tika.dl.imagerec.DL4JVGG16Net
   modelType to VGG16
   A sample configuration is given below for reference.
   
   ```
   
   
   
   
   image/jpeg
   
   2
   0.015
   org.apache.tika.dl.imagerec.DL4JVGG16Net
VGG16
yes
   
   
   
   
   ```
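The mail archive appears to have stripped the XML tags from the sample above, leaving only the values. The sketch below is a reconstruction, not the file from the PR. It assumes the usual tika-config layout for `ObjectRecognitionParser`, and the parameter names `topN`, `minConfidence` and `serialize` are assumptions (the text names only `class` and `modelType`). The reconstructed XML is embedded in a small Java program that parses it with the JDK's DOM parser and lists the parameters; `Vgg16ConfigSketch` and `readParams` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class Vgg16ConfigSketch {

    // Reconstructed config. ASSUMPTION: the element layout and the param names
    // topN, minConfidence and serialize are guesses; only the values, the
    // "class" param and "modelType" are taken from the PR text.
    static final String CONFIG = String.join("\n",
        "<properties>",
        "  <parsers>",
        "    <parser class=\"org.apache.tika.parser.recognition.ObjectRecognitionParser\">",
        "      <mime>image/jpeg</mime>",
        "      <params>",
        "        <param name=\"topN\" type=\"int\">2</param>",
        "        <param name=\"minConfidence\" type=\"double\">0.015</param>",
        "        <param name=\"class\" type=\"string\">org.apache.tika.dl.imagerec.DL4JVGG16Net</param>",
        "        <param name=\"modelType\" type=\"string\">VGG16</param>",
        "        <param name=\"serialize\" type=\"string\">yes</param>",
        "      </params>",
        "    </parser>",
        "  </parsers>",
        "</properties>");

    /** Parses the XML and returns param name -> value in document order. */
    static Map<String, String> readParams(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("param");
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element param = (Element) nodes.item(i);
                params.put(param.getAttribute("name"), param.getTextContent().trim());
            }
            return params;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse config", e);
        }
    }

    public static void main(String[] args) {
        // Print each parameter so the reconstructed file can be checked by eye.
        readParams(CONFIG).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

If the real parameter names differ, only the `name` attributes in `CONFIG` need to change; the default file shipped with the PR is the authoritative reference.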
   Save the configuration at your preferred location. 
   A default one is provided at 
`tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml`
   
   To run it with the default configuration, build the project, change to the root 
directory of the project, and run the following command.
   
   ```
   java -Xmx3G -cp ./tika-dl/target/tika-dl-1.15-SNAPSHOT-jar-with-dependencies.jar;tika-app/target/tika-app-1.15-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
   ```
   The `;` in the classpath is the Windows separator; on Linux and macOS use `:` instead.
   -Xmx3G is required because the VGG16 model needs quite a lot of memory to run.
   Observations:
   When loading the serialized model from disk:
   It only requires around 1200 MB of RAM to run.
   
   When the model is loaded from h5 files using helper functions:
   It requires 2500 MB of RAM to run the model (required only one time if 
serialization is set to yes).
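
As a rough illustration of how those two figures could be used, the sketch below checks the JVM's maximum heap against them before attempting to load the model. Treating the observed RAM figures as heap thresholds is an assumption, and `HeapCheckSketch` and its methods are hypothetical names, not part of Tika or DL4J.

```java
final class HeapCheckSketch {

    // Figures reported in the observations above, in megabytes.
    static final long SERIALIZED_MODEL_MB = 1200;
    static final long H5_IMPORT_MB = 2500;

    /** Memory needed, depending on whether a serialized model is already on disk. */
    static long requiredHeapMb(boolean serializedModelOnDisk) {
        return serializedModelOnDisk ? SERIALIZED_MODEL_MB : H5_IMPORT_MB;
    }

    /** True if the given maximum heap (in bytes) covers the requirement. */
    static boolean heapIsSufficient(long maxHeapBytes, boolean serializedModelOnDisk) {
        long maxHeapMb = maxHeapBytes / (1024L * 1024L);
        return maxHeapMb >= requiredHeapMb(serializedModelOnDisk);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting of the running JVM.
        long maxHeap = Runtime.getRuntime().maxMemory();
        boolean firstRun = true; // assumption: no serialized model cached yet
        if (!heapIsSufficient(maxHeap, !firstRun)) {
            System.err.println("Heap may be too small; consider a larger -Xmx value.");
        } else {
            System.out.println("Heap looks sufficient for loading the model.");
        }
    }
}
```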
   
   On the first run, the model file is downloaded automatically by DL4J's helper 
functions to the local directory .dl4j/trainedModels
   To speed up future loads, once the model has been loaded from the original 
hash files, it is serialized and saved to disk at 
.dl4j/trainedModels/tikaPreprocessed, which significantly reduces
   resource usage (especially memory consumption) for future loads.
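
The load-once-then-cache flow described above can be sketched with the standard library alone. This is not the PR's code and does not use DL4J: `ToyModel` is a hypothetical stand-in for the network, Java serialization stands in for DL4J's model serializer, and the `serialize` flag mirrors the yes/no setting mentioned in the text.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

final class ModelCacheSketch {

    /** Hypothetical stand-in for a trained network; NOT a DL4J type. */
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;
        ToyModel(double[] weights) { this.weights = weights; }
    }

    /**
     * Returns the cached model if cacheFile exists; otherwise runs the slow
     * loader (e.g. an import from the original weight files) and, when
     * serialize is true, writes the result to cacheFile for later runs.
     */
    static ToyModel loadOrCreate(File cacheFile, Supplier<ToyModel> slowLoader, boolean serialize) {
        if (cacheFile.isFile()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(cacheFile))) {
                return (ToyModel) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("Could not read cached model " + cacheFile, e);
            }
        }
        ToyModel model = slowLoader.get();
        if (serialize) {
            File parent = cacheFile.getAbsoluteFile().getParentFile();
            if (parent != null) {
                parent.mkdirs(); // create the cache directory if it is missing
            }
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(cacheFile))) {
                out.writeObject(model);
            } catch (IOException e) {
                throw new IllegalStateException("Could not write cached model " + cacheFile, e);
            }
        }
        return model;
    }

    public static void main(String[] args) {
        File cache = new File(System.getProperty("java.io.tmpdir"), "toy-model-cache.bin");
        ToyModel model = loadOrCreate(cache, () -> new ToyModel(new double[] {0.1, 0.2}), true);
        System.out.println("Loaded model with " + model.weights.length + " weights");
    }
}
```

The second call with the same cache file never invokes the slow loader, which is the property the PR relies on to cut memory use on later loads.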
   Issue Link:
   https://issues.apache.org/jira/browse/TIKA-2298
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019113#comment-16019113
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-302990927
 
 
   ping @asmehra95 any update?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011799#comment-16011799
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-301683734
 
 
   Yes, sure! I am on it, @chrismattmann.
   I will raise the PR as soon as possible.
 



> To improve object recognition parser so that it may work without external 
> RESTful service setup
> ---
>
> Key: TIKA-2298
> URL: https://issues.apache.org/jira/browse/TIKA-2298
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.14
>Reporter: Avtar Singh
>  Labels: ObjectRecognitionParser
> Fix For: 1.15
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support for Java frameworks.  All the popular neural networks were in
> C++ or python.  Since there was nothing that runs within JVM, we tried
> several ways to glue them to Tika (like CLI, JNI, gRPC, REST).
> However, this game is changing slowly now. Deeplearning4j, the most famous
> neural network library for JVM, now supports importing models that are
> pre-trained in python/C++ based kits [5].
> *Improvement:*
> It will be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup(like installation of native libraries or
> starting REST services). Reasons: easy to distribute and also to cut the IO
> time.





[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005209#comment-16005209
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-300577734
 
 
   guys #165 is now committed, so can this be updated to be inside Tika-DL? 
@asmehra95 @thammegowda 
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965312#comment-15965312
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293461556
 
 
   @thammegowda Thank you for your comment. 
   I will open a pull request once the tika-dl gets merged.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964786#comment-15964786
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

thammegowda commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293358840
 
 
   @asmehra95 appreciate your effort. Thanks for updating the code based on our 
review.
   
   1. I feel this PR should be raised against the `tika-dl` module that is being 
proposed in #165 so that we can isolate the DL4J dependencies in that module 
instead of `tika-parsers`. We have to wait until PR #165 gets merged and then 
move your classes into the tika-dl module.
   2. I am not sure what's happening with the online/offline issue. It seems to me 
that one or another necessary file is missing (the Keras JSON model, 
the weights, or the labels), so it tries to download it from S3. I will take a 
closer look and report my findings.
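
One way to narrow down point 2 is to list which of the expected files are absent from the local cache before any download is attempted. The sketch below is only an illustration: the three file names are hypothetical placeholders (the comment does not give the real ones), and `OfflineCheckSketch` is not part of Tika or DL4J.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OfflineCheckSketch {

    // HYPOTHETICAL names for the Keras JSON model, the weights and the labels.
    static final List<String> REQUIRED =
        Arrays.asList("vgg16.json", "vgg16_weights.h5", "imagenet_labels.json");

    /** Returns the names from required that are not regular files in cacheDir. */
    static List<String> missingFiles(File cacheDir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!new File(cacheDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // .dl4j/trainedModels is the cache location mentioned in the PR description.
        File cacheDir = new File(System.getProperty("user.home"), ".dl4j/trainedModels");
        List<String> missing = missingFiles(cacheDir, REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All files present; an offline run should not need S3.");
        } else {
            System.out.println("Missing locally (would trigger a download): " + missing);
        }
    }
}
```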
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964635#comment-15964635
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293141458
 
 
   Hello folks,
   I have fixed the formatting issues. @thammegowda, please review it and let me know if 
any changes are required.
   I have made it a little more customizable. You can now choose whether to 
save the model to disk or not.
   Saving a model to disk takes a lot of space (around 500 MB), but it saves 
a lot of runtime memory once the model is saved.
   
   How to use:
   Add a field to the config file:
   ```xml
   no 
   ```
   It can be yes or no.
   
   Observations:
   When loading the model from disk:
   It only requires around 1200 MB of RAM to run.
   
   When the model is loaded from h5 files using helper functions:
   It requires 2500 MB of RAM to run the model. 
   
   I think we can distribute serialized models for VGG16 instead of the 
original hash files. Would that cause any problems, @saudet @agibsonccc? One 
more thing: the VGG16 model doesn't work completely offline. It connects to the 
internet after processing the image to decode the output. Can we make it entirely 
offline?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963905#comment-15963905
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293166407
 
 
   @agibsonccc What I am saying is that, instead of downloading the image weights (h5 
file), I could write functions that download the serialized model from my repo, 
because both are approximately the same size. The Tika user would then load 
directly from this serialized model, not from the image weights. 
   
   What I am unsure about is whether the serialized model would work on all 
platforms. Is there any platform dependency in it? 
   The model will be serialized using 
   ModelSerializer.writeModel(model, locationToSave, true);
   and loaded using 
   model = ModelSerializer.restoreComputationGraph(locationToSave);
   
   Regarding the offline feature:
   
   When I try to decode predictions for an image offline, it produces an error. 
Apparently it goes online for decoding.
   Here is the stack trace when offline:
   https://gist.github.com/asmehra95/ac8bcfffbc5c1932d38a034d9b486c99
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963807#comment-15963807
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

agibsonccc commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293144691
 
 
   Not sure what you mean here. It needs to download the image weights *once*, 
not every time. You can try bundling the weights with the model if you want, 
or you can take the pretrained model, save it with dl4j, and then 
just bundle that with the jar.
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963800#comment-15963800
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

saudet commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- 
Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293142844
 
 
   /cc @turambar would know more
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963795#comment-15963795
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j
URL: https://github.com/apache/tika/pull/159#issuecomment-293141458
 
 
   Hello folks,
   I have fixed the formatting issues. @thammegowda, please review it and let me know if 
any changes are required.
   I have made it a little more customizable. You can now choose whether to 
save the model to disk or not.
   Saving a model to disk takes a lot of space (around 500 MB), but it saves 
a lot of runtime memory once the model is saved.
   
   How to use:
   Add a field to the config file:
   no 
   It can be yes or no.
   
   Observations:
   When loading the model from disk:
   It only requires around 1200 MB of RAM to run.
   
   When the model is loaded from h5 files using helper functions:
   It requires 2500 MB of RAM to run the model. 
   
   I think we can distribute serialized models for VGG16 instead of the 
original hash files. Would that cause any problems, @saudet @agibsonccc? One 
more thing: the VGG16 model doesn't work completely offline. It connects to the 
internet after processing the image to decode the output. Can we make it entirely 
offline?
 



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940472#comment-15940472
 ] 

ASF GitHub Bot commented on TIKA-2298:
--

GitHub user asmehra95 opened a pull request:

https://github.com/apache/tika/pull/159

fix for TIKA-2298 contributed by asmehra95

I have imported the VGG16 model into Apache Tika using Deeplearning4j.
This recogniser is used in much the same way as TensorFlowRESTrecogniser, 
but it doesn't require any external setup, such as running a REST service as in the 
case of TensorFlowRESTrecogniser.
You can read more about TensorFlowRESTrecogniser at 
https://wiki.apache.org/tika/TikaAndVision

To use DL4JImageRecogniser, set
the class param to org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser
modelType to VGG16
A sample configuration is given below for reference. 




image/jpeg

5
0.015
org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser
VGG16 




Save the configuration at: 
tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest

To run it, build the project, change to the root directory of the project, and 
run the command

java -Xmx3G -jar tika-app/target/tika-app-1.14.jar 
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml
 

-Xmx3G is required because the VGG16 model needs quite a lot of memory to 
run. If your system is not able to run it, you may try increasing the memory 
further.

On the first run, the model file is downloaded automatically by DL4J's helper 
functions to the local directory .dl4j/trainedModels
To speed up future loads, once the model has been loaded from the original 
hash files, it is serialized and saved to disk at 
.dl4j/trainedModels/tikaPreprocessed, which significantly reduces
resource usage (especially memory consumption) for future loads.
For more details you can read this gist: 
https://gist.github.com/asmehra95/a16c49ec91f7f0d7b39c5bf6c2483e4d
Issue Link:
https://issues.apache.org/jira/browse/TIKA-2298

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/asmehra95/tika master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #159


commit a5cd6f42dcded603f2b6de9476280c4bd95b6806
Author: asmehra95 
Date:   2017-03-24T14:21:40Z

Added dependencies for DL4JImageRecogniser parser

commit f777f21b47c8d122e6b7a0819b44977f1d571c59
Author: asmehra95 
Date:   2017-03-24T14:28:54Z

Imported VGG16 model via deeplearning4j






Re: [jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-03-22 Thread Avtar Singh Mehra
Thank you TG,
The problem seemed to be with the DL4J helper functions. However, I tried
importing the model without the helper function; it imported perfectly and
gave pretty good results. I have saved the serialized model to reduce the
resources needed to run it. We can either provide this serialized model, or
load the model once and save it. I am trying the latter approach because
saved models take a huge amount of space to store (around 500 MB for a
53 MB model). So far I have tested only with the VGG16NoTop model (and am
still testing), and there is still a problem with the helper functions. I
will resolve the issue soon and open a pull request for it.
This is what I have been working on:
https://github.com/asmehra95/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/recognition/dl4j/DL4JImageRecogniser.java


On 21 March 2017 at 22:05, Thamme Gowda (JIRA)  wrote:

>
> [ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934847#comment-15934847 ]
>
> Thamme Gowda commented on TIKA-2298:
>
> [~asmehra95]
> Please share a link to your code, I will have a look at this!
>
> Could you also refer to my example code at
> https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example
> and see what flags to pass to the importer (especially flags to disable
> further training)?
>
> PR to that repo with your VGG16 example would be greatly appreciated!


Re: [jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-03-22 Thread Avtar Singh Mehra
I'm sorry, there are some errors with parameter passing, but the rest of
the code is working. I will resolve it soon.
Thank you
Avtar Singh

On 23 March 2017 at 00:10, Avtar Singh Mehra  wrote:

> Thank you TG,
> The problem seemed to be with the DL4J helper functions. However, I tried
> importing the model without the helper function; it imported perfectly and
> gave pretty good results. I have saved the serialized model to reduce the
> resources needed to run it. We can either provide this serialized model, or
> load the model once and save it. I am trying the latter approach because
> saved models take a huge amount of space to store (around 500 MB for a
> 53 MB model). So far I have tested only with the VGG16NoTop model (and am
> still testing), and there is still a problem with the helper functions. I
> will resolve the issue soon and open a pull request for it.
> This is what I have been working on:
> https://github.com/asmehra95/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/recognition/dl4j/DL4JImageRecogniser.java
>
> On 21 March 2017 at 22:05, Thamme Gowda (JIRA)  wrote:
>
>>
>> [ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934847#comment-15934847 ]
>>
>> Thamme Gowda commented on TIKA-2298:
>>
>> [~asmehra95]
>> Please share a link to your code, I will have a look at this!
>>
>> Could you also refer to my example code at
>> https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example
>> and see what flags to pass to the importer (especially flags to disable
>> further training)?
>>
>> PR to that repo with your VGG16 example would be greatly appreciated!


[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-03-21 Thread Thamme Gowda (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934847#comment-15934847
 ] 

Thamme Gowda commented on TIKA-2298:


[~asmehra95]
Please share a link to your code, I will have a look at this!

Could you also refer to my example code at 
https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example
 and see what flags to pass to the importer (especially flags to disable 
further training)?

PR to that repo with your VGG16 example would be greatly appreciated!



[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-03-20 Thread Avtar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932626#comment-15932626
 ] 

Avtar Singh commented on TIKA-2298:
---

I am not able to run the VGG16 model in DL4J.
When I try to run the full-fledged model, I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new FloatPointer(138357544): totalBytes = 1G, physicalBytes = 2G
    at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:76)
    at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:445)
    at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:57)
    at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createFloat(DefaultDataBufferFactory.java:236)
    at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1301)
    at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1275)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:252)
    at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:109)
    at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:247)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4768)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4726)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3861)
    at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:342)
    at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:274)
    at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:483)
    at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:471)
    at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:178)
    at modelImport.ModelImportConfig.main(ModelImportConfig.java:18)
Caused by: java.lang.OutOfMemoryError: Native allocator returned address == 0
    at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:70)
    ... 17 more

When I run the model labelled 'NoTop', it says: Invalid configuration.
I found in the source code of the helper functions that the JSON file needs
fixing.

I am running on a 6th-gen i5 with 4 GB RAM.
I tried two operating systems: Ubuntu and Windows.
Is there any way I can run it?
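
For scale, the size of the failed allocation can be worked out from the
element count in the error message. The sketch below assumes that
FloatPointer(138357544) means a single buffer of that many 32-bit floats;
the class and method names are made up for illustration. Under that
assumption the request is 553,430,176 bytes, roughly 528 MiB in one native
allocation, which is a large share of the memory available on a 4 GB
machine.

```java
/** Illustrative arithmetic only; class and method names are hypothetical. */
public class FloatBufferSize {

    /** Bytes needed for a contiguous buffer of the given number of 32-bit floats. */
    public static long bytesForFloats(long count) {
        return count * Float.BYTES; // Float.BYTES is 4
    }

    /** Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Element count copied from the "Cannot allocate new FloatPointer(...)" message.
        long elements = 138_357_544L;
        long bytes = bytesForFloats(elements);
        System.out.printf("%d floats need %d bytes (about %.1f MiB)%n",
                elements, bytes, toMiB(bytes));
    }
}
```

Whether raising the JVM or JavaCPP memory limits would be enough on this
hardware is not established in this thread.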
