[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978095#comment-15978095 ]

ASF GitHub Bot commented on TIKA-2306:

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295912340

Thanks for updating the URL in the Dockerfile. I have now made the minor changes on the wiki page (V3 to V4, etc.). I will update the results of running the parser by today. Thank you.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Update Inception v3 to Inception v4 in Object recognition parser
>
> Key: TIKA-2306
> URL: https://issues.apache.org/jira/browse/TIKA-2306
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.14
> Reporter: Kranthi Kiran GV
> Assignee: Thamme Gowda
> Priority: Minor
> Labels: inception, object_recognition
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Object Recognition Parser currently uses the Inception V3 model for the object
> classification task. Google released a newer Inception V4 model [1][2].
> It has an improved Top-1 accuracy of 80.2 and Top-5 accuracy of 95.2 [3].
> I believe that the Tika community would benefit from it. I will be working on
> this issue in the next few days.
> [1] https://research.googleblog.com/2016/08/improving-inception-and-image.html
> [2] https://arxiv.org/abs/1602.07261
> [3] https://github.com/tensorflow/models/tree/master/slim

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
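For readers unfamiliar with the Top-1 and Top-5 figures quoted in the issue description, the following is a minimal sketch of how such a metric is usually computed: a prediction counts as correct if the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are illustrative assumptions; they are not taken from Tika or from the Inception evaluation code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true label is among the k best-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy data: 4 samples, 3 classes (hypothetical numbers, not Inception output).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 1, 0, 2]

print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

Top-5 is always at least as high as Top-1 on the same data, because every Top-1 hit is also a Top-5 hit.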
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978086#comment-15978086 ]

Kranthi Kiran GV commented on TIKA-2306:

[~chrismattmann] Apologies! I have updated the wiki page at https://wiki.apache.org/tika/TikaAndVision
Everything is updated and in place now. Thank you. Glad you liked the work, Chris!
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1599#comment-1599 ]

Chris A. Mattmann commented on TIKA-2306:

The wiki update is critical, BTW. Normally I would say err on the side of updating the wiki *before* committing. That way our documentation is an explicit prerequisite to the code.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1598#comment-1598 ]

Chris A. Mattmann commented on TIKA-2306:

great work [~kranthigv] and [~thammegowda]!
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977615#comment-15977615 ]

Hudson commented on TIKA-2306:

SUCCESS: Integrated in Jenkins build Tika-trunk #1245 (See [https://builds.apache.org/job/Tika-trunk/1245/])

fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/236db96393d94756dbc2e3f40b318f8f93b95dff])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py

fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/0c0bd4bec2312355d2bc48426f8ec94306d0e4a0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py

fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/09cb2df973f20e3a877ca1309b67384264650be0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionRestDockerfile

fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/f92809ac19d5bef903ef1ac393092e6a13884fc0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionRestDockerfile

fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/be773cacaf3c344c11fff9b85ebaf1d0dc8b5174])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/recognition/ObjectRecognitionParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/recognition/tf/TensorflowImageRecParserTest.java
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977513#comment-15977513 ]

ASF GitHub Bot commented on TIKA-2306:

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295912340

Thanks for updating the URL in the Dockerfile. I have now made the minor changes on the wiki page (V3 to V4, etc.). I will update the results of running the parser by today. Thank you.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977487#comment-15977487 ]

ASF GitHub Bot commented on TIKA-2306:

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295906412

@KranthiGV It is now merged. I updated the URL in the Dockerfile to the ASF repo.
Could you please update the URLs in the wiki https://wiki.apache.org/tika/TikaAndVision
Inception/v3 URLs should now be replaced by inception/v4.
Let us know if you need access to the wiki.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977467#comment-15977467 ]

ASF GitHub Bot commented on TIKA-2306:

thammegowda closed pull request #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975470#comment-15975470 ]

ASF GitHub Bot commented on TIKA-2306:

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295430728

That sounds great! I'm looking at _TMP_IMAGE_METADATA_PARSER and the CompositeParser.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975376#comment-15975376 ]

ASF GitHub Bot commented on TIKA-2306:

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295408367

Got it. Users who aren't using Inception will not lose JPEG metadata.
We can merge this PR and support the composition of JPEG and ObjectRecognition in a new PR.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975255#comment-15975255 ]

ASF GitHub Bot commented on TIKA-2306:

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295382171

> That's right.

OK, I'm fine with this behavior. Users who want Inception lose JPEG metadata (for now). What I don't want to repeat is users who aren't using the ObjectRecognizer losing JPEG metadata.

Bonus points for copy/paste of TesseractOCRParser's metadata insertion, but not required, IMHO.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975246#comment-15975246 ]

ASF GitHub Bot commented on TIKA-2306:

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295379808

> It looks like you have to turn the ObjectRecognitionParser on via config, etc.? So, the default behavior is that the JPEGParser is called and the ObjectRecognitionParser is not called, right?

That's right.

> To merge metadata, see: _TMP_IMAGE_METADATA_PARSER in TesseractOCRParser

Thanks! I will have a look.
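The opt-in behavior confirmed in this exchange (the JPEG parser handles images by default, and object recognition only runs when explicitly configured) can be pictured with a toy lookup table. This is only a conceptual sketch: `DEFAULT_PARSERS`, `select_parser` and the string parser names are hypothetical, and Tika's real configuration mechanism is not shown here.

```python
from typing import Dict, Optional

# Hypothetical default mapping from MIME type to a parser name.
DEFAULT_PARSERS: Dict[str, str] = {
    "image/jpeg": "JpegParser",
    "application/pdf": "PdfParser",
}


def select_parser(mime_type: str,
                  overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the parser name for a MIME type, honoring user overrides."""
    # Copy the defaults so a user's config never mutates the shared table.
    table = dict(DEFAULT_PARSERS)
    if overrides:
        table.update(overrides)
    return table.get(mime_type)


# Without configuration, JPEG files go to the default parser.
print(select_parser("image/jpeg"))

# With an explicit override, the recognition parser replaces it for JPEGs.
user_config = {"image/jpeg": "ObjectRecognitionParser"}
print(select_parser("image/jpeg", user_config))
```

In this model the override replaces the default parser rather than adding to it, which is why users who opt in lose the JPEG metadata unless the two parsers are composed.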
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975243#comment-15975243 ]

ASF GitHub Bot commented on TIKA-2306:

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295379227

It looks like you have to turn the ObjectRecognitionParser on via config, etc.? So, the default behavior is that the JPEGParser is called and the ObjectRecognitionParser is not called, right?

To merge metadata, see: _TMP_IMAGE_METADATA_PARSER in TesseractOCRParser
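The general idea behind the pointer above (run a second parser into a temporary metadata object, then copy its entries into the main one) can be sketched as follows. This is a language-neutral illustration in Python, not the TesseractOCRParser code: the `Metadata` type, `merge_metadata`, `parse_with_both`, the two fake parsers and the metadata keys are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: metadata maps a key to a list of values, and a
# parser is any callable that fills a metadata dict for the given bytes.
Metadata = Dict[str, List[str]]
Parser = Callable[[bytes, Metadata], None]


def merge_metadata(target: Metadata, extra: Metadata) -> None:
    """Copy values from extra into target, skipping values already present."""
    for key, values in extra.items():
        existing = target.setdefault(key, [])
        for value in values:
            if value not in existing:
                existing.append(value)


def parse_with_both(data: bytes, primary: Parser, secondary: Parser) -> Metadata:
    """Run the primary parser, then fold in the secondary parser's metadata."""
    metadata: Metadata = {}
    primary(data, metadata)
    # The secondary parser writes into a temporary dict so it cannot
    # clobber anything the primary parser has already recorded.
    tmp: Metadata = {}
    secondary(data, tmp)
    merge_metadata(metadata, tmp)
    return metadata


def fake_recogniser(data: bytes, metadata: Metadata) -> None:
    metadata.setdefault("objects", []).append("tabby cat")
    metadata.setdefault("Content-Type", []).append("image/jpeg")


def fake_jpeg_parser(data: bytes, metadata: Metadata) -> None:
    metadata.setdefault("image_width", []).append("640")
    metadata.setdefault("Content-Type", []).append("image/jpeg")


result = parse_with_both(b"", fake_recogniser, fake_jpeg_parser)
print(result)
```

Writing into a temporary dict first keeps the merge policy (here: append, de-duplicated) in one place instead of relying on each parser to avoid overwriting keys.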
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975147#comment-15975147 ]

ASF GitHub Bot commented on TIKA-2306:

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295358397

> Wait, does this prevent jpeg metadata from being extracted?

Oh yes! That is the case with the existing ObjectRecognition Parser.
How do we run two different parsers for the same file and merge the metadata?
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975132#comment-15975132 ]

ASF GitHub Bot commented on TIKA-2306:

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295354514

What's the worst that could happen? Go for it...
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975112#comment-15975112 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-295351578
@tballison Can we merge this before the Tika 1.15 release? It's ready to be merged; I had it tested.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969539#comment-15969539 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-294238344
Thanks, @KranthiGV. LGTM!
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968947#comment-15968947 ]
ASF GitHub Bot commented on TIKA-2306:
KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-294143638
@thammegowda I have made the necessary changes in the "Reduced disk I/O" commit (https://github.com/apache/tika/pull/163/commits/db8c81410b8468e1a524ed2dbb28abc7154fb11d)
Performance was compared by running the parser 50 times on one image.
```
script.sh:
n=0; while [[ $n -lt 50 ]]; do java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366; n=$((n+1)); done

time ./script.sh
```
Before changes: `real 4m33.334s`
After changes: `real 1m0.736s`
TODO after merging:
1) Update the inceptionapi.py link to Apache's repo link.
2) Document the usage in Wiki.
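To put the two `real` timings quoted in the comment above on a per-run basis, here is a minimal Python sketch. It only re-does the arithmetic on the reported numbers; `parse_real` and `RUNS` are names made up for this illustration and are not part of Tika or of the PR.

```python
import re


def parse_real(value):
    """Convert a `time`-style duration such as '4m33.334s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


RUNS = 50  # the loop in script.sh runs the parser 50 times

before = parse_real("4m33.334s")  # wall-clock time reported before the change
after = parse_real("1m0.736s")    # wall-clock time reported after the change

print("per run before: %.2fs" % (before / RUNS))
print("per run after:  %.2fs" % (after / RUNS))
print("speedup: %.1fx" % (before / after))
```

On the reported figures this works out to roughly 5.5 s per run before and 1.2 s after, about a 4.5x speedup. Both totals come from a single `time` invocation each, so they include JVM startup and network latency.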
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968627#comment-15968627 ]
ASF GitHub Bot commented on TIKA-2306:
KranthiGV commented on a change in pull request #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#discussion_r111530356
## File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py ##
@@ -44,169 +49,205 @@ from six.moves import urllib
 import tensorflow as tf
+from datasets import imagenet, dataset_utils
+from nets import inception
+from preprocessing import inception_preprocessing
+
+slim = tf.contrib.slim
+
 FLAGS = tf.app.flags.FLAGS
-# classify_image_graph_def.pb:
-#   Binary representation of the GraphDef protocol buffer.
-# imagenet_synset_to_human_label_map.txt:
+# inception_v4.ckpt
+#   Inception V4 checkpoint file.
+# imagenet_metadata.txt
 #   Map from synset ID to a human readable string.
-# imagenet_2012_challenge_label_map_proto.pbtxt:
+# imagenet_lsvrc_2015_synsets.txt
 #   Text representation of a protocol buffer mapping a label to synset ID.
 tf.app.flags.DEFINE_string(
     'model_dir', '/tmp/imagenet',
-    """Path to classify_image_graph_def.pb, """
-    """imagenet_synset_to_human_label_map.txt, and """
-    """imagenet_2012_challenge_label_map_proto.pbtxt.""")
+    """Path to inception_v4.ckpt, """
+    """imagenet_lsvrc_2015_synsets.txt, and """
+    """imagenet_metadata.txt.""")
 tf.app.flags.DEFINE_string('image_file', '',
                            """Absolute path to image file.""")
 tf.app.flags.DEFINE_integer('num_top_predictions', 5,
                             """Display this many predictions.""")
 # pylint: disable=line-too-long
-DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
+DATA_URL = 'http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz'
 # pylint: enable=line-too-long
-class NodeLookup(object):
-  """Converts integer node ID's to human readable labels."""
+def create_readable_names_for_imagenet_labels():
+    """Create a dict mapping label id to human readable string.
+
+    Returns:
+        labels_to_names: dictionary where keys are integers from to 1000
+        and values are human-readable names.
+
+    We retrieve a synset file, which contains a list of valid synset labels used
+    by ILSVRC competition. There is one synset one per line, eg.
+    #   n01440764
+    #   n01443537
+    We also retrieve a synset_to_human_file, which contains a mapping from synsets
+    to human-readable names for every synset in Imagenet. These are stored in a
+    tsv format, as follows:
+    #   n02119247    black fox
+    #   n02119359    silver fox
+    We assign each synset (in alphabetical order) an integer, starting from 1
+    (since 0 is reserved for the background class).
+
+    Code is based on
+      https://github.com/tensorflow/models/blob/master/inception/inception/data/build_imagenet_data.py#L463
+    """
+
+    # pylint: disable=line-too-long
+
+    dest_directory = FLAGS.model_dir
+
+    synset_list = [s.strip() for s in open(os.path.join(
+        dest_directory, 'imagenet_lsvrc_2015_synsets.txt')).readlines()]
+    num_synsets_in_ilsvrc = len(synset_list)
+    assert num_synsets_in_ilsvrc == 1000
-  def __init__(self,
-               label_lookup_path=None,
-               uid_lookup_path=None):
-    if not label_lookup_path:
-      label_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
-    if not uid_lookup_path:
-      uid_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
-    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)
+    synset_to_human_list = open(os.path.join(
+        dest_directory, 'imagenet_metadata.txt')).readlines()
+    num_synsets_in_all_imagenet = len(synset_to_human_list)
+    assert num_synsets_in_all_imagenet == 21842
-  def load(self, label_lookup_path, uid_lookup_path):
-    """Loads a human readable English name for each softmax node.
+    synset_to_human = {}
+    for s in synset_to_human_list:
+        parts = s.strip().split('\t')
+        assert len(parts) == 2
+        synset = parts[0]
+        human = parts[1]
+        synset_to_human[synset] = human
+
+    label_index = 1
+    labels_to_names = {0: 'background'}
+    for synset in synset_list:
+        name = synset_to_human[synset]
+        labels_to_names[label_index] = name
+        label_index += 1
+
+    return labels_to_names
+
+
+def run_inference_on_image(image):
+    """Runs inference on an image.
     Args:
-      label_lookup_path: string UID to integer node ID.
-      uid_lookup_path: string UID to human-readable string.
+      image: Image file name.
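The label mapping that `create_readable_names_for_imagenet_labels` builds in the hunk above can be sketched without TensorFlow or the real data files. This is only an illustration under stated assumptions: `build_labels_to_names` is a hypothetical helper, the two fox entries are the examples from the docstring, and the `n00000000` row is invented sample data standing in for an ImageNet synset that is not part of ILSVRC.

```python
def build_labels_to_names(synset_lines, metadata_lines):
    """Map integer labels to names; label 0 is reserved for 'background'."""
    # Tab-separated "synset<TAB>human readable name" rows, one per line.
    synset_to_human = {}
    for line in metadata_lines:
        line = line.strip()
        if not line:
            continue
        synset, human = line.split("\t", 1)
        synset_to_human[synset] = human

    # Synsets are numbered from 1 in the order they appear in the list.
    labels_to_names = {0: "background"}
    synsets = [s.strip() for s in synset_lines if s.strip()]
    for index, synset in enumerate(synsets, start=1):
        labels_to_names[index] = synset_to_human[synset]
    return labels_to_names


# Sample data only: the real files hold 1000 and 21842 rows respectively.
sample_synsets = ["n02119247\n", "n02119359\n"]
sample_metadata = [
    "n02119247\tblack fox\n",
    "n02119359\tsilver fox\n",
    "n00000000\tunused entry\n",  # hypothetical row, not a real synset
]

labels = build_labels_to_names(sample_synsets, sample_metadata)
print(labels)
```

With the sample rows this prints `{0: 'background', 1: 'black fox', 2: 'silver fox'}`. Taking iterables of lines instead of file paths keeps the sketch independent of `FLAGS.model_dir`. Unlike the code in the PR, it does not assert the 1000 and 21842 row counts.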
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968623#comment-15968623 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on a change in pull request #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#discussion_r111530205
## File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py ##
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968616#comment-15968616 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-294091086
@KranthiGV Also, please pull the latest changes from the master branch and merge them into this branch.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968610#comment-15968610 ]
ASF GitHub Bot commented on TIKA-2306:
KranthiGV commented on a change in pull request #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#discussion_r111529725
## File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py ##
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968599#comment-15968599 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on a change in pull request #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#discussion_r111529125
## File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py ##
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968550#comment-15968550 ]
ASF GitHub Bot commented on TIKA-2306:
thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-294076746
Reviewing this today.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966467#comment-15966467 ]
ASF GitHub Bot commented on TIKA-2306:
KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception v4 in Object recognition parser
URL: https://github.com/apache/tika/pull/163#issuecomment-293681965
@thammegowda @grossws Can you please review this PR? It would enable me to work on [TIKA-2308](https://issues.apache.org/jira/browse/TIKA-2308) and to add support for other image MIME types. Since the current implementation and this one are different, an early review would save us the trouble of redoing a lot of work when this PR gets merged.
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946536#comment-15946536 ]
Kranthi Kiran GV commented on TIKA-2306:
[~chrismattmann] [~thammegowda] ( [~tgow...@gmail.com] [~chris.a.mattm...@jpl.nasa.gov] ) Please review the PR at https://github.com/apache/tika/pull/163
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945664#comment-15945664 ]
Raunaq Abhyankar commented on TIKA-2306:
Hi, I have sent in a PR. Please review it and suggest any necessary changes. I haven't changed the inceptionapi.py script, but that shouldn't be very difficult and can be done once the PR is reviewed. [~thammegowda]
[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943705#comment-15943705 ]
Raunaq Abhyankar commented on TIKA-2306:
Hi! I was able to successfully classify an image using Inception v4, and the results are better than with Inception v3! However, I'm finding it difficult to merge the independent classification script with the existing Tika codebase. Any tips on the flow of code in Tika? How exactly is "classify_image.py" run?