[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978095#comment-15978095
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295912340
 
 
   Thanks for updating the URL in the Dockerfile.
   I have now made the minor changes on the wiki page (V3 to V4, etc.).
   I will update the results of running the parser by today.
   Thank you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Inception v3 to Inception v4 in Object recognition parser 
> -
>
> Key: TIKA-2306
> URL: https://issues.apache.org/jira/browse/TIKA-2306
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.14
>Reporter: Kranthi Kiran GV
>Assignee: Thamme Gowda
>Priority: Minor
>  Labels: inception, object_recognition
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The Object Recognition Parser currently uses the Inception V3 model for the object 
> classification task. Google has released a newer Inception V4 model [1][2].
> It has an improved Top-1 accuracy of 80.2% and Top-5 accuracy of 95.2% [3].
> I believe the Tika community would benefit from it. I will be working on 
> this issue in the next few days.
> [1] https://research.googleblog.com/2016/08/improving-inception-and-image.html
> [2] https://arxiv.org/abs/1602.07261
> [3] https://github.com/tensorflow/models/tree/master/slim
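
For readers unfamiliar with the Top-1 and Top-5 figures quoted in the issue, here is a minimal sketch of how such accuracies are typically computed. The function name `top_k_accuracy` and the toy scores are assumptions made for illustration; they are not Inception output and not part of Tika.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy scores for 4 images over 6 classes (made-up numbers for illustration).
scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # true class 0: Top-1 hit
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0: Top-1 miss, Top-5 hit
    [0.05, 0.05, 0.60, 0.10, 0.10, 0.10],  # true class 2: Top-1 hit
    [0.25, 0.25, 0.20, 0.15, 0.10, 0.05],  # true class 5: missed by both
]
labels = [0, 0, 2, 5]

top1 = top_k_accuracy(scores, labels, 1)
top5 = top_k_accuracy(scores, labels, 5)
print(top1, top5)
```

Top-5 is always at least as high as Top-1, since a prediction that is correct at rank 1 is also within the first five ranks.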



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread Kranthi Kiran GV (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978086#comment-15978086
 ] 

Kranthi Kiran GV commented on TIKA-2306:


[~chrismattmann]
Apologies! I have updated the wiki page at 
https://wiki.apache.org/tika/TikaAndVision 
Everything is updated and in place now.
Thank you. Glad you liked the work, Chris! 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1599#comment-1599
 ] 

Chris A. Mattmann commented on TIKA-2306:
-

The wiki update is critical, BTW. Normally I would say err on the side of 
updating the wiki *before* committing. That way our documentation is an 
explicit prerequisite to the code.



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1598#comment-1598
 ] 

Chris A. Mattmann commented on TIKA-2306:
-

great work [~kranthigv] and [~thammegowda]!



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977615#comment-15977615
 ] 

Hudson commented on TIKA-2306:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1245 (See [https://builds.apache.org/job/Tika-trunk/1245/])
fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/236db96393d94756dbc2e3f40b318f8f93b95dff])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/0c0bd4bec2312355d2bc48426f8ec94306d0e4a0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/09cb2df973f20e3a877ca1309b67384264650be0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionRestDockerfile
fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/f92809ac19d5bef903ef1ac393092e6a13884fc0])
* (edit) tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionRestDockerfile
fix for TIKA-2306 contributed by kranthigv (kranthi.gv: [https://github.com/apache/tika/commit/be773cacaf3c344c11fff9b85ebaf1d0dc8b5174])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/recognition/ObjectRecognitionParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/recognition/tf/TensorflowImageRecParserTest.java





[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977487#comment-15977487
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295906412
 
 
   @KranthiGV It is now merged. I updated the URL in the Dockerfile to point to the ASF repo.
   
   Could you please update the URLs in the wiki?
https://wiki.apache.org/tika/TikaAndVision 
   The inception/v3 URLs should now be replaced by inception/v4.
   
   Let us know if you need access to the wiki.
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977467#comment-15977467
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda closed pull request #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163
 
 
   
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975470#comment-15975470
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295430728
 
 
   That sounds great! I'm looking at _TMP_IMAGE_METADATA_PARSER and the 
CompositeParser.
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975376#comment-15975376
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295408367
 
 
   Got it. Users who aren't using Inception will not lose JPEG metadata.
   We can merge this PR and support the composition of JPEG and 
ObjectRecognition in a new PR.
   
   
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975255#comment-15975255
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295382171
 
 
   > That's right.
   Ok, I'm ok with this behavior.  Users who want inception lose JPEG metadata 
(for now).  What I don't want to repeat is users who aren't using the 
ObjectRecognizer losing JPEG metadata.
   
   Bonus points for copy/paste of TesseractOCRParser's metadata insertion, but 
not required, IMHO.
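
The trade-off accepted here can be pictured with a small sketch, assuming a simple lookup in which an explicitly configured parser replaces the default one for a media type. The function `select_parser` and the two dictionaries are hypothetical and do not mirror Tika's actual configuration mechanism.

```python
def select_parser(media_type, default_parsers, overrides):
    """Return the parser name for media_type, preferring an explicit override."""
    return overrides.get(media_type, default_parsers[media_type])


# Hypothetical registry: the JPEG parser handles image/jpeg by default.
default_parsers = {"image/jpeg": "JPEGParser"}

# No configuration: the default parser runs, so JPEG metadata is kept.
without_config = select_parser("image/jpeg", default_parsers, {})

# Opt-in configuration: the object recogniser replaces the default parser,
# so JPEG metadata is lost unless the two parsers are composed.
with_config = select_parser(
    "image/jpeg", default_parsers, {"image/jpeg": "ObjectRecognitionParser"}
)
print(without_config, with_config)
```

Because the override replaces rather than extends the default, only users who opt in are affected, which is the behavior described as acceptable above.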
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975246#comment-15975246
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295379808
 
 
   > It looks like you have to turn the ObjectRecognitionParser on via config, 
etc? So, the default behavior is that the JPEGParser is called and the 
ObjectRecognitionParser is not called, right?
   
   That's right.
   
   > To merge metadata, see: _TMP_IMAGE_METADATA_PARSER in TesseractOCRParser
   
   Thanks! I will have a look

 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975243#comment-15975243
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295379227
 
 
   It looks like you have to turn the ObjectRecognitionParser on via config, 
etc?  So, the default behavior is that the JPEGParser is called and the 
ObjectRecognitionParser is not called, right?
   
   To merge metadata, see:  _TMP_IMAGE_METADATA_PARSER in TesseractOCRParser
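
As a rough illustration of what merging metadata from two parsers could look like, here is a minimal sketch. It is not the TesseractOCRParser code; the function `parse_with_merge`, the two stand-in parsers, and the metadata keys and values are all hypothetical.

```python
def parse_with_merge(data, parsers):
    """Run each parser on the same bytes and merge their multi-valued metadata."""
    merged = {}
    for parser in parsers:
        for key, values in parser(data).items():
            bucket = merged.setdefault(key, [])
            for value in values:
                # Keep every distinct value, in first-seen order.
                if value not in bucket:
                    bucket.append(value)
    return merged


def jpeg_metadata_parser(data):
    # Stand-in for an image metadata parser (made-up keys and values).
    return {"Content-Type": ["image/jpeg"], "Image-Width": ["640"]}


def object_recognition_parser(data):
    # Stand-in for an object recogniser (made-up keys and values).
    return {"Content-Type": ["image/jpeg"], "OBJECT": ["tabby cat (0.91)"]}


merged = parse_with_merge(
    b"fake image bytes", [jpeg_metadata_parser, object_recognition_parser]
)
print(merged)
```

Both parsers see the same input, and keys they share are deduplicated rather than overwritten, so neither parser's output is lost.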
 




[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975147#comment-15975147
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295358397
 
 
   > Wait, does this prevent jpeg metadata from being extracted?
   
   Oh yes! That is the case with the existing ObjectRecognitionParser.
   
   How do we run two different parsers for the same file and merge the metadata?
 




[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975132#comment-15975132
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

tballison commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295354514
 
 
   What's the worst that could happen?  Go for it...
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975112#comment-15975112
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-295351578
 
 
   @tballison Can we merge this before the Tika 1.15 release?
   It's ready to be merged; I have tested it.
   
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969539#comment-15969539
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-294238344
 
 
   Thanks, @KranthiGV.
   LGTM!
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968947#comment-15968947
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-294143638
 
 
   @thammegowda
   I have made the necessary changes in the "Reduced disk I/O" commit
   (https://github.com/apache/tika/pull/163/commits/db8c81410b8468e1a524ed2dbb28abc7154fb11d).
   I compared performance by running the parser 50 times on one image.
   ```
   # script.sh: run the parser 50 times on the same image
   n=0
   while [[ $n -lt 50 ]]; do
     java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar \
       --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml \
       http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366
     n=$((n+1))
   done

   # then, from the shell:
   time ./script.sh
   ```
   Before changes:
   `real 4m33.334s`

   After changes:
   `real 1m0.736s`

   TODO after merging:
   1) Update the inceptionapi.py link to point to Apache's repo.
   2) Document the usage on the wiki.
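
To put the two `real` timings above in per-run terms, here is a rough sketch that only redoes the arithmetic on the reported numbers. The helper `parse_real_time` is a hypothetical name introduced for illustration. Each run also includes JVM startup and the image download, so the per-run figures are not pure inference time.

```python
def parse_real_time(text):
    """Convert a `time` value such as '4m33.334s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


RUNS = 50  # number of iterations in script.sh

before = parse_real_time("4m33.334s")
after = parse_real_time("1m0.736s")

per_run_before = before / RUNS
per_run_after = after / RUNS
speedup = before / after

print(f"before:  {per_run_before:.2f} s/run")
print(f"after:   {per_run_after:.2f} s/run")
print(f"speedup: {speedup:.1f}x")
```

This works out to roughly 5.5 s per run before the change and 1.2 s after it, a speedup of about 4.5x.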
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968627#comment-15968627
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on a change in pull request #163: TIKA-2306: Update 
Inception v3 to Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#discussion_r111530356
 
 

 ##
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
 ##
 @@ -44,169 +49,205 @@
 from six.moves import urllib
 import tensorflow as tf
 
+from datasets import imagenet, dataset_utils
+from nets import inception
+from preprocessing import inception_preprocessing
+
+slim = tf.contrib.slim
+
 FLAGS = tf.app.flags.FLAGS
 
-# classify_image_graph_def.pb:
-#   Binary representation of the GraphDef protocol buffer.
-# imagenet_synset_to_human_label_map.txt:
+# inception_v4.ckpt
+#   Inception V4 checkpoint file.
+# imagenet_metadata.txt
 #   Map from synset ID to a human readable string.
-# imagenet_2012_challenge_label_map_proto.pbtxt:
+# imagenet_lsvrc_2015_synsets.txt
 #   Text representation of a protocol buffer mapping a label to synset ID.
 tf.app.flags.DEFINE_string(
     'model_dir', '/tmp/imagenet',
-    """Path to classify_image_graph_def.pb, """
-    """imagenet_synset_to_human_label_map.txt, and """
-    """imagenet_2012_challenge_label_map_proto.pbtxt.""")
+    """Path to inception_v4.ckpt, """
+    """imagenet_lsvrc_2015_synsets.txt, and """
+    """imagenet_metadata.txt.""")
 tf.app.flags.DEFINE_string('image_file', '',
                            """Absolute path to image file.""")
 tf.app.flags.DEFINE_integer('num_top_predictions', 5,
                             """Display this many predictions.""")
 
 # pylint: disable=line-too-long
-DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
+DATA_URL = 'http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz'
 # pylint: enable=line-too-long
 
 
-class NodeLookup(object):
-  """Converts integer node ID's to human readable labels."""
+def create_readable_names_for_imagenet_labels():
+    """Create a dict mapping label id to human readable string.
+
+    Returns:
+        labels_to_names: dictionary where keys are integers from 0 to 1000
+        and values are human-readable names.
+
+    We retrieve a synset file, which contains a list of valid synset labels used
+    by ILSVRC competition. There is one synset per line, e.g.
+    #   n01440764
+    #   n01443537
+    We also retrieve a synset_to_human_file, which contains a mapping from synsets
+    to human-readable names for every synset in Imagenet. These are stored in a
+    tsv format, as follows:
+    #   n02119247    black fox
+    #   n02119359    silver fox
+    We assign each synset (in alphabetical order) an integer, starting from 1
+    (since 0 is reserved for the background class).
+
+    Code is based on
+    https://github.com/tensorflow/models/blob/master/inception/inception/data/build_imagenet_data.py#L463
+    """
+
+    # pylint: disable=line-too-long
+
+    dest_directory = FLAGS.model_dir
+
+    synset_list = [s.strip() for s in open(os.path.join(
+        dest_directory, 'imagenet_lsvrc_2015_synsets.txt')).readlines()]
+    num_synsets_in_ilsvrc = len(synset_list)
+    assert num_synsets_in_ilsvrc == 1000
 
-  def __init__(self,
-               label_lookup_path=None,
-               uid_lookup_path=None):
-    if not label_lookup_path:
-      label_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
-    if not uid_lookup_path:
-      uid_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
-    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)
+    synset_to_human_list = open(os.path.join(
+        dest_directory, 'imagenet_metadata.txt')).readlines()
+    num_synsets_in_all_imagenet = len(synset_to_human_list)
+    assert num_synsets_in_all_imagenet == 21842
 
-  def load(self, label_lookup_path, uid_lookup_path):
-    """Loads a human readable English name for each softmax node.
+    synset_to_human = {}
+    for s in synset_to_human_list:
+        parts = s.strip().split('\t')
+        assert len(parts) == 2
+        synset = parts[0]
+        human = parts[1]
+        synset_to_human[synset] = human
+
+    label_index = 1
+    labels_to_names = {0: 'background'}
+    for synset in synset_list:
+        name = synset_to_human[synset]
+        labels_to_names[label_index] = name
+        label_index += 1
+
+    return labels_to_names
+
+
+def run_inference_on_image(image):
+    """Runs inference on an image.
 
     Args:
-      label_lookup_path: string UID to integer node ID.
-      uid_lookup_path: string UID to human-readable string.
+      image: Image file name.
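
The label mapping built by `create_readable_names_for_imagenet_labels` in the hunk above can be sketched without TensorFlow or the downloaded files. This is only an illustration under assumptions: `build_labels_to_names` is a hypothetical helper that takes in-memory lines instead of reading `FLAGS.model_dir`, and the toy data below stands in for the 1000-line synset file and the 21842-line metadata file.

```python
def build_labels_to_names(synset_lines, metadata_lines):
    """Map integer label ids to names; id 0 is reserved for 'background'."""
    # Parse "synset<TAB>human-readable name" records.
    synset_to_human = {}
    for line in metadata_lines:
        synset, human = line.strip().split("\t")
        synset_to_human[synset] = human

    # Number the challenge synsets from 1, in file order.
    labels_to_names = {0: "background"}
    stripped = (s.strip() for s in synset_lines)
    for index, synset in enumerate(stripped, start=1):
        labels_to_names[index] = synset_to_human[synset]
    return labels_to_names


# Toy stand-ins for the two text files (illustrative only).
synsets = ["n02119247\n", "n02119359\n"]
metadata = [
    "n01440764\ttench, Tinca tinca\n",  # in the metadata but not in the synset list
    "n02119247\tblack fox\n",
    "n02119359\tsilver fox\n",
]

labels = build_labels_to_names(synsets, metadata)
print(labels)
```

Passing the lines in as arguments, rather than opening files inside the function, keeps the mapping logic testable on small inputs; the real function additionally asserts the expected file sizes.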
 
 

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968623#comment-15968623
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on a change in pull request #163: TIKA-2306: Update 
Inception v3 to Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#discussion_r111530205
 
 

 ##
 File path: 
tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
 ##
 
 

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968616#comment-15968616
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-294091086
 
 
   @KranthiGV Also, please pull the latest changes from the master branch and
   merge them into this branch.
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968610#comment-15968610
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on a change in pull request #163: TIKA-2306: Update 
Inception v3 to Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#discussion_r111529725
 
 

 ##
 File path: 
tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
 ##
 
 

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968599#comment-15968599
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on a change in pull request #163: TIKA-2306: Update 
Inception v3 to Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#discussion_r111529125
 
 

 ##
 File path: 
tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
 ##
 
 

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968550#comment-15968550
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

thammegowda commented on issue #163: TIKA-2306: Update Inception v3 to 
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-294076746
 
 
   Reviewing this today. 
 



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966467#comment-15966467
 ] 

ASF GitHub Bot commented on TIKA-2306:
--

KranthiGV commented on issue #163: TIKA-2306: Update Inception v3 to Inception 
v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#issuecomment-293681965
 
 
   @thammegowda @grossws
   Can you please review this PR?
   It would enable me to work on
   [TIKA-2308](https://issues.apache.org/jira/browse/TIKA-2308) and to add
   support for other image MIME types.
   Since this implementation differs from the current one, reviewing it now
   would save us from redoing a lot of work once this PR is merged.



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-03-28 Thread Kranthi Kiran GV (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946536#comment-15946536
 ] 

Kranthi Kiran GV commented on TIKA-2306:


[~chrismattmann][~thammegowda] | ( [~tgow...@gmail.com]  
[~chris.a.mattm...@jpl.nasa.gov] )
Please review PR at https://github.com/apache/tika/pull/163



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-03-28 Thread Raunaq Abhyankar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945664#comment-15945664
 ] 

Raunaq Abhyankar commented on TIKA-2306:


Hi,
I have sent in a PR. Please review it and suggest any necessary changes. 
I haven't changed the inceptionapi.py script yet, but that change shouldn't be 
very difficult and can be done once the PR is reviewed.
[~thammegowda]



[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-03-27 Thread Raunaq Abhyankar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943705#comment-15943705
 ] 

Raunaq Abhyankar commented on TIKA-2306:


Hi! 
I was able to successfully classify an image using Inception v4, and the results 
are better than with Inception v3! However, I'm finding it difficult to merge the 
standalone classification script into the existing Tika codebase. 
Any tips on the flow of code in Tika? How exactly is "classify_image.py" run? 
