[jira] [Created] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

Thamme Gowda (JIRA) Sat, 11 Feb 2017 13:39:06 -0800

Thamme Gowda created TIKA-2262:
----------------------------------

             Summary: Supporting Image-to-Text (Image Captioning) in Tika for 
Image MIME Types
                 Key: TIKA-2262
                 URL: https://issues.apache.org/jira/browse/TIKA-2262
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Thamme Gowda



h2. Background:
An image caption is a short piece of text, usually one line long, added to the 
metadata of an image to briefly summarize the scene it shows. 
Generating captions automatically is a challenging and interesting problem in 
computer vision. 
Tika already supports image recognition via the [Object Recognition 
Parser, TIKA-1993|https://issues.apache.org/jira/browse/TIKA-1993], which uses 
an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow. 
Captioning images is a useful feature because it helps text-based 
Information Retrieval (IR) systems "understand" the scenery in images.

h2. Technical details and references:
* Google has open sourced its 'Show and Tell' neural network and 
its model for automatically generating captions. [Source Code| 
https://github.com/tensorflow/models/tree/master/im2txt], [Research blog| 
https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
* Integrate it the same way as the ObjectRecognitionParser
** Create a RESTful API Service [similar to this| 
https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] 
** Extend or enhance ObjectRecognitionParser or one of its implementations
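
The REST service idea above could be prototyped roughly as follows. This is only a minimal sketch using the Python standard library, not the API of the existing TikaAndVision service: the endpoint path {{/caption/image}}, the JSON response shape, and the names {{generate_captions}}, {{start_server}} and {{request_captions}} are all assumptions made for illustration. The captioning function is a stub that returns dummy text; a real service would run the im2txt model at that point.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CAPTION_PATH = "/caption/image"  # hypothetical endpoint path


def generate_captions(image_bytes, beam_size=3):
    """Stub standing in for the captioning model (assumption).

    A real service would feed image_bytes to the im2txt model here.
    This placeholder ignores the image and returns dummy captions so
    that the HTTP plumbing can be exercised without a trained model.
    """
    return [{"sentence": "placeholder caption %d" % i, "confidence": 0.0}
            for i in range(beam_size)]


class CaptionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != CAPTION_PATH:
            self.send_error(404)
            return
        # Read the raw image bytes from the request body.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps(
            {"captions": generate_captions(image_bytes)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; no per-request logging


def start_server(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_captions(host, port, image_bytes):
    """Client side: POST image bytes and return the list of captions."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", CAPTION_PATH, body=image_bytes,
                     headers={"Content-Type": "application/octet-stream"})
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError("caption service returned %d" % resp.status)
        return json.loads(resp.read().decode("utf-8"))["captions"]
    finally:
        conn.close()
```

On the Tika side, a parser would play the role of {{request_captions}}: send the image bytes to the service and copy the returned sentences into the document metadata. Keeping the model behind HTTP, as the ObjectRecognitionParser integration does, means the Java code needs no TensorFlow dependency.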

h2. {skills, learning, homework} for GSoC students
* Knowledge of languages: Java and Python, and the Maven build system
* RESTful APIs
* TensorFlow/Keras
* Deep learning



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
