Thamme Gowda created TIKA-2262:
----------------------------------

             Summary: Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
                 Key: TIKA-2262
                 URL: https://issues.apache.org/jira/browse/TIKA-2262
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Thamme Gowda

h2. Background:

An image caption is a short piece of text, usually one line, added to an image's metadata to give a brief summary of the scene it depicts. Generating captions automatically is a challenging and interesting problem in computer vision. Tika already supports image recognition via the [Object Recognition Parser, TIKA-1993|https://issues.apache.org/jira/browse/TIKA-1993], which uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow.

Captioning an image is a very useful feature because it helps text-based Information Retrieval (IR) systems "understand" the scenery in images.

h2. Technical details and references:

* Google has open sourced its 'show and tell' neural network and the accompanying model for automatically generating captions. [Source Code|https://github.com/tensorflow/models/tree/master/im2txt], [Research blog|https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
* Integrate it the same way as the ObjectRecognitionParser:
** Create a RESTful API service [similar to this|https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server]
** Extend or enhance ObjectRecognitionParser or one of its implementations

h2. {skills, learning, homework} for GSoC students

* Knowledge of languages: Java and Python, plus the Maven build system
* RESTful APIs
* TensorFlow/Keras
* Deep learning
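
As a rough illustration of the REST integration proposed under "Technical details and references", the client side could look something like the sketch below. Everything specific in it is an assumption made for illustration: the endpoint URL, the JSON response shape (a {{captions}} list with {{sentence}} and {{confidence}} fields), and the helper names {{parse_captions}} and {{caption_image}} are hypothetical and are not part of Tika, im2txt, or the existing ObjectRecognitionParser service.

```python
import json
import urllib.request

# Hypothetical endpoint: the real service URL and path are not defined in this issue.
CAPTION_ENDPOINT = "http://localhost:8764/caption"


def parse_captions(payload, min_confidence=0.0):
    """Turn an assumed JSON response into (sentence, confidence) pairs.

    Assumed response shape (illustrative only):
    {"captions": [{"sentence": "...", "confidence": 0.9}, ...]}
    """
    data = json.loads(payload)
    captions = [
        (item["sentence"], float(item["confidence"]))
        for item in data.get("captions", [])
        if float(item["confidence"]) >= min_confidence
    ]
    # Put the highest-confidence caption first.
    captions.sort(key=lambda pair: pair[1], reverse=True)
    return captions


def caption_image(image_bytes, endpoint=CAPTION_ENDPOINT, timeout=10):
    """POST raw image bytes to the (hypothetical) captioning service."""
    request = urllib.request.Request(
        endpoint,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_captions(response.read().decode("utf-8"))
```

Keeping the response parsing separate from the HTTP call means the caption handling can be tested without a running model server. A parser on the Tika side would presumably do the equivalent in Java and write the resulting sentences into the image's metadata.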