[ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059375#comment-16059375 ]

ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

ThejanW commented on a change in pull request #180: Fix for TIKA-2262: 
Supporting Image-to-Text (Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/180#discussion_r123507251
 
 

 ##########
 File path: 
tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/model_wrapper.py
 ##########
 @@ -0,0 +1,347 @@
+#!/usr/bin/env python
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#    http://www.apache.org/licenses/LICENSE-2.0
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os.path
+
+import tensorflow as tf
+from tensorflow.contrib.slim.python.slim.nets.inception_v3 import inception_v3_base
+
+slim = tf.contrib.slim
+
+
+class ModelWrapper(object):
+    """
+        Model wrapper class to perform image captioning with a ShowAndTellModel
+    """
+
+    def __init__(self):
+        super(ModelWrapper, self).__init__()
+
+    def build_graph(self, checkpoint_path):
+        """Builds the inference graph"""
+
+        tf.logging.info("Building model.")
+        ShowAndTellModel().build()
+        saver = tf.train.Saver()
+
+        return self._create_restore_fn(checkpoint_path, saver)
+
+    def _create_restore_fn(self, checkpoint_path, saver):
+        """Creates a function that restores a model from checkpoint file"""
+
+        if tf.gfile.IsDirectory(checkpoint_path):
+            checkpoint_path = tf.train.latest_checkpoint(checkpoint_path)
+            if not checkpoint_path:
+                raise ValueError("No checkpoint file found in: %s" % checkpoint_path)
+
+        def _restore_fn(sess):
+            tf.logging.info("Loading model from checkpoint: %s", checkpoint_path)
+            saver.restore(sess, checkpoint_path)
+            tf.logging.info("Successfully loaded checkpoint: %s",
+                            os.path.basename(checkpoint_path))
+
+        return _restore_fn
+
+    def feed_image(self, sess, encoded_image):
+        """Feeds an encoded image through the CNN and returns the initial LSTM state"""
+
+        initial_state = sess.run(fetches="lstm/initial_state:0",
+                                 feed_dict={"image_feed:0": encoded_image})
+        return initial_state
+
+    def inference_step(self, sess, input_feed, state_feed):
+        """Runs one LSTM step and returns the softmax output and the new state"""
+
+        softmax_output, state_output = sess.run(
+            fetches=["softmax:0", "lstm/state:0"],
+            feed_dict={
+                "input_feed:0": input_feed,
+                "lstm/state_feed:0": state_feed,
+            })
+        return softmax_output, state_output
+
+
+class ShowAndTellModel(object):
+    """
+        Image captioning implementation based on the paper,
+
+        "Show and Tell: A Neural Image Caption Generator"
+        Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
+
+        For more details, please visit : http://arxiv.org/abs/1411.4555
+    """
+
+    def __init__(self):
+
+        # scale used to initialize model variables
+        self.initializer_scale = 0.08
+
+        # dimensions of Inception v3 input images
+        self.image_height = 299
+        self.image_width = 299
+
+        # image format ("jpeg" or "png")
+        self.image_format = "jpeg"
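
For context, a minimal sketch (not part of the PR) of how these two entry points would typically be driven; `vocab`, `start_id`, `end_id` and the checkpoint path are assumptions, presumably supplied by the model's vocabulary file and model directory:

    import numpy as np
    import tensorflow as tf

    # Hypothetical driver: build the graph, restore the checkpoint, then
    # decode a caption greedily from ModelWrapper's two entry points.
    model = ModelWrapper()
    restore_fn = model.build_graph("/path/to/checkpoint")  # path is illustrative
    sess = tf.Session()
    restore_fn(sess)

    def greedy_caption(encoded_image, vocab, start_id, end_id, max_len=20):
        state = model.feed_image(sess, encoded_image)    # CNN pass + LSTM init
        words, word_id = [], start_id
        for _ in range(max_len):
            softmax, state = model.inference_step(
                sess, input_feed=np.array([word_id]), state_feed=state)
            word_id = int(np.argmax(softmax[0]))         # greedy: most likely word
            if word_id == end_id:                        # stop at end-of-sentence
                break
            words.append(vocab[word_id])
        return " ".join(words)

(The real im2txt inference code uses beam search rather than greedy decoding; this only shows the calling convention.)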
 
 Review comment:
   See, it's hard-coded for now; I'm thinking of a better way to pass the image 
 format into the computation graph from the content type.
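
 One option (a sketch only; `image_format_feed` is a hypothetical placeholder name, not part of this PR) would be to feed the format string, e.g. derived from the content type, into the graph and branch on it with tf.cond:

    import tensorflow as tf

    image_feed = tf.placeholder(dtype=tf.string, shape=[], name="image_feed")
    format_feed = tf.placeholder(dtype=tf.string, shape=[],
                                 name="image_format_feed")  # hypothetical name

    # tf.cond runs exactly one branch per session run, chosen at run time
    image = tf.cond(tf.equal(format_feed, "png"),
                    lambda: tf.image.decode_png(image_feed, channels=3),
                    lambda: tf.image.decode_jpeg(image_feed, channels=3))

 Alternatively, tf.image.decode_image auto-detects JPEG/PNG/GIF from the encoded bytes, which would avoid passing the format at all (though it returns a tensor without a static rank).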
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> An image caption is a short piece of text, usually a single line, added to 
> the metadata of an image to give a brief summary of the scenery it depicts. 
> Generating captions is a challenging and interesting problem in the domain 
> of computer vision. Tika already has support for image recognition via the 
> [Object Recognition Parser, TIKA-1993|https://issues.apache.org/jira/browse/TIKA-1993], 
> which uses an InceptionV3 model pre-trained on the ImageNet dataset using 
> TensorFlow. Captioning images is a very useful feature, since it helps 
> text-based Information Retrieval (IR) systems "understand" the scenery in 
> images.
> h2. Technical details and references:
> * Google open-sourced its 'Show and Tell' neural network and its 
> caption-generating model some time ago. [Source Code|https://github.com/tensorflow/models/tree/master/im2txt], 
> [Research blog|https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser
> ** Create a RESTful API service [similar to this|https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] 
> (a minimal sketch follows this list)
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
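>
> A minimal sketch of such a REST service (Flask; the endpoint path, port, and the generate_caption helper are hypothetical, e.g. a wrapper around the ModelWrapper above):
> {code:python}
> # Hypothetical captioning service: accepts raw image bytes via POST and
> # returns a JSON caption, mirroring the TikaAndVision REST pattern.
> from flask import Flask, request, jsonify
>
> app = Flask(__name__)
>
> @app.route("/inception/v3/caption/image", methods=["POST"])
> def caption():
>     encoded_image = request.get_data()       # raw JPEG/PNG bytes
>     text = generate_caption(encoded_image)   # hypothetical ModelWrapper wrapper
>     return jsonify({"captions": [{"sentence": text}]})
>
> if __name__ == "__main__":
>     app.run(host="0.0.0.0", port=8764)       # port is illustrative
> {code}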
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java AND Python, plus the Maven build system
> * RESTful APIs
> * TensorFlow/Keras
> * Deep learning
> ----
> Alternatively, a somewhat harder path for the experienced:
> [Import a Keras/TensorFlow model into 
> deeplearning4j|https://deeplearning4j.org/model-import-keras] and run it 
> natively inside the JVM.
> h4. Benefits
> * No RESTful integration required, and thus no external dependencies
> * Easy to distribute on Hadoop/Spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of 
> trouble along the way!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
