[ 
https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074053#comment-16074053
 ] 

ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

ThejanW commented on issue #189: Fix for TIKA-2262: Supporting Image-to-Text (Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/189#issuecomment-312946296
 
 
   
/home/thejan/IdeaProjects/GSoC/tika/tika-parsers/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java
       Error:Error:line (26)java: cannot find symbol
     symbol:   class MemoryLimitException
     location: package org.apache.commons.compress
       Error:Error:line (120)java: cannot find symbol
     symbol:   variable BROTLI
     location: class org.apache.commons.compress.compressors.CompressorStreamFactory
       Error:Error:line (122)java: cannot find symbol
     symbol:   variable LZ4_BLOCK
     location: class org.apache.commons.compress.compressors.CompressorStreamFactory
       Error:Error:line (124)java: cannot find symbol
     symbol:   variable LZ4_FRAMED
     location: class org.apache.commons.compress.compressors.CompressorStreamFactory
       Error:Error:line (177)java: no suitable constructor found for CompressorStreamFactory(boolean,int)
       constructor org.apache.commons.compress.compressors.CompressorStreamFactory.CompressorStreamFactory() is not applicable
         (actual and formal argument lists differ in length)
       constructor org.apache.commons.compress.compressors.CompressorStreamFactory.CompressorStreamFactory(boolean) is not applicable
         (actual and formal argument lists differ in length)
       Error:Error:line (180)java: cannot find symbol
     symbol:   class MemoryLimitException
     location: class org.apache.tika.parser.pkg.CompressorParser
   
   ============================================================================
   
   
/home/thejan/IdeaProjects/GSoC/tika/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
       Error:Error:line (63)java: org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner is not abstract and does not override abstract method checkInitialization(org.apache.tika.config.InitializableProblemHandler) in org.apache.tika.config.Initializable
   
   ============================================================================
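The error above says that TensorflowRESTCaptioner implements Initializable without providing checkInitialization(InitializableProblemHandler). As a rough sketch of the shape such an override could take, the example below uses locally declared stand-in interfaces so that it compiles without tika-core. The stand-in signatures, the RestCaptioner class, the "apiBaseUri" parameter name and the example URL are all assumptions for illustration, not the actual Tika API or the code in the pull request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CaptionerInitSketch {

    // Stand-in for org.apache.tika.config.InitializableProblemHandler (assumed shape).
    interface ProblemHandler {
        void handleInitializableProblem(String className, String message);
    }

    // Stand-in for org.apache.tika.config.Initializable (assumed shape).
    interface Initializable {
        void initialize(Map<String, String> params);
        void checkInitialization(ProblemHandler handler);
    }

    // Hypothetical captioner; only the initialization contract is modelled here.
    static class RestCaptioner implements Initializable {
        private String apiBaseUri;

        @Override
        public void initialize(Map<String, String> params) {
            // "apiBaseUri" is a hypothetical parameter name.
            apiBaseUri = params.get("apiBaseUri");
        }

        // The method the compiler reports as missing: report problems
        // through the handler instead of failing silently.
        @Override
        public void checkInitialization(ProblemHandler handler) {
            if (apiBaseUri == null || apiBaseUri.isEmpty()) {
                handler.handleInitializableProblem(
                        getClass().getName(), "apiBaseUri is not configured");
            }
        }
    }

    /** Runs initialize and checkInitialization, returning any reported problems. */
    static List<String> problemsFor(Map<String, String> params) {
        List<String> problems = new ArrayList<>();
        RestCaptioner captioner = new RestCaptioner();
        captioner.initialize(params);
        captioner.checkInitialization((cls, msg) -> problems.add(msg));
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(problemsFor(params));
        // Hypothetical endpoint, used only to show the configured case.
        params.put("apiBaseUri", "http://localhost:8764/captioner");
        System.out.println(problemsFor(params));
    }
}
```

In the real class the override would take the Tika handler type named in the error message; the point of the sketch is only that every concrete Initializable needs both methods.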
   
   
/home/thejan/IdeaProjects/GSoC/tika/tika-parsers/src/main/java/org/apache/tika/parser/mail/RFC822Parser.java
       Error:Error:line (63)java: cannot find symbol
     symbol:   class Builder
     location: class org.apache.james.mime4j.stream.MimeConfig
   
   ============================================================================
   
   
/home/thejan/IdeaProjects/GSoC/tika/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java
       Error:Error:line (105)java: cannot find symbol
     symbol:   method detect(java.io.ByteArrayInputStream)
     location: class org.apache.commons.compress.compressors.CompressorStreamFactory
       Error:Error:line (114)java: cannot find symbol
     symbol:   method detect(java.io.ByteArrayInputStream)
     location: class org.apache.commons.compress.archivers.ArchiveStreamFactory
   
   ============================================================================
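Most of the "cannot find symbol" errors above point at library classes rather than at project code. One possible explanation, offered here only as an assumption, is that the IDE is compiling against dependency jars that do not contain these symbols. A small reflection check run on the same classpath could help confirm or rule that out. The class names are taken from the log above; the ClasspathCheck helper itself is hypothetical.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isPresent(String className) {
        try {
            // Load without initializing; only presence matters here.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary class names of the symbols the compiler could not find.
        String[] names = {
            "org.apache.commons.compress.MemoryLimitException",
            "org.apache.james.mime4j.stream.MimeConfig$Builder",
            "org.apache.tika.config.InitializableProblemHandler"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is reported as missing when run with the IDE module's classpath but found under a Maven command-line build, that would suggest the IDE's dependencies need to be re-imported.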
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> Image captions are short pieces of text, usually one line, added to the 
> metadata of images to provide a brief summary of the scenery in the image. 
> Captioning is a challenging and interesting problem in computer vision. 
> Tika already has support for image recognition via [Object Recognition 
> Parser, TIKA-1993| https://issues.apache.org/jira/browse/TIKA-1993], which 
> uses an InceptionV3 model pre-trained on the ImageNet dataset using 
> TensorFlow. Captioning an image is a very useful feature since it helps 
> text-based Information Retrieval (IR) systems to "understand" the scenery 
> in images.
> h2. Technical details and references:
> * Google open sourced its 'show and tell' neural network and its model for 
> automatically generating captions some time ago. [Source Code| 
> https://github.com/tensorflow/models/tree/master/im2txt], [Research blog| 
> https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser
> ** Create a RESTful API Service [similar to this| 
> https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] 
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java and Python, and the Maven build system
> * RESTful APIs 
> * TensorFlow/Keras
> * deep learning
> ----
> Alternatively, a somewhat harder path for experienced contributors:
> [Import a Keras/TensorFlow model into 
> deeplearning4j|https://deeplearning4j.org/model-import-keras] and run it 
> natively inside the JVM.
> h4. Benefits
> * No RESTful integration required, thus no external dependencies
> * Easy to distribute on Hadoop/Spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of 
> trouble along the way! 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
