This is awesome. Thanks :-)

--
Thamme Gowda
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!
On Wed, Apr 19, 2017 at 1:43 PM, Kranthi Kiran G V <kkran...@student.nitw.ac.in> wrote:

Hello mentors,

I have released a trained model of the neural image captioning system, im2txt. It can be found here:
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model

I am hopeful it will benefit both the research community and Apache Tika's community for image captioning.

Have a look at it!

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal

On Wed, Mar 29, 2017 at 6:50 PM, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:

Sounds great, and understood. Please prepare your proposal and share it with Thamme and me for feedback as your (potential) mentors.

Thanks much.

Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/

From: Kranthi Kiran G V <kkran...@student.nitw.ac.in>
Date: Wednesday, March 29, 2017 at 9:17 AM
To: Thamme Gowda <thammego...@apache.org>
Cc: Chris Mattmann <mattm...@apache.org>, "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: Regarding Image Captioning in Tika for Image MIME Types

Hello,

1) I have
submitted a PR, which can be found here:
https://github.com/apache/tika/pull/163

2) After working on the Show and Tell model for a week, I realized that the computational resources I have are enough to take up the challenge.

Here is a sample caption I generated after a few days of training:

  INFO:tensorflow:Loading model from checkpoint: /media/timberners/magicae/models/im2txt/im2txt/model/train/model.ckpt-174685
  INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-174685
  Captions for image COCO_val2014_000000224477.jpg:
    0) a man riding a wave on top of a surfboard . (p=0.016002)
    1) a man riding a surfboard on a wave in the ocean . (p=0.007747)
    2) a man riding a wave on a surfboard in the ocean . (p=0.007673)

The evaluation is on the image in the example on im2txt's page:
https://github.com/tensorflow/models/tree/master/im2txt#generating-captions

I'm excited to release the pre-trained model (if I'm allowed to) to the public during my GSoC journey, so that everyone can use it even if they don't have enough resources. I think it would be a great contribution to both Apache Tika and the computer vision community as a whole.

3) I am working on the schedule. I will be submitting a draft on the GSoC page. Should I send it here, too?

Regarding my other commitments, I will be working with Amazon India Development Centre from May 10th to July 10th. They offer flexible working hours. I would be able to dedicate 40-45 hours per week. My ability to balance both is shown by my current work with the Deep Learning Research Group at NIT Warangal alongside college.

What do you think?

On Mon, Mar 27, 2017 at 11:00 PM, Thamme Gowda <thammego...@apache.org> wrote:

Hi Kranthi Kiran,

1. Thanks for the update. I look forward to your PR.

2. I don't have complete details about compute resources from GSoC.
I think Google offers free credits (approx. $300) when students sign up for Google Compute Engine. I am not worried about it at this time; we can sort it out later.

3. Great to know!

Best,
TG

On Fri, Mar 24, 2017 at 10:42 PM, Kranthi Kiran G V <kkran...@student.nitw.ac.in> wrote:

Apologies if I was ambiguous.

1) I have already started working on the improvement. The general method is working. I'll send a merge request after I port the REST method, too.

2) I was referring to the computational resources needed to train the final layer of im2txt to output the captions. Google hasn't released a pre-trained model.

3) I will update the developer community with a tentative GSoC schedule by tonight. It would be great if the community could give me suggestions.

On Mar 25, 2017 12:06 AM, "Thamme Gowda" <thammego...@apache.org> wrote:

Hi Kranthi Kiran,

Please find my replies below. Let me know if you have more questions.

Thanks,
TG

On Tue, Mar 21, 2017 at 12:21 PM, Kranthi Kiran G V <kkran...@student.nitw.ac.in> wrote:

Hello Thamme Gowda,

Thank you for letting me know of the developer mailing list. I have created an issue [1] and I will be working on it.

The change is not straightforward, since the Inception V3 pre-trained model ships as a graph while the Inception V4 pre-trained model is packaged in the form of a checkpoint (ckpt) [2].

Okay, I see: Inception V3 has a graph, V4 has a checkpoint. I assume there should be a way to restore the model from a checkpoint?
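(Editorial aside: a minimal, stdlib-only illustration of the restore-by-name idea behind checkpoint files. This is not TensorFlow's API -- in TF 1.x the actual call is roughly tf.train.Saver().restore(sess, "/path/to/model.ckpt-174685") -- and the variable names and values below are invented for illustration.)

```python
import os
import pickle
import tempfile

# Toy "checkpoint": a mapping from variable names to values, serialized to
# disk. TensorFlow's .ckpt files store the same kind of name -> tensor
# mapping (at scale, plus metadata); in TF 1.x you would restore it with
# tf.train.Saver().restore(sess, ckpt_path) instead of this sketch.

def save_checkpoint(path, variables):
    """Write a dict of named variables to a checkpoint-style file."""
    with open(path, "wb") as f:
        pickle.dump(variables, f)

def restore_checkpoint(path, model):
    """Restore saved values into `model` by variable name."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    model.update(saved)
    return model

# Hypothetical variable names, chosen only to mimic TF naming conventions.
ckpt = os.path.join(tempfile.mkdtemp(), "toy.ckpt")
save_checkpoint(ckpt, {"conv1/weights": [0.1, 0.2], "fc/bias": [0.0]})
model = restore_checkpoint(ckpt, {})
print(model["conv1/weights"])  # -> [0.1, 0.2]
```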
Please refer to https://www.tensorflow.org/programmers_guide/variables#checkpoint_files

What do you think of using Keras to implement the Inception V4 model? It would make the job of scaling it on CPU clusters easier if we can use deeplearning4j's model import. Should I proceed in that direction?

Regarding GSoC, what kind of computational resources are we given access to? We would have to train the Show and Tell network, and it takes a lot of computational resources. If GPUs are not available, we would have to use a CPU cluster, so the code would have to be re-written (from the Google implementation of Inception V4).

Training Inception V4 from scratch requires too much effort, time, and resources. We are not aiming for that, at least not as part of Tika and GSoC. The suggestion I mentioned earlier was to upgrade the Inception V3 model to the Inception V4 pre-trained model/checkpoint, since that will be more beneficial to the Tika user community :-)

[1] https://issues.apache.org/jira/browse/TIKA-2306
[2] https://github.com/tensorflow/models/tree/master/slim#pre-trained-models

On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <thammego...@apache.org> wrote:

Hi Kranthi Kiran,

Welcome to the Tika community. We are glad you are interested in working on the issue. Please remember to CC the dev@tika mailing list for future discussions related to Tika.

*Should the model be trainable by the user?*
The basic minimum requirement is to provide a pre-trained model and make the parser work out of the box without training (expect no GPUs; expect a JVM and nothing else). Of course, the parser configuration should have options to change the models by changing the path.

As part of this GSoC project, integration isn't enough work.
If you go through the links provided on the Jira page, you will notice that there are models for image recognition but no ready-made models for captioning. We will have to train the im2txt network from the dataset and make it available. Thus we will have to open-source the training utilities, documentation, and any supplementary tools we build along the way. We will have to document all of this in the Tika wiki for advanced users!

This is a GSoC issue, and thus we expect the work to happen during the summer.

For now, if you want a small task to familiarise yourself with Tika, I have a suggestion: currently, Tika uses the Inception V3 model from Google for image recognition. The Inception V4 model came out recently and has proved to be more accurate than V3. How about upgrading Tika to use the newer Inception model?

Let me know if you have more questions.

Cheers,
TG

On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <kkran...@student.nitw.ac.in> wrote:

Hello,

I'm Kranthi, a 3rd-year computer science undergrad at NIT Warangal and a member of the Deep Learning research group at our college. I'm interested in taking up the issue. I believe it would be a great contribution to the Apache Tika community.

This is what I have done until now:

1) Built Tika from source using Maven and explored it.
2) Tried the object recognition module from the command line. (I should probably start using the Docker version to speed up my progress.)

I am yet to import a Keras model into DL4J. I have some doubts regarding the requirements since I'm new to this community.
*Should the model be trainable by the user?*
This is important because the Inception V3 model without re-training has performed poorly for me (I'm currently training it with fewer steps due to the limited computational resources I have -- a GTX 1070).

TODO (before submitting the proposal):

1) Create a test REST API for Tika.
2) Import a few models in DL4J.
3) Train im2txt on my computer.

Thank you,
Kranthi Kiran
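(Editorial aside: the im2txt log earlier in the thread ranks beam-search caption hypotheses by probability. Below is a toy, stdlib-only sketch of that ranking idea; the vocabulary and probabilities are invented for illustration and are not im2txt's.)

```python
import math

# Toy conditional word model: P(next word | previous word). The numbers are
# made up, purely to show how im2txt-style beam search prunes and ranks
# caption hypotheses by total (log-)probability.
MODEL = {
    "<s>": {"a": 0.9, "the": 0.1},
    "a": {"man": 0.6, "wave": 0.4},
    "the": {"ocean": 1.0},
    "man": {"surfing": 0.7, "</s>": 0.3},
    "wave": {"</s>": 1.0},
    "ocean": {"</s>": 1.0},
    "surfing": {"</s>": 1.0},
}

def beam_search(beam_size=2, max_len=5):
    beams = [(["<s>"], 0.0)]          # (tokens so far, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for word, p in MODEL.get(tokens[-1], {}).items():
                cand = (tokens + [word], logp + math.log(p))
                (finished if word == "</s>" else candidates).append(cand)
        # Keep only the `beam_size` most probable partial captions.
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_size]
        if not beams:
            break
    # Rank finished captions, like the "(p=...)" lines in the im2txt log.
    return sorted(finished, key=lambda f: -f[1])

for tokens, logp in beam_search():
    print(" ".join(tokens[1:-1]), "(p=%.6f)" % math.exp(logp))
```

With beam_size=2 the low-probability "the ocean" branch is pruned in the second step, and the three surviving captions are printed in descending probability, mirroring the ranked output format in the training log above.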