Re: Regarding Image Captioning in Tika for Image MIME Types

Thamme Gowda Mon, 27 Mar 2017 10:30:48 -0700

Hi Kranthi Kiran,

1. Thanks for the update. I look forward to your PR.


2. I don't have complete details about compute resources from GSoC. I think
Google offers free credits (approx. $300) when students sign up for Google
Compute Engine. I am not worried about it at this time; we can sort it out
later.

3. Great to know!

Best,
TG

*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Fri, Mar 24, 2017 at 10:42 PM, Kranthi Kiran G V <
kkran...@student.nitw.ac.in> wrote:

> Apologies if I was ambiguous.
>
> 1) I have already started working on the improvement. The general method
> is working. I'll send a merge request after I port the REST method, too.
>
> 2) I was referring to the computational resources needed to train the final
> layer of im2txt to output the captions. Google hasn't released a
> pre-trained model.
>
> 3) I will send the developer community a tentative GSoC schedule
> by tonight. It would be great if the community could give me suggestions.
>
> On Mar 25, 2017 12:06 AM, "Thamme Gowda" <thammego...@apache.org> wrote:
>
>> Hi Kranthi Kiran,
>>
>> Please find my replies below:
>>
>> Let me know if you have more questions.
>>
>> Thanks,
>> TG
>> *--*
>> *Thamme Gowda*
>> TG | @thammegowda <https://twitter.com/thammegowda>
>> ~Sent via somebody's Webmail server!
>>
>> On Tue, Mar 21, 2017 at 12:21 PM, Kranthi Kiran G V <
>> kkran...@student.nitw.ac.in> wrote:
>>
>>> Hello Thamme Gowda,
>>>
>>> Thank you for letting me know of the developer mailing list. I have
>>> created an issue [1] and will be working on it.
>>> The change is not straightforward since the Inception V3 pre-trained model
>>> has a graph while the Inception V4 pre-trained model is packaged in the
>>> form of a checkpoint (ckpt) [2].
>>>
>>
>> Okay, I see Inception-V3 has a graph, V4 has a checkpoint.
>> I assume there should be a way to restore the model from a checkpoint? Please
>> refer to https://www.tensorflow.org/programmers_guide/variables#checkpoint_files
>>
>>
>>>
>>> What do you think of using Keras to implement the Inception V4 model? It
>>> would make the job of scaling it on CPU clusters easier if we can use
>>> deeplearning4j's model import.
>>>
>>> Should I proceed in that direction?
>>>
>>> Regarding GSoC, what kind of computation resources are we given access
>>> to? We would have to train the show and tell network. It takes a lot of
>>> computation resources.
>>>
>>> If GPUs are not used, we would have to use a CPU cluster. So, the code
>>> has to be re-written (from the Google implementation of Inception V4).
>>>
>>>
>> Training InceptionV4 from scratch requires too much effort, time, and
>> resources. We are not aiming for such things, at least not as part of Tika
>> and GSoC. The suggestion I mentioned earlier was to replace the InceptionV3
>> model with the Inception V4 pretrained model/checkpoint, since that will be
>> more beneficial to the Tika user community :-)
>>
>>
>>
>>>
>>> [1] https://issues.apache.org/jira/browse/TIKA-2306
>>> [2] https://github.com/tensorflow/models/tree/master/slim#pre-trained-models
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <thammego...@apache.org>
>>> wrote:
>>>
>>>> Hi Kranthi Kiran,
>>>>
>>>> Welcome to the Tika community. We are glad you are interested in working
>>>> on the issue.
>>>> Please remember to CC the dev@tika mailing list for future discussions
>>>> related to Tika.
>>>>
>>>>  *Should the model be trainable by the user?*
>>>> The basic minimum requirement is to provide a pre-trained model and
>>>> make the parser work out of the box without training (expect no GPUs;
>>>> expect a JVM and nothing else).
>>>> Of course, the parser configuration should have options to change the
>>>> models by changing the path.
>>>>
>>>> As part of this GSoC project, integration isn't enough work. If you go
>>>> through the links provided in the Jira page you will notice that there
>>>> are models for image recognition but no ready-made models for captioning.
>>>> We will have to train the im2txt network from the dataset and make it
>>>> available. Thus we will have to open source the training utilities,
>>>> documentation, and any supplementary tools we build along the way. We will
>>>> have to document all of these in the Tika wiki for advanced users!
>>>>
>>>> This is a GSoC issue and thus we expect to work on it during the summer.
>>>>
>>>> For now, if you want a small task to familiarise yourself with Tika, I
>>>> have a suggestion:
>>>> Currently, Tika uses the InceptionV3 model from Google for image
>>>> recognition.
>>>> The InceptionV4 model was released recently and has proved to be more
>>>> accurate than V3.
>>>>
>>>> How about upgrading Tika to use the newer Inception model?
>>>>
>>>> Let me know if you have more questions.
>>>>
>>>> Cheers,
>>>> TG
>>>>
>>>> *--*
>>>> *Thamme Gowda*
>>>> TG | @thammegowda <https://twitter.com/thammegowda>
>>>> ~Sent via somebody's Webmail server!
>>>>
>>>> On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <
>>>> kkran...@student.nitw.ac.in> wrote:
>>>>
>>>>> Hello,
>>>>> I'm Kranthi, a 3rd-year computer science undergrad at NIT, Warangal and a
>>>>> member of the Deep Learning research group at our college. I'm interested
>>>>> in taking up the issue. I believe it would be a great contribution to the
>>>>> Apache Tika community.
>>>>>
>>>>> This is what I have done until now:
>>>>>
>>>>> 1) Built Tika from source using Maven and explored it.
>>>>> 2) Tried the object recognition module from the command line. (I
>>>>> should probably start using the Docker version to speed up my progress.)
>>>>>
>>>>> I have yet to import a Keras model in dl4j. I have some doubts regarding
>>>>> the requirements since I'm new to this community. *Should the model
>>>>> be trainable by the user?* This is important because the Inception v3
>>>>> model without re-training has performed poorly for me (I'm currently
>>>>> training it with fewer steps due to the limited computational
>>>>> resources I have -- a GTX 1070).
>>>>>
>>>>> TODO (Before submitting the proposal):
>>>>>
>>>>> 1) Create a test REST API for Tika
>>>>> 2) Import a few models in dl4j.
>>>>> 3) Train im2txt on my computer.
>>>>>
>>>>> Thank you,
>>>>> Kranthi Kiran
>>>>>
>>>>
>>>>
>>>
>>
