Hi Raunaq,

Welcome to the Tika community! We are pleased to know that you are interested
in working on this issue!

Please coordinate with Kranthi Kiran, who is also working on the same issue,
to avoid duplicating effort.
Yes, https://issues.apache.org/jira/browse/TIKA-2306 is the place to carry
out the discussions!


Thanks,
TG

*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Sat, Mar 25, 2017 at 11:49 AM, Raunaq Abhyankar <
raunaq.abhyan...@gmail.com> wrote:

> Hi
> I'm Raunaq Abhyankar from Mumbai. I'm a final year computer engineering
> student. I'm interested in working on Tika during the summer.
>
> I was able to successfully classify an image using Inception v4, and the
> results are better than with Inception v3!
>
> However, I have one problem: I can run the script independently but am
> finding it difficult to integrate it with Tika. Could you please guide me
> in this regard?
>
> Thanks
>
> PFA: Screenshot of the result of Inception v4 on testJPEG.jpg
>
> On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <thammego...@apache.org>
> wrote:
>
>> Hi Kranthi Kiran,
>>
>> Welcome to the Tika community. We are glad you are interested in working on
>> the issue.
>> Please remember to CC the dev@tika mailing list for future discussions
>> related to Tika.
>>
>>  *Should the model be trainable by the user?*
>> The basic minimum requirement is to provide a pre-trained model and make
>> the parser work out of the box without training (expect no GPUs; expect a
>> JVM and nothing else).
>> Of course, the parser configuration should have options to change the
>> models by changing the path.
>>
>> As part of this GSoC project, integration alone isn't enough work. If you go
>> through the links provided on the Jira page, you will notice that there are
>> models for image recognition but no ready-made models for captioning. We
>> will have to train the im2txt network from the dataset and make it
>> available. Thus we will have to open source the training utilities,
>> documentation, and any supplementary tools we build along the way. We will
>> have to document all of this in the Tika wiki for advanced users!
>>
>> This is a GSoC issue and thus we expect to work on it during the summer.
>>
>> For now, if you want a small task to familiarise yourself with Tika, I
>> have a suggestion:
>> Currently, Tika uses the InceptionV3 model from Google for image
>> recognition. The InceptionV4 model was released recently and has proved to
>> be more accurate than V3.
>>
>> How about upgrading Tika to use the newer Inception model?
>>
>> Let me know if you have more questions.
>>
>> Cheers,
>> TG
>>
>> *--*
>> *Thamme Gowda*
>> TG | @thammegowda <https://twitter.com/thammegowda>
>> ~Sent via somebody's Webmail server!
>>
>> On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <
>> kkran...@student.nitw.ac.in> wrote:
>>
>> > Hello,
>> > I'm Kranthi, a 3rd-year computer science undergrad at NIT, Warangal and a
>> > member of the Deep Learning research group at our college. I'm interested
>> > in taking up the issue. I believe it would be a great contribution to the
>> > Apache Tika community.
>> >
>> > This is what I have done until now:
>> >
>> > 1) Built Tika from source using Maven and explored it.
>> > 2) Tried the object recognition module from the command line. (I should
>> > probably start using the Docker version to speed up my progress.)
>> >
>> > I am yet to import a Keras model in dl4j. I have some doubts regarding
>> > the requirements since I'm new to this community. *Should the model be
>> > trainable by the user?* This is important because the Inception v3 model
>> > without re-training has performed poorly for me (I'm currently training
>> > it with fewer steps due to the limited computational resources I have:
>> > a GTX 1070).
>> >
>> > TODO (Before submitting the proposal):
>> >
>> > 1) Create a test REST API for Tika
>> > 2) Import a few models in dl4j.
>> > 3) Train im2txt on my computer.
>> >
>> > Thank you,
>> > Kranthi Kiran
>> >
>>
>
>
>
> --
> Regards,
> Raunaq Abhyankar
>
