Hi Manisha,

We had a media linkage project in GSoC last year and were planning to
move on to extracting semantic information from media this year. I gave up
on writing it up as a GSoC idea since it is somewhat beyond the scope of a
3-4 month project and really requires deeper knowledge of computer vision.

The idea was to build a semi-automatic annotation system that would go over
the videos in Wikimedia Commons. These are (sometimes) already linked to
Wikipedia articles and are fairly representative. Using YouTube or other
sources is also possible.
The system would:

1) Do shot detection and divide the video into a series of representative
shots.
2) Do voice recognition
3) Use existing trained classifiers to recognize certain object classes
4) Present annotation options so that crowd workers can annotate objects
that haven't been recognized yet. The output of the annotation step would
be used to train new classifiers.
5) Do face recognition/face tracking
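
To make step 1 concrete, here is a minimal sketch of histogram-based shot
detection. It is only an illustration, not the JavaCV code mentioned in this
mail and not a HIPI job: it is plain Python, frames are assumed to be flat
lists of 0-255 grayscale pixel values that have already been decoded, and the
names histogram, shot_boundaries, bins and threshold are hypothetical.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized intensity histogram of one frame (flat 0-255 grayscale pixels)."""
    counts = [0] * bins
    for pixel in frame:
        # Map the 0-255 intensity onto one of `bins` equal-width buckets.
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0  # avoid division by zero on an empty frame
    return [c / total for c in counts]


def shot_boundaries(frames: Sequence[Sequence[int]],
                    threshold: float = 0.5,
                    bins: int = 16) -> List[int]:
    """Return every index i at which frame i is taken to start a new shot.

    A boundary is declared when the L1 distance between the histograms of two
    consecutive frames exceeds `threshold` (the distance lies in [0, 2]).
    The default threshold is an arbitrary assumption and would need tuning.
    """
    boundaries = []
    previous = None
    for i, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            distance = sum(abs(a - b) for a, b in zip(current, previous))
            if distance > threshold:
                boundaries.append(i)
        previous = current
    return boundaries


# Three dark frames followed by two bright frames: one cut is expected.
dark = [10] * 64
bright = [200] * 64
cuts = shot_boundaries([dark, dark, dark, bright, bright])
```

A representative frame per shot could then be taken, for example, from the
middle of each interval between consecutive boundaries. Gradual transitions
(fades, dissolves) would need more than this hard-cut test.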

I've implemented parts of the system using JavaCV, but it would basically
need to be rewritten to use HIPI instead.
The technologies used would be HIPI [1] for CV tasks, CMU Sphinx [2] for voice
recognition and CrowdFlower [3] for the crowd-sourcing/annotation component.
Implementing this, even partially, would require good knowledge of
Java/Scala and experience with Hadoop/HIPI and OpenCV.

Which parts of the system do you think you could implement, or to what
extent, in the available time?

1. http://hipi.cs.virginia.edu/about.html
2. http://cmusphinx.sourceforge.net/
3. http://www.crowdflower.com/

On Mon, Mar 9, 2015 at 8:33 PM, manisha verma <manishaverma...@gmail.com>
wrote:

> Hello everyone
>
> I am Manisha Verma, a PhD student in information retrieval. I was
> wondering whether GSoC projects have to be one of the listed ideas or whether
> students can propose their own projects too. I understand that mentoring is
> a volunteer effort; however, I just wanted to run an idea by you, in case
> anyone finds it useful. I would be willing to work on it and submit a
> project. I could finish some warm-up tasks as well.
>
> I have gone through the guidelines, and I wanted to know how big the
> project needs to be. Could it be just a prototype, or does it have to be an
> end-to-end system that runs at scale?
>
>
> So here it is.
>
> I understand that Wikipedia is fairly text-based (there are some articles
> with pictures and audio files). Would it be feasible to integrate videos
> into DBpedia's ontology as well? There are several categories of
> articles for which tremendous amounts of video are available on the internet.
> Video content could cover part or all of an article. For example, an
> article on SVM could use one of the introductory videos from YouTube.
> Similarly, for people like Sachin Tendulkar, there are some documentaries.
>
> The focus is to enrich existing articles. There are several things that
> need to be taken into account here.
>
> 1. First is the collection of videos itself. For the project, I would
> start with a focused dataset of videos.
> 2. Second is either their transcription or linking their metadata with
> text. Basically, finding the most appropriate video for an article.
> 3. This could be tested against some ground truth, which could be built using
> MTurk. It would cost some amount, but I think I can cover that.
> 4. Last is integrating it with articles. Would you link parts of the text
> with the video or not? Or would it just be part of the infobox?
>
>
> Within 4 months, a prototype could be generated. I am not sure if it will
> be at a huge scale.
>
> Sorry for such a lengthy email.
>
> Best
> Manisha
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Dbpedia-gsoc mailing list
> Dbpedia-gsoc@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>
