Google Summer of Code 2014 - Phonetic Matching Project

Alain Boulay Tue, 11 Mar 2014 01:04:28 -0700

Hello, I am a student who saw this issue listed as an idea for Google Summer
of Code 2014. I am very interested in working on this issue this summer and
have some questions about your needs so that I can write a good Google
Summer of Code proposal that addresses your concerns. I have a strong
background in soft computing for linguistics (ANNs, statistical methods,
computational linguistics) and a good (and growing) background in
databases and in matching entities to labels.
Please provide some guidelines on what your needs are so that I can submit
a winning proposal.


I have posted on the Apache Phonetic Matching Project site
https://issues.apache.org/jira/browse/STANBOL-1291
where I hoped 'lurking' would provide some insight into your needs.

This is what I understand so far (from your website, STANBOL-1291
Phonetic Linking):

"The main question to be answers is if the phonetic matching (step 4) can
correctly link Entities even if the writings in the text transcript are
incorrect."

Perhaps 'soft computing methods' are the best way to answer this question:
Neural Networks, Bayesian methods, Fuzzy Sets or Rough Sets, because these
methods would score well even if the 'writings in the text transcript are
incorrect'.
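
To illustrate the quoted question, here is a minimal sketch of phonetic
matching using the classic Soundex key. This is an assumption for
illustration only: Soundex is a stand-in and is not claimed to be the
algorithm behind STANBOL-1291, and the names soundex and link_mention are
hypothetical. It shows how a misspelled mention in a transcript can still
be linked to candidate entity labels that sound alike.

```python
# Letter-to-digit table of American Soundex (vowels, h, w, y have no code).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


def link_mention(mention, entity_labels):
    """Hypothetical helper: labels whose Soundex key equals the mention's."""
    target = soundex(mention)
    return [label for label in entity_labels if soundex(label) == target]


# A transcript spelling "Jonson" still reaches similar-sounding labels.
print(link_mention("Jonson", ["Johnson", "Jackson", "Jensen"]))
```

Exact key equality is the simplest possible rule; a softer scheme could
rank candidates by a similarity score instead of accepting or rejecting.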

I can address this question on many levels given my experience:
- Computational Linguistics - Experience in coding Artificial Neural
Networks that learn phonetic speech. This also applies to text
recognition and the generation of grammatical rules from the language
input. I saw that the speech-to-text engine (Stanbol) uses Sphinx, which is
built using Bayesian approaches (now you have got me really excited!). I
would be very interested in working with the Stanbol engine to produce tests
or measures of how well it links entities, based on its performance as
an NLP engine along the lines of pattern matching. My experience with
these kinds of networks comes from a neural net simulator (T Learn) and
from coding MATLAB neural nets.

- Text Quality - this would require some kind of comparison between a
trusted sample of the original data and the output text. Experimental
statistical methods would provide measures, and empirical computational
methods may provide means of improvement. However, you know the needs, and
if I may have your insight or advice regarding the parameters, I am sure
that I can produce an excellent proposal.

- I am PASSIONATE about coding neural nets and Bayesian nets for
language processing, and have been a student member of academic labs that
focus on human cognition and language processing (psycholinguistics). I
also have a very strong interest in becoming active in semantic web
development. I currently study Human-Computer Interaction at Laurentian,
so speech interfaces are really exciting to me. I would really love
to have a chance to code with you for the summer (and afterwards too!)
because it would bring me the kind of experience that I can't get here at
the university.
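
As one possible measure for the Text Quality point above (comparing a
trusted sample with the output text), here is a small sketch of word error
rate. The choice of this metric and the name word_error_rate are my own
assumptions for illustration, not a requirement taken from the issue.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```

A lower value means the output text is closer to the trusted sample; the
value can exceed 1.0 when the output contains many extra words.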

I am most interested in answers to these kinds of questions (which are
specific to my application):


   - a list of deliverables, quantifiable results for the Apache community
   (I am not sure what deliverables will meet your needs; can you suggest
   some?),
   - a detailed description / design document, (I am interested in
   following standards put forward by Apache for design documentation -
   perhaps I would be able to see guidelines to give me an idea of what to
   submit)
   - an approach, (I would want to meet expectations in the approach)
   - an approximate schedule and
   - something of a background text. (Does this mean a literature search
   and citations? I can provide these, or whatever else is needed.)

Thanks very much for your advice!! I will submit a proposal as soon as I
receive your reply!

AJ Boulay, MSc
