Here is a copy of my GSOC proposal. I would love to get some feedback
on some changes I can make.
Name: Sachin Shastri
E-mail address: sachinsu...@gmail.com
Other information that may be useful to contact you: Contact No: +917760118343
IRC nick: sachinsurfs in #apertium at irc.freenode.net
*Why is it you are interested in machine translation?*
I am a computer science student from Bangalore, South India, which is
known for its rich cultural diversity (8+ languages are spoken in my college
alone). Given my background and my dual interest in languages and computer
science, computational linguistics is a subject I have always wanted to pursue.
Initially, it was the thought of a machine being able to translate a very
uncommon language like Tulu that fascinated me. Since a young age I have been
reading about machine translation, and I have always felt a strong connection
to the subject. I got my first formal education in this area when our college
included a special course on Finite Automata and Formal Languages in our
syllabus (I am proud to say that PESIT, the college I am enrolled in, is the
only college in the entire state which offers this subject to 2nd-year
students). After this I took a course on Natural Language Processing on
Coursera, and my interest in the subject has been growing ever since.
*Why is it that you are interested in the Apertium project?*
I am interested in Apertium first and foremost because it is open source (and
free). Secondly, I have always had a strong interest in machine translation. I
always wanted to do a project with an MT organization, so Apertium was
instantly one of my first choices in the list of GSoC organizations. Although I
was immensely interested in working for Apertium, I was initially worried that
I might not have the specific MT knowledge required for a project here (I did
not want to confuse interest with knowledge). After finding a project that is
exactly right for my skill level, I knew this was the right organization for
me.
*Which of the published tasks are you interested in? What do you plan to do?*
With all of the above in mind, I would like to take up the task of *Integrating Apertium in various chat clients*.
Telegram -- I plan to integrate the Apertium web service (using ScaleMT, which
is based on the Apertium scalable service) into Telegram (using the source
provided on GitHub). This will probably include the usual tasks of issuing HTTP
requests, parsing the JSON result strings, using AsyncTask, etc.
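To make the "issue an HTTP request, then parse the JSON result" step concrete, here is a minimal Python sketch. The Telegram client itself would do this in Java, so this only illustrates the flow. The endpoint URL, the `q` and `langpair` parameters, and the `responseData`/`responseStatus` response layout are assumptions made for illustration and should be checked against the actual web service documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: illustrative endpoint, not confirmed by this proposal.
API_URL = "http://api.apertium.org/json/translate"


def build_request_url(text, source, target, base_url=API_URL):
    """Build the GET URL for translating `text` from `source` to `target`."""
    query = urllib.parse.urlencode({"q": text, "langpair": f"{source}|{target}"})
    return f"{base_url}?{query}"


def parse_translation(body):
    """Extract the translated text from a JSON response body.

    Assumed layout: {"responseData": {"translatedText": ...}, "responseStatus": 200}
    """
    payload = json.loads(body)
    if payload.get("responseStatus") != 200:
        raise ValueError("translation failed: %r" % payload.get("responseDetails"))
    return payload["responseData"]["translatedText"]


def translate(text, source, target, timeout=10):
    """Issue the HTTP request and return the translated string."""
    url = build_request_url(text, source, target)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_translation(response.read().decode("utf-8"))
```

In a chat client the `translate` call would run off the UI thread (this is what AsyncTask is for on Android), so that a slow network does not freeze the conversation window.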
XChat & Pidgin -- Make plugins that will interface with the machine translation
system. (I will most probably make use of the Python scripting support
available for both these chat clients.)
(Suggestion -- I might as well write plugins for other popular chat clients
like Adium and Finch, if I am going to be making use of the libpurple
libraries.)
*Reason for choosing the selected task over other tasks-*
The reason I have chosen this task over others (adopting a language pair,
rule-based finite-state disambiguation, etc.) is that, although I have
considerable knowledge of (and a lot of interest in) computational linguistics
and constraint grammar, I don't have as much experience in them as some of my
prospective peers. (I am not discrediting myself; I am just appreciating the
knowledge other people have in that area.) In fact, I have learned a great deal
from the IRC channel and while working on the coding challenge. (I will take up
the task of adopting a language pair next year, when I am ready.) However, I
have had years of experience with Java, Python and Android, so I felt this task
is right for me. (This way I also get to work with Apertium.)
*Proposal Title* -- *Mathrubhasha*
(Hindi for mother tongue)
*Reasons why Google and Apertium should sponsor it-*
The number of Pidgin users was estimated to be over 3 million in 2007, and it
has been growing at a steady rate since. The number of XChat users is also
increasing at a good rate. Hence, a large number of people use these chat
clients every day. Also, Telegram logged 5 million downloads in one day
following the WhatsApp sale. So this is a prime time to integrate Apertium into
these chat clients, adding a powerful tool to these messengers which will not
only help popularize machine translation platforms but will also improve
communication and make these chat clients more user friendly.
(Since Pidgin takes its name from "pidgin language", having a machine
translation tool in order to break the language barrier between people makes a
lot of sense.)
*A description of how and who it will benefit in society-*
There are 10+ million users using at least one of these chat clients *every
day*. Though English is a universal language, a majority of people are most
comfortable in their own mother tongue. This tool will help that majority
express themselves better (a highly desired quality in chat clients). It will
also help break the language barrier, so that users from different parts of
the world can communicate with each other effectively.
It will mainly help the rural and urban communities (esp. in India), since many
people here know how to operate a computer and a phone but don't know English.
Although the other projects (like developing language pairs) benefit specific
societies, the number of people in those societies who benefit is small unless
the end result of the project is used in day-to-day situations. Since chat
messengers have become quite common and are now one of the prime means of
communication, this project will be beneficial to a majority of people in
different societies. This is especially true for mobile messaging apps, since
people carry their phones everywhere. Last but not least, for the same reason,
it will also help in the awareness and expansion of the open-source
communities, since Apertium and all of these chat clients are open-source
software.
Work plan
*WEEK*: *TASK*
Pre-week 1-4:
  Get to know my mentor better.
  Analyze the source code of the different chat clients and Apertium.
  Forecast some of the more common constraints that I will face and
  decide on a plan of action for these.
  Collect the necessary information, read the documentation, do in-depth
  research, analyze, and fully prepare for starting the project.
  Get the source code of, and read more on, Apertium-caffeine (which will
  give a much better idea for making the plugins).
Week 1:
  Start with Telegram. Using the available source code, modify the
  manifests to integrate the available Apertium web service.
Week 2:
  Work on the code, making use of existing APIs such as the JSON REST API
  (for issuing HTTP requests, parsing, etc.).
Week 3:
  Final UI work, including use of AsyncTask for the threads, and finally
  debugging. (If everything goes well, I will have Apertium ready in
  Telegram during this time.)
Week 4:
  More debugging and making any changes required.
Deliverable #1:
  Integration of Apertium with Telegram
Week 5:
  Start making the plugin for XChat. Start writing scripts.
Week 6:
  Coding. Working on the interface.
Week 7:
  More coding and debugging. (If everything goes right, I will have the
  plugin code ready by this time.)
Week 8:
  Debugging, and writing makefiles and config files for easy compilation.
  Start working on the plugin for Pidgin.
Deliverable #2:
  Apertium plugin for the XChat chat client
Week 9:
  Continue working on the plugin for Pidgin.
Week 10:
  Coding.
Week 11:
  Debugging.
Deliverable #3:
  Apertium plugin for the Pidgin chat client
Week 12:
  First 5 days: extra time, in case I come across some major issue.
  Last 2 days: final presentation.
Post-week:
  Tidying up.
*Important dates*:
April 22nd -- Commencing pre-week preparation
May 19th -- Commencing work on the project
June 16th -- Deliverable #1
July 12th -- Deliverable #2
~August 6th -- Deliverable #3
August 10th -- Project completion
August 18th-22nd -- Project evaluation
*Time commitments:*
Pre-week 1-3: 3-5 hours per day (my semester-end exams fall in this period
and end by the 4th week)
Pre-week 4: at least 12 hours per day
Week 1-3: 7-9 hours per day
Week 5-7: 7-9 hours per day
Week 9-10: 12 hours per day (since I am allotting relatively less time
for this part)
Weeks 4, 8, 11: 10-12 hours per day (since debugging usually takes the
most time)
Post-week: 4-6 hours per day (since my summer holidays end at this time)
*List your skills and give evidence of your qualifications*.
I am currently doing my B.Tech (Branch: Computer Science and Engineering) at
PES Institute of Technology, Bangalore.
Programming skills related to this project: C, Java, Python, XML, JavaScript
I have taken various courses on Java, including advanced data structures and
algorithm design. I have been working on Android app development for a few
years now and have taken up a few Android projects (I can provide scanned
copies of certificates). I have worked on application integration and web
services before. I have done my research on the different chat clients and feel
this project is doable with knowledge of Python (and maybe JavaScript) as the
only scripting language. Also, from the coding challenge, I now have a good
idea of how to make plugins (esp. for Pidgin and XChat) and write scripts for
these clients. Hence, I think I can complete this project in the allotted time.
(I am also fluent in 6+ natural languages, although I doubt that will be of
much use in the specific task I have chosen.)
Coding Challenge: In progress. Expected to complete before the deadline.
Link: https://github.com/sachinsurfs/apertium-code-challenge-chat-client-plugins.git
*Previous experience in open-source project*: No. :( However, I am currently
working on an open-source cloud-benchmarking tool that uses Apache Geronimo
and DayTrader to benchmark the performance of public, private and hybrid
clouds while varying parameters such as the number of clients and the WLAN
speed.
*List any non-Summer-of-Code plans you have for the Summer-*
None. Therefore I am ready to devote the entire 12 weeks to this project.