Hi all,

This is regarding adopting the existing ur-hi language pair.
A good deal of work has already been done in this pair for both Urdu and
Hindi.

== Motivation ==

The field of machine translation has gained much traction over the past
couple of decades. Translation between different language pairs is of
political, military and international importance.
Apertium caught my interest when I first came across it because of its
excellent documentation. As a newbie to MT, I found that its wiki walked me
through some basic concepts, which has given me a good base to build upon.


== Why must my project be selected? ==

Given the state of this language pair, it can be made stable, since all the
initial work is done. As I know both languages personally, I will be able to
comprehend the transfer rules between them more easily.

== Goals ==

*Convert the words in apertium_hi_WX.dix from WX notation to Unicode.
Using Unicode keeps the dictionary consistent with the rest of the Apertium
engine. Also, bring the tags of the IIIT morphological analyser in line with
the tags used in the Apertium language pair.

*Make the bidix more complete by adding more words to the bilingual
dictionary, and then test it. The bidix file is
 apertium-ur-hi.ur-hi.dix

*There is presently no part-of-speech tagger for this pair. The tagger
definition will be added in a new file:
 apertium-ur-hi.ur-hi.tsx

*Add transfer rules before training the target-language tagger. The rules
need to be added to the file
 apertium-ur-hi.ur-hi.t1x

*Train the part-of-speech tagger for the target language using the
three-stage transfer commands:
 apertium-transfer
 apertium-tagger-tl-trainer

*Finally, quality control tests will be run. The three tests to be carried
out are:
-corpus test
-regression test
-testvoc
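
For the first goal above (WX to Unicode), the conversion step could be
sketched roughly as follows. This is only a minimal illustration under my own
assumptions: the function name wx_to_devanagari is hypothetical, the mapping
covers just the basic WX letters, and nukta forms and other special cases are
not handled. It is not taken from any existing Apertium or IIIT tool.

```python
# Minimal sketch: convert a word from WX notation to Devanagari (Unicode).
# Assumption: basic WX letters only; unknown characters pass through unchanged.

CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}

# Vowels written on their own (word-initial or after another vowel).
INDEPENDENT_VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "q": "ऋ", "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}

# Dependent vowel signs (matras) used after a consonant.
# The inherent vowel "a" needs no sign at all.
MATRAS = {
    "a": "", "A": "\u093e", "i": "\u093f", "I": "\u0940",
    "u": "\u0941", "U": "\u0942", "q": "\u0943",
    "e": "\u0947", "E": "\u0948", "o": "\u094b", "O": "\u094c",
}

# Anusvara, visarga and chandrabindu.
MODIFIERS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}

VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant


def wx_to_devanagari(word):
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if nxt in MATRAS:
                # Consonant followed by a vowel: attach the vowel sign.
                out.append(MATRAS[nxt])
                i += 2
            else:
                # No vowel follows: mark the consonant as vowel-less.
                out.append(VIRAMA)
                i += 1
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
            i += 1
        elif ch in MODIFIERS:
            out.append(MODIFIERS[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for wx in ["kamala", "hiMxI", "kyA"]:
        print(wx, wx_to_devanagari(wx))
```

In practice the same function would be applied to the lemmas and surface
forms inside the dictionary entries while leaving the XML markup untouched.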



Regarding experience, I have good experience with the programming languages
C, Python and Java. I am also familiar with markup languages such as HTML,
XML and reST, and am currently brushing up my C++ knowledge.
D/VCSs: SVN, Git, Hg.


This is not complete by any means. Any feedback to improve it will be much
appreciated.



Regards,
Bagwan Ahsan
IRC: sl33k_
email: ahsanbag...@gmail.com
Undergraduate student
Sinhgad Academy of Engineering
Pune, India
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
