Hi all,

This is regarding the adoption of the existing ur-hi language pair. A good deal of work has already been done on this pair for both Urdu and Hindi.

== Motivation ==

The field of machine translation has gained much traction in the past couple of decades. Its use for translating between different language pairs is of political, military and international importance. Apertium caught my interest when I first came across it because of its excellent documentation. As a newcomer to MT, I was walked through some basic concepts on its wiki, which has given me a good base to build upon.

== Why must my project be selected? ==

Looking at the state of this language pair, it can be made stable, considering that all the initial work is done. Since I know both languages personally, I would be able to comprehend the transfer rules between them more easily.

== Goals ==

* Convert the words in apertium_hi_WX.dix from WX notation to Unicode encoding. Unicode encoding makes the dictionary consistent to use with the Apertium engine. Also, bring the tags used by the IIIT morphological analyzer in line with the tags used in the Apertium language pair.
* Make the bidix more complete by adding more words to the bilingual dictionary, and then test it. The bidix file is apertium-ur-hi.ur-hi.dix.
* There is presently no part-of-speech tagger for this pair. One will be added in a new file, apertium-ur-hi.ur-hi.tsx.
* Add transfer rules before training the target-language tagger. The rules need to be added to the file apertium-ur-hi.ur-hi.t1x.
* Train the part-of-speech tagger for the target language using the three-stage transfer command (apertium-transfer, apertium-tagger-tl-trainer).
* Finally, run quality control tests. The three tests to be carried out are:
 - corpus test
 - regression test
 - testvoc

Regarding experience, I have good experience with the programming languages C, Python and Java. I am also familiar with markup languages like HTML, XML and reST, and I am currently brushing up my C++ knowledge. Version control systems (distributed and centralized): SVN, Git, Hg.

This is not complete by any means. Any feedback to improve it will be much appreciated.
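
To make the first goal above (converting apertium_hi_WX.dix from WX notation to Unicode) a little more concrete, here is a minimal sketch of how the character-level conversion might work. It assumes the commonly documented WX mapping for core Hindi letters, leaves out nukta ("Z") and other extensions, and only converts bare strings; it does not parse the .dix XML. The function name wx_to_devanagari is my own placeholder, not something that exists in Apertium.

```python
# Sketch only: WX -> Devanagari for core Hindi letters.
# Assumptions: standard WX mapping; nukta ("Z") and rarer signs are not handled.

CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}

# Independent vowel letters (used when no consonant precedes the vowel).
VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "q": "ऋ", "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}

# Dependent vowel signs (matras); "a" is the inherent vowel, so it adds nothing.
MATRAS = {
    "a": "", "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
    "q": "ृ", "e": "े", "E": "ै", "o": "ो", "O": "ौ",
}

SIGNS = {"M": "ं", "H": "ः", "z": "ँ"}  # anusvara, visarga, chandrabindu
VIRAMA = "\u094d"  # halant: marks a consonant with no following vowel


def wx_to_devanagari(text):
    """Convert a WX-encoded string to Devanagari, character by character."""
    out = []
    prev_consonant = False  # True while a consonant is waiting for its vowel
    for ch in text:
        if ch in CONSONANTS:
            if prev_consonant:
                out.append(VIRAMA)  # consonant cluster: join with halant
            out.append(CONSONANTS[ch])
            prev_consonant = True
        elif ch in MATRAS:
            # After a consonant use the matra, otherwise the full vowel letter.
            out.append(MATRAS[ch] if prev_consonant else VOWELS[ch])
            prev_consonant = False
        else:
            if prev_consonant:
                out.append(VIRAMA)
            out.append(SIGNS.get(ch, ch))  # unknown characters pass through
            prev_consonant = False
    if prev_consonant:
        out.append(VIRAMA)  # word ends in a bare consonant
    return "".join(out)


if __name__ == "__main__":
    for word in ("kamala", "rAma", "hinxI"):
        print(word, wx_to_devanagari(word))
```

In the real task the same mapping would have to be applied only to the lemma and surface-form text inside the dictionary entries, leaving tags and attribute names untouched, so this is a starting point rather than the full conversion.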

Regards,
Bagwan Ahsan
IRC: sl33k_
email: ahsanbag...@gmail.com
Undergraduate student
Sinhgad Academy of Engineering
Pune, India
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff