Gourab, you can also follow https://github.com/RajarshiRoychoudhury/apertium-bn-hi

On Wed, 31 Mar 2021 at 01:21, Hèctor Alòs i Font <hectora...@gmail.com> wrote:

> Hi, Gourab.
>
> I don't know if you already got other reviews in the IRC channel. Here
> are my five cents:
>
> 1) Did you do the coding challenge? This is a must.
>
> 2) It would be good to know more about the current state of the hin-ben
> pair. Because there isn't any information on this in your proposal, I've
> taken a look at the repositories on GitHub. I was surprised that no
> hin-ben pair has been created yet in the Apertium repository (although
> there is https://github.com/srj31/apertium-ben-hin). The hin monodix has
> 30,000+ entries and the ben monodix some 8,000. Furthermore, as I
> imagined, the morphological disambiguator for Hindi has very few rules
> (I guess they are not very necessary for translating to Urdu).
>
> So there is quite a lot of work. It'll be very hard to really create a
> translator with a WER below 25% (unless srj31's project already contains
> quite a lot of work and can be reused).
>
> 3) Are there any free sources that can be used to fill the bidix (e.g.
> the Wiktionary)? Or do you plan to translate by hand at least 10,000
> Hindi words (or, much better, 12,000-14,000 words to get a WER below
> 30%)? How many words will you be able to translate per day? This alone
> would take most of your time. And, since there are only 8,000 words in
> the Bengali monodix, you'll need to add many of them to the Bengali
> monodix, which also takes quite a lot of time. Again the same question:
> will you need to create these words (and maybe the paradigms) in the
> monodix by hand, or will you be able to get many new words (and their
> association to Apertium paradigms) from free electronic sources?
>
> 4) In fact, your targets seem to be more a wish than something
> achievable. I recommend that you try to create a week-by-week calendar,
> in order to better understand how much time you'll have to add words
> and to create transfer rules, morphological disambiguation rules and
> lexical selection rules. I don't know anything about Indo-Iranian
> languages, but all Indo-European languages I know need quite a lot of
> work on morphological disambiguation and, despite this, it is one of
> the main sources of errors in the Apertium translators.
>
> You can take a look at these work plans:
> https://wiki.apertium.org/wiki/Grfro3d/proposal_apertium_cat-srd_and_ita-srd#Workplan
> https://wiki.apertium.org/wiki/User:Hectoralos/GSOC_2020_proposal:_French-Arpitan#Workplan
> (but take into account that in previous years the number of hours
> devoted to a GSoC project was twice as high as this year's)
>
> 5) Why do you have to improve the Bengali morphological analyser? Why
> add inflections for both Bangladeshi Bengali and Indian Bengali? The
> project is already too complex and overloaded to add the possibility of
> generating two flavours of Bengali (because it would be a matter of
> generating Bengali, not of parsing it for translating into Hindi). I
> would generate the Bengali that is currently in the Bengali monodix
> (the Indian one, I guess).
>
> Best,
> Hèctor
>
> Message from Gourab Chakraborty IIIT Dharwad <19bcs...@iiitdwd.ac.in>
> on Mon, 29 March 2021 at 20:20:
>
>> Hi all,
>> I am planning to create the Apertium Hindi-Bengali language pair as
>> per the suggestions I was given by the developers. The GSoC
>> application window will open soon, so I request the mentors to kindly
>> review my final proposal for any last-minute changes that are
>> required.
>>
>> Thanks a lot!
>> --
>> Gourab Chakraborty
>> IRC: gourab337

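For reference, the WER targets in the review above (below 25% or 30%) refer to word error rate: the word-level edit distance between the translator's output and a reference translation, divided by the number of words in the reference. Below is a minimal sketch in Python, assuming plain whitespace tokenization. The function name word_error_rate and the sample sentences are illustrative only; this is not the implementation of any Apertium evaluation tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace; no normalisation
    (case, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

With a four-word reference, a single wrong word already gives a WER of 25%, which shows how strict a target below 25% is for a newly created pair.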
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff