Hi Gourab,

Some work was done on Bengali a long time ago:

Faridee AZM, Tyers FM (2009) Development of a morphological analyser for Bengali. In: Pérez-Ortiz J, Sánchez-Martínez F, Tyers F (eds) Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Alicante, Spain, pp 43–50.

You should check how much it covers, as Daniel said. If the groundwork is done, as I imagine it is, it would be more interesting to orient the proposal towards creating a pair that is ready for release. We have quite a few parsers in various stages of development, in particular for Indian languages, but relatively few finished pairs. It would be very interesting to have a "Bengali - another Indo-Iranian language" pair. Hindi-Bengali would probably be the best option, as Hindi and Urdu are, to date, the only languages that have been released in Apertium.

Given that there is much less time available in GSoC this year, one option would be to work mainly in one direction. Hindi to Bengali would be the easiest direction, because it would also avoid a lot of work on morphological disambiguation (which should be more or less satisfactorily solved for Hindi). The project would then concentrate on:

1) finishing the morphological analysis of Bengali,
2) creating/expanding the transfer rules,
3) creating the lexical selection rules,
4) adding several thousand words to the bidix,
5) testing on real texts to fine-tune the translator, and presenting a finished translator with a WER of less than 25%, ready for release, at the end of the project.

Last but not least, building a Hindi-to-Bengali translator should, as a rule, be easier for a Bengali speaker than building the opposite direction.

Hèctor

Message from Daniel Swanson <awesomeevildu...@gmail.com> on Tue, 23 March 2021 at 0:11:

> Hi Gourab,
>
> My recommendation would be to evaluate the current status -ben and
> -bn-en in terms of corpus coverage and WER and then incorporate into
> your proposal what those numbers are now and how much you think you
> can improve them.
>
> A pull request to one of the repositories involved would also be
> worthwhile, both in terms of your understanding of how to accomplish
> the tasks in your proposal and for the mentors to be able to evaluate
> your proposal.
>
> Daniel
>
> On Mon, Mar 22, 2021 at 3:06 PM Gourab Chakraborty IIIT Dharwad
> <19bcs...@iiitdwd.ac.in> wrote:
> >
> > Hi,
> > I would like to participate in GSoC and am interested in contributing to
> > improving the transfer system for apertium-bn-en. My work would fall in the
> > "Develop a morphological analyser" category of the idea-list. I'm a native
> > speaker of Bengali and am really excited about the project.
> >
> > I have gone through the official documentation, and have already set up
> > Apertium on my Ubuntu system.
> >
> > I have prepared a draft for my GSoC proposal (
> > https://docs.google.com/document/d/1S5EY6Eddu4v1ZMqgkM0Kjl_27kBhZkDkEz0Ddmnrotk/edit?usp=sharing).
> > Since this is my first proposal for GSoC, I would really appreciate any
> > feedback. Also, what should I do next?
> >
> > Thank you
> > --
> > Gourab Chakraborty (IRC: gourab337)

_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff