Hi Gourab,

Some work was done on Bengali a long time ago:
Faridee AZM, Tyers FM (2009) Development of a morphological analyser for
Bengali. In: Pérez-Ortiz J, Sánchez-Martínez F, Tyers F (eds) Proceedings
of the First International Workshop on Free/Open-Source Rule-Based Machine
Translation, Universidad de Alicante. Departamento de Lenguajes y Sistemas
Informáticos, Alicante, Spain, pp 43–50.

You should check how much it covers, as Daniel said. If the groundwork is
done, as I imagine it is, it would be more interesting to orient the
proposal towards creating a pair that is ready for publication. We have
quite a few parsers in different stages of development, in particular for
Indian languages, but relatively few released pairs. It would be very
interesting to have a "Bengali - another Indo-Iranian language" pair.
Hindi-Bengali would probably be the best option, as Hindi and Urdu are, to
date, the only languages that have been released in Apertium. Given that there is much
less time available in GSoC this year, one option would be to work mainly
in one direction. From Hindi to Bengali would be the easiest option because
it would also avoid having to work a lot on morphological disambiguation
(which should be more or less satisfactorily solved for Hindi). This would
make the project concentrate on 1) finishing the morphological analysis of
Bengali, 2) creating/expanding the transfer rules, 3) creating the lexical
selection rules, 4) adding several thousand words in the bidix, 5) testing
on real texts to fine-tune the translator and presenting a finished
translator with a WER of less than 25%, ready for publication, at the end
of the project. Last but not least, a Hindi-to-Bengali translator should
be, as a rule, easier for a Bengali speaker to build than the opposite
direction.
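
For reference, the WER figure mentioned above is usually the word-level edit
distance between the translator's output and a post-edited reference, divided
by the number of words in the reference. A minimal Python sketch, assuming
plain whitespace tokenisation and one sentence pair at a time (the name
word_error_rate is only illustrative, and this is not the evaluation script
Apertium itself uses):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace; no normalisation of
    case or punctuation is done here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

A result below 0.25 on a test text would correspond to the "WER of less than
25%" target.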

Hèctor

Message from Daniel Swanson <awesomeevildu...@gmail.com> on Tue, 23 March
2021 at 0:11:

> Hi Gourab,
>
> My recommendation would be to evaluate the current status of -ben and
> -bn-en in terms of corpus coverage and WER and then incorporate into
> your proposal what those numbers are now and how much you think you
> can improve them.
>
> A pull request to one of the repositories involved would also be
> worthwhile, both in terms of your understanding of how to accomplish
> the tasks in your proposal and for the mentors to be able to evaluate
> your proposal.
>
> Daniel
>
> On Mon, Mar 22, 2021 at 3:06 PM Gourab Chakraborty IIIT Dharwad
> <19bcs...@iiitdwd.ac.in> wrote:
> >
> >
> > Hi,
> > I would like to participate in GSoC and am interested in contributing to
> > improving the transfer system for apertium-bn-en. My work would fall in the
> > "Develop a morphological analyser" category of the idea-list. I'm a native
> > speaker of Bengali and am really excited about the project.
> >
> > I have gone through the official documentation, and have already set up
> > Apertium on my Ubuntu system.
> >
> > I have prepared a draft for my GSoC proposal (
> > https://docs.google.com/document/d/1S5EY6Eddu4v1ZMqgkM0Kjl_27kBhZkDkEz0Ddmnrotk/edit?usp=sharing).
> > Since this is my first proposal for GSoC, I would really appreciate any
> > feedback. Also, what should I do next?
> >
> > Thank you
> > --
> > Gourab Chakraborty (IRC: gourab337)
> > _______________________________________________
> > Apertium-stuff mailing list
> > Apertium-stuff@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff