Hi, Anuradha.

Thanks for your proposal draft. First, I would like to point out that if
Apertium is a rule-based translation system, it is because this paradigm
still makes sense for many languages (indeed, for the vast majority of
them). If Bhojpuri had extensive electronic language resources and,
particularly, bilingual parallel corpora, then Apertium would probably not
be the best approach. But this is probably not the case. If it were, it
would probably already be on Google Translate.

As for the project, I would advise you to look at Gourab Chakraborty's
proposal for a Hindi-Bengali translator and the comments on it. Most of the
comments apply to your proposal as well. The following message, for
instance, would be useful to you:
https://sourceforge.net/p/apertium/mailman/message/37251899/

Your proposal seems unrealistic to me. I don't think 10,000 words in the
monodix (and how many in the bidix?) are enough for a WER below 20%,
except perhaps for two extremely closely related languages.

To better evaluate your proposal, I'd like to know the answers to some
basic questions:

* What is the current state of Bhojpuri and, if it exists, of the
Bhojpuri-Hindi language pair in Apertium?
* Would you have to write a whole Bhojpuri morphological analyser from
scratch and then add some 10,000 words, manually assigning each of them
to a paradigm? How much time would you need for this?
* Where would you get the bilingual dictionary from? Would you have to
create it yourself? Are there freely available bilingual electronic
dictionaries (e.g. Wiktionary)?
* Would you work on a Bhojpuri-to-Hindi translator or on a
Hindi-to-Bhojpuri one? In either case there will be quite a lot of work on
morphological disambiguation, but on the Hindi side it only has to be done
once. If both Hindi-to-Bhojpuri and Hindi-to-Bengali are chosen (which is
entirely possible), this work can be shared between the two projects.

There is nothing wrong with doing all this work by hand, if needed. It
depends on the state of the resources available for the given language.
But it is necessary to know to what extent you will have to do this
time-consuming work.

When we had twice as much time, in most cases the projects did not manage
to create a working translator for a new language pair. Under the current
conditions, it is even more difficult.

Hèctor


On Wed, 7 Apr 2021 at 16:28, Anuradha Pandey <anuradha200...@gmail.com>
wrote:

> Hello everyone,
> I am Anuradha Pandey, a sophomore student at BITS Pilani. I am interested
> in participating in GSoC 2021, on the project "*Develop a prototype MT
> system for a strategic language pair*".
>
> I have prepared a rough draft and I am planning to build a
> Bhojpuri (BHO)-Hindi (HIN) MT pair. I am improving my translation system for
> the coding challenge and I will update my work in the GitHub repository
> mentioned in the draft. It would be really helpful if I could get some
> feedback before I make the final submission.
>
> Link to the draft -
>
> https://docs.google.com/document/d/1U19gJ3TMKYkYsp-FRthrvXkCRJUnNYSYKi46XhvZGOE/edit?usp=sharing
>
> Thanks & Regards,
> Anuradha Pandey
> IRC: Anuradha_Pandey
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>