Re: [Apertium-stuff] Applying for GSOC 2023 projects

2023-02-25 Thread Hèctor Alòs i Font
Hi Lahari,

Have you taken a look at this page?
https://wiki.apertium.org/wiki/Starting_a_new_language_with_HFST

Telugu already has a minimal skeleton in Apertium:
https://github.com/apertium/apertium-tel

We would have to do something similar to what I proposed for Hindi, with
the difference that for Telugu we start practically from scratch. Another
difference is that Telugu uses twol + lexc/lexd. In short: you have to
describe the morphology of the language (nouns, adjectives, verbs, adverbs,
etc.) and then add words by associating them with the paradigms defined in
the morphology. The aim is to achieve high coverage.
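
To make the idea of "associating words with paradigms" concrete, here is a
minimal Python sketch. It is not lexc/lexd syntax and it contains no real
Telugu data: the paradigm names, suffixes, tags, and stems are hypothetical
English-like placeholders chosen only for illustration.

```python
# Toy illustration only: the paradigm names, suffixes, tags, and stems below
# are hypothetical placeholders, not real Telugu data or lexc/lexd syntax.
PARADIGMS = {
    "N_regular": [("", "<n><sg>"), ("s", "<n><pl>")],
    "V_regular": [("", "<v><inf>"), ("ed", "<v><past>")],
}

# Each stem is linked to exactly one paradigm, as in a lexicon file.
LEXICON = {"cat": "N_regular", "walk": "V_regular"}


def generate(lexicon, paradigms):
    """Map each surface form to the list of analyses that produce it."""
    analyses = {}
    for stem, paradigm in lexicon.items():
        for suffix, tags in paradigms[paradigm]:
            analyses.setdefault(stem + suffix, []).append(stem + tags)
    return analyses


FORMS = generate(LEXICON, PARADIGMS)
print(FORMS["cats"])
```

Adding a new word then only means adding one entry to the lexicon; every
form of that word is covered by the paradigm it points to.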

Typically, this is done by starting with the closed morphological
categories and then moving on to the open ones. The inclusion of words
depends on the free resources available. In the worst case, it is done by
hand in decreasing order of word frequency.
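
Working "in decreasing order of word frequency" can be sketched as follows.
This is only an illustration under simplifying assumptions: tokens are split
on whitespace, and `known_words` is a hypothetical set standing in for the
words the analyser already recognises.

```python
from collections import Counter


def frequency_list(text, known_words):
    """Return (word, count) pairs for unknown words, most frequent first.

    Ties are broken alphabetically so the order is deterministic.
    Whitespace tokenisation is a simplification; a real corpus would
    need proper tokenisation.
    """
    counts = Counter(text.lower().split())
    unknown = [(w, c) for w, c in counts.items() if w not in known_words]
    return sorted(unknown, key=lambda wc: (-wc[1], wc[0]))


# Hypothetical miniature "corpus"; "b" is assumed to be analysed already.
sample = "a b a c b a d"
TODO = frequency_list(sample, known_words={"b"})
print(TODO)
```

The words at the top of the resulting list are the ones whose addition
raises coverage the most.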

At the moment there is quite a lot of work to do: installing Apertium (with
twol + lexc/lexd) and learning the basics of how it works. The "coding
challenge"
(https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Morphological_analyser)
is essential, as it shows that the candidate has understood the basics. If
you run into problems (which is absolutely normal), don't hesitate to ask
for help on the IRC channel. We are here to help.

Hèctor

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Applying for GSOC 2023 projects

2023-02-25 Thread Lahari Sreeja Tallapaka
Sir, please explain how I should proceed with the Telugu morphological
analyser. I've read the documentation on the Apertium wiki and looked at
the apertium-tel GitHub page, which has very little information. There is
information on how to make a new language pair, but I'm unsure whether that
is what I should be doing, as only one language is involved here.

Regards,
Lahari Sreeja



Re: [Apertium-stuff] Applying for GSOC 2023 projects

2023-02-23 Thread Tanmai Khanna
Telugu is a fairly well-resourced language now and, like Hindi, has some
well-performing neural translators. We can still consider working with it
in Apertium, but we must first figure out what our goal with it is.

If there's no particular goal, then a low-resource language would be
preferable.

Tanmai



Re: [Apertium-stuff] Applying for GSOC 2023 projects

2023-02-23 Thread Hèctor Alòs i Font
I agree with Daniel. Moreover, experience shows that it is also unrealistic
to take on a project as complex as a translator for distant languages in a
GSoC. I would look at the Telugu morphological analyser currently available
in Apertium and concentrate on it. 90% coverage could be a minimum goal;
the more the better. This will require a good command of Telugu grammar.
Lexc/lexd + twol should not be a problem for someone with a computer
science background.
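
A coverage figure such as the 90% mentioned above can be estimated naively
from analyser output. The sketch below assumes the output is in the Apertium
stream format, where each token appears as ^surface/analysis$ and an unknown
word is marked with a leading asterisk (for example ^zzz/*zzz$). The sample
string and its tags are hypothetical, and escaped characters are not handled.

```python
import re

# One lexical unit: everything between "^" and "$".
# Simplification: escaped "$" or "/" inside a unit is not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(analysed_text):
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # parts[0] is the surface form; the rest are candidate analyses.
        analyses = unit.split("/")[1:]
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / len(units)


# Hypothetical analyser output: three known tokens and one unknown.
sample = "^the/the<det>$ ^cats/cat<n><pl>$ ^zzz/*zzz$ ^sat/sit<v><past>$"
COVERAGE = naive_coverage(sample)
print(f"{COVERAGE:.0%}")
```
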
Hèctor



Re: [Apertium-stuff] Applying for GSOC 2023 projects

2023-02-23 Thread Daniel Swanson
Hi Lahari,

For translation pairs, Hindi-English has been tried several times
without success. I would suggest considering Hindi-Telugu.

For other project ideas or places to get started, you can check the
wiki page for each idea and do the coding challenge. If an idea is
missing a coding challenge or you want to discuss the details of it,
you'll get the quickest responses by talking to us on IRC:
https://wiki.apertium.org/wiki/IRC

Daniel





[Apertium-stuff] Applying for GSOC 2023 projects

2023-02-22 Thread Lahari Sreeja Tallapaka
Greetings to the community,
I am Lahari Sreeja from the Indian Institute of Technology (IIT), Bhilai. I
have taken courses in Machine Learning, Natural Language Processing, and
Information Retrieval, and I have experience in frontend web development. I
know Telugu, Hindi, and English. I'm interested in adding
English-Hindi/English-Telugu language pairs, and there are a lot of
projects that are interesting and within my domain. It would be helpful to
get guidance on where to start and on some issues I can work on.
Cheers!