Re: [Apertium-stuff] Applying for GSOC 2023 projects
Hi Lahari,

Have you taken a look at this page: https://wiki.apertium.org/wiki/Starting_a_new_language_with_HFST ?

Telugu already has a minimal skeleton in Apertium: https://github.com/apertium/apertium-tel

We would have to do something similar to what I proposed for Hindi, with the difference that with Telugu we start practically from scratch. Another difference is that Telugu uses twol + lexc/lexd.

In short: you have to describe the morphology of the language (nouns, adjectives, verbs, adverbs, etc.) and then add words by associating them with the paradigms defined in the morphology. The aim is to achieve high coverage. Typically, this is done by starting with the closed morphological categories and then moving on to the open ones. Which words get included depends on the free resources available. In the worst case, they are added by hand in decreasing order of word frequency.

For now there is quite a lot of work to do: installing Apertium (with twol + lexc/lexd) and basically learning how it works. The "coding challenge" ( https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Morphological_analyser ) is essential, as it shows that the candidate has understood the basics.

If you run into problems (which is absolutely normal), don't hesitate to ask for help in the IRC channel. We are here to help.

Hèctor

Message from Lahari Sreeja Tallapaka on Sat, 25 Feb 2023 at 16:59:
> Sir, please explain how I should proceed with the Telugu Morphological
> Analyser. I've read the documentation available on Apertium Wiki Pages and
> looked at the apertium-tel GitHub page, which has very little information.
> There is information on how to make a new language pair; I'm unsure if that
> is what I should be doing as only one language is involved here.
>
> Regards,
> Lahari Sreeja

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
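Editorial illustration of the "decreasing order of word frequency" workflow described in Hèctor's message: a minimal Python sketch, not part of the original thread and not an official Apertium tool. It assumes you already have analyser output in the Apertium stream format, where each token appears as ^surface/analyses$ and unknown tokens carry an asterisk before the analysis. The function name and the sample text are hypothetical, and escaped characters in the stream are ignored for simplicity.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped "/" or "$" inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def unknown_words_by_frequency(analysed_text, top_n=10):
    """Return the most frequent unanalysed surface forms, most frequent first."""
    counts = Counter(
        surface
        for surface, analyses in LEXICAL_UNIT.findall(analysed_text)
        # The analyser marks a token it cannot analyse with a leading "*".
        if analyses.startswith("*")
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample with made-up words, for illustration only.
    sample = "^cat/cat<n><sg>$ ^blarg/*blarg$ ^zorp/*zorp$ ^blarg/*blarg$"
    for word, count in unknown_words_by_frequency(sample):
        print(count, word)
```

The words at the top of this list are the ones whose addition to the lexicon would raise coverage the most.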
Re: [Apertium-stuff] Applying for GSOC 2023 projects
Sir, please explain how I should proceed with the Telugu Morphological Analyser. I've read the documentation available on the Apertium wiki pages and looked at the apertium-tel GitHub page, which has very little information. There is information on how to make a new language pair; I'm unsure if that is what I should be doing, as only one language is involved here.

Regards,
Lahari Sreeja

On Thu, Feb 23, 2023 at 10:12 PM Hèctor Alòs i Font wrote:
> I agree with Daniel. Moreover, experience shows that it is also
> unrealistic to do such a complex project as a translator for distant
> languages in a GSoC. I would look at the Telugu morphological analyser
> currently available in Apertium and concentrate on it. 90% of coverage could
> be a minimal goal. The more the better. This will require a good command of
> Telugu grammar. Lexc/lexd + twol should not be a problem for someone with a
> computer science background.
> Hèctor
Re: [Apertium-stuff] Applying for GSOC 2023 projects
Telugu is a fairly well-resourced language now and, like Hindi, has some well-performing neural translators. We can still consider working with it in Apertium, but first we must figure out what our goal with it is. If there's no particular goal, then a low-resource language would be preferable.

Tanmai

On Thu, Feb 23, 2023, 22:12 Hèctor Alòs i Font wrote:
> I agree with Daniel. Moreover, experience shows that it is also
> unrealistic to do such a complex project as a translator for distant
> languages in a GSoC. I would look at the Telugu morphological analyser
> currently available in Apertium and concentrate on it. 90% of coverage could
> be a minimal goal. The more the better. This will require a good command of
> Telugu grammar. Lexc/lexd + twol should not be a problem for someone with a
> computer science background.
> Hèctor
Re: [Apertium-stuff] Applying for GSOC 2023 projects
I agree with Daniel. Moreover, experience shows that it is also unrealistic to take on a project as complex as a translator for distant languages in a GSoC. I would look at the Telugu morphological analyser currently available in Apertium and concentrate on it. 90% coverage could be a minimal goal. The more the better. This will require a good command of Telugu grammar. Lexc/lexd + twol should not be a problem for someone with a computer science background.

Hèctor

Message from Daniel Swanson on Thu, 23 Feb 2023 at 16:37:
> Hi Lahari,
>
> For translation pairs, Hindi-English has been tried several times
> without success. I would suggest considering Hindi-Telugu.
>
> For other project ideas or places to get started, you can check the
> wiki page for each idea and do the coding challenge. If an idea is
> missing a coding challenge or you want to discuss the details of it,
> you'll get the quickest responses by talking to us on IRC:
> https://wiki.apertium.org/wiki/IRC
>
> Daniel
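Editorial illustration of the 90% coverage goal mentioned above: a minimal Python sketch of how naive coverage could be measured, not part of the original thread and not an official Apertium tool. It assumes analyser output in the Apertium stream format (^surface/analyses$, with unknown tokens marked by an asterisk before the analysis). The function name, the sample text, and the 0.90 threshold check are hypothetical, and escaped characters in the stream are ignored.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped "/" or "$" inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Fraction of tokens that received at least one analysis (0.0 if no tokens)."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # A token counts as covered unless its analysis starts with "*".
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    # Hypothetical sample with made-up words, for illustration only.
    sample = "^cat/cat<n><sg>$ ^sat/sit<vblex><past>$ ^blarg/*blarg$ ^mat/mat<n><sg>$"
    coverage = naive_coverage(sample)
    print(f"coverage: {coverage:.1%}, meets 90% goal: {coverage >= 0.90}")
```

Naive coverage only says that a token got some analysis, not that the analysis is correct, so it is a lower bar than accuracy.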
Re: [Apertium-stuff] Applying for GSOC 2023 projects
Hi Lahari,

For translation pairs, Hindi-English has been tried several times without success. I would suggest considering Hindi-Telugu.

For other project ideas or places to get started, you can check the wiki page for each idea and do the coding challenge. If an idea is missing a coding challenge or you want to discuss the details of it, you'll get the quickest responses by talking to us on IRC: https://wiki.apertium.org/wiki/IRC

Daniel

On Thu, Feb 23, 2023 at 2:24 AM Lahari Sreeja Tallapaka wrote:
> Greetings to the community,
> I am Lahari Sreeja from the Indian Institute of Technology(IIT), Bhilai. I
> have taken Machine Learning, Natural Language processing, and Information
> retrieval courses and have experience in frontend web development. I know
> Telugu, Hindi, and English languages. And Im interested in adding
> English-Hindi/English-Telugu language pairs and there are a lot of projects
> that are interesting and are in my domain. It would be helpful to get
> guidance on where to start and some issues I can work on.
> Cheers!
[Apertium-stuff] Applying for GSOC 2023 projects
Greetings to the community,

I am Lahari Sreeja from the Indian Institute of Technology (IIT), Bhilai. I have taken Machine Learning, Natural Language Processing, and Information Retrieval courses and have experience in frontend web development. I know Telugu, Hindi, and English. I'm interested in adding English-Hindi/English-Telugu language pairs, and there are a lot of projects that are interesting and in my domain. It would be helpful to get guidance on where to start and some issues I can work on.

Cheers!