Re: [Apertium-stuff] GSoC proposal draft: Developing a Morphological Analyzer for Torwali Language

2021-04-08 Thread Naeemuddin Hadi
Thank you, Hèctor, for the valuable feedback on the draft application. I
will work on the coding challenge before the application period, and I can
dedicate more than 30 hours per week to the project.
I agree with you regarding the number of words, because I think 50,000 words
will be really difficult in the case of Torwali. However, I have access to
Torwali lexical data with more than 10,000 words and around 1,000 example
sentences, along with some other running text. I also have text files
listing different categories of words.
I will update my proposal, add the necessary details, and set aside a couple
of weeks for disambiguation.


Regards,
Naeem

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] GSoC proposal draft: Developing a Morphological Analyzer for Torwali Language

2021-04-07 Thread Hèctor Alòs i Font
Hi Naeem,

Thanks a lot for your very good and interesting draft application. Torwali
is an excellent language for Apertium. You know the challenges it presents
and the work on it, and you prove to be committed to the language and the
project. I am not a specialist in lexc-twol, but I see a few general things
to improve your application:

* The coding challenge is very important. It proves you understand how
Apertium works (not only theoretically) and that you can do the job. So, do
it as well as you can now. Don't leave it until after the application
period.

* Your commitment of 30 hours per week is welcome, but bear in mind that
it is much more than what Google is asking for this year.

* You want to enter 50,000+ words in the morphological analyser. That's a
huge amount. But in your work plan you don't say when you are going to do
it. It would be necessary to show how many words and which grammatical
categories you would add in each time slot (two weeks in your case).
Usually we start with the closed categories. When you detail these numbers
in your proposal, we will see how many words you will be able to reach.

* I have no idea how it is in the case of Dardic languages, but the
assignment of words to categories is not usually trivial in Indo-European
languages. Do existing works already have lists of words assigned to
paradigms? For example: lists of verbs following one model or another. If
not, the time needed for assignment increases. It is necessary to know this
in order to calculate the feasibility of introducing 50,000, 30,000 or
20,000 words.

* Are there extensive lists of words available in electronic format, with
their grammatical category, which you could use for your work? They should
be free. If they were copyrighted they could not be (semi-)automatically
uploaded to Apertium.

* It is very likely that, with the very limited time we have this year for
GSoC projects, a complete morphological analyser from scratch is perfectly
reasonable. Still, before putting so many words into it (especially if you
have to add them manually), I think it would be reasonable to spend a
couple of weeks training a morphological disambiguator.

Hèctor



[Apertium-stuff] GSoC proposal draft: Developing a Morphological Analyzer for Torwali Language

2021-04-07 Thread Naeemuddin Hadi
Hello everyone,

I am Naeem, a student at UET Peshawar. I want to participate in GSoC 2021.
I am working to create a morphological analyzer for Torwali, an endangered
language of northern Pakistan.
I have prepared a draft proposal and would appreciate feedback before final
submission. Links related to the coding challenge are included in the draft.

Link (draft):
https://drive.google.com/file/d/1hnu6gRWVN3LjjxOj0BvimvJ56AIKfe6q/view?usp=sharing


Regards,
Naeem