Thank you, Hèctor, for the valuable feedback on the draft application. I
will work on the coding challenge before the application period ends, and I
can dedicate more than 30 hours per week to the project.
I agree with you regarding the number of words: I think 50,000 words will
be really difficult in the case of Torwali. However, I have access to lexical
data for Torwali with more than 10,000 words and around 1,000 example
sentences, along with some other running text. I also have text files
listing different categories of words.
I will update my proposal and add the necessary details, including a couple
of weeks for disambiguation.
Regards,
Naeem
On Thu, Apr 8, 2021, 9:41 AM Hèctor Alòs i Font wrote:
> Hi Naeem,
>
> Thanks a lot for your very good and interesting draft application. Torwali
> is an excellent language for Apertium. You know the challenges it presents
> and the work on it, and you prove to be committed to the language and the
> project. I am not a specialist on lexc-twol, but I see a few general things
> to improve your application:
>
> * The coding challenge is very important. It proves you understand how
> Apertium works (not only theoretically) and that you can do the job. So, do
> it as well as you can now. Don't leave it until after the application
> period.
>
> * Your commitment of 30 hours per week is welcome, but bear in mind
> that it is much more than what Google is asking for this year.
>
> * You want to enter 50,000+ words in the morphological analyser. That's a
> huge amount. But in your work plan you don't say when you are going to do
> it. It would be necessary to show how many words and which grammatical
> categories you would add in each time slot (two weeks in your case).
> Usually we start with the closed categories. When you detail these numbers
> in your proposal, we will see how many words you will be able to reach.
>
> * I have no idea how it is in the case of Dardic languages, but the
> assignment of words to categories is not usually trivial in Indo-European
> languages. Do existing works already have lists of words assigned to
> paradigms? For example: lists of verbs following one model or another. If
> not, the time needed for assignment increases. It is necessary to know this
> in order to calculate the feasibility of introducing 50,000, 30,000 or
> 20,000 words.
>
> * Are there extensive lists of words available in electronic format, with
> their grammatical category, which you could use for your work? They should
> be free. If they were copyrighted they could not be (semi-)automatically
> uploaded to Apertium.
>
> * It is very likely that, with the very limited time we have this year for
> GSoC projects, a complete morphological analyser from scratch is perfectly
> reasonable. Still, before putting so many words into it (especially if you
> have to add them manually), I think it would be reasonable to spend a
> couple of weeks training a morphological disambiguator.
>
> Hèctor
>
> Message from Naeemuddin Hadi on Thu, 8 Apr 2021 at 1:46:
>
>> Hello everyone,
>>
>> I am Naeem, a student of UET Peshawar. I want to participate in GSoC
>> 2021. I am working to create a morphological analyzer for an endangered
>> language of northern Pakistan called Torwali.
>> I have prepared a draft proposal and would appreciate feedback before
>> final submission. Links related to the coding challenge are included in
>> the draft.
>>
>> Link (draft):
>> https://drive.google.com/file/d/1hnu6gRWVN3LjjxOj0BvimvJ56AIKfe6q/view?usp=sharing
>>
>>
>> Regards,
>> Naeem
>> ___
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>