On 2017-03-30 12:11, Marc Riera Irigoyen wrote:
> Hello everyone,
>
> I have been working on my proposal for this year's GSoC and I have
> published a first version of it on the wiki. You can find it here:
> http://wiki.apertium.org/wiki/User:Marcriera/proposal
>
> It would be great to get some feedback about it. The workplan is not
> final, as I am working on the coding challenge and it will be based on
> the results.
>
> Thank you!

Hey there!

I think that your estimates are quite low:

    New stems in bidix (~400 stems a week)
    Additional transfer rules, lexical selection rules and CG (~8-10 rules a week)

Someone working on this task should be adding more like thousands of entries per day. You should be adding them semi-automatically, based on other resources and scripts. I just timed myself translating 100 words from the middle of the frequency list from Spanish to English, and it took me 3 minutes 20 seconds. I recommend that you time yourself doing some of the work to come up with more realistic estimates.

There are parallel corpora for English and Catalan. Are you planning to learn lexical selection rules from them?

How are you planning to choose which transfer rules to write?

What do you think is a good approach for finding which CG rules to write?

Do you plan to train the perceptron tagger on a large tagged corpus of English? If so, how do you plan to do it?

Hope these questions/suggestions help! :)

Regards,
Fran