On 2017-03-30 12:11, Marc Riera Irigoyen wrote:
> Hello everyone,
> I have been working on my proposal for this year's GSoC and I have
> published a first version of it on the wiki. You can find it here:
> http://wiki.apertium.org/wiki/User:Marcriera/proposal
> It would be great to get some feedback about it. The workplan is not
> final, as I am working on the coding challenge and it will be based on
> the results.
> Thank you!

Hey there! I think that your estimates are quite low:

  New stems in bidix (~400 stems a week)
  Additional transfer rules, lexical selection rules and CG (~8-10 rules a

Someone working on this task should be looking at more like thousands of
entries a day; you should be adding them semi-automatically, using other
resources and scripts. I just timed myself translating 100 words from the
middle of the frequency list from Spanish to English, and it took me
3 minutes 20 seconds. I recommend timing yourself doing some of the work
so that you can come up with more realistic estimates.
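To make the comparison concrete, here is a quick back-of-the-envelope sketch. It assumes, as a rough simplification, that one translated word corresponds to one bidix entry (it ignores time spent on tags and checking), and it simply extrapolates the timed sample linearly. The function names are illustrative only.

```python
# Rough throughput check: extrapolate a timed sample of manual translation
# to an hourly rate and compare it with the weekly figure in the proposal.
# Assumption: one translated word ~ one bidix entry (ignores tagging/checking).

def entries_per_hour(words: int, seconds: float) -> float:
    """Linearly extrapolate a timed sample to entries per hour."""
    return words * 3600 / seconds


def hours_for_target(target: int, rate_per_hour: float) -> float:
    """Hours of work needed to reach `target` entries at `rate_per_hour`."""
    return target / rate_per_hour


sample_seconds = 3 * 60 + 20               # 100 words in 3 min 20 s
rate = entries_per_hour(100, sample_seconds)
print(f"measured rate: {rate:.0f} entries/hour")            # 1800

weekly_target = 400                        # stems/week from the proposal
hours = hours_for_target(weekly_target, rate)
print(f"hours for a week's proposed target: {hours:.2f}")   # 0.22
```

Even allowing generous overhead for tagging and testing, 400 stems a week comes to well under a working day of manual effort at this rate, which is why a scripted, semi-automatic approach should be able to reach a much higher volume.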

There are parallel corpora for English and Catalan; are you planning to
learn lexical selection rules from them?

How are you planning to choose which transfer rules to write?

What do you think a good approach is for finding which CG rules to
write? Do you plan to train the perceptron tagger on a large tagged
corpus of English? If so, how do you plan to do it?

Hope these questions/suggestions help! :)



Apertium-stuff mailing list
