On 23 March 2011 05:40, Mikel Forcada <[email protected]> wrote:
> Hi Sagie,
>
> I would love to adopt a language pair, namely an English-Hebrew one.
> I noticed there's a resource link posted for Hebrew in the
> incubator: http://www.mila.cs.technion.ac.il/english/resources/lexicons/
> But this page only seems to list examples of some very basic lexicon
> collections.
> Since I'm not familiar with the amount of data required for a language pair,
> my question is how difficult would creating a new pair out of this kind of
> data be? Is the data even good enough?
>
> As a rule of thumb, you would need a couple of thousand dictionary
> entries to cover about 90% of everyday language. I don't know the data in
> http://www.mila.cs.technion.ac.il/english/resources/lexicons/ : the server
> wasn't working this morning.
>
They have a GPL'd WordNet for Hebrew, which is aligned with the synset IDs
of the English one: http://mila.cs.technion.ac.il/wordnet/ver2.0/

I haven't seen anything more than the examples for their morphological
database, but they're relatively well known as being pro-GPL,
right-thinking people, so it may be an oversight.

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
