Hi Sagie,

   1. I would love to adopt a language pair, namely an English-Hebrew one.
      I noticed there's a resource link posted for Hebrew in the
      incubator:
      http://www.mila.cs.technion.ac.il/english/resources/lexicons/
      But this page only seems to list examples of some very basic
      lexicon collections.
      Since I'm not familiar with the amount of data required for a
      language pair, my question is: how difficult would creating a new
      pair out of this kind of data be? Is the data even good
      enough?

As a rule of thumb, you would need a couple of thousand dictionary entries to cover about 90% of everyday language. I can't comment on the data at http://www.mila.cs.technion.ac.il/english/resources/lexicons/ because the server wasn't working this morning.
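
One way to check whether a lexicon is "good enough" is to measure how many running words of a text it covers. The following is only a minimal sketch in Python: the function names and the tiny English sample are made up for illustration, and it matches raw surface forms, whereas a real test would run a morphological analyser over a real corpus.

```python
from collections import Counter


def coverage(tokens, lexicon):
    """Return the fraction of running tokens found in the lexicon."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in lexicon)
    return known / len(tokens)


def top_n_lexicon(tokens, n):
    """Build a toy lexicon from the n most frequent surface forms."""
    return {word for word, _ in Counter(tokens).most_common(n)}


# Illustrative sample only; not real evaluation data.
corpus = "the cat sat on the mat and the dog sat on the cat".split()

for size in (1, 3, 7):
    lexicon = top_n_lexicon(corpus, size)
    print(size, round(coverage(corpus, lexicon), 2))
```

Because a few very frequent words account for most running text, coverage tends to rise quickly with the first entries and then flatten, which is the intuition behind the rule of thumb.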

   2. More generally, I wanted to ask about the theoretical background
      needed for some of the ideas.
      A lot of the tasks use professional lingo and it's hard to infer
      the actual work needed.

We're here to help: just ask! Visit #apertium at irc.freenode.net! As a community, Apertiumers have developed their own lingo, which may sometimes be difficult for newcomers to understand. But we think it is possible to work on Apertium with a small amount of linguistic knowledge: no need for high-level computational linguistics here!

The main problems I expect with Hebrew are (a) that the writing system does not write all vowels, as far as I know, so written Hebrew contains a fair amount of ambiguity; and (b) that the morphology is not "concatenative": rather than just adding suffixes or prefixes, words are formed by fitting triliteral roots into templates. Apertium dictionaries aren't too good at that kind of morphology, and you will have to devise workarounds.
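
To make point (b) concrete, here is a minimal sketch in Python that contrasts stem-plus-suffix inflection with root-and-template inflection. It is not how Apertium works internally; the function names and the digit-slot template notation are invented for illustration, and the Hebrew forms are in a simplified Latin transliteration.

```python
import os.path


def concatenative(stem, suffix):
    """Suffixing morphology: the stem stays intact (walk + ed)."""
    return stem + suffix


def apply_template(root, template):
    """Fill the digit slots 1, 2, 3 of a template with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch for ch in template
    )


# Hypothetical notation: digits mark where the three root consonants go.
PAST = "1a2a3"     # e.g. katav, "he wrote"
PRESENT = "1o2e3"  # e.g. kotev, "writes"

root_write = ("k", "t", "v")
root_learn = ("l", "m", "d")

print(concatenative("walk", "ed"))
for root in (root_write, root_learn):
    print(apply_template(root, PAST), apply_template(root, PRESENT))

# The two forms of one verb share only their first letter as a prefix,
# so a "fixed stem + ending" entry has very little to factor out.
print(os.path.commonprefix(["katav", "kotev"]))
```

With suffixing, one stem and a shared list of endings describe a whole paradigm. With templates, the vowels change inside the word, which is why a dictionary built around stems and endings needs workarounds.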

   3. I'm now taking my first Computational Linguistics course, which
      is a seminar based on Daniel Jurafsky's "Speech and Language
      Processing" http://books.google.com/books?id=fZmj5UNK8AQC
      I talked to my professor about Apertium and he said we will
      cover most of what's needed for working on MT systems such as Apertium.

This will help a lot.

   4. My question is, how can I now tell which of the ideas would suit
      my theory background?
      Or do you think I should just stick to Entry-level tasks?

You get to choose, but don't do so before coming online and discussing it with colleagues at #apertium. Or, alternatively, write here!

Cheers

Mikel

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
