Hi Sagie,
1. I would love to adopt a language pair, namely an English-Hebrew one.
I noticed there's a resource link posted for Hebrew in the
incubator:
http://www.mila.cs.technion.ac.il/english/resources/lexicons/
But this page only seems to list examples of some very basic
lexicon collections.
Since I'm not familiar with the amount of data required for a
language pair, my question is: how difficult would it be to create a new
pair out of this kind of data? Is the data even good
enough?
As a rule of thumb, you would need a couple of thousand dictionary
entries to cover about 90% of everyday language. I don't know the data
in http://www.mila.cs.technion.ac.il/english/resources/lexicons/: the
server wasn't working this morning.
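One way to sanity-check a rule of thumb like this on your own data is to measure what share of running words in a text a given word list covers. The sketch below is hypothetical: the helper names `coverage` and `top_n_lexicon` and the toy corpus are illustrative assumptions, and it matches surface strings only, whereas a real morphological dictionary covers every inflected form of each lemma it lists.

```python
from collections import Counter


def coverage(tokens, lexicon):
    """Fraction of running tokens whose surface form appears in the lexicon."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in lexicon)
    return known / len(tokens)


def top_n_lexicon(tokens, n):
    """Hypothetical helper: a 'dictionary' made of the n most frequent forms."""
    return {form for form, _ in Counter(tokens).most_common(n)}


# Toy corpus (assumption): a few very frequent words cover a large share of tokens.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
lex = top_n_lexicon(corpus, 3)
print(f"{len(lex)} entries cover {coverage(corpus, lex):.0%} of {len(corpus)} tokens")
```

Because word frequencies are heavily skewed, coverage rises quickly with the first entries and then flattens out, which is why a few thousand well-chosen entries can go a long way.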
2. More generally, I wanted to ask about the theoretical background
needed for some of the ideas.
A lot of the tasks use professional lingo, and it's hard to tell
what work is actually involved.
We're here to help: just ask! Visit #apertium at irc.freenode.net! As a
community, Apertiumers have developed their own lingo, which newcomers
may sometimes find difficult to understand. But we think it is
possible to work on Apertium with a small amount of linguistic
knowledge: no need for high-level computational linguistics here!
The main problems I expect with Hebrew are (a) that, as far as I know,
the writing system does not write all vowels, so written Hebrew
contains a fair amount of ambiguity, and (b) that the morphology is not
"concatenative": you don't just add or change suffixes and prefixes,
but rather fit vowels and affixes around triliteral (three-consonant)
roots according to templates. Apertium dictionaries aren't very good at
that kind of morphology, so you will have to devise workarounds.
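To make both points concrete, here is a toy sketch of root-and-pattern morphology. Numbered slots in a template are filled with the consonants of a root, and then the vowels are dropped to mimic unvocalized spelling. Everything here is an illustrative assumption: the Latin transliteration, the three templates, and the helper names `interdigitate` and `unvocalized`. This is not Apertium data or an Apertium API. Real unpointed Hebrew also writes some vowels with consonant letters (matres lectionis), so stripping every vowel overstates the ambiguity.

```python
# Toy illustration of non-concatenative (root-and-pattern) morphology.
# Transliterations and templates are simplified assumptions, not real dictionary data.

ROOT = ("k", "t", "v")  # root k-t-b, transliterated with "v" as it surfaces in these forms

# Digits 1, 2, 3 mark the slots where the root consonants are inserted.
TEMPLATES = {
    "1a2a3": "katav, 'he wrote'",
    "12a3": "ktav, 'script'",
    "1o2e3": "kotev, 'writes'",
}


def interdigitate(root, template):
    """Place the three root consonants into the numbered slots of a template."""
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in template)


def unvocalized(form):
    """Crude model of unpointed spelling: drop every vowel letter."""
    return "".join(ch for ch in form if ch not in "aeiou")


for template, gloss in TEMPLATES.items():
    form = interdigitate(ROOT, template)
    print(f"{form:6} -> {unvocalized(form)}   ({gloss})")
```

All three forms collapse to the same consonant skeleton. A plain suffix or prefix list cannot generate them, and an analyser reading the unvocalized form has to keep several readings open.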
3. I'm now taking my first Computational Linguistics course, a
seminar based on Daniel Jurafsky and James H. Martin's "Speech and
Language Processing" (http://books.google.com/books?id=fZmj5UNK8AQC).
I talked to my professor about Apertium, and he said we will
cover most of what's needed for working on MT systems such as Apertium.
This will help a lot.
4. My question is: how can I tell which of the ideas would suit
my theoretical background?
Or do you think I should just stick to entry-level tasks?
You get to choose, but don't do so before coming online and discussing
it with colleagues at #apertium. Or, alternatively, write here!
Cheers
Mikel
--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff