
You can download the latest version of the Greek-English Europarl corpus from http://www.statmt.org/europarl/ . The download is a compressed tgz archive. When you uncompress it, you get two files that are already sentence-aligned, so you can skip the first script (the sentence aligner) and go directly to the tokenizer.
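If it helps, here is a minimal Python sketch for unpacking the archive and checking that the two files have the same number of lines, which is a quick sanity check for a sentence-aligned corpus. The function names are my own and purely illustrative; they are not part of Moses or of the Europarl tools, and the archive and file names you pass in depend on what you actually downloaded.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, out_dir):
    """Unpack a gzip-compressed tar archive; return paths of the regular files in it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only unpack archives you trust: extractall writes whatever the archive contains.
        tar.extractall(out)
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return [out / name for name in names]


def count_lines(path):
    """Count lines in binary mode so the text encoding does not matter."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def is_line_aligned(source_path, target_path):
    """Sentence-aligned files carry one sentence per line, so line counts must match."""
    return count_lines(source_path) == count_lines(target_path)
```

Call extract_corpus with the path of the tgz file and an output directory, then pass the Greek and English files it returns to is_line_aligned. Equal line counts do not prove the alignment is correct, but unequal counts mean something went wrong during download or extraction.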


On 11/11/2010 06:06, Shibamouli Lahiri wrote:
Dear All,

I am a newbie to Moses installation and training. I would like to train a translation model on the Europarl el-en (Greek-English) corpus. I have installed Moses and downloaded the Europarl el-en corpus, but after that I'm stuck on where to start. I looked into the accompanying Perl scripts. There are 3 scripts:

1> Sentence aligner

2> Tokenizer

3> Sentence splitter

The sentence aligner needs, I think, 2 separate files: one in Greek and another in English. But the el-en corpus is a single file, so I'm not sure whether splitting that file into a Greek file and an English file will do the job, or whether I'll have to run the sentence splitter or the tokenizer first.

Thank you very much for your kind appreciation.


Moses-support mailing list

MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30-3949-1827
Fax. +49-30-3949-1810
