Hi,
You can download the latest version of the Greek-English Europarl corpus
here: http://www.statmt.org/europarl/ . You get a compressed tgz file.
If you uncompress it, you will get two files that are already
sentence aligned, so you can skip the first script (the sentence aligner)
and go directly to the tokenizer.
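
In case it helps, here is a minimal Python sketch of the unpack-and-check
step, not an official Moses or Europarl tool. The function names
(extract_corpus, count_lines, is_line_aligned) are made up for illustration,
and the check only assumes that each file holds one sentence per line, so
aligned files must have the same number of lines.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, out_dir):
    """Unpack a .tgz archive and return the paths of the regular files in it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Keep only regular files; directories are created as needed.
        members = [m for m in tar.getmembers() if m.isfile()]
        tar.extractall(out_dir, members=members)
    return sorted(out_dir / m.name for m in members)


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def is_line_aligned(greek_path, english_path):
    """Assumption: one sentence per line, so equal line counts are required."""
    return count_lines(greek_path) == count_lines(english_path)
```

You would call extract_corpus() on the archive you downloaded, then pass the
Greek and the English file to is_line_aligned(). Equal line counts are only a
sanity check and do not prove the alignment is correct. If the counts match,
the two files can go to the tokenizer, one language at a time.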
Regards,
Eleftherios
On 11/11/2010 at 06:06, Shibamouli Lahiri wrote:
Dear All,
I am a newbie to Moses installation and training. I would like to
train a translation model on the Europarl el-en (Greek-English) corpus. I
have installed Moses and downloaded the Europarl el-en corpus, but
after that I'm stuck on where to start. I looked into the accompanying
Perl scripts. There are 3 scripts:
1> Sentence aligner
2> Tokenizer
3> Sentence splitter
The sentence aligner needs, I think, 2 separate files - one in Greek,
another in English. But the el-en corpus is a single file, so I'm not
sure whether splitting that file into a Greek file and an English file will
do the job, or whether I'll first have to run the sentence splitter or the
tokenizer.
Thank you very much for your kind appreciation.
Regards,
Shibamouli
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30-3949-1827
Fax. +49-30-3949-1810