Re: [Moses-support] how to clean the UN corpus

joerg Mon, 01 Dec 2014 08:51:50 -0800

You could use the word-aligned version (or even the phrase-tables) from OPUS:
http://opus.lingfil.uu.se/MultiUN/wordalign/ar-en/
http://opus.lingfil.uu.se/UN/wordalign/ar-en/
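
If you would rather keep working from the raw 2000ar and 2000en directories, it may help to first find out which documents are out of sync. Below is a minimal sketch, not a tested recipe for the UN data: it assumes each document has the same file name in both directories, which may not match the actual layout, and the function names are made up for illustration.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file, reading bytes so the encoding does not matter."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def mismatched_pairs(src_dir, tgt_dir):
    """Return (file name, source lines, target lines) for every file that
    exists in both directories but has a different number of lines.

    Assumption: paired documents share the same file name in both directories.
    """
    src_dir, tgt_dir = Path(src_dir), Path(tgt_dir)
    mismatches = []
    for src_file in sorted(src_dir.iterdir()):
        tgt_file = tgt_dir / src_file.name
        # Skip subdirectories and files that have no counterpart.
        if not src_file.is_file() or not tgt_file.is_file():
            continue
        src_count = count_lines(src_file)
        tgt_count = count_lines(tgt_file)
        if src_count != tgt_count:
            mismatches.append((src_file.name, src_count, tgt_count))
    return mismatches
```

Calling mismatched_pairs("2000ar", "2000en") would list the documents whose two sides differ in length, so they can be re-aligned or left out before the cleaning step.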


Best,
Jörg

**********************************************************************************
Jörg Tiedemann                                 http://stp.lingfil.uu.se/~joerg/



On Dec 1, 2014, at 4:56 PM, emna hkiri wrote:

> 
> Dear friends, thank you very much for your earlier help; I hope you can help 
> me again.
> I am trying to build an Arabic-English SMT system with Moses, but during 
> training GIZA does not produce the alignment. I believe this is because the UN 
> ar-en corpus is not well cleaned: the two sides are not parallel and do not 
> have the same number of lines. I am working with the 2000 
> directories (2000ar and 2000en). Has anyone worked with the UN ar-en corpus?
> How can I get the same number of lines for ar and en in 2000 so that the 
> cleaning step passes?
> 
> Thank you in advance; I hope you will answer my question.
> 
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

