You could use the word-aligned version (or even the phrase tables) from OPUS:

http://opus.lingfil.uu.se/MultiUN/wordalign/ar-en/
http://opus.lingfil.uu.se/UN/wordalign/ar-en/
Best,
Jörg

**********************************************************************************
Jörg Tiedemann
http://stp.lingfil.uu.se/~joerg/

On Dec 1, 2014, at 4:56 PM, emna hkiri wrote:

> Dear Friends,
>
> Thank you a lot for your earlier help; I hope you will help me again.
>
> I am trying to build an Arabic-English SMT system with Moses, but during
> training GIZA does not do the alignment. This is because the UN ar-en corpus
> is not well cleaned: the two sides are not parallel, as they do not have the
> same number of lines. I am working with the 2000 directories (2000ar and
> 2000en). Has anyone worked with the UN ar-en corpus?
>
> How can I make the ar and en files in 2000 have the same number of lines, so
> that the corpus passes the cleaning step?
>
> Thank you in advance; I hope you will answer my question.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
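As a rough illustration of the line-count problem discussed in this thread, the sketch below checks whether the two sides of a corpus have the same number of lines before they are handed to the cleaning step. It is only a diagnostic under stated assumptions: the function names count_lines and same_line_count are made up for this example, and the file names in the usage note are hypothetical. It does not repair a misaligned corpus; for that, the pre-aligned OPUS data mentioned above is probably the safer route.

```python
def count_lines(path):
    """Count lines in a file, splitting on b"\\n" only.

    The file is opened in binary mode so that stray carriage returns or
    invalid byte sequences do not change the count or raise an error.
    A final line without a trailing newline still counts as a line.
    """
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def same_line_count(src_path, tgt_path):
    """Return (is_equal, n_src, n_tgt) for a source/target file pair.

    Both paths are supplied by the caller; nothing here is specific to
    the UN corpus layout (that layout is not assumed).
    """
    n_src = count_lines(src_path)
    n_tgt = count_lines(tgt_path)
    return n_src == n_tgt, n_src, n_tgt
```

For example, same_line_count("corpus.ar", "corpus.en") (hypothetical file names) returns a tuple whose first element is False when the two files differ in length, which would point to files that need re-alignment before training.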
