If your parallel corpus is not sentence-aligned, you may want to use a
sentence aligner tool, which can extract parallel sentences with some
confidence.
For example, the Microsoft Bilingual Sentence Aligner:
http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
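As a rough illustration of what such a tool does (this is not the
Microsoft aligner's method), here is a simplified length-based aligner in
the spirit of the Gale and Church algorithm, in Python. It assumes each
input file holds one sentence per line for a single document pair. The
bead penalties, VARIANCE and the ratio default are hypothetical values,
and align/align_files are illustrative names, not part of Moses.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical penalties per bead type (source count, target count).
# Illustrative values only, not tuned for Arabic-English.
BEAD_PENALTY = {
    (1, 1): 0.0,  # one sentence on each side: the common case
    (2, 1): 2.3,  # two source sentences merged into one target sentence
    (1, 2): 2.3,  # one source sentence split into two target sentences
    (1, 0): 4.5,  # source sentence with no translation
    (0, 1): 4.5,  # target sentence with no source
}
VARIANCE = 6.8  # assumed variance of the length difference per character


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Penalise pairs whose character lengths disagree with `ratio`."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # 1-0 / 0-1 beads pay only their fixed penalty
    expected = src_len * ratio
    delta = (tgt_len - expected) / math.sqrt(expected * VARIANCE)
    return delta * delta / 2


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[str, str]]:
    """Align two sentence lists by dynamic programming over lengths.

    Returns (source, target) pairs; merged sentences are joined with a
    space, and 1-0 / 0-1 beads are dropped since they have no partner.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = len(" ".join(src[i:ni]))
                t_len = len(" ".join(tgt[j:nj]))
                c = cost[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both documents.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        s, t = " ".join(src[pi:i]), " ".join(tgt[pj:j])
        if s and t:
            pairs.append((s, t))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def align_files(src_path: str, tgt_path: str, out_src: str, out_tgt: str,
                ratio: float = 1.0) -> None:
    """Write aligned pairs to two files with identical line counts."""
    with open(src_path, encoding="utf-8") as f:
        src = [line.strip() for line in f if line.strip()]
    with open(tgt_path, encoding="utf-8") as f:
        tgt = [line.strip() for line in f if line.strip()]
    with open(out_src, "w", encoding="utf-8") as fs, \
            open(out_tgt, "w", encoding="utf-8") as ft:
        for s, t in align(src, tgt, ratio):
            fs.write(s + "\n")
            ft.write(t + "\n")
```

For example, align_files("doc.ar", "doc.en", "doc.aligned.ar",
"doc.aligned.en") (hypothetical file names) writes two files with the same
number of lines, which is what the cleaning step expects. The ratio should
be tuned to the language pair, e.g. the total character count of the
target side divided by that of the source side. Because the cost table is
O(n*m), run it per document pair rather than on a whole concatenated
directory.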


On Mon, Dec 1, 2014 at 4:56 PM, emna hkiri <[email protected]> wrote:

>
> Dear friends, thank you a lot for your earlier help; I hope you will
> help me again.
> I am trying to build an Arabic-English SMT system with Moses, but during
> training GIZA++ does not produce the alignment. This is because the UN
> ar-en corpus is not well cleaned: the files are not parallel, since they
> do not have the same number of lines. I'm working with the 2000
> directory (2000ar and 2000en). Has anyone worked with the UN ar-en
> corpus? How can I make the ar and en files in 2000 have the same number
> of lines so that they pass the cleaning step?
>
> Thank you in advance; I hope you will answer my question.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
-Regards,
 Rajen Chatterjee.
