Hi Miguel

Two tools you can try are:

Hunalign: https://github.com/danielvarga/hunalign
Bleualign: https://github.com/rsennrich/Bleualign

I don’t know exactly how either tool copes with wildly different sentence 
lengths, though.
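
To illustrate what a many-to-one alignment looks like when one side uses much 
shorter sentences, here is a rough sketch of a purely length-based aligner. It 
is not how Hunalign or Bleualign work internally; the names align, bead_cost 
and BEADS are made up for this example, and the cost function is a simple 
assumption rather than a tuned model. It only shows the idea of grouping up to 
three short sentences against one long one.

```python
from typing import List, Tuple

# Allowed bead shapes: (source sentences, target sentences) grouped together.
# Hypothetical choice for this sketch: up to three sentences against one.
BEADS = [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1)]


def bead_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Relative mismatch between the expected and actual target length."""
    expected = src_len * ratio
    return abs(expected - tgt_len) / max(expected, tgt_len, 1)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Group sentences into beads by dynamic programming over character lengths."""
    n, m = len(src), len(tgt)
    total_src = sum(len(s) for s in src) or 1
    total_tgt = sum(len(t) for t in tgt) or 1
    ratio = total_tgt / total_src  # average target chars per source char

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    if cost[n][m] == inf:
        raise ValueError("no alignment possible with the allowed bead shapes")

    # Walk the back-pointers from the end to recover the beads in order.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    short_side = ["aaaa", "bbbb", "cccccccc"]
    long_side = ["AAAABBBB", "CCCCCCCC"]
    for src_group, tgt_group in align(short_side, long_side):
        print(" ".join(src_group), "|||", " ".join(tgt_group))
```

Each output line can then be treated as one training pair. Length alone is a 
weak signal, so for real data one of the tools above, which also use lexical 
evidence, is likely the better choice.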

Regards
Mathias

> On 20 Apr 2018, at 09:24, Miguel Domingo <mido...@prhlt.upv.es> wrote:
> 
> Good morning,
> 
> I have two documents which have the same text (in different languages) but 
> different structure (one language was written using very short sentences 
> while the other was written using longer sentences). Does anybody know of a 
> tool with which to align the sentences to obtain a parallel corpus suitable 
> for MT? (So far I've tried Gargantua, but it's deleting most of the text.)
> 
> Thanks in advance,
> 
> Miguel
> 
> 
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

