Hi All,

I apologise in advance if this kind of question has no place on this list; it is more Europarl-related than Moses-related.

When I run the sentence-align script that ships with the source release of Europarl version 3, I get a lot of 'different number of paragraphs' messages. Does anybody know why differing paragraph counts are so common? 1-n sentence alignment is understandable, but I was unaware that 1-n paragraph matching was such a common thing.

Does anybody know of any attempts to automatically align paragraphs in the corpus? It seems a shame to filter out so much language just because the number of paragraphs doesn't match.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support