Hello

just to get back to this issue since I bumped into it again:

$ tr -d -c '\n' < news-commentary-v12.de-en.de | wc -c
270769
$ tr -d -c '\r' < news-commentary-v12.de-en.de | wc -c
3920
$ tr -d -c '\n' < news-commentary-v12.de-en.en | wc -c
270769
$ tr -d -c '\r' < news-commentary-v12.de-en.en | wc -c
4099

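So both v12 files contain stray carriage returns, and not the same number
of them. Presumably that is why the line counts differ when the files are
read with tools / primitives that treat '\r' as a line break, while tools
that split on '\n' only see 270769 lines in both.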
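
In case it helps anyone check their own copy, here is a minimal Python
sketch that counts line breaks under several conventions. It is only an
illustration: the function name count_line_breaks is made up for this mail,
and it assumes the files are UTF-8 encoded.

```python
import sys


def count_line_breaks(data: bytes) -> dict:
    """Count line breaks in `data` under several conventions."""
    lf = data.count(b"\n")        # what `wc -l` reports
    cr = data.count(b"\r")        # what `tr -d -c '\r' | wc -c` reports
    crlf = data.count(b"\r\n")    # Windows-style pairs
    bare_cr = cr - crlf           # '\r' not followed by '\n'
    text = data.decode("utf-8")   # assumption: the corpus is UTF-8
    return {
        "lf": lf,
        "cr": cr,
        "bare_cr": bare_cr,
        # Universal newlines: '\n', '\r\n' and a bare '\r' each end a line.
        "universal": lf + bare_cr,
        # str.splitlines() additionally breaks on characters such as
        # U+2028, U+2029, U+0085, form feed and vertical tab.
        "splitlines": len(text.splitlines()),
    }


if __name__ == "__main__":
    # Usage: python count_breaks.py file1 file2 ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, count_line_breaks(handle.read()))
```

If "lf" agrees between the two files but "universal" or "splitlines" does
not, the mismatch comes from how the reader splits lines, not from missing
sentences.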

Just to let you know.



On 14/09/2017 at 08:48, Vincent Nguyen wrote:
> Okay, really weird.
> wc gives me the same numbers as you, but gedit gives two other, different
> numbers, one for each file. Must be special characters somewhere.
>
>
> On 13/09/2017 at 18:52, Barry Haddow wrote:
>> Hi Vincent
>>
>> Looks fine to me:
>>
>>> wc -l news-commentary-v12.de-en.*
>>>    270769 news-commentary-v12.de-en.de
>>>    270769 news-commentary-v12.de-en.en
>>>    541538 total
>> What are you running that shows you different line numbers?
>>
>> cheers - Barry
>>
>> On 12/09/17 10:06, Vincent Nguyen wrote:
>>> Hi,
>>> Is there an updated version of NCv12 for this
>>> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
>>>
>>>
>>> the number of lines for de-en is not the same in the 2 languages.
>>>
>>> Cheers,
>>> Vincent
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
