Re: [Moses-support] NCv12 number of lines mismatch

2017-09-14 Thread Vincent Nguyen
nano also gives the "right" number, 270769, but I have a script that
finds a difference.


On 14/09/2017 at 08:48, Vincent Nguyen wrote:
> OK, really weird.
> wc gives me the same numbers as you, but gedit gives two other numbers,
> a different one for each file. There must be special characters somewhere.
>
>
> On 13/09/2017 at 18:52, Barry Haddow wrote:
>> Hi Vincent
>>
>> Looks fine to me:
>>
>>> wc -l news-commentary-v12.de-en.*
>>>    270769 news-commentary-v12.de-en.de
>>>    270769 news-commentary-v12.de-en.en
>>>    541538 total
>> What are you running that shows you different line numbers?
>>
>> cheers - Barry
>>
>> On 12/09/17 10:06, Vincent Nguyen wrote:
>>> Hi,
>>> Is there an updated version of NCv12 for this
>>> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
>>>
>>>
>>> the number of lines for de-en is not the same in the 2 languages.
>>>
>>> Cheers,
>>> Vincent
>>> ___
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>



Re: [Moses-support] NCv12 number of lines mismatch

2017-09-14 Thread Vincent Nguyen
OK, really weird.
wc gives me the same numbers as you, but gedit gives two other numbers,
a different one for each file. There must be special characters somewhere.
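
One way to test the "special characters" guess is to compare what wc -l counts (only "\n") with a count that also breaks on other separators. The sketch below is a minimal, hypothetical check, not something taken from this thread: it relies on Python's str.splitlines(), which also splits on CR, VT, FF, FS, GS, RS, NEL (U+0085), U+2028 and U+2029. Whether gedit breaks on exactly the same set is an assumption. The helper names (wc_lines, find_extra_breaks, report) are made up for illustration.

```python
# Characters other than "\n" that Python's str.splitlines() treats as line
# boundaries. Assumption: an editor such as gedit may break on some of them.
EXTRA_BREAKS = {
    "\r": "CR", "\x0b": "VT", "\x0c": "FF",
    "\x1c": "FS", "\x1d": "GS", "\x1e": "RS",
    "\x85": "NEL", "\u2028": "LS", "\u2029": "PS",
}


def wc_lines(text):
    """Count "\n" characters, which is the number `wc -l` reports."""
    return text.count("\n")


def find_extra_breaks(text):
    """Return (line_number, name) for every break character other than "\n".

    Line numbers are "\n"-based, so they match wc/nano numbering.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if ch in EXTRA_BREAKS:
                hits.append((lineno, EXTRA_BREAKS[ch]))
    return hits


def report(path):
    """Print both line counts for one file and the first few suspect lines."""
    # newline="" keeps "\r" intact instead of translating it to "\n".
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    hits = find_extra_breaks(text)
    print(f"{path}: newline count={wc_lines(text)} "
          f"splitlines count={len(text.splitlines())} extra breaks={len(hits)}")
    for lineno, name in hits[:10]:
        print(f"  line {lineno}: {name}")
    return hits
```

Calling report("news-commentary-v12.de-en.de") and report("news-commentary-v12.de-en.en") would show whether the two files carry a different number of such characters, which would explain tools disagreeing with wc.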


On 13/09/2017 at 18:52, Barry Haddow wrote:
> Hi Vincent
>
> Looks fine to me:
>
>> wc -l news-commentary-v12.de-en.*
>>   270769 news-commentary-v12.de-en.de
>>   270769 news-commentary-v12.de-en.en
>>   541538 total
>
> What are you running that shows you different line numbers?
>
> cheers - Barry
>
> On 12/09/17 10:06, Vincent Nguyen wrote:
>> Hi,
>> Is there an updated version of NCv12 for this
>> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz 
>>
>>
>> the number of lines for de-en is not the same in the 2 languages.
>>
>> Cheers,
>> Vincent
>>
>
>



Re: [Moses-support] NCv12 number of lines mismatch

2017-09-13 Thread Barry Haddow
Hi Vincent

Looks fine to me:

> wc -l news-commentary-v12.de-en.*
>   270769 news-commentary-v12.de-en.de
>   270769 news-commentary-v12.de-en.en
>   541538 total

What are you running that shows you different line numbers?

cheers - Barry

On 12/09/17 10:06, Vincent Nguyen wrote:
> Hi,
> Is there an updated version of NCv12 for this
> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
>
> the number of lines for de-en is not the same in the 2 languages.
>
> Cheers,
> Vincent
>


