Re: [Moses-support] NCv12 number of lines mismatch
nano also gives the "right" number, 270769, but I have a script that finds a difference.

On 14/09/2017 at 08:48, Vincent Nguyen wrote:
> okay really weird.
> wc gives me the same numbers as you, but gedit gives two other numbers,
> a different one for each file. Must be special characters somewhere.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] NCv12 number of lines mismatch
okay really weird.
wc gives me the same numbers as you, but gedit gives two other numbers, a different one for each file. Must be special characters somewhere.

On 13/09/2017 at 18:52, Barry Haddow wrote:
> Hi Vincent
>
> Looks fine to me:
>
>> wc -l news-commentary-v12.de-en.*
>> 270769 news-commentary-v12.de-en.de
>> 270769 news-commentary-v12.de-en.en
>> 541538 total
>
> What are you running that shows you different line numbers?
>
> cheers - Barry
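One possible explanation for the mismatch, not confirmed anywhere in this thread, is that `wc -l` counts only newline (`\n`) characters, while some editors and scripts also break lines on other characters such as U+2028 (LINE SEPARATOR) or U+0085 (NEXT LINE). The sketch below is a hedged illustration in Python: the helper names and the sample text are made up for the example, and nothing here is known to match what gedit or the script mentioned above actually does.

```python
# Characters that str.splitlines() treats as line boundaries besides "\n".
EXTRA_BREAKS = ["\r", "\x0b", "\x0c", "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]


def count_like_wc(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


def count_like_splitlines(data: bytes) -> int:
    """Count lines after decoding, splitting on every boundary splitlines() knows."""
    return len(data.decode("utf-8").splitlines())


def lines_with_extra_breaks(data: bytes):
    """Yield (1-based line number, characters found) for each "\n"-delimited
    line that contains another line-break character."""
    for number, line in enumerate(data.decode("utf-8").split("\n"), start=1):
        found = [c for c in EXTRA_BREAKS if c in line]
        if found:
            yield number, found


# Made-up sample: two "\n"-terminated lines, the first with an embedded U+2028.
sample = "first\u2028half\nsecond\n".encode("utf-8")
print(count_like_wc(sample))                  # 2
print(count_like_splitlines(sample))          # 3
print(list(lines_with_extra_breaks(sample)))  # [(1, ['\u2028'])]
```

To try this on the corpus, one could read each file with `open(path, "rb").read()` and compare the two counts. Note that a file with Windows-style `\r\n` endings would be flagged on every line by `lines_with_extra_breaks`, so `"\r"` may need to be dropped from `EXTRA_BREAKS` in that case.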
Re: [Moses-support] NCv12 number of lines mismatch
Hi Vincent

Looks fine to me:

> wc -l news-commentary-v12.de-en.*
> 270769 news-commentary-v12.de-en.de
> 270769 news-commentary-v12.de-en.en
> 541538 total

What are you running that shows you different line numbers?

cheers - Barry

On 12/09/17 10:06, Vincent Nguyen wrote:
> Hi,
> Is there an updated version of NCv12 for this
> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
>
> the number of lines for de-en is not the same in the 2 languages.
>
> Cheers,
> Vincent