Hi Heidi

Good to hear you found the problem. Tokenisation does not change the number of lines, and neither does truecasing, so there must be a problem elsewhere in your pre-processing pipeline,

cheers - Barry

Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 19:47:29 +0200:

> Yes! Problem found. Thanks alot. There was one more line in one file than
> the other.
> The original tuning data had the exact same number of lines but maybe the
> lines changed after tokenizing.
>
> On Sun, Aug 18, 2013 at 7:34 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Heidi
>>
>> Can you run
>>
>> wc -l ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
>>
>> cheers - Barry
>>
>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 19:10:21 +0200:
>>
>>> cd ~/working
>>> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>>> ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en \
>>> ~/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/mosesdecoder/bin/ \
>>> &> mert.out &
>>>
>>> P.S I'm on the old system version if that would make a difference.
>>>
>>> On Sun, Aug 18, 2013 at 7:05 PM, Barry Haddow <[email protected]> wrote:
>>>
>>>> Hi Heidi
>>>>
>>>> Can you give the exact argument that you use to run tuning?
>>>>
>>>> cheers - Barry
>>>>
>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 18:55:59 +0200:
>>>>
>>>>> my training set have the same number of lines, same goes for my tuning set,
>>>>> but each set is not the same number of lines as the other. i dont see the
>>>>> problem because in the moses baseline tutorial, this is how it works too,
>>>>> am i wrong?
>>>>>
>>>>> On Sun, Aug 18, 2013 at 6:53 PM, Barry Haddow <[email protected]> wrote:
>>>>>
>>>>>> Hi Heidi
>>>>>>
>>>>>> You have to supply an input set and a reference set to mert-moses.pl for
>>>>>> tuning. This error suggests that they have different numbers of lines in
>>>>>> them - so they are not parallel,
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 18:45:31 +0200:
>>>>>>
>>>>>>> Inside of it, i get:
>>>>>>>
>>>>>>> Binary write mode is NOT selected
>>>>>>> Scorer type: BLEU
>>>>>>> name: case value: true
>>>>>>> Loading reference from /home/tjr/corpus/ar-en.tune.true.en
>>>>>>> ............................
>>>>>>> Data::m_score_type BLEU
>>>>>>> Data::Scorer type from Scorer: BLEU
>>>>>>> loading nbest from run1.best100.out.gz
>>>>>>> Exception: Sentence id (2844) not found in reference set
>>>>>>>
>>>>>>> I do not get the exception, which reference set is this referring to and
>>>>>>> what does it mean that it is not found?
>>>>>>>
>>>>>>> On Sun, Aug 18, 2013 at 6:39 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Heidi
>>>>>>>>
>>>>>>>> Inside the mert working directory, there should be a file called
>>>>>>>> extract.err. Look at the error message in this file.
>>>>>>>>
>>>>>>>> It could be that the input and reference you are using for tuning are
>>>>>>>> mismatched,
>>>>>>>>
>>>>>>>> cheers - Barry
>>>>>>>>
>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 16:12:33 +0200:
>>>>>>>>
>>>>>>>>> I have an arabic to english system that works fine after training but when
>>>>>>>>> I start tuning i end up with this in the mert.out file:
>>>>>>>>>
>>>>>>>>> ERROR: Failed to run '/home/tjr/working/mert-work/extractor.sh'. at
>>>>>>>>> /home/tjr/mmosesdecoder/scripts/training/mert-moses.pl line 1554.

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
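
The check discussed in this thread (the tuning input and reference must have the same number of lines) can also be scripted. What follows is only a minimal sketch in Python, not part of Moses: the function names count_lines and check_parallel are made up for illustration, and it does nothing more than compare line counts the way `wc -l` would.

```python
def count_lines(path):
    """Count newline characters in a file, which is what `wc -l` reports."""
    total = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large corpora do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


def check_parallel(input_path, reference_path):
    """Return (is_parallel, input_count, reference_count) for two files.

    Hypothetical helper: the files count as parallel here only in the
    sense that their line counts match.
    """
    input_count = count_lines(input_path)
    reference_count = count_lines(reference_path)
    return input_count == reference_count, input_count, reference_count
```

Running check_parallel on the files produced after each pre-processing step (for example the raw, tokenised and truecased tuning files, wherever they live on your system) should show at which step the two sides stop matching.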
