Hi Heidi

If the truecaser changes the number of lines in the file then that's a bug. Have you opened the files in a Windows editor? Could you send me the before and after truecase files?
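
In case it helps to narrow this down, here is a rough sketch of a check you could run on the files before and after truecasing. It is only an illustration: check_lines is a hypothetical helper, not part of Moses, and it assumes that stray carriage returns (as a Windows editor would add) are the thing to look for, which may not be the cause here.

```shell
# Hypothetical helper, not a Moses script: for each file, print the number
# of lines and the number of lines that contain a carriage return.
check_lines() {
    for f in "$@"; do
        # Line count as reported by wc, with any padding removed.
        lines=$(wc -l < "$f" | tr -d ' ')
        # grep -c exits non-zero when the count is 0, hence the "|| true".
        crs=$(grep -c "$(printf '\r')" "$f" || true)
        echo "$f lines=$lines cr_lines=$crs"
    done
}

# Example call (the second file name is the one from this thread; the first
# is a placeholder for whatever your tokenised file is called):
#   check_lines path/to/tokenised.en ~/corpus/ar-en.tune.true.en
```

If the line counts differ between the tokenised and truecased versions of the same file, or cr_lines is non-zero, that would at least show at which step the files diverge.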

cheers - Barry

Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 20:10:10 +0200:

> hmmm... well this does make sense..
> the problem is there is nothing else that might have changed the number of
> lines because after tokenizing, the lines were the same.. the only time the
> files were not the same anymore is right after the truecasing step.. i just
> cut the .true files to have the same number of lines and made sure they are
> properly aligned and i just hope that tuning finishes successfully coz if
> not, i dont know what might have caused the problem. fingers crossed.
> anyway, thanks alot once again
>
> On Sun, Aug 18, 2013 at 7:50 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Heidi
>>
>> Good to hear you found the problem. Tokenisation does not change the
>> number of lines, and neither does truecasing, so there must be a problem
>> elsewhere in your pre-processing pipeline,
>>
>> cheers - Barry
>>
>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 19:47:29 +0200:
>>
>>> Yes! Problem found. Thanks alot. There was one more line in one file than
>>> the other.
>>> The original tuning data had the exact same number of lines but maybe the
>>> lines changed after tokenizing.
>>>
>>> On Sun, Aug 18, 2013 at 7:34 PM, Barry Haddow <[email protected]> wrote:
>>>
>>>> Hi Heidi
>>>>
>>>> Can you run
>>>>
>>>> wc -l ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
>>>>
>>>> cheers - Barry
>>>>
>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 19:10:21 +0200:
>>>>
>>>>> cd ~/working
>>>>> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>>>>> ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en \
>>>>> ~/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/mosesdecoder/bin/ \
>>>>> &> mert.out &
>>>>>
>>>>> P.S I'm on the old system version if that would make a difference.
>>>>>
>>>>> On Sun, Aug 18, 2013 at 7:05 PM, Barry Haddow <[email protected]> wrote:
>>>>>
>>>>>> Hi Heidi
>>>>>>
>>>>>> Can you give the exact argument that you use to run tuning?
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 18:55:59 +0200:
>>>>>>
>>>>>>> my training set have the same number of lines, same goes for my tuning set,
>>>>>>> but each set is not the same number of lines as the other. i dont see the
>>>>>>> problem because in the moses baseline tutorial, this is how it works too,
>>>>>>> am i wrong?
>>>>>>>
>>>>>>> On Sun, Aug 18, 2013 at 6:53 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Heidi
>>>>>>>>
>>>>>>>> You have to supply an input set and a reference set to mert-moses.pl for
>>>>>>>> tuning. This error suggests that they have different numbers of lines in
>>>>>>>> them - so they are not parallel,
>>>>>>>>
>>>>>>>> cheers - Barry
>>>>>>>>
>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 18:45:31 +0200:
>>>>>>>>
>>>>>>>>> Inside of it, i get:
>>>>>>>>>
>>>>>>>>> Binary write mode is NOT selected
>>>>>>>>> Scorer type: BLEU
>>>>>>>>> name: case value: true
>>>>>>>>> Loading reference from /home/tjr/corpus/ar-en.tune.true.en
>>>>>>>>> ............................Data::m_score_type BLEU
>>>>>>>>> Data::Scorer type from Scorer: BLEU
>>>>>>>>> loading nbest from run1.best100.out.gz
>>>>>>>>> Exception: Sentence id (2844) not found in reference set
>>>>>>>>>
>>>>>>>>> I do not get the exception, which reference set is this referring to and
>>>>>>>>> what does it mean that it is not found?
>>>>>>>>>
>>>>>>>>> On Sun, Aug 18, 2013 at 6:39 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Heidi
>>>>>>>>>>
>>>>>>>>>> Inside the mert working directory, there should be a file called
>>>>>>>>>> extract.err. Look at the error message in this file.
>>>>>>>>>>
>>>>>>>>>> It could be that the input and reference you are using for tuning are
>>>>>>>>>> mismatched,
>>>>>>>>>>
>>>>>>>>>> cheers - Barry
>>>>>>>>>>
>>>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013 16:12:33 +0200:
>>>>>>>>>>
>>>>>>>>>>> I have an arabic to english system that works fine after training but when
>>>>>>>>>>> I start tuning i end up with this in the mert.out file:
>>>>>>>>>>>
>>>>>>>>>>> ERROR: Failed to run '/home/tjr/working/mert-work/extractor.sh'. at
>>>>>>>>>>> /home/tjr/mmosesdecoder/scripts/training/mert-moses.pl line 1554.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>> Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
