Hi Heidi

It's not a good idea to run Arabic through the truecaser - it is only for languages written in Latin script. I'm not even sure that Arabic has case.
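
Since the truecaser is only meant for Latin-script text, one option is to truecase just the English side and copy the Arabic side through unchanged. The sketch below is not a Moses tool. The function name `truecase_side`, the `<prefix>.tok.<lang>` / `<prefix>.true.<lang>` file naming and the `MOSES` variable are hypothetical. It also assumes an English truecasing model has already been trained with the Moses `train-truecaser.perl` script.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of Moses): truecase only the
# English side of a corpus and copy the Arabic side through unchanged.
# Assumes files are named <prefix>.tok.<lang> and that MOSES points at a
# Moses checkout containing scripts/recaser/truecase.perl.
set -euo pipefail

MOSES=${MOSES:-$HOME/mosesdecoder}

# truecase_side PREFIX LANG [MODEL]
#   writes PREFIX.true.LANG from PREFIX.tok.LANG
truecase_side() {
  local prefix=$1 lang=$2 model=${3:-}
  case "$lang" in
    en)
      # Latin-script side: apply the previously trained truecasing model
      "$MOSES/scripts/recaser/truecase.perl" --model "$model" \
        < "$prefix.tok.$lang" > "$prefix.true.$lang"
      ;;
    *)
      # Arabic script has no upper/lower case distinction, so copy as-is
      cp "$prefix.tok.$lang" "$prefix.true.$lang"
      ;;
  esac
}
```

For example, `truecase_side ~/corpus/ar-en.tune en ~/corpus/truecase-model.en` followed by `truecase_side ~/corpus/ar-en.tune ar`. The model path is a placeholder, and the model argument is ignored for the Arabic side.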

Also, I noticed that there are a lot of blank lines in your data. This could cause you problems, so it would be worth removing them, making sure that you don't get your data out of alignment.

For your mert error, you should again look at the log files - mert.out and mert.log.

cheers - Barry

Quoting Heidi Heweidy <[email protected]> on Mon, 19 Aug 2013 00:12:37 +0200:

> There are 4 files, two before truecasing and two after; the line counts are
> unequal after the truecasing.
> Please note that when I make the truecased files equal again (by manually
> cutting them), I get an error again in mert.out saying:
>
> Executing: /home/tjr/mosesdecoder/bin/mert -d 14 --scconfig case:true
> --ffile run1.features.dat --scfile run1.scores.dat --ifile run1.init.opt
> -n 20 > mert.out 2> mert.log
> Exit code: 134
> ERROR: Failed to run '/home/tjr/mosesdecoder/bin/mert -d 14 --scconfig
> case:true --ffile run1.features.dat --scfile run1.scores.dat --ifile
> run1.init.opt -n 20'. at /home/tjr/mosesdecoder/scripts/training/mert-moses.pl
> line 1554.
>
>
> On Sun, Aug 18, 2013 at 11:09 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Heidi
>>
>> If the truecaser changes the number of lines in the file then that's a
>> bug. Have you opened the files in a Windows editor? Could you send me the
>> before and after truecase files?
>>
>> cheers - Barry
>>
>>
>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>> 20:10:10 +0200:
>>
>>> Hmm... well, this does make sense.
>>> The problem is that nothing else could have changed the number of lines,
>>> because after tokenizing, the line counts were the same. The only time the
>>> files stopped matching was right after the truecasing step. I just cut the
>>> .true files to have the same number of lines and made sure they are
>>> properly aligned, and I just hope that tuning finishes successfully,
>>> because if not, I don't know what might have caused the problem. Fingers
>>> crossed.
>>> Anyway, thanks a lot once again.
>>>
>>>
>>> On Sun, Aug 18, 2013 at 7:50 PM, Barry Haddow <[email protected]> wrote:
>>>
>>>> Hi Heidi
>>>>
>>>> Good to hear you found the problem. Tokenisation does not change the
>>>> number of lines, and neither does truecasing, so there must be a problem
>>>> elsewhere in your pre-processing pipeline.
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>> 19:47:29 +0200:
>>>>
>>>>> Yes! Problem found. Thanks a lot. There was one more line in one file
>>>>> than in the other.
>>>>> The original tuning data had exactly the same number of lines, but maybe
>>>>> the line count changed after tokenizing.
>>>>>
>>>>>
>>>>> On Sun, Aug 18, 2013 at 7:34 PM, Barry Haddow <[email protected]> wrote:
>>>>>
>>>>>> Hi Heidi
>>>>>>
>>>>>> Can you run
>>>>>>
>>>>>>   wc -l ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
>>>>>>
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>>
>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>> 19:10:21 +0200:
>>>>>>
>>>>>>> cd ~/working
>>>>>>> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>>>>>>>   ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en \
>>>>>>>   ~/mosesdecoder/bin/moses train/model/moses.ini \
>>>>>>>   --mertdir ~/mosesdecoder/bin/ \
>>>>>>>   &> mert.out &
>>>>>>>
>>>>>>> P.S. I'm on the old system version, if that makes a difference.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 18, 2013 at 7:05 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Heidi
>>>>>>>>
>>>>>>>> Can you give the exact command that you use to run tuning?
>>>>>>>>
>>>>>>>> cheers - Barry
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>>>> 18:55:59 +0200:
>>>>>>>>
>>>>>>>>> The source and target files of my training set have the same number of
>>>>>>>>> lines, and so do the two files of my tuning set, but the training set
>>>>>>>>> and the tuning set have different numbers of lines. I don't see the
>>>>>>>>> problem, because that is how it works in the Moses baseline tutorial
>>>>>>>>> too. Am I wrong?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 18, 2013 at 6:53 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Heidi
>>>>>>>>>>
>>>>>>>>>> You have to supply an input set and a reference set to mert-moses.pl
>>>>>>>>>> for tuning. This error suggests that they have different numbers of
>>>>>>>>>> lines in them - so they are not parallel.
>>>>>>>>>>
>>>>>>>>>> cheers - Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug
>>>>>>>>>> 2013 18:45:31 +0200:
>>>>>>>>>>
>>>>>>>>>>> Inside it, I get:
>>>>>>>>>>>
>>>>>>>>>>> Binary write mode is NOT selected
>>>>>>>>>>> Scorer type: BLEU
>>>>>>>>>>> name: case value: true
>>>>>>>>>>> Loading reference from /home/tjr/corpus/ar-en.tune.true.en
>>>>>>>>>>> ............................Data::m_score_type BLEU
>>>>>>>>>>> Data::Scorer type from Scorer: BLEU
>>>>>>>>>>> loading nbest from run1.best100.out.gz
>>>>>>>>>>> Exception: Sentence id (2844) not found in reference set
>>>>>>>>>>>
>>>>>>>>>>> I don't understand the exception. Which reference set is this
>>>>>>>>>>> referring to, and what does it mean that it is not found?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 18, 2013 at 6:39 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Heidi
>>>>>>>>>>>>
>>>>>>>>>>>> Inside the mert working directory, there should be a file called
>>>>>>>>>>>> extract.err. Look at the error message in this file.
>>>>>>>>>>>>
>>>>>>>>>>>> It could be that the input and reference you are using for tuning
>>>>>>>>>>>> are mismatched.
>>>>>>>>>>>>
>>>>>>>>>>>> cheers - Barry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug
>>>>>>>>>>>> 2013 16:12:33 +0200:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have an Arabic-to-English system that works fine after training,
>>>>>>>>>>>>> but when I start tuning I end up with this in the mert.out file:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ERROR: Failed to run '/home/tjr/working/mert-work/extractor.sh'.
>>>>>>>>>>>>> at /home/tjr/mosesdecoder/scripts/training/mert-moses.pl line 1554.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>> Scotland, with registration number SC005336.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
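
Two checks suggested in this thread, comparing line counts with `wc -l` and removing blank lines without breaking the alignment, can be combined into a small helper. This is a minimal sketch, not a Moses tool. The function names `count_cr_lines` and `remove_blank_pairs` and the output file names are hypothetical. Dropping a pair whenever either side is empty or whitespace-only is one reasonable policy among several. The carriage-return count is included only because the thread raises Windows editors as a possible culprit; it reports, it does not fix.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not part of Moses) for sanity-checking
# a parallel tuning set such as ar-en.tune.true.fr / ar-en.tune.true.en.
set -euo pipefail

# count_cr_lines FILE
#   prints how many lines contain a carriage return (a sign of Windows line endings)
count_cr_lines() {
  # grep -c prints 0 but exits 1 when nothing matches; don't treat that as an error
  grep -c $'\r' "$1" || true
}

# remove_blank_pairs SRC TGT OUT_SRC OUT_TGT
#   refuses to run if SRC and TGT have different line counts; otherwise writes
#   only those line pairs where both sides contain a non-space, non-tab character
remove_blank_pairs() {
  local src=$1 tgt=$2 out_src=$3 out_tgt=$4
  local n_src n_tgt
  # arithmetic expansion strips the padding some wc implementations add
  n_src=$(( $(wc -l < "$src") ))
  n_tgt=$(( $(wc -l < "$tgt") ))
  if [ "$n_src" -ne "$n_tgt" ]; then
    echo "line counts differ: $src has $n_src, $tgt has $n_tgt" >&2
    return 1
  fi
  # create (or empty) the outputs so they exist even if every pair is dropped
  : > "$out_src"
  : > "$out_tgt"
  # read both files in lockstep so the kept pairs stay aligned
  awk -v tgt="$tgt" -v out_src="$out_src" -v out_tgt="$out_tgt" '
    {
      getline t < tgt
      if ($0 ~ /[^ \t]/ && t ~ /[^ \t]/) {
        print $0 > out_src
        print t > out_tgt
      }
    }' "$src"
}
```

For example, `remove_blank_pairs ar-en.tune.true.fr ar-en.tune.true.en tune.clean.fr tune.clean.en`. Note that `wc -l` counts newline characters, so a file whose last line has no trailing newline looks one line shorter than it is. For training data, Moses also ships `scripts/training/clean-corpus-n.perl`, which drops empty and over-long sentence pairs.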

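The exception `Sentence id (2844) not found in reference set` fits the diagnosis in this thread: the n-best list contains a hypothesis for an input sentence that has no matching reference line. One way to confirm this directly is to compare the largest sentence id in the n-best file with the number of reference lines. This sketch assumes the usual Moses n-best layout, where each line starts with a 0-based sentence id followed by ` ||| `. The function name `check_nbest_ids` is hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): check that every sentence id in a
# gzipped Moses n-best list has a matching line in the reference file.
# Assumes each n-best line starts with a 0-based sentence id, e.g.
#   0 ||| hypothesis text ||| feature scores ||| total score
set -euo pipefail

# check_nbest_ids NBEST_GZ REFERENCE
check_nbest_ids() {
  local nbest=$1 ref=$2
  local max_id n_ref
  # the id is the first whitespace-separated field on each line
  max_id=$(gzip -dc "$nbest" |
    awk 'BEGIN { m = -1 } { if ($1 + 0 > m) m = $1 + 0 } END { print m }')
  n_ref=$(( $(wc -l < "$ref") ))
  # with 0-based ids, ids 0..n_ref-1 are the only ones with a reference line
  if [ "$max_id" -ge "$n_ref" ]; then
    echo "n-best has sentence id $max_id but $ref has only $n_ref lines" >&2
    return 1
  fi
  echo "ok: max sentence id $max_id, $n_ref reference lines"
}
```

Run it from the mert working directory, e.g. `check_nbest_ids run1.best100.out.gz ~/corpus/ar-en.tune.true.en`.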