Hi Heidi

Good to hear you found the problem. Tokenisation does not change the  
number of lines, and neither does truecasing, so there must be a  
problem elsewhere in your pre-processing pipeline,

cheers - Barry
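
One way to find the step that changes the line count is to compare the two sides of the corpus after every pre-processing stage. The following is only a minimal sketch: the function name check_parallel is made up for illustration and is not part of Moses, and the intermediate file names in the usage comment are assumptions about how the pipeline output might be named.

```shell
#!/bin/sh
# Sketch: report whether two parallel files have the same number of lines.
# check_parallel is a hypothetical helper, not a Moses script.
#
# Possible usage, assuming one file pair is kept per stage (names are guesses):
#   check_parallel ~/corpus/ar-en.tune.fr      ~/corpus/ar-en.tune.en
#   check_parallel ~/corpus/ar-en.tune.tok.fr  ~/corpus/ar-en.tune.tok.en
#   check_parallel ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
check_parallel() {
    # Read from stdin so that wc prints only the number, not the file name.
    src_lines=$(wc -l < "$1") || return 2
    ref_lines=$(wc -l < "$2") || return 2

    # Some wc implementations pad the number with spaces; normalise it.
    src_lines=$((src_lines))
    ref_lines=$((ref_lines))

    if [ "$src_lines" -eq "$ref_lines" ]; then
        echo "OK: $src_lines lines in both $1 and $2"
    else
        echo "MISMATCH: $1 has $src_lines lines, $2 has $ref_lines lines"
        return 1
    fi
}
```

The first stage that reports MISMATCH is the one to look at. Note that wc -l counts newline characters, so a last line without a trailing newline is not counted.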

Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013  
19:47:29 +0200:

> Yes! Problem found. Thanks a lot. There was one more line in one file than
> the other.
> The original tuning data had the exact same number of lines but maybe the
> lines changed after tokenizing.
>
>
>
> On Sun, Aug 18, 2013 at 7:34 PM, Barry Haddow  
> <[email protected]> wrote:
>
>> Hi Heidi
>>
>> Can you run
>>
>> wc -l ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
>>
>>
>> cheers - Barry
>>
>>
>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>> 19:10:21 +0200:
>>
>>> cd ~/working
>>> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>>>   ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en \
>>>   ~/mosesdecoder/bin/moses train/model/moses.ini \
>>>   --mertdir ~/mosesdecoder/bin/ \
>>>   &> mert.out &
>>>
>>> P.S I'm on the old system version if that would make a difference.
>>>
>>>
>>>
>>> On Sun, Aug 18, 2013 at 7:05 PM, Barry Haddow <[email protected]>
>>> wrote:
>>>
>>>> Hi Heidi
>>>>
>>>> Can you give the exact argument that you use to run tuning?
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>> 18:55:59 +0200:
>>>>
>>>>> My training set files have the same number of lines, and the same goes for
>>>>> my tuning set, but the two sets do not have the same number of lines as
>>>>> each other. I don't see the problem, because this is how it works in the
>>>>> Moses baseline tutorial too. Am I wrong?
>>>>>
>>>>>
>>>>> On Sun, Aug 18, 2013 at 6:53 PM, Barry Haddow <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Heidi
>>>>>>
>>>>>> You have to supply an input set and a reference set to mert-moses.pl for
>>>>>> tuning. This error suggests that they have different numbers of lines in
>>>>>> them - so they are not parallel,
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>>
>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>> 18:45:31 +0200:
>>>>>>
>>>>>>> Inside of it, I get:
>>>>>>>
>>>>>>> Binary write mode is NOT selected
>>>>>>> Scorer type: BLEU
>>>>>>> name: case value: true
>>>>>>> Loading reference from /home/tjr/corpus/ar-en.tune.true.en
>>>>>>> ............................Data::m_score_type BLEU
>>>>>>>
>>>>>>>
>>>>>>> Data::Scorer type from Scorer: BLEU
>>>>>>> loading nbest from run1.best100.out.gz
>>>>>>> Exception: Sentence id (2844) not found in reference set
>>>>>>>
>>>>>>> I do not understand the exception. Which reference set is this referring
>>>>>>> to, and what does it mean that it is not found?
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 18, 2013 at 6:39 PM, Barry Haddow <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Heidi
>>>>>>>>
>>>>>>>> Inside the mert working directory, there should be a file called
>>>>>>>> extract.err. Look at the error message in this file.
>>>>>>>>
>>>>>>>> It could be that the input and reference you are using for tuning are
>>>>>>>> mismatched,
>>>>>>>>
>>>>>>>> cheers - Barry
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>>>> 16:12:33 +0200:
>>>>>>>>
>>>>>>>>> I have an Arabic to English system that works fine after training, but
>>>>>>>>> when I start tuning I end up with this in the mert.out file:
>>>>>>>>>
>>>>>>>>> ERROR: Failed to run '/home/tjr/working/mert-work/extractor.sh'. at
>>>>>>>>> /home/tjr/mmosesdecoder/scripts/training/mert-moses.pl line 1554.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
