Hi Heidi

If the truecaser changes the number of lines in the file then that's a
bug. Have you opened the files in a Windows editor? Could you send me
the files from before and after truecasing?
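
As a rough way to check this, here is a minimal Python sketch that counts newline and carriage-return bytes in each file. The newline count is roughly what wc -l reports, and a non-zero carriage-return count hints at Windows line endings. The function names line_stats and compare_parallel are made up for illustration and are not part of Moses.

```python
from pathlib import Path


def line_stats(path):
    """Return (newline_count, carriage_return_count, ends_with_newline)."""
    data = Path(path).read_bytes()
    newlines = data.count(b"\n")          # what `wc -l` counts
    carriage_returns = data.count(b"\r")  # non-zero suggests Windows/Mac line endings
    ends_with_newline = data.endswith(b"\n") or not data
    return newlines, carriage_returns, ends_with_newline


def compare_parallel(src_path, ref_path):
    """Compare the line statistics of a source file and a reference file."""
    src = line_stats(src_path)
    ref = line_stats(ref_path)
    return {
        "source": src,
        "reference": ref,
        "same_line_count": src[0] == ref[0],
    }
```

Running compare_parallel on the tokenised pair and then on the .true pair should show at which step the counts start to differ, assuming the problem is visible at the byte level.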

cheers - Barry

Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013  
20:10:10 +0200:

> Hmm... well, this does make sense.
> The problem is that nothing else could have changed the number of lines,
> because after tokenizing the line counts were the same. The only time the
> files stopped matching was right after the truecasing step. I just cut the
> .true files to have the same number of lines and made sure they are
> properly aligned, and I just hope that tuning finishes successfully, because
> if not, I don't know what might have caused the problem. Fingers crossed.
> Anyway, thanks a lot once again.
>
>
> On Sun, Aug 18, 2013 at 7:50 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Heidi
>>
>> Good to hear you found the problem. Tokenisation does not change the
>> number of lines, and neither does truecasing, so there must be a problem
>> elsewhere in your pre-processing pipeline,
>>
>> cheers - Barry
>>
>>
>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>> 19:47:29 +0200:
>>
>>> Yes! Problem found. Thanks a lot. There was one more line in one file than
>>> the other.
>>> The original tuning data had the exact same number of lines, but maybe the
>>> lines changed after tokenizing.
>>>
>>>
>>>
>>> On Sun, Aug 18, 2013 at 7:34 PM, Barry Haddow <[email protected]> wrote:
>>>
>>>> Hi Heidi
>>>>
>>>> Can you run
>>>>
>>>> wc -l ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en
>>>>
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>> 19:10:21 +0200:
>>>>
>>>>> cd ~/working
>>>>> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>>>>>   ~/corpus/ar-en.tune.true.fr ~/corpus/ar-en.tune.true.en \
>>>>>   ~/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/mosesdecoder/bin/ \
>>>>>   &> mert.out &
>>>>>
>>>>> P.S. I'm on the old system version, if that makes a difference.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 18, 2013 at 7:05 PM, Barry Haddow <[email protected]> wrote:
>>>>>
>>>>>> Hi Heidi
>>>>>>
>>>>>> Can you give the exact argument that you use to run tuning?
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>>
>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>> 18:55:59 +0200:
>>>>>>
>>>>>>> My training set files have the same number of lines, and the same goes
>>>>>>> for my tuning set, but each set is not the same number of lines as the
>>>>>>> other. I don't see the problem, because in the Moses baseline tutorial
>>>>>>> this is how it works too. Am I wrong?
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 18, 2013 at 6:53 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Heidi
>>>>>>>>
>>>>>>>> You have to supply an input set and a reference set to mert-moses.pl
>>>>>>>> for tuning. This error suggests that they have different numbers of
>>>>>>>> lines in them - so they are not parallel,
>>>>>>>>
>>>>>>>> cheers - Barry
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>>>> 18:45:31 +0200:
>>>>>>>>
>>>>>>>>> Inside of it, I get:
>>>>>>>>>
>>>>>>>>> Binary write mode is NOT selected
>>>>>>>>> Scorer type: BLEU
>>>>>>>>> name: case value: true
>>>>>>>>> Loading reference from /home/tjr/corpus/ar-en.tune.true.en
>>>>>>>>> ............................Data::m_score_type BLEU
>>>>>>>>> Data::Scorer type from Scorer: BLEU
>>>>>>>>> loading nbest from run1.best100.out.gz
>>>>>>>>> Exception: Sentence id (2844) not found in reference set
>>>>>>>>>
>>>>>>>>> I do not understand the exception. Which reference set is this
>>>>>>>>> referring to, and what does it mean that it is not found?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 18, 2013 at 6:39 PM, Barry Haddow <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Heidi
>>>>>>>>>>
>>>>>>>>>> Inside the mert working directory, there should be a file called
>>>>>>>>>> extract.err. Look at the error message in this file.
>>>>>>>>>>
>>>>>>>>>> It could be that the input and reference you are using for tuning
>>>>>>>>>> are mismatched,
>>>>>>>>>>
>>>>>>>>>> cheers - Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Quoting Heidi Heweidy <[email protected]> on Sun, 18 Aug 2013
>>>>>>>>>> 16:12:33 +0200:
>>>>>>>>>>
>>>>>>>>>>> I have an Arabic to English system that works fine after training,
>>>>>>>>>>> but when I start tuning I end up with this in the mert.out file:
>>>>>>>>>>>
>>>>>>>>>>> ERROR: Failed to run '/home/tjr/working/mert-work/extractor.sh'. at
>>>>>>>>>>> /home/tjr/mmosesdecoder/scripts/training/mert-moses.pl line 1554.
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>> Scotland, with registration number SC005336.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
