Hi Barry,

Attached is the zgrep result.
I found that a few bytes in the middle of line 61 are corrupted. Is that
a Moses problem, or a problem with my memory?

I also checked the other files with iconv; they are all valid UTF-8.
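
For anyone who wants to locate such corrupted bytes without reading the
output by eye, here is a minimal Python sketch. It assumes the corruption
shows up as invalid UTF-8. The names find_invalid_utf8 and scan_gzip are
made up for illustration and are not part of Moses. It reports the line
number and byte offset of the first invalid sequence on each bad line; it
cannot tell whether the bytes were damaged by Moses or by the hardware.

```python
import gzip


def find_invalid_utf8(lines):
    """Yield (line_number, byte_offset, bad_bytes) for lines that are not valid UTF-8.

    `lines` is any iterable of bytes objects, one per line.
    Only the first invalid sequence on each line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start and err.end delimit the bytes the decoder rejected.
            yield lineno, err.start, raw[err.start:err.end]


def scan_gzip(path):
    """Return the invalid-UTF-8 report for a gzip-compressed text file."""
    with gzip.open(path, "rb") as handle:
        # Iterating a binary gzip handle yields one bytes object per line.
        return list(find_invalid_utf8(handle))
```

It could be run on the n-best list from this thread as
scan_gzip("tuning/tmp.1/run7.best100.out.gz"); an empty list would mean
every line decoded cleanly.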

On 2016-01-18 19:32, Barry Haddow wrote:
> Hi Dingyuan
> 
> Yes, that's very possible. The error could be in extracting features.dat
> from the nbest list. Are you able to post the nbest list? Or at least
> the entries for sentence 16?
> 
> Run something like
> 
> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
> 
> cheers - Barry
> 
> On 18/01/16 11:24, Dingyuan Wang wrote:
>> Hi Barry,
>>
>> I reran EMS after the first email and then posted the more recent
>> results, so the line changed.
>>
>> I'm just using the latest code and the EMS script, with pretty much
>> the default settings. The EMS setting is:
>>
>> sparse-features = "target-word-insertion top 50, source-word-deletion
>> top 50, word-translation top 50 50, phrase-length"
>>
>> I suspect there is something unexpected in the extractor.
>>
>>
>> On 2016-01-18 19:03, Barry Haddow wrote:
>>> Hi Dingyuan
>>>
>>> In fact it is not the sparse features nor the Asian characters that are
>>> the problem. The offending line has 17 dense features, yet your model
>>> has 14 dense features.
>>>
>>> The string "1 1 1" appears directly after the language model feature in
>>> line 1694, in your attachment, adding the extra 3 features. Note that
>>> this is not the line you mentioned in your earlier email.
>>>
>>> I have no idea why there are extra features. Have you made changes to
>>> any of the core Moses features?
>>>
>>> best wishes
>>> Barry
>>>
>>> The offending line:
>>> what():  Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
>>> 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
>>> WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
>>> PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
>>> WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1 WT_而~
>>> 却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1 WT_者~
>>> 的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_誓~发
>>> 誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1 WT_中
>>> 原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1 WT_
>>> 流~中流=1 WT_部曲~部下=1 " of ...
>>>
>>>
>>> On 18/01/16 10:37, Dingyuan Wang wrote:
>>>> Hi,
>>>>
>>>> I've attached that. The line number is 1694.
>>>>
>>>> On 2016-01-18 16:43, Barry Haddow wrote:
>>>>> Hi Dingyuan
>>>>>
>>>>> Is it possible to attach the features.dat file that is causing the
>>>>> error? Almost certainly Moses is failing to parse the line because of
>>>>> the Asian characters in the feature names.
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>>>>> I ran
>>>>>>
>>>>>> ~/software/moses/bin/kbmira -J 75 --dense-init run7.dense
>>>>>> --sparse-init run7.sparse-weights
>>>>>> --ffile run1.features.dat --ffile run2.features.dat
>>>>>> --ffile run3.features.dat --ffile run4.features.dat
>>>>>> --ffile run5.features.dat --ffile run6.features.dat
>>>>>> --ffile run7.features.dat
>>>>>> --scfile run1.scores.dat --scfile run2.scores.dat
>>>>>> --scfile run3.scores.dat --scfile run4.scores.dat
>>>>>> --scfile run5.scores.dat --scfile run6.scores.dat
>>>>>> --scfile run7.scores.dat -o /tmp/mert.out
>>>>>>
>>>>>> in the tuning/tmp.1 directory, which will certainly replicate the
>>>>>> error.
>>>>>>
>>>>>> On 2016-01-16 23:42, Hieu Hoang wrote:
>>>>>>> The mert script prints out every command it runs. You should be
>>>>>>> able to replicate the error by running the last command.
>>>>>>>
>>>>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle...@gmail.com
>>>>>>> <mailto:abcdoyle...@gmail.com>> wrote:
>>>>>>>
>>>>>>>        Sorry, but I can't reliably replicate the same problem when
>>>>>>>        running TUNING_tune.1 alone. There is no character '_' in the
>>>>>>>        test set or top50 list.
>>>>>>>
>>>>>>>        I'm using sparse-features = "target-word-insertion top 50,
>>>>>>>        source-word-deletion top 50, word-translation top 50 50,
>>>>>>>        phrase-length"
>>>>>>>
>>>>>>>        I've attached some related files from EMS and the EMS config.
>>>>>>>
>>>>>>>     
>>>>>>> https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>>>>>>
>>>>>>>
>>>>>>>        On 2016-01-16 02:45, Hieu Hoang wrote:
>>>>>>>        > Could you make your model files available for download so I
>>>>>>>        > can replicate this problem?
>>>>>>>        >
>>>>>>>        > It seems like you're using a feature function with sparse
>>>>>>>        > scores. I think the character '_' must be escaped.
>>>>>>>        >
>>>>>>>        >
>>>>>>>        > On 12/01/16 04:00, Dingyuan Wang wrote:
>>>>>>>        >> Hi all,
>>>>>>>        >>
>>>>>>>        >> I'm using EMS to run experiments. Every time, kbmira died
>>>>>>>        >> with SIGABRT when tuning in one direction, while tuning in
>>>>>>>        >> the opposite direction (same config and test set) was
>>>>>>>        >> successful.
>>>>>>>        >>
>>>>>>>        >> The mert.log (stderr) shows the following:
>>>>>>>        >>
>>>>>>>        >>
>>>>>>>        >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>>>>>>        >> Initialising random seed from system clock
>>>>>>>        >> Found 15323 initial sparse features
>>>>>>>        >> ....terminate called after throwing an instance of
>>>>>>>        >> 'MosesTuning::FileFormatException'
>>>>>>>        >>    what():  Error in line "-4.51933 0 0 -6.09733 0 0 0
>>>>>>>        >> -121.556 2 -20 12 -31.6201 -38.5211 -26.5112 -60.6166
>>>>>>>        >> WT_,~,=2 WT_?~?=1 PL_s1=4 PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1
>>>>>>>        >> PL_2,1=3 PL_t1=6 PL_t2=4 PL_t3=2 PL_2,3=1 PL_s2=7 PL_1,1=3
>>>>>>>        >> WT_未~没有=1 WT_何~怎么=1 WT_何~能=1 WT_方~正在=1 WT_又~还=1
>>>>>>>        >> WT_君~您=2 WT_趣~向=1 WT_趣~奔=1 WT_有~没有=1 WT_往~去=1
>>>>>>>        >> WT_官~官员=1 WT_假~借=1 WT_檄~檄文=1 WT_文~文告=1 WT_上~上级=1
>>>>>>>        >> WT_为~呢=1 WT_在~正在=1 " of run7.features.dat
>>>>>>>        >> Aborted
>>>>>>>        >>
>>>>>>>        >>
>>>>>>>        >> Since run7.scores.dat is generated by scripts, I don't think
>>>>>>>        >> I am responsible for the bad format. The last time it died,
>>>>>>>        >> I removed the likely offending line from the test set, but
>>>>>>>        >> this time another line appears.
>>>>>>>        >>
>>>>>>>        >> --
>>>>>>>        >> Dingyuan Wang
>>>>>>>        >> _______________________________________________
>>>>>>>        >> Moses-support mailing list
>>>>>>>        >> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>>>>>>        >> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>        >
>>>>>>>
>>>>>>>        --
>>>>>>>        Dingyuan Wang (gumblex)
>>>>>>>
>>>
> 
> 

-- 
Dingyuan Wang (gumblex)

Attachment: 16-run7.best100.out.gz
Description: application/gzip

