Hi Barry, "The ones starting with the "@"" are due to corrupted bytes in the nbest list.
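One way to look for byte-level corruption like this is to scan the compressed n-best list for lines that do not decode as UTF-8 or that contain stray control characters. The Python sketch below is a hypothetical helper, not the gist script mentioned in this thread. It assumes a gzip-compressed n-best list such as run7.best100.out.gz. A corrupted byte can still decode as valid UTF-8 (for example as "@"), so treat this as a first pass rather than a complete check.

```python
import gzip
import sys


def find_suspicious_lines(path):
    """Yield (line_number, reason, raw_bytes) for lines that look corrupted.

    Heuristic only: a line is flagged if it is not valid UTF-8 or if it
    contains an ASCII control character other than tab.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as e:
                yield lineno, f"invalid UTF-8 at byte {e.start}", raw
                continue
            body = text.rstrip("\r\n")
            if any(ord(c) < 32 and c != "\t" for c in body):
                yield lineno, "control character", raw


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_corrupt.py tuning/tmp.1/run7.best100.out.gz
    for lineno, reason, raw in find_suspicious_lines(sys.argv[1]):
        print(f"{lineno}: {reason}: {raw[:80]!r}")
```

Reading in binary mode and decoding line by line keeps one bad byte from aborting the whole scan, and it reports the line number to pass to zgrep or sed afterwards.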
This kind of corruption occurs from time to time. I wonder whether it comes from memory errors, a filesystem failure, or some kind of pointer/encoding problem in Moses. I've written a script to find such corrupted lines: https://gist.github.com/gumblex/0d9d0848b435e4f9818f

On 2016-01-18 20:42, Barry Haddow wrote:
> Hi Dingyuan
>
> The extractor expects feature names to contain an underscore (not sure
> exactly why) but some of yours don't, and Moses skips them, interpreting
> their values as extra dense features.
>
> The attached screenshot shows my view of the offending names. The ones
> starting with the "@" are the problem. So it does look like the nbest
> list is corrupted. Can you run the decoder on just that sentence, to
> create an uncompressed version of the nbest list?
>
> cheers - Barry
>
> On 18/01/16 12:02, Dingyuan Wang wrote:
>> Hi Barry,
>>
>> Attached is the zgrep result.
>> I found that a few bytes in the middle of line 61 are corrupted. Is that
>> a Moses problem, or does my memory have a problem?
>>
>> I also checked the other files using iconv; they are all valid UTF-8.
>>
>> On 2016-01-18 19:32, Barry Haddow wrote:
>>> Hi Dingyuan
>>>
>>> Yes, that's very possible. The error could be in extracting features.dat
>>> from the nbest list. Are you able to post the nbest list? Or at least
>>> the entries for sentence 16?
>>>
>>> Run something like
>>>
>>> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
>>>
>>> cheers - Barry
>>>
>>> On 18/01/16 11:24, Dingyuan Wang wrote:
>>>> Hi Barry,
>>>>
>>>> I reran EMS after the first email and then posted the recent
>>>> results, so the line number changed.
>>>>
>>>> I'm just using the latest code and the EMS script, with mostly default
>>>> settings. The EMS setting is:
>>>>
>>>> sparse-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>>>
>>>> I suspect there is something unexpected in the extractor.
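The underscore rule Barry describes can be illustrated with a simplified model of how the feature field of an n-best entry is split. This sketch assumes the field is a sequence of `name=` tokens, each followed by its values, and that only names containing "_" are treated as sparse; it is an illustration of the described behaviour, not the Moses extractor code. It shows how the value attached to a corrupted name without an underscore (here the hypothetical `@x`) ends up counted as an extra dense feature.

```python
def split_features(feature_field):
    """Split an n-best feature field into dense values and sparse features.

    Simplified model (an assumption, not Moses code): a token ending in '='
    is a feature name and the numeric tokens after it are its values.
    Names containing '_' are sparse; every other name is treated as dense.
    """
    dense, sparse = [], {}
    name = None
    for tok in feature_field.split():
        if tok.endswith("="):
            name = tok[:-1]
        elif name is None:
            raise ValueError(f"value {tok!r} appears before any feature name")
        elif "_" in name:
            sparse[name] = float(tok)  # sparse: one value per name
        else:
            dense.append(float(tok))  # dense: values appended in order


    return dense, sparse


# A corrupted name "@x" (no underscore) silently adds a dense value.
dense, sparse = split_features("LM0= -224.872 @x= 1 WordPenalty0= -39 WT_有~有= 1")
print(len(dense), sparse)
```

Under this model a damaged sparse name does not cause an error at extraction time; it only shifts the dense count, which is why the failure surfaces later in kbmira.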
>>>>
>>>> On 2016-01-18 19:03, Barry Haddow wrote:
>>>>> Hi Dingyuan
>>>>>
>>>>> In fact it is neither the sparse features nor the Asian characters
>>>>> that are the problem. The offending line has 17 dense features, yet
>>>>> your model has 14 dense features.
>>>>>
>>>>> The string "1 1 1" appears directly after the language model feature
>>>>> in line 1694 of your attachment, adding the extra 3 features. Note that
>>>>> this is not the line you mentioned in your earlier email.
>>>>>
>>>>> I have no idea why there are extra features. Have you made changes to
>>>>> any of the core Moses features?
>>>>>
>>>>> best wishes
>>>>> Barry
>>>>>
>>>>> The offending line:
>>>>> what(): Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
>>>>> 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
>>>>> WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
>>>>> PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
>>>>> WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1
>>>>> WT_而~却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1
>>>>> WT_者~的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1
>>>>> WT_誓~发誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1
>>>>> WT_中原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1
>>>>> WT_流~中流=1 WT_部曲~部下=1 " of ...
>>>>>
>>>>> On 18/01/16 10:37, Dingyuan Wang wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've attached that. The line number is 1694.
>>>>>>
>>>>>> On 2016-01-18 16:43, Barry Haddow wrote:
>>>>>>> Hi Dingyuan
>>>>>>>
>>>>>>> Is it possible to attach the features.dat file that is causing the
>>>>>>> error?
>>>>>>> Almost certainly Moses is failing to parse the line because of
>>>>>>> the Asian characters in the feature names.
>>>>>>>
>>>>>>> cheers - Barry
>>>>>>>
>>>>>>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>>>>>>> I ran
>>>>>>>>
>>>>>>>> ~/software/moses/bin/kbmira -J 75 --dense-init run7.dense \
>>>>>>>>   --sparse-init run7.sparse-weights \
>>>>>>>>   --ffile run1.features.dat --ffile run2.features.dat \
>>>>>>>>   --ffile run3.features.dat --ffile run4.features.dat \
>>>>>>>>   --ffile run5.features.dat --ffile run6.features.dat \
>>>>>>>>   --ffile run7.features.dat \
>>>>>>>>   --scfile run1.scores.dat --scfile run2.scores.dat \
>>>>>>>>   --scfile run3.scores.dat --scfile run4.scores.dat \
>>>>>>>>   --scfile run5.scores.dat --scfile run6.scores.dat \
>>>>>>>>   --scfile run7.scores.dat -o /tmp/mert.out
>>>>>>>>
>>>>>>>> in the tuning/tmp.1 directory, which will certainly replicate the
>>>>>>>> error.
>>>>>>>>
>>>>>>>> On 2016-01-16 23:42, Hieu Hoang wrote:
>>>>>>>>> The mert script prints out every command it runs. You should be
>>>>>>>>> able to replicate the error by running the last command.
>>>>>>>>>
>>>>>>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle...@gmail.com
>>>>>>>>> <mailto:abcdoyle...@gmail.com>> wrote:
>>>>>>>>>
>>>>>>>>>     Sorry, but I can't reliably replicate the same problem when
>>>>>>>>>     running TUNING_tune.1 alone. There is no '_' character in the
>>>>>>>>>     test set or the top-50 list.
>>>>>>>>>
>>>>>>>>>     I'm using sparse-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>>>>>>>>
>>>>>>>>>     I've attached some related files from EMS and the EMS config.
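As the command above shows, kbmira is given the feature and score files of every run so far, so the command grows by two flags per run. The following Python sketch rebuilds that command for an arbitrary number of runs. The flags are copied from the command in this thread; the `runN.*` naming, the use of the last run's files for `--dense-init`/`--sparse-init`, and the function name `kbmira_args` are assumptions for illustration.

```python
import os
import shlex


def kbmira_args(n_runs, kbmira="~/software/moses/bin/kbmira", j_opt=75,
                out="/tmp/mert.out"):
    """Build the kbmira argument list for runs 1..n_runs (hypothetical helper).

    j_opt is simply the value passed to -J, as in the original command.
    """
    args = [os.path.expanduser(kbmira), "-J", str(j_opt),
            "--dense-init", f"run{n_runs}.dense",
            "--sparse-init", f"run{n_runs}.sparse-weights"]
    for i in range(1, n_runs + 1):
        args += ["--ffile", f"run{i}.features.dat"]
    for i in range(1, n_runs + 1):
        args += ["--scfile", f"run{i}.scores.dat"]
    args += ["-o", out]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command; to execute it instead, pass the list to
    # subprocess.run(kbmira_args(7), cwd="tuning/tmp.1", check=True).
    print(shlex.join(kbmira_args(7)))
```

Building the arguments as a list avoids shell quoting issues and makes it easy to drop a single run's files when bisecting which features.dat triggers the exception.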
>>>>>>>>>
>>>>>>>>>     https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>>>>>>>>
>>>>>>>>>     On 2016-01-16 02:45, Hieu Hoang wrote:
>>>>>>>>>     > Could you make your model files available for download so I
>>>>>>>>>     > can replicate this problem?
>>>>>>>>>     >
>>>>>>>>>     > It seems like you're using a feature function with sparse
>>>>>>>>>     > scores. I think the character '_' must be escaped.
>>>>>>>>>     >
>>>>>>>>>     > On 12/01/16 04:00, Dingyuan Wang wrote:
>>>>>>>>>     >> Hi all,
>>>>>>>>>     >>
>>>>>>>>>     >> I'm using EMS to run experiments. kbmira dies with SIGABRT every
>>>>>>>>>     >> time I tune in one direction, while tuning in the opposite
>>>>>>>>>     >> direction (same config and test set) succeeds.
>>>>>>>>>     >>
>>>>>>>>>     >> The mert.log (stderr) shows the following:
>>>>>>>>>     >>
>>>>>>>>>     >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>>>>>>>>     >> Initialising random seed from system clock
>>>>>>>>>     >> Found 15323 initial sparse features
>>>>>>>>>     >> ....terminate called after throwing an instance of
>>>>>>>>>     >> 'MosesTuning::FileFormatException'
>>>>>>>>>     >> what(): Error in line "-4.51933 0 0 -6.09733 0 0 0 -121.556 2 -20 12
>>>>>>>>>     >> -31.6201 -38.5211 -26.5112 -60.6166 WT_,~,=2 WT_?~?=1 PL_s1=4
>>>>>>>>>     >> PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1 PL_2,1=3 PL_t1=6 PL_t2=4 PL_t3=2
>>>>>>>>>     >> PL_2,3=1 PL_s2=7 PL_1,1=3 WT_未~没有=1 WT_何~怎么=1 WT_何~能=1
>>>>>>>>>     >> WT_方~正在=1 WT_又~还=1 WT_君~您=2 WT_趣~向=1 WT_趣~奔=1 WT_有~没有=1
>>>>>>>>>     >> WT_往~去=1 WT_官~官员=1 WT_假~借=1 WT_檄~檄文=1 WT_文~文告=1
>>>>>>>>>     >> WT_上~上级=1 WT_为~呢=1 WT_在~正在=1 " of run7.features.dat
>>>>>>>>>     >> Aborted
>>>>>>>>>     >>
>>>>>>>>>     >> I think that since run7.scores.dat is generated by scripts, I'm
>>>>>>>>>     >> not responsible for the bad format. The last time it died, I
>>>>>>>>>     >> removed the likely offending line from the test set, but this
>>>>>>>>>     >> time another line appears.
>>>>>>>>>     >>
>>>>>>>>>     >> --
>>>>>>>>>     >> Dingyuan Wang
>>>>>>>>>     >> _______________________________________________
>>>>>>>>>     >> Moses-support mailing list
>>>>>>>>>     >> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>>>>>>>>     >> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>>>     >
>>>>>>>>>
>>>>>>>>>     --
>>>>>>>>>     Dingyuan Wang (gumblex)
>>>
>
>

--
Dingyuan Wang (gumblex)
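Barry's diagnosis came down to counting dense values: the offending features.dat line had 17 while the model has 14. The Python sketch below automates that count for every line of a features.dat file. It assumes, based on the error lines quoted in this thread, that each data line starts with bare numeric dense values followed by `name=value` sparse tokens, and it skips lines beginning with `FEATURES_TXT_` as header/footer lines (an assumption about the file layout). The function names are hypothetical.

```python
import sys


def count_dense(line):
    """Count the leading bare numeric tokens (dense values) on a line."""
    n = 0
    for tok in line.split():
        if "=" in tok:
            break  # first name=value token: sparse features start here
        try:
            float(tok)
        except ValueError:
            break  # not a number: stop counting
        n += 1
    return n


def check_features_file(path, expected_dense):
    """Yield (line_number, dense_count) for data lines with the wrong count."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip() or line.startswith("FEATURES_TXT_"):
                continue  # assumed header/footer or blank line
            n = count_dense(line)
            if n != expected_dense:
                yield lineno, n


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_dense.py run7.features.dat 14
    for lineno, n in check_features_file(sys.argv[1], int(sys.argv[2])):
        print(f"line {lineno}: {n} dense values")
```

Run against each runN.features.dat before kbmira, this points at the same lines that trigger `MosesTuning::FileFormatException`, without waiting for the tuner to abort.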