Sorry, I am not sure whether the bug I reported is directly related to this issue. The bug I mentioned is somewhat "random" (a read violation at an arbitrary address) and can hardly be reproduced across different machines. What we can do is apply the fix and try again. I will also look into the problem you mentioned.
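On the underflow Chris describes below: the usual remedy is to rescale the forward probabilities at every position and accumulate the logs of the scaling factors. Here is a minimal standalone sketch of that idea. It is not GIZA++ code; the function names, the Matrix alias and the dense table layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical dense layout: trans[i][j] = P(state j | state i),
// emit[t][j] = P(observation at position t | state j).
using Matrix = std::vector<std::vector<double>>;

// Unscaled forward pass. Returns P(observations) directly. The alpha
// values shrink geometrically with sentence length and can underflow
// to 0.0, which later shows up as 0/0 = NaN when counts are normalized.
double naiveForward(const std::vector<double>& init,
                    const Matrix& trans, const Matrix& emit) {
    assert(!emit.empty() && !init.empty());
    const std::size_t T = emit.size(), N = init.size();
    std::vector<double> alpha(N), next(N);
    for (std::size_t j = 0; j < N; ++j) alpha[j] = init[j] * emit[0][j];
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            double s = 0.0;
            for (std::size_t i = 0; i < N; ++i) s += alpha[i] * trans[i][j];
            next[j] = s * emit[t][j];
        }
        alpha.swap(next);
    }
    double total = 0.0;
    for (double a : alpha) total += a;
    return total;
}

// Scaled forward pass. After each position alpha is renormalized to
// sum to 1, and the log of the scaling factor is accumulated, so the
// result is log P(observations) and stays finite for long sentences.
double scaledForwardLogLik(const std::vector<double>& init,
                           const Matrix& trans, const Matrix& emit) {
    assert(!emit.empty() && !init.empty());
    const std::size_t T = emit.size(), N = init.size();
    std::vector<double> alpha(N), next(N);
    double logLik = 0.0;
    for (std::size_t t = 0; t < T; ++t) {
        double scale = 0.0;
        for (std::size_t j = 0; j < N; ++j) {
            double s = 0.0;
            if (t == 0) {
                s = init[j];
            } else {
                for (std::size_t i = 0; i < N; ++i) s += alpha[i] * trans[i][j];
            }
            next[j] = s * emit[t][j];
            scale += next[j];
        }
        // A zero scale means the sequence is impossible under the model.
        if (scale == 0.0) return -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < N; ++j) next[j] /= scale;
        alpha.swap(next);
        logLik += std::log(scale);
    }
    return logLik;
}
```

With a toy two-state model whose emission probabilities are all 0.1, naiveForward returns exactly 0.0 for a 400-position sequence, while scaledForwardLogLik still returns 400 * log(0.1). The same scaling factors can be reused in the backward pass so that the expected counts stay well defined.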

Chris Dyer wrote:
> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.
>
> It seems that different systems handle very small floating point
> numbers differently, so this seems to be a bigger or smaller problem
> with different builds, but this also may interact with the fix the Qin
> is reporting. Qin, have you been able to determine if your fix
> corrects the problem with the German-English alignment?
>
> Chris
>
> On Thu, Feb 28, 2008 at 12:50 PM, Qin Gao <[EMAIL PROTECTED]> wrote:
>
>> Hi, Wilson,
>>
>> As I mentioned, GIZA++ may have a bug on HMM training stage, it will add
>> some random number to count table, and maybe it is the reason. You may
>> check the archive of the mailing list for the description of the bug,
>> also, you can simply comment out the lines marked with //*******// in
>> Array2.h to fix it.
>>
>>   inline T*begin(){
>> #ifdef __STL_DEBUG //*******//
>>     if( h1==0||h2==0)return 0;
>> #endif //*******//
>>     return &(p[0]);
>>   }
>>   inline T*end(){
>> #ifdef __STL_DEBUG //*******//
>>     if( h1==0||h2==0)return 0;
>> #endif //*******//
>>     return &(p[0])+p.size();
>>   }
>>
>> You may also be interested in trying a new version of Multi-threaded
>> GIZA++ with the bug fixed, and a much faster speed here
>>
>> http://www.cs.cmu.edu/~qing/
>>
>> Best,
>> Qin
>>
>>
>>
>> Wilson, Kevin wrote:
>> >
>> > Hello all,
>> >
>> > I'm currently trying to train Moses on aligned subtitles obtained from
>> > the opus corpus website. The files have been cleaned and formatted in
>> > a similar way to the standard Europarl files.
>> >
>> > There are a series of NAN errors after Giza begins the HMM stage of
>> > training. The corpus has been cleaned using the appropriate script and
>> > the sentence length has been limited to 40, although many sentences
>> > are much less than this.
>> >
>> > I'm guessing there's some strange characters messing things up or
>> > something like that, but wondered if others had encountered this issue
>> > and could possibly provide advice.
>> >
>> > Many thanks,
>> >
>> > Kevin.
>> >
>> > *Kevin A. Wilson, MS*
>> >
>> > Research Computing Division
>> >
>> > RTI International
>> >
>> > 3040 Cornwallis Road
>> >
>> > P.O. Box 12194
>> >
>> > Research Triangle Park
>> >
>> > NC 27709-2194
>> >
>> > (919) 485-5521
>> >
>> > www.rti.org <http://www.rti.org/>
>> >
>> > ------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > Moses-support@mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/moses-support