Sorry, I am not sure that the bug I reported is directly related to this
issue, because it is somewhat "random" (a read violation at an arbitrary
address) and can hardly be reproduced on different machines. What we can
do is apply the fix and try again. I will also look into the problem you
mentioned.
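
For reference, here is a minimal sketch of what the Array2.h change quoted
below amounts to. Array2Sketch and sumAll are hypothetical names used only
for illustration, not the real GIZA++ class; the members h1, h2 and p mirror
the quoted snippet. The point is that the empty check runs unconditionally,
so p[0] is never evaluated on an empty std::vector (which would be undefined
behavior).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the class in Array2.h; only the members needed
// to show the check are reproduced here.
template <class T>
class Array2Sketch {
 public:
  Array2Sketch(std::size_t rows, std::size_t cols)
      : h1(rows), h2(cols), p(rows * cols) {}

  // The empty check is unconditional, which is the effect of commenting out
  // the #ifdef/#endif pair around it.
  T* begin() {
    if (h1 == 0 || h2 == 0) return nullptr;
    return &(p[0]);
  }
  T* end() {
    if (h1 == 0 || h2 == 0) return nullptr;
    return &(p[0]) + p.size();
  }

 private:
  std::size_t h1, h2;
  std::vector<T> p;
};

// Sums all cells by walking [begin(), end()). For an empty array both
// pointers are null, so the loop body never runs.
template <class T>
T sumAll(Array2Sketch<T>& a) {
  assert((a.begin() == nullptr) == (a.end() == nullptr));
  T total = T();
  for (T* it = a.begin(); it != a.end(); ++it) total += *it;
  return total;
}
```

With this version, sumAll on an Array2Sketch<double> built with 0 rows simply
returns 0 instead of reading through an invalid pointer.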

Chris Dyer wrote:
> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.
>
> It seems that different systems handle very small floating-point
> numbers differently, so this is a bigger or smaller problem depending
> on the build, but it may also interact with the fix that Qin is
> reporting.  Qin, have you been able to determine whether your fix
> corrects the problem with the German-English alignment?
>
> Chris
>
> On Thu, Feb 28, 2008 at 12:50 PM, Qin Gao <[EMAIL PROTECTED]> wrote:
>   
>> Hi, Wilson,
>>
>>  As I mentioned, GIZA++ may have a bug in the HMM training stage: it adds
>>  some random numbers to the count table, and that may be the reason. You
>>  can check the mailing list archive for a description of the bug. To fix
>>  it, simply comment out the lines marked with //*******// in Array2.h, so
>>  that the empty-array check is always compiled in.
>>
>>  inline T*begin(){
>>  #ifdef __STL_DEBUG //*******//
>>  if( h1==0||h2==0)return 0;
>>  #endif //*******//
>>  return &(p[0]);
>>  }
>>  inline T*end(){
>>  #ifdef __STL_DEBUG //*******//
>>  if( h1==0||h2==0)return 0;
>>  #endif //*******//
>>  return &(p[0])+p.size();
>>  }
>>
>>  You may also be interested in trying a new multi-threaded version of
>>  GIZA++, which has the bug fixed and runs much faster, here:
>>
>>  http://www.cs.cmu.edu/~qing/
>>
>>  Best,
>>  Qin
>>
>>
>>
>>  Wilson, Kevin wrote:
>>  >
>>  > Hello all,
>>  >
>>  > I'm currently trying to train Moses on aligned subtitles obtained from
>>  > the OPUS corpus website. The files have been cleaned and formatted in
>>  > a similar way to the standard Europarl files.
>>  >
>>  > There is a series of NaN errors after GIZA++ begins the HMM stage of
>>  > training. The corpus has been cleaned using the appropriate script and
>>  > the sentence length has been limited to 40, although many sentences
>>  > are much shorter than this.
>>  >
>>  > I'm guessing there are some strange characters messing things up or
>>  > something like that, but wondered if others had encountered this issue
>>  > and could possibly provide advice.
>>  >
>>  > Many thanks,
>>  >
>>  > Kevin.
>>  >
>>  > *Kevin A. Wilson, MS*
>>  >
>>  > Research Computing Division
>>  >
>>  > RTI International
>>  >
>>  > 3040 Cornwallis Road
>>  >
>>  > P.O. Box 12194
>>  >
>>  > Research Triangle Park
>>  >
>>  > NC 27709-2194
>>  >
>>  > (919) 485-5521
>>  >
>>  > www.rti.org <http://www.rti.org/>
>>  >
>>  > ------------------------------------------------------------------------
>>  >
>>  > _______________________________________________
>>  > Moses-support mailing list
>>  > Moses-support@mit.edu
>>  > http://mailman.mit.edu/mailman/listinfo/moses-support
>>  >
>>
>>
>>     
>
>   

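On the normalization point Chris raises above, here is a rough sketch of the
standard rescaling trick for the forward half of forward-backward. This is a
generic HMM, not GIZA++ code: the function names and the init/trans/emit
layout are assumptions made for illustration. The forward vector is
renormalized to sum to 1 at every position and the logs of the scale factors
are accumulated, so nothing underflows on long sentences. The backward pass
would normally reuse the same per-position scale factors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// One forward step. Fills
//   next[j] = (sum_i alpha[i] * trans[i][j]) * emitT[j]
// (or init[j] * emitT[j] on the first step) and returns sum_j next[j].
static double forwardStep(bool first, const Vec& init, const Mat& trans,
                          const Vec& emitT, const Vec& alpha, Vec& next) {
  const std::size_t n = init.size();
  double sum = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    double a = 0.0;
    if (first) {
      a = init[j];
    } else {
      for (std::size_t i = 0; i < n; ++i) a += alpha[i] * trans[i][j];
    }
    next[j] = a * emitT[j];
    sum += next[j];
  }
  return sum;
}

// Unscaled forward pass: returns P(observations) directly, so the result
// can underflow to 0 on long sequences. emit[t][j] is the probability of
// observation t given state j.
double unscaledForwardLik(const Vec& init, const Mat& trans, const Mat& emit) {
  assert(!init.empty() && !emit.empty());
  Vec alpha(init.size()), next(init.size());
  double total = 0.0;
  for (std::size_t t = 0; t < emit.size(); ++t) {
    total = forwardStep(t == 0, init, trans, emit[t], alpha, next);
    alpha.swap(next);
  }
  return total;
}

// Scaled forward pass: alpha is divided by its sum at every step and the
// log of each scale factor is accumulated. Returns log P(observations).
double scaledForwardLogLik(const Vec& init, const Mat& trans, const Mat& emit) {
  assert(!init.empty() && !emit.empty());
  Vec alpha(init.size()), next(init.size());
  double logLik = 0.0;
  for (std::size_t t = 0; t < emit.size(); ++t) {
    const double scale = forwardStep(t == 0, init, trans, emit[t], alpha, next);
    if (scale <= 0.0) return -std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < next.size(); ++j) next[j] /= scale;
    alpha.swap(next);
    logLik += std::log(scale);
  }
  return logLik;
}
```

For example, with two states, uniform start and transition probabilities,
and every emission probability set to 0.1 over 2000 positions, the unscaled
likelihood is far below the smallest representable double and comes out as
0, while the scaled version still returns a finite log-likelihood.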