Chris Dyer wrote:

> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.

I see from the archives that this has been reported a number of
times, and I am now running into it, training on about 1.8 million
segments from the LDC Hong Kong corpus. I had no such problem on a
100K subset of this data, so I suspect it is indeed an issue of
corpus size and underflow. FWIW, I'm using the default parameters
for the training script.
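
For concreteness, one common form of the normalization Chris mentions
is to rescale the forward probabilities at every position and
accumulate the logs of the scale factors. Below is a minimal Python
sketch of that idea on a toy two-state HMM. It is not GIZA code and
not Qin Gao's patch; the function names and all the toy parameters
are made up for illustration.

```python
import math


def forward_unscaled(init, trans, emit, obs):
    """Plain forward pass. Returns P(obs), which underflows on long inputs."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def forward_scaled(init, trans, emit, obs):
    """Forward pass with per-position rescaling. Returns log P(obs)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            o = obs[t]
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                     for s in range(n)]
        # Rescale so alpha sums to 1, and keep the scale factor in log space.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical toy model: 2 states, 2 output symbols.
INIT = [0.5, 0.5]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.6, 0.4], [0.3, 0.7]]
LONG_OBS = [0, 1] * 2500  # 5000 positions, far longer than a real sentence

if __name__ == "__main__":
    print("unscaled P(obs):    ", forward_unscaled(INIT, TRANS, EMIT, LONG_OBS))
    print("scaled log P(obs):  ", forward_scaled(INIT, TRANS, EMIT, LONG_OBS))
```

On the long sequence the unscaled pass collapses to zero, while the
scaled pass still returns a finite log-probability. The toy needs
thousands of positions to underflow only because its emission
probabilities are large; with small lexical probabilities the same
thing would happen on much shorter inputs.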

Qin Gao suggested a patch to Array2.h in the GIZA code - does this
indeed fix the problem? If not, has anyone found another solution or
a workaround?


- John Burger
Moses-support mailing list
