Re: [Moses-support] Giza HMM errors - NAN

2008-03-25 Thread Chris Dyer
The fix reported by Qin Gao does indeed repair some of the NaN
problems, so I would certainly advise you to incorporate this into
your GIZA build.  However, with 1.8M segments, you may well be
encountering an underflow situation, so this may not fix the problem.
Chris
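
To make the underflow scenario concrete, here is a minimal C++ sketch. It is
not GIZA++ code: the function names unscaled_product and normalize_counts are
made up for illustration, and it assumes IEEE 754 doubles with default
compiler floating-point settings. A product of many small probabilities
underflows to exactly 0.0, and normalizing counts that are all zero then
divides 0.0 by 0.0, which is one way NaN values can end up in a count table.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Multiply `steps` copies of probability `p`, roughly what an unscaled
// forward pass does along a single path through the lattice.
double unscaled_product(double p, int steps) {
    assert(steps >= 0);
    double prod = 1.0;
    for (int i = 0; i < steps; ++i) prod *= p;
    return prod;
}

// Divide each count by the total, with no guard against a zero total.
std::vector<double> normalize_counts(const std::vector<double>& counts) {
    double total = 0.0;
    for (double c : counts) total += c;
    std::vector<double> out;
    out.reserve(counts.size());
    for (double c : counts) out.push_back(c / total);  // 0.0 / 0.0 gives NaN
    return out;
}
```

For example, unscaled_product(1e-5, 100) would be 1e-500 in exact arithmetic,
which is below the smallest representable double, so it comes back as 0.0.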

On Tue, Mar 25, 2008 at 8:02 AM, John D. Burger <[EMAIL PROTECTED]> wrote:
> Chris Dyer wrote:
>
>  > I haven't looked into what's causing the particular problem on this
>  > corpus, but another known problem with the GIZA HMM model is that it
>  > doesn't do a fairly standard kind of normalization in the
>  > forward-backward training, which causes underflow errors in some
>  > sentences (especially quite long ones), which also leads to this
>  > problem.
>
>  I see from the archives that this has been reported a number of
>  times, and I am now running into it, training on about 1.8 million
>  segments from the LDC Hong Kong corpus.  I had no such problem on a
>  100K subset of this data, so I suspect it is indeed an issue of
>  corpus size and underflow.  FWIW, I'm using the default parameters
>  for the training script.
>
>  Qin Gao suggested a patch to Array2.h in the GIZA code - does this
>  indeed fix the problem?  If not, has anyone found another solution or
>  a workaround?
>
>  Thanks.
>
>  - John Burger
>MITRE
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Giza HMM errors - NAN

2008-03-25 Thread John D. Burger
Chris Dyer wrote:

> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.

I see from the archives that this has been reported a number of  
times, and I am now running into it, training on about 1.8 million  
segments from the LDC Hong Kong corpus.  I had no such problem on a  
100K subset of this data, so I suspect it is indeed an issue of  
corpus size and underflow.  FWIW, I'm using the default parameters  
for the training script.

Qin Gao suggested a patch to Array2.h in the GIZA code - does this  
indeed fix the problem?  If not, has anyone found another solution or  
a workaround?

Thanks.

- John Burger
   MITRE


Re: [Moses-support] Giza HMM errors - NAN

2008-02-28 Thread Qin Gao
Sorry, I am not sure the bug I reported is directly related to this issue,
because the bug I mentioned is kind of "random" (a read violation at some
random address) and can hardly be reproduced on different machines. What
we can do is fix it and try again. I will also look into the problem
you mentioned.

Chris Dyer wrote:
> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.
>
> It seems that different systems handle very small floating point
> numbers differently, so this seems to be a bigger or smaller problem
> with different builds, but this may also interact with the fix that Qin
> is reporting.  Qin, have you been able to determine if your fix
> corrects the problem with the German-English alignment?
>
> Chris


Re: [Moses-support] Giza HMM errors - NAN

2008-02-28 Thread Chris Dyer
I haven't looked into what's causing the particular problem on this
corpus, but another known problem with the GIZA HMM model is that it
doesn't do a fairly standard kind of normalization in the
forward-backward training, which causes underflow errors in some
sentences (especially quite long ones), which also leads to this
problem.
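
The normalization in question is usually done by rescaling the forward
probabilities at every position and accumulating the logs of the scaling
factors. The sketch below shows that idea for a generic HMM. It is not
GIZA++'s code; the names (scaled_forward_loglik, Matrix) and the data layout
are assumptions chosen for illustration, and it assumes the observation
sequence has nonzero probability.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Scaled forward pass for a generic HMM (illustrative, not GIZA++ code).
// init[i]     : probability of starting in state i
// trans[i][j] : probability of moving from state i to state j
// emit[t][j]  : probability of the observation at position t given state j
// Returns log P(observations). alpha is renormalized to sum to 1 at every
// position and the log of each scaling factor is accumulated, so no value
// shrinks toward zero as the sequence gets longer.
double scaled_forward_loglik(const std::vector<double>& init,
                             const Matrix& trans,
                             const Matrix& emit) {
    assert(!emit.empty());
    const std::size_t n = init.size();
    std::vector<double> alpha(n), next(n);
    double loglik = 0.0;
    for (std::size_t t = 0; t < emit.size(); ++t) {
        double scale = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            double prior = 0.0;
            if (t == 0) {
                prior = init[j];
            } else {
                for (std::size_t i = 0; i < n; ++i) prior += alpha[i] * trans[i][j];
            }
            next[j] = prior * emit[t][j];
            scale += next[j];
        }
        // Assumes scale > 0, i.e. the observations are possible under the model.
        for (std::size_t j = 0; j < n; ++j) next[j] /= scale;
        loglik += std::log(scale);
        alpha.swap(next);
    }
    return loglik;
}
```

Because alpha sums to 1 after each position, its entries stay in a safe range
however long the sentence is; a backward pass can reuse the same scaling
factors.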

It seems that different systems handle very small floating point
numbers differently, so this seems to be a bigger or smaller problem
with different builds, but this may also interact with the fix that Qin
is reporting.  Qin, have you been able to determine if your fix
corrects the problem with the German-English alignment?

Chris

On Thu, Feb 28, 2008 at 12:50 PM, Qin Gao <[EMAIL PROTECTED]> wrote:
> Hi, Wilson,
>
>  As I mentioned, GIZA++ may have a bug in the HMM training stage: it can
>  add random numbers to the count table, and that may be the cause here.
>  You can check the mailing list archive for a description of the bug. To
>  fix it, simply comment out the lines marked with //***// in Array2.h.
>
>  inline T*begin(){
>  #ifdef __STL_DEBUG //***//
>  if( h1==0||h2==0)return 0;
>  #endif //***//
>  return &(p[0]);
>  }
>  inline T*end(){
>  #ifdef __STL_DEBUG //***//
>  if( h1==0||h2==0)return 0;
>  #endif //***//
>  return &(p[0])+p.size();
>  }
>
>  You may also be interested in trying a new multi-threaded version of
>  GIZA++, which has the bug fixed and is much faster. It is available here:
>
>  http://www.cs.cmu.edu/~qing/
>
>  Best,
>  Qin


Re: [Moses-support] Giza HMM errors - NAN

2008-02-28 Thread Qin Gao
Hi, Wilson,

As I mentioned, GIZA++ may have a bug in the HMM training stage: it can
add random numbers to the count table, and that may be the cause here.
You can check the mailing list archive for a description of the bug. To
fix it, simply comment out the lines marked with //***// in Array2.h.

inline T*begin(){
#ifdef __STL_DEBUG //***//
    if( h1==0||h2==0)return 0;
#endif //***//
    return &(p[0]);
}
inline T*end(){
#ifdef __STL_DEBUG //***//
    if( h1==0||h2==0)return 0;
#endif //***//
    return &(p[0])+p.size();
}
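
As far as the snippet shows, the marked lines are the #ifdef __STL_DEBUG /
#endif pairs, so commenting them out makes the empty-array check
unconditional. Assuming p is a std::vector, evaluating &(p[0]) on an empty
vector is undefined behavior, which could explain reads from arbitrary
addresses. The sketch below uses a hypothetical stand-in class (Grid and
sum_cells are not GIZA++ names) to show the guarded behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 2-D array class in Array2.h; only the
// members needed for the illustration are modeled.
template <class T>
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : h1(rows), h2(cols), p(rows * cols) {}

    // Bounds-checked cell access (row-major).
    T& operator()(std::size_t i, std::size_t j) {
        assert(i < h1 && j < h2);
        return p[i * h2 + j];
    }

    // Unconditional empty check, mirroring the patched begin()/end().
    T* begin() {
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0];
    }
    T* end() {
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0] + p.size();
    }

private:
    std::size_t h1, h2;  // dimensions
    std::vector<T> p;    // row-major storage
};

// Sum all cells through the pointer pair; an empty grid gives an empty range.
template <class T>
T sum_cells(Grid<T>& g) {
    T total = T();
    for (T* it = g.begin(); it != g.end(); ++it) total += *it;
    return total;
}
```

With the guard in place, begin() and end() both return a null pointer for an
empty grid, so a loop over the range simply does nothing instead of touching
memory that was never allocated.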

You may also be interested in trying a new multi-threaded version of
GIZA++, which has the bug fixed and is much faster. It is available here:

http://www.cs.cmu.edu/~qing/

Best,
Qin

Wilson, Kevin wrote:
>
> Hello all,
>
> I’m currently trying to train Moses on aligned subtitles obtained from 
> the opus corpus website. The files have been cleaned and formatted in 
> a similar way to the standard Europarl files.
>
> There are a series of NAN errors after Giza begins the HMM stage of 
> training. The corpus has been cleaned using the appropriate script and 
> the sentence length has been limited to 40, although many sentences 
> are much less than this.
>
> I’m guessing there’s some strange characters messing things up or 
> something like that, but wondered if others had encountered this issue 
> and could possibly provide advice.
>
> Many thanks,
>
> Kevin.
>
> *Kevin A. Wilson, MS*
>
> Research Computing Division
>
> RTI International
>
> 3040 Cornwallis Road
>
> P.O. Box 12194
>
> Research Triangle Park
>
> NC 27709-2194
>
> (919) 485-5521
>
> www.rti.org 


Re: [Moses-support] Giza HMM errors - NAN

2008-02-28 Thread Barry Haddow
Hi 

When I found GIZA giving me NaN errors, it was due to a mismatch in the C++
standard libraries. I had compiled GIZA on Redhat FC5 but was running it on
FC6. Once I matched the compile and run platforms, the problem went away.

regards
Barry 

On Thursday 28 February 2008 17:34:23 Wilson, Kevin wrote:
> Hello all,
>
>
>
> I'm currently trying to train Moses on aligned subtitles obtained from
> the opus corpus website. The files have been cleaned and formatted in a
> similar way to the standard Europarl files.
>
>
>
> There are a series of  NAN errors after Giza begins the HMM stage of
> training. The corpus has been cleaned using the appropriate script and
> the sentence length has been limited to 40, although many sentences are
> much less than this.
>
>
>
> I'm guessing there's some strange characters messing things up or
> something like that, but wondered if others had encountered this issue
> and could possibly provide advice.
>
>
>
> Many thanks,
>
>
>
> Kevin.

