Hi Chris,
Thanks a lot, I've been staying up late too, trying to figure out whether I
made a mistake somewhere...
Anyway, I have already used the Moses script (Clean...) to filter out the
long sentences; the maximum length I accept is 40 words.
So I'm afraid this is not the cause of my problem. :-(
Thanks a lot
Marco


On Dec 16, 2007 7:41 AM, Chris Dyer <[EMAIL PROTECTED]> wrote:

> Hi Marco,
> I happen to be up late tonight debugging this very same problem.  What
> are the odds?  Here's what I know so far:
>
> 1) Once you hit this problem, you're never going to recover, so it's
> good to put in an exit(1) in GIZA when you've detected it.
>
> 2) I think this has to do with numerical underflow caused by
> overly long sentences.  Generally, GIZA++ does not support sentences that
> are longer than 100 words (101 including NULL) and will truncate them
> if they exceed the internally specified maximum.  I attempted to
> increase this constant, but when I tried to align corpora with longer
> sentences, I started seeing this error, and I'm fairly confident that
> sentence length is the issue.  Looking at the implementation of the
> HMM alignment model (GIZA, by the way, makes Moses look like a work of
> art), it seems that there is no normalization being used in the
> forward/backward trellises, which can most definitely lead to
> underflow errors (see Fred Jelinek's book on speech recognition, maybe
> Section 2.10 if I recall correctly, for a discussion of this problem).
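
The underflow failure mode Chris describes is easy to reproduce outside GIZA. Below is a minimal sketch (toy numbers, not GIZA code) of why an unnormalized forward trellis dies on long sentences: the running product of small per-word probabilities drops below the smallest representable double and becomes 0.0, after which any division by it produces the nan values seen in the log. Accumulating in log space (or rescaling the trellis at each step) keeps the quantity representable:

```python
import math

# Toy forward pass over T positions, each contributing probability ~1e-5.
# A 64-bit float bottoms out around 1e-308, so the naive product of
# 80 such factors (1e-400) underflows to exactly 0.0.
T = 80
p_step = 1e-5

naive = 1.0
log_total = 0.0
for _ in range(T):
    naive *= p_step               # underflows to 0.0 partway through
    log_total += math.log(p_step) # log-space sum stays representable

print(naive)      # 0.0 -- dividing by this is where the nans begin
print(log_total)  # about -921.0, perfectly fine
```

Per-step normalization of the trellis (dividing each column by its sum and remembering the scale factors) achieves the same effect while keeping the computation in probability space, which is the fix Chris mentions considering for GIZA.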
>
> Anyway, my recommendation is to try to get around this by filtering
> your corpus based on sentence length/sentence length ratio (please,
> let me know if this solution works for you!).  Once I confirm this is
> the problem, I'll look into adding trellis normalization to GIZA.
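
The filtering Chris recommends can be sketched in a few lines. The function name here is made up for illustration; the 100-word cap mirrors GIZA++'s internal limit mentioned above, and the ratio cutoff of 9 is the conventional Moses clean-corpus default, but both are just assumed parameter choices:

```python
def keep_pair(src: str, tgt: str, max_len: int = 100, max_ratio: float = 9.0) -> bool:
    """Length/ratio filter for one parallel sentence pair (sketch).

    max_len mirrors GIZA++'s 100-word limit; max_ratio is the usual
    Moses clean-corpus ratio cutoff.
    """
    ns, nt = len(src.split()), len(tgt.split())
    if ns == 0 or nt == 0:
        return False                       # drop pairs with an empty side
    if ns > max_len or nt > max_len:
        return False                       # drop over-long sentences
    return max(ns, nt) / min(ns, nt) <= max_ratio  # drop wild length ratios
```

Running the whole parallel corpus through a filter like this (Moses ships clean-corpus-n.perl, which applies the same kind of length limits) should keep GIZA++ below the sentence lengths where the trellis underflows.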
>
> --Chris
>
> On Dec 16, 2007 1:43 AM, marco turchi <[EMAIL PROTECTED]> wrote:
> > Dear experts,
> > I have run a full Moses pipeline (training, optimization and testing) and
> > redirected all the output of these processes into a single file. At the
> > end I saw that this file was huge (65 GB), and the BLEU score was
> > completely different from other experiments with the same number of
> > sentences...
> > Looking inside the output file, I saw that GIZA had reported this error:
> > -----------
> > Hmm: Iteration 4
> > Reading more sentence pairs into memory ...
> > ERROR2: nan nan nanN:
> >
> > and after this error, I get many lines full of numbers, and then
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> > ERROR: nan nan nan 52 38
> >
> >
> > and so on...
> > The training phase produced 63872760 lines of output...
> > Do you know what is happening?
> >
> > If I run the same experiment again, will I get the same strange
> > behaviour, or was I just unlucky?
> >
> > Thanks a lot
> >  Marco
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
