Hello all, 

I'm trying to get lattice input to Moses to work for morpheme
segmentation for Finnish->English MT. I'm using the description here[1]
and have the following questions:

1) Do the weights on a node's outgoing arcs have to add up to 1.0? The
documentation says "weight" in some places and "probability" in others.
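
To make the question concrete, here is a minimal sketch of how I check
the per-node sums in a single-line PLF string. It assumes the string is
a valid Python tuple literal of (word, weight, distance) arcs, as in the
documentation examples. The function name and the lattice weights are
made up for illustration, and the check says nothing about what Moses
itself requires.

```python
import ast

def outgoing_weight_sums(plf):
    """Return the sum of arc weights leaving each node of a PLF lattice."""
    # Each element of the outer tuple holds the outgoing arcs of one node;
    # each arc is (word, weight, distance to the target node).
    lattice = ast.literal_eval(plf.strip())
    return [sum(weight for _word, weight, _dist in arcs) for arcs in lattice]

# Hypothetical lattice for "siirtoja": unsegmented vs. "siirto >j >a".
# The weights are invented for this example.
plf = "((('siirtoja',0.5,3),('siirto',0.5,1),),(('>j',1.0,1),),(('>a',1.0,1),),)"
sums = outgoing_weight_sums(plf)
print(sums)
```

Here every node happens to sum to 1.0; what I don't know is whether the
decoder expects that or accepts arbitrary scores.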

2) For the multiline example, it is important that there be a preceding
space on the line before the first '(', but this is not mentioned in the
documentation -- could it be added? The code in question seems to be in
parsePCN(), which returns an error if in[c++] is not '(', so if you have
  "(" instead of " (" in the first line, the checkplf program returns a
"there appears to be no path to the goal" error. This does not seem to
be a problem in the single-line format, provided there are no extra
spaces.

3) How does training work? Should the training data include all the
possible segmentations? For example, if I have this sentence pair
(surface forms on the Finnish side):

  Näitä siirtoja nopeutettiin tuntuvasti vuonna 1998 .
  Redeployment was stepped up in 1998 .

Should I include:

  Näitä siirto >j >a nopeutettiin tuntuvasti vuote >na 1998 .
  Näitä siirtoja nopeutettiin tuntuvasti vuote >na 1998 .
  Näitä siirto >j >a nopeutettiin tuntuvasti vuonna 1998 .
  [etc.] 

(where '>' indicates a suffix morpheme boundary). I read the Dyer et al.
(2008) paper, and what I'd like to do is similar to their Arabic setup,
but it is not clear (at least to me) how the training corpus was
processed.  :)

Thanks in advance for any help! 

Fran

1. http://www.statmt.org/moses/?n=Moses.WordLattices

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
