Hello all,

I'm trying to get lattice input to Moses to work for morpheme segmentation for Finnish->English MT. I'm using the description here[1] and have the following questions:

1) Do the weights on the outgoing arcs of a node have to add up to 1.0? In some places the description says "weight", and in others "probability".

2) For the multiline example, it is important that there be a preceding space on the line before the first '(', but this is not mentioned in the documentation. Could it be added? The code in question seems to be in parsePCN(), which returns an error if in[c++] is not '(', so if you have "(" instead of " (" on the first line, the checkplf program returns a "there appears to be no path to the goal" error. This does not seem to be a problem in the single-line format, provided there are no extra spaces.

3) How does training work? Should the training data include all the possible segmentations? For example, if I have this sentence pair (Finnish surface forms, then English):

  Näitä siirtoja nopeutettiin tuntuvasti vuonna 1998 .
  Redeployment was stepped up in 1998 .

should I include:

  Näitä siirto >j >a nopeutettiin tuntuvasti vuote >na 1998 .
  Näitä siirtoja nopeutettiin tuntuvasti vuote >na 1998 .
  Näitä siirto >j >a nopeutettiin tuntuvasti vuonna 1998 .
  [etc.]

(where '>' indicates a suffix morpheme boundary).

I read the Dyer et al. (2008) paper, and what I'd like to do is similar to the Arabic setup, but how the training corpus was processed is not clear (at least to me). :)

Thanks in advance for any help!

Fran

[1] http://www.statmt.org/moses/?n=Moses.WordLattices
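
For concreteness, here is a minimal Python sketch of how the two segmentations of "siirtoja" from question 3 could be written as a single-line lattice, which sidesteps the leading-space issue from question 2. The helper name to_plf is hypothetical and not part of Moses. The arc layout (word, weight, distance to the target node) follows the description in [1]. The 0.5/0.5 weights are placeholders only and are not meant to settle question 1. The sketch assumes tokens contain no quote or backslash characters and rejects them rather than guessing at an escaping rule.

```python
def to_plf(nodes):
    """Serialize a lattice as a single-line PLF string.

    nodes: list of nodes in topological order; each node is a list of
    (word, weight, jump) arcs, where jump is the number of nodes to
    advance to reach the arc's target (the last target is the goal).
    """
    parts = []
    for i, arcs in enumerate(nodes):
        arc_strs = []
        for word, weight, jump in arcs:
            # Assumption: no escaping rule is applied, so refuse such tokens.
            if "'" in word or "\\" in word:
                raise ValueError("unsupported character in token: %r" % word)
            # Every arc must move forward and must not overshoot the goal.
            if jump < 1 or i + jump > len(nodes):
                raise ValueError("arc from node %d has a bad jump %d" % (i, jump))
            arc_strs.append("('%s',%s,%d)," % (word, weight, jump))
        parts.append("(" + "".join(arc_strs) + "),")
    # No leading space and no embedded newlines: single-line format.
    return "(" + "".join(parts) + ")"


# "siirtoja" as one token, or segmented as "siirto >j >a".
# Weights are placeholders, not recommended values.
lattice = [
    [("siirtoja", 0.5, 3), ("siirto", 0.5, 1)],
    [(">j", 1.0, 1)],
    [(">a", 1.0, 1)],
]
plf = to_plf(lattice)
print(plf)
```

The printed line could then be passed to checkplf to confirm that a path to the goal exists before giving it to the decoder.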