Dear Moses community,

First, thanks for providing the research community with such a nice open-source tool.
Second, I have some rather involved questions about the moses-chart decoder for syntax-based MT. Hopefully, someone familiar with it can answer them. Thanks in advance!

**** Short version of the questions: ****

* Is there any support for Synchronous Tree Substitution Grammars in moses-chart (or any trick to use such a grammar with the decoder)?
* Can moses-chart handle rules with more than 2 non-terminals?
* Can moses-chart be given directly, in some way, a compact representation of the set of parses of an input sentence, so that it only has to do the remaining work of selecting the best parse?

**** Longer, more detailed version: ****

I am trying to improve a tree-to-string MT system (it is actually tree-to-tree, but it is easier to describe as a tree-to-string system), and I was hoping I could somehow re-use part of the Moses toolchain.

Basically, my system uses a dependency tree representation of the input sentence. For example:

    he -> is <- (a -> boy)

(It is not easy to represent trees with strings in a readable way; hopefully this notation is intuitive enough.)

I then have some Synchronous Tree Substitution rules, e.g. for English-French (although, unlike a normal TSG, the target side here is a flat string):

    R1: Y -> he | il
    R2: X -> Y->is<-Z | Y est Z
    R3: Z -> a->boy | un garcon
    R4: Z -> a->boy | un enfant
    R5: Z -> V->boy | V garcon
    R6: V -> a | un

The rules are already extracted, selected and "mapped" to the input by my system (in other words, the source-side parsing is already done). This means I have a fairly compact representation of all the alternative derivations, like this:

    R1 -> R2 <- (R3 | R4 | (R5 <- R6))

Each derivation gives a different target sentence. From there, the problem is to extract the best derivation/translation according to the language model and other features. In principle this can be done with cube pruning or other beam-search approaches.
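The packed representation above can be made concrete with a minimal Python sketch. It is illustrative only and is not taken from Moses or from the system described here: the names TARGET, FOREST, expand and best are hypothetical, the six example rules are encoded as plain dictionaries, and every derivation is enumerated exhaustively. Exhaustive enumeration stands in for cube pruning, and the scoring function is a placeholder for a real language model plus other features.

```python
from itertools import product

# Target side of each example rule as a token list. A token that names a
# slot of the rule (see FOREST) is a non-terminal to be substituted.
TARGET = {
    "R1": ["il"],
    "R2": ["Y", "est", "Z"],
    "R3": ["un", "garcon"],
    "R4": ["un", "enfant"],
    "R5": ["V", "garcon"],
    "R6": ["un"],
}

# Packed forest for R1 -> R2 <- (R3 | R4 | (R5 <- R6)):
# for each rule, the alternative rules that may fill each of its slots.
FOREST = {
    "R2": {"Y": ["R1"], "Z": ["R3", "R4", "R5"]},
    "R5": {"V": ["R6"]},
}


def expand(rule):
    """Yield every target-side token list derivable from `rule`."""
    slots = FOREST.get(rule, {})
    options = []
    for tok in TARGET[rule]:
        if tok in slots:
            # A slot: collect the expansions of every alternative rule.
            options.append([e for alt in slots[tok] for e in expand(alt)])
        else:
            # A terminal: a single option containing the token itself.
            options.append([[tok]])
    # One choice per token position gives one complete derivation.
    for combo in product(*options):
        yield [t for part in combo for t in part]


def best(root, score):
    """Return the highest-scoring target string (exhaustive, no pruning)."""
    return max((" ".join(toks) for toks in expand(root)), key=score)


if __name__ == "__main__":
    for toks in expand("R2"):
        print(" ".join(toks))
    # Placeholder score (an assumption): prefer sentences ending in "enfant".
    print(best("R2", lambda s: 1.0 if s.endswith("enfant") else 0.0))
```

Enumerating everything is only workable for a forest this small; a real decoder would keep a bounded number of hypotheses per node instead, which is where cube pruning comes in.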
However, it would be interesting for me to re-use the work done for the moses-chart implementation. My first idea was to convert my TSG rules to the rule format expected by moses-chart. Unfortunately, this way I cannot tell moses-chart that the source side has already been parsed. I was hoping, though, that I could at least make the source-side parsing easy for moses-chart. Here is how I did it.

For each input sentence, I use the positions of the words instead of the words themselves (to reduce parsing ambiguity). For example, the input sentence

    he is a boy

becomes

    1 2 3 4

when I feed it to moses-chart. Then I generate a rule file by adapting my rules accordingly. Note that each non-terminal encodes in its name the source position it should match, which ensures that the derivations moses-chart can find are the same as those given by my system's parsing:

    1 [X][X1] ||| il [X][X1] ||| ||| ...
    [X][X1] 2 [X][X4] [X][TOP] ||| [X][X1] est [X][X4] [X][TOP] ||| 0-0 2-2 |||
    3 4 [X][X4] ||| un garcon [X][X4] ||| ||| ...
    3 4 [X][X4] ||| un enfant [X][X4] ||| ||| ...
    [X][X3] 4 [X][X4] ||| [X][X3] garcon [X][X4] ||| 0-0 ||| ...
    3 [X][X3] ||| un [X][X1] ||| ||| ...

However, I quickly found that this does not give the expected result (in fact, no translation is found most of the time). After analyzing a bit more, I think the problem is that I am generating some rules with more than 2 non-terminals. moses-chart does not complain about them, but it does not seem to be able to use them properly. I know hiero-style parsing does not work well with rules having more than 2 NTs, so I guess this is to be expected. However, since it is not explicitly mentioned anywhere, I wanted to confirm it.

Also, does anyone have a suggestion for doing what I want to do with moses-chart?
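The position-encoding trick described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names to_positions, nt and rule_line are hypothetical, the field layout simply mirrors the rule lines shown in this message rather than any official Moses specification, and the score field is left as the same "..." placeholder.

```python
def to_positions(sentence):
    """Replace each word by its 1-based position in the sentence."""
    return " ".join(str(i) for i, _ in enumerate(sentence.split(), start=1))


def nt(label):
    """Render a non-terminal in the [X][label] shape used in this message."""
    return f"[X][{label}]"


def rule_line(source, target, lhs, alignment="", scores="..."):
    """Build one rule line with fields separated by '|||'.

    `source` and `target` are token lists (position strings, target words,
    or non-terminals already rendered with nt()). `lhs` is the label of the
    left-hand side, appended to both sides as in the examples above.
    `scores` stays a placeholder; real feature values are not assumed here.
    """
    src = " ".join(source + [nt(lhs)])
    tgt = " ".join(target + [nt(lhs)])
    line = " ||| ".join([src, tgt, alignment, scores])
    # Collapse the double space left behind by an empty alignment field.
    return " ".join(line.split())


if __name__ == "__main__":
    print(to_positions("he is a boy"))
    print(rule_line(["3", "4"], ["un", "garcon"], "X4"))
    print(rule_line([nt("X3"), "4"], [nt("X3"), "garcon"], "X4", "0-0"))
```

Because every source token is a unique position, each rule can only match the span it was extracted from, which is what restricts the decoder to the derivations already found by the external parser.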
(Since I already have the source-side parsing figured out, I don't think there are theoretical problems with having more than 2 NTs per rule in my case; however, I probably need to somehow provide the set of possible parses to moses-chart.)

I could also consider hacking into the moses-cmd code and trying to replace the source-parsing component with my own, while re-using some of the machinery. Does that make sense, or am I better off giving up on the idea of re-using Moses code? Or is anybody aware of another open-source decoder that would be better suited to my case?

Thanks a lot to anyone who took the time to read my long explanations, anyway :-)

Fabien