[Moses-support] help for reusing moses-chart in a different (dependency tree - to - string) MT system

Fabien Cromières Thu, 08 Nov 2012 00:28:11 -0800

Dear Moses community,

First, thanks for providing the research community with such a nice
open-source tool.


Second, I have some rather involved questions about the moses-chart decoder
for syntax-based MT. Hopefully someone familiar with it can answer them.
Thanks in advance!

**** Short version of the questions:  ****

* Is there any support for Synchronous Tree Substitution Grammar in
moses-chart (or any trick for using such a grammar with the decoder)?
* Can moses-chart handle rules with more than 2 non-terminals?
* Can moses-chart be given directly (in any way) a compact representation
of a set of parses of an input sentence, so that it only has to do the
remaining work of selecting the best parse?

**** Longer, more detailed, version: ****

I am trying to improve a tree-to-string MT system (it is actually
tree-to-tree, but it will be easier for me to describe it as a
tree-to-string system). And I was hoping I could somehow re-use part of the
Moses toolchain. Basically, my system uses a dependency tree representation
of the input sentence.

For example: he->is<-(a->boy)  (it is not easy to represent trees as strings
in a readable way; hopefully, this notation is intuitive enough).

I then have some Synchronous Tree Substitution rules, e.g. for
English-French (although, unlike in a normal TSG, the target side here is a
flat string).

R1: Y -> he | il
R2: X -> Y->is<-Z | Y est Z
R3: Z -> a->boy | un garcon
R4: Z -> a->boy | un enfant
R5: Z -> V->boy | V garcon
R6: V -> a | un

The rules are already extracted, selected and "mapped" to the input by my
system (in other words, the source-side parsing is already done). This means
I have a fairly compact representation of all the alternative derivations,
like this:
R1->R2<-(R3|R4|(R5<-R6))
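
To make the packed representation concrete, here is a minimal Python sketch
that unpacks it into individual derivations and their target strings. The
data layout (the RULES and FOREST names, and the use of the single letters
Y, Z, V as slot markers) is purely an assumption for illustration; it is not
how my system or Moses stores anything.

```python
from itertools import product

# Target sides of rules R1..R6 from the example above.
# Tokens that name a slot of the rule (Y, Z, V) get filled by sub-derivations.
RULES = {
    "R1": ["il"],
    "R2": ["Y", "est", "Z"],
    "R3": ["un", "garcon"],
    "R4": ["un", "enfant"],
    "R5": ["V", "garcon"],
    "R6": ["un"],
}

# Hypothetical encoding of R1->R2<-(R3|R4|(R5<-R6)):
# a node is (rule_name, {slot: [alternative child nodes]}).
FOREST = ("R2", {
    "Y": [("R1", {})],
    "Z": [("R3", {}), ("R4", {}), ("R5", {"V": [("R6", {})]})],
})


def expand(node):
    """Yield (rules_used, target_words) for every derivation below node."""
    rule, slots = node
    template = RULES[rule]
    slot_names = [tok for tok in template if tok in slots]
    # For each slot, collect every derivation of every alternative child.
    per_slot = [
        [deriv for alt in slots[name] for deriv in expand(alt)]
        for name in slot_names
    ]
    # One derivation per combination of choices across the slots.
    for combo in product(*per_slot):
        filled = dict(zip(slot_names, combo))
        used, words = [rule], []
        for tok in template:
            if tok in filled:
                sub_used, sub_words = filled[tok]
                used.extend(sub_used)
                words.extend(sub_words)
            else:
                words.append(tok)
        yield used, words


if __name__ == "__main__":
    for used, words in expand(FOREST):
        print("+".join(used), "=>", " ".join(words))
```

Note that two different derivations (via R3, and via R5 with R6) produce the
same target string here, which is why the choice has to be scored per
derivation rather than per string.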

Each derivation gives a different target sentence. From there, the problem
is extracting the best derivation/translation according to language model
and other features. It is essentially possible to do that with cube pruning
or other beam-search approaches. However it would be interesting for me to
re-use the work done for the moses-chart implementation.
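
For reference, the lazy grid exploration that cube pruning is built on can be
sketched as below. This is a simplified illustration, not the moses-chart
implementation: it assumes costs are purely additive, whereas real cube
pruning also adds a language-model cost for each combination, which makes the
search approximate. The function name and the costs in the usage example are
made up.

```python
import heapq


def k_best_combinations(left, right, k):
    """Return the k cheapest (cost, left_item, right_item) combinations.

    left and right are lists of (cost, item) sorted by ascending cost.
    Only the grid cells adjacent to already popped cells are ever scored.
    """
    if not left or not right:
        return []
    heap = [(left[0][0] + right[0][0], 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        cost, i, j = heapq.heappop(heap)
        best.append((cost, left[i][1], right[j][1]))
        # Push the two neighbours of the popped cell, if not visited yet.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (left[ni][0] + right[nj][0], ni, nj))
    return best


if __name__ == "__main__":
    # Illustrative costs only; lower is better.
    subject = [(1.0, "il")]
    complement = [(0.5, "un garcon"), (0.75, "un enfant"), (1.25, "un garcon")]
    for cost, y, z in k_best_combinations(subject, complement, 2):
        print(cost, y, "est", z)
```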

My first idea was to try and convert my TSG rules to the rule format
expected by moses-chart. Unfortunately, this way, I cannot tell moses-chart
in any way that the source side has already been parsed. However, I was
hoping that I could make it easy for moses-chart to do the source-side
parsing.

Here is how I did it. For each input sentence, I use the positions of the
words instead of the words themselves (to reduce parsing ambiguity).
For example, the input sentence:
he is a boy
becomes:
1 2 3 4
when I feed it to moses-chart.
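
In Python terms, this preprocessing step amounts to something like the
following sketch (the function name is hypothetical, and positions are
1-based to match the example above):

```python
def to_positions(sentence):
    """Replace each word by its 1-based position in the sentence.

    Returns the position string to feed to the decoder, plus a mapping
    from position back to the original word.
    """
    words = sentence.split()
    positions = [str(i) for i in range(1, len(words) + 1)]
    return " ".join(positions), dict(zip(positions, words))


if __name__ == "__main__":
    decoder_input, position_to_word = to_positions("he is a boy")
    print(decoder_input)
    print(position_to_word)
```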

Then I generate a rule file by adapting my rules accordingly (note that
each non-terminal encodes in its name the source position it should match,
ensuring that the derivations moses-chart can find are the same as those
given by the parsing done by my system):
1 [X][X1] ||| il [X][X1] ||| ||| ...
[X][X1] 2 [X][X4] [X][TOP] ||| [X][X1] est [X][X4] [X][TOP] ||| 0-0 2-2 |||
3 4 [X][X4] ||| un garcon [X][X4] ||| ||| ...
3 4 [X][X4] ||| un enfant [X][X4] ||| ||| ...
[X][X3] 4 [X][X4] ||| [X][X3] garcon [X][X4] ||| 0-0 ||| ...
3 [X][X3] ||| un [X][X3] ||| ||| ...

However, I quickly found that this was not giving the expected result (in
fact, no translation is found most of the time). After some more analysis, I
think the problem is that I am generating some rules with more than 2
non-terminals. moses-chart does not complain about them, but it does not
seem to be able to use them properly.

I know hiero-style parsing will not work well with rules having more than 2
NT, so I guess this is to be expected. However, since it is not explicitly
mentioned anywhere, I wanted to confirm that.

Also, does anyone have any suggestions for doing what I want to do with
moses-chart? (Since I already have the source-side parsing figured out, I
don't think there are theoretical problems with having more than 2 NT per
rule in my case; however, I probably need to somehow provide the set of
possible parses to moses-chart.)

I could also consider hacking into the moses-cmd code and trying to replace
the source-parsing component with my own, while re-using some of the
machinery. Does that make sense, or am I better off giving up on the idea of
re-using Moses code? Or is anybody aware of another open-source decoder that
would be better suited to my case?

Thanks a lot to anyone who took the time to read my long explanations,
anyway :-)

Fabien
