If you want to use an automobile analogy, then the TM is the engine which powers 
the vehicle. You, as an investor, have a few choices before you. Your objective 
is to make the car run faster. Would you invest your money in:

a) the guy who says it is a desirable feature to keep an inefficient, 
fuel-guzzling motor that breaks down constantly, such that you need to get out 
and push it (tuning), and that it would therefore be much more preferable to 
optimise the aerodynamics of the vehicle and install a rear window heater to 
keep your hands warm while you're pushing it

b) the guy who says: well, here's a stroke of genius, why don't we build a more 
powerful engine that uses less fuel and doesn't break down, with no need to get 
out and push (tuning or pruning)?

Honest replies only, please.

James

________________________________________
From: moses-support-boun...@mit.edu <moses-support-boun...@mit.edu> on behalf 
of Burger, John D. <j...@mitre.org>
Sent: Thursday, June 18, 2015 6:32 PM
To: moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses

On Jun 17, 2015, at 11:54, Read, James C <jcr...@essex.ac.uk> wrote:

> The question remains: why isn't the system capable of finding the most likely 
> translations without the LM?

Even if it weren't ill-posed, I don't find this to be an interesting question 
at all. This is like trying to improve automobile transmissions by disabling 
the steering. These are the parts we have, and they all work together.

It's not as if human translators don't use their own internal language models.

- John Burger
  MITRE

> Evidently, if you filter the phrase table then the LM is not as important as 
> you might think. The question remains: why isn't the system capable of finding 
> the most likely translations without the LM? Why do I need to filter to help 
> the system find them? This is undesirable behaviour. Clearly a bug.
>
> I include the code I used for filtering. As you can see, only the 4th score 
> was used as the filtering criterion.
>
> #!/usr/bin/perl -w
> #
> # Filters a phrase table, keeping only the phrase pairs whose fourth
> # score is above a given threshold.
> #
> use strict;
> use warnings;
> use Getopt::Long;
>
> my $min;
> my $phrase_table;
> my $filtered_table;
>
> GetOptions(     'min=f'         => \$min,
>                 'out=s'         => \$filtered_table,
>                 'in=s'          => \$phrase_table);
> die "ERROR: must give a threshold, a phrase table input file and an output file\n"
>         unless (defined $min && $phrase_table && $filtered_table);
> die "ERROR: file $phrase_table does not exist\n" unless (-e $phrase_table);
> open (PHRASETABLE, "<$phrase_table")
>         or die "FATAL: Could not open phrase table $phrase_table\n";
> open (FILTEREDTABLE, ">$filtered_table")
>         or die "FATAL: Could not open output file $filtered_table\n";
>
> while (my $line = <PHRASETABLE>)
> {
>         chomp $line;
>         my @columns = split ('\|\|\|', $line);
>
>         # check that the file is a well-formatted phrase table
>         if (scalar @columns < 4)
>         {
>                 die "ERROR: input file is not a well-formatted phrase table. "
>                   . "A phrase table must have at least four columns, each separated by |||\n";
>         }
>
>         # get the scores; split on ' ' so that the leading space in the
>         # field is skipped and $scores[3] really is the fourth score
>         my @scores = split ' ', $columns[2];
>
>         # keep the phrase pair only if the fourth score is above the threshold
>         if ($scores[3] > $min)
>         {
>                 print FILTEREDTABLE $line."\n";
>         }
> }
>
> close (PHRASETABLE);
> close (FILTEREDTABLE);
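>
> For reference, a phrase-table line has roughly the following shape (the
> values below are invented for illustration; only the overall layout matters):
>
>         das Haus ||| the house ||| 0.6 0.55 0.7 0.65 ||| 0-0 1-1 ||| ...
>
> so $columns[2] above is the whitespace-separated score field and $scores[3]
> is the fourth score in it.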
>
>
>
> From: Matt Post <p...@cs.jhu.edu>
> Sent: Wednesday, June 17, 2015 5:25 PM
> To: Read, James C
> Cc: Marcin Junczys-Dowmunt; moses-support@mit.edu; Arnold, Doug
> Subject: Re: [Moses-support] Major bug found in Moses
>
> I think you are misunderstanding how decoding works. The highest-weighted 
> translation of each source phrase is not necessarily the one with the best 
> BLEU score. This is why the decoder retains many options, so that it can 
> search among them (together with their reorderings). The LM is an important 
> component in making these selections.
>
> Also, how did you weight the many probabilities attached to each phrase (to 
> determine which was the most probable)? The tuning phase of decoding selects 
> weights designed to optimize BLEU score. If you weighted them evenly, that is 
> going to exacerbate this experiment.
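>
> (To make that concrete, here is a toy sketch in Perl, with invented feature
> values and weights rather than anything taken from Moses: the decoder ranks
> candidates by a weighted sum of log feature scores, so the candidate with the
> highest single TM probability need not get the best overall model score.)
>
> #!/usr/bin/perl
> # Toy illustration, not Moses code: score two candidate translations of one
> # source phrase with a weighted combination of TM probability, LM probability
> # and a word penalty. All numbers are made up.
> use strict;
> use warnings;
>
> my %candidates = (
>         'the house' => { tm => 0.70, lm => 0.020, words => 2 },
>         'house'     => { tm => 0.75, lm => 0.001, words => 1 },
> );
>
> # invented weights; in practice tuning (e.g. MERT) sets these
> my %weights = ( tm => 0.3, lm => 0.5, word_penalty => -0.1 );
>
> for my $cand (sort keys %candidates) {
>         my $f = $candidates{$cand};
>         my $score = $weights{tm} * log($f->{tm})
>                   + $weights{lm} * log($f->{lm})
>                   + $weights{word_penalty} * $f->{words};
>         printf "%-10s model score = %.3f\n", $cand, $score;
> }
> # 'house' has the higher TM probability, but once the LM and word penalty
> # are weighted in, 'the house' ends up with the better model score.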
>
> matt
>
>
>
>> On Jun 17, 2015, at 10:22 AM, Read, James C <jcr...@essex.ac.uk> wrote:
>>
>> All I did was break the link to the language model and then perform 
>> filtering. How is that a methodological mistake? How else would one test the 
>> efficacy of the TM in isolation?
>>
>> I remain convinced that this is undesirable behaviour and therefore a bug.
>>
>> James
>>
>>
>> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>> Sent: Wednesday, June 17, 2015 5:12 PM
>> To: Read, James C
>> Cc: Arnold, Doug; moses-support@mit.edu
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> Hi James
>> No, not at all. I would say that is expected behaviour; it's how search 
>> spaces and optimization work. If anything these are methodological mistakes 
>> on your side, sorry. You are doing weird things to the decoder and then you 
>> are surprised to get weird results from it.
>> On 2015-06-17 16:07, Read, James C wrote:
>>>
>>> So, do we agree that this is undesirable behaviour and therefore a bug?
>>>
>>> James
>>>
>>> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>>> Sent: Wednesday, June 17, 2015 5:01 PM
>>> To: Read, James C
>>> Subject: Re: [Moses-support] Major bug found in Moses
>>>
>>> As I said, with an unpruned phrase table and a decoder that just optimizes 
>>> some unreasonable set of weights, all bets are off, so if you get a very low 
>>> BLEU score there, it's not surprising. It's probably jumping around in a 
>>> very weird search space. With a pruned phrase table you restrict the search 
>>> space VERY strongly. Nearly everything that will be produced is a 
>>> half-decent translation. So yes, I can imagine that would happen.
>>> Marcin
>>> On 2015-06-17 15:56, Read, James C wrote:
>>> You would expect an improvement of 37 BLEU points?
>>>
>>> James
>>>
>>>
>>> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>>> Sent: Wednesday, June 17, 2015 4:32 PM
>>> To: Read, James C
>>> Cc: Moses-support@mit.edu; Arnold, Doug
>>> Subject: Re: [Moses-support] Major bug found in Moses
>>>
>>> Hi James,
>>> there are many more factors involved than just probability, for instance 
>>> word penalties, phrase penalties etc. To be able to validate your claim you 
>>> would need to set the weights for all those non-probabilities to zero. 
>>> Otherwise there is no hope that Moses will produce anything similar to the 
>>> most probable translation, and given that, it is no surprise that the 
>>> translations may differ. A pruned phrase table will naturally produce less 
>>> noise, so I would say the behaviour you describe is exactly what I would 
>>> expect to happen.
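>>> (A toy illustration of that point, with invented numbers: suppose a source 
>>> phrase "a b c" can be covered either by one pair "a b c -> x y z" with 
>>> probability 0.2, or by three pairs "a -> x", "b -> y", "c -> z" with 
>>> probability 0.9 each. On TM probabilities alone the three-pair derivation 
>>> wins, 0.9^3 = 0.73 versus 0.2. But if the phrase penalty contributes, say, 
>>> -1 per phrase in log space, the model scores become log 0.2 - 1 = -2.6 
>>> versus 3*log 0.9 - 3 = -3.3, and the single long pair wins. Which 
>>> translation comes out on top depends on the weights, not on the TM 
>>> probabilities alone.)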
>>> Best,
>>> Marcin
>>> On 2015-06-17 15:26, Read, James C wrote:
>>> Hi all,
>>>
>>> I tried unsuccessfully to publish experiments showing this bug in Moses' 
>>> behaviour. As a result I have lost interest in attempting to have my work 
>>> published. Nonetheless, I think you should all be aware of an anomaly in 
>>> Moses' behaviour which I have thoroughly exposed and which should be easy 
>>> enough for you to reproduce.
>>>
>>> As I understand it, the TM logic of Moses should select the most likely 
>>> translations according to the TM. I would therefore expect a run of Moses 
>>> with no LM to find the sentences which are the most likely, or at least 
>>> close to the most likely, according to the TM.
>>>
>>> To test this behaviour I performed two runs of Moses: one with an 
>>> unfiltered phrase table, the other with a filtered phrase table which kept 
>>> only the most likely phrase pair for each source-language phrase. The 
>>> results were truly startling: I observed huge differences in BLEU score, 
>>> with the filtered phrase table producing much higher scores. The beam size 
>>> used was the default width of 100. I would not have been surprised if the 
>>> differences in BLEU score were minimal, but they were quite high.
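>>>
>>> (For concreteness, the kind of filtering meant here can be sketched as 
>>> follows; this is an illustrative reconstruction, not the exact script used, 
>>> and it simply keeps, for each source phrase, the pair with the highest 
>>> fourth score.)
>>>
>>> #!/usr/bin/perl
>>> # Sketch: read a phrase table on STDIN and print only the single best
>>> # phrase pair per source phrase, judged by the fourth score.
>>> use strict;
>>> use warnings;
>>>
>>> my %best;        # source phrase => [best score, full line]
>>> while (my $line = <STDIN>) {
>>>         chomp $line;
>>>         my @columns = split /\s*\|\|\|\s*/, $line;
>>>         next if scalar @columns < 3;
>>>         my @scores = split ' ', $columns[2];
>>>         next if scalar @scores < 4;
>>>         if (!exists $best{$columns[0]} || $scores[3] > $best{$columns[0]}[0]) {
>>>                 $best{$columns[0]} = [ $scores[3], $line ];
>>>         }
>>> }
>>> print "$best{$_}[1]\n" for sort keys %best;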
>>>
>>> I have been unable to find a logical explanation for this behaviour other 
>>> than to conclude that there must be some kind of bug in Moses which causes 
>>> a TM-only run of Moses to perform poorly at finding the most likely 
>>> translations according to the TM when less likely phrase pairs are 
>>> included in the race.
>>>
>>> I hope this information will be useful to the Moses community and that the 
>>> cause of the behaviour can be found and rectified.
>>>
>>> James
>>>
>>>
>>>
>>>
>>>
>>
>>
>


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

