Because calculating translation probabilities from sentence ids is 
unexpectedly beneficial?

On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
>
> So, how come this is not damaging the Edinburgh system?
>
> On 23.07.2014 17:32, Hieu Hoang wrote:
>> ah ok.
>>
>> I thought it was just for debugging. I'm not gonna change it since 
>> it's gonna involve months of debugging.
>>
>> Ideally, the extract format should be fixed like the phrase-table, 
>> with the last column being key-value pairs. Also, the way the key-value 
>> pairs are processed should be automatic, like in the decoder.
>>
>> marcin - sorry mate. you're on your own
>>
>> On 23/07/14 16:20, Philipp Koehn wrote:
>>> Hi,
>>>
>>> the sentence ID is being used for the domain indicator features.
>>>
>>> If you run phrase-extract's score with a domain file specified,
>>> it then uses the sentence IDs to find out which domain the
>>> phrase pair was found in.
>>>
>>> This has been a standard feature in Edinburgh's phrase-based system
>>> for the last 1-2 years, so if you want to make changes, make
>>> sure that this functionality still works (see [1381-5] for an example
>>> with extract* files still in place).
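
The lookup described above can be sketched roughly as follows. This is only an illustration, not the code in score: the domain boundaries, the domain names, and the helper names `domain_of` and `domain_indicators` are all hypothetical, and the real domain-file format is not shown in this thread. The sentence IDs are the ones from the extract lines quoted further down.

```python
from bisect import bisect_left

# Hypothetical domain boundaries: (last sentence ID in the domain, domain name),
# sorted by ID. These values are made up for illustration only.
DOMAIN_ENDS = [(2000000, "domain_a"), (5000000, "domain_b")]


def domain_of(sentence_id):
    """Return the name of the domain whose ID range contains sentence_id."""
    ends = [end for end, _ in DOMAIN_ENDS]
    idx = bisect_left(ends, sentence_id)
    if idx == len(DOMAIN_ENDS):
        raise ValueError(f"sentence ID {sentence_id} is beyond the last domain")
    return DOMAIN_ENDS[idx][1]


def domain_indicators(sentence_ids):
    """Map each domain name to 1 if the phrase pair occurred in it, else 0."""
    seen = {domain_of(sid) for sid in sentence_ids}
    return {name: int(name in seen) for _, name in DOMAIN_ENDS}


# Sentence IDs taken from the extract lines quoted in this thread.
ids = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]
indicators = domain_indicators(ids)
print(indicators)
```

The point of the sketch is that the sentence ID is only a key for finding the domain; it is never meant to be treated as a count.
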
>>>
>>> -phi
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:15 AM, Marcin Junczys-Dowmunt 
>>> <junc...@amu.edu.pl> wrote:
>>>
>>>     Key-value format would actually be fine.
>>>
>>>     On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
>>>>     I was planning to use it for a custom feature function later.
>>>>
>>>>     On 23.07.2014 13:11, Hieu Hoang wrote:
>>>>>     i can change it so that the sentence id is put into a
>>>>>     key-value field in the last column.
>>>>>
>>>>>     what is the sentence id used for? is it just for debugging
>>>>>     purposes?
>>>>>
>>>>>
>>>>>     On 23 July 2014 11:36, Marcin Junczys-Dowmunt
>>>>>     <junc...@amu.edu.pl> wrote:
>>>>>
>>>>>         Hi,
>>>>>         I am using train-model.perl with
>>>>>
>>>>>         --extract-options="--IncludeSentenceId"
>>>>>
>>>>>         and it seems that the sentence ID is somehow getting into
>>>>>         the phrase table as a count and is later used in the phrase
>>>>>         translation weight calculation. For instance, this extract
>>>>>         (last column is the ID):
>>>>>
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
>>>>>
>>>>>         results in a phrase table entry like this:
>>>>>
>>>>>         #c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||
>>>>>
>>>>>         The count is equal to the sum of the sentence IDs, which of
>>>>>         course makes the phrase probability useless.
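
The arithmetic behind this observation can be checked by hand. The following is a minimal sketch that uses only the six sentence IDs and the phrase-table values quoted above; the variable names are illustrative and nothing here reproduces what score does internally.

```python
# Sentence IDs from the last column of the six extract lines quoted above.
sentence_ids = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]

# The phrase pair was extracted six times, so six is the real joint count.
joint_count = len(sentence_ids)

# Summing the IDs reproduces the inflated count seen in the phrase table.
summed_ids = sum(sentence_ids)

# Dividing the real joint count by the inflated count gives a value that
# agrees with the reported 5.23542e-07 to about five significant digits.
inflated_prob = joint_count / summed_ids

print(summed_ids)               # 11460418
print(f"{summed_ids:.5e}")      # 1.14604e+07
print(f"{inflated_prob:.4e}")   # 5.2354e-07
```

With a correct count of 6 in both places the ratio would be 1, so the score that is off by more than six orders of magnitude comes entirely from the IDs being summed as if they were counts.
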
>>>>>
>>>>>         _______________________________________________
>>>>>         Moses-support mailing list
>>>>>         Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>>>>         http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     -- 
>>>>>     Hieu Hoang
>>>>>     Research Associate
>>>>>     University of Edinburgh
>>>>>     http://www.hoang.co.uk/hieu
>>>>>
>>>>
>>>
>>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
