Because calculating translation probabilities from sentence ids is unexpectedly beneficial?
On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
>
> So, how come this is not damaging the Edinburgh system?
>
> On 23.07.2014 17:32, Hieu Hoang wrote:
>> ah ok.
>>
>> I thought it was just for debugging. I'm not gonna change it since
>> it's gonna involve months of debugging.
>>
>> Ideally, the extract format should be fixed like the phrase-table,
>> with the last column being key-value pairs. Also, the way the key-value
>> pairs are processed should be automatic like in the decoder.
>>
>> marcin - sorry mate. you're on your own
>>
>> On 23/07/14 16:20, Philipp Koehn wrote:
>>> Hi,
>>>
>>> the sentence ID is being used for the domain indicator features.
>>>
>>> If you run phrase-extract's score and specify a domain file,
>>> it then uses the sentence IDs to find out which domain the
>>> phrase pair was found in.
>>>
>>> This has been a standard feature in Edinburgh's phrase-based system
>>> for the last 1-2 years, so if you want to make changes, make
>>> sure that this functionality still works (see [1381-5] for an example
>>> with extract* files still in place).
>>>
>>> -phi
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:15 AM, Marcin Junczys-Dowmunt
>>> <junc...@amu.edu.pl> wrote:
>>>
>>> Key-value format would actually be fine.
>>>
>>> On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
>>>> I was planning to use it for a custom feature function later.
>>>>
>>>> On 23.07.2014 13:11, Hieu Hoang wrote:
>>>>> i can change it so that the sentence id is put into a
>>>>> key-value field in the last column.
>>>>>
>>>>> what is the sentence id used for? is it just for debugging
>>>>> purposes?
>>>>>
>>>>> On 23 July 2014 11:36, Marcin Junczys-Dowmunt
>>>>> <junc...@amu.edu.pl> wrote:
>>>>>
>>>>> Hi,
>>>>> I am using train-model.perl with
>>>>>
>>>>> --extract-options="--IncludeSentenceId"
>>>>>
>>>>> and it seems that the sentence id is somehow getting into the
>>>>> phrase table as a count and is later used for the phrase translation
>>>>> weight calculation. For instance, the extract (last column is the Id):
>>>>>
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
>>>>>
>>>>> results in a phrase table entry like this:
>>>>>
>>>>> #c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||
>>>>>
>>>>> The count is equal to the sum of the sentence ids, which of
>>>>> course makes the phrase probability useless.
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>> --
>>>>> Hieu Hoang
>>>>> Research Associate
>>>>> University of Edinburgh
>>>>> http://www.hoang.co.uk/hieu

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
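
For anyone who wants to check the arithmetic in Marcin's report, here is a
minimal Python sketch. It is not Moses code. It only assumes, as the report
suggests, that the trailing column of each extract line is being read as a
count and summed. The function names are made up for illustration.

```python
# Illustrative only: reproduces the suspicious count from the thread.
# Assumption: the scorer treats the last "|||" field as a count and sums it.

PHRASE_PAIR = (
    "#c the compound or process ||| #c verbindung oder verfahren "
    "||| 0-0 2-1 3-2 4-3 ||| "
)

# Sentence ids exactly as they appear in the quoted extract lines.
SENTENCE_IDS = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]

EXTRACT_LINES = [PHRASE_PAIR + str(sid) for sid in SENTENCE_IDS]


def last_field(line):
    """Return the trailing '|||'-separated field of an extract line."""
    return line.split("|||")[-1].strip()


def summed_last_column(lines):
    """Hypothetical faulty behaviour: sum the last column as if it were a count."""
    return sum(float(last_field(line)) for line in lines)


def occurrence_count(lines):
    """Expected behaviour: one occurrence per extract line."""
    return len(lines)


summed = summed_last_column(EXTRACT_LINES)
occurrences = occurrence_count(EXTRACT_LINES)

print(f"summed last column: {summed:g}")   # prints 1.14604e+07
print(f"occurrences:        {occurrences}")  # prints 6
```

The summed value matches the middle number in the counts field of the phrase
table entry (6 1.14604e+07 6), while the other two numbers match the plain
occurrence count. That is consistent with the sentence id being added up as a
count for one side of the phrase pair, though the sketch says nothing about
where in the scoring code that happens.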