Hi Matthias,

Thanks for the information.
I tested on Moses 3.0, and adding sparse features to the phrase table seems to be working. However, I did not add any flag to the ini file, as suggested by "If a phrase table contains sparse features, then this needs to be flagged in the configuration file by adding the word sparse after the phrase table file name." Did I miss anything?

Regards,
Jian

On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <mh...@inf.ed.ac.uk> wrote:
> Hi Jian,
>
> That depends on the nature of the features you're planning to implement.
>
> In order to produce sparse features, you need to write a feature function anyway.
>
> But if it's only a handful of scores and they can be calculated at extraction time, then go for dense features and add the scores directly to the phrase table.
>
> If the scores cannot be precalculated, for instance because you need non-local information that is only available during decoding, then a feature function implementation becomes necessary.
>
> When you write a feature function that calculates scores at decoding time, it can produce dense scores, sparse scores, or both types. That's up to you.
>
> If there are plenty of scores which fire rarely, then sparse is the right choice. And you certainly need a sparse feature function implementation if you don't know in advance the overall number of feature scores it can produce.
>
> If you need information from phrase extraction in order to calculate scores at decoding time, then we have something denoted as "phrase properties". Phrase properties give you a means of storing arbitrary additional information in the phrase table. You have to extend the extraction pipeline to retrieve and store the phrase properties you require. The decoder can later read this information from the phrase table, and your feature function can utilize it in some way.
>
> A large number of sparse feature scores can somewhat slow down decoding and tuning.
> Also, you have to use MIRA or PRO for tuning, not MERT.
>
> Cheers,
> Matthias
>
> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> > Hi Matthias,
> >
> > Not for domain features.
> >
> > I want to implement some sparse features, so there are two options:
> > 1. add them to the phrase table, if that is supported
> > 2. implement sparse feature functions
> >
> > I'd like to know whether there are any differences between these two options, for example for tuning, computing sentence translation scores, etc.
> >
> > Regards,
> >
> > Jian
> >
> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <mh...@inf.ed.ac.uk> wrote:
> > Hi,
> >
> > Are you planning to use binary domain indicator features? I'm not sure whether a sparse feature function for this is currently implemented. If you're working with a small set of domains, you can employ dense indicators instead (domain-features = "indicator" in EMS). You'll have to re-extract the phrase table, though. Or process it with a script to add dense indicator values to the scores field.
> >
> > I believe that there might also be a bug in the extraction pipeline when both domain-features = "sparse indicator" and score-settings = "--GoodTuring" are active in EMS. At least it caused me trouble a couple of weeks ago. However, I must admit that I didn't investigate it further at that point.
> >
> > Anyway, the bottom line is that I recommend re-extracting with dense indicators.
> >
> > But let me know what you find regarding a sparse implementation.
> >
> > Cheers,
> > Matthias
> >
> > On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> > > Hi,
> > >
> > > Are sparse features in the phrase table, like
> > >
> > > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1
> > >
> > > still supported?
> > > If yes, what should I set in the ini file based on the example above?
> > >
> > > Thanks,
> > >
> > > Jian
> > >
> > > --
> > > Jian Zhang
> > > Centre for Next Generation Localisation (CNGL)
> > > Dublin City University
> >
> > --
> > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

--
Jian Zhang
Centre for Next Generation Localisation (CNGL) <http://www.cngl.ie/index.html>
Dublin City University <http://www.dcu.ie/>
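[For readers following this thread: the "sparse" flag quoted at the top refers to the phrase table entry in the decoder configuration. A minimal sketch, assuming the old-style moses.ini syntax and a hypothetical path; the newer feature-function syntax in Moses 3.0 may express this differently, which is likely why it worked without the flag.]

```ini
# Hypothetical old-style moses.ini fragment. The word "sparse" after the
# phrase table path is what flags that the table carries sparse features
# (such as the dom_europarl field in the example above).
# Fields: implementation, input factor, output factor, number of dense scores, path.
[ttable-file]
0 0 0 5 /path/to/phrase-table sparse
```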
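[Matthias's point about tuning can be made concrete. A sketch of a mert-moses.pl invocation that selects kbest batch MIRA instead of MERT; all paths and file names here are placeholders, only the flag choice is the point.]

```bash
# Hypothetical paths. --batch-mira switches tuning from MERT to kbest batch
# MIRA; --pairwise-ranked would select PRO instead. MERT cannot handle a
# large, open-ended sparse feature set.
$MOSES/scripts/training/mert-moses.pl \
    dev.src dev.ref \
    $MOSES/bin/moses moses.ini \
    --batch-mira \
    --working-dir mert-work
```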
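[And for the dense domain indicators Matthias recommends: a sketch of the relevant EMS configuration line; the section placement is an assumption.]

```ini
# Hypothetical EMS config fragment: dense per-domain indicator features.
# As noted above, this requires re-extracting the phrase table, and
# "sparse indicator" may interact badly with score-settings = "--GoodTuring".
[TRAINING]
domain-features = "indicator"
```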
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support