Hi Colin,

Yes, I totally agree, grouping the fixed features together is the right way to go. Ideally it would go in the wrapper (mert-moses.pl) so it could also be used with line-search MERT and PRO, but as I recall, it is hard in practice to make that kind of thing work in there.
How hard would it be to do in kbmira instead?

Cheers, Alex

On Wed, Feb 6, 2013 at 10:49 PM, Cherry, Colin <[email protected]> wrote:
> Hi Alex,
>
> I'm afraid it does not, but I could certainly hack something in.
>
> I would be a little nervous about what this would do to MIRA. During MIRA
> training, the scale of the features can change dramatically - I always start
> by normalizing the weight vector to squared norm=1, and by the time I'm done
> passing through the n-best lists 60 times, the squared norm may have gotten
> much larger. If I keep a feature fixed, it may quickly fall out of scale and
> become irrelevant. Or maybe MIRA will mathmagically work to keep the other
> features in scale. It's not clear to me without checking the literature. I
> think Brian Roark held a single feature fixed in some of his perceptron work
> for speech recognition, so that would be a place to start.
>
> Is there an alternative to holding specific weights constant? If there is a
> group of features to be fixed (say the decoder's dense features), then I
> would suggest presenting their weighted sum to MIRA as a single feature,
> which MIRA can continue to scale appropriately using the meta-feature's
> single weight. After training, the "fixed" features' weights would be the
> product of the single meta-weight and the original fixed weight, which can
> go back in the decoder.
>
> I hope that makes sense! I'm willing to add the weight-fixing feature, it's
> easy enough to do, but I thought it would be worth having this conversation
> first.
>
> -- Colin
>
> On 2013-02-06, at 11:43 AM, Alexander Fraser wrote:
>
>> Another batch MIRA question, perhaps for Colin this time: does kbmira
>> support only optimizing some feature weights (i.e., holding the other
>> weights constant)?
>>
>> Cheers, Alex
>>
>>
>> On Mon, Feb 4, 2013 at 3:06 PM, Alexander Fraser
>> <[email protected]> wrote:
>>> That's great - thanks!
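[To make the meta-feature trick concrete, here is a minimal sketch. It assumes a hypothetical feature dictionary per hypothesis; the names (`collapse`, `unfold`, the feature keys) are illustrative and not part of the Moses API.]

```python
# Fixed group of dense features with their frozen decoder weights
# (illustrative names and values, not real Moses weights).
fixed_weights = {"lm": 0.5, "tm": 0.3, "wp": -0.1}

def collapse(features):
    """Replace the fixed features with one meta-feature: their weighted sum.
    The optimizer then only sees (and scales) the single 'meta' weight."""
    meta = sum(fixed_weights[name] * features[name] for name in fixed_weights)
    rest = {k: v for k, v in features.items() if k not in fixed_weights}
    rest["meta"] = meta
    return rest

def unfold(meta_weight):
    """After tuning, recover per-feature decoder weights as the product of
    the learned meta-weight and each original fixed weight."""
    return {name: meta_weight * w for name, w in fixed_weights.items()}
```

The point is that MIRA can still rescale the whole fixed group relative to the free features via the single meta-weight, so the group cannot "fall out of scale" the way an individually frozen weight might.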
>>>
>>> On Mon, Feb 4, 2013 at 2:29 PM, Barry Haddow <[email protected]>
>>> wrote:
>>>> Hi Alex
>>>>
>>>> Yes, you can use batch mira for training sparse features, it works the
>>>> same way as PRO does in Moses.
>>>>
>>>> Unfortunately documentation on sparse features is, well, sparse... But
>>>> the n-best format is much the same as for dense features, ie
>>>>
>>>> name_1: value_1 name_2: value_2 ...
>>>>
>>>> Sparse features only get reported in the nbest if they are named in the
>>>> -report-sparse-features argument, otherwise their weighted sum will be
>>>> reported.
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> On 04/02/13 13:13, Alexander Fraser wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> Can sparse features be used together with batch mira?
>>>>>
>>>>> Is there documentation for the n-best format of sparse features
>>>>> somewhere?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers, Alex
>>>>>
>>>>
>>>>
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>>
> _______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
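[For reference, a small sketch of parsing the `name_1: value_1 name_2: value_2 ...` feature field Barry describes. It handles only the simple one-value-per-name form shown above; the real Moses dense format can also put several values after one label, which this hypothetical helper does not attempt.]

```python
def parse_features(field):
    """Parse a feature string like 'tm_src~NP: 1 lm_oov: 2' into a dict,
    assuming strictly alternating 'name:' and value tokens."""
    toks = field.split()
    feats = {}
    for name_tok, val_tok in zip(toks[::2], toks[1::2]):
        feats[name_tok.rstrip(":")] = float(val_tok)
    return feats
```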
