OK, thanks for the info, both of you!

On Thu, Feb 7, 2013 at 3:46 PM, Cherry, Colin <[email protected]> wrote:
> Hi Alex,
>
> It would be possible to incorporate the change into kbmira, but I'm not eager
> to do so. As you and Barry have mentioned, this makes more sense at the
> wrapper level. There's no need for any tuner to know about the merging and
> splitting of features.
>
> -- Colin
>
> On 2013-02-07, at 9:33 AM, Barry Haddow wrote:
>
>> Hi Alex
>>
>> There is already some provision for grouping features, so it should be
>> possible to implement what you need at the wrapper level.
>>
>> At the moment, you can train a sparse feature model with mert by
>> omitting the -report-sparse-features flag from Moses, which causes the
>> sparse features to be summed before being written into the n-best list.
>> There is also provision for a hybrid "pro-mert" training, where at each
>> step all features are optimised with pro, then the dense ones are
>> re-optimised with mert.
>>
>> cheers - Barry
>>
>> On 07/02/13 11:07, Alexander Fraser wrote:
>>> Hi Colin,
>>>
>>> Yes, I totally agree, grouping the fixed features together is the
>>> right way to go. It would ideally go in the wrapper (mert-moses.pl) so
>>> it could also be used with line-search MERT and PRO, but as I recall,
>>> it is hard in practice to make stuff like that work in there.
>>>
>>> How hard would it be to do in kbmira instead?
>>>
>>> Cheers, Alex
>>>
>>>
>>> On Wed, Feb 6, 2013 at 10:49 PM, Cherry, Colin <[email protected]> wrote:
>>>> Hi Alex,
>>>>
>>>> I'm afraid it does not, but I could certainly hack something in.
>>>>
>>>> I would be a little nervous about what this would do to MIRA. During MIRA
>>>> training, the scale of the features can change dramatically - I always
>>>> start by normalizing the weight vector to squared norm = 1, and by the time
>>>> I'm done passing through the n-best lists 60 times, the squared norm may
>>>> have gotten much larger.
>>>> If I keep a feature fixed, it may quickly fall
>>>> out of scale and become irrelevant. Or maybe MIRA will mathemagically work
>>>> to keep the other features in scale. It's not clear to me without checking
>>>> the literature. I think Brian Roark held a single feature fixed in some of
>>>> his perceptron work for speech recognition, so that would be a place to
>>>> start.
>>>>
>>>> Is there an alternative to holding specific weights constant? If there is
>>>> a group of features to be fixed (say the decoder's dense features), then
>>>> I would suggest presenting their weighted sum to MIRA as a single feature,
>>>> which MIRA can continue to scale appropriately using the meta-feature's
>>>> single weight. After training, the "fixed" features' weights would be the
>>>> product of the single meta-weight and the original fixed weight, which can
>>>> go back in the decoder.
>>>>
>>>> I hope that makes sense! I'm willing to add the weight-fixing feature -
>>>> it's easy enough to do - but I thought it would be worth having this
>>>> conversation first.
>>>>
>>>> -- Colin
>>>>
>>>> On 2013-02-06, at 11:43 AM, Alexander Fraser wrote:
>>>>
>>>>> Another batch MIRA question, perhaps for Colin this time: does kbmira
>>>>> support optimizing only some feature weights (i.e., holding the other
>>>>> weights constant)?
>>>>>
>>>>> Cheers, Alex
>>>>>
>>>>>
>>>>> On Mon, Feb 4, 2013 at 3:06 PM, Alexander Fraser <[email protected]> wrote:
>>>>>> That's great - thanks!
>>>>>>
>>>>>> On Mon, Feb 4, 2013 at 2:29 PM, Barry Haddow <[email protected]> wrote:
>>>>>>> Hi Alex
>>>>>>>
>>>>>>> Yes, you can use batch mira for training sparse features; it works the
>>>>>>> same way as PRO does in Moses.
>>>>>>>
>>>>>>> Unfortunately, documentation on sparse features is, well, sparse... But
>>>>>>> the n-best format is much the same as for dense features, i.e.
>>>>>>>
>>>>>>> name_1: value_1 name_2: value_2 ...
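[As a minimal sketch of the `name_1: value_1 name_2: value_2 ...` layout Barry describes, the feature segment of an n-best line could be parsed like this. The function name and the assumption that feature names end in a colon are mine, not Moses code:]

```python
def parse_feature_string(s):
    """Parse a 'name_1: value_1 name_2: value_2 ...' feature segment
    into a dict mapping each name to its list of values.
    Sketch only: assumes names are tokens ending in ':' and that every
    other token is a numeric value belonging to the most recent name."""
    feats = {}
    current = None
    for tok in s.split():
        if tok.endswith(":"):
            current = tok[:-1]
        elif current is not None:
            feats.setdefault(current, []).append(float(tok))
    return feats

# Dense features can carry several values under one name,
# e.g. a multi-column translation model score:
print(parse_feature_string("LM: -34.2 WordPenalty: -7 Phrase: 0.5 0.2"))
```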
>>>>>>>
>>>>>>> Sparse features only get reported in the n-best list if they are named
>>>>>>> in the -report-sparse-features argument; otherwise their weighted sum
>>>>>>> will be reported.
>>>>>>>
>>>>>>> cheers - Barry
>>>>>>>
>>>>>>>
>>>>>>> On 04/02/13 13:13, Alexander Fraser wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> Can sparse features be used together with batch mira?
>>>>>>>>
>>>>>>>> Is there documentation for the n-best format of sparse features
>>>>>>>> somewhere?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Cheers, Alex
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
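[Colin's meta-feature suggestion above can be sketched in a few lines. All names here are hypothetical illustrations, not part of kbmira: the fixed features are collapsed into one meta-feature whose value is their weighted sum, the tuner scales that single weight, and afterwards each fixed decoder weight is recovered as meta-weight times original weight:]

```python
# Hypothetical fixed dense weights we do not want the tuner to move.
fixed_weights = {"lm": 0.5, "tm": 0.3, "wp": -0.1}

def collapse(features):
    """Replace the fixed features' values with a single meta-feature
    whose value is their weighted sum; pass the rest through untouched."""
    meta = sum(fixed_weights[k] * features[k] for k in fixed_weights)
    rest = {k: v for k, v in features.items() if k not in fixed_weights}
    rest["meta"] = meta
    return rest

def unfold(tuned):
    """After tuning, recover per-feature decoder weights: each fixed
    weight becomes the tuned meta-weight times its original weight."""
    out = {k: v for k, v in tuned.items() if k != "meta"}
    for k, w in fixed_weights.items():
        out[k] = tuned["meta"] * w
    return out

# MIRA rescaling the meta-weight rescales the whole fixed group at once.
print(unfold({"meta": 2.0, "sparse_feat": 0.7}))
```

[The point of the trick is that MIRA is free to grow or shrink the whole fixed group via the one meta-weight, so the group never falls out of scale with the features being tuned.]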
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
