Hi Colin,

Yes, I totally agree, grouping the fixed features together is the
right way to go. It would ideally go in the wrapper (mert-moses.pl) so
it could also be used with line-search MERT and PRO, but as I recall,
in practice it is hard to make that sort of thing work in there.

How hard would it be to do in kbmira instead?

Cheers, Alex


On Wed, Feb 6, 2013 at 10:49 PM, Cherry, Colin
<[email protected]> wrote:
> Hi Alex,
>
> I'm afraid it does not, but I could certainly hack something in.
>
> I would be a little nervous about what this would do to MIRA. During MIRA 
> training, the scale of the features can change dramatically - I always start 
> by normalizing the weight vector to squared norm=1, and by the time I'm done 
> passing through the n-best lists 60 times, the squared norm may have gotten 
> much larger. If I keep a feature fixed, it may quickly fall out of scale and 
> become irrelevant. Or maybe MIRA will mathmagically work to keep the other 
> features in scale. It's not clear to me without checking the literature. I 
> think Brian Roark held a single feature fixed in some of his perceptron work 
> for speech recognition, so that would be a place to start.
>
> Is there an alternative to holding specific weights constant? If there is a  
> group of features to be fixed (say the decoder's dense features), then I 
> would suggest presenting their weighted sum to MIRA as a single feature, 
> which MIRA can continue to scale appropriately using the meta-feature's 
> single weight. After training, the "fixed" features' weights would be the 
> product of the single meta-weight and the original fixed weight, which can go 
> back in the decoder.
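[A minimal sketch of the meta-feature trick described above, assuming the
features live in a plain dict; the names (`fixed_weights`, `collapse`,
`expand`) are illustrative, not Moses code:]

```python
# Hypothetical sketch of "present the fixed features as one meta-feature":
# collapse the fixed group into its weighted sum before tuning, then
# multiply the learned meta-weight back out for the decoder afterwards.

fixed_weights = {"lm": 0.5, "tm": 0.3, "wp": -0.2}  # decoder's fixed dense weights

def collapse(features):
    """Replace the fixed features by their weighted sum (one meta-feature)."""
    meta = sum(fixed_weights[n] * features.pop(n) for n in list(fixed_weights))
    features["meta"] = meta
    return features

def expand(meta_weight):
    """After tuning, recover per-feature weights to put back in the decoder."""
    return {name: meta_weight * w for name, w in fixed_weights.items()}

feats = collapse({"lm": 2.0, "tm": 1.0, "wp": 3.0, "sparse_x": 1.0})
# feats["meta"] is 0.5*2.0 + 0.3*1.0 + (-0.2)*3.0, i.e. about 0.7
new_weights = expand(1.5)  # suppose MIRA settled on meta-weight 1.5
# new_weights is roughly {"lm": 0.75, "tm": 0.45, "wp": -0.3}
```

[This keeps the fixed features' relative proportions frozen while MIRA can
still scale the whole group against the free features via the single
meta-weight.]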
>
> I hope that makes sense! I'm willing to add the weight-fixing feature, it's 
> easy enough to do, but I thought it would be worth having this conversation 
> first.
>
> -- Colin
>
> On 2013-02-06, at 11:43 AM, Alexander Fraser wrote:
>
>> Another batch MIRA question, perhaps for Colin this time: does kbmira
>> support only optimizing some feature weights (i.e., holding the other
>> weights constant)?
>>
>> Cheers, Alex
>>
>>
>> On Mon, Feb 4, 2013 at 3:06 PM, Alexander Fraser
>> <[email protected]> wrote:
>>> That's great - thanks!
>>>
>>> On Mon, Feb 4, 2013 at 2:29 PM, Barry Haddow <[email protected]> 
>>> wrote:
>>>> Hi Alex
>>>>
>>>> Yes, you can use batch mira for training sparse features, it works the same
>>>> way as PRO does in Moses.
>>>>
>>>> Unfortunately documentation on sparse features is, well, sparse... But the
>>>> n-best format is much the same as for dense features, ie
>>>>
>>>> name_1: value_1 name_2: value_2 ...
>>>>
>>>> Sparse features only get reported in the nbest if they are named in the
>>>> -report-sparse-features argument, otherwise their weighted sum will be
>>>> reported.
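[A toy parser for the "name_1: value_1 ..." feature string Barry describes,
assuming a name token ends in ":" and is followed by one or more numeric
values; feature names here are made up, and real n-best lines carry other
fields besides the feature string:]

```python
# Illustrative parser for the n-best feature string format
# "name_1: value_1 name_2: value_2 ..." where a name may be
# followed by several values (as with multi-valued dense features).

def parse_features(s):
    feats = {}
    name = None
    for tok in s.split():
        if tok.endswith(":"):          # a new feature name
            name = tok[:-1]
            feats.setdefault(name, [])
        else:                          # a value belonging to the current name
            feats[name].append(float(tok))
    return feats

feats = parse_features("LM0: -12.5 TM0: 0.3 0.1 MySparse_foo: 1")
# feats == {"LM0": [-12.5], "TM0": [0.3, 0.1], "MySparse_foo": [1.0]}
```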
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> On 04/02/13 13:13, Alexander Fraser wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> Can sparse features be used together with batch mira?
>>>>>
>>>>> Is there documentation for the n-best format of sparse features somewhere?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers, Alex
>>>>>
>>>>
>>>>
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>>
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support