OK, thanks for the info both of you!

On Thu, Feb 7, 2013 at 3:46 PM, Cherry, Colin
<[email protected]> wrote:
> Hi Alex,
>
> It would be possible to incorporate the change into kbmira, but I'm not eager 
> to do so. As you and Barry have mentioned, this makes more sense at the 
> wrapper level. There's no need for any tuner to know about the merging and 
> splitting of features.
>
> -- Colin
>
> On 2013-02-07, at 9:33 AM, Barry Haddow wrote:
>
>> Hi Alex
>>
>> There is already some provision for grouping features, so it should be
>> possible to implement what you need at the wrapper level.
>>
>> At the moment, you can train a sparse feature model with mert by
>> omitting the -report-sparse-features flag from Moses, which causes the
>> sparse features to be summed before being written into the n-best list.
>> There is also provision for a hybrid "pro-mert" training, where at each
>> step all features are optimised with pro, then the dense ones are
>> re-optimised with mert.
>>
>> cheers - Barry
>>
>> On 07/02/13 11:07, Alexander Fraser wrote:
>>> Hi Colin,
>>>
>>> Yes, I totally agree, grouping the fixed features together is the
>>> right way to go. It would ideally go in the wrapper (mert-moses.pl) so
>>> it could also be used with line-search-MERT and PRO, but as I recall,
>>> it is hard in practice to make changes like that work in there.
>>>
>>> How hard would it be to do in kbmira instead?
>>>
>>> Cheers, Alex
>>>
>>>
>>> On Wed, Feb 6, 2013 at 10:49 PM, Cherry, Colin
>>> <[email protected]>  wrote:
>>>> Hi Alex,
>>>>
>>>> I'm afraid it does not, but I could certainly hack something in.
>>>>
>>>> I would be a little nervous about what this would do to MIRA. During MIRA 
>>>> training, the scale of the features can change dramatically - I always 
>>>> start by normalizing the weight vector to squared norm=1, and by the time 
>>>> I'm done passing through the n-best lists 60 times, the squared norm may 
>>>> have gotten much larger. If I keep a feature fixed, it may quickly fall 
>>>> out of scale and become irrelevant. Or maybe MIRA will mathmagically work 
>>>> to keep the other features in scale. It's not clear to me without checking 
>>>> the literature. I think Brian Roark held a single feature fixed in some of 
>>>> his perceptron work for speech recognition, so that would be a place to 
>>>> start.
>>>>
>>>> Is there an alternative to holding specific weights constant? If there is 
>>>> a group of features to be fixed (say the decoder's dense features), then 
>>>> I would suggest presenting their weighted sum to MIRA as a single feature, 
>>>> which MIRA can continue to scale appropriately using the meta-feature's 
>>>> single weight. After training, the "fixed" features' weights would be the 
>>>> product of the single meta-weight and the original fixed weight, which can 
>>>> go back in the decoder.
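[A minimal sketch of the meta-feature idea above, in Python. All feature
names, values, and the learned meta-weight are invented for illustration;
this is not taken from kbmira or the Moses scripts.]

```python
# Collapse a group of "fixed" dense features into a single meta-feature
# whose value is their weighted sum, let the tuner scale only the one
# meta-weight, then expand it back into per-feature decoder weights.

fixed_weights = {"lm": 0.5, "distortion": 0.3, "word_penalty": -0.1}

def meta_feature(values):
    """Weighted sum of the fixed features for one hypothesis."""
    return sum(fixed_weights[name] * values[name] for name in fixed_weights)

# Dense feature values for one hypothesis (made-up numbers):
values = {"lm": -42.0, "distortion": -6.0, "word_penalty": -8.0}
meta = meta_feature(values)  # the single feature the tuner sees, approx. -22.0

# After tuning, the learned meta-weight is multiplied back through each
# original fixed weight to recover weights the decoder can use:
meta_weight = 0.8
decoder_weights = {name: meta_weight * w for name, w in fixed_weights.items()}
```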
>>>>
>>>> I hope that makes sense! I'm willing to add the weight-fixing feature, 
>>>> it's easy enough to do, but I thought it would be worth having this 
>>>> conversation first.
>>>>
>>>> -- Colin
>>>>
>>>> On 2013-02-06, at 11:43 AM, Alexander Fraser wrote:
>>>>
>>>>> Another batch MIRA question, perhaps for Colin this time: does kbmira
>>>>> support only optimizing some feature weights (i.e., holding the other
>>>>> weights constant)?
>>>>>
>>>>> Cheers, Alex
>>>>>
>>>>>
>>>>> On Mon, Feb 4, 2013 at 3:06 PM, Alexander Fraser
>>>>> <[email protected]>  wrote:
>>>>>> That's great - thanks!
>>>>>>
>>>>>> On Mon, Feb 4, 2013 at 2:29 PM, Barry Haddow<[email protected]> 
>>>>>>  wrote:
>>>>>>> Hi Alex
>>>>>>>
>>>>>>> Yes, you can use batch mira for training sparse features, it works the 
>>>>>>> same
>>>>>>> way as PRO does in Moses.
>>>>>>>
>>>>>>> Unfortunately documentation on sparse features is, well, sparse... But 
>>>>>>> the
>>>>>>> n-best format is much the same as for dense features, i.e.
>>>>>>>
>>>>>>> name_1: value_1 name_2: value_2 ...
>>>>>>>
>>>>>>> Sparse features only get reported in the n-best if they are named in the
>>>>>>> -report-sparse-features argument; otherwise their weighted sum will be
>>>>>>> reported.
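[For illustration, a tiny Python parser for the "name_1: value_1" pair
format Barry describes above. The sparse feature names are invented;
this is a sketch, not code from the Moses scripts.]

```python
# Parse a feature field of the shape "name_1: value_1 name_2: value_2 ..."
# into a dict of feature name -> float value.
def parse_features(field):
    toks = field.split()
    # Pair each "name:" token with the value token that follows it.
    return {name.rstrip(":"): float(value)
            for name, value in zip(toks[::2], toks[1::2])}

feats = parse_features("pp_the~le: 1 pp_cat~chat: 2")
# feats maps each sparse feature name to its value as a float
```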
>>>>>>>
>>>>>>> cheers - Barry
>>>>>>>
>>>>>>>
>>>>>>> On 04/02/13 13:13, Alexander Fraser wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> Can sparse features be used together with batch mira?
>>>>>>>>
>>>>>>>> Is there documentation for the n-best format of sparse features 
>>>>>>>> somewhere?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Cheers, Alex
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>
>>
>>
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
