Okay, the old kbmira works, so this must be part of the 3.0 changes. It seems that the feature names in the header line (FEATURES_TXT_BEGIN_0) are ignored entirely. The 2.1 kbmira would output dense feature weights using the names F1..FN, which I would then re-map back to the list in the header. kbmira 3.0 instead takes the dense feature names from the file passed in with --dense-init, as Barry pointed out.
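For anyone doing the same re-mapping, here is a rough sketch of what I mean. The function names are made up for illustration, the header layout (tag, two integers, the number of dense features, then that many names) is assumed from the example header quoted in this thread, and the 1-based F1..FN keys are assumed from the description above rather than checked against the kbmira source.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Extract the dense feature names from a features-file header such as
//   FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 ... Distortion
// Assumed layout: tag, two integers, feature count, then the names.
std::vector<std::string> headerFeatureNames(const std::string& header) {
  std::istringstream in(header);
  std::string tag;
  long first = 0, second = 0;
  std::size_t count = 0;
  if (!(in >> tag >> first >> second >> count) ||
      tag.rfind("FEATURES_TXT_BEGIN", 0) != 0) {
    throw std::runtime_error("not a features header: " + header);
  }
  std::vector<std::string> names;
  std::string name;
  while (in >> name) names.push_back(name);
  if (names.size() != count) {
    throw std::runtime_error("header name count does not match declared count");
  }
  return names;
}

// Map positional weights keyed "F1".."FN" (1-based indexing is an
// assumption) back to the names listed in the header.
std::map<std::string, double> remapWeights(
    const std::vector<std::string>& names,
    const std::map<std::string, double>& positional) {
  std::map<std::string, double> named;
  for (std::size_t i = 0; i < names.size(); ++i) {
    const std::string key = "F" + std::to_string(i + 1);
    auto it = positional.find(key);
    if (it == positional.end()) {
      throw std::runtime_error("missing weight " + key);
    }
    named[names[i]] = it->second;
  }
  return named;
}
```

Both functions throw rather than guess when the header and the weights disagree, since a silent off-by-one here would assign weights to the wrong features.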
Thanks for your help!

matt

> On Feb 27, 2015, at 1:21 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
> Although, those old successful runs might have been with the old Moses
> kbmira. I'll look into this and report back.
>
> matt
>
>> On Feb 27, 2015, at 12:19 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>
>> Hi Barry - Thanks for the response. I don't think that's it, because I use
>> the exact same approach for lots of other tuning runs. Isn't it the header
>> line of the features file that lists dense features? I've been using this
>> format, where dense features are listed in each header line, and then sparse
>> features in the individual lines:
>>
>> FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 WordPenalty PhrasePenalty Distortion
>> -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8
>> -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 OOVPenalty=-100
>>
>> This works in lots of places (although, it also raises a separate question,
>> of whether kbmira actually distinguishes between sparse and dense features?
>> I seem to remember Colin once saying that there is a single group weight
>> between the two groups, but I've never been able to find this in the code).
>>
>> matt
>>
>>> On Feb 26, 2015, at 5:35 PM, Barry Haddow <bhad...@staffmail.ed.ac.uk> wrote:
>>>
>>> Hi Matt
>>>
>>> When mert-moses.pl runs kbmira, it always supplies a list of the dense
>>> features (and their initial values) using the --dense-init parameter. I
>>> think this is your problem. I've attached a typical file used for this
>>> feature list.
>>>
>>> Of course, kbmira should have a sensible message rather than a segfault.
>>> This is probably my doing,
>>>
>>> cheers - Barry
>>>
>>> On 26/02/15 22:18, Matt Post wrote:
>>>> kbmira segfaults on the following command:
>>>>
>>>>     kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o mert.out
>>>>
>>>> Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be
>>>> downloaded here:
>>>>
>>>>     https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
>>>>     https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
>>>>
>>>> I tracked it down to this line of mert/FeatureStats.cpp.
>>>>
>>>>     std::string SparseVector::decode(std::size_t id)
>>>>     {
>>>>       return m_id_to_name[id];
>>>>     }
>>>>
>>>> Any obvious ideas before I go down this rabbit hole? I verified there are
>>>> no blank lines or anything else funny with the formatting, at least as far
>>>> as I can tell (all dense features, plus one sparse feature,
>>>> OOVPenalty=-100, showing up occasionally).
>>>>
>>>> matt
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>> <run1.dense>
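
On Barry's point that a sensible message would be better than a segfault: a minimal sketch of a bounds-checked lookup, in case it is useful. NameTable and add are hypothetical stand-ins invented for this example; the actual SparseVector keeps its own m_id_to_name member, whose type is not shown in this thread and is assumed here to be a vector of strings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the id-to-name table used by decode().
class NameTable {
 public:
  // Register a feature name and return the id assigned to it.
  std::size_t add(const std::string& name) {
    m_id_to_name.push_back(name);
    return m_id_to_name.size() - 1;
  }

  // Report an unknown id with a readable error instead of indexing
  // past the end of the table.
  const std::string& decode(std::size_t id) const {
    if (id >= m_id_to_name.size()) {
      throw std::out_of_range(
          "unknown feature id " + std::to_string(id) + " (only " +
          std::to_string(m_id_to_name.size()) + " names registered)");
    }
    return m_id_to_name[id];
  }

 private:
  std::vector<std::string> m_id_to_name;  // assumed container type
};
```

With a check like this, a features file whose dense names were never registered would presumably surface as an error naming the offending id rather than as a crash.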
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support