Speed aside, quality did not improve significantly?

On 14.11.2014 at 11:11, Eva Hasler wrote:
Let's say there was a bit of disillusionment about the advantages of online vs batch MIRA. Online MIRA was slow in comparison, but that's also because the implementation was still in a kind of development state and not optimised.


On Fri, Nov 14, 2014 at 9:59 AM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:

    Thanks. For some reason I usually get quite weak results with kbmira.
    What happened to that interesting online MIRA idea? Did it die due to
    lack of maintenance?

    On 14.11.2014 at 10:54, Barry Haddow wrote:
    > Hi Marcin
    >
    > Our default option would be kbmira (k-best batch MIRA). It seems to be
    > the most stable,
    >
    > cheers - Barry
    >
    > On 14/11/14 09:43, Marcin Junczys-Dowmunt wrote:
    >> Apropos MIRA, what's the current best-practice tuner for sparse
    >> features? What are you guys using now for, say, WMT-grade systems?
    >>
    >> On 14.11.2014 at 10:39, Barry Haddow wrote:
    >>> Hi Prashant
    >>>
    >>> We had to do this kind of dynamic weight update for online MIRA. The
    >>> code is still there, although it might have rotted; start by looking
    >>> at the weight update methods in StaticData,
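    >>>
    >>> Roughly, the relevant call looks like this (a sketch from memory, so
    >>> treat the method names as assumptions and check StaticData for the
    >>> actual API):
    >>>
    >>>     #include "moses/StaticData.h"
    >>>     #include "moses/ScoreComponentCollection.h"
    >>>
    >>>     // push a new weight vector into the running decoder
    >>>     void UpdateWeights(const Moses::ScoreComponentCollection& newWeights) {
    >>>       Moses::StaticData::InstanceNonConst().SetAllWeights(newWeights);
    >>>     }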
    >>>
    >>> cheers - Barry
    >>>
    >>> On 13/11/14 17:05, Prashant Mathur wrote:
    >>>> But in the CAT scenario we do it like this:
    >>>>
    >>>> translate: sentence 1
    >>>> tune: sentence 1 , post-edit 1
    >>>> translate: sentence 2
    >>>> tune: sentence 2 , post-edit 2
    >>>> ...
    >>>>
    >>>> In this case, I don't have any features generated or tuned before I
    >>>> start translating the first sentence.
    >>>>
    >>>> The old version is complicated, so I am coding on the latest version
    >>>> now.
    >>>>
    >>>> --Prashant
    >>>>
    >>>>
    >>>> On Thu, Nov 13, 2014 at 5:26 PM, Philipp Koehn
    >>>> <pko...@inf.ed.ac.uk> wrote:
    >>>>
    >>>>       Hi,
    >>>>
    >>>>       Typically you want to learn these feature weights when tuning.
    >>>>       The current setup supports and produces a sparse feature file.
    >>>>
    >>>>       -phi
    >>>>
    >>>>       On Nov 13, 2014 11:18 AM, "Prashant Mathur" <prash...@fbk.eu> wrote:
    >>>>
    >>>>           What if I don't know the feature names beforehand?
    >>>>           In that case, can I set the weights directly during
    >>>>           decoding?
    >>>>
    >>>>           On Thu, Nov 13, 2014 at 4:59 PM, Barry Haddow
    >>>>           <bhad...@staffmail.ed.ac.uk> wrote:
    >>>>
    >>>>               Hi Prashant
    >>>>
    >>>>               You add something like this to your moses.ini:
    >>>>
    >>>>               [weight-file]
    >>>>               /path/to/sparse/weights/file
    >>>>
    >>>>               The sparse weights file has the form:
    >>>>
    >>>>               name1 weight1
    >>>>               name2 weight2
    >>>>               name3 weight3
    >>>>               .
    >>>>               .
    >>>>               .
    >>>>
    >>>>               At least that's how it works in Moses v2.
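    >>>>
    >>>>               For instance, with made-up feature names (purely
    >>>>               illustrative):
    >>>>
    >>>>               tgtngram_prefix_the 0.015
    >>>>               tgtngram_suffix_cat -0.007
    >>>>               reord_swap 0.120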
    >>>>
    >>>>               cheers - Barry
    >>>>
    >>>>               On 13/11/14 15:42, Prashant Mathur wrote:
    >>>>
    >>>>                   Thanks a lot Barry for your answers.
    >>>>
    >>>>                   I have another question.
    >>>>                   When I print these sparse features at the end of
    >>>>                   decoding, all sparse features are assigned a weight
    >>>>                   of 0, because all of them were initialized during
    >>>>                   decoding. How can I set these weights for sparse
    >>>>                   features before they are evaluated?
    >>>>
    >>>>
    >>>>                   Thanks Hieu for the link.
    >>>>                   I am going to update the code as soon as I can, but
    >>>>                   it will take some time. I will get back to you when
    >>>>                   I do that.
    >>>>
    >>>>                   --Prashant
    >>>>
    >>>>
    >>>>                   On Thu, Nov 13, 2014 at 2:34 PM, Hieu Hoang
    >>>>                   <hieu.ho...@ed.ac.uk> wrote:
    >>>>
    >>>>                       Re-iterating what Barry said, you should use
    >>>>                       the github Moses if you want to create your own
    >>>>                       feature functions, especially with sparse
    >>>>                       features. The reasons:
    >>>>                       1. Adding new feature functions is a pain in
    >>>>                          v0.91. It's trivial now. You can watch my
    >>>>                          talk to find out why:
    >>>>                          http://lectures.ms.mff.cuni.cz/video/recordshow/index/44/184
    >>>>                       2. It's confusing exactly when the feature
    >>>>                          functions are computed. It's clear now
    >>>>                          (hopefully!).
    >>>>                       3. I think you had to set special flags
    >>>>                          somewhere to use sparse features. Now, all
    >>>>                          feature functions can use sparse features as
    >>>>                          well as dense features.
    >>>>                       4. I don't remember the 0.91 code very well, so
    >>>>                          I can't help you if you get stuck.
    >>>>
    >>>>
    >>>>                       On 13 November 2014 11:06, Barry Haddow
    >>>>                       <bhad...@staffmail.ed.ac.uk> wrote:
    >>>>
    >>>>                           Hi Prashant
    >>>>
    >>>>                           I tried to answer your questions inline:
    >>>>
    >>>>
    >>>>                           On 12/11/14 20:27, Prashant Mathur wrote:
    >>>>                           > Hi All,
    >>>>                           >
    >>>>                           > I have a question about implementing
    >>>>                           > sparse feature functions. I went through
    >>>>                           > the details of the implementation, but
    >>>>                           > some things are still not clear.
    >>>>                           > FYI, I am using an old version of Moses
    >>>>                           > which dates back to Release 0.91, I guess.
    >>>>                           > So I am sorry if my questions don't relate
    >>>>                           > to the latest implementation.
    >>>>
    >>>>                           This is a bad idea. The FF interface has
    >>>>                           changed a lot since 0.91.
    >>>>
    >>>>                           >
    >>>>                           > 1. I was looking at the TargetNgramFeature,
    >>>>                           > where MakePrefixNgrams adds features in the
    >>>>                           > Evaluate function. From the code it seems
    >>>>                           > MakePrefixNgrams is adding sparse features
    >>>>                           > on the fly. Is that correct?
    >>>>
    >>>>                           Yes, you can add sparse features on the fly.
    >>>>                           That's really what makes them sparse
    >>>>                           features.
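    >>>>
    >>>>                           Schematically it looks like this (a
    >>>>                           simplified sketch, not the actual
    >>>>                           TargetNgramFeature code; MyFF and
    >>>>                           ExtractNgram are invented names):
    >>>>
    >>>>                           void MyFF::Evaluate(const Hypothesis& hypo,
    >>>>                               ScoreComponentCollection* accumulator) const {
    >>>>                             // the feature name is built on the fly; it
    >>>>                             // is never declared anywhere in advance
    >>>>                             std::string name = "prefix_" + ExtractNgram(hypo);
    >>>>                             accumulator->PlusEquals(this, name, 1.0f);
    >>>>                           }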
    >>>>
    >>>>                           >
    >>>>                           > What is the weight assigned to this newly
    >>>>                           > added feature? 1 or 0?
    >>>>
    >>>>                           The weight comes from the weights file that
    >>>>                           you provide at start-up. If the feature is
    >>>>                           not in the weights file, then it gets a
    >>>>                           weight of 0.
    >>>>
    >>>>                           >
    >>>>                           > 2. What is the difference between these
    >>>>                           > two functions?
    >>>>                           >
    >>>>                           > void PlusEquals(const ScoreProducer* sp,
    >>>>                           >     const std::string& name, float score)
    >>>>                           >
    >>>>                           > void SparsePlusEquals(const std::string&
    >>>>                           >     full_name, float score)
    >>>>
    >>>>                           In the first, a string from the
    >>>>                           ScoreProducer is prepended to the name,
    >>>>                           whilst in the second the string full_name is
    >>>>                           used as the name. I think we should really
    >>>>                           use the first form, to keep features in
    >>>>                           their own namespaces, but the second form
    >>>>                           has pervaded Moses.
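    >>>>
    >>>>                           Concretely (feature names invented purely
    >>>>                           for illustration):
    >>>>
    >>>>                           // form 1: the producer's string is
    >>>>                           // prepended, yielding e.g. "tgtngram_prefix_the"
    >>>>                           accumulator->PlusEquals(this, "prefix_the", 1.0f);
    >>>>
    >>>>                           // form 2: the string is used verbatim
    >>>>                           accumulator->SparsePlusEquals(
    >>>>                               "tgtngram_prefix_the", 1.0f);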
    >>>>
    >>>>                           >
    >>>>                           > It seems like both of them are used for
    >>>>                           > updating sparse feature values. Correct?
    >>>>                           > Or does the first one point to the sparse
    >>>>                           > features of a particular FF, and the
    >>>>                           > second one to generic sparse features?
    >>>>                           >
    >>>>                           > 3. What is the structure like if I use one
    >>>>                           > StatelessFeatureFunction with unlimited
    >>>>                           > scores? Is that different from having
    >>>>                           > unlimited sparse features?
    >>>>                           >
    >>>>                           > I assume if there is one FF then there is
    >>>>                           > one weight assigned to it, but in the case
    >>>>                           > of sparse features I have one weight for
    >>>>                           > each feature.
    >>>>
    >>>>                           FFs can be dense or sparse. What that really
    >>>>                           means is that the number of feature values
    >>>>                           for a dense FF is known in advance (and so
    >>>>                           space is allocated in the feature value
    >>>>                           array), but for sparse FFs the number of
    >>>>                           feature values is not known in advance. So
    >>>>                           even dense FFs can have several weights
    >>>>                           associated with them - e.g. the phrase table
    >>>>                           features. In more recent versions of Moses a
    >>>>                           given FF can have both dense and sparse
    >>>>                           values.
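    >>>>
    >>>>                           In code terms (a sketch; MyDenseFF is made
    >>>>                           up, and GetNumScoreComponents is the hook
    >>>>                           that reports the dense score count):
    >>>>
    >>>>                           // dense: a fixed score count, declared up
    >>>>                           // front, so space can be allocated for the values
    >>>>                           size_t MyDenseFF::GetNumScoreComponents() const {
    >>>>                             return 4;  // e.g. four phrase-table scores
    >>>>                           }
    >>>>                           // sparse: no declaration needed; each named
    >>>>                           // feature carries its own weight, looked up by name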
    >>>>
    >>>>                           >
    >>>>                           > 4. In general, when should I compute the
    >>>>                           > sparse features?
    >>>>
    >>>>                           In general, computing them as soon as you
    >>>>                           can will probably make your code more
    >>>>                           efficient. When you are able to compute your
    >>>>                           sparse feature depends on the feature
    >>>>                           itself. For example, if the feature depends
    >>>>                           only on the phrase pair, then it could be
    >>>>                           computed and stored in the phrase table.
    >>>>                           This makes the phrase table bigger (which
    >>>>                           could slow you down) but saves on
    >>>>                           computation during decoding. On the other
    >>>>                           hand, a sparse reordering feature has to be
    >>>>                           mainly computed during decoding, since we do
    >>>>                           not know the ordering of segments until
    >>>>                           decoding. When I implemented sparse
    >>>>                           reordering features, though, I precomputed
    >>>>                           the feature names, since you don't want to
    >>>>                           do string concatenation during decoding.
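    >>>>
    >>>>                           A sketch of that trick (class and member
    >>>>                           names invented for illustration):
    >>>>
    >>>>                           // at load time: build every feature name once
    >>>>                           void MyReorderingFF::Load() {
    >>>>                             const char* kinds[] = {"mono", "swap", "disc"};
    >>>>                             for (size_t i = 0; i < 3; ++i)
    >>>>                               m_names.push_back(std::string("reord_") + kinds[i]);
    >>>>                           }
    >>>>
    >>>>                           // at decode time: no string concatenation,
    >>>>                           // just an indexed lookup of the cached name
    >>>>                           accumulator->PlusEquals(this, m_names[orientation], 1.0f);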
    >>>>
    >>>>
    >>>>                           cheers - Barry
    >>>>
    >>>>                           >
    >>>>                           > Thanks for the patience,
    >>>>                           > --Prashant
    >>>>                           >
    >>>>                           > PS: I am still trying to figure out
    >>>>                           > stuff, so questions might seem stupid.
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>                       --
    >>>>                       Hieu Hoang
    >>>>                       Research Associate
    >>>>                       University of Edinburgh
    >>>>                       http://www.hoang.co.uk/hieu
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >
    >




_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
