Hi Jian,

That depends on the nature of the features you're planning to
implement. 

In order to produce sparse features, you need to write a feature
function anyway.

But if it's only a handful of scores and they can be precomputed at
extraction time, then go for dense features and add the scores directly
to the phrase table.
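For instance, taking a standard phrase table line like the one in your example below, an extra dense score can simply be appended to the scores field (the 0.42 here is an invented placeholder value):

```
das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 0.42 ||| 0-0 1-1 ||| 5000 5000 2500
```

You then have to tell the decoder about the additional score component, i.e. raise the feature count on the PhraseDictionary line of your moses.ini (num-features=6 instead of 5) and provide a weight for it.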

If the scores cannot be precalculated, for instance because you need
non-local information that is only available during decoding, then a
feature function implementation becomes necessary.

When you write a feature function that calculates scores at decoding
time, it can produce dense scores, sparse scores, or both. That's up to
you.

If there are many scores, each of which fires only rarely, then sparse
is the right choice. And you certainly need a sparse feature function
implementation if you don't know in advance how many distinct feature
scores it can produce.

If you need information from phrase extraction in order to calculate
scores at decoding time, then there is a mechanism called "phrase
properties". Phrase properties give you a means of storing arbitrary
additional information in the phrase table. You have to extend the
extraction pipeline to compute and store the phrase properties you
require. The decoder can later read this information from the phrase
table, and your feature function can make use of it.
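As a rough illustration (the exact field layout and property syntax may differ between Moses versions, and the "Domain" property name here is just an invented example), a phrase table entry carrying a property in an additional field could look like:

```
das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| ||| {{Domain europarl}}
```

Your feature function would then look up the "Domain" property of the applied phrase pair at decoding time and score accordingly.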

A large number of sparse feature scores can somewhat slow down decoding
and tuning. Also, you have to use MIRA or PRO for tuning, not MERT.

Cheers,
Matthias


On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> Hi Matthias,
> 
> 
> Not for domain feature.
> 
> 
> I want to implement some sparse features, so there are two options:
> 1. add them to the phrase table, if that is supported
> 2. implement sparse feature functions
> 
> 
> I'd like to know whether there are any differences between these two
> options, for example for tuning, computing sentence translation
> scores, ...
> 
> 
> Regards,
> 
> 
> 
> Jian
> 
> 
> 
> On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <mh...@inf.ed.ac.uk>
> wrote:
>         Hi,
>         
>         Are you planning to use binary domain indicator features? I'm
>         not sure
>         whether a sparse feature function for this is currently
>         implemented. If
>         you're working with a small set of domains, you can employ
>         dense
>         indicators instead (domain-features = "indicator" in EMS).
>         You'll have
>         to re-extract the phrase table, though. Or process it with a
>         script to
>         add dense indicator values to the scores field.
>         
>         I believe that there might also be some bug in the extraction
>         pipeline
>         when both domain-features = "sparse indicator" and
>         score-settings =
>         "--GoodTuring" are active in EMS. At least it caused me
>         trouble a couple
>         of weeks ago. However, I must admit that I didn't investigate
>         it further
>         at that point.
>         
>         Anyway, the bottom line is that I recommend re-extracting with
>         dense
>         indicators.
>         
>         But let me know what you find regarding a sparse
>         implementation.
>         
>         Cheers,
>         Matthias
>         
>         
>         On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
>         > Hi,
>         >
>         >
>         > Is the sparse features at phrase table, like
>         >
>         >
>         >
>         > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1
>         ||| 5000
>         > 5000 2500 ||| dom_europarl 1
>         >
>         >
>         >
>         > still supported? If yes, what should I set in the ini file
>         > based on the example above?
>         >
>         >
>         > Thanks,
>         >
>         >
>         > Jian
>         >
>         >
>         > --
>         > Jian Zhang
>         > Centre for Next Generation Localisation (CNGL)
>         > Dublin City University
>         
>         > _______________________________________________
>         > Moses-support mailing list
>         > Moses-support@mit.edu
>         > http://mailman.mit.edu/mailman/listinfo/moses-support
>         
>         
>         
>         --
>         The University of Edinburgh is a charitable body, registered
>         in
>         Scotland, with registration number SC005336.
>         
> 
> 
> 
> 
> -- 
> Jian Zhang
> Centre for Next Generation Localisation (CNGL)
> Dublin City University



