Benson,

One more thing.  I forget the actual reference, but the best Chinese
segmenter that I have seen in practice was able to get away with a simple
unweighted lexicon and 2-3 words of look-ahead, using average word length
as the score.  This indicates to me that you can depth-bound your beam
search and turn it into an exhaustive search.  The lesson of their success
is that garden-path sentences (with regard to segmentation) are rare in
Chinese.
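As a rough illustration of that recipe (this is not the actual segmenter; the lexicon and example text are made up, and I'm assuming greedy selection of the first word of the best look-ahead window):

```python
# Toy sketch: unweighted lexicon, a few words of look-ahead, and
# average word length as the score. LEXICON is a made-up example.

LEXICON = {"北京", "大学", "北京大学", "生", "学生", "大学生"}
MAX_WORD = max(len(w) for w in LEXICON)

def candidates(text, start):
    """All lexicon words starting at `start`, or a single-char fallback."""
    words = [text[start:start + n] for n in range(1, MAX_WORD + 1)
             if text[start:start + n] in LEXICON]
    return words or [text[start]]

def lookahead_score(text, start, depth):
    """Best (total chars, word count) reachable in `depth` more words."""
    if depth == 0 or start >= len(text):
        return (0, 0)
    best = (0, 1)  # average length 0; any real candidate beats this
    for w in candidates(text, start):
        chars, count = lookahead_score(text, start + len(w), depth - 1)
        cand = (len(w) + chars, 1 + count)
        if cand[0] / cand[1] > best[0] / best[1]:
            best = cand
    return best

def segment(text, depth=3):
    out, i = [], 0
    while i < len(text):
        # Commit to the first word of the highest-scoring window.
        best_w, best_score = None, -1.0
        for w in candidates(text, i):
            chars, count = lookahead_score(text, i + len(w), depth - 1)
            score = (len(w) + chars) / (1 + count)
            if score > best_score:
                best_w, best_score = w, score
        out.append(best_w)
        i += len(best_w)
    return out
```

With a bounded look-ahead depth like this, the search per position is exhaustive over a small window rather than a pruned beam, which is the point above.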

On Sun, Feb 14, 2010 at 10:29 AM, Benson Margulies <bimargul...@gmail.com> wrote:

> Ted, thanks very much.
>
> Thoughts in response to both of your messages:
>
> 1: alpha-beta is being used here in the sense of E+M. Or, to be
> specific, alpha is the path sum from the beginning to the current
> 'time', and beta is the path sum from the current 'time' to the end.
>
> 2: I had read about that 'at the margin' idea and completely forgotten
> it. My starting point here is Miller and Guiness (one of whom used to
> work with me and the other of whom still does). They didn't report,
> and perhaps didn't measure, whether the examples selected via that
> 'gamma' calculation had high error rates (far from the margin?) or low
> error rates (close to the margin). They just observed that
>
> 3: A scatter-plot looks like just what the doctor ordered.
>
> 4: That paper is new to me. My stack of papers in this neighborhood is
> Collins, Miller + Guiness, Crammer (on Passive-Aggressive) and the
> Oxford paper on segmentation. Thanks for the pointer.
>
> --benson
>
> On Sat, Feb 13, 2010 at 11:18 PM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
> > Benson,
> >
> > Are you using techniques related to this:
> > http://www.it.usyd.edu.au/~james/pubs/pdf/dlp07perc.pdf ?
> >
> >
> >
> > On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies <bimargul...@gmail.com>
> > wrote:
> >
> >> Folks,
> >>
> >> Here's one of my occasional questions in which I am, in essence,
> >> bartering my code wrangling efforts for expertise on hard stuff.
> >>
> >> Consider a sequence problem addressed with a perceptron model and an
> >> ordinary Viterbi decoder. There's a standard confidence estimation
> >> technique borrowed from HMMs: calculate gamma = alpha + beta for each
> >> state, take the difference of the gammas for the best and second-best
> >> hypotheses in each column of the trellis, and take the minimum of those
> >> margins as the overall confidence of the decode. (+, of course, because
> >> in a perceptron we're summing feature weights, not multiplying
> >> probabilities.)
> >>
> >>
> >
>
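For reference, here is a minimal sketch of the gamma-margin confidence described above (toy trellis, made-up scores; additive alpha/beta as in the perceptron setting, not log-probabilities from an actual HMM):

```python
def gamma_margin_confidence(emit, trans):
    """emit[t][s]: per-state score at time t; trans[p][s]: transition score.
    Perceptron-style: scores are summed, not multiplied."""
    T, S = len(emit), len(emit[0])
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[0.0] * S for _ in range(T)]
    alpha[0] = list(emit[0])
    for t in range(1, T):
        for s in range(S):
            # Max-sum forward pass (Viterbi in the additive semiring).
            alpha[t][s] = emit[t][s] + max(
                alpha[t - 1][p] + trans[p][s] for p in range(S))
    for t in range(T - 2, -1, -1):
        for s in range(S):
            # Max-sum backward pass.
            beta[t][s] = max(
                trans[s][n] + emit[t + 1][n] + beta[t + 1][n]
                for n in range(S))
    margins = []
    for t in range(T):
        # gamma[s] = best full-path score constrained through state s at t.
        gamma = sorted(alpha[t][s] + beta[t][s] for s in range(S))
        margins.append(gamma[-1] - gamma[-2])  # best minus second best
    return min(margins)  # overall confidence of the decode
```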



-- 
Ted Dunning, CTO
DeepDyve
