I should rephrase one thing. Our current product *started out* a lot like that. It wasn't good enough for the Googles of the world, so it started to grow hair. We're looking at a statistical retread because the hair gets harder and harder to comb.
On Sun, Feb 14, 2010 at 4:47 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Benson,
>
> One more thing. I forget the actual reference, but the best Chinese
> segmenter that I have seen in practice (whose name I forget) was able to get
> away with a simple unweighted lexicon and 2-3 word look-ahead + average word
> length for score. This indicates to me that you can depth bound your beam
> search and turn it into an exhaustive search. The lesson of their success
> is that garden path sentences (with regard to segmentation) are rare in
> Chinese.
>
> On Sun, Feb 14, 2010 at 10:29 AM, Benson Margulies
> <bimargul...@gmail.com> wrote:
>
>> Ted, thanks very much.
>>
>> Thoughts in response to both of your messages:
>>
>> 1: alpha-beta is being used here in the sense of E+M. Or, to be
>> specific, alpha is the path sum from the beginning to the current
>> 'time', and beta is the path sum from the current 'time' to the end.
>>
>> 2: I had read about that 'at the margin' idea and completely forgotten
>> it. My starting point here is Miller and Guiness (one of whom used to
>> work with me and the other of whom still does). They didn't report,
>> and perhaps didn't measure, whether the examples selected via that
>> 'gamma' calculation had high error rates (far from the margin?) or low
>> error rates (close to the margin). They just observed that
>>
>> 3: A scatter-plot looks like what the doctor ordered.
>>
>> 4: That paper is new to me. My stack of papers in this neighborhood is
>> Collins, Miller + Guiness, Crammer (on Passive-Aggressive) and the
>> Oxford paper on segmentation. Thanks for the pointer.
>>
>> --benson
>>
>> On Sat, Feb 13, 2010 at 11:18 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>> > Benson,
>> >
>> > Are you using techniques related to this:
>> > http://www.it.usyd.edu.au/~james/pubs/pdf/dlp07perc.pdf ?
>> >
>> > On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies
>> > <bimargul...@gmail.com> wrote:
>> >
>> >> Folks,
>> >>
>> >> Here's one of my occasional questions in which I am, in essence,
>> >> bartering my code wrangling efforts for expertise on hard stuff.
>> >>
>> >> Consider a sequence problem addressed with a perceptron model with an
>> >> ordinary Viterbi decoder. There's a standard confidence estimation
>> >> technique borrowed from HMMs: calculate gamma = alpha + beta for each
>> >> state, take the difference of the gammas for the best and second best
>> >> hypothesis for each column of the trellis, and take argmin of them as
>> >> the overall confidence of the decode. (+, of course, because in a
>> >> perceptron we're summing feature weights, not multiplying
>> >> probabilities.)
>
> --
> Ted Dunning, CTO
> DeepDyve
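
[Editor's note: a minimal sketch of the gamma-based confidence estimate Benson describes, assuming an additive perceptron-scored linear-chain trellis. The emit and trans score arrays and the function name gamma_confidence are illustrative placeholders, not part of the project under discussion.]

    # Gamma-margin confidence for an additively scored trellis (perceptron-style).
    import numpy as np

    def gamma_confidence(emit, trans):
        """Return the minimum over trellis columns of the margin between the
        best and second-best complete-path scores (gamma) through each state."""
        T, S = emit.shape

        # alpha[t, s]: best path score from the start up to and including (t, s)
        alpha = np.full((T, S), -np.inf)
        alpha[0] = emit[0]
        for t in range(1, T):
            alpha[t] = np.max(alpha[t - 1][:, None] + trans, axis=0) + emit[t]

        # beta[t, s]: best path score from (t, s) to the end, excluding emit[t]
        beta = np.full((T, S), -np.inf)
        beta[T - 1] = 0.0
        for t in range(T - 2, -1, -1):
            beta[t] = np.max(trans + (emit[t + 1] + beta[t + 1])[None, :], axis=1)

        # gamma[t, s]: score of the best complete path forced through state s
        # at time t (sums rather than products, because these are feature weights)
        gamma = alpha + beta

        # Per-column margin: best gamma minus second-best gamma
        top2 = np.sort(gamma, axis=1)[:, -2:]
        margins = top2[:, 1] - top2[:, 0]

        # Overall confidence of the decode: the smallest per-column margin
        return margins.min()

    # Toy usage with random scores: 3 time steps, 4 states
    rng = np.random.default_rng(0)
    print(gamma_confidence(rng.normal(size=(3, 4)), rng.normal(size=(4, 4))))

The per-column margin is the drop in total path score incurred by forcing the decoder off its best state at that position; the smallest such drop over the sequence marks the weakest point of the decode and serves as its overall confidence.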