On Mon, Jul 27, 2009 at 6:51 PM, Benson Margulies <[email protected]> wrote:
> [brown and mercer did hard stuff]  Of course, you aren't proposing that,
> just recommending the bigram entropy metric or something like it.

Peter Brown and Bob Mercer were very sharp dudes, and when they did this
work it was 100 times more amazing than it is now. They had the advantage
of working for a company that understood that the resources you give
researchers now should be 20 times more than you would expect a user to
have in 5 years, but even so, their achievements were quite something.

Frankly, that record of achievement leads back beyond them to Fred Jelinek,
Lalit Bahl, Salim Roukos, and all the other early guys who worked on speech
back then. That work (along with the BBN team under Jim and Janet Baker)
gave us the entire framework of HMMs and entropy-based evaluation that is
core to speech systems today. It leads forward to some of the really
fabulous work that the della Pietra brothers did as well.

I owe the IBM team my interest in statistical approaches to AI and symbolic
sequences. It was on a visit to IBM in 1990 or so that Stephen (or Vincent)
dP mentioned off-handedly to me that mutual information was "trivially known
to be chi-squared distributed asymptotically". That was news to me, and it
formed the basis of a LOT of the work that I have done in the intervening
19 years.

--
Ted Dunning, CTO
DeepDyve
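
The fact mentioned in the last paragraph amounts to this: for an r x c
contingency table with N total observations, twice N times the mutual
information (measured in nats) is exactly the G^2 log-likelihood-ratio
statistic, which under the null hypothesis of independence is asymptotically
chi-squared with (r-1)(c-1) degrees of freedom. The short Python sketch
below illustrates that relation; it is not part of the original message, and
the function names and example bigram counts are purely illustrative.

# Minimal sketch of the relation G^2 = 2 * N * I for a contingency table of
# counts. Under independence, G^2 is asymptotically chi-squared with
# (r-1)(c-1) degrees of freedom. The names and numbers here are hypothetical.

import math

def mutual_information(table):
    """Mutual information (in nats) of a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            if k > 0:
                mi += (k / n) * math.log(k * n / (row_sums[i] * col_sums[j]))
    return mi

def llr_from_table(table):
    """G^2 statistic: twice the total count times the mutual information."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information(table)

if __name__ == "__main__":
    # Hypothetical bigram counts: rows = word A present/absent,
    # columns = word B present/absent.
    counts = [[110, 2442], [111, 29114]]
    g2 = llr_from_table(counts)
    # For a 2x2 table there is 1 degree of freedom, so compare g2 against
    # chi-squared(1); the 0.001 critical value is roughly 10.83.
    print("G^2 =", g2)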
