----- Original Message ----
> From: Drew Farris <[email protected]>
>
> On Fri, Jan 8, 2010 at 12:06 AM, Robin Anil wrote:
>
> > I like the formulation that Drew made, using n-1 grams to generate n-grams.
>
> I think Ted first mentioned n-1 grams, and I ran with it. It is very
> useful to think about the problem this way.

I think I missed this. Could you please explain the n-1 gram thinking and why it is better than thinking about n-grams as n-grams?

Thanks,
Otis

> One question about the concept of n-1 grams, however. When n is 3, for
> example, are we really interested in the collocation of bigrams, or
> are we interested in non-overlapping tokens? For example, given the
> tri-gram 'click and clack', should we be looking at 'click and' and
> 'and clack', or should we be analyzing 'click', 'and clack' or
> 'click and' and 'clack'? I suspect it is the first form because that
> extends easily to values larger than 3, but it's worth confirming.
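
To make the two readings in Drew's question concrete, here is a minimal Python sketch. It is only an illustration, not code from the thread or from any library: the function names `overlapping_subgrams` and `non_overlapping_splits` are hypothetical. The first returns the leading and trailing (n-1)-grams of an n-gram (the "first form"); the second lists every way to cut the n-gram into two adjacent, non-overlapping parts.

```python
def overlapping_subgrams(tokens):
    """Hypothetical helper: the leading and trailing (n-1)-grams of an n-gram.

    The two results share the middle n-2 tokens, so they overlap.
    """
    return (tuple(tokens[:-1]), tuple(tokens[1:]))


def non_overlapping_splits(tokens):
    """Hypothetical helper: every cut of an n-gram into two adjacent parts.

    Each pair covers all n tokens exactly once, with no shared token.
    """
    return [(tuple(tokens[:i]), tuple(tokens[i:])) for i in range(1, len(tokens))]


trigram = ["click", "and", "clack"]

# First form: 'click and' and 'and clack'
print(overlapping_subgrams(trigram))

# Second form: 'click' + 'and clack', or 'click and' + 'clack'
print(non_overlapping_splits(trigram))
```

For any n, the overlapping form always yields exactly one pair of (n-1)-grams, while the non-overlapping form yields n-1 candidate splits. That may be part of why the first form extends more easily to values larger than 3.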
