On Jun 4, 2008, at 3:50 PM, Andrew Coppin wrote:
However, if you can find me a source that explains what a "Markov chain" actually *is*, I'd be quite interested.

In a non-rigorous nutshell:

You have the word "star". You want to pick another word to follow it. It turns out that, after analyzing a corpus ("corpus" here just means "a bunch of text you extract information from"), word pairs starting with "star" occur with these frequencies:

star wars (20% of the time)
star cluster (5% of the time)
star dust (10% of the time)
star system (25% of the time)
star -end-of-sentence- (5% of the time)
.
.
.

So you use those occurrence statistics to pick a plausible next word (let's choose "system", since it has the highest probability here -- in practice you'd pick one at random, weighted by those likelihoods). Then you look up all the word pairs that start with "system" and choose the next word in the same fashion. Repeat for as long as you want.
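
Since this is -cafe, here's roughly what that weighted pick might look like in Haskell. A minimal sketch, not anyone's library: the table hard-codes the "star" counts from the example, and `pickWeighted` is a name I made up.

import qualified Data.Map as Map
import System.Random (randomRIO)

-- The "star" counts from above (counts rather than percentages;
-- only the proportions matter).
starFollowers :: Map.Map String Int
starFollowers = Map.fromList
  [ ("wars", 20), ("cluster", 5), ("dust", 10)
  , ("system", 25), ("-end-of-sentence-", 5) ]

-- Pick a word at random, weighted by its count.
-- e.g. pickWeighted starFollowers
pickWeighted :: Map.Map String Int -> IO String
pickWeighted counts = do
  n <- randomRIO (1, sum (Map.elems counts))
  -- Walk the (word, count) pairs, accumulating counts until we reach n.
  let go ((w, c) : rest) acc
        | n <= acc + c = w
        | otherwise    = go rest (acc + c)
      go [] _ = error "pickWeighted: empty table"
  return (go (Map.toList counts) 0)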

Those word-pair statistics, collected for every word in your vocabulary, make up the first-level (usually called "first-order") Markov data for your corpus.
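
Gathering that data is just counting adjacent word pairs. Reusing the Data.Map import above, a sketch (the function name is my own invention):

-- For each word, count how often each other word follows it.
firstOrder :: [String] -> Map.Map String (Map.Map String Int)
firstOrder ws = Map.fromListWith (Map.unionWith (+))
  [ (w, Map.singleton w' 1) | (w, w') <- zip ws (drop 1 ws) ]

Then firstOrder (words corpusText) Map.! "star" gives you exactly the kind of table pickWeighted expects.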

When you extend it to word triplets (two words of context predicting the third), it's second-order Markov data, and it will generate more plausible fake text. You can build higher and higher orders if you'd like.
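
In code, the only change for second order is the key type: it becomes the pair of preceding words. Along the same lines as the sketch above:

-- For each pair of words, count how often each word follows the pair.
secondOrder :: [String] -> Map.Map (String, String) (Map.Map String Int)
secondOrder ws = Map.fromListWith (Map.unionWith (+))
  [ ((w1, w2), Map.singleton w3 1)
  | (w1, w2, w3) <- zip3 ws (drop 1 ws) (drop 2 ws) ]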

And although the example here is about text, you can ultimately use the same method to generate "realistic" sequences of any kind of sequential data.
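
In the Haskell sketches above, that generalization costs one type variable: the counting never inspects the tokens, so any Ord type (MIDI pitches, dice rolls, log events) works.

-- Same counting as firstOrder, for any token type with an Ord instance.
markovCounts :: Ord a => [a] -> Map.Map a (Map.Map a Int)
markovCounts xs = Map.fromListWith (Map.unionWith (+))
  [ (x, Map.singleton y 1) | (x, y) <- zip xs (drop 1 xs) ]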

Hope that helps.

-johnnnnnnn

