dave wrote:
Hi Guys,

I've written a Markov analysis program and would like to get your comments on the code. As it stands now the final output comes out as a tuple, then list, then tuple. Something like ('the', 'water') ['us'] ('we', 'took')..etc...

I'm still learning so I don't know any advanced techniques or methods that may have made this easier.


here's the code:

def makelist(f):     #turn a document into a list
    fin = open(f)
    results = []
    for line in fin:
        line = line.replace('"', '')
        line = line.strip().split()
        for word in line:
            results.append(word)
    fin.close()
    return results



What does your data look like?  Just straight text?



def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
    convert_file = makelist(f)
    mapdict = {}        #dict where the prefixes will map to suffixes
    start = 0
    end = preflen         #start/end set the slice size
    for words in convert_file[preflen:]:   #one pass per word that has a full prefix before it
        prefix = tuple(convert_file[start:end])     #tuple as mapdict key
        suffix = convert_file[end : end + 1]        #next word after the prefix, as a one-item list
        mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
        start += 1
        end += 1
    return mapdict



What is convert_file??
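
Reading the code, convert_file looks like the flat list of words that makelist(f) returns. To make the prefix/suffix idea concrete, here is a rough sketch of the table such a list would produce. The name prefix_table and the sample sentence are made up for illustration, and an in-memory list stands in for the file:

```python
def prefix_table(words, preflen=2):
    """Map each tuple of preflen consecutive words to the words that follow it."""
    table = {}
    # stop early enough that every prefix has a word after it
    for i in range(len(words) - preflen):
        prefix = tuple(words[i:i + preflen])
        suffix = words[i + preflen]
        table.setdefault(prefix, []).append(suffix)
    return table

# hypothetical sample text standing in for makelist(f)
words = "the water took us and the water took them".split()
table = prefix_table(words)
# table[('the', 'water')] is ['took', 'took']
# table[('water', 'took')] is ['us', 'them']
```

Each key is a prefix tuple and each value is the list of every word seen right after that prefix, repeats included.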




import random

def randsent(f, amt=10):     #prints a random sentence
    analyze = markov(f)
    for i in range(amt):
        rkey = random.choice(analyze.keys())
        print rkey, analyze[rkey],


The book gave a hint saying to make the prefixes in the dict using:

def shift(prefix, word):
    return prefix[1:] + (word, )

That's not a very helpful hint.

It works if you call it with a tuple and a word --- it shifts the first item off the front of the tuple and appends the word ... so:

shift(('foo','bar'), "word")
becomes   ('bar', 'word')

Whoopty doo --- I'm not sure what that accomplishes!!

Unless the author means "pass a list and randomly pick a word from the list" in which case the return statement could be

(random.choice(prefix), ) + (word, )

* shrug *

But -- that's not very Markov ... you'd want a weighted choice of words ... depending on how you define your Markov chain -- say a Markov chain based on part-of-speech or probability of occurrence from a given word-set.
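
On the weighting point: if the suffixes for a prefix are kept in a plain list with repeats (which is what appending to the list in the code above does), then random.choice on that list is already weighted by how often each word occurred. A small sketch with a made-up suffix list, using collections.Counter only to make the weights visible:

```python
import random
from collections import Counter

# hypothetical suffix list for one prefix; repeats are kept on purpose
suffixes = ['us', 'them', 'us', 'us']

counts = Counter(suffixes)              # Counter({'us': 3, 'them': 1})
total = sum(counts.values())
probs = dict((w, float(n) / total) for w, n in counts.items())
# probs['us'] is 0.75 and probs['them'] is 0.25

# drawing from the raw list picks 'us' roughly three times out of four
random.seed(0)
draws = Counter(random.choice(suffixes) for _ in range(10000))
```

So a separate weighting step is only needed if the suffixes are stored as a set or as counts instead of a list with duplicates.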

Can you give some more detail??




However, I can't seem to wrap my head around incorporating that into the code above. If you know a method or could point me in the right direction (or think that I don't need to use it), please let me know.
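
One possible way to use the hint (a sketch only, not necessarily what the book intends): keep the current prefix in a variable and call shift after every word, so the start/end slice bookkeeping goes away. The same shift call can then walk the chain when generating text. The names build_map and generate are made up here, and the sample word list stands in for the output of makelist:

```python
import random

def shift(prefix, word):
    """Drop the first word of the prefix tuple and append the new word."""
    return prefix[1:] + (word,)

def build_map(words, preflen=2):
    """Map each prefix tuple to the list of words seen right after it."""
    mapping = {}
    prefix = ()
    for word in words:
        if len(prefix) < preflen:
            # still filling the very first prefix
            prefix += (word,)
            continue
        mapping.setdefault(prefix, []).append(word)
        prefix = shift(prefix, word)
    return mapping

def generate(mapping, amt=10):
    """Walk the chain from a random prefix, returning up to amt words."""
    prefix = random.choice(list(mapping.keys()))
    out = []
    for _ in range(amt):
        suffixes = mapping.get(prefix)
        if not suffixes:          # dead end: this prefix never had a follower
            break
        word = random.choice(suffixes)
        out.append(word)
        prefix = shift(prefix, word)
    return out

# hypothetical sample text standing in for makelist(f)
words = "we took the water and the water took us".split()
mapping = build_map(words)
sentence = generate(mapping, amt=5)
```

The difference from picking a random key on every step is that each new word here depends on the previous prefix, which is what makes the output read like a chain rather than unrelated fragments.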

Thanks for all your help,

Dave

--
http://mail.python.org/mailman/listinfo/python-list
