> Another suggestion is to ensure that the job specification is not
> overly simplified. How did you parse the text into "words" in the
> prior exercise that produced the list of bigrams? Won't you need to
> use the same parsing method in the current exercise of tagging the
> bigrams with an underscore?
>
> Cheers,
> John

Thank you John, that definitely puts things in perspective! I'm very new to both Python and text parsing, and I often feel that I can't see the forest for the trees.

Since you ask, I'm working on a project that uses Church's mutual information score. I tokenize my text, split it into a list, derive unigram and bigram dictionaries, and then calculate a pmi dictionary for each x,y from the bigram and unigram counts. The bigrams that pass my threshold then get put into my list of x_y strings, and you know the rest. By modifying the original text file, I can view 'x_y', z pairs as x,y and iterate until I have some collocations that are worth playing with.

So I think that answers the question about using the same parsing method. I'm sure there are more pythonic ways to do it, but I'm on deadline :)

Thanks again!
Brandon

--
http://mail.python.org/mailman/listinfo/python-list
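
For anyone following the thread, the pipeline described above could be sketched roughly as follows. This is only a minimal illustration, not the actual project code: the function names (tokenize, pmi_scores, join_collocations), the regex tokenizer, and the simple maximum-likelihood probability estimates are all assumptions. The tokenizer deliberately keeps underscores, so text rewritten with x_y tokens can be parsed by the same method on the next pass.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters, digits,
    # apostrophes and underscores (so joined "x_y" tokens stay whole).
    return re.findall(r"[a-z0-9_']+", text.lower())


def pmi_scores(tokens):
    # Count unigrams and adjacent-word bigrams.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def join_collocations(tokens, scores, threshold):
    # Replace each bigram scoring at or above the threshold with "x_y",
    # scanning left to right without overlapping matches.
    keep = {pair for pair, score in scores.items() if score >= threshold}
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in keep:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Calling join_collocations on the token list and then feeding the result back into pmi_scores gives one iteration of the loop: an 'x_y' token followed by z is simply treated as a new x,y pair. On a small corpus, rare pairs get inflated PMI values, so a minimum count filter alongside the threshold is probably worth considering.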
