Re: [Tutor] count words

2005-02-15 Thread Bill Mill
Coupla nits:

On Tue, 15 Feb 2005 14:39:30 -0500, Kent Johnson <[EMAIL PROTECTED]> wrote:
> from string import punctuation
> from time import time
> 

>
> words = open(r'D:\Personal\Tutor\ArtOfWar.txt').read().split()

Another advantage of the first method is that it allows a more elegant
word counting algorithm if you choose not to read the entire file into
memory. It's a better general practice to consume lines from a file
via the "for line in f" idiom.
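As a rough sketch of that line-at-a-time approach, here is a version in present-day Python 3. The helper name `count_words_in_lines` is hypothetical. It accepts any iterable of lines, such as an open file, so the whole text never has to be held in memory at once:

```python
from string import punctuation

def count_words_in_lines(lines):
    """Accumulate word counts from any iterable of lines (e.g. an open file)."""
    counts = {}
    for line in lines:
        for word in line.split():
            word = word.strip(punctuation)
            if word:  # skip tokens that were pure punctuation, e.g. "--"
                counts[word] = counts.get(word, 0) + 1
    return counts

# Usage with a real file (the filename is only an example):
# with open('ArtOfWar.txt') as f:
#     counts = count_words_in_lines(f)
```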

> words = [ word.strip(punctuation) for word in words ]

And be careful with this: punctuation does not include whitespace
characters. That is no problem in this example, because split() never
returns strings with leading or trailing whitespace, but people should
be aware that strip(punctuation) won't clean up strings that haven't
had their whitespace stripped first.
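A small Python 3 illustration of that pitfall, using a made-up token that still carries its trailing newline:

```python
from string import punctuation, whitespace

raw = "done.\n"  # sample token that still carries a trailing newline

# punctuation holds only ASCII punctuation, no whitespace, so strip() halts at
# the '\n' and never reaches the '.' in front of it.
only_punct = raw.strip(punctuation)
print(repr(only_punct))        # 'done.\n'

# Adding the whitespace characters lets strip() peel off both.
punct_and_space = raw.strip(punctuation + whitespace)
print(repr(punct_and_space))   # 'done'
```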

Otherwise though, good stuff.

Peace
Bill Mill
bill.mill at gmail.com
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] count words

2005-02-15 Thread Kent Johnson
Ryan Davis wrote:
> Here's one way to iterate over that to get the counts.  I'm sure there are
> dozens.
> ###
> >>> x = 'asdf foo bar foo'
> >>> counts = {}
> >>> for word in x.split():
> ...     counts[word] = x.count(word)
> ...
> >>> counts
> {'foo': 2, 'bar': 1, 'asdf': 1}
> ###
> The dictionary takes care of duplicates.  If you are using a really big file,
> it might pay to eliminate duplicates from the list
> before running x.count(word)
Be wary of using count() for this; it can be very slow. Every time you call count(), Python has to
look at every element of the list to see if it matches the word you passed in. So if the list has
n words in it, you make n*n comparisons. In contrast, the method that accumulates counts directly in
a dictionary makes just one pass over the list. For small lists this doesn't matter much, but for a
longer list you will definitely see the difference.
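Ryan's suggestion of calling count() only once per distinct word can be sketched as below (Python 3; the function names and sample list are hypothetical). It cuts the number of count() calls from n to the number of distinct words u, but each call still scans all n words, so the work is roughly u*n comparisons rather than the single pass of the dictionary method:

```python
def count_with_set(words):
    """Call list.count() once per distinct word: about u*n comparisons."""
    return {word: words.count(word) for word in set(words)}

def count_with_dict(words):
    """Single pass over the list, updating a running tally: about n updates."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

words = "one ring to rule them all one ring to find them".split()
assert count_with_set(words) == count_with_dict(words)
```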

For example, I downloaded "The Art of War" from Project Gutenberg 
(http://www.gutenberg.org/dirs/1/3/132/132a.txt) and tried both methods. Here is a program that 
times how long it takes to do the counts using two different methods:

# WordCountTest.py
''' Count words two different ways '''
from string import punctuation
from time import time

def countWithDict(words):
    ''' Word count by accumulating counts in a dictionary '''
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def countWithCount(words):
    ''' Word count by calling count() for each word '''
    counts = {}
    for word in words:
        counts[word] = words.count(word)
    return counts

def timeOne(f, words):
    ''' Time how long it takes to do f(words), and return its result '''
    startTime = time()
    result = f(words)
    endTime = time()
    print '%s: %f' % (f.__name__, endTime-startTime)
    return result

# Get the word list and strip off punctuation
words = open(r'D:\Personal\Tutor\ArtOfWar.txt').read().split()
words = [ word.strip(punctuation) for word in words ]

# How many words is it, anyway?
print len(words), 'words'

# Time the word counts
c1 = timeOne(countWithDict, words)
c2 = timeOne(countWithCount, words)

# Check that both get the same result
assert c1 == c2

The results (times are in seconds):
14253 words
countWithDict: 0.01
countWithCount: 9.183000
It takes 900 times longer to count() each word individually!
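For reference, Python 2.7 and later (which postdate this thread) ship collections.Counter, which does the same single-pass tally as countWithDict. A minimal Python 3 sketch; the sample sentence is made up and stands in for the Art of War word list:

```python
from collections import Counter
from string import punctuation

text = "The cat sat on the mat. The end."
words = [word.strip(punctuation).lower() for word in text.split()]
counts = Counter(words)          # one pass over the list, like countWithDict
print(counts.most_common(1))     # [('the', 3)]
```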
Kent


Re: [Tutor] count words

2005-02-15 Thread Bill Mill
On Tue, 15 Feb 2005 18:03:57 +, Max Noel <[EMAIL PROTECTED]> wrote:
> 
> On Feb 15, 2005, at 17:19, Ron Nixon wrote:
> 
> > Thanks to everyone who replied to my post. All of your
> > suggestions seem to work. My thanks
> >
> > Ron
> 
> Watch out, though: for all of this to work flawlessly you first have
> to remove all punctuation (either with regexes or with multiple
> foo.replace('[symbol]', '')), and to normalize the case of each word
> (foo.upper() or foo.lower() will do).

To remove all punctuation from the beginning and end of words, at
least in 2.4, you can just use:

word.strip('.!?\n\t ')

plus any other characters that you'd like to strip. In action:

>>> word = "?testing..!.\n\t "
>>> word.strip('?.!\n\t ')
'testing'

Peace
Bill Mill
bill.mill at gmail.com


Re: [Tutor] count words

2005-02-15 Thread Alan Gauld
> Other than using several print statements to look for
> separate words like this, is there a way to do it so
> that I get an individual count of each word:
> 
> word1 xxx
> word2 xxx
> words xxx

The classic approach is to create a dictionary.
Add each word as you come to it and increment its value by one.
At the end the dictionary contains all unique words with the
count for each one.
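A minimal Python 3 sketch of that dictionary approach, using collections.defaultdict so new words start at zero, and printing in the "word count" layout Ron asked for. The helper name `tally` and the sample words are hypothetical:

```python
from collections import defaultdict

def tally(words):
    """Add each word as it comes and increment its count by one."""
    counts = defaultdict(int)   # a missing key starts at 0
    for word in words:
        counts[word] += 1
    return counts

counts = tally("word1 word2 word1 words word1".split())
for word in sorted(counts):
    print(word, counts[word])   # one "word count" line per unique word
```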

Does that work for you?

Alan G.


Re: [Tutor] count words

2005-02-15 Thread Max Noel
On Feb 15, 2005, at 17:19, Ron Nixon wrote:
> Thanks to everyone who replied to my post. All of your
> suggestions seem to work. My thanks
> Ron
	Watch out, though: for all of this to work flawlessly you first have
to remove all punctuation (either with regexes or with multiple
foo.replace('[symbol]', '')), and to normalize the case of each word
(foo.upper() or foo.lower() will do).
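Both steps (dropping punctuation and folding case) can be combined with a regular expression. A minimal Python 3 sketch; treating a "word" as a run of ASCII letters, digits or apostrophes is an assumption you may want to adjust, and `normalized_words` is a hypothetical name:

```python
import re

# Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

def normalized_words(text):
    """Lower-case the text, then pull out the words, discarding punctuation."""
    return WORD_RE.findall(text.lower())

print(normalized_words("Foo, foo! FOO? Don't."))  # ['foo', 'foo', 'foo', "don't"]
```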

-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting 
and sweating as you run through my corridors... How can you challenge a 
perfect, immortal machine?"



RE: [Tutor] count words

2005-02-15 Thread Ron Nixon
Thanks to everyone who replied to my post. All of your
suggestions seem to work. My thanks

Ron




Re: [Tutor] count words

2005-02-15 Thread Danny Yoo


On Tue, 15 Feb 2005, Ron Nixon wrote:

> I know that you can do this to get a count of how many times a word
> appears in a file
>
>
> f = open('text.txt').read()
> print f.count('word')
>
> Other than using several print statements to look for separate words
> like this, is there a way to do it so that I get an individual count of
> each word:


Hi Ron,

Let's modify the problem a bit.  Let's say that we have a list of words:

###
words = """one ring to rule them all one ring to find them
   one ring to bring them all and in the darkness bind them
   in the land of mordor where the shadows lie""".split()
###


What happens if we sort() this list?

###
>>> words.sort()
>>> words
['all', 'all', 'and', 'bind', 'bring', 'darkness', 'find', 'in', 'in',
'land', 'lie', 'mordor', 'of', 'one', 'one', 'one', 'ring', 'ring',
'ring', 'rule', 'shadows', 'the', 'the', 'the', 'them', 'them', 'them',
'them', 'to', 'to', 'to', 'where']
###


Would this be easier to process?
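One way to follow that hint through: once the list is sorted, equal words sit next to each other, so itertools.groupby can collapse each run into a (word, count) pair. A Python 3 sketch reusing the list above:

```python
from itertools import groupby

words = """one ring to rule them all one ring to find them
   one ring to bring them all and in the darkness bind them
   in the land of mordor where the shadows lie""".split()
words.sort()

# groupby yields one (word, run) pair per run of equal adjacent items,
# so it only gives correct totals on a sorted list.
counts = [(word, len(list(run))) for word, run in groupby(words)]
for word, n in counts:
    print(word, n)
```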



If you have more questions, please feel free to ask!



Re: [Tutor] count words

2005-02-15 Thread Jeremy Jones




Ron Nixon wrote:

> I know that you can do this to get a count of how
> many times a word appears in a file
>
> f = open('text.txt').read()
> print f.count('word')
>
> Other than using several print statements to look for
> separate words like this, is there a way to do it so
> that I get an individual count of each word:
>
> word1 xxx
> word2 xxx
> words xxx
>
> etc.

Like this?


A    14
AND  1
Abantes  3
Abarbarea    1
Abas 1
Abians   1
Ablerus  1
About    2
Abydos   3
Acamas   11
Accept   2
Acessamenus  1
Achaea   1
Achaean  34
Achaeans 540
Achelous 2
Achilles 423
Acrisius 1
Actaea   1
Actor    8
Adamas   5
Admetus  4
Adrastus 2
Adresteia    1
Adrestus 8
Aeacus   20
Aegae    2
Aegaeon  1
Aegeus   1
Aegialeia    1
Aegialus 1
Aegilips 1
Aegina   1
Aegium   1
Aeneas   86
Aenus    1
Aeolus   1
Aepea    2
Aepytus  1
Aesculapius  7
Aesepus  2
Aesopus  4
Aesyetes 2
Aesyme   1
Aesymnus 1
...
wronged  2
wronging 1
wrongs   1
wroth    1
wrought  24
wrung    1
yard 3
yarded   1
yards    2
yawned   1
ye   3
yea  1
year 13
yearling 2
yearned  4
yearning 2
years    15
yellow   5
yesterday    5
yet  160
yield    10
yielded  3
yielding 3
yieldit  1
yoke 24
yoked    11
yokes    1
yokestraps   1
yolking  1
yonder   3
you  1712
young    44
younger  9
youngest 6
your 592
yourelf  1
yours    7
yourself 60
yourselves   17
youselves    1
youth    17
youths   18
zeal 2

I ran the following script on "The Iliad":

#!/usr/bin/env python


import string

text = open('iliad.txt', 'r').read()
for punct in string.punctuation:
    text = text.replace(punct, ' ')
words = text.split()

word_dict = {}
for word in words:
    word_dict[word] = word_dict.get(word, 0) + 1
word_list = word_dict.keys()
word_list.sort()
for word in word_list:
    print "%-25s%d" % (word, word_dict[word])
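The replace() loop builds a new copy of the text for each punctuation character. In Python 3 the same mapping can be done in one pass with str.maketrans and str.translate (Python 2's string.maketrans works differently, so this is a Python 3 sketch; `count_words` and the sample text are hypothetical). Like the original, it splits "don't" into "don" and "t" and keeps capitalized words separate, which is why entries such as "A" and "AND" show up in the output above:

```python
import string

# Map every ASCII punctuation character to a space, equivalent to the replace() loop.
TO_SPACES = str.maketrans(string.punctuation, " " * len(string.punctuation))

def count_words(text):
    word_dict = {}
    for word in text.translate(TO_SPACES).split():
        word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict

counts = count_words("Achilles' wrath; Achilles don't")
for word in sorted(counts):
    print("%-25s%d" % (word, counts[word]))
```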


Jeremy Jones




RE: [Tutor] count words

2005-02-15 Thread Ryan Davis
You could use split() to split the contents of the file into a list of strings.

###
>>> x = 'asdf foo bar foo'
>>> x.split()
['asdf', 'foo', 'bar', 'foo']
###

Here's one way to iterate over that to get the counts.  I'm sure there are 
dozens.
###
>>> x = 'asdf foo bar foo'
>>> counts = {}
>>> for word in x.split():
...     counts[word] = x.count(word)
... 
>>> counts
{'foo': 2, 'bar': 1, 'asdf': 1}
###
The dictionary takes care of duplicates.  If you are using a really big file, 
it might pay to eliminate duplicates from the list
before running x.count(word)

Thanks,
Ryan 
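One caveat with the example above: x is a string, so x.count(word) counts substring matches, not whole words. A short Python 3 illustration with made-up text; counting in the list returned by split() avoids it:

```python
x = 'a cat and a catalog'
words = x.split()

print(x.count('cat'))      # 2: also matches inside 'catalog'
print(words.count('cat'))  # 1: list.count compares whole items
```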



Re: [Tutor] count words

2005-02-15 Thread Bill Mill
Ron,

> is there a way to do it so
> that I get an individual count of each word:
> 
> word1 xxx
> word2 xxx
> words xxx
> 
> etc.

Ron, I'm gonna throw some untested code at you. Let me know if you
understand it or not:

f = open('text.txt')   # the file from your example
word_counts = {}
for line in f:
    for word in line.split():
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

for word in word_counts:
    print "%s %d" % (word, word_counts[word])
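To print the most frequent words first instead of in arbitrary dictionary order, the items can be sorted by count. A Python 3 sketch; the sample dictionary is made up and stands in for word_counts:

```python
word_counts = {'ring': 3, 'them': 4, 'one': 3, 'lie': 1}

# Sort by count, highest first; ties are broken alphabetically by word.
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
for word, count in ranked:
    print("%s %d" % (word, count))
```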

Peace
Bill Mill
bill.mill at gmail.com


[Tutor] count words

2005-02-15 Thread Ron Nixon

I know that you can do this to get a count of how
many times a word appears in a file


f = open('text.txt').read()
print f.count('word')

Other than using several print statements to look for
separate words like this, is there a way to do it so
that I get an individual count of each word:

word1 xxx
word2 xxx
words xxx

etc.




