Re: How can I count word frequency in a web site?

2015-11-30 Thread ryguy7272
On Sunday, November 29, 2015 at 7:49:40 PM UTC-5, ryguy7272 wrote:
> I'm trying to figure out how to count words in a web site.  Here is a sample 
> of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
> 
> I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to 
> have a way to control for specific words.  I have a couple Python scripts 
> that do this for a text file, but not for a web site.  I can post that, if 
> that's helpful.


This works great!  Thanks for sharing!!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I count word frequency in a web site?

2015-11-30 Thread ryguy7272
On Sunday, November 29, 2015 at 9:51:46 PM UTC-5, Laura Creighton wrote:
> In a message of Sun, 29 Nov 2015 21:31:49 -0500, Cem Karan writes:
> >You might want to look into Beautiful Soup 
> >(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML 
> >screen-scraping tool.  I've never used it, but I've heard good things about 
> >it.
> >
> >Good luck,
> >Cem Karan
> 
> http://codereview.stackexchange.com/questions/73887/finding-the-occurrences-of-all-words-in-movie-scripts
> 
> scrapes a site of movie scripts and then spits out the 10 most common
> words.  I suspect the OP could modify this script to suit his or her needs.
> 
> Laura


Thanks Laura!


Re: How can I count word frequency in a web site?

2015-11-29 Thread Cem Karan
You might want to look into Beautiful Soup 
(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping 
tool.  I've never used it, but I've heard good things about it.

Good luck,
Cem Karan

On Nov 29, 2015, at 7:49 PM, ryguy7272  wrote:

> I'm trying to figure out how to count words in a web site.  Here is a sample 
> of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
> 
> I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to 
> have a way to control for specific words.  I have a couple Python scripts 
> that do this for a text file, but not for a web site.  I can post that, if 
> that's helpful.



Re: How can I count word frequency in a web site?

2015-11-29 Thread ryguy7272
On Sunday, November 29, 2015 at 9:32:22 PM UTC-5, Cem Karan wrote:
> You might want to look into Beautiful Soup 
> (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML 
> screen-scraping tool.  I've never used it, but I've heard good things about 
> it.
> 
> Good luck,
> Cem Karan
> 
> On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:
> 
> > I'm trying to figure out how to count words in a web site.  Here is a 
> > sample of the link I want to scrape data from and count specific words.
> > http://finance.yahoo.com/q/h?s=STRP+Headlines
> > 
> > I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want 
> > to have a way to control for specific words.  I have a couple Python 
> > scripts that do this for a text file, but not for a web site.  I can post 
> > that, if that's helpful.

Ok, this small script will grab everything from the link.

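import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in HTML parser
r = requests.get("http://finance.yahoo.com/q/h?s=STRP+Headlines")
soup = BeautifulSoup(r.content, "html.parser")
htmltext = soup.prettify()  # the full markup, re-indented
print(htmltext)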


Now, how can I count specific words like 'fraud' and 'lawsuit'?
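Note that `soup.prettify()` returns the page's markup with tags and attributes included, so a word count run on it would also see tag names, CSS and JavaScript. A minimal sketch of pulling out only the visible text first, using the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages. `VisibleTextParser`, `html_to_text` and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


sample = ("<html><head><style>p {color: red}</style></head><body>"
          "<p>Fraud claim</p><script>var fraud = 1;</script>"
          "<p>New lawsuit filed</p></body></html>")
print(html_to_text(sample))  # prints Fraud claim New lawsuit filed
```

With BeautifulSoup already loaded as in the script above, a roughly equivalent route is to call `decompose()` on each `script` and `style` tag and then use `soup.get_text(" ")`. Either way, the resulting plain string is what the word counting should run on.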


How can I count word frequency in a web site?

2015-11-29 Thread ryguy7272
I'm trying to figure out how to count words in a web site.  Here is a sample of 
the link I want to scrape data from and count specific words.
http://finance.yahoo.com/q/h?s=STRP+Headlines

I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to 
have a way to control for specific words.  I have a couple Python scripts that 
do this for a text file, but not for a web site.  I can post that, if that's 
helpful.
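One way to "control for specific words" is to split the text into words once and then look up only the words on a watch list. A minimal sketch, assuming the page text is already a plain string; `count_watch_words` and the sample headlines are made up for illustration:

```python
import re
from collections import Counter


def count_watch_words(text, watch_words):
    """Count whole-word, case-insensitive occurrences of each watch word.

    Tokens are runs of letters, digits or underscores, so 'fraudulent'
    does not count as 'fraud' and 'Lawsuit,' counts as 'lawsuit'.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Report every watch word, including those that never appear (count 0)
    return {word: counts[word.lower()] for word in watch_words}


headlines = (
    "Lawsuit alleges fraud at Example Corp. "
    "Company denies fraudulent accounting; second lawsuit expected."
)
print(count_watch_words(headlines, ["fraud", "lawsuit", "settlement"]))
# prints {'fraud': 1, 'lawsuit': 2, 'settlement': 0}
```

Because matching is on single tokens, multi-word phrases are not covered by this lookup and would need a separate substring or regex search.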



Re: How can I count word frequency in a web site?

2015-11-29 Thread Laura Creighton
In a message of Sun, 29 Nov 2015 21:31:49 -0500, Cem Karan writes:
>You might want to look into Beautiful Soup 
>(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML 
>screen-scraping tool.  I've never used it, but I've heard good things about it.
>
>Good luck,
>Cem Karan

http://codereview.stackexchange.com/questions/73887/finding-the-occurrences-of-all-words-in-movie-scripts

scrapes a site of movie scripts and then spits out the 10 most common
words.  I suspect the OP could modify this script to suit his or her needs.

Laura
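The linked script is not reproduced here. As a rough illustration of the "10 most common words" part only, `collections.Counter.most_common` does the ranking. `top_words` and its `ignore` parameter are hypothetical names, not taken from the linked code:

```python
import re
from collections import Counter


def top_words(text, n=10, ignore=()):
    """Return the n most common words in text as (word, count) pairs.

    Words listed in `ignore` (for example 'the' or 'and') are dropped
    before counting.
    """
    skip = {w.lower() for w in ignore}
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in skip]
    return Counter(words).most_common(n)


script = "To be, or not to be, that is the question."
print(top_words(script, n=3))
```

Passing common filler words through `ignore` keeps them from crowding out more interesting words at the top of the list.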


Re: How can I count word frequency in a web site?

2015-11-29 Thread Michiel Overtoom

> On 30 Nov 2015, at 03:54, ryguy7272  wrote:
> 
> Now, how can I count specific words like 'fraud' and 'lawsuit'?

- convert the page to plain text
- remove any punctuation
- split into words
- go through all the words and increase a counter for each word
- look up the counts for the words you are interested in

Something like this:

s = """Today we're rounding out our planetary tour with ice giants Uranus
and Neptune. Both have small rocky cores, thick mantles of ammonia, water,
and methane, and atmospheres that make them look greenish and blue. Uranus
has a truly weird rotation and relatively dull weather, while Neptune has
clouds and storms whipped by tremendous winds. Both have rings and moons,
with Neptune's Triton probably being a captured iceball that has active
geology."""

import collections

# Lower-case the text, drop periods and commas, and turn newlines and
# apostrophes into spaces so that only bare words remain
cleaned = (s.lower().replace("\n", " ").replace(".", "")
           .replace(",", "").replace("'", " "))
count = collections.Counter(cleaned.split())
for interesting in ("neptune", "and"):
    print("The word '%s' occurs %d times" % (interesting, count[interesting]))


# Outputs:

The word 'neptune' occurs 3 times
The word 'and' occurs 7 times



