Re: How can I count word frequency in a web site?
On Sunday, November 29, 2015 at 7:49:40 PM UTC-5, ryguy7272 wrote:
> I'm trying to figure out how to count words in a web site.  Here is a sample
> of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
>
> I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to
> have a way to control for specific words.  I have a couple Python scripts
> that do this for a text file, but not for a web site.  I can post that, if
> that's helpful.

This works great!  Thanks for sharing!!
--
https://mail.python.org/mailman/listinfo/python-list
Re: How can I count word frequency in a web site?
On Sunday, November 29, 2015 at 9:51:46 PM UTC-5, Laura Creighton wrote:
> In a message of Sun, 29 Nov 2015 21:31:49 -0500, Cem Karan writes:
> >You might want to look into Beautiful Soup
> >(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML
> >screen-scraping tool.  I've never used it, but I've heard good things about
> >it.
> >
> >Good luck,
> >Cem Karan
>
> http://codereview.stackexchange.com/questions/73887/finding-the-occurrences-of-all-words-in-movie-scripts
>
> scrapes a site of movie scripts and then spits out the 10 most common
> words.  I suspect the OP could modify this script to suit his or her needs.
>
> Laura

Thanks Laura!
Re: How can I count word frequency in a web site?
You might want to look into Beautiful Soup
(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML
screen-scraping tool.  I've never used it, but I've heard good things about it.

Good luck,
Cem Karan

On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:

> I'm trying to figure out how to count words in a web site.  Here is a sample
> of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
>
> I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to
> have a way to control for specific words.  I have a couple Python scripts
> that do this for a text file, but not for a web site.  I can post that, if
> that's helpful.
Re: How can I count word frequency in a web site?
On Sunday, November 29, 2015 at 9:32:22 PM UTC-5, Cem Karan wrote:
> You might want to look into Beautiful Soup
> (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML
> screen-scraping tool.  I've never used it, but I've heard good things about
> it.
>
> Good luck,
> Cem Karan
>
> On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:
>
> > I'm trying to figure out how to count words in a web site.  Here is a
> > sample of the link I want to scrape data from and count specific words.
> > http://finance.yahoo.com/q/h?s=STRP+Headlines
> >
> > I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want
> > to have a way to control for specific words.  I have a couple Python
> > scripts that do this for a text file, but not for a web site.  I can post
> > that, if that's helpful.

Ok, this small script will grab everything from the link.

import requests
from bs4 import BeautifulSoup

r = requests.get("http://finance.yahoo.com/q/h?s=STRP+Headlines")
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(r.content, "html.parser")
htmltext = soup.prettify()
print(htmltext)

Now, how can I count specific words like 'fraud' and 'lawsuit'?
How can I count word frequency in a web site?
I'm trying to figure out how to count words in a web site.  Here is a sample
of the link I want to scrape data from and count specific words.
http://finance.yahoo.com/q/h?s=STRP+Headlines

I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to
have a way to control for specific words.  I have a couple Python scripts
that do this for a text file, but not for a web site.  I can post that, if
that's helpful.
Re: How can I count word frequency in a web site?
In a message of Sun, 29 Nov 2015 21:31:49 -0500, Cem Karan writes:
>You might want to look into Beautiful Soup
>(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML
>screen-scraping tool.  I've never used it, but I've heard good things about it.
>
>Good luck,
>Cem Karan

http://codereview.stackexchange.com/questions/73887/finding-the-occurrences-of-all-words-in-movie-scripts

scrapes a site of movie scripts and then spits out the 10 most common
words.  I suspect the OP could modify this script to suit his or her needs.

Laura
Re: How can I count word frequency in a web site?
> On 30 Nov 2015, at 03:54, ryguy7272 wrote:
>
> Now, how can I count specific words like 'fraud' and 'lawsuit'?

- convert the page to plain text
- remove any punctuation
- split into words
- see what words occur
- enumerate all the words and increase a counter for each word

Something like this:

s = """Today we're rounding out our planetary tour with ice giants Uranus and
Neptune. Both have small rocky cores, thick mantles of ammonia, water, and
methane, and atmospheres that make them look greenish and blue. Uranus has a
truly weird rotation and relatively dull weather, while Neptune has clouds and
storms whipped by tremendous winds. Both have rings and moons, with Neptune's
Triton probably being a captured iceball that has active geology."""

import collections

# Lowercase, flatten newlines, drop periods and commas, and turn apostrophes
# into spaces so that "neptune's" splits into "neptune" and "s".
cleaned = s.lower().replace("\n", " ").replace(".", "").replace(",", "").replace("'", " ")
count = collections.Counter(cleaned.split(" "))

for interesting in ("neptune", "and"):
    print("The word '%s' occurs %d times" % (interesting, count[interesting]))

# Outputs:

The word 'neptune' occurs 3 times
The word 'and' occurs 7 times
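The same steps, applied to HTML rather than a literal string, might look
roughly like the sketch below. It is only an illustration under a few
assumptions: it targets Python 3, it uses the standard library's
html.parser in place of Beautiful Soup so that it runs without extra
installs (with bs4 you would likely take the text from soup.get_text()
instead), and the names TextExtractor, count_words and sample_html are made
up for this example. The sample HTML is invented, not taken from the Yahoo
page.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html, wanted):
    """Return {word: count} for each word in `wanted`, ignoring case."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Runs of ASCII letters count as words; punctuation and digits separate them.
    counts = Counter(re.findall(r"[a-z]+", text))
    return {word: counts[word.lower()] for word in wanted}


# Invented sample page, standing in for the downloaded HTML.
sample_html = """<html><head><style>p { color: red; }</style></head>
<body><h1>Headlines</h1>
<p>Investors file lawsuit alleging fraud.</p>
<p>Second LAWSUIT filed; company denies fraud, calls lawsuit baseless.</p>
</body></html>"""

result = count_words(sample_html, ["fraud", "lawsuit", "merger"])
print(result)
```

For a live page you would pass the downloaded HTML as a string (for
example r.text from requests) to count_words. Note that this matches whole
words only, so "lawsuits" is not counted as "lawsuit"; whether that is what
you want depends on your word list.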