Re: Parsing html :: output to comma delimited
Thanks for the replies. I'll post here when/if I finally get it working.

So now I know how to extract the links from the big page and how to extract the text from an individual page. What I still need to find out is how to run the script on each individual page automatically and get the output in comma-delimited format.

Thanks for solving the two problems though :)

-Sam
--
http://mail.python.org/mailman/listinfo/python-list
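For the remaining step (running the same extraction over every page and writing comma-delimited output), here is a rough sketch using only the Python 3 standard library. The names `fetch`, `write_records` and `parse_page` are made up for illustration; `parse_page` stands in for whatever extraction code you already have, and the page text below is fake so the example runs without touching the site.

```python
import csv
import io
import sys
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (assumes Latin-1 decoding is acceptable)."""
    with urlopen(url) as response:
        return response.read().decode("latin-1")


def write_records(urls, parse_page, out, fetch_page=fetch, delimiter=","):
    """Fetch each URL, parse it into a list of fields, and write one row per page."""
    writer = csv.writer(out, delimiter=delimiter)
    for url in urls:
        try:
            row = parse_page(fetch_page(url))
        except Exception as exc:
            # One malformed page should not stop the whole run.
            print("skipping %s: %s" % (url, exc), file=sys.stderr)
            continue
        writer.writerow(row)


# Offline demonstration: a dict of fake pages replaces the network.
fake_pages = {
    "http://example.invalid/store1":
        "United Rentals Inc.|3401 Commercial Dr.|Anchorage AK, 99501-3024",
}
out = io.StringIO()
write_records(fake_pages, lambda text: text.split("|"), out,
              fetch_page=fake_pages.get)
print(out.getvalue())
```

For a real run, pass a file opened with `open("stores.csv", "w", newline="")` as `out` and leave `fetch_page` at its default; `delimiter="\t"` gives tab-delimited output. The `csv` module quotes any field that itself contains the delimiter, so the address line with a comma in it does not break the columns.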
Re: Parsing html :: output to comma delimited
Pyparsing includes a sample program for extracting URLs from web pages. You should be able to adapt it to this problem. Download pyparsing at http://pyparsing.sourceforge.net

-- Paul
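For comparison, the same URL-extraction idea can be sketched with the standard library's `html.parser` instead of pyparsing. This is not the pyparsing sample program, only a stand-in for it. The class and function names are hypothetical, the sample HTML is invented, and the filter assumes that every store link contains `store.asp?id=`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StoreLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags that point at store.asp pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: store links are the ones containing "store.asp?id=".
        if href and "store.asp?id=" in href:
            self.links.append(urljoin(self.base_url, href))


def extract_store_links(html, base_url="http://www.rentalhq.com/fulllist.asp"):
    """Return the store-page URLs found in the given HTML text."""
    parser = StoreLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Invented sample markup; the real list page may be structured differently.
sample = '''
<a href="store.asp?id=907%2F272%2D4425">United Rentals Inc.</a>
<a href="about.asp">About</a>
'''
print(extract_store_links(sample))
```

Relative links are resolved against the list page's URL with `urljoin`, so the result can be fetched directly.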
Re: Parsing html :: output to comma delimited
samuels <[EMAIL PROTECTED]> wrote:
> Hello All,
>
> I am a total Python newbie, and I need help writing a script.
>
> This is what I want to do:
>
> There is a list of links at http://www.rentalhq.com/fulllist.asp. Each
> link goes to a page like
> http://www.rentalhq.com/store.asp?id=907%2F272%2D4425, that contains a
> company name, address, phone, and fax. I want to extract each page, parse
> this information, and export it to a comma-delimited or tab-delimited
> text file. The important information in each page is:
>
> United Rentals Inc.
> 3401 Commercial Dr.
> Anchorage AK, 99501-3024
> Phone - 907/272-4425
> Fax - 907/272-9683
>
> So from that I want output like:
>
> United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"
>
> or
>
> United Rentals Inc. 3401 Commercial Dr. Anchorage AK 995013024 9072724425 9072729683
>
> I have been messing around with Beautiful Soup
> (http://www.crummy.com/software/BeautifulSoup/index.html) but haven't
> gotten very far (especially because the HTML is so sloppy).
>
> Any help would be really appreciated! Just point me in the right
> direction, what to use, examples... Thanks!

I'm sure others will give a proper Python solution. But here, shell is not a bad tool.

    lynx -dump 'http://www.rentalhq.com/store.asp?id=907%2F272%2D4425' | \
        awk '/Return to List of Rental Stores/,/To reserve an item/' | \
        sed -n -e '3p;5p;10p;11p'

gives me

    United Rentals Inc.
    3401 Commercial Dr.
    Anchorage AK, 99501-3024
    Phone - 907/272-4425
    Fax - 907/272-9683

--
William Park <[EMAIL PROTECTED]>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
           http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
          http://freshmeat.net/projects/bashdiff/
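To get from that text to the comma-delimited fields asked for, a small Python 3 sketch follows. It assumes the text arrives as exactly five lines in the order name, street, "City ST, ZIP", phone, fax; that layout is a guess based on the one sample page, and `digits_only` and `lines_to_record` are hypothetical helper names.

```python
import re


def digits_only(text):
    """Drop everything except digits, e.g. '907/272-4425' -> '9072724425'."""
    return re.sub(r"\D", "", text)


def lines_to_record(lines):
    """Turn the five text lines of one store into a flat list of fields.

    Assumed order: name, street, 'City ST, ZIP', 'Phone - ...', 'Fax - ...'.
    """
    name, street, city_line, phone_line, fax_line = [l.strip() for l in lines]
    # 'Anchorage AK, 99501-3024' -> ('Anchorage', 'AK', '99501-3024')
    match = re.match(r"(.+?)\s+([A-Z]{2}),?\s+([\d-]+)$", city_line)
    if match is None:
        raise ValueError("unexpected city/state/zip line: %r" % city_line)
    city, state, zip_code = match.groups()
    return [name, street, city, state, digits_only(zip_code),
            digits_only(phone_line), digits_only(fax_line)]


sample_lines = [
    "United Rentals Inc.",
    "3401 Commercial Dr.",
    "Anchorage AK, 99501-3024",
    "Phone - 907/272-4425",
    "Fax - 907/272-9683",
]
print(",".join(lines_to_record(sample_lines)))
```

Pages that do not match the assumed layout raise a `ValueError` rather than producing a misaligned row, which makes them easier to spot and handle by hand.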
Parsing html :: output to comma delimited
Hello All,

I am a total Python newbie, and I need help writing a script.

This is what I want to do:

There is a list of links at http://www.rentalhq.com/fulllist.asp. Each link goes to a page like http://www.rentalhq.com/store.asp?id=907%2F272%2D4425, that contains a company name, address, phone, and fax. I want to extract each page, parse this information, and export it to a comma-delimited or tab-delimited text file. The important information in each page is:

United Rentals Inc.
3401 Commercial Dr.
Anchorage AK, 99501-3024
Phone - 907/272-4425
Fax - 907/272-9683

So from that I want output like:

United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"

or

United Rentals Inc. 3401 Commercial Dr. Anchorage AK 995013024 9072724425 9072729683

I have been messing around with Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/index.html) but haven't gotten very far (especially because the HTML is so sloppy).

Any help would be really appreciated! Just point me in the right direction, what to use, examples... Thanks!

-Sam