Re: Parsing html :: output to comma delimited

2005-07-17 Thread samuels
Thanks for the replies.  I'll post here when/if I finally get it
working.

So, now I know how to extract the links from the big page, and how to
extract the text from an individual page.  What I still need to find out
is how to run the script on each individual page automatically, and get
the output in comma delimited format.  Thanks for solving the two
problems though :)

-Sam
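
A minimal sketch of that remaining step, assuming Python 3 and only the
standard library.  The names fetch, write_rows and extract are made up
for illustration; extract stands for whatever function turns one page's
text into a list of fields, and the latin-1 encoding is a guess about
the site.  The demo at the bottom uses a fake page so it runs without
touching the network.

```python
import csv
import io
import urllib.request


def fetch(url):
    """Download one page and return it as text (latin-1 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("latin-1")


def write_rows(urls, extract, out, fetch_page=fetch, delimiter=","):
    """Fetch every URL, extract one row of fields from it, and write it out."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for url in urls:
        writer.writerow(extract(fetch_page(url)))


# Offline demo: a dict stands in for the web site (hypothetical data).
pages = {"http://example.invalid/a": "Acme Rentals|12 Main St., Suite 4|5550100"}
out = io.StringIO()
write_rows(pages, lambda text: text.split("|"), out, fetch_page=pages.get)
print(out.getvalue())
```

Passing delimiter="\t" should give the tab delimited variant.  The csv
module only quotes a field when it has to (for example when the field
itself contains a comma), so the quoting may differ from a hand-written
line.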

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing html :: output to comma delimited

2005-07-16 Thread Paul McGuire
Pyparsing includes a sample program for extracting URLs from web pages.
 You should be able to adapt it to this problem.

Download pyparsing at http://pyparsing.sourceforge.net

-- Paul
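
For comparison, here is a rough standard-library sketch of the same
link-extraction idea.  It is not the pyparsing sample program, just
html.parser from Python 3.  LinkCollector and store_links are
hypothetical names, and the sample markup is an assumption about what
the list page looks like (relative links to store.asp).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def store_links(html, base_url):
    """Return only the links that point at store.asp pages."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if "store.asp?id=" in link]


# Made-up sample markup, standing in for the real list page.
sample = '''
<a href="store.asp?id=907%2F272%2D4425">United Rentals Inc.</a>
<A HREF="about.asp">About</A>
'''
print(store_links(sample, "http://www.rentalhq.com/fulllist.asp"))
```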



Re: Parsing html :: output to comma delimited

2005-07-16 Thread William Park
samuels <[EMAIL PROTECTED]> wrote:
> Hello All,
> 
> I am a total python newbie, and I need help writing a script.
> 
> This is what I want to do:
> 
> There is a list of links at http://www.rentalhq.com/fulllist.asp.  Each
> link goes to a page like,
> http://www.rentalhq.com/store.asp?id=907%2F272%2D4425, that contains a
> company name, address, phone, and fax.  I want to extract each page, parse
> this information, and export it to a comma delimited text file, or tab
> delimited.  The important information in each page is:
> 
>  style="border-collapse: collapse" bordercolor="#11" width="100%"
> id="AutoNumber1">
>   
> 
> 
> United Rentals Inc.
> 
> 
> 3401 Commercial Dr. 
> Anchorage AK, 99501-3024
> 
> 
>  href="http://maps.google.com/maps?q=3401+Commercial+Dr%2E Anchorage AK
> 99501-3024 ">
> 
>  border="0">
> 
> 
>   
>   
> 
>  
> 
> 
> Phone - 907/272-4425
>  Fax - 907/272-9683 
> 
> So from that I want output like :
> 
> United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"
> 
> or
> 
> United Rentals Inc. 3401 Commercial Dr. Anchorage   AK  995013024   9072724425  9072729683
> 
> 
> I have been messing around with Beautiful Soup
> (http://www.crummy.com/software/BeautifulSoup/index.html) but haven't
> gotten very far, especially because the HTML is so sloppy.
> 
> Any help would be really appreciated!  Just point me in the right
> direction, what to use, examples...  Thanks!

I'm sure others will give a proper Python solution.  But here, the shell
is not a bad tool.

lynx -dump 'http://www.rentalhq.com/store.asp?id=907%2F272%2D4425' | \
awk '/Return to List of Rental Stores/,/To reserve an item/' | \
sed -n -e '3p;5p;10p;11p'

gives me

United Rentals Inc.
3401 Commercial Dr.  Anchorage AK, 99501-3024
   Phone - 907/272-4425
   Fax - 907/272-9683
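
From those four lines, getting to the requested fields is mostly string
splitting.  A rough Python 3 sketch follows; parse_store is a made-up
name, and it assumes the street and city are separated by two or more
spaces and that the state is a two-letter code followed by a comma, as
in the output above.  Other pages may not follow that layout.

```python
import re


def parse_store(lines):
    """Split the four lines of text into the seven output fields."""
    name, address, phone, fax = (line.strip() for line in lines)
    # Assumption: two or more spaces separate the street from the city part.
    street, city_part = re.split(r"\s{2,}", address, maxsplit=1)
    # "Anchorage AK, 99501-3024" -> city, two-letter state, ZIP
    match = re.match(r"(.+?)\s+([A-Z]{2}),\s*([\d-]+)$", city_part)
    if match is None:
        raise ValueError("unexpected address format: %r" % address)
    city, state, zip_code = match.groups()

    def digits(text):
        # Keep only the digits, e.g. "Phone - 907/272-4425" -> "9072724425".
        return re.sub(r"\D", "", text)

    return [name, street, city, state,
            digits(zip_code), digits(phone), digits(fax)]


fields = parse_store([
    "United Rentals Inc.",
    "3401 Commercial Dr.  Anchorage AK, 99501-3024",
    "   Phone - 907/272-4425",
    "   Fax - 907/272-9683",
])
print(",".join(fields))
```

The final join is only safe here because none of these fields contains
a comma; for real data the csv module is the safer way to write rows.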

-- 
William Park <[EMAIL PROTECTED]>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
   http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
  http://freshmeat.net/projects/bashdiff/


Parsing html :: output to comma delimited

2005-07-16 Thread samuels
Hello All,

I am a total python newbie, and I need help writing a script.

This is what I want to do:

There is a list of links at http://www.rentalhq.com/fulllist.asp.  Each
link goes to a page like,
http://www.rentalhq.com/store.asp?id=907%2F272%2D4425, that contains a
company name, address, phone, and fax.  I want to extract each page, parse
this information, and export it to a comma delimited text file, or tab
delimited.  The important information in each page is:


  


United Rentals Inc.
3401 Commercial Dr.
Anchorage AK, 99501-3024
http://maps.google.com/maps?q=3401+Commercial+Dr%2E Anchorage AK 99501-3024
Phone - 907/272-4425
Fax - 907/272-9683

So from that I want output like :

United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"

or

United Rentals Inc. 3401 Commercial Dr. Anchorage   AK  995013024   9072724425  9072729683


I have been messing around with Beautiful Soup
(http://www.crummy.com/software/BeautifulSoup/index.html) but haven't
gotten very far, especially because the HTML is so sloppy.

Any help would be really appreciated!  Just point me in the right
direction, what to use, examples...  Thanks!

-Sam
