in 764257 20160823 081439 Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>On Tuesday 23 August 2016 10:28, adam.j.k...@gmail.com wrote:
>
>> Hi,
>>
>> I am hoping someone is able to help me.
>>
>> Is there a way to pull as much raw data from a website as possible? The
>> webpage that I am looking for is as follows:
>>
>> http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx?LocationType=LGA&State=QLD&LgaID=632
>>
>> The main variable that is important is the "632" at the end; adjusting
>> this changes the postcode. Each postcode contains a large amount of data.
>> Is there a way for all of this to be exported into an Excel document?
>
>Ideally, the web site itself will offer an Excel download option. If it
>doesn't, you may be able to screen-scrape the data yourself, but:
>
>(1) it may be against the terms of service of the website;
>(2) it may be considered unethical, possibly copyright infringement,
>or (worst case) even illegal;
>(3) especially if you're thinking of selling the data;
>(4) at the very least, unless you take care not to abuse the service,
>it may be rude and the website may even block your access.
>
>There are many tutorials and examples of "screen scraping" or "web scraping" on
>the internet -- try reading them. It's not something I personally have any
>experience with, but I expect that the process goes something like this:
>
>- connect to the website;
>- download the particular page you want;
>- grab the data that you care about;
>- remove HTML tags and extract just the bits needed;
>- write them to a CSV file.
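To flesh those steps out, here's a rough sketch in Python using the third-party
requests and beautifulsoup4 packages (neither is mentioned in the original post,
and I haven't inspected the page's actual markup, so the "grab every table row"
part is a guess you will need to adjust):

import csv
import requests
from bs4 import BeautifulSoup

BASE = ("http://www.homepriceguide.com.au/Research/"
        "ResearchSeeFullList.aspx?LocationType=LGA&State=QLD&LgaID={}")

def scrape_lga(lga_id):
    # Download the page for one LGA id (e.g. 632).
    response = requests.get(BASE.format(lga_id), timeout=30)
    response.raise_for_status()

    # Parse the HTML and pull out table rows.  Grabbing every <tr> is a
    # guess -- inspect the real page and narrow this to the table you want.
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows

def write_csv(rows, filename):
    # Excel opens CSV files directly, which covers the "export" part.
    with open(filename, "w", newline="") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    write_csv(scrape_lga(632), "lga_632.csv")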
wget does the hard part.
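If you'd rather let wget handle the downloading, a minimal sketch (the range of
LgaID values below is invented; wget must be on your PATH):

import subprocess

# Fetch a few LGA pages with wget for later parsing; ids here are made up.
for lga_id in (630, 631, 632):
    url = ("http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx"
           "?LocationType=LGA&State=QLD&LgaID=%d" % lga_id)
    subprocess.run(["wget", "-O", "lga_%d.html" % lga_id, url], check=True)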