Re: [Tutor] python module to search a website
vineeth vineethrak...@gmail.com wrote:

> I am not looking for scraping. I am looking to obtain the html page
> that my query is going to return.

I'm still not completely sure what you mean. What query are you talking about? The HTTP GET request? Or a query transaction on the remote site?

> Just like when you type in a site like Amazon you get a bunch of
> product listing

When I visit Amazon I get a home page which has a bunch of products on it. Those products are provided by Amazon's web application and I have no control over it. If I type a string into the search box, Amazon's app goes off to search their database and returns a bunch of links. Again, I have no control over which links it returns; that is done by Amazon's application logic.

> the module has to search the website and return the html link.

It is impossible for any Python module to search a remote website; that can only be done by code on that website's server. The best a Python module could do would be to initiate the search by posting the appropriate search string. But that is just standard html parsing and urllib.

If I understand what you are asking for then I think it is impossible. And I suspect you are a bit confused about how web sites work. As a user of a web site you are reliant on the functions provided by the server. If the web site is purely static, like my tutorial for example, you could do a search if you knew the file structure and had access to the folders where the html is stored. But when the pages are created dynamically, as on Amazon, Ebay etc., it is impossible to search the site yourself. You would need access to their database.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
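As a rough illustration of what "initiating the search" could look like, here is a minimal sketch in Python 3 (the thread dates from the Python 2 era, where the equivalent functions live in urllib and urllib2). The host www.example.com, the /search path and the q field name are all hypothetical; a real site defines its own form fields, and the search itself still runs on the server.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, term, field="q", use_post=False):
    """Build (but do not send) a request that submits a search term.

    base_url and field are placeholders: the real values depend on the
    search form of the site in question.
    """
    query = urlencode({field: term})
    if use_post:
        # POST: the encoded form data travels in the request body
        return Request(base_url, data=query.encode("ascii"))
    # GET: the encoded form data is appended to the URL
    return Request(base_url + "?" + query)


get_req = build_search_request("http://www.example.com/search", "python books")
post_req = build_search_request(
    "http://www.example.com/search", "python books", use_post=True
)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Passing either object to urllib.request.urlopen() would send it. The server then performs the search and returns whatever HTML it chooses, which is the point made above: the client only submits the query.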
[Tutor] python module to search a website
Hello all,

I am looking for a Python module to search a website and extract the URL.

For example, I found a module for Amazon named amazonproduct; the API does the job of extracting the data based on the query, and it even parses the URL data. I am looking for similar query-search Python modules for other websites like Amazon.

Any help is appreciated.

Thank You
Vin
Re: [Tutor] python module to search a website
On 02/26/2011 10:11 PM, vineeth wrote:

> Hello all,
>
> I am looking for a Python module to search a website and extract
> the URL.

What website, what is it searching for, and what URL is it looking for?

> For example, I found a module for Amazon named amazonproduct; the
> API does the job of extracting the data based on the query, and it
> even parses the URL data. I am looking for similar query-search
> Python modules for other websites like Amazon.

The only module I found for amazon-product was a Python interface to Amazon's advertising API. What data does it extract, what query, and which URL does it parse? From what I found, that module uses the API to search the website. That is a service provided by Amazon, not something Python is doing itself.

You may want to look into urlparse and urllib2, for parsing URLs and opening websites respectively.

http://docs.python.org/library/urlparse.html
http://docs.python.org/library/urllib2.html

If that isn't what you're looking for, you'll need to be a bit more descriptive. If you are going to be parsing the HTML and then searching for specific elements, you might look into BeautifulSoup.

--
Corey Richardson
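To give an idea of what URL parsing provides, here is a small sketch. It assumes Python 3, where the urlparse functions have moved into urllib.parse; the URL itself is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical search-results URL, used only for illustration
url = "http://www.example.com/search?q=python+books&page=2"

parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.scheme)   # protocol part of the URL
print(parts.netloc)   # host name
print(parts.path)     # path on the server
print(params)         # query string decoded into a dict of lists
```

Note that this only takes a URL apart on the client side; it does not search anything on the remote site.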
Re: [Tutor] python module to search a website
On Sat, Feb 26, 2011 at 21:11, vineeth vineethrak...@gmail.com wrote:

> Hello all,
>
> I am looking for a Python module to search a website and extract
> the URL.
>
> For example, I found a module for Amazon named amazonproduct; the
> API does the job of extracting the data based on the query, and it
> even parses the URL data. I am looking for similar query-search
> Python modules for other websites like Amazon.
>
> Any help is appreciated.
>
> Thank You
> Vin

I am not sure what URL you are trying to extract, or from where, but I can give you an example of basic web scraping if that is your aim.

The following works for Python 2.x.

    # This one module gives you the needed methods to read the html from a webpage
    import urllib

    # set a variable to the needed website
    mypath = "http://some_website.com"

    # read all the html data from the page into a variable and then
    # parse through it looking for urls
    mylines = urllib.urlopen(mypath).readlines()
    for item in mylines:
        if "http://" in item:
            # ...do something with the url that was found in the page html...
            # ...etc...
            pass

--Bill
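A line-by-line test for "http://" finds lines rather than URLs, and it misses relative links. As a hedged alternative, the sketch below pulls href values out of anchor tags using the standard library's html.parser. It assumes Python 3 (the example above targets Python 2.x), and both the LinkCollector class and the sample HTML are invented for illustration; BeautifulSoup, mentioned elsewhere in this thread, would do the same job.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for the text that reading a real page would return
sample_html = """
<html><body>
<a href="http://www.example.com/item/1">First item</a>
<p>No link on this line, although it mentions http://www.example.com/</p>
<a href="/item/2">Second item</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

The paragraph that merely mentions a URL is skipped, while the relative link /item/2 is picked up, which a plain substring test would get the other way round.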
Re: [Tutor] python module to search a website
Hi Bill,

Thanks for the reply. I know how the urllib module works; I am not looking for scraping. I am looking to obtain the html page that my query is going to return. Just like when you type in a site like Amazon you get a bunch of product listings, the module has to search the website and return the html link. I can of course scrape the information from that link.

Thanks
Vin

On 02/27/2011 12:04 AM, Bill Allen wrote:

> On Sat, Feb 26, 2011 at 21:11, vineeth vineethrak...@gmail.com wrote:
>
> > Hello all,
> >
> > I am looking for a Python module to search a website and extract
> > the URL.
> >
> > For example, I found a module for Amazon named amazonproduct; the
> > API does the job of extracting the data based on the query, and it
> > even parses the URL data. I am looking for similar query-search
> > Python modules for other websites like Amazon.
> >
> > Any help is appreciated.
> >
> > Thank You
> > Vin
>
> I am not sure what URL you are trying to extract, or from where, but
> I can give you an example of basic web scraping if that is your aim.
>
> The following works for Python 2.x.
>
>     # This one module gives you the needed methods to read the html from a webpage
>     import urllib
>
>     # set a variable to the needed website
>     mypath = "http://some_website.com"
>
>     # read all the html data from the page into a variable and then
>     # parse through it looking for urls
>     mylines = urllib.urlopen(mypath).readlines()
>     for item in mylines:
>         if "http://" in item:
>             # ...do something with the url that was found in the page html...
>             # ...etc...
>             pass
>
> --Bill