Re: [Tutor] python module to search a website

2011-02-27 Thread Alan Gauld

vineeth vineethrak...@gmail.com wrote

> looking for scraping. I am looking to obtain the html page that my
> query is going to return.


I'm still not completely sure what you mean.
What query are you talking about? The http GET request?
Or a query transaction on the remote site?


> Just like when you type in a site like Amazon you
> get a bunch of product listing


When I visit Amazon I get a home page which has
a bunch of products on it. Those products are provided
by Amazon's web application and I have no control over it.
If I type a string into the search box, Amazon's app goes
off to search their database and returns a bunch of links.
Again I have no control over which links it returns;
that is done by Amazon's application logic.


> the module has to search the website and
> return the html link.


It is impossible for any Python module to search a
remote website; that can only be done by code on
that website's server. The best a Python module could
do would be to initiate the search by posting the
appropriate search string. But that is just standard
html parsing and urllib.
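
To make "initiate the search" a bit more concrete, here is a minimal sketch in Python 3 (where urllib2 became urllib.request). The site address, the /search path and the "q" parameter name are all made up for illustration; a real site defines its own. The searching itself still happens on the server, and this only builds the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical search endpoint; a real site defines its own URL and parameters.
SEARCH_URL = "http://www.example.com/search"


def build_search_request(term):
    """Build (but do not send) a GET request carrying the search string."""
    query = urlencode({"q": term})  # "q" is an assumed parameter name
    return Request(SEARCH_URL + "?" + query)


request = build_search_request("python books")
print(request.full_url)
# To send it, pass the request to urllib.request.urlopen() and read the
# response; what comes back is the page the site's own search code generated.
```

Whatever page comes back is still whatever the site's application decides to return; the client only chooses the search string.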

If I understand what you are asking for then I think
it is impossible. And I suspect you are a bit confused
about how web sites work. As a user of a web site
you are reliant on the functions provided by the server.

If the web site is purely static, like my tutorial for
example, you could do a search if you knew the
file structure and had access to the folders where
the html is stored, but when the pages are created
dynamically, like Amazon, Ebay etc then it is
impossible to search it. You would need access
to their database.
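
As a rough illustration of the static case only: if the html files sit in a folder you can read, a search is just a walk over those files. The function name and the case-insensitive matching are assumptions made for this sketch, not a description of any particular site.

```python
import os


def search_static_site(root, term):
    """Return paths of .html files under root whose text contains term."""
    matches = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue  # only look at html pages
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as page:
                if term.lower() in page.read().lower():
                    matches.append(path)
    return sorted(matches)
```

Calling search_static_site("/path/to/site", "python") would list the pages that mention the word. Nothing like this is possible for a dynamically generated site, because the pages do not exist as files until the server builds them.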

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] python module to search a website

2011-02-26 Thread vineeth

Hello all,

I am looking for a Python module to search a website and extract
the URL.


For example, I found a module for Amazon named amazonproduct.
Its API extracts the data based on the query, and it even
parses the URL data. I am looking for similar query-search Python
modules for other websites like Amazon.


Any help is appreciated.

Thank You
Vin


Re: [Tutor] python module to search a website

2011-02-26 Thread Corey Richardson
On 02/26/2011 10:11 PM, vineeth wrote:
> Hello all,
>
> I am looking for a Python module to search a website and extract
> the URL.

What website, what is it searching for, and what URL is it looking for?

 
> For example, I found a module for Amazon named amazonproduct.
> Its API extracts the data based on the query, and it even
> parses the URL data. I am looking for similar query-search Python
> modules for other websites like Amazon.

The only module I found for amazon-product was a python interface to
Amazon's advertising API. What data does it extract, what query, and
which URL does it parse? From what I found that module uses the API to
search the website, a service provided by Amazon and not something
Python is doing itself.

You may want to look into urlparse and urllib2, for parsing URLs and
opening websites respectively.

http://docs.python.org/library/urlparse.html
http://docs.python.org/library/urllib2.html
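
A small sketch of the URL-parsing half, written for Python 3, where urlparse was renamed urllib.parse (and urllib2 became urllib.request). The URL below is made up purely to show what the parser returns.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up URL, used only to show what the parser gives back.
url = "http://www.example.com/search?q=python&page=2"

parts = urlsplit(url)
print(parts.netloc)  # the host name
print(parts.path)    # the path on the server

# parse_qs turns the query string into a dict mapping names to lists of values
params = parse_qs(parts.query)
print(params)
```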

If that isn't what you're looking for, you'll need to be a bit more
descriptive.

If you are going to be parsing the HTML and then searching for specific
elements you might look into BeautifulSoup.
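
To give an idea of what "searching for specific elements" means, here is a sketch that pulls the link targets out of a page. Note that it uses the standard library's html.parser rather than BeautifulSoup, only so that it runs without installing anything; BeautifulSoup is more forgiving and more convenient. The class name and the sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up HTML standing in for a page you have already downloaded.
page = """
<html><body>
<a href="http://www.example.com/item/1">First item</a>
<a name="top">No link here</a>
<a href="/item/2">Second item</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```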

-- 
Corey Richardson


Re: [Tutor] python module to search a website

2011-02-26 Thread Bill Allen
On Sat, Feb 26, 2011 at 21:11, vineeth vineethrak...@gmail.com wrote:

> Hello all,
>
> I am looking for a Python module to search a website and extract
> the URL.
>
> For example, I found a module for Amazon named amazonproduct. Its
> API extracts the data based on the query, and it even parses the URL
> data. I am looking for similar query-search Python modules for other
> websites like Amazon.
>
> Any help is appreciated.
>
> Thank You
> Vin

I am not sure what url you are trying to extract, or from where, but I can
give you an example of basic web scraping if that is your aim.

The following works for Python 2.x.

# This one module gives you the methods needed to read the html from a
# webpage
import urllib

# Set a variable to the needed website
mypath = "http://some_website.com"

# Read all the html data from the page into a list of lines, then go
# through it looking for urls
mylines = urllib.urlopen(mypath).readlines()
for item in mylines:
    if "http://" in item:
        # ...do something with the url that was found in the page html...
        # ...etc...
        pass
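
For anyone on Python 3, where urllib.urlopen no longer exists, roughly the same idea could be written as below. This is a sketch, not a drop-in replacement: the two function names are made up, the page is assumed to be UTF-8, and the sample lines stand in for a real download so the example runs without a network connection.

```python
from urllib.request import urlopen


def lines_with_urls(lines):
    """Return the lines that contain an http:// or https:// URL."""
    return [line for line in lines if "http://" in line or "https://" in line]


def fetch_lines(address):
    """Download a page and return its html as a list of text lines."""
    with urlopen(address) as response:
        # Assumes UTF-8; bytes that cannot be decoded are replaced.
        return response.read().decode("utf-8", errors="replace").splitlines()


# Sample lines standing in for fetch_lines("http://some_website.com")
sample = [
    '<a href="http://some_website.com/page1">one</a>',
    "<p>no link on this line</p>",
    '<img src="https://some_website.com/pic.png">',
]
for item in lines_with_urls(sample):
    print(item)
```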


--Bill


Re: [Tutor] python module to search a website

2011-02-26 Thread vineeth

Hi Bill,

Thanks for the reply. I know how the urllib module works; I am not
looking for scraping. I am looking to obtain the html page that my query
is going to return. Just as when you type a query into a site like Amazon
you get a bunch of product listings, the module has to search the website
and return the html link. I can of course scrape the information from
that link.


Thanks
Vin
