Impersonating other browsers...

2005-03-05 Thread sboyle55
So I wrote a quick python program (my first ever) that needs to
download pages off the web.

I'm using urlopen, and it works fine.  But I'd like to be able to
change my browser string from Python-urllib/1.15 to instead
impersonate Internet Explorer.

I know this can be done very easily with Perl, so I'm assuming it's
also easy in Python.  How do I do it?



Re: Impersonating other browsers...

2005-03-05 Thread Diez B. Roggisch
[EMAIL PROTECTED] wrote:

> So I wrote a quick python program (my first ever) that needs to
> download pages off the web.
> 
> I'm using urlopen, and it works fine.  But I'd like to be able to
> change my browser string from Python-urllib/1.15 to instead
> impersonate Internet Explorer.
> 
> I know this can be done very easily with Perl, so I'm assuming it's
> also easy in Python.  How do I do it?

from the urllib docs:

'''
class URLopener([proxies[, **x509]])

 Base class for opening and reading URLs. Unless you need to support opening
objects using schemes other than http:, ftp:, gopher: or file:, you
probably want to use FancyURLopener. 

By default, the URLopener class sends a User-Agent: header of urllib/VVV,
where VVV is the urllib version number. Applications can define their own
User-Agent: header by subclassing URLopener or FancyURLopener and setting
the instance attribute version to an appropriate string value before the
open() method is called. 


The optional proxies parameter should be a dictionary mapping scheme names
to proxy URLs, where an empty dictionary turns proxies off completely. Its
default value is None, in which case environmental proxy settings will be
used if present, as discussed in the definition of urlopen(), above. 


Additional keyword parameters, collected in x509, are used for
authentication with the https: scheme. The keywords key_file and cert_file
are supported; both are needed to actually retrieve a resource at an https:
URL. 

'''
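In other words, something like this should do it (untested sketch; the class
name and user-agent string below are made up, and example.com is just a
placeholder):

import urllib

class IELookalikeOpener(urllib.FancyURLopener):
    # value sent in the User-Agent: header, per the docs quoted above
    version = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

opener = IELookalikeOpener()
f = opener.open("http://www.example.com/")
print f.read()
f.close()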
-- 
Regards,

Diez B. Roggisch


Re: Impersonating other browsers...

2005-03-05 Thread Skip Montanaro

sboyle> I'm using urlopen, and it works fine.  But I'd like to be able
sboyle> to change my browser string from Python-urllib/1.15 to instead
sboyle> impersonate Internet Explorer.

sboyle> I know this can be done very easily with Perl, so I'm assuming
sboyle> it's also easy in Python.  How do I do it?

Easy is in the eye of the beholder I suppose.  It doesn't look as
straightforward as I would have thought.  You can subclass the
FancyURLopener class like so:

import urllib

class MSIEURLopener(urllib.FancyURLopener):
    version = "Internet Exploder"

then set urllib._urlopener to an instance of it:

urllib._urlopener = MSIEURLopener()

After that, urllib.urlopen() should send your new user-agent string with
each request.
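E.g., a quick untested sketch using the subclass above (example.com is just a
placeholder URL):

import urllib

urllib._urlopener = MSIEURLopener()   # an instance, not the class itself
f = urllib.urlopen("http://www.example.com/")  # goes out as "Internet Exploder"
print f.read()
f.close()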

Seems like FancyURLopener should support setting the user agent string
directly.  You can accomplish that with something like this:

class FlexibleUAopener(urllib.FancyURLopener):
    def set_user_agent(self, user_agent):
        # collect any existing User-Agent entries (header name casing varies)
        ua = [(hdr, val) for (hdr, val) in self.addheaders
              if hdr.lower() == "user-agent"]
        while ua:
            self.addheaders.remove(ua.pop())
        # addheader() takes the header name and value as separate arguments
        self.addheader("User-agent", user_agent)

You'd then be able to set the user agent, but have to use your new opener
class directly:

opener = FlexibleUAopener(...)
opener.set_user_agent("Internet Exploder")
f = opener.open(url)
print f.read()

It doesn't look any easier to do this using urllib2.  Seems like a
semi-obvious oversight for both modules.  That suggests few people have ever
desired this capability.
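(For what it's worth, one urllib2 route is to pass the header when building
the Request; untested sketch, with a placeholder URL and the same joke UA
string:

import urllib2

req = urllib2.Request("http://www.example.com/",
                      headers={"User-Agent": "Internet Exploder"})
f = urllib2.urlopen(req)
print f.read()
f.close()

Still not a one-liner, though.)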

Skip


Re: Impersonating other browsers...

2005-03-05 Thread Eric Pederson
Skip Montanaro [EMAIL PROTECTED] wrote

> It doesn't look any easier to do this using urllib2.  Seems like a
> semi-obvious oversight for both modules.  That suggests few people have
> ever desired this capability.


my $.02:

I have trouble believing that few people have desired this, for two reasons:

(1) Some web sites will shut out user agents they do not recognize, to preserve
bandwidth or for other reasons; the right User-Agent ID can be required to get
the data one wants.

(2) It seems like a worthwhile courtesy to identify oneself when spidering or
data scraping, and the User-Agent ID seems like the obvious way to do that. I'd
guess (and like to think) that Python users are generally a little more
concerned with such courtesies than the user populations of some other
languages.

e.g.  Your website might get a hit from:  Mozilla/5.0 (Songzilla MP3 Blog, 
http://songzilla.blogspot.com) Gecko/20041107 Firefox/1.0

And you'll get to decide whether to shut them out or not, but at least it won't 
seem like the black hats are attacking.
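For example, a scraper that announces itself could do something like this
(untested sketch; the UA string is just the made-up example above, and
example.com is a placeholder):

import urllib2

opener = urllib2.build_opener()
opener.addheaders = [("User-agent",
                      "Mozilla/5.0 (Songzilla MP3 Blog, "
                      "http://songzilla.blogspot.com) Gecko/20041107 Firefox/1.0")]
urllib2.install_opener(opener)   # later urllib2.urlopen() calls send this UA
f = urllib2.urlopen("http://www.example.com/")
print f.read()
f.close()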




Eric Pederson
http://www.songzilla.blogspot.com
:::
domainNot="@something.com"
domainIs=domainNot.replace("s","z")
ePrefix="".join([chr(ord(x)+1) for x in "do"])
mailMeAt=ePrefix+domainIs
:::
