Re: [Tutor] reading web page with BeautifulSoup
On 13/12/12 01:47, Ed Owens wrote:

> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
>
> copy the url from the error message into my browser and get the page.

Browsers have evolved to make all sorts of intelligent guesses about what the true URL is, based on what the user types in. They try prepending various types and adding prefixes and suffixes (for example, you can usually miss out the www part or the .com part).

urlopen makes no such assumptions: you must provide the full URL (with the exception of the port), including the type (ftp, mail, http, etc.).

HTH

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
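To make Alan's point concrete, one of the guesses a browser makes can be imitated in a few lines. This is a Python 3 sketch (in Python 3, urllib2's functionality was split into `urllib.request` and `urllib.parse`), and `ensure_scheme` is a hypothetical helper, not part of any library. It only prepends a default scheme when `urlsplit` finds none, which is a heuristic: input such as `localhost:8080/page` is read as having the scheme `localhost`.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it names a scheme, else prepend default_scheme.

    Hypothetical helper imitating one browser guess. Heuristic only:
    'localhost:8080/page' is parsed as having the scheme 'localhost'.
    """
    if urlsplit(url).scheme:
        return url
    return "%s://%s" % (default_scheme, url)


if __name__ == "__main__":
    print(ensure_scheme("w1.weather.gov/obhistory/KDCA.html"))
    # http://w1.weather.gov/obhistory/KDCA.html
```

When you already know the full URL, typing it out explicitly is simpler and less error-prone than relying on a guess like this.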
Re: [Tutor] reading web page with BeautifulSoup
On 12/12/12 9:03 PM, Dave Angel wrote:
> On 12/12/2012 08:47 PM, Ed Owens wrote:
>> >>> from urllib2 import urlopen
>> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>>     return _opener.open(url, data, timeout)
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>>     protocol = req.get_type()
>>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>>     raise ValueError, "unknown url type: %s" % self.__original
>> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
>>
>> Can anyone see what I'm doing wrong here? I have bs4 and urllib2
>> imported, and get the above error when trying to read that page. I
>> can copy the url from the error message into my browser and get the page.
>
> Like the error says, unknown type. Prepend the type of the url, and it
> should work fine:
>
> page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')

Yep, that was it. Thanks for the help. Now on to fight with BeautifulSoup.

Ed
Re: [Tutor] reading web page with BeautifulSoup
On 12/12/2012 08:47 PM, Ed Owens wrote:
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>     protocol = req.get_type()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here? I have bs4 and urllib2
> imported, and get the above error when trying to read that page. I
> can copy the url from the error message into my browser and get the page.

Like the error says, unknown type. Prepend the type of the url, and it
should work fine:

page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')

--
DaveA
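DaveA's fix can be checked without touching the network. The sketch below assumes Python 3, where `urlopen` and `Request` live in `urllib.request`; there the URL type (scheme) is validated as soon as a `Request` object is built, so a bare host name raises `ValueError` immediately. `urlopen_accepts` and `fetch` are hypothetical helper names used only for illustration.

```python
import urllib.request

BARE = "w1.weather.gov/obhistory/KDCA.html"  # the URL as first typed
FULL = "http://" + BARE                      # DaveA's fix: prepend the type


def urlopen_accepts(url):
    """True if urllib.request can work out the URL type (scheme) of url.

    Only builds a Request object; no connection is made.
    """
    try:
        urllib.request.Request(url)
    except ValueError:  # e.g. "unknown url type: ..."
        return False
    return True


def fetch(url, timeout=10):
    """Download url and return the body as bytes (performs real network I/O)."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        return page.read()


if __name__ == "__main__":
    print(urlopen_accepts(BARE))  # False
    print(urlopen_accepts(FULL))  # True
    req = urllib.request.Request(FULL)
    print(req.type, req.host, req.selector)  # http w1.weather.gov /obhistory/KDCA.html
```

`fetch(FULL)` returns raw bytes, which can then be handed to BeautifulSoup as the next step.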
Re: [Tutor] reading web page with BeautifulSoup
On 13/12/12 12:47 PM, Ed Owens wrote:
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>     protocol = req.get_type()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here? I have bs4 and urllib2
> imported, and get the above error when trying to read that page. I
> can copy the url from the error message into my browser and get the page.

You may try the URL beginning with 'http://' or 'https://' rather than with 'w1.'.

HTH.

--
शंतनू
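The suggestion of trying both 'http://' and 'https://' could be automated roughly as follows. This is a Python 3 sketch, not a standard API: `open_with_fallback` is a hypothetical name, trying https before http is an assumption, and the `opener` parameter exists only so the logic can be exercised without a network connection.

```python
import urllib.request


def open_with_fallback(bare_url, schemes=("https", "http"),
                       opener=urllib.request.urlopen, timeout=10):
    """Try bare_url under each scheme in turn and return the first success.

    URLError, HTTPError and socket timeouts are all subclasses of OSError,
    so catching OSError lets the next scheme be tried. If every attempt
    fails, the last error is re-raised.
    """
    if not schemes:
        raise ValueError("need at least one scheme to try")
    last_error = None
    for scheme in schemes:
        url = "%s://%s" % (scheme, bare_url)
        try:
            return opener(url, timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    def echo(url, timeout):
        # Offline stand-in for urlopen: just hands back the URL it was given.
        return url

    print(open_with_fallback("w1.weather.gov/obhistory/KDCA.html", opener=echo))
    # https://w1.weather.gov/obhistory/KDCA.html
```

With the default `opener`, the return value is the same response object `urlopen` gives you, so it can be read exactly as in DaveA's corrected call.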