Re: [Tutor] reading web page with BeautifulSoup

2012-12-13 Thread Alan Gauld

On 13/12/12 01:47, Ed Owens wrote:

> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
>
> I can copy the url from the error message into my browser and get
> the page.


Browsers have evolved to make all sorts of intelligent guesses about
what the true URL is, based on what the user types in. They try
prepending various schemes and adding prefixes and suffixes (for
example, you can usually leave out the www part or the .com part).

urlopen makes no such assumptions: you must provide the full URL
(with the exception of the port), including the scheme (http, https,
ftp, etc.).
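
As a rough illustration of the point above (not something from the
original thread), a script can add the scheme itself before calling
urlopen. The helper name ensure_scheme and the default of 'http' are
assumptions made for this sketch:

```python
def ensure_scheme(url, default_scheme='http'):
    """Return url unchanged if it already names a scheme, else prepend one.

    Hypothetical helper: it only checks for '://' and does not validate
    the rest of the URL.
    """
    if '://' in url:
        return url
    return default_scheme + '://' + url


# The URL from the thread, as typed without a scheme.
fixed = ensure_scheme('w1.weather.gov/obhistory/KDCA.html')
print(fixed)
```

The result can then be passed to urlopen as usual. Guessing 'http' is
a much cruder version of what a browser does, so spelling out the
scheme by hand is still the safer habit.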

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] reading web page with BeautifulSoup

2012-12-12 Thread Ed Owens


On 12/12/12 9:03 PM, Dave Angel wrote:
> On 12/12/2012 08:47 PM, Ed Owens wrote:
>> >>> from urllib2 import urlopen
>> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
>> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 126, in urlopen
>> return _opener.open(url, data, timeout)
>>   File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 386, in open
>> protocol = req.get_type()
>>   File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 248, in get_type
>> raise ValueError, "unknown url type: %s" % self.__original
>> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
>>
>> Can anyone see what I'm doing wrong here?  I have bs4 and urllib2
>> imported, and get the above error when trying to read that page.  I
>> can copy the url from the error message into my browser and get the page.
>
> Like the error says, unknown type.  Prepend the type of the url, and it
> should work fine:
>
> page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')




Yep, that was it.  Thanks for the help.  Now on to fight with BeautifulSoup.


Ed



Re: [Tutor] reading web page with BeautifulSoup

2012-12-12 Thread Dave Angel
On 12/12/2012 08:47 PM, Ed Owens wrote:
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "", line 1, in 
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 126, in urlopen
> return _opener.open(url, data, timeout)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 386, in open
> protocol = req.get_type()
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 248, in get_type
> raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here?  I have bs4 and urllib2
> imported, and get the above error when trying to read that page.  I
> can copy the url from the error message into my browser and get the page.

Like the error says, unknown type.  Prepend the type of the url, and it
should work fine:

page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')
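
A small sketch of why the original call failed before it ever reached
the network. It assumes Python 3, where the urllib2 code lives in
urllib.request; the helper name scheme_error is made up for
illustration and is not part of any library:

```python
from urllib.request import Request  # Python 3 home of the urllib2 machinery


def scheme_error(url):
    """Return the ValueError message for url, or None if a scheme was found."""
    try:
        Request(url)  # only parses the URL; nothing is sent over the network
    except ValueError as exc:
        return str(exc)
    return None


# Without a scheme, the URL is rejected while the request is being built.
print(scheme_error('w1.weather.gov/obhistory/KDCA.html'))

# With 'http://' in front, parsing succeeds and no error is reported.
print(scheme_error('http://w1.weather.gov/obhistory/KDCA.html'))
```

Since the check happens while the URL is parsed, the error says
nothing about whether the site itself is reachable.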



-- 

DaveA



Re: [Tutor] reading web page with BeautifulSoup

2012-12-12 Thread शंतनू

On 13/12/12 12:47 PM, Ed Owens wrote:
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "", line 1, in 
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 126, in urlopen
> return _opener.open(url, data, timeout)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 386, in open
> protocol = req.get_type()
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 248, in get_type
> raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here?  I have bs4 and urllib2
> imported, and get the above error when trying to read that page.  I
> can copy the url from the error message into my browser and get the page.

You may try the URL with 'http://' or 'https://' in front of 'w1.'.

HTH.

-- 
शंतनू