Re: BeautifulSoup error

2006-06-16 Thread William Xu
"Serge Orlov" <[EMAIL PROTECTED]> writes:

[...]

> Look at the traceback: you're not calling the BeautifulSoup module! In
> fact, there is no feed method in the current BeautifulSoup
> documentation. Maybe it used to work well, but now it's definitely
> going to fail. As I understand the documentation, you need to write
>
> soup = BeautifulSoup(port)

Ah, yes! Things change! :-)

BeautifulSoup had its own feed() method before 3.0.0; since then, calling
feed() reaches only the one inherited from SGMLParser. As explained in the
changelog,

http://www.crummy.com/software/BeautifulSoup/CHANGELOG.html

Release 3.0.0 (2006/05/28), "Who would not give all else for two p"

Beautiful Soup no longer implements a feed method. You need to pass a
string or a filehandle into the soup constructor, not with feed after
the soup has been created. There is still a feed method, but it's the
feed method implemented by SGMLParser and calling it will bypass
Beautiful Soup and cause problems.
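
To illustrate why calling the inherited feed() bypasses the subclass, here
is a rough analogue. It assumes modern Python 3, since BeautifulSoup 3 and
sgmllib only run on Python 2, so the standard library's
html.parser.HTMLParser stands in for SGMLParser. TitleParser is a
hypothetical class made up for this sketch, not part of Beautiful Soup.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical soup-like class built on top of a stdlib parser."""

    def __init__(self, markup=b"", encoding="utf-8"):
        super().__init__()
        self.in_title = False
        self.title = ""
        if markup:
            # The constructor decodes bytes before handing text to feed().
            if isinstance(markup, bytes):
                markup = markup.decode(encoding)
            self.feed(markup)
            self.close()

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Made-up sample page, encoded the way a server might send it.
page = "<html><head><title>Caf\u00e9</title></head></html>".encode("iso-8859-1")

# Supported path: pass the markup to the constructor.
parser = TitleParser(page, encoding="iso-8859-1")

# Unsupported path: create an empty parser, then call the inherited feed().
# The decoding step in __init__ never runs, so the base class gets raw bytes.
try:
    TitleParser().feed(page)
    feed_error = None
except TypeError as exc:
    feed_error = exc
```

The constructor path yields the decoded title, while the direct feed() call
fails inside the base class, which is the same shape of failure as the
sgmllib.py frame in the traceback.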

Thanks for all the help!

-- 
William

Thrashing is just virtual crashing.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: BeautifulSoup error

2006-06-16 Thread Serge Orlov
William Xu wrote:
> Hi, all,
>
> This piece of code used to work well. I guess the error appeared after
> some upgrade.
>
> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()
> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed

Look at the traceback: you're not calling the BeautifulSoup module! In
fact, there is no feed method in the current BeautifulSoup
documentation. Maybe it used to work well, but now it's definitely
going to fail. As I understand the documentation, you need to write

soup = BeautifulSoup(port)



Re: BeautifulSoup error

2006-06-16 Thread Ben Finney
Slawomir Nowaczyk <[EMAIL PROTECTED]> writes:

> >>> soup.feed( unicode(port,"iso-8859-1") )

Sure, once you have the encoding name. Visit a different URL and you may
get a different encoding, which is the one that should be used instead.
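
A small sketch of what goes wrong with a hard-coded encoding, assuming
modern Python 3 rather than the Python 2.3 used in this thread. The sample
text is made up for illustration.

```python
# A page whose server declares UTF-8 (sample text is made up).
page_bytes = "na\u00efve caf\u00e9".encode("utf-8")

# Hard-coding iso-8859-1 never raises, because every byte value maps to
# some character, but the result is silently wrong for non-ASCII text.
wrong = page_bytes.decode("iso-8859-1")

# Decoding with the encoding the page actually uses gives the intended text.
right = page_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

So the hard-coded call "works" in the sense of not raising, yet each
accented letter comes out as two unrelated characters.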

-- 
 \   "I believe in making the world safe for our children, but not |
  `\our children's children, because I don't think children should |
_o__)  be having sex."  -- Jack Handey |
Ben Finney



Re: BeautifulSoup error

2006-06-16 Thread William Xu
Ben Finney <[EMAIL PROTECTED]> writes:

> William Xu <[EMAIL PROTECTED]> writes:
>
>> >>> import urllib
>> >>> from BeautifulSoup import BeautifulSoup
>> >>> url = 'http://www.google.com'
>> >>> port = urllib.urlopen(url).read()
>
> Gets the data from the HTTP response. (I'm not sure why you call this
> "port".) The data is HTML text encoded to a string of bytes according
> to the character encoding specified in the response header fields.

I thought we could read from and write to a port, like a port in Scheme. :-)

[...]

> Get the character encoding specified in the HTTP response, and decode
> the data to Unicode from that encoding.

How can I do this? I'm afraid I can't figure it out from the manual.

-- 
William

I just uploaded xtoolplaces-1.6. It fixes all bugs but one: It still
coredumps instead of doing something useful.  The upstream author's
e-mail address bounces, Redhat doesn't provide it and I never used it.
-- Sven Rudolph <[EMAIL PROTECTED]>


Re: BeautifulSoup error

2006-06-16 Thread Slawomir Nowaczyk
On Fri, 16 Jun 2006 15:20:48 +1000
Ben Finney <[EMAIL PROTECTED]> wrote:

#> > >>> soup = BeautifulSoup()
#> > >>> soup.feed(port)
#> > Traceback (most recent call last):
#> >   File "<stdin>", line 1, in ?
#> >   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
#> >     self.rawdata = self.rawdata + data
#> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
#> > >>>
#> 
#> Uses the default Python text encoding, 'ascii', when it needs to
#> decode the data in 'port' to Unicode. Some of the data in that
#> object makes no sense in the 'ascii' encoding, so it barfs.

In other words, this works for me:

>>> soup.feed( unicode(port,"iso-8859-1") )
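
The same failure and fix can be reproduced without Beautiful Soup. This is
a sketch assuming modern Python 3, where the decode has to be written out
explicitly, whereas the Python 2.3 in this thread attempted it implicitly
with 'ascii'. The byte string is a made-up stand-in for the downloaded
page.

```python
# Made-up stand-in for the downloaded page: 565 ASCII bytes followed by the
# byte 0xb8, matching the position reported in the traceback.
port = b"a" * 565 + b"\xb8"

try:
    port.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # 'ascii' cannot represent any byte >= 0x80

# iso-8859-1 maps every byte value to a code point, so this always succeeds.
text = port.decode("iso-8859-1")

print(error)
print(repr(text[-1]))
```

Note that success here only means no exception was raised; whether
iso-8859-1 is the right choice still depends on what the server sent.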

-- 
 Best wishes,
   Slawomir Nowaczyk
 ( [EMAIL PROTECTED] )

^[:wq! Crap! Thought I was in vi.



Re: BeautifulSoup error

2006-06-15 Thread William Xu
"Serge Orlov" <[EMAIL PROTECTED]> writes:

[...]

> Upgrading python-beautifulsoup is a good idea, since there were two bug
> fix releases after 3.0.1

I just downloaded the latest version, 3.0.3, from its homepage; it seems
to still have the same problem.

-- 
William

PL/I -- "the fatal disease" -- belongs more to the problem set than to the
solution set.
-- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5


Re: BeautifulSoup error

2006-06-15 Thread Serge Orlov
William Xu wrote:
> Hi, all,
>
> This piece of code used to work well. I guess the error appeared after
> some upgrade.
>
> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()
> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
>     self.rawdata = self.rawdata + data
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
> >>>
>
> Any ideas to solve this?

According to the documentation chapter "Beautiful Soup Gives You Unicode,
Dammit", Beautiful Soup fully supports Unicode, so it's probably a bug.

> version info:
>
> Python 2.3.5 (#2, Mar  7 2006, 12:43:17)
> [GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2
>
> python-beautifulsoup: 3.0.1-1

Upgrading python-beautifulsoup is a good idea, since there were two bug
fix releases after 3.0.1



Re: BeautifulSoup error

2006-06-15 Thread Ben Finney
William Xu <[EMAIL PROTECTED]> writes:

> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()

Gets the data from the HTTP response. (I'm not sure why you call this
"port".) The data is HTML text encoded to a string of bytes according
to the character encoding specified in the response header fields.

> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
>     self.rawdata = self.rawdata + data
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
> >>>

Uses the default Python text encoding, 'ascii', when it needs to
decode the data in 'port' to Unicode. Some of the data in that object
makes no sense in the 'ascii' encoding, so it barfs.

> Any ideas to solve this?

Get the character encoding specified in the HTTP response, and decode
the data to Unicode from that encoding.
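
A minimal sketch of that step, assuming modern Python 3 rather than the
Python 2.3 used in this thread. charset_from_content_type is a hypothetical
helper name, the header value and body are made-up samples, and the
iso-8859-1 fallback is an assumption for responses that declare no charset.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` (an assumption) when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


# Made-up sample response: a header value and a body encoded to match it.
header = "text/html; charset=ISO-8859-2"
body = "\u017c\u00f3\u0142w".encode("iso-8859-2")

charset = charset_from_content_type(header)
text = body.decode(charset)  # bytes -> Unicode text, using the declared charset

print(charset)
print(text)
```

With a live response in Python 3, the same lookup is available as
urllib.request.urlopen(url).headers.get_content_charset(), which returns
None when the server declares no charset.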

-- 
 \   "Man cannot be uplifted; he must be seduced into virtue."  -- |
  `\   Donald Robert Perry Marquis |
_o__)  |
Ben Finney



BeautifulSoup error

2006-06-15 Thread William Xu
Hi, all,

This piece of code used to work well. I guess the error appeared after
some upgrade.

>>> import urllib
>>> from BeautifulSoup import BeautifulSoup
>>> url = 'http://www.google.com'
>>> port = urllib.urlopen(url).read()
>>> soup = BeautifulSoup()
>>> soup.feed(port)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
    self.rawdata = self.rawdata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
>>>

Any ideas to solve this?

version info:

Python 2.3.5 (#2, Mar  7 2006, 12:43:17)
[GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2

python-beautifulsoup: 3.0.1-1

-- 
William

"I'd love to go out with you, but I have to floss my cat."