RE: how to detect the character encoding in a web page ?

2013-06-09 Thread Carlos Nepomuceno
Try this:

### get_charset.py ###
import re
import urllib2

def get_charset(url):
    resp = urllib2.urlopen(url)
    # retrieve charset from the HTTP headers
    headers = ''.join(resp.headers.headers)
    # match only the charset token, so the trailing '\r' is not captured
    charset_from_header_list = re.findall(r'charset=([\w-]+)', headers)
    charset_from_header = charset_from_header_list[-1] if charset_from_header_list else ''

    # retrieve charset from the HTML meta tag
    html = resp.read()
    charset_from_html_list = re.findall(r'Content-Type.*?charset=["\']?([\w-]+)', html)
    charset_from_html = charset_from_html_list[-1] if charset_from_html_list else ''

    # the declaration inside the page wins over the header
    return charset_from_html if charset_from_html else charset_from_header




> Date: Sun, 9 Jun 2013 04:47:02 -0700
> Subject: Re: how to detect the character encoding  in a web page ?
> From: redstone-c...@163.com
> To: python-list@python.org
> 
> On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> > How can I detect the character encoding of a web page?
> >
> > For example, this page:
> >
> > http://python.org/
  -- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the character encoding in a web page ?

2013-06-09 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

Here is a thread that helped me understand my code:
http://stackoverflow.com/questions/17001407/how-to-detect-the-character-encoding-of-a-web-page-programmatically/17009285#17009285


Re: how to detect the character encoding in a web page ?

2013-06-09 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

Finally, I found that by combining PyQt's QTextStream and QTextCodec with
chardet, we can detect a web page's encoding more reliably,
even for this bad page:
http://www.qnwz.cn/html/yinlegushihui/magazine/2013/0524/425731.html

this script:
http://www.flvxz.com/getFlv.php?url=aHR0cDojI3d3dy41Ni5jb20vdTk1L3ZfT1RFM05UYzBNakEuaHRtbA==

and this page, which has no charset declaration in its source code:
http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx


from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtNetwork import *
import sys
import chardet

def slotSourceDownloaded(reply):
    redirctLocation = reply.header(QNetworkRequest.LocationHeader)
    redirctLocationUrl = reply.url() if not redirctLocation else redirctLocation
    #print(redirctLocationUrl, reply.header(QNetworkRequest.ContentTypeHeader))

    if reply.error() != QNetworkReply.NoError:
        print('', reply.errorString())
    else:
        pageCode = reply.readAll()
        # chardet's guess is used only as the fallback codec
        charCodecInfo = chardet.detect(pageCode.data())

        textStream = QTextStream(pageCode)

        # codecForHtml checks the BOM and the meta tag first and returns
        # the default codec (second argument) if it finds neither
        codec = QTextCodec.codecForHtml(
            pageCode, QTextCodec.codecForName(charCodecInfo['encoding']))
        textStream.setCodec(codec)
        content = textStream.readAll()
        print(content)

        if content == '':
            print('-', 'cannot find any resource !')

    # always clean up and leave the event loop, even after an error
    reply.deleteLater()
    qApp.quit()


if __name__ == '__main__':
    app = QCoreApplication(sys.argv)
    manager = QNetworkAccessManager()
    url = input('input url :')
    request = QNetworkRequest(QUrl.fromEncoded(QUrl.fromUserInput(url).toEncoded()))
    request.setRawHeader("User-Agent",
                         'Mozilla/5.0 (Windows NT 5.1) '
                         'AppleWebKit/537.17 (KHTML, like Gecko) '
                         'Chrome/24.0.1312.57 Safari/537.17 SE 2.X MetaSr 1.0')
    # connect the slot before sending the request
    manager.finished.connect(slotSourceDownloaded)
    manager.get(request)
    sys.exit(app.exec_())


Re: how to detect the character encoding in a web page ?

2013-06-06 Thread Chris Angelico
On Thu, Jun 6, 2013 at 4:22 PM, Nobody  wrote:
> On Thu, 06 Jun 2013 03:55:11 +1000, Chris Angelico wrote:
>
>> The HTTP header is completely out of band. This is the best way to
>> transmit encoding information. Otherwise, you assume 7-bit ASCII and start
>> parsing. Once you find a meta tag, you stop parsing and go back to the
>> top, decoding in the new way.
>
> Provided that the meta tag indicates an ASCII-compatible encoding, and you
> haven't encountered any decode errors due to 8-bit characters, then
> there's no need to go back to the top.

Technically and conceptually, you go back to the start and re-parse.
Sure, you might optimize that if you can, but not every parser will,
hence it's advisable to put the content-type as early as possible.

>> "ASCII-compatible" covers a huge number of
>> encodings, so it's not actually much of a problem to do this.
>
> With slight modifications, you can also handle some
> almost-ASCII-compatible encodings such as shift-JIS.
>
> Personally, I'd start by assuming ISO-8859-1, keep track of which bytes
> have actually been seen, and only re-start parsing from the top if the
> encoding change actually affects the interpretation of any of those bytes.

Hrm, it'd be equally valid to guess UTF-8. But as long as you're
prepared to re-parse after finding the content-type, that's just a
choice of optimization and has no real impact.

ChrisA


Re: how to detect the character encoding in a web page ?

2013-06-05 Thread Nobody
On Thu, 06 Jun 2013 03:55:11 +1000, Chris Angelico wrote:

> The HTTP header is completely out of band. This is the best way to
> transmit encoding information. Otherwise, you assume 7-bit ASCII and start
> parsing. Once you find a meta tag, you stop parsing and go back to the
> top, decoding in the new way.

Provided that the meta tag indicates an ASCII-compatible encoding, and you
haven't encountered any decode errors due to 8-bit characters, then
there's no need to go back to the top.

> "ASCII-compatible" covers a huge number of
> encodings, so it's not actually much of a problem to do this.

With slight modifications, you can also handle some
almost-ASCII-compatible encodings such as shift-JIS.

Personally, I'd start by assuming ISO-8859-1, keep track of which bytes
have actually been seen, and only re-start parsing from the top if the
encoding change actually affects the interpretation of any of those bytes.
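
The idea of re-parsing only when the encoding change matters can be sketched roughly as follows. This is an illustration only, not code from the thread: the helper name `needs_reparse` is made up, and instead of tracking individual byte values it simply compares how the already-consumed prefix decodes under both encodings.

```python
def needs_reparse(prefix, assumed="iso-8859-1", declared="utf-8"):
    """Return True if the bytes consumed so far decode differently
    under the declared encoding than under the assumed one."""
    try:
        return prefix.decode(assumed) != prefix.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # Undecodable bytes or an unknown encoding name: play safe and re-parse.
        return True


# Pure-ASCII prefix: both encodings agree, so parsing can simply continue.
print(needs_reparse(b'<html><head><meta charset="utf-8">'))
# Non-ASCII bytes were already seen: ISO-8859-1 and UTF-8 disagree.
print(needs_reparse(b"<title>caf\xc3\xa9</title>"))
```

The first call prints False and the second prints True, so only the second page would have to be restarted from the top.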

And if the encoding isn't even remotely ASCII-compatible, you aren't going
to be able to recognise the meta tag in the first place. But I don't think
I've ever seen a web page encoded in UTF-16 or EBCDIC.

Tools like chardet are meant for the situation where either no encoding is
specified or the specified encoding can't be trusted (which is rather
common; why else would web browsers have a menu to allow the user to
select the encoding?).



Re: how to detect the character encoding in a web page ?

2013-06-05 Thread Chris Angelico
On Thu, Jun 6, 2013 at 1:14 AM, iMath  wrote:
> On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
>> How can I detect the character encoding of a web page?
>>
>> For example, this page:
>>
>> http://python.org/
>
> By the way, we cannot read the character encoding from the meta data
> programmatically without knowing the character encoding ahead of time!

The rules for web pages are (massively oversimplified):

1) HTTP header
2) ASCII-compatible encoding and meta tag

The HTTP header is completely out of band. This is the best way to
transmit encoding information. Otherwise, you assume 7-bit ASCII and
start parsing. Once you find a meta tag, you stop parsing and go back
to the top, decoding in the new way. "ASCII-compatible" covers a huge
number of encodings, so it's not actually much of a problem to do
this.
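
A minimal sketch of that order, assuming the charset from the HTTP header (if any) has already been extracted and the page body is available as bytes. The names `declared_encoding`, `META_CHARSET` and `scan_limit` are invented for this example, and the regular expression is a simplification, not a full HTML parser.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=...">.
META_CHARSET = re.compile(r'<meta[^>]*?charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def declared_encoding(header_charset, body, scan_limit=1024):
    """Return (encoding, source) following the order described above."""
    # 1) The HTTP header is out of band, so it wins when present.
    if header_charset:
        return header_charset.lower(), "http-header"
    # 2) Otherwise treat the start of the page as ASCII and look for a meta tag.
    head = body[:scan_limit].decode("ascii", errors="replace")
    match = META_CHARSET.search(head)
    if match:
        return match.group(1).lower(), "meta-tag"
    return None, "unknown"


page = b'<html><head><meta charset="UTF-8"><title>x</title></head></html>'
encoding, source = declared_encoding(None, page)
# "Go back to the top": decode the whole page again with the encoding found.
text = page.decode(encoding or "ascii", errors="replace")
print(encoding, source)
```

For the sample page this prints `utf-8 meta-tag`. A page whose declaration is wrong still needs the kind of checking discussed elsewhere in the thread.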

ChrisA


Re: how to detect the character encoding in a web page ?

2013-06-05 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

By the way, we cannot read the character encoding from the meta data
programmatically without knowing the character encoding ahead of time!


Re: how to detect the character encoding in a web page ?

2013-06-05 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

I found that PyQt's QTextStream can detect the character encoding of a web
page very accurately, even for this bad page:
http://www.qnwz.cn/html/yinlegushihui/magazine/2013/0524/425731.html
chardet and Beautiful Soup failed, but QTextStream gets the right result.

Here is my code:

from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtNetwork import *
import sys

def slotSourceDownloaded(reply):
    redirctLocation = reply.header(QNetworkRequest.LocationHeader)
    redirctLocationUrl = reply.url() if not redirctLocation else redirctLocation
    print(redirctLocationUrl)

    if reply.error() != QNetworkReply.NoError:
        print('', reply.errorString())
    else:
        # QTextStream picks the codec itself while reading the reply
        content = QTextStream(reply).readAll()
        if content == '':
            print('-', 'cannot find any resource !')
        else:
            print(content)

    # always clean up and leave the event loop, even after an error
    reply.deleteLater()
    qApp.quit()


if __name__ == '__main__':
    app = QCoreApplication(sys.argv)
    manager = QNetworkAccessManager()
    url = input('input url :')
    request = QNetworkRequest(QUrl.fromEncoded(QUrl.fromUserInput(url).toEncoded()))
    request.setRawHeader("User-Agent",
                         'Mozilla/5.0 (Windows NT 5.1) '
                         'AppleWebKit/537.17 (KHTML, like Gecko) '
                         'Chrome/24.0.1312.57 Safari/537.17 SE 2.X MetaSr 1.0')
    # connect the slot before sending the request
    manager.finished.connect(slotSourceDownloaded)
    manager.get(request)
    sys.exit(app.exec_())


Re: how to detect the character encoding in a web page ?

2013-01-14 Thread Albert van der Horst
Roy Smith wrote:
>Alister wrote:
>
>> Indeed, due to the poor quality of most websites, it is not possible to be
>> 100% accurate for all sites.
>>
>> Personally, I would start by checking the doctype and then the meta data, as
>> these should be quick and correct; I then use chardetect only if these
>> fail to provide any result.
>
>I agree that checking the metadata is the right thing to do.  But, I
>wouldn't go so far as to assume it will always be correct.  There's a
>lot of crap out there with perfectly formed metadata which just happens
>to be wrong.
>
>Although it pains me greatly to quote Ronald Reagan as a source of
>wisdom, I have to admit he got it right with "Trust, but verify".  It's

Not surprisingly, as an actor, Reagan was as good as his script.
This one he got from Stalin.

>the only way to survive in the unicode world.  Write defensive code.
>Wrap try blocks around calls that might raise exceptions if the external
>data is borked w/r/t what the metadata claims it should be.

The way to go, of course.

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst



Re: how to detect the character encoding in a web page ?

2013-01-07 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

Up to now, chardet may be the only way to let Python do it automatically.


Re: how to detect the character encoding in a web page ?

2012-12-28 Thread python培训
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

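First install chardet, then:

import urllib2
import chardet

# fetch the page's HTML (`line` holds the URL)
html_1 = urllib2.urlopen(line, timeout=120).read()
#print html_1
mychar = chardet.detect(html_1)
#print mychar
bianma = mychar['encoding']   # detected encoding; may be None
if bianma and bianma.lower() == 'utf-8':
    # already UTF-8, nothing to convert
    html = html_1
else:
    # convert to UTF-8 from the detected encoding,
    # falling back to gb2312 when detection fails
    html = html_1.decode(bianma or 'gb2312', 'ignore').encode('utf-8')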


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Roy Smith
Alister wrote:

> Indeed, due to the poor quality of most websites, it is not possible to be
> 100% accurate for all sites.
>
> Personally, I would start by checking the doctype and then the meta data, as
> these should be quick and correct; I then use chardetect only if these
> fail to provide any result.

I agree that checking the metadata is the right thing to do.  But, I 
wouldn't go so far as to assume it will always be correct.  There's a 
lot of crap out there with perfectly formed metadata which just happens 
to be wrong.

Although it pains me greatly to quote Ronald Reagan as a source of 
wisdom, I have to admit he got it right with "Trust, but verify".  It's 
the only way to survive in the unicode world.  Write defensive code.  
Wrap try blocks around calls that might raise exceptions if the external 
data is borked w/r/t what the metadata claims it should be.
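
As a rough sketch of what such defensive code might look like (the function name `decode_defensively` and the fallback list are arbitrary choices for this example, not something proposed in the thread):

```python
def decode_defensively(data, declared, fallbacks=("utf-8", "cp1252")):
    """Try the declared encoding first, then a few fallbacks.

    Returns (text, encoding_used).
    """
    for name in (declared,) + tuple(fallbacks):
        if not name:
            continue
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding claimed: move on to the next guess.
            continue
    # Last resort: ISO-8859-1 maps every byte, so this cannot fail.
    return data.decode("iso-8859-1"), "iso-8859-1"


# The metadata claims ASCII, but the bytes are really UTF-8.
text, used = decode_defensively("Grüessli".encode("utf-8"), "ascii")
print(text, used)
```

Here the declared encoding is trusted first and verified by actually decoding; the sample prints `Grüessli utf-8` because the ASCII claim fails and the first fallback succeeds.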


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Alister
On Mon, 24 Dec 2012 13:50:39 +, Steven D'Aprano wrote:

> On Mon, 24 Dec 2012 13:16:16 +0100, Kwpolska wrote:
> 
>> On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
>>  wrote:
>>> $ wget -q -O - http://python.org/ | chardetect.py
>>> stdin: ISO-8859-2 with confidence 0.803579722043
>>> $
>> 
>> And it sucks, because it uses magic, and not reading the HTML tags. The
>> RIGHT thing to do for websites is detect the meta charset definition,
>> which is
>> 
>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>> 
>> or
>> 
>> <meta charset="utf-8">
>> 
>> The second one for HTML5 websites, and both may require case conversion
>> and the useless ` /` at the end.  But if somebody is using HTML5, you
>> are pretty much guaranteed to get UTF-8.
>> 
>> In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
>> Because nobody in the right mind would use something else today.
> 
> Alas, there are many, many, many, MANY websites that are created by
> people who are *not* in their right mind. To say nothing of 15 year old
> websites that use a legacy encoding. And to support those, you may need
> to guess the encoding, and for that, chardetect.py is the solution.

Indeed, due to the poor quality of most websites, it is not possible to be
100% accurate for all sites.

Personally, I would start by checking the doctype and then the meta data, as
these should be quick and correct; I then use chardetect only if these
fail to provide any result.


-- 
I have found little that is good about human beings.  In my experience
most of them are trash.
-- Sigmund Freud


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Steven D'Aprano
On Mon, 24 Dec 2012 13:16:16 +0100, Kwpolska wrote:

> On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
>  wrote:
>> $ wget -q -O - http://python.org/ | chardetect.py
>> stdin: ISO-8859-2 with confidence 0.803579722043
>> $
> 
> And it sucks, because it uses magic, and not reading the HTML tags. The
> RIGHT thing to do for websites is detect the meta charset definition,
> which is
> 
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> 
> or
> 
> <meta charset="utf-8">
> 
> The second one for HTML5 websites, and both may require case conversion
> and the useless ` /` at the end.  But if somebody is using HTML5, you
> are pretty much guaranteed to get UTF-8.
> 
> In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
> Because nobody in the right mind would use something else today.

Alas, there are many, many, many, MANY websites that are created by 
people who are *not* in their right mind. To say nothing of 15 year old 
websites that use a legacy encoding. And to support those, you may need 
to guess the encoding, and for that, chardetect.py is the solution.


-- 
Steven


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Kwpolska
On Mon, Dec 24, 2012 at 9:34 AM, Kurt Mueller
 wrote:
> $ wget -q -O - http://python.org/ | chardetect.py
> stdin: ISO-8859-2 with confidence 0.803579722043
> $

And it sucks, because it uses magic, and not reading the HTML tags.
The RIGHT thing to do for websites is detect the meta charset
definition, which is

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

or

<meta charset="utf-8">

The second one for HTML5 websites, and both may require case
conversion and the useless ` /` at the end.  But if somebody is using
HTML5, you are pretty much guaranteed to get UTF-8.
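
One way to cope with the case differences and the optional ` /` is to let an HTML parser read the tag instead of matching it by hand. The following is a sketch using the standard library's `html.parser`; the class and function names are made up for the example, and it assumes the page has already been decoded into a string with some ASCII-compatible encoding.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Collect the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, and routes
        # self-closing tags (<meta ... />) here as well.
        if tag != "meta" or self.charset:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="...">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Older form: the charset hides inside the content attribute.
            content = (attrs.get("content") or "").lower()
            if "charset=" in content:
                self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def meta_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


print(meta_charset('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-2" />'))
print(meta_charset('<meta charset="UTF-8">'))
```

The two calls print `iso-8859-2` and `utf-8`; a page without any meta charset yields `None`.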

In today’s world, the proper assumption to make is “UTF-8 or GTFO”.
Because nobody in the right mind would use something else today.

-- 
Kwpolska 
stop html mail  | always bottom-post
www.asciiribbon.org | www.netmeister.org/news/learn2quote.html
GPG KEY: 5EAAEA16


Re: how to detect the character encoding in a web page ?

2012-12-24 Thread Kurt Mueller
Am 24.12.2012 um 04:03 schrieb iMath:
> But how can I let Python do it for me?
> For example, these 2 pages:
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> How can I detect the character encoding of these 2 pages with Python?


If you have the html code, let 
chardetect.py 
do an educated guess for you.

http://pypi.python.org/pypi/chardet

Example:
$ wget -q -O - http://python.org/ | chardetect.py 
stdin: ISO-8859-2 with confidence 0.803579722043
$ 

$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
stdin: utf-8 with confidence 0.87625
$ 


Grüessli
-- 
kurt.alfred.muel...@gmail.com



Re: how to detect the character encoding in a web page ?

2012-12-23 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

But how can I let Python do it for me?

For example, these 2 pages:

http://python.org/
http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx

How can I detect the character encoding of these 2 pages with Python?


Re: how to detect the character encoding in a web page ?

2012-12-23 Thread iMath
On Monday, December 24, 2012 at 8:34:47 AM UTC+8, iMath wrote:
> How can I detect the character encoding of a web page?
>
> For example, this page:
>
> http://python.org/

But how can I let Python do it for me?

For example, this page:

http://python.org/

How can I detect the character encoding of this web page with Python?


Re: how to detect the character encoding in a web page ?

2012-12-23 Thread Hans Mulder
On 24/12/12 01:34:47, iMath wrote:
> how to detect the character encoding  in a web page ?

That depends on the site: different sites indicate
their encoding differently.

> such as this page:  http://python.org/

If you download that page and look at the HTML code, you'll find a line:

  <meta http-equiv="content-type" content="text/html; charset=utf-8" />

So it's encoded as utf-8.

Other sites declare their charset in the Content-Type HTTP header line.
And then there are sites relying on the default.  And sites that get
it wrong, and send data in a different encoding from what they declare.
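
For the sites that use the header, the charset can be pulled out of a Content-Type value with the standard library. This is a small Python 3 sketch; the helper name `charset_from_content_type` is invented here, and it only reports what the server claims, which may be wrong as noted above.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the name and returns None if absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

This prints `utf-8` and then `None`. With `urllib.request` in Python 3, the response headers object is itself a `Message` subclass, so `response.headers.get_content_charset()` should give the same answer directly.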


Welcome to the real world,

-- HansM


Re: how to detect the character encoding in a web page ?

2012-12-23 Thread Chris Angelico
On Mon, Dec 24, 2012 at 11:34 AM, iMath  wrote:
> how to detect the character encoding  in a web page ?
> such as this page
>
> http://python.org/

You read part-way into the page, where you find this:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

That tells you that the character set is UTF-8.

ChrisA


how to detect the character encoding in a web page ?

2012-12-23 Thread iMath
How can I detect the character encoding of a web page?
For example, this page:

http://python.org/