Hi, Hakka,

According to what I remember of the HTML spec, the first part of an HTML 
document (<html><head><meta>...) should be plain ASCII (bytes 0-127).  So 
you can read the first KB or so as ASCII until you encounter the <meta> tag.
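A minimal sketch of that first pass in Java, assuming you already have the raw 
response body as a byte array (how you get it out of HttpClient is not shown). 
It decodes only the first 1024 bytes as ISO-8859-1, which maps every byte to 
exactly one character, so the ASCII markup survives whatever the real encoding 
is. The class name MetaSniffer and the 1024-byte limit are hypothetical choices 
for illustration, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper: finds the first <meta ...> tag near the start of a page. */
final class MetaSniffer {

    private MetaSniffer() {}

    /**
     * Decodes at most {@code limit} bytes as ISO-8859-1 (one byte -> one char,
     * never fails) and returns the first "<meta ...>" tag, or null if none is found.
     */
    static String findMetaTag(byte[] body, int limit) {
        int n = Math.min(limit, body.length);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        // Search a lower-cased copy so <META>, <Meta>, etc. all match. For
        // ISO-8859-1 text, lower-casing never changes the length, so offsets
        // found in 'lower' are valid in 'head' too.
        String lower = head.toLowerCase(Locale.ROOT);
        int start = lower.indexOf("<meta");
        if (start < 0) {
            return null;
        }
        int end = lower.indexOf('>', start);
        if (end < 0) {
            return null; // tag cut off by the limit; the caller could retry with more bytes
        }
        return head.substring(start, end + 1);
    }
}
```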

Then you'll have to re-read the whole page with the encoding you've extracted!
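A possible sketch of the second pass, again assuming the whole body is already 
in a byte array and the charset name came out of the <meta> tag. The class name 
BodyDecoder is hypothetical. If the name is missing, malformed or unknown to the 
JVM, it falls back to ISO-8859-1, which RFC 2616 gives as the default for text 
types when no charset is declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper: decodes the full body once the charset name is known. */
final class BodyDecoder {

    private BodyDecoder() {}

    /**
     * Decodes the whole body with the named charset, falling back to ISO-8859-1
     * if the name is null, malformed, or not supported by this JVM.
     */
    static String decode(byte[] body, String charsetName) {
        Charset cs = StandardCharsets.ISO_8859_1;
        if (charsetName != null) {
            try {
                // Charset lookup is case-insensitive, so "Windows-1251" works too.
                cs = Charset.forName(charsetName.trim());
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // keep the ISO-8859-1 fallback
            }
        }
        return new String(body, cs);
    }
}
```

Cyrillic names such as windows-1251 or KOI8-R are supported by common JDKs, but 
Charset.isSupported(name) lets you check before relying on it.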

I think almost every known encoding supports the lower half of the ASCII chart 
(0-127).  It's only when the high bit of a byte is 1 that things 
get exciting.
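A small Java demo of that claim, offered as an illustration only. It checks 
whether bytes 0-127 decode exactly as in US-ASCII for a few charsets, and shows 
that a single byte with the high bit set (0xC0) means something different in 
each one. UTF-16 is included as a known exception: it uses two bytes per 
character, so the ASCII-prefix trick does not work for pages sent in UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Demo: which charsets decode bytes 0..127 exactly like US-ASCII. */
final class AsciiCompat {

    private AsciiCompat() {}

    /** True if every byte 0..127 decodes to the same text as in US-ASCII. */
    static boolean lowerHalfMatchesAscii(Charset cs) {
        byte[] all = new byte[128];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String expected = new String(all, StandardCharsets.US_ASCII);
        return expected.equals(new String(all, cs));
    }

    public static void main(String[] args) {
        Charset[] candidates = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE // two bytes per char, so NOT ASCII-compatible
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " -> " + lowerHalfMatchesAscii(cs));
        }

        // A byte with the high bit set means different things in different charsets.
        byte[] high = {(byte) 0xC0};
        System.out.println(Integer.toHexString(new String(high, StandardCharsets.ISO_8859_1).charAt(0))); // c0
        System.out.println(Integer.toHexString(new String(high, StandardCharsets.UTF_8).charAt(0)));      // fffd (invalid UTF-8)
        if (Charset.isSupported("windows-1251")) {
            System.out.println(Integer.toHexString(new String(high, Charset.forName("windows-1251")).charAt(0))); // 410 (Cyrillic capital A)
        }
    }
}
```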

Good luck!

You'll probably need to support all combinations of lower-case and upper-case 
(tag names are case-insensitive in HTML 4, so all of these are possible):

<meta>
<metA>
<meTa>
<meTA>
<mEta>
<mEtA>
<mETa>
<mETA>
<Meta>
<MetA>
<MeTa>
<MeTA>
<MEta>
<MEtA>
<METa>
<META>

Maybe it's best just to convert whatever you find to lowercase before 
trying to extract the "http-equiv" attribute.
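Here is one way that lowercase-then-extract idea could look in Java, as a 
sketch only. It assumes the HTML 4 form <meta http-equiv="Content-Type" 
content="text/html; charset=..."> and uses simple regular expressions rather 
than a real HTML parser. The class name MetaCharset is hypothetical. Lowercasing 
the charset name as well is harmless, because Java's charset lookup is 
case-insensitive.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the charset out of an HTML 4 http-equiv meta tag. */
final class MetaCharset {

    private MetaCharset() {}

    // Both patterns expect the tag already lower-cased, e.g.
    // <meta http-equiv="content-type" content="text/html; charset=windows-1251">
    private static final Pattern HTTP_EQUIV =
            Pattern.compile("http-equiv\\s*=\\s*[\"']?content-type[\"']?");
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*([a-z0-9._:+-]+)");

    /** Returns the lower-cased charset name, or null if the tag declares none. */
    static String extract(String metaTag) {
        // Lowercase first, so <META HTTP-EQUIV=...>, <Meta Http-Equiv=...>, etc. all match.
        String tag = metaTag.toLowerCase(Locale.ROOT);
        if (!HTTP_EQUIV.matcher(tag).find()) {
            return null;
        }
        Matcher m = CHARSET.matcher(tag);
        return m.find() ? m.group(1) : null;
    }
}
```

An alternative to lowercasing is compiling the patterns with 
Pattern.CASE_INSENSITIVE; the result is the same for this purpose.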


yours,

Julius

http://juliusdavies.ca/

-----Original Message-----
From:   Hakka Ville [mailto:[EMAIL PROTECTED]
Sent:   Sat 11/18/2006 4:44 AM
To:     [EMAIL PROTECTED]
Cc:     
Subject:        how to detect charset encoding from "meta http-equiv" ?

Dear Sirs,

I tried to use HttpClient. The server doesn't set the encoding in the HTTP response
header, but it does in the page itself with a "meta http-equiv" tag. How can I tell
HttpClient to detect the (Cyrillic) encoding from that tag?

Cheers,
Hakka



