Hi Hakka,

According to what I remember of the HTML spec, the first parts of the HTML content (<html><head><meta>...) should all be basic ASCII (bytes 0 - 127). So you can try reading the first KB or so as ASCII until you encounter the <meta> tag.
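
A minimal sketch of that sniffing step in Java, under a few assumptions: the response body is already in memory as a byte array, and the class name MetaCharsetSniffer, the 1024-byte limit, and the regular expression are all illustrative choices of mine, not part of HttpClient or the HTML spec. It decodes the first KB as ISO-8859-1 (so bytes 0 - 127 read as ASCII and no byte fails to decode), lower-cases the text, and looks for a charset inside a <meta> tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HttpClient.
final class MetaCharsetSniffer {

    // How many leading bytes to inspect ("the first KB or so").
    private static final int SNIFF_LIMIT = 1024;

    // Applied to lower-cased text: finds charset=... inside a <meta ...> tag.
    private static final Pattern META_CHARSET =
        Pattern.compile("<meta[^>]*charset\\s*=\\s*[\"']?([a-z0-9._:-]+)");

    /** Returns the lower-cased charset name from a meta tag, or null if none is found. */
    static String sniff(byte[] body) {
        int len = Math.min(body.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to exactly one char, so this never fails.
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head.toLowerCase(Locale.ROOT));
        return m.find() ? m.group(1) : null;
    }

    /** Re-reads the whole body with the sniffed charset, or with the fallback. */
    static String decode(byte[] body, Charset fallback) {
        String name = sniff(body);
        Charset charset = fallback;
        if (name != null) {
            try {
                charset = Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: keep the fallback.
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String page = "<HTML><HEAD><META HTTP-EQUIV=\"Content-Type\" "
            + "CONTENT=\"text/html; charset=UTF-8\"></HEAD><BODY>hello</BODY></HTML>";
        byte[] body = page.getBytes(StandardCharsets.UTF_8);
        System.out.println("sniffed charset: " + sniff(body));
        System.out.println(decode(body, StandardCharsets.ISO_8859_1));
    }
}
```

With HttpClient you would presumably fetch the response body as raw bytes first and pass them to decode(), rather than asking the library for a String, since the String conversion would already have picked a charset for you.
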
Then you'll have to re-read the content with the encoding you've extracted! I think almost every commonly used encoding is identical to ASCII for bytes 0 - 127 (UTF-16 is a notable exception). It's only when the high bit of a byte is 1 that things get exciting. Good luck!

You'll probably need to support all combinations of lower-case and upper-case (since all are possible in HTML 4):

<meta> <metA> <meTa> <meTA> <mEta> <mEtA> <mETa> <mETA>
<Meta> <MetA> <MeTa> <MeTA> <MEta> <MEtA> <METa> <META>

Maybe it's best just to convert whatever you find to lowercase before trying to extract the charset from the "http-equiv" tag.

yours,

Julius

http://juliusdavies.ca/

-----Original Message-----
From: Hakka Ville [mailto:[EMAIL PROTECTED]
Sent: Sat 11/18/2006 4:44 AM
To: [email protected]
Cc:
Subject: how to detect charset encoding from "meta http-equiv" ?

Dear Sirs,

I tried to use httpclient, server doesn't set encoding within http response header, but does in the page itself with "meta http-equiv". How can I tell httpclient to detect (cyrillic) encoding from that thing ?

Cheers,
Hakka
