Hi,
Mithun Bhattacharya wrote:
> 
> --- "आशीष शुक्ला \"Wah Java !!\""
> <[EMAIL PROTECTED]> wrote:
> 
> 
>>> I am not sure which HTML specification you are looking at but the W3
>>> page says quite opposite of what you are claiming
>> I'm also looking at the same HTML v. 4.01 specification.
>>
>>> http://www.w3.org/TR/html4/charset.html
>> above URL also says, this:
>>

--------------- please re-read this ------------------
>> -- begin quote --
>> To sum up, conforming user agents must observe the following
>> priorities when determining a document's character encoding (from
>> highest priority to lowest):
>> 1. An HTTP "charset" parameter in a "Content-Type" field.
>> 2. A META declaration with "http-equiv" set to "Content-Type" and a
>>    value set for "charset".
>> 3. The charset attribute set on an element that designates an
>>    external resource.
--------------- end please re-read this ---------------
> 
> So a META declaration will override the Content-Type header since
> Content-Type could possibly be a server configuration whereas the META
> tag is controlled by the person maintaining the page who ideally should
> be a better judge of what the document is actually in.

How can a META declaration override the Content-Type header, when the standard
says that the HTTP header is given priority? But hey, I think this is a mistake
in the standard: it should be the other way round, because the document knows
more about itself than the webserver does.

> 
>> In addition to this list of priorities, the user agent may use
>> heuristics and user settings. For example, many user agents use a
>> heuristic to distinguish the various encodings used for Japanese text.
>> Also, user agents typically have a user-definable, local default
>> character encoding which they apply in the absence of other indicators.
> 
> I believe this is very iffy and behaviour may change with even a small
> patch to whatever browser you are using - basically no standards on
> this behaviour.
> 
>> This kind of interaction is great, but it is not the only kind of
>> interaction we have. I mean, it works when you have document in
>> multiple encodings, and depending on user agent preferences, you
>> respond. And, also there has to be someway, by which we can inform our
>> webserver that document.html, document.utf8.html,
>> document.iso-8859-1.html are same docs in different encodings. But, my
>> thing is (explained with an example):
> 
> What you are asking for is already implemented in the apache web server
> 
> http://httpd.apache.org/docs/1.3/mod/mod_mime.html

I know it is already implemented. But my point is that a document writer does
not always own the webserver (I mean, he/she may not have any configuration
rights on it). So my question is: is there any way he/she could publish his/her
multilingual HTML docs on the web (without entitifying the docs ;-) )?

> 
>>> The majority of the problem starts now. The standards say that the
>>> content-type specified by the server is a recommendation or a guideline
>>> and not an overriding instruction. The browser is supposed to accept
>>> the data in good faith but is supposed to use its own judgement in
>>> handling the data. This is the reason why all browsers give you an
>>> option to change the charset being used to render the current page.
>> BTW, which standards says it and where ??
> 
> Can't recall specific standards but a nice discussion on a similar topic
> is available here
> http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html
> Do note I couldn't locate any reference to the fact that Content-Type
> can not be overridden at the browser end.
> 
>> So, in other words, browser should not trust server.
> 
> In a hostile network I would prefer not to. I am not sure of the
> specifics but bottom line is it is a matter of trust - would I trust an
> unknown server to decide how I treat their data or should I be the best
> judge of it. I would rather let applications I trust decide what to do
> with anything.
> 
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1
>> -- begin quote --
>> Any HTTP/1.1 message containing an entity-body SHOULD include a
>> Content-Type header field defining the media type of that body. If and
>> only if the media type is not given by a Content-Type field, the
>> recipient MAY attempt to guess the media type via inspection of its
>> content and/or the name extension(s) of the URI used to identify the
>> resource. If the media type remains unknown, the recipient SHOULD treat
>> it as type "application/octet-stream".
>> -- end quote --
> 
> I think we have deviated a bit from Charset to Content-Type. Charset is
> not as strictly enforced as Content-Type.
> 
> Yes there are sufficient broken webservers out there who say rpm is a
> real media file to give me headaches.

Yeah there are ;-).

> 
> Coming back to http://www.w3.org/TR/html4/charset.html My
> interpretation is
> 
> 1. Check for Content-Type use it if available. Go to item 2 for text
> contents
> 2. Check for META tag use it to override server side Content-Type
> 3. Check for element charset and override the charset for the specific
> element

This is where you're making a mistake: the above points are arranged in order
from highest priority to lowest. So, if a charset is available in the
"Content-Type" HTTP header, points 2 and 3 are skipped. Specifying
"Content-Type" in a META tag therefore only works if the server hasn't
specified any charset. Hope you are getting my point now.
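
To make the order concrete, here is a minimal Python sketch of how a user agent
could apply the first two priorities from the quoted list. The names
charset_param, MetaCharsetFinder and pick_charset are made up for illustration,
the iso-8859-1 fallback is only an assumed stand-in for the browser's local
default, and priority 3 (the charset attribute on an element) is left out.

```python
import re
from html.parser import HTMLParser

# Matches the charset parameter in a value like "text/html; charset=utf-8".
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([^\s"\';]+)', re.IGNORECASE)


def charset_param(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    match = CHARSET_RE.search(value or "")
    return match.group(1).lower() if match else None


class MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first META http-equiv="Content-Type" tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_param(attrs.get("content"))


def pick_charset(http_content_type, body, fallback="iso-8859-1"):
    """Apply priority 1 (HTTP header), then priority 2 (META), then a fallback.

    The fallback value is an assumption standing in for a local default.
    """
    from_header = charset_param(http_content_type)
    if from_header:
        return from_header  # priority 1 wins; META is never consulted
    finder = MetaCharsetFinder()
    # Only the start of the document is scanned, decoded loosely as ASCII.
    finder.feed(body[:1024].decode("ascii", errors="replace"))
    return finder.charset or fallback
```

With this sketch, a page whose META tag says koi8-r is still treated as utf-8
when the server sends "text/html; charset=UTF-8"; the META value is used only
when the header is plain "text/html".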

> 
> Maybe my interpretation is wrong but I think that is what happens
> currently. Also as I mentioned before we are discussing two different
> topics here. Content-Type is a superset of charset, as in most
> scenarios Content-Type is sent without a charset involved, which is why
> META tags play a lot of role.

That is the case for non-text files; I mean, you don't have
"application/octet-stream; charset=utf-8" ;-) .
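
As a small illustration of the media type and the charset being separate
pieces of one header, this Python sketch splits a Content-Type value using the
standard library's email.message.Message. The helper name split_content_type
is hypothetical.

```python
from email.message import Message


def split_content_type(value):
    """Return (media_type, charset) for a Content-Type header value.

    charset is None when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("charset")


for header in ("text/html; charset=utf-8", "application/octet-stream"):
    media_type, charset = split_content_type(header)
    print(media_type, charset)
```

The second header has no charset parameter at all, which is the usual
situation for binary content.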

> 
> 
> 
> Mithun
> 

Thanks again,
Ashish Shukla "Wah Java !!"
Wah Java !!
-- 
आशीष शुक्ला alias "Wah Java !!"
http://wahjava.blogspot.com/

The only key to optimal life is precision.

                                -- Ashish Shukla "Wah Java !!"
                       http://wahjava.blogspot.com/2006/03/useful-thought.html


