--- "आशीष शुक्ला \"Wah Java !!\""
<[EMAIL PROTECTED]> wrote:


> > I am not sure which HTML specification you are looking at but the W3
> > page says quite opposite of what you are claiming
> 
> I'm also looking at the same HTML v. 4.01 specification.
> 
> > http://www.w3.org/TR/html4/charset.html
> 
> above URL also says, this:
> 
> -- begin quote --
> To sum up, conforming user agents must observe the following
> priorities when determining a document's character encoding (from
> highest priority to lowest):
> 1. An HTTP "charset" parameter in a "Content-Type" field.
> 2. A META declaration with "http-equiv" set to "Content-Type" and a
>    value set for "charset".
> 3. The charset attribute set on an element that designates an
>    external resource.

So a META declaration will override the Content-Type header, since
Content-Type could simply be a server configuration, whereas the META
tag is controlled by the person maintaining the page, who ideally should
be a better judge of what encoding the document is actually in.

> In addition to this list of priorities, the user agent may use
> heuristics and user settings. For example, many user agents use a
> heuristic to distinguish the various encodings used for Japanese text.
> Also, user agents typically have a user-definable, local default
> character encoding which they apply in the absence of other
> indicators.

I believe this is very iffy, and the behaviour may change with even a
small patch to whatever browser you are using; basically there are no
standards for this behaviour.
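
To make the "no standards" point concrete, here is one of many possible
heuristics, sketched in Python. It is purely an illustration and not what
any particular browser does; the function name, the order of the attempts
and the iso-8859-1 default are all assumptions made up for the example.

```python
def decode_with_fallback(raw, declared=None, local_default="iso-8859-1"):
    """Return (text, encoding_used) for a byte string.

    Hypothetical heuristic: try the declared charset, then strict UTF-8,
    then fall back to the user's local default encoding.
    """
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: move on to the next guess.
            continue
    # Last resort: the local default, replacing anything undecodable.
    return raw.decode(local_default, errors="replace"), local_default
```

Swap the order of the attempts, or change the default, and the same
bytes come out as different text, which is exactly why I would not rely
on this step.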

> This kind of interaction is great, but it is not the only kind of
> interaction we have. I mean, it works when you have document in
> multiple encodings, and depending on user agent preferences, you
> respond. And, also there has to be someway, by which we can inform our
> webserver that document.html, document.utf8.html,
> document.iso-8859-1.html are same docs in different encodings. But, my
> thing is (explained with an example):

What you are asking for is already implemented in the Apache web server:

http://httpd.apache.org/docs/1.3/mod/mod_mime.html
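
As a rough sketch, assuming the file name extensions from your example
(.utf8 and .iso-8859-1) and a server where mod_mime is loaded, the
AddCharset directive described on that page could be used something
like this. The extensions are only taken from your file names, not from
any real setup.

```apache
# Hypothetical httpd.conf or .htaccess fragment (assumes mod_mime).
# Files carrying these extensions get the matching charset parameter
# added to their Content-Type header.
AddCharset UTF-8      .utf8
AddCharset ISO-8859-1 .iso-8859-1
```

With something like this in place, document.utf8.html should go out with
charset=UTF-8 and document.iso-8859-1.html with charset=ISO-8859-1.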

> > The majority of the problem starts now. The standards say that the
> > content-type specified by the server is a recommendation or a
> > guideline and not an overriding instruction. The browser is supposed
> > to accept the data in good faith but is supposed to use its own
> > judgement in handling the data. This is the reason why all browsers
> > give you an option to change the charset being used to render the
> > current page.
> 
> BTW, which standards says it and where ??

I can't recall specific standards, but a nice discussion of a similar
topic is available here:
http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html
Do note that I couldn't locate any reference saying that Content-Type
cannot be overridden at the browser end.

> So, in other words, browser should not trust server.

In a hostile network I would prefer not to. I am not sure of the
specifics, but the bottom line is that it is a matter of trust: would I
trust an unknown server to decide how I treat their data, or should I be
the best judge of it? I would rather let applications I trust decide
what to do with anything.

> http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1
> -- begin quote --
> Any HTTP/1.1 message containing an entity-body SHOULD include a
> Content-Type header field defining the media type of that body. If and
> only if the media type is not given by a Content-Type field, the
> recipient MAY attempt to guess the media type via inspection of its
> content and/or the name extension(s) of the URI used to identify the
> resource. If the media type remains unknown, the recipient SHOULD
> treat it as type "application/octet-stream"
> -- end quote --

I think we have deviated a bit from charset to Content-Type. Charset is
not as strictly enforced as Content-Type.
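
For the media type part, the RFC text quoted above can be sketched
roughly as follows. This is a minimal illustration only: the function
name is made up, it guesses from the name extension alone (the RFC also
allows inspecting the content, which is left out here), and it relies on
Python's standard mimetypes table.

```python
import mimetypes


def effective_media_type(content_type_header, url_path):
    """Pick a media type along the lines of RFC 2616 section 7.2.1."""
    if content_type_header and content_type_header.strip():
        # Content-Type was given: use it, dropping parameters like charset.
        return content_type_header.split(";", 1)[0].strip().lower()
    # Only when the header is absent MAY the recipient guess from the name.
    guessed, _encoding = mimetypes.guess_type(url_path)
    # Still unknown: treat it as application/octet-stream.
    return guessed or "application/octet-stream"
```

Note that there is no guessing at all once a Content-Type header is
present, which is the strictness I mean.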

Yes, there are enough broken web servers out there that say an rpm is a
real media file to give me headaches.

Coming back to http://www.w3.org/TR/html4/charset.html, my
interpretation is:

1. Check for Content-Type and use it if available. Go to item 2 for
   text content.
2. Check for a META tag and use it to override the server-side
   Content-Type.
3. Check for an element charset and override the charset for that
   specific element.

Maybe my interpretation is wrong, but I think that is what happens
currently. Also, as I mentioned before, we are discussing two different
topics here. Content-Type is a superset of charset, in that in most
scenarios Content-Type is sent without a charset, which is why META
tags play such a large role.
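
For comparison, a literal reading of the priority list quoted earlier
(highest priority first) could be sketched as below. It is only an
illustration: the function names, the regular expression and the
iso-8859-1 default are assumptions made up for the example, and it does
no HTML parsing, it just takes the already extracted values.

```python
import re

# Hypothetical pattern for the charset parameter of a Content-Type value.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def pick_charset(http_content_type=None, meta_content=None,
                 element_charset=None, local_default="iso-8859-1"):
    """Walk the quoted list in order and return (charset, source)."""
    candidates = [
        ("http", charset_from_content_type(http_content_type)),  # item 1
        ("meta", charset_from_content_type(meta_content)),       # item 2
        ("element", element_charset.lower() if element_charset else None),  # item 3
    ]
    for source, charset in candidates:
        if charset:
            return charset, source
    # No indicator at all: fall back to the user's local default.
    return local_default, "default"
```

Read this way, META only decides when the HTTP header carries no
charset, which, as said, is the usual case.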



Mithun


_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi 
http://www.mail-archive.com/ilugd@lists.linux-delhi.org/