Re: xkcd: LTR

2012-11-28 Thread Philippe Verdy
2012/11/28 Leif Halvard Silli 

> Philippe Verdy, Wed, 28 Nov 2012 04:50:06 +0100:
> >>> detects a violation of the required
> >>> extended "prolog" (sorry, the HTML5 document declaration, which is not a
> >>> valid "document declaration" for XHTML or for HTML4 or before or even for
> >>> SGML, due to the unspecified schema after the schema short name), it
> >>> should catch this exception to try another parser.
> >>
> >> There is no spec, that I am aware of, that says that it should do that.
> >
> > But this is in the scope of the HTML5 whose claimed purpose is to become
> > compatible with documents encoded in all previous flavors of HTML.
>
> I admit that understanding the meaning behind all the slogans about
> HTML5 can be demanding. But the goal has all along been to
> create a *single* HTML parser, and not to introduce switching between
> multiple HTML parsers. If you think otherwise, then my claim is that
> you have misunderstood.


In that case, Firefox and IE should not be able to render *any* XHTML page
at all, because it violates the HTML5 standard. Yet they still attempt to
recover from it, recognizing part of XHTML but not an essential part: its
very basic XML prolog (and the XHTML document declaration), up to the point
where they start seeing the root element.

But then how can they claim to support XHTML when they don't? The XHTML
syntax is still part of HTML5, which makes affirmations like yours ("not to
introduce switching between multiple HTML parsers") very weak and difficult
to defend.

If the intent is to be able to parse all flavors of HTML (at least a basic
profile of them) with the same parser, then HTML5 must standardize a
behavior that correctly handles the possible presence of XML prologs and
standard SGML document declarations, even if their contents are skipped and
ignored (notably here, where the prolog is used to specify that the document
is effectively encoded in UTF-8 and not cp1252, both encodings being
supported by HTML5, with no other compatibility problems when it is UTF-8).

But ignoring XHTML document declarations will have an impact on
compatibility if there is an external or internal DTD, and this should be
documented in HTML5 by limiting the claims of compatibility (and then
suggesting another recovery mechanism for these unsupported parts of XHTML:
using a true XML parser when the required HTML5 DOCTYPE declaration is
violated).

For now this breaks the interoperability with the basic profile of XHTML,
which is more or less compatible with HTML4 including the deprecated
elements (but without the modular extension design, and without support for
XML namespaces).

As for the argument that "meta" elements may be used in the HTML document
header (to replace missing HTTP MIME headers): this contradicts everything
that was done before to deprecate this use of the meta element. Here also,
HTML5 is not clear about this change of position. And "meta" elements won't
make HTML parsers simpler to implement: they will need to reparse the
document from the beginning.

The XML prolog of XHTML is much simpler to parse than the meta element, and
can be parsed directly by the HTML5-only parser, which can just as well
accept at least the XHTML1 document declaration (without internal DTD).
It should fail, however, if there is an internal DTD or if the SGML catalog
name is not one of those for HTML or XHTML. It should just check the SGML
catalog name, partly ignoring the flavor given in that name, as no internal
or external DTD is supported in HTML4 or lower. It should silently ignore
the URL of an external DTD in XHTML, including when XHTML is used as the
alternate serialization syntax for HTML5, even if this means that some
entities defined in the external DTD are not replaced (though the result of
the HTML5 parser with undefined entities, or entities defined differently,
will be unpredictable).

If an implementation can support both parsers, the more compatible recovery
mode will be to use the XML parser instead of this simple heuristic.

Browsers already support multiple parsers for text-encoded documents,
including for HTML5 (Javascript, JSON, CSS, SVG/XML, P3P, URI...), plus
binary parsers for various media codecs (PNG, GIF, JPEG, WAV, ICO,
OpenPDF...) if they can embed them instead of using OS-supported codecs or
plugins (MPEG, Ogg...), and data codecs (compressors, encryptors, archive
formats... for the transport and security protocol layers referenced in URI
schemes). What else?

In all popular browsers, the XML parser is still present, and has been for a
long time, to support XML requests (and lots of GUI or configuration
features, such as XUL in Firefox, VML in IE, external SVG images, local DB
stores, support libraries for third-party addons...), even if JSON requests
are highly preferred now: sometimes more secure, and much simpler and faster
to parse (and more compact in their serialization).


Re: xkcd: LTR

2012-11-27 Thread Leif Halvard Silli
Philippe Verdy, Wed, 28 Nov 2012 04:50:06 +0100:
>>> detects a violation of the required
>>> extended "prolog" (sorry, the HTML5 document declaration, which is not a
>>> valid "document declaration" for XHTML or for HTML4 or before or even for
>>> SGML, due to the unspecified schema after the schema short name), it
>>> should catch this exception to try another parser.
>> 
>> There is no spec, that I am aware of, that says that it should do that. 
> 
> But this is in the scope of the HTML5 whose claimed purpose is to become
> compatible with documents encoded in all previous flavors of HTML.

I admit that understanding the meaning behind all the slogans about 
HTML5 can be demanding. But the goal has all along been to 
create a *single* HTML parser, and not to introduce switching between 
multiple HTML parsers. If you think otherwise, then my claim is that 
you have misunderstood.

> Otherwise this claim is very weak and HTML5 is just a standard compatible
> with itself,

Yes, HTML5 is a standard in itself. For instance, the issue of the XML 
prologue has been mentioned from time to time during the HTML5 
process, but a deliberate choice was made not to accept it as part of 
the syntax. Probably one of the editor's motivations for that choice 
was to help authors keep HTML and XHTML separate. Also, 
HTML5 contains some willful violations of other standards. But then, a 
standard is supposed to set a new standard, hence that should in 
principle be OK. But it is true that terms such as "Web compatible" and 
"compatible" in general have been used as slogans about HTML5. I 
think, in one way, it was just a method for getting things to move. But 
it is not the case that "compatible" has trumped every other HTML5 design 
consideration. Another thing to consider is, for instance, that the end 
result - the final syntax - becomes simple to understand, without overly 
complicated and convoluted rules.

Just my two cents, about how I see it.
-- 
leif halvard silli



Re: xkcd: LTR

2012-11-27 Thread Leif Halvard Silli
Philippe Verdy, Wed, 28 Nov 2012 04:23:10 +0100:
> 2012/11/28 Leif Halvard Silli 
> 
>> For
>> a new version of the validator, that ask more of those questions,
>> please try http://validator.w3.org/nu/  - it happens to for the most
>> part be developed by one of the Firefox developers, btw. And it allows
>> you to check XHTML1-syntax as well (but only if you serve it as XHTML -
>> if you serve it as HTML, then it validates it as HTML.)
> 
> This "new" validator is not the one promoted and supported. I use the
> "Unicorn" validator that checks all W3C supported markup languages
> (including HTML5).

The "nu" validator is good if you are interested in the questions I 
mentioned above.

>> Please note that prolog is one thing, and the DOCTYPE is another, see
>> XML 1.0: http://www.w3.org/TR/REC-xml/#sec-prolog-dtd
> 
> Yes, I know the terminology, but it's evident that I'm including the document
> declaration as part of the "prolog" (i.e. everything that is not a comment
> and that appears before the root element)

It is just as confusing as ever that you continue to insist on your 
terminology.

>>> The absence of the HTML5 required prolog (in its standard basic-SGML
>>> profile), or the presence of another incompatible XML prolog is
>>> enough to make the distinction between the two syntaxes.
>> 
>> You mean: Visually? Yes. However, that is not how parsers think. What
>> parsers normally do is that they look at the Content-Type "flag",
>> before they decide how to parse the document.
> 
> True, but then when the HTML5 parser

The "HTML5 parser" is just the one and only (updated) HTML parser.

> detects a violation of the required
> extended "prolog" (sorry, the HTML5 document declaration, which is not a
> valid "document declaration" for XHTML or for HTML4 or before or even for
> SGML, due to the unspecified schema after the schema short name), it should
> catch this exception to try another parser.

There is no spec, that I am aware of, that says that it should do that.
-- 
leif halvard silli



Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
> > detects a violation of the required
> > extended "prolog" (sorry, the HTML5 document declaration, which is not a
> > valid "document declaration" for XHTML or for HTML4 or before or even for
> > SGML, due to the unspecified schema after the schema short name), it should
> > catch this exception to try another parser.
>
> There is no spec, that I am aware of, that says that it should do that.
>

But this is in the scope of HTML5, whose claimed purpose is to become
compatible with documents encoded in all previous flavors of HTML.
Otherwise this claim is very weak and HTML5 is just a standard compatible
with itself, and nothing else (it breaks XHTML rules, SGML rules for the
document declaration, and IETF charset naming rules with its
reinterpretation of ISO8859-1, which is also still not stabilized).

HTML5 is still beta with respect to these claims, and it's regrettable that
its required document declaration does not even specify its SGML catalog
entry name, even if it forbids the insertion of a DTD. One day or another,
at least the SGML catalog entry name will come back, once HTML5 has been
released and a newer version is needed and developed, and HTML5 should
still allow the presence of this SGML catalog entry name, even if it does
not require it in this version.


Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
2012/11/28 Leif Halvard Silli 

> For
> a new version of the validator, that ask more of those questions,
> please try http://validator.w3.org/nu/  - it happens to for the most
> part be developed by one of the Firefox developers, btw. And it allows
> you to check XHTML1-syntax as well (but only if you serve it as XHTML -
> if you serve it as HTML, then it validates it as HTML.)
>

This "new" validator is not the one promoted and supported. I use the
"Unicorn" validator that checks all W3C supported markup languages
(including HTML5).

> >>> ; in this profile, they
> >>> MUST honor the XML prolog and notably its XML encoding declaration
> >>> (given that the encoding is not specified in the HTTP Content-type.
> >>
> >> Again: Absolutely not. They must not, will not and must not honour the
> >> XML prologue. (It is another matter that some user agents sometimes use
> >> the prologue to look for encoding information.)
> >
> > Sure they can because this XHTML1 site violates  HTML5 rules, missing
> > its required prologue.
>
> Not sure how you understand the phrase "honour the XML prologue". It
> also sounds as if you say that HTML5 has its own prologue. But HTML5
> does not contain any code that is commonly known as "prologue". For
> instance, if you refer to the code "<!DOCTYPE html>", then this is not
> a prologue even if it occurs at the start of the document.
>

It is a question of terminology specific to this version: I consider it part
of the prolog, and it is not valid XML, so not valid XHTML.

>
> From one angle, you are of course right. But HTML5 actually explains
> that what you call "SGML-based" is not SGML-based but only SGML
> *inspired*. Thus, HTML5 is much simpler and less cryptic than the
> (official) SGML syntax of HTML4.


It is evident that here I mean the legacy HTML syntax, not compatible with
XML (it allows closing tags to be omitted, and does not require self-closed
tags for empty elements).


> >>> I'm still convinced that these are bugs in Firefox and IE, which
> >>> support only HTML5 in its basic HTML profile, but not HTML5 in its
> >>> XML/XHTML profile (which is also part of the HTML5 standard and where
> >>> processing the XML prolog is NOT an option but a requirement).
> >>
> >> Just for the record: HTML5 defines the most up-to-date parsing
> >> mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
> >> of XHTML served as HTML. HTML5 does not allow authors to use the XML
> >> prologue.
> >
> > Where ?
>
> Here: http://dev.w3.org/html5/spec/syntax.html#writing (As you can see,
> it doesn't say that it is allowed, hence it is not.) You can also see
> the bottom of this page:
> http://dev.w3.org/html5/spec/the-meta-element.html#charset
>
> > The required HTML5 prolog applies to its SGML based syntax ;
>
> Please note that prolog is one thing, and the DOCTYPE is another, see
> XML 1.0: http://www.w3.org/TR/REC-xml/#sec-prolog-dtd


Yes, I know the terminology, but it's evident that I'm including the document
declaration as part of the "prolog" (i.e. everything that is not a comment
and that appears before the root element)


> > it makes no sense in XHTML as it voluntarily violates the validity of
> > the XML document declaration.
>
> If you are speaking about the HTML5 doctype, then its only effect is to
> make sure that the HTML parser stays in no-quirks (aka standards) mode.
> In XHTML then, you are right that it is not needed. But you are wrong
> if you say that it is a problem to include it in XHTML, as it causes no
> harm. In fact, in XHTML, you can drop both the DOCTYPE and the XML
> prologue.
>
> > The absence of the HTML5 required prolog (in its standard basic-SGML
> > profile), or the presence of another incompatible XML prolog is
> > enough to make the distinction between the two syntaxes.
>
> You mean: Visually? Yes. However, that is not how parsers think. What
> parsers normally do is that they look at the Content-Type "flag",
> before they decide how to parse the document.


True, but then when the HTML5 parser detects a violation of the required
extended "prolog" (sorry, the HTML5 document declaration, which is not a
valid "document declaration" for XHTML or for HTML4 or before or even for
SGML, due to the unspecified schema after the schema short name), it should
catch this exception and try another parser. The XML declaration itself is
enough to throw the exception, and is easy enough to detect to allow
switching from an HTML parser to an XML parser for XHTML. If even the XML
parser fails, then retry with a legacy HTML parser working in quirks mode.
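
For illustration only, the fallback chain described above could be sketched
as follows in Python. This is a hypothetical heuristic, not behaviour required
by any specification; the function name `parse_with_fallback` and the use of
Python's standard `xml.etree.ElementTree` as the strict XML parser are
assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def parse_with_fallback(data: bytes) -> str:
    """Return which parser would handle `data` under the fallback heuristic.

    Hypothetical sketch: an XML declaration at the very start of the
    document triggers an attempt with a strict XML parser; anything else,
    or any well-formedness error, falls back to the HTML parser.
    """
    if data.startswith(b"<?xml"):
        try:
            ET.fromstring(data)  # strict XML parse; raises on malformed input
            return "xml"
        except ET.ParseError:
            pass  # not well-formed XML: fall through to the HTML parser
    return "html"
```

The sketch only reports which parser would be chosen; a real user agent would
also have to decide what to do with content it had already started rendering.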

> > Now HTML5 is still not completely polished, finished and approved.
> > Such interoperability rules are not clearly defined even if they are
> > the "most up-to-date" to make it work seamlessly with the claimed
> > compatibility with all flavors of HTML or XHTML. And the fact that
> > Firefox and IE behave differently from Chrome and Safari in this
> > domain is a proof of this unfinished status.
>
> I would not conclude like that … But it could prob

Re: xkcd: LTR

2012-11-27 Thread Leif Halvard Silli
Philippe Verdy, Wed, 28 Nov 2012 01:10:45 +0100:
> 2012/11/27 Leif Halvard Silli

>> The fact that XHTML 1 permits the XML prolog regardless how the
>> document is served, is just a shortcoming of the XHTML 1 specification.
> 
> No, it was by design. Making HTML an application of XML. Only XML but 
> with all rules of XML.

It was by design. But nevertheless a shortcoming. They should/could 
have defined more restrictions on the syntax than they did, and 
then it would have been OK. But don't forget that XHTML1 also permits 
you to use the meta element, which works in all web browsers, for 
setting the encoding:



This is described in the famous Appendix C of XHTML 1: 
http://www.w3.org/TR/xhtml1/#C_9

>>> So these browsers must find
>>> something else: given the XML prolog they should then use HTML5 in
>>> its XHTML profile, not in its HTML profile
>> 
>> No, that is not how things works. The decision to parse the document as
>> HTML is taken before the browser sees the XML prologue. So the prologue
>> should not - and does not - change anything with regard to parsing as
>> HTML or as XML.
> 
> Then explain why the W3C validator sees absolutley no problem in the 
> way these XHTML1 pages are encoded and transported.

Because it only checks the syntax, without asking you how you are 
actually going to use that syntax - whether you want to serve it to an 
XML parser as XHTML or to an HTML parser. For a new version of the 
validator that asks more of those questions, please try 
http://validator.w3.org/nu/ - it happens to be developed, for the most 
part, by one of the Firefox developers, btw. And it allows 
you to check XHTML1-syntax as well (but only if you serve it as XHTML - 
if you serve it as HTML, then it validates it as HTML.)

>>> ; in this profile, they
>>> MUST honor the XML prolog and notably its XML encoding declaration
>>> (given that the encoding is not specified in the HTTP Content-type.
>> 
>> Again: Absolutely not. They must not, will not and must not honour the
>> XML prologue. (It is another matter that some user agents sometimes use
>> the prologue to look for encoding information.)
> 
> Sure they can because this XHTML1 site violates  HTML5 rules, missing 
> its required prologue.

Not sure how you understand the phrase "honour the XML prologue". It 
also sounds as if you say that HTML5 has its own prologue. But HTML5 
does not contain any code that is commonly known as "prologue". For 
instance, if you refer to the code "<!DOCTYPE html>", then this is not 
a prologue even if it occurs at the start of the document. 

Also, since there are two flavours of XML - XML 1.0 and XML 1.1, the 
prologue may potentially have an effect on how the document is parsed, 
but only if the parser already knows that the file is XML. But the XML 
prologue does not *cause* parsers to choose XML-mode rather than 
HTML-mode.

(Opera introduced the opposite thing some time ago: If the document is 
an XHTML document - for real, but contains XML wellformedness errors, 
then it will switch to HTML-mode.)

>>> Now given the XML prolog and the DTD declaration, the file is clearly
>>> not even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on
>>> a stable subset of HTML4, but working in strict mode without the
>>> quirks modes). Once again, this excludes using the HTML5 rules again.
>> 
>> In a way the names and the numbers (HTML4, XHTML1, HTML5) are just
>> confusing. There is just one way to parse HTML. When it comes to HTML
>> (text/html),then HTML5 differs from HTML4 and XHTML1 in that it is not
>> based on a *another* format than HTML itself. Because HTML4 and XHTML1
>> are not based on how HTML actually works, and - in addition - does not
>> take fully account of that (or whatever the reason), they allow
>> syntaxes, such as DTD declarations, which have no effect (except
>> side-effects such as quirks-mode) in HTML.
> 
> HTML5 admits the two syntaxes : SGML-based like it is used primarily 
> (in a simplified profile), and XML.

From one angle, you are of course right. But HTML5 actually explains 
that what you call "SGML-based" is not SGML-based but only SGML 
*inspired*. Thus, HTML5 is much simpler and less cryptic than the 
(official) SGML syntax of HTML4.

>>> I'm still convinced that these are bugs in Firefox and IE, which
>>> support only HTML5 in its basic HTML profile, but not HTML5 in its
>>> XML/XHTML profile (which is also part of the HTML5 standard and where
>>> processing the XML prolog is NOT an option but a requirement).
>> 
>> Just for the record: HTML5 defines the most up-to-date parsing
>> mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
>> of XHTML served as HTML. HTML5 does not allow authors to use the XML
>> prologue. 
> 
> Where ? 

Here: http://dev.w3.org/html5/spec/syntax.html#writing (As you can see, 
it doesn't say that it is allowed, hence it is not.) You can also see 
the bottom of this page: 
http://dev.w3.org/html5/spec/the-meta-e

Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
2012/11/27 Leif Halvard Silli 

>
> The fact that XHTML 1 permits the XML prolog regardless how the
> document is served, is just a shortcoming of the XHTML 1 specification.
>
>
No, it was by design. Making HTML an application of XML. Only XML but
with all rules of XML.

> So these browsers must find
> > something else: given the XML prolog they should then use HTML5 in
> > its XHTML profile, not in its HTML profile
>
> No, that is not how things works. The decision to parse the document as
> HTML is taken before the browser sees the XML prologue. So the prologue
> should not - and does not - change anything with regard to parsing as
> HTML or as XML.


Then explain why the W3C validator sees absolutely no problem in the way
these XHTML1 pages are encoded and transported.


> > ; in this profile, they
> > MUST honor the XML prolog and notably its XML encoding declaration
> > (given that the encoding is not specified in the HTTP Content-type.
>
> Again: Absolutely not. They must not, will not and must not honour the
> XML prologue. (It is another matter that some user agents sometimes use
> the prologue to look for encoding information.)
>
>
Sure they can, because this XHTML1 site violates HTML5 rules, missing its
required prologue.

> > Now given the XML prolog and the DTD declaration, the file is clearly
> > not even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on
> > a stable subset of HTML4, but working in strict mode without the
> > quirks modes). Once again, this excludes using the HTML5 rules again.
>
> In a way the names and the numbers (HTML4, XHTML1, HTML5) are just
> confusing. There is just one way to parse HTML. When it comes to HTML
> (text/html),then HTML5 differs from HTML4 and XHTML1 in that it is not
> based on a *another* format than HTML itself. Because HTML4 and XHTML1
> are not based on how HTML actually works, and - in addition - does not
> take fully account of that (or whatever the reason), they allow
> syntaxes, such as DTD declarations, which have no effect (except
> side-effects such as quirks-mode) in HTML.
>

HTML5 admits the two syntaxes : SGML-based like it is used primarily (in a
simplified profile), and XML.


> > I'm still convinced that these are bugs in Firefox and IE, which
> > support only HTML5 in its basic HTML profile, but not HTML5 in its
> > XML/XHTML profile (which is also part of the HTML5 standard and where
> > processing the XML prolog is NOT an option but a requirement).
>
> Just for the record: HTML5 defines the most up-to-date parsing
> mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
> of XHTML served as HTML. HTML5 does not allow authors to use the XML
> prologue.
>

Where? The required HTML5 prolog applies to its SGML-based syntax; it
makes no sense in XHTML as it deliberately violates the validity of the XML
document declaration.

The absence of the HTML5 required prolog (in its standard basic-SGML
profile), or the presence of another incompatible XML prolog, is enough to
make the distinction between the two syntaxes. But both syntaxes will
generate the same HTML DOM, which is just enough to produce the intended
rendering, and to make HTML5 compatible with both syntaxes.

Now HTML5 is still not completely polished, finished and approved. Such
interoperability rules are not clearly defined even if they are the "most
up-to-date" to make it work seamlessly with the claimed compatibility with
all flavors of HTML or XHTML. And the fact that Firefox and IE behave
differently from Chrome and Safari in this domain is proof of this
unfinished status.


Re: xkcd: LTR

2012-11-27 Thread Asmus Freytag

On 11/27/2012 5:39 AM, Masatoshi Kimura wrote:

(2012/11/27 20:27), Philippe Verdy wrote:
Could you please stop spreading an unfounded rumor such as "Firefox is 
wrong because it ignores the lacking of HTML5 prolog"? 


Getting Philippe to stop spreading unfounded anything is a near 
impossible task. :)


A./





Re: xkcd: LTR

2012-11-27 Thread Leif Halvard Silli
Philippe Verdy, Tue, 27 Nov 2012 21:07:31 +0100:
> A ! I see now the problem: the XHTML file is being served as HTML 
> instead of XHTML (but this is not invalid for XHTML 1).

Both SGML-based HTML4 and XML-based XHTML 1 operate with syntax rules 
that are not - and have never been - compatible with the way text/html 
operates. Thus, both HTML4 and XHTML1 permit syntaxes whose semantics 
are ignored when the document is parsed as HTML (as opposed to parsed 
as SGML or as XML).

If you are interested in creating XHTML syntax that is compatible 
with HTML, then you should look at Polyglot Markup: 
http://www.w3.org/TR/html-polyglot/

> But anyway you're also right that the XML prolog found is NOT valid 
> for HTML5 when the file is served as HTML instead of XHTML.

The fact that XHTML 1 permits the XML prolog regardless how the 
document is served, is just a shortcoming of the XHTML 1 specification.

> So these browsers must find 
> something else: given the XML prolog they should then use HTML5 in 
> its XHTML profile, not in its HTML profile

No, that is not how things work. The decision to parse the document as 
HTML is taken before the browser sees the XML prologue. So the prologue 
should not - and does not - change anything with regard to parsing as 
HTML or as XML.
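
A minimal sketch of that order of decisions, in Python. The function name
`choose_parser` and the exact list of XML media types are assumptions for the
example, not taken from any browser's code; the point is only that the
document body is never consulted.

```python
def choose_parser(content_type_header: str) -> str:
    """Pick a parser from the Content-Type header alone.

    The body of the document (and therefore any XML prologue in it) is
    not an input to this decision.
    """
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    # Assumed list of media types that select the XML parser.
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "other"
```

Under this sketch an XHTML 1 file served as text/html goes to the HTML parser
regardless of what its first bytes look like.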

> ; in this profile, they 
> MUST honor the XML prolog and notably its XML encoding declaration 
> (given that the encoding is not specified in the HTTP Content-type.

Again: Absolutely not. They must not and will not honour the 
XML prologue. (It is another matter that some user agents sometimes use 
the prologue to look for encoding information.)
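
The parenthetical can be illustrated with a simplified Python sketch, in
which the encoding declaration of the XML prologue is consulted only as a
hint when the transport layer gives no charset. This is not the HTML5
encoding sniffing algorithm (it omits the byte order mark check and the meta
prescan, among other steps); the name `sniff_encoding` and the windows-1252
default are assumptions for the example.

```python
import re
from typing import Optional

# Matches an encoding pseudo-attribute inside an XML declaration that
# starts at the very first byte of the document.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(http_charset: Optional[str], body: bytes,
                   default: str = "windows-1252") -> str:
    """Return an encoding label for `body` (simplified, hypothetical order)."""
    if http_charset:
        # The charset from the HTTP Content-Type header wins when present.
        return http_charset.lower()
    match = _XML_DECL.match(body)
    if match:
        # No transport-level information: use the XML declaration as a hint.
        return match.group(1).decode("ascii").lower()
    return default
```

Using the prologue this way affects only the choice of character encoding;
the document is still handed to the HTML parser.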

> Now given the XML prolog and the DTD declaration, the file is clearly 
> not even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on 
> a stable subset of HTML4, but working in strict mode without the 
> quirks modes). Once again, this excludes using the HTML5 rules again.

In a way the names and the numbers (HTML4, XHTML1, HTML5) are just 
confusing. There is just one way to parse HTML. When it comes to HTML 
(text/html), HTML5 differs from HTML4 and XHTML1 in that it is not 
based on *another* format than HTML itself. Because HTML4 and XHTML1 
are not based on how HTML actually works, and - in addition - do not 
take full account of that (or whatever the reason), they allow 
syntaxes, such as DTD declarations, which have no effect (except 
side-effects such as quirks-mode) in HTML.

> I'm still convinced that these are bugs in Firefox and IE, which 
> support only HTML5 in its basic HTML profile, but not HTML5 in its 
> XML/XHTML profile (which is also part of the HTML5 standard and where 
> processing the XML prolog is NOT an option but a requirement).

Just for the record: HTML5 defines the most up-to-date parsing 
mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour 
of XHTML served as HTML. HTML5 does not allow authors to use the XML 
prologue. So while XHTML1 allows you to use the prologue, the best 
description of how to parse anything that purports to be HTML - HTML5 
- does not require user agents/browsers to pay any attention to the 
prologue. Thus the one to blame for the fact that it doesn't work in 
Firefox seems to be the author. (Though we could also blame 
"the history of how HTML developed".)
-- 
leif halvard silli



Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
No. FreeType is not involved here in the ugly on-screen rendering under
Windows of the unhinted "CMU" font provided by the page. Maybe this looks
OK on Mac (if Safari autohints the font itself, even though the font is
not hinted; I'm not sure that Safari on MacOS processes TTF fonts this way
when they are not hinted, and I'm convinced that unhinted fonts should not
be autohinted "magically" by the renderer).

So using the xml:lang="en-Dsrt" pseudo-attribute remains a good suggestion:
it allows a CSS stylesheet to avoid referencing the CMU font on Windows and
MacOS when displaying the Latin text (using xml:lang="en"), and allows the
same stylesheet to specify a much better Deseret font for Windows (Segoe UI
is fine on Windows). There will still remain a problem for rendering the
page on Linux (where FreeType is used and does not itself autohint the
unhinted font, and where Segoe UI is not available) and on Windows before
Windows 7 (no Segoe UI font either; you'll also need a hinted version of
the CMU font).

2012/11/27 Khaled Hosny 

> Looks OK here, but that is probably FreeType doing its magic as usual.
>


Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
Ah! I see the problem now: the XHTML file is being served as HTML
instead of XHTML (but this is not invalid for XHTML 1).

But anyway you're also right that the XML prolog found is NOT valid for
HTML5 when the file is served as HTML instead of XHTML. This should
immediately signal that HTML5 should not be used to render the
page in the HTML profile. So these browsers must find something else: given
the XML prolog they should then use HTML5 in its XHTML profile, not in its
HTML profile; in this profile, they MUST honor the XML prolog and notably
its XML encoding declaration (given that the encoding is not specified in
the HTTP Content-type).

Now given the XML prolog and the DTD declaration, the file is clearly not
even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on a stable
subset of HTML4, but working in strict mode without the quirks modes). Once
again, this excludes using the HTML5 rules again.

I'm still convinced that these are bugs in Firefox and IE, which support
only HTML5 in its basic HTML profile, but not HTML5 in its XML/XHTML
profile (which is also part of the HTML5 standard and where processing the
XML prolog is NOT an option but a requirement).


2012/11/27 Leif Halvard Silli 

> Philippe Verdy, Tue, 27 Nov 2012 15:39:43 +0100:
> > I've never said that user agents had to "'write" the prolog. It's the
> > reverse: yes authors have to write a prolog (but the prolog is perfect here
> > so this is not the fault of the author).
>
> XML has (or more correctly: can have) a prolog. HTML does not have a
> prolog. Now to the million dollar question: is your page in question
> XML or HTML?  Answer: Per the Content-Type, then it is HTML (that is:
> "text/html"). Next question: Does the XML prolog have any effect when
> the XML file (more specifically: the XHTML file) is served as HTML
> (that is: "text/html")?
>
> The answer is that, per HTML5, it does not have effect. And of course,
> per HTML4, it does not have effect. As for XHTML 1, then it cannot
> really regulate what is supposed to happen in text/html. The
> problem/challenge, hover is that some Web browsers - such as W3m (a
> text browser), Chrome, Opera and Safari - *do* look at the prolog for
> encoding info *also* when served as HTML. But Firefox and Internet
> Explorer do not. Which is according the HTML5 specification.
>
> My guess is that it will *never* become conforming to use the XML
> prologue in HTML files. However, that does not necessarily prevent
> Firefox from looking at the prologue for encoding info, when *that* is
> the only source of encoding info. In fact, I think the HTML5 encoding
> sniffing algorithm already permits this (since it it has a step which
> roughly says "if the user agent have other sources of information".)
>
> So, for what it is worth - and with reference to your pages, I filed a
> bug against Firefox, to make it start to use the encoding declaration of
> the XML prologue, when nothing else is available:
> https://bugzilla.mozilla.org/show_bug.cgi?id=815279
> --
> leif halvard silli
>


Re: xkcd: LTR

2012-11-27 Thread Khaled Hosny
Looks OK here, but that is probably FreeType doing its magic as usual.

Regards,
 Khaled

On Tue, Nov 27, 2012 at 02:29:45AM +0100, Philippe Verdy wrote:
> Also I really don't like the Deseret font:
> {font-family: CMU; src: url(CMUSerif-Roman.ttf) format("truetype");}
> that you have inserted in your stylesheet (da.css) which is used to display
> the whole text content of the page, including the English Latin text at the
> bottom part. This downloaded font is difficult to read as it is not hinted
> at all (so its rendering on screen is extremely poor, we probably don't
> want to print each page of this XKCD series, when the main interest is the
> image which is perfectly readable).
> Could you ask someone on this list to help you hint this font at least
> minimally (even basic autohinting would be much better)?
> 
> 
> 2012/11/27 Philippe Verdy 
> 
> > Did you try adding the xml:lang="en-Dsrt" pseudo-attribute to the html
> > element, as suggested by the W3C Unicorn validator ?
> >
> >
> > http://validator.w3.org/unicorn/check?ucn_uri=www.xn--elqus623b.net%2FXKCD%2F1138.html&ucn_lang=fr&ucn_task=conformance#
> >
> > Maybe this could help IE and Firefox, which can't figure out the language
> > used to properly detect the encoding if they still don't trust the XML
> > declaration in this case, and keep them from using an encoding "guesser". It is
> > anyway curious because this site is valid as XHTML 1.1 (not as HTML5 which
> > uses a very different and simplified prolog, which is not matched here, so
> > the "legacy" rules should apply to detect XHTML here, then legacy HTML4 if
> > XHTML is no longer recognized by IE and Firefox). Because XHTML is properly
> > tagged, the XML requirements should apply and the XML declaration in the
> > prolog should be used without needing to guess the encoding from the rest
> > of the content (starting by a meta element in the HTML head element).
> >
> >
> > 2012/11/27 John H. Jenkins 
> >
> > That's because the domain does, in fact, use sinograms and not Deseret.
> >>  (It's my Chinese name.)
> >>
> >> On 2012年11月26日, at 下午1:54, Philippe Verdy  wrote:
> >>
> >> I wonder why this IDN link appears to me using sinograms in its domain
> >> name, instead of Deseret letters. The link works, but my browser cannot
> >> display it and it displays the Punycoded name instead without decoding it.
> >>
> >> This is strange because I do have Deseret fonts installed and I can
> >> view "Unicoded" HTML pages containing Deseret letters.
> >>
> >>
> >> 2012/11/26 John H. Jenkins 
> >>
> >>> Or, if one prefers:
> >>>
> >>> http://www.井作恆.net/XKCD/1137.html
> >>>
> >>> On 2012年11月21日, at 上午10:22, Deborah Goldsmith 
> >>> wrote:
> >>>
> >>>
> >>> http://xkcd.com/1137/
> >>>
> >>> Finally, an xkcd for Unicoders. :-)
> >>>
> >>> Debbie
> >>>
> >>>
> >>>
> >>
> >>
> >



Re: xkcd: LTR

2012-11-27 Thread Leif Halvard Silli
Philippe Verdy, Tue, 27 Nov 2012 15:39:43 +0100:
> I've never said that user agents had to "write" the prolog. It's the
> reverse: yes authors have to write a prolog (but the prolog is perfect here
> so this is not the fault of the author).

XML has (or more correctly: can have) a prolog. HTML does not have a 
prolog. Now to the million dollar question: is your page in question 
XML or HTML?  Answer: Per the Content-Type, then it is HTML (that is: 
"text/html"). Next question: Does the XML prolog have any effect when 
the XML file (more specifically: the XHTML file) is served as HTML 
(that is: "text/html")? 

The answer is that, per HTML5, it does not have effect. And of course, 
per HTML4, it does not have effect. As for XHTML 1, then it cannot 
really regulate what is supposed to happen in text/html. The 
problem/challenge, however, is that some Web browsers - such as W3m (a
text browser), Chrome, Opera and Safari - *do* look at the prolog for
encoding info *also* when served as HTML. But Firefox and Internet
Explorer do not. Which is according to the HTML5 specification.

My guess is that it will *never* become conforming to use the XML 
prologue in HTML files. However, that does not necessarily prevent 
Firefox from looking at the prologue for encoding info, when *that* is 
the only source of encoding info. In fact, I think the HTML5 encoding 
sniffing algorithm already permits this (since it has a step which
roughly says "if the user agent has other sources of information".)

So, for what it is worth - and with reference to your pages, I filed a 
bug against Firefox, to make it start to use the encoding declaration of
the XML prologue, when nothing else is available: 
https://bugzilla.mozilla.org/show_bug.cgi?id=815279
-- 
leif halvard silli
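
The fallback proposed in the bug report above can be illustrated with a small Python sketch. This is not the HTML5 encoding sniffing algorithm itself, only a simplified ordering of sources under stated assumptions: the function name `sniff_encoding`, the 1024-byte prescan window, the `use_xml_decl_fallback` switch and the windows-1252 default are all hypothetical choices made for the example.

```python
import re

# Simplified patterns; real user agents use a much more careful prescan.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_encoding(body, http_charset=None, use_xml_decl_fallback=True):
    """Return an encoding label for a text/html response (illustrative only)."""
    if body.startswith(b"\xef\xbb\xbf"):      # UTF-8 byte order mark
        return "utf-8"
    if http_charset:                          # charset from the Content-Type header
        return http_charset.lower()
    head = body[:1024]                        # assumed prescan window
    meta = META_CHARSET.search(head)
    if meta:                                  # <meta charset=...> or http-equiv form
        return meta.group(1).decode("ascii").lower()
    if use_xml_decl_fallback:                 # the proposed last-resort source
        decl = XML_DECL.match(head)
        if decl:
            return decl.group(1).decode("ascii").lower()
    return "windows-1252"                     # assumed locale default


# An XHTML page served as text/html with no charset and no meta element.
page = b'<?xml version="1.0" encoding="UTF-8"?>\n<html><head><title>t</title></head></html>'
print(sniff_encoding(page))
print(sniff_encoding(page, use_xml_decl_fallback=False))
```

With the fallback enabled the XML declaration decides the encoding; with it disabled the sketch falls through to the default, which mirrors the difference between the two groups of browsers described in the message.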



Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
You are also confusing matters when you say that HTML5 must be able to
"parse" HTML4.

This is true, but it does not mean that HTML5 user agents will be able to
render it fully. HTML5 is not fully upward compatible with past versions (the
identification of encodings is one example where it differs, and many
requirements of HTML4 are no longer requirements in HTML5, because some rules
were relaxed after the failed effort to standardize HTML4 more like XHTML and
according to the initial CSS specifications).

So HTML5 renderers will just render HTML4 on a "best effort" basis, but lots of
requirements that are applicable to real *HTML5* documents (identified by
their prolog) do NOT apply to non-HTML5 documents, as they are not directly
in scope of its standard (the HTML4 specifications themselves are not
dismissed): the best effort implies flexibility, even if interoperability
is not guaranteed across HTML5 implementations, which will all parse HTML4
documents but may still produce different results (including with the
support of HTML4 quirks mode if they want).



2012/11/27 Masatoshi Kimura 

> (2012/11/27 20:27), Philippe Verdy wrote:
> > HTML5 does not reference the "Content-Type: text/html" header as enough
> > to qualify as meaning "HTML5".
> HTML5 user agents must parse any byte sequence as an HTML5 document if the
> Content-Type is text/html.
>
> > HTML5 **requires** its own prolog (i.e. its basic document declaration
> > **within** the document itself, for the HTML syntax, or its FULL
> > document declaration for the XML/XHTML syntax).
> HTML5 requires **authors** to write the prolog, not user agents. A missing
> prolog just switches the user agent to quirks mode.
> Note that quirks mode doesn't mean "do whatever you consider quirks."
> Parsing a quirks mode document is also completely spec'ed.
>
> > So Firefox is wrong and attempts to use HTML5 to render all HTML dialects.
> No, not at all. Rather, it is required by the spec to use the HTML5 parser
> to parse all byte sequences sent with "Content-Type: text/html".
> Could you please stop spreading an unfounded rumor such as "Firefox is
> wrong because it ignores the lack of an HTML5 prolog"?
>
> --
> vyv03...@nifty.ne.jp
>
>


Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
I've never said that user agents had to "write" the prolog. It's the
reverse: yes, authors have to write a prolog (but the prolog is perfect here,
so this is not the fault of the author). Why do they have to use this prolog?
Precisely because user agents will have to "read" it (not "write" it),
as it is expected for validating that this is effectively HTML5 content
(the "Content-Type: text/html" is clearly not enough: it is exactly the
same for HTML4 or all past versions of HTML, working in quirks mode or not).

By your assertion, all HTML5 browsers would then need to parse HTML4 (or any
past version) as if it were HTML5, using its strict definitions, which are not
compatible with HTML4 (even if we ignore the quirks mode). HTML5
parsing is triggered by the presence of the required HTML5 prolog.


Re: xkcd: LTR

2012-11-27 Thread Masatoshi Kimura
(2012/11/27 20:27), Philippe Verdy wrote:
> HTML5 does not reference the "Content-Type: text/html" header as enough
> to qualify as meaning "HTML5".
HTML5 user agents must parse any byte sequence as an HTML5 document if the
Content-Type is text/html.

> HTML5 **requires** its own prolog (i.e. its basic document declaration
> **within** the document itself, for the HTML syntax, or its FULL
> document declaration for the XML/XHTML syntax).
HTML5 requires **authors** to write the prolog, not user agents. A missing
prolog just switches the user agent to quirks mode.
Note that quirks mode doesn't mean "do whatever you consider quirks."
Parsing a quirks mode document is also completely spec'ed.

> So Firefox is wrong and attempts to use HTML5 to render all HTML dialects.
No, not at all. Rather, it is required by the spec to use the HTML5 parser
to parse all byte sequences sent with "Content-Type: text/html".
Could you please stop spreading an unfounded rumor such as "Firefox is
wrong because it ignores the lack of an HTML5 prolog"?

-- 
vyv03...@nifty.ne.jp
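
The point made above, that the media type and not the document's own prolog selects the parser, can be sketched in a few lines of Python. This is only an illustration: `parser_for` is a hypothetical helper, and the returned labels are placeholders rather than real parser objects.

```python
def parser_for(content_type):
    """Pick a parser family from a Content-Type header value (illustrative)."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # Always the HTML parser, whatever prolog or DOCTYPE the bytes carry.
        return "html"
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        # Only here is the document processed as XML, XML declaration included.
        return "xml"
    return "other"


print(parser_for("text/html"))
print(parser_for("application/xhtml+xml; charset=UTF-8"))
```

Under this model an XHTML 1.1 file sent as text/html goes down the first branch, which is why its XML declaration is not consulted as XML.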



Re: xkcd: LTR

2012-11-27 Thread Philippe Verdy
HTML5 does not reference the "Content-Type: text/html" header as enough to
qualify as meaning "HTML5".
HTML5 **requires** its own prolog (i.e. its basic document declaration
**within** the document itself, for the HTML syntax, or its FULL document
declaration for the XML/XHTML syntax).
So Firefox is wrong and attempts to use HTML5 to render all HTML dialects.


2012/11/27 Simon Montagu 

> On 11/27/2012 11:19 AM, Behnam Esfahbod ZWNJ wrote:
>
>> Simon,
>>
>> There's no sign of HTML5 on that page. The head of the file matches all
>> XHTML 1.1 requirements and passes all checks on validator.w3.org. Now, why
>> would Firefox follow anything from HTML5 spec here?
>>
>
> As I already said, because of the Content-Type HTTP header
>
>
>


Re: xkcd: ‮LTR

2012-11-27 Thread Simon Montagu

On 11/27/2012 11:19 AM, Behnam Esfahbod ZWNJ wrote:

Simon,

There's no sign of HTML5 on that page. The head of the file matches all
XHTML 1.1 requirements and passes all checks on validator.w3.org. Now, why
would Firefox follow anything from HTML5 spec here?


As I already said, because of the Content-Type HTTP header




Re: xkcd: ‮LTR

2012-11-27 Thread Behnam Esfahbod ZWNJ
Simon,

There's no sign of HTML5 on that page. The head of the file matches all
XHTML 1.1 requirements and passes all checks on validator.w3.org. Now, why
would Firefox follow anything from HTML5 spec here?

-Behnam



On Tue, Nov 27, 2012 at 3:37 AM, Simon Montagu wrote:

> On 11/26/2012 08:42 PM, Marc Durdin wrote:
>
>> Somewhat ironically, both Firefox and Internet Explorer, on my machine
>> at least, detect this page is encoded with ISO-8859-1 and cp-1252
>> respectively, instead of UTF-8.  It seems they both ignore the XML
>> prolog which is the only place where the encoding is stated.
>>
>
> Firefox follows the HTML5 spec and ignores the XML prolog, since the
> Content-type is "text/html".
>
>


-- 
Behnam Esfahbod | بهنام اسفهبد
http://behnam.es/
http://zwnj.behnam.es/
GPG Fingerprint: 3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B


Re: xkcd: ‮LTR

2012-11-27 Thread Simon Montagu

On 11/26/2012 08:42 PM, Marc Durdin wrote:

Somewhat ironically, both Firefox and Internet Explorer, on my machine
at least, detect this page is encoded with ISO-8859-1 and cp-1252
respectively, instead of UTF-8.  It seems they both ignore the XML
prolog which is the only place where the encoding is stated.


Firefox follows the HTML5 spec and ignores the XML prolog, since the 
Content-type is "text/html".




Re: xkcd: LTR

2012-11-26 Thread Philippe Verdy
Anyway, you could at least list Segoe UI before your CMU font: Segoe UI
works only on Windows, but it has decent support for Deseret. Maybe there is
also a good font that ships with some recent version of Mac OS, which you
could list too, leaving your CMU font after them, only for other OSes.

In all cases, I also suggest that you tag only the parts that are
written in Deseret with xml:lang="en-Dsrt", so that you can have a CSS
selector to match these Deseret fonts. For the rest, just use your choice
of "Lucida, Arial, sans-serif" in less selective CSS selectors (that don't
care about the language tags). The template design of these pages is
simple enough that you can do it with just a few modifications.


2012/11/27 Philippe Verdy 

> Also I really don't like the Deseret font:
> {font-family: CMU; src: url(CMUSerif-Roman.ttf) format("truetype");}
> that you have inserted in your stylesheet (da.css) which is used to
> display the whole text content of the page, including the English Latin
> text at the bottom part. This downloaded font is difficult to read as it is
> not hinted at all (so its rendering on screen is extremely poor, we
> probably don't want to print each page of this XKCD series, when the main
> interest is the image which is perfectly readable).
> Could you ask someone on this list to help you hint this font at least
> minimally (even basic autohinting would be much better)?
>
>
> 2012/11/27 Philippe Verdy 
>
>> Did you try adding the xml:lang="en-Dsrt" pseudo-attribute to the html
>> element, as suggested by the W3C Unicorn validator ?
>>
>>
>> http://validator.w3.org/unicorn/check?ucn_uri=www.xn--elqus623b.net%2FXKCD%2F1138.html&ucn_lang=fr&ucn_task=conformance#
>>
>> Maybe this could help IE and Firefox, which can't figure out the language
>> used to properly detect the encoding if they still don't trust the XML
>> declaration in this case, and keep them from using an encoding "guesser". It is
>> anyway curious because this site is valid as XHTML 1.1 (not as HTML5 which
>> uses a very different and simplified prolog, which is not matched here, so
>> the "legacy" rules should apply to detect XHTML here, then legacy HTML4 if
>> XHTML is no longer recognized by IE and Firefox). Because XHTML is properly
>> tagged, the XML requirements should apply and the XML declaration in the
>> prolog should be used without needing to guess the encoding from the rest
>> of the content (starting by a meta element in the HTML head element).
>>
>>
>> 2012/11/27 John H. Jenkins 
>>
>> That's because the domain does, in fact, use sinograms and not Deseret.
>>>  (It's my Chinese name.)
>>>
>>> On 2012年11月26日, at 下午1:54, Philippe Verdy  wrote:
>>>
>>> I wonder why this IDN link appears to me using sinograms in its domain
>>> name, instead of Deseret letters. The link works, but my browser cannot
>>> display it and it displays the Punycoded name instead without decoding it.
>>>
>>> This is strange because I do have Deseret fonts installed and I can
>>> view "Unicoded" HTML pages containing Deseret letters.
>>>
>>>
>>> 2012/11/26 John H. Jenkins 
>>>
>>>> Or, if one prefers:
>>>>
>>>> http://www.井作恆.net/XKCD/1137.html
>>>>
>>>> On 2012年11月21日, at 上午10:22, Deborah Goldsmith
>>>> wrote:
>>>>
>>>>
>>>> http://xkcd.com/1137/
>>>>
>>>> Finally, an xkcd for Unicoders. :-)
>>>>
>>>> Debbie
>>>>
>>>>
>>>>
>>>
>>>
>>
>


Re: xkcd: LTR

2012-11-26 Thread Philippe Verdy
Did you try adding the xml:lang="en-Dsrt" pseudo-attribute to the html
element, as suggested by the W3C Unicorn validator ?

http://validator.w3.org/unicorn/check?ucn_uri=www.xn--elqus623b.net%2FXKCD%2F1138.html&ucn_lang=fr&ucn_task=conformance#

Maybe this could help IE and Firefox, which can't figure out the language
used to properly detect the encoding if they still don't trust the XML
declaration in this case, and keep them from using an encoding "guesser". It is
anyway curious because this site is valid as XHTML 1.1 (not as HTML5 which
uses a very different and simplified prolog, which is not matched here, so
the "legacy" rules should apply to detect XHTML here, then legacy HTML4 if
XHTML is no longer recognized by IE and Firefox). Because XHTML is properly
tagged, the XML requirements should apply and the XML declaration in the
prolog should be used without needing to guess the encoding from the rest
of the content (starting by a meta element in the HTML head element).


2012/11/27 John H. Jenkins 

> That's because the domain does, in fact, use sinograms and not Deseret.
>  (It's my Chinese name.)
>
> On 2012年11月26日, at 下午1:54, Philippe Verdy  wrote:
>
> I wonder why this IDN link appears to me using sinograms in its domain
> name, instead of Deseret letters. The link works, but my browser cannot
> display it and it displays the Punycoded name instead without decoding it.
>
> This is strange because I do have Deseret fonts installed and I can
> view "Unicoded" HTML pages containing Deseret letters.
>
>
> 2012/11/26 John H. Jenkins 
>
>> Or, if one prefers:
>>
>> http://www.井作恆.net/XKCD/1137.html
>>
>> On 2012年11月21日, at 上午10:22, Deborah Goldsmith  wrote:
>>
>>
>> http://xkcd.com/1137/
>>
>> Finally, an xkcd for Unicoders. :-)
>>
>> Debbie
>>
>>
>>
>
>


Re: xkcd: LTR

2012-11-26 Thread Philippe Verdy
Also I really don't like the Deseret font:
{font-family: CMU; src: url(CMUSerif-Roman.ttf) format("truetype");}
that you have inserted in your stylesheet (da.css) which is used to display
the whole text content of the page, including the English Latin text at the
bottom part. This downloaded font is difficult to read as it is not hinted
at all (so its rendering on screen is extremely poor, we probably don't
want to print each page of this XKCD series, when the main interest is the
image which is perfectly readable).
Could you ask someone on this list to help you hint this font at least
minimally (even basic autohinting would be much better)?


2012/11/27 Philippe Verdy 

> Did you try adding the xml:lang="en-Dsrt" pseudo-attribute to the html
> element, as suggested by the W3C Unicorn validator ?
>
>
> http://validator.w3.org/unicorn/check?ucn_uri=www.xn--elqus623b.net%2FXKCD%2F1138.html&ucn_lang=fr&ucn_task=conformance#
>
> Maybe this could help IE and Firefox, which can't figure out the language
> used to properly detect the encoding if they still don't trust the XML
> declaration in this case, and keep them from using an encoding "guesser". It is
> anyway curious because this site is valid as XHTML 1.1 (not as HTML5 which
> uses a very different and simplified prolog, which is not matched here, so
> the "legacy" rules should apply to detect XHTML here, then legacy HTML4 if
> XHTML is no longer recognized by IE and Firefox). Because XHTML is properly
> tagged, the XML requirements should apply and the XML declaration in the
> prolog should be used without needing to guess the encoding from the rest
> of the content (starting by a meta element in the HTML head element).
>
>
> 2012/11/27 John H. Jenkins 
>
> That's because the domain does, in fact, use sinograms and not Deseret.
>>  (It's my Chinese name.)
>>
>> On 2012年11月26日, at 下午1:54, Philippe Verdy  wrote:
>>
>> I wonder why this IDN link appears to me using sinograms in its domain
>> name, instead of Deseret letters. The link works, but my browser cannot
>> display it and it displays the Punycoded name instead without decoding it.
>>
>> This is strange because I do have Deseret fonts installed and I can
>> view "Unicoded" HTML pages containing Deseret letters.
>>
>>
>> 2012/11/26 John H. Jenkins 
>>
>>> Or, if one prefers:
>>>
>>> http://www.井作恆.net/XKCD/1137.html
>>>
>>> On 2012年11月21日, at 上午10:22, Deborah Goldsmith 
>>> wrote:
>>>
>>>
>>> http://xkcd.com/1137/
>>>
>>> Finally, an xkcd for Unicoders. :-)
>>>
>>> Debbie
>>>
>>>
>>>
>>
>>
>


RE: xkcd: LTR

2012-11-26 Thread Marc Durdin
In this instance the web server is not returning an encoding (“Content-Type: 
text/html”), which is why I was curious to see that neither web browser picked 
up the UTF-8 hint in the XML prolog.
Chrome does detect UTF-8 for that page.
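
To check whether a given Content-Type value actually carries an encoding, the charset parameter can be extracted with the Python standard library. The sketch below is an illustration only; `charset_from_content_type` is a hypothetical helper name, and the second header value is an example of what a server could send, not what this server sends.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter, or None if there is none."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# The header reported for this page: no charset parameter at all.
print(charset_from_content_type("text/html"))
# A header that would settle the question at the transport layer.
print(charset_from_content_type("text/html; charset=UTF-8"))
```

A result of `None` means the browser has to fall back on whatever it finds, or guesses, in the document itself.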

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy
Sent: Tuesday, 27 November 2012 7:49 AM
To: Marc Durdin
Cc: John H. Jenkins; Unicode Mailing List
Subject: Re: xkcd: LTR

Not a bug of your machine or browser; this is a problem of the webserver in its 
metadata.
The transport layer indicates to the client another encoding in HTTP headers, 
and it prevails over what the document declares.
In this case, the webserver should be able to transform the source document to 
match what it indicates in HTTP headers, or better, should identify its local 
file contents to send the correct HTTP header.

Send a bug report to the site admin to fix its web server settings, possibly 
per directory, or using a naming scheme for webpages that are encoded 
differently, e.g. "http://www.example.net/path/to/file.UTF-8.html" will request 
the content of a file named "file.UTF-8.html" with an explicit extension 
"*.UTF-8.html", which the server can map to another HTTP header for 
the effective UTF-8 encoding (instead of using cp-1252).

My opinion however is that new contents should always be encoded in UTF-8, and 
older contents may be linked to another effective archiving directory where it 
can be mapped to the older encoding without having to reencode the old content.

2012/11/26 Marc Durdin
Somewhat ironically, both Firefox and Internet Explorer 9, on my machine at 
least, detect this page is encoded with ISO-8859-1 and cp-1252 respectively, 
instead of UTF-8.  It seems they both ignore the XML prolog which is the only 
place where the encoding is stated.
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of John H. Jenkins
Sent: Tuesday, 27 November 2012 1:15 AM
To: Unicode Mailing List
Subject: Re: xkcd: LTR

Or, if one prefers:

http://www.井作恆.net/XKCD/1137.html

On 2012年11月21日, at 上午10:22, Deborah Goldsmith wrote:


http://xkcd.com/1137/

Finally, an xkcd for Unicoders. :-)

Debbie





Re: xkcd: LTR

2012-11-26 Thread John H. Jenkins
That's because the domain does, in fact, use sinograms and not Deseret.  (It's 
my Chinese name.)
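
The relation between the sinogram domain and the Punycoded form quoted elsewhere in this thread (xn--elqus623b) can be reproduced with Python's built-in "idna" codec. This is a small illustration only; note as an assumption that this codec implements the older IDNA 2003 rules, which is sufficient for these three characters.

```python
# Hostname as written in the link above.
unicode_host = "www.井作恆.net"

# ToASCII: each non-ASCII label becomes "xn--" plus its Punycode form.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # www.xn--elqus623b.net

# ToUnicode: decoding restores the sinograms, so no Deseret is involved.
roundtrip = ascii_host.encode("ascii").decode("idna")
print(roundtrip == unicode_host)  # True
```

Whether a browser shows the decoded or the Punycoded form in its address bar is a display policy of that browser, separate from the fonts installed.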

On 2012年11月26日, at 下午1:54, Philippe Verdy  wrote:

> I wonder why this IDN link appears to me using sinograms in its domain name, 
> instead of Deseret letters. The link works, but my browser cannot display it 
> and it displays the Punycoded name instead without decoding it.
> 
> This is strange because I do have Deseret fonts installed and I can view 
> "Unicoded" HTML pages containing Deseret letters.
> 
> 
> 2012/11/26 John H. Jenkins 
> Or, if one prefers:
> 
> http://www.井作恆.net/XKCD/1137.html
> 
> On 2012年11月21日, at 上午10:22, Deborah Goldsmith  wrote:
> 
>> 
>> http://xkcd.com/1137/ 
>> 
>> Finally, an xkcd for Unicoders. :-)
>> 
>> Debbie
>> 
> 
> 



Re: xkcd: LTR

2012-11-26 Thread Philippe Verdy
Not a bug of your machine or browser; this is a problem of the webserver in
its metadata.
The transport layer indicates to the client another encoding in HTTP
headers, and it prevails over what the document declares.
In this case, the webserver should be able to transform the source document
to match what it indicates in HTTP headers, or better, should identify its
local file contents to send the correct HTTP header.

Send a bug report to the site admin to fix its web server settings,
possibly per directory, or using a naming scheme for webpages that are
encoded differently, e.g. "http://www.example.net/path/to/file.UTF-8.html"
will request the content of a file named "file.UTF-8.html" with an explicit
extension "*.UTF-8.html", which the server can map to another
HTTP header for the effective UTF-8 encoding (instead of using cp-1252).
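
As a rough sketch of such a naming scheme, the following Python function derives a Content-Type header value from a file name. Everything here is an assumption made for illustration: the helper name `content_type_for`, the windows-1252 default, and the rule that the second-to-last extension is treated as a charset only when Python's codec registry recognizes it. A real web server would express the same mapping in its own configuration.

```python
import codecs
from pathlib import PurePosixPath


def content_type_for(path, default_charset="windows-1252"):
    """Map "file.UTF-8.html" style names to a Content-Type value (illustrative)."""
    suffixes = PurePosixPath(path).suffixes   # e.g. ['.UTF-8', '.html']
    charset = default_charset
    if len(suffixes) >= 2:
        candidate = suffixes[-2].lstrip(".")
        try:
            codecs.lookup(candidate)          # accept only known encoding names
            charset = candidate
        except LookupError:
            pass                              # not an encoding, keep the default
    return f"text/html; charset={charset}"


print(content_type_for("/path/to/file.UTF-8.html"))
print(content_type_for("/path/to/index.html"))
```

The design choice is to fail safe: an unrecognized middle extension never changes the declared encoding.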

My opinion however is that new contents should always be encoded in UTF-8,
and older contents may be linked to another effective archiving directory
where it can be mapped to the older encoding without having to reencode the
old content.

2012/11/26 Marc Durdin 

>  Somewhat ironically, both Firefox and Internet Explorer, on my machine
> at least, detect this page is encoded with ISO-8859-1 and cp-1252
> respectively, instead of UTF-8.  It seems they both ignore the XML prolog
> which is the only place where the encoding is stated.
>
> *From:* unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] *On
> Behalf Of *John H. Jenkins
> *Sent:* Tuesday, 27 November 2012 1:15 AM
> *To:* Unicode Mailing List
> *Subject:* Re: xkcd: LTR
>
> Or, if one prefers:
>
> http://www.井作恆.net/XKCD/1137.html
>
> On 2012年11月21日, at 上午10:22, Deborah Goldsmith wrote:
>
> http://xkcd.com/1137/
>
> Finally, an xkcd for Unicoders. :-)
>
> Debbie
>


Re: xkcd: LTR

2012-11-26 Thread Philippe Verdy
I wonder why this IDN link appears to me using sinograms in its domain
name, instead of Deseret letters. The link works, but my browser cannot
display it and it displays the Punycoded name instead without decoding it.

This is strange because I do have Deseret fonts installed and I can
view "Unicoded" HTML pages containing Deseret letters.


2012/11/26 John H. Jenkins 

> Or, if one prefers:
>
> http://www.井作恆.net/XKCD/1137.html
>
> On 2012年11月21日, at 上午10:22, Deborah Goldsmith  wrote:
>
>
> http://xkcd.com/1137/
>
> Finally, an xkcd for Unicoders. :-)
>
> Debbie
>
>
>


RE: xkcd: ‮LTR

2012-11-26 Thread Marc Durdin
Somewhat ironically, both Firefox and Internet Explorer, on my machine at 
least, detect this page is encoded with ISO-8859-1 and cp-1252 respectively, 
instead of UTF-8.  It seems they both ignore the XML prolog which is the only 
place where the encoding is stated.

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of John H. Jenkins
Sent: Tuesday, 27 November 2012 1:15 AM
To: Unicode Mailing List
Subject: Re: xkcd: LTR

Or, if one prefers:

http://www.井作恆.net/XKCD/1137.html

On 2012年11月21日, at 上午10:22, Deborah Goldsmith wrote:



http://xkcd.com/1137/


Finally, an xkcd for Unicoders. :-)


Debbie





Re: xkcd: ‮LTR

2012-11-26 Thread John H. Jenkins
Or, if one prefers:

http://www.井作恆.net/XKCD/1137.html

On 2012年11月21日, at 上午10:22, Deborah Goldsmith  wrote:

> 
> http://xkcd.com/1137/ 
> 
> Finally, an xkcd for Unicoders. :-)
> 
> Debbie
>