Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela

2011-12-06 6:54, Leif Halvard Silli wrote:


Yeah, it would be a pity if it had already become a widespread
cargo cult to - all at once - use the HTML5 doctype without using UTF-8
*and* without using some encoding declaration *and* thus effectively
relying on the default locale encoding ... Who has a data corpus?


I think we would need to ask search engine developers about that, but 
what is this proposed change to defaults supposed to achieve? It would 
break any old page that does not specify the encoding, as soon as the 
doctype is changed to <!doctype html> or this doctype is added to a 
page that lacked a doctype.


Since <!doctype html> is the simplest way to put browsers into standards 
mode, this would punish authors who have realized that their page works 
better in standards mode but are unaware of a completely different and 
fairly complex problem. (Basic character encoding issues are of course 
not that complex to you and me or most people around here; but most 
authors are more or less confused by them, and I don't think we should 
add to the confusion.)


There's little point in changing the specs to say something very 
different from what previous HTML specs have said and from actual 
browser behavior. If the purpose is to make things more exactly defined 
(a fixed encoding vs. implementation-defined), then I think such 
exactness is a luxury we cannot afford. Things would be all different if 
we were designing a document format from scratch, with no existing 
implementations and no existing usage. If the purpose is UTF-8 
evangelism, then it would be just the kind of evangelism that produces 
angry people, not converts.


If there's something that should be added to or modified in the 
algorithm for determining character encoding, then I'd say it's error 
processing. I mean user agent behavior when, after running the 
algorithm and while processing the document data, it detects that there 
is a mismatch between them. That is, that the data contains octets or 
octet sequences that are not allowed in the encoding or that denote 
noncharacters. Such errors are naturally detected when the user agent 
processes the octets; the question is what the browser should do then.


When data that is actually in ISO-8859-1 or some similar encoding has 
been mislabeled as UTF-8 encoded, then, if the data contains octets 
outside the ASCII range, character-level errors are likely to occur. Many 
ISO-8859-1 octets are just not possible in UTF-8 data. The converse 
error may also cause character-level errors. And these are not uncommon 
situations - they seem to occur increasingly often, partly due to cargo 
cult use of UTF-8 (when it means declaring UTF-8 but not actually 
using it, or vice versa), partly due to increased use of UTF-8 combined 
with ISO-8859-1 encoded data creeping in from somewhere into UTF-8 
encoded data.
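
To illustrate the detection, here is a minimal sketch, using the modern 
TextDecoder API (which postdates this thread; the byte values are 
illustrative), of why mislabeled ISO-8859-1 is usually catchable:

  // Valid ISO-8859-1 text is very often invalid as UTF-8, so a
  // strict decode exposes the mislabeling.
  function isValidUtf8(bytes) {
    try {
      new TextDecoder('utf-8', { fatal: true }).decode(bytes);
      return true;
    } catch (e) {
      return false;
    }
  }

  // "café" in ISO-8859-1: the lone 0xE9 is an invalid UTF-8 sequence.
  isValidUtf8(new Uint8Array([0x63, 0x61, 0x66, 0xE9]));  // false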


From the user's point of view, the character-level errors currently 
result in some gibberish (e.g., some odd box appearing instead of a 
character, in one place) or in a total mess (e.g. a large number of non-ASCII 
characters displayed all wrong). In either case, I think an error should 
be signalled to the user, together with
a) automatically trying another encoding, such as the locale default 
encoding instead of UTF-8 or UTF-8 instead of anything else
b) suggesting to the user that he should try to view the page using some 
other encoding, possibly with a menu of encodings offered as part of the 
error explanation
c) a combination of the above.

Although there are good reasons why browsers usually don't give error 
messages, this would be a special case. It's about the primary 
interpretation of the data in the document and about a situation where 
some data has no interpretation in the assumed encoding - but usually 
has an interpretation in some other encoding.


The current "Character encoding overrides" rules are questionable 
because they often mask out data errors that would have helped to detect 
problems that can be solved constructively. For example, if data labeled 
as ISO-8859-1 contains an octet in the 80...9F range, then it may well 
be the case that the data is actually windows-1252 encoded and the 
override helps everyone. But it may also be the case that the data is 
in a different encoding and that the override therefore results in 
gibberish shown to the user, with no hint of the cause of the problem. 
It would therefore be better to signal a problem to the user and display 
the page using the windows-1252 encoding, but with some instruction or 
hint on changing the encoding. And a browser should in this process 
really analyze whether the data can be windows-1252 encoded data that 
contains only characters permitted in HTML.
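
A rough sketch of that suggested check (again using the modern 
TextDecoder API, which postdates this thread):

  function looksLikeCleanWindows1252(bytes) {
    var text = new TextDecoder('windows-1252').decode(bytes);
    // Octets 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in
    // windows-1252; the decoder passes them through as C1 control
    // characters, which HTML does not permit in content.
    return !/[\u0081\u008D\u008F\u0090\u009D]/.test(text);
  }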


Yucca


Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread NARUSE, Yui
(2011/12/06 17:39), Jukka K. Korpela wrote:
 2011-12-06 6:54, Leif Halvard Silli wrote:
 
 Yeah, it would be a pity if it had already become a widespread
 cargo cult to - all at once - use the HTML5 doctype without using UTF-8
 *and* without using some encoding declaration *and* thus effectively
 relying on the default locale encoding ... Who has a data corpus?

I found one: http://rink77.web.fc2.com/html/metatagu.html
It uses the HTML5 doctype, does not declare an encoding, and its encoding 
is Shift_JIS, the default encoding of the Japanese locale.

 Since <!doctype html> is the simplest way to put browsers into standards 
 mode, this would punish authors who have realized that their page works 
 better in standards mode but are unaware of a completely different and 
 fairly complex problem. (Basic character encoding issues are of course not 
 that complex to you and me or most people around here; but most authors are 
 more or less confused by them, and I don't think we should add to the 
 confusion.)

I don't think there is a page that works better in standards mode than in 
the *current* loose mode.

 There's little point in changing the specs to say something very different 
 from what previous HTML specs have said and from actual browser behavior. If 
 the purpose is to make things more exactly defined (a fixed encoding vs. 
 implementation-defined), then I think such exactness is a luxury we cannot 
 afford. Things would be all different if we were designing a document format 
 from scratch, with no existing implementations and no existing usage. If the 
 purpose is UTF-8 evangelism, then it would be just the kind of evangelism 
 that produces angry people, not converts.

Agreed: if we were designing a new spec, there would be no reason to choose 
anything other than UTF-8. But HTML has a long history and much content.
We already have HTML*5* pages which don't have an encoding declaration.

 If there's something that should be added to or modified in the algorithm for 
 determining character encoding, then I'd say it's error processing. I mean 
 user agent behavior when, after running the algorithm and while 
 processing the document data, it detects that there is a mismatch between them. That is, 
 that the data contains octets or octet sequences that are not allowed in the 
 encoding or that denote noncharacters. Such errors are naturally detected 
 when the user agent processes the octets; the question is what the browser 
 should do then.

Current implementations replace such an invalid octet with a replacement 
character.
Or some implementations scan most of the page and use an encoding
with which all octets in the page are valid.

 When data that is actually in ISO-8859-1 or some similar encoding has been 
 mislabeled as UTF-8 encoded, then, if the data contains octets outside 
 the ASCII range, character-level errors are likely to occur. Many ISO-8859-1 octets 
 are just not possible in UTF-8 data. The converse error may also cause 
 character-level errors. And these are not uncommon situations - they seem to 
 occur increasingly often, partly due to cargo cult use of UTF-8 (when it 
 means declaring UTF-8 but not actually using it, or vice versa), partly due to 
 increased use of UTF-8 combined with ISO-8859-1 encoded data creeping in from 
 somewhere into UTF-8 encoded data.

In such a case, the page should fail to display in the author's environment.

 From the user's point of view, the character-level errors currently result in 
 some gibberish (e.g., some odd box appearing instead of a character, in one 
 place) or in a total mess (e.g. a large number of non-ASCII characters displayed 
 all wrong). In either case, I think an error should be signalled to the user, 
 together with
 a) automatically trying another encoding, such as the locale default encoding 
 instead of UTF-8 or UTF-8 instead of anything else
 b) suggesting to the user that he should try to view the page using some 
 other encoding, possibly with a menu of encodings offered as part of the 
 error explanation
 c) a combination of the above.

This presumes that a user knows the correct encoding.
But do European people really know the correct encoding of ISO-8859-* pages?
I, as a Japanese speaker, imagine that it is hard to distinguish an 
ISO-8859-1 page from an ISO-8859-2 page.

 Although there are good reasons why browsers usually don't give error 
 messages, this would be a special case. It's about the primary interpretation 
 of the data in the document and about a situation where some data has no 
 interpretation in the assumed encoding - but usually has an interpretation in 
 some other encoding.

Some browsers alert on scripting issues.
Why can they not alert on an encoding issue?

 The current "Character encoding overrides" rules are questionable because 
 they often mask out data errors that would have helped to detect problems 
 that can be solved constructively. For example, if data labeled as ISO-8859-1 
 contains an octet in the 80...9F range, then it may well be the case that the 
 data is actually 

Re: [whatwg] object, type, and fallback

2011-12-06 Thread Brady Eidson

On Dec 5, 2011, at 23:06, Simon Pieters wrote:

 On Mon, 05 Dec 2011 22:19:33 +0100, Brady Eidson beid...@apple.com wrote:
 
 I can't find a definitive answer for the following scenario:
 
 1 - A page has a plug-in with fallback specified as follows:
 
 <object type="application/x-shockwave-flash">
   <param name="movie" value="Example.swf"/>
   <img src="Fallback.png">
 </object>
 
 2 - The page is loaded, the browser instantiates the plug-in, and the 
 plug-in content is shown.
 
 3 - A script later comes along and dynamically changes the object's type 
 attribute to application/some-unsupported-type
 
 Should the browser dynamically and immediately switch from the plug-in to 
 the fallback image?
 If not, what should it do?
 And is this specified anywhere?
 
 Thanks,
 ~Brady
 
 
 ... when neither its classid attribute nor its data attribute are present, 
 whenever its type attribute is set, changed, or removed: the user agent must 
 queue a task to run the following steps to (re)determine what the object 
 element represents. The task source for this task is the DOM manipulation 
 task source.
 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#the-object-element
 
 The algorithm then determines in step 5 that there's no suitable plugin, and 
 falls back.
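
A minimal sketch of the dynamic change in step 3, assuming the markup 
above (the element lookup is illustrative):

  // No classid or data attribute is present, so changing type queues
  // a task to redetermine what the element represents; the <img>
  // fallback is then shown.
  var obj = document.querySelector('object');
  obj.type = 'application/some-unsupported-type';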

Yup, it's as clear as day when pointed out.

Thanks!

~Brady

 
 -- 
 Simon Pieters
 Opera Software



[whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Hawkins
*** Overview ***

The W3C Web Intents Task Force is working on the Web Intents API at
public-web-inte...@w3.org: see the bottom of this email for details.

Web Intents is a web platform API that provides client-side service
discovery and inter-application communication.  Services register to
handle high-level actions (e.g., share, edit, pick), while clients
invoke an intent (action + type + data).  For example, twitter.com may
allow the user to register the site as a provider of the 'share'
action.

One of the critical pieces of the API is a declarative registration
which allows sites to declare which intents they may be registered
for.  The current draft of the API calls for a new HTML tag, <intent>,
the attributes of which describe the service registration:

<!ENTITY % Disposition "{window|inline}">

<!ELEMENT INTENT - O EMPTY      -- a Web Intents registration -->
<!ATTLIST INTENT
  action      %URI;          #REQUIRED -- URI specifying action --
  type        %ContentTypes; #IMPLIED  -- advisory content types --
  href        %URI;          #IMPLIED  -- URI for linked resource --
  title       %i18n;         #IMPLIED  -- service title --
  disposition %Disposition   window    -- where the service is created --
  >

We settled on the <intent> tag after reviewing several alternatives
(see below).  The <intent> tag offers the greatest ease-of-use for
developers, and the ability to crawl/index sites that support Intents.

One of the cool things about the declarative syntax is that it allows
one to create sites (like http://www.openintents.org/en/intentstable)
which serve as a database of services that support intents.  We're
currently adding a section on webintents.org that allows the developer
of a service to add his service to the registry by entering the
service URL, which we then crawl to index the intents.

One could also imagine exposing intent services using search engine technology.

*** Proposal ***

Add the <intent> tag to the HTML spec.

*** Alternatives ***

Imperative DOM registration: registerIntentHandler(...).
Pros:
 * Analogous to registerProtocolHandler, registerContentHandler.
 * Doesn't require addition to the HTML spec.
Cons:
 * Not declarative, not as easy to index.
 * Timing requirements unclear (Is the registration removed if a
service does not call registerIntentHandler() on the next visit? If so
how long does the UA need to wait to 'be sure'?)
 * Heavier footprint in the DOM API.
 * Less self-documenting code:

registerIntentHandler('webintents.org/share',
                      'text/uri-list',
                      'handler.html',
                      'My Sharer',
                      'inline');

<intent action="webintents.org"
        type="text/uri-list"
        href="handler.html"
        title="My Sharer"
        disposition="inline">
</intent>

<link rel="intents">:
Pros:
 * Declarative.
Cons:
 * <link rel> has become a dumping ground for this type of usage.
 * Need to modify HTML spec to add appropriate attributes.

CRX-less Web Apps
(http://code.google.com/intl/en-US/chrome/apps/docs/no_crx.html):
Pros:
 * Declarative.
Cons:
 * Not standardized.
 * Requires extra level of indirection.

*** API Status ***

Within the W3C the Webapps WG is rechartering to include Web Intents
as a joint-deliverable with the DAP WG (which already had Intents in
its charter).  Discussion is taking place about the API at
public-web-inte...@w3.org.

The draft API is currently hosted at [1], though I'm working feverishly
to convert this into a W3C-style draft format.

[1] 
https://sites.google.com/a/chromium.org/dev/developers/design-documents/webintentsapi

Our use cases, JavaScript shim, and example pages are hosted at
http://webintents.org.

Thanks,
James Hawkins


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread Paul Kinlan
I would like to add that we also had a long discussion about trying to
re-use the <meta> element for specifying intents.

The syntax was something like:
<meta name="intent-action" content="http://webintents.org/share" />
<meta name="intent-type" content="image/*" />

Pros:
* declarative
* uses existing tags, so no changes to the HTML spec

Cons:
* no multiplicity - can't define multiple intents on the page without
complex encoding in the content attribute
* programmatically adding an intent into a page is very hard because
there are two tags.  The UA can't decide when to throw up the prompt
to grant access to install the web app as an intent handler.

On Tue, Dec 6, 2011 at 6:00 PM, James Hawkins jhawk...@google.com wrote:
 [...]



-- 
Paul Kinlan
Developer Advocate @ Google for Chrome and HTML5
G+: http://plus.ly/paul.kinlan
t: +447730517944
tw: @Paul_Kinlan
LinkedIn: http://uk.linkedin.com/in/paulkinlan
Blog: http://paul.kinlan.me
Skype: paul.kinlan


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Hawkins
To clarify, my use of the word 'we' below is we the designers of the
API, not the participants of the Web Intents TF.

As stated in the 'API Status' section below, discussions about Web
Intents are ongoing in the TF.  In addition, note that nothing is
finalized in the Web Intents API as of yet.

Since the registration aspect of the current draft of the API requires
the addition of a new tag, I decided to hold that discussion on the
appropriate ML, namely this ML.

Thanks,
James

On Tue, Dec 6, 2011 at 10:00 AM, James Hawkins jhawk...@google.com wrote:
 [...]


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread Anne van Kesteren
On Tue, 06 Dec 2011 19:40:20 +0100, Paul Kinlan paulkin...@google.com  
wrote:

I would like to add that we also had a long discussion about trying to
re-use the <meta> element for specifying intents.

The syntax was something like:
<meta name="intent-action" content="http://webintents.org/share" />
<meta name="intent-type" content="image/*" />

Pros:
* declarative
* uses existing tags, so no changes to the HTML spec

Cons:
* no multiplicity - can't define multiple intents on the page without
complex encoding in the content attribute
* programmatically adding an intent into a page is very hard because
there are two tags.  The UA can't decide when to throw up the prompt
to grant access to install the web app as an intent handler.


You could also have

<meta name="intent" content="http://webintents.org/share image/*">

or some such. Splitting a string on spaces and using the result is not 
that hard and is a common pattern. And it seems like a much better alternative 
than changing the HTML parser. Especially changing the way <head> is 
parsed is hairy. Every new element we introduce there will cause a <body> 
to be implied before it in down-level clients. That's very problematic.
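
A rough sketch of consuming that form (the single "intent" meta name is 
hypothetical; the content value is split on whitespace):

  // One <meta> per intent; first token is the action, second the type.
  var metas = document.querySelectorAll('meta[name=intent]');
  for (var i = 0; i < metas.length; i++) {
    var parts = metas[i].content.trim().split(/\s+/);
    var action = parts[0];  // e.g. "http://webintents.org/share"
    var type   = parts[1];  // e.g. "image/*"
    // ... register the (action, type) pair with the UA ...
  }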



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread Anne van Kesteren
On Tue, 06 Dec 2011 19:00:36 +0100, James Hawkins jhawk...@google.com  
wrote:

<!ENTITY % Disposition "{window|inline}">

<!ELEMENT INTENT - O EMPTY      -- a Web Intents registration -->
<!ATTLIST INTENT
  action      %URI;          #REQUIRED -- URI specifying action --
  type        %ContentTypes; #IMPLIED  -- advisory content types --
  href        %URI;          #IMPLIED  -- URI for linked resource --
  title       %i18n;         #IMPLIED  -- service title --
  disposition %Disposition   window    -- where the service is created --
  >


Off-topic, but I'm curious as to why you decided to propose a new element  
this way?



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Graham

On Tue, 6 Dec 2011, Anne van Kesteren wrote:

Especially changing the way <head> is parsed is 
hairy. Every new element we introduce there will cause a <body> to be implied 
before it in down-level clients. That's very problematic.


Yes, I consider adding new elements to <head> to be very, very bad for this 
reason. Breaking DOM consistency between supporting and non-supporting 
browsers means that adding an intent can cause unrelated breakage (e.g. by 
changing document.body.firstChild).


Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela

2011-12-06 15:59, NARUSE, Yui wrote:


(2011/12/06 17:39), Jukka K. Korpela wrote:

2011-12-06 6:54, Leif Halvard Silli wrote:


Yeah, it would be a pity if it had already become a widespread
cargo cult to - all at once - use the HTML5 doctype without using UTF-8
*and* without using some encoding declaration *and* thus effectively
relying on the default locale encoding ... Who has a data corpus?


I found one: http://rink77.web.fc2.com/html/metatagu.html


I'm not sure of the intended purpose of that demo page, but it seems to 
illustrate my point.



It uses the HTML5 doctype, does not declare an encoding, and its encoding 
is Shift_JIS, the default encoding of the Japanese locale.


My Firefox uses the ISO-8859-1 encoding, my IE the windows-1252 
encoding, resulting in a mess of course. But the point is that both 
interpretations mean data errors at the character level - even seen as 
windows-1252, it contains bytes with no assigned meaning (e.g., 0x81 is 
UNDEFINED).



Current implementations replace such an invalid octet with a replacement 
character.


No, it varies by implementation.


When data that is actually in ISO-8859-1 or some similar encoding has been mislabeled as 
UTF-8 encoded, then, if the data contains octets outside the ASCII range, character-level 
errors are likely to occur. Many ISO-8859-1 octets are just not possible in UTF-8 data. 
The converse error may also cause character-level errors. And these are not uncommon 
situations - they seem to occur increasingly often, partly due to cargo cult use of 
UTF-8 (when it means declaring UTF-8 but not actually using it, or vice versa), 
partly due to increased use of UTF-8 combined with ISO-8859-1 encoded data creeping in from 
somewhere into UTF-8 encoded data.


In such a case, the page should fail to display in the author's environment.


An authoring tool should surely indicate the problem. But what should 
user agents do when they face such documents and need to do something 
with them?



 From the user's point of view, the character-level errors currently result in 
some gibberish (e.g., some odd box appearing instead of a character, in one 
place) or in a total mess (e.g. a large number of non-ASCII characters displayed all 
wrong). In either case, I think an error should be signalled to the user, 
together with
a) automatically trying another encoding, such as the locale default encoding 
instead of UTF-8 or UTF-8 instead of anything else
b) suggesting to the user that he should try to view the page using some other 
encoding, possibly with a menu of encodings offered as part of the error 
explanation
c) a combination of the above.


This presumes that a user knows the correct encoding.


Alternative b) means that the user can try some encodings. A user agent 
could give a reasonable list of options.


Consider the example document mentioned. When viewed in a Western 
environment, it probably looks all gibberish. Alternative a) would 
probably not help, but alternative b) would have some chances. If the 
user has some reason to suspect that the page might be in Japanese, he 
would probably try the Japanese encodings in the browser's list of 
encodings, and this would make the document readable after a try or two.



I, as a Japanese speaker, imagine that it is hard to distinguish an ISO-8859-1 page 
from an ISO-8859-2 page.


Yes, but the idea isn't really meant to apply to such cases, as there is 
no way _at the character encoding level_ to recognize 
ISO-8859-1 mislabeled as ISO-8859-2 or vice versa.



Some browsers alert on scripting issues.
Why can they not alert on an encoding issue?


Surely they could, though I was not thinking of an alert in a popup sense - 
rather, a red error indicator somewhere. There would be many more 
reasons to signal encoding issues than to signal scripting issues, as we 
know that web pages generally contain loads of client-side scripting 
errors that do not actually affect page rendering or functionality.



The current "Character encoding overrides" rules are questionable because they often mask out data 
errors that would have helped to detect problems that can be solved constructively. For example, if data 
labeled as ISO-8859-1 contains an octet in the 80...9F range, then it may well be the case that the data is 
actually windows-1252 encoded and the override helps everyone. But it may also be the case that 
the data is in a different encoding and that the override therefore results in gibberish shown to 
the user, with no hint of the cause of the problem.


I think such a case doesn't exist.
In the character encoding overrides, a superset overrides a standard set.


Technically, not quite so (e.g., in ISO-8859-1, 0x81 is U+0081, a 
control character that is not allowed in HTML - I suppose, though I 
cannot really find a statement on this in HTML5 - whereas in 
windows-1252, it is undefined).


More importantly, my point was about errors in data, resulting e.g. from 
a faulty code conversion or some malfunctioning software that has 
produced, 

Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Hawkins
On Tue, Dec 6, 2011 at 1:08 PM, James Graham jgra...@opera.com wrote:
 On Tue, 6 Dec 2011, Anne van Kesteren wrote:

 Especially changing the way head is parsed is hairy. Every new element
 we introduce there will cause a body to be implied before it in down-level
 clients. That's very problematic.


 Yes, I consider adding new elements to head to be very very bad for this
 reason. Breaking DOM consistency between supporting and non-supporting
 browsers can cause adding an intent to cause unrelated breakage (e.g. by
 changing document.body.firstChild).

Originally we envisioned using a self-closing tag placed in <head> for
the <intent> tag; however, we're now leaning towards not using
a self-closing tag and having the tag be placed in the <body> with fallback
content, e.g., to install an extension to provide similar
functionality.

<intent action="webintents.org/share">
  Click here to install our extension that implements sharing!
</intent>

What are your thoughts on this route?

James


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Hawkins
On Tue, Dec 6, 2011 at 1:16 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 On Tue, Dec 6, 2011 at 1:14 PM, James Hawkins jhawk...@google.com wrote:
 Originally we envisioned using a self-closing tag placed in <head> for
 the <intent> tag; however, we're now leaning towards not using
 a self-closing tag and having the tag be placed in the <body> with fallback
 content, e.g., to install an extension to provide similar
 functionality.

 <intent action="webintents.org/share">
  Click here to install our extension that implements sharing!
 </intent>

 What are your thoughts on this route?

 So, when the <intent> tag is supported, it's not displayed at all, and
 instead solely handled by the browser?  This seems okay to me.


Correct.


Re: [whatwg] Proposal: intent tag for Web Intents API

2011-12-06 Thread James Graham

On Tue, 6 Dec 2011, James Hawkins wrote:


On Tue, Dec 6, 2011 at 1:16 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

On Tue, Dec 6, 2011 at 1:14 PM, James Hawkins jhawk...@google.com wrote:

Originally we envisioned using a self-closing tag placed in <head> for
the <intent> tag; however, we're now leaning towards not using
a self-closing tag and having the tag be placed in the <body> with fallback
content, e.g., to install an extension to provide similar
functionality.

<intent action="webintents.org/share">
 Click here to install our extension that implements sharing!
</intent>

What are your thoughts on this route?


So, when the <intent> tag is supported, it's not displayed at all, and
instead solely handled by the browser?  This seems okay to me.



Correct.



This seems to remove my major objection to the new tag design.

Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela

2011-12-06 22:58, Leif Halvard Silli wrote:


There is now a bug, and the editor says the outcome depends on a
browser vendor shipping it:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15076

Jukka K. Korpela Tue Dec 6 00:39:45 PST 2011


what is this proposed change to defaults supposed to achieve? […]


I'd say the same as in XML: UTF-8 as a reliable, common default.


The bug was created with this argument given:
"It would be nice to minimize the number of declarations a page needs to 
include."


That is, author convenience - so that authors could work sloppily and 
produce documents that could fail on user agents that haven't 
implemented this change.


This sounds more absurd than I can describe.

XML was created as a new data format; it was an entirely different issue.


If there's something that should be added to or modified in the
algorithm for determining character encoding, then I'd say it's error
processing. I mean user agent behavior when it detects, [...]


There is already an (optional) detection step in the algorithm - but UAs
treat that step differently, it seems.


I'm afraid I can't find it - I mean the treatment of a document for 
which some encoding has been deduced (say, directly from HTTP headers) 
and which then turns out to violate the rules of the encoding.


Yucca




Re: [whatwg] Enhancement request: change EventSource to allow cross-domain access

2011-12-06 Thread Ian Hickson
On Thu, 23 Jun 2011, Per-Erik Brodin wrote:
 
 Another question was raised in 

https://bugs.webkit.org/show_bug.cgi?id=61862#c17

 The origin set on the dispatched message events is specified to be the 
 origin of the event stream's URL. Is this the URL passed to the 
 EventSource constructor or the URL after some potential redirects (even 
 temporary)?

Fixed to be the final URL (it used to not matter).


On Thu, 23 Jun 2011, ilya goberman wrote:

 It is personalized on something that we send in the URL (client id I 
 mentioned below) which identifies which user's data is requested. We do 
 not use cookies.
 
 Ian was kind enough to explain to me how EventSource will function.

 Apparently EventSource will have withCredentials always set to true 
 (false is not allowed). That means that using * for 
 Access-Control-Allow-Origin will never work for the EventSource and I 
 have to put the request's Origin value in the response's 
 Access-Control-Allow-Origin to enable CORS. It is not a huge deal, 
 unless there are some proxies that will not pass Origin through (I do 
 not really know if there are any). Thanks

FWIW, I've since changed the spec so that you can specify whether to send 
credentials or not. When credentials aren't sent, you can use the * form.
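
A minimal sketch of the two modes under the changed spec (the URL and 
variable names are illustrative):

  // Anonymous mode: no credentials are sent, so the server may answer
  // with "Access-Control-Allow-Origin: *".
  var source = new EventSource('https://example.com/stream');

  // Credentialed mode: cookies etc. are sent, so the server must echo
  // the request's Origin rather than use "*".
  var credSource = new EventSource('https://example.com/stream',
                                   { withCredentials: true });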

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] EventSource - Handling a charset in the content-type header

2011-12-06 Thread Ian Hickson
On Mon, 4 Jul 2011, Anne van Kesteren wrote:

 I noticed Ian updated the specification, but it seems current 
 implementations differ quite wildly. E.g. Gecko happily ignores
 
  Content-Type: text/event-stream;charset=tralala
 
 as does Opera instead of closing the connection. Chrome bites, but happily
 ignores
 
  Content-Type: text/event-stream;tralala
 
 along with Opera and Gecko. Safari 5.0.5 bites on that however and also on
 charset=tralala. All browsers seem to allow (note the trailing semi-colon):
 
  Content-Type: text/event-stream;
 
 Are we sure we want this strict checking of media type parameters? I 
 always thought the media type itself was what strict checking should be 
 done upon, but that its parameters were extension points, not points of 
 failure.

Fair enough. I've changed the spec to ignore parameters on 
text/event-stream.
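
A rough sketch of the relaxed check this implies (the helper is 
illustrative): only the media type's essence matters, and any parameters 
are ignored.

  function isEventStream(contentType) {
    return contentType.split(';')[0].trim().toLowerCase() ===
        'text/event-stream';
  }

  isEventStream('text/event-stream;charset=tralala');  // true
  isEventStream('text/event-stream;');                 // true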

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] proposal: extend time to markup durations

2011-12-06 Thread Ian Hickson
On Thu, 14 Jul 2011, Tantek Çelik wrote:

 Some in the microformats community have been making good use of the 
 time element, e.g. for publishing hCalendar, and implementing 
 consuming/converting hCalendar [1] with good success.
 
 It would be great if the time element could support expressing 
 durations as well for the use cases as needed by the hMedia and hAudio 
 microformats as well as other use-cases (Wikipedia, IMDB).

I've since added this feature. It supports both the hard-to-read but 
pretty well-established ISO8601 duration syntax:

   <time>PT4H18M3S</time>

...and a slightly easier-to-read syntax:

   <time>4h 18m 3s</time>


Earlier in this thread I suggested maybe dropping time altogether, with 
a more generic replacement. Having tried that and gotten lots of negative 
feedback on the topic, I've since changed the spec again and we now have 
both a generic element for machine-readable data, <data>, as well as an 
element specifically for time-related machine-readable data, <time>.

The old <time> element, with its rendering rules and DOM API, is gone.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Henri Sivonen
On Mon, Dec 5, 2011 at 8:55 PM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:
 When you say 'requires': Of course, HTML5 recommends that you declare
 the encoding (via HTTP/higher protocol, via the BOM 'sideshow' or via
 <meta charset=UTF-8>). I just now also discovered that Validator.nu
 issues an error message if it does not find any of those *and* the
 document contains non-ASCII. (I don't know, however, whether this error
 message is just something Henri added at his own discretion - it would
 be nice to have it literally in the spec too.)

I believe I was implementing exactly what the spec said at the time I
implemented that behavior of Validator.nu. I'm particularly convinced
that I was following the spec, because I think it's not the optimal
behavior. I think pages that don't declare their encoding should
always be non-conforming even if they only contain ASCII bytes,
because that way templates created by English-oriented (or lorem
ipsum-oriented) authors would be caught as non-conforming before non-ASCII
text gets filled into them later. Hixie disagreed.

 HTML5 says that validators *may* issue a warning if UTF-8 is *not* the
 encoding. But so far, validator.nu has not picked that up.

Maybe it should. However, non-UTF-8 pages that label their encoding,
that use one of the encodings that we won't be able to get rid of
anyway and that don't contain forms aren't actively harmful. (I'd
argue that they are *less* harmful than unlabeled UTF-8 pages.)
Non-UTF-8 is harmful in form submission. It would be more focused to
make the validator complain about labeled non-UTF-8 if the page
contains a form. Also, it could be useful to make Firefox whine to
console when a form is submitted in non-UTF-8 and when an HTML page
has no encoding label. (I'd much rather implement all these than
implement breaking changes to how Firefox processes legacy content.)

 We should also lobby for authoring tools (as recommended by HTML5) to
 default their output to UTF-8 and make sure the encoding is declared.

 HTML5 already says: Authoring tools should default to using UTF-8 for
 newly-created documents. [RFC3629]
 http://dev.w3.org/html5/spec/semantics.html#charset

I think focusing your efforts on lobbying authoring tool vendors to
withhold the ability to save pages in non-UTF-8 encodings would be a
better way to promote UTF-8 than lobbying browser vendors to change
the defaults in ways that'd break locale-siloed Existing Content.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/