[whatwg] [url] merge progress
This work is now at a stage where I encourage wider review. Current results can be found here: https://specs.webplatform.org/url/webspecs/develop/ This work is proceeding here: https://github.com/webspecs/url My preferred method of input is pull requests, but bug reports, issues, emails, or other mechanisms are fine. Once this work gets to the point where Anne indicates it is ready, the plan is to pull this work into the WHATWG GitHub repository, where it will be built with the WHATWG specification templates. An example of what that will look like can be found here: http://intertwingly.net/projects/pegurl/url-merge.html See the readme for more information on webspecs: https://github.com/webspecs/url#readme - Sam Ruby
[whatwg] URL Statics questions
https://url.spec.whatwg.org/#url-statics It is not clear to me what the use case is for these methods, which leads me to a number of questions. First, what does static mean in this context? Is this the C++ meaning of static, i.e., class methods? So would the two methods being described map to the following in JavaScript? URL.domainToASCII = function(domain) {...} URL.domainToUnicode = function(domain) {...} Are these methods implemented by any current browser? Assuming we are talking about URL.domainToASCII, I didn't see it implemented in the first two browsers I checked (Chrome and Firefox). Now to my real question: assuming we do IPv4 parsing per https://www.w3.org/Bugs/Public/show_bug.cgi?id=26431 (and, incidentally, matching the Chrome implementation), what should these static methods return in the case of IPv4 addresses? The reason why I'm asking is that I'm working on rewriting the URL parser per https://www.w3.org/Bugs/Public/show_bug.cgi?id=25946, and would like to update https://url.spec.whatwg.org/#host-parsing to be consistent. - Sam Ruby
Re: [whatwg] URL interop status and reference implementation demos
On 11/21/2014 05:32 PM, Domenic Denicola wrote: From: Sam Ruby [mailto:ru...@intertwingly.net] I guess I didn't make the point clearly before. This is not a waterfall process where somebody writes down a spec and expects implementations to eventually catch up. That line of thinking sometimes leads to browsers closing issues as WONTFIX. For example: https://code.google.com/p/chromium/issues/detail?id=257354 Instead I hope that the spec is open to change (and, actually, the list of open bug reports is clear evidence that this is the case), and that implies that differing from the spec isn't isomorphically equal to a problematic case. More precisely: it may be the spec that needs to change. For sure! But, I would like to see where the spec differs from implementations, so that I can see what parts of the spec need to be changed. Right now, when I read "user agents with differences: testdata chrome firefox ie" versus one that reads "user agents with differences: ie safari", I can't tell which user agents are aligned with the spec and which aren't. So I can't tell if the spec needs to change, or if it doesn't. I'd prefer some kind of view where it said "user agents with differences from the spec: x, y, z". Then if the answer was "chrome, firefox, ie" clearly the spec needs to change; if the answer was "chrome" then clearly Chrome needs to change and we can leave the spec alone. Perhaps this is the view you are looking for? http://w3c.github.io/test-results/url/all.html Note that on that view you can click through to see how the user agent you are currently using differs from the spec. I'm gathering this is very different from the data the table is currently showing, but it seems I don't actually understand what the table is currently showing anyway, so I don't understand how I could use the table's current data to guide spec changes. To reduce confusion, I've removed the list when there isn't consensus. I've also changed the colors on the browser-results page. 
Green means all is good. Yellow means that one or two browsers differ, and those are noted. Red means that there isn't consensus. I'm no longer showing which user agents differ. If you drill down, I'm still showing testdata as a user agent. "reference implementation" would be a better description. I'll probably fix that later. - Sam Ruby
Re: [whatwg] URL interop status and reference implementation demos
On 11/18/2014 06:37 PM, Domenic Denicola wrote: Really exciting stuff :D. I love specs that have reference implementations and strong test suites and am hopeful that as URL gets fixes and updates that these stay in sync. E.g. normal software development practices of not changing anything without a test, and so on. Thanks! I've tried to follow the example that the streams spec is providing. Including the naming of directories. From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of Sam Ruby https://url.spec.whatwg.org/interop/urltest-results/ I'd be interested in a view that only contains refimpl, ie, safari, firefox, and chrome, so we could compare the URL Standard with living browsers. Done, sort-of: https://url.spec.whatwg.org/interop/browser-results/ I note that given the small amount of data, the 'agents with differences' column is less useful than it could be. Basically, a reddish color should be interpreted to mean that we don't have three out of the four browsers agreeing on all values. I'd like to suggest that the following test be added: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt And that the expected results be changed on the following tests: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/patchtestdata.txt Note: I appear to have direct update access to urltestdata.txt, but I would appreciate a review before I make any updates. A pull request with a nice diff would be easy to review, I think? Done. https://github.com/w3c/web-platform-tests/pull/1402 The setters also have unit tests: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/urlsettest.js So good! For streams I am running the unit tests against my reference implementation on every commit (via Travis). Might be worth setting up something similar. 
That's first on my todo list post merge: http://intertwingly.net/projects/pegurl/url.html#postmerge Basically, I'd rather do that on the whatwg branch than on the rubys branch, but my stuff isn't quite ready to merge. As a final note, the reference implementation has a list of known differences from the published standard: intertwingly.net/projects/pegurl/url.html Hmm, so this isn't really a reference implementation of the published standard then? Indeed looking at the code it seems to not follow the algorithms in the spec at all :(. That's a bit unfortunate if the goal is to test that the spec is accurate. I guess https://github.com/rubys/url/tree/peg.js/reference-implementation#historical-notes explains that. Hmm. In that case, I'm unclear in what sense this is a reference implementation, instead of an alternate algorithm. I answered that separately: http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0129.html - Sam Ruby
Re: [whatwg] URL interop status and reference implementation demos
On 11/19/2014 09:32 AM, Domenic Denicola wrote: From: Sam Ruby [mailto:ru...@intertwingly.net] Done, sort-of: https://url.spec.whatwg.org/interop/browser-results/ Excellent, this is a great subset to have. I am curious what it means when testdata is in the user agents with differences column. Isn't testdata the base against which the user agents are compared? These results compare user agents against each other. The testdata is provided for reference. I am not of the opinion that the testdata should be treated as anything other than a proposal at this point. Or to put it another way, if browser behavior is converging to something other than what the spec says, then perhaps it is the spec that should change. Done. https://github.com/w3c/web-platform-tests/pull/1402 Interesting, I did not realize that testdata was part of web-platform-tests instead of the URL repo alongside all your other interop material. I wonder if we should investigate ways to centralize inside the URL repo, e.g. having whatwg/url be a submodule of w3c/web-platform-tests? web-platform-tests is huge. I only need a small piece. So for now, I'm making do with a wget in my Makefile, and two patch files which cover material that hasn't yet made it upstream. - Sam Ruby
Re: [whatwg] URL interop status and reference implementation demos
On 11/19/2014 09:55 AM, Domenic Denicola wrote: From: Sam Ruby [mailto:ru...@intertwingly.net] These results compare user agents against each other. The testdata is provided for reference. Then why is testdata listed as a user agent? It clearly is mislabeled. Pull requests welcome. :-) I am not of the opinion that the testdata should be treated as anything other than a proposal at this point. Or to put it another way, if browser behavior is converging to something other than what the spec says, then perhaps it is the spec that should change. Sure. But I was hoping to see a list of user agents that differed from the test data, so we could target the problematic cases. As is I'm not sure how to interpret a row that reads "user agents with differences: testdata chrome firefox ie" versus one that reads "user agents with differences: ie safari". I guess I didn't make the point clearly before. This is not a waterfall process where somebody writes down a spec and expects implementations to eventually catch up. That line of thinking sometimes leads to browsers closing issues as WONTFIX. For example: https://code.google.com/p/chromium/issues/detail?id=257354 Instead I hope that the spec is open to change (and, actually, the list of open bug reports is clear evidence that this is the case), and that implies that differing from the spec isn't isomorphically equal to a problematic case. More precisely: it may be the spec that needs to change. web-platform-tests is huge. I only need a small piece. So for now, I'm making do with a wget in my Makefile, and two patch files which cover material that hasn't yet made it upstream. Right, I was suggesting the other way around: hosting the evolving-along-with-the-standard testdata.txt inside whatwg/url, and letting web-platform-tests pull that in (with e.g. a submodule). 
Works for me :-) That being said, there seems to be a highly evolved review process for test data, and on the face of it, that seems to be something worth keeping. Unless there is evidence that it is broken, I'd be inclined to keep it as it is. In fact, once I have refactored the test data from the javascript code in my setter tests, I'll likely suggest that it be added to web-platform-tests. - Sam Ruby
[whatwg] URL interop status and reference implementation demos
Anne has kindly given me access to the directory on the server where the url.spec lives. I've started to move some of my work there. https://url.spec.whatwg.org/interop/urltest-results/ Note that the expected results come from: https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt I'd like to suggest that the following test be added: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt And that the expected results be changed on the following tests: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/patchtestdata.txt Note: I appear to have direct update access to urltestdata.txt, but I would appreciate a review before I make any updates. - - - I also have a reference implementation I've been working on. First, a basic interface: https://url.spec.whatwg.org/reference-implementation/liveview.html A second interface allows you to override the base: https://url.spec.whatwg.org/reference-implementation/liveview2.html A third interface allows you to see what happens when you call individual setters: https://url.spec.whatwg.org/reference-implementation/liveview3.html Note: while all versions are a work in progress, this is more true for liveview3 than the others. In particular, this was created today, and only has href, protocol, and username roughed in at the moment. The setters also have unit tests: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/urlsettest.js I'm planning to refactor these tests, separating the test data from the code so that other libraries and user agents can test against the same data. Once I do, I'll publish interop test results for these setters too. As a final note, the reference implementation has a list of known differences from the published standard: intertwingly.net/projects/pegurl/url.html - Sam Ruby
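For readers who want a feel for what the setter tests exercise before the test data is factored out, here is a minimal sketch using Node's WHATWG-compliant URL class; the inputs and expectations below are illustrative, not quotes from urlsettest.js.

```javascript
// A minimal illustration of the kind of setter behavior the unit tests
// exercise: each setter reparses and canonicalizes its input in place.
const u = new URL('http://example.com/path?query#frag');

u.protocol = 'https:';  // scheme change http -> https is allowed
u.username = 'user';    // credentials are serialized into the href
console.log(u.href);    // 'https://user@example.com/path?query#frag'

u.search = 'a=b';       // a missing '?' is supplied by the setter
console.log(u.search);  // '?a=b'
```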
Re: [whatwg] URL interop status and reference implementation demos
On 11/18/2014 06:37 PM, Domenic Denicola wrote: As a final note, the reference implementation has a list of known differences from the published standard: intertwingly.net/projects/pegurl/url.html Hmm, so this isn't really a reference implementation of the published standard then? Indeed looking at the code it seems to not follow the algorithms in the spec at all :(. That's a bit unfortunate if the goal is to test that the spec is accurate. Let me help by connecting the dots. Bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=25946 is open to rewrite the URL parser. Comments 8 and 9 endorse the following work in progress: http://intertwingly.net/projects/pegurl/url.html Just today, I integrated my Anolis-to-Bikeshed work, which is a prerequisite for completing this. The reference implementation is a faithful attempt to implement the reworked parsing logic. In fact, parts of the specification and parts of the reference implementation are generated from a single file: https://raw.githubusercontent.com/rubys/url/peg.js/url.pegjs Hopefully shortly this all will land in the live version of the spec; meanwhile it attempts to skate to where the puck will be. In each case of a known difference in published results, I've linked to rationale for the change (generally to an indication that Anne agrees). I hope this helps. - Sam Ruby
Re: [whatwg] [url] Feedback from TPAC
On 11/03/2014 10:32 AM, Anne van Kesteren wrote: On Mon, Nov 3, 2014 at 4:19 PM, David Singer sin...@apple.com wrote: The readability is much better (I am not a fan of the current trend of writing specifications in pseudo-basic, which makes life easier for implementers and terrible for anyone else, including authors), and I also think that an approach that doesn’t obsolete RFC 3986 is attractive. Is Apple interested in changing its URL infrastructure to not be fundamentally incompatible with RFC 3986 then? Other than slightly different eventual data models for URLs, which we could maybe amend RFC 3986 for IETF gods willing, I think the main problem is that a URL that goes through an RFC 3986 pipeline cannot go through a URL pipeline. E.g. parsing ../test against foobar://test/x gives wildly different results. That is not a state we want to be in, so something has to give. I would hope that everybody involved would enter into this discussion being willing to give a bit. To help foster discussion, I've made an alternate version of the live URL parser page, one that enables setting of the base URL: http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x Of course, if there are any bugs in the proposed reference implementation, I'm interested in that too. - Sam Ruby
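The divergence Anne describes can be made concrete. Under RFC 3986's merge and remove_dot_segments algorithms, resolving ../test against foobar://test/x yields foobar://test/test; in the 2014-era URL Standard, foobar was not a "relative scheme", so the URL pipeline behaved quite differently. The sketch below shows the behavior of a current WHATWG-conformant parser (Node's URL), which has since converged on resolving relative references for non-special schemes too.

```javascript
// Resolving '../test' against 'foobar://test/x'. RFC 3986 gives
// foobar://test/test; the 2014 URL Standard treated 'foobar' as a
// non-relative scheme, hence the "wildly different results". Today's
// WHATWG parsers resolve this the RFC 3986 way:
console.log(new URL('../test', 'foobar://test/x').href);
// 'foobar://test/test' in current Node
```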
Re: [whatwg] [url] Feedback from TPAC
On 11/04/2014 09:32 AM, Anne van Kesteren wrote: On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby ru...@intertwingly.net wrote: To help foster discussion, I've made an alternate version of the live URL parser page, one that enables setting of the base URL: http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x Of course, if there are any bugs in the proposed reference implementation, I'm interested in that too. Per the URL Standard, resolving "x" against "test:test" results in failure, not "test:///x". Fixed. Thanks! Perhaps over time we could add this to urltestdata.txt[1]? Meanwhile, I'll track such proposed additions here: https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt - Sam Ruby [1] https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt
Re: [whatwg] [url] Feedback from TPAC
On 11/04/2014 11:25 AM, Domenic Denicola wrote: From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of David Singer (I don't have IE to hand at the moment). I tried to test IE but unfortunately it looks like the URL components from DOM properties part of the demo page does not work in IE, I think because IE doesn't support document.baseURI. Try experimenting with a base URL using an http scheme. If you look closely at the source, you will see that the function rebase will set both document.baseURI and the href attribute on the base element. The latter is sufficient for non-IE browsers. I had to add the former to get IE working. But, as you undoubtedly have noted, unknown base schemes seem to cause IE to ignore the base URL entirely. - Sam Ruby
Re: [whatwg] [url] Feedback from TPAC
On 11/02/2014 02:32 PM, Graham Klyne wrote: On 01/11/2014 00:01, Sam Ruby wrote: 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it. It's not clear to me what it is that might be willfully violated. Perhaps nothing. Specifically, I find the notion of relative scheme in [1] to be, at best, confusing, and at worst something that could break a whole swathe of existing URI processing. I don't know which, as on a brief look I don't understand what [1] is trying to say here, and I lack time (and will) to dive into the arcane style used for specifying URLs. First, I'm assuming that by [1], you mean https://url.spec.whatwg.org/#relative-scheme Second, I have no idea how a specification that essentially says here's what a set of browsers, languages, and libraries are converging on to convert URLs into URIs can break URIs. Third, here's a completely different approach to defining URLs that produces the same results (modulo one parse error that Anne agrees[2] should be changed in the WHATWG spec): http://intertwingly.net/projects/pegurl/url.html#url If for some reason you don't find that to be to your liking, I'll be glad to try to meet you half way. I just need something more to go on than arcane. I think there may be a confusion here between syntax and interpretation. When the term relative is used in URI/URL context, I immediately think of relative reference per RFC 3986. I suspect what is being alluded to is that some URI schemes are not global in the idealized sense of URIs as a global namespace - file:///foo dereferences differently depending on where it is used - the relativity here being in the relation between the URI/URL and the thing identified, with respect to where the URI is actually processed. If you find it confusing, perhaps others will too. 
Concrete suggestions on what should be changed would be helpful. To change the syntactic definition of relative reference to include things like file: and ftp: URIs would cause all sorts of breakage, and require significant updating of the resolution algorithm in RFC 3986 (more than would be appropriate for a mere erratum, IMO). I'm hoping this is not the kind of willful violation that is being contemplated here. Note that in the reformulated grammar, file is no longer treated the same as other types of relative references. I am not wedded to any of those terms; if you suggest better ones I'll accommodate. If errata can be produced expeditiously for RFC 3986, then there shouldn't be any need for willful violations. #g -- [2] http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0267.html
Re: [whatwg] [url] Feedback from TPAC
On 11/1/14 5:29 AM, Anne van Kesteren wrote: On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby ru...@intertwingly.net wrote: Meanwhile, the IETF is actively working on an update: https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04 They are meeting F2F in a little over a week. URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes. It doesn't say that. (We should perhaps try to find some way to make {scheme}:// syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.) How should it change? 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too. See previous threads on the subject. The data models are incompatible, at least around %, likely also around other code points. It also seems unacceptable to require two parsers for URLs. Acknowledging that other parsers exist is quite a different statement than requiring two parsers. I'm only suggesting the former. As a concrete statement, a compliant implementation of HTML would require a URL parser, but not a URI parser. Also as a concrete statement, such a user agent will interact, primarily via the network, with other software that will interpret the canonicalized URLs as if they were URIs. That may not be as we would wish it to be. But it would be a disservice to everyone to document how we would wish things to be rather than how they actually are (and, by all indications, are likely to remain for the foreseeable future). 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it. 
It might be interesting to figure out the delta. But there are major differences between RFC 3986 and URL. Not obsoleting the former seems like a disservice to anyone looking to implement a parser or find information on URI/URL. I do plan to work with others to figure out the delta. As to the data models, at the present time -- and without having actually done the necessary analysis -- I am not aware of a single case where they would be different. Undoubtedly we will be able to quickly find some, but even so, I would assert that the following statements will remain true for the domain of canonicalized URLs, by which I mean the set of possible outputs of the URL serializer: 1) the overlap is substantial, and I would dare say overwhelming. 2) RFC 3986 and URL compliant parsers would interpret the same bytes in such outputs as delimiters, schemes, paths, fragments, etc. 3) as to data models, the URL Standard is silent as to how such bytes are to be interpreted. As to the meaning of '%', both the URL Standard and RFC 3986 recognize that encodings other than utf-8 exist, and that such will affect the interpretation of percent encoded byte sequences. - Sam Ruby
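The round-trip property for canonicalized URLs (that parsing a serializer's output and serializing again is a fixed point) can be checked mechanically. A small sketch using Node's WHATWG parser; the sample inputs are this sketch's own, not taken from any test suite.

```javascript
// Checking the round-trip claim: serializing a parsed URL and parsing
// the serialization again should produce the same serialization.
function roundTrips(input) {
  const once = new URL(input).href;   // parse + serialize
  const twice = new URL(once).href;   // parse the serialization again
  return once === twice;
}

// Canonicalization happens on the first pass (case, dot segments,
// numeric hosts); the second pass is then a no-op:
for (const sample of ['http://EXAMPLE.com/%7efoo', 'http://0x7f.0.0.1/a/../b']) {
  console.log(sample, '->', new URL(sample).href, roundTrips(sample));
}
```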
Re: [whatwg] [url] Feedback from TPAC
On 11/1/14 7:56 AM, Anne van Kesteren wrote: On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby ru...@intertwingly.net wrote: On 11/1/14 5:29 AM, Anne van Kesteren wrote: It doesn't say that. (We should perhaps try to find some way to make {scheme}:// syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.) How should it change? Not sure what you're referring to. https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04 I just gave you one, %... E.g. "http://example.org/?%" does not have an RFC 3986 representation. Here's the output of a URL parser (the one I chose was Firefox): new URL("http://example.com/?%").search returns "?%" Here's the output of a URI parser: $ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query" returns "%" I also assert that such a URL round-trips a URL parse/serialize sequence. - Sam Ruby
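The divergence around '%' can be reproduced with any WHATWG-conformant parser, not just Firefox. A sketch using Node's URL class, whose behavior follows the current spec (which may postdate this thread): a lone '%' in the query is a validation error but not fatal, and is preserved verbatim, whereas RFC 3986's grammar requires pct-encoded to be '%' plus two hex digits.

```javascript
// A lone '%' in the query survives WHATWG parsing unchanged:
const u = new URL('http://example.com/?%');
console.log(u.search);                        // '?%'

// And the round-trip assertion from the email holds for this input:
console.log(new URL(u.href).href === u.href); // true
```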
Re: [whatwg] [url] Feedback from TPAC
On 11/01/2014 07:18 PM, Barry Leiba wrote: Thanks, Sam, for this great summary -- I hadn't taken notes, and was hoping that someone who was (or who has a better memory than I) would post something. One minor tweak, at the end: More specifically, if something along these lines I describe above were done, the IETF would be open to the idea of errata to RFC3987 and updating specs to reference URLs. Errata to 3986, that is, not 3987. After this, 3987 will be considered obsolete (the IESG might move to mark it Historic, or some such). Thanks for the correction. I did indeed mean errata to 3986. - Sam Ruby Barry, IETF Applications AD On Fri, Oct 31, 2014 at 8:01 PM, Sam Ruby ru...@intertwingly.net wrote: bcc: WebApps, IETF, TAG in the hopes that replies go to a single place. - - - I took the opportunity this week to meet with a number of parties interested in the topic of URLs including not only a number of Working Groups, AC and AB members, but also members of the TAG and members of the IETF. Some of the feedback related to the proposal I am working on[1]. Some of the feedback related to mechanics (example: employing Travis to do build checks, something that makes more sense on the master copy of a given specification than on a hopefully temporary branch). These are not the topics of this email. The remaining items are more general, and are the subject of this note. As is often the case, they are intertwined. I'll simply jump into the middle and work outwards from there. --- The nature of the world is that there will continue to be people who define more schemes. A current example is http://openjdk.java.net/jeps/220 (search for New URI scheme for naming stored modules, classes, and resources). And people who are doing so will have a tendency to look to the IETF. Meanwhile, the IETF is actively working on an update: https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04 They are meeting F2F in a little over a week[2]. 
URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes[3]. The obvious disconnect is that this is a registry for URI schemes, not URLs. It looks to me like making a few, small, surgical updates to the URL Standard would stitch all this together. 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too. 2) Reference draft-ietf-appsawg-uri-scheme-reg in https://url.spec.whatwg.org/#url-writing as the way to register schemes, stating that the set of valid URI schemes is the set of valid URL schemes. 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it. That's it. The rest of the URL specification can stand as is. What this means operationally is that there are two terms, URIs and URLs. URIs would be a legacy, academic topic that may be of relevance to some (primarily back-end server) applications. URLs are what most people, and most applications, will be concerned with. This includes all the specifications which today reference IRIs (as an example, RFC 4287, namely, Atom). My sense was that all of the people I talked to were generally OK with this, and that we would be likely to see statements from both the IETF and the W3C TAG along these lines mid November-ish, most likely just after IETF meeting 91. More specifically, if something along these lines I describe above were done, the IETF would be open to the idea of errata to RFC3987 and updating specs to reference URLs. - Sam Ruby [1] http://intertwingly.net/projects/pegurl/url.html [2] https://www.ietf.org/meeting/91/index.html [3] https://url.spec.whatwg.org/#relative-scheme
[whatwg] [url] Feedback from TPAC
bcc: WebApps, IETF, TAG in the hopes that replies go to a single place. - - - I took the opportunity this week to meet with a number of parties interested in the topic of URLs including not only a number of Working Groups, AC and AB members, but also members of the TAG and members of the IETF. Some of the feedback related to the proposal I am working on[1]. Some of the feedback related to mechanics (example: employing Travis to do build checks, something that makes more sense on the master copy of a given specification than on a hopefully temporary branch). These are not the topics of this email. The remaining items are more general, and are the subject of this note. As is often the case, they are intertwined. I'll simply jump into the middle and work outwards from there. --- The nature of the world is that there will continue to be people who define more schemes. A current example is http://openjdk.java.net/jeps/220 (search for New URI scheme for naming stored modules, classes, and resources). And people who are doing so will have a tendency to look to the IETF. Meanwhile, the IETF is actively working on an update: https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04 They are meeting F2F in a little over a week[2]. URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes[3]. The obvious disconnect is that this is a registry for URI schemes, not URLs. It looks to me like making a few, small, surgical updates to the URL Standard would stitch all this together. 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too. 2) Reference draft-ietf-appsawg-uri-scheme-reg in https://url.spec.whatwg.org/#url-writing as the way to register schemes, stating that the set of valid URI schemes is the set of valid URL schemes. 
3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it. That's it. The rest of the URL specification can stand as is. What this means operationally is that there are two terms, URIs and URLs. URIs would be a legacy, academic topic that may be of relevance to some (primarily back-end server) applications. URLs are what most people, and most applications, will be concerned with. This includes all the specifications which today reference IRIs (as an example, RFC 4287, namely, Atom). My sense was that all of the people I talked to were generally OK with this, and that we would be likely to see statements from both the IETF and the W3C TAG along these lines mid November-ish, most likely just after IETF meeting 91. More specifically, if something along these lines I describe above were done, the IETF would be open to the idea of errata to RFC3987 and updating specs to reference URLs. - Sam Ruby [1] http://intertwingly.net/projects/pegurl/url.html [2] https://www.ietf.org/meeting/91/index.html [3] https://url.spec.whatwg.org/#relative-scheme
Re: [whatwg] questions on URL spec based on reviewing galimatias test results
On 10/30/14 2:09 AM, Anne van Kesteren wrote: On Wed, Oct 29, 2014 at 11:24 PM, Sam Ruby ru...@intertwingly.net wrote: http://intertwingly.net/projects/pegurl/urltest-results/d674c14cbe I'll note that galimatias doesn't produce a parse error in this case (and, in fact, the state machine specified by the current URL Standard goes down a completely different path for this case). The question is: should this be a parse error? Yeah. The results also seem strange. I thought at least Chrome had this behavior. Perhaps because Chrome was not running on Windows? Here is a screen capture of the live DOM URL viewer: http://i.imgur.com/kbsTDQ7.png Here are the test results for Chrome on Windows: http://intertwingly.net/tmp/81cd494abd36509f0d46010b0c4d4ff9 It appears that Chrome implements this, but (a) only on Windows, and (b) only if the base scheme is file. - Sam Ruby
[whatwg] questions on URL spec based on reviewing galimatias test results
1) Is the following expected to produce a parse error: http://intertwingly.net/projects/pegurl/urltest-results/4b60e32190 ? My reading of https://url.spec.whatwg.org/#relative-path-state is that step 3.1 indicates a parse error even though later step 1.5.1 replaces the non URL code point with a colon. My proposed reference implementation does not indicate a parse error with these inputs, but I could easily add it. 2) Is the following expected to produce a parse error: http://intertwingly.net/projects/pegurl/urltest-results/bc6ea8bdf8 ? I ask this because the error isn't defined here: https://url.spec.whatwg.org/#host-state And the following only defines fatal errors (e.g., step 5): https://url.spec.whatwg.org/#concept-host-parser My proposed reference implementation does indicate a parse error with these inputs, but this could easily be removed. - Sam Ruby
Re: [whatwg] questions on URL spec based on reviewing galimatias test results
On 10/29/14 4:47 AM, Anne van Kesteren wrote: On Wed, Oct 29, 2014 at 12:12 PM, Sam Ruby ru...@intertwingly.net wrote: 2) Is the following expected to produce a parse error: http://intertwingly.net/projects/pegurl/urltest-results/bc6ea8bdf8 ? What is the DNS violation supposed to mean? I would expect this to change if we decide to parse any numeric host name into IPv4. Then it would certainly be an error. Here is another example (though it contains multiple parse errors): http://intertwingly.net/projects/pegurl/urltest-results/f3382f1412 The error being reported is that the host contains consecutive dot characters (i.e., the 'label' between these characters is empty). - Sam Ruby
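The empty-label condition described above can be sketched as follows. This is an illustrative check, not the spec's host parser, and the function name is mine:

```javascript
// Flag a host that contains an empty label: consecutive dots, or a
// leading/trailing dot. (A real implementation would special-case a
// single trailing dot, which denotes the DNS root label.)
function hasEmptyLabel(host) {
  return host.split('.').some(label => label.length === 0);
}

console.log(hasEmptyLabel('a..b'));         // true
console.log(hasEmptyLabel('example.com'));  // false
```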
Re: [whatwg] questions on URL spec based on reviewing galimatias test results
On 10/29/14 4:47 AM, Anne van Kesteren wrote: On Wed, Oct 29, 2014 at 12:12 PM, Sam Ruby ru...@intertwingly.net wrote: 1) Is the following expected to produce a parse error: http://intertwingly.net/projects/pegurl/urltest-results/4b60e32190 ? My reading of https://url.spec.whatwg.org/#relative-path-state is that step 3.1 indicates a parse error even though later step 1.5.1 replaces the non URL code point with a colon. My proposed reference implementation does not indicate a parse error with these inputs, but I could easily add it. Given the legacy aspect, probably should be an error. Fixed: https://github.com/rubys/url/commit/6789a5307ebd0e4aa05161c93038f2fc50011955 But it turns out that addressing that question opens up another question. In my implementation that fix caused a (recoverable) parse error to be produced for another test case: http://intertwingly.net/projects/pegurl/urltest-results/d674c14cbe I'll note that galimatias doesn't produce a parse error in this case (and, in fact, the state machine specified by the current URL Standard goes down a completely different path for this case). The question is: should this be a parse error? - Sam Ruby
Re: [whatwg] URL: spec review - basic_parser
On 10/14/2014 03:41 AM, Anne van Kesteren wrote: On Tue, Oct 14, 2014 at 1:05 AM, Sam Ruby ru...@intertwingly.net wrote: 1) rows where the notes merely say href are cases where parse errors are thrown and failure is returned. The expected results are an object that returns the original href, but empty values for all other properties. I don't see this behavior in the spec: https://url.spec.whatwg.org/#url-parsing That is what you get when e.g. using a. If you use new URL() the object would fail to construct so you cannot observe the other properties. I'm not sure why you think it doesn't follow from the specification. If you return failure, there's no URL returned, so why would the properties return something? Given that I've found problems in the spec, my implementation, and the test data, I'm trying to guess at what is the desired behavior. As one source for clues, I've looked at the now unmaintained library: https://github.com/annevk/url/blob/master/url.js#L62 And, as noted above, this is consistent with urltestdata.txt. Given all of the above, would you suggest changing the spec or the expected test results? 2) rows that contain href hostname appear to be ones where the expected results do not appear to be updated to include the host to IDNA mapping. Looking at the first of those http://intertwingly.net/stories/2014/10/13/urltest-results/eb3950fcc8 it seems something might be broken here on your end. Can you explain what you think is broken? It isn't completely obvious, but the input string in that case contains U+200B, U+2060, U+FEFF: http://www.fileformat.info/info/unicode/char/200B/index.htm http://www.fileformat.info/info/unicode/char/2060/index.htm http://www.fileformat.info/info/unicode/char/feff/index.htm I'll also note that the results I produce are consistent with Presto/2.12.388. 3) rows that contain href protocol hostname pathname need further investigation. 
I suspect that these are based on my using a library to normalize the IDNA mapping, and it helpfully cleans up other problems like removing U+ characters from the input. E.g. for http://intertwingly.net/stories/2014/10/13/urltest-results/7a0e86d240 per http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt U+FDD0 is disallowed meaning failure ought to be returned. What you have as outcome for whatwg does not match urltestdata.txt (including the version you are using). Agreed. As I indicated, I need to look further into the library that I am using. P.S. I didn't update to the latest test data yet; but from what I can see the changes wouldn't materially affect the results, so I am publishing now. It affects what happens for http://%30%78%63%30%2e%30%32%35%30.01%2e, http://192.168.0.257, and ttp://\uff10\uff38\uff43\uff10\uff0e\uff10\uff12\uff15\uff10\uff0e\uff10\uff11. I do plan to update to the latest expected test results, but meanwhile I am still trying to determine places where these results aren't correct or current with the specification. - Sam Ruby
Re: [whatwg] URL: spec review - basic_parser
On 10/14/2014 05:49 AM, Anne van Kesteren wrote: On Tue, Oct 14, 2014 at 11:38 AM, Sam Ruby ru...@intertwingly.net wrote: At the present time, all I can say is that the https://url.spec.whatwg.org/, https://github.com/w3c/web-platform-tests/blob/master/url/, and https://github.com/annevk/url are inconsistent. I recommend not looking at annevk/url. To illustrate, try pasting http://f:b/c into: http://www.lookout.net/test/url/url-liveview.html Relevant excerpt from that page: var url = new URL(input, base); urlHref.textContent = url.href; And the results for http://f:b/c after applying urltestparser.js against urltestdata.js are as follows: {input:http://f:b/c,base:http://example.org/foo/bar,scheme:,username:,password:null,host:,port:,path:,query:,fragment:,href:http://f:b/c,protocol::,search:,hash:} That seems correct. You hit b in the port state and that will return failure (from memory, did not check). How does this not match the specification? Here's my original statement: The expected results are an object that returns the original href, but empty values for all other properties. I don't see this behavior in the spec: https://url.spec.whatwg.org/#url-parsing; http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0159.html If you could be so kind as to point out what I am missing, I would appreciate it. I'll look further into why the results provided by Opera and https://rubygems.org/gems/addressable don't appear to match RFC 3491. Note that RFC 3491 is not a normative dependency for any of the algorithms. RFC 3491 is a normative dependency for RFC 3490, Internationalizing Domain Names in Applications (IDNA). You said, per IDNA those are ignored. http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0166.html - Sam Ruby
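The failure mode under discussion can be reproduced with the modern URL constructor: "b" is not a valid port, so the basic URL parser returns failure, and the constructor surfaces that as a thrown TypeError (whereas an a element keeps the raw input in .href). A small sketch, with a helper name of my own choosing:

```javascript
// Returns whether the URL parser accepts the input, by observing
// whether the URL constructor throws. "b" in the port position makes
// the parser return failure, hence a TypeError here.
function parses(input, base) {
  try {
    new URL(input, base);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses('http://f:b/c', 'http://example.org/foo/bar'));  // false
console.log(parses('http://f:80/c'));                               // true
```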
Re: [whatwg] URL: spec review - basic_parser
On 10/14/2014 07:00 AM, Simon Pieters wrote: On Tue, 14 Oct 2014 12:34:55 +0200, Anne van Kesteren ann...@annevk.nl wrote: If you could be so kind as to point out what I am missing, I would appreciate it. The way the a element works, I assume. Which is mostly how URLUtils works when associated with an object that is not URL. [[ The a element also supports the URLUtils interface. [URL] When the element is created, and whenever the element's href content attribute is set, changed, or removed, the user agent must invoke the element's URLUtils interface's set the input algorithm with the value of the href content attribute, if any, or the empty string otherwise, as the given value. ]] https://html.spec.whatwg.org/multipage/semantics.html#the-a-element - set the input [[ 1. Set url to null. ... 4. If url is not failure, set url to url. ]] https://url.spec.whatwg.org/#concept-urlutils-set-the-input When /url/ is failure, https://url.spec.whatwg.org/#concept-urlutils-url is null. So: .href: [[ 1. If url is null, return input. ]] https://url.spec.whatwg.org/#dom-url-href .protocol: [[ 1. If url is null, return :. ]] https://url.spec.whatwg.org/#dom-url-protocol ...and the other attributes return empty string in the first step if url is null. Does that help? Indeed, it does. Thanks! I was looking too myopically, assuming that urltestdata.txt was testing URL; and got sidetracked by http://www.lookout.net/test/url/. What I should have been looking at is https://github.com/w3c/web-platform-tests/tree/master/url, and in particular, the name of: https://github.com/w3c/web-platform-tests/blob/master/url/a-element.html - - - I think that a working and up-to-date live url parser would be a handy thing to have, and I hope to have one available shortly. - Sam Ruby
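The URLUtils steps Simon quotes can be sketched as a small object: when "set the input" fails to parse, the url record stays null, .href echoes the raw input, .protocol returns ":", and the remaining attributes return the empty string. This is an illustration of the spec prose above, using the modern URL constructor as a stand-in for the basic URL parser; makeUrlUtils is a name of my own invention:

```javascript
// Minimal sketch of URLUtils with a null url record on parse failure.
function makeUrlUtils(input, base) {
  let url = null;
  try {
    url = new URL(input, base);  // stand-in for "set the input"
  } catch (e) { /* parse failure: url remains null */ }
  return {
    get href()     { return url === null ? input : url.href; },
    get protocol() { return url === null ? ':' : url.protocol; },
    get hostname() { return url === null ? '' : url.hostname; },
  };
}

const a = makeUrlUtils('http://f:b/c', 'http://example.org/foo/bar');
console.log(a.href);      // "http://f:b/c"
console.log(a.protocol);  // ":"
console.log(a.hostname);  // ""
```

This matches the a-element behavior that urltestdata.txt was in fact testing.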
Re: [whatwg] URL: spec review - basic_parser
On 10/13/2014 10:05 AM, Anne van Kesteren wrote: Not yet. I'm still seeing a large set of differences between what I am producing and what is in urltestdata.txt and need to track down whether the problems are in my implementation, the spec, or in the test results. Once those three are in sync; I'll try to look at the bigger picture. Cool. Sounds great. New test results: http://intertwingly.net/stories/2014/10/13/urltest-results/ The fourth column (Notes) indicates which properties differ between what my software produces and what the testdata indicates should be the expected results. These fall into three basic categories: 1) rows where the notes merely say href are cases where parse errors are thrown and failure is returned. The expected results are an object that returns the original href, but empty values for all other properties. I don't see this behavior in the spec: https://url.spec.whatwg.org/#url-parsing 2) rows that contain href hostname appear to be ones where the expected results do not appear to be updated to include the host to IDNA mapping. 3) rows that contain href protocol hostname pathname need further investigation. I suspect that these are based on my using a library to normalize the IDNA mapping, and it helpfully cleans up other problems like removing U+ characters from the input. My implementation can be found here: http://intertwingly.net/stories/2014/10/13/url_rb.html Note the comments linking back to spec sections, and comments that identify step numbers. - Sam Ruby P.S. I didn't update to the latest test data yet; but from what I can see the changes wouldn't materially affect the results, so I am publishing now. P.P.S. Preview of what is yet to come, ruby2js run against my implementation produces: http://intertwingly.net/stories/2014/10/13/url_js.html This will need some additional work to get running, for example lines 54, 65, 82, 85, and 267 call out to libraries that aren't available to JavaScript. 
Lines 275 to 277 are debugging lines that will be removed shortly.
Re: [whatwg] URL: spec review - basic_parser
On 10/12/2014 04:18 AM, Anne van Kesteren wrote: On Sat, Oct 11, 2014 at 7:24 PM, Sam Ruby ru...@intertwingly.net wrote: On 10/10/2014 08:19 PM, Sam Ruby wrote: 2) https://url.spec.whatwg.org/#concept-basic-url-parser I'm interpreting terminate this algorithm and return failure to mean the same thing, and I'm interpreting parse error as set parse error flag and continue. I'm inclined to submit a pull request standardizing on terminate this algorithm and set parse error. I'm not sure what you mean here. Returning failure is important. However, in override mode returning failure is not important so the algorithm is simply terminated. Can you explain in JavaScript terms what the difference is between return failure and terminate? In any case, this difference wasn't clear to me, and mixed in with not defining what should be done with parse errors, and returning failure without setting parse errors (as mentioned below), all combined to make it more difficult (for me at least) to determine what was desired. And parse error would be more like flag a parse error, or append a parse error to a list of parse errors. It depends a bit on whether the parser decides to halt on the first one or not. I don't see anything in the prose that indicates that halting on the first parse error is an option. b) Step 1.3.3 seems problematic. I interpret this prose to mean if any character in buffer is a % and the first two characters after the pointer position in input aren't hex characters. Specifically, it appears to be comparing a possibly non-contiguous set of characters. Ah yes. It needs to check the two code points after code point in buffer. That seems like a bug. I'll look into submitting a pull request after I complete this pass. 4) https://url.spec.whatwg.org/#file-host-state Step 1.3.2 returns failure without setting parse_error. Is this correct? 5) https://url.spec.whatwg.org/#host-state Step 1.2.2 also returns failure without setting parse_error. 
This is indeed inconsistent. I must at some point have thought that returning failure without reporting a parse error was fine (as failure was indicated) or the other way around. Reporting a parse error before returning is probably best. I'm inclined to submit a pull request changing these to set parse error before failing. Thanks. Will do. 6) https://url.spec.whatwg.org/#relative-path-state If input contains a path but no query or fragment, the last part of the path will be accumulated into buffer, but that buffer will never be added to the path Looks like I got confused by the prose in the spec. I've submitted a pull request that makes this point clearer: https://github.com/whatwg/url/pull/4 There were a number of places where things weren't clear to me; after I complete my technical review and testing verification, I'll go back and identify more. Meanwhile, here is an example: https://github.com/whatwg/url/pull/5 Feel free to only modify url.src.html in PRs. Ack. Did you have a look at the open bugs by the way? There's a chance the parsing algorithm will get rewritten at some point to be a bit more functional and less state driven. Not yet. I'm still seeing a large set of differences between what I am producing and what is in urltestdata.txt and need to track down whether the problems are in my implementation, the spec, or in the test results. Once those three are in sync, I'll try to look at the bigger picture. - Sam Ruby
Re: [whatwg] URL: spec review - basic_parser
On 10/10/2014 08:19 PM, Sam Ruby wrote: I've now completed step 1, as described at [1]. Here are my questions/comments: 1) https://url.spec.whatwg.org/#url-code-points U+D8000 to U+DFFFD are invalid as they are within the UTF-16 surrogate range Disregard this comment, it turns out that this was a bug in my code. 2) https://url.spec.whatwg.org/#concept-basic-url-parser I'm interpreting terminate this algorithm and return failure to mean the same thing, and I'm interpreting parse error as set parse error flag and continue. I'm inclined to submit a pull request standardizing on terminate this algorithm and set parse error. 3) https://url.spec.whatwg.org/#authority-state a) Did you really mean prepend in Step 1.1? b) Step 1.3.3 seems problematic. I interpret this prose to mean if any character in buffer is a % and the first two characters after the pointer position in input aren't hex characters. Specifically, it appears to be comparing a possibly non-contiguous set of characters. I plan to revisit this after I complete my initial testing (i.e. step 2). 4) https://url.spec.whatwg.org/#file-host-state Step 1.3.2 returns failure without setting parse_error. Is this correct? 5) https://url.spec.whatwg.org/#host-state Step 1.2.2 also returns failure without setting parse_error. I'm inclined to submit a pull request changing these to set parse error before failing. 6) https://url.spec.whatwg.org/#relative-path-state If input contains a path but no query or fragment, the last part of the path will be accumulated into buffer, but that buffer will never be added to the path Looks like I got confused by the prose in the spec. I've submitted a pull request that makes this point clearer: https://github.com/whatwg/url/pull/4 There were a number of places where things weren't clear to me; after I complete my technical review and testing verification, I'll go back and identify more. 
Meanwhile, here is an example: https://github.com/whatwg/url/pull/5 - Sam Ruby [1] http://lists.w3.org/Archives/Public/www-tag/2014Oct/0053.html
[whatwg] URL: spec review - basic_parser
I've now completed step 1, as described at [1]. Here are my questions/comments: 1) https://url.spec.whatwg.org/#url-code-points U+D8000 to U+DFFFD are invalid as they are within the UTF-16 surrogate range 2) https://url.spec.whatwg.org/#concept-basic-url-parser I'm interpreting terminate this algorithm and return failure to mean the same thing, and I'm interpreting parse error as set parse error flag and continue. 3) https://url.spec.whatwg.org/#authority-state a) Did you really mean prepend in Step 1.1? b) Step 1.3.3 seems problematic. I interpret this prose to mean if any character in buffer is a % and the first two characters after the pointer position in input aren't hex characters. Specifically, it appears to be comparing a possibly non-contiguous set of characters. 4) https://url.spec.whatwg.org/#file-host-state Step 1.3.2 returns failure without setting parse_error. Is this correct? 5) https://url.spec.whatwg.org/#host-state Step 1.2.2 also returns failure without setting parse_error. 6) https://url.spec.whatwg.org/#relative-path-state If input contains a path but no query or fragment, the last part of the path will be accumulated into buffer, but that buffer will never be added to the path - Sam Ruby [1] http://lists.w3.org/Archives/Public/www-tag/2014Oct/0053.html
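The contiguity concern in item 3b can be made concrete. Under the reading Anne later confirms (the two code points must follow the % within buffer itself), the check amounts to the following sketch; the function name and regex are mine, not spec text:

```javascript
// Step 1.3.3, contiguous reading: every "%" in buffer must be
// followed, within buffer, by two ASCII hex digits. The negative
// lookahead matches any "%" that is not.
function hasInvalidPercentEncoding(buffer) {
  return /%(?![0-9A-Fa-f]{2})/.test(buffer);
}

console.log(hasInvalidPercentEncoding('user%3Ainfo'));  // false
console.log(hasInvalidPercentEncoding('user%3'));       // true
```

The non-contiguous reading Sam objects to would instead compare a % inside buffer against code points after the pointer in input, which are unrelated positions.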
Re: [whatwg] URL: test case review
On 10/06/2014 12:42 PM, Anne van Kesteren wrote: On Mon, Oct 6, 2014 at 3:13 AM, Sam Ruby ru...@intertwingly.net wrote: http://intertwingly.net/stories/2014/10/05/urltest-results/24f081633d This does not match what I find in browsers. (I did not look through the list exhaustively, see below, but since this was the first one...) Can you explain the methodology you used? The method I used can be found via: wget http://intertwingly.net/stories/2014/10/05/urltest wget http://intertwingly.net/stories/2014/10/05/urltestdata.json TL;DR: I created a page with a script that (a) fetches input data using XHR; (b) updates an a and a base element and then captures various properties for each test, and (c) posts the result using XHR. - Sam Ruby
Re: [whatwg] URL: test case review
On 10/06/2014 12:59 PM, Anne van Kesteren wrote: On Mon, Oct 6, 2014 at 6:54 PM, Sam Ruby ru...@intertwingly.net wrote: On 10/06/2014 12:42 PM, Anne van Kesteren wrote: On Mon, Oct 6, 2014 at 3:13 AM, Sam Ruby ru...@intertwingly.net wrote: http://intertwingly.net/stories/2014/10/05/urltest-results/24f081633d This does not match what I find in browsers. (I did not look through the list exhaustively, see below, but since this was the first one...) Can you explain the methodology you used? Sure, I gave ? as input and then checked the serialized URL (since you can't trust the search property). https://dump.testsuite.org/url/inspect.html works for this. wget http://intertwingly.net/stories/2014/10/05/urltest wget http://intertwingly.net/stories/2014/10/05/urltestdata.json TL;DR: I created a page with a script that (a) fetches input data using XHR; (b) updates an a and a base element and then captures various properties for each test, and (c) posts the result using XHR. Is there a chance that the library on the server does not pick up on a lone ?? I found the bug, thanks for reporting it. The problem is that the following properties are not defined as 'enumerable', so are not picked up when I serialize the tests as JSON: https://github.com/w3c/web-platform-tests/blob/master/url/urltestparser.js#L17 - Sam Ruby
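The bug Sam identifies is a general JavaScript pitfall worth spelling out: JSON.stringify only visits enumerable own properties, so anything defined with enumerable: false silently drops out of the serialized test data. A minimal reproduction (the property names here are illustrative, not taken from urltestparser.js):

```javascript
// A test-result object with one ordinary property and one
// non-enumerable property, as Object.defineProperty defaults to.
const result = { input: 'http://example.org/?' };
Object.defineProperty(result, 'search', {
  value: '',
  enumerable: false,  // invisible to JSON.stringify and for...in
});

// The non-enumerable "search" property vanishes on serialization.
console.log(JSON.stringify(result));  // {"input":"http://example.org/?"}
```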
[whatwg] URL: test case review
/stories/2014/10/05/urltest-results/ee52a7413c http://intertwingly.net/stories/2014/10/05/urltest-results/723aa80622 Worth Discussing: http://intertwingly.net/stories/2014/10/05/urltest-results/e0d78e8c36 http://intertwingly.net/stories/2014/10/05/urltest-results/eb30a2c2d0 http://intertwingly.net/stories/2014/10/05/urltest-results/e170ad9cce --- Further background on my methodology and results: http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI - Sam Ruby [1] https://raw.githubusercontent.com/w3c/web-platform-tests/master/url/urltestdata.txt
[whatwg] Request for HTML.next ideas
Note: while this email is intentionally cross-posted, my request is that any responses trim the replies down to *one* of the above lists. === At the present time within the HTML WG, there are no surveys active, and no calls for proposals. Some are actively working on converging to fewer active proposals for issue 152. The editors have some changes to apply before we proceed to Last Call. The chairs still have some surveys and proposals to evaluate. Meanwhile, this may be a good time for others to begin to capture ideas for what comes after HTML5. I know that the WHATWG has ideas for some features and even has some speculative specification text beyond what can make the deadline for HTML5. I doubt the A11y team has exhausted their wish list. At this time, I would like to request that people capture their ideas here: http://www.w3.org/html/wg/wiki/HTML.next Ideas don't need to be fully fleshed out. In fact, in many cases a simple pointer to a proposal or even a discussion hosted elsewhere is all that is needed at this time. There isn't a hard deadline on this request, but we anticipate that the data captured will be discussed at the next AC meeting which goes from the 15th of May to the 17th of May. Thanks! - Sam Ruby
Re: [whatwg] Article: Growing pains afflict HTML5 standardization
On Mon, Jul 12, 2010 at 11:41 AM, Julian Reschke julian.resc...@gmx.de wrote: On 12.07.2010 16:43, Mike Wilcox wrote: On Jul 12, 2010, at 8:39 AM, Nils Dagsson Moskopp wrote: That's a little different. Google purposely uses unstandardized, incorrect HTML in ways that still render in a browser in order to make it more difficult for screen scrapers. They also break it in a different way every week. Assuming this is true (which I find difficult to believe), wouldn't a screen scraper based on the HTML5 parsing algorithm defeat this purpose ? Honestly, I don't know. But W3 defaulted to an HTML5 validator: http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com%2Fsearch%3Fsource%3Dig%26hl%3Den%26rlz%3D%26%3D%26q%3Dhtml5%26aq%3Df%26aqi%3D%26aql%3D%26oq%3D%26gs_rfai%3Dcharset=%28detect+automatically%29doctype=Inlinegroup=0 True, but a parser conforming to the spec (*) would handle those errors, so in this case obfuscation wouldn't work. Essentially, any code using that parser would see the same information as an off-the-shelf web browser. ... Besides the protecting of their API, Google also will scratch and claw to save every byte. They are the gold standard of a high performance Understood. There's an ongoing controversy whether it makes sense to make things like these invalid (just stating, not offering an opinion). website. While this may or may not explain the things that don't validate, what it does say is that nothing coming from google.com http://google.com is accidental. ... I believe some time ago a certain Google employee actually *did* state that some of the conformance problems were unintentional. (yes, I did spend a few minutes finding that statement but wasn't successful). 
http://lists.w3.org/Archives/Public/public-html/2010Mar/0555.html Best regards, Julian (*) Implementing error recovery, which IMHO isn't required. - Sam Ruby
Re: [whatwg] Technical Parity with W3C HTML Spec
On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote: Maybe the answer is to have a spokesperson or liaison role, someone respected in the WHATWG community with a reputation for reasonable neutrality? Both Hixie and Maciej have conflicts of interest, as editor and W3C co-chair respectively. Maybe Hakon or David, since they were instrumental in forming WHATWG in the first place? Maybe an alternative would be: Where there are technical or political conflicts, W3C should decide how to resolve those internally, and how to represent the W3C point of view in the WHATWG. I would expect that people differ, so I would expect those different opinions to be represented in liaisons with WHATWG. I don't have a good answer here, because I think it's up to the W3C to decide their own processes, but I hope we agree that we need improvements to how we liaise. First, can we work on improving communications so that we can work on differences before they become conflicts? We recently had a change proposal made by Lachlan: http://lists.w3.org/Archives/Public/public-html/2010Apr/1107.html Absolutely nobody in the W3C WG indicated any issues with this proposal: http://lists.w3.org/Archives/Public/public-html/2010Jun/0562.html Recently you said that you value convergence: http://lists.w3.org/Archives/Public/public-html/2010Jun/0525.html Yet, when you made the change, you did it in a way that made the WHATWG version not a proper superset. You also characterized the change in a way that I don't believe is accurate: http://lists.whatwg.org/pipermail/commit-watchers-whatwg.org/2010/004270.html I'm having trouble reconciling all of the above. You clearly continue to be a member of the W3C Working Group. You state that you value convergence. You were given ample opportunity to state an objection. And you clearly have an issue with Lachlan's suggestion. How can we improve communications to prevent misunderstandings such as this one from occurring in the future? 
What's the best way to address the mischaracterization of the difference as it is currently described in the WHATWG draft? Most importantly, how can we deescalate tensions rather than continuing in this manner? - Sam Ruby
Re: [whatwg] Technical Parity with W3C HTML Spec
On Fri, Jun 25, 2010 at 4:03 PM, Sam Ruby ru...@intertwingly.net wrote: Yet, when you made the change, you did it in a way that made the WHATWG version not a proper superset. On closer reading, it turns out that I was incorrect here. It still, however, remains a divergence, it still is mis-characterized, and I still can't reconcile your statement concerning valuing convergence with this action. - Sam Ruby
Re: [whatwg] Technical Parity with W3C HTML Spec
On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote: While I agree that it is helpful for us to cooperate, I should point out that the WHATWG was never formally approached by the W3C about this With whom (and where?) would such a formal discussion take place? I would prefer that such a discussion happen on a publicly archived mailing list. - Sam Ruby
Re: [whatwg] Technical Parity with W3C HTML Spec
On Fri, Jun 25, 2010 at 7:02 PM, Ian Hickson i...@hixie.ch wrote: On Fri, 25 Jun 2010, Sam Ruby wrote: On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote: While I agree that it is helpful for us to cooperate, I should point out that the WHATWG was never formally approached by the W3C about this With whom (and where?) would such a formal discussion take place? I would prefer that such a discussion happen on a publicly archived mailing list. Best thing to do is probably to e-mail the people listed in the charter as being the members (e-mail addresses below), and cc the www-arch...@w3.org mailing list for archival purposes. ann...@opera.com, bren...@mozilla.com, dba...@mozilla.com, hy...@apple.com, dean.edwa...@gmail.com, howc...@opera.com, j...@mozilla.com, m...@apple.com, i...@hixie.ch HTH, Done: http://lists.w3.org/Archives/Public/www-archive/2010Jun/0054.html - Sam Ruby
Re: [whatwg] A New Way Forward for HTML5 (revised)
John Foliot wrote: Peter Kasting wrote: It seems like the only thing you could ask for beyond this is the ability to directly insert your own changes into the spec without prior editorial oversight. I think that might be what you're asking for. This seems very unwise. Really? This appears to be exactly the single, special status privilege currently reserved for Ian Hickson. False. It is, in fact, a serious complaint that many are trying to correct, including Manu with his offer to assist in setting up a more egalitarian solution. In fact, Manu is an existence proof that the previous statement you made is false. Ian is free to produce Working Drafts that are published by this working group. The status of such drafts is, and I quote[1]: Consensus is not a prerequisite for approval to publish; the Working Group MAY request publication of a Working Draft even if it is unstable and does not meet all Working Group requirements. Both you and Manu have exactly the same ability as Ian does in this respect. Ian has asked the group for permission to publish, and that was granted. Manu has produced a document but has yet to request permission to publish as a Working Draft. You are welcome to do likewise[2]. JF - Sam Ruby [1] http://www.w3.org/2005/10/Process-20051014/tr.html#first-wd [2] http://lists.w3.org/Archives/Public/public-html/2009Jul/0627.html
Re: [whatwg] A New Way Forward for HTML5 (revised)
John Foliot wrote: Sam Ruby wrote: Really? This appears to be exactly the single, special status privilege currently reserved for Ian Hickson. False. ...and yes, I stand corrected. Although the *impression* that this is the current status remains fairly pervasive; however, I will endeavor to dispel that myth as well. That said, the barrier to equal entry remains high: http://burningbird.net/node/28 (however, I will also state that Sam has offered on numerous occasions to extend help to any that requires balanced commentary) My goal is to ensure that there are no excuses not to participate. I've said that a person can simply go into notepad[3], make the changes, and I will take care of the rest. Manu has documented the process for those who prefer to do it themselves[4]. Ian has offered to make the changes if somebody can explain the use cases[5]. If people have suggestions on how to be even *more* inclusive, I welcome any and all suggestions. Meanwhile, your offer to help dispel that myth is very much appreciated. Both you and Manu have exactly the same ability as Ian does in this respect. Ian has asked the group for permission to publish, and that was granted. Manu has produced a document but has yet to request permission to publish as a Working Draft. You are welcome to do likewise[2]. While I have personal reservations that this may introduce an even wider fork of opinion, making consensus down the road even harder to achieve, this is the die that has been cast. I will offer what contributions I can to both Manu and Shelley in their respective initiatives, to the best of my ability, and will leave the WHAT WG to continue propagating what I see as their mistakes and false assumptions as they see fit - they have clearly signaled that not all contributions are welcome. 
It may very well end up that the sole difference between the WHATWG document and the W3C document is that the WHATWG document states that the summary attribute is conformant but obsolete, and the W3C document states that the summary attribute is conformant but not (yet) obsolete. But the only way that will happen is if somebody goes into notepad, or follows Manu's process, or explains the use case, or finds some other means to cause a working draft to appear with these changes. JF [1] http://www.w3.org/2005/10/Process-20051014/tr.html#first-wd [2] http://lists.w3.org/Archives/Public/public-html/2009Jul/0627.html - Sam Ruby [3] http://lists.w3.org/Archives/Public/public-html/2009Jul/0633.html [4] http://lists.w3.org/Archives/Public/public-html/2009Jul/0785.html [5] http://lists.w3.org/Archives/Public/public-html/2009Jul/0745.html
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, May 12, 2009 at 4:34 PM, Shelley Powers shell...@burningbird.net wrote: I would say if your fellow Google developers could understand how this all works, there is hope for others. http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html Shelley - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sun, Jan 18, 2009 at 1:34 PM, Henri Sivonen hsivo...@iki.fi wrote: On Jan 18, 2009, at 01:32, Shelley Powers wrote: Are you then saying that this will be a showstopper, and there will never be either a workaround or compromise? Are the RDFa TF open to compromises that involve changing the XHTML side of RDFa not to use attributes whose qualified names have a colon in them to achieve DOM Consistency by changing RDFa instead of changing parsing? Just so that we have all of the data available to make an informed decision, do we have examples of how it would break the web if attributes which started with the characters xmlns: (and *only* those attributes) were placed into the DOM exactly as they would be when those bytes are processed as XHTML? Notes: I am *not* suggesting anything just yet, other than the gathering of this data. I also recognize that this would require a parsing change by browser vendors, which also is a cost that needs to be factored in. But right now, I am interested in how it would affect the web if this were done. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/ - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers shell...@burningbird.net wrote: The debate about RDFa highlights a disconnect in the decision making related to HTML5. Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor for that matter am I a strong advocate for RDF), but I offer the following question and observation. The purpose behind RDFa is to provide a way to embed complex information into a web document, in such a way that a machine can extract this information and combine it with other data extracted from other web pages. It is not a way to document private data, or data that is meant to be used by some JavaScript-based application. The sole purpose of the data is for external extraction and combination. So, I take it that it isn't essential that RDFa information be included in the DOM? This is not rhetorical: I honestly don't know the answer to this question. So, why accept that we have to use MathML in order to solve the problems of formatting mathematical formula? Why not start from scratch, and devise a new approach? Ian explored (and answered) that here: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html Key to Ian's decision was the importance of DOM integration for this vocabulary. If DOM integration is essential for RDFa, then perhaps the same principles apply. If not, perhaps some other principles may apply. - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley dan...@danbri.org wrote: On 17/1/09 19:27, Sam Ruby wrote: On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers shell...@burningbird.net wrote: The debate about RDFa highlights a disconnect in the decision making related to HTML5. Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor for that matter am I a strong advocate for RDF), but I offer the following question and observation. The purpose behind RDFa is to provide a way to embed complex information into a web document, in such a way that a machine can extract this information and combine it with other data extracted from other web pages. It is not a way to document private data, or data that is meant to be used by some JavaScript-based application. The sole purpose of the data is for external extraction and combination. So, I take it that it isn't essential that RDFa information be included in the DOM? This is not rhetorical: I honestly don't know the answer to this question. Good question. I for one expect RDFa to be accessible to Javascript. http://code.google.com/p/rdfquery/wiki/Introduction - http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice example of code that does something useful in this way. The fact that this works anywhere at all today implies that little, if any, change to browsers is required in order to support this. Is that a fair statement? I've not taken a look at the code, but have taken a quick glance at the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome 1.0.154.43, Opera 9.63, and Firefox 3.0.5. The page is different (as in less functional) under IE8 and Safari. Is there something that they need to do which is not already covered in the HTML5 specification in order to support this? - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers shell...@burningbird.net wrote: I propose that RDFa is the best solution to the use case Martin supplied, and we've shown how it is not a disruptive solution to HTML5. Others may differ, but my read is that the case is a strong one. But I will caution you that a little patience is in order. SVG is not a done deal yet. I've been involved in a number of standards efforts, and I've never seen a case of proposed on a Saturday morning, decided on a Saturday afternoon. One demo is not conclusive. Now you mention that there exists a number of libraries. I think that's important. Very important. Possibly conclusive. But back to expectations. I've seen references elsewhere to Ian being booked through the end of this quarter. I may have misheard, but in any case, my point is the same: if this is awaiting something from Ian, it will be prioritized and dealt with accordingly. If, however, some of the legwork is done for Ian, this may help accelerate the effort. Even little things may help a lot. I know what I'm about to say may be unpopular, but I'll say it anyway: take a few good examples of RDFa and run them through Henri's validator. The validator will helpfully indicate exactly what areas of the spec would need to be updated in order to accommodate RDFa. The next step would be to take a look at those sections. If the update is obvious and straightforward, perhaps nothing more is required. But if not, researching into the options and making recommendations may help. - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sat, Jan 17, 2009 at 3:51 PM, Shelley Powers shell...@burningbird.net wrote: Sam Ruby wrote: On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers shell...@burningbird.net wrote: I propose that RDFa is the best solution to the use case Martin supplied, and we've shown how it is not a disruptive solution to HTML5. Others may differ, but my read is that the case is a strong one. But I will caution you that a little patience is in order. SVG is not a done deal yet. I've been involved in a number of standards efforts, and I've never seen a case of proposed on a Saturday morning, decided on a Saturday afternoon. One demo is not conclusive. Now you mention that there exists a number of libraries. I think that's important. Very important. Possibly conclusive. I am patient. Look at me? I make extensive use of both SVG and RDF -- that is the mark of a patient woman. But back to expectations. I've seen references elsewhere to Ian being booked through the end of this quarter. I may have misheard, but in any case, my point is the same: if this is awaiting something from Ian, it will be prioritized and dealt with accordingly. If, however, some of the legwork is done for Ian, this may help accelerate the effort. First of all, whatever happens has to happen with either vetting by the RDF/RDFa folks, if not their active help. This is my way of saying, I'd be willing to do much of the legwork, but I want to make sure I don't represent RDFa incorrectly. Secondly, my finances have been caught up in the current downturn, and my first priority has to be on the hourly work and odd jobs I'm getting to keep afloat. Which means that I can't always guarantee 20+ hours a week on a task, nor can I travel. Anywhere. But if both are acceptable conditions, I'm willing to help with tasks. I don't see any of that as being a problem. Even little things may help a lot. 
I know what I'm about to say may be unpopular, but I'll say it anyway: take a few good examples of RDFa and run them through Henri's validator. The validator will helpfully indicate exactly what areas of the spec would need to be updated in order to accommodate RDFa. The next step would be to take a look at those sections. If the update is obvious and straightforward, perhaps nothing more is required. But if not, researching into the options and making recommendations may help. Tasks including this one. Excellent. Well, all except for the downturn thing, but you know what I mean. In order to prevent any misunderstandings: it is not for me to assign work. In fact, nobody here is in such a position. People simply note things that need to be done, and do the ones that interest them, at the pace at which they are able. And communicate copiously. If you need help in vetting, I am given to understand that there is a small pocket of RDF enthusiasm in the W3C. :-P Shelley - Sam Ruby
Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen hsivo...@iki.fi wrote: On Jan 17, 2009, at 22:35, Shelley Powers wrote: Generally, though, RDFa is based on reusing a set of attributes already existing in HTML5, and adding a few more. Also, RDFa uses CURIEs which in turn use the XML namespace mapping context. I would assume no differences in the DOM based on XHTML or HTML. The assumption is incorrect. Please compare http://hsivonen.iki.fi/test/moz/xmlns-dom.html and http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml Same bytes, different media type. The W3C Recommendation for DOM also describes a readonly attribute on Attr named 'name'. Discuss. I put together a very crude demonstration of JavaScript access of a specific RDFa attribute, about. It's temporary, but if you go to my main web page, http://realtech.burningbird.net, and look in the sidebar for the click me text, it will traverse each div element looking for an about attribute, and then pop up an alert with the value of the attribute. I would use console rather than alert, but I don't believe all browsers support console, yet. This misses the point, because the inconsistency is with attributes named xmlns:foo. There is a similar inconsistency in how xml:lang is handled. Discuss. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/ - Sam Ruby
Re: [whatwg] How to use SVG in HTML5?
On Jan 23, 2008 2:13 PM, Krzysztof Żelechowski [EMAIL PROTECTED] wrote: SVG is too heavyweight for the purpose of such tiny presentational enhancements. I can provide counterexamples: http://intertwingly.net/blog/ http://intertwingly.net/blog/archives/ - Sam Ruby
Re: [whatwg] Entity parsing
On 6/14/07, Ian Hickson [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006, Øistein E. Andersen wrote: From section 9.2.3.1. Tokenising entities: For some entities, UAs require a semicolon, for others they don't. This applies to IE. FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT and REG). [...] I've defined the parsing and conformance requirements in a way that matches IE. As a side-effect, this has made things like na&iumlve actually conforming. I don't know if we want this. On the one hand, it's pragmatic (after all, why require the semicolon?), and is equivalent to not requiring quotes around attribute values. On the other, people don't want us to make the quotes optional either. With the latest changes to html5lib, we get a failure on a test named test_title_body_named_charref. Before, A &mdash B == A — B, now A &mdash B == A &amp;mdash B. Is that what we really want? Testing with Firefox, the old behavior is preferable. - Sam Ruby
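The html5lib regression around "A &mdash B" can be illustrated with a toy resolver. This is only a sketch contrasting the two behaviors under discussion, not the spec's tokenizer algorithm; the entity table and the expand function are invented for illustration.

```javascript
// Toy sketch (NOT the HTML5 tokenizer) contrasting semicolon-optional
// vs semicolon-required expansion of named character references.
const entities = { mdash: "\u2014", amp: "&" }; // tiny illustrative table

function expand(text, semicolonOptional) {
  return text.replace(/&([a-zA-Z]+)(;?)/g, (match, name, semi) => {
    if (name in entities && (semi === ";" || semicolonOptional)) {
      return entities[name];
    }
    return match; // leave unrecognized references untouched
  });
}

console.log(expand("A &mdash B", true));  // old behavior: "A — B"
console.log(expand("A &mdash B", false)); // new behavior: "A &mdash B"
```

In the semicolon-required case the reference is left as plain text; a serializer would then escape the leading "&", producing the "A &amp;mdash B" round trip described above.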
[whatwg] web-apps/current-work/#datetime-parser
Step 25 If sign is negative, then shouldn't timezoneminutes also be negated? Step 27 Shouldn't that be SUBTRACTING timezonehours hours and timezoneminutes minutes? My current time is 2007-04-17T05:28:33-04:00 The timezone is -4 hours from UTC. To convert to UTC I need to add 4 hours. - Sam Ruby
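The sign question can be cross-checked against how JavaScript already parses ISO 8601 date strings with offsets; this is just a sanity check of the arithmetic, not the spec's parsing algorithm.

```javascript
// A -04:00 offset means local time is 4 hours behind UTC, so
// converting to UTC requires ADDING 4 hours, i.e. subtracting the
// signed timezone offset -- the point being made about step 27.
const utc = new Date("2007-04-17T05:28:33-04:00").toISOString();
console.log(utc); // "2007-04-17T09:28:33.000Z" (05:28 local + 4h)
```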
Re: [whatwg] Attribute for holding private data for scripting
Maciej Stachowiak wrote: On Apr 10, 2007, at 8:12 PM, Sam Ruby wrote: Maciej Stachowiak wrote: On Apr 10, 2007, at 2:14 PM, Sam Ruby wrote: Anne van Kesteren wrote: On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby [EMAIL PROTECTED] wrote: How so? I missed the part where you wanted to change existing HTML parsers. I thought Hixie pointed out earlier (by means of examples) why we can't have namespace parsing in HTML. I suppose we can discuss it again... It is a recurring pattern. The first instance was we can't allow trailing slashes in tags, which was followed up by a carefully crafted and narrow set of exceptions, which was met with that works and was adopted. So... while it is clearly true the full extent of XML namespaces will never be supported in HTML5 (and for good reason), namespace qualified attributes allow extensibility in ways that prevent collisions. One of the first questions that would need to be answered: are there any existing documents on the web which would be broken if the name placed into the DOM for attributes with names containing a colon, with an apparent prefix, and one that matched an enclosing xmlns: declaration were to be changed? I think the problem here isn't compatibility with existing content, but rather ability to use the feature in new web content while still gracefully handling existing user agents. We wrote up some design principles for the HTML WG based on the WHATWG's working assumptions which might make this point more clear: http://esw.w3.org/topic/HTML/ProposedDesignPrinciples. While Don't Break The Web is a goal, so is Degrade Gracefully. To give a specific example: say I make my own mjsml prefix with namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an mjsml:extension attribute using getAttribute("mjsml:extension"). In HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", "extension"). And neither technique would work on both (at least as I understand your proposal). 
Here's a page I constructed, and tested on Firefox: http://intertwingly.net/stories/2007/04/10/test.html This page is meant to be served as application/xhtml+xml. Can you test it and see what results you get? Then let's discuss further. In Safari 2.0.4: Processed as HTML, it says "data" and then "". Processed as XHTML, it says "null" and then "data". In Opera 9.00: Processed as HTML, it says "data" and then "null". Processed as XHTML, it says "null" and then "data". In Firefox 2.0.0.3: Processed as HTML, it says "data" and then "". Processed as XHTML, it says "data" and then "data". In IE/Mac 5.2: Processed as HTML, it says "data" and the second alert does not appear. Processed as XHTML, neither alert appears. It looks like Firefox's XHTML implementation already has the getAttribute extension I suggested of handling QNames. Cool! The first thing that is apparent to me is that, when processed as HTML, element.getAttribute('mjsml:extension') works everywhere. So it is probably fair to say that allowing it does not run afoul of either the Don't Break the Web or Degrade Gracefully design principles. Per HTML5 section 8.1.2.3, however, such an attribute name would not be considered conformant. Despite this, later in document, in the description of Attribute name state, no parse error is produced for this condition. Nor does the current html5lib parser produce a parse error with this data. I'd suggest that the first order of business is to reconcile 8.1.2.3 with the description of Attribute name state. My suggestion is that Anything else emits a parse error (but nevertheless continues on/recovers), and that a rule for handling latin small letter a through z, hyphen-minus, and colon be added. By the way, the fact that no two of the browsers I tested treat this the same is a pretty clear indicator that DOM Core needs the HTML5 treatment. +1. But this begs a larger issue. 
Much of the differences that you found were in how XHTML was handled, and the WhatWG document currently states: The rules for parsing XML documents (and thus XHTML documents) into DOM trees are covered by the XML and Namespaces in XML specifications, and are out of scope of this specification. [XML] [XMLNS] Regards, Maciej - Sam Ruby
Re: [whatwg] Attribute for holding private data for scripting
Anne van Kesteren wrote: On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] wrote: Per HTML5 section 8.1.2.3, however, such an attribute name would not be considered conformant. Yes, only attributes defined in the specification are conformant. I was specifically referring to section 8.1.2.3. Let me call your attention to the following text: Attribute names use characters in the range U+0061 LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z, or, in uppercase, U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z, and U+002D HYPHEN-MINUS (-). Despite this, later in document, in the description of Attribute name state, no parse error is produced for this condition. Nor does the current html5lib parser produce a parse error with this data. Correct. We're not doing validation. Just tokenizing and building a tree. In the process, parse errors are generally emitted in cases where individual characters are encountered which do not match the lexical grammar rules. Just not in this case. - Sam Ruby
Re: [whatwg] Attribute for holding private data for scripting
Anne van Kesteren wrote: On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] wrote: To give a specific example: say I make my own mjsml prefix with namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an mjsml:extension attribute using getAttribute("mjsml:extension"). In HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", "extension"). And neither technique would work on both (at least as I understand your proposal). By the way, the reason this is not consistent with XML is that it would be just as ok to use a different prefix. By basing this on the prefix (which is needed if you want this to be compatible with HTML, etc.) you're moving the semantics from the namespace to the prefix, which seems like a bad idea. For starters, you are misattributing the quote above. I did not write those words. As to your point -- as you so colorfully put it on your weblog -- Standards Suck. And in this case, I will argue that the current HTML5 spec leads one to the conclusion that getAttribute("mjsml:extension") will work, at least for the HTML serialization of HTML5. I did not write that quote. I did not write -- or even contribute to -- that portion of the spec. - Sam Ruby
Re: [whatwg] Attribute for holding private data for scripting
Anne van Kesteren wrote: On Wed, 11 Apr 2007 13:53:21 +0200, Sam Ruby [EMAIL PROTECTED] wrote: Anne van Kesteren wrote: On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] wrote: Per HTML5 section 8.1.2.3, however, such an attribute name would not be considered conformant. Yes, only attributes defined in the specification are conformant. I was specifically referring to section 8.1.2.3. Let me call your attention to the following text: Attribute names use characters in the range U+0061 LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z, or, in uppercase, U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z, and U+002D HYPHEN-MINUS (-). I think you should read the whole section. Allowing colons there wouldn't make a difference. The document is a draft. The subject line of this thread suggests that the WG is entertaining the notion of allowing at least one attribute which is not currently defined in the specification. This suggests that the draft may need to change. Drafts are like that. Like others, I'm not convinced that the way forward is to allow a new attribute which has a micro-grammar for parsing what would be represented in the DOM essentially as a character blob. Despite this, later in document, in the description of Attribute name state, no parse error is produced for this condition. Nor does the current html5lib parser produce a parse error with this data. Correct. We're not doing validation. Just tokenizing and building a tree. In the process, parse errors are generally emitted in cases where individual characters are encountered which do not match the lexical grammar rules. Just not in this case. The above are not the grammar rules. They are (normative) guidelines for people writing or generating HTML. As far as I can tell there's no normative grammar. Just a way to construct a conforming string and a way to interpret a random string. --Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Attribute for holding private data for scripting
Anne van Kesteren wrote: I think I'd rather have something simple such as prefix_name for extensions made by ECMAScript libraries, etc. (As opposed to an in-scope xmlns:prefix="http://..." with prefix:name extensions which work differently in XML.) That would also work better for element extensions. Not that any of this should be allowed, but there seems to be some desire to have an ability to introduce conforming extension elements / attributes which are implemented using a script library. This leads into lots of tangents. 1) re: prefix_name - how are prefixes registered? Henri is free to correct me if I am wrong, but I gathered that the requirement was for a bit of decentralized extensibility, i.e., the notion that anybody for any reason could define an extension for holding private data; and furthermore could do so without undue fear of collision. 2) I assert that the existing DOM standard already defines a mechanism for decentralized extensibility. Most relevant to the discussion at hand is the getAttributeNS method. It may not be defined as clearly as it could be, but there do seem to be some clues which suggest what the original intent was, and the beginnings of an agreement that if more browsers were to conform to that intent, that would be a GOOD THING(TM). 3) There already is spec text which indicates how html5 defined elements are to be handled with respect to getElementsByTagNameNS. Perhaps it would again be a GOOD THING(TM) if this was also codified for attributes. I believe that this is consistent with what Maciej is calling for. 4) One thing that needs to be mentioned is that compliance to the DOM standard varies widely. In the long term, perhaps browser vendors could do a better job of this, and perhaps the HTML5 effort can help put a focus on this need. In the short term, however, this can be dealt with via JavaScript. Encapsulating and dealing with browser incompatibilities is an all too common use case for JavaScript. 
5) I'm not sure where you draw the conclusion that prefix:name extensions would work differently than in XML. While Python's minidom does not appear to produce the desired results when I call getElementById, it otherwise seems to handle the document identically to the way Firefox does: http://intertwingly.net/stories/2007/04/10/test.py - Sam Ruby
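The JavaScript encapsulation suggested in point 4 might look roughly like the following. This is a hypothetical sketch: the function name, the fallback order, and the example namespace are all invented for illustration, not taken from any spec.

```javascript
// Hypothetical wrapper hiding the getAttribute / getAttributeNS split
// discussed in this thread. All names here are illustrative only.
function lookupExtension(el, prefix, local, ns) {
  // Namespace-aware DOMs: the attribute lives in a namespace.
  if (typeof el.getAttributeNS === "function") {
    const value = el.getAttributeNS(ns, local);
    if (value !== null && value !== "") return value;
  }
  // Legacy HTML DOMs: "prefix:local" is just an attribute name.
  if (typeof el.getAttribute === "function") {
    return el.getAttribute(prefix + ":" + local);
  }
  return null;
}

// Usage in a browser might be:
//   lookupExtension(div, "mjsml", "extension", "http://example.org/mjsml")
```

The point is not this particular fallback order (which is an assumption) but that a few lines of script can give page authors one call that works across the user agents compared earlier in the thread.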
Re: [whatwg] Attribute for holding private data for scripting
On 4/10/07, Simon Pieters [EMAIL PROTECTED] wrote: Or allow any attribute that starts with x_ or something (to prevent clashing with future revisions of HTML), as private attributes. Instead of starts with x_, how about contains a colon? A conformance checker could ensure that there is a corresponding xmlns declaration that applies here, and possibly even do additional verification if it recognizes the namespace. An HTML5 parser would, of course, recover from references to undeclared namespaces, placing the entire attribute name (including the prefix and the colon) into the DOM in such situations.
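The conformance-checker half of this proposal could be sketched as follows. The function name and inputs are invented for illustration; a real checker would track xmlns: declarations per element scope rather than take a flat list of declared prefixes.

```javascript
// Sketch of the proposed rule: every colon-containing attribute
// (other than the xmlns: declarations themselves) must use a prefix
// declared by some in-scope xmlns: attribute.
function undeclaredPrefixes(attrNames, declaredPrefixes) {
  const declared = new Set(declaredPrefixes);
  return attrNames
    .filter(name => name.includes(":") && !name.startsWith("xmlns:"))
    .map(name => name.split(":")[0])
    .filter(prefix => !declared.has(prefix));
}

// undeclaredPrefixes(["mjsml:extension", "class"], ["mjsml"]) returns []
// undeclaredPrefixes(["foo:bar"], []) returns ["foo"] -- a conformance error
```

A parser following the recovery behavior described above would still accept the second case; only the checker would flag it.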
Re: [whatwg] Attribute for holding private data for scripting
On 4/10/07, Anne van Kesteren [EMAIL PROTECTED] wrote: On Tue, 10 Apr 2007 20:21:27 +0200, Sam Ruby [EMAIL PROTECTED] wrote: Or allow any attribute that starts with x_ or something (to prevent clashing with future revisions of HTML), as private attributes. Instead of starts with x_, how about contains a colon? A conformance checker could ensure that there is a corresponding xmlns declaration that applies here, and possibly even do additional verification if it recognizes the namespace. An HTML5 parser would, of course, recover from references to undeclared namespaces, placing the entire attribute name (including the prefix and the colon) into the DOM in such situations. * That would be confusing to people familiar with XML; * It would hinder the ability to exchange scripts between HTML and XML; * It would create more differences between XML and HTML where less seems to be desired (trailing slash allowed, etc.). How so? The idea is to place these attributes into the DOM the same way as they would be when parsed with an xml parser, for the cases where the data happens to be namespace valid. And to do what you would expect in the cases where, for example, attribute values aren't quoted. And to follow the html5 credo of recover at all cost in cases where what the user entered doesn't conform. This would of course need to be spec'ed, AND compared against common usage, AND prototyped; I simply ask that it not be rejected out of hand. - Sam Ruby
Re: [whatwg] Attribute for holding private data for scripting
Anne van Kesteren wrote: On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby [EMAIL PROTECTED] wrote: How so? I missed the part where you wanted to change existing HTML parsers. I thought Hixie pointed out earlier (by means of examples) why we can't have namespace parsing in HTML. I suppose we can discuss it again... It is a recurring pattern. The first instance was we can't allow trailing slashes in tags, which was followed up by a carefully crafted and narrow set of exceptions, which was met with that works and was adopted. So... while it is clearly true the full extent of XML namespaces will never be supported in HTML5 (and for good reason), namespace qualified attributes allow extensibility in ways that prevent collisions. One of the first questions that would need to be answered: are there any existing documents on the web which would be broken if the name placed into the DOM for attributes with names containing a colon, with an apparent prefix, and one that matched an enclosing xmlns: declaration were to be changed? - Sam Ruby
Re: [whatwg] Attribute for holding private data for scripting
Maciej Stachowiak wrote: On Apr 10, 2007, at 2:14 PM, Sam Ruby wrote: Anne van Kesteren wrote: On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby [EMAIL PROTECTED] wrote: How so? I missed the part where you wanted to change existing HTML parsers. I thought Hixie pointed out earlier (by means of examples) why we can't have namespace parsing in HTML. I suppose we can discuss it again... It is a recurring pattern. The first instance was we can't allow trailing slashes in tags, which was followed up by a carefully crafted and narrow set of exceptions, which was met with that works and was adopted. So... while it is clearly true the full extent of XML namespaces will never be supported in HTML5 (and for good reason), namespace qualified attributes allow extensibility in ways that prevent collisions. One of the first questions that would need to be answered: are there any existing documents on the web which would be broken if the name placed into the DOM for attributes with names containing a colon, with an apparent prefix, and one that matched an enclosing xmlns: declaration were to be changed? I think the problem here isn't compatibility with existing content, but rather ability to use the feature in new web content while still gracefully handling existing user agents. We wrote up some design principles for the HTML WG based on the WHATWG's working assumptions which might make this point more clear: http://esw.w3.org/topic/HTML/ProposedDesignPrinciples. While Don't Break The Web is a goal, so is Degrade Gracefully. To give a specific example: say I make my own mjsml prefix with namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an mjsml:extension attribute using getAttribute("mjsml:extension"). In HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", "extension"). And neither technique would work on both (at least as I understand your proposal). 
Here's a page I constructed, and tested on Firefox: http://intertwingly.net/stories/2007/04/10/test.html This page is meant to be served as application/xhtml+xml. Can you test it and see what results you get? Then let's discuss further. BTW, I intentionally don't have a completed proposal at this point. We need to explore what works and what doesn't work further. Now, we could extend getAttribute in HTML to do namespace lookup when given a name containing a colon and when namespace declarations are present, but then we would want to do it in XHTML as well. And using the short getAttribute call instead of a longer getAttributeNS with a namespace prefix might be unacceptable to XML fans. Regards, Maciej - Sam Ruby
Re: [whatwg] Pre element question
Ian Hickson wrote: On Fri, 19 Jan 2007, Sam Ruby wrote: People often code things like the following:

<pre>
one
two
three
</pre>

Visually, this ends up looking something like

+-------+
|       |
| one   |
| two   |
| three |
+-------+

with the following CSS rule: pre { border: solid 1px #000; } [in standards mode] I couldn't reproduce this. In Firefox trunk, with: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Cstyle%3Epre%20%7B%20border%3A%20solid%3B%20%7D%3C/style%3E%0A%3Cpre%3E%0Ax%0A%3C/pre%3E ...I get the leading newline dropped. Presumably then this is yet another difference between application/xhtml+xml and text/html. If it did do it, in HTML4, it would have been a standards mode bug (bug 2750, which I filed back in 1999). In HTML5, we're dropping that requirement, since everyone ignores it. However, we will, as you point out, have to introduce a special behaviour for a newline at the start of a pre element. IE actually does it for more than just the pre element (e.g. it does it for p, though not span) but compatibility with the Web only seems to require it for pre since that's all that the other browsers do it for. Fixed. Thanks! For reference, the current (and presumably as of now no longer valid) behavior of html5lib is as follows:

#document
| <!DOCTYPE HTML>
| <html>
|   <head>
|     <style>
|       "pre { border: solid; }"
|   <body>
|     <pre>
|       "x"

- Sam Ruby
[whatwg] Standard DOM Serialization? [was :Common Subset]
Michel Fortin wrote: I've started a wiki page about the common subset: http://wiki.whatwg.org/wiki/Common_Subset I'd like to explore this from a different angle. Libraries (like html5lib) will likely provide a means to serialize a DOM, and will presumably have unit tests. The question is: does it make sense to standardize what such a method produces? HTML allows variations on the case of elements, single vs. double vs. no quoting of attributes, etc. If such were standardized, how would the HTML5 canonical serialization differ from the XHTML5 canonical serialization (in fact, must they be different at all?) In any case, a desirable feature of such a serialization would be the ability to round trip. For HTML5, this would only apply to all valid HTML5 documents: as an example, one could artificially produce a DOM which contains an h1 inside the head element; if such a DOM were serialized and then parsed by an HTML5 parser, the DOM produced would differ, as well it should. If there is no interest in standardizing a serialization (or separate standard serializations for HTML5 and XHTML5), then this discussion belongs on [EMAIL PROTECTED] mailing list. - Sam Ruby
Re: [whatwg] Standard DOM Serialization? [was :Common Subset]
Anne van Kesteren wrote: On Sun, 10 Dec 2006 00:29:03 +0100, Sam Ruby [EMAIL PROTECTED] wrote: If there is no interest in standardizing a serialization (or separate standard serializations from HTML5 and XHTML5), then this discussion belongs on [EMAIL PROTECTED] mailing list. http://www.whatwg.org/specs/web-apps/current-work/#innerhtml I assume that you are trying to tell me something, but I am too dumb to understand it. That section says the innerHTML DOM attribute of all HTMLElement and HTMLDocument nodes returns A SERIALISATION of the node's children using the HTML syntax. [emphasis added] A given DOM may have multiple, valid, HTML5 serializations. I am asking whether there is interest in identifying ONE standard serialization that everybody who wishes to comply with could do so. - Sam Ruby
Re: [whatwg] Standard DOM Serialization? [was :Common Subset]
Henri Sivonen wrote: On Dec 10, 2006, at 02:09, Sam Ruby wrote: I am asking whether there is interest in identifying ONE standard serialization that everybody who wishes to comply with could do so. Why? For digital signatures? For comparing parse trees from different parsers? My train of thought started with the sharing of test cases, and when coupled with the discussion on the common subset; when put together I was wondering if there would be a relation between the two. I (obviously) hadn't considered innerHTML. *IF* there were interest in changing this (something which I presume is *NOT* the case) and *IF* a common subset between XHTML5 and HTML5 was viable (plausible but not certain) *THEN* the confusing difference in meaning between innerHTML in an XML and HTML context could be resolved. All told, seems rather unlikely, so nevermind. - Sam Ruby
Re: [whatwg] Inline SVG
Leons Petrazickis wrote: On 12/7/06, Alexey Feldgendler [EMAIL PROTECTED] wrote: On Mon, 04 Dec 2006 13:55:32 +0600, Ian Hickson [EMAIL PROTECTED] wrote: http://intertwingly.net/stories/2006/12/02/whatwg.logo Currently, there wouldn't be one. We could extend HTML5 to have some sort of way of doing this, in the future. (It isn't clear to me that we'd want to allow inline SVG, though. It's an external embedded resource, not a semantically-rich part of the document, IMHO.) People will do inline SVG anyway. If there won't be a straightforward way to do this, authors will use all kinds of dirty hacks, such as data: URIs and DOM manipulation. Personally, I don't think SVG content is inappropriate for HTML documents. It is no more presentational than the text itself: HTML doesn't try to structure natural language sentences by breaking them into grammar constructs, so an SVG image could be thought of as just an atomic phrase which only has defined semantics as a whole. How about this for HTML5:

<object type="image/svg+xml">
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
    <circle cx="100" cy="50" r="40" stroke="black" stroke-width="2" fill="red"/>
  </svg>
</object>

And this for XHTML5:

<object type="image/svg+xml">
  <![CDATA[
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
    <circle cx="100" cy="50" r="40" stroke="black" stroke-width="2" fill="red"/>
  </svg>
  ]]>
</object>

If that's over-complicating the semantics of object, we could introduce an inline xml tag that's similar to the inline script and style tags. It would have a type= attribute to specify the mimetype, and its contents would be within a CDATA block in XHTML5. First, why the different syntax, and in particular, why CDATA? One of the key advantages of SVG, as it exists today, in XHTML is that the SVG elements are in the DOM. Not as an opaque blob, but as a set of scriptable and stylable elements. Take a look at the following: http://developer.mozilla.org/en/docs/SVG_In_HTML_Introduction - Sam Ruby
Re: [whatwg] Test cases for parsing spec (Was: Re: Provding Better Tools)
Karl Dubost wrote: Sam, Le 6 déc. 2006 à 23:13, Sam Ruby a écrit : My original interest was to write a replacement for Python's SGMLLIB, i.e., one that was not based on the theoretical ideal of how SGML vocabularies work, but one based on the practical notion of how HTML actually is parsed. I'm not sure sgmllib would be the best target. Specifically if it's used in many other products. But maybe you are talking about a new library altogether. http://docs.python.org/lib/module-sgmllib.html 8.2 sgmllib -- Simple SGML parser This module defines a class SGMLParser which serves as the basis for parsing text files formatted in SGML (Standard Generalized Mark-up Language). In fact, it does not provide a full SGML parser -- it only parses SGML insofar as it is used by HTML, and the module only exists as a base for the htmllib module. Another HTML parser which supports XHTML and offers a somewhat different interface is available in the HTMLParser module. It seems a better candidate. http://docs.python.org/lib/module-HTMLParser.html 8.1 HTMLParser -- Simple HTML and XHTML parser New in version 2.2. This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. Unlike the parser in htmllib, this parser is not based on the SGML parser in sgmllib. I'm adding them to the list of HTML parsers. http://esw.w3.org/topic/HTMLAsSheAreSpoke htmllib is both based on sgmllib (and shares some of the same issues) and is a bit draconian. It is less suitable for consuming html as practiced than sgmllib. I was originally thinking about creating an htmllib2 much like there is a urllib2 (in the library) and an httplib2 (by Joe Gregorio). Though it now looks like it makes more sense to name it htmllib5, and potentially join forces with others who (may) have similar interests. - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
Ian Hickson wrote: The pingback specification does exactly what the trackback specification does, but without relying on RDF blocks in comments or anything silly like that. It just uses the Microformats approach, and is far easier to use, and doesn't require any additional bits to add to HTML. [offtopic] I'd never heard of pingback. I googled for it and found your website first, but couldn't find the RFC number. You have a copyright of 2002, and it appears that Trackback was also developed in 2002. So are you implying they should have used Pingback instead? It appears they were developed in parallel? They were made around the same time (Trackback was invented first). My point was just that Trackback is not a good example of why you need more attributes in HTML, since there are equivalent technologies that do it with existing markup and no loss of detail. I disagree. The pingback specification does NOT do exactly what the trackback specification does. Pingback discovery works for any media type, but does not deal with any granularity smaller than a URL. Trackback discovery is limited to (X)HTML, but can deal with multiple entries on a single page. Here's an example: http://scott.userland.com/2005/11/09.html - Sam Ruby
Re: [whatwg] Sanctity of MIME types
Ian Hickson wrote: On Mon, 4 Dec 2006, Sam Ruby wrote: Independent of what the specs say *MUST* happen, I'd like people to bring up one or more browsers with a URL from this list, and see if the browser asked them if they wanted to subscribe. Subscribe is not a normal feature associated with text/html, which is the Content-Type that you will find for each. Actually, this is what Web Apps 1.0 will require, I just haven't written the sniffing algorithm yet. This is the placeholder section: http://www.whatwg.org/specs/web-apps/current-work/#navigating Note the mention of RSS/Atom in the first red box. I have a request. It would be nice if the sniffing algorithm made an exception for text/plain. Use case: http://svn.smedbergs.us/wordpress-atom10/tags/0.6/wp-atom10-comments.php - Sam Ruby
Re: [whatwg] Test cases for parsing spec (Was: Re: Provding Better Tools)
James Graham wrote: Ian Hickson wrote: On Tue, 5 Dec 2006, James Graham wrote: As someone in the process of implementing an HTML5 parser from the spec, my _only_ complaint so far is that there aren't (yet) any testcases. If you could get together with the other people writing parsers and come up with a standard format for test cases, that would be really helpful. I have a few tests I could contribute, but I'd need a format to provide them in (they're currently not in a form that would be useful to you). Did you have a list for implementers somewhere? I think it would be a very worthwhile effort to come up with a set of implementation-independent, self-describing (i.e. where the testcase itself contains the expected parse tree in some form) testcases - but I think the discussion should be on a separate list. Count me in. This is actually closer to the original reason why I subscribed to this list. If given a few tests, I could convert them into a useful form, and this form could serve as a model for future tests. My original interest was to write a replacement for Python's SGMLLIB, i.e., one that was not based on the theoretical ideal of how SGML vocabularies work, but one based on the practical notion of how HTML actually is parsed. My background: I originally wrote most of the back end for the feed validator, and continue to be its primary maintainer. I also contribute to the universal feed parser. The format of the test cases for both validator and parser is very similar: a standalone document with a structured comment. In the structured comment is an assertion. In the validator's case, it describes a message that is, or is not, expected to occur. In the parser's case, it describes what amounts to an XPath expression. I do believe that a similar approach could work here, not for 100% of the test cases, but close enough to handle the bulk of the cases. The rest can be handled separately.
Additional things like mime type overrides could also be specified in this header. Samples: http://feedvalidator.org/testcases/ http://feedparser.org/tests/ My goal would be to produce something that I could use within the feedparser (and therefore, planet). - Sam Ruby
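The structured-comment idea described above can be sketched in a few lines. The field names (Description, Expect) and the exact layout are assumptions for illustration; the real feedvalidator and feedparser test headers differ in their details:

```python
import re

# A hypothetical self-describing test case: a structured comment carries
# the assertion (an XPath-like path and the expected text), followed by
# the markup under test.
SAMPLE = """\
<!--
Description: leading newline after a pre start tag is dropped
Expect: /html/body/pre = 'x\\n'
-->
<!DOCTYPE html><pre>
x
</pre>
"""

def read_header(doc):
    """Pull the key: value pairs out of the leading structured comment."""
    m = re.match(r"<!--\n(.*?)\n-->\n", doc, re.S)
    fields = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

header = read_header(SAMPLE)
print(header["Expect"])
```

A test runner would then parse the remainder of the document with the parser under test and evaluate the assertion against the resulting tree, which keeps each case standalone and implementation-independent.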
Re: [whatwg] Windows-1252 entities
Anne van Kesteren wrote: The section on handling entities should contain the following mapping: 128: 8364, 129: 65533, 130: 8218, 131: 402, 132: 8222, 133: 8230, 134: 8224, 135: 8225, 136: 710, 137: 8240, 138: 352, 139: 8249, 140: 338, 141: 65533, 142: 381, 143: 65533, 144: 65533, 145: 8216, 146: 8217, 147: 8220, 148: 8221, 149: 8226, 150: 8211, 151: 8212, 152: 732, 153: 8482, 154: 353, 155: 8250, 156: 339, 157: 65533, 158: 382, 159: 65533 ... mostly for legacy reasons. +1, though I would suggest one change: 159: 376 // Yuml; - Sam Ruby
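The mapping above translates directly into a table: when a numeric character reference names a C1 code point, substitute the character a Windows-1252 author actually meant. (Sam's suggested change would map 159 to 376, Ÿ, matching the Windows-1252 code chart at 0x9F; the table as posted maps it to U+FFFD.) A minimal sketch:

```python
# C1 code points, as seen in references like &#147;, mapped to the
# Windows-1252 characters authors intended (the table proposed above).
CP1252_OVERRIDE = {
    128: 8364, 129: 65533, 130: 8218, 131: 402, 132: 8222, 133: 8230,
    134: 8224, 135: 8225, 136: 710, 137: 8240, 138: 352, 139: 8249,
    140: 338, 141: 65533, 142: 381, 143: 65533, 144: 65533, 145: 8216,
    146: 8217, 147: 8220, 148: 8221, 149: 8226, 150: 8211, 151: 8212,
    152: 732, 153: 8482, 154: 353, 155: 8250, 156: 339, 157: 65533,
    158: 382, 159: 65533,
}

def fix_reference(codepoint):
    """Resolve a numeric character reference, applying the legacy override."""
    return chr(CP1252_OVERRIDE.get(codepoint, codepoint))

print(fix_reference(147))  # &#147; -> U+201C, a left double quotation mark
```

References outside the 128-159 range pass through unchanged, so the override is purely additive.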
Re: [whatwg] Sanctity of MIME types
Robert Sayre wrote: On 12/5/06, Sam Ruby [EMAIL PROTECTED] wrote: I have a request. It would be nice if the sniffing algorithm made an exception for text/plain. It would be nice, but Use case: http://svn.smedbergs.us/wordpress-atom10/tags/0.6/wp-atom10-comments.php Fixed in FF 2.0.0.1, btw. text/plain sniffing in Mozilla dates from this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=220807 1.120 [EMAIL PROTECTED] 2004-01-07 19:56 Work around misconfiguration in default Apache installs that makes it claim all sorts of stuff as text/plain. http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/uriloader/base/nsURILoader.cpp&rev=1.120#289 How was it fixed? Both so that Ian's eventual text can be consistent with the fix, and for my edification as I would love to be able to directly view my test cases again: http://feedvalidator.org/testcases/atom/3.1.1.1/escaped_text.xml - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
Ian Hickson wrote: On Tue, 5 Dec 2006, Sam Ruby wrote: xmlns attributes are invalid on HTML elements except html, and when found on unrecognized [elements] imply style=display:none unless you recognize the value of this attribute. There are millions of documents that would be broken by such a rule, so browser vendors couldn't actually deploy that, sadly. :-( Can you identify three independently produced ones? Sure. Here's one (many pages on that site have this problem): http://forskningsbasen.deff.dk/ddf/rec.external?id=auc107991 It has a block at the bottom that says: <copyright xmlns="" xml:lang="en">...<br>...<br>...</copyright> (Note the cunning mixing of XML-like syntax with HTML-like syntax.) Another: http://www.cms.alaswaq.net/save_print.php?save=1&cont_id=4372 A large chunk of the text on this page is inside elements with xmlns= set (from what I can tell, all the text above the double up chevron button thing is inside elements with xmlns=). A third one: http://www.homeaway.com/Varna/s/1453/fa/find.squery This one has markup like this (I can just imagine how this happened): <span>(<?xml version="1.0" encoding="UTF-8"?><fromRecord xmlns="http://wvrgroup.com/propertyom">1</fromRecord> - <?xml version="1.0" encoding="UTF-8"?><toRecord xmlns="http://wvrgroup.com/propertyom">10</toRecord> of <?xml version="1.0" encoding="UTF-8"?><hitCount xmlns="http://wvrgroup.com/propertyom">24</hitCount>)</span> Again, important text (it's the (1 - 10 of 24) text at the top right, clearly intended to be visible), which is wrapped in elements with xmlns= attributes. That's three. I found dozens more (and I only checked a few thousand pages at random), including: http://ise.uvic.ca/Theater/sip/person/7639/main.html The entire header text (John Epstein) on that page is all inside an element display_name which has an xmlns= attribute. http://global.yesasia.com/kr/artIdxDept.aspx/section-videos/code-c/aid-39826/ A bunch of snippets are inside elements with xmlns=.
http://intermezzo-weblog.blogspot.com/2005/05/o-caso-rondnia-e-mais.html Not clear if it was intentional here, but some of the visible text at the bottom right is in an xmlns= block. http://projects.teknowledge.com/DAML/Corpus/W/wrestling_match.html Unclear what they thought was going on here too, but the text at the top is inside an unknown element with xmlns= set. http://194.7.45.68/fr/item.php?text_id=51813&keyw=Snoop+Dogg There are eight bazillion xmlns= attributes in this file, but the copyright in particular uses an unknown HTML element with xmlns=. ...and I'll stop here, because that should be enough to convince you. :-) The common pattern that I see is that xmlns="". - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
Ian Hickson wrote: On Wed, 6 Dec 2006, Sam Ruby wrote: The common pattern that I see is that xmlns="". It's certainly the more common value, but it is by no means the only one, as you will see if you examine the various examples I gave in more detail. My bad. Point made. - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
Ian Hickson wrote: I don't see any documentation that requires XHTML to not support document.write, but it certainly is a reality that nobody has done so. http://www.whatwg.org/specs/web-apps/current-work/#document.write1 (I'd like to make it work, but can't work out how to specify it. If you have any ideas for actual concrete text for the spec, please let me know.) I would think that steps 2-7 of the innerHTML algorithm specified immediately below the target of this link would be preferred over always raise an exception. Again, while I have a great respect for you and your work, you are hardly representative of the majority of Web authors, which is who I have to primarily take into account when it comes to the spec. Agreed. - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
Ian Hickson wrote: On Tue, 5 Dec 2006, Sam Ruby wrote: Case in point: http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble In IE, there's some stray XHTML HTML and XHTML HTML XML text. This isn't acceptable to most people. It certainly isn't something that it would make sense to encourage. The worst possible outcome here would be for browsers like IE to start trying to parse this SVG in text/html, because the lack of any sensible parsing rules for it would guarantee that we're faced with even more tag soup, thus undoing all the work that the HTML5 spec is trying to do to get us past that. You are aware that I like to tweak IE users, right? With the current technology, this could have been avoided with a single div and two lines of CSS. And I am most capable of doing that. But that wouldn't help, e.g., Lynx users. Over a period of years, I would think that a requirement like the one below could be phased in (presuming that one could be found to work). I have no expectation that Lynx would ever support a real XHTML mode. In the longer run, I do believe that an architected simple rule like: xmlns attributes are invalid on HTML elements except html, and when found on unrecognized attributes imply style=display:none unless you recognize the value of this attribute. ... would channel those with insane desires to make extensions into doing so in a manner that is harmless. Such a rule might take a year or two to get widely deployed, but the worst feet-draggers won't be affected any worse than they were in the days when table was young. There are millions of documents that would be broken by such a rule, so browser vendors couldn't actually deploy that, sadly. :-( Can you identify three independently produced ones? BTW, I deeply respect the pushback that you give to everybody who thinks they want to have a say in the future of HTML. - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
that is meaningful to me. It would be better to have hard data to work with, rather than having to rely on our opinions of this. My own research does not suggest that most authors use tools. That over three quarters of pages have major syntactic errors leads me to suspect that tools are not going to save the syntax. +1. I'll add that most tools are created by fallible humans with only a shallow understanding of the relevant specifications. On Sat, 2 Dec 2006, Robert Sayre wrote: It would not take much to add an if the element has an 'xmlns' attribute to the A start tag token not covered by the previous entries state in How to handle tokens in the main phase section of the document. This would break millions of pages, sadly. There are huge volumes of pages that have bogus xmlns= attributes with all kinds of bogus values on the Web today. I worked for a browser vendor in the past few years that tried to implement xmlns= in text/html content, and found that huge amounts of the Web, including many major sites, broke completely. We can't introduce live xmlns= attributes to text/html. All I ask is that you keep an open mind while we collectively explore whether there are extremely selective and surgical changes that can be made to html5 -- like the change to allow empty element syntax only on a handful of elements. On Sat, 2 Dec 2006, Sam Ruby wrote: The question is: what would the HTML5 serialization be for the DOM which is internally produced by the script in the following HTML5 document? http://intertwingly.net/stories/2006/12/02/whatwg.logo Currently, there wouldn't be one. We could extend HTML5 to have some sort of way of doing this, in the future. (It isn't clear to me that we'd want to allow inline SVG, though. It's an external embedded resource, not a semantically-rich part of the document, IMHO.) 
When you couple this answer with the concept of a generalized [X]HTML toolchain, the inevitable tendency would be to want an HTML5 deserializer on one end and an XHTML5 serializer on the other end. And not just any XML serializer, but one that limited itself to a subset of XML that could safely be processed by HTML5 deserializers. If the spec explicitly disallows things useful to this toolchain, then the opportunity exists for somebody to move the discussion for what constitutes interop from what does the spec say to what does this toolchain support. As the set of DOMs that have a defined and interoperable HTML5 serialization grows, this picture changes to one in which having an HTML5 deserializer on one end and an HTML5 serializer on the other is increasingly attractive. On Sun, 3 Dec 2006, Sam Ruby wrote: In the hopes that it will bring focus to this discussion: http://wiki.whatwg.org/wiki/HtmlVsXhtml This has now been updated with a more complete list of differences. Thanks! - Sam Ruby
Re: [whatwg] several messages about XML syntax and HTML5
James Graham wrote: Elliotte Harold wrote: That means I have to send text/html to browsers (because that's the only thing they understand) and let my clients ignore that hint. No. As I understand it, the full chain of events should look like this:

[Internal data model in server]
              |
      HTML 5 Serializer
              |
          {Network}
              |
       HTML 5 Parser
              |
[Whatever client tools you like]

The only technical issue is that your HTML5 parser has to produce a data format that your other client tools like. If this involves the construction of an XML-like tree that's fine. But you should _never_ try to use an XML parser to produce the tree because it _will_ break with conforming HTML5 documents. Excellent ASCII art. This only works if the internal-data-model to HTML5 conversion is lossless. If it is not, people will find ways around it, with structured comments or by creating intentionally invalid HTML5 and relying on the error recovery that is either prescribed or observed to be commonly practiced. - Sam Ruby
[whatwg] Sanctity of MIME types
Here's a random half dozen examples, picked to show a bit of diversity: http://beta.versiontracker.com/mac/osx/home-edu/updates.rss http://city.piao.com.cn/rss.asp?85 http://feuerwehr-melle-de.server13031.isdg.de/index.php?id=199 http://hesten.innit.no/hru/rss.php?START=0&STOP=3 http://httablo.hu/pages/rss.php http://skopjeclubbing.com.mk/rss_djart.asp Independent of what the specs say *MUST* happen, I'd like people to bring up one or more browsers with a URL from this list, and see if the browser asked them if they wanted to subscribe. Subscribe is not a normal feature associated with text/html, which is the Content-Type that you will find for each. The point is not to label these guys bozos (as I said in previous messages, bozos outnumber you). But to get you to consider what browsers can, and will, do. In these days of GreaseMonkey and its brethren, the client is king. - - - Where does this leave HTML5? I am of the opinion that HTML5 should describe a set of rules that a compliant HTML5 parser should follow. The MIME and DOCTYPEs specified in the document should be recommendations. Something outside of the parser may choose to dispatch based on this information, but that's outside of the control of the parser. IMHO, the parser itself shouldn't complain when it finds an HTML4 DOCTYPE, or an XHTML2 DOCTYPE for that matter. Of course, a lot more HTML4 documents would be valid HTML5 than XHTML 2 documents. - Sam Ruby
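The kind of check browsers actually perform on those URLs can be sketched as content sniffing: peek at the start of a response served as text/html and decide whether it is really a feed. This is only an illustration of the idea, not the algorithm Web Apps 1.0 eventually specified, and `looks_like_feed` is a hypothetical name:

```python
def looks_like_feed(body):
    """Crude sniff: does a response claiming text/html actually begin
    with an RSS, Atom, or RDF feed document? A real sniffer would
    tokenize properly and also skip comments and doctypes."""
    head = body[:512].lstrip()
    if head.startswith(b"<?xml"):
        # Skip past an XML prolog, if present.
        head = head.split(b"?>", 1)[-1].lstrip()
    for marker in (b"<rss", b"<feed", b"<rdf:RDF"):
        if head.startswith(marker):
            return True
    return False

print(looks_like_feed(b'<?xml version="1.0"?><rss version="2.0"></rss>'))
```

The mere existence of such heuristics is the point above: whatever the Content-Type says, clients can and will second-guess it.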
[whatwg] wiki: HtmlVsXhtml
In the hopes that it will bring focus to this discussion: http://wiki.whatwg.org/wiki/HtmlVsXhtml - Sam Ruby
Re: [whatwg] Valid Unicode
On 12/1/06, Elliotte Harold [EMAIL PROTECTED] wrote: Henri Sivonen wrote: 6. Are noncharacters U+FDD0..U+FDEF allowed (?) 7. Are the noncharacters from the last two characters of each plane allowed (?) I don't have particularly strong feelings here. Putting those characters in HTML is a bad idea, but allowing them is not a problem for HTML5 to XHTML5 conversion and they aren't a common problem like C1 controls. FFFE and FFFF are specifically forbidden by XML so they should probably be forbidden here too. I think the others are allowed. Unicode (not XML) reserves U+D800 – U+DFFF as well as U+FFFE and U+FFFF. XML 1.0 only allows the following characters: [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF], [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], [#x10FFFE-#x10FFFF]. It would not be wise for HTML5 to limit itself to the more constrained character set of XML. In particular, the form feed character is pretty popular, This is yet another case where take HTML5, read it into a DOM, and serialize it as XML, and voilà: you have valid XHTML doesn't work. -- Elliotte Rusty Harold [EMAIL PROTECTED] Java I/O 2nd Edition Just Published! http://www.cafeaulait.org/books/javaio2/ http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/ - Sam Ruby
Re: [whatwg] markup as authored in practice
Robert Sayre wrote: SVG and MathML have a DOM. It wouldn't be that hard to serialize it as HTML5. Robert, if you will permit me, I would like to recast that into the form of a question, Jeopardy style. The question is: what would the HTML5 serialization be for the DOM which is internally produced by the script in the following HTML5 document? http://intertwingly.net/stories/2006/12/02/whatwg.logo Any takers? - Sam Ruby P.S. That script, complete with indentation and readable variable names, is still an order of magnitude smaller than http://whatwg.org/images/logo People could save bandwidth and reduce the number of HTTP requests (and not have to worry about hotlinking!) by dropping this script into their pages (of course, they could save even more bytes if there were a direct HTML5 serialization of this DOM, hence the question). P.P.S. I realize that not all browsers support these relatively new elements. It is my understanding that HTML5 will be introducing new elements too.
Re: [whatwg] markup as authored in practice
On 12/2/06, Robert Sayre [EMAIL PROTECTED] wrote: I don't think we need to settle this issue in December 2006, but I do think there is ample evidence of interoperable but undocumented behavior that HTML5 implementors will need to consider. Does the WHATWG have a process for capturing unresolved issues that need to be worked? - Sam Ruby
Re: [whatwg] markup as authored in practice
On 12/2/06, David Hyatt [EMAIL PROTECTED] wrote: Shipping Safari has no SVG support. WebKit nightlies do. That's the only reason the logo now renders correctly in the nightlies so that particular file is completely irrelevant to this discussion. I'm confused. Which file? And why is it completely irrelevant? dave
Re: [whatwg] Valid Unicode
On 12/2/06, Henri Sivonen [EMAIL PROTECTED] wrote: On Dec 2, 2006, at 18:24, Sam Ruby wrote: It would not be wise for HTML5 to limit itself to the more constrained character set of XML. In particular, the form feed character is pretty popular, BTW, I copy and pasted the wrong table. The characters I mentioned were discouraged (and include such things as Microsoft smart quotes mislabeled as iso-8859-1). The actual allowed set in XML 1.0 is as follows: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] For XML 1.1 the list is as follows: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] This is yet another case where take HTML5, read it into a DOM, and serialize it as XML, and voilà: you have valid XHTML doesn't work. What I am advocating is making sure that *conforming* HTML5 documents can be serialized as XHTML5 without dataloss. Then you will also need to disallow newlines in attribute values. In any case, I understand the desire; my read is that the WG's desire for backwards compatibility is higher. Limiting the character set to the allowable XML 1.1 character set should not be a problem for backwards compatibility purposes. - Sam Ruby
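The two Char productions quoted above translate directly into range checks, which makes the form-feed point easy to verify: U+000C is rejected by XML 1.0 but admitted by XML 1.1, which is why the XML 1.1 set "should not be a problem" for HTML5 content. A direct transcription:

```python
def xml10_char(cp):
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def xml11_char(cp):
    """XML 1.1 Char production:
    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# The form feed is the sticking point: common in HTML as practiced,
# forbidden by XML 1.0, admitted by XML 1.1.
print(xml10_char(0x0C), xml11_char(0x0C))  # -> False True
```

Both productions exclude the surrogates (U+D800-U+DFFF) and U+FFFE/U+FFFF, so serializing conforming HTML5 as XHTML5 would only need a decision on the C0 controls.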
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Henri Sivonen wrote: I don't think it has any actual technical merit OTOH, the blog.whatwg.org WordPress lipsticking drill was a total waste of time from a technical point of view. It was purely about public relations and politics. As an alternative to being perceived as a lipsticking drill, I would prefer that others felt that an important part of the spec authoring process includes what amounts to a feasibility study and hands on experimentation with extant authoring tools. I apologize if I've caused any ill will. I do believe that efforts to keep blog.whatwg.org and other sites to be valid relative to the current draft of HTML5 are important in order to keep perspective and to provide an example for others to learn from. Finally, I will express a bit of disappointment at seeing the WordPress folks prematurely being labeled bozos, and am disappointed to see portions of this discussion framed in terms that border on the discussions of epic battles with Zeldman. - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
On 11/30/06, Michel Fortin [EMAIL PROTECTED] wrote: We can't really have a document that is both HTML5 and XHTML5 at the same time if we keep the !DOCTYPE HTML declaration however. Why not? - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Benjamin Hawkes-Lewis wrote: On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would need to be discussed separately. I think having /two/ different serializations of Web Forms 2.0/Web Applications 1.0 is bad enough. To try and cater to what's effectively a third serialization compatible with both parsing methods is to reinvent the XHTML 1.0 as text/html mess. Serializing to multiple formats from a single source is, I think, a better model. Especially as embedded content may need different treatment too. That was not the intent of my suggestion. I am suggesting that HTML5 standardize on *one* format. One that comes as close as humanly possible to capturing the web as it is practiced in all of its glorious and often quite messy detail. Those that wish to serialize the DOM in other formats are certainly free to do so, but those formats aren't HTML5. I do have an opinion on how embedded content should be handled, but I am trying to focus on one issue at a time. If you would like a preview, take a peek at: http://planet.intertwingly.net/ http://planet.intertwingly.net/top100/ http://golem.ph.utexas.edu/~distler/planet/ Those three planets take input from a number of frankly grungy input sources and consistently produce well formed XML that often contains embedded MathML or SVG content. You are, of course, free to explore those pages and others; but, for now, I would like to focus on one question: If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? Lachlan's observations [...]
on what it would take to change the popular WordPress application to produce HTML5 compliant output As blogging software goes, WordPress is pretty good. But then blogging software is generally atrocious when it comes to markup. Trying to design an (X)HTML spec for a group of PHP developers who think it's persuasive to bang on about their dedication to web standards while serving their project's non-validating XHTML 1.1 homepage as text/html is doomed to failure. I'm pretty sure that the Mozilla home page was not created with WordPress, and I'm absolutely sure that the Microsoft home page was not. Conversely, if the major browser vendors have to choose between the web as it is commonly practiced, and a spec that doesn't reflect that reality, which one do you think they will choose? I'll argue that the choices aren't as black and white as either the question you posed above, or even the one that I did. No matter what the WHATWG spec says, each vendor will independently make a cost/benefit analysis as to how they should treat trailing slashes in elements like img. But before they do, this work group certainly can anticipate that question. What is the cost of accepting trailing slashes on elements which are always defined with a content model of empty, except when found in Attribute value (unquoted) state? What sites would be parsed differently based on this change? Are those differences in line with how existing browsers actually behave, or at odds with this behavior? - Sam Ruby
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Lachlan Hunt wrote: Sam Ruby wrote: In HTML5, there are a number of elements with a content model of empty: area, base, br, col, command, embed, hr, img, link, meta, and param. If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? If it's only allowed on empty elements (now known as singleton elements in the spec) then this isn't about changing the handling, it's just about defining what is and is not conforming. Exactly. I do not think it's a good idea to make the trailing slash conforming. Although it is harmless, it provides no additional benefit at all and it creates the false impression that the syntax actually does something. The fact is that authors already try things like <div/>, <p/> and even <a/>. I've seen all of those examples in the wild. See, for instance, the source of the XML 1.0 spec (and many others) which claim to be XHTML as text/html, littered with plenty of <a/> tags all throughout. If these are common, and implemented interoperably, then what is the harm? An example of something that is NOT implemented interoperably is <script src=.../>. In my book, a document that states that it always is a parse error to do something despite abundant evidence to the contrary is not as useful as one that says here are the places where it works, and here are the places where it does not. I've even come across various authors either thinking that does work, or (when they find out the truth) wondering why it doesn't. It's not a good idea to confuse them any more by giving the impression that it works for some elements but not others. It's better to just say it doesn't work at all and forbid it in all cases. That's a slippery slope. At the extreme, it leads to XHTML 2.0, where features that are thought to be problematic are removed. Think of the children. 
By contrast, in HTML5, I see a document that attempts to be considerably less judgemental, and considerably more resilient. Inside the comments in the HTML 5 document I see statistics lovingly cited. Example: <!-- As of 2005-12, studies showed that around 0.2% of pages used the image element. --> What percentage of pages use <img/> constructs? and all this is coupled with Lachlan's observations[3] on what it would take to change the popular WordPress application to produce HTML5 compliant output. That just illustrates a fundamental flaw in the way WordPress has been built. It is a perfect example of a CMS built by a bunch of bozos [1] and cannot be used as an excuse for allowing the syntax. Be careful when you patronize. Is there really any excuse for allowing <b><i>OMG!</b></i>? No, but HTML5 is willing to pinch its nose with thumb and forefinger and look the other way. It literally is not a battle worth fighting. As a side benefit of this change, I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would need to be discussed separately. No you couldn't, and how would that be a benefit if you could? XHTML 5 requires xmlns, HTML 5 forbids it. HTML 5 requires <!DOCTYPE html>, XHTML 5 doesn't (though it's still well-formed, so you could get away with it). The last I saw, HTML 5 is a working draft. Did I miss a memo? With Venus, I translate all content into a canonical well-formed XML format. This gives people who author filters far fewer random edge cases to worry about. I've already seen a lot of inventiveness when people find that they can apply off-the-shelf XML tools like XPath and XSLT. I'd gladly put in a <!DOCTYPE html> in my page, the question is: would the WHATWG be willing to meet me half way and allow xmlns attributes in a very select and carefully prescribed set of locations? 
By the way, my experience is that these types of conversations always start off bumpy not merely due to the well known limitation of email for conveying human emotion. The problem is deeper than that: there literally is no good place to start. The only way I know how to deal with that is to pose, and repeat, concrete and simple questions. And the one that I am posing with this thread is as follows: If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? [1] http://hsivonen.iki.fi/producing-xml/ - Sam Ruby
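The off-the-shelf tooling point made in this message (Venus normalizing grungy feeds into canonical well-formed XML so that filter authors can use XPath and XSLT) can be illustrated with Python's stdlib ElementTree. This is a minimal sketch, not Venus code: the entry markup and the variable names are invented, but the XHTML namespace URI is the real one. Once the content is guaranteed well-formed, finding every image is one namespace-qualified query rather than a regex over tag soup:

```python
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'

# A hypothetical entry, already normalized to well-formed XHTML
# by an aggregator in the style of Venus.
entry = ET.fromstring(
    f'<div xmlns="{XHTML}">'
    '<p>Intro</p>'
    '<p><img src="chart.png" alt="chart"/></p>'
    '</div>'
)

# One namespace-qualified query finds every image, no matter how
# messy the original feed was before normalization.
srcs = [img.get('src') for img in entry.iter(f'{{{XHTML}}}img')]
print(srcs)  # ['chart.png']
```

If the input were tag soup rather than well-formed XML, `ET.fromstring` would simply raise, which is exactly why the normalization step has to come first.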
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Anne van Kesteren wrote: On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote: Perhaps it would be better to prove that the current rules result in easy explanations. What would the text of a bug filed on WordPress look like? Let's assume you actually want them to fix it, not just make a point. The bug would request that WordPress doesn't try to output XML for the text/html media type. That seems to be the problem here. If the code for WordPress fit on a page, that suggestion would be easy to implement. As it stands now, it appears that several hundred lines of code would need to change. And in each case, the code would need to be aware of the content type in effect. In some cases, that information may not be available. In fact, that may not have been determined yet. One way cross-cutting concerns such as this one are often handled is to simply capture the output and post-process it. Lachlan opted to do so with the WHATWG Blog. The first pass for things like this generally takes the form of simple pattern matching and regular expressions. Often this evolves. What would be better is something that could take that string and produce a DOM, from which a correct serialization can take place. Now, what type of parser would you use? HTML5's rules come tantalizingly close to handling this situation, except for a few cases involving tags that are self-closing... - Sam Ruby
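A first-pass filter of the pattern-matching sort described above might look like the following. This is a hypothetical sketch, not the WHATWG Blog's actual filter; the function name and the regex are mine. It expands XML-style self-closing tags on non-void elements into open/close pairs while leaving the always-empty elements alone:

```python
import re

# The always-empty (void) elements from the HTML5 draft.
VOID = {'area', 'base', 'br', 'col', 'command', 'embed',
        'hr', 'img', 'link', 'meta', 'param'}

SELF_CLOSING = re.compile(
    r"<([a-zA-Z][a-zA-Z0-9]*)"             # tag name
    r"((?:[^>\"']|\"[^\"]*\"|'[^']*')*?)"  # attributes (quote-aware)
    r"\s*/>")

def expand_self_closing(html: str) -> str:
    """Rewrite <em .../> into <em ...></em>, but leave void elements as-is."""
    def repl(m):
        tag, attrs = m.group(1), m.group(2)
        if tag.lower() in VOID:
            return m.group(0)  # harmless on void elements; keep it
        return f'<{tag}{attrs}></{tag}>'
    # Note: like any regex pass over tag soup, this mishandles unquoted
    # attribute values that end in a slash -- exactly the ambiguity this
    # thread is about. The robust fix is a parser, not a pattern.
    return SELF_CLOSING.sub(repl, html)

print(expand_self_closing('<p>see <a href="/x"/> and <br/></p>'))
# <p>see <a href="/x"></a> and <br/></p>
```

The caveat in the final comment is the point of the message above: the regex stage is a stopgap, and a parse-to-DOM-then-reserialize stage is the evolution it tends toward.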
Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?
Anne van Kesteren wrote: What do you mean with implemented interoperably? produce the same DOM - Sam Ruby
[whatwg] Allow trailing slash in always-empty HTML5 elements?
In response to a weblog post of mine[1], Ian stated[2]: we can’t make trailing “/” characters meaningful — it would change how about 49% of the Web is parsed Just to make sure that we are talking about the same thing, let me make a much more carefully scoped proposal. In HTML5, there are a number of elements with a content model of empty: area, base, br, col, command, embed, hr, img, link, meta, and param. If HTML5 were changed so that these elements -- and these elements alone -- permitted an optional trailing slash character, what percentage of the web would be parsed differently? Can you cite three independent examples of existing websites where the parsing would diverge? As an additional constraint, I am explicitly suggesting that the Attribute value (unquoted) state not be changed - slashes in this state would continue to be appended to the current attribute's value. The basis for my question is the observation that the web browsers that I am familiar with apparently already operate in this fashion, this usage seems to have crept into quite a number of diverse places, and all this is coupled with Lachlan's observations[3] on what it would take to change the popular WordPress application to produce HTML5 compliant output. As a side benefit of this change, I believe that I could modify my weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the embedded SVG content, something that would need to be discussed separately. - Sam Ruby [1] http://intertwingly.net/blog/2006/11/28/Meet-the-New-Boss [2] http://intertwingly.net/blog/2006/11/28/Meet-the-New-Boss#c1164743684 [3] http://intertwingly.net/blog/2006/11/24/Feedback-on-XHTML#c1164720800
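The Attribute value (unquoted) constraint in the proposal above can be observed with Python's stdlib html.parser, whose tolerant tokenizer behaves the same way on this point: a slash inside an unquoted attribute value becomes part of the value, while a slash after a quoted value merely self-closes the tag. The markup strings here are invented for illustration:

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag (self-closing or not) with its attributes."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # handle_startendtag() delegates here by default, so <img .../>
        # and <img ...> both land in self.tags.
        self.tags.append((tag, dict(attrs)))

p = TagCollector()
p.feed('<img src=foo/>')    # unquoted value: the slash joins the value
p.feed('<img src="bar"/>')  # quoted value: the slash just ends the tag
print(p.tags)  # [('img', {'src': 'foo/'}), ('img', {'src': 'bar'})]
```

In other words, leaving the Attribute value (unquoted) state untouched, as the proposal suggests, costs nothing for authors who quote their attributes and preserves today's behavior for those who don't.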