[whatwg] [url] merge progress

2014-11-30 Thread Sam Ruby
This work is now at a stage where I encourage wider review.  Current 
results can be found here:


https://specs.webplatform.org/url/webspecs/develop/

This work is proceeding here:

https://github.com/webspecs/url

My preferred method of input is pull requests, but bug reports, issues, 
emails or other mechanisms are fine.


Once this work gets to the point where Anne indicates it is ready, 
the plan is to pull this work into the WHATWG GitHub repository, where 
it will be built with the WHATWG specification templates.  An example of 
what that will look like can be found here:


http://intertwingly.net/projects/pegurl/url-merge.html

See the readme for more information on webspecs: 
https://github.com/webspecs/url#readme


- Sam Ruby


[whatwg] URL Statics questions

2014-11-28 Thread Sam Ruby

https://url.spec.whatwg.org/#url-statics

It is not clear to me what the use case is for these methods, which 
leads me to a number of questions.


First, what does static mean in this context?  Is this the C++ meaning 
of static, i.e., class methods?  If so, would the two methods being 
described map to the following in JavaScript?


  URL.domainToASCII = function(domain) {...}
  URL.domainToUnicode = function(domain) {...}

Are these methods implemented by any current browser?  Assuming we are 
talking about URL.domainToASCII, I didn't see it implemented in the 
first two browsers I checked (Chrome and Firefox).


Now to my real question: assuming we do IPv4 parsing per 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=26431 (and incidentally, 
matching the Chrome implementation), what should these static methods 
return in the case of IPv4 addresses?
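For reference, the IPv4 parsing described in bug 26431 did eventually land in the URL Standard; a sketch of the resulting host canonicalization, using Node's WHATWG-conformant URL implementation (not necessarily what 2014-era parsers did):

```javascript
// Non-canonical IPv4 hosts (hex parts, fewer than four parts) are
// normalized to dotted-decimal by a WHATWG-conformant host parser.
console.log(new URL('http://0x7f.1/').hostname);       // '127.0.0.1'
console.log(new URL('http://192.168.257/').hostname);  // '192.168.1.1'
```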


The reason why I'm asking is that I'm working on rewriting the URL 
parser per https://www.w3.org/Bugs/Public/show_bug.cgi?id=25946, and 
would like to update the https://url.spec.whatwg.org/#host-parsing to be 
consistent.


- Sam Ruby


Re: [whatwg] URL interop status and reference implementation demos

2014-11-22 Thread Sam Ruby



On 11/21/2014 05:32 PM, Domenic Denicola wrote:

From: Sam Ruby [mailto:ru...@intertwingly.net]


I guess I didn't make the point clearly before.  This is not a
waterfall process where somebody writes down a spec and expects
implementations to eventually catch up.  That line of thinking
sometimes leads to browsers closing issues as WONTFIX.  For
example:

https://code.google.com/p/chromium/issues/detail?id=257354

Instead I hope that the spec is open to change (and, actually, the
list of open bug reports is clear evidence that this is the case),
and that implies that differing from the spec isn't
automatically a problematic case.  More precisely: it may
be the spec that needs to change.


For sure! But, I would like to see where the spec differs from
implementations, so that I can see what parts of the spec needs to be
changed.

Right now, when I read one row that says "user agents with differences:
testdata chrome firefox ie" versus one that says "user agents with
differences: ie safari", I can't tell which user agents are aligned
with the spec and which aren't. So I can't tell if the spec needs to
change, or if it doesn't.

I'd prefer some kind of view where it said "user agents with
differences from the spec: x, y, z". Then if the answer was chrome,
firefox, ie, clearly the spec needs to change; if the answer was
chrome, then clearly Chrome needs to change and we can leave the
spec alone.


Perhaps this is the view you are looking for?

http://w3c.github.io/test-results/url/all.html

Note that on that view you can click through to see how the user agent 
you are currently using differs from the spec.



I'm gathering this is very different from the data the table is
currently showing, but it seems I don't actually understand what the
table is currently showing anyway, so I don't understand how I could
use the table's current data to guide spec changes.


To reduce confusion, I've removed the list when there isn't consensus. 
I've also changed the colors on the browser-results page.


Green means all is good.

Yellow means that one or two browsers differ, and those are noted.

Red means that there isn't consensus.  I'm no longer showing which user 
agents differ.


If you drill down, I'm still showing "testdata" as a user agent; 
"reference implementation" would be a better description.  I'll probably 
fix that later.


- Sam Ruby


Re: [whatwg] URL interop status and reference implementation demos

2014-11-19 Thread Sam Ruby

On 11/18/2014 06:37 PM, Domenic Denicola wrote:

Really exciting stuff :D. I love specs that have reference implementations and 
strong test suites, and am hopeful that as URL gets fixes and updates these 
stay in sync. E.g. normal software development practices of not changing 
anything without a test, and so on.


Thanks!  I've tried to follow the example that the streams spec is 
providing.  Including the naming of directories.



From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of Sam Ruby


https://url.spec.whatwg.org/interop/urltest-results/


I'd be interested in a view that only contains refimpl, ie, safari, firefox, 
and chrome, so we could compare the URL Standard with living browsers.


Done, sort-of: https://url.spec.whatwg.org/interop/browser-results/

I note that given the small amount of data, the 'agents with 
differences' column is less useful than it could be.  Basically, a 
reddish color should be interpreted to mean that we don't have three out 
of the four browsers agreeing on all values.



I'd like to suggest that the following test be added:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt

And that the expected results be changed on the following tests:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/patchtestdata.txt

Note: I appear to have direct update access to urltestdata.txt, but I would 
appreciate a review before I make any updates.


A pull request with a nice diff would be easy to review, I think?


Done.  https://github.com/w3c/web-platform-tests/pull/1402


The setters also have unit tests:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/urlsettest.js


So good!

For streams I am running the unit tests against my reference implementation on 
every commit (via Travis). Might be worth setting up something similar.


That's first on my todo list post merge:

http://intertwingly.net/projects/pegurl/url.html#postmerge

Basically, I'd rather do that on the whatwg branch than on the rubys 
branch, but my stuff isn't quite ready to merge.



As a final note, the reference implementation has a list of known differences 
from the published standard:

intertwingly.net/projects/pegurl/url.html


Hmm, so this isn't really a reference implementation of the published standard 
then? Indeed looking at the code it seems to not follow the algorithms in the 
spec at all :(. That's a bit unfortunate if the goal is to test that the spec 
is accurate.

I guess 
https://github.com/rubys/url/tree/peg.js/reference-implementation#historical-notes
 explains that. Hmm. In that case, I'm unclear in what sense this is a 
reference implementation, instead of an alternate algorithm.


I answered that separately: 
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0129.html


- Sam Ruby


Re: [whatwg] URL interop status and reference implementation demos

2014-11-19 Thread Sam Ruby

On 11/19/2014 09:32 AM, Domenic Denicola wrote:

From: Sam Ruby [mailto:ru...@intertwingly.net]


Done, sort-of: https://url.spec.whatwg.org/interop/browser-results/


Excellent, this is a great subset to have.

I am curious what it means when testdata is in the "user agents with 
differences" column. Isn't testdata the base against which the user agents are compared?


These results compare user agents against each other.  The testdata is 
provided for reference.


I am not of the opinion that the testdata should be treated as anything 
other than a proposal at this point.  Or to put it another way, if 
browser behavior is converging to something other than what the spec 
says, then perhaps it is the spec that should change.



Done.  https://github.com/w3c/web-platform-tests/pull/1402


Interesting, I did not realize that testdata was part of web-platform-tests 
instead of the URL repo alongside all your other interop material. I wonder if 
we should investigate ways to centralize inside the URL repo, e.g. having 
whatwg/url be a submodule of w3c/web-platform-tests?


web-platform-tests is huge.  I only need a small piece.  So for now, I'm 
making do with a wget in my Makefile, and two patch files which cover 
material that hasn't yet made it upstream.


- Sam Ruby



Re: [whatwg] URL interop status and reference implementation demos

2014-11-19 Thread Sam Ruby

On 11/19/2014 09:55 AM, Domenic Denicola wrote:

From: Sam Ruby [mailto:ru...@intertwingly.net]


These results compare user agents against each other.  The testdata
is provided for reference.


Then why is testdata listed as a user agent?


It clearly is mislabeled.  Pull requests welcome.  :-)


I am not of the opinion that the testdata should be treated as
anything other than a proposal at this point.  Or to put it
another way, if browser behavior is converging to something other
than what the spec says, then perhaps it is the spec that
should change.


Sure. But I was hoping to see a list of user agents that differed
from the test data, so we could target the problematic cases. As is,
I'm not sure how to interpret a row that reads "user agents with
differences: testdata chrome firefox ie" versus one that reads "user
agents with differences: ie safari".


I guess I didn't make the point clearly before.  This is not a waterfall 
process where somebody writes down a spec and expects implementations to 
eventually catch up.  That line of thinking sometimes leads to browsers 
closing issues as WONTFIX.  For example:


https://code.google.com/p/chromium/issues/detail?id=257354

Instead I hope that the spec is open to change (and, actually, the list 
of open bug reports is clear evidence that this is the case), and that 
implies that differing from the spec isn't automatically a problematic 
case.  More precisely: it may be the spec that needs to 
change.



web-platform-tests is huge.  I only need a small piece.  So for
now, I'm making do with a wget in my Makefile, and two patch
files which cover material that hasn't yet made it upstream.


Right, I was suggesting the other way around: hosting the
evolving-along-with-the-standard testdata.txt inside whatwg/url, and
letting web-platform-tests pull that in (with e.g. a submodule).


Works for me :-)

That being said, there seems to be a highly evolved review process for 
test data, and on the face of it, that seems to be something worth 
keeping.  Unless there is evidence that it is broken, I'd be inclined to 
keep it as it is.


In fact, once I have refactored the test data from the javascript code 
in my setter tests, I'll likely suggest that it be added to 
web-platform-tests.


- Sam Ruby


[whatwg] URL interop status and reference implementation demos

2014-11-18 Thread Sam Ruby
Anne has kindly given me access to the directory on the server where the 
url.spec lives.  I've started to move some of my work there.


https://url.spec.whatwg.org/interop/urltest-results/

Note that the expected results come from:

https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt

I'd like to suggest that the following test be added:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt

And that the expected results be changed on the following tests:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/patchtestdata.txt

Note: I appear to have direct update access to urltestdata.txt, but I 
would appreciate a review before I make any updates.


- - -

I also have a reference implementation I've been working on.  First, a 
basic interface:


https://url.spec.whatwg.org/reference-implementation/liveview.html

A second interface allows you to override the base:

https://url.spec.whatwg.org/reference-implementation/liveview2.html

A third interface allows you to see what happens when you call 
individual setters:


https://url.spec.whatwg.org/reference-implementation/liveview3.html

Note: while all versions are a work in progress, this is more true for 
liveview3 than the others.  In particular, this was created today, and 
only has href, protocol, and username roughed in at the moment.


The setters also have unit tests:

https://github.com/rubys/url/blob/peg.js/reference-implementation/test/urlsettest.js

I'm planning to refactor these tests, separating the test data from the 
code so that other libraries and user agents can test against the same 
data.  Once I do, I'll publish interop test results for these setters too.


As a final note, the reference implementation has a list of known 
differences from the published standard:


intertwingly.net/projects/pegurl/url.html

- Sam Ruby


Re: [whatwg] URL interop status and reference implementation demos

2014-11-18 Thread Sam Ruby

On 11/18/2014 06:37 PM, Domenic Denicola wrote:



As a final note, the reference implementation has a list of known
differences from the published standard:

intertwingly.net/projects/pegurl/url.html


Hmm, so this isn't really a reference implementation of the published
standard then? Indeed looking at the code it seems to not follow the
algorithms in the spec at all :(. That's a bit unfortunate if the
goal is to test that the spec is accurate.


Let me help by connecting the dots.

Bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=25946 is open to 
rewrite the URL parser.  Comment 8 and 9 endorse the following work in 
progress:


http://intertwingly.net/projects/pegurl/url.html

Just today, I integrated my Anolis-to-Bikeshed conversion work, which 
is a prerequisite for completing this.


The reference implementation is a faithful attempt to implement the 
reworked parsing logic.  In fact, parts of the specification and parts 
of the reference implementation are generated from a single file:


https://raw.githubusercontent.com/rubys/url/peg.js/url.pegjs

Hopefully this all will land in the live version of the spec shortly; 
meanwhile it attempts to skate to where the puck will be.  In each 
case of a known difference in published results, I've linked to the 
rationale for the change (generally to an indication that Anne agrees).


I hope this helps.

- Sam Ruby





Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/03/2014 10:32 AM, Anne van Kesteren wrote:

On Mon, Nov 3, 2014 at 4:19 PM, David Singer sin...@apple.com wrote:

The readability is much better (I am not a fan of the current trend of writing 
specifications in pseudo-basic, which makes life easier for implementers and 
terrible for anyone else, including authors), and I also think that an approach 
that doesn’t obsolete RFC 3986 is attractive.


Is Apple interested in changing its URL infrastructure to not be
fundamentally incompatible with RFC 3986 then?

Other than slightly different eventual data models for URLs, which we
could maybe amend RFC 3986 for IETF gods willing, I think the main
problem is that a URL that goes through an RFC 3986 pipeline cannot go
through a URL pipeline. E.g. parsing ../test against
foobar://test/x gives wildly different results. That is not a state
we want to be in, so something has to give.


I would hope that everybody involved would enter into this discussion 
being willing to give a bit.


To help foster discussion, I've made an alternate version of the live 
URL parser page, one that enables setting of the base URL:


http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x

Of course, if there are any bugs in the proposed reference 
implementation, I'm interested in that too.


- Sam Ruby




Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/04/2014 09:32 AM, Anne van Kesteren wrote:

On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby ru...@intertwingly.net wrote:

To help foster discussion, I've made an alternate version of the live URL
parser page, one that enables setting of the base URL:

http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x

Of course, if there are any bugs in the proposed reference implementation,
I'm interested in that too.


Per the URL Standard, resolving "x" against "test:test" results in
failure, not "test:///x".


Fixed.  Thanks!
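The corrected behavior is easy to check against a current WHATWG-conformant implementation (a sketch using Node's URL, where the constructor signals parse failure by throwing):

```javascript
// Resolving the relative reference 'x' against the opaque-path base
// 'test:test' must fail, rather than producing 'test:///x'.
let failed = false;
try {
  new URL('x', 'test:test');
} catch (e) {
  failed = e instanceof TypeError;
}
console.log(failed); // true
```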

Perhaps over time we could add this to urltestdata.txt[1]?  Meanwhile, 
I'll track such proposed additions here:


https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt

- Sam Ruby

[1] 
https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/04/2014 11:25 AM, Domenic Denicola wrote:

From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of David Singer


(I don't have IE to hand at the moment).


I tried to test IE but unfortunately it looks like the URL components from DOM 
properties part of the demo page does not work in IE, I think because IE doesn't 
support document.baseURI.


Try experimenting with a base URL using an http scheme.

If you look closely at the source, you will see that the function 
rebase sets both document.baseURI and the href attribute on the base 
element.  The latter is sufficient for non-IE browsers.  I had to add 
the former to get IE working.


But, as you undoubtedly have noted, unknown base schemes seem to cause 
IE to ignore the base URL entirely.


- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-02 Thread Sam Ruby

On 11/02/2014 02:32 PM, Graham Klyne wrote:

On 01/11/2014 00:01, Sam Ruby wrote:


3) Explicitly state that canonical URLs (i.e., the output of the URL
parse step)
not only round trip but also are valid URIs.  If there are any RFC
3986 errata
and/or willful violations necessary to make that a true statement, so
be it.


It's not clear to me what it is that might be willfully violated.


Perhaps nothing.


Specifically, I find the notion of relative scheme in  [1] to be, at
best, confusing, and at worst something that could break a whole swathe
of existing URI processing.  I don't know which, as on a brief look I
don't understand what [1] is trying to say here, and I lack time (and
will) to dive into the arcane style used for specifying URLs.


First, I'm assuming that by [1], you mean 
https://url.spec.whatwg.org/#relative-scheme


Second, I have no idea how a specification that essentially says "here's 
what a set of browsers, languages, and libraries are converging on to 
convert URLs into URIs" can break URIs.


Third, here's a completely different approach to defining URLs that 
produces the same results (modulo one parse error that Anne agrees[2] 
should be changed in the WHATWG spec):


http://intertwingly.net/projects/pegurl/url.html#url

If for some reason you don't find that to be to your liking, I'll be 
glad to try to meet you half way.  I just need something more to go on 
than arcane.



I think there may be a confusion here between syntax and
interpretation.  When the term relative is used in URI/URL context, I
immediately think of relative reference per RFC3986.   I suspect what
is being alluded to is that some URI schemes are not global in the
idealized sense of URIs as a global namespace - file:///foo dereferences
differently depending on where it is used - the relativity here being in
the relation between the URI/URL and the thing identified, with respect
to where the URI is actually processed.


If you find it confusing, perhaps others will too.  Concrete suggestions 
on what should be changed would be helpful.



To change the syntactic definition of relative reference to include
things like file: and ftp: URIs would cause all sorts of breakage, and
require significant updating of the resolution algorithm in RFC3986
(more than would be appropriate for a mere erratum, IMO).  I'm hoping
this is not the kind of willful violation that is being contemplated here.


Note that in the reformulated grammar, file is no longer treated the 
same as other types of relative references.  I am not wedded to any of 
those terms; if you suggest better ones I'll accommodate.


If errata can be produced expeditiously for RFC3986, then there 
shouldn't be any need for willful violations.



#g
--


[2] 
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0267.html


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby ru...@intertwingly.net wrote:

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week.  URIs in general, and this
proposal specifically, will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes.


It doesn't say that. (We should perhaps try to find some way to make
{scheme}:// syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.


See previous threads on the subject. The data models are incompatible,
at least around %, likely also around other code points. It also
seems unacceptable to require two parsers for URLs.


Acknowledging that other parsers exist is quite a different statement 
than requiring two parsers.  I'm only suggesting the former.


As a concrete statement, a compliant implementation of HTML would 
require a URL parser, but not a URI parser.


Also as a concrete statement, such a user agent will interact, primarily 
via the network, with other software that will interpret the 
canonicalized URLs as if they were URIs.


That may not be as we would wish it to be.  But it would be a disservice 
to everyone to document how we would wish things to be rather than how 
they actually are (and, by all indications, are likely to remain for the 
foreseeable future).



3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.


It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.


I do plan to work with others to figure out the delta.  As to the data 
models, at the present time -- and without having actually done the 
necessary analysis -- I am not aware of a single case where they would 
be different.  Undoubtedly we will be able to quickly find some, but 
even so, I would assert that the following statements will remain true 
for the domain of canonicalized URLs, by which I mean the set of 
possible outputs of the URL serializer:


1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986 and URL compliant parsers would interpret the same bytes in 
such outputs as delimiters, schemes, paths, fragments, etc.


3) as to data models, the URL Standard is silent as to how such bytes 
are to be interpreted.  As to the meaning of '%', both the URL Standard 
and RFC 3986 recognize that encodings other than utf-8 exist, and that 
such will affect the interpretation of percent-encoded byte sequences.


- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 7:56 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby ru...@intertwingly.net wrote:

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

It doesn't say that. (We should perhaps try to find some way to make
{scheme}:// syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


Not sure what you're referring to.


https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04


I just gave you one, %. E.g. "http://example.org/?%" does not have
an RFC 3986 representation.


Here's the output of a URL parser (the one I chose was Firefox):

  new URL("http://example.com/?%").search
  "?%"

Here's the output of a URI parser:

  $ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"
  "%"

I also assert that such a URL round-trips a URL parse/serialize sequence.

- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/01/2014 07:18 PM, Barry Leiba wrote:

Thanks, Sam, for this great summary -- I hadn't taken notes, and was
hoping that someone who was (or who has a better memory than I) would
post something.

One minor tweak, at the end:


More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.


Errata to 3986, that is, not 3987.  After this, 3987 will be
considered obsolete (the IESG might move to mark it Historic, or
some such).


Thanks for the correction.  I did indeed mean errata to 3986.

- Sam Ruby


Barry, IETF Applications AD

On Fri, Oct 31, 2014 at 8:01 PM, Sam Ruby ru...@intertwingly.net wrote:

bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.

- - -

I took the opportunity this week to meet with a number of parties interested
in the topic of URLs including not only a number of Working Groups, AC and
AB members, but also members of the TAG and members of the IETF.

Some of the feedback related to the proposal I am working on[1].  Some of
the feedback related to mechanics (example: employing Travis to do build
checks, something that makes more sense on the master copy of a given
specification than on a hopefully temporary branch).  These are not the
topics of this email.

The remaining items are more general, and are the subject of this note.  As
is often the case, they are intertwined.  I'll simply jump into the middle
and work outwards from there.

---

The nature of the world is that there will continue to be people who define
more schemes.  A current example is http://openjdk.java.net/jeps/220 (search
for "New URI scheme for naming stored modules, classes, and resources").
And people who are doing so will have a tendency to look to the IETF.

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week[2].  URIs in general, and this
proposal specifically, will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes[3].

The obvious disconnect is that this is a registry for URI schemes, not URLs.
It looks to me like making a few, small, surgical updates to the URL
Standard would stitch all this together.

1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

2) Reference draft-ietf-appsawg-uri-scheme-reg in
https://url.spec.whatwg.org/#url-writing as the way to register schemes,
stating that the set of valid URI schemes is the set of valid URL schemes.

3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.

That's it.  The rest of the URL specification can stand as is.

What this means operationally is that there are two terms, URIs and URLs.
URIs would be a legacy, academic topic that may be of relevance to some
(primarily back-end server) applications.  URLs are what most people, and
most applications, will be concerned with.  This includes all the
specifications which today reference IRIs (for example, RFC 4287, Atom).

My sense was that all of the people I talked to were generally OK with this,
and that we would be likely to see statements from both the IETF and the W3C
TAG along these lines mid November-ish, most likely just after IETF meeting
91.

More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.

- Sam Ruby

[1] http://intertwingly.net/projects/pegurl/url.html
[2] https://www.ietf.org/meeting/91/index.html
[3] https://url.spec.whatwg.org/#relative-scheme





[whatwg] [url] Feedback from TPAC

2014-10-31 Thread Sam Ruby

bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.

- - -

I took the opportunity this week to meet with a number of parties 
interested in the topic of URLs including not only a number of Working 
Groups, AC and AB members, but also members of the TAG and members of 
the IETF.


Some of the feedback related to the proposal I am working on[1].  Some 
of the feedback related to mechanics (example: employing Travis to do 
build checks, something that makes more sense on the master copy of a 
given specification than on a hopefully temporary branch).  These are 
not the topics of this email.


The remaining items are more general, and are the subject of this note. 
 As is often the case, they are intertwined.  I'll simply jump into the 
middle and work outwards from there.


---

The nature of the world is that there will continue to be people who 
define more schemes.  A current example is 
http://openjdk.java.net/jeps/220 (search for "New URI scheme for naming 
stored modules, classes, and resources").  And people who are doing so 
will have a tendency to look to the IETF.


Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week[2].  URIs in general, and 
this proposal specifically, will be discussed, and for that reason now 
would be a good time to provide feedback.  I've only quickly scanned it, 
but it appears sane to me in that it basically says that new schemes 
will not be viewed as relative schemes[3].


The obvious disconnect is that this is a registry for URI schemes, not 
URLs.  It looks to me like making a few, small, surgical updates to the 
URL Standard would stitch all this together.


1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

2) Reference draft-ietf-appsawg-uri-scheme-reg in 
https://url.spec.whatwg.org/#url-writing as the way to register schemes, 
stating that the set of valid URI schemes is the set of valid URL schemes.


3) Explicitly state that canonical URLs (i.e., the output of the URL 
parse step) not only round trip but also are valid URIs.  If there are 
any RFC 3986 errata and/or willful violations necessary to make that a 
true statement, so be it.


That's it.  The rest of the URL specification can stand as is.

What this means operationally is that there are two terms, URIs and 
URLs.  URIs would be a legacy, academic topic that may be of 
relevance to some (primarily back-end server) applications.  URLs are 
what most people, and most applications, will be concerned with.  This 
includes all the specifications which today reference IRIs (for 
example, RFC 4287, Atom).


My sense was that all of the people I talked to were generally OK with 
this, and that we would be likely to see statements from both the IETF 
and the W3C TAG along these lines mid November-ish, most likely just 
after IETF meeting 91.


More specifically, if something along the lines I describe above were 
done, the IETF would be open to the idea of errata to RFC3987 and 
updating specs to reference URLs.


- Sam Ruby

[1] http://intertwingly.net/projects/pegurl/url.html
[2] https://www.ietf.org/meeting/91/index.html
[3] https://url.spec.whatwg.org/#relative-scheme


Re: [whatwg] questions on URL spec based on reviewing galimatias test results

2014-10-30 Thread Sam Ruby

On 10/30/14 2:09 AM, Anne van Kesteren wrote:

On Wed, Oct 29, 2014 at 11:24 PM, Sam Ruby ru...@intertwingly.net wrote:

http://intertwingly.net/projects/pegurl/urltest-results/d674c14cbe

I'll note that galimatias doesn't produce a parse error in this case (and,
in fact, the state machine specified by the current URL Standard goes down a
completely different path for this case).

The question is: should this be a parse error?


Yeah. The results also seem strange. I thought at least Chrome had
this behavior. Perhaps because Chrome was not running on Windows?


Here is a screen capture of the live DOM URL viewer:

http://i.imgur.com/kbsTDQ7.png

Here are the test results for Chrome on Windows:

http://intertwingly.net/tmp/81cd494abd36509f0d46010b0c4d4ff9

It appears that Chrome implements this, but (a) only on Windows, and (b) 
only if the base scheme is file.


- Sam Ruby




[whatwg] questions on URL spec based on reviewing galimatias test results

2014-10-29 Thread Sam Ruby

1) Is the following expected to produce a parse error:

http://intertwingly.net/projects/pegurl/urltest-results/4b60e32190 ?

My reading of https://url.spec.whatwg.org/#relative-path-state is that 
step 3.1 indicates a parse error even though later step 1.5.1 replaces 
the non URL code point with a colon.


My proposed reference implementation does not indicate a parse error 
with these inputs, but I could easily add it.
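
The colon replacement in question is the Windows drive letter quirk; a 
sketch of the observable behavior, assuming an engine that implements 
the current file URL handling (the path here is illustrative):

```javascript
// "C|" is not made of URL code points, but file URL parsing treats it as
// a Windows drive letter and replaces the "|" with ":" while building the
// path -- hence the question of whether a recoverable parse error should
// also be flagged.
const u = new URL("C|/demo/mypath", "file:///");
console.log(u.href);  // "file:///C:/demo/mypath"
```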


2) Is the following expected to produce a parse error:

http://intertwingly.net/projects/pegurl/urltest-results/bc6ea8bdf8 ?

I ask this because the error isn't defined here:
  https://url.spec.whatwg.org/#host-state

And the following only defines fatal errors (e.g., step 5):
  https://url.spec.whatwg.org/#concept-host-parser

My proposed reference implementation does indicate a parse error with 
these inputs, but this could easily be removed.


- Sam Ruby


Re: [whatwg] questions on URL spec based on reviewing galimatias test results

2014-10-29 Thread Sam Ruby

On 10/29/14 4:47 AM, Anne van Kesteren wrote:

On Wed, Oct 29, 2014 at 12:12 PM, Sam Ruby ru...@intertwingly.net wrote:


2) Is the following expected to produce a parse error:

http://intertwingly.net/projects/pegurl/urltest-results/bc6ea8bdf8 ?


What is the DNS violation supposed to mean?

I would expect this to change if we decide to parse any numeric host
name into IPv4. Then it would certainly be an error.


Here is another example (though it contains multiple parse errors):

http://intertwingly.net/projects/pegurl/urltest-results/f3382f1412

The error being reported is that the host contains consecutive dot 
characters (i.e., the 'label' between these characters is empty).
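
A minimal sketch of the condition being reported, assuming the host is 
examined as dot-separated labels (the host string is illustrative):

```javascript
// Consecutive dots mean an empty label between them; this simply counts
// such labels, which is the condition the reference implementation
// reports as a (recoverable) parse error.
const host = "a..example.com";
const emptyLabels = host.split(".").filter(label => label.length === 0).length;
console.log(emptyLabels);  // 1
```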


- Sam Ruby


Re: [whatwg] questions on URL spec based on reviewing galimatias test results

2014-10-29 Thread Sam Ruby

On 10/29/14 4:47 AM, Anne van Kesteren wrote:

On Wed, Oct 29, 2014 at 12:12 PM, Sam Ruby ru...@intertwingly.net wrote:

1) Is the following expected to produce a parse error:

http://intertwingly.net/projects/pegurl/urltest-results/4b60e32190 ?

My reading of https://url.spec.whatwg.org/#relative-path-state is that step
3.1 indicates a parse error even though later step 1.5.1 replaces the non
URL code point with a colon.

My proposed reference implementation does not indicate a parse error with
these inputs, but I could easily add it.


Given the legacy aspect, probably should be an error.


Fixed:

https://github.com/rubys/url/commit/6789a5307ebd0e4aa05161c93038f2fc50011955

But it turns out that addressing that question opens up another 
question.  In my implementation that fix caused a (recoverable) parse 
error to be produced for another test case:


http://intertwingly.net/projects/pegurl/urltest-results/d674c14cbe

I'll note that galimatias doesn't produce a parse error in this case 
(and, in fact, the state machine specified by the current URL Standard 
goes down a completely different path for this case).


The question is: should this be a parse error?

- Sam Ruby




Re: [whatwg] URL: spec review - basic_parser

2014-10-14 Thread Sam Ruby

On 10/14/2014 03:41 AM, Anne van Kesteren wrote:

On Tue, Oct 14, 2014 at 1:05 AM, Sam Ruby ru...@intertwingly.net wrote:

1) rows where the notes merely say "href" are cases where parse errors are
thrown and failure is returned.  The expected results are an object that
returns the original href, but empty values for all other properties.  I
don't see this behavior in the spec:

https://url.spec.whatwg.org/#url-parsing


That is what you get when e.g. using <a>. If you use new URL() the
object would fail to construct so you cannot observe the other
properties. I'm not sure why you think it doesn't follow from the
specification. If you return failure, there's no URL returned, so why
would the properties return something?


Given that I've found problems in the spec, my implementation, and the 
test data, I'm trying to guess at what the desired behavior is.  As one 
source for clues, I've looked at the now-unmaintained library:


https://github.com/annevk/url/blob/master/url.js#L62

And, as noted above, this is consistent with urltestdata.txt.

Given all of the above, would you suggest changing the spec or the 
expected test results?



2) rows that contain "href hostname" appear to be ones where the expected
results have not been updated to include the host to IDNA mapping.

Looking at the first of those
http://intertwingly.net/stories/2014/10/13/urltest-results/eb3950fcc8
it seems something might be broken here on your end.


Can you explain what you think is broken?  It isn't completely obvious, 
but the input string in that case contains U+200B, U+2060, U+FEFF:


http://www.fileformat.info/info/unicode/char/200B/index.htm
http://www.fileformat.info/info/unicode/char/2060/index.htm
http://www.fileformat.info/info/unicode/char/feff/index.htm
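
To make the invisible characters concrete, a small sketch that prints 
their code points:

```javascript
// The three characters present in the test input, shown by code point.
// Their handling differs between IDNA 2003 (whose Nameprep tables map
// them to nothing) and the current IDNA mapping table, which appears to
// be the source of the differing results under discussion.
const chars = ["\u200B", "\u2060", "\uFEFF"];
const hexes = chars.map(c => c.codePointAt(0).toString(16));
console.log(hexes);  // ["200b", "2060", "feff"]
```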

I'll also note that the results I produce are consistent with 
Presto/2.12.388.



3) rows that contain "href protocol hostname pathname" need further
investigation.  I suspect that these are based on my using a library to
normalize the IDNA mapping, and it helpfully cleans up other problems like
removing U+ characters from the input.


E.g. for http://intertwingly.net/stories/2014/10/13/urltest-results/7a0e86d240
per http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt
U+FDD0 is disallowed meaning failure ought to be returned. What you
have as outcome for whatwg does not match urltestdata.txt (including
the version you are using).


Agreed.  As I indicated, I need to look further into the library that I 
am using.



P.S.  I didn't update to the latest test data yet; but from what I can see
the changes wouldn't materially affect the results, so I am publishing now.


It affects what happens for http://%30%78%63%30%2e%30%32%35%30.01%2e,
http://192.168.0.257, and
ttp://\uff10\uff38\uff43\uff10\uff0e\uff10\uff12\uff15\uff10\uff0e\uff10\uff11.


I do plan to update to the latest expected test results, but meanwhile I 
am still trying to determine places where these results aren't correct 
or current with the specification.


- Sam Ruby


Re: [whatwg] URL: spec review - basic_parser

2014-10-14 Thread Sam Ruby

On 10/14/2014 05:49 AM, Anne van Kesteren wrote:

On Tue, Oct 14, 2014 at 11:38 AM, Sam Ruby ru...@intertwingly.net wrote:

At the present time, all I can say is that the https://url.spec.whatwg.org/,
https://github.com/w3c/web-platform-tests/blob/master/url/, and
https://github.com/annevk/url are inconsistent.


I recommend not looking at annevk/url.


To illustrate, try pasting http://f:b/c into:

   http://www.lookout.net/test/url/url-liveview.html

Relevant excerpt from that page:

   var url = new URL(input, base);
   urlHref.textContent = url.href;

And the results for http://f:b/c after applying urltestparser.js against
urltestdata.js is as follows:

{"input":"http://f:b/c","base":"http://example.org/foo/bar","scheme":"",
"username":"","password":null,"host":"","port":"","path":"","query":"",
"fragment":"","href":"http://f:b/c","protocol":":","search":"","hash":""}


That seems correct. You hit b in the port state and that will return
failure (from memory, did not check).

How does this not match the specification?


Here's my original statement:

"The expected results are an object that returns the original href, but 
empty values for all other properties.  I don't see this behavior in the 
spec: https://url.spec.whatwg.org/#url-parsing"


http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0159.html

If you could be so kind as to point out what I am missing, I would 
appreciate it.



I'll look further into why the results provided by Opera and
https://rubygems.org/gems/addressable don't appear to match RFC 3491.


Note that RFC 3491 is not a normative dependency for any of the algorithms.


RFC 3491 is a normative dependency for RFC 3490, Internationalizing 
Domain Names in Applications (IDNA).


You said, "per IDNA those are ignored."

http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0166.html

- Sam Ruby


Re: [whatwg] URL: spec review - basic_parser

2014-10-14 Thread Sam Ruby

On 10/14/2014 07:00 AM, Simon Pieters wrote:

On Tue, 14 Oct 2014 12:34:55 +0200, Anne van Kesteren ann...@annevk.nl
wrote:


If you could be so kind as to point out what I am missing, I would
appreciate it.


The way the a element works, I assume. Which is mostly how URLUtils
works when associated with an object that is not URL.


[[
The a element also supports the URLUtils interface. [URL]

When the element is created, and whenever the element's href content
attribute is set, changed, or removed, the user agent must invoke the
element's URLUtils interface's set the input algorithm with the value of
the href content attribute, if any, or the empty string otherwise, as
the given value.
]]
https://html.spec.whatwg.org/multipage/semantics.html#the-a-element

- set the input

[[
1. Set url to null.
...
4. If url is not failure, set url to url.
]]
https://url.spec.whatwg.org/#concept-urlutils-set-the-input

When /url/ is failure, https://url.spec.whatwg.org/#concept-urlutils-url
is null. So:

.href:

[[
1. If url is null, return input.
]]
https://url.spec.whatwg.org/#dom-url-href

.protocol:

[[
1. If url is null, return :.
]]
https://url.spec.whatwg.org/#dom-url-protocol

...and the other attributes return empty string in the first step if url
is null.

Does that help?


Indeed, it does.  Thanks!

I was looking too myopically, assuming that urltestdata.txt was testing 
URL; and got sidetracked by http://www.lookout.net/test/url/.
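
The fallback Simon walks through can be approximated with the modern URL 
constructor; this is an illustrative object, not the actual URLUtils 
algorithm:

```javascript
// When parsing fails (e.g. "http://f:b/c", where "b" is an invalid
// port), the element's url stays null: href echoes the raw input,
// protocol returns ":", and the other components are empty.
function urlUtilsView(input) {
  let url = null;
  try { url = new URL(input); } catch (e) { /* parse failure: url stays null */ }
  return {
    href: url ? url.href : input,
    protocol: url ? url.protocol : ":",
    host: url ? url.host : "",
    search: url ? url.search : "",
  };
}
console.log(urlUtilsView("http://f:b/c"));
// { href: "http://f:b/c", protocol: ":", host: "", search: "" }
```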


What I should have been looking at is 
https://github.com/w3c/web-platform-tests/tree/master/url, and in 
particular, the name of:


https://github.com/w3c/web-platform-tests/blob/master/url/a-element.html

 - - -

I think that a working and up-to-date live url parser would be a handy 
thing to have, and I hope to have one available shortly.


- Sam Ruby


Re: [whatwg] URL: spec review - basic_parser

2014-10-13 Thread Sam Ruby

On 10/13/2014 10:05 AM, Anne van Kesteren wrote:


Not yet.  I'm still seeing a large set of differences between what I am
producing and what is in urltestdata.txt and need to track down whether the
problems are in my implementation, the spec, or in the test results.

Once those three are in sync; I'll try to look at the bigger picture.


Cool. Sounds great.


New test results:

http://intertwingly.net/stories/2014/10/13/urltest-results/

The fourth column (Notes) indicates which properties differ between 
what my software produces and what the testdata indicates should be the 
expected results.  These fall into three basic categories:


1) rows where the notes merely say "href" are cases where parse errors 
are thrown and failure is returned.  The expected results are an object 
that returns the original href, but empty values for all other 
properties.  I don't see this behavior in the spec:


https://url.spec.whatwg.org/#url-parsing

2) rows that contain "href hostname" appear to be ones where the 
expected results have not been updated to include the host to IDNA 
mapping.


3) rows that contain "href protocol hostname pathname" need further 
investigation.  I suspect that these are based on my using a library to 
normalize the IDNA mapping, and it helpfully cleans up other problems 
like removing U+ characters from the input.


My implementation can be found here:

http://intertwingly.net/stories/2014/10/13/url_rb.html

Note the comments linking back to spec sections, and comments that 
identify step numbers.


- Sam Ruby

P.S.  I didn't update to the latest test data yet; but from what I can 
see the changes wouldn't materially affect the results, so I am 
publishing now.


P.P.S.  Preview of what is yet to come, ruby2js run against my 
implementation produces:


http://intertwingly.net/stories/2014/10/13/url_js.html

This will need some additional work to get running, for example lines 
54, 65, 82, 85, and 267 call out to libraries that aren't available to 
JavaScript.  Lines 275 to 277 are debugging lines that will be removed 
shortly.


Re: [whatwg] URL: spec review - basic_parser

2014-10-12 Thread Sam Ruby

On 10/12/2014 04:18 AM, Anne van Kesteren wrote:

On Sat, Oct 11, 2014 at 7:24 PM, Sam Ruby ru...@intertwingly.net wrote:

On 10/10/2014 08:19 PM, Sam Ruby wrote:

2) https://url.spec.whatwg.org/#concept-basic-url-parser
 I'm interpreting "terminate this algorithm" and "return failure" to
 mean the same thing, and I'm interpreting "parse error" as "set
 parse error flag and continue".


I'm inclined to submit a pull request standardizing on "terminate this
algorithm" and "set parse error".


I'm not sure what you mean here. Returning failure is important.
However, in override mode returning failure is not important so the
algorithm is simply terminated.


Can you explain in JavaScript terms what the difference is between 
"return failure" and "terminate"?


In any case, this difference wasn't clear to me, and mixed in with not 
defining what should be done with parse errors, and returning failure 
without setting parse errors (as mentioned below); all combined to make 
it more difficult (for me at least) to determine what was desired.



And "parse error" would be more
like "flag a parse error", or "append a parse error to a list of parse
errors". It depends a bit on whether the parser decides to halt on the
first one or not.


I don't see anything in the prose that indicates that halting on the 
first parse error is an option.



 b) Step 1.3.3 seems problematic.  I interpret this prose to mean if
any character in buffer is a % and the first two characters
after the pointer position in input aren't hex characters.
Specifically, it appears to be comparing a possibly
non-contiguous set of characters.


Ah yes. It needs to check the two code points after code point in
buffer. That seems like a bug.


I'll look into submitting a pull request after I complete this pass.
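
The corrected check Anne describes -- examine the two code points that 
follow a "%" within buffer itself -- can be sketched as:

```javascript
// Flags a parse error when buffer contains a "%" that is not followed by
// two ASCII hex digits within buffer (rather than comparing against code
// points after the pointer position in input, as the old prose read).
function hasBadPercent(buffer) {
  return /%(?![0-9A-Fa-f]{2})/.test(buffer);
}
console.log(hasBadPercent("user%41name"));  // false
console.log(hasBadPercent("user%4Gname"));  // true
console.log(hasBadPercent("user%"));        // true
```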


4) https://url.spec.whatwg.org/#file-host-state
 Step 1.3.2 returns failure without setting parse_error.  Is this
 correct?

5) https://url.spec.whatwg.org/#host-state
 Step 1.2.2 also returns failure without setting parse_error.


This is indeed inconsistent. I must at some point have thought that
returning failure without reporting a parse error was fine (as failure
was indicated) or the other way around. Reporting a parse error before
returning is probably best.


I'm inclined to submit a pull request changing these to set parse error
before failing.


Thanks.


Will do.


6) https://url.spec.whatwg.org/#relative-path-state
 If input contains a path but no query or fragment, the last part of
 the path will be accumulated into buffer, but that buffer will never
 be added to the path


Looks like I got confused by the prose in the spec.  I've submitted a pull
request that makes this point clearer:

   https://github.com/whatwg/url/pull/4

There were a number of places where things weren't clear to me; after I
complete my technical review and testing verification, I'll go back and
identify more.  Meanwhile, here is an example:

   https://github.com/whatwg/url/pull/5


Feel free to only modify url.src.html in PRs.


Ack.


Did you have a look at the open bugs by the way? There's a chance the
parsing algorithm will get rewritten at some point to be a bit more
functional and less state driven.


Not yet.  I'm still seeing a large set of differences between what I am 
producing and what is in urltestdata.txt and need to track down whether 
the problems are in my implementation, the spec, or in the test results.


Once those three are in sync; I'll try to look at the bigger picture.

- Sam Ruby


Re: [whatwg] URL: spec review - basic_parser

2014-10-11 Thread Sam Ruby

On 10/10/2014 08:19 PM, Sam Ruby wrote:

I've now completed step 1, as described at [1].

Here are my questions/comments:

1) https://url.spec.whatwg.org/#url-code-points
U+D8000 to U+DFFFD are invalid as they are within the UTF-16
surrogate range


Disregard this comment, it turns out that this was a bug in my code.


2) https://url.spec.whatwg.org/#concept-basic-url-parser
I'm interpreting "terminate this algorithm" and "return failure" to
mean the same thing, and I'm interpreting "parse error" as "set
parse error flag and continue".


I'm inclined to submit a pull request standardizing on "terminate this 
algorithm" and "set parse error".



3) https://url.spec.whatwg.org/#authority-state
a) Did you really mean "prepend" in Step 1.1?

b) Step 1.3.3 seems problematic.  I interpret this prose to mean if
   any character in buffer is a % and the first two characters
   after the pointer position in input aren't hex characters.
   Specifically, it appears to be comparing a possibly
   non-contiguous set of characters.


I plan to revisit this after I complete my initial testing (i.e. step 2).


4) https://url.spec.whatwg.org/#file-host-state
Step 1.3.2 returns failure without setting parse_error.  Is this
correct?

5) https://url.spec.whatwg.org/#host-state
Step 1.2.2 also returns failure without setting parse_error.


I'm inclined to submit a pull request changing these to set parse 
error before failing.



6) https://url.spec.whatwg.org/#relative-path-state
If input contains a path but no query or fragment, the last part of
the path will be accumulated into buffer, but that buffer will never
be added to the path


Looks like I got confused by the prose in the spec.  I've submitted a 
pull request that makes this point clearer:


  https://github.com/whatwg/url/pull/4

There were a number of places where things weren't clear to me; after I 
complete my technical review and testing verification, I'll go back and 
identify more.  Meanwhile, here is an example:


  https://github.com/whatwg/url/pull/5

- Sam Ruby


[1] http://lists.w3.org/Archives/Public/www-tag/2014Oct/0053.html


[whatwg] URL: spec review - basic_parser

2014-10-10 Thread Sam Ruby

I've now completed step 1, as described at [1].

Here are my questions/comments:

1) https://url.spec.whatwg.org/#url-code-points
   U+D8000 to U+DFFFD are invalid as they are within the UTF-16
   surrogate range

2) https://url.spec.whatwg.org/#concept-basic-url-parser
   I'm interpreting "terminate this algorithm" and "return failure" to
   mean the same thing, and I'm interpreting "parse error" as "set
   parse error flag and continue".

3) https://url.spec.whatwg.org/#authority-state
   a) Did you really mean "prepend" in Step 1.1?

   b) Step 1.3.3 seems problematic.  I interpret this prose to mean if
  any character in buffer is a % and the first two characters
  after the pointer position in input aren't hex characters.
  Specifically, it appears to be comparing a possibly
  non-contiguous set of characters.

4) https://url.spec.whatwg.org/#file-host-state
   Step 1.3.2 returns failure without setting parse_error.  Is this
   correct?

5) https://url.spec.whatwg.org/#host-state
   Step 1.2.2 also returns failure without setting parse_error.

6) https://url.spec.whatwg.org/#relative-path-state
   If input contains a path but no query or fragment, the last part of
   the path will be accumulated into buffer, but that buffer will never
   be added to the path

- Sam Ruby

[1] http://lists.w3.org/Archives/Public/www-tag/2014Oct/0053.html


Re: [whatwg] URL: test case review

2014-10-06 Thread Sam Ruby

On 10/06/2014 12:42 PM, Anne van Kesteren wrote:

On Mon, Oct 6, 2014 at 3:13 AM, Sam Ruby ru...@intertwingly.net wrote:

http://intertwingly.net/stories/2014/10/05/urltest-results/24f081633d


This does not match what I find in browsers. (I did not look through
the list exhaustively, see below, but since this was the first one...)


Can you explain the methodology you used?

The method I used can be found via:

wget http://intertwingly.net/stories/2014/10/05/urltest
wget http://intertwingly.net/stories/2014/10/05/urltestdata.json

TL;DR: I created a page with a script that (a) fetches input data using 
XHR; (b) updates an <a> and a <base> element and then captures various 
properties for each test, and (c) posts the result using XHR.


- Sam Ruby


Re: [whatwg] URL: test case review

2014-10-06 Thread Sam Ruby

On 10/06/2014 12:59 PM, Anne van Kesteren wrote:

On Mon, Oct 6, 2014 at 6:54 PM, Sam Ruby ru...@intertwingly.net wrote:

On 10/06/2014 12:42 PM, Anne van Kesteren wrote:

On Mon, Oct 6, 2014 at 3:13 AM, Sam Ruby ru...@intertwingly.net wrote:

http://intertwingly.net/stories/2014/10/05/urltest-results/24f081633d


This does not match what I find in browsers. (I did not look through
the list exhaustively, see below, but since this was the first one...)


Can you explain the methodology you used?


Sure, I gave "?" as input and then checked the serialized URL (since
you can't trust the search property).
https://dump.testsuite.org/url/inspect.html works for this.


wget http://intertwingly.net/stories/2014/10/05/urltest
wget http://intertwingly.net/stories/2014/10/05/urltestdata.json

TL;DR: I created a page with a script that (a) fetches input data using XHR;
(b) updates an <a> and a <base> element and then captures various properties
for each test, and (c) posts the result using XHR.


Is there a chance that the library on the server does not pick up on a lone "?"?


I found the bug, thanks for reporting it.

The problem is that the following properties are not defined as 
'enumerable', so are not picked up when I serialize the tests as JSON:


https://github.com/w3c/web-platform-tests/blob/master/url/urltestparser.js#L17
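
The pitfall is easy to reproduce; a minimal sketch (the property names 
here are illustrative, not taken from urltestparser.js):

```javascript
// Properties created with Object.defineProperty are non-enumerable by
// default, so JSON.stringify silently drops them when serializing.
const bad = { input: "http://f:b/c" };
Object.defineProperty(bad, "protocol", { value: ":" });  // enumerable: false
console.log(JSON.stringify(bad));   // {"input":"http://f:b/c"}

// Declaring the property enumerable makes it survive serialization:
const good = { input: "http://f:b/c" };
Object.defineProperty(good, "protocol", { value: ":", enumerable: true });
console.log(JSON.stringify(good));  // {"input":"http://f:b/c","protocol":":"}
```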

- Sam Ruby


[whatwg] URL: test case review

2014-10-05 Thread Sam Ruby
/stories/2014/10/05/urltest-results/ee52a7413c
http://intertwingly.net/stories/2014/10/05/urltest-results/723aa80622

Worth Discussing:

http://intertwingly.net/stories/2014/10/05/urltest-results/e0d78e8c36
http://intertwingly.net/stories/2014/10/05/urltest-results/eb30a2c2d0
http://intertwingly.net/stories/2014/10/05/urltest-results/e170ad9cce

---

Further background on my methodology and results:

http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI

- Sam Ruby

[1] 
https://raw.githubusercontent.com/w3c/web-platform-tests/master/url/urltestdata.txt


[whatwg] Request for HTML.next ideas

2011-04-06 Thread Sam Ruby
Note: while this email is intentionally cross-posted, my request is that 
any responses trim the replies down to *one* of the above lists.


===

At the present time within the HTML WG, there are no surveys active, and 
no calls for proposals.  Some are actively working on converging to 
fewer active proposals for issue 152.  The editors have some changes to 
apply before we proceed to Last Call.  The chairs still have some 
surveys and proposals to evaluate.


Meanwhile, this may be a good time for others to begin to capture ideas 
for what comes after HTML5.  I know that the WHATWG has ideas for some 
features and even has some speculative specification text beyond what 
can make the deadline for HTML5.  I doubt the A11y team has exhausted 
their wish list.


At this time, I would like to request that people capture their ideas here:

  http://www.w3.org/html/wg/wiki/HTML.next

Ideas don't need to be fully fleshed out.  In fact, in many cases a 
simple pointer to a proposal or even a discussion hosted elsewhere is 
all that is needed at this time.


There isn't a hard deadline on this request, but we anticipate that the 
data captured will be discussed at the next AC meeting which goes from 
the 15th of May to the 17th of May.


Thanks!

- Sam Ruby


Re: [whatwg] Article: Growing pains afflict HTML5 standardization

2010-07-12 Thread Sam Ruby
On Mon, Jul 12, 2010 at 11:41 AM, Julian Reschke julian.resc...@gmx.de wrote:
 On 12.07.2010 16:43, Mike Wilcox wrote:

 On Jul 12, 2010, at 8:39 AM, Nils Dagsson Moskopp wrote:

 That's a little different. Google purposely uses unstandardized,
 incorrect HTML in ways that still render in a browser in order to
 make it more difficult for screen scrapers. They also break it in a
 different way every week.

 Assuming this is true (which I find difficult to believe), wouldn't a
 screen scraper based on the HTML5 parsing algorithm defeat this
 purpose ?

 Honestly, I don't know. But W3 defaulted to an HTML5 validator:

 http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com%2Fsearch%3Fsource%3Dig%26hl%3Den%26rlz%3D%26%3D%26q%3Dhtml5%26aq%3Df%26aqi%3D%26aql%3D%26oq%3D%26gs_rfai%3Dcharset=%28detect+automatically%29doctype=Inlinegroup=0

 True, but a parser conforming to the spec (*) would handle those errors, so
 in this case obfuscation wouldn't work. Essentially, any code using that
 parser would see the same information as an off-the-shelf web browser.

 ...
 Besides the protecting of their API, Google also will scratch and claw
 to save every byte. They are the gold standard of a high performance

 Understood. There's an ongoing controversy whether it makes sense to make
 things like these invalid (just stating, not offering an opinion).

 website. While this may or may not explain the things that don't
 validate, what it does say is that nothing coming from google.com
 is accidental.
 ...

 I believe some time ago a certain Google employee actually *did* state that
 some of the conformance problems were unintentional. (yes, I did spend a few
 minutes finding that statement but wasn't successful).

http://lists.w3.org/Archives/Public/public-html/2010Mar/0555.html

 Best regards, Julian

 (*) Implementing error recovery, which IMHO isn't required.

- Sam Ruby


Re: [whatwg] Technical Parity with W3C HTML Spec

2010-06-25 Thread Sam Ruby
On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote:


 Maybe the answer is to have a spokesperson or liaison role, someone
 respected in the WHATWG community with a reputation for reasonable
 neutrality?  Both Hixie and Maciej have conflicts of interest, as editor
 and W3C co-chair respectively.  Maybe Hakon or David, since they were
 instrumental in forming WHATWG in the first place?

 Maybe an alternative would be:

 Where there are technical or political conflicts, W3C should decide how
 to resolve those internally, and how to represent the W3C point of view in
 the WHATWG. I would expect that people differ, so I would expect those
 different opinions to be represented in liaisons with WHATWG. I don't have
 a good answer here, because I think it's up to the W3C to decide their own
 processes, but I hope we agree that we need improvements to how we liaison.

First can we work on improving communications so that we can work on
differences before they become conflicts?

We recently had a change proposal made by Lachlan:

http://lists.w3.org/Archives/Public/public-html/2010Apr/1107.html

Absolutely nobody in the W3C WG indicated any issues with this proposal:

http://lists.w3.org/Archives/Public/public-html/2010Jun/0562.html

Recently you said that you value convergence:

http://lists.w3.org/Archives/Public/public-html/2010Jun/0525.html

Yet, when you made the change, you did it in a way that made the
WHATWG version not a proper superset.  You also characterized the
change in a way that I don't believe is accurate:

http://lists.whatwg.org/pipermail/commit-watchers-whatwg.org/2010/004270.html

I'm having trouble reconciling all of the above.  You clearly continue
to be a member of the W3C Working Group.  You state that you value
convergence.  You were given ample opportunity to state an objection.
And you clearly have an issue with Lachlan's suggestion.

How can we improve communications to prevent misunderstandings such as
this one from occurring in the future?

What's the best way to address the mischaracterization of the
difference as it is currently described in the WHATWG draft?

Most importantly, how can we deescalate tensions rather than
continuing in this manner?

- Sam Ruby


Re: [whatwg] Technical Parity with W3C HTML Spec

2010-06-25 Thread Sam Ruby
On Fri, Jun 25, 2010 at 4:03 PM, Sam Ruby ru...@intertwingly.net wrote:

 Yet, when you made the change, you did it in a way that made the
 WHATWG version not a proper superset.

On closer reading, it turns out that I was incorrect here.  It still,
however, remains a divergence, it is still mischaracterized, and I
still can't reconcile your statement concerning valuing convergence
with this action.

- Sam Ruby


Re: [whatwg] Technical Parity with W3C HTML Spec

2010-06-25 Thread Sam Ruby
On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote:

 While I agree that it is helpful for us to cooperate, I should point out
 that the WHATWG was never formally approached by the W3C about this

With whom (and where?) would such a formal discussion take place?

I would prefer that such a discussion happen on a publicly archived
mailing list.

- Sam Ruby


Re: [whatwg] Technical Parity with W3C HTML Spec

2010-06-25 Thread Sam Ruby
On Fri, Jun 25, 2010 at 7:02 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 25 Jun 2010, Sam Ruby wrote:
 On Fri, Jun 25, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote:
 
  While I agree that it is helpful for us to cooperate, I should point out
  that the WHATWG was never formally approached by the W3C about this

 With whom (and where?) would such a formal discussion take place?

 I would prefer that such a discussion happen on a publicly archived
 mailing list.

 Best thing to do is probably to e-mail the people listed in the charter as
 being the members (e-mail addresses below), and cc the www-arch...@w3.org
 mailing list for archival purposes.

 ann...@opera.com, bren...@mozilla.com, dba...@mozilla.com,
 hy...@apple.com, dean.edwa...@gmail.com, howc...@opera.com,
 j...@mozilla.com, m...@apple.com, i...@hixie.ch

 HTH,

Done:

http://lists.w3.org/Archives/Public/www-archive/2010Jun/0054.html

 --
 Ian Hickson               U+1047E                )\._.,--,'``.    fL
 http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

- Sam Ruby


Re: [whatwg] A New Way Forward for HTML5 (revised)

2009-07-27 Thread Sam Ruby

John Foliot wrote:

Peter Kasting wrote:

It seems like the only thing you could ask for beyond
this is the ability to directly insert your own changes
into the spec without prior editorial oversight.  I think
that might be what you're asking for.  This seems very
unwise.


Really? This appears to be exactly the single, special status privilege
currently reserved for Ian Hickson.


False.


It is, in fact a serious complaint
that many are trying to correct, including Manu with his offer to assist
in setting up a more egalitarian solution.


In fact, Manu is an instance proof that the previous statement you made 
is false.


Ian is free to produce Working Drafts that are published by this working 
group.  The status of such drafts is, and I quote[1]:


Consensus is not a prerequisite for approval to publish; the
Working Group MAY request publication of a Working Draft even
if it is unstable and does not meet all Working Group requirements.

Both you and Manu have exactly the same ability as Ian does in this 
respect.  Ian has asked the group for permission to publish, and that 
was granted.  Manu has produced a document but has yet to request 
permission to publish as a Working Draft.  You are welcome to do 
likewise[2].



JF


- Sam Ruby

[1] http://www.w3.org/2005/10/Process-20051014/tr.html#first-wd
[2] http://lists.w3.org/Archives/Public/public-html/2009Jul/0627.html


Re: [whatwg] A New Way Forward for HTML5 (revised)

2009-07-27 Thread Sam Ruby

John Foliot wrote:

Sam Ruby wrote:

Really? This appears to be exactly the single, special status

privilege

currently reserved for Ian Hickson.

False.


...and yes, I stand corrected.  Although the *impression* that this is the
current status remains fairly pervasive, I will endeavor to dispel
that myth as well.  That said, the barrier to equal entry remains high:
http://burningbird.net/node/28

(however, I will also state that Sam has offered on numerous occasions to
extend help to any that requires it - balanced commentary)


My goal is to ensure that there are no excuses not to participate.

I've said that a person can simply go into notepad[3], make the changes, 
and I will take care of the rest.  Manu has documented the process for 
those who prefer to do it themselves[4].  Ian has offered to make the 
changes if somebody can explain the use cases[5].  If people have 
suggestions on how to be even *more* inclusive, I welcome any and all 
suggestions.


Meanwhile, your offer to help dispel that myth is very much appreciated.


Both you and Manu have exactly the same ability as Ian does in this
respect.  Ian has asked the group for permission to publish, and that
was granted.  Manu has produced a document but has yet to request
permission to publish as a Working Draft.  You are welcome to do
likewise[2].


While I have personal reservations that this may introduce an even wider
fork of opinion, making consensus down the road even harder to achieve,
this is the die that has been cast.  I will offer what contributions I can
to both Manu and Shelly in their respective initiatives, to the best of my
ability, and will leave the WHAT WG to continue propagating what I see as
their mistakes and false assumptions as they see fit - they have clearly
signaled that not all contributions are welcome.


It may very well end up that the sole difference between the WHATWG 
document and the W3C document is that the WHATWG document states 
that the summary attribute is conformant but obsolete, and the W3C document 
states that the summary attribute is conformant but not (yet) obsolete.


But the only way that will happen is if somebody goes into notepad, or 
follows Manu's process, or explains the use case, or finds some other 
means to cause a working draft to appear with these changes.



JF


[1] http://www.w3.org/2005/10/Process-20051014/tr.html#first-wd
[2] http://lists.w3.org/Archives/Public/public-html/2009Jul/0627.html


- Sam Ruby

[3] http://lists.w3.org/Archives/Public/public-html/2009Jul/0633.html
[4] http://lists.w3.org/Archives/Public/public-html/2009Jul/0785.html
[5] http://lists.w3.org/Archives/Public/public-html/2009Jul/0745.html


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Sam Ruby
On Tue, May 12, 2009 at 4:34 PM, Shelley Powers
shell...@burningbird.net wrote:

 I
 would say if your fellow Google developers could understand how this all
 works, there is hope for others.

if

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

 Shelley

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-18 Thread Sam Ruby
On Sun, Jan 18, 2009 at 1:34 PM, Henri Sivonen hsivo...@iki.fi wrote:
 On Jan 18, 2009, at 01:32, Shelley Powers wrote:

 Are you then saying that this will be a showstopper, and there will never
 be either a workaround or compromise?

 Are the RDFa TF open to compromises that involve changing the XHTML side of
 RDFa not to use attribute whose qualified name has a colon in them to
 achieve DOM Consistency by changing RDFa instead of changing parsing?

Just so that we have all of the data available to make an informed
decision, do we have examples of how it would break the web if
attributes which started with the characters xmlns: (and *only*
those attribute) were placed into the DOM exactly as they would be
when those bytes are processed as XHTML?

Notes: I am *not* suggesting anything just yet, other than the
gathering of this data.  I also recognize that this would require a
parsing change by browser vendors, which also is a cost that needs to
be factored in.  But right now, I am interested in how it would affect
the web if this were done.
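For concreteness, here is a minimal sketch of what the XHTML-style behavior looks like, using Python's stdlib minidom with a made-up foo prefix and example.org namespace (both hypothetical, for illustration only); the open question above is what would break if HTML parsing produced the same DOM:

```python
from xml.dom.minidom import parseString

# Hypothetical example: a "foo" prefix bound to an example.org namespace.
doc = parseString('<root xmlns:foo="http://example.org/foo" foo:bar="baz"/>')
root = doc.documentElement

# The xmlns:foo declaration is itself an attribute, namespaced under
# the reserved "http://www.w3.org/2000/xmlns/" namespace.
decl = root.getAttributeNode('xmlns:foo')
print(decl.namespaceURI)  # http://www.w3.org/2000/xmlns/

# The prefixed attribute is reachable by (namespace URI, local name).
print(root.getAttributeNS('http://example.org/foo', 'bar'))  # baz
```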

 --
 Henri Sivonen
 hsivo...@iki.fi
 http://hsivonen.iki.fi/

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-17 Thread Sam Ruby
On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
shell...@burningbird.net wrote:
 The debate about RDFa highlights a disconnect in the decision making related
 to HTML5.

Perhaps.  Or perhaps not.  I am far from an apologist for Hixie (nor,
for that matter, am I a strong advocate for RDF), but I offer the
following question and observation.

 The purpose behind RDFa is to provide a way to embed complex information
 into a web document, in such a way that a machine can extract this
 information and combine it with other data extracted from other web pages.
 It is not a way to document private data, or data that is meant to be used
 by some JavaScript-based application. The sole purpose of the data is for
 external extraction and combination.

So, I take it that it isn't essential that RDFa information be
included in the DOM?  This is not rhetorical: I honestly don't know
the answer to this question.

 So, why accept that we have to use MathML in order to solve the problems of
 formatting mathematical formula? Why not start from scratch, and devise a
 new approach?

Ian explored (and answered) that here:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html

Key to Ian's decision was the importance of DOM integration for this
vocabulary.  If DOM integration is essential for RDFa, then perhaps
the same principles apply.  If not, perhaps some other principles may
apply.

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-17 Thread Sam Ruby
On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley dan...@danbri.org wrote:
 On 17/1/09 19:27, Sam Ruby wrote:

 On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
 shell...@burningbird.net  wrote:

 The debate about RDFa highlights a disconnect in the decision making
 related
 to HTML5.

 Perhaps.  Or perhaps not.  I am far from an apologist for Hixie (nor,
 for that matter, am I a strong advocate for RDF), but I offer the
 following question and observation.

 The purpose behind RDFa is to provide a way to embed complex information
 into a web document, in such a way that a machine can extract this
 information and combine it with other data extracted from other web
 pages.
 It is not a way to document private data, or data that is meant to be
 used
 by some JavaScript-based application. The sole purpose of the data is for
 external extraction and combination.

 So, I take it that it isn't essential that RDFa information be
 included in the DOM?  This is not rhetorical: I honestly don't know
 the answer to this question.

 Good question. I for one expect RDFa to be accessible to Javascript.

 http://code.google.com/p/rdfquery/wiki/Introduction -
 http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice
 example of code that does something useful in this way.

The fact that this works anywhere at all today implies that few, if
any, changes to browsers are required in order to support this.  Is
that a fair statement?

I've not taken a look at the code, but have taken a quick glance at
the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome
1.0.154.43, Opera 9.63, and Firefox 3.0.5.

The page is different (as in less functional) under IE8 and Safari.
Is there something that they need to do which is not already covered
in the HTML5 specification in order to support this?

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-17 Thread Sam Ruby
On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers
shell...@burningbird.net wrote:

 I propose that RDFa is the best solution to the use case Martin supplied,
 and we've shown how it is not a disruptive solution to HTML5.

Others may differ, but my read is that the case is a strong one.  But
I will caution you that a little patience is in order.  SVG is not a
done deal yet.  I've been involved in a number of standards efforts,
and I've never seen a case of "proposed on a Saturday morning, decided
on a Saturday afternoon".  One demo is not conclusive.  Now you
mention that there exists a number of libraries.  I think that's
important.  Very important.  Possibly conclusive.

But back to expectations.  I've seen references elsewhere to Ian being
booked through the end of this quarter.  I may have misheard, but in
any case, my point is the same: if this is awaiting something from
Ian, it will be prioritized and dealt with accordingly.  If, however,
some of the legwork is done for Ian, this may help accelerate the
effort.

Even little things may help a lot.  I know what I'm about to say may
be unpopular, but I'll say it anyway: take a few good examples of RDFa
and run them through Henri's validator.  The validator will helpfully
indicate exactly what areas of the spec would need to be updated in
order to accommodate RDFa.  The next step would be to take a look at
those sections.  If the update is obvious and straightforward, perhaps
nothing more is required.  But if not, researching into the options
and making recommendations may help.

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-17 Thread Sam Ruby
On Sat, Jan 17, 2009 at 3:51 PM, Shelley Powers
shell...@burningbird.net wrote:
 Sam Ruby wrote:

 On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers
 shell...@burningbird.net wrote:


 I propose that RDFa is the best solution to the use case Martin supplied,
 and we've shown how it is not a disruptive solution to HTML5.


 Others may differ, but my read is that the case is a strong one.  But
 I will caution you that a little patience is in order.  SVG is not a
 done deal yet.  I've been involved in a number of standards efforts,
 and I've never seen a case of "proposed on a Saturday morning, decided
 on a Saturday afternoon".  One demo is not conclusive.  Now you
 mention that there exists a number of libraries.  I think that's
 important.  Very important.  Possibly conclusive.


 I am patient. Look at me? I make extensive use of both SVG and RDF -- that
 is the mark of a patient woman.

 But back to expectations.  I've seen references elsewhere to Ian being
 booked through the end of this quarter.  I may have misheard, but in
 any case, my point is the same: if this is awaiting something from
 Ian, it will be prioritized and dealt with accordingly.  If, however,
 some of the legwork is done for Ian, this may help accelerate the
 effort.


 First of all, whatever happens has to happen with either vetting by the
 RDF/RDFa folks, if not their active help. This is my way of saying, I'd be
 willing to do much of the legwork, but I want to make sure I don't represent RDFa
 incorrectly.

 Secondly, my finances have been caught up in the current downturn, and my
 first priority has to be on the hourly work and odd jobs I'm getting to keep
 afloat. Which means that I can't always guarantee 20+ hours a week on a
 task, nor can I travel. Anywhere.

 But if both are acceptable conditions, I'm willing to help with tasks.

I don't see any of that as being a problem.

 Even little things may help a lot.  I know what I'm about to say may
 be unpopular, but I'll say it anyway: take a few good examples of RDFa
 and run them through Henri's validator.  The validator will helpfully
 indicate exactly what areas of the spec would need to be updated in
 order to accommodate RDFa.  The next step would be to take a look at
 those sections.  If the update is obvious and straightforward, perhaps
 nothing more is required.  But if not, researching into the options
 and making recommendations may help.

 Tasks including this one.

Excellent.  Well, all except for the downturn thing, but you know what I mean.

In order to prevent any misunderstandings: it is not for me to assign
work.  In fact, nobody here is in such a position.  People simply note
things that need to be done, and do the ones that interest them, at
the pace at which they are able.

And communicate copiously.  If you need help in vetting, I am given to
understand that there is a small pocket of RDF enthusiasm in the W3C.
:-P

 Shelley

- Sam Ruby


Re: [whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

2009-01-17 Thread Sam Ruby
On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen hsivo...@iki.fi wrote:
 On Jan 17, 2009, at 22:35, Shelley Powers wrote:

 Generally, though, RDFa is based on reusing a set of attributes already
 existing in HTML5, and adding a few more.

 Also, RDFa uses CURIEs which in turn use the XML namespace mapping context.

 I would assume no differences in the DOM based on XHTML or HTML.

 The assumption is incorrect.

 Please compare
 http://hsivonen.iki.fi/test/moz/xmlns-dom.html
 and
 http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml

 Same bytes, different media type.

The W3C Recommendation for DOM also describes a readonly attribute on
Attr named 'name'.  Discuss.
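A sketch of what that readonly 'name' attribute exposes for a namespaced attribute, using Python's stdlib minidom and a made-up ex prefix (hypothetical, for illustration): name is the full qualified name, distinct from prefix and localName, which is exactly where the xmlns:foo inconsistency lives:

```python
from xml.dom.minidom import parseString

# Hypothetical "ex" prefix, for illustration only.
doc = parseString('<r xmlns:ex="http://example.org/" ex:a="1"/>')
attr = doc.documentElement.getAttributeNode('ex:a')

print(attr.name)          # the readonly qualified name: ex:a
print(attr.prefix)        # ex
print(attr.localName)     # a
print(attr.namespaceURI)  # http://example.org/
```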

 I put together a very crude demonstration of JavaScript access of a
 specific RDFa attribute, about. It's temporary, but if you go to my main web
 page,http://realtech.burningbird.net, and look in the sidebar for the click
 me text, it will traverse each div element looking for an about attribute,
 and then pop up an alert with the value of the attribute. I would use
 console rather than alert, but I don't believe all browsers support console,
 yet.

 This misses the point, because the inconsistency is with attributes named
 xmlns:foo.

There is a similar inconsistency in how xml:lang is handled.  Discuss.

 --
 Henri Sivonen
 hsivo...@iki.fi
 http://hsivonen.iki.fi/

- Sam Ruby


Re: [whatwg] How to use SVG in HTML5?

2008-01-23 Thread Sam Ruby
On Jan 23, 2008 2:13 PM, Krzysztof Żelechowski [EMAIL PROTECTED] wrote:

 SVG is too heavyweight
 for the purpose of such tiny presentational enhancements.

I can provide counterexamples:

http://intertwingly.net/blog/
http://intertwingly.net/blog/archives/

- Sam Ruby


Re: [whatwg] Entity parsing

2007-06-23 Thread Sam Ruby

On 6/14/07, Ian Hickson [EMAIL PROTECTED] wrote:

On Sun, 5 Nov 2006, Øistein E. Andersen wrote:

 From section 9.2.3.1. Tokenising entities:
   For some entities, UAs require a semicolon, for others they don't.

 This applies to IE.

 FWIW, the entities not requiring a semicolon are the ones encoding
 Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as
 well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT
 and REG). [...]

I've defined the parsing and conformance requirements in a way that
matches IE. As a side-effect, this has made things like na&iumlve
actually conforming. I don't know if we want this. On the one hand, it's
pragmatic (after all, why require the semicolon?), and is equivalent to
not requiring quotes around attribute values. On the other, people don't
want us to make the quotes optional either.


With the latest changes to html5lib, we get a failure on a test named
test_title_body_named_charref.

Before, A &mdash B == A — B, now A &mdash B == A &amp;mdash B.

Is that what we really want?  Testing with Firefox, the old behavior
is preferable.
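For reference, the behavior the spec eventually settled on limits semicolon-less parsing to the legacy entity list (amp, lt, gt, copy, etc.), so mdash without a semicolon is not expanded. Python's stdlib html.unescape implements the HTML5 named-reference rules and illustrates the distinction:

```python
from html import unescape

# Only the legacy set (amp, lt, gt, copy, reg, quot, ...) may omit
# the trailing semicolon; mdash is not in that set.
print(unescape('A &mdash; B'))  # A — B
print(unescape('A &mdash B'))   # left untouched: A &mdash B
print(unescape('A &amp B'))     # legacy entity, expanded: A & B
```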

- Sam Ruby


[whatwg] web-apps/current-work/#datetime-parser

2007-04-17 Thread Sam Ruby

Step 25

If sign is negative, then shouldn't timezoneminutes also be negated?

Step 27

Shouldn't that be SUBTRACTING timezonehours hours and timezoneminutes 
minutes?


My current time is 2007-04-17T05:28:33-04:00.  The timezone is -4 hours 
from UTC.  To convert to UTC I need to add 4 hours.
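The sign convention can be checked mechanically; a quick sanity-check sketch using Python's datetime (not part of the spec, just verifying the arithmetic in the example above):

```python
from datetime import datetime, timedelta, timezone

# 2007-04-17T05:28:33-04:00: the offset is -4 hours from UTC, so
# converting to UTC means subtracting the signed offset, i.e. adding
# 4 hours to the local time.
local = datetime(2007, 4, 17, 5, 28, 33,
                 tzinfo=timezone(timedelta(hours=-4)))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2007-04-17T09:28:33+00:00
```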


- Sam Ruby


Re: [whatwg] Attribute for holding private data for scripting

2007-04-11 Thread Sam Ruby

Maciej Stachowiak wrote:


On Apr 10, 2007, at 8:12 PM, Sam Ruby wrote:


Maciej Stachowiak wrote:

On Apr 10, 2007, at 2:14 PM, Sam Ruby wrote:

Anne van Kesteren wrote:
On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby 
[EMAIL PROTECTED] wrote:

How so?
I missed the part where you wanted to change existing HTML parsers. 
I thought Hixie pointed out earlier (by means of examples) why we 
can't have namespace parsing in HTML. I suppose we can discuss it 
again...


It is a recurring pattern.  The first instance was "we can't allow 
trailing slashes in tags", which was followed up by a carefully 
crafted and narrow set of exceptions, which was met with "that 
works" and was adopted.


So... while it is clearly true that the full extent of XML namespaces will 
never be supported in HTML5 (and for good reason), namespace 
qualified attributes allow extensibility in ways that prevent 
collisions.


One of the first questions that would need to be answered: are there 
any existing documents on the web which would be broken if the name 
placed into the DOM for attributes with names containing a colon, 
with an apparent prefix, and one that matched an enclosing xmlns: 
declaration were to be changed?
I think the problem here isn't compatibility with existing content, 
but rather ability to use the feature in new web content while still 
gracefully handling existing user agents. We wrote up some design 
principles for the HTML WG based on the WHATWG's working assumptions 
which might make this point more clear: 
http://esw.w3.org/topic/HTML/ProposedDesignPrinciples. While Don't 
Break The Web is a goal, so is Degrade Gracefully.
To give a specific example: say I make my own mjsml prefix with 
namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an 
mjsml:extension attribute using getAttribute("mjsml:extension"). In 
HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", 
"extension"). And neither technique would work on both (at least as I 
understand your proposal).


Here's a page I constructed, and tested on Firefox:

http://intertwingly.net/stories/2007/04/10/test.html

This page is meant to be served as application/xhtml+xml.

Can you test it and see what results you get?  Then lets discuss further.


In Safari 2.0.4: Processed as HTML, it says "data" and then "". 
Processed as XHTML, it says "null" and then "data".
In Opera 9.00: Processed as HTML, it says "data" and then "null". 
Processed as XHTML, it says "null" and then "data".
In Firefox 2.0.0.3: Processed as HTML, it says "data" and then "". 
Processed as XHTML, it says "data" and then "data".
In IE/Mac 5.2: Processed as HTML, it says "data" and the second alert 
does not appear. Processed as XHTML, neither alert appears.


It looks like Firefox's XHTML implementation already has the 
getAttribute extension I suggested of handling QNames.


Cool!

The first thing that is apparent to me is that, when processed as HTML, 
element.getAttribute('mjsml:extension') works everywhere.  So it is 
probably fair to say that allowing it does not run afoul of either the 
Don't Break the Web or Degrade Gracefully design principles.


Per HTML5 section 8.1.2.3, however, such an attribute name would not be 
considered conformant.  Despite this, later in the document, in the 
description of the Attribute name state, no parse error is produced for 
this condition.  Nor does the current html5lib parser produce a parse 
error with this data.


I'd suggest that the first order of business is to reconcile 8.1.2.3 
with the description of Attribute name state.  My suggestion is that 
Anything else emits a parse error (but nevertheless continues 
on/recovers), and that a rule for handling latin small letter a through 
z, hyphen-minus, and colon be added.


By the way, the fact that no two of the browsers I tested treat this the 
same is a pretty clear indicator that DOM Core needs the HTML5 treatment.


+1.  But this raises a larger issue.  Many of the differences that you 
found were in how XHTML was handled, and the WHATWG document currently 
states:


The rules for parsing XML documents (and thus XHTML documents)
into DOM trees are covered by the XML and Namespaces in XML
specifications, and are out of scope of this specification.
[XML] [XMLNS]


Regards,
Maciej


- Sam Ruby




Re: [whatwg] Attribute for holding private data for scripting

2007-04-11 Thread Sam Ruby

Anne van Kesteren wrote:
On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:
Per HTML5 section 8.1.2.3, however, such an attribute name would not 
be considered conformant.


Yes, only attributes defined in the specification are conformant.


I was specifically referring to section 8.1.2.3.  Let me call your 
attention to the following text:


Attribute names use characters in the range U+0061 LATIN SMALL
LETTER A .. U+007A LATIN SMALL LETTER Z, or, in uppercase, U+0041
LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z, and U+002D
HYPHEN-MINUS (-).

Despite this, later in the document, in the description of the Attribute 
name state, no parse error is produced for this condition.  Nor does the 
current html5lib parser produce a parse error with this data.


Correct. We're not doing validation. Just tokenizing and building a tree.


In the process, parse errors are generally emitted in cases where 
individual characters are encountered which do not match the lexical 
grammar rules.  Just not in this case.


- Sam Ruby


Re: [whatwg] Attribute for holding private data for scripting

2007-04-11 Thread Sam Ruby

Anne van Kesteren wrote:
On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:
To give a specific example: say I make my own mjsml prefix with 
namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an 
mjsml:extension attribute using getAttribute("mjsml:extension"). 
In HTML5 UAs, I'd have to use 
getAttributeNS("http://example.org/mjsml", "extension"). And 
neither technique would work on both (at least as I understand your 
proposal).


By the way, the reason this is not consistent with XML is that it would 
be just as ok to use a different prefix. By basing this on the prefix 
(which is needed if you want this to be compatible with HTML, etc.) 
you're moving the semantics from the namespace to the prefix, which 
seems like a bad idea.


For starters, you are misattributing the quote above.  I did not write 
those words.


As to your point -- and as you so colorfully put it on your weblog -- 
"Standards Suck".  And in this case, I will argue that the current HTML5 
spec leads one to the conclusion that getAttribute("mjsml:extension") 
will work, at least for the HTML serialization of HTML5.


I did not write that quote.  I did not write -- or even contribute to -- 
that portion of the spec.


- Sam Ruby


Re: [whatwg] Attribute for holding private data for scripting

2007-04-11 Thread Sam Ruby

Anne van Kesteren wrote:
On Wed, 11 Apr 2007 13:53:21 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:



Anne van Kesteren wrote:
On Wed, 11 Apr 2007 13:40:39 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:
Per HTML5 section 8.1.2.3, however, such an attribute name would not 
be considered conformant.

 Yes, only attributes defined in the specification are conformant.


I was specifically referring to section 8.1.2.3.  Let me call your 
attention to the following text:


 Attribute names use characters in the range U+0061 LATIN SMALL
 LETTER A .. U+007A LATIN SMALL LETTER Z, or, in uppercase, U+0041
 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z, and U+002D
 HYPHEN-MINUS (-).


I think you should read the whole section. Allowing colons there 
wouldn't make a difference.


The document is a draft.  The subject line of this thread suggests that 
the WG is entertaining the notion of allowing at least one attribute 
which is not currently defined in the specification.  This suggests that 
the draft may need to change.  Drafts are like that.


Like others, I'm not convinced that the way forward is to allow a new 
attribute which has a micro-grammar for parsing what would be 
represented in the DOM essentially as a character blob.


Despite this, later in the document, in the description of the Attribute 
name state, no parse error is produced for this condition.  Nor 
does the current html5lib parser produce a parse error with this data.


Correct. We're not doing validation. Just tokenizing and building a 
tree.


In the process, parse errors are generally emitted in cases where 
individual characters are encountered which do not match the lexical 
grammar rules.  Just not in this case.


The above are not the grammar rules. They are (normative) guidelines for 
people writing or generating HTML. As far as I can tell there's no 
normative grammar. Just a way to construct a conforming string and a way 
to interpret a random string.



--Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/





Re: [whatwg] Attribute for holding private data for scripting

2007-04-11 Thread Sam Ruby

Anne van Kesteren wrote:


I think I'd rather have something simple such as 
prefix_name for extensions made by ECMAScript libraries, etc. (As 
opposed to an in scope xmlns:prefix=http://...; with prefix:name 
extensions which work differently in XML.) That would also work better 
for element extensions. Not any of this should be allowed, but there 
seems to be some desire to have an ability to introduce conforming 
extension elements / attributes which are implemented using a script 
library.


This leads into lots of tangents.

1) re: prefix_name - how are prefixes registered?  Henri is free to 
correct me if I am wrong, but I gathered that the requirement was for a 
bit of decentralized extensibility, i.e., the notion that anybody for 
any reason could defined an extension for holding private data; and 
furthermore could do so without undo fear of collision.


2) I assert that the existing DOM standard already defines a mechanism 
for decentralized extensibility.  Most relevant to the discussion at 
hand is the getAttributeNS method.  It may not be defined as clearly as 
it could be, but there does seem to be some clues which suggest what the 
original intent was, and the beginnings of an agreement that if more 
browsers were to conform to that intent, that would be a GOOD THING(TM).


3) There already is spec text which indicates how html5 defined elements 
are to be handled with respect to getElementsByTagNameNS.  Perhaps it 
would again be a GOOD THING(TM) if this was also codified for 
attributes.  I believe that this is consistent with what Maciej is 
calling for.


4) One thing that needs to be mentioned is that compliance to the DOM 
standard varies widely.  In the long term, perhaps browser vendors could 
do a better job of this, and perhaps the HTML5 effort can help put a 
focus on this need.  In the short term, however, this can be dealt with 
via JavaScript.  Encapsulating and dealing with browser 
incompatibilities is an all too common use case for JavaScript.


5) I'm not sure where you draw the conclusion that prefix:name 
extensions would work differently than in XML.  While Python's minidom 
does not appear to produce the desired results when I call 
getElementById, it otherwise seems to handle the document identically to 
the way Firefox does:


http://intertwingly.net/stories/2007/04/10/test.py
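A quick illustration of that minidom behavior, using the thread's hypothetical mjsml prefix and example.org namespace: given an XML parse, both the qualified-name lookup and the namespace-aware lookup find the attribute, which matches what Firefox's XHTML handling was reported to do:

```python
from xml.dom.minidom import parseString

# Hypothetical mjsml prefix/namespace from the thread's example.
doc = parseString(
    '<div xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:mjsml="http://example.org/mjsml" mjsml:extension="data"/>')
el = doc.documentElement

# minidom keeps the qualified name as the attribute's nodeName, so the
# plain getAttribute lookup works alongside getAttributeNS.
print(el.getAttribute('mjsml:extension'))                          # data
print(el.getAttributeNS('http://example.org/mjsml', 'extension'))  # data
```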

- Sam Ruby


Re: [whatwg] Attribute for holding private data for scripting

2007-04-10 Thread Sam Ruby

On 4/10/07, Simon Pieters [EMAIL PROTECTED] wrote:


Or allow any attribute that starts with x_ or something (to prevent
clashing with future revisions of HTML), as private attributes.


Instead of "starts with x_", how about "contains a colon"?

A conformance checker could ensure that there is a corresponding xmlns
declaration that applies here, and possibly even do additional
verification if it recognizes the namespace.

An HTML5 parser would, of course, recover from references to
undeclared namespaces, placing the entire attribute name (including
the prefix and the colon) into the DOM in such situations.


Re: [whatwg] Attribute for holding private data for scripting

2007-04-10 Thread Sam Ruby

On 4/10/07, Anne van Kesteren [EMAIL PROTECTED] wrote:

On Tue, 10 Apr 2007 20:21:27 +0200, Sam Ruby [EMAIL PROTECTED]
wrote:
 Or allow any attribute that starts with x_ or something (to prevent
 clashing with future revisions of HTML), as private attributes.

 Instead of starts with x_, how about contains a colon?

 A conformance checker could ensure that there is a corresponding xmlns
 declaration that applies here, and possibly even do additional
 verification if it recognizes the namespace.

 An HTML5 parser would, of course, recover from references to
 undeclared namespaces, placing the entire attribute name (including
 the prefix and the colon) into the DOM in such situations.

* That would be confusing to people familiar with XML;
* It would hinder the ability to exchange scripts between HTML and XML;
* It would create more differences between XML and HTML where less seems
   to be desired (trailing slash allowed, etc.).


How so?

The idea is to place these attributes into the DOM the same way as
they would be when parsed with an xml parser, for the cases where the
data happens to be namespace valid.  And to do what you would expect
in the cases where, for example, attribute values aren't quoted.  And
to follow the html5 credo of recover at all cost in cases where what
the user entered doesn't conform.

This would of course need to be spec'ed, AND compared against common
usage, AND prototyped; I simply ask that it not be rejected out of
hand.

- Sam Ruby


Re: [whatwg] Attribute for holding private data for scripting

2007-04-10 Thread Sam Ruby

Anne van Kesteren wrote:
On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:

How so?


I missed the part where you wanted to change existing HTML parsers. I 
thought Hixie pointed out earlier (by means of examples) why we can't 
have namespace parsing in HTML. I suppose we can discuss it again...


It is a recurring pattern.  The first instance was "we can't allow 
trailing slashes in tags", which was followed up by a carefully crafted 
and narrow set of exceptions, which was met with "that works" and was 
adopted.


So... while it is clearly true that the full extent of XML namespaces will 
never be supported in HTML5 (and for good reason), namespace qualified 
attributes allow extensibility in ways that prevent collisions.


One of the first questions that would need to be answered: are there any 
existing documents on the web which would be broken if the name placed 
into the DOM for attributes with names containing a colon, with an 
apparent prefix, and one that matched an enclosing xmlns: declaration 
were to be changed?


- Sam Ruby



Re: [whatwg] Attribute for holding private data for scripting

2007-04-10 Thread Sam Ruby

Maciej Stachowiak wrote:


On Apr 10, 2007, at 2:14 PM, Sam Ruby wrote:


Anne van Kesteren wrote:
On Tue, 10 Apr 2007 22:41:12 +0200, Sam Ruby [EMAIL PROTECTED] 
wrote:

How so?
I missed the part where you wanted to change existing HTML parsers. I 
thought Hixie pointed out earlier (by means of examples) why we can't 
have namespace parsing in HTML. I suppose we can discuss it again...


It is a recurring pattern.  The first instance was "we can't allow 
trailing slashes in tags", which was followed up by a carefully 
crafted and narrow set of exceptions, which was met with "that works" 
and was adopted.


So... while it is clearly true the full extent of XML namespaces will 
never be supported in HTML5 (and for good reason), namespace qualified 
attributes allow extensibility in ways that prevent collisions.


One of the first questions that would need to be answered: are there 
any existing documents on the web which would be broken if the name 
placed into the DOM for attributes with names containing a colon, with 
an apparent prefix, and one that matched an enclosing xmlns: 
declaration were to be changed?


I think the problem here isn't compatibility with existing content, but 
rather ability to use the feature in new web content while still 
gracefully handling existing user agents. We wrote up some design 
principles for the HTML WG based on the WHATWG's working assumptions 
which might make this point more clear: 
http://esw.w3.org/topic/HTML/ProposedDesignPrinciples. While "Don't 
Break The Web" is a goal, so is "Degrade Gracefully".


To give a specific example: say I make my own mjsml prefix with 
namespace "http://example.org/mjsml". In HTML4 UAs, I'd look up an 
mjsml:extension attribute using getAttribute("mjsml:extension"). In 
HTML5 UAs, I'd have to use getAttributeNS("http://example.org/mjsml", 
"extension"). And neither technique would work on both (at least as I 
understand your proposal).


Here's a page I constructed, and tested on Firefox:

http://intertwingly.net/stories/2007/04/10/test.html

This page is meant to be served as application/xhtml+xml.

Can you test it and see what results you get?  Then lets discuss further.

BTW, I intentionally don't have a completed proposal at this point.  We 
need to explore what works and what doesn't work further.


Now, we could extend getAttribute in HTML to do namespace lookup when 
given a name containing a colon and when namespace declarations are 
present, but then we would want to do it in XHTML as well. And using the 
short getAttribute call instead of a longer getAttributeNS with a 
namespace prefix might be unacceptable to XML fans.
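The split Maciej describes can be demonstrated with any namespace-aware DOM. A minimal sketch using Python's minidom as a stand-in (mjsml is the hypothetical prefix from the example; this is editorial illustration, not from the thread):

```python
# Sketch of the getAttribute vs. getAttributeNS split, using Python's
# minidom as a stand-in DOM (mjsml is the hypothetical example prefix).
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(
    "http://www.w3.org/1999/xhtml", "html", None)
root = doc.documentElement
root.setAttributeNS("http://example.org/mjsml", "mjsml:extension", "on")

# Qualified-name lookup, the only option in an HTML4-era UA:
assert root.getAttribute("mjsml:extension") == "on"
# Namespace-aware lookup, what the proposal would require:
assert root.getAttributeNS("http://example.org/mjsml", "extension") == "on"
```

A page that wants to run on both kinds of UA would have to try both calls.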


Regards,
Maciej


- Sam Ruby


Re: [whatwg] Pre element question

2007-01-19 Thread Sam Ruby

Ian Hickson wrote:

On Fri, 19 Jan 2007, Sam Ruby wrote:

People often code things like the following:

<pre>
one
two
three
</pre>

Visually, this ends up looking something like

+---+
|   |
| one   |
| two   |
| three |
+---+

with the following CSS rule:

pre { border: solid 1px #000; }

[in standards mode]


I couldn't reproduce this. In Firefox trunk, with:

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Cstyle%3Epre%20%7B%20border%3A%20solid%3B%20%7D%3C/style%3E%0A%3Cpre%3E%0Ax%0A%3C/pre%3E

...I get the leading newline dropped.


Presumably then this is yet another difference between 
application/xhtml+xml and text/html.


If it did do it, in HTML4, it would have been a standards mode bug (bug 
2750, which I filed back in 1999).


In HTML5, we're dropping that requirement, since everyone ignores it. 
However, we will, as you point out, have to introduce a special behaviour 
for a newline at the start of a pre element. IE actually does it for 
more than just the pre element (e.g. it does it for p, though not 
span) but compatibility with the Web only seems to require it for pre 
since that's all that the other browsers do it for.
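The special case Ian describes amounts to a one-character rule in the tree builder. A sketch (hypothetical helper, not spec text):

```python
# Sketch (hypothetical helper, not spec text): the special behaviour
# amounts to dropping one newline when it is the first character of the
# data that follows a <pre> start tag.
def pre_text(raw: str) -> str:
    """raw is the character data immediately following <pre>."""
    return raw[1:] if raw.startswith("\n") else raw

assert pre_text("\none\ntwo\n") == "one\ntwo\n"   # leading newline dropped
assert pre_text("one\ntwo\n") == "one\ntwo\n"     # otherwise unchanged
```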


Fixed.


Thanks!

For reference, the current (and presumably as of now no longer valid) 
behavior of html5lib is as follows:


#document
|  <!DOCTYPE HTML>
|  <html>
|    <head>
|      <style>
|        "pre { border: solid; }"
|    <body>
|      <pre>
|        "
x
"

- Sam Ruby


[whatwg] Standard DOM Serialization? [was :Common Subset]

2006-12-09 Thread Sam Ruby

Michel Fortin wrote:


I've started a wiki page about the common subset:
http://wiki.whatwg.org/wiki/Common_Subset


I'd like to explore this from a different angle.

Libraries (like html5lib) will likely provide a means to serialize a 
DOM, and will presumably have unit tests.


The question is: does it make sense to standardize what such a method 
produces?  HTML allows variations on the case of elements, single vs. 
double vs. no quoting of attributes, etc.


If such were standardized, how would the HTML5 canonical serialization 
differ from the XHTML5 canonical serialization (in fact, must they be 
different at all?)


In any case, a desirable feature of such a serialization would be the 
ability to round trip.  For HTML5, this would only apply to all valid 
HTML5 documents: as an example, one could artificially produce a DOM 
which contains a h1 inside the head element; if such a DOM were 
serialized and then parsed by an HTML5 parser, the DOM produced would 
differ, as well it should.
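The variation being described is easy to observe with a tokenizer. A sketch (editorial, not from the thread) using Python's html.parser: two fragments that differ in tag case and attribute quoting produce identical parse events, which is exactly why a canonical serialization would have to pick among equally valid forms:

```python
# Sketch: two syntactically different but equivalent fragments produce
# the same start-tag events, illustrating the serialization choices.
from html.parser import HTMLParser

class Events(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []
    def handle_starttag(self, tag, attrs):
        self.events.append((tag, sorted(attrs)))

def parse(markup):
    p = Events()
    p.feed(markup)
    return p.events

assert parse('<P CLASS=intro>') == parse("<p class='intro'>") \
    == [("p", [("class", "intro")])]
```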


If there is no interest in standardizing a serialization (or separate 
standard serializations for HTML5 and XHTML5), then this discussion 
belongs on [EMAIL PROTECTED] mailing list.


- Sam Ruby



Re: [whatwg] Standard DOM Serialization? [was :Common Subset]

2006-12-09 Thread Sam Ruby

Anne van Kesteren wrote:
On Sun, 10 Dec 2006 00:29:03 +0100, Sam Ruby [EMAIL PROTECTED] 
wrote:
If there is no interest in standardizing a serialization (or separate 
standard serializations for HTML5 and XHTML5), then this discussion 
belongs on [EMAIL PROTECTED] mailing list.


http://www.whatwg.org/specs/web-apps/current-work/#innerhtml


I assume that you are trying to tell me something, but I am too dumb to 
understand it.


That section says the innerHTML DOM attribute of all HTMLElement and 
HTMLDocument nodes returns A SERIALISATION of the node's children using 
the HTML syntax.   [emphasis added]


A given DOM may have multiple, valid, HTML5 serializations.

I am asking whether there is interest in identifying ONE standard 
serialization that everybody who wishes to comply with could do so.


- Sam Ruby


Re: [whatwg] Standard DOM Serialization? [was :Common Subset]

2006-12-09 Thread Sam Ruby

Henri Sivonen wrote:

On Dec 10, 2006, at 02:09, Sam Ruby wrote:

I am asking whether there is interest in identifying ONE standard 
serialization that everybody who wishes to comply with could do so.


Why? For digital signatures? For comparing parse trees from different 
parsers?


My train of thought started with the sharing of test cases, and when 
coupled with the discussion on the common subset; when put together I 
was wondering if there would be a relation between the two.


I (obviously) hadn't considered innerHTML.  *IF* there were interest in 
changing this (something which I presume is *NOT* the case) and *IF* a 
common subset between XHTML5 and HTML5 was viable (plausible but not 
certain) *THEN* the confusing difference in meaning between innerHTML in 
an XML and HTML context could be resolved.


All told, seems rather unlikely, so nevermind.

- Sam Ruby



Re: [whatwg] Inline SVG

2006-12-08 Thread Sam Ruby

Leons Petrazickis wrote:

On 12/7/06, Alexey Feldgendler [EMAIL PROTECTED] wrote:

On Mon, 04 Dec 2006 13:55:32 +0600, Ian Hickson [EMAIL PROTECTED] wrote:

   http://intertwingly.net/stories/2006/12/02/whatwg.logo

 Currently, there wouldn't be one. We could extend HTML5 to have some 
sort
 of way of doing this, in the future. (It isn't clear to me that we'd 
want

 to allow inline SVG, though. It's an external embedded resource, not a
 semantically-rich part of the document, IMHO.)

People will do inline SVG anyway. If there won't be a straightforward 
way to do this, authors will use all kinds of dirty hacks, such as 
data: URIs and DOM manipulation.


Personally, I don't think SVG content is inappropriate for HTML 
documents. It is no more presentational than the text itself: HTML 
doesn't try to structure natural language sentences by breaking them 
into grammar constructs, so an SVG image could be thought of as just 
an atomic phrase which only has defined semantics as whole.


How about this for HTML5:
<object type="image/svg+xml">
   <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
   <circle cx="100" cy="50" r="40" stroke="black"
stroke-width="2" fill="red"/>
   </svg>
</object>

And this for XHTML5:
<object type="image/svg+xml">
<![CDATA[
   <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
   <circle cx="100" cy="50" r="40" stroke="black"
stroke-width="2" fill="red"/>
   </svg>
]]>
</object>

If that's over-complicating the semantics of <object>, we could
introduce an inline <xml> tag that's similar to the inline <script>
and <style> tags. It would have a type= attribute to specify the
mimetype, and its contents would be within a CDATA block in XHTML5.


First, why the different syntax, and in particular, why CDATA?

One of the key advantages of SVG, as it exists today, in XHTML is that 
the SVG elements are in the DOM.  Not as an opaque blob, but as a set of 
scriptable and stylable elements.  Take a look at the following:


http://developer.mozilla.org/en/docs/SVG_In_HTML_Introduction

- Sam Ruby


Re: [whatwg] Test cases for parsing spec (Was: Re: Provding Better Tools)

2006-12-07 Thread Sam Ruby

Karl Dubost wrote:

Sam,

Le 6 déc. 2006 à 23:13, Sam Ruby a écrit :
My original interest was to write a replacement for Python's SGMLLIB, 
i.e., one that was not based on the theoretical ideal of how SGML 
vocabularies work, but one based on the practical notion of how HTML 
actually is parsed.


I'm not sure sgmllib would be the best target. Specifically if it's used 
in many other products. But maybe you are talking about a new library 
altogether.


http://docs.python.org/lib/module-sgmllib.html
8.2 sgmllib -- Simple SGML parser

This module defines a class SGMLParser which serves as the basis for
parsing text files formatted in SGML (Standard Generalized Mark-up
Language). In fact, it does not provide a full SGML parser -- it only
parses SGML insofar as it is used by HTML, and the module only exists
as a base for the htmllib module. Another HTML parser which supports
XHTML and offers a somewhat different interface is available in the
HTMLParser module.

It seems a better candidate.

http://docs.python.org/lib/module-HTMLParser.html
8.1 HTMLParser -- Simple HTML and XHTML parser

 New in version 2.2.

This module defines a class HTMLParser which serves as the basis for
parsing text files formatted in HTML (HyperText Mark-up Language) and
XHTML. Unlike the parser in htmllib, this parser is not based on the
SGML parser in sgmllib.

I'm adding them to the list of HTML parsers.
http://esw.w3.org/topic/HTMLAsSheAreSpoke


htmllib is both based on sgmllib (and shares some of the same issues) 
and is a bit draconian.  It is less suitable for consuming html as 
practiced than sgmllib.


I was originally thinking about creating a htmllib2 much like there is a 
urllib2 (in the library) and an httplib2 (by Joe Gregorio).  Though it 
now looks like it makes more sense to name it htmllib5, and potentially 
join forces with others who (may) have similar interests.


- Sam Ruby


Re: [whatwg] several messages about XML syntax and HTML5

2006-12-07 Thread Sam Ruby

Ian Hickson wrote:


The pingback specification does exactly what the trackback 
specification does, but without relying on RDF blocks in comments or 
anything silly like that. It just uses the Microformats approach, and 
is far easier to use, and doesn't require any additional bits to add 
to HTML.

[offtopic]
I'd never heard of pingback. I googled for it and found your website 
first, but couldn't find the RFC number.  You have a copyright of 2002, 
and it appears that Trackback was also developed in 2002. So are you 
implying they should have used Pingback instead?  It appears they were 
developed in parallel?


They were made around the same time (Trackback was invented first). My 
point was just that Trackback is not a good example of why you need more 
attributes in HTML, since there are equivalent technologies that do it 
with existing markup and no loss of detail.


I disagree.  The pingback specification does NOT do exactly what the 
trackback specification does.


Pingback discovery works for any media type, but does not deal with any 
granularity smaller than a URL.


Trackback discovery is limited to (X)HTML, but can deal with multiple 
entries on a single page.  Here's an example:


  http://scott.userland.com/2005/11/09.html
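The media-type point can be made concrete. A sketch (hypothetical helper, not from the thread): pingback discovery can use an HTTP header, so it works for any media type, while the <link> fallback, like trackback's discovery, requires parsing an (X)HTML body:

```python
# Sketch (hypothetical helper): pingback discovery via the X-Pingback
# HTTP header works for any media type; the <link rel="pingback">
# fallback requires an (X)HTML body.
import re
from typing import Optional

def pingback_endpoint(headers: dict, body: str) -> Optional[str]:
    if "X-Pingback" in headers:
        return headers["X-Pingback"]
    m = re.search(r'<link rel="pingback" href="([^"]+)"', body)
    return m.group(1) if m else None

assert pingback_endpoint(
    {"X-Pingback": "http://example.org/xmlrpc"}, "") == "http://example.org/xmlrpc"
assert pingback_endpoint(
    {}, '<link rel="pingback" href="http://example.org/xmlrpc">'
) == "http://example.org/xmlrpc"
```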

- Sam Ruby


Re: [whatwg] Sanctity of MIME types

2006-12-06 Thread Sam Ruby

Ian Hickson wrote:

On Mon, 4 Dec 2006, Sam Ruby wrote:
Independent of what the specs say *MUST* happen, I'd like people to 
bring up one or more browsers with a URL from this list, and see if the 
browser asked them if they wanted to subscribe.  Subscribe is not a 
normal feature associated with text/html, which is the Content-Type that 
you will find for each.


Actually, this is what Web Apps 1.0 will require, I just haven't written 
the sniffing algorithm yet. This is the placeholder section:


   http://www.whatwg.org/specs/web-apps/current-work/#navigating

Note the mention of RSS/Atom in the first red box.


I have a request.  It would be nice if the sniffing algorithm made an 
exception for text/plain.  Use case:


http://svn.smedbergs.us/wordpress-atom10/tags/0.6/wp-atom10-comments.php
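To illustrate why the exception matters, here is a naive byte-level sniffer of the kind being discussed (a hypothetical heuristic, NOT the eventual spec algorithm): it looks only at the leading bytes, so it cannot distinguish a real feed from the same bytes intentionally served as text/plain, as in the use case above:

```python
# Sketch (hypothetical heuristic, NOT the spec algorithm): naive feed
# sniffing on leading bytes, which would also fire on feed markup that
# the server deliberately labels text/plain.
def looks_like_feed(body: bytes) -> bool:
    head = body[:512].lstrip()
    return head.startswith((b"<rss", b"<feed", b"<rdf:RDF"))

assert looks_like_feed(b"<rss version='2.0'>...</rss>")
assert looks_like_feed(b"<feed xmlns='http://www.w3.org/2005/Atom'/>")
assert not looks_like_feed(b"Plain prose about feeds.")
```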

- Sam Ruby



Re: [whatwg] Test cases for parsing spec (Was: Re: Provding Better Tools)

2006-12-06 Thread Sam Ruby

James Graham wrote:

Ian Hickson wrote:

On Tue, 5 Dec 2006, James Graham wrote:
As someone in the process of implementing a HTML5 parser from the 
spec, my _only_ complaint so far is that there aren't (yet) any 
testcases.


If you could get together with the other people writing parsers and 
come up with a standard format for test cases, that would be really 
helpful. I have a few tests I could contribute, but I'd need a format 
to provide them in (they're currently not in a form that would be 
useful to you).


Did you have a list for implementers somewhere? I think it would be a 
very worthwhile effort to come up with a set of implementation 
independent, self-describing (i.e. where the testcase itself contains 
the expected parse tree in some form), testcases - but I think the 
discussion should be on a separate list.


Count me in.  This is actually close to the original reason why I 
originally subscribed to this list.  If given a few tests, I could 
convert them into a useful form, and this form could serve as a model for 
future tests.


My original interest was to write a replacement for Python's SGMLLIB, 
i.e., one that was not based on the theoretical ideal of how SGML 
vocabularies work, but one based on the practical notion of how HTML 
actually is parsed.


My background: I originally wrote most of the back end for the feed 
validator, and continue to be its primary maintainer.  I also contribute 
to the universal feed parser.


The format of the test cases for both validator and parser are very 
similar: a standalone document with a structured comment.  In the 
structured comment is an assertion.  In the validator's case, it 
describes a message that is, or is not, expected to occur.  In the 
parser's case, it describes what amounts to an XPath expression.  I do 
believe that a similar approach could work here, not for 100% of the 
test cases, but close enough to handle the bulk of the cases.  The rest 
can be handled separately.


Additional things like mime type overrides could also be specified in 
this header.
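A minimal sketch of such a self-describing test file (the comment format here is hypothetical, modeled loosely on the feedvalidator/feedparser suites; editorial illustration, not from the thread):

```python
# Sketch of a self-describing test case: the document carries its own
# assertion in a structured comment, which a harness extracts.
import re

doc = """\
<!--
Description: newline after a pre start tag is dropped
Expect: count(//pre/text()) = 1
-->
<!DOCTYPE html><pre>
x</pre>
"""

header = re.search(r"<!--\n(.*?)\n-->", doc, re.S).group(1)
meta = dict(line.split(": ", 1) for line in header.splitlines())
assert meta["Expect"] == "count(//pre/text()) = 1"
```

The harness would parse the rest of the document and evaluate the expression in "Expect" against the resulting tree.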


Samples:

  http://feedvalidator.org/testcases/
  http://feedparser.org/tests/

My goal would be to produce something that I could use within the 
feedparser (and therefore, planet).


- Sam Ruby


Re: [whatwg] Windows-1252 entities

2006-12-06 Thread Sam Ruby

Anne van Kesteren wrote:

The section on handling entities should contain the following mapping:

128: 8364,
129: 65533,
130: 8218,
131: 402,
132: 8222,
133: 8230,
134: 8224,
135: 8225,
136: 710,
137: 8240,
138: 352,
139: 8249,
140: 338,
141: 65533,
142: 381,
143: 65533,
144: 65533,
145: 8216,
146: 8217,
147: 8220,
148: 8221,
149: 8226,
150: 8211,
151: 8212,
152: 732,
153: 8482,
154: 353,
155: 8250,
156: 339,
157: 65533,
158: 382,
159: 65533

... mostly for legacy reasons.
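As a cross-check (editorial, not from the thread): decoding each byte in the 0x80–0x9F range as Windows-1252 reproduces the table above, with bytes undefined in cp1252 falling back to U+FFFD (65533). The one divergence is byte 159, which Windows-1252 defines as 376 (Ÿ), matching Sam's amendment below rather than the 65533 in the list:

```python
# Cross-check: derive the remapping table by decoding bytes 0x80-0x9F
# as Windows-1252; undefined bytes fall back to U+FFFD (65533).
table = {}
for n in range(128, 160):
    ch = bytes([n]).decode("cp1252", errors="replace")
    table[n] = ord(ch)

assert table[128] == 8364   # EURO SIGN
assert table[147] == 8220   # LEFT DOUBLE QUOTATION MARK
assert table[129] == 65533  # undefined in cp1252
assert table[159] == 376    # LATIN CAPITAL LETTER Y WITH DIAERESIS
```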


+1, though I would suggest one change:

  159: 376   // &Yuml;

- Sam Ruby


Re: [whatwg] Sanctity of MIME types

2006-12-06 Thread Sam Ruby

Robert Sayre wrote:

On 12/5/06, Sam Ruby [EMAIL PROTECTED] wrote:


I have a request.  It would be nice if the sniffing algorithm made an
exception for text/plain.


It would be nice, but


Use case:

http://svn.smedbergs.us/wordpress-atom10/tags/0.6/wp-atom10-comments.php


Fixed in FF 2.0.0.1, btw. text/plain sniffing in Mozilla dates from this 
bug:


https://bugzilla.mozilla.org/show_bug.cgi?id=220807

1.120 [EMAIL PROTECTED] 2004-01-07 19:56
Work around misconfiguration in default Apache installs that makes it 
claim all sorts of stuff as text/plain.

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/uriloader/base/nsURILoader.cpprev=1.120#289 


How was it fixed?  Both so that Ian's eventual text can be consistent 
with the fix, and for my edification as I would love to be able to 
directly view my test cases again:


http://feedvalidator.org/testcases/atom/3.1.1.1/escaped_text.xml

- Sam Ruby


Re: [whatwg] several messages about XML syntax and HTML5

2006-12-06 Thread Sam Ruby

Ian Hickson wrote:

On Tue, 5 Dec 2006, Sam Ruby wrote:

   xmlns attributes are invalid on HTML elements except html, and
   when found on unrecognized [elements] imply style=display:none
   unless you recognize the value of this attribute.
There are millions of documents that would be broken by such a rule, 
so browser vendors couldn't actually deploy that, sadly. :-(

Can you identify three independently produced ones?


Sure. Here's one (many pages on that site have this problem):

   http://forskningsbasen.deff.dk/ddf/rec.external?id=auc107991

It has a block at the bottom that says:

   <copyright xmlns="" xml:lang="en">...<br>...<br>...</copyright>

(Note the cunning mixing of XML-like syntax with HTML-like syntax.)


Another:

   http://www.cms.alaswaq.net/save_print.php?save=1cont_id=4372

A large chunk of the text on this page is inside elements with xmlns="" 
set (from what I can tell, all the text above the double up chevron button 
thing is inside elements with xmlns="").



A third one:

   http://www.homeaway.com/Varna/s/1453/fa/find.squery

This one has markup like this (I can just imagine how this happened):

   <span>(<?xml version="1.0" encoding="UTF-8"?>
   <fromRecord xmlns="http://wvrgroup.com/propertyom">1</fromRecord> - 
   <?xml version="1.0" encoding="UTF-8"?>
   <toRecord xmlns="http://wvrgroup.com/propertyom">10</toRecord> of <?xml 
   version="1.0" encoding="UTF-8"?>
   <hitCount xmlns="http://wvrgroup.com/propertyom">24</hitCount>)</span>

Again, important text (it's the (1 - 10 of 24) text at the top right, 
clearly intended to be visible), which is wrapped in elements with 
xmlns="" attributes.



That's three. I found dozens more (and I only checked a few thousand 
pages at random), including:


   http://ise.uvic.ca/Theater/sip/person/7639/main.html
   The entire header text (John Epstein) on that page is all inside an
   element <display_name> which has an xmlns="" attribute.
   
   http://global.yesasia.com/kr/artIdxDept.aspx/section-videos/code-c/aid-39826/

   A bunch of snippets are inside elements with xmlns="".

   http://intermezzo-weblog.blogspot.com/2005/05/o-caso-rondnia-e-mais.html
   Not clear if it was intentional here, but some of the visible text at 
   the bottom right is in an xmlns="" block.


   http://projects.teknowledge.com/DAML/Corpus/W/wrestling_match.html
   Unclear what they thought was going on here too, but the text at the 
   top is inside an unknown element with xmlns="" set.


   http://194.7.45.68/fr/item.php?text_id=51813keyw=Snoop+Dogg
   There are eight bazillion xmlns="" attributes in this file, but the 
   copyright in particular uses an unknown HTML element with xmlns="".


...and I'll stop here, because that should be enough to convince you. :-)


The common pattern that I see is that xmlns="".

- Sam Ruby


Re: [whatwg] several messages about XML syntax and HTML5

2006-12-06 Thread Sam Ruby

Ian Hickson wrote:

On Wed, 6 Dec 2006, Sam Ruby wrote:

The common pattern that I see is that xmlns="".


It's certainly the more common value, but it is by no means the only one, 
as you will see if you examine the various examples I gave in more detail.


My bad.  Point made.

- Sam Ruby



Re: [whatwg] several messages about XML syntax and HTML5

2006-12-05 Thread Sam Ruby

Ian Hickson wrote:
I don't see any documentation that requires XHTML to not support 
document.write, but it certainly is a reality that nobody has done so.


   http://www.whatwg.org/specs/web-apps/current-work/#document.write1

(I'd like to make it work, but can't work out how to specify it. If you 
have any ideas for actual concrete text for the spec, please let me know.)


I would think that steps 2-7 of the innerHTML algorithm specified 
immediately below the target of this link would be preferred over 
always raise an exception.


Again, while I have a great respect for you and your work, you are hardly 
representative of the majority of Web authors, which is who I have to 
primarily take into account when it comes to the spec.


Agreed.

- Sam Ruby



Re: [whatwg] several messages about XML syntax and HTML5

2006-12-05 Thread Sam Ruby

Ian Hickson wrote:

On Tue, 5 Dec 2006, Sam Ruby wrote:

Case in point:

   http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble

In IE, there's some stray XHTML HTML and XHTML HTML XML text. This 
isn't acceptable to most people. It certainly isn't something that it 
would make sense to encourage. The worst possible outcome here would 
be for browsers like IE to start trying to parse this SVG in 
text/html, because the lack of any sensible parsing rules for it would 
guarentee that we're faced with even more tag soup, thus undoing all 
the work that the HTML5 spec is trying to do to get us past that.

You are aware that I like to tweak IE users, right?

With the current technology, this could have been avoided with a single 
div and two lines of CSS.  And I am most capable of doing that.


But that wouldn't help, e.g., Lynx users.


Over a period of years, I would think that a requirement like the one 
below could be phased in (presuming that one could be found to work).  I 
have no expectation that Lynx would ever support a real XHTML mode.



In the longer run, I do believe that an architected simple rule like:

   xmlns attributes are invalid on HTML elements except html, and
   when found on unrecognized attributes imply style=display:none
   unless you recognize the value of this attribute.

... would channel those with insane desires to make extensions into 
doing so in a manner that is harmless.  Such a rule might take a year or 
two to get widely deployed, but the worst feet-draggers won't be 
affected any worse than they were in the days when table was young.


There are millions of documents that would be broken by such a rule, 
so browser vendors couldn't actually deploy that, sadly. :-(


Can you identify three independently produced ones?

BTW, I deeply respect the pushback that you give to everybody who thinks 
they want to have a say in the future of HTML.


- Sam Ruby


Re: [whatwg] several messages about XML syntax and HTML5

2006-12-04 Thread Sam Ruby
 that is meaningful to me.


It would be better to have hard data to work with, rather than having to 
rely on our opinions of this. My own research does not suggest that most 
authors use tools. That over three quarters of pages have major syntactic 
errors leads me to suspect that tools are not going to save the syntax.


+1.

I'll add that most tools are created by fallible humans with only a 
shallow understanding of the relevant specifications.



On Sat, 2 Dec 2006, Robert Sayre wrote:
It would not take much to add an "if the element has an 'xmlns' 
attribute" check to the "A start tag token not covered by the previous 
entries" state in the "How to handle tokens in the main phase" section of 
the document.


This would break millions of pages, sadly. There are huge volumes of pages 
that have bogus xmlns= attributes with all kinds of bogus values on the 
Web today. I worked for a browser vendor in the past few years that tried 
to implement xmlns= in text/html content, and found that huge amounts of 
the Web, including many major sites, broke completely. We can't introduce 
live xmlns= attributes to text/html.


All I ask is that you keep an open mind while we collectively explore 
whether there are extremely selective and surgical changes that can be 
made to html5 -- like the change to allow empty element syntax only on a 
handful of elements.



On Sat, 2 Dec 2006, Sam Ruby wrote:

The question is: what would the HTML5 serialization be for the DOM which is
internally produced by the script in the following HTML5 document?

  http://intertwingly.net/stories/2006/12/02/whatwg.logo


Currently, there wouldn't be one. We could extend HTML5 to have some sort 
of way of doing this, in the future. (It isn't clear to me that we'd want 
to allow inline SVG, though. It's an external embedded resource, not a 
semantically-rich part of the document, IMHO.)


When you couple this answer with the concept of a generalized [X]HTML 
toolchain, the inevitable tendency would be to want a HTML5 deserializer 
on one end and an XHTML5 serializer on the other end.  And not just any 
XML deserializer, but one that limited itself to a subset of XML that 
could safely be processed by HTML5 deserializers.


If the spec explicitly disallows things useful to this toolchain, then 
the opportunity exists for somebody to move the discussion for what 
constitutes interop from "what does the spec say" to "what does this 
toolchain support".


As the set of DOMs that have a defined and interopable HTML5 
serialization grows, this picture changes to one in which having an 
HTML5 deserializer on one end and an HTML5 serializer on the other is 
increasingly attractive.



On Sun, 3 Dec 2006, Sam Ruby wrote:

In the hopes that it will bring focus to this discussion:

http://wiki.whatwg.org/wiki/HtmlVsXhtml


This has now been updated with a more complete list of differences.


Thanks!

- Sam Ruby


Re: [whatwg] several messages about XML syntax and HTML5

2006-12-04 Thread Sam Ruby

James Graham wrote:

Elliotte Harold wrote:

That means I have to send text/html to browsers (because that's the 
only thing they understand) and let my clients ignore that hint.


No.

As I understand it, the full chain of events should look like this:

 [Internal data model in server]
|
|
   HTML 5 Serializer
|
|
{Network}
|
|
  HTML 5 Parser
|
|
 [Whatever client tools you like]

The only technical issue is that your HTML5 parser has to produce a data 
format that your other client tools like. If this involves the 
construction of an XML-like tree that's fine. But you should _never_ try 
to use an XML parser to produce the tree because it _will_ break with 
conforming HTML5 documents.


Excellent ASCII art.

This only works if the internal-data-model to HTML5 conversion is 
lossless.  If it is not, people will find ways with structured comments 
or by creating intentionally invalid HTML5 and relying on the error 
recovery that is either prescribed or observed to be commonly practiced.


- Sam Ruby



[whatwg] Sanctity of MIME types

2006-12-04 Thread Sam Ruby

Here's a random half dozen examples, picked to show a bit of diversity:

  http://beta.versiontracker.com/mac/osx/home-edu/updates.rss
  http://city.piao.com.cn/rss.asp?85
  http://feuerwehr-melle-de.server13031.isdg.de/index.php?id=199
  http://hesten.innit.no/hru/rss.php?START=0STOP=3
  http://httablo.hu/pages/rss.php
  http://skopjeclubbing.com.mk/rss_djart.asp

Independent of what the specs say *MUST* happen, I'd like people to 
bring up one or more browsers with a URL from this list, and see if the 
browser asked them if they wanted to subscribe.  Subscribe is not a 
normal feature associated with text/html, which is the Content-Type that 
 you will find for each.


The point is not to label these guys bozos (as I said in previous 
messages, bozos outnumber you).  But to get you to consider what 
browsers can, and will, do.


In these days of GreaseMonkey and its brethren, the client is king.

 - - -

Where does this leave HTML5?  I am of the opinion that HTML5 should 
describe a set of rules that a compliant HTML5 parser should follow. 
The MIME types and DOCTYPEs specified in the document should be 
recommendations.  Something outside of the parser may choose to dispatch 
based on this information, but that's outside of the control of the 
parser.  IMHO, the parser itself shouldn't complain when it finds an 
HTML4 DOCTYPE, or an XHTML2 DOCTYPE for that matter.


Of course, a lot more HTML4 documents would be valid HTML5 than XHTML 2 
documents.


- Sam Ruby


[whatwg] wiki: HtmlVsXhtml

2006-12-03 Thread Sam Ruby

In the hopes that it will bring focus to this discussion:

http://wiki.whatwg.org/wiki/HtmlVsXhtml

- Sam Ruby


Re: [whatwg] Valid Unicode

2006-12-02 Thread Sam Ruby

On 12/1/06, Elliotte Harold [EMAIL PROTECTED] wrote:

Henri Sivonen wrote:

 6. Are noncharacters U+FDD0..U+FDEF allowed (?)
 7. Are the noncharacters from the last two characters of each plane
 allowed (?)

 I don't have particularly strong feelings here. Putting those characters
 is HTML is a bad idea, but allowing them is not a problem for HTML5 to
 XHTML5 conversion and they aren't a common problem like C1 controls.

FFFE and FFFF are specifically forbidden by XML so they should probably
be forbidden here too. I think the others are allowed.


Unicode (not XML) reserves U+D800 – U+DFFF as well as U+FFFE and U+FFFF.

XML 1.0 only allows the following characters:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

It would not be wise for HTML5 to limit itself to the more constrained
character set of XML.  In particular, the form feed character is
pretty popular,

This is yet another case where "take HTML5, read it into a DOM, and
serialize it as XML, and voilà: you have valid XHTML" doesn't work.
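The form-feed point is easy to verify with stdlib tools (an editorial sketch, not from the thread): the character survives an HTML parse but is rejected by an XML parser.

```python
# Sketch: U+000C (form feed) passes through an HTML tokenizer but is
# not a legal XML 1.0 character, so XML serialization cannot carry it.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

html = "<p>one\x0ctwo</p>"

HTMLParser().feed(html)   # no error: HTML tokenizers accept U+000C

try:
    ET.fromstring(html)
    well_formed = True
except ET.ParseError:
    well_formed = False
assert not well_formed    # XML 1.0 forbids U+000C entirely
```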


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/


- Sam Ruby


Re: [whatwg] markup as authored in practice

2006-12-02 Thread Sam Ruby

Robert Sayre wrote:


SVG and MathML have a DOM. It wouldn't be that hard to serialize it as 
HTML5.


Robert, if you will permit me, I would like to recast that into the form 
of a question, jeopardy style.


The question is: what would the HTML5 serialization be for the DOM which 
is internally produced by the script in the following HTML5 document?


  http://intertwingly.net/stories/2006/12/02/whatwg.logo

Any takers?

- Sam Ruby

P.S.  That script, complete with indentation and readable variable 
names, is still an order of magnitude smaller than


  http://whatwg.org/images/logo

People could save bandwidth and reduce the number of HTTP requests (and 
not have to worry about hotlinking!) by dropping this script into their 
pages (of course, they could save even more bytes if there were a direct 
HTML5 serialization of this DOM, hence the question).


P.P.S.  I realize that not all browsers support these relatively new 
elements.  It is my understanding that HTML5 will be introducing new 
elements too.


Re: [whatwg] markup as authored in practice

2006-12-02 Thread Sam Ruby

On 12/2/06, Robert Sayre [EMAIL PROTECTED] wrote:


I don't think we need to settle this issue in December 2006, but I do
think there is ample evidence of interoperable but undocumented
behavior that HTML5 implementors will need to consider.


Does the WHATWG have a process for capturing unresolved issues that
need to be worked?

- Sam Ruby


Re: [whatwg] markup as authored in practice

2006-12-02 Thread Sam Ruby

On 12/2/06, David Hyatt [EMAIL PROTECTED] wrote:

Shipping Safari has no SVG support.  WebKit nightlies do.  That's the
only reason the logo now renders correctly in the nightlies -- so
that particular file is completely irrelevant to this discussion.


I'm confused.  Which file?  And why is it completely irrelevant?


dave


Re: [whatwg] Valid Unicode

2006-12-02 Thread Sam Ruby

On 12/2/06, Henri Sivonen [EMAIL PROTECTED] wrote:

On Dec 2, 2006, at 18:24, Sam Ruby wrote:

 It would not be wise for HTML5 to limit itself to the more constrained
 character set of XML.  In particular, the form feed character is
 pretty popular,


BTW, I copy and pasted the wrong table.  The characters I mentioned
were discouraged (and include such things as Microsoft smart quotes
mislabeled as iso-8859-1).  The actual allowed set in XML 1.0 is as
follows:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

For XML 1.1 the list is as follows:

[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


 This is yet another case where "take HTML5, read it into a DOM, and
 serialize it as XML, and voilà: you have valid XHTML" doesn't work.

What I am advocating is making sure that *conforming* HTML5 documents
can be serialized as XHTML5 without dataloss.


Then you will also need to disallow newlines in attribute values.

In any case, I understand the desire; my read is that the WG's desire
for backwards compatibility is higher.  Limiting the character set to
the allowable XML 1.1 character set should not be a problem for
backwards compatibility purposes.
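
[Editor's note: to make the character-set comparison above concrete, here is a
small stdlib-only Python sketch, not part of the original mail, encoding the
XML 1.0 "Char" production quoted earlier in this thread.  It illustrates the
form-feed point: form feed is tolerated in HTML but is not a legal XML 1.0
character.]

```python
# The XML 1.0 "Char" production, as ranges of code points:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_RANGES = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch is allowed in an XML 1.0 document."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_RANGES)

# Form feed (U+000C) is the example from this thread: it survives in HTML
# content in the wild, but serializing it to XML yields an ill-formed document.
assert not is_xml10_char("\x0c")   # form feed: not XML 1.0
assert is_xml10_char("\t")         # tab: allowed
assert not is_xml10_char("\ufffe") # noncharacter: not allowed
```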

- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-30 Thread Sam Ruby

Henri Sivonen wrote:



I don't think it has any actual technical merit


OTOH, the blog.whatwg.org WordPress lipsticking drill was a total waste 
of time from a technical point of view. It was purely about public 
relations and politics.


As an alternative to being perceived as a lipsticking drill, I would 
prefer that others felt that an important part of the spec authoring 
process includes what amounts to a feasibility study and hands on 
experimentation with extant authoring tools.


I apologize if I've caused any ill will.

I do believe that efforts to keep blog.whatwg.org and other sites to be 
valid relative to the current draft of HTML5 are important in order to 
keep perspective and to provide an example for others to learn from.


Finally, I will express a bit of disappointment at seeing the WordPress 
folks prematurely being labeled "bozos", and am disappointed to see 
portions of this discussion framed in terms that border on the 
discussions of epic battles with Zeldman.


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-30 Thread Sam Ruby

On 11/30/06, Michel Fortin [EMAIL PROTECTED] wrote:



We can't really have a document that is both HTML5 and XHTML5 at the
same time if we keep the <!DOCTYPE HTML> declaration, however.



Why not?

- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Benjamin Hawkes-Lewis wrote:
On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: 


I believe that I could modify my weblog to be simultaneously both
HTML5 and XHTML5 compliant, modulo the embedded SVG content, something
that would need to be discussed separately.


I think having /two/ different serializations of Web Forms 2.0/Web
Applications 1.0 is bad enough. To try and cater to what's effectively a
third serialization compatible with both parsing methods is to reinvent
the XHTML 1.0 as text/html mess. Serializing to multiple formats from
a single source is, I think, a better model. Especially as embedded
content may need different treatment too.


That was not the intent of my suggestion.  I am suggesting that HTML5 
standardize on *one* format.  One that comes as close as humanly 
possible to capturing the web as it is practiced in all of its glorious 
and often quite messy detail.  Those that wish to serialize the DOM in 
other formats are certainly free to do so, but those formats aren't HTML5.


I do have an opinion on how embedded content should be handled, but I am 
trying to focus on one issue at a time.  If you would like a preview, 
take a peek at:


http://planet.intertwingly.net/
http://planet.intertwingly.net/top100/
http://golem.ph.utexas.edu/~distler/planet/

Those three planets take input from a number of frankly grungy input 
sources and consistently produce well formed XML that often contain 
embedded MathML or SVG content.


You are, of course, free to explore those pages and others; but, for 
now, I would like to focus on one question:


If HTML5 were changed so that these elements -- and these elements
alone -- permitted an optional trailing slash character, what
percentage of the web would be parsed differently?  Can you cite
three independent examples of existing websites where the parsing
would diverge?

Lachlan's observations [...] on what it would take to 
change the popular WordPress application to produce HTML5 compliant
output


As blogging software goes, WordPress is pretty good. But then blogging
software is generally atrocious when it comes to markup. Trying to
design an (X)HTML spec for a group of PHP developers who think it's
persuasive to bang on about their dedication to web standards while
serving their project's non-validating XHTML 1.1 homepage as text/html
is doomed to failure.


I'm pretty sure that the Mozilla home page was not created with 
WordPress, and I'm absolutely sure that the Microsoft home page was not.


Conversely, if the major browser vendors have to chose between the web 
as it is commonly practiced, and a spec that doesn't reflect that 
reality, which one do you think they will chose?


I'll argue that the choices aren't as black and white as either the 
question you posed above, or even the one that I did.


No matter what the WHATWG spec says, each vendor will independently make 
a cost/benefit analysis as to how they should treat trailing slashes in 
elements like <img>.


But before they do, this work group certainly can anticipate that 
question.  What is the cost of accepting trailing slashes on elements 
which are always defined with a content model of "empty", except when 
found in the "Attribute value (unquoted)" state?  What sites would be parsed 
differently based on this change?  Are those differences in line with 
how existing browsers actually behave, or at odds with this behavior?


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Lachlan Hunt wrote:

Sam Ruby wrote:
In HTML5, there are a number of elements with a content model of 
"empty": area, base, br, col, command, embed, hr, img, link, meta, and 
param.


If HTML5 were changed so that these elements -- and these elements 
alone -- permitted an optional trailing slash character, what 
percentage of the web would be parsed differently?  Can you cite three 
independent examples of existing websites where the parsing would 
diverge?


If it's only allowed on "empty" elements (now known as "singleton 
elements" in the spec) then this isn't about changing the handling, it's 
just about defining what is and is not conforming.


Exactly.

I do not think it's a good idea to make the trailing slash conforming. 
Although it is harmless, it provides no additional benefit at all and it 
creates the false impression that the syntax actually does something.


The fact is that authors already try things like <div/>, <p/> and even 
<a/>.  I've seen all of those examples in the wild.  See, for instance, 
the source of the XML 1.0 spec (and many others) which claim to be XHTML 
as text/html, littered with plenty of <a/> tags all throughout.


If these are common, and implemented interoperably, then what is the 
harm?  An example of something that is NOT implemented interoperably is 
<script src=.../>.


In my book, a document that states that it always is a parse error to do 
something despite abundant evidence to the contrary is not as useful as 
one that says "here are the places where it works, and here are the 
places where it does not."


I've even come across various authors either thinking that does work, or 
(when they find out the truth) wondering why it doesn't.  It's not a 
good idea to confuse them any more by giving the impression that it 
works for some elements but not others.  It's better to just say it 
doesn't work at all and forbid it in all cases.


That's a slippery slope.  At the extreme, it leads to XHTML 2.0, where 
features that are thought to be problematic are removed.  Think of the 
children.


By contrast, in HTML5, I see a document that attempts to be considerably 
less judgemental, and considerably more resilient.  Inside the comments 
in the HTML 5 document I see statistics lovingly cited.  Example:


<!-- As of
2005-12, studies showed that around 0.2% of pages used the
<image> element. -->

What percentage of pages use <img/> constructs?

and all this is coupled with Lachlan's observations[3] on what it 
would take to change the popular WordPress application to produce 
HTML5 compliant output.


That just illustrates a fundamental flaw in the way WordPress has been 
built.  It is a perfect example of a CMS "built by a bunch of bozos" [1] 
and cannot be used as an excuse for allowing the syntax.


Be careful when you patronize.

Is there really any excuse for allowing <b><i>OMG!</b></i>?  No, but 
HTML5 is willing to pinch its nose with thumb and forefinger and look 
the other way.  It literally is not a battle worth fighting.


As a side benefit of this change, I believe that I could modify my 
weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo 
the embedded SVG content, something that would need to be discussed 
separately.


No you couldn't, and how would that be a benefit if you could?  XHTML 5 
requires xmlns, HTML 5 forbids it.  HTML 5 requires <!DOCTYPE html>, 
XHTML 5 doesn't (though it's still well-formed, so you could get away 
with it).


The last I saw, HTML 5 is a working draft.  Did I miss a memo?

With Venus, I translate all content into a canonical well formed XML 
format.  This enables people who author filters to the ability to worry 
about a lot less random edge cases.  I've already seen a lot of 
inventiveness when people find that they can apply off the shelf XML 
tools like XPath and XSLT.


I'd gladly put in a <!DOCTYPE html> in my page, the question is: would 
the WHATWG be willing to meet me half way and allow xmlns attributes in 
a very select and carefully prescribed set of locations?


By the way, my experience is that these types of conversations always 
start off bumpy not merely due to the well known limitation of email for 
conveying human emotion.  The problem is deeper than that: there 
literally is no good place to start.  The only way I know how to deal 
with that is to pose, and repeat, concrete and simple questions.  And 
the one that I am posing with this thread is as follows:


If HTML5 were changed so that these elements -- and these elements
alone -- permitted an optional trailing slash character, what
percentage of the web would be parsed differently?  Can you cite
three independent examples of existing websites where the parsing
would diverge?


[1] http://hsivonen.iki.fi/producing-xml/


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Anne van Kesteren wrote:

On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote:

Perhaps it would be better to prove that the current rules result in
easy explanations. What would the text of a bug filed on WordPress
look like? Let's assume you actually want them to fix it, not just
make a point.


The bug would request that WordPress doesn't try to output XML for the 
text/html media type.  That seems to be the problem here.


If the code for Wordpress fit on a page, that suggestion would be easy 
to implement.


As it stands now, it appears that several hundred lines of code would 
need to change.  And in each case, the code would need to be aware of 
the content type in effect.  In some cases, that information may not be 
available.  In fact, that may not have been determined yet.


One way cross-cutting concerns such as this one are often handled is to 
simply capture the output and post-process it.  Lachlan opted to do so 
with the WHATWG Blog.  The first pass for things like this generally 
takes the form of simple pattern matching and regular expressions.


Often this evolves.  What would be better is something that could take 
that string and produce a DOM, from which a correct serialization can 
take place.


Now, what type of parser would you use?  HTML5's rules come 
tantalizingly close to handling this situation, except for a few cases 
involving tags that are self-closing...
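
[Editor's note: a rough, deliberately incomplete Python sketch of the
post-processing idea described above -- parse, then re-serialize without
the trailing slashes.  This is the editor's illustration, not code from the
thread; a real tool would also need to handle comments, doctypes, entities,
and CDATA-like elements such as script, which is exactly where the
self-closing cases bite.]

```python
from html.parser import HTMLParser

class SlashStripper(HTMLParser):
    """Re-emit markup, rewriting self-closing tags like <br/> as <br>."""
    def __init__(self):
        super().__init__()
        self.out = []
    def handle_startendtag(self, tag, attrs):
        text = self.get_starttag_text()            # raw text, e.g. '<br />'
        if text.endswith("/>"):
            text = text[:-2].rstrip() + ">"        # drop the trailing slash
        self.out.append(text)
    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # keep ordinary tags as-is
    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
    def handle_data(self, data):
        self.out.append(data)

def strip_slashes(markup: str) -> str:
    p = SlashStripper()
    p.feed(markup)
    p.close()
    return "".join(p.out)

assert strip_slashes('<p>hi<br/></p>') == '<p>hi<br></p>'
assert strip_slashes('<img src="x" />') == '<img src="x">'
```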


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Anne van Kesteren wrote:


What do you mean with "implemented interoperably"?


produce the same DOM
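
[Editor's note: a stdlib-only Python sketch, added by the editor and not part
of the original mail, of what "produce the same DOM" means for a void element:
a tolerant tag-soup parser emits the same event stream for <br> and <br/>.
This illustrates the claim; it is not how any particular browser is
implemented.]

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect the sequence of start-tag events the parser emits."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
    # Note: HTMLParser routes <br/> through handle_startendtag, whose
    # default implementation calls handle_starttag -- so both spellings
    # land in self.tags identically.

def start_tags(markup: str):
    p = TagCollector()
    p.feed(markup)
    p.close()
    return p.tags

# A trailing slash on a void element does not change the parse:
assert start_tags("<p>a<br>b</p>") == start_tags("<p>a<br/>b</p>") == ["p", "br"]
```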

- Sam Ruby


[whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-28 Thread Sam Ruby

In response to a weblog post of mine[1], Ian stated[2]:

we can’t make trailing “/” characters meaningful — it would
change how about 49% of the Web is parsed

Just to make sure that we are talking about the same thing, let me make 
a much more carefully scoped proposal.


In HTML5, there are a number of elements with a content model of "empty": 
area, base, br, col, command, embed, hr, img, link, meta, and param.


If HTML5 were changed so that these elements -- and these elements alone 
-- permitted an optional trailing slash character, what percentage of 
the web would be parsed differently?  Can you cite three independent 
examples of existing websites where the parsing would diverge?


As an additional constraint, I am explicitly suggesting that the 
"Attribute value (unquoted)" state not be changed -- slashes in this 
state would continue to be appended to the current attribute's value.


The basis for my question is the observation that the web browsers that 
I am familiar with apparently already operate in this fashion, this 
usage seems to have crept into quite a number of diverse places, and all 
this is coupled with Lachlan's observations[3] on what it would take to 
change the popular WordPress application to produce HTML5 compliant output.


As a side benefit of this change, I believe that I could modify my 
weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo the 
embedded SVG content, something that would needs to be discussed separately.


- Sam Ruby

[1] http://intertwingly.net/blog/2006/11/28/Meet-the-New-Boss
[2] http://intertwingly.net/blog/2006/11/28/Meet-the-New-Boss#c1164743684
[3] http://intertwingly.net/blog/2006/11/24/Feedback-on-XHTML#c1164720800