Re: Proposal for Cascading Attribute Sheets - like CSS, but for attributes!

2015-05-05 Thread Austin William Wright
On Mon, May 4, 2015 at 8:52 PM, Lea Verou l...@verou.me wrote:

 Sorry for reviving such an old thread (almost 3 years now!) but I recently
 came across a huge use case for CAS: Semantic data! Namely, Microdata and
 RDFa. They’re both applied using attributes, which makes them super tedious
 to add to existing content.


I like this idea! You might get additional feedback for this from the SWIG
community on the semantic-web list.

GRDDL is (was?) a very similar, but much more complex, concept that uses
RDF/XML and XSLT (a combination not well liked, by me or generally):
http://www.w3.org/TR/grddl/

Another application would be many of the applications of ARIA attributes
(surely I can't be the only one who's thought ARIA shouldn't be embedded in
markup since it's not conveying actual data).

In both cases, they're properties annotating/semantically enhancing data,
where cycles generally shouldn't occur. That is, if cycles are a problem at
all. CSS doesn't seem to have any issue when it creates pseudo-elements; I
don't foresee too many issues with pseudo-attributes.

That is, I don't see any reason this would need to modify the DOM; it could
instead expose a "data DOM" that ARIA and RDFa would use, the same way Web
browsers render a visual DOM of sorts containing pseudo-elements.
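To make the idea concrete, a rule set might look something like this (the
syntax is purely hypothetical, extrapolated from the CSS-like blocks in the
original CAS proposal; `vocab`, `typeof`, and `property` are real RDFa
attribute names, `role` and `aria-label` real ARIA ones):

```css
/* Hypothetical CAS syntax: ordinary CSS selectors on the left,
   attribute assignments (not style properties) on the right. */
article.review         { vocab: "http://schema.org/"; typeof: "Review"; }
article.review .byline { property: "author"; }

/* ARIA annotations without touching the markup: */
nav.site-menu { role: "navigation"; aria-label: "Main menu"; }
```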

Austin Wright.


Re: template namespace attribute proposal

2015-03-14 Thread Austin William Wright
On Thu, Mar 12, 2015 at 4:20 PM, Benjamin Lesh bl...@netflix.com wrote:

 For my part, I disagree slightly with this statement. If you just drop a
 <circle> tag in a <div>, you're going to get an HTMLUnknownElement. This is
 by design and to spec, of course. But it unfortunately means you can't
 clone() the element over to an SVG parent and hope it will work.


Could you post the specific regression you ran into? The behavior you
describe should only be true for text/html parsing; it doesn't apply to the
DOM or to application/xhtml+xml.

For instance, given an arbitrary, conforming HTML document containing an
SVG circle element, this should work:

var svgns = 'http://www.w3.org/2000/svg';
var c = document.getElementsByTagNameNS(svgns, 'circle')[0].cloneNode();
document.getElementsByTagName('body')[0].appendChild(c);
document.getElementsByTagName('body')[0].lastElementChild.namespaceURI ===
svgns; // true -- the clone keeps its namespace

text/html just isn't cut out for the sort of complex stuff we're
discussing. For instance, what if I want to start using the proposed
application/api-problem+xml format? You can't. text/html simply isn't built
for the complex features being proposed. This is made explicit in HTML5:

"The DOM, the HTML syntax, and the XHTML syntax cannot all represent the
same content. For example, namespaces cannot be represented using the HTML
syntax, but they are supported in the DOM and in the XHTML syntax.
Similarly, documents that use the noscript feature can be represented using
the HTML syntax, but cannot be represented with the DOM or in the XHTML
syntax. Comments that contain the string "--" can only be represented in
the DOM, not in the HTML and XHTML syntaxes."


There's a craptonne of XML-based markup languages and file formats out
there. We can't just keep importing all of them into HTML every time we
decide one of them might be useful to embed inside HTML. That way lies a
usability and complexity nightmare.

Explicit is better than implicit, so I like the idea of a namespace
attribute on the template element; it is forward-compatible with future
vocabularies we may wish to use.

Namespaces aren't *that* hard to understand. In my code above, I added one
line declaring the namespace (`var svgns`). Is that really so hard? If you
want to use the more advanced features of HTML, use namespaces, and import
whatever vocabulary you want - DocBook, OpenDocument, music notation, XSL -
without worry of collision. That's what they're there for, and at least a
handful of client-side libraries already do this, e.g. http://webodf.org/.

(Certainly much simpler than, say, the parsing differences between script,
style, pre, and attributes, which I only understand well enough to know to
stay the cuss away from putting user content in script tags. The amount of
inconsistency and complexity in text/html parsing is single-handedly
responsible for most of the XSS injections I come across. This isn't just a
matter of having a feature or not, this is a matter of security... why not
fix *this*? /rant)

I understand the URI may be too much for people to grok, maybe instead use
a tag name (html, svg or mathml):

<template namespace="svg">
  <circle cx="10" cy="10" r="10" />
</template>

The application/xhtml+xml parser would simply ignore the namespace
attribute, using xmlns on children instead. Polyglot HTML would use both
attributes.

If two separate attributes is too much, then just add xmlns= support to
text/html.

Austin.


Re: CORS performance

2015-03-05 Thread Austin William Wright
On Mon, Feb 23, 2015 at 12:42 PM, Jonas Sicking jo...@sicking.cc wrote:

 Do we have any data on how common it is for people to use CORS with
 credentials? My impression is that it's far less common than CORS
 without credentials.

 If that's the case then I think we'd get most of the functionality,
 with essentially none of the risk, by only allowing server-wide
 cookie-less preflights.

 But data would help for sure.



The credentials issue is a great concern. I've seen two cases I've been
able to exploit where the Origin header is always passed back in
Access-Control-Allow-Origin along with Access-Control-Allow-Credentials;
the authors didn't realize this exposes CSRF tokens and other sensitive
information. Though this is not so much data as anecdote. (One was on a
company-internal project; one briefly appeared, between stable releases, in
a software library we are using.) It's not hard to find additional
examples [1][2][3][4].
I've never used the credentials functionality of CORS, instead passing an
Authorization header explicitly. I think we could survive without it.

I also recently ran into the CORS performance issue. CORS seems very biased
against hypermedia services, in that it doubles the roundtrip time to an
already very chatty API design.

To work around these two issues, I began work on a proxy that accepts
`message/http` POST requests and returns `message/http` responses
(actually, for now, application/json, to avoid a full RFC 7230
implementation). (Existing proxies solving this problem lack the
performance characteristics or the flexibility, or both, desired around
HTTP headers, hypermedia, and streaming.) The only downside is that this
bypasses the user-agent cache, but for many kinds of services the
user-agent cache is rarely utilized (for instance, jQuery's AJAX actively
disables caching).

I suggest that any request I can make with this proxy (which does not
itself verify CORS before passing the request), I should be able to make
without the CORS preflight request.

The semantics of the proposed header then become a declaration that "this
server is accessible from the Internet" (as opposed to a request on an
intranet server made from an Internet-served page), and would then allow
making requests that never utilize stored user-agent data (credentials).
(With the exception of the Date header, could this be called
"deterministic", since the same XHR call will always produce the same HTTP
request?)

This definition would have the following desirable effects:

1. HTTP requests become stateless.
2. Removes the possibility of exposing sensitive, stored user information
(a TRACE response, if honored, would not divulge anything the sender
doesn't already know).
3. Brings performance in line with non-browser user agents that don't
require CORS checks.
4. Eliminates the privacy and security concerns associated with CORS
work-arounds like JSONP, as seen in [5][6][7].

This wouldn't introduce new security concerns, because by definition it
would not allow requests that couldn't already be made by proxy.

The `OPTIONS *` request seems appropriate for this, as it is defined to be
a request for metadata about the server itself, which is all that is
necessary.
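For illustration, a request tunneled through such a proxy might look like
the following (the proxy host and path are hypothetical; the body is the
target request itself, verbatim, per RFC 7230's message/http media type):

```http
POST /proxy HTTP/1.1
Host: proxy.example.com
Content-Type: message/http

GET /api/resource HTTP/1.1
Host: api.example.net
Accept: application/json
```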

Austin Wright.

[1]
https://github.com/primiano/blink-gitcs/blob/0b5424070e3006102e0036deea1e2e263b871eaa/LayoutTests/http/tests/security/resources/cors-redirect.php
[2]
https://github.com/wohlig/sergybackend/blob/3df9e42dccf620e53253c746022fbe1e53a97a3a/application/views/json.php
[3]
https://github.com/AslakNiclasen/ProjektOpgave/blob/7d5de466398371ab1481973bf54074ab005fffdc/Kode/ad_controller/.htaccess2.txt
[4]
https://github.com/datfinesoul/starphleet-logviewer/blob/af42371daad52fa7e7ba759e94b20024077a30bb/nginx.conf
[5] http://blog.javascripting.com/2015/01/17/dont-hassle-with-cors/
[6] https://jsonp.nodejitsu.com/
[7] https://github.com/Rob--W/cors-anywhere


Re: Are web components *seriously* not namespaced?

2015-02-06 Thread Austin William Wright
On Thu, Feb 5, 2015 at 12:55 PM, Tab Atkins Jr. jackalm...@gmail.com
wrote:


 * Domain names don't mean much. For example, Dublin Core's namespace
 starts with "http://purl.org/", which is effectively meaningless.


It means that the owner of purl.org decided to allocate the namespace, as
opposed to someone else. So while it's not arbitrary, for our purposes it's
entirely opaque.


 * Similarly, path components often exist which are worthless and just
 lengthen the namespace for no uniquifying gain, such as the SVG
 namespace http://www.w3.org/2000/svg which contains /2000/ for some
 historical reason (it was minted in 2000, and at the time the W3C put
 the year in most urls for some reason).  (Note the use of www in this
 url, compared to no www in the DC namespace. Inconsistency!)


URIs are opaque; it's not really worth arguing about how they're designed,
because their design is meaningless to everyone except the authority that
minted them.

Every once in a while in the RDF/Semantic Web community, there's a
complaint that http://www.w3.org/1999/02/22-rdf-syntax-ns# is too long to
remember. Eventually the discussion realizes that it's a non-issue because,
again, URIs are opaque.

If you have to look up a URI, http://prefix.cc/xsd (for example) works
pretty well.

If you type, auto-complete, copy/paste, or otherwise enter the wrong
namespace, it'll be pretty clear right off the bat that your program isn't
working. Even if that weren't the case, we have spell-checkers, why not
namespace-checkers?

(snip)


 I'll stop there, though I could name a few more.  All a namespace
 needs is to be of reasonable length so that it's probably unique.
 There are any number of non-insane ways to do that, but XML namespaces
 chose many of the worst options possible.


I would call the namespace issue largely /resolved/ by XML. Each of the
features you named exists because it adds a definite capability; e.g. the
ability to paste an SVG document directly into another document without
having to copy a bunch of headers (Turtle and SPARQL have this problem;
nested namespaces are a definite *feature*!).

XML namespaces are greatly preferable to the tag-soup problem we have with
text/html and application/json, where there are *no* namespaces whatsoever,
*no* way to mix vocabularies, and *no* forward compatibility.

Nothing against JSON; I maintain numerous utilities around JSON, including
JSON Schema, JSON Hyper-Schema, JSON-LD, and more. JSON documents are great
for what they do; XML (and other DOM serializations) are great for the
different tasks they do, and they do namespaces.

If nothing else, we need to support namespaces because HTML isn't the only
DOM-based hypertext technology out there. Limiting our sights to HTML would
be unfortunate. I'm not even sure in what sense namespaces are unsupported;
namespaces exist in the DOM, even if they don't exist in the text/html
syntax. It's not terribly hard to use:

var svgns = 'http://www.w3.org/2000/svg'; // functionally the same as xmlns=, @prefix, etc.
document.getElementsByTagNameNS(svgns, 'svg');
var e = document.createElementNS(svgns, 'rect');

... this is not fundamentally different than all the DOM stuff we do for
HTML.

We're dealing with Web scale here. "Works for 90% of us" isn't good enough.

Austin Wright.


Re: =[xhr]

2014-08-02 Thread Austin William Wright
On Fri, Aug 1, 2014 at 2:01 PM, Glenn Maynard gl...@zewt.org wrote:

 On Fri, Aug 1, 2014 at 8:39 AM, nmork_consult...@cusa.canon.com wrote:

 Spinner is not sufficient.  All user activity must stop.  They can take
  a coffee break if it takes too long.  Browser must be frozen and locked
 down completely.  No other options are desirable.  All tabs, menus, etc.
 must be frozen.  That is exactly the desired result.


 My browser isn't yours to lock down.  My menus aren't yours to freeze.
  You don't get to halt my browser, it doesn't belong to you.

 In this case, a freeze on all browser operations is desirable.


 It may be desirable to you, but it's never desirable to the user, and
 users come first.


This seems rather cold (I wouldn't presume that the described usage is
actually bad for the users, not having seen the program in question),
though the assertion is technically correct (if users are at odds with the
development of a technical report, users come first). I would point out:

It may be cheap for the developer to use synchronous mode, but that's not
how the UI event loop works, and as such it's almost always a bad
proposition for the user. It's not a sustainable coding pattern (what if
you want to listen for two operations at the same time?); it's generally a
hack all around. It doesn't negate the need for your application to perform
sanity checks like "Is the data loaded? Does performing this operation make
sense?", even if using synchronous mode *seems* to let you avoid such
checks.

Maybe there's another reason: good idea or no, removing this feature DOES
break backward compatibility with the de facto behavior of many Web
browsers. I'm not sure that's reason enough to standardize on the behavior,
though. However, it may be enough of a reason to file a bug report if the
behavior ever breaks (though if they come back and say "it was never
standardized behavior to begin with, you shouldn't have been using it in
production", I can't really blame that either).

Austin Wright.


Re: Overlap between StreamReader and FileReader

2013-08-08 Thread Austin William Wright
On Thu, Aug 8, 2013 at 2:56 PM, Jonas Sicking jo...@sicking.cc wrote:

 On Thu, Aug 8, 2013 at 6:42 AM, Domenic Denicola
 dome...@domenicdenicola.com wrote:
  From: Takeshi Yoshino [mailto:tyosh...@google.com]
 
  On Thu, Aug 1, 2013 at 12:54 AM, Domenic Denicola 
 dome...@domenicdenicola.com wrote:
  Hey all, I was directed here by Anne helpfully posting to
 public-script-coord and es-discuss. I would love it if a summary of what
 proposal is currently under discussion: is it [1]? Or maybe some form of
 [2]?
 
  [1]: https://rawgithub.com/tyoshino/stream/master/streams.html
  [2]:
 http://lists.w3.org/Archives/Public/public-webapps/2013AprJun/0727.html
 
  I'm drafting [1] based on [2] and summarizing comments on this list in
 order to build up concrete algorithm and get consensus on it.
 
  Great! Can you explain why this needs to return an
 AbortableProgressPromise, instead of simply a Promise? All existing stream
 APIs (as prototyped in Node.js and in other environments, such as in
 js-git's multi-platform implementation) do not signal progress or allow
 aborting at the during-a-chunk level, but instead count on you recording
 progress by yourself depending on what you've seen come in so far, and
 aborting on your own between chunks. This allows better pipelining and
 backpressure down to the network and file descriptor layer, from what I
 understand.

 Can you explain what you mean by This allows better pipelining and
 backpressure down to the network and file descriptor layer?


I believe the term is "congestion control", as in the TCP congestion
control algorithm. That is, don't send data to the application faster than
it can parse it or pass it off; or otherwise, provide some mechanism that
allows the application to throttle down the incoming flow - essential to
any networked application like the Web.
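As a minimal sketch of the idea (a simplified pull model, not the API under
discussion): if the consumer must ask for each chunk, the producer can
never get ahead of it, and that demand signal is what can be propagated
down to the network or file-descriptor layer.

```javascript
// Simplified pull-based source: no data is handed out until the
// consumer asks for it, which is the essence of backpressure.
function makeSource(chunks) {
  var i = 0;
  return {
    read: function () {
      // Each read() hands out exactly one chunk; the producer is idle
      // between calls, so the consumer sets the pace.
      return i < chunks.length
        ? { done: false, value: chunks[i++] }
        : { done: true };
    }
  };
}

var source = makeSource(['a', 'b', 'c']);
var out = [];
var chunk;
while (!(chunk = source.read()).done) {
  out.push(chunk.value); // process each chunk before requesting the next
}
// out.join('') === 'abc'
```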

I think there's some confusion as to what the abort() call is going to do
exactly.


Re: IndexedDB: Thoughts on implementing IndexedDB

2013-08-02 Thread Austin William Wright
On Tue, Jul 30, 2013 at 3:13 PM, Joshua Bell jsb...@google.com wrote:

 And now replying to the non-nits:


 On Tue, Jul 30, 2013 at 1:30 AM, Austin William Wright a...@bzfx.netwrote:

 I've been meaning to implement IndexedDB in some fashion for a while.
 Earlier this month, shortly after the call for implementations, I realized
 I should be getting on that. I've been working on an in-memory ECMAScript
 implementation with fast data structures and the like. I also intend to
 experiment with new features like new types of indexes (hash tables that
 can't be iterated, and index values calculated by expression/function,
 which appears to have been discussed elsewhere).

 I've had a few thoughts, mostly about language:

 (1) Is there no way to specify an arbitrary nested path? I want to do
 something like ['menus', x] where `x` is some token which may be anything,
 like an empty string or a string with a period in it. This is especially
 important if there are structures like {"http://example.com/URI":
 "value"} in documents, which is especially common in JSON-LD. From what I
 can tell, IndexedDB essentially makes it impossible to index JSON-LD
 documents.

 It appears the current behavior instead allows you to index by multiple
 keys, but it's not immediately obvious this is the rationale.

 How *would* one include a property whose key includes a period? This
 seems to be asking for security problems, if authors need to implement an
 escaping scheme for their keys, either when constructing a key path or when
 constructing objects. Database names can be anything, why not key names?


 The key path mechanism (and by definition, the index mechanism) definitely
 doesn't support every use case. It is focused on the simple case where
 the structure of the data being stored is under the control of the
 developer authoring code against the IDB API. Slap a library in the middle
 that's exposing a radically different storage API to authors and that
 library is going to need to compute index keys on its own and produce
 wrapper objects, or some such.

 One of the ideas that's been talked about for v2 is extensible indexing,
 allowing the index key to be computed by a script function.


Computing index keys would be a fantastic step for any database, I think.
If one could also define a custom comparison operator, this would perhaps
be one of the more powerful features in a database that I've seen.
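A sketch of what that could enable (the API shape here is hypothetical;
IndexedDB v1 only supports key-path strings): the index key is produced by
an arbitrary function over each record.

```javascript
// Hypothetical "extensible indexing": the index key for each record is
// computed by a function rather than extracted via a key-path string.
function buildIndex(records, keyFn) {
  return records
    .map(function (record) { return { key: keyFn(record), record: record }; })
    .sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
}

var books = [
  { title: 'Weaving the Web', authors: ['Berners-Lee'] },
  { title: 'A Pattern Language', authors: ['Alexander'] }
];

// A key computed by expression -- first author, lowercased -- which a
// plain key path cannot express:
var byAuthor = buildIndex(books, function (b) {
  return b.authors[0].toLowerCase();
});
// byAuthor[0].record.title === 'A Pattern Language'
```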




 (3) I had trouble deciphering the exact behavior of multiple open
 transactions on one another. I eventually realized the definition of
 IDBTransactionMode describes the behavior.

 Still, however, this document appears to talk in terms of what is
 written to the database. But this isn't well defined. If something is
 written to the database, wouldn't it affect what is read in a readonly
 transaction? (No.)

 And the language seems inconsistent. The language for `abort` says that
 changes to the database must be rolled back (as if every operation writes
 to storage), but the language for `Steps for committing a transaction`
 specifies it is at that time the data is written (as if all write
 operations up to this point are kept in memory). There's not strictly a
 contradiction here, but perhaps more neutral language could be used.


 Agreed, this could be improved. (Practically speaking, I expect that would
 happen if we end up with implementation differences that require refining
 the language in a future iteration.)


 (5) I found the language for iterating and creating a Cursor hard to
 understand being nested in multiple layers of algorithms. Specifically,
 where an IDBCursor instance was actually exposed to the user. But now it
 makes sense, and I don't really see how it might be improved. An
 (informative) example on iterating a cursor may be helpful.


 I recently added one towards the start of the spec ("The following example
 looks up all books in the database by author using an index and a cursor")
 - is that what you were thinking? Is it just a matter of spec organization?
 I think at some point in the spec history the examples were more integrated
 into the text.


I recall eventually finding that example, I think that works.




 (6) The document refers to the HTML5 Structured Clone Algorithm. It's a
 bit concerning that it has to refer to ECMAScript algorithms defined in a
 specification that defines a markup language. I don't think referring to a
 markup language should be necessary (I don't intend on using my
 implementation in an (X)HTML environment, just straight XML if anything at
 all), though perhaps this is just a modularity problem with the HTML5 draft
 (or rather, lack thereof).


 Agreed that it seems like an odd place for it in the abstract, but the
 HTML spec defines much of the behavior of the browser environment beyond
 the markup language. Hixie and Anne are doing some spec refactoring work;
 perhaps some day it will be more modular. Indexed DB is very much designed
 to be an API for scripts running

IndexedDB: Thoughts on implementing IndexedDB

2013-07-30 Thread Austin William Wright
I've been meaning to implement IndexedDB in some fashion for a while.
Earlier this month, shortly after the call for implementations, I realized
I should be getting on that. I've been working on an in-memory ECMAScript
implementation with fast data structures and the like. I also intend to
experiment with new features like new types of indexes (hash tables that
can't be iterated, and index values calculated by expression/function,
which appears to have been discussed elsewhere).

I've had a few thoughts, mostly about language:

(1) Is there no way to specify an arbitrary nested path? I want to do
something like ['menus', x] where `x` is some token which may be anything,
like an empty string or a string with a period in it. This is especially
important if there are structures like {"http://example.com/URI": "value"}
in documents, which is especially common in JSON-LD. From what I can tell,
IndexedDB essentially makes it impossible to index JSON-LD documents.

It appears the current behavior instead allows you to index by multiple
keys, but it's not immediately obvious this is the rationale.

How *would* one include a property whose key includes a period? This seems
to be asking for security problems, if authors need to implement an
escaping scheme for their keys, either when constructing a key path or when
constructing objects. Database names can be anything, why not key names?
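The ambiguity is easy to see in a toy key-path evaluator (a simplified
model of the spec's algorithm, not an exact transcription): because the
path is split on ".", a property whose own name contains a period is
unreachable.

```javascript
// Toy key-path evaluator: each '.'-separated segment is taken as a
// literal property name, as in IndexedDB key paths.
function evaluateKeyPath(obj, keyPath) {
  return keyPath.split('.').reduce(function (value, segment) {
    return value === undefined ? undefined : value[segment];
  }, obj);
}

var doc = {
  'http://example.com/URI': 'value', // JSON-LD style key
  menus: { '': 'empty-string key' }
};

evaluateKeyPath(doc, 'menus.');
// -> 'empty-string key' (happens to resolve)

evaluateKeyPath(doc, 'http://example.com/URI');
// -> undefined: the periods in the property name are parsed as path
// separators, so the JSON-LD key can never be addressed
```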


(2) There are still a few references to *Sync terms from the old
synchronous API. Specifically, IDBDatabaseSync and IDBCursorWithValueSync.


(3) I had trouble deciphering the exact behavior of multiple open
transactions on one another. I eventually realized the definition of
IDBTransactionMode describes the behavior.

Still, however, this document appears to talk in terms of what is "written"
to the database. But this isn't well defined. If something is written to
the database, wouldn't it affect what is read in a readonly transaction?
(No.)

And the language seems inconsistent. The language for `abort` says that
changes to the database must be rolled back (as if every operation writes
to storage), but the language for `Steps for committing a transaction`
specifies it is at that time the data is written (as if all write
operations up to this point are kept in memory). There's not strictly a
contradiction here, but perhaps more neutral language could be used.


(4) The section "Steps for asynchronously executing a request" says "Set
/transaction/ to the `transaction` associated with `source`."

Perhaps it should instead say "Let /transaction/ be the `transaction` which
is associated with `source`",

because /transaction/ is previously undefined.


(5) I found the language for iterating and creating a Cursor hard to
understand being nested in multiple layers of algorithms. Specifically,
where an IDBCursor instance was actually exposed to the user. But now it
makes sense, and I don't really see how it might be improved. An
(informative) example on iterating a cursor may be helpful.


(6) The document refers to the HTML5 Structured Clone Algorithm. It's a bit
concerning that it has to refer to ECMAScript algorithms defined in a
specification that defines a markup language. I don't think referring to a
markup language should be necessary (I don't intend on using my
implementation in an (X)HTML environment, just straight XML if anything at
all), though perhaps this is just a modularity problem with the HTML5 draft
(or rather, lack thereof).


Finally, is there a good test suite? I can't seem to find anything in the
way of regression tests. I'll perhaps publish my own, if not.


Austin Wright.