On Mon, 17 Jan 2011 16:20:24 -0000, Miller Medeiros
<lis...@millermedeiros.com> wrote:

> I totally disagree.. I was just talking about returning the content of a
> node and its attributes, nothing about DTD, schema, error handling..

Then it's not XML any more.

> the serialization process is easier... you can go char by char (or using a
> RegExp) matching for opening tags and wait until you find a closing tag..

No, XML grammar is not regular, so it's impossible to parse correctly with
a RegExp (at best you can use a RegExp to tokenize the input stream, but
you still have to keep a stack somewhere, and track plenty of other stuff
if you want to support non-standalone or namespaced documents).
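A minimal sketch of that point (hypothetical function name, and deliberately ignoring CDATA, comments, doctypes and namespaces): even with a RegExp doing the tokenizing, verifying that tags nest correctly still needs a stack, which is exactly what a regular language can't give you.

```javascript
// Sketch only: a RegExp can tokenize tags, but matching open/close
// pairs still requires a stack -- a regular expression alone cannot
// track arbitrary nesting depth.
function checkWellFormed(xml) {
  var token = /<\/?([\w:]+)[^>]*?(\/?)>/g; // naive tokenizer, ignores CDATA/comments/doctype
  var stack = [];
  var m;
  while ((m = token.exec(xml)) !== null) {
    if (m[0].charAt(1) === '/') {          // closing tag: must match top of stack
      if (stack.pop() !== m[1]) return false;
    } else if (m[2] !== '/') {             // opening tag (not self-closing)
      stack.push(m[1]);
    }
  }
  return stack.length === 0;               // every open tag was closed
}

console.log(checkWellFormed('<a><b></b></a>')); // true
console.log(checkWellFormed('<a><b></a></b>')); // false -- the tokenizer alone can't catch this
```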

> //simple example of retrieving node content
> var xmlString = '<xml>dolor sit amet <tag>lorem ipsum</tag> ' +
>     '<anotherTag>maecennas</anotherTag></xml>';
> function getNodeContent(nodeName, xmlString){
>   var regexp = new RegExp('<'+ nodeName +'>(.+)<\\/'+ nodeName +'>');
>   return regexp.exec(xmlString)[1];
> }
> console.log( getNodeContent('tag', xmlString) ); //will output "lorem ipsum"

Very simple and also totally bogus.

There are countless ways it can be fooled with basic correct markup:

<node><![CDATA[</node>]]><!--</node>--></node   >
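Feeding exactly that markup to the getNodeContent() snippet above (reproduced here so the example is self-contained) shows the failure: a real XML parser reports the text content "&lt;/node>" from the CDATA section, while the regex latches onto the "&lt;/node>" inside the comment.

```javascript
// The naive regex approach from above, run against the trap markup.
function getNodeContent(nodeName, xmlString) {
  var regexp = new RegExp('<' + nodeName + '>(.+)<\\/' + nodeName + '>');
  return regexp.exec(xmlString)[1];
}

var trap = '<node><![CDATA[</node>]]><!--</node>--></node   >';
// Greedy (.+) stops at the last literal "</node>", which is inside the
// comment; the real closing tag "</node   >" never matches at all.
console.log(getNodeContent('node', trap)); // "<![CDATA[</node>]]><!--"
```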

It could also be:

<node xmlns="urn:I'm not your node, dude!"></node>

and similar traps with namespace prefixes that can be added, inherited and
redeclared all over the place.

And XML even allows totally evil constructs like:

<!DOCTYPE node SYSTEM "http://example.com/define-node-as-entity.dtd">
&theNodeMarkupIsInHere;


> It was just to explain that stricter rules can reduce complexity in some
> cases since you can "ignore" edge cases.

If you ignore edge cases, then it's not an XML parser. It won't be
guaranteed to read standard well-formed XML markup generated by an
off-the-shelf serializer.

So then you're abandoning XML in favour of your primitive non-nestable,
non-namespaced whitespace-sensitive XML-lookalike, that happens to be easy
to parse, unlike XML.

> I thought that XML parsing being easier than HTML was common sense...

It's a common myth repeated by those who never parsed XML correctly :)

> PS: one of the reasons why JSON is so strict is to avoid ambiguity and
> make it easier to parse...

Ease of parsing of JSON comes from its dead-simple grammar. Like HTML4, it
doesn't say what to do with invalid input, so it's not even strict in
XML's draconian way.

Ease of parsing is orthogonal to strictness: e.g. the HTML5 server-sent
events stream[1] is totally forgiving about errors (it will accept almost
any random characters as input) and yet a parser can be implemented in
just a few lines of code.
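A hypothetical sketch of the core of such a parser (handling only the spec's line-based grammar for the "data" and "event" fields, and simplifying the dispatch rules) to show how little code the forgiving format needs:

```javascript
// Sketch of an event-stream parser: the grammar is line-based, so any
// input is acceptable -- unrecognized lines are simply ignored.
function parseEventStream(text) {
  var events = [];
  var data = [];
  var type = 'message';
  text.split(/\r\n|\r|\n/).forEach(function (line) {
    if (line === '') {                      // blank line dispatches the event
      if (data.length) events.push({ type: type, data: data.join('\n') });
      data = [];
      type = 'message';
      return;
    }
    if (line.charAt(0) === ':') return;     // comment line, ignored
    var colon = line.indexOf(':');
    var field = colon === -1 ? line : line.slice(0, colon);
    var value = colon === -1 ? '' : line.slice(colon + 1).replace(/^ /, '');
    if (field === 'data') data.push(value);
    else if (field === 'event') type = value;
    // any other field name is ignored, per the forgiving grammar
  });
  return events;
}

console.log(parseEventStream('event: update\ndata: hello\ndata: world\n\n'));
// [ { type: 'update', data: 'hello\nworld' } ]
```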

--
regards, porneL

[1] http://dev.w3.org/html5/eventsource/#parsing-an-event-stream
