Re: "XML is Too Hard for Programmers" = Tim Bray

Austin Hastings Tue, 18 Mar 2003 11:07:26 -0800

--- Michael Lazzaro <[EMAIL PROTECTED]> wrote:
> 
> On Tuesday, March 18, 2003, at 09:55  AM, Austin Hastings wrote:
> > To me, this says that there's no real commitment to "doing XML". What
> > there is seems to be a recognition that XML format is regular and
> > comprehensible to others, so writing "XML-like" files becomes popular.
> 
> Yep.  Which makes things even worse.  And this is pretty important 
> stuff.
> 
> We do a *lot* of XML parsing here (Cognitivity, that is) and even more
> "XML-like" parsing.  And even with Perl, it's a royal pain.  There are
> P5 XML modules out there which tie into C-based XML libraries... those
> are quite fast, but fail badly if the XML isn't 100% well-formed, and
> are largely not extensible for "XML-like" situations.  You'd have to
> rip one up and rewrite it, in C, for every iteration of "-like", which
> we cannot credibly do.
> 
> A perl5-native parser can be rigged up fairly easily, but it's
> *numbingly* slow compared to the C version.  I mean, 20-50 times
> slower, by my guess.  The speed issue when importing XML-like data
> (which we do *very frequently*) is a constant sticking point for us and
> our clients.  Damian's Parse::RecDescent has been a godsend,
> implementation-wise -- but it of course suffers the same nasty speed
> issues.
> 
> This is a big, big issue, and one that P6 needs to address well,
> because this is how many businesses will judge it.  What I'm hoping,
> obviously, is that the new P6 regexes -- which will be *perfect* for
> writing and maintaining our umpteen quite-similar parsing rulesets --
> will be fast enough to at least be in the same order of magnitude as a
> middling C solution.  They don't have to be as fast as C, obviously,
> but they can't be 20x worse.
> 
> Why does this matter so much?  Because it's a barn door.  Even though
> it's so much easier to write XML-like parsers in Perl than, well,
> anything else, the speed issue will at some point dictate moving to a
> non-Perl parsing solution.  At which point, the issue becomes how much
> of the rest of the related system to move into that other solution as
> well, since it is much cheaper to maintain expertise in one toolset
> than two.  So within a company, it can lead to greater use of Perl --
> or abandonment of Perl -- depending on success in this one key area.
> (I have seen this in action at a number of companies.)
> 
> It is therefore critically important that P6 allows easy, fast parsing
> for XML-like things, not necessarily just XML proper, because that's
> the way the business winds have been blowing.  And it needs to support
> it out-of-the-box.  Seriously, it's that important.


You wanna take command of P6ML?  :-)

I'm pretty happy with the new rexen, so far. I'll probably be even
happier once the interaction between A5 and A6 solidifies (Write,
Damian, write!). 

And since so much other 6PAN stuff will depend on P6ML, I'm pretty sure
we'll get the XML bits right.

But the "recode" that needs to get done to get from P6ML to FooCorp's
XMLike Format (FXF) does have the opportunity to be a sales tool:

1- It's not doable. The P6 grammar for XML parsing is so buttpuckerish
that only the original author can understand it, and that only for 10
minutes or so a day.

This will scare people off. It's probably better to do a half-assed job
than to show someone a hideous grammar as an advert for "cool new
power".

2- It's a big pain and not worth doing. Better to rewrite.

If the grammar is comprehensible but not extensible/adaptable, then it
may make for a good demo of "the power of P6" but the difficulty of
implementing each XML-like variant may burn P6.

3- It's simple and easy to do and understand. 

Woo-hoo! How much more do I need to say?

For some Epsilon, P6 should be able to implement XML +/- Epsilon
trivially.

Cases in point:

-- Configuring the rules of XML.

-- Configuring the character set. (Even weird stuff, like using [tag]
instead of <tag>).

-- Error handling/recovery.

-- Commingling XML with other data.

-- Embedding other languages into XML, and vice versa.
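
To make the character-set and error-recovery cases concrete, here is a
rough sketch of the kind of adaptability I mean. It's in Python rather
than P6, since the P6 syntax isn't nailed down yet, and the names
(make_tokenizer, open_delim, close_delim) are made up for illustration.
It only splits input into open tags, close tags and text; attributes,
entities and nesting checks are left out.

```python
import re

def make_tokenizer(open_delim="<", close_delim=">"):
    """Build a tokenizer for an XML-like format with configurable tag delimiters.

    Hypothetical helper for illustration only; not part of any real library.
    """
    o, c = re.escape(open_delim), re.escape(close_delim)
    token = re.compile(
        rf"{o}/(?P<close>\w+){c}"      # closing tag, e.g. </a> or [/a]
        rf"|{o}(?P<open>\w+){c}"       # opening tag, e.g. <a> or [a]
        rf"|(?P<text>(?:(?!{o}).)+)",  # text up to the next opening delimiter
        re.DOTALL,
    )

    def tokenize(source):
        pos = 0
        tokens = []
        while pos < len(source):
            m = token.match(source, pos)
            if m is None:
                # Recovery: a stray delimiter that starts no tag is kept
                # as one character of text instead of aborting the parse.
                tokens.append(("text", source[pos]))
                pos += 1
                continue
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
            pos = m.end()
        return tokens

    return tokenize

xml_tokens = make_tokenizer()("<b>hi</b>")
fxf_tokens = make_tokenizer("[", "]")("[b]hi[/b]")
```

Going from <tag> to [tag] is a change of two arguments, not a rewrite,
and input that isn't 100% well-formed degrades to text rather than
failing. That's the level of effort "XML +/- Epsilon" ought to cost.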

=Austin
