Ugo Cei wrote:

Dear Cocooners,

while working on Butterfly, I started looking at the TreeProcessor and I was astonished at the number of classes I have to port, if I want to reimplement it:

o.a.c.components.treeprocessor: 26 classes
o.a.c.components.treeprocessor.sitemap: 44 classes
o.a.c.components.treeprocessor.variables: 4 classes
Total: 74 classes.

Some of these classes have a rather complex inheritance hierarchy. Just to pick one at random: PreparableMatchNode inherits (directly and indirectly) from 6 classes and 10 interfaces!

Now this is a lot of code. Kudos to Sylvain for coming up with it and I'm sure the design is the best possible, given the constraints.


Mmmh... Thanks, but I easily admit that the TreeProcessor is over-designed, and I simplified it a bit recently. But a lot more can be done.

This has to be put in perspective with where Cocoon was at the time (late 2001): there was a lot of talk back then about flowmaps, i.e. another way to write pipelines, in a flow-oriented style. Therefore, I architected the TreeProcessor as an evaluation tree that provided some abstractions meant to ease the implementation of alternate *map languages.

A goal was also to achieve maximum performance, and therefore to strongly separate build-time classes (can be slow) from runtime classes (must be fast). Hence the separation between ProcessingNode and ProcessingNodeBuilder which doubles the number of classes. That goal was achieved, as the new interpreted engine was even a bit faster than the previous compiled one.
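
The build-time/runtime split described above can be illustrated with a minimal sketch. The names SketchNode, SketchNodeBuilder, MatchNodeBuilder and StepNodeBuilder are hypothetical and heavily simplified; they are not the actual ProcessingNode/ProcessingNodeBuilder signatures, and the lambdas are modern Java used only for brevity. The point is just that each builder does the slow work once (here, compiling a regex) and hands back a node that does nothing but fast evaluation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a runtime node: must be fast, no parsing here.
interface SketchNode {
    boolean invoke(String uri, List<String> trace);
}

// Hypothetical stand-in for a build-time builder: allowed to be slow.
interface SketchNodeBuilder {
    SketchNode build();
}

// Builds a node that runs its children only when the URI matches.
final class MatchNodeBuilder implements SketchNodeBuilder {
    private final String pattern;
    private final List<SketchNodeBuilder> children = new ArrayList<>();

    MatchNodeBuilder(String pattern) {
        this.pattern = pattern;
    }

    MatchNodeBuilder add(SketchNodeBuilder child) {
        children.add(child);
        return this;
    }

    @Override
    public SketchNode build() {
        // Expensive work (regex compilation, child construction) happens once.
        final Pattern compiled = Pattern.compile(pattern);
        final List<SketchNode> built = new ArrayList<>();
        for (SketchNodeBuilder b : children) {
            built.add(b.build());
        }
        // The returned runtime node only evaluates precomputed state.
        return (uri, trace) -> {
            if (!compiled.matcher(uri).matches()) {
                return false;
            }
            for (SketchNode n : built) {
                n.invoke(uri, trace);
            }
            return true;
        };
    }
}

// Builds a leaf node that just records a pipeline step in the trace.
final class StepNodeBuilder implements SketchNodeBuilder {
    private final String label;

    StepNodeBuilder(String label) {
        this.label = label;
    }

    @Override
    public SketchNode build() {
        return (uri, trace) -> {
            trace.add(label);
            return true;
        };
    }
}
```

In this sketch every kind of node needs its own builder, which is where the doubling of the class count comes from.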

But since, in doing Butterfly, I determined that we should not be too constrained by backward compatibility, I started thinking about a possible alternative, which should be, first of all, much simpler.

One thing that we would probably like to do without is the pointy-brackets kind of sitemap syntax. We could define a friendlier syntax of our own devising or we could reuse something. Since I'm a lazy butt and I like to reuse others' code whenever possible, I decided to reuse an existing grammar and parser. And I wanted it to be an executable grammar, that is a scripting language, since I wanted to avoid writing the code to execute a data structure. Last, I wanted to use something trendy ;-) so I came up with "A Groovy Kind of Sitemap" (sorry, Phil):

        if (match ".*\.html") {
            generate "input.xml"
            transform "xslt", "stylesheet1.xsl"
            transform "xslt", "stylesheet2.xsl"
            serialize "xml"
        }


At first, it looks clean and simple. What would it look like with component parameters?

But thinking further, this turns the sitemap into a full-blown programming language (see the "if" you've written?). Where's SoC then? It took us months with flowscript to make it possible to remove procedural-like code written with actions from the sitemaps, and I strongly fear that a Groovy sitemap (or any other scripting language) will lead to a big mixing of everything, far worse than actions.

I don't really understand the current trend against pointy-brackets. Is it that people come to hate what they used to love so much? One of the goals in the design of XML was to provide a syntax that would be both easily human-readable and easily machine-processable. IMO, the goal was well achieved, otherwise we would not see XML everywhere today.

Human-readable also includes learning. With XML, you just have to concentrate on the semantics of the elements and attributes, and not on the syntactic details (braces, semi-colons, commas, etc). This is also an important point in making a tool or technology acceptable to people: there are so many XML dialects around that learning a new one is not considered a problem (although the sitemap brings much more than just a dialect). Using a different programming language, however, is much more difficult and will often hit the "management wall", which won't accept a babelization of the languages used on a project. I know this from experience with the JS used in flowscript.

And machine-processable not only means parsers, but also tools and IDEs. Building an auto-completion engine for a completely new syntax is way more difficult than feeding an XML editor with a schema. There are also more and more editors like Eclipse's plugin.xml editor (e.g. EasyStruts or Spindle), which can simply be described by mapping form fields to XPath expressions (could it be that I have something like this for the sitemap on my Mac? ;-)

I have to admit I'm ashamed at how bad and simpleminded the implementation of this is (you can find it in Butterfly's CVS). And probably using a full-fledged scripting language isn't such a good idea (I already see people putting database access code in the sitemap), but it took no more than two hours to implement (one and a half spent trying to make sense of Groovy's scattered docs), so it's definitely simple.

Implementation details aside, I think this is something that we can experiment with. I propose to adopt a pragmatic approach: do not try to design the best possible sitemap syntax, but instead use what is readily available (Groovy) and try to push it to the limits. If it breaks before it's actually useful, we'll think of something else.


Well, IMO, try something else now before the sitemap concept gets killed by changing it into a pipeline-building extension of a scripting language.

Don't take all this badly, Ugo: I see many more dangers in turning the sitemap into a scripting language than advantages in saving a few keystrokes or easing the implementation. But I'm all for a simplified implementation of the current syntax. Something that once came to my mind as a possible way to implement the sitemap engine (and also JXTG) is Jelly, provided by the same James Strachan who made Groovy after he came to dislike pointy brackets.

Sylvain (trying to follow RTs while on vacation)

--
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
