Stefano Mazzocchi wrote:

I'm more and more considering sitemap validation harmful.

why:

1) the sitemap logic is too hard to be validated from any validation language (it requires java runtime capabilities)

2) it reduces the effort put into clean and meaningful error messages in the treeprocessor

'Interesting' perspective, to say the least.


Some thoughts:

1) http://outerthought.net/downloads/sitemap.pdf and http://outerthought.net/downloads/sitemap_a4_poster.pdf

cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825 downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months preceding that period, and another 2500 for the poster version, and we arrive at 6825 downloads in 7 months, or an average of 975 downloads / month for Bruno's sitemap poster.
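
The arithmetic behind the 975 figure can be checked in a few lines. This is only a sketch: the variable names are illustrative, and it assumes (as the figure itself implies) that the poster downloads cover the same 7 months.

```python
# Download counts as quoted above; the variable names are illustrative only.
pdf_last_3_months = 1825      # dec-jan-feb, from the grep on access_log
pdf_previous_4_months = 2500  # "some 2500" in the 4 months before that
poster_downloads = 2500       # "another 2500" for the poster version

total_downloads = pdf_last_3_months + pdf_previous_4_months + poster_downloads
months = 3 + 4

average_per_month = total_downloads / months
print(total_downloads, average_per_month)  # 6825 975.0
```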

... which means there's a _vested_ interest in trying to understand the sitemap, and people are even willing to look at some graphical depiction of it in order to understand.

2) In our experience, when we confront people with the sitemap, they are bewildered until we give them a copy of Pollo with the sitemap grammar loaded into it and some very basic customization (http://pollo.sourceforge.net/sitemap1.png). I assume the same happens when people see Sunbow. Needless to say, having 3 different grammars for the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA, so some rationalization is more than appropriate.

3) Some days ago, when investigating http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered a way to 'address' a matched group of a matcher pattern when nesting matchers which I had never heard of, and which I have already forgotten again ATM. :-( I can say for myself that I make a reasonable effort to keep up with new-things-Cocoon, but it was something I clearly missed. I'm pretty sure it is only 'documented in code' or on the mailing list somewhere.

Example, try

<generate uri="..."/>

where the uri attribute is not allowed in generate (should be 'src'): the treeprocessor totally ignores this and sends the empty string to the parser, resulting in the error

System ID not found!

Sitemap validation has stopped us from fixing the error messaging capabilities on mistakes.

I don't parse this: in what way does the sitemap validation relieve somebody of the task of properly handling exceptions on the code level?
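
To make the question concrete, here is a minimal sketch of a code-level check that names the offending attribute instead of passing an empty source on to the parser. It is written in Python for brevity and is not the treeprocessor's actual code: `ALLOWED_ATTRIBUTES`, `SitemapError` and `check_attributes` are hypothetical names, and the attribute sets are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of permitted attributes per sitemap element.
# These sets are assumptions for illustration, not the real sitemap definition.
ALLOWED_ATTRIBUTES = {
    "generate": {"src", "type"},
    "transform": {"src", "type"},
    "serialize": {"type"},
}


class SitemapError(Exception):
    """Raised when a sitemap element carries an attribute it should not."""


def check_attributes(element):
    """Fail with a readable message if the element has unknown attributes."""
    allowed = ALLOWED_ATTRIBUTES.get(element.tag)
    if allowed is None:
        return  # element not covered by this sketch
    unknown = sorted(set(element.attrib) - allowed)
    if unknown:
        raise SitemapError(
            f"<{element.tag}> does not allow attribute(s) {unknown}; "
            f"allowed: {sorted(allowed)}"
        )


# The example from the quoted message: 'uri' used where 'src' was meant.
element = ET.fromstring('<generate uri="..."/>')
try:
    check_attributes(element)
    message = None
except SitemapError as error:
    message = str(error)
print(message)
```

A check of this kind lives in the interpreting code and works whether or not a grammar-based validation step ran beforehand, so the two do not exclude each other.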


I propose to blast the sitemap validation altogether.

OK. I know I'm sounding harsh and I don't mean to: it's just one of these discussions I had so many times already in my own little company, being the only XML-head with two (much smarter) Java-heads. We had the same thing with the xReporter report grammar, which admittedly is only really handled and interpreted in Java code, yet our initial customer wanted to have a proper XML grammar for it.


Why that? For editing purposes. People want to use XML editors for editing the sitemap, and these tools _can_ provide proper guidance when configured with a grammar. I know we are heading towards your pet peeve discussion (*) of pre/post validation Infosets and the various ways each of the available grammars suck at grasping these concepts, but still I very much believe people will be grateful for anything (apart from Java(doc/code)) that guides them during the creation of an XML document, or at least offers them some validation prior to loading the thing into Cocoon and seeing what Cocoon makes of it.

(*) I must admit this discussion is one of my favorite pet peeves, too ;-)

I agree there is a significant amount of overlap and various levels of underspecification for-the-sake-of-simplicity when having both some XML grammar and executable code which interprets XML orthogonally to this grammar, but still I'm very much +1 for some reasonable quality XML grammar, if only to help out our users.

If not, why don't we just specify the sitemap in some home-grown notation like:

match pattern="news/**"
  match pattern="news/1999/**"
    generate src="oldcontent/news/{1}.html" type="html"
    transform src="styles/old2new.xsl"
  match pattern="news/20*/**"
    generate src="docs/news/20{1}/{2}.xml"
  transform src="news2html.xsl"
  serialize

Gee - I must have been reading too much Python code lately ;-)
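
For the curious, a minimal sketch of how such an indentation-based notation could be read into a tree. Everything here is hypothetical: the `parse` function, the `(name, attributes, children)` tuple layout and the artificial `sitemap` root are made up for illustration, and nothing like this exists in Cocoon.

```python
import shlex

SAMPLE = '''\
match pattern="news/**"
  match pattern="news/1999/**"
    generate src="oldcontent/news/{1}.html" type="html"
    transform src="styles/old2new.xsl"
  match pattern="news/20*/**"
    generate src="docs/news/20{1}/{2}.xml"
  transform src="news2html.xsl"
  serialize
'''


def parse(text):
    """Turn indented lines into nested (name, attributes, children) nodes."""
    root = ("sitemap", {}, [])  # artificial root, not part of the notation
    stack = [(-1, root)]        # (indent, node) pairs for the open blocks
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # shlex handles the quoted attribute values and strips the quotes.
        name, *pairs = shlex.split(line)
        attributes = dict(pair.split("=", 1) for pair in pairs)
        node = (name, attributes, [])
        # Close every block indented at least as deep as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1][2].append(node)
        stack.append((indent, node))
    return root


tree = parse(SAMPLE)
outer_match = tree[2][0]
print([child[0] for child in outer_match[2]])
```

The stack of open blocks plays the role that closing tags play in XML, which is exactly the information a grammar-aware XML editor gets for free.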

Sorry if I sound offensive, I really don't mean to - but it's a personal pet peeve ;-)

</Steven>
--
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org


