Re: PaceXhtmlNamespaceDiv

Sam Ruby Thu, 10 Feb 2005 10:27:03 -0800

Julian Reschke wrote:

To summarize my p.o.v.:
- the spec shouldn't require any specific container element for XHTML content,


We continue to talk past one another.  The above line is key.

Some examples might help. Perhaps once we are actually understanding each other's points, then we can work backward from there to spec text.

So, suppose my XHTML content is:

  <p>What a nice day!</p>

My XHTML container element is <p>. That is completely my choice. It is not required by the spec.

Now if I place that inside an atom feed, I'm going to get something like this (heavily elided, all namespace details omitted):

  <feed>
    <entry>
      <summary>
         <p>What a nice day!</p>
      </summary>
    </entry>
  </feed>

Depending on how the question is phrased, one could take the position that <feed>, <entry>, and <summary> are container elements. Or not. Again, depending on how the question is phrased.

I don't believe that these elements are the ones that you have an issue with. Correct?

Now, consider a different document, again heavily elided, etc:

  <feed>
    <entry>
      <summary>
         <div>
           <p>What a nice day!</p>
         </div>
      </summary>
    </entry>
  </feed>

The key difference between these two documents is that instead of three elements around which there should be no issue, there now are four. But for some reason, this causes a big controversy.

My theory is that the controversy arose because people initially assumed that this div element was to be considered part of the content and not part of the format, and that the spec was thereby mandating that all content have a given container element. An entirely unreasonable mandate.

I agree that this would be an unreasonable mandate. But I don't want to force a top level container element for the xhtml, I want to define a bottom level container element in the format for the xhtml. There is a big difference.

That is the difference between having four feed container elements and mandating that all xhtml content have a uniform top level container element, which, again, I will agree is entirely unreasonable.

 - - -

On the optimistic presumption that you are with me so far, I'll press on. What desirable characteristics are there for feed container elements in this circumstance?

To answer that question, it is important to understand how CMS software tends to be implemented. In particular, how it is layered. This is difficult as there isn't any one reference implementation that we can consult. We also need to consider software which isn't written yet. As I said, this is difficult.

But we can observe common problems that people have had, and try to engineer a solution that avoids them. I hold the belief that if somebody writes a simple and clear spec that a significant number of people get wrong, we are looking at a spec bug.

Enough hand waving, onto the problem at hand. What we are looking at here is an xhtml fragment. Not a complete xhtml document, but some fragment of a web page.

Now, fragments tend not to exist independent of a context. And in virtually all xhtml documents I have seen (including the ones I produce), any fragment presumes that the xhtml namespace was defined as the default namespace earlier in the document (in particular, on the document element).

So, a desirable characteristic for a container element would be one in which the default namespace can be set.

At this point, the discussion can fragment into any number of different directions.

  - - -

One is for those who view XML as merely one potential serialization format, and something that their tool takes care of for them. For them, double escaping the content is the right answer, the simplest thing that can possibly work, end of discussion. While neither you nor I are in that camp (nor is Norm, and others), I am quite willing to leave that as a valid option, as long as it is explicitly declared.
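For readers unfamiliar with the escaping option, here is a minimal sketch of what it amounts to, using Python's standard library. Namespaces are omitted, as in the elided examples above, and the variable names are only for illustration. The markup is escaped so that it travels as character data, and the consumer gets back a string rather than elements.

```python
import html
import xml.etree.ElementTree as ET

fragment = "<p>What a nice day!</p>"

# Escape the markup so that it travels through the feed as plain character data.
feed = "<feed><entry><summary>" + html.escape(fragment) + "</summary></entry></feed>"

summary = ET.fromstring(feed).find("entry/summary")

# The XML parser sees no child elements here, only text that looks like markup.
child_count = len(summary)
recovered = summary.text
```

The consumer then has to run `recovered` through its own HTML or XML parser, which is why this option only works when the feed states explicitly that the content is escaped.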

Another is to declare the use of default namespaces as evil, and rewrite both the document and the content to use explicit namespaces on every element. This may very well be where you and I part ways. If so, peace. Just please give the people who want to use default namespaces the same consideration that I am willing to give those who wish to double escape.

And finally, there is a desire to create a format that can be done entirely with default namespaces, and without the need to rewrite or modify the content.

The simple fact is that well formed xhtml does not always exist in the form of DOM nodes. Sometimes it is serialized as a string and stored in a file or a MySQL database. That does not make it any less well formed. It doesn't mean that it wasn't produced by a proper tool.

Not having seen Tim's implementation, I'm just speculating at this point, but it probably falls into this category. Based on the tools he is using, he is confident that his content is well formed, even if it is stored as a string. As such, he can confidently use simple string concatenation as long as he can be assured that the default namespace is correct.

Whether Tim's implementation meets this description or not, mine certainly does. And by looking at the common errors I have seen in feeds, I'm pretty sure that many others do too.

 - - -

So, what would a desirable feed container element be for this scenario? I would suggest that it would be something in the xhtml namespace. If it were in the atom namespace, you would have to do something along the lines of:

  <atom:summary xmlns:atom="..." xmlns="...">

One could, of course, hoist the declaration of the atom namespace to the top of the document, at which point you get two declarations of the atom namespace. You can get to exactly one declaration, *if* you explicitly specify the namespace prefix on every element, and as I said above, you are welcome to do this, I just don't want to mandate it.
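A small sketch of how the two declarations on an atom-namespaced summary resolve, assuming a namespace-aware parser such as Python's ElementTree. The Atom namespace URI below is a placeholder chosen for illustration (the original snippet elides it); the XHTML namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "urn:example:atom"  # placeholder, not the real Atom namespace
XHTML_NS = "http://www.w3.org/1999/xhtml"

fragment = "<p>What a nice day!</p>"

# The summary carries an explicit prefix; the default namespace is reset
# to xhtml so the unmodified fragment lands in the right namespace.
summary_xml = (
    '<atom:summary xmlns:atom="%s" xmlns="%s">' % (ATOM_NS, XHTML_NS)
    + fragment
    + "</atom:summary>"
)

summary = ET.fromstring(summary_xml)
summary_tag = summary.tag
child_tags = [child.tag for child in summary]
```

The fragment is untouched, but the producer has to switch to a prefixed element name for summary, which is the cost being weighed here.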

An alternative would be to put summary in the xhtml namespace. That doesn't feel quite right to me.

A final alternative would be to adopt an element from the xhtml vocabulary as a feed level container. One that connotes that the children are expected to be valid children of the <div> element would be nice.

 - - -

If you are still with me, what I am proposing is that the simplest and cleanest solution for people who like default namespaces would be to define the format so that there is an <xhtml:div> element between the <atom:summary> and the xhtml fragment that is being syndicated.
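To make the proposal concrete, here is a minimal sketch of a producer that builds a feed by plain string concatenation, using only default namespaces. The function names and the placeholder Atom namespace URI are assumptions for illustration and come from no actual implementation; the fragment is assumed to be well formed already.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "urn:example:atom"  # placeholder, not the real Atom namespace
XHTML_NS = "http://www.w3.org/1999/xhtml"

def wrap_summary(fragment):
    """Wrap a well formed xhtml fragment in the format-level div."""
    # The div belongs to the feed format; its only job here is to reset
    # the default namespace for the unmodified fragment inside it.
    return '<summary><div xmlns="%s">%s</div></summary>' % (XHTML_NS, fragment)

def build_feed(fragment):
    """Concatenate a one-entry feed around the wrapped summary."""
    return '<feed xmlns="%s"><entry>%s</entry></feed>' % (ATOM_NS, wrap_summary(fragment))

feed = build_feed("<p>What a nice day!</p>")
tags = [el.tag for el in ET.fromstring(feed).iter()]
```

Parsing the result shows feed, entry, and summary in the (placeholder) atom namespace, and div and p in the xhtml namespace, without any prefix appearing in the document.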

If you believe in double escaping, this does not affect you.

If you don't believe in default namespaces, then the difference amounts to whether there are three or four enclosing feed elements for you to deal with.

 - - -

So, if we can't work together to find appropriate spec wording to make this happen, the following predictions can be safely made:

  1) Graham (who uses proper XML tools) will have to do more work.
  2) Tim (who uses string concatenation) will have to do more work.
  3) More feeds will be harder to read (that's why I asked for people to
     experiment with alternate serializations).
  4) More feeds will be invalid (content in atom namespace).
  5) More feeds will be incorrect (in the sense that Tim's feed does
     not accurately reflect the content of his entries).
  6) For some combinations of clients and servers, entries produced
     via an HTTP POST will end up with multiple <div>s.
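The "content in atom namespace" failure can be reproduced in a few lines. This is a sketch under stated assumptions: the Atom namespace URI is a placeholder, and the producer is assumed to concatenate a serialized fragment into the feed without resetting the default namespace.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "urn:example:atom"  # placeholder, not the real Atom namespace
XHTML_NS = "http://www.w3.org/1999/xhtml"

fragment = "<p>What a nice day!</p>"

# Naive concatenation: nothing between summary and the fragment
# resets the default namespace, so the fragment inherits the feed's.
feed = '<feed xmlns="%s"><entry><summary>%s</summary></entry></feed>' % (ATOM_NS, fragment)

root = ET.fromstring(feed)
p_in_atom = root.find(".//{%s}p" % ATOM_NS)
p_in_xhtml = root.find(".//{%s}p" % XHTML_NS)
```

The feed is well formed, yet the paragraph is found in the feed's namespace and not in xhtml, which is exactly the kind of error that shows up in feeds produced this way.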

 - - -

All that being said, I am OK with any spec wording that enables one to create a document using only default namespaces that:

  1) does not require well formed, serialized XHTML fragments to be
     modified.
  2) is unambiguous as to which elements in the document are part of the
     feed "structure" and which are to be considered the "content" being
     syndicated.

Fair enough?

- Sam Ruby
