It occurs to me I didn't actually answer your principal question...
On Wed, Nov 14, 2012 at 11:26 PM, Christophe Dupriez <[email protected]> wrote:
[...]
> Excuse me to not share your enthusiasm for STAX: it is essential for big
> documents (I use it for that: RDF to XML transformations...) but WikiPages
> are not that long and templates are hard enough to keep them unconstrained.
> Anyway the main problem today is to DEFINE the process to translate
> (normalize) XHTML into WikiMarkup. XSLT is certainly a way to experiment
> (and share results). Let's start something like bringing together test
> cases?
I think if you look at an XSLT approach it's not quite so bad.
In XHtmlElementToWikiTranslator.java you currently see
else if( n.equals( "h2" ) )
{
    m_out.print( "\n!!! " );
    print( e );
    m_out.println();
}
whereas the equivalent pattern in XSLT would be something akin to:
<xsl:template match="h2">
    <xsl:text>
!!! </xsl:text>
    <xsl:apply-templates/>
</xsl:template>
where we match via an XPath and output WikiMarkup. It would also
be a lot more reliable. The big question might seem to be whether
the input XHTML is truly well-formed XML. By definition XHTML
*must* be, but of course in the real world that might not be the
case, and the XSLT processor wouldn't accept non-well-formed XML.
But since the input to XHtmlElementToWikiTranslator.java is already
a DOM Document, I'm assuming we're past that hurdle.
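To make that concrete, here's a rough, untested sketch of feeding
that same DOM Document through an XSLT stylesheet with the standard
javax.xml.transform API (the class name and stylesheet file name are
just placeholders, not anything that exists in JSPWiki today):

import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class XhtmlToWikiXslt
{
    /**
     * Sketch only: applies an XSLT stylesheet to the already-parsed
     * XHTML DOM Document and returns the generated WikiMarkup.
     * The stylesheet would declare <xsl:output method="text"/> so
     * the result comes back as plain WikiMarkup rather than XML.
     */
    public static String toWikiMarkup( Document xhtmlDoc ) throws Exception
    {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer( new StreamSource( "xhtml-to-wiki.xsl" ) );
        StringWriter out = new StringWriter();
        t.transform( new DOMSource( xhtmlDoc ), new StreamResult( out ) );
        return out.toString();
    }
}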
So to define the transformation we'd need a set of XPaths
(particular markup patterns in XHTML) together with the WikiMarkup
each one should generate. If we went to that trouble, the XSLT
solution would be almost a byproduct of that work.
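For whatever it's worth, that mapping could start out as nothing
more than a table like the one below. The h2 entry follows the
existing translator code; the others are assumptions by analogy and
would need checking against the JSPWiki syntax documentation.

import java.util.LinkedHashMap;
import java.util.Map;

public class WikiMarkupMapping
{
    // Illustration only: each key is the XHTML element we'd match
    // (effectively the XPath), each value the WikiMarkup it should
    // generate. These entries are examples, not an agreed-on spec.
    public static final Map<String, String> BLOCK_PREFIXES = new LinkedHashMap<String, String>();
    static
    {
        BLOCK_PREFIXES.put( "h2", "\n!!! " );  // as in XHtmlElementToWikiTranslator today
        BLOCK_PREFIXES.put( "h3", "\n!! " );   // assumed by analogy, needs checking
        BLOCK_PREFIXES.put( "h4", "\n! " );    // assumed by analogy, needs checking
    }
}

Anything beyond headings (lists, bold/italic, links, tables) would
get its own row in the same way, and collecting those rows is the
test-case gathering Christophe suggested.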
Ichiro