On Aug 28, 2008, at 10:20 AM, David Sewell wrote:
xdmp:quote() takes whatever serialized input you give it and returns it
as a string. So for example, taking some very ill-formed HTML input:
let $html := xdmp:quote( (<p>par 1</p>, <p>par 2</p>) )
return xdmp:tidy($html)[2]
the output is
<html version="-//W3C//DTD XHTML 1.1//EN" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org"/>
<title/>
</head>
<body>
<p>par 1</p>
<p>par 2</p>
</body>
</html>
Note that this is the second node returned by xdmp:tidy(). The first
node contains the error status (basically, what you'd get on stderr
running a command-line version of tidy).
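As a sketch of the two-node result described above, the status report and the cleaned document can be bound to separate variables. The names $result, $status and $xhtml are illustrative and do not come from the original message; the only assumption is the ordering stated above (status first, XHTML second).

```xquery
(: Same ill-formed input as above: two sibling paragraphs, no html/body. :)
let $html := xdmp:quote( (<p>par 1</p>, <p>par 2</p>) )
let $result := xdmp:tidy($html)
let $status := $result[1]  (: tidy's status/error report :)
let $xhtml := $result[2]   (: the cleaned-up XHTML document :)
return $xhtml
```

Returning $status instead of $xhtml would show the report, which may help when the cleaned output looks unexpected.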
OK, but if this is using something like the tidy that is out in the wild,
it means you are building a DOM Document for each (uncacheable)
request. And you still won't get valid XHTML. That does not sound like
a good solution to me. A better approach might be to use John Cowan's
TagSoup, then at least you are using SAX.
best,
-Rob
On Thu, 28 Aug 2008, Robert Koberg wrote:
On Aug 28, 2008, at 9:54 AM, David Sewell wrote:
I don't think anyone else has mentioned it, but if you're generating
a full HTML page via MarkLogic Server, you can use the xdmp:tidy()
function to clean up your generated XHTML and control doctype:
http://xqzone.com/pubs/3.2/apidocs/Document-Conversion.html#tidy
xdmp:tidy() takes a string argument, however, so you need to wrap your
HTML inside xdmp:quote():
xdmp:tidy(xdmp:quote($my_html_node))
Do you have to serialize the result to then pass through tidy (to
serialize again), or is it working in the DB's context?
best,
-Rob
On Wed, 27 Aug 2008, Eric Palmitesta wrote:
Aaron and I discussed this briefly at the training seminar, but I'd like
to get a sense of what other developers are doing to get around the
quirks of generating xhtml with xquery (rather than a java servlet/jsp
based website which pulls records from MarkLogic via XDBC/XCC).
One such quirk: Childless elements with no internal nodes and an explicit
closing tag are automatically folded into elements with no closing tag.
<div></div>, which is valid xhtml, will become <div /> after being
processed by MarkLogic (breaks visual representation). Some better
examples are <script ...></script> and <textarea></textarea>, which are
expected to contain no internal nodes in xhtml.
I've taken to writing things like
<script ... >{" "}</script>
or
<textarea> </textarea>
which successfully preserves the explicit closing tag, keeping xhtml
happy.
Is there a more elegant way to do this?
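One possible way to avoid hand-writing the padding is a small recursive copy function. This is only a sketch in standard XQuery 1.0, not a MarkLogic feature: local:pad-empty and its $names parameter are hypothetical, and the sample markup is invented for illustration. Servers running an older XQuery dialect may need `define function` in place of `declare function`.

```xquery
xquery version "1.0";

(: Hypothetical helper: copies a node tree and adds a single space to
   childless elements whose local names are listed in $names, so the
   serializer has to emit an explicit closing tag for them. :)
declare function local:pad-empty($node as node(), $names as xs:string*) as node()
{
  typeswitch ($node)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        if (fn:empty($node/node()) and fn:local-name($node) = $names)
        then text { " " }
        else
          for $child in $node/node()
          return local:pad-empty($child, $names)
      }
    (: text, comments and processing instructions are copied unchanged :)
    default return $node
};

(: Invented sample markup: script and textarea are padded, br is left alone. :)
local:pad-empty(
  <div id="wrapper"><script src="app.js"/><textarea name="notes"/><br/></div>,
  ("div", "script", "textarea")
)
```

Note that a padded textarea carries a single space as its initial value, the same side effect as the hand-written version above.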
Are there other banana-peels I should watch out for when generating xhtml
with xquery? Is creating an entire website by generating xhtml with
xquery generally frowned upon, or accepted? Admittedly, it seems less
flexible than a <web language>-based site, however the xdmp namespace
seems to provide sufficient functionality, and transforming xml data into
xhtml is incredibly easy with xquery.
Cheers,
Eric
PS
My vocabulary might be incorrect regarding words like 'tag' and 'node',
please correct me if necessary.
PPS
I can see the archives at http://xqzone.marklogic.com/pipermail/general/
but are they searchable? I have a feeling newcomers such as myself will
be prone to asking questions which have already been discussed at length.
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [EMAIL PROTECTED] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/