Re: [MarkLogic Dev General] Quirks of generating xhtml with xquery

Robert Koberg Thu, 28 Aug 2008 08:02:24 -0700


On Aug 28, 2008, at 10:20 AM, David Sewell wrote:

xdmp:quote() takes whatever serialized input you give it and returns it
as a string. So for example, taking some very ill-formed HTML input:

  let $html := xdmp:quote( (<p>par 1</p>, <p>par 2</p>) )
  return xdmp:tidy($html)[2]

the output is

  <html version="-//W3C//DTD XHTML 1.1//EN" xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org"/>
      <title/>
    </head>
    <body>
      <p>par 1</p>
      <p>par 2</p>
    </body>
  </html>

Note that this is the second node returned by xdmp:tidy(). The first
node contains the error status (basically, what you'd get on stderr when
running a command-line version of tidy).
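
To make the two-node return value described above concrete, here is a minimal sketch that binds each node to its own variable rather than indexing with [2] directly. The variable names are illustrative only, and it assumes xdmp:tidy() behaves as described in this message (status report first, tidied document second).

```xquery
(: Assumption: xdmp:tidy() returns a sequence of two nodes, as described
   above: the status report first, then the cleaned-up document. :)
let $html   := xdmp:quote( (<p>par 1</p>, <p>par 2</p>) )
let $result := xdmp:tidy($html)
let $status := $result[1]   (: what tidy would print on stderr :)
let $clean  := $result[2]   (: the tidied XHTML document :)
return $clean
```

Returning $status instead of $clean would presumably be a way to check what tidy complained about while debugging.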

OK, but if this is using something like the tidy that is out in the wild,
it means you are building a DOM Document for each (uncacheable) request.
And you still won't get valid XHTML. That does not sound like a good
solution to me. A better approach might be to use John Cowan's TagSoup,
then at least you are using SAX.


best,
-Rob

On Thu, 28 Aug 2008, Robert Koberg wrote:
On Aug 28, 2008, at 9:54 AM, David Sewell wrote:
I don't think anyone else has mentioned it, but if you're generating
a full HTML page via MarkLogic Server, you can use the xdmp:tidy()
function to clean up your generated XHTML and control doctype:

http://xqzone.com/pubs/3.2/apidocs/Document-Conversion.html#tidy
xdmp:tidy() takes a string argument, however, so you need to wrap your
HTML inside xdmp:quote():

xdmp:tidy(xdmp:quote($my_html_node))
Do you have to serialize the result to then pass through tidy (to serialize
again), or is it working in the DB's context?

best,
-Rob
On Wed, 27 Aug 2008, Eric Palmitesta wrote:
Aaron and I discussed this briefly at the training seminar, but I'd like
to get a sense of what other developers are doing to get around the
quirks of generating xhtml with xquery (rather than a java servlet/jsp
based website which pulls records from MarkLogic via XDBC/XCC).
One such quirk: Childless elements with no internal nodes and an explicit
closing tag are automatically folded into elements with no closing tag.
<div></div>, which is valid xhtml, will become <div /> after being
processed by MarkLogic (breaks visual representation). Some better
examples are <script ...></script> and <textarea></textarea>, which are
expected to contain no internal nodes in xhtml.

I've taken to writing things like

<script ... >{" "}</script>

or

<textarea>&nbsp;</textarea>
which successfully preserves the explicit closing tag, keeping xhtml
happy.
Is there a more elegant way to do this?
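
One possible way to avoid sprinkling {" "} through every template is to pad the empty elements in a single pass over the finished tree. The sketch below is only an illustration of that idea: local:pad-empty is a hypothetical helper (not a MarkLogic function), the list of element names is an assumption, and it is written in standard XQuery 1.0 syntax, so older server dialects may need the function declaration adjusted.

```xquery
(: Hypothetical helper, not part of MarkLogic: rebuilds a tree and puts a
   single space into the listed elements when they have no children, so
   the serializer has no empty element to collapse. :)
declare function local:pad-empty($node as node()) as node()
{
  typeswitch ($node)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        if (fn:empty($node/node())
            and fn:local-name($node) = ("script", "textarea", "div"))
        then text { " " }
        else
          for $child in $node/node()
          return local:pad-empty($child)
      }
    (: text, comments, and other node kinds are passed through unchanged :)
    default return $node
};

local:pad-empty(
  <div><script type="text/javascript" src="app.js"></script><textarea></textarea></div>
)
```

This does the same thing as the manual workaround above, just in one place; note that the padding space becomes real content inside <textarea>, exactly as it does with the hand-written version.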
Are there other banana-peels I should watch out for when generating xhtml
with xquery? Is creating an entire website by generating xhtml with
xquery generally frowned upon, or accepted? Admittedly, it seems less
flexible than a <web language>-based site, however the xdmp namespace
seems to provide sufficient functionality, and transforming xml data into
xhtml is incredibly easy with xquery.

Cheers,

Eric


PS
My vocabulary might be incorrect regarding words like 'tag' and 'node',
please correct me if necessary.

PPS
I can see the archives at http://xqzone.marklogic.com/pipermail/general/
but are they searchable? I have a feeling newcomers such as myself will
be prone to asking questions which have already been discussed at length.
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [EMAIL PROTECTED]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/