Mike,
I think your approach is the right idea; it just needs a little more
logic to be robust. If you took the last() child instead of the first in
your node-kind test, that should work more often, though still not always:
node-kind(doc($uri)/node()[last()])
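A minimal sketch of why last() helps and where it still falls short,
assuming an in-memory document built with a document constructor behaves
like a stored one for this purpose. The variable names and the content
are made up for illustration; node-kind is used unprefixed as in the
line above (newer servers spell it xdmp:node-kind):

```xquery
(: Stand-in for a document with a stray text node before the root
   element, similar to the BOM case described in the original message. :)
let $leading := document { text { "stray" }, <root/> }
(: A document with a comment after the root element. :)
let $trailing := document { <root/>, <!-- trailing --> }
return (
  node-kind($leading/node()[1]),       (: the stray text node :)
  node-kind($leading/node()[last()]),  (: the root element :)
  node-kind($trailing/node()[last()])  (: the comment, not the root :)
)
```

I would expect the first two results to differ, which is the improvement,
and the third to report the comment rather than the element, which is why
last() alone is still only a best guess.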
Here is a similar idea using the "instance of" operator, with a
little logic to make a best guess at the type:
define function doctype($x as node()) as element()
{
  <node>
    <uri>{xdmp:node-uri($x)}</uri>
    <type>{
      if ($x/node() instance of binary())
      then "binary node"
      else if ($x/node() instance of element())
      then "XML node"
      else if ($x/node() instance of text())
      then "text node"
      else "not sure"
    }</type>
  </node>
}

for $x in doc()[1 to 100]
return doctype($x)
I have not found any of my documents that return "not sure" here, but I
can imagine that you might be able to construct one.
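One case that should reach "not sure" is a document whose document node
has more than one child, such as the BOM example in the original
message: "instance of element()" only matches a sequence of exactly one
item, so a text node followed by an element matches none of the three
tests. A possible variation, sketched here under the assumption that
binary() can be used as a path step, tests for the presence of each kind
of child instead. The name doc-format is hypothetical:

```xquery
(: Hypothetical helper, written in the same old-style syntax as the
   function above. :)
define function doc-format($doc as node()) as xs:string
{
  (: Check whether a child of each kind exists rather than testing the
     type of the whole child sequence, so stray text, comment or
     processing-instruction children are ignored. :)
  if (exists($doc/binary())) then "binary"
  else if (exists($doc/*)) then "xml"
  else if (exists($doc/text())) then "text"
  else "not sure"
}

for $x in doc()[1 to 100]
return
  <node>
    <uri>{xdmp:node-uri($x)}</uri>
    <type>{doc-format($x)}</type>
  </node>
```

The order of the tests matters: the element check comes before the text
check so that a document with both a stray text node and a root element
is reported as XML.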
-Danny
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Mike Sokolov
Sent: Monday, March 31, 2008 10:34 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] document format
I have been trying to come up with a way to determine the "format" of a
document in MarkLogic. The only API call that seems directly related is
xdmp:document-uri-format, but this seems to operate on the URI without
any reference to the contents of the document. Instead, I tried testing:
node-kind(doc($uri)/node()[1])
but we just found an XML document for which this returns "text" -
apparently it has a BOM at the start, so the document node has two child
nodes: one text (containing the BOM) and one element (the root element).
Presumably there could be comments and processing instructions there
too, so this strategy is clearly flawed.
Does anybody have a good way to determine whether a document in Mark
Logic is an XML document, a text document or a binary document?
-Mike
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general