Mike,

I think your approach is the right idea; it just needs a little more
logic to be robust.  If you tested the last child node instead of the
first in your node-kind test, that might work more often, since a
leading BOM text node would no longer be the node you inspect:

node-kind(doc($uri)/node()[last()])

Here is a similar idea using the "instance of" operator, with a little
logic to make a best guess at the type:

define function doctype($x as node()) as element()
{
<node>
  <uri>{xdmp:node-uri($x)}</uri>
  <type>{
    if ($x/node() instance of binary())
    then "binary node"
    else if ($x/node() instance of element())
    then "XML node"
    else if ($x/node() instance of text())
    then "text node"
    else "not sure"
  }</type>
</node>
}

for $x in doc()[1 to 100]
return doctype($x)

I have not found any of my documents that return "not sure" here, but I
can imagine that you might be able to construct one.
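
One candidate is the BOM case from the original message: when the
document node has more than one child, $x/node() is a sequence of
several nodes, and as far as I can tell an "instance of" test against a
single node type will not match it.  Below is a minimal sketch that
tests for the presence of each kind of child instead.  The function
name doctype-by-children is hypothetical, and the rule that any element
child makes the document XML is an assumption, not something the server
guarantees:

```xquery
define function doctype-by-children($x as node()) as xs:string
{
  (: Assumption: $x is a document node, as in the doctype() example. :)
  (: Assumption: any element child means XML, even if a BOM text node,
     a comment, or a processing instruction sits beside the root. :)
  if (exists($x/binary()))
  then "binary node"
  else if (exists($x/*))
  then "XML node"
  else if (exists($x/text()))
  then "text node"
  else "not sure"
}

for $x in doc()[1 to 100]
return doctype-by-children($x)
```

With this variant, a document with no children at all, or with only
comments and processing instructions, still falls through to
"not sure".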

-Danny

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mike Sokolov
Sent: Monday, March 31, 2008 10:34 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] document format

I have been trying to come up with a way to determine the "format" of a 
document in MarkLogic. The only API call that seems directly related is
xdmp:document-uri-format, but this seems to operate on the uri without 
any reference to the contents of a document.  Instead, I tried testing:

node-kind(doc($uri)/node()[1])


but we just found an XML document for which this returns "text".
Apparently it has a BOM at the start, so the document node has two child
nodes: one text node (containing the BOM) and one element (the root
element).

Comments and processing instructions could presumably appear there too,
so this strategy is clearly flawed.

Does anybody have a good way to determine whether a document in
MarkLogic is an XML document, a text document, or a binary document?

-Mike
 
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general