RE: [MarkLogic Dev General] Surprising behavior with text node construction

Williams, Paul Thu, 13 Mar 2008 22:17:26 -0700

Sorry... not an answer, just more on the question...

I reduced the sample code down to what I've included below in order to
wrap my head around this a little better.  This code shows both results
as George described.  But it focuses on the piece of the code that seems
pertinent.  Running this test produces this output...


<test>
  <strings>dummy dummy</strings>
  <texts>dummydummy</texts>
</test>

So why doesn't the explicit text constructor version in the "texts"
element produce the same space-joined single text node as the
auto-constructed version in the "strings" element?  

The "strings" version, I would assume, produces a set of strings first,
then decides it needs a text node and must construct it.  The "texts"
version, I assume, produces a set of text nodes first, then decides they
need to be concatenated.  But for the "strings" version to end up with
the space, it must be converting the set of strings into a set of text
nodes and then concatenating into one.  So why doesn't that result in
the same output as the set of text nodes in the "texts" version?  Hmmm.
Curious.

Sample code, try this in CQ ...
----------------------------------------------------------------
<test>
  <strings>{ for $node in (<elem/>,<elem/>) return  "dummy" }</strings>
  <texts>{ for $node in (<elem/>,<elem/>) return  text{"dummy"} }</texts>
</test>
----------------------------------------------------------------
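
A further reduction, offered as a hedged sketch and not as a statement
about MarkLogic internals: under the element content rules of the W3C
XQuery 1.0 Recommendation, adjacent atomic values returned by an
enclosed expression become one text node with a single space between
the values, while adjacent text nodes are merged with no separator. If
the server follows those rules, the for loop is not needed to see the
difference. The element names strings-direct and texts-direct are made
up for this illustration.

```xquery
(: Illustrative sketch; the element names strings-direct and
   texts-direct are hypothetical and not part of the original test. :)
<test>
  <strings-direct>{ ("dummy", "dummy") }</strings-direct>
  <texts-direct>{ (text { "dummy" }, text { "dummy" }) }</texts-direct>
</test>
```

By those rules, the first element would contain "dummy dummy" and the
second "dummydummy", the same split seen in the output above.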

-- Paul

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Florentine, George
Sent: Thursday, March 13, 2008 7:04 PM
To: [email protected]
Subject: [MarkLogic Dev General] Surprising behavior with text
node construction

I've run into an interesting behavior (optimization? bug?) in MarkLogic
and wanted to see what others thought of this.

Here's the background - we have some code that dynamically generates
content by processing DITA topics. Depending upon the structure of the
content it's possible that our XQuery code may process two sequential
elements that would each return a text node from a function. What we see
is that in this case, only one text node is returned and its value is
the concatenation of the two string values separated by a single space
character. This is somewhat in line with the 2003 spec
(http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstructor,
section 3.7.2.4), which states:

----
The content expression of a text node constructor is processed as
follows:
1. Atomization is applied to the value of the content expression,
converting it to a sequence of atomic values.
2. If the result of atomization is an empty sequence, no text node is
constructed. Otherwise, each atomic value in the atomized sequence is
cast into a string.
3. The individual strings resulting from the previous step are merged
into a single string by concatenating them with a single space character
between each pair. The resulting string becomes the content of the
constructed text node.
-----
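
As a minimal sketch of the three quoted steps (an illustration only,
not taken from our code): the variable names and the value 42 are
assumptions chosen to show the cast-to-string step, and whether a given
MarkLogic release follows the draft exactly is not verified here.

```xquery
(: Sketch of the three quoted steps. The variable names and the
   value 42 are illustrative, not from the original code. :)
let $joined := text { ("dummy", 42) }  (: atomize, cast each value to a string, join with one space :)
let $none := text { () }               (: empty content, so no text node is constructed :)
return (fn:string($joined), fn:count($none))
```

By the quoted rules this should return the string "dummy 42" followed
by 0. Note that these steps describe the content of a single text
constructor; they do not by themselves say what happens when two
separate text nodes end up next to each other.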

So it appears that there's some optimization in the output generation of
nodes such that two sequential text nodes are collapsed into one.

Below is a concrete code example. If you run the 1st code snippet in CQ,
the code generates the output <p>dummy dummy</p>, showing an example of
two calls to a function that should return two text nodes but only
returns one text node, with the return value of each call ("dummy")
concatenated into a single text node with a space character separating
the two.

If you run the same code (2nd snippet) with the one change that
transform_dummy returns an explicitly constructed text node, the
output is <p>dummydummy</p> (no space character). This is the behavior
I was expecting and seems like the right behavior. Note that the
declared return type of transform_dummy() is text(), so I would assume
that the xs:string "dummy" would be coerced into a text node and that a
text node would be returned from this function in all cases.

It seems bad that this behavior is different. I'd like to get other
perspectives on this.

Thx,

G
-------------------------------

Code snippet 1 - no explicit text constructor in the function
transform_dummy, returns <p>dummy dummy</p>
-------------------------------

define function transform_default_element($element as element()) as node()
{
    (: create a new element with the same name and attributes and
       recurse to travel the subtree. :)
    element
        {fn:node-name($element)}
        {$element/@*, transform_template($element/node())}
}
define function transform_dummy($element as element()) as text()
{
    "dummy"
}
define function transform_element($element as element()) as node()*
{
    (: branch to more specialized functions based on the type of element :)
    typeswitch ($element)
        case element(dummy)
            return transform_dummy($element)
        default
            return transform_default_element($element)
}
define function transform_template($nodes as node()*) as node()*
{
    for $node in $nodes
    return
        typeswitch ($node)
            case element()
                return transform_element($node)
            default
                (: PIs, text and comment nodes are outputted here :)
                return $node
}

(: module start :)
let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
return transform_template($para/node())

-----------------------------------------
Code snippet 2: explicit creation of text node in transform_dummy,
returns <p>dummydummy</p>
------------------------------------------

define function transform_default_element($element as element()) as node()
{
    (: create a new element with the same name and attributes and
       recurse to travel the subtree. :)
    element
        {fn:node-name($element)}
        {$element/@*, transform_template($element/node())}
}
define function transform_dummy($element as element()) as text()
{
    (: explicitly create a text node before returning :)
    text { "dummy" }
}
define function transform_element($element as element()) as node()*
{
    (: branch to more specialized functions based on the type of element :)
    typeswitch ($element)
        case element(dummy)
            return transform_dummy($element)
        default
            return transform_default_element($element)
}
define function transform_template($nodes as node()*) as node()*
{
    for $node in $nodes
    return
        typeswitch ($node)
            case element()
                return transform_element($node)
            default
                (: PIs, text and comment nodes are outputted here :)
                return $node
}

(: module start :)

let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
return transform_template($para/node())

---------------------------------------------------------------------------
George Florentine

[EMAIL PROTECTED]
  O:  303.542.2173
  C:  303.669.8628
  F:  303.544.0522
  www.FlatironsSolutions.com
 An Inc. 500 Company


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general